Outils pour utilisateurs

Outils du site


raspberry:usb_issue (lu 16961 fois)

Problème usb

Sur mon raspberry j’ai un disque dur externe de 2To alimenté via un bloc externe.
Depuis quelques mois la liaison usb décrochait et causait des dégats sur le système qui ne retrouvait plus le disque.

Je retrouvais ces messages là

[11479.843107] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11511.710777] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11513.930620] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11515.020540] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11549.748010] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11556.557534] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11559.287315] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11566.366811] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11570.016534] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11589.535109] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11606.593871] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11813.328845] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11816.058738] usb 1-1.2: reset high-speed USB device number 4 using dwc_otg
[11816.100080] usb 1-1.2: USB disconnect, device number 4

et ensuite mon filesysteme btrfs pétait un plomb

[11816.103345] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[11816.103409] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
[11816.103434] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
[11816.103469] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
[11816.103490] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
[11816.103521] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
[11816.103542] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
[11816.103571] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
[11816.103592] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
[11816.103622] BTRFS error (device sda1): bdev /dev/sda1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
[11816.111223] scsi 0:0:0:0: rejecting I/O to offline device
[11816.111253] scsi 0:0:0:0: [sda] killing request
[11816.111291] scsi 0:0:0:0: rejecting I/O to offline device
[11816.111305] scsi 0:0:0:0: [sda] killing request
[11816.111320] scsi 0:0:0:0: rejecting I/O to dead device
[11816.111382] scsi 0:0:0:0: rejecting I/O to dead device
[11816.111414] scsi 0:0:0:0: rejecting I/O to dead device
[11816.111449] scsi 0:0:0:0: rejecting I/O to dead device
[11816.111477] scsi 0:0:0:0: rejecting I/O to dead device
[11816.115711] scsi 0:0:0:0: rejecting I/O to dead device
[11816.115980] scsi 0:0:0:0: [sda] UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[11816.116008] scsi 0:0:0:0: [sda] CDB: opcode=0x28 28 00 2a fc 1d 98 00 00 08 00
[11816.116028] blk_update_request: I/O error, dev sda, sector 721165720
[11816.116127] scsi 0:0:0:0: [sda] UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
[11816.116150] scsi 0:0:0:0: [sda] CDB: opcode=0x28 28 00 2c 00 ef 40 00 00 08 00
[11816.116162] blk_update_request: I/O error, dev sda, sector 738258752

L’usb retrouvait mon disque mais il passait de sda à sdb

[11816.438677] usb 1-1.2: new high-speed USB device number 5 using dwc_otg
[11816.570665] usb 1-1.2: New USB device found, idVendor=152d, idProduct=2329
[11816.570688] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=5
[11816.570699] usb 1-1.2: Product: USB to ATA/ATAPI bridge
[11816.570708] usb 1-1.2: Manufacturer: JMicron
[11816.570717] usb 1-1.2: SerialNumber: 1131ADDDFFFF
[11816.584102] usb-storage 1-1.2:1.0: USB Mass Storage device detected
[11816.589639] usb-storage 1-1.2:1.0: Quirks match for vid 152d pid 2329: 8020
[11816.596653] scsi host1: usb-storage 1-1.2:1.0
[11817.671767] scsi 1:0:0:0: Direct-Access     Hitachi  HDS722020ALA330       PQ: 0 ANSI: 2 CCS
[11817.760386] sd 1:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[11817.761243] sd 1:0:0:0: [sdb] Write Protect is off
[11817.761269] sd 1:0:0:0: [sdb] Mode Sense: 28 00 00 00
[11817.762087] sd 1:0:0:0: [sdb] No Caching mode page found
[11817.762106] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[11817.780606]  sdb: sdb1
[11817.791676] sd 1:0:0:0: [sdb] Attached SCSI disk
[11817.913460] sd 1:0:0:0: Attached scsi generic sg0 type 0

btrfs ne comprenait plus rien, une partition sdb1 monté en tant que sda1 …. tout ça fini par un crash

[11821.149491] btrfs_dev_stat_print_on_error: 1713 callbacks suppressed
[11821.149523] BTRFS error (device sda1): bdev /dev/sdb1 errs: wr 995, rd 729, flush 0, corrupt 0, gen 0
[11821.149668] BTRFS error (device sda1): bdev /dev/sdb1 errs: wr 995, rd 730, flush 0, corrupt 0, gen 0
[11821.151817] BTRFS error (device sda1): bdev /dev/sdb1 errs: wr 995, rd 731, flush 0, corrupt 0, gen 0
[11821.151929] ------------[ cut here ]------------
[11821.153107] WARNING: CPU: 0 PID: 1369 at fs/btrfs/extent-tree.c:6961 __btrfs_free_extent+0xa74/0x114c [btrfs]
[11821.153122] BTRFS: Transaction aborted (error -5)
[11821.153128] Modules linked in: sg xt_multiport iptable_filter ip_tables x_tables snd_bcm2835 snd_pcm snd_seq snd_seq_device snd_timer snd btrfs xor raid6_pq zlib_deflate bcm2835_gpiomem uio_pdrv_genirq fixed uio
[11821.153210] CPU: 0 PID: 1369 Comm: btrfs-transacti Not tainted 4.9.30+ #1001
[11821.153217] Hardware name: BCM2835
[11821.153276] [<c0016148>] (unwind_backtrace) from [<c0013c90>] (show_stack+0x20/0x24)
[11821.153304] [<c0013c90>] (show_stack) from [<c03193d8>] (dump_stack+0x20/0x28)
[11821.153330] [<c03193d8>] (dump_stack) from [<c0021ccc>] (__warn+0xe4/0x10c)
[11821.153347] [<c0021ccc>] (__warn) from [<c0021d3c>] (warn_slowpath_fmt+0x48/0x50)
[11821.154442] [<c0021d3c>] (warn_slowpath_fmt) from [<bf060c38>] (__btrfs_free_extent+0xa74/0x114c [btrfs])
[11821.156592] [<bf060c38>] (__btrfs_free_extent [btrfs]) from [<bf0669e0>] (__btrfs_run_delayed_refs+0xb18/0x1604 [btrfs])
[11821.158698] [<bf0669e0>] (__btrfs_run_delayed_refs [btrfs]) from [<bf06b0cc>] (btrfs_run_delayed_refs+0x88/0x2c8 [btrfs])
[11821.160937] [<bf06b0cc>] (btrfs_run_delayed_refs [btrfs]) from [<bf086af4>] (btrfs_commit_transaction+0x34/0xdbc [btrfs])
[11821.162969] [<bf086af4>] (btrfs_commit_transaction [btrfs]) from [<bf0815e0>] (transaction_kthread+0x160/0x174 [btrfs])
[11821.164173] [<bf0815e0>] (transaction_kthread [btrfs]) from [<c003e854>] (kthread+0xf8/0x114)
[11821.164221] [<c003e854>] (kthread) from [<c000fec8>] (ret_from_fork+0x14/0x2c)
[11821.164234] ---[ end trace 81275073096d8f81 ]---
[11821.164252] BTRFS: error (device sda1) in __btrfs_free_extent:6961: errno=-5 IO failure
[11821.174121] BTRFS info (device sda1): forced readonly
[11821.174155] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2967: errno=-5 IO failure
[28538.854567] BTRFS error (device sda1): bdev /dev/sdb1 errs: wr 995, rd 732, flush 0, corrupt 0, gen 0
[28538.854850] BTRFS error (device sda1): bdev /dev/sdb1 errs: wr 995, rd 733, flush 0, corrupt 0, gen 0
[28538.855066] BTRFS error (device sda1): bdev /dev/sdb1 errs: wr 995, rd 734, flush 0, corrupt 0, gen 0
[28538.855216] BTRFS error (device sda1): bdev /dev/sdb1 errs: wr 995, rd 735, flush 0, corrupt 0, gen 0

Après quelques recherche, j’espère avoir trouver une solution, rajouter dans le /boot/cmdline.txt le paramètre suivant

dwc_otg.speed=1

Je verrai par la suite si ça résout mon problème….

Cette solution semble marcher :)

Maintenant je suis confronté à un autre problème

[32398.188252] INFO: task kworker/u2:1:13577 blocked for more than 120 seconds.
[32398.188288]       Not tainted 4.1.19+ #858
[32398.188301] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[32398.188315] kworker/u2:1    D c0534afc     0 13577      2 0x00000000
[32398.189053] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[32398.189174] [<c0534afc>] (__schedule) from [<c0534e44>] (schedule+0x4c/0xa4)
[32398.189213] [<c0534e44>] (schedule) from [<c0537224>] (schedule_timeout+0x180/0x1f8)
[32398.189241] [<c0537224>] (schedule_timeout) from [<c0535864>] (wait_for_common+0x9c/0x150)
[32398.189268] [<c0535864>] (wait_for_common) from [<c0535938>] (wait_for_completion+0x20/0x24)
[32398.189698] [<c0535938>] (wait_for_completion) from [<bf05a360>] (btrfs_async_run_delayed_refs+0x100/0x134 [btrfs])
[32398.190447] [<bf05a360>] (btrfs_async_run_delayed_refs [btrfs]) from [<bf079bfc>] (__btrfs_end_transaction+0x248/0x368 [btrfs])
[32398.191306] [<bf079bfc>] (__btrfs_end_transaction [btrfs]) from [<bf079d38>] (btrfs_end_transaction+0x1c/0x20 [btrfs])
[32398.192167] [<bf079d38>] (btrfs_end_transaction [btrfs]) from [<bf083a6c>] (btrfs_finish_ordered_io+0x174/0x67c [btrfs])
[32398.193016] [<bf083a6c>] (btrfs_finish_ordered_io [btrfs]) from [<bf0842a4>] (finish_ordered_fn+0x1c/0x20 [btrfs])
[32398.193938] [<bf0842a4>] (finish_ordered_fn [btrfs]) from [<bf0b2634>] (normal_work_helper+0x8c/0x358 [btrfs])
[32398.194936] [<bf0b2634>] (normal_work_helper [btrfs]) from [<bf0b2c18>] (btrfs_endio_write_helper+0x1c/0x20 [btrfs])
[32398.195510] [<bf0b2c18>] (btrfs_endio_write_helper [btrfs]) from [<c0038e7c>] (process_one_work+0x11c/0x394)
[32398.195551] [<c0038e7c>] (process_one_work) from [<c0039134>] (worker_thread+0x40/0x4d4)
[32398.195585] [<c0039134>] (worker_thread) from [<c003e5dc>] (kthread+0xdc/0xf8)
[32398.195623] [<c003e5dc>] (kthread) from [<c000f858>] (ret_from_fork+0x14/0x3c)
[32398.195646] INFO: task kworker/u2:3:13579 blocked for more than 120 seconds.
[32398.195662]       Not tainted 4.1.19+ #858
[32398.195673] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[32398.195687] kworker/u2:3    D c0534afc     0 13579      2 0x00000000
[32398.196208] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]

Apparemment, pendant mon backup, il y a trop d’IO disk et ça fini par crasher (pour faire simple)

Une très bonne doc explique tout ça avec plusieurs solutions ⇒ https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

Je vais tester ceci ⇒ https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ qui permet, si j’ai bien compris, de réduire la mise en cache des fichiers disques et d’écrire plus souvent sur le disque.


Le problème semble encore là et la lenteur semble être bien présente

# hdparm -t -T /dev/sda

/dev/sda:
 Timing cached reads:     8 MB in  2.01 seconds =   3.99 MB/sec
 Timing buffered disk reads:   4 MB in  3.82 seconds =   1.05 MB/sec

d’ailleurs un test d’écriture avec dd à planté..

# dd if=/dev/zero of=/media/memup/test.tmp bs=1M count=1024 conv=fdatasync
INFO: task kworker/u2:3:24564 blocked for more than 120 seconds.
      Not tainted 4.1.19+ #858
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u2:3    D c0534afc     0 24564      2 0x00000000
Workqueue: writeback bdi_writeback_workfn (flush-btrfs-3)
[<c0534afc>] (__schedule) from [<c0534e44>] (schedule+0x4c/0xa4)
[<c0534e44>] (schedule) from [<bf05458c>] (wait_block_group_cache_progress+0xb8/0xf0 [btrfs])
[<bf05458c>] (wait_block_group_cache_progress [btrfs]) from [<bf060b78>] (find_free_extent+0x4c4/0xec8 [btrfs])
[<bf060b78>] (find_free_extent [btrfs]) from [<bf061730>] (btrfs_reserve_extent+0x78/0x204 [btrfs])
[<bf061730>] (btrfs_reserve_extent [btrfs]) from [<bf080b54>] (cow_file_range+0x1d4/0x810 [btrfs])
[<bf080b54>] (cow_file_range [btrfs]) from [<bf082120>] (run_delalloc_range+0x38c/0x45c [btrfs])
[<bf082120>] (run_delalloc_range [btrfs]) from [<bf09d050>] (writepage_delalloc+0x174/0x230 [btrfs])
[<bf09d050>] (writepage_delalloc [btrfs]) from [<bf09e50c>] (__extent_writepage+0xe4/0x2f0 [btrfs])
[<bf09e50c>] (__extent_writepage [btrfs]) from [<bf0a0a4c>] (extent_writepages+0x2c4/0x3bc [btrfs])
[<bf0a0a4c>] (extent_writepages [btrfs]) from [<bf07ccf4>] (btrfs_writepages+0x30/0x38 [btrfs])
[<bf07ccf4>] (btrfs_writepages [btrfs]) from [<c00e0e50>] (do_writepages+0x34/0x48)
[<c00e0e50>] (do_writepages) from [<c014c060>] (__writeback_single_inode+0x40/0x2ac)
[<c014c060>] (__writeback_single_inode) from [<c014c44c>] (writeback_sb_inodes+0x180/0x390)
[<c014c44c>] (writeback_sb_inodes) from [<c014c6e8>] (__writeback_inodes_wb+0x8c/0xd0)
[<c014c6e8>] (__writeback_inodes_wb) from [<c014c90c>] (wb_writeback+0x1e0/0x27c)
[<c014c90c>] (wb_writeback) from [<c014d030>] (bdi_writeback_workfn+0x1d0/0x474)
[<c014d030>] (bdi_writeback_workfn) from [<c0038e7c>] (process_one_work+0x11c/0x394)
[<c0038e7c>] (process_one_work) from [<c0039134>] (worker_thread+0x40/0x4d4)
[<c0039134>] (worker_thread) from [<c003e5dc>] (kthread+0xdc/0xf8)
[<c003e5dc>] (kthread) from [<c000f858>] (ret_from_fork+0x14/0x3c)

Conclusion, en fait le problème venait d’une défaillance du disque. Le problème était visible dans le rapport SMART. Une usure prématuré rendant les performances du disques appauvri.

Le disque avait les mêmes symptôme branché sur un PC, la génération d’un fichier avec dd échouait. Suite à ça, j’ai racheté un nouveau disque et fait l’erreur de faire un clonage de disque.

Pour une raison que j’ignore, les mêmes symptomes étaient présent sur la nouvelle partition cloné, impossible de faire un dd.

Finalement, création d’une partition toute fraiche et copie des fichiers via rsync, maintenant tout fonctionne comme avant, voir mieux :)

raspberry/usb_issue.txt · Dernière modification: 16-06-2017 00:03 de edmc73