Discussion:
sata_sil24 broken since 2.6.23-rc4-mm1
Torsten Kaiser
2007-09-27 17:34:34 UTC
I compared the dmesg from good and bad boots with -rc7-mm1 but could
not see any difference, so do you think that these additional
diagnostics could show a difference?
Or could you suggest any other debugging options I should try?
Since it's a reproducible problem, I think it's easiest to get
you straight to git-bisect. In this case, that would be
Sorry, but I don't think that will work.
It seems that I am able to reproduce the bug, but not reliably. And my
current best guess at making it happen involves the step "leave the
computer powered off for 8 hours".
I estimate that even with the 8-hour pause, a drive fails on only ~50%
of the boots. So I have no safe point to mark a bisect step as
'good'.
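To put a rough number on that (a back-of-the-envelope sketch of mine, assuming the ~50% per-boot failure rate estimated above): after N clean boots in a row there is still roughly a 0.5^N chance that the kernel under test is actually bad and simply got lucky, so each bisect step would need several boots before it could be called 'good'.

/* Back-of-the-envelope sketch only: per-boot failure probability assumed 0.5. */
#include <math.h>
#include <stdio.h>

int main(void)
{
	const double p_fail = 0.5;	/* assumed per-boot failure rate on a bad kernel */

	for (int n = 1; n <= 6; n++)
		printf("%d clean boot(s) -> %4.1f%% chance the kernel is still bad\n",
		       n, 100.0 * pow(1.0 - p_fail, n));
	return 0;
}

(With that assumption, even five clean boots still leave about a 3% chance of mislabelling a bisect step.)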
a) start with a known good point (v2.6.22? v2.6.23?) and
known bad point (HEAD, aka the most recent commit in
libata-dev.git#upstream)
Known good is for me 2.6.23-rc3-mm1, the first known bad is 2.6.23-rc4-mm1.
I will try to look at the diff between these revisions some more, but
the change in sata_sil24.c looked like a perfect match for the
symptoms I was seeing.

What I just noticed, as I wanted to re-add the drive to the RAID:
This time it was not sda, but sdb that was kicked. But otherwise the
errors are perfectly identical.

I will try to make a 2.6.23-rc3.5-mm1 to narrow it down some more...

Torsten
Tejun Heo
2007-09-27 20:22:28 UTC
Post by Torsten Kaiser
Known good is for me 2.6.23-rc3-mm1, the first known bad is 2.6.23-rc4-mm1.
I will try to look at the diff between these revisions some more, but
the change in sata_sil24.c looked like a perfect match for the
symptoms I was seeing.
I think the first thing to do here is to verify 2.6.23-rc3-mm1 still
works fine and my previous debug patch is pretty much meaningless if
address initialization failure isn't the cause.

Thanks.
--
tejun
Torsten Kaiser
2007-09-28 05:36:26 UTC
Post by Tejun Heo
Post by Torsten Kaiser
Known good is for me 2.6.23-rc3-mm1, the first known bad is 2.6.23-rc4-mm1.
I will try to look at the diff between these revisions some more, but
the change in sata_sil24.c looked like a perfect match for the
symptoms I was seeing.
I think the first thing to do here is to verify 2.6.23-rc3-mm1 still
works fine and my previous debug patch is pretty much meaningless if
address initialization failure isn't the cause.
After the first trouble with -rc4-mm1 I switched back to -rc3-mm1. I
booted that kernel 7 times over 4 days and never had trouble. (Before
-rc4-mm1 came out, I used -rc3-mm1 for over a week)

So in case of -rc3-mm1 I'm pretty sure that it works.

I'm not completely sure whether the 2.6.23-rc7-sglist kernel works. I booted
it 9 times, but from a quick look in /var/log/messages, I might not
have hit the "correct" situation to trigger the error.
That kernel is vanilla 2.6.23-rc7 plus the patch from
http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
( http://marc.info/?l=linux-ide&m=119055574826083&w=2 )

Torsten
Torsten Kaiser
2007-09-30 06:00:26 UTC
Post by Torsten Kaiser
So in case of -rc3-mm1 I'm pretty sure that it works.
That's still the case.
Post by Torsten Kaiser
I'm not completely sure whether the 2.6.23-rc7-sglist kernel works. I booted
it 9 times, but from a quick look in /var/log/messages, I might not
have hit the "correct" situation to trigger the error.
That kernel is vanilla 2.6.23-rc7 plus the patch from
http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
( http://marc.info/?l=linux-ide&m=119055574826083&w=2 )
That is no longer the case. Yesterday this kernel also showed the failure.
The error looked a little bit different, but happened at the same
location during bootup.
Sadly, dmesg overflowed and I was not able to capture the first error.

[ 53.462632] Freeing unused kernel memory: 332k freed
ite cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 77.170903] ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x0 action 0x2
[ 77.170905] ata1.00: irq_stat 0x00020002, SGT no on qword boundary
[ 77.170908] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0
cdb 0x0 data 4096 out
[ 77.170909] res 50/00:00:af:ea:42/00:00:25:00:00/e0 Emask
0x40 (internal error)
[ 77.504292] ata1: soft resetting port
[ 77.544221] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 77.642567] ata1.00: configured for UDMA/100
[ 77.642572] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE,SUGGEST_OK
[ 77.642574] sd 0:0:0:0: [sda] Sense Key : Aborted Command [current]
[descriptor]
[ 77.642578] Descriptor sense data with sense descriptors (in hex):
[ 77.642580] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 77.642584] 00 00 00 af
[ 77.642586] sd 0:0:0:0: [sda] Add. Sense: No additional sense information
[ 77.642589] end_request: I/O error, dev sda, sector 625137161
[ 77.642593] ata1: EH complete
[ 77.642609] sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
[ 77.642616] sd 0:0:0:0: [sda] Write Protect is off
[ 77.642618] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 77.642628] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 77.666927] md: super_written gets error=-5, uptodate=0
[ 77.666930] raid5: Disk failure on sda2, disabling device.
Operation continuing on 1 devices

That "Operation continuing on 1 devices" is a 'little' bit misleading.
A RAID5 with two failed devices will not continue to operate. :(
The same error is repeated several times, so I expect the first error
also looked like that.

Other things I have done to narrow it down:
Comparing 2.6.23-rc3-mm1 and 2.6.23-rc4-mm1, I found the following hunk
from libata-core.c:
@@ -2182,6 +2183,17 @@
 	if (ap->ops->cable_detect)
 		ap->cbl = ap->ops->cable_detect(ap);
 
+	/* We may have SATA bridge glue hiding here irrespective of the
+	   reported cable types and sensed types */
+	ata_link_for_each_dev(dev, &ap->link) {
+		if (!ata_dev_enabled(dev))
+			continue;
+		/* SATA drives indicate we have a bridge. We don't know which
+		   end of the link the bridge is which is a problem */
+		if (ata_id_is_sata(dev->id))
+			ap->cbl = ATA_CBL_SATA;
+	}
+

But removing this from the current -mm still shows the error, so
that's not the source of the bug. (That test kicked the first drive
(sdb) from the RAID5 with the same error as most other times.)

Rebooting into the "safe" rc7-sglist kernel, I got the above error that
killed the RAID5.
I rebooted into a system (kernel 2.6.21-rc5-mm2, please note the
2.6.*21*, that is only a rescue system and not updated often) on a
separate partition to rebuild it. When I tried to re-add the failed sdb,
the system locked up with this error:

Sep 29 22:25:37 treogen [ 205.407893] ata2.00: exception Emask 0x0
SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 29 22:25:37 treogen [ 205.407900] ata2.00: cmd
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Sep 29 22:25:37 treogen [ 205.407901] res
40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (time out)
Sep 29 22:25:38 treogen [ 205.720531] ata2: soft resetting port
Sep 29 22:25:38 treogen [ 205.720554] ata2: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Sep 29 22:25:38 treogen [ 205.730560] ATA: abnormal status 0xEA on
port 0xffffc20001432007
Sep 29 22:25:38 treogen [ 205.740560] ATA: abnormal status 0xEA on
port 0xffffc20001432007
Sep 29 22:26:07 treogen [ 235.343001] ata1.00: exception Emask 0x0
SAct 0x1 SErr 0x0 action 0x2 frozen
Sep 29 22:26:07 treogen [ 235.343007] ata1.00: cmd
61/08:00:bf:52:2a/00:00:01:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 29 22:26:07 treogen [ 235.343009] res
40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (time out)
Sep 29 22:26:08 treogen [ 235.655644] ata1: soft resetting port
Sep 29 22:26:08 treogen [ 235.655666] ata1: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Sep 29 22:26:08 treogen [ 235.675602] ata2.00: qc timeout (cmd 0xec)
Sep 29 22:26:08 treogen [ 235.675608] ata2.00: failed to IDENTIFY
(I/O error, err_mask=0x104)
Sep 29 22:26:08 treogen [ 235.675610] ata2.00: revalidation failed (errno=-5)
Sep 29 22:26:08 treogen [ 235.675612] ata2: failed to recover some
devices, retrying in 5 secs
Sep 29 22:26:13 treogen [ 240.668113] ata2: hard resetting port
Sep 29 22:26:15 treogen [ 242.833438] ata2: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Sep 29 22:26:15 treogen [ 242.843444] ATA: abnormal status 0xEC on
port 0xffffc20001432007
Sep 29 22:26:15 treogen [ 242.853443] ATA: abnormal status 0xEC on
port 0xffffc20001432007
Sep 29 22:26:38 treogen [ 265.590759] ata1.00: qc timeout (cmd 0xec)
Sep 29 22:26:38 treogen [ 265.590765] ata1.00: failed to IDENTIFY
(I/O error, err_mask=0x104)
Sep 29 22:26:38 treogen [ 265.590767] ata1.00: revalidation failed (errno=-5)
Sep 29 22:26:38 treogen [ 265.590769] ata1: failed to recover some
devices, retrying in 5 secs
Sep 29 22:26:43 treogen [ 270.583271] ata1: hard resetting port
Sep 29 22:26:45 treogen [ 272.748596] ata1: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Sep 29 22:26:45 treogen [ 272.758601] ATA: abnormal status 0xEC on
port 0xffffc20001430007
Sep 29 22:26:45 treogen [ 272.768602] ATA: abnormal status 0xEC on
port 0xffffc20001430007
Sep 29 22:26:45 treogen [ 272.788490] ata2.00: qc timeout (cmd 0xec)
Sep 29 22:26:45 treogen [ 272.788494] ata2.00: failed to IDENTIFY
(I/O error, err_mask=0x104)
Sep 29 22:26:45 treogen [ 272.788496] ata2.00: revalidation failed (errno=-5)
Sep 29 22:26:45 treogen [ 272.788501] ata2.00: limiting speed to UDMA/100:PIO3
Sep 29 22:26:45 treogen [ 272.788503] ata2: failed to recover some
devices, retrying in 5 secs
Sep 29 22:26:50 treogen [ 277.781001] ata2: hard resetting port
Sep 29 22:26:52 treogen [ 279.946326] ata2: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Sep 29 22:26:52 treogen [ 279.956331] ATA: abnormal status 0xEC on
port 0xffffc20001432007
Sep 29 22:26:52 treogen [ 279.966330] ATA: abnormal status 0xEC on
port 0xffffc20001432007
Sep 29 22:27:15 treogen [ 302.703652] ata1.00: qc timeout (cmd 0xec)
Sep 29 22:27:15 treogen [ 302.703661] ata1.00: failed to IDENTIFY
(I/O error, err_mask=0x104)
Sep 29 22:27:15 treogen [ 302.703663] ata1.00: revalidation failed (errno=-5)
Sep 29 22:27:15 treogen [ 302.703669] ata1.00: limiting speed to UDMA/100:PIO3
Sep 29 22:27:15 treogen [ 302.703671] ata1: failed to recover some
devices, retrying in 5 secs
Sep 29 22:27:20 treogen [ 307.696161] ata1: hard resetting port
Sep 29 22:27:22 treogen [ 309.884773] ata1: SATA link up 3.0 Gbps
(SStatus 123 SControl 300)
Sep 29 22:27:22 treogen [ 309.894784] ATA: abnormal status 0xEC on
port 0xffffc20001430007
Sep 29 22:27:22 treogen [ 309.904786] ATA: abnormal status 0xEC on
port 0xffffc20001430007
Sep 29 22:27:22 treogen [ 309.904940] ata2.00: qc timeout (cmd 0xec)
Sep 29 22:27:22 treogen [ 309.904946] ata2.00: failed to IDENTIFY
(I/O error, err_mask=0x104)
Sep 29 22:27:22 treogen [ 309.904948] ata2.00: revalidation failed (errno=-5)
Sep 29 22:27:22 treogen [ 309.904951] ata2.00: disabled
Sep 29 22:27:23 treogen [ 310.406968] ata2: EH complete
Sep 29 22:27:23 treogen [ 310.407013] sd 1:0:0:0: [sdb] Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 29 22:27:23 treogen [ 310.407018] end_request: I/O error, dev
sdb, sector 19550911

Is it possible that the bug is much older, and only some change in
rc3-mm->rc4-mm makes it visible that the initialization of the SiI3132
is incomplete?

I'm currently trying a 2.6.23-rc4-mm1 with only 629 of the 1628 mm patches applied...

Torsten
Tejun Heo
2007-09-30 14:34:58 UTC
Hello, Torsten.
Post by Torsten Kaiser
Post by Torsten Kaiser
So in case of -rc3-mm1 I'm pretty sure that it works.
That's still the case.
Ah... that's weird. It would be much better if -rc3-mm1 were broken too. :-P
Post by Torsten Kaiser
Post by Torsten Kaiser
I'm not completely sure whether the 2.6.23-rc7-sglist kernel works. I booted
it 9 times, but from a quick look in /var/log/messages, I might not
have hit the "correct" situation to trigger the error.
That kernel is vanilla 2.6.23-rc7 plus the patch from
http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
( http://marc.info/?l=linux-ide&m=119055574826083&w=2 )
That is no longer the case. Yesterday this kernel also showed the failure.
The error looked a little bit different, but happened at the same
location during bootup.
Sadly, dmesg overflowed and I was not able to capture the first error.
[ 53.462632] Freeing unused kernel memory: 332k freed
ite cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 77.170903] ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x0 action 0x2
[ 77.170905] ata1.00: irq_stat 0x00020002, SGT no on qword boundary
[ 77.170908] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0
cdb 0x0 data 4096 out
[ 77.170909] res 50/00:00:af:ea:42/00:00:25:00:00/e0 Emask
0x40 (internal error)
Hmm... SGT not on qword boundary? Please add the following to the end
of sil24_port_start() and report the kernel boot log for both a successful
and a failed boot.

	printk("XXX sil24 cb=%p cb_dma=%llx\n",
	       cb, (unsigned long long)cb_dma);
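For reference, a rough sketch of where that would go (paraphrased from the sata_sil24.c of that era, not a verbatim copy; helper and field names are from memory): sil24_port_start() allocates one DMA-coherent command block array per port, and cb/cb_dma are the virtual and bus addresses of that allocation, which also holds the scatter/gather tables the controller fetches.

/* Sketch only -- paraphrased, not the exact driver source. */
static int sil24_port_start(struct ata_port *ap)
{
	struct device *dev = ap->host->dev;
	struct sil24_port_priv *pp;
	union sil24_cmd_block *cb;
	size_t cb_size = sizeof(*cb) * SIL24_MAX_CMDS;
	dma_addr_t cb_dma;

	pp = devm_kzalloc(dev, sizeof(*pp), GFP_KERNEL);
	if (!pp)
		return -ENOMEM;

	/* One command block (which embeds the SGT) per tag; the hardware
	 * fetches the SGT from cb_dma, so that address must be qword
	 * (8-byte) aligned. */
	cb = dmam_alloc_coherent(dev, cb_size, &cb_dma, GFP_KERNEL);
	if (!cb)
		return -ENOMEM;
	memset(cb, 0, cb_size);

	pp->cmd_block = cb;
	pp->cmd_block_dma = cb_dma;
	ap->private_data = pp;

	/* Suggested debug line: log both addresses so a failing boot can be
	 * diffed against a good one. */
	printk("XXX sil24 cb=%p cb_dma=%llx\n",
	       cb, (unsigned long long)cb_dma);

	return 0;
}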

Also, does 'dmesg -s 10000000' give full dmesg?
Post by Torsten Kaiser
I rebooted into a system (kernel 2.6.21-rc5-mm2, please note the
2.6.*21*, that is only a rescue system and not updated often) on a
separate partition to rebuild it. When I tried to re-add the failed sdb
Sep 29 22:25:37 treogen [ 205.407893] ata2.00: exception Emask 0x0
SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 29 22:25:37 treogen [ 205.407900] ata2.00: cmd
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Sep 29 22:25:37 treogen [ 205.407901] res
40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (time out)
Is it possible that the bug is much older, and only some change in
rc3-mm->rc4-mm makes it visible that the initialization of the SiI3132
is incomplete?
I can't tell, but there is a pretty large user base of sil24/32 and you
seem to be the only one to report this problem so far. I think it might be
coming from somewhere other than libata or sata_sil24 itself. Hmmm... It
would be really great if you could break -rc3-mm1 too. Correct behavior
on just one -mm version for something like this sounds very weird.

Thanks.
--
tejun
Torsten Kaiser
2007-09-30 16:19:19 UTC
Post by Tejun Heo
Hello, Torsten.
Post by Torsten Kaiser
Post by Torsten Kaiser
So in case of -rc3-mm1 I'm pretty sure that it works.
That's still the case.
Ah... that's weird. It would be much better if -rc3-mm1 were broken too. :-P
It would be much better if it broke all the time. Then bisecting
would not be guesswork. :-P
Post by Tejun Heo
Post by Torsten Kaiser
Post by Torsten Kaiser
I'm not completely sure whether the 2.6.23-rc7-sglist kernel works. I booted
it 9 times, but from a quick look in /var/log/messages, I might not
have hit the "correct" situation to trigger the error.
That kernel is vanilla 2.6.23-rc7 plus the patch from
http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
( http://marc.info/?l=linux-ide&m=119055574826083&w=2 )
That is no longer the case. Yesterday this kernel also showed the failure.
The error looked a little bit different, but happened at the same
location during bootup.
Sadly, dmesg overflowed and I was not able to capture the first error.
[ 53.462632] Freeing unused kernel memory: 332k freed
ite cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 77.170903] ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x0 action 0x2
[ 77.170905] ata1.00: irq_stat 0x00020002, SGT no on qword boundary
[ 77.170908] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0
cdb 0x0 data 4096 out
[ 77.170909] res 50/00:00:af:ea:42/00:00:25:00:00/e0 Emask
0x40 (internal error)
Hmm... SGT not on qword boundary? Please add the following to the end
of sil24_port_start() and report the kernel boot log for both a successful
and a failed boot.
	printk("XXX sil24 cb=%p cb_dma=%llx\n",
	       cb, (unsigned long long)cb_dma);
Added to my current "2.6.23-rc4-mm0.5".
I will report its output.
Post by Tejun Heo
Also, does 'dmesg -s 10000000' give full dmesg?
That boot ended in a minimal initrd environment that normally only
starts the RAID5 and then opens the encrypted real root contained in it.
I was just able to push the output from dmesg through the serial link,
but had no man pages to tell me about -s ...
Until now that kind of error was a one-of-a-kind occurrence; all other
errors were not "internal error" but "timeout".
But one time I had another SGT-related error:
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: exception Emask 0x20
SAct 0x1 SErr 0x0 action 0x2
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: irq_stat 0x00020002,
PCI master abort while fetching SGT
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: cmd
61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 11 19:19:24 treogen [ 33.340000] res
50/00:00:af:ea:42/00:00:25:00:00/e0 Emask 0x20 (host bus error)

What I find kind of interesting is that, while I got three different
error codes, the cmd part of the output was always the same.
Post by Tejun Heo
Post by Torsten Kaiser
Is it possible that the bug is much older, and only some change in
rc3-mm->rc4-mm makes it visible that the initialization of the SiI3132
is incomplete?
I can't tell, but there is a pretty large user base of sil24/32 and you
seem to be the only one to report this problem so far. I think it might be
coming from somewhere other than libata or sata_sil24 itself. Hmmm... It
would be really great if you could break -rc3-mm1 too. Correct behavior
on just one -mm version for something like this sounds very weird.
It's not just 2.6.23-rc4-mm1. All -mm's after rc4 are broken for me.
Confirmed breakage on -rc4-mm1, -rc6-mm1 and -rc8-mm1. I'm just
narrowing in on rc4-mm1 because that was the first version to break.

I'm currently trying to bisect 2.6.23-rc4-mm1. Here is the current status:

[the 2.6.23-rc4-mm1 series file has 2013 lines]
Up to (incl.) x86_64-convert-to-clockevents.patch (line 747): 2 good boots
Up to (incl.) x86_64-cleanup-struct-irqaction-initializers-patch (line 779): 2 good boots
Up to (incl.) slub-optimize-cacheline-use-for-zeroing.patch (line 1045): 1 failed
Up to (incl.) fix-discrepancy-between-vdso-based... (line 1461): 1 good, 1 failed

Next try: up to patch fs-remove-some-aop_truncated_page.patch

That means that, of the patches added in the rc4 variant of the mm kernel,
the following are still remaining:
git-xfs-build-fix.patch
enforce-noreplace-smp-in-alternative_instructions.patch
paravirt-fix-preemptible-lazy-mode-bug.patch
i386-apic-fix-4-bit-apicid-assumption-of-mach-default.patch
fix-the-max-path-calculation-in-radix-treec-update.patch
mm-no-need-to-cast-vmalloc-return-value-in-zone_wait_table_init.patch
introduce-write_begin-write_end-aops-fix2.patch
implement-simple-fs-aops-fix.patch
ext2-convert-to-new-aops-fix2.patch
ext3-convert-to-new-aops-fix-fix.patch
ext4-convert-to-new-aops-fix-fix.patch
gfs2-convert-to-new-aops-fix.patch
reiserfs-convert-to-new-aops-fix2.patch
hostfs-convert-to-new-aops-fix-fix.patch
ufs-convert-to-new-aops-fix2.patch
sysv-convert-to-new-aops-fix2.patch
minix-convert-to-new-aops-fix2.patch
affs-convert-to-new-aops-fix-fix.patch
memoryless-nodes-add-n_cpu-node-state-move-setup-of-n_cpu-node-state-mask.patch
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix.patch
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix-2.patch
update-n_high_memory-node-state-for-memory-hotadd.patch
slub-avoid-page-struct-cacheline-bouncing-due-to-remote-frees-to-cpu-slab.patch
slub-do-not-use-page-mapping.patch
slub-move-page-offset-to-kmem_cache_cpu-offset.patch
slub-avoid-touching-page-struct-when-freeing-to-per-cpu-slab.patch
slub-place-kmem_cache_cpu-structures-in-a-numa-aware-way.patch
slub-optimize-cacheline-use-for-zeroing.patch

But due to the unreliable nature of the bug, I can't be too sure about that.

The next version is compiled; now I'm again switching the PC off for an hour...

Torsten
Tejun Heo
2007-09-30 17:39:48 UTC
Post by Torsten Kaiser
That boot ended in a minimal initrd environment that normally only
starts the RAID5 and then opens the encrypted real root contained in it.
I was just able to push the output from dmesg through the serial link,
but had no man pages to tell me about -s ...
Until now that kind of error was a one-of-a-kind occurrence; all other
errors were not "internal error" but "timeout".
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: exception Emask 0x20
SAct 0x1 SErr 0x0 action 0x2
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: irq_stat 0x00020002,
PCI master abort while fetching SGT
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: cmd
61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 11 19:19:24 treogen [ 33.340000] res
50/00:00:af:ea:42/00:00:25:00:00/e0 Emask 0x20 (host bus error)
What I find kind of interesting is that, while I got three different
error codes, the cmd part of the output was always the same.
That's an NCQ write command. You'll be using it a lot if you're rebuilding
the RAID5. It seems like something is going wrong with request DMA or sg
mapping. Maybe some change in block/*.[hc]?
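To make that concrete, here is a small stand-alone sketch, based on my reading of libata's EH dump format (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), that decodes the recurring "cmd 61/08:00:09:d6:42/00:00:25:00:00/40" line; 0x61 is WRITE FPDMA QUEUED, and for NCQ commands the sector count sits in the feature bytes while the tag sits in the count register.

/* Decoder sketch for the taskfile dump above; field order assumed as noted. */
#include <stdio.h>

int main(void)
{
	unsigned cmd = 0x61, feat = 0x08, nsect = 0x00;
	unsigned lbal = 0x09, lbam = 0xd6, lbah = 0x42;
	unsigned hob_feat = 0x00, hob_lbal = 0x25, hob_lbam = 0x00, hob_lbah = 0x00;

	unsigned long long lba =
		((unsigned long long)hob_lbah << 40) |
		((unsigned long long)hob_lbam << 32) |
		((unsigned long long)hob_lbal << 24) |
		(lbah << 16) | (lbam << 8) | lbal;
	unsigned count = (hob_feat << 8) | feat;	/* NCQ: count in feature regs */
	unsigned tag = (nsect >> 3) & 0x1f;		/* NCQ: tag in count reg bits 7:3 */

	printf("cmd 0x%02x (WRITE FPDMA QUEUED), tag %u, %u sectors at LBA %llu\n",
	       cmd, tag, count, lba);
	return 0;
}

This prints "tag 0, 8 sectors at LBA 625137161", which matches the "data 4096 out" and "end_request: I/O error, dev sda, sector 625137161" lines in the earlier log and lines up with the "md: super_written gets error=-5" message that follows, i.e. it is the same 4 KiB write near the end of the disk that keeps failing.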
Post by Torsten Kaiser