wdc.c retries

I'm still finding and fixing bugs in the wdc disk driver.

Bug no 1:
		if (wdc->sc_errors == (WDIORETRIES + 1) / 2) {

Note that WDIORETRIES is 3, so this test is true when the error count is 2.
wdcunwedge increments the count, so when wdcstart gets called from
wdcrestart, the error count is 3.  So even though the controller has now
been reset and is ready to go, ata_start doesn't retry the operation; it
just aborts it.
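
To make the arithmetic concrete, here is a minimal standalone sketch of the
counting problem.  It is not the driver code: the helper names and the exact
form of the abort test (errors >= WDIORETRIES) are assumptions made for
illustration.  Only WDIORETRIES == 3, the (WDIORETRIES + 1) / 2 test, and the
increment in wdcunwedge come from the description above.

```c
#define WDIORETRIES 3

enum action { ACT_RETRY, ACT_ABORT };

/*
 * Hypothetical stand-in for the decision made on restart.
 * Assumption: the operation is aborted once the error count
 * has reached WDIORETRIES.
 */
static enum action
start_decision(int errors)
{
	if (errors >= WDIORETRIES)
		return ACT_ABORT;
	return ACT_RETRY;
}

/*
 * Hypothetical stand-in for the error path: when the count hits
 * (WDIORETRIES + 1) / 2 the controller is unwedged, and the
 * unwedge itself bumps the count once more.
 */
static int
errors_after_unwedge(int errors)
{
	if (errors == (WDIORETRIES + 1) / 2)
		errors++;	/* wdcunwedge increments the count */
	return errors;
}
```

With these definitions, start_decision(errors_after_unwedge(2)) yields
ACT_ABORT: the unwedge uses up the last retry before the restart ever gets a
chance to use it.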

So I fixed that and it still failed.  Why?  Bug no 2.  Every time you go
through ata_start on a retry, this code runs:

		blkno = xfer->c_blkno+xfer->c_p_offset;
		xfer->c_blkno = blkno / (d_link->sc_lp->d_secsize / DEV_BSIZE);

Obviously this only works the first time.  Next time through, c_blkno has
already been translated, and it gets translated again.
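
Here is a small standalone sketch of the double translation, using a cut-down
struct rather than the real xfer.  The struct name, the "translated" flag and
the translate_once() guard are hypothetical; the guard is just one way to
make the step idempotent and is not necessarily what the patch does.

```c
#define DEV_BSIZE 512

/* Cut-down, hypothetical stand-in for the real xfer structure. */
struct xfer_sketch {
	long c_blkno;
	long c_p_offset;
	int  translated;	/* hypothetical flag, not in the driver */
};

/* Mirrors the two lines quoted above; runs on every pass. */
static void
translate_every_time(struct xfer_sketch *x, long secsize)
{
	long blkno = x->c_blkno + x->c_p_offset;

	x->c_blkno = blkno / (secsize / DEV_BSIZE);
}

/* One possible guard: translate only on the first pass. */
static void
translate_once(struct xfer_sketch *x, long secsize)
{
	if (x->translated)
		return;
	translate_every_time(x, secsize);
	x->translated = 1;
}
```

For example, with c_blkno = 1000, c_p_offset = 64 and 2048-byte sectors, the
first pass gives (1000 + 64) / 4 = 266.  A retry through
translate_every_time() then computes (266 + 64) / 4 = 82, which is the wrong
block; translate_once() leaves it at 266.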

So I fixed that, and we come to bug no 3.  The error count gets set to 0
every time we do a RECAL:

	case READY:
		wdc->sc_errors = 0;

Now since we're doing a RECAL on every unwedge, and we do an unwedge every
time the error count gets to 2, we will never exceed the maximum retry count
WDIORETRIES.  So on a real disk error, one that isn't corrected by
unwedging, we'll just loop infinitely through the reset/RECAL/restart code.
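
The loop can be seen in a toy simulation.  This is only a sketch under
simplifying assumptions: every attempt fails (a hard error), the abort test
is errors >= WDIORETRIES, and the extra increment done by wdcunwedge is left
out.  The function name and its parameters are made up for the example.

```c
#define WDIORETRIES 3

/*
 * Simulate a disk error that never goes away.  Returns the attempt
 * number on which the operation is finally aborted, or -1 if it is
 * still being retried after "limit" attempts.
 */
static int
attempts_until_abort(int reset_on_recal, int limit)
{
	int errors = 0;
	int attempt;

	for (attempt = 1; attempt <= limit; attempt++) {
		errors++;			/* this attempt failed */
		if (errors >= WDIORETRIES)
			return attempt;		/* give up */
		if (errors == (WDIORETRIES + 1) / 2) {
			/* unwedge: reset the controller and RECAL */
			if (reset_on_recal)
				errors = 0;	/* the READY case above */
		}
	}
	return -1;
}
```

With reset_on_recal set, the count goes 1, 2, 0, 1, 2, 0 and so on, and the
function returns -1 for any limit.  With it clear, the count reaches 3 on the
third attempt and the operation is aborted.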

I think I've got this all fixed now, and I've tested it, with atapi too, but
I'd feel better if more people could test it, particularly with atapi.
Patches are at http://www.citi.umich.edu/u/rees/openbsd/wdc22.diff.
