
kernel/1443: system detect non-existent ST506 drive.




>Number:         1443
>Category:       kernel
>Synopsis:       system detect non-existent ST506 drive.
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 12 18:50:01 MDT 2000
>Last-Modified:
>Originator:     Grigoriy Orlov
>Organization:
>Release:        2.8-current and older.
>Environment:
	System      : OpenBSD 2.8-current 07.09.2000
	Architecture: OpenBSD.i386
	Machine     : i686
>Description:

Sometimes the kernel detects a non-existent drive and reports it as an
ST506 drive. The kernel then fails to read a sector from the drive:
dkcsum: wd0 matched BIOS disk 80
wd1(pciide0:1:0): timeout
        type: ata
        c_bcount: 512
        c_skip: 0
pciide0:1:0: recal timed out
wd1c: device timeout reading fsbn 0 (wd1 bn 0; cn 0 tn 0 sn 0), retrying
wd1(pciide0:1:0): timeout
        type: ata
        c_bcount: 512
        c_skip: 0
pciide0:1:0: recal timed out
wd1c: device timeout reading fsbn 0 (wd1 bn 0; cn 0 tn 0 sn 0), retrying
wd1(pciide0:1:0): timeout
        type: ata
        c_bcount: 512
        c_skip: 0

In my configuration the kernel drops to the ddb prompt. Dumping core is
impossible because the dump area is not yet configured.

This problem has also been reported on the mailing lists:

http://www.geocrawler.com/archives/3/262/2000/8/0/4219511/
http://www.geocrawler.com/archives/3/254/2000/9/0/4348413/

My GENERIC dmesg:

OpenBSD 2.8-beta (GENERIC) #9: Thu Oct 12 22:16:53 MSD 2000
    gluk@bastion.ptci.ru:/sys/arch/i386/compile/GENERIC
cpu0: Intel Pentium II (Celeron) ("GenuineIntel" 686-class, 128KB L2 cache) 451 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR
real mem  = 133804032 (130668K)
avail mem = 119218176 (116424K)
using 1658 buffers containing 6791168 bytes (6632K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(01) BIOS, date 11/25/99, BIOS32 rev. 0 @ 0xfb3b0
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
pcibios0 at bios0: rev. 2.1 found at 0xfb3e0
pcibios0: PCI IRQ Routing Table rev. 1.0 found at 0xfdf00, size 144 bytes (7 entries)
pcibios0: PCI Interrupt Router at 000:07:0 ("Intel 82371SB (Triton II) PCI-ISA" rev 0x00)
pcibios0: PCI Exclusive IRQs: 9 10 11
pcibios0: PCI bus #1 is the last bus
pci0 at mainbus0 bus 0: configuration mode 1
pchb0 at pci0 dev 0 function 0 "Intel 82443BX PCI-AGP" rev 0x02
ppb0 at pci0 dev 1 function 0 "Intel 82443BX AGP" rev 0x02
pci1 at ppb0 bus 1
"Matrox MGA G200 AGP" rev 0x01 at pci1 dev 0 function 0 not configured
pcib0 at pci0 dev 7 function 0 "Intel 82371AB PIIX4 ISA" rev 0x02
pciide0 at pci0 dev 7 function 1 "Intel 82371AB IDE (PIIX4)" rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <IBM-DTTA-371010>
wd0: can use 32-bit, PIO mode 4, DMA mode 2, Ultra-DMA mode 2
wd0: 16-sector PIO, LBA, 9641MB, 16383 cyl, 16 head, 63 sec, 19746720 sectors
pciide0: channel 0 interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (using DMA data transfers)
"Intel 82371AB USB (PIIX4)" rev 0x01 at pci0 dev 7 function 2 not configured
"Intel 82371AB Power Management (PIIX4)" rev 0x02 at pci0 dev 7 function 3 not configured
le1 at pci0 dev 9 function 0 "AMD 79c970 PCnet-PCI LANCE" rev 0x16
le1: address 00:00:01:35:03:05
le1: 8 receive buffers, 2 transmit buffers
le1: interrupting at irq 9
rl0 at pci0 dev 11 function 0 "Realtek 8139" rev 0x10: irq 9 address 00:60:52:06:7b:53
rlphy0 at rl0 phy 0: RTL internal phy
ncr0 at pci0 dev 13 function 0 "Symbios Logic 53c875" rev 0x26: ultra wide scsi, irq 10
scsibus0 at ncr0: 16 targets
cd0 at scsibus0 targ 5 lun 0: <NEC, CD-ROM DRIVE:465, 1.03> SCSI2 5/cdrom removable
probe(ncr0:5:1): 20.0 MB/s (50 ns, offset 16)
fxp0 at pci0 dev 15 function 0 "Intel 82557" rev 0x05: irq 11, address 00:a0:c9:e7:fd:82
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 0
isa0 at pcib0
isadma0 at isa0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
vt0 at isa0 port 0x60/16 irq 1: vga 80 col, color, 8 scr, mf2-kbd
pms0 at vt0 irq 12
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
isapnp0 at isa0 port 0x279: read port 0x203
sb1 at isapnp0 "Creative ViBRA16X PnP, CTL0043, , Audio" port 0x220/16,0x330/2,0x388/4 irq 5 drq 1,3: dsp v4.16
midi1 at sb1: <SB MPU-401 UART>
audio0 at sb1
opl0 at sb1: model OPL3
midi2 at opl0: <SB Yamaha OPL3>
joy0 at isapnp0 "Creative ViBRA16X PnP, CTL7005, PNPB02F, Game" port 0x201/1
biomask 4440 netmask 4e40 ttymask 5ec2
pctr: 686-class user-level performance counters enabled
mtrr: Pentium Pro MTRR support
dkcsum: wd0 matched BIOS disk 80
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302

Current drive detection algorithm:
   1. Issue a soft reset command and wait until the drive clears the busy
      bit in the status register.
   2. Check the drive's signature; if it is not ATAPI, assume it is ATA
      or OLD.
   3. Try to distinguish ATA from OLD and kill the "ghost" in the
      wdcattach routine.

If the drive isn't present, reading its status returns a random set of bits.
Step 3 implements the wrong logic and is sometimes unable to kill the "ghost".
In my configuration one boot in ten fails.
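
To illustrate why a floating status register is hard to test for, here is a
minimal sketch (not the driver code) that compares the pre-reset check in
wdc.c revision 1.20 with the masked check used in the patch below. The helper
names absent_old() and absent_new() are hypothetical; only the 0xff and 0x7f
constants come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 * Existing check: only an all-ones status byte means "no drive here".
 */
static bool
absent_old(uint8_t st)
{
	return st == 0xff;
}

/*
 * Patched check: ignore bit 7, which may read back as 0 on an empty
 * channel, and look only at the low seven bits.
 */
static bool
absent_new(uint8_t st)
{
	return (st & 0x7f) == 0x7f;
}
```

With a status byte of 0x7f the old test still reports a possible drive, while
the masked test rejects it. Both tests agree on 0xff.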

Possible workaround: disable all IDE controllers that have no drives attached.

>How-To-Repeat:

This problem is reproducible, but not on all systems. Enable an IDE
controller with no drives attached to it. Reboot the machine in a loop;
at some point (possibly after several hours) the system drops to the ddb prompt.

>Fix:

This patch fixes, simplifies and improves the drive detection algorithm.
It also does some cleanup and probably speeds up the boot process slightly.
I tested this patch with approximately 25 different IDE drives and CD-ROMs,
including 3 different MFM drives. Tests were done on different i386
computers with Intel-PIIX4 (TX,BX), "VIA VT82C586A" and several
ISA/VLB controllers. It probably needs more testing, especially with
PCMCIA/CardBus and ESDI drives. The logic of my fix is described in the
comments.

I am ready to discuss the problem and can explain all my changes in wdc.c.
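
The post-reset decision made by the patched probe can be summarized in a
small standalone sketch. This is an illustration under assumptions, not the
driver code: classify() and enum probe_result are hypothetical names, the
status register is replaced by an array of sampled values, and ST_BSY and
ST_DRDY are assumed to mirror WDCS_BSY (bit 7) and WDCS_DRDY (bit 6).

```c
#include <stddef.h>
#include <stdint.h>

#define ST_BSY	0x80	/* assumed to mirror WDCS_BSY */
#define ST_DRDY	0x40	/* assumed to mirror WDCS_DRDY */

enum probe_result { PROBE_NONE, PROBE_ATAPI, PROBE_ATA_OR_OLD };

/*
 * Classify one drive from its post-reset signature registers and a
 * series of status samples (one sample per polling interval).
 */
static enum probe_result
classify(uint8_t sc, uint8_t sn, uint8_t cl, uint8_t ch,
    const uint8_t *samples, size_t nsamples)
{
	size_t i;

	/* ATAPI signature in the cylinder registers. */
	if (cl == 0x14 && ch == 0xeb)
		return PROBE_ATAPI;

	/* Anything without the ATA signature is treated as no drive. */
	if (!(sc == 0x01 && sn == 0x01 && cl == 0x00 && ch == 0x00))
		return PROBE_NONE;

	/* ATA signature: the drive must also become ready in time. */
	for (i = 0; i < nsamples; i++) {
		if ((samples[i] & ST_BSY) == 0 && (samples[i] & ST_DRDY) != 0)
			return PROBE_ATA_OR_OLD;
	}
	return PROBE_NONE;	/* timed out: ghost */
}
```

In the patch itself the samples come from reading wdr_status once per
millisecond for up to one second, and ATA is told apart from OLD later by
the IDENTIFY command in wdcattach().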

Index: wdc.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/wdc.c,v
retrieving revision 1.20
diff -u -r1.20 wdc.c
--- wdc.c	2000/07/20 19:15:23	1.20
+++ wdc.c	2000/10/12 17:18:38
@@ -94,8 +94,6 @@
 
 #include "atapiscsi.h"
 
-#define WDCDEBUG
-
 #define WDCDELAY  100 /* 100 microseconds */
 #define WDCNDELAY_RST (WDC_RESET_WAIT * 1000 / WDCDELAY)
 #if 0
@@ -345,16 +343,17 @@
 }
 
 
-/* Test to see controller with at last one attached drive is there.
+/* Test to see controller with at least one attached drive is there.
  * Returns a bit for each possible drive found (0x01 for drive 0,
  * 0x02 for drive 1).
  * Logic:
- * - If a status register is at 0xff, assume there is no drive here
- *   (ISA has pull-up resistors). If no drive at all -> return.
+ * - If a status register is at 0x7f, assume there is no drive here
+ *   (ISA has pull-up resistors, but bit 7 sometimes has pull-down resistor ?). 
+ *   If no drive at all -> return.
  * - reset the controller, wait for it to complete (may take up to 31s !).
  *   If timeout -> return.
- * - test ATA/ATAPI signatures. If at last one drive found -> return.
- * - try an ATA command on the master.
+ * - test ATA/ATAPI signatures. Wait for ready if drive isn't ATAPI.
+ * - return drive mask.
  */
 
 int
@@ -364,6 +363,7 @@
 	u_int8_t st0, st1, sc, sn, cl, ch;
 	u_int8_t ret_value = 0x03;
 	u_int8_t drive;
+	int	i;
 
 	if (!chp->_vtbl)
 		chp->_vtbl = &wdc_default_vtbl;
@@ -390,24 +390,19 @@
 		    chp->wdc ? chp->wdc->sc_dev.dv_xname : "wdcprobe",
 		    chp->channel, st0, st1), DEBUG_PROBE);
 
-		if (st0 == 0xff)
+		if ((st0 & 0x7f) == 0x7f)
 			ret_value &= ~0x01;
-		if (st1 == 0xff)
+		if ((st1 & 0x7f) == 0x7f)
 			ret_value &= ~0x02;
 		if (ret_value == 0) 
 			return 0;
 	}
 
 	/* assert SRST, wait for reset to complete */
-	CHP_WRITE_REG(chp, wdr_sdh, WDSD_IBM);
+	CHP_WRITE_REG(chp,wdr_ctlr, WDCTL_RST | WDCTL_4BIT); 
 	delay(10);
-	CHP_WRITE_REG(chp,wdr_ctlr, WDCTL_RST | WDCTL_IDS); 
-	DELAY(1000);
-	CHP_WRITE_REG(chp, wdr_ctlr, WDCTL_IDS);
-	delay(1000);
-	(void) CHP_READ_REG(chp, wdr_error);
 	CHP_WRITE_REG(chp, wdr_ctlr, WDCTL_4BIT);
-	delay(10);
+	delay(2000);
 
 	ret_value = __wdcwait_reset(chp, ret_value);
 	WDCDEBUG_PRINT(("%s:%d: after reset, ret_value=0x%d\n",
@@ -420,9 +415,10 @@
 
 	/*
 	 * Test presence of drives. First test register signatures looking for
-	 * ATAPI devices. If it's not an ATAPI and reset said there may be
-	 * something here assume it's ATA or OLD. Ghost will be killed later in
-	 * attach routine.
+	 * ATA/ATAPI devices. If the drive isn't ATAPI, wait for it to become
+	 * ready. On timeout there is no drive; otherwise the drive is ATA or
+	 * OLD. The distinction between ATA and OLD is made later, in the
+	 * attach routine, by issuing an IDENTIFY command.
 	 */
 	for (drive = 0; drive < 2; drive++) {
 		if ((ret_value & (0x01 << drive)) == 0)
@@ -436,8 +432,8 @@
 		cl = CHP_READ_REG(chp, wdr_cyl_lo);
 		ch = CHP_READ_REG(chp, wdr_cyl_hi);
 
-		WDCDEBUG_PRINT(("%s:%d:%d: after reset, st=0x%x, sc=0x%x sn=0x%x "
-		    "cl=0x%x ch=0x%x\n",
+		WDCDEBUG_PRINT(("%s:%d:%d: after reset, st=0x%x, sc=0x%x"
+		    " sn=0x%x cl=0x%x ch=0x%x\n",
 		    chp->wdc ? chp->wdc->sc_dev.dv_xname : "wdcprobe",
 	    	    chp->channel, drive, st0, sc, sn, cl, ch), DEBUG_PROBE);
 		/*
@@ -448,11 +444,38 @@
 		if (cl == 0x14 && ch == 0xeb) {
 			chp->ch_drive[drive].drive_flags |= DRIVE_ATAPI;
 		} else {
+		  if (sc == 0x01 && sn == 0x01 && cl == 0x0 && ch == 0x0){
+		    for(i=0; i < 1000; i++) {		/* 1 sec */
+		      st0 = CHP_READ_REG(chp, wdr_status);
+		      if((st0 & WDCS_BSY) == 0 && (st0 & WDCS_DRDY) != 0) {
+			WDCDEBUG_PRINT(("%s:%d:%d: waiting for ready %d msec\n",
+			    chp->wdc ? chp->wdc->sc_dev.dv_xname : "wdcprobe",
+		 	    chp->channel, drive, i), DEBUG_PROBE);
+			break;
+		      }else{
+		    	delay(1000);			/* 1 millisecond */
+		      }
+		    }
+		    if((st0 & WDCS_BSY) != 0 || (st0 & WDCS_DRDY) == 0) {
+		   	ret_value &= ~(0x01 << drive);
+			continue;
+		    }
+		    if(ret_value & (0x01 << drive)) {
+			/* 
+			 * An ATA or OLD drive is here; the two are
+			 * told apart in wdcattach().
+			 */
 			chp->ch_drive[drive].drive_flags |= DRIVE_ATA;
 			if (chp->wdc == NULL ||
 			    (chp->wdc->cap & WDC_CAPABILITY_PREATA) != 0)
 				chp->ch_drive[drive].drive_flags |= DRIVE_OLD;
+		    }
+		  }else{ 
+		    ret_value &= ~(0x01 << drive);
+		  }
 		}
+		if(!(ret_value & (0x01 << drive)))
+			chp->ch_drive[drive].drive_flags &= ~DRIVE;
 	}
 
 #ifdef WDCDEBUG
@@ -530,6 +553,8 @@
 	timeout_set(&chp->ch_timo, wdctimeout, chp);
 	
 	for (i = 0; i < 2; i++) {
+		if ((chp->ch_drive[i].drive_flags & DRIVE) == 0)
+			continue;
 		chp->ch_drive[i].chnl_softc = chp;
 		chp->ch_drive[i].drive = i;
 		/* If controller can't do 16bit flag the drives as 32bit */
@@ -537,59 +562,23 @@
 		    (WDC_CAPABILITY_DATA16 | WDC_CAPABILITY_DATA32)) ==
 		    WDC_CAPABILITY_DATA32)
 			chp->ch_drive[i].drive_flags |= DRIVE_CAP32;
-		if ((chp->ch_drive[i].drive_flags & DRIVE) == 0)
-			continue;
 
 		if (i == 1 && ((chp->ch_drive[0].drive_flags & DRIVE) == 0))
 			chp->ch_flags |= WDCF_ONESLAVE;
-
-		/* Issue a IDENTIFY command, to try to detect slave ghost */
+		/*
+		 * Issue an IDENTIFY command to distinguish ATA from OLD.
+		 * This also kills an ATAPI ghost.
+		 */
 		if (ata_get_params(&chp->ch_drive[i], at_poll, &params) ==
 		    CMD_OK) {
 			/* If IDENTIFY succeded, this is not an OLD ctrl */
-			chp->ch_drive[0].drive_flags &= ~DRIVE_OLD;
-			chp->ch_drive[1].drive_flags &= ~DRIVE_OLD;
+			chp->ch_drive[i].drive_flags &= ~DRIVE_OLD;
 		} else {
 			chp->ch_drive[i].drive_flags &=
 			    ~(DRIVE_ATA | DRIVE_ATAPI);
 			WDCDEBUG_PRINT(("%s:%d:%d: IDENTIFY failed\n",
 			    chp->wdc->sc_dev.dv_xname,
 			    chp->channel, i), DEBUG_PROBE);
-			if ((chp->ch_drive[i].drive_flags & DRIVE_OLD) == 0)
-				continue;
-			/*
-			 * Pre-ATA drive ?
-			 * Test registers writability (Error register not
-			 * writable, but cyllo is), then try an ATA command.
-			 */
-			CHP_WRITE_REG(chp, wdr_sdh, WDSD_IBM | (i << 4));
-			delay(10);
-			CHP_WRITE_REG(chp, wdr_features, 0x58);
-			CHP_WRITE_REG(chp, wdr_cyl_lo, 0xa5);
-			if ((CHP_READ_REG(chp, wdr_error) == 0x58) ||
-			    (CHP_READ_REG(chp, wdr_cyl_lo) != 0xa5)) {
-				WDCDEBUG_PRINT(("%s:%d:%d: register "
-				    "writability failed\n",
-				    chp->wdc->sc_dev.dv_xname,
-				    chp->channel, i), DEBUG_PROBE);
-				    chp->ch_drive[i].drive_flags &= ~DRIVE_OLD;
-			}
-			CHP_WRITE_REG(chp, wdr_sdh, WDSD_IBM | (i << 4));
-			delay(100);
-			if (wait_for_ready(chp, 10000) != 0) {
-				WDCDEBUG_PRINT(("%s:%d:%d: not ready\n",
-				    chp->wdc->sc_dev.dv_xname,
-				    chp->channel, i), DEBUG_PROBE);
-				chp->ch_drive[i].drive_flags &= ~DRIVE_OLD;
-				continue;
-			}
-			CHP_WRITE_REG(chp, wdr_command, WDCC_RECAL);
-			if (wait_for_ready(chp, 10000) != 0) {
-				WDCDEBUG_PRINT(("%s:%d:%d: WDCC_RECAL failed\n",
-				    chp->wdc->sc_dev.dv_xname,
-				    chp->channel, i), DEBUG_PROBE);
-				chp->ch_drive[i].drive_flags &= ~DRIVE_OLD;
-			}
 		}
 	}
 	ctrl_flags = chp->wdc->sc_dev.dv_cfdata->cf_flags;
@@ -824,13 +813,10 @@
 	if (!chp->_vtbl)
 		chp->_vtbl = &wdc_default_vtbl;
 
-	CHP_WRITE_REG(chp, wdr_sdh, WDSD_IBM); /* master */
-	CHP_WRITE_REG(chp, wdr_ctlr, WDCTL_RST | WDCTL_IDS);
-	delay(1000);
-	CHP_WRITE_REG(chp, wdr_ctlr, WDCTL_IDS);
-	delay(2000);
-	(void) CHP_READ_REG(chp,wdr_error);
+	CHP_WRITE_REG(chp, wdr_ctlr, WDCTL_RST | WDCTL_4BIT);
+	delay(10);
 	CHP_WRITE_REG(chp, wdr_ctlr, WDCTL_4BIT);
+	delay(2000);
 
 	drv_mask1 = (chp->ch_drive[0].drive_flags & DRIVE) ? 0x01:0x00;
 	drv_mask1 |= (chp->ch_drive[1].drive_flags & DRIVE) ? 0x02:0x00;
@@ -856,9 +842,6 @@
 	int timeout;
 	u_int8_t st0, st1;
 
-	/* Wait 50ms for drive firmware to settle */
-	delay(50000);
-
 	/* wait for BSY to deassert */
 	for (timeout = 0; timeout < WDCNDELAY_RST;timeout++) {
 		CHP_WRITE_REG(chp, wdr_sdh, WDSD_IBM); /* master */
@@ -888,15 +871,16 @@
 		}
 		delay(WDCDELAY);
 	}
-	/* Reset timed out. Maybe it's because drv_mask was not rigth */
+	/* Reset timed out. Maybe it's because drv_mask was not right */
 	if (st0 & WDCS_BSY)
 		drv_mask &= ~0x01;
 	if (st1 & WDCS_BSY)
 		drv_mask &= ~0x02;
 end:
-	WDCDEBUG_PRINT(("%s:%d: wdcwait_reset() end, st0=0x%x, st1=0x%x\n",
+	WDCDEBUG_PRINT(("%s:%d: wdcwait_reset() end, st0=0x%x, st1=0x%x, "
+			"reset time=%d msec\n",
 	    chp->wdc ? chp->wdc->sc_dev.dv_xname : "wdcprobe", chp->channel,
-	    st0, st1), DEBUG_PROBE);
+	    st0, st1, timeout*WDCDELAY/1000), DEBUG_PROBE);
 
 	return drv_mask;
 }

>Audit-Trail:
>Unformatted: