
Re: openbsd under heavy load corrupts fs and crash ?



That sounds like a bad cable or drive to me.

On Feb 3, 2004, at 3:15 AM, Per-Erik Persson wrote:

> I have this delicate problem that has been following me for the last 
> year.
>
> My only two OpenBSD servers are a 1.7 GHz Celeron with a "ServerWorks
> CSB6 IDE" chipset and a 500 MHz PIII with an "Intel 82371AB IDE" chipset.
> The problem has been the same throughout 3.2 and 3.3 (DMA on the CSB6
> chipset became supported in 3.3).
> Both machines have two IDE disks that are equally heavily loaded with
> disk access (postfix, imap, apache, nfs and scp); CPU and memory are not
> a problem.
> If I enable softdeps, the machines crash after a day or two, always
> with errors about ffs not being able to allocate data or some ffs
> timeout.
> With synchronous mounts the computers can run for several months without
> showing the same behavior. After a crash I usually need to run a manual
> fsck when rebooting, and a file or two has been badly corrupted. Even
> enabling softdeps on just one partition increases the chance of a failure.
>
>
> The relevant log output from the last crash was:
>
> Feb  2 02:01:52 meso named[7465]: ---w2k machines trying to update the nameserver all the time...----
> Feb  2 02:12:53 meso /bsd: wd0(pciide0:0:0): timeout
> Feb  2 02:12:54 meso /bsd:      type: ata
> Feb  2 02:12:54 meso /bsd:      c_bcount: 8192
> Feb  2 02:12:54 meso /bsd:      c_skip: 0
> Feb  2 02:12:54 meso /bsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x20
>
>
> I know some people would suggest buying SCSI hardware, but that is not
> an option.
>
> These two machines are in production, so debugging is not that easy. I
> have memory dumps from the crashes, but how do I get the trace and ps
> output out of them and into a file without halting the machines? This is
> not covered in any FAQ that I know of.