
Re: READ_DMA48 error interpretation



In freebsd-questions Digest, Vol 164, Issue 1
At Message: 19
On Mon, 5 Feb 2007 01:13:31 -0600 (CST) Richard Lynch <ceo@l-i-e.com> wrote:
 > On Tue, January 16, 2007 3:21 pm, Chuck Swiger wrote:
 > > On Jan 16, 2007, at 1:13 PM, Richard Lynch wrote:
 > >> I know the messages below mean the hard drive or IDE cards are
 > >> having problems.  But is this like RED ALERT or more like YELLOW or what?
 > ...
 > >> +ad1: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=404955007
 > >> +ad1: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=404955007
 > >> +g_vfs_done():ad1s1[READ(offset=207336931328, length=16384)]error = 5
 > 
 > > If you have current backups, it's a yellow alert.  Otherwise...
 > >
 > >> And what do I do about it?
 > >>
 > >> umount and fsck everything a lot?

Once should do :)  It's possible to get read errors caused by an interrupted
write, say after an unclean power removal, that don't indicate a drive fault
at all.
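For what it's worth, the two error lines quoted above seem to point at the
same sector.  The ATA driver reports an absolute LBA, while g_vfs_done()
reports a byte offset within the slice ad1s1.  Assuming 512-byte sectors and
that the slice starts at sector 63 (the usual MBR layout; fdisk(8) will show
the real start for your disk), a quick Python sketch converts one into the
other:

```python
SECTOR_SIZE = 512  # bytes per sector on drives of this era
SLICE_START = 63   # assumption: ad1s1 starts at LBA 63 (classic MBR layout)


def slice_offset_to_lba(offset, slice_start=SLICE_START, sector_size=SECTOR_SIZE):
    """Convert a byte offset within a slice to an absolute disk LBA."""
    if offset % sector_size:
        raise ValueError("offset is not sector-aligned")
    return slice_start + offset // sector_size


failed_lba = 404955007                # from the "ad1: FAILURE" line
offset, length = 207336931328, 16384  # from the g_vfs_done() line

first = slice_offset_to_lba(offset)
last = first + length // SECTOR_SIZE - 1  # last sector of the 16 KiB read
print(first, last, first <= failed_lba <= last)
```

Under those assumptions the failing LBA is the first of the 32 sectors in
that 16 KiB read, so the filesystem error and the drive error look like one
event rather than two separate problems.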

 > >> swap cards/drives around until it stops?
 > >> Ignore it and pray?

The latter is, or at least was, listed as a backup strategy in the docs :)
 
 > > Try installing the sysutils/smartmontools port and run a drive self-
 > > test.  That will give you a much better assessment of the state of
 > > the drive and whether it is likely to completely fail in the next 24
 > > hours...
 > 
 > I ran the short test on the problem drives, and it said everything was
 > fine.
 > 
 > I'll try the long test at a later date.

Show us the result of 'smartctl -a <drive>' after a test or two.

 > Meanwhile, I turned on the smartd daemon, and am seeing two issues in
 > the logs...
 > 
 > #1. The drive temperatures seem ridiculously high to this naive
 > reader, but what do I know?...
 > 110 to 190 Celsius?  Yikes...  Or maybe that's normal?
 > How hot is too hot?

As perryh@pluto.rain.com pointed out, 100C is too hot.  I don't believe
those 110 to 190 numbers at all and suspect a drive would melt down at
anything near that.  Maybe these are Fahrenheit temperatures?
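If they were Fahrenheit, the conversion is C = (F - 32) * 5/9.  A trivial
sketch applying it to the range from the smartd log (reading those numbers
as Fahrenheit is only a hypothesis):

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# The range reported in the smartd log, read hypothetically as Fahrenheit.
for f in (110, 190):
    print(f"{f}F = {fahrenheit_to_celsius(f):.1f}C")
```

That gives roughly 43C to 88C, so even read as Fahrenheit the top of the
range would still be worrying, which fits the suspicion that the value is
being misinterpreted rather than measured.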

While perryh's advice about airflow, enclosures, etc. was spot on, I
suspect you need to check whether your particular drives need corrective
parameters that the smartctl drive database doesn't fully cover, as some
do.  There are hints about this under the -v option in smartctl(8).

 > #2. Sequences like this show up a fair amount:
 > Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed
 > from 152 to 153
 > Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed
 > from 153 to 152
 > Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance
 > changed from 251 to 250

These would be more useful in the context of the full smartctl -a output.
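Meanwhile, note that the changes quoted are to normalized attribute values.
A one-point wobble well above an attribute's failure threshold (shown by
smartctl -a) is generally nothing to worry about; a value heading steadily
toward the threshold is.  A rough Python sketch for summarizing such lines,
assuming the message format in the excerpt above, shows the spread per drive
and attribute:

```python
import re
from collections import defaultdict

# Matches smartd messages like the ones quoted above; "\w+" covers
# "Prefailure" and any other attribute class smartd prints in that slot.
CHANGE_RE = re.compile(
    r"Device: (?P<dev>[^,]+), SMART \w+ Attribute: (?P<id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)"
)


def summarize(lines):
    """Return {(device, attr_id, attr_name): (lowest, highest)} normalized values."""
    seen = defaultdict(set)
    for line in lines:
        m = CHANGE_RE.search(line)  # search() tolerates a syslog prefix
        if m:
            key = (m["dev"], int(m["id"]), m["name"])
            seen[key].update((int(m["old"]), int(m["new"])))
    return {key: (min(vals), max(vals)) for key, vals in seen.items()}


log = [
    "Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 152 to 153",
    "Device: /dev/ad2, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 153 to 152",
    "Device: /dev/ad0, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 251 to 250",
]

for (dev, attr_id, name), (low, high) in summarize(log).items():
    print(f"{dev} #{attr_id} {name}: {low}..{high} (spread {high - low})")
```

Feeding it the whole smartd.log would show at a glance whether any attribute
is actually trending rather than just flickering by one.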

 > So is the real "problem" just that the drives are spun down and can't
 > spin up fast enough? I can probably live with the consequences of
 > that, and just go on with life -- The occasional HTTP request for an
 > audio file will fail the first time, and they have to hit reload.
 > 
 > This box is the fail-safe roll-over server for audio files that are
 > all up online somewhere else managed by a professional (not me), so
 > it's no surprise that the rare time-out on the real server also ends
 > up with a drive spin up and failed request on the "backup".  Kind of
 > annoying, I guess, to an end user, but forcing the drives to always be
 > spinning is probably not a Good Idea.

I don't know about that; while I wouldn't worry too much about spin-up
times unless they're a major annoyance to clients, I've always believed
that drives last much longer if left spinning.  The server delivering
this mail has spun its old IBM DTLA-something drive 24h/365d for nearly
9 years now, despite no aircon in a hot climate (up to ~45C in summer).

 > Oh, here's a rather long excerpt of the log in case there's minutiae
 > within it that I've failed to include:
 > http://l-i-e.com/smartd.log

The output of smartctl -a for one or two of your drives would likely be
much more indicative.  I don't claim to be an expert in this at all, but
some of us might spot any obvious anomalies.

Cheers, Ian

_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"

