
Re: more bgpd weirdness

* Arvid Grøtting <arvidg_(_at_)_netfonds_(_dot_)_no> [2004-08-18 02:23]:
> Henning Brauer <lists-openbsd_(_at_)_bsws_(_dot_)_de> writes:
> > There's basically two possibilities, taking this and your 
> > other "impossible" bugs/crashes into account:
> > 1) there's some hard to trigger bug somewhere pretty deep in the code, 
> >    the msgbuf functions would be my first guess
> > 2) you have bad bad bad hardware, which manifests itself here.
> >    I have seen one and only one daemon freaking out on bad hardware 
> >    before...
> In my experience, bad hardware tends not to manifest itself quite as
> consistently as this.

Well, I have seen that before.
And given the sheer number of imsgs that get sent around, it is simply 
likely that the header consistency check catches bad memory first.
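
To illustrate why a header check tends to be the first thing to trip, here 
is a rough sketch in Python. It is not bgpd's actual imsg code: the header 
layout, MAX_MSG_LEN, MAX_TYPE and both function names are assumptions made 
up for the example. Every message passes through the check, so a flipped 
bit in a header field gets noticed there, while the same flip in the 
payload goes through silently.

```python
import struct

# Hypothetical header layout (NOT bgpd's real one):
# 32-bit type, 16-bit total length, 32-bit peer id, little-endian, no padding.
HEADER = struct.Struct("<IHI")
MAX_MSG_LEN = 8192   # assumed upper bound on a whole message
MAX_TYPE = 32        # assumed highest valid message type


def header_is_sane(buf: bytes) -> bool:
    """Return True if the header fields are within the assumed bounds."""
    if len(buf) < HEADER.size:
        return False
    msg_type, length, _peer = HEADER.unpack_from(buf)
    return 0 < msg_type <= MAX_TYPE and HEADER.size <= length <= MAX_MSG_LEN


def flip_bit(buf: bytes, bit: int) -> bytes:
    """Return a copy of buf with one bit inverted, simulating bad memory."""
    out = bytearray(buf)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


if __name__ == "__main__":
    msg = HEADER.pack(3, HEADER.size + 4, 1) + b"abcd"
    print(header_is_sane(msg))                 # intact message
    print(header_is_sane(flip_bit(msg, 47)))   # high bit of the length field
    print(header_is_sane(flip_bit(msg, 80)))   # first bit of the payload
```

A check like this only sees the header fields, so it says nothing about 
where the corruption came from, just that something upstream handed over 
a header that cannot be right.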

It is not impossible that you are triggering a bug - however, given 
the various different failures you saw, and especially given HOW it 
failed, I don't see how this could not be hardware.

> > after looking over the msgbuf code again my guess is for #1.
> :-)

argh, murphy. I meant #2 of course.

> > You mentioned you have a second, mostly identical machine where those 
> > problems do not show up... did you try swapping them?
> No, I haven't been able to do that yet, but I have a new machine on
> order that I'll try swapping in instead of the one that shows broken
> behaviour.  It's not as identical, though.

that shouldn't be a problem.

> As I said, the other machine hasn't been tried with a many-peer
> configuration at all, so if the problems are indeed in the code, the
> configuration may indeed be what triggers it.

possible, but as said before, I don't believe so.

> Here's the config that fails to reload[1] again[2]; it is also the
> config on the router that gets the header error notifications:

wait wait, the reload problem has been fixed - do you see ANOTHER one??
