
Re: IO fencing question



On Sat, Apr 08, 2006 at 03:54:58PM -0400, Barry, Christopher wrote:
> > -----Original Message-----
> > From: Jon Hart [mailto:jhart_(_at_)_spoofed_(_dot_)_org] 
> > Sent: Friday, April 07, 2006 1:25 PM
> > To: Barry, Christopher
> > Cc: misc_(_at_)_openbsd_(_dot_)_org
> > Subject: Re: IO fencing question
> > 
> > On Fri, Apr 07, 2006 at 12:26:45PM -0400, Barry, Christopher wrote:
> > > 	Thanks much for your answers. By 'soft', I mean a controlled
> > > reboot/shutdown where the power remains on even though the OS has
> > > obviously stopped running. I have not experienced any actual failures of
> > > anything, so I do not know the outcome of that. Induced 'Hard' failure (e.g.
> > > pulling the plug) works perfectly.
> > > 
> > > 	The more I look at it, and think about it, I'm guessing the
> > > problem is more related to the redundant fibre ports on the 350-24T
> > > switch, actually holding onto information about the directly connected
> > > interface, and stubbornly sticking to it if it detects any kind of
> > > signal whatsoever.
> > 
> > I experienced this same sort of weirdness when setting up a pair of
> > redundant routers.  The two upstreams, which I had no control over, ran
> > OSPF.  If I powered off the machine, all was well.  If I simply halted
> > the machine, or there was power to it at all, their OSPF daemon would
> > detect a link and continue to route in the direction of our downed
> > router.
> > 
> > The problem, in the end, was that the Dell 1850s "primary" onboard
> > ethernet controller will exhibit link when there is power to the board.
> > The secondary, and any PCI/PCI-X cards that we added on afterward, did
> > not exhibit this behavior.
> > 
> > -jon
> > 
> 
> 
> Thanks everyone for your ideas on this. As it turns out, the issue is
> indeed the switch's redundant fiber port not releasing. As soon as power
> hits the server's motherboard, a link is present on the switch - even
> though all of my fiber NICs are in PCI slots. The only way I can
> reliably failover the switch port is to remove power completely from the
> router.
> 
> To do this, I'm thinking a combination of:
> <http://freshmeat.net/projects/powerswitch/>
> and:
> <http://www.servertech.com/products/product.aspx?GroupID=1&ProductID=12#>
> 
> Of course the powerswitch script will need a bit of hacking, and I'll
> need to wrap the whole deal in a looping testing script, looking for
> when stge0 on the backup becomes master. Then I'm thinking of attempting
> a 'ssh master "halt -p"', waiting a certain number of seconds, and
> then switching off the power to the plug.
> 
> Does that sound like a reasonable approach? Anyone already done this and
> have some lessons for me?

While this is likely to work in practice, a more complete solution makes
sure that the box is only switched off once it has shut down properly.
How to handle a kernel panic is also nontrivial, as you want both to
preserve the panic output and to cut the connection.
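
As a rough illustration of that ordering, here is a minimal sketch in
Python. Everything in it is an assumption for illustration only: the
function name fence_master and all four callables are hypothetical, and
you would have to supply them yourself (for example a check of stge0's
state on the backup, an ssh call that runs "halt -p" on the master, a
ping test, and whatever command your power strip needs). By default it
refuses to cut power when the shutdown cannot be confirmed, which is
where the kernel panic case would end up.

```python
import time
from typing import Callable


def fence_master(
    backup_is_master: Callable[[], bool],   # hypothetical: True once the backup holds master
    halt_peer: Callable[[], None],          # hypothetical: asks the old master to halt
    peer_is_down: Callable[[], bool],       # hypothetical: True once the old master is silent
    power_off: Callable[[], None],          # hypothetical: switches off the outlet
    grace_seconds: float = 60.0,
    poll_seconds: float = 5.0,
    force: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Cut power to the old master only after a confirmed shutdown.

    Returns "clean" if the peer went down within the grace period,
    "forced" if power was cut anyway because force=True, and
    "unconfirmed" if nothing was cut and a human should take a look.
    """
    # Block until the backup has actually taken over.
    while not backup_is_master():
        sleep(poll_seconds)

    # Request an orderly shutdown of the old master.
    halt_peer()

    # Poll until the peer stops answering or the grace period runs out.
    waited = 0.0
    while waited < grace_seconds:
        if peer_is_down():
            power_off()
            return "clean"
        sleep(poll_seconds)
        waited += poll_seconds

    # Shutdown not confirmed (hung or panicked box, or the halt never arrived).
    if force:
        power_off()
        return "forced"
    return "unconfirmed"
```

Whether force should ever be enabled is a policy decision: cutting power
releases the switch port, but it also throws away whatever the panicked
box still had on its console.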

If you can manage it, it might be best to cut fiber access instead of
power.

Of course, none of this makes the system more stable.

		Joachim