
pfsync cluster fails to fail with state .....



First, this used to work under 3.8 on these same machines, back when they were playthings in the 'lab'. Now they are all upgraded to 3.9 and ready to deploy ... or are they?


There are two goofy things happening here (three if you count me). I can't tell whether this is a symptom or the cause, but I cannot NAT behind a carp address.


fw0:root:/root #ifconfig carp10
carp10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        description: virtual if for external traffic
        carp: MASTER carpdev xl0 vhid 22 advbase 1 advskew 10
        groups: carp
        inet 10.120.10.100 netmask 0xffffff00 broadcast 10.120.10.255

fw1:root:/root #ifconfig carp10
carp10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        description: virtual if for external  traffic
        carp: BACKUP carpdev xl0 vhid 22 advbase 1 advskew 100
        groups: carp
        inet 10.120.10.100 netmask 0xffffff00 broadcast 10.120.10.255


The carp interfaces can do address takeover, I have preempt on, and my packets route and return properly as long as each machine NATs behind its own xl0. Things seem to work, except that on a failover a running download stalls and never recovers.


If I try to NAT behind carp10 ... nothing comes out. I have the xl0's on a hub and watch from a third machine. Even with this ruleset:


=====================
set block-policy drop

# Normalization
scrub in no-df

# Translation

nat  on carp10  from xxx.xxx.35.0/24 to any ->carp10
#nat  on xl0  from xxx.xxx.35.0/24 to any ->xl0

pass in   quick log on { enc0 em1 em0 xl0 carp10 carp35  lo } keep state
pass out  quick log on { enc0 em1 em0 xl0 carp10 carp35  lo } keep state

====================

I see nothing leave carp10 when a connection is initiated from host xxx.xxx.35.235.
If I switch the nat rule back to xl0 it works, but stateful failover does not.
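
For comparison, a third variant would translate on the physical interface xl0 but to the shared carp address (10.120.10.100, taken from the ifconfig output above) instead of to each machine's own xl0 address. This is only a sketch and has not been tested on these machines. It assumes that outbound packets actually leave through the carpdev (xl0) rather than through carp10, and that states synced by pfsync would then refer to an address that survives a failover:

=====================
# Untested sketch: match on the physical interface,
# translate to the shared carp address (assumption, not a confirmed fix)
nat  on xl0  from xxx.xxx.35.0/24 to any -> 10.120.10.100
=====================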

Doing a dump on enc0, I see what looks like happy pfsync traffic:
===================================

fw1:root:/root #tcpdump -ni enc0
tcpdump: WARNING: enc0: no IPv4 address assigned
tcpdump: listening on enc0, link-type ENC
08:43:30.293191 (authentic,confidential): SPI 0x9238583f: xxx.xxx.35.3: TDB UPD:
(DF) [tos 0x10] (encap)
08:43:30.295997 (authentic,confidential): SPI 0xdaf988d3: xxx.xxx.35.2: TDB UPD:
(DF) [tos 0x10] (encap)
08:43:30.779450 (authentic,confidential): SPI 0x9238583f: xxx.xxx.35.3: UPD ST COMP:
(DF) [tos 0x10] (encap)
08:43:30.785927 (authentic,confidential): SPI 0xdaf988d3: xxx.xxx.35.2: UPD ST COMP:
(DF) [tos 0x10] (encap)
08:43:31.109482 (authentic,confidential): SPI 0x9238583f: xxx.xxx.35.3: UPD ST COMP:
(DF) [tos 0x10] (encap)
08:43:31.293203 (authentic,confidential): SPI 0x9238583f: xxx.xxx.35.3: TDB UPD:
(DF) [tos 0x10] (encap)
08:43:31.295997 (authentic,confidential): SPI 0xdaf988d3: xxx.xxx.35.2: TDB UPD:
(DF) [tos 0x10] (encap)
08:43:31.623265 (authentic,confidential): SPI 0x9238583f: xxx.xxx.35.3: UPD ST COMP:
(DF) [tos 0x10] (encap)
08:43:31.786090 (authentic,confidential): SPI 0xdaf988d3: xxx.xxx.35.2: UPD ST COMP:
(DF) [tos 0x10] (encap)
=======================================


If I run pftop on each firewall and initiate a session through one, it appears to show up in the other's state table.


So I assume the failure is due to my inability to NAT behind carp. Would someone be willing to call me an idiot _AND_ give me a hint?