
Re: pfsync after reboot does not synchronize



On 6/5/06, David DeSimone <fox_(_at_)_verio_(_dot_)_net> wrote:
I tried posting some messages about PF to the freebsd-net mailing list,
but they seemed to be ignored.  So I thought I would try sending my
questions here.

I am trying to figure out why pfsync does not seem to work correctly
when one of my cluster nodes reboots.

When I reboot one of the cluster members, the state tables do appear to
synchronize, sort of, and populate with some of the same connection
states, but not all of them.

That is, "pfctl -ss" shows a different number of state entries on each
cluster member.  The counts are vastly different if the new member has
only been up for a minute or two.
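
One way to make that comparison concrete is to count the entries rather than eyeball them. This is only a minimal sketch: the helper name count_states is hypothetical, and it assumes the plain (non-verbose) "pfctl -ss" output, where each state occupies one line.

```shell
# Count the state entries in "pfctl -ss" output read from stdin.
# Blank lines are skipped; every other line is treated as one state
# (an assumption that holds only for the non-verbose output).
count_states() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Usage on each cluster member (needs root):
#   pfctl -ss | count_states
```

Running the usage line on both members at roughly the same moment gives two numbers to compare; on a busy firewall they will rarely match exactly, so only a large gap is meaningful.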

In particular, long-lived, extant connections (such as IRC server
connections) seem to never show up in the rebooted member's state table,
even though the connections continue to update their state on the
current carp master.

I figured that doing ifconfig down/up would send some sort of "full
sync" message between the two members, to cause the entire state table
to be sent in bulk.  Eventually I learned that the method to do this is
to use "ifconfig syncdev" to force a bulk update:

    ifconfig pfsync0 syncdev fxp0   # $pfsync_syncdev

When I perform the above command, I see the following debug output (when
PF is configured at "misc" or "loud" debug level):

    On the cluster member receiving the requests:

        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request
        pfsync: received bulk update request

    On the cluster member making the request (where syncdev was just
    ifconfig'd):

        pfsync: requesting bulk update
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: received bulk update start
        pfsync: failed to receive bulk update status

After performing this manual action, I find the state table is much
better populated, and the two firewalls appear to be synchronized.
However, the messages above bother me.  It looks to me like the cluster
member making the request repeats it over and over again, and finally
gives up after PFSYNC_MAX_BULKTRIES (12) attempts.  Shouldn't that be
something that only happens in exceptional conditions?  Yet, I can make
it happen every time, even on a test cluster with no traffic (and thus
an almost empty state table).
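
To check how many requests the peer actually saw, the debug lines can be counted instead of read by eye. The sketch below is an illustration only: count_bulk_requests is a hypothetical helper, and it assumes the pfsync debug messages are available as text, for example from dmesg.

```shell
# Count "received bulk update request" lines in log text read from stdin.
# "|| true" keeps the function's exit status at 0 when nothing matches,
# since grep -c still prints 0 in that case but returns 1.
count_bulk_requests() {
    grep -c 'pfsync: received bulk update request' || true
}

# Usage on the member that receives the requests:
#   dmesg | count_bulk_requests
```

For the output quoted above the count is 13, which can then be compared against PFSYNC_MAX_BULKTRIES.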

Does anyone have any insight as to why I see these problems?

1.  Why does pfsync synchronize the state tables when I use the
    "ifconfig syncdev" trick to force a bulk update, yet it does
    not do this when the system is booting up?

2.  Why does pfsync keep repeating the bulk update request and then give
    up?  What message is not getting through?


The two cluster members have a direct cross-cable between them.  My PF
policy has these settings:

    set skip on pfsync0

    pass quick on fxp0 proto pfsync     # $pfsync_syncdev

I have also seen this problem with pfSense.  To work around it, I set
advskew to 200 on the host and wait 30 seconds to give everything time
to sync.  I am not sure what causes it, but it may be related to the
pfsync hold-down timer.  At any rate, we worked around the problem, and
I wanted to revisit it after our 1.0 release.  I am glad someone else
is also seeing it.
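
A rough sketch of that workaround as a boot-time shell function follows. The interface name carp0, the normal advskew of 0, and the function name are assumptions for illustration; only the advskew of 200 and the 30-second wait come from the description above.

```shell
# Assumed defaults; override them from the environment as needed.
CARP_IF="${CARP_IF:-carp0}"          # hypothetical carp interface name
NORMAL_SKEW="${NORMAL_SKEW:-0}"      # assumed advskew for normal operation
SYNC_WAIT="${SYNC_WAIT:-30}"         # seconds to wait for pfsync to catch up
IFCONFIG="${IFCONFIG:-ifconfig}"     # set to "echo" for a dry run

# Advertise with a high skew so this host is unlikely to become carp
# master, wait for the state table to sync, then restore the normal skew.
hold_back_carp() {
    $IFCONFIG "$CARP_IF" advskew 200
    sleep "$SYNC_WAIT"
    $IFCONFIG "$CARP_IF" advskew "$NORMAL_SKEW"
}
```

Setting IFCONFIG=echo and SYNC_WAIT=0 before calling hold_back_carp prints the two commands without touching the interface, which is a cheap way to check the values first.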

Let me know if anyone needs more information.

Scott