[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: changing semantics of the va_filerev (code review)

On Mon, 13 Apr 2009, Bruce Evans wrote:

va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this
purpose, but it isn't actually a file generation number like its
comments say (it is normally set to a random value on file creation
then never changed) and it is exposed to userland (st_gen).

i_gen is used by NFS to create T-stable (valid for a long time, including
a long time after the file is removed) file handles. It is used by
ffs_vptofh() to create the file handles for NFS that are recognized as
representing removed files, even after an i-node gets reused such that
the i-node number now represents another file.

va_gen isn't used much either.  In ext2fs, i_gen is a copy of the
on-disk field i_generation which is documented to be /* for NFS */ but
nfs doesn't use va_gen at all.  nfs3 (getattr, loadattrcache) doesn't
even initialize va_gen.  nfsv2 initializes it to a non-random value
based on a timestamp.  I'm not sure if it does this only on creation
or on every cache miss or on every call.  I think the uninitialized
va_gen gives stack garbage in st_gen, but in tests I get 0 for both
nfsv3 and nfsv2 (as root -- st_gen is always 0 for non-root).  I don't
understand the security issues for *_gen, but remember its being changed
for security.  cvs history shows that it used to actually be a generation
number in at least ffs, but for ffs files and not for individual file
changes (or for individual ffs file systems or all file systems).

Its initial value doesn't matter for it to work correctly.
It should get incremented each time an i-node gets reused for a different
file. (That's what the ESTALE magic is, the server reporting
to the client that the file handle is for a file that has been removed.)

The "security" business is a bit bogus to me. It's one of those security
by obscurity tricks, imho. The problem was that a file handle was easy to fake when i_gen is initially 0. Initializing it to a non-zero value
makes faking one a little harder, but... Part of the reason for doing
this was that IP#s were only checked against exports at mount time on
some systems (BSD has never been this way) and faking the one file handle
for the root of the FS (root i-node#, i_gen == 0) bypassed exports and
tah dah.

I'd like to change the semantics of va_filerev so that it can be used
by the nfsv4 server as the Change attribute. To do this, it needs to
change in 2 ways:
- must change upon metadata changes as well as data changes
- must persist across server reboots (ie. be moved to spare space in
	the on-disk i-node instead of in memory i-node)

Many nonstandard file systems, e.g., msdosfs, have no space to spare.

If a file system can't support it correctly, faking it with something
like modify time is about all you can do. Since Change is supposed to
change on every file modification, this fails when multiple changes
occur within the same tod clock time or the clock gets reset backwards,
as you noted below. (Linux uses a modify time with a 1sec clock
resolution for Change, which isn't correct and the Linux nfs server
folks know that. Since this breaks the AIX nfsv4 client, the AIX folks
tend to remind them:-)

Read-only file systems like cd9660 and udf probably don't need a a
variable generation count (since they never change), but their current
handling of va_filerev and va_gen is wrong if these fields have any
other semantics.  These file systems just initialize va_gen to 1 for
all files and va_gen to 0 (with an XXX in udf only) for all files.

Since they only need to change for modifications and their initial
values don't really matter, the above sounds fine to me.

va_ctime should give what you want for all file systems, since it
should be increased whenever anything changes.  However, most file
There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't.
Since the Change attribute must change for every file modification, I
feel safer incrementing it for both IN_UPDATE and IN_CHANGE. (It's 64bits,
so it won't wrap around for a little while.)

systems always set the nsec part to 0, so va_ctime doesn't track
all file changes.  This is a problem for things like make(1) too,
so if nsec timestamps aren't available or are take too long or are
not fine-grained enough, the nsec part should be abused as a generation
counter so that any change gives a strictly larger timestamp.  The
case where someone sets the clock backwards is broken but won't
happen often.

Many nonstandard file systems, e.g., msdosfs, have
no space for an on-disk ctime, so they fake va_ctime using an on-disk
mtime.  Since such file systems don't have many attributes, only a
few more cases are broken.

Yep, that's why ctime/mtime aren't sufficient.
If a read/write file system doesn't have support for it, all you
can do is fake it and hope the client works ok. I suspect the Linux
folks will eventually start to add support for it to ext3fs etc, because
of the above, but who knows. It seems that FreeBSD mostly uses FFS and
ZFS (which should have support for it, since the Solaris folks are into
NFSv4?), so at least we should be able to make those work correctly.

Have a good week, rick

freebsd-fs_(_at_)_freebsd_(_dot_)_org mailing list
To unsubscribe, send any mail to "freebsd-fs-unsubscribe_(_at_)_freebsd_(_dot_)_org"

Visit your host, monkey.org