[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: changing semantics of the va_filerev (code review)
- To: Rick Macklem <rmacklem_(_at_)_uoguelph_(_dot_)_ca>
- Subject: Re: changing semantics of the va_filerev (code review)
- From: Bruce Evans <brde_(_at_)_optusnet_(_dot_)_com_(_dot_)_au>
- Date: Tue, 14 Apr 2009 20:08:51 +1000 (EST)
- Cc: freebsd-fs_(_at_)_freebsd_(_dot_)_org
On Mon, 13 Apr 2009, Rick Macklem wrote:
On Mon, 13 Apr 2009, Bruce Evans wrote:
va_gen/i_gen/di_gen/st_gen seems to be even more suitable for this
purpose, but it isn't actually a file generation number like its
comments say (it is normally set to a random value on file creation
then never changed) and it is exposed to userland (st_gen).
i_gen is used by NFS to create T-stable (valid for a long time, including
a long time after the file is removed) file handles. It is used by
ffs_vptofh() to create the file handles for NFS that are recognized as
representing removed files, even after an i-node gets reused such that
the i-node number now represents another file.
Oops, I missed that since nfs's use of i_gen is indirect. What does
nfs do for file systems that don't detect removed files, e.g., msdosfs.
vptofh and fhtovp routines seem to have too many differences. E.g.,
file systems based on ffs return ESTALE for removed files, but
zfs_fhtovp() returns EINVAL.
I just noticed than the increment of i_gen was slightly broken for ffs
by a type mismatch in ffs2 (affects ffs1 too). Originally, i_gen had
the same type as di_gen (int32_t). Now i_gen has type int64_t but in
ffs1, di_gen of course still has type int32_t, and in ffs2, di_gen
still has type int32_t (apparently there was insufficient space to
expand it). This makes the overflow check in ffs_alloc.c (++ip->i_gen
== 0) more broken than before. Previously it only gave undefined
behaviour followed by a bogus check when overflow occurs for incrementing
from INT32_T_MAX. Now it has no effect, since it takes 293 years of
incrementing at a rate of 1GHz to reach overflow at INT64_T_MAX.
Overflow now occurs on assignment to di_gen.
The result of this bug is almost the the same as removing the silly
part of the security code -- the re-randomization on overflow. i_gen
may grow larger than UINT32_T_MAX, but usually refresh from the dinode
will keep it smaller. When it starts near UINT32_T_MAX and grows
larger, the overflow on assignment and a subsequent refresh will make
it nearly 0. Except, in 1 in every 2**32 cases, when the overflow makes
di_gen exactly 0, the subsequent refresh will randomize i_gen.
va_ctime should give what you want for all file systems, since it
should be increased whenever anything changes. However, most file
There are some places where IN_UPDATE gets set, but IN_CHANGE doesn't.
Are there? This would be a bug. I checked that ffs doesn't have this
Since the Change attribute must change for every file modification, I
feel safer incrementing it for both IN_UPDATE and IN_CHANGE. (It's 64bits,
so it won't wrap around for a little while.)
It would be a large and obvious bug to modify the file data (IN_UPDATE)
without setting IN_CHANGE.
systems always set the nsec part to 0, so va_ctime doesn't track
all file changes. This is a problem for things like make(1) too,
so if nsec timestamps aren't available or are take too long or are
not fine-grained enough, the nsec part should be abused as a generation
counter so that any change gives a strictly larger timestamp. The
case where someone sets the clock backwards is broken but won't
Many nonstandard file systems, e.g., msdosfs, have
no space for an on-disk ctime, so they fake va_ctime using an on-disk
mtime. Since such file systems don't have many attributes, only a
few more cases are broken.
Yep, that's why ctime/mtime aren't sufficient.
If a read/write file system doesn't have support for it, all you
can do is fake it and hope the client works ok. I suspect the Linux
They need to be fixed or faked well enough for make(1) too.
When the dinode has no space to spare, something can be done by keeping
state in the inode or vnode. This won't work across reboots of course
(except by hashing a reboot counter into the generation counts or
timestamps) but might be enough for all short-term uses. I'm not sure
how much is safe here.
freebsd-fs_(_at_)_freebsd_(_dot_)_org mailing list
To unsubscribe, send any mail to "freebsd-fs-unsubscribe_(_at_)_freebsd_(_dot_)_org"
Visit your host, monkey.org