Page MenuHomeFreeBSD

fix race between Lookup and file modifying RPCs in the NFS client
ClosedPublic

Authored by rmacklem on Oct 25 2021, 3:40 AM.

Details

Summary

PR#259071 provides a test program that fails for the NFS client.
Testing with it, there appears to be a race between Lookup
and VOPs like Setattr-of-size, where Lookup ends up loading
stale attributes (including what might be the wrong file size)
into the NFS vnode's attribute cache.

The race occurs when the modifying VOP (which holds a lock
on the vnode), blocks the acquisition of the vnode in Lookup,
after the RPC (with now potentially stale attributes).

Here's what seems to happen:
Child Parent

  • does stat()
    • does VOP_LOOKUP(), which does the Lookup RPC with the directory vnode locked - does ftruncate(), acquiring --> acquires file handle and exclusive vnode lock on file vnode attributes, including Size, valid at this point in time
    • blocks waiting for locked file vnode - does VOP_SETATTR() of Size, changing the file's size
      • releases the file vnode
    • acquires file vnode and fills in now stale attributes including the old wrong Size
      • does a read() which returns wrong data size

This patch fixes the problem by saving a timestamp in the NFS vnode
in the VOPs that modify the file (Setattr-of-size, Allocate, Deallocate).
Then lookup compares that timestamp with the time just before
starting the Lookup RPC after it has acquired the file's vnode.
If the modifying RPC occurred during the Lookup, the attributes
in the RPC reply are discarded, since they might be stale.
Although Deallocate never changes file size, it does change
modify_time and the NFSv4 Change attribute, so I included it.

With this patch the test program works as expected.

I believe there is a similar race between Lookup and Writes,
which I am working on a separate patch for,

Test Plan

Ran the test program (which failed without the patch)
and also ran a normal test cycle, which did not identify
any regression.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

What about nfsrpc_readdirplus, nfs_mknodrpc, nfs_lookitup, and nfs_create? It looks like they're all vulnerable to the same race. Though in some of those it's only a directory vnode that could be affected, so there's no file size to worry about.

sys/fs/nfsclient/nfs_clvnops.c
1416

Should you also check the timestamp right here?

Good point w.r.t. readdirplus (I never use it, so I always forget
about it;-).

  • VOP_CREATE() is only called if the Lookup fails, so it is creating a new file and that implies "no extant NFS vnode that would already have attributes".
  • VOP_MKNOD() is similar, but for non-regular files.
  • nfs_lookitup() - I'll take a look at this one. I can't remember when it gets used other than as a part of silly rename.
sys/fs/nfsclient/nfs_clvnops.c
1416

I don't think we need to worry about directories. The operations
that are setting the timestamp are all changing regular files.
Modify time for directories isn't critical and, as you note,
file size is relatively meaningless (depends on server file
system implementation and is just handed to userland for
information).

If we do need to worry about directory attributes being stale,
then a lot more work would be needed.

sys/fs/nfsclient/nfs_clvnops.c
1416

Makes sense. I can live with an out-of-date mtime on a directory.

Added checks for n_localmodtime to nfs_lookitup()
and nfsrpc_readdirplus(), as suggested by asomers@.

What's up with these file names that end in .write and .stat ?

Just how I keep track of different variants of the files.
I currently have at least 6 patches in progress for the
NFS client.
I do not use any tool like git to manage my own
patch versions.

Just get rid of the .write suffix on the "after" diff
file names, to avoid confusion.

This revision is now accepted and ready to land.Oct 30 2021, 1:13 AM