Perhaps add some language saying that it is up to the filesystem to populate the name cache as appropriate.
Also, I believe it is useful to note that the name cache cannot be too dynamic, which means that pseudo-filesystems tend not to use the cache, for instance procfs and devfs. I believe that nullfs does not use the namecache at all due to its structure.
Note in the text that it might be useful to allow special filesystems to avoid the namecache; the nonc tmpfs option is an example.
Not a component but a vnode, i.e. the lookup result. I think it is too confusing to leave it as is.
This statement depends on the fs. For instance, for UFS it is not true, i.e. there cannot exist a cached inode without a corresponding vnode. Even for SU, where an inode block might have dependencies attached to it (and thus cannot be reclaimed), it is still not the case. I think this is also false for any filesystem in FreeBSD that uses persistent storage.
In other words, only tmpfs has this special feature.
I think this line needs a grammar fix, though I am not sure which.
It states that filesystems have to call cache_enter and even shows how ufs can end up doing it.
devfs is a bad example in that it is mostly static, and at least one node (null) sees a lot of traffic.
I'll see about procfs.
In that case, can something be done about:
	/*
	 * clustering stuff
	 */
	daddr_t	v_cstart;	/* v start block of cluster */
	daddr_t	v_lasta;	/* v last allocation */
	daddr_t	v_lastw;	/* v last write */
	int	v_clen;		/* v length of cur. cluster */
Removing this and moving v_hash into a 4-byte hole is almost enough to shrink the struct so that 9 objects fit per page instead of 8. There are a few more bytes to shave elsewhere in the struct.
Of course zfs/tmpfs/nullfs do not use these fields, because they are for clustering support in the buffer cache.
I do not have a good idea where to move them. It would be fine to have them in the inode, but then the clustering code needs a way to find the place given vp as an input. Perhaps there could be a VOP like vop_clusterdata() that returns a pointer to a structure with these four fields.
I don't have a strong opinion, just a suggestion. Keeping them in the vnode does save an indirect function call, though.
There is a lot more to remove or shrink. Basically the vnode can be made to fit 9 per page with some headway.
Thanks for rewriting this comment.
I have provided some suggestions for cleanup and/or clarification.
Like byte-range file locking, I originally implemented the name cache as part of UFS. Both were later extracted from UFS and provided as kernel support so that they could be used by other filesystems.
thorought should be either thorough or throughout
shortcommings => shortcomings
represented => described
vnode (see => vnode, see
hacks around the problem => minimises the number of locks
set => provided
For lockless case forward lookup avoids any writes to share areas apart from the terminal path component. =>
that => than
I do not understand this sentence: now hard to replace with malloc due to dependence on SMR.
I don't think this covers it. The key here is that this scales completely, and that stems from not bouncing cache lines with anyone.
First, consider a pre-SMR kernel.
The namecache maintains its own zones, but there is no good reason to do so that I know of; instead it could use malloc (with better-than-power-of-2 granularity for malloc zones). In general, if more of the kernel used malloc instead of hand-rolled zones, it would be easier to justify better malloc granularity and likely get better total memory usage. For example, most names are < 8 characters or so, yet every single one allocates a 104-byte object with a 45-byte buffer to hold them. There is already some overhead from the very fact that we are dealing with a namecache entry, so the buffer which comes after it has to be worth it. Should namecache-specific zones get abolished, the problem would disappear.
Next, note that in the current implementation a traversal has to access memory from any of the 4 namecache zones, the vnode zone, and the pwd zone (for cwd and other pointers).
Consider a kernel with a generalized safe memory reclamation mechanism which supports all allocations. Whether the namecache used custom zones, malloc, or anything else, there would be an smr_enter or compatible call up front to provide protection. This is roughly how it works in Linux with RCU.
SMR as implemented here is very different. The "vfs_smr" object is created and installed in the aforementioned zones; then smr_enter(vfs_smr) provides protection against freeing from said zones. But it does not provide any protection against freeing from malloc or anything else not roped in with vfs_smr. Let's say someone added malloc_smr. It would have to be smr_enter-ed separately, and that comes with a huge single-threaded cost (an atomic op). There was a mode to do it without said op, but it would still come with extra overhead, making this a non-starter. I don't know how the current code would handle a hypothetical global SMR.
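To sketch the per-domain pattern being described, here is a toy userspace model; smr_sim and the *_sim functions are stand-ins invented for this example, not the real smr(9) interface (which is per-CPU), and malloc_smr remains hypothetical:

```c
/*
 * Toy model of per-subsystem SMR domains. In reality smr_enter() is a
 * per-CPU operation, and a second domain costs an extra atomic op on
 * the single-threaded path, which is the objection above.
 */
#include <assert.h>

struct smr_sim {
	int readers;		/* per-CPU and more involved in reality */
};

static struct smr_sim vfs_smr_sim;	/* covers the namecache zones */
static struct smr_sim malloc_smr_sim;	/* the hypothetical malloc domain */

static void smr_enter_sim(struct smr_sim *s) { s->readers++; }
static void smr_exit_sim(struct smr_sim *s) { s->readers--; }

/*
 * A lookup whose data spanned both domains would need both enters;
 * the second one is the extra single-threaded cost.
 */
static void
lookup_sim(void)
{
	smr_enter_sim(&vfs_smr_sim);
	smr_enter_sim(&malloc_smr_sim);
	/* ... lockless traversal of namecache + malloc-backed data ... */
	smr_exit_sim(&malloc_smr_sim);
	smr_exit_sim(&vfs_smr_sim);
}
```

A single global SMR domain would collapse the two enters into one, which is what the generalized-SMR kernel in the previous comment assumes.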
That's the rough outline. As you can see the comment assumed some familiarity with the above.