Page MenuHomeFreeBSD

cred: distribute reference count per thread
ClosedPublic

Authored by mjg on Mar 9 2020, 8:35 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Oct 17, 7:32 PM
Unknown Object (File)
Thu, Oct 2, 11:01 PM
Unknown Object (File)
Tue, Sep 30, 12:49 AM
Unknown Object (File)
Sep 20 2025, 9:10 AM
Unknown Object (File)
Sep 16 2025, 8:33 AM
Unknown Object (File)
Sep 13 2025, 8:16 AM
Unknown Object (File)
Sep 8 2025, 5:14 PM
Unknown Object (File)
Sep 6 2025, 5:49 PM
Subscribers

Details

Summary

See the patch for description how this works. It may need some tidy ups but I consider it committable in terms of how it works.

mutex is used for simplicity until this is proven to work out. I don't expect it to be a problem in the foreseeable future (modulo some microbenchmarks perhaps).

In contrast to D22811 this automatically takes care of all crhold/crfree consumers (in particular meaning both file and vm object). I verified with open1 and brk1 microbenchmarks that cred-related bouncing is eliminated. open1 is now gated on openfiles updates and brk1 on swap_reserve_by_cred, both of which should be solvable with something in the lines of sloppy counting.

Test Plan

Did some tests with changing creds et al while watching vmstat -z | grep -i cred. They always go back. Will run kyua later.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

  • introduce td_realucred. this fixes a bug against code which temporarily swaps td_ucred with something else
sys/kern/kern_prot.c
1868 ↗(On Diff #69342)

Why "cow"?

2133 ↗(On Diff #69342)

Don't we need the cred mutex here?

sys/sys/proc.h
310 ↗(On Diff #69342)

Missing a comment.

sys/sys/ucred.h
56 ↗(On Diff #69342)

Missing comments.

sys/kern/kern_prot.c
1868 ↗(On Diff #69342)

see callers -- thread_cow_get_proc and so on

2133 ↗(On Diff #69342)

for the old struct, yes.

  • add missing lock to proc_set_cred
  • add some comments
  • rebase
  • fix cred leak on failed fork

modulo abi assert adjustments i think this is committable

sys/sys/proc.h
270 ↗(On Diff #72361)

Why td_realucred is needed ? Also, I do not think that td_realucred is real pointer/reference, it is only used for comparison.

sys/sys/ucred.h
51 ↗(On Diff #72361)

Explain the usage of cr_mtx.

sys/sys/proc.h
270 ↗(On Diff #72361)

It is needed because there is code which temporarily switches credentials to something else, which makes it unsafe to modify the per-thread cred refcount.

sys/sys/ucred.h
51 ↗(On Diff #72361)

you mean the locking key like for other structs?

sys/sys/proc.h
270 ↗(On Diff #72361)

Is it only NFS or something else as well ? If only NFS, why not do full switching, with references transition ?

sys/sys/ucred.h
51 ↗(On Diff #72361)

Can it be done this way ? I thought that the rules are more involved than just 'some fields are under cr_mtx'.

sys/sys/proc.h
270 ↗(On Diff #72361)

It's also aio and some other stuff. This is the easiest approach I could come up with and has a nice side effect of easily asserting that creds got restored on kernel<->user boundary.

Can you explain why the following cannot work:

  • thread only own one reference on td_ucred, when td_ucred points to this ucred
  • all additional references come to td_ucredref

You have to fully convert thread references to proper ucred references on all changes to td_ucred, which you identified as syscall enter, NFS/AIO + might be something else. In other words, no cr_users and no td_realucred.

This is almost identical t what you described in comment in D22811. So any reason why such natural approach cannot work must be documented and explained.

The key is that cr_ref is arbitrary as long as cr_users > 0.

Consider a case where a thread keeps generating new file objects and sending them over a unix socket, where a different process receives and closes them. Assuming they all use the same creds, the first thread will keep incrementing its td_ucredref (and in particular allowing it to overflow at some point) while the receiving thread in the other process will keep decrementing its own count. Then should the the latter thread change credentials, transferring the count would give bogus cr_ref unless the former thread also transferred its own ref. cr_users signifies when the count becomes legitimate.

The fundamental difference between this patch and the older one is that instead of having a shared place to modify the count by cooperating parties, the place is per thread.

In D24007#554032, @mjg wrote:

The key is that cr_ref is arbitrary as long as cr_users > 0.

Consider a case where a thread keeps generating new file objects and sending them over a unix socket, where a different process receives and closes them. Assuming they all use the same creds, the first thread will keep incrementing its td_ucredref (and in particular allowing it to overflow at some point) while the receiving thread in the other process will keep decrementing its own count. Then should the the latter thread change credentials, transferring the count would give bogus cr_ref unless the former thread also transferred its own ref. cr_users signifies when the count becomes legitimate.

The fundamental difference between this patch and the older one is that instead of having a shared place to modify the count by cooperating parties, the place is per thread.

What you described above is a good point and must be added in the comment, like 'while cr_users > 0, c_ref is not valid and cannot be used to decide that the ucred structure can be freed'.
But my question is about the need for td_realucred.

Given aio et al always returning td_realucred it would deteriorate to the original performance problem. Switching creds can avoid any atomics if the switched-to creds are already held and are always switched back from. Then we are back to not having to sync td_realucred on each switch.

In D24007#554122, @mjg wrote:

Given aio et al always returning td_realucred it would deteriorate to the original performance problem. Switching creds can avoid any atomics if the switched-to creds are already held and are always switched back from. Then we are back to not having to sync td_realucred on each switch.

For AIO the handling of the job includes switching the address spaces, that for all practically occuring situations means two full TLB shootdown (PTI for Intel, lack of PCID for AMD). In this situation two more atomics should not matter.

Similarly, I believe you refer to the code which passes the io request to nfsiod threads, for NFS. There the cost is the network transaction.

I did not mean the single-threaded cost, but dirtying the (likely heavily shared) credential. With the patch as proposed this will normally only be happening on thread creation and destruction, which is very rare compared to any other operations.

In D24007#554335, @mjg wrote:

I did not mean the single-threaded cost, but dirtying the (likely heavily shared) credential. With the patch as proposed this will normally only be happening on thread creation and destruction, which is very rare compared to any other operations.

AIO/NFS should be unfrequent due to the backpressure, as well.

Anyway, I probably do not see much sense in arguing more. I still have a request, please change the type for td_realucred from struct ucred * to either void * or better uintptr_t, so that it is clear that this is not some credentials reference that is useful as credentials, for anything. Your patch only compares a ucred pointer with td_realucred, and this way it would not cause question 'what for that credentials are'.

  • tweak the main comment
  • fix abi checks

Changing the type of td_realucred gets in the way, for example it is assigned from crcowget. Thus I decided to skip as it would require several type casts.

This revision is now accepted and ready to land.Jun 9 2020, 9:26 PM
This revision was automatically updated to reflect the committed changes.