MFC after: 1 week
Sponsored by: Mellanox Technologies
Details
Diff Detail
- Repository
- rS FreeBSD src repository - subversion
- Lint
Lint Skipped - Unit
Tests Skipped - Build Status
Buildable 32967
Event Timeline
So what/where is the API specification ?
What are the synchronization guarantees ?
Are there any peculiarities in the implementation worth mention ?
Documentation is here:
https://www.kernel.org/doc/html/latest/core-api/xarray.html
Linux use a lot of atomics in there. We don't.
What are the synchronization guarantees?
All writing is done under a mutex.
All reading is done under RCU.
That's it.
I mean, what is the API user contract for locking.
Can you use FreeBSD primitives instead of Linux wrappers ? I mean atomics/mutexes/epochs.
sys/compat/linuxkpi/common/src/linux_radix.c | ||
---|---|---|
306 ↗ | (On Diff #72564) | In other place you also free root node if it has zero children count. |
331 ↗ | (On Diff #72564) | Style: space before '('. |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
61 | This increment lacks an assert that the operation did not overflow. | |
71 | The check for underflow is needed, even if racy. | |
77 | Style, no initialization in definition. Why this recursion is possible (explanation should be a comment) ? Is it because user could lock xa and then call the API ? That said, am I right that API distinguish between unlocked and locked entry points ? mtx_owned() generally should not be used, esp. when the code easily knows the state by contract. | |
109 | What state is kept 'AS-IS' ? I see, at least, that all readers are forced to use mutex while we sleep, is it needed ? |
sys/compat/linuxkpi/common/src/linux_radix.c | ||
---|---|---|
306 ↗ | (On Diff #72564) | I will have to look at this. |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
77 | The API is not clear about locked and unlocked and I think we need to handle both, looking at the driver code we need to support. The problem is that tree manipulations are allowed if flags do not allow sleeping. I can assert that. | |
109 | I don't think so. Let me analyze. |
sys/compat/linuxkpi/common/src/linux_radix.c | ||
---|---|---|
331 ↗ | (On Diff #72564) | Fixed. |
Add some more comments explaining why memory cleanup is not needed when inserting radix tree nodes and a failure happens.
sys/compat/linuxkpi/common/src/linux_radix.c | ||
---|---|---|
306 ↗ | (On Diff #72564) | I'm adding a comment here to explain the logic and why it is not leaking any memory this way. |
sys/compat/linuxkpi/common/include/linux/xarray.h | ||
---|---|---|
49 | Can you please write it in more proper way, 0xffffffff ? If I guess the meaning of the abbreviation 32b correctly. | |
sys/compat/linuxkpi/common/src/linux_radix.c | ||
229 ↗ | (On Diff #75293) | I do not think exclamation mark is warranted in comments. |
292 ↗ | (On Diff #75293) | We usually provide full sentences even in single-line comments. |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
48 | You ignored my notes from previous pass. |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
---|---|---|
48 | I do not want to retype all my comments. One of the biggest one was the request to use FreeBSD primitives directly instead of layering LinuxKPI over LinuxKPI. |
Use FreeBSD native API's when possible as suggested by @kib
Unfortunately we still have to use the LinuxKPI's RCU, because the EPOCH requires an epoch-tracker, which is not part of the xarray API.
Maybe this can be added to the task_struct structure ...
I suggest we use RCU, and then at some point convert RCU in LKPI to use EPOCH. Like said we need an epoch tracker in the LKPI thread structure, and it needs to support recursion.
Don't optimise the xarray implementation with RCU yet, because the radix tree needs to support it first!
Just use simple mutexes for now to protect read and write operations.
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
---|---|---|
91 | If mask must be 2**n-1, this should be asserted. Otherwise, code accept any value for the mask. Are there any required relations between *pindex and mask ? | |
113 | I suspect this could be quite slow for large contig runs. Can you add an interface for radix tree to return next free index ? |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
---|---|---|
91 | Let me put an assert there to make it more clear. |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
---|---|---|
113 | I'm not sure how much we can save and if you want to insert multiple items, then use __xa_alloc_cyclic() , because it tracks the last inserted index, to avoid lookup from the start all the time. |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
---|---|---|
113 | @kib: Can you explain a bit more how the radix tree implementation could find the first free slot faster? I mean, if you want to find the first free slot you need to iterate all the leaf nodes anyway. And the lookup per node is logarithmic, so I don't think we are wasting much time there! |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
---|---|---|
113 | I mean, that for large filled contig runs already existing in the radix tree, calling radix_tree_insert() for each allocated slot adds significant overhead for the implementation. Radix tree code loses its context on return. You could try to use already known tree pointers to iterate further and return next free slot in case of radix_tree_insert() failure. |
sys/compat/linuxkpi/common/src/linux_xarray.c | ||
---|---|---|
113 | I see your point. However can we leave that for later? There are two action items for improving the radix tree:
Action item #2 is a bigger showstopper than #1. I'd rather see we can get going with xarrays, than having everything optimised to the teeth :-) |
Note that things like epochs/RCU provide scalability while giving some hit on single-threaded performance. And the issue of large runs, if the use creates them, also affects single-threaded performance. Anyway, it is not a blocker.
See one small note inline.
sys/compat/linuxkpi/common/include/linux/xarray.h | ||
---|---|---|
48 | U suffix is not needed. The value is automatically promoted to unsigned if it cannot fit into int. |