Add UMA_ZONE_ALLFINI flag
Needs Review · Public

Authored by kib on Oct 28 2021, 5:46 AM.

Details

Reviewers

markj
hselasky

Summary

This is for discussion; most likely I do not intend the patch to be committed as is.

The new UMA_ZONE_ALLFINI flag is only meaningful together with UMA_ZONE_NOFREE. It requests that the zone's fini() method be called on all free elements in the zone's keg.

The problem with the implementation is that it only works on free slabs; free elements in partially filled slabs are ignored. In other words, it behaves strangely if there is already a leak. It also consumes the last unused bit in the flags.
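To make the limitation concrete, here is a small userland toy model, not UMA code. It assumes a keg is just an array of fixed-size slabs and only counts which free items a "free slabs only" teardown would finalize. The names toy_slab, toy_fini_stats and toy_allfini are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ITEMS_PER_SLAB 4	/* arbitrary size chosen for the toy model */

/* Toy stand-in for a keg slab; this is not the real UMA slab layout. */
struct toy_slab {
	bool in_use[ITEMS_PER_SLAB];	/* true if the item is still allocated */
};

/* Result of a simulated teardown pass. */
struct toy_fini_stats {
	int finalized;		/* free items that would get a fini() call */
	int skipped_free;	/* free items that would be ignored */
};

/* Count the allocated items in one slab. */
static int
toy_slab_used(const struct toy_slab *slab)
{
	int i, used = 0;

	for (i = 0; i < ITEMS_PER_SLAB; i++)
		if (slab->in_use[i])
			used++;
	return (used);
}

/*
 * Model of the behaviour described in the summary: only completely
 * free slabs are processed, so the free items of a partially filled
 * slab never see fini().
 */
static struct toy_fini_stats
toy_allfini(const struct toy_slab *slabs, size_t nslabs)
{
	struct toy_fini_stats st = { 0, 0 };
	size_t s;

	for (s = 0; s < nslabs; s++) {
		int used = toy_slab_used(&slabs[s]);

		if (used == 0)
			st.finalized += ITEMS_PER_SLAB;
		else
			st.skipped_free += ITEMS_PER_SLAB - used;
	}
	return (st);
}
```

With one fully free slab and one slab holding a single leaked item, the model finalizes the four items of the first slab and skips the three free items of the second, which is the "strange" behaviour in the presence of a leak.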

Test Plan

See the test in the diff, in particular the commented-out leak case.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

kib created this revision. Oct 28 2021, 5:46 AM

Herald added a subscriber: imp. Oct 28 2021, 5:46 AM

kib requested review of this revision. Oct 28 2021, 5:46 AM

This is conceptually similar to the recent D32242, I think. In particular, we don't support destruction of a _NOFREE zone at all today (all slabs are leaked), but it could be implemented. The approach in that revision does not require a new flag, but I believe it provides the desired guarantee w.r.t. uk_fini. I'm not sure what you can do about leaked items.

My question then is, what's the motivation for the change? I don't object to something like this, but of course I'd prefer to avoid using _NOFREE entirely.

Would it be simpler to have a flag to skip zone reclaim?

In reply to D32702#738214 (@markj):

mlx5en(4) uses a zone to manage items that reflect hardware resources. Specifically, these are TLS send tags, which are preallocated during driver startup; preallocation creates the corresponding resources in firmware, which requires a sleepable context. Later, when the upper layer passes a request to handle a new TX TLS connection that requires a send tag, the driver allocates one from the zone.

The alternative is to use a custom allocator, but the desire was to get the benefits of UMA caching and not to design something either stupid, like a single list plus mutex, or stupid and too complicated, like a per-CPU cache with rebalancing, which UMA already handles.

But then we need a way to enumerate and destroy all elements in the zone on driver unload; otherwise the firmware will not allow a new resource to be created until a hardware reset. Either a guaranteed call to fini, or some enumerator that calls our callback for each element, is needed.

In reply to D32702#744902 (@kib):

If the send tags are enumerated and allocated at driver initialization time, can't you use a UMA cache zone (i.e. uma_zcache_create()) to provide caching for them? This is how bufs and vm_pages are cached, for example. That is, an array of send tags can be allocated directly by the driver using kmem_malloc() or whatever, and a simple bitmap-based allocator can provide the backend import/release implementations for the UMA cache.
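A minimal userland sketch of the bitmap-based backend suggested here follows. It is not driver code: the toy_tag and toy_tag_pool types and the 64-entry pool size are hypothetical. The callback signatures are modeled on the uma_import and uma_release callback types as I understand them from sys/vm/uma.h, so check them against the tree before relying on them. Locking is left out; a real backend would need to serialize access to the bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NTAGS 64	/* hypothetical pool size: one bitmap bit per tag */

/* Hypothetical stand-in for a driver send tag. */
struct toy_tag {
	int hw_id;	/* placeholder for the firmware resource handle */
};

/* Driver-owned array of tags plus a bitmap of the free ones. */
struct toy_tag_pool {
	struct toy_tag tags[NTAGS];
	uint64_t free_map;	/* bit i set means tags[i] is free */
};

/* Mark every tag free; a driver would create firmware resources here. */
static void
toy_pool_init(struct toy_tag_pool *pool)
{
	int i;

	for (i = 0; i < NTAGS; i++)
		pool->tags[i].hw_id = i;
	pool->free_map = UINT64_MAX;
}

/* Import callback: hand out up to 'count' free tags, return how many. */
static int
toy_tag_import(void *arg, void **store, int count, int domain, int flags)
{
	struct toy_tag_pool *pool = arg;
	int i, n = 0;

	(void)domain;	/* unused in this sketch */
	(void)flags;	/* unused in this sketch */
	for (i = 0; i < NTAGS && n < count; i++) {
		if ((pool->free_map & (UINT64_C(1) << i)) != 0) {
			pool->free_map &= ~(UINT64_C(1) << i);
			store[n++] = &pool->tags[i];
		}
	}
	return (n);
}

/* Release callback: return 'count' tags to the bitmap. */
static void
toy_tag_release(void *arg, void **store, int count)
{
	struct toy_tag_pool *pool = arg;
	int i;

	for (i = 0; i < count; i++) {
		ptrdiff_t idx = (struct toy_tag *)store[i] - pool->tags;

		assert(idx >= 0 && idx < NTAGS);
		/* Catch a double release of the same tag. */
		assert((pool->free_map & (UINT64_C(1) << idx)) == 0);
		pool->free_map |= UINT64_C(1) << idx;
	}
}
```

In a kernel build, functions of this shape would presumably be passed to uma_zcache_create() as the import and release arguments, with the pool as the callback argument. Because the driver owns the tags array, it could walk the array directly at unload time to destroy every tag, independent of what the cache layer holds.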

In reply to D32702#744922 (@markj):

Hm, this can probably be made to work. But it is also a somewhat roundabout way to handle items that are basically just chunks of memory with some additional state. vm_page and even cached buffers are different in this regard.