Page MenuHomeFreeBSD

kern/dev/nvme: Use the per-namespace uma(9) zone for the bio.
Needs ReviewPublic

Authored by seigo.tanimura_gmail.com on May 27 2024, 10:08 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Sep 9, 12:52 AM
Unknown Object (File)
Sat, Sep 7, 4:24 AM
Unknown Object (File)
Wed, Sep 4, 9:13 AM
Unknown Object (File)
Sun, Sep 1, 5:13 AM
Unknown Object (File)
Tue, Aug 20, 7:04 AM
Unknown Object (File)
Aug 8 2024, 4:54 AM
Unknown Object (File)
Aug 7 2024, 1:48 PM
Unknown Object (File)
Aug 7 2024, 10:47 AM
Subscribers

Details

Reviewers
olce
kib
chuck
mav
Summary

The usage is essentially the same as the VM swap pager.

  • New Loader Tunable & Sysctl Entry
  • hw.nvme.nvme_reserved_new_bios The number of the bios reserved and preallocated per namespace. This is writable for hotplug.

Signed-off-by: Seigo Tanimura <seigo.tanimura@gmail.com>

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 57933
Build 54821: arc lint + arc unit

Event Timeline

I'd like to see data that shows this is the hot path. I recently added counters to count the splits.
Ideally, it would use whatever cloning zones already exist and not invent a new one for the nvme drive, imho.
Do you get the same, better or worse performance if you just disable this feature of the nvme drive entirely?

share/man/man4/nda.4
70

This is the wrong man page to document this.

sys/dev/nvme/nvme_ns.c
387

This style is wrong. Don't use NULL != foo or NULL == foo in the kernel. Usee foo != NULL.

Also, if it's cloning, it should be using the zone for the bio preferentially. This is exactly cloning, but with different parameters.

Finally, this seems as good a point as any, does this actually get called? The cloning code was for some really only drives originally, and more modern ones don't seem to have parameters that make it worthwhile anymore.

599

Style: don't use backwards compares.

600

Don't use backwards compares. (here and everywhere)

sys/dev/nvme/nvme_private.h
103

This shouldn't be a global.

sys/dev/nvme/nvme_sysctl.c
44

This shouldn't be a global. This should be a per-ns tunable (or at least per nvme drive since we don't have per-ns sysctls at the moment). with a default value.

55

Again, not global, but per controller.

I am not fully sure about the motivation of this change, but It feels wrong to me to have per-namespace zones. On a big system under heavy load UMA does a lot of work for per-CPU and per-domain caching, and doing it also per-namespace would multiply resource waste. Also last time I touched it, I remember it was difficult for UMA to operate in severely constrained environments, since eviction of per-CPU caches is quite expensive. I don't remember how reservation works in that context, but I suppose that having dozens of small zones with small reservations, but huge per-CPU caches is not a very viable configuration.