Page MenuHomeFreeBSD

hash(9): introduce hashalloc()/hashfree() KPI
Needs ReviewPublic

Authored by glebius on Tue, Mar 17, 7:27 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Apr 8, 4:18 PM
Unknown Object (File)
Wed, Apr 8, 9:55 AM
Unknown Object (File)
Sun, Apr 5, 1:21 PM
Unknown Object (File)
Tue, Mar 31, 4:38 PM
Unknown Object (File)
Thu, Mar 26, 9:56 AM
Unknown Object (File)
Tue, Mar 24, 11:50 PM
Unknown Object (File)
Tue, Mar 24, 11:49 PM
Unknown Object (File)
Tue, Mar 24, 9:48 AM

Details

Reviewers
gallatin
pouria
Group Reviewers
Src Committers
Doc Committers
Summary

This is a more extendable version than traditional hashinit(9). It allows
different kinds of slot headers with optional locks.

Implement traditional hashinit()/hashdestroy() on top of it.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped
Build Status
Buildable 71469
Build 68352: arc lint + arc unit

Event Timeline

Last minute change before sharing of course had a small mistake :)

This is really clever. What about other lock types (rm / rms, sx, etc)?

I'll see if rmlock(9) and sx(9) can be added and update.

Do you need to validate the enum value is know?

Can the lock and list types be automatically detected with _Generic somehow?

In D55904#1279057, @imp wrote:

Do you need to validate the enum value is know?

Do you mean if somebody passes a numeric instead of one of enum values? I think our compiler flags will forbid that.

Can the lock and list types be automatically detected with _Generic somehow?

I don't see a way to use _Generic facing outside of the KPI. It might be possible to make some use of it internally, I'll look into that.

Add sx(9) and rmlock(9) support.

In D55904#1279057, @imp wrote:

Do you need to validate the enum value is know?

Do you mean if somebody passes a numeric instead of one of enum values? I think our compiler flags will forbid that.

I'm thinking default: would make sure.

Can the lock and list types be automatically detected with _Generic somehow?

I don't see a way to use _Generic facing outside of the KPI. It might be possible to make some use of it internally, I'll look into that.

Sure.

Some additions to the manual page.

This revision is now accepted and ready to land.Wed, Mar 18, 5:28 PM
sys/kern/subr_hash.c
160–162

Why <= then >>=?
why not just <?

share/man/man9/hashinit.9
47–52

Are we going to remove hashinit_flags and hashdestroy as well?
We just updated them to use the new KPI, and we inform users that we plan to obsolete them at the same time.
If that's not the case, IMHO, we should include them in the new manual.
Otherwise, we should put a deprecation warning above their functions.

sys/kern/subr_hash.c
180

style?

284

style?

share/man/man9/hashalloc.9
2–3

SPDX tag?

jhb added inline comments.
share/man/man9/hashalloc.9
59

Usually contiguous means "physically contiguous" in the kernel (e.g. contigmalloc). malloc() always returns virtually contiguous memory, so if you are only talking about virtual contiguous here, I would remove the word to avoid confusion.

sys/kern/subr_hash.c
180

If you make slot and men of type char * instead of void * you can avoid all the char * casts. char * is the pointer spelling of uintptr_t as it were.

272

Can args be const here? That would clarify that hashfree can't return an error.

sys/kern/subr_hash.c
160–162

This follows exactly logic of hashinit(9):

The hashinit() function allocates hash tables that are sized to largest power of two less than or equal to argument nelements.

And new API offers the same wording. With suggested code we would create a power of two that is greater or equal.

From the hashinit(9) it is not actually clear what they meant with nelements - hash slots or objects stored in the hash? Most modern code treats as the former. But it could be that original logic was that nelements is expected number of objects. In that case logic makes sense, as we don't want guaranteed underfilled hashes. For example, nelements = 5 shall yield in size 4 hash not in size 8.

180

Pouria, I strongly disagreed with the part of style(9) that suggest to put all variable declarations to the top of the function. And if I remember correct this demand from the style.7 was finally removed.

I find a code much easier to read when variables are limited in their visibility scope. This also sends a hint to any future editor of the code, that the variable is supposed to have a limited visibility for a reason.

180

John, this can be done for hashalloc() easily and for hashfree() we would need an extra local variable to recast the argument from void *. I can do it if you insist.

But, IMHO, if I give you three code snippets without any extra context, and you need to guess the types and guess what the coder wanted to achieve:

header = header + i * hdrsize;
slot = (char *)mem + i * hdrsize;
slot = mem + i * hdrsize;

, then in the first two cases the result will be obvious to you and in the very last case it will be not. What I'm trying to tell is that (char *)s here are improving readability :) But if you insist, I can do that.

272

Right now hashfree() may change args->hdrsize, as it calls hashalloc_sizes(). With extra coding this can be avoided, of course.

I do not see a value to put const into API here. The structure is supposed to be opaque to the caller and is not supposed to be used with anything else except this API. Not to be shares with anything else. What value would constness give?

LGTM
Phabricator won't let me accept behalf of myself.

I've had a quick look at making pf use this, and I have a minor annoyance.

Pf allocates a state table per vnet, and might have different hash table sizes in different vnets. All other settings (name, lock type, ..) are the same, but the size may be different, so that leads to this:

diff --git a/sys/net/pfvar.h b/sys/net/pfvar.h
index 87ed701f66a7..0c061ac22c37 100644
--- a/sys/net/pfvar.h
+++ b/sys/net/pfvar.h
@@ -40,6 +40,7 @@
 #include <sys/cpuset.h>
 #include <sys/epoch.h>
 #include <sys/malloc.h>
+#include <sys/hash.h>
 #include <sys/nv.h>
 #include <sys/refcount.h>
 #include <sys/sdt.h>
@@ -2632,6 +2633,14 @@ struct pf_keyhash {
        struct mtx                      lock;
 };

+static const struct hashalloc_args PF_KEYHASH_ARGS = {
+       .mflags  = M_NOWAIT | M_ZERO,
+       .mtype   = M_PFHASH,
+       .head    = HASH_HEAD_LIST,
+       .lock    = HASH_LOCK_MTX,
+       .lname   = "bucket of pf keys",
+};
 struct pf_idhash {
        LIST_HEAD(, pf_kstate)          states;
        struct mtx                      lock;
diff --git a/sys/netpfil/pf/pf.c b/sys/netpfil/pf/pf.c
index 90342f045763..4ee6ba3140cd 100644
--- a/sys/netpfil/pf/pf.c
+++ b/sys/netpfil/pf/pf.c
@@ -1455,20 +1455,22 @@ pf_initialize(void)
            sizeof(struct pf_state_key), pf_state_key_ctor, NULL, NULL, NULL,
            UMA_ALIGN_PTR, 0);

-       V_pf_keyhash = mallocarray(V_pf_hashsize, sizeof(struct pf_keyhash),
-           M_PFHASH, M_NOWAIT | M_ZERO);
+       struct hashalloc_args keyhash_args = PF_KEYHASH_ARGS;
+       keyhash_args.size = V_pf_hashsize;
+       V_pf_keyhash = hashalloc(&keyhash_args);
        V_pf_idhash = mallocarray(V_pf_hashsize, sizeof(struct pf_idhash),
            M_PFHASH, M_NOWAIT | M_ZERO);
        if (V_pf_keyhash == NULL || V_pf_idhash == NULL) {
                printf("pf: Unable to allocate memory for "
                    "state_hashsize %lu.\n", V_pf_hashsize);

-               free(V_pf_keyhash, M_PFHASH);
+               hashfree(V_pf_keyhash, &keyhash_args);
                free(V_pf_idhash, M_PFHASH);

It'd be cleaner if the size was a separate argument, then we could use V_pf_hashsize directly, and not bother with copies of the const PF_KEYHASH_ARGS.
I wonder if that's a common pattern, or just a pf thing.
I suppose I could have wrapper functions to deal with this for me, because it's going to recur in idhash and udpendpointhash.

Thanks for feedback, Kristof! I don't think this is a common pattern. I think the cleanest way for pf would be to a wrapper function that has hashalloc_args on its stack and has size argument. It shall also return the allocated size. This will not spend data on bss for hashalloc_args. Lines of code wise it should be just 4 lines extra: 3 for the function header one for }.

Also in case of pf the wrapper can be used to allocate/init different hash tables, if it also takes an argument for the lock name.

Looking into that for a second time, I do not see a problem at all. When you enter pf_initialize() the V_pf_hashsize is already set. So just put the hashalloc_args on the stack of pf_initialize() and that's it.

  • make hashfree(9) like free(9) - digest NULL
  • use size_t for size, to allow enormous hash tables
  • augment lock flags with XXX_NEW when user didn't ask for M_ZERO
This revision now requires review to proceed.Fri, Mar 27, 7:18 PM

Ok, this is how it looks like for pf(4): https://reviews.freebsd.org/D56113 Running tests now, good so far.

zlei added inline comments.
sys/kern/subr_hash.c
52

The hashmask or table size is the nature of a hash table. It requires the users to provide the storage of hashmask.

I like this one,

/* Fixed sized hash table */
struct hashtable {
	long hashsize, /* hashmask ? */
	struct {
		LIST_HEAD(generic, generic) head,
		struct mtx          lock,
	} buckets[],
};

void
hashalloc(int elements, struct malloc_type *type, int flags)
{
...
	struct hashtable *tbl = malloc( ... );
	
	tlb->hashsize = hashsize;
	return (tlb);
}

void
hash_init(struct hashtable *tbl)
{

	long hashsize = tlb->hashsize;
	for (i = 0; i < hashsize; i++) {
		LIST_INIT();
		mtx_init();
	}
}

void
hash_destroy(struct hashtable *tbl)
{

	long hashsize = tlb->hashsize;
	for (i = 0; i < hashsize; i++) {
		/* ASSERT LIST_EMPTY */
		mtx_destroy();
	}
}

The size of CK_LIST_HEAD is same with LIST_HEAD. But this design does not meet other usage, say SLIST_HEAD.

For the lock, the a mutex should be sufficient, given it is to protect the bucket only. I'm not sure if other lock type ( say sx ) is required.

sys/kern/subr_hash.c
52

All in tree consumers store the size(or mask) separately from the table. There are consumers that create several hash tables of the same size.

If some module (e.g. your module) really wants to join the array of headers and size in one struct, the KPI doesn't forbid that. You can declare the structure in your module and hashalloc() only the table, and store the size.

52

The size of CK_LIST_HEAD is same with LIST_HEAD. But this design does not meet other usage, say SLIST_HEAD.

Sorry, can't understand this comment. The design supports SLIST_HEAD.

For the lock, the a mutex should be sufficient, given it is to protect the bucket only. I'm not sure if other lock type ( say sx ) is required.

I already see use for rwlocks here. The other locks is what @gallatin suggested. I don't see any problem with adding this support.