Page MenuHomeFreeBSD

WIP: inpcb: Split PCB hash tables
ClosedPublic

Authored by markj on Feb 13 2023, 11:16 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 26, 4:17 AM
Unknown Object (File)
Tue, Nov 26, 4:16 AM
Unknown Object (File)
Tue, Nov 26, 4:16 AM
Unknown Object (File)
Tue, Nov 26, 3:49 AM
Unknown Object (File)
Nov 4 2024, 1:22 PM
Unknown Object (File)
Oct 2 2024, 12:50 PM
Unknown Object (File)
Oct 2 2024, 2:38 AM
Unknown Object (File)
Sep 23 2024, 7:18 AM
Subscribers

Details

Summary

Currently we use a single hash table per PCB database for connected and
bound PCBs. Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worse case.

Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb. Now the database has one table each
for connected and unconnected sockets. The patch currently refers to
them as "exact" and "wild" but these names aren't very good; it should
probably be just "conn" and "unconn"?

When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use. This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away. There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.

I also made the "rehash" parameter of in(6)_pcbconnect() unused. This
parameter seems confusing and it is simpler to let the in_pcb code
figure out what to do.

There are two wrinkles in this patch, hence the "WIP":
1a. The raw_ip code (ab)uses these hash tables as well, for now I just

hard-coded the "exact" table.

1b. The inpcb iterator functions can be used to search a hash bucket,

but this functionality is currently only used by raw_ip.
  1. UDP allows a socket to be connected and disconnected multiple times.

For (1) the plan is to modify raw_ip to avoid using these hash tables.
For now the patch works by just using "exact" everywhere, which works
for existing consumers.

For (2), the patch plugs a hole in the inpcb structure and uses it to
store an SMR sequence number. When an inpcb is disconnected, the global
write sequence number is advanced, and in order to re-connect, the
caller must wait for all readers to drain before reusing the hash chain
linkage pointers. I suspect that this is acceptable, but I'm not sure
how common it is to re-use UDP sockets for multiple connections.

Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

sys/netinet/in_pcb.c
517

I think we should allocate order(s) of magnitude less elements for the wild hash than for the normal one.

2545–2556

Just matter of personal preference, but I'd reduce this down to ^^

This revision was not accepted when it landed; it landed in state Needs Review.Apr 20 2023, 4:14 PM
This revision was automatically updated to reflect the committed changes.