Page MenuHomeFreeBSD

pfsync: hold b_mtx for callout_stop(pd_tmo)
ClosedPublic

Authored by kp on Mar 23 2023, 1:33 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, May 4, 5:28 PM
Unknown Object (File)
Sat, May 4, 5:28 PM
Unknown Object (File)
Sat, May 4, 5:28 PM
Unknown Object (File)
Sat, May 4, 5:28 PM
Unknown Object (File)
Sat, May 4, 5:28 PM
Unknown Object (File)
Sat, May 4, 5:27 PM
Unknown Object (File)
Sat, May 4, 5:16 PM
Unknown Object (File)
Jan 14 2024, 7:34 AM

Details

Summary

The pd_tmo callout has an associated mutex, which we must hold while
calling callout_stop().

Reported by: markj
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kp requested review of this revision.Mar 23 2023, 1:33 AM

Remove unrelated changes.

sys/netpfil/pf/if_pfsync.c
418–419

Are you only taking the lock for the sake of callout_stop()? That is, it's ok to do the tailq remove without the mutex held? It's not immediately obvious since b_deferred is decremented with the lock held.

427

Perhaps assert that pd->pd_refs == 0 before freeing?

sys/netpfil/pf/if_pfsync.c
427

That turns out to not quite work. pd_refs isn't really a reference count, but an indication to the timeout function that we want to do the free.

That's a bit silly, so I'm posting D39248 to remove it entirely (and restructure the timeout function a bit).

Protect tailq operation with the bucket lock.

sys/netpfil/pf/if_pfsync.c
417

I think loading TAILQ_FIRST(&b->b_deferrals) also needs to be done under the mutex.

419

Suppose ret == 0, i.e., the callout is about to be executed but is currently waiting for the bucket lock. Once the bucket lock is released, pfsync_defer_tmo() will execute. It will also remove pd from the deferrals list, so we end up removing pd twice, which will presumably result in a panic.

Either we should avoid modifying the list here when callout_stop() returns 0 (can it return -1? I think not?), or the callout handler needs to detect that pfsync_clone_destroy() has already removed the deferral structure from the list.

  • rework locking
  • use pfsync_undefer directly rather than doing basically the same thing with duplicated code
  • improved assertions (assert we've cleaned up all deferred packets)
This revision is now accepted and ready to land.Mar 27 2023, 9:49 PM
This revision was automatically updated to reflect the committed changes.