Page MenuHomeFreeBSD

improve inp locking in getsockopt, setsockopt, and related functions
AbandonedPublic

Authored by jason_eggnet.com on Oct 4 2017, 2:03 AM.

Details

Reviewers
rwatson
Group Reviewers
network
transport
Summary
  • Specifically during operations on the inp at the level of IPPROTO_IP and IPPROTO_IPV6.
  • Consistently handle locking in tcp_ipsec_pcbctl(), as well as use rlocks for reads, and wlocks for writes.
  • Lock in6p in ip6_getpcbopt().
  • Improve lock handling in ipsec_control_pcbpolicy()

Diff Detail

Lint
Lint OK
Unit
No Unit Test Coverage
Build Status
Buildable 13417
Build 13647: arc lint + arc unit

Event Timeline

rename OPTSET2292_REQUIRED to OPTSET2282_EXCLUSIVE

jason_eggnet.com edited the summary of this revision. (Show Details)Oct 4 2017, 3:59 AM

This issue was discovered by rare memory corruption when calling ip6_optlen(). The likely culprit was unlocked manipulation of in6p->in6p_outputopts.

We use DSCP / Traffic Class, which manipulates in6p_outputopts without locking in6p. That lead to a fuller audit of locking the inp / in6p at the IPPROTO_IP / IPPROTO_IPV6 levels.

setsockopt(s, IPPROTO_IPV6, IPV6_TCLASS, ...);

I have performed some testing with this code with WITNESS / INVARIANTS / INVARIANT_SUPPORT. But wanted to get feedback from the community before wider testing, as these are rather extensive locking changes.

ae added a comment.Oct 4 2017, 7:06 AM

Probably you need to modify netipsec/ipsec_pcb.c too.

sys/netinet/ip_output.c
1249

Is it necessary to acquire wlock for getsockopt?

jch added a subscriber: jch.Oct 4 2017, 7:21 AM
In D12586#261038, @ae wrote:

Probably you need to modify netipsec/ipsec_pcb.c too.

I'll take a look.

sys/netinet/ip_output.c
1249

For some but not all. I'll take a second pass at getsockopt.

  • Consistently handle locking in tcp_ipsec_pcbctl(), as well as use rlocks for reads, and wlocks for writes.
  • Lock in6p in ip6_getpcbopt().
  • Improve lock handling in ipsec_control_pcbpolicy().
jason_eggnet.com retitled this revision from ensure inp is locked during setsockopt and getsockopt to improve inp locking in getsockopt, setsockopt, and related functions.Oct 5 2017, 10:43 PM
jason_eggnet.com edited the summary of this revision. (Show Details)
swills added a subscriber: swills.Oct 6 2017, 2:01 PM
ae added inline comments.Oct 10 2017, 10:11 AM
sys/netinet6/ip6_output.c
2081

What is the purpose of this bzero?

2421–2423

I think it is not safe enough. Some of options use on-stack variable copies, but other do not. And it is possible to do dereference of already freed memory after INP_RUNLOCK().

sys/netipsec/ipsec_pcb.c
454

It seems you mixed INP_WUNLOCK() here and in some places (not all) in ipsec_set_pcbpolicy. Probably it will be better to have one wunlock here, than many in ipsec_set_pcbpolicy.

sys/netipsec/xform_tcp.c
106 ↗(On Diff #33731)

This block seems not needed. tp = intotcpcb(inp) can be moved below in the INP_WLOCK_RECHECK section.

124 ↗(On Diff #33731)

You changed the locking here, but tcp_usrreq.c is not changed and it expects that INP_WLOCK is acquired.

sys/netinet6/ip6_output.c
2081

Absolutely pointless, I will remove it.

2421–2423

Will fix this.

sys/netipsec/ipsec_pcb.c
454

Agreed, correcting.

sys/netipsec/xform_tcp.c
106 ↗(On Diff #33731)

Agreed, fixing.

124 ↗(On Diff #33731)

I missed TCPMD5_PCBCTL

I can match IPSEC_PCBCTL to the behavior of TCPMD5_PCBCTL, for the callers and the function, with the aim of not altering tcp_usrreq.c, which I think can be done.

TCPMD5_PCBCTL
on set, expects lock already acquired
on set error, expects lock to be released
on set no error, expects lock to be acquired

TCPMD5_PCBCTL
on get, expects lock already acquired
on get error or no error, expects lock to be released

This patch's behavior for IPSEC_PCBCTL:

IPSEC_PCBCTL
on set, expects lock not acquired
on set error or no error, expects lock to be released

IPSEC_PCBCTL
on get, expects lock not acquired
on get error or no error, expects lock to be released

gnn added a reviewer: rwatson.Oct 19 2017, 4:05 PM
rwatson added a subscriber: jhb.Oct 20 2017, 9:30 AM

Overall, correcting missing locking around non-atomic sequences of option changes -- e.g., the read-modify-write sequences relating to IPSEC -- seems an excellent idea. I'm not convinced, however, that additional locking is required around many of the single-field in-place updates (e.g., of the TTL) on x86 (with TSO), and from our conversation in the transport-protocol call last night, it sounds like you are not encountering problems with them. However, it may make sense to make the change regardless, in order to improve maintainability -- i.e., in case those structure fields see other lock-protected updates than are currently in the source tree, which could themselves suffer races. And it seems unlikely that any are in fast paths, nor will experience substantial contention. @jhb will be in Cambridge next week and I'll chat with him about the atomic write aspect (e.g., in particular, while the stores to fields such as the TTL will be atomic on all architectures, guarantees about ordering and progress will be weaker on some, such as POWER and ARMv8), to get his take.

So, overall, I'm supportive, but I would be tempted to break this out into two commits: (1) addressing definite bugs involving non-atomic read-modify write or compound write sequences that are known to cause problems (or definitely cause problems), and (2) those that are more discretionary changes that are unlikely to resolve bugs but may improve future maintainability.

update outstanding issues

I believe this diff covers all outstanding issues, it's a fairly
significant diff.

It looks like there might be cases in ip_ctloutput, ip6_ctloutput, rip_ctloutput, and ip6_raw_ctloutput where the inp/in6p being null is ok.

Another patch incoming to address that.

allow for inp / in6p being null in certain cases within ip_ctl functions

It looks like a socket is guaranteed to have a single defined inp that is valid (specifically, not free'd) from the time socket or accept is called all the way to the time that close is called.

Due to that, I'm going to unwind the in_pcbref usage which will greatly reduce this patch.

remove in_pcbref in ctl functions

Missed INP_WUNLOCK() in ipv6 setsockopt() processing

Fix memory leak in ip_ctloutput; add documentation to pcbctl function.

shurd added a subscriber: shurd.Dec 7 2017, 8:57 PM

I've been staring at this for a bit and think we may be able to use mbuf_tags(9) instead of malloc/free

jason_eggnet.com abandoned this revision.Mar 8 2018, 4:38 PM

I am abandoning this in favor of 6 smaller reviews, per @rwatson's general advice to break up the patch.

https://reviews.freebsd.org/D14619 - refactor ip6_getpcbopt() for better locking and memory management
https://reviews.freebsd.org/D14620 - improve write locking in ip6_ctloutput() with macros
https://reviews.freebsd.org/D14621 - fix memory management for fetching options in ip_ctloutput()
https://reviews.freebsd.org/D14622 - handle locking and memory safety for IPV6_PATHMTU in ip6_ctloutput()
https://reviews.freebsd.org/D14623 - Fix locking for IPSEC_PCBCTL and related functions.
https://reviews.freebsd.org/D14624 - simple locking fixes in ip_ctloutput, ip6_ctloutput, rip_ctloutput