Page MenuHomeFreeBSD

zlei (Zhenlei Huang)
User

Projects

User Details

User Since
Apr 1 2021, 3:21 AM (210 w, 15 h)

Recent Activity

Today

zlei updated the diff for D49357: netinet6: Remove set but not used global variable in6_maxmtu.

Removed local var change.

Thu, Apr 10, 3:23 AM
zlei added inline comments to D49357: netinet6: Remove set but not used global variable in6_maxmtu.
Thu, Apr 10, 3:20 AM
zlei added inline comments to D49357: netinet6: Remove set but not used global variable in6_maxmtu.
Thu, Apr 10, 3:00 AM

Tue, Mar 25

zlei added inline comments to D49486: tools/sysdoc: Chase sysctl rename.
Tue, Mar 25, 8:21 AM

Sun, Mar 23

zlei added a comment to D49447: WIP ifnet: Introduce and use ifnet_byindex_attached() for outpath.

@glebius

So instead of a lockless check for IFF_DYING

I think we eventually do not need IFF_DYING.

Sun, Mar 23, 6:07 AM
zlei added a comment to D49447: WIP ifnet: Introduce and use ifnet_byindex_attached() for outpath.

So instead of a lockless check for IFF_DYING you use lockless check for presence in STAILQ. I don't see a principal change here, but it could make the races less probable.

Sun, Mar 23, 6:01 AM

Fri, Mar 21

zlei added inline comments to D49444: ifnet: Defer detaching address family dependent data.
Fri, Mar 21, 5:44 PM
zlei retitled D49447: WIP ifnet: Introduce and use ifnet_byindex_attached() for outpath from ifnet: Introduce and use ifnet_byindex_attached() for outpath to WIP ifnet: Introduce and use ifnet_byindex_attached() for outpath.
Fri, Mar 21, 5:27 PM
zlei added a comment to D49447: WIP ifnet: Introduce and use ifnet_byindex_attached() for outpath.

This is still WIP, but I want it open for discussing. ifnet_byindex() is renamed to ifnet_byindex_ori() only to show clearly all its usage. Will not be in the final revision.

Fri, Mar 21, 5:26 PM
zlei requested review of D49447: WIP ifnet: Introduce and use ifnet_byindex_attached() for outpath.
Fri, Mar 21, 5:25 PM
zlei added a comment to D49359: ifnet: Fix the teardown process of an interface.

Well I though that it is the best result that the system panics when hitting this bug, but I was wrong.

The test shows it is even possible to write freed memory. That is,
thread A,

			(*dp->dom_ifdetach)(ifp,
			    ifp->if_afdata[dp->dom_family]);
			ifp->if_afdata[dp->dom_family] = NULL;

but thread B see stall reference, i.e. ifp->if_afdata[dp->dom_family] != NULL.

Fri, Mar 21, 3:14 PM
zlei requested review of D49444: ifnet: Defer detaching address family dependent data.
Fri, Mar 21, 3:12 PM
zlei updated subscribers of D49359: ifnet: Fix the teardown process of an interface.

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

I tested the patch for days, it appears quite stable as far as now ( surely, for the input path only ).

Can you share the steps to repeat the Github issue 207 ? I do not understand the original reporter's steps.

I assume you are not testing with netgraph/ppppoe device? The condition doesn't appear to trigger otherwise.

Fri, Mar 21, 1:59 PM
zlei added a comment to D49359: ifnet: Fix the teardown process of an interface.

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

Fri, Mar 21, 10:31 AM
zlei added a comment to D49434: ifnet: Refactor out the detaching of address family dependent data to dedicated function.

This may conflict with your local work ~

Fri, Mar 21, 10:28 AM
zlei requested review of D49434: ifnet: Refactor out the detaching of address family dependent data to dedicated function.
Fri, Mar 21, 10:28 AM

Thu, Mar 20

zlei added a comment to D49412: ifnet: Remove a redundant check for flag IFF_DYING from ifunit_ref().
In D49412#1126775, @mjg wrote:

Looking at if_unlink_ifnet:

IFNET_WLOCK();
CK_STAILQ_FOREACH(iter, &V_ifnet, if_link)
        if (iter == ifp) {
                CK_STAILQ_REMOVE(&V_ifnet, ifp, ifnet, if_link);
                if (!vmove)
                        ifp->if_flags |= IFF_DYING;
                found = 1;
                break;
        }

it very much *is* possible to race it and spot the flag.

cpu0                                                                                            cpu1
CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) {
       /* ifp is now loaded */
                                                                                                     CK_STAILQ_REMOVE(&V_ifnet, ifp, ifnet, if_link);
                                                                                                     ifp->if_flags |= IFF_DYING;
/* the flag *can* be visible by now */
if (strncmp(name, ifp->if_xname, IFNAMSIZ) == 0 &&
!(ifp->if_flags & IFF_DYING))

If anything the bug is that the flag can show up immediately after it was tested for not being there. It may be this happens to be harmless.

However, the claim that flag is impossible to spot here does not hold.

Thu, Mar 20, 7:14 AM

Wed, Mar 19

zlei added a comment to D49359: ifnet: Fix the teardown process of an interface.

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

Is that based on stable/14 ?

Correct. Am I missing patches for this to make more sense?

Wed, Mar 19, 11:50 AM
zlei requested review of D49412: ifnet: Remove a redundant check for flag IFF_DYING from ifunit_ref().
Wed, Mar 19, 3:01 AM
zlei updated subscribers of D15865: Provide process space virtualisation functionality for jails..

From the Jails Production User call: This work is still of interest, particularly in the context of OCI jail progress.
DCH: "This needs serious rebasing." Do any developers have interest in this feature?

Wed, Mar 19, 12:42 AM

Tue, Mar 18

zlei added a comment to D49359: ifnet: Fix the teardown process of an interface.

For 285129 this still crashes in the same place: https://github.com/opnsense/src/issues/207#issuecomment-2733080313

Tue, Mar 18, 2:14 PM
zlei added a comment to D49359: ifnet: Fix the teardown process of an interface.

Well I though that it is the best result that the system panics when hitting this bug, but I was wrong.

Tue, Mar 18, 10:01 AM
zlei added a comment to D49356: ifnet: Integrate if_unroute() into if_down().

Thanks!

I have only light concerns with aggressive MFC plan on refactoring changes. Sometimes bugs sneak into refactorings that seems to be innocent.

Yeah, I think I have checked that carefully. For this case, I checked the usage of if_unroute() from -current to stable/8.

Tue, Mar 18, 9:53 AM
zlei added a comment to D49359: ifnet: Fix the teardown process of an interface.

Why exactly does this fix the problem? As I understand, the specific bug is (probably) that NET_EPOCH_WAIT() is used too early.

Yes. This is only part of the fix. Still working on it yet. Found other paths to crash to kernel.

Tue, Mar 18, 2:33 AM
zlei added a comment to D49359: ifnet: Fix the teardown process of an interface.

Do you have a stress test suite you use for testing these changes?

Tue, Mar 18, 2:28 AM
zlei accepted D49348: netinet: Fix getcred sysctl handlers to do nothing if no input is given.

Looks good to me.

Tue, Mar 18, 2:22 AM

Fri, Mar 14

zlei added a comment to D49348: netinet: Fix getcred sysctl handlers to do nothing if no input is given.

It's probably ok to return EINVAL. I was a bit worried that doing so would result in errors from sysctl -a (which is how I found these bugs in the first place), but I think it's ok. Will update.

Fri, Mar 14, 12:14 PM
zlei added a comment to D49358: ifnet: Initialize address family dependent data region earlier.

This depends on D49357 .

Fri, Mar 14, 12:06 PM
zlei added a comment to D49359: ifnet: Fix the teardown process of an interface.

This is only the partial fix. I'm cleaning up the NET_EPOCH_ENTER / NET_EPOCH_EXIT and trying to find out all possible the races.

Fri, Mar 14, 12:05 PM
zlei requested review of D49359: ifnet: Fix the teardown process of an interface.
Fri, Mar 14, 12:01 PM
zlei requested review of D49358: ifnet: Initialize address family dependent data region earlier.
Fri, Mar 14, 11:56 AM
zlei requested review of D49357: netinet6: Remove set but not used global variable in6_maxmtu.
Fri, Mar 14, 11:53 AM
zlei requested review of D49356: ifnet: Integrate if_unroute() into if_down().
Fri, Mar 14, 11:48 AM
zlei added a comment to D49348: netinet: Fix getcred sysctl handlers to do nothing if no input is given.

Those sysctl handlers copy in a pair of sockaddr_in[6] ( pointed by req->newptr) and copy out the xucred, so what's the point of a NULL req->newptr ? Maybe an error EINVAL is better than 0 ?

Fri, Mar 14, 4:59 AM

Thu, Mar 13

zlei committed rGb755058a24c3: vnet: Use static initializers (authored by zlei).
vnet: Use static initializers
Thu, Mar 13, 10:15 AM
zlei committed rG140c473fb905: tcp_ratelimit: Use static initializers (authored by zlei).
tcp_ratelimit: Use static initializers
Thu, Mar 13, 10:15 AM
zlei committed rGe461ba1fa34a: lagg: Use static initializers (authored by zlei).
lagg: Use static initializers
Thu, Mar 13, 10:15 AM
zlei committed rG2ed4c9d94dc6: if_clone: Use static initializers (authored by zlei).
if_clone: Use static initializers
Thu, Mar 13, 10:15 AM
zlei committed rG0e03922b4501: if_bridge: Use static initializers (authored by zlei).
if_bridge: Use static initializers
Thu, Mar 13, 10:15 AM
zlei added a reverting change for rGd74b7baeb0d4: ifnet_byindex() actually requires network epoch: rGaf1bd15d3be3: udp: Do not recursively enter net epoch.
Thu, Mar 13, 10:15 AM
zlei committed rG7fa12e220c7a: bridgestp: Use static initializers (authored by zlei).
bridgestp: Use static initializers
Thu, Mar 13, 10:15 AM
zlei committed rGdbb5216e03cd: carp: Use static initializers (authored by zlei).
carp: Use static initializers
Thu, Mar 13, 10:15 AM
zlei committed rGe72e35500eb2: bpf: Use static initializers (authored by zlei).
bpf: Use static initializers
Thu, Mar 13, 10:15 AM
zlei committed rGaf1bd15d3be3: udp: Do not recursively enter net epoch (authored by zlei).
udp: Do not recursively enter net epoch
Thu, Mar 13, 10:14 AM
zlei added a reverting change for D33263: ifnet_byindex() actually requires network epoch: rGaf1bd15d3be3: udp: Do not recursively enter net epoch.
Thu, Mar 13, 10:14 AM
zlei accepted D49339: netinet6: Do not forward or send ICMPv6 messages to the unspec address.

Looks good to me.

Thu, Mar 13, 9:59 AM

Mar 10 2025

zlei committed rG6771ec1d6a50: netlink: Fix getting route scope of interface's IPv4 addresses (authored by zlei).
netlink: Fix getting route scope of interface's IPv4 addresses
Mar 10 2025, 10:25 AM
zlei committed rG9f26a03fdbdc: tests/netlink: Assert the route scope of interface's addresses (authored by zlei).
tests/netlink: Assert the route scope of interface's addresses
Mar 10 2025, 10:25 AM
zlei committed rG057165012b4d: netlink: Fix getting route scope of interface's IPv4 addresses (authored by zlei).
netlink: Fix getting route scope of interface's IPv4 addresses
Mar 10 2025, 10:14 AM
zlei committed rGe84a17db1cc0: tests/netlink: Assert the route scope of interface's addresses (authored by zlei).
tests/netlink: Assert the route scope of interface's addresses
Mar 10 2025, 10:14 AM

Mar 7 2025

zlei committed rG0e096bb3fcaa: netlink: Fix getting route scope of interface's IPv4 addresses (authored by zlei).
netlink: Fix getting route scope of interface's IPv4 addresses
Mar 7 2025, 4:15 AM
zlei committed rG5d8b48487acc: tests/netlink: Assert the route scope of interface's addresses (authored by zlei).
tests/netlink: Assert the route scope of interface's addresses
Mar 7 2025, 4:15 AM
zlei closed D49226: netlink: Fix getting route scope of interface's IPv4 addresses.
Mar 7 2025, 4:15 AM
zlei committed rG70831490663b: netinet: Make in_canforward() return bool (authored by zlei).
netinet: Make in_canforward() return bool
Mar 7 2025, 4:04 AM
zlei committed rG9406d7e32da7: netinet: Do not forward or ICMP response to INADDR_ANY (authored by zlei).
netinet: Do not forward or ICMP response to INADDR_ANY
Mar 7 2025, 4:04 AM
zlei committed rG841c38b5a028: netinet: Make in_ifhasaddr() return bool (authored by zlei).
netinet: Make in_ifhasaddr() return bool
Mar 7 2025, 4:04 AM
zlei committed rG1b2715c0581e: netinet: Make in_localaddr() return bool (authored by zlei).
netinet: Make in_localaddr() return bool
Mar 7 2025, 4:04 AM

Mar 6 2025

zlei added a comment to D49226: netlink: Fix getting route scope of interface's IPv4 addresses.

Zhenlei, please go forward with Alexander's review, don't wait on me. Thanks!

Mar 6 2025, 11:26 PM
zlei added a comment to D49212: route: protect against unattached if_afdata.

ifp dump:

In if_flags I see that IFF_DYING is set, which suggests that my theory explains the crash. I think this should be reproducible with a test setup that 1) creates and tears down a PPPoE interface in a loop, 2) tries to send traffic over some route assigned to the interface. Would you be willing to try that?

As for a solution, we can call NET_EPOCH_WAIT() after purging routes in if_detach_internal(), though that's something of a band-aid. The general pattern in if_detach_internal() should be to first remove the ifnet from globally visible data structures, then block and drain readers, then tear down the structure.

Does NET_EPOCH_WAIT() and NET_EPOCH_ENTER() establish a happens-before order ?

I had a look at the implementation of epoch_wait_preempt() and epoch_enter_preempt().

Mar 6 2025, 10:12 AM
zlei committed rG23f453ae34c2: vnet: Use static initializers (authored by zlei).
vnet: Use static initializers
Mar 6 2025, 4:53 AM
zlei committed rG09de37310313: tcp_ratelimit: Use static initializers (authored by zlei).
tcp_ratelimit: Use static initializers
Mar 6 2025, 4:53 AM
zlei committed rG36ebdd0155d0: lagg: Use static initializers (authored by zlei).
lagg: Use static initializers
Mar 6 2025, 4:53 AM
zlei committed rG1ba655149ed0: if_clone: Use static initializers (authored by zlei).
if_clone: Use static initializers
Mar 6 2025, 4:53 AM
zlei committed rG59dbc829cf07: if_bridge: Use static initializers (authored by zlei).
if_bridge: Use static initializers
Mar 6 2025, 4:53 AM
zlei committed rGfd0020f81a60: bridgestp: Use static initializers (authored by zlei).
bridgestp: Use static initializers
Mar 6 2025, 4:53 AM
zlei committed rGc7f8ffc70afa: bpf: Use static initializers (authored by zlei).
bpf: Use static initializers
Mar 6 2025, 4:53 AM
zlei committed rGb7d5bda6f109: carp: Use static initializers (authored by zlei).
carp: Use static initializers
Mar 6 2025, 4:53 AM
zlei updated subscribers of D49212: route: protect against unattached if_afdata.

Previous work D45690 by @takahiro.kurosawa_gmail.com . That is IMHO brute force and may relief the issue, but the net stack will have to check the unstable state of ifp a lot.

Mar 6 2025, 1:34 AM
zlei added a comment to D49212: route: protect against unattached if_afdata.

I think the root cause is, thread A NET_EPOCH_WAIT() does not block thread B's NET_EPOCH_ENTER(). Well the net epoch does protect the liveness of ifp, but it does not guard ifp's state from been changed. For example the two threads execute in the following sequence,

Mar 6 2025, 1:19 AM
zlei added a comment to D49212: route: protect against unattached if_afdata.

ifp dump:

In if_flags I see that IFF_DYING is set, which suggests that my theory explains the crash. I think this should be reproducible with a test setup that 1) creates and tears down a PPPoE interface in a loop, 2) tries to send traffic over some route assigned to the interface. Would you be willing to try that?

As for a solution, we can call NET_EPOCH_WAIT() after purging routes in if_detach_internal(), though that's something of a band-aid. The general pattern in if_detach_internal() should be to first remove the ifnet from globally visible data structures, then block and drain readers, then tear down the structure.

Mar 6 2025, 1:02 AM

Mar 5 2025

zlei added a comment to D49227: udp: Do not recursively enter net epoch.

Didn't do a full examination of all stacks, the change by itself should be correct. If any code path triggers the new assertion, that path needs to be fixed.

Mar 5 2025, 5:13 AM
zlei closed D49227: udp: Do not recursively enter net epoch.
Mar 5 2025, 5:01 AM
zlei added a reverting change for rGd74b7baeb0d4: ifnet_byindex() actually requires network epoch: rG2472f4dbe930: udp: Do not recursively enter net epoch.
Mar 5 2025, 5:01 AM
zlei committed rG2472f4dbe930: udp: Do not recursively enter net epoch (authored by zlei).
udp: Do not recursively enter net epoch
Mar 5 2025, 5:01 AM
zlei added a reverting change for D33263: ifnet_byindex() actually requires network epoch: rG2472f4dbe930: udp: Do not recursively enter net epoch.
Mar 5 2025, 5:01 AM

Mar 4 2025

zlei added a reverting change for rGd74b7baeb0d4: ifnet_byindex() actually requires network epoch: D49227: udp: Do not recursively enter net epoch.
Mar 4 2025, 9:52 AM
zlei requested review of D49227: udp: Do not recursively enter net epoch.
Mar 4 2025, 9:51 AM
zlei added a reverting change for D33263: ifnet_byindex() actually requires network epoch: D49227: udp: Do not recursively enter net epoch.
Mar 4 2025, 9:51 AM
zlei added a reviewer for D49226: netlink: Fix getting route scope of interface's IPv4 addresses: glebius.
Mar 4 2025, 7:11 AM
zlei added a comment to D49053: carp: Fix checking IPv4 multicast address.

I personally do not like this ntohl sprinkle everywhere. Some usages in sys/netlink/route/iface.c are also suspicious.

...

That may want more attention from the network .

Mar 4 2025, 7:06 AM
zlei requested review of D49226: netlink: Fix getting route scope of interface's IPv4 addresses.
Mar 4 2025, 7:02 AM
zlei accepted D49172: kern: wg: remove overly-restrictive address family check.

Looks good to me.
Just a reminder, the commit message should be revised as well.

Mar 4 2025, 6:28 AM
zlei added a comment to D49212: route: protect against unattached if_afdata.

This appears to be the same with https://bugs.freebsd.org/279653 .

Mar 4 2025, 2:34 AM
zlei added a comment to D49172: kern: wg: remove overly-restrictive address family check.

IPv6 packets can be routed via an IPv4 nexthop

IIRC that has not been implemented / supported yet, as IPv6 has much richer addresses, e.g., IPv6 via IPv6 LL nexthop, we do not need IPv6 via IPv4 nexthop.

I noticed that when I went to write the test by copying the wg_basic test and setting up IPv6 on the other side. :-)

That is a perfect valid usage.

Mar 4 2025, 2:27 AM

Mar 3 2025

zlei committed rG588838aa2484: libkern: strdup.c, strndup.c: Prefer memcpy() over bcopy() (authored by zlei).
libkern: strdup.c, strndup.c: Prefer memcpy() over bcopy()
Mar 3 2025, 2:58 PM

Mar 2 2025

zlei committed rG3ae7c763540a: netinet: Make in_canforward() return bool (authored by zlei).
netinet: Make in_canforward() return bool
Mar 2 2025, 3:03 PM
zlei committed rGf7174eb2b4c4: netinet: Do not forward or ICMP response to INADDR_ANY (authored by zlei).
netinet: Do not forward or ICMP response to INADDR_ANY
Mar 2 2025, 3:03 PM
zlei closed D49157: netinet: Do not forward or ICMP response to INADDR_ANY.
Mar 2 2025, 3:03 PM
zlei added inline comments to D49157: netinet: Do not forward or ICMP response to INADDR_ANY.
Mar 2 2025, 2:10 PM
zlei abandoned D42678: bus: Properly set virtual network stack before trying to attach devices.

D47174 is a better fix.

Mar 2 2025, 2:49 AM
zlei added inline comments to D49172: kern: wg: remove overly-restrictive address family check.
Mar 2 2025, 2:34 AM
zlei added a comment to D49172: kern: wg: remove overly-restrictive address family check.

IPv6 packets can be routed via an IPv4 nexthop

IIRC that has not been implemented / supported yet, as IPv6 has much richer addresses, e.g., IPv6 via IPv6 LL nexthop, we do not need IPv6 via IPv4 nexthop.

Mar 2 2025, 1:52 AM

Feb 28 2025

zlei closed D46364: ena: Make global counters style unified.

Landed as 90953d2f829a .

Feb 28 2025, 6:53 AM
zlei closed D46372: ena: Fix leaking ifmedia resources on detach.

Landed as 449496eb28da .

Feb 28 2025, 6:52 AM

Feb 27 2025

zlei committed rG97309cec6f31: netinet: Make in_ifhasaddr() return bool (authored by zlei).
netinet: Make in_ifhasaddr() return bool
Feb 27 2025, 3:59 PM
zlei committed rG69beb162848b: netinet: Make in_localaddr() return bool (authored by zlei).
netinet: Make in_localaddr() return bool
Feb 27 2025, 3:59 PM
zlei requested review of D49157: netinet: Do not forward or ICMP response to INADDR_ANY.
Feb 27 2025, 3:02 PM
zlei committed rG96b62eae3c24: netinet: Update a comment for in_localip() (authored by zlei).
netinet: Update a comment for in_localip()
Feb 27 2025, 2:47 PM
zlei added a comment to D49135: bpf: avoid panic on multiple readers.

panic: bpfread: lost bd_hbuf

Feb 27 2025, 3:49 AM

Feb 25 2025

zlei committed rGcad2df3e90d1: carp: Fix checking IPv4 multicast address (authored by zlei).
carp: Fix checking IPv4 multicast address
Feb 25 2025, 4:02 AM