
jtl (Jonathan T. Looney)
User, Administrator

Projects

User Details

User Since
Oct 29 2015, 5:25 PM (278 w, 5 d)
Roles
Administrator

Recent Activity

Jan 14 2021

jtl accepted D28142: tcp: add sysctl to tolerate TCP segments missing timestamps.
Jan 14 2021, 2:36 PM

Dec 4 2020

jtl added a comment to D27459: Only bring down clone interfaces at shutdown.

This does not fix the regression I am experiencing in my test setup. I am testing with a machine which uses a LAGG interface to communicate with the outside world. Shutting this interface down still makes my SSH sessions hang.

Dec 4 2020, 4:44 PM

Dec 3 2020

jtl abandoned D27464: Fix hung TCP sessions on shutdown.

I just saw the discussion on the committers mailing list. First, it shows that @cy already has a proposed fix. Secondly, it shows that this is a larger issue (for example, netboot), which probably needs a different solution.

Dec 3 2020, 8:36 PM
jtl added a comment to D27464: Fix hung TCP sessions on shutdown.

When committing, please add a reference to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251540

Dec 3 2020, 8:23 PM
jtl requested review of D27464: Fix hung TCP sessions on shutdown.
Dec 3 2020, 7:49 PM

Nov 20 2020

jtl closed D27246: Fix dtrace symbol resolution for anonymous structs/unions.
Nov 20 2020, 7:09 PM
jtl committed rS367905: When copying types from one CTF container to another, ensure that we.
Nov 20 2020, 7:09 PM
jtl added a comment to D27246: Fix dtrace symbol resolution for anonymous structs/unions.

I'd suggest running the DTrace test suite with this change if you haven't. make -C cddl/usr.sbin/dtrace WITH_DTRACE_TESTS= all install should install them to /usr/tests/cddl/usr.sbin/dtrace.

Nov 20 2020, 5:14 PM

Nov 17 2020

jtl added a comment to D27246: Fix dtrace symbol resolution for anonymous structs/unions.

Could you please re-upload with context?

Nov 17 2020, 3:47 PM
jtl updated the diff for D27246: Fix dtrace symbol resolution for anonymous structs/unions.

Updating the diff to include context.

Nov 17 2020, 3:30 PM
jtl closed D27213: Fix dtrace symbol resolution in the face of bitfields.
Nov 17 2020, 2:07 PM
jtl committed rS367763: When copying types from one CTF container to another, ensure that we.
Nov 17 2020, 2:07 PM
jtl updated the diff for D27246: Fix dtrace symbol resolution for anonymous structs/unions.

While here, update the code in ctf_add_generic() to encode empty type names with index 0. This fixes the analogous case for type names.

Nov 17 2020, 3:09 AM
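
The change described above (encoding empty type names with index 0) can be illustrated with a small string-table model. This is only a sketch in Python, not the libctf code: the `StringTable` class and its methods are hypothetical, and it assumes the usual string-table layout in which offset 0 holds the empty string.

```python
class StringTable:
    """Hypothetical model of a string table; not the libctf data structure."""

    def __init__(self):
        # Assumption: offset 0 always holds the empty string.
        self.data = bytearray(b"\0")
        self.offsets = {"": 0}

    def add(self, name):
        """Return the offset of name, appending it if it is new."""
        if not name:
            # Empty or missing names are never appended; they share offset 0.
            return 0
        if name not in self.offsets:
            self.offsets[name] = len(self.data)
            self.data += name.encode() + b"\0"
        return self.offsets[name]


table = StringTable()
print(table.add(""), table.add("mbuf"), table.add("m_pkthdr"))
```

With this layout, an anonymous struct or union and an explicitly empty name resolve to the same index, so a lookup never has to distinguish the two cases.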
jtl updated the summary of D27246: Fix dtrace symbol resolution for anonymous structs/unions.
Nov 17 2020, 3:05 AM
jtl requested review of D27246: Fix dtrace symbol resolution for anonymous structs/unions.
Nov 17 2020, 3:03 AM
jtl added a comment to D27213: Fix dtrace symbol resolution in the face of bitfields.
In D27213#608228, @jtl wrote:

I can't reproduce this problem at all on head. The script appears to work properly. I remember that the use of anonymous unions in struct mbuf caused some problems, at least one of which was fixed by r305055, but that was a long time ago.

I can reproduce this problem locally. Some others can't reproduce the problem I reported in the main description. I'm not sure why, but there seems to be some non-determinism in the way symbols are loaded/resolved?

Nov 17 2020, 1:36 AM
jtl added a comment to D27213: Fix dtrace symbol resolution in the face of bitfields.
In D27213#608269, @ae wrote:

It seems I found how to reproduce it on test system:

  1. Load system without any unneeded modules
  2. kldload dtraceall
  3. Run
# dtrace -n 'fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }'
dtrace: description 'fbt::ip_input:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  2  49220                   ip_input:entry ix0
  2  49220                   ip_input:entry ix0
  6  49220                   ip_input:entry ix0
^C
# kldunload dtraceall
# kldload ipfw
# kldload dtraceall
# dtrace -n 'fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }'
dtrace: invalid probe specifier fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }: in action list: m_pkthdr is not a member of struct mbuf
Nov 17 2020, 1:16 AM

Nov 16 2020

jtl added a comment to D27213: Fix dtrace symbol resolution in the face of bitfields.
In D27213#608180, @jtl wrote:
In D27213#608081, @ae wrote:

Recently I ran into this problem on some machines:

# dtrace -n 'fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }'
dtrace: invalid probe specifier fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }: in action list: m_pkthdr is not a member of struct mbuf

And it seems it is exactly related.

It does appear that the problem is likely related to type resolution. However, for what it's worth, this patch does not solve that problem on my test machine. So, that problem may have a different cause.

I can't reproduce this problem at all on head. The script appears to work properly. I remember that the use of anonymous unions in struct mbuf caused some problems, at least one of which was fixed by r305055, but that was a long time ago.

Nov 16 2020, 6:03 PM
jtl added a comment to D27213: Fix dtrace symbol resolution in the face of bitfields.
In D27213#608081, @ae wrote:

Recently I ran into this problem on some machines:

# dtrace -n 'fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }'
dtrace: invalid probe specifier fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }: in action list: m_pkthdr is not a member of struct mbuf

And it seems it is exactly related.

Nov 16 2020, 4:52 PM
jtl added inline comments to D27213: Fix dtrace symbol resolution in the face of bitfields.
Nov 16 2020, 3:43 PM

Nov 14 2020

jtl closed D27173: Add regression test for D27164.
Nov 14 2020, 3:45 PM
jtl committed rS367685: Add a regression test for the port-selection behavior fixed in r367680.
Nov 14 2020, 3:45 PM
jtl closed D27164: Fix implicit automatic local port selection during connect calls.
Nov 14 2020, 2:50 PM
jtl committed rS367680: Fix implicit automatic local port selection for IPv6 during connect calls.
Nov 14 2020, 2:50 PM
jtl updated the summary of D27213: Fix dtrace symbol resolution in the face of bitfields.
Nov 14 2020, 12:41 AM
jtl requested review of D27213: Fix dtrace symbol resolution in the face of bitfields.
Nov 14 2020, 12:33 AM

Nov 13 2020

jtl accepted D27188: LACP: When suppressing distributing, return ENOBUFS rather than ENETDOWN to preserve TCP conns.
Nov 13 2020, 4:48 PM

Nov 11 2020

jtl requested review of D27173: Add regression test for D27164.
Nov 11 2020, 3:43 AM

Nov 10 2020

jtl updated the summary of D27164: Fix implicit automatic local port selection during connect calls.
Nov 10 2020, 6:32 PM
jtl requested review of D27164: Fix implicit automatic local port selection during connect calls.
Nov 10 2020, 6:29 PM
jtl closed D27129: When destroying a UMA zone with a reserve, properly drain kegs.
Nov 10 2020, 6:12 PM
jtl committed rS367573: When destroying a UMA zone which has a reserve (set with.
Nov 10 2020, 6:12 PM

Nov 6 2020

jtl requested review of D27129: When destroying a UMA zone with a reserve, properly drain kegs.
Nov 6 2020, 9:18 PM

Nov 5 2020

jtl added reviewers for D18892: Phase 2 to add Proportional Rate Reduction (RFC6937) to FreeBSD: lstewart, rrs.
Nov 5 2020, 2:19 PM
jtl accepted D24237: Fix erroneous "DSACK" during loss recovery.

In general, this looks good. I have a small nit: it may be worth considering whether it would be better to add the flag to the socket itself somehow, so it could be synchronized by the socket lock. On the off chance someone did a socket operation which caused a wakeup while the TCP code was running, it seems possible that this might avoid a spurious wakeup. However, given the code in the src tree, I find it hard to reason through how this could occur.

Nov 5 2020, 2:12 PM

Nov 4 2020

jtl added a comment to D26082: pmcstat: Fix usage message.

I had a look at the man page and the -U option is indeed a bit confusing.
Certainly replacing the \n is the right thing to do; I will try to have a look at the src.

Nov 4 2020, 4:11 PM

Apr 23 2020

jtl added a reviewer for D23364: Send CWR only on new data, as per sec. 6.1.2 of RFC3168: lstewart.
Apr 23 2020, 2:37 PM

Apr 20 2020

jtl added a comment to D24400: Make encrypted swap more reliable.

With the change I made to keep the current behavior for everything except swap (which is fairly well tested), are there additional concerns?

Apr 20 2020, 4:56 PM

Apr 18 2020

jtl accepted D24477: llvm9 wont allow enum definition inside anon-struct.
Apr 18 2020, 4:29 PM

Apr 17 2020

jtl added a comment to D24477: llvm9 wont allow enum definition inside anon-struct.

I was able to replicate it locally using llvm's c++ 9.0.1 and defining _WANT_SOCKET prior to including the header file:

Apr 17 2020, 11:40 PM
jtl added a comment to D24477: llvm9 wont allow enum definition inside anon-struct.

Can you provide some more information on the exact error you saw? I'm 99.9% sure we successfully compiled these sources with LLVM 9.

Apr 17 2020, 8:40 PM
jtl added a reviewer for D24477: llvm9 wont allow enum definition inside anon-struct: glebius.
Apr 17 2020, 8:36 PM

Apr 16 2020

jtl committed rS360020: Avoid calling protocol drain routines more than once per reclamation event.
Apr 16 2020, 8:17 PM
jtl closed D24418: Avoid calling protocol drain routines more than once.
Apr 16 2020, 8:17 PM
jtl committed rS360019: Add a regression test for the changes in r359922 and r359923.
Apr 16 2020, 8:07 PM

Apr 14 2020

jtl committed rS359923: Make sonewconn() overflow messages have per-socket rate-limits and values.
Apr 14 2020, 3:38 PM
jtl closed D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
Apr 14 2020, 3:38 PM
jtl committed rS359922: Print more detail as part of the sonewconn() overflow message.
Apr 14 2020, 3:30 PM
jtl closed D24272: Print more detail as part of the sonewconn() overflow message.
Apr 14 2020, 3:27 PM
jtl committed rS359921: Make the path length of UNIX domain sockets specified by a #define.
Apr 14 2020, 3:27 PM
jtl accepted D24308: Improve blackhole detection.
Apr 14 2020, 3:14 PM
jtl created D24418: Avoid calling protocol drain routines more than once.
Apr 14 2020, 3:11 PM

Apr 13 2020

jtl updated the diff for D24400: Make encrypted swap more reliable.

Address comments by @jhb:

  • Delete the code to declare the GELI threads as kernel FPU threads. (I'll open a separate review for that.)
  • Switch the default to blocking mallocs for everything except swap requests.
Apr 13 2020, 10:53 PM
jtl added a comment to D24400: Make encrypted swap more reliable.
In D24400#536778, @jhb wrote:

To be clear, you only tested encrypted swap? Did you do any testing with encrypted volumes meant to hold persistent data after a reboot (e.g. holding a UFS volume on a disk) and seeing how it was impacted by ENOMEM?

Apr 13 2020, 10:46 PM
jtl created D24400: Make encrypted swap more reliable.
Apr 13 2020, 7:57 PM
jtl updated the diff for D24316: Make sonewconn overflow messages have per-socket rate-limits and values.

Rebase onto D24272.

Apr 13 2020, 5:19 PM
jtl updated the diff for D24272: Print more detail as part of the sonewconn() overflow message.
  • Address @jhb's comments.
  • Add the comment requested by @bz on the reason UNIX domain sockets are restricted to 104 bytes.
Apr 13 2020, 5:03 PM
jtl added inline comments to D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
Apr 13 2020, 3:32 PM
jtl updated the diff for D24316: Make sonewconn overflow messages have per-socket rate-limits and values.

Correct typo in the man page. (Thanks @bcr!)

Apr 13 2020, 3:31 PM
jtl added inline comments to D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
Apr 13 2020, 3:14 PM
jtl updated the diff for D24316: Make sonewconn overflow messages have per-socket rate-limits and values.

Make the overflow rate-limit be controlled by a sysctl.

Apr 13 2020, 3:14 PM
jtl added a comment to D24272: Print more detail as part of the sonewconn() overflow message.

I'm going to do a tinderbox build of this + D24316 (mostly, to make sure I didn't mess up the various INET/INET6 combinations) and then commit them.

Apr 13 2020, 2:24 PM

Apr 9 2020

jtl accepted D24308: Improve blackhole detection.

Thanks for providing the extra context on the transport call today. Overall, this change looks good. I left a few nits in in-line comments.

Apr 9 2020, 3:21 PM
jtl requested changes to D24237: Fix erroneous "DSACK" during loss recovery.

As discussed on the transport call today, please change both the in order and out of order data path so we call sorwakeup_locked() (when necessary) after the SACK blocks are updated.

Apr 9 2020, 3:03 PM

Apr 6 2020

jtl created D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
Apr 6 2020, 8:53 PM
jtl updated the diff for D24272: Print more detail as part of the sonewconn() overflow message.

Switch to using sbuf(9) for string creation. Also, use a constant string for "local:".

Apr 6 2020, 6:16 PM

Apr 3 2020

jtl added a comment to D24272: Print more detail as part of the sonewconn() overflow message.
In D24272#534063, @jhb wrote:

I think this looks fine. It would be nice to use a string builder instead of memcpy/strcat, etc. like sbuf(9) to make it more robust to future changes.

Apr 3 2020, 11:39 PM
jtl added a comment to D24272: Print more detail as part of the sonewconn() overflow message.

FWIW, these are examples of the messages this produces:

Apr  3 19:58:34 c006 kernel: sonewconn: pcb 0xfffff805a566a200 (127.0.0.1:65432 (proto 6)): Listen queue overflow: 4 already in queue awaiting acceptance (1 occurrences)
Apr  3 19:59:44 c006 kernel: sonewconn: pcb 0xfffff80611de4000 ([::1]:65432 (proto 6)): Listen queue overflow: 4 already in queue awaiting acceptance (2 occurrences)
Apr  3 20:16:12 c006 kernel: sonewconn: pcb 0xfffff80170cde100 (local:/tmp/testsock): Listen queue overflow: 4 already in queue awaiting acceptance (1 occurrences)
Apr 3 2020, 8:16 PM
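
The "(N occurrences)" suffix in the messages above, together with the per-socket rate limit from D24316, suggests behavior along the following lines. This is a minimal Python sketch, not the kernel implementation: `ListenSocket`, `note_overflow`, and the `min_interval` default are all hypothetical, the message omits the pcb address, and the exact counting rule is an assumption inferred from the sample output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListenSocket:
    """Hypothetical stand-in for a listening socket (not struct socket)."""
    label: str                        # e.g. "127.0.0.1:65432 (proto 6)"
    last_log: Optional[float] = None  # time of the last message for this socket
    pending: int = 0                  # overflows seen since the last message


def note_overflow(so: ListenSocket, qlen: int, now: float,
                  min_interval: float = 60.0) -> Optional[str]:
    """Record one overflow; return a log line, or None if rate-limited.

    min_interval is an assumed tunable; the review mentions a sysctl, but
    its name and default are not given in this feed.
    """
    so.pending += 1
    if so.last_log is not None and now - so.last_log < min_interval:
        return None  # suppressed, but still counted in so.pending
    msg = (f"sonewconn: ({so.label}): Listen queue overflow: "
           f"{qlen} already in queue awaiting acceptance "
           f"({so.pending} occurrences)")
    so.last_log = now
    so.pending = 0
    return msg
```

Because the timestamp and counter live on each socket rather than in a global, a busy listener that overflows constantly cannot suppress the first message from a different listener.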
jtl created D24272: Print more detail as part of the sonewconn() overflow message.
Apr 3 2020, 8:14 PM

Mar 12 2020

jtl accepted D23904: Use tcpstats accessor functions for kernel modules in case of RACK and BBR.
Mar 12 2020, 2:47 PM

Mar 9 2020

jtl accepted D23998: Don't deref an mbuf ext_pgs pointer in tcp_m_copym() unless we need to.
Mar 9 2020, 4:25 PM

Feb 27 2020

jtl added reviewers for D23655: Cubic: prevent abrupt cwnd jumps after slow start: lstewart, transport.
Feb 27 2020, 2:37 PM
jtl added a comment to D18624: improvements to support code for RFC6675.
In D18624#458099, @jtl wrote:

This looks like it does what is described. As I understand, the use of delivered_data will be covered by a separate review. It looks like this will slightly change the way sacked_bytes is calculated. The change is probably a good thing, but it is worth verifying (and I haven't done this yet) that the updated calculation will work correctly.

Feb 27 2020, 2:09 PM

Feb 10 2020

jtl committed rS357742: Modify the vm.panic_on_oom sysctl to take a count of events.
Feb 10 2020, 6:06 PM
jtl closed D23601: Modify the vm.panic_on_oom sysctl to take a count of events.
Feb 10 2020, 6:06 PM
jtl created D23601: Modify the vm.panic_on_oom sysctl to take a count of events.
Feb 10 2020, 3:21 PM

Feb 7 2020

jtl added a comment to D23517: Align the laundry and page out worker shortfall calculations.
In D23517#516336, @jtl wrote:

So, the problem manifests as the laundry queue steadily growing without any swapping in response?

We noticed this when we tried enabling encrypted swap. On the console, we see a string of processes killed due to the server being out of memory. Then, eventually, the watchdog timer fires and kills the system. I don't know what triggers this cycle. It seems to happen on a small percentage of systems hours to days after boot. To the best of my knowledge, we have not been able to observe a system descend into this naturally.

We tried to recreate the problem by artificially creating memory pressure. We ran a program that allocates a lot of memory (equal to the sum of the free and inactive sizes) and sequentially writes to each page in a loop. When we did this, we saw:

  1. The laundry size is growing.
  2. Processes are continually killed due to low memory.
  3. Finally, the watchdog kicks in and reboots the system.

Now, we may have recreated a different problem that has similar symptoms. But, at minimum, it seems like this is showing *a* bug.

I've done tests like this in the past. Indeed, if the program is dirtying pages quickly enough we might give up and attempt an OOM kill, but we should definitely be targeting the runaway process first. It's possible that we may have swapped its kernel stack out, in which case I believe we have to swap it back in to reclaim anything, since reclamation happens during SIGKILL-triggered process exit. (I don't see offhand why another thread couldn't call pmap_remove_pages() on the target process before it is swapped back in though.)

I have not tried testing with a GELI-backed swap device though. Presumably you were using one? Do you see a difference in behaviour when swap is unencrypted?

Feb 7 2020, 2:32 PM

Feb 6 2020

jtl added a comment to D23517: Align the laundry and page out worker shortfall calculations.

Note that the page daemon uses vmd_free_target as the PID controller set point, but its target may be larger than the instantaneous difference vmd_free_target - vmd_free_count. So if it manages to free enough pages to satisfy vm_paging_target(), but not enough to satisfy the PID controller target, it'll trigger the laundry thread's shortfall mode, which then does nothing (unless pageout_deficit happens to be bigger than the negative difference). This suggests to me that the page daemon should be storing the value of page_shortage in the vmd_shortage field. In other words, the page daemon has failed to meet its target by page_shortage pages, and the laundry thread should try and make up that difference.

Feb 6 2020, 3:56 PM
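
The scenario in the comment above can be worked through with a toy model. This Python sketch is not the kernel code: the function name, the dictionary keys, and the clamping with `max(0, ...)` are assumptions made for illustration. It only reproduces the arithmetic the comment describes, where the instantaneous gap is `vmd_free_target - vmd_free_count` and `page_shortage` is what the page daemon failed to reclaim against its PID-controller target.

```python
def page_daemon_pass(free_target, free_count, pid_target, freed,
                     pageout_deficit=0):
    """Model one page daemon pass and two ways to derive the laundry shortfall."""
    free_count += freed
    page_shortage = pid_target - freed        # missed PID-controller target
    paging_target = free_target - free_count  # instantaneous gap, may be <= 0
    return {
        "paging_target": paging_target,
        "page_shortage": page_shortage,
        # Shortfall derived from the instantaneous gap (the problem case).
        "shortfall_from_gap": max(0, paging_target + pageout_deficit),
        # Shortfall taken from page_shortage, as the comment suggests.
        "shortfall_from_shortage": max(0, page_shortage),
    }


# The PID controller asks for 300 pages, more than the 100-page gap.
# Freeing 150 satisfies the gap but leaves a 150-page shortage.
print(page_daemon_pass(free_target=1000, free_count=900,
                       pid_target=300, freed=150))
```

In this example the gap-based value is zero, so a laundry thread driven by it would have nothing to do even though the page daemon reported a shortage; the shortage-based value asks it to make up the missing 150 pages.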

Feb 5 2020

jtl updated subscribers of D23517: Align the laundry and page out worker shortfall calculations.

So, the problem manifests as the laundry queue steadily growing without any swapping in response?

Feb 5 2020, 11:14 PM
jtl created D23517: Align the laundry and page out worker shortfall calculations.
Feb 5 2020, 4:01 PM

Oct 24 2019

jtl abandoned D16850: Update the fragment reassembly code's handling of overlapping fragments to conform to RFC 8200.

@bz will commit.

Oct 24 2019, 2:07 PM
jtl abandoned D16847: Eliminate KAME custom circular queues in reassembly code.

Basically committed by @bz.

Oct 24 2019, 2:06 PM

Oct 1 2019

jtl added a comment to D21840: Make TCP_INFO report TFO.
In D21840#477432, @rscheff_gmx.at wrote:

TCP_INFO is not portable afaik; however, I wonder why the Linux variant has a flag for "SYN_DATA" (which may get delivered to the application if the TFO is present), but no flag for TFO, while this change is signaling the presence of TFO, but no SYN_DATA... Just curious...

Oct 1 2019, 3:20 PM

Sep 27 2019

jtl added a comment to D19622: Fix panic in network stack due memory use after free in relation to fragmented packets.
In D19622#476255, @bz wrote:

In a later call I think your suggestion was along the lines of:

(a) if ifnet goes away nuke the recvif pointer from the queued mbufs
(b) if reassembly times out do as outlined above and skip sending ICMP error/per-IF statistics/.. if the ifnet pointer was nuked
(c) if another fragment arrives (or the last fragment to complete the packet arrives) use that one's recvif pointer as that interface is expected to still be there to pass the packet on

Now there is a gray area between (b) and (c) in which (b) could be extended to "if we cannot find an ifnet pointer in the expected place, scan the fragments of the packet in question for any ifnet pointer and use that one" for error handling. It'd be a one-time, slightly more expensive operation. If there's an attack, however, that is kind of the extra work you'd want to avoid. Without the extra work, you may not find out as easily what kind of problem you are running into though, as you are lacking statistics. Trade-off...

Sep 27 2019, 2:44 PM
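
Steps (a) through (c) above, plus the optional scan discussed for (b), can be sketched as follows. This is a Python model, not the network stack code: `Fragment`, `ReassemblyQueue`, and the three functions are hypothetical names, interface names stand in for ifnet pointers, and treating the first queued fragment as "the expected place" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fragment:
    offset: int
    recvif: Optional[str]  # interface name standing in for an ifnet pointer


@dataclass
class ReassemblyQueue:
    frags: List[Fragment] = field(default_factory=list)


def ifnet_departed(queues: List[ReassemblyQueue], ifname: str) -> None:
    """(a) Clear recvif on every queued fragment that arrived on ifname."""
    for q in queues:
        for f in q.frags:
            if f.recvif == ifname:
                f.recvif = None


def on_timeout(q: ReassemblyQueue, scan_fallback: bool = False) -> Optional[str]:
    """(b) Pick the interface for ICMP errors/statistics, or None to skip them."""
    if not q.frags:
        return None
    first = q.frags[0].recvif  # assumed "expected place"
    if first is not None or not scan_fallback:
        return first
    # Optional extension: a one-time scan for any surviving interface.
    return next((f.recvif for f in q.frags if f.recvif is not None), None)


def on_complete(q: ReassemblyQueue, last: Fragment) -> Optional[str]:
    """(c) Pass the packet on via the newest fragment's interface."""
    q.frags.append(last)
    return last.recvif
```

The `scan_fallback` flag makes the trade-off explicit: leaving it off keeps the timeout path cheap under attack, while turning it on recovers statistics at the cost of walking the fragment list.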

Sep 26 2019

jtl added a comment to D14387: Further reduce keepalive timer rescheduling.
In D14387#476034, @bz wrote:

This may sound like a pain but as you already say in your message, this is two changes .. First .. Second ..
Can you split them up into such? First should be really easy to review and second should then be straightforward as well by itself.

Sep 26 2019, 8:27 PM
jtl committed rS352746: Add new functionality to switch to using cookies exclusively when we the.
Sep 26 2019, 3:19 PM
jtl committed rS352745: Access the syncache secret directly from the V_tcp_syncache variable,.
Sep 26 2019, 3:07 PM
jtl committed rS352744: Remove the unused sch parameter to the syncache_respond() function. The.
Sep 26 2019, 3:02 PM
jtl closed D21644: During SYN floods, fallback exclusively to SYN cookies for a small period.
Sep 26 2019, 3:02 PM

Sep 18 2019

jtl updated the diff for D21644: During SYN floods, fallback exclusively to SYN cookies for a small period.

Switch to using callout_init_mtx to let the callout system acquire the pause lock.

Sep 18 2019, 12:01 PM

Sep 17 2019

jtl added inline comments to D21644: During SYN floods, fallback exclusively to SYN cookies for a small period.
Sep 17 2019, 2:27 PM

Sep 14 2019

jtl added inline comments to D21644: During SYN floods, fallback exclusively to SYN cookies for a small period.
Sep 14 2019, 1:13 AM

Sep 13 2019

jtl created D21644: During SYN floods, fallback exclusively to SYN cookies for a small period.
Sep 13 2019, 7:09 PM

Sep 3 2019

jtl abandoned D17609: Optimize curthread.

An alternative, better, version was implemented by @mjg in rS339449.

Sep 3 2019, 6:19 PM
jtl abandoned D7350: PMC: Collect user call chains while in kernel space.

This was largely committed by mmacy last year.

Sep 3 2019, 6:17 PM
jtl abandoned D15483: More bcmp "optimization".
Sep 3 2019, 6:15 PM

Aug 19 2019

jtl added inline comments to D20655: Make use of stats(3) in the TCP stack.
Aug 19 2019, 1:39 PM

Aug 10 2019

jtl committed rS350829: MFC r350815:.
Aug 10 2019, 12:03 AM
jtl committed rS350828: MFC r350815:.
Aug 10 2019, 12:01 AM

Aug 9 2019

jtl added inline comments to D20655: Make use of stats(3) in the TCP stack.
Aug 9 2019, 6:48 PM