With the change I made to keep the current behavior for everything except swap (which is fairly well tested), are there additional concerns?
Apr 23 2020
Apr 20 2020
Apr 18 2020
Apr 17 2020
I was able to replicate it locally using LLVM's clang++ 9.0.1, defining _WANT_SOCKET prior to including the header file:
Can you provide some more information on the exact error you saw? I'm 99.9% sure we successfully compiled these sources with LLVM 9.
Apr 16 2020
Apr 14 2020
Apr 13 2020
Address comments by @jhb:
- Delete the code to declare the GELI threads as kernel FPU threads. (I'll open a separate review for that.)
- Switch the default to blocking mallocs for everything except swap requests.
In D24400#536778, @jhb wrote:
To be clear, you only tested encrypted swap? Did you do any testing with encrypted volumes meant to hold persistent data after a reboot (e.g. holding a UFS volume on a disk) and seeing how it was impacted by ENOMEM?
Rebase onto D24272.
Correct typo in the man page. (Thanks @bcr!)
Make the overflow rate-limit be controlled by a sysctl.
I'm going to do a tinderbox build of this + D24316 (mostly, to make sure I didn't mess up the various INET/INET6 combinations) and then commit them.
Apr 9 2020
Thanks for providing the extra context on the transport call today. Overall, this change looks good. I left a few nits as inline comments.
As discussed on the transport call today, please change both the in order and out of order data path so we call sorwakeup_locked() (when necessary) after the SACK blocks are updated.
Apr 6 2020
Switch to using sbuf(9) for string creation. Also, use a constant string for "local:".
Apr 3 2020
In D24272#534063, @jhb wrote:
I think this looks fine. It would be nice to use a string builder instead of memcpy/strcat, etc. like sbuf(9) to make it more robust to future changes.
FWIW, these are examples of the messages this produces:
Apr 3 19:58:34 c006 kernel: sonewconn: pcb 0xfffff805a566a200 (127.0.0.1:65432 (proto 6)): Listen queue overflow: 4 already in queue awaiting acceptance (1 occurrences)
Apr 3 19:59:44 c006 kernel: sonewconn: pcb 0xfffff80611de4000 ([::1]:65432 (proto 6)): Listen queue overflow: 4 already in queue awaiting acceptance (2 occurrences)
Apr 3 20:16:12 c006 kernel: sonewconn: pcb 0xfffff80170cde100 (local:/tmp/testsock): Listen queue overflow: 4 already in queue awaiting acceptance (1 occurrences)
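The "(N occurrences)" suffix in the lines above reflects counting overflows seen while logging is suppressed. A hypothetical userland sketch of that bookkeeping (the struct, function names, and the interval knob are assumptions for illustration, not the committed kernel code):

```c
/* Hypothetical sketch of a per-listen-socket overflow counter:
 * overflows seen while logging is rate-limited accumulate and are
 * reported with the next emitted line. */
struct overflow_state {
	long last_log;		/* time of last emitted message (seconds) */
	unsigned count;		/* overflows seen since the last message */
};

/*
 * Record one overflow. Returns 0 while suppressed; otherwise returns
 * the number of occurrences to report and resets the counter.
 */
static unsigned
overflow_should_log(struct overflow_state *st, long now, long interval)
{
	unsigned n;

	st->count++;
	if (now - st->last_log < interval)
		return (0);
	n = st->count;
	st->count = 0;
	st->last_log = now;
	return (n);
}
```

A zeroed `struct overflow_state` treats process start as the time of the last message, so the first report arrives once the interval has elapsed.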
Mar 12 2020
Mar 9 2020
Feb 27 2020
In D18624#458099, @jtl wrote:
This looks like it does what is described. As I understand, the use of delivered_data will be covered by a separate review. It looks like this will slightly change the way sacked_bytes is calculated. The change is probably a good thing, but it is worth verifying (and I haven't done this yet) that the updated calculation will work correctly.
Feb 10 2020
Feb 7 2020
In D23517#516688, @markj wrote:
In D23517#516336, @jtl wrote:
In D23517#516299, @markj wrote:
So, the problem manifests as the laundry queue steadily growing without any swapping in response?
We noticed this when we tried enabling encrypted swap. On the console, we see a string of processes killed due to the server being out of memory. Then, eventually, the watchdog timer fires and kills the system. I don't know what triggers this cycle. It seems to happen on a small percentage of systems hours to days after boot. To the best of my knowledge, we have not been able to observe a system descend into this naturally.
We tried to recreate the problem by artificially creating memory pressure. We ran a program that allocates a lot of memory (equal to the sum of the free and inactive sizes) and sequentially writes to each page in a loop. When we did this, we saw:
- The laundry size is growing.
- Processes are continually killed due to low memory.
- Finally, the watchdog kicks in and reboots the system.
Now, we may have recreated a different problem that has similar symptoms. But, at minimum, it seems like this is showing *a* bug.
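The pressure generator described above can be sketched as a small userland C program. The real run sized the buffer to roughly free + inactive memory; here the size is a caller-supplied placeholder:

```c
#include <stdlib.h>
#include <unistd.h>

/* Sketch of the memory-pressure test: allocate a region and keep
 * writing one byte per page so that every page is dirtied and must
 * eventually be laundered. */
static int
dirty_pages(size_t bytes, int passes)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	unsigned char *buf;

	if (pagesize <= 0)
		return (-1);
	buf = malloc(bytes);
	if (buf == NULL)
		return (-1);
	for (int p = 0; p < passes; p++)
		for (size_t off = 0; off < bytes; off += (size_t)pagesize)
			buf[off] = (unsigned char)p;	/* dirty the page */
	free(buf);
	return (0);
}
```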
I've done tests like this in the past. Indeed, if the program is dirtying pages quickly enough we might give up and attempt an OOM kill, but we should definitely be targeting the runaway process first. It's possible that we may have swapped its kernel stack out, in which case I believe we have to swap it back in to reclaim anything, since reclamation happens during SIGKILL-triggered process exit. (I don't see offhand why another thread couldn't call pmap_remove_pages() on the target process before it is swapped back in though.)
I have not tried testing with a GELI-backed swap device though. Presumably you were using one? Do you see a difference in behaviour when swap is unencrypted?
Feb 6 2020
In D23517#516299, @markj wrote:
Note that the page daemon uses vmd_free_target as the PID controller set point, but its target may be larger than the instantaneous difference vmd_free_target - vmd_free_count. So if it manages to free enough pages to satisfy vm_paging_target(), but not enough to satisfy the PID controller target, it'll trigger the laundry thread's shortfall mode, which then does nothing (unless pageout_deficit happens to be bigger than the negative difference). This suggests to me that the page daemon should be storing the value of page_shortage in the vmd_shortage field. In other words, the page daemon has failed to meet its target by page_shortage pages, and the laundry thread should try and make up that difference.
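A minimal sketch of the accounting @markj suggests, with an illustrative stand-in struct (not the kernel's vm_domain): hand the laundry thread the page daemon's unmet shortage directly, rather than the instantaneous free_target - free_count difference, which may already be satisfied.

```c
/* Illustrative types; field names mirror the discussion above. */
struct vmd {
	long free_target;	/* PID controller set point */
	long free_count;	/* current free page count */
	long shortage;		/* target handed to the laundry thread */
};

/* Store the page daemon's unmet shortage as the laundry target. */
static long
laundry_target(struct vmd *vmd, long page_shortage)
{
	vmd->shortage = page_shortage > 0 ? page_shortage : 0;
	return (vmd->shortage);
}
```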
Feb 5 2020
In D23517#516299, @markj wrote:
So, the problem manifests as the laundry queue steadily growing without any swapping in response?
Oct 24 2019
@bz will commit.
Basically committed by @bz.
Oct 1 2019
In D21840#477432, @rscheff_gmx.at wrote:
TCP_INFO is not portable afaik; however, I wonder why the Linux variant has a flag for "SYN_DATA" (which may get delivered to the application if the TFO is present), but no flag for TFO, while this change is signaling the presence of TFO, but no SYN_DATA... Just curious...
Sep 27 2019
In D19622#476255, @bz wrote:
In a later call I think your suggestion was along the lines of:
(a) if ifnet goes away nuke the recvif pointer from the queued mbufs
(b) if reassembly times out do as outlined above and skip sending ICMP error/per-IF statistics/.. if the ifnet pointer was nuked
- (c) if another fragment arrives (or the last fragment to complete the packet arrives) use that one's recvif pointer, as that interface is expected to still be there to pass the packet on
Now there is a gray area between (b) and (c) in which (b) could be extended to "if we cannot find an ifnet pointer in the expected place, scan the fragments of the packet in question for any ifnet pointer and use that one" for error handling. It'd be a one-time, slightly more expensive operation. If there's an attack, however, that is kind of the extra work you'd want to avoid. Without the extra work, though, you may not find out as easily what kind of problem you are running into, as you are lacking statistics. Trade-off...
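The (a)-(c) scheme above can be sketched as follows; the structures are illustrative stand-ins, not the kernel's reassembly queue:

```c
#include <stddef.h>

struct ifnet { int unit; };	/* opaque in the real code */
struct frag {
	struct frag *next;
	struct ifnet *recvif;
};

/* (a) the interface departed: nuke matching pointers on queued frags */
static void
frag_nuke_ifp(struct frag *head, struct ifnet *ifp)
{
	for (struct frag *f = head; f != NULL; f = f->next)
		if (f->recvif == ifp)
			f->recvif = NULL;
}

/* (b)/(c) pick an interface for the ICMP error and per-IF statistics:
 * prefer the most recent fragment's recvif, else (the proposed
 * extension) scan for any surviving pointer; NULL means skip the
 * error/stats entirely. */
static struct ifnet *
frag_pick_ifp(struct frag *newest, struct frag *head)
{
	if (newest != NULL && newest->recvif != NULL)
		return (newest->recvif);
	for (struct frag *f = head; f != NULL; f = f->next)
		if (f->recvif != NULL)
			return (f->recvif);
	return (NULL);
}
```

The scan in frag_pick_ifp is the "gray area" extension: a one-time walk of the chain that trades a little extra work for better error statistics.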
Sep 26 2019
In D14387#476034, @bz wrote:
This may sound like a pain but as you already say in your message, this is two changes .. First .. Second ..
Can you split them up into such? First should be really easy to review and second should then be straightforward as well by itself.
Sep 18 2019
Switch to using callout_init_mtx to let the callout system acquire the pause lock.
Sep 17 2019
Sep 14 2019
Sep 13 2019
Sep 3 2019
This was largely committed by mmacy last year.
Aug 19 2019
Aug 10 2019
Aug 9 2019
Jul 29 2019
This looks like it does what is described. As I understand, the use of delivered_data will be covered by a separate review. It looks like this will slightly change the way sacked_bytes is calculated. The change is probably a good thing, but it is worth verifying (and I haven't done this yet) that the updated calculation will work correctly.
Jul 18 2019
Jun 19 2019
Jun 12 2019
Jun 8 2019
Jun 7 2019
In my local tree, I added functions to generate fake machine check records and exercise the logic now found in mca_process_records(). It appears to work correctly. I *think* this is now ready to land.
Changes based on review:
- Switched to using STAILQ_CONCAT to refill the free list.
- Dropped redundant calls to resize the free list.
- Centralized the logging logic to reduce code duplication.
Jun 6 2019
Jun 5 2019
May 31 2019
Incorporate feedback from @jhb.
Try 2:
Maintain the last N records in the mca_records list. N is user-configurable and defaults to -1 (unlimited; the current behavior).
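A sketch of the capped-list behavior: records append at the tail and, once more than the limit are present, the oldest is evicted (in the real change, onto the free list). The types are illustrative, not the kernel's mca_internal.

```c
#include <stddef.h>

struct rec {
	struct rec *next;
};

struct rec_list {
	struct rec *head, *tail;
	int count;
	int limit;		/* user-configurable; -1 = unlimited */
};

/* Append nr; returns the evicted oldest record, or NULL. */
static struct rec *
rec_add(struct rec_list *l, struct rec *nr)
{
	struct rec *old = NULL;

	nr->next = NULL;
	if (l->tail != NULL)
		l->tail->next = nr;
	else
		l->head = nr;
	l->tail = nr;
	l->count++;
	if (l->limit >= 0 && l->count > l->limit) {
		old = l->head;
		l->head = old->next;
		if (l->head == NULL)
			l->tail = NULL;
		l->count--;
	}
	return (old);
}
```

With limit == -1 the eviction branch never fires, preserving today's unlimited behavior.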
In D20482#442176, @jhb wrote:
So I have had tools in the past that parsed the list from the kernel. See https://github.com/freebsd/freebsd/compare/master...bsdjhb:libsmbios_ecc. At the very least I think there should perhaps be a tunable/sysctl to control the behavior.
May 24 2019
In D20109#437264, @hselasky wrote:
@glebius: Multicast destruction is deferred. When we destroy a multicast address, we need to call the if_ioctl of the owning network interface to remove any multicast addresses. That's the problem. I think draining is a good way to implement a safe solution instead of using refcounts. Then we ensure that the ifnet is in a certain state when the multicast destruction callbacks are invoked.
May 23 2019
In the interests of avoiding discussion fragmentation, I am adding the feedback from the transport working group meeting today.
Apr 25 2019
Accepting the inp change as transport role to unblock the review.
Apr 18 2019
In D19960#429052, @bz wrote:
That said, there are vendors who (maybe not on the forwarding plane) list supporting it as RFC compliance:
https://www.juniper.net/documentation/en_US/junos/topics/reference/standards/ipv6.html
I agree with @bz about ideal process. I also agree with @kristof about the practical implications of this feature. :-)
Apr 13 2019
Mar 18 2019
I'm confused how a RST could have tripped this assert. In that case, len should have been 0 and ((th_flags) & (TH_SYN | TH_FIN)) == 0 should have been true (i.e. neither SYN nor FIN was set). In other words, it looks to me as if a RST should already pass the assert without tripping it. Can you explain further what I'm missing?
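Restating the assert condition under discussion as a check (the TH_* flag values are as defined in netinet/tcp.h; the helper is purely illustrative):

```c
/* Header flag bits, as in netinet/tcp.h */
#define TH_FIN 0x01
#define TH_SYN 0x02
#define TH_RST 0x04

/* The condition holds when the segment carries no data and sets
 * neither SYN nor FIN. */
static int
assert_holds(int th_flags, int len)
{
	return (len == 0 && (th_flags & (TH_SYN | TH_FIN)) == 0);
}
```

A pure RST (len == 0, neither SYN nor FIN set) satisfies the condition, which is why it should pass the assert without tripping it.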
Feb 21 2019
Thanks!
Feb 12 2019
Approved.
(FYI, the revision looks incorrect. It looks like it should have been r342597.)