In D29519#666595, @tuexen wrote: In D29519#666585, @markj wrote: Ping? Any comments on the overall approach, or on the details of the change?
Hi Mark,
we discussed this at the last transport call. We agreed that the handling should be consistent but wanted to check until the next transport call (about two weeks from now) what the consistent way would be. My understanding is:
Mon, Apr 12
jtl added a comment to D29519: Add missing sockaddr length and family validation to various protocols.
Thu, Apr 1
jtl added a comment to D29519: Add missing sockaddr length and family validation to various protocols.
Thanks for doing this! It looks like a very positive change, and I'm sure there was a lot of effort put into finding the right way to clean up the code.
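As a rough illustration of the kind of check the review title describes, here is a minimal userland sketch. It is not taken from D29519: the helper name `check_sockaddr_in`, the explicit length argument (used here because `sa_len` is not portable), and the choice of error codes are all assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper (not from the review): validate a caller-supplied
 * address before treating it as a struct sockaddr_in.
 * Returns 0 on success or an errno value on failure.
 */
static int
check_sockaddr_in(const struct sockaddr *sa, socklen_t len)
{
	/* Length first, so sa_family is only read from a large enough buffer. */
	if (sa == NULL || len < (socklen_t)sizeof(struct sockaddr_in))
		return (EINVAL);
	/* Then the family, so an IPv6 or UNIX address is not misread as IPv4. */
	if (sa->sa_family != AF_INET)
		return (EAFNOSUPPORT);
	return (0);
}
```

A protocol's bind or connect handler would call a check like this before casting the pointer to `struct sockaddr_in *`.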
jtl added a comment to D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
In D24316#639938, @brd wrote:
Thanks for doing this so quickly!
Wed, Mar 31
FWIW, I disagree with this change. I think we should instead use atomic operations here.
jtl committed R10:40d278253d20: Fetch the sigfastblock value in syscalls that wait for signals (authored by jtl).
jtl committed R10:a25c17022e2d: Fetch the sigfastblock value in syscalls that wait for signals (authored by jtl).
Mar 12 2021
jtl committed R10:dbec10e08808: Fetch the sigfastblock value in syscalls that wait for signals (authored by jtl).
In D29225#654397, @kib wrote: It is strange indeed, and it sounds more like a self-inflicting action from userspace. Code in rtld or libthr should not leak sigfastblock block, but of course bugs are possible.
This is the change I am planning to commit once the regression tests finish running.
Mar 11 2021
Jan 14 2021
Dec 4 2020
This does not fix the regression I am experiencing in my test setup. I am testing with a machine which uses a LAGG interface to communicate with the outside world. Shutting this interface down still makes my SSH sessions hang.
Dec 3 2020
I just saw the discussion on the committers mailing list. First, it shows that @cy already has a proposed fix. Secondly, it shows that this is a larger issue (for example, netboot), which probably needs a different solution.
In D27464#613656, @tuexen wrote: When committing, please add a reference to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251540
Nov 20 2020
When copying types from one CTF container to another, ensure that we
In D27246#609375, @markj wrote: I'd suggest running the DTrace test suite with this change if you haven't. make -C cddl/usr.sbin/dtrace WITH_DTRACE_TESTS= all install should install them to /usr/tests/cddl/usr.sbin/dtrace.
Nov 17 2020
In D27246#608560, @markj wrote: Could you please re-upload with context?
Updating the diff to include context.
When copying types from one CTF container to another, ensure that we
While here, update the code in ctf_add_generic() to encode empty type names with index 0. This fixes the analogous case for type names.
In D27213#608228, @jtl wrote: In D27213#608181, @markj wrote: I can't reproduce this problem at all on head. The script appears to work properly. I remember that the use of anonymous unions in struct mbuf caused some problems, at least one of which was fixed by r305055, but that was a long time ago.
I can reproduce this problem locally. Some others can't reproduce the problem I reported in the main description. I'm not sure why, but there seems to be some non-determinism in the way symbols are loaded/resolved?
In D27213#608269, @ae wrote: It seems I found how to reproduce it on a test system:
- Load the system without any unneeded modules
- kldload dtraceall
- Run
# dtrace -n 'fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }'
dtrace: description 'fbt::ip_input:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  2  49220                   ip_input:entry ix0
  2  49220                   ip_input:entry ix0
  6  49220                   ip_input:entry ix0
^C
# kldunload dtraceall
# kldload ipfw
# kldload dtraceall
# dtrace -n 'fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }'
dtrace: invalid probe specifier fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }: in action list: m_pkthdr is not a member of struct mbuf
Nov 16 2020
In D27213#608181, @markj wrote: In D27213#608180, @jtl wrote: In D27213#608081, @ae wrote: Recently I faced this problem on some machines:
# dtrace -n 'fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }'
dtrace: invalid probe specifier fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }: in action list: m_pkthdr is not a member of struct mbuf
And it seems it is exactly related.
It does appear that the problem is likely related to type resolution. However, for what it's worth, this patch does not solve that problem on my test machine. So, that problem may have a different cause.
I can't reproduce this problem at all on head. The script appears to work properly. I remember that the use of anonymous unions in struct mbuf caused some problems, at least one of which was fixed by r305055, but that was a long time ago.
In D27213#608081, @ae wrote: Recently I faced this problem on some machines:
# dtrace -n 'fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }'
dtrace: invalid probe specifier fbt::ip_input:entry { printf("%s", stringof(args[0]->m_pkthdr.rcvif->if_xname)); }: in action list: m_pkthdr is not a member of struct mbuf
And it seems it is exactly related.
Nov 14 2020
Add a regression test for the port-selection behavior fixed in r367680.
Fix implicit automatic local port selection for IPv6 during connect calls.
Nov 13 2020
Nov 11 2020
Nov 10 2020
jtl updated the summary of D27164: Fix implicit automatic local port selection during connect calls.
When destroying a UMA zone which has a reserve (set with
Nov 6 2020
Nov 5 2020
jtl added reviewers for D18892: Phase 2 to add Proportional Rate Reduction (RFC6937) to FreeBSD: lstewart, rrs.
In general, this looks good. I have a small nit: it seems worth considering whether it would be better to add the flag to the socket itself somehow so it could be synchronized by the socket lock. On the off chance someone did a socket operation which caused a wakeup while the TCP code was running, it seems possible that this might avoid a spurious wakeup. However, given the code in the src tree, I find it hard to reason through how this could occur.
Nov 4 2020
In D26082#604611, @emaste wrote: I had a look at the man page and the -U option is indeed a bit confusing.
Certainly replacing the \n is the right thing to do; I will try to have a look at the src.
Apr 23 2020
Apr 20 2020
With the change I made to keep the current behavior for everything except swap (which is fairly well tested), are there additional concerns?
Apr 18 2020
Apr 17 2020
I was able to replicate it locally using llvm's c++ 9.0.1 and defining _WANT_SOCKET prior to including the header file:
Can you provide some more information on the exact error you saw? I'm 99.9% sure we successfully compiled these sources with LLVM 9.
Apr 16 2020
jtl committed rS360020: Avoid calling protocol drain routines more than once per reclamation event.
Add a regression test for the changes in r359922 and r359923.
Apr 14 2020
Make sonewconn() overflow messages have per-socket rate-limits and values.
Print more detail as part of the sonewconn() overflow message.
Make the path length of UNIX domain sockets specified by a #define.
Apr 13 2020
Address comments by @jhb:
- Delete the code to declare the GELI threads as kernel FPU threads. (I'll open a separate review for that.)
- Switch the default to blocking mallocs for everything except swap requests.
In D24400#536778, @jhb wrote: To be clear, you only tested encrypted swap? Did you do any testing with encrypted volumes meant to hold persistent data after a reboot (e.g. holding a UFS volume on a disk) and seeing how it was impacted by ENOMEM?
jtl updated the diff for D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
Rebase onto D24272.
jtl added inline comments to D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
jtl updated the diff for D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
Correct typo in the man page. (Thanks @bcr!)
jtl added inline comments to D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
jtl updated the diff for D24316: Make sonewconn overflow messages have per-socket rate-limits and values.
Make the overflow rate-limit be controlled by a sysctl.
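To make the idea of per-socket rate limiting with an occurrence count concrete, here is a small self-contained model. It is only a sketch under stated assumptions: `struct overflow_limiter`, `overflow_note()`, and the `min_interval` parameter (standing in for the sysctl-controlled limit) are hypothetical names, not the code in D24316.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-socket state; one instance would live in each listener. */
struct overflow_limiter {
	bool     logged;    /* has any message been emitted yet? */
	time_t   last_log;  /* time of the last emitted message */
	unsigned pending;   /* overflows seen since the last message */
};

/*
 * Record one listen-queue overflow at time 'now'.
 * Returns the number of occurrences to report if a message should be
 * logged now, or 0 if the message is suppressed by the rate limit.
 */
static unsigned
overflow_note(struct overflow_limiter *ol, time_t now, time_t min_interval)
{
	unsigned n;

	ol->pending++;
	if (ol->logged && now - ol->last_log < min_interval)
		return (0);
	n = ol->pending;
	ol->pending = 0;
	ol->last_log = now;
	ol->logged = true;
	return (n);
}
```

Because the state is kept per socket, a busy listener that is rate-limited does not suppress the first overflow message from a different listener.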
I'm going to do a tinderbox build of this + D24316 (mostly, to make sure I didn't mess up the various INET/INET6 combinations) and then commit them.
Apr 9 2020
Thanks for providing the extra context on the transport call today. Overall, this change looks good. I left a few nits in in-line comments.
As discussed on the transport call today, please change both the in-order and out-of-order data paths so we call sorwakeup_locked() (when necessary) after the SACK blocks are updated.
Apr 6 2020
Switch to using sbuf(9) for string creation. Also, use a constant string for "local:".
Apr 3 2020
In D24272#534063, @jhb wrote: I think this looks fine. It would be nice to use a string builder instead of memcpy/strcat, etc. like sbuf(9) to make it more robust to future changes.
FWIW, these are examples of the messages this produces:
Apr 3 19:58:34 c006 kernel: sonewconn: pcb 0xfffff805a566a200 (127.0.0.1:65432 (proto 6)): Listen queue overflow: 4 already in queue awaiting acceptance (1 occurrences)
Apr 3 19:59:44 c006 kernel: sonewconn: pcb 0xfffff80611de4000 ([::1]:65432 (proto 6)): Listen queue overflow: 4 already in queue awaiting acceptance (2 occurrences)
Apr 3 20:16:12 c006 kernel: sonewconn: pcb 0xfffff80170cde100 (local:/tmp/testsock): Listen queue overflow: 4 already in queue awaiting acceptance (1 occurrences)
Mar 12 2020
Mar 9 2020
Feb 27 2020
jtl added reviewers for D23655: Cubic: prevent abrupt cwnd jumps after slow start: lstewart, transport.
In D18624#458099, @jtl wrote: This looks like it does what is described. As I understand, the use of delivered_data will be covered by a separate review. It looks like this will slightly change the way sacked_bytes is calculated. The change is probably a good thing, but it is worth verifying (and I haven't done this yet) that the updated calculation will work correctly.
Feb 10 2020
Modify the vm.panic_on_oom sysctl to take a count of events.
Feb 7 2020
In D23517#516688, @markj wrote: In D23517#516336, @jtl wrote: In D23517#516299, @markj wrote: So, the problem manifests as the laundry queue steadily growing without any swapping in response?
We noticed this when we tried enabling encrypted swap. On the console, we see a string of processes killed due to the server being out of memory. Then, eventually, the watchdog timer fires and kills the system. I don't know what triggers this cycle. It seems to happen on a small percentage of systems hours to days after boot. To the best of my knowledge, we have not been able to observe a system descend into this naturally.
We tried to recreate the problem by artificially creating memory pressure. We ran a program that allocates a lot of memory (equal to the sum of the free and inactive sizes) and sequentially writes to each page in a loop. When we did this, we saw:
- The laundry size is growing.
- Processes are continually killed due to low memory.
- Finally, the watchdog kicks in and reboots the system.
Now, we may have recreated a different problem that has similar symptoms. But, at minimum, it seems like this is showing *a* bug.
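For reference, a memory-pressure generator of the kind described above can be sketched as follows. This is not the program used in the test: `dirty_pages()` is a hypothetical helper, it takes the allocation size as an argument instead of reading the free and inactive counts, and it runs a bounded number of passes rather than looping indefinitely.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate 'nbytes' and write one byte to every page,
 * 'passes' times over, so each page is dirtied repeatedly.
 * Returns the number of page writes performed (0 if allocation fails).
 */
static size_t
dirty_pages(size_t nbytes, unsigned passes)
{
	long ps = sysconf(_SC_PAGESIZE);
	size_t pagesz = ps > 0 ? (size_t)ps : 4096;
	/* volatile keeps the compiler from eliding the stores */
	volatile unsigned char *buf = malloc(nbytes);
	size_t writes = 0;

	if (buf == NULL)
		return (0);
	for (unsigned p = 0; p < passes; p++) {
		for (size_t off = 0; off < nbytes; off += pagesz) {
			buf[off] = (unsigned char)p;
			writes++;
		}
	}
	free((void *)buf);
	return (writes);
}
```

To approximate the scenario in the comment, `nbytes` would be set to roughly the sum of the free and inactive memory sizes and `passes` made large.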
I've done tests like this in the past. Indeed, if the program is dirtying pages quickly enough we might give up and attempt an OOM kill, but we should definitely be targeting the runaway process first. It's possible that we may have swapped its kernel stack out, in which case I believe we have to swap it back in to reclaim anything, since reclamation happens during SIGKILL-triggered process exit. (I don't see offhand why another thread couldn't call pmap_remove_pages() on the target process before it is swapped back in though.)
I have not tried testing with a GELI-backed swap device though. Presumably you were using one? Do you see a difference in behaviour when swap is unencrypted?
Feb 6 2020
In D23517#516299, @markj wrote: Note that the page daemon uses vmd_free_target as the PID controller set point, but its target may be larger than the instantaneous difference vmd_free_target - vmd_free_count. So if it manages to free enough pages to satisfy vm_paging_target(), but not enough to satisfy the PID controller target, it'll trigger the laundry thread's shortfall mode, which then does nothing (unless pageout_deficit happens to be bigger than the negative difference). This suggests to me that the page daemon should be storing the value of page_shortage in the vmd_shortage field. In other words, the page daemon has failed to meet its target by page_shortage pages, and the laundry thread should try and make up that difference.
Feb 5 2020
In D23517#516299, @markj wrote: So, the problem manifests as the laundry queue steadily growing without any swapping in response?
Oct 24 2019
jtl abandoned D16850: Update the fragment reassembly code's handling of overlapping fragments to conform to RFC 8200.
@bz will commit.
Basically committed by @bz.
Oct 1 2019
In D21840#477432, @rscheff_gmx.at wrote: TCP_INFO is not portable afaik; however, I wonder why the Linux variant has a flag for "SYN_DATA" (which may get delivered to the application if the TFO is present), but no flag for TFO, while this change is signaling the presence of TFO, but no SYN_DATA... Just curious...
Sep 27 2019
jtl added a comment to D19622: Fix panic in network stack due memory use after free in relation to fragmented packets.
In D19622#476255, @bz wrote: In a later call I think your suggestion was along the lines of:
(a) if ifnet goes away nuke the recvif pointer from the queued mbufs
(b) if reassembly times out do as outlined above and skip sending ICMP error/per-IF statistics/.. if the ifnet pointer was nuked
(c) if another fragment arrives (or the last fragment to complete the packet arrives) use that one's recvif pointer as that interface is expected to still be there to pass the packet on
Now there is a gray area between (b) and (c) in which (b) could be extended to "if we cannot find an ifnet pointer in the expected place, scan the fragments of the packet in question for any ifnet pointer and use that one" for error handling. It'd be a one-time slightly more expensive operation. If there's an attack however that is kind of the extra work you'd want to avoid. Without the extra work, you may not find out as easily what kind of problem you are running into though as you are lacking statistics. Trade-off...
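The steps above can be modeled in a few lines of C. This is only an illustrative sketch, not the reassembly code from D19622: `struct ifnet_model`, `struct frag_model`, `frag_purge_ifnet()`, and `frag_find_ifnet()` are hypothetical stand-ins for the real ifnet and mbuf structures.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct ifnet and a queued fragment. */
struct ifnet_model {
	int index;
};

struct frag_model {
	struct ifnet_model *rcvif;  /* receive interface, NULL once nuked */
	struct frag_model  *next;
};

/*
 * Step (a): when an interface goes away, clear every queued fragment's
 * pointer to it. Returns the number of pointers cleared.
 */
static unsigned
frag_purge_ifnet(struct frag_model *head, const struct ifnet_model *gone)
{
	unsigned n = 0;

	for (struct frag_model *f = head; f != NULL; f = f->next) {
		if (f->rcvif == gone) {
			f->rcvif = NULL;
			n++;
		}
	}
	return (n);
}

/*
 * Extended step (b): scan the fragments for any surviving interface
 * pointer. A NULL result means error reporting and per-interface
 * statistics should be skipped.
 */
static struct ifnet_model *
frag_find_ifnet(const struct frag_model *head)
{
	for (const struct frag_model *f = head; f != NULL; f = f->next) {
		if (f->rcvif != NULL)
			return (f->rcvif);
	}
	return (NULL);
}
```

The scan in `frag_find_ifnet()` is the "slightly more expensive operation" mentioned above; skipping it and treating a cleared pointer as "no interface" is the cheaper alternative.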
Sep 26 2019
In D14387#476034, @bz wrote: This may sound like a pain but as you already say in your message, this is two changes .. First .. Second ..
Can you split them up into such? First should be really easy to review and second should then be straightforward as well by itself.
Add new functionality to switch to using cookies exclusively when we the
Access the syncache secret directly from the V_tcp_syncache variable,
Remove the unused sch parameter to the syncache_respond() function. The