Why not move the old_method: label above the stack variables' declaration? I think it may be cleaner to read.
Like this:
Fri, Jan 31
Wed, Jan 8
Jan 6 2025
Dec 18 2024
Dec 11 2024
Dec 10 2024
Additional comment:
Nov 25 2024
Nov 19 2024
update:
It looks like this patch significantly reduces the number of fragments, i.e. TSO data chunks with (data_size % MSS) > 0: testD47474
TSO not enabled:
Nov 14 2024
In D47474#1084985, @rscheff wrote:
Well, I had the same thought - the full MSS (including options) is less frequently used than the MSS without options...
So, it would certainly be more efficient to store the MSS excluding options in tcpcb, and calculate the MSS including option space only where that is needed...
But that should be a different Diff IMHO, as this one is more in the break/fix category.
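The idea in the quote above can be sketched roughly as follows. This is only an illustration under stated assumptions: struct demo_mss_cb, its fields, and both helper functions are hypothetical names made up for this sketch and are not part of the real struct tcpcb or of the patch under review.

```c
#include <stdint.h>

/* Hypothetical, simplified control block; NOT the real struct tcpcb. */
struct demo_mss_cb {
	uint32_t mss_payload;	/* MSS excluding TCP option space */
	uint32_t optlen;	/* bytes of TCP options carried per segment */
};

/* Frequent case: payload bytes per segment, read directly from the block. */
static inline uint32_t
demo_payload_mss(const struct demo_mss_cb *cb)
{
	return (cb->mss_payload);
}

/* Less frequent case: MSS including option space, computed on demand. */
static inline uint32_t
demo_full_mss(const struct demo_mss_cb *cb)
{
	return (cb->mss_payload + cb->optlen);
}
```

With an assumed payload MSS of 1448 and 12 bytes of options, demo_full_mss() yields 1460, while the frequently used value needs no arithmetic at all.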
Nov 4 2024
I think you meant the title to be:
tcp: consistently set CWND to MSS => tcp: consistently set CWND to 1
in case of SYN/SYN ACK retransmissions => in case of SYN retransmissions
Oct 28 2024
OK. I am approving it now as my test in https://wiki.freebsd.org/chengcui/testD43470 shows some improvement. Any bug related observations can be fixed later.
Oct 24 2024
Also, please correct the SUMMARY section:
Oct 23 2024
In D30155#1076291, @kbowling wrote:
In D30155#1075233, @cc wrote:
From my test results in testD30155, I didn't see any significant improvement:
- no significant difference in ping latency
- no significant iperf3 performance improvement due to bad performance (3.x Gbps) in FreeBSD 15-current vs. (9.x Gbps) in stock Linux kernel 5.15.
Thanks for the results @cc. Something seems very strange with the throughput there, the main system I am testing is a xeon-d that is much less than 1/4th as powerful and can line rate both directions no issues and I also have an older 2x Xeon E5-2695 v2 (two NUMA domains) without throughput limitations. I will see if I can find my emulab credentials and take a look there, it seems like these might be 4-way NUMA machines but it is not expected to me that that would cause this magnitude of throughput issues, especially at the 10gbit data rate.
Oct 22 2024
Oct 21 2024
Oct 17 2024
update code based on discussion
From my test results in testD30155, I didn't see any significant improvement:
Oct 16 2024
Better now. But it can be cleaner.
Add the __inline keyword to avoid overhead when possible.
Oct 15 2024
My current concern is that the definitions and usage of the superset macros TH_FLAGS and TCPF_ALL are inconsistent. For example, TH_ECE is in TH_FLAGS, but TH_ECN is in TCPF_ALL.
In D30155#1073639, @kbowling wrote:
Ok this is a bit messy code and comment wise but I have the new algorithm working in what I believe to be the correct way with some bug fixes versus the origin and would like some data to see how to proceed before tidying everything up.
@cc it looks like emulab has ix(4) on d820s nodes, would you be willing to take a look at these 3 options similar to the e1000 test?
- Default in HEAD/STABLE: sysctl dev.ix.<N>.enable_aim=0
- New algorithm (on by default with this patch) sysctl dev.ix.<N>.enable_aim=1
- Old algorithm (FreeBSD <10) sysctl dev.ix.<N>.enable_aim=2
Oct 14 2024
My current concern is that the new code for TH_AE should be in a separate patch, so that this patch can be a purely non-functional change.
Oct 11 2024
Need code update.
Because of commit 440f4ba18e3a, please re-base.
Oct 10 2024
By the way, based on my test, I didn't find the statement "In addition, cwnd used to be 1 MSS right after RTO, increasing to 2 MSS more recently." in your SUMMARY section to be true. Also, "Address this by setting up snd_recover just in cc_cong_signal." needs to be revised.
With the provided packetdrill scripts before/after the fix, my test result is in my wiki: testD43355.
Oct 9 2024
I have no problem with this patch after testing it in Emulab. The test result is in my above comment.
If I recall these machines are Pentium 4 era and pretty CPU constrained. You can try the tunable 'hw.em.unsupported_tso=1' and then enable TSO on the interface to get some more bulk bandwidth, they are stable with TSO.
Are you able to detect any improvements or regressions otherwise? ping-pong time at low packet rate between two systems both set with enable_aim=0,1,2 would be interesting.
Oct 2 2024
In D46768#1069027, @kbowling wrote:
In D46768#1069015, @cc wrote:
In D46768#1067607, @kbowling wrote:
@cc this code works well in my testing. There are now some quality of life improvements, at runtime you can now switch in the middle of a test. I run a tmux session with three splits, one of systat -vmstat, one of the benchmark (iperf3 or whatever), and one to either toggle sysctl dev.{em,igb}.<interface number>.enable_aim=<N> where <N> description which follows. You can also do something like sysctl dev.igb.0 | grep _rate to see the current queue values.
Existing static 8000 int/s behavior (how the driver is in main):
sysctl dev.igb.0.enable_aim=0
Suggested new default, you will boot in this mode with this patch:
sysctl dev.igb.0.enable_aim=1
Low latency option of above algorithm (up to 70k ints/s):
sysctl dev.igb.0.enable_aim=2
ixl(4) algorithm bodged in that would need to be cleaned up:
sysctl dev.igb.0.enable_aim=3
I would be curious to know what you find with these different options in an array of testing and I will use the results to ready this for actual use.
I didn't find any rate change by the sysctl. Please let me know if the hardware does not support this new change.
root@s1:~ # sysctl dev.em.2.enable_aim=0
dev.em.2.enable_aim: 0 -> 0
root@s1:~ # sysctl dev.em.2 | grep _rate
dev.em.2.queue_rx_0.interrupt_rate: 20032
dev.em.2.queue_tx_0.interrupt_rate: 20032
root@s1:~ # sysctl dev.em.2.enable_aim=1
dev.em.2.enable_aim: 0 -> 1
root@s1:~ # sysctl dev.em.2 | grep _rate
dev.em.2.queue_rx_0.interrupt_rate: 20032
dev.em.2.queue_tx_0.interrupt_rate: 20032
root@s1:~ # sysctl dev.em.2.enable_aim=2
dev.em.2.enable_aim: 1 -> 2
root@s1:~ # sysctl dev.em.2 | grep _rate
dev.em.2.queue_rx_0.interrupt_rate: 20032
dev.em.2.queue_tx_0.interrupt_rate: 20032
root@s1:~ #
This looks to me like it is working, the algorithm is dynamic and 20k would be latency reducing idle queue. At enable_aim=0, you would see 8000. 20k looks right for an idle queue, what happens if you place a bulk load through it like iperf3? It should drop down to 4k.
In D46824#1068983, @tuexen wrote:
In D46824#1068981, @jhb wrote:
I can fix the type mismatch during commit. I have not looked to see if other stacks are affected.
Fixing the type mismatch would be good. I think other stacks are not affected, since I think they do not send a FIN before any outstanding data is ACKed and nothing is buffered anymore.
In D46768#1067607, @kbowling wrote:
@cc this code works well in my testing. There are now some quality of life improvements, at runtime you can now switch in the middle of a test. I run a tmux session with three splits, one of systat -vmstat, one of the benchmark (iperf3 or whatever), and one to either toggle sysctl dev.{em,igb}.<interface number>.enable_aim=<N> where <N> description which follows. You can also do something like sysctl dev.igb.0 | grep _rate to see the current queue values.
Existing static 8000 int/s behavior (how the driver is in main):
sysctl dev.igb.0.enable_aim=0
Suggested new default, you will boot in this mode with this patch:
sysctl dev.igb.0.enable_aim=1
Low latency option of above algorithm (up to 70k ints/s):
sysctl dev.igb.0.enable_aim=2
ixl(4) algorithm bodged in that would need to be cleaned up:
sysctl dev.igb.0.enable_aim=3
I would be curious to know what you find with these different options in an array of testing and I will use the results to ready this for actual use.
Oct 1 2024
I think this change also applies to the bbr and rack stacks.
Looks good to me. Thanks for removing the goto label skip_alloc, which improves readability.
Sep 27 2024
In D46793#1067415, @cc wrote:
Does the summary section need to be updated? I didn't find the mentioned leaking part in code. Or am I missing something?
Does the summary section need to be updated? I didn't find the mentioned leaking part in code. Or am I missing something?
In D46768#1067199, @kbowling wrote:
Rebase on main and some small improvements and bug fixes. Upon more testing the reimported algorithm is tuned for igb and less governed than intended on lem/em due to a different unit of measure on the ITR register. Need to think a little on how I would like to handle that.
Sep 24 2024
Thanks for adding me as one of the reviewers. I will look at this patch and most likely test it on one of the machines in Emulab.
Sep 17 2024
re-base
re-base after commit b6c137de0af1
update function names based on Michael's suggestion
Sep 5 2024
split this patch into two parts: this patch and D46546
re-base
Sep 4 2024
Besides, I am wondering if TCPSTAT_INC(tcps_sndacks) and TCPSTAT_INC(tcps_sndtotal) consistency can also be improved after successful syncache_respond().
Sep 3 2024
Your summary section claims "In addition, cwnd used to be 1 MSS right after RTO, increasing to 2 MSS more recently." But I could not find the code change where cwnd is changed to 2MSS after RTO. Please elaborate if the summary needs to be revised or I am missing the point.
Aug 26 2024
Are you planning to remove the corresponding code in kernel space in a separate patch?
Aug 22 2024
Aug 15 2024
Aug 14 2024
Aug 11 2024
Aug 10 2024
Aug 8 2024
Aug 7 2024
Aug 6 2024
Jul 29 2024
I thought for a while about whether these new lines could be wrapped into a new function like try_cc_attach_from_listener(), but that looks unnecessary.
In D46141#1052243, @peter.lei_ieee.org wrote:
LGTM.
I think for those who want newly accepted connections to use a "new" default stack, there is a mechanism using tcpsso that could be used to (attempt to) change the listening socket's stack so that new connections use that stack. This would need to be run for all desired listening sockets though.
Jul 26 2024
This change seems to revert commit 6134aabe38c8. Is there any behavior change when V_functions_inherit_listen_socket_stack == 0 after this change? Is the TCP function block from the listener changed dynamically once the default stack is changed?
ship after update
Looks good to me, after the comment update for struct cc_var.
Jul 25 2024
In D46066#1051183, @tuexen wrote:
In D46066#1050919, @cc wrote:
Just saw https://reviews.freebsd.org/D46068. If this patch has not been committed, we can still further revise it, so that D46068 can be cleaner on function re-use.
I think the logic here can be split into two functions, such that the refined logic 1 on checking if we should send ACK can be re-used across multiple places.
logic 1:
    bool
    is_ack_unlimited(struct tcpcb *tp)
    {
            /*
             * focus on checking the epoch and
             * if should send ACK, return true; else return false
             */
    }
logic 2:
    void
    tcp_send_challenge_ack(struct tcpcb *tp, struct tcphdr *th, struct mbuf *m)
    {
            if (is_ack_unlimited(tp)) {
                    tcp_respond();
                    ....
            }
    }
I don't get the point. Eventually, all places should use tcp_send_challenge_ack(), I think. But that is multiple commits away.
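For readers following the two-function split discussed above, here is a minimal userspace sketch of how the check and the send path could be separated. It is an assumption-laden model, not the code from D46066: struct demo_ack_cb, DEMO_ACK_LIMIT, the one-second window, and both demo_ functions are hypothetical, and the real tcp_respond() call is replaced by a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; NOT the real struct tcpcb. */
struct demo_ack_cb {
	uint32_t ack_epoch;	/* second in which the current window began */
	uint32_t ack_count;	/* challenge ACKs sent in this window */
};

#define DEMO_ACK_LIMIT	5	/* assumed budget per one-second window */

/*
 * Logic 1: decide whether another challenge ACK may be sent.
 * A new window starts whenever the clock differs from the stored epoch.
 */
static bool
demo_is_ack_unlimited(struct demo_ack_cb *cb, uint32_t now_sec)
{
	if (now_sec != cb->ack_epoch) {
		cb->ack_epoch = now_sec;
		cb->ack_count = 0;
	}
	if (cb->ack_count >= DEMO_ACK_LIMIT)
		return (false);
	cb->ack_count++;
	return (true);
}

/*
 * Logic 2: send only when logic 1 allows it.
 * Returns true when an ACK would have been sent.
 */
static bool
demo_send_challenge_ack(struct demo_ack_cb *cb, uint32_t now_sec)
{
	if (!demo_is_ack_unlimited(cb, now_sec))
		return (false);
	/* A real stack would build and transmit the ACK here. */
	return (true);
}
```

Keeping the rate check in its own function is what would let other call sites reuse it without going through the send path, which is the re-use argument made in the quoted comment.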