Dec 13 2020
Jun 20 2020
if_mibdata seems to be entirely unused. I can only find a reference in dev/ffec, and even there it is under #if 0.
Perhaps it should be removed?
If I can remember the theory I had way back when, the purpose of allowing writes to this data was simply to allow a client or administrator to clear the counters. Given the lack of use I think this is a reasonable simplification.
Jun 18 2020
I like this change; we've had nested variable declarations since 1989 and I think 30 years is more than enough time to get used to them. I'm a bit more cautious about declarations in the initializer of a for loop. I do acknowledge the concerns several developers have expressed above about stack depth; this information should be possible to derive from DWARF2 debugging information, so perhaps an automated tool could replace manual inspection of the source.
Apr 14 2020
I haven't been paying attention to this, but IME the problem with epair (and also bridge) interfaces was always that the generator would come up with the same MAC address on different *machines*. We simply gave up and manually assigned LAA MAC addresses to all such interfaces.
Apr 5 2020
I have been begging for this for years and strongly support.
Jan 3 2020
I wrote this, a very long time ago in a career far away. I suspect it can be deleted entirely, but this is a reasonable thing to do right now. Nobody should still be using any code that depended on the old kern.maxsockbuf sysctl node number.
Oct 4 2019
Did not review the code but I love the change. One less reason to install lsof!
Sep 16 2018
May 23 2018
Seems pretty sensible to me. These functions are of the same ilk (tzcode inventions) as timelocal (standardized as mktime()) and timegm (standardized as that stupid formula in the POSIX spec).
May 12 2018
May 11 2018
May 10 2018
As an aside, @jtl privately questioned the use of ENOMEM vs ENOBUFS. The TCP stack uses ENOBUFS and ENOMEM interchangeably for malloc failures, and @jtl is wondering if we should standardise on one or the other as far as user space app error handling is concerned, to aid portability. This LKML thread and the Linux socket(2) man page suggest that returning both is kosher from both a POSIX point of view and as established precedent. I haven't consulted the standards myself, but it seems plausible that we should add ENOMEM to our tcp(4) - here is a revised patch that does so. @emaste, you were singled out as someone with a good working knowledge of POSIX... do you have any comment on the subject of socket API ENOMEM vs ENOBUFS errno usage?
May 9 2018
It would be really nice to see this fixed sooner rather than later. I can't tell from the phab UI who's put a block on this, but could we maybe move forward a bit? The current messages are both annoying and unhelpful.
Looks good to me.
Is it planned to address the client side (e.g., nss-pam-ldapd) or just the server?
It's probably worth a little bit of thought as to what is the more common case, a zillion epair interfaces on one host (read: half a zillion vnet jails) or a much smaller number of epairs on a larger number of hosts. It sounds like you are well placed to assign your own addresses, whatever the default may be.
Well, this is not a network hot path, and we now have the jenkins_hash32() kernel function in all supported branches, so we could just allocate an additional uint32_t key array on this function's stack, copy in all 64 bits of the (long)hostid plus the (uint32_t)if_index, and use jenkins_hash32() to mix them into a 32-bit hash value. Then either use it as-is for the eaddrs, in a manner similar to the current one, or switch to using 58:9C:FC for the upper half of the MAC and the lower 24 bits of the hash for the rest, resetting the lowest bit to 0 for the first interface and setting it to 1 for the second.
May 8 2018
Well, I have many hosts with over a thousand ngXXX interfaces each, so yes, at least two bytes are needed.
May 6 2018
I'd like to see a few more bits of uniqueness here; I don't trust that two bytes of hostid is going to be sufficient. Nonetheless, this is a good thing. (We probably won't use it -- we embed the host's canonical IPv4 address into the MAC -- but it's important that the default out-of-the-box config be usable!)
Jul 25 2017
Yes, you're touching on the underlying awkwardness here, which is that the original overflow logging happens in sonewconn, at the generic socket layer. I toyed with the idea of simply changing the message at the generic socket layer, but everything I came up with either had gross layering violations (e.g. having a socket layer function with knowledge of all the potential so_pcb types) or got fairly complicated with arguably poor abstractions (e.g. adding a protosw call for something like char *pr_getsockdescr). I'm definitely open to suggestions here though.
Have to say, I cannot imagine any circumstances in which the message that's on by default would be useful when the message that's off by default isn't printed. The PCB address is useless debugging information here, whereas the 4-tuple of the dropped connection is actually meaningful -- so I'd reverse them (or better yet, figure out a way to combine the two messages while hiding the PCB address under a debug flag).
Aug 24 2016
Looks good to me.
May 27 2016
I've been meaning to do something like this over the past year (since I made my last comment) but haven't had the time. But after attending a few conferences I became convinced that what I really wanted was to capture exact timing for all RPCs and have a usermode program analyze the data via something like ALQ. In particular, I want to measure median and tail latency (which is going to be different for different operations) and also queue-residence time in the various queues a request moves through. This may be too expensive for some (many?) applications, so I don't expect to see that sort of thing here.
Mar 8 2016
Feb 22 2016
Haven't read the diffs but I like this very much in concept.
Feb 14 2016
I agree that this gives correct results as is, but (as the original perpetrator) I'd rather see an explicit unsigned type called out here -- e.g., ~(some_type_t)0 -- rather than depending on two's-complement sign extension behavior, if you're not going to use a macro. Using bitwise operators on signed values makes me uncomfortable, and I wouldn't want to rely on WG14's good graces....
Jan 7 2016
Dec 28 2015
Oct 30 2015
Sep 7 2015
Sep 6 2015
For what it's worth, in all my testing -A and -R modes have been a small but significant loss, so I'll be interested to see if this mode fixes that.
Can we add some unit tests, please?
Agreed that the current behavior is a bug and we should just fix it. I think pkelsey is right as far as the correct fix.
Also, I'd rather have a random local part with a slight chance of collisions than something deterministic, since the moment you bridge them out to a real Ethernet you get a lot of conflicts if you have multiple such machines on the same VLAN.
The only real solution to avoid long-term conflicts is for people to have their own management and statically assign the Ethernet address to the jail, so we don't have to guess one.
No objection in principle but it's been far too long since I had anything to do with this code to express an opinion on the change itself.
This is now overtaken by events in mckusick's commit r287497.
Apr 10 2015
See this image for a demonstration of the difference correctly sized hash tables make. (You can ignore the red pluses; that's just the default sizing for everything.) This is the metadata-intensive SWBUILD workload of the SPEC SFS 2014 benchmark; I'm collecting data for the other workloads presently.
In D2265#14, @kostikbel wrote:
The hash which is not resized is most likely the vfs_hash (vfs_hash.c). It is quite harmless not to adjust the hash size after desiredvnodes changes at runtime.
There's nothing actually wrong with a SYN+FIN packet; the TCP spec unambiguously defines how it's supposed to be handled, and the code does it correctly as is (by turning off the FIN bit), which certainly seems to be within the spirit of what "scrub" is supposed to do. I'm not sure I see what the problem this is trying to solve is.
Apr 9 2015
This version actually works, and restores the run-time modifiability. The way I've implemented it preserves the compiled-in limit of MAXVNODES_MAX at boot time only; probably it should be bypassed or made into a louder warning when set as a tunable, or else enforced at run time as well. (It's not clear whether MAXVNODES_MAX is intended to be a true limit or just an additional sanity check for the auto-tuning implementation.)
I just hate it when I have to reboot the system to do some tuning. I think that a read-write sysctl that is somewhat suboptimal is better than a read-only one. Making the sysctl read-only would be a step in the wrong direction.
I don't see harm in making it a tunable; that is good. But I don't like the idea of making it read-only. I believe that loader-only tunables are too inconvenient and should be avoided when possible.
Don't overwrite the value specified in the tunable when we initialize desiredvnodes.
Apr 1 2015
Moving the svc_change_space_used() call out of the loop was something of an optimization when modifying a variable shared by many threads, but that may not be so important.
Mar 31 2015
This code is running on a production fileserver now. Waiting for the all-arches compile test before I'm ready to commit.
Mar 30 2015
Previous patch had a use-after-free bug (that should have been obvious). This version reintroduces the "sz" variable to prevent this.
NB: This diff is relative to 10.1. Context lines have changed in 11-current so the diff won't apply directly.
Mar 29 2015
Speaking as a user, I would really love to see some form of this functionality happen.
Remove the redundant overflow check and expand the comment to make it clear what's going on when we compute the request buffer limits. This should be the last iteration of this patch. This version isn't running on a production server yet; hopefully that will happen on Monday.
It looks ok to me. I might have used int64_t instead of long, so that they're 64 bits on all arches, but I don't think it will matter, since nmbclusters won't be that large for 32-bit arches, I think?
Mar 28 2015
This version eliminates the variable "sz" in svc_run_internal() entirely by not batching updates to the available request space.
In the previous diff, arithmetic overflow resulted in incorrect values being computed. This version changes all of the data types to be longs rather than ints, as is more appropriate for 64-bit machines. (I haven't compile-tested on an ILP32 machine to see if it even still builds -- I suspect not, and I'll need to figure out a way to work around the lack of atomic 64-bit ops on 32-bit architectures.)
This simple change actually failed even worse than the original code. I suspect signedness problems surfaced by the switch from gcc 4.2 to clang (we just upgraded from 9.3 to 10.1 on our NFS servers), and I'm preparing a new diff that tries to deal with that.
Jan 24 2015
Linux emulator support? I'm assuming it's already stubbed out (haven't looked) and just needs to be switched to the new internal interface.
Aug 24 2014
Shouldn't all of these BLOCKS declarations be moved down to the main BSD_VISIBLE block, since they are non-standard interfaces?
As Jilles says, there's no requirement to have this check. But since access() is not called very frequently, adding one branch seems unlikely to hurt.