I'll do a CFT on this change with OPNsense users next week. A few can trigger the if_afdata panics with PPPoE.
Fri, Mar 14
Simplified this and marked it as a POC to keep the discussion as a reference.
In D49212#1125353, @melifaro wrote: The code cleanup in the diff LGTM - added a couple of comments.
Tue, Mar 11
Fri, Mar 7
Happy to see this progress. Do you think removing RT_LINK_IS_UP without adding the additional conditional would be a useful cleanup nonetheless? Half the code uses NH_IS_VALID and the other half RT_LINK_IS_UP, and all of it is within nexthop (NH) scope nowadays.
Thu, Mar 6
What this situation tells me is that something fills the table while we read it in pfr_get_addrs() with the lock held. That points to a missing lock somewhere else, possibly introduced as far back as https://cgit.freebsd.org/src/commit/?id=890612bbeb69f
In my local testing a KASAN kernel lets me push 100 Mbit/s at most, so asking the user to deploy it in production and trigger a bug that only happens under traffic is going to be a challenge. I can try, but I'm not optimistic.
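To illustrate the suspected failure mode (this is not pf's code): a reader sizes a buffer from the entry count under the table lock and then walks the table; if a writer adds entries without taking that lock, the walk yields more entries than the buffer holds. The hypothetical table_get_addrs() below refuses to write past the caller's buffer and reports the required size instead. It is only a band-aid that makes the corruption detectable; the real fix would be the missing lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table: a fixed array with a live entry count. */
struct addr_table {
	unsigned int entries[64];
	size_t cnt;
};

/*
 * Copy the table into buf. *size is in/out: buffer capacity on entry,
 * entries copied on success. If the table holds more entries than fit
 * (e.g. it grew after the caller sized buf), nothing past the buffer is
 * written, *size is set to the required count and ENOMEM is returned.
 */
static int
table_get_addrs(const struct addr_table *t, unsigned int *buf, size_t *size)
{
	size_t i;

	assert(t != NULL && size != NULL);
	for (i = 0; i < t->cnt; i++) {
		if (i == *size) {
			/* Table outgrew the buffer: report the size needed. */
			*size = t->cnt;
			return (ENOMEM);
		}
		buf[i] = t->entries[i];
	}
	*size = i;
	return (0);
}
```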
I'm happy to pass an alternative patch on to the affected users, but I'm unable to reproduce this locally.
In D49214#1123225, @markj wrote: In D49214#1122687, @franco_opnsense.org wrote: I'm tracking down what looks to be a memory corruption of some sort:
Have you tried testing a GENERIC-KASAN kernel?
also remove the other assert
removed assert in tree traversal hotpath
Tue, Mar 4
I'm tracking down what looks to be a memory corruption of some sort:
Last bits in dmesg, perhaps unrelated:
Mon, Mar 3
Wed, Feb 26
Let me throw in the towel here. I've left two review comments.
First of all thanks for your work. I'm in the same boat as you wanting to see this progress.
Tue, Feb 25
Thu, Feb 20
Also confirmed now, but the reporters have operational remarks when running with the fix; see https://github.com/opnsense/src/issues/239#issuecomment-2669497329
Wed, Feb 19
In D49053#1118526, @zlei wrote: IN_ is short for inet, but no doubt it is a little ambiguous, esp. for newbies. That requires network domain knowledge, but apparently IN6_ is much better.
Linux KPI has a pragmatic approach:
Did the same here https://github.com/opnsense/src/commit/8f86d0fdd37 but haven't heard from the user yet whether the provided test kernel works for them.
Tue, Feb 18
Jan 8 2025
@des did you find the time to make a technical assessment? thanks!
Dec 23 2024
The Common Criteria aspects are documented here https://docs.vmware.com/en/VMware-SD-WAN/6.0/VMware-SD-WAN-Administration-Guide/GUID-DF592A36-E680-44CE-ABFC-ACE19B55B448.html
Dec 20 2024
Dec 18 2024
change summary
@kp let me know how to proceed
Dec 16 2024
We've tested https://github.com/opnsense/src/commit/7f55b75b successfully with the user in OPNsense.
Dec 13 2024
Dec 11 2024
Dec 6 2024
Dec 5 2024
Done, for reference: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283137
Dec 4 2024
Potentially fixed by c22c98798 ;)
This commit likely causes the following panic reported by multiple users:
Dec 3 2024
Offered a test kernel to the user in the meantime. @glebius is probably busy (going by the last time I spoke with him) and presumably wouldn't mind if you committed it?
Like this?
Dec 2 2024
In D46301#1079978, @melifaro wrote: I'm going to come up with a different version of this patch (likely using a new flag rtmsg->rtm_flags to signal RTM_F_FORCE) in a day or two. The current version allows all netlink customers to fully bypass PINNED route protection, which defeats its purpose.
For the record, I have no stake in SCTP and I'm not involved in the changes done here. Working on this with the user who reported it is a courtesy to the FreeBSD pf code, and triaging panics is, I think, simply a good idea for production code.
What's the status now? From my point of view, I've explained the previous behaviour, the change that caused it to exhibit the wrong behaviour, and how it was fixed. I'd like to get to a more productive cooperation based on technical urgency. I don't think tests prevented this from happening, and I don't think the issue gets any better by discussing commit messages.
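A rough sketch of the check such a flag would enable: a pinned route stays protected unless the request explicitly asks to override it, so netlink consumers no longer bypass the protection by default. The flag names, their values, check_pinned() and the EEXIST return are all hypothetical; the actual patch may differ.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values for illustration only. */
#define ROUTE_PINNED  0x01  /* existing route is pinned */
#define REQ_FORCE     0x02  /* request carries an RTM_F_FORCE-style flag */

/*
 * Decide whether a netlink request may replace or delete an existing
 * route. Returns 0 if allowed, or an error (EEXIST assumed here) when a
 * pinned route would be touched without an explicit force request.
 */
static int
check_pinned(int route_flags, int req_flags)
{
	if ((route_flags & ROUTE_PINNED) == 0)
		return (0);   /* not pinned: normal rules apply */
	if (req_flags & REQ_FORCE)
		return (0);   /* caller explicitly opted in */
	return (EEXIST);
}
```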
Nov 28 2024
Here it is:
Nov 27 2024
Feel free to make a different suggestion. I've long been under the impression that explaining one-liners is not the scope of a commit message, and that the tests should make this rather clear, as requested in previous discussions.
I changed the summary.
D47658 is on hold as I struggle to understand why the code discussed there is not covered by existing tests and how to actually trigger it (it should easily panic, after all, but the code is never hit).
Nov 18 2024
In D47658#1086724, @kp wrote: The entire backtrace would have been nice.
Sounds good to me, thanks :)
Meh, sorry for the misinformation. All of this is very difficult to trace.
(kyua depends on e.g. libexec/atf-sh for script execution via shebang in /usr/src/tests/sys/netpfil/pf/*.sh)
I have WITHOUT_TESTS=yes and WITH_TESTS_SUPPORT=yes, which adds kyua and libatf but not the atf tools in libexec. On a build without WITHOUT_TESTS=yes, the atf tools end up in the base system as a side effect.
Sure. For context:
I'm trying to understand the entry barrier here. If kyua fixes everything, that's alright, and thanks for the pointers. From a release engineering perspective there are a number of kyua integration challenges irrelevant to your argument (which I can understand), but you both seem to view the situation too simply from an established development workflow ("just do it right"). Using a custom src.conf already renders kyua defunct, but it is what it is. I'm going to raise appropriate patches elsewhere then. Thanks!
Nov 15 2024
So, setting aside that I already stated this is a technical issue and by no means "pointless", what do you suggest to make this particular test more robust? Uncontrolled creation of processes that inherit file descriptors isn't exactly clean design, but I can see why you do not want to apply this mere band-aid with that larger issue at hand. I'm happy to do the work, since a lot of people were asking for test cases and here I am offering work on test cases to get started. :)
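One common way to keep spawned processes from inheriting a descriptor is to mark it close-on-exec. A minimal sketch using POSIX fcntl(2); whether this fits the test in question is an open question, and set_cloexec() is a made-up helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Mark fd close-on-exec so that children created via fork() + exec()
 * do not inherit it. Returns 0 on success, -1 with errno set on failure.
 */
static int
set_cloexec(int fd)
{
	int flags;

	flags = fcntl(fd, F_GETFD);
	if (flags == -1)
		return (-1);
	return (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0);
}
```

Where the descriptor is created by the code itself, passing O_CLOEXEC to open(2) (or using pipe2(2) with O_CLOEXEC) avoids the window between creation and the fcntl() call in which another thread could fork.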
Nov 14 2024
Thanks. Just to be clear, you're implying the change is wrong even though the test still works?
Consider running tests from the src tree using atf-sh (I'm using devel/atf, but I think the base one also works with the full path):
Nov 13 2024
Nov 8 2024
- libfetch: shuffle SSL_CRL_VERIFY options
"all" to "chain" is also fine. We can do "none" but it will just add conditionals to the code and the libfetch style works with implicit defaults everywhere. Your choice :)
So I avoided the "none" case for lack of functionality and added the "leaf" case (could call it "one") for flexibility. "opt" or "optional": I'm not attached either way. I also added the error number to the default message, which has bugged me for a while now while testing this, and the optional message that no CRL was provided now goes to stderr to make sure the user sees it. In the pkg case, fetch_info appeared to be suppressed somewhere. What do you think? :)
- libfetch: rewrite SSL_CRL_VERIFY behaviour
- libfetch: add the error number to verify callback failure case
- libfetch: wording
- libfetch: redo docs
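A sketch of how an enum-valued SSL_CRL_VERIFY could be parsed with "leaf" and "chain" and no explicit "none", keeping the implicit default when the variable is unset. The enum names and parse_crl_verify() are hypothetical, not libfetch's code. In OpenSSL terms, "leaf" would plausibly map to X509_V_FLAG_CRL_CHECK and "chain" to X509_V_FLAG_CRL_CHECK | X509_V_FLAG_CRL_CHECK_ALL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical modes for an SSL_CRL_VERIFY-style variable. */
enum crl_verify { CRL_VERIFY_DEFAULT, CRL_VERIFY_LEAF, CRL_VERIFY_CHAIN };

/*
 * Map the variable's value to a mode. Unset or empty keeps the implicit
 * default; unknown values return -1 so the caller can report the error
 * instead of silently ignoring it.
 */
static int
parse_crl_verify(const char *val)
{
	if (val == NULL || *val == '\0')
		return (CRL_VERIFY_DEFAULT);
	if (strcmp(val, "leaf") == 0)
		return (CRL_VERIFY_LEAF);   /* check the peer certificate only */
	if (strcmp(val, "chain") == 0)
		return (CRL_VERIFY_CHAIN);  /* check every certificate in the chain */
	return (-1);
}
```

A caller would pass getenv("SSL_CRL_VERIFY") straight in, so the unset case needs no extra conditional, in line with libfetch's implicit-defaults style.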
Nov 6 2024
In D47433#1082491, @michaelo wrote: I don't disagree, but introducing multiple vars for the same config isn't better either in my opinion. Consider you want to expose that to the CLI for fetch(1), do you want to introduce multiple switches?
- lib/libfetch: feedback on previous
In D47433#1082488, @michaelo wrote: Well, then maybe SSL_VERIFY_CRL should not be a boolean but rather an enum? E.g. optional, yes, much like https://httpd.apache.org/docs/current/mod/mod_ssl.html#sslverifyclient, because in the end it will require more and more flags. The default value would be none/NULL.
In D47433#1082486, @michaelo wrote: Like fine, but then CR, not CRL, because we don't verify the list, do we? :-D Since it is a *verbose* flag, I don't mind being verbose literally.
As for SSL_VERIFY versus SSL_CRL, I'm not sure. Keeping it closer to SSL_CRL_FILE may be more beneficial, also with SSL_CRL_OPTIONAL in mind later. I don't want these vars to get too long if it can be avoided; maybe cluster all CRL settings under the SSL_CRL prefix?
In D47433#1082483, @michaelo wrote: WDYT?
In D47433#1082468, @michaelo wrote: I have now played around with the patch and one of our intermediate CAs:
$ openssl s_client -connect dw-eng-rsc.innomotics.net:443
CONNECTED(00000003)
depth=2 C = DE, ST = Bayern, L = Muenchen, O = Siemens, serialNumber = ZZZZZZA1, OU = Siemens Trust Center, CN = Siemens Root CA V3.0 2016
verify return:1
depth=1 C = DE, ST = Bayern, L = Muenchen, O = Siemens, serialNumber = ZZZZZZE7, CN = Siemens Issuing CA Intranet Server 2022
verify return:1
depth=0 C = DE, O = Siemens, OU = IN HVM DW, CN = dw-eng-rsc.innomotics.net
verify return:1
It works with a CRL file as well as with CRLs hashed in /etc/ssl/certs, but certctl(8) is totally unusable here (read: broken), so I had to do it manually. Obtaining the CRLs from all CAs in that chain and converting them from DER to text is a pain, and technically it needs to happen periodically. Completely off topic: I consider CRLs a total pain; I cannot tell whether OCSP is any better, but that is a different discussion. My only concerns are with the verbose output, which I will add inline.
In D47433#1082467, @michaelo wrote: While testing this, do you intend to add a flag to fetch(1) as well? E.g., --crl-verify?
Nov 5 2024
In D47433#1082177, @michaelo wrote: I think I have found it; the documentation isn't really good in this case for either SSL_CTX_load_verify_locations() or SSL_CTX_set_default_verify_paths(). If a hashed dir is passed, it boils down to https://github.com/openssl/openssl/blob/ccaa754b5f66cc50d8ecbac48b38268e2acd715e/crypto/x509/x509_d2.c#L73-L76 where the manpage says:
X509_LOOKUP_add_dir() passes a directory specification from which certificates and CRLs are loaded on demand into the associated X509_STORE. type indicates what type of object is expected. This can only be used with a lookup using the implementation X509_LOOKUP_hash_dir(3).
Do you copy?
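To make "loaded on demand" concrete: with a hashed directory, OpenSSL's hash_dir lookup does not read every file up front. When it needs an issuer certificate or a CRL, it computes the subject name hash and probes files named <hash>.<n> for certificates and <hash>.r<n> for CRLs, which is the layout `openssl rehash` produces. The sketch below only builds those candidate names; computing the hash (X509_NAME_hash() in OpenSSL) is left out, and hash_dir_name() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the file name a hashed-directory lookup would probe for the n-th
 * certificate (is_crl == 0) or CRL (is_crl != 0) with the given subject
 * name hash, e.g. "/etc/ssl/certs/1234abcd.0" or ".../1234abcd.r0".
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int
hash_dir_name(char *buf, size_t len, const char *dir, unsigned long hash,
    int is_crl, int n)
{
	int r;

	r = snprintf(buf, len, "%s/%08lx.%s%d", dir, hash & 0xffffffffUL,
	    is_crl ? "r" : "", n);
	return (r < 0 || (size_t)r >= len ? -1 : 0);
}
```

This is why a CRL that is merely present in /etc/ssl/certs under an arbitrary name is not found: it has to sit under the issuer's hash with the "r" infix.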
remove unused variable
Nov 4 2024
Oct 11 2024
Let's get this in for the sake of correctness, although my testing was inconclusive; I will circle back eventually.