Details

Reviewers

glebius
gallatin
imp
tuexen
peter.lei_ieee.org
ali_mashtizadeh.com

Summary

Add eventlog(9), a subscription-based framework for emitting
structured per-session events from kernel subsystems to userspace
or in-kernel consumers. It has three concepts:

Provider: a kernel subsystem that emits events. Multiple providers may share a name (e.g. the default and RACK TCP stacks both register as "tcp"); each gets a unique 16-bit id.
Session: an observed entity, e.g. one TCP connection, identified by a provider-defined uint64_t.
Subscriber: a consumer of events, either via a per-CPU double-buffered ring exposed through the host-global /dev/eventlog device, or via a synchronous in-kernel callback.

The hot write path enters an smr(9) section, walks subscribers
without locks, and commits with a single atomic_fcmpset_64 on
64-bit targets or a per-pcpu MTX_SPIN on 32-bit targets; NMI
re-entrancy is detected via mtx_owned() to avoid deadlock.
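
As a rough illustration of the single-compare-and-swap commit described above, the following userspace sketch reserves space in a buffer with one 64-bit compare-exchange. It uses C11 atomics rather than the kernel's atomic_fcmpset_64, and the state layout (byte offset in the low 32 bits, event count in the high 32 bits) and all `sketch_` names are assumptions made for the example, not the layout used by kern_eventlog.c.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 32 bits = bytes used, high 32 bits = event count. */
struct sketch_buf {
	_Atomic uint64_t state;
	uint32_t size;		/* capacity in bytes */
};

/*
 * Reserve len bytes. On success, store the start offset in *off and return
 * true. Return false when the buffer cannot hold the event, in which case
 * the caller would count a drop.
 */
static bool
sketch_reserve(struct sketch_buf *b, uint32_t len, uint32_t *off)
{
	uint64_t old, new;
	uint32_t used, count;

	old = atomic_load_explicit(&b->state, memory_order_relaxed);
	do {
		used = (uint32_t)old;
		count = (uint32_t)(old >> 32);
		if (len > b->size - used)
			return (false);
		new = ((uint64_t)(count + 1) << 32) | (used + len);
		/* On failure, old is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(&b->state, &old, new,
	    memory_order_acq_rel, memory_order_relaxed));
	*off = used;
	return (true);
}
```

Packing the offset and the count into one word is what lets a writer claim its slot and update the bookkeeping in a single atomic step; a 32-bit target without a 64-bit compare-and-swap would need a lock around the same update, which matches the fallback the summary mentions.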

A small DSL in <provider>_eventlog_schema.src files describes
event ids and payload layouts; eventlog_gen.awk processes a
schema into producer or consumer headers.

usr.bin/elog/ ships elog(1), a reference consumer that prints
events as text on stdout or writes a binary .elog stream (-o).
sys/kern/kern_eventlog_test.c implements ktest_eventlog(4) for
ring correctness under concurrent producers and the 32-bit
fallback path; tests/sys/kern/elog_test.py covers elog(1) CLI
smoke tests. Man pages: eventlog(9), elog(1), elog(5).

No in-tree providers ship with this import; downstream consumers
register their schemas and call the emit API directly.

Diff Detail

Repository

rG FreeBSD src repository

Lint

Lint Skipped

Unit

Tests Skipped

Build Status

Buildable 75089
Build 71972: arc lint + arc unit

Event Timeline

nickbanks_netflix.com created this revision. May 12 2026, 9:25 PM

nickbanks_netflix.com held this revision as a draft.

Herald added subscribers: ziaee, olce. May 12 2026, 9:25 PM

Harbormaster completed remote builds in B73057: Diff 177714. May 12 2026, 9:25 PM

nickbanks_netflix.com added a reviewer: peter.lei_ieee.org. May 12 2026, 9:39 PM

nickbanks_netflix.com published this revision for review. May 12 2026, 9:39 PM

Hi! We simplified the preferred license template now that SPDX notation has become an ISO standard. Thanks for including nice docs!

Add bin/elog to targets/pseudo/userland/Makefile.depend so DIRDEPS_BUILD also picks up the new userland binary.

Harbormaster completed remote builds in B73079: Diff 177792. May 13 2026, 4:48 PM

Switch newly-introduced files to the simplified BSD-2-Clause template (Copyright + SPDX, no inlined boilerplate) per the licensing-policy doc. Existing files we only modify keep their existing headers untouched.

Harbormaster completed remote builds in B73097: Diff 177827. May 14 2026, 2:20 PM

Add SPDX header to tests/sys/kern/kern_eventlog_test.py (it was previously unmarked).

Harbormaster completed remote builds in B73098: Diff 177829. May 14 2026, 2:27 PM

pouria added a subscriber: pouria. May 14 2026, 3:11 PM

ali_mashtizadeh.com added a reviewer: ali_mashtizadeh.com. May 22 2026, 7:49 PM

nickbanks_netflix.com added a child revision: D57309: tcp: add eventlog(9) provider for TCP. May 28 2026, 6:36 PM

rrs is currently traveling and has no access to Phabricator. Therefore, he asked me to forward some questions:

What is this for (statistics, debugging, ?)?
If it’s for debugging why was BBlogging not used?
If it’s for stats, why were trace points in BBLogging used?

In D56979#1314544, @tuexen wrote:

rrs is currently traveling and has no access to Phabricator. Therefore, he asked me to forward some questions:

What is this for (statistics, debugging, ?)?

If it’s for debugging why was BBlogging not used?

If it’s for stats, why were trace points in BBLogging used?

This is primarily for debugging, but is designed to be general purpose, so it can be used for statistics as well. EventLog attempts to tackle a few issues, and therefore bblogging wasn't viable:

It's not TCP connection specific. It's meant to support any kernel (and hopefully user in the future) component. Even for TCP, it's designed to support global, non-connection specific events (such as stack-wide goodput histograms, at a specific sampling rate).
It uses a schema to create well-defined events that can be used to auto-generate both event fire macros and binary-to-text code. It also makes it much easier to write or generate tooling that consumes the binary events for analysis and visualization.
It fundamentally changes the collection and memory model. Subscribers set which events they want to collect (via keywords and flags) and supply the (per-CPU) buffers that events are written into. Obviously the per-TCP connection buffering model doesn't work if you don't have a TCP connection.

So, TL;DR, eventlog is different enough from how bblog is architected that it must be a new system.

So, TL;DR, eventlog is different enough from how bblog is architected that it must be a new system.

My understanding is that eventlog is designed to be a much more generic and efficient kernel event logging system that will eventually replace bblog, and things like KTR. Is that fair?

As a user, I can speak to its efficiency. I added ev logging to lagg / lacp and select driver code to log LACP traffic and state changes so as to point fingers at Arista where we are having some .. issues... maintaining stable LACP with 400g NICs. So I have this thing running always-on snooping 1.2Tb/s of lacp traffic on a 3x400g prototype box.

tuexen added inline comments. Jun 5 2026, 8:40 PM

usr.bin/elog/elog.c
446	Isn't this a usecase for `STAILQ_FOREACH_SAFE()`?
615	Would it make sense to describe this format in `elog.1`? Why are you using a width of two for the CPU? Why a minimum of 4 hex digits for the thread?

Address review feedback from tuexen on usr.bin/elog/elog.c:

Replace the open-coded STAILQ_FIRST/STAILQ_NEXT iterator in close_session_file() with STAILQ_FOREACH_SAFE. The function does a STAILQ_REMOVE inside the loop, which is exactly what _SAFE exists for; even though we return immediately after the remove (so the unsafe variant is not actually buggy here), the macro is the idiomatic match and matches the rest of the file.
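
For readers unfamiliar with the macro, here is a small sketch of the pattern under discussion: removing entries from a STAILQ while iterating over it. The `sketch_` names and the record layout are hypothetical, and unlike close_session_file() as described above, this version keeps iterating after a removal, which is the case where the _SAFE variant is required. The fallback definition is only there for libcs whose <sys/queue.h> lacks the _SAFE iterators.

```c
#include <sys/queue.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Some libcs ship a <sys/queue.h> without the _SAFE iterators. */
#ifndef STAILQ_FOREACH_SAFE
#define	STAILQ_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = STAILQ_FIRST(head);				\
	    (var) != NULL && ((tvar) = STAILQ_NEXT(var, field), 1);	\
	    (var) = (tvar))
#endif

/* Hypothetical per-session output file record. */
struct sketch_file {
	uint64_t session_id;
	STAILQ_ENTRY(sketch_file) link;
};
STAILQ_HEAD(sketch_file_list, sketch_file);

/* Remove and free every entry for session_id; return how many were removed. */
static int
sketch_close_session(struct sketch_file_list *head, uint64_t session_id)
{
	struct sketch_file *f, *tmp;
	int n = 0;

	/* tmp holds the successor, so freeing f does not break the walk. */
	STAILQ_FOREACH_SAFE(f, head, link, tmp) {
		if (f->session_id != session_id)
			continue;
		STAILQ_REMOVE(head, f, sketch_file, link);
		free(f);
		n++;
	}
	return (n);
}
```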

Document the per-event output line format in elog.1 (CPU, thread id, timestamp, provider, session id, event name, payload), and explain the field widths. CPU stays at "%2u" minimum-width and TID stays at "%04x" minimum-width; both grow when the value needs more digits.
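
The minimum-width behavior of the two conversions can be seen in a short sketch. Only the "%2u" and "%04x" formats come from the text above; the helper name and the single space between the fields are assumptions for illustration and do not reproduce the full elog(1) output line.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the CPU and thread id columns. "%2u" pads to at least two
 * characters with spaces; "%04x" pads to at least four hex digits with
 * zeros. Neither truncates a wider value.
 */
static int
sketch_format_prefix(char *buf, size_t len, unsigned int cpu,
    unsigned int tid)
{
	return (snprintf(buf, len, "%2u %04x", cpu, tid));
}
```

With these formats, CPU 3 and thread 0x1a print as ` 3 001a`, while CPU 128 and thread 0x186a5 print as `128 186a5`.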

No functional change to capture or output content.

Harbormaster completed remote builds in B73744: Diff 179391. Jun 8 2026, 4:22 PM

Address review feedback from tuexen on usr.bin/elog/elog.c: STAILQ_FOREACH_SAFE in close_session_file(), and document the per-event output line format in elog.1. No functional change.

Harbormaster completed remote builds in B73745: Diff 179392. Jun 8 2026, 4:22 PM

Address tuexen review on usr.bin/elog/elog.c: STAILQ_FOREACH_SAFE in close_session_file() and document the per-event output line format in elog.1.

Harbormaster completed remote builds in B73746: Diff 179393. Jun 8 2026, 4:22 PM

nickbanks_netflix.com mentioned this in D57309: tcp: add eventlog(9) provider for TCP. Jun 10 2026, 12:09 AM

tuexen added inline comments. Jun 14 2026, 12:23 AM

usr.bin/elog/elog.c
959	Why not use `getopt()` or `getopt_long()` if you want to support options with names? That way something like `elog -enr name.elog` would work (as I expected).

tuexen added inline comments. Jun 14 2026, 10:18 AM

usr.bin/elog/elog.1
182	Not sure it makes sense to mention `oca.py` upstream...

Address review feedback: parse elog(1) options with getopt_long; drop internal man-page xref
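
A minimal sketch of getopt_long(3) handling bundled short options such as `-enr name.elog` follows. The long option names and the meaning given to each flag are hypothetical; the actual elog(1) option set is documented in elog.1 and is not reproduced here.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option set; the real flag meanings live in elog.1. */
struct sketch_opts {
	bool enable;		/* -e / --enable */
	bool numeric;		/* -n / --numeric */
	const char *read_path;	/* -r / --read <file> */
};

/* Parse argv into o. Return the index of the first operand, or -1. */
static int
sketch_parse(int argc, char **argv, struct sketch_opts *o)
{
	static const struct option longopts[] = {
		{ "enable",  no_argument,       NULL, 'e' },
		{ "numeric", no_argument,       NULL, 'n' },
		{ "read",    required_argument, NULL, 'r' },
		{ NULL, 0, NULL, 0 }
	};
	int ch;

	while ((ch = getopt_long(argc, argv, "enr:", longopts, NULL)) != -1) {
		switch (ch) {
		case 'e':
			o->enable = true;
			break;
		case 'n':
			o->numeric = true;
			break;
		case 'r':
			o->read_path = optarg;
			break;
		default:
			/* getopt_long() has already printed a diagnostic. */
			return (-1);
		}
	}
	return (optind);
}
```

Because "r:" takes an argument, `-enr name.elog` is read as `-e -n -r name.elog`, which is the behavior the inline comment asked for.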

Harbormaster completed remote builds in B73887: Diff 179808. Jun 15 2026, 4:07 PM

In D56979#1320329, @nickbanks_netflix.com wrote:

Address review feedback: parse elog(1) options with getopt_long; drop internal man-page xref

Thank you. Much easier to use now.

elog: emit 'unknown argument' for unrecognized options

Harbormaster completed remote builds in B73914: Diff 179876. Jun 16 2026, 3:59 PM

tuexen added inline comments. Jun 21 2026, 1:43 PM

usr.bin/elog/elog.c
122	I have a question regarding endianness. I think you wanted that *.elog files are written in host byte order. And this is what happens. But how should a reader of the file detect the byte order? I was assuming this can be done by checking the byte order of the magic number, but we have a magic string. I tested this on a little endian and big endian machine. Here is what I get (one `.elog`-file was generated on a little endian machine, one was generated on a big endian machine):

    tuexen@blackbird:~ % hexdump -n 40 -C unknown_17463_127.0.0.1_1234.elog
    00000000  45 4c 4f 47 00 00 00 01  00 00 00 00 25 ce f9 26  |ELOG........%..&|
    00000010  00 06 54 c1 d6 51 af 50  00 00 00 00 00 00 00 97  |..T..Q.P........|
    00000020  00 00 00 00 00 00 00 00                           |........|
    00000028
    tuexen@blackbird:~ % hexdump -n 40 -C unknown_55792_127.0.0.1_1234.elog
    00000000  45 4c 4f 47 01 00 00 00  33 d5 c9 08 00 00 00 00  |ELOG....3.......|
    00000010  d7 e3 2d fa 50 54 06 00  97 00 00 00 00 00 00 00  |..-.PT..........|
    00000020  00 00 00 00 00 00 00 00                           |........|
    00000028
    tuexen@blackbird:~ %

So how do you envision the reader detects the byte ordering? By looking at the version number?

gbe added a subscriber: gbe. Jun 23 2026, 8:41 AM

nickbanks_netflix.com marked 5 inline comments as done. Thu, Jun 25, 12:48 PM

nickbanks_netflix.com added inline comments.

usr.bin/elog/elog.c
122	I think the best answer is to actually make elog always use little endian, which will require updating both elog and eventlog (the AWK script that writes event payloads). I will do that in a follow up commit instead of shoving more into this one.
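
To make the two options in this exchange concrete, the sketch below shows (a) how a reader could guess the byte order of an existing host-order file from the 32-bit field that follows the "ELOG" magic, and (b) a fixed little-endian encoder of the kind the proposed follow-up would need. It assumes that field is a small version number, as the hexdumps above suggest; the `sketch_` names and the plausibility bound are hypothetical and are not part of elog(5).

```c
#include <stdint.h>
#include <string.h>

/* Assumption: real version numbers stay well below this bound. */
#define	SKETCH_MAX_VERSION	0xffffu

enum sketch_order { SKETCH_LE, SKETCH_BE, SKETCH_UNKNOWN };

static uint32_t
sketch_le32dec(const uint8_t *p)
{
	return ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

static uint32_t
sketch_be32dec(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/* Always emit little endian, independent of the host byte order. */
static void
sketch_le32enc(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Guess the byte order of a header: 4-byte magic, then a 32-bit version. */
static enum sketch_order
sketch_detect_order(const uint8_t *hdr)
{
	uint32_t le, be;

	if (memcmp(hdr, "ELOG", 4) != 0)
		return (SKETCH_UNKNOWN);
	le = sketch_le32dec(hdr + 4);
	be = sketch_be32dec(hdr + 4);
	if (le != 0 && le <= SKETCH_MAX_VERSION)
		return (SKETCH_LE);
	if (be != 0 && be <= SKETCH_MAX_VERSION)
		return (SKETCH_BE);
	return (SKETCH_UNKNOWN);
}
```

The detection heuristic only works while version numbers stay small, which is one reason a fixed on-disk byte order is the more robust choice.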

What is the rationale for using an AWK file to generate source? I find that it makes it hard to intuitively reason about what this is doing. (And, FWIW, my past experience is that things like this almost never stand the test of time--you almost always find a new capability you need which requires refactoring.) Can this be reworked in a way which makes the code more obvious (e.g. code which is included, or a more standard macro expansion)?

I think past history (e.g. with black box logging) has shown that it is easy to get mismatches between the schema a kernel is using and the schema user space is using, particularly when in a development environment (like the main branch) where you may recompile the kernel much more than userspace. How does your code avoid/address these mismatches?

Assume this happens:

Active buffer is 0
Active buffer swaps to 1
Reader begins reading
Active buffer swaps to 0 while reader is still reading

What will result?

In D56979#1329232, @jtl wrote:

What is the rationale for using an AWK file to generate source? I find that it makes it hard to intuitively reason about what this is doing.

My (limited) understanding was that AWK files were the standard way to have build-time generated headers.

(And, FWIW, my past experience is that things like this almost never stand the test of time--you almost always find a new capability you need which requires refactoring.) Can this be reworked in a way which makes the code more obvious (e.g. code which is included, or a more standard macro expansion)?

I have gone through many iterations of this process already, adding/modifying features and it hasn't required any full refactor (so far). I'm happy to use a different model, though I'm unclear on what exactly you're looking for. So, I'd appreciate some suggestions.

I think past history (e.g. with black box logging) has shown that it is easy to get mismatches between the schema a kernel is using and the schema user space is using, particularly when in a development environment (like the main branch) where you may recompile the kernel much more than userspace. How does your code avoid/address these mismatches?

Today, the solution to this is that you only append to the schema (either new fields to an existing event, or new events). User space will only decode what it knows about.
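
A sketch of what "decode only what it knows about" can look like for an append-only payload follows. The struct and its field names are hypothetical, and the sketch assumes host byte order and that the consumer is told the payload length; it is not the code generated by eventlog_gen.awk.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload layout this consumer was built against. */
struct sketch_payload {
	uint32_t cwnd;
	uint32_t rtt_us;
};

/*
 * Copy as much of the wire payload as this consumer understands.
 * Bytes appended by a newer producer are ignored; fields a shorter,
 * older payload lacks are left zero. Returns the bytes decoded.
 */
static size_t
sketch_decode(struct sketch_payload *out, const void *wire, size_t wire_len)
{
	size_t n = wire_len < sizeof(*out) ? wire_len : sizeof(*out);

	memset(out, 0, sizeof(*out));
	memcpy(out, wire, n);
	return (n);
}
```

This only stays correct if existing fields never move or change size, which is the append-only rule stated above.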

In an internal PR, I am leaning towards a model where providers give a binary representation of the schema itself, that elog (or other subscribers, i.e. a cloud backend) use to dynamically decode events. The above, static model would also continue to work if desired, but would require (generally, manual) updates when the provider changes. Practically, this is Ok (IMO), because real consumers of these events need to understand the meaning behind them, for instance if you want to plot a TCP window over time, you need more than a list of struct formats and sizes. You need to know which event includes the CWND, and understand when/why it's logged.

Assume this happens:

Active buffer is 0

Active buffer swaps to 1

Reader begins reading

Active buffer swaps to 0 while reader is still reading

What will result?

A swap can only occur if the reader's buffer is empty. The writer's buffer may or may not be full (but it will have something, otherwise there was no point to swap two empty buffers). So, in your example above, if the reader was still reading then the buffer swap wouldn't/couldn't have occurred. If the reader had completed reading everything, then the swap would occur and then the next read from the reader would start consuming whatever the writer had put in the buffer before the swap.

In D56979#1329256, @nickbanks_netflix.com wrote:

In D56979#1329232, @jtl wrote:

What is the rationale for using an AWK file to generate source? I find that it makes it hard to intuitively reason about what this is doing.

My (limited) understanding was that AWK files were the standard way to have build-time generated headers.

They are a standard way to do this. I just don't recall another example that is so long and detailed. I generally don't like non-trivial auto-generated code because it makes debugging very difficult. With GenAI, this may become less of an issue. And I will admit that it is a matter of preference, not correctness. But my history in trying to track down obscure bugs tells me that auto-generated code obfuscates enough that it complicates debugging.

I really will have to take the time to grok the code better before I have a substantive suggestion for an alternative. The generic answers are macros and decomposing things that can be de-duplicated into a shared include file. I don't know whether either one is appropriate here.

I think past history (e.g. with black box logging) has shown that it is easy to get mismatches between the schema a kernel is using and the schema user space is using, particularly when in a development environment (like the main branch) where you may recompile the kernel much more than userspace. How does your code avoid/address these mismatches?

Today, the solution to this is that you only append to the schema (either new fields to an existing event, or new events). User space will only decode what it knows about.

In an internal PR, I am leaning towards a model where providers give a binary representation of the schema itself, that elog (or other subscribers, i.e. a cloud backend) use to dynamically decode events. The above, static model would also continue to work if desired, but would require (generally, manual) updates when the provider changes. Practically, this is Ok (IMO), because real consumers of these events need to understand the meaning behind them, for instance if you want to plot a TCP window over time, you need more than a list of struct formats and sizes. You need to know which event includes the CWND, and understand when/why it's logged.

I think that makes sense and could be a nice addition. Alternatively, versioning could help detect (and, possibly, resolve) mismatches. If you don't (yet) have versioning, you may want to add it as a stop gap until something better is ready. IIRC, those of us doing dev work with early versions of black box logs wasted lots of time chasing bad info because we hadn't realized the versions in the kernel and user space were mismatched.

Assume this happens:

Active buffer is 0

Active buffer swaps to 1

Reader begins reading

Active buffer swaps to 0 while reader is still reading

What will result?

A swap can only occur if the reader's buffer is empty. The writer's buffer may or may not be full (but it will have something, otherwise there was no point to swap two empty buffers). So, in your example above, if the reader was still reading then the buffer swap wouldn't/couldn't have occurred. If the reader had completed reading everything, then the swap would occur and then the next read from the reader would start consuming whatever the writer had put in the buffer before the swap.

And, I assume, we would drop new events until a swap can occur? If so, that is a reasonable outcome.

TBH, it will take me quite a while to grok this review and have an intelligent opinion. I think we've always had a hard time getting effective reviews on 10K-line (actually, 1K+ line) commits. That's a function of relying on human volunteers to review. If there is any way you can decompose the commit, it might help. But it doesn't look like it might be easily decomposable.

In D56979#1329272, @jtl wrote:

In D56979#1329256, @nickbanks_netflix.com wrote:

In D56979#1329232, @jtl wrote:

What is the rationale for using an AWK file to generate source? I find that it makes it hard to intuitively reason about what this is doing.

My (limited) understanding was that AWK files were the standard way to have build-time generated headers.

They are a standard way to do this. I just don't recall another example that is so long and detailed. I generally don't like non-trivial auto-generated code because it makes debugging very difficult. With GenAI, this may become less of an issue. And I will admit that it is a matter of preference, not correctness. But my history in trying to track down obscure bugs tells me that auto-generated code obfuscates enough that it complicates debugging.

I really will have to take the time to grok the code better before I have a substantive suggestion for an alternative. The generic answers are macros and decomposing things that can be de-duplicated into a shared include file. I don't know whether either one is appropriate here.

I agree there is a TON of complexity in here, but that's because it's doing so much. I will think on how we might decompose things or use macros to make it simpler, but I suspect it wouldn't make a huge dent in the complexity here.

I think past history (e.g. with black box logging) has shown that it is easy to get mismatches between the schema a kernel is using and the schema user space is using, particularly when in a development environment (like the main branch) where you may recompile the kernel much more than userspace. How does your code avoid/address these mismatches?

Today, the solution to this is that you only append to the schema (either new fields to an existing event, or new events). User space will only decode what it knows about.

In an internal PR, I am leaning towards a model where providers give a binary representation of the schema itself, that elog (or other subscribers, i.e. a cloud backend) use to dynamically decode events. The above, static model would also continue to work if desired, but would require (generally, manual) updates when the provider changes. Practically, this is Ok (IMO), because real consumers of these events need to understand the meaning behind them, for instance if you want to plot a TCP window over time, you need more than a list of struct formats and sizes. You need to know which event includes the CWND, and understand when/why it's logged.

I think that makes sense and could be a nice addition. Alternatively, versioning could help detect (and, possibly, resolve) mismatches. If you don't (yet) have versioning, you may want to add it as a stop gap until something better is ready. IIRC, those of us doing dev work with early versions of black box logs wasted lots of time chasing bad info because we hadn't realized the versions in the kernel and user space were mismatched.

My (internal) PR that creates a binary representation of the schema (unfortunately, more complicated AWK code), does end up creating a 16-byte hash as the version. I've also considered adding an explicit version number directly in the schema but two things give me pause:

I suspect/expect people to forget to update any top-level version number for most changes.
The event IDs themselves are version numbers. If you need to make a "breaking change" for an event, you create a new event ID for the new format, and deprecate the old one. This is how I've used it internally to good effect (and how I used ETW in Windows for over a decade, which is similar in high-level design).
If I did add a top-level version, I'd naturally add some <major>.<minor> format (I think <patch> would be overkill), but that would imply (IMHO) it's fine to make breaking changes to the schema and rev the major version. In my experience, that only leads to lots of complexity and pain. Then, we're left with only making "minor" version changes that are back-compat, at which point, you don't really need a version number all that much...

Assume this happens:

Active buffer is 0

Active buffer swaps to 1

Reader begins reading

Active buffer swaps to 0 while reader is still reading

What will result?

A swap can only occur if the reader's buffer is empty. The writer's buffer may or may not be full (but it will have something, otherwise there was no point to swap two empty buffers). So, in your example above, if the reader was still reading then the buffer swap wouldn't/couldn't have occurred. If the reader had completed reading everything, then the swap would occur and then the next read from the reader would start consuming whatever the writer had put in the buffer before the swap.

And, I assume, we would drop new events until a swap can occur? If so, that is a reasonable outcome.

Yes, drops will occur. And we expose the drop counters to the subscriber to know either (a) their buffer isn't big enough or (b) they aren't draining it fast enough.
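
The swap rule and the drop counter can be modeled in a few lines. This is a deliberately single-threaded sketch that tracks byte counts only: it ignores the atomics, per-CPU layout and event framing of the real implementation, every `sketch_` name is hypothetical, and which side initiates the swap is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_dbuf {
	uint32_t used[2];	/* bytes written into each buffer */
	uint32_t rpos;		/* bytes consumed from the reader's buffer */
	uint32_t size;		/* capacity of each buffer */
	int active;		/* buffer the writer appends to */
	uint64_t drops;		/* events lost because no swap was possible */
};

/* Swap only if the reader's buffer is drained and the writer has data. */
static bool
sketch_try_swap(struct sketch_dbuf *d)
{
	int rd = 1 - d->active;

	if (d->rpos < d->used[rd] || d->used[d->active] == 0)
		return (false);
	d->used[rd] = 0;	/* the drained buffer becomes the write side */
	d->rpos = 0;
	d->active = rd;
	return (true);
}

/* Append an event of len bytes, or count a drop when it cannot fit. */
static bool
sketch_write(struct sketch_dbuf *d, uint32_t len)
{
	if (len > d->size - d->used[d->active])
		(void)sketch_try_swap(d);
	if (len > d->size - d->used[d->active]) {
		d->drops++;
		return (false);
	}
	d->used[d->active] += len;
	return (true);
}

/* Consume up to max bytes; returns the number of bytes consumed. */
static uint32_t
sketch_read(struct sketch_dbuf *d, uint32_t max)
{
	uint32_t avail;
	int rd = 1 - d->active;

	if (d->rpos == d->used[rd]) {
		if (!sketch_try_swap(d))
			return (0);
		rd = 1 - d->active;
	}
	avail = d->used[rd] - d->rpos;
	if (max > avail)
		max = avail;
	d->rpos += max;
	return (max);
}
```

In this model a slow reader never sees a buffer change underneath it; the cost is paid by the writer as drops, which the subscriber can observe through the counter.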

TBH, it will take me quite a while to grok this review and have an intelligent opinion. I think we've always had a hard time getting effective reviews on 10K-line (actually, 1K+ line) commits. That's a function of relying on human volunteers to review. If there is any way you can decompose the commit, it might help. But it doesn't look like it might be easily decomposable.

I completely agree. Internally, this was a multi-month process. Theoretically, we could break this into parts:

The kernel and associated test code
The AWK script
The user space elog code

But honestly, even those pieces are big by themselves, and I don't think it really improves things.

eventlog: generate provider headers via make dependencies to fix a parallel-build header race

Herald added subscribers: emaste, bdrewery. Tue, Jul 7, 2:12 PM

Harbormaster completed remote builds in B74583: Diff 181449. Tue, Jul 7, 2:13 PM

Scrub internal ticket reference from a test comment

Harbormaster completed remote builds in B74598: Diff 181473. Tue, Jul 7, 6:43 PM

I made a high level pass over the diff, and I think I have enough context to review it carefully now.

I wonder if it would be better to integrate with libctf to pull the enumerations and type information so that you don't have to be so verbose. Granted in the case of TCP, the definitions I'd worry about maintaining are tied to FreeBSD's ABI, but this might allow us to reduce the burden and just extract types straight from CTF.

I think my prior point that I mentioned offline about allowing parameters when subscribing has a lot broader applicability. It's a lot of data when our machines are loaded, it might be useful to ask for a specific connection in the case of TCP and/or configure specific counters for pmc.

I've written a few lock data structures. The buffering strategy seems really complicated and it has a few drawbacks. I think you can use a simpler ring structure with two independent atomic variables for the head and tail. This avoids write-write contention between the reader and the writer, and would eliminate the need for a lock on machines without a 64-bit swap operation. To deal with nesting you could use a ready bit/byte to avoid double buffering. If you follow this approach it becomes relatively easy to reason about the safety of allowing the userspace process to drain the log from a shared memory buffer, and the core would probably need about half as many lines of code.
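
For reference, the general shape of a ring with independent head and tail indices is sketched below as a single-producer, single-consumer queue using C11 atomics. This is a generic illustration of the technique, not the reviewer's proposed design: it stores fixed-size slots, does not handle nested writers (the ready bit mentioned above), and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_RING_SLOTS	8	/* must be a power of two */

struct sketch_ring {
	_Atomic uint32_t head;	/* written only by the producer */
	_Atomic uint32_t tail;	/* written only by the consumer */
	uint64_t slot[SKETCH_RING_SLOTS];
};

/* Producer side: returns false (a drop) when the ring is full. */
static bool
sketch_ring_put(struct sketch_ring *r, uint64_t v)
{
	uint32_t head, tail;

	head = atomic_load_explicit(&r->head, memory_order_relaxed);
	tail = atomic_load_explicit(&r->tail, memory_order_acquire);
	if (head - tail == SKETCH_RING_SLOTS)
		return (false);
	r->slot[head % SKETCH_RING_SLOTS] = v;
	/* Publish the slot only after its contents are written. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return (true);
}

/* Consumer side: returns false when the ring is empty. */
static bool
sketch_ring_get(struct sketch_ring *r, uint64_t *v)
{
	uint32_t head, tail;

	tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	head = atomic_load_explicit(&r->head, memory_order_acquire);
	if (head == tail)
		return (false);
	*v = r->slot[tail % SKETCH_RING_SLOTS];
	/* Release the slot only after its contents are read. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return (true);
}
```

Each index has exactly one writer, so neither side needs a compare-and-swap and both indices can be plain 32-bit words.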

In D56979#1336171, @ali_mashtizadeh.com wrote:

I made a high level pass over the diff, and I think I have enough context to review it carefully now.

I wonder if it would be better to integrate with libctf to pull the enumerations and type information so that you don't have to be so verbose. Granted in the case of TCP, the definitions I'd worry about maintaining are tied to FreeBSD's ABI, but this might allow us to reduce the burden and just extract types straight from CTF.

From a quick look, it seems that CTF would miss out on any preprocessor stuff like #defines, which is actually what makes up most of the verbosity in the schemas. Also, it creates a weird dependency problem, because you'd need to build to then generate the schema. So, I think I'll push back on that for now. Perhaps a middle ground: in the transport calls, we've discussed adding some build-phase validation that throws a compile error if there is any discrepancy between the schema and builds (to eliminate the drift problem).

I think my prior point that I mentioned offline about allowing parameters when subscribing has a lot broader applicability. It's a lot of data when our machines are loaded, it might be useful to ask for a specific connection in the case of TCP and/or configure specific counters for pmc.

I agree it would be interesting. I see it as a way for the subscriber (i.e. elog cmd line) to pass provider-specific data/args that can do anything. For TCP, we have some out of band (to elog) hooks already (i.e. socket options to turn on elog for one connection), but having it directly integrated into elog (i.e. enable='tcp.port=443') would be cool. But I think it could easily be added as a follow on feature, and not block the base reviews which are already huge.

I've written a few lock data structures. The buffering strategy seems really complicated and it has a few drawbacks. I think you can use a simpler ring structure with two independent atomic variables for the head and tail. This avoids write-write contention between the reader-writer, and would eliminate the need for a lock on machines without a 64-bit swap operation.. To deal with nesting you could use a ready bit/byte to avoid double buffering. If you follow this approach it becomes relatively easy to reason about the safety of allowing the userspace process to drain the log from a shared memory buffer and probably about half as many lines of code in the core.

I will look into this and see how the performance compares.

In D56979#1336240, @nickbanks_netflix.com wrote:

In D56979#1336171, @ali_mashtizadeh.com wrote:

I made a high level pass over the diff, and I think I have enough context to review it carefully now.

I wonder if it would be better to integrate with libctf to pull the enumerations and type information so that you don't have to be so verbose. Granted in the case of TCP, the definitions I'd worry about maintaining are tied to FreeBSD's ABI, but this might allow us to reduce the burden and just extract types straight from CTF.

From a quick look, it seems that CTF would miss out on any preprocessor stuff like #defines, which is actually what makes up most of the verbosity in the schemas. Also, it creates a weird dependency problem, because you'd need to build to then generate the schema. So, I think I'll push back on that for now. Perhaps a middle ground: in the transport calls, we've discussed adding some build-phase validation that throws a compile error if there is any discrepancy between the schema and builds (to eliminate the drift problem).

I think my prior point that I mentioned offline about allowing parameters when subscribing has a lot broader applicability. It's a lot of data when our machines are loaded, it might be useful to ask for a specific connection in the case of TCP and/or configure specific counters for pmc.

I agree it would be interesting. I see it as a way for the subscriber (i.e. elog cmd line) to pass provider-specific data/args that can do anything. For TCP, we have some out of band (to elog) hooks already (i.e. socket options to turn on elog for one connection), but having it directly integrated into elog (i.e. enable='tcp.port=443') would be cool. But I think it could easily be added as a follow on feature, and not block the base reviews which are already huge.

I've written a few lock data structures. The buffering strategy seems really complicated and it has a few drawbacks. I think you can use a simpler ring structure with two independent atomic variables for the head and tail. This avoids write-write contention between the reader-writer, and would eliminate the need for a lock on machines without a 64-bit swap operation.. To deal with nesting you could use a ready bit/byte to avoid double buffering. If you follow this approach it becomes relatively easy to reason about the safety of allowing the userspace process to drain the log from a shared memory buffer and probably about half as many lines of code in the core.

I will look into this and see how the performance compares.

Yea your comments on 1 and 2 are fair. For 3 if you carefully align things and use explicit ordering you should be very fast. With high core counts some of these concurrent queues may also use flat combining to reduce contention.

Yea your comments on 1 and 2 are fair. For 3 if you carefully align things and use explicit ordering you should be very fast. With high core counts some of these concurrent queues may also use flat combining to reduce contention.

Initial tests seem to show the ring model likely is better (and simpler), but I think I'd like to make that a standalone change to update eventlog (it doesn't change anything outside of eventlog or break any APIs) so it can get a closer review.

fold in merged fixes: SMR re-entrancy guard for nested NMI/interrupt/callout writes (drop instead of recursing smr_enter); eventlog_gen.awk exact keyword matching

Harbormaster completed remote builds in B75089: Diff 182606. Fri, Jul 24, 3:14 PM

kern: import eventlog(9), a kernel event logging framework
Needs Review, Public

Revision Contents
Changeset List

Diff 182606

include/eventlog/eventlog_gen.awk

include/eventlog/test_eventlog_schema.src

share/man/man5/Makefile

share/man/man5/elog.5

share/man/man9/Makefile

share/man/man9/eventlog.9

share/mk/bsd.eventlog.mk

sys/conf/eventlog.mk

sys/conf/files

sys/conf/kern.post.mk

sys/conf/kern.pre.mk

sys/conf/kmod.mk

sys/kern/kern_eventlog.c

sys/kern/kern_eventlog_test.c

sys/modules/ktest/Makefile

sys/modules/ktest/ktest_eventlog/Makefile

sys/sys/eventlog.h

sys/sys/eventlog_subscriber.h

targets/pseudo/userland/Makefile.depend

tests/sys/kern/Makefile

tests/sys/kern/elog_test.py

tests/sys/kern/kern_eventlog_test.py

usr.bin/Makefile

usr.bin/elog/Makefile

usr.bin/elog/elog.1

usr.bin/elog/elog.c
