Page MenuHomeFreeBSD

Extend struct kevent fields and add absolute timers.
ClosedPublic

Authored by kib on Jun 2 2017, 7:42 AM.

Details

Summary

This change implements NOTE_MONOTONIC flag for EVFILT_TIMER, which specifies that the data field contains absolute time to fire the event.

To make this useful, data member of the struct kevent must be extended to 64bit. Using the opportunity, I also increased size of ident to 64bit and added ext members. This changes struct kevent almost to Apple struct kevent64, except I did not changed type of udata, the later would cause serious API incompatibilities.

Unlike Apple kevent64, symbol versioning allows us to claim ABI compatibility and still name the new syscall kevent(2). Compat shims are provided for both host native and compat32.

Test Plan

I checked the changes using both old and new (see the patch) tests/sys/kqueue/libkqueue/kqtest.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib created this revision.Jun 2 2017, 7:42 AM

Did you check this change with ARM 32/64 - bit platforms, that the compiler doesn't pad any of the structures or fields in question?

kib added a comment.Jun 2 2017, 8:08 AM

Did you check this change with ARM 32/64 - bit platforms, that the compiler doesn't pad any of the structures or fields in question?

I suppose, your question is about compat32 correctness, otherwise I do not see why padding makes any difference.

No, I did not, but also note that it is pointless. On ARMv8 AKA arm64 we do not have compat32.

It might matter on powerpc. If it does, the static assert in compat32/freebsd32_misc.c about kevent32 size would catch it. I did not bothered to run the patch through make universe, at this stage.

Hi @kib ,

It looked like some of the compat code was using hardcoded offsets, but now I see you've declared the old structures for compat11.

Regarding padding, it is not optimal to mix the type-width like in "struct kevent":

struct kevent {
        __uint64_t        ident;                /* identifier for this event */
        short                filter;                /* filter for event */
        unsigned short        flags;
        unsigned int        fflags;
        __int64_t        data;
        void                *udata;                /* opaque user data identifier */
        __uint64_t        ext[4];
        __uint32_t        ext2[1];
};

The alignment requirement for int64_t is 8 bytes for ARM-32 aswell as ARM-64 from what I know. That means you'll have padded 4-bytes after "void *udata" for ARM-32. I suggest you move "void *udata" after ext[4] to avoid that.

ed added a subscriber: ed.Jun 2 2017, 8:38 AM

Thanks for working on this! The CloudABI polling code can also make good use of this. I'll prepare a change to adjust that once this code lands.

kib added a comment.Jun 2 2017, 8:43 AM

Hi @kib ,
Regarding padding, it is not optimal to mix the type-width like in "struct kevent":
The alignment requirement for int64_t is 8 bytes for ARM-32 aswell as ARM-64 from what I know. That means you'll have padded 4-bytes after "void *udata" for ARM-32. I suggest you move "void *udata" after ext[4] to avoid that.

Yes, there is padding, I do not see how it affects anything.

OTOH, moving fields around would break user code which performs traditional (non-designated) static initialization of struct kevent.

ed added a comment.Jun 2 2017, 8:56 AM

One suggestion, though: instead of calling this NOTE_MONOTONIC, would it make sense to use NOTE_ABSTIME instead?

NOTE_MONOTONIC sounds as if it uses CLOCK_MONOTONIC under the hood, even though your implementation still uses CLOCK_REALTIME, right?

kib added a comment.Jun 2 2017, 9:02 AM
In D11025#228344, @ed wrote:

One suggestion, though: instead of calling this NOTE_MONOTONIC, would it make sense to use NOTE_ABSTIME instead?
NOTE_MONOTONIC sounds as if it uses CLOCK_MONOTONIC under the hood, even though your implementation still uses CLOCK_REALTIME, right?

Apple uses NOTE_ABSOLUTE, but their API is different from us, and from my understanding, glib maintainers, who asked for this feature, do not like Apple' edge-triggering feature of NOTE_ABSOLUTE.

I will rename the flag to NOTE_ABSTIME on next patch update.

I did not changed type of udata, the later would cause serious API incompatibilities.

udata can be extended to 64 bit being declared as anonymous union

struct kevent {
...
  union {
    void     *udata;
    uint64_t udata64;
  };
...
};

Our in-kernel epoll shim can benefit from this.
For pre C11-compilers it can be declared as padded void * to maintain ABI compatibility

kib added a comment.Jun 2 2017, 11:19 AM

I did not changed type of udata, the later would cause serious API incompatibilities.

udata can be extended to 64 bit being declared as anonymous union

struct kevent {
...
  union {
    void     *udata;
    uint64_t udata64;
  };
...
};

Our in-kernel epoll shim can benefit from this.
For pre C11-compilers it can be declared as padded void * to maintain ABI compatibility

IMO it is far beyond accepted level of uglyness. Making high-profile API depend on the compiler settings does not make the API usable. Also, does this trick work for C++ ? The union vs. padding either would not work or require even more magic for big endian arches.

I added four doublewords to struct kevent, which are currently copied around same as udata. We can e.g. declare ext[2] and ext[3] always copied, while ext[0] and ext[1] are for possible use by filters. Would it be enough for epoll ?

IMO it is far beyond accepted level of uglyness. Making high-profile API depend on the compiler settings does not make the API usable. Also, does this trick work for C++ ? The union vs. padding either would not work or require even more magic for big endian arches.

It seems that anonymous unions are supported by gcc4.2 as extension https://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/Unnamed-Fields.html and by C++11 as standard http://www.stroustrup.com/C++11FAQ.html#unions so ugly padding trick can be skipped. I did not test them yet.

I added four doublewords to struct kevent, which are currently copied around same as udata. We can e.g. declare ext[2] and ext[3] always copied, while ext[0] and ext[1] are for possible use by filters. Would it be enough for epoll ?

Yes, it is enough. But it looks slightly worse than 64bit udata field.

brooks added a subscriber: brooks.Jun 2 2017, 2:18 PM

I like the change of types for ident and data. intptr_t was defiantly the wrong type for data.

I don't understand __uint32_t ext2[1]. I don't see it in the Apple manpage so I'm not sure what it's for. It also results in inaccessible padding at the end of the struct on all architectures.

With my CHERI hat on I'd be tempted to add another 64-bit member before udata as there will be 8-bytes of padding after data due to pointers being 16 or 32-byte aligned.

kib added a comment.Jun 2 2017, 5:11 PM

I don't understand __uint32_t ext2[1]. I don't see it in the Apple manpage so I'm not sure what it's for. It also results in inaccessible padding at the end of the struct on all architectures.

It makes the compat32 structure representable by a 64bit (native) structure.
In principle, I can eliminate the ext2 pad by using only 32bit members and e.g. PAIR32TO64().

With my CHERI hat on I'd be tempted to add another 64-bit member before udata as there will be 8-bytes of padding after data due to pointers being 16 or 32-byte aligned.

I do not want to do that for the same reason as changing the udata to 64bit: it changes user-visible API, see above. If added, the change would break non-designated initializers for struct kevent, and more, the breakage is silent.

Assigning pointers to integers and vice versa usually pops up a warning. Depends if -Werror is set though.

However this issue can also be solved using the __aligned() parameter:

struct kevent {
        __uint64_t        ident;                /* identifier for this event */
        short                filter;                /* filter for event */
        unsigned short        flags;
        unsigned int        fflags;
        __int64_t        data;
        void                *udata __aligned(8); /* opaque user data identifier */
        __uint64_t        ext[4] __aligned(8);
        __uint32_t        ext2[1];
};

If udata and and ext are aligned to 8 bytes then there is always 64-bits of space at udata. And it doesn't break any existing applications w.r.t. init order.

#define KEVENT_SET_UDATA_64(x,y) xxx
#define KEVENT_GET_UDATA_64(x) xxx
kib added a comment.Jun 2 2017, 6:06 PM
void                *udata __aligned(8); /* opaque user data identifier */
__uint64_t        ext[4] __aligned(8);

You are making assumptions about future ABIs there.

Anyway, I am going to keep things natural and do not want to apply such hacks.

kib updated this revision to Diff 29151.Jun 2 2017, 6:08 PM
  • Remove ext2, compose struct kevent32 from 32bit types only.
  • Make EV_SET() clear ext[] array. Additionally make tests struct kevent comparision to not depend on undefined padding content.
  • Update kevent(2) man page with new structure layout and add explanation for ext[2, 3].
jhb added a subscriber: jhb.Jun 2 2017, 10:14 PM
jhb added inline comments.
lib/libc/sys/kqueue.2
151 ↗(On Diff #29151)

EVFILT_AIO returns a pointer in this field, so it would need to stay a uintptr_t for CHERI, though perhaps be padded out to uint64_t somehow? (Or anonymous-unioned).

kib added inline comments.Jun 2 2017, 10:46 PM
lib/libc/sys/kqueue.2
151 ↗(On Diff #29151)

Lets ask brooks@, is something special needed there ? I strongly prefer to not depend on non-c99/c++89 features and compiler extensions.

brooks added inline comments.Jun 2 2017, 11:35 PM
lib/libc/sys/kqueue.2
151 ↗(On Diff #29151)

Based on @jhb's comment, I need uintptr_t here. One (somewhat ugly, but portable) solution would be to use uint64_t on IPL32 systems and uintptr_t elsewhere.

kib added inline comments.Jun 2 2017, 11:55 PM
lib/libc/sys/kqueue.2
151 ↗(On Diff #29151)

Indeed ugly.

I propose to introduce an MD type kqident_t, which on all in-tree arches would be uint64_t. You can define it as something appropriate on CHERI.

I am somewhat unsure about compat32 in this scheme, probably architectures which carry compat32 would need to also define kqident32_t.

kib added inline comments.Jun 3 2017, 12:02 AM
lib/libc/sys/kqueue.2
151 ↗(On Diff #29151)

Or, just leave the ident to be uintptr_t, as it is now. For NOTE_ABSTIME, I only need the data expanded. Also, ext[] provides additional pass-through cells.

I extended ident because Apple did it for kqueue64(), but might be there is no use for the extension.

brooks@, please decide, since your platform imposes the strongest limitations there.

brooks added inline comments.Jun 3 2017, 12:06 AM
lib/libc/sys/kqueue.2
151 ↗(On Diff #29151)

I can make either work. uintptr_t would let me avoid modifying the code (which is a design principle for our pure-CHERI-capablity mode that I've generally been able to follow), but the typedef would be fine in practice and this is the sort of change I've been willing to make.

kib updated this revision to Diff 29166.Jun 3 2017, 12:42 AM

Revert ident type back to uintptr_t.

emaste added a subscriber: emaste.Jun 3 2017, 1:01 AM
kib updated this revision to Diff 29498.Jun 12 2017, 6:46 PM

Fix make universe. Changes are mostly in tests, in particular, NetBSD tests customizations are no longer needed because event.data becomes 64bit.

ngie accepted this revision.Jun 12 2017, 7:08 PM

The test changes look great -- thank you :)!

lib/libc/sys/kqueue.2
548 ↗(On Diff #29498)

This change is beneficial outside this feature commit.

tests/sys/kqueue/libkqueue/main.c
185 ↗(On Diff #29498)

Spurious space between cast and variable?

227–230 ↗(On Diff #29498)

Is this indentation correct?

This revision is now accepted and ready to land.Jun 12 2017, 7:08 PM
kib added inline comments.Jun 12 2017, 9:36 PM
tests/sys/kqueue/libkqueue/main.c
185 ↗(On Diff #29498)

I kept the style of the changed line.

Might be, other added casts should follow the whole file style as well.

227–230 ↗(On Diff #29498)

It follows the style of the file.

bapt accepted this revision.Jun 15 2017, 6:58 PM
In D11025#228355, @kib wrote:
In D11025#228344, @ed wrote:

One suggestion, though: instead of calling this NOTE_MONOTONIC, would it make sense to use NOTE_ABSTIME instead?
NOTE_MONOTONIC sounds as if it uses CLOCK_MONOTONIC under the hood, even though your implementation still uses CLOCK_REALTIME, right?

Apple uses NOTE_ABSOLUTE, but their API is different from us, and from my understanding, glib maintainers, who asked for this feature, do not like Apple' edge-triggering feature of NOTE_ABSOLUTE.
I will rename the flag to NOTE_ABSTIME on next patch update.

@kib how difficult is it to also implement real NOTE_MONOTONIC, i.e. event that expires in monotonic time from the getgo? I think this would be very useful for real-world applications that care about precise timekeeping. We use CLOCK_MONOTONIC pretty much all across our code here. Yes, it can be added later on, but without it the API is somewhat incomplete IMHO.

kib added a comment.Jun 15 2017, 8:06 PM

@kib how difficult is it to also implement real NOTE_MONOTONIC, i.e. event that expires in monotonic time from the getgo? I think this would be very useful for real-world applications that care about precise timekeeping. We use CLOCK_MONOTONIC pretty much all across our code here. Yes, it can be added later on, but without it the API is somewhat incomplete IMHO.

In which way would this timer different from the existing timers without NOTE_ABSTIME flag ?

In D11025#232116, @kib wrote:

@kib how difficult is it to also implement real NOTE_MONOTONIC, i.e. event that expires in monotonic time from the getgo? I think this would be very useful for real-world applications that care about precise timekeeping. We use CLOCK_MONOTONIC pretty much all across our code here. Yes, it can be added later on, but without it the API is somewhat incomplete IMHO.

In which way would this timer different from the existing timers without NOTE_ABSTIME flag ?

Well, CLOCK_MONOTONIC ticks in its own realm inrespective of CLOCK_REALTIME. Hence the difference is that if I want sometime that ticks exactly X seconds after specific value of monotonic timer, e.g. implement constant frequency/phase code. Using call without NOTE_ABSTIME would not let me do it, as it determines initial time automatically at the time of the call and then adds offset to it.

To make you a practical example, say I want a code that is excecuted with frequency of 1.000Hz starting at time when the CLOCK_MONOTONIC ticks 2.678s, I want events that fire at those values of CLOCK_MONOTONIC:

2.678 -> 3.678 -> 4.678 -> 5.678 -> 6.678 -> etc

Neither current API nor the new API allows me to do that precisely. Yes, I can sort of emulate it by calling clock_gettime() at each point and then calculate and schedule offset without NOTE_ABSTIME. That has two problems: precision (non-zero time interval between clock_gettime() and kevent()) and also this method is not immune against time shifs/corrections/leap seconds etc, so it won't be really a constant phase/frequency.

Having absolute timer that expires relative to the CLOCK_MONOTONIC solves both issues cleanly and elegantly.

kib added a comment.Jun 16 2017, 12:25 PM

Having absolute timer that expires relative to the CLOCK_MONOTONIC solves both issues cleanly and elegantly.

So this is a feature request which is unrelated to the core parts of the proposed patch. I want to finish this work in the defined scope.

Might be I will implement your suggested API later, but certainly not in this commit. That said, it would be useful if you prepare man page change and test for the new API.

brooks accepted this revision.Jun 16 2017, 6:43 PM
brooks added inline comments.
tests/sys/kqueue/libkqueue/timer.c
169 ↗(On Diff #29498)

Probably should be renamed to test_abstime()

kib marked an inline comment as done.Jun 16 2017, 11:12 PM
This revision was automatically updated to reflect the committed changes.