Page MenuHomeFreeBSD

Implement native eventfd system call.
AbandonedPublic

Authored by dchagin on Mar 29 2015, 6:50 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Apr 11, 7:13 PM
Unknown Object (File)
Tue, Apr 9, 2:39 AM
Unknown Object (File)
Jan 7 2024, 7:40 PM
Unknown Object (File)
Jan 7 2024, 7:40 PM
Unknown Object (File)
Dec 20 2023, 12:52 AM
Unknown Object (File)
Nov 22 2023, 2:00 PM
Unknown Object (File)
Nov 16 2023, 9:45 PM
Unknown Object (File)
Nov 13 2023, 12:11 PM

Details

Reviewers
emaste
jilles
trasz
rwatson
Group Reviewers
manpages
Test Plan

#include <sys/eventfd.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

#define handle_error(msg) \
do { \
perror(msg); \
exit(EXIT_FAILURE); \
} while (0)

int
main(int argc, char *argv[])
{
eventfd_t u;
int efd, j;
int error;

if (argc < 2) {

		fprintf(stderr, "Usage: %s <num>...\n", argv[0]);
		exit(EXIT_FAILURE);

}

efd = eventfd(0, EFD_CLOEXEC);
if (efd == -1)

		handle_error("eventfd");

switch (fork()) {
case 0:

		for (j = 1; j < argc; j++) {
			printf("Child writing %s to efd\n", argv[j]);
			u = strtoull(argv[j], NULL, 0);
			
			error = eventfd_write(efd, u);
			if (error != 0)
				handle_error("write");
		}
		printf("Child completed write loop\n");

		exit(EXIT_SUCCESS);

default:

		sleep(2);

		printf("Parent about to read\n");
		error = eventfd_read(efd, &u);
		if (error != 0)
			handle_error("read");
		printf("Parent read %llu (0x%llx) from efd\n",
		    (unsigned long long) u, (unsigned long long) u);
		exit(EXIT_SUCCESS);

case -1:

		handle_error("fork");

}
}

/* EFD_SEMAPHORE */

#include <sys/eventfd.h>
#include <sys/wait.h>
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

static void xsem_wait(int fd)
{
eventfd_t cntr;

if (eventfd_read(fd, &cntr) != 0) {

		perror("reading eventfd");
		exit(1);

}
fprintf(stdout, "[%u] wait completed on %d: count=%" PRIu64 "\n",

	    getpid(), fd, cntr);

}

static void xsem_post(int fd, int count)
{
eventfd_t cntr = count;

if (eventfd_write(fd, cntr) != 0) {

		perror("writing eventfd");
		exit(1);

}
}

static void sem_player(int fd1, int fd2)
{
fprintf(stdout, "[%u] posting 1 on %d\n", getpid(), fd1);
xsem_post(fd1, 1);

fprintf(stdout, "[%u] waiting on %d\n", getpid(), fd2);
xsem_wait(fd2);

fprintf(stdout, "[%u] posting 1 on %d\n", getpid(), fd1);
xsem_post(fd1, 1);

fprintf(stdout, "[%u] waiting on %d\n", getpid(), fd2);
xsem_wait(fd2);

fprintf(stdout, "[%u] posting 5 on %d\n", getpid(), fd1);
xsem_post(fd1, 5);

fprintf(stdout, "[%u] waiting 5 times on %d\n", getpid(), fd2);
xsem_wait(fd2);
xsem_wait(fd2);
xsem_wait(fd2);
xsem_wait(fd2);
xsem_wait(fd2);
}

static void usage(char const *prg)
{
fprintf(stderr, "use: %s [-h]\n", prg);
}

int main(int argc, char **argv)
{
int c, fd1, fd2, status;
pid_t cpid_poster, cpid_waiter;

while ((c = getopt(argc, argv, "h")) != -1) {

		switch (c) {
		default:
			usage(argv[0]);
			return 1;
		}

}

if ((fd1 = eventfd(0, EFD_SEMAPHORE)) == -1 ||

	    (fd2 = eventfd(0, EFD_SEMAPHORE)) == -1) {
		perror("eventfd");
		return 1;

}
if ((cpid_poster = fork()) == 0) {

		sem_player(fd1, fd2);
		exit(0);

}
if ((cpid_waiter = fork()) == 0) {

		sem_player(fd2, fd1);
		exit(0);

}
waitpid(cpid_poster, &status, 0);
waitpid(cpid_waiter, &status, 0);

exit(0);
}

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

dchagin retitled this revision from to Implement native eventfd system call..
dchagin updated this object.
dchagin edited the test plan for this revision. (Show Details)

Maybe sys/kern/kern_event.c will be a better place for event-related system calls? like eventfd, timerfd etc...

dchagin retitled this revision from Implement native eventfd system call. to Implement native eventfd system call. .Mar 29 2015, 7:00 AM
dchagin added a reviewer: jilles.

This is not really a lot of code, but I'm not sure it should be added as a fixed part of the native API, since it does not do much that is not already possible using pipe() or kqueue().

Debian Code Search shows many usages of eventfd() but they typically have a fallback to something else or are in highly Linux-specific code.

It could be it will be unavoidable to add this at some time because of ports problems, but that seems not yet the case.

In D2172#8, @jilles wrote:

This is not really a lot of code, but I'm not sure it should be added as a fixed part of the native API, since it does not do much that is not already possible using pipe() or kqueue().

Debian Code Search shows many usages of eventfd() but they typically have a fallback to something else or are in highly Linux-specific code.

It could be it will be unavoidable to add this at some time because of ports problems, but that seems not yet the case.

maybe we need to organize a poll? to identify Linux syscalls list that I need to implement before lemul can be merged?

This functionality can duplicate some other syscalls, but it's not the point. The point is - not implementing APIs Linux provides makes porting software harder. Yes, most (large) software projects already have alternative code for dealing with non-Linux, but only because someone already put an effort into writing and maintaining it. Also, those code paths can be less excercised (read: not as well tested).

As for the code location - I prefer not to put together things that don't have much in common, and eventfd is a well separated piece of code, unrelated to other syscalls, so I'd vote for kern_eventfd.c, then kern_timerfd.c for timerfd syscalls etc.

lib/libc/sys/eventfd_rw.c
56 ↗(On Diff #4493)

Man page for those two functions would be nice, perhaps simply mention them in eventfd.2 and add MLINKS?

sys/kern/kern_eventfd.c
57

Erm, are all of these neccessary?

118

static void eventfdinit(void), perhaps? Same below.

150

Why uppercase?

Robert, does this code look ok? Are you ok with adding this syscall to capabilities.conf?

In D2172#9, @dchagin wrote:

maybe we need to organize a poll? to identify Linux syscalls list that I need to implement before lemul can be merged?

I don't see a reason to hold off lemul merging based on syscalls on the Linux side.

I think we can continue to discuss the topic of implementing native versions of Linux syscalls independent of the lemul merge - specifically, we shouldn't hold up the lemul work while sorting each of these out.

There's a straightforward case to be made for some of these, like ppoll(). Some others have reasonable points both for and against native versions. We can always rework a Linuxolator version into a native one and shim later on, if there's no consensus or immediate path forward on any given syscall.

In D2172#15, @emaste wrote:

I think we can continue to discuss the topic of implementing native versions of Linux syscalls independent of the lemul merge - specifically, we shouldn't hold up the lemul work while sorting each of these out.

After further discussion I think this is probably the wrong approach, in fact. If we do want to end up with a native version of a given syscall in the future we're better off implementing the native version, and writing the wrapper now, than trying to retrofit it later.

Especially given that the code already looks really well, apart from few (very) minor nits described above.

Hi all:

This patch appears to include neither support for MAC nor security event auditing. While this is the status quo in the Linuxulator world, omitting these features in the base kernel would be a problem, and it needs to be completed before the kernel feature is added.

Robert

Doh, completely forgot about those.

Is there a document describing the procedure to add new audit types?

As for MAC - does it make sense to use MAC pipe handling as a starting point?

https://wiki.freebsd.org/AddingSyscalls has a few notes on adding new system calls but not great detail. The two aspects to audit additions are allocating a new event type -- something to sort out in OpenBSM and then to merge to our tree (presumably in the Linux events section) -- and then to add auditing of security-relevant arguments. On MAC: yes, pipes, or one of the other IPC objects, such as POSIX SHM or similar. Also on audit, we'll need to decide if the event goes in any audit classes by default, etc.

Is the OpenBSM upstream still in p4, in //depot/projects/trustedbsd/openbsm/...?

Yes -- although brueffer and I have hoped to migrate it to github for some time. Unfortunately, the git p4 import module gets upset by the tag structure (or something) trying to do the migration, requiring us to manually propagate tags to the git repo. This is something I hope to have time to sort out in due course but have not yet been able to do. I recommend doing the allocation in p4 in the interim.

mjg added inline comments.
sys/kern/kern_eventfd.c
130

This would need MTX_NEW since there is no guarantee passed pointer is to zeroed data.

150

Does it really need a separate zone? sizeof struct eventfd looks like something approaching 120 and I presume it will be rarely used.

198

What's up with these checks? When could this possibly happen?

249

This would really be nicer if the error from mtx_sleep caused immediate return.

I've reviewed the patch as if it's accepted to add this to the native API. I think it is possible to split the patch and first add the new fd type and kern_eventfd.c for the Linuxulator's use without committing to a new native API. This is definitely a required feature for the Linuxulator soon, if not now.

Compatibility code at the libc level is not possible, so a __FreeBSD_version bump, an UPDATING entry and coordination with portmgr@ will be required. Ports will detect and use this, and if the kernel is older the result is not pretty.

lib/libc/sys/Makefile.inc
41

The file eventfd_rw.c should go into lib/libc/gen/, not lib/libc/sys/.

lib/libc/sys/Symbol.map
544

These are not needed, and adding them can be postponed until libthr, librt, etc. needs to use the private versions. Not adding them is slightly more efficient.

Note that I only just edited the AddingSyscalls wiki page to remove the requirement to add these, but various syscalls have already been added without exporting these symbols in FBSDprivate_1.0.

lib/libc/sys/eventfd.2
57

It is really a bit strange that initval is only 32 bits, but this strangeness is from Linux and should therefore not be fixed.

68

The file status is on the eventfd object, not on the file descriptor.

mdoc nit: O_NONBLOCK should have .Dv

255

If the text was copied from others, you can't claim your sole copyright on it.

sys/kern/kern_eventfd.c
177

Storing EFD_CLOEXEC and possibly EFD_NONBLOCK (see below) here could be considered misleading.

225

I think this should check fp->f_flags & FNONBLOCK instead. Similar for eventfd_write() below.

400

This function should permit FIONBIO so that non-blocking mode can be toggled. If you use fp->f_flags & FNONBLOCK above, you don't need to do anything else.

Not supporting FIOASYNC, FIOSETOWN and FIOGETOWN is OK but should probably be documented in the man page.

418

It may be helpful for debugging to add efd_count and the EFD_SEMAPHORE bit of efd_flags here.

lib/libc/sys/eventfd.2
255

Jilles, can Iwrite the authors in the (c) clause?

sys/kern/kern_eventfd.c
130

this is uminit method. see uma(9)

150

why not, especially due to uminit.

sys/kern/kern_eventfd.c
198

I'm not sure, on practice all of our foXXX methods check this.

225

f_flag and efd_flag is synchronized, I'd prefer to be consistent with EFD_SEMAPHORE

400

I'd prefer to be compatible with Linux behavior which does not permit ioctl on eventfd.
Weird, i'm disagree here, eventfd is well documented and i do not understand the reason for document things that is not needed by design.

dchagin retitled this revision from Implement native eventfd system call. to Implement native eventfd system call..
dchagin edited the test plan for this revision. (Show Details)
dchagin edited edge metadata.

fixed mosy of comments, still without audit/mac.

sys/kern/kern_eventfd.c
225

Yes, an alternative approach is to sync efd_flags & EFD_NONBLOCK with the file flags when eventfd_ioctl gets FIONBIO. In that case, efd_flags is no longer immutable and needs to be protected by a lock or atomic operations.

400

Linux supports fcntl(F_SETFL) to set (at least) O_NONBLOCK on eventfd. I think this is important because it may be necessary to toggle non-blocking operation when the object is already created. The Linux documentation probably states elsewhere that all file types support this.

The FreeBSD implementation of fcntl(F_SETFL) first modifies f_flag and then calls FIONBIO and FIOASYNC ioctls on the underlying object. Therefore, FIONBIO must work and FIOASYNC must at least allow turning off ASYNC mode.

409

I tested on Linux and it permits fstat() on an eventfd. It doesn't return useful information though: st_blksize is 4096, st_mode is 0600, st_dev and st_inode are constants (apparently) and the timestamps are the system boot time.

sys/kern/kern_eventfd.c
198

Can you give an example?

The only checks I' aware of are performed after fp is obtained to make the type is right, but not in any of fileops.

A few other general notes:

  1. I do not think it would be a big deal to have these file descriptors start off as Linux only and eventually change. You would just svn mv the file from compat/linux to sys/kern. In particular I would not want that discussion to hold up updating lemul. I think adding these as native API at least warrants a more public discussion on arch@. I'm not saying I strongly object to having eventfd (or even timer or signal fds) in the native API, but I would want that decision to be orthogonal to updating lemul, not a blocker.
  1. I don't see the changes for libprocstat, procstat, and fstat to support this new file descriptor type. Those should be done for any new file descriptor type, whether exported via the native API or only via alternate ABIs. At a mimum you will need to choose a letter for the new KF_TYPE and document it in the manpages (and support the new KF_TYPE in the appropriate switch statements). However, I also think you should consider exporting some type-specific data and displaying it (such as that suggested by Jilles)
lib/libc/sys/eventfd.2
254

No, you cannot presume that the authors wish their text to conform to your chosen license (and you can only use their content if their license permits). Usually if the license on the original source is acceptable this means you just preserve the original copyright/license block possibly adding your own (c) line if you make "significant" changes and wish your changes to be released under the same license terms.

sys/kern/kern_eventfd.c
369

Use invfo_truncate instead of this.

378

Similarly, if you were to omit this, you should use invfo_ioctl() to return the proper error value (ENOTTY). However, Jilles is correct in that you have to support certain ioctls to permit setting nonblocking I/O.

387

It is nicer to support a simple stat routine. FreeBSD does not generally consider fstat() optional. (There is no invfo_stat). You can use shm_stat in uipc_shm.c for inspiration if need be. For purely anonymous objects like these you can probably use fp->f_cred (credentials used to create the file descriptor) to set uid/gid rather than storing those in your private data.

411

We include the count of the older FD-based POSIX semaphores in the kinfo_file struct so that it can be examined with procstat, so including it here would be a good idea.

In D2172#44556, @jhb wrote:
  1. I do not think it would be a big deal to have these file descriptors start off as Linux only and eventually change. You would just svn mv the file from compat/linux to sys/kern. In particular I would not want that discussion to hold up updating lemul. I think adding these as native API at least warrants a more public discussion on arch@. I'm not saying I strongly object to having eventfd (or even timer or signal fds) in the native API, but I would want that decision to be orthogonal to updating lemul, not a blocker.
  1. I don't see the changes for libprocstat, procstat, and fstat to support this new file descriptor type. Those should be done for any new file descriptor type, whether exported via the native API or only via alternate ABIs. At a mimum you will need to choose a letter for the new KF_TYPE and document it in the manpages (and support the new KF_TYPE in the appropriate switch statements). However, I also think you should consider exporting some type-specific data and displaying it (such as that suggested by Jilles)

I agree on all points above - although also don't want to fall into the pattern of thinking "it's only the emulator, so doesn't require as careful a review of security / performance / stability concerns."

What are the next steps with this review?

kmacy has been working on bringing over the notify(3) APIs from Darwin[1], which appear to provide a superset of this functionality. It would be worth comparing the two and seeing what can be shared between the two. Ideally, the file descriptor returned by notify_register_file_descriptor would be of the same kind as that returned by eventfd and, outside of the Linuxulator (where it needs to be a system call for binary compat), we might be able to implement the eventfd stuff entirely in libc using the same system calls as notify(3).

[1] https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/notify.3.html

kmacy has been working on bringing over the notify(3) APIs from Darwin[1], which appear to provide a superset of this functionality. It would be worth comparing the two and seeing what can be shared between the two. Ideally, the file descriptor returned by notify_register_file_descriptor would be of the same kind as that returned by eventfd and, outside of the Linuxulator (where it needs to be a system call for binary compat), we might be able to implement the eventfd stuff entirely in libc using the same system calls as notify(3).

[1] https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/notify.3.html

Taking a quick look, notify(3) does not seem to involve kernel extensions. The notification daemon can provide read access to shared memory, send signals and create pipes sending the receiving side to the client. The fd from notify_register_file_descriptor() is clearly a pipe, given that it may return different notification tokens via read, and that notifications may be lost if the buffer is full (which is particularly bad if more than one notification type is received via a single fd).

On the other hand, eventfd can only handle one notification type per fd but can handle many occurrences without losing any.

kmacy has been working on bringing over the notify(3) APIs from Darwin[1], which appear to provide a superset of this functionality. It would be worth comparing the two and seeing what can be shared between the two. Ideally, the file descriptor returned by notify_register_file_descriptor would be of the same kind as that returned by eventfd and, outside of the Linuxulator (where it needs to be a system call for binary compat), we might be able to implement the eventfd stuff entirely in libc using the same system calls as notify(3).

[1] https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/notify.3.html

any progress here? I mean Kip's notify(3) implementation.