
fork: Touch the registers specific to FreeBSD under the condition.
Closed, Public

Authored by dchagin on Aug 9 2021, 3:53 PM.

Details

Summary

At least the Linux ABI does not use the carry bit, and it expects %rdx to be preserved.
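
The following is a minimal user-space sketch of what "under the condition" means for the child's saved registers. It assumes a simplified frame. For the FreeBSD ABI, the child gets %rax = 0, %rdx = 1 and a cleared carry flag. For the Linux ABI, only %rax is zeroed, and everything else stays as copied from the parent. struct fake_frame, enum abi and set_child_retval() are hypothetical stand-ins, not the kernel's struct trapframe or struct sysentvec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CARRY_FLAG 0x1u /* bit 0 of EFLAGS/RFLAGS is the carry flag (PSL_C) */

/* Hypothetical stand-ins for the child's saved registers and its ABI. */
enum abi { ABI_FREEBSD, ABI_LINUX };

struct fake_frame {
	uint64_t rax;    /* primary return value */
	uint64_t rdx;    /* secondary return value */
	uint64_t rflags; /* saved flags, including the carry bit */
};

/* Finish the child's return state after the parent's frame was copied. */
static void
set_child_retval(struct fake_frame *child, enum abi abi)
{
	assert(child != NULL);
	child->rax = 0; /* the child sees fork() == 0 */
	if (abi == ABI_FREEBSD) {
		child->rflags &= ~(uint64_t)CARRY_FLAG; /* success */
		child->rdx = 1; /* historical "this is the child" flag */
	}
	/* Linux: rdx and rflags stay exactly as copied from the parent. */
}
```

Take a parent frame of rax = 1234, rdx = 7, rflags = 0x203. The Linux child keeps rdx = 7 and rflags = 0x203. The FreeBSD child ends up with rdx = 1 and rflags = 0x202.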

Diff Detail

Repository
R10 FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

sys/i386/i386/vm_machdep.c
263

I'm not very familiar with this part of the ABI, but linux32_set_syscall_retval() also clears PSL_C to indicate success, so isn't this change wrong for i386 and 32-bit Linux processes on amd64?

Don't you want a per-ABI sv_cpu_fork method, then?

sys/i386/i386/vm_machdep.c
263

Linux does not use the carry bit to indicate errors; this is why we have all the -errno handling in Linuxkpi.
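
The two error conventions differ as follows:

- On the BSD x86 convention, the kernel signals failure by setting the carry flag and putting a positive errno in %eax/%rax.
- Linux returns a negated errno directly (for example -2 for ENOENT) and never looks at the carry flag.

The decoders below are a hypothetical sketch of what a libc-side wrapper does, not code taken from either libc.

```c
#include <assert.h>
#include <stdbool.h>

/* Decoded syscall result: a value on success, a positive errno on failure. */
struct sysret {
	long value; /* meaningful only when err == 0 */
	int err;    /* 0 on success */
};

/* BSD x86 style: carry set means failure, and rax holds a positive errno. */
static struct sysret
decode_bsd(long rax, bool carry)
{
	struct sysret r = { 0, 0 };

	assert(!carry || rax > 0);
	if (carry)
		r.err = (int)rax;
	else
		r.value = rax;
	return r;
}

/* Linux style: flags are ignored; -4095..-1 encodes a negated errno. */
static struct sysret
decode_linux(long rax)
{
	struct sysret r = { 0, 0 };

	if (rax < 0 && rax >= -4095)
		r.err = (int)-rax;
	else
		r.value = rax;
	return r;
}
```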

In D31472#709638, @kib wrote:

Don't you want a per-ABI sv_cpu_fork method, then?

Nice proposal.
BTW, what is the reason to set %rdx to 1 here? I couldn't find it.

sys/i386/i386/vm_machdep.c
263

I'll fix it later, not in this change

BTW, what is the reason to set %rdx to 1 here? I couldn't find it.

This is quite an interesting question, at least it was for me. I had to look at the V7 and 4.2BSD sources to get an idea.

It seems that the V7 fork(2) syscall did not return 0 in the child. Instead, it returned the parent's pid. In other words, the present-day idiom pid = fork(); if (pid == 0) {...child...} did not work in V7. 4.2BSD decided to make life easier for consumers while keeping compatibility with V7, so it started returning the high value set to 1 for the child and 0 for the parent. It was probably some variant of SysV that made fork(2) return 0 to the child.

I believe this setting of %rdx/%edx is the remnant of that time. You can see that sys_fork() and other syscall wrappers do td->td_retval[1] = 0; for the parent.

You can see it yourself there: https://minnie.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/kern_fork.c
At the top of the page there is a selector to look at different versions of the file. V7 fork is in https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/sys/sys/sys1.c

In D31472#709712, @kib wrote:

I believe this setting of %rdx/%edx is the remnant of that time. You can see that sys_fork() and other syscall wrappers do td->td_retval[1] = 0; for the parent.

Maybe then, instead of adding an sv hook, just remove this line? The carry bit does not interfere with Linux emulation, so we can leave it here.

You can see it yourself there: https://minnie.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/kern_fork.c
At the top of the page there is a selector to look at different versions of the file. V7 fork is in https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/sys/sys/sys1.c

thank you

Maybe then, instead of adding an sv hook, just remove this line? The carry bit does not interfere with Linux emulation, so we can leave it here.

You could extract just the register-manipulation block into sv_cpu_fork on x86, leaving the rest of the function where it is. Then call sv_cpu_fork from cpu_fork().

In D31472#709712, @kib wrote:

It seems that the V7 fork(2) syscall did not return 0 in the child. Instead, it returned the parent's pid. In other words, the present-day idiom pid = fork(); if (pid == 0) {...child...} did not work in V7. 4.2BSD decided to make life easier for consumers while keeping compatibility with V7, so it started returning the high value set to 1 for the child and 0 for the parent. It was probably some variant of SysV that made fork(2) return 0 to the child.

It's a bit more complicated than that. Fork has always been special, and the kernel has always done funky things in a "just so" way that libc cooperates with... I did a quick search, but didn't fill in all the details:

V7 fork, the system call, does weird things. It passes the parent's PID back to the child in r0, and it also increments the saved PC in the parent. As a result, on return the parent skips the instruction right after the trap.

V7 fork, the library call, behaves as you'd expect: the child returns 0 so the idiomatic code works. So the libc code looks like:

_fork:
      mov     r5,-(sp)
      mov     sp,r5           / set up the call frame
      sys     .fork
              br 1f           / child returns here and branches forward
      bec     2f              / parent: carry clear means success
      jmp     cerror          / carry set: error, use the common routine
1:
      mov     r0,_par_uid     / child: save the parent's PID
      clr     r0              / child returns 0
2:
      mov     (sp)+,r5        / tear down the call frame
      rts     pc              / and return to the caller
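
The control flow of that stub can be modelled in C under one simplifying assumption. The real kernel tells the two processes apart only by whether it advanced the saved PC past the br 1f. The sketch turns that into an explicit pc_skipped field. v7_trap_result, v7_fork_wrapper() and par_uid are illustrative names; only _par_uid and cerror come from the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated result of the V7 fork trap as seen by the library stub. */
struct v7_trap_result {
	int r0;          /* the other process's PID, or an error code */
	bool pc_skipped; /* kernel advanced the saved PC past "br 1f" */
	bool carry;      /* carry bit: set on error */
};

static int par_uid; /* the child's copy of its parent's PID (_par_uid) */

/*
 * Returns what C code sees: 0 in the child, the child's PID in the parent,
 * or -1 on error (with the error code stored through errno_out).
 */
static int
v7_fork_wrapper(struct v7_trap_result t, int *errno_out)
{
	assert(errno_out != NULL);
	if (!t.pc_skipped) {    /* child: lands on "br 1f" and takes it */
		par_uid = t.r0; /* mov r0,_par_uid */
		return 0;       /* clr r0 */
	}
	if (t.carry) {          /* bec not taken: jmp cerror */
		*errno_out = t.r0;
		return -1;
	}
	return t.r0;            /* parent: bec 2f, r0 is the child's PID */
}
```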

So that's V7. 4BSD did away with the instruction increment, because on the VAX instructions weren't all the same size. In 4BSD, retval[0] always held a pid (the parent got the child's, and vice versa). retval[1] was 0 for the parent and 1 for the child. Though things moved around a lot, 4.4BSD was released with this same convention. 386BSD inherited it (though in a somewhat funky way that's not relevant here).
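
Under that 4BSD convention, a libc wrapper can recover the modern fork() result from the two return values alone. retval[0] is always a PID, so it cannot tell the processes apart. retval[1] tells each process which side it is on. The sketch below is hypothetical; the names are not from any BSD libc, and it assumes the carry-flag error check has already been done.

```c
#include <assert.h>
#include <sys/types.h>

/* The two syscall return registers on success (eax/edx on i386). */
struct fork_retvals {
	pid_t retval0; /* parent: the child's PID; child: the parent's PID */
	int retval1;   /* 0 in the parent, 1 in the child */
};

/* Map the kernel's two-value result onto the POSIX fork() return value. */
static pid_t
bsd_fork_result(struct fork_retvals r)
{
	assert(r.retval1 == 0 || r.retval1 == 1);
	return r.retval1 ? 0 : r.retval0;
}
```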

I believe this setting of %rdx/%edx is the remnant of that time. You can see that sys_fork() and other syscall wrappers do td->td_retval[1] = 0; for the parent.

Remnant of a later time. I think we had moved away from this convention as early as FreeBSD 3, but there's been a lot of churn in this area, so I'm not completely sure. Still, setting edx = 1 on this return path would flag it as the child. That would also explain where the mystery retval[1] = 1 I was searching for went.

So it's at least a historical accident that the FreeBSD ABI behaves like this, and it looks like we're still tagging it. I didn't study it long enough to give a better answer, but this has a long history and it's complicated to sort out all the ins and outs. Non-x86 architectures are mixed: arm and powerpc return 0 as the second return value, but mips returns 1 as well. That is likely a vestige copied from its NetBSD origins. Since I dashed this out, I'm sure there are mistakes and opportunities for nitpicking. I guessed at a few details that code later confirmed, and at a few others I'm not sure I ever confirmed; I haven't gone over them all twice to make sure.

Ah, found another stash of code.
4.4-Lite2 and FreeBSD 2.1 use the convention I mentioned: the pid goes in retval[0] (eax on x86), and retval[1] is 0 for the parent and 1 for the child (edx on x86). FreeBSD 3 moved this to fork_return, and it eventually made its way into cpu_fork() in vm_machdep.c. I can see none of this convention there today, and that's where I stopped looking in detail...

BTW, I found a cache of System V sources, and they also have the retval[] convention found through 4.4BSD and the early FreeBSD code. 32V, System III, System V and System Vr4.0 (at least on x86) all share this convention. Linux does not; I checked qemu's emulation as well as the Linux sources. So the if statement is correct there, but I have a suggestion...

sys/i386/i386/vm_machdep.c
261–262

As explained, any BSD we emulate will need this, and System Vr4 had the same convention. Linux, however, sets only ax = 0 and does not touch dx (I wonder how they got System V emulation right, then). qemu's linux-user does the same. Linux seems to be the odd duck here, so maybe this should check for the Linux ABI instead and skip it there.

I'll note that I did not check CloudABI to see what it does. If it falls through this code, it would historically set edx=1 for the child, so maybe that should continue. I'm not sure of the status of CloudABI these days, though.

qemu-bsd-user, btw, always sets the second rval to 0 or 1, but emulates the current practice of returning 0 as the pid in the child. It's clear that not all FreeBSD architectures set this second retval to 1 (arm sets it to 0 for sure)...

Add a new sv_cpu_fork hook and call it from cpu_fork() if it is initialized.

Add a short comment about touching dx in cpu_fork()

All this historical digging makes a nice blog post, I believe.

sys/i386/i386/vm_machdep.c
261–262

Avoiding a comparison, and avoiding one-ABI-specific code in the generic cpu_fork(), was the reason I proposed sv_cpu_fork in the first place.

Please move all ABI-specific code into corresponding methods. Also it would be nice to avoid the sv_cpu_fork == NULL check at all, by providing sv_cpu_fork for all sysents.
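
Here is one way to read this suggestion, as a sketch only. Every ABI table carries a non-NULL method that finishes the child's return registers, and the generic fork path calls it unconditionally. That removes the need for both an ABI comparison and a NULL check. struct abi_ops, set_fork_retval and fake_cpu_fork() are hypothetical names standing in for struct sysentvec and cpu_fork(). The register values follow the FreeBSD and Linux behavior described in this review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CARRY_FLAG 0x1u /* x86 carry-flag bit (PSL_C) */

/* Simplified stand-in for the child's saved user registers. */
struct fake_frame {
	uint64_t rax, rdx, rflags;
};

/* Hypothetical per-ABI method table, standing in for struct sysentvec. */
struct abi_ops {
	void (*set_fork_retval)(struct fake_frame *child); /* never NULL */
};

/* Native ABI: the child sees 0 in rax, 1 in rdx, and a clear carry flag. */
static void
native_set_fork_retval(struct fake_frame *f)
{
	f->rax = 0;
	f->rdx = 1;
	f->rflags &= ~(uint64_t)CARRY_FLAG;
}

/* Linux-like ABI: only the return value changes; rdx and flags are kept. */
static void
linux_like_set_fork_retval(struct fake_frame *f)
{
	f->rax = 0;
}

static const struct abi_ops native_abi = { native_set_fork_retval };
static const struct abi_ops linux_abi = { linux_like_set_fork_retval };

/* Generic part: copy the parent's frame, then let the ABI finish the child's. */
static void
fake_cpu_fork(const struct abi_ops *abi, const struct fake_frame *parent,
    struct fake_frame *child)
{
	*child = *parent;
	assert(abi->set_fork_retval != NULL); /* every ABI provides the method */
	abi->set_fork_retval(child);
}
```

Filling in the method for every ABI is what lets the generic path drop the NULL check. The assertion only documents that invariant.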

In D31472#709944, @kib wrote:

All this historical digging makes a nice blog post, I believe.

sys/i386/i386/vm_machdep.c
261–262

OK, I did it that way at first, but it turned out to be too many changes for the same effect.

Is machine/cpu.h a good place for the sv_cpu_fork() declaration?

sys/i386/i386/vm_machdep.c
261–262

We have <arch>/include/md_var.h and x86/include/x86_var.h for such things.
Do not name it sv_cpu_fork(), please.

kib added inline comments.
sys/sys/sysent.h
158

You might add a comment like /* Only used on x86 */.

This revision is now accepted and ready to land. Aug 12 2021, 12:07 AM