Page MenuHomeFreeBSD

usr.sbin/bhyve: add rendezvous layer for initialization step of bhyve
Needs ReviewPublic

Authored by aokblast on Dec 29 2024, 9:24 AM.
Tags
None
Referenced Files
F133469847: D48241.id149284.diff
Sun, Oct 26, 12:51 AM
F133441284: D48241.diff
Sat, Oct 25, 8:00 PM
Unknown Object (File)
Sat, Oct 25, 4:42 AM
Unknown Object (File)
Fri, Oct 24, 2:44 PM
Unknown Object (File)
Fri, Oct 24, 2:13 PM
Unknown Object (File)
Fri, Oct 24, 2:13 PM
Unknown Object (File)
Fri, Oct 24, 2:13 PM
Unknown Object (File)
Fri, Oct 24, 9:33 AM

Details

Reviewers
None
Group Reviewers
bhyve
Summary

THere maybe multiple wait operation in different components of bhyve (e.g. rfb, sock)

In original implementation, each wait operation will block the main thread thus
we can not connect to the socket in arbitrary order. Moreover, the actual
order depends on the argument order pass to the bhyve program

To better address this, we should add a rendezvous layer that can wait
all component at the same time

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 61732
Build 58616: arc lint + arc unit

Event Timeline

Are there any other potential consumers of this interface, besides rfb? I see the comment that mentions why gdb can't participate, which is a bit unsatisfying but okay.

usr.sbin/bhyve/rendezvous.c
74 ↗(On Diff #148512)

This interface behaves basically like a semaphore: rendezvous_add_member() increases the count, rendezvous_member_notify() decreases it, and rendezvous() waits for it to decrease to zero.

I think it could be implemented more simply: just have a single mutex, CV, and a counter. I don't see any reason to have per-consumer state either.

92 ↗(On Diff #148512)

We should check somehow that this operation is always done before rendezvous() is called.

usr.sbin/bhyve/rfb.c
1327

Is there any need for these functions to be separate? It looks like we might as well automatically add the new member when it's created.

1333–1334

This conditional can simply be deleted.

Update implementation by using only one mutex and cond variable

aokblast marked 2 inline comments as done.EditedDec 30 2024, 5:17 PM

Another consumer of rendezvous will be the raw socket over serial console that I have implemented a couple months ago. I haven't implement the wait operation on raw socket serial console. I would like to confirm the interface of rendezvous before I implement it.

BTW, for gdb, we can still wait gdb on rendezvous layer by deleting the set of vcpu0 and use rendezvous_notify. But I think setting vcpu0 maybe a better implementation when consider the behavior of gdb?

usr.sbin/bhyve/rendezvous.c
74 ↗(On Diff #148512)

Ya, I think you are right. Fix it

92 ↗(On Diff #148512)

Do you means the original implementation by separating add_member and member_create. If it is, I think the new implementation that combine these two have solve the problem

Fix condition when there is no rendezvous member

Another consumer of rendezvous will be the raw socket over serial console that I have implemented a couple months ago. I haven't implement the wait operation on raw socket serial console. I would like to confirm the interface of rendezvous before I implement it.

I left some inline comments. It's hard to make judgements about the interface when there is only one consumer. It looks ok to me, but maybe the uart driver has some extra requirements.

BTW, for gdb, we can still wait gdb on rendezvous layer by deleting the set of vcpu0 and use rendezvous_notify. But I think setting vcpu0 maybe a better implementation when consider the behavior of gdb?

Actually, I do not understand why gdb cannot use the rendezvous layer. bhyve_start_vcpu() is called after rendezvous(), so the following would happen:

  1. init_gdb() registers a notifier,
  2. rendezvous() blocks until notification is received
  3. a debugger attaches and new_connection() notifies the bhyverun thread
  4. bhyverun adds vCPUs
  5. vCPU threads block in gdb_cpu_add() until the debugger resumes the guest

What goes wrong here?

usr.sbin/bhyve/rendezvous.c
40 ↗(On Diff #148560)

IMHO this facility should just be a couple of functions in bhyverun.c - it is specific to bhyve startup (e.g., it can only be used once) and not very abstract.

I think you can simply have a global struct in bhvyerun.c with public functions like

void bhyve_startup_block(); /* called to register a notifier and block startup */
void bhyve_startup_notify(); /* notify the blocked thread */

and have bhyve_startup_wait() in implemented privately in bhyverun.c.

The rendezvous implementation is good but the interface is not general enough to warrant its own file, IMHO.

96 ↗(On Diff #148560)

We should assert that the counter is > 0.

115 ↗(On Diff #148560)
pthread_mutex_lock(...);
while (softc->counter > 0)
    pthread_cond_wait(...);
pthread_mutex_unlock(...);

would be a bit simpler and avoids extra (un)locking when executing the loop multiple times.

  • Add wait operation for uart
  • usr.sbin/bhyve: add wait operation for uart

I discover the original wait operation for gdb is not working now. I configure one vm with rfb and gdb. When gdb is with wait operation, the rfb does not have any output. But if the gdb is not waiting, the rfb will have output and gdb can attach to vm after boot up. I checkout to the main branch and it is also the case.

So I decide to implement wait operation for raw tcp console now to prove these stuff work.

BTW, a change is that I discover mevent dispatcher is executed at the bottom of main thread. That means we have to execute the bhyve_init_wait at each vcpu_thread instead of main thread or we cannot get mevent dispatched (because mevent_dispatch is not executed), then the bhyve_init_wait will not be notified.

I discover the original wait operation for gdb is not working now. I configure one vm with rfb and gdb. When gdb is with wait operation, the rfb does not have any output. But if the gdb is not waiting, the rfb will have output and gdb can attach to vm after boot up. I checkout to the main branch and it is also the case.

I guess it works if you attach a debugger first, then the rfb client?

So I decide to implement wait operation for raw tcp console now to prove these stuff work.

Why can't the gdb stub use this mechanism too?

BTW, a change is that I discover mevent dispatcher is executed at the bottom of main thread. That means we have to execute the bhyve_init_wait at each vcpu_thread instead of main thread or we cannot get mevent dispatched (because mevent_dispatch is not executed), then the bhyve_init_wait will not be notified.

I see, that seems reasonable.

usr.sbin/bhyve/bhyverun.c
128

This typedef is not needed.

137

This can just be a static global variable, there's no reason to allocate it lazily, since we always allocate it exactly once.

171

Unneeded return statement.

usr.sbin/bhyve/uart_backend.c
441–444

It would be cleaner to first use strchr(',') to break up the string into different options. Then you can use sscanf() to extract the address and port, and handle the rest separately. Otherwise adding new options in the future will be painful.

I don't implement it on gdb just because I cannot make wait operation on gdb work on my PC. Thus I am unable to test. Still checking where is the problem.

A quick find-up is that gdb_cpu_add storing the cpu suspend status and recover it in gdb wait mode.

gdb_cpu_add is executed in cpu_thread. At the same time, the task of resume BSP is at the bottom of main_thread.

If cpu_thread run faster than main_thread, the cpu suspend status will include BSP because BSP haven't been resumed (by the main_thread) yet.

It means when gdb is connected and suspend status is recovered, the BSP will be included thus the machine cannot be boot (no CPU is resumed) up.

A fix is that we use the block mechanism we proposed in this patch without suspend cpu0.

But a new problem I discover is that when gdb connected at the time when bhyve haven't booted up to OS, the bhyve will exit.

I attach bhyve to gdb and gdb shows vm triggered the TRIPLEFAULT and then exit with return value 3.

Still findout what is the problem but I think this facility works right now.