So instead of having a lot of code in bhyve, I would suggest providing a set of primitives that help implement warm and live migration outside of the bhyve process.
Wait. You're casually suggesting that years of work be discarded in favor of an alternate approach. Please be more specific about what you perceive to be the problem with the method taken in this patch series. "I have another idea" isn't a very good reason to go in a completely different direction.
The suggestion was meant to help you move in the right direction. I understand that this patch series took a lot of time to implement, but that alone should not be the reason to integrate it into bhyve. Do you agree? Instead, I would like to discuss a better implementation with a more robust approach.
I see the following reasons to discuss this patch series:
- The snapshot/resume code was abandoned: the implementing team has forgotten about it since it was integrated in 2020. Yes, it was good work, but it is incomplete, has a lot of issues, and nobody wants to fix them.
- Who will support the integrated code? This patch series contains a lot of code and potentially has bugs. Will it be the same story: integrate something and then forget about it, i.e. not participate in bug fixing?
- Instead of adding a lot of code to bhyve, it is better to keep that code outside the bhyve source, so that bhyve stays small and secure.
- Warm and live migration can involve different steps, requirements, hooks, etc. For example, it can be done with or without storage migration (Storage vMotion), and with or without additional VM tuning such as vCPU slowdown. A better place to implement those is a separate program that handles all the required functionality.
Correct me if I am wrong about something.
Jun 21 2023
Jun 20 2023
The *live* migration must be part of bhyve, as the migration process must transfer the virtual machine's memory from one host to the other while the virtual machine runs on the original host.
Of course, some changes to the migration code are unavoidable if the snapshot process changes in particular ways, but I expect them to remain minimal.
I believe that the migration code can live outside the bhyve process, even for warm or live migration. Suppose we have a bmigrate process written in C, shell, or Python (a high-level language):
a. bhyve dumps diff-pages to file1.
b. bmigrate sends the data to host2 or just places it on shared storage.
c. bmigrate calls bhyve to suspend without memory.
d. bmigrate sends the suspended data to host2 and restores the memory from the 1st iteration on host2.
e. bmigrate calls vmm to dump diff-pages to file2.
f. bmigrate sends the diff-pages to host2 and restores the 2nd iteration.
g. bmigrate restores the process on host2.
This is a rough idea, but it should work without adding a lot of code to bhyve.
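To make the iteration order in steps a-g easier to discuss, here is a toy Python model of such an external loop. It is only a sketch under assumptions: `ToyGuest`, `dump_dirty_pages` and `migrate` are hypothetical names, the "guest" is a dict of pages, and nothing here calls bhyve or vmm. It only shows that repeated diff-page rounds followed by a final round after suspend leave the destination with identical memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyGuest:
    """Hypothetical stand-in for a running guest: pages plus a dirty set."""
    memory: dict[int, bytes]
    dirty: set[int] = field(default_factory=set)
    running: bool = True

    def write(self, page: int, data: bytes) -> None:
        # A guest write changes the page and marks it dirty.
        self.memory[page] = data
        self.dirty.add(page)

    def dump_dirty_pages(self) -> dict[int, bytes]:
        # Steps a/e: return pages changed since the last dump, reset tracking.
        pages = {p: self.memory[p] for p in sorted(self.dirty)}
        self.dirty.clear()
        return pages


def migrate(source: ToyGuest, dest_memory: dict[int, bytes],
            workload: list[list[tuple[int, bytes]]],
            max_rounds: int = 2) -> list[int]:
    """Toy external migration loop; returns the page count sent per round."""
    source.dirty = set(source.memory)  # first round transfers every page
    sent_per_round = []
    for step in range(max_rounds):
        diff = source.dump_dirty_pages()   # steps a/e: dump diff-pages
        dest_memory.update(diff)           # steps b/f: send and apply on host2
        sent_per_round.append(len(diff))
        # The guest keeps running and dirties pages while we copy.
        if step < len(workload):
            for page, data in workload[step]:
                source.write(page, data)
    source.running = False                 # step c: suspend the guest
    final = source.dump_dirty_pages()      # last diff, taken while suspended
    dest_memory.update(final)
    sent_per_round.append(len(final))
    return sent_per_round                  # step g (resume on host2) not modeled


if __name__ == "__main__":
    src = ToyGuest(memory={0: b"a", 1: b"b", 2: b"c"})
    dst: dict[int, bytes] = {}
    rounds = migrate(src, dst, [[(1, b"B")], [(2, b"C"), (0, b"A")]])
    print(rounds, dst == src.memory)
```

The model leaves out everything that is hard in practice, such as how dirty pages are tracked and how device and vCPU state is transferred, so it says nothing about whether those parts can stay outside of bhyve.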
The same goes for warm migration. Why should this code be in bhyve? All states, CPUs, etc. could be tracked outside of bhyve.
Before this work proceeds to commit, could you provide a high-level design for this approach?
@corvink, I think this commit is premature: we haven't finished supporting snapshots, yet this commit already locks in the user interface for live migration, which is not finished at all.
Sorry for the late comment, but maybe this commit should be rolled back?
Apr 1 2023
Jun 30 2021
Rebased the code. The changes were added on top of d26ef5c7ac830812f07a02787f25fed5d6f8609e (from github.com/freebsd/freebsd-src).
Apr 4 2021
We tested the patch and everything seems OK on our part. The fix works as intended. Thanks!