In D19729#422870, @kib wrote:
In D19729#422803, @tychon wrote:
Since the PCI Express endpoint and associated data paths are synthesized on the FPGA, they have a larger-than-average vulnerability to SEUs. Hence we want to limit the "aperture" by which the DMA/bridge can scribble on memory if things go awry. That's done both by constraining access with the IOMMU and, even with that, by limiting write access even further (passing the BUS_DMA_NOWRITE flag to bus_dmamap_load()).
Do you enable busdma DMAR for everything, or limit the scope to the FPGA device only? If the latter, do you use hw.busdma.pciX.X.X.X tunables to control that, or have something more involved?
Mar 29 2019
Mar 28 2019
tychon added a comment to D19729: use the BUS_DMA_NOWRITE flag to expose and create the read-only VT-d IOMMU mappings.
tychon added a comment to D19729: use the BUS_DMA_NOWRITE flag to expose and create the read-only VT-d IOMMU mappings.
In D19729#422742, @kib wrote:
In D19729#422726, @tychon wrote:
In D19729#422689, @kib wrote:
The flag is only supported by sparc64, and there is not a single use of it in the tree.
Indeed there is no current in-tree user. It is, however, used outside of the tree in conjunction with a Mellanox Innova-2 card. Plus, once in tree, anyone is free to use it and it will no longer be sparc64-only :-)
I am curious. Is this some private code to communicate with the FPGA? (Because I know both busdma dmar and in-tree ml5_fpga too closely.)
Mar 27 2019
Use the BUS_DMA_NOWRITE flag to expose and create the read-only VT-d
tychon added a comment to D19729: use the BUS_DMA_NOWRITE flag to expose and create the read-only VT-d IOMMU mappings.
In D19729#422689, @kib wrote:
The flag is only supported by sparc64, and there is not a single use of it in the tree.
tychon updated the diff for D19725: ioat(4) should use bus_dma(9) for the operation source and destination addresses to work properly with the VT-d IOMMU.
Use ioat->max_xfer_size instead of BUS_SPACE_MAXADDR when creating the DMA tag for the operands.
tychon retitled D19725: ioat(4) should use bus_dma(9) for the operation source and destination addresses to work properly with the VT-d IOMMU from ioat(4) should use bus_dma(9) to work properly with the VT-d IOMMU to ioat(4) should use bus_dma(9) for the operation source and destination addresses to work properly with the VT-d IOMMU.
tychon added a comment to D19725: ioat(4) should use bus_dma(9) for the operation source and destination addresses to work properly with the VT-d IOMMU.
In D19725#422587, @mav wrote:
Generally I'd indeed like to see some busdma integration either in the driver, or maybe in some wrapper/subsystem on top of it to simplify usage, but right now in my code I am already using busdma to translate from virtual to physical addresses, and this busdma call here will be a duplicate waste of time, if not worse somehow.
Mar 26 2019
Mar 1 2019
This looks great and addresses my concern from an earlier iteration. Thanks for fixing it!
Feb 8 2019
pms(4) should use bus_get_dma_tag() to get parent tag.
Sep 17 2018
Jun 8 2018
Don't bother looking for non-executable pages when a process is
tychon added inline comments to D15708: Don’t bother looking for non-executable pages when a process is excluded from PTI..
tychon updated the diff for D15708: Don’t bother looking for non-executable pages when a process is excluded from PTI..
Incorporate feedback.
Apr 27 2018
Expand the checks for UCR3 == PMAP_NO_CR3 to enable processes to be
Apr 25 2018
I've also fixed up the frame->tf_rip == (long)doreti_iret code in trap().
If a trap is encountered upon executing iretq from within doreti() the
tychon added a comment to D15183: in PTI case account for stack alignment adjustment while copying frames during nested fault.
Thanks also for sharing your test program! I ended up relying on a tweak to the ucontext_t too:
tychon added a comment to D15183: in PTI case account for stack alignment adjustment while copying frames during nested fault.
The current code works because the offset of pc_pti_stack in struct pcpu is such that:
Apr 24 2018
Apr 18 2018
Add top of PTI stack to PCPU to avoid its calculation in cpu_switch().
Apr 17 2018
I've updated the diff to incorporate the feedback.
I've updated the diff to incorporate the feedback.
Apr 16 2018
As requested, I've updated the patch with full context.
Apr 13 2018
Add SDT probes to vmexit on Intel.
Mar 10 2018
Mar 9 2018
Mar 7 2018
The changes I proposed in the comments here, and reviewed in D14548, have been committed.
Fix a lock recursion introduced in r327065.
Mar 6 2018
Not to make this my magnum opus, but based on some offline discussion with jhb, I had still left some room for improvement. I believe I've addressed those concerns with this updated diff.
Mar 2 2018
Mar 1 2018
Include KASSERT suggested by jhb.
Feb 28 2018
Feb 15 2018
If you capture VMCS_GUEST_INTR_STATUS in the EXIT_REASON_HLT case of vmx_exit_process() then you can use that value in vmx_pending_intr() without having to use a vmx_getreg(). That circumvents the locking issue, removes the need for the special case #ifdef INVARIANTS/#endif code and may even improve the latency by skipping an iteration of VMPTRLD() and VMCLEAR().
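The suggestion above can be sketched as a small userland model. This is only an illustration under stated assumptions: the struct and the model_* function names are hypothetical and are not the vmm(4) code, and the "pending" test follows the usual rule that the requesting virtual interrupt (RVI, the low byte of the guest interrupt status) is deliverable when its priority class exceeds that of the processor priority register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state; this is not the real struct vmx layout. */
struct vcpu_model {
	bool     intr_status_valid;	/* set once the HLT exit path cached a value */
	uint16_t guest_intr_status;	/* cached copy of VMCS_GUEST_INTR_STATUS */
};

/*
 * Called from the HLT exit path, where the VMCS is already current,
 * so no extra VMPTRLD/VMCLEAR round trip is needed later.
 */
static void
model_capture_on_hlt(struct vcpu_model *v, uint16_t vmcs_value)
{
	v->guest_intr_status = vmcs_value;
	v->intr_status_valid = true;
}

/*
 * Decide whether a virtual interrupt is pending using only the cached
 * value. Assumption: RVI is the low byte, and it is deliverable when
 * its priority class (bits 7:4) is greater than that of the PPR.
 */
static bool
model_pending_intr(const struct vcpu_model *v, uint8_t ppr)
{
	uint8_t rvi;

	assert(v->intr_status_valid);
	rvi = v->guest_intr_status & 0xff;
	return ((rvi & 0xf0) > (ppr & 0xf0));
}
```

Because the check reads a plain field rather than the VMCS, it needs no access to the VMCS at that point, which is the property the comment relies on to sidestep the locking issue.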
Feb 12 2018
tychon added a comment to D14272: mitigate against CVE-2017-5715 by flushing the return stack buffer (RSB) upon returning from the guest.
While a conditional branch dependent on a global variable isn't quite the same as an indirect branch dependent on a global variable, it would seem the opportunity still exists to evict the cache line containing the global variable to give the CPU some time to speculatively execute in the neighborhood of the branch. What I've seen in other implementations is the placement of the RSB flushing code before any branches. Perhaps that's overly conservative, but I think we should follow that approach too.
Provide further mitigation against CVE-2017-5715 by flushing the
Feb 8 2018
Jan 15 2018
Provide some mitigation against CVE-2017-5715 by clearing registers
Dec 21 2017
tychon committed rS327065: Recognize a pending virtual interrupt while emulating the halt instruction..
Recognize a pending virtual interrupt while emulating the halt instruction.
tychon updated the diff for D13573: recognize a pending virtual interrupt while emulating halt instruction.
Replace magic constants with #define, within the function, to clarify usage.
Jul 28 2017
May 19 2017
tychon added a comment to D10581: Raise BLOCKIF_IOV_MAX to 128. Windows uses at least 67 and qemu also supports 128.
In D10581#223869, @grehan wrote:
I emailed Marcelo some changes I had to this - it doesn't work as is.
May 18 2017
tychon added a comment to D10581: Raise BLOCKIF_IOV_MAX to 128. Windows uses at least 67 and qemu also supports 128.
Seems reasonable to me.
Mar 30 2017
Reorder includes to placate MIPS build.
Add support for capturing 'struct ptrace_lwpinfo' for signals
Mar 29 2017
tychon requested review of D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Is everyone reasonably content with this?
Mar 24 2017
tychon updated the diff for D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Deleted debug drivel which snuck in.
tychon updated the diff for D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Addressed Mark's request to create a function to capture the siginfo_t for the specific thread, and clear the stale siginfo_t for other threads, in the process receiving the signal.
Mar 23 2017
tychon added inline comments to D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Mar 22 2017
tychon updated the diff for D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
I've reverted back to a note-per-thread and (hopefully) addressed some review comments.
tychon added inline comments to D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
tychon updated the diff for D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
I think I've incorporated all the feedback I received with the exception of adding to the procstat output the additional data collected in the 'gcore' case. In that case, the core file contains more data than I'm printing but I'm struggling on how to format it in a useful way. That could make a reasonable follow on commit.
Mar 20 2017
tychon added a comment to D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
In D9995#208051, @markj wrote:
In D9995#208023, @tychon wrote:
Sorry to be so noisy; that's twice now that Differential threw away my inline note.
Hm, you still need to hit "submit" after saving an inline note in order to post it. I can't think of any other gotchas.
To belabor the point, the structsize is part of the note itself to support procstat -- because that's how 'procstat' notes are formatted. If we don't want to use procstat to view the note I can omit the structsize but it does seem like a nice to have. That's assuming that the size of the note is sufficient for rudimentary versioning otherwise this is just a reinforcement of that.
Thanks, I see.
tychon added a comment to D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Sorry to be so noisy; that's twice now that Differential threw away my inline note. To belabor the point, the structsize is part of the note itself to support procstat -- because that's how 'procstat' notes are formatted. If we don't want to use procstat to view the note I can omit the structsize, but it does seem like a nice-to-have. That's assuming that the size of the note is sufficient for rudimentary versioning; otherwise this is just a reinforcement of that.
tychon added a comment to D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Missed inline comment and the structsize being part of the note itself.
tychon planned changes to D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Thanks for the reviews. I will incorporate the feedback; some of which I have replied to 'inline'. Mark, I'll also take another pass at the usage and initialization of 'td_dbgksi'.
Mar 17 2017
tychon updated the diff for D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Mark noted my diff was missing some context. I've generated with "svnlite diff -x -U9999".
tychon updated the diff for D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
This needs a bit more polish (and testing) but at a higher level how's this?
tychon retitled D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it from add siginfo_t to a corefile note and support in procstat to view it to add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
Mar 14 2017
tychon added a comment to D9995: add 'struct ptrace_lwpinfo' to a corefile note and support in procstat to view it.
In D9995#206504, @kib wrote:
In D9995#206488, @jhb wrote:
So while this is what Linux does (a single siginfo note), the idea I've been kicking around is replacing the NT_THRMISC note with a per-thread note that stores the entire 'struct ptrace_lwpinfo'. This would include both the thread name as well as the siginfo_t for each thread. It is almost an expanded NT_PRSTATUS (though not quite). But in particular we have slowly added more things to 'struct ptrace_lwpinfo' over the last few years and having a note that includes it would mean that each new thing we add in the future would automatically be included in core dumps. This is not a bad patch though, and if kib@ is ok with just having a single siginfo_t note I won't object. (Not sure what others think of a NT_LWPINFO or the like as I don't think I've mentioned this idea before.)
Sure, I prefer the approach you described, but somebody has to implement it. If the alternatives are between not having anything and this patch, let's go with the patch. But might the patch's author consider extending and modifying it to implement your proposal?
Jun 26 2015
verify_gla() needs to account for non-zero segment base addresses.
Jun 11 2015
In D2762#53053, @neel wrote:
In D2762#52988, @tychon wrote:
In D2762#52919, @neel wrote:
In D2762#52915, @tychon wrote:
Thanks Tycho. My responses inline:
This restructuring will be really handy. However, I think it's possible
to unify the handling of 'sysmem' and 'devmem' such that 'sysmem'
differs from 'devmem' only in persistence and accessibility (PROT_*).
Since alloc_memseg() is already called by both vm_setup_memory() and
vm_create_devmem() you'll simply need to provide a vanity name for
"lower", "low" and "high" memory to vm_alloc_memseg() when it's
created. Then you can delete the inference that a memseg is 'RAM'
you get when you encounter a VM_MEMSEG_NAME. You'll also need to
augment vm_alloc_memseg() to take a persistence argument
which will be set for 'sysmem' and clear for 'devmem'.
I think knowing which memory segments represent system memory is useful.
For e.g., I use this in iommu_modify() to only map 'sysmem' in the passthru
device's address space. This enforces consistency with the treatment of other
(emulated) device memory (e.g. AHCI BAR) from the point of view of the
passthru device.
It appears beyond just useful but rather an implementation requirement :-)
As a side benefit to treating all memory roughly equal is that you'll
be able to expose the entire guest's memory PA-wise in
/dev/vmm/testvmm.lowermem, /dev/vmm/testvmm.lowmem and
/dev/vmm/testvmm.highmem!
That's one way to do it although I don't see it as being functionally
different than mmap(/dev/vmm/vmname).
It's not different so there isn't much point.
My point really was that the API as proposed would be impossible for someone with a copy of vmm_dev.h and a binary-only vmm.ko to use properly. Not that someone like that exists, but libvmmapi.so and bhyve shouldn't contain too much embedded knowledge of the kernel internals.
Specifically, VM_ALLOC_MEMSEG takes a struct vm_memseg which in no way makes it obvious that when you supply a name you get a devseg and that it's necessary to omit the name when you want to get a "true" memseg. If you add 'int ismemseg' then it would be obvious.
That's a fair criticism and especially makes sense if all memory segments had a vanity name.
When each memseg has a vanity name you can dispense with
providing segid to vm_alloc_memseg() and instead have it returned as a
cookie of sorts; subsequent operations such as vm_map_memseg() and
vm_unmap_memseg() will take the segid and you can provide a
vm_get_segid_by_name() for the unfortuante consumer who misplaces the segid.You'll also be able to get rid of these upfront defines:
enum { VM_SYSMEM, VM_BOOTROM, VM_FRAMEBUFFER, };and automatically be able to support more than device ROM, etc.
Possibly, but that is just shuffling the identifier from an enumeration
to a character string (the segment name). It will still be necessary for
somebody to dole out or arbitrate who uses what character string.Also, an opaque identifier will require adding a 'vm_memseg_getnext()'
API for bhyvectl to be able to iterate over all memory segments.The enum { VM_SYSMEM, VM_BOOTROM ... } is a handy mnemonic to
make it easy to identify memory segments. It does not limit the number
or the type of memory segments that can be created.Indeed the enum doesn't limit the number nor types or memory segments but having the consumer supply two identifiers (the id and when required the name) make it's it a bit split brain.
Yes, that's a fair point.
Compounding this is that it's not possible to go from the device file back to the segid even with a brute force search over the entire gpa-space with vm_mmap_getnext() as the name isn't returned by vm_mmap_getnext()!
'vm_mmap_getnext()' returns a 'segid' which can be used in 'vm_get_memseg()' to arrive at the name.
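The name-keyed proposal discussed in this thread can be sketched as a toy userland registry. Everything here is hypothetical: the model_* names, the fixed table size and the explicit 'sysmem' flag are assumptions made for illustration, and none of it is the libvmmapi or vmm.ko interface. It shows allocation returning the segid as a cookie, plus a lookup by name for a consumer that has lost the cookie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_MEMSEGS 8
#define MODEL_NAME_MAX    32

/* Hypothetical segment record: every segment carries a vanity name. */
struct model_memseg {
	bool   in_use;
	bool   sysmem;			/* explicit "is system memory" flag */
	size_t len;
	char   name[MODEL_NAME_MAX];
};

struct model_vm {
	struct model_memseg segs[MODEL_MAX_MEMSEGS];
};

/* Look up a segment by name; returns its segid, or -1 if not found. */
static int
model_get_segid_by_name(const struct model_vm *vm, const char *name)
{
	assert(vm != NULL && name != NULL);
	for (int i = 0; i < MODEL_MAX_MEMSEGS; i++) {
		if (vm->segs[i].in_use && strcmp(vm->segs[i].name, name) == 0)
			return (i);
	}
	return (-1);
}

/* Allocate a named segment; returns the segid cookie, or -1 on failure. */
static int
model_alloc_memseg(struct model_vm *vm, const char *name, size_t len,
    bool sysmem)
{
	assert(vm != NULL);
	if (name == NULL || name[0] == '\0' || strlen(name) >= MODEL_NAME_MAX)
		return (-1);
	if (model_get_segid_by_name(vm, name) != -1)
		return (-1);		/* names must be unique */
	for (int i = 0; i < MODEL_MAX_MEMSEGS; i++) {
		if (!vm->segs[i].in_use) {
			vm->segs[i].in_use = true;
			vm->segs[i].sysmem = sysmem;
			vm->segs[i].len = len;
			strcpy(vm->segs[i].name, name);	/* length checked above */
			return (i);
		}
	}
	return (-1);			/* table full */
}
```

With an explicit flag, whether a segment is system memory no longer has to be inferred from the presence or absence of a name, which is the ambiguity the thread objects to.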
Jun 10 2015
In D2762#52919, @neel wrote:
In D2762#52915, @tychon wrote:
Thanks Tycho. My responses inline:
This restructuring will be really handy. However, I think it's possible
to unify the handling of 'sysmem' and 'devmem' such that 'sysmem'
differs from 'devmem' only in persistence and accessibility (PROT_*).
Since alloc_memseg() is already called by both vm_setup_memory() and
vm_create_devmem() you'll simply need to provide a vanity name for
"lower", "low" and "high" memory to vm_alloc_memseg() when it's
created. Then you can delete the inference that a memseg is 'RAM'
you get when you encounter a VM_MEMSEG_NAME. You'll also need to
augment vm_alloc_memseg() to take a persistence argument
which will be set for 'sysmem' and clear for 'devmem'.
I think knowing which memory segments represent system memory is useful.
For e.g., I use this in iommu_modify() to only map 'sysmem' in the passthru
device's address space. This enforces consistency with the treatment of other
(emulated) device memory (e.g. AHCI BAR) from the point of view of the
passthru device.
Jun 9 2015
This restructuring will be really handy. However, I think it's possible
to unify the handling of 'sysmem' and 'devmem' such that 'sysmem'
differs from 'devmem' only in persistence and accessibility (PROT_*).
Support guest writes to the TSC by enabling the "use TSC offsetting"
May 21 2015
The 'hostbridge' device exists to allow guests to infer msi/msix
May 4 2015
Thanks, that makes sense. Looks good to me!
tychon added a comment to D2428: Deprecate the 3-way return values from vm_gla2gpa() and vm_copy_setup()..
This looks very nice. My only feedback is that I got confused as to whether or not the "return fault (*fault)" provided by vmm_fetch_instruction was boolean or not. In most cases it was treated as such, but you've actually got the information to do better. Specifically around line 624 in the new vmm_instruction_emul.c you could set *fault to IDT_SS or IDT_GP. If you think no one will ever care about the specific fault, perhaps renaming fault to is_fault would further cement its boolean nature.
Apr 25 2015
STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.
Dec 30 2014
Ok, in that case I'll just change the parameter types to 'uint8_t' to make it obvious that this is a single-byte API.
I was planning to implement something along these lines when bhyve supports suspend/resume:
int vm_rtc_state_save(struct vmctx *ctx, struct vm_rtc_state *state);
int vm_rtc_state_restore(struct vmctx *ctx, struct vm_rtc_state *state);
'struct vm_rtc_state' would hold all 128 bytes of the RTC+NVRAM and possibly some additional information that would be useful when resurrecting the RTC state.
Then with respect to the other code, I'm a bit curious about the VM_RTC_READ/VM_RTC_WRITE interface. I'd sort of expected a size or length parameter -- perhaps length so the NVRAM component can be bulk imported/exported -- but even size would be fine. Another alternative is to make the 'value' or 'retval' uint8_t so it's obvious it's byte-based.
Does the following seem alright?
int vm_rtc_write(struct vmctx *ctx, int offset, uint8_t value);
int vm_rtc_read(struct vmctx *ctx, int offset, uint8_t *retval);
int vm_rtc_size(struct vmctx *ctx); /* return size of the nvram including the RTC control/status registers */
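As a rough illustration of the byte-based interface proposed above, here is a userland model backed by a plain 128-byte array. The model_rtc_* names and struct rtc_model are hypothetical stand-ins, not the libvmmapi functions, and returning EINVAL for an out-of-range offset is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* 128 bytes of RTC registers plus NVRAM, as mentioned in the thread. */
#define MODEL_RTC_NVRAM_SIZE 128

/* Hypothetical stand-in for the emulated device state. */
struct rtc_model {
	uint8_t bytes[MODEL_RTC_NVRAM_SIZE];
};

/* Size of the NVRAM including the RTC control/status registers. */
static int
model_rtc_size(const struct rtc_model *rtc)
{
	(void)rtc;
	return (MODEL_RTC_NVRAM_SIZE);
}

/* Write one byte; the uint8_t type makes the single-byte contract explicit. */
static int
model_rtc_write(struct rtc_model *rtc, int offset, uint8_t value)
{
	assert(rtc != NULL);
	if (offset < 0 || offset >= model_rtc_size(rtc))
		return (EINVAL);
	rtc->bytes[offset] = value;
	return (0);
}

/* Read one byte into *retval; *retval is left untouched on error. */
static int
model_rtc_read(const struct rtc_model *rtc, int offset, uint8_t *retval)
{
	assert(rtc != NULL && retval != NULL);
	if (offset < 0 || offset >= model_rtc_size(rtc))
		return (EINVAL);
	*retval = rtc->bytes[offset];
	return (0);
}
```

A caller that wants bulk import or export can loop from 0 to the reported size, which is presumably why a size query accompanies the single-byte accessors.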
I'm still in the process of reviewing this, however at first glance I'm not sure I see any reason to remove the HPET Legacy Routing support.
When you first mentioned it, I assumed that some interface change made it difficult to support and that it added to the test matrix. Now seeing the implementation it appears entirely independent from the RTC support.
The reason it appears independent is because I removed LegacyRouting :-)
Since it's not broken and provides additional functionality -- which I can't help but imagine that some guest out there is relying on -- I don't see the justification for deleting it as there are several other examples of "redundant mechanisms".
LegacyRouting is an optional capability so guests shouldn't be upset if they don't see this capability. I haven't run into any issues with the different guests that I have tested.
Also, LegacyRouting doesn't obviate the need for a complete PIT and RTC device emulation since we'll always need to support guests that depend on them.
The interrupt routing gets complicated with LegacyRouting. For e.g., if LegacyRouting is enabled then RTC periodic interrupts are disconnected from the PIC but the alarm and update-ended interrupts are now routed to the SCI. This is not insurmountable but I don't see a tangible benefit.
In any case, the most important reason to deprecate LegacyRouting is that it doubles the test matrix.
Dec 29 2014
I'm still in the process of reviewing this, however at first glance I'm not sure I see any reason to remove the HPET Legacy Routing support.
Dec 19 2014
So odd that SFN was implemented minus the ability to actually enable it. Seems like the priority to commit jumped the priority to complete ;-)
Dec 16 2014
I had written a similar test-stub when I wrote the code originally. Obviously, I misinterpreted the results :-(