For DoneList allocations, its size depends on the number of loaded DSOs. For map_object(), its size is the number of segments in the object. In both cases, over-grown situations would cause a stack overflow. Noted by: kevans rtld: add spinlock around the crt malloc calls Right now, the rtld malloc is called under the write-locked rtld bind lock. A future change adds places where only read-locked rtld bind lock is held, and then the spinlock protects the malloc structures from the parallel updates.
Details
- Reviewers
kevans
Diff Detail
- Repository
- rG FreeBSD src repository
- Lint
Lint Skipped - Unit
Tests Skipped
Event Timeline
I haven't had a chance to thoroughly review this yet, but I think it's a good idea regardless of the question about vfork.
| libexec/rtld-elf/map_object.c | ||
|---|---|---|
| 116 | Is this actually safe in a vfork-ish context? Actually, is *all* of this safe in a vfork-ish context? I kind of wonder if we need to try and enumerate the symbols involved and force binding before we rfork to avoid rtld altogether. | |
Patch as is has two serious bugs.
First one is that rtld malloc requires the rtld bind lock taken exclusively, which is generally not true for symbol lookups, where we only lock it shared.
Second bug is that sometimes we longjmp() from the middle of the symbol lookup, and this would leak some memory.
| libexec/rtld-elf/map_object.c | ||
|---|---|---|
| 116 | I believe that rtld as is should work after vfork() both in the parent and in the child. If the process is not mt, there is no question. For the mt process, we do not stop other threads from running. If an rtld lock is owned by the thread in the parent, it should be eventually cleaned which is visible to the child due to the vmspace sharing. The tricky part is that lock waiter might be in other process. Note that both parent and child share the same p_vmspace, so the umtx key that indexes the waiting thread into the umtx queue is identical for the parent and child. Then a wakeup from either parent or child thread should still wake inter-process. | |
Add malloc spinlock.
Record the donelist allocation in SymLook req to free it on relock.
Revert to alloca() for typical symbol lookup case, still doing malloc() if the number of loaded objects require rtld to use more than half of the stack page.
I am commenting here instead of PR for consistency. The most recent version of the patch doesn't introduce any breakage, but also doesn't solve bug 289425.
Might be, but lets confirm that reducing DLP_ALLOCA_LIMIT helps. Ultimately I think that only very small size of the alloca() allocations are safe, because e.g. user code might force absurdly low stack size.
Thank you for the suggestion. I am testing on two machines. So far so good.
| libexec/rtld-elf/rtld.c | ||
|---|---|---|
| 1040 | Reducing it to 128 seems to help. | |