Page MenuHomeFreeBSD

rtld: stop using alloca()
Needs ReviewPublic

Authored by kib on Sat, Jun 27, 3:16 PM.
Tags
None
Referenced Files
F160790556: D57908.diff
Sat, Jun 27, 9:43 PM
F160787535: D57908.diff
Sat, Jun 27, 8:55 PM
F160770172: D57908.diff
Sat, Jun 27, 4:00 PM
F160767561: D57908.diff
Sat, Jun 27, 3:22 PM

Details

Reviewers
kevans
Summary
For DoneList allocations, its size depends on the number of loaded DSOs.
For map_object(), its size is the number of segments in the object.
In both cases, over-grown situations would cause a stack overflow.

Noted by:       kevans


rtld: add spinlock around the crt malloc calls

Right now, the rtld malloc is called under the write-locked rtld bind
lock. A future change adds places where only read-locked rtld bind lock
is held, and then the spinlock protects the malloc structures from the
parallel updates.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

kib requested review of this revision.Sat, Jun 27, 3:16 PM

I haven't had a chance to thoroughly review this yet, but I think it's a good idea regardless of the question about vfork.

libexec/rtld-elf/map_object.c
116

Is this actually safe in a vfork-ish context? Actually, is *all* of this safe in a vfork-ish context? I kind of wonder if we need to try and enumerate the symbols involved and force binding before we rfork to avoid rtld altogether.

kib marked an inline comment as done.Sat, Jun 27, 5:39 PM

Patch as is has two serious bugs.

First one is that rtld malloc requires the rtld bind lock taken exclusively, which is generally not true for symbol lookups, where we only lock it shared.
Second bug is that sometimes we longjmp() from the middle of the symbol lookup, and this would leak some memory.

libexec/rtld-elf/map_object.c
116

I believe that rtld as is should work after vfork() both in the parent and in the child.

If the process is not mt, there is no question.

For the mt process, we do not stop other threads from running. If an rtld lock is owned by the thread in the parent, it should be eventually cleaned which is visible to the child due to the vmspace sharing. The tricky part is that lock waiter might be in other process. Note that both parent and child share the same p_vmspace, so the umtx key that indexes the waiting thread into the umtx queue is identical for the parent and child. Then a wakeup from either parent or child thread should still wake inter-process.

kib marked an inline comment as done.
kib edited the summary of this revision. (Show Details)

Add malloc spinlock.
Record the donelist allocation in SymLook req to free it on relock.

Revert to alloca() for typical symbol lookup case, still doing malloc() if the number of loaded objects require rtld to use more than half of the stack page.

I am commenting here instead of PR for consistency. The most recent version of the patch doesn't introduce any breakage, but also doesn't solve bug 289425.

I am commenting here instead of PR for consistency. The most recent version of the patch doesn't introduce any breakage, but also doesn't solve bug 289425.

Try reducing DLP_ALLOCA_LIMIT twice until it helps.

In D57908#1327731, @kib wrote:

I am commenting here instead of PR for consistency. The most recent version of the patch doesn't introduce any breakage, but also doesn't solve bug 289425.

Try reducing DLP_ALLOCA_LIMIT twice until it helps.

We could also probably bump the spawn stack size by a page or two as a compromise?

In D57908#1327731, @kib wrote:

I am commenting here instead of PR for consistency. The most recent version of the patch doesn't introduce any breakage, but also doesn't solve bug 289425.

Try reducing DLP_ALLOCA_LIMIT twice until it helps.

We could also probably bump the spawn stack size by a page or two as a compromise?

Might be, but lets confirm that reducing DLP_ALLOCA_LIMIT helps. Ultimately I think that only very small size of the alloca() allocations are safe, because e.g. user code might force absurdly low stack size.

Thank you for the suggestion. I am testing on two machines. So far so good.

libexec/rtld-elf/rtld.c
1040

Reducing it to 128 seems to help.