amd64: move common_tss into pcpu
ClosedPublic

Authored by kib on Sun, Nov 3, 5:29 PM.

Details

Summary

This saves some memory, around 256K I think. It also removes some code: for example, KPTI no longer needs to map common_tss specially.

Also, common_tss becomes domain-local.

Diff Detail

Repository
rS FreeBSD src repository
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib created this revision. Sun, Nov 3, 5:29 PM
kib added inline comments. Sun, Nov 3, 5:29 PM
sys/amd64/amd64/mp_machdep.c
342 ↗(On Diff #63905)

This block is simply moved above, so that pcpu_init() happens before initialization of pc_common_tss.

emaste added a subscriber: emaste. Tue, Nov 5, 3:34 PM
kib added a subscriber: pho. Tue, Nov 5, 6:04 PM
jhb accepted this revision. Wed, Nov 6, 7:46 PM
jhb added inline comments.
sys/amd64/amd64/cpu_switch.S
191 ↗(On Diff #63905)

I guess with our macros it doesn't work, but too bad you can't just use leaq here instead:

leaq PCPU(COMMONTSS),%r13
sys/amd64/amd64/machdep.c
1693 ↗(On Diff #63905)

This is redundant I think? amd64_bsp_pcpu_init1() is already setting this?

sys/amd64/amd64/pmap.c
9724 ↗(On Diff #63905)

This seems like the next candidate to move perhaps so that it is domain-local? Will it fit into the remaining space?

This revision is now accepted and ready to land. Wed, Nov 6, 7:46 PM
kib marked an inline comment as done. Wed, Nov 6, 8:28 PM
kib added inline comments.
sys/amd64/amd64/cpu_switch.S
191 ↗(On Diff #63905)

I need the linear address there, while lea can only provide the effective address. AFAIK there is no way to read the segment base at all, other than with something like RDGSBASE. This is why pc_prvspace exists.
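The distinction can be sketched in userland C. This is only an illustration under stated assumptions: `struct demo_pcpu`, `struct demo_tss`, and the `demo_*` helpers are hypothetical stand-ins, not the kernel's real definitions. A segment-relative lea would produce just the field offset, so the linear address has to be formed from a base stored in the per-CPU area itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware TSS; not the real struct amd64tss. */
struct demo_tss {
	uint64_t rsp0;
};

/* Hypothetical, heavily reduced per-CPU area; names only echo the real ones. */
struct demo_pcpu {
	struct demo_pcpu *pc_prvspace;	/* self-pointer: linear address of this area */
	struct demo_tss pc_common_tss;	/* TSS embedded in the per-CPU area */
};

/* Done once per CPU at init time, when the linear address is known. */
static void
demo_pcpu_init(struct demo_pcpu *pc)
{
	pc->pc_prvspace = pc;
}

/* What a segment-relative lea would yield: the offset only. */
static size_t
demo_tss_effective(void)
{
	return (offsetof(struct demo_pcpu, pc_common_tss));
}

/* Linear address: stored segment base plus the field offset. */
static struct demo_tss *
demo_tss_linear(const struct demo_pcpu *pc)
{
	assert(pc->pc_prvspace != NULL);
	return ((struct demo_tss *)((uintptr_t)pc->pc_prvspace +
	    demo_tss_effective()));
}
```

After calling `demo_pcpu_init(&cpu)` on a `struct demo_pcpu cpu`, `demo_tss_linear(&cpu)` returns `&cpu.pc_common_tss`, while `demo_tss_effective()` alone is just an offset with no base.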

sys/amd64/amd64/machdep.c
1693 ↗(On Diff #63905)

Yes, removed.

sys/amd64/amd64/pmap.c
9724 ↗(On Diff #63905)

Do you mean IDT or GDT?

I believe the GDT can be moved: it currently uses 13 descriptors, which means 13 * 8 == 104 bytes. PCPU has around 3K free. I will do this next.

For the IDT, I am not sure. Currently we use a single global IDT; do you want to dup it to each CPU? In principle it is doable, but I remember a discussion some time ago about using a per-CPU IDT to increase the number of hw-unique MSI vectors, and the argument was that interrupt sources in sw provide enough dispatching capability that we do not need it.
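The GDT size estimate above can be written down as a compile-time check. This is a sketch only: the `DEMO_*` constants are hypothetical names that restate the figures quoted in the comment, and 3072 is an assumed rounding of "around 3K", not a value taken from the tree.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NGDT	13	/* descriptor count quoted in the review */
#define DEMO_DESC_SIZE	8	/* bytes per descriptor slot, as in 13 * 8 */
#define DEMO_PCPU_FREE	3072	/* assumed value for "around 3K free" */

/* The arithmetic from the comment, verified at compile time. */
static_assert(DEMO_NGDT * DEMO_DESC_SIZE == 104, "GDT size estimate");

/* Total bytes a per-CPU GDT would occupy. */
static size_t
demo_gdt_bytes(void)
{
	return ((size_t)DEMO_NGDT * DEMO_DESC_SIZE);
}

/* Nonzero if the GDT fits into the given amount of free space. */
static int
demo_gdt_fits(size_t free_bytes)
{
	return (demo_gdt_bytes() <= free_bytes);
}
```

With these figures the GDT would take 104 of the roughly 3072 spare bytes, so `demo_gdt_fits(DEMO_PCPU_FREE)` holds with a wide margin.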

jhb added inline comments. Thu, Nov 7, 4:57 PM
sys/amd64/amd64/pmap.c
9724 ↗(On Diff #63905)

I actually meant the IDT, though you are right that the per-CPU indirection is not in the IDT itself. The GDT does seem trivial as it is already per-CPU. For the IDT we could perhaps do a per-domain IDT if we think it would make any difference. Perhaps Netflix's workload is one where they could try a patch to see if there is any measurable difference.

This revision was automatically updated to reflect the committed changes.
kib marked an inline comment as done.