Page MenuHomeFreeBSD

msetdomain prototype (similar to mbind())
Needs ReviewPublic

Authored by jeff on Mar 29 2018, 6:01 AM.
Tags
None
Referenced Files
Unknown Object (File)
Feb 10 2024, 1:01 AM
Unknown Object (File)
Dec 26 2023, 4:51 PM
Unknown Object (File)
Dec 23 2023, 2:25 AM
Unknown Object (File)
Dec 21 2023, 10:18 PM
Unknown Object (File)
Nov 25 2023, 6:33 AM
Unknown Object (File)
Nov 21 2023, 12:42 PM
Unknown Object (File)
Nov 11 2023, 8:29 AM
Unknown Object (File)
Nov 9 2023, 1:34 PM
Subscribers

Details

Reviewers
alc
kib
markj
Summary

This is an implementation parts of the following functionality:
http://man7.org/linux/man-pages/man2/mbind.2.html

I'm still testing so this is a WIP. I am not overly familiar with the vm_map_entry and anonymous vm_object split/merge rules. I'm looking for comments on whether I'm missing anything big in vm_map_setdomain(). I need to detect holes in the address space still since that should throw an error. I noticed that there was a ton of repeated code in vm_map.c and did a little cleanup to create a function I needed.

The kern_cpuset part is just boilerplate to copy in and validate the domain. I can easily test and verify that so feel free to ignore.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

vm/vm_fault.c
1601

We really don't want this to inherit. private copies of a file should use the thread's policy.

vm/vm_map.c
4202

This locking was unique to this instance of the pattern. It does not look like it's necessary.

vm/vm_object.c
1331

We really don't want this to inherit. private copies of a file should use the thread's policy.

Can you please generate diffs with large context (svn diff -x -U9999999), otherwise using the review system is rather hard.

vm/vm_fault.c
1601

Note that this is only for wired map entries. I agree that the removal of the assignment makes this code correct in more situations than it was with the assignment present.

vm/vm_map.c
858

Can you extract this part into separate patch ?

For instance, I want this part merged to stable/11.

4202

I think that it is not necessary.

kern/kern_cpuset.c
2224

We have trunc_page() for this.

2226

Style: no space after cast.

Also the previous assignment is redundant.

2269

KERN_INVALID_ADDRESS should probably map to ENOMEM for consistency with other similar syscalls?

vm/vm_map.c
4313

"contained in"

4376

Style: redundant parentheses.

4378

What's the purpose of calling vm_object_collapse() if OBJ_NOSPLIT is set?

vm/vm_map.c
4378

There is no purpose, but it is also harmless. OBJ_NOSPLIT implies that there is no backing object so the call is NOP.

kern/kern_cpuset.c
2226

This is of course copy & paste from vm_kern.c. I will review more carefully.

2269

Yes I will put a switch in once I have finalized the other code.

vm/vm_map.c
858

Yes I will do so before commit.

4376–4383

I think this is missing something like:

if (current->eflags & MAP_ENTRY_NEEDS_COPY) {
        vm_object_shadow(&old_entry->object.vm_object,
            &current->offset, current->end - current->start);
        current->eflags &= ~MAP_ENTRY_NEEDS_COPY;
        object = current->object.vm_object;
vm/vm_map.c
4376–4383

It might be too forcefull. It is not clear whether we should do it immediately or postpone the COW object creation until the first write fault. You would store the policy in the entry and then migrate it to the object on fault. In fact, this is very similar to how swap accounting work.

vm/vm_map.c
4376–4383

Isn't the downside only an extra vm object?

It seems the other way will require a lot of extra accounting for what could be simple. swap is used for every object, domain policy will be rare.

jeff marked 3 inline comments as done.Mar 30 2018, 5:44 AM
jeff added inline comments.
kern/kern_cpuset.c
2226

Upon further review, this is not quite the same.

Imagine address of 4095 and length of 4096
pageoff = 4096
addr = 0;
size = 4096 + 4095 = 8192
round_page() = 8192
two pages

trunc_page(4095) = 0;
round_page(size) = 4096;
one page

It could probably be written more clearly if we were to just calculate end and then round down start and round-up end but this is the pattern the other memory syscalls use.

vm/vm_map.c
4378

I guess the question is then, what do I do if I can't split? Does that leave me applying a domainset to a larger range than requested?

I addressed review feedback.

I read linux source and found that they return EFAULT if any of the range is unmapped.

I wrote a test program that tried various iterations of holes in address space, private mappings of files, shared mappings of files, etc. I fixed a few bugs in the process. This is getting closer.

kern/kern_cpuset.c
2029

perhaps error != 0

2035

Style requires no declarations in the middle of the function.

2207

Excessive ().

vm/vm_map.c
4376–4383

Extra object in the shadow chain. This makes scans/forks/collapses and vm_fault() cost.

We only do the copy on actual fault, before. Now, if e.g. an application does mbind() over its entire address space, then all text segments which are need_copy/ro, get the additional object in the chain.

OTOH I tend to agree that it is fine for the initial implementation. MIght be the involved implementation should move the entry->cred push from entry to object into some helper, and this helper used consistently. Then push of the domain from entry to object would be one-liner.

I intend to commit this next week. I will denote that it is experimental and API may change in a man page and in comments. I think we're going to need more burn-in time with applications before 12.0 settles. I have a commitment from Netflix to sponsor that work.