Page MenuHomeFreeBSD

Fixes for sendfile.

Authored by kib on Mar 11 2020, 10:29 PM.


  1. Teach buffer pager about bogus pages. They should be silently skipped.
  2. Do not call into a vnode pager while leaving some pages from the same block as the current run, xbusy. This immediately deadlocks if pager needs to instantiate the buffer.
  3. Do not double-unbusy on error.
  4. Ensure that iodone() is called on erorrs from local pagers.
  5. Wait for all in-flight async io runs to complete before unwiring pages on error.

Also, it

  1. expands comments explaining nuances which are hard to see even after re-reading the code
  2. add specific malloc type so that leaks are more easily seen that e.g. just wired or busy pages leak
  3. order headers alphabetically
  4. some style.

PR: 244713 [1, 2]

There are more issues, I limited the scope to interaction with the pagers.
Even there, there are more issues, most serious is explained in XXXKIB comment.

The patch passed stress2 sendfile subset, but I asked Peter for more run time.

Diff Detail

rS FreeBSD src repository - subversion
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

kib retitled this revision from Fix sendfile over filesystems that use buffer pager. to Fixes for sendfile..
kib edited the summary of this revision. (Show Details)
kib added reviewers: glebius, markj.
263 ↗(On Diff #69822)

Can we rename pg to pa for consistency with sendfile_swapin(), or perhaps ma?

275 ↗(On Diff #69822)

When is pg == NULL?

282 ↗(On Diff #69822)

"might have invalidated them in the meantime"

384 ↗(On Diff #69822)

Shouldn't it be OFF_TO_IDX(off) + i, where i is the offset we pass when calling vm_pager_get_pages_async() below?

5236 ↗(On Diff #69822)

Could this fix PR 242626, where we saw an infinite loop in vfs_bio_getpages()?

kib marked 5 inline comments as done.Mar 24 2020, 7:25 PM
kib added inline comments.
263 ↗(On Diff #69822)

I renamed to pa.

275 ↗(On Diff #69822)

There are calls to sendfile_iodone(sfio, NULL, 0, error) at least. In some variants of the patch I did something to pg before count was evaluated. I can remove the check, I think, if you prefer.

384 ↗(On Diff #69822)

sfio is shared by many in-flight async reads, each having its own start position in the sfio->pg array. The formula in the sendfile_iodone() in the arg to vm_page_relookup() takes care of that.

5236 ↗(On Diff #69822)

Yes, I suspect it is the same cause as 244713. But both bogus page skipping, and unbusying nearby pages from the same buffer are needed.

kib marked 4 inline comments as done.

Rename pg to pa.
Remove pg != NULL check.
Fix grammar.

384 ↗(On Diff #69822)

Ah, right. I missed the (sfio->pa - pg) term in that call.

1018 ↗(On Diff #69839)

Do we need to wait for pending I/O to complete here too?

kib marked 2 inline comments as done.Mar 25 2020, 4:34 PM
kib added inline comments.
1018 ↗(On Diff #69839)

Yes, I think you are right. This case cannot happen on amd64.

I added a trivial function so that if more high-tech approach is implemented, e.g. with blockcount, it does not need to propagate.

kib marked an inline comment as done.

Wait for all io to finish before unwiring in case sf buf cannot be allocated.

This revision is now accepted and ready to land.Mar 25 2020, 5:38 PM

Just a general note, that I really dislike moving of local variables to a greater scope. Yes, I know that a document from the previous century named style(9) suggests to do that. But doing it this way makes code harder to read! Change my mind.

420 ↗(On Diff #69859)

Why is this line removed? Do we now rely on a pager checking that next page is already valid and disabling readahead?

Thanks a lot for the change! I'd really wish that scopes of local variables isn't increased. If it is possible to avoid doing that, I'd really appreciate.

Thanks a lot for the change! I'd really wish that scopes of local variables isn't increased. If it is possible to avoid doing that, I'd really appreciate.

Technically having all vars global probably improves both compiler life and stack overflow detection, because there is no question of limited or non-existent lifetime for any of local. Assume that some frame is only instantiated conditionally, then theoretically it could cause stack overflow.

I do not want to force Peter to retest the patch, he spent a lot of time.

420 ↗(On Diff #69859)

If you look at the line 497, rhpages reference is passed to get_pages() only if i + count == npages. Since we are decrementing count there, we always pass NULL for read-ahead ref.

This revision was automatically updated to reflect the committed changes.