Page MenuHomeFreeBSD

Fix linux_destroy_dev() behaviour when there are still files open from the destroying cdev.
ClosedPublic

Authored by kib on Dec 19 2018, 10:32 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 23, 10:26 PM
Unknown Object (File)
Tue, Jan 21, 4:45 PM
Unknown Object (File)
Sun, Jan 19, 9:20 PM
Unknown Object (File)
Fri, Jan 3, 11:08 AM
Unknown Object (File)
Wed, Jan 1, 2:11 AM
Unknown Object (File)
Mon, Dec 30, 2:30 AM
Unknown Object (File)
Dec 8 2024, 8:59 PM
Unknown Object (File)
Nov 24 2024, 1:01 PM

Details

Summary

Currently linux_destroy_dev() waits for the reference count on the linux cdev to drain, and each open file hold the reference. Practically it means that destroy_dev() is blocked until all userspace processes that have the cdev open, exit. FreeBSD devfs does not have such problem, because device refcount only prevents freeing of the cdev memory, and separate 'active methods' counter blocks destroy_dev() until all threads leave the cdevsw methods. After that, attempts to enter cdevsw methods are refused with an error.

Implement somewhat similar mechanism for LinuxKPI cdevs. Lower cdev refcount to only mean hold on the cdev memory. Add sirefs count to track both number of threads inside the cdev methods, and for single-bit indicator that cdev is being destroyed. In the later case, the call is redirected to the dummy cdev.

Add the mlx5_ib_disassociate_ucontext() method for mlx5 ib driver, which basically enables call to linux_destroy_dev() on the mlx5ib driver unload, which now uses the updated linux_destroy_dev() infrastructure.

Add a partial implementation for zap_vma_ptes() when possible (basically, when vma is backed by a managed object).

Sponsored by: Mellanox Technologies

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Does this code handle forking of file descriptors?

Does this code handle forking of file descriptors?

What do you mean by forking ? Duping happens at descriptors layer, not file.

sys/compat/linuxkpi/common/include/linux/cdev.h
55 ↗(On Diff #52158)

You might want to bump the FreeBSD_version changing the size of this structure.

sys/compat/linuxkpi/common/src/linux_compat.c
727 ↗(On Diff #52158)

It might be more correct to use filp->f_op, in case some driver is overriding the file operations?

1509 ↗(On Diff #52158)

else

error = 0; ???
sys/dev/mlx5/mlx5_ib/mlx5_ib_main.c
1338 ↗(On Diff #52158)

This part should be committed separately.

In D18606#396732, @kib wrote:

Does this code handle forking of file descriptors?

What do you mean by forking ? Duping happens at descriptors layer, not file.

OK, seems fine.

kib marked an inline comment as done.Dec 19 2018, 1:04 PM
kib added inline comments.
sys/compat/linuxkpi/common/include/linux/cdev.h
55 ↗(On Diff #52158)

In fact the size of the structure only changed on i386 but not on amd64.

sys/compat/linuxkpi/common/src/linux_compat.c
727 ↗(On Diff #52158)

Ok.

1509 ↗(On Diff #52158)

error is initialized above.

sys/dev/mlx5/mlx5_ib/mlx5_ib_main.c
1338 ↗(On Diff #52158)

There are at least three commits in the diff, another is zap_vma_ptes().

kib marked an inline comment as done.

Only use ldev fops when switched to dummy cdev.

BTW, there is a race between linux_destroy_dev() and linux_dev_fdopen(). If destroy started after fdopen call started and passed the ref check before fdopen incremented it, in either current or patched version we access freed memory.
I will fix it after this patch lands.

In D18606#396764, @kib wrote:

BTW, there is a race between linux_destroy_dev() and linux_dev_fdopen(). If destroy started after fdopen call started and passed the ref check before fdopen incremented it, in either current or patched version we access freed memory.
I will fix it after this patch lands.

Actually no, the new code seems to be immune to the problem.

The main portion of the change looks fine to me, I did not look very closely at the mlx5 change though.

sys/compat/linuxkpi/common/src/linux_compat.c
691 ↗(On Diff #52162)

It seems that this is really only supposed to unmap mappings associated with the specified vma. I don't think there's an easy way to do that with the information available from that struct, though.

696 ↗(On Diff #52162)

Extra newline.

794 ↗(On Diff #52162)

Oof, this is new to me:

1092                 /*                                                                                                                                       
1093                  * If the vn_open replaced the method vector, something                                                                                  
1094                  * wonderous happened deep below and we just pass it up                                                                                  
1095                  * pretending we know what we do.                                                                                                        
1096                  */                                                                                                                                      
1097                 if (error == ENXIO && fp->f_ops != &badfileops)                                                                                          
1098                         goto success;

Are there any other places in the tree that use this trick?

This revision is now accepted and ready to land.Dec 19 2018, 5:31 PM
kib marked an inline comment as done.Dec 20 2018, 3:27 AM
kib added inline comments.
sys/compat/linuxkpi/common/src/linux_compat.c
691 ↗(On Diff #52162)

Yes, it is very limited. This is one of the reason why I advocate for using managed device pager for all new code.

794 ↗(On Diff #52162)

I do not remember an other place in the tree, but this backdoor existed from the beginning of devfs, I believe. Also you should note similar ENODEV trick for fdescfs (look for several lines below in kern_openat()).

Remove stray empty line.

This revision now requires review to proceed.Dec 20 2018, 3:28 AM

No apparent regressions with the drm driver the Intel machine I tested with.

This revision is now accepted and ready to land.Dec 30 2018, 12:48 PM
This revision was automatically updated to reflect the committed changes.

What happend to the diffs in this differential revision?

This revision was not accepted when it landed; it landed in state Needs Review.Dec 30 2018, 3:38 PM
This revision was automatically updated to reflect the committed changes.
This revision was not accepted when it landed; it landed in state Needs Review.Dec 30 2018, 3:46 PM
This revision was automatically updated to reflect the committed changes.