Page MenuHomeFreeBSD

Bug 132845: ggated: Fix double file open and file descriptor leak.
ClosedPublic

Authored by ota_j.email.ne.jp on Aug 24 2020, 2:12 AM.
Tags
None
Referenced Files
F106741524: D26168.id76297.diff
Sat, Jan 4, 5:31 PM
F106741217: D26168.id76417.diff
Sat, Jan 4, 5:25 PM
F106740326: D26168.id76275.diff
Sat, Jan 4, 5:07 PM
Unknown Object (File)
Thu, Jan 2, 3:39 AM
Unknown Object (File)
Thu, Dec 26, 11:41 PM
Unknown Object (File)
Sun, Dec 22, 6:20 PM
Unknown Object (File)
Nov 30 2024, 9:27 PM
Unknown Object (File)
Oct 24 2024, 8:22 AM

Details

Summary

Ggated opens a file twice during handshake. One can observe it with fstat.
Add a check for open fd to precent opening same device/file twice.

In addition, as ggated keeps the fd after forking for a child to
handle connection, ggatec destroy does not completely close the file descriptor.
As a result, unplugging device damages or ZFS volumes exported with ggated
cannot be modified, i.e. zfs revert, zfs receive, without killing ggated daemon.

Test Plan

% mdconfig -a -t swap -s 100M
md0
% cat /etc/ggated
127.0.0.0/24 RW /dev/md0
% ggated
% ggatec create 127.0.0.1 /dev/md0
ggate0
% fstat | grep md0
root ggated 71947 8 /dev 119 crw-r----- md0 rw
root ggated 71947 10 /dev 119 crw-r----- md0 rw
root ggated 71944 8 /dev 119 crw-r----- md0 rw
root ggated 71944 10 /dev 119 crw-r----- md0 rw
root md0 71931 wd / 2 drwxr-xr-x 1024 r
root md0 71931 root / 2 drwxr-xr-x 1024 r
% ggatec destroy -u -0
% fstat | grep md0
root ggated 71944 8 /dev 119 crw-r----- md0 rw
root ggated 71944 10 /dev 119 crw-r----- md0 rw
root md0 71931 wd / 2 drwxr-xr-x 1024 r
root md0 71931 root / 2 drwxr-xr-x 1024 r
% pkill ggated
% fstat | grep md0
root md0 71931 wd / 2 drwxr-xr-x 1024 r
root md0 71931 root / 2 drwxr-xr-x 1024 r

% bug fix
% ggatec create 127.0.0.1 /dev/md0
ggate0
% fstat | grep md0
root ggated.full 71991 8 /dev 119 crw-r----- md0 rw
root md0 71931 wd / 2 drwxr-xr-x 1024 r
% ggatec destroy -u -0
% fstat | grep md0
root md0 71931 wd / 2 drwxr-xr-x 1024 r
root md0 71931 root / 2 drwxr-xr-x 1024 r

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

I'm a bit confused by the description. It looks like we only open the device once during the handshake, the real problem is that the parent keeps the disk device open.

sbin/ggate/ggated/ggated.c
540 ↗(On Diff #76136)

Shouldn't we set conn->c_diskfd = -1 here?

543 ↗(On Diff #76136)

The opening brace and else if should be on the same line.

sbin/ggate/ggated/ggated.c
543 ↗(On Diff #76136)

Sorry, closing brace.

First of all, thank you for your quick response.

It has been quite a while and it took me a while to recollect what I found and how I attempted to fix.
I'm updating the patch and will be testing for a couple of days.

One ggate session creates 2 TCP connections.

  1. ggatec requests 2 handshakes (read and write) and each handshake results ggated opening 1 fd for a file.
  2. when ggated receives the both handshakes, ggated forks but file descriptors left open.

In addition, when we use ggatec rescue, it will open 2 additional file descriptors.
Ggated creates a new connection for rescue, thus, it doesn't make sense to keep connection after forking.

Ref ggatec - https://svnweb.freebsd.org/base/head/sbin/ggate/ggatec/ggatec.c?view=markup#l380
Ref ggated - https://svnweb.freebsd.org/base/head/sbin/ggate/ggated/ggated.c?view=markup#l525

In other words, these are the problems:

  1. ggated opens the same file twice and get 2 fds,
  2. ggated lose track of 1 of 2 fds, and
  3. ggated doesn't close these fds.

In older releases, having 2 TCP connections caused major slowness and I reported.
I haven't seen such slowness anymore with newer releases for years.
In the patch, I addressed all of above issues but had to bump the protocol version.
In addition, there wasn't phabricator back then and didn't get much attentions.
Ref - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=132798 .

We can improve protocol negotiations and resource use later. I'm trying to address resource leak at moment.

sbin/ggate/ggated/ggated.c
540 ↗(On Diff #76136)

I'm moving close() to connection_close() function.
That looks a better place to close.
It is called after this function returns.

First of all, thank you for your quick response.

It has been quite a while and it took me a while to recollect what I found and how I attempted to fix.
I'm updating the patch and will be testing for a couple of days.

One ggate session creates 2 TCP connections.

  1. ggatec requests 2 handshakes (read and write) and each handshake results ggated opening 1 fd for a file.
  2. when ggated receives the both handshakes, ggated forks but file descriptors left open.

In addition, when we use ggatec rescue, it will open 2 additional file descriptors.
Ggated creates a new connection for rescue, thus, it doesn't make sense to keep connection after forking.

I think I see, thanks.

Ref ggatec - https://svnweb.freebsd.org/base/head/sbin/ggate/ggatec/ggatec.c?view=markup#l380
Ref ggated - https://svnweb.freebsd.org/base/head/sbin/ggate/ggated/ggated.c?view=markup#l525

In other words, these are the problems:

  1. ggated opens the same file twice and get 2 fds,
  2. ggated lose track of 1 of 2 fds, and
  3. ggated doesn't close these fds.

In older releases, having 2 TCP connections caused major slowness and I reported.
I haven't seen such slowness anymore with newer releases for years.
In the patch, I addressed all of above issues but had to bump the protocol version.
In addition, there wasn't phabricator back then and didn't get much attentions.
Ref - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=132798 .

We can improve protocol negotiations and resource use later. I'm trying to address resource leak at moment.

Ok. I am not very familiar with ggate but am happy to help review and commit patches for any outstanding issues.

Moved close() to connection_remove() and aligned with send/recv fds.

connection_remove() is called after fork().

sbin/ggate/ggated/ggated.c
353 ↗(On Diff #76275)

How exactly do we know that the connection always refers to the same disk? That is, why do we not close diskfd and reopen here?

ota_j.email.ne.jp added inline comments.
sbin/ggate/ggated/ggated.c
353 ↗(On Diff #76275)

ggated/ggatec uses c_token to identify a session in connection_find() function.

For long term solution, I'd like to suggest creating a newer protocol to use a single fd for both read and write and use one handshake instead of 2 like I originally tried to address at https://bugs.freebsd.org/bugzilla/attachment.cgi?id=94611&action=diff .

For short term, it looks we can reject path mismatch. ggatec won't function opening 2 different files, anyway...

we have conn->c_path stored from the 1st handshake. We can check c_path is the same for the same c_token; otherwise to reject the 2nd handshake. It looks we can add such a check here.

Verify paths are the same between handshakes for a single session.

sbin/ggate/ggated/ggated.c
944 ↗(On Diff #76297)

Shouldn't we call connect_remove() here? One step at a time I guess.

353 ↗(On Diff #76275)

Hrm, I see that ggated doesn't really try to handle untrusted requests. connection_new() blindly strdup()s a path sent by a client without verifying that it's nul-terminated... :/

This revision is now accepted and ready to land.Aug 28 2020, 3:41 PM

Thank you for your support, Mark.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=132845 is the corresponding bugzilla entry.

I'm hoping this bug fix gets into 12.2-RELEASE.

Meanwhile, I have also entered https://reviews.freebsd.org/D21388 to address debug mode printing negative offset a while ago.
If you have time, please take a look as well.

sbin/ggate/ggated/ggated.c
944 ↗(On Diff #76297)

ggatec doesn't try to recover form half-processed connection negotiation, today.

This gets cleaned up by connection_cleanups().

Simplifying the handshake will solve many of these issues for long run.