Page MenuHomeFreeBSD

[ggated/ggatec] Simplify Gate Handshake.
Needs ReviewPublic

Authored by ota_j.email.ne.jp on Sep 24 2020, 2:06 AM.
Tags
None
Referenced Files
F108759924: D26542.id78405.diff
Mon, Jan 27, 6:43 PM
Unknown Object (File)
Tue, Jan 21, 5:59 AM
Unknown Object (File)
Sat, Jan 18, 10:15 PM
Unknown Object (File)
Sat, Jan 18, 5:54 PM
Unknown Object (File)
Dec 20 2024, 2:08 PM
Unknown Object (File)
Dec 2 2024, 2:21 AM
Unknown Object (File)
Nov 30 2024, 10:19 AM
Unknown Object (File)
Nov 14 2024, 11:52 AM

Details

Reviewers
markj
pjd
Group Reviewers
manpages
Summary

Existing handshake, version 0, exchanges a token twice to pass input and output separately with different TCP connections.

New handshake, version 1, uses a single connection for both input and output.
This reduces the number of TCP connections between client and server by a half and also simplifies server side code( after version 0 is removed.)

New client will try newer protocol version first and if fails, it falls back to the old protocol, by default. -0 or -1 may be used to explicitly specify the protocol version to use for a connection.

The server side will accept both versions.

This idea was mentioned in https://reviews.freebsd.org/D26168 .
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=132798 was the original bug report I filed and proposed a fix. Back then, same change was suggested but without compatibilities.

Test Plan

Try all combination of new and old protocol with new implemtaton and with older version, i.e. 12.x-RELEASE.

server% ggated
client% ggatec <host> <path>
client% ggatec -0 <host> <path>
client% ggatec -1 <host> <path>

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 34235
Build 31380: arc lint + arc unit

Event Timeline

ota_j.email.ne.jp retitled this revision from Simplify Gate Handshake. to [ggated/ggatec] Simplify Gate Handshake..Sep 24 2020, 3:48 AM
ota_j.email.ne.jp edited the summary of this revision. (Show Details)
ota_j.email.ne.jp added a reviewer: markj.

Could you explain why using a single TCP connection improves performance? I find it easy to believe that it does, but I'm fairly ignorant of TCP internals and would like to understand the reason for it.

The slowness issue existed back in a decade ago around 6.1-RELEASE and some years/releases thereafter.

That's when I posted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=132798 and according to my note, the transmission rate was 1 packet per sec. I recall that happened to a specific pair of machines. Some pairs of machines or local connections worked okay. I don't see these issues any longer in the past couple of years. I suspect it was more like driver issues. The patch on the bugzilla allowed all of combinations of machines to work at maximum capacities. The original patch wasn't backward compatible.

Given TCP problem seem to be gone now, the change is to simplify connection negotiation and reduce open TCP ports. The existing implementation opens 2 TCP connections per client. So, more TCP ports are used. The new code only needs to one. The current protocol will run out of TCP ports sooner than the new one.

Ggated maintains a list of connection in order to match read TCP and write TCP connections for a single session and pair up during hand shake. New protocol won't need data structure on ggated side. I don't know if there is a chance to remove the old protocol, though.

I don't know if there is a chance to remove the old protocol, though.

I'm not sure. What do you use ggate for? I don't think it gets a lot of use in general.

sbin/ggate/ggatec/ggatec.c
78

This is rather confusing. 2 apparently means that versions 0 and 1 are acceptable. It might make more sense to have a bitmask of permitted versions.

293–295

Extra whitespace after "?".

294

This can just be %u.

406

Extra whitespace between the parentheses.

419

Can just be if (g_gatec_connect_v1()) { ...

423

The style is wrong here, see style(9).

559

This looks wrong. Shouldn't it be protocol_version = ch - '0'?

sbin/ggate/ggated/ggated.c
851

"handshake() returns true..."

930–931

How can we have gv_version != GGATE_VERSION0 here? If it is equal to GGATE_VERSION1 then conn == NULL.

987–988

Isn't this statement unreachable?

sbin/ggate/shared/ggate.h
50

I would leave "ggatec" uncapitalized, ditto below.

Address some of problems pointed by code review.

I make multiple read-only connection for installworld/installkernel from different machines.
Otherwise, I use ggate to export some ufs file systems.
I could use nfs for these as well.

Few cases, I had to export devises to run installworld/kernel for upgrading from RELEASE-12.0 to RELEASE-12.1 due to ifunc as installkernel/installworld failed due to difference in compile/link flags. That was an convenient use case.

Another rare case is I had diskless boot and wanted to nfs export from the diskless boot for local dis file sharing. NFS exports from diskless boot seem to have problems and I had use ggated to export.

I will address other comments in following updates.

Use bitmask to indicate which protocol versions to try.
Also remove duplicate check.

ota_j.email.ne.jp added inline comments.
sbin/ggate/ggated/ggated.c
987–988

Yes, that's the case because of another switch statement returning earlier above.

What's the best practice in FreeBSD?

Remove extra spaces around a function argument.

bcr added a subscriber: bcr.

OK from manpages.

Unidirectional data transfer is definitely the worst case for TCP and a single bidirectional socket should work better than two unidirectional sockets whilst using less resources. That said, have you done any performance comparisons between the protocol versions with recent FreeBSD?

I have been testing with this code for about a week. Other than the problem I identified in g_gatec_connect(), I haven't had any issues. I haven't specifically tried comparing protocols 0 and 1 but protocol 1 should at worst perform the same as protocol 0 and used only half the TCP connections, making it an overall win.

sbin/ggate/ggatec/ggatec.c
418

The 3 try_version assignments in g_gatec_connect() break the automatic link recovery: If ggatec detects a synchronisation error with, or loses connection to, its peer ggated process then it will close the existing socket(s) and call g_gatec_connect() to establish a new connection. The two "try_version = 1 << GGATE_VERSION_n" assignments mean that it will never try a different protocol - which could be necessary if ggated restarted. Worse, if both connection attempts fail then try_version is set to 0 and ggatec can never recover.

try_version is never read outside this function so the assignments serve no purpose.