What is the benefit of moving it? Also, at the new location, you perform the checks for every tcp_output() call, although it is only needed when dealing with the connect() call.
Furthermore, if you remove it from tcp_connect(), you also need to remove it from tcp6_connect().

In the code being removed from sys/netinet/tcp_usrreq.c, this comment said:

XXX: This should move to tcp_output().

Which is what I am doing here.

I also did the same for tcp6_output() with the updated diff.

So in short, the benefit is that we do not duplicate code in tcp_output() and tcp6_output().

Looking through the history of that comment (to move this to tcp_output), it seems that the context was something different. In rS166403 that comment was added, when the advertised receive window scale was simply adjusted up to have a minimum granularity of just above 1 MSS. But only part of that code was kept subsequently... With other words, that comment to have this section moved appeared under a different context, and seems to be a left-over without merit not (thus remove that line from the comment instead, perhaps)?

I don't think that continously re-evaluating this in the tcp_output fastpath (with the associated L1 cpu cache churn) brings any benefit - the should not change over the lifetime of a tcp session (since the window scale factor is announced only once). However, checking the variables will load the cachelines over and over again - possibly slowing down things slightly with no gain.

If removing duplicate code between tcp_connect and tcp6_connect is the goal, there is much more duplicated there - while virtually indentical after the first few IP version specific lines in these routines... But having code that really only has to be evaluated once, only gone through once, sounds better to me....

Makes sense.