Page MenuHomeFreeBSD

ifconfig(8) list scan: display non-Latin AP names and account for CJK character width
Needs ReviewPublic

Authored by danfe on Nov 5 2021, 12:54 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Oct 30, 12:20 PM
Unknown Object (File)
Sun, Oct 20, 6:42 PM
Unknown Object (File)
Sun, Oct 20, 1:32 PM
Unknown Object (File)
Oct 5 2024, 8:27 AM
Unknown Object (File)
Sep 30 2024, 2:14 PM
Unknown Object (File)
Sep 25 2024, 11:17 PM
Unknown Object (File)
Sep 18 2024, 11:26 AM
Unknown Object (File)
Sep 1 2024, 7:27 AM

Details

Reviewers
emaste
Group Reviewers
wireless
Summary

People living in counties which use non-Latin script to write their language often use it to name their WiFi network access points (APs). They work just fine when used in the /etc/wpa_supplicant.conf, but would be reported as hex strings by ifconfig wlan0 list scan, because it considers any character outside 0x20..0x7e range as non-printable.

This probably made sense when 8-bit encondings were common standard, but becomes a nuisance in UTF-8 world. Hence I propose we try to handle the SSID as multibyte string (subject to user's locale) and fallback to hex-dumping when we encounter a character that cannot be recognized, converted, or printed.

I took a step further and also account for the fact that CJK characters are double-width, so mixed CJK/Russian/Latin access points data are now printed correctly aligned in the scan results.

Test Plan

ifconfig wlan0 list scan
env LANG=C ifconfig wlan0 list scan (this will get the original behavior)

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

danfe requested review of this revision.Nov 5 2021, 12:54 PM
emaste requested changes to this revision.Nov 5 2021, 1:56 PM
emaste added a subscriber: emaste.

This seems like a very useful addition, but I am concerned about invalid/illegal sequences and note that errors from mbtowc() are not handled.

sbin/ifconfig/ifieee80211.c
3742

error handling from mbtowc?

This revision now requires changes to proceed.Nov 5 2021, 1:56 PM
sbin/ifconfig/ifieee80211.c
3459–3460

Ideally we'd perform some validity checking here, not just < ' '.

@emaste wrote:

I am concerned about invalid/illegal sequences and note that errors from mbtowc() are not handled.

Right, esp. that it can return -1, I have to fix this. I guess we can judge about character validity based on its return value.

In theory you have to check for the Extended Capabilities element (127, IEEE80211_ELEMID_EXTCAP) to be present in the same frame that contains the SSID and bit 48 (SSID is UTF-8 encoded) being set.
If that is not there or it is not set the encoding is undefined; I think there was an exception if you knew from previous frames which had the extcap; I'd have to go and look it up.

So the proper solution is slightly more complex and I'd suggest adding that check before doing an UTF-8 decode.

@bz wrote:

In theory you have to check for the Extended Capabilities element (127, IEEE80211_ELEMID_EXTCAP) to be present in the same frame that contains the SSID and bit 48 (SSID is UTF-8 encoded) being set.

So the proper solution is slightly more complex and I'd suggest adding that check before doing an UTF-8 decode.

Very good points, yes we should do that.

@bz wrote:

In theory you have to check for the Extended Capabilities element (127, IEEE80211_ELEMID_EXTCAP) to be present in the same frame that contains the SSID and bit 48 (SSID is UTF-8 encoded) being set.

Hi Bjoern, sorry for not looking into this sooner rather than later, but yesterday I finally did. :-) Apparently we currently don't make any use (and thus keep track) of extended capabilities, per what I see in net80211/ieee80211_input.c. So we first have to add missing bits to wlan.ko. I've modified the code like this:

diff --git a/sys/net80211/ieee80211_input.c b/sys/net80211/ieee80211_input.c
index 66a5ba1c4035..04e03acef652 100644
--- a/sys/net80211/ieee80211_input.c
+++ b/sys/net80211/ieee80211_input.c
@@ -560,6 +560,8 @@ ieee80211_parse_beacon(struct ieee80211_node *ni, struct mbuf *m,
 	scan->ies = frm;
 	scan->ies_len = efrm - frm;
 
+	int utf8_ssid = 0;
+
 	while (efrm - frm > 1) {
 		IEEE80211_VERIFY_LENGTH(efrm - frm, frm[1] + 2,
 		    return (scan->status = IEEE80211_BPARSE_BADIELEN));
@@ -642,8 +644,11 @@ ieee80211_parse_beacon(struct ieee80211_node *ni, struct mbuf *m,
 			scan->meshconf = frm;
 			break;
 #endif
-		/* Extended capabilities; nothing handles it for now */
+		/* Extended capabilities; we only check bit 48 (UTF-8 SSID) */
 		case IEEE80211_ELEMID_EXTCAP:
+			utf8_ssid = 2;
+			if (frm[1] >= 7 && frm[8] & 0x01)
+				utf8_ssid = 3;
 			break;
 		case IEEE80211_ELEMID_VENDOR:
 			if (iswpaoui(frm))
@@ -733,6 +738,8 @@ ieee80211_parse_beacon(struct ieee80211_node *ni, struct mbuf *m,
 		 */
 		IEEE80211_VERIFY_LENGTH(scan->country[1], 3 * sizeof(uint8_t),
 		    scan->country = NULL);
+		if (utf8_ssid > 0)
+			scan->country[utf8_ssid] |= 0x20;
 	}
 	if (scan->csa != NULL) {
 		/*

So if an EXTCAP element is present in the frame, I'd lowercase the first country code letter, and if bit 48 (first bit of the 7th octet) is set, I'd lowercase the second country code letter. This way I can easily see the results in ifconfig -v wlan0 list scan without modifying anything else. To my surprise, while most frames did contain extended capabilities, none of them indicated that SSID is UTF-8 encoded (while it really was) and one AP with non-Latin UTF-8 SSID did not send any EXPCAP element. Am I peeking at the wrong place? Or is it really the case that many APs simply do not set that bit even if the SSID is indeed encoded in UTF-8, per this comment?

danfe edited the test plan for this revision. (Show Details)

Since there apparently is no reliable way to judge about SSID string encoding (not too many appliances set the corresponding Extended Capabilities bit) and based on the earlier feedback I'm proposing the second, more robust version of the patch. It now does not assume any particular encoding of the SSID and lets mbtowc() and wcwidth() do their job. We just have to check their return values and fallback to hex-dumping as soon as we encounter a character that cannot be recognized, converted, or printed, depending on the user's locale. This correctly handles 8-bit locales (LANG=C) and also allows to get rid of the if (*p < ' ' || *p > 0x7e) check completely.

I've dropped the truncating the printable SSID with ellipsis since it's generally unsafe to ASCII-patch multibyte string like this; doing it properly would needlessly complicate the logic. I think it's enough to do that for hex-dumping. Corresponding comment had been adjusted.

danfe edited the test plan for this revision. (Show Details)
melifaro added inline comments.
sbin/ifconfig/ifieee80211.c
3464

I'd try to split this into multiple functions to improve readability, for example:

bool is_ssid_printable(essid, essid_len, &printable_len)
&
copy_essid_hex()

It'd be nice to name variables in a bit more human-redable fashion