Added ssid values to render UTF-8 encoded characters in ifconfig(8)
ClosedPublic

Authored by farhan_farhan.codes on Jun 20 2018, 5:22 AM.
Tags
None
Referenced Files
F99473340: D15922.id44571.diff
Wed, Oct 9, 11:47 PM
Details

Summary

Currently ifconfig(8) only prints the hex representation of ssid names
with non-ASCII characters. Many modern terminals are able to properly render
non-ASCII characters. This change checks if the terminal charmap is UTF-8, and
if so, will render the characters, rather than the hex value.

The method to verify the charmap is the same as "/usr/bin/locale charmap".
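A minimal sketch of that check, assuming the usual POSIX route of setlocale(3) followed by nl_langinfo(CODESET), which is what "locale charmap" reports. The helper names codeset_is_utf8 and terminal_is_utf8 are hypothetical and are not taken from the diff.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when a codeset name denotes UTF-8. */
bool
codeset_is_utf8(const char *codeset)
{
	return (codeset != NULL && strcmp(codeset, "UTF-8") == 0);
}

/*
 * Hypothetical helper: adopt the character-encoding settings from the
 * environment, then ask libc which codeset is in effect.
 */
bool
terminal_is_utf8(void)
{
	/* CODESET is governed by the LC_CTYPE category. */
	if (setlocale(LC_CTYPE, "") == NULL)
		return (false);
	return (codeset_is_utf8(nl_langinfo(CODESET)));
}
```

With the environment forced to the C locale, nl_langinfo(CODESET) reports something other than UTF-8, so a caller of terminal_is_utf8() would fall back to the hex representation.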

Test Plan

Set the ssid with non-ASCII characters. With the charmap set to UTF-8
verify that the ssid value will render properly.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

farhan_farhan.codes set the repository for this revision to rS FreeBSD src repository - subversion.

Minor correction to the diff to have quotes around the ssid name when it includes non-ASCII characters

I'm not sure if I object to this change or not, but it's worth noting that SSIDs are not necessarily UTF-8 strings. Unless the SSIDEncoding is set it is 0-32 octets. Having 0 bytes in the middle of the SSID is valid (though I'd be very surprised if that actually worked on more than a handful of devices). If SSIDEncoding is set it is indeed a UTF-8 string.
For additional fun Microsoft got this wrong and several Windows versions interpret the SSID as being Latin1 encoded.
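To illustrate the concern, here is a rough sketch of how a caller could test whether an SSID's octets happen to form well-formed, printable UTF-8 before rendering them. This is not part of the diff; ssid_is_printable_utf8 is a hypothetical helper, and the SSID is treated as a length-delimited buffer because embedded zero bytes are valid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when the len octets at ssid are well-formed
 * UTF-8 and contain no control characters (including NUL).
 */
bool
ssid_is_printable_utf8(const uint8_t *ssid, size_t len)
{
	size_t i = 0, k, need;
	uint32_t cp, min;

	while (i < len) {
		uint8_t b = ssid[i];

		if (b < 0x80) {
			cp = b; need = 0; min = 0;
		} else if ((b & 0xe0) == 0xc0) {
			cp = b & 0x1f; need = 1; min = 0x80;
		} else if ((b & 0xf0) == 0xe0) {
			cp = b & 0x0f; need = 2; min = 0x800;
		} else if ((b & 0xf8) == 0xf0) {
			cp = b & 0x07; need = 3; min = 0x10000;
		} else {
			return (false);	/* stray continuation or invalid lead */
		}

		if (need > len - i - 1)
			return (false);	/* sequence truncated */
		for (k = 1; k <= need; k++) {
			if ((ssid[i + k] & 0xc0) != 0x80)
				return (false);
			cp = (cp << 6) | (ssid[i + k] & 0x3f);
		}

		/* Reject overlong forms, surrogates and out-of-range values. */
		if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
			return (false);
		/* Reject C0 controls (including NUL) and DEL. */
		if (cp < 0x20 || cp == 0x7f)
			return (false);
		i += need + 1;
	}
	return (true);
}
```

A Latin1-encoded name such as the four octets 63 61 66 e9 fails this check, so it would still be shown as hex rather than as mangled text.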

In D15922#336873, @kristof wrote:

I'm not sure if I object to this change or not, but it's worth noting that SSIDs are not necessarily UTF-8 strings. Unless the SSIDEncoding is set it is 0-32 octets. Having 0 bytes in the middle of the SSID is valid (though I'd be very surprised if that actually worked on more than a handful of devices). If SSIDEncoding is set it is indeed a UTF-8 string.
For additional fun Microsoft got this wrong and several Windows versions interpret the SSID as being Latin1 encoded.

I discussed this with Farhan on IRC for a bit. I like the idea of actually respecting SSIDEncoding (the SSID EXTCAP, from what I can tell, by the time it hits us). I know that hostap sets it given proper configuration. Do you have any insight as to how many consumer APs actually set it or expose it properly? I went to plumb this out and save it to the vap, only to discover my AP doesn't actually offer any way to set it.

I guess what I'm getting at is: if it's not commonly set or exposed, I think we should go ahead and do this. If we get it wrong, there's always the temporary workaround of running env LC_CTYPE=C ifconfig, and exposing the SSID Encoding bit doesn't appear to be a monumental task.

That's a very good question. I've looked at what the APs around me transmit and they don't include it. I've also asked a friend who works in the field, and he also thinks it's not commonly used. I've also found at least some other people on the internet claiming that some APs transmit what they intend to be utf-8 SSIDs without actually setting SSIDEncoding.
Given that, I think you're right: let's include this as-is, and if it turns out to break things (it probably won't), we've got workarounds and a path to a better implementation.

sbin/ifconfig/ifieee80211.c:5394

Please do flip this around so that either utf8 actually reflects that the current codeset is UTF-8 and the rest of the logic later is inverted, or rename this to not_utf8 or something to that effect -- though I would kind of prefer the former.
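The preferred form could look roughly like the sketch below, where utf8 is true exactly when the codeset is UTF-8 and the printing logic branches on the positive case. format_ssid is a hypothetical helper written for illustration, not the code under review, and the choice to always quote the raw form is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter: writes the SSID into out either verbatim (quoted)
 * or as a hex string.  utf8 is true when the terminal codeset is UTF-8.
 * Returns the number of characters written, or -1 if out is too small.
 */
int
format_ssid(char *out, size_t outlen, const uint8_t *ssid, size_t len,
    bool utf8)
{
	bool raw = true;
	size_t i, n;

	for (i = 0; i < len; i++) {
		bool ascii_printable = ssid[i] >= 0x20 && ssid[i] < 0x7f;
		bool high = ssid[i] >= 0x80;

		/* High octets are only passed through on a UTF-8 terminal. */
		if (!ascii_printable && !(utf8 && high)) {
			raw = false;
			break;
		}
	}

	if (raw) {
		if (outlen < len + 3)	/* two quotes plus the terminator */
			return (-1);
		return (snprintf(out, outlen, "\"%.*s\"", (int)len,
		    (const char *)ssid));
	}

	/* Hex fallback: "0x" followed by two digits per octet. */
	if (outlen < 2 * len + 3)
		return (-1);
	n = (size_t)snprintf(out, outlen, "0x");
	for (i = 0; i < len; i++)
		n += (size_t)snprintf(out + n, outlen - n, "%02x", ssid[i]);
	return ((int)n);
}
```

Because the flag carries the positive meaning, the call site reads as "if the terminal is UTF-8, pass the bytes through", with no inverted condition to keep in mind.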

Few thoughts:

  1. I understand that the extended capabilities field of beacon frames specifies whether the SSID is UTF-8 encoded. However, if a user manually adds a UTF-8 encoded SSID, it would not rely on the SSIDEncoding bit, and would default to "not UTF-8" despite being UTF-8.
  2. I do not understand net80211(4) well enough, but as discussed on IRC, sys/net80211/ieee80211_input.c line 615 parses the extended capabilities field. Without understanding ieee80211_parse_beacon(), I need to identify where beacon frame headers are stored in memory, and then add or update the extended capabilities field.

As discussed, let me submit a quick revision to this that changes LC_ALL to LC_CTYPE.

Changed the utf8 variable so that true means "the terminal is UTF-8 capable"
Changed LC_ALL to LC_CTYPE
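The thread does not spell out the reason for narrowing the category, so the following is an assumption: only LC_CTYPE affects the codeset, and leaving the other categories alone keeps things like number formatting in the default C locale. A small sketch (numeric_locale_after_ctype_only is a hypothetical name) showing that setting LC_CTYPE alone does not touch LC_NUMERIC:

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: adopt only the LC_CTYPE category from the
 * environment, then return the name of the LC_NUMERIC locale, which the
 * call above leaves untouched.
 */
const char *
numeric_locale_after_ctype_only(void)
{
	(void)setlocale(LC_CTYPE, "");
	/* A NULL locale argument queries the category without changing it. */
	return (setlocale(LC_NUMERIC, NULL));
}
```

In a program that has made no other setlocale(3) calls, the returned name is "C" regardless of what LANG or LC_ALL request.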

farhan_farhan.codes marked an inline comment as done.

Previous commit set utf8 when false, not true. Resolved.

This revision was not accepted when it landed; it landed in state Needs Review. Jun 28 2018, 3:37 AM
This revision was automatically updated to reflect the committed changes.