Page MenuHomeFreeBSD

Bug 224552 - 'od -c' show wrong char when it is a non-printable
ClosedPublic

Authored by yuripv_icloud.com on Jan 18 2018, 1:28 AM.
Tags
None
Referenced Files
F106139935: D13963.diff
Thu, Dec 26, 1:36 AM
Unknown Object (File)
Thu, Nov 28, 7:47 PM
Unknown Object (File)
Wed, Nov 27, 12:05 PM
Unknown Object (File)
Nov 24 2024, 12:45 PM
Unknown Object (File)
Nov 21 2024, 11:02 PM
Unknown Object (File)
Nov 18 2024, 5:38 PM
Unknown Object (File)
Nov 18 2024, 1:30 AM
Unknown Object (File)
Nov 17 2024, 10:33 PM
Subscribers

Details

Summary

od(1): Fix wrong output for some corner cases in multibyte locales.

Restore the original character to print if we used the look-ahead
buffer, but that didn't help -- we either got an illegal sequence
or still can't complete.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

Hi,

Thanks! Can you please also either:

a.) Provide a sample input, preferably a small/simple input, that breaks it, or
b.) Add a test case for it in tests/

I would appreciate it- either one is fine by me, just as long as we can get a test case written by one of us to demonstrate. =)

I thought about adding a test case and will add one, there are some questions though:

  • it should be new od_test.sh, right? not adding the od test cases to hexdump_sh.test where they don't belong.
  • is it ok to do 'LC_CTYPE=en_US.UTF-8 od' in the script? issue is only visible in multibyte locale (any) and I guess en_US.UTF-8 should be most common.

I thought about adding a test case and will add one, there are some questions though:

Excellent, thanks!

  • it should be new od_test.sh, right? not adding the od test cases to hexdump_sh.test where they don't belong.

Indeed. We should probably also add some other od test cases since it's not really covered in hexdump_test, but that can be a "later" problem unless you're willing and wanting to add some. We basically just need some simple tests to make sure od is still behaving sanely for each of the arguments, then we can expand that as other bugs pop up. We also lack multibyte coverage on the hexdump side, so that's other room for improvement.

The other stuff can definitely wait, though- they're completely irrelevant to this problem in front of us. =)

  • is it ok to do 'LC_CTYPE=en_US.UTF-8 od' in the script? issue is only visible in multibyte locale (any) and I guess en_US.UTF-8 should be most common.

Sure- NetBSD does this kind of thing in the cut(1) test [1]. I wanted to say it would be a good idea to do something like: locale -a | grep -q 'en_US.UTF-8' so you can fail gracefully (atf_skip) if the locale isn't on the system, but we already have some places where we apparently didn't care so I guess we didn't deem it likely to matter.

[1] https://svnweb.freebsd.org/base/head/contrib/netbsd-tests/usr.bin/cut/t_cut.sh?view=markup#l113

quick check (before/after the fix):

loki:yuri:~/ws/pr224552/usr.bin/hexdump/tests$ atf-sh od_test.sh c_flag
od_test.sh: WARNING: Running test cases outside of kyua(1) is unsupported
od_test.sh: WARNING: No isolation nor timeout control is being applied; you may get unexpected failures; see atf-test-case(4)
Executing command [ env LC_CTYPE=en_US.UTF-8 od -c /home/yuri/ws/pr224552/usr.bin/hexdump/tests/./d_od_a.in ]
Fail: stdout does not match golden output
--- /home/yuri/ws/pr224552/usr.bin/hexdump/tests/./d_od_cflag_a.out     2018-01-18 06:06:35.567676000 +0300
+++ /tmp/check.C5luBh/stdout    2018-01-18 06:29:17.777017000 +0300
@@ -1,3 +1,3 @@
-0000000    T   e   s   t   T   e   s   t   T   e   s   t   T   e   s 345
-0000020    T   e   s   t 345
+0000000    T   e   s   t   T   e   s   t   T   e   s   t   T   e   s 124
+0000020    T   e   s   t 360
 0000025
failed: atf-check failed; see the output of the test for details
loki:yuri:~/ws/pr224552/usr.bin/hexdump/tests$ PATH=/usr/obj/home/yuri/ws/pr224552/amd64.amd64/usr.bin/hexdump/:$PATH atf-sh od_test.sh c_flag
od_test.sh: WARNING: Running test cases outside of kyua(1) is unsupported
od_test.sh: WARNING: No isolation nor timeout control is being applied; you may get unexpected failures; see atf-test-case(4)
Executing command [ env LC_CTYPE=en_US.UTF-8 od -c /home/yuri/ws/pr224552/usr.bin/hexdump/tests/./d_od_a.in ]
Executing command [ env LC_CTYPE=en_US.UTF-8 od -c /home/yuri/ws/pr224552/usr.bin/hexdump/tests/./d_od_b.in ]
passed
usr.bin/hexdump/tests/od_test.sh
36 ↗(On Diff #38125)

This test actually fails for me; this is what I get running it through manually (note that my locale is set to en_US.UTF-8 by default):

0000000    T   e   s   t   T   e   s   t   T   e   s   t   T   e   s   å
0000020   **   T   e   s   t   å  **                                    
0000027

Here's some debug output scattered in:

0000000  trying clen=1
  T trying clen=1
  e trying clen=1
  s trying clen=1
  t trying clen=1
  T trying clen=1
  e trying clen=1
  s trying clen=1
  t trying clen=1
  T trying clen=1
  e trying clen=1
  s trying clen=1
  t trying clen=1
  T trying clen=1
  e trying clen=1
  s trying clen=18446744073709551614
Peeking
trying clen=1
  å
0000020   ** trying clen=1
  T trying clen=1
  e trying clen=1
  s trying clen=1
  t trying clen=2
  å  **                                    
0000027

So for the first character, it hits the preceding 's', gets back -2 from mbrtowc, and attempts a peek. The following mbrtowc then returns a 1,

usr.bin/hexdump/tests/od_test.sh
36 ↗(On Diff #38125)

That's weird, but most likely git/phabricator/something else changed the file contents, they should be the output of:

printf "TestTestTestTes\345Test\345" > d_od_a.in
usr.bin/hexdump/tests/od_test.sh
36 ↗(On Diff #38125)

Right, that's the ticket. =P

Can you go ahead and remove these files, and instead have them generated at test runtime? At that point, you can just:

printf "TestTestTestTes\345Test\345" > foo.in
...
od -c foo.in

Also, you should update the copyright notice at the top of this file. I lay no claim to this test set you've created.

yuripv_icloud.com added inline comments.
usr.bin/hexdump/tests/od_test.sh
36 ↗(On Diff #38125)

Done. Hopefully using "FreeBSD Project" in the copyright line is OK.

remove deleted .in files from Makefile

AFAIK, there's no technical reason it can't be assigned to "The FreeBSD Project" -- it's generally the author's (your) name, but I would assume that that's not strictly necessary if you'd rather not.

I think this looks good, minus the $(atf_get_srcdir) occurrences in the two atf_checks, but I can fix this prior to commit; no need for another round of review based on that. I'll take one last look-over in the morning (~9 hours from this post) and commit it unless you have further adjustments.

Hi,

Sorry, this got stalled while I poked others about the copyright assignment. I'm told that we don't currently have a process for assigning copyright to a general entity like this, but that one is being developed. The way I interpreted it being explained to me, assigning copyright to "The FreeBSD Project" isn't valid because this isn't a legal entity (IIRC); we only get away with it in /COPYRIGHT because there it represents all of the individual contributors to the project, and this is a special case because it's not over a specific work but the project in general.

The only step forward at this point is if we commit the file with something along the lines of Copyright (c) 2018 Yuri Pankov, since "anonymous" (in a way) contributions are not currently accepted until we have some process for assignment of copyright to the FreeBSD Foundation (IIRC this is how the process is going to work). Is this OK?

Ditto these comments for the other review you've added me to, as well.

Thanks for explanation, copyright updated; removed atf_get_srcdir you mentioned as well.

This revision is now accepted and ready to land.Jan 20 2018, 2:47 AM
This revision was automatically updated to reflect the committed changes.