⚙ D31603 [libc][NFC] Add restrict, pure, and malloc_like qualifiers to string functions

gfunni234_gmail.com requested review of this revision.Aug 18 2021, 6:07 PM

gfunni234_gmail.com created this revision.

Added the usual suspects to review this + markj because it touches the san code.

Before this could be accepted I'd at least expect a full ports exp-run, and quite a bit of testing on the resulting packages, as changing the behavior of memcpy() as above will very likely lead to some failures. Strictly speaking you should never call memcpy() with overlapping arguments, but our traditional implementation has always allowed it. So how much fallout would this give? I know that when glibc changed something in this area they got complaints about programs breaking, if those should not have done the overlapping memcpy's.

That said, if there is a real performance gain that can be proven, then maybe it is worth it? Did anybody do some testing to see if there is any gain, in exchange for adding a footgun?

lib/libc/string/bcopy.c
83 ↗	(On Diff #93896)	I would prefer to split the cosmetic changes to this function from the functional ones. E.g, have one commit that adds the __restrict and only adds this `#ifndef`, then later the all the switching around of dst and src, and all the other cosmetics.
lib/libc/string/mempcpy.c
40	I'm not sure what is gained here except more characters? The end cast to `void ` is not needed, it is implicit because of the return type. Changing the inner cast from `char ` to `unsigned char *` also does not have any effect, since the pointers are never dereferenced. The original seems clearer to me, and less obfuscated.
sys/sys/libkern.h
171	I take it that these are the kernel side versions, which don't attempt to load locales and that sort of stuff? Otherwise they're definitely not pure. :)

This is hard to review for several reasons:

There are lots of unrelated changes going on here (adding restrict/pure/etc is one set of changes, assuming forwards copy for memcpy, randomly shuffling things around, changing types of things with no explanation). It'd be much easier to review each individual change on its own, and I would like to see the changes committed individually too so there is the opportunity to have proper explanation for every aspect you're changing, plus it's then bisectable, and if one commit breaks something just that individual commit can be reverted.
Your diff has been uploaded with the default amount of context (hence the "Context not available") so it's difficult to see the surrounding code to verify that it does the right thing after these changes. Please upload either with git diff -U99999 (number of 9s arbitrary just big enough to always be the whole file) or use arc diff which will grab the whole file automatically.

We also need to think very carefully as a project about whether we want memcpy to no longer be an alias for memmove, as that does have implications for compatibility with existing code, even if it shouldn't be relied upon. I don't know if our assembly implementations already break that though, I'd have to check.

lib/libc/string/bcopy.c
43 ↗	(On Diff #93896)	Why? Shouldn't make a difference for FreeBSD as, other than signedness, they should be the same. For CheriBSD we need to use a different type, intcap_t (which is what intptr_t is for pure-capability code so the existing code _almost_ works, except for hybrid code intptr_t remains a 32/64-bit integer, not a capability like intcap_t).
71 ↗	(On Diff #93896)	__predict_false?
76 ↗	(On Diff #93896)	This should be entirely equivalent to what was there before, and any modern compiler should be able to optimise that.
79 ↗	(On Diff #93896)	Moving these violates style(9)
79 ↗	(On Diff #93896)	Why use u_char all of a sudden? char works just fine.
81 ↗	(On Diff #93896)	Using size_t here is actually good for CHERI, we changed it downstream as otherwise you have `uintptr_t \| uintptr_t` later, the provenance for which is ambiguous, but we don't actually need a pointer, just the low bits of it.
90 ↗	(On Diff #93896)	Why the blank line?
110 ↗	(On Diff #93896)	Why move dst and src around?
122 ↗	(On Diff #93896)	Why move?
124 ↗	(On Diff #93896)	Why the blank line?
126 ↗	(On Diff #93896)	< has become <=, I assume because you copied it from the forward case, and I think this now behaves incorrectly for an unaligned-but-by-the-same-amount (e.g. 0x1001 to 0x2001) word-sized memcpy. It works for the forward case because `t = wsize - (t & wmask)` gives you `wsize` when `t == length`, but in the backwards case `t &= wmask` gives you 0 not `length` (which, due to the use of `TLOOP1`, ends up doing SIZE_MAX+1 iterations rather than 0).
137 ↗	(On Diff #93896)	Why swap the order?
144 ↗	(On Diff #93896)	There's nothing else in the file, seems a bit unnecessary
lib/libc/string/mempcpy.c
40	Why? unsigned isn't needed, and implicit casts to `void *` are legal C.
lib/libc/string/strtok.c
60	You've lost the return NULL for if s is still NULL
sys/sys/libkern.h
179	`__restrict` needs a space before it, but touching libkern.h files without the associated changes to the implementation is wrong.

In D31603#712550, @dim wrote:

I know that when glibc changed something in this area they got complaints about programs breaking, if those should not have done the overlapping memcpy's.

See e.g.: https://lwn.net/Articles/414467/ , and https://bugzilla.redhat.com/show_bug.cgi?id=638477

gfunni234_gmail.com marked an inline comment as done.Aug 20 2021, 1:25 AM

gfunni234_gmail.com added inline comments.

lib/libc/string/bcopy.c
110 ↗	(On Diff #93896)	It reads better since src comes after dst

Added context, fixed behavior

lib/libc/string/bcopy.c
71 ↗	(On Diff #93896)	yes

gfunni234_gmail.com marked 3 inline comments as done.Aug 20 2021, 1:38 AM

gfunni234_gmail.com added inline comments.

lib/libc/string/bcopy.c
76 ↗	(On Diff #93896)	yeah it just looks cleaner
lib/libc/string/strtok.c
60	I mean it is undefined to have s and last be null according to the c standard

gfunni234_gmail.com updated this revision to Diff 93954.Aug 20 2021, 1:44 AM

gfunni234_gmail.com marked 2 inline comments as done.

Many of my per-line comments remain unaddressed, as do all of my general comments and dim's.

lib/libc/string/bcopy.c
71 ↗	(On Diff #93896)	I mean "use the __predict_false macro rather than writing your own".
76 ↗	(On Diff #93896)	That's not a good reason to change something. Both are equivalent semantically, both get optimised to the same thing and both are easy to understand. But whether or not it's cleaner is subjective; personally I prefer what's there because it's shorter and reuses code.
79 ↗	(On Diff #93896)	Neither of these have been addressed.
110 ↗	(On Diff #93896)	Subjective. Also it was better when they were both on the same line, rather than one hiding at the end of the previous long line. Don't change code because you can.
lib/libc/string/strtok.c
60	Not true; `last` is entirely unspecified and up to the implementation. In particular, when our strtok_r returns the final token, it sets last to NULL, but the application needs to call strtok_r one more time to see the NULL return value. This means the final call to strtok_r for an application iterating over the tokens is going to dereference a null pointer. See https://godbolt.org/z/9qEqb34z4 for an example proving the old code works but your new code renders the standard use broken. Also, strtok_r is POSIX not C.

gfunni234_gmail.com added inline comments.Aug 21 2021, 1:34 PM

lib/libc/string/bcopy.c
110 ↗	(On Diff #93896)	Don't think of it in terms of changes, but think of the code how it IS. Especially when the other code has dst FIRST and then src. Consistency is better even in small cases like this

gfunni234_gmail.com marked 2 inline comments as done.Aug 21 2021, 1:44 PM

gfunni234_gmail.com added inline comments.

lib/libc/string/bcopy.c
79 ↗	(On Diff #93896)	unsigned char is always a byte per the c standard. Strangely, it is not guaranteed for char, despite it being that way on pretty much every arch. It feels wrong to operate on the char when the function is about raw memory, not string chars.

gfunni234_gmail.com added inline comments.Aug 21 2021, 1:45 PM

lib/libc/string/bcopy.c
76 ↗	(On Diff #93896)	Also reason: tloop1 is not defined when tloop is, so there is no guarantee of it being properly substituted, even though every compiler thus far has been doing so

Resolved more complaints

gfunni234_gmail.com marked an inline comment as done.Aug 21 2021, 3:54 PM

I think changing memcpy's behavior at this late date is a complete non starter. It would be a huge self inflicted wound that will take years to sort out. I'd completely drop that part of this change. It's already too large as it is.

This revision now requires changes to proceed.Aug 21 2021, 4:16 PM

Removed memcpy changes

All fixed!

gfunni234_gmail.com updated this revision to Diff 94005.Aug 21 2021, 5:27 PM

gfunni234_gmail.com updated this revision to Diff 94060.Aug 23 2021, 4:12 PM

gfunni234_gmail.com updated this revision to Diff 94119.Aug 24 2021, 5:49 PM

gfunni234_gmail.com retitled this revision from Add restrict, pure, and malloc_like qualifiers to string functions to [libc][NFC] Add restrict, pure, and malloc_like qualifiers to string functions.

Your description still completely ignores a large chunk of the unrelated changes you're making

lib/libc/string/mempcpy.c
40	Comments still apply; zero reason to change the char to u_char below. Then you don't need to include a new header too.
sys/libkern/strlcat.c
56–62	Old code was clearer
62	style(9) violation

Addressed all issues!

gfunni234_gmail.com marked 2 inline comments as done.Aug 24 2021, 7:14 PM

gfunni234_gmail.com added inline comments.

lib/libc/string/mempcpy.c
40	is char * always one byte? if so then you are right

In D31603#714295, @gfunni234_gmail.com wrote:

Addressed all issues!

Not true, your description is still inaccurate

lib/libc/string/mempcpy.c
40	`sizeof(char) == sizeof(signed char) == sizeof(unsigned char) == 1` by definition
sys/libkern/strncat.c
57	Comma operator? Really?

gfunni234_gmail.com updated this revision to Diff 94138.Aug 24 2021, 7:37 PM

gfunni234_gmail.com edited the summary of this revision. (Show Details)Aug 24 2021, 7:46 PM

gfunni234_gmail.com marked an inline comment as done.Aug 24 2021, 7:50 PM

gfunni234_gmail.com marked an inline comment as done.

Addressed everything else

[libc][NFC] Add restrict, pure, and malloc_like qualifiers to string functions
Needs ReviewPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 94005

include/string.h

lib/libc/string/memccpy.3

lib/libc/string/mempcpy.c

lib/libc/string/strtok.3

lib/libc/string/strtok.c

lib/libc/string/strxfrm.3

sys/libkern/strdup.c

sys/libkern/strlcat.c

sys/libkern/strncat.c

sys/sys/libkern.h

[libc][NFC] Add restrict, pure, and malloc_like qualifiers to string functionsNeeds ReviewPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 94005

include/string.h

lib/libc/string/memccpy.3

lib/libc/string/mempcpy.c

lib/libc/string/strtok.3

lib/libc/string/strtok.c

lib/libc/string/strxfrm.3

sys/libkern/strdup.c

sys/libkern/strlcat.c

sys/libkern/strncat.c

sys/sys/libkern.h

[libc][NFC] Add restrict, pure, and malloc_like qualifiers to string functions
Needs ReviewPublic
Actions

Revision Contents
Changeset List