Page MenuHomeFreeBSD

lib/libc/amd64/string: add baseline implementation of stpcpy.S
ClosedPublic

Authored by fuz on Aug 7 2023, 5:57 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, May 1, 5:44 AM
Unknown Object (File)
Fri, Apr 26, 1:23 AM
Unknown Object (File)
Tue, Apr 23, 2:18 PM
Unknown Object (File)
Tue, Apr 23, 4:32 AM
Unknown Object (File)
Sat, Apr 6, 11:37 PM
Unknown Object (File)
Sat, Apr 6, 10:13 PM
Unknown Object (File)
Sat, Apr 6, 8:56 PM
Unknown Object (File)
Mar 13 2024, 4:09 AM

Details

Summary

This commit adds a baseline implementation of stpcpy(3) for amd64.
It performs quite well in comparison to the previous scalar implementation
as well as agains bionic and glibc (though glibc is faster for very long
strings).

Fiddle with the Makefile to also have strcpy(3) call into the optimised
stpcpy(3) code. Extend the strcpy(3) test case from NetBSD's test suite
to cover longer strings, which was needed to catch some bugs. Also make
it so the test case can be executed on a custom stprcpy() instead of
hardcoding the one shipped in libc.

Document the new kernel in simd(7).

As per previous discussion in D40693, I've left out the Foundation
copyright in the NetBSD test suite bits. @jrm @imp, please let me know
if this is correct. For stpcpy.S, I added the copyright as @mjg
previously indicated that he is fine with this.

Benchmarks available online:

os: FreeBSD
arch: amd64
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
        │ stpcpy_baseline.out │           stpcpy_bionic.out           │           stpcpy_scalar.out           │
        │       sec/op        │    sec/op      vs base                │    sec/op     vs base                 │
Short             77.93µ ± 2%   100.12µ ±  0%  +28.46% (p=0.000 n=20)    80.30µ ± 1%    +3.04% (p=0.000 n=20)
Mid               13.01µ ± 1%    25.29µ ±  1%  +94.46% (p=0.000 n=20)    36.41µ ± 0%  +179.96% (p=0.000 n=20)
Long              3.032µ ± 0%    3.084µ ± 12%        ~ (p=0.289 n=20)   33.099µ ± 0%  +991.70% (p=0.000 n=20)
geomean           14.54µ         19.84µ        +36.45%                   45.91µ       +215.79%

        │ stpcpy_baseline.out │           stpcpy_bionic.out            │          stpcpy_scalar.out           │
        │         B/s         │      B/s        vs base                │     B/s       vs base                │
Short            1.494Gi ± 2%    1.163Gi ±  0%  -22.16% (p=0.000 n=20)   1.450Gi ± 1%   -2.95% (p=0.000 n=20)
Mid              8.951Gi ± 1%    4.603Gi ±  1%  -48.57% (p=0.000 n=20)   3.197Gi ± 0%  -64.28% (p=0.000 n=20)
Long            38.397Gi ± 0%   37.754Gi ± 13%        ~ (p=0.289 n=20)   3.517Gi ± 0%  -90.84% (p=0.000 n=20)
geomean          8.007Gi         5.868Gi        -26.71%                  2.536Gi       -68.33%

os: Linux
arch: x86_64
cpu:
        │ stpcpy_glibc.out │
        │      sec/op      │
Short          93.23µ ± 0%
Mid            16.61µ ± 1%
Long           2.623µ ± 0%
geomean        15.96µ

        │ stpcpy_glibc.out │
        │       B/s        │
Short         1.249Gi ± 0%
Mid           7.008Gi ± 1%
Long          44.38Gi ± 0%
geomean       7.296Gi

Sponsored by: The FreeBSD Foundation

Test Plan

passes test suite extended to check stpcpy more thoroughly.

Diff Detail

Repository
rG FreeBSD src repository
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

fuz requested review of this revision.Aug 7 2023, 5:57 PM

i was fine with strlen.S as the file was mine. stpcpy.S is not, I don't have a single line of code there. You should probably ask @emaste how to handle this. I *suspect* some form of "portions of this file blah blah" would do it.

lib/libc/amd64/string/stpcpy.S: fix misguided literal escape

I had put this in after testing, seems like it is not supported after all.

Yeah you are right. I may have confused this with memcmp(3), another kernel I have to submit. So same thing as with the NetBSD test suite bits. @emaste @jmd, please let me know how to proceed here.

contrib/netbsd-tests/lib/libc/string/t_strcpy.c
5–8 ↗(On Diff #125669)

I'd defer to Foundation members, but I don't know if this should be contributed back to NetBSD with this advertising note.

lib/libc/amd64/string/stpcpy.S
22 ↗(On Diff #125669)

While it is not too late, can we change the user interface for ARCHFUNC()? I want the level which specifies position in the table, to be the explicit argument for the macro. As I understand, right now the level is determined by position of ARCHFUNC in the ARCHFUNCS block.

176 ↗(On Diff #125669)

You are not going to repeat this line on each occurence of the TZCNT instruction, right?
I suggest to add a note about TZCNT/BSF, and perhaps similar note about LZCNT, to archlevel.c instead.

lib/libc/amd64/string/stpcpy.S
22 ↗(On Diff #125669)

I had originally planned to do it that way but didn't find a nice solution. I have some ideas as to how to achieve this by now and will report back if I manage to make it work. As I envision it, this would likely not change the usage of these macros.

There are two ideas:

  1. make ARCHFUNC(foo, level) advance to the right offset from foo_funcs before emitting the address. This would error out if the archlevels are given in wrong order
  2. make ARCHFUNCS(foo) generate a table of .quad entries like .quad .Lfoo_level_offset with .Lfoo_level_offset set to 0 initially. Then ARCHFUNC(foo, level) boils down to .equ .Lfoo_level_offset, foo_level-foo_funcs.

Which one do you prefer?

176 ↗(On Diff #125669)

You requested that I add this comment. Note that this idiom is independent of the arch dispatch framework and may be used in other parts of FreeBSD as well. For example, I briefly considered implementing ffs(3) with tzcnt before settling on using the more portable __builtin_ctz functions in D40730. These functions are not governed by the arch dispatch framework.

Are there other parts of the Intel Software Development Manual you would like me to quote to make the code easier to understand? E.g. I have already noted that flag output of variable shifts is unreliable as the flags remain unmodified for zero shifts, so a seemingly superfluous extra test (or in this case, bsf) instruction is required to find if the result is nonzero. I could also mention that I use test instead of bsf in the hot loop as test fuses with conditional jumps, whereas bsf does not.

contrib/netbsd-tests/lib/libc/string/t_strcpy.c
5–8 ↗(On Diff #125669)

I think it should. Of course, they are free to remove it, since it's public domain, but they likely would just leave it in place since there's a substantial amount added.

lib/libc/amd64/string/stpcpy.S
4 ↗(On Diff #125669)

I think we want to leave this off. This file was previously public domain, and while it's OK legally, the project generally leaves things in the public domain in the public domain to avoid hurt feelings and other complications that it has brought about in the past.

Especially with no license statement, and the original with the explicit public domain. This would be my preferred choice and advice.

However, since it is much closer to a rewrite (it at least doubles the size of the and adds interesting functionality). So, if the foundation really wants to copyright it and license it, please do it like:

/*-
 * Copyright (c) 2023, The FreeBSD Foundation
 *
 * SPDX-License-Expression: BSD-2-Clause
 *
 * Portions of this software were developed by Robert Clausecker
 * <fuz@FreeBSD.org> under sponsorship from the FreeBSD Foundation.
 *
 * Adapted from:
 * Adapted by Guillaume Morin <guillaume@morinfr.org> from strcpy.S
 * written by J.T. Conklin <jtc@acorntoolworks.com>
 *
 * originally dedicated to the public domain
 */

or something similar to keep it clear that you are licensing this under BSD-2-Clause. You mentioned mjg, but he had nothing to do with stpcpy.S: it looks like alan cox brought it in and gnn tweaked it a bit. Did he write bits of what you've added?

lib/libc/amd64/string/stpcpy.S
4 ↗(On Diff #125669)

I'd say if it's a trivial change to a public-domain or otherwise atypically licensed file the FF would be fine just leaving license header changes out altogether. However, as @imp says this is a significant new work and I think @imp's suggestion just above is a good one. Otherwise the new routine probably needs to go in a new file.

lib/libc/amd64/string/stpcpy.S
4 ↗(On Diff #125669)
/*-
 * Copyright (c) 2023, The FreeBSD Foundation
 *
 * SPDX-License-Expression: BSD-2-Clause
 *
 * Portions of this software were developed by Robert Clausecker
 * <fuz@FreeBSD.org> under sponsorship from the FreeBSD Foundation.
 *
 * Adapted from NetBSD's common/lib/libc/arch/x86_64/string/strcpy.S
 * written by J.T. Conklin <jtc@acorntoolworks.com> and
 * adapted by Guillaume Morin <guillaume@morinfr.org> to implement stpcpy
 * that was originally dedicated to the public domain
 */

might be a bit clearer as to its origin...

I added a slight tweak in the wording, but anything that makes it clear that we originally grabbed this from NetBSD, adapted it to stpcpy and then added a lot of new code. Feel free to tweak further, but I think that gives all the proper credit, which is where the vast majority of the hurt feelings come from in these cases.

lib/libc/amd64/string/stpcpy.S
22 ↗(On Diff #125669)

Perhaps either of variants can be make working, but IMO it is too complicated.

Why not add e.g. ARCHFUNC_Vx macros instead of single ARCHFUNC(), and only check that they are used in the right order, without holes?

176 ↗(On Diff #125669)

Any non-obvious places, or any places where your intent might be lost if somebody modifies the code without knowing the intent. All cases you listed in the second paragraph of your response, IMO deserve the explicit explanation.

there are weird whitespace issues, past that lgtm from asm standpoint

Whitespace issues have been confirmed by @mjg to be a Phab issue. Can I get this accepted so I can commit?

  • lib/libc/amd64/string/stpcpy.S: rework copyright header

Following @imp's recommendation.

fuz marked 5 inline comments as done.Aug 20 2023, 9:37 PM

@mjg @kib Is there anything preventing this DR from getting accepted? I would like to get this and D41442 in for 14.0-RELEASE.

lib/libc/amd64/string/stpcpy.S
22 ↗(On Diff #125669)

I would like to address changes to the dispatch framework in a separate DR as they are not related to this function.

lib/libc/amd64/string/stpcpy.S
22 ↗(On Diff #125669)

Definitely, I did not proposed to do it together with any specific function update.

Then what's preventing this DR from being accepted? Is there someone else who needs to be added as a reviewer?

I thought kib had possibly unresolved concerns.

To spell it out, you can consider it accepted from my end.

If he does not have any further comments, you can commit.

This revision is now accepted and ready to land.Aug 20 2023, 10:41 PM
ngie requested changes to this revision.Aug 21 2023, 12:51 AM

Does this change to contrib/netbsd-tests chase an upstream change, or is it changing the tests to accommodate this change?
I'm trying to reduce the amount of unnecessary change upstream so I can drop a new version of contrib/netbsd-tests, while also upstreaming all valid changes to NetBSD and it's become a bit difficult in some areas where folks have added FreeBSD-specific tests to contrib/netbsd-tests (which was not my original intent).
(FWIW I'm not saying this change is bad -- I'm just trying to stop this from being automatically seen as mergeable)

This revision now requires changes to proceed.Aug 21 2023, 12:51 AM

tests can be moved to a different review which will land at a different date if that helps

this is definitely not a blocker for the routine

In D41349#946047, @mjg wrote:

tests can be moved to a different review which will land at a different date if that helps

this is definitely not a blocker for the routine

Could the tests change please be done independent of the non-tests code change?

@ngie I have extended the unit test as I needed larger test cases to trigger some bugs in previous versions of my code. The other bit about being able to use dlopen to load a custom function to test was added by the example of some of the other tests in the NetBSD test suite. Neither is related to an upstream change and neither should be impossible to contribute back to upstream. Will remove the unit test from the changeset.

  • Revert "contrib/netbsd-tests/lib/libc/string/t_strcpy.c: extend test cases"
  • Revert "contrib/netbsd-tests/lib/libc/string/t_strcpy.c: permit testing an external strcpy implementation"

This removes the test suite bits from the changeset as requested
by @ngie. Is the approval to commit still valid?

Revert missed update to copyright header.

it's all fine with me

This revision is now accepted and ready to land.Aug 21 2023, 6:19 PM