A new syscall is added which registers a uint32_t variable as containing the count of blocks for signal delivery. The kernel reads its content on each syscall entry and on AST processing; a non-zero count is interpreted the same as a signal mask blocking all signals.
The main purpose of the fast sigblock feature is to allow rtld to avoid issuing two sigprocmask(2) syscalls for each symbol binding operation in single-threaded processes. Rtld needs to block signals as part of locking to ensure signal safety of the bind process, because signal handlers might need to lazily resolve symbol references. For multi-threaded processes, libthr intercepts all signal handlers, which allows it to delay signal delivery by manually raising the delivered signal upon unlock. This is not ideal, and I did not want to make libc intercept signal handlers too.
There is some rudimentary use of the fast sigblock in libthr, but it is not related to the critical sections, which still use signal raising on exit. This is because critical sections have to handle more issues than only signals.
Benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by the compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut roughly in half; for the hello binary in the benchmark, truss output drops from 63 to 37 lines.
The biggest downside of the feature that I see is that memory corruption affecting the registered fast sigblock location would cause quite strange application misbehavior. For instance, the process would be immune to `^C` (but still killable by SIGKILL).
Benchmark
```
stock
root@r-freeb43:/data/work # ( truss ./hello > /dev/null ) |& wc -l
63
root@r-freeb43:/data/work # ./syscall_timing getppid
Clock resolution: 0.000000001
test loop time iterations periteration
getppid 0 1.016000291 2897051 0.000000350
getppid 1 1.009834382 2877363 0.000000350
getppid 2 1.000017563 2848788 0.000000351
getppid 3 1.027104944 2927389 0.000000350
getppid 4 1.017986489 2901363 0.000000350
getppid 5 1.010239487 2879487 0.000000350
getppid 6 1.004737854 2861549 0.000000351
getppid 7 1.016996867 2902481 0.000000350
getppid 8 1.000985560 2854247 0.000000350
getppid 9 1.023981476 2921559 0.000000350
buildworld -s -j 32
2950.74 real 70371.54 user 2495.92 sys
3033.48 real 70085.36 user 2482.85 sys
2985.57 real 70240.54 user 2495.64 sys
2927.52 real 70204.11 user 2486.19 sys
3007.15 real 70140.09 user 2489.78 sys
fast sigblock
root@r-freeb43:/tmp # ( truss ./hello > /dev/null ) |& wc -l
37
root@r-freeb43:/tmp # ./syscall_timing getppid
Clock resolution: 0.000000001
test loop time iterations periteration
getppid 0 1.000028797 2888175 0.000000346
getppid 1 1.050218490 3031970 0.000000346
getppid 2 1.027986764 2967578 0.000000346
getppid 3 1.046828208 3021258 0.000000346
getppid 4 1.000027060 2886588 0.000000346
getppid 5 1.027107249 2964878 0.000000346
getppid 6 1.021487553 2950501 0.000000346
getppid 7 1.001574229 2890893 0.000000346
getppid 8 1.009152278 2914871 0.000000346
getppid 9 1.042734749 3012841 0.000000346
buildworld -s -j 32
2998.44 real 70147.80 user 2478.18 sys
2935.23 real 70159.03 user 2490.75 sys
2961.46 real 70072.53 user 2501.04 sys
2939.97 real 70115.35 user 2504.80 sys
2978.95 real 70006.44 user 2509.96 sys
```