Most applications these days use sysctlbyname(), but some still use sysctl()
for a few reasons:
- legacy/BSD-portable code
- performance (sysctlbyname() requires multiple system calls)
- per-resource sysctl trees (e.g., CTL_KERN.KERN_PROC.KERN_PROC_PID)

To make libcap_sysctl a better drop-in replacement for the
non-capsicumized functions, I added cap_sysctl(3) and cap_sysctlnametomib(3).
I also revised the limit interface to accommodate the new functions and to
avoid direct manipulation of nvlists.

cap_sysctl(3) and cap_sysctlnametomib(3) are subject to the same limits as
named sysctls. Limits can be specified by name or by MIB. When a limit is
specified by name, we automatically resolve the name to a MIB and update
the limit. This is somewhat awkward because the "limits" nvlist passed to the
service command handler is const, so the sysctlnametomib command handler
cannot modify the service limits.

The old code used a very simple format for limit nvlists: the sysctl name was
the key and the allowed operations were stored as a number value. This doesn't
work for MIBs, so I extended the nvlist format. The set of limits is now an
nvlist of nvlists. Each sub-nvlist contains one or both of "name" and "mib",
along with the allowed operations ("operation"). See the updated man page for
the new limit interface.
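
The limit layout described above can be modelled with plain C structs. This is
only an illustrative sketch, not the libcap_sysctl code: struct limit_entry,
OP_READ, OP_WRITE, MIB_MAX and both lookup helpers are hypothetical names, the
real implementation stores these fields in nvlists, and exact-match lookup is
an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation bits; not the library's real constants. */
#define OP_READ  0x01
#define OP_WRITE 0x02

/* Assumed upper bound on MIB length for this sketch. */
#define MIB_MAX  24

/*
 * One limit entry, mirroring a sub-nvlist: it carries a "name", a "mib",
 * or both, plus the allowed "operation" mask.
 */
struct limit_entry {
    const char *name;       /* NULL if the limit was given by MIB only */
    int         mib[MIB_MAX];
    size_t      miblen;     /* 0 if no MIB is recorded */
    int         operation;  /* bitmask of OP_* */
};

/* Return true if some entry names this sysctl and permits every bit in op. */
static bool
limit_allows_name(const struct limit_entry *limits, size_t n,
    const char *name, int op)
{
    assert(limits != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (limits[i].name != NULL &&
            strcmp(limits[i].name, name) == 0 &&
            (limits[i].operation & op) == op)
            return (true);
    }
    return (false);
}

/* Return true if some entry has exactly this MIB and permits every bit in op. */
static bool
limit_allows_mib(const struct limit_entry *limits, size_t n,
    const int *mib, size_t miblen, int op)
{
    assert(limits != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (miblen > 0 && limits[i].miblen == miblen &&
            memcmp(limits[i].mib, mib, miblen * sizeof(mib[0])) == 0 &&
            (limits[i].operation & op) == op)
            return (true);
    }
    return (false);
}
```

An entry that records both a name and its resolved MIB is matched by either
helper, which is why resolving a name-based limit to a MIB lets the same limit
cover requests made by name and requests made by MIB.
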

While testing, I found that cap_sysctl is much slower than direct system calls.
A loop which just reads a sysctl shows a 15x increase in runtime. This overhead
is a combination of IPC overhead, nvlist construction, and nvlist
serialization. I did not test with any limits set, but that would have made it
worse. I suspect that the only real solution is to provide a "sysctlfd"
abstraction which allows applications to reference subtrees of the sysctl
tree. However, cap_sysctl is still useful for sandboxing existing applications
without requiring non-trivial changes.