diff --git a/share/man/man7/security.7 b/share/man/man7/security.7 index f87833ece112..e3f11765d083 100644 --- a/share/man/man7/security.7 +++ b/share/man/man7/security.7 @@ -1,1115 +1,1118 @@ .\" Copyright (C) 1998 Matthew Dillon. All rights reserved. .\" Copyright (c) 2019 The FreeBSD Foundation, Inc. .\" .\" Parts of this documentation were written by .\" Konstantin Belousov under sponsorship .\" from the FreeBSD Foundation. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd September 29, 2022 +.Dd March 30, 2023 .Dt SECURITY 7 .Os .Sh NAME .Nm security .Nd introduction to security under FreeBSD .Sh DESCRIPTION Security is a function that begins and ends with the system administrator. While all .Bx multi-user systems have some inherent security, the job of building and maintaining additional security mechanisms to keep users .Dq honest is probably one of the single largest undertakings of the sysadmin. Machines are only as secure as you make them, and security concerns are ever competing with the human necessity for convenience. .Ux systems, in general, are capable of running a huge number of simultaneous processes and many of these processes operate as servers \(em meaning that external entities can connect and talk to them. As yesterday's mini-computers and mainframes become today's desktops, and as computers become networked and internetworked, security becomes an ever bigger issue. .Pp Security is best implemented through a layered onion approach. In a nutshell, what you want to do is to create as many layers of security as are convenient and then carefully monitor the system for intrusions. .Pp System security also pertains to dealing with various forms of attacks, including attacks that attempt to crash or otherwise make a system unusable but do not attempt to break root. Security concerns can be split up into several categories: .Bl -enum -offset indent .It Denial of Service attacks (DoS) .It User account compromises .It Root compromise through accessible servers .It Root compromise via user accounts .It Backdoor creation .El .Pp A denial of service attack is an action that deprives the machine of needed resources. 
Typically, DoS attacks are brute-force mechanisms that attempt to crash or otherwise make a machine unusable by overwhelming its servers or network stack. Some DoS attacks try to take advantage of bugs in the networking stack to crash a machine with a single packet. The latter can only be fixed by applying a bug fix to the kernel. Attacks on servers can often be fixed by properly specifying options to limit the load the servers incur on the system under adverse conditions. Brute-force network attacks are harder to deal with. A spoofed-packet attack, for example, is nearly impossible to stop short of cutting your system off from the Internet. It may not be able to take your machine down, but it can fill up your Internet pipe. .Pp A user account compromise is even more common than a DoS attack. Some sysadmins still run .Nm telnetd and .Xr ftpd 8 servers on their machines. These servers, by default, do not operate over encrypted connections. The result is that if you have any moderate-sized user base, one or more of your users logging into your system from a remote location (which is the most common and convenient way to log in to a system) will have his or her password sniffed. The attentive system administrator will analyze his remote access logs looking for suspicious source addresses even for successful logins. .Pp One must always assume that once an attacker has access to a user account, the attacker can break root. However, the reality is that in a well secured and maintained system, access to a user account does not necessarily give the attacker access to root. The distinction is important because without access to root the attacker cannot generally hide his tracks and may, at best, be able to do nothing more than mess with the user's files or crash the machine. User account compromises are very common because users tend not to take the precautions that sysadmins take. .Pp System administrators must keep in mind that there are potentially many ways to break root on a machine. The attacker may know the root password, the attacker may find a bug in a root-run server and be able to break root over a network connection to that server, or the attacker may know of a bug in an SUID-root program that allows the attacker to break root once he has broken into a user's account. If an attacker has found a way to break root on a machine, the attacker may not have a need to install a backdoor. Many of the root holes found and closed to date involve a considerable amount of work by the attacker to clean up after himself, so most attackers do install backdoors. This gives you a convenient way to detect the attacker. Making it impossible for an attacker to install a backdoor may actually be detrimental to your security because it will not close off the hole the attacker used to break in originally. .Pp Security remedies should always be implemented with a multi-layered .Dq onion peel approach and can be categorized as follows: .Bl -enum -offset indent .It Securing root and staff accounts .It Securing root \(em root-run servers and SUID/SGID binaries .It Securing user accounts .It Securing the password file .It Securing the kernel core, raw devices, and file systems .It Quick detection of inappropriate changes made to the system .It Paranoia .El .Sh SECURING THE ROOT ACCOUNT AND SECURING STAFF ACCOUNTS Do not bother securing staff accounts if you have not secured the root account. Most systems have a password assigned to the root account.
The first thing you do is assume that the password is .Em always compromised. This does not mean that you should remove the password. The password is almost always necessary for console access to the machine. What it does mean is that you should not make it possible to use the password outside of the console or possibly even with the .Xr su 1 utility. For example, make sure that your PTYs are specified as being .Dq Li insecure in the .Pa /etc/ttys file so that direct root logins via .Xr telnet 1 are disallowed. If using other login services such as .Xr sshd 8 , make sure that direct root logins are disabled there as well. Consider every access method \(em services such as .Xr ftp 1 often fall through the cracks. Direct root logins should only be allowed via the system console. .Pp Of course, as a sysadmin you have to be able to get to root, so we open up a few holes. But we make sure these holes require additional password verification to operate. One way to make root accessible is to add appropriate staff accounts to the .Dq Li wheel group (in .Pa /etc/group ) . The staff members placed in the .Li wheel group are allowed to .Xr su 1 to root. You should never give staff members native .Li wheel access by putting them in the .Li wheel group in their password entry. Staff accounts should be placed in a .Dq Li staff group, and then added to the .Li wheel group via the .Pa /etc/group file. Only those staff members who actually need to have root access should be placed in the .Li wheel group. It is also possible, when using an authentication method such as Kerberos, to use Kerberos's .Pa .k5login file in the root account to allow a .Xr ksu 1 to root without having to place anyone at all in the .Li wheel group. This may be the better solution since the .Li wheel mechanism still allows an intruder to break root if the intruder has gotten hold of your password file and can break into a staff account. While having the .Li wheel mechanism is better than having nothing at all, it is not necessarily the safest option. .Pp An indirect way to secure the root account is to secure your staff accounts by using an alternative login access method and *'ing out the crypted password for the staff accounts. This way an intruder may be able to steal the password file but will not be able to break into any staff accounts or root, even if root has a crypted password associated with it (assuming, of course, that you have limited root access to the console). Staff members get into their staff accounts through a secure login mechanism such as .Xr kerberos 8 or .Xr ssh 1 using a private/public key pair. When you use something like Kerberos you generally must secure the machines which run the Kerberos servers and your desktop workstation. When you use a public/private key pair with SSH, you must generally secure the machine you are logging in .Em from (typically your workstation), but you can also add an additional layer of protection to the key pair by password-protecting the keypair when you create it with .Xr ssh-keygen 1 . Being able to star-out the passwords for staff accounts also guarantees that staff members can only log in through secure access methods that you have set up. You can thus force all staff members to use secure, encrypted connections for all their sessions which closes an important hole used by many intruders: that of sniffing the network from an unrelated, less secure machine.
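.Pp
For example, a passphrase-protected key pair can be generated with
.Xr ssh-keygen 1
and a staff account's crypted password starred out with
.Xr vipw 8 .
This is a minimal sketch; the key type, comment, and the account shown
are illustrative:
.Bd -literal -offset indent
# Generate a key pair; enter a passphrase when prompted.
ssh-keygen -t ed25519 -C "staff key"

# Star out the password field for the staff account in the
# master password file using vipw(8), e.g.:
#   jane:*:1001:1001::0:0:Jane Doe:/home/jane:/bin/sh
vipw
.Ed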
.Pp These indirect security mechanisms also assume that you are logging in from a more restrictive server to a less restrictive server. For example, if your main box is running all sorts of servers, your workstation should not be running any. In order for your workstation to be reasonably secure you should run as few servers as possible, up to and including no servers at all, and you should run a password-protected screen blanker. Of course, given physical access to a workstation, an attacker can break any sort of security you put on it. This is definitely a problem that you should consider but you should also consider the fact that the vast majority of break-ins occur remotely, over a network, from people who do not have physical access to your workstation or servers. .Pp Using something like Kerberos also gives you the ability to disable or change the password for a staff account in one place and have it immediately affect all the machines the staff member may have an account on. If a staff member's account gets compromised, the ability to instantly change his password on all machines should not be underrated. With discrete passwords, changing a password on N machines can be a mess. You can also impose re-passwording restrictions with Kerberos: not only can a Kerberos ticket be made to time out after a while, but the Kerberos system can require that the user choose a new password after a certain period of time (say, once a month). .Sh SECURING ROOT \(em ROOT-RUN SERVERS AND SUID/SGID BINARIES The prudent sysadmin only runs the servers he needs to, no more, no less. Be aware that third-party servers are often the most bug-prone. For example, running an old version of .Xr imapd 8 or .Xr popper 8 Pq Pa ports/mail/popper is like giving a universal root ticket out to the entire world. Never run a server that you have not checked out carefully. Many servers do not need to be run as root. For example, the .Xr talkd 8 , .Xr comsat 8 , and .Xr fingerd 8 daemons can be run in special user .Dq sandboxes . A sandbox is not perfect unless you go to a large amount of trouble, but the onion approach to security still stands: if someone is able to break in through a server running in a sandbox, they still have to break out of the sandbox. The more layers the attacker must break through, the lower the likelihood of his success. Root holes have historically been found in virtually every server ever run as root, including basic system servers. If you are running a machine through which people only log in via .Xr sshd 8 and never log in via .Nm telnetd , then turn off this service! .Pp .Fx now defaults to running .Xr talkd 8 , .Xr comsat 8 , and .Xr fingerd 8 in a sandbox. Depending on whether you are installing a new system or upgrading an existing system, the special user accounts used by these sandboxes may not be installed. The prudent sysadmin would research and implement sandboxes for servers whenever possible. .Pp There are a number of other servers that typically do not run in sandboxes: .Xr sendmail 8 , .Xr popper 8 , .Xr imapd 8 , .Xr ftpd 8 , and others. There are alternatives to some of these, but installing them may require more work than you are willing to put in (the convenience factor strikes again). You may have to run these servers as root and rely on other mechanisms to detect break-ins that might occur through them. .Pp The other big potential root holes in a system are the SUID-root and SGID binaries installed on the system.
Most of these binaries, such as .Xr su 1 , reside in .Pa /bin , /sbin , /usr/bin , or .Pa /usr/sbin . While nothing is 100% safe, the system-default SUID and SGID binaries can be considered reasonably safe. Still, root holes are occasionally found in these binaries. A root hole was found in Xlib in 1998 that made .Xr xterm 1 Pq Pa ports/x11/xterm (which is typically SUID) vulnerable. It is better to be safe than sorry and the prudent sysadmin will restrict SUID binaries that only staff should run to a special group that only staff can access, and get rid of .Pq Dq Li "chmod 000" any SUID binaries that nobody uses. A server with no display generally does not need an .Xr xterm 1 Pq Pa ports/x11/xterm binary. SGID binaries can be almost as dangerous. If an intruder can break an SGID-kmem binary the intruder might be able to read .Pa /dev/kmem and thus read the crypted password file, potentially compromising any passworded account. Alternatively an intruder who breaks group .Dq Li kmem can monitor keystrokes sent through PTYs, including PTYs used by users who log in through secure methods. An intruder that breaks the .Dq Li tty group can write to almost any user's TTY. If a user is running a terminal program or emulator with a keyboard-simulation feature, the intruder can potentially generate a data stream that causes the user's terminal to echo a command, which is then run as that user. .Sh SECURING USER ACCOUNTS User accounts are usually the most difficult to secure. While you can impose draconian access restrictions on your staff and *-out their passwords, you may not be able to do so with any general user accounts you might have. If you do have sufficient control then you may win out and be able to secure the user accounts properly. If not, you simply have to be more vigilant in your monitoring of those accounts. Use of SSH and Kerberos for user accounts is more problematic due to the extra administration and technical support required, but still a very good solution compared to a crypted password file. .Sh SECURING THE PASSWORD FILE The only sure fire way is to *-out as many passwords as you can and use SSH or Kerberos for access to those accounts. Even though the crypted password file .Pq Pa /etc/spwd.db can only be read by root, it may be possible for an intruder to obtain read access to that file even if the attacker cannot obtain root-write access. .Pp Your security scripts should always check for and report changes to the password file (see .Sx CHECKING FILE INTEGRITY below). .Sh SECURING THE KERNEL CORE, RAW DEVICES, AND FILE SYSTEMS If an attacker breaks root he can do just about anything, but there are certain conveniences. For example, most modern kernels have a packet sniffing device driver built in. Under .Fx it is called the .Xr bpf 4 device. An intruder will commonly attempt to run a packet sniffer on a compromised machine. You do not need to give the intruder the capability and most systems should not have the .Xr bpf 4 device compiled in. .Pp But even if you turn off the .Xr bpf 4 device, you still have .Pa /dev/mem and .Pa /dev/kmem to worry about. For that matter, the intruder can still write to raw disk devices. Also, there is another kernel feature called the module loader, .Xr kldload 8 . An enterprising intruder can use a KLD module to install his own .Xr bpf 4 device or other sniffing device on a running kernel. To avoid these problems you have to run the kernel at a higher security level, at least level 1. 
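For example (a minimal sketch, anticipating the details below; once raised, the level cannot be lowered without a reboot):
.Bd -literal -offset indent
sysctl kern.securelevel=1
.Ed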
The security level can be set with a .Xr sysctl 8 on the .Va kern.securelevel variable. Once you have set the security level to 1, write access to raw devices will be denied and special .Xr chflags 1 flags, such as .Cm schg , will be enforced. You must also ensure that the .Cm schg flag is set on critical startup binaries, directories, and script files \(em everything that gets run up to the point where the security level is set. This might be overdoing it, and upgrading the system is much more difficult when you operate at a higher security level. You may compromise and run the system at a higher security level but not set the .Cm schg flag for every system file and directory under the sun. Another possibility is to simply mount .Pa / and .Pa /usr read-only. It should be noted that being too draconian in what you attempt to protect may prevent the all-important detection of an intrusion. .Pp The kernel runs with five different security levels. Any super-user process can raise the level, but no process can lower it. The security levels are: .Bl -tag -width flag .It Ic -1 Permanently insecure mode \- always run the system in insecure mode. This is the default initial value. .It Ic 0 Insecure mode \- immutable and append-only flags may be turned off. All devices may be read or written subject to their permissions. .It Ic 1 Secure mode \- the system immutable and system append-only flags may not be turned off; disks for mounted file systems, .Pa /dev/mem and .Pa /dev/kmem may not be opened for writing; .Pa /dev/io (if your platform has it) may not be opened at all; kernel modules (see .Xr kld 4 ) may not be loaded or unloaded. The kernel debugger may not be entered using the .Va debug.kdb.enter -sysctl. +sysctl unless a +.Xr MAC 9 +policy grants access, for example using +.Xr mac_ddb 4 . A panic or trap cannot be forced using the .Va debug.kdb.panic , .Va debug.kdb.panic_str and other sysctl's. .It Ic 2 Highly secure mode \- same as secure mode, plus disks may not be opened for writing (except by .Xr mount 2 ) whether mounted or not. This level precludes tampering with file systems by unmounting them, but also inhibits running .Xr newfs 8 while the system is multi-user. .Pp In addition, kernel time changes are restricted to less than or equal to one second. Attempts to change the time by more than this will log the message .Dq Time adjustment clamped to +1 second . .It Ic 3 Network secure mode \- same as highly secure mode, plus IP packet filter rules (see .Xr ipfw 8 , .Xr ipfirewall 4 and .Xr pfctl 8 ) cannot be changed and .Xr dummynet 4 or .Xr pf 4 configuration cannot be adjusted. .El .Pp The security level can be configured with variables documented in .Xr rc.conf 5 . .Sh CHECKING FILE INTEGRITY: BINARIES, CONFIG FILES, ETC When it comes right down to it, you can only protect your core system configuration and control files so much before the convenience factor rears its ugly head. For example, using .Xr chflags 1 to set the .Cm schg bit on most of the files in .Pa / and .Pa /usr is probably counterproductive because while it may protect the files, it also closes a detection window. The last layer of your security onion is perhaps the most important \(em detection. The rest of your security is pretty much useless (or, worse, presents you with a false sense of safety) if you cannot detect potential incursions. Half the job of the onion is to slow down the attacker rather than stop him in order to give the detection layer a chance to catch him in the act. 
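.Pp
To illustrate the mechanics for the few files where protection is
warranted, the system immutable flag can be set and inspected with
.Xr chflags 1
and
.Xr ls 1 .
This is a sketch; which files to protect is a policy decision:
.Bd -literal -offset indent
# Set the system immutable flag on a startup binary.
chflags schg /sbin/init
# List file flags; "schg" appears in the flags column.
ls -lo /sbin/init
# While kern.securelevel is 1 or higher, clearing it fails.
chflags noschg /sbin/init
.Ed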
.Pp The best way to detect an incursion is to look for modified, missing, or unexpected files. The best way to look for modified files is from another (often centralized) limited-access system. Writing your security scripts on the extra-secure limited-access system makes them mostly invisible to potential attackers, and this is important. In order to take maximum advantage you generally have to give the limited-access box significant access to the other machines in the business, usually either by doing a read-only NFS export of the other machines to the limited-access box, or by setting up SSH keypairs to allow the limited-access box to SSH to the other machines. Except for its network traffic, NFS is the least visible method \(em allowing you to monitor the file systems on each client box virtually undetected. If your limited-access server is connected to the client boxes through a switch, the NFS method is often the better choice. If your limited-access server is connected to the client boxes through a hub or through several layers of routing, the NFS method may be too insecure (network-wise) and using SSH may be the better choice even with the audit-trail tracks that SSH lays. .Pp Once you give a limited-access box at least read access to the client systems it is supposed to monitor, you must write scripts to do the actual monitoring. Given an NFS mount, you can write scripts out of simple system utilities such as .Xr find 1 and .Xr md5 1 . It is best to physically .Xr md5 1 the client-box files at least once a day, and to test control files such as those found in .Pa /etc and .Pa /usr/local/etc even more often. When mismatches are found relative to the base MD5 information the limited-access machine knows is valid, it should scream at a sysadmin to go check it out. A good security script will also check for inappropriate SUID binaries and for new or deleted files on system partitions such as .Pa / and .Pa /usr . .Pp When using SSH rather than NFS, writing the security script is much more difficult. You essentially have to .Xr scp 1 the scripts to the client box in order to run them, making them visible, and for safety you also need to .Xr scp 1 the binaries (such as .Xr find 1 ) that those scripts use. The .Xr sshd 8 daemon on the client box may already be compromised. All in all, using SSH may be necessary when running over unsecure links, but it is also a lot harder to deal with. .Pp A good security script will also check for changes to user and staff members' access configuration files: .Pa .rhosts , .shosts , .ssh/authorized_keys and so forth, files that might fall outside the purview of the MD5 check. .Pp If you have a huge amount of user disk space it may take too long to run through every file on those partitions. In this case, setting mount flags to disallow SUID binaries on those partitions is a good idea. The .Cm nosuid option (see .Xr mount 8 ) is what you want to look into. I would scan them anyway at least once a week, since the object of this layer is to detect a break-in whether or not the break-in is effective. .Pp Process accounting (see .Xr accton 8 ) is a relatively low-overhead feature of the operating system which I recommend using as a post-break-in evaluation mechanism. It is especially useful in tracking down how an intruder has actually broken into a system, assuming the file is still intact after the break-in occurs.
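.Pp
Tying the pieces above together, a minimal sketch of such an integrity
check run from the limited-access box follows; all paths and the
baseline file are illustrative:
.Bd -literal -offset indent
#!/bin/sh
# Compare current MD5 checksums of a client's /etc against a
# known-good baseline kept on the limited-access box.
MNT=/clients/boxA                 # read-only NFS mount
BASE=/var/db/integrity/boxA.md5   # baseline made earlier
find ${MNT}/etc -type f | sort | xargs md5 -r > /tmp/now.md5
diff -u ${BASE} /tmp/now.md5 || echo "boxA: MD5 mismatch"
# Also report SUID binaries on the client's system partitions.
find ${MNT} -xdev -type f -perm -4000
.Ed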
.Pp Finally, security scripts should process the log files and the logs themselves should be generated in as secure a manner as possible \(em remote syslog can be very useful. An intruder tries to cover his tracks, and log files are critical to the sysadmin trying to track down the time and method of the initial break-in. One way to keep a permanent record of the log files is to run the system console to a serial port and collect the information on a continuing basis through a secure machine monitoring the consoles. .Sh PARANOIA A little paranoia never hurts. As a rule, a sysadmin can add any number of security features as long as they do not affect convenience, and can add security features that do affect convenience with some added thought. Even more importantly, a security administrator should mix it up a bit \(em if you use recommendations such as those given by this manual page verbatim, you give away your methodologies to the prospective attacker who also has access to this manual page. .Sh SPECIAL SECTION ON DoS ATTACKS This section covers Denial of Service attacks. A DoS attack is typically a packet attack. While there is not much you can do about modern spoofed packet attacks that saturate your network, you can generally limit the damage by ensuring that the attacks cannot take down your servers. .Bl -enum -offset indent .It Limiting server forks .It Limiting springboard attacks (ICMP response attacks, ping broadcast, etc.) .It Kernel Route Cache .El .Pp A common DoS attack is against a forking server that attempts to cause the server to eat processes, file descriptors, and memory until the machine dies. The .Xr inetd 8 server has several options to limit this sort of attack. It should be noted that while it is possible to prevent a machine from going down it is not generally possible to prevent a service from being disrupted by the attack. Read the .Xr inetd 8 manual page carefully and pay specific attention to the .Fl c , C , and .Fl R options. Note that spoofed-IP attacks will circumvent the .Fl C option to .Xr inetd 8 , so typically a combination of options must be used. Some standalone servers have self-fork-limitation parameters. .Pp The .Xr sendmail 8 daemon has its .Fl OMaxDaemonChildren option which tends to work much better than trying to use .Xr sendmail 8 Ns 's load limiting options due to the load lag. You should specify a .Va MaxDaemonChildren parameter when you start .Xr sendmail 8 high enough to handle your expected load but not so high that the computer cannot handle that number of .Nm sendmail Ns 's without falling on its face. It is also prudent to run .Xr sendmail 8 in .Dq queued mode .Pq Fl ODeliveryMode=queued and to run the daemon .Pq Dq Nm sendmail Fl bd separate from the queue-runs .Pq Dq Nm sendmail Fl q15m . If you still want real-time delivery you can run the queue at a much lower interval, such as .Fl q1m , but be sure to specify a reasonable .Va MaxDaemonChildren option for that .Xr sendmail 8 to prevent cascade failures. .Pp The .Xr syslogd 8 daemon can be attacked directly and it is strongly recommended that you use the .Fl s option whenever possible, and the .Fl a option otherwise. .Pp You should also be fairly careful with connect-back services such as tcpwrapper's reverse-identd, which can be attacked directly. You generally do not want to use the reverse-ident feature of tcpwrappers for this reason. .Pp It is a very good idea to protect internal services from external access by firewalling them off at your border routers. 
The idea here is to prevent saturation attacks from outside your LAN, not so much to protect internal services from network-based root compromise. Always configure an exclusive firewall, i.e., .So firewall everything .Em except ports A, B, C, D, and M-Z .Sc . This way you can firewall off all of your low ports except for certain specific services such as .Xr talkd 8 , .Xr sendmail 8 , and other internet-accessible services. If you try to configure the firewall the other way \(em as an inclusive or permissive firewall, there is a good chance that you will forget to .Dq close a couple of services or that you will add a new internal service and forget to update the firewall. You can still open up the high-numbered port range on the firewall to allow permissive-like operation without compromising your low ports. Also take note that .Fx allows you to control the range of port numbers used for dynamic binding via the various .Va net.inet.ip.portrange sysctl's .Pq Dq Li "sysctl net.inet.ip.portrange" , which can also ease the complexity of your firewall's configuration. I usually use a normal first/last range of 4000 to 5000, and a hiport range of 49152 to 65535, then block everything under 4000 off in my firewall (except for certain specific internet-accessible ports, of course). .Pp Another common DoS attack is called a springboard attack \(em to attack a server in a manner that causes the server to generate responses which then overload the server, the local network, or some other machine. The most common attack of this nature is the ICMP PING BROADCAST attack. The attacker spoofs ping packets sent to your LAN's broadcast address with the source IP address set to the actual machine they wish to attack. If your border routers are not configured to stomp on ping's to broadcast addresses, your LAN winds up generating sufficient responses to the spoofed source address to saturate the victim, especially when the attacker uses the same trick on several dozen broadcast addresses over several dozen different networks at once. Broadcast attacks of over a hundred and twenty megabits have been measured. A second common springboard attack is against the ICMP error reporting system. By constructing packets that generate ICMP error responses, an attacker can saturate a server's incoming network and cause the server to saturate its outgoing network with ICMP responses. This type of attack can also crash the server by running it out of .Vt mbuf Ns 's , especially if the server cannot drain the ICMP responses it generates fast enough. The .Fx kernel has a new kernel compile option called .Dv ICMP_BANDLIM which limits the effectiveness of these sorts of attacks. The last major class of springboard attacks is related to certain internal .Xr inetd 8 services such as the UDP echo service. An attacker simply spoofs a UDP packet with the source address being server A's echo port, and the destination address being server B's echo port, where server A and B are both on your LAN. The two servers then bounce this one packet back and forth between each other. The attacker can overload both servers and their LANs simply by injecting a few packets in this manner. Similar problems exist with the internal chargen port. A competent sysadmin will turn off all of these .Xr inetd 8 Ns -internal test services. .Sh ACCESS ISSUES WITH KERBEROS AND SSH There are a few issues with both Kerberos and SSH that need to be addressed if you intend to use them. 
Kerberos5 is an excellent authentication protocol but the kerberized .Xr telnet 1 sucks rocks. There are bugs that make it unsuitable for dealing with binary streams. Also, by default Kerberos does not encrypt a session unless you use the .Fl x option. SSH encrypts everything by default. .Pp SSH works quite well in every respect except when it is set up to forward encryption keys. What this means is that if you have a secure workstation holding keys that give you access to the rest of the system, and you .Xr ssh 1 to an unsecure machine, your keys become exposed. The actual keys themselves are not exposed, but .Xr ssh 1 installs a forwarding port for the duration of your login and if an attacker has broken root on the unsecure machine he can utilize that port to use your keys to gain access to any other machine that your keys unlock. .Pp We recommend that you use SSH in combination with Kerberos whenever possible for staff logins. SSH can be compiled with Kerberos support. This reduces your reliance on potentially exposable SSH keys while at the same time protecting passwords via Kerberos. SSH keys should only be used for automated tasks from secure machines (something that Kerberos is unsuited to). We also recommend that you either turn off key-forwarding in the SSH configuration, or that you make use of the .Va from Ns = Ns Ar IP/DOMAIN option that SSH allows in its .Pa authorized_keys file to make the key only usable to entities logging in from specific machines. .Sh KNOBS AND TWEAKS .Fx provides several knobs and tweak handles that restrict access to some introspection information. Some people consider this to improve system security, so the knobs are briefly listed here, together with controls which enable some mitigations of the hardware state leaks. .Pp The hardware mitigation sysctl knobs described below have been moved under .Pa machdep.mitigations , with backwards-compatibility shims to accept the existing names. A future change will rationalize the sense of the individual sysctls (so that enabled / true always indicates that the mitigation is active). For that reason the previous names remain the canonical way to set the mitigations, and are documented here. Backwards compatibility shims for the interim sysctls under .Pa machdep.mitigations will not be added. .Bl -tag -width security.bsd.unprivileged_proc_debug .It Dv security.bsd.see_other_uids Controls visibility of processes owned by a different uid. The knob directly affects the .Dv kern.proc sysctls filtering of data, which results in restricted output from utilities like .Xr ps 1 . .It Dv security.bsd.see_other_gids Same, for processes owned by a different gid. .It Dv security.bsd.see_jail_proc Same, for processes belonging to a jail. .It Dv security.bsd.conservative_signals When enabled, unprivileged users are only allowed to send job control and usual termination signals like .Dv SIGKILL , .Dv SIGINT , and .Dv SIGTERM , to the processes executing programs with changed uids. .It Dv security.bsd.unprivileged_proc_debug Controls availability of the process debugging facilities to non-root users. See also .Xr proccontrol 1 mode .Dv trace . .It Dv vm.pmap.pti Tunable, amd64-only. Enables a mode of operation of the virtual memory system in which usermode page tables are sanitized to prevent the so-called Meltdown information leak on some Intel CPUs. By default, the system detects whether the CPU needs the workaround, and enables it automatically. See also .Xr proccontrol 1 mode .Dv kpti .
.It Dv machdep.mitigations.flush_rsb_ctxsw amd64. Controls Return Stack Buffer flush on context switch, to prevent cross-process ret2spec attacks. Only needed, and only enabled by default, if the machine supports SMEP; otherwise IBRS would do the necessary flushing on kernel entry anyway. .It Dv hw.mds_disable amd64 and i386. Controls Microarchitectural Data Sampling hardware information leak mitigation. .It Dv hw.spec_store_bypass_disable amd64 and i386. Controls Speculative Store Bypass hardware information leak mitigation. .It Dv hw.ibrs_disable amd64 and i386. Controls Indirect Branch Restricted Speculation hardware information leak mitigation. .It Dv machdep.syscall_ret_flush_l1d amd64. Controls force-flush of L1D cache on return from syscalls which report errors other than .Er EEXIST , .Er EAGAIN , .Er EXDEV , .Er ENOENT , .Er ENOTCONN , and .Er EINPROGRESS . This is mostly a paranoid setting added to prevent hypothetical exploitation of unknown gadgets for unknown hardware issues. The error codes exclusion list is composed of the most common errors which typically occur during normal system operation. .It Dv machdep.nmi_flush_l1d_sw amd64. Controls force-flush of L1D cache on NMI; this provides software assist for bhyve mitigation of the L1 terminal fault hardware information leak. .It Dv hw.vmm.vmx.l1d_flush amd64. Controls the mitigation of L1 Terminal Fault in the bhyve hypervisor. .It Dv vm.pmap.allow_2m_x_ept amd64. Allows the use of superpages for executable mappings under the EPT page table format used by hypervisors on Intel CPUs to map the guest physical address space to machine physical memory. May be disabled to work around a CPU erratum called Machine Check Error Avoidance on Page Size Change. .It Dv machdep.mitigations.rngds.enable amd64 and i386. Controls mitigation of Special Register Buffer Data Sampling versus optimization of the MCU access. When set to zero, the mitigation is disabled, and the RDSEED and RDRAND instructions do not incur serialization overhead for shared buffer accesses, and do not serialize off-core memory accesses. .It Dv kern.elf32.aslr.enable Controls system-global Address Space Layout Randomization (ASLR) for normal non-PIE (Position Independent Executable) 32-bit ELF binaries. See also the .Xr proccontrol 1 .Dv aslr mode, also affected by the per-image control note flag. .It Dv kern.elf32.aslr.pie_enable Controls system-global Address Space Layout Randomization for position-independent (PIE) 32-bit binaries. .It Dv kern.elf32.aslr.honor_sbrk Makes ASLR less aggressive and more compatible with old binaries relying on the sbrk area. .It Dv kern.elf32.aslr.stack If ASLR is enabled for a binary, a non-zero value enables randomization of the stack. Otherwise, the stack is mapped at a fixed location determined by the process ABI. .It Dv kern.elf64.aslr.enable ASLR control for 64-bit ELF binaries. .It Dv kern.elf64.aslr.pie_enable ASLR control for 64-bit ELF PIEs. .It Dv kern.elf64.aslr.honor_sbrk ASLR sbrk compatibility control for 64-bit binaries. .It Dv kern.elf64.aslr.stack Controls stack address randomization for 64-bit binaries. .It Dv kern.elf32.nxstack Enables non-executable stack for 32-bit processes. Enabled by default if supported by the hardware and the corresponding binary. .It Dv kern.elf64.nxstack Enables non-executable stack for 64-bit processes. .It Dv kern.elf32.allow_wx Enables mapping of simultaneously writable and executable pages for 32-bit processes.
.It Dv kern.elf64.allow_wx Enables mapping of simultaneously writable and executable pages for 64-bit processes. .El .Sh SEE ALSO .Xr chflags 1 , .Xr find 1 , .Xr md5 1 , .Xr netstat 1 , .Xr openssl 1 , .Xr proccontrol 1 , .Xr ps 1 , .Xr ssh 1 , .Xr xdm 1 Pq Pa ports/x11/xorg-clients , .Xr group 5 , .Xr ttys 5 , .Xr accton 8 , .Xr init 8 , .Xr sshd 8 , .Xr sysctl 8 , .Xr syslogd 8 , .Xr vipw 8 .Sh HISTORY The .Nm manual page was originally written by .An Matthew Dillon and first appeared in .Fx 3.1 , December 1998. diff --git a/sys/kern/kern_shutdown.c b/sys/kern/kern_shutdown.c index 70de6b691fd3..4544e217802d 100644 --- a/sys/kern/kern_shutdown.c +++ b/sys/kern/kern_shutdown.c @@ -1,1835 +1,1835 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1986, 1988, 1991, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * @(#)kern_shutdown.c 8.3 (Berkeley) 1/21/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_ekcd.h" #include "opt_kdb.h" #include "opt_panic.h" #include "opt_printf.h" #include "opt_sched.h" #include "opt_watchdog.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static MALLOC_DEFINE(M_DUMPER, "dumper", "dumper block buffer"); #ifndef PANIC_REBOOT_WAIT_TIME #define PANIC_REBOOT_WAIT_TIME 15 /* default to 15 seconds */ #endif static int panic_reboot_wait_time = PANIC_REBOOT_WAIT_TIME; SYSCTL_INT(_kern, OID_AUTO, panic_reboot_wait_time, CTLFLAG_RWTUN, &panic_reboot_wait_time, 0, "Seconds to wait before rebooting after a panic"); static int reboot_wait_time = 0; SYSCTL_INT(_kern, OID_AUTO, reboot_wait_time, CTLFLAG_RWTUN, &reboot_wait_time, 0, "Seconds to wait before rebooting"); /* * Note that stdarg.h and the ANSI style va_start macro is used for both * ANSI and traditional C compilers. */ #include #ifdef KDB #ifdef KDB_UNATTENDED int debugger_on_panic = 0; #else int debugger_on_panic = 1; #endif SYSCTL_INT(_debug, OID_AUTO, debugger_on_panic, - CTLFLAG_RWTUN | CTLFLAG_SECURE, - &debugger_on_panic, 0, "Run debugger on kernel panic"); + CTLFLAG_RWTUN, &debugger_on_panic, 0, + "Run debugger on kernel panic"); static bool debugger_on_recursive_panic = false; SYSCTL_BOOL(_debug, OID_AUTO, debugger_on_recursive_panic, - CTLFLAG_RWTUN | CTLFLAG_SECURE, - &debugger_on_recursive_panic, 0, "Run debugger on recursive kernel panic"); + CTLFLAG_RWTUN, &debugger_on_recursive_panic, 0, + "Run debugger on recursive kernel panic"); int debugger_on_trap = 0; SYSCTL_INT(_debug, OID_AUTO, debugger_on_trap, - CTLFLAG_RWTUN | CTLFLAG_SECURE, - &debugger_on_trap, 0, "Run debugger on kernel trap before panic"); + CTLFLAG_RWTUN, &debugger_on_trap, 0, + "Run debugger on kernel trap before panic"); #ifdef KDB_TRACE static int trace_on_panic = 1; static bool trace_all_panics = true; #else static int trace_on_panic = 0; static bool trace_all_panics = false; #endif SYSCTL_INT(_debug, OID_AUTO, trace_on_panic, CTLFLAG_RWTUN | CTLFLAG_SECURE, &trace_on_panic, 0, "Print stack trace on kernel panic"); SYSCTL_BOOL(_debug, OID_AUTO, trace_all_panics, CTLFLAG_RWTUN, &trace_all_panics, 0, "Print stack traces on secondary kernel panics"); #endif /* KDB */ static int sync_on_panic = 0; SYSCTL_INT(_kern, OID_AUTO, sync_on_panic, CTLFLAG_RWTUN, &sync_on_panic, 0, "Do a sync before rebooting from a panic"); static bool poweroff_on_panic = 0; SYSCTL_BOOL(_kern, OID_AUTO, poweroff_on_panic, CTLFLAG_RWTUN, &poweroff_on_panic, 0, "Do a power off instead of a reboot on a panic"); static bool powercycle_on_panic = 0; SYSCTL_BOOL(_kern, OID_AUTO, powercycle_on_panic, CTLFLAG_RWTUN, &powercycle_on_panic, 0, "Do a power cycle instead of a reboot on a panic"); static SYSCTL_NODE(_kern, OID_AUTO, shutdown, CTLFLAG_RW | CTLFLAG_MPSAFE, 0, "Shutdown environment"); #ifndef DIAGNOSTIC static int show_busybufs; #else static int show_busybufs = 1; #endif SYSCTL_INT(_kern_shutdown, OID_AUTO, show_busybufs, CTLFLAG_RW, &show_busybufs, 0, "Show busy buffers during shutdown"); int suspend_blocked = 0; SYSCTL_INT(_kern, OID_AUTO, suspend_blocked, 
CTLFLAG_RW, &suspend_blocked, 0, "Block suspend due to a pending shutdown"); #ifdef EKCD FEATURE(ekcd, "Encrypted kernel crash dumps support"); MALLOC_DEFINE(M_EKCD, "ekcd", "Encrypted kernel crash dumps data"); struct kerneldumpcrypto { uint8_t kdc_encryption; uint8_t kdc_iv[KERNELDUMP_IV_MAX_SIZE]; union { struct { keyInstance aes_ki; cipherInstance aes_ci; } u_aes; struct chacha_ctx u_chacha; } u; #define kdc_ki u.u_aes.aes_ki #define kdc_ci u.u_aes.aes_ci #define kdc_chacha u.u_chacha uint32_t kdc_dumpkeysize; struct kerneldumpkey kdc_dumpkey[]; }; #endif struct kerneldumpcomp { uint8_t kdc_format; struct compressor *kdc_stream; uint8_t *kdc_buf; size_t kdc_resid; }; static struct kerneldumpcomp *kerneldumpcomp_create(struct dumperinfo *di, uint8_t compression); static void kerneldumpcomp_destroy(struct dumperinfo *di); static int kerneldumpcomp_write_cb(void *base, size_t len, off_t off, void *arg); static int kerneldump_gzlevel = 6; SYSCTL_INT(_kern, OID_AUTO, kerneldump_gzlevel, CTLFLAG_RWTUN, &kerneldump_gzlevel, 0, "Kernel crash dump compression level"); /* * Variable panicstr contains argument to first call to panic; used as flag * to indicate that the kernel has already called panic. */ const char *panicstr; bool __read_frequently panicked; int __read_mostly dumping; /* system is dumping */ int rebooting; /* system is rebooting */ /* * Used to serialize between sysctl kern.shutdown.dumpdevname and list * modifications via ioctl. */ static struct mtx dumpconf_list_lk; MTX_SYSINIT(dumper_configs, &dumpconf_list_lk, "dumper config list", MTX_DEF); /* Our selected dumper(s). */ static TAILQ_HEAD(dumpconflist, dumperinfo) dumper_configs = TAILQ_HEAD_INITIALIZER(dumper_configs); /* Context information for dump-debuggers, saved by the dump_savectx() macro. */ struct pcb dumppcb; /* Registers. */ lwpid_t dumptid; /* Thread ID. */ static struct cdevsw reroot_cdevsw = { .d_version = D_VERSION, .d_name = "reroot", }; static void poweroff_wait(void *, int); static void shutdown_halt(void *junk, int howto); static void shutdown_panic(void *junk, int howto); static void shutdown_reset(void *junk, int howto); static int kern_reroot(void); /* register various local shutdown events */ static void shutdown_conf(void *unused) { EVENTHANDLER_REGISTER(shutdown_final, poweroff_wait, NULL, SHUTDOWN_PRI_FIRST); EVENTHANDLER_REGISTER(shutdown_final, shutdown_halt, NULL, SHUTDOWN_PRI_LAST + 100); EVENTHANDLER_REGISTER(shutdown_final, shutdown_panic, NULL, SHUTDOWN_PRI_LAST + 100); } SYSINIT(shutdown_conf, SI_SUB_INTRINSIC, SI_ORDER_ANY, shutdown_conf, NULL); /* * The only reason this exists is to create the /dev/reroot/ directory, * used by reroot code in init(8) as a mountpoint for tmpfs. */ static void reroot_conf(void *unused) { int error; struct cdev *cdev; error = make_dev_p(MAKEDEV_CHECKNAME | MAKEDEV_WAITOK, &cdev, &reroot_cdevsw, NULL, UID_ROOT, GID_WHEEL, 0600, "reroot/reroot"); if (error != 0) { printf("%s: failed to create device node, error %d", __func__, error); } } SYSINIT(reroot_conf, SI_SUB_DEVFS, SI_ORDER_ANY, reroot_conf, NULL); /* * The system call that results in a reboot. 
*/ /* ARGSUSED */ int sys_reboot(struct thread *td, struct reboot_args *uap) { int error; error = 0; #ifdef MAC error = mac_system_check_reboot(td->td_ucred, uap->opt); #endif if (error == 0) error = priv_check(td, PRIV_REBOOT); if (error == 0) { if (uap->opt & RB_REROOT) error = kern_reroot(); else kern_reboot(uap->opt); } return (error); } static void shutdown_nice_task_fn(void *arg, int pending __unused) { int howto; howto = (uintptr_t)arg; /* Send a signal to init(8) and have it shutdown the world. */ PROC_LOCK(initproc); if ((howto & RB_POWEROFF) != 0) { BOOTTRACE("SIGUSR2 to init(8)"); kern_psignal(initproc, SIGUSR2); } else if ((howto & RB_POWERCYCLE) != 0) { BOOTTRACE("SIGWINCH to init(8)"); kern_psignal(initproc, SIGWINCH); } else if ((howto & RB_HALT) != 0) { BOOTTRACE("SIGUSR1 to init(8)"); kern_psignal(initproc, SIGUSR1); } else { BOOTTRACE("SIGINT to init(8)"); kern_psignal(initproc, SIGINT); } PROC_UNLOCK(initproc); } static struct task shutdown_nice_task = TASK_INITIALIZER(0, &shutdown_nice_task_fn, NULL); /* * Called by events that want to shut down.. e.g on a PC */ void shutdown_nice(int howto) { if (initproc != NULL && !SCHEDULER_STOPPED()) { BOOTTRACE("shutdown initiated"); shutdown_nice_task.ta_context = (void *)(uintptr_t)howto; taskqueue_enqueue(taskqueue_fast, &shutdown_nice_task); } else { /* * No init(8) running, or scheduler would not allow it * to run, so simply reboot. */ kern_reboot(howto | RB_NOSYNC); } } static void print_uptime(void) { int f; struct timespec ts; getnanouptime(&ts); printf("Uptime: "); f = 0; if (ts.tv_sec >= 86400) { printf("%ldd", (long)ts.tv_sec / 86400); ts.tv_sec %= 86400; f = 1; } if (f || ts.tv_sec >= 3600) { printf("%ldh", (long)ts.tv_sec / 3600); ts.tv_sec %= 3600; f = 1; } if (f || ts.tv_sec >= 60) { printf("%ldm", (long)ts.tv_sec / 60); ts.tv_sec %= 60; f = 1; } printf("%lds\n", (long)ts.tv_sec); } int doadump(boolean_t textdump) { boolean_t coredump; int error; error = 0; if (dumping) return (EBUSY); if (TAILQ_EMPTY(&dumper_configs)) return (ENXIO); dump_savectx(); dumping++; coredump = TRUE; #ifdef DDB if (textdump && textdump_pending) { coredump = FALSE; textdump_dumpsys(TAILQ_FIRST(&dumper_configs)); } #endif if (coredump) { struct dumperinfo *di; TAILQ_FOREACH(di, &dumper_configs, di_next) { error = dumpsys(di); if (error == 0) break; } } dumping--; return (error); } /* * Trace the shutdown reason. */ static void reboottrace(int howto) { if ((howto & RB_DUMP) != 0) { if ((howto & RB_HALT) != 0) BOOTTRACE("system panic: halting..."); if ((howto & RB_POWEROFF) != 0) BOOTTRACE("system panic: powering off..."); if ((howto & (RB_HALT|RB_POWEROFF)) == 0) BOOTTRACE("system panic: rebooting..."); } else { if ((howto & RB_HALT) != 0) BOOTTRACE("system halting..."); if ((howto & RB_POWEROFF) != 0) BOOTTRACE("system powering off..."); if ((howto & (RB_HALT|RB_POWEROFF)) == 0) BOOTTRACE("system rebooting..."); } } /* * kern_reboot(9): Shut down the system cleanly to prepare for reboot, halt, or * power off. */ void kern_reboot(int howto) { static int once = 0; if (initproc != NULL && curproc != initproc) BOOTTRACE("kernel shutdown (dirty) started"); else BOOTTRACE("kernel shutdown (clean) started"); /* * Normal paths here don't hold Giant, but we can wind up here * unexpectedly with it held. Drop it now so we don't have to * drop and pick it up elsewhere. The paths it is locking will * never be returned to, and it is preferable to preclude * deadlock than to lock against code that won't ever * continue. 
*/ while (mtx_owned(&Giant)) mtx_unlock(&Giant); #if defined(SMP) /* * Bind us to the first CPU so that all shutdown code runs there. Some * systems don't shutdown properly (i.e., ACPI power off) if we * run on another processor. */ if (!SCHEDULER_STOPPED()) { thread_lock(curthread); sched_bind(curthread, CPU_FIRST()); thread_unlock(curthread); KASSERT(PCPU_GET(cpuid) == CPU_FIRST(), ("%s: not running on cpu 0", __func__)); } #endif /* We're in the process of rebooting. */ rebooting = 1; reboottrace(howto); /* We are out of the debugger now. */ kdb_active = 0; /* * Do any callouts that should be done BEFORE syncing the filesystems. */ EVENTHANDLER_INVOKE(shutdown_pre_sync, howto); BOOTTRACE("shutdown pre sync complete"); /* * Now sync filesystems */ if (!cold && (howto & RB_NOSYNC) == 0 && once == 0) { once = 1; BOOTTRACE("bufshutdown begin"); bufshutdown(show_busybufs); BOOTTRACE("bufshutdown end"); } print_uptime(); cngrab(); /* * Ok, now do things that assume all filesystem activity has * been completed. */ EVENTHANDLER_INVOKE(shutdown_post_sync, howto); BOOTTRACE("shutdown post sync complete"); if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping) doadump(TRUE); /* Now that we're going to really halt the system... */ BOOTTRACE("shutdown final begin"); if (shutdown_trace) boottrace_dump_console(); EVENTHANDLER_INVOKE(shutdown_final, howto); /* * Call this directly so that reset is attempted even if shutdown * handlers are not yet registered. */ shutdown_reset(NULL, howto); for(;;) ; /* safety against shutdown_reset not working */ /* NOTREACHED */ } /* * The system call that results in changing the rootfs. */ static int kern_reroot(void) { struct vnode *oldrootvnode, *vp; struct mount *mp, *devmp; int error; if (curproc != initproc) return (EPERM); /* * Mark the filesystem containing currently-running executable * (the temporary copy of init(8)) busy. */ vp = curproc->p_textvp; error = vn_lock(vp, LK_SHARED); if (error != 0) return (error); mp = vp->v_mount; error = vfs_busy(mp, MBF_NOWAIT); if (error != 0) { vfs_ref(mp); VOP_UNLOCK(vp); error = vfs_busy(mp, 0); vn_lock(vp, LK_SHARED | LK_RETRY); vfs_rel(mp); if (error != 0) { VOP_UNLOCK(vp); return (ENOENT); } if (VN_IS_DOOMED(vp)) { VOP_UNLOCK(vp); vfs_unbusy(mp); return (ENOENT); } } VOP_UNLOCK(vp); /* * Remove the filesystem containing currently-running executable * from the mount list, to prevent it from being unmounted * by vfs_unmountall(), and to avoid confusing vfs_mountroot(). * * Also preserve /dev - forcibly unmounting it could cause driver * reinitialization. */ vfs_ref(rootdevmp); devmp = rootdevmp; rootdevmp = NULL; mtx_lock(&mountlist_mtx); TAILQ_REMOVE(&mountlist, mp, mnt_list); TAILQ_REMOVE(&mountlist, devmp, mnt_list); mtx_unlock(&mountlist_mtx); oldrootvnode = rootvnode; /* * Unmount everything except for the two filesystems preserved above. */ vfs_unmountall(); /* * Add /dev back; vfs_mountroot() will move it into its new place. */ mtx_lock(&mountlist_mtx); TAILQ_INSERT_HEAD(&mountlist, devmp, mnt_list); mtx_unlock(&mountlist_mtx); rootdevmp = devmp; vfs_rel(rootdevmp); /* * Mount the new rootfs. */ vfs_mountroot(); /* * Update all references to the old rootvnode. */ mountcheckdirs(oldrootvnode, rootvnode); /* * Add the temporary filesystem back and unbusy it. */ mtx_lock(&mountlist_mtx); TAILQ_INSERT_TAIL(&mountlist, mp, mnt_list); mtx_unlock(&mountlist_mtx); vfs_unbusy(mp); return (0); } /* * If the shutdown was a clean halt, behave accordingly. 
*/ static void shutdown_halt(void *junk, int howto) { if (howto & RB_HALT) { printf("\n"); printf("The operating system has halted.\n"); printf("Please press any key to reboot.\n\n"); wdog_kern_pat(WD_TO_NEVER); switch (cngetc()) { case -1: /* No console, just die */ cpu_halt(); /* NOTREACHED */ default: break; } } } /* * Check to see if the system panicked, pause and then reboot * according to the specified delay. */ static void shutdown_panic(void *junk, int howto) { int loop; if (howto & RB_DUMP) { if (panic_reboot_wait_time != 0) { if (panic_reboot_wait_time != -1) { printf("Automatic reboot in %d seconds - " "press a key on the console to abort\n", panic_reboot_wait_time); for (loop = panic_reboot_wait_time * 10; loop > 0; --loop) { DELAY(1000 * 100); /* 1/10th second */ /* Did user type a key? */ if (cncheckc() != -1) break; } if (!loop) return; } } else { /* zero time specified - reboot NOW */ return; } printf("--> Press a key on the console to reboot,\n"); printf("--> or switch off the system now.\n"); cngetc(); } } /* * Everything done, now reset */ static void shutdown_reset(void *junk, int howto) { printf("Rebooting...\n"); DELAY(reboot_wait_time * 1000000); /* * Acquiring smp_ipi_mtx here has a double effect: * - it disables interrupts avoiding CPU0 preemption * by fast handlers (thus deadlocking against other CPUs) * - it avoids deadlocks against smp_rendezvous() or, more * generally, threads busy-waiting, with this spinlock held, * and waiting for responses by threads on other CPUs * (ie. smp_tlb_shootdown()). * * For the !SMP case it just needs to handle the former problem. */ #ifdef SMP mtx_lock_spin(&smp_ipi_mtx); #else spinlock_enter(); #endif cpu_reset(); /* NOTREACHED */ /* assuming reset worked */ } #if defined(WITNESS) || defined(INVARIANT_SUPPORT) static int kassert_warn_only = 0; #ifdef KDB static int kassert_do_kdb = 0; #endif #ifdef KTR static int kassert_do_ktr = 0; #endif static int kassert_do_log = 1; static int kassert_log_pps_limit = 4; static int kassert_log_mute_at = 0; static int kassert_log_panic_at = 0; static int kassert_suppress_in_panic = 0; static int kassert_warnings = 0; SYSCTL_NODE(_debug, OID_AUTO, kassert, CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, "kassert options"); #ifdef KASSERT_PANIC_OPTIONAL #define KASSERT_RWTUN CTLFLAG_RWTUN #else #define KASSERT_RWTUN CTLFLAG_RDTUN #endif SYSCTL_INT(_debug_kassert, OID_AUTO, warn_only, KASSERT_RWTUN, &kassert_warn_only, 0, "KASSERT triggers a panic (0) or just a warning (1)"); #ifdef KDB SYSCTL_INT(_debug_kassert, OID_AUTO, do_kdb, KASSERT_RWTUN, &kassert_do_kdb, 0, "KASSERT will enter the debugger"); #endif #ifdef KTR SYSCTL_UINT(_debug_kassert, OID_AUTO, do_ktr, KASSERT_RWTUN, &kassert_do_ktr, 0, "KASSERT does a KTR, set this to the KTRMASK you want"); #endif SYSCTL_INT(_debug_kassert, OID_AUTO, do_log, KASSERT_RWTUN, &kassert_do_log, 0, "If warn_only is enabled, log (1) or do not log (0) assertion violations"); SYSCTL_INT(_debug_kassert, OID_AUTO, warnings, CTLFLAG_RD | CTLFLAG_STATS, &kassert_warnings, 0, "number of KASSERTs that have been triggered"); SYSCTL_INT(_debug_kassert, OID_AUTO, log_panic_at, KASSERT_RWTUN, &kassert_log_panic_at, 0, "max number of KASSERTS before we will panic"); SYSCTL_INT(_debug_kassert, OID_AUTO, log_pps_limit, KASSERT_RWTUN, &kassert_log_pps_limit, 0, "limit number of log messages per second"); SYSCTL_INT(_debug_kassert, OID_AUTO, log_mute_at, KASSERT_RWTUN, &kassert_log_mute_at, 0, "max number of KASSERTS to log"); SYSCTL_INT(_debug_kassert, OID_AUTO, suppress_in_panic, 
KASSERT_RWTUN, &kassert_suppress_in_panic, 0, "KASSERTs will be suppressed while handling a panic"); #undef KASSERT_RWTUN static int kassert_sysctl_kassert(SYSCTL_HANDLER_ARGS); SYSCTL_PROC(_debug_kassert, OID_AUTO, kassert, CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_SECURE | CTLFLAG_MPSAFE, NULL, 0, kassert_sysctl_kassert, "I", "set to trigger a test kassert"); static int kassert_sysctl_kassert(SYSCTL_HANDLER_ARGS) { int error, i; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error == 0) { i = 0; error = sysctl_handle_int(oidp, &i, 0, req); } if (error != 0 || req->newptr == NULL) return (error); KASSERT(0, ("kassert_sysctl_kassert triggered kassert %d", i)); return (0); } #ifdef KASSERT_PANIC_OPTIONAL /* * Called by KASSERT, this decides if we will panic * or if we will log via printf and/or ktr. */ void kassert_panic(const char *fmt, ...) { static char buf[256]; va_list ap; va_start(ap, fmt); (void)vsnprintf(buf, sizeof(buf), fmt, ap); va_end(ap); /* * If we are suppressing secondary panics, log the warning but do not * re-enter panic/kdb. */ if (KERNEL_PANICKED() && kassert_suppress_in_panic) { if (kassert_do_log) { printf("KASSERT failed: %s\n", buf); #ifdef KDB if (trace_all_panics && trace_on_panic) kdb_backtrace(); #endif } return; } /* * panic if we're not just warning, or if we've exceeded * kassert_log_panic_at warnings. */ if (!kassert_warn_only || (kassert_log_panic_at > 0 && kassert_warnings >= kassert_log_panic_at)) { va_start(ap, fmt); vpanic(fmt, ap); /* NORETURN */ } #ifdef KTR if (kassert_do_ktr) CTR0(ktr_mask, buf); #endif /* KTR */ /* * log if we've not yet met the mute limit. */ if (kassert_do_log && (kassert_log_mute_at == 0 || kassert_warnings < kassert_log_mute_at)) { static struct timeval lasterr; static int curerr; if (ppsratecheck(&lasterr, &curerr, kassert_log_pps_limit)) { printf("KASSERT failed: %s\n", buf); kdb_backtrace(); } } #ifdef KDB if (kassert_do_kdb) { kdb_enter(KDB_WHY_KASSERT, buf); } #endif atomic_add_int(&kassert_warnings, 1); } #endif /* KASSERT_PANIC_OPTIONAL */ #endif /* * Panic is called on unresolvable fatal errors. It prints "panic: mesg", * and then reboots. If we are called twice, then we avoid trying to sync * the disks as this often leads to recursive panics. */ void panic(const char *fmt, ...) { va_list ap; va_start(ap, fmt); vpanic(fmt, ap); } void vpanic(const char *fmt, va_list ap) { #ifdef SMP cpuset_t other_cpus; #endif struct thread *td = curthread; int bootopt, newpanic; static char buf[256]; spinlock_enter(); #ifdef SMP /* * stop_cpus_hard(other_cpus) should prevent multiple CPUs from * concurrently entering panic. Only the winner will proceed * further. */ if (panicstr == NULL && !kdb_active) { other_cpus = all_cpus; CPU_CLR(PCPU_GET(cpuid), &other_cpus); stop_cpus_hard(other_cpus); } #endif /* * Ensure that the scheduler is stopped while panicking, even if panic * has been entered from kdb. 
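*
* (td_stopsched is what SCHEDULER_STOPPED() tests for the current
* thread; once it is set, blocking lock operations degrade to no-ops,
* so the dump and reboot paths cannot deadlock on locks still held by
* the stopped CPUs.)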
*/ td->td_stopsched = 1; bootopt = RB_AUTOBOOT; newpanic = 0; if (KERNEL_PANICKED()) bootopt |= RB_NOSYNC; else { bootopt |= RB_DUMP; panicstr = fmt; panicked = true; newpanic = 1; } if (newpanic) { (void)vsnprintf(buf, sizeof(buf), fmt, ap); panicstr = buf; cngrab(); printf("panic: %s\n", buf); } else { printf("panic: "); vprintf(fmt, ap); printf("\n"); } #ifdef SMP printf("cpuid = %d\n", PCPU_GET(cpuid)); #endif printf("time = %jd\n", (intmax_t )time_second); #ifdef KDB if ((newpanic || trace_all_panics) && trace_on_panic) kdb_backtrace(); if (debugger_on_panic) kdb_enter(KDB_WHY_PANIC, "panic"); else if (!newpanic && debugger_on_recursive_panic) kdb_enter(KDB_WHY_PANIC, "re-panic"); #endif /*thread_lock(td); */ td->td_flags |= TDF_INPANIC; /* thread_unlock(td); */ if (!sync_on_panic) bootopt |= RB_NOSYNC; if (poweroff_on_panic) bootopt |= RB_POWEROFF; if (powercycle_on_panic) bootopt |= RB_POWERCYCLE; kern_reboot(bootopt); } /* * Support for poweroff delay. * * Please note that setting this delay too short might power off your machine * before the write cache on your hard disk has been flushed, leading to * soft-updates inconsistencies. */ #ifndef POWEROFF_DELAY # define POWEROFF_DELAY 5000 #endif static int poweroff_delay = POWEROFF_DELAY; SYSCTL_INT(_kern_shutdown, OID_AUTO, poweroff_delay, CTLFLAG_RW, &poweroff_delay, 0, "Delay before poweroff to write disk caches (msec)"); static void poweroff_wait(void *junk, int howto) { if ((howto & (RB_POWEROFF | RB_POWERCYCLE)) == 0 || poweroff_delay <= 0) return; DELAY(poweroff_delay * 1000); } /* * Some system processes (e.g. syncer) need to be stopped at appropriate * points in their main loops prior to a system shutdown, so that they * won't interfere with the shutdown process (e.g. by holding a disk buf * to cause sync to fail). For each of these system processes, register * shutdown_kproc() as a handler for one of shutdown events. */ static int kproc_shutdown_wait = 60; SYSCTL_INT(_kern_shutdown, OID_AUTO, kproc_shutdown_wait, CTLFLAG_RW, &kproc_shutdown_wait, 0, "Max wait time (sec) to stop for each process"); void kproc_shutdown(void *arg, int howto) { struct proc *p; int error; if (KERNEL_PANICKED()) return; p = (struct proc *)arg; printf("Waiting (max %d seconds) for system process `%s' to stop... ", kproc_shutdown_wait, p->p_comm); error = kproc_suspend(p, kproc_shutdown_wait * hz); if (error == EWOULDBLOCK) printf("timed out\n"); else printf("done\n"); } void kthread_shutdown(void *arg, int howto) { struct thread *td; int error; if (KERNEL_PANICKED()) return; td = (struct thread *)arg; printf("Waiting (max %d seconds) for system thread `%s' to stop... 
", kproc_shutdown_wait, td->td_name); error = kthread_suspend(td, kproc_shutdown_wait * hz); if (error == EWOULDBLOCK) printf("timed out\n"); else printf("done\n"); } static int dumpdevname_sysctl_handler(SYSCTL_HANDLER_ARGS) { char buf[256]; struct dumperinfo *di; struct sbuf sb; int error; error = sysctl_wire_old_buffer(req, 0); if (error != 0) return (error); sbuf_new_for_sysctl(&sb, buf, sizeof(buf), req); mtx_lock(&dumpconf_list_lk); TAILQ_FOREACH(di, &dumper_configs, di_next) { if (di != TAILQ_FIRST(&dumper_configs)) sbuf_putc(&sb, ','); sbuf_cat(&sb, di->di_devname); } mtx_unlock(&dumpconf_list_lk); error = sbuf_finish(&sb); sbuf_delete(&sb); return (error); } SYSCTL_PROC(_kern_shutdown, OID_AUTO, dumpdevname, CTLTYPE_STRING | CTLFLAG_RD | CTLFLAG_MPSAFE, &dumper_configs, 0, dumpdevname_sysctl_handler, "A", "Device(s) for kernel dumps"); static int _dump_append(struct dumperinfo *di, void *virtual, size_t length); #ifdef EKCD static struct kerneldumpcrypto * kerneldumpcrypto_create(size_t blocksize, uint8_t encryption, const uint8_t *key, uint32_t encryptedkeysize, const uint8_t *encryptedkey) { struct kerneldumpcrypto *kdc; struct kerneldumpkey *kdk; uint32_t dumpkeysize; dumpkeysize = roundup2(sizeof(*kdk) + encryptedkeysize, blocksize); kdc = malloc(sizeof(*kdc) + dumpkeysize, M_EKCD, M_WAITOK | M_ZERO); arc4rand(kdc->kdc_iv, sizeof(kdc->kdc_iv), 0); kdc->kdc_encryption = encryption; switch (kdc->kdc_encryption) { case KERNELDUMP_ENC_AES_256_CBC: if (rijndael_makeKey(&kdc->kdc_ki, DIR_ENCRYPT, 256, key) <= 0) goto failed; break; case KERNELDUMP_ENC_CHACHA20: chacha_keysetup(&kdc->kdc_chacha, key, 256); break; default: goto failed; } kdc->kdc_dumpkeysize = dumpkeysize; kdk = kdc->kdc_dumpkey; kdk->kdk_encryption = kdc->kdc_encryption; memcpy(kdk->kdk_iv, kdc->kdc_iv, sizeof(kdk->kdk_iv)); kdk->kdk_encryptedkeysize = htod32(encryptedkeysize); memcpy(kdk->kdk_encryptedkey, encryptedkey, encryptedkeysize); return (kdc); failed: zfree(kdc, M_EKCD); return (NULL); } static int kerneldumpcrypto_init(struct kerneldumpcrypto *kdc) { uint8_t hash[SHA256_DIGEST_LENGTH]; SHA256_CTX ctx; struct kerneldumpkey *kdk; int error; error = 0; if (kdc == NULL) return (0); /* * When a user enters ddb it can write a crash dump multiple times. * Each time it should be encrypted using a different IV. 
*/ SHA256_Init(&ctx); SHA256_Update(&ctx, kdc->kdc_iv, sizeof(kdc->kdc_iv)); SHA256_Final(hash, &ctx); bcopy(hash, kdc->kdc_iv, sizeof(kdc->kdc_iv)); switch (kdc->kdc_encryption) { case KERNELDUMP_ENC_AES_256_CBC: if (rijndael_cipherInit(&kdc->kdc_ci, MODE_CBC, kdc->kdc_iv) <= 0) { error = EINVAL; goto out; } break; case KERNELDUMP_ENC_CHACHA20: chacha_ivsetup(&kdc->kdc_chacha, kdc->kdc_iv, NULL); break; default: error = EINVAL; goto out; } kdk = kdc->kdc_dumpkey; memcpy(kdk->kdk_iv, kdc->kdc_iv, sizeof(kdk->kdk_iv)); out: explicit_bzero(hash, sizeof(hash)); return (error); } static uint32_t kerneldumpcrypto_dumpkeysize(const struct kerneldumpcrypto *kdc) { if (kdc == NULL) return (0); return (kdc->kdc_dumpkeysize); } #endif /* EKCD */ static struct kerneldumpcomp * kerneldumpcomp_create(struct dumperinfo *di, uint8_t compression) { struct kerneldumpcomp *kdcomp; int format; switch (compression) { case KERNELDUMP_COMP_GZIP: format = COMPRESS_GZIP; break; case KERNELDUMP_COMP_ZSTD: format = COMPRESS_ZSTD; break; default: return (NULL); } kdcomp = malloc(sizeof(*kdcomp), M_DUMPER, M_WAITOK | M_ZERO); kdcomp->kdc_format = compression; kdcomp->kdc_stream = compressor_init(kerneldumpcomp_write_cb, format, di->maxiosize, kerneldump_gzlevel, di); if (kdcomp->kdc_stream == NULL) { free(kdcomp, M_DUMPER); return (NULL); } kdcomp->kdc_buf = malloc(di->maxiosize, M_DUMPER, M_WAITOK | M_NODUMP); return (kdcomp); } static void kerneldumpcomp_destroy(struct dumperinfo *di) { struct kerneldumpcomp *kdcomp; kdcomp = di->kdcomp; if (kdcomp == NULL) return; compressor_fini(kdcomp->kdc_stream); zfree(kdcomp->kdc_buf, M_DUMPER); free(kdcomp, M_DUMPER); } /* * Free a dumper. Must not be present on global list. */ void dumper_destroy(struct dumperinfo *di) { if (di == NULL) return; zfree(di->blockbuf, M_DUMPER); kerneldumpcomp_destroy(di); #ifdef EKCD zfree(di->kdcrypto, M_EKCD); #endif zfree(di, M_DUMPER); } /* * Allocate and set up a new dumper from the provided template. */ int dumper_create(const struct dumperinfo *di_template, const char *devname, const struct diocskerneldump_arg *kda, struct dumperinfo **dip) { struct dumperinfo *newdi; int error = 0; if (dip == NULL) return (EINVAL); /* Allocate a new dumper */ newdi = malloc(sizeof(*newdi) + strlen(devname) + 1, M_DUMPER, M_WAITOK | M_ZERO); memcpy(newdi, di_template, sizeof(*newdi)); newdi->blockbuf = NULL; newdi->kdcrypto = NULL; newdi->kdcomp = NULL; strcpy(newdi->di_devname, devname); if (kda->kda_encryption != KERNELDUMP_ENC_NONE) { #ifdef EKCD newdi->kdcrypto = kerneldumpcrypto_create(newdi->blocksize, kda->kda_encryption, kda->kda_key, kda->kda_encryptedkeysize, kda->kda_encryptedkey); if (newdi->kdcrypto == NULL) { error = EINVAL; goto cleanup; } #else error = EOPNOTSUPP; goto cleanup; #endif } if (kda->kda_compression != KERNELDUMP_COMP_NONE) { #ifdef EKCD /* * We can't support simultaneous unpadded block cipher * encryption and compression because there is no guarantee the * length of the compressed result is exactly a multiple of the * cipher block size. */ if (kda->kda_encryption == KERNELDUMP_ENC_AES_256_CBC) { error = EOPNOTSUPP; goto cleanup; } #endif newdi->kdcomp = kerneldumpcomp_create(newdi, kda->kda_compression); if (newdi->kdcomp == NULL) { error = EINVAL; goto cleanup; } } newdi->blockbuf = malloc(newdi->blocksize, M_DUMPER, M_WAITOK | M_ZERO); *dip = newdi; return (0); cleanup: dumper_destroy(newdi); return (error); } /* * Create a new dumper and register it in the global list. 
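*
* (In practice this is reached via the DIOCSKERNELDUMP ioctl, e.g. as
* issued by dumpon(8); kda_index selects where in the fallback list the
* new configuration lands, as the insertion loop below shows.)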
*/ int dumper_insert(const struct dumperinfo *di_template, const char *devname, const struct diocskerneldump_arg *kda) { struct dumperinfo *newdi, *listdi; bool inserted; uint8_t index; int error; index = kda->kda_index; MPASS(index != KDA_REMOVE && index != KDA_REMOVE_DEV && index != KDA_REMOVE_ALL); error = priv_check(curthread, PRIV_SETDUMPER); if (error != 0) return (error); error = dumper_create(di_template, devname, kda, &newdi); if (error != 0) return (error); /* Add the new configuration to the queue */ mtx_lock(&dumpconf_list_lk); inserted = false; TAILQ_FOREACH(listdi, &dumper_configs, di_next) { if (index == 0) { TAILQ_INSERT_BEFORE(listdi, newdi, di_next); inserted = true; break; } index--; } if (!inserted) TAILQ_INSERT_TAIL(&dumper_configs, newdi, di_next); mtx_unlock(&dumpconf_list_lk); return (0); } #ifdef DDB void dumper_ddb_insert(struct dumperinfo *newdi) { TAILQ_INSERT_HEAD(&dumper_configs, newdi, di_next); } void dumper_ddb_remove(struct dumperinfo *di) { TAILQ_REMOVE(&dumper_configs, di, di_next); } #endif static bool dumper_config_match(const struct dumperinfo *di, const char *devname, const struct diocskerneldump_arg *kda) { if (kda->kda_index == KDA_REMOVE_ALL) return (true); if (strcmp(di->di_devname, devname) != 0) return (false); /* * Allow wildcard removal of configs matching a device on g_dev_orphan. */ if (kda->kda_index == KDA_REMOVE_DEV) return (true); if (di->kdcomp != NULL) { if (di->kdcomp->kdc_format != kda->kda_compression) return (false); } else if (kda->kda_compression != KERNELDUMP_COMP_NONE) return (false); #ifdef EKCD if (di->kdcrypto != NULL) { if (di->kdcrypto->kdc_encryption != kda->kda_encryption) return (false); /* * Do we care to verify keys match to delete? It seems weird * to expect multiple fallback dump configurations on the same * device that only differ in crypto key. */ } else #endif if (kda->kda_encryption != KERNELDUMP_ENC_NONE) return (false); return (true); } /* * Remove and free the requested dumper(s) from the global list. */ int dumper_remove(const char *devname, const struct diocskerneldump_arg *kda) { struct dumperinfo *di, *sdi; bool found; int error; error = priv_check(curthread, PRIV_SETDUMPER); if (error != 0) return (error); /* * Try to find a matching configuration, and kill it. * * NULL 'kda' indicates remove any configuration matching 'devname', * which may remove multiple configurations in atypical configurations. */ found = false; mtx_lock(&dumpconf_list_lk); TAILQ_FOREACH_SAFE(di, &dumper_configs, di_next, sdi) { if (dumper_config_match(di, devname, kda)) { found = true; TAILQ_REMOVE(&dumper_configs, di, di_next); dumper_destroy(di); } } mtx_unlock(&dumpconf_list_lk); /* Only produce ENOENT if a more targeted match didn't match. 
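* The wildcard forms (KDA_REMOVE_DEV, KDA_REMOVE_ALL) succeed even when
* nothing matched; only an exact KDA_REMOVE reports the missing entry.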
*/ if (!found && kda->kda_index == KDA_REMOVE) return (ENOENT); return (0); } static int dump_check_bounds(struct dumperinfo *di, off_t offset, size_t length) { if (di->mediasize > 0 && length != 0 && (offset < di->mediaoffset || offset - di->mediaoffset + length > di->mediasize)) { if (di->kdcomp != NULL && offset >= di->mediaoffset) { printf( "Compressed dump failed to fit in device boundaries.\n"); return (E2BIG); } printf("Attempt to write outside dump device boundaries.\n" "offset(%jd), mediaoffset(%jd), length(%ju), mediasize(%jd).\n", (intmax_t)offset, (intmax_t)di->mediaoffset, (uintmax_t)length, (intmax_t)di->mediasize); return (ENOSPC); } if (length % di->blocksize != 0) { printf("Attempt to write partial block of length %ju.\n", (uintmax_t)length); return (EINVAL); } if (offset % di->blocksize != 0) { printf("Attempt to write at unaligned offset %jd.\n", (intmax_t)offset); return (EINVAL); } return (0); } #ifdef EKCD static int dump_encrypt(struct kerneldumpcrypto *kdc, uint8_t *buf, size_t size) { switch (kdc->kdc_encryption) { case KERNELDUMP_ENC_AES_256_CBC: if (rijndael_blockEncrypt(&kdc->kdc_ci, &kdc->kdc_ki, buf, 8 * size, buf) <= 0) { return (EIO); } if (rijndael_cipherInit(&kdc->kdc_ci, MODE_CBC, buf + size - 16 /* IV size for AES-256-CBC */) <= 0) { return (EIO); } break; case KERNELDUMP_ENC_CHACHA20: chacha_encrypt_bytes(&kdc->kdc_chacha, buf, buf, size); break; default: return (EINVAL); } return (0); } /* Encrypt data and call dumper. */ static int dump_encrypted_write(struct dumperinfo *di, void *virtual, off_t offset, size_t length) { static uint8_t buf[KERNELDUMP_BUFFER_SIZE]; struct kerneldumpcrypto *kdc; int error; size_t nbytes; kdc = di->kdcrypto; while (length > 0) { nbytes = MIN(length, sizeof(buf)); bcopy(virtual, buf, nbytes); if (dump_encrypt(kdc, buf, nbytes) != 0) return (EIO); error = dump_write(di, buf, offset, nbytes); if (error != 0) return (error); offset += nbytes; virtual = (void *)((uint8_t *)virtual + nbytes); length -= nbytes; } return (0); } #endif /* EKCD */ static int kerneldumpcomp_write_cb(void *base, size_t length, off_t offset, void *arg) { struct dumperinfo *di; size_t resid, rlength; int error; di = arg; if (length % di->blocksize != 0) { /* * This must be the final write after flushing the compression * stream. Write as many full blocks as possible and stash the * residual data in the dumper's block buffer. It will be * padded and written in dump_finish(). */ rlength = rounddown(length, di->blocksize); if (rlength != 0) { error = _dump_append(di, base, rlength); if (error != 0) return (error); } resid = length - rlength; memmove(di->blockbuf, (uint8_t *)base + rlength, resid); bzero((uint8_t *)di->blockbuf + resid, di->blocksize - resid); di->kdcomp->kdc_resid = resid; return (EAGAIN); } return (_dump_append(di, base, length)); } /* * Write kernel dump headers at the beginning and end of the dump extent. * Write the kernel dump encryption key after the leading header if we were * configured to do so. */ static int dump_write_headers(struct dumperinfo *di, struct kerneldumpheader *kdh) { #ifdef EKCD struct kerneldumpcrypto *kdc; #endif void *buf; size_t hdrsz; uint64_t extent; uint32_t keysize; int error; hdrsz = sizeof(*kdh); if (hdrsz > di->blocksize) return (ENOMEM); #ifdef EKCD kdc = di->kdcrypto; keysize = kerneldumpcrypto_dumpkeysize(kdc); #else keysize = 0; #endif /* * If the dump device has special handling for headers, let it take care * of writing them out. 
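* (A transport such as netdump(4) is the expected provider of such a
* callback, since its headers do not follow the on-disk layout below.)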
*/ if (di->dumper_hdr != NULL) return (di->dumper_hdr(di, kdh)); if (hdrsz == di->blocksize) buf = kdh; else { buf = di->blockbuf; memset(buf, 0, di->blocksize); memcpy(buf, kdh, hdrsz); } extent = dtoh64(kdh->dumpextent); #ifdef EKCD if (kdc != NULL) { error = dump_write(di, kdc->kdc_dumpkey, di->mediaoffset + di->mediasize - di->blocksize - extent - keysize, keysize); if (error != 0) return (error); } #endif error = dump_write(di, buf, di->mediaoffset + di->mediasize - 2 * di->blocksize - extent - keysize, di->blocksize); if (error == 0) error = dump_write(di, buf, di->mediaoffset + di->mediasize - di->blocksize, di->blocksize); return (error); } /* * Don't touch the first SIZEOF_METADATA bytes on the dump device. This is to * protect us from metadata and metadata from us. */ #define SIZEOF_METADATA (64 * 1024) /* * Do some preliminary setup for a kernel dump: initialize state for encryption, * if requested, and make sure that we have enough space on the dump device. * * We set things up so that the dump ends before the last sector of the dump * device, at which the trailing header is written. * * +-----------+------+-----+----------------------------+------+ * | | lhdr | key | ... kernel dump ... | thdr | * +-----------+------+-----+----------------------------+------+ * 1 blk opt <------- dump extent --------> 1 blk * * Dumps written using dump_append() start at the beginning of the extent. * Uncompressed dumps will use the entire extent, but compressed dumps typically * will not. The true length of the dump is recorded in the leading and trailing * headers once the dump has been completed. * * The dump device may provide a callback, in which case it will initialize * dumpoff and take care of laying out the headers. */ int dump_start(struct dumperinfo *di, struct kerneldumpheader *kdh) { #ifdef EKCD struct kerneldumpcrypto *kdc; #endif void *key; uint64_t dumpextent, span; uint32_t keysize; int error; #ifdef EKCD /* Send the key before the dump so a partial dump is still usable. */ kdc = di->kdcrypto; error = kerneldumpcrypto_init(kdc); if (error != 0) return (error); keysize = kerneldumpcrypto_dumpkeysize(kdc); key = keysize > 0 ? kdc->kdc_dumpkey : NULL; #else error = 0; keysize = 0; key = NULL; #endif if (di->dumper_start != NULL) { error = di->dumper_start(di, key, keysize); } else { dumpextent = dtoh64(kdh->dumpextent); span = SIZEOF_METADATA + dumpextent + 2 * di->blocksize + keysize; if (di->mediasize < span) { if (di->kdcomp == NULL) return (E2BIG); /* * We don't yet know how much space the compressed dump * will occupy, so try to use the whole swap partition * (minus the first 64KB) in the hope that the * compressed dump will fit. If that doesn't turn out to * be enough, the bounds checking in dump_write() * will catch us and cause the dump to fail. */ dumpextent = di->mediasize - span + dumpextent; kdh->dumpextent = htod64(dumpextent); } /* * The offset at which to begin writing the dump. */ di->dumpoff = di->mediaoffset + di->mediasize - di->blocksize - dumpextent; } di->origdumpoff = di->dumpoff; return (error); } static int _dump_append(struct dumperinfo *di, void *virtual, size_t length) { int error; #ifdef EKCD if (di->kdcrypto != NULL) error = dump_encrypted_write(di, virtual, di->dumpoff, length); else #endif error = dump_write(di, virtual, di->dumpoff, length); if (error == 0) di->dumpoff += length; return (error); } /* * Write to the dump device starting at dumpoff. 
When compression is enabled, * writes to the device will be performed using a callback that gets invoked * when the compression stream's output buffer is full. */ int dump_append(struct dumperinfo *di, void *virtual, size_t length) { void *buf; if (di->kdcomp != NULL) { /* Bounce through a buffer to avoid CRC errors. */ if (length > di->maxiosize) return (EINVAL); buf = di->kdcomp->kdc_buf; memmove(buf, virtual, length); return (compressor_write(di->kdcomp->kdc_stream, buf, length)); } return (_dump_append(di, virtual, length)); } /* * Write to the dump device at the specified offset. */ int dump_write(struct dumperinfo *di, void *virtual, off_t offset, size_t length) { int error; error = dump_check_bounds(di, offset, length); if (error != 0) return (error); return (di->dumper(di->priv, virtual, offset, length)); } /* * Perform kernel dump finalization: flush the compression stream, if necessary, * write the leading and trailing kernel dump headers now that we know the true * length of the dump, and optionally write the encryption key following the * leading header. */ int dump_finish(struct dumperinfo *di, struct kerneldumpheader *kdh) { int error; if (di->kdcomp != NULL) { error = compressor_flush(di->kdcomp->kdc_stream); if (error == EAGAIN) { /* We have residual data in di->blockbuf. */ error = _dump_append(di, di->blockbuf, di->blocksize); if (error == 0) /* Compensate for _dump_append()'s adjustment. */ di->dumpoff -= di->blocksize - di->kdcomp->kdc_resid; di->kdcomp->kdc_resid = 0; } if (error != 0) return (error); /* * We now know the size of the compressed dump, so update the * header accordingly and recompute parity. */ kdh->dumplength = htod64(di->dumpoff - di->origdumpoff); kdh->parity = 0; kdh->parity = kerneldump_parity(kdh); compressor_reset(di->kdcomp->kdc_stream); } error = dump_write_headers(di, kdh); if (error != 0) return (error); (void)dump_write(di, NULL, 0, 0); return (0); } void dump_init_header(const struct dumperinfo *di, struct kerneldumpheader *kdh, const char *magic, uint32_t archver, uint64_t dumplen) { size_t dstsize; bzero(kdh, sizeof(*kdh)); strlcpy(kdh->magic, magic, sizeof(kdh->magic)); strlcpy(kdh->architecture, MACHINE_ARCH, sizeof(kdh->architecture)); kdh->version = htod32(KERNELDUMPVERSION); kdh->architectureversion = htod32(archver); kdh->dumplength = htod64(dumplen); kdh->dumpextent = kdh->dumplength; kdh->dumptime = htod64(time_second); #ifdef EKCD kdh->dumpkeysize = htod32(kerneldumpcrypto_dumpkeysize(di->kdcrypto)); #else kdh->dumpkeysize = 0; #endif kdh->blocksize = htod32(di->blocksize); strlcpy(kdh->hostname, prison0.pr_hostname, sizeof(kdh->hostname)); dstsize = sizeof(kdh->versionstring); if (strlcpy(kdh->versionstring, version, dstsize) >= dstsize) kdh->versionstring[dstsize - 2] = '\n'; if (panicstr != NULL) strlcpy(kdh->panicstring, panicstr, sizeof(kdh->panicstring)); if (di->kdcomp != NULL) kdh->compression = di->kdcomp->kdc_format; kdh->parity = kerneldump_parity(kdh); } #ifdef DDB DB_SHOW_COMMAND_FLAGS(panic, db_show_panic, DB_CMD_MEMSAFE) { if (panicstr == NULL) db_printf("panicstr not set\n"); else db_printf("panic: %s\n", panicstr); } #endif diff --git a/sys/kern/subr_kdb.c b/sys/kern/subr_kdb.c index b1bf197be3dc..ff981cdfe47c 100644 --- a/sys/kern/subr_kdb.c +++ b/sys/kern/subr_kdb.c @@ -1,769 +1,799 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2004 The FreeBSD Project * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_kdb.h" #include "opt_stack.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef SMP #include #endif #include u_char __read_frequently kdb_active = 0; static void *kdb_jmpbufp = NULL; struct kdb_dbbe *kdb_dbbe = NULL; static struct pcb kdb_pcb; struct pcb *kdb_thrctx = NULL; struct thread *kdb_thread = NULL; struct trapframe *kdb_frame = NULL; #ifdef BREAK_TO_DEBUGGER #define KDB_BREAK_TO_DEBUGGER 1 #else #define KDB_BREAK_TO_DEBUGGER 0 #endif #ifdef ALT_BREAK_TO_DEBUGGER #define KDB_ALT_BREAK_TO_DEBUGGER 1 #else #define KDB_ALT_BREAK_TO_DEBUGGER 0 #endif static int kdb_break_to_debugger = KDB_BREAK_TO_DEBUGGER; static int kdb_alt_break_to_debugger = KDB_ALT_BREAK_TO_DEBUGGER; +static int kdb_enter_securelevel = 0; KDB_BACKEND(null, NULL, NULL, NULL, NULL); static int kdb_sysctl_available(SYSCTL_HANDLER_ARGS); static int kdb_sysctl_current(SYSCTL_HANDLER_ARGS); static int kdb_sysctl_enter(SYSCTL_HANDLER_ARGS); static int kdb_sysctl_panic(SYSCTL_HANDLER_ARGS); static int kdb_sysctl_panic_str(SYSCTL_HANDLER_ARGS); static int kdb_sysctl_trap(SYSCTL_HANDLER_ARGS); static int kdb_sysctl_trap_code(SYSCTL_HANDLER_ARGS); static int kdb_sysctl_stack_overflow(SYSCTL_HANDLER_ARGS); static SYSCTL_NODE(_debug, OID_AUTO, kdb, CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, "KDB nodes"); SYSCTL_PROC(_debug_kdb, OID_AUTO, available, CTLTYPE_STRING | CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, 0, kdb_sysctl_available, "A", "list of available KDB backends"); SYSCTL_PROC(_debug_kdb, OID_AUTO, current, CTLTYPE_STRING | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0, kdb_sysctl_current, "A", "currently selected KDB backend"); SYSCTL_PROC(_debug_kdb, OID_AUTO, enter, - CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_SECURE | CTLFLAG_MPSAFE, NULL, 0, + CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0, kdb_sysctl_enter, "I", "set to enter the debugger"); SYSCTL_PROC(_debug_kdb, OID_AUTO, panic, CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_SECURE | CTLFLAG_MPSAFE, NULL, 0, kdb_sysctl_panic, "I", "set to panic the kernel"); SYSCTL_PROC(_debug_kdb, OID_AUTO, panic_str, CTLTYPE_STRING | CTLFLAG_RW | CTLFLAG_SECURE | CTLFLAG_MPSAFE, NULL, 0, kdb_sysctl_panic_str, "A", "trigger a kernel panic, using the provided string as the panic 
message"); SYSCTL_PROC(_debug_kdb, OID_AUTO, trap, CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_SECURE | CTLFLAG_MPSAFE, NULL, 0, kdb_sysctl_trap, "I", "set to cause a page fault via data access"); SYSCTL_PROC(_debug_kdb, OID_AUTO, trap_code, CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_SECURE | CTLFLAG_MPSAFE, NULL, 0, kdb_sysctl_trap_code, "I", "set to cause a page fault via code access"); SYSCTL_PROC(_debug_kdb, OID_AUTO, stack_overflow, CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_SECURE | CTLFLAG_MPSAFE, NULL, 0, kdb_sysctl_stack_overflow, "I", "set to cause a stack overflow"); SYSCTL_INT(_debug_kdb, OID_AUTO, break_to_debugger, - CTLFLAG_RWTUN | CTLFLAG_SECURE, + CTLFLAG_RWTUN, &kdb_break_to_debugger, 0, "Enable break to debugger"); SYSCTL_INT(_debug_kdb, OID_AUTO, alt_break_to_debugger, - CTLFLAG_RWTUN | CTLFLAG_SECURE, + CTLFLAG_RWTUN, &kdb_alt_break_to_debugger, 0, "Enable alternative break to debugger"); +SYSCTL_INT(_debug_kdb, OID_AUTO, enter_securelevel, + CTLFLAG_RWTUN | CTLFLAG_SECURE, + &kdb_enter_securelevel, 0, + "Maximum securelevel to enter a KDB backend"); + /* * Flag to indicate to debuggers why the debugger was entered. */ const char * volatile kdb_why = KDB_WHY_UNSET; static int kdb_sysctl_available(SYSCTL_HANDLER_ARGS) { struct kdb_dbbe **iter; struct sbuf sbuf; int error; sbuf_new_for_sysctl(&sbuf, NULL, 64, req); SET_FOREACH(iter, kdb_dbbe_set) { if ((*iter)->dbbe_active == 0) sbuf_printf(&sbuf, "%s ", (*iter)->dbbe_name); } error = sbuf_finish(&sbuf); sbuf_delete(&sbuf); return (error); } static int kdb_sysctl_current(SYSCTL_HANDLER_ARGS) { char buf[16]; int error; if (kdb_dbbe != NULL) strlcpy(buf, kdb_dbbe->dbbe_name, sizeof(buf)); else *buf = '\0'; error = sysctl_handle_string(oidp, buf, sizeof(buf), req); if (error != 0 || req->newptr == NULL) return (error); if (kdb_active) return (EBUSY); return (kdb_dbbe_select(buf)); } static int kdb_sysctl_enter(SYSCTL_HANDLER_ARGS) { int error, i; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error == 0) { i = 0; error = sysctl_handle_int(oidp, &i, 0, req); } if (error != 0 || req->newptr == NULL) return (error); if (kdb_active) return (EBUSY); kdb_enter(KDB_WHY_SYSCTL, "sysctl debug.kdb.enter"); return (0); } static int kdb_sysctl_panic(SYSCTL_HANDLER_ARGS) { int error, i; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error == 0) { i = 0; error = sysctl_handle_int(oidp, &i, 0, req); } if (error != 0 || req->newptr == NULL) return (error); panic("kdb_sysctl_panic"); return (0); } static int kdb_sysctl_panic_str(SYSCTL_HANDLER_ARGS) { int error; static char buf[256]; /* static buffer to limit mallocs when panicing */ *buf = '\0'; error = sysctl_handle_string(oidp, buf, sizeof(buf), req); if (error != 0 || req->newptr == NULL) return (error); panic("kdb_sysctl_panic: %s", buf); return (0); } static int kdb_sysctl_trap(SYSCTL_HANDLER_ARGS) { int error, i; int *addr = (int *)0x10; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error == 0) { i = 0; error = sysctl_handle_int(oidp, &i, 0, req); } if (error != 0 || req->newptr == NULL) return (error); return (*addr); } static int kdb_sysctl_trap_code(SYSCTL_HANDLER_ARGS) { int error, i; void (*fp)(u_int, u_int, u_int) = (void *)0xdeadc0de; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error == 0) { i = 0; error = sysctl_handle_int(oidp, &i, 0, req); } if (error != 0 || req->newptr == NULL) return (error); (*fp)(0x11111111, 0x22222222, 0x33333333); return (0); } static void kdb_stack_overflow(volatile int *x) __noinline; static void kdb_stack_overflow(volatile int *x) { if (*x 
> 10000000) return; kdb_stack_overflow(x); *x += PCPU_GET(cpuid) / 1000000; } static int kdb_sysctl_stack_overflow(SYSCTL_HANDLER_ARGS) { int error, i; volatile int x; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error == 0) { i = 0; error = sysctl_handle_int(oidp, &i, 0, req); } if (error != 0 || req->newptr == NULL) return (error); x = 0; kdb_stack_overflow(&x); return (0); } void kdb_panic(const char *msg) { kdb_why = KDB_WHY_PANIC; printf("KDB: panic\n"); panic("%s", msg); } void kdb_reboot(void) { kdb_why = KDB_WHY_REBOOT; printf("KDB: reboot requested\n"); shutdown_nice(0); } /* * Solaris implements a new BREAK that is initiated by the character sequence * CR ~ ^b, similar to a familiar pattern used on Sun servers by the * Remote Console. * * Note that this function may be called from almost anywhere, with interrupts * disabled and with unknown locks held, so it must not access data other than * its arguments. It's up to the caller to ensure that the state variable is * consistent. */ #define KEY_CR 13 /* CR '\r' */ #define KEY_TILDE 126 /* ~ */ #define KEY_CRTLB 2 /* ^B */ #define KEY_CRTLP 16 /* ^P */ #define KEY_CRTLR 18 /* ^R */ /* States of the KDB "alternate break sequence" detecting state machine. */ enum { KDB_ALT_BREAK_SEEN_NONE, KDB_ALT_BREAK_SEEN_CR, KDB_ALT_BREAK_SEEN_CR_TILDE, }; int kdb_break(void) { if (!kdb_break_to_debugger) return (0); kdb_enter(KDB_WHY_BREAK, "Break to debugger"); return (KDB_REQ_DEBUGGER); } static int kdb_alt_break_state(int key, int *state) { int brk; /* All states transition to KDB_ALT_BREAK_SEEN_CR on a CR. */ if (key == KEY_CR) { *state = KDB_ALT_BREAK_SEEN_CR; return (0); } brk = 0; switch (*state) { case KDB_ALT_BREAK_SEEN_CR: *state = KDB_ALT_BREAK_SEEN_NONE; if (key == KEY_TILDE) *state = KDB_ALT_BREAK_SEEN_CR_TILDE; break; case KDB_ALT_BREAK_SEEN_CR_TILDE: *state = KDB_ALT_BREAK_SEEN_NONE; if (key == KEY_CRTLB) brk = KDB_REQ_DEBUGGER; else if (key == KEY_CRTLP) brk = KDB_REQ_PANIC; else if (key == KEY_CRTLR) brk = KDB_REQ_REBOOT; break; case KDB_ALT_BREAK_SEEN_NONE: default: *state = KDB_ALT_BREAK_SEEN_NONE; break; } return (brk); } static int kdb_alt_break_internal(int key, int *state, int force_gdb) { int brk; if (!kdb_alt_break_to_debugger) return (0); brk = kdb_alt_break_state(key, state); switch (brk) { case KDB_REQ_DEBUGGER: if (force_gdb) kdb_dbbe_select("gdb"); kdb_enter(KDB_WHY_BREAK, "Break to debugger"); break; case KDB_REQ_PANIC: if (force_gdb) kdb_dbbe_select("gdb"); kdb_panic("Panic sequence on console"); break; case KDB_REQ_REBOOT: kdb_reboot(); break; } return (0); } int kdb_alt_break(int key, int *state) { return (kdb_alt_break_internal(key, state, 0)); } /* * This variation on kdb_alt_break() is used only by dcons, which has its own * configuration flag to force GDB use regardless of the global KDB * configuration. */ int kdb_alt_break_gdb(int key, int *state) { return (kdb_alt_break_internal(key, state, 1)); } /* * Print a backtrace of the calling thread. The backtrace is generated by * the selected debugger, provided it supports backtraces. If no debugger * is selected or the current debugger does not support backtraces, this * function silently returns.
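*
* (When the kernel is built with "options STACK", the #ifdef fallback
* below captures and prints a generic trace instead, so the silent case
* is hit only when neither facility is available.)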
*/ void kdb_backtrace(void) { if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL) { printf("KDB: stack backtrace:\n"); kdb_dbbe->dbbe_trace(); } #ifdef STACK else { struct stack st; printf("KDB: stack backtrace:\n"); stack_save(&st); stack_print_ddb(&st); } #endif } /* * Similar to kdb_backtrace() except that it prints a backtrace of an * arbitrary thread rather than the calling thread. */ void kdb_backtrace_thread(struct thread *td) { if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace_thread != NULL) { printf("KDB: stack backtrace of thread %d:\n", td->td_tid); kdb_dbbe->dbbe_trace_thread(td); } #ifdef STACK else { struct stack st; printf("KDB: stack backtrace of thread %d:\n", td->td_tid); if (stack_save_td(&st, td) == 0) stack_print_ddb(&st); } #endif } /* * Set/change the current backend. */ int kdb_dbbe_select(const char *name) { struct kdb_dbbe *be, **iter; SET_FOREACH(iter, kdb_dbbe_set) { be = *iter; if (be->dbbe_active == 0 && strcmp(be->dbbe_name, name) == 0) { kdb_dbbe = be; return (0); } } return (EINVAL); } +static bool +kdb_backend_permitted(struct kdb_dbbe *be, struct ucred *cred) +{ + int error; + + error = securelevel_gt(cred, kdb_enter_securelevel); +#ifdef MAC + /* + * Give MAC a chance to weigh in on the policy: if the securelevel is + * not raised, then MAC may veto the backend, otherwise MAC may + * explicitly grant access. + */ + if (error == 0) { + error = mac_kdb_check_backend(be); + if (error != 0) { + printf("MAC prevented execution of KDB backend: %s\n", + be->dbbe_name); + return (false); + } + } else if (mac_kdb_grant_backend(be) == 0) { + error = 0; + } +#endif + if (error != 0) + printf("refusing to enter KDB with elevated securelevel\n"); + return (error == 0); +} + /* * Enter the currently selected debugger. If a message has been provided, * it is printed first. If the debugger does not support the enter method, * it is entered by using breakpoint(), which enters the debugger through * kdb_trap(). The 'why' argument will contain a more mechanically usable * string than 'msg', and is relied upon by DDB scripting to identify the * reason for entering the debugger so that the right script can be run. */ void kdb_enter(const char *why, const char *msg) { if (kdb_dbbe != NULL && kdb_active == 0) { kdb_why = why; if (msg != NULL) printf("KDB: enter: %s\n", msg); breakpoint(); kdb_why = KDB_WHY_UNSET; } } /* * Initialize the kernel debugger interface. */ void kdb_init(void) { struct kdb_dbbe *be, **iter; int cur_pri, pri; kdb_active = 0; kdb_dbbe = NULL; cur_pri = -1; SET_FOREACH(iter, kdb_dbbe_set) { be = *iter; pri = (be->dbbe_init != NULL) ? be->dbbe_init() : -1; be->dbbe_active = (pri >= 0) ? 0 : -1; if (pri > cur_pri) { cur_pri = pri; kdb_dbbe = be; } } if (kdb_dbbe != NULL) { printf("KDB: debugger backends:"); SET_FOREACH(iter, kdb_dbbe_set) { be = *iter; if (be->dbbe_active == 0) printf(" %s", be->dbbe_name); } printf("\n"); printf("KDB: current backend: %s\n", kdb_dbbe->dbbe_name); } } /* * Handle contexts. */ void * kdb_jmpbuf(jmp_buf new) { void *old; old = kdb_jmpbufp; kdb_jmpbufp = new; return (old); } void kdb_reenter(void) { if (!kdb_active || kdb_jmpbufp == NULL) return; printf("KDB: reentering\n"); kdb_backtrace(); longjmp(kdb_jmpbufp, 1); /* NOTREACHED */ } void kdb_reenter_silent(void) { if (!kdb_active || kdb_jmpbufp == NULL) return; longjmp(kdb_jmpbufp, 1); /* NOTREACHED */ } /* * Thread-related support functions. 
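*
* A debugger backend typically enumerates every thread with the idiom
*
*	for (thr = kdb_thr_first(); thr != NULL; thr = kdb_thr_next(thr))
*		...
*
* which the functions below implement on top of the pid hash table.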
*/ struct pcb * kdb_thr_ctx(struct thread *thr) { #if defined(SMP) && defined(KDB_STOPPEDPCB) struct pcpu *pc; #endif if (thr == curthread) return (&kdb_pcb); #if defined(SMP) && defined(KDB_STOPPEDPCB) STAILQ_FOREACH(pc, &cpuhead, pc_allcpu) { if (pc->pc_curthread == thr && CPU_ISSET(pc->pc_cpuid, &stopped_cpus)) return (KDB_STOPPEDPCB(pc)); } #endif return (thr->td_pcb); } struct thread * kdb_thr_first(void) { struct proc *p; struct thread *thr; u_int i; /* This function may be called early. */ if (pidhashtbl == NULL) return (&thread0); for (i = 0; i <= pidhash; i++) { LIST_FOREACH(p, &pidhashtbl[i], p_hash) { thr = FIRST_THREAD_IN_PROC(p); if (thr != NULL) return (thr); } } return (NULL); } struct thread * kdb_thr_from_pid(pid_t pid) { struct proc *p; LIST_FOREACH(p, PIDHASH(pid), p_hash) { if (p->p_pid == pid) return (FIRST_THREAD_IN_PROC(p)); } return (NULL); } struct thread * kdb_thr_lookup(lwpid_t tid) { struct thread *thr; thr = kdb_thr_first(); while (thr != NULL && thr->td_tid != tid) thr = kdb_thr_next(thr); return (thr); } struct thread * kdb_thr_next(struct thread *thr) { struct proc *p; u_int hash; p = thr->td_proc; thr = TAILQ_NEXT(thr, td_plist); if (thr != NULL) return (thr); if (pidhashtbl == NULL) return (NULL); hash = p->p_pid & pidhash; for (;;) { p = LIST_NEXT(p, p_hash); while (p == NULL) { if (++hash > pidhash) return (NULL); p = LIST_FIRST(&pidhashtbl[hash]); } thr = FIRST_THREAD_IN_PROC(p); if (thr != NULL) return (thr); } } int kdb_thr_select(struct thread *thr) { if (thr == NULL) return (EINVAL); kdb_thread = thr; kdb_thrctx = kdb_thr_ctx(thr); return (0); } /* * Enter the debugger due to a trap. */ int kdb_trap(int type, int code, struct trapframe *tf) { #ifdef SMP cpuset_t other_cpus; #endif struct kdb_dbbe *be; register_t intr; int handled; int did_stop_cpus; be = kdb_dbbe; if (be == NULL || be->dbbe_trap == NULL) return (0); /* We reenter the debugger through kdb_reenter(). */ if (kdb_active) return (0); intr = intr_disable(); if (!SCHEDULER_STOPPED()) { #ifdef SMP other_cpus = all_cpus; CPU_ANDNOT(&other_cpus, &other_cpus, &stopped_cpus); CPU_CLR(PCPU_GET(cpuid), &other_cpus); stop_cpus_hard(other_cpus); #endif curthread->td_stopsched = 1; did_stop_cpus = 1; } else did_stop_cpus = 0; kdb_active++; kdb_frame = tf; /* Let MD code do its thing first... */ kdb_cpu_trap(type, code); makectx(tf, &kdb_pcb); kdb_thr_select(curthread); cngrab(); for (;;) { -#ifdef MAC - if (mac_kdb_check_backend(be) != 0) { - printf("MAC prevented execution of KDB backend: %s\n", - be->dbbe_name); + if (!kdb_backend_permitted(be, curthread->td_ucred)) { /* Unhandled breakpoint traps are fatal. */ handled = 1; break; } -#endif handled = be->dbbe_trap(type, code); if (be == kdb_dbbe) break; be = kdb_dbbe; if (be == NULL || be->dbbe_trap == NULL) break; printf("Switching to %s back-end\n", be->dbbe_name); } cnungrab(); kdb_active--; if (did_stop_cpus) { curthread->td_stopsched = 0; #ifdef SMP CPU_AND(&other_cpus, &other_cpus, &stopped_cpus); restart_cpus(other_cpus); #endif } intr_restore(intr); return (handled); } diff --git a/sys/security/mac/mac_framework.h b/sys/security/mac/mac_framework.h index 31951c97a69e..8a1de6fe13e1 100644 --- a/sys/security/mac/mac_framework.h +++ b/sys/security/mac/mac_framework.h @@ -1,713 +1,714 @@ /*- * Copyright (c) 1999-2002, 2007-2011 Robert N. M. Watson * Copyright (c) 2001-2005 Networks Associates Technology, Inc. * Copyright (c) 2005-2006 SPARTA, Inc. * All rights reserved. 
* * This software was developed by Robert Watson for the TrustedBSD Project. * * This software was developed for the FreeBSD Project in part by Network * Associates Laboratories, the Security Research Division of Network * Associates, Inc. under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), * as part of the DARPA CHATS research program. * * This software was enhanced by SPARTA ISSO under SPAWAR contract * N66001-04-C-6019 ("SEFOS"). * * This software was developed at the University of Cambridge Computer * Laboratory with support from a grant from Google, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /* * Kernel interface for Mandatory Access Control -- how kernel services * interact with the TrustedBSD MAC Framework. */ #ifndef _SECURITY_MAC_MAC_FRAMEWORK_H_ #define _SECURITY_MAC_MAC_FRAMEWORK_H_ #ifndef _KERNEL #error "no user-serviceable parts inside" #endif struct auditinfo; struct auditinfo_addr; struct bpf_d; struct cdev; struct componentname; struct devfs_dirent; struct ifnet; struct ifreq; struct image_params; struct inpcb; struct ip6q; struct ipq; struct kdb_dbbe; struct ksem; struct label; struct m_tag; struct mac; struct mbuf; struct mount; struct msg; struct msqid_kernel; struct proc; struct semid_kernel; struct shmfd; struct shmid_kernel; struct sockaddr; struct socket; struct sysctl_oid; struct sysctl_req; struct pipepair; struct thread; struct timespec; struct ucred; struct vattr; struct vnode; struct vop_setlabel_args; #include /* XXX acl_type_t */ #include /* accmode_t */ #include /* db_expr_t */ /* * Entry points to the TrustedBSD MAC Framework from the remainder of the * kernel: entry points are named based on a principal object type and an * action relating to it. They are sorted alphabetically first by object * type and then action. In some situations, the principal object type is * obvious, and in other cases, less so as multiple objects may be involved * in the operation.
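*
* For example, mac_socket_check_bind() names the object acted upon
* (socket) first and the action being checked (bind) second.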
*/ int mac_bpfdesc_check_receive(struct bpf_d *d, struct ifnet *ifp); void mac_bpfdesc_create(struct ucred *cred, struct bpf_d *d); void mac_bpfdesc_create_mbuf(struct bpf_d *d, struct mbuf *m); void mac_bpfdesc_destroy(struct bpf_d *); void mac_bpfdesc_init(struct bpf_d *); void mac_cred_associate_nfsd(struct ucred *cred); int mac_cred_check_setaudit(struct ucred *cred, struct auditinfo *ai); int mac_cred_check_setaudit_addr(struct ucred *cred, struct auditinfo_addr *aia); int mac_cred_check_setauid(struct ucred *cred, uid_t auid); int mac_cred_check_setegid(struct ucred *cred, gid_t egid); int mac_cred_check_seteuid(struct ucred *cred, uid_t euid); int mac_cred_check_setgid(struct ucred *cred, gid_t gid); int mac_cred_check_setgroups(struct ucred *cred, int ngroups, gid_t *gidset); int mac_cred_check_setregid(struct ucred *cred, gid_t rgid, gid_t egid); int mac_cred_check_setresgid(struct ucred *cred, gid_t rgid, gid_t egid, gid_t sgid); int mac_cred_check_setresuid(struct ucred *cred, uid_t ruid, uid_t euid, uid_t suid); int mac_cred_check_setreuid(struct ucred *cred, uid_t ruid, uid_t euid); int mac_cred_check_setuid(struct ucred *cred, uid_t uid); int mac_cred_check_visible(struct ucred *cr1, struct ucred *cr2); void mac_cred_copy(struct ucred *cr1, struct ucred *cr2); void mac_cred_create_init(struct ucred *cred); void mac_cred_create_swapper(struct ucred *cred); void mac_cred_destroy(struct ucred *); void mac_cred_init(struct ucred *); int mac_ddb_command_register(struct db_command_table *table, struct db_command *cmd); int mac_ddb_command_exec(struct db_command *cmd, db_expr_t addr, bool have_addr, db_expr_t count, char *modif); void mac_devfs_create_device(struct ucred *cred, struct mount *mp, struct cdev *dev, struct devfs_dirent *de); void mac_devfs_create_directory(struct mount *mp, char *dirname, int dirnamelen, struct devfs_dirent *de); void mac_devfs_create_symlink(struct ucred *cred, struct mount *mp, struct devfs_dirent *dd, struct devfs_dirent *de); void mac_devfs_destroy(struct devfs_dirent *); void mac_devfs_init(struct devfs_dirent *); void mac_devfs_update(struct mount *mp, struct devfs_dirent *de, struct vnode *vp); void mac_devfs_vnode_associate(struct mount *mp, struct devfs_dirent *de, struct vnode *vp); int mac_ifnet_check_transmit_impl(struct ifnet *ifp, struct mbuf *m); #ifdef MAC extern bool mac_ifnet_check_transmit_fp_flag; #else #define mac_ifnet_check_transmit_fp_flag false #endif #define mac_ifnet_check_transmit_enabled() __predict_false(mac_ifnet_check_transmit_fp_flag) static inline int mac_ifnet_check_transmit(struct ifnet *ifp, struct mbuf *m) { if (mac_ifnet_check_transmit_enabled()) return (mac_ifnet_check_transmit_impl(ifp, m)); return (0); } void mac_ifnet_create(struct ifnet *ifp); void mac_ifnet_create_mbuf_impl(struct ifnet *ifp, struct mbuf *m); #ifdef MAC extern bool mac_ifnet_create_mbuf_fp_flag; #else #define mac_ifnet_create_mbuf_fp_flag false #endif #define mac_ifnet_create_mbuf_enabled() __predict_false(mac_ifnet_create_mbuf_fp_flag) static inline void mac_ifnet_create_mbuf(struct ifnet *ifp, struct mbuf *m) { if (mac_ifnet_create_mbuf_enabled()) mac_ifnet_create_mbuf_impl(ifp, m); } void mac_ifnet_destroy(struct ifnet *); void mac_ifnet_init(struct ifnet *); int mac_ifnet_ioctl_get(struct ucred *cred, struct ifreq *ifr, struct ifnet *ifp); int mac_ifnet_ioctl_set(struct ucred *cred, struct ifreq *ifr, struct ifnet *ifp); int mac_inpcb_check_deliver(struct inpcb *inp, struct mbuf *m); int mac_inpcb_check_visible(struct ucred *cred, struct 
inpcb *inp); void mac_inpcb_create(struct socket *so, struct inpcb *inp); void mac_inpcb_create_mbuf(struct inpcb *inp, struct mbuf *m); void mac_inpcb_destroy(struct inpcb *); int mac_inpcb_init(struct inpcb *, int); void mac_inpcb_sosetlabel(struct socket *so, struct inpcb *inp); void mac_ip6q_create(struct mbuf *m, struct ip6q *q6); void mac_ip6q_destroy(struct ip6q *q6); int mac_ip6q_init(struct ip6q *q6, int); int mac_ip6q_match(struct mbuf *m, struct ip6q *q6); void mac_ip6q_reassemble(struct ip6q *q6, struct mbuf *m); void mac_ip6q_update(struct mbuf *m, struct ip6q *q6); void mac_ipq_create(struct mbuf *m, struct ipq *q); void mac_ipq_destroy(struct ipq *q); int mac_ipq_init(struct ipq *q, int); int mac_ipq_match(struct mbuf *m, struct ipq *q); void mac_ipq_reassemble(struct ipq *q, struct mbuf *m); void mac_ipq_update(struct mbuf *m, struct ipq *q); int mac_kdb_check_backend(struct kdb_dbbe *be); +int mac_kdb_grant_backend(struct kdb_dbbe *be); int mac_kenv_check_dump(struct ucred *cred); int mac_kenv_check_get(struct ucred *cred, char *name); int mac_kenv_check_set(struct ucred *cred, char *name, char *value); int mac_kenv_check_unset(struct ucred *cred, char *name); int mac_kld_check_load(struct ucred *cred, struct vnode *vp); int mac_kld_check_stat(struct ucred *cred); void mac_mbuf_copy(struct mbuf *, struct mbuf *); int mac_mbuf_init(struct mbuf *, int); void mac_mbuf_tag_copy(struct m_tag *, struct m_tag *); void mac_mbuf_tag_destroy(struct m_tag *); int mac_mbuf_tag_init(struct m_tag *, int); int mac_mount_check_stat(struct ucred *cred, struct mount *mp); void mac_mount_create(struct ucred *cred, struct mount *mp); void mac_mount_destroy(struct mount *); void mac_mount_init(struct mount *); void mac_netinet_arp_send(struct ifnet *ifp, struct mbuf *m); void mac_netinet_firewall_reply(struct mbuf *mrecv, struct mbuf *msend); void mac_netinet_firewall_send(struct mbuf *m); void mac_netinet_fragment(struct mbuf *m, struct mbuf *frag); void mac_netinet_icmp_reply(struct mbuf *mrecv, struct mbuf *msend); void mac_netinet_icmp_replyinplace(struct mbuf *m); void mac_netinet_igmp_send(struct ifnet *ifp, struct mbuf *m); void mac_netinet_tcp_reply(struct mbuf *m); void mac_netinet6_nd6_send(struct ifnet *ifp, struct mbuf *m); int mac_pipe_check_ioctl(struct ucred *cred, struct pipepair *pp, unsigned long cmd, void *data); int mac_pipe_check_poll_impl(struct ucred *cred, struct pipepair *pp); #ifdef MAC extern bool mac_pipe_check_poll_fp_flag; #else #define mac_pipe_check_poll_fp_flag false #endif #define mac_pipe_check_poll_enabled() __predict_false(mac_pipe_check_poll_fp_flag) static inline int mac_pipe_check_poll(struct ucred *cred, struct pipepair *pp) { if (mac_pipe_check_poll_enabled()) return (mac_pipe_check_poll_impl(cred, pp)); return (0); } #ifdef MAC extern bool mac_pipe_check_stat_fp_flag; #else #define mac_pipe_check_stat_fp_flag false #endif #define mac_pipe_check_stat_enabled() __predict_false(mac_pipe_check_stat_fp_flag) int mac_pipe_check_stat(struct ucred *cred, struct pipepair *pp); int mac_pipe_check_read_impl(struct ucred *cred, struct pipepair *pp); #ifdef MAC extern bool mac_pipe_check_read_fp_flag; #else #define mac_pipe_check_read_fp_flag false #endif #define mac_pipe_check_read_enabled() __predict_false(mac_pipe_check_read_fp_flag) static inline int mac_pipe_check_read(struct ucred *cred, struct pipepair *pp) { if (mac_pipe_check_read_enabled()) return (mac_pipe_check_read_impl(cred, pp)); return (0); } int mac_pipe_check_write(struct ucred *cred, struct 
pipepair *pp); void mac_pipe_create(struct ucred *cred, struct pipepair *pp); void mac_pipe_destroy(struct pipepair *); void mac_pipe_init(struct pipepair *); int mac_pipe_label_set(struct ucred *cred, struct pipepair *pp, struct label *label); int mac_posixsem_check_getvalue(struct ucred *active_cred, struct ucred *file_cred, struct ksem *ks); int mac_posixsem_check_open(struct ucred *cred, struct ksem *ks); int mac_posixsem_check_post(struct ucred *active_cred, struct ucred *file_cred, struct ksem *ks); int mac_posixsem_check_setmode(struct ucred *cred, struct ksem *ks, mode_t mode); int mac_posixsem_check_setowner(struct ucred *cred, struct ksem *ks, uid_t uid, gid_t gid); int mac_posixsem_check_stat(struct ucred *active_cred, struct ucred *file_cred, struct ksem *ks); int mac_posixsem_check_unlink(struct ucred *cred, struct ksem *ks); int mac_posixsem_check_wait(struct ucred *active_cred, struct ucred *file_cred, struct ksem *ks); void mac_posixsem_create(struct ucred *cred, struct ksem *ks); void mac_posixsem_destroy(struct ksem *); void mac_posixsem_init(struct ksem *); int mac_posixshm_check_create(struct ucred *cred, const char *path); int mac_posixshm_check_mmap(struct ucred *cred, struct shmfd *shmfd, int prot, int flags); int mac_posixshm_check_open(struct ucred *cred, struct shmfd *shmfd, accmode_t accmode); int mac_posixshm_check_read(struct ucred *active_cred, struct ucred *file_cred, struct shmfd *shmfd); int mac_posixshm_check_setmode(struct ucred *cred, struct shmfd *shmfd, mode_t mode); int mac_posixshm_check_setowner(struct ucred *cred, struct shmfd *shmfd, uid_t uid, gid_t gid); int mac_posixshm_check_stat(struct ucred *active_cred, struct ucred *file_cred, struct shmfd *shmfd); int mac_posixshm_check_truncate(struct ucred *active_cred, struct ucred *file_cred, struct shmfd *shmfd); int mac_posixshm_check_unlink(struct ucred *cred, struct shmfd *shmfd); int mac_posixshm_check_write(struct ucred *active_cred, struct ucred *file_cred, struct shmfd *shmfd); void mac_posixshm_create(struct ucred *cred, struct shmfd *shmfd); void mac_posixshm_destroy(struct shmfd *); void mac_posixshm_init(struct shmfd *); int mac_priv_check_impl(struct ucred *cred, int priv); #ifdef MAC extern bool mac_priv_check_fp_flag; #else #define mac_priv_check_fp_flag false #endif #define mac_priv_check_enabled() __predict_false(mac_priv_check_fp_flag) static inline int mac_priv_check(struct ucred *cred, int priv) { if (mac_priv_check_enabled()) return (mac_priv_check_impl(cred, priv)); return (0); } int mac_priv_grant_impl(struct ucred *cred, int priv); #ifdef MAC extern bool mac_priv_grant_fp_flag; #else #define mac_priv_grant_fp_flag false #endif #define mac_priv_grant_enabled() __predict_false(mac_priv_grant_fp_flag) static inline int mac_priv_grant(struct ucred *cred, int priv) { if (mac_priv_grant_enabled()) return (mac_priv_grant_impl(cred, priv)); return (EPERM); } int mac_proc_check_debug(struct ucred *cred, struct proc *p); int mac_proc_check_sched(struct ucred *cred, struct proc *p); int mac_proc_check_signal(struct ucred *cred, struct proc *p, int signum); int mac_proc_check_wait(struct ucred *cred, struct proc *p); void mac_proc_destroy(struct proc *); void mac_proc_init(struct proc *); void mac_proc_vm_revoke(struct thread *td); int mac_execve_enter(struct image_params *imgp, struct mac *mac_p); void mac_execve_exit(struct image_params *imgp); void mac_execve_interpreter_enter(struct vnode *interpvp, struct label **interplabel); void mac_execve_interpreter_exit(struct label 
*interpvplabel); int mac_socket_check_accept(struct ucred *cred, struct socket *so); int mac_socket_check_bind(struct ucred *cred, struct socket *so, struct sockaddr *sa); int mac_socket_check_connect(struct ucred *cred, struct socket *so, struct sockaddr *sa); int mac_socket_check_create(struct ucred *cred, int domain, int type, int proto); int mac_socket_check_deliver(struct socket *so, struct mbuf *m); int mac_socket_check_listen(struct ucred *cred, struct socket *so); int mac_socket_check_poll(struct ucred *cred, struct socket *so); int mac_socket_check_receive(struct ucred *cred, struct socket *so); int mac_socket_check_send(struct ucred *cred, struct socket *so); int mac_socket_check_stat(struct ucred *cred, struct socket *so); int mac_socket_check_visible(struct ucred *cred, struct socket *so); void mac_socket_create_mbuf(struct socket *so, struct mbuf *m); void mac_socket_create(struct ucred *cred, struct socket *so); void mac_socket_destroy(struct socket *); int mac_socket_init(struct socket *, int); void mac_socket_newconn(struct socket *oldso, struct socket *newso); int mac_getsockopt_label(struct ucred *cred, struct socket *so, struct mac *extmac); int mac_getsockopt_peerlabel(struct ucred *cred, struct socket *so, struct mac *extmac); int mac_setsockopt_label(struct ucred *cred, struct socket *so, struct mac *extmac); void mac_socketpeer_set_from_mbuf(struct mbuf *m, struct socket *so); void mac_socketpeer_set_from_socket(struct socket *oldso, struct socket *newso); void mac_syncache_create(struct label *l, struct inpcb *inp); void mac_syncache_create_mbuf(struct label *l, struct mbuf *m); void mac_syncache_destroy(struct label **l); int mac_syncache_init(struct label **l); int mac_system_check_acct(struct ucred *cred, struct vnode *vp); int mac_system_check_audit(struct ucred *cred, void *record, int length); int mac_system_check_auditctl(struct ucred *cred, struct vnode *vp); int mac_system_check_auditon(struct ucred *cred, int cmd); int mac_system_check_reboot(struct ucred *cred, int howto); int mac_system_check_swapon(struct ucred *cred, struct vnode *vp); int mac_system_check_swapoff(struct ucred *cred, struct vnode *vp); int mac_system_check_sysctl(struct ucred *cred, struct sysctl_oid *oidp, void *arg1, int arg2, struct sysctl_req *req); void mac_sysvmsg_cleanup(struct msg *msgptr); void mac_sysvmsg_create(struct ucred *cred, struct msqid_kernel *msqkptr, struct msg *msgptr); void mac_sysvmsg_destroy(struct msg *); void mac_sysvmsg_init(struct msg *); int mac_sysvmsq_check_msgmsq(struct ucred *cred, struct msg *msgptr, struct msqid_kernel *msqkptr); int mac_sysvmsq_check_msgrcv(struct ucred *cred, struct msg *msgptr); int mac_sysvmsq_check_msgrmid(struct ucred *cred, struct msg *msgptr); int mac_sysvmsq_check_msqctl(struct ucred *cred, struct msqid_kernel *msqkptr, int cmd); int mac_sysvmsq_check_msqget(struct ucred *cred, struct msqid_kernel *msqkptr); int mac_sysvmsq_check_msqrcv(struct ucred *cred, struct msqid_kernel *msqkptr); int mac_sysvmsq_check_msqsnd(struct ucred *cred, struct msqid_kernel *msqkptr); void mac_sysvmsq_cleanup(struct msqid_kernel *msqkptr); void mac_sysvmsq_create(struct ucred *cred, struct msqid_kernel *msqkptr); void mac_sysvmsq_destroy(struct msqid_kernel *); void mac_sysvmsq_init(struct msqid_kernel *); int mac_sysvsem_check_semctl(struct ucred *cred, struct semid_kernel *semakptr, int cmd); int mac_sysvsem_check_semget(struct ucred *cred, struct semid_kernel *semakptr); int mac_sysvsem_check_semop(struct ucred *cred, struct semid_kernel 
void mac_sysvsem_cleanup(struct semid_kernel *semakptr);
void mac_sysvsem_create(struct ucred *cred, struct semid_kernel *semakptr);
void mac_sysvsem_destroy(struct semid_kernel *);
void mac_sysvsem_init(struct semid_kernel *);
int mac_sysvshm_check_shmat(struct ucred *cred, struct shmid_kernel *shmsegptr, int shmflg);
int mac_sysvshm_check_shmctl(struct ucred *cred, struct shmid_kernel *shmsegptr, int cmd);
int mac_sysvshm_check_shmdt(struct ucred *cred, struct shmid_kernel *shmsegptr);
int mac_sysvshm_check_shmget(struct ucred *cred, struct shmid_kernel *shmsegptr, int shmflg);
void mac_sysvshm_cleanup(struct shmid_kernel *shmsegptr);
void mac_sysvshm_create(struct ucred *cred, struct shmid_kernel *shmsegptr);
void mac_sysvshm_destroy(struct shmid_kernel *);
void mac_sysvshm_init(struct shmid_kernel *);
void mac_thread_userret(struct thread *td);
#if defined(MAC) && defined(DEBUG_VFS_LOCKS)
void mac_vnode_assert_locked(struct vnode *vp, const char *func);
#else
#define mac_vnode_assert_locked(vp, func) do { } while (0)
#endif
int mac_vnode_associate_extattr(struct mount *mp, struct vnode *vp);
void mac_vnode_associate_singlelabel(struct mount *mp, struct vnode *vp);
int mac_vnode_check_access_impl(struct ucred *cred, struct vnode *dvp, accmode_t accmode);
extern bool mac_vnode_check_access_fp_flag;
#define mac_vnode_check_access_enabled() __predict_false(mac_vnode_check_access_fp_flag)
static inline int
mac_vnode_check_access(struct ucred *cred, struct vnode *dvp, accmode_t accmode)
{

        mac_vnode_assert_locked(dvp, "mac_vnode_check_access");
        if (mac_vnode_check_access_enabled())
                return (mac_vnode_check_access_impl(cred, dvp, accmode));
        return (0);
}
int mac_vnode_check_chdir(struct ucred *cred, struct vnode *dvp);
int mac_vnode_check_chroot(struct ucred *cred, struct vnode *dvp);
int mac_vnode_check_create(struct ucred *cred, struct vnode *dvp, struct componentname *cnp, struct vattr *vap);
int mac_vnode_check_deleteacl(struct ucred *cred, struct vnode *vp, acl_type_t type);
int mac_vnode_check_deleteextattr(struct ucred *cred, struct vnode *vp, int attrnamespace, const char *name);
int mac_vnode_check_exec(struct ucred *cred, struct vnode *vp, struct image_params *imgp);
int mac_vnode_check_getacl(struct ucred *cred, struct vnode *vp, acl_type_t type);
int mac_vnode_check_getextattr(struct ucred *cred, struct vnode *vp, int attrnamespace, const char *name);
int mac_vnode_check_link(struct ucred *cred, struct vnode *dvp, struct vnode *vp, struct componentname *cnp);
int mac_vnode_check_listextattr(struct ucred *cred, struct vnode *vp, int attrnamespace);
int mac_vnode_check_lookup_impl(struct ucred *cred, struct vnode *dvp, struct componentname *cnp);
#ifdef MAC
extern bool mac_vnode_check_lookup_fp_flag;
#else
#define mac_vnode_check_lookup_fp_flag false
#endif
#define mac_vnode_check_lookup_enabled() __predict_false(mac_vnode_check_lookup_fp_flag)
static inline int
mac_vnode_check_lookup(struct ucred *cred, struct vnode *dvp, struct componentname *cnp)
{

        mac_vnode_assert_locked(dvp, "mac_vnode_check_lookup");
        if (mac_vnode_check_lookup_enabled())
                return (mac_vnode_check_lookup_impl(cred, dvp, cnp));
        return (0);
}
int mac_vnode_check_mmap_impl(struct ucred *cred, struct vnode *vp, int prot, int flags);
#ifdef MAC
extern bool mac_vnode_check_mmap_fp_flag;
#else
#define mac_vnode_check_mmap_fp_flag false
#endif
#define mac_vnode_check_mmap_enabled() __predict_false(mac_vnode_check_mmap_fp_flag)
static inline int
mac_vnode_check_mmap(struct ucred *cred, struct vnode *vp, int prot, int flags)
{

        mac_vnode_assert_locked(vp, "mac_vnode_check_mmap");
        if (mac_vnode_check_mmap_enabled())
                return (mac_vnode_check_mmap_impl(cred, vp, prot, flags));
        return (0);
}
int mac_vnode_check_open_impl(struct ucred *cred, struct vnode *vp, accmode_t accmode);
#ifdef MAC
extern bool mac_vnode_check_open_fp_flag;
#else
#define mac_vnode_check_open_fp_flag false
#endif
#define mac_vnode_check_open_enabled() __predict_false(mac_vnode_check_open_fp_flag)
static inline int
mac_vnode_check_open(struct ucred *cred, struct vnode *vp, accmode_t accmode)
{

        mac_vnode_assert_locked(vp, "mac_vnode_check_open");
        if (mac_vnode_check_open_enabled())
                return (mac_vnode_check_open_impl(cred, vp, accmode));
        return (0);
}
int mac_vnode_check_mprotect(struct ucred *cred, struct vnode *vp, int prot);
#define mac_vnode_check_poll_enabled() __predict_false(mac_vnode_check_poll_fp_flag)
#ifdef MAC
extern bool mac_vnode_check_poll_fp_flag;
int mac_vnode_check_poll(struct ucred *active_cred, struct ucred *file_cred, struct vnode *vp);
#else
#define mac_vnode_check_poll_fp_flag false
static inline int
mac_vnode_check_poll(struct ucred *active_cred, struct ucred *file_cred, struct vnode *vp)
{

        return (0);
}
#endif
int mac_vnode_check_readdir(struct ucred *cred, struct vnode *vp);
int mac_vnode_check_readlink_impl(struct ucred *cred, struct vnode *dvp);
#ifdef MAC
extern bool mac_vnode_check_readlink_fp_flag;
#else
#define mac_vnode_check_readlink_fp_flag false
#endif
#define mac_vnode_check_readlink_enabled() __predict_false(mac_vnode_check_readlink_fp_flag)
static inline int
mac_vnode_check_readlink(struct ucred *cred, struct vnode *vp)
{

        mac_vnode_assert_locked(vp, "mac_vnode_check_readlink");
        if (mac_vnode_check_readlink_enabled())
                return (mac_vnode_check_readlink_impl(cred, vp));
        return (0);
}
#define mac_vnode_check_rename_from_enabled() __predict_false(mac_vnode_check_rename_from_fp_flag)
#ifdef MAC
extern bool mac_vnode_check_rename_from_fp_flag;
#endif
int mac_vnode_check_rename_from(struct ucred *cred, struct vnode *dvp, struct vnode *vp, struct componentname *cnp);
int mac_vnode_check_rename_to(struct ucred *cred, struct vnode *dvp, struct vnode *vp, int samedir, struct componentname *cnp);
int mac_vnode_check_revoke(struct ucred *cred, struct vnode *vp);
int mac_vnode_check_setacl(struct ucred *cred, struct vnode *vp, acl_type_t type, struct acl *acl);
int mac_vnode_check_setextattr(struct ucred *cred, struct vnode *vp, int attrnamespace, const char *name);
int mac_vnode_check_setflags(struct ucred *cred, struct vnode *vp, u_long flags);
int mac_vnode_check_setmode(struct ucred *cred, struct vnode *vp, mode_t mode);
int mac_vnode_check_setowner(struct ucred *cred, struct vnode *vp, uid_t uid, gid_t gid);
int mac_vnode_check_setutimes(struct ucred *cred, struct vnode *vp, struct timespec atime, struct timespec mtime);
int mac_vnode_check_stat_impl(struct ucred *active_cred, struct ucred *file_cred, struct vnode *vp);
#ifdef MAC
extern bool mac_vnode_check_stat_fp_flag;
#else
#define mac_vnode_check_stat_fp_flag false
#endif
#define mac_vnode_check_stat_enabled() __predict_false(mac_vnode_check_stat_fp_flag)
static inline int
mac_vnode_check_stat(struct ucred *active_cred, struct ucred *file_cred, struct vnode *vp)
{

        mac_vnode_assert_locked(vp, "mac_vnode_check_stat");
        if (mac_vnode_check_stat_enabled())
                return (mac_vnode_check_stat_impl(active_cred, file_cred, vp));
        return (0);
}
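/*
 * Hypothetical sketch, not part of this header: the vnode wrappers above
 * assert (under DEBUG_VFS_LOCKS) that the vnode lock is held across the
 * check, so a caller is expected to look roughly like this.  The real
 * consumers live in the VFS layer, e.g. vn_open_cred() and vn_rdwr().
 */
static inline int
example_vnode_open_check(struct ucred *cred, struct vnode *vp,
    accmode_t accmode)
{
        int error;

        vn_lock(vp, LK_SHARED | LK_RETRY);      /* satisfy the lock assertion */
        error = mac_vnode_check_open(cred, vp, accmode);
        VOP_UNLOCK(vp);
        return (error);
}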
int mac_vnode_check_read_impl(struct ucred *active_cred, struct ucred *file_cred, struct vnode *vp);
#ifdef MAC
extern bool mac_vnode_check_read_fp_flag;
#else
#define mac_vnode_check_read_fp_flag false
#endif
#define mac_vnode_check_read_enabled() __predict_false(mac_vnode_check_read_fp_flag)
static inline int
mac_vnode_check_read(struct ucred *active_cred, struct ucred *file_cred, struct vnode *vp)
{

        mac_vnode_assert_locked(vp, "mac_vnode_check_read");
        if (mac_vnode_check_read_enabled())
                return (mac_vnode_check_read_impl(active_cred, file_cred, vp));
        return (0);
}
int mac_vnode_check_write_impl(struct ucred *active_cred, struct ucred *file_cred, struct vnode *vp);
#ifdef MAC
extern bool mac_vnode_check_write_fp_flag;
#else
#define mac_vnode_check_write_fp_flag false
#endif
#define mac_vnode_check_write_enabled() __predict_false(mac_vnode_check_write_fp_flag)
static inline int
mac_vnode_check_write(struct ucred *active_cred, struct ucred *file_cred, struct vnode *vp)
{

        mac_vnode_assert_locked(vp, "mac_vnode_check_write");
        if (mac_vnode_check_write_enabled())
                return (mac_vnode_check_write_impl(active_cred, file_cred, vp));
        return (0);
}
int mac_vnode_check_unlink(struct ucred *cred, struct vnode *dvp, struct vnode *vp, struct componentname *cnp);
void mac_vnode_copy_label(struct label *, struct label *);
void mac_vnode_init(struct vnode *);
int mac_vnode_create_extattr(struct ucred *cred, struct mount *mp, struct vnode *dvp, struct vnode *vp, struct componentname *cnp);
void mac_vnode_destroy(struct vnode *);
void mac_vnode_execve_transition(struct ucred *oldcred, struct ucred *newcred, struct vnode *vp, struct label *interpvplabel, struct image_params *imgp);
int mac_vnode_execve_will_transition(struct ucred *cred, struct vnode *vp, struct label *interpvplabel, struct image_params *imgp);
void mac_vnode_relabel(struct ucred *cred, struct vnode *vp, struct label *newlabel);

/*
 * Calls to help various file systems implement labeling functionality using
 * their existing EA implementation.
 */
int vop_stdsetlabel_ea(struct vop_setlabel_args *ap);

#endif /* !_SECURITY_MAC_MAC_FRAMEWORK_H_ */
diff --git a/sys/security/mac/mac_kdb.c b/sys/security/mac/mac_kdb.c
index 9082ec7d4580..08bb3eb743f1 100644
--- a/sys/security/mac/mac_kdb.c
+++ b/sys/security/mac/mac_kdb.c
@@ -1,69 +1,78 @@
/*-
 * SPDX-License-Identifier: BSD-2-Clause
 *
 * Copyright (c) 2021-2022 Klara Systems
 *
 * This software was developed by Mitchell Horne
 * under sponsorship from Juniper Networks and Klara Systems.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include "opt_mac.h"

#include <sys/param.h>
#include <sys/kdb.h>

#include <ddb/ddb.h>
#include <ddb/db_command.h>

#include <security/mac/mac_framework.h>
#include <security/mac/mac_internal.h>
#include <security/mac/mac_policy.h>

+int
+mac_kdb_grant_backend(struct kdb_dbbe *be)
+{
+        int error = 0;
+
+        MAC_POLICY_GRANT_NOSLEEP(kdb_check_backend, be);
+        return (error);
+}
+
int
mac_kdb_check_backend(struct kdb_dbbe *be)
{
        int error = 0;

        MAC_POLICY_CHECK_NOSLEEP(kdb_check_backend, be);
        return (error);
}

int
mac_ddb_command_register(struct db_command_table *table, struct db_command *cmd)
{
        int error = 0;

        MAC_POLICY_CHECK_NOSLEEP(ddb_command_register, table, cmd);
        return (error);
}

int
mac_ddb_command_exec(struct db_command *cmd, db_expr_t addr, bool have_addr, db_expr_t count, char *modif)
{
        int error = 0;

        MAC_POLICY_CHECK_NOSLEEP(ddb_command_exec, cmd, addr, have_addr, count, modif);
        return (error);
}
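/*
 * Illustrative sketch, not part of this change: a minimal policy module
 * hooking kdb_check_backend.  The same policy entry point serves both
 * composition modes above: mac_kdb_check_backend() denies if any loaded
 * policy objects, while the new mac_kdb_grant_backend() succeeds as soon
 * as a single policy returns 0 and returns EPERM if none does.  The module
 * name and rule below are hypothetical; see mac_policy(9) for the
 * registration machinery.
 */
static int
example_kdb_check_backend(struct kdb_dbbe *be)
{

        /* Hypothetical rule: permit only the ddb(4) backend. */
        if (strcmp(be->dbbe_name, "ddb") == 0)
                return (0);
        return (EPERM);
}

static struct mac_policy_ops example_mac_ops = {
        .mpo_kdb_check_backend = example_kdb_check_backend,
};
MAC_POLICY_SET(&example_mac_ops, mac_example, "Example kdb backend policy",
    MPC_LOADTIME_FLAG_UNLOADOK, NULL);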