diff --git a/sbin/dumpon/dumpon.8 b/sbin/dumpon/dumpon.8 index 1ab3c1650adc..a62bd366dfb4 100644 --- a/sbin/dumpon/dumpon.8 +++ b/sbin/dumpon/dumpon.8 @@ -1,375 +1,409 @@ .\" Copyright (c) 1980, 1991, 1993 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" From: @(#)swapon.8 8.1 (Berkeley) 6/5/93 .\" $FreeBSD$ .\" -.Dd November 17, 2018 +.Dd May 6, 2019 .Dt DUMPON 8 .Os .Sh NAME .Nm dumpon .Nd "specify a device for crash dumps" .Sh SYNOPSIS .Nm +.Op Fl i Ar index +.Op Fl r .Op Fl v .Op Fl k Ar pubkey .Op Fl Z .Op Fl z .Ar device .Nm +.Op Fl i Ar index +.Op Fl r .Op Fl v .Op Fl k Ar pubkey .Op Fl Z .Op Fl z .Op Fl g Ar gateway .Fl s Ar server .Fl c Ar client .Ar iface .Nm .Op Fl v .Cm off .Nm .Op Fl v .Fl l .Sh DESCRIPTION The .Nm utility is used to configure where the kernel can save a crash dump in the case of a panic. .Pp System administrators should typically configure .Nm in a persistent fashion using the .Xr rc.conf 5 variables .Va dumpdev and .Va dumpon_flags . For more information on this usage, see .Xr rc.conf 5 . +.Pp +Starting in +.Fx 13.0 , +.Nm +can configure a series of fallback dump devices. +For example, an administrator may prefer +.Xr netdump 4 +by default, but if the +.Xr netdump 4 +service cannot be reached or some other failure occurs, they might choose a +local disk dump as a second choice option. .Ss General options .Bl -tag -width _k_pubkey +.It Fl i Ar index +Insert the specified dump configuration into the prioritized fallback dump +device list at the specified index, starting at zero. +.Pp +If +.Fl i +is not specified, the configured dump device is appended to the prioritized +list. +.It Fl r +Remove the specified dump device configuration or configurations from the +fallback dump device list rather than inserting or appending it. +In contrast, +.Do +.Nm +off +.Dc +removes all configured devices. +Conflicts with +.Fl i . .It Fl k Ar pubkey Configure encrypted kernel dumps. 
.Pp A random, one-time symmetric key is automatically generated for bulk kernel dump encryption every time .Nm is used. The provided .Ar pubkey is used to encrypt a copy of the symmetric key. The encrypted dump contents consist of a standard dump header, the pubkey-encrypted symmetric key contents, and the symmetric key encrypted core dump contents. .Pp As a result, only someone with the corresponding private key can decrypt the symmetric key. The symmetric key is necessary to decrypt the kernel core. The goal of the mechanism is to provide confidentiality. .Pp The .Va pubkey file should be a PEM-formatted RSA key of at least 1024 bits. .It Fl l -List the currently configured dump device, or /dev/null if no device is +List the currently configured dump device(s), or /dev/null if no devices are configured. .It Fl v Enable verbose mode. .It Fl Z Enable compression (Zstandard). .It Fl z Enable compression (gzip). Only one compression method may be enabled at a time, so .Fl z is incompatible with .Fl Z . .Pp Zstandard provides superior compression ratio and performance. .El .Ss Netdump .Nm may also configure the kernel to dump to a remote .Xr netdumpd 8 server. (The .Xr netdumpd 8 server is available in ports.) .Xr netdump 4 eliminates the need to reserve space for crash dumps. It is especially useful in diskless environments. When .Nm is used to configure netdump, the .Ar device (or .Ar iface ) parameter should specify a network interface (e.g., .Va igb1 ) . The specified NIC must be up (online) to configure netdump. .Pp .Xr netdump 4 specific options include: .Bl -tag -width _g_gateway .It Fl c Ar client The local IP address of the .Xr netdump 4 client. .It Fl g Ar gateway The first-hop router between .Ar client and .Ar server . If the .Fl g option is not specified and the system has a default route, the default router is used as the .Xr netdump 4 gateway. If the .Fl g option is not specified and the system does not have a default route, .Ar server is assumed to be on the same link as .Ar client . .It Fl s Ar server The IP address of the .Xr netdumpd 8 server. .El .Pp All of these options can be specified in the .Xr rc.conf 5 variable .Va dumpon_flags . .Ss Minidumps The default type of kernel crash dump is the mini crash dump. Mini crash dumps hold only memory pages in use by the kernel. Alternatively, full memory dumps can be enabled by setting the .Va debug.minidump .Xr sysctl 8 variable to 0. .Ss Full dumps For systems using full memory dumps, the size of the specified dump device must be at least the size of physical memory. Even though an additional 64 kB header is added to the dump, the BIOS for a platform typically holds back some memory, so it is not usually necessary to size the dump device larger than the actual amount of RAM available in the machine. Also, when using full memory dumps, the .Nm utility will refuse to enable a dump device which is smaller than the total amount of physical memory as reported by the .Va hw.physmem .Xr sysctl 8 variable. .Sh IMPLEMENTATION NOTES Because the file system layer is already dead by the time a crash dump is taken, it is not possible to send crash dumps directly to a file. .Pp The .Xr loader 8 variable .Va dumpdev may be used to enable early kernel core dumps for system panics which occur before userspace starts. 
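As an aside on the interface this patch introduces: the fallback-list behavior described above is driven by the new kda_index field of struct diocskerneldump_arg rather than the old kda_enable flag. The following is a minimal, hypothetical sketch of how a consumer other than dumpon(8) might append a device to the fallback list. It is untested; it assumes the KDA_APPEND and struct diocskerneldump_arg definitions visible in the dumpon.c hunks below, and /dev/ada0p3 is a placeholder path.

/*
 * Sketch: append a disk to the prioritized fallback dump list via the
 * reworked DIOCSKERNELDUMP ioctl.  Placeholder device path; error
 * handling kept minimal.
 */
#include <sys/param.h>
#include <sys/disk.h>
#include <sys/ioctl.h>
#include <sys/kerneldump.h>

#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct diocskerneldump_arg kda;
	int fd;

	fd = open("/dev/ada0p3", O_RDONLY);	/* placeholder dump device */
	if (fd < 0)
		err(1, "open");

	memset(&kda, 0, sizeof(kda));
	kda.kda_index = KDA_APPEND;		/* append: lowest priority */
	kda.kda_compression = KERNELDUMP_COMP_NONE;
	kda.kda_encryption = KERNELDUMP_ENC_NONE;

	if (ioctl(fd, DIOCSKERNELDUMP, &kda) != 0)
		err(1, "ioctl(DIOCSKERNELDUMP)");
	(void)close(fd);
	return (0);
}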
.Sh EXAMPLES In order to generate an RSA private key, a user can use the .Xr genrsa 1 tool: .Pp .Dl # openssl genrsa -out private.pem 4096 .Pp A public key can be extracted from the private key using the .Xr rsa 1 tool: .Pp .Dl # openssl rsa -in private.pem -out public.pem -pubout .Pp Once the RSA keys are created in a safe place, the public key may be moved to the untrusted netdump client machine. Now .Pa public.pem can be used by .Nm to configure encrypted kernel crash dumps: .Pp .Dl # dumpon -k public.pem /dev/ada0s1b .Pp It is recommended to test if the kernel saves encrypted crash dumps using the current configuration. The easiest way to do that is to cause a kernel panic using the .Xr ddb 4 debugger: .Pp .Dl # sysctl debug.kdb.panic=1 .Pp In the debugger the following commands should be typed to write a core dump and reboot: .Pp .Dl db> call doadump(0) .Dl db> reset .Pp After reboot .Xr savecore 8 should be able to save the core dump in the .Va Dq dumpdir directory, which is .Pa /var/crash by default: .Pp .Dl # savecore /dev/ada0s1b .Pp Three files should be created in the core directory: .Pa info.# , .Pa key.# and .Pa vmcore_encrypted.# (where .Dq # is the number of the last core dump saved by .Xr savecore 8 ) . The .Pa vmcore_encrypted.# can be decrypted using the .Xr decryptcore 8 utility: .Pp .Dl # decryptcore -p private.pem -k key.# -e vmcore_encrypted.# -c vmcore.# .Pp or shorter: .Pp .Dl # decryptcore -p private.pem -n # .Pp The .Pa vmcore.# can be now examined using .Xr kgdb 1 : .Pp .Dl # kgdb /boot/kernel/kernel vmcore.# .Pp or shorter: .Pp .Dl # kgdb -n # .Pp The core was decrypted properly if .Xr kgdb 1 does not print any errors. Note that the live kernel might be at a different path which can be examined by looking at the .Va kern.bootfile .Xr sysctl 8 . .Sh SEE ALSO .Xr gzip 1 , .Xr kgdb 1 , .Xr zstd 1 , .Xr ddb 4 , .Xr netdump 4 , .Xr fstab 5 , .Xr rc.conf 5 , .Xr config 8 , .Xr decryptcore 8 , .Xr init 8 , .Xr loader 8 , .Xr rc 8 , .Xr savecore 8 , .Xr swapon 8 , .Xr panic 9 .Sh HISTORY The .Nm utility appeared in .Fx 2.0.5 . .Pp Support for encrypted kernel core dumps and netdump was added in .Fx 12.0 . .Sh AUTHORS The .Nm manual page was written by .An Mark Johnston Aq Mt markj@FreeBSD.org , .An Conrad Meyer Aq Mt cem@FreeBSD.org , .An Konrad Witaszczyk Aq Mt def@FreeBSD.org , and countless others. .Sh CAVEATS To configure encrypted kernel core dumps, the running kernel must have been compiled with the .Dv EKCD option. .Pp Netdump does not automatically update the configured .Ar gateway if routing topology changes. .Pp The size of a compressed dump or a minidump is not a fixed function of RAM size. Therefore, when at least one of these options is enabled, the .Nm utility cannot verify that the .Ar device has sufficient space for a dump. .Nm is also unable to verify that a configured .Xr netdumpd 8 server has sufficient space for a dump. .Pp .Fl Z requires a kernel compiled with the .Dv ZSTDIO kernel option. Similarly, .Fl z requires the .Dv GZIO option. .Sh BUGS It is currently not possible to configure both compression and encryption. The encrypted dump format assumes that the kernel dump size is a multiple of the cipher block size, which may not be true when the dump is compressed. .Pp Netdump only supports IPv4 at this time. .Sh SECURITY CONSIDERATIONS The current encrypted kernel core dump scheme does not provide integrity nor authentication. 
That is, the recipient of an encrypted kernel core dump cannot know if they received an intact core dump, nor can they verify the provenance of the dump. .Pp RSA keys smaller than 1024 bits are practical to factor and therefore weak. Even 1024 bit keys may not be large enough to ensure privacy for many years, so NIST recommends a minimum of 2048 bit RSA keys. As a seatbelt, .Nm prevents users from configuring encrypted kernel dumps with extremely weak RSA keys. If you do not care for cryptographic privacy guarantees, just use .Nm without specifying a .Fl k Ar pubkey option. .Pp This process is sandboxed using .Xr capsicum 4 . diff --git a/sbin/dumpon/dumpon.c b/sbin/dumpon/dumpon.c index c9805c47d14c..3eec6495b215 100644 --- a/sbin/dumpon/dumpon.c +++ b/sbin/dumpon/dumpon.c @@ -1,523 +1,560 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1980, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #if 0 #ifndef lint static const char copyright[] = "@(#) Copyright (c) 1980, 1993\n\ The Regents of the University of California. All rights reserved.\n"; #endif /* not lint */ #ifndef lint static char sccsid[] = "From: @(#)swapon.c 8.1 (Berkeley) 6/5/93"; #endif /* not lint */ #endif #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef HAVE_CRYPTO #include #include #include #endif static int verbose; static void _Noreturn usage(void) { fprintf(stderr, - "usage: dumpon [-v] [-k ] [-Zz] \n" - " dumpon [-v] [-k ] [-Zz]\n" + "usage: dumpon [-i index] [-r] [-v] [-k ] [-Zz] \n" + " dumpon [-i index] [-r] [-v] [-k ] [-Zz]\n" " [-g ] -s -c \n" " dumpon [-v] off\n" " dumpon [-v] -l\n"); exit(EX_USAGE); } /* * Look for a default route on the specified interface. 
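 * The lookup is two-step: getifaddrs(3) maps the interface name to an
 * interface index, then a NET_RT_DUMP sysctl dump of the IPv4 routing
 * table is scanned for an RTF_GATEWAY route on that index whose
 * destination and netmask are both zero (the default route).  Returns
 * the gateway as a string from inet_ntoa(), or NULL if none is found.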
*/ static char * find_gateway(const char *ifname) { struct ifaddrs *ifa, *ifap; struct rt_msghdr *rtm; struct sockaddr *sa; struct sockaddr_dl *sdl; struct sockaddr_in *dst, *mask, *gw; char *buf, *next, *ret; size_t sz; int error, i, ifindex, mib[7]; /* First look up the interface index. */ if (getifaddrs(&ifap) != 0) err(EX_OSERR, "getifaddrs"); for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) { if (ifa->ifa_addr->sa_family != AF_LINK) continue; if (strcmp(ifa->ifa_name, ifname) == 0) { sdl = (struct sockaddr_dl *)(void *)ifa->ifa_addr; ifindex = sdl->sdl_index; break; } } if (ifa == NULL) errx(1, "couldn't find interface index for '%s'", ifname); freeifaddrs(ifap); /* Now get the IPv4 routing table. */ mib[0] = CTL_NET; mib[1] = PF_ROUTE; mib[2] = 0; mib[3] = AF_INET; mib[4] = NET_RT_DUMP; mib[5] = 0; mib[6] = -1; /* FIB */ for (;;) { if (sysctl(mib, nitems(mib), NULL, &sz, NULL, 0) != 0) err(EX_OSERR, "sysctl(NET_RT_DUMP)"); buf = malloc(sz); error = sysctl(mib, nitems(mib), buf, &sz, NULL, 0); if (error == 0) break; if (errno != ENOMEM) err(EX_OSERR, "sysctl(NET_RT_DUMP)"); free(buf); } ret = NULL; for (next = buf; next < buf + sz; next += rtm->rtm_msglen) { rtm = (struct rt_msghdr *)(void *)next; if (rtm->rtm_version != RTM_VERSION) continue; if ((rtm->rtm_flags & RTF_GATEWAY) == 0 || rtm->rtm_index != ifindex) continue; dst = gw = mask = NULL; sa = (struct sockaddr *)(rtm + 1); for (i = 0; i < RTAX_MAX; i++) { if ((rtm->rtm_addrs & (1 << i)) != 0) { switch (i) { case RTAX_DST: dst = (void *)sa; break; case RTAX_GATEWAY: gw = (void *)sa; break; case RTAX_NETMASK: mask = (void *)sa; break; } } sa = (struct sockaddr *)((char *)sa + SA_SIZE(sa)); } if (dst->sin_addr.s_addr == INADDR_ANY && mask->sin_addr.s_addr == 0) { ret = inet_ntoa(gw->sin_addr); break; } } free(buf); return (ret); } static void check_size(int fd, const char *fn) { int name[] = { CTL_HW, HW_PHYSMEM }; size_t namelen = nitems(name); unsigned long physmem; size_t len; off_t mediasize; int minidump; len = sizeof(minidump); if (sysctlbyname("debug.minidump", &minidump, &len, NULL, 0) == 0 && minidump == 1) return; len = sizeof(physmem); if (sysctl(name, namelen, &physmem, &len, NULL, 0) != 0) err(EX_OSERR, "can't get memory size"); if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) != 0) err(EX_OSERR, "%s: can't get size", fn); if ((uintmax_t)mediasize < (uintmax_t)physmem) { if (verbose) printf("%s is smaller than physical memory\n", fn); exit(EX_IOERR); } } #ifdef HAVE_CRYPTO static void genkey(const char *pubkeyfile, struct diocskerneldump_arg *kdap) { FILE *fp; RSA *pubkey; assert(pubkeyfile != NULL); assert(kdap != NULL); fp = NULL; pubkey = NULL; fp = fopen(pubkeyfile, "r"); if (fp == NULL) err(1, "Unable to open %s", pubkeyfile); if (caph_enter() < 0) err(1, "Unable to enter capability mode"); pubkey = RSA_new(); if (pubkey == NULL) { errx(1, "Unable to allocate an RSA structure: %s", ERR_error_string(ERR_get_error(), NULL)); } pubkey = PEM_read_RSA_PUBKEY(fp, &pubkey, NULL, NULL); fclose(fp); fp = NULL; if (pubkey == NULL) errx(1, "Unable to read data from %s.", pubkeyfile); /* * RSA keys under ~1024 bits are trivially factorable (2018). OpenSSL * provides an API for RSA keys to estimate the symmetric-cipher * "equivalent" bits of security (defined in NIST SP800-57), which as * of this writing equates a 2048-bit RSA key to 112 symmetric cipher * bits. 
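 * (For comparison, the same NIST guidance rates a 1024-bit RSA key at
 * roughly 80 bits of security, well under the 112-bit floor applied
 * below.)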
* * Use this API as a seatbelt to avoid suggesting to users that their * privacy is protected by encryption when the key size is insufficient * to prevent compromise via factoring. * * Future work: Sanity check for weak 'e', and sanity check for absence * of 'd' (i.e., the supplied key is a public key rather than a full * keypair). */ #if OPENSSL_VERSION_NUMBER >= 0x10100000L if (RSA_security_bits(pubkey) < 112) #else if (RSA_size(pubkey) * 8 < 2048) #endif errx(1, "Small RSA keys (you provided: %db) can be " "factored cheaply. Please generate a larger key.", RSA_size(pubkey) * 8); kdap->kda_encryptedkeysize = RSA_size(pubkey); if (kdap->kda_encryptedkeysize > KERNELDUMP_ENCKEY_MAX_SIZE) { errx(1, "Public key has to be at most %db long.", 8 * KERNELDUMP_ENCKEY_MAX_SIZE); } kdap->kda_encryptedkey = calloc(1, kdap->kda_encryptedkeysize); if (kdap->kda_encryptedkey == NULL) err(1, "Unable to allocate encrypted key"); kdap->kda_encryption = KERNELDUMP_ENC_AES_256_CBC; arc4random_buf(kdap->kda_key, sizeof(kdap->kda_key)); if (RSA_public_encrypt(sizeof(kdap->kda_key), kdap->kda_key, kdap->kda_encryptedkey, pubkey, RSA_PKCS1_PADDING) != (int)kdap->kda_encryptedkeysize) { errx(1, "Unable to encrypt the one-time key."); } RSA_free(pubkey); } #endif static void listdumpdev(void) { + static char ip[200]; + char dumpdev[PATH_MAX]; - struct netdump_conf ndconf; + struct diocskerneldump_arg ndconf; size_t len; const char *sysctlname = "kern.shutdown.dumpdevname"; int fd; len = sizeof(dumpdev); if (sysctlbyname(sysctlname, &dumpdev, &len, NULL, 0) != 0) { if (errno == ENOMEM) { err(EX_OSERR, "Kernel returned too large of a buffer for '%s'\n", sysctlname); } else { err(EX_OSERR, "Sysctl get '%s'\n", sysctlname); } } if (strlen(dumpdev) == 0) (void)strlcpy(dumpdev, _PATH_DEVNULL, sizeof(dumpdev)); - if (verbose) - printf("kernel dumps on "); - printf("%s\n", dumpdev); + if (verbose) { + char *ctx, *dd; + unsigned idx; + + printf("kernel dumps on priority: device\n"); + idx = 0; + ctx = dumpdev; + while ((dd = strsep(&ctx, ",")) != NULL) + printf("%u: %s\n", idx++, dd); + } else + printf("%s\n", dumpdev); /* If netdump is enabled, print the configuration parameters. 
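 * The settings are read back from the netdump device with
 * DIOCGKERNELDUMP; an ENXIO from that ioctl just means netdump is not
 * configured, so it is not reported as an error.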
*/ if (verbose) { fd = open(_PATH_NETDUMP, O_RDONLY); if (fd < 0) { if (errno != ENOENT) err(EX_OSERR, "opening %s", _PATH_NETDUMP); return; } - if (ioctl(fd, NETDUMPGCONF, &ndconf) != 0) { + if (ioctl(fd, DIOCGKERNELDUMP, &ndconf) != 0) { if (errno != ENXIO) - err(EX_OSERR, "ioctl(NETDUMPGCONF)"); + err(EX_OSERR, "ioctl(DIOCGKERNELDUMP)"); (void)close(fd); return; } - printf("server address: %s\n", inet_ntoa(ndconf.ndc_server)); - printf("client address: %s\n", inet_ntoa(ndconf.ndc_client)); - printf("gateway address: %s\n", inet_ntoa(ndconf.ndc_gateway)); + printf("server address: %s\n", + inet_ntop(ndconf.kda_af, &ndconf.kda_server, ip, + sizeof(ip))); + printf("client address: %s\n", + inet_ntop(ndconf.kda_af, &ndconf.kda_client, ip, + sizeof(ip))); + printf("gateway address: %s\n", + inet_ntop(ndconf.kda_af, &ndconf.kda_gateway, ip, + sizeof(ip))); (void)close(fd); } } static int opendumpdev(const char *arg, char *dumpdev) { int fd, i; if (strncmp(arg, _PATH_DEV, sizeof(_PATH_DEV) - 1) == 0) strlcpy(dumpdev, arg, PATH_MAX); else { i = snprintf(dumpdev, PATH_MAX, "%s%s", _PATH_DEV, arg); if (i < 0) err(EX_OSERR, "%s", arg); if (i >= PATH_MAX) errc(EX_DATAERR, EINVAL, "%s", arg); } fd = open(dumpdev, O_RDONLY); if (fd < 0) err(EX_OSFILE, "%s", dumpdev); return (fd); } int main(int argc, char *argv[]) { char dumpdev[PATH_MAX]; - struct diocskerneldump_arg _kda, *kdap; - struct netdump_conf ndconf; + struct diocskerneldump_arg ndconf, *kdap; struct addrinfo hints, *res; const char *dev, *pubkeyfile, *server, *client, *gateway; int ch, error, fd; - bool enable, gzip, list, netdump, zstd; + bool gzip, list, netdump, zstd, insert, rflag; + uint8_t ins_idx; - gzip = list = netdump = zstd = false; + gzip = list = netdump = zstd = insert = rflag = false; kdap = NULL; pubkeyfile = NULL; server = client = gateway = NULL; + ins_idx = KDA_APPEND; - while ((ch = getopt(argc, argv, "c:g:k:ls:vZz")) != -1) + while ((ch = getopt(argc, argv, "c:g:i:k:lrs:vZz")) != -1) switch ((char)ch) { case 'c': client = optarg; break; case 'g': gateway = optarg; break; + case 'i': + { + int i; + + i = atoi(optarg); + if (i < 0 || i >= KDA_APPEND - 1) + errx(EX_USAGE, + "-i index must be between zero and %d.", + (int)KDA_APPEND - 2); + insert = true; + ins_idx = i; + } + break; case 'k': pubkeyfile = optarg; break; case 'l': list = true; break; + case 'r': + rflag = true; + break; case 's': server = optarg; break; case 'v': verbose = 1; break; case 'Z': zstd = true; break; case 'z': gzip = true; break; default: usage(); } if (gzip && zstd) errx(EX_USAGE, "The -z and -Z options are mutually exclusive."); + if (insert && rflag) + errx(EX_USAGE, "The -i and -r options are mutually exclusive."); + argc -= optind; argv += optind; if (list) { listdumpdev(); exit(EX_OK); } if (argc != 1) usage(); #ifndef HAVE_CRYPTO if (pubkeyfile != NULL) errx(EX_UNAVAILABLE,"Unable to use the public key." " Recompile dumpon with OpenSSL support."); #endif if (server != NULL && client != NULL) { - enable = true; dev = _PATH_NETDUMP; netdump = true; - kdap = &ndconf.ndc_kda; } else if (server == NULL && client == NULL && argc > 0) { - enable = strcmp(argv[0], "off") != 0; - dev = enable ? 
argv[0] : _PATH_DEVNULL; + if (strcmp(argv[0], "off") == 0) { + rflag = true; + dev = _PATH_DEVNULL; + } else + dev = argv[0]; netdump = false; - kdap = &_kda; } else usage(); fd = opendumpdev(dev, dumpdev); - if (!netdump && !gzip) + if (!netdump && !gzip && !rflag) check_size(fd, dumpdev); + kdap = &ndconf; bzero(kdap, sizeof(*kdap)); - kdap->kda_enable = 0; - if (ioctl(fd, DIOCSKERNELDUMP, kdap) != 0) - err(EX_OSERR, "ioctl(DIOCSKERNELDUMP)"); - if (!enable) - exit(EX_OK); - explicit_bzero(kdap, sizeof(*kdap)); - kdap->kda_enable = 1; + if (rflag) + kdap->kda_index = KDA_REMOVE; + else + kdap->kda_index = ins_idx; + kdap->kda_compression = KERNELDUMP_COMP_NONE; if (zstd) kdap->kda_compression = KERNELDUMP_COMP_ZSTD; else if (gzip) kdap->kda_compression = KERNELDUMP_COMP_GZIP; if (netdump) { memset(&hints, 0, sizeof(hints)); hints.ai_family = AF_INET; hints.ai_protocol = IPPROTO_UDP; res = NULL; error = getaddrinfo(server, NULL, &hints, &res); if (error != 0) err(1, "%s", gai_strerror(error)); if (res == NULL) errx(1, "failed to resolve '%s'", server); server = inet_ntoa( ((struct sockaddr_in *)(void *)res->ai_addr)->sin_addr); freeaddrinfo(res); - if (strlcpy(ndconf.ndc_iface, argv[0], - sizeof(ndconf.ndc_iface)) >= sizeof(ndconf.ndc_iface)) + if (strlcpy(ndconf.kda_iface, argv[0], + sizeof(ndconf.kda_iface)) >= sizeof(ndconf.kda_iface)) errx(EX_USAGE, "invalid interface name '%s'", argv[0]); - if (inet_aton(server, &ndconf.ndc_server) == 0) + if (inet_aton(server, &ndconf.kda_server.in4) == 0) errx(EX_USAGE, "invalid server address '%s'", server); - if (inet_aton(client, &ndconf.ndc_client) == 0) + if (inet_aton(client, &ndconf.kda_client.in4) == 0) errx(EX_USAGE, "invalid client address '%s'", client); if (gateway == NULL) { gateway = find_gateway(argv[0]); if (gateway == NULL) { if (verbose) printf( "failed to look up gateway for %s\n", server); gateway = server; } } - if (inet_aton(gateway, &ndconf.ndc_gateway) == 0) + if (inet_aton(gateway, &ndconf.kda_gateway.in4) == 0) errx(EX_USAGE, "invalid gateway address '%s'", gateway); + ndconf.kda_af = AF_INET; + } #ifdef HAVE_CRYPTO - if (pubkeyfile != NULL) - genkey(pubkeyfile, kdap); -#endif - error = ioctl(fd, NETDUMPSCONF, &ndconf); - if (error != 0) - error = errno; - explicit_bzero(kdap->kda_encryptedkey, - kdap->kda_encryptedkeysize); - free(kdap->kda_encryptedkey); - explicit_bzero(kdap, sizeof(*kdap)); - if (error != 0) - errc(EX_OSERR, error, "ioctl(NETDUMPSCONF)"); - } else { -#ifdef HAVE_CRYPTO - if (pubkeyfile != NULL) - genkey(pubkeyfile, kdap); + if (pubkeyfile != NULL) + genkey(pubkeyfile, kdap); #endif - error = ioctl(fd, DIOCSKERNELDUMP, kdap); - if (error != 0) - error = errno; - explicit_bzero(kdap->kda_encryptedkey, - kdap->kda_encryptedkeysize); - free(kdap->kda_encryptedkey); - explicit_bzero(kdap, sizeof(*kdap)); - if (error != 0) - errc(EX_OSERR, error, "ioctl(DIOCSKERNELDUMP)"); + error = ioctl(fd, DIOCSKERNELDUMP, kdap); + if (error != 0) + error = errno; + explicit_bzero(kdap->kda_encryptedkey, kdap->kda_encryptedkeysize); + free(kdap->kda_encryptedkey); + explicit_bzero(kdap, sizeof(*kdap)); + if (error != 0) { + if (netdump) { + /* + * Be slightly less user-hostile for some common + * errors, especially as users don't have any great + * discoverability into which NICs support netdump. 
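+ * ENXIO indicates the interface's link is down; ENODEV indicates the
+ * driver does not (yet) implement the netdump interface.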
+ */ + if (error == ENXIO) + errx(EX_OSERR, "Unable to configure netdump " + "because the interface's link is down."); + else if (error == ENODEV) + errx(EX_OSERR, "Unable to configure netdump " + "because the interface driver does not yet " + "support netdump."); + } + errc(EX_OSERR, error, "ioctl(DIOCSKERNELDUMP)"); } + if (verbose) - printf("kernel dumps on %s\n", dumpdev); + listdumpdev(); exit(EX_OK); } diff --git a/sys/dev/null/null.c b/sys/dev/null/null.c index c1e81ed24024..6ec127ba2718 100644 --- a/sys/dev/null/null.c +++ b/sys/dev/null/null.c @@ -1,207 +1,218 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2000 Mark R. V. Murray & Jeroen C. van Gelderen * Copyright (c) 2001-2004 Mark R. V. Murray * Copyright (c) 2014 Eitan Adler * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer * in this position and unchanged. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include /* For use with destroy_dev(9). 
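 * These cdevs are created in null_modevent() at MOD_LOAD and torn
 * down at MOD_UNLOAD.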
*/ static struct cdev *full_dev; static struct cdev *null_dev; static struct cdev *zero_dev; static d_write_t full_write; static d_write_t null_write; static d_ioctl_t null_ioctl; static d_ioctl_t zero_ioctl; static d_read_t zero_read; static struct cdevsw full_cdevsw = { .d_version = D_VERSION, .d_read = zero_read, .d_write = full_write, .d_ioctl = zero_ioctl, .d_name = "full", }; static struct cdevsw null_cdevsw = { .d_version = D_VERSION, .d_read = (d_read_t *)nullop, .d_write = null_write, .d_ioctl = null_ioctl, .d_name = "null", }; static struct cdevsw zero_cdevsw = { .d_version = D_VERSION, .d_read = zero_read, .d_write = null_write, .d_ioctl = zero_ioctl, .d_name = "zero", .d_flags = D_MMAP_ANON, }; /* ARGSUSED */ static int full_write(struct cdev *dev __unused, struct uio *uio __unused, int flags __unused) { return (ENOSPC); } /* ARGSUSED */ static int null_write(struct cdev *dev __unused, struct uio *uio, int flags __unused) { uio->uio_resid = 0; return (0); } /* ARGSUSED */ static int null_ioctl(struct cdev *dev __unused, u_long cmd, caddr_t data __unused, int flags __unused, struct thread *td) { + struct diocskerneldump_arg kda; int error; error = 0; switch (cmd) { #ifdef COMPAT_FREEBSD11 case DIOCSKERNELDUMP_FREEBSD11: + gone_in(13, "FreeBSD 11.x ABI compat"); + /* FALLTHROUGH */ +#endif +#ifdef COMPAT_FREEBSD12 + case DIOCSKERNELDUMP_FREEBSD12: + if (cmd == DIOCSKERNELDUMP_FREEBSD12) + gone_in(14, "FreeBSD 12.x ABI compat"); + /* FALLTHROUGH */ #endif case DIOCSKERNELDUMP: - error = clear_dumper(td); + bzero(&kda, sizeof(kda)); + kda.kda_index = KDA_REMOVE_ALL; + error = dumper_remove(NULL, &kda); break; case FIONBIO: break; case FIOASYNC: if (*(int *)data != 0) error = EINVAL; break; default: error = ENOIOCTL; } return (error); } /* ARGSUSED */ static int zero_ioctl(struct cdev *dev __unused, u_long cmd, caddr_t data __unused, int flags __unused, struct thread *td) { int error; error = 0; switch (cmd) { case FIONBIO: break; case FIOASYNC: if (*(int *)data != 0) error = EINVAL; break; default: error = ENOIOCTL; } return (error); } /* ARGSUSED */ static int zero_read(struct cdev *dev __unused, struct uio *uio, int flags __unused) { void *zbuf; ssize_t len; int error = 0; KASSERT(uio->uio_rw == UIO_READ, ("Can't be in %s for write", __func__)); zbuf = __DECONST(void *, zero_region); while (uio->uio_resid > 0 && error == 0) { len = uio->uio_resid; if (len > ZERO_REGION_SIZE) len = ZERO_REGION_SIZE; error = uiomove(zbuf, len, uio); } return (error); } /* ARGSUSED */ static int null_modevent(module_t mod __unused, int type, void *data __unused) { switch(type) { case MOD_LOAD: if (bootverbose) printf("null: \n"); full_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, &full_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0666, "full"); null_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, &null_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0666, "null"); zero_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, &zero_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0666, "zero"); break; case MOD_UNLOAD: destroy_dev(full_dev); destroy_dev(null_dev); destroy_dev(zero_dev); break; case MOD_SHUTDOWN: break; default: return (EOPNOTSUPP); } return (0); } DEV_MODULE(null, null_modevent, NULL); MODULE_VERSION(null, 1); diff --git a/sys/geom/geom_dev.c b/sys/geom/geom_dev.c index 8c3b3a1c4bde..a68f9de4bd5b 100644 --- a/sys/geom/geom_dev.c +++ b/sys/geom/geom_dev.c @@ -1,870 +1,899 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 2002 Poul-Henning Kamp * Copyright (c) 2002 Networks Associates Technology, Inc. * All rights reserved. 
* * This software was developed for the FreeBSD Project by Poul-Henning Kamp * and NAI Labs, the Security Research Division of Network Associates, Inc. * under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the * DARPA CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The names of the authors may not be used to endorse or promote * products derived from this software without specific prior written * permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include struct g_dev_softc { struct mtx sc_mtx; struct cdev *sc_dev; struct cdev *sc_alias; int sc_open; u_int sc_active; #define SC_A_DESTROY (1 << 31) #define SC_A_OPEN (1 << 30) #define SC_A_ACTIVE (SC_A_OPEN - 1) }; static d_open_t g_dev_open; static d_close_t g_dev_close; static d_strategy_t g_dev_strategy; static d_ioctl_t g_dev_ioctl; static struct cdevsw g_dev_cdevsw = { .d_version = D_VERSION, .d_open = g_dev_open, .d_close = g_dev_close, .d_read = physread, .d_write = physwrite, .d_ioctl = g_dev_ioctl, .d_strategy = g_dev_strategy, .d_name = "g_dev", .d_flags = D_DISK | D_TRACKCLOSE, }; static g_init_t g_dev_init; static g_fini_t g_dev_fini; static g_taste_t g_dev_taste; static g_orphan_t g_dev_orphan; static g_attrchanged_t g_dev_attrchanged; static g_resize_t g_dev_resize; static struct g_class g_dev_class = { .name = "DEV", .version = G_VERSION, .init = g_dev_init, .fini = g_dev_fini, .taste = g_dev_taste, .orphan = g_dev_orphan, .attrchanged = g_dev_attrchanged, .resize = g_dev_resize }; /* * We target 262144 (8 x 32768) sectors by default as this significantly * increases the throughput on commonly used SSD's with a marginal * increase in non-interruptible request latency. */ static uint64_t g_dev_del_max_sectors = 262144; SYSCTL_DECL(_kern_geom); SYSCTL_NODE(_kern_geom, OID_AUTO, dev, CTLFLAG_RW, 0, "GEOM_DEV stuff"); SYSCTL_QUAD(_kern_geom_dev, OID_AUTO, delete_max_sectors, CTLFLAG_RW, &g_dev_del_max_sectors, 0, "Maximum number of sectors in a single " "delete request sent to the provider. Larger requests are chunked " "so they can be interrupted. 
(0 = disable chunking)"); static char *dumpdev = NULL; static void g_dev_init(struct g_class *mp) { dumpdev = kern_getenv("dumpdev"); } static void g_dev_fini(struct g_class *mp) { freeenv(dumpdev); dumpdev = NULL; } static int -g_dev_setdumpdev(struct cdev *dev, struct diocskerneldump_arg *kda, - struct thread *td) +g_dev_setdumpdev(struct cdev *dev, struct diocskerneldump_arg *kda) { struct g_kerneldump kd; struct g_consumer *cp; int error, len; - if (dev == NULL || kda == NULL) - return (clear_dumper(td)); + MPASS(dev != NULL && kda != NULL); + MPASS(kda->kda_index != KDA_REMOVE); cp = dev->si_drv2; len = sizeof(kd); memset(&kd, 0, len); kd.offset = 0; kd.length = OFF_MAX; error = g_io_getattr("GEOM::kerneldump", cp, &len, &kd); if (error != 0) return (error); - error = set_dumper(&kd.di, devtoname(dev), td, kda->kda_compression, - kda->kda_encryption, kda->kda_key, kda->kda_encryptedkeysize, - kda->kda_encryptedkey); + error = dumper_insert(&kd.di, devtoname(dev), kda); if (error == 0) dev->si_flags |= SI_DUMPDEV; return (error); } static int init_dumpdev(struct cdev *dev) { struct diocskerneldump_arg kda; struct g_consumer *cp; const char *devprefix = "/dev/", *devname; int error; size_t len; bzero(&kda, sizeof(kda)); - kda.kda_enable = 1; + kda.kda_index = KDA_APPEND; if (dumpdev == NULL) return (0); len = strlen(devprefix); devname = devtoname(dev); if (strcmp(devname, dumpdev) != 0 && (strncmp(dumpdev, devprefix, len) != 0 || strcmp(devname, dumpdev + len) != 0)) return (0); cp = (struct g_consumer *)dev->si_drv2; error = g_access(cp, 1, 0, 0); if (error != 0) return (error); - error = g_dev_setdumpdev(dev, &kda, curthread); + error = g_dev_setdumpdev(dev, &kda); if (error == 0) { freeenv(dumpdev); dumpdev = NULL; } (void)g_access(cp, -1, 0, 0); return (error); } static void g_dev_destroy(void *arg, int flags __unused) { struct g_consumer *cp; struct g_geom *gp; struct g_dev_softc *sc; char buf[SPECNAMELEN + 6]; g_topology_assert(); cp = arg; gp = cp->geom; sc = cp->private; g_trace(G_T_TOPOLOGY, "g_dev_destroy(%p(%s))", cp, gp->name); snprintf(buf, sizeof(buf), "cdev=%s", gp->name); devctl_notify_f("GEOM", "DEV", "DESTROY", buf, M_WAITOK); if (cp->acr > 0 || cp->acw > 0 || cp->ace > 0) g_access(cp, -cp->acr, -cp->acw, -cp->ace); g_detach(cp); g_destroy_consumer(cp); g_destroy_geom(gp); mtx_destroy(&sc->sc_mtx); g_free(sc); } void g_dev_print(void) { struct g_geom *gp; char const *p = ""; LIST_FOREACH(gp, &g_dev_class.geom, geom) { printf("%s%s", p, gp->name); p = " "; } printf("\n"); } static void g_dev_set_physpath(struct g_consumer *cp) { struct g_dev_softc *sc; char *physpath; int error, physpath_len; if (g_access(cp, 1, 0, 0) != 0) return; sc = cp->private; physpath_len = MAXPATHLEN; physpath = g_malloc(physpath_len, M_WAITOK|M_ZERO); error = g_io_getattr("GEOM::physpath", cp, &physpath_len, physpath); g_access(cp, -1, 0, 0); if (error == 0 && strlen(physpath) != 0) { struct cdev *dev, *old_alias_dev; struct cdev **alias_devp; dev = sc->sc_dev; old_alias_dev = sc->sc_alias; alias_devp = (struct cdev **)&sc->sc_alias; make_dev_physpath_alias(MAKEDEV_WAITOK, alias_devp, dev, old_alias_dev, physpath); } else if (sc->sc_alias) { destroy_dev((struct cdev *)sc->sc_alias); sc->sc_alias = NULL; } g_free(physpath); } static void g_dev_set_media(struct g_consumer *cp) { struct g_dev_softc *sc; struct cdev *dev; char buf[SPECNAMELEN + 6]; sc = cp->private; dev = sc->sc_dev; snprintf(buf, sizeof(buf), "cdev=%s", dev->si_name); devctl_notify_f("DEVFS", "CDEV", "MEDIACHANGE", buf, 
M_WAITOK); devctl_notify_f("GEOM", "DEV", "MEDIACHANGE", buf, M_WAITOK); dev = sc->sc_alias; if (dev != NULL) { snprintf(buf, sizeof(buf), "cdev=%s", dev->si_name); devctl_notify_f("DEVFS", "CDEV", "MEDIACHANGE", buf, M_WAITOK); devctl_notify_f("GEOM", "DEV", "MEDIACHANGE", buf, M_WAITOK); } } static void g_dev_attrchanged(struct g_consumer *cp, const char *attr) { if (strcmp(attr, "GEOM::media") == 0) { g_dev_set_media(cp); return; } if (strcmp(attr, "GEOM::physpath") == 0) { g_dev_set_physpath(cp); return; } } static void g_dev_resize(struct g_consumer *cp) { char buf[SPECNAMELEN + 6]; snprintf(buf, sizeof(buf), "cdev=%s", cp->provider->name); devctl_notify_f("GEOM", "DEV", "SIZECHANGE", buf, M_WAITOK); } struct g_provider * g_dev_getprovider(struct cdev *dev) { struct g_consumer *cp; g_topology_assert(); if (dev == NULL) return (NULL); if (dev->si_devsw != &g_dev_cdevsw) return (NULL); cp = dev->si_drv2; return (cp->provider); } static struct g_geom * g_dev_taste(struct g_class *mp, struct g_provider *pp, int insist __unused) { struct g_geom *gp; struct g_geom_alias *gap; struct g_consumer *cp; struct g_dev_softc *sc; int error; struct cdev *dev, *adev; char buf[SPECNAMELEN + 6]; g_trace(G_T_TOPOLOGY, "dev_taste(%s,%s)", mp->name, pp->name); g_topology_assert(); gp = g_new_geomf(mp, "%s", pp->name); sc = g_malloc(sizeof(*sc), M_WAITOK | M_ZERO); mtx_init(&sc->sc_mtx, "g_dev", NULL, MTX_DEF); cp = g_new_consumer(gp); cp->private = sc; cp->flags |= G_CF_DIRECT_SEND | G_CF_DIRECT_RECEIVE; error = g_attach(cp, pp); KASSERT(error == 0, ("g_dev_taste(%s) failed to g_attach, err=%d", pp->name, error)); error = make_dev_p(MAKEDEV_CHECKNAME | MAKEDEV_WAITOK, &dev, &g_dev_cdevsw, NULL, UID_ROOT, GID_OPERATOR, 0640, "%s", gp->name); if (error != 0) { printf("%s: make_dev_p() failed (gp->name=%s, error=%d)\n", __func__, gp->name, error); g_detach(cp); g_destroy_consumer(cp); g_destroy_geom(gp); mtx_destroy(&sc->sc_mtx); g_free(sc); return (NULL); } dev->si_flags |= SI_UNMAPPED; sc->sc_dev = dev; dev->si_iosize_max = MAXPHYS; dev->si_drv2 = cp; error = init_dumpdev(dev); if (error != 0) printf("%s: init_dumpdev() failed (gp->name=%s, error=%d)\n", __func__, gp->name, error); g_dev_attrchanged(cp, "GEOM::physpath"); snprintf(buf, sizeof(buf), "cdev=%s", gp->name); devctl_notify_f("GEOM", "DEV", "CREATE", buf, M_WAITOK); /* * Now add all the aliases for this drive */ LIST_FOREACH(gap, &pp->geom->aliases, ga_next) { error = make_dev_alias_p(MAKEDEV_CHECKNAME | MAKEDEV_WAITOK, &adev, dev, "%s", gap->ga_alias); if (error) { printf("%s: make_dev_alias_p() failed (name=%s, error=%d)\n", __func__, gap->ga_alias, error); continue; } snprintf(buf, sizeof(buf), "cdev=%s", gap->ga_alias); devctl_notify_f("GEOM", "DEV", "CREATE", buf, M_WAITOK); } return (gp); } static int g_dev_open(struct cdev *dev, int flags, int fmt, struct thread *td) { struct g_consumer *cp; struct g_dev_softc *sc; int error, r, w, e; cp = dev->si_drv2; if (cp == NULL) return (ENXIO); /* g_dev_taste() not done yet */ g_trace(G_T_ACCESS, "g_dev_open(%s, %d, %d, %p)", cp->geom->name, flags, fmt, td); r = flags & FREAD ? 1 : 0; w = flags & FWRITE ? 1 : 0; #ifdef notyet e = flags & O_EXCL ? 1 : 0; #else e = 0; #endif /* * This happens on attempt to open a device node with O_EXEC. */ if (r + w + e == 0) return (EINVAL); if (w) { /* * When running in very secure mode, do not allow * opens for writing of any disks. 
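 * (securelevel_ge(9) returns EPERM once the securelevel is 2 or
 * higher, which blocks the open here.)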
*/ error = securelevel_ge(td->td_ucred, 2); if (error) return (error); } g_topology_lock(); error = g_access(cp, r, w, e); g_topology_unlock(); if (error == 0) { sc = cp->private; mtx_lock(&sc->sc_mtx); if (sc->sc_open == 0 && (sc->sc_active & SC_A_ACTIVE) != 0) wakeup(&sc->sc_active); sc->sc_open += r + w + e; if (sc->sc_open == 0) atomic_clear_int(&sc->sc_active, SC_A_OPEN); else atomic_set_int(&sc->sc_active, SC_A_OPEN); mtx_unlock(&sc->sc_mtx); } return (error); } static int g_dev_close(struct cdev *dev, int flags, int fmt, struct thread *td) { struct g_consumer *cp; struct g_dev_softc *sc; int error, r, w, e; cp = dev->si_drv2; if (cp == NULL) return (ENXIO); g_trace(G_T_ACCESS, "g_dev_close(%s, %d, %d, %p)", cp->geom->name, flags, fmt, td); r = flags & FREAD ? -1 : 0; w = flags & FWRITE ? -1 : 0; #ifdef notyet e = flags & O_EXCL ? -1 : 0; #else e = 0; #endif /* * The vgonel(9) - caused by eg. forced unmount of devfs - calls * VOP_CLOSE(9) on devfs vnode without any FREAD or FWRITE flags, * which would result in zero deltas, which in turn would cause * panic in g_access(9). * * Note that we cannot zero the counters (ie. do "r = cp->acr" * etc) instead, because the consumer might be opened in another * devfs instance. */ if (r + w + e == 0) return (EINVAL); sc = cp->private; mtx_lock(&sc->sc_mtx); sc->sc_open += r + w + e; if (sc->sc_open == 0) atomic_clear_int(&sc->sc_active, SC_A_OPEN); else atomic_set_int(&sc->sc_active, SC_A_OPEN); while (sc->sc_open == 0 && (sc->sc_active & SC_A_ACTIVE) != 0) msleep(&sc->sc_active, &sc->sc_mtx, 0, "g_dev_close", hz / 10); mtx_unlock(&sc->sc_mtx); g_topology_lock(); error = g_access(cp, r, w, e); g_topology_unlock(); return (error); } /* * XXX: Until we have unmessed the ioctl situation, there is a race against * XXX: a concurrent orphanization. We cannot close it by holding topology * XXX: since that would prevent us from doing our job, and stalling events * XXX: will break (actually: stall) the BSD disklabel hacks. */ static int g_dev_ioctl(struct cdev *dev, u_long cmd, caddr_t data, int fflag, struct thread *td) { struct g_consumer *cp; struct g_provider *pp; off_t offset, length, chunk, odd; int i, error; +#ifdef COMPAT_FREEBSD12 + struct diocskerneldump_arg kda_copy; +#endif cp = dev->si_drv2; pp = cp->provider; error = 0; KASSERT(cp->acr || cp->acw, ("Consumer with zero access count in g_dev_ioctl")); i = IOCPARM_LEN(cmd); switch (cmd) { case DIOCGSECTORSIZE: *(u_int *)data = cp->provider->sectorsize; if (*(u_int *)data == 0) error = ENOENT; break; case DIOCGMEDIASIZE: *(off_t *)data = cp->provider->mediasize; if (*(off_t *)data == 0) error = ENOENT; break; case DIOCGFWSECTORS: error = g_io_getattr("GEOM::fwsectors", cp, &i, data); if (error == 0 && *(u_int *)data == 0) error = ENOENT; break; case DIOCGFWHEADS: error = g_io_getattr("GEOM::fwheads", cp, &i, data); if (error == 0 && *(u_int *)data == 0) error = ENOENT; break; case DIOCGFRONTSTUFF: error = g_io_getattr("GEOM::frontstuff", cp, &i, data); break; #ifdef COMPAT_FREEBSD11 case DIOCSKERNELDUMP_FREEBSD11: { struct diocskerneldump_arg kda; + gone_in(13, "FreeBSD 11.x ABI compat"); + bzero(&kda, sizeof(kda)); kda.kda_encryption = KERNELDUMP_ENC_NONE; - kda.kda_enable = (uint8_t)*(u_int *)data; - if (kda.kda_enable == 0) - error = g_dev_setdumpdev(NULL, NULL, td); + kda.kda_index = (*(u_int *)data ? 
0 : KDA_REMOVE_ALL); + if (kda.kda_index == KDA_REMOVE_ALL) + error = dumper_remove(devtoname(dev), &kda); else - error = g_dev_setdumpdev(dev, &kda, td); + error = g_dev_setdumpdev(dev, &kda); break; } +#endif +#ifdef COMPAT_FREEBSD12 + case DIOCSKERNELDUMP_FREEBSD12: + { + struct diocskerneldump_arg_freebsd12 *kda12; + + gone_in(14, "FreeBSD 12.x ABI compat"); + + kda12 = (void *)data; + memcpy(&kda_copy, kda12, sizeof(kda_copy)); + kda_copy.kda_index = (kda12->kda12_enable ? + 0 : KDA_REMOVE_ALL); + + explicit_bzero(kda12, sizeof(*kda12)); + /* Kludge to pass kda_copy to kda in fallthrough. */ + data = (void *)&kda_copy; + } + /* FALLTHROUGH */ #endif case DIOCSKERNELDUMP: { struct diocskerneldump_arg *kda; uint8_t *encryptedkey; kda = (struct diocskerneldump_arg *)data; - if (kda->kda_enable == 0) { - error = g_dev_setdumpdev(NULL, NULL, td); + if (kda->kda_index == KDA_REMOVE_ALL || + kda->kda_index == KDA_REMOVE_DEV || + kda->kda_index == KDA_REMOVE) { + error = dumper_remove(devtoname(dev), kda); + explicit_bzero(kda, sizeof(*kda)); break; } if (kda->kda_encryption != KERNELDUMP_ENC_NONE) { - if (kda->kda_encryptedkeysize <= 0 || + if (kda->kda_encryptedkeysize == 0 || kda->kda_encryptedkeysize > KERNELDUMP_ENCKEY_MAX_SIZE) { + explicit_bzero(kda, sizeof(*kda)); return (EINVAL); } encryptedkey = malloc(kda->kda_encryptedkeysize, M_TEMP, M_WAITOK); error = copyin(kda->kda_encryptedkey, encryptedkey, kda->kda_encryptedkeysize); } else { encryptedkey = NULL; } if (error == 0) { kda->kda_encryptedkey = encryptedkey; - error = g_dev_setdumpdev(dev, kda, td); + error = g_dev_setdumpdev(dev, kda); } if (encryptedkey != NULL) { explicit_bzero(encryptedkey, kda->kda_encryptedkeysize); free(encryptedkey, M_TEMP); } explicit_bzero(kda, sizeof(*kda)); break; } case DIOCGFLUSH: error = g_io_flush(cp); break; case DIOCGDELETE: offset = ((off_t *)data)[0]; length = ((off_t *)data)[1]; if ((offset % cp->provider->sectorsize) != 0 || (length % cp->provider->sectorsize) != 0 || length <= 0) { printf("%s: offset=%jd length=%jd\n", __func__, offset, length); error = EINVAL; break; } if ((cp->provider->mediasize > 0) && (offset >= cp->provider->mediasize)) { /* * Catch out-of-bounds requests here. The problem is * that due to historical GEOM I/O implementation * peculatities, g_delete_data() would always return * success for requests starting just the next byte * after providers media boundary. Condition check on * non-zero media size, since that condition would * (most likely) cause ENXIO instead. */ error = EIO; break; } while (length > 0) { chunk = length; if (g_dev_del_max_sectors != 0 && chunk > g_dev_del_max_sectors * cp->provider->sectorsize) { chunk = g_dev_del_max_sectors * cp->provider->sectorsize; if (cp->provider->stripesize > 0) { odd = (offset + chunk + cp->provider->stripeoffset) % cp->provider->stripesize; if (chunk > odd) chunk -= odd; } } error = g_delete_data(cp, offset, chunk); length -= chunk; offset += chunk; if (error) break; /* * Since the request size can be large, the service * time can be is likewise. We make this ioctl * interruptible by checking for signals for each bio. 
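 * A pending signal simply terminates the loop, leaving the delete
 * partially completed.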
*/ if (SIGPENDING(td)) break; } break; case DIOCGIDENT: error = g_io_getattr("GEOM::ident", cp, &i, data); break; case DIOCGPROVIDERNAME: if (pp == NULL) return (ENOENT); strlcpy(data, pp->name, i); break; case DIOCGSTRIPESIZE: *(off_t *)data = cp->provider->stripesize; break; case DIOCGSTRIPEOFFSET: *(off_t *)data = cp->provider->stripeoffset; break; case DIOCGPHYSPATH: error = g_io_getattr("GEOM::physpath", cp, &i, data); if (error == 0 && *(char *)data == '\0') error = ENOENT; break; case DIOCGATTR: { struct diocgattr_arg *arg = (struct diocgattr_arg *)data; if (arg->len > sizeof(arg->value)) { error = EINVAL; break; } error = g_io_getattr(arg->name, cp, &arg->len, &arg->value); break; } case DIOCZONECMD: { struct disk_zone_args *zone_args =(struct disk_zone_args *)data; struct disk_zone_rep_entry *new_entries, *old_entries; struct disk_zone_report *rep; size_t alloc_size; old_entries = NULL; new_entries = NULL; rep = NULL; alloc_size = 0; if (zone_args->zone_cmd == DISK_ZONE_REPORT_ZONES) { rep = &zone_args->zone_params.report; #define MAXENTRIES (MAXPHYS / sizeof(struct disk_zone_rep_entry)) if (rep->entries_allocated > MAXENTRIES) rep->entries_allocated = MAXENTRIES; alloc_size = rep->entries_allocated * sizeof(struct disk_zone_rep_entry); if (alloc_size != 0) new_entries = g_malloc(alloc_size, M_WAITOK| M_ZERO); old_entries = rep->entries; rep->entries = new_entries; } error = g_io_zonecmd(zone_args, cp); if (zone_args->zone_cmd == DISK_ZONE_REPORT_ZONES && alloc_size != 0 && error == 0) error = copyout(new_entries, old_entries, alloc_size); if (old_entries != NULL && rep != NULL) rep->entries = old_entries; if (new_entries != NULL) g_free(new_entries); break; } default: if (cp->provider->geom->ioctl != NULL) { error = cp->provider->geom->ioctl(cp->provider, cmd, data, fflag, td); } else { error = ENOIOCTL; } } return (error); } static void g_dev_done(struct bio *bp2) { struct g_consumer *cp; struct g_dev_softc *sc; struct bio *bp; int active; cp = bp2->bio_from; sc = cp->private; bp = bp2->bio_parent; bp->bio_error = bp2->bio_error; bp->bio_completed = bp2->bio_completed; bp->bio_resid = bp->bio_length - bp2->bio_completed; if (bp2->bio_cmd == BIO_ZONE) bcopy(&bp2->bio_zone, &bp->bio_zone, sizeof(bp->bio_zone)); if (bp2->bio_error != 0) { g_trace(G_T_BIO, "g_dev_done(%p) had error %d", bp2, bp2->bio_error); bp->bio_flags |= BIO_ERROR; } else { g_trace(G_T_BIO, "g_dev_done(%p/%p) resid %ld completed %jd", bp2, bp, bp2->bio_resid, (intmax_t)bp2->bio_completed); } g_destroy_bio(bp2); active = atomic_fetchadd_int(&sc->sc_active, -1) - 1; if ((active & SC_A_ACTIVE) == 0) { if ((active & SC_A_OPEN) == 0) wakeup(&sc->sc_active); if (active & SC_A_DESTROY) g_post_event(g_dev_destroy, cp, M_NOWAIT, NULL); } biodone(bp); } static void g_dev_strategy(struct bio *bp) { struct g_consumer *cp; struct bio *bp2; struct cdev *dev; struct g_dev_softc *sc; KASSERT(bp->bio_cmd == BIO_READ || bp->bio_cmd == BIO_WRITE || bp->bio_cmd == BIO_DELETE || bp->bio_cmd == BIO_FLUSH || bp->bio_cmd == BIO_ZONE, ("Wrong bio_cmd bio=%p cmd=%d", bp, bp->bio_cmd)); dev = bp->bio_dev; cp = dev->si_drv2; sc = cp->private; KASSERT(cp->acr || cp->acw, ("Consumer with zero access count in g_dev_strategy")); biotrack(bp, __func__); #ifdef INVARIANTS if ((bp->bio_offset % cp->provider->sectorsize) != 0 || (bp->bio_bcount % cp->provider->sectorsize) != 0) { bp->bio_resid = bp->bio_bcount; biofinish(bp, NULL, EINVAL); return; } #endif KASSERT(sc->sc_open > 0, ("Closed device in g_dev_strategy")); 
atomic_add_int(&sc->sc_active, 1); for (;;) { /* * XXX: This is not an ideal solution, but I believe it to * XXX: deadlock safely, all things considered. */ bp2 = g_clone_bio(bp); if (bp2 != NULL) break; pause("gdstrat", hz / 10); } KASSERT(bp2 != NULL, ("XXX: ENOMEM in a bad place")); bp2->bio_done = g_dev_done; g_trace(G_T_BIO, "g_dev_strategy(%p/%p) offset %jd length %jd data %p cmd %d", bp, bp2, (intmax_t)bp->bio_offset, (intmax_t)bp2->bio_length, bp2->bio_data, bp2->bio_cmd); g_io_request(bp2, cp); KASSERT(cp->acr || cp->acw, ("g_dev_strategy raced with g_dev_close and lost")); } /* * g_dev_callback() * * Called by devfs when asynchronous device destruction is completed. * - Mark that we have no attached device any more. * - If there are no outstanding requests, schedule geom destruction. * Otherwise destruction will be scheduled later by g_dev_done(). */ static void g_dev_callback(void *arg) { struct g_consumer *cp; struct g_dev_softc *sc; int active; cp = arg; sc = cp->private; g_trace(G_T_TOPOLOGY, "g_dev_callback(%p(%s))", cp, cp->geom->name); sc->sc_dev = NULL; sc->sc_alias = NULL; active = atomic_fetchadd_int(&sc->sc_active, SC_A_DESTROY); if ((active & SC_A_ACTIVE) == 0) g_post_event(g_dev_destroy, cp, M_WAITOK, NULL); } /* * g_dev_orphan() * * Called from below when the provider orphaned us. * - Clear any dump settings. * - Request asynchronous device destruction to prevent any more requests * from coming in. The provider is already marked with an error, so * anything which comes in the interim will be returned immediately. */ static void g_dev_orphan(struct g_consumer *cp) { struct cdev *dev; struct g_dev_softc *sc; g_topology_assert(); sc = cp->private; dev = sc->sc_dev; g_trace(G_T_TOPOLOGY, "g_dev_orphan(%p(%s))", cp, cp->geom->name); /* Reset any dump-area set on this device */ - if (dev->si_flags & SI_DUMPDEV) - (void)clear_dumper(curthread); + if (dev->si_flags & SI_DUMPDEV) { + struct diocskerneldump_arg kda; + + bzero(&kda, sizeof(kda)); + kda.kda_index = KDA_REMOVE_DEV; + (void)dumper_remove(devtoname(dev), &kda); + } /* Destroy the struct cdev *so we get no more requests */ delist_dev(dev); destroy_dev_sched_cb(dev, g_dev_callback, cp); } DECLARE_GEOM_CLASS(g_dev_class, g_dev); diff --git a/sys/geom/raid/g_raid.h b/sys/geom/raid/g_raid.h index d295ed32577e..e693e3c00504 100644 --- a/sys/geom/raid/g_raid.h +++ b/sys/geom/raid/g_raid.h @@ -1,471 +1,471 @@ /*- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD * * Copyright (c) 2010 Alexander Motin * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _G_RAID_H_ #define _G_RAID_H_ #include #include #include #include #ifdef _KERNEL #include #endif #define G_RAID_CLASS_NAME "RAID" #define G_RAID_MAGIC "GEOM::RAID" #define G_RAID_VERSION 0 struct g_raid_md_object; struct g_raid_tr_object; #define G_RAID_DEVICE_FLAG_NOAUTOSYNC 0x0000000000000001ULL #define G_RAID_DEVICE_FLAG_NOFAILSYNC 0x0000000000000002ULL #define G_RAID_DEVICE_FLAG_MASK (G_RAID_DEVICE_FLAG_NOAUTOSYNC | \ G_RAID_DEVICE_FLAG_NOFAILSYNC) #ifdef _KERNEL extern u_int g_raid_aggressive_spare; extern u_int g_raid_debug; extern int g_raid_enable; extern int g_raid_read_err_thresh; extern u_int g_raid_start_timeout; extern struct g_class g_raid_class; #define G_RAID_DEBUG(lvl, fmt, ...) do { \ if (g_raid_debug >= (lvl)) { \ if (g_raid_debug > 0) { \ printf("GEOM_RAID[%u]: " fmt "\n", \ lvl, ## __VA_ARGS__); \ } else { \ printf("GEOM_RAID: " fmt "\n", \ ## __VA_ARGS__); \ } \ } \ } while (0) #define G_RAID_DEBUG1(lvl, sc, fmt, ...) do { \ if (g_raid_debug >= (lvl)) { \ if (g_raid_debug > 0) { \ printf("GEOM_RAID[%u]: %s: " fmt "\n", \ lvl, (sc)->sc_name, ## __VA_ARGS__); \ } else { \ printf("GEOM_RAID: %s: " fmt "\n", \ (sc)->sc_name, ## __VA_ARGS__); \ } \ } \ } while (0) #define G_RAID_LOGREQ(lvl, bp, fmt, ...) do { \ if (g_raid_debug >= (lvl)) { \ if (g_raid_debug > 0) { \ printf("GEOM_RAID[%u]: " fmt " ", \ lvl, ## __VA_ARGS__); \ } else \ printf("GEOM_RAID: " fmt " ", ## __VA_ARGS__); \ g_print_bio(bp); \ printf("\n"); \ } \ } while (0) /* * Flags we use to distinguish I/O initiated by the TR layer to maintain * the volume's characteristics, fix subdisks, extra copies of data, etc. * * G_RAID_BIO_FLAG_SYNC I/O to update an extra copy of the data * for RAID volumes that maintain extra data * and need to rebuild that data. * G_RAID_BIO_FLAG_REMAP I/O done to try to provoke a subdisk into * doing some desirable action such as bad * block remapping after we detect a bad part * of the disk. * G_RAID_BIO_FLAG_LOCKED I/O holds range lock that should re released. * * and the following meta item: * G_RAID_BIO_FLAG_SPECIAL And of the I/O flags that need to make it * through the range locking which would * otherwise defer the I/O until after that * range is unlocked. */ #define G_RAID_BIO_FLAG_SYNC 0x01 #define G_RAID_BIO_FLAG_REMAP 0x02 #define G_RAID_BIO_FLAG_SPECIAL \ (G_RAID_BIO_FLAG_SYNC|G_RAID_BIO_FLAG_REMAP) #define G_RAID_BIO_FLAG_LOCKED 0x80 struct g_raid_lock { off_t l_offset; off_t l_length; void *l_callback_arg; int l_pending; LIST_ENTRY(g_raid_lock) l_next; }; #define G_RAID_EVENT_WAIT 0x01 #define G_RAID_EVENT_VOLUME 0x02 #define G_RAID_EVENT_SUBDISK 0x04 #define G_RAID_EVENT_DISK 0x08 #define G_RAID_EVENT_DONE 0x10 struct g_raid_event { void *e_tgt; int e_event; int e_flags; int e_error; TAILQ_ENTRY(g_raid_event) e_next; }; #define G_RAID_DISK_S_NONE 0x00 /* State is unknown. */ #define G_RAID_DISK_S_OFFLINE 0x01 /* Missing disk placeholder. */ #define G_RAID_DISK_S_DISABLED 0x02 /* Disabled. 
*/ #define G_RAID_DISK_S_FAILED 0x03 /* Failed. */ #define G_RAID_DISK_S_STALE_FAILED 0x04 /* Old failed. */ #define G_RAID_DISK_S_SPARE 0x05 /* Hot-spare. */ #define G_RAID_DISK_S_STALE 0x06 /* Old disk, unused now. */ #define G_RAID_DISK_S_ACTIVE 0x07 /* Operational. */ #define G_RAID_DISK_E_DISCONNECTED 0x01 struct g_raid_disk { struct g_raid_softc *d_softc; /* Back-pointer to softc. */ struct g_consumer *d_consumer; /* GEOM disk consumer. */ void *d_md_data; /* Disk's metadata storage. */ - struct g_kerneldump d_kd; /* Kernel dumping method/args. */ int d_candelete; /* BIO_DELETE supported. */ uint64_t d_flags; /* Additional flags. */ u_int d_state; /* Disk state. */ u_int d_load; /* Disk average load. */ off_t d_last_offset; /* Last head offset. */ int d_read_errs; /* Count of the read errors */ TAILQ_HEAD(, g_raid_subdisk) d_subdisks; /* List of subdisks. */ TAILQ_ENTRY(g_raid_disk) d_next; /* Next disk in the node. */ + struct g_kerneldump d_kd; /* Kernel dumping method/args. */ }; #define G_RAID_SUBDISK_S_NONE 0x00 /* Absent. */ #define G_RAID_SUBDISK_S_FAILED 0x01 /* Failed. */ #define G_RAID_SUBDISK_S_NEW 0x02 /* Blank. */ #define G_RAID_SUBDISK_S_REBUILD 0x03 /* Blank + rebuild. */ #define G_RAID_SUBDISK_S_UNINITIALIZED 0x04 /* Disk of the new volume. */ #define G_RAID_SUBDISK_S_STALE 0x05 /* Dirty. */ #define G_RAID_SUBDISK_S_RESYNC 0x06 /* Dirty + check/repair. */ #define G_RAID_SUBDISK_S_ACTIVE 0x07 /* Usable. */ #define G_RAID_SUBDISK_E_NEW 0x01 /* A new subdisk has arrived */ #define G_RAID_SUBDISK_E_FAILED 0x02 /* A subdisk failed, but remains in volume */ #define G_RAID_SUBDISK_E_DISCONNECTED 0x03 /* A subdisk removed from volume. */ #define G_RAID_SUBDISK_E_FIRST_TR_PRIVATE 0x80 /* translation private events */ #define G_RAID_SUBDISK_POS(sd) \ ((sd)->sd_disk ? ((sd)->sd_disk->d_last_offset - (sd)->sd_offset) : 0) #define G_RAID_SUBDISK_TRACK_SIZE (1 * 1024 * 1024) #define G_RAID_SUBDISK_LOAD(sd) \ ((sd)->sd_disk ? ((sd)->sd_disk->d_load) : 0) #define G_RAID_SUBDISK_LOAD_SCALE 256 struct g_raid_subdisk { struct g_raid_softc *sd_softc; /* Back-pointer to softc. */ struct g_raid_disk *sd_disk; /* Where this subdisk lives. */ struct g_raid_volume *sd_volume; /* Volume, sd is a part of. */ off_t sd_offset; /* Offset on the disk. */ off_t sd_size; /* Size on the disk. */ u_int sd_pos; /* Position in volume. */ u_int sd_state; /* Subdisk state. */ off_t sd_rebuild_pos; /* Rebuild position. */ int sd_recovery; /* Count of recovery reqs. */ TAILQ_ENTRY(g_raid_subdisk) sd_next; /* Next subdisk on disk. 
*/ }; #define G_RAID_MAX_SUBDISKS 16 #define G_RAID_MAX_VOLUMENAME 32 #define G_RAID_VOLUME_S_STARTING 0x00 #define G_RAID_VOLUME_S_BROKEN 0x01 #define G_RAID_VOLUME_S_DEGRADED 0x02 #define G_RAID_VOLUME_S_SUBOPTIMAL 0x03 #define G_RAID_VOLUME_S_OPTIMAL 0x04 #define G_RAID_VOLUME_S_UNSUPPORTED 0x05 #define G_RAID_VOLUME_S_STOPPED 0x06 #define G_RAID_VOLUME_S_ALIVE(s) \ ((s) == G_RAID_VOLUME_S_DEGRADED || \ (s) == G_RAID_VOLUME_S_SUBOPTIMAL || \ (s) == G_RAID_VOLUME_S_OPTIMAL) #define G_RAID_VOLUME_E_DOWN 0x00 #define G_RAID_VOLUME_E_UP 0x01 #define G_RAID_VOLUME_E_START 0x10 #define G_RAID_VOLUME_E_STARTMD 0x11 #define G_RAID_VOLUME_RL_RAID0 0x00 #define G_RAID_VOLUME_RL_RAID1 0x01 #define G_RAID_VOLUME_RL_RAID3 0x03 #define G_RAID_VOLUME_RL_RAID4 0x04 #define G_RAID_VOLUME_RL_RAID5 0x05 #define G_RAID_VOLUME_RL_RAID6 0x06 #define G_RAID_VOLUME_RL_RAIDMDF 0x07 #define G_RAID_VOLUME_RL_RAID1E 0x11 #define G_RAID_VOLUME_RL_SINGLE 0x0f #define G_RAID_VOLUME_RL_CONCAT 0x1f #define G_RAID_VOLUME_RL_RAID5E 0x15 #define G_RAID_VOLUME_RL_RAID5EE 0x25 #define G_RAID_VOLUME_RL_RAID5R 0x35 #define G_RAID_VOLUME_RL_UNKNOWN 0xff #define G_RAID_VOLUME_RLQ_NONE 0x00 #define G_RAID_VOLUME_RLQ_R1SM 0x00 #define G_RAID_VOLUME_RLQ_R1MM 0x01 #define G_RAID_VOLUME_RLQ_R3P0 0x00 #define G_RAID_VOLUME_RLQ_R3PN 0x01 #define G_RAID_VOLUME_RLQ_R4P0 0x00 #define G_RAID_VOLUME_RLQ_R4PN 0x01 #define G_RAID_VOLUME_RLQ_R5RA 0x00 #define G_RAID_VOLUME_RLQ_R5RS 0x01 #define G_RAID_VOLUME_RLQ_R5LA 0x02 #define G_RAID_VOLUME_RLQ_R5LS 0x03 #define G_RAID_VOLUME_RLQ_R6RA 0x00 #define G_RAID_VOLUME_RLQ_R6RS 0x01 #define G_RAID_VOLUME_RLQ_R6LA 0x02 #define G_RAID_VOLUME_RLQ_R6LS 0x03 #define G_RAID_VOLUME_RLQ_RMDFRA 0x00 #define G_RAID_VOLUME_RLQ_RMDFRS 0x01 #define G_RAID_VOLUME_RLQ_RMDFLA 0x02 #define G_RAID_VOLUME_RLQ_RMDFLS 0x03 #define G_RAID_VOLUME_RLQ_R1EA 0x00 #define G_RAID_VOLUME_RLQ_R1EO 0x01 #define G_RAID_VOLUME_RLQ_R5ERA 0x00 #define G_RAID_VOLUME_RLQ_R5ERS 0x01 #define G_RAID_VOLUME_RLQ_R5ELA 0x02 #define G_RAID_VOLUME_RLQ_R5ELS 0x03 #define G_RAID_VOLUME_RLQ_R5EERA 0x00 #define G_RAID_VOLUME_RLQ_R5EERS 0x01 #define G_RAID_VOLUME_RLQ_R5EELA 0x02 #define G_RAID_VOLUME_RLQ_R5EELS 0x03 #define G_RAID_VOLUME_RLQ_R5RRA 0x00 #define G_RAID_VOLUME_RLQ_R5RRS 0x01 #define G_RAID_VOLUME_RLQ_R5RLA 0x02 #define G_RAID_VOLUME_RLQ_R5RLS 0x03 #define G_RAID_VOLUME_RLQ_UNKNOWN 0xff struct g_raid_volume; struct g_raid_volume { struct g_raid_softc *v_softc; /* Back-pointer to softc. */ struct g_provider *v_provider; /* GEOM provider. */ struct g_raid_subdisk v_subdisks[G_RAID_MAX_SUBDISKS]; /* Subdisks of this volume. */ void *v_md_data; /* Volume's metadata storage. */ struct g_raid_tr_object *v_tr; /* Transformation object. */ char v_name[G_RAID_MAX_VOLUMENAME]; /* Volume name. */ u_int v_state; /* Volume state. */ u_int v_raid_level; /* Array RAID level. */ u_int v_raid_level_qualifier; /* RAID level det. */ u_int v_disks_count; /* Number of disks in array. */ u_int v_mdf_pdisks; /* Number of parity disks in RAIDMDF array. */ uint16_t v_mdf_polynomial; /* Polynomial for RAIDMDF. */ uint8_t v_mdf_method; /* Generation method for RAIDMDF. */ u_int v_strip_size; /* Array strip size. */ u_int v_rotate_parity; /* Rotate RAID5R parity after this number of stripes. */ u_int v_sectorsize; /* Volume sector size. */ off_t v_mediasize; /* Volume media size. */ struct bio_queue_head v_inflight; /* In-flight write requests. */ struct bio_queue_head v_locked; /* Blocked I/O requests. */ LIST_HEAD(, g_raid_lock) v_locks; /* List of locked regions.
*/ int v_pending_lock; /* writes to locked region */ int v_dirty; /* Volume is DIRTY. */ struct timeval v_last_done; /* Time of the last I/O. */ time_t v_last_write; /* Time of the last write. */ u_int v_writes; /* Number of active writes. */ struct root_hold_token *v_rootmount; /* Root mount delay token. */ int v_starting; /* Volume is starting */ int v_stopping; /* Volume is stopping */ int v_provider_open; /* Number of opens. */ int v_global_id; /* Global volume ID (rX). */ int v_read_only; /* Volume is read-only. */ TAILQ_ENTRY(g_raid_volume) v_next; /* List of volumes entry. */ LIST_ENTRY(g_raid_volume) v_global_next; /* Global list entry. */ }; #define G_RAID_NODE_E_WAKE 0x00 #define G_RAID_NODE_E_START 0x01 struct g_raid_softc { struct g_raid_md_object *sc_md; /* Metadata object. */ struct g_geom *sc_geom; /* GEOM class instance. */ uint64_t sc_flags; /* Additional flags. */ TAILQ_HEAD(, g_raid_volume) sc_volumes; /* List of volumes. */ TAILQ_HEAD(, g_raid_disk) sc_disks; /* List of disks. */ struct sx sc_lock; /* Main node lock. */ struct proc *sc_worker; /* Worker process. */ struct mtx sc_queue_mtx; /* Worker queues lock. */ TAILQ_HEAD(, g_raid_event) sc_events; /* Worker events queue. */ struct bio_queue_head sc_queue; /* Worker I/O queue. */ int sc_stopping; /* Node is stopping */ }; #define sc_name sc_geom->name SYSCTL_DECL(_kern_geom_raid); /* * KOBJ parent class of metadata processing modules. */ struct g_raid_md_class { KOBJ_CLASS_FIELDS; int mdc_enable; int mdc_priority; LIST_ENTRY(g_raid_md_class) mdc_list; }; /* * KOBJ instance of metadata processing module. */ struct g_raid_md_object { KOBJ_FIELDS; struct g_raid_md_class *mdo_class; struct g_raid_softc *mdo_softc; /* Back-pointer to softc. */ }; int g_raid_md_modevent(module_t, int, void *); #define G_RAID_MD_DECLARE(name, label) \ static moduledata_t g_raid_md_##name##_mod = { \ "g_raid_md_" __XSTRING(name), \ g_raid_md_modevent, \ &g_raid_md_##name##_class \ }; \ DECLARE_MODULE(g_raid_md_##name, g_raid_md_##name##_mod, \ SI_SUB_DRIVERS, SI_ORDER_SECOND); \ MODULE_DEPEND(g_raid_md_##name, geom_raid, 0, 0, 0); \ SYSCTL_NODE(_kern_geom_raid, OID_AUTO, name, CTLFLAG_RD, \ NULL, label " metadata module"); \ SYSCTL_INT(_kern_geom_raid_##name, OID_AUTO, enable, \ CTLFLAG_RWTUN, &g_raid_md_##name##_class.mdc_enable, 0, \ "Enable " label " metadata format taste") /* * KOBJ parent class of data transformation modules. */ struct g_raid_tr_class { KOBJ_CLASS_FIELDS; int trc_enable; int trc_priority; int trc_accept_unmapped; LIST_ENTRY(g_raid_tr_class) trc_list; }; /* * KOBJ instance of data transformation module. */ struct g_raid_tr_object { KOBJ_FIELDS; struct g_raid_tr_class *tro_class; struct g_raid_volume *tro_volume; /* Back-pointer to volume. 
*/ }; int g_raid_tr_modevent(module_t, int, void *); #define G_RAID_TR_DECLARE(name, label) \ static moduledata_t g_raid_tr_##name##_mod = { \ "g_raid_tr_" __XSTRING(name), \ g_raid_tr_modevent, \ &g_raid_tr_##name##_class \ }; \ DECLARE_MODULE(g_raid_tr_##name, g_raid_tr_##name##_mod, \ SI_SUB_DRIVERS, SI_ORDER_FIRST); \ MODULE_DEPEND(g_raid_tr_##name, geom_raid, 0, 0, 0); \ SYSCTL_NODE(_kern_geom_raid, OID_AUTO, name, CTLFLAG_RD, \ NULL, label " transformation module"); \ SYSCTL_INT(_kern_geom_raid_##name, OID_AUTO, enable, \ CTLFLAG_RWTUN, &g_raid_tr_##name##_class.trc_enable, 0, \ "Enable " label " transformation module taste") const char * g_raid_volume_level2str(int level, int qual); int g_raid_volume_str2level(const char *str, int *level, int *qual); const char * g_raid_volume_state2str(int state); const char * g_raid_subdisk_state2str(int state); const char * g_raid_disk_state2str(int state); struct g_raid_softc * g_raid_create_node(struct g_class *mp, const char *name, struct g_raid_md_object *md); int g_raid_create_node_format(const char *format, struct gctl_req *req, struct g_geom **gp); struct g_raid_volume * g_raid_create_volume(struct g_raid_softc *sc, const char *name, int id); struct g_raid_disk * g_raid_create_disk(struct g_raid_softc *sc); const char * g_raid_get_diskname(struct g_raid_disk *disk); void g_raid_get_disk_info(struct g_raid_disk *disk); int g_raid_start_volume(struct g_raid_volume *vol); int g_raid_destroy_node(struct g_raid_softc *sc, int worker); int g_raid_destroy_volume(struct g_raid_volume *vol); int g_raid_destroy_disk(struct g_raid_disk *disk); void g_raid_iodone(struct bio *bp, int error); void g_raid_subdisk_iostart(struct g_raid_subdisk *sd, struct bio *bp); int g_raid_subdisk_kerneldump(struct g_raid_subdisk *sd, void *virtual, vm_offset_t physical, off_t offset, size_t length); struct g_consumer *g_raid_open_consumer(struct g_raid_softc *sc, const char *name); void g_raid_kill_consumer(struct g_raid_softc *sc, struct g_consumer *cp); void g_raid_report_disk_state(struct g_raid_disk *disk); void g_raid_change_disk_state(struct g_raid_disk *disk, int state); void g_raid_change_subdisk_state(struct g_raid_subdisk *sd, int state); void g_raid_change_volume_state(struct g_raid_volume *vol, int state); void g_raid_write_metadata(struct g_raid_softc *sc, struct g_raid_volume *vol, struct g_raid_subdisk *sd, struct g_raid_disk *disk); void g_raid_fail_disk(struct g_raid_softc *sc, struct g_raid_subdisk *sd, struct g_raid_disk *disk); void g_raid_tr_flush_common(struct g_raid_tr_object *tr, struct bio *bp); int g_raid_tr_kerneldump_common(struct g_raid_tr_object *tr, void *virtual, vm_offset_t physical, off_t offset, size_t length); u_int g_raid_ndisks(struct g_raid_softc *sc, int state); u_int g_raid_nsubdisks(struct g_raid_volume *vol, int state); u_int g_raid_nopens(struct g_raid_softc *sc); struct g_raid_subdisk * g_raid_get_subdisk(struct g_raid_volume *vol, int state); #define G_RAID_DESTROY_SOFT 0 #define G_RAID_DESTROY_DELAYED 1 #define G_RAID_DESTROY_HARD 2 int g_raid_destroy(struct g_raid_softc *sc, int how); int g_raid_event_send(void *arg, int event, int flags); int g_raid_lock_range(struct g_raid_volume *vol, off_t off, off_t len, struct bio *ignore, void *argp); int g_raid_unlock_range(struct g_raid_volume *vol, off_t off, off_t len); g_ctl_req_t g_raid_ctl; #endif /* _KERNEL */ #endif /* !_G_RAID_H_ */ diff --git a/sys/kern/kern_shutdown.c b/sys/kern/kern_shutdown.c index 2ac99440e783..f2e98144e38b 100644 --- a/sys/kern/kern_shutdown.c +++ 
b/sys/kern/kern_shutdown.c @@ -1,1585 +1,1717 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1986, 1988, 1991, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)kern_shutdown.c 8.3 (Berkeley) 1/21/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_ekcd.h" #include "opt_kdb.h" #include "opt_panic.h" +#include "opt_printf.h" #include "opt_sched.h" #include "opt_watchdog.h" #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static MALLOC_DEFINE(M_DUMPER, "dumper", "dumper block buffer"); #ifndef PANIC_REBOOT_WAIT_TIME #define PANIC_REBOOT_WAIT_TIME 15 /* default to 15 seconds */ #endif static int panic_reboot_wait_time = PANIC_REBOOT_WAIT_TIME; SYSCTL_INT(_kern, OID_AUTO, panic_reboot_wait_time, CTLFLAG_RWTUN, &panic_reboot_wait_time, 0, "Seconds to wait before rebooting after a panic"); /* * Note that stdarg.h and the ANSI style va_start macro is used for both * ANSI and traditional C compilers. 
*/ #include #ifdef KDB #ifdef KDB_UNATTENDED static int debugger_on_panic = 0; #else static int debugger_on_panic = 1; #endif SYSCTL_INT(_debug, OID_AUTO, debugger_on_panic, CTLFLAG_RWTUN | CTLFLAG_SECURE, &debugger_on_panic, 0, "Run debugger on kernel panic"); int debugger_on_trap = 0; SYSCTL_INT(_debug, OID_AUTO, debugger_on_trap, CTLFLAG_RWTUN | CTLFLAG_SECURE, &debugger_on_trap, 0, "Run debugger on kernel trap before panic"); #ifdef KDB_TRACE static int trace_on_panic = 1; static bool trace_all_panics = true; #else static int trace_on_panic = 0; static bool trace_all_panics = false; #endif SYSCTL_INT(_debug, OID_AUTO, trace_on_panic, CTLFLAG_RWTUN | CTLFLAG_SECURE, &trace_on_panic, 0, "Print stack trace on kernel panic"); SYSCTL_BOOL(_debug, OID_AUTO, trace_all_panics, CTLFLAG_RWTUN, &trace_all_panics, 0, "Print stack traces on secondary kernel panics"); #endif /* KDB */ static int sync_on_panic = 0; SYSCTL_INT(_kern, OID_AUTO, sync_on_panic, CTLFLAG_RWTUN, &sync_on_panic, 0, "Do a sync before rebooting from a panic"); static bool poweroff_on_panic = 0; SYSCTL_BOOL(_kern, OID_AUTO, poweroff_on_panic, CTLFLAG_RWTUN, &poweroff_on_panic, 0, "Do a power off instead of a reboot on a panic"); static bool powercycle_on_panic = 0; SYSCTL_BOOL(_kern, OID_AUTO, powercycle_on_panic, CTLFLAG_RWTUN, &powercycle_on_panic, 0, "Do a power cycle instead of a reboot on a panic"); static SYSCTL_NODE(_kern, OID_AUTO, shutdown, CTLFLAG_RW, 0, "Shutdown environment"); #ifndef DIAGNOSTIC static int show_busybufs; #else static int show_busybufs = 1; #endif SYSCTL_INT(_kern_shutdown, OID_AUTO, show_busybufs, CTLFLAG_RW, &show_busybufs, 0, ""); int suspend_blocked = 0; SYSCTL_INT(_kern, OID_AUTO, suspend_blocked, CTLFLAG_RW, &suspend_blocked, 0, "Block suspend due to a pending shutdown"); #ifdef EKCD FEATURE(ekcd, "Encrypted kernel crash dumps support"); MALLOC_DEFINE(M_EKCD, "ekcd", "Encrypted kernel crash dumps data"); struct kerneldumpcrypto { uint8_t kdc_encryption; uint8_t kdc_iv[KERNELDUMP_IV_MAX_SIZE]; keyInstance kdc_ki; cipherInstance kdc_ci; uint32_t kdc_dumpkeysize; struct kerneldumpkey kdc_dumpkey[]; }; #endif struct kerneldumpcomp { uint8_t kdc_format; struct compressor *kdc_stream; uint8_t *kdc_buf; size_t kdc_resid; }; static struct kerneldumpcomp *kerneldumpcomp_create(struct dumperinfo *di, uint8_t compression); static void kerneldumpcomp_destroy(struct dumperinfo *di); static int kerneldumpcomp_write_cb(void *base, size_t len, off_t off, void *arg); static int kerneldump_gzlevel = 6; SYSCTL_INT(_kern, OID_AUTO, kerneldump_gzlevel, CTLFLAG_RWTUN, &kerneldump_gzlevel, 0, "Kernel crash dump compression level"); /* * Variable panicstr contains argument to first call to panic; used as flag * to indicate that the kernel has already called panic. */ const char *panicstr; int dumping; /* system is dumping */ int rebooting; /* system is rebooting */ -static struct dumperinfo dumper; /* our selected dumper */ +/* + * Used to serialize between sysctl kern.shutdown.dumpdevname and list + * modifications via ioctl. + */ +static struct mtx dumpconf_list_lk; +MTX_SYSINIT(dumper_configs, &dumpconf_list_lk, "dumper config list", MTX_DEF); + +/* Our selected dumper(s). */ +static TAILQ_HEAD(dumpconflist, dumperinfo) dumper_configs = + TAILQ_HEAD_INITIALIZER(dumper_configs); /* Context information for dump-debuggers. */ static struct pcb dumppcb; /* Registers. */ lwpid_t dumptid; /* Thread ID. 
*/ static struct cdevsw reroot_cdevsw = { .d_version = D_VERSION, .d_name = "reroot", }; static void poweroff_wait(void *, int); static void shutdown_halt(void *junk, int howto); static void shutdown_panic(void *junk, int howto); static void shutdown_reset(void *junk, int howto); static int kern_reroot(void); /* register various local shutdown events */ static void shutdown_conf(void *unused) { EVENTHANDLER_REGISTER(shutdown_final, poweroff_wait, NULL, SHUTDOWN_PRI_FIRST); EVENTHANDLER_REGISTER(shutdown_final, shutdown_halt, NULL, SHUTDOWN_PRI_LAST + 100); EVENTHANDLER_REGISTER(shutdown_final, shutdown_panic, NULL, SHUTDOWN_PRI_LAST + 100); EVENTHANDLER_REGISTER(shutdown_final, shutdown_reset, NULL, SHUTDOWN_PRI_LAST + 200); } SYSINIT(shutdown_conf, SI_SUB_INTRINSIC, SI_ORDER_ANY, shutdown_conf, NULL); /* * The only reason this exists is to create the /dev/reroot/ directory, * used by reroot code in init(8) as a mountpoint for tmpfs. */ static void reroot_conf(void *unused) { int error; struct cdev *cdev; error = make_dev_p(MAKEDEV_CHECKNAME | MAKEDEV_WAITOK, &cdev, &reroot_cdevsw, NULL, UID_ROOT, GID_WHEEL, 0600, "reroot/reroot"); if (error != 0) { printf("%s: failed to create device node, error %d", __func__, error); } } SYSINIT(reroot_conf, SI_SUB_DEVFS, SI_ORDER_ANY, reroot_conf, NULL); /* * The system call that results in a reboot. */ /* ARGSUSED */ int sys_reboot(struct thread *td, struct reboot_args *uap) { int error; error = 0; #ifdef MAC error = mac_system_check_reboot(td->td_ucred, uap->opt); #endif if (error == 0) error = priv_check(td, PRIV_REBOOT); if (error == 0) { if (uap->opt & RB_REROOT) error = kern_reroot(); else kern_reboot(uap->opt); } return (error); } static void shutdown_nice_task_fn(void *arg, int pending __unused) { int howto; howto = (uintptr_t)arg; /* Send a signal to init(8) and have it shutdown the world. */ PROC_LOCK(initproc); if (howto & RB_POWEROFF) kern_psignal(initproc, SIGUSR2); else if (howto & RB_POWERCYCLE) kern_psignal(initproc, SIGWINCH); else if (howto & RB_HALT) kern_psignal(initproc, SIGUSR1); else kern_psignal(initproc, SIGINT); PROC_UNLOCK(initproc); } static struct task shutdown_nice_task = TASK_INITIALIZER(0, &shutdown_nice_task_fn, NULL); /* * Called by events that want to shut down.. e.g on a PC */ void shutdown_nice(int howto) { if (initproc != NULL && !SCHEDULER_STOPPED()) { shutdown_nice_task.ta_context = (void *)(uintptr_t)howto; taskqueue_enqueue(taskqueue_fast, &shutdown_nice_task); } else { /* * No init(8) running, or scheduler would not allow it * to run, so simply reboot. 
*/ kern_reboot(howto | RB_NOSYNC); } } static void print_uptime(void) { int f; struct timespec ts; getnanouptime(&ts); printf("Uptime: "); f = 0; if (ts.tv_sec >= 86400) { printf("%ldd", (long)ts.tv_sec / 86400); ts.tv_sec %= 86400; f = 1; } if (f || ts.tv_sec >= 3600) { printf("%ldh", (long)ts.tv_sec / 3600); ts.tv_sec %= 3600; f = 1; } if (f || ts.tv_sec >= 60) { printf("%ldm", (long)ts.tv_sec / 60); ts.tv_sec %= 60; f = 1; } printf("%lds\n", (long)ts.tv_sec); } int doadump(boolean_t textdump) { boolean_t coredump; int error; error = 0; if (dumping) return (EBUSY); - if (dumper.dumper == NULL) + if (TAILQ_EMPTY(&dumper_configs)) return (ENXIO); savectx(&dumppcb); dumptid = curthread->td_tid; dumping++; coredump = TRUE; #ifdef DDB if (textdump && textdump_pending) { coredump = FALSE; - textdump_dumpsys(&dumper); + textdump_dumpsys(TAILQ_FIRST(&dumper_configs)); } #endif - if (coredump) - error = dumpsys(&dumper); + if (coredump) { + struct dumperinfo *di; + + TAILQ_FOREACH(di, &dumper_configs, di_next) { + error = dumpsys(di); + if (error == 0) + break; + } + } dumping--; return (error); } /* * Shutdown the system cleanly to prepare for reboot, halt, or power off. */ void kern_reboot(int howto) { static int once = 0; /* * Normal paths here don't hold Giant, but we can wind up here * unexpectedly with it held. Drop it now so we don't have to * drop and pick it up elsewhere. The paths it is locking will * never be returned to, and it is preferable to preclude * deadlock than to lock against code that won't ever * continue. */ while (mtx_owned(&Giant)) mtx_unlock(&Giant); #if defined(SMP) /* * Bind us to the first CPU so that all shutdown code runs there. Some * systems don't shutdown properly (i.e., ACPI power off) if we * run on another processor. */ if (!SCHEDULER_STOPPED()) { thread_lock(curthread); sched_bind(curthread, CPU_FIRST()); thread_unlock(curthread); KASSERT(PCPU_GET(cpuid) == CPU_FIRST(), ("boot: not running on cpu 0")); } #endif /* We're in the process of rebooting. */ rebooting = 1; /* We are out of the debugger now. */ kdb_active = 0; /* * Do any callouts that should be done BEFORE syncing the filesystems. */ EVENTHANDLER_INVOKE(shutdown_pre_sync, howto); /* * Now sync filesystems */ if (!cold && (howto & RB_NOSYNC) == 0 && once == 0) { once = 1; bufshutdown(show_busybufs); } print_uptime(); cngrab(); /* * Ok, now do things that assume all filesystem activity has * been completed. */ EVENTHANDLER_INVOKE(shutdown_post_sync, howto); if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping) doadump(TRUE); /* Now that we're going to really halt the system... */ EVENTHANDLER_INVOKE(shutdown_final, howto); for(;;) ; /* safety against shutdown_reset not working */ /* NOTREACHED */ } /* * The system call that results in changing the rootfs. */ static int kern_reroot(void) { struct vnode *oldrootvnode, *vp; struct mount *mp, *devmp; int error; if (curproc != initproc) return (EPERM); /* * Mark the filesystem containing currently-running executable * (the temporary copy of init(8)) busy. 
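The doadump() change earlier in this hunk is the heart of the fallback behavior: walk the prioritized dumper_configs list and stop at the first dumper that succeeds. A runnable userland sketch of that loop, with invented toy types (toy_dumper, toy_doadump) standing in for struct dumperinfo and dumpsys():

#include <errno.h>
#include <stdio.h>
#include <sys/queue.h>

struct toy_dumper {
	const char	*name;
	int		(*dump)(void);	/* 0 on success, errno on failure */
	TAILQ_ENTRY(toy_dumper) link;
};

static TAILQ_HEAD(, toy_dumper) toy_configs =
    TAILQ_HEAD_INITIALIZER(toy_configs);

static int fail(void) { return (EIO); }		/* e.g. unreachable netdump */
static int succeed(void) { return (0); }	/* e.g. local disk dump */

/* Mirrors the new doadump() loop: the first successful dumper wins. */
static int
toy_doadump(void)
{
	struct toy_dumper *di;
	int error = ENXIO;	/* empty list: nothing configured */

	TAILQ_FOREACH(di, &toy_configs, link) {
		error = di->dump();
		printf("%s: %s\n", di->name, error == 0 ? "ok" : "failed");
		if (error == 0)
			break;	/* later entries are fallbacks only */
	}
	return (error);
}

int
main(void)
{
	struct toy_dumper nd = { .name = "netdump", .dump = fail };
	struct toy_dumper da = { .name = "da0", .dump = succeed };

	TAILQ_INSERT_TAIL(&toy_configs, &nd, link);
	TAILQ_INSERT_TAIL(&toy_configs, &da, link);
	return (toy_doadump());		/* tries netdump, falls back to da0 */
}

Run, this reports the netdump failure and then succeeds on da0, which is exactly the ordering the fallback list is meant to provide.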
*/ vp = curproc->p_textvp; error = vn_lock(vp, LK_SHARED); if (error != 0) return (error); mp = vp->v_mount; error = vfs_busy(mp, MBF_NOWAIT); if (error != 0) { vfs_ref(mp); VOP_UNLOCK(vp, 0); error = vfs_busy(mp, 0); vn_lock(vp, LK_SHARED | LK_RETRY); vfs_rel(mp); if (error != 0) { VOP_UNLOCK(vp, 0); return (ENOENT); } if (vp->v_iflag & VI_DOOMED) { VOP_UNLOCK(vp, 0); vfs_unbusy(mp); return (ENOENT); } } VOP_UNLOCK(vp, 0); /* * Remove the filesystem containing currently-running executable * from the mount list, to prevent it from being unmounted * by vfs_unmountall(), and to avoid confusing vfs_mountroot(). * * Also preserve /dev - forcibly unmounting it could cause driver * reinitialization. */ vfs_ref(rootdevmp); devmp = rootdevmp; rootdevmp = NULL; mtx_lock(&mountlist_mtx); TAILQ_REMOVE(&mountlist, mp, mnt_list); TAILQ_REMOVE(&mountlist, devmp, mnt_list); mtx_unlock(&mountlist_mtx); oldrootvnode = rootvnode; /* * Unmount everything except for the two filesystems preserved above. */ vfs_unmountall(); /* * Add /dev back; vfs_mountroot() will move it into its new place. */ mtx_lock(&mountlist_mtx); TAILQ_INSERT_HEAD(&mountlist, devmp, mnt_list); mtx_unlock(&mountlist_mtx); rootdevmp = devmp; vfs_rel(rootdevmp); /* * Mount the new rootfs. */ vfs_mountroot(); /* * Update all references to the old rootvnode. */ mountcheckdirs(oldrootvnode, rootvnode); /* * Add the temporary filesystem back and unbusy it. */ mtx_lock(&mountlist_mtx); TAILQ_INSERT_TAIL(&mountlist, mp, mnt_list); mtx_unlock(&mountlist_mtx); vfs_unbusy(mp); return (0); } /* * If the shutdown was a clean halt, behave accordingly. */ static void shutdown_halt(void *junk, int howto) { if (howto & RB_HALT) { printf("\n"); printf("The operating system has halted.\n"); printf("Please press any key to reboot.\n\n"); switch (cngetc()) { case -1: /* No console, just die */ cpu_halt(); /* NOTREACHED */ default: break; } } } /* * Check to see if the system paniced, pause and then reboot * according to the specified delay. */ static void shutdown_panic(void *junk, int howto) { int loop; if (howto & RB_DUMP) { if (panic_reboot_wait_time != 0) { if (panic_reboot_wait_time != -1) { printf("Automatic reboot in %d seconds - " "press a key on the console to abort\n", panic_reboot_wait_time); for (loop = panic_reboot_wait_time * 10; loop > 0; --loop) { DELAY(1000 * 100); /* 1/10th second */ /* Did user type a key? */ if (cncheckc() != -1) break; } if (!loop) return; } } else { /* zero time specified - reboot NOW */ return; } printf("--> Press a key on the console to reboot,\n"); printf("--> or switch off the system now.\n"); cngetc(); } } /* * Everything done, now reset */ static void shutdown_reset(void *junk, int howto) { printf("Rebooting...\n"); DELAY(1000000); /* wait 1 sec for printf's to complete and be read */ /* * Acquiring smp_ipi_mtx here has a double effect: * - it disables interrupts avoiding CPU0 preemption * by fast handlers (thus deadlocking against other CPUs) * - it avoids deadlocks against smp_rendezvous() or, more * generally, threads busy-waiting, with this spinlock held, * and waiting for responses by threads on other CPUs * (ie. smp_tlb_shootdown()). * * For the !SMP case it just needs to handle the former problem. 
*/ #ifdef SMP mtx_lock_spin(&smp_ipi_mtx); #else spinlock_enter(); #endif /* cpu_boot(howto); */ /* doesn't do anything at the moment */ cpu_reset(); /* NOTREACHED */ /* assuming reset worked */ } #if defined(WITNESS) || defined(INVARIANT_SUPPORT) static int kassert_warn_only = 0; #ifdef KDB static int kassert_do_kdb = 0; #endif #ifdef KTR static int kassert_do_ktr = 0; #endif static int kassert_do_log = 1; static int kassert_log_pps_limit = 4; static int kassert_log_mute_at = 0; static int kassert_log_panic_at = 0; static int kassert_suppress_in_panic = 0; static int kassert_warnings = 0; SYSCTL_NODE(_debug, OID_AUTO, kassert, CTLFLAG_RW, NULL, "kassert options"); #ifdef KASSERT_PANIC_OPTIONAL #define KASSERT_RWTUN CTLFLAG_RWTUN #else #define KASSERT_RWTUN CTLFLAG_RDTUN #endif SYSCTL_INT(_debug_kassert, OID_AUTO, warn_only, KASSERT_RWTUN, &kassert_warn_only, 0, "KASSERT triggers a panic (0) or just a warning (1)"); #ifdef KDB SYSCTL_INT(_debug_kassert, OID_AUTO, do_kdb, KASSERT_RWTUN, &kassert_do_kdb, 0, "KASSERT will enter the debugger"); #endif #ifdef KTR SYSCTL_UINT(_debug_kassert, OID_AUTO, do_ktr, KASSERT_RWTUN, &kassert_do_ktr, 0, "KASSERT does a KTR, set this to the KTRMASK you want"); #endif SYSCTL_INT(_debug_kassert, OID_AUTO, do_log, KASSERT_RWTUN, &kassert_do_log, 0, "If warn_only is enabled, log (1) or do not log (0) assertion violations"); SYSCTL_INT(_debug_kassert, OID_AUTO, warnings, KASSERT_RWTUN, &kassert_warnings, 0, "number of KASSERTs that have been triggered"); SYSCTL_INT(_debug_kassert, OID_AUTO, log_panic_at, KASSERT_RWTUN, &kassert_log_panic_at, 0, "max number of KASSERTS before we will panic"); SYSCTL_INT(_debug_kassert, OID_AUTO, log_pps_limit, KASSERT_RWTUN, &kassert_log_pps_limit, 0, "limit number of log messages per second"); SYSCTL_INT(_debug_kassert, OID_AUTO, log_mute_at, KASSERT_RWTUN, &kassert_log_mute_at, 0, "max number of KASSERTS to log"); SYSCTL_INT(_debug_kassert, OID_AUTO, suppress_in_panic, KASSERT_RWTUN, &kassert_suppress_in_panic, 0, "KASSERTs will be suppressed while handling a panic"); #undef KASSERT_RWTUN static int kassert_sysctl_kassert(SYSCTL_HANDLER_ARGS); SYSCTL_PROC(_debug_kassert, OID_AUTO, kassert, CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_SECURE, NULL, 0, kassert_sysctl_kassert, "I", "set to trigger a test kassert"); static int kassert_sysctl_kassert(SYSCTL_HANDLER_ARGS) { int error, i; error = sysctl_wire_old_buffer(req, sizeof(int)); if (error == 0) { i = 0; error = sysctl_handle_int(oidp, &i, 0, req); } if (error != 0 || req->newptr == NULL) return (error); KASSERT(0, ("kassert_sysctl_kassert triggered kassert %d", i)); return (0); } #ifdef KASSERT_PANIC_OPTIONAL /* * Called by KASSERT, this decides if we will panic * or if we will log via printf and/or ktr. */ void kassert_panic(const char *fmt, ...) { static char buf[256]; va_list ap; va_start(ap, fmt); (void)vsnprintf(buf, sizeof(buf), fmt, ap); va_end(ap); /* * If we are suppressing secondary panics, log the warning but do not * re-enter panic/kdb. */ if (panicstr != NULL && kassert_suppress_in_panic) { if (kassert_do_log) { printf("KASSERT failed: %s\n", buf); #ifdef KDB if (trace_all_panics && trace_on_panic) kdb_backtrace(); #endif } return; } /* * panic if we're not just warning, or if we've exceeded * kassert_log_panic_at warnings. 
*/ if (!kassert_warn_only || (kassert_log_panic_at > 0 && kassert_warnings >= kassert_log_panic_at)) { va_start(ap, fmt); vpanic(fmt, ap); /* NORETURN */ } #ifdef KTR if (kassert_do_ktr) CTR0(ktr_mask, buf); #endif /* KTR */ /* * log if we've not yet met the mute limit. */ if (kassert_do_log && (kassert_log_mute_at == 0 || kassert_warnings < kassert_log_mute_at)) { static struct timeval lasterr; static int curerr; if (ppsratecheck(&lasterr, &curerr, kassert_log_pps_limit)) { printf("KASSERT failed: %s\n", buf); kdb_backtrace(); } } #ifdef KDB if (kassert_do_kdb) { kdb_enter(KDB_WHY_KASSERT, buf); } #endif atomic_add_int(&kassert_warnings, 1); } #endif /* KASSERT_PANIC_OPTIONAL */ #endif /* * Panic is called on unresolvable fatal errors. It prints "panic: mesg", * and then reboots. If we are called twice, then we avoid trying to sync * the disks as this often leads to recursive panics. */ void panic(const char *fmt, ...) { va_list ap; va_start(ap, fmt); vpanic(fmt, ap); } void vpanic(const char *fmt, va_list ap) { #ifdef SMP cpuset_t other_cpus; #endif struct thread *td = curthread; int bootopt, newpanic; static char buf[256]; spinlock_enter(); #ifdef SMP /* * stop_cpus_hard(other_cpus) should prevent multiple CPUs from * concurrently entering panic. Only the winner will proceed * further. */ if (panicstr == NULL && !kdb_active) { other_cpus = all_cpus; CPU_CLR(PCPU_GET(cpuid), &other_cpus); stop_cpus_hard(other_cpus); } #endif /* * Ensure that the scheduler is stopped while panicking, even if panic * has been entered from kdb. */ td->td_stopsched = 1; bootopt = RB_AUTOBOOT; newpanic = 0; if (panicstr) bootopt |= RB_NOSYNC; else { bootopt |= RB_DUMP; panicstr = fmt; newpanic = 1; } if (newpanic) { (void)vsnprintf(buf, sizeof(buf), fmt, ap); panicstr = buf; cngrab(); printf("panic: %s\n", buf); } else { printf("panic: "); vprintf(fmt, ap); printf("\n"); } #ifdef SMP printf("cpuid = %d\n", PCPU_GET(cpuid)); #endif printf("time = %jd\n", (intmax_t )time_second); #ifdef KDB if ((newpanic || trace_all_panics) && trace_on_panic) kdb_backtrace(); if (debugger_on_panic) kdb_enter(KDB_WHY_PANIC, "panic"); #endif /*thread_lock(td); */ td->td_flags |= TDF_INPANIC; /* thread_unlock(td); */ if (!sync_on_panic) bootopt |= RB_NOSYNC; if (poweroff_on_panic) bootopt |= RB_POWEROFF; if (powercycle_on_panic) bootopt |= RB_POWERCYCLE; kern_reboot(bootopt); } /* * Support for poweroff delay. * * Please note that setting this delay too short might power off your machine * before the write cache on your hard disk has been flushed, leading to * soft-updates inconsistencies. */ #ifndef POWEROFF_DELAY # define POWEROFF_DELAY 5000 #endif static int poweroff_delay = POWEROFF_DELAY; SYSCTL_INT(_kern_shutdown, OID_AUTO, poweroff_delay, CTLFLAG_RW, &poweroff_delay, 0, "Delay before poweroff to write disk caches (msec)"); static void poweroff_wait(void *junk, int howto) { if ((howto & (RB_POWEROFF | RB_POWERCYCLE)) == 0 || poweroff_delay <= 0) return; DELAY(poweroff_delay * 1000); } /* * Some system processes (e.g. syncer) need to be stopped at appropriate * points in their main loops prior to a system shutdown, so that they * won't interfere with the shutdown process (e.g. by holding a disk buf * to cause sync to fail). For each of these system processes, register * shutdown_kproc() as a handler for one of shutdown events. 
*/ static int kproc_shutdown_wait = 60; SYSCTL_INT(_kern_shutdown, OID_AUTO, kproc_shutdown_wait, CTLFLAG_RW, &kproc_shutdown_wait, 0, "Max wait time (sec) to stop for each process"); void kproc_shutdown(void *arg, int howto) { struct proc *p; int error; if (panicstr) return; p = (struct proc *)arg; printf("Waiting (max %d seconds) for system process `%s' to stop... ", kproc_shutdown_wait, p->p_comm); error = kproc_suspend(p, kproc_shutdown_wait * hz); if (error == EWOULDBLOCK) printf("timed out\n"); else printf("done\n"); } void kthread_shutdown(void *arg, int howto) { struct thread *td; int error; if (panicstr) return; td = (struct thread *)arg; printf("Waiting (max %d seconds) for system thread `%s' to stop... ", kproc_shutdown_wait, td->td_name); error = kthread_suspend(td, kproc_shutdown_wait * hz); if (error == EWOULDBLOCK) printf("timed out\n"); else printf("done\n"); } -static char dumpdevname[sizeof(((struct cdev*)NULL)->si_name)]; -SYSCTL_STRING(_kern_shutdown, OID_AUTO, dumpdevname, CTLFLAG_RD, - dumpdevname, 0, "Device for kernel dumps"); +static int +dumpdevname_sysctl_handler(SYSCTL_HANDLER_ARGS) +{ + char buf[256]; + struct dumperinfo *di; + struct sbuf sb; + int error; + + error = sysctl_wire_old_buffer(req, 0); + if (error != 0) + return (error); + + sbuf_new_for_sysctl(&sb, buf, sizeof(buf), req); + + mtx_lock(&dumpconf_list_lk); + TAILQ_FOREACH(di, &dumper_configs, di_next) { + if (di != TAILQ_FIRST(&dumper_configs)) + sbuf_putc(&sb, ','); + sbuf_cat(&sb, di->di_devname); + } + mtx_unlock(&dumpconf_list_lk); + + error = sbuf_finish(&sb); + sbuf_delete(&sb); + return (error); +} +SYSCTL_PROC(_kern_shutdown, OID_AUTO, dumpdevname, CTLTYPE_STRING | CTLFLAG_RD, + &dumper_configs, 0, dumpdevname_sysctl_handler, "A", + "Device(s) for kernel dumps"); static int _dump_append(struct dumperinfo *di, void *virtual, vm_offset_t physical, size_t length); #ifdef EKCD static struct kerneldumpcrypto * kerneldumpcrypto_create(size_t blocksize, uint8_t encryption, const uint8_t *key, uint32_t encryptedkeysize, const uint8_t *encryptedkey) { struct kerneldumpcrypto *kdc; struct kerneldumpkey *kdk; uint32_t dumpkeysize; dumpkeysize = roundup2(sizeof(*kdk) + encryptedkeysize, blocksize); kdc = malloc(sizeof(*kdc) + dumpkeysize, M_EKCD, M_WAITOK | M_ZERO); arc4rand(kdc->kdc_iv, sizeof(kdc->kdc_iv), 0); kdc->kdc_encryption = encryption; switch (kdc->kdc_encryption) { case KERNELDUMP_ENC_AES_256_CBC: if (rijndael_makeKey(&kdc->kdc_ki, DIR_ENCRYPT, 256, key) <= 0) goto failed; break; default: goto failed; } kdc->kdc_dumpkeysize = dumpkeysize; kdk = kdc->kdc_dumpkey; kdk->kdk_encryption = kdc->kdc_encryption; memcpy(kdk->kdk_iv, kdc->kdc_iv, sizeof(kdk->kdk_iv)); kdk->kdk_encryptedkeysize = htod32(encryptedkeysize); memcpy(kdk->kdk_encryptedkey, encryptedkey, encryptedkeysize); return (kdc); failed: explicit_bzero(kdc, sizeof(*kdc) + dumpkeysize); free(kdc, M_EKCD); return (NULL); } static int kerneldumpcrypto_init(struct kerneldumpcrypto *kdc) { uint8_t hash[SHA256_DIGEST_LENGTH]; SHA256_CTX ctx; struct kerneldumpkey *kdk; int error; error = 0; if (kdc == NULL) return (0); /* * When a user enters ddb it can write a crash dump multiple times. * Each time it should be encrypted using a different IV. 
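With kern.shutdown.dumpdevname now generated by the handler above, userspace sees one comma-separated string for the whole list. A small reader (assumes a FreeBSD host; the "igb1,da0" output is illustrative):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	char buf[256];
	size_t len = sizeof(buf);

	if (sysctlbyname("kern.shutdown.dumpdevname", buf, &len,
	    NULL, 0) == -1) {
		perror("sysctlbyname");
		return (1);
	}
	printf("%s\n", buf);	/* e.g. "igb1,da0"; empty if none set */
	return (0);
}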
*/ SHA256_Init(&ctx); SHA256_Update(&ctx, kdc->kdc_iv, sizeof(kdc->kdc_iv)); SHA256_Final(hash, &ctx); bcopy(hash, kdc->kdc_iv, sizeof(kdc->kdc_iv)); switch (kdc->kdc_encryption) { case KERNELDUMP_ENC_AES_256_CBC: if (rijndael_cipherInit(&kdc->kdc_ci, MODE_CBC, kdc->kdc_iv) <= 0) { error = EINVAL; goto out; } break; default: error = EINVAL; goto out; } kdk = kdc->kdc_dumpkey; memcpy(kdk->kdk_iv, kdc->kdc_iv, sizeof(kdk->kdk_iv)); out: explicit_bzero(hash, sizeof(hash)); return (error); } static uint32_t kerneldumpcrypto_dumpkeysize(const struct kerneldumpcrypto *kdc) { if (kdc == NULL) return (0); return (kdc->kdc_dumpkeysize); } #endif /* EKCD */ static struct kerneldumpcomp * kerneldumpcomp_create(struct dumperinfo *di, uint8_t compression) { struct kerneldumpcomp *kdcomp; int format; switch (compression) { case KERNELDUMP_COMP_GZIP: format = COMPRESS_GZIP; break; case KERNELDUMP_COMP_ZSTD: format = COMPRESS_ZSTD; break; default: return (NULL); } kdcomp = malloc(sizeof(*kdcomp), M_DUMPER, M_WAITOK | M_ZERO); kdcomp->kdc_format = compression; kdcomp->kdc_stream = compressor_init(kerneldumpcomp_write_cb, format, di->maxiosize, kerneldump_gzlevel, di); if (kdcomp->kdc_stream == NULL) { free(kdcomp, M_DUMPER); return (NULL); } kdcomp->kdc_buf = malloc(di->maxiosize, M_DUMPER, M_WAITOK | M_NODUMP); return (kdcomp); } static void kerneldumpcomp_destroy(struct dumperinfo *di) { struct kerneldumpcomp *kdcomp; kdcomp = di->kdcomp; if (kdcomp == NULL) return; compressor_fini(kdcomp->kdc_stream); explicit_bzero(kdcomp->kdc_buf, di->maxiosize); free(kdcomp->kdc_buf, M_DUMPER); free(kdcomp, M_DUMPER); } +/* + * Must not be present on global list. + */ +static void +free_single_dumper(struct dumperinfo *di) +{ + + if (di == NULL) + return; + + if (di->blockbuf != NULL) { + explicit_bzero(di->blockbuf, di->blocksize); + free(di->blockbuf, M_DUMPER); + } + + kerneldumpcomp_destroy(di); + +#ifdef EKCD + if (di->kdcrypto != NULL) { + explicit_bzero(di->kdcrypto, sizeof(*di->kdcrypto) + + di->kdcrypto->kdc_dumpkeysize); + free(di->kdcrypto, M_EKCD); + } +#endif + + explicit_bzero(di, sizeof(*di)); + free(di, M_DUMPER); +} + /* Registration of dumpers */ int -set_dumper(struct dumperinfo *di, const char *devname, struct thread *td, - uint8_t compression, uint8_t encryption, const uint8_t *key, - uint32_t encryptedkeysize, const uint8_t *encryptedkey) +dumper_insert(const struct dumperinfo *di_template, const char *devname, + const struct diocskerneldump_arg *kda) { - size_t wantcopy; + struct dumperinfo *newdi, *listdi; + bool inserted; + uint8_t index; int error; - error = priv_check(td, PRIV_SETDUMPER); + index = kda->kda_index; + MPASS(index != KDA_REMOVE && index != KDA_REMOVE_DEV && + index != KDA_REMOVE_ALL); + + error = priv_check(curthread, PRIV_SETDUMPER); if (error != 0) return (error); - if (dumper.dumper != NULL) - return (EBUSY); - dumper = *di; - dumper.blockbuf = NULL; - dumper.kdcrypto = NULL; - dumper.kdcomp = NULL; + newdi = malloc(sizeof(*newdi) + strlen(devname) + 1, M_DUMPER, M_WAITOK + | M_ZERO); + memcpy(newdi, di_template, sizeof(*newdi)); + newdi->blockbuf = NULL; + newdi->kdcrypto = NULL; + newdi->kdcomp = NULL; + strcpy(newdi->di_devname, devname); - if (encryption != KERNELDUMP_ENC_NONE) { + if (kda->kda_encryption != KERNELDUMP_ENC_NONE) { #ifdef EKCD - dumper.kdcrypto = kerneldumpcrypto_create(di->blocksize, - encryption, key, encryptedkeysize, encryptedkey); - if (dumper.kdcrypto == NULL) { + newdi->kdcrypto = kerneldumpcrypto_create(di_template->blocksize, + 
kda->kda_encryption, kda->kda_key, + kda->kda_encryptedkeysize, kda->kda_encryptedkey); + if (newdi->kdcrypto == NULL) { error = EINVAL; goto cleanup; } #else error = EOPNOTSUPP; goto cleanup; #endif } - - wantcopy = strlcpy(dumpdevname, devname, sizeof(dumpdevname)); - if (wantcopy >= sizeof(dumpdevname)) { - printf("set_dumper: device name truncated from '%s' -> '%s'\n", - devname, dumpdevname); - } - - if (compression != KERNELDUMP_COMP_NONE) { + if (kda->kda_compression != KERNELDUMP_COMP_NONE) { /* * We currently can't support simultaneous encryption and - * compression. + * compression because our only encryption mode is an unpadded + * block cipher, go figure. This is low hanging fruit to fix. */ - if (encryption != KERNELDUMP_ENC_NONE) { + if (kda->kda_encryption != KERNELDUMP_ENC_NONE) { error = EOPNOTSUPP; goto cleanup; } - dumper.kdcomp = kerneldumpcomp_create(&dumper, compression); - if (dumper.kdcomp == NULL) { + newdi->kdcomp = kerneldumpcomp_create(newdi, + kda->kda_compression); + if (newdi->kdcomp == NULL) { error = EINVAL; goto cleanup; } } - dumper.blockbuf = malloc(di->blocksize, M_DUMPER, M_WAITOK | M_ZERO); + newdi->blockbuf = malloc(newdi->blocksize, M_DUMPER, M_WAITOK | M_ZERO); + + /* Add the new configuration to the queue */ + mtx_lock(&dumpconf_list_lk); + inserted = false; + TAILQ_FOREACH(listdi, &dumper_configs, di_next) { + if (index == 0) { + TAILQ_INSERT_BEFORE(listdi, newdi, di_next); + inserted = true; + break; + } + index--; + } + if (!inserted) + TAILQ_INSERT_TAIL(&dumper_configs, newdi, di_next); + mtx_unlock(&dumpconf_list_lk); + return (0); cleanup: - (void)clear_dumper(td); + free_single_dumper(newdi); return (error); } -int -clear_dumper(struct thread *td) +static bool +dumper_config_match(const struct dumperinfo *di, const char *devname, + const struct diocskerneldump_arg *kda) { - int error; + if (kda->kda_index == KDA_REMOVE_ALL) + return (true); - error = priv_check(td, PRIV_SETDUMPER); - if (error != 0) - return (error); + if (strcmp(di->di_devname, devname) != 0) + return (false); -#ifdef NETDUMP - netdump_mbuf_drain(); -#endif + /* + * Allow wildcard removal of configs matching a device on g_dev_orphan. + */ + if (kda->kda_index == KDA_REMOVE_DEV) + return (true); + if (di->kdcomp != NULL) { + if (di->kdcomp->kdc_format != kda->kda_compression) + return (false); + } else if (kda->kda_compression != KERNELDUMP_COMP_NONE) + return (false); #ifdef EKCD - if (dumper.kdcrypto != NULL) { - explicit_bzero(dumper.kdcrypto, sizeof(*dumper.kdcrypto) + - dumper.kdcrypto->kdc_dumpkeysize); - free(dumper.kdcrypto, M_EKCD); - } + if (di->kdcrypto != NULL) { + if (di->kdcrypto->kdc_encryption != kda->kda_encryption) + return (false); + /* + * Do we care to verify keys match to delete? It seems weird + * to expect multiple fallback dump configurations on the same + * device that only differ in crypto key. + */ + } else #endif + if (kda->kda_encryption != KERNELDUMP_ENC_NONE) + return (false); - kerneldumpcomp_destroy(&dumper); + return (true); +} + +int +dumper_remove(const char *devname, const struct diocskerneldump_arg *kda) +{ + struct dumperinfo *di, *sdi; + bool found; + int error; - if (dumper.blockbuf != NULL) { - explicit_bzero(dumper.blockbuf, dumper.blocksize); - free(dumper.blockbuf, M_DUMPER); + error = priv_check(curthread, PRIV_SETDUMPER); + if (error != 0) + return (error); + + /* + * Try to find a matching configuration, and kill it. 
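dumper_insert()'s placement loop above decrements the index while walking and inserts before the entry where it reaches zero, falling through to an append otherwise. The same pattern in a self-contained form (struct ent and insert_at are invented toy names, not kernel code):

#include <sys/queue.h>

struct ent {
	const char	*name;
	TAILQ_ENTRY(ent) link;
};
TAILQ_HEAD(entlist, ent);

/* Insert before the element where index hits zero, else append. */
static void
insert_at(struct entlist *head, struct ent *new, unsigned int index)
{
	struct ent *e;

	TAILQ_FOREACH(e, head, link) {
		if (index == 0) {
			TAILQ_INSERT_BEFORE(e, new, link);
			return;
		}
		index--;
	}
	TAILQ_INSERT_TAIL(head, new, link);
}

With this walk, index 0 prepends and any index at or past the current length appends, which is why an oversized index argument degenerates to a plain append.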
+ * + * NULL 'kda' indicates remove any configuration matching 'devname', + * which may remove multiple configurations in atypical configurations. + */ + found = false; + mtx_lock(&dumpconf_list_lk); + TAILQ_FOREACH_SAFE(di, &dumper_configs, di_next, sdi) { + if (dumper_config_match(di, devname, kda)) { + found = true; + TAILQ_REMOVE(&dumper_configs, di, di_next); + free_single_dumper(di); + } } - explicit_bzero(&dumper, sizeof(dumper)); - dumpdevname[0] = '\0'; + mtx_unlock(&dumpconf_list_lk); + + /* Only produce ENOENT if a more targeted match didn't match. */ + if (!found && kda->kda_index == KDA_REMOVE) + return (ENOENT); return (0); } static int dump_check_bounds(struct dumperinfo *di, off_t offset, size_t length) { if (di->mediasize > 0 && length != 0 && (offset < di->mediaoffset || offset - di->mediaoffset + length > di->mediasize)) { if (di->kdcomp != NULL && offset >= di->mediaoffset) { printf( "Compressed dump failed to fit in device boundaries.\n"); return (E2BIG); } printf("Attempt to write outside dump device boundaries.\n" "offset(%jd), mediaoffset(%jd), length(%ju), mediasize(%jd).\n", (intmax_t)offset, (intmax_t)di->mediaoffset, (uintmax_t)length, (intmax_t)di->mediasize); return (ENOSPC); } if (length % di->blocksize != 0) { printf("Attempt to write partial block of length %ju.\n", (uintmax_t)length); return (EINVAL); } if (offset % di->blocksize != 0) { printf("Attempt to write at unaligned offset %jd.\n", (intmax_t)offset); return (EINVAL); } return (0); } #ifdef EKCD static int dump_encrypt(struct kerneldumpcrypto *kdc, uint8_t *buf, size_t size) { switch (kdc->kdc_encryption) { case KERNELDUMP_ENC_AES_256_CBC: if (rijndael_blockEncrypt(&kdc->kdc_ci, &kdc->kdc_ki, buf, 8 * size, buf) <= 0) { return (EIO); } if (rijndael_cipherInit(&kdc->kdc_ci, MODE_CBC, buf + size - 16 /* IV size for AES-256-CBC */) <= 0) { return (EIO); } break; default: return (EINVAL); } return (0); } /* Encrypt data and call dumper. */ static int dump_encrypted_write(struct dumperinfo *di, void *virtual, vm_offset_t physical, off_t offset, size_t length) { static uint8_t buf[KERNELDUMP_BUFFER_SIZE]; struct kerneldumpcrypto *kdc; int error; size_t nbytes; kdc = di->kdcrypto; while (length > 0) { nbytes = MIN(length, sizeof(buf)); bcopy(virtual, buf, nbytes); if (dump_encrypt(kdc, buf, nbytes) != 0) return (EIO); error = dump_write(di, buf, physical, offset, nbytes); if (error != 0) return (error); offset += nbytes; virtual = (void *)((uint8_t *)virtual + nbytes); length -= nbytes; } return (0); } #endif /* EKCD */ static int kerneldumpcomp_write_cb(void *base, size_t length, off_t offset, void *arg) { struct dumperinfo *di; size_t resid, rlength; int error; di = arg; if (length % di->blocksize != 0) { /* * This must be the final write after flushing the compression * stream. Write as many full blocks as possible and stash the * residual data in the dumper's block buffer. It will be * padded and written in dump_finish(). */ rlength = rounddown(length, di->blocksize); if (rlength != 0) { error = _dump_append(di, base, 0, rlength); if (error != 0) return (error); } resid = length - rlength; memmove(di->blockbuf, (uint8_t *)base + rlength, resid); di->kdcomp->kdc_resid = resid; return (EAGAIN); } return (_dump_append(di, base, 0, length)); } /* * Write kernel dump headers at the beginning and end of the dump extent. * Write the kernel dump encryption key after the leading header if we were * configured to do so. 
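The residual-data handling in kerneldumpcomp_write_cb() above can be seen in isolation: write only whole-block multiples now and stash the tail for the padded write in dump_finish(). A sketch under invented names (BLKSZ, tailbuf, and split_final_write are illustrative stand-ins for the blocksize, di->blockbuf, and the callback logic):

#include <string.h>

#define	BLKSZ		512		/* illustrative blocksize */

static unsigned char	tailbuf[BLKSZ];	/* stands in for di->blockbuf */
static size_t		tailresid;	/* stands in for kdc_resid */

/* Returns how many bytes were safe to write out immediately. */
static size_t
split_final_write(const unsigned char *base, size_t length)
{
	size_t rlength = length - (length % BLKSZ);	/* rounddown() */

	/* the real code calls _dump_append(di, base, 0, rlength) here */
	tailresid = length - rlength;
	memcpy(tailbuf, base + rlength, tailresid);	/* stash the tail */
	return (rlength);
}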
*/ static int dump_write_headers(struct dumperinfo *di, struct kerneldumpheader *kdh) { #ifdef EKCD struct kerneldumpcrypto *kdc; #endif void *buf, *key; size_t hdrsz; uint64_t extent; uint32_t keysize; int error; hdrsz = sizeof(*kdh); if (hdrsz > di->blocksize) return (ENOMEM); #ifdef EKCD kdc = di->kdcrypto; key = kdc->kdc_dumpkey; keysize = kerneldumpcrypto_dumpkeysize(kdc); #else key = NULL; keysize = 0; #endif /* * If the dump device has special handling for headers, let it take care * of writing them out. */ if (di->dumper_hdr != NULL) return (di->dumper_hdr(di, kdh, key, keysize)); if (hdrsz == di->blocksize) buf = kdh; else { buf = di->blockbuf; memset(buf, 0, di->blocksize); memcpy(buf, kdh, hdrsz); } extent = dtoh64(kdh->dumpextent); #ifdef EKCD if (kdc != NULL) { error = dump_write(di, kdc->kdc_dumpkey, 0, di->mediaoffset + di->mediasize - di->blocksize - extent - keysize, keysize); if (error != 0) return (error); } #endif error = dump_write(di, buf, 0, di->mediaoffset + di->mediasize - 2 * di->blocksize - extent - keysize, di->blocksize); if (error == 0) error = dump_write(di, buf, 0, di->mediaoffset + di->mediasize - di->blocksize, di->blocksize); return (error); } /* * Don't touch the first SIZEOF_METADATA bytes on the dump device. This is to * protect us from metadata and metadata from us. */ #define SIZEOF_METADATA (64 * 1024) /* * Do some preliminary setup for a kernel dump: initialize state for encryption, * if requested, and make sure that we have enough space on the dump device. * * We set things up so that the dump ends before the last sector of the dump * device, at which the trailing header is written. * * +-----------+------+-----+----------------------------+------+ * | | lhdr | key | ... kernel dump ... | thdr | * +-----------+------+-----+----------------------------+------+ * 1 blk opt <------- dump extent --------> 1 blk * * Dumps written using dump_append() start at the beginning of the extent. * Uncompressed dumps will use the entire extent, but compressed dumps typically * will not. The true length of the dump is recorded in the leading and trailing * headers once the dump has been completed. * * The dump device may provide a callback, in which case it will initialize * dumpoff and take care of laying out the headers. */ int dump_start(struct dumperinfo *di, struct kerneldumpheader *kdh) { uint64_t dumpextent, span; uint32_t keysize; int error; #ifdef EKCD error = kerneldumpcrypto_init(di->kdcrypto); if (error != 0) return (error); keysize = kerneldumpcrypto_dumpkeysize(di->kdcrypto); #else error = 0; keysize = 0; #endif if (di->dumper_start != NULL) { error = di->dumper_start(di); } else { dumpextent = dtoh64(kdh->dumpextent); span = SIZEOF_METADATA + dumpextent + 2 * di->blocksize + keysize; if (di->mediasize < span) { if (di->kdcomp == NULL) return (E2BIG); /* * We don't yet know how much space the compressed dump * will occupy, so try to use the whole swap partition * (minus the first 64KB) in the hope that the * compressed dump will fit. If that doesn't turn out to * be enough, the bounds checking in dump_write() * will catch us and cause the dump to fail. */ dumpextent = di->mediasize - span + dumpextent; kdh->dumpextent = htod64(dumpextent); } /* * The offset at which to begin writing the dump. 
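For concreteness, the placement math that dump_start() performs here can be worked with sample numbers (all values invented; keysize of 0 means no EKCD key block):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t mediaoffset = 0;		/* partition start */
	uint64_t mediasize = 10737418240ULL;	/* 10 GiB partition */
	uint64_t blocksize = 512;
	uint64_t keysize = 0;			/* no EKCD key block */
	uint64_t dumpextent = 4294967296ULL;	/* 4 GiB dump extent */

	/* SIZEOF_METADATA + extent + leading/trailing headers + key */
	uint64_t span = 64 * 1024 + dumpextent + 2 * blocksize + keysize;

	if (mediasize < span)
		return (1);	/* E2BIG unless a compressor is attached */

	/* the dump body ends one block (the trailing header) from the end */
	uint64_t dumpoff = mediaoffset + mediasize - blocksize - dumpextent;
	printf("dump_append() starts writing at byte %ju\n",
	    (uintmax_t)dumpoff);
	return (0);
}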
*/ di->dumpoff = di->mediaoffset + di->mediasize - di->blocksize - dumpextent; } di->origdumpoff = di->dumpoff; return (error); } static int _dump_append(struct dumperinfo *di, void *virtual, vm_offset_t physical, size_t length) { int error; #ifdef EKCD if (di->kdcrypto != NULL) error = dump_encrypted_write(di, virtual, physical, di->dumpoff, length); else #endif error = dump_write(di, virtual, physical, di->dumpoff, length); if (error == 0) di->dumpoff += length; return (error); } /* * Write to the dump device starting at dumpoff. When compression is enabled, * writes to the device will be performed using a callback that gets invoked * when the compression stream's output buffer is full. */ int dump_append(struct dumperinfo *di, void *virtual, vm_offset_t physical, size_t length) { void *buf; if (di->kdcomp != NULL) { /* Bounce through a buffer to avoid CRC errors. */ if (length > di->maxiosize) return (EINVAL); buf = di->kdcomp->kdc_buf; memmove(buf, virtual, length); return (compressor_write(di->kdcomp->kdc_stream, buf, length)); } return (_dump_append(di, virtual, physical, length)); } /* * Write to the dump device at the specified offset. */ int dump_write(struct dumperinfo *di, void *virtual, vm_offset_t physical, off_t offset, size_t length) { int error; error = dump_check_bounds(di, offset, length); if (error != 0) return (error); return (di->dumper(di->priv, virtual, physical, offset, length)); } /* * Perform kernel dump finalization: flush the compression stream, if necessary, * write the leading and trailing kernel dump headers now that we know the true * length of the dump, and optionally write the encryption key following the * leading header. */ int dump_finish(struct dumperinfo *di, struct kerneldumpheader *kdh) { int error; if (di->kdcomp != NULL) { error = compressor_flush(di->kdcomp->kdc_stream); if (error == EAGAIN) { /* We have residual data in di->blockbuf. */ error = dump_write(di, di->blockbuf, 0, di->dumpoff, di->blocksize); di->dumpoff += di->kdcomp->kdc_resid; di->kdcomp->kdc_resid = 0; } if (error != 0) return (error); /* * We now know the size of the compressed dump, so update the * header accordingly and recompute parity. 
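The parity recomputation being described here follows the usual zero-then-recompute idiom. Assuming kerneldump_parity() is a straight XOR over the header's 32-bit words, as in sys/sys/kerneldump.h, a generic sketch:

#include <stddef.h>
#include <stdint.h>

/* XOR all 32-bit words of a structure; len must be a multiple of 4. */
static uint32_t
xor_parity(const void *p, size_t len)
{
	const uint32_t *up = p;
	uint32_t parity = 0;

	for (size_t i = 0; i < len / sizeof(*up); i++)
		parity ^= up[i];
	return (parity);
}

/* Usage: kdh.parity = 0; kdh.parity = xor_parity(&kdh, sizeof(kdh));
 * a verifier re-XORing the whole header then expects a result of 0. */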
*/ kdh->dumplength = htod64(di->dumpoff - di->origdumpoff); kdh->parity = 0; kdh->parity = kerneldump_parity(kdh); compressor_reset(di->kdcomp->kdc_stream); } error = dump_write_headers(di, kdh); if (error != 0) return (error); (void)dump_write(di, NULL, 0, 0, 0); return (0); } void dump_init_header(const struct dumperinfo *di, struct kerneldumpheader *kdh, char *magic, uint32_t archver, uint64_t dumplen) { size_t dstsize; bzero(kdh, sizeof(*kdh)); strlcpy(kdh->magic, magic, sizeof(kdh->magic)); strlcpy(kdh->architecture, MACHINE_ARCH, sizeof(kdh->architecture)); kdh->version = htod32(KERNELDUMPVERSION); kdh->architectureversion = htod32(archver); kdh->dumplength = htod64(dumplen); kdh->dumpextent = kdh->dumplength; kdh->dumptime = htod64(time_second); #ifdef EKCD kdh->dumpkeysize = htod32(kerneldumpcrypto_dumpkeysize(di->kdcrypto)); #else kdh->dumpkeysize = 0; #endif kdh->blocksize = htod32(di->blocksize); strlcpy(kdh->hostname, prison0.pr_hostname, sizeof(kdh->hostname)); dstsize = sizeof(kdh->versionstring); if (strlcpy(kdh->versionstring, version, dstsize) >= dstsize) kdh->versionstring[dstsize - 2] = '\n'; if (panicstr != NULL) strlcpy(kdh->panicstring, panicstr, sizeof(kdh->panicstring)); if (di->kdcomp != NULL) kdh->compression = di->kdcomp->kdc_format; kdh->parity = kerneldump_parity(kdh); } #ifdef DDB DB_SHOW_COMMAND(panic, db_show_panic) { if (panicstr == NULL) db_printf("panicstr not set\n"); else db_printf("panic: %s\n", panicstr); } #endif diff --git a/sys/netinet/netdump/netdump.h b/sys/netinet/netdump/netdump.h index 7575afecd1b7..cdf53b78c50e 100644 --- a/sys/netinet/netdump/netdump.h +++ b/sys/netinet/netdump/netdump.h @@ -1,130 +1,130 @@ /*- * Copyright (c) 2005-2014 Sandvine Incorporated * Copyright (c) 2000 Darrell Anderson * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _NETINET_NETDUMP_H_ #define _NETINET_NETDUMP_H_ #include #include #include #include #include #define NETDUMP_PORT 20023 /* Server UDP port for heralds. */ #define NETDUMP_ACKPORT 20024 /* Client UDP port for acks. */ #define NETDUMP_HERALD 1 /* Broadcast before starting a dump. */ #define NETDUMP_FINISHED 2 /* Send after finishing a dump. */ #define NETDUMP_VMCORE 3 /* Contains dump data. */ #define NETDUMP_KDH 4 /* Contains kernel dump header. 
*/ #define NETDUMP_EKCD_KEY 5 /* Contains kernel dump key. */ #define NETDUMP_DATASIZE 4096 /* Arbitrary packet size limit. */ struct netdump_msg_hdr { uint32_t mh_type; /* Netdump message type. */ uint32_t mh_seqno; /* Match acks with msgs. */ uint64_t mh_offset; /* vmcore offset (bytes). */ uint32_t mh_len; /* Attached data (bytes). */ uint32_t mh__pad; } __packed; struct netdump_ack { uint32_t na_seqno; /* Match acks with msgs. */ } __packed; -struct netdump_conf { - struct diocskerneldump_arg ndc_kda; - char ndc_iface[IFNAMSIZ]; - struct in_addr ndc_server; - struct in_addr ndc_client; - struct in_addr ndc_gateway; +struct netdump_conf_freebsd12 { + struct diocskerneldump_arg_freebsd12 ndc12_kda; + char ndc12_iface[IFNAMSIZ]; + struct in_addr ndc12_server; + struct in_addr ndc12_client; + struct in_addr ndc12_gateway; }; -#define _PATH_NETDUMP "/dev/netdump" +#define NETDUMPGCONF_FREEBSD12 _IOR('n', 1, struct netdump_conf_freebsd12) +#define NETDUMPSCONF_FREEBSD12 _IOW('n', 2, struct netdump_conf_freebsd12) -#define NETDUMPGCONF _IOR('n', 1, struct netdump_conf) -#define NETDUMPSCONF _IOW('n', 2, struct netdump_conf) +#define _PATH_NETDUMP "/dev/netdump" #ifdef _KERNEL #ifdef NETDUMP #define NETDUMP_MAX_IN_FLIGHT 64 enum netdump_ev { NETDUMP_START, NETDUMP_END, }; struct ifnet; struct mbuf; void netdump_reinit(struct ifnet *); typedef void netdump_init_t(struct ifnet *, int *nrxr, int *ncl, int *clsize); typedef void netdump_event_t(struct ifnet *, enum netdump_ev); typedef int netdump_transmit_t(struct ifnet *, struct mbuf *); typedef int netdump_poll_t(struct ifnet *, int); struct netdump_methods { netdump_init_t *nd_init; netdump_event_t *nd_event; netdump_transmit_t *nd_transmit; netdump_poll_t *nd_poll; }; #define NETDUMP_DEFINE(driver) \ static netdump_init_t driver##_netdump_init; \ static netdump_event_t driver##_netdump_event; \ static netdump_transmit_t driver##_netdump_transmit; \ static netdump_poll_t driver##_netdump_poll; \ \ static struct netdump_methods driver##_netdump_methods = { \ .nd_init = driver##_netdump_init, \ .nd_event = driver##_netdump_event, \ .nd_transmit = driver##_netdump_transmit, \ .nd_poll = driver##_netdump_poll, \ } #define NETDUMP_REINIT(ifp) netdump_reinit(ifp) #define NETDUMP_SET(ifp, driver) \ (ifp)->if_netdump_methods = &driver##_netdump_methods #else /* !NETDUMP */ #define NETDUMP_DEFINE(driver) #define NETDUMP_REINIT(ifp) #define NETDUMP_SET(ifp, driver) #endif /* NETDUMP */ #endif /* _KERNEL */ #endif /* _NETINET_NETDUMP_H_ */ diff --git a/sys/netinet/netdump/netdump_client.c b/sys/netinet/netdump/netdump_client.c index 2503491ca131..7aff10609c39 100644 --- a/sys/netinet/netdump/netdump_client.c +++ b/sys/netinet/netdump/netdump_client.c @@ -1,1323 +1,1421 @@ /*- * Copyright (c) 2005-2014 Sandvine Incorporated. All rights reserved. * Copyright (c) 2000 Darrell Anderson * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * netdump_client.c * FreeBSD subsystem supporting netdump network dumps. * A dedicated server must be running to accept client dumps. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define NETDDEBUG(f, ...) do { \ if (nd_debug > 0) \ printf(("%s: " f), __func__, ## __VA_ARGS__); \ } while (0) #define NETDDEBUG_IF(i, f, ...) do { \ if (nd_debug > 0) \ if_printf((i), ("%s: " f), __func__, ## __VA_ARGS__); \ } while (0) #define NETDDEBUGV(f, ...) do { \ if (nd_debug > 1) \ printf(("%s: " f), __func__, ## __VA_ARGS__); \ } while (0) #define NETDDEBUGV_IF(i, f, ...) do { \ if (nd_debug > 1) \ if_printf((i), ("%s: " f), __func__, ## __VA_ARGS__); \ } while (0) static int netdump_arp_gw(void); static void netdump_cleanup(void); -static int netdump_configure(struct netdump_conf *, struct thread *); +static int netdump_configure(struct diocskerneldump_arg *, + struct thread *); static int netdump_dumper(void *priv __unused, void *virtual, vm_offset_t physical __unused, off_t offset, size_t length); static int netdump_ether_output(struct mbuf *m, struct ifnet *ifp, struct ether_addr dst, u_short etype); static void netdump_handle_arp(struct mbuf **mb); static void netdump_handle_ip(struct mbuf **mb); static int netdump_ioctl(struct cdev *dev __unused, u_long cmd, caddr_t addr, int flags __unused, struct thread *td); static int netdump_modevent(module_t mod, int type, void *priv); static void netdump_network_poll(void); static void netdump_pkt_in(struct ifnet *ifp, struct mbuf *m); static int netdump_send(uint32_t type, off_t offset, unsigned char *data, uint32_t datalen); static int netdump_send_arp(in_addr_t dst); static int netdump_start(struct dumperinfo *di); static int netdump_udp_output(struct mbuf *m); /* Must be at least as big as the chunks dumpsys() gives us. */ static unsigned char nd_buf[MAXDUMPPGS * PAGE_SIZE]; static uint32_t nd_seqno; static int dump_failed, have_gw_mac; static void (*drv_if_input)(struct ifnet *, struct mbuf *); static int restore_gw_addr; static uint64_t rcvd_acks; CTASSERT(sizeof(rcvd_acks) * NBBY == NETDUMP_MAX_IN_FLIGHT); /* Configuration parameters. */ -static struct netdump_conf nd_conf; -#define nd_server nd_conf.ndc_server -#define nd_client nd_conf.ndc_client -#define nd_gateway nd_conf.ndc_gateway +static struct diocskerneldump_arg nd_conf; +#define nd_server nd_conf.kda_server.in4 +#define nd_client nd_conf.kda_client.in4 +#define nd_gateway nd_conf.kda_gateway.in4 /* General dynamic settings. 
*/ static struct ether_addr nd_gw_mac; static struct ifnet *nd_ifp; static uint16_t nd_server_port = NETDUMP_PORT; FEATURE(netdump, "Netdump client support"); static SYSCTL_NODE(_net, OID_AUTO, netdump, CTLFLAG_RD, NULL, "netdump parameters"); static int nd_debug; SYSCTL_INT(_net_netdump, OID_AUTO, debug, CTLFLAG_RWTUN, &nd_debug, 0, "Debug message verbosity"); static int nd_enabled; SYSCTL_INT(_net_netdump, OID_AUTO, enabled, CTLFLAG_RD, &nd_enabled, 0, "netdump configuration status"); static char nd_path[MAXPATHLEN]; SYSCTL_STRING(_net_netdump, OID_AUTO, path, CTLFLAG_RW, nd_path, sizeof(nd_path), "Server path for output files"); static int nd_polls = 2000; SYSCTL_INT(_net_netdump, OID_AUTO, polls, CTLFLAG_RWTUN, &nd_polls, 0, "Number of times to poll before assuming packet loss (0.5ms per poll)"); static int nd_retries = 10; SYSCTL_INT(_net_netdump, OID_AUTO, retries, CTLFLAG_RWTUN, &nd_retries, 0, "Number of retransmit attempts before giving up"); static int nd_arp_retries = 3; SYSCTL_INT(_net_netdump, OID_AUTO, arp_retries, CTLFLAG_RWTUN, &nd_arp_retries, 0, "Number of ARP attempts before giving up"); /* * Checks for netdump support on a network interface * * Parameters: * ifp The network interface that is being tested for support * * Returns: * bool true if the interface is supported, false if not */ static bool netdump_supported_nic(struct ifnet *ifp) { return (ifp->if_netdump_methods != NULL); } /*- * Network specific primitives. * Moving down the file, they are ordered as: * - Packet buffer primitives * - Output primitives * - Input primitives * - Polling primitives */ /* * Handles creation of the ethernet header, then places outgoing packets into * the tx buffer for the NIC * * Parameters: * m The mbuf containing the packet to be sent (will be freed by * this function or the NIC driver) * ifp The interface to send on * dst The destination ethernet address (source address will be looked * up using ifp) * etype The ETHERTYPE_* value for the protocol that is being sent * * Returns: * int see errno.h, 0 for success */ static int netdump_ether_output(struct mbuf *m, struct ifnet *ifp, struct ether_addr dst, u_short etype) { struct ether_header *eh; if (((ifp->if_flags & (IFF_MONITOR | IFF_UP)) != IFF_UP) || (ifp->if_drv_flags & IFF_DRV_RUNNING) != IFF_DRV_RUNNING) { if_printf(ifp, "netdump_ether_output: interface isn't up\n"); m_freem(m); return (ENETDOWN); } /* Fill in the ethernet header.
*/ M_PREPEND(m, ETHER_HDR_LEN, M_NOWAIT); if (m == NULL) { printf("%s: out of mbufs\n", __func__); return (ENOBUFS); } eh = mtod(m, struct ether_header *); memcpy(eh->ether_shost, IF_LLADDR(ifp), ETHER_ADDR_LEN); memcpy(eh->ether_dhost, dst.octet, ETHER_ADDR_LEN); eh->ether_type = htons(etype); return ((ifp->if_netdump_methods->nd_transmit)(ifp, m)); } /* * Unreliable transmission of an mbuf chain to the netdump server * Note: can't handle fragmentation; fails if the packet is larger than * nd_ifp->if_mtu after adding the UDP/IP headers * * Parameters: * m mbuf chain * * Returns: * int see errno.h, 0 for success */ static int netdump_udp_output(struct mbuf *m) { struct udpiphdr *ui; struct ip *ip; MPASS(nd_ifp != NULL); M_PREPEND(m, sizeof(struct udpiphdr), M_NOWAIT); if (m == NULL) { printf("%s: out of mbufs\n", __func__); return (ENOBUFS); } if (m->m_pkthdr.len > nd_ifp->if_mtu) { printf("netdump_udp_output: Packet is too big: %d > MTU %u\n", m->m_pkthdr.len, nd_ifp->if_mtu); m_freem(m); return (ENOBUFS); } ui = mtod(m, struct udpiphdr *); bzero(ui->ui_x1, sizeof(ui->ui_x1)); ui->ui_pr = IPPROTO_UDP; ui->ui_len = htons(m->m_pkthdr.len - sizeof(struct ip)); ui->ui_ulen = ui->ui_len; ui->ui_src = nd_client; ui->ui_dst = nd_server; /* Use this src port so that the server can connect() the socket */ ui->ui_sport = htons(NETDUMP_ACKPORT); ui->ui_dport = htons(nd_server_port); ui->ui_sum = 0; if ((ui->ui_sum = in_cksum(m, m->m_pkthdr.len)) == 0) ui->ui_sum = 0xffff; ip = mtod(m, struct ip *); ip->ip_v = IPVERSION; ip->ip_hl = sizeof(struct ip) >> 2; ip->ip_tos = 0; ip->ip_len = htons(m->m_pkthdr.len); ip->ip_id = 0; ip->ip_off = htons(IP_DF); ip->ip_ttl = 255; ip->ip_sum = 0; ip->ip_sum = in_cksum(m, sizeof(struct ip)); return (netdump_ether_output(m, nd_ifp, nd_gw_mac, ETHERTYPE_IP)); } /* * Builds and sends a single ARP request to locate the server * * Return value: * 0 on success * errno on error */ static int netdump_send_arp(in_addr_t dst) { struct ether_addr bcast; struct mbuf *m; struct arphdr *ah; int pktlen; MPASS(nd_ifp != NULL); /* Fill-up a broadcast address. */ memset(&bcast, 0xFF, ETHER_ADDR_LEN); m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) { printf("netdump_send_arp: Out of mbufs\n"); return (ENOBUFS); } pktlen = arphdr_len2(ETHER_ADDR_LEN, sizeof(struct in_addr)); m->m_len = pktlen; m->m_pkthdr.len = pktlen; MH_ALIGN(m, pktlen); ah = mtod(m, struct arphdr *); ah->ar_hrd = htons(ARPHRD_ETHER); ah->ar_pro = htons(ETHERTYPE_IP); ah->ar_hln = ETHER_ADDR_LEN; ah->ar_pln = sizeof(struct in_addr); ah->ar_op = htons(ARPOP_REQUEST); memcpy(ar_sha(ah), IF_LLADDR(nd_ifp), ETHER_ADDR_LEN); ((struct in_addr *)ar_spa(ah))->s_addr = nd_client.s_addr; bzero(ar_tha(ah), ETHER_ADDR_LEN); ((struct in_addr *)ar_tpa(ah))->s_addr = dst; return (netdump_ether_output(m, nd_ifp, bcast, ETHERTYPE_ARP)); } /* * Sends ARP requests to locate the server and waits for a response. * We first try to ARP the server itself, and fall back to the provided * gateway if the server appears to be off-link. 
* * Return value: * 0 on success * errno on error */ static int netdump_arp_gw(void) { in_addr_t dst; int error, polls, retries; dst = nd_server.s_addr; restart: for (retries = 0; retries < nd_arp_retries && have_gw_mac == 0; retries++) { error = netdump_send_arp(dst); if (error != 0) return (error); for (polls = 0; polls < nd_polls && have_gw_mac == 0; polls++) { netdump_network_poll(); DELAY(500); } if (have_gw_mac == 0) printf("(ARP retry)"); } if (have_gw_mac != 0) return (0); if (dst == nd_server.s_addr && nd_server.s_addr != nd_gateway.s_addr) { printf("Failed to ARP server, trying to reach gateway...\n"); dst = nd_gateway.s_addr; goto restart; } printf("\nARP timed out.\n"); return (ETIMEDOUT); } /* * Dummy free function for netdump clusters. */ static void netdump_mbuf_free(struct mbuf *m __unused) { } /* * Construct and reliably send a netdump packet. May fail from a resource * shortage or extreme number of unacknowledged retransmissions. Wait for * an acknowledgement before returning. Splits packets into chunks small * enough to be sent without fragmentation (looks up the interface MTU) * * Parameters: * type netdump packet type (HERALD, FINISHED, or VMCORE) * offset vmcore data offset (bytes) * data vmcore data * datalen vmcore data size (bytes) * * Returns: * int see errno.h, 0 for success */ static int netdump_send(uint32_t type, off_t offset, unsigned char *data, uint32_t datalen) { struct netdump_msg_hdr *nd_msg_hdr; struct mbuf *m, *m2; uint64_t want_acks; uint32_t i, pktlen, sent_so_far; int retries, polls, error; want_acks = 0; rcvd_acks = 0; retries = 0; MPASS(nd_ifp != NULL); retransmit: /* Chunks can be too big to fit in packets. */ for (i = sent_so_far = 0; sent_so_far < datalen || (i == 0 && datalen == 0); i++) { pktlen = datalen - sent_so_far; /* First bound: the packet structure. */ pktlen = min(pktlen, NETDUMP_DATASIZE); /* Second bound: the interface MTU (assume no IP options). */ pktlen = min(pktlen, nd_ifp->if_mtu - sizeof(struct udpiphdr) - sizeof(struct netdump_msg_hdr)); /* * Check if it is retransmitting and this has been ACKed * already. */ if ((rcvd_acks & (1 << i)) != 0) { sent_so_far += pktlen; continue; } /* * Get and fill a header mbuf, then chain data as an extended * mbuf. */ m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) { printf("netdump_send: Out of mbufs\n"); return (ENOBUFS); } m->m_len = sizeof(struct netdump_msg_hdr); m->m_pkthdr.len = sizeof(struct netdump_msg_hdr); MH_ALIGN(m, sizeof(struct netdump_msg_hdr)); nd_msg_hdr = mtod(m, struct netdump_msg_hdr *); nd_msg_hdr->mh_seqno = htonl(nd_seqno + i); nd_msg_hdr->mh_type = htonl(type); nd_msg_hdr->mh_offset = htobe64(offset + sent_so_far); nd_msg_hdr->mh_len = htonl(pktlen); nd_msg_hdr->mh__pad = 0; if (pktlen != 0) { m2 = m_get(M_NOWAIT, MT_DATA); if (m2 == NULL) { m_freem(m); printf("netdump_send: Out of mbufs\n"); return (ENOBUFS); } MEXTADD(m2, data + sent_so_far, pktlen, netdump_mbuf_free, NULL, NULL, 0, EXT_DISPOSABLE); m2->m_len = pktlen; m_cat(m, m2); m->m_pkthdr.len += pktlen; } error = netdump_udp_output(m); if (error != 0) return (error); /* Note that we're waiting for this packet in the bitfield. */ want_acks |= (1 << i); sent_so_far += pktlen; } if (i >= NETDUMP_MAX_IN_FLIGHT) printf("Warning: Sent more than %d packets (%d). " "Acknowledgements will fail unless the size of " "rcvd_acks/want_acks is increased.\n", NETDUMP_MAX_IN_FLIGHT, i); /* * Wait for acks. A *real* window would speed things up considerably. 
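The want_acks/rcvd_acks pair above implements a stop-and-wait window of at most NETDUMP_MAX_IN_FLIGHT (64) packets per chunk, which is exactly what the CTASSERT on rcvd_acks pins down. A standalone sketch of the mask bookkeeping, with made-up values; it deliberately shifts UINT64_C(1) to keep the arithmetic in 64 bits, since the masks are uint64_t:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_IN_FLIGHT	64	/* mirrors NETDUMP_MAX_IN_FLIGHT */

int
main(void)
{
	uint64_t want_acks = 0, rcvd_acks = 0;
	uint32_t base = 1, npkts = 5, i;	/* seqnos 1..5 */

	assert(npkts <= MAX_IN_FLIGHT);

	/* "Send" the chunk: one bit per packet in the want mask. */
	for (i = 0; i < npkts; i++)
		want_acks |= UINT64_C(1) << i;

	/* "Receive" acks for seqnos 2, 3 and 5. */
	rcvd_acks |= UINT64_C(1) << (2 - base);
	rcvd_acks |= UINT64_C(1) << (3 - base);
	rcvd_acks |= UINT64_C(1) << (5 - base);

	/* Retransmit pass: skip indices that were already acked. */
	for (i = 0; i < npkts; i++)
		if ((rcvd_acks & (UINT64_C(1) << i)) == 0)
			printf("retransmit index %u (seqno %u)\n", i, base + i);

	printf("window drained: %s\n",
	    rcvd_acks == want_acks ? "yes" : "no");
	return (0);
}

A true sliding window, the improvement the comment alludes to, would advance the sequence base as the oldest acks arrive instead of draining the whole chunk before moving on.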
*/ polls = 0; while (rcvd_acks != want_acks) { if (polls++ > nd_polls) { if (retries++ > nd_retries) return (ETIMEDOUT); printf(". "); goto retransmit; } netdump_network_poll(); DELAY(500); } nd_seqno += i; return (0); } /* * Handler for IP packets: checks their sanity and then processes any netdump * ACK packets it finds. * * It needs to replicate partially the behaviour of ip_input() and * udp_input(). * * Parameters: * mb a pointer to an mbuf * containing the packet received * Updates *mb if m_pullup et al change the pointer * Assumes the calling function will take care of freeing the mbuf */ static void netdump_handle_ip(struct mbuf **mb) { struct ip *ip; struct udpiphdr *udp; struct netdump_ack *nd_ack; struct mbuf *m; int rcv_ackno; unsigned short hlen; /* IP processing. */ m = *mb; if (m->m_pkthdr.len < sizeof(struct ip)) { NETDDEBUG("dropping packet too small for IP header\n"); return; } if (m->m_len < sizeof(struct ip)) { m = m_pullup(m, sizeof(struct ip)); *mb = m; if (m == NULL) { NETDDEBUG("m_pullup failed\n"); return; } } ip = mtod(m, struct ip *); /* IP version. */ if (ip->ip_v != IPVERSION) { NETDDEBUG("bad IP version %d\n", ip->ip_v); return; } /* Header length. */ hlen = ip->ip_hl << 2; if (hlen < sizeof(struct ip)) { NETDDEBUG("bad IP header length (%hu)\n", hlen); return; } if (hlen > m->m_len) { m = m_pullup(m, hlen); *mb = m; if (m == NULL) { NETDDEBUG("m_pullup failed\n"); return; } ip = mtod(m, struct ip *); } /* Ignore packets with IP options. */ if (hlen > sizeof(struct ip)) { NETDDEBUG("drop packet with IP options\n"); return; } #ifdef INVARIANTS if ((IN_LOOPBACK(ntohl(ip->ip_dst.s_addr)) || IN_LOOPBACK(ntohl(ip->ip_src.s_addr))) && (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) == 0) { NETDDEBUG("Bad IP header (RFC1122)\n"); return; } #endif /* Checksum. */ if ((m->m_pkthdr.csum_flags & CSUM_IP_CHECKED) != 0) { if ((m->m_pkthdr.csum_flags & CSUM_IP_VALID) == 0) { NETDDEBUG("bad IP checksum\n"); return; } } else { /* XXX */ ; } /* Convert fields to host byte order. */ ip->ip_len = ntohs(ip->ip_len); if (ip->ip_len < hlen) { NETDDEBUG("IP packet smaller (%hu) than header (%hu)\n", ip->ip_len, hlen); return; } if (m->m_pkthdr.len < ip->ip_len) { NETDDEBUG("IP packet bigger (%hu) than ethernet packet (%d)\n", ip->ip_len, m->m_pkthdr.len); return; } if (m->m_pkthdr.len > ip->ip_len) { /* Truncate the packet to the IP length. */ if (m->m_len == m->m_pkthdr.len) { m->m_len = ip->ip_len; m->m_pkthdr.len = ip->ip_len; } else m_adj(m, ip->ip_len - m->m_pkthdr.len); } ip->ip_off = ntohs(ip->ip_off); /* Check that the source is the server's IP. */ if (ip->ip_src.s_addr != nd_server.s_addr) { NETDDEBUG("drop packet not from server (from 0x%x)\n", ip->ip_src.s_addr); return; } /* Check if the destination IP is ours. */ if (ip->ip_dst.s_addr != nd_client.s_addr) { NETDDEBUGV("drop packet not to our IP\n"); return; } if (ip->ip_p != IPPROTO_UDP) { NETDDEBUG("drop non-UDP packet\n"); return; } /* Do not deal with fragments. */ if ((ip->ip_off & (IP_MF | IP_OFFMASK)) != 0) { NETDDEBUG("drop fragmented packet\n"); return; } /* UDP custom is to have packet length not include IP header. */ ip->ip_len -= hlen; /* UDP processing. */ /* Get IP and UDP headers together, along with the netdump packet. 
*/ if (m->m_pkthdr.len < sizeof(struct udpiphdr) + sizeof(struct netdump_ack)) { NETDDEBUG("ignoring small packet\n"); return; } if (m->m_len < sizeof(struct udpiphdr) + sizeof(struct netdump_ack)) { m = m_pullup(m, sizeof(struct udpiphdr) + sizeof(struct netdump_ack)); *mb = m; if (m == NULL) { NETDDEBUG("m_pullup failed\n"); return; } } udp = mtod(m, struct udpiphdr *); if (ntohs(udp->ui_u.uh_dport) != NETDUMP_ACKPORT) { NETDDEBUG("not on the netdump port.\n"); return; } /* Netdump processing. */ /* * Packet is meant for us. Extract the ack sequence number and the * port number if necessary. */ nd_ack = (struct netdump_ack *)(mtod(m, caddr_t) + sizeof(struct udpiphdr)); rcv_ackno = ntohl(nd_ack->na_seqno); if (nd_server_port == NETDUMP_PORT) nd_server_port = ntohs(udp->ui_u.uh_sport); if (rcv_ackno >= nd_seqno + NETDUMP_MAX_IN_FLIGHT) printf("%s: ACK %d too far in future!\n", __func__, rcv_ackno); else if (rcv_ackno >= nd_seqno) { /* We're interested in this ack. Record it. */ rcvd_acks |= 1 << (rcv_ackno - nd_seqno); } } /* * Handler for ARP packets: checks their sanity and then * 1. If the ARP is a request for our IP, respond with our MAC address * 2. If the ARP is a response from our server, record its MAC address * * It needs to replicate partially the behaviour of arpintr() and * in_arpinput(). * * Parameters: * mb a pointer to an mbuf * containing the packet received * Updates *mb if m_pullup et al change the pointer * Assumes the calling function will take care of freeing the mbuf */ static void netdump_handle_arp(struct mbuf **mb) { char buf[INET_ADDRSTRLEN]; struct in_addr isaddr, itaddr, myaddr; struct ether_addr dst; struct mbuf *m; struct arphdr *ah; struct ifnet *ifp; uint8_t *enaddr; int req_len, op; m = *mb; ifp = m->m_pkthdr.rcvif; if (m->m_len < sizeof(struct arphdr)) { m = m_pullup(m, sizeof(struct arphdr)); *mb = m; if (m == NULL) { NETDDEBUG("runt packet: m_pullup failed\n"); return; } } ah = mtod(m, struct arphdr *); if (ntohs(ah->ar_hrd) != ARPHRD_ETHER) { NETDDEBUG("unknown hardware address 0x%2D)\n", (unsigned char *)&ah->ar_hrd, ""); return; } if (ntohs(ah->ar_pro) != ETHERTYPE_IP) { NETDDEBUG("drop ARP for unknown protocol %d\n", ntohs(ah->ar_pro)); return; } req_len = arphdr_len2(ifp->if_addrlen, sizeof(struct in_addr)); if (m->m_len < req_len) { m = m_pullup(m, req_len); *mb = m; if (m == NULL) { NETDDEBUG("runt packet: m_pullup failed\n"); return; } } ah = mtod(m, struct arphdr *); op = ntohs(ah->ar_op); memcpy(&isaddr, ar_spa(ah), sizeof(isaddr)); memcpy(&itaddr, ar_tpa(ah), sizeof(itaddr)); enaddr = (uint8_t *)IF_LLADDR(ifp); myaddr = nd_client; if (memcmp(ar_sha(ah), enaddr, ifp->if_addrlen) == 0) { NETDDEBUG("ignoring ARP from myself\n"); return; } if (isaddr.s_addr == nd_client.s_addr) { printf("%s: %*D is using my IP address %s!\n", __func__, ifp->if_addrlen, (u_char *)ar_sha(ah), ":", inet_ntoa_r(isaddr, buf)); return; } if (memcmp(ar_sha(ah), ifp->if_broadcastaddr, ifp->if_addrlen) == 0) { NETDDEBUG("ignoring ARP from broadcast address\n"); return; } if (op == ARPOP_REPLY) { if (isaddr.s_addr != nd_gateway.s_addr && isaddr.s_addr != nd_server.s_addr) { inet_ntoa_r(isaddr, buf); NETDDEBUG( "ignoring ARP reply from %s (not netdump server)\n", buf); return; } memcpy(nd_gw_mac.octet, ar_sha(ah), min(ah->ar_hln, ETHER_ADDR_LEN)); have_gw_mac = 1; NETDDEBUG("got server MAC address %6D\n", nd_gw_mac.octet, ":"); return; } if (op != ARPOP_REQUEST) { NETDDEBUG("ignoring ARP non-request/reply\n"); return; } if (itaddr.s_addr != nd_client.s_addr) { 
NETDDEBUG("ignoring ARP not to our IP\n"); return; } memcpy(ar_tha(ah), ar_sha(ah), ah->ar_hln); memcpy(ar_sha(ah), enaddr, ah->ar_hln); memcpy(ar_tpa(ah), ar_spa(ah), ah->ar_pln); memcpy(ar_spa(ah), &itaddr, ah->ar_pln); ah->ar_op = htons(ARPOP_REPLY); ah->ar_pro = htons(ETHERTYPE_IP); m->m_flags &= ~(M_BCAST|M_MCAST); m->m_len = arphdr_len(ah); m->m_pkthdr.len = m->m_len; memcpy(dst.octet, ar_tha(ah), ETHER_ADDR_LEN); netdump_ether_output(m, ifp, dst, ETHERTYPE_ARP); *mb = NULL; } /* * Handler for incoming packets directly from the network adapter * Identifies the packet type (IP or ARP) and passes it along to one of the * helper functions netdump_handle_ip or netdump_handle_arp. * * It needs to replicate partially the behaviour of ether_input() and * ether_demux(). * * Parameters: * ifp the interface the packet came from (should be nd_ifp) * m an mbuf containing the packet received */ static void netdump_pkt_in(struct ifnet *ifp, struct mbuf *m) { struct ifreq ifr; struct ether_header *eh; u_short etype; /* Ethernet processing. */ if ((m->m_flags & M_PKTHDR) == 0) { NETDDEBUG_IF(ifp, "discard frame without packet header\n"); goto done; } if (m->m_len < ETHER_HDR_LEN) { NETDDEBUG_IF(ifp, "discard frame without leading eth header (len %u pktlen %u)\n", m->m_len, m->m_pkthdr.len); goto done; } if ((m->m_flags & M_HASFCS) != 0) { m_adj(m, -ETHER_CRC_LEN); m->m_flags &= ~M_HASFCS; } eh = mtod(m, struct ether_header *); etype = ntohs(eh->ether_type); if ((m->m_flags & M_VLANTAG) != 0 || etype == ETHERTYPE_VLAN) { NETDDEBUG_IF(ifp, "ignoring vlan packets\n"); goto done; } if (if_gethwaddr(ifp, &ifr) != 0) { NETDDEBUG_IF(ifp, "failed to get hw addr for interface\n"); goto done; } if (memcmp(ifr.ifr_addr.sa_data, eh->ether_dhost, ETHER_ADDR_LEN) != 0) { NETDDEBUG_IF(ifp, "discard frame with incorrect destination addr\n"); goto done; } /* Done ethernet processing. Strip off the ethernet header. */ m_adj(m, ETHER_HDR_LEN); switch (etype) { case ETHERTYPE_ARP: netdump_handle_arp(&m); break; case ETHERTYPE_IP: netdump_handle_ip(&m); break; default: NETDDEBUG_IF(ifp, "dropping unknown ethertype %hu\n", etype); break; } done: if (m != NULL) m_freem(m); } /* * After trapping, instead of assuming that most of the network stack is sane, * we just poll the driver directly for packets. */ static void netdump_network_poll(void) { MPASS(nd_ifp != NULL); nd_ifp->if_netdump_methods->nd_poll(nd_ifp, 1000); } /*- * Dumping specific primitives. */ /* * Callback from dumpsys() to dump a chunk of memory. * Copies it out to our static buffer then sends it across the network. * Detects the initial KDH and makes sure it is given a special packet type. * * Parameters: * priv Unused. Optional private pointer. * virtual Virtual address (where to read the data from) * physical Unused. Physical memory address. 
* offset Offset from start of core file * length Data length * * Return value: * 0 on success * errno on error */ static int netdump_dumper(void *priv __unused, void *virtual, vm_offset_t physical __unused, off_t offset, size_t length) { int error; NETDDEBUGV("netdump_dumper(NULL, %p, NULL, %ju, %zu)\n", virtual, (uintmax_t)offset, length); if (virtual == NULL) { if (dump_failed != 0) printf("failed to dump the kernel core\n"); else if (netdump_send(NETDUMP_FINISHED, 0, NULL, 0) != 0) printf("failed to close the transaction\n"); else printf("\nnetdump finished.\n"); netdump_cleanup(); return (0); } if (length > sizeof(nd_buf)) return (ENOSPC); memmove(nd_buf, virtual, length); error = netdump_send(NETDUMP_VMCORE, offset, nd_buf, length); if (error != 0) { dump_failed = 1; return (error); } return (0); } /* * Perform any initialization needed prior to transmitting the kernel core. */ static int netdump_start(struct dumperinfo *di) { char *path; char buf[INET_ADDRSTRLEN]; uint32_t len; int error; error = 0; /* Check if the dumping is allowed to continue. */ if (nd_enabled == 0) return (EINVAL); if (panicstr == NULL) { printf( "netdump_start: netdump may only be used after a panic\n"); return (EINVAL); } MPASS(nd_ifp != NULL); if (nd_server.s_addr == INADDR_ANY) { printf("netdump_start: can't netdump; no server IP given\n"); return (EINVAL); } if (nd_client.s_addr == INADDR_ANY) { printf("netdump_start: can't netdump; no client IP given\n"); return (EINVAL); } /* We start dumping at offset 0. */ di->dumpoff = 0; nd_seqno = 1; /* * nd_server_port may have been changed by the first ack of a * previous dump attempt. Reset it to the default. */ nd_server_port = NETDUMP_PORT; /* Switch to the netdump mbuf zones. */ netdump_mbuf_dump(); nd_ifp->if_netdump_methods->nd_event(nd_ifp, NETDUMP_START); /* Make the card use *our* receive callback. */ drv_if_input = nd_ifp->if_input; nd_ifp->if_input = netdump_pkt_in; if (nd_gateway.s_addr == INADDR_ANY) { restore_gw_addr = 1; nd_gateway.s_addr = nd_server.s_addr; } printf("netdump in progress. searching for server...\n"); if (netdump_arp_gw()) { printf("failed to locate server MAC address\n"); error = EINVAL; goto trig_abort; } if (nd_path[0] != '\0') { path = nd_path; len = strlen(path) + 1; } else { path = NULL; len = 0; } if (netdump_send(NETDUMP_HERALD, 0, path, len) != 0) { printf("failed to contact netdump server\n"); error = EINVAL; goto trig_abort; } printf("netdumping to %s (%6D)\n", inet_ntoa_r(nd_server, buf), nd_gw_mac.octet, ":"); return (0); trig_abort: netdump_cleanup(); return (error); } static int netdump_write_headers(struct dumperinfo *di, struct kerneldumpheader *kdh, void *key, uint32_t keysize) { int error; memcpy(nd_buf, kdh, sizeof(*kdh)); error = netdump_send(NETDUMP_KDH, 0, nd_buf, sizeof(*kdh)); if (error == 0 && keysize > 0) { if (keysize > sizeof(nd_buf)) return (EINVAL); memcpy(nd_buf, key, keysize); error = netdump_send(NETDUMP_EKCD_KEY, 0, nd_buf, keysize); } return (error); } /* * Cleanup routine for a possibly failed netdump. */ static void netdump_cleanup(void) { if (restore_gw_addr != 0) { nd_gateway.s_addr = INADDR_ANY; restore_gw_addr = 0; } if (drv_if_input != NULL) { nd_ifp->if_input = drv_if_input; drv_if_input = NULL; } nd_ifp->if_netdump_methods->nd_event(nd_ifp, NETDUMP_END); } /*- * KLD specific code.
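To make the framing concrete before the KLD plumbing: each message netdump_send() emits is a struct netdump_msg_hdr followed by at most NETDUMP_DATASIZE bytes of payload, and the server acknowledges every sequence number with a struct netdump_ack sent to the client's fixed NETDUMP_ACKPORT. The following userspace observer prints headers and acks everything it sees. It is a debugging sketch only, not a stand-in for netdumpd(8): IPv4 only, no per-client socket, error handling elided, and the struct shapes are restated locally so it builds without the kernel header:

#include <sys/socket.h>
#include <sys/endian.h>

#include <netinet/in.h>

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct ndmsg_hdr {			/* mirrors struct netdump_msg_hdr */
	uint32_t mh_type;
	uint32_t mh_seqno;
	uint64_t mh_offset;
	uint32_t mh_len;
	uint32_t mh__pad;
} __attribute__((__packed__));

struct ndack {				/* mirrors struct netdump_ack */
	uint32_t na_seqno;
} __attribute__((__packed__));

int
main(void)
{
	struct sockaddr_in sin, from;
	struct ndmsg_hdr hdr;
	struct ndack ack;
	char buf[4096 + sizeof(struct ndmsg_hdr)];	/* NETDUMP_DATASIZE */
	socklen_t fromlen;
	ssize_t n;
	int s;

	s = socket(AF_INET, SOCK_DGRAM, 0);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(20023);		/* NETDUMP_PORT (heralds) */
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	(void)bind(s, (struct sockaddr *)&sin, sizeof(sin));

	for (;;) {
		fromlen = sizeof(from);
		n = recvfrom(s, buf, sizeof(buf), 0,
		    (struct sockaddr *)&from, &fromlen);
		if (n < (ssize_t)sizeof(hdr))
			continue;
		memcpy(&hdr, buf, sizeof(hdr));
		printf("type %u seqno %u offset %ju len %u\n",
		    ntohl(hdr.mh_type), ntohl(hdr.mh_seqno),
		    (uintmax_t)be64toh(hdr.mh_offset), ntohl(hdr.mh_len));

		/* Ack each message to the client's NETDUMP_ACKPORT. */
		ack.na_seqno = hdr.mh_seqno;	/* already in network order */
		from.sin_port = htons(20024);	/* NETDUMP_ACKPORT */
		(void)sendto(s, &ack, sizeof(ack), 0,
		    (struct sockaddr *)&from, fromlen);
	}
}

One simplification worth noting: the sketch replies from the herald port, so the client keeps nd_server_port at 20023; a real server answers from a per-client socket, which is why netdump_handle_ip() adopts the source port of the first ack.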
*/ static struct cdevsw netdump_cdevsw = { .d_version = D_VERSION, .d_ioctl = netdump_ioctl, .d_name = "netdump", }; static struct cdev *netdump_cdev; static int -netdump_configure(struct netdump_conf *conf, struct thread *td) +netdump_configure(struct diocskerneldump_arg *conf, struct thread *td) { struct epoch_tracker et; struct ifnet *ifp; CURVNET_SET(TD_TO_VNET(td)); if (!IS_DEFAULT_VNET(curvnet)) { CURVNET_RESTORE(); return (EINVAL); } NET_EPOCH_ENTER(et); CK_STAILQ_FOREACH(ifp, &V_ifnet, if_link) { - if (strcmp(ifp->if_xname, conf->ndc_iface) == 0) + if (strcmp(ifp->if_xname, conf->kda_iface) == 0) break; } /* XXX ref */ NET_EPOCH_EXIT(et); CURVNET_RESTORE(); if (ifp == NULL) return (ENOENT); if ((if_getflags(ifp) & IFF_UP) == 0) return (ENXIO); if (!netdump_supported_nic(ifp) || ifp->if_type != IFT_ETHER) - return (EINVAL); + return (ENODEV); nd_ifp = ifp; netdump_reinit(ifp); memcpy(&nd_conf, conf, sizeof(nd_conf)); nd_enabled = 1; return (0); } /* * Reinitialize the mbuf pool used by drivers while dumping. This is called * from the generic ioctl handler for SIOCSIFMTU after the driver has * reconfigured itself. */ void netdump_reinit(struct ifnet *ifp) { int clsize, nmbuf, ncl, nrxr; if (ifp != nd_ifp) return; ifp->if_netdump_methods->nd_init(ifp, &nrxr, &ncl, &clsize); KASSERT(nrxr > 0, ("invalid receive ring count %d", nrxr)); /* * We need two headers per message on the transmit side. Multiply by * four to give us some breathing room. */ nmbuf = ncl * (4 + nrxr); ncl *= nrxr; netdump_mbuf_reinit(nmbuf, ncl, clsize); } /* * ioctl(2) handler for the netdump device. This is currently only used to * register netdump as a dump device. * * Parameters: * dev, Unused. * cmd, The ioctl to be handled. * addr, The parameter for the ioctl. * flags, Unused. * td, The thread invoking this ioctl. * * Returns: * 0 on success, and an errno value on failure. */ static int netdump_ioctl(struct cdev *dev __unused, u_long cmd, caddr_t addr, int flags __unused, struct thread *td) { - struct diocskerneldump_arg *kda; + struct diocskerneldump_arg kda_copy, *conf; struct dumperinfo dumper; - struct netdump_conf *conf; uint8_t *encryptedkey; int error; #ifdef COMPAT_FREEBSD11 u_int u; #endif +#ifdef COMPAT_FREEBSD12 + struct diocskerneldump_arg_freebsd12 *kda12; + struct netdump_conf_freebsd12 *conf12; +#endif + conf = NULL; error = 0; switch (cmd) { #ifdef COMPAT_FREEBSD11 case DIOCSKERNELDUMP_FREEBSD11: + gone_in(13, "11.x ABI compatibility"); u = *(u_int *)addr; if (u != 0) { error = ENXIO; break; } if (nd_enabled) { nd_enabled = 0; netdump_mbuf_drain(); } break; #endif - case DIOCSKERNELDUMP: - kda = (void *)addr; - if (kda->kda_enable != 0) { +#ifdef COMPAT_FREEBSD12 + /* + * Used by dumpon(8) in 12.x for clearing previous + * configuration -- then NETDUMPSCONF_FREEBSD12 is used to + * actually configure netdump. 
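For reference, the two-step 12.x sequence described above, clear first and then configure, would look roughly like this from userland. A sketch under assumptions: the interface and addresses are placeholders, error handling is elided, and on a genuine 12.x system the requests are spelled DIOCSKERNELDUMP and NETDUMPSCONF; the _FREEBSD12 spellings used here are the compat names this patch introduces for a 13 tree:

#include <sys/param.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/disk.h>

#include <net/if.h>
#include <netinet/in.h>
#include <netinet/netdump/netdump.h>

#include <arpa/inet.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct netdump_conf_freebsd12 conf;
	int fd;

	fd = open("/dev/netdump", O_RDONLY);	/* _PATH_NETDUMP */

	/* Step 1: clear any previous configuration (kda12_enable == 0). */
	memset(&conf, 0, sizeof(conf));
	ioctl(fd, DIOCSKERNELDUMP_FREEBSD12, &conf.ndc12_kda);

	/* Step 2: configure netdump proper; names/addresses are made up. */
	conf.ndc12_kda.kda12_enable = 1;
	strlcpy(conf.ndc12_iface, "em0", sizeof(conf.ndc12_iface));
	inet_aton("192.0.2.1", &conf.ndc12_server);
	inet_aton("192.0.2.2", &conf.ndc12_client);
	ioctl(fd, NETDUMPSCONF_FREEBSD12, &conf);

	close(fd);
	return (0);
}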
+ */ + case DIOCSKERNELDUMP_FREEBSD12: + gone_in(14, "12.x ABI compatibility"); + + kda12 = (void *)addr; + if (kda12->kda12_enable) { error = ENXIO; break; } if (nd_enabled) { nd_enabled = 0; netdump_mbuf_drain(); } break; - case NETDUMPGCONF: - conf = (struct netdump_conf *)addr; + + case NETDUMPGCONF_FREEBSD12: + gone_in(14, "FreeBSD 12.x ABI compat"); + conf12 = (void *)addr; + + if (!nd_enabled) { + error = ENXIO; + break; + } + if (nd_conf.kda_af != AF_INET) { + error = EOPNOTSUPP; + break; + } + + strlcpy(conf12->ndc12_iface, nd_ifp->if_xname, + sizeof(conf12->ndc12_iface)); + memcpy(&conf12->ndc12_server, &nd_server, + sizeof(conf12->ndc12_server)); + memcpy(&conf12->ndc12_client, &nd_client, + sizeof(conf12->ndc12_client)); + memcpy(&conf12->ndc12_gateway, &nd_gateway, + sizeof(conf12->ndc12_gateway)); + break; +#endif + case DIOCGKERNELDUMP: + conf = (void *)addr; + /* + * For now, index is ignored; netdump doesn't support multiple + * configurations (yet). + */ if (!nd_enabled) { error = ENXIO; + conf = NULL; break; } - strlcpy(conf->ndc_iface, nd_ifp->if_xname, - sizeof(conf->ndc_iface)); - memcpy(&conf->ndc_server, &nd_server, sizeof(nd_server)); - memcpy(&conf->ndc_client, &nd_client, sizeof(nd_client)); - memcpy(&conf->ndc_gateway, &nd_gateway, sizeof(nd_gateway)); + strlcpy(conf->kda_iface, nd_ifp->if_xname, + sizeof(conf->kda_iface)); + memcpy(&conf->kda_server, &nd_server, sizeof(nd_server)); + memcpy(&conf->kda_client, &nd_client, sizeof(nd_client)); + memcpy(&conf->kda_gateway, &nd_gateway, sizeof(nd_gateway)); + conf->kda_af = nd_conf.kda_af; + conf = NULL; break; - case NETDUMPSCONF: - conf = (struct netdump_conf *)addr; + +#ifdef COMPAT_FREEBSD12 + case NETDUMPSCONF_FREEBSD12: + gone_in(14, "FreeBSD 12.x ABI compat"); + + conf12 = (struct netdump_conf_freebsd12 *)addr; + + _Static_assert(offsetof(struct diocskerneldump_arg, kda_server) + == offsetof(struct netdump_conf_freebsd12, ndc12_server), + "simplifying assumption"); + + memset(&kda_copy, 0, sizeof(kda_copy)); + memcpy(&kda_copy, conf12, + offsetof(struct diocskerneldump_arg, kda_server)); + + /* 12.x ABI could only configure IPv4 (INET) netdump. */ + kda_copy.kda_af = AF_INET; + memcpy(&kda_copy.kda_server.in4, &conf12->ndc12_server, + sizeof(struct in_addr)); + memcpy(&kda_copy.kda_client.in4, &conf12->ndc12_client, + sizeof(struct in_addr)); + memcpy(&kda_copy.kda_gateway.in4, &conf12->ndc12_gateway, + sizeof(struct in_addr)); + + kda_copy.kda_index = + (conf12->ndc12_kda.kda12_enable ? 0 : KDA_REMOVE_ALL); + + conf = &kda_copy; + explicit_bzero(conf12, sizeof(*conf12)); + /* FALLTHROUGH */ +#endif + case DIOCSKERNELDUMP: encryptedkey = NULL; - kda = &conf->ndc_kda; + if (cmd == DIOCSKERNELDUMP) { + conf = (void *)addr; + memcpy(&kda_copy, conf, sizeof(kda_copy)); + } + /* Netdump only supports IP4 at this time. 
*/ + if (conf->kda_af != AF_INET) { + error = EPROTONOSUPPORT; + break; + } - conf->ndc_iface[sizeof(conf->ndc_iface) - 1] = '\0'; - if (kda->kda_enable == 0) { - if (nd_enabled) { - error = clear_dumper(td); + conf->kda_iface[sizeof(conf->kda_iface) - 1] = '\0'; + if (conf->kda_index == KDA_REMOVE || + conf->kda_index == KDA_REMOVE_DEV || + conf->kda_index == KDA_REMOVE_ALL) { + if (nd_enabled || conf->kda_index == KDA_REMOVE_ALL) { + error = dumper_remove(conf->kda_iface, conf); if (error == 0) { nd_enabled = 0; netdump_mbuf_drain(); } } break; } error = netdump_configure(conf, td); if (error != 0) break; - if (kda->kda_encryption != KERNELDUMP_ENC_NONE) { - if (kda->kda_encryptedkeysize <= 0 || - kda->kda_encryptedkeysize > - KERNELDUMP_ENCKEY_MAX_SIZE) - return (EINVAL); - encryptedkey = malloc(kda->kda_encryptedkeysize, M_TEMP, - M_WAITOK); - error = copyin(kda->kda_encryptedkey, encryptedkey, - kda->kda_encryptedkeysize); + if (conf->kda_encryption != KERNELDUMP_ENC_NONE) { + if (conf->kda_encryptedkeysize <= 0 || + conf->kda_encryptedkeysize > + KERNELDUMP_ENCKEY_MAX_SIZE) { + error = EINVAL; + break; + } + encryptedkey = malloc(conf->kda_encryptedkeysize, + M_TEMP, M_WAITOK); + error = copyin(conf->kda_encryptedkey, encryptedkey, + conf->kda_encryptedkeysize); if (error != 0) { free(encryptedkey, M_TEMP); - return (error); + break; } + + conf->kda_encryptedkey = encryptedkey; } memset(&dumper, 0, sizeof(dumper)); dumper.dumper_start = netdump_start; dumper.dumper_hdr = netdump_write_headers; dumper.dumper = netdump_dumper; dumper.priv = NULL; dumper.blocksize = NETDUMP_DATASIZE; dumper.maxiosize = MAXDUMPPGS * PAGE_SIZE; dumper.mediaoffset = 0; dumper.mediasize = 0; - error = set_dumper(&dumper, conf->ndc_iface, td, - kda->kda_compression, kda->kda_encryption, - kda->kda_key, kda->kda_encryptedkeysize, - encryptedkey); + error = dumper_insert(&dumper, conf->kda_iface, conf); if (encryptedkey != NULL) { - explicit_bzero(encryptedkey, kda->kda_encryptedkeysize); + explicit_bzero(encryptedkey, + conf->kda_encryptedkeysize); free(encryptedkey, M_TEMP); } if (error != 0) { nd_enabled = 0; netdump_mbuf_drain(); } break; default: - error = EINVAL; + error = ENOTTY; break; } + explicit_bzero(&kda_copy, sizeof(kda_copy)); + if (conf != NULL) + explicit_bzero(conf, sizeof(*conf)); return (error); } /* * Called upon system init or kld load. Initializes the netdump parameters to * sane defaults (locates the first available NIC and uses the first IPv4 IP on * that card as the client IP). Leaves the server IP unconfigured. * * Parameters: * mod, Unused. * what, The module event type. * priv, Unused. * * Returns: * int, An errno value if an error occurred, 0 otherwise.
*/ static int netdump_modevent(module_t mod __unused, int what, void *priv __unused) { - struct netdump_conf conf; + struct diocskerneldump_arg conf; char *arg; int error; error = 0; switch (what) { case MOD_LOAD: error = make_dev_p(MAKEDEV_WAITOK, &netdump_cdev, &netdump_cdevsw, 0, UID_ROOT, GID_WHEEL, 0600, "netdump"); if (error != 0) return (error); if ((arg = kern_getenv("net.dump.iface")) != NULL) { - strlcpy(conf.ndc_iface, arg, sizeof(conf.ndc_iface)); + strlcpy(conf.kda_iface, arg, sizeof(conf.kda_iface)); freeenv(arg); if ((arg = kern_getenv("net.dump.server")) != NULL) { - inet_aton(arg, &conf.ndc_server); + inet_aton(arg, &conf.kda_server.in4); freeenv(arg); } if ((arg = kern_getenv("net.dump.client")) != NULL) { - inet_aton(arg, &conf.ndc_server); + inet_aton(arg, &conf.kda_client.in4); freeenv(arg); } if ((arg = kern_getenv("net.dump.gateway")) != NULL) { - inet_aton(arg, &conf.ndc_server); + inet_aton(arg, &conf.kda_gateway.in4); freeenv(arg); } + conf.kda_af = AF_INET; /* Ignore errors; we print a message to the console. */ (void)netdump_configure(&conf, curthread); } break; case MOD_UNLOAD: - destroy_dev(netdump_cdev); if (nd_enabled) { + struct diocskerneldump_arg kda; + printf("netdump: disabling dump device for unload\n"); - (void)clear_dumper(curthread); + + bzero(&kda, sizeof(kda)); + kda.kda_index = KDA_REMOVE_DEV; + (void)dumper_remove(nd_conf.kda_iface, &kda); + + netdump_mbuf_drain(); nd_enabled = 0; } + destroy_dev(netdump_cdev); break; default: error = EOPNOTSUPP; break; } return (error); } static moduledata_t netdump_mod = { "netdump", netdump_modevent, NULL, }; MODULE_VERSION(netdump, 1); DECLARE_MODULE(netdump, netdump_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); diff --git a/sys/sys/conf.h b/sys/sys/conf.h index 1e004ebebcef..5741e66c5522 100644 --- a/sys/sys/conf.h +++ b/sys/sys/conf.h @@ -1,374 +1,378 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1990, 1993 * The Regents of the University of California. All rights reserved. * Copyright (c) 2000 * Poul-Henning Kamp. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)conf.h 8.5 (Berkeley) 1/9/95 * $FreeBSD$ */ #ifndef _SYS_CONF_H_ #define _SYS_CONF_H_ #ifdef _KERNEL #include #else #include #endif struct snapdata; struct devfs_dirent; struct cdevsw; struct file; struct cdev { void *si_spare0; u_int si_flags; #define SI_ETERNAL 0x0001 /* never destroyed */ #define SI_ALIAS 0x0002 /* carrier of alias name */ #define SI_NAMED 0x0004 /* make_dev{_alias} has been called */ #define SI_CHEAPCLONE 0x0008 /* can be removed_dev'ed when vnode reclaims */ #define SI_CHILD 0x0010 /* child of another struct cdev **/ #define SI_DUMPDEV 0x0080 /* is kernel dumpdev */ #define SI_CLONELIST 0x0200 /* on a clone list */ #define SI_UNMAPPED 0x0400 /* can handle unmapped I/O */ #define SI_NOSPLIT 0x0800 /* I/O should not be split up */ struct timespec si_atime; struct timespec si_ctime; struct timespec si_mtime; uid_t si_uid; gid_t si_gid; mode_t si_mode; struct ucred *si_cred; /* cached clone-time credential */ int si_drv0; int si_refcount; LIST_ENTRY(cdev) si_list; LIST_ENTRY(cdev) si_clone; LIST_HEAD(, cdev) si_children; LIST_ENTRY(cdev) si_siblings; struct cdev *si_parent; struct mount *si_mountpt; void *si_drv1, *si_drv2; struct cdevsw *si_devsw; int si_iosize_max; /* maximum I/O size (for physio &al) */ u_long si_usecount; u_long si_threadcount; union { struct snapdata *__sid_snapdata; } __si_u; char si_name[SPECNAMELEN + 1]; }; #define si_snapdata __si_u.__sid_snapdata #ifdef _KERNEL /* * Definitions of device driver entry switches */ struct bio; struct buf; struct dumperinfo; struct kerneldumpheader; struct thread; struct uio; struct knote; struct clonedevs; struct vm_object; struct vnode; typedef int d_open_t(struct cdev *dev, int oflags, int devtype, struct thread *td); typedef int d_fdopen_t(struct cdev *dev, int oflags, struct thread *td, struct file *fp); typedef int d_close_t(struct cdev *dev, int fflag, int devtype, struct thread *td); typedef void d_strategy_t(struct bio *bp); typedef int d_ioctl_t(struct cdev *dev, u_long cmd, caddr_t data, int fflag, struct thread *td); typedef int d_read_t(struct cdev *dev, struct uio *uio, int ioflag); typedef int d_write_t(struct cdev *dev, struct uio *uio, int ioflag); typedef int d_poll_t(struct cdev *dev, int events, struct thread *td); typedef int d_kqfilter_t(struct cdev *dev, struct knote *kn); typedef int d_mmap_t(struct cdev *dev, vm_ooffset_t offset, vm_paddr_t *paddr, int nprot, vm_memattr_t *memattr); typedef int d_mmap_single_t(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t size, struct vm_object **object, int nprot); typedef void d_purge_t(struct cdev *dev); typedef int dumper_t( void *_priv, /* Private to the driver. */ void *_virtual, /* Virtual (mapped) address. */ vm_offset_t _physical, /* Physical address of virtual. */ off_t _offset, /* Byte-offset to write at. */ size_t _length); /* Number of bytes to dump. 
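For the consumer's side of this contract: the kernel feeds a dumper through this callback, and, as the netdump dumper earlier in the patch shows, a NULL virtual address with zero length marks the end of the dump. Below is a hypothetical "bit bucket" dumper and its registration through the dumper_insert() interface added later in this header's diff; none of it is part of the patch, it only illustrates the shapes:

#include <sys/param.h>
#include <sys/conf.h>
#include <sys/disk.h>
#include <sys/systm.h>

/* Hypothetical dumper_t: accept and discard every block. */
static int
nulldump_dumper(void *priv __unused, void *virtual,
    vm_offset_t physical __unused, off_t offset __unused, size_t length)
{
	if (virtual == NULL && length == 0)
		return (0);	/* end-of-dump marker */
	return (0);		/* pretend the block was written */
}

/* Register as the lowest-priority fallback dump device. */
static int
nulldump_attach(void)
{
	struct diocskerneldump_arg kda;
	struct dumperinfo di;

	memset(&di, 0, sizeof(di));
	di.dumper = nulldump_dumper;
	di.blocksize = DEV_BSIZE;
	di.maxiosize = MAXDUMPPGS * PAGE_SIZE;
	di.mediaoffset = 0;
	di.mediasize = 0;		/* "unbounded", as netdump uses */

	memset(&kda, 0, sizeof(kda));	/* no compression, no encryption */
	kda.kda_index = KDA_APPEND;
	return (dumper_insert(&di, "nulldump", &kda));
}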
*/ typedef int dumper_start_t(struct dumperinfo *di); typedef int dumper_hdr_t(struct dumperinfo *di, struct kerneldumpheader *kdh, void *key, uint32_t keylen); #endif /* _KERNEL */ /* * Types for d_flags. */ #define D_TAPE 0x0001 #define D_DISK 0x0002 #define D_TTY 0x0004 #define D_MEM 0x0008 /* /dev/(k)mem */ #ifdef _KERNEL #define D_TYPEMASK 0xffff /* * Flags for d_flags which the drivers can set. */ #define D_TRACKCLOSE 0x00080000 /* track all closes */ #define D_MMAP_ANON 0x00100000 /* special treatment in vm_mmap.c */ #define D_NEEDGIANT 0x00400000 /* driver want Giant */ #define D_NEEDMINOR 0x00800000 /* driver uses clone_create() */ /* * Version numbers. */ #define D_VERSION_00 0x20011966 #define D_VERSION_01 0x17032005 /* Add d_uid,gid,mode & kind */ #define D_VERSION_02 0x28042009 /* Add d_mmap_single */ #define D_VERSION_03 0x17122009 /* d_mmap takes memattr,vm_ooffset_t */ #define D_VERSION_04 0x5c48c353 /* SPECNAMELEN bumped to MAXNAMLEN */ #define D_VERSION D_VERSION_04 /* * Flags used for internal housekeeping */ #define D_INIT 0x80000000 /* cdevsw initialized */ /* * Character device switch table */ struct cdevsw { int d_version; u_int d_flags; const char *d_name; d_open_t *d_open; d_fdopen_t *d_fdopen; d_close_t *d_close; d_read_t *d_read; d_write_t *d_write; d_ioctl_t *d_ioctl; d_poll_t *d_poll; d_mmap_t *d_mmap; d_strategy_t *d_strategy; dumper_t *d_dump; d_kqfilter_t *d_kqfilter; d_purge_t *d_purge; d_mmap_single_t *d_mmap_single; int32_t d_spare0[3]; void *d_spare1[3]; /* These fields should not be messed with by drivers */ LIST_HEAD(, cdev) d_devs; int d_spare2; union { struct cdevsw *gianttrick; SLIST_ENTRY(cdevsw) postfree_list; } __d_giant; }; #define d_gianttrick __d_giant.gianttrick #define d_postfree_list __d_giant.postfree_list struct module; struct devsw_module_data { int (*chainevh)(struct module *, int, void *); /* next handler */ void *chainarg; /* arg for next event handler */ /* Do not initialize fields hereafter */ }; #define DEV_MODULE_ORDERED(name, evh, arg, ord) \ static moduledata_t name##_mod = { \ #name, \ evh, \ arg \ }; \ DECLARE_MODULE(name, name##_mod, SI_SUB_DRIVERS, ord) #define DEV_MODULE(name, evh, arg) \ DEV_MODULE_ORDERED(name, evh, arg, SI_ORDER_MIDDLE) void clone_setup(struct clonedevs **cdp); void clone_cleanup(struct clonedevs **); #define CLONE_UNITMASK 0xfffff #define CLONE_FLAG0 (CLONE_UNITMASK + 1) int clone_create(struct clonedevs **, struct cdevsw *, int *unit, struct cdev **dev, int extra); #define MAKEDEV_REF 0x01 #define MAKEDEV_WHTOUT 0x02 #define MAKEDEV_NOWAIT 0x04 #define MAKEDEV_WAITOK 0x08 #define MAKEDEV_ETERNAL 0x10 #define MAKEDEV_CHECKNAME 0x20 struct make_dev_args { size_t mda_size; int mda_flags; struct cdevsw *mda_devsw; struct ucred *mda_cr; uid_t mda_uid; gid_t mda_gid; int mda_mode; int mda_unit; void *mda_si_drv1; void *mda_si_drv2; }; void make_dev_args_init_impl(struct make_dev_args *_args, size_t _sz); #define make_dev_args_init(a) \ make_dev_args_init_impl((a), sizeof(struct make_dev_args)) int count_dev(struct cdev *_dev); void delist_dev(struct cdev *_dev); void destroy_dev(struct cdev *_dev); int destroy_dev_sched(struct cdev *dev); int destroy_dev_sched_cb(struct cdev *dev, void (*cb)(void *), void *arg); void destroy_dev_drain(struct cdevsw *csw); void drain_dev_clone_events(void); struct cdevsw *dev_refthread(struct cdev *_dev, int *_ref); struct cdevsw *devvn_refthread(struct vnode *vp, struct cdev **devp, int *_ref); void dev_relthread(struct cdev *_dev, int _ref); void dev_depends(struct cdev 
*_pdev, struct cdev *_cdev); void dev_ref(struct cdev *dev); void dev_refl(struct cdev *dev); void dev_rel(struct cdev *dev); struct cdev *make_dev(struct cdevsw *_devsw, int _unit, uid_t _uid, gid_t _gid, int _perms, const char *_fmt, ...) __printflike(6, 7); struct cdev *make_dev_cred(struct cdevsw *_devsw, int _unit, struct ucred *_cr, uid_t _uid, gid_t _gid, int _perms, const char *_fmt, ...) __printflike(7, 8); struct cdev *make_dev_credf(int _flags, struct cdevsw *_devsw, int _unit, struct ucred *_cr, uid_t _uid, gid_t _gid, int _mode, const char *_fmt, ...) __printflike(8, 9); int make_dev_p(int _flags, struct cdev **_cdev, struct cdevsw *_devsw, struct ucred *_cr, uid_t _uid, gid_t _gid, int _mode, const char *_fmt, ...) __printflike(8, 9); int make_dev_s(struct make_dev_args *_args, struct cdev **_cdev, const char *_fmt, ...) __printflike(3, 4); struct cdev *make_dev_alias(struct cdev *_pdev, const char *_fmt, ...) __printflike(2, 3); int make_dev_alias_p(int _flags, struct cdev **_cdev, struct cdev *_pdev, const char *_fmt, ...) __printflike(4, 5); int make_dev_physpath_alias(int _flags, struct cdev **_cdev, struct cdev *_pdev, struct cdev *_old_alias, const char *_physpath); void dev_lock(void); void dev_unlock(void); #ifdef KLD_MODULE #define MAKEDEV_ETERNAL_KLD 0 #else #define MAKEDEV_ETERNAL_KLD MAKEDEV_ETERNAL #endif #define dev2unit(d) ((d)->si_drv0) typedef void d_priv_dtor_t(void *data); int devfs_get_cdevpriv(void **datap); int devfs_set_cdevpriv(void *priv, d_priv_dtor_t *dtr); void devfs_clear_cdevpriv(void); ino_t devfs_alloc_cdp_inode(void); void devfs_free_cdp_inode(ino_t ino); #define UID_ROOT 0 #define UID_BIN 3 #define UID_UUCP 66 #define UID_NOBODY 65534 #define GID_WHEEL 0 #define GID_KMEM 2 #define GID_TTY 4 #define GID_OPERATOR 5 #define GID_BIN 7 #define GID_GAMES 13 #define GID_VIDEO 44 #define GID_DIALER 68 #define GID_NOGROUP 65533 #define GID_NOBODY 65534 typedef void (*dev_clone_fn)(void *arg, struct ucred *cred, char *name, int namelen, struct cdev **result); int dev_stdclone(char *_name, char **_namep, const char *_stem, int *_unit); EVENTHANDLER_DECLARE(dev_clone, dev_clone_fn); /* Stuff relating to kernel-dump */ struct kerneldumpcrypto; struct kerneldumpheader; struct dumperinfo { dumper_t *dumper; /* Dumping function. */ dumper_start_t *dumper_start; /* Dumper callback for dump_start(). */ dumper_hdr_t *dumper_hdr; /* Dumper callback for writing headers. */ void *priv; /* Private parts. */ u_int blocksize; /* Size of block in bytes. */ u_int maxiosize; /* Max size allowed for an individual I/O */ off_t mediaoffset; /* Initial offset in bytes. */ off_t mediasize; /* Space available in bytes. */ /* MI kernel dump state. */ void *blockbuf; /* Buffer for padding shorter dump blocks */ off_t dumpoff; /* Offset of ongoing kernel dump. */ off_t origdumpoff; /* Starting dump offset. */ struct kerneldumpcrypto *kdcrypto; /* Kernel dump crypto. */ struct kerneldumpcomp *kdcomp; /* Kernel dump compression. 
*/ + + TAILQ_ENTRY(dumperinfo) di_next; + + char di_devname[]; }; extern int dumping; /* system is dumping */ int doadump(boolean_t); -int set_dumper(struct dumperinfo *di, const char *devname, struct thread *td, - uint8_t compression, uint8_t encryption, const uint8_t *key, - uint32_t encryptedkeysize, const uint8_t *encryptedkey); -int clear_dumper(struct thread *td); +struct diocskerneldump_arg; +int dumper_insert(const struct dumperinfo *di_template, const char *devname, + const struct diocskerneldump_arg *kda); +int dumper_remove(const char *devname, const struct diocskerneldump_arg *kda); int dump_start(struct dumperinfo *di, struct kerneldumpheader *kdh); int dump_append(struct dumperinfo *, void *, vm_offset_t, size_t); int dump_write(struct dumperinfo *, void *, vm_offset_t, off_t, size_t); int dump_finish(struct dumperinfo *di, struct kerneldumpheader *kdh); void dump_init_header(const struct dumperinfo *di, struct kerneldumpheader *kdh, char *magic, uint32_t archver, uint64_t dumplen); #endif /* _KERNEL */ #endif /* !_SYS_CONF_H_ */ diff --git a/sys/sys/disk.h b/sys/sys/disk.h index 4338b03e924a..4fb0f8b61904 100644 --- a/sys/sys/disk.h +++ b/sys/sys/disk.h @@ -1,159 +1,212 @@ /*- * SPDX-License-Identifier: Beerware * * ---------------------------------------------------------------------------- * "THE BEER-WARE LICENSE" (Revision 42): * <phk@FreeBSD.ORG> wrote this file. As long as you retain this notice you * can do whatever you want with this stuff. If we meet some day, and you think * this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp * ---------------------------------------------------------------------------- * * $FreeBSD$ * */ #ifndef _SYS_DISK_H_ #define _SYS_DISK_H_ #include #include #include #include +#include + +#include +#include #ifdef _KERNEL #ifndef _SYS_CONF_H_ #include <sys/conf.h> /* XXX: temporary to avoid breakage */ #endif void disk_err(struct bio *bp, const char *what, int blkdone, int nl); #endif #define DIOCGSECTORSIZE _IOR('d', 128, u_int) /* * Get the sector size of the device in bytes. The sector size is the * smallest unit of data which can be transferred from this device. * Usually this is a power of 2 but it might not be (i.e. CDROM audio). */ #define DIOCGMEDIASIZE _IOR('d', 129, off_t) /* Get media size in bytes */ /* * Get the size of the entire device in bytes. This should be a * multiple of the sector size. */ #define DIOCGFWSECTORS _IOR('d', 130, u_int) /* Get firmware's sectorcount */ /* * Get the firmware's notion of number of sectors per track. This * value is mostly used for compatibility with various ill-designed * disk label formats. Don't use it unless you have to. */ #define DIOCGFWHEADS _IOR('d', 131, u_int) /* Get firmware's headcount */ /* * Get the firmware's notion of number of heads per cylinder. This * value is mostly used for compatibility with various ill-designed * disk label formats. Don't use it unless you have to. */ #define DIOCSKERNELDUMP_FREEBSD11 _IOW('d', 133, u_int) /* * Enable/Disable (the argument is boolean) the device for kernel * core dumps. */ #define DIOCGFRONTSTUFF _IOR('d', 134, off_t) /* * Many disk formats have some amount of space reserved at the * start of the disk to hold bootblocks, various disklabels and * similar stuff. This ioctl returns the number of such bytes * which may apply to the device. */ #define DIOCGFLUSH _IO('d', 135) /* Flush write cache */ /* * Flush write cache of the device. */ #define DIOCGDELETE _IOW('d', 136, off_t[2]) /* Delete data */ /* * Mark data on the device as unused.
*/ #define DISK_IDENT_SIZE 256 #define DIOCGIDENT _IOR('d', 137, char[DISK_IDENT_SIZE]) /*- * Get the ident of the given provider. Ident is (most of the time) * a unique and fixed provider's identifier. Ident's properties are as * follows: * - ident value is preserved between reboots, * - provider can be detached/attached and ident is preserved, * - provider's name can change - ident can't, * - ident value should not be based on on-disk metadata; in other * words copying whole data from one disk to another should not * yield the same ident for the other disk, * - there could be more than one provider with the same ident, but * only if they point at exactly the same physical storage, this is * the case for multipathing for example, * - GEOM classes that consume single providers and provide single * providers, like geli, gbde, should just attach class name to the * ident of the underlying provider, * - ident is an ASCII string (is printable), * - ident is optional and applications can't rely on its presence. */ #define DIOCGPROVIDERNAME _IOR('d', 138, char[MAXPATHLEN]) /* * Store the provider name, given a device path, in a buffer. The buffer * must be at least MAXPATHLEN bytes long. */ #define DIOCGSTRIPESIZE _IOR('d', 139, off_t) /* Get stripe size in bytes */ /* * Get the size of the device's optimal access block in bytes. * This should be a multiple of the sector size. */ #define DIOCGSTRIPEOFFSET _IOR('d', 140, off_t) /* Get stripe offset in bytes */ /* * Get the offset of the first device's optimal access block in bytes. * This should be a multiple of the sector size. */ #define DIOCGPHYSPATH _IOR('d', 141, char[MAXPATHLEN]) /* * Get a string defining the physical path for a given provider. * This has similar rules to ident, but is intended to uniquely * identify the physical location of the device, not the current * occupant of that location. */ struct diocgattr_arg { char name[64]; int len; union { char str[DISK_IDENT_SIZE]; off_t off; int i; uint16_t u16; } value; }; #define DIOCGATTR _IOWR('d', 142, struct diocgattr_arg) #define DIOCZONECMD _IOWR('d', 143, struct disk_zone_args) +struct diocskerneldump_arg_freebsd12 { + uint8_t kda12_enable; + uint8_t kda12_compression; + uint8_t kda12_encryption; + uint8_t kda12_key[KERNELDUMP_KEY_MAX_SIZE]; + uint32_t kda12_encryptedkeysize; + uint8_t *kda12_encryptedkey; +}; +#define DIOCSKERNELDUMP_FREEBSD12 \ + _IOW('d', 144, struct diocskerneldump_arg_freebsd12) + +union kd_ip { + struct in_addr in4; + struct in6_addr in6; +}; + +/* + * Sentinel values for kda_index. + * + * If kda_index is KDA_REMOVE_ALL, all dump configurations are cleared. + * + * If kda_index is KDA_REMOVE_DEV, all dump configurations for the specified + * device are cleared. + * + * If kda_index is KDA_REMOVE, only the specified dump configuration for the + * given device is removed from the list of fallback dump configurations. + * + * If kda_index is KDA_APPEND, the dump configuration is added after all + * existing dump configurations. + * + * Otherwise, the new configuration is inserted into the fallback dump list at + * index 'kda_index'.
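Driving those semantics from userspace, with the definitions that follow just below, might look like this sketch; the device path is a placeholder, error handling is elided, and the KERNELDUMP_* constants are taken from sys/kerneldump.h:

#include <sys/param.h>
#include <sys/ioctl.h>
#include <sys/disk.h>
#include <sys/kerneldump.h>

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct diocskerneldump_arg kda;
	int fd;

	fd = open("/dev/ada0", O_RDONLY);	/* placeholder device */

	/* Append this device at the end of the fallback list. */
	memset(&kda, 0, sizeof(kda));
	kda.kda_index = KDA_APPEND;
	kda.kda_compression = KERNELDUMP_COMP_NONE;
	kda.kda_encryption = KERNELDUMP_ENC_NONE;
	ioctl(fd, DIOCSKERNELDUMP, &kda);

	/* Later: drop every configured dump device in one call. */
	memset(&kda, 0, sizeof(kda));
	kda.kda_index = KDA_REMOVE_ALL;
	ioctl(fd, DIOCSKERNELDUMP, &kda);

	close(fd);
	return (0);
}

DIOCGKERNELDUMP, defined a little further on, is the matching read-back request for inspecting a configured entry.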
+struct diocskerneldump_arg_freebsd12 {
+	uint8_t		 kda12_enable;
+	uint8_t		 kda12_compression;
+	uint8_t		 kda12_encryption;
+	uint8_t		 kda12_key[KERNELDUMP_KEY_MAX_SIZE];
+	uint32_t	 kda12_encryptedkeysize;
+	uint8_t		*kda12_encryptedkey;
+};
+#define	DIOCSKERNELDUMP_FREEBSD12 \
+    _IOW('d', 144, struct diocskerneldump_arg_freebsd12)
+
+union kd_ip {
+	struct in_addr	in4;
+	struct in6_addr	in6;
+};
+
+/*
+ * Sentinel values for kda_index.
+ *
+ * If kda_index is KDA_REMOVE_ALL, all dump configurations are cleared.
+ *
+ * If kda_index is KDA_REMOVE_DEV, all dump configurations for the specified
+ * device are cleared.
+ *
+ * If kda_index is KDA_REMOVE, only the specified dump configuration for the
+ * given device is removed from the list of fallback dump configurations.
+ *
+ * If kda_index is KDA_APPEND, the dump configuration is added after all
+ * existing dump configurations.
+ *
+ * Otherwise, the new configuration is inserted into the fallback dump list at
+ * index 'kda_index'.
+ */
+#define	KDA_REMOVE		UINT8_MAX
+#define	KDA_REMOVE_ALL		(UINT8_MAX - 1)
+#define	KDA_REMOVE_DEV		(UINT8_MAX - 2)
+#define	KDA_APPEND		(UINT8_MAX - 3)
 struct diocskerneldump_arg {
-	uint8_t		 kda_enable;
+	uint8_t		 kda_index;
 	uint8_t		 kda_compression;
 	uint8_t		 kda_encryption;
 	uint8_t		 kda_key[KERNELDUMP_KEY_MAX_SIZE];
 	uint32_t	 kda_encryptedkeysize;
 	uint8_t		*kda_encryptedkey;
+	char		 kda_iface[IFNAMSIZ];
+	union kd_ip	 kda_server;
+	union kd_ip	 kda_client;
+	union kd_ip	 kda_gateway;
+	uint8_t		 kda_af;
 };
+_Static_assert(__offsetof(struct diocskerneldump_arg, kda_iface) ==
+    sizeof(struct diocskerneldump_arg_freebsd12), "simplifying assumption");
 
-#define	DIOCSKERNELDUMP _IOW('d', 144, struct diocskerneldump_arg)
+#define	DIOCSKERNELDUMP _IOW('d', 145, struct diocskerneldump_arg)
	/*
	 * Enable/Disable the device for kernel core dumps.
	 */
 
+#define	DIOCGKERNELDUMP _IOWR('d', 146, struct diocskerneldump_arg)
+	/*
+	 * Get current kernel netdump configuration details for a given index.
+	 */
+
 #endif /* _SYS_DISK_H_ */
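Reading the sentinel comment and the new ioctl pair together, a userland
consumer such as dumpon(8) can drive the whole fallback list through the dump
device's own descriptor. A hedged sketch (device path illustrative; the
zeroed fields mean no compression, encryption, or netdump parameters):

	#include <sys/types.h>
	#include <sys/disk.h>
	#include <sys/ioctl.h>
	#include <err.h>
	#include <fcntl.h>
	#include <string.h>
	#include <unistd.h>

	int
	main(void)
	{
		struct diocskerneldump_arg kda;
		int fd;

		if ((fd = open("/dev/ada0p3", O_RDONLY)) < 0)
			err(1, "open");

		/* Append this device at the tail of the fallback list. */
		memset(&kda, 0, sizeof(kda));
		kda.kda_index = KDA_APPEND;
		if (ioctl(fd, DIOCSKERNELDUMP, &kda) < 0)
			err(1, "DIOCSKERNELDUMP");

		/* Later: drop every configuration naming this device. */
		memset(&kda, 0, sizeof(kda));
		kda.kda_index = KDA_REMOVE_DEV;
		if (ioctl(fd, DIOCSKERNELDUMP, &kda) < 0)
			err(1, "DIOCSKERNELDUMP (remove)");

		close(fd);
		return (0);
	}

DIOCGKERNELDUMP, being _IOWR, presumably takes kda_index as input and fills
in the remaining fields for that slot; that reading is an inference from the
comment above, not spelled out elsewhere in this patch.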
diff --git a/sys/sys/param.h b/sys/sys/param.h
index c4d6c6a5e420..ac6201144bd3 100644
--- a/sys/sys/param.h
+++ b/sys/sys/param.h
@@ -1,367 +1,367 @@
 /*-
  * SPDX-License-Identifier: BSD-3-Clause
  *
  * Copyright (c) 1982, 1986, 1989, 1993
  *	The Regents of the University of California.  All rights reserved.
  * (c) UNIX System Laboratories, Inc.
  * All or some portions of this file are derived from material licensed
  * to the University of California by American Telephone and Telegraph
  * Co. or Unix System Laboratories, Inc. and are reproduced herein with
  * the permission of UNIX System Laboratories, Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  * 3. Neither the name of the University nor the names of its contributors
  *    may be used to endorse or promote products derived from this software
  *    without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * @(#)param.h	8.3 (Berkeley) 4/4/95
  * $FreeBSD$
  */
 
 #ifndef _SYS_PARAM_H_
 #define	_SYS_PARAM_H_
 
 #include <sys/_null.h>
 
 #define	BSD	199506		/* System version (year & month). */
 #define	BSD4_3	1
 #define	BSD4_4	1
 
 /*
  * __FreeBSD_version numbers are documented in the Porter's Handbook.
  * If you bump the version for any reason, you should update the documentation
  * there.
  * Currently this lives here in the doc/ repository:
  *
  *	head/en_US.ISO8859-1/books/porters-handbook/versions/chapter.xml
  *
  * scheme is:  <major><two digit minor>Rxx
  *		'R' is in the range 0 to 4 if this is a release branch or
  *		X.0-CURRENT before releng/X.0 is created, otherwise 'R' is
  *		in the range 5 to 9.
  */
 #undef __FreeBSD_version
-#define __FreeBSD_version 1300022	/* Master, propagated to newvers */
+#define __FreeBSD_version 1300023	/* Master, propagated to newvers */
 
 /*
  * __FreeBSD_kernel__ indicates that this system uses the kernel of FreeBSD,
  * which by definition is always true on FreeBSD.  This macro is also defined
  * on other systems that use the kernel of FreeBSD, such as GNU/kFreeBSD.
  *
  * It is tempting to use this macro in userland code when we want to enable
  * kernel-specific routines, and in fact it's fine to do this in code that
  * is part of FreeBSD itself.  However, be aware that as presence of this
  * macro is still not widespread (e.g. older FreeBSD versions, 3rd party
  * compilers, etc), it is STRONGLY DISCOURAGED to check for this macro in
  * external applications without also checking for __FreeBSD__ as an
  * alternative.
  */
 #undef __FreeBSD_kernel__
 #define	__FreeBSD_kernel__
 
 #if defined(_KERNEL) || defined(IN_RTLD)
 #define	P_OSREL_SIGWAIT			700000
 #define	P_OSREL_SIGSEGV			700004
 #define	P_OSREL_MAP_ANON		800104
 #define	P_OSREL_MAP_FSTRICT		1100036
 #define	P_OSREL_SHUTDOWN_ENOTCONN	1100077
 #define	P_OSREL_MAP_GUARD		1200035
 #define	P_OSREL_WRFSBASE		1200041
 #define	P_OSREL_CK_CYLGRP		1200046
 #define	P_OSREL_VMTOTAL64		1200054
 #define	P_OSREL_CK_SUPERBLOCK		1300000
 #define	P_OSREL_CK_INODE		1300005
 
 #define	P_OSREL_MAJOR(x)		((x) / 100000)
 #endif
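Out-of-tree consumers typically gate on such version bumps; a minimal sketch
(the feature macro name is invented for illustration):

	#include <sys/param.h>

	#if defined(__FreeBSD_version) && __FreeBSD_version >= 1300023
	/* DIOCSKERNELDUMP at 'd'/145 with fallback-list semantics. */
	#define	HAVE_DUMP_DEVICE_LIST	1
	#endif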
 #ifndef LOCORE
 #include <sys/types.h>
 #endif
 
 /*
  * Machine-independent constants (some used in following include files).
  * Redefined constants are from POSIX 1003.1 limits file.
  *
  * MAXCOMLEN should be >= sizeof(ac_comm) (see <acct.h>)
  */
 #include <sys/syslimits.h>
 
 #define	MAXCOMLEN	19		/* max command name remembered */
 #define	MAXINTERP	PATH_MAX	/* max interpreter file name length */
 #define	MAXLOGNAME	33		/* max login name length (incl. NUL) */
 #define	MAXUPRC		CHILD_MAX	/* max simultaneous processes */
 #define	NCARGS		ARG_MAX		/* max bytes for an exec function */
 #define	NGROUPS		(NGROUPS_MAX+1)	/* max number groups */
 #define	NOFILE		OPEN_MAX	/* max open files per process */
 #define	NOGROUP		65535		/* marker for empty group set member */
 #define	MAXHOSTNAMELEN	256		/* max hostname size */
 #define	SPECNAMELEN	255		/* max length of devicename */
 
 /* More types and definitions used throughout the kernel. */
 #ifdef _KERNEL
 #include <sys/cdefs.h>
 #include <sys/errno.h>
 #ifndef LOCORE
 #include <sys/time.h>
 #include <sys/priority.h>
 #endif
 
 #ifndef FALSE
 #define	FALSE	0
 #endif
 #ifndef TRUE
 #define	TRUE	1
 #endif
 #endif
 
 #ifndef _KERNEL
 /* Signals. */
 #include <signal.h>
 #endif
 
 /* Machine type dependent parameters. */
 #include <machine/param.h>
 #ifndef _KERNEL
 #include <sys/limits.h>
 #endif
 
 #ifndef DEV_BSHIFT
 #define	DEV_BSHIFT	9		/* log2(DEV_BSIZE) */
 #endif
 #define	DEV_BSIZE	(1<<DEV_BSHIFT)
 
 #ifndef BLKDEV_IOSIZE
 #define	BLKDEV_IOSIZE	PAGE_SIZE	/* default block device I/O size */
 #endif
 #ifndef DFLTPHYS
 #define	DFLTPHYS	(64 * 1024)	/* default max raw I/O transfer size */
 #endif
 #ifndef MAXPHYS
 #define	MAXPHYS		(128 * 1024)	/* max raw I/O transfer size */
 #endif
 #define	MAXDUMPPGS	(DFLTPHYS/PAGE_SIZE)
 
 /*
  * Constants related to network buffer management.
  * MCLBYTES must be no larger than PAGE_SIZE.
  */
 #ifndef	MSIZE
 #define	MSIZE		256		/* size of an mbuf */
 #endif
 
 #ifndef	MCLSHIFT
 #define	MCLSHIFT	11		/* convert bytes to mbuf clusters */
 #endif	/* MCLSHIFT */
 
 #define	MCLBYTES	(1 << MCLSHIFT)	/* size of an mbuf cluster */
 
 #if PAGE_SIZE < 2048
 #define	MJUMPAGESIZE	MCLBYTES
 #elif PAGE_SIZE <= 8192
 #define	MJUMPAGESIZE	PAGE_SIZE
 #else
 #define	MJUMPAGESIZE	(8 * 1024)
 #endif
 
 #define	MJUM9BYTES	(9 * 1024)	/* jumbo cluster 9k */
 #define	MJUM16BYTES	(16 * 1024)	/* jumbo cluster 16k */
 
 /*
  * Some macros for units conversion
  */
 
 /* clicks to bytes */
 #ifndef ctob
 #define	ctob(x)	((x)<<PAGE_SHIFT)
 #endif
 
 /* bytes to clicks */
 #ifndef btoc
 #define	btoc(x)	(((vm_offset_t)(x)+PAGE_MASK)>>PAGE_SHIFT)
 #endif
 
 /*
  * btodb() is messy and perhaps slow because `bytes' may be an off_t.  We
  * want to shift an unsigned type to avoid sign extension and we don't
  * want to widen `bytes' unnecessarily.  Assume that the result fits in
  * a daddr_t.
  */
 #ifndef btodb
 #define	btodb(bytes)			/* calculates (bytes / DEV_BSIZE) */ \
	(sizeof (bytes) > sizeof(long) \
	 ? (daddr_t)((unsigned long long)(bytes) >> DEV_BSHIFT) \
	 : (daddr_t)((unsigned long)(bytes) >> DEV_BSHIFT))
 #endif
 
 #ifndef dbtob
 #define	dbtob(db)			/* calculates (db * DEV_BSIZE) */ \
	((off_t)(db) << DEV_BSHIFT)
 #endif
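Concretely, with the default DEV_BSHIFT of 9 (DEV_BSIZE == 512) the two
macros are inverse shifts; illustrative values:

	btodb(8192);	/* (daddr_t)16: 8192 >> 9 == 16 disk blocks */
	dbtob(16);	/* (off_t)8192: 16 << 9 == 8192 bytes */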
 #define	PRIMASK	0x0ff
 #define	PCATCH	0x100		/* OR'd with pri for tsleep to check signals */
 #define	PDROP	0x200		/* OR'd with pri to stop re-entry of interlock mutex */
 
 #define	NZERO	0		/* default "nice" */
 
 #define	NBBY	8		/* number of bits in a byte */
 #define	NBPW	sizeof(int)	/* number of bytes per word (integer) */
 
 #define	CMASK	022		/* default file mask: S_IWGRP|S_IWOTH */
 
 #define	NODEV	(dev_t)(-1)	/* non-existent device */
 
 /*
  * File system parameters and macros.
  *
  * MAXBSIZE - Filesystems are made out of blocks of at most MAXBSIZE bytes
  *	per block.  MAXBSIZE may be made larger without affecting
  *	any existing filesystems as long as it does not exceed MAXPHYS,
  *	and may be made smaller at the risk of not being able to use
  *	filesystems which require a block size exceeding MAXBSIZE.
  *
  * MAXBCACHEBUF - Maximum size of a buffer in the buffer cache.  This must
  *	be >= MAXBSIZE and can be set differently for different
  *	architectures by defining it in <machine/param.h>.
  *	Making this larger allows NFS to do larger reads/writes.
  *
  * BKVASIZE - Nominal buffer space per buffer, in bytes.  BKVASIZE is the
  *	minimum KVM memory reservation the kernel is willing to make.
  *	Filesystems can of course request smaller chunks.  Actual
  *	backing memory uses a chunk size of a page (PAGE_SIZE).
  *	The default value here can be overridden on a per-architecture
  *	basis by defining it in <machine/param.h>.
  *
  *	If you make BKVASIZE too small you risk seriously fragmenting
  *	the buffer KVM map which may slow things down a bit.  If you
  *	make it too big the kernel will not be able to optimally use
  *	the KVM memory reserved for the buffer cache and will wind
  *	up with too-few buffers.
  *
  *	The default is 16384, roughly 2x the block size used by a
  *	normal UFS filesystem.
  */
 #define	MAXBSIZE	65536		/* must be power of 2 */
 #ifndef	MAXBCACHEBUF
 #define	MAXBCACHEBUF	MAXBSIZE	/* must be a power of 2 >= MAXBSIZE */
 #endif
 #ifndef	BKVASIZE
 #define	BKVASIZE	16384		/* must be power of 2 */
 #endif
 #define	BKVAMASK	(BKVASIZE-1)
 
 /*
  * MAXPATHLEN defines the longest permissible path length after expanding
  * symbolic links.  It is used to allocate a temporary buffer from the buffer
  * pool in which to do the name expansion, hence should be a power of two,
  * and must be less than or equal to MAXBSIZE.  MAXSYMLINKS defines the
  * maximum number of symbolic links that may be expanded in a path name.
  * It should be set high enough to allow all legitimate uses, but halt
  * infinite loops reasonably quickly.
  */
 #define	MAXPATHLEN	PATH_MAX
 #define	MAXSYMLINKS	32
 
 /* Bit map related macros. */
 #define	setbit(a,i)	(((unsigned char *)(a))[(i)/NBBY] |= 1<<((i)%NBBY))
 #define	clrbit(a,i)	(((unsigned char *)(a))[(i)/NBBY] &= ~(1<<((i)%NBBY)))
 #define	isset(a,i)							\
	(((const unsigned char *)(a))[(i)/NBBY] & (1<<((i)%NBBY)))
 #define	isclr(a,i)							\
	((((const unsigned char *)(a))[(i)/NBBY] & (1<<((i)%NBBY))) == 0)
 
 /* Macros for counting and rounding. */
 #ifndef howmany
 #define	howmany(x, y)	(((x)+((y)-1))/(y))
 #endif
 #define	nitems(x)	(sizeof((x)) / sizeof((x)[0]))
 #define	rounddown(x, y)	(((x)/(y))*(y))
 #define	rounddown2(x, y) ((x)&(~((y)-1)))	   /* if y is a power of two */
 #define	roundup(x, y)	((((x)+((y)-1))/(y))*(y))  /* to any y */
 #define	roundup2(x, y)	(((x)+((y)-1))&(~((y)-1))) /* if y is a power of two */
 #define	powerof2(x)	((((x)-1)&(x))==0)
 
 /* Macros for min/max. */
 #define	MIN(a,b)	(((a)<(b))?(a):(b))
 #define	MAX(a,b)	(((a)>(b))?(a):(b))
 
 #ifdef _KERNEL
 /*
  * Basic byte order function prototypes for non-inline functions.
  */
 #ifndef LOCORE
 #ifndef _BYTEORDER_PROTOTYPED
 #define	_BYTEORDER_PROTOTYPED
 __BEGIN_DECLS
 __uint32_t	 htonl(__uint32_t);
 __uint16_t	 htons(__uint16_t);
 __uint32_t	 ntohl(__uint32_t);
 __uint16_t	 ntohs(__uint16_t);
 __END_DECLS
 #endif
 #endif
 
 #ifndef _BYTEORDER_FUNC_DEFINED
 #define	_BYTEORDER_FUNC_DEFINED
 #define	htonl(x)	__htonl(x)
 #define	htons(x)	__htons(x)
 #define	ntohl(x)	__ntohl(x)
 #define	ntohs(x)	__ntohs(x)
 #endif /* !_BYTEORDER_FUNC_DEFINED */
 #endif /* _KERNEL */
 
 /*
  * Scale factor for scaled integers used to count %cpu time and load avgs.
  *
  * The number of CPU `tick's that map to a unique `%age' can be expressed
  * by the formula (1 / (2 ^ (FSHIFT - 11))).  The maximum load average that
  * can be calculated (assuming 32 bits) can be closely approximated using
  * the formula (2 ^ (2 * (16 - FSHIFT))) for (FSHIFT < 15).
  *
  * For the scheduler to maintain a 1:1 mapping of CPU `tick' to `%age',
  * FSHIFT must be at least 11; this gives us a maximum load avg of ~1024.
  */
 #define	FSHIFT	11		/* bits to right of fixed binary point */
 #define	FSCALE	(1<<FSHIFT)
 
 #define	dbtoc(db)			/* calculates devblks to pages */ \
	((db + (ctodb(1) - 1)) >> (PAGE_SHIFT - DEV_BSHIFT))
 
 #define	ctodb(db)			/* calculates pages to devblks */ \
	((db) << (PAGE_SHIFT - DEV_BSHIFT))
 
 /*
  * Old spelling of __containerof().
  */
 #define	member2struct(s, m, x)						\
	((struct s *)(void *)((char *)(x) - offsetof(struct s, m)))
 
 /*
  * Access a variable length array that has been declared as a fixed
  * length array.
  */
 #define	__PAST_END(array, offset) (((__typeof__(*(array)) *)(array))[offset])
 
 #endif	/* _SYS_PARAM_H_ */
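As a worked example of the FSHIFT fixed-point encoding documented above
(standalone sketch with the kernel's default FSHIFT of 11 copied locally):

	#include <stdio.h>

	#define	FSHIFT	11
	#define	FSCALE	(1 << FSHIFT)	/* 2048 */

	int
	main(void)
	{
		/* A load average of 1.50 is stored as 1.50 * FSCALE == 3072. */
		unsigned int fixpt = 3 * FSCALE / 2;

		printf("%u.%02u\n", fixpt / FSCALE,
		    (fixpt % FSCALE) * 100 / FSCALE);	/* prints "1.50" */
		return (0);
	}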