Index: user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.8 =================================================================== --- user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.8 (revision 303774) +++ user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.8 (revision 303775) @@ -1,192 +1,201 @@ .\" Copyright (c) 2011-2012 Stefan Bethke. .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd September 20, 2013 .Dt ETHERSWITCHCFG 8 .Os .Sh NAME .Nm etherswitchcfg .Nd configure a built-in Ethernet switch .Sh SYNOPSIS .Nm .Op Fl "f control file" -.Ar info +.Cm info .Nm .Op Fl "f control file" -.Ar config +.Cm config .Ar command parameter .Nm .Op Fl "f control file" -.Ar phy +.Cm phy .Ar phy.register[=value] .Nm .Op Fl "f control file" -.Ar port%d +.Cm port%d .Ar [flags] command parameter .Nm .Op Fl "f control file" -.Ar reg +.Cm reg .Ar register[=value] .Nm .Op Fl "f control file" -.Ar vlangroup%d +.Cm vlangroup%d .Ar command parameter .Sh DESCRIPTION The .Nm utility is used to configure an Ethernet switch built into the system. .Nm accepts a number of options: .Pp .Bl -tag -width ".Fl f" -compact .It Fl "f control file" Specifies the .Xr etherswitch 4 control file that represents the switch to be configured. It defaults to .Pa /dev/etherswitch0 . .It Fl m When reporting port information, also list available media options for that port. .It Fl v Produce more verbose output. Without this flag, lines that represent inactive or empty configuration options are omitted. .El .Ss config The config command provides access to global switch configuration parameters. It support the following commands: .Pp -.Bl -tag -width ".Ar vlan_mode mode" -compact -.It Ar vlan_mode mode +.Bl -tag -width ".Cm vlan_mode mode" -compact +.It Cm vlan_mode Ar mode Sets the switch VLAN mode (depends on the hardware). .El .Ss phy The phy command provides access to the registers of the PHYs attached to or integrated into the switch controller. PHY registers are specified as phy.register, where .Ar phy is usually the port number, and .Ar register is the register number. Both can be provided as decimal, octal or hexadecimal numbers in any of the formats understood by .Xr strtol 3 . To set the register value, use the form instance.register=value. .Ss port The port command selects one of the ports of the switch. It supports the following commands: .Pp .Bl -tag -width ".Ar pvid number" -compact -.It Ar pvid number +.It Cm pvid Ar number Sets the default port VID that is used to process incoming frames that are not tagged. -.It Ar media mediaspec +.It Cm media Ar mediaspec Specifies the physical media configuration to be configured for a port. -.It Ar mediaopt mediaoption +.It Cm mediaopt Ar mediaoption Specifies a list of media options for a port. See .Xr ifconfig 8 for details on -.Ar media +.Cm media and -.Ar mediaopt . +.Cm mediaopt . +.It Cm led Ar number style +Sets the display style for a given LED. Available styles are: +.Cm default +(usually flash on activity), +.Cm on , +.Cm off , +and +.Cm blink . +Not all switches will support all styles. .El .Pp And the following flags (please note that not all flags are supported by all switch drivers): .Pp -.Bl -tag -width ".Ar addtag" -compact -.It Ar addtag +.Bl -tag -width ".Fl addtag" -compact +.It Cm addtag Add VLAN tag to each packet sent by the port. -.It Ar -addtag +.It Fl addtag Disable the add VLAN tag option. -.It Ar striptag +.It Cm striptag Strip the VLAN tags from the packets sent by the port. -.It Ar -striptag +.It Fl striptag Disable the strip VLAN tag option. -.It Ar firstlock +.It Cm firstlock This options makes the switch port lock on the first MAC address it sees. After that, usually you need to reset the switch to learn different MAC addresses. -.It Ar -firstlock +.It Fl firstlock Disable the first lock option. Note that sometimes you need to reset the switch to really disable this option. -.It Ar dropuntagged +.It Cm dropuntagged Drop packets without a VLAN tag. -.It Ar -dropuntagged +.It Fl dropuntagged Disable the drop untagged packets option. -.It Ar doubletag +.It Cm doubletag Enable QinQ for the port. -.It Ar -doubletag +.It Fl doubletag Disable QinQ for the port. -.It Ar ingress +.It Cm ingress Enable the ingress filter on the port. -.It Ar -ingress +.It Fl ingress Disable the ingress filter. .El .Ss reg The reg command provides access to the registers of the switch controller. .Ss vlangroup The vlangroup command selects one of the VLAN groups for configuration. It supports the following commands: .Pp -.Bl -tag -width ".Ar vlangroup" -compact -.It Ar vlan VID +.Bl -tag -width ".Cm members" -compact +.It Cm vlan Ar VID Sets the VLAN ID (802.1q VID) for this VLAN group. Frames transmitted on tagged member ports of this group will be tagged with this VID. Incoming frames carrying this tag will be forwarded according to the configuration of this VLAN group. -.It Ar members port,... +.It Cm members Ar port,... Configures which ports are to be a member of this VLAN group. The port numbers are given as a comma-separated list. Each port can optionally be followed by .Dq t to indicate that frames on this port are tagged. .El .Sh FILES .Bl -tag -width /dev/etherswitch? -compact .It Pa /dev/etherswitch? Control file for the Ethernet switch driver. .El .Sh EXAMPLES Configure VLAN group 1 with a VID of 2 and make ports 0 and 5 its members while excluding all other ports. Port 5 will send and receive tagged frames while port 0 will be untagged. Incoming untagged frames on port 0 are assigned to vlangroup1. .Pp .Dl # etherswitchcfg vlangroup1 vlan 2 members 0,5t port0 pvid 2 .Sh SEE ALSO .Xr etherswitch 4 .Sh HISTORY .Nm first appeared in .Fx 10.0 . .Sh AUTHORS .An Stefan Bethke Index: user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.c =================================================================== --- user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.c (revision 303775) @@ -1,710 +1,752 @@ /*- * Copyright (c) 2011-2012 Stefan Bethke. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include int get_media_subtype(int, const char *); int get_media_mode(int, const char *); int get_media_options(int, const char *); int lookup_media_word(struct ifmedia_description *, const char *); void print_media_word(int, int); void print_media_word_ifconfig(int); /* some constants */ #define IEEE802DOT1Q_VID_MAX 4094 #define IFMEDIAREQ_NULISTENTRIES 256 enum cmdmode { MODE_NONE = 0, MODE_PORT, MODE_CONFIG, MODE_VLANGROUP, MODE_REGISTER, MODE_PHYREG }; struct cfg { int fd; int verbose; int mediatypes; const char *controlfile; etherswitch_conf_t conf; etherswitch_info_t info; enum cmdmode mode; int unit; }; struct cmds { enum cmdmode mode; const char *name; int args; void (*f)(struct cfg *, char *argv[]); }; static struct cmds cmds[]; +/* Must match the ETHERSWITCH_PORT_LED_* enum order */ +static const char *ledstyles[] = { "default", "on", "off", "blink", NULL }; /* * Print a value a la the %b format of the kernel's printf. * Stolen from ifconfig.c. */ static void printb(const char *s, unsigned v, const char *bits) { int i, any = 0; char c; if (bits && *bits == 8) printf("%s=%o", s, v); else printf("%s=%x", s, v); bits++; if (bits) { putchar('<'); while ((i = *bits++) != '\0') { if (v & (1 << (i-1))) { if (any) putchar(','); any = 1; for (; (c = *bits) > 32; bits++) putchar(c); } else for (; *bits > 32; bits++) ; } putchar('>'); } } static int read_register(struct cfg *cfg, int r) { struct etherswitch_reg er; er.reg = r; if (ioctl(cfg->fd, IOETHERSWITCHGETREG, &er) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETREG)"); return (er.val); } static void write_register(struct cfg *cfg, int r, int v) { struct etherswitch_reg er; er.reg = r; er.val = v; if (ioctl(cfg->fd, IOETHERSWITCHSETREG, &er) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETREG)"); } static int read_phyregister(struct cfg *cfg, int phy, int reg) { struct etherswitch_phyreg er; er.phy = phy; er.reg = reg; if (ioctl(cfg->fd, IOETHERSWITCHGETPHYREG, &er) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETPHYREG)"); return (er.val); } static void write_phyregister(struct cfg *cfg, int phy, int reg, int val) { struct etherswitch_phyreg er; er.phy = phy; er.reg = reg; er.val = val; if (ioctl(cfg->fd, IOETHERSWITCHSETPHYREG, &er) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETPHYREG)"); } static void set_port_vid(struct cfg *cfg, char *argv[]) { int v; etherswitch_port_t p; v = strtol(argv[1], NULL, 0); if (v < 0 || v > IEEE802DOT1Q_VID_MAX) errx(EX_USAGE, "pvid must be between 0 and %d", IEEE802DOT1Q_VID_MAX); bzero(&p, sizeof(p)); p.es_port = cfg->unit; if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)"); p.es_pvid = v; if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)"); } static void set_port_flag(struct cfg *cfg, char *argv[]) { char *flag; int n; uint32_t f; etherswitch_port_t p; n = 0; f = 0; flag = argv[0]; if (strcmp(flag, "none") != 0) { if (*flag == '-') { n++; flag++; } if (strcasecmp(flag, "striptag") == 0) f = ETHERSWITCH_PORT_STRIPTAG; else if (strcasecmp(flag, "addtag") == 0) f = ETHERSWITCH_PORT_ADDTAG; else if (strcasecmp(flag, "firstlock") == 0) f = ETHERSWITCH_PORT_FIRSTLOCK; else if (strcasecmp(flag, "dropuntagged") == 0) f = ETHERSWITCH_PORT_DROPUNTAGGED; else if (strcasecmp(flag, "doubletag") == 0) f = ETHERSWITCH_PORT_DOUBLE_TAG; else if (strcasecmp(flag, "ingress") == 0) f = ETHERSWITCH_PORT_INGRESS; } bzero(&p, sizeof(p)); p.es_port = cfg->unit; if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)"); if (n) p.es_flags &= ~f; else p.es_flags |= f; if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)"); } static void set_port_media(struct cfg *cfg, char *argv[]) { etherswitch_port_t p; int ifm_ulist[IFMEDIAREQ_NULISTENTRIES]; int subtype; bzero(&p, sizeof(p)); p.es_port = cfg->unit; p.es_ifmr.ifm_ulist = ifm_ulist; p.es_ifmr.ifm_count = IFMEDIAREQ_NULISTENTRIES; if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)"); if (p.es_ifmr.ifm_count == 0) return; subtype = get_media_subtype(IFM_TYPE(ifm_ulist[0]), argv[1]); p.es_ifr.ifr_media = (p.es_ifmr.ifm_current & IFM_IMASK) | IFM_TYPE(ifm_ulist[0]) | subtype; if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)"); } static void set_port_mediaopt(struct cfg *cfg, char *argv[]) { etherswitch_port_t p; int ifm_ulist[IFMEDIAREQ_NULISTENTRIES]; int options; bzero(&p, sizeof(p)); p.es_port = cfg->unit; p.es_ifmr.ifm_ulist = ifm_ulist; p.es_ifmr.ifm_count = IFMEDIAREQ_NULISTENTRIES; if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)"); options = get_media_options(IFM_TYPE(ifm_ulist[0]), argv[1]); if (options == -1) errx(EX_USAGE, "invalid media options \"%s\"", argv[1]); if (options & IFM_HDX) { p.es_ifr.ifr_media &= ~IFM_FDX; options &= ~IFM_HDX; } p.es_ifr.ifr_media |= options; if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)"); } static void +set_port_led(struct cfg *cfg, char *argv[]) +{ + etherswitch_port_t p; + int led; + int i; + + bzero(&p, sizeof(p)); + p.es_port = cfg->unit; + if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0) + err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)"); + + led = strtol(argv[1], NULL, 0); + if (led < 1 || led > p.es_nleds) + errx(EX_USAGE, "invalid led number %s; must be between 1 and %d", + argv[1], p.es_nleds); + + led--; + + for (i=0; ledstyles[i] != NULL; i++) { + if (strcmp(argv[2], ledstyles[i]) == 0) { + p.es_led[led] = i; + break; + } + } + if (ledstyles[i] == NULL) + errx(EX_USAGE, "invalid led style \"%s\"", argv[2]); + + if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0) + err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)"); +} + +static void set_vlangroup_vid(struct cfg *cfg, char *argv[]) { int v; etherswitch_vlangroup_t vg; v = strtol(argv[1], NULL, 0); if (v < 0 || v > IEEE802DOT1Q_VID_MAX) errx(EX_USAGE, "vlan must be between 0 and %d", IEEE802DOT1Q_VID_MAX); vg.es_vlangroup = cfg->unit; if (ioctl(cfg->fd, IOETHERSWITCHGETVLANGROUP, &vg) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETVLANGROUP)"); vg.es_vid = v; if (ioctl(cfg->fd, IOETHERSWITCHSETVLANGROUP, &vg) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETVLANGROUP)"); } static void set_vlangroup_members(struct cfg *cfg, char *argv[]) { etherswitch_vlangroup_t vg; int member, untagged; char *c, *d; int v; member = untagged = 0; if (strcmp(argv[1], "none") != 0) { for (c=argv[1]; *c; c=d) { v = strtol(c, &d, 0); if (d == c) break; if (v < 0 || v >= cfg->info.es_nports) errx(EX_USAGE, "Member port must be between 0 and %d", cfg->info.es_nports-1); if (d[0] == ',' || d[0] == '\0' || ((d[0] == 't' || d[0] == 'T') && (d[1] == ',' || d[1] == '\0'))) { if (d[0] == 't' || d[0] == 'T') { untagged &= ~ETHERSWITCH_PORTMASK(v); d++; } else untagged |= ETHERSWITCH_PORTMASK(v); member |= ETHERSWITCH_PORTMASK(v); d++; } else errx(EX_USAGE, "Invalid members specification \"%s\"", d); } } vg.es_vlangroup = cfg->unit; if (ioctl(cfg->fd, IOETHERSWITCHGETVLANGROUP, &vg) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETVLANGROUP)"); vg.es_member_ports = member; vg.es_untagged_ports = untagged; if (ioctl(cfg->fd, IOETHERSWITCHSETVLANGROUP, &vg) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETVLANGROUP)"); } static int set_register(struct cfg *cfg, char *arg) { int a, v; char *c; a = strtol(arg, &c, 0); if (c==arg) return (1); if (*c == '=') { - v = strtol(c+1, NULL, 0); + v = strtoul(c+1, NULL, 0); write_register(cfg, a, v); } - printf("\treg 0x%04x=0x%04x\n", a, read_register(cfg, a)); + printf("\treg 0x%04x=0x%08x\n", a, read_register(cfg, a)); return (0); } static int set_phyregister(struct cfg *cfg, char *arg) { int phy, reg, val; char *c, *d; phy = strtol(arg, &c, 0); if (c==arg) return (1); if (*c != '.') return (1); d = c+1; reg = strtol(d, &c, 0); if (d == c) return (1); if (*c == '=') { - val = strtol(c+1, NULL, 0); + val = strtoul(c+1, NULL, 0); write_phyregister(cfg, phy, reg, val); } printf("\treg %d.0x%02x=0x%04x\n", phy, reg, read_phyregister(cfg, phy, reg)); return (0); } static void set_vlan_mode(struct cfg *cfg, char *argv[]) { etherswitch_conf_t conf; bzero(&conf, sizeof(conf)); conf.cmd = ETHERSWITCH_CONF_VLAN_MODE; if (strcasecmp(argv[1], "isl") == 0) conf.vlan_mode = ETHERSWITCH_VLAN_ISL; else if (strcasecmp(argv[1], "port") == 0) conf.vlan_mode = ETHERSWITCH_VLAN_PORT; else if (strcasecmp(argv[1], "dot1q") == 0) conf.vlan_mode = ETHERSWITCH_VLAN_DOT1Q; else if (strcasecmp(argv[1], "dot1q4k") == 0) conf.vlan_mode = ETHERSWITCH_VLAN_DOT1Q_4K; else if (strcasecmp(argv[1], "qinq") == 0) conf.vlan_mode = ETHERSWITCH_VLAN_DOUBLE_TAG; else conf.vlan_mode = 0; if (ioctl(cfg->fd, IOETHERSWITCHSETCONF, &conf) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHSETCONF)"); } static void print_config(struct cfg *cfg) { const char *c; /* Get the device name. */ c = strrchr(cfg->controlfile, '/'); if (c != NULL) c = c + 1; else c = cfg->controlfile; /* Print VLAN mode. */ if (cfg->conf.cmd & ETHERSWITCH_CONF_VLAN_MODE) { printf("%s: VLAN mode: ", c); switch (cfg->conf.vlan_mode) { case ETHERSWITCH_VLAN_ISL: printf("ISL\n"); break; case ETHERSWITCH_VLAN_PORT: printf("PORT\n"); break; case ETHERSWITCH_VLAN_DOT1Q: printf("DOT1Q\n"); break; case ETHERSWITCH_VLAN_DOT1Q_4K: printf("DOT1Q4K\n"); break; case ETHERSWITCH_VLAN_DOUBLE_TAG: printf("QinQ\n"); break; default: printf("none\n"); } } } static void print_port(struct cfg *cfg, int port) { etherswitch_port_t p; int ifm_ulist[IFMEDIAREQ_NULISTENTRIES]; int i; bzero(&p, sizeof(p)); p.es_port = port; p.es_ifmr.ifm_ulist = ifm_ulist; p.es_ifmr.ifm_count = IFMEDIAREQ_NULISTENTRIES; if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)"); printf("port%d:\n", port); if (cfg->conf.vlan_mode == ETHERSWITCH_VLAN_DOT1Q) printf("\tpvid: %d\n", p.es_pvid); printb("\tflags", p.es_flags, ETHERSWITCH_PORT_FLAGS_BITS); printf("\n"); + if (p.es_nleds) { + printf("\tled: "); + for (i = 0; i < p.es_nleds; i++) { + printf("%d:%s%s", i+1, ledstyles[p.es_led[i]], (i==p.es_nleds-1)?"":" "); + } + printf("\n"); + } printf("\tmedia: "); print_media_word(p.es_ifmr.ifm_current, 1); if (p.es_ifmr.ifm_active != p.es_ifmr.ifm_current) { putchar(' '); putchar('('); print_media_word(p.es_ifmr.ifm_active, 0); putchar(')'); } putchar('\n'); printf("\tstatus: %s\n", (p.es_ifmr.ifm_status & IFM_ACTIVE) != 0 ? "active" : "no carrier"); if (cfg->mediatypes) { printf("\tsupported media:\n"); if (p.es_ifmr.ifm_count > IFMEDIAREQ_NULISTENTRIES) p.es_ifmr.ifm_count = IFMEDIAREQ_NULISTENTRIES; for (i=0; ifd, IOETHERSWITCHGETVLANGROUP, &vg) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETVLANGROUP)"); if ((vg.es_vid & ETHERSWITCH_VID_VALID) == 0) return; vg.es_vid &= ETHERSWITCH_VID_MASK; printf("vlangroup%d:\n", vlangroup); if (cfg->conf.vlan_mode == ETHERSWITCH_VLAN_PORT) printf("\tport: %d\n", vg.es_vid); else printf("\tvlan: %d\n", vg.es_vid); printf("\tmembers "); comma = 0; if (vg.es_member_ports != 0) for (i=0; iinfo.es_nports; i++) { if ((vg.es_member_ports & ETHERSWITCH_PORTMASK(i)) != 0) { if (comma) printf(","); printf("%d", i); if ((vg.es_untagged_ports & ETHERSWITCH_PORTMASK(i)) == 0) printf("t"); comma = 1; } } else printf("none"); printf("\n"); } static void print_info(struct cfg *cfg) { const char *c; int i; c = strrchr(cfg->controlfile, '/'); if (c != NULL) c = c + 1; else c = cfg->controlfile; if (cfg->verbose) { printf("%s: %s with %d ports and %d VLAN groups\n", c, cfg->info.es_name, cfg->info.es_nports, cfg->info.es_nvlangroups); printf("%s: ", c); printb("VLAN capabilities", cfg->info.es_vlan_caps, ETHERSWITCH_VLAN_CAPS_BITS); printf("\n"); } print_config(cfg); for (i=0; iinfo.es_nports; i++) { print_port(cfg, i); } for (i=0; iinfo.es_nvlangroups; i++) { print_vlangroup(cfg, i); } } static void usage(struct cfg *cfg __unused, char *argv[] __unused) { fprintf(stderr, "usage: etherswitchctl\n"); fprintf(stderr, "\tetherswitchcfg [-f control file] info\n"); fprintf(stderr, "\tetherswitchcfg [-f control file] config " "command parameter\n"); fprintf(stderr, "\t\tconfig commands: vlan_mode\n"); fprintf(stderr, "\tetherswitchcfg [-f control file] phy " "phy.register[=value]\n"); fprintf(stderr, "\tetherswitchcfg [-f control file] portX " "[flags] command parameter\n"); - fprintf(stderr, "\t\tport commands: pvid, media, mediaopt\n"); + fprintf(stderr, "\t\tport commands: pvid, media, mediaopt, led\n"); fprintf(stderr, "\tetherswitchcfg [-f control file] reg " "register[=value]\n"); fprintf(stderr, "\tetherswitchcfg [-f control file] vlangroupX " "command parameter\n"); fprintf(stderr, "\t\tvlangroup commands: vlan, members\n"); exit(EX_USAGE); } static void newmode(struct cfg *cfg, enum cmdmode mode) { if (mode == cfg->mode) return; switch (cfg->mode) { case MODE_NONE: break; case MODE_CONFIG: /* * Read the updated the configuration (it can be different * from the last time we read it). */ if (ioctl(cfg->fd, IOETHERSWITCHGETCONF, &cfg->conf) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETCONF)"); print_config(cfg); break; case MODE_PORT: print_port(cfg, cfg->unit); break; case MODE_VLANGROUP: print_vlangroup(cfg, cfg->unit); break; case MODE_REGISTER: case MODE_PHYREG: break; } cfg->mode = mode; } int main(int argc, char *argv[]) { int ch; struct cfg cfg; int i; bzero(&cfg, sizeof(cfg)); cfg.controlfile = "/dev/etherswitch0"; while ((ch = getopt(argc, argv, "f:mv?")) != -1) switch(ch) { case 'f': cfg.controlfile = optarg; break; case 'm': cfg.mediatypes++; break; case 'v': cfg.verbose++; break; case '?': /* FALLTHROUGH */ default: usage(&cfg, argv); } argc -= optind; argv += optind; cfg.fd = open(cfg.controlfile, O_RDONLY); if (cfg.fd < 0) err(EX_UNAVAILABLE, "Can't open control file: %s", cfg.controlfile); if (ioctl(cfg.fd, IOETHERSWITCHGETINFO, &cfg.info) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETINFO)"); if (ioctl(cfg.fd, IOETHERSWITCHGETCONF, &cfg.conf) != 0) err(EX_OSERR, "ioctl(IOETHERSWITCHGETCONF)"); if (argc == 0) { print_info(&cfg); return (0); } cfg.mode = MODE_NONE; while (argc > 0) { switch(cfg.mode) { case MODE_NONE: if (strcmp(argv[0], "info") == 0) { print_info(&cfg); } else if (sscanf(argv[0], "port%d", &cfg.unit) == 1) { if (cfg.unit < 0 || cfg.unit >= cfg.info.es_nports) errx(EX_USAGE, "port unit must be between 0 and %d", cfg.info.es_nports - 1); newmode(&cfg, MODE_PORT); } else if (sscanf(argv[0], "vlangroup%d", &cfg.unit) == 1) { if (cfg.unit < 0 || cfg.unit >= cfg.info.es_nvlangroups) errx(EX_USAGE, "vlangroup unit must be between 0 and %d", cfg.info.es_nvlangroups - 1); newmode(&cfg, MODE_VLANGROUP); } else if (strcmp(argv[0], "config") == 0) { newmode(&cfg, MODE_CONFIG); } else if (strcmp(argv[0], "phy") == 0) { newmode(&cfg, MODE_PHYREG); } else if (strcmp(argv[0], "reg") == 0) { newmode(&cfg, MODE_REGISTER); } else if (strcmp(argv[0], "help") == 0) { usage(&cfg, argv); } else { errx(EX_USAGE, "Unknown command \"%s\"", argv[0]); } break; case MODE_PORT: case MODE_CONFIG: case MODE_VLANGROUP: for(i=0; cmds[i].name != NULL; i++) { if (cfg.mode == cmds[i].mode && strcmp(argv[0], cmds[i].name) == 0) { if (argc < (cmds[i].args + 1)) { - printf("%s needs an argument\n", cmds[i].name); + printf("%s needs %d argument%s\n", cmds[i].name, cmds[i].args, (cmds[i].args==1)?"":","); break; } (cmds[i].f)(&cfg, argv); argc -= cmds[i].args; argv += cmds[i].args; break; } } if (cmds[i].name == NULL) { newmode(&cfg, MODE_NONE); continue; } break; case MODE_REGISTER: if (set_register(&cfg, argv[0]) != 0) { newmode(&cfg, MODE_NONE); continue; } break; case MODE_PHYREG: if (set_phyregister(&cfg, argv[0]) != 0) { newmode(&cfg, MODE_NONE); continue; } break; } argc--; argv++; } /* switch back to command mode to print configuration for last command */ newmode(&cfg, MODE_NONE); close(cfg.fd); return (0); } static struct cmds cmds[] = { { MODE_PORT, "pvid", 1, set_port_vid }, { MODE_PORT, "media", 1, set_port_media }, { MODE_PORT, "mediaopt", 1, set_port_mediaopt }, + { MODE_PORT, "led", 2, set_port_led }, { MODE_PORT, "addtag", 0, set_port_flag }, { MODE_PORT, "-addtag", 0, set_port_flag }, { MODE_PORT, "ingress", 0, set_port_flag }, { MODE_PORT, "-ingress", 0, set_port_flag }, { MODE_PORT, "striptag", 0, set_port_flag }, { MODE_PORT, "-striptag", 0, set_port_flag }, { MODE_PORT, "doubletag", 0, set_port_flag }, { MODE_PORT, "-doubletag", 0, set_port_flag }, { MODE_PORT, "firstlock", 0, set_port_flag }, { MODE_PORT, "-firstlock", 0, set_port_flag }, { MODE_PORT, "dropuntagged", 0, set_port_flag }, { MODE_PORT, "-dropuntagged", 0, set_port_flag }, { MODE_CONFIG, "vlan_mode", 1, set_vlan_mode }, { MODE_VLANGROUP, "vlan", 1, set_vlangroup_vid }, { MODE_VLANGROUP, "members", 1, set_vlangroup_members }, { 0, NULL, 0, NULL } }; Index: user/alc/PQ_LAUNDRY/sbin/pfctl/parse.y =================================================================== --- user/alc/PQ_LAUNDRY/sbin/pfctl/parse.y (revision 303774) +++ user/alc/PQ_LAUNDRY/sbin/pfctl/parse.y (revision 303775) @@ -1,6259 +1,6269 @@ /* $OpenBSD: parse.y,v 1.554 2008/10/17 12:59:53 henning Exp $ */ /* * Copyright (c) 2001 Markus Friedl. All rights reserved. * Copyright (c) 2001 Daniel Hartmeier. All rights reserved. * Copyright (c) 2001 Theo de Raadt. All rights reserved. * Copyright (c) 2002,2003 Henning Brauer. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ %{ #include __FBSDID("$FreeBSD$"); #include #include #include #ifdef __FreeBSD__ #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pfctl_parser.h" #include "pfctl.h" static struct pfctl *pf = NULL; static int debug = 0; static int rulestate = 0; static u_int16_t returnicmpdefault = (ICMP_UNREACH << 8) | ICMP_UNREACH_PORT; static u_int16_t returnicmp6default = (ICMP6_DST_UNREACH << 8) | ICMP6_DST_UNREACH_NOPORT; static int blockpolicy = PFRULE_DROP; static int require_order = 1; static int default_statelock; -TAILQ_HEAD(files, file) files = TAILQ_HEAD_INITIALIZER(files); +static TAILQ_HEAD(files, file) files = TAILQ_HEAD_INITIALIZER(files); static struct file { TAILQ_ENTRY(file) entry; FILE *stream; char *name; int lineno; int errors; } *file; struct file *pushfile(const char *, int); int popfile(void); int check_file_secrecy(int, const char *); int yyparse(void); int yylex(void); int yyerror(const char *, ...); int kw_cmp(const void *, const void *); int lookup(char *); int lgetc(int); int lungetc(int); int findeol(void); -TAILQ_HEAD(symhead, sym) symhead = TAILQ_HEAD_INITIALIZER(symhead); +static TAILQ_HEAD(symhead, sym) symhead = TAILQ_HEAD_INITIALIZER(symhead); struct sym { TAILQ_ENTRY(sym) entry; int used; int persist; char *nam; char *val; }; int symset(const char *, const char *, int); char *symget(const char *); int atoul(char *, u_long *); enum { PFCTL_STATE_NONE, PFCTL_STATE_OPTION, PFCTL_STATE_SCRUB, PFCTL_STATE_QUEUE, PFCTL_STATE_NAT, PFCTL_STATE_FILTER }; struct node_proto { u_int8_t proto; struct node_proto *next; struct node_proto *tail; }; struct node_port { u_int16_t port[2]; u_int8_t op; struct node_port *next; struct node_port *tail; }; struct node_uid { uid_t uid[2]; u_int8_t op; struct node_uid *next; struct node_uid *tail; }; struct node_gid { gid_t gid[2]; u_int8_t op; struct node_gid *next; struct node_gid *tail; }; struct node_icmp { u_int8_t code; u_int8_t type; u_int8_t proto; struct node_icmp *next; struct node_icmp *tail; }; enum { PF_STATE_OPT_MAX, PF_STATE_OPT_NOSYNC, PF_STATE_OPT_SRCTRACK, PF_STATE_OPT_MAX_SRC_STATES, PF_STATE_OPT_MAX_SRC_CONN, PF_STATE_OPT_MAX_SRC_CONN_RATE, PF_STATE_OPT_MAX_SRC_NODES, PF_STATE_OPT_OVERLOAD, PF_STATE_OPT_STATELOCK, PF_STATE_OPT_TIMEOUT, PF_STATE_OPT_SLOPPY, }; enum { PF_SRCTRACK_NONE, PF_SRCTRACK, PF_SRCTRACK_GLOBAL, PF_SRCTRACK_RULE }; struct node_state_opt { int type; union { u_int32_t max_states; u_int32_t max_src_states; u_int32_t max_src_conn; struct { u_int32_t limit; u_int32_t seconds; } max_src_conn_rate; struct { u_int8_t flush; char tblname[PF_TABLE_NAME_SIZE]; } overload; u_int32_t max_src_nodes; u_int8_t src_track; u_int32_t statelock; struct { int number; u_int32_t seconds; } timeout; } data; struct node_state_opt *next; struct node_state_opt *tail; }; struct peer { struct node_host *host; struct node_port *port; }; -struct node_queue { +static struct node_queue { char queue[PF_QNAME_SIZE]; char parent[PF_QNAME_SIZE]; char ifname[IFNAMSIZ]; int scheduler; struct node_queue *next; struct node_queue *tail; } *queues = NULL; struct node_qassign { char *qname; char *pqname; }; -struct filter_opts { +static struct filter_opts { int marker; #define FOM_FLAGS 0x01 #define FOM_ICMP 0x02 #define FOM_TOS 0x04 #define FOM_KEEP 0x08 #define FOM_SRCTRACK 0x10 #define FOM_SETPRIO 0x0400 #define FOM_PRIO 0x2000 struct node_uid *uid; struct node_gid *gid; struct { u_int8_t b1; u_int8_t b2; u_int16_t w; u_int16_t w2; } flags; struct node_icmp *icmpspec; u_int32_t tos; u_int32_t prob; struct { int action; struct node_state_opt *options; } keep; int fragment; int allowopts; char *label; struct node_qassign queues; char *tag; char *match_tag; u_int8_t match_tag_not; u_int rtableid; u_int8_t prio; u_int8_t set_prio[2]; struct { struct node_host *addr; u_int16_t port; } divert; } filter_opts; -struct antispoof_opts { +static struct antispoof_opts { char *label; u_int rtableid; } antispoof_opts; -struct scrub_opts { +static struct scrub_opts { int marker; #define SOM_MINTTL 0x01 #define SOM_MAXMSS 0x02 #define SOM_FRAGCACHE 0x04 #define SOM_SETTOS 0x08 int nodf; int minttl; int maxmss; int settos; int fragcache; int randomid; int reassemble_tcp; char *match_tag; u_int8_t match_tag_not; u_int rtableid; } scrub_opts; -struct queue_opts { +static struct queue_opts { int marker; #define QOM_BWSPEC 0x01 #define QOM_SCHEDULER 0x02 #define QOM_PRIORITY 0x04 #define QOM_TBRSIZE 0x08 #define QOM_QLIMIT 0x10 struct node_queue_bw queue_bwspec; struct node_queue_opt scheduler; int priority; int tbrsize; int qlimit; } queue_opts; -struct table_opts { +static struct table_opts { int flags; int init_addr; struct node_tinithead init_nodes; } table_opts; -struct pool_opts { +static struct pool_opts { int marker; #define POM_TYPE 0x01 #define POM_STICKYADDRESS 0x02 u_int8_t opts; int type; int staticport; struct pf_poolhashkey *key; } pool_opts; -struct codel_opts codel_opts; -struct node_hfsc_opts hfsc_opts; -struct node_fairq_opts fairq_opts; -struct node_state_opt *keep_state_defaults = NULL; +static struct codel_opts codel_opts; +static struct node_hfsc_opts hfsc_opts; +static struct node_fairq_opts fairq_opts; +static struct node_state_opt *keep_state_defaults = NULL; int disallow_table(struct node_host *, const char *); int disallow_urpf_failed(struct node_host *, const char *); int disallow_alias(struct node_host *, const char *); int rule_consistent(struct pf_rule *, int); int filter_consistent(struct pf_rule *, int); int nat_consistent(struct pf_rule *); int rdr_consistent(struct pf_rule *); int process_tabledef(char *, struct table_opts *); void expand_label_str(char *, size_t, const char *, const char *); void expand_label_if(const char *, char *, size_t, const char *); void expand_label_addr(const char *, char *, size_t, u_int8_t, struct node_host *); void expand_label_port(const char *, char *, size_t, struct node_port *); void expand_label_proto(const char *, char *, size_t, u_int8_t); void expand_label_nr(const char *, char *, size_t); void expand_label(char *, size_t, const char *, u_int8_t, struct node_host *, struct node_port *, struct node_host *, struct node_port *, u_int8_t); void expand_rule(struct pf_rule *, struct node_if *, struct node_host *, struct node_proto *, struct node_os *, struct node_host *, struct node_port *, struct node_host *, struct node_port *, struct node_uid *, struct node_gid *, struct node_icmp *, const char *); int expand_altq(struct pf_altq *, struct node_if *, struct node_queue *, struct node_queue_bw bwspec, struct node_queue_opt *); int expand_queue(struct pf_altq *, struct node_if *, struct node_queue *, struct node_queue_bw, struct node_queue_opt *); int expand_skip_interface(struct node_if *); int check_rulestate(int); int getservice(char *); int rule_label(struct pf_rule *, char *); int rt_tableid_max(void); void mv_rules(struct pf_ruleset *, struct pf_ruleset *); void decide_address_family(struct node_host *, sa_family_t *); void remove_invalid_hosts(struct node_host **, sa_family_t *); int invalid_redirect(struct node_host *, sa_family_t); u_int16_t parseicmpspec(char *, sa_family_t); -TAILQ_HEAD(loadanchorshead, loadanchors) +static TAILQ_HEAD(loadanchorshead, loadanchors) loadanchorshead = TAILQ_HEAD_INITIALIZER(loadanchorshead); struct loadanchors { TAILQ_ENTRY(loadanchors) entries; char *anchorname; char *filename; }; typedef struct { union { int64_t number; double probability; int i; char *string; u_int rtableid; struct { u_int8_t b1; u_int8_t b2; u_int16_t w; u_int16_t w2; } b; struct range { int a; int b; int t; } range; struct node_if *interface; struct node_proto *proto; struct node_icmp *icmp; struct node_host *host; struct node_os *os; struct node_port *port; struct node_uid *uid; struct node_gid *gid; struct node_state_opt *state_opt; struct peer peer; struct { struct peer src, dst; struct node_os *src_os; } fromto; struct { struct node_host *host; u_int8_t rt; u_int8_t pool_opts; sa_family_t af; struct pf_poolhashkey *key; } route; struct redirection { struct node_host *host; struct range rport; } *redirection; struct { int action; struct node_state_opt *options; } keep_state; struct { u_int8_t log; u_int8_t logif; u_int8_t quick; } logquick; struct { int neg; char *name; } tagged; struct pf_poolhashkey *hashkey; struct node_queue *queue; struct node_queue_opt queue_options; struct node_queue_bw queue_bwspec; struct node_qassign qassign; struct filter_opts filter_opts; struct antispoof_opts antispoof_opts; struct queue_opts queue_opts; struct scrub_opts scrub_opts; struct table_opts table_opts; struct pool_opts pool_opts; struct node_hfsc_opts hfsc_opts; struct node_fairq_opts fairq_opts; struct codel_opts codel_opts; } v; int lineno; } YYSTYPE; #define PPORT_RANGE 1 #define PPORT_STAR 2 int parseport(char *, struct range *r, int); #define DYNIF_MULTIADDR(addr) ((addr).type == PF_ADDR_DYNIFTL && \ (!((addr).iflags & PFI_AFLAG_NOALIAS) || \ !isdigit((addr).v.ifname[strlen((addr).v.ifname)-1]))) %} %token PASS BLOCK SCRUB RETURN IN OS OUT LOG QUICK ON FROM TO FLAGS %token RETURNRST RETURNICMP RETURNICMP6 PROTO INET INET6 ALL ANY ICMPTYPE %token ICMP6TYPE CODE KEEP MODULATE STATE PORT RDR NAT BINAT ARROW NODF %token MINTTL ERROR ALLOWOPTS FASTROUTE FILENAME ROUTETO DUPTO REPLYTO NO LABEL %token NOROUTE URPFFAILED FRAGMENT USER GROUP MAXMSS MAXIMUM TTL TOS DROP TABLE %token REASSEMBLE FRAGDROP FRAGCROP ANCHOR NATANCHOR RDRANCHOR BINATANCHOR %token SET OPTIMIZATION TIMEOUT LIMIT LOGINTERFACE BLOCKPOLICY RANDOMID %token REQUIREORDER SYNPROXY FINGERPRINTS NOSYNC DEBUG SKIP HOSTID %token ANTISPOOF FOR INCLUDE %token BITMASK RANDOM SOURCEHASH ROUNDROBIN STATICPORT PROBABILITY %token ALTQ CBQ CODEL PRIQ HFSC FAIRQ BANDWIDTH TBRSIZE LINKSHARE REALTIME %token UPPERLIMIT QUEUE PRIORITY QLIMIT HOGS BUCKETS RTABLE TARGET INTERVAL %token LOAD RULESET_OPTIMIZATION PRIO %token STICKYADDRESS MAXSRCSTATES MAXSRCNODES SOURCETRACK GLOBAL RULE %token MAXSRCCONN MAXSRCCONNRATE OVERLOAD FLUSH SLOPPY %token TAGGED TAG IFBOUND FLOATING STATEPOLICY STATEDEFAULTS ROUTE SETTOS %token DIVERTTO DIVERTREPLY %token STRING %token NUMBER %token PORTBINARY %type interface if_list if_item_not if_item %type number icmptype icmp6type uid gid %type tos not yesno %type probability %type no dir af fragcache optimizer %type sourcetrack flush unaryop statelock %type action nataction natpasslog scrubaction %type flags flag blockspec prio %type portplain portstar portrange %type hashkey %type proto proto_list proto_item %type protoval %type icmpspec %type icmp_list icmp_item %type icmp6_list icmp6_item %type reticmpspec reticmp6spec %type fromto %type ipportspec from to %type ipspec toipspec xhost host dynaddr host_list %type redir_host_list redirspec %type route_host route_host_list routespec %type os xos os_list %type portspec port_list port_item %type uids uid_list uid_item %type gids gid_list gid_item %type route %type redirection redirpool %type label stringall tag anchorname %type string varstring numberstring %type keep %type state_opt_spec state_opt_list state_opt_item %type logquick quick log logopts logopt %type antispoof_ifspc antispoof_iflst antispoof_if %type qname %type qassign qassign_list qassign_item %type scheduler %type cbqflags_list cbqflags_item %type priqflags_list priqflags_item %type hfscopts_list hfscopts_item hfsc_opts %type fairqopts_list fairqopts_item fairq_opts %type codelopts_list codelopts_item codel_opts %type bandwidth %type filter_opts filter_opt filter_opts_l %type filter_sets filter_set filter_sets_l %type antispoof_opts antispoof_opt antispoof_opts_l %type queue_opts queue_opt queue_opts_l %type scrub_opts scrub_opt scrub_opts_l %type table_opts table_opt table_opts_l %type pool_opts pool_opt pool_opts_l %type tagged %type rtable %% ruleset : /* empty */ | ruleset include '\n' | ruleset '\n' | ruleset option '\n' | ruleset scrubrule '\n' | ruleset natrule '\n' | ruleset binatrule '\n' | ruleset pfrule '\n' | ruleset anchorrule '\n' | ruleset loadrule '\n' | ruleset altqif '\n' | ruleset queuespec '\n' | ruleset varset '\n' | ruleset antispoof '\n' | ruleset tabledef '\n' | '{' fakeanchor '}' '\n'; | ruleset error '\n' { file->errors++; } ; include : INCLUDE STRING { struct file *nfile; if ((nfile = pushfile($2, 0)) == NULL) { yyerror("failed to include file %s", $2); free($2); YYERROR; } free($2); file = nfile; lungetc('\n'); } ; /* * apply to previouslys specified rule: must be careful to note * what that is: pf or nat or binat or rdr */ fakeanchor : fakeanchor '\n' | fakeanchor anchorrule '\n' | fakeanchor binatrule '\n' | fakeanchor natrule '\n' | fakeanchor pfrule '\n' | fakeanchor error '\n' ; optimizer : string { if (!strcmp($1, "none")) $$ = 0; else if (!strcmp($1, "basic")) $$ = PF_OPTIMIZE_BASIC; else if (!strcmp($1, "profile")) $$ = PF_OPTIMIZE_BASIC | PF_OPTIMIZE_PROFILE; else { yyerror("unknown ruleset-optimization %s", $1); YYERROR; } } ; option : SET OPTIMIZATION STRING { if (check_rulestate(PFCTL_STATE_OPTION)) { free($3); YYERROR; } if (pfctl_set_optimization(pf, $3) != 0) { yyerror("unknown optimization %s", $3); free($3); YYERROR; } free($3); } | SET RULESET_OPTIMIZATION optimizer { if (!(pf->opts & PF_OPT_OPTIMIZE)) { pf->opts |= PF_OPT_OPTIMIZE; pf->optimize = $3; } } | SET TIMEOUT timeout_spec | SET TIMEOUT '{' optnl timeout_list '}' | SET LIMIT limit_spec | SET LIMIT '{' optnl limit_list '}' | SET LOGINTERFACE stringall { if (check_rulestate(PFCTL_STATE_OPTION)) { free($3); YYERROR; } if (pfctl_set_logif(pf, $3) != 0) { yyerror("error setting loginterface %s", $3); free($3); YYERROR; } free($3); } | SET HOSTID number { if ($3 == 0 || $3 > UINT_MAX) { yyerror("hostid must be non-zero"); YYERROR; } if (pfctl_set_hostid(pf, $3) != 0) { yyerror("error setting hostid %08x", $3); YYERROR; } } | SET BLOCKPOLICY DROP { if (pf->opts & PF_OPT_VERBOSE) printf("set block-policy drop\n"); if (check_rulestate(PFCTL_STATE_OPTION)) YYERROR; blockpolicy = PFRULE_DROP; } | SET BLOCKPOLICY RETURN { if (pf->opts & PF_OPT_VERBOSE) printf("set block-policy return\n"); if (check_rulestate(PFCTL_STATE_OPTION)) YYERROR; blockpolicy = PFRULE_RETURN; } | SET REQUIREORDER yesno { if (pf->opts & PF_OPT_VERBOSE) printf("set require-order %s\n", $3 == 1 ? "yes" : "no"); require_order = $3; } | SET FINGERPRINTS STRING { if (pf->opts & PF_OPT_VERBOSE) printf("set fingerprints \"%s\"\n", $3); if (check_rulestate(PFCTL_STATE_OPTION)) { free($3); YYERROR; } if (!pf->anchor->name[0]) { if (pfctl_file_fingerprints(pf->dev, pf->opts, $3)) { yyerror("error loading " "fingerprints %s", $3); free($3); YYERROR; } } free($3); } | SET STATEPOLICY statelock { if (pf->opts & PF_OPT_VERBOSE) switch ($3) { case 0: printf("set state-policy floating\n"); break; case PFRULE_IFBOUND: printf("set state-policy if-bound\n"); break; } default_statelock = $3; } | SET DEBUG STRING { if (check_rulestate(PFCTL_STATE_OPTION)) { free($3); YYERROR; } if (pfctl_set_debug(pf, $3) != 0) { yyerror("error setting debuglevel %s", $3); free($3); YYERROR; } free($3); } | SET SKIP interface { if (expand_skip_interface($3) != 0) { yyerror("error setting skip interface(s)"); YYERROR; } } | SET STATEDEFAULTS state_opt_list { if (keep_state_defaults != NULL) { yyerror("cannot redefine state-defaults"); YYERROR; } keep_state_defaults = $3; } ; stringall : STRING { $$ = $1; } | ALL { if (($$ = strdup("all")) == NULL) { err(1, "stringall: strdup"); } } ; string : STRING string { if (asprintf(&$$, "%s %s", $1, $2) == -1) err(1, "string: asprintf"); free($1); free($2); } | STRING ; varstring : numberstring varstring { if (asprintf(&$$, "%s %s", $1, $2) == -1) err(1, "string: asprintf"); free($1); free($2); } | numberstring ; numberstring : NUMBER { char *s; if (asprintf(&s, "%lld", (long long)$1) == -1) { yyerror("string: asprintf"); YYERROR; } $$ = s; } | STRING ; varset : STRING '=' varstring { if (pf->opts & PF_OPT_VERBOSE) printf("%s = \"%s\"\n", $1, $3); if (symset($1, $3, 0) == -1) err(1, "cannot store variable %s", $1); free($1); free($3); } ; anchorname : STRING { $$ = $1; } | /* empty */ { $$ = NULL; } ; pfa_anchorlist : /* empty */ | pfa_anchorlist '\n' | pfa_anchorlist pfrule '\n' | pfa_anchorlist anchorrule '\n' ; pfa_anchor : '{' { char ta[PF_ANCHOR_NAME_SIZE]; struct pf_ruleset *rs; /* steping into a brace anchor */ pf->asd++; pf->bn++; pf->brace = 1; /* create a holding ruleset in the root */ snprintf(ta, PF_ANCHOR_NAME_SIZE, "_%d", pf->bn); rs = pf_find_or_create_ruleset(ta); if (rs == NULL) err(1, "pfa_anchor: pf_find_or_create_ruleset"); pf->astack[pf->asd] = rs->anchor; pf->anchor = rs->anchor; } '\n' pfa_anchorlist '}' { pf->alast = pf->anchor; pf->asd--; pf->anchor = pf->astack[pf->asd]; } | /* empty */ ; anchorrule : ANCHOR anchorname dir quick interface af proto fromto filter_opts pfa_anchor { struct pf_rule r; struct node_proto *proto; if (check_rulestate(PFCTL_STATE_FILTER)) { if ($2) free($2); YYERROR; } if ($2 && ($2[0] == '_' || strstr($2, "/_") != NULL)) { free($2); yyerror("anchor names beginning with '_' " "are reserved for internal use"); YYERROR; } memset(&r, 0, sizeof(r)); if (pf->astack[pf->asd + 1]) { /* move inline rules into relative location */ pf_anchor_setup(&r, &pf->astack[pf->asd]->ruleset, $2 ? $2 : pf->alast->name); if (r.anchor == NULL) err(1, "anchorrule: unable to " "create ruleset"); if (pf->alast != r.anchor) { if (r.anchor->match) { yyerror("inline anchor '%s' " "already exists", r.anchor->name); YYERROR; } mv_rules(&pf->alast->ruleset, &r.anchor->ruleset); } pf_remove_if_empty_ruleset(&pf->alast->ruleset); pf->alast = r.anchor; } else { if (!$2) { yyerror("anchors without explicit " "rules must specify a name"); YYERROR; } } r.direction = $3; r.quick = $4.quick; r.af = $6; r.prob = $9.prob; r.rtableid = $9.rtableid; if ($9.tag) if (strlcpy(r.tagname, $9.tag, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } if ($9.match_tag) if (strlcpy(r.match_tagname, $9.match_tag, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } r.match_tag_not = $9.match_tag_not; if (rule_label(&r, $9.label)) YYERROR; free($9.label); r.flags = $9.flags.b1; r.flagset = $9.flags.b2; if (($9.flags.b1 & $9.flags.b2) != $9.flags.b1) { yyerror("flags always false"); YYERROR; } if ($9.flags.b1 || $9.flags.b2 || $8.src_os) { for (proto = $7; proto != NULL && proto->proto != IPPROTO_TCP; proto = proto->next) ; /* nothing */ if (proto == NULL && $7 != NULL) { if ($9.flags.b1 || $9.flags.b2) yyerror( "flags only apply to tcp"); if ($8.src_os) yyerror( "OS fingerprinting only " "applies to tcp"); YYERROR; } } r.tos = $9.tos; if ($9.keep.action) { yyerror("cannot specify state handling " "on anchors"); YYERROR; } if ($9.match_tag) if (strlcpy(r.match_tagname, $9.match_tag, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } r.match_tag_not = $9.match_tag_not; if ($9.marker & FOM_PRIO) { if ($9.prio == 0) r.prio = PF_PRIO_ZERO; else r.prio = $9.prio; } if ($9.marker & FOM_SETPRIO) { r.set_prio[0] = $9.set_prio[0]; r.set_prio[1] = $9.set_prio[1]; r.scrub_flags |= PFSTATE_SETPRIO; } decide_address_family($8.src.host, &r.af); decide_address_family($8.dst.host, &r.af); expand_rule(&r, $5, NULL, $7, $8.src_os, $8.src.host, $8.src.port, $8.dst.host, $8.dst.port, $9.uid, $9.gid, $9.icmpspec, pf->astack[pf->asd + 1] ? pf->alast->name : $2); free($2); pf->astack[pf->asd + 1] = NULL; } | NATANCHOR string interface af proto fromto rtable { struct pf_rule r; if (check_rulestate(PFCTL_STATE_NAT)) { free($2); YYERROR; } memset(&r, 0, sizeof(r)); r.action = PF_NAT; r.af = $4; r.rtableid = $7; decide_address_family($6.src.host, &r.af); decide_address_family($6.dst.host, &r.af); expand_rule(&r, $3, NULL, $5, $6.src_os, $6.src.host, $6.src.port, $6.dst.host, $6.dst.port, 0, 0, 0, $2); free($2); } | RDRANCHOR string interface af proto fromto rtable { struct pf_rule r; if (check_rulestate(PFCTL_STATE_NAT)) { free($2); YYERROR; } memset(&r, 0, sizeof(r)); r.action = PF_RDR; r.af = $4; r.rtableid = $7; decide_address_family($6.src.host, &r.af); decide_address_family($6.dst.host, &r.af); if ($6.src.port != NULL) { yyerror("source port parameter not supported" " in rdr-anchor"); YYERROR; } if ($6.dst.port != NULL) { if ($6.dst.port->next != NULL) { yyerror("destination port list " "expansion not supported in " "rdr-anchor"); YYERROR; } else if ($6.dst.port->op != PF_OP_EQ) { yyerror("destination port operators" " not supported in rdr-anchor"); YYERROR; } r.dst.port[0] = $6.dst.port->port[0]; r.dst.port[1] = $6.dst.port->port[1]; r.dst.port_op = $6.dst.port->op; } expand_rule(&r, $3, NULL, $5, $6.src_os, $6.src.host, $6.src.port, $6.dst.host, $6.dst.port, 0, 0, 0, $2); free($2); } | BINATANCHOR string interface af proto fromto rtable { struct pf_rule r; if (check_rulestate(PFCTL_STATE_NAT)) { free($2); YYERROR; } memset(&r, 0, sizeof(r)); r.action = PF_BINAT; r.af = $4; r.rtableid = $7; if ($5 != NULL) { if ($5->next != NULL) { yyerror("proto list expansion" " not supported in binat-anchor"); YYERROR; } r.proto = $5->proto; free($5); } if ($6.src.host != NULL || $6.src.port != NULL || $6.dst.host != NULL || $6.dst.port != NULL) { yyerror("fromto parameter not supported" " in binat-anchor"); YYERROR; } decide_address_family($6.src.host, &r.af); decide_address_family($6.dst.host, &r.af); pfctl_add_rule(pf, &r, $2); free($2); } ; loadrule : LOAD ANCHOR string FROM string { struct loadanchors *loadanchor; if (strlen(pf->anchor->name) + 1 + strlen($3) >= MAXPATHLEN) { yyerror("anchorname %s too long, max %u\n", $3, MAXPATHLEN - 1); free($3); YYERROR; } loadanchor = calloc(1, sizeof(struct loadanchors)); if (loadanchor == NULL) err(1, "loadrule: calloc"); if ((loadanchor->anchorname = malloc(MAXPATHLEN)) == NULL) err(1, "loadrule: malloc"); if (pf->anchor->name[0]) snprintf(loadanchor->anchorname, MAXPATHLEN, "%s/%s", pf->anchor->name, $3); else strlcpy(loadanchor->anchorname, $3, MAXPATHLEN); if ((loadanchor->filename = strdup($5)) == NULL) err(1, "loadrule: strdup"); TAILQ_INSERT_TAIL(&loadanchorshead, loadanchor, entries); free($3); free($5); }; scrubaction : no SCRUB { $$.b2 = $$.w = 0; if ($1) $$.b1 = PF_NOSCRUB; else $$.b1 = PF_SCRUB; } ; scrubrule : scrubaction dir logquick interface af proto fromto scrub_opts { struct pf_rule r; if (check_rulestate(PFCTL_STATE_SCRUB)) YYERROR; memset(&r, 0, sizeof(r)); r.action = $1.b1; r.direction = $2; r.log = $3.log; r.logif = $3.logif; if ($3.quick) { yyerror("scrub rules do not support 'quick'"); YYERROR; } r.af = $5; if ($8.nodf) r.rule_flag |= PFRULE_NODF; if ($8.randomid) r.rule_flag |= PFRULE_RANDOMID; if ($8.reassemble_tcp) { if (r.direction != PF_INOUT) { yyerror("reassemble tcp rules can not " "specify direction"); YYERROR; } r.rule_flag |= PFRULE_REASSEMBLE_TCP; } if ($8.minttl) r.min_ttl = $8.minttl; if ($8.maxmss) r.max_mss = $8.maxmss; if ($8.marker & SOM_SETTOS) { r.rule_flag |= PFRULE_SET_TOS; r.set_tos = $8.settos; } if ($8.fragcache) r.rule_flag |= $8.fragcache; if ($8.match_tag) if (strlcpy(r.match_tagname, $8.match_tag, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } r.match_tag_not = $8.match_tag_not; r.rtableid = $8.rtableid; expand_rule(&r, $4, NULL, $6, $7.src_os, $7.src.host, $7.src.port, $7.dst.host, $7.dst.port, NULL, NULL, NULL, ""); } ; scrub_opts : { bzero(&scrub_opts, sizeof scrub_opts); scrub_opts.rtableid = -1; } scrub_opts_l { $$ = scrub_opts; } | /* empty */ { bzero(&scrub_opts, sizeof scrub_opts); scrub_opts.rtableid = -1; $$ = scrub_opts; } ; scrub_opts_l : scrub_opts_l scrub_opt | scrub_opt ; scrub_opt : NODF { if (scrub_opts.nodf) { yyerror("no-df cannot be respecified"); YYERROR; } scrub_opts.nodf = 1; } | MINTTL NUMBER { if (scrub_opts.marker & SOM_MINTTL) { yyerror("min-ttl cannot be respecified"); YYERROR; } if ($2 < 0 || $2 > 255) { yyerror("illegal min-ttl value %d", $2); YYERROR; } scrub_opts.marker |= SOM_MINTTL; scrub_opts.minttl = $2; } | MAXMSS NUMBER { if (scrub_opts.marker & SOM_MAXMSS) { yyerror("max-mss cannot be respecified"); YYERROR; } if ($2 < 0 || $2 > 65535) { yyerror("illegal max-mss value %d", $2); YYERROR; } scrub_opts.marker |= SOM_MAXMSS; scrub_opts.maxmss = $2; } | SETTOS tos { if (scrub_opts.marker & SOM_SETTOS) { yyerror("set-tos cannot be respecified"); YYERROR; } scrub_opts.marker |= SOM_SETTOS; scrub_opts.settos = $2; } | fragcache { if (scrub_opts.marker & SOM_FRAGCACHE) { yyerror("fragcache cannot be respecified"); YYERROR; } scrub_opts.marker |= SOM_FRAGCACHE; scrub_opts.fragcache = $1; } | REASSEMBLE STRING { if (strcasecmp($2, "tcp") != 0) { yyerror("scrub reassemble supports only tcp, " "not '%s'", $2); free($2); YYERROR; } free($2); if (scrub_opts.reassemble_tcp) { yyerror("reassemble tcp cannot be respecified"); YYERROR; } scrub_opts.reassemble_tcp = 1; } | RANDOMID { if (scrub_opts.randomid) { yyerror("random-id cannot be respecified"); YYERROR; } scrub_opts.randomid = 1; } | RTABLE NUMBER { if ($2 < 0 || $2 > rt_tableid_max()) { yyerror("invalid rtable id"); YYERROR; } scrub_opts.rtableid = $2; } | not TAGGED string { scrub_opts.match_tag = $3; scrub_opts.match_tag_not = $1; } ; fragcache : FRAGMENT REASSEMBLE { $$ = 0; /* default */ } | FRAGMENT FRAGCROP { $$ = 0; } | FRAGMENT FRAGDROP { $$ = 0; } ; antispoof : ANTISPOOF logquick antispoof_ifspc af antispoof_opts { struct pf_rule r; struct node_host *h = NULL, *hh; struct node_if *i, *j; if (check_rulestate(PFCTL_STATE_FILTER)) YYERROR; for (i = $3; i; i = i->next) { bzero(&r, sizeof(r)); r.action = PF_DROP; r.direction = PF_IN; r.log = $2.log; r.logif = $2.logif; r.quick = $2.quick; r.af = $4; if (rule_label(&r, $5.label)) YYERROR; r.rtableid = $5.rtableid; j = calloc(1, sizeof(struct node_if)); if (j == NULL) err(1, "antispoof: calloc"); if (strlcpy(j->ifname, i->ifname, sizeof(j->ifname)) >= sizeof(j->ifname)) { free(j); yyerror("interface name too long"); YYERROR; } j->not = 1; if (i->dynamic) { h = calloc(1, sizeof(*h)); if (h == NULL) err(1, "address: calloc"); h->addr.type = PF_ADDR_DYNIFTL; set_ipmask(h, 128); if (strlcpy(h->addr.v.ifname, i->ifname, sizeof(h->addr.v.ifname)) >= sizeof(h->addr.v.ifname)) { free(h); yyerror( "interface name too long"); YYERROR; } hh = malloc(sizeof(*hh)); if (hh == NULL) err(1, "address: malloc"); bcopy(h, hh, sizeof(*hh)); h->addr.iflags = PFI_AFLAG_NETWORK; } else { h = ifa_lookup(j->ifname, PFI_AFLAG_NETWORK); hh = NULL; } if (h != NULL) expand_rule(&r, j, NULL, NULL, NULL, h, NULL, NULL, NULL, NULL, NULL, NULL, ""); if ((i->ifa_flags & IFF_LOOPBACK) == 0) { bzero(&r, sizeof(r)); r.action = PF_DROP; r.direction = PF_IN; r.log = $2.log; r.logif = $2.logif; r.quick = $2.quick; r.af = $4; if (rule_label(&r, $5.label)) YYERROR; r.rtableid = $5.rtableid; if (hh != NULL) h = hh; else h = ifa_lookup(i->ifname, 0); if (h != NULL) expand_rule(&r, NULL, NULL, NULL, NULL, h, NULL, NULL, NULL, NULL, NULL, NULL, ""); } else free(hh); } free($5.label); } ; antispoof_ifspc : FOR antispoof_if { $$ = $2; } | FOR '{' optnl antispoof_iflst '}' { $$ = $4; } ; antispoof_iflst : antispoof_if optnl { $$ = $1; } | antispoof_iflst comma antispoof_if optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; antispoof_if : if_item { $$ = $1; } | '(' if_item ')' { $2->dynamic = 1; $$ = $2; } ; antispoof_opts : { bzero(&antispoof_opts, sizeof antispoof_opts); antispoof_opts.rtableid = -1; } antispoof_opts_l { $$ = antispoof_opts; } | /* empty */ { bzero(&antispoof_opts, sizeof antispoof_opts); antispoof_opts.rtableid = -1; $$ = antispoof_opts; } ; antispoof_opts_l : antispoof_opts_l antispoof_opt | antispoof_opt ; antispoof_opt : label { if (antispoof_opts.label) { yyerror("label cannot be redefined"); YYERROR; } antispoof_opts.label = $1; } | RTABLE NUMBER { if ($2 < 0 || $2 > rt_tableid_max()) { yyerror("invalid rtable id"); YYERROR; } antispoof_opts.rtableid = $2; } ; not : '!' { $$ = 1; } | /* empty */ { $$ = 0; } ; tabledef : TABLE '<' STRING '>' table_opts { struct node_host *h, *nh; struct node_tinit *ti, *nti; if (strlen($3) >= PF_TABLE_NAME_SIZE) { yyerror("table name too long, max %d chars", PF_TABLE_NAME_SIZE - 1); free($3); YYERROR; } if (pf->loadopt & PFCTL_FLAG_TABLE) if (process_tabledef($3, &$5)) { free($3); YYERROR; } free($3); for (ti = SIMPLEQ_FIRST(&$5.init_nodes); ti != SIMPLEQ_END(&$5.init_nodes); ti = nti) { if (ti->file) free(ti->file); for (h = ti->host; h != NULL; h = nh) { nh = h->next; free(h); } nti = SIMPLEQ_NEXT(ti, entries); free(ti); } } ; table_opts : { bzero(&table_opts, sizeof table_opts); SIMPLEQ_INIT(&table_opts.init_nodes); } table_opts_l { $$ = table_opts; } | /* empty */ { bzero(&table_opts, sizeof table_opts); SIMPLEQ_INIT(&table_opts.init_nodes); $$ = table_opts; } ; table_opts_l : table_opts_l table_opt | table_opt ; table_opt : STRING { if (!strcmp($1, "const")) table_opts.flags |= PFR_TFLAG_CONST; else if (!strcmp($1, "persist")) table_opts.flags |= PFR_TFLAG_PERSIST; else if (!strcmp($1, "counters")) table_opts.flags |= PFR_TFLAG_COUNTERS; else { yyerror("invalid table option '%s'", $1); free($1); YYERROR; } free($1); } | '{' optnl '}' { table_opts.init_addr = 1; } | '{' optnl host_list '}' { struct node_host *n; struct node_tinit *ti; for (n = $3; n != NULL; n = n->next) { switch (n->addr.type) { case PF_ADDR_ADDRMASK: continue; /* ok */ case PF_ADDR_RANGE: yyerror("address ranges are not " "permitted inside tables"); break; case PF_ADDR_DYNIFTL: yyerror("dynamic addresses are not " "permitted inside tables"); break; case PF_ADDR_TABLE: yyerror("tables cannot contain tables"); break; case PF_ADDR_NOROUTE: yyerror("\"no-route\" is not permitted " "inside tables"); break; case PF_ADDR_URPFFAILED: yyerror("\"urpf-failed\" is not " "permitted inside tables"); break; default: yyerror("unknown address type %d", n->addr.type); } YYERROR; } if (!(ti = calloc(1, sizeof(*ti)))) err(1, "table_opt: calloc"); ti->host = $3; SIMPLEQ_INSERT_TAIL(&table_opts.init_nodes, ti, entries); table_opts.init_addr = 1; } | FILENAME STRING { struct node_tinit *ti; if (!(ti = calloc(1, sizeof(*ti)))) err(1, "table_opt: calloc"); ti->file = $2; SIMPLEQ_INSERT_TAIL(&table_opts.init_nodes, ti, entries); table_opts.init_addr = 1; } ; altqif : ALTQ interface queue_opts QUEUE qassign { struct pf_altq a; if (check_rulestate(PFCTL_STATE_QUEUE)) YYERROR; memset(&a, 0, sizeof(a)); if ($3.scheduler.qtype == ALTQT_NONE) { yyerror("no scheduler specified!"); YYERROR; } a.scheduler = $3.scheduler.qtype; a.qlimit = $3.qlimit; a.tbrsize = $3.tbrsize; if ($5 == NULL && $3.scheduler.qtype != ALTQT_CODEL) { yyerror("no child queues specified"); YYERROR; } if (expand_altq(&a, $2, $5, $3.queue_bwspec, &$3.scheduler)) YYERROR; } ; queuespec : QUEUE STRING interface queue_opts qassign { struct pf_altq a; if (check_rulestate(PFCTL_STATE_QUEUE)) { free($2); YYERROR; } memset(&a, 0, sizeof(a)); if (strlcpy(a.qname, $2, sizeof(a.qname)) >= sizeof(a.qname)) { yyerror("queue name too long (max " "%d chars)", PF_QNAME_SIZE-1); free($2); YYERROR; } free($2); if ($4.tbrsize) { yyerror("cannot specify tbrsize for queue"); YYERROR; } if ($4.priority > 255) { yyerror("priority out of range: max 255"); YYERROR; } a.priority = $4.priority; a.qlimit = $4.qlimit; a.scheduler = $4.scheduler.qtype; if (expand_queue(&a, $3, $5, $4.queue_bwspec, &$4.scheduler)) { yyerror("errors in queue definition"); YYERROR; } } ; queue_opts : { bzero(&queue_opts, sizeof queue_opts); queue_opts.priority = DEFAULT_PRIORITY; queue_opts.qlimit = DEFAULT_QLIMIT; queue_opts.scheduler.qtype = ALTQT_NONE; queue_opts.queue_bwspec.bw_percent = 100; } queue_opts_l { $$ = queue_opts; } | /* empty */ { bzero(&queue_opts, sizeof queue_opts); queue_opts.priority = DEFAULT_PRIORITY; queue_opts.qlimit = DEFAULT_QLIMIT; queue_opts.scheduler.qtype = ALTQT_NONE; queue_opts.queue_bwspec.bw_percent = 100; $$ = queue_opts; } ; queue_opts_l : queue_opts_l queue_opt | queue_opt ; queue_opt : BANDWIDTH bandwidth { if (queue_opts.marker & QOM_BWSPEC) { yyerror("bandwidth cannot be respecified"); YYERROR; } queue_opts.marker |= QOM_BWSPEC; queue_opts.queue_bwspec = $2; } | PRIORITY NUMBER { if (queue_opts.marker & QOM_PRIORITY) { yyerror("priority cannot be respecified"); YYERROR; } if ($2 < 0 || $2 > 255) { yyerror("priority out of range: max 255"); YYERROR; } queue_opts.marker |= QOM_PRIORITY; queue_opts.priority = $2; } | QLIMIT NUMBER { if (queue_opts.marker & QOM_QLIMIT) { yyerror("qlimit cannot be respecified"); YYERROR; } if ($2 < 0 || $2 > 65535) { yyerror("qlimit out of range: max 65535"); YYERROR; } queue_opts.marker |= QOM_QLIMIT; queue_opts.qlimit = $2; } | scheduler { if (queue_opts.marker & QOM_SCHEDULER) { yyerror("scheduler cannot be respecified"); YYERROR; } queue_opts.marker |= QOM_SCHEDULER; queue_opts.scheduler = $1; } | TBRSIZE NUMBER { if (queue_opts.marker & QOM_TBRSIZE) { yyerror("tbrsize cannot be respecified"); YYERROR; } if ($2 < 0 || $2 > 65535) { yyerror("tbrsize too big: max 65535"); YYERROR; } queue_opts.marker |= QOM_TBRSIZE; queue_opts.tbrsize = $2; } ; bandwidth : STRING { double bps; char *cp; $$.bw_percent = 0; bps = strtod($1, &cp); if (cp != NULL) { if (strlen(cp) > 1) { char *cu = cp + 1; if (!strcmp(cu, "Bit") || !strcmp(cu, "B") || !strcmp(cu, "bit") || !strcmp(cu, "b")) { *cu = 0; } } if (!strcmp(cp, "b")) ; /* nothing */ else if (!strcmp(cp, "K")) bps *= 1000; else if (!strcmp(cp, "M")) bps *= 1000 * 1000; else if (!strcmp(cp, "G")) bps *= 1000 * 1000 * 1000; else if (!strcmp(cp, "%")) { if (bps < 0 || bps > 100) { yyerror("bandwidth spec " "out of range"); free($1); YYERROR; } $$.bw_percent = bps; bps = 0; } else { yyerror("unknown unit %s", cp); free($1); YYERROR; } } free($1); $$.bw_absolute = (u_int32_t)bps; } | NUMBER { if ($1 < 0 || $1 > UINT_MAX) { yyerror("bandwidth number too big"); YYERROR; } $$.bw_percent = 0; $$.bw_absolute = $1; } ; scheduler : CBQ { $$.qtype = ALTQT_CBQ; $$.data.cbq_opts.flags = 0; } | CBQ '(' cbqflags_list ')' { $$.qtype = ALTQT_CBQ; $$.data.cbq_opts.flags = $3; } | PRIQ { $$.qtype = ALTQT_PRIQ; $$.data.priq_opts.flags = 0; } | PRIQ '(' priqflags_list ')' { $$.qtype = ALTQT_PRIQ; $$.data.priq_opts.flags = $3; } | HFSC { $$.qtype = ALTQT_HFSC; bzero(&$$.data.hfsc_opts, sizeof(struct node_hfsc_opts)); } | HFSC '(' hfsc_opts ')' { $$.qtype = ALTQT_HFSC; $$.data.hfsc_opts = $3; } | FAIRQ { $$.qtype = ALTQT_FAIRQ; bzero(&$$.data.fairq_opts, sizeof(struct node_fairq_opts)); } | FAIRQ '(' fairq_opts ')' { $$.qtype = ALTQT_FAIRQ; $$.data.fairq_opts = $3; } | CODEL { $$.qtype = ALTQT_CODEL; bzero(&$$.data.codel_opts, sizeof(struct codel_opts)); } | CODEL '(' codel_opts ')' { $$.qtype = ALTQT_CODEL; $$.data.codel_opts = $3; } ; cbqflags_list : cbqflags_item { $$ |= $1; } | cbqflags_list comma cbqflags_item { $$ |= $3; } ; cbqflags_item : STRING { if (!strcmp($1, "default")) $$ = CBQCLF_DEFCLASS; else if (!strcmp($1, "borrow")) $$ = CBQCLF_BORROW; else if (!strcmp($1, "red")) $$ = CBQCLF_RED; else if (!strcmp($1, "ecn")) $$ = CBQCLF_RED|CBQCLF_ECN; else if (!strcmp($1, "rio")) $$ = CBQCLF_RIO; else if (!strcmp($1, "codel")) $$ = CBQCLF_CODEL; else { yyerror("unknown cbq flag \"%s\"", $1); free($1); YYERROR; } free($1); } ; priqflags_list : priqflags_item { $$ |= $1; } | priqflags_list comma priqflags_item { $$ |= $3; } ; priqflags_item : STRING { if (!strcmp($1, "default")) $$ = PRCF_DEFAULTCLASS; else if (!strcmp($1, "red")) $$ = PRCF_RED; else if (!strcmp($1, "ecn")) $$ = PRCF_RED|PRCF_ECN; else if (!strcmp($1, "rio")) $$ = PRCF_RIO; else if (!strcmp($1, "codel")) $$ = PRCF_CODEL; else { yyerror("unknown priq flag \"%s\"", $1); free($1); YYERROR; } free($1); } ; hfsc_opts : { bzero(&hfsc_opts, sizeof(struct node_hfsc_opts)); } hfscopts_list { $$ = hfsc_opts; } ; hfscopts_list : hfscopts_item | hfscopts_list comma hfscopts_item ; hfscopts_item : LINKSHARE bandwidth { if (hfsc_opts.linkshare.used) { yyerror("linkshare already specified"); YYERROR; } hfsc_opts.linkshare.m2 = $2; hfsc_opts.linkshare.used = 1; } | LINKSHARE '(' bandwidth comma NUMBER comma bandwidth ')' { if ($5 < 0 || $5 > INT_MAX) { yyerror("timing in curve out of range"); YYERROR; } if (hfsc_opts.linkshare.used) { yyerror("linkshare already specified"); YYERROR; } hfsc_opts.linkshare.m1 = $3; hfsc_opts.linkshare.d = $5; hfsc_opts.linkshare.m2 = $7; hfsc_opts.linkshare.used = 1; } | REALTIME bandwidth { if (hfsc_opts.realtime.used) { yyerror("realtime already specified"); YYERROR; } hfsc_opts.realtime.m2 = $2; hfsc_opts.realtime.used = 1; } | REALTIME '(' bandwidth comma NUMBER comma bandwidth ')' { if ($5 < 0 || $5 > INT_MAX) { yyerror("timing in curve out of range"); YYERROR; } if (hfsc_opts.realtime.used) { yyerror("realtime already specified"); YYERROR; } hfsc_opts.realtime.m1 = $3; hfsc_opts.realtime.d = $5; hfsc_opts.realtime.m2 = $7; hfsc_opts.realtime.used = 1; } | UPPERLIMIT bandwidth { if (hfsc_opts.upperlimit.used) { yyerror("upperlimit already specified"); YYERROR; } hfsc_opts.upperlimit.m2 = $2; hfsc_opts.upperlimit.used = 1; } | UPPERLIMIT '(' bandwidth comma NUMBER comma bandwidth ')' { if ($5 < 0 || $5 > INT_MAX) { yyerror("timing in curve out of range"); YYERROR; } if (hfsc_opts.upperlimit.used) { yyerror("upperlimit already specified"); YYERROR; } hfsc_opts.upperlimit.m1 = $3; hfsc_opts.upperlimit.d = $5; hfsc_opts.upperlimit.m2 = $7; hfsc_opts.upperlimit.used = 1; } | STRING { if (!strcmp($1, "default")) hfsc_opts.flags |= HFCF_DEFAULTCLASS; else if (!strcmp($1, "red")) hfsc_opts.flags |= HFCF_RED; else if (!strcmp($1, "ecn")) hfsc_opts.flags |= HFCF_RED|HFCF_ECN; else if (!strcmp($1, "rio")) hfsc_opts.flags |= HFCF_RIO; else if (!strcmp($1, "codel")) hfsc_opts.flags |= HFCF_CODEL; else { yyerror("unknown hfsc flag \"%s\"", $1); free($1); YYERROR; } free($1); } ; fairq_opts : { bzero(&fairq_opts, sizeof(struct node_fairq_opts)); } fairqopts_list { $$ = fairq_opts; } ; fairqopts_list : fairqopts_item | fairqopts_list comma fairqopts_item ; fairqopts_item : LINKSHARE bandwidth { if (fairq_opts.linkshare.used) { yyerror("linkshare already specified"); YYERROR; } fairq_opts.linkshare.m2 = $2; fairq_opts.linkshare.used = 1; } | LINKSHARE '(' bandwidth number bandwidth ')' { if (fairq_opts.linkshare.used) { yyerror("linkshare already specified"); YYERROR; } fairq_opts.linkshare.m1 = $3; fairq_opts.linkshare.d = $4; fairq_opts.linkshare.m2 = $5; fairq_opts.linkshare.used = 1; } | HOGS bandwidth { fairq_opts.hogs_bw = $2; } | BUCKETS number { fairq_opts.nbuckets = $2; } | STRING { if (!strcmp($1, "default")) fairq_opts.flags |= FARF_DEFAULTCLASS; else if (!strcmp($1, "red")) fairq_opts.flags |= FARF_RED; else if (!strcmp($1, "ecn")) fairq_opts.flags |= FARF_RED|FARF_ECN; else if (!strcmp($1, "rio")) fairq_opts.flags |= FARF_RIO; else if (!strcmp($1, "codel")) fairq_opts.flags |= FARF_CODEL; else { yyerror("unknown fairq flag \"%s\"", $1); free($1); YYERROR; } free($1); } ; codel_opts : { bzero(&codel_opts, sizeof(struct codel_opts)); } codelopts_list { $$ = codel_opts; } ; codelopts_list : codelopts_item | codelopts_list comma codelopts_item ; codelopts_item : INTERVAL number { if (codel_opts.interval) { yyerror("interval already specified"); YYERROR; } codel_opts.interval = $2; } | TARGET number { if (codel_opts.target) { yyerror("target already specified"); YYERROR; } codel_opts.target = $2; } | STRING { if (!strcmp($1, "ecn")) codel_opts.ecn = 1; else { yyerror("unknown codel option \"%s\"", $1); free($1); YYERROR; } free($1); } ; qassign : /* empty */ { $$ = NULL; } | qassign_item { $$ = $1; } | '{' optnl qassign_list '}' { $$ = $3; } ; qassign_list : qassign_item optnl { $$ = $1; } | qassign_list comma qassign_item optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; qassign_item : STRING { $$ = calloc(1, sizeof(struct node_queue)); if ($$ == NULL) err(1, "qassign_item: calloc"); if (strlcpy($$->queue, $1, sizeof($$->queue)) >= sizeof($$->queue)) { yyerror("queue name '%s' too long (max " "%d chars)", $1, sizeof($$->queue)-1); free($1); free($$); YYERROR; } free($1); $$->next = NULL; $$->tail = $$; } ; pfrule : action dir logquick interface route af proto fromto filter_opts { struct pf_rule r; struct node_state_opt *o; struct node_proto *proto; int srctrack = 0; int statelock = 0; int adaptive = 0; int defaults = 0; if (check_rulestate(PFCTL_STATE_FILTER)) YYERROR; memset(&r, 0, sizeof(r)); r.action = $1.b1; switch ($1.b2) { case PFRULE_RETURNRST: r.rule_flag |= PFRULE_RETURNRST; r.return_ttl = $1.w; break; case PFRULE_RETURNICMP: r.rule_flag |= PFRULE_RETURNICMP; r.return_icmp = $1.w; r.return_icmp6 = $1.w2; break; case PFRULE_RETURN: r.rule_flag |= PFRULE_RETURN; r.return_icmp = $1.w; r.return_icmp6 = $1.w2; break; } r.direction = $2; r.log = $3.log; r.logif = $3.logif; r.quick = $3.quick; r.prob = $9.prob; r.rtableid = $9.rtableid; if ($9.marker & FOM_PRIO) { if ($9.prio == 0) r.prio = PF_PRIO_ZERO; else r.prio = $9.prio; } if ($9.marker & FOM_SETPRIO) { r.set_prio[0] = $9.set_prio[0]; r.set_prio[1] = $9.set_prio[1]; r.scrub_flags |= PFSTATE_SETPRIO; } r.af = $6; if ($9.tag) if (strlcpy(r.tagname, $9.tag, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } if ($9.match_tag) if (strlcpy(r.match_tagname, $9.match_tag, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } r.match_tag_not = $9.match_tag_not; if (rule_label(&r, $9.label)) YYERROR; free($9.label); r.flags = $9.flags.b1; r.flagset = $9.flags.b2; if (($9.flags.b1 & $9.flags.b2) != $9.flags.b1) { yyerror("flags always false"); YYERROR; } if ($9.flags.b1 || $9.flags.b2 || $8.src_os) { for (proto = $7; proto != NULL && proto->proto != IPPROTO_TCP; proto = proto->next) ; /* nothing */ if (proto == NULL && $7 != NULL) { if ($9.flags.b1 || $9.flags.b2) yyerror( "flags only apply to tcp"); if ($8.src_os) yyerror( "OS fingerprinting only " "apply to tcp"); YYERROR; } #if 0 if (($9.flags.b1 & parse_flags("S")) == 0 && $8.src_os) { yyerror("OS fingerprinting requires " "the SYN TCP flag (flags S/SA)"); YYERROR; } #endif } r.tos = $9.tos; r.keep_state = $9.keep.action; o = $9.keep.options; /* 'keep state' by default on pass rules. */ if (!r.keep_state && !r.action && !($9.marker & FOM_KEEP)) { r.keep_state = PF_STATE_NORMAL; o = keep_state_defaults; defaults = 1; } while (o) { struct node_state_opt *p = o; switch (o->type) { case PF_STATE_OPT_MAX: if (r.max_states) { yyerror("state option 'max' " "multiple definitions"); YYERROR; } r.max_states = o->data.max_states; break; case PF_STATE_OPT_NOSYNC: if (r.rule_flag & PFRULE_NOSYNC) { yyerror("state option 'sync' " "multiple definitions"); YYERROR; } r.rule_flag |= PFRULE_NOSYNC; break; case PF_STATE_OPT_SRCTRACK: if (srctrack) { yyerror("state option " "'source-track' " "multiple definitions"); YYERROR; } srctrack = o->data.src_track; r.rule_flag |= PFRULE_SRCTRACK; break; case PF_STATE_OPT_MAX_SRC_STATES: if (r.max_src_states) { yyerror("state option " "'max-src-states' " "multiple definitions"); YYERROR; } if (o->data.max_src_states == 0) { yyerror("'max-src-states' must " "be > 0"); YYERROR; } r.max_src_states = o->data.max_src_states; r.rule_flag |= PFRULE_SRCTRACK; break; case PF_STATE_OPT_OVERLOAD: if (r.overload_tblname[0]) { yyerror("multiple 'overload' " "table definitions"); YYERROR; } if (strlcpy(r.overload_tblname, o->data.overload.tblname, PF_TABLE_NAME_SIZE) >= PF_TABLE_NAME_SIZE) { yyerror("state option: " "strlcpy"); YYERROR; } r.flush = o->data.overload.flush; break; case PF_STATE_OPT_MAX_SRC_CONN: if (r.max_src_conn) { yyerror("state option " "'max-src-conn' " "multiple definitions"); YYERROR; } if (o->data.max_src_conn == 0) { yyerror("'max-src-conn' " "must be > 0"); YYERROR; } r.max_src_conn = o->data.max_src_conn; r.rule_flag |= PFRULE_SRCTRACK | PFRULE_RULESRCTRACK; break; case PF_STATE_OPT_MAX_SRC_CONN_RATE: if (r.max_src_conn_rate.limit) { yyerror("state option " "'max-src-conn-rate' " "multiple definitions"); YYERROR; } if (!o->data.max_src_conn_rate.limit || !o->data.max_src_conn_rate.seconds) { yyerror("'max-src-conn-rate' " "values must be > 0"); YYERROR; } if (o->data.max_src_conn_rate.limit > PF_THRESHOLD_MAX) { yyerror("'max-src-conn-rate' " "maximum rate must be < %u", PF_THRESHOLD_MAX); YYERROR; } r.max_src_conn_rate.limit = o->data.max_src_conn_rate.limit; r.max_src_conn_rate.seconds = o->data.max_src_conn_rate.seconds; r.rule_flag |= PFRULE_SRCTRACK | PFRULE_RULESRCTRACK; break; case PF_STATE_OPT_MAX_SRC_NODES: if (r.max_src_nodes) { yyerror("state option " "'max-src-nodes' " "multiple definitions"); YYERROR; } if (o->data.max_src_nodes == 0) { yyerror("'max-src-nodes' must " "be > 0"); YYERROR; } r.max_src_nodes = o->data.max_src_nodes; r.rule_flag |= PFRULE_SRCTRACK | PFRULE_RULESRCTRACK; break; case PF_STATE_OPT_STATELOCK: if (statelock) { yyerror("state locking option: " "multiple definitions"); YYERROR; } statelock = 1; r.rule_flag |= o->data.statelock; break; case PF_STATE_OPT_SLOPPY: if (r.rule_flag & PFRULE_STATESLOPPY) { yyerror("state sloppy option: " "multiple definitions"); YYERROR; } r.rule_flag |= PFRULE_STATESLOPPY; break; case PF_STATE_OPT_TIMEOUT: if (o->data.timeout.number == PFTM_ADAPTIVE_START || o->data.timeout.number == PFTM_ADAPTIVE_END) adaptive = 1; if (r.timeout[o->data.timeout.number]) { yyerror("state timeout %s " "multiple definitions", pf_timeouts[o->data. timeout.number].name); YYERROR; } r.timeout[o->data.timeout.number] = o->data.timeout.seconds; } o = o->next; if (!defaults) free(p); } /* 'flags S/SA' by default on stateful rules */ if (!r.action && !r.flags && !r.flagset && !$9.fragment && !($9.marker & FOM_FLAGS) && r.keep_state) { r.flags = parse_flags("S"); r.flagset = parse_flags("SA"); } if (!adaptive && r.max_states) { r.timeout[PFTM_ADAPTIVE_START] = (r.max_states / 10) * 6; r.timeout[PFTM_ADAPTIVE_END] = (r.max_states / 10) * 12; } if (r.rule_flag & PFRULE_SRCTRACK) { if (srctrack == PF_SRCTRACK_GLOBAL && r.max_src_nodes) { yyerror("'max-src-nodes' is " "incompatible with " "'source-track global'"); YYERROR; } if (srctrack == PF_SRCTRACK_GLOBAL && r.max_src_conn) { yyerror("'max-src-conn' is " "incompatible with " "'source-track global'"); YYERROR; } if (srctrack == PF_SRCTRACK_GLOBAL && r.max_src_conn_rate.seconds) { yyerror("'max-src-conn-rate' is " "incompatible with " "'source-track global'"); YYERROR; } if (r.timeout[PFTM_SRC_NODE] < r.max_src_conn_rate.seconds) r.timeout[PFTM_SRC_NODE] = r.max_src_conn_rate.seconds; r.rule_flag |= PFRULE_SRCTRACK; if (srctrack == PF_SRCTRACK_RULE) r.rule_flag |= PFRULE_RULESRCTRACK; } if (r.keep_state && !statelock) r.rule_flag |= default_statelock; if ($9.fragment) r.rule_flag |= PFRULE_FRAGMENT; r.allow_opts = $9.allowopts; decide_address_family($8.src.host, &r.af); decide_address_family($8.dst.host, &r.af); if ($5.rt) { if (!r.direction) { yyerror("direction must be explicit " "with rules that specify routing"); YYERROR; } r.rt = $5.rt; r.rpool.opts = $5.pool_opts; if ($5.key != NULL) memcpy(&r.rpool.key, $5.key, sizeof(struct pf_poolhashkey)); } if (r.rt && r.rt != PF_FASTROUTE) { decide_address_family($5.host, &r.af); remove_invalid_hosts(&$5.host, &r.af); if ($5.host == NULL) { yyerror("no routing address with " "matching address family found."); YYERROR; } if ((r.rpool.opts & PF_POOL_TYPEMASK) == PF_POOL_NONE && ($5.host->next != NULL || $5.host->addr.type == PF_ADDR_TABLE || DYNIF_MULTIADDR($5.host->addr))) r.rpool.opts |= PF_POOL_ROUNDROBIN; if ((r.rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_ROUNDROBIN && disallow_table($5.host, "tables are only " "supported in round-robin routing pools")) YYERROR; if ((r.rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_ROUNDROBIN && disallow_alias($5.host, "interface (%s) " "is only supported in round-robin " "routing pools")) YYERROR; if ($5.host->next != NULL) { if ((r.rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_ROUNDROBIN) { yyerror("r.rpool.opts must " "be PF_POOL_ROUNDROBIN"); YYERROR; } } } if ($9.queues.qname != NULL) { if (strlcpy(r.qname, $9.queues.qname, sizeof(r.qname)) >= sizeof(r.qname)) { yyerror("rule qname too long (max " "%d chars)", sizeof(r.qname)-1); YYERROR; } free($9.queues.qname); } if ($9.queues.pqname != NULL) { if (strlcpy(r.pqname, $9.queues.pqname, sizeof(r.pqname)) >= sizeof(r.pqname)) { yyerror("rule pqname too long (max " "%d chars)", sizeof(r.pqname)-1); YYERROR; } free($9.queues.pqname); } #ifdef __FreeBSD__ r.divert.port = $9.divert.port; #else if ((r.divert.port = $9.divert.port)) { if (r.direction == PF_OUT) { if ($9.divert.addr) { yyerror("address specified " "for outgoing divert"); YYERROR; } bzero(&r.divert.addr, sizeof(r.divert.addr)); } else { if (!$9.divert.addr) { yyerror("no address specified " "for incoming divert"); YYERROR; } if ($9.divert.addr->af != r.af) { yyerror("address family " "mismatch for divert"); YYERROR; } r.divert.addr = $9.divert.addr->addr.v.a.addr; } } #endif expand_rule(&r, $4, $5.host, $7, $8.src_os, $8.src.host, $8.src.port, $8.dst.host, $8.dst.port, $9.uid, $9.gid, $9.icmpspec, ""); } ; filter_opts : { bzero(&filter_opts, sizeof filter_opts); filter_opts.rtableid = -1; } filter_opts_l { $$ = filter_opts; } | /* empty */ { bzero(&filter_opts, sizeof filter_opts); filter_opts.rtableid = -1; $$ = filter_opts; } ; filter_opts_l : filter_opts_l filter_opt | filter_opt ; filter_opt : USER uids { if (filter_opts.uid) $2->tail->next = filter_opts.uid; filter_opts.uid = $2; } | GROUP gids { if (filter_opts.gid) $2->tail->next = filter_opts.gid; filter_opts.gid = $2; } | flags { if (filter_opts.marker & FOM_FLAGS) { yyerror("flags cannot be redefined"); YYERROR; } filter_opts.marker |= FOM_FLAGS; filter_opts.flags.b1 |= $1.b1; filter_opts.flags.b2 |= $1.b2; filter_opts.flags.w |= $1.w; filter_opts.flags.w2 |= $1.w2; } | icmpspec { if (filter_opts.marker & FOM_ICMP) { yyerror("icmp-type cannot be redefined"); YYERROR; } filter_opts.marker |= FOM_ICMP; filter_opts.icmpspec = $1; } | PRIO NUMBER { if (filter_opts.marker & FOM_PRIO) { yyerror("prio cannot be redefined"); YYERROR; } if ($2 < 0 || $2 > PF_PRIO_MAX) { yyerror("prio must be 0 - %u", PF_PRIO_MAX); YYERROR; } filter_opts.marker |= FOM_PRIO; filter_opts.prio = $2; } | TOS tos { if (filter_opts.marker & FOM_TOS) { yyerror("tos cannot be redefined"); YYERROR; } filter_opts.marker |= FOM_TOS; filter_opts.tos = $2; } | keep { if (filter_opts.marker & FOM_KEEP) { yyerror("modulate or keep cannot be redefined"); YYERROR; } filter_opts.marker |= FOM_KEEP; filter_opts.keep.action = $1.action; filter_opts.keep.options = $1.options; } | FRAGMENT { filter_opts.fragment = 1; } | ALLOWOPTS { filter_opts.allowopts = 1; } | label { if (filter_opts.label) { yyerror("label cannot be redefined"); YYERROR; } filter_opts.label = $1; } | qname { if (filter_opts.queues.qname) { yyerror("queue cannot be redefined"); YYERROR; } filter_opts.queues = $1; } | TAG string { filter_opts.tag = $2; } | not TAGGED string { filter_opts.match_tag = $3; filter_opts.match_tag_not = $1; } | PROBABILITY probability { double p; p = floor($2 * UINT_MAX + 0.5); if (p < 0.0 || p > UINT_MAX) { yyerror("invalid probability: %lf", p); YYERROR; } filter_opts.prob = (u_int32_t)p; if (filter_opts.prob == 0) filter_opts.prob = 1; } | RTABLE NUMBER { if ($2 < 0 || $2 > rt_tableid_max()) { yyerror("invalid rtable id"); YYERROR; } filter_opts.rtableid = $2; } | DIVERTTO portplain { #ifdef __FreeBSD__ filter_opts.divert.port = $2.a; if (!filter_opts.divert.port) { yyerror("invalid divert port: %u", ntohs($2.a)); YYERROR; } #endif } | DIVERTTO STRING PORT portplain { #ifndef __FreeBSD__ if ((filter_opts.divert.addr = host($2)) == NULL) { yyerror("could not parse divert address: %s", $2); free($2); YYERROR; } #else if ($2) #endif free($2); filter_opts.divert.port = $4.a; if (!filter_opts.divert.port) { yyerror("invalid divert port: %u", ntohs($4.a)); YYERROR; } } | DIVERTREPLY { #ifdef __FreeBSD__ yyerror("divert-reply has no meaning in FreeBSD pf(4)"); YYERROR; #else filter_opts.divert.port = 1; /* some random value */ #endif } | filter_sets ; filter_sets : SET '(' filter_sets_l ')' { $$ = filter_opts; } | SET filter_set { $$ = filter_opts; } ; filter_sets_l : filter_sets_l comma filter_set | filter_set ; filter_set : prio { if (filter_opts.marker & FOM_SETPRIO) { yyerror("prio cannot be redefined"); YYERROR; } filter_opts.marker |= FOM_SETPRIO; filter_opts.set_prio[0] = $1.b1; filter_opts.set_prio[1] = $1.b2; } prio : PRIO NUMBER { if ($2 < 0 || $2 > PF_PRIO_MAX) { yyerror("prio must be 0 - %u", PF_PRIO_MAX); YYERROR; } $$.b1 = $$.b2 = $2; } | PRIO '(' NUMBER comma NUMBER ')' { if ($3 < 0 || $3 > PF_PRIO_MAX || $5 < 0 || $5 > PF_PRIO_MAX) { yyerror("prio must be 0 - %u", PF_PRIO_MAX); YYERROR; } $$.b1 = $3; $$.b2 = $5; } ; probability : STRING { char *e; double p = strtod($1, &e); if (*e == '%') { p *= 0.01; e++; } if (*e) { yyerror("invalid probability: %s", $1); free($1); YYERROR; } free($1); $$ = p; } | NUMBER { $$ = (double)$1; } ; action : PASS { $$.b1 = PF_PASS; $$.b2 = $$.w = 0; } | BLOCK blockspec { $$ = $2; $$.b1 = PF_DROP; } ; blockspec : /* empty */ { $$.b2 = blockpolicy; $$.w = returnicmpdefault; $$.w2 = returnicmp6default; } | DROP { $$.b2 = PFRULE_DROP; $$.w = 0; $$.w2 = 0; } | RETURNRST { $$.b2 = PFRULE_RETURNRST; $$.w = 0; $$.w2 = 0; } | RETURNRST '(' TTL NUMBER ')' { if ($4 < 0 || $4 > 255) { yyerror("illegal ttl value %d", $4); YYERROR; } $$.b2 = PFRULE_RETURNRST; $$.w = $4; $$.w2 = 0; } | RETURNICMP { $$.b2 = PFRULE_RETURNICMP; $$.w = returnicmpdefault; $$.w2 = returnicmp6default; } | RETURNICMP6 { $$.b2 = PFRULE_RETURNICMP; $$.w = returnicmpdefault; $$.w2 = returnicmp6default; } | RETURNICMP '(' reticmpspec ')' { $$.b2 = PFRULE_RETURNICMP; $$.w = $3; $$.w2 = returnicmpdefault; } | RETURNICMP6 '(' reticmp6spec ')' { $$.b2 = PFRULE_RETURNICMP; $$.w = returnicmpdefault; $$.w2 = $3; } | RETURNICMP '(' reticmpspec comma reticmp6spec ')' { $$.b2 = PFRULE_RETURNICMP; $$.w = $3; $$.w2 = $5; } | RETURN { $$.b2 = PFRULE_RETURN; $$.w = returnicmpdefault; $$.w2 = returnicmp6default; } ; reticmpspec : STRING { if (!($$ = parseicmpspec($1, AF_INET))) { free($1); YYERROR; } free($1); } | NUMBER { u_int8_t icmptype; if ($1 < 0 || $1 > 255) { yyerror("invalid icmp code %lu", $1); YYERROR; } icmptype = returnicmpdefault >> 8; $$ = (icmptype << 8 | $1); } ; reticmp6spec : STRING { if (!($$ = parseicmpspec($1, AF_INET6))) { free($1); YYERROR; } free($1); } | NUMBER { u_int8_t icmptype; if ($1 < 0 || $1 > 255) { yyerror("invalid icmp code %lu", $1); YYERROR; } icmptype = returnicmp6default >> 8; $$ = (icmptype << 8 | $1); } ; dir : /* empty */ { $$ = PF_INOUT; } | IN { $$ = PF_IN; } | OUT { $$ = PF_OUT; } ; quick : /* empty */ { $$.quick = 0; } | QUICK { $$.quick = 1; } ; logquick : /* empty */ { $$.log = 0; $$.quick = 0; $$.logif = 0; } | log { $$ = $1; $$.quick = 0; } | QUICK { $$.quick = 1; $$.log = 0; $$.logif = 0; } | log QUICK { $$ = $1; $$.quick = 1; } | QUICK log { $$ = $2; $$.quick = 1; } ; log : LOG { $$.log = PF_LOG; $$.logif = 0; } | LOG '(' logopts ')' { $$.log = PF_LOG | $3.log; $$.logif = $3.logif; } ; logopts : logopt { $$ = $1; } | logopts comma logopt { $$.log = $1.log | $3.log; $$.logif = $3.logif; if ($$.logif == 0) $$.logif = $1.logif; } ; logopt : ALL { $$.log = PF_LOG_ALL; $$.logif = 0; } | USER { $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; } | GROUP { $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; } | TO string { const char *errstr; u_int i; $$.log = 0; if (strncmp($2, "pflog", 5)) { yyerror("%s: should be a pflog interface", $2); free($2); YYERROR; } i = strtonum($2 + 5, 0, 255, &errstr); if (errstr) { yyerror("%s: %s", $2, errstr); free($2); YYERROR; } free($2); $$.logif = i; } ; interface : /* empty */ { $$ = NULL; } | ON if_item_not { $$ = $2; } | ON '{' optnl if_list '}' { $$ = $4; } ; if_list : if_item_not optnl { $$ = $1; } | if_list comma if_item_not optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; if_item_not : not if_item { $$ = $2; $$->not = $1; } ; if_item : STRING { struct node_host *n; $$ = calloc(1, sizeof(struct node_if)); if ($$ == NULL) err(1, "if_item: calloc"); if (strlcpy($$->ifname, $1, sizeof($$->ifname)) >= sizeof($$->ifname)) { free($1); free($$); yyerror("interface name too long"); YYERROR; } if ((n = ifa_exists($1)) != NULL) $$->ifa_flags = n->ifa_flags; free($1); $$->not = 0; $$->next = NULL; $$->tail = $$; } ; af : /* empty */ { $$ = 0; } | INET { $$ = AF_INET; } | INET6 { $$ = AF_INET6; } ; proto : /* empty */ { $$ = NULL; } | PROTO proto_item { $$ = $2; } | PROTO '{' optnl proto_list '}' { $$ = $4; } ; proto_list : proto_item optnl { $$ = $1; } | proto_list comma proto_item optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; proto_item : protoval { u_int8_t pr; pr = (u_int8_t)$1; if (pr == 0) { yyerror("proto 0 cannot be used"); YYERROR; } $$ = calloc(1, sizeof(struct node_proto)); if ($$ == NULL) err(1, "proto_item: calloc"); $$->proto = pr; $$->next = NULL; $$->tail = $$; } ; protoval : STRING { struct protoent *p; p = getprotobyname($1); if (p == NULL) { yyerror("unknown protocol %s", $1); free($1); YYERROR; } $$ = p->p_proto; free($1); } | NUMBER { if ($1 < 0 || $1 > 255) { yyerror("protocol outside range"); YYERROR; } } ; fromto : ALL { $$.src.host = NULL; $$.src.port = NULL; $$.dst.host = NULL; $$.dst.port = NULL; $$.src_os = NULL; } | from os to { $$.src = $1; $$.src_os = $2; $$.dst = $3; } ; os : /* empty */ { $$ = NULL; } | OS xos { $$ = $2; } | OS '{' optnl os_list '}' { $$ = $4; } ; xos : STRING { $$ = calloc(1, sizeof(struct node_os)); if ($$ == NULL) err(1, "os: calloc"); $$->os = $1; $$->tail = $$; } ; os_list : xos optnl { $$ = $1; } | os_list comma xos optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; from : /* empty */ { $$.host = NULL; $$.port = NULL; } | FROM ipportspec { $$ = $2; } ; to : /* empty */ { $$.host = NULL; $$.port = NULL; } | TO ipportspec { if (disallow_urpf_failed($2.host, "\"urpf-failed\" is " "not permitted in a destination address")) YYERROR; $$ = $2; } ; ipportspec : ipspec { $$.host = $1; $$.port = NULL; } | ipspec PORT portspec { $$.host = $1; $$.port = $3; } | PORT portspec { $$.host = NULL; $$.port = $2; } ; optnl : '\n' optnl | ; ipspec : ANY { $$ = NULL; } | xhost { $$ = $1; } | '{' optnl host_list '}' { $$ = $3; } ; toipspec : TO ipspec { $$ = $2; } | /* empty */ { $$ = NULL; } ; host_list : ipspec optnl { $$ = $1; } | host_list comma ipspec optnl { if ($3 == NULL) $$ = $1; else if ($1 == NULL) $$ = $3; else { $1->tail->next = $3; $1->tail = $3->tail; $$ = $1; } } ; xhost : not host { struct node_host *n; for (n = $2; n != NULL; n = n->next) n->not = $1; $$ = $2; } | not NOROUTE { $$ = calloc(1, sizeof(struct node_host)); if ($$ == NULL) err(1, "xhost: calloc"); $$->addr.type = PF_ADDR_NOROUTE; $$->next = NULL; $$->not = $1; $$->tail = $$; } | not URPFFAILED { $$ = calloc(1, sizeof(struct node_host)); if ($$ == NULL) err(1, "xhost: calloc"); $$->addr.type = PF_ADDR_URPFFAILED; $$->next = NULL; $$->not = $1; $$->tail = $$; } ; host : STRING { if (($$ = host($1)) == NULL) { /* error. "any" is handled elsewhere */ free($1); yyerror("could not parse host specification"); YYERROR; } free($1); } | STRING '-' STRING { struct node_host *b, *e; if ((b = host($1)) == NULL || (e = host($3)) == NULL) { free($1); free($3); yyerror("could not parse host specification"); YYERROR; } if (b->af != e->af || b->addr.type != PF_ADDR_ADDRMASK || e->addr.type != PF_ADDR_ADDRMASK || unmask(&b->addr.v.a.mask, b->af) != (b->af == AF_INET ? 32 : 128) || unmask(&e->addr.v.a.mask, e->af) != (e->af == AF_INET ? 32 : 128) || b->next != NULL || b->not || e->next != NULL || e->not) { free(b); free(e); free($1); free($3); yyerror("invalid address range"); YYERROR; } memcpy(&b->addr.v.a.mask, &e->addr.v.a.addr, sizeof(b->addr.v.a.mask)); b->addr.type = PF_ADDR_RANGE; $$ = b; free(e); free($1); free($3); } | STRING '/' NUMBER { char *buf; if (asprintf(&buf, "%s/%lld", $1, (long long)$3) == -1) err(1, "host: asprintf"); free($1); if (($$ = host(buf)) == NULL) { /* error. "any" is handled elsewhere */ free(buf); yyerror("could not parse host specification"); YYERROR; } free(buf); } | NUMBER '/' NUMBER { char *buf; /* ie. for 10/8 parsing */ #ifdef __FreeBSD__ if (asprintf(&buf, "%lld/%lld", (long long)$1, (long long)$3) == -1) #else if (asprintf(&buf, "%lld/%lld", $1, $3) == -1) #endif err(1, "host: asprintf"); if (($$ = host(buf)) == NULL) { /* error. "any" is handled elsewhere */ free(buf); yyerror("could not parse host specification"); YYERROR; } free(buf); } | dynaddr | dynaddr '/' NUMBER { struct node_host *n; if ($3 < 0 || $3 > 128) { yyerror("bit number too big"); YYERROR; } $$ = $1; for (n = $1; n != NULL; n = n->next) set_ipmask(n, $3); } | '<' STRING '>' { if (strlen($2) >= PF_TABLE_NAME_SIZE) { yyerror("table name '%s' too long", $2); free($2); YYERROR; } $$ = calloc(1, sizeof(struct node_host)); if ($$ == NULL) err(1, "host: calloc"); $$->addr.type = PF_ADDR_TABLE; if (strlcpy($$->addr.v.tblname, $2, sizeof($$->addr.v.tblname)) >= sizeof($$->addr.v.tblname)) errx(1, "host: strlcpy"); free($2); $$->next = NULL; $$->tail = $$; } ; number : NUMBER | STRING { u_long ulval; if (atoul($1, &ulval) == -1) { yyerror("%s is not a number", $1); free($1); YYERROR; } else $$ = ulval; free($1); } ; dynaddr : '(' STRING ')' { int flags = 0; char *p, *op; op = $2; if (!isalpha(op[0])) { yyerror("invalid interface name '%s'", op); free(op); YYERROR; } while ((p = strrchr($2, ':')) != NULL) { if (!strcmp(p+1, "network")) flags |= PFI_AFLAG_NETWORK; else if (!strcmp(p+1, "broadcast")) flags |= PFI_AFLAG_BROADCAST; else if (!strcmp(p+1, "peer")) flags |= PFI_AFLAG_PEER; else if (!strcmp(p+1, "0")) flags |= PFI_AFLAG_NOALIAS; else { yyerror("interface %s has bad modifier", $2); free(op); YYERROR; } *p = '\0'; } if (flags & (flags - 1) & PFI_AFLAG_MODEMASK) { free(op); yyerror("illegal combination of " "interface modifiers"); YYERROR; } $$ = calloc(1, sizeof(struct node_host)); if ($$ == NULL) err(1, "address: calloc"); $$->af = 0; set_ipmask($$, 128); $$->addr.type = PF_ADDR_DYNIFTL; $$->addr.iflags = flags; if (strlcpy($$->addr.v.ifname, $2, sizeof($$->addr.v.ifname)) >= sizeof($$->addr.v.ifname)) { free(op); free($$); yyerror("interface name too long"); YYERROR; } free(op); $$->next = NULL; $$->tail = $$; } ; portspec : port_item { $$ = $1; } | '{' optnl port_list '}' { $$ = $3; } ; port_list : port_item optnl { $$ = $1; } | port_list comma port_item optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; port_item : portrange { $$ = calloc(1, sizeof(struct node_port)); if ($$ == NULL) err(1, "port_item: calloc"); $$->port[0] = $1.a; $$->port[1] = $1.b; if ($1.t) $$->op = PF_OP_RRG; else $$->op = PF_OP_EQ; $$->next = NULL; $$->tail = $$; } | unaryop portrange { if ($2.t) { yyerror("':' cannot be used with an other " "port operator"); YYERROR; } $$ = calloc(1, sizeof(struct node_port)); if ($$ == NULL) err(1, "port_item: calloc"); $$->port[0] = $2.a; $$->port[1] = $2.b; $$->op = $1; $$->next = NULL; $$->tail = $$; } | portrange PORTBINARY portrange { if ($1.t || $3.t) { yyerror("':' cannot be used with an other " "port operator"); YYERROR; } $$ = calloc(1, sizeof(struct node_port)); if ($$ == NULL) err(1, "port_item: calloc"); $$->port[0] = $1.a; $$->port[1] = $3.a; $$->op = $2; $$->next = NULL; $$->tail = $$; } ; portplain : numberstring { if (parseport($1, &$$, 0) == -1) { free($1); YYERROR; } free($1); } ; portrange : numberstring { if (parseport($1, &$$, PPORT_RANGE) == -1) { free($1); YYERROR; } free($1); } ; uids : uid_item { $$ = $1; } | '{' optnl uid_list '}' { $$ = $3; } ; uid_list : uid_item optnl { $$ = $1; } | uid_list comma uid_item optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; uid_item : uid { $$ = calloc(1, sizeof(struct node_uid)); if ($$ == NULL) err(1, "uid_item: calloc"); $$->uid[0] = $1; $$->uid[1] = $1; $$->op = PF_OP_EQ; $$->next = NULL; $$->tail = $$; } | unaryop uid { if ($2 == UID_MAX && $1 != PF_OP_EQ && $1 != PF_OP_NE) { yyerror("user unknown requires operator = or " "!="); YYERROR; } $$ = calloc(1, sizeof(struct node_uid)); if ($$ == NULL) err(1, "uid_item: calloc"); $$->uid[0] = $2; $$->uid[1] = $2; $$->op = $1; $$->next = NULL; $$->tail = $$; } | uid PORTBINARY uid { if ($1 == UID_MAX || $3 == UID_MAX) { yyerror("user unknown requires operator = or " "!="); YYERROR; } $$ = calloc(1, sizeof(struct node_uid)); if ($$ == NULL) err(1, "uid_item: calloc"); $$->uid[0] = $1; $$->uid[1] = $3; $$->op = $2; $$->next = NULL; $$->tail = $$; } ; uid : STRING { if (!strcmp($1, "unknown")) $$ = UID_MAX; else { struct passwd *pw; if ((pw = getpwnam($1)) == NULL) { yyerror("unknown user %s", $1); free($1); YYERROR; } $$ = pw->pw_uid; } free($1); } | NUMBER { if ($1 < 0 || $1 >= UID_MAX) { yyerror("illegal uid value %lu", $1); YYERROR; } $$ = $1; } ; gids : gid_item { $$ = $1; } | '{' optnl gid_list '}' { $$ = $3; } ; gid_list : gid_item optnl { $$ = $1; } | gid_list comma gid_item optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; gid_item : gid { $$ = calloc(1, sizeof(struct node_gid)); if ($$ == NULL) err(1, "gid_item: calloc"); $$->gid[0] = $1; $$->gid[1] = $1; $$->op = PF_OP_EQ; $$->next = NULL; $$->tail = $$; } | unaryop gid { if ($2 == GID_MAX && $1 != PF_OP_EQ && $1 != PF_OP_NE) { yyerror("group unknown requires operator = or " "!="); YYERROR; } $$ = calloc(1, sizeof(struct node_gid)); if ($$ == NULL) err(1, "gid_item: calloc"); $$->gid[0] = $2; $$->gid[1] = $2; $$->op = $1; $$->next = NULL; $$->tail = $$; } | gid PORTBINARY gid { if ($1 == GID_MAX || $3 == GID_MAX) { yyerror("group unknown requires operator = or " "!="); YYERROR; } $$ = calloc(1, sizeof(struct node_gid)); if ($$ == NULL) err(1, "gid_item: calloc"); $$->gid[0] = $1; $$->gid[1] = $3; $$->op = $2; $$->next = NULL; $$->tail = $$; } ; gid : STRING { if (!strcmp($1, "unknown")) $$ = GID_MAX; else { struct group *grp; if ((grp = getgrnam($1)) == NULL) { yyerror("unknown group %s", $1); free($1); YYERROR; } $$ = grp->gr_gid; } free($1); } | NUMBER { if ($1 < 0 || $1 >= GID_MAX) { yyerror("illegal gid value %lu", $1); YYERROR; } $$ = $1; } ; flag : STRING { int f; if ((f = parse_flags($1)) < 0) { yyerror("bad flags %s", $1); free($1); YYERROR; } free($1); $$.b1 = f; } ; flags : FLAGS flag '/' flag { $$.b1 = $2.b1; $$.b2 = $4.b1; } | FLAGS '/' flag { $$.b1 = 0; $$.b2 = $3.b1; } | FLAGS ANY { $$.b1 = 0; $$.b2 = 0; } ; icmpspec : ICMPTYPE icmp_item { $$ = $2; } | ICMPTYPE '{' optnl icmp_list '}' { $$ = $4; } | ICMP6TYPE icmp6_item { $$ = $2; } | ICMP6TYPE '{' optnl icmp6_list '}' { $$ = $4; } ; icmp_list : icmp_item optnl { $$ = $1; } | icmp_list comma icmp_item optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; icmp6_list : icmp6_item optnl { $$ = $1; } | icmp6_list comma icmp6_item optnl { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; icmp_item : icmptype { $$ = calloc(1, sizeof(struct node_icmp)); if ($$ == NULL) err(1, "icmp_item: calloc"); $$->type = $1; $$->code = 0; $$->proto = IPPROTO_ICMP; $$->next = NULL; $$->tail = $$; } | icmptype CODE STRING { const struct icmpcodeent *p; if ((p = geticmpcodebyname($1-1, $3, AF_INET)) == NULL) { yyerror("unknown icmp-code %s", $3); free($3); YYERROR; } free($3); $$ = calloc(1, sizeof(struct node_icmp)); if ($$ == NULL) err(1, "icmp_item: calloc"); $$->type = $1; $$->code = p->code + 1; $$->proto = IPPROTO_ICMP; $$->next = NULL; $$->tail = $$; } | icmptype CODE NUMBER { if ($3 < 0 || $3 > 255) { yyerror("illegal icmp-code %lu", $3); YYERROR; } $$ = calloc(1, sizeof(struct node_icmp)); if ($$ == NULL) err(1, "icmp_item: calloc"); $$->type = $1; $$->code = $3 + 1; $$->proto = IPPROTO_ICMP; $$->next = NULL; $$->tail = $$; } ; icmp6_item : icmp6type { $$ = calloc(1, sizeof(struct node_icmp)); if ($$ == NULL) err(1, "icmp_item: calloc"); $$->type = $1; $$->code = 0; $$->proto = IPPROTO_ICMPV6; $$->next = NULL; $$->tail = $$; } | icmp6type CODE STRING { const struct icmpcodeent *p; if ((p = geticmpcodebyname($1-1, $3, AF_INET6)) == NULL) { yyerror("unknown icmp6-code %s", $3); free($3); YYERROR; } free($3); $$ = calloc(1, sizeof(struct node_icmp)); if ($$ == NULL) err(1, "icmp_item: calloc"); $$->type = $1; $$->code = p->code + 1; $$->proto = IPPROTO_ICMPV6; $$->next = NULL; $$->tail = $$; } | icmp6type CODE NUMBER { if ($3 < 0 || $3 > 255) { yyerror("illegal icmp-code %lu", $3); YYERROR; } $$ = calloc(1, sizeof(struct node_icmp)); if ($$ == NULL) err(1, "icmp_item: calloc"); $$->type = $1; $$->code = $3 + 1; $$->proto = IPPROTO_ICMPV6; $$->next = NULL; $$->tail = $$; } ; icmptype : STRING { const struct icmptypeent *p; if ((p = geticmptypebyname($1, AF_INET)) == NULL) { yyerror("unknown icmp-type %s", $1); free($1); YYERROR; } $$ = p->type + 1; free($1); } | NUMBER { if ($1 < 0 || $1 > 255) { yyerror("illegal icmp-type %lu", $1); YYERROR; } $$ = $1 + 1; } ; icmp6type : STRING { const struct icmptypeent *p; if ((p = geticmptypebyname($1, AF_INET6)) == NULL) { yyerror("unknown icmp6-type %s", $1); free($1); YYERROR; } $$ = p->type + 1; free($1); } | NUMBER { if ($1 < 0 || $1 > 255) { yyerror("illegal icmp6-type %lu", $1); YYERROR; } $$ = $1 + 1; } ; tos : STRING { if (!strcmp($1, "lowdelay")) $$ = IPTOS_LOWDELAY; else if (!strcmp($1, "throughput")) $$ = IPTOS_THROUGHPUT; else if (!strcmp($1, "reliability")) $$ = IPTOS_RELIABILITY; else if ($1[0] == '0' && $1[1] == 'x') $$ = strtoul($1, NULL, 16); else $$ = 256; /* flag bad argument */ if ($$ < 0 || $$ > 255) { yyerror("illegal tos value %s", $1); free($1); YYERROR; } free($1); } | NUMBER { $$ = $1; if ($$ < 0 || $$ > 255) { yyerror("illegal tos value %s", $1); YYERROR; } } ; sourcetrack : SOURCETRACK { $$ = PF_SRCTRACK; } | SOURCETRACK GLOBAL { $$ = PF_SRCTRACK_GLOBAL; } | SOURCETRACK RULE { $$ = PF_SRCTRACK_RULE; } ; statelock : IFBOUND { $$ = PFRULE_IFBOUND; } | FLOATING { $$ = 0; } ; keep : NO STATE { $$.action = 0; $$.options = NULL; } | KEEP STATE state_opt_spec { $$.action = PF_STATE_NORMAL; $$.options = $3; } | MODULATE STATE state_opt_spec { $$.action = PF_STATE_MODULATE; $$.options = $3; } | SYNPROXY STATE state_opt_spec { $$.action = PF_STATE_SYNPROXY; $$.options = $3; } ; flush : /* empty */ { $$ = 0; } | FLUSH { $$ = PF_FLUSH; } | FLUSH GLOBAL { $$ = PF_FLUSH | PF_FLUSH_GLOBAL; } ; state_opt_spec : '(' state_opt_list ')' { $$ = $2; } | /* empty */ { $$ = NULL; } ; state_opt_list : state_opt_item { $$ = $1; } | state_opt_list comma state_opt_item { $1->tail->next = $3; $1->tail = $3; $$ = $1; } ; state_opt_item : MAXIMUM NUMBER { if ($2 < 0 || $2 > UINT_MAX) { yyerror("only positive values permitted"); YYERROR; } $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_MAX; $$->data.max_states = $2; $$->next = NULL; $$->tail = $$; } | NOSYNC { $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_NOSYNC; $$->next = NULL; $$->tail = $$; } | MAXSRCSTATES NUMBER { if ($2 < 0 || $2 > UINT_MAX) { yyerror("only positive values permitted"); YYERROR; } $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_MAX_SRC_STATES; $$->data.max_src_states = $2; $$->next = NULL; $$->tail = $$; } | MAXSRCCONN NUMBER { if ($2 < 0 || $2 > UINT_MAX) { yyerror("only positive values permitted"); YYERROR; } $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_MAX_SRC_CONN; $$->data.max_src_conn = $2; $$->next = NULL; $$->tail = $$; } | MAXSRCCONNRATE NUMBER '/' NUMBER { if ($2 < 0 || $2 > UINT_MAX || $4 < 0 || $4 > UINT_MAX) { yyerror("only positive values permitted"); YYERROR; } $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_MAX_SRC_CONN_RATE; $$->data.max_src_conn_rate.limit = $2; $$->data.max_src_conn_rate.seconds = $4; $$->next = NULL; $$->tail = $$; } | OVERLOAD '<' STRING '>' flush { if (strlen($3) >= PF_TABLE_NAME_SIZE) { yyerror("table name '%s' too long", $3); free($3); YYERROR; } $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); if (strlcpy($$->data.overload.tblname, $3, PF_TABLE_NAME_SIZE) >= PF_TABLE_NAME_SIZE) errx(1, "state_opt_item: strlcpy"); free($3); $$->type = PF_STATE_OPT_OVERLOAD; $$->data.overload.flush = $5; $$->next = NULL; $$->tail = $$; } | MAXSRCNODES NUMBER { if ($2 < 0 || $2 > UINT_MAX) { yyerror("only positive values permitted"); YYERROR; } $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_MAX_SRC_NODES; $$->data.max_src_nodes = $2; $$->next = NULL; $$->tail = $$; } | sourcetrack { $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_SRCTRACK; $$->data.src_track = $1; $$->next = NULL; $$->tail = $$; } | statelock { $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_STATELOCK; $$->data.statelock = $1; $$->next = NULL; $$->tail = $$; } | SLOPPY { $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_SLOPPY; $$->next = NULL; $$->tail = $$; } | STRING NUMBER { int i; if ($2 < 0 || $2 > UINT_MAX) { yyerror("only positive values permitted"); YYERROR; } for (i = 0; pf_timeouts[i].name && strcmp(pf_timeouts[i].name, $1); ++i) ; /* nothing */ if (!pf_timeouts[i].name) { yyerror("illegal timeout name %s", $1); free($1); YYERROR; } if (strchr(pf_timeouts[i].name, '.') == NULL) { yyerror("illegal state timeout %s", $1); free($1); YYERROR; } free($1); $$ = calloc(1, sizeof(struct node_state_opt)); if ($$ == NULL) err(1, "state_opt_item: calloc"); $$->type = PF_STATE_OPT_TIMEOUT; $$->data.timeout.number = pf_timeouts[i].timeout; $$->data.timeout.seconds = $2; $$->next = NULL; $$->tail = $$; } ; label : LABEL STRING { $$ = $2; } ; qname : QUEUE STRING { $$.qname = $2; $$.pqname = NULL; } | QUEUE '(' STRING ')' { $$.qname = $3; $$.pqname = NULL; } | QUEUE '(' STRING comma STRING ')' { $$.qname = $3; $$.pqname = $5; } ; no : /* empty */ { $$ = 0; } | NO { $$ = 1; } ; portstar : numberstring { if (parseport($1, &$$, PPORT_RANGE|PPORT_STAR) == -1) { free($1); YYERROR; } free($1); } ; redirspec : host { $$ = $1; } | '{' optnl redir_host_list '}' { $$ = $3; } ; redir_host_list : host optnl { $$ = $1; } | redir_host_list comma host optnl { $1->tail->next = $3; $1->tail = $3->tail; $$ = $1; } ; redirpool : /* empty */ { $$ = NULL; } | ARROW redirspec { $$ = calloc(1, sizeof(struct redirection)); if ($$ == NULL) err(1, "redirection: calloc"); $$->host = $2; $$->rport.a = $$->rport.b = $$->rport.t = 0; } | ARROW redirspec PORT portstar { $$ = calloc(1, sizeof(struct redirection)); if ($$ == NULL) err(1, "redirection: calloc"); $$->host = $2; $$->rport = $4; } ; hashkey : /* empty */ { $$ = calloc(1, sizeof(struct pf_poolhashkey)); if ($$ == NULL) err(1, "hashkey: calloc"); $$->key32[0] = arc4random(); $$->key32[1] = arc4random(); $$->key32[2] = arc4random(); $$->key32[3] = arc4random(); } | string { if (!strncmp($1, "0x", 2)) { if (strlen($1) != 34) { free($1); yyerror("hex key must be 128 bits " "(32 hex digits) long"); YYERROR; } $$ = calloc(1, sizeof(struct pf_poolhashkey)); if ($$ == NULL) err(1, "hashkey: calloc"); if (sscanf($1, "0x%8x%8x%8x%8x", &$$->key32[0], &$$->key32[1], &$$->key32[2], &$$->key32[3]) != 4) { free($$); free($1); yyerror("invalid hex key"); YYERROR; } } else { MD5_CTX context; $$ = calloc(1, sizeof(struct pf_poolhashkey)); if ($$ == NULL) err(1, "hashkey: calloc"); MD5Init(&context); MD5Update(&context, (unsigned char *)$1, strlen($1)); MD5Final((unsigned char *)$$, &context); HTONL($$->key32[0]); HTONL($$->key32[1]); HTONL($$->key32[2]); HTONL($$->key32[3]); } free($1); } ; pool_opts : { bzero(&pool_opts, sizeof pool_opts); } pool_opts_l { $$ = pool_opts; } | /* empty */ { bzero(&pool_opts, sizeof pool_opts); $$ = pool_opts; } ; pool_opts_l : pool_opts_l pool_opt | pool_opt ; pool_opt : BITMASK { if (pool_opts.type) { yyerror("pool type cannot be redefined"); YYERROR; } pool_opts.type = PF_POOL_BITMASK; } | RANDOM { if (pool_opts.type) { yyerror("pool type cannot be redefined"); YYERROR; } pool_opts.type = PF_POOL_RANDOM; } | SOURCEHASH hashkey { if (pool_opts.type) { yyerror("pool type cannot be redefined"); YYERROR; } pool_opts.type = PF_POOL_SRCHASH; pool_opts.key = $2; } | ROUNDROBIN { if (pool_opts.type) { yyerror("pool type cannot be redefined"); YYERROR; } pool_opts.type = PF_POOL_ROUNDROBIN; } | STATICPORT { if (pool_opts.staticport) { yyerror("static-port cannot be redefined"); YYERROR; } pool_opts.staticport = 1; } | STICKYADDRESS { if (filter_opts.marker & POM_STICKYADDRESS) { yyerror("sticky-address cannot be redefined"); YYERROR; } pool_opts.marker |= POM_STICKYADDRESS; pool_opts.opts |= PF_POOL_STICKYADDR; } ; redirection : /* empty */ { $$ = NULL; } | ARROW host { $$ = calloc(1, sizeof(struct redirection)); if ($$ == NULL) err(1, "redirection: calloc"); $$->host = $2; $$->rport.a = $$->rport.b = $$->rport.t = 0; } | ARROW host PORT portstar { $$ = calloc(1, sizeof(struct redirection)); if ($$ == NULL) err(1, "redirection: calloc"); $$->host = $2; $$->rport = $4; } ; natpasslog : /* empty */ { $$.b1 = $$.b2 = 0; $$.w2 = 0; } | PASS { $$.b1 = 1; $$.b2 = 0; $$.w2 = 0; } | PASS log { $$.b1 = 1; $$.b2 = $2.log; $$.w2 = $2.logif; } | log { $$.b1 = 0; $$.b2 = $1.log; $$.w2 = $1.logif; } ; nataction : no NAT natpasslog { if ($1 && $3.b1) { yyerror("\"pass\" not valid with \"no\""); YYERROR; } if ($1) $$.b1 = PF_NONAT; else $$.b1 = PF_NAT; $$.b2 = $3.b1; $$.w = $3.b2; $$.w2 = $3.w2; } | no RDR natpasslog { if ($1 && $3.b1) { yyerror("\"pass\" not valid with \"no\""); YYERROR; } if ($1) $$.b1 = PF_NORDR; else $$.b1 = PF_RDR; $$.b2 = $3.b1; $$.w = $3.b2; $$.w2 = $3.w2; } ; natrule : nataction interface af proto fromto tag tagged rtable redirpool pool_opts { struct pf_rule r; if (check_rulestate(PFCTL_STATE_NAT)) YYERROR; memset(&r, 0, sizeof(r)); r.action = $1.b1; r.natpass = $1.b2; r.log = $1.w; r.logif = $1.w2; r.af = $3; if (!r.af) { if ($5.src.host && $5.src.host->af && !$5.src.host->ifindex) r.af = $5.src.host->af; else if ($5.dst.host && $5.dst.host->af && !$5.dst.host->ifindex) r.af = $5.dst.host->af; } if ($6 != NULL) if (strlcpy(r.tagname, $6, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } if ($7.name) if (strlcpy(r.match_tagname, $7.name, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } r.match_tag_not = $7.neg; r.rtableid = $8; if (r.action == PF_NONAT || r.action == PF_NORDR) { if ($9 != NULL) { yyerror("translation rule with 'no' " "does not need '->'"); YYERROR; } } else { if ($9 == NULL || $9->host == NULL) { yyerror("translation rule requires '-> " "address'"); YYERROR; } if (!r.af && ! $9->host->ifindex) r.af = $9->host->af; remove_invalid_hosts(&$9->host, &r.af); if (invalid_redirect($9->host, r.af)) YYERROR; if (check_netmask($9->host, r.af)) YYERROR; r.rpool.proxy_port[0] = ntohs($9->rport.a); switch (r.action) { case PF_RDR: if (!$9->rport.b && $9->rport.t && $5.dst.port != NULL) { r.rpool.proxy_port[1] = ntohs($9->rport.a) + (ntohs( $5.dst.port->port[1]) - ntohs( $5.dst.port->port[0])); } else r.rpool.proxy_port[1] = ntohs($9->rport.b); break; case PF_NAT: r.rpool.proxy_port[1] = ntohs($9->rport.b); if (!r.rpool.proxy_port[0] && !r.rpool.proxy_port[1]) { r.rpool.proxy_port[0] = PF_NAT_PROXY_PORT_LOW; r.rpool.proxy_port[1] = PF_NAT_PROXY_PORT_HIGH; } else if (!r.rpool.proxy_port[1]) r.rpool.proxy_port[1] = r.rpool.proxy_port[0]; break; default: break; } r.rpool.opts = $10.type; if ((r.rpool.opts & PF_POOL_TYPEMASK) == PF_POOL_NONE && ($9->host->next != NULL || $9->host->addr.type == PF_ADDR_TABLE || DYNIF_MULTIADDR($9->host->addr))) r.rpool.opts = PF_POOL_ROUNDROBIN; if ((r.rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_ROUNDROBIN && disallow_table($9->host, "tables are only " "supported in round-robin redirection " "pools")) YYERROR; if ((r.rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_ROUNDROBIN && disallow_alias($9->host, "interface (%s) " "is only supported in round-robin " "redirection pools")) YYERROR; if ($9->host->next != NULL) { if ((r.rpool.opts & PF_POOL_TYPEMASK) != PF_POOL_ROUNDROBIN) { yyerror("only round-robin " "valid for multiple " "redirection addresses"); YYERROR; } } } if ($10.key != NULL) memcpy(&r.rpool.key, $10.key, sizeof(struct pf_poolhashkey)); if ($10.opts) r.rpool.opts |= $10.opts; if ($10.staticport) { if (r.action != PF_NAT) { yyerror("the 'static-port' option is " "only valid with nat rules"); YYERROR; } if (r.rpool.proxy_port[0] != PF_NAT_PROXY_PORT_LOW && r.rpool.proxy_port[1] != PF_NAT_PROXY_PORT_HIGH) { yyerror("the 'static-port' option can't" " be used when specifying a port" " range"); YYERROR; } r.rpool.proxy_port[0] = 0; r.rpool.proxy_port[1] = 0; } expand_rule(&r, $2, $9 == NULL ? NULL : $9->host, $4, $5.src_os, $5.src.host, $5.src.port, $5.dst.host, $5.dst.port, 0, 0, 0, ""); free($9); } ; binatrule : no BINAT natpasslog interface af proto FROM host toipspec tag tagged rtable redirection { struct pf_rule binat; struct pf_pooladdr *pa; if (check_rulestate(PFCTL_STATE_NAT)) YYERROR; if (disallow_urpf_failed($9, "\"urpf-failed\" is not " "permitted as a binat destination")) YYERROR; memset(&binat, 0, sizeof(binat)); if ($1 && $3.b1) { yyerror("\"pass\" not valid with \"no\""); YYERROR; } if ($1) binat.action = PF_NOBINAT; else binat.action = PF_BINAT; binat.natpass = $3.b1; binat.log = $3.b2; binat.logif = $3.w2; binat.af = $5; if (!binat.af && $8 != NULL && $8->af) binat.af = $8->af; if (!binat.af && $9 != NULL && $9->af) binat.af = $9->af; if (!binat.af && $13 != NULL && $13->host) binat.af = $13->host->af; if (!binat.af) { yyerror("address family (inet/inet6) " "undefined"); YYERROR; } if ($4 != NULL) { memcpy(binat.ifname, $4->ifname, sizeof(binat.ifname)); binat.ifnot = $4->not; free($4); } if ($10 != NULL) if (strlcpy(binat.tagname, $10, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } if ($11.name) if (strlcpy(binat.match_tagname, $11.name, PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) { yyerror("tag too long, max %u chars", PF_TAG_NAME_SIZE - 1); YYERROR; } binat.match_tag_not = $11.neg; binat.rtableid = $12; if ($6 != NULL) { binat.proto = $6->proto; free($6); } if ($8 != NULL && disallow_table($8, "invalid use of " "table <%s> as the source address of a binat rule")) YYERROR; if ($8 != NULL && disallow_alias($8, "invalid use of " "interface (%s) as the source address of a binat " "rule")) YYERROR; if ($13 != NULL && $13->host != NULL && disallow_table( $13->host, "invalid use of table <%s> as the " "redirect address of a binat rule")) YYERROR; if ($13 != NULL && $13->host != NULL && disallow_alias( $13->host, "invalid use of interface (%s) as the " "redirect address of a binat rule")) YYERROR; if ($8 != NULL) { if ($8->next) { yyerror("multiple binat ip addresses"); YYERROR; } if ($8->addr.type == PF_ADDR_DYNIFTL) $8->af = binat.af; if ($8->af != binat.af) { yyerror("binat ip versions must match"); YYERROR; } if (check_netmask($8, binat.af)) YYERROR; memcpy(&binat.src.addr, &$8->addr, sizeof(binat.src.addr)); free($8); } if ($9 != NULL) { if ($9->next) { yyerror("multiple binat ip addresses"); YYERROR; } if ($9->af != binat.af && $9->af) { yyerror("binat ip versions must match"); YYERROR; } if (check_netmask($9, binat.af)) YYERROR; memcpy(&binat.dst.addr, &$9->addr, sizeof(binat.dst.addr)); binat.dst.neg = $9->not; free($9); } if (binat.action == PF_NOBINAT) { if ($13 != NULL) { yyerror("'no binat' rule does not need" " '->'"); YYERROR; } } else { if ($13 == NULL || $13->host == NULL) { yyerror("'binat' rule requires" " '-> address'"); YYERROR; } remove_invalid_hosts(&$13->host, &binat.af); if (invalid_redirect($13->host, binat.af)) YYERROR; if ($13->host->next != NULL) { yyerror("binat rule must redirect to " "a single address"); YYERROR; } if (check_netmask($13->host, binat.af)) YYERROR; if (!PF_AZERO(&binat.src.addr.v.a.mask, binat.af) && !PF_AEQ(&binat.src.addr.v.a.mask, &$13->host->addr.v.a.mask, binat.af)) { yyerror("'binat' source mask and " "redirect mask must be the same"); YYERROR; } TAILQ_INIT(&binat.rpool.list); pa = calloc(1, sizeof(struct pf_pooladdr)); if (pa == NULL) err(1, "binat: calloc"); pa->addr = $13->host->addr; pa->ifname[0] = 0; TAILQ_INSERT_TAIL(&binat.rpool.list, pa, entries); free($13); } pfctl_add_rule(pf, &binat, ""); } ; tag : /* empty */ { $$ = NULL; } | TAG STRING { $$ = $2; } ; tagged : /* empty */ { $$.neg = 0; $$.name = NULL; } | not TAGGED string { $$.neg = $1; $$.name = $3; } ; rtable : /* empty */ { $$ = -1; } | RTABLE NUMBER { if ($2 < 0 || $2 > rt_tableid_max()) { yyerror("invalid rtable id"); YYERROR; } $$ = $2; } ; route_host : STRING { $$ = calloc(1, sizeof(struct node_host)); if ($$ == NULL) err(1, "route_host: calloc"); $$->ifname = $1; set_ipmask($$, 128); $$->next = NULL; $$->tail = $$; } | '(' STRING host ')' { $$ = $3; $$->ifname = $2; } ; route_host_list : route_host optnl { $$ = $1; } | route_host_list comma route_host optnl { if ($1->af == 0) $1->af = $3->af; if ($1->af != $3->af) { yyerror("all pool addresses must be in the " "same address family"); YYERROR; } $1->tail->next = $3; $1->tail = $3->tail; $$ = $1; } ; routespec : route_host { $$ = $1; } | '{' optnl route_host_list '}' { $$ = $3; } ; route : /* empty */ { $$.host = NULL; $$.rt = 0; $$.pool_opts = 0; } | FASTROUTE { $$.host = NULL; $$.rt = PF_FASTROUTE; $$.pool_opts = 0; } | ROUTETO routespec pool_opts { $$.host = $2; $$.rt = PF_ROUTETO; $$.pool_opts = $3.type | $3.opts; if ($3.key != NULL) $$.key = $3.key; } | REPLYTO routespec pool_opts { $$.host = $2; $$.rt = PF_REPLYTO; $$.pool_opts = $3.type | $3.opts; if ($3.key != NULL) $$.key = $3.key; } | DUPTO routespec pool_opts { $$.host = $2; $$.rt = PF_DUPTO; $$.pool_opts = $3.type | $3.opts; if ($3.key != NULL) $$.key = $3.key; } ; timeout_spec : STRING NUMBER { if (check_rulestate(PFCTL_STATE_OPTION)) { free($1); YYERROR; } if ($2 < 0 || $2 > UINT_MAX) { yyerror("only positive values permitted"); YYERROR; } if (pfctl_set_timeout(pf, $1, $2, 0) != 0) { yyerror("unknown timeout %s", $1); free($1); YYERROR; } free($1); } + | INTERVAL NUMBER { + if (check_rulestate(PFCTL_STATE_OPTION)) + YYERROR; + if ($2 < 0 || $2 > UINT_MAX) { + yyerror("only positive values permitted"); + YYERROR; + } + if (pfctl_set_timeout(pf, "interval", $2, 0) != 0) + YYERROR; + } ; timeout_list : timeout_list comma timeout_spec optnl | timeout_spec optnl ; limit_spec : STRING NUMBER { if (check_rulestate(PFCTL_STATE_OPTION)) { free($1); YYERROR; } if ($2 < 0 || $2 > UINT_MAX) { yyerror("only positive values permitted"); YYERROR; } if (pfctl_set_limit(pf, $1, $2) != 0) { yyerror("unable to set limit %s %u", $1, $2); free($1); YYERROR; } free($1); } ; limit_list : limit_list comma limit_spec optnl | limit_spec optnl ; comma : ',' | /* empty */ ; yesno : NO { $$ = 0; } | STRING { if (!strcmp($1, "yes")) $$ = 1; else { yyerror("invalid value '%s', expected 'yes' " "or 'no'", $1); free($1); YYERROR; } free($1); } ; unaryop : '=' { $$ = PF_OP_EQ; } | '!' '=' { $$ = PF_OP_NE; } | '<' '=' { $$ = PF_OP_LE; } | '<' { $$ = PF_OP_LT; } | '>' '=' { $$ = PF_OP_GE; } | '>' { $$ = PF_OP_GT; } ; %% int yyerror(const char *fmt, ...) { va_list ap; file->errors++; va_start(ap, fmt); fprintf(stderr, "%s:%d: ", file->name, yylval.lineno); vfprintf(stderr, fmt, ap); fprintf(stderr, "\n"); va_end(ap); return (0); } int disallow_table(struct node_host *h, const char *fmt) { for (; h != NULL; h = h->next) if (h->addr.type == PF_ADDR_TABLE) { yyerror(fmt, h->addr.v.tblname); return (1); } return (0); } int disallow_urpf_failed(struct node_host *h, const char *fmt) { for (; h != NULL; h = h->next) if (h->addr.type == PF_ADDR_URPFFAILED) { yyerror(fmt); return (1); } return (0); } int disallow_alias(struct node_host *h, const char *fmt) { for (; h != NULL; h = h->next) if (DYNIF_MULTIADDR(h->addr)) { yyerror(fmt, h->addr.v.tblname); return (1); } return (0); } int rule_consistent(struct pf_rule *r, int anchor_call) { int problems = 0; switch (r->action) { case PF_PASS: case PF_DROP: case PF_SCRUB: case PF_NOSCRUB: problems = filter_consistent(r, anchor_call); break; case PF_NAT: case PF_NONAT: problems = nat_consistent(r); break; case PF_RDR: case PF_NORDR: problems = rdr_consistent(r); break; case PF_BINAT: case PF_NOBINAT: default: break; } return (problems); } int filter_consistent(struct pf_rule *r, int anchor_call) { int problems = 0; if (r->proto != IPPROTO_TCP && r->proto != IPPROTO_UDP && (r->src.port_op || r->dst.port_op)) { yyerror("port only applies to tcp/udp"); problems++; } if (r->proto != IPPROTO_ICMP && r->proto != IPPROTO_ICMPV6 && (r->type || r->code)) { yyerror("icmp-type/code only applies to icmp"); problems++; } if (!r->af && (r->type || r->code)) { yyerror("must indicate address family with icmp-type/code"); problems++; } if (r->overload_tblname[0] && r->max_src_conn == 0 && r->max_src_conn_rate.seconds == 0) { yyerror("'overload' requires 'max-src-conn' " "or 'max-src-conn-rate'"); problems++; } if ((r->proto == IPPROTO_ICMP && r->af == AF_INET6) || (r->proto == IPPROTO_ICMPV6 && r->af == AF_INET)) { yyerror("proto %s doesn't match address family %s", r->proto == IPPROTO_ICMP ? "icmp" : "icmp6", r->af == AF_INET ? "inet" : "inet6"); problems++; } if (r->allow_opts && r->action != PF_PASS) { yyerror("allow-opts can only be specified for pass rules"); problems++; } if (r->rule_flag & PFRULE_FRAGMENT && (r->src.port_op || r->dst.port_op || r->flagset || r->type || r->code)) { yyerror("fragments can be filtered only on IP header fields"); problems++; } if (r->rule_flag & PFRULE_RETURNRST && r->proto != IPPROTO_TCP) { yyerror("return-rst can only be applied to TCP rules"); problems++; } if (r->max_src_nodes && !(r->rule_flag & PFRULE_RULESRCTRACK)) { yyerror("max-src-nodes requires 'source-track rule'"); problems++; } if (r->action == PF_DROP && r->keep_state) { yyerror("keep state on block rules doesn't make sense"); problems++; } if (r->rule_flag & PFRULE_STATESLOPPY && (r->keep_state == PF_STATE_MODULATE || r->keep_state == PF_STATE_SYNPROXY)) { yyerror("sloppy state matching cannot be used with " "synproxy state or modulate state"); problems++; } return (-problems); } int nat_consistent(struct pf_rule *r) { return (0); /* yeah! */ } int rdr_consistent(struct pf_rule *r) { int problems = 0; if (r->proto != IPPROTO_TCP && r->proto != IPPROTO_UDP) { if (r->src.port_op) { yyerror("src port only applies to tcp/udp"); problems++; } if (r->dst.port_op) { yyerror("dst port only applies to tcp/udp"); problems++; } if (r->rpool.proxy_port[0]) { yyerror("rpool port only applies to tcp/udp"); problems++; } } if (r->dst.port_op && r->dst.port_op != PF_OP_EQ && r->dst.port_op != PF_OP_RRG) { yyerror("invalid port operator for rdr destination port"); problems++; } return (-problems); } int process_tabledef(char *name, struct table_opts *opts) { struct pfr_buffer ab; struct node_tinit *ti; bzero(&ab, sizeof(ab)); ab.pfrb_type = PFRB_ADDRS; SIMPLEQ_FOREACH(ti, &opts->init_nodes, entries) { if (ti->file) if (pfr_buf_load(&ab, ti->file, 0, append_addr)) { if (errno) yyerror("cannot load \"%s\": %s", ti->file, strerror(errno)); else yyerror("file \"%s\" contains bad data", ti->file); goto _error; } if (ti->host) if (append_addr_host(&ab, ti->host, 0, 0)) { yyerror("cannot create address buffer: %s", strerror(errno)); goto _error; } } if (pf->opts & PF_OPT_VERBOSE) print_tabledef(name, opts->flags, opts->init_addr, &opts->init_nodes); if (!(pf->opts & PF_OPT_NOACTION) && pfctl_define_table(name, opts->flags, opts->init_addr, pf->anchor->name, &ab, pf->anchor->ruleset.tticket)) { yyerror("cannot define table %s: %s", name, pfr_strerror(errno)); goto _error; } pf->tdirty = 1; pfr_buf_clear(&ab); return (0); _error: pfr_buf_clear(&ab); return (-1); } struct keywords { const char *k_name; int k_val; }; /* macro gore, but you should've seen the prior indentation nightmare... */ #define FREE_LIST(T,r) \ do { \ T *p, *node = r; \ while (node != NULL) { \ p = node; \ node = node->next; \ free(p); \ } \ } while (0) #define LOOP_THROUGH(T,n,r,C) \ do { \ T *n; \ if (r == NULL) { \ r = calloc(1, sizeof(T)); \ if (r == NULL) \ err(1, "LOOP: calloc"); \ r->next = NULL; \ } \ n = r; \ while (n != NULL) { \ do { \ C; \ } while (0); \ n = n->next; \ } \ } while (0) void expand_label_str(char *label, size_t len, const char *srch, const char *repl) { char *tmp; char *p, *q; if ((tmp = calloc(1, len)) == NULL) err(1, "expand_label_str: calloc"); p = q = label; while ((q = strstr(p, srch)) != NULL) { *q = '\0'; if ((strlcat(tmp, p, len) >= len) || (strlcat(tmp, repl, len) >= len)) errx(1, "expand_label: label too long"); q += strlen(srch); p = q; } if (strlcat(tmp, p, len) >= len) errx(1, "expand_label: label too long"); strlcpy(label, tmp, len); /* always fits */ free(tmp); } void expand_label_if(const char *name, char *label, size_t len, const char *ifname) { if (strstr(label, name) != NULL) { if (!*ifname) expand_label_str(label, len, name, "any"); else expand_label_str(label, len, name, ifname); } } void expand_label_addr(const char *name, char *label, size_t len, sa_family_t af, struct node_host *h) { char tmp[64], tmp_not[66]; if (strstr(label, name) != NULL) { switch (h->addr.type) { case PF_ADDR_DYNIFTL: snprintf(tmp, sizeof(tmp), "(%s)", h->addr.v.ifname); break; case PF_ADDR_TABLE: snprintf(tmp, sizeof(tmp), "<%s>", h->addr.v.tblname); break; case PF_ADDR_NOROUTE: snprintf(tmp, sizeof(tmp), "no-route"); break; case PF_ADDR_URPFFAILED: snprintf(tmp, sizeof(tmp), "urpf-failed"); break; case PF_ADDR_ADDRMASK: if (!af || (PF_AZERO(&h->addr.v.a.addr, af) && PF_AZERO(&h->addr.v.a.mask, af))) snprintf(tmp, sizeof(tmp), "any"); else { char a[48]; int bits; if (inet_ntop(af, &h->addr.v.a.addr, a, sizeof(a)) == NULL) snprintf(tmp, sizeof(tmp), "?"); else { bits = unmask(&h->addr.v.a.mask, af); if ((af == AF_INET && bits < 32) || (af == AF_INET6 && bits < 128)) snprintf(tmp, sizeof(tmp), "%s/%d", a, bits); else snprintf(tmp, sizeof(tmp), "%s", a); } } break; default: snprintf(tmp, sizeof(tmp), "?"); break; } if (h->not) { snprintf(tmp_not, sizeof(tmp_not), "! %s", tmp); expand_label_str(label, len, name, tmp_not); } else expand_label_str(label, len, name, tmp); } } void expand_label_port(const char *name, char *label, size_t len, struct node_port *port) { char a1[6], a2[6], op[13] = ""; if (strstr(label, name) != NULL) { snprintf(a1, sizeof(a1), "%u", ntohs(port->port[0])); snprintf(a2, sizeof(a2), "%u", ntohs(port->port[1])); if (!port->op) ; else if (port->op == PF_OP_IRG) snprintf(op, sizeof(op), "%s><%s", a1, a2); else if (port->op == PF_OP_XRG) snprintf(op, sizeof(op), "%s<>%s", a1, a2); else if (port->op == PF_OP_EQ) snprintf(op, sizeof(op), "%s", a1); else if (port->op == PF_OP_NE) snprintf(op, sizeof(op), "!=%s", a1); else if (port->op == PF_OP_LT) snprintf(op, sizeof(op), "<%s", a1); else if (port->op == PF_OP_LE) snprintf(op, sizeof(op), "<=%s", a1); else if (port->op == PF_OP_GT) snprintf(op, sizeof(op), ">%s", a1); else if (port->op == PF_OP_GE) snprintf(op, sizeof(op), ">=%s", a1); expand_label_str(label, len, name, op); } } void expand_label_proto(const char *name, char *label, size_t len, u_int8_t proto) { struct protoent *pe; char n[4]; if (strstr(label, name) != NULL) { pe = getprotobynumber(proto); if (pe != NULL) expand_label_str(label, len, name, pe->p_name); else { snprintf(n, sizeof(n), "%u", proto); expand_label_str(label, len, name, n); } } } void expand_label_nr(const char *name, char *label, size_t len) { char n[11]; if (strstr(label, name) != NULL) { snprintf(n, sizeof(n), "%u", pf->anchor->match); expand_label_str(label, len, name, n); } } void expand_label(char *label, size_t len, const char *ifname, sa_family_t af, struct node_host *src_host, struct node_port *src_port, struct node_host *dst_host, struct node_port *dst_port, u_int8_t proto) { expand_label_if("$if", label, len, ifname); expand_label_addr("$srcaddr", label, len, af, src_host); expand_label_addr("$dstaddr", label, len, af, dst_host); expand_label_port("$srcport", label, len, src_port); expand_label_port("$dstport", label, len, dst_port); expand_label_proto("$proto", label, len, proto); expand_label_nr("$nr", label, len); } int expand_altq(struct pf_altq *a, struct node_if *interfaces, struct node_queue *nqueues, struct node_queue_bw bwspec, struct node_queue_opt *opts) { struct pf_altq pa, pb; char qname[PF_QNAME_SIZE]; struct node_queue *n; struct node_queue_bw bw; int errs = 0; if ((pf->loadopt & PFCTL_FLAG_ALTQ) == 0) { FREE_LIST(struct node_if, interfaces); if (nqueues) FREE_LIST(struct node_queue, nqueues); return (0); } LOOP_THROUGH(struct node_if, interface, interfaces, memcpy(&pa, a, sizeof(struct pf_altq)); if (strlcpy(pa.ifname, interface->ifname, sizeof(pa.ifname)) >= sizeof(pa.ifname)) errx(1, "expand_altq: strlcpy"); if (interface->not) { yyerror("altq on ! is not supported"); errs++; } else { if (eval_pfaltq(pf, &pa, &bwspec, opts)) errs++; else if (pfctl_add_altq(pf, &pa)) errs++; if (pf->opts & PF_OPT_VERBOSE) { print_altq(&pf->paltq->altq, 0, &bwspec, opts); if (nqueues && nqueues->tail) { printf("queue { "); LOOP_THROUGH(struct node_queue, queue, nqueues, printf("%s ", queue->queue); ); printf("}"); } printf("\n"); } if (pa.scheduler == ALTQT_CBQ || pa.scheduler == ALTQT_HFSC) { /* now create a root queue */ memset(&pb, 0, sizeof(struct pf_altq)); if (strlcpy(qname, "root_", sizeof(qname)) >= sizeof(qname)) errx(1, "expand_altq: strlcpy"); if (strlcat(qname, interface->ifname, sizeof(qname)) >= sizeof(qname)) errx(1, "expand_altq: strlcat"); if (strlcpy(pb.qname, qname, sizeof(pb.qname)) >= sizeof(pb.qname)) errx(1, "expand_altq: strlcpy"); if (strlcpy(pb.ifname, interface->ifname, sizeof(pb.ifname)) >= sizeof(pb.ifname)) errx(1, "expand_altq: strlcpy"); pb.qlimit = pa.qlimit; pb.scheduler = pa.scheduler; bw.bw_absolute = pa.ifbandwidth; bw.bw_percent = 0; if (eval_pfqueue(pf, &pb, &bw, opts)) errs++; else if (pfctl_add_altq(pf, &pb)) errs++; } LOOP_THROUGH(struct node_queue, queue, nqueues, n = calloc(1, sizeof(struct node_queue)); if (n == NULL) err(1, "expand_altq: calloc"); if (pa.scheduler == ALTQT_CBQ || pa.scheduler == ALTQT_HFSC) if (strlcpy(n->parent, qname, sizeof(n->parent)) >= sizeof(n->parent)) errx(1, "expand_altq: strlcpy"); if (strlcpy(n->queue, queue->queue, sizeof(n->queue)) >= sizeof(n->queue)) errx(1, "expand_altq: strlcpy"); if (strlcpy(n->ifname, interface->ifname, sizeof(n->ifname)) >= sizeof(n->ifname)) errx(1, "expand_altq: strlcpy"); n->scheduler = pa.scheduler; n->next = NULL; n->tail = n; if (queues == NULL) queues = n; else { queues->tail->next = n; queues->tail = n; } ); } ); FREE_LIST(struct node_if, interfaces); if (nqueues) FREE_LIST(struct node_queue, nqueues); return (errs); } int expand_queue(struct pf_altq *a, struct node_if *interfaces, struct node_queue *nqueues, struct node_queue_bw bwspec, struct node_queue_opt *opts) { struct node_queue *n, *nq; struct pf_altq pa; u_int8_t found = 0; u_int8_t errs = 0; if ((pf->loadopt & PFCTL_FLAG_ALTQ) == 0) { FREE_LIST(struct node_queue, nqueues); return (0); } if (queues == NULL) { yyerror("queue %s has no parent", a->qname); FREE_LIST(struct node_queue, nqueues); return (1); } LOOP_THROUGH(struct node_if, interface, interfaces, LOOP_THROUGH(struct node_queue, tqueue, queues, if (!strncmp(a->qname, tqueue->queue, PF_QNAME_SIZE) && (interface->ifname[0] == 0 || (!interface->not && !strncmp(interface->ifname, tqueue->ifname, IFNAMSIZ)) || (interface->not && strncmp(interface->ifname, tqueue->ifname, IFNAMSIZ)))) { /* found ourself in queues */ found++; memcpy(&pa, a, sizeof(struct pf_altq)); if (pa.scheduler != ALTQT_NONE && pa.scheduler != tqueue->scheduler) { yyerror("exactly one scheduler type " "per interface allowed"); return (1); } pa.scheduler = tqueue->scheduler; /* scheduler dependent error checking */ switch (pa.scheduler) { case ALTQT_PRIQ: if (nqueues != NULL) { yyerror("priq queues cannot " "have child queues"); return (1); } if (bwspec.bw_absolute > 0 || bwspec.bw_percent < 100) { yyerror("priq doesn't take " "bandwidth"); return (1); } break; default: break; } if (strlcpy(pa.ifname, tqueue->ifname, sizeof(pa.ifname)) >= sizeof(pa.ifname)) errx(1, "expand_queue: strlcpy"); if (strlcpy(pa.parent, tqueue->parent, sizeof(pa.parent)) >= sizeof(pa.parent)) errx(1, "expand_queue: strlcpy"); if (eval_pfqueue(pf, &pa, &bwspec, opts)) errs++; else if (pfctl_add_altq(pf, &pa)) errs++; for (nq = nqueues; nq != NULL; nq = nq->next) { if (!strcmp(a->qname, nq->queue)) { yyerror("queue cannot have " "itself as child"); errs++; continue; } n = calloc(1, sizeof(struct node_queue)); if (n == NULL) err(1, "expand_queue: calloc"); if (strlcpy(n->parent, a->qname, sizeof(n->parent)) >= sizeof(n->parent)) errx(1, "expand_queue strlcpy"); if (strlcpy(n->queue, nq->queue, sizeof(n->queue)) >= sizeof(n->queue)) errx(1, "expand_queue strlcpy"); if (strlcpy(n->ifname, tqueue->ifname, sizeof(n->ifname)) >= sizeof(n->ifname)) errx(1, "expand_queue strlcpy"); n->scheduler = tqueue->scheduler; n->next = NULL; n->tail = n; if (queues == NULL) queues = n; else { queues->tail->next = n; queues->tail = n; } } if ((pf->opts & PF_OPT_VERBOSE) && ( (found == 1 && interface->ifname[0] == 0) || (found > 0 && interface->ifname[0] != 0))) { print_queue(&pf->paltq->altq, 0, &bwspec, interface->ifname[0] != 0, opts); if (nqueues && nqueues->tail) { printf("{ "); LOOP_THROUGH(struct node_queue, queue, nqueues, printf("%s ", queue->queue); ); printf("}"); } printf("\n"); } } ); ); FREE_LIST(struct node_queue, nqueues); FREE_LIST(struct node_if, interfaces); if (!found) { yyerror("queue %s has no parent", a->qname); errs++; } if (errs) return (1); else return (0); } void expand_rule(struct pf_rule *r, struct node_if *interfaces, struct node_host *rpool_hosts, struct node_proto *protos, struct node_os *src_oses, struct node_host *src_hosts, struct node_port *src_ports, struct node_host *dst_hosts, struct node_port *dst_ports, struct node_uid *uids, struct node_gid *gids, struct node_icmp *icmp_types, const char *anchor_call) { sa_family_t af = r->af; int added = 0, error = 0; char ifname[IF_NAMESIZE]; char label[PF_RULE_LABEL_SIZE]; char tagname[PF_TAG_NAME_SIZE]; char match_tagname[PF_TAG_NAME_SIZE]; struct pf_pooladdr *pa; struct node_host *h; u_int8_t flags, flagset, keep_state; if (strlcpy(label, r->label, sizeof(label)) >= sizeof(label)) errx(1, "expand_rule: strlcpy"); if (strlcpy(tagname, r->tagname, sizeof(tagname)) >= sizeof(tagname)) errx(1, "expand_rule: strlcpy"); if (strlcpy(match_tagname, r->match_tagname, sizeof(match_tagname)) >= sizeof(match_tagname)) errx(1, "expand_rule: strlcpy"); flags = r->flags; flagset = r->flagset; keep_state = r->keep_state; LOOP_THROUGH(struct node_if, interface, interfaces, LOOP_THROUGH(struct node_proto, proto, protos, LOOP_THROUGH(struct node_icmp, icmp_type, icmp_types, LOOP_THROUGH(struct node_host, src_host, src_hosts, LOOP_THROUGH(struct node_port, src_port, src_ports, LOOP_THROUGH(struct node_os, src_os, src_oses, LOOP_THROUGH(struct node_host, dst_host, dst_hosts, LOOP_THROUGH(struct node_port, dst_port, dst_ports, LOOP_THROUGH(struct node_uid, uid, uids, LOOP_THROUGH(struct node_gid, gid, gids, r->af = af; /* for link-local IPv6 address, interface must match up */ if ((r->af && src_host->af && r->af != src_host->af) || (r->af && dst_host->af && r->af != dst_host->af) || (src_host->af && dst_host->af && src_host->af != dst_host->af) || (src_host->ifindex && dst_host->ifindex && src_host->ifindex != dst_host->ifindex) || (src_host->ifindex && *interface->ifname && src_host->ifindex != if_nametoindex(interface->ifname)) || (dst_host->ifindex && *interface->ifname && dst_host->ifindex != if_nametoindex(interface->ifname))) continue; if (!r->af && src_host->af) r->af = src_host->af; else if (!r->af && dst_host->af) r->af = dst_host->af; if (*interface->ifname) strlcpy(r->ifname, interface->ifname, sizeof(r->ifname)); else if (if_indextoname(src_host->ifindex, ifname)) strlcpy(r->ifname, ifname, sizeof(r->ifname)); else if (if_indextoname(dst_host->ifindex, ifname)) strlcpy(r->ifname, ifname, sizeof(r->ifname)); else memset(r->ifname, '\0', sizeof(r->ifname)); if (strlcpy(r->label, label, sizeof(r->label)) >= sizeof(r->label)) errx(1, "expand_rule: strlcpy"); if (strlcpy(r->tagname, tagname, sizeof(r->tagname)) >= sizeof(r->tagname)) errx(1, "expand_rule: strlcpy"); if (strlcpy(r->match_tagname, match_tagname, sizeof(r->match_tagname)) >= sizeof(r->match_tagname)) errx(1, "expand_rule: strlcpy"); expand_label(r->label, PF_RULE_LABEL_SIZE, r->ifname, r->af, src_host, src_port, dst_host, dst_port, proto->proto); expand_label(r->tagname, PF_TAG_NAME_SIZE, r->ifname, r->af, src_host, src_port, dst_host, dst_port, proto->proto); expand_label(r->match_tagname, PF_TAG_NAME_SIZE, r->ifname, r->af, src_host, src_port, dst_host, dst_port, proto->proto); error += check_netmask(src_host, r->af); error += check_netmask(dst_host, r->af); r->ifnot = interface->not; r->proto = proto->proto; r->src.addr = src_host->addr; r->src.neg = src_host->not; r->src.port[0] = src_port->port[0]; r->src.port[1] = src_port->port[1]; r->src.port_op = src_port->op; r->dst.addr = dst_host->addr; r->dst.neg = dst_host->not; r->dst.port[0] = dst_port->port[0]; r->dst.port[1] = dst_port->port[1]; r->dst.port_op = dst_port->op; r->uid.op = uid->op; r->uid.uid[0] = uid->uid[0]; r->uid.uid[1] = uid->uid[1]; r->gid.op = gid->op; r->gid.gid[0] = gid->gid[0]; r->gid.gid[1] = gid->gid[1]; r->type = icmp_type->type; r->code = icmp_type->code; if ((keep_state == PF_STATE_MODULATE || keep_state == PF_STATE_SYNPROXY) && r->proto && r->proto != IPPROTO_TCP) r->keep_state = PF_STATE_NORMAL; else r->keep_state = keep_state; if (r->proto && r->proto != IPPROTO_TCP) { r->flags = 0; r->flagset = 0; } else { r->flags = flags; r->flagset = flagset; } if (icmp_type->proto && r->proto != icmp_type->proto) { yyerror("icmp-type mismatch"); error++; } if (src_os && src_os->os) { r->os_fingerprint = pfctl_get_fingerprint(src_os->os); if ((pf->opts & PF_OPT_VERBOSE2) && r->os_fingerprint == PF_OSFP_NOMATCH) fprintf(stderr, "warning: unknown '%s' OS fingerprint\n", src_os->os); } else { r->os_fingerprint = PF_OSFP_ANY; } TAILQ_INIT(&r->rpool.list); for (h = rpool_hosts; h != NULL; h = h->next) { pa = calloc(1, sizeof(struct pf_pooladdr)); if (pa == NULL) err(1, "expand_rule: calloc"); pa->addr = h->addr; if (h->ifname != NULL) { if (strlcpy(pa->ifname, h->ifname, sizeof(pa->ifname)) >= sizeof(pa->ifname)) errx(1, "expand_rule: strlcpy"); } else pa->ifname[0] = 0; TAILQ_INSERT_TAIL(&r->rpool.list, pa, entries); } if (rule_consistent(r, anchor_call[0]) < 0 || error) yyerror("skipping rule due to errors"); else { r->nr = pf->astack[pf->asd]->match++; pfctl_add_rule(pf, r, anchor_call); added++; } )))))))))); FREE_LIST(struct node_if, interfaces); FREE_LIST(struct node_proto, protos); FREE_LIST(struct node_host, src_hosts); FREE_LIST(struct node_port, src_ports); FREE_LIST(struct node_os, src_oses); FREE_LIST(struct node_host, dst_hosts); FREE_LIST(struct node_port, dst_ports); FREE_LIST(struct node_uid, uids); FREE_LIST(struct node_gid, gids); FREE_LIST(struct node_icmp, icmp_types); FREE_LIST(struct node_host, rpool_hosts); if (!added) yyerror("rule expands to no valid combination"); } int expand_skip_interface(struct node_if *interfaces) { int errs = 0; if (!interfaces || (!interfaces->next && !interfaces->not && !strcmp(interfaces->ifname, "none"))) { if (pf->opts & PF_OPT_VERBOSE) printf("set skip on none\n"); errs = pfctl_set_interface_flags(pf, "", PFI_IFLAG_SKIP, 0); return (errs); } if (pf->opts & PF_OPT_VERBOSE) printf("set skip on {"); LOOP_THROUGH(struct node_if, interface, interfaces, if (pf->opts & PF_OPT_VERBOSE) printf(" %s", interface->ifname); if (interface->not) { yyerror("skip on ! is not supported"); errs++; } else errs += pfctl_set_interface_flags(pf, interface->ifname, PFI_IFLAG_SKIP, 1); ); if (pf->opts & PF_OPT_VERBOSE) printf(" }\n"); FREE_LIST(struct node_if, interfaces); if (errs) return (1); else return (0); } #undef FREE_LIST #undef LOOP_THROUGH int check_rulestate(int desired_state) { if (require_order && (rulestate > desired_state)) { yyerror("Rules must be in order: options, normalization, " "queueing, translation, filtering"); return (1); } rulestate = desired_state; return (0); } int kw_cmp(const void *k, const void *e) { return (strcmp(k, ((const struct keywords *)e)->k_name)); } int lookup(char *s) { /* this has to be sorted always */ static const struct keywords keywords[] = { { "all", ALL}, { "allow-opts", ALLOWOPTS}, { "altq", ALTQ}, { "anchor", ANCHOR}, { "antispoof", ANTISPOOF}, { "any", ANY}, { "bandwidth", BANDWIDTH}, { "binat", BINAT}, { "binat-anchor", BINATANCHOR}, { "bitmask", BITMASK}, { "block", BLOCK}, { "block-policy", BLOCKPOLICY}, { "buckets", BUCKETS}, { "cbq", CBQ}, { "code", CODE}, { "codelq", CODEL}, { "crop", FRAGCROP}, { "debug", DEBUG}, { "divert-reply", DIVERTREPLY}, { "divert-to", DIVERTTO}, { "drop", DROP}, { "drop-ovl", FRAGDROP}, { "dup-to", DUPTO}, { "fairq", FAIRQ}, { "fastroute", FASTROUTE}, { "file", FILENAME}, { "fingerprints", FINGERPRINTS}, { "flags", FLAGS}, { "floating", FLOATING}, { "flush", FLUSH}, { "for", FOR}, { "fragment", FRAGMENT}, { "from", FROM}, { "global", GLOBAL}, { "group", GROUP}, { "hfsc", HFSC}, { "hogs", HOGS}, { "hostid", HOSTID}, { "icmp-type", ICMPTYPE}, { "icmp6-type", ICMP6TYPE}, { "if-bound", IFBOUND}, { "in", IN}, { "include", INCLUDE}, { "inet", INET}, { "inet6", INET6}, { "interval", INTERVAL}, { "keep", KEEP}, { "label", LABEL}, { "limit", LIMIT}, { "linkshare", LINKSHARE}, { "load", LOAD}, { "log", LOG}, { "loginterface", LOGINTERFACE}, { "max", MAXIMUM}, { "max-mss", MAXMSS}, { "max-src-conn", MAXSRCCONN}, { "max-src-conn-rate", MAXSRCCONNRATE}, { "max-src-nodes", MAXSRCNODES}, { "max-src-states", MAXSRCSTATES}, { "min-ttl", MINTTL}, { "modulate", MODULATE}, { "nat", NAT}, { "nat-anchor", NATANCHOR}, { "no", NO}, { "no-df", NODF}, { "no-route", NOROUTE}, { "no-sync", NOSYNC}, { "on", ON}, { "optimization", OPTIMIZATION}, { "os", OS}, { "out", OUT}, { "overload", OVERLOAD}, { "pass", PASS}, { "port", PORT}, { "prio", PRIO}, { "priority", PRIORITY}, { "priq", PRIQ}, { "probability", PROBABILITY}, { "proto", PROTO}, { "qlimit", QLIMIT}, { "queue", QUEUE}, { "quick", QUICK}, { "random", RANDOM}, { "random-id", RANDOMID}, { "rdr", RDR}, { "rdr-anchor", RDRANCHOR}, { "realtime", REALTIME}, { "reassemble", REASSEMBLE}, { "reply-to", REPLYTO}, { "require-order", REQUIREORDER}, { "return", RETURN}, { "return-icmp", RETURNICMP}, { "return-icmp6", RETURNICMP6}, { "return-rst", RETURNRST}, { "round-robin", ROUNDROBIN}, { "route", ROUTE}, { "route-to", ROUTETO}, { "rtable", RTABLE}, { "rule", RULE}, { "ruleset-optimization", RULESET_OPTIMIZATION}, { "scrub", SCRUB}, { "set", SET}, { "set-tos", SETTOS}, { "skip", SKIP}, { "sloppy", SLOPPY}, { "source-hash", SOURCEHASH}, { "source-track", SOURCETRACK}, { "state", STATE}, { "state-defaults", STATEDEFAULTS}, { "state-policy", STATEPOLICY}, { "static-port", STATICPORT}, { "sticky-address", STICKYADDRESS}, { "synproxy", SYNPROXY}, { "table", TABLE}, { "tag", TAG}, { "tagged", TAGGED}, { "target", TARGET}, { "tbrsize", TBRSIZE}, { "timeout", TIMEOUT}, { "to", TO}, { "tos", TOS}, { "ttl", TTL}, { "upperlimit", UPPERLIMIT}, { "urpf-failed", URPFFAILED}, { "user", USER}, }; const struct keywords *p; p = bsearch(s, keywords, sizeof(keywords)/sizeof(keywords[0]), sizeof(keywords[0]), kw_cmp); if (p) { if (debug > 1) fprintf(stderr, "%s: %d\n", s, p->k_val); return (p->k_val); } else { if (debug > 1) fprintf(stderr, "string: %s\n", s); return (STRING); } } #define MAXPUSHBACK 128 -char *parsebuf; -int parseindex; -char pushback_buffer[MAXPUSHBACK]; -int pushback_index = 0; +static char *parsebuf; +static int parseindex; +static char pushback_buffer[MAXPUSHBACK]; +static int pushback_index = 0; int lgetc(int quotec) { int c, next; if (parsebuf) { /* Read character from the parsebuffer instead of input. */ if (parseindex >= 0) { c = parsebuf[parseindex++]; if (c != '\0') return (c); parsebuf = NULL; } else parseindex++; } if (pushback_index) return (pushback_buffer[--pushback_index]); if (quotec) { if ((c = getc(file->stream)) == EOF) { yyerror("reached end of file while parsing quoted string"); if (popfile() == EOF) return (EOF); return (quotec); } return (c); } while ((c = getc(file->stream)) == '\\') { next = getc(file->stream); if (next != '\n') { c = next; break; } yylval.lineno = file->lineno; file->lineno++; } while (c == EOF) { if (popfile() == EOF) return (EOF); c = getc(file->stream); } return (c); } int lungetc(int c) { if (c == EOF) return (EOF); if (parsebuf) { parseindex--; if (parseindex >= 0) return (c); } if (pushback_index < MAXPUSHBACK-1) return (pushback_buffer[pushback_index++] = c); else return (EOF); } int findeol(void) { int c; parsebuf = NULL; /* skip to either EOF or the first real EOL */ while (1) { if (pushback_index) c = pushback_buffer[--pushback_index]; else c = lgetc(0); if (c == '\n') { file->lineno++; break; } if (c == EOF) break; } return (ERROR); } int yylex(void) { char buf[8096]; char *p, *val; int quotec, next, c; int token; top: p = buf; while ((c = lgetc(0)) == ' ' || c == '\t') ; /* nothing */ yylval.lineno = file->lineno; if (c == '#') while ((c = lgetc(0)) != '\n' && c != EOF) ; /* nothing */ if (c == '$' && parsebuf == NULL) { while (1) { if ((c = lgetc(0)) == EOF) return (0); if (p + 1 >= buf + sizeof(buf) - 1) { yyerror("string too long"); return (findeol()); } if (isalnum(c) || c == '_') { *p++ = (char)c; continue; } *p = '\0'; lungetc(c); break; } val = symget(buf); if (val == NULL) { yyerror("macro '%s' not defined", buf); return (findeol()); } parsebuf = val; parseindex = 0; goto top; } switch (c) { case '\'': case '"': quotec = c; while (1) { if ((c = lgetc(quotec)) == EOF) return (0); if (c == '\n') { file->lineno++; continue; } else if (c == '\\') { if ((next = lgetc(quotec)) == EOF) return (0); if (next == quotec || c == ' ' || c == '\t') c = next; else if (next == '\n') continue; else lungetc(next); } else if (c == quotec) { *p = '\0'; break; } if (p + 1 >= buf + sizeof(buf) - 1) { yyerror("string too long"); return (findeol()); } *p++ = (char)c; } yylval.v.string = strdup(buf); if (yylval.v.string == NULL) err(1, "yylex: strdup"); return (STRING); case '<': next = lgetc(0); if (next == '>') { yylval.v.i = PF_OP_XRG; return (PORTBINARY); } lungetc(next); break; case '>': next = lgetc(0); if (next == '<') { yylval.v.i = PF_OP_IRG; return (PORTBINARY); } lungetc(next); break; case '-': next = lgetc(0); if (next == '>') return (ARROW); lungetc(next); break; } #define allowed_to_end_number(x) \ (isspace(x) || x == ')' || x ==',' || x == '/' || x == '}' || x == '=') if (c == '-' || isdigit(c)) { do { *p++ = c; if ((unsigned)(p-buf) >= sizeof(buf)) { yyerror("string too long"); return (findeol()); } } while ((c = lgetc(0)) != EOF && isdigit(c)); lungetc(c); if (p == buf + 1 && buf[0] == '-') goto nodigits; if (c == EOF || allowed_to_end_number(c)) { const char *errstr = NULL; *p = '\0'; yylval.v.number = strtonum(buf, LLONG_MIN, LLONG_MAX, &errstr); if (errstr) { yyerror("\"%s\" invalid number: %s", buf, errstr); return (findeol()); } return (NUMBER); } else { nodigits: while (p > buf + 1) lungetc(*--p); c = *--p; if (c == '-') return (c); } } #define allowed_in_string(x) \ (isalnum(x) || (ispunct(x) && x != '(' && x != ')' && \ x != '{' && x != '}' && x != '<' && x != '>' && \ x != '!' && x != '=' && x != '/' && x != '#' && \ x != ',')) if (isalnum(c) || c == ':' || c == '_') { do { *p++ = c; if ((unsigned)(p-buf) >= sizeof(buf)) { yyerror("string too long"); return (findeol()); } } while ((c = lgetc(0)) != EOF && (allowed_in_string(c))); lungetc(c); *p = '\0'; if ((token = lookup(buf)) == STRING) if ((yylval.v.string = strdup(buf)) == NULL) err(1, "yylex: strdup"); return (token); } if (c == '\n') { yylval.lineno = file->lineno; file->lineno++; } if (c == EOF) return (0); return (c); } int check_file_secrecy(int fd, const char *fname) { struct stat st; if (fstat(fd, &st)) { warn("cannot stat %s", fname); return (-1); } if (st.st_uid != 0 && st.st_uid != getuid()) { warnx("%s: owner not root or current user", fname); return (-1); } if (st.st_mode & (S_IRWXG | S_IRWXO)) { warnx("%s: group/world readable/writeable", fname); return (-1); } return (0); } struct file * pushfile(const char *name, int secret) { struct file *nfile; if ((nfile = calloc(1, sizeof(struct file))) == NULL || (nfile->name = strdup(name)) == NULL) { warn("malloc"); return (NULL); } if (TAILQ_FIRST(&files) == NULL && strcmp(nfile->name, "-") == 0) { nfile->stream = stdin; free(nfile->name); if ((nfile->name = strdup("stdin")) == NULL) { warn("strdup"); free(nfile); return (NULL); } } else if ((nfile->stream = fopen(nfile->name, "r")) == NULL) { warn("%s", nfile->name); free(nfile->name); free(nfile); return (NULL); } else if (secret && check_file_secrecy(fileno(nfile->stream), nfile->name)) { fclose(nfile->stream); free(nfile->name); free(nfile); return (NULL); } nfile->lineno = 1; TAILQ_INSERT_TAIL(&files, nfile, entry); return (nfile); } int popfile(void) { struct file *prev; if ((prev = TAILQ_PREV(file, files, entry)) != NULL) { prev->errors += file->errors; TAILQ_REMOVE(&files, file, entry); fclose(file->stream); free(file->name); free(file); file = prev; return (0); } return (EOF); } int parse_config(char *filename, struct pfctl *xpf) { int errors = 0; struct sym *sym; pf = xpf; errors = 0; rulestate = PFCTL_STATE_NONE; returnicmpdefault = (ICMP_UNREACH << 8) | ICMP_UNREACH_PORT; returnicmp6default = (ICMP6_DST_UNREACH << 8) | ICMP6_DST_UNREACH_NOPORT; blockpolicy = PFRULE_DROP; require_order = 1; if ((file = pushfile(filename, 0)) == NULL) { warn("cannot open the main config file!"); return (-1); } yyparse(); errors = file->errors; popfile(); /* Free macros and check which have not been used. */ while ((sym = TAILQ_FIRST(&symhead))) { if ((pf->opts & PF_OPT_VERBOSE2) && !sym->used) fprintf(stderr, "warning: macro '%s' not " "used\n", sym->nam); free(sym->nam); free(sym->val); TAILQ_REMOVE(&symhead, sym, entry); free(sym); } return (errors ? -1 : 0); } int symset(const char *nam, const char *val, int persist) { struct sym *sym; for (sym = TAILQ_FIRST(&symhead); sym && strcmp(nam, sym->nam); sym = TAILQ_NEXT(sym, entry)) ; /* nothing */ if (sym != NULL) { if (sym->persist == 1) return (0); else { free(sym->nam); free(sym->val); TAILQ_REMOVE(&symhead, sym, entry); free(sym); } } if ((sym = calloc(1, sizeof(*sym))) == NULL) return (-1); sym->nam = strdup(nam); if (sym->nam == NULL) { free(sym); return (-1); } sym->val = strdup(val); if (sym->val == NULL) { free(sym->nam); free(sym); return (-1); } sym->used = 0; sym->persist = persist; TAILQ_INSERT_TAIL(&symhead, sym, entry); return (0); } int pfctl_cmdline_symset(char *s) { char *sym, *val; int ret; if ((val = strrchr(s, '=')) == NULL) return (-1); if ((sym = malloc(strlen(s) - strlen(val) + 1)) == NULL) err(1, "pfctl_cmdline_symset: malloc"); strlcpy(sym, s, strlen(s) - strlen(val) + 1); ret = symset(sym, val + 1, 1); free(sym); return (ret); } char * symget(const char *nam) { struct sym *sym; TAILQ_FOREACH(sym, &symhead, entry) if (strcmp(nam, sym->nam) == 0) { sym->used = 1; return (sym->val); } return (NULL); } void mv_rules(struct pf_ruleset *src, struct pf_ruleset *dst) { int i; struct pf_rule *r; for (i = 0; i < PF_RULESET_MAX; ++i) { while ((r = TAILQ_FIRST(src->rules[i].active.ptr)) != NULL) { TAILQ_REMOVE(src->rules[i].active.ptr, r, entries); TAILQ_INSERT_TAIL(dst->rules[i].active.ptr, r, entries); dst->anchor->match++; } src->anchor->match = 0; while ((r = TAILQ_FIRST(src->rules[i].inactive.ptr)) != NULL) { TAILQ_REMOVE(src->rules[i].inactive.ptr, r, entries); TAILQ_INSERT_TAIL(dst->rules[i].inactive.ptr, r, entries); } } } void decide_address_family(struct node_host *n, sa_family_t *af) { if (*af != 0 || n == NULL) return; *af = n->af; while ((n = n->next) != NULL) { if (n->af != *af) { *af = 0; return; } } } void remove_invalid_hosts(struct node_host **nh, sa_family_t *af) { struct node_host *n = *nh, *prev = NULL; while (n != NULL) { if (*af && n->af && n->af != *af) { /* unlink and free n */ struct node_host *next = n->next; /* adjust tail pointer */ if (n == (*nh)->tail) (*nh)->tail = prev; /* adjust previous node's next pointer */ if (prev == NULL) *nh = next; else prev->next = next; /* free node */ if (n->ifname != NULL) free(n->ifname); free(n); n = next; } else { if (n->af && !*af) *af = n->af; prev = n; n = n->next; } } } int invalid_redirect(struct node_host *nh, sa_family_t af) { if (!af) { struct node_host *n; /* tables and dyniftl are ok without an address family */ for (n = nh; n != NULL; n = n->next) { if (n->addr.type != PF_ADDR_TABLE && n->addr.type != PF_ADDR_DYNIFTL) { yyerror("address family not given and " "translation address expands to multiple " "address families"); return (1); } } } if (nh == NULL) { yyerror("no translation address with matching address family " "found."); return (1); } return (0); } int atoul(char *s, u_long *ulvalp) { u_long ulval; char *ep; errno = 0; ulval = strtoul(s, &ep, 0); if (s[0] == '\0' || *ep != '\0') return (-1); if (errno == ERANGE && ulval == ULONG_MAX) return (-1); *ulvalp = ulval; return (0); } int getservice(char *n) { struct servent *s; u_long ulval; if (atoul(n, &ulval) == 0) { if (ulval > 65535) { yyerror("illegal port value %lu", ulval); return (-1); } return (htons(ulval)); } else { s = getservbyname(n, "tcp"); if (s == NULL) s = getservbyname(n, "udp"); if (s == NULL) { yyerror("unknown port %s", n); return (-1); } return (s->s_port); } } int rule_label(struct pf_rule *r, char *s) { if (s) { if (strlcpy(r->label, s, sizeof(r->label)) >= sizeof(r->label)) { yyerror("rule label too long (max %d chars)", sizeof(r->label)-1); return (-1); } } return (0); } u_int16_t parseicmpspec(char *w, sa_family_t af) { const struct icmpcodeent *p; u_long ulval; u_int8_t icmptype; if (af == AF_INET) icmptype = returnicmpdefault >> 8; else icmptype = returnicmp6default >> 8; if (atoul(w, &ulval) == -1) { if ((p = geticmpcodebyname(icmptype, w, af)) == NULL) { yyerror("unknown icmp code %s", w); return (0); } ulval = p->code; } if (ulval > 255) { yyerror("invalid icmp code %lu", ulval); return (0); } return (icmptype << 8 | ulval); } int parseport(char *port, struct range *r, int extensions) { char *p = strchr(port, ':'); if (p == NULL) { if ((r->a = getservice(port)) == -1) return (-1); r->b = 0; r->t = PF_OP_NONE; return (0); } if ((extensions & PPORT_STAR) && !strcmp(p+1, "*")) { *p = 0; if ((r->a = getservice(port)) == -1) return (-1); r->b = 0; r->t = PF_OP_IRG; return (0); } if ((extensions & PPORT_RANGE)) { *p++ = 0; if ((r->a = getservice(port)) == -1 || (r->b = getservice(p)) == -1) return (-1); if (r->a == r->b) { r->b = 0; r->t = PF_OP_NONE; } else r->t = PF_OP_RRG; return (0); } return (-1); } int pfctl_load_anchors(int dev, struct pfctl *pf, struct pfr_buffer *trans) { struct loadanchors *la; TAILQ_FOREACH(la, &loadanchorshead, entries) { if (pf->opts & PF_OPT_VERBOSE) fprintf(stderr, "\nLoading anchor %s from %s\n", la->anchorname, la->filename); if (pfctl_rules(dev, la->filename, pf->opts, pf->optimize, la->anchorname, trans) == -1) return (-1); } return (0); } int rt_tableid_max(void) { #ifdef __FreeBSD__ int fibs; size_t l = sizeof(fibs); if (sysctlbyname("net.fibs", &fibs, &l, NULL, 0) == -1) fibs = 16; /* XXX RT_MAXFIBS, at least limit it some. */ /* * As the OpenBSD code only compares > and not >= we need to adjust * here given we only accept values of 0..n and want to avoid #ifdefs * in the grammar. */ return (fibs - 1); #else return (RT_TABLEID_MAX); #endif } Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl.c =================================================================== --- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl.c (revision 303775) @@ -1,2388 +1,2388 @@ /* $OpenBSD: pfctl.c,v 1.278 2008/08/31 20:18:17 jmc Exp $ */ /* * Copyright (c) 2001 Daniel Hartmeier * Copyright (c) 2002,2003 Henning Brauer * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * - Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided * with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE * COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pfctl_parser.h" #include "pfctl.h" void usage(void); int pfctl_enable(int, int); int pfctl_disable(int, int); int pfctl_clear_stats(int, int); int pfctl_clear_interface_flags(int, int); int pfctl_clear_rules(int, int, char *); int pfctl_clear_nat(int, int, char *); int pfctl_clear_altq(int, int); int pfctl_clear_src_nodes(int, int); int pfctl_clear_states(int, const char *, int); void pfctl_addrprefix(char *, struct pf_addr *); int pfctl_kill_src_nodes(int, const char *, int); int pfctl_net_kill_states(int, const char *, int); int pfctl_label_kill_states(int, const char *, int); int pfctl_id_kill_states(int, const char *, int); void pfctl_init_options(struct pfctl *); int pfctl_load_options(struct pfctl *); int pfctl_load_limit(struct pfctl *, unsigned int, unsigned int); int pfctl_load_timeout(struct pfctl *, unsigned int, unsigned int); int pfctl_load_debug(struct pfctl *, unsigned int); int pfctl_load_logif(struct pfctl *, char *); int pfctl_load_hostid(struct pfctl *, u_int32_t); int pfctl_get_pool(int, struct pf_pool *, u_int32_t, u_int32_t, int, char *); void pfctl_print_rule_counters(struct pf_rule *, int); int pfctl_show_rules(int, char *, int, enum pfctl_show, char *, int); int pfctl_show_nat(int, int, char *); int pfctl_show_src_nodes(int, int); int pfctl_show_states(int, const char *, int); int pfctl_show_status(int, int); int pfctl_show_timeouts(int, int); int pfctl_show_limits(int, int); void pfctl_debug(int, u_int32_t, int); int pfctl_test_altqsupport(int, int); int pfctl_show_anchors(int, int, char *); int pfctl_ruleset_trans(struct pfctl *, char *, struct pf_anchor *); int pfctl_load_ruleset(struct pfctl *, char *, struct pf_ruleset *, int, int); int pfctl_load_rule(struct pfctl *, char *, struct pf_rule *, int); const char *pfctl_lookup_option(char *, const char * const *); -struct pf_anchor_global pf_anchors; -struct pf_anchor pf_main_anchor; +static struct pf_anchor_global pf_anchors; +static struct pf_anchor pf_main_anchor; -const char *clearopt; -char *rulesopt; -const char *showopt; -const char *debugopt; -char *anchoropt; -const char *optiopt = NULL; -const char *pf_device = "/dev/pf"; -char *ifaceopt; -char *tableopt; -const char *tblcmdopt; -int src_node_killers; -char *src_node_kill[2]; -int state_killers; -char *state_kill[2]; -int loadopt; -int altqsupport; +static const char *clearopt; +static char *rulesopt; +static const char *showopt; +static const char *debugopt; +static char *anchoropt; +static const char *optiopt = NULL; +static const char *pf_device = "/dev/pf"; +static char *ifaceopt; +static char *tableopt; +static const char *tblcmdopt; +static int src_node_killers; +static char *src_node_kill[2]; +static int state_killers; +static char *state_kill[2]; +int loadopt; +int altqsupport; -int dev = -1; -int first_title = 1; -int labels = 0; +int dev = -1; +static int first_title = 1; +static int labels = 0; #define INDENT(d, o) do { \ if (o) { \ int i; \ for (i=0; i < d; i++) \ printf(" "); \ } \ } while (0); \ static const struct { const char *name; int index; } pf_limits[] = { { "states", PF_LIMIT_STATES }, { "src-nodes", PF_LIMIT_SRC_NODES }, { "frags", PF_LIMIT_FRAGS }, { "table-entries", PF_LIMIT_TABLE_ENTRIES }, { NULL, 0 } }; struct pf_hint { const char *name; int timeout; }; static const struct pf_hint pf_hint_normal[] = { { "tcp.first", 2 * 60 }, { "tcp.opening", 30 }, { "tcp.established", 24 * 60 * 60 }, { "tcp.closing", 15 * 60 }, { "tcp.finwait", 45 }, { "tcp.closed", 90 }, { "tcp.tsdiff", 30 }, { NULL, 0 } }; static const struct pf_hint pf_hint_satellite[] = { { "tcp.first", 3 * 60 }, { "tcp.opening", 30 + 5 }, { "tcp.established", 24 * 60 * 60 }, { "tcp.closing", 15 * 60 + 5 }, { "tcp.finwait", 45 + 5 }, { "tcp.closed", 90 + 5 }, { "tcp.tsdiff", 60 }, { NULL, 0 } }; static const struct pf_hint pf_hint_conservative[] = { { "tcp.first", 60 * 60 }, { "tcp.opening", 15 * 60 }, { "tcp.established", 5 * 24 * 60 * 60 }, { "tcp.closing", 60 * 60 }, { "tcp.finwait", 10 * 60 }, { "tcp.closed", 3 * 60 }, { "tcp.tsdiff", 60 }, { NULL, 0 } }; static const struct pf_hint pf_hint_aggressive[] = { { "tcp.first", 30 }, { "tcp.opening", 5 }, { "tcp.established", 5 * 60 * 60 }, { "tcp.closing", 60 }, { "tcp.finwait", 30 }, { "tcp.closed", 30 }, { "tcp.tsdiff", 10 }, { NULL, 0 } }; static const struct { const char *name; const struct pf_hint *hint; } pf_hints[] = { { "normal", pf_hint_normal }, { "satellite", pf_hint_satellite }, { "high-latency", pf_hint_satellite }, { "conservative", pf_hint_conservative }, { "aggressive", pf_hint_aggressive }, { NULL, NULL } }; static const char * const clearopt_list[] = { "nat", "queue", "rules", "Sources", "states", "info", "Tables", "osfp", "all", NULL }; static const char * const showopt_list[] = { "nat", "queue", "rules", "Anchors", "Sources", "states", "info", "Interfaces", "labels", "timeouts", "memory", "Tables", "osfp", "all", NULL }; static const char * const tblcmdopt_list[] = { "kill", "flush", "add", "delete", "load", "replace", "show", "test", "zero", "expire", NULL }; static const char * const debugopt_list[] = { "none", "urgent", "misc", "loud", NULL }; static const char * const optiopt_list[] = { "none", "basic", "profile", NULL }; void usage(void) { extern char *__progname; fprintf(stderr, "usage: %s [-AdeghmNnOPqRrvz] [-a anchor] [-D macro=value] [-F modifier]\n" "\t[-f file] [-i interface] [-K host | network]\n" "\t[-k host | network | label | id] [-o level] [-p device]\n" "\t[-s modifier] [-t table -T command [address ...]] [-x level]\n", __progname); exit(1); } int pfctl_enable(int dev, int opts) { if (ioctl(dev, DIOCSTART)) { if (errno == EEXIST) errx(1, "pf already enabled"); else if (errno == ESRCH) errx(1, "pfil registeration failed"); else err(1, "DIOCSTART"); } if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "pf enabled\n"); if (altqsupport && ioctl(dev, DIOCSTARTALTQ)) if (errno != EEXIST) err(1, "DIOCSTARTALTQ"); return (0); } int pfctl_disable(int dev, int opts) { if (ioctl(dev, DIOCSTOP)) { if (errno == ENOENT) errx(1, "pf not enabled"); else err(1, "DIOCSTOP"); } if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "pf disabled\n"); if (altqsupport && ioctl(dev, DIOCSTOPALTQ)) if (errno != ENOENT) err(1, "DIOCSTOPALTQ"); return (0); } int pfctl_clear_stats(int dev, int opts) { if (ioctl(dev, DIOCCLRSTATUS)) err(1, "DIOCCLRSTATUS"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "pf: statistics cleared\n"); return (0); } int pfctl_clear_interface_flags(int dev, int opts) { struct pfioc_iface pi; if ((opts & PF_OPT_NOACTION) == 0) { bzero(&pi, sizeof(pi)); pi.pfiio_flags = PFI_IFLAG_SKIP; if (ioctl(dev, DIOCCLRIFFLAG, &pi)) err(1, "DIOCCLRIFFLAG"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "pf: interface flags reset\n"); } return (0); } int pfctl_clear_rules(int dev, int opts, char *anchorname) { struct pfr_buffer t; memset(&t, 0, sizeof(t)); t.pfrb_type = PFRB_TRANS; if (pfctl_add_trans(&t, PF_RULESET_SCRUB, anchorname) || pfctl_add_trans(&t, PF_RULESET_FILTER, anchorname) || pfctl_trans(dev, &t, DIOCXBEGIN, 0) || pfctl_trans(dev, &t, DIOCXCOMMIT, 0)) err(1, "pfctl_clear_rules"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "rules cleared\n"); return (0); } int pfctl_clear_nat(int dev, int opts, char *anchorname) { struct pfr_buffer t; memset(&t, 0, sizeof(t)); t.pfrb_type = PFRB_TRANS; if (pfctl_add_trans(&t, PF_RULESET_NAT, anchorname) || pfctl_add_trans(&t, PF_RULESET_BINAT, anchorname) || pfctl_add_trans(&t, PF_RULESET_RDR, anchorname) || pfctl_trans(dev, &t, DIOCXBEGIN, 0) || pfctl_trans(dev, &t, DIOCXCOMMIT, 0)) err(1, "pfctl_clear_nat"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "nat cleared\n"); return (0); } int pfctl_clear_altq(int dev, int opts) { struct pfr_buffer t; if (!altqsupport) return (-1); memset(&t, 0, sizeof(t)); t.pfrb_type = PFRB_TRANS; if (pfctl_add_trans(&t, PF_RULESET_ALTQ, "") || pfctl_trans(dev, &t, DIOCXBEGIN, 0) || pfctl_trans(dev, &t, DIOCXCOMMIT, 0)) err(1, "pfctl_clear_altq"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "altq cleared\n"); return (0); } int pfctl_clear_src_nodes(int dev, int opts) { if (ioctl(dev, DIOCCLRSRCNODES)) err(1, "DIOCCLRSRCNODES"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "source tracking entries cleared\n"); return (0); } int pfctl_clear_states(int dev, const char *iface, int opts) { struct pfioc_state_kill psk; memset(&psk, 0, sizeof(psk)); if (iface != NULL && strlcpy(psk.psk_ifname, iface, sizeof(psk.psk_ifname)) >= sizeof(psk.psk_ifname)) errx(1, "invalid interface: %s", iface); if (ioctl(dev, DIOCCLRSTATES, &psk)) err(1, "DIOCCLRSTATES"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "%d states cleared\n", psk.psk_killed); return (0); } void pfctl_addrprefix(char *addr, struct pf_addr *mask) { char *p; const char *errstr; int prefix, ret_ga, q, r; struct addrinfo hints, *res; if ((p = strchr(addr, '/')) == NULL) return; *p++ = '\0'; prefix = strtonum(p, 0, 128, &errstr); if (errstr) errx(1, "prefix is %s: %s", errstr, p); bzero(&hints, sizeof(hints)); /* prefix only with numeric addresses */ hints.ai_flags |= AI_NUMERICHOST; if ((ret_ga = getaddrinfo(addr, NULL, &hints, &res))) { errx(1, "getaddrinfo: %s", gai_strerror(ret_ga)); /* NOTREACHED */ } if (res->ai_family == AF_INET && prefix > 32) errx(1, "prefix too long for AF_INET"); else if (res->ai_family == AF_INET6 && prefix > 128) errx(1, "prefix too long for AF_INET6"); q = prefix >> 3; r = prefix & 7; switch (res->ai_family) { case AF_INET: bzero(&mask->v4, sizeof(mask->v4)); mask->v4.s_addr = htonl((u_int32_t) (0xffffffffffULL << (32 - prefix))); break; case AF_INET6: bzero(&mask->v6, sizeof(mask->v6)); if (q > 0) memset((void *)&mask->v6, 0xff, q); if (r > 0) *((u_char *)&mask->v6 + q) = (0xff00 >> r) & 0xff; break; } freeaddrinfo(res); } int pfctl_kill_src_nodes(int dev, const char *iface, int opts) { struct pfioc_src_node_kill psnk; struct addrinfo *res[2], *resp[2]; struct sockaddr last_src, last_dst; int killed, sources, dests; int ret_ga; killed = sources = dests = 0; memset(&psnk, 0, sizeof(psnk)); memset(&psnk.psnk_src.addr.v.a.mask, 0xff, sizeof(psnk.psnk_src.addr.v.a.mask)); memset(&last_src, 0xff, sizeof(last_src)); memset(&last_dst, 0xff, sizeof(last_dst)); pfctl_addrprefix(src_node_kill[0], &psnk.psnk_src.addr.v.a.mask); if ((ret_ga = getaddrinfo(src_node_kill[0], NULL, NULL, &res[0]))) { errx(1, "getaddrinfo: %s", gai_strerror(ret_ga)); /* NOTREACHED */ } for (resp[0] = res[0]; resp[0]; resp[0] = resp[0]->ai_next) { if (resp[0]->ai_addr == NULL) continue; /* We get lots of duplicates. Catch the easy ones */ if (memcmp(&last_src, resp[0]->ai_addr, sizeof(last_src)) == 0) continue; last_src = *(struct sockaddr *)resp[0]->ai_addr; psnk.psnk_af = resp[0]->ai_family; sources++; if (psnk.psnk_af == AF_INET) psnk.psnk_src.addr.v.a.addr.v4 = ((struct sockaddr_in *)resp[0]->ai_addr)->sin_addr; else if (psnk.psnk_af == AF_INET6) psnk.psnk_src.addr.v.a.addr.v6 = ((struct sockaddr_in6 *)resp[0]->ai_addr)-> sin6_addr; else errx(1, "Unknown address family %d", psnk.psnk_af); if (src_node_killers > 1) { dests = 0; memset(&psnk.psnk_dst.addr.v.a.mask, 0xff, sizeof(psnk.psnk_dst.addr.v.a.mask)); memset(&last_dst, 0xff, sizeof(last_dst)); pfctl_addrprefix(src_node_kill[1], &psnk.psnk_dst.addr.v.a.mask); if ((ret_ga = getaddrinfo(src_node_kill[1], NULL, NULL, &res[1]))) { errx(1, "getaddrinfo: %s", gai_strerror(ret_ga)); /* NOTREACHED */ } for (resp[1] = res[1]; resp[1]; resp[1] = resp[1]->ai_next) { if (resp[1]->ai_addr == NULL) continue; if (psnk.psnk_af != resp[1]->ai_family) continue; if (memcmp(&last_dst, resp[1]->ai_addr, sizeof(last_dst)) == 0) continue; last_dst = *(struct sockaddr *)resp[1]->ai_addr; dests++; if (psnk.psnk_af == AF_INET) psnk.psnk_dst.addr.v.a.addr.v4 = ((struct sockaddr_in *)resp[1]-> ai_addr)->sin_addr; else if (psnk.psnk_af == AF_INET6) psnk.psnk_dst.addr.v.a.addr.v6 = ((struct sockaddr_in6 *)resp[1]-> ai_addr)->sin6_addr; else errx(1, "Unknown address family %d", psnk.psnk_af); if (ioctl(dev, DIOCKILLSRCNODES, &psnk)) err(1, "DIOCKILLSRCNODES"); killed += psnk.psnk_killed; } freeaddrinfo(res[1]); } else { if (ioctl(dev, DIOCKILLSRCNODES, &psnk)) err(1, "DIOCKILLSRCNODES"); killed += psnk.psnk_killed; } } freeaddrinfo(res[0]); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "killed %d src nodes from %d sources and %d " "destinations\n", killed, sources, dests); return (0); } int pfctl_net_kill_states(int dev, const char *iface, int opts) { struct pfioc_state_kill psk; struct addrinfo *res[2], *resp[2]; struct sockaddr last_src, last_dst; int killed, sources, dests; int ret_ga; killed = sources = dests = 0; memset(&psk, 0, sizeof(psk)); memset(&psk.psk_src.addr.v.a.mask, 0xff, sizeof(psk.psk_src.addr.v.a.mask)); memset(&last_src, 0xff, sizeof(last_src)); memset(&last_dst, 0xff, sizeof(last_dst)); if (iface != NULL && strlcpy(psk.psk_ifname, iface, sizeof(psk.psk_ifname)) >= sizeof(psk.psk_ifname)) errx(1, "invalid interface: %s", iface); pfctl_addrprefix(state_kill[0], &psk.psk_src.addr.v.a.mask); if ((ret_ga = getaddrinfo(state_kill[0], NULL, NULL, &res[0]))) { errx(1, "getaddrinfo: %s", gai_strerror(ret_ga)); /* NOTREACHED */ } for (resp[0] = res[0]; resp[0]; resp[0] = resp[0]->ai_next) { if (resp[0]->ai_addr == NULL) continue; /* We get lots of duplicates. Catch the easy ones */ if (memcmp(&last_src, resp[0]->ai_addr, sizeof(last_src)) == 0) continue; last_src = *(struct sockaddr *)resp[0]->ai_addr; psk.psk_af = resp[0]->ai_family; sources++; if (psk.psk_af == AF_INET) psk.psk_src.addr.v.a.addr.v4 = ((struct sockaddr_in *)resp[0]->ai_addr)->sin_addr; else if (psk.psk_af == AF_INET6) psk.psk_src.addr.v.a.addr.v6 = ((struct sockaddr_in6 *)resp[0]->ai_addr)-> sin6_addr; else errx(1, "Unknown address family %d", psk.psk_af); if (state_killers > 1) { dests = 0; memset(&psk.psk_dst.addr.v.a.mask, 0xff, sizeof(psk.psk_dst.addr.v.a.mask)); memset(&last_dst, 0xff, sizeof(last_dst)); pfctl_addrprefix(state_kill[1], &psk.psk_dst.addr.v.a.mask); if ((ret_ga = getaddrinfo(state_kill[1], NULL, NULL, &res[1]))) { errx(1, "getaddrinfo: %s", gai_strerror(ret_ga)); /* NOTREACHED */ } for (resp[1] = res[1]; resp[1]; resp[1] = resp[1]->ai_next) { if (resp[1]->ai_addr == NULL) continue; if (psk.psk_af != resp[1]->ai_family) continue; if (memcmp(&last_dst, resp[1]->ai_addr, sizeof(last_dst)) == 0) continue; last_dst = *(struct sockaddr *)resp[1]->ai_addr; dests++; if (psk.psk_af == AF_INET) psk.psk_dst.addr.v.a.addr.v4 = ((struct sockaddr_in *)resp[1]-> ai_addr)->sin_addr; else if (psk.psk_af == AF_INET6) psk.psk_dst.addr.v.a.addr.v6 = ((struct sockaddr_in6 *)resp[1]-> ai_addr)->sin6_addr; else errx(1, "Unknown address family %d", psk.psk_af); if (ioctl(dev, DIOCKILLSTATES, &psk)) err(1, "DIOCKILLSTATES"); killed += psk.psk_killed; } freeaddrinfo(res[1]); } else { if (ioctl(dev, DIOCKILLSTATES, &psk)) err(1, "DIOCKILLSTATES"); killed += psk.psk_killed; } } freeaddrinfo(res[0]); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "killed %d states from %d sources and %d " "destinations\n", killed, sources, dests); return (0); } int pfctl_label_kill_states(int dev, const char *iface, int opts) { struct pfioc_state_kill psk; if (state_killers != 2 || (strlen(state_kill[1]) == 0)) { warnx("no label specified"); usage(); } memset(&psk, 0, sizeof(psk)); if (iface != NULL && strlcpy(psk.psk_ifname, iface, sizeof(psk.psk_ifname)) >= sizeof(psk.psk_ifname)) errx(1, "invalid interface: %s", iface); if (strlcpy(psk.psk_label, state_kill[1], sizeof(psk.psk_label)) >= sizeof(psk.psk_label)) errx(1, "label too long: %s", state_kill[1]); if (ioctl(dev, DIOCKILLSTATES, &psk)) err(1, "DIOCKILLSTATES"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "killed %d states\n", psk.psk_killed); return (0); } int pfctl_id_kill_states(int dev, const char *iface, int opts) { struct pfioc_state_kill psk; if (state_killers != 2 || (strlen(state_kill[1]) == 0)) { warnx("no id specified"); usage(); } memset(&psk, 0, sizeof(psk)); if ((sscanf(state_kill[1], "%jx/%x", &psk.psk_pfcmp.id, &psk.psk_pfcmp.creatorid)) == 2) HTONL(psk.psk_pfcmp.creatorid); else if ((sscanf(state_kill[1], "%jx", &psk.psk_pfcmp.id)) == 1) { psk.psk_pfcmp.creatorid = 0; } else { warnx("wrong id format specified"); usage(); } if (psk.psk_pfcmp.id == 0) { warnx("cannot kill id 0"); usage(); } psk.psk_pfcmp.id = htobe64(psk.psk_pfcmp.id); if (ioctl(dev, DIOCKILLSTATES, &psk)) err(1, "DIOCKILLSTATES"); if ((opts & PF_OPT_QUIET) == 0) fprintf(stderr, "killed %d states\n", psk.psk_killed); return (0); } int pfctl_get_pool(int dev, struct pf_pool *pool, u_int32_t nr, u_int32_t ticket, int r_action, char *anchorname) { struct pfioc_pooladdr pp; struct pf_pooladdr *pa; u_int32_t pnr, mpnr; memset(&pp, 0, sizeof(pp)); memcpy(pp.anchor, anchorname, sizeof(pp.anchor)); pp.r_action = r_action; pp.r_num = nr; pp.ticket = ticket; if (ioctl(dev, DIOCGETADDRS, &pp)) { warn("DIOCGETADDRS"); return (-1); } mpnr = pp.nr; TAILQ_INIT(&pool->list); for (pnr = 0; pnr < mpnr; ++pnr) { pp.nr = pnr; if (ioctl(dev, DIOCGETADDR, &pp)) { warn("DIOCGETADDR"); return (-1); } pa = calloc(1, sizeof(struct pf_pooladdr)); if (pa == NULL) err(1, "calloc"); bcopy(&pp.addr, pa, sizeof(struct pf_pooladdr)); TAILQ_INSERT_TAIL(&pool->list, pa, entries); } return (0); } void pfctl_move_pool(struct pf_pool *src, struct pf_pool *dst) { struct pf_pooladdr *pa; while ((pa = TAILQ_FIRST(&src->list)) != NULL) { TAILQ_REMOVE(&src->list, pa, entries); TAILQ_INSERT_TAIL(&dst->list, pa, entries); } } void pfctl_clear_pool(struct pf_pool *pool) { struct pf_pooladdr *pa; while ((pa = TAILQ_FIRST(&pool->list)) != NULL) { TAILQ_REMOVE(&pool->list, pa, entries); free(pa); } } void pfctl_print_rule_counters(struct pf_rule *rule, int opts) { if (opts & PF_OPT_DEBUG) { const char *t[PF_SKIP_COUNT] = { "i", "d", "f", "p", "sa", "sp", "da", "dp" }; int i; printf(" [ Skip steps: "); for (i = 0; i < PF_SKIP_COUNT; ++i) { if (rule->skip[i].nr == rule->nr + 1) continue; printf("%s=", t[i]); if (rule->skip[i].nr == -1) printf("end "); else printf("%u ", rule->skip[i].nr); } printf("]\n"); printf(" [ queue: qname=%s qid=%u pqname=%s pqid=%u ]\n", rule->qname, rule->qid, rule->pqname, rule->pqid); } if (opts & PF_OPT_VERBOSE) { printf(" [ Evaluations: %-8llu Packets: %-8llu " "Bytes: %-10llu States: %-6ju]\n", (unsigned long long)rule->evaluations, (unsigned long long)(rule->packets[0] + rule->packets[1]), (unsigned long long)(rule->bytes[0] + rule->bytes[1]), (uintmax_t)rule->u_states_cur); if (!(opts & PF_OPT_DEBUG)) printf(" [ Inserted: uid %u pid %u " "State Creations: %-6ju]\n", (unsigned)rule->cuid, (unsigned)rule->cpid, (uintmax_t)rule->u_states_tot); } } void pfctl_print_title(char *title) { if (!first_title) printf("\n"); first_title = 0; printf("%s\n", title); } int pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format, char *anchorname, int depth) { struct pfioc_rule pr; u_int32_t nr, mnr, header = 0; int rule_numbers = opts & (PF_OPT_VERBOSE2 | PF_OPT_DEBUG); int numeric = opts & PF_OPT_NUMERIC; int len = strlen(path); int brace; char *p; if (path[0]) snprintf(&path[len], MAXPATHLEN - len, "/%s", anchorname); else snprintf(&path[len], MAXPATHLEN - len, "%s", anchorname); memset(&pr, 0, sizeof(pr)); memcpy(pr.anchor, path, sizeof(pr.anchor)); if (opts & PF_OPT_SHOWALL) { pr.rule.action = PF_PASS; if (ioctl(dev, DIOCGETRULES, &pr)) { warn("DIOCGETRULES"); goto error; } header++; } pr.rule.action = PF_SCRUB; if (ioctl(dev, DIOCGETRULES, &pr)) { warn("DIOCGETRULES"); goto error; } if (opts & PF_OPT_SHOWALL) { if (format == PFCTL_SHOW_RULES && (pr.nr > 0 || header)) pfctl_print_title("FILTER RULES:"); else if (format == PFCTL_SHOW_LABELS && labels) pfctl_print_title("LABEL COUNTERS:"); } mnr = pr.nr; if (opts & PF_OPT_CLRRULECTRS) pr.action = PF_GET_CLR_CNTR; for (nr = 0; nr < mnr; ++nr) { pr.nr = nr; if (ioctl(dev, DIOCGETRULE, &pr)) { warn("DIOCGETRULE"); goto error; } if (pfctl_get_pool(dev, &pr.rule.rpool, nr, pr.ticket, PF_SCRUB, path) != 0) goto error; switch (format) { case PFCTL_SHOW_LABELS: break; case PFCTL_SHOW_RULES: if (pr.rule.label[0] && (opts & PF_OPT_SHOWALL)) labels = 1; print_rule(&pr.rule, pr.anchor_call, rule_numbers, numeric); printf("\n"); pfctl_print_rule_counters(&pr.rule, opts); break; case PFCTL_SHOW_NOTHING: break; } pfctl_clear_pool(&pr.rule.rpool); } pr.rule.action = PF_PASS; if (ioctl(dev, DIOCGETRULES, &pr)) { warn("DIOCGETRULES"); goto error; } mnr = pr.nr; for (nr = 0; nr < mnr; ++nr) { pr.nr = nr; if (ioctl(dev, DIOCGETRULE, &pr)) { warn("DIOCGETRULE"); goto error; } if (pfctl_get_pool(dev, &pr.rule.rpool, nr, pr.ticket, PF_PASS, path) != 0) goto error; switch (format) { case PFCTL_SHOW_LABELS: if (pr.rule.label[0]) { printf("%s %llu %llu %llu %llu" " %llu %llu %llu %ju\n", pr.rule.label, (unsigned long long)pr.rule.evaluations, (unsigned long long)(pr.rule.packets[0] + pr.rule.packets[1]), (unsigned long long)(pr.rule.bytes[0] + pr.rule.bytes[1]), (unsigned long long)pr.rule.packets[0], (unsigned long long)pr.rule.bytes[0], (unsigned long long)pr.rule.packets[1], (unsigned long long)pr.rule.bytes[1], (uintmax_t)pr.rule.u_states_tot); } break; case PFCTL_SHOW_RULES: brace = 0; if (pr.rule.label[0] && (opts & PF_OPT_SHOWALL)) labels = 1; INDENT(depth, !(opts & PF_OPT_VERBOSE)); if (pr.anchor_call[0] && ((((p = strrchr(pr.anchor_call, '_')) != NULL) && ((void *)p == (void *)pr.anchor_call || *(--p) == '/')) || (opts & PF_OPT_RECURSE))) { brace++; if ((p = strrchr(pr.anchor_call, '/')) != NULL) p++; else p = &pr.anchor_call[0]; } else p = &pr.anchor_call[0]; print_rule(&pr.rule, p, rule_numbers, numeric); if (brace) printf(" {\n"); else printf("\n"); pfctl_print_rule_counters(&pr.rule, opts); if (brace) { pfctl_show_rules(dev, path, opts, format, p, depth + 1); INDENT(depth, !(opts & PF_OPT_VERBOSE)); printf("}\n"); } break; case PFCTL_SHOW_NOTHING: break; } pfctl_clear_pool(&pr.rule.rpool); } path[len] = '\0'; return (0); error: path[len] = '\0'; return (-1); } int pfctl_show_nat(int dev, int opts, char *anchorname) { struct pfioc_rule pr; u_int32_t mnr, nr; static int nattype[3] = { PF_NAT, PF_RDR, PF_BINAT }; int i, dotitle = opts & PF_OPT_SHOWALL; memset(&pr, 0, sizeof(pr)); memcpy(pr.anchor, anchorname, sizeof(pr.anchor)); for (i = 0; i < 3; i++) { pr.rule.action = nattype[i]; if (ioctl(dev, DIOCGETRULES, &pr)) { warn("DIOCGETRULES"); return (-1); } mnr = pr.nr; for (nr = 0; nr < mnr; ++nr) { pr.nr = nr; if (ioctl(dev, DIOCGETRULE, &pr)) { warn("DIOCGETRULE"); return (-1); } if (pfctl_get_pool(dev, &pr.rule.rpool, nr, pr.ticket, nattype[i], anchorname) != 0) return (-1); if (dotitle) { pfctl_print_title("TRANSLATION RULES:"); dotitle = 0; } print_rule(&pr.rule, pr.anchor_call, opts & PF_OPT_VERBOSE2, opts & PF_OPT_NUMERIC); printf("\n"); pfctl_print_rule_counters(&pr.rule, opts); pfctl_clear_pool(&pr.rule.rpool); } } return (0); } int pfctl_show_src_nodes(int dev, int opts) { struct pfioc_src_nodes psn; struct pf_src_node *p; char *inbuf = NULL, *newinbuf = NULL; unsigned int len = 0; int i; memset(&psn, 0, sizeof(psn)); for (;;) { psn.psn_len = len; if (len) { newinbuf = realloc(inbuf, len); if (newinbuf == NULL) err(1, "realloc"); psn.psn_buf = inbuf = newinbuf; } if (ioctl(dev, DIOCGETSRCNODES, &psn) < 0) { warn("DIOCGETSRCNODES"); free(inbuf); return (-1); } if (psn.psn_len + sizeof(struct pfioc_src_nodes) < len) break; if (len == 0 && psn.psn_len == 0) goto done; if (len == 0 && psn.psn_len != 0) len = psn.psn_len; if (psn.psn_len == 0) goto done; /* no src_nodes */ len *= 2; } p = psn.psn_src_nodes; if (psn.psn_len > 0 && (opts & PF_OPT_SHOWALL)) pfctl_print_title("SOURCE TRACKING NODES:"); for (i = 0; i < psn.psn_len; i += sizeof(*p)) { print_src_node(p, opts); p++; } done: free(inbuf); return (0); } int pfctl_show_states(int dev, const char *iface, int opts) { struct pfioc_states ps; struct pfsync_state *p; char *inbuf = NULL, *newinbuf = NULL; unsigned int len = 0; int i, dotitle = (opts & PF_OPT_SHOWALL); memset(&ps, 0, sizeof(ps)); for (;;) { ps.ps_len = len; if (len) { newinbuf = realloc(inbuf, len); if (newinbuf == NULL) err(1, "realloc"); ps.ps_buf = inbuf = newinbuf; } if (ioctl(dev, DIOCGETSTATES, &ps) < 0) { warn("DIOCGETSTATES"); free(inbuf); return (-1); } if (ps.ps_len + sizeof(struct pfioc_states) < len) break; if (len == 0 && ps.ps_len == 0) goto done; if (len == 0 && ps.ps_len != 0) len = ps.ps_len; if (ps.ps_len == 0) goto done; /* no states */ len *= 2; } p = ps.ps_states; for (i = 0; i < ps.ps_len; i += sizeof(*p), p++) { if (iface != NULL && strcmp(p->ifname, iface)) continue; if (dotitle) { pfctl_print_title("STATES:"); dotitle = 0; } print_state(p, opts); } done: free(inbuf); return (0); } int pfctl_show_status(int dev, int opts) { struct pf_status status; if (ioctl(dev, DIOCGETSTATUS, &status)) { warn("DIOCGETSTATUS"); return (-1); } if (opts & PF_OPT_SHOWALL) pfctl_print_title("INFO:"); print_status(&status, opts); return (0); } int pfctl_show_timeouts(int dev, int opts) { struct pfioc_tm pt; int i; if (opts & PF_OPT_SHOWALL) pfctl_print_title("TIMEOUTS:"); memset(&pt, 0, sizeof(pt)); for (i = 0; pf_timeouts[i].name; i++) { pt.timeout = pf_timeouts[i].timeout; if (ioctl(dev, DIOCGETTIMEOUT, &pt)) err(1, "DIOCGETTIMEOUT"); printf("%-20s %10d", pf_timeouts[i].name, pt.seconds); if (pf_timeouts[i].timeout >= PFTM_ADAPTIVE_START && pf_timeouts[i].timeout <= PFTM_ADAPTIVE_END) printf(" states"); else printf("s"); printf("\n"); } return (0); } int pfctl_show_limits(int dev, int opts) { struct pfioc_limit pl; int i; if (opts & PF_OPT_SHOWALL) pfctl_print_title("LIMITS:"); memset(&pl, 0, sizeof(pl)); for (i = 0; pf_limits[i].name; i++) { pl.index = pf_limits[i].index; if (ioctl(dev, DIOCGETLIMIT, &pl)) err(1, "DIOCGETLIMIT"); printf("%-13s ", pf_limits[i].name); if (pl.limit == UINT_MAX) printf("unlimited\n"); else printf("hard limit %8u\n", pl.limit); } return (0); } /* callbacks for rule/nat/rdr/addr */ int pfctl_add_pool(struct pfctl *pf, struct pf_pool *p, sa_family_t af) { struct pf_pooladdr *pa; if ((pf->opts & PF_OPT_NOACTION) == 0) { if (ioctl(pf->dev, DIOCBEGINADDRS, &pf->paddr)) err(1, "DIOCBEGINADDRS"); } pf->paddr.af = af; TAILQ_FOREACH(pa, &p->list, entries) { memcpy(&pf->paddr.addr, pa, sizeof(struct pf_pooladdr)); if ((pf->opts & PF_OPT_NOACTION) == 0) { if (ioctl(pf->dev, DIOCADDADDR, &pf->paddr)) err(1, "DIOCADDADDR"); } } return (0); } int pfctl_add_rule(struct pfctl *pf, struct pf_rule *r, const char *anchor_call) { u_int8_t rs_num; struct pf_rule *rule; struct pf_ruleset *rs; char *p; rs_num = pf_get_ruleset_number(r->action); if (rs_num == PF_RULESET_MAX) errx(1, "Invalid rule type %d", r->action); rs = &pf->anchor->ruleset; if (anchor_call[0] && r->anchor == NULL) { /* * Don't make non-brace anchors part of the main anchor pool. */ if ((r->anchor = calloc(1, sizeof(*r->anchor))) == NULL) err(1, "pfctl_add_rule: calloc"); pf_init_ruleset(&r->anchor->ruleset); r->anchor->ruleset.anchor = r->anchor; if (strlcpy(r->anchor->path, anchor_call, sizeof(rule->anchor->path)) >= sizeof(rule->anchor->path)) errx(1, "pfctl_add_rule: strlcpy"); if ((p = strrchr(anchor_call, '/')) != NULL) { if (!strlen(p)) err(1, "pfctl_add_rule: bad anchor name %s", anchor_call); } else p = (char *)anchor_call; if (strlcpy(r->anchor->name, p, sizeof(rule->anchor->name)) >= sizeof(rule->anchor->name)) errx(1, "pfctl_add_rule: strlcpy"); } if ((rule = calloc(1, sizeof(*rule))) == NULL) err(1, "calloc"); bcopy(r, rule, sizeof(*rule)); TAILQ_INIT(&rule->rpool.list); pfctl_move_pool(&r->rpool, &rule->rpool); TAILQ_INSERT_TAIL(rs->rules[rs_num].active.ptr, rule, entries); return (0); } int pfctl_ruleset_trans(struct pfctl *pf, char *path, struct pf_anchor *a) { int osize = pf->trans->pfrb_size; if ((pf->loadopt & PFCTL_FLAG_NAT) != 0) { if (pfctl_add_trans(pf->trans, PF_RULESET_NAT, path) || pfctl_add_trans(pf->trans, PF_RULESET_BINAT, path) || pfctl_add_trans(pf->trans, PF_RULESET_RDR, path)) return (1); } if (a == pf->astack[0] && ((altqsupport && (pf->loadopt & PFCTL_FLAG_ALTQ) != 0))) { if (pfctl_add_trans(pf->trans, PF_RULESET_ALTQ, path)) return (2); } if ((pf->loadopt & PFCTL_FLAG_FILTER) != 0) { if (pfctl_add_trans(pf->trans, PF_RULESET_SCRUB, path) || pfctl_add_trans(pf->trans, PF_RULESET_FILTER, path)) return (3); } if (pf->loadopt & PFCTL_FLAG_TABLE) if (pfctl_add_trans(pf->trans, PF_RULESET_TABLE, path)) return (4); if (pfctl_trans(pf->dev, pf->trans, DIOCXBEGIN, osize)) return (5); return (0); } int pfctl_load_ruleset(struct pfctl *pf, char *path, struct pf_ruleset *rs, int rs_num, int depth) { struct pf_rule *r; int error, len = strlen(path); int brace = 0; pf->anchor = rs->anchor; if (path[0]) snprintf(&path[len], MAXPATHLEN - len, "/%s", pf->anchor->name); else snprintf(&path[len], MAXPATHLEN - len, "%s", pf->anchor->name); if (depth) { if (TAILQ_FIRST(rs->rules[rs_num].active.ptr) != NULL) { brace++; if (pf->opts & PF_OPT_VERBOSE) printf(" {\n"); if ((pf->opts & PF_OPT_NOACTION) == 0 && (error = pfctl_ruleset_trans(pf, path, rs->anchor))) { printf("pfctl_load_rulesets: " "pfctl_ruleset_trans %d\n", error); goto error; } } else if (pf->opts & PF_OPT_VERBOSE) printf("\n"); } if (pf->optimize && rs_num == PF_RULESET_FILTER) pfctl_optimize_ruleset(pf, rs); while ((r = TAILQ_FIRST(rs->rules[rs_num].active.ptr)) != NULL) { TAILQ_REMOVE(rs->rules[rs_num].active.ptr, r, entries); if ((error = pfctl_load_rule(pf, path, r, depth))) goto error; if (r->anchor) { if ((error = pfctl_load_ruleset(pf, path, &r->anchor->ruleset, rs_num, depth + 1))) goto error; } else if (pf->opts & PF_OPT_VERBOSE) printf("\n"); free(r); } if (brace && pf->opts & PF_OPT_VERBOSE) { INDENT(depth - 1, (pf->opts & PF_OPT_VERBOSE)); printf("}\n"); } path[len] = '\0'; return (0); error: path[len] = '\0'; return (error); } int pfctl_load_rule(struct pfctl *pf, char *path, struct pf_rule *r, int depth) { u_int8_t rs_num = pf_get_ruleset_number(r->action); char *name; struct pfioc_rule pr; int len = strlen(path); bzero(&pr, sizeof(pr)); /* set up anchor before adding to path for anchor_call */ if ((pf->opts & PF_OPT_NOACTION) == 0) pr.ticket = pfctl_get_ticket(pf->trans, rs_num, path); if (strlcpy(pr.anchor, path, sizeof(pr.anchor)) >= sizeof(pr.anchor)) errx(1, "pfctl_load_rule: strlcpy"); if (r->anchor) { if (r->anchor->match) { if (path[0]) snprintf(&path[len], MAXPATHLEN - len, "/%s", r->anchor->name); else snprintf(&path[len], MAXPATHLEN - len, "%s", r->anchor->name); name = path; } else name = r->anchor->path; } else name = ""; if ((pf->opts & PF_OPT_NOACTION) == 0) { if (pfctl_add_pool(pf, &r->rpool, r->af)) return (1); pr.pool_ticket = pf->paddr.ticket; memcpy(&pr.rule, r, sizeof(pr.rule)); if (r->anchor && strlcpy(pr.anchor_call, name, sizeof(pr.anchor_call)) >= sizeof(pr.anchor_call)) errx(1, "pfctl_load_rule: strlcpy"); if (ioctl(pf->dev, DIOCADDRULE, &pr)) err(1, "DIOCADDRULE"); } if (pf->opts & PF_OPT_VERBOSE) { INDENT(depth, !(pf->opts & PF_OPT_VERBOSE2)); print_rule(r, r->anchor ? r->anchor->name : "", pf->opts & PF_OPT_VERBOSE2, pf->opts & PF_OPT_NUMERIC); } path[len] = '\0'; pfctl_clear_pool(&r->rpool); return (0); } int pfctl_add_altq(struct pfctl *pf, struct pf_altq *a) { if (altqsupport && (loadopt & PFCTL_FLAG_ALTQ) != 0) { memcpy(&pf->paltq->altq, a, sizeof(struct pf_altq)); if ((pf->opts & PF_OPT_NOACTION) == 0) { if (ioctl(pf->dev, DIOCADDALTQ, pf->paltq)) { if (errno == ENXIO) errx(1, "qtype not configured"); else if (errno == ENODEV) errx(1, "%s: driver does not support " "altq", a->ifname); else err(1, "DIOCADDALTQ"); } } pfaltq_store(&pf->paltq->altq); } return (0); } int pfctl_rules(int dev, char *filename, int opts, int optimize, char *anchorname, struct pfr_buffer *trans) { #define ERR(x) do { warn(x); goto _error; } while(0) #define ERRX(x) do { warnx(x); goto _error; } while(0) struct pfr_buffer *t, buf; struct pfioc_altq pa; struct pfctl pf; struct pf_ruleset *rs; struct pfr_table trs; char *path; int osize; RB_INIT(&pf_anchors); memset(&pf_main_anchor, 0, sizeof(pf_main_anchor)); pf_init_ruleset(&pf_main_anchor.ruleset); pf_main_anchor.ruleset.anchor = &pf_main_anchor; if (trans == NULL) { bzero(&buf, sizeof(buf)); buf.pfrb_type = PFRB_TRANS; t = &buf; osize = 0; } else { t = trans; osize = t->pfrb_size; } memset(&pa, 0, sizeof(pa)); memset(&pf, 0, sizeof(pf)); memset(&trs, 0, sizeof(trs)); if ((path = calloc(1, MAXPATHLEN)) == NULL) ERRX("pfctl_rules: calloc"); if (strlcpy(trs.pfrt_anchor, anchorname, sizeof(trs.pfrt_anchor)) >= sizeof(trs.pfrt_anchor)) ERRX("pfctl_rules: strlcpy"); pf.dev = dev; pf.opts = opts; pf.optimize = optimize; pf.loadopt = loadopt; /* non-brace anchor, create without resolving the path */ if ((pf.anchor = calloc(1, sizeof(*pf.anchor))) == NULL) ERRX("pfctl_rules: calloc"); rs = &pf.anchor->ruleset; pf_init_ruleset(rs); rs->anchor = pf.anchor; if (strlcpy(pf.anchor->path, anchorname, sizeof(pf.anchor->path)) >= sizeof(pf.anchor->path)) errx(1, "pfctl_add_rule: strlcpy"); if (strlcpy(pf.anchor->name, anchorname, sizeof(pf.anchor->name)) >= sizeof(pf.anchor->name)) errx(1, "pfctl_add_rule: strlcpy"); pf.astack[0] = pf.anchor; pf.asd = 0; if (anchorname[0]) pf.loadopt &= ~PFCTL_FLAG_ALTQ; pf.paltq = &pa; pf.trans = t; pfctl_init_options(&pf); if ((opts & PF_OPT_NOACTION) == 0) { /* * XXX For the time being we need to open transactions for * the main ruleset before parsing, because tables are still * loaded at parse time. */ if (pfctl_ruleset_trans(&pf, anchorname, pf.anchor)) ERRX("pfctl_rules"); if (altqsupport && (pf.loadopt & PFCTL_FLAG_ALTQ)) pa.ticket = pfctl_get_ticket(t, PF_RULESET_ALTQ, anchorname); if (pf.loadopt & PFCTL_FLAG_TABLE) pf.astack[0]->ruleset.tticket = pfctl_get_ticket(t, PF_RULESET_TABLE, anchorname); } if (parse_config(filename, &pf) < 0) { if ((opts & PF_OPT_NOACTION) == 0) ERRX("Syntax error in config file: " "pf rules not loaded"); else goto _error; } if ((pf.loadopt & PFCTL_FLAG_FILTER && (pfctl_load_ruleset(&pf, path, rs, PF_RULESET_SCRUB, 0))) || (pf.loadopt & PFCTL_FLAG_NAT && (pfctl_load_ruleset(&pf, path, rs, PF_RULESET_NAT, 0) || pfctl_load_ruleset(&pf, path, rs, PF_RULESET_RDR, 0) || pfctl_load_ruleset(&pf, path, rs, PF_RULESET_BINAT, 0))) || (pf.loadopt & PFCTL_FLAG_FILTER && pfctl_load_ruleset(&pf, path, rs, PF_RULESET_FILTER, 0))) { if ((opts & PF_OPT_NOACTION) == 0) ERRX("Unable to load rules into kernel"); else goto _error; } if ((altqsupport && (pf.loadopt & PFCTL_FLAG_ALTQ) != 0)) if (check_commit_altq(dev, opts) != 0) ERRX("errors in altq config"); /* process "load anchor" directives */ if (!anchorname[0]) if (pfctl_load_anchors(dev, &pf, t) == -1) ERRX("load anchors"); if (trans == NULL && (opts & PF_OPT_NOACTION) == 0) { if (!anchorname[0]) if (pfctl_load_options(&pf)) goto _error; if (pfctl_trans(dev, t, DIOCXCOMMIT, osize)) ERR("DIOCXCOMMIT"); } return (0); _error: if (trans == NULL) { /* main ruleset */ if ((opts & PF_OPT_NOACTION) == 0) if (pfctl_trans(dev, t, DIOCXROLLBACK, osize)) err(1, "DIOCXROLLBACK"); exit(1); } else { /* sub ruleset */ return (-1); } #undef ERR #undef ERRX } FILE * pfctl_fopen(const char *name, const char *mode) { struct stat st; FILE *fp; fp = fopen(name, mode); if (fp == NULL) return (NULL); if (fstat(fileno(fp), &st)) { fclose(fp); return (NULL); } if (S_ISDIR(st.st_mode)) { fclose(fp); errno = EISDIR; return (NULL); } return (fp); } void pfctl_init_options(struct pfctl *pf) { pf->timeout[PFTM_TCP_FIRST_PACKET] = PFTM_TCP_FIRST_PACKET_VAL; pf->timeout[PFTM_TCP_OPENING] = PFTM_TCP_OPENING_VAL; pf->timeout[PFTM_TCP_ESTABLISHED] = PFTM_TCP_ESTABLISHED_VAL; pf->timeout[PFTM_TCP_CLOSING] = PFTM_TCP_CLOSING_VAL; pf->timeout[PFTM_TCP_FIN_WAIT] = PFTM_TCP_FIN_WAIT_VAL; pf->timeout[PFTM_TCP_CLOSED] = PFTM_TCP_CLOSED_VAL; pf->timeout[PFTM_UDP_FIRST_PACKET] = PFTM_UDP_FIRST_PACKET_VAL; pf->timeout[PFTM_UDP_SINGLE] = PFTM_UDP_SINGLE_VAL; pf->timeout[PFTM_UDP_MULTIPLE] = PFTM_UDP_MULTIPLE_VAL; pf->timeout[PFTM_ICMP_FIRST_PACKET] = PFTM_ICMP_FIRST_PACKET_VAL; pf->timeout[PFTM_ICMP_ERROR_REPLY] = PFTM_ICMP_ERROR_REPLY_VAL; pf->timeout[PFTM_OTHER_FIRST_PACKET] = PFTM_OTHER_FIRST_PACKET_VAL; pf->timeout[PFTM_OTHER_SINGLE] = PFTM_OTHER_SINGLE_VAL; pf->timeout[PFTM_OTHER_MULTIPLE] = PFTM_OTHER_MULTIPLE_VAL; pf->timeout[PFTM_FRAG] = PFTM_FRAG_VAL; pf->timeout[PFTM_INTERVAL] = PFTM_INTERVAL_VAL; pf->timeout[PFTM_SRC_NODE] = PFTM_SRC_NODE_VAL; pf->timeout[PFTM_TS_DIFF] = PFTM_TS_DIFF_VAL; pf->timeout[PFTM_ADAPTIVE_START] = PFSTATE_ADAPT_START; pf->timeout[PFTM_ADAPTIVE_END] = PFSTATE_ADAPT_END; pf->limit[PF_LIMIT_STATES] = PFSTATE_HIWAT; pf->limit[PF_LIMIT_FRAGS] = PFFRAG_FRENT_HIWAT; pf->limit[PF_LIMIT_SRC_NODES] = PFSNODE_HIWAT; pf->limit[PF_LIMIT_TABLE_ENTRIES] = PFR_KENTRY_HIWAT; pf->debug = PF_DEBUG_URGENT; } int pfctl_load_options(struct pfctl *pf) { int i, error = 0; if ((loadopt & PFCTL_FLAG_OPTION) == 0) return (0); /* load limits */ for (i = 0; i < PF_LIMIT_MAX; i++) { if ((pf->opts & PF_OPT_MERGE) && !pf->limit_set[i]) continue; if (pfctl_load_limit(pf, i, pf->limit[i])) error = 1; } /* * If we've set the limit, but haven't explicitly set adaptive * timeouts, do it now with a start of 60% and end of 120%. */ if (pf->limit_set[PF_LIMIT_STATES] && !pf->timeout_set[PFTM_ADAPTIVE_START] && !pf->timeout_set[PFTM_ADAPTIVE_END]) { pf->timeout[PFTM_ADAPTIVE_START] = (pf->limit[PF_LIMIT_STATES] / 10) * 6; pf->timeout_set[PFTM_ADAPTIVE_START] = 1; pf->timeout[PFTM_ADAPTIVE_END] = (pf->limit[PF_LIMIT_STATES] / 10) * 12; pf->timeout_set[PFTM_ADAPTIVE_END] = 1; } /* load timeouts */ for (i = 0; i < PFTM_MAX; i++) { if ((pf->opts & PF_OPT_MERGE) && !pf->timeout_set[i]) continue; if (pfctl_load_timeout(pf, i, pf->timeout[i])) error = 1; } /* load debug */ if (!(pf->opts & PF_OPT_MERGE) || pf->debug_set) if (pfctl_load_debug(pf, pf->debug)) error = 1; /* load logif */ if (!(pf->opts & PF_OPT_MERGE) || pf->ifname_set) if (pfctl_load_logif(pf, pf->ifname)) error = 1; /* load hostid */ if (!(pf->opts & PF_OPT_MERGE) || pf->hostid_set) if (pfctl_load_hostid(pf, pf->hostid)) error = 1; return (error); } int pfctl_set_limit(struct pfctl *pf, const char *opt, unsigned int limit) { int i; for (i = 0; pf_limits[i].name; i++) { if (strcasecmp(opt, pf_limits[i].name) == 0) { pf->limit[pf_limits[i].index] = limit; pf->limit_set[pf_limits[i].index] = 1; break; } } if (pf_limits[i].name == NULL) { warnx("Bad pool name."); return (1); } if (pf->opts & PF_OPT_VERBOSE) printf("set limit %s %d\n", opt, limit); return (0); } int pfctl_load_limit(struct pfctl *pf, unsigned int index, unsigned int limit) { struct pfioc_limit pl; memset(&pl, 0, sizeof(pl)); pl.index = index; pl.limit = limit; if (ioctl(pf->dev, DIOCSETLIMIT, &pl)) { if (errno == EBUSY) warnx("Current pool size exceeds requested hard limit"); else warnx("DIOCSETLIMIT"); return (1); } return (0); } int pfctl_set_timeout(struct pfctl *pf, const char *opt, int seconds, int quiet) { int i; if ((loadopt & PFCTL_FLAG_OPTION) == 0) return (0); for (i = 0; pf_timeouts[i].name; i++) { if (strcasecmp(opt, pf_timeouts[i].name) == 0) { pf->timeout[pf_timeouts[i].timeout] = seconds; pf->timeout_set[pf_timeouts[i].timeout] = 1; break; } } if (pf_timeouts[i].name == NULL) { warnx("Bad timeout name."); return (1); } if (pf->opts & PF_OPT_VERBOSE && ! quiet) printf("set timeout %s %d\n", opt, seconds); return (0); } int pfctl_load_timeout(struct pfctl *pf, unsigned int timeout, unsigned int seconds) { struct pfioc_tm pt; memset(&pt, 0, sizeof(pt)); pt.timeout = timeout; pt.seconds = seconds; if (ioctl(pf->dev, DIOCSETTIMEOUT, &pt)) { warnx("DIOCSETTIMEOUT"); return (1); } return (0); } int pfctl_set_optimization(struct pfctl *pf, const char *opt) { const struct pf_hint *hint; int i, r; if ((loadopt & PFCTL_FLAG_OPTION) == 0) return (0); for (i = 0; pf_hints[i].name; i++) if (strcasecmp(opt, pf_hints[i].name) == 0) break; hint = pf_hints[i].hint; if (hint == NULL) { warnx("invalid state timeouts optimization"); return (1); } for (i = 0; hint[i].name; i++) if ((r = pfctl_set_timeout(pf, hint[i].name, hint[i].timeout, 1))) return (r); if (pf->opts & PF_OPT_VERBOSE) printf("set optimization %s\n", opt); return (0); } int pfctl_set_logif(struct pfctl *pf, char *ifname) { if ((loadopt & PFCTL_FLAG_OPTION) == 0) return (0); if (!strcmp(ifname, "none")) { free(pf->ifname); pf->ifname = NULL; } else { pf->ifname = strdup(ifname); if (!pf->ifname) errx(1, "pfctl_set_logif: strdup"); } pf->ifname_set = 1; if (pf->opts & PF_OPT_VERBOSE) printf("set loginterface %s\n", ifname); return (0); } int pfctl_load_logif(struct pfctl *pf, char *ifname) { struct pfioc_if pi; memset(&pi, 0, sizeof(pi)); if (ifname && strlcpy(pi.ifname, ifname, sizeof(pi.ifname)) >= sizeof(pi.ifname)) { warnx("pfctl_load_logif: strlcpy"); return (1); } if (ioctl(pf->dev, DIOCSETSTATUSIF, &pi)) { warnx("DIOCSETSTATUSIF"); return (1); } return (0); } int pfctl_set_hostid(struct pfctl *pf, u_int32_t hostid) { if ((loadopt & PFCTL_FLAG_OPTION) == 0) return (0); HTONL(hostid); pf->hostid = hostid; pf->hostid_set = 1; if (pf->opts & PF_OPT_VERBOSE) printf("set hostid 0x%08x\n", ntohl(hostid)); return (0); } int pfctl_load_hostid(struct pfctl *pf, u_int32_t hostid) { if (ioctl(dev, DIOCSETHOSTID, &hostid)) { warnx("DIOCSETHOSTID"); return (1); } return (0); } int pfctl_set_debug(struct pfctl *pf, char *d) { u_int32_t level; if ((loadopt & PFCTL_FLAG_OPTION) == 0) return (0); if (!strcmp(d, "none")) pf->debug = PF_DEBUG_NONE; else if (!strcmp(d, "urgent")) pf->debug = PF_DEBUG_URGENT; else if (!strcmp(d, "misc")) pf->debug = PF_DEBUG_MISC; else if (!strcmp(d, "loud")) pf->debug = PF_DEBUG_NOISY; else { warnx("unknown debug level \"%s\"", d); return (-1); } pf->debug_set = 1; level = pf->debug; if ((pf->opts & PF_OPT_NOACTION) == 0) if (ioctl(dev, DIOCSETDEBUG, &level)) err(1, "DIOCSETDEBUG"); if (pf->opts & PF_OPT_VERBOSE) printf("set debug %s\n", d); return (0); } int pfctl_load_debug(struct pfctl *pf, unsigned int level) { if (ioctl(pf->dev, DIOCSETDEBUG, &level)) { warnx("DIOCSETDEBUG"); return (1); } return (0); } int pfctl_set_interface_flags(struct pfctl *pf, char *ifname, int flags, int how) { struct pfioc_iface pi; if ((loadopt & PFCTL_FLAG_OPTION) == 0) return (0); bzero(&pi, sizeof(pi)); pi.pfiio_flags = flags; if (strlcpy(pi.pfiio_name, ifname, sizeof(pi.pfiio_name)) >= sizeof(pi.pfiio_name)) errx(1, "pfctl_set_interface_flags: strlcpy"); if ((pf->opts & PF_OPT_NOACTION) == 0) { if (how == 0) { if (ioctl(pf->dev, DIOCCLRIFFLAG, &pi)) err(1, "DIOCCLRIFFLAG"); } else { if (ioctl(pf->dev, DIOCSETIFFLAG, &pi)) err(1, "DIOCSETIFFLAG"); } } return (0); } void pfctl_debug(int dev, u_int32_t level, int opts) { if (ioctl(dev, DIOCSETDEBUG, &level)) err(1, "DIOCSETDEBUG"); if ((opts & PF_OPT_QUIET) == 0) { fprintf(stderr, "debug level set to '"); switch (level) { case PF_DEBUG_NONE: fprintf(stderr, "none"); break; case PF_DEBUG_URGENT: fprintf(stderr, "urgent"); break; case PF_DEBUG_MISC: fprintf(stderr, "misc"); break; case PF_DEBUG_NOISY: fprintf(stderr, "loud"); break; default: fprintf(stderr, ""); break; } fprintf(stderr, "'\n"); } } int pfctl_test_altqsupport(int dev, int opts) { struct pfioc_altq pa; if (ioctl(dev, DIOCGETALTQS, &pa)) { if (errno == ENODEV) { if (opts & PF_OPT_VERBOSE) fprintf(stderr, "No ALTQ support in kernel\n" "ALTQ related functions disabled\n"); return (0); } else err(1, "DIOCGETALTQS"); } return (1); } int pfctl_show_anchors(int dev, int opts, char *anchorname) { struct pfioc_ruleset pr; u_int32_t mnr, nr; memset(&pr, 0, sizeof(pr)); memcpy(pr.path, anchorname, sizeof(pr.path)); if (ioctl(dev, DIOCGETRULESETS, &pr)) { if (errno == EINVAL) fprintf(stderr, "Anchor '%s' not found.\n", anchorname); else err(1, "DIOCGETRULESETS"); return (-1); } mnr = pr.nr; for (nr = 0; nr < mnr; ++nr) { char sub[MAXPATHLEN]; pr.nr = nr; if (ioctl(dev, DIOCGETRULESET, &pr)) err(1, "DIOCGETRULESET"); if (!strcmp(pr.name, PF_RESERVED_ANCHOR)) continue; sub[0] = 0; if (pr.path[0]) { strlcat(sub, pr.path, sizeof(sub)); strlcat(sub, "/", sizeof(sub)); } strlcat(sub, pr.name, sizeof(sub)); if (sub[0] != '_' || (opts & PF_OPT_VERBOSE)) printf(" %s\n", sub); if ((opts & PF_OPT_VERBOSE) && pfctl_show_anchors(dev, opts, sub)) return (-1); } return (0); } const char * pfctl_lookup_option(char *cmd, const char * const *list) { if (cmd != NULL && *cmd) for (; *list; list++) if (!strncmp(cmd, *list, strlen(cmd))) return (*list); return (NULL); } int main(int argc, char *argv[]) { int error = 0; int ch; int mode = O_RDONLY; int opts = 0; int optimize = PF_OPTIMIZE_BASIC; char anchorname[MAXPATHLEN]; char *path; if (argc < 2) usage(); while ((ch = getopt(argc, argv, "a:AdD:eqf:F:ghi:k:K:mnNOo:Pp:rRs:t:T:vx:z")) != -1) { switch (ch) { case 'a': anchoropt = optarg; break; case 'd': opts |= PF_OPT_DISABLE; mode = O_RDWR; break; case 'D': if (pfctl_cmdline_symset(optarg) < 0) warnx("could not parse macro definition %s", optarg); break; case 'e': opts |= PF_OPT_ENABLE; mode = O_RDWR; break; case 'q': opts |= PF_OPT_QUIET; break; case 'F': clearopt = pfctl_lookup_option(optarg, clearopt_list); if (clearopt == NULL) { warnx("Unknown flush modifier '%s'", optarg); usage(); } mode = O_RDWR; break; case 'i': ifaceopt = optarg; break; case 'k': if (state_killers >= 2) { warnx("can only specify -k twice"); usage(); /* NOTREACHED */ } state_kill[state_killers++] = optarg; mode = O_RDWR; break; case 'K': if (src_node_killers >= 2) { warnx("can only specify -K twice"); usage(); /* NOTREACHED */ } src_node_kill[src_node_killers++] = optarg; mode = O_RDWR; break; case 'm': opts |= PF_OPT_MERGE; break; case 'n': opts |= PF_OPT_NOACTION; break; case 'N': loadopt |= PFCTL_FLAG_NAT; break; case 'r': opts |= PF_OPT_USEDNS; break; case 'f': rulesopt = optarg; mode = O_RDWR; break; case 'g': opts |= PF_OPT_DEBUG; break; case 'A': loadopt |= PFCTL_FLAG_ALTQ; break; case 'R': loadopt |= PFCTL_FLAG_FILTER; break; case 'o': optiopt = pfctl_lookup_option(optarg, optiopt_list); if (optiopt == NULL) { warnx("Unknown optimization '%s'", optarg); usage(); } opts |= PF_OPT_OPTIMIZE; break; case 'O': loadopt |= PFCTL_FLAG_OPTION; break; case 'p': pf_device = optarg; break; case 'P': opts |= PF_OPT_NUMERIC; break; case 's': showopt = pfctl_lookup_option(optarg, showopt_list); if (showopt == NULL) { warnx("Unknown show modifier '%s'", optarg); usage(); } break; case 't': tableopt = optarg; break; case 'T': tblcmdopt = pfctl_lookup_option(optarg, tblcmdopt_list); if (tblcmdopt == NULL) { warnx("Unknown table command '%s'", optarg); usage(); } break; case 'v': if (opts & PF_OPT_VERBOSE) opts |= PF_OPT_VERBOSE2; opts |= PF_OPT_VERBOSE; break; case 'x': debugopt = pfctl_lookup_option(optarg, debugopt_list); if (debugopt == NULL) { warnx("Unknown debug level '%s'", optarg); usage(); } mode = O_RDWR; break; case 'z': opts |= PF_OPT_CLRRULECTRS; mode = O_RDWR; break; case 'h': /* FALLTHROUGH */ default: usage(); /* NOTREACHED */ } } if (tblcmdopt != NULL) { argc -= optind; argv += optind; ch = *tblcmdopt; if (ch == 'l') { loadopt |= PFCTL_FLAG_TABLE; tblcmdopt = NULL; } else mode = strchr("acdefkrz", ch) ? O_RDWR : O_RDONLY; } else if (argc != optind) { warnx("unknown command line argument: %s ...", argv[optind]); usage(); /* NOTREACHED */ } if (loadopt == 0) loadopt = ~0; if ((path = calloc(1, MAXPATHLEN)) == NULL) errx(1, "pfctl: calloc"); memset(anchorname, 0, sizeof(anchorname)); if (anchoropt != NULL) { int len = strlen(anchoropt); if (anchoropt[len - 1] == '*') { if (len >= 2 && anchoropt[len - 2] == '/') anchoropt[len - 2] = '\0'; else anchoropt[len - 1] = '\0'; opts |= PF_OPT_RECURSE; } if (strlcpy(anchorname, anchoropt, sizeof(anchorname)) >= sizeof(anchorname)) errx(1, "anchor name '%s' too long", anchoropt); loadopt &= PFCTL_FLAG_FILTER|PFCTL_FLAG_NAT|PFCTL_FLAG_TABLE; } if ((opts & PF_OPT_NOACTION) == 0) { dev = open(pf_device, mode); if (dev == -1) err(1, "%s", pf_device); altqsupport = pfctl_test_altqsupport(dev, opts); } else { dev = open(pf_device, O_RDONLY); if (dev >= 0) opts |= PF_OPT_DUMMYACTION; /* turn off options */ opts &= ~ (PF_OPT_DISABLE | PF_OPT_ENABLE); clearopt = showopt = debugopt = NULL; #if !defined(ENABLE_ALTQ) altqsupport = 0; #else altqsupport = 1; #endif } if (opts & PF_OPT_DISABLE) if (pfctl_disable(dev, opts)) error = 1; if (showopt != NULL) { switch (*showopt) { case 'A': pfctl_show_anchors(dev, opts, anchorname); break; case 'r': pfctl_load_fingerprints(dev, opts); pfctl_show_rules(dev, path, opts, PFCTL_SHOW_RULES, anchorname, 0); break; case 'l': pfctl_load_fingerprints(dev, opts); pfctl_show_rules(dev, path, opts, PFCTL_SHOW_LABELS, anchorname, 0); break; case 'n': pfctl_load_fingerprints(dev, opts); pfctl_show_nat(dev, opts, anchorname); break; case 'q': pfctl_show_altq(dev, ifaceopt, opts, opts & PF_OPT_VERBOSE2); break; case 's': pfctl_show_states(dev, ifaceopt, opts); break; case 'S': pfctl_show_src_nodes(dev, opts); break; case 'i': pfctl_show_status(dev, opts); break; case 't': pfctl_show_timeouts(dev, opts); break; case 'm': pfctl_show_limits(dev, opts); break; case 'a': opts |= PF_OPT_SHOWALL; pfctl_load_fingerprints(dev, opts); pfctl_show_nat(dev, opts, anchorname); pfctl_show_rules(dev, path, opts, 0, anchorname, 0); pfctl_show_altq(dev, ifaceopt, opts, 0); pfctl_show_states(dev, ifaceopt, opts); pfctl_show_src_nodes(dev, opts); pfctl_show_status(dev, opts); pfctl_show_rules(dev, path, opts, 1, anchorname, 0); pfctl_show_timeouts(dev, opts); pfctl_show_limits(dev, opts); pfctl_show_tables(anchorname, opts); pfctl_show_fingerprints(opts); break; case 'T': pfctl_show_tables(anchorname, opts); break; case 'o': pfctl_load_fingerprints(dev, opts); pfctl_show_fingerprints(opts); break; case 'I': pfctl_show_ifaces(ifaceopt, opts); break; } } if ((opts & PF_OPT_CLRRULECTRS) && showopt == NULL) pfctl_show_rules(dev, path, opts, PFCTL_SHOW_NOTHING, anchorname, 0); if (clearopt != NULL) { if (anchorname[0] == '_' || strstr(anchorname, "/_") != NULL) errx(1, "anchor names beginning with '_' cannot " "be modified from the command line"); switch (*clearopt) { case 'r': pfctl_clear_rules(dev, opts, anchorname); break; case 'n': pfctl_clear_nat(dev, opts, anchorname); break; case 'q': pfctl_clear_altq(dev, opts); break; case 's': pfctl_clear_states(dev, ifaceopt, opts); break; case 'S': pfctl_clear_src_nodes(dev, opts); break; case 'i': pfctl_clear_stats(dev, opts); break; case 'a': pfctl_clear_rules(dev, opts, anchorname); pfctl_clear_nat(dev, opts, anchorname); pfctl_clear_tables(anchorname, opts); if (!*anchorname) { pfctl_clear_altq(dev, opts); pfctl_clear_states(dev, ifaceopt, opts); pfctl_clear_src_nodes(dev, opts); pfctl_clear_stats(dev, opts); pfctl_clear_fingerprints(dev, opts); pfctl_clear_interface_flags(dev, opts); } break; case 'o': pfctl_clear_fingerprints(dev, opts); break; case 'T': pfctl_clear_tables(anchorname, opts); break; } } if (state_killers) { if (!strcmp(state_kill[0], "label")) pfctl_label_kill_states(dev, ifaceopt, opts); else if (!strcmp(state_kill[0], "id")) pfctl_id_kill_states(dev, ifaceopt, opts); else pfctl_net_kill_states(dev, ifaceopt, opts); } if (src_node_killers) pfctl_kill_src_nodes(dev, ifaceopt, opts); if (tblcmdopt != NULL) { error = pfctl_command_tables(argc, argv, tableopt, tblcmdopt, rulesopt, anchorname, opts); rulesopt = NULL; } if (optiopt != NULL) { switch (*optiopt) { case 'n': optimize = 0; break; case 'b': optimize |= PF_OPTIMIZE_BASIC; break; case 'o': case 'p': optimize |= PF_OPTIMIZE_PROFILE; break; } } if ((rulesopt != NULL) && (loadopt & PFCTL_FLAG_OPTION) && !anchorname[0]) if (pfctl_clear_interface_flags(dev, opts | PF_OPT_QUIET)) error = 1; if (rulesopt != NULL && !(opts & (PF_OPT_MERGE|PF_OPT_NOACTION)) && !anchorname[0] && (loadopt & PFCTL_FLAG_OPTION)) if (pfctl_file_fingerprints(dev, opts, PF_OSFP_FILE)) error = 1; if (rulesopt != NULL) { if (anchorname[0] == '_' || strstr(anchorname, "/_") != NULL) errx(1, "anchor names beginning with '_' cannot " "be modified from the command line"); if (pfctl_rules(dev, rulesopt, opts, optimize, anchorname, NULL)) error = 1; else if (!(opts & PF_OPT_NOACTION) && (loadopt & PFCTL_FLAG_TABLE)) warn_namespace_collision(NULL); } if (opts & PF_OPT_ENABLE) if (pfctl_enable(dev, opts)) error = 1; if (debugopt != NULL) { switch (*debugopt) { case 'n': pfctl_debug(dev, PF_DEBUG_NONE, opts); break; case 'u': pfctl_debug(dev, PF_DEBUG_URGENT, opts); break; case 'm': pfctl_debug(dev, PF_DEBUG_MISC, opts); break; case 'l': pfctl_debug(dev, PF_DEBUG_NOISY, opts); break; } } exit(error); } Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_altq.c =================================================================== --- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_altq.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_altq.c (revision 303775) @@ -1,1518 +1,1518 @@ /* $OpenBSD: pfctl_altq.c,v 1.93 2007/10/15 02:16:35 deraadt Exp $ */ /* * Copyright (c) 2002 * Sony Computer Science Laboratories Inc. * Copyright (c) 2002, 2003 Henning Brauer * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pfctl_parser.h" #include "pfctl.h" #define is_sc_null(sc) (((sc) == NULL) || ((sc)->m1 == 0 && (sc)->m2 == 0)) -TAILQ_HEAD(altqs, pf_altq) altqs = TAILQ_HEAD_INITIALIZER(altqs); -LIST_HEAD(gen_sc, segment) rtsc, lssc; +static TAILQ_HEAD(altqs, pf_altq) altqs = TAILQ_HEAD_INITIALIZER(altqs); +static LIST_HEAD(gen_sc, segment) rtsc, lssc; struct pf_altq *qname_to_pfaltq(const char *, const char *); u_int32_t qname_to_qid(const char *); static int eval_pfqueue_cbq(struct pfctl *, struct pf_altq *); static int cbq_compute_idletime(struct pfctl *, struct pf_altq *); static int check_commit_cbq(int, int, struct pf_altq *); static int print_cbq_opts(const struct pf_altq *); static int print_codel_opts(const struct pf_altq *, const struct node_queue_opt *); static int eval_pfqueue_priq(struct pfctl *, struct pf_altq *); static int check_commit_priq(int, int, struct pf_altq *); static int print_priq_opts(const struct pf_altq *); static int eval_pfqueue_hfsc(struct pfctl *, struct pf_altq *); static int check_commit_hfsc(int, int, struct pf_altq *); static int print_hfsc_opts(const struct pf_altq *, const struct node_queue_opt *); static int eval_pfqueue_fairq(struct pfctl *, struct pf_altq *); static int print_fairq_opts(const struct pf_altq *, const struct node_queue_opt *); static int check_commit_fairq(int, int, struct pf_altq *); static void gsc_add_sc(struct gen_sc *, struct service_curve *); static int is_gsc_under_sc(struct gen_sc *, struct service_curve *); static void gsc_destroy(struct gen_sc *); static struct segment *gsc_getentry(struct gen_sc *, double); static int gsc_add_seg(struct gen_sc *, double, double, double, double); static double sc_x2y(struct service_curve *, double); #ifdef __FreeBSD__ u_int32_t getifspeed(int, char *); #else u_int32_t getifspeed(char *); #endif u_long getifmtu(char *); int eval_queue_opts(struct pf_altq *, struct node_queue_opt *, u_int32_t); u_int32_t eval_bwspec(struct node_queue_bw *, u_int32_t); void print_hfsc_sc(const char *, u_int, u_int, u_int, const struct node_hfsc_sc *); void print_fairq_sc(const char *, u_int, u_int, u_int, const struct node_fairq_sc *); void pfaltq_store(struct pf_altq *a) { struct pf_altq *altq; if ((altq = malloc(sizeof(*altq))) == NULL) err(1, "malloc"); memcpy(altq, a, sizeof(struct pf_altq)); TAILQ_INSERT_TAIL(&altqs, altq, entries); } struct pf_altq * pfaltq_lookup(const char *ifname) { struct pf_altq *altq; TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(ifname, altq->ifname, IFNAMSIZ) == 0 && altq->qname[0] == 0) return (altq); } return (NULL); } struct pf_altq * qname_to_pfaltq(const char *qname, const char *ifname) { struct pf_altq *altq; TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(ifname, altq->ifname, IFNAMSIZ) == 0 && strncmp(qname, altq->qname, PF_QNAME_SIZE) == 0) return (altq); } return (NULL); } u_int32_t qname_to_qid(const char *qname) { struct pf_altq *altq; /* * We guarantee that same named queues on different interfaces * have the same qid, so we do NOT need to limit matching on * one interface! */ TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(qname, altq->qname, PF_QNAME_SIZE) == 0) return (altq->qid); } return (0); } void print_altq(const struct pf_altq *a, unsigned int level, struct node_queue_bw *bw, struct node_queue_opt *qopts) { if (a->qname[0] != 0) { print_queue(a, level, bw, 1, qopts); return; } #ifdef __FreeBSD__ if (a->local_flags & PFALTQ_FLAG_IF_REMOVED) printf("INACTIVE "); #endif printf("altq on %s ", a->ifname); switch (a->scheduler) { case ALTQT_CBQ: if (!print_cbq_opts(a)) printf("cbq "); break; case ALTQT_PRIQ: if (!print_priq_opts(a)) printf("priq "); break; case ALTQT_HFSC: if (!print_hfsc_opts(a, qopts)) printf("hfsc "); break; case ALTQT_FAIRQ: if (!print_fairq_opts(a, qopts)) printf("fairq "); break; case ALTQT_CODEL: if (!print_codel_opts(a, qopts)) printf("codel "); break; } if (bw != NULL && bw->bw_percent > 0) { if (bw->bw_percent < 100) printf("bandwidth %u%% ", bw->bw_percent); } else printf("bandwidth %s ", rate2str((double)a->ifbandwidth)); if (a->qlimit != DEFAULT_QLIMIT) printf("qlimit %u ", a->qlimit); printf("tbrsize %u ", a->tbrsize); } void print_queue(const struct pf_altq *a, unsigned int level, struct node_queue_bw *bw, int print_interface, struct node_queue_opt *qopts) { unsigned int i; #ifdef __FreeBSD__ if (a->local_flags & PFALTQ_FLAG_IF_REMOVED) printf("INACTIVE "); #endif printf("queue "); for (i = 0; i < level; ++i) printf(" "); printf("%s ", a->qname); if (print_interface) printf("on %s ", a->ifname); if (a->scheduler == ALTQT_CBQ || a->scheduler == ALTQT_HFSC || a->scheduler == ALTQT_FAIRQ) { if (bw != NULL && bw->bw_percent > 0) { if (bw->bw_percent < 100) printf("bandwidth %u%% ", bw->bw_percent); } else printf("bandwidth %s ", rate2str((double)a->bandwidth)); } if (a->priority != DEFAULT_PRIORITY) printf("priority %u ", a->priority); if (a->qlimit != DEFAULT_QLIMIT) printf("qlimit %u ", a->qlimit); switch (a->scheduler) { case ALTQT_CBQ: print_cbq_opts(a); break; case ALTQT_PRIQ: print_priq_opts(a); break; case ALTQT_HFSC: print_hfsc_opts(a, qopts); break; case ALTQT_FAIRQ: print_fairq_opts(a, qopts); break; } } /* * eval_pfaltq computes the discipline parameters. */ int eval_pfaltq(struct pfctl *pf, struct pf_altq *pa, struct node_queue_bw *bw, struct node_queue_opt *opts) { u_int rate, size, errors = 0; if (bw->bw_absolute > 0) pa->ifbandwidth = bw->bw_absolute; else #ifdef __FreeBSD__ if ((rate = getifspeed(pf->dev, pa->ifname)) == 0) { #else if ((rate = getifspeed(pa->ifname)) == 0) { #endif fprintf(stderr, "interface %s does not know its bandwidth, " "please specify an absolute bandwidth\n", pa->ifname); errors++; } else if ((pa->ifbandwidth = eval_bwspec(bw, rate)) == 0) pa->ifbandwidth = rate; errors += eval_queue_opts(pa, opts, pa->ifbandwidth); /* if tbrsize is not specified, use heuristics */ if (pa->tbrsize == 0) { rate = pa->ifbandwidth; if (rate <= 1 * 1000 * 1000) size = 1; else if (rate <= 10 * 1000 * 1000) size = 4; else if (rate <= 200 * 1000 * 1000) size = 8; else size = 24; size = size * getifmtu(pa->ifname); if (size > 0xffff) size = 0xffff; pa->tbrsize = size; } return (errors); } /* * check_commit_altq does consistency check for each interface */ int check_commit_altq(int dev, int opts) { struct pf_altq *altq; int error = 0; /* call the discipline check for each interface. */ TAILQ_FOREACH(altq, &altqs, entries) { if (altq->qname[0] == 0) { switch (altq->scheduler) { case ALTQT_CBQ: error = check_commit_cbq(dev, opts, altq); break; case ALTQT_PRIQ: error = check_commit_priq(dev, opts, altq); break; case ALTQT_HFSC: error = check_commit_hfsc(dev, opts, altq); break; case ALTQT_FAIRQ: error = check_commit_fairq(dev, opts, altq); break; default: break; } } } return (error); } /* * eval_pfqueue computes the queue parameters. */ int eval_pfqueue(struct pfctl *pf, struct pf_altq *pa, struct node_queue_bw *bw, struct node_queue_opt *opts) { /* should be merged with expand_queue */ struct pf_altq *if_pa, *parent, *altq; u_int32_t bwsum; int error = 0; /* find the corresponding interface and copy fields used by queues */ if ((if_pa = pfaltq_lookup(pa->ifname)) == NULL) { fprintf(stderr, "altq not defined on %s\n", pa->ifname); return (1); } pa->scheduler = if_pa->scheduler; pa->ifbandwidth = if_pa->ifbandwidth; if (qname_to_pfaltq(pa->qname, pa->ifname) != NULL) { fprintf(stderr, "queue %s already exists on interface %s\n", pa->qname, pa->ifname); return (1); } pa->qid = qname_to_qid(pa->qname); parent = NULL; if (pa->parent[0] != 0) { parent = qname_to_pfaltq(pa->parent, pa->ifname); if (parent == NULL) { fprintf(stderr, "parent %s not found for %s\n", pa->parent, pa->qname); return (1); } pa->parent_qid = parent->qid; } if (pa->qlimit == 0) pa->qlimit = DEFAULT_QLIMIT; if (pa->scheduler == ALTQT_CBQ || pa->scheduler == ALTQT_HFSC || pa->scheduler == ALTQT_FAIRQ) { pa->bandwidth = eval_bwspec(bw, parent == NULL ? 0 : parent->bandwidth); if (pa->bandwidth > pa->ifbandwidth) { fprintf(stderr, "bandwidth for %s higher than " "interface\n", pa->qname); return (1); } /* check the sum of the child bandwidth is under parent's */ if (parent != NULL) { if (pa->bandwidth > parent->bandwidth) { warnx("bandwidth for %s higher than parent", pa->qname); return (1); } bwsum = 0; TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) == 0 && altq->qname[0] != 0 && strncmp(altq->parent, pa->parent, PF_QNAME_SIZE) == 0) bwsum += altq->bandwidth; } bwsum += pa->bandwidth; if (bwsum > parent->bandwidth) { warnx("the sum of the child bandwidth higher" " than parent \"%s\"", parent->qname); } } } if (eval_queue_opts(pa, opts, parent == NULL? 0 : parent->bandwidth)) return (1); switch (pa->scheduler) { case ALTQT_CBQ: error = eval_pfqueue_cbq(pf, pa); break; case ALTQT_PRIQ: error = eval_pfqueue_priq(pf, pa); break; case ALTQT_HFSC: error = eval_pfqueue_hfsc(pf, pa); break; case ALTQT_FAIRQ: error = eval_pfqueue_fairq(pf, pa); break; default: break; } return (error); } /* * CBQ support functions */ #define RM_FILTER_GAIN 5 /* log2 of gain, e.g., 5 => 31/32 */ #define RM_NS_PER_SEC (1000000000) static int eval_pfqueue_cbq(struct pfctl *pf, struct pf_altq *pa) { struct cbq_opts *opts; u_int ifmtu; if (pa->priority >= CBQ_MAXPRI) { warnx("priority out of range: max %d", CBQ_MAXPRI - 1); return (-1); } ifmtu = getifmtu(pa->ifname); opts = &pa->pq_u.cbq_opts; if (opts->pktsize == 0) { /* use default */ opts->pktsize = ifmtu; if (opts->pktsize > MCLBYTES) /* do what TCP does */ opts->pktsize &= ~MCLBYTES; } else if (opts->pktsize > ifmtu) opts->pktsize = ifmtu; if (opts->maxpktsize == 0) /* use default */ opts->maxpktsize = ifmtu; else if (opts->maxpktsize > ifmtu) opts->pktsize = ifmtu; if (opts->pktsize > opts->maxpktsize) opts->pktsize = opts->maxpktsize; if (pa->parent[0] == 0) opts->flags |= (CBQCLF_ROOTCLASS | CBQCLF_WRR); cbq_compute_idletime(pf, pa); return (0); } /* * compute ns_per_byte, maxidle, minidle, and offtime */ static int cbq_compute_idletime(struct pfctl *pf, struct pf_altq *pa) { struct cbq_opts *opts; double maxidle_s, maxidle, minidle; double offtime, nsPerByte, ifnsPerByte, ptime, cptime; double z, g, f, gton, gtom; u_int minburst, maxburst; opts = &pa->pq_u.cbq_opts; ifnsPerByte = (1.0 / (double)pa->ifbandwidth) * RM_NS_PER_SEC * 8; minburst = opts->minburst; maxburst = opts->maxburst; if (pa->bandwidth == 0) f = 0.0001; /* small enough? */ else f = ((double) pa->bandwidth / (double) pa->ifbandwidth); nsPerByte = ifnsPerByte / f; ptime = (double)opts->pktsize * ifnsPerByte; cptime = ptime * (1.0 - f) / f; if (nsPerByte * (double)opts->maxpktsize > (double)INT_MAX) { /* * this causes integer overflow in kernel! * (bandwidth < 6Kbps when max_pkt_size=1500) */ if (pa->bandwidth != 0 && (pf->opts & PF_OPT_QUIET) == 0) warnx("queue bandwidth must be larger than %s", rate2str(ifnsPerByte * (double)opts->maxpktsize / (double)INT_MAX * (double)pa->ifbandwidth)); fprintf(stderr, "cbq: queue %s is too slow!\n", pa->qname); nsPerByte = (double)(INT_MAX / opts->maxpktsize); } if (maxburst == 0) { /* use default */ if (cptime > 10.0 * 1000000) maxburst = 4; else maxburst = 16; } if (minburst == 0) /* use default */ minburst = 2; if (minburst > maxburst) minburst = maxburst; z = (double)(1 << RM_FILTER_GAIN); g = (1.0 - 1.0 / z); gton = pow(g, (double)maxburst); gtom = pow(g, (double)(minburst-1)); maxidle = ((1.0 / f - 1.0) * ((1.0 - gton) / gton)); maxidle_s = (1.0 - g); if (maxidle > maxidle_s) maxidle = ptime * maxidle; else maxidle = ptime * maxidle_s; offtime = cptime * (1.0 + 1.0/(1.0 - g) * (1.0 - gtom) / gtom); minidle = -((double)opts->maxpktsize * (double)nsPerByte); /* scale parameters */ maxidle = ((maxidle * 8.0) / nsPerByte) * pow(2.0, (double)RM_FILTER_GAIN); offtime = (offtime * 8.0) / nsPerByte * pow(2.0, (double)RM_FILTER_GAIN); minidle = ((minidle * 8.0) / nsPerByte) * pow(2.0, (double)RM_FILTER_GAIN); maxidle = maxidle / 1000.0; offtime = offtime / 1000.0; minidle = minidle / 1000.0; opts->minburst = minburst; opts->maxburst = maxburst; opts->ns_per_byte = (u_int)nsPerByte; opts->maxidle = (u_int)fabs(maxidle); opts->minidle = (int)minidle; opts->offtime = (u_int)fabs(offtime); return (0); } static int check_commit_cbq(int dev, int opts, struct pf_altq *pa) { struct pf_altq *altq; int root_class, default_class; int error = 0; /* * check if cbq has one root queue and one default queue * for this interface */ root_class = default_class = 0; TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0) continue; if (altq->qname[0] == 0) /* this is for interface */ continue; if (altq->pq_u.cbq_opts.flags & CBQCLF_ROOTCLASS) root_class++; if (altq->pq_u.cbq_opts.flags & CBQCLF_DEFCLASS) default_class++; } if (root_class != 1) { warnx("should have one root queue on %s", pa->ifname); error++; } if (default_class != 1) { warnx("should have one default queue on %s", pa->ifname); error++; } return (error); } static int print_cbq_opts(const struct pf_altq *a) { const struct cbq_opts *opts; opts = &a->pq_u.cbq_opts; if (opts->flags) { printf("cbq("); if (opts->flags & CBQCLF_RED) printf(" red"); if (opts->flags & CBQCLF_ECN) printf(" ecn"); if (opts->flags & CBQCLF_RIO) printf(" rio"); if (opts->flags & CBQCLF_CODEL) printf(" codel"); if (opts->flags & CBQCLF_CLEARDSCP) printf(" cleardscp"); if (opts->flags & CBQCLF_FLOWVALVE) printf(" flowvalve"); if (opts->flags & CBQCLF_BORROW) printf(" borrow"); if (opts->flags & CBQCLF_WRR) printf(" wrr"); if (opts->flags & CBQCLF_EFFICIENT) printf(" efficient"); if (opts->flags & CBQCLF_ROOTCLASS) printf(" root"); if (opts->flags & CBQCLF_DEFCLASS) printf(" default"); printf(" ) "); return (1); } else return (0); } /* * PRIQ support functions */ static int eval_pfqueue_priq(struct pfctl *pf, struct pf_altq *pa) { struct pf_altq *altq; if (pa->priority >= PRIQ_MAXPRI) { warnx("priority out of range: max %d", PRIQ_MAXPRI - 1); return (-1); } /* the priority should be unique for the interface */ TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) == 0 && altq->qname[0] != 0 && altq->priority == pa->priority) { warnx("%s and %s have the same priority", altq->qname, pa->qname); return (-1); } } return (0); } static int check_commit_priq(int dev, int opts, struct pf_altq *pa) { struct pf_altq *altq; int default_class; int error = 0; /* * check if priq has one default class for this interface */ default_class = 0; TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0) continue; if (altq->qname[0] == 0) /* this is for interface */ continue; if (altq->pq_u.priq_opts.flags & PRCF_DEFAULTCLASS) default_class++; } if (default_class != 1) { warnx("should have one default queue on %s", pa->ifname); error++; } return (error); } static int print_priq_opts(const struct pf_altq *a) { const struct priq_opts *opts; opts = &a->pq_u.priq_opts; if (opts->flags) { printf("priq("); if (opts->flags & PRCF_RED) printf(" red"); if (opts->flags & PRCF_ECN) printf(" ecn"); if (opts->flags & PRCF_RIO) printf(" rio"); if (opts->flags & PRCF_CODEL) printf(" codel"); if (opts->flags & PRCF_CLEARDSCP) printf(" cleardscp"); if (opts->flags & PRCF_DEFAULTCLASS) printf(" default"); printf(" ) "); return (1); } else return (0); } /* * HFSC support functions */ static int eval_pfqueue_hfsc(struct pfctl *pf, struct pf_altq *pa) { struct pf_altq *altq, *parent; struct hfsc_opts *opts; struct service_curve sc; opts = &pa->pq_u.hfsc_opts; if (pa->parent[0] == 0) { /* root queue */ opts->lssc_m1 = pa->ifbandwidth; opts->lssc_m2 = pa->ifbandwidth; opts->lssc_d = 0; return (0); } LIST_INIT(&rtsc); LIST_INIT(&lssc); /* if link_share is not specified, use bandwidth */ if (opts->lssc_m2 == 0) opts->lssc_m2 = pa->bandwidth; if ((opts->rtsc_m1 > 0 && opts->rtsc_m2 == 0) || (opts->lssc_m1 > 0 && opts->lssc_m2 == 0) || (opts->ulsc_m1 > 0 && opts->ulsc_m2 == 0)) { warnx("m2 is zero for %s", pa->qname); return (-1); } if ((opts->rtsc_m1 < opts->rtsc_m2 && opts->rtsc_m1 != 0) || (opts->lssc_m1 < opts->lssc_m2 && opts->lssc_m1 != 0) || (opts->ulsc_m1 < opts->ulsc_m2 && opts->ulsc_m1 != 0)) { warnx("m1 must be zero for convex curve: %s", pa->qname); return (-1); } /* * admission control: * for the real-time service curve, the sum of the service curves * should not exceed 80% of the interface bandwidth. 20% is reserved * not to over-commit the actual interface bandwidth. * for the linkshare service curve, the sum of the child service * curve should not exceed the parent service curve. * for the upper-limit service curve, the assigned bandwidth should * be smaller than the interface bandwidth, and the upper-limit should * be larger than the real-time service curve when both are defined. */ parent = qname_to_pfaltq(pa->parent, pa->ifname); if (parent == NULL) errx(1, "parent %s not found for %s", pa->parent, pa->qname); TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0) continue; if (altq->qname[0] == 0) /* this is for interface */ continue; /* if the class has a real-time service curve, add it. */ if (opts->rtsc_m2 != 0 && altq->pq_u.hfsc_opts.rtsc_m2 != 0) { sc.m1 = altq->pq_u.hfsc_opts.rtsc_m1; sc.d = altq->pq_u.hfsc_opts.rtsc_d; sc.m2 = altq->pq_u.hfsc_opts.rtsc_m2; gsc_add_sc(&rtsc, &sc); } if (strncmp(altq->parent, pa->parent, PF_QNAME_SIZE) != 0) continue; /* if the class has a linkshare service curve, add it. */ if (opts->lssc_m2 != 0 && altq->pq_u.hfsc_opts.lssc_m2 != 0) { sc.m1 = altq->pq_u.hfsc_opts.lssc_m1; sc.d = altq->pq_u.hfsc_opts.lssc_d; sc.m2 = altq->pq_u.hfsc_opts.lssc_m2; gsc_add_sc(&lssc, &sc); } } /* check the real-time service curve. reserve 20% of interface bw */ if (opts->rtsc_m2 != 0) { /* add this queue to the sum */ sc.m1 = opts->rtsc_m1; sc.d = opts->rtsc_d; sc.m2 = opts->rtsc_m2; gsc_add_sc(&rtsc, &sc); /* compare the sum with 80% of the interface */ sc.m1 = 0; sc.d = 0; sc.m2 = pa->ifbandwidth / 100 * 80; if (!is_gsc_under_sc(&rtsc, &sc)) { warnx("real-time sc exceeds 80%% of the interface " "bandwidth (%s)", rate2str((double)sc.m2)); goto err_ret; } } /* check the linkshare service curve. */ if (opts->lssc_m2 != 0) { /* add this queue to the child sum */ sc.m1 = opts->lssc_m1; sc.d = opts->lssc_d; sc.m2 = opts->lssc_m2; gsc_add_sc(&lssc, &sc); /* compare the sum of the children with parent's sc */ sc.m1 = parent->pq_u.hfsc_opts.lssc_m1; sc.d = parent->pq_u.hfsc_opts.lssc_d; sc.m2 = parent->pq_u.hfsc_opts.lssc_m2; if (!is_gsc_under_sc(&lssc, &sc)) { warnx("linkshare sc exceeds parent's sc"); goto err_ret; } } /* check the upper-limit service curve. */ if (opts->ulsc_m2 != 0) { if (opts->ulsc_m1 > pa->ifbandwidth || opts->ulsc_m2 > pa->ifbandwidth) { warnx("upper-limit larger than interface bandwidth"); goto err_ret; } if (opts->rtsc_m2 != 0 && opts->rtsc_m2 > opts->ulsc_m2) { warnx("upper-limit sc smaller than real-time sc"); goto err_ret; } } gsc_destroy(&rtsc); gsc_destroy(&lssc); return (0); err_ret: gsc_destroy(&rtsc); gsc_destroy(&lssc); return (-1); } /* * FAIRQ support functions */ static int eval_pfqueue_fairq(struct pfctl *pf __unused, struct pf_altq *pa) { struct pf_altq *altq, *parent; struct fairq_opts *opts; struct service_curve sc; opts = &pa->pq_u.fairq_opts; if (pa->parent[0] == 0) { /* root queue */ opts->lssc_m1 = pa->ifbandwidth; opts->lssc_m2 = pa->ifbandwidth; opts->lssc_d = 0; return (0); } LIST_INIT(&lssc); /* if link_share is not specified, use bandwidth */ if (opts->lssc_m2 == 0) opts->lssc_m2 = pa->bandwidth; /* * admission control: * for the real-time service curve, the sum of the service curves * should not exceed 80% of the interface bandwidth. 20% is reserved * not to over-commit the actual interface bandwidth. * for the link-sharing service curve, the sum of the child service * curve should not exceed the parent service curve. * for the upper-limit service curve, the assigned bandwidth should * be smaller than the interface bandwidth, and the upper-limit should * be larger than the real-time service curve when both are defined. */ parent = qname_to_pfaltq(pa->parent, pa->ifname); if (parent == NULL) errx(1, "parent %s not found for %s", pa->parent, pa->qname); TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0) continue; if (altq->qname[0] == 0) /* this is for interface */ continue; if (strncmp(altq->parent, pa->parent, PF_QNAME_SIZE) != 0) continue; /* if the class has a link-sharing service curve, add it. */ if (opts->lssc_m2 != 0 && altq->pq_u.fairq_opts.lssc_m2 != 0) { sc.m1 = altq->pq_u.fairq_opts.lssc_m1; sc.d = altq->pq_u.fairq_opts.lssc_d; sc.m2 = altq->pq_u.fairq_opts.lssc_m2; gsc_add_sc(&lssc, &sc); } } /* check the link-sharing service curve. */ if (opts->lssc_m2 != 0) { sc.m1 = parent->pq_u.fairq_opts.lssc_m1; sc.d = parent->pq_u.fairq_opts.lssc_d; sc.m2 = parent->pq_u.fairq_opts.lssc_m2; if (!is_gsc_under_sc(&lssc, &sc)) { warnx("link-sharing sc exceeds parent's sc"); goto err_ret; } } gsc_destroy(&lssc); return (0); err_ret: gsc_destroy(&lssc); return (-1); } static int check_commit_hfsc(int dev, int opts, struct pf_altq *pa) { struct pf_altq *altq, *def = NULL; int default_class; int error = 0; /* check if hfsc has one default queue for this interface */ default_class = 0; TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0) continue; if (altq->qname[0] == 0) /* this is for interface */ continue; if (altq->parent[0] == 0) /* dummy root */ continue; if (altq->pq_u.hfsc_opts.flags & HFCF_DEFAULTCLASS) { default_class++; def = altq; } } if (default_class != 1) { warnx("should have one default queue on %s", pa->ifname); return (1); } /* make sure the default queue is a leaf */ TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0) continue; if (altq->qname[0] == 0) /* this is for interface */ continue; if (strncmp(altq->parent, def->qname, PF_QNAME_SIZE) == 0) { warnx("default queue is not a leaf"); error++; } } return (error); } static int check_commit_fairq(int dev __unused, int opts __unused, struct pf_altq *pa) { struct pf_altq *altq, *def = NULL; int default_class; int error = 0; /* check if fairq has one default queue for this interface */ default_class = 0; TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0) continue; if (altq->qname[0] == 0) /* this is for interface */ continue; if (altq->pq_u.fairq_opts.flags & FARF_DEFAULTCLASS) { default_class++; def = altq; } } if (default_class != 1) { warnx("should have one default queue on %s", pa->ifname); return (1); } /* make sure the default queue is a leaf */ TAILQ_FOREACH(altq, &altqs, entries) { if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0) continue; if (altq->qname[0] == 0) /* this is for interface */ continue; if (strncmp(altq->parent, def->qname, PF_QNAME_SIZE) == 0) { warnx("default queue is not a leaf"); error++; } } return (error); } static int print_hfsc_opts(const struct pf_altq *a, const struct node_queue_opt *qopts) { const struct hfsc_opts *opts; const struct node_hfsc_sc *rtsc, *lssc, *ulsc; opts = &a->pq_u.hfsc_opts; if (qopts == NULL) rtsc = lssc = ulsc = NULL; else { rtsc = &qopts->data.hfsc_opts.realtime; lssc = &qopts->data.hfsc_opts.linkshare; ulsc = &qopts->data.hfsc_opts.upperlimit; } if (opts->flags || opts->rtsc_m2 != 0 || opts->ulsc_m2 != 0 || (opts->lssc_m2 != 0 && (opts->lssc_m2 != a->bandwidth || opts->lssc_d != 0))) { printf("hfsc("); if (opts->flags & HFCF_RED) printf(" red"); if (opts->flags & HFCF_ECN) printf(" ecn"); if (opts->flags & HFCF_RIO) printf(" rio"); if (opts->flags & HFCF_CODEL) printf(" codel"); if (opts->flags & HFCF_CLEARDSCP) printf(" cleardscp"); if (opts->flags & HFCF_DEFAULTCLASS) printf(" default"); if (opts->rtsc_m2 != 0) print_hfsc_sc("realtime", opts->rtsc_m1, opts->rtsc_d, opts->rtsc_m2, rtsc); if (opts->lssc_m2 != 0 && (opts->lssc_m2 != a->bandwidth || opts->lssc_d != 0)) print_hfsc_sc("linkshare", opts->lssc_m1, opts->lssc_d, opts->lssc_m2, lssc); if (opts->ulsc_m2 != 0) print_hfsc_sc("upperlimit", opts->ulsc_m1, opts->ulsc_d, opts->ulsc_m2, ulsc); printf(" ) "); return (1); } else return (0); } static int print_codel_opts(const struct pf_altq *a, const struct node_queue_opt *qopts) { const struct codel_opts *opts; opts = &a->pq_u.codel_opts; if (opts->target || opts->interval || opts->ecn) { printf("codel("); if (opts->target) printf(" target %d", opts->target); if (opts->interval) printf(" interval %d", opts->interval); if (opts->ecn) printf("ecn"); printf(" ) "); return (1); } return (0); } static int print_fairq_opts(const struct pf_altq *a, const struct node_queue_opt *qopts) { const struct fairq_opts *opts; const struct node_fairq_sc *loc_lssc; opts = &a->pq_u.fairq_opts; if (qopts == NULL) loc_lssc = NULL; else loc_lssc = &qopts->data.fairq_opts.linkshare; if (opts->flags || (opts->lssc_m2 != 0 && (opts->lssc_m2 != a->bandwidth || opts->lssc_d != 0))) { printf("fairq("); if (opts->flags & FARF_RED) printf(" red"); if (opts->flags & FARF_ECN) printf(" ecn"); if (opts->flags & FARF_RIO) printf(" rio"); if (opts->flags & FARF_CODEL) printf(" codel"); if (opts->flags & FARF_CLEARDSCP) printf(" cleardscp"); if (opts->flags & FARF_DEFAULTCLASS) printf(" default"); if (opts->lssc_m2 != 0 && (opts->lssc_m2 != a->bandwidth || opts->lssc_d != 0)) print_fairq_sc("linkshare", opts->lssc_m1, opts->lssc_d, opts->lssc_m2, loc_lssc); printf(" ) "); return (1); } else return (0); } /* * admission control using generalized service curve */ /* add a new service curve to a generalized service curve */ static void gsc_add_sc(struct gen_sc *gsc, struct service_curve *sc) { if (is_sc_null(sc)) return; if (sc->d != 0) gsc_add_seg(gsc, 0.0, 0.0, (double)sc->d, (double)sc->m1); gsc_add_seg(gsc, (double)sc->d, 0.0, INFINITY, (double)sc->m2); } /* * check whether all points of a generalized service curve have * their y-coordinates no larger than a given two-piece linear * service curve. */ static int is_gsc_under_sc(struct gen_sc *gsc, struct service_curve *sc) { struct segment *s, *last, *end; double y; if (is_sc_null(sc)) { if (LIST_EMPTY(gsc)) return (1); LIST_FOREACH(s, gsc, _next) { if (s->m != 0) return (0); } return (1); } /* * gsc has a dummy entry at the end with x = INFINITY. * loop through up to this dummy entry. */ end = gsc_getentry(gsc, INFINITY); if (end == NULL) return (1); last = NULL; for (s = LIST_FIRST(gsc); s != end; s = LIST_NEXT(s, _next)) { if (s->y > sc_x2y(sc, s->x)) return (0); last = s; } /* last now holds the real last segment */ if (last == NULL) return (1); if (last->m > sc->m2) return (0); if (last->x < sc->d && last->m > sc->m1) { y = last->y + (sc->d - last->x) * last->m; if (y > sc_x2y(sc, sc->d)) return (0); } return (1); } static void gsc_destroy(struct gen_sc *gsc) { struct segment *s; while ((s = LIST_FIRST(gsc)) != NULL) { LIST_REMOVE(s, _next); free(s); } } /* * return a segment entry starting at x. * if gsc has no entry starting at x, a new entry is created at x. */ static struct segment * gsc_getentry(struct gen_sc *gsc, double x) { struct segment *new, *prev, *s; prev = NULL; LIST_FOREACH(s, gsc, _next) { if (s->x == x) return (s); /* matching entry found */ else if (s->x < x) prev = s; else break; } /* we have to create a new entry */ if ((new = calloc(1, sizeof(struct segment))) == NULL) return (NULL); new->x = x; if (x == INFINITY || s == NULL) new->d = 0; else if (s->x == INFINITY) new->d = INFINITY; else new->d = s->x - x; if (prev == NULL) { /* insert the new entry at the head of the list */ new->y = 0; new->m = 0; LIST_INSERT_HEAD(gsc, new, _next); } else { /* * the start point intersects with the segment pointed by * prev. divide prev into 2 segments */ if (x == INFINITY) { prev->d = INFINITY; if (prev->m == 0) new->y = prev->y; else new->y = INFINITY; } else { prev->d = x - prev->x; new->y = prev->d * prev->m + prev->y; } new->m = prev->m; LIST_INSERT_AFTER(prev, new, _next); } return (new); } /* add a segment to a generalized service curve */ static int gsc_add_seg(struct gen_sc *gsc, double x, double y, double d, double m) { struct segment *start, *end, *s; double x2; if (d == INFINITY) x2 = INFINITY; else x2 = x + d; start = gsc_getentry(gsc, x); end = gsc_getentry(gsc, x2); if (start == NULL || end == NULL) return (-1); for (s = start; s != end; s = LIST_NEXT(s, _next)) { s->m += m; s->y += y + (s->x - x) * m; } end = gsc_getentry(gsc, INFINITY); for (; s != end; s = LIST_NEXT(s, _next)) { s->y += m * d; } return (0); } /* get y-projection of a service curve */ static double sc_x2y(struct service_curve *sc, double x) { double y; if (x <= (double)sc->d) /* y belongs to the 1st segment */ y = x * (double)sc->m1; else /* y belongs to the 2nd segment */ y = (double)sc->d * (double)sc->m1 + (x - (double)sc->d) * (double)sc->m2; return (y); } /* * misc utilities */ #define R2S_BUFS 8 #define RATESTR_MAX 16 char * rate2str(double rate) { char *buf; static char r2sbuf[R2S_BUFS][RATESTR_MAX]; /* ring bufer */ static int idx = 0; int i; static const char unit[] = " KMG"; buf = r2sbuf[idx++]; if (idx == R2S_BUFS) idx = 0; for (i = 0; rate >= 1000 && i <= 3; i++) rate /= 1000; if ((int)(rate * 100) % 100) snprintf(buf, RATESTR_MAX, "%.2f%cb", rate, unit[i]); else snprintf(buf, RATESTR_MAX, "%d%cb", (int)rate, unit[i]); return (buf); } #ifdef __FreeBSD__ /* * XXX * FreeBSD does not have SIOCGIFDATA. * To emulate this, DIOCGIFSPEED ioctl added to pf. */ u_int32_t getifspeed(int pfdev, char *ifname) { struct pf_ifspeed io; bzero(&io, sizeof io); if (strlcpy(io.ifname, ifname, IFNAMSIZ) >= sizeof(io.ifname)) errx(1, "getifspeed: strlcpy"); if (ioctl(pfdev, DIOCGIFSPEED, &io) == -1) err(1, "DIOCGIFSPEED"); return ((u_int32_t)io.baudrate); } #else u_int32_t getifspeed(char *ifname) { int s; struct ifreq ifr; struct if_data ifrdat; if ((s = socket(get_socket_domain(), SOCK_DGRAM, 0)) < 0) err(1, "socket"); bzero(&ifr, sizeof(ifr)); if (strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name)) >= sizeof(ifr.ifr_name)) errx(1, "getifspeed: strlcpy"); ifr.ifr_data = (caddr_t)&ifrdat; if (ioctl(s, SIOCGIFDATA, (caddr_t)&ifr) == -1) err(1, "SIOCGIFDATA"); if (close(s)) err(1, "close"); return ((u_int32_t)ifrdat.ifi_baudrate); } #endif u_long getifmtu(char *ifname) { int s; struct ifreq ifr; if ((s = socket(get_socket_domain(), SOCK_DGRAM, 0)) < 0) err(1, "socket"); bzero(&ifr, sizeof(ifr)); if (strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name)) >= sizeof(ifr.ifr_name)) errx(1, "getifmtu: strlcpy"); if (ioctl(s, SIOCGIFMTU, (caddr_t)&ifr) == -1) #ifdef __FreeBSD__ ifr.ifr_mtu = 1500; #else err(1, "SIOCGIFMTU"); #endif if (close(s)) err(1, "close"); if (ifr.ifr_mtu > 0) return (ifr.ifr_mtu); else { warnx("could not get mtu for %s, assuming 1500", ifname); return (1500); } } int eval_queue_opts(struct pf_altq *pa, struct node_queue_opt *opts, u_int32_t ref_bw) { int errors = 0; switch (pa->scheduler) { case ALTQT_CBQ: pa->pq_u.cbq_opts = opts->data.cbq_opts; break; case ALTQT_PRIQ: pa->pq_u.priq_opts = opts->data.priq_opts; break; case ALTQT_HFSC: pa->pq_u.hfsc_opts.flags = opts->data.hfsc_opts.flags; if (opts->data.hfsc_opts.linkshare.used) { pa->pq_u.hfsc_opts.lssc_m1 = eval_bwspec(&opts->data.hfsc_opts.linkshare.m1, ref_bw); pa->pq_u.hfsc_opts.lssc_m2 = eval_bwspec(&opts->data.hfsc_opts.linkshare.m2, ref_bw); pa->pq_u.hfsc_opts.lssc_d = opts->data.hfsc_opts.linkshare.d; } if (opts->data.hfsc_opts.realtime.used) { pa->pq_u.hfsc_opts.rtsc_m1 = eval_bwspec(&opts->data.hfsc_opts.realtime.m1, ref_bw); pa->pq_u.hfsc_opts.rtsc_m2 = eval_bwspec(&opts->data.hfsc_opts.realtime.m2, ref_bw); pa->pq_u.hfsc_opts.rtsc_d = opts->data.hfsc_opts.realtime.d; } if (opts->data.hfsc_opts.upperlimit.used) { pa->pq_u.hfsc_opts.ulsc_m1 = eval_bwspec(&opts->data.hfsc_opts.upperlimit.m1, ref_bw); pa->pq_u.hfsc_opts.ulsc_m2 = eval_bwspec(&opts->data.hfsc_opts.upperlimit.m2, ref_bw); pa->pq_u.hfsc_opts.ulsc_d = opts->data.hfsc_opts.upperlimit.d; } break; case ALTQT_FAIRQ: pa->pq_u.fairq_opts.flags = opts->data.fairq_opts.flags; pa->pq_u.fairq_opts.nbuckets = opts->data.fairq_opts.nbuckets; pa->pq_u.fairq_opts.hogs_m1 = eval_bwspec(&opts->data.fairq_opts.hogs_bw, ref_bw); if (opts->data.fairq_opts.linkshare.used) { pa->pq_u.fairq_opts.lssc_m1 = eval_bwspec(&opts->data.fairq_opts.linkshare.m1, ref_bw); pa->pq_u.fairq_opts.lssc_m2 = eval_bwspec(&opts->data.fairq_opts.linkshare.m2, ref_bw); pa->pq_u.fairq_opts.lssc_d = opts->data.fairq_opts.linkshare.d; } break; case ALTQT_CODEL: pa->pq_u.codel_opts.target = opts->data.codel_opts.target; pa->pq_u.codel_opts.interval = opts->data.codel_opts.interval; pa->pq_u.codel_opts.ecn = opts->data.codel_opts.ecn; break; default: warnx("eval_queue_opts: unknown scheduler type %u", opts->qtype); errors++; break; } return (errors); } u_int32_t eval_bwspec(struct node_queue_bw *bw, u_int32_t ref_bw) { if (bw->bw_absolute > 0) return (bw->bw_absolute); if (bw->bw_percent > 0) return (ref_bw / 100 * bw->bw_percent); return (0); } void print_hfsc_sc(const char *scname, u_int m1, u_int d, u_int m2, const struct node_hfsc_sc *sc) { printf(" %s", scname); if (d != 0) { printf("("); if (sc != NULL && sc->m1.bw_percent > 0) printf("%u%%", sc->m1.bw_percent); else printf("%s", rate2str((double)m1)); printf(" %u", d); } if (sc != NULL && sc->m2.bw_percent > 0) printf(" %u%%", sc->m2.bw_percent); else printf(" %s", rate2str((double)m2)); if (d != 0) printf(")"); } void print_fairq_sc(const char *scname, u_int m1, u_int d, u_int m2, const struct node_fairq_sc *sc) { printf(" %s", scname); if (d != 0) { printf("("); if (sc != NULL && sc->m1.bw_percent > 0) printf("%u%%", sc->m1.bw_percent); else printf("%s", rate2str((double)m1)); printf(" %u", d); } if (sc != NULL && sc->m2.bw_percent > 0) printf(" %u%%", sc->m2.bw_percent); else printf(" %s", rate2str((double)m2)); if (d != 0) printf(")"); } Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_optimize.c =================================================================== --- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_optimize.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_optimize.c (revision 303775) @@ -1,1655 +1,1656 @@ /* $OpenBSD: pfctl_optimize.c,v 1.17 2008/05/06 03:45:21 mpf Exp $ */ /* * Copyright (c) 2004 Mike Frantzen * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pfctl_parser.h" #include "pfctl.h" /* The size at which a table becomes faster than individual rules */ #define TABLE_THRESHOLD 6 /* #define OPT_DEBUG 1 */ #ifdef OPT_DEBUG # define DEBUG(str, v...) \ printf("%s: " str "\n", __FUNCTION__ , ## v) #else # define DEBUG(str, v...) ((void)0) #endif /* * A container that lets us sort a superblock to optimize the skip step jumps */ struct pf_skip_step { int ps_count; /* number of items */ TAILQ_HEAD( , pf_opt_rule) ps_rules; TAILQ_ENTRY(pf_skip_step) ps_entry; }; /* * A superblock is a block of adjacent rules of similar action. If there * are five PASS rules in a row, they all become members of a superblock. * Once we have a superblock, we are free to re-order any rules within it * in order to improve performance; if a packet is passed, it doesn't matter * who passed it. */ struct superblock { TAILQ_HEAD( , pf_opt_rule) sb_rules; TAILQ_ENTRY(superblock) sb_entry; struct superblock *sb_profiled_block; TAILQ_HEAD(skiplist, pf_skip_step) sb_skipsteps[PF_SKIP_COUNT]; }; TAILQ_HEAD(superblocks, superblock); /* * Description of the PF rule structure. */ enum { BARRIER, /* the presence of the field puts the rule in it's own block */ BREAK, /* the field may not differ between rules in a superblock */ NOMERGE, /* the field may not differ between rules when combined */ COMBINED, /* the field may itself be combined with other rules */ DC, /* we just don't care about the field */ NEVER}; /* we should never see this field set?!? */ -struct pf_rule_field { +static struct pf_rule_field { const char *prf_name; int prf_type; size_t prf_offset; size_t prf_size; } pf_rule_desc[] = { #define PF_RULE_FIELD(field, ty) \ {#field, \ ty, \ offsetof(struct pf_rule, field), \ sizeof(((struct pf_rule *)0)->field)} /* * The presence of these fields in a rule put the rule in it's own * superblock. Thus it will not be optimized. It also prevents the * rule from being re-ordered at all. */ PF_RULE_FIELD(label, BARRIER), PF_RULE_FIELD(prob, BARRIER), PF_RULE_FIELD(max_states, BARRIER), PF_RULE_FIELD(max_src_nodes, BARRIER), PF_RULE_FIELD(max_src_states, BARRIER), PF_RULE_FIELD(max_src_conn, BARRIER), PF_RULE_FIELD(max_src_conn_rate, BARRIER), PF_RULE_FIELD(anchor, BARRIER), /* for now */ /* * These fields must be the same between all rules in the same superblock. * These rules are allowed to be re-ordered but only among like rules. * For instance we can re-order all 'tag "foo"' rules because they have the * same tag. But we can not re-order between a 'tag "foo"' and a * 'tag "bar"' since that would change the meaning of the ruleset. */ PF_RULE_FIELD(tagname, BREAK), PF_RULE_FIELD(keep_state, BREAK), PF_RULE_FIELD(qname, BREAK), PF_RULE_FIELD(pqname, BREAK), PF_RULE_FIELD(rt, BREAK), PF_RULE_FIELD(allow_opts, BREAK), PF_RULE_FIELD(rule_flag, BREAK), PF_RULE_FIELD(action, BREAK), PF_RULE_FIELD(log, BREAK), PF_RULE_FIELD(quick, BREAK), PF_RULE_FIELD(return_ttl, BREAK), PF_RULE_FIELD(overload_tblname, BREAK), PF_RULE_FIELD(flush, BREAK), PF_RULE_FIELD(rpool, BREAK), PF_RULE_FIELD(logif, BREAK), /* * Any fields not listed in this structure act as BREAK fields */ /* * These fields must not differ when we merge two rules together but * their difference isn't enough to put the rules in different superblocks. * There are no problems re-ordering any rules with these fields. */ PF_RULE_FIELD(af, NOMERGE), PF_RULE_FIELD(ifnot, NOMERGE), PF_RULE_FIELD(ifname, NOMERGE), /* hack for IF groups */ PF_RULE_FIELD(match_tag_not, NOMERGE), PF_RULE_FIELD(match_tagname, NOMERGE), PF_RULE_FIELD(os_fingerprint, NOMERGE), PF_RULE_FIELD(timeout, NOMERGE), PF_RULE_FIELD(return_icmp, NOMERGE), PF_RULE_FIELD(return_icmp6, NOMERGE), PF_RULE_FIELD(uid, NOMERGE), PF_RULE_FIELD(gid, NOMERGE), PF_RULE_FIELD(direction, NOMERGE), PF_RULE_FIELD(proto, NOMERGE), PF_RULE_FIELD(type, NOMERGE), PF_RULE_FIELD(code, NOMERGE), PF_RULE_FIELD(flags, NOMERGE), PF_RULE_FIELD(flagset, NOMERGE), PF_RULE_FIELD(tos, NOMERGE), PF_RULE_FIELD(src.port, NOMERGE), PF_RULE_FIELD(dst.port, NOMERGE), PF_RULE_FIELD(src.port_op, NOMERGE), PF_RULE_FIELD(dst.port_op, NOMERGE), PF_RULE_FIELD(src.neg, NOMERGE), PF_RULE_FIELD(dst.neg, NOMERGE), /* These fields can be merged */ PF_RULE_FIELD(src.addr, COMBINED), PF_RULE_FIELD(dst.addr, COMBINED), /* We just don't care about these fields. They're set by the kernel */ PF_RULE_FIELD(skip, DC), PF_RULE_FIELD(evaluations, DC), PF_RULE_FIELD(packets, DC), PF_RULE_FIELD(bytes, DC), PF_RULE_FIELD(kif, DC), PF_RULE_FIELD(states_cur, DC), PF_RULE_FIELD(states_tot, DC), PF_RULE_FIELD(src_nodes, DC), PF_RULE_FIELD(nr, DC), PF_RULE_FIELD(entries, DC), PF_RULE_FIELD(qid, DC), PF_RULE_FIELD(pqid, DC), PF_RULE_FIELD(anchor_relative, DC), PF_RULE_FIELD(anchor_wildcard, DC), PF_RULE_FIELD(tag, DC), PF_RULE_FIELD(match_tag, DC), PF_RULE_FIELD(overload_tbl, DC), /* These fields should never be set in a PASS/BLOCK rule */ PF_RULE_FIELD(natpass, NEVER), PF_RULE_FIELD(max_mss, NEVER), PF_RULE_FIELD(min_ttl, NEVER), PF_RULE_FIELD(set_tos, NEVER), }; int add_opt_table(struct pfctl *, struct pf_opt_tbl **, sa_family_t, struct pf_rule_addr *); int addrs_combineable(struct pf_rule_addr *, struct pf_rule_addr *); int addrs_equal(struct pf_rule_addr *, struct pf_rule_addr *); int block_feedback(struct pfctl *, struct superblock *); int combine_rules(struct pfctl *, struct superblock *); void comparable_rule(struct pf_rule *, const struct pf_rule *, int); int construct_superblocks(struct pfctl *, struct pf_opt_queue *, struct superblocks *); void exclude_supersets(struct pf_rule *, struct pf_rule *); int interface_group(const char *); int load_feedback_profile(struct pfctl *, struct superblocks *); int optimize_superblock(struct pfctl *, struct superblock *); int pf_opt_create_table(struct pfctl *, struct pf_opt_tbl *); void remove_from_skipsteps(struct skiplist *, struct superblock *, struct pf_opt_rule *, struct pf_skip_step *); int remove_identical_rules(struct pfctl *, struct superblock *); int reorder_rules(struct pfctl *, struct superblock *, int); int rules_combineable(struct pf_rule *, struct pf_rule *); void skip_append(struct superblock *, int, struct pf_skip_step *, struct pf_opt_rule *); int skip_compare(int, struct pf_skip_step *, struct pf_opt_rule *); void skip_init(void); int skip_cmp_af(struct pf_rule *, struct pf_rule *); int skip_cmp_dir(struct pf_rule *, struct pf_rule *); int skip_cmp_dst_addr(struct pf_rule *, struct pf_rule *); int skip_cmp_dst_port(struct pf_rule *, struct pf_rule *); int skip_cmp_ifp(struct pf_rule *, struct pf_rule *); int skip_cmp_proto(struct pf_rule *, struct pf_rule *); int skip_cmp_src_addr(struct pf_rule *, struct pf_rule *); int skip_cmp_src_port(struct pf_rule *, struct pf_rule *); int superblock_inclusive(struct superblock *, struct pf_opt_rule *); void superblock_free(struct pfctl *, struct superblock *); -int (*skip_comparitors[PF_SKIP_COUNT])(struct pf_rule *, struct pf_rule *); -const char *skip_comparitors_names[PF_SKIP_COUNT]; +static int (*skip_comparitors[PF_SKIP_COUNT])(struct pf_rule *, + struct pf_rule *); +static const char *skip_comparitors_names[PF_SKIP_COUNT]; #define PF_SKIP_COMPARITORS { \ { "ifp", PF_SKIP_IFP, skip_cmp_ifp }, \ { "dir", PF_SKIP_DIR, skip_cmp_dir }, \ { "af", PF_SKIP_AF, skip_cmp_af }, \ { "proto", PF_SKIP_PROTO, skip_cmp_proto }, \ { "saddr", PF_SKIP_SRC_ADDR, skip_cmp_src_addr }, \ { "sport", PF_SKIP_SRC_PORT, skip_cmp_src_port }, \ { "daddr", PF_SKIP_DST_ADDR, skip_cmp_dst_addr }, \ { "dport", PF_SKIP_DST_PORT, skip_cmp_dst_port } \ } -struct pfr_buffer table_buffer; -int table_identifier; +static struct pfr_buffer table_buffer; +static int table_identifier; int pfctl_optimize_ruleset(struct pfctl *pf, struct pf_ruleset *rs) { struct superblocks superblocks; struct pf_opt_queue opt_queue; struct superblock *block; struct pf_opt_rule *por; struct pf_rule *r; struct pf_rulequeue *old_rules; DEBUG("optimizing ruleset"); memset(&table_buffer, 0, sizeof(table_buffer)); skip_init(); TAILQ_INIT(&opt_queue); old_rules = rs->rules[PF_RULESET_FILTER].active.ptr; rs->rules[PF_RULESET_FILTER].active.ptr = rs->rules[PF_RULESET_FILTER].inactive.ptr; rs->rules[PF_RULESET_FILTER].inactive.ptr = old_rules; /* * XXX expanding the pf_opt_rule format throughout pfctl might allow * us to avoid all this copying. */ while ((r = TAILQ_FIRST(rs->rules[PF_RULESET_FILTER].inactive.ptr)) != NULL) { TAILQ_REMOVE(rs->rules[PF_RULESET_FILTER].inactive.ptr, r, entries); if ((por = calloc(1, sizeof(*por))) == NULL) err(1, "calloc"); memcpy(&por->por_rule, r, sizeof(*r)); if (TAILQ_FIRST(&r->rpool.list) != NULL) { TAILQ_INIT(&por->por_rule.rpool.list); pfctl_move_pool(&r->rpool, &por->por_rule.rpool); } else bzero(&por->por_rule.rpool, sizeof(por->por_rule.rpool)); TAILQ_INSERT_TAIL(&opt_queue, por, por_entry); } TAILQ_INIT(&superblocks); if (construct_superblocks(pf, &opt_queue, &superblocks)) goto error; if (pf->optimize & PF_OPTIMIZE_PROFILE) { if (load_feedback_profile(pf, &superblocks)) goto error; } TAILQ_FOREACH(block, &superblocks, sb_entry) { if (optimize_superblock(pf, block)) goto error; } rs->anchor->refcnt = 0; while ((block = TAILQ_FIRST(&superblocks))) { TAILQ_REMOVE(&superblocks, block, sb_entry); while ((por = TAILQ_FIRST(&block->sb_rules))) { TAILQ_REMOVE(&block->sb_rules, por, por_entry); por->por_rule.nr = rs->anchor->refcnt++; if ((r = calloc(1, sizeof(*r))) == NULL) err(1, "calloc"); memcpy(r, &por->por_rule, sizeof(*r)); TAILQ_INIT(&r->rpool.list); pfctl_move_pool(&por->por_rule.rpool, &r->rpool); TAILQ_INSERT_TAIL( rs->rules[PF_RULESET_FILTER].active.ptr, r, entries); free(por); } free(block); } return (0); error: while ((por = TAILQ_FIRST(&opt_queue))) { TAILQ_REMOVE(&opt_queue, por, por_entry); if (por->por_src_tbl) { pfr_buf_clear(por->por_src_tbl->pt_buf); free(por->por_src_tbl->pt_buf); free(por->por_src_tbl); } if (por->por_dst_tbl) { pfr_buf_clear(por->por_dst_tbl->pt_buf); free(por->por_dst_tbl->pt_buf); free(por->por_dst_tbl); } free(por); } while ((block = TAILQ_FIRST(&superblocks))) { TAILQ_REMOVE(&superblocks, block, sb_entry); superblock_free(pf, block); } return (1); } /* * Go ahead and optimize a superblock */ int optimize_superblock(struct pfctl *pf, struct superblock *block) { #ifdef OPT_DEBUG struct pf_opt_rule *por; #endif /* OPT_DEBUG */ /* We have a few optimization passes: * 1) remove duplicate rules or rules that are a subset of other * rules * 2) combine otherwise identical rules with different IP addresses * into a single rule and put the addresses in a table. * 3) re-order the rules to improve kernel skip steps * 4) re-order the 'quick' rules based on feedback from the * active ruleset statistics * * XXX combine_rules() doesn't combine v4 and v6 rules. would just * have to keep af in the table container, make af 'COMBINE' and * twiddle the af on the merged rule * XXX maybe add a weighting to the metric on skipsteps when doing * reordering. sometimes two sequential tables will be better * that four consecutive interfaces. * XXX need to adjust the skipstep count of everything after PROTO, * since they aren't actually checked on a proto mismatch in * pf_test_{tcp, udp, icmp}() * XXX should i treat proto=0, af=0 or dir=0 special in skepstep * calculation since they are a DC? * XXX keep last skiplist of last superblock to influence this * superblock. '5 inet6 log' should make '3 inet6' come before '4 * inet' in the next superblock. * XXX would be useful to add tables for ports * XXX we can also re-order some mutually exclusive superblocks to * try merging superblocks before any of these optimization passes. * for instance a single 'log in' rule in the middle of non-logging * out rules. */ /* shortcut. there will be a lot of 1-rule superblocks */ if (!TAILQ_NEXT(TAILQ_FIRST(&block->sb_rules), por_entry)) return (0); #ifdef OPT_DEBUG printf("--- Superblock ---\n"); TAILQ_FOREACH(por, &block->sb_rules, por_entry) { printf(" "); print_rule(&por->por_rule, por->por_rule.anchor ? por->por_rule.anchor->name : "", 1, 0); } #endif /* OPT_DEBUG */ if (remove_identical_rules(pf, block)) return (1); if (combine_rules(pf, block)) return (1); if ((pf->optimize & PF_OPTIMIZE_PROFILE) && TAILQ_FIRST(&block->sb_rules)->por_rule.quick && block->sb_profiled_block) { if (block_feedback(pf, block)) return (1); } else if (reorder_rules(pf, block, 0)) { return (1); } /* * Don't add any optimization passes below reorder_rules(). It will * have divided superblocks into smaller blocks for further refinement * and doesn't put them back together again. What once was a true * superblock might have been split into multiple superblocks. */ #ifdef OPT_DEBUG printf("--- END Superblock ---\n"); #endif /* OPT_DEBUG */ return (0); } /* * Optimization pass #1: remove identical rules */ int remove_identical_rules(struct pfctl *pf, struct superblock *block) { struct pf_opt_rule *por1, *por2, *por_next, *por2_next; struct pf_rule a, a2, b, b2; for (por1 = TAILQ_FIRST(&block->sb_rules); por1; por1 = por_next) { por_next = TAILQ_NEXT(por1, por_entry); for (por2 = por_next; por2; por2 = por2_next) { por2_next = TAILQ_NEXT(por2, por_entry); comparable_rule(&a, &por1->por_rule, DC); comparable_rule(&b, &por2->por_rule, DC); memcpy(&a2, &a, sizeof(a2)); memcpy(&b2, &b, sizeof(b2)); exclude_supersets(&a, &b); exclude_supersets(&b2, &a2); if (memcmp(&a, &b, sizeof(a)) == 0) { DEBUG("removing identical rule nr%d = *nr%d*", por1->por_rule.nr, por2->por_rule.nr); TAILQ_REMOVE(&block->sb_rules, por2, por_entry); if (por_next == por2) por_next = TAILQ_NEXT(por1, por_entry); free(por2); } else if (memcmp(&a2, &b2, sizeof(a2)) == 0) { DEBUG("removing identical rule *nr%d* = nr%d", por1->por_rule.nr, por2->por_rule.nr); TAILQ_REMOVE(&block->sb_rules, por1, por_entry); free(por1); break; } } } return (0); } /* * Optimization pass #2: combine similar rules with different addresses * into a single rule and a table */ int combine_rules(struct pfctl *pf, struct superblock *block) { struct pf_opt_rule *p1, *p2, *por_next; int src_eq, dst_eq; if ((pf->loadopt & PFCTL_FLAG_TABLE) == 0) { warnx("Must enable table loading for optimizations"); return (1); } /* First we make a pass to combine the rules. O(n log n) */ TAILQ_FOREACH(p1, &block->sb_rules, por_entry) { for (p2 = TAILQ_NEXT(p1, por_entry); p2; p2 = por_next) { por_next = TAILQ_NEXT(p2, por_entry); src_eq = addrs_equal(&p1->por_rule.src, &p2->por_rule.src); dst_eq = addrs_equal(&p1->por_rule.dst, &p2->por_rule.dst); if (src_eq && !dst_eq && p1->por_src_tbl == NULL && p2->por_dst_tbl == NULL && p2->por_src_tbl == NULL && rules_combineable(&p1->por_rule, &p2->por_rule) && addrs_combineable(&p1->por_rule.dst, &p2->por_rule.dst)) { DEBUG("can combine rules nr%d = nr%d", p1->por_rule.nr, p2->por_rule.nr); if (p1->por_dst_tbl == NULL && add_opt_table(pf, &p1->por_dst_tbl, p1->por_rule.af, &p1->por_rule.dst)) return (1); if (add_opt_table(pf, &p1->por_dst_tbl, p1->por_rule.af, &p2->por_rule.dst)) return (1); p2->por_dst_tbl = p1->por_dst_tbl; if (p1->por_dst_tbl->pt_rulecount >= TABLE_THRESHOLD) { TAILQ_REMOVE(&block->sb_rules, p2, por_entry); free(p2); } } else if (!src_eq && dst_eq && p1->por_dst_tbl == NULL && p2->por_src_tbl == NULL && p2->por_dst_tbl == NULL && rules_combineable(&p1->por_rule, &p2->por_rule) && addrs_combineable(&p1->por_rule.src, &p2->por_rule.src)) { DEBUG("can combine rules nr%d = nr%d", p1->por_rule.nr, p2->por_rule.nr); if (p1->por_src_tbl == NULL && add_opt_table(pf, &p1->por_src_tbl, p1->por_rule.af, &p1->por_rule.src)) return (1); if (add_opt_table(pf, &p1->por_src_tbl, p1->por_rule.af, &p2->por_rule.src)) return (1); p2->por_src_tbl = p1->por_src_tbl; if (p1->por_src_tbl->pt_rulecount >= TABLE_THRESHOLD) { TAILQ_REMOVE(&block->sb_rules, p2, por_entry); free(p2); } } } } /* * Then we make a final pass to create a valid table name and * insert the name into the rules. */ for (p1 = TAILQ_FIRST(&block->sb_rules); p1; p1 = por_next) { por_next = TAILQ_NEXT(p1, por_entry); assert(p1->por_src_tbl == NULL || p1->por_dst_tbl == NULL); if (p1->por_src_tbl && p1->por_src_tbl->pt_rulecount >= TABLE_THRESHOLD) { if (p1->por_src_tbl->pt_generated) { /* This rule is included in a table */ TAILQ_REMOVE(&block->sb_rules, p1, por_entry); free(p1); continue; } p1->por_src_tbl->pt_generated = 1; if ((pf->opts & PF_OPT_NOACTION) == 0 && pf_opt_create_table(pf, p1->por_src_tbl)) return (1); pf->tdirty = 1; if (pf->opts & PF_OPT_VERBOSE) print_tabledef(p1->por_src_tbl->pt_name, PFR_TFLAG_CONST, 1, &p1->por_src_tbl->pt_nodes); memset(&p1->por_rule.src.addr, 0, sizeof(p1->por_rule.src.addr)); p1->por_rule.src.addr.type = PF_ADDR_TABLE; strlcpy(p1->por_rule.src.addr.v.tblname, p1->por_src_tbl->pt_name, sizeof(p1->por_rule.src.addr.v.tblname)); pfr_buf_clear(p1->por_src_tbl->pt_buf); free(p1->por_src_tbl->pt_buf); p1->por_src_tbl->pt_buf = NULL; } if (p1->por_dst_tbl && p1->por_dst_tbl->pt_rulecount >= TABLE_THRESHOLD) { if (p1->por_dst_tbl->pt_generated) { /* This rule is included in a table */ TAILQ_REMOVE(&block->sb_rules, p1, por_entry); free(p1); continue; } p1->por_dst_tbl->pt_generated = 1; if ((pf->opts & PF_OPT_NOACTION) == 0 && pf_opt_create_table(pf, p1->por_dst_tbl)) return (1); pf->tdirty = 1; if (pf->opts & PF_OPT_VERBOSE) print_tabledef(p1->por_dst_tbl->pt_name, PFR_TFLAG_CONST, 1, &p1->por_dst_tbl->pt_nodes); memset(&p1->por_rule.dst.addr, 0, sizeof(p1->por_rule.dst.addr)); p1->por_rule.dst.addr.type = PF_ADDR_TABLE; strlcpy(p1->por_rule.dst.addr.v.tblname, p1->por_dst_tbl->pt_name, sizeof(p1->por_rule.dst.addr.v.tblname)); pfr_buf_clear(p1->por_dst_tbl->pt_buf); free(p1->por_dst_tbl->pt_buf); p1->por_dst_tbl->pt_buf = NULL; } } return (0); } /* * Optimization pass #3: re-order rules to improve skip steps */ int reorder_rules(struct pfctl *pf, struct superblock *block, int depth) { struct superblock *newblock; struct pf_skip_step *skiplist; struct pf_opt_rule *por; int i, largest, largest_list, rule_count = 0; TAILQ_HEAD( , pf_opt_rule) head; /* * Calculate the best-case skip steps. We put each rule in a list * of other rules with common fields */ for (i = 0; i < PF_SKIP_COUNT; i++) { TAILQ_FOREACH(por, &block->sb_rules, por_entry) { TAILQ_FOREACH(skiplist, &block->sb_skipsteps[i], ps_entry) { if (skip_compare(i, skiplist, por) == 0) break; } if (skiplist == NULL) { if ((skiplist = calloc(1, sizeof(*skiplist))) == NULL) err(1, "calloc"); TAILQ_INIT(&skiplist->ps_rules); TAILQ_INSERT_TAIL(&block->sb_skipsteps[i], skiplist, ps_entry); } skip_append(block, i, skiplist, por); } } TAILQ_FOREACH(por, &block->sb_rules, por_entry) rule_count++; /* * Now we're going to ignore any fields that are identical between * all of the rules in the superblock and those fields which differ * between every rule in the superblock. */ largest = 0; for (i = 0; i < PF_SKIP_COUNT; i++) { skiplist = TAILQ_FIRST(&block->sb_skipsteps[i]); if (skiplist->ps_count == rule_count) { DEBUG("(%d) original skipstep '%s' is all rules", depth, skip_comparitors_names[i]); skiplist->ps_count = 0; } else if (skiplist->ps_count == 1) { skiplist->ps_count = 0; } else { DEBUG("(%d) original skipstep '%s' largest jump is %d", depth, skip_comparitors_names[i], skiplist->ps_count); if (skiplist->ps_count > largest) largest = skiplist->ps_count; } } if (largest == 0) { /* Ugh. There is NO commonality in the superblock on which * optimize the skipsteps optimization. */ goto done; } /* * Now we're going to empty the superblock rule list and re-create * it based on a more optimal skipstep order. */ TAILQ_INIT(&head); while ((por = TAILQ_FIRST(&block->sb_rules))) { TAILQ_REMOVE(&block->sb_rules, por, por_entry); TAILQ_INSERT_TAIL(&head, por, por_entry); } while (!TAILQ_EMPTY(&head)) { largest = 1; /* * Find the most useful skip steps remaining */ for (i = 0; i < PF_SKIP_COUNT; i++) { skiplist = TAILQ_FIRST(&block->sb_skipsteps[i]); if (skiplist->ps_count > largest) { largest = skiplist->ps_count; largest_list = i; } } if (largest <= 1) { /* * Nothing useful left. Leave remaining rules in order. */ DEBUG("(%d) no more commonality for skip steps", depth); while ((por = TAILQ_FIRST(&head))) { TAILQ_REMOVE(&head, por, por_entry); TAILQ_INSERT_TAIL(&block->sb_rules, por, por_entry); } } else { /* * There is commonality. Extract those common rules * and place them in the ruleset adjacent to each * other. */ skiplist = TAILQ_FIRST(&block->sb_skipsteps[ largest_list]); DEBUG("(%d) skipstep '%s' largest jump is %d @ #%d", depth, skip_comparitors_names[largest_list], largest, TAILQ_FIRST(&TAILQ_FIRST(&block-> sb_skipsteps [largest_list])->ps_rules)-> por_rule.nr); TAILQ_REMOVE(&block->sb_skipsteps[largest_list], skiplist, ps_entry); /* * There may be further commonality inside these * rules. So we'll split them off into they're own * superblock and pass it back into the optimizer. */ if (skiplist->ps_count > 2) { if ((newblock = calloc(1, sizeof(*newblock))) == NULL) { warn("calloc"); return (1); } TAILQ_INIT(&newblock->sb_rules); for (i = 0; i < PF_SKIP_COUNT; i++) TAILQ_INIT(&newblock->sb_skipsteps[i]); TAILQ_INSERT_BEFORE(block, newblock, sb_entry); DEBUG("(%d) splitting off %d rules from superblock @ #%d", depth, skiplist->ps_count, TAILQ_FIRST(&skiplist->ps_rules)-> por_rule.nr); } else { newblock = block; } while ((por = TAILQ_FIRST(&skiplist->ps_rules))) { TAILQ_REMOVE(&head, por, por_entry); TAILQ_REMOVE(&skiplist->ps_rules, por, por_skip_entry[largest_list]); TAILQ_INSERT_TAIL(&newblock->sb_rules, por, por_entry); /* Remove this rule from all other skiplists */ remove_from_skipsteps(&block->sb_skipsteps[ largest_list], block, por, skiplist); } free(skiplist); if (newblock != block) if (reorder_rules(pf, newblock, depth + 1)) return (1); } } done: for (i = 0; i < PF_SKIP_COUNT; i++) { while ((skiplist = TAILQ_FIRST(&block->sb_skipsteps[i]))) { TAILQ_REMOVE(&block->sb_skipsteps[i], skiplist, ps_entry); free(skiplist); } } return (0); } /* * Optimization pass #4: re-order 'quick' rules based on feedback from the * currently running ruleset */ int block_feedback(struct pfctl *pf, struct superblock *block) { TAILQ_HEAD( , pf_opt_rule) queue; struct pf_opt_rule *por1, *por2; u_int64_t total_count = 0; struct pf_rule a, b; /* * Walk through all of the profiled superblock's rules and copy * the counters onto our rules. */ TAILQ_FOREACH(por1, &block->sb_profiled_block->sb_rules, por_entry) { comparable_rule(&a, &por1->por_rule, DC); total_count += por1->por_rule.packets[0] + por1->por_rule.packets[1]; TAILQ_FOREACH(por2, &block->sb_rules, por_entry) { if (por2->por_profile_count) continue; comparable_rule(&b, &por2->por_rule, DC); if (memcmp(&a, &b, sizeof(a)) == 0) { por2->por_profile_count = por1->por_rule.packets[0] + por1->por_rule.packets[1]; break; } } } superblock_free(pf, block->sb_profiled_block); block->sb_profiled_block = NULL; /* * Now we pull all of the rules off the superblock and re-insert them * in sorted order. */ TAILQ_INIT(&queue); while ((por1 = TAILQ_FIRST(&block->sb_rules)) != NULL) { TAILQ_REMOVE(&block->sb_rules, por1, por_entry); TAILQ_INSERT_TAIL(&queue, por1, por_entry); } while ((por1 = TAILQ_FIRST(&queue)) != NULL) { TAILQ_REMOVE(&queue, por1, por_entry); /* XXX I should sort all of the unused rules based on skip steps */ TAILQ_FOREACH(por2, &block->sb_rules, por_entry) { if (por1->por_profile_count > por2->por_profile_count) { TAILQ_INSERT_BEFORE(por2, por1, por_entry); break; } } #ifdef __FreeBSD__ if (por2 == NULL) #else if (por2 == TAILQ_END(&block->sb_rules)) #endif TAILQ_INSERT_TAIL(&block->sb_rules, por1, por_entry); } return (0); } /* * Load the current ruleset from the kernel and try to associate them with * the ruleset we're optimizing. */ int load_feedback_profile(struct pfctl *pf, struct superblocks *superblocks) { struct superblock *block, *blockcur; struct superblocks prof_superblocks; struct pf_opt_rule *por; struct pf_opt_queue queue; struct pfioc_rule pr; struct pf_rule a, b; int nr, mnr; TAILQ_INIT(&queue); TAILQ_INIT(&prof_superblocks); memset(&pr, 0, sizeof(pr)); pr.rule.action = PF_PASS; if (ioctl(pf->dev, DIOCGETRULES, &pr)) { warn("DIOCGETRULES"); return (1); } mnr = pr.nr; DEBUG("Loading %d active rules for a feedback profile", mnr); for (nr = 0; nr < mnr; ++nr) { struct pf_ruleset *rs; if ((por = calloc(1, sizeof(*por))) == NULL) { warn("calloc"); return (1); } pr.nr = nr; if (ioctl(pf->dev, DIOCGETRULE, &pr)) { warn("DIOCGETRULES"); return (1); } memcpy(&por->por_rule, &pr.rule, sizeof(por->por_rule)); rs = pf_find_or_create_ruleset(pr.anchor_call); por->por_rule.anchor = rs->anchor; if (TAILQ_EMPTY(&por->por_rule.rpool.list)) memset(&por->por_rule.rpool, 0, sizeof(por->por_rule.rpool)); TAILQ_INSERT_TAIL(&queue, por, por_entry); /* XXX pfctl_get_pool(pf->dev, &pr.rule.rpool, nr, pr.ticket, * PF_PASS, pf->anchor) ??? * ... pfctl_clear_pool(&pr.rule.rpool) */ } if (construct_superblocks(pf, &queue, &prof_superblocks)) return (1); /* * Now we try to associate the active ruleset's superblocks with * the superblocks we're compiling. */ block = TAILQ_FIRST(superblocks); blockcur = TAILQ_FIRST(&prof_superblocks); while (block && blockcur) { comparable_rule(&a, &TAILQ_FIRST(&block->sb_rules)->por_rule, BREAK); comparable_rule(&b, &TAILQ_FIRST(&blockcur->sb_rules)->por_rule, BREAK); if (memcmp(&a, &b, sizeof(a)) == 0) { /* The two superblocks lined up */ block->sb_profiled_block = blockcur; } else { DEBUG("superblocks don't line up between #%d and #%d", TAILQ_FIRST(&block->sb_rules)->por_rule.nr, TAILQ_FIRST(&blockcur->sb_rules)->por_rule.nr); break; } block = TAILQ_NEXT(block, sb_entry); blockcur = TAILQ_NEXT(blockcur, sb_entry); } /* Free any superblocks we couldn't link */ while (blockcur) { block = TAILQ_NEXT(blockcur, sb_entry); superblock_free(pf, blockcur); blockcur = block; } return (0); } /* * Compare a rule to a skiplist to see if the rule is a member */ int skip_compare(int skipnum, struct pf_skip_step *skiplist, struct pf_opt_rule *por) { struct pf_rule *a, *b; if (skipnum >= PF_SKIP_COUNT || skipnum < 0) errx(1, "skip_compare() out of bounds"); a = &por->por_rule; b = &TAILQ_FIRST(&skiplist->ps_rules)->por_rule; return ((skip_comparitors[skipnum])(a, b)); } /* * Add a rule to a skiplist */ void skip_append(struct superblock *superblock, int skipnum, struct pf_skip_step *skiplist, struct pf_opt_rule *por) { struct pf_skip_step *prev; skiplist->ps_count++; TAILQ_INSERT_TAIL(&skiplist->ps_rules, por, por_skip_entry[skipnum]); /* Keep the list of skiplists sorted by whichever is larger */ while ((prev = TAILQ_PREV(skiplist, skiplist, ps_entry)) && prev->ps_count < skiplist->ps_count) { TAILQ_REMOVE(&superblock->sb_skipsteps[skipnum], skiplist, ps_entry); TAILQ_INSERT_BEFORE(prev, skiplist, ps_entry); } } /* * Remove a rule from the other skiplist calculations. */ void remove_from_skipsteps(struct skiplist *head, struct superblock *block, struct pf_opt_rule *por, struct pf_skip_step *active_list) { struct pf_skip_step *sk, *next; struct pf_opt_rule *p2; int i, found; for (i = 0; i < PF_SKIP_COUNT; i++) { sk = TAILQ_FIRST(&block->sb_skipsteps[i]); if (sk == NULL || sk == active_list || sk->ps_count <= 1) continue; found = 0; do { TAILQ_FOREACH(p2, &sk->ps_rules, por_skip_entry[i]) if (p2 == por) { TAILQ_REMOVE(&sk->ps_rules, p2, por_skip_entry[i]); found = 1; sk->ps_count--; break; } } while (!found && (sk = TAILQ_NEXT(sk, ps_entry))); if (found && sk) { /* Does this change the sorting order? */ while ((next = TAILQ_NEXT(sk, ps_entry)) && next->ps_count > sk->ps_count) { TAILQ_REMOVE(head, sk, ps_entry); TAILQ_INSERT_AFTER(head, next, sk, ps_entry); } #ifdef OPT_DEBUG next = TAILQ_NEXT(sk, ps_entry); assert(next == NULL || next->ps_count <= sk->ps_count); #endif /* OPT_DEBUG */ } } } /* Compare two rules AF field for skiplist construction */ int skip_cmp_af(struct pf_rule *a, struct pf_rule *b) { if (a->af != b->af || a->af == 0) return (1); return (0); } /* Compare two rules DIRECTION field for skiplist construction */ int skip_cmp_dir(struct pf_rule *a, struct pf_rule *b) { if (a->direction == 0 || a->direction != b->direction) return (1); return (0); } /* Compare two rules DST Address field for skiplist construction */ int skip_cmp_dst_addr(struct pf_rule *a, struct pf_rule *b) { if (a->dst.neg != b->dst.neg || a->dst.addr.type != b->dst.addr.type) return (1); /* XXX if (a->proto != b->proto && a->proto != 0 && b->proto != 0 * && (a->proto == IPPROTO_TCP || a->proto == IPPROTO_UDP || * a->proto == IPPROTO_ICMP * return (1); */ switch (a->dst.addr.type) { case PF_ADDR_ADDRMASK: if (memcmp(&a->dst.addr.v.a.addr, &b->dst.addr.v.a.addr, sizeof(a->dst.addr.v.a.addr)) || memcmp(&a->dst.addr.v.a.mask, &b->dst.addr.v.a.mask, sizeof(a->dst.addr.v.a.mask)) || (a->dst.addr.v.a.addr.addr32[0] == 0 && a->dst.addr.v.a.addr.addr32[1] == 0 && a->dst.addr.v.a.addr.addr32[2] == 0 && a->dst.addr.v.a.addr.addr32[3] == 0)) return (1); return (0); case PF_ADDR_DYNIFTL: if (strcmp(a->dst.addr.v.ifname, b->dst.addr.v.ifname) != 0 || a->dst.addr.iflags != a->dst.addr.iflags || memcmp(&a->dst.addr.v.a.mask, &b->dst.addr.v.a.mask, sizeof(a->dst.addr.v.a.mask))) return (1); return (0); case PF_ADDR_NOROUTE: case PF_ADDR_URPFFAILED: return (0); case PF_ADDR_TABLE: return (strcmp(a->dst.addr.v.tblname, b->dst.addr.v.tblname)); } return (1); } /* Compare two rules DST port field for skiplist construction */ int skip_cmp_dst_port(struct pf_rule *a, struct pf_rule *b) { /* XXX if (a->proto != b->proto && a->proto != 0 && b->proto != 0 * && (a->proto == IPPROTO_TCP || a->proto == IPPROTO_UDP || * a->proto == IPPROTO_ICMP * return (1); */ if (a->dst.port_op == PF_OP_NONE || a->dst.port_op != b->dst.port_op || a->dst.port[0] != b->dst.port[0] || a->dst.port[1] != b->dst.port[1]) return (1); return (0); } /* Compare two rules IFP field for skiplist construction */ int skip_cmp_ifp(struct pf_rule *a, struct pf_rule *b) { if (strcmp(a->ifname, b->ifname) || a->ifname[0] == '\0') return (1); return (a->ifnot != b->ifnot); } /* Compare two rules PROTO field for skiplist construction */ int skip_cmp_proto(struct pf_rule *a, struct pf_rule *b) { return (a->proto != b->proto || a->proto == 0); } /* Compare two rules SRC addr field for skiplist construction */ int skip_cmp_src_addr(struct pf_rule *a, struct pf_rule *b) { if (a->src.neg != b->src.neg || a->src.addr.type != b->src.addr.type) return (1); /* XXX if (a->proto != b->proto && a->proto != 0 && b->proto != 0 * && (a->proto == IPPROTO_TCP || a->proto == IPPROTO_UDP || * a->proto == IPPROTO_ICMP * return (1); */ switch (a->src.addr.type) { case PF_ADDR_ADDRMASK: if (memcmp(&a->src.addr.v.a.addr, &b->src.addr.v.a.addr, sizeof(a->src.addr.v.a.addr)) || memcmp(&a->src.addr.v.a.mask, &b->src.addr.v.a.mask, sizeof(a->src.addr.v.a.mask)) || (a->src.addr.v.a.addr.addr32[0] == 0 && a->src.addr.v.a.addr.addr32[1] == 0 && a->src.addr.v.a.addr.addr32[2] == 0 && a->src.addr.v.a.addr.addr32[3] == 0)) return (1); return (0); case PF_ADDR_DYNIFTL: if (strcmp(a->src.addr.v.ifname, b->src.addr.v.ifname) != 0 || a->src.addr.iflags != a->src.addr.iflags || memcmp(&a->src.addr.v.a.mask, &b->src.addr.v.a.mask, sizeof(a->src.addr.v.a.mask))) return (1); return (0); case PF_ADDR_NOROUTE: case PF_ADDR_URPFFAILED: return (0); case PF_ADDR_TABLE: return (strcmp(a->src.addr.v.tblname, b->src.addr.v.tblname)); } return (1); } /* Compare two rules SRC port field for skiplist construction */ int skip_cmp_src_port(struct pf_rule *a, struct pf_rule *b) { if (a->src.port_op == PF_OP_NONE || a->src.port_op != b->src.port_op || a->src.port[0] != b->src.port[0] || a->src.port[1] != b->src.port[1]) return (1); /* XXX if (a->proto != b->proto && a->proto != 0 && b->proto != 0 * && (a->proto == IPPROTO_TCP || a->proto == IPPROTO_UDP || * a->proto == IPPROTO_ICMP * return (1); */ return (0); } void skip_init(void) { struct { char *name; int skipnum; int (*func)(struct pf_rule *, struct pf_rule *); } comps[] = PF_SKIP_COMPARITORS; int skipnum, i; for (skipnum = 0; skipnum < PF_SKIP_COUNT; skipnum++) { for (i = 0; i < sizeof(comps)/sizeof(*comps); i++) if (comps[i].skipnum == skipnum) { skip_comparitors[skipnum] = comps[i].func; skip_comparitors_names[skipnum] = comps[i].name; } } for (skipnum = 0; skipnum < PF_SKIP_COUNT; skipnum++) if (skip_comparitors[skipnum] == NULL) errx(1, "Need to add skip step comparitor to pfctl?!"); } /* * Add a host/netmask to a table */ int add_opt_table(struct pfctl *pf, struct pf_opt_tbl **tbl, sa_family_t af, struct pf_rule_addr *addr) { #ifdef OPT_DEBUG char buf[128]; #endif /* OPT_DEBUG */ static int tablenum = 0; struct node_host node_host; if (*tbl == NULL) { if ((*tbl = calloc(1, sizeof(**tbl))) == NULL || ((*tbl)->pt_buf = calloc(1, sizeof(*(*tbl)->pt_buf))) == NULL) err(1, "calloc"); (*tbl)->pt_buf->pfrb_type = PFRB_ADDRS; SIMPLEQ_INIT(&(*tbl)->pt_nodes); /* This is just a temporary table name */ snprintf((*tbl)->pt_name, sizeof((*tbl)->pt_name), "%s%d", PF_OPT_TABLE_PREFIX, tablenum++); DEBUG("creating table <%s>", (*tbl)->pt_name); } memset(&node_host, 0, sizeof(node_host)); node_host.af = af; node_host.addr = addr->addr; #ifdef OPT_DEBUG DEBUG("<%s> adding %s/%d", (*tbl)->pt_name, inet_ntop(af, &node_host.addr.v.a.addr, buf, sizeof(buf)), unmask(&node_host.addr.v.a.mask, af)); #endif /* OPT_DEBUG */ if (append_addr_host((*tbl)->pt_buf, &node_host, 0, 0)) { warn("failed to add host"); return (1); } if (pf->opts & PF_OPT_VERBOSE) { struct node_tinit *ti; if ((ti = calloc(1, sizeof(*ti))) == NULL) err(1, "malloc"); if ((ti->host = malloc(sizeof(*ti->host))) == NULL) err(1, "malloc"); memcpy(ti->host, &node_host, sizeof(*ti->host)); SIMPLEQ_INSERT_TAIL(&(*tbl)->pt_nodes, ti, entries); } (*tbl)->pt_rulecount++; if ((*tbl)->pt_rulecount == TABLE_THRESHOLD) DEBUG("table <%s> now faster than skip steps", (*tbl)->pt_name); return (0); } /* * Do the dirty work of choosing an unused table name and creating it. * (be careful with the table name, it might already be used in another anchor) */ int pf_opt_create_table(struct pfctl *pf, struct pf_opt_tbl *tbl) { static int tablenum; struct pfr_table *t; if (table_buffer.pfrb_type == 0) { /* Initialize the list of tables */ table_buffer.pfrb_type = PFRB_TABLES; for (;;) { pfr_buf_grow(&table_buffer, table_buffer.pfrb_size); table_buffer.pfrb_size = table_buffer.pfrb_msize; if (pfr_get_tables(NULL, table_buffer.pfrb_caddr, &table_buffer.pfrb_size, PFR_FLAG_ALLRSETS)) err(1, "pfr_get_tables"); if (table_buffer.pfrb_size <= table_buffer.pfrb_msize) break; } table_identifier = arc4random(); } /* XXX would be *really* nice to avoid duplicating identical tables */ /* Now we have to pick a table name that isn't used */ again: DEBUG("translating temporary table <%s> to <%s%x_%d>", tbl->pt_name, PF_OPT_TABLE_PREFIX, table_identifier, tablenum); snprintf(tbl->pt_name, sizeof(tbl->pt_name), "%s%x_%d", PF_OPT_TABLE_PREFIX, table_identifier, tablenum); PFRB_FOREACH(t, &table_buffer) { if (strcasecmp(t->pfrt_name, tbl->pt_name) == 0) { /* Collision. Try again */ DEBUG("wow, table <%s> in use. trying again", tbl->pt_name); table_identifier = arc4random(); goto again; } } tablenum++; if (pfctl_define_table(tbl->pt_name, PFR_TFLAG_CONST, 1, pf->astack[0]->name, tbl->pt_buf, pf->astack[0]->ruleset.tticket)) { warn("failed to create table %s in %s", tbl->pt_name, pf->astack[0]->name); return (1); } return (0); } /* * Partition the flat ruleset into a list of distinct superblocks */ int construct_superblocks(struct pfctl *pf, struct pf_opt_queue *opt_queue, struct superblocks *superblocks) { struct superblock *block = NULL; struct pf_opt_rule *por; int i; while (!TAILQ_EMPTY(opt_queue)) { por = TAILQ_FIRST(opt_queue); TAILQ_REMOVE(opt_queue, por, por_entry); if (block == NULL || !superblock_inclusive(block, por)) { if ((block = calloc(1, sizeof(*block))) == NULL) { warn("calloc"); return (1); } TAILQ_INIT(&block->sb_rules); for (i = 0; i < PF_SKIP_COUNT; i++) TAILQ_INIT(&block->sb_skipsteps[i]); TAILQ_INSERT_TAIL(superblocks, block, sb_entry); } TAILQ_INSERT_TAIL(&block->sb_rules, por, por_entry); } return (0); } /* * Compare two rule addresses */ int addrs_equal(struct pf_rule_addr *a, struct pf_rule_addr *b) { if (a->neg != b->neg) return (0); return (memcmp(&a->addr, &b->addr, sizeof(a->addr)) == 0); } /* * The addresses are not equal, but can we combine them into one table? */ int addrs_combineable(struct pf_rule_addr *a, struct pf_rule_addr *b) { if (a->addr.type != PF_ADDR_ADDRMASK || b->addr.type != PF_ADDR_ADDRMASK) return (0); if (a->neg != b->neg || a->port_op != b->port_op || a->port[0] != b->port[0] || a->port[1] != b->port[1]) return (0); return (1); } /* * Are we allowed to combine these two rules */ int rules_combineable(struct pf_rule *p1, struct pf_rule *p2) { struct pf_rule a, b; comparable_rule(&a, p1, COMBINED); comparable_rule(&b, p2, COMBINED); return (memcmp(&a, &b, sizeof(a)) == 0); } /* * Can a rule be included inside a superblock */ int superblock_inclusive(struct superblock *block, struct pf_opt_rule *por) { struct pf_rule a, b; int i, j; /* First check for hard breaks */ for (i = 0; i < sizeof(pf_rule_desc)/sizeof(*pf_rule_desc); i++) { if (pf_rule_desc[i].prf_type == BARRIER) { for (j = 0; j < pf_rule_desc[i].prf_size; j++) if (((char *)&por->por_rule)[j + pf_rule_desc[i].prf_offset] != 0) return (0); } } /* per-rule src-track is also a hard break */ if (por->por_rule.rule_flag & PFRULE_RULESRCTRACK) return (0); /* * Have to handle interface groups separately. Consider the following * rules: * block on EXTIFS to any port 22 * pass on em0 to any port 22 * (where EXTIFS is an arbitrary interface group) * The optimizer may decide to re-order the pass rule in front of the * block rule. But what if EXTIFS includes em0??? Such a reordering * would change the meaning of the ruleset. * We can't just lookup the EXTIFS group and check if em0 is a member * because the user is allowed to add interfaces to a group during * runtime. * Ergo interface groups become a defacto superblock break :-( */ if (interface_group(por->por_rule.ifname) || interface_group(TAILQ_FIRST(&block->sb_rules)->por_rule.ifname)) { if (strcasecmp(por->por_rule.ifname, TAILQ_FIRST(&block->sb_rules)->por_rule.ifname) != 0) return (0); } comparable_rule(&a, &TAILQ_FIRST(&block->sb_rules)->por_rule, NOMERGE); comparable_rule(&b, &por->por_rule, NOMERGE); if (memcmp(&a, &b, sizeof(a)) == 0) return (1); #ifdef OPT_DEBUG for (i = 0; i < sizeof(por->por_rule); i++) { int closest = -1; if (((u_int8_t *)&a)[i] != ((u_int8_t *)&b)[i]) { for (j = 0; j < sizeof(pf_rule_desc) / sizeof(*pf_rule_desc); j++) { if (i >= pf_rule_desc[j].prf_offset && i < pf_rule_desc[j].prf_offset + pf_rule_desc[j].prf_size) { DEBUG("superblock break @ %d due to %s", por->por_rule.nr, pf_rule_desc[j].prf_name); return (0); } if (i > pf_rule_desc[j].prf_offset) { if (closest == -1 || i-pf_rule_desc[j].prf_offset < i-pf_rule_desc[closest].prf_offset) closest = j; } } if (closest >= 0) DEBUG("superblock break @ %d on %s+%xh", por->por_rule.nr, pf_rule_desc[closest].prf_name, i - pf_rule_desc[closest].prf_offset - pf_rule_desc[closest].prf_size); else DEBUG("superblock break @ %d on field @ %d", por->por_rule.nr, i); return (0); } } #endif /* OPT_DEBUG */ return (0); } /* * Figure out if an interface name is an actual interface or actually a * group of interfaces. */ int interface_group(const char *ifname) { if (ifname == NULL || !ifname[0]) return (0); /* Real interfaces must end in a number, interface groups do not */ if (isdigit(ifname[strlen(ifname) - 1])) return (0); else return (1); } /* * Make a rule that can directly compared by memcmp() */ void comparable_rule(struct pf_rule *dst, const struct pf_rule *src, int type) { int i; /* * To simplify the comparison, we just zero out the fields that are * allowed to be different and then do a simple memcmp() */ memcpy(dst, src, sizeof(*dst)); for (i = 0; i < sizeof(pf_rule_desc)/sizeof(*pf_rule_desc); i++) if (pf_rule_desc[i].prf_type >= type) { #ifdef OPT_DEBUG assert(pf_rule_desc[i].prf_type != NEVER || *(((char *)dst) + pf_rule_desc[i].prf_offset) == 0); #endif /* OPT_DEBUG */ memset(((char *)dst) + pf_rule_desc[i].prf_offset, 0, pf_rule_desc[i].prf_size); } } /* * Remove superset information from two rules so we can directly compare them * with memcmp() */ void exclude_supersets(struct pf_rule *super, struct pf_rule *sub) { if (super->ifname[0] == '\0') memset(sub->ifname, 0, sizeof(sub->ifname)); if (super->direction == PF_INOUT) sub->direction = PF_INOUT; if ((super->proto == 0 || super->proto == sub->proto) && super->flags == 0 && super->flagset == 0 && (sub->flags || sub->flagset)) { sub->flags = super->flags; sub->flagset = super->flagset; } if (super->proto == 0) sub->proto = 0; if (super->src.port_op == 0) { sub->src.port_op = 0; sub->src.port[0] = 0; sub->src.port[1] = 0; } if (super->dst.port_op == 0) { sub->dst.port_op = 0; sub->dst.port[0] = 0; sub->dst.port[1] = 0; } if (super->src.addr.type == PF_ADDR_ADDRMASK && !super->src.neg && !sub->src.neg && super->src.addr.v.a.mask.addr32[0] == 0 && super->src.addr.v.a.mask.addr32[1] == 0 && super->src.addr.v.a.mask.addr32[2] == 0 && super->src.addr.v.a.mask.addr32[3] == 0) memset(&sub->src.addr, 0, sizeof(sub->src.addr)); else if (super->src.addr.type == PF_ADDR_ADDRMASK && sub->src.addr.type == PF_ADDR_ADDRMASK && super->src.neg == sub->src.neg && super->af == sub->af && unmask(&super->src.addr.v.a.mask, super->af) < unmask(&sub->src.addr.v.a.mask, sub->af) && super->src.addr.v.a.addr.addr32[0] == (sub->src.addr.v.a.addr.addr32[0] & super->src.addr.v.a.mask.addr32[0]) && super->src.addr.v.a.addr.addr32[1] == (sub->src.addr.v.a.addr.addr32[1] & super->src.addr.v.a.mask.addr32[1]) && super->src.addr.v.a.addr.addr32[2] == (sub->src.addr.v.a.addr.addr32[2] & super->src.addr.v.a.mask.addr32[2]) && super->src.addr.v.a.addr.addr32[3] == (sub->src.addr.v.a.addr.addr32[3] & super->src.addr.v.a.mask.addr32[3])) { /* sub->src.addr is a subset of super->src.addr/mask */ memcpy(&sub->src.addr, &super->src.addr, sizeof(sub->src.addr)); } if (super->dst.addr.type == PF_ADDR_ADDRMASK && !super->dst.neg && !sub->dst.neg && super->dst.addr.v.a.mask.addr32[0] == 0 && super->dst.addr.v.a.mask.addr32[1] == 0 && super->dst.addr.v.a.mask.addr32[2] == 0 && super->dst.addr.v.a.mask.addr32[3] == 0) memset(&sub->dst.addr, 0, sizeof(sub->dst.addr)); else if (super->dst.addr.type == PF_ADDR_ADDRMASK && sub->dst.addr.type == PF_ADDR_ADDRMASK && super->dst.neg == sub->dst.neg && super->af == sub->af && unmask(&super->dst.addr.v.a.mask, super->af) < unmask(&sub->dst.addr.v.a.mask, sub->af) && super->dst.addr.v.a.addr.addr32[0] == (sub->dst.addr.v.a.addr.addr32[0] & super->dst.addr.v.a.mask.addr32[0]) && super->dst.addr.v.a.addr.addr32[1] == (sub->dst.addr.v.a.addr.addr32[1] & super->dst.addr.v.a.mask.addr32[1]) && super->dst.addr.v.a.addr.addr32[2] == (sub->dst.addr.v.a.addr.addr32[2] & super->dst.addr.v.a.mask.addr32[2]) && super->dst.addr.v.a.addr.addr32[3] == (sub->dst.addr.v.a.addr.addr32[3] & super->dst.addr.v.a.mask.addr32[3])) { /* sub->dst.addr is a subset of super->dst.addr/mask */ memcpy(&sub->dst.addr, &super->dst.addr, sizeof(sub->dst.addr)); } if (super->af == 0) sub->af = 0; } void superblock_free(struct pfctl *pf, struct superblock *block) { struct pf_opt_rule *por; while ((por = TAILQ_FIRST(&block->sb_rules))) { TAILQ_REMOVE(&block->sb_rules, por, por_entry); if (por->por_src_tbl) { if (por->por_src_tbl->pt_buf) { pfr_buf_clear(por->por_src_tbl->pt_buf); free(por->por_src_tbl->pt_buf); } free(por->por_src_tbl); } if (por->por_dst_tbl) { if (por->por_dst_tbl->pt_buf) { pfr_buf_clear(por->por_dst_tbl->pt_buf); free(por->por_dst_tbl->pt_buf); } free(por->por_dst_tbl); } free(por); } if (block->sb_profiled_block) superblock_free(pf, block->sb_profiled_block); free(block); } Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_osfp.c =================================================================== --- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_osfp.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_osfp.c (revision 303775) @@ -1,1108 +1,1111 @@ /* $OpenBSD: pfctl_osfp.c,v 1.14 2006/04/08 02:13:14 ray Exp $ */ /* * Copyright (c) 2003 Mike Frantzen * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ +#include +__FBSDID("$FreeBSD$"); + #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pfctl_parser.h" #include "pfctl.h" #ifndef MIN # define MIN(a,b) (((a) < (b)) ? (a) : (b)) #endif /* MIN */ #ifndef MAX # define MAX(a,b) (((a) > (b)) ? (a) : (b)) #endif /* MAX */ #if 0 # define DEBUG(fp, str, v...) \ fprintf(stderr, "%s:%s:%s " str "\n", (fp)->fp_os.fp_class_nm, \ (fp)->fp_os.fp_version_nm, (fp)->fp_os.fp_subtype_nm , ## v); #else # define DEBUG(fp, str, v...) ((void)0) #endif struct name_entry; LIST_HEAD(name_list, name_entry); struct name_entry { LIST_ENTRY(name_entry) nm_entry; int nm_num; char nm_name[PF_OSFP_LEN]; struct name_list nm_sublist; int nm_sublist_num; }; -struct name_list classes = LIST_HEAD_INITIALIZER(&classes); -int class_count; -int fingerprint_count; +static struct name_list classes = LIST_HEAD_INITIALIZER(&classes); +static int class_count; +static int fingerprint_count; void add_fingerprint(int, int, struct pf_osfp_ioctl *); struct name_entry *fingerprint_name_entry(struct name_list *, char *); void pfctl_flush_my_fingerprints(struct name_list *); char *get_field(char **, size_t *, int *); int get_int(char **, size_t *, int *, int *, const char *, int, int, const char *, int); int get_str(char **, size_t *, char **, const char *, int, const char *, int); int get_tcpopts(const char *, int, const char *, pf_tcpopts_t *, int *, int *, int *, int *, int *, int *); void import_fingerprint(struct pf_osfp_ioctl *); const char *print_ioctl(struct pf_osfp_ioctl *); void print_name_list(int, struct name_list *, const char *); void sort_name_list(int, struct name_list *); struct name_entry *lookup_name_list(struct name_list *, const char *); /* Load fingerprints from a file */ int pfctl_file_fingerprints(int dev, int opts, const char *fp_filename) { FILE *in; char *line; size_t len; int i, lineno = 0; int window, w_mod, ttl, df, psize, p_mod, mss, mss_mod, wscale, wscale_mod, optcnt, ts0; pf_tcpopts_t packed_tcpopts; char *class, *version, *subtype, *desc, *tcpopts; struct pf_osfp_ioctl fp; pfctl_flush_my_fingerprints(&classes); if ((in = pfctl_fopen(fp_filename, "r")) == NULL) { warn("%s", fp_filename); return (1); } class = version = subtype = desc = tcpopts = NULL; if ((opts & PF_OPT_NOACTION) == 0) pfctl_clear_fingerprints(dev, opts); while ((line = fgetln(in, &len)) != NULL) { lineno++; if (class) free(class); if (version) free(version); if (subtype) free(subtype); if (desc) free(desc); if (tcpopts) free(tcpopts); class = version = subtype = desc = tcpopts = NULL; memset(&fp, 0, sizeof(fp)); /* Chop off comment */ for (i = 0; i < len; i++) if (line[i] == '#') { len = i; break; } /* Chop off whitespace */ while (len > 0 && isspace(line[len - 1])) len--; while (len > 0 && isspace(line[0])) { len--; line++; } if (len == 0) continue; #define T_DC 0x01 /* Allow don't care */ #define T_MSS 0x02 /* Allow MSS multiple */ #define T_MTU 0x04 /* Allow MTU multiple */ #define T_MOD 0x08 /* Allow modulus */ #define GET_INT(v, mod, n, ty, mx) \ get_int(&line, &len, &v, mod, n, ty, mx, fp_filename, lineno) #define GET_STR(v, n, mn) \ get_str(&line, &len, &v, n, mn, fp_filename, lineno) if (GET_INT(window, &w_mod, "window size", T_DC|T_MSS|T_MTU| T_MOD, 0xffff) || GET_INT(ttl, NULL, "ttl", 0, 0xff) || GET_INT(df, NULL, "don't fragment frag", 0, 1) || GET_INT(psize, &p_mod, "overall packet size", T_MOD|T_DC, 8192) || GET_STR(tcpopts, "TCP Options", 1) || GET_STR(class, "OS class", 1) || GET_STR(version, "OS version", 0) || GET_STR(subtype, "OS subtype", 0) || GET_STR(desc, "OS description", 2)) continue; if (get_tcpopts(fp_filename, lineno, tcpopts, &packed_tcpopts, &optcnt, &mss, &mss_mod, &wscale, &wscale_mod, &ts0)) continue; if (len != 0) { fprintf(stderr, "%s:%d excess field\n", fp_filename, lineno); continue; } fp.fp_ttl = ttl; if (df) fp.fp_flags |= PF_OSFP_DF; switch (w_mod) { case 0: break; case T_DC: fp.fp_flags |= PF_OSFP_WSIZE_DC; break; case T_MSS: fp.fp_flags |= PF_OSFP_WSIZE_MSS; break; case T_MTU: fp.fp_flags |= PF_OSFP_WSIZE_MTU; break; case T_MOD: fp.fp_flags |= PF_OSFP_WSIZE_MOD; break; } fp.fp_wsize = window; switch (p_mod) { case T_DC: fp.fp_flags |= PF_OSFP_PSIZE_DC; break; case T_MOD: fp.fp_flags |= PF_OSFP_PSIZE_MOD; } fp.fp_psize = psize; switch (wscale_mod) { case T_DC: fp.fp_flags |= PF_OSFP_WSCALE_DC; break; case T_MOD: fp.fp_flags |= PF_OSFP_WSCALE_MOD; } fp.fp_wscale = wscale; switch (mss_mod) { case T_DC: fp.fp_flags |= PF_OSFP_MSS_DC; break; case T_MOD: fp.fp_flags |= PF_OSFP_MSS_MOD; break; } fp.fp_mss = mss; fp.fp_tcpopts = packed_tcpopts; fp.fp_optcnt = optcnt; if (ts0) fp.fp_flags |= PF_OSFP_TS0; if (class[0] == '@') fp.fp_os.fp_enflags |= PF_OSFP_GENERIC; if (class[0] == '*') fp.fp_os.fp_enflags |= PF_OSFP_NODETAIL; if (class[0] == '@' || class[0] == '*') strlcpy(fp.fp_os.fp_class_nm, class + 1, sizeof(fp.fp_os.fp_class_nm)); else strlcpy(fp.fp_os.fp_class_nm, class, sizeof(fp.fp_os.fp_class_nm)); strlcpy(fp.fp_os.fp_version_nm, version, sizeof(fp.fp_os.fp_version_nm)); strlcpy(fp.fp_os.fp_subtype_nm, subtype, sizeof(fp.fp_os.fp_subtype_nm)); add_fingerprint(dev, opts, &fp); fp.fp_flags |= (PF_OSFP_DF | PF_OSFP_INET6); fp.fp_psize += sizeof(struct ip6_hdr) - sizeof(struct ip); add_fingerprint(dev, opts, &fp); } if (class) free(class); if (version) free(version); if (subtype) free(subtype); if (desc) free(desc); if (tcpopts) free(tcpopts); fclose(in); if (opts & PF_OPT_VERBOSE2) printf("Loaded %d passive OS fingerprints\n", fingerprint_count); return (0); } /* flush the kernel's fingerprints */ void pfctl_clear_fingerprints(int dev, int opts) { if (ioctl(dev, DIOCOSFPFLUSH)) err(1, "DIOCOSFPFLUSH"); } /* flush pfctl's view of the fingerprints */ void pfctl_flush_my_fingerprints(struct name_list *list) { struct name_entry *nm; while ((nm = LIST_FIRST(list)) != NULL) { LIST_REMOVE(nm, nm_entry); pfctl_flush_my_fingerprints(&nm->nm_sublist); free(nm); } fingerprint_count = 0; class_count = 0; } /* Fetch the active fingerprints from the kernel */ int pfctl_load_fingerprints(int dev, int opts) { struct pf_osfp_ioctl io; int i; pfctl_flush_my_fingerprints(&classes); for (i = 0; i >= 0; i++) { memset(&io, 0, sizeof(io)); io.fp_getnum = i; if (ioctl(dev, DIOCOSFPGET, &io)) { if (errno == EBUSY) break; warn("DIOCOSFPGET"); return (1); } import_fingerprint(&io); } return (0); } /* List the fingerprints */ void pfctl_show_fingerprints(int opts) { if (LIST_FIRST(&classes) != NULL) { if (opts & PF_OPT_SHOWALL) { pfctl_print_title("OS FINGERPRINTS:"); printf("%u fingerprints loaded\n", fingerprint_count); } else { printf("Class\tVersion\tSubtype(subversion)\n"); printf("-----\t-------\t-------------------\n"); sort_name_list(opts, &classes); print_name_list(opts, &classes, ""); } } } /* Lookup a fingerprint */ pf_osfp_t pfctl_get_fingerprint(const char *name) { struct name_entry *nm, *class_nm, *version_nm, *subtype_nm; pf_osfp_t ret = PF_OSFP_NOMATCH; int class, version, subtype; int unp_class, unp_version, unp_subtype; int wr_len, version_len, subtype_len; char *ptr, *wr_name; if (strcasecmp(name, "unknown") == 0) return (PF_OSFP_UNKNOWN); /* Try most likely no version and no subtype */ if ((nm = lookup_name_list(&classes, name))) { class = nm->nm_num; version = PF_OSFP_ANY; subtype = PF_OSFP_ANY; goto found; } else { /* Chop it up into class/version/subtype */ if ((wr_name = strdup(name)) == NULL) err(1, "malloc"); if ((ptr = strchr(wr_name, ' ')) == NULL) { free(wr_name); return (PF_OSFP_NOMATCH); } *ptr++ = '\0'; /* The class is easy to find since it is delimited by a space */ if ((class_nm = lookup_name_list(&classes, wr_name)) == NULL) { free(wr_name); return (PF_OSFP_NOMATCH); } class = class_nm->nm_num; /* Try no subtype */ if ((version_nm = lookup_name_list(&class_nm->nm_sublist, ptr))) { version = version_nm->nm_num; subtype = PF_OSFP_ANY; free(wr_name); goto found; } /* * There must be a version and a subtype. * We'll do some fuzzy matching to pick up things like: * Linux 2.2.14 (version=2.2 subtype=14) * FreeBSD 4.0-STABLE (version=4.0 subtype=STABLE) * Windows 2000 SP2 (version=2000 subtype=SP2) */ #define CONNECTOR(x) ((x) == '.' || (x) == ' ' || (x) == '\t' || (x) == '-') wr_len = strlen(ptr); LIST_FOREACH(version_nm, &class_nm->nm_sublist, nm_entry) { version_len = strlen(version_nm->nm_name); if (wr_len < version_len + 2 || !CONNECTOR(ptr[version_len])) continue; /* first part of the string must be version */ if (strncasecmp(ptr, version_nm->nm_name, version_len)) continue; LIST_FOREACH(subtype_nm, &version_nm->nm_sublist, nm_entry) { subtype_len = strlen(subtype_nm->nm_name); if (wr_len != version_len + subtype_len + 1) continue; /* last part of the string must be subtype */ if (strcasecmp(&ptr[version_len+1], subtype_nm->nm_name) != 0) continue; /* Found it!! */ version = version_nm->nm_num; subtype = subtype_nm->nm_num; free(wr_name); goto found; } } free(wr_name); return (PF_OSFP_NOMATCH); } found: PF_OSFP_PACK(ret, class, version, subtype); if (ret != PF_OSFP_NOMATCH) { PF_OSFP_UNPACK(ret, unp_class, unp_version, unp_subtype); if (class != unp_class) { fprintf(stderr, "warning: fingerprint table overflowed " "classes\n"); return (PF_OSFP_NOMATCH); } if (version != unp_version) { fprintf(stderr, "warning: fingerprint table overflowed " "versions\n"); return (PF_OSFP_NOMATCH); } if (subtype != unp_subtype) { fprintf(stderr, "warning: fingerprint table overflowed " "subtypes\n"); return (PF_OSFP_NOMATCH); } } if (ret == PF_OSFP_ANY) { /* should never happen */ fprintf(stderr, "warning: fingerprint packed to 'any'\n"); return (PF_OSFP_NOMATCH); } return (ret); } /* Lookup a fingerprint name by ID */ char * pfctl_lookup_fingerprint(pf_osfp_t fp, char *buf, size_t len) { int class, version, subtype; struct name_list *list; struct name_entry *nm; char *class_name, *version_name, *subtype_name; class_name = version_name = subtype_name = NULL; if (fp == PF_OSFP_UNKNOWN) { strlcpy(buf, "unknown", len); return (buf); } if (fp == PF_OSFP_ANY) { strlcpy(buf, "any", len); return (buf); } PF_OSFP_UNPACK(fp, class, version, subtype); if (class >= (1 << _FP_CLASS_BITS) || version >= (1 << _FP_VERSION_BITS) || subtype >= (1 << _FP_SUBTYPE_BITS)) { warnx("PF_OSFP_UNPACK(0x%x) failed!!", fp); strlcpy(buf, "nomatch", len); return (buf); } LIST_FOREACH(nm, &classes, nm_entry) { if (nm->nm_num == class) { class_name = nm->nm_name; if (version == PF_OSFP_ANY) goto found; list = &nm->nm_sublist; LIST_FOREACH(nm, list, nm_entry) { if (nm->nm_num == version) { version_name = nm->nm_name; if (subtype == PF_OSFP_ANY) goto found; list = &nm->nm_sublist; LIST_FOREACH(nm, list, nm_entry) { if (nm->nm_num == subtype) { subtype_name = nm->nm_name; goto found; } } /* foreach subtype */ strlcpy(buf, "nomatch", len); return (buf); } } /* foreach version */ strlcpy(buf, "nomatch", len); return (buf); } } /* foreach class */ strlcpy(buf, "nomatch", len); return (buf); found: snprintf(buf, len, "%s", class_name); if (version_name) { strlcat(buf, " ", len); strlcat(buf, version_name, len); if (subtype_name) { if (strchr(version_name, ' ')) strlcat(buf, " ", len); else if (strchr(version_name, '.') && isdigit(*subtype_name)) strlcat(buf, ".", len); else strlcat(buf, " ", len); strlcat(buf, subtype_name, len); } } return (buf); } /* lookup a name in a list */ struct name_entry * lookup_name_list(struct name_list *list, const char *name) { struct name_entry *nm; LIST_FOREACH(nm, list, nm_entry) if (strcasecmp(name, nm->nm_name) == 0) return (nm); return (NULL); } void add_fingerprint(int dev, int opts, struct pf_osfp_ioctl *fp) { struct pf_osfp_ioctl fptmp; struct name_entry *nm_class, *nm_version, *nm_subtype; int class, version, subtype; /* We expand #-# or #.#-#.# version/subtypes into multiple fingerprints */ #define EXPAND(field) do { \ int _dot = -1, _start = -1, _end = -1, _i = 0; \ /* pick major version out of #.# */ \ if (isdigit(fp->field[_i]) && fp->field[_i+1] == '.') { \ _dot = fp->field[_i] - '0'; \ _i += 2; \ } \ if (isdigit(fp->field[_i])) \ _start = fp->field[_i++] - '0'; \ else \ break; \ if (isdigit(fp->field[_i])) \ _start = (_start * 10) + fp->field[_i++] - '0'; \ if (fp->field[_i++] != '-') \ break; \ if (isdigit(fp->field[_i]) && fp->field[_i+1] == '.' && \ fp->field[_i] - '0' == _dot) \ _i += 2; \ else if (_dot != -1) \ break; \ if (isdigit(fp->field[_i])) \ _end = fp->field[_i++] - '0'; \ else \ break; \ if (isdigit(fp->field[_i])) \ _end = (_end * 10) + fp->field[_i++] - '0'; \ if (isdigit(fp->field[_i])) \ _end = (_end * 10) + fp->field[_i++] - '0'; \ if (fp->field[_i] != '\0') \ break; \ memcpy(&fptmp, fp, sizeof(fptmp)); \ for (;_start <= _end; _start++) { \ memset(fptmp.field, 0, sizeof(fptmp.field)); \ fptmp.fp_os.fp_enflags |= PF_OSFP_EXPANDED; \ if (_dot == -1) \ snprintf(fptmp.field, sizeof(fptmp.field), \ "%d", _start); \ else \ snprintf(fptmp.field, sizeof(fptmp.field), \ "%d.%d", _dot, _start); \ add_fingerprint(dev, opts, &fptmp); \ } \ } while(0) /* We allow "#-#" as a version or subtype and we'll expand it */ EXPAND(fp_os.fp_version_nm); EXPAND(fp_os.fp_subtype_nm); if (strcasecmp(fp->fp_os.fp_class_nm, "nomatch") == 0) errx(1, "fingerprint class \"nomatch\" is reserved"); version = PF_OSFP_ANY; subtype = PF_OSFP_ANY; nm_class = fingerprint_name_entry(&classes, fp->fp_os.fp_class_nm); if (nm_class->nm_num == 0) nm_class->nm_num = ++class_count; class = nm_class->nm_num; nm_version = fingerprint_name_entry(&nm_class->nm_sublist, fp->fp_os.fp_version_nm); if (nm_version) { if (nm_version->nm_num == 0) nm_version->nm_num = ++nm_class->nm_sublist_num; version = nm_version->nm_num; nm_subtype = fingerprint_name_entry(&nm_version->nm_sublist, fp->fp_os.fp_subtype_nm); if (nm_subtype) { if (nm_subtype->nm_num == 0) nm_subtype->nm_num = ++nm_version->nm_sublist_num; subtype = nm_subtype->nm_num; } } DEBUG(fp, "\tsignature %d:%d:%d %s", class, version, subtype, print_ioctl(fp)); PF_OSFP_PACK(fp->fp_os.fp_os, class, version, subtype); fingerprint_count++; #ifdef FAKE_PF_KERNEL /* Linked to the sys/net/pf_osfp.c. Call pf_osfp_add() */ if ((errno = pf_osfp_add(fp))) #else if ((opts & PF_OPT_NOACTION) == 0 && ioctl(dev, DIOCOSFPADD, fp)) #endif /* FAKE_PF_KERNEL */ { if (errno == EEXIST) { warn("Duplicate signature for %s %s %s", fp->fp_os.fp_class_nm, fp->fp_os.fp_version_nm, fp->fp_os.fp_subtype_nm); } else { err(1, "DIOCOSFPADD"); } } } /* import a fingerprint from the kernel */ void import_fingerprint(struct pf_osfp_ioctl *fp) { struct name_entry *nm_class, *nm_version, *nm_subtype; int class, version, subtype; PF_OSFP_UNPACK(fp->fp_os.fp_os, class, version, subtype); nm_class = fingerprint_name_entry(&classes, fp->fp_os.fp_class_nm); if (nm_class->nm_num == 0) { nm_class->nm_num = class; class_count = MAX(class_count, class); } nm_version = fingerprint_name_entry(&nm_class->nm_sublist, fp->fp_os.fp_version_nm); if (nm_version) { if (nm_version->nm_num == 0) { nm_version->nm_num = version; nm_class->nm_sublist_num = MAX(nm_class->nm_sublist_num, version); } nm_subtype = fingerprint_name_entry(&nm_version->nm_sublist, fp->fp_os.fp_subtype_nm); if (nm_subtype) { if (nm_subtype->nm_num == 0) { nm_subtype->nm_num = subtype; nm_version->nm_sublist_num = MAX(nm_version->nm_sublist_num, subtype); } } } fingerprint_count++; DEBUG(fp, "import signature %d:%d:%d", class, version, subtype); } /* Find an entry for a fingerprints class/version/subtype */ struct name_entry * fingerprint_name_entry(struct name_list *list, char *name) { struct name_entry *nm_entry; if (name == NULL || strlen(name) == 0) return (NULL); LIST_FOREACH(nm_entry, list, nm_entry) { if (strcasecmp(nm_entry->nm_name, name) == 0) { /* We'll move this to the front of the list later */ LIST_REMOVE(nm_entry, nm_entry); break; } } if (nm_entry == NULL) { nm_entry = calloc(1, sizeof(*nm_entry)); if (nm_entry == NULL) err(1, "calloc"); LIST_INIT(&nm_entry->nm_sublist); strlcpy(nm_entry->nm_name, name, sizeof(nm_entry->nm_name)); } LIST_INSERT_HEAD(list, nm_entry, nm_entry); return (nm_entry); } void print_name_list(int opts, struct name_list *nml, const char *prefix) { char newprefix[32]; struct name_entry *nm; LIST_FOREACH(nm, nml, nm_entry) { snprintf(newprefix, sizeof(newprefix), "%s%s\t", prefix, nm->nm_name); printf("%s\n", newprefix); print_name_list(opts, &nm->nm_sublist, newprefix); } } void sort_name_list(int opts, struct name_list *nml) { struct name_list new; struct name_entry *nm, *nmsearch, *nmlast; /* yes yes, it's a very slow sort. so sue me */ LIST_INIT(&new); while ((nm = LIST_FIRST(nml)) != NULL) { LIST_REMOVE(nm, nm_entry); nmlast = NULL; LIST_FOREACH(nmsearch, &new, nm_entry) { if (strcasecmp(nmsearch->nm_name, nm->nm_name) > 0) { LIST_INSERT_BEFORE(nmsearch, nm, nm_entry); break; } nmlast = nmsearch; } if (nmsearch == NULL) { if (nmlast) LIST_INSERT_AFTER(nmlast, nm, nm_entry); else LIST_INSERT_HEAD(&new, nm, nm_entry); } sort_name_list(opts, &nm->nm_sublist); } nmlast = NULL; while ((nm = LIST_FIRST(&new)) != NULL) { LIST_REMOVE(nm, nm_entry); if (nmlast == NULL) LIST_INSERT_HEAD(nml, nm, nm_entry); else LIST_INSERT_AFTER(nmlast, nm, nm_entry); nmlast = nm; } } /* parse the next integer in a formatted config file line */ int get_int(char **line, size_t *len, int *var, int *mod, const char *name, int flags, int max, const char *filename, int lineno) { int fieldlen, i; char *field; long val = 0; if (mod) *mod = 0; *var = 0; field = get_field(line, len, &fieldlen); if (field == NULL) return (1); if (fieldlen == 0) { fprintf(stderr, "%s:%d empty %s\n", filename, lineno, name); return (1); } i = 0; if ((*field == '%' || *field == 'S' || *field == 'T' || *field == '*') && fieldlen >= 1) { switch (*field) { case 'S': if (mod && (flags & T_MSS)) *mod = T_MSS; if (fieldlen == 1) return (0); break; case 'T': if (mod && (flags & T_MTU)) *mod = T_MTU; if (fieldlen == 1) return (0); break; case '*': if (fieldlen != 1) { fprintf(stderr, "%s:%d long '%c' %s\n", filename, lineno, *field, name); return (1); } if (mod && (flags & T_DC)) { *mod = T_DC; return (0); } case '%': if (mod && (flags & T_MOD)) *mod = T_MOD; if (fieldlen == 1) { fprintf(stderr, "%s:%d modulus %s must have a " "value\n", filename, lineno, name); return (1); } break; } if (mod == NULL || *mod == 0) { fprintf(stderr, "%s:%d does not allow %c' %s\n", filename, lineno, *field, name); return (1); } i++; } for (; i < fieldlen; i++) { if (field[i] < '0' || field[i] > '9') { fprintf(stderr, "%s:%d non-digit character in %s\n", filename, lineno, name); return (1); } val = val * 10 + field[i] - '0'; if (val < 0) { fprintf(stderr, "%s:%d %s overflowed\n", filename, lineno, name); return (1); } } if (val > max) { fprintf(stderr, "%s:%d %s value %ld > %d\n", filename, lineno, name, val, max); return (1); } *var = (int)val; return (0); } /* parse the next string in a formatted config file line */ int get_str(char **line, size_t *len, char **v, const char *name, int minlen, const char *filename, int lineno) { int fieldlen; char *ptr; ptr = get_field(line, len, &fieldlen); if (ptr == NULL) return (1); if (fieldlen < minlen) { fprintf(stderr, "%s:%d too short %s\n", filename, lineno, name); return (1); } if ((*v = malloc(fieldlen + 1)) == NULL) { perror("malloc()"); return (1); } memcpy(*v, ptr, fieldlen); (*v)[fieldlen] = '\0'; return (0); } /* Parse out the TCP opts */ int get_tcpopts(const char *filename, int lineno, const char *tcpopts, pf_tcpopts_t *packed, int *optcnt, int *mss, int *mss_mod, int *wscale, int *wscale_mod, int *ts0) { int i, opt; *packed = 0; *optcnt = 0; *wscale = 0; *wscale_mod = T_DC; *mss = 0; *mss_mod = T_DC; *ts0 = 0; if (strcmp(tcpopts, ".") == 0) return (0); for (i = 0; tcpopts[i] && *optcnt < PF_OSFP_MAX_OPTS;) { switch ((opt = toupper(tcpopts[i++]))) { case 'N': /* FALLTHROUGH */ case 'S': *packed = (*packed << PF_OSFP_TCPOPT_BITS) | (opt == 'N' ? PF_OSFP_TCPOPT_NOP : PF_OSFP_TCPOPT_SACK); break; case 'W': /* FALLTHROUGH */ case 'M': { int *this_mod, *this; if (opt == 'W') { this = wscale; this_mod = wscale_mod; } else { this = mss; this_mod = mss_mod; } *this = 0; *this_mod = 0; *packed = (*packed << PF_OSFP_TCPOPT_BITS) | (opt == 'W' ? PF_OSFP_TCPOPT_WSCALE : PF_OSFP_TCPOPT_MSS); if (tcpopts[i] == '*' && (tcpopts[i + 1] == '\0' || tcpopts[i + 1] == ',')) { *this_mod = T_DC; i++; break; } if (tcpopts[i] == '%') { *this_mod = T_MOD; i++; } do { if (!isdigit(tcpopts[i])) { fprintf(stderr, "%s:%d unknown " "character '%c' in %c TCP opt\n", filename, lineno, tcpopts[i], opt); return (1); } *this = (*this * 10) + tcpopts[i++] - '0'; } while(tcpopts[i] != ',' && tcpopts[i] != '\0'); break; } case 'T': if (tcpopts[i] == '0') { *ts0 = 1; i++; } *packed = (*packed << PF_OSFP_TCPOPT_BITS) | PF_OSFP_TCPOPT_TS; break; } (*optcnt) ++; if (tcpopts[i] == '\0') break; if (tcpopts[i] != ',') { fprintf(stderr, "%s:%d unknown option to %c TCP opt\n", filename, lineno, opt); return (1); } i++; } return (0); } /* rip the next field ouf of a formatted config file line */ char * get_field(char **line, size_t *len, int *fieldlen) { char *ret, *ptr = *line; size_t plen = *len; while (plen && isspace(*ptr)) { plen--; ptr++; } ret = ptr; *fieldlen = 0; for (; plen > 0 && *ptr != ':'; plen--, ptr++) (*fieldlen)++; if (plen) { *line = ptr + 1; *len = plen - 1; } else { *len = 0; } while (*fieldlen && isspace(ret[*fieldlen - 1])) (*fieldlen)--; return (ret); } const char * print_ioctl(struct pf_osfp_ioctl *fp) { static char buf[1024]; char tmp[32]; int i, opt; *buf = '\0'; if (fp->fp_flags & PF_OSFP_WSIZE_DC) strlcat(buf, "*", sizeof(buf)); else if (fp->fp_flags & PF_OSFP_WSIZE_MSS) strlcat(buf, "S", sizeof(buf)); else if (fp->fp_flags & PF_OSFP_WSIZE_MTU) strlcat(buf, "T", sizeof(buf)); else { if (fp->fp_flags & PF_OSFP_WSIZE_MOD) strlcat(buf, "%", sizeof(buf)); snprintf(tmp, sizeof(tmp), "%d", fp->fp_wsize); strlcat(buf, tmp, sizeof(buf)); } strlcat(buf, ":", sizeof(buf)); snprintf(tmp, sizeof(tmp), "%d", fp->fp_ttl); strlcat(buf, tmp, sizeof(buf)); strlcat(buf, ":", sizeof(buf)); if (fp->fp_flags & PF_OSFP_DF) strlcat(buf, "1", sizeof(buf)); else strlcat(buf, "0", sizeof(buf)); strlcat(buf, ":", sizeof(buf)); if (fp->fp_flags & PF_OSFP_PSIZE_DC) strlcat(buf, "*", sizeof(buf)); else { if (fp->fp_flags & PF_OSFP_PSIZE_MOD) strlcat(buf, "%", sizeof(buf)); snprintf(tmp, sizeof(tmp), "%d", fp->fp_psize); strlcat(buf, tmp, sizeof(buf)); } strlcat(buf, ":", sizeof(buf)); if (fp->fp_optcnt == 0) strlcat(buf, ".", sizeof(buf)); for (i = fp->fp_optcnt - 1; i >= 0; i--) { opt = fp->fp_tcpopts >> (i * PF_OSFP_TCPOPT_BITS); opt &= (1 << PF_OSFP_TCPOPT_BITS) - 1; switch (opt) { case PF_OSFP_TCPOPT_NOP: strlcat(buf, "N", sizeof(buf)); break; case PF_OSFP_TCPOPT_SACK: strlcat(buf, "S", sizeof(buf)); break; case PF_OSFP_TCPOPT_TS: strlcat(buf, "T", sizeof(buf)); if (fp->fp_flags & PF_OSFP_TS0) strlcat(buf, "0", sizeof(buf)); break; case PF_OSFP_TCPOPT_MSS: strlcat(buf, "M", sizeof(buf)); if (fp->fp_flags & PF_OSFP_MSS_DC) strlcat(buf, "*", sizeof(buf)); else { if (fp->fp_flags & PF_OSFP_MSS_MOD) strlcat(buf, "%", sizeof(buf)); snprintf(tmp, sizeof(tmp), "%d", fp->fp_mss); strlcat(buf, tmp, sizeof(buf)); } break; case PF_OSFP_TCPOPT_WSCALE: strlcat(buf, "W", sizeof(buf)); if (fp->fp_flags & PF_OSFP_WSCALE_DC) strlcat(buf, "*", sizeof(buf)); else { if (fp->fp_flags & PF_OSFP_WSCALE_MOD) strlcat(buf, "%", sizeof(buf)); snprintf(tmp, sizeof(tmp), "%d", fp->fp_wscale); strlcat(buf, tmp, sizeof(buf)); } break; } if (i != 0) strlcat(buf, ",", sizeof(buf)); } strlcat(buf, ":", sizeof(buf)); strlcat(buf, fp->fp_os.fp_class_nm, sizeof(buf)); strlcat(buf, ":", sizeof(buf)); strlcat(buf, fp->fp_os.fp_version_nm, sizeof(buf)); strlcat(buf, ":", sizeof(buf)); strlcat(buf, fp->fp_os.fp_subtype_nm, sizeof(buf)); strlcat(buf, ":", sizeof(buf)); snprintf(tmp, sizeof(tmp), "TcpOpts %d 0x%llx", fp->fp_optcnt, (long long int)fp->fp_tcpopts); strlcat(buf, tmp, sizeof(buf)); return (buf); } Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_parser.c =================================================================== --- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_parser.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_parser.c (revision 303775) @@ -1,1768 +1,1768 @@ /* $OpenBSD: pfctl_parser.c,v 1.240 2008/06/10 20:55:02 mcbride Exp $ */ /* * Copyright (c) 2001 Daniel Hartmeier * Copyright (c) 2002,2003 Henning Brauer * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * - Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided * with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE * COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pfctl_parser.h" #include "pfctl.h" void print_op (u_int8_t, const char *, const char *); void print_port (u_int8_t, u_int16_t, u_int16_t, const char *, int); void print_ugid (u_int8_t, unsigned, unsigned, const char *, unsigned); void print_flags (u_int8_t); void print_fromto(struct pf_rule_addr *, pf_osfp_t, struct pf_rule_addr *, u_int8_t, u_int8_t, int, int); int ifa_skip_if(const char *filter, struct node_host *p); struct node_host *ifa_grouplookup(const char *, int); struct node_host *host_if(const char *, int); struct node_host *host_v4(const char *, int); struct node_host *host_v6(const char *, int); struct node_host *host_dns(const char *, int, int); const char * const tcpflags = "FSRPAUEW"; static const struct icmptypeent icmp_type[] = { { "echoreq", ICMP_ECHO }, { "echorep", ICMP_ECHOREPLY }, { "unreach", ICMP_UNREACH }, { "squench", ICMP_SOURCEQUENCH }, { "redir", ICMP_REDIRECT }, { "althost", ICMP_ALTHOSTADDR }, { "routeradv", ICMP_ROUTERADVERT }, { "routersol", ICMP_ROUTERSOLICIT }, { "timex", ICMP_TIMXCEED }, { "paramprob", ICMP_PARAMPROB }, { "timereq", ICMP_TSTAMP }, { "timerep", ICMP_TSTAMPREPLY }, { "inforeq", ICMP_IREQ }, { "inforep", ICMP_IREQREPLY }, { "maskreq", ICMP_MASKREQ }, { "maskrep", ICMP_MASKREPLY }, { "trace", ICMP_TRACEROUTE }, { "dataconv", ICMP_DATACONVERR }, { "mobredir", ICMP_MOBILE_REDIRECT }, { "ipv6-where", ICMP_IPV6_WHEREAREYOU }, { "ipv6-here", ICMP_IPV6_IAMHERE }, { "mobregreq", ICMP_MOBILE_REGREQUEST }, { "mobregrep", ICMP_MOBILE_REGREPLY }, { "skip", ICMP_SKIP }, { "photuris", ICMP_PHOTURIS } }; static const struct icmptypeent icmp6_type[] = { { "unreach", ICMP6_DST_UNREACH }, { "toobig", ICMP6_PACKET_TOO_BIG }, { "timex", ICMP6_TIME_EXCEEDED }, { "paramprob", ICMP6_PARAM_PROB }, { "echoreq", ICMP6_ECHO_REQUEST }, { "echorep", ICMP6_ECHO_REPLY }, { "groupqry", ICMP6_MEMBERSHIP_QUERY }, { "listqry", MLD_LISTENER_QUERY }, { "grouprep", ICMP6_MEMBERSHIP_REPORT }, { "listenrep", MLD_LISTENER_REPORT }, { "groupterm", ICMP6_MEMBERSHIP_REDUCTION }, { "listendone", MLD_LISTENER_DONE }, { "routersol", ND_ROUTER_SOLICIT }, { "routeradv", ND_ROUTER_ADVERT }, { "neighbrsol", ND_NEIGHBOR_SOLICIT }, { "neighbradv", ND_NEIGHBOR_ADVERT }, { "redir", ND_REDIRECT }, { "routrrenum", ICMP6_ROUTER_RENUMBERING }, { "wrureq", ICMP6_WRUREQUEST }, { "wrurep", ICMP6_WRUREPLY }, { "fqdnreq", ICMP6_FQDN_QUERY }, { "fqdnrep", ICMP6_FQDN_REPLY }, { "niqry", ICMP6_NI_QUERY }, { "nirep", ICMP6_NI_REPLY }, { "mtraceresp", MLD_MTRACE_RESP }, { "mtrace", MLD_MTRACE } }; static const struct icmpcodeent icmp_code[] = { { "net-unr", ICMP_UNREACH, ICMP_UNREACH_NET }, { "host-unr", ICMP_UNREACH, ICMP_UNREACH_HOST }, { "proto-unr", ICMP_UNREACH, ICMP_UNREACH_PROTOCOL }, { "port-unr", ICMP_UNREACH, ICMP_UNREACH_PORT }, { "needfrag", ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG }, { "srcfail", ICMP_UNREACH, ICMP_UNREACH_SRCFAIL }, { "net-unk", ICMP_UNREACH, ICMP_UNREACH_NET_UNKNOWN }, { "host-unk", ICMP_UNREACH, ICMP_UNREACH_HOST_UNKNOWN }, { "isolate", ICMP_UNREACH, ICMP_UNREACH_ISOLATED }, { "net-prohib", ICMP_UNREACH, ICMP_UNREACH_NET_PROHIB }, { "host-prohib", ICMP_UNREACH, ICMP_UNREACH_HOST_PROHIB }, { "net-tos", ICMP_UNREACH, ICMP_UNREACH_TOSNET }, { "host-tos", ICMP_UNREACH, ICMP_UNREACH_TOSHOST }, { "filter-prohib", ICMP_UNREACH, ICMP_UNREACH_FILTER_PROHIB }, { "host-preced", ICMP_UNREACH, ICMP_UNREACH_HOST_PRECEDENCE }, { "cutoff-preced", ICMP_UNREACH, ICMP_UNREACH_PRECEDENCE_CUTOFF }, { "redir-net", ICMP_REDIRECT, ICMP_REDIRECT_NET }, { "redir-host", ICMP_REDIRECT, ICMP_REDIRECT_HOST }, { "redir-tos-net", ICMP_REDIRECT, ICMP_REDIRECT_TOSNET }, { "redir-tos-host", ICMP_REDIRECT, ICMP_REDIRECT_TOSHOST }, { "normal-adv", ICMP_ROUTERADVERT, ICMP_ROUTERADVERT_NORMAL }, { "common-adv", ICMP_ROUTERADVERT, ICMP_ROUTERADVERT_NOROUTE_COMMON }, { "transit", ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS }, { "reassemb", ICMP_TIMXCEED, ICMP_TIMXCEED_REASS }, { "badhead", ICMP_PARAMPROB, ICMP_PARAMPROB_ERRATPTR }, { "optmiss", ICMP_PARAMPROB, ICMP_PARAMPROB_OPTABSENT }, { "badlen", ICMP_PARAMPROB, ICMP_PARAMPROB_LENGTH }, { "unknown-ind", ICMP_PHOTURIS, ICMP_PHOTURIS_UNKNOWN_INDEX }, { "auth-fail", ICMP_PHOTURIS, ICMP_PHOTURIS_AUTH_FAILED }, { "decrypt-fail", ICMP_PHOTURIS, ICMP_PHOTURIS_DECRYPT_FAILED } }; static const struct icmpcodeent icmp6_code[] = { { "admin-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_ADMIN }, { "noroute-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOROUTE }, { "notnbr-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOTNEIGHBOR }, { "beyond-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_BEYONDSCOPE }, { "addr-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_ADDR }, { "port-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOPORT }, { "transit", ICMP6_TIME_EXCEEDED, ICMP6_TIME_EXCEED_TRANSIT }, { "reassemb", ICMP6_TIME_EXCEEDED, ICMP6_TIME_EXCEED_REASSEMBLY }, { "badhead", ICMP6_PARAM_PROB, ICMP6_PARAMPROB_HEADER }, { "nxthdr", ICMP6_PARAM_PROB, ICMP6_PARAMPROB_NEXTHEADER }, { "redironlink", ND_REDIRECT, ND_REDIRECT_ONLINK }, { "redirrouter", ND_REDIRECT, ND_REDIRECT_ROUTER } }; const struct pf_timeout pf_timeouts[] = { { "tcp.first", PFTM_TCP_FIRST_PACKET }, { "tcp.opening", PFTM_TCP_OPENING }, { "tcp.established", PFTM_TCP_ESTABLISHED }, { "tcp.closing", PFTM_TCP_CLOSING }, { "tcp.finwait", PFTM_TCP_FIN_WAIT }, { "tcp.closed", PFTM_TCP_CLOSED }, { "tcp.tsdiff", PFTM_TS_DIFF }, { "udp.first", PFTM_UDP_FIRST_PACKET }, { "udp.single", PFTM_UDP_SINGLE }, { "udp.multiple", PFTM_UDP_MULTIPLE }, { "icmp.first", PFTM_ICMP_FIRST_PACKET }, { "icmp.error", PFTM_ICMP_ERROR_REPLY }, { "other.first", PFTM_OTHER_FIRST_PACKET }, { "other.single", PFTM_OTHER_SINGLE }, { "other.multiple", PFTM_OTHER_MULTIPLE }, { "frag", PFTM_FRAG }, { "interval", PFTM_INTERVAL }, { "adaptive.start", PFTM_ADAPTIVE_START }, { "adaptive.end", PFTM_ADAPTIVE_END }, { "src.track", PFTM_SRC_NODE }, { NULL, 0 } }; const struct icmptypeent * geticmptypebynumber(u_int8_t type, sa_family_t af) { unsigned int i; if (af != AF_INET6) { for (i=0; i < nitems(icmp_type); i++) { if (type == icmp_type[i].type) return (&icmp_type[i]); } } else { for (i=0; i < nitems(icmp6_type); i++) { if (type == icmp6_type[i].type) return (&icmp6_type[i]); } } return (NULL); } const struct icmptypeent * geticmptypebyname(char *w, sa_family_t af) { unsigned int i; if (af != AF_INET6) { for (i=0; i < nitems(icmp_type); i++) { if (!strcmp(w, icmp_type[i].name)) return (&icmp_type[i]); } } else { for (i=0; i < nitems(icmp6_type); i++) { if (!strcmp(w, icmp6_type[i].name)) return (&icmp6_type[i]); } } return (NULL); } const struct icmpcodeent * geticmpcodebynumber(u_int8_t type, u_int8_t code, sa_family_t af) { unsigned int i; if (af != AF_INET6) { for (i=0; i < nitems(icmp_code); i++) { if (type == icmp_code[i].type && code == icmp_code[i].code) return (&icmp_code[i]); } } else { for (i=0; i < nitems(icmp6_code); i++) { if (type == icmp6_code[i].type && code == icmp6_code[i].code) return (&icmp6_code[i]); } } return (NULL); } const struct icmpcodeent * geticmpcodebyname(u_long type, char *w, sa_family_t af) { unsigned int i; if (af != AF_INET6) { for (i=0; i < nitems(icmp_code); i++) { if (type == icmp_code[i].type && !strcmp(w, icmp_code[i].name)) return (&icmp_code[i]); } } else { for (i=0; i < nitems(icmp6_code); i++) { if (type == icmp6_code[i].type && !strcmp(w, icmp6_code[i].name)) return (&icmp6_code[i]); } } return (NULL); } void print_op(u_int8_t op, const char *a1, const char *a2) { if (op == PF_OP_IRG) printf(" %s >< %s", a1, a2); else if (op == PF_OP_XRG) printf(" %s <> %s", a1, a2); else if (op == PF_OP_EQ) printf(" = %s", a1); else if (op == PF_OP_NE) printf(" != %s", a1); else if (op == PF_OP_LT) printf(" < %s", a1); else if (op == PF_OP_LE) printf(" <= %s", a1); else if (op == PF_OP_GT) printf(" > %s", a1); else if (op == PF_OP_GE) printf(" >= %s", a1); else if (op == PF_OP_RRG) printf(" %s:%s", a1, a2); } void print_port(u_int8_t op, u_int16_t p1, u_int16_t p2, const char *proto, int numeric) { char a1[6], a2[6]; struct servent *s; if (!numeric) s = getservbyport(p1, proto); else s = NULL; p1 = ntohs(p1); p2 = ntohs(p2); snprintf(a1, sizeof(a1), "%u", p1); snprintf(a2, sizeof(a2), "%u", p2); printf(" port"); if (s != NULL && (op == PF_OP_EQ || op == PF_OP_NE)) print_op(op, s->s_name, a2); else print_op(op, a1, a2); } void print_ugid(u_int8_t op, unsigned u1, unsigned u2, const char *t, unsigned umax) { char a1[11], a2[11]; snprintf(a1, sizeof(a1), "%u", u1); snprintf(a2, sizeof(a2), "%u", u2); printf(" %s", t); if (u1 == umax && (op == PF_OP_EQ || op == PF_OP_NE)) print_op(op, "unknown", a2); else print_op(op, a1, a2); } void print_flags(u_int8_t f) { int i; for (i = 0; tcpflags[i]; ++i) if (f & (1 << i)) printf("%c", tcpflags[i]); } void print_fromto(struct pf_rule_addr *src, pf_osfp_t osfp, struct pf_rule_addr *dst, sa_family_t af, u_int8_t proto, int verbose, int numeric) { char buf[PF_OSFP_LEN*3]; if (src->addr.type == PF_ADDR_ADDRMASK && dst->addr.type == PF_ADDR_ADDRMASK && PF_AZERO(&src->addr.v.a.addr, AF_INET6) && PF_AZERO(&src->addr.v.a.mask, AF_INET6) && PF_AZERO(&dst->addr.v.a.addr, AF_INET6) && PF_AZERO(&dst->addr.v.a.mask, AF_INET6) && !src->neg && !dst->neg && !src->port_op && !dst->port_op && osfp == PF_OSFP_ANY) printf(" all"); else { printf(" from "); if (src->neg) printf("! "); print_addr(&src->addr, af, verbose); if (src->port_op) print_port(src->port_op, src->port[0], src->port[1], proto == IPPROTO_TCP ? "tcp" : "udp", numeric); if (osfp != PF_OSFP_ANY) printf(" os \"%s\"", pfctl_lookup_fingerprint(osfp, buf, sizeof(buf))); printf(" to "); if (dst->neg) printf("! "); print_addr(&dst->addr, af, verbose); if (dst->port_op) print_port(dst->port_op, dst->port[0], dst->port[1], proto == IPPROTO_TCP ? "tcp" : "udp", numeric); } } void print_pool(struct pf_pool *pool, u_int16_t p1, u_int16_t p2, sa_family_t af, int id) { struct pf_pooladdr *pooladdr; if ((TAILQ_FIRST(&pool->list) != NULL) && TAILQ_NEXT(TAILQ_FIRST(&pool->list), entries) != NULL) printf("{ "); TAILQ_FOREACH(pooladdr, &pool->list, entries){ switch (id) { case PF_NAT: case PF_RDR: case PF_BINAT: print_addr(&pooladdr->addr, af, 0); break; case PF_PASS: if (PF_AZERO(&pooladdr->addr.v.a.addr, af)) printf("%s", pooladdr->ifname); else { printf("(%s ", pooladdr->ifname); print_addr(&pooladdr->addr, af, 0); printf(")"); } break; default: break; } if (TAILQ_NEXT(pooladdr, entries) != NULL) printf(", "); else if (TAILQ_NEXT(TAILQ_FIRST(&pool->list), entries) != NULL) printf(" }"); } switch (id) { case PF_NAT: if ((p1 != PF_NAT_PROXY_PORT_LOW || p2 != PF_NAT_PROXY_PORT_HIGH) && (p1 != 0 || p2 != 0)) { if (p1 == p2) printf(" port %u", p1); else printf(" port %u:%u", p1, p2); } break; case PF_RDR: if (p1) { printf(" port %u", p1); if (p2 && (p2 != p1)) printf(":%u", p2); } break; default: break; } switch (pool->opts & PF_POOL_TYPEMASK) { case PF_POOL_NONE: break; case PF_POOL_BITMASK: printf(" bitmask"); break; case PF_POOL_RANDOM: printf(" random"); break; case PF_POOL_SRCHASH: printf(" source-hash 0x%08x%08x%08x%08x", pool->key.key32[0], pool->key.key32[1], pool->key.key32[2], pool->key.key32[3]); break; case PF_POOL_ROUNDROBIN: printf(" round-robin"); break; } if (pool->opts & PF_POOL_STICKYADDR) printf(" sticky-address"); if (id == PF_NAT && p1 == 0 && p2 == 0) printf(" static-port"); } const char * const pf_reasons[PFRES_MAX+1] = PFRES_NAMES; const char * const pf_lcounters[LCNT_MAX+1] = LCNT_NAMES; const char * const pf_fcounters[FCNT_MAX+1] = FCNT_NAMES; const char * const pf_scounters[FCNT_MAX+1] = FCNT_NAMES; void print_status(struct pf_status *s, int opts) { char statline[80], *running; time_t runtime; int i; char buf[PF_MD5_DIGEST_LENGTH * 2 + 1]; static const char hex[] = "0123456789abcdef"; runtime = time(NULL) - s->since; running = s->running ? "Enabled" : "Disabled"; if (s->since) { unsigned int sec, min, hrs, day = runtime; sec = day % 60; day /= 60; min = day % 60; day /= 60; hrs = day % 24; day /= 24; snprintf(statline, sizeof(statline), "Status: %s for %u days %.2u:%.2u:%.2u", running, day, hrs, min, sec); } else snprintf(statline, sizeof(statline), "Status: %s", running); printf("%-44s", statline); switch (s->debug) { case PF_DEBUG_NONE: printf("%15s\n\n", "Debug: None"); break; case PF_DEBUG_URGENT: printf("%15s\n\n", "Debug: Urgent"); break; case PF_DEBUG_MISC: printf("%15s\n\n", "Debug: Misc"); break; case PF_DEBUG_NOISY: printf("%15s\n\n", "Debug: Loud"); break; } if (opts & PF_OPT_VERBOSE) { printf("Hostid: 0x%08x\n", ntohl(s->hostid)); for (i = 0; i < PF_MD5_DIGEST_LENGTH; i++) { buf[i + i] = hex[s->pf_chksum[i] >> 4]; buf[i + i + 1] = hex[s->pf_chksum[i] & 0x0f]; } buf[i + i] = '\0'; printf("Checksum: 0x%s\n\n", buf); } if (s->ifname[0] != 0) { printf("Interface Stats for %-16s %5s %16s\n", s->ifname, "IPv4", "IPv6"); printf(" %-25s %14llu %16llu\n", "Bytes In", (unsigned long long)s->bcounters[0][0], (unsigned long long)s->bcounters[1][0]); printf(" %-25s %14llu %16llu\n", "Bytes Out", (unsigned long long)s->bcounters[0][1], (unsigned long long)s->bcounters[1][1]); printf(" Packets In\n"); printf(" %-23s %14llu %16llu\n", "Passed", (unsigned long long)s->pcounters[0][0][PF_PASS], (unsigned long long)s->pcounters[1][0][PF_PASS]); printf(" %-23s %14llu %16llu\n", "Blocked", (unsigned long long)s->pcounters[0][0][PF_DROP], (unsigned long long)s->pcounters[1][0][PF_DROP]); printf(" Packets Out\n"); printf(" %-23s %14llu %16llu\n", "Passed", (unsigned long long)s->pcounters[0][1][PF_PASS], (unsigned long long)s->pcounters[1][1][PF_PASS]); printf(" %-23s %14llu %16llu\n\n", "Blocked", (unsigned long long)s->pcounters[0][1][PF_DROP], (unsigned long long)s->pcounters[1][1][PF_DROP]); } printf("%-27s %14s %16s\n", "State Table", "Total", "Rate"); printf(" %-25s %14u %14s\n", "current entries", s->states, ""); for (i = 0; i < FCNT_MAX; i++) { printf(" %-25s %14llu ", pf_fcounters[i], (unsigned long long)s->fcounters[i]); if (runtime > 0) printf("%14.1f/s\n", (double)s->fcounters[i] / (double)runtime); else printf("%14s\n", ""); } if (opts & PF_OPT_VERBOSE) { printf("Source Tracking Table\n"); printf(" %-25s %14u %14s\n", "current entries", s->src_nodes, ""); for (i = 0; i < SCNT_MAX; i++) { printf(" %-25s %14lld ", pf_scounters[i], #ifdef __FreeBSD__ (long long)s->scounters[i]); #else s->scounters[i]); #endif if (runtime > 0) printf("%14.1f/s\n", (double)s->scounters[i] / (double)runtime); else printf("%14s\n", ""); } } printf("Counters\n"); for (i = 0; i < PFRES_MAX; i++) { printf(" %-25s %14llu ", pf_reasons[i], (unsigned long long)s->counters[i]); if (runtime > 0) printf("%14.1f/s\n", (double)s->counters[i] / (double)runtime); else printf("%14s\n", ""); } if (opts & PF_OPT_VERBOSE) { printf("Limit Counters\n"); for (i = 0; i < LCNT_MAX; i++) { printf(" %-25s %14lld ", pf_lcounters[i], #ifdef __FreeBSD__ (unsigned long long)s->lcounters[i]); #else s->lcounters[i]); #endif if (runtime > 0) printf("%14.1f/s\n", (double)s->lcounters[i] / (double)runtime); else printf("%14s\n", ""); } } } void print_src_node(struct pf_src_node *sn, int opts) { struct pf_addr_wrap aw; int min, sec; memset(&aw, 0, sizeof(aw)); if (sn->af == AF_INET) aw.v.a.mask.addr32[0] = 0xffffffff; else memset(&aw.v.a.mask, 0xff, sizeof(aw.v.a.mask)); aw.v.a.addr = sn->addr; print_addr(&aw, sn->af, opts & PF_OPT_VERBOSE2); printf(" -> "); aw.v.a.addr = sn->raddr; print_addr(&aw, sn->af, opts & PF_OPT_VERBOSE2); printf(" ( states %u, connections %u, rate %u.%u/%us )\n", sn->states, sn->conn, sn->conn_rate.count / 1000, (sn->conn_rate.count % 1000) / 100, sn->conn_rate.seconds); if (opts & PF_OPT_VERBOSE) { sec = sn->creation % 60; sn->creation /= 60; min = sn->creation % 60; sn->creation /= 60; printf(" age %.2u:%.2u:%.2u", sn->creation, min, sec); if (sn->states == 0) { sec = sn->expire % 60; sn->expire /= 60; min = sn->expire % 60; sn->expire /= 60; printf(", expires in %.2u:%.2u:%.2u", sn->expire, min, sec); } printf(", %llu pkts, %llu bytes", #ifdef __FreeBSD__ (unsigned long long)(sn->packets[0] + sn->packets[1]), (unsigned long long)(sn->bytes[0] + sn->bytes[1])); #else sn->packets[0] + sn->packets[1], sn->bytes[0] + sn->bytes[1]); #endif switch (sn->ruletype) { case PF_NAT: if (sn->rule.nr != -1) printf(", nat rule %u", sn->rule.nr); break; case PF_RDR: if (sn->rule.nr != -1) printf(", rdr rule %u", sn->rule.nr); break; case PF_PASS: if (sn->rule.nr != -1) printf(", filter rule %u", sn->rule.nr); break; } printf("\n"); } } void print_rule(struct pf_rule *r, const char *anchor_call, int verbose, int numeric) { static const char *actiontypes[] = { "pass", "block", "scrub", "no scrub", "nat", "no nat", "binat", "no binat", "rdr", "no rdr" }; static const char *anchortypes[] = { "anchor", "anchor", "anchor", "anchor", "nat-anchor", "nat-anchor", "binat-anchor", "binat-anchor", "rdr-anchor", "rdr-anchor" }; int i, opts; if (verbose) printf("@%d ", r->nr); if (r->action > PF_NORDR) printf("action(%d)", r->action); else if (anchor_call[0]) { if (anchor_call[0] == '_') { printf("%s", anchortypes[r->action]); } else printf("%s \"%s\"", anchortypes[r->action], anchor_call); } else { printf("%s", actiontypes[r->action]); if (r->natpass) printf(" pass"); } if (r->action == PF_DROP) { if (r->rule_flag & PFRULE_RETURN) printf(" return"); else if (r->rule_flag & PFRULE_RETURNRST) { if (!r->return_ttl) printf(" return-rst"); else printf(" return-rst(ttl %d)", r->return_ttl); } else if (r->rule_flag & PFRULE_RETURNICMP) { const struct icmpcodeent *ic, *ic6; ic = geticmpcodebynumber(r->return_icmp >> 8, r->return_icmp & 255, AF_INET); ic6 = geticmpcodebynumber(r->return_icmp6 >> 8, r->return_icmp6 & 255, AF_INET6); switch (r->af) { case AF_INET: printf(" return-icmp"); if (ic == NULL) printf("(%u)", r->return_icmp & 255); else printf("(%s)", ic->name); break; case AF_INET6: printf(" return-icmp6"); if (ic6 == NULL) printf("(%u)", r->return_icmp6 & 255); else printf("(%s)", ic6->name); break; default: printf(" return-icmp"); if (ic == NULL) printf("(%u, ", r->return_icmp & 255); else printf("(%s, ", ic->name); if (ic6 == NULL) printf("%u)", r->return_icmp6 & 255); else printf("%s)", ic6->name); break; } } else printf(" drop"); } if (r->direction == PF_IN) printf(" in"); else if (r->direction == PF_OUT) printf(" out"); if (r->log) { printf(" log"); if (r->log & ~PF_LOG || r->logif) { int count = 0; printf(" ("); if (r->log & PF_LOG_ALL) printf("%sall", count++ ? ", " : ""); if (r->log & PF_LOG_SOCKET_LOOKUP) printf("%suser", count++ ? ", " : ""); if (r->logif) printf("%sto pflog%u", count++ ? ", " : "", r->logif); printf(")"); } } if (r->quick) printf(" quick"); if (r->ifname[0]) { if (r->ifnot) printf(" on ! %s", r->ifname); else printf(" on %s", r->ifname); } if (r->rt) { if (r->rt == PF_ROUTETO) printf(" route-to"); else if (r->rt == PF_REPLYTO) printf(" reply-to"); else if (r->rt == PF_DUPTO) printf(" dup-to"); else if (r->rt == PF_FASTROUTE) printf(" fastroute"); if (r->rt != PF_FASTROUTE) { printf(" "); print_pool(&r->rpool, 0, 0, r->af, PF_PASS); } } if (r->af) { if (r->af == AF_INET) printf(" inet"); else printf(" inet6"); } if (r->proto) { struct protoent *p; if ((p = getprotobynumber(r->proto)) != NULL) printf(" proto %s", p->p_name); else printf(" proto %u", r->proto); } print_fromto(&r->src, r->os_fingerprint, &r->dst, r->af, r->proto, verbose, numeric); if (r->uid.op) print_ugid(r->uid.op, r->uid.uid[0], r->uid.uid[1], "user", UID_MAX); if (r->gid.op) print_ugid(r->gid.op, r->gid.gid[0], r->gid.gid[1], "group", GID_MAX); if (r->flags || r->flagset) { printf(" flags "); print_flags(r->flags); printf("/"); print_flags(r->flagset); } else if (r->action == PF_PASS && (!r->proto || r->proto == IPPROTO_TCP) && !(r->rule_flag & PFRULE_FRAGMENT) && !anchor_call[0] && r->keep_state) printf(" flags any"); if (r->type) { const struct icmptypeent *it; it = geticmptypebynumber(r->type-1, r->af); if (r->af != AF_INET6) printf(" icmp-type"); else printf(" icmp6-type"); if (it != NULL) printf(" %s", it->name); else printf(" %u", r->type-1); if (r->code) { const struct icmpcodeent *ic; ic = geticmpcodebynumber(r->type-1, r->code-1, r->af); if (ic != NULL) printf(" code %s", ic->name); else printf(" code %u", r->code-1); } } if (r->tos) printf(" tos 0x%2.2x", r->tos); if (r->prio) printf(" prio %u", r->prio == PF_PRIO_ZERO ? 0 : r->prio); if (r->scrub_flags & PFSTATE_SETMASK) { char *comma = ""; printf(" set ("); if (r->scrub_flags & PFSTATE_SETPRIO) { if (r->set_prio[0] == r->set_prio[1]) printf("%s prio %u", comma, r->set_prio[0]); else printf("%s prio(%u, %u)", comma, r->set_prio[0], r->set_prio[1]); comma = ","; } printf(" )"); } if (!r->keep_state && r->action == PF_PASS && !anchor_call[0]) printf(" no state"); else if (r->keep_state == PF_STATE_NORMAL) printf(" keep state"); else if (r->keep_state == PF_STATE_MODULATE) printf(" modulate state"); else if (r->keep_state == PF_STATE_SYNPROXY) printf(" synproxy state"); if (r->prob) { char buf[20]; snprintf(buf, sizeof(buf), "%f", r->prob*100.0/(UINT_MAX+1.0)); for (i = strlen(buf)-1; i > 0; i--) { if (buf[i] == '0') buf[i] = '\0'; else { if (buf[i] == '.') buf[i] = '\0'; break; } } printf(" probability %s%%", buf); } opts = 0; if (r->max_states || r->max_src_nodes || r->max_src_states) opts = 1; if (r->rule_flag & PFRULE_NOSYNC) opts = 1; if (r->rule_flag & PFRULE_SRCTRACK) opts = 1; if (r->rule_flag & PFRULE_IFBOUND) opts = 1; if (r->rule_flag & PFRULE_STATESLOPPY) opts = 1; for (i = 0; !opts && i < PFTM_MAX; ++i) if (r->timeout[i]) opts = 1; if (opts) { printf(" ("); if (r->max_states) { printf("max %u", r->max_states); opts = 0; } if (r->rule_flag & PFRULE_NOSYNC) { if (!opts) printf(", "); printf("no-sync"); opts = 0; } if (r->rule_flag & PFRULE_SRCTRACK) { if (!opts) printf(", "); printf("source-track"); if (r->rule_flag & PFRULE_RULESRCTRACK) printf(" rule"); else printf(" global"); opts = 0; } if (r->max_src_states) { if (!opts) printf(", "); printf("max-src-states %u", r->max_src_states); opts = 0; } if (r->max_src_conn) { if (!opts) printf(", "); printf("max-src-conn %u", r->max_src_conn); opts = 0; } if (r->max_src_conn_rate.limit) { if (!opts) printf(", "); printf("max-src-conn-rate %u/%u", r->max_src_conn_rate.limit, r->max_src_conn_rate.seconds); opts = 0; } if (r->max_src_nodes) { if (!opts) printf(", "); printf("max-src-nodes %u", r->max_src_nodes); opts = 0; } if (r->overload_tblname[0]) { if (!opts) printf(", "); printf("overload <%s>", r->overload_tblname); if (r->flush) printf(" flush"); if (r->flush & PF_FLUSH_GLOBAL) printf(" global"); } if (r->rule_flag & PFRULE_IFBOUND) { if (!opts) printf(", "); printf("if-bound"); opts = 0; } if (r->rule_flag & PFRULE_STATESLOPPY) { if (!opts) printf(", "); printf("sloppy"); opts = 0; } for (i = 0; i < PFTM_MAX; ++i) if (r->timeout[i]) { int j; if (!opts) printf(", "); opts = 0; for (j = 0; pf_timeouts[j].name != NULL; ++j) if (pf_timeouts[j].timeout == i) break; printf("%s %u", pf_timeouts[j].name == NULL ? "inv.timeout" : pf_timeouts[j].name, r->timeout[i]); } printf(")"); } if (r->rule_flag & PFRULE_FRAGMENT) printf(" fragment"); if (r->rule_flag & PFRULE_NODF) printf(" no-df"); if (r->rule_flag & PFRULE_RANDOMID) printf(" random-id"); if (r->min_ttl) printf(" min-ttl %d", r->min_ttl); if (r->max_mss) printf(" max-mss %d", r->max_mss); if (r->rule_flag & PFRULE_SET_TOS) printf(" set-tos 0x%2.2x", r->set_tos); if (r->allow_opts) printf(" allow-opts"); if (r->action == PF_SCRUB) { if (r->rule_flag & PFRULE_REASSEMBLE_TCP) printf(" reassemble tcp"); printf(" fragment reassemble"); } if (r->label[0]) printf(" label \"%s\"", r->label); if (r->qname[0] && r->pqname[0]) printf(" queue(%s, %s)", r->qname, r->pqname); else if (r->qname[0]) printf(" queue %s", r->qname); if (r->tagname[0]) printf(" tag %s", r->tagname); if (r->match_tagname[0]) { if (r->match_tag_not) printf(" !"); printf(" tagged %s", r->match_tagname); } if (r->rtableid != -1) printf(" rtable %u", r->rtableid); if (r->divert.port) { #ifdef __FreeBSD__ printf(" divert-to %u", ntohs(r->divert.port)); #else if (PF_AZERO(&r->divert.addr, r->af)) { printf(" divert-reply"); } else { /* XXX cut&paste from print_addr */ char buf[48]; printf(" divert-to "); if (inet_ntop(r->af, &r->divert.addr, buf, sizeof(buf)) == NULL) printf("?"); else printf("%s", buf); printf(" port %u", ntohs(r->divert.port)); } #endif } if (!anchor_call[0] && (r->action == PF_NAT || r->action == PF_BINAT || r->action == PF_RDR)) { printf(" -> "); print_pool(&r->rpool, r->rpool.proxy_port[0], r->rpool.proxy_port[1], r->af, r->action); } } void print_tabledef(const char *name, int flags, int addrs, struct node_tinithead *nodes) { struct node_tinit *ti, *nti; struct node_host *h; printf("table <%s>", name); if (flags & PFR_TFLAG_CONST) printf(" const"); if (flags & PFR_TFLAG_PERSIST) printf(" persist"); if (flags & PFR_TFLAG_COUNTERS) printf(" counters"); SIMPLEQ_FOREACH(ti, nodes, entries) { if (ti->file) { printf(" file \"%s\"", ti->file); continue; } printf(" {"); for (;;) { for (h = ti->host; h != NULL; h = h->next) { printf(h->not ? " !" : " "); print_addr(&h->addr, h->af, 0); } nti = SIMPLEQ_NEXT(ti, entries); if (nti != NULL && nti->file == NULL) ti = nti; /* merge lists */ else break; } printf(" }"); } if (addrs && SIMPLEQ_EMPTY(nodes)) printf(" { }"); printf("\n"); } int parse_flags(char *s) { char *p, *q; u_int8_t f = 0; for (p = s; *p; p++) { if ((q = strchr(tcpflags, *p)) == NULL) return -1; else f |= 1 << (q - tcpflags); } return (f ? f : PF_TH_ALL); } void set_ipmask(struct node_host *h, u_int8_t b) { struct pf_addr *m, *n; int i, j = 0; m = &h->addr.v.a.mask; memset(m, 0, sizeof(*m)); while (b >= 32) { m->addr32[j++] = 0xffffffff; b -= 32; } for (i = 31; i > 31-b; --i) m->addr32[j] |= (1 << i); if (b) m->addr32[j] = htonl(m->addr32[j]); /* Mask off bits of the address that will never be used. */ n = &h->addr.v.a.addr; if (h->addr.type == PF_ADDR_ADDRMASK) for (i = 0; i < 4; i++) n->addr32[i] = n->addr32[i] & m->addr32[i]; } int check_netmask(struct node_host *h, sa_family_t af) { struct node_host *n = NULL; struct pf_addr *m; for (n = h; n != NULL; n = n->next) { if (h->addr.type == PF_ADDR_TABLE) continue; m = &h->addr.v.a.mask; /* fix up netmask for dynaddr */ if (af == AF_INET && h->addr.type == PF_ADDR_DYNIFTL && unmask(m, AF_INET6) > 32) set_ipmask(n, 32); /* netmasks > 32 bit are invalid on v4 */ if (af == AF_INET && (m->addr32[1] || m->addr32[2] || m->addr32[3])) { fprintf(stderr, "netmask %u invalid for IPv4 address\n", unmask(m, AF_INET6)); return (1); } } return (0); } /* interface lookup routines */ -struct node_host *iftab; +static struct node_host *iftab; void ifa_load(void) { struct ifaddrs *ifap, *ifa; struct node_host *n = NULL, *h = NULL; if (getifaddrs(&ifap) < 0) err(1, "getifaddrs"); for (ifa = ifap; ifa; ifa = ifa->ifa_next) { if (!(ifa->ifa_addr->sa_family == AF_INET || ifa->ifa_addr->sa_family == AF_INET6 || ifa->ifa_addr->sa_family == AF_LINK)) continue; n = calloc(1, sizeof(struct node_host)); if (n == NULL) err(1, "address: calloc"); n->af = ifa->ifa_addr->sa_family; n->ifa_flags = ifa->ifa_flags; #ifdef __KAME__ if (n->af == AF_INET6 && IN6_IS_ADDR_LINKLOCAL(&((struct sockaddr_in6 *) ifa->ifa_addr)->sin6_addr) && ((struct sockaddr_in6 *)ifa->ifa_addr)->sin6_scope_id == 0) { struct sockaddr_in6 *sin6; sin6 = (struct sockaddr_in6 *)ifa->ifa_addr; sin6->sin6_scope_id = sin6->sin6_addr.s6_addr[2] << 8 | sin6->sin6_addr.s6_addr[3]; sin6->sin6_addr.s6_addr[2] = 0; sin6->sin6_addr.s6_addr[3] = 0; } #endif n->ifindex = 0; if (n->af == AF_INET) { memcpy(&n->addr.v.a.addr, &((struct sockaddr_in *) ifa->ifa_addr)->sin_addr.s_addr, sizeof(struct in_addr)); memcpy(&n->addr.v.a.mask, &((struct sockaddr_in *) ifa->ifa_netmask)->sin_addr.s_addr, sizeof(struct in_addr)); if (ifa->ifa_broadaddr != NULL) memcpy(&n->bcast, &((struct sockaddr_in *) ifa->ifa_broadaddr)->sin_addr.s_addr, sizeof(struct in_addr)); if (ifa->ifa_dstaddr != NULL) memcpy(&n->peer, &((struct sockaddr_in *) ifa->ifa_dstaddr)->sin_addr.s_addr, sizeof(struct in_addr)); } else if (n->af == AF_INET6) { memcpy(&n->addr.v.a.addr, &((struct sockaddr_in6 *) ifa->ifa_addr)->sin6_addr.s6_addr, sizeof(struct in6_addr)); memcpy(&n->addr.v.a.mask, &((struct sockaddr_in6 *) ifa->ifa_netmask)->sin6_addr.s6_addr, sizeof(struct in6_addr)); if (ifa->ifa_broadaddr != NULL) memcpy(&n->bcast, &((struct sockaddr_in6 *) ifa->ifa_broadaddr)->sin6_addr.s6_addr, sizeof(struct in6_addr)); if (ifa->ifa_dstaddr != NULL) memcpy(&n->peer, &((struct sockaddr_in6 *) ifa->ifa_dstaddr)->sin6_addr.s6_addr, sizeof(struct in6_addr)); n->ifindex = ((struct sockaddr_in6 *) ifa->ifa_addr)->sin6_scope_id; } if ((n->ifname = strdup(ifa->ifa_name)) == NULL) err(1, "ifa_load: strdup"); n->next = NULL; n->tail = n; if (h == NULL) h = n; else { h->tail->next = n; h->tail = n; } } iftab = h; freeifaddrs(ifap); } int get_socket_domain(void) { int sdom; sdom = AF_UNSPEC; #ifdef WITH_INET6 if (sdom == AF_UNSPEC && feature_present("inet6")) sdom = AF_INET6; #endif #ifdef WITH_INET if (sdom == AF_UNSPEC && feature_present("inet")) sdom = AF_INET; #endif if (sdom == AF_UNSPEC) sdom = AF_LINK; return (sdom); } struct node_host * ifa_exists(const char *ifa_name) { struct node_host *n; struct ifgroupreq ifgr; int s; if (iftab == NULL) ifa_load(); /* check wether this is a group */ if ((s = socket(get_socket_domain(), SOCK_DGRAM, 0)) == -1) err(1, "socket"); bzero(&ifgr, sizeof(ifgr)); strlcpy(ifgr.ifgr_name, ifa_name, sizeof(ifgr.ifgr_name)); if (ioctl(s, SIOCGIFGMEMB, (caddr_t)&ifgr) == 0) { /* fake a node_host */ if ((n = calloc(1, sizeof(*n))) == NULL) err(1, "calloc"); if ((n->ifname = strdup(ifa_name)) == NULL) err(1, "strdup"); close(s); return (n); } close(s); for (n = iftab; n; n = n->next) { if (n->af == AF_LINK && !strncmp(n->ifname, ifa_name, IFNAMSIZ)) return (n); } return (NULL); } struct node_host * ifa_grouplookup(const char *ifa_name, int flags) { struct ifg_req *ifg; struct ifgroupreq ifgr; int s, len; struct node_host *n, *h = NULL; if ((s = socket(get_socket_domain(), SOCK_DGRAM, 0)) == -1) err(1, "socket"); bzero(&ifgr, sizeof(ifgr)); strlcpy(ifgr.ifgr_name, ifa_name, sizeof(ifgr.ifgr_name)); if (ioctl(s, SIOCGIFGMEMB, (caddr_t)&ifgr) == -1) { close(s); return (NULL); } len = ifgr.ifgr_len; if ((ifgr.ifgr_groups = calloc(1, len)) == NULL) err(1, "calloc"); if (ioctl(s, SIOCGIFGMEMB, (caddr_t)&ifgr) == -1) err(1, "SIOCGIFGMEMB"); for (ifg = ifgr.ifgr_groups; ifg && len >= sizeof(struct ifg_req); ifg++) { len -= sizeof(struct ifg_req); if ((n = ifa_lookup(ifg->ifgrq_member, flags)) == NULL) continue; if (h == NULL) h = n; else { h->tail->next = n; h->tail = n->tail; } } free(ifgr.ifgr_groups); close(s); return (h); } struct node_host * ifa_lookup(const char *ifa_name, int flags) { struct node_host *p = NULL, *h = NULL, *n = NULL; int got4 = 0, got6 = 0; const char *last_if = NULL; if ((h = ifa_grouplookup(ifa_name, flags)) != NULL) return (h); if (!strncmp(ifa_name, "self", IFNAMSIZ)) ifa_name = NULL; if (iftab == NULL) ifa_load(); for (p = iftab; p; p = p->next) { if (ifa_skip_if(ifa_name, p)) continue; if ((flags & PFI_AFLAG_BROADCAST) && p->af != AF_INET) continue; if ((flags & PFI_AFLAG_BROADCAST) && !(p->ifa_flags & IFF_BROADCAST)) continue; if ((flags & PFI_AFLAG_PEER) && !(p->ifa_flags & IFF_POINTOPOINT)) continue; if ((flags & PFI_AFLAG_NETWORK) && p->ifindex > 0) continue; if (last_if == NULL || strcmp(last_if, p->ifname)) got4 = got6 = 0; last_if = p->ifname; if ((flags & PFI_AFLAG_NOALIAS) && p->af == AF_INET && got4) continue; if ((flags & PFI_AFLAG_NOALIAS) && p->af == AF_INET6 && got6) continue; if (p->af == AF_INET) got4 = 1; else got6 = 1; n = calloc(1, sizeof(struct node_host)); if (n == NULL) err(1, "address: calloc"); n->af = p->af; if (flags & PFI_AFLAG_BROADCAST) memcpy(&n->addr.v.a.addr, &p->bcast, sizeof(struct pf_addr)); else if (flags & PFI_AFLAG_PEER) memcpy(&n->addr.v.a.addr, &p->peer, sizeof(struct pf_addr)); else memcpy(&n->addr.v.a.addr, &p->addr.v.a.addr, sizeof(struct pf_addr)); if (flags & PFI_AFLAG_NETWORK) set_ipmask(n, unmask(&p->addr.v.a.mask, n->af)); else { if (n->af == AF_INET) { if (p->ifa_flags & IFF_LOOPBACK && p->ifa_flags & IFF_LINK1) memcpy(&n->addr.v.a.mask, &p->addr.v.a.mask, sizeof(struct pf_addr)); else set_ipmask(n, 32); } else set_ipmask(n, 128); } n->ifindex = p->ifindex; n->next = NULL; n->tail = n; if (h == NULL) h = n; else { h->tail->next = n; h->tail = n; } } return (h); } int ifa_skip_if(const char *filter, struct node_host *p) { int n; if (p->af != AF_INET && p->af != AF_INET6) return (1); if (filter == NULL || !*filter) return (0); if (!strcmp(p->ifname, filter)) return (0); /* exact match */ n = strlen(filter); if (n < 1 || n >= IFNAMSIZ) return (1); /* sanity check */ if (filter[n-1] >= '0' && filter[n-1] <= '9') return (1); /* only do exact match in that case */ if (strncmp(p->ifname, filter, n)) return (1); /* prefix doesn't match */ return (p->ifname[n] < '0' || p->ifname[n] > '9'); } struct node_host * host(const char *s) { struct node_host *h = NULL; int mask, v4mask, v6mask, cont = 1; char *p, *q, *ps; if ((p = strrchr(s, '/')) != NULL) { mask = strtol(p+1, &q, 0); if (!q || *q || mask > 128 || q == (p+1)) { fprintf(stderr, "invalid netmask '%s'\n", p); return (NULL); } if ((ps = malloc(strlen(s) - strlen(p) + 1)) == NULL) err(1, "host: malloc"); strlcpy(ps, s, strlen(s) - strlen(p) + 1); v4mask = v6mask = mask; } else { if ((ps = strdup(s)) == NULL) err(1, "host: strdup"); v4mask = 32; v6mask = 128; mask = -1; } /* interface with this name exists? */ if (cont && (h = host_if(ps, mask)) != NULL) cont = 0; /* IPv4 address? */ if (cont && (h = host_v4(s, mask)) != NULL) cont = 0; /* IPv6 address? */ if (cont && (h = host_v6(ps, v6mask)) != NULL) cont = 0; /* dns lookup */ if (cont && (h = host_dns(ps, v4mask, v6mask)) != NULL) cont = 0; free(ps); if (h == NULL || cont == 1) { fprintf(stderr, "no IP address found for %s\n", s); return (NULL); } return (h); } struct node_host * host_if(const char *s, int mask) { struct node_host *n, *h = NULL; char *p, *ps; int flags = 0; if ((ps = strdup(s)) == NULL) err(1, "host_if: strdup"); while ((p = strrchr(ps, ':')) != NULL) { if (!strcmp(p+1, "network")) flags |= PFI_AFLAG_NETWORK; else if (!strcmp(p+1, "broadcast")) flags |= PFI_AFLAG_BROADCAST; else if (!strcmp(p+1, "peer")) flags |= PFI_AFLAG_PEER; else if (!strcmp(p+1, "0")) flags |= PFI_AFLAG_NOALIAS; else { free(ps); return (NULL); } *p = '\0'; } if (flags & (flags - 1) & PFI_AFLAG_MODEMASK) { /* Yep! */ fprintf(stderr, "illegal combination of interface modifiers\n"); free(ps); return (NULL); } if ((flags & (PFI_AFLAG_NETWORK|PFI_AFLAG_BROADCAST)) && mask > -1) { fprintf(stderr, "network or broadcast lookup, but " "extra netmask given\n"); free(ps); return (NULL); } if (ifa_exists(ps) || !strncmp(ps, "self", IFNAMSIZ)) { /* interface with this name exists */ h = ifa_lookup(ps, flags); for (n = h; n != NULL && mask > -1; n = n->next) set_ipmask(n, mask); } free(ps); return (h); } struct node_host * host_v4(const char *s, int mask) { struct node_host *h = NULL; struct in_addr ina; int bits = 32; memset(&ina, 0, sizeof(struct in_addr)); if (strrchr(s, '/') != NULL) { if ((bits = inet_net_pton(AF_INET, s, &ina, sizeof(ina))) == -1) return (NULL); } else { if (inet_pton(AF_INET, s, &ina) != 1) return (NULL); } h = calloc(1, sizeof(struct node_host)); if (h == NULL) err(1, "address: calloc"); h->ifname = NULL; h->af = AF_INET; h->addr.v.a.addr.addr32[0] = ina.s_addr; set_ipmask(h, bits); h->next = NULL; h->tail = h; return (h); } struct node_host * host_v6(const char *s, int mask) { struct addrinfo hints, *res; struct node_host *h = NULL; memset(&hints, 0, sizeof(hints)); hints.ai_family = AF_INET6; hints.ai_socktype = SOCK_DGRAM; /*dummy*/ hints.ai_flags = AI_NUMERICHOST; if (getaddrinfo(s, "0", &hints, &res) == 0) { h = calloc(1, sizeof(struct node_host)); if (h == NULL) err(1, "address: calloc"); h->ifname = NULL; h->af = AF_INET6; memcpy(&h->addr.v.a.addr, &((struct sockaddr_in6 *)res->ai_addr)->sin6_addr, sizeof(h->addr.v.a.addr)); h->ifindex = ((struct sockaddr_in6 *)res->ai_addr)->sin6_scope_id; set_ipmask(h, mask); freeaddrinfo(res); h->next = NULL; h->tail = h; } return (h); } struct node_host * host_dns(const char *s, int v4mask, int v6mask) { struct addrinfo hints, *res0, *res; struct node_host *n, *h = NULL; int error, noalias = 0; int got4 = 0, got6 = 0; char *p, *ps; if ((ps = strdup(s)) == NULL) err(1, "host_dns: strdup"); if ((p = strrchr(ps, ':')) != NULL && !strcmp(p, ":0")) { noalias = 1; *p = '\0'; } memset(&hints, 0, sizeof(hints)); hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_STREAM; /* DUMMY */ error = getaddrinfo(ps, NULL, &hints, &res0); if (error) { free(ps); return (h); } for (res = res0; res; res = res->ai_next) { if (res->ai_family != AF_INET && res->ai_family != AF_INET6) continue; if (noalias) { if (res->ai_family == AF_INET) { if (got4) continue; got4 = 1; } else { if (got6) continue; got6 = 1; } } n = calloc(1, sizeof(struct node_host)); if (n == NULL) err(1, "host_dns: calloc"); n->ifname = NULL; n->af = res->ai_family; if (res->ai_family == AF_INET) { memcpy(&n->addr.v.a.addr, &((struct sockaddr_in *) res->ai_addr)->sin_addr.s_addr, sizeof(struct in_addr)); set_ipmask(n, v4mask); } else { memcpy(&n->addr.v.a.addr, &((struct sockaddr_in6 *) res->ai_addr)->sin6_addr.s6_addr, sizeof(struct in6_addr)); n->ifindex = ((struct sockaddr_in6 *) res->ai_addr)->sin6_scope_id; set_ipmask(n, v6mask); } n->next = NULL; n->tail = n; if (h == NULL) h = n; else { h->tail->next = n; h->tail = n; } } freeaddrinfo(res0); free(ps); return (h); } /* * convert a hostname to a list of addresses and put them in the given buffer. * test: * if set to 1, only simple addresses are accepted (no netblock, no "!"). */ int append_addr(struct pfr_buffer *b, char *s, int test) { char *r; struct node_host *h, *n; int rv, not = 0; for (r = s; *r == '!'; r++) not = !not; if ((n = host(r)) == NULL) { errno = 0; return (-1); } rv = append_addr_host(b, n, test, not); do { h = n; n = n->next; free(h); } while (n != NULL); return (rv); } /* * same as previous function, but with a pre-parsed input and the ability * to "negate" the result. Does not free the node_host list. * not: * setting it to 1 is equivalent to adding "!" in front of parameter s. */ int append_addr_host(struct pfr_buffer *b, struct node_host *n, int test, int not) { int bits; struct pfr_addr addr; do { bzero(&addr, sizeof(addr)); addr.pfra_not = n->not ^ not; addr.pfra_af = n->af; addr.pfra_net = unmask(&n->addr.v.a.mask, n->af); switch (n->af) { case AF_INET: addr.pfra_ip4addr.s_addr = n->addr.v.a.addr.addr32[0]; bits = 32; break; case AF_INET6: memcpy(&addr.pfra_ip6addr, &n->addr.v.a.addr.v6, sizeof(struct in6_addr)); bits = 128; break; default: errno = EINVAL; return (-1); } if ((test && (not || addr.pfra_net != bits)) || addr.pfra_net > bits) { errno = EINVAL; return (-1); } if (pfr_buf_add(b, &addr)) return (-1); } while ((n = n->next) != NULL); return (0); } int pfctl_add_trans(struct pfr_buffer *buf, int rs_num, const char *anchor) { struct pfioc_trans_e trans; bzero(&trans, sizeof(trans)); trans.rs_num = rs_num; if (strlcpy(trans.anchor, anchor, sizeof(trans.anchor)) >= sizeof(trans.anchor)) errx(1, "pfctl_add_trans: strlcpy"); return pfr_buf_add(buf, &trans); } u_int32_t pfctl_get_ticket(struct pfr_buffer *buf, int rs_num, const char *anchor) { struct pfioc_trans_e *p; PFRB_FOREACH(p, buf) if (rs_num == p->rs_num && !strcmp(anchor, p->anchor)) return (p->ticket); errx(1, "pfctl_get_ticket: assertion failed"); } int pfctl_trans(int dev, struct pfr_buffer *buf, u_long cmd, int from) { struct pfioc_trans trans; bzero(&trans, sizeof(trans)); trans.size = buf->pfrb_size - from; trans.esize = sizeof(struct pfioc_trans_e); trans.array = ((struct pfioc_trans_e *)buf->pfrb_caddr) + from; return ioctl(dev, cmd, &trans); } Index: user/alc/PQ_LAUNDRY/share/man/man9/Makefile =================================================================== --- user/alc/PQ_LAUNDRY/share/man/man9/Makefile (revision 303774) +++ user/alc/PQ_LAUNDRY/share/man/man9/Makefile (revision 303775) @@ -1,1944 +1,1943 @@ # $FreeBSD$ .include PACKAGE=runtime-manuals MAN= accept_filter.9 \ accf_data.9 \ accf_dns.9 \ accf_http.9 \ acl.9 \ alq.9 \ altq.9 \ atomic.9 \ bios.9 \ bitset.9 \ boot.9 \ bpf.9 \ buf.9 \ buf_ring.9 \ BUF_ISLOCKED.9 \ BUF_LOCK.9 \ BUF_LOCKFREE.9 \ BUF_LOCKINIT.9 \ BUF_RECURSED.9 \ BUF_TIMELOCK.9 \ BUF_UNLOCK.9 \ bus_activate_resource.9 \ BUS_ADD_CHILD.9 \ bus_adjust_resource.9 \ bus_alloc_resource.9 \ BUS_BIND_INTR.9 \ bus_child_present.9 \ BUS_CHILD_DELETED.9 \ BUS_CHILD_DETACHED.9 \ BUS_CONFIG_INTR.9 \ BUS_DESCRIBE_INTR.9 \ bus_dma.9 \ bus_generic_attach.9 \ bus_generic_detach.9 \ bus_generic_new_pass.9 \ bus_generic_print_child.9 \ bus_generic_read_ivar.9 \ bus_generic_shutdown.9 \ BUS_GET_CPUS.9 \ bus_get_resource.9 \ bus_map_resource.9 \ BUS_NEW_PASS.9 \ BUS_PRINT_CHILD.9 \ BUS_READ_IVAR.9 \ BUS_RESCAN.9 \ bus_release_resource.9 \ bus_set_pass.9 \ bus_set_resource.9 \ BUS_SETUP_INTR.9 \ bus_space.9 \ byteorder.9 \ casuword.9 \ cd.9 \ condvar.9 \ config_intrhook.9 \ contigmalloc.9 \ copy.9 \ counter.9 \ cpuset.9 \ cr_cansee.9 \ critical_enter.9 \ cr_seeothergids.9 \ cr_seeotheruids.9 \ crypto.9 \ CTASSERT.9 \ DB_COMMAND.9 \ DECLARE_GEOM_CLASS.9 \ DECLARE_MODULE.9 \ DELAY.9 \ devclass.9 \ devclass_find.9 \ devclass_get_device.9 \ devclass_get_devices.9 \ devclass_get_drivers.9 \ devclass_get_maxunit.9 \ devclass_get_name.9 \ devclass_get_softc.9 \ dev_clone.9 \ devfs_set_cdevpriv.9 \ device.9 \ device_add_child.9 \ DEVICE_ATTACH.9 \ device_delete_child.9 \ DEVICE_DETACH.9 \ device_enable.9 \ device_find_child.9 \ device_get_children.9 \ device_get_devclass.9 \ device_get_driver.9 \ device_get_ivars.9 \ device_get_name.9 \ device_get_parent.9 \ device_get_softc.9 \ device_get_state.9 \ device_get_sysctl.9 \ device_get_unit.9 \ DEVICE_IDENTIFY.9 \ device_printf.9 \ DEVICE_PROBE.9 \ device_probe_and_attach.9 \ device_quiet.9 \ device_set_desc.9 \ device_set_driver.9 \ device_set_flags.9 \ DEVICE_SHUTDOWN.9 \ DEV_MODULE.9 \ devstat.9 \ devtoname.9 \ disk.9 \ domain.9 \ drbr.9 \ driver.9 \ DRIVER_MODULE.9 \ EVENTHANDLER.9 \ eventtimers.9 \ extattr.9 \ fail.9 \ fetch.9 \ firmware.9 \ fpu_kern.9 \ g_access.9 \ g_attach.9 \ g_bio.9 \ g_consumer.9 \ g_data.9 \ get_cyclecount.9 \ getenv.9 \ getnewvnode.9 \ g_event.9 \ g_geom.9 \ g_provider.9 \ g_provider_by_name.9 \ groupmember.9 \ g_wither_geom.9 \ hash.9 \ hashinit.9 \ hexdump.9 \ hhook.9 \ ieee80211.9 \ ieee80211_amrr.9 \ ieee80211_beacon.9 \ ieee80211_bmiss.9 \ ieee80211_crypto.9 \ ieee80211_ddb.9 \ ieee80211_input.9 \ ieee80211_node.9 \ ieee80211_output.9 \ ieee80211_proto.9 \ ieee80211_radiotap.9 \ ieee80211_regdomain.9 \ ieee80211_scan.9 \ ieee80211_vap.9 \ ifnet.9 \ inittodr.9 \ insmntque.9 \ intro.9 \ ithread.9 \ KASSERT.9 \ kern_testfrwk.9 \ kernacc.9 \ kernel_mount.9 \ khelp.9 \ kobj.9 \ kproc.9 \ kqueue.9 \ kthread.9 \ ktr.9 \ lock.9 \ locking.9 \ LOCK_PROFILING.9 \ mac.9 \ make_dev.9 \ malloc.9 \ mbchain.9 \ mbpool.9 \ mbuf.9 \ mbuf_tags.9 \ MD5.9 \ mdchain.9 \ memcchr.9 \ memguard.9 \ microseq.9 \ microtime.9 \ microuptime.9 \ mi_switch.9 \ mod_cc.9 \ module.9 \ MODULE_DEPEND.9 \ MODULE_VERSION.9 \ mtx_pool.9 \ mutex.9 \ namei.9 \ netisr.9 \ nv.9 \ osd.9 \ owll.9 \ own.9 \ panic.9 \ pbuf.9 \ PCBGROUP.9 \ p_candebug.9 \ p_cansee.9 \ pci.9 \ PCI_IOV_ADD_VF.9 \ PCI_IOV_INIT.9 \ pci_iov_schema.9 \ PCI_IOV_UNINIT.9 \ pfil.9 \ pfind.9 \ pget.9 \ pgfind.9 \ PHOLD.9 \ physio.9 \ pmap.9 \ pmap_activate.9 \ pmap_clear_modify.9 \ pmap_copy.9 \ pmap_enter.9 \ pmap_extract.9 \ pmap_growkernel.9 \ pmap_init.9 \ pmap_is_modified.9 \ pmap_is_prefaultable.9 \ pmap_map.9 \ pmap_mincore.9 \ pmap_object_init_pt.9 \ pmap_page_exists_quick.9 \ pmap_page_init.9 \ pmap_pinit.9 \ pmap_protect.9 \ pmap_qenter.9 \ pmap_quick_enter_page.9 \ pmap_release.9 \ pmap_remove.9 \ pmap_resident_count.9 \ pmap_unwire.9 \ pmap_zero_page.9 \ printf.9 \ prison_check.9 \ priv.9 \ proc_rwmem.9 \ pseudofs.9 \ psignal.9 \ random.9 \ random_harvest.9 \ redzone.9 \ refcount.9 \ resettodr.9 \ resource_int_value.9 \ rijndael.9 \ rman.9 \ rmlock.9 \ rtalloc.9 \ rtentry.9 \ runqueue.9 \ rwlock.9 \ sbuf.9 \ scheduler.9 \ SDT.9 \ securelevel_gt.9 \ selrecord.9 \ sema.9 \ sf_buf.9 \ sglist.9 \ shm_map.9 \ signal.9 \ sleep.9 \ sleepqueue.9 \ socket.9 \ stack.9 \ store.9 \ style.9 \ swi.9 \ sx.9 \ SYSCALL_MODULE.9 \ sysctl.9 \ sysctl_add_oid.9 \ sysctl_ctx_init.9 \ SYSINIT.9 \ taskqueue.9 \ tcp_functions.9 \ thread_exit.9 \ time.9 \ timeout.9 \ tvtohz.9 \ ucred.9 \ uidinfo.9 \ uio.9 \ unr.9 \ utopia.9 \ vaccess.9 \ vaccess_acl_nfs4.9 \ vaccess_acl_posix1e.9 \ vcount.9 \ vflush.9 \ VFS.9 \ vfs_busy.9 \ VFS_CHECKEXP.9 \ vfsconf.9 \ VFS_FHTOVP.9 \ vfs_getnewfsid.9 \ vfs_getopt.9 \ vfs_getvfs.9 \ VFS_MOUNT.9 \ vfs_mountedfrom.9 \ VFS_QUOTACTL.9 \ VFS_ROOT.9 \ vfs_rootmountalloc.9 \ VFS_SET.9 \ VFS_STATFS.9 \ vfs_suser.9 \ VFS_SYNC.9 \ vfs_timestamp.9 \ vfs_unbusy.9 \ VFS_UNMOUNT.9 \ vfs_unmountall.9 \ VFS_VGET.9 \ vget.9 \ vgone.9 \ vhold.9 \ vinvalbuf.9 \ vm_fault_prefault.9 \ vm_map.9 \ vm_map_check_protection.9 \ vm_map_create.9 \ vm_map_delete.9 \ vm_map_entry_resize_free.9 \ vm_map_find.9 \ vm_map_findspace.9 \ vm_map_inherit.9 \ vm_map_init.9 \ vm_map_insert.9 \ vm_map_lock.9 \ vm_map_lookup.9 \ vm_map_madvise.9 \ vm_map_max.9 \ vm_map_protect.9 \ vm_map_remove.9 \ vm_map_simplify_entry.9 \ vm_map_stack.9 \ vm_map_submap.9 \ vm_map_sync.9 \ vm_map_wire.9 \ vm_page_alloc.9 \ vm_page_bits.9 \ vm_page_busy.9 \ vm_page_cache.9 \ vm_page_deactivate.9 \ vm_page_dontneed.9 \ vm_page_aflag.9 \ vm_page_free.9 \ vm_page_grab.9 \ vm_page_hold.9 \ vm_page_insert.9 \ vm_page_lookup.9 \ vm_page_rename.9 \ vm_page_wire.9 \ vm_set_page_size.9 \ vmem.9 \ vn_fullpath.9 \ vn_isdisk.9 \ vnet.9 \ vnode.9 \ VOP_ACCESS.9 \ VOP_ACLCHECK.9 \ VOP_ADVISE.9 \ VOP_ADVLOCK.9 \ VOP_ALLOCATE.9 \ VOP_ATTRIB.9 \ VOP_BWRITE.9 \ VOP_CREATE.9 \ VOP_FSYNC.9 \ VOP_GETACL.9 \ VOP_GETEXTATTR.9 \ VOP_GETPAGES.9 \ VOP_INACTIVE.9 \ VOP_IOCTL.9 \ VOP_LINK.9 \ VOP_LISTEXTATTR.9 \ VOP_LOCK.9 \ VOP_LOOKUP.9 \ VOP_OPENCLOSE.9 \ VOP_PATHCONF.9 \ VOP_PRINT.9 \ VOP_RDWR.9 \ VOP_READDIR.9 \ VOP_READLINK.9 \ VOP_REALLOCBLKS.9 \ VOP_REMOVE.9 \ VOP_RENAME.9 \ VOP_REVOKE.9 \ VOP_SETACL.9 \ VOP_SETEXTATTR.9 \ VOP_STRATEGY.9 \ VOP_VPTOCNP.9 \ VOP_VPTOFH.9 \ vref.9 \ vrefcnt.9 \ vrele.9 \ vslock.9 \ watchdog.9 \ zone.9 MLINKS= unr.9 alloc_unr.9 \ unr.9 alloc_unrl.9 \ unr.9 alloc_unr_specific.9 \ unr.9 delete_unrhdr.9 \ unr.9 free_unr.9 \ unr.9 new_unrhdr.9 MLINKS+=accept_filter.9 accept_filt_add.9 \ accept_filter.9 accept_filt_del.9 \ accept_filter.9 accept_filt_generic_mod_event.9 \ accept_filter.9 accept_filt_get.9 MLINKS+=alq.9 ALQ.9 \ alq.9 alq_close.9 \ alq.9 alq_flush.9 \ alq.9 alq_get.9 \ alq.9 alq_getn.9 \ alq.9 alq_open.9 \ alq.9 alq_open_flags.9 \ alq.9 alq_post.9 \ alq.9 alq_post_flags.9 \ alq.9 alq_write.9 \ alq.9 alq_writen.9 MLINKS+=altq.9 ALTQ.9 MLINKS+=atomic.9 atomic_add.9 \ atomic.9 atomic_clear.9 \ atomic.9 atomic_cmpset.9 \ atomic.9 atomic_fetchadd.9 \ atomic.9 atomic_load.9 \ atomic.9 atomic_readandclear.9 \ atomic.9 atomic_set.9 \ atomic.9 atomic_store.9 \ atomic.9 atomic_subtract.9 \ atomic.9 atomic_swap.9 \ atomic.9 atomic_testandset.9 MLINKS+=bitset.9 BITSET_DEFINE.9 \ bitset.9 BITSET_T_INITIALIZER.9 \ bitset.9 BITSET_FSET.9 \ bitset.9 BIT_CLR.9 \ bitset.9 BIT_COPY.9 \ bitset.9 BIT_ISSET.9 \ bitset.9 BIT_SET.9 \ bitset.9 BIT_ZERO.9 \ bitset.9 BIT_FILL.9 \ bitset.9 BIT_SETOF.9 \ bitset.9 BIT_EMPTY.9 \ bitset.9 BIT_ISFULLSET.9 \ bitset.9 BIT_FFS.9 \ bitset.9 BIT_COUNT.9 \ bitset.9 BIT_SUBSET.9 \ bitset.9 BIT_OVERLAP.9 \ bitset.9 BIT_CMP.9 \ bitset.9 BIT_OR.9 \ bitset.9 BIT_AND.9 \ bitset.9 BIT_NAND.9 \ bitset.9 BIT_CLR_ATOMIC.9 \ bitset.9 BIT_SET_ATOMIC.9 \ bitset.9 BIT_SET_ATOMIC_ACQ.9 \ bitset.9 BIT_AND_ATOMIC.9 \ bitset.9 BIT_OR_ATOMIC.9 \ bitset.9 BIT_COPY_STORE_REL.9 MLINKS+=bpf.9 bpfattach.9 \ bpf.9 bpfattach2.9 \ bpf.9 bpfdetach.9 \ bpf.9 bpf_filter.9 \ bpf.9 bpf_mtap.9 \ bpf.9 bpf_mtap2.9 \ bpf.9 bpf_tap.9 \ bpf.9 bpf_validate.9 MLINKS+=buf.9 bp.9 MLINKS+=buf_ring.9 buf_ring_alloc.9 \ buf_ring.9 buf_ring_free.9 \ buf_ring.9 buf_ring_enqueue.9 \ buf_ring.9 buf_ring_enqueue_bytes.9 \ buf_ring.9 buf_ring_dequeue_mc.9 \ buf_ring.9 buf_ring_dequeue_sc.9 \ buf_ring.9 buf_ring_count.9 \ buf_ring.9 buf_ring_empty.9 \ buf_ring.9 buf_ring_full.9 \ buf_ring.9 buf_ring_peek.9 MLINKS+=bus_activate_resource.9 bus_deactivate_resource.9 MLINKS+=bus_alloc_resource.9 bus_alloc_resource_any.9 MLINKS+=BUS_BIND_INTR.9 bus_bind_intr.9 MLINKS+=BUS_DESCRIBE_INTR.9 bus_describe_intr.9 MLINKS+=bus_dma.9 busdma.9 \ bus_dma.9 bus_dmamap_create.9 \ bus_dma.9 bus_dmamap_destroy.9 \ bus_dma.9 bus_dmamap_load.9 \ bus_dma.9 bus_dmamap_load_bio.9 \ bus_dma.9 bus_dmamap_load_ccb.9 \ bus_dma.9 bus_dmamap_load_mbuf.9 \ bus_dma.9 bus_dmamap_load_mbuf_sg.9 \ bus_dma.9 bus_dmamap_load_uio.9 \ bus_dma.9 bus_dmamap_sync.9 \ bus_dma.9 bus_dmamap_unload.9 \ bus_dma.9 bus_dmamem_alloc.9 \ bus_dma.9 bus_dmamem_free.9 \ bus_dma.9 bus_dma_tag_create.9 \ bus_dma.9 bus_dma_tag_destroy.9 MLINKS+=bus_generic_read_ivar.9 bus_generic_write_ivar.9 MLINKS+=BUS_GET_CPUS.9 bus_get_cpus.9 MLINKS+=bus_map_resource.9 bus_unmap_resource.9 \ bus_map_resource.9 resource_init_map_request.9 MLINKS+=BUS_READ_IVAR.9 BUS_WRITE_IVAR.9 MLINKS+=BUS_SETUP_INTR.9 bus_setup_intr.9 \ BUS_SETUP_INTR.9 BUS_TEARDOWN_INTR.9 \ BUS_SETUP_INTR.9 bus_teardown_intr.9 MLINKS+=bus_space.9 bus_space_alloc.9 \ bus_space.9 bus_space_barrier.9 \ bus_space.9 bus_space_copy_region_1.9 \ bus_space.9 bus_space_copy_region_2.9 \ bus_space.9 bus_space_copy_region_4.9 \ bus_space.9 bus_space_copy_region_8.9 \ bus_space.9 bus_space_copy_region_stream_1.9 \ bus_space.9 bus_space_copy_region_stream_2.9 \ bus_space.9 bus_space_copy_region_stream_4.9 \ bus_space.9 bus_space_copy_region_stream_8.9 \ bus_space.9 bus_space_free.9 \ bus_space.9 bus_space_map.9 \ bus_space.9 bus_space_read_1.9 \ bus_space.9 bus_space_read_2.9 \ bus_space.9 bus_space_read_4.9 \ bus_space.9 bus_space_read_8.9 \ bus_space.9 bus_space_read_multi_1.9 \ bus_space.9 bus_space_read_multi_2.9 \ bus_space.9 bus_space_read_multi_4.9 \ bus_space.9 bus_space_read_multi_8.9 \ bus_space.9 bus_space_read_multi_stream_1.9 \ bus_space.9 bus_space_read_multi_stream_2.9 \ bus_space.9 bus_space_read_multi_stream_4.9 \ bus_space.9 bus_space_read_multi_stream_8.9 \ bus_space.9 bus_space_read_region_1.9 \ bus_space.9 bus_space_read_region_2.9 \ bus_space.9 bus_space_read_region_4.9 \ bus_space.9 bus_space_read_region_8.9 \ bus_space.9 bus_space_read_region_stream_1.9 \ bus_space.9 bus_space_read_region_stream_2.9 \ bus_space.9 bus_space_read_region_stream_4.9 \ bus_space.9 bus_space_read_region_stream_8.9 \ bus_space.9 bus_space_read_stream_1.9 \ bus_space.9 bus_space_read_stream_2.9 \ bus_space.9 bus_space_read_stream_4.9 \ bus_space.9 bus_space_read_stream_8.9 \ bus_space.9 bus_space_set_multi_1.9 \ bus_space.9 bus_space_set_multi_2.9 \ bus_space.9 bus_space_set_multi_4.9 \ bus_space.9 bus_space_set_multi_8.9 \ bus_space.9 bus_space_set_multi_stream_1.9 \ bus_space.9 bus_space_set_multi_stream_2.9 \ bus_space.9 bus_space_set_multi_stream_4.9 \ bus_space.9 bus_space_set_multi_stream_8.9 \ bus_space.9 bus_space_set_region_1.9 \ bus_space.9 bus_space_set_region_2.9 \ bus_space.9 bus_space_set_region_4.9 \ bus_space.9 bus_space_set_region_8.9 \ bus_space.9 bus_space_set_region_stream_1.9 \ bus_space.9 bus_space_set_region_stream_2.9 \ bus_space.9 bus_space_set_region_stream_4.9 \ bus_space.9 bus_space_set_region_stream_8.9 \ bus_space.9 bus_space_subregion.9 \ bus_space.9 bus_space_unmap.9 \ bus_space.9 bus_space_write_1.9 \ bus_space.9 bus_space_write_2.9 \ bus_space.9 bus_space_write_4.9 \ bus_space.9 bus_space_write_8.9 \ bus_space.9 bus_space_write_multi_1.9 \ bus_space.9 bus_space_write_multi_2.9 \ bus_space.9 bus_space_write_multi_4.9 \ bus_space.9 bus_space_write_multi_8.9 \ bus_space.9 bus_space_write_multi_stream_1.9 \ bus_space.9 bus_space_write_multi_stream_2.9 \ bus_space.9 bus_space_write_multi_stream_4.9 \ bus_space.9 bus_space_write_multi_stream_8.9 \ bus_space.9 bus_space_write_region_1.9 \ bus_space.9 bus_space_write_region_2.9 \ bus_space.9 bus_space_write_region_4.9 \ bus_space.9 bus_space_write_region_8.9 \ bus_space.9 bus_space_write_region_stream_1.9 \ bus_space.9 bus_space_write_region_stream_2.9 \ bus_space.9 bus_space_write_region_stream_4.9 \ bus_space.9 bus_space_write_region_stream_8.9 \ bus_space.9 bus_space_write_stream_1.9 \ bus_space.9 bus_space_write_stream_2.9 \ bus_space.9 bus_space_write_stream_4.9 \ bus_space.9 bus_space_write_stream_8.9 MLINKS+=byteorder.9 be16dec.9 \ byteorder.9 be16enc.9 \ byteorder.9 be16toh.9 \ byteorder.9 be32dec.9 \ byteorder.9 be32enc.9 \ byteorder.9 be32toh.9 \ byteorder.9 be64dec.9 \ byteorder.9 be64enc.9 \ byteorder.9 be64toh.9 \ byteorder.9 bswap16.9 \ byteorder.9 bswap32.9 \ byteorder.9 bswap64.9 \ byteorder.9 htobe16.9 \ byteorder.9 htobe32.9 \ byteorder.9 htobe64.9 \ byteorder.9 htole16.9 \ byteorder.9 htole32.9 \ byteorder.9 htole64.9 \ byteorder.9 le16dec.9 \ byteorder.9 le16enc.9 \ byteorder.9 le16toh.9 \ byteorder.9 le32dec.9 \ byteorder.9 le32enc.9 \ byteorder.9 le32toh.9 \ byteorder.9 le64dec.9 \ byteorder.9 le64enc.9 \ byteorder.9 le64toh.9 MLINKS+=condvar.9 cv_broadcast.9 \ condvar.9 cv_broadcastpri.9 \ condvar.9 cv_destroy.9 \ condvar.9 cv_init.9 \ condvar.9 cv_signal.9 \ condvar.9 cv_timedwait.9 \ condvar.9 cv_timedwait_sig.9 \ condvar.9 cv_timedwait_sig_sbt.9 \ condvar.9 cv_wait.9 \ condvar.9 cv_wait_sig.9 \ condvar.9 cv_wait_unlock.9 \ condvar.9 cv_wmesg.9 MLINKS+=config_intrhook.9 config_intrhook_disestablish.9 \ config_intrhook.9 config_intrhook_establish.9 MLINKS+=contigmalloc.9 contigfree.9 MLINKS+=casuword.9 casueword.9 \ casuword.9 casueword32.9 \ casuword.9 casuword32.9 MLINKS+=copy.9 copyin.9 \ copy.9 copyin_nofault.9 \ copy.9 copyinstr.9 \ copy.9 copyout.9 \ copy.9 copyout_nofault.9 \ copy.9 copystr.9 MLINKS+=counter.9 counter_u64_alloc.9 \ counter.9 counter_u64_free.9 \ counter.9 counter_u64_add.9 \ counter.9 counter_enter.9 \ counter.9 counter_exit.9 \ counter.9 counter_u64_add_protected.9 \ counter.9 counter_u64_fetch.9 \ counter.9 counter_u64_zero.9 \ counter.9 SYSCTL_COUNTER_U64.9 \ counter.9 SYSCTL_ADD_COUNTER_U64.9 \ counter.9 SYSCTL_COUNTER_U64_ARRAY.9 \ counter.9 SYSCTL_ADD_COUNTER_U64_ARRAY.9 MLINKS+=cpuset.9 CPUSET_T_INITIALIZER.9 \ cpuset.9 CPUSET_FSET.9 \ cpuset.9 CPU_CLR.9 \ cpuset.9 CPU_COPY.9 \ cpuset.9 CPU_ISSET.9 \ cpuset.9 CPU_SET.9 \ cpuset.9 CPU_ZERO.9 \ cpuset.9 CPU_FILL.9 \ cpuset.9 CPU_SETOF.9 \ cpuset.9 CPU_EMPTY.9 \ cpuset.9 CPU_ISFULLSET.9 \ cpuset.9 CPU_FFS.9 \ cpuset.9 CPU_COUNT.9 \ cpuset.9 CPU_SUBSET.9 \ cpuset.9 CPU_OVERLAP.9 \ cpuset.9 CPU_CMP.9 \ cpuset.9 CPU_OR.9 \ cpuset.9 CPU_AND.9 \ cpuset.9 CPU_NAND.9 \ cpuset.9 CPU_CLR_ATOMIC.9 \ cpuset.9 CPU_SET_ATOMIC.9 \ cpuset.9 CPU_SET_ATOMIC_ACQ.9 \ cpuset.9 CPU_AND_ATOMIC.9 \ cpuset.9 CPU_OR_ATOMIC.9 \ cpuset.9 CPU_COPY_STORE_REL.9 MLINKS+=critical_enter.9 critical.9 \ critical_enter.9 critical_exit.9 MLINKS+=crypto.9 crypto_dispatch.9 \ crypto.9 crypto_done.9 \ crypto.9 crypto_freereq.9 \ crypto.9 crypto_freesession.9 \ crypto.9 crypto_get_driverid.9 \ crypto.9 crypto_getreq.9 \ crypto.9 crypto_kdispatch.9 \ crypto.9 crypto_kdone.9 \ crypto.9 crypto_kregister.9 \ crypto.9 crypto_newsession.9 \ crypto.9 crypto_register.9 \ crypto.9 crypto_unblock.9 \ crypto.9 crypto_unregister.9 \ crypto.9 crypto_unregister_all.9 MLINKS+=DB_COMMAND.9 DB_SHOW_ALL_COMMAND.9 \ DB_COMMAND.9 DB_SHOW_COMMAND.9 MLINKS+=dev_clone.9 drain_dev_clone_events.9 MLINKS+=devfs_set_cdevpriv.9 devfs_clear_cdevpriv.9 \ devfs_set_cdevpriv.9 devfs_get_cdevpriv.9 MLINKS+=device_add_child.9 device_add_child_ordered.9 MLINKS+=device_enable.9 device_disable.9 \ device_enable.9 device_is_enabled.9 MLINKS+=device_get_ivars.9 device_set_ivars.9 MLINKS+=device_get_name.9 device_get_nameunit.9 MLINKS+=device_get_state.9 device_busy.9 \ device_get_state.9 device_is_alive.9 \ device_get_state.9 device_is_attached.9 \ device_get_state.9 device_unbusy.9 MLINKS+=device_get_sysctl.9 device_get_sysctl_ctx.9 \ device_get_sysctl.9 device_get_sysctl_tree.9 MLINKS+=device_quiet.9 device_is_quiet.9 \ device_quiet.9 device_verbose.9 MLINKS+=device_set_desc.9 device_get_desc.9 \ device_set_desc.9 device_set_desc_copy.9 MLINKS+=device_set_flags.9 device_get_flags.9 MLINKS+=devstat.9 devicestat.9 \ devstat.9 devstat_add_entry.9 \ devstat.9 devstat_end_transaction.9 \ devstat.9 devstat_remove_entry.9 \ devstat.9 devstat_start_transaction.9 MLINKS+=disk.9 disk_alloc.9 \ disk.9 disk_create.9 \ disk.9 disk_destroy.9 \ disk.9 disk_gone.9 \ disk.9 disk_resize.9 MLINKS+=domain.9 DOMAIN_SET.9 \ domain.9 domain_add.9 \ domain.9 pfctlinput.9 \ domain.9 pfctlinput2.9 \ domain.9 pffinddomain.9 \ domain.9 pffindproto.9 \ domain.9 pffindtype.9 MLINKS+=drbr.9 drbr_free.9 \ drbr.9 drbr_enqueue.9 \ drbr.9 drbr_dequeue.9 \ drbr.9 drbr_dequeue_cond.9 \ drbr.9 drbr_flush.9 \ drbr.9 drbr_empty.9 \ drbr.9 drbr_inuse.9 \ drbr.9 drbr_stats_update.9 MLINKS+=DRIVER_MODULE.9 DRIVER_MODULE_ORDERED.9 \ DRIVER_MODULE.9 EARLY_DRIVER_MODULE.9 \ DRIVER_MODULE.9 EARLY_DRIVER_MODULE_ORDERED.9 MLINKS+=EVENTHANDLER.9 EVENTHANDLER_DECLARE.9 \ EVENTHANDLER.9 EVENTHANDLER_DEREGISTER.9 \ EVENTHANDLER.9 eventhandler_deregister.9 \ EVENTHANDLER.9 eventhandler_find_list.9 \ EVENTHANDLER.9 EVENTHANDLER_INVOKE.9 \ EVENTHANDLER.9 eventhandler_prune_list.9 \ EVENTHANDLER.9 EVENTHANDLER_REGISTER.9 \ EVENTHANDLER.9 eventhandler_register.9 MLINKS+=eventtimers.9 et_register.9 \ eventtimers.9 et_deregister.9 \ eventtimers.9 et_ban.9 \ eventtimers.9 et_find.9 \ eventtimers.9 et_free.9 \ eventtimers.9 et_init.9 \ eventtimers.9 ET_LOCK.9 \ eventtimers.9 ET_UNLOCK.9 \ eventtimers.9 et_start.9 \ eventtimers.9 et_stop.9 MLINKS+=fail.9 KFAIL_POINT_CODE.9 \ fail.9 KFAIL_POINT_ERROR.9 \ fail.9 KFAIL_POINT_GOTO.9 \ fail.9 KFAIL_POINT_RETURN.9 \ fail.9 KFAIL_POINT_RETURN_VOID.9 MLINKS+=fetch.9 fubyte.9 \ fetch.9 fuswintr.9 \ fetch.9 fuword.9 \ fetch.9 fuword16.9 \ fetch.9 fuword32.9 \ fetch.9 fuword64.9 \ fetch.9 fueword.9 \ fetch.9 fueword32.9 \ fetch.9 fueword64.9 MLINKS+=firmware.9 firmware_get.9 \ firmware.9 firmware_put.9 \ firmware.9 firmware_register.9 \ firmware.9 firmware_unregister.9 MLINKS+=fpu_kern.9 fpu_kern_alloc_ctx.9 \ fpu_kern.9 fpu_kern_free_ctx.9 \ fpu_kern.9 fpu_kern_enter.9 \ fpu_kern.9 fpu_kern_leave.9 \ fpu_kern.9 fpu_kern_thread.9 \ fpu_kern.9 is_fpu_kern_thread.9 MLINKS+=g_attach.9 g_detach.9 MLINKS+=g_bio.9 g_alloc_bio.9 \ g_bio.9 g_clone_bio.9 \ g_bio.9 g_destroy_bio.9 \ g_bio.9 g_duplicate_bio.9 \ g_bio.9 g_new_bio.9 \ g_bio.9 g_print_bio.9 \ g_bio.9 g_reset_bio.9 MLINKS+=g_consumer.9 g_destroy_consumer.9 \ g_consumer.9 g_new_consumer.9 MLINKS+=g_data.9 g_read_data.9 \ g_data.9 g_write_data.9 MLINKS+=getenv.9 freeenv.9 \ getenv.9 getenv_int.9 \ getenv.9 getenv_long.9 \ getenv.9 getenv_string.9 \ getenv.9 getenv_quad.9 \ getenv.9 getenv_uint.9 \ getenv.9 getenv_ulong.9 \ getenv.9 setenv.9 \ getenv.9 testenv.9 \ getenv.9 unsetenv.9 MLINKS+=g_event.9 g_cancel_event.9 \ g_event.9 g_post_event.9 \ g_event.9 g_waitfor_event.9 MLINKS+=g_geom.9 g_destroy_geom.9 \ g_geom.9 g_new_geomf.9 MLINKS+=g_provider.9 g_destroy_provider.9 \ g_provider.9 g_error_provider.9 \ g_provider.9 g_new_providerf.9 MLINKS+=hash.9 hash32.9 \ hash.9 hash32_buf.9 \ hash.9 hash32_str.9 \ hash.9 hash32_stre.9 \ hash.9 hash32_strn.9 \ hash.9 hash32_strne.9 \ hash.9 jenkins_hash.9 \ hash.9 jenkins_hash32.9 MLINKS+=hashinit.9 hashdestroy.9 \ hashinit.9 hashinit_flags.9 \ hashinit.9 phashinit.9 MLINKS+=hhook.9 hhook_head_register.9 \ hhook.9 hhook_head_deregister.9 \ hhook.9 hhook_head_deregister_lookup.9 \ hhook.9 hhook_run_hooks.9 \ hhook.9 HHOOKS_RUN_IF.9 \ hhook.9 HHOOKS_RUN_LOOKUP_IF.9 MLINKS+=ieee80211.9 ieee80211_ifattach.9 \ ieee80211.9 ieee80211_ifdetach.9 MLINKS+=ieee80211_amrr.9 ieee80211_amrr_choose.9 \ ieee80211_amrr.9 ieee80211_amrr_cleanup.9 \ ieee80211_amrr.9 ieee80211_amrr_init.9 \ ieee80211_amrr.9 ieee80211_amrr_node_init.9 \ ieee80211_amrr.9 ieee80211_amrr_setinterval.9 \ ieee80211_amrr.9 ieee80211_amrr_tx_complete.9 \ ieee80211_amrr.9 ieee80211_amrr_tx_update.9 MLINKS+=ieee80211_beacon.9 ieee80211_beacon_alloc.9 \ ieee80211_beacon.9 ieee80211_beacon_notify.9 \ ieee80211_beacon.9 ieee80211_beacon_update.9 MLINKS+=ieee80211_bmiss.9 ieee80211_beacon_miss.9 MLINKS+=ieee80211_crypto.9 ieee80211_crypto_available.9 \ ieee80211_crypto.9 ieee80211_crypto_decap.9 \ ieee80211_crypto.9 ieee80211_crypto_delglobalkeys.9 \ ieee80211_crypto.9 ieee80211_crypto_delkey.9 \ ieee80211_crypto.9 ieee80211_crypto_demic.9 \ ieee80211_crypto.9 ieee80211_crypto_encap.9 \ ieee80211_crypto.9 ieee80211_crypto_enmic.9 \ ieee80211_crypto.9 ieee80211_crypto_newkey.9 \ ieee80211_crypto.9 ieee80211_crypto_register.9 \ ieee80211_crypto.9 ieee80211_crypto_reload_keys.9 \ ieee80211_crypto.9 ieee80211_crypto_setkey.9 \ ieee80211_crypto.9 ieee80211_crypto_unregister.9 \ ieee80211_crypto.9 ieee80211_key_update_begin.9 \ ieee80211_crypto.9 ieee80211_key_update_end.9 \ ieee80211_crypto.9 ieee80211_notify_michael_failure.9 \ ieee80211_crypto.9 ieee80211_notify_replay_failure.9 MLINKS+=ieee80211_input.9 ieee80211_input_all.9 MLINKS+=ieee80211_node.9 ieee80211_dump_node.9 \ ieee80211_node.9 ieee80211_dump_nodes.9 \ ieee80211_node.9 ieee80211_find_rxnode.9 \ ieee80211_node.9 ieee80211_find_rxnode_withkey.9 \ ieee80211_node.9 ieee80211_free_node.9 \ ieee80211_node.9 ieee80211_iterate_nodes.9 \ ieee80211_node.9 ieee80211_ref_node.9 \ ieee80211_node.9 ieee80211_unref_node.9 MLINKS+=ieee80211_output.9 ieee80211_process_callback.9 \ ieee80211_output.9 M_SEQNO_GET.9 \ ieee80211_output.9 M_WME_GETAC.9 MLINKS+=ieee80211_proto.9 ieee80211_new_state.9 \ ieee80211_proto.9 ieee80211_resume_all.9 \ ieee80211_proto.9 ieee80211_start_all.9 \ ieee80211_proto.9 ieee80211_stop_all.9 \ ieee80211_proto.9 ieee80211_suspend_all.9 \ ieee80211_proto.9 ieee80211_waitfor_parent.9 MLINKS+=ieee80211_radiotap.9 ieee80211_radiotap_active.9 \ ieee80211_radiotap.9 ieee80211_radiotap_active_vap.9 \ ieee80211_radiotap.9 ieee80211_radiotap_attach.9 \ ieee80211_radiotap.9 ieee80211_radiotap_tx.9 \ ieee80211_radiotap.9 radiotap.9 MLINKS+=ieee80211_regdomain.9 ieee80211_alloc_countryie.9 \ ieee80211_regdomain.9 ieee80211_init_channels.9 \ ieee80211_regdomain.9 ieee80211_sort_channels.9 MLINKS+=ieee80211_scan.9 ieee80211_add_scan.9 \ ieee80211_scan.9 ieee80211_bg_scan.9 \ ieee80211_scan.9 ieee80211_cancel_scan.9 \ ieee80211_scan.9 ieee80211_cancel_scan_any.9 \ ieee80211_scan.9 ieee80211_check_scan.9 \ ieee80211_scan.9 ieee80211_check_scan_current.9 \ ieee80211_scan.9 ieee80211_flush.9 \ ieee80211_scan.9 ieee80211_probe_curchan.9 \ ieee80211_scan.9 ieee80211_scan_assoc_fail.9 \ ieee80211_scan.9 ieee80211_scan_done.9 \ ieee80211_scan.9 ieee80211_scan_dump_channels.9 \ ieee80211_scan.9 ieee80211_scan_flush.9 \ ieee80211_scan.9 ieee80211_scan_iterate.9 \ ieee80211_scan.9 ieee80211_scan_next.9 \ ieee80211_scan.9 ieee80211_scan_timeout.9 \ ieee80211_scan.9 ieee80211_scanner_get.9 \ ieee80211_scan.9 ieee80211_scanner_register.9 \ ieee80211_scan.9 ieee80211_scanner_unregister.9 \ ieee80211_scan.9 ieee80211_scanner_unregister_all.9 \ ieee80211_scan.9 ieee80211_start_scan.9 MLINKS+=ieee80211_vap.9 ieee80211_vap_attach.9 \ ieee80211_vap.9 ieee80211_vap_detach.9 \ ieee80211_vap.9 ieee80211_vap_setup.9 MLINKS+=ifnet.9 if_addmulti.9 \ ifnet.9 if_alloc.9 \ ifnet.9 if_allmulti.9 \ ifnet.9 if_attach.9 \ ifnet.9 if_data.9 \ ifnet.9 IF_DEQUEUE.9 \ ifnet.9 if_delmulti.9 \ ifnet.9 if_detach.9 \ ifnet.9 if_down.9 \ ifnet.9 if_findmulti.9 \ ifnet.9 if_free.9 \ ifnet.9 if_free_type.9 \ ifnet.9 if_up.9 \ ifnet.9 ifa_free.9 \ ifnet.9 ifa_ifwithaddr.9 \ ifnet.9 ifa_ifwithdstaddr.9 \ ifnet.9 ifa_ifwithnet.9 \ ifnet.9 ifa_ref.9 \ ifnet.9 ifaddr.9 \ ifnet.9 ifaddr_byindex.9 \ ifnet.9 ifaof_ifpforaddr.9 \ ifnet.9 ifioctl.9 \ ifnet.9 ifpromisc.9 \ ifnet.9 ifqueue.9 \ ifnet.9 ifunit.9 \ ifnet.9 ifunit_ref.9 MLINKS+=insmntque.9 insmntque1.9 MLINKS+=ithread.9 ithread_add_handler.9 \ ithread.9 ithread_create.9 \ ithread.9 ithread_destroy.9 \ ithread.9 ithread_priority.9 \ ithread.9 ithread_remove_handler.9 \ ithread.9 ithread_schedule.9 MLINKS+=kernacc.9 useracc.9 MLINKS+=kernel_mount.9 free_mntarg.9 \ kernel_mount.9 kernel_vmount.9 \ kernel_mount.9 mount_arg.9 \ kernel_mount.9 mount_argb.9 \ kernel_mount.9 mount_argf.9 \ kernel_mount.9 mount_argsu.9 MLINKS+=khelp.9 khelp_add_hhook.9 \ khelp.9 KHELP_DECLARE_MOD.9 \ khelp.9 KHELP_DECLARE_MOD_UMA.9 \ khelp.9 khelp_destroy_osd.9 \ khelp.9 khelp_get_id.9 \ khelp.9 khelp_get_osd.9 \ khelp.9 khelp_init_osd.9 \ khelp.9 khelp_remove_hhook.9 MLINKS+=kobj.9 DEFINE_CLASS.9 \ kobj.9 kobj_class_compile.9 \ kobj.9 kobj_class_compile_static.9 \ kobj.9 kobj_class_free.9 \ kobj.9 kobj_create.9 \ kobj.9 kobj_delete.9 \ kobj.9 kobj_init.9 \ kobj.9 kobj_init_static.9 MLINKS+=kproc.9 kproc_create.9 \ kproc.9 kproc_exit.9 \ kproc.9 kproc_kthread_add.9 \ kproc.9 kproc_resume.9 \ kproc.9 kproc_shutdown.9 \ kproc.9 kproc_start.9 \ kproc.9 kproc_suspend.9 \ kproc.9 kproc_suspend_check.9 \ kproc.9 kthread_create.9 MLINKS+=kqueue.9 knlist_add.9 \ kqueue.9 knlist_clear.9 \ kqueue.9 knlist_delete.9 \ kqueue.9 knlist_destroy.9 \ kqueue.9 knlist_empty.9 \ kqueue.9 knlist_init.9 \ kqueue.9 knlist_init_mtx.9 \ kqueue.9 knlist_init_rw_reader.9 \ kqueue.9 knlist_remove.9 \ kqueue.9 knlist_remove_inevent.9 \ kqueue.9 knote_fdclose.9 \ kqueue.9 KNOTE_LOCKED.9 \ kqueue.9 KNOTE_UNLOCKED.9 \ kqueue.9 kqfd_register.9 \ kqueue.9 kqueue_add_filteropts.9 \ kqueue.9 kqueue_del_filteropts.9 MLINKS+=kthread.9 kthread_add.9 \ kthread.9 kthread_exit.9 \ kthread.9 kthread_resume.9 \ kthread.9 kthread_shutdown.9 \ kthread.9 kthread_start.9 \ kthread.9 kthread_suspend.9 \ kthread.9 kthread_suspend_check.9 MLINKS+=ktr.9 CTR0.9 \ ktr.9 CTR1.9 \ ktr.9 CTR2.9 \ ktr.9 CTR3.9 \ ktr.9 CTR4.9 \ ktr.9 CTR5.9 \ ktr.9 CTR6.9 MLINKS+=lock.9 lockdestroy.9 \ lock.9 lockinit.9 \ lock.9 lockmgr.9 \ lock.9 lockmgr_args.9 \ lock.9 lockmgr_args_rw.9 \ lock.9 lockmgr_assert.9 \ lock.9 lockmgr_disown.9 \ lock.9 lockmgr_printinfo.9 \ lock.9 lockmgr_recursed.9 \ lock.9 lockmgr_rw.9 \ - lock.9 lockmgr_waiters.9 \ lock.9 lockstatus.9 MLINKS+=LOCK_PROFILING.9 MUTEX_PROFILING.9 MLINKS+=make_dev.9 destroy_dev.9 \ make_dev.9 destroy_dev_drain.9 \ make_dev.9 destroy_dev_sched.9 \ make_dev.9 destroy_dev_sched_cb.9 \ make_dev.9 dev_depends.9 \ make_dev.9 make_dev_alias.9 \ make_dev.9 make_dev_alias_p.9 \ make_dev.9 make_dev_cred.9 \ make_dev.9 make_dev_credf.9 \ make_dev.9 make_dev_p.9 \ make_dev.9 make_dev_s.9 MLINKS+=malloc.9 free.9 \ malloc.9 MALLOC_DECLARE.9 \ malloc.9 MALLOC_DEFINE.9 \ malloc.9 realloc.9 \ malloc.9 reallocf.9 MLINKS+=mbchain.9 mb_detach.9 \ mbchain.9 mb_done.9 \ mbchain.9 mb_fixhdr.9 \ mbchain.9 mb_init.9 \ mbchain.9 mb_initm.9 \ mbchain.9 mb_put_int64be.9 \ mbchain.9 mb_put_int64le.9 \ mbchain.9 mb_put_mbuf.9 \ mbchain.9 mb_put_mem.9 \ mbchain.9 mb_put_uint16be.9 \ mbchain.9 mb_put_uint16le.9 \ mbchain.9 mb_put_uint32be.9 \ mbchain.9 mb_put_uint32le.9 \ mbchain.9 mb_put_uint8.9 \ mbchain.9 mb_put_uio.9 \ mbchain.9 mb_reserve.9 MLINKS+=mbpool.9 mbp_alloc.9 \ mbpool.9 mbp_card_free.9 \ mbpool.9 mbp_count.9 \ mbpool.9 mbp_create.9 \ mbpool.9 mbp_destroy.9 \ mbpool.9 mbp_ext_free.9 \ mbpool.9 mbp_free.9 \ mbpool.9 mbp_get.9 \ mbpool.9 mbp_get_keep.9 \ mbpool.9 mbp_sync.9 MLINKS+=\ mbuf.9 m_adj.9 \ mbuf.9 m_align.9 \ mbuf.9 M_ALIGN.9 \ mbuf.9 m_append.9 \ mbuf.9 m_apply.9 \ mbuf.9 m_cat.9 \ mbuf.9 m_catpkt.9 \ mbuf.9 MCHTYPE.9 \ mbuf.9 MCLGET.9 \ mbuf.9 m_collapse.9 \ mbuf.9 m_copyback.9 \ mbuf.9 m_copydata.9 \ mbuf.9 m_copym.9 \ mbuf.9 m_copypacket.9 \ mbuf.9 m_copyup.9 \ mbuf.9 m_defrag.9 \ mbuf.9 m_devget.9 \ mbuf.9 m_dup.9 \ mbuf.9 m_dup_pkthdr.9 \ mbuf.9 MEXTADD.9 \ mbuf.9 m_fixhdr.9 \ mbuf.9 m_free.9 \ mbuf.9 m_freem.9 \ mbuf.9 MGET.9 \ mbuf.9 m_get.9 \ mbuf.9 m_get2.9 \ mbuf.9 m_getjcl.9 \ mbuf.9 m_getcl.9 \ mbuf.9 m_getclr.9 \ mbuf.9 MGETHDR.9 \ mbuf.9 m_gethdr.9 \ mbuf.9 m_getm.9 \ mbuf.9 m_getptr.9 \ mbuf.9 MH_ALIGN.9 \ mbuf.9 M_LEADINGSPACE.9 \ mbuf.9 m_length.9 \ mbuf.9 M_MOVE_PKTHDR.9 \ mbuf.9 m_move_pkthdr.9 \ mbuf.9 M_PREPEND.9 \ mbuf.9 m_prepend.9 \ mbuf.9 m_pulldown.9 \ mbuf.9 m_pullup.9 \ mbuf.9 m_split.9 \ mbuf.9 mtod.9 \ mbuf.9 M_TRAILINGSPACE.9 \ mbuf.9 m_unshare.9 \ mbuf.9 M_WRITABLE.9 MLINKS+=\ mbuf_tags.9 m_tag_alloc.9 \ mbuf_tags.9 m_tag_copy.9 \ mbuf_tags.9 m_tag_copy_chain.9 \ mbuf_tags.9 m_tag_delete.9 \ mbuf_tags.9 m_tag_delete_chain.9 \ mbuf_tags.9 m_tag_delete_nonpersistent.9 \ mbuf_tags.9 m_tag_find.9 \ mbuf_tags.9 m_tag_first.9 \ mbuf_tags.9 m_tag_free.9 \ mbuf_tags.9 m_tag_get.9 \ mbuf_tags.9 m_tag_init.9 \ mbuf_tags.9 m_tag_locate.9 \ mbuf_tags.9 m_tag_next.9 \ mbuf_tags.9 m_tag_prepend.9 \ mbuf_tags.9 m_tag_unlink.9 MLINKS+=MD5.9 MD5Init.9 \ MD5.9 MD5Transform.9 MLINKS+=mdchain.9 md_append_record.9 \ mdchain.9 md_done.9 \ mdchain.9 md_get_int64.9 \ mdchain.9 md_get_int64be.9 \ mdchain.9 md_get_int64le.9 \ mdchain.9 md_get_mbuf.9 \ mdchain.9 md_get_mem.9 \ mdchain.9 md_get_uint16.9 \ mdchain.9 md_get_uint16be.9 \ mdchain.9 md_get_uint16le.9 \ mdchain.9 md_get_uint32.9 \ mdchain.9 md_get_uint32be.9 \ mdchain.9 md_get_uint32le.9 \ mdchain.9 md_get_uint8.9 \ mdchain.9 md_get_uio.9 \ mdchain.9 md_initm.9 \ mdchain.9 md_next_record.9 MLINKS+=microtime.9 bintime.9 \ microtime.9 getbintime.9 \ microtime.9 getmicrotime.9 \ microtime.9 getnanotime.9 \ microtime.9 nanotime.9 MLINKS+=microuptime.9 binuptime.9 \ microuptime.9 getbinuptime.9 \ microuptime.9 getmicrouptime.9 \ microuptime.9 getnanouptime.9 \ microuptime.9 getsbinuptime.9 \ microuptime.9 nanouptime.9 \ microuptime.9 sbinuptime.9 MLINKS+=mi_switch.9 cpu_switch.9 \ mi_switch.9 cpu_throw.9 MLINKS+=mod_cc.9 CCV.9 \ mod_cc.9 DECLARE_CC_MODULE.9 MLINKS+=mtx_pool.9 mtx_pool_alloc.9 \ mtx_pool.9 mtx_pool_create.9 \ mtx_pool.9 mtx_pool_destroy.9 \ mtx_pool.9 mtx_pool_find.9 \ mtx_pool.9 mtx_pool_lock.9 \ mtx_pool.9 mtx_pool_lock_spin.9 \ mtx_pool.9 mtx_pool_unlock.9 \ mtx_pool.9 mtx_pool_unlock_spin.9 MLINKS+=mutex.9 mtx_assert.9 \ mutex.9 mtx_destroy.9 \ mutex.9 mtx_init.9 \ mutex.9 mtx_initialized.9 \ mutex.9 mtx_lock.9 \ mutex.9 mtx_lock_flags.9 \ mutex.9 mtx_lock_spin.9 \ mutex.9 mtx_lock_spin_flags.9 \ mutex.9 mtx_owned.9 \ mutex.9 mtx_recursed.9 \ mutex.9 mtx_sleep.9 \ mutex.9 MTX_SYSINIT.9 \ mutex.9 mtx_trylock.9 \ mutex.9 mtx_trylock_flags.9 \ mutex.9 mtx_trylock_spin.9 \ mutex.9 mtx_trylock_spin_flags.9 \ mutex.9 mtx_unlock.9 \ mutex.9 mtx_unlock_flags.9 \ mutex.9 mtx_unlock_spin.9 \ mutex.9 mtx_unlock_spin_flags.9 MLINKS+=namei.9 NDFREE.9 \ namei.9 NDINIT.9 MLINKS+=netisr.9 netisr_clearqdrops.9 \ netisr.9 netisr_default_flow2cpu.9 \ netisr.9 netisr_dispatch.9 \ netisr.9 netisr_dispatch_src.9 \ netisr.9 netisr_get_cpucount.9 \ netisr.9 netisr_get_cpuid.9 \ netisr.9 netisr_getqdrops.9 \ netisr.9 netisr_getqlimit.9 \ netisr.9 netisr_queue.9 \ netisr.9 netisr_queue_src.9 \ netisr.9 netisr_register.9 \ netisr.9 netisr_setqlimit.9 \ netisr.9 netisr_unregister.9 MLINKS+=nv.9 libnv.9 \ nv.9 nvlist.9 \ nv.9 nvlist_add_binary.9 \ nv.9 nvlist_add_bool.9 \ nv.9 nvlist_add_descriptor.9 \ nv.9 nvlist_add_null.9 \ nv.9 nvlist_add_number.9 \ nv.9 nvlist_add_nvlist.9 \ nv.9 nvlist_add_string.9 \ nv.9 nvlist_add_stringf.9 \ nv.9 nvlist_add_stringv.9 \ nv.9 nvlist_clone.9 \ nv.9 nvlist_create.9 \ nv.9 nvlist_destroy.9 \ nv.9 nvlist_dump.9 \ nv.9 nvlist_empty.9 \ nv.9 nvlist_error.9 \ nv.9 nvlist_exists.9 \ nv.9 nvlist_exists_binary.9 \ nv.9 nvlist_exists_bool.9 \ nv.9 nvlist_exists_descriptor.9 \ nv.9 nvlist_exists_null.9 \ nv.9 nvlist_exists_number.9 \ nv.9 nvlist_exists_nvlist.9 \ nv.9 nvlist_exists_string.9 \ nv.9 nvlist_exists_type.9 \ nv.9 nvlist_fdump.9 \ nv.9 nvlist_flags.9 \ nv.9 nvlist_free.9 \ nv.9 nvlist_free_binary.9 \ nv.9 nvlist_free_bool.9 \ nv.9 nvlist_free_descriptor.9 \ nv.9 nvlist_free_null.9 \ nv.9 nvlist_free_number.9 \ nv.9 nvlist_free_nvlist.9 \ nv.9 nvlist_free_string.9 \ nv.9 nvlist_free_type.9 \ nv.9 nvlist_get_binary.9 \ nv.9 nvlist_get_bool.9 \ nv.9 nvlist_get_descriptor.9 \ nv.9 nvlist_get_number.9 \ nv.9 nvlist_get_nvlist.9 \ nv.9 nvlist_get_parent.9 \ nv.9 nvlist_get_string.9 \ nv.9 nvlist_move_binary.9 \ nv.9 nvlist_move_descriptor.9 \ nv.9 nvlist_move_nvlist.9 \ nv.9 nvlist_move_string.9 \ nv.9 nvlist_next.9 \ nv.9 nvlist_pack.9 \ nv.9 nvlist_recv.9 \ nv.9 nvlist_send.9 \ nv.9 nvlist_set_error.9 \ nv.9 nvlist_size.9 \ nv.9 nvlist_take_binary.9 \ nv.9 nvlist_take_bool.9 \ nv.9 nvlist_take_descriptor.9 \ nv.9 nvlist_take_number.9 \ nv.9 nvlist_take_nvlist.9 \ nv.9 nvlist_take_string.9 \ nv.9 nvlist_unpack.9 \ nv.9 nvlist_xfer.9 MLINKS+=osd.9 osd_call.9 \ osd.9 osd_del.9 \ osd.9 osd_deregister.9 \ osd.9 osd_exit.9 \ osd.9 osd_get.9 \ osd.9 osd_register.9 \ osd.9 osd_set.9 MLINKS+=panic.9 vpanic.9 MLINKS+=pbuf.9 getpbuf.9 \ pbuf.9 relpbuf.9 \ pbuf.9 trypbuf.9 MLINKS+=PCBGROUP.9 in_pcbgroup_byhash.9 \ PCBGROUP.9 in_pcbgroup_byinpcb.9 \ PCBGROUP.9 in_pcbgroup_destroy.9 \ PCBGROUP.9 in_pcbgroup_enabled.9 \ PCBGROUP.9 in_pcbgroup_init.9 \ PCBGROUP.9 in_pcbgroup_remove.9 \ PCBGROUP.9 in_pcbgroup_update.9 \ PCBGROUP.9 in_pcbgroup_update_mbuf.9 \ PCBGROUP.9 in6_pcbgroup_byhash.9 MLINKS+=pci.9 pci_alloc_msi.9 \ pci.9 pci_alloc_msix.9 \ pci.9 pci_disable_busmaster.9 \ pci.9 pci_disable_io.9 \ pci.9 pci_enable_busmaster.9 \ pci.9 pci_enable_io.9 \ pci.9 pci_find_bsf.9 \ pci.9 pci_find_cap.9 \ pci.9 pci_find_dbsf.9 \ pci.9 pci_find_device.9 \ pci.9 pci_find_extcap.9 \ pci.9 pci_find_htcap.9 \ pci.9 pci_find_pcie_root_port.9 \ pci.9 pci_get_id.9 \ pci.9 pci_get_max_read_req.9 \ pci.9 pci_get_powerstate.9 \ pci.9 pci_get_vpd_ident.9 \ pci.9 pci_get_vpd_readonly.9 \ pci.9 pci_iov_attach.9 \ pci.9 pci_iov_attach_name.9 \ pci.9 pci_iov_detach.9 \ pci.9 pci_msi_count.9 \ pci.9 pci_msix_count.9 \ pci.9 pci_msix_pba_bar.9 \ pci.9 pci_msix_table_bar.9 \ pci.9 pci_pending_msix.9 \ pci.9 pci_read_config.9 \ pci.9 pci_release_msi.9 \ pci.9 pci_remap_msix.9 \ pci.9 pci_restore_state.9 \ pci.9 pci_save_state.9 \ pci.9 pci_set_powerstate.9 \ pci.9 pci_set_max_read_req.9 \ pci.9 pci_write_config.9 \ pci.9 pcie_adjust_config.9 \ pci.9 pcie_read_config.9 \ pci.9 pcie_write_config.9 MLINKS+=pci_iov_schema.9 pci_iov_schema_alloc_node.9 \ pci_iov_schema.9 pci_iov_schema_add_bool.9 \ pci_iov_schema.9 pci_iov_schema_add_string.9 \ pci_iov_schema.9 pci_iov_schema_add_uint8.9 \ pci_iov_schema.9 pci_iov_schema_add_uint16.9 \ pci_iov_schema.9 pci_iov_schema_add_uint32.9 \ pci_iov_schema.9 pci_iov_schema_add_uint64.9 \ pci_iov_schema.9 pci_iov_schema_add_unicast_mac.9 MLINKS+=pfil.9 pfil_add_hook.9 \ pfil.9 pfil_head_register.9 \ pfil.9 pfil_head_unregister.9 \ pfil.9 pfil_hook_get.9 \ pfil.9 pfil_remove_hook.9 \ pfil.9 pfil_rlock.9 \ pfil.9 pfil_run_hooks.9 \ pfil.9 pfil_runlock.9 \ pfil.9 pfil_wlock.9 \ pfil.9 pfil_wunlock.9 MLINKS+=pfind.9 zpfind.9 MLINKS+=PHOLD.9 PRELE.9 \ PHOLD.9 _PHOLD.9 \ PHOLD.9 _PRELE.9 \ PHOLD.9 PROC_ASSERT_HELD.9 \ PHOLD.9 PROC_ASSERT_NOT_HELD.9 MLINKS+=pmap_copy.9 pmap_copy_page.9 MLINKS+=pmap_extract.9 pmap_extract_and_hold.9 MLINKS+=pmap_init.9 pmap_init2.9 MLINKS+=pmap_is_modified.9 pmap_ts_referenced.9 MLINKS+=pmap_pinit.9 pmap_pinit0.9 \ pmap_pinit.9 pmap_pinit2.9 MLINKS+=pmap_qenter.9 pmap_qremove.9 MLINKS+=pmap_quick_enter_page.9 pmap_quick_remove_page.9 MLINKS+=pmap_remove.9 pmap_remove_all.9 \ pmap_remove.9 pmap_remove_pages.9 MLINKS+=pmap_resident_count.9 pmap_wired_count.9 MLINKS+=pmap_zero_page.9 pmap_zero_area.9 \ pmap_zero_page.9 pmap_zero_idle.9 MLINKS+=printf.9 log.9 \ printf.9 tprintf.9 \ printf.9 uprintf.9 MLINKS+=priv.9 priv_check.9 \ priv.9 priv_check_cred.9 MLINKS+=proc_rwmem.9 proc_readmem.9 \ proc_rwmem.9 proc_writemem.9 MLINKS+=psignal.9 gsignal.9 \ psignal.9 pgsignal.9 \ psignal.9 tdsignal.9 MLINKS+=random.9 arc4rand.9 \ random.9 arc4random.9 \ random.9 read_random.9 \ random.9 read_random_uio.9 \ random.9 srandom.9 MLINKS+=refcount.9 refcount_acquire.9 \ refcount.9 refcount_init.9 \ refcount.9 refcount_release.9 MLINKS+=resource_int_value.9 resource_long_value.9 \ resource_int_value.9 resource_string_value.9 MLINKS+=rman.9 rman_activate_resource.9 \ rman.9 rman_adjust_resource.9 \ rman.9 rman_deactivate_resource.9 \ rman.9 rman_fini.9 \ rman.9 rman_first_free_region.9 \ rman.9 rman_get_bushandle.9 \ rman.9 rman_get_bustag.9 \ rman.9 rman_get_device.9 \ rman.9 rman_get_end.9 \ rman.9 rman_get_flags.9 \ rman.9 rman_get_mapping.9 \ rman.9 rman_get_rid.9 \ rman.9 rman_get_size.9 \ rman.9 rman_get_start.9 \ rman.9 rman_get_virtual.9 \ rman.9 rman_init.9 \ rman.9 rman_init_from_resource.9 \ rman.9 rman_is_region_manager.9 \ rman.9 rman_last_free_region.9 \ rman.9 rman_make_alignment_flags.9 \ rman.9 rman_manage_region.9 \ rman.9 rman_release_resource.9 \ rman.9 rman_reserve_resource.9 \ rman.9 rman_reserve_resource_bound.9 \ rman.9 rman_set_bushandle.9 \ rman.9 rman_set_bustag.9 \ rman.9 rman_set_mapping.9 \ rman.9 rman_set_rid.9 \ rman.9 rman_set_virtual.9 MLINKS+=rmlock.9 rm_assert.9 \ rmlock.9 rm_destroy.9 \ rmlock.9 rm_init.9 \ rmlock.9 rm_init_flags.9 \ rmlock.9 rm_rlock.9 \ rmlock.9 rm_runlock.9 \ rmlock.9 rm_sleep.9 \ rmlock.9 RM_SYSINIT.9 \ rmlock.9 rm_try_rlock.9 \ rmlock.9 rm_wlock.9 \ rmlock.9 rm_wowned.9 \ rmlock.9 rm_wunlock.9 MLINKS+=rtalloc.9 rtalloc1.9 \ rtalloc.9 rtalloc_ign.9 \ rtalloc.9 RT_ADDREF.9 \ rtalloc.9 RT_LOCK.9 \ rtalloc.9 RT_REMREF.9 \ rtalloc.9 RT_RTFREE.9 \ rtalloc.9 RT_UNLOCK.9 \ rtalloc.9 RTFREE_LOCKED.9 \ rtalloc.9 RTFREE.9 \ rtalloc.9 rtfree.9 \ rtalloc.9 rtalloc1_fib.9 \ rtalloc.9 rtalloc_ign_fib.9 \ rtalloc.9 rtalloc_fib.9 MLINKS+=runqueue.9 choosethread.9 \ runqueue.9 procrunnable.9 \ runqueue.9 remrunqueue.9 \ runqueue.9 setrunqueue.9 MLINKS+=rwlock.9 rw_assert.9 \ rwlock.9 rw_destroy.9 \ rwlock.9 rw_downgrade.9 \ rwlock.9 rw_init.9 \ rwlock.9 rw_init_flags.9 \ rwlock.9 rw_initialized.9 \ rwlock.9 rw_rlock.9 \ rwlock.9 rw_runlock.9 \ rwlock.9 rw_unlock.9 \ rwlock.9 rw_sleep.9 \ rwlock.9 RW_SYSINIT.9 \ rwlock.9 rw_try_rlock.9 \ rwlock.9 rw_try_upgrade.9 \ rwlock.9 rw_try_wlock.9 \ rwlock.9 rw_wlock.9 \ rwlock.9 rw_wowned.9 \ rwlock.9 rw_wunlock.9 MLINKS+=sbuf.9 sbuf_bcat.9 \ sbuf.9 sbuf_bcopyin.9 \ sbuf.9 sbuf_bcpy.9 \ sbuf.9 sbuf_cat.9 \ sbuf.9 sbuf_clear.9 \ sbuf.9 sbuf_copyin.9 \ sbuf.9 sbuf_cpy.9 \ sbuf.9 sbuf_data.9 \ sbuf.9 sbuf_delete.9 \ sbuf.9 sbuf_done.9 \ sbuf.9 sbuf_error.9 \ sbuf.9 sbuf_finish.9 \ sbuf.9 sbuf_len.9 \ sbuf.9 sbuf_new.9 \ sbuf.9 sbuf_new_auto.9 \ sbuf.9 sbuf_new_for_sysctl.9 \ sbuf.9 sbuf_printf.9 \ sbuf.9 sbuf_putc.9 \ sbuf.9 sbuf_set_drain.9 \ sbuf.9 sbuf_setpos.9 \ sbuf.9 sbuf_start_section.9 \ sbuf.9 sbuf_end_section.9 \ sbuf.9 sbuf_trim.9 \ sbuf.9 sbuf_vprintf.9 MLINKS+=scheduler.9 curpriority_cmp.9 \ scheduler.9 maybe_resched.9 \ scheduler.9 propagate_priority.9 \ scheduler.9 resetpriority.9 \ scheduler.9 roundrobin.9 \ scheduler.9 roundrobin_interval.9 \ scheduler.9 schedclock.9 \ scheduler.9 schedcpu.9 \ scheduler.9 sched_setup.9 \ scheduler.9 setrunnable.9 \ scheduler.9 updatepri.9 MLINKS+=SDT.9 SDT_PROVIDER_DECLARE.9 \ SDT.9 SDT_PROVIDER_DEFINE.9 \ SDT.9 SDT_PROBE_DECLARE.9 \ SDT.9 SDT_PROBE_DEFINE.9 \ SDT.9 SDT_PROBE.9 MLINKS+=securelevel_gt.9 securelevel_ge.9 MLINKS+=selrecord.9 seldrain.9 \ selrecord.9 selwakeup.9 MLINKS+=sema.9 sema_destroy.9 \ sema.9 sema_init.9 \ sema.9 sema_post.9 \ sema.9 sema_timedwait.9 \ sema.9 sema_trywait.9 \ sema.9 sema_value.9 \ sema.9 sema_wait.9 MLINKS+=sf_buf.9 sf_buf_alloc.9 \ sf_buf.9 sf_buf_free.9 \ sf_buf.9 sf_buf_kva.9 \ sf_buf.9 sf_buf_page.9 MLINKS+=sglist.9 sglist_alloc.9 \ sglist.9 sglist_append.9 \ sglist.9 sglist_append_bio.9 \ sglist.9 sglist_append_mbuf.9 \ sglist.9 sglist_append_phys.9 \ sglist.9 sglist_append_uio.9 \ sglist.9 sglist_append_user.9 \ sglist.9 sglist_append_vmpages.9 \ sglist.9 sglist_build.9 \ sglist.9 sglist_clone.9 \ sglist.9 sglist_consume_uio.9 \ sglist.9 sglist_count.9 \ sglist.9 sglist_count_vmpages.9 \ sglist.9 sglist_free.9 \ sglist.9 sglist_hold.9 \ sglist.9 sglist_init.9 \ sglist.9 sglist_join.9 \ sglist.9 sglist_length.9 \ sglist.9 sglist_reset.9 \ sglist.9 sglist_slice.9 \ sglist.9 sglist_split.9 MLINKS+=shm_map.9 shm_unmap.9 MLINKS+=signal.9 cursig.9 \ signal.9 execsigs.9 \ signal.9 issignal.9 \ signal.9 killproc.9 \ signal.9 pgsigio.9 \ signal.9 postsig.9 \ signal.9 SETSETNEQ.9 \ signal.9 SETSETOR.9 \ signal.9 SIGADDSET.9 \ signal.9 SIG_CONTSIGMASK.9 \ signal.9 SIGDELSET.9 \ signal.9 SIGEMPTYSET.9 \ signal.9 sigexit.9 \ signal.9 SIGFILLSET.9 \ signal.9 siginit.9 \ signal.9 SIGISEMPTY.9 \ signal.9 SIGISMEMBER.9 \ signal.9 SIGNOTEMPTY.9 \ signal.9 signotify.9 \ signal.9 SIGPENDING.9 \ signal.9 SIGSETAND.9 \ signal.9 SIGSETCANTMASK.9 \ signal.9 SIGSETEQ.9 \ signal.9 SIGSETNAND.9 \ signal.9 SIG_STOPSIGMASK.9 \ signal.9 trapsignal.9 MLINKS+=sleep.9 msleep.9 \ sleep.9 msleep_sbt.9 \ sleep.9 msleep_spin.9 \ sleep.9 msleep_spin_sbt.9 \ sleep.9 pause.9 \ sleep.9 pause_sbt.9 \ sleep.9 tsleep.9 \ sleep.9 tsleep_sbt.9 \ sleep.9 wakeup.9 \ sleep.9 wakeup_one.9 MLINKS+=sleepqueue.9 init_sleepqueues.9 \ sleepqueue.9 sleepq_abort.9 \ sleepqueue.9 sleepq_add.9 \ sleepqueue.9 sleepq_alloc.9 \ sleepqueue.9 sleepq_broadcast.9 \ sleepqueue.9 sleepq_free.9 \ sleepqueue.9 sleepq_lookup.9 \ sleepqueue.9 sleepq_lock.9 \ sleepqueue.9 sleepq_release.9 \ sleepqueue.9 sleepq_remove.9 \ sleepqueue.9 sleepq_set_timeout.9 \ sleepqueue.9 sleepq_set_timeout_sbt.9 \ sleepqueue.9 sleepq_signal.9 \ sleepqueue.9 sleepq_sleepcnt.9 \ sleepqueue.9 sleepq_timedwait.9 \ sleepqueue.9 sleepq_timedwait_sig.9 \ sleepqueue.9 sleepq_type.9 \ sleepqueue.9 sleepq_wait.9 \ sleepqueue.9 sleepq_wait_sig.9 MLINKS+=socket.9 soabort.9 \ socket.9 soaccept.9 \ socket.9 sobind.9 \ socket.9 socheckuid.9 \ socket.9 soclose.9 \ socket.9 soconnect.9 \ socket.9 socreate.9 \ socket.9 sodisconnect.9 \ socket.9 sodupsockaddr.9 \ socket.9 sofree.9 \ socket.9 sogetopt.9 \ socket.9 sohasoutofband.9 \ socket.9 solisten.9 \ socket.9 solisten_proto.9 \ socket.9 solisten_proto_check.9 \ socket.9 sonewconn.9 \ socket.9 sooptcopyin.9 \ socket.9 sooptcopyout.9 \ socket.9 sopoll.9 \ socket.9 sopoll_generic.9 \ socket.9 soreceive.9 \ socket.9 soreceive_dgram.9 \ socket.9 soreceive_generic.9 \ socket.9 soreceive_stream.9 \ socket.9 soreserve.9 \ socket.9 sorflush.9 \ socket.9 sosend.9 \ socket.9 sosend_dgram.9 \ socket.9 sosend_generic.9 \ socket.9 sosetopt.9 \ socket.9 soshutdown.9 \ socket.9 sotoxsocket.9 \ socket.9 soupcall_clear.9 \ socket.9 soupcall_set.9 \ socket.9 sowakeup.9 MLINKS+=stack.9 stack_copy.9 \ stack.9 stack_create.9 \ stack.9 stack_destroy.9 \ stack.9 stack_print.9 \ stack.9 stack_print_ddb.9 \ stack.9 stack_print_short.9 \ stack.9 stack_print_short_ddb.9 \ stack.9 stack_put.9 \ stack.9 stack_save.9 \ stack.9 stack_sbuf_print.9 \ stack.9 stack_sbuf_print_ddb.9 \ stack.9 stack_zero.9 MLINKS+=store.9 subyte.9 \ store.9 suswintr.9 \ store.9 suword.9 \ store.9 suword16.9 \ store.9 suword32.9 \ store.9 suword64.9 MLINKS+=swi.9 swi_add.9 \ swi.9 swi_remove.9 \ swi.9 swi_sched.9 MLINKS+=sx.9 sx_assert.9 \ sx.9 sx_destroy.9 \ sx.9 sx_downgrade.9 \ sx.9 sx_init.9 \ sx.9 sx_init_flags.9 \ sx.9 sx_sleep.9 \ sx.9 sx_slock.9 \ sx.9 sx_slock_sig.9 \ sx.9 sx_sunlock.9 \ sx.9 SX_SYSINIT.9 \ sx.9 sx_try_slock.9 \ sx.9 sx_try_upgrade.9 \ sx.9 sx_try_xlock.9 \ sx.9 sx_unlock.9 \ sx.9 sx_xholder.9 \ sx.9 sx_xlock.9 \ sx.9 sx_xlock_sig.9 \ sx.9 sx_xlocked.9 \ sx.9 sx_xunlock.9 MLINKS+=sysctl.9 SYSCTL_DECL.9 \ sysctl.9 SYSCTL_ADD_INT.9 \ sysctl.9 SYSCTL_ADD_LONG.9 \ sysctl.9 SYSCTL_ADD_NODE.9 \ sysctl.9 SYSCTL_ADD_OPAQUE.9 \ sysctl.9 SYSCTL_ADD_PROC.9 \ sysctl.9 SYSCTL_ADD_QUAD.9 \ sysctl.9 SYSCTL_ADD_ROOT_NODE.9 \ sysctl.9 SYSCTL_ADD_S8.9 \ sysctl.9 SYSCTL_ADD_S16.9 \ sysctl.9 SYSCTL_ADD_S32.9 \ sysctl.9 SYSCTL_ADD_S64.9 \ sysctl.9 SYSCTL_ADD_STRING.9 \ sysctl.9 SYSCTL_ADD_STRUCT.9 \ sysctl.9 SYSCTL_ADD_U8.9 \ sysctl.9 SYSCTL_ADD_U16.9 \ sysctl.9 SYSCTL_ADD_U32.9 \ sysctl.9 SYSCTL_ADD_U64.9 \ sysctl.9 SYSCTL_ADD_UAUTO.9 \ sysctl.9 SYSCTL_ADD_UINT.9 \ sysctl.9 SYSCTL_ADD_ULONG.9 \ sysctl.9 SYSCTL_ADD_UQUAD.9 \ sysctl.9 SYSCTL_CHILDREN.9 \ sysctl.9 SYSCTL_STATIC_CHILDREN.9 \ sysctl.9 SYSCTL_NODE_CHILDREN.9 \ sysctl.9 SYSCTL_PARENT.9 \ sysctl.9 SYSCTL_INT.9 \ sysctl.9 SYSCTL_LONG.9 \ sysctl.9 SYSCTL_NODE.9 \ sysctl.9 SYSCTL_OPAQUE.9 \ sysctl.9 SYSCTL_PROC.9 \ sysctl.9 SYSCTL_QUAD.9 \ sysctl.9 SYSCTL_ROOT_NODE.9 \ sysctl.9 SYSCTL_S8.9 \ sysctl.9 SYSCTL_S16.9 \ sysctl.9 SYSCTL_S32.9 \ sysctl.9 SYSCTL_S64.9 \ sysctl.9 SYSCTL_STRING.9 \ sysctl.9 SYSCTL_STRUCT.9 \ sysctl.9 SYSCTL_U8.9 \ sysctl.9 SYSCTL_U16.9 \ sysctl.9 SYSCTL_U32.9 \ sysctl.9 SYSCTL_U64.9 \ sysctl.9 SYSCTL_UINT.9 \ sysctl.9 SYSCTL_ULONG.9 \ sysctl.9 SYSCTL_UQUAD.9 MLINKS+=sysctl_add_oid.9 sysctl_move_oid.9 \ sysctl_add_oid.9 sysctl_remove_oid.9 \ sysctl_add_oid.9 sysctl_remove_name.9 MLINKS+=sysctl_ctx_init.9 sysctl_ctx_entry_add.9 \ sysctl_ctx_init.9 sysctl_ctx_entry_del.9 \ sysctl_ctx_init.9 sysctl_ctx_entry_find.9 \ sysctl_ctx_init.9 sysctl_ctx_free.9 MLINKS+=SYSINIT.9 SYSUNINIT.9 MLINKS+=taskqueue.9 TASK_INIT.9 \ taskqueue.9 TASK_INITIALIZER.9 \ taskqueue.9 taskqueue_block.9 \ taskqueue.9 taskqueue_cancel.9 \ taskqueue.9 taskqueue_cancel_timeout.9 \ taskqueue.9 taskqueue_create.9 \ taskqueue.9 taskqueue_create_fast.9 \ taskqueue.9 TASKQUEUE_DECLARE.9 \ taskqueue.9 TASKQUEUE_DEFINE.9 \ taskqueue.9 TASKQUEUE_DEFINE_THREAD.9 \ taskqueue.9 taskqueue_drain.9 \ taskqueue.9 taskqueue_drain_all.9 \ taskqueue.9 taskqueue_drain_timeout.9 \ taskqueue.9 taskqueue_enqueue.9 \ taskqueue.9 taskqueue_enqueue_timeout.9 \ taskqueue.9 TASKQUEUE_FAST_DEFINE.9 \ taskqueue.9 TASKQUEUE_FAST_DEFINE_THREAD.9 \ taskqueue.9 taskqueue_free.9 \ taskqueue.9 taskqueue_member.9 \ taskqueue.9 taskqueue_run.9 \ taskqueue.9 taskqueue_set_callback.9 \ taskqueue.9 taskqueue_start_threads.9 \ taskqueue.9 taskqueue_start_threads_pinned.9 \ taskqueue.9 taskqueue_unblock.9 \ taskqueue.9 TIMEOUT_TASK_INIT.9 MLINKS+=tcp_functions.9 register_tcp_functions.9 \ tcp_functions.9 deregister_tcp_functions.9 MLINKS+=time.9 boottime.9 \ time.9 time_second.9 \ time.9 time_uptime.9 MLINKS+=timeout.9 callout.9 \ timeout.9 callout_active.9 \ timeout.9 callout_async_drain.9 \ timeout.9 callout_deactivate.9 \ timeout.9 callout_drain.9 \ timeout.9 callout_handle_init.9 \ timeout.9 callout_init.9 \ timeout.9 callout_init_mtx.9 \ timeout.9 callout_init_rm.9 \ timeout.9 callout_init_rw.9 \ timeout.9 callout_pending.9 \ timeout.9 callout_reset.9 \ timeout.9 callout_reset_curcpu.9 \ timeout.9 callout_reset_on.9 \ timeout.9 callout_reset_sbt.9 \ timeout.9 callout_reset_sbt_curcpu.9 \ timeout.9 callout_reset_sbt_on.9 \ timeout.9 callout_schedule.9 \ timeout.9 callout_schedule_curcpu.9 \ timeout.9 callout_schedule_on.9 \ timeout.9 callout_schedule_sbt.9 \ timeout.9 callout_schedule_sbt_curcpu.9 \ timeout.9 callout_schedule_sbt_on.9 \ timeout.9 callout_stop.9 \ timeout.9 callout_when.9 \ timeout.9 untimeout.9 MLINKS+=ucred.9 cred_update_thread.9 \ ucred.9 crcopy.9 \ ucred.9 crcopysafe.9 \ ucred.9 crdup.9 \ ucred.9 crfree.9 \ ucred.9 crget.9 \ ucred.9 crhold.9 \ ucred.9 crsetgroups.9 \ ucred.9 crshared.9 \ ucred.9 cru2x.9 MLINKS+=uidinfo.9 uifind.9 \ uidinfo.9 uifree.9 \ uidinfo.9 uihashinit.9 \ uidinfo.9 uihold.9 MLINKS+=uio.9 uiomove.9 \ uio.9 uiomove_nofault.9 .if ${MK_USB} != "no" MAN+= usbdi.9 MLINKS+=usbdi.9 usbd_do_request.9 \ usbdi.9 usbd_do_request_flags.9 \ usbdi.9 usbd_errstr.9 \ usbdi.9 usbd_lookup_id_by_info.9 \ usbdi.9 usbd_lookup_id_by_uaa.9 \ usbdi.9 usbd_transfer_clear_stall.9 \ usbdi.9 usbd_transfer_drain.9 \ usbdi.9 usbd_transfer_pending.9 \ usbdi.9 usbd_transfer_poll.9 \ usbdi.9 usbd_transfer_setup.9 \ usbdi.9 usbd_transfer_start.9 \ usbdi.9 usbd_transfer_stop.9 \ usbdi.9 usbd_transfer_submit.9 \ usbdi.9 usbd_transfer_unsetup.9 \ usbdi.9 usbd_xfer_clr_flag.9 \ usbdi.9 usbd_xfer_frame_data.9 \ usbdi.9 usbd_xfer_frame_len.9 \ usbdi.9 usbd_xfer_get_frame.9 \ usbdi.9 usbd_xfer_get_priv.9 \ usbdi.9 usbd_xfer_is_stalled.9 \ usbdi.9 usbd_xfer_max_framelen.9 \ usbdi.9 usbd_xfer_max_frames.9 \ usbdi.9 usbd_xfer_max_len.9 \ usbdi.9 usbd_xfer_set_flag.9 \ usbdi.9 usbd_xfer_set_frame_data.9 \ usbdi.9 usbd_xfer_set_frame_len.9 \ usbdi.9 usbd_xfer_set_frame_offset.9 \ usbdi.9 usbd_xfer_set_frames.9 \ usbdi.9 usbd_xfer_set_interval.9 \ usbdi.9 usbd_xfer_set_priv.9 \ usbdi.9 usbd_xfer_set_stall.9 \ usbdi.9 usbd_xfer_set_timeout.9 \ usbdi.9 usbd_xfer_softc.9 \ usbdi.9 usbd_xfer_state.9 \ usbdi.9 usbd_xfer_status.9 \ usbdi.9 usb_fifo_alloc_buffer.9 \ usbdi.9 usb_fifo_attach.9 \ usbdi.9 usb_fifo_detach.9 \ usbdi.9 usb_fifo_free_buffer.9 \ usbdi.9 usb_fifo_get_data.9 \ usbdi.9 usb_fifo_get_data_buffer.9 \ usbdi.9 usb_fifo_get_data_error.9 \ usbdi.9 usb_fifo_get_data_linear.9 \ usbdi.9 usb_fifo_put_bytes_max.9 \ usbdi.9 usb_fifo_put_data.9 \ usbdi.9 usb_fifo_put_data_buffer.9 \ usbdi.9 usb_fifo_put_data_error.9 \ usbdi.9 usb_fifo_put_data_linear.9 \ usbdi.9 usb_fifo_reset.9 \ usbdi.9 usb_fifo_softc.9 \ usbdi.9 usb_fifo_wakeup.9 .endif MLINKS+=vcount.9 count_dev.9 MLINKS+=vfsconf.9 vfs_modevent.9 \ vfsconf.9 vfs_register.9 \ vfsconf.9 vfs_unregister.9 MLINKS+=vfs_getopt.9 vfs_copyopt.9 \ vfs_getopt.9 vfs_filteropt.9 \ vfs_getopt.9 vfs_flagopt.9 \ vfs_getopt.9 vfs_getopts.9 \ vfs_getopt.9 vfs_scanopt.9 \ vfs_getopt.9 vfs_setopt.9 \ vfs_getopt.9 vfs_setopt_part.9 \ vfs_getopt.9 vfs_setopts.9 MLINKS+=vhold.9 vdrop.9 \ vhold.9 vdropl.9 \ vhold.9 vholdl.9 MLINKS+=vmem.9 vmem_add.9 \ vmem.9 vmem_alloc.9 \ vmem.9 vmem_create.9 \ vmem.9 vmem_destroy.9 \ vmem.9 vmem_free.9 \ vmem.9 vmem_xalloc.9 \ vmem.9 vmem_xfree.9 MLINKS+=vm_map_lock.9 vm_map_lock_downgrade.9 \ vm_map_lock.9 vm_map_lock_read.9 \ vm_map_lock.9 vm_map_lock_upgrade.9 \ vm_map_lock.9 vm_map_trylock.9 \ vm_map_lock.9 vm_map_trylock_read.9 \ vm_map_lock.9 vm_map_unlock.9 \ vm_map_lock.9 vm_map_unlock_read.9 MLINKS+=vm_map_lookup.9 vm_map_lookup_done.9 MLINKS+=vm_map_max.9 vm_map_min.9 \ vm_map_max.9 vm_map_pmap.9 MLINKS+=vm_map_stack.9 vm_map_growstack.9 MLINKS+=vm_map_wire.9 vm_map_unwire.9 MLINKS+=vm_page_bits.9 vm_page_clear_dirty.9 \ vm_page_bits.9 vm_page_dirty.9 \ vm_page_bits.9 vm_page_is_valid.9 \ vm_page_bits.9 vm_page_set_invalid.9 \ vm_page_bits.9 vm_page_set_validclean.9 \ vm_page_bits.9 vm_page_test_dirty.9 \ vm_page_bits.9 vm_page_undirty.9 \ vm_page_bits.9 vm_page_zero_invalid.9 MLINKS+=vm_page_busy.9 vm_page_busied.9 \ vm_page_busy.9 vm_page_busy_downgrade.9 \ vm_page_busy.9 vm_page_busy_sleep.9 \ vm_page_busy.9 vm_page_sbusied.9 \ vm_page_busy.9 vm_page_sbusy.9 \ vm_page_busy.9 vm_page_sleep_if_busy.9 \ vm_page_busy.9 vm_page_sunbusy.9 \ vm_page_busy.9 vm_page_trysbusy.9 \ vm_page_busy.9 vm_page_tryxbusy.9 \ vm_page_busy.9 vm_page_xbusied.9 \ vm_page_busy.9 vm_page_xbusy.9 \ vm_page_busy.9 vm_page_xunbusy.9 \ vm_page_busy.9 vm_page_assert_sbusied.9 \ vm_page_busy.9 vm_page_assert_unbusied.9 \ vm_page_busy.9 vm_page_assert_xbusied.9 MLINKS+=vm_page_aflag.9 vm_page_aflag_clear.9 \ vm_page_aflag.9 vm_page_aflag_set.9 \ vm_page_aflag.9 vm_page_reference.9 MLINKS+=vm_page_free.9 vm_page_free_toq.9 \ vm_page_free.9 vm_page_free_zero.9 \ vm_page_free.9 vm_page_try_to_free.9 MLINKS+=vm_page_hold.9 vm_page_unhold.9 MLINKS+=vm_page_insert.9 vm_page_remove.9 MLINKS+=vm_page_wire.9 vm_page_unwire.9 MLINKS+=VOP_ACCESS.9 VOP_ACCESSX.9 MLINKS+=VOP_ATTRIB.9 VOP_GETATTR.9 \ VOP_ATTRIB.9 VOP_SETATTR.9 MLINKS+=VOP_CREATE.9 VOP_MKDIR.9 \ VOP_CREATE.9 VOP_MKNOD.9 \ VOP_CREATE.9 VOP_SYMLINK.9 MLINKS+=VOP_GETPAGES.9 VOP_PUTPAGES.9 MLINKS+=VOP_INACTIVE.9 VOP_RECLAIM.9 MLINKS+=VOP_LOCK.9 vn_lock.9 \ VOP_LOCK.9 VOP_ISLOCKED.9 \ VOP_LOCK.9 VOP_UNLOCK.9 MLINKS+=VOP_OPENCLOSE.9 VOP_CLOSE.9 \ VOP_OPENCLOSE.9 VOP_OPEN.9 MLINKS+=VOP_RDWR.9 VOP_READ.9 \ VOP_RDWR.9 VOP_WRITE.9 MLINKS+=VOP_REMOVE.9 VOP_RMDIR.9 MLINKS+=vnet.9 vimage.9 MLINKS+=vref.9 VREF.9 MLINKS+=vrele.9 vput.9 \ vrele.9 vunref.9 MLINKS+=vslock.9 vsunlock.9 MLINKS+=zone.9 uma.9 \ zone.9 uma_find_refcnt.9 \ zone.9 uma_zalloc.9 \ zone.9 uma_zalloc_arg.9 \ zone.9 uma_zcreate.9 \ zone.9 uma_zdestroy.9 \ zone.9 uma_zfree.9 \ zone.9 uma_zfree_arg.9 \ zone.9 uma_zone_get_cur.9 \ zone.9 uma_zone_get_max.9 \ zone.9 uma_zone_set_max.9 \ zone.9 uma_zone_set_warning.9 \ zone.9 uma_zone_set_maxaction.9 .include Index: user/alc/PQ_LAUNDRY/share/man/man9/lock.9 =================================================================== --- user/alc/PQ_LAUNDRY/share/man/man9/lock.9 (revision 303774) +++ user/alc/PQ_LAUNDRY/share/man/man9/lock.9 (revision 303775) @@ -1,424 +1,417 @@ .\" .\" Copyright (C) 2002 Chad David . All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice(s), this list of conditions and the following disclaimer as .\" the first lines of this file unmodified other than the possible .\" addition of one or more copyright notices. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice(s), this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY .\" EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE .\" DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY .\" DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES .\" (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR .\" SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER .\" CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH .\" DAMAGE. .\" .\" $FreeBSD$ .\" .Dd November 2, 2014 .Dt LOCK 9 .Os .Sh NAME .Nm lockinit , .Nm lockdestroy , .Nm lockmgr , .Nm lockmgr_args , .Nm lockmgr_args_rw , .Nm lockmgr_disown , .Nm lockmgr_printinfo , .Nm lockmgr_recursed , .Nm lockmgr_rw , -.Nm lockmgr_waiters , .Nm lockstatus , .Nm lockmgr_assert .Nd "lockmgr family of functions" .Sh SYNOPSIS .In sys/types.h .In sys/lock.h .In sys/lockmgr.h .Ft void .Fn lockinit "struct lock *lkp" "int prio" "const char *wmesg" "int timo" "int flags" .Ft void .Fn lockdestroy "struct lock *lkp" .Ft int .Fn lockmgr "struct lock *lkp" "u_int flags" "struct mtx *ilk" .Ft int .Fn lockmgr_args "struct lock *lkp" "u_int flags" "struct mtx *ilk" "const char *wmesg" "int prio" "int timo" .Ft int .Fn lockmgr_args_rw "struct lock *lkp" "u_int flags" "struct rwlock *ilk" "const char *wmesg" "int prio" "int timo" .Ft void .Fn lockmgr_disown "struct lock *lkp" .Ft void .Fn lockmgr_printinfo "const struct lock *lkp" .Ft int .Fn lockmgr_recursed "const struct lock *lkp" .Ft int .Fn lockmgr_rw "struct lock *lkp" "u_int flags" "struct rwlock *ilk" .Ft int -.Fn lockmgr_waiters "const struct lock *lkp" -.Ft int .Fn lockstatus "const struct lock *lkp" .Pp .Cd "options INVARIANTS" .Cd "options INVARIANT_SUPPORT" .Ft void .Fn lockmgr_assert "const struct lock *lkp" "int what" .Sh DESCRIPTION The .Fn lockinit function is used to initialize a lock. It must be called before any operation can be performed on a lock. Its arguments are: .Bl -tag -width ".Fa wmesg" .It Fa lkp A pointer to the lock to initialize. .It Fa prio The priority passed to .Xr sleep 9 . .It Fa wmesg The lock message. This is used for both debugging output and .Xr sleep 9 . .It Fa timo The timeout value passed to .Xr sleep 9 . .It Fa flags The flags the lock is to be initialized with: .Bl -tag -width ".Dv LK_CANRECURSE" .It Dv LK_ADAPTIVE Enable adaptive spinning for this lock if the kernel is compiled with the ADAPTIVE_LOCKMGRS option. .It Dv LK_CANRECURSE Allow recursive exclusive locks. .It Dv LK_NOPROFILE Disable lock profiling for this lock. .It Dv LK_NOSHARE Allow exclusive locks only. .It Dv LK_NOWITNESS Instruct .Xr witness 4 to ignore this lock. .It Dv LK_NODUP .Xr witness 4 should log messages about duplicate locks being acquired. .It Dv LK_QUIET Disable .Xr ktr 4 logging for this lock. .It Dv LK_TIMELOCK Use .Fa timo during a sleep; otherwise, 0 is used. .El .El .Pp The .Fn lockdestroy function is used to destroy a lock, and while it is called in a number of places in the kernel, it currently does nothing. .Pp The .Fn lockmgr and .Fn lockmgr_rw functions handle general locking functionality within the kernel, including support for shared and exclusive locks, and recursion. .Fn lockmgr and .Fn lockmgr_rw are also able to upgrade and downgrade locks. .Pp Their arguments are: .Bl -tag -width ".Fa flags" .It Fa lkp A pointer to the lock to manipulate. .It Fa flags Flags indicating what action is to be taken. .Bl -tag -width ".Dv LK_NODDLKTREAT" .It Dv LK_SHARED Acquire a shared lock. If an exclusive lock is currently held, .Dv EDEADLK will be returned. .It Dv LK_EXCLUSIVE Acquire an exclusive lock. If an exclusive lock is already held, and .Dv LK_CANRECURSE is not set, the system will .Xr panic 9 . .It Dv LK_DOWNGRADE Downgrade exclusive lock to a shared lock. Downgrading a shared lock is not permitted. If an exclusive lock has been recursed, the system will .Xr panic 9 . .It Dv LK_UPGRADE Upgrade a shared lock to an exclusive lock. If this call fails, the shared lock is lost, even if the .Dv LK_NOWAIT flag is specified. During the upgrade, the shared lock could be temporarily dropped. Attempts to upgrade an exclusive lock will cause a .Xr panic 9 . .It Dv LK_TRYUPGRADE Try to upgrade a shared lock to an exclusive lock. The failure to upgrade does not result in the dropping of the shared lock ownership. .It Dv LK_RELEASE Release the lock. Releasing a lock that is not held can cause a .Xr panic 9 . .It Dv LK_DRAIN Wait for all activity on the lock to end, then mark it decommissioned. This is used before freeing a lock that is part of a piece of memory that is about to be freed. (As documented in .In sys/lockmgr.h . ) .It Dv LK_SLEEPFAIL Fail if operation has slept. .It Dv LK_NOWAIT Do not allow the call to sleep. This can be used to test the lock. .It Dv LK_NOWITNESS Skip the .Xr witness 4 checks for this instance. .It Dv LK_CANRECURSE Allow recursion on an exclusive lock. For every lock there must be a release. .It Dv LK_INTERLOCK Unlock the interlock (which should be locked already). .It Dv LK_NODDLKTREAT Normally, .Fn lockmgr postpones serving further shared requests for shared-locked lock if there is exclusive waiter, to avoid exclusive lock starvation. But, if the thread requesting the shared lock already owns a shared lockmgr lock, the request is granted even in presence of the parallel exclusive lock request, which is done to avoid deadlocks with recursive shared acquisition. .Pp The .Dv LK_NODDLKTREAT flag can only be used by code which requests shared non-recursive lock. The flag allows exclusive requests to preempt the current shared request even if the current thread owns shared locks. This is safe since shared lock is guaranteed to not recurse, and is used when thread is known to held unrelated shared locks, to not cause unnecessary starvation. An example is .Dv vp locking in VFS .Xr lookup 9 , when .Dv dvp is already locked. .El .It Fa ilk An interlock mutex for controlling group access to the lock. If .Dv LK_INTERLOCK is specified, .Fn lockmgr and .Fn lockmgr_rw assume .Fa ilk is currently owned and not recursed, and will return it unlocked. See .Xr mtx_assert 9 . .El .Pp The .Fn lockmgr_args and .Fn lockmgr_args_rw function work like .Fn lockmgr and .Fn lockmgr_rw but accepting a .Fa wmesg , .Fa timo and .Fa prio on a per-instance basis. The specified values will override the default ones, but this can still be used passing, respectively, .Dv LK_WMESG_DEFAULT , .Dv LK_PRIO_DEFAULT and .Dv LK_TIMO_DEFAULT . .Pp The .Fn lockmgr_disown function switches the owner from the current thread to be .Dv LK_KERNPROC , if the lock is already held. .Pp The .Fn lockmgr_printinfo function prints debugging information about the lock. It is used primarily by .Xr VOP_PRINT 9 functions. .Pp The .Fn lockmgr_recursed function returns true if the lock is recursed, 0 otherwise. -.Pp -The -.Fn lockmgr_waiters -function returns true if the lock has waiters, 0 otherwise. .Pp The .Fn lockstatus function returns the status of the lock in relation to the current thread. .Pp When compiled with .Cd "options INVARIANTS" and .Cd "options INVARIANT_SUPPORT" , the .Fn lockmgr_assert function tests .Fa lkp for the assertions specified in .Fa what , and panics if they are not met. One of the following assertions must be specified: .Bl -tag -width ".Dv KA_UNLOCKED" .It Dv KA_LOCKED Assert that the current thread has either a shared or an exclusive lock on the .Vt lkp lock pointed to by the first argument. .It Dv KA_SLOCKED Assert that the current thread has a shared lock on the .Vt lkp lock pointed to by the first argument. .It Dv KA_XLOCKED Assert that the current thread has an exclusive lock on the .Vt lkp lock pointed to by the first argument. .It Dv KA_UNLOCKED Assert that the current thread has no lock on the .Vt lkp lock pointed to by the first argument. .El .Pp In addition, one of the following optional assertions can be used with either an .Dv KA_LOCKED , .Dv KA_SLOCKED , or .Dv KA_XLOCKED assertion: .Bl -tag -width ".Dv KA_NOTRECURSED" .It Dv KA_RECURSED Assert that the current thread has a recursed lock on .Fa lkp . .It Dv KA_NOTRECURSED Assert that the current thread does not have a recursed lock on .Fa lkp . .El .Sh RETURN VALUES The .Fn lockmgr and .Fn lockmgr_rw functions return 0 on success and non-zero on failure. .Pp The .Fn lockstatus function returns: .Bl -tag -width ".Dv LK_EXCLUSIVE" .It Dv LK_EXCLUSIVE An exclusive lock is held by the current thread. .It Dv LK_EXCLOTHER An exclusive lock is held by someone other than the current thread. .It Dv LK_SHARED A shared lock is held. .It Li 0 The lock is not held by anyone. .El .Sh ERRORS .Fn lockmgr and .Fn lockmgr_rw fail if: .Bl -tag -width Er .It Bq Er EBUSY .Dv LK_FORCEUPGRADE was requested and another thread had already requested a lock upgrade. .It Bq Er EBUSY .Dv LK_NOWAIT was set, and a sleep would have been required, or .Dv LK_TRYUPGRADE operation was not able to upgrade the lock. .It Bq Er ENOLCK .Dv LK_SLEEPFAIL was set and .Fn lockmgr or .Fn lockmgr_rw did sleep. .It Bq Er EINTR .Dv PCATCH was set in the lock priority, and a signal was delivered during a sleep. Note the .Er ERESTART error below. .It Bq Er ERESTART .Dv PCATCH was set in the lock priority, a signal was delivered during a sleep, and the system call is to be restarted. .It Bq Er EWOULDBLOCK a non-zero timeout was given, and the timeout expired. .El .Sh LOCKS If .Dv LK_INTERLOCK is passed in the .Fa flags argument to .Fn lockmgr or .Fn lockmgr_rw , the .Fa ilk must be held prior to calling .Fn lockmgr or .Fn lockmgr_rw , and will be returned unlocked. .Pp Upgrade attempts that fail result in the loss of the lock that is currently held. Also, it is invalid to upgrade an exclusive lock, and a .Xr panic 9 will be the result of trying. .Sh SEE ALSO .Xr condvar 9 , .Xr locking 9 , .Xr mtx_assert 9 , .Xr mutex 9 , .Xr panic 9 , .Xr rwlock 9 , .Xr sleep 9 , .Xr sx 9 , .Xr VOP_PRINT 9 .Sh AUTHORS This manual page was written by .An Chad David Aq Mt davidc@acns.ab.ca . Index: user/alc/PQ_LAUNDRY/sys/amd64/amd64/mem.c =================================================================== --- user/alc/PQ_LAUNDRY/sys/amd64/amd64/mem.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/amd64/amd64/mem.c (revision 303775) @@ -1,237 +1,239 @@ /*- * Copyright (c) 1988 University of Utah. * Copyright (c) 1982, 1986, 1990 The Regents of the University of California. * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department, and code derived from software contributed to * Berkeley by William Jolitz. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: Utah $Hdr: mem.c 1.13 89/10/08$ * from: @(#)mem.c 7.2 (Berkeley) 5/9/91 */ #include __FBSDID("$FreeBSD$"); /* * Memory special file */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* * Used in /dev/mem drivers and elsewhere */ MALLOC_DEFINE(M_MEMDESC, "memdesc", "memory range descriptors"); /* ARGSUSED */ int memrw(struct cdev *dev, struct uio *uio, int flags) { struct iovec *iov; void *p; ssize_t orig_resid; u_long v, vd; u_int c; int error; error = 0; orig_resid = uio->uio_resid; while (uio->uio_resid > 0 && error == 0) { iov = uio->uio_iov; if (iov->iov_len == 0) { uio->uio_iov++; uio->uio_iovcnt--; if (uio->uio_iovcnt < 0) panic("memrw"); continue; } v = uio->uio_offset; c = ulmin(iov->iov_len, PAGE_SIZE - (u_int)(v & PAGE_MASK)); switch (dev2unit(dev)) { case CDEV_MINOR_KMEM: /* * Since c is clamped to be less or equal than * PAGE_SIZE, the uiomove() call does not * access past the end of the direct map. */ if (v >= DMAP_MIN_ADDRESS && v < DMAP_MIN_ADDRESS + dmaplimit) { error = uiomove((void *)v, c, uio); break; } if (!kernacc((void *)v, c, uio->uio_rw == UIO_READ ? VM_PROT_READ : VM_PROT_WRITE)) { error = EFAULT; break; } /* * If the extracted address is not accessible * through the direct map, then we make a * private (uncached) mapping because we can't * depend on the existing kernel mapping * remaining valid until the completion of * uiomove(). * * XXX We cannot provide access to the * physical page 0 mapped into KVA. */ v = pmap_extract(kernel_pmap, v); if (v == 0) { error = EFAULT; break; } /* FALLTHROUGH */ case CDEV_MINOR_MEM: if (v < dmaplimit) { vd = PHYS_TO_DMAP(v); error = uiomove((void *)vd, c, uio); break; } if (v >= (1ULL << cpu_maxphyaddr)) { error = EFAULT; break; } p = pmap_mapdev(v, PAGE_SIZE); error = uiomove(p, c, uio); pmap_unmapdev((vm_offset_t)p, PAGE_SIZE); break; } } /* * Don't return error if any byte was written. Read and write * can return error only if no i/o was performed. */ if (uio->uio_resid != orig_resid) error = 0; return (error); } /* * allow user processes to MMAP some memory sections * instead of going through read/write */ /* ARGSUSED */ int memmmap(struct cdev *dev, vm_ooffset_t offset, vm_paddr_t *paddr, int prot __unused, vm_memattr_t *memattr __unused) { - if (dev2unit(dev) == CDEV_MINOR_MEM) + if (dev2unit(dev) == CDEV_MINOR_MEM) { + if (offset >= (1ULL << cpu_maxphyaddr)) + return (-1); *paddr = offset; - else if (dev2unit(dev) == CDEV_MINOR_KMEM) + } else if (dev2unit(dev) == CDEV_MINOR_KMEM) *paddr = vtophys(offset); /* else panic! */ return (0); } /* * Operations for changing memory attributes. * * This is basically just an ioctl shim for mem_range_attr_get * and mem_range_attr_set. */ /* ARGSUSED */ int memioctl(struct cdev *dev __unused, u_long cmd, caddr_t data, int flags, struct thread *td) { int nd, error = 0; struct mem_range_op *mo = (struct mem_range_op *)data; struct mem_range_desc *md; /* is this for us? */ if ((cmd != MEMRANGE_GET) && (cmd != MEMRANGE_SET)) return (ENOTTY); /* any chance we can handle this? */ if (mem_range_softc.mr_op == NULL) return (EOPNOTSUPP); /* do we have any descriptors? */ if (mem_range_softc.mr_ndesc == 0) return (ENXIO); switch (cmd) { case MEMRANGE_GET: nd = imin(mo->mo_arg[0], mem_range_softc.mr_ndesc); if (nd > 0) { md = (struct mem_range_desc *) malloc(nd * sizeof(struct mem_range_desc), M_MEMDESC, M_WAITOK); error = mem_range_attr_get(md, &nd); if (!error) error = copyout(md, mo->mo_desc, nd * sizeof(struct mem_range_desc)); free(md, M_MEMDESC); } else nd = mem_range_softc.mr_ndesc; mo->mo_arg[0] = nd; break; case MEMRANGE_SET: md = (struct mem_range_desc *)malloc(sizeof(struct mem_range_desc), M_MEMDESC, M_WAITOK); error = copyin(mo->mo_desc, md, sizeof(struct mem_range_desc)); /* clamp description string */ md->mr_owner[sizeof(md->mr_owner) - 1] = 0; if (error == 0) error = mem_range_attr_set(md, &mo->mo_arg[0]); free(md, M_MEMDESC); break; } return (error); } Index: user/alc/PQ_LAUNDRY/sys/cddl/compat/opensolaris/sys/vnode.h =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/compat/opensolaris/sys/vnode.h (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/compat/opensolaris/sys/vnode.h (revision 303775) @@ -1,289 +1,287 @@ /*- * Copyright (c) 2007 Pawel Jakub Dawidek * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _OPENSOLARIS_SYS_VNODE_H_ #define _OPENSOLARIS_SYS_VNODE_H_ #ifdef _KERNEL struct vnode; struct vattr; typedef struct vnode vnode_t; typedef struct vattr vattr_t; typedef enum vtype vtype_t; #include enum symfollow { NO_FOLLOW = NOFOLLOW }; #include #include_next #include #include #include #include #include #include typedef struct vop_vector vnodeops_t; #define VOP_FID VOP_VPTOFH #define vop_fid vop_vptofh #define vop_fid_args vop_vptofh_args #define a_fid a_fhp #define IS_XATTRDIR(dvp) (0) #define v_count v_usecount #define V_APPEND VAPPEND #define rootvfs (rootvnode == NULL ? NULL : rootvnode->v_mount) static __inline int vn_is_readonly(vnode_t *vp) { return (vp->v_mount->mnt_flag & MNT_RDONLY); } #define vn_vfswlock(vp) (0) #define vn_vfsunlock(vp) do { } while (0) #define vn_ismntpt(vp) ((vp)->v_type == VDIR && (vp)->v_mountedhere != NULL) #define vn_mountedvfs(vp) ((vp)->v_mountedhere) #define vn_has_cached_data(vp) \ ((vp)->v_object != NULL && \ ((vp)->v_object->resident_page_count > 0 || \ !vm_object_cache_is_empty((vp)->v_object))) #define vn_exists(vp) do { } while (0) #define vn_invalid(vp) do { } while (0) #define vn_renamepath(tdvp, svp, tnm, lentnm) do { } while (0) #define vn_free(vp) do { } while (0) #define vn_matchops(vp, vops) ((vp)->v_op == &(vops)) #define VN_HOLD(v) vref(v) #define VN_RELE(v) vrele(v) #define VN_URELE(v) vput(v) -#define VOP_REALVP(vp, vpp, ct) (*(vpp) = (vp), 0) - #define vnevent_create(vp, ct) do { } while (0) #define vnevent_link(vp, ct) do { } while (0) #define vnevent_remove(vp, dvp, name, ct) do { } while (0) #define vnevent_rmdir(vp, dvp, name, ct) do { } while (0) #define vnevent_rename_src(vp, dvp, name, ct) do { } while (0) #define vnevent_rename_dest(vp, dvp, name, ct) do { } while (0) #define vnevent_rename_dest_dir(vp, ct) do { } while (0) #define specvp(vp, rdev, type, cr) (VN_HOLD(vp), (vp)) #define MANDMODE(mode) (0) #define MANDLOCK(vp, mode) (0) #define chklock(vp, op, offset, size, mode, ct) (0) #define cleanlocks(vp, pid, foo) do { } while (0) #define cleanshares(vp, pid) do { } while (0) /* * We will use va_spare is place of Solaris' va_mask. * This field is initialized in zfs_setattr(). */ #define va_mask va_spare /* TODO: va_fileid is shorter than va_nodeid !!! */ #define va_nodeid va_fileid /* TODO: This field needs conversion! */ #define va_nblocks va_bytes #define va_blksize va_blocksize #define va_seq va_gen #define MAXOFFSET_T OFF_MAX #define EXCL 0 #define ACCESSED (AT_ATIME) #define STATE_CHANGED (AT_CTIME) #define CONTENT_MODIFIED (AT_MTIME | AT_CTIME) static __inline void vattr_init_mask(vattr_t *vap) { vap->va_mask = 0; if (vap->va_type != VNON) vap->va_mask |= AT_TYPE; if (vap->va_uid != (uid_t)VNOVAL) vap->va_mask |= AT_UID; if (vap->va_gid != (gid_t)VNOVAL) vap->va_mask |= AT_GID; if (vap->va_size != (u_quad_t)VNOVAL) vap->va_mask |= AT_SIZE; if (vap->va_atime.tv_sec != VNOVAL) vap->va_mask |= AT_ATIME; if (vap->va_mtime.tv_sec != VNOVAL) vap->va_mask |= AT_MTIME; if (vap->va_mode != (u_short)VNOVAL) vap->va_mask |= AT_MODE; if (vap->va_flags != VNOVAL) vap->va_mask |= AT_XVATTR; } #define FCREAT O_CREAT #define FTRUNC O_TRUNC #define FEXCL O_EXCL #define FDSYNC FFSYNC #define FRSYNC FFSYNC #define FSYNC FFSYNC #define FOFFMAX 0x00 #define FIGNORECASE 0x00 static __inline int vn_openat(char *pnamep, enum uio_seg seg, int filemode, int createmode, vnode_t **vpp, enum create crwhy, mode_t umask, struct vnode *startvp, int fd) { struct thread *td = curthread; struct nameidata nd; int error, operation; ASSERT(seg == UIO_SYSSPACE); if ((filemode & FCREAT) != 0) { ASSERT(filemode == (FWRITE | FCREAT | FTRUNC | FOFFMAX)); ASSERT(crwhy == CRCREAT); operation = CREATE; } else { ASSERT(filemode == (FREAD | FOFFMAX) || filemode == (FREAD | FWRITE | FOFFMAX)); ASSERT(crwhy == 0); operation = LOOKUP; } ASSERT(umask == 0); pwd_ensure_dirs(); if (startvp != NULL) vref(startvp); NDINIT_ATVP(&nd, operation, 0, UIO_SYSSPACE, pnamep, startvp, td); filemode |= O_NOFOLLOW; error = vn_open_cred(&nd, &filemode, createmode, 0, td->td_ucred, NULL); NDFREE(&nd, NDF_ONLY_PNBUF); if (error == 0) { /* We just unlock so we hold a reference. */ VOP_UNLOCK(nd.ni_vp, 0); *vpp = nd.ni_vp; } return (error); } static __inline int zfs_vn_open(char *pnamep, enum uio_seg seg, int filemode, int createmode, vnode_t **vpp, enum create crwhy, mode_t umask) { return (vn_openat(pnamep, seg, filemode, createmode, vpp, crwhy, umask, NULL, -1)); } #define vn_open(pnamep, seg, filemode, createmode, vpp, crwhy, umask) \ zfs_vn_open((pnamep), (seg), (filemode), (createmode), (vpp), (crwhy), (umask)) #define RLIM64_INFINITY 0 static __inline int zfs_vn_rdwr(enum uio_rw rw, vnode_t *vp, caddr_t base, ssize_t len, offset_t offset, enum uio_seg seg, int ioflag, int ulimit, cred_t *cr, ssize_t *residp) { struct thread *td = curthread; int error; ssize_t resid; ASSERT(ioflag == 0); ASSERT(ulimit == RLIM64_INFINITY); if (rw == UIO_WRITE) { ioflag = IO_SYNC; } else { ioflag = IO_DIRECT; } error = vn_rdwr(rw, vp, base, len, offset, seg, ioflag, cr, NOCRED, &resid, td); if (residp != NULL) *residp = (ssize_t)resid; return (error); } #define vn_rdwr(rw, vp, base, len, offset, seg, ioflag, ulimit, cr, residp) \ zfs_vn_rdwr((rw), (vp), (base), (len), (offset), (seg), (ioflag), (ulimit), (cr), (residp)) static __inline int zfs_vop_fsync(vnode_t *vp, int flag, cred_t *cr) { struct mount *mp; int error; ASSERT(flag == FSYNC); if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0) goto drop; vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); error = VOP_FSYNC(vp, MNT_WAIT, curthread); VOP_UNLOCK(vp, 0); vn_finished_write(mp); drop: return (error); } #define VOP_FSYNC(vp, flag, cr, ct) zfs_vop_fsync((vp), (flag), (cr)) static __inline int zfs_vop_close(vnode_t *vp, int flag, int count, offset_t offset, cred_t *cr) { int error; ASSERT(count == 1); ASSERT(offset == 0); error = vn_close(vp, flag, cr, curthread); return (error); } #define VOP_CLOSE(vp, oflags, count, offset, cr, ct) \ zfs_vop_close((vp), (oflags), (count), (offset), (cr)) static __inline int vn_rename(char *from, char *to, enum uio_seg seg) { ASSERT(seg == UIO_SYSSPACE); return (kern_renameat(curthread, AT_FDCWD, from, AT_FDCWD, to, seg)); } static __inline int vn_remove(char *fnamep, enum uio_seg seg, enum rm dirflag) { ASSERT(seg == UIO_SYSSPACE); ASSERT(dirflag == RMFILE); return (kern_unlinkat(curthread, AT_FDCWD, fnamep, seg, 0)); } #endif /* _KERNEL */ #endif /* _OPENSOLARIS_SYS_VNODE_H_ */ Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_dir.h =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_dir.h (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_dir.h (revision 303775) @@ -1,74 +1,74 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright 2010 Sun Microsystems, Inc. All rights reserved. * Use is subject to license terms. */ #ifndef _SYS_FS_ZFS_DIR_H #define _SYS_FS_ZFS_DIR_H #include #include #include #ifdef __cplusplus extern "C" { #endif /* zfs_dirent_lock() flags */ #define ZNEW 0x0001 /* entry should not exist */ #define ZEXISTS 0x0002 /* entry should exist */ #define ZSHARED 0x0004 /* shared access (zfs_dirlook()) */ #define ZXATTR 0x0008 /* we want the xattr dir */ #define ZRENAMING 0x0010 /* znode is being renamed */ #define ZCILOOK 0x0020 /* case-insensitive lookup requested */ #define ZCIEXACT 0x0040 /* c-i requires c-s match (rename) */ #define ZHAVELOCK 0x0080 /* z_name_lock is already held */ /* mknode flags */ #define IS_ROOT_NODE 0x01 /* create a root node */ #define IS_XATTR 0x02 /* create an extended attribute node */ -extern int zfs_dirent_lock(zfs_dirlock_t **, znode_t *, char *, znode_t **, - int, int *, pathname_t *); -extern void zfs_dirent_unlock(zfs_dirlock_t *); -extern int zfs_link_create(zfs_dirlock_t *, znode_t *, dmu_tx_t *, int); -extern int zfs_link_destroy(zfs_dirlock_t *, znode_t *, dmu_tx_t *, int, +extern int zfs_dirent_lookup(znode_t *, const char *, znode_t **, int); +extern int zfs_link_create(znode_t *, const char *, znode_t *, dmu_tx_t *, int); +extern int zfs_link_destroy(znode_t *, const char *, znode_t *, dmu_tx_t *, int, boolean_t *); -extern int zfs_dirlook(znode_t *, char *, vnode_t **, int, int *, - pathname_t *); +#if 0 +extern int zfs_dirlook(vnode_t *, const char *, vnode_t **, int); +#else +extern int zfs_dirlook(znode_t *, const char *name, znode_t **); +#endif extern void zfs_mknode(znode_t *, vattr_t *, dmu_tx_t *, cred_t *, uint_t, znode_t **, zfs_acl_ids_t *); extern void zfs_rmnode(znode_t *); -extern void zfs_dl_name_switch(zfs_dirlock_t *dl, char *new, char **old); extern boolean_t zfs_dirempty(znode_t *); extern void zfs_unlinked_add(znode_t *, dmu_tx_t *); extern void zfs_unlinked_drain(zfsvfs_t *zfsvfs); extern int zfs_sticky_remove_access(znode_t *, znode_t *, cred_t *cr); extern int zfs_get_xattrdir(znode_t *, vnode_t **, cred_t *, int); extern int zfs_make_xattrdir(znode_t *, vattr_t *, vnode_t **, cred_t *); #ifdef __cplusplus } #endif #endif /* _SYS_FS_ZFS_DIR_H */ Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h (revision 303775) @@ -1,168 +1,169 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2011 Pawel Jakub Dawidek . * All rights reserved. */ #ifndef _SYS_FS_ZFS_VFSOPS_H #define _SYS_FS_ZFS_VFSOPS_H #include #include #include #include #include #include #ifdef __cplusplus extern "C" { #endif typedef struct zfsvfs zfsvfs_t; struct znode; struct zfsvfs { vfs_t *z_vfs; /* generic fs struct */ zfsvfs_t *z_parent; /* parent fs */ objset_t *z_os; /* objset reference */ uint64_t z_root; /* id of root znode */ uint64_t z_unlinkedobj; /* id of unlinked zapobj */ uint64_t z_max_blksz; /* maximum block size for files */ uint64_t z_fuid_obj; /* fuid table object number */ uint64_t z_fuid_size; /* fuid table size */ avl_tree_t z_fuid_idx; /* fuid tree keyed by index */ avl_tree_t z_fuid_domain; /* fuid tree keyed by domain */ krwlock_t z_fuid_lock; /* fuid lock */ boolean_t z_fuid_loaded; /* fuid tables are loaded */ boolean_t z_fuid_dirty; /* need to sync fuid table ? */ struct zfs_fuid_info *z_fuid_replay; /* fuid info for replay */ zilog_t *z_log; /* intent log pointer */ uint_t z_acl_mode; /* acl chmod/mode behavior */ uint_t z_acl_inherit; /* acl inheritance behavior */ zfs_case_t z_case; /* case-sense */ boolean_t z_utf8; /* utf8-only */ int z_norm; /* normalization flags */ boolean_t z_atime; /* enable atimes mount option */ boolean_t z_unmounted; /* unmounted */ rrmlock_t z_teardown_lock; krwlock_t z_teardown_inactive_lock; list_t z_all_znodes; /* all vnodes in the fs */ kmutex_t z_znodes_lock; /* lock for z_all_znodes */ vnode_t *z_ctldir; /* .zfs directory pointer */ boolean_t z_show_ctldir; /* expose .zfs in the root dir */ boolean_t z_issnap; /* true if this is a snapshot */ boolean_t z_vscan; /* virus scan on/off */ boolean_t z_use_fuids; /* version allows fuids */ boolean_t z_replay; /* set during ZIL replay */ boolean_t z_use_sa; /* version allow system attributes */ + boolean_t z_use_namecache;/* make use of FreeBSD name cache */ uint64_t z_version; /* ZPL version */ uint64_t z_shares_dir; /* hidden shares dir */ kmutex_t z_lock; uint64_t z_userquota_obj; uint64_t z_groupquota_obj; uint64_t z_replay_eof; /* New end of file - replay only */ sa_attr_type_t *z_attr_table; /* SA attr mapping->id */ #define ZFS_OBJ_MTX_SZ 64 kmutex_t z_hold_mtx[ZFS_OBJ_MTX_SZ]; /* znode hold locks */ }; /* * Normal filesystems (those not under .zfs/snapshot) have a total * file ID size limited to 12 bytes (including the length field) due to * NFSv2 protocol's limitation of 32 bytes for a filehandle. For historical * reasons, this same limit is being imposed by the Solaris NFSv3 implementation * (although the NFSv3 protocol actually permits a maximum of 64 bytes). It * is not possible to expand beyond 12 bytes without abandoning support * of NFSv2. * * For normal filesystems, we partition up the available space as follows: * 2 bytes fid length (required) * 6 bytes object number (48 bits) * 4 bytes generation number (32 bits) * * We reserve only 48 bits for the object number, as this is the limit * currently defined and imposed by the DMU. */ typedef struct zfid_short { uint16_t zf_len; uint8_t zf_object[6]; /* obj[i] = obj >> (8 * i) */ uint8_t zf_gen[4]; /* gen[i] = gen >> (8 * i) */ } zfid_short_t; /* * Filesystems under .zfs/snapshot have a total file ID size of 22[*] bytes * (including the length field). This makes files under .zfs/snapshot * accessible by NFSv3 and NFSv4, but not NFSv2. * * For files under .zfs/snapshot, we partition up the available space * as follows: * 2 bytes fid length (required) * 6 bytes object number (48 bits) * 4 bytes generation number (32 bits) * 6 bytes objset id (48 bits) * 4 bytes[**] currently just zero (32 bits) * * We reserve only 48 bits for the object number and objset id, as these are * the limits currently defined and imposed by the DMU. * * [*] 20 bytes on FreeBSD to fit into the size of struct fid. * [**] 2 bytes on FreeBSD for the above reason. */ typedef struct zfid_long { zfid_short_t z_fid; uint8_t zf_setid[6]; /* obj[i] = obj >> (8 * i) */ uint8_t zf_setgen[2]; /* gen[i] = gen >> (8 * i) */ } zfid_long_t; #define SHORT_FID_LEN (sizeof (zfid_short_t) - sizeof (uint16_t)) #define LONG_FID_LEN (sizeof (zfid_long_t) - sizeof (uint16_t)) extern uint_t zfs_fsyncer_key; extern int zfs_super_owner; extern int zfs_suspend_fs(zfsvfs_t *zfsvfs); extern int zfs_resume_fs(zfsvfs_t *zfsvfs, const char *osname); extern int zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type, const char *domain, uint64_t rid, uint64_t *valuep); extern int zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type, uint64_t *cookiep, void *vbuf, uint64_t *bufsizep); extern int zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type, const char *domain, uint64_t rid, uint64_t quota); extern boolean_t zfs_owner_overquota(zfsvfs_t *zfsvfs, struct znode *, boolean_t isgroup); extern boolean_t zfs_fuid_overquota(zfsvfs_t *zfsvfs, boolean_t isgroup, uint64_t fuid); extern int zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers); extern int zfsvfs_create(const char *name, zfsvfs_t **zfvp); extern void zfsvfs_free(zfsvfs_t *zfsvfs); extern int zfs_check_global_label(const char *dsname, const char *hexsl); #ifdef _KERNEL extern void zfsvfs_update_fromname(const char *oldname, const char *newname); #endif #ifdef __cplusplus } #endif #endif /* _SYS_FS_ZFS_VFSOPS_H */ Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h (revision 303775) @@ -1,375 +1,377 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2012 by Delphix. All rights reserved. * Copyright (c) 2014 Integros [integros.com] */ #ifndef _SYS_FS_ZFS_ZNODE_H #define _SYS_FS_ZFS_ZNODE_H #ifdef _KERNEL #include #include #include #include #include #include #include #endif #include #include #ifdef __cplusplus extern "C" { #endif /* * Additional file level attributes, that are stored * in the upper half of zp_flags */ #define ZFS_READONLY 0x0000000100000000 #define ZFS_HIDDEN 0x0000000200000000 #define ZFS_SYSTEM 0x0000000400000000 #define ZFS_ARCHIVE 0x0000000800000000 #define ZFS_IMMUTABLE 0x0000001000000000 #define ZFS_NOUNLINK 0x0000002000000000 #define ZFS_APPENDONLY 0x0000004000000000 #define ZFS_NODUMP 0x0000008000000000 #define ZFS_OPAQUE 0x0000010000000000 #define ZFS_AV_QUARANTINED 0x0000020000000000 #define ZFS_AV_MODIFIED 0x0000040000000000 #define ZFS_REPARSE 0x0000080000000000 #define ZFS_OFFLINE 0x0000100000000000 #define ZFS_SPARSE 0x0000200000000000 #define ZFS_ATTR_SET(zp, attr, value, pflags, tx) \ { \ if (value) \ pflags |= attr; \ else \ pflags &= ~attr; \ VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_FLAGS(zp->z_zfsvfs), \ &pflags, sizeof (pflags), tx)); \ } /* * Define special zfs pflags */ #define ZFS_XATTR 0x1 /* is an extended attribute */ #define ZFS_INHERIT_ACE 0x2 /* ace has inheritable ACEs */ #define ZFS_ACL_TRIVIAL 0x4 /* files ACL is trivial */ #define ZFS_ACL_OBJ_ACE 0x8 /* ACL has CMPLX Object ACE */ #define ZFS_ACL_PROTECTED 0x10 /* ACL protected */ #define ZFS_ACL_DEFAULTED 0x20 /* ACL should be defaulted */ #define ZFS_ACL_AUTO_INHERIT 0x40 /* ACL should be inherited */ #define ZFS_BONUS_SCANSTAMP 0x80 /* Scanstamp in bonus area */ #define ZFS_NO_EXECS_DENIED 0x100 /* exec was given to everyone */ #define SA_ZPL_ATIME(z) z->z_attr_table[ZPL_ATIME] #define SA_ZPL_MTIME(z) z->z_attr_table[ZPL_MTIME] #define SA_ZPL_CTIME(z) z->z_attr_table[ZPL_CTIME] #define SA_ZPL_CRTIME(z) z->z_attr_table[ZPL_CRTIME] #define SA_ZPL_GEN(z) z->z_attr_table[ZPL_GEN] #define SA_ZPL_DACL_ACES(z) z->z_attr_table[ZPL_DACL_ACES] #define SA_ZPL_XATTR(z) z->z_attr_table[ZPL_XATTR] #define SA_ZPL_SYMLINK(z) z->z_attr_table[ZPL_SYMLINK] #define SA_ZPL_RDEV(z) z->z_attr_table[ZPL_RDEV] #define SA_ZPL_SCANSTAMP(z) z->z_attr_table[ZPL_SCANSTAMP] #define SA_ZPL_UID(z) z->z_attr_table[ZPL_UID] #define SA_ZPL_GID(z) z->z_attr_table[ZPL_GID] #define SA_ZPL_PARENT(z) z->z_attr_table[ZPL_PARENT] #define SA_ZPL_LINKS(z) z->z_attr_table[ZPL_LINKS] #define SA_ZPL_MODE(z) z->z_attr_table[ZPL_MODE] #define SA_ZPL_DACL_COUNT(z) z->z_attr_table[ZPL_DACL_COUNT] #define SA_ZPL_FLAGS(z) z->z_attr_table[ZPL_FLAGS] #define SA_ZPL_SIZE(z) z->z_attr_table[ZPL_SIZE] #define SA_ZPL_ZNODE_ACL(z) z->z_attr_table[ZPL_ZNODE_ACL] #define SA_ZPL_PAD(z) z->z_attr_table[ZPL_PAD] /* * Is ID ephemeral? */ #define IS_EPHEMERAL(x) (x > MAXUID) /* * Should we use FUIDs? */ #define USE_FUIDS(version, os) (version >= ZPL_VERSION_FUID && \ spa_version(dmu_objset_spa(os)) >= SPA_VERSION_FUID) #define USE_SA(version, os) (version >= ZPL_VERSION_SA && \ spa_version(dmu_objset_spa(os)) >= SPA_VERSION_SA) #define MASTER_NODE_OBJ 1 /* * Special attributes for master node. * "userquota@" and "groupquota@" are also valid (from * zfs_userquota_prop_prefixes[]). */ #define ZFS_FSID "FSID" #define ZFS_UNLINKED_SET "DELETE_QUEUE" #define ZFS_ROOT_OBJ "ROOT" #define ZPL_VERSION_STR "VERSION" #define ZFS_FUID_TABLES "FUID" #define ZFS_SHARES_DIR "SHARES" #define ZFS_SA_ATTRS "SA_ATTRS" /* * Path component length * * The generic fs code uses MAXNAMELEN to represent * what the largest component length is. Unfortunately, * this length includes the terminating NULL. ZFS needs * to tell the users via pathconf() and statvfs() what the * true maximum length of a component is, excluding the NULL. */ #define ZFS_MAXNAMELEN (MAXNAMELEN - 1) /* * Convert mode bits (zp_mode) to BSD-style DT_* values for storing in * the directory entries. */ #ifndef IFTODT #define IFTODT(mode) (((mode) & S_IFMT) >> 12) #endif /* * The directory entry has the type (currently unused on Solaris) in the * top 4 bits, and the object number in the low 48 bits. The "middle" * 12 bits are unused. */ #define ZFS_DIRENT_TYPE(de) BF64_GET(de, 60, 4) #define ZFS_DIRENT_OBJ(de) BF64_GET(de, 0, 48) /* * Directory entry locks control access to directory entries. * They are used to protect creates, deletes, and renames. * Each directory znode has a mutex and a list of locked names. */ #ifdef _KERNEL typedef struct zfs_dirlock { char *dl_name; /* directory entry being locked */ uint32_t dl_sharecnt; /* 0 if exclusive, > 0 if shared */ uint8_t dl_namelock; /* 1 if z_name_lock is NOT held */ uint16_t dl_namesize; /* set if dl_name was allocated */ kcondvar_t dl_cv; /* wait for entry to be unlocked */ struct znode *dl_dzp; /* directory znode */ struct zfs_dirlock *dl_next; /* next in z_dirlocks list */ } zfs_dirlock_t; typedef struct znode { struct zfsvfs *z_zfsvfs; vnode_t *z_vnode; uint64_t z_id; /* object ID for this znode */ +#ifdef illumos kmutex_t z_lock; /* znode modification lock */ krwlock_t z_parent_lock; /* parent lock for directories */ krwlock_t z_name_lock; /* "master" lock for dirent locks */ zfs_dirlock_t *z_dirlocks; /* directory entry lock list */ +#endif kmutex_t z_range_lock; /* protects changes to z_range_avl */ avl_tree_t z_range_avl; /* avl tree of file range locks */ uint8_t z_unlinked; /* file has been unlinked */ uint8_t z_atime_dirty; /* atime needs to be synced */ uint8_t z_zn_prefetch; /* Prefetch znodes? */ uint8_t z_moved; /* Has this znode been moved? */ uint_t z_blksz; /* block size in bytes */ uint_t z_seq; /* modification sequence number */ uint64_t z_mapcnt; /* number of pages mapped to file */ uint64_t z_gen; /* generation (cached) */ uint64_t z_size; /* file size (cached) */ uint64_t z_atime[2]; /* atime (cached) */ uint64_t z_links; /* file links (cached) */ uint64_t z_pflags; /* pflags (cached) */ uint64_t z_uid; /* uid fuid (cached) */ uint64_t z_gid; /* gid fuid (cached) */ mode_t z_mode; /* mode (cached) */ uint32_t z_sync_cnt; /* synchronous open count */ kmutex_t z_acl_lock; /* acl data lock */ zfs_acl_t *z_acl_cached; /* cached acl */ list_node_t z_link_node; /* all znodes in fs link */ sa_handle_t *z_sa_hdl; /* handle to sa data */ boolean_t z_is_sa; /* are we native sa? */ } znode_t; /* * Range locking rules * -------------------- * 1. When truncating a file (zfs_create, zfs_setattr, zfs_space) the whole * file range needs to be locked as RL_WRITER. Only then can the pages be * freed etc and zp_size reset. zp_size must be set within range lock. * 2. For writes and punching holes (zfs_write & zfs_space) just the range * being written or freed needs to be locked as RL_WRITER. * Multiple writes at the end of the file must coordinate zp_size updates * to ensure data isn't lost. A compare and swap loop is currently used * to ensure the file size is at least the offset last written. * 3. For reads (zfs_read, zfs_get_data & zfs_putapage) just the range being * read needs to be locked as RL_READER. A check against zp_size can then * be made for reading beyond end of file. */ /* * Convert between znode pointers and vnode pointers */ #ifdef DEBUG static __inline vnode_t * ZTOV(znode_t *zp) { vnode_t *vp = zp->z_vnode; ASSERT(vp == NULL || vp->v_data == NULL || vp->v_data == zp); return (vp); } static __inline znode_t * VTOZ(vnode_t *vp) { znode_t *zp = (znode_t *)vp->v_data; ASSERT(zp == NULL || zp->z_vnode == NULL || zp->z_vnode == vp); return (zp); } #else #define ZTOV(ZP) ((ZP)->z_vnode) #define VTOZ(VP) ((znode_t *)(VP)->v_data) #endif /* Called on entry to each ZFS vnode and vfs operation */ #define ZFS_ENTER(zfsvfs) \ { \ rrm_enter_read(&(zfsvfs)->z_teardown_lock, FTAG); \ if ((zfsvfs)->z_unmounted) { \ ZFS_EXIT(zfsvfs); \ return (EIO); \ } \ } /* Must be called before exiting the vop */ #define ZFS_EXIT(zfsvfs) rrm_exit(&(zfsvfs)->z_teardown_lock, FTAG) /* Verifies the znode is valid */ #define ZFS_VERIFY_ZP(zp) \ if ((zp)->z_sa_hdl == NULL) { \ ZFS_EXIT((zp)->z_zfsvfs); \ return (EIO); \ } \ /* * Macros for dealing with dmu_buf_hold */ #define ZFS_OBJ_HASH(obj_num) ((obj_num) & (ZFS_OBJ_MTX_SZ - 1)) #define ZFS_OBJ_MUTEX(zfsvfs, obj_num) \ (&(zfsvfs)->z_hold_mtx[ZFS_OBJ_HASH(obj_num)]) #define ZFS_OBJ_HOLD_ENTER(zfsvfs, obj_num) \ mutex_enter(ZFS_OBJ_MUTEX((zfsvfs), (obj_num))) #define ZFS_OBJ_HOLD_TRYENTER(zfsvfs, obj_num) \ mutex_tryenter(ZFS_OBJ_MUTEX((zfsvfs), (obj_num))) #define ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num) \ mutex_exit(ZFS_OBJ_MUTEX((zfsvfs), (obj_num))) /* Encode ZFS stored time values from a struct timespec */ #define ZFS_TIME_ENCODE(tp, stmp) \ { \ (stmp)[0] = (uint64_t)(tp)->tv_sec; \ (stmp)[1] = (uint64_t)(tp)->tv_nsec; \ } /* Decode ZFS stored time values to a struct timespec */ #define ZFS_TIME_DECODE(tp, stmp) \ { \ (tp)->tv_sec = (time_t)(stmp)[0]; \ (tp)->tv_nsec = (long)(stmp)[1]; \ } /* * Timestamp defines */ #define ACCESSED (AT_ATIME) #define STATE_CHANGED (AT_CTIME) #define CONTENT_MODIFIED (AT_MTIME | AT_CTIME) #define ZFS_ACCESSTIME_STAMP(zfsvfs, zp) \ if ((zfsvfs)->z_atime && !((zfsvfs)->z_vfs->vfs_flag & VFS_RDONLY)) \ zfs_tstamp_update_setup(zp, ACCESSED, NULL, NULL, B_FALSE); extern int zfs_init_fs(zfsvfs_t *, znode_t **); extern void zfs_set_dataprop(objset_t *); extern void zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *, dmu_tx_t *tx); extern void zfs_tstamp_update_setup(znode_t *, uint_t, uint64_t [2], uint64_t [2], boolean_t); extern void zfs_grow_blocksize(znode_t *, uint64_t, dmu_tx_t *); extern int zfs_freesp(znode_t *, uint64_t, uint64_t, int, boolean_t); extern void zfs_znode_init(void); extern void zfs_znode_fini(void); extern int zfs_zget(zfsvfs_t *, uint64_t, znode_t **); extern int zfs_rezget(znode_t *); extern void zfs_zinactive(znode_t *); extern void zfs_znode_delete(znode_t *, dmu_tx_t *); extern void zfs_znode_free(znode_t *); extern void zfs_remove_op_tables(); extern int zfs_create_op_tables(); extern dev_t zfs_cmpldev(uint64_t); extern int zfs_get_zplprop(objset_t *os, zfs_prop_t prop, uint64_t *value); extern int zfs_get_stats(objset_t *os, nvlist_t *nv); extern void zfs_znode_dmu_fini(znode_t *); extern void zfs_log_create(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype, znode_t *dzp, znode_t *zp, char *name, vsecattr_t *, zfs_fuid_info_t *, vattr_t *vap); extern int zfs_log_create_txtype(zil_create_t, vsecattr_t *vsecp, vattr_t *vap); extern void zfs_log_remove(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype, znode_t *dzp, char *name, uint64_t foid); #define ZFS_NO_OBJECT 0 /* no object id */ extern void zfs_log_link(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype, znode_t *dzp, znode_t *zp, char *name); extern void zfs_log_symlink(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype, znode_t *dzp, znode_t *zp, char *name, char *link); extern void zfs_log_rename(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype, znode_t *sdzp, char *sname, znode_t *tdzp, char *dname, znode_t *szp); extern void zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype, znode_t *zp, offset_t off, ssize_t len, int ioflag); extern void zfs_log_truncate(zilog_t *zilog, dmu_tx_t *tx, int txtype, znode_t *zp, uint64_t off, uint64_t len); extern void zfs_log_setattr(zilog_t *zilog, dmu_tx_t *tx, int txtype, znode_t *zp, vattr_t *vap, uint_t mask_applied, zfs_fuid_info_t *fuidp); #ifndef ZFS_NO_ACL extern void zfs_log_acl(zilog_t *zilog, dmu_tx_t *tx, znode_t *zp, vsecattr_t *vsecp, zfs_fuid_info_t *fuidp); #endif extern void zfs_xvattr_set(znode_t *zp, xvattr_t *xvap, dmu_tx_t *tx); extern void zfs_upgrade(zfsvfs_t *zfsvfs, dmu_tx_t *tx); extern int zfs_create_share_dir(zfsvfs_t *zfsvfs, dmu_tx_t *tx); extern zil_get_data_t zfs_get_data; extern zil_replay_func_t *zfs_replay_vector[TX_MAX_TYPE]; extern int zfsfstype; #endif /* _KERNEL */ extern int zfs_obj_to_path(objset_t *osp, uint64_t obj, char *buf, int len); #ifdef __cplusplus } #endif #endif /* _SYS_FS_ZFS_ZNODE_H */ Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c (revision 303775) @@ -1,2720 +1,2703 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. * Copyright (c) 2013 by Delphix. All rights reserved. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define ALLOW ACE_ACCESS_ALLOWED_ACE_TYPE #define DENY ACE_ACCESS_DENIED_ACE_TYPE #define MAX_ACE_TYPE ACE_SYSTEM_ALARM_CALLBACK_OBJECT_ACE_TYPE #define MIN_ACE_TYPE ALLOW #define OWNING_GROUP (ACE_GROUP|ACE_IDENTIFIER_GROUP) #define EVERYONE_ALLOW_MASK (ACE_READ_ACL|ACE_READ_ATTRIBUTES | \ ACE_READ_NAMED_ATTRS|ACE_SYNCHRONIZE) #define EVERYONE_DENY_MASK (ACE_WRITE_ACL|ACE_WRITE_OWNER | \ ACE_WRITE_ATTRIBUTES|ACE_WRITE_NAMED_ATTRS) #define OWNER_ALLOW_MASK (ACE_WRITE_ACL | ACE_WRITE_OWNER | \ ACE_WRITE_ATTRIBUTES|ACE_WRITE_NAMED_ATTRS) #define ZFS_CHECKED_MASKS (ACE_READ_ACL|ACE_READ_ATTRIBUTES|ACE_READ_DATA| \ ACE_READ_NAMED_ATTRS|ACE_WRITE_DATA|ACE_WRITE_ATTRIBUTES| \ ACE_WRITE_NAMED_ATTRS|ACE_APPEND_DATA|ACE_EXECUTE|ACE_WRITE_OWNER| \ ACE_WRITE_ACL|ACE_DELETE|ACE_DELETE_CHILD|ACE_SYNCHRONIZE) #define WRITE_MASK_DATA (ACE_WRITE_DATA|ACE_APPEND_DATA|ACE_WRITE_NAMED_ATTRS) #define WRITE_MASK_ATTRS (ACE_WRITE_ACL|ACE_WRITE_OWNER|ACE_WRITE_ATTRIBUTES| \ ACE_DELETE|ACE_DELETE_CHILD) #define WRITE_MASK (WRITE_MASK_DATA|WRITE_MASK_ATTRS) #define OGE_CLEAR (ACE_READ_DATA|ACE_LIST_DIRECTORY|ACE_WRITE_DATA| \ ACE_ADD_FILE|ACE_APPEND_DATA|ACE_ADD_SUBDIRECTORY|ACE_EXECUTE) #define OKAY_MASK_BITS (ACE_READ_DATA|ACE_LIST_DIRECTORY|ACE_WRITE_DATA| \ ACE_ADD_FILE|ACE_APPEND_DATA|ACE_ADD_SUBDIRECTORY|ACE_EXECUTE) #define ALL_INHERIT (ACE_FILE_INHERIT_ACE|ACE_DIRECTORY_INHERIT_ACE | \ ACE_NO_PROPAGATE_INHERIT_ACE|ACE_INHERIT_ONLY_ACE|ACE_INHERITED_ACE) #define RESTRICTED_CLEAR (ACE_WRITE_ACL|ACE_WRITE_OWNER) #define V4_ACL_WIDE_FLAGS (ZFS_ACL_AUTO_INHERIT|ZFS_ACL_DEFAULTED|\ ZFS_ACL_PROTECTED) #define ZFS_ACL_WIDE_FLAGS (V4_ACL_WIDE_FLAGS|ZFS_ACL_TRIVIAL|ZFS_INHERIT_ACE|\ ZFS_ACL_OBJ_ACE) #define ALL_MODE_EXECS (S_IXUSR | S_IXGRP | S_IXOTH) static uint16_t zfs_ace_v0_get_type(void *acep) { return (((zfs_oldace_t *)acep)->z_type); } static uint16_t zfs_ace_v0_get_flags(void *acep) { return (((zfs_oldace_t *)acep)->z_flags); } static uint32_t zfs_ace_v0_get_mask(void *acep) { return (((zfs_oldace_t *)acep)->z_access_mask); } static uint64_t zfs_ace_v0_get_who(void *acep) { return (((zfs_oldace_t *)acep)->z_fuid); } static void zfs_ace_v0_set_type(void *acep, uint16_t type) { ((zfs_oldace_t *)acep)->z_type = type; } static void zfs_ace_v0_set_flags(void *acep, uint16_t flags) { ((zfs_oldace_t *)acep)->z_flags = flags; } static void zfs_ace_v0_set_mask(void *acep, uint32_t mask) { ((zfs_oldace_t *)acep)->z_access_mask = mask; } static void zfs_ace_v0_set_who(void *acep, uint64_t who) { ((zfs_oldace_t *)acep)->z_fuid = who; } /*ARGSUSED*/ static size_t zfs_ace_v0_size(void *acep) { return (sizeof (zfs_oldace_t)); } static size_t zfs_ace_v0_abstract_size(void) { return (sizeof (zfs_oldace_t)); } static int zfs_ace_v0_mask_off(void) { return (offsetof(zfs_oldace_t, z_access_mask)); } /*ARGSUSED*/ static int zfs_ace_v0_data(void *acep, void **datap) { *datap = NULL; return (0); } static acl_ops_t zfs_acl_v0_ops = { zfs_ace_v0_get_mask, zfs_ace_v0_set_mask, zfs_ace_v0_get_flags, zfs_ace_v0_set_flags, zfs_ace_v0_get_type, zfs_ace_v0_set_type, zfs_ace_v0_get_who, zfs_ace_v0_set_who, zfs_ace_v0_size, zfs_ace_v0_abstract_size, zfs_ace_v0_mask_off, zfs_ace_v0_data }; static uint16_t zfs_ace_fuid_get_type(void *acep) { return (((zfs_ace_hdr_t *)acep)->z_type); } static uint16_t zfs_ace_fuid_get_flags(void *acep) { return (((zfs_ace_hdr_t *)acep)->z_flags); } static uint32_t zfs_ace_fuid_get_mask(void *acep) { return (((zfs_ace_hdr_t *)acep)->z_access_mask); } static uint64_t zfs_ace_fuid_get_who(void *args) { uint16_t entry_type; zfs_ace_t *acep = args; entry_type = acep->z_hdr.z_flags & ACE_TYPE_FLAGS; if (entry_type == ACE_OWNER || entry_type == OWNING_GROUP || entry_type == ACE_EVERYONE) return (-1); return (((zfs_ace_t *)acep)->z_fuid); } static void zfs_ace_fuid_set_type(void *acep, uint16_t type) { ((zfs_ace_hdr_t *)acep)->z_type = type; } static void zfs_ace_fuid_set_flags(void *acep, uint16_t flags) { ((zfs_ace_hdr_t *)acep)->z_flags = flags; } static void zfs_ace_fuid_set_mask(void *acep, uint32_t mask) { ((zfs_ace_hdr_t *)acep)->z_access_mask = mask; } static void zfs_ace_fuid_set_who(void *arg, uint64_t who) { zfs_ace_t *acep = arg; uint16_t entry_type = acep->z_hdr.z_flags & ACE_TYPE_FLAGS; if (entry_type == ACE_OWNER || entry_type == OWNING_GROUP || entry_type == ACE_EVERYONE) return; acep->z_fuid = who; } static size_t zfs_ace_fuid_size(void *acep) { zfs_ace_hdr_t *zacep = acep; uint16_t entry_type; switch (zacep->z_type) { case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE: case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE: case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE: case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE: return (sizeof (zfs_object_ace_t)); case ALLOW: case DENY: entry_type = (((zfs_ace_hdr_t *)acep)->z_flags & ACE_TYPE_FLAGS); if (entry_type == ACE_OWNER || entry_type == OWNING_GROUP || entry_type == ACE_EVERYONE) return (sizeof (zfs_ace_hdr_t)); /*FALLTHROUGH*/ default: return (sizeof (zfs_ace_t)); } } static size_t zfs_ace_fuid_abstract_size(void) { return (sizeof (zfs_ace_hdr_t)); } static int zfs_ace_fuid_mask_off(void) { return (offsetof(zfs_ace_hdr_t, z_access_mask)); } static int zfs_ace_fuid_data(void *acep, void **datap) { zfs_ace_t *zacep = acep; zfs_object_ace_t *zobjp; switch (zacep->z_hdr.z_type) { case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE: case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE: case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE: case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE: zobjp = acep; *datap = (caddr_t)zobjp + sizeof (zfs_ace_t); return (sizeof (zfs_object_ace_t) - sizeof (zfs_ace_t)); default: *datap = NULL; return (0); } } static acl_ops_t zfs_acl_fuid_ops = { zfs_ace_fuid_get_mask, zfs_ace_fuid_set_mask, zfs_ace_fuid_get_flags, zfs_ace_fuid_set_flags, zfs_ace_fuid_get_type, zfs_ace_fuid_set_type, zfs_ace_fuid_get_who, zfs_ace_fuid_set_who, zfs_ace_fuid_size, zfs_ace_fuid_abstract_size, zfs_ace_fuid_mask_off, zfs_ace_fuid_data }; /* * The following three functions are provided for compatibility with * older ZPL version in order to determine if the file use to have * an external ACL and what version of ACL previously existed on the * file. Would really be nice to not need this, sigh. */ uint64_t zfs_external_acl(znode_t *zp) { zfs_acl_phys_t acl_phys; int error; if (zp->z_is_sa) return (0); /* * Need to deal with a potential * race where zfs_sa_upgrade could cause * z_isa_sa to change. * * If the lookup fails then the state of z_is_sa should have * changed. */ if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zp->z_zfsvfs), &acl_phys, sizeof (acl_phys))) == 0) return (acl_phys.z_acl_extern_obj); else { /* * after upgrade the SA_ZPL_ZNODE_ACL should have been * removed */ VERIFY(zp->z_is_sa && error == ENOENT); return (0); } } /* * Determine size of ACL in bytes * * This is more complicated than it should be since we have to deal * with old external ACLs. */ static int zfs_acl_znode_info(znode_t *zp, int *aclsize, int *aclcount, zfs_acl_phys_t *aclphys) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; uint64_t acl_count; int size; int error; ASSERT(MUTEX_HELD(&zp->z_acl_lock)); if (zp->z_is_sa) { if ((error = sa_size(zp->z_sa_hdl, SA_ZPL_DACL_ACES(zfsvfs), &size)) != 0) return (error); *aclsize = size; if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_DACL_COUNT(zfsvfs), &acl_count, sizeof (acl_count))) != 0) return (error); *aclcount = acl_count; } else { if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zfsvfs), aclphys, sizeof (*aclphys))) != 0) return (error); if (aclphys->z_acl_version == ZFS_ACL_VERSION_INITIAL) { *aclsize = ZFS_ACL_SIZE(aclphys->z_acl_size); *aclcount = aclphys->z_acl_size; } else { *aclsize = aclphys->z_acl_size; *aclcount = aclphys->z_acl_count; } } return (0); } int zfs_znode_acl_version(znode_t *zp) { zfs_acl_phys_t acl_phys; if (zp->z_is_sa) return (ZFS_ACL_VERSION_FUID); else { int error; /* * Need to deal with a potential * race where zfs_sa_upgrade could cause * z_isa_sa to change. * * If the lookup fails then the state of z_is_sa should have * changed. */ if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zp->z_zfsvfs), &acl_phys, sizeof (acl_phys))) == 0) return (acl_phys.z_acl_version); else { /* * After upgrade SA_ZPL_ZNODE_ACL should have * been removed. */ VERIFY(zp->z_is_sa && error == ENOENT); return (ZFS_ACL_VERSION_FUID); } } } static int zfs_acl_version(int version) { if (version < ZPL_VERSION_FUID) return (ZFS_ACL_VERSION_INITIAL); else return (ZFS_ACL_VERSION_FUID); } static int zfs_acl_version_zp(znode_t *zp) { return (zfs_acl_version(zp->z_zfsvfs->z_version)); } zfs_acl_t * zfs_acl_alloc(int vers) { zfs_acl_t *aclp; aclp = kmem_zalloc(sizeof (zfs_acl_t), KM_SLEEP); list_create(&aclp->z_acl, sizeof (zfs_acl_node_t), offsetof(zfs_acl_node_t, z_next)); aclp->z_version = vers; if (vers == ZFS_ACL_VERSION_FUID) aclp->z_ops = zfs_acl_fuid_ops; else aclp->z_ops = zfs_acl_v0_ops; return (aclp); } zfs_acl_node_t * zfs_acl_node_alloc(size_t bytes) { zfs_acl_node_t *aclnode; aclnode = kmem_zalloc(sizeof (zfs_acl_node_t), KM_SLEEP); if (bytes) { aclnode->z_acldata = kmem_alloc(bytes, KM_SLEEP); aclnode->z_allocdata = aclnode->z_acldata; aclnode->z_allocsize = bytes; aclnode->z_size = bytes; } return (aclnode); } static void zfs_acl_node_free(zfs_acl_node_t *aclnode) { if (aclnode->z_allocsize) kmem_free(aclnode->z_allocdata, aclnode->z_allocsize); kmem_free(aclnode, sizeof (zfs_acl_node_t)); } static void zfs_acl_release_nodes(zfs_acl_t *aclp) { zfs_acl_node_t *aclnode; while (aclnode = list_head(&aclp->z_acl)) { list_remove(&aclp->z_acl, aclnode); zfs_acl_node_free(aclnode); } aclp->z_acl_count = 0; aclp->z_acl_bytes = 0; } void zfs_acl_free(zfs_acl_t *aclp) { zfs_acl_release_nodes(aclp); list_destroy(&aclp->z_acl); kmem_free(aclp, sizeof (zfs_acl_t)); } static boolean_t zfs_acl_valid_ace_type(uint_t type, uint_t flags) { uint16_t entry_type; switch (type) { case ALLOW: case DENY: case ACE_SYSTEM_AUDIT_ACE_TYPE: case ACE_SYSTEM_ALARM_ACE_TYPE: entry_type = flags & ACE_TYPE_FLAGS; return (entry_type == ACE_OWNER || entry_type == OWNING_GROUP || entry_type == ACE_EVERYONE || entry_type == 0 || entry_type == ACE_IDENTIFIER_GROUP); default: if (type >= MIN_ACE_TYPE && type <= MAX_ACE_TYPE) return (B_TRUE); } return (B_FALSE); } static boolean_t zfs_ace_valid(vtype_t obj_type, zfs_acl_t *aclp, uint16_t type, uint16_t iflags) { /* * first check type of entry */ if (!zfs_acl_valid_ace_type(type, iflags)) return (B_FALSE); switch (type) { case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE: case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE: case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE: case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE: if (aclp->z_version < ZFS_ACL_VERSION_FUID) return (B_FALSE); aclp->z_hints |= ZFS_ACL_OBJ_ACE; } /* * next check inheritance level flags */ if (obj_type == VDIR && (iflags & (ACE_FILE_INHERIT_ACE|ACE_DIRECTORY_INHERIT_ACE))) aclp->z_hints |= ZFS_INHERIT_ACE; if (iflags & (ACE_INHERIT_ONLY_ACE|ACE_NO_PROPAGATE_INHERIT_ACE)) { if ((iflags & (ACE_FILE_INHERIT_ACE| ACE_DIRECTORY_INHERIT_ACE)) == 0) { return (B_FALSE); } } return (B_TRUE); } static void * zfs_acl_next_ace(zfs_acl_t *aclp, void *start, uint64_t *who, uint32_t *access_mask, uint16_t *iflags, uint16_t *type) { zfs_acl_node_t *aclnode; ASSERT(aclp); if (start == NULL) { aclnode = list_head(&aclp->z_acl); if (aclnode == NULL) return (NULL); aclp->z_next_ace = aclnode->z_acldata; aclp->z_curr_node = aclnode; aclnode->z_ace_idx = 0; } aclnode = aclp->z_curr_node; if (aclnode == NULL) return (NULL); if (aclnode->z_ace_idx >= aclnode->z_ace_count) { aclnode = list_next(&aclp->z_acl, aclnode); if (aclnode == NULL) return (NULL); else { aclp->z_curr_node = aclnode; aclnode->z_ace_idx = 0; aclp->z_next_ace = aclnode->z_acldata; } } if (aclnode->z_ace_idx < aclnode->z_ace_count) { void *acep = aclp->z_next_ace; size_t ace_size; /* * Make sure we don't overstep our bounds */ ace_size = aclp->z_ops.ace_size(acep); if (((caddr_t)acep + ace_size) > ((caddr_t)aclnode->z_acldata + aclnode->z_size)) { return (NULL); } *iflags = aclp->z_ops.ace_flags_get(acep); *type = aclp->z_ops.ace_type_get(acep); *access_mask = aclp->z_ops.ace_mask_get(acep); *who = aclp->z_ops.ace_who_get(acep); aclp->z_next_ace = (caddr_t)aclp->z_next_ace + ace_size; aclnode->z_ace_idx++; return ((void *)acep); } return (NULL); } /*ARGSUSED*/ static uint64_t zfs_ace_walk(void *datap, uint64_t cookie, int aclcnt, uint16_t *flags, uint16_t *type, uint32_t *mask) { zfs_acl_t *aclp = datap; zfs_ace_hdr_t *acep = (zfs_ace_hdr_t *)(uintptr_t)cookie; uint64_t who; acep = zfs_acl_next_ace(aclp, acep, &who, mask, flags, type); return ((uint64_t)(uintptr_t)acep); } static zfs_acl_node_t * zfs_acl_curr_node(zfs_acl_t *aclp) { ASSERT(aclp->z_curr_node); return (aclp->z_curr_node); } /* * Copy ACE to internal ZFS format. * While processing the ACL each ACE will be validated for correctness. * ACE FUIDs will be created later. */ int zfs_copy_ace_2_fuid(zfsvfs_t *zfsvfs, vtype_t obj_type, zfs_acl_t *aclp, void *datap, zfs_ace_t *z_acl, uint64_t aclcnt, size_t *size, zfs_fuid_info_t **fuidp, cred_t *cr) { int i; uint16_t entry_type; zfs_ace_t *aceptr = z_acl; ace_t *acep = datap; zfs_object_ace_t *zobjacep; ace_object_t *aceobjp; for (i = 0; i != aclcnt; i++) { aceptr->z_hdr.z_access_mask = acep->a_access_mask; aceptr->z_hdr.z_flags = acep->a_flags; aceptr->z_hdr.z_type = acep->a_type; entry_type = aceptr->z_hdr.z_flags & ACE_TYPE_FLAGS; if (entry_type != ACE_OWNER && entry_type != OWNING_GROUP && entry_type != ACE_EVERYONE) { aceptr->z_fuid = zfs_fuid_create(zfsvfs, acep->a_who, cr, (entry_type == 0) ? ZFS_ACE_USER : ZFS_ACE_GROUP, fuidp); } /* * Make sure ACE is valid */ if (zfs_ace_valid(obj_type, aclp, aceptr->z_hdr.z_type, aceptr->z_hdr.z_flags) != B_TRUE) return (SET_ERROR(EINVAL)); switch (acep->a_type) { case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE: case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE: case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE: case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE: zobjacep = (zfs_object_ace_t *)aceptr; aceobjp = (ace_object_t *)acep; bcopy(aceobjp->a_obj_type, zobjacep->z_object_type, sizeof (aceobjp->a_obj_type)); bcopy(aceobjp->a_inherit_obj_type, zobjacep->z_inherit_type, sizeof (aceobjp->a_inherit_obj_type)); acep = (ace_t *)((caddr_t)acep + sizeof (ace_object_t)); break; default: acep = (ace_t *)((caddr_t)acep + sizeof (ace_t)); } aceptr = (zfs_ace_t *)((caddr_t)aceptr + aclp->z_ops.ace_size(aceptr)); } *size = (caddr_t)aceptr - (caddr_t)z_acl; return (0); } /* * Copy ZFS ACEs to fixed size ace_t layout */ static void zfs_copy_fuid_2_ace(zfsvfs_t *zfsvfs, zfs_acl_t *aclp, cred_t *cr, void *datap, int filter) { uint64_t who; uint32_t access_mask; uint16_t iflags, type; zfs_ace_hdr_t *zacep = NULL; ace_t *acep = datap; ace_object_t *objacep; zfs_object_ace_t *zobjacep; size_t ace_size; uint16_t entry_type; while (zacep = zfs_acl_next_ace(aclp, zacep, &who, &access_mask, &iflags, &type)) { switch (type) { case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE: case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE: case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE: case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE: if (filter) { continue; } zobjacep = (zfs_object_ace_t *)zacep; objacep = (ace_object_t *)acep; bcopy(zobjacep->z_object_type, objacep->a_obj_type, sizeof (zobjacep->z_object_type)); bcopy(zobjacep->z_inherit_type, objacep->a_inherit_obj_type, sizeof (zobjacep->z_inherit_type)); ace_size = sizeof (ace_object_t); break; default: ace_size = sizeof (ace_t); break; } entry_type = (iflags & ACE_TYPE_FLAGS); if ((entry_type != ACE_OWNER && entry_type != OWNING_GROUP && entry_type != ACE_EVERYONE)) { acep->a_who = zfs_fuid_map_id(zfsvfs, who, cr, (entry_type & ACE_IDENTIFIER_GROUP) ? ZFS_ACE_GROUP : ZFS_ACE_USER); } else { acep->a_who = (uid_t)(int64_t)who; } acep->a_access_mask = access_mask; acep->a_flags = iflags; acep->a_type = type; acep = (ace_t *)((caddr_t)acep + ace_size); } } static int zfs_copy_ace_2_oldace(vtype_t obj_type, zfs_acl_t *aclp, ace_t *acep, zfs_oldace_t *z_acl, int aclcnt, size_t *size) { int i; zfs_oldace_t *aceptr = z_acl; for (i = 0; i != aclcnt; i++, aceptr++) { aceptr->z_access_mask = acep[i].a_access_mask; aceptr->z_type = acep[i].a_type; aceptr->z_flags = acep[i].a_flags; aceptr->z_fuid = acep[i].a_who; /* * Make sure ACE is valid */ if (zfs_ace_valid(obj_type, aclp, aceptr->z_type, aceptr->z_flags) != B_TRUE) return (SET_ERROR(EINVAL)); } *size = (caddr_t)aceptr - (caddr_t)z_acl; return (0); } /* * convert old ACL format to new */ void zfs_acl_xform(znode_t *zp, zfs_acl_t *aclp, cred_t *cr) { zfs_oldace_t *oldaclp; int i; uint16_t type, iflags; uint32_t access_mask; uint64_t who; void *cookie = NULL; zfs_acl_node_t *newaclnode; ASSERT(aclp->z_version == ZFS_ACL_VERSION_INITIAL); /* * First create the ACE in a contiguous piece of memory * for zfs_copy_ace_2_fuid(). * * We only convert an ACL once, so this won't happen * everytime. */ oldaclp = kmem_alloc(sizeof (zfs_oldace_t) * aclp->z_acl_count, KM_SLEEP); i = 0; while (cookie = zfs_acl_next_ace(aclp, cookie, &who, &access_mask, &iflags, &type)) { oldaclp[i].z_flags = iflags; oldaclp[i].z_type = type; oldaclp[i].z_fuid = who; oldaclp[i++].z_access_mask = access_mask; } newaclnode = zfs_acl_node_alloc(aclp->z_acl_count * sizeof (zfs_object_ace_t)); aclp->z_ops = zfs_acl_fuid_ops; VERIFY(zfs_copy_ace_2_fuid(zp->z_zfsvfs, ZTOV(zp)->v_type, aclp, oldaclp, newaclnode->z_acldata, aclp->z_acl_count, &newaclnode->z_size, NULL, cr) == 0); newaclnode->z_ace_count = aclp->z_acl_count; aclp->z_version = ZFS_ACL_VERSION; kmem_free(oldaclp, aclp->z_acl_count * sizeof (zfs_oldace_t)); /* * Release all previous ACL nodes */ zfs_acl_release_nodes(aclp); list_insert_head(&aclp->z_acl, newaclnode); aclp->z_acl_bytes = newaclnode->z_size; aclp->z_acl_count = newaclnode->z_ace_count; } /* * Convert unix access mask to v4 access mask */ static uint32_t zfs_unix_to_v4(uint32_t access_mask) { uint32_t new_mask = 0; if (access_mask & S_IXOTH) new_mask |= ACE_EXECUTE; if (access_mask & S_IWOTH) new_mask |= ACE_WRITE_DATA; if (access_mask & S_IROTH) new_mask |= ACE_READ_DATA; return (new_mask); } static void zfs_set_ace(zfs_acl_t *aclp, void *acep, uint32_t access_mask, uint16_t access_type, uint64_t fuid, uint16_t entry_type) { uint16_t type = entry_type & ACE_TYPE_FLAGS; aclp->z_ops.ace_mask_set(acep, access_mask); aclp->z_ops.ace_type_set(acep, access_type); aclp->z_ops.ace_flags_set(acep, entry_type); if ((type != ACE_OWNER && type != OWNING_GROUP && type != ACE_EVERYONE)) aclp->z_ops.ace_who_set(acep, fuid); } /* * Determine mode of file based on ACL. */ uint64_t zfs_mode_compute(uint64_t fmode, zfs_acl_t *aclp, uint64_t *pflags, uint64_t fuid, uint64_t fgid) { int entry_type; mode_t mode; mode_t seen = 0; zfs_ace_hdr_t *acep = NULL; uint64_t who; uint16_t iflags, type; uint32_t access_mask; boolean_t an_exec_denied = B_FALSE; mode = (fmode & (S_IFMT | S_ISUID | S_ISGID | S_ISVTX)); while (acep = zfs_acl_next_ace(aclp, acep, &who, &access_mask, &iflags, &type)) { if (!zfs_acl_valid_ace_type(type, iflags)) continue; entry_type = (iflags & ACE_TYPE_FLAGS); /* * Skip over any inherit_only ACEs */ if (iflags & ACE_INHERIT_ONLY_ACE) continue; if (entry_type == ACE_OWNER || (entry_type == 0 && who == fuid)) { if ((access_mask & ACE_READ_DATA) && (!(seen & S_IRUSR))) { seen |= S_IRUSR; if (type == ALLOW) { mode |= S_IRUSR; } } if ((access_mask & ACE_WRITE_DATA) && (!(seen & S_IWUSR))) { seen |= S_IWUSR; if (type == ALLOW) { mode |= S_IWUSR; } } if ((access_mask & ACE_EXECUTE) && (!(seen & S_IXUSR))) { seen |= S_IXUSR; if (type == ALLOW) { mode |= S_IXUSR; } } } else if (entry_type == OWNING_GROUP || (entry_type == ACE_IDENTIFIER_GROUP && who == fgid)) { if ((access_mask & ACE_READ_DATA) && (!(seen & S_IRGRP))) { seen |= S_IRGRP; if (type == ALLOW) { mode |= S_IRGRP; } } if ((access_mask & ACE_WRITE_DATA) && (!(seen & S_IWGRP))) { seen |= S_IWGRP; if (type == ALLOW) { mode |= S_IWGRP; } } if ((access_mask & ACE_EXECUTE) && (!(seen & S_IXGRP))) { seen |= S_IXGRP; if (type == ALLOW) { mode |= S_IXGRP; } } } else if (entry_type == ACE_EVERYONE) { if ((access_mask & ACE_READ_DATA)) { if (!(seen & S_IRUSR)) { seen |= S_IRUSR; if (type == ALLOW) { mode |= S_IRUSR; } } if (!(seen & S_IRGRP)) { seen |= S_IRGRP; if (type == ALLOW) { mode |= S_IRGRP; } } if (!(seen & S_IROTH)) { seen |= S_IROTH; if (type == ALLOW) { mode |= S_IROTH; } } } if ((access_mask & ACE_WRITE_DATA)) { if (!(seen & S_IWUSR)) { seen |= S_IWUSR; if (type == ALLOW) { mode |= S_IWUSR; } } if (!(seen & S_IWGRP)) { seen |= S_IWGRP; if (type == ALLOW) { mode |= S_IWGRP; } } if (!(seen & S_IWOTH)) { seen |= S_IWOTH; if (type == ALLOW) { mode |= S_IWOTH; } } } if ((access_mask & ACE_EXECUTE)) { if (!(seen & S_IXUSR)) { seen |= S_IXUSR; if (type == ALLOW) { mode |= S_IXUSR; } } if (!(seen & S_IXGRP)) { seen |= S_IXGRP; if (type == ALLOW) { mode |= S_IXGRP; } } if (!(seen & S_IXOTH)) { seen |= S_IXOTH; if (type == ALLOW) { mode |= S_IXOTH; } } } } else { /* * Only care if this IDENTIFIER_GROUP or * USER ACE denies execute access to someone, * mode is not affected */ if ((access_mask & ACE_EXECUTE) && type == DENY) an_exec_denied = B_TRUE; } } /* * Failure to allow is effectively a deny, so execute permission * is denied if it was never mentioned or if we explicitly * weren't allowed it. */ if (!an_exec_denied && ((seen & ALL_MODE_EXECS) != ALL_MODE_EXECS || (mode & ALL_MODE_EXECS) != ALL_MODE_EXECS)) an_exec_denied = B_TRUE; if (an_exec_denied) *pflags &= ~ZFS_NO_EXECS_DENIED; else *pflags |= ZFS_NO_EXECS_DENIED; return (mode); } /* * Read an external acl object. If the intent is to modify, always * create a new acl and leave any cached acl in place. */ static int -zfs_acl_node_read(znode_t *zp, boolean_t have_lock, zfs_acl_t **aclpp, - boolean_t will_modify) +zfs_acl_node_read(znode_t *zp, zfs_acl_t **aclpp, boolean_t will_modify) { zfs_acl_t *aclp; int aclsize; int acl_count; zfs_acl_node_t *aclnode; zfs_acl_phys_t znode_acl; int version; int error; - boolean_t drop_lock = B_FALSE; ASSERT(MUTEX_HELD(&zp->z_acl_lock)); + ASSERT_VOP_LOCKED(ZTOV(zp), __func__); if (zp->z_acl_cached && !will_modify) { *aclpp = zp->z_acl_cached; return (0); } - /* - * close race where znode could be upgrade while trying to - * read the znode attributes. - * - * But this could only happen if the file isn't already an SA - * znode - */ - if (!zp->z_is_sa && !have_lock) { - mutex_enter(&zp->z_lock); - drop_lock = B_TRUE; - } version = zfs_znode_acl_version(zp); if ((error = zfs_acl_znode_info(zp, &aclsize, &acl_count, &znode_acl)) != 0) { goto done; } aclp = zfs_acl_alloc(version); aclp->z_acl_count = acl_count; aclp->z_acl_bytes = aclsize; aclnode = zfs_acl_node_alloc(aclsize); aclnode->z_ace_count = aclp->z_acl_count; aclnode->z_size = aclsize; if (!zp->z_is_sa) { if (znode_acl.z_acl_extern_obj) { error = dmu_read(zp->z_zfsvfs->z_os, znode_acl.z_acl_extern_obj, 0, aclnode->z_size, aclnode->z_acldata, DMU_READ_PREFETCH); } else { bcopy(znode_acl.z_ace_data, aclnode->z_acldata, aclnode->z_size); } } else { error = sa_lookup(zp->z_sa_hdl, SA_ZPL_DACL_ACES(zp->z_zfsvfs), aclnode->z_acldata, aclnode->z_size); } if (error != 0) { zfs_acl_free(aclp); zfs_acl_node_free(aclnode); /* convert checksum errors into IO errors */ if (error == ECKSUM) error = SET_ERROR(EIO); goto done; } list_insert_head(&aclp->z_acl, aclnode); *aclpp = aclp; if (!will_modify) zp->z_acl_cached = aclp; done: - if (drop_lock) - mutex_exit(&zp->z_lock); return (error); } /*ARGSUSED*/ void zfs_acl_data_locator(void **dataptr, uint32_t *length, uint32_t buflen, boolean_t start, void *userdata) { zfs_acl_locator_cb_t *cb = (zfs_acl_locator_cb_t *)userdata; if (start) { cb->cb_acl_node = list_head(&cb->cb_aclp->z_acl); } else { cb->cb_acl_node = list_next(&cb->cb_aclp->z_acl, cb->cb_acl_node); } *dataptr = cb->cb_acl_node->z_acldata; *length = cb->cb_acl_node->z_size; } int zfs_acl_chown_setattr(znode_t *zp) { int error; zfs_acl_t *aclp; - ASSERT(MUTEX_HELD(&zp->z_lock)); + ASSERT_VOP_ELOCKED(ZTOV(zp), __func__); ASSERT(MUTEX_HELD(&zp->z_acl_lock)); - if ((error = zfs_acl_node_read(zp, B_TRUE, &aclp, B_FALSE)) == 0) + if ((error = zfs_acl_node_read(zp, &aclp, B_FALSE)) == 0) zp->z_mode = zfs_mode_compute(zp->z_mode, aclp, &zp->z_pflags, zp->z_uid, zp->z_gid); return (error); } /* * common code for setting ACLs. * * This function is called from zfs_mode_update, zfs_perm_init, and zfs_setacl. * zfs_setacl passes a non-NULL inherit pointer (ihp) to indicate that it's * already checked the acl and knows whether to inherit. */ int zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx) { int error; zfsvfs_t *zfsvfs = zp->z_zfsvfs; dmu_object_type_t otype; zfs_acl_locator_cb_t locate = { 0 }; uint64_t mode; sa_bulk_attr_t bulk[5]; uint64_t ctime[2]; int count = 0; mode = zp->z_mode; mode = zfs_mode_compute(mode, aclp, &zp->z_pflags, zp->z_uid, zp->z_gid); zp->z_mode = mode; SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL, &mode, sizeof (mode)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL, &zp->z_pflags, sizeof (zp->z_pflags)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, sizeof (ctime)); if (zp->z_acl_cached) { zfs_acl_free(zp->z_acl_cached); zp->z_acl_cached = NULL; } /* * Upgrade needed? */ if (!zfsvfs->z_use_fuids) { otype = DMU_OT_OLDACL; } else { if ((aclp->z_version == ZFS_ACL_VERSION_INITIAL) && (zfsvfs->z_version >= ZPL_VERSION_FUID)) zfs_acl_xform(zp, aclp, cr); ASSERT(aclp->z_version >= ZFS_ACL_VERSION_FUID); otype = DMU_OT_ACL; } /* * Arrgh, we have to handle old on disk format * as well as newer (preferred) SA format. */ if (zp->z_is_sa) { /* the easy case, just update the ACL attribute */ locate.cb_aclp = aclp; SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_DACL_ACES(zfsvfs), zfs_acl_data_locator, &locate, aclp->z_acl_bytes); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_DACL_COUNT(zfsvfs), NULL, &aclp->z_acl_count, sizeof (uint64_t)); } else { /* Painful legacy way */ zfs_acl_node_t *aclnode; uint64_t off = 0; zfs_acl_phys_t acl_phys; uint64_t aoid; if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zfsvfs), &acl_phys, sizeof (acl_phys))) != 0) return (error); aoid = acl_phys.z_acl_extern_obj; if (aclp->z_acl_bytes > ZFS_ACE_SPACE) { /* * If ACL was previously external and we are now * converting to new ACL format then release old * ACL object and create a new one. */ if (aoid && aclp->z_version != acl_phys.z_acl_version) { error = dmu_object_free(zfsvfs->z_os, aoid, tx); if (error) return (error); aoid = 0; } if (aoid == 0) { aoid = dmu_object_alloc(zfsvfs->z_os, otype, aclp->z_acl_bytes, otype == DMU_OT_ACL ? DMU_OT_SYSACL : DMU_OT_NONE, otype == DMU_OT_ACL ? DN_MAX_BONUSLEN : 0, tx); } else { (void) dmu_object_set_blocksize(zfsvfs->z_os, aoid, aclp->z_acl_bytes, 0, tx); } acl_phys.z_acl_extern_obj = aoid; for (aclnode = list_head(&aclp->z_acl); aclnode; aclnode = list_next(&aclp->z_acl, aclnode)) { if (aclnode->z_ace_count == 0) continue; dmu_write(zfsvfs->z_os, aoid, off, aclnode->z_size, aclnode->z_acldata, tx); off += aclnode->z_size; } } else { void *start = acl_phys.z_ace_data; /* * Migrating back embedded? */ if (acl_phys.z_acl_extern_obj) { error = dmu_object_free(zfsvfs->z_os, acl_phys.z_acl_extern_obj, tx); if (error) return (error); acl_phys.z_acl_extern_obj = 0; } for (aclnode = list_head(&aclp->z_acl); aclnode; aclnode = list_next(&aclp->z_acl, aclnode)) { if (aclnode->z_ace_count == 0) continue; bcopy(aclnode->z_acldata, start, aclnode->z_size); start = (caddr_t)start + aclnode->z_size; } } /* * If Old version then swap count/bytes to match old * layout of znode_acl_phys_t. */ if (aclp->z_version == ZFS_ACL_VERSION_INITIAL) { acl_phys.z_acl_size = aclp->z_acl_count; acl_phys.z_acl_count = aclp->z_acl_bytes; } else { acl_phys.z_acl_size = aclp->z_acl_bytes; acl_phys.z_acl_count = aclp->z_acl_count; } acl_phys.z_acl_version = aclp->z_version; SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ZNODE_ACL(zfsvfs), NULL, &acl_phys, sizeof (acl_phys)); } /* * Replace ACL wide bits, but first clear them. */ zp->z_pflags &= ~ZFS_ACL_WIDE_FLAGS; zp->z_pflags |= aclp->z_hints; if (ace_trivial_common(aclp, 0, zfs_ace_walk) == 0) zp->z_pflags |= ZFS_ACL_TRIVIAL; zfs_tstamp_update_setup(zp, STATE_CHANGED, NULL, ctime, B_TRUE); return (sa_bulk_update(zp->z_sa_hdl, bulk, count, tx)); } static void zfs_acl_chmod(vtype_t vtype, uint64_t mode, boolean_t split, boolean_t trim, zfs_acl_t *aclp) { void *acep = NULL; uint64_t who; int new_count, new_bytes; int ace_size; int entry_type; uint16_t iflags, type; uint32_t access_mask; zfs_acl_node_t *newnode; size_t abstract_size = aclp->z_ops.ace_abstract_size(); void *zacep; boolean_t isdir; trivial_acl_t masks; new_count = new_bytes = 0; isdir = (vtype == VDIR); acl_trivial_access_masks((mode_t)mode, isdir, &masks); newnode = zfs_acl_node_alloc((abstract_size * 6) + aclp->z_acl_bytes); zacep = newnode->z_acldata; if (masks.allow0) { zfs_set_ace(aclp, zacep, masks.allow0, ALLOW, -1, ACE_OWNER); zacep = (void *)((uintptr_t)zacep + abstract_size); new_count++; new_bytes += abstract_size; } if (masks.deny1) { zfs_set_ace(aclp, zacep, masks.deny1, DENY, -1, ACE_OWNER); zacep = (void *)((uintptr_t)zacep + abstract_size); new_count++; new_bytes += abstract_size; } if (masks.deny2) { zfs_set_ace(aclp, zacep, masks.deny2, DENY, -1, OWNING_GROUP); zacep = (void *)((uintptr_t)zacep + abstract_size); new_count++; new_bytes += abstract_size; } while (acep = zfs_acl_next_ace(aclp, acep, &who, &access_mask, &iflags, &type)) { entry_type = (iflags & ACE_TYPE_FLAGS); /* * ACEs used to represent the file mode may be divided * into an equivalent pair of inherit-only and regular * ACEs, if they are inheritable. * Skip regular ACEs, which are replaced by the new mode. */ if (split && (entry_type == ACE_OWNER || entry_type == OWNING_GROUP || entry_type == ACE_EVERYONE)) { if (!isdir || !(iflags & (ACE_FILE_INHERIT_ACE|ACE_DIRECTORY_INHERIT_ACE))) continue; /* * We preserve owner@, group@, or @everyone * permissions, if they are inheritable, by * copying them to inherit_only ACEs. This * prevents inheritable permissions from being * altered along with the file mode. */ iflags |= ACE_INHERIT_ONLY_ACE; } /* * If this ACL has any inheritable ACEs, mark that in * the hints (which are later masked into the pflags) * so create knows to do inheritance. */ if (isdir && (iflags & (ACE_FILE_INHERIT_ACE|ACE_DIRECTORY_INHERIT_ACE))) aclp->z_hints |= ZFS_INHERIT_ACE; if ((type != ALLOW && type != DENY) || (iflags & ACE_INHERIT_ONLY_ACE)) { switch (type) { case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE: case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE: case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE: case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE: aclp->z_hints |= ZFS_ACL_OBJ_ACE; break; } } else { /* * Limit permissions granted by ACEs to be no greater * than permissions of the requested group mode. * Applies when the "aclmode" property is set to * "groupmask". */ if ((type == ALLOW) && trim) access_mask &= masks.group; } zfs_set_ace(aclp, zacep, access_mask, type, who, iflags); ace_size = aclp->z_ops.ace_size(acep); zacep = (void *)((uintptr_t)zacep + ace_size); new_count++; new_bytes += ace_size; } zfs_set_ace(aclp, zacep, masks.owner, ALLOW, -1, ACE_OWNER); zacep = (void *)((uintptr_t)zacep + abstract_size); zfs_set_ace(aclp, zacep, masks.group, ALLOW, -1, OWNING_GROUP); zacep = (void *)((uintptr_t)zacep + abstract_size); zfs_set_ace(aclp, zacep, masks.everyone, ALLOW, -1, ACE_EVERYONE); new_count += 3; new_bytes += abstract_size * 3; zfs_acl_release_nodes(aclp); aclp->z_acl_count = new_count; aclp->z_acl_bytes = new_bytes; newnode->z_ace_count = new_count; newnode->z_size = new_bytes; list_insert_tail(&aclp->z_acl, newnode); } int zfs_acl_chmod_setattr(znode_t *zp, zfs_acl_t **aclp, uint64_t mode) { int error = 0; mutex_enter(&zp->z_acl_lock); - mutex_enter(&zp->z_lock); + ASSERT_VOP_ELOCKED(ZTOV(zp), __func__); if (zp->z_zfsvfs->z_acl_mode == ZFS_ACL_DISCARD) *aclp = zfs_acl_alloc(zfs_acl_version_zp(zp)); else - error = zfs_acl_node_read(zp, B_TRUE, aclp, B_TRUE); + error = zfs_acl_node_read(zp, aclp, B_TRUE); if (error == 0) { (*aclp)->z_hints = zp->z_pflags & V4_ACL_WIDE_FLAGS; zfs_acl_chmod(ZTOV(zp)->v_type, mode, B_TRUE, (zp->z_zfsvfs->z_acl_mode == ZFS_ACL_GROUPMASK), *aclp); } - mutex_exit(&zp->z_lock); mutex_exit(&zp->z_acl_lock); return (error); } /* * Should ACE be inherited? */ static int zfs_ace_can_use(vtype_t vtype, uint16_t acep_flags) { int iflags = (acep_flags & 0xf); if ((vtype == VDIR) && (iflags & ACE_DIRECTORY_INHERIT_ACE)) return (1); else if (iflags & ACE_FILE_INHERIT_ACE) return (!((vtype == VDIR) && (iflags & ACE_NO_PROPAGATE_INHERIT_ACE))); return (0); } /* * inherit inheritable ACEs from parent */ static zfs_acl_t * zfs_acl_inherit(zfsvfs_t *zfsvfs, vtype_t vtype, zfs_acl_t *paclp, uint64_t mode) { void *pacep = NULL; void *acep; zfs_acl_node_t *aclnode; zfs_acl_t *aclp = NULL; uint64_t who; uint32_t access_mask; uint16_t iflags, newflags, type; size_t ace_size; void *data1, *data2; size_t data1sz, data2sz; uint_t aclinherit; boolean_t isdir = (vtype == VDIR); aclp = zfs_acl_alloc(paclp->z_version); aclinherit = zfsvfs->z_acl_inherit; if (aclinherit == ZFS_ACL_DISCARD || vtype == VLNK) return (aclp); while (pacep = zfs_acl_next_ace(paclp, pacep, &who, &access_mask, &iflags, &type)) { /* * don't inherit bogus ACEs */ if (!zfs_acl_valid_ace_type(type, iflags)) continue; /* * Check if ACE is inheritable by this vnode */ if ((aclinherit == ZFS_ACL_NOALLOW && type == ALLOW) || !zfs_ace_can_use(vtype, iflags)) continue; /* * Strip inherited execute permission from file if * not in mode */ if (aclinherit == ZFS_ACL_PASSTHROUGH_X && type == ALLOW && !isdir && ((mode & (S_IXUSR|S_IXGRP|S_IXOTH)) == 0)) { access_mask &= ~ACE_EXECUTE; } /* * Strip write_acl and write_owner from permissions * when inheriting an ACE */ if (aclinherit == ZFS_ACL_RESTRICTED && type == ALLOW) { access_mask &= ~RESTRICTED_CLEAR; } ace_size = aclp->z_ops.ace_size(pacep); aclnode = zfs_acl_node_alloc(ace_size); list_insert_tail(&aclp->z_acl, aclnode); acep = aclnode->z_acldata; zfs_set_ace(aclp, acep, access_mask, type, who, iflags|ACE_INHERITED_ACE); /* * Copy special opaque data if any */ if ((data1sz = paclp->z_ops.ace_data(pacep, &data1)) != 0) { VERIFY((data2sz = aclp->z_ops.ace_data(acep, &data2)) == data1sz); bcopy(data1, data2, data2sz); } aclp->z_acl_count++; aclnode->z_ace_count++; aclp->z_acl_bytes += aclnode->z_size; newflags = aclp->z_ops.ace_flags_get(acep); /* * If ACE is not to be inherited further, or if the vnode is * not a directory, remove all inheritance flags */ if (!isdir || (iflags & ACE_NO_PROPAGATE_INHERIT_ACE)) { newflags &= ~ALL_INHERIT; aclp->z_ops.ace_flags_set(acep, newflags|ACE_INHERITED_ACE); continue; } /* * This directory has an inheritable ACE */ aclp->z_hints |= ZFS_INHERIT_ACE; /* * If only FILE_INHERIT is set then turn on * inherit_only */ if ((iflags & (ACE_FILE_INHERIT_ACE | ACE_DIRECTORY_INHERIT_ACE)) == ACE_FILE_INHERIT_ACE) { newflags |= ACE_INHERIT_ONLY_ACE; aclp->z_ops.ace_flags_set(acep, newflags|ACE_INHERITED_ACE); } else { newflags &= ~ACE_INHERIT_ONLY_ACE; aclp->z_ops.ace_flags_set(acep, newflags|ACE_INHERITED_ACE); } } return (aclp); } /* * Create file system object initial permissions * including inheritable ACEs. * Also, create FUIDs for owner and group. */ int zfs_acl_ids_create(znode_t *dzp, int flag, vattr_t *vap, cred_t *cr, vsecattr_t *vsecp, zfs_acl_ids_t *acl_ids) { int error; zfsvfs_t *zfsvfs = dzp->z_zfsvfs; zfs_acl_t *paclp; gid_t gid; boolean_t trim = B_FALSE; boolean_t inherited = B_FALSE; + ASSERT_VOP_ELOCKED(ZTOV(dzp), __func__); bzero(acl_ids, sizeof (zfs_acl_ids_t)); acl_ids->z_mode = MAKEIMODE(vap->va_type, vap->va_mode); if (vsecp) if ((error = zfs_vsec_2_aclp(zfsvfs, vap->va_type, vsecp, cr, &acl_ids->z_fuidp, &acl_ids->z_aclp)) != 0) return (error); /* * Determine uid and gid. */ if ((flag & IS_ROOT_NODE) || zfsvfs->z_replay || ((flag & IS_XATTR) && (vap->va_type == VDIR))) { acl_ids->z_fuid = zfs_fuid_create(zfsvfs, (uint64_t)vap->va_uid, cr, ZFS_OWNER, &acl_ids->z_fuidp); acl_ids->z_fgid = zfs_fuid_create(zfsvfs, (uint64_t)vap->va_gid, cr, ZFS_GROUP, &acl_ids->z_fuidp); gid = vap->va_gid; } else { acl_ids->z_fuid = zfs_fuid_create_cred(zfsvfs, ZFS_OWNER, cr, &acl_ids->z_fuidp); acl_ids->z_fgid = 0; if (vap->va_mask & AT_GID) { acl_ids->z_fgid = zfs_fuid_create(zfsvfs, (uint64_t)vap->va_gid, cr, ZFS_GROUP, &acl_ids->z_fuidp); gid = vap->va_gid; if (acl_ids->z_fgid != dzp->z_gid && !groupmember(vap->va_gid, cr) && secpolicy_vnode_create_gid(cr) != 0) acl_ids->z_fgid = 0; } if (acl_ids->z_fgid == 0) { if (dzp->z_mode & S_ISGID) { char *domain; uint32_t rid; acl_ids->z_fgid = dzp->z_gid; gid = zfs_fuid_map_id(zfsvfs, acl_ids->z_fgid, cr, ZFS_GROUP); if (zfsvfs->z_use_fuids && IS_EPHEMERAL(acl_ids->z_fgid)) { domain = zfs_fuid_idx_domain( &zfsvfs->z_fuid_idx, FUID_INDEX(acl_ids->z_fgid)); rid = FUID_RID(acl_ids->z_fgid); zfs_fuid_node_add(&acl_ids->z_fuidp, domain, rid, FUID_INDEX(acl_ids->z_fgid), acl_ids->z_fgid, ZFS_GROUP); } } else { acl_ids->z_fgid = zfs_fuid_create_cred(zfsvfs, ZFS_GROUP, cr, &acl_ids->z_fuidp); #ifdef __FreeBSD_kernel__ gid = acl_ids->z_fgid = dzp->z_gid; #else gid = crgetgid(cr); #endif } } } /* * If we're creating a directory, and the parent directory has the * set-GID bit set, set in on the new directory. * Otherwise, if the user is neither privileged nor a member of the * file's new group, clear the file's set-GID bit. */ if (!(flag & IS_ROOT_NODE) && (dzp->z_mode & S_ISGID) && (vap->va_type == VDIR)) { acl_ids->z_mode |= S_ISGID; } else { if ((acl_ids->z_mode & S_ISGID) && secpolicy_vnode_setids_setgids(ZTOV(dzp), cr, gid) != 0) acl_ids->z_mode &= ~S_ISGID; } if (acl_ids->z_aclp == NULL) { mutex_enter(&dzp->z_acl_lock); - mutex_enter(&dzp->z_lock); if (!(flag & IS_ROOT_NODE) && (dzp->z_pflags & ZFS_INHERIT_ACE) && !(dzp->z_pflags & ZFS_XATTR)) { - VERIFY(0 == zfs_acl_node_read(dzp, B_TRUE, - &paclp, B_FALSE)); + VERIFY(0 == zfs_acl_node_read(dzp, &paclp, B_FALSE)); acl_ids->z_aclp = zfs_acl_inherit(zfsvfs, vap->va_type, paclp, acl_ids->z_mode); inherited = B_TRUE; } else { acl_ids->z_aclp = zfs_acl_alloc(zfs_acl_version_zp(dzp)); acl_ids->z_aclp->z_hints |= ZFS_ACL_TRIVIAL; } - mutex_exit(&dzp->z_lock); mutex_exit(&dzp->z_acl_lock); if (vap->va_type == VDIR) acl_ids->z_aclp->z_hints |= ZFS_ACL_AUTO_INHERIT; if (zfsvfs->z_acl_mode == ZFS_ACL_GROUPMASK && zfsvfs->z_acl_inherit != ZFS_ACL_PASSTHROUGH && zfsvfs->z_acl_inherit != ZFS_ACL_PASSTHROUGH_X) trim = B_TRUE; zfs_acl_chmod(vap->va_type, acl_ids->z_mode, B_FALSE, trim, acl_ids->z_aclp); } if (inherited || vsecp) { acl_ids->z_mode = zfs_mode_compute(acl_ids->z_mode, acl_ids->z_aclp, &acl_ids->z_aclp->z_hints, acl_ids->z_fuid, acl_ids->z_fgid); if (ace_trivial_common(acl_ids->z_aclp, 0, zfs_ace_walk) == 0) acl_ids->z_aclp->z_hints |= ZFS_ACL_TRIVIAL; } return (0); } /* * Free ACL and fuid_infop, but not the acl_ids structure */ void zfs_acl_ids_free(zfs_acl_ids_t *acl_ids) { if (acl_ids->z_aclp) zfs_acl_free(acl_ids->z_aclp); if (acl_ids->z_fuidp) zfs_fuid_info_free(acl_ids->z_fuidp); acl_ids->z_aclp = NULL; acl_ids->z_fuidp = NULL; } boolean_t zfs_acl_ids_overquota(zfsvfs_t *zfsvfs, zfs_acl_ids_t *acl_ids) { return (zfs_fuid_overquota(zfsvfs, B_FALSE, acl_ids->z_fuid) || zfs_fuid_overquota(zfsvfs, B_TRUE, acl_ids->z_fgid)); } /* * Retrieve a file's ACL */ int zfs_getacl(znode_t *zp, vsecattr_t *vsecp, boolean_t skipaclchk, cred_t *cr) { zfs_acl_t *aclp; ulong_t mask; int error; int count = 0; int largeace = 0; mask = vsecp->vsa_mask & (VSA_ACE | VSA_ACECNT | VSA_ACE_ACLFLAGS | VSA_ACE_ALLTYPES); if (mask == 0) return (SET_ERROR(ENOSYS)); if (error = zfs_zaccess(zp, ACE_READ_ACL, 0, skipaclchk, cr)) return (error); mutex_enter(&zp->z_acl_lock); - error = zfs_acl_node_read(zp, B_FALSE, &aclp, B_FALSE); + ASSERT_VOP_LOCKED(ZTOV(zp), __func__); + error = zfs_acl_node_read(zp, &aclp, B_FALSE); if (error != 0) { mutex_exit(&zp->z_acl_lock); return (error); } /* * Scan ACL to determine number of ACEs */ if ((zp->z_pflags & ZFS_ACL_OBJ_ACE) && !(mask & VSA_ACE_ALLTYPES)) { void *zacep = NULL; uint64_t who; uint32_t access_mask; uint16_t type, iflags; while (zacep = zfs_acl_next_ace(aclp, zacep, &who, &access_mask, &iflags, &type)) { switch (type) { case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE: case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE: case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE: case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE: largeace++; continue; default: count++; } } vsecp->vsa_aclcnt = count; } else count = (int)aclp->z_acl_count; if (mask & VSA_ACECNT) { vsecp->vsa_aclcnt = count; } if (mask & VSA_ACE) { size_t aclsz; aclsz = count * sizeof (ace_t) + sizeof (ace_object_t) * largeace; vsecp->vsa_aclentp = kmem_alloc(aclsz, KM_SLEEP); vsecp->vsa_aclentsz = aclsz; if (aclp->z_version == ZFS_ACL_VERSION_FUID) zfs_copy_fuid_2_ace(zp->z_zfsvfs, aclp, cr, vsecp->vsa_aclentp, !(mask & VSA_ACE_ALLTYPES)); else { zfs_acl_node_t *aclnode; void *start = vsecp->vsa_aclentp; for (aclnode = list_head(&aclp->z_acl); aclnode; aclnode = list_next(&aclp->z_acl, aclnode)) { bcopy(aclnode->z_acldata, start, aclnode->z_size); start = (caddr_t)start + aclnode->z_size; } ASSERT((caddr_t)start - (caddr_t)vsecp->vsa_aclentp == aclp->z_acl_bytes); } } if (mask & VSA_ACE_ACLFLAGS) { vsecp->vsa_aclflags = 0; if (zp->z_pflags & ZFS_ACL_DEFAULTED) vsecp->vsa_aclflags |= ACL_DEFAULTED; if (zp->z_pflags & ZFS_ACL_PROTECTED) vsecp->vsa_aclflags |= ACL_PROTECTED; if (zp->z_pflags & ZFS_ACL_AUTO_INHERIT) vsecp->vsa_aclflags |= ACL_AUTO_INHERIT; } mutex_exit(&zp->z_acl_lock); return (0); } int zfs_vsec_2_aclp(zfsvfs_t *zfsvfs, vtype_t obj_type, vsecattr_t *vsecp, cred_t *cr, zfs_fuid_info_t **fuidp, zfs_acl_t **zaclp) { zfs_acl_t *aclp; zfs_acl_node_t *aclnode; int aclcnt = vsecp->vsa_aclcnt; int error; if (vsecp->vsa_aclcnt > MAX_ACL_ENTRIES || vsecp->vsa_aclcnt <= 0) return (SET_ERROR(EINVAL)); aclp = zfs_acl_alloc(zfs_acl_version(zfsvfs->z_version)); aclp->z_hints = 0; aclnode = zfs_acl_node_alloc(aclcnt * sizeof (zfs_object_ace_t)); if (aclp->z_version == ZFS_ACL_VERSION_INITIAL) { if ((error = zfs_copy_ace_2_oldace(obj_type, aclp, (ace_t *)vsecp->vsa_aclentp, aclnode->z_acldata, aclcnt, &aclnode->z_size)) != 0) { zfs_acl_free(aclp); zfs_acl_node_free(aclnode); return (error); } } else { if ((error = zfs_copy_ace_2_fuid(zfsvfs, obj_type, aclp, vsecp->vsa_aclentp, aclnode->z_acldata, aclcnt, &aclnode->z_size, fuidp, cr)) != 0) { zfs_acl_free(aclp); zfs_acl_node_free(aclnode); return (error); } } aclp->z_acl_bytes = aclnode->z_size; aclnode->z_ace_count = aclcnt; aclp->z_acl_count = aclcnt; list_insert_head(&aclp->z_acl, aclnode); /* * If flags are being set then add them to z_hints */ if (vsecp->vsa_mask & VSA_ACE_ACLFLAGS) { if (vsecp->vsa_aclflags & ACL_PROTECTED) aclp->z_hints |= ZFS_ACL_PROTECTED; if (vsecp->vsa_aclflags & ACL_DEFAULTED) aclp->z_hints |= ZFS_ACL_DEFAULTED; if (vsecp->vsa_aclflags & ACL_AUTO_INHERIT) aclp->z_hints |= ZFS_ACL_AUTO_INHERIT; } *zaclp = aclp; return (0); } /* * Set a file's ACL */ int zfs_setacl(znode_t *zp, vsecattr_t *vsecp, boolean_t skipaclchk, cred_t *cr) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; zilog_t *zilog = zfsvfs->z_log; ulong_t mask = vsecp->vsa_mask & (VSA_ACE | VSA_ACECNT); dmu_tx_t *tx; int error; zfs_acl_t *aclp; zfs_fuid_info_t *fuidp = NULL; boolean_t fuid_dirtied; uint64_t acl_obj; + ASSERT_VOP_ELOCKED(ZTOV(zp), __func__); if (mask == 0) return (SET_ERROR(ENOSYS)); if (zp->z_pflags & ZFS_IMMUTABLE) return (SET_ERROR(EPERM)); if (error = zfs_zaccess(zp, ACE_WRITE_ACL, 0, skipaclchk, cr)) return (error); error = zfs_vsec_2_aclp(zfsvfs, ZTOV(zp)->v_type, vsecp, cr, &fuidp, &aclp); if (error) return (error); /* * If ACL wide flags aren't being set then preserve any * existing flags. */ if (!(vsecp->vsa_mask & VSA_ACE_ACLFLAGS)) { aclp->z_hints |= (zp->z_pflags & V4_ACL_WIDE_FLAGS); } top: mutex_enter(&zp->z_acl_lock); - mutex_enter(&zp->z_lock); tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE); fuid_dirtied = zfsvfs->z_fuid_dirty; if (fuid_dirtied) zfs_fuid_txhold(zfsvfs, tx); /* * If old version and ACL won't fit in bonus and we aren't * upgrading then take out necessary DMU holds */ if ((acl_obj = zfs_external_acl(zp)) != 0) { if (zfsvfs->z_version >= ZPL_VERSION_FUID && zfs_znode_acl_version(zp) <= ZFS_ACL_VERSION_INITIAL) { dmu_tx_hold_free(tx, acl_obj, 0, DMU_OBJECT_END); dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, aclp->z_acl_bytes); } else { dmu_tx_hold_write(tx, acl_obj, 0, aclp->z_acl_bytes); } } else if (!zp->z_is_sa && aclp->z_acl_bytes > ZFS_ACE_SPACE) { dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, aclp->z_acl_bytes); } zfs_sa_upgrade_txholds(tx, zp); error = dmu_tx_assign(tx, TXG_NOWAIT); if (error) { - mutex_exit(&zp->z_lock); mutex_exit(&zp->z_acl_lock); if (error == ERESTART) { dmu_tx_wait(tx); dmu_tx_abort(tx); goto top; } dmu_tx_abort(tx); zfs_acl_free(aclp); return (error); } error = zfs_aclset_common(zp, aclp, cr, tx); ASSERT(error == 0); ASSERT(zp->z_acl_cached == NULL); zp->z_acl_cached = aclp; if (fuid_dirtied) zfs_fuid_sync(zfsvfs, tx); zfs_log_acl(zilog, tx, zp, vsecp, fuidp); if (fuidp) zfs_fuid_info_free(fuidp); dmu_tx_commit(tx); - mutex_exit(&zp->z_lock); mutex_exit(&zp->z_acl_lock); return (error); } /* * Check accesses of interest (AoI) against attributes of the dataset * such as read-only. Returns zero if no AoI conflict with dataset * attributes, otherwise an appropriate errno is returned. */ static int zfs_zaccess_dataset_check(znode_t *zp, uint32_t v4_mode) { if ((v4_mode & WRITE_MASK) && (zp->z_zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) && (!IS_DEVVP(ZTOV(zp)) || (IS_DEVVP(ZTOV(zp)) && (v4_mode & WRITE_MASK_ATTRS)))) { return (SET_ERROR(EROFS)); } /* * Only check for READONLY on non-directories. */ if ((v4_mode & WRITE_MASK_DATA) && (((ZTOV(zp)->v_type != VDIR) && (zp->z_pflags & (ZFS_READONLY | ZFS_IMMUTABLE))) || (ZTOV(zp)->v_type == VDIR && (zp->z_pflags & ZFS_IMMUTABLE)))) { return (SET_ERROR(EPERM)); } #ifdef illumos if ((v4_mode & (ACE_DELETE | ACE_DELETE_CHILD)) && (zp->z_pflags & ZFS_NOUNLINK)) { return (SET_ERROR(EPERM)); } #else /* * In FreeBSD we allow to modify directory's content is ZFS_NOUNLINK * (sunlnk) is set. We just don't allow directory removal, which is * handled in zfs_zaccess_delete(). */ if ((v4_mode & ACE_DELETE) && (zp->z_pflags & ZFS_NOUNLINK)) { return (EPERM); } #endif if (((v4_mode & (ACE_READ_DATA|ACE_EXECUTE)) && (zp->z_pflags & ZFS_AV_QUARANTINED))) { return (SET_ERROR(EACCES)); } return (0); } /* * The primary usage of this function is to loop through all of the * ACEs in the znode, determining what accesses of interest (AoI) to * the caller are allowed or denied. The AoI are expressed as bits in * the working_mode parameter. As each ACE is processed, bits covered * by that ACE are removed from the working_mode. This removal * facilitates two things. The first is that when the working mode is * empty (= 0), we know we've looked at all the AoI. The second is * that the ACE interpretation rules don't allow a later ACE to undo * something granted or denied by an earlier ACE. Removing the * discovered access or denial enforces this rule. At the end of * processing the ACEs, all AoI that were found to be denied are * placed into the working_mode, giving the caller a mask of denied * accesses. Returns: * 0 if all AoI granted * EACCESS if the denied mask is non-zero * other error if abnormal failure (e.g., IO error) * * A secondary usage of the function is to determine if any of the * AoI are granted. If an ACE grants any access in * the working_mode, we immediately short circuit out of the function. * This mode is chosen by setting anyaccess to B_TRUE. The * working_mode is not a denied access mask upon exit if the function * is used in this manner. */ static int zfs_zaccess_aces_check(znode_t *zp, uint32_t *working_mode, boolean_t anyaccess, cred_t *cr) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; zfs_acl_t *aclp; int error; uid_t uid = crgetuid(cr); uint64_t who; uint16_t type, iflags; uint16_t entry_type; uint32_t access_mask; uint32_t deny_mask = 0; zfs_ace_hdr_t *acep = NULL; boolean_t checkit; uid_t gowner; uid_t fowner; zfs_fuid_map_ids(zp, cr, &fowner, &gowner); mutex_enter(&zp->z_acl_lock); - error = zfs_acl_node_read(zp, B_FALSE, &aclp, B_FALSE); + ASSERT_VOP_LOCKED(ZTOV(zp), __func__); + error = zfs_acl_node_read(zp, &aclp, B_FALSE); if (error != 0) { mutex_exit(&zp->z_acl_lock); return (error); } ASSERT(zp->z_acl_cached); while (acep = zfs_acl_next_ace(aclp, acep, &who, &access_mask, &iflags, &type)) { uint32_t mask_matched; if (!zfs_acl_valid_ace_type(type, iflags)) continue; if (ZTOV(zp)->v_type == VDIR && (iflags & ACE_INHERIT_ONLY_ACE)) continue; /* Skip ACE if it does not affect any AoI */ mask_matched = (access_mask & *working_mode); if (!mask_matched) continue; entry_type = (iflags & ACE_TYPE_FLAGS); checkit = B_FALSE; switch (entry_type) { case ACE_OWNER: if (uid == fowner) checkit = B_TRUE; break; case OWNING_GROUP: who = gowner; /*FALLTHROUGH*/ case ACE_IDENTIFIER_GROUP: checkit = zfs_groupmember(zfsvfs, who, cr); break; case ACE_EVERYONE: checkit = B_TRUE; break; /* USER Entry */ default: if (entry_type == 0) { uid_t newid; newid = zfs_fuid_map_id(zfsvfs, who, cr, ZFS_ACE_USER); if (newid != IDMAP_WK_CREATOR_OWNER_UID && uid == newid) checkit = B_TRUE; break; } else { mutex_exit(&zp->z_acl_lock); return (SET_ERROR(EIO)); } } if (checkit) { if (type == DENY) { DTRACE_PROBE3(zfs__ace__denies, znode_t *, zp, zfs_ace_hdr_t *, acep, uint32_t, mask_matched); deny_mask |= mask_matched; } else { DTRACE_PROBE3(zfs__ace__allows, znode_t *, zp, zfs_ace_hdr_t *, acep, uint32_t, mask_matched); if (anyaccess) { mutex_exit(&zp->z_acl_lock); return (0); } } *working_mode &= ~mask_matched; } /* Are we done? */ if (*working_mode == 0) break; } mutex_exit(&zp->z_acl_lock); /* Put the found 'denies' back on the working mode */ if (deny_mask) { *working_mode |= deny_mask; return (SET_ERROR(EACCES)); } else if (*working_mode) { return (-1); } return (0); } /* * Return true if any access whatsoever granted, we don't actually * care what access is granted. */ boolean_t zfs_has_access(znode_t *zp, cred_t *cr) { uint32_t have = ACE_ALL_PERMS; if (zfs_zaccess_aces_check(zp, &have, B_TRUE, cr) != 0) { uid_t owner; owner = zfs_fuid_map_id(zp->z_zfsvfs, zp->z_uid, cr, ZFS_OWNER); return (secpolicy_vnode_any_access(cr, ZTOV(zp), owner) == 0); } return (B_TRUE); } static int zfs_zaccess_common(znode_t *zp, uint32_t v4_mode, uint32_t *working_mode, boolean_t *check_privs, boolean_t skipaclchk, cred_t *cr) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; int err; *working_mode = v4_mode; *check_privs = B_TRUE; /* * Short circuit empty requests */ if (v4_mode == 0 || zfsvfs->z_replay) { *working_mode = 0; return (0); } if ((err = zfs_zaccess_dataset_check(zp, v4_mode)) != 0) { *check_privs = B_FALSE; return (err); } /* * The caller requested that the ACL check be skipped. This * would only happen if the caller checked VOP_ACCESS() with a * 32 bit ACE mask and already had the appropriate permissions. */ if (skipaclchk) { *working_mode = 0; return (0); } return (zfs_zaccess_aces_check(zp, working_mode, B_FALSE, cr)); } static int zfs_zaccess_append(znode_t *zp, uint32_t *working_mode, boolean_t *check_privs, cred_t *cr) { if (*working_mode != ACE_WRITE_DATA) return (SET_ERROR(EACCES)); return (zfs_zaccess_common(zp, ACE_APPEND_DATA, working_mode, check_privs, B_FALSE, cr)); } int zfs_fastaccesschk_execute(znode_t *zdp, cred_t *cr) { boolean_t owner = B_FALSE; boolean_t groupmbr = B_FALSE; boolean_t is_attr; uid_t uid = crgetuid(cr); int error; if (zdp->z_pflags & ZFS_AV_QUARANTINED) return (SET_ERROR(EACCES)); is_attr = ((zdp->z_pflags & ZFS_XATTR) && (ZTOV(zdp)->v_type == VDIR)); if (is_attr) goto slow; mutex_enter(&zdp->z_acl_lock); if (zdp->z_pflags & ZFS_NO_EXECS_DENIED) { mutex_exit(&zdp->z_acl_lock); return (0); } if (FUID_INDEX(zdp->z_uid) != 0 || FUID_INDEX(zdp->z_gid) != 0) { mutex_exit(&zdp->z_acl_lock); goto slow; } if (uid == zdp->z_uid) { owner = B_TRUE; if (zdp->z_mode & S_IXUSR) { mutex_exit(&zdp->z_acl_lock); return (0); } else { mutex_exit(&zdp->z_acl_lock); goto slow; } } if (groupmember(zdp->z_gid, cr)) { groupmbr = B_TRUE; if (zdp->z_mode & S_IXGRP) { mutex_exit(&zdp->z_acl_lock); return (0); } else { mutex_exit(&zdp->z_acl_lock); goto slow; } } if (!owner && !groupmbr) { if (zdp->z_mode & S_IXOTH) { mutex_exit(&zdp->z_acl_lock); return (0); } } mutex_exit(&zdp->z_acl_lock); slow: DTRACE_PROBE(zfs__fastpath__execute__access__miss); ZFS_ENTER(zdp->z_zfsvfs); error = zfs_zaccess(zdp, ACE_EXECUTE, 0, B_FALSE, cr); ZFS_EXIT(zdp->z_zfsvfs); return (error); } /* * Determine whether Access should be granted/denied. * * The least priv subsytem is always consulted as a basic privilege * can define any form of access. */ int zfs_zaccess(znode_t *zp, int mode, int flags, boolean_t skipaclchk, cred_t *cr) { uint32_t working_mode; int error; int is_attr; boolean_t check_privs; znode_t *xzp; znode_t *check_zp = zp; mode_t needed_bits; uid_t owner; is_attr = ((zp->z_pflags & ZFS_XATTR) && (ZTOV(zp)->v_type == VDIR)); #ifdef __FreeBSD_kernel__ /* * In FreeBSD, we don't care about permissions of individual ADS. * Note that not checking them is not just an optimization - without * this shortcut, EA operations may bogusly fail with EACCES. */ if (zp->z_pflags & ZFS_XATTR) return (0); #else /* * If attribute then validate against base file */ if (is_attr) { uint64_t parent; if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_PARENT(zp->z_zfsvfs), &parent, sizeof (parent))) != 0) return (error); if ((error = zfs_zget(zp->z_zfsvfs, parent, &xzp)) != 0) { return (error); } check_zp = xzp; /* * fixup mode to map to xattr perms */ if (mode & (ACE_WRITE_DATA|ACE_APPEND_DATA)) { mode &= ~(ACE_WRITE_DATA|ACE_APPEND_DATA); mode |= ACE_WRITE_NAMED_ATTRS; } if (mode & (ACE_READ_DATA|ACE_EXECUTE)) { mode &= ~(ACE_READ_DATA|ACE_EXECUTE); mode |= ACE_READ_NAMED_ATTRS; } } #endif owner = zfs_fuid_map_id(zp->z_zfsvfs, zp->z_uid, cr, ZFS_OWNER); /* * Map the bits required to the standard vnode flags VREAD|VWRITE|VEXEC * in needed_bits. Map the bits mapped by working_mode (currently * missing) in missing_bits. * Call secpolicy_vnode_access2() with (needed_bits & ~checkmode), * needed_bits. */ needed_bits = 0; working_mode = mode; if ((working_mode & (ACE_READ_ACL|ACE_READ_ATTRIBUTES)) && owner == crgetuid(cr)) working_mode &= ~(ACE_READ_ACL|ACE_READ_ATTRIBUTES); if (working_mode & (ACE_READ_DATA|ACE_READ_NAMED_ATTRS| ACE_READ_ACL|ACE_READ_ATTRIBUTES|ACE_SYNCHRONIZE)) needed_bits |= VREAD; if (working_mode & (ACE_WRITE_DATA|ACE_WRITE_NAMED_ATTRS| ACE_APPEND_DATA|ACE_WRITE_ATTRIBUTES|ACE_SYNCHRONIZE)) needed_bits |= VWRITE; if (working_mode & ACE_EXECUTE) needed_bits |= VEXEC; if ((error = zfs_zaccess_common(check_zp, mode, &working_mode, &check_privs, skipaclchk, cr)) == 0) { if (is_attr) VN_RELE(ZTOV(xzp)); return (secpolicy_vnode_access2(cr, ZTOV(zp), owner, needed_bits, needed_bits)); } if (error && !check_privs) { if (is_attr) VN_RELE(ZTOV(xzp)); return (error); } if (error && (flags & V_APPEND)) { error = zfs_zaccess_append(zp, &working_mode, &check_privs, cr); } if (error && check_privs) { mode_t checkmode = 0; /* * First check for implicit owner permission on * read_acl/read_attributes */ error = 0; ASSERT(working_mode != 0); if ((working_mode & (ACE_READ_ACL|ACE_READ_ATTRIBUTES) && owner == crgetuid(cr))) working_mode &= ~(ACE_READ_ACL|ACE_READ_ATTRIBUTES); if (working_mode & (ACE_READ_DATA|ACE_READ_NAMED_ATTRS| ACE_READ_ACL|ACE_READ_ATTRIBUTES|ACE_SYNCHRONIZE)) checkmode |= VREAD; if (working_mode & (ACE_WRITE_DATA|ACE_WRITE_NAMED_ATTRS| ACE_APPEND_DATA|ACE_WRITE_ATTRIBUTES|ACE_SYNCHRONIZE)) checkmode |= VWRITE; if (working_mode & ACE_EXECUTE) checkmode |= VEXEC; error = secpolicy_vnode_access2(cr, ZTOV(check_zp), owner, needed_bits & ~checkmode, needed_bits); if (error == 0 && (working_mode & ACE_WRITE_OWNER)) error = secpolicy_vnode_chown(ZTOV(check_zp), cr, owner); if (error == 0 && (working_mode & ACE_WRITE_ACL)) error = secpolicy_vnode_setdac(ZTOV(check_zp), cr, owner); if (error == 0 && (working_mode & (ACE_DELETE|ACE_DELETE_CHILD))) error = secpolicy_vnode_remove(ZTOV(check_zp), cr); if (error == 0 && (working_mode & ACE_SYNCHRONIZE)) { error = secpolicy_vnode_chown(ZTOV(check_zp), cr, owner); } if (error == 0) { /* * See if any bits other than those already checked * for are still present. If so then return EACCES */ if (working_mode & ~(ZFS_CHECKED_MASKS)) { error = SET_ERROR(EACCES); } } } else if (error == 0) { error = secpolicy_vnode_access2(cr, ZTOV(zp), owner, needed_bits, needed_bits); } if (is_attr) VN_RELE(ZTOV(xzp)); return (error); } /* * Translate traditional unix VREAD/VWRITE/VEXEC mode into * native ACL format and call zfs_zaccess() */ int zfs_zaccess_rwx(znode_t *zp, mode_t mode, int flags, cred_t *cr) { return (zfs_zaccess(zp, zfs_unix_to_v4(mode >> 6), flags, B_FALSE, cr)); } /* * Access function for secpolicy_vnode_setattr */ int zfs_zaccess_unix(znode_t *zp, mode_t mode, cred_t *cr) { int v4_mode = zfs_unix_to_v4(mode >> 6); return (zfs_zaccess(zp, v4_mode, 0, B_FALSE, cr)); } static int zfs_delete_final_check(znode_t *zp, znode_t *dzp, mode_t available_perms, cred_t *cr) { int error; uid_t downer; downer = zfs_fuid_map_id(dzp->z_zfsvfs, dzp->z_uid, cr, ZFS_OWNER); error = secpolicy_vnode_access2(cr, ZTOV(dzp), downer, available_perms, VWRITE|VEXEC); if (error == 0) error = zfs_sticky_remove_access(dzp, zp, cr); return (error); } /* * Determine whether Access should be granted/deny, without * consulting least priv subsystem. * * The following chart is the recommended NFSv4 enforcement for * ability to delete an object. * * ------------------------------------------------------- * | Parent Dir | Target Object Permissions | * | permissions | | * ------------------------------------------------------- * | | ACL Allows | ACL Denies| Delete | * | | Delete | Delete | unspecified| * ------------------------------------------------------- * | ACL Allows | Permit | Permit | Permit | * | DELETE_CHILD | | * ------------------------------------------------------- * | ACL Denies | Permit | Deny | Deny | * | DELETE_CHILD | | | | * ------------------------------------------------------- * | ACL specifies | | | | * | only allow | Permit | Permit | Permit | * | write and | | | | * | execute | | | | * ------------------------------------------------------- * | ACL denies | | | | * | write and | Permit | Deny | Deny | * | execute | | | | * ------------------------------------------------------- * ^ * | * No search privilege, can't even look up file? * */ int zfs_zaccess_delete(znode_t *dzp, znode_t *zp, cred_t *cr) { uint32_t dzp_working_mode = 0; uint32_t zp_working_mode = 0; int dzp_error, zp_error; mode_t available_perms; boolean_t dzpcheck_privs = B_TRUE; boolean_t zpcheck_privs = B_TRUE; /* * We want specific DELETE permissions to * take precedence over WRITE/EXECUTE. We don't * want an ACL such as this to mess us up. * user:joe:write_data:deny,user:joe:delete:allow * * However, deny permissions may ultimately be overridden * by secpolicy_vnode_access(). * * We will ask for all of the necessary permissions and then * look at the working modes from the directory and target object * to determine what was found. */ if (zp->z_pflags & (ZFS_IMMUTABLE | ZFS_NOUNLINK)) return (SET_ERROR(EPERM)); /* * First row * If the directory permissions allow the delete, we are done. */ if ((dzp_error = zfs_zaccess_common(dzp, ACE_DELETE_CHILD, &dzp_working_mode, &dzpcheck_privs, B_FALSE, cr)) == 0) return (0); /* * If target object has delete permission then we are done */ if ((zp_error = zfs_zaccess_common(zp, ACE_DELETE, &zp_working_mode, &zpcheck_privs, B_FALSE, cr)) == 0) return (0); ASSERT(dzp_error && zp_error); if (!dzpcheck_privs) return (dzp_error); if (!zpcheck_privs) return (zp_error); /* * Second row * * If directory returns EACCES then delete_child was denied * due to deny delete_child. In this case send the request through * secpolicy_vnode_remove(). We don't use zfs_delete_final_check() * since that *could* allow the delete based on write/execute permission * and we want delete permissions to override write/execute. */ if (dzp_error == EACCES) return (secpolicy_vnode_remove(ZTOV(dzp), cr)); /* XXXPJD: s/dzp/zp/ ? */ /* * Third Row * only need to see if we have write/execute on directory. */ dzp_error = zfs_zaccess_common(dzp, ACE_EXECUTE|ACE_WRITE_DATA, &dzp_working_mode, &dzpcheck_privs, B_FALSE, cr); if (dzp_error != 0 && !dzpcheck_privs) return (dzp_error); /* * Fourth row */ available_perms = (dzp_working_mode & ACE_WRITE_DATA) ? 0 : VWRITE; available_perms |= (dzp_working_mode & ACE_EXECUTE) ? 0 : VEXEC; return (zfs_delete_final_check(zp, dzp, available_perms, cr)); } int zfs_zaccess_rename(znode_t *sdzp, znode_t *szp, znode_t *tdzp, znode_t *tzp, cred_t *cr) { int add_perm; int error; if (szp->z_pflags & ZFS_AV_QUARANTINED) return (SET_ERROR(EACCES)); add_perm = (ZTOV(szp)->v_type == VDIR) ? ACE_ADD_SUBDIRECTORY : ACE_ADD_FILE; /* * Rename permissions are combination of delete permission + * add file/subdir permission. * * BSD operating systems also require write permission * on the directory being moved from one parent directory * to another. */ if (ZTOV(szp)->v_type == VDIR && ZTOV(sdzp) != ZTOV(tdzp)) { if (error = zfs_zaccess(szp, ACE_WRITE_DATA, 0, B_FALSE, cr)) return (error); } /* * first make sure we do the delete portion. * * If that succeeds then check for add_file/add_subdir permissions */ if (error = zfs_zaccess_delete(sdzp, szp, cr)) return (error); /* * If we have a tzp, see if we can delete it? */ if (tzp) { if (error = zfs_zaccess_delete(tdzp, tzp, cr)) return (error); } /* * Now check for add permissions */ error = zfs_zaccess(tdzp, add_perm, 0, B_FALSE, cr); return (error); } Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c (revision 303775) @@ -1,1115 +1,891 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2013, 2015 by Delphix. All rights reserved. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* - * zfs_match_find() is used by zfs_dirent_lock() to peform zap lookups + * zfs_match_find() is used by zfs_dirent_lookup() to peform zap lookups * of names after deciding which is the appropriate lookup interface. */ static int -zfs_match_find(zfsvfs_t *zfsvfs, znode_t *dzp, char *name, boolean_t exact, - boolean_t update, int *deflags, pathname_t *rpnp, uint64_t *zoid) +zfs_match_find(zfsvfs_t *zfsvfs, znode_t *dzp, const char *name, + boolean_t exact, uint64_t *zoid) { int error; if (zfsvfs->z_norm) { - matchtype_t mt = MT_FIRST; - boolean_t conflict = B_FALSE; - size_t bufsz = 0; - char *buf = NULL; + matchtype_t mt = exact? MT_EXACT : MT_FIRST; - if (rpnp) { - buf = rpnp->pn_buf; - bufsz = rpnp->pn_bufsize; - } - if (exact) - mt = MT_EXACT; /* * In the non-mixed case we only expect there would ever * be one match, but we need to use the normalizing lookup. */ error = zap_lookup_norm(zfsvfs->z_os, dzp->z_id, name, 8, 1, - zoid, mt, buf, bufsz, &conflict); - if (!error && deflags) - *deflags = conflict ? ED_CASE_CONFLICT : 0; + zoid, mt, NULL, 0, NULL); } else { error = zap_lookup(zfsvfs->z_os, dzp->z_id, name, 8, 1, zoid); } *zoid = ZFS_DIRENT_OBJ(*zoid); - if (error == ENOENT && update) - dnlc_update(ZTOV(dzp), name, DNLC_NO_VNODE); - return (error); } /* - * Lock a directory entry. A dirlock on protects that name - * in dzp's directory zap object. As long as you hold a dirlock, you can - * assume two things: (1) dzp cannot be reaped, and (2) no other thread - * can change the zap entry for (i.e. link or unlink) this name. + * Look up a directory entry under a locked vnode. + * dvp being locked gives us a guarantee that there are no concurrent + * modification of the directory and, thus, if a node can be found in + * the directory, then it must not be unlinked. * * Input arguments: * dzp - znode for directory * name - name of entry to lock * flag - ZNEW: if the entry already exists, fail with EEXIST. * ZEXISTS: if the entry does not exist, fail with ENOENT. - * ZSHARED: allow concurrent access with other ZSHARED callers. * ZXATTR: we want dzp's xattr directory - * ZCILOOK: On a mixed sensitivity file system, - * this lookup should be case-insensitive. - * ZCIEXACT: On a purely case-insensitive file system, - * this lookup should be case-sensitive. - * ZRENAMING: we are locking for renaming, force narrow locks - * ZHAVELOCK: Don't grab the z_name_lock for this call. The - * current thread already holds it. * * Output arguments: * zpp - pointer to the znode for the entry (NULL if there isn't one) - * dlpp - pointer to the dirlock for this entry (NULL on error) - * direntflags - (case-insensitive lookup only) - * flags if multiple case-sensitive matches exist in directory - * realpnp - (case-insensitive lookup only) - * actual name matched within the directory * * Return value: 0 on success or errno on failure. * * NOTE: Always checks for, and rejects, '.' and '..'. - * NOTE: For case-insensitive file systems we take wide locks (see below), - * but return znode pointers to a single match. */ int -zfs_dirent_lock(zfs_dirlock_t **dlpp, znode_t *dzp, char *name, znode_t **zpp, - int flag, int *direntflags, pathname_t *realpnp) +zfs_dirent_lookup(znode_t *dzp, const char *name, znode_t **zpp, int flag) { zfsvfs_t *zfsvfs = dzp->z_zfsvfs; - zfs_dirlock_t *dl; - boolean_t update; boolean_t exact; uint64_t zoid; vnode_t *vp = NULL; int error = 0; - int cmpflags; + ASSERT_VOP_LOCKED(ZTOV(dzp), __func__); + *zpp = NULL; - *dlpp = NULL; /* * Verify that we are not trying to lock '.', '..', or '.zfs' */ if (name[0] == '.' && (name[1] == '\0' || (name[1] == '.' && name[2] == '\0')) || zfs_has_ctldir(dzp) && strcmp(name, ZFS_CTLDIR_NAME) == 0) return (SET_ERROR(EEXIST)); /* * Case sensitivity and normalization preferences are set when * the file system is created. These are stored in the * zfsvfs->z_case and zfsvfs->z_norm fields. These choices - * affect what vnodes can be cached in the DNLC, how we - * perform zap lookups, and the "width" of our dirlocks. + * affect how we perform zap lookups. * - * A normal dirlock locks a single name. Note that with - * normalization a name can be composed multiple ways, but - * when normalized, these names all compare equal. A wide - * dirlock locks multiple names. We need these when the file - * system is supporting mixed-mode access. It is sometimes - * necessary to lock all case permutations of file name at - * once so that simultaneous case-insensitive/case-sensitive - * behaves as rationally as possible. - */ - - /* * Decide if exact matches should be requested when performing * a zap lookup on file systems supporting case-insensitive * access. - */ - exact = - ((zfsvfs->z_case == ZFS_CASE_INSENSITIVE) && (flag & ZCIEXACT)) || - ((zfsvfs->z_case == ZFS_CASE_MIXED) && !(flag & ZCILOOK)); - - /* - * Only look in or update the DNLC if we are looking for the - * name on a file system that does not require normalization - * or case folding. We can also look there if we happen to be - * on a non-normalizing, mixed sensitivity file system IF we - * are looking for the exact name. * - * Maybe can add TO-UPPERed version of name to dnlc in ci-only - * case for performance improvement? + * NB: we do not need to worry about this flag for ZFS_CASE_SENSITIVE + * because in that case MT_EXACT and MT_FIRST should produce exactly + * the same result. */ - update = !zfsvfs->z_norm || - ((zfsvfs->z_case == ZFS_CASE_MIXED) && - !(zfsvfs->z_norm & ~U8_TEXTPREP_TOUPPER) && !(flag & ZCILOOK)); + exact = zfsvfs->z_case == ZFS_CASE_MIXED; - /* - * ZRENAMING indicates we are in a situation where we should - * take narrow locks regardless of the file system's - * preferences for normalizing and case folding. This will - * prevent us deadlocking trying to grab the same wide lock - * twice if the two names happen to be case-insensitive - * matches. - */ - if (flag & ZRENAMING) - cmpflags = 0; - else - cmpflags = zfsvfs->z_norm; - - /* - * Wait until there are no locks on this name. - * - * Don't grab the the lock if it is already held. However, cannot - * have both ZSHARED and ZHAVELOCK together. - */ - ASSERT(!(flag & ZSHARED) || !(flag & ZHAVELOCK)); - if (!(flag & ZHAVELOCK)) - rw_enter(&dzp->z_name_lock, RW_READER); - - mutex_enter(&dzp->z_lock); - for (;;) { - if (dzp->z_unlinked && !(flag & ZXATTR)) { - mutex_exit(&dzp->z_lock); - if (!(flag & ZHAVELOCK)) - rw_exit(&dzp->z_name_lock); - return (SET_ERROR(ENOENT)); - } - for (dl = dzp->z_dirlocks; dl != NULL; dl = dl->dl_next) { - if ((u8_strcmp(name, dl->dl_name, 0, cmpflags, - U8_UNICODE_LATEST, &error) == 0) || error != 0) - break; - } - if (error != 0) { - mutex_exit(&dzp->z_lock); - if (!(flag & ZHAVELOCK)) - rw_exit(&dzp->z_name_lock); - return (SET_ERROR(ENOENT)); - } - if (dl == NULL) { - size_t namesize; - - /* - * Allocate a new dirlock and add it to the list. - */ - namesize = strlen(name) + 1; - dl = kmem_alloc(sizeof (zfs_dirlock_t) + namesize, - KM_SLEEP); - cv_init(&dl->dl_cv, NULL, CV_DEFAULT, NULL); - dl->dl_name = (char *)(dl + 1); - bcopy(name, dl->dl_name, namesize); - dl->dl_sharecnt = 0; - dl->dl_namelock = 0; - dl->dl_namesize = namesize; - dl->dl_dzp = dzp; - dl->dl_next = dzp->z_dirlocks; - dzp->z_dirlocks = dl; - break; - } - if ((flag & ZSHARED) && dl->dl_sharecnt != 0) - break; - cv_wait(&dl->dl_cv, &dzp->z_lock); - } - - /* - * If the z_name_lock was NOT held for this dirlock record it. - */ - if (flag & ZHAVELOCK) - dl->dl_namelock = 1; - - if (flag & ZSHARED) - dl->dl_sharecnt++; - - mutex_exit(&dzp->z_lock); - - /* - * We have a dirlock on the name. (Note that it is the dirlock, - * not the dzp's z_lock, that protects the name in the zap object.) - * See if there's an object by this name; if so, put a hold on it. - */ + if (dzp->z_unlinked && !(flag & ZXATTR)) + return (ENOENT); if (flag & ZXATTR) { error = sa_lookup(dzp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), &zoid, sizeof (zoid)); if (error == 0) error = (zoid == 0 ? ENOENT : 0); } else { - if (update) - vp = dnlc_lookup(ZTOV(dzp), name); - if (vp == DNLC_NO_VNODE) { - VN_RELE(vp); - error = SET_ERROR(ENOENT); - } else if (vp) { - if (flag & ZNEW) { - zfs_dirent_unlock(dl); - VN_RELE(vp); - return (SET_ERROR(EEXIST)); - } - *dlpp = dl; - *zpp = VTOZ(vp); - return (0); - } else { - error = zfs_match_find(zfsvfs, dzp, name, exact, - update, direntflags, realpnp, &zoid); - } + error = zfs_match_find(zfsvfs, dzp, name, exact, &zoid); } if (error) { if (error != ENOENT || (flag & ZEXISTS)) { - zfs_dirent_unlock(dl); return (error); } } else { if (flag & ZNEW) { - zfs_dirent_unlock(dl); return (SET_ERROR(EEXIST)); } error = zfs_zget(zfsvfs, zoid, zpp); - if (error) { - zfs_dirent_unlock(dl); + if (error) return (error); - } - if (!(flag & ZXATTR) && update) - dnlc_update(ZTOV(dzp), name, ZTOV(*zpp)); + ASSERT(!(*zpp)->z_unlinked); } - *dlpp = dl; - return (0); } -/* - * Unlock this directory entry and wake anyone who was waiting for it. - */ -void -zfs_dirent_unlock(zfs_dirlock_t *dl) +static int +zfs_dd_lookup(znode_t *dzp, znode_t **zpp) { - znode_t *dzp = dl->dl_dzp; - zfs_dirlock_t **prev_dl, *cur_dl; + zfsvfs_t *zfsvfs = dzp->z_zfsvfs; + znode_t *zp; + vnode_t *vp; + uint64_t parent; + int error; - mutex_enter(&dzp->z_lock); + ASSERT_VOP_LOCKED(ZTOV(dzp), __func__); + ASSERT(RRM_READ_HELD(&zfsvfs->z_teardown_lock)); - if (!dl->dl_namelock) - rw_exit(&dzp->z_name_lock); + if (dzp->z_unlinked) + return (ENOENT); - if (dl->dl_sharecnt > 1) { - dl->dl_sharecnt--; - mutex_exit(&dzp->z_lock); - return; - } - prev_dl = &dzp->z_dirlocks; - while ((cur_dl = *prev_dl) != dl) - prev_dl = &cur_dl->dl_next; - *prev_dl = dl->dl_next; - cv_broadcast(&dl->dl_cv); - mutex_exit(&dzp->z_lock); + if ((error = sa_lookup(dzp->z_sa_hdl, + SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent))) != 0) + return (error); - cv_destroy(&dl->dl_cv); - kmem_free(dl, sizeof (*dl) + dl->dl_namesize); + /* + * If we are a snapshot mounted under .zfs, return + * the snapshot directory. + */ + if (parent == dzp->z_id && zfsvfs->z_parent != zfsvfs) { + error = zfsctl_root_lookup(zfsvfs->z_parent->z_ctldir, + "snapshot", &vp, NULL, 0, NULL, kcred, + NULL, NULL, NULL); + if (error == 0) + zp = VTOZ(vp); + } else { + error = zfs_zget(zfsvfs, parent, &zp); + } + if (error == 0) + *zpp = zp; + return (error); } -/* - * Look up an entry in a directory. - * - * NOTE: '.' and '..' are handled as special cases because - * no directory entries are actually stored for them. If this is - * the root of a filesystem, then '.zfs' is also treated as a - * special pseudo-directory. - */ int -zfs_dirlook(znode_t *dzp, char *name, vnode_t **vpp, int flags, - int *deflg, pathname_t *rpnp) +zfs_dirlook(znode_t *dzp, const char *name, znode_t **zpp) { - zfs_dirlock_t *dl; + zfsvfs_t *zfsvfs = dzp->z_zfsvfs; znode_t *zp; int error = 0; - uint64_t parent; - int unlinked; - if (name[0] == 0 || (name[0] == '.' && name[1] == 0)) { - mutex_enter(&dzp->z_lock); - unlinked = dzp->z_unlinked; - mutex_exit(&dzp->z_lock); - if (unlinked) - return (ENOENT); + ASSERT_VOP_LOCKED(ZTOV(dzp), __func__); + ASSERT(RRM_READ_HELD(&zfsvfs->z_teardown_lock)); - *vpp = ZTOV(dzp); - VN_HOLD(*vpp); - } else if (name[0] == '.' && name[1] == '.' && name[2] == 0) { - zfsvfs_t *zfsvfs = dzp->z_zfsvfs; + if (dzp->z_unlinked) + return (SET_ERROR(ENOENT)); - /* - * If we are a snapshot mounted under .zfs, return - * the vp for the snapshot directory. - */ - if ((error = sa_lookup(dzp->z_sa_hdl, - SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent))) != 0) - return (error); - if (parent == dzp->z_id && zfsvfs->z_parent != zfsvfs) { - error = zfsctl_root_lookup(zfsvfs->z_parent->z_ctldir, - "snapshot", vpp, NULL, 0, NULL, kcred, - NULL, NULL, NULL); - return (error); - } - - mutex_enter(&dzp->z_lock); - unlinked = dzp->z_unlinked; - mutex_exit(&dzp->z_lock); - if (unlinked) - return (ENOENT); - - rw_enter(&dzp->z_parent_lock, RW_READER); - error = zfs_zget(zfsvfs, parent, &zp); - if (error == 0) - *vpp = ZTOV(zp); - rw_exit(&dzp->z_parent_lock); + if (name[0] == 0 || (name[0] == '.' && name[1] == 0)) { + *zpp = dzp; + } else if (name[0] == '.' && name[1] == '.' && name[2] == 0) { + error = zfs_dd_lookup(dzp, zpp); } else if (zfs_has_ctldir(dzp) && strcmp(name, ZFS_CTLDIR_NAME) == 0) { - *vpp = zfsctl_root(dzp); + *zpp = VTOZ(zfsctl_root(dzp)); } else { - int zf; - - zf = ZEXISTS | ZSHARED; - if (flags & FIGNORECASE) - zf |= ZCILOOK; - - error = zfs_dirent_lock(&dl, dzp, name, &zp, zf, deflg, rpnp); + error = zfs_dirent_lookup(dzp, name, &zp, ZEXISTS); if (error == 0) { - *vpp = ZTOV(zp); - zfs_dirent_unlock(dl); dzp->z_zn_prefetch = B_TRUE; /* enable prefetching */ + *zpp = zp; } - rpnp = NULL; } - - if ((flags & FIGNORECASE) && rpnp && !error) - (void) strlcpy(rpnp->pn_buf, name, rpnp->pn_bufsize); - return (error); } /* * unlinked Set (formerly known as the "delete queue") Error Handling * * When dealing with the unlinked set, we dmu_tx_hold_zap(), but we * don't specify the name of the entry that we will be manipulating. We * also fib and say that we won't be adding any new entries to the * unlinked set, even though we might (this is to lower the minimum file * size that can be deleted in a full filesystem). So on the small * chance that the nlink list is using a fat zap (ie. has more than * 2000 entries), we *may* not pre-read a block that's needed. * Therefore it is remotely possible for some of the assertions * regarding the unlinked set below to fail due to i/o error. On a * nondebug system, this will result in the space being leaked. */ void zfs_unlinked_add(znode_t *zp, dmu_tx_t *tx) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; ASSERT(zp->z_unlinked); ASSERT(zp->z_links == 0); VERIFY3U(0, ==, zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)); } /* * Clean up any znodes that had no links when we either crashed or * (force) umounted the file system. */ void zfs_unlinked_drain(zfsvfs_t *zfsvfs) { zap_cursor_t zc; zap_attribute_t zap; dmu_object_info_t doi; znode_t *zp; int error; /* * Interate over the contents of the unlinked set. */ for (zap_cursor_init(&zc, zfsvfs->z_os, zfsvfs->z_unlinkedobj); zap_cursor_retrieve(&zc, &zap) == 0; zap_cursor_advance(&zc)) { /* * See what kind of object we have in list */ error = dmu_object_info(zfsvfs->z_os, zap.za_first_integer, &doi); if (error != 0) continue; ASSERT((doi.doi_type == DMU_OT_PLAIN_FILE_CONTENTS) || (doi.doi_type == DMU_OT_DIRECTORY_CONTENTS)); /* * We need to re-mark these list entries for deletion, * so we pull them back into core and set zp->z_unlinked. */ error = zfs_zget(zfsvfs, zap.za_first_integer, &zp); /* * We may pick up znodes that are already marked for deletion. * This could happen during the purge of an extended attribute * directory. All we need to do is skip over them, since they * are already in the system marked z_unlinked. */ if (error != 0) continue; + vn_lock(ZTOV(zp), LK_EXCLUSIVE | LK_RETRY); zp->z_unlinked = B_TRUE; - VN_RELE(ZTOV(zp)); + vput(ZTOV(zp)); } zap_cursor_fini(&zc); } /* * Delete the entire contents of a directory. Return a count * of the number of entries that could not be deleted. If we encounter * an error, return a count of at least one so that the directory stays * in the unlinked set. * * NOTE: this function assumes that the directory is inactive, * so there is no need to lock its entries before deletion. * Also, it assumes the directory contents is *only* regular * files. */ static int zfs_purgedir(znode_t *dzp) { zap_cursor_t zc; zap_attribute_t zap; znode_t *xzp; dmu_tx_t *tx; zfsvfs_t *zfsvfs = dzp->z_zfsvfs; - zfs_dirlock_t dl; int skipped = 0; int error; for (zap_cursor_init(&zc, zfsvfs->z_os, dzp->z_id); (error = zap_cursor_retrieve(&zc, &zap)) == 0; zap_cursor_advance(&zc)) { error = zfs_zget(zfsvfs, ZFS_DIRENT_OBJ(zap.za_first_integer), &xzp); if (error) { skipped += 1; continue; } + vn_lock(ZTOV(xzp), LK_EXCLUSIVE | LK_RETRY); ASSERT((ZTOV(xzp)->v_type == VREG) || (ZTOV(xzp)->v_type == VLNK)); tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_sa(tx, dzp->z_sa_hdl, B_FALSE); dmu_tx_hold_zap(tx, dzp->z_id, FALSE, zap.za_name); dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE); dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL); /* Is this really needed ? */ zfs_sa_upgrade_txholds(tx, xzp); dmu_tx_mark_netfree(tx); error = dmu_tx_assign(tx, TXG_WAIT); if (error) { dmu_tx_abort(tx); - VN_RELE(ZTOV(xzp)); + vput(ZTOV(xzp)); skipped += 1; continue; } - bzero(&dl, sizeof (dl)); - dl.dl_dzp = dzp; - dl.dl_name = zap.za_name; - error = zfs_link_destroy(&dl, xzp, tx, 0, NULL); + error = zfs_link_destroy(dzp, zap.za_name, xzp, tx, 0, NULL); if (error) skipped += 1; dmu_tx_commit(tx); - VN_RELE(ZTOV(xzp)); + vput(ZTOV(xzp)); } zap_cursor_fini(&zc); if (error != ENOENT) skipped += 1; return (skipped); } void zfs_rmnode(znode_t *zp) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; objset_t *os = zfsvfs->z_os; znode_t *xzp = NULL; dmu_tx_t *tx; uint64_t acl_obj; uint64_t xattr_obj; int error; ASSERT(zp->z_links == 0); + ASSERT_VOP_ELOCKED(ZTOV(zp), __func__); /* * If this is an attribute directory, purge its contents. */ if (ZTOV(zp) != NULL && ZTOV(zp)->v_type == VDIR && (zp->z_pflags & ZFS_XATTR)) { if (zfs_purgedir(zp) != 0) { /* * Not enough space to delete some xattrs. * Leave it in the unlinked set. */ zfs_znode_dmu_fini(zp); zfs_znode_free(zp); return; } } else { /* * Free up all the data in the file. We don't do this for * XATTR directories because we need truncate and remove to be * in the same tx, like in zfs_znode_delete(). Otherwise, if * we crash here we'll end up with an inconsistent truncated * zap object in the delete queue. Note a truncated file is * harmless since it only contains user data. */ error = dmu_free_long_range(os, zp->z_id, 0, DMU_OBJECT_END); if (error) { /* * Not enough space. Leave the file in the unlinked * set. */ zfs_znode_dmu_fini(zp); zfs_znode_free(zp); return; } } /* * If the file has extended attributes, we're going to unlink * the xattr dir. */ error = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), &xattr_obj, sizeof (xattr_obj)); if (error == 0 && xattr_obj) { error = zfs_zget(zfsvfs, xattr_obj, &xzp); - ASSERT(error == 0); + ASSERT3S(error, ==, 0); + vn_lock(ZTOV(xzp), LK_EXCLUSIVE | LK_RETRY); } acl_obj = zfs_external_acl(zp); /* * Set up the final transaction. */ tx = dmu_tx_create(os); dmu_tx_hold_free(tx, zp->z_id, 0, DMU_OBJECT_END); dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL); if (xzp) { dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, TRUE, NULL); dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE); } if (acl_obj) dmu_tx_hold_free(tx, acl_obj, 0, DMU_OBJECT_END); zfs_sa_upgrade_txholds(tx, zp); error = dmu_tx_assign(tx, TXG_WAIT); if (error) { /* * Not enough space to delete the file. Leave it in the * unlinked set, leaking it until the fs is remounted (at * which point we'll call zfs_unlinked_drain() to process it). */ dmu_tx_abort(tx); zfs_znode_dmu_fini(zp); zfs_znode_free(zp); goto out; } if (xzp) { ASSERT(error == 0); - mutex_enter(&xzp->z_lock); xzp->z_unlinked = B_TRUE; /* mark xzp for deletion */ xzp->z_links = 0; /* no more links to it */ VERIFY(0 == sa_update(xzp->z_sa_hdl, SA_ZPL_LINKS(zfsvfs), &xzp->z_links, sizeof (xzp->z_links), tx)); - mutex_exit(&xzp->z_lock); zfs_unlinked_add(xzp, tx); } /* Remove this znode from the unlinked set */ VERIFY3U(0, ==, zap_remove_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)); zfs_znode_delete(zp, tx); dmu_tx_commit(tx); out: if (xzp) - VN_RELE(ZTOV(xzp)); + vput(ZTOV(xzp)); } static uint64_t zfs_dirent(znode_t *zp, uint64_t mode) { uint64_t de = zp->z_id; if (zp->z_zfsvfs->z_version >= ZPL_VERSION_DIRENT_TYPE) de |= IFTODT(mode) << 60; return (de); } /* - * Link zp into dl. Can only fail if zp has been unlinked. + * Link zp into dzp. Can only fail if zp has been unlinked. */ int -zfs_link_create(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag) +zfs_link_create(znode_t *dzp, const char *name, znode_t *zp, dmu_tx_t *tx, + int flag) { - znode_t *dzp = dl->dl_dzp; zfsvfs_t *zfsvfs = zp->z_zfsvfs; vnode_t *vp = ZTOV(zp); uint64_t value; int zp_is_dir = (vp->v_type == VDIR); sa_bulk_attr_t bulk[5]; uint64_t mtime[2], ctime[2]; int count = 0; int error; - mutex_enter(&zp->z_lock); - + ASSERT_VOP_ELOCKED(ZTOV(dzp), __func__); + ASSERT_VOP_ELOCKED(ZTOV(zp), __func__); +#if 0 + if (zp_is_dir) { + error = 0; + if (dzp->z_links >= LINK_MAX) + error = SET_ERROR(EMLINK); + return (error); + } +#endif if (!(flag & ZRENAMING)) { if (zp->z_unlinked) { /* no new links to unlinked zp */ ASSERT(!(flag & (ZNEW | ZEXISTS))); - mutex_exit(&zp->z_lock); return (SET_ERROR(ENOENT)); } +#if 0 + if (zp->z_links >= LINK_MAX) { + return (SET_ERROR(EMLINK)); + } +#endif zp->z_links++; SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs), NULL, &zp->z_links, sizeof (zp->z_links)); + } else { + ASSERT(zp->z_unlinked == 0); } SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_PARENT(zfsvfs), NULL, &dzp->z_id, sizeof (dzp->z_id)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL, &zp->z_pflags, sizeof (zp->z_pflags)); if (!(flag & ZNEW)) { SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, ctime, sizeof (ctime)); zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime, ctime, B_TRUE); } error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx); - ASSERT(error == 0); + ASSERT0(error); - mutex_exit(&zp->z_lock); - - mutex_enter(&dzp->z_lock); dzp->z_size++; dzp->z_links += zp_is_dir; count = 0; SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs), NULL, &dzp->z_size, sizeof (dzp->z_size)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs), NULL, &dzp->z_links, sizeof (dzp->z_links)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, mtime, sizeof (mtime)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, ctime, sizeof (ctime)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL, &dzp->z_pflags, sizeof (dzp->z_pflags)); zfs_tstamp_update_setup(dzp, CONTENT_MODIFIED, mtime, ctime, B_TRUE); error = sa_bulk_update(dzp->z_sa_hdl, bulk, count, tx); - ASSERT(error == 0); - mutex_exit(&dzp->z_lock); + ASSERT0(error); value = zfs_dirent(zp, zp->z_mode); - error = zap_add(zp->z_zfsvfs->z_os, dzp->z_id, dl->dl_name, + error = zap_add(zp->z_zfsvfs->z_os, dzp->z_id, name, 8, 1, &value, tx); - ASSERT(error == 0); + VERIFY0(error); - dnlc_update(ZTOV(dzp), dl->dl_name, vp); - return (0); } static int -zfs_dropname(zfs_dirlock_t *dl, znode_t *zp, znode_t *dzp, dmu_tx_t *tx, +zfs_dropname(znode_t *dzp, const char *name, znode_t *zp, dmu_tx_t *tx, int flag) { int error; if (zp->z_zfsvfs->z_norm) { - if (((zp->z_zfsvfs->z_case == ZFS_CASE_INSENSITIVE) && - (flag & ZCIEXACT)) || - ((zp->z_zfsvfs->z_case == ZFS_CASE_MIXED) && - !(flag & ZCILOOK))) + if (zp->z_zfsvfs->z_case == ZFS_CASE_MIXED) error = zap_remove_norm(zp->z_zfsvfs->z_os, - dzp->z_id, dl->dl_name, MT_EXACT, tx); + dzp->z_id, name, MT_EXACT, tx); else error = zap_remove_norm(zp->z_zfsvfs->z_os, - dzp->z_id, dl->dl_name, MT_FIRST, tx); + dzp->z_id, name, MT_FIRST, tx); } else { error = zap_remove(zp->z_zfsvfs->z_os, - dzp->z_id, dl->dl_name, tx); + dzp->z_id, name, tx); } return (error); } /* - * Unlink zp from dl, and mark zp for deletion if this was the last link. + * Unlink zp from dzp, and mark zp for deletion if this was the last link. * Can fail if zp is a mount point (EBUSY) or a non-empty directory (EEXIST). * If 'unlinkedp' is NULL, we put unlinked znodes on the unlinked list. * If it's non-NULL, we use it to indicate whether the znode needs deletion, * and it's the caller's job to do it. */ int -zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag, - boolean_t *unlinkedp) +zfs_link_destroy(znode_t *dzp, const char *name, znode_t *zp, dmu_tx_t *tx, + int flag, boolean_t *unlinkedp) { - znode_t *dzp = dl->dl_dzp; zfsvfs_t *zfsvfs = dzp->z_zfsvfs; vnode_t *vp = ZTOV(zp); int zp_is_dir = (vp->v_type == VDIR); boolean_t unlinked = B_FALSE; sa_bulk_attr_t bulk[5]; uint64_t mtime[2], ctime[2]; int count = 0; int error; - dnlc_remove(ZTOV(dzp), dl->dl_name); + ASSERT_VOP_ELOCKED(ZTOV(dzp), __func__); + ASSERT_VOP_ELOCKED(ZTOV(zp), __func__); if (!(flag & ZRENAMING)) { - if (vn_vfswlock(vp)) /* prevent new mounts on zp */ - return (SET_ERROR(EBUSY)); - if (vn_ismntpt(vp)) { /* don't remove mount point */ - vn_vfsunlock(vp); - return (SET_ERROR(EBUSY)); - } - - mutex_enter(&zp->z_lock); - if (zp_is_dir && !zfs_dirempty(zp)) { - mutex_exit(&zp->z_lock); - vn_vfsunlock(vp); #ifdef illumos return (SET_ERROR(EEXIST)); #else return (SET_ERROR(ENOTEMPTY)); #endif } /* * If we get here, we are going to try to remove the object. * First try removing the name from the directory; if that * fails, return the error. */ - error = zfs_dropname(dl, zp, dzp, tx, flag); + error = zfs_dropname(dzp, name, zp, tx, flag); if (error != 0) { - mutex_exit(&zp->z_lock); - vn_vfsunlock(vp); return (error); } if (zp->z_links <= zp_is_dir) { zfs_panic_recover("zfs: link count on vnode %p is %u, " "should be at least %u", zp->z_vnode, (int)zp->z_links, zp_is_dir + 1); zp->z_links = zp_is_dir + 1; } if (--zp->z_links == zp_is_dir) { zp->z_unlinked = B_TRUE; zp->z_links = 0; unlinked = B_TRUE; } else { SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, sizeof (ctime)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL, &zp->z_pflags, sizeof (zp->z_pflags)); zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime, ctime, B_TRUE); } SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs), NULL, &zp->z_links, sizeof (zp->z_links)); error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx); count = 0; - ASSERT(error == 0); - mutex_exit(&zp->z_lock); - vn_vfsunlock(vp); + ASSERT0(error); } else { - error = zfs_dropname(dl, zp, dzp, tx, flag); + ASSERT(zp->z_unlinked == 0); + error = zfs_dropname(dzp, name, zp, tx, flag); if (error != 0) return (error); } - mutex_enter(&dzp->z_lock); dzp->z_size--; /* one dirent removed */ dzp->z_links -= zp_is_dir; /* ".." link from zp */ SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs), NULL, &dzp->z_links, sizeof (dzp->z_links)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs), NULL, &dzp->z_size, sizeof (dzp->z_size)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, ctime, sizeof (ctime)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, mtime, sizeof (mtime)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL, &dzp->z_pflags, sizeof (dzp->z_pflags)); zfs_tstamp_update_setup(dzp, CONTENT_MODIFIED, mtime, ctime, B_TRUE); error = sa_bulk_update(dzp->z_sa_hdl, bulk, count, tx); - ASSERT(error == 0); - mutex_exit(&dzp->z_lock); + ASSERT0(error); if (unlinkedp != NULL) *unlinkedp = unlinked; else if (unlinked) zfs_unlinked_add(zp, tx); return (0); } /* - * Indicate whether the directory is empty. Works with or without z_lock - * held, but can only be consider a hint in the latter case. Returns true - * if only "." and ".." remain and there's no work in progress. + * Indicate whether the directory is empty. */ boolean_t zfs_dirempty(znode_t *dzp) { - return (dzp->z_size == 2 && dzp->z_dirlocks == 0); + return (dzp->z_size == 2); } int zfs_make_xattrdir(znode_t *zp, vattr_t *vap, vnode_t **xvpp, cred_t *cr) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; znode_t *xzp; dmu_tx_t *tx; int error; zfs_acl_ids_t acl_ids; boolean_t fuid_dirtied; uint64_t parent; *xvpp = NULL; /* * In FreeBSD, access checking for creating an EA is being done * in zfs_setextattr(), */ #ifndef __FreeBSD_kernel__ if (error = zfs_zaccess(zp, ACE_WRITE_NAMED_ATTRS, 0, B_FALSE, cr)) return (error); #endif if ((error = zfs_acl_ids_create(zp, IS_XATTR, vap, cr, NULL, &acl_ids)) != 0) return (error); if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) { zfs_acl_ids_free(&acl_ids); return (SET_ERROR(EDQUOT)); } getnewvnode_reserve(1); tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes + ZFS_SA_BASE_ATTR_SIZE); dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE); dmu_tx_hold_zap(tx, DMU_NEW_OBJECT, FALSE, NULL); fuid_dirtied = zfsvfs->z_fuid_dirty; if (fuid_dirtied) zfs_fuid_txhold(zfsvfs, tx); error = dmu_tx_assign(tx, TXG_WAIT); if (error) { zfs_acl_ids_free(&acl_ids); dmu_tx_abort(tx); return (error); } zfs_mknode(zp, vap, tx, cr, IS_XATTR, &xzp, &acl_ids); if (fuid_dirtied) zfs_fuid_sync(zfsvfs, tx); #ifdef DEBUG error = sa_lookup(xzp->z_sa_hdl, SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent)); ASSERT(error == 0 && parent == zp->z_id); #endif VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), &xzp->z_id, sizeof (xzp->z_id), tx)); (void) zfs_log_create(zfsvfs->z_log, tx, TX_MKXATTR, zp, xzp, "", NULL, acl_ids.z_fuidp, vap); zfs_acl_ids_free(&acl_ids); dmu_tx_commit(tx); getnewvnode_drop_reserve(); *xvpp = ZTOV(xzp); return (0); } /* * Return a znode for the extended attribute directory for zp. * ** If the directory does not already exist, it is created ** * * IN: zp - znode to obtain attribute directory from * cr - credentials of caller * flags - flags from the VOP_LOOKUP call * * OUT: xzpp - pointer to extended attribute znode * * RETURN: 0 on success * error number on failure */ int zfs_get_xattrdir(znode_t *zp, vnode_t **xvpp, cred_t *cr, int flags) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; znode_t *xzp; - zfs_dirlock_t *dl; vattr_t va; int error; top: - error = zfs_dirent_lock(&dl, zp, "", &xzp, ZXATTR, NULL, NULL); + error = zfs_dirent_lookup(zp, "", &xzp, ZXATTR); if (error) return (error); if (xzp != NULL) { *xvpp = ZTOV(xzp); - zfs_dirent_unlock(dl); return (0); } if (!(flags & CREATE_XATTR_DIR)) { - zfs_dirent_unlock(dl); #ifdef illumos return (SET_ERROR(ENOENT)); #else return (SET_ERROR(ENOATTR)); #endif } if (zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) { - zfs_dirent_unlock(dl); return (SET_ERROR(EROFS)); } /* * The ability to 'create' files in an attribute * directory comes from the write_xattr permission on the base file. * * The ability to 'search' an attribute directory requires * read_xattr permission on the base file. * * Once in a directory the ability to read/write attributes * is controlled by the permissions on the attribute file. */ va.va_mask = AT_TYPE | AT_MODE | AT_UID | AT_GID; va.va_type = VDIR; va.va_mode = S_IFDIR | S_ISVTX | 0777; zfs_fuid_map_ids(zp, cr, &va.va_uid, &va.va_gid); error = zfs_make_xattrdir(zp, &va, xvpp, cr); - zfs_dirent_unlock(dl); if (error == ERESTART) { /* NB: we already did dmu_tx_wait() if necessary */ goto top; } if (error == 0) VOP_UNLOCK(*xvpp, 0); return (error); } /* * Decide whether it is okay to remove within a sticky directory. * * In sticky directories, write access is not sufficient; * you can remove entries from a directory only if: * * you own the directory, * you own the entry, * the entry is a plain file and you have write access, * or you are privileged (checked in secpolicy...). * * The function returns 0 if remove access is granted. */ int zfs_sticky_remove_access(znode_t *zdp, znode_t *zp, cred_t *cr) { uid_t uid; uid_t downer; uid_t fowner; zfsvfs_t *zfsvfs = zdp->z_zfsvfs; if (zdp->z_zfsvfs->z_replay) return (0); if ((zdp->z_mode & S_ISVTX) == 0) return (0); downer = zfs_fuid_map_id(zfsvfs, zdp->z_uid, cr, ZFS_OWNER); fowner = zfs_fuid_map_id(zfsvfs, zp->z_uid, cr, ZFS_OWNER); if ((uid = crgetuid(cr)) == downer || uid == fowner || (ZTOV(zp)->v_type == VREG && zfs_zaccess(zp, ACE_WRITE_DATA, 0, B_FALSE, cr) == 0)) return (0); else return (secpolicy_vnode_remove(ZTOV(zp), cr)); } Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_sa.c =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_sa.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_sa.c (revision 303775) @@ -1,333 +1,327 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved. */ #include #include #include #include #include /* * ZPL attribute registration table. * Order of attributes doesn't matter * a unique value will be assigned for each * attribute that is file system specific * * This is just the set of ZPL attributes that this * version of ZFS deals with natively. The file system * could have other attributes stored in files, but they will be * ignored. The SA framework will preserve them, just that * this version of ZFS won't change or delete them. */ sa_attr_reg_t zfs_attr_table[ZPL_END+1] = { {"ZPL_ATIME", sizeof (uint64_t) * 2, SA_UINT64_ARRAY, 0}, {"ZPL_MTIME", sizeof (uint64_t) * 2, SA_UINT64_ARRAY, 1}, {"ZPL_CTIME", sizeof (uint64_t) * 2, SA_UINT64_ARRAY, 2}, {"ZPL_CRTIME", sizeof (uint64_t) * 2, SA_UINT64_ARRAY, 3}, {"ZPL_GEN", sizeof (uint64_t), SA_UINT64_ARRAY, 4}, {"ZPL_MODE", sizeof (uint64_t), SA_UINT64_ARRAY, 5}, {"ZPL_SIZE", sizeof (uint64_t), SA_UINT64_ARRAY, 6}, {"ZPL_PARENT", sizeof (uint64_t), SA_UINT64_ARRAY, 7}, {"ZPL_LINKS", sizeof (uint64_t), SA_UINT64_ARRAY, 8}, {"ZPL_XATTR", sizeof (uint64_t), SA_UINT64_ARRAY, 9}, {"ZPL_RDEV", sizeof (uint64_t), SA_UINT64_ARRAY, 10}, {"ZPL_FLAGS", sizeof (uint64_t), SA_UINT64_ARRAY, 11}, {"ZPL_UID", sizeof (uint64_t), SA_UINT64_ARRAY, 12}, {"ZPL_GID", sizeof (uint64_t), SA_UINT64_ARRAY, 13}, {"ZPL_PAD", sizeof (uint64_t) * 4, SA_UINT64_ARRAY, 14}, {"ZPL_ZNODE_ACL", 88, SA_UINT8_ARRAY, 15}, {"ZPL_DACL_COUNT", sizeof (uint64_t), SA_UINT64_ARRAY, 0}, {"ZPL_SYMLINK", 0, SA_UINT8_ARRAY, 0}, {"ZPL_SCANSTAMP", 32, SA_UINT8_ARRAY, 0}, {"ZPL_DACL_ACES", 0, SA_ACL, 0}, {NULL, 0, 0, 0} }; #ifdef _KERNEL int zfs_sa_readlink(znode_t *zp, uio_t *uio) { dmu_buf_t *db = sa_get_db(zp->z_sa_hdl); size_t bufsz; int error; bufsz = zp->z_size; if (bufsz + ZFS_OLD_ZNODE_PHYS_SIZE <= db->db_size) { error = uiomove((caddr_t)db->db_data + ZFS_OLD_ZNODE_PHYS_SIZE, MIN((size_t)bufsz, uio->uio_resid), UIO_READ, uio); } else { dmu_buf_t *dbp; if ((error = dmu_buf_hold(zp->z_zfsvfs->z_os, zp->z_id, 0, FTAG, &dbp, DMU_READ_NO_PREFETCH)) == 0) { error = uiomove(dbp->db_data, MIN((size_t)bufsz, uio->uio_resid), UIO_READ, uio); dmu_buf_rele(dbp, FTAG); } } return (error); } void zfs_sa_symlink(znode_t *zp, char *link, int len, dmu_tx_t *tx) { dmu_buf_t *db = sa_get_db(zp->z_sa_hdl); if (ZFS_OLD_ZNODE_PHYS_SIZE + len <= dmu_bonus_max()) { VERIFY(dmu_set_bonus(db, len + ZFS_OLD_ZNODE_PHYS_SIZE, tx) == 0); if (len) { bcopy(link, (caddr_t)db->db_data + ZFS_OLD_ZNODE_PHYS_SIZE, len); } } else { dmu_buf_t *dbp; zfs_grow_blocksize(zp, len, tx); VERIFY(0 == dmu_buf_hold(zp->z_zfsvfs->z_os, zp->z_id, 0, FTAG, &dbp, DMU_READ_NO_PREFETCH)); dmu_buf_will_dirty(dbp, tx); ASSERT3U(len, <=, dbp->db_size); bcopy(link, dbp->db_data, len); dmu_buf_rele(dbp, FTAG); } } void zfs_sa_get_scanstamp(znode_t *zp, xvattr_t *xvap) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; xoptattr_t *xoap; - ASSERT(MUTEX_HELD(&zp->z_lock)); + ASSERT_VOP_LOCKED(ZTOV(zp), __func__); VERIFY((xoap = xva_getxoptattr(xvap)) != NULL); if (zp->z_is_sa) { if (sa_lookup(zp->z_sa_hdl, SA_ZPL_SCANSTAMP(zfsvfs), &xoap->xoa_av_scanstamp, sizeof (xoap->xoa_av_scanstamp)) != 0) return; } else { dmu_object_info_t doi; dmu_buf_t *db = sa_get_db(zp->z_sa_hdl); int len; if (!(zp->z_pflags & ZFS_BONUS_SCANSTAMP)) return; sa_object_info(zp->z_sa_hdl, &doi); len = sizeof (xoap->xoa_av_scanstamp) + ZFS_OLD_ZNODE_PHYS_SIZE; if (len <= doi.doi_bonus_size) { (void) memcpy(xoap->xoa_av_scanstamp, (caddr_t)db->db_data + ZFS_OLD_ZNODE_PHYS_SIZE, sizeof (xoap->xoa_av_scanstamp)); } } XVA_SET_RTN(xvap, XAT_AV_SCANSTAMP); } void zfs_sa_set_scanstamp(znode_t *zp, xvattr_t *xvap, dmu_tx_t *tx) { zfsvfs_t *zfsvfs = zp->z_zfsvfs; xoptattr_t *xoap; - ASSERT(MUTEX_HELD(&zp->z_lock)); + ASSERT_VOP_ELOCKED(ZTOV(zp), __func__); VERIFY((xoap = xva_getxoptattr(xvap)) != NULL); if (zp->z_is_sa) VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_SCANSTAMP(zfsvfs), &xoap->xoa_av_scanstamp, sizeof (xoap->xoa_av_scanstamp), tx)); else { dmu_object_info_t doi; dmu_buf_t *db = sa_get_db(zp->z_sa_hdl); int len; sa_object_info(zp->z_sa_hdl, &doi); len = sizeof (xoap->xoa_av_scanstamp) + ZFS_OLD_ZNODE_PHYS_SIZE; if (len > doi.doi_bonus_size) VERIFY(dmu_set_bonus(db, len, tx) == 0); (void) memcpy((caddr_t)db->db_data + ZFS_OLD_ZNODE_PHYS_SIZE, xoap->xoa_av_scanstamp, sizeof (xoap->xoa_av_scanstamp)); zp->z_pflags |= ZFS_BONUS_SCANSTAMP; VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_FLAGS(zfsvfs), &zp->z_pflags, sizeof (uint64_t), tx)); } } /* * I'm not convinced we should do any of this upgrade. * since the SA code can read both old/new znode formats * with probably little to no performance difference. * * All new files will be created with the new format. */ void zfs_sa_upgrade(sa_handle_t *hdl, dmu_tx_t *tx) { dmu_buf_t *db = sa_get_db(hdl); znode_t *zp = sa_get_userdata(hdl); zfsvfs_t *zfsvfs = zp->z_zfsvfs; sa_bulk_attr_t bulk[20]; int count = 0; sa_bulk_attr_t sa_attrs[20] = { 0 }; zfs_acl_locator_cb_t locate = { 0 }; uint64_t uid, gid, mode, rdev, xattr, parent; uint64_t crtime[2], mtime[2], ctime[2]; zfs_acl_phys_t znode_acl; char scanstamp[AV_SCANSTAMP_SZ]; - boolean_t drop_lock = B_FALSE; /* * No upgrade if ACL isn't cached * since we won't know which locks are held * and ready the ACL would require special "locked" * interfaces that would be messy */ if (zp->z_acl_cached == NULL || ZTOV(zp)->v_type == VLNK) return; /* - * If the z_lock is held and we aren't the owner - * the just return since we don't want to deadlock + * If the vnode lock is held and we aren't the owner + * then just return since we don't want to deadlock * trying to update the status of z_is_sa. This * file can then be upgraded at a later time. * * Otherwise, we know we are doing the * sa_update() that caused us to enter this function. */ - if (mutex_owner(&zp->z_lock) != curthread) { - if (mutex_tryenter(&zp->z_lock) == 0) + if (vn_lock(ZTOV(zp), LK_EXCLUSIVE | LK_NOWAIT) != 0) return; - else - drop_lock = B_TRUE; - } /* First do a bulk query of the attributes that aren't cached */ SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, &mtime, 16); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, 16); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CRTIME(zfsvfs), NULL, &crtime, 16); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL, &mode, 8); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_PARENT(zfsvfs), NULL, &parent, 8); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_XATTR(zfsvfs), NULL, &xattr, 8); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_RDEV(zfsvfs), NULL, &rdev, 8); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zfsvfs), NULL, &uid, 8); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zfsvfs), NULL, &gid, 8); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ZNODE_ACL(zfsvfs), NULL, &znode_acl, 88); if (sa_bulk_lookup_locked(hdl, bulk, count) != 0) goto done; /* * While the order here doesn't matter its best to try and organize * it is such a way to pick up an already existing layout number */ count = 0; SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_MODE(zfsvfs), NULL, &mode, 8); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_SIZE(zfsvfs), NULL, &zp->z_size, 8); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_GEN(zfsvfs), NULL, &zp->z_gen, 8); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_UID(zfsvfs), NULL, &uid, 8); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_GID(zfsvfs), NULL, &gid, 8); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_PARENT(zfsvfs), NULL, &parent, 8); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_FLAGS(zfsvfs), NULL, &zp->z_pflags, 8); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_ATIME(zfsvfs), NULL, zp->z_atime, 16); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_MTIME(zfsvfs), NULL, &mtime, 16); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, 16); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_CRTIME(zfsvfs), NULL, &crtime, 16); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_LINKS(zfsvfs), NULL, &zp->z_links, 8); if (zp->z_vnode->v_type == VBLK || zp->z_vnode->v_type == VCHR) SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_RDEV(zfsvfs), NULL, &rdev, 8); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_DACL_COUNT(zfsvfs), NULL, &zp->z_acl_cached->z_acl_count, 8); if (zp->z_acl_cached->z_version < ZFS_ACL_VERSION_FUID) zfs_acl_xform(zp, zp->z_acl_cached, CRED()); locate.cb_aclp = zp->z_acl_cached; SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_DACL_ACES(zfsvfs), zfs_acl_data_locator, &locate, zp->z_acl_cached->z_acl_bytes); if (xattr) SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_XATTR(zfsvfs), NULL, &xattr, 8); /* if scanstamp then add scanstamp */ if (zp->z_pflags & ZFS_BONUS_SCANSTAMP) { bcopy((caddr_t)db->db_data + ZFS_OLD_ZNODE_PHYS_SIZE, scanstamp, AV_SCANSTAMP_SZ); SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_SCANSTAMP(zfsvfs), NULL, scanstamp, AV_SCANSTAMP_SZ); zp->z_pflags &= ~ZFS_BONUS_SCANSTAMP; } VERIFY(dmu_set_bonustype(db, DMU_OT_SA, tx) == 0); VERIFY(sa_replace_all_by_template_locked(hdl, sa_attrs, count, tx) == 0); if (znode_acl.z_acl_extern_obj) VERIFY(0 == dmu_object_free(zfsvfs->z_os, znode_acl.z_acl_extern_obj, tx)); zp->z_is_sa = B_TRUE; done: - if (drop_lock) - mutex_exit(&zp->z_lock); + VOP_UNLOCK(ZTOV(zp), 0); } void zfs_sa_upgrade_txholds(dmu_tx_t *tx, znode_t *zp) { if (!zp->z_zfsvfs->z_use_sa || zp->z_is_sa) return; dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE); if (zfs_external_acl(zp)) { dmu_tx_hold_free(tx, zfs_external_acl(zp), 0, DMU_OBJECT_END); } } #endif Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c (revision 303775) @@ -1,2502 +1,2518 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2011 Pawel Jakub Dawidek . * All rights reserved. * Copyright (c) 2012, 2015 by Delphix. All rights reserved. * Copyright (c) 2014 Integros [integros.com] */ /* Portions Copyright 2010 Robert Milkowski */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "zfs_comutil.h" struct mtx zfs_debug_mtx; MTX_SYSINIT(zfs_debug_mtx, &zfs_debug_mtx, "zfs_debug", MTX_DEF); SYSCTL_NODE(_vfs, OID_AUTO, zfs, CTLFLAG_RW, 0, "ZFS file system"); int zfs_super_owner; SYSCTL_INT(_vfs_zfs, OID_AUTO, super_owner, CTLFLAG_RW, &zfs_super_owner, 0, "File system owner can perform privileged operation on his file systems"); int zfs_debug_level; SYSCTL_INT(_vfs_zfs, OID_AUTO, debug, CTLFLAG_RWTUN, &zfs_debug_level, 0, "Debug level"); SYSCTL_NODE(_vfs_zfs, OID_AUTO, version, CTLFLAG_RD, 0, "ZFS versions"); static int zfs_version_acl = ZFS_ACL_VERSION; SYSCTL_INT(_vfs_zfs_version, OID_AUTO, acl, CTLFLAG_RD, &zfs_version_acl, 0, "ZFS_ACL_VERSION"); static int zfs_version_spa = SPA_VERSION; SYSCTL_INT(_vfs_zfs_version, OID_AUTO, spa, CTLFLAG_RD, &zfs_version_spa, 0, "SPA_VERSION"); static int zfs_version_zpl = ZPL_VERSION; SYSCTL_INT(_vfs_zfs_version, OID_AUTO, zpl, CTLFLAG_RD, &zfs_version_zpl, 0, "ZPL_VERSION"); static int zfs_mount(vfs_t *vfsp); static int zfs_umount(vfs_t *vfsp, int fflag); static int zfs_root(vfs_t *vfsp, int flags, vnode_t **vpp); static int zfs_statfs(vfs_t *vfsp, struct statfs *statp); static int zfs_vget(vfs_t *vfsp, ino_t ino, int flags, vnode_t **vpp); static int zfs_sync(vfs_t *vfsp, int waitfor); static int zfs_checkexp(vfs_t *vfsp, struct sockaddr *nam, int *extflagsp, struct ucred **credanonp, int *numsecflavors, int **secflavors); static int zfs_fhtovp(vfs_t *vfsp, fid_t *fidp, int flags, vnode_t **vpp); static void zfs_objset_close(zfsvfs_t *zfsvfs); static void zfs_freevfs(vfs_t *vfsp); struct vfsops zfs_vfsops = { .vfs_mount = zfs_mount, .vfs_unmount = zfs_umount, .vfs_root = zfs_root, .vfs_statfs = zfs_statfs, .vfs_vget = zfs_vget, .vfs_sync = zfs_sync, .vfs_checkexp = zfs_checkexp, .vfs_fhtovp = zfs_fhtovp, }; VFS_SET(zfs_vfsops, zfs, VFCF_JAIL | VFCF_DELEGADMIN); /* * We need to keep a count of active fs's. * This is necessary to prevent our module * from being unloaded after a umount -f */ static uint32_t zfs_active_fs_count = 0; /*ARGSUSED*/ static int zfs_sync(vfs_t *vfsp, int waitfor) { /* * Data integrity is job one. We don't want a compromised kernel * writing to the storage pool, so we never sync during panic. */ if (panicstr) return (0); /* * Ignore the system syncher. ZFS already commits async data * at zfs_txg_timeout intervals. */ if (waitfor == MNT_LAZY) return (0); if (vfsp != NULL) { /* * Sync a specific filesystem. */ zfsvfs_t *zfsvfs = vfsp->vfs_data; dsl_pool_t *dp; int error; error = vfs_stdsync(vfsp, waitfor); if (error != 0) return (error); ZFS_ENTER(zfsvfs); dp = dmu_objset_pool(zfsvfs->z_os); /* * If the system is shutting down, then skip any * filesystems which may exist on a suspended pool. */ if (sys_shutdown && spa_suspended(dp->dp_spa)) { ZFS_EXIT(zfsvfs); return (0); } if (zfsvfs->z_log != NULL) zil_commit(zfsvfs->z_log, 0); ZFS_EXIT(zfsvfs); } else { /* * Sync all ZFS filesystems. This is what happens when you * run sync(1M). Unlike other filesystems, ZFS honors the * request by waiting for all pools to commit all dirty data. */ spa_sync_allpools(); } return (0); } #ifndef __FreeBSD_kernel__ static int zfs_create_unique_device(dev_t *dev) { major_t new_major; do { ASSERT3U(zfs_minor, <=, MAXMIN32); minor_t start = zfs_minor; do { mutex_enter(&zfs_dev_mtx); if (zfs_minor >= MAXMIN32) { /* * If we're still using the real major * keep out of /dev/zfs and /dev/zvol minor * number space. If we're using a getudev()'ed * major number, we can use all of its minors. */ if (zfs_major == ddi_name_to_major(ZFS_DRIVER)) zfs_minor = ZFS_MIN_MINOR; else zfs_minor = 0; } else { zfs_minor++; } *dev = makedevice(zfs_major, zfs_minor); mutex_exit(&zfs_dev_mtx); } while (vfs_devismounted(*dev) && zfs_minor != start); if (zfs_minor == start) { /* * We are using all ~262,000 minor numbers for the * current major number. Create a new major number. */ if ((new_major = getudev()) == (major_t)-1) { cmn_err(CE_WARN, "zfs_mount: Can't get unique major " "device number."); return (-1); } mutex_enter(&zfs_dev_mtx); zfs_major = new_major; zfs_minor = 0; mutex_exit(&zfs_dev_mtx); } else { break; } /* CONSTANTCONDITION */ } while (1); return (0); } #endif /* !__FreeBSD_kernel__ */ static void atime_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; if (newval == TRUE) { zfsvfs->z_atime = TRUE; zfsvfs->z_vfs->vfs_flag &= ~MNT_NOATIME; vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOATIME); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_ATIME, NULL, 0); } else { zfsvfs->z_atime = FALSE; zfsvfs->z_vfs->vfs_flag |= MNT_NOATIME; vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_ATIME); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOATIME, NULL, 0); } } static void xattr_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; if (newval == TRUE) { /* XXX locking on vfs_flag? */ #ifdef TODO zfsvfs->z_vfs->vfs_flag |= VFS_XATTR; #endif vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOXATTR); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_XATTR, NULL, 0); } else { /* XXX locking on vfs_flag? */ #ifdef TODO zfsvfs->z_vfs->vfs_flag &= ~VFS_XATTR; #endif vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_XATTR); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOXATTR, NULL, 0); } } static void blksz_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; ASSERT3U(newval, <=, spa_maxblocksize(dmu_objset_spa(zfsvfs->z_os))); ASSERT3U(newval, >=, SPA_MINBLOCKSIZE); ASSERT(ISP2(newval)); zfsvfs->z_max_blksz = newval; zfsvfs->z_vfs->mnt_stat.f_iosize = newval; } static void readonly_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; if (newval) { /* XXX locking on vfs_flag? */ zfsvfs->z_vfs->vfs_flag |= VFS_RDONLY; vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_RW); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_RO, NULL, 0); } else { /* XXX locking on vfs_flag? */ zfsvfs->z_vfs->vfs_flag &= ~VFS_RDONLY; vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_RO); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_RW, NULL, 0); } } static void setuid_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; if (newval == FALSE) { zfsvfs->z_vfs->vfs_flag |= VFS_NOSETUID; vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_SETUID); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOSETUID, NULL, 0); } else { zfsvfs->z_vfs->vfs_flag &= ~VFS_NOSETUID; vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOSETUID); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_SETUID, NULL, 0); } } static void exec_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; if (newval == FALSE) { zfsvfs->z_vfs->vfs_flag |= VFS_NOEXEC; vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_EXEC); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOEXEC, NULL, 0); } else { zfsvfs->z_vfs->vfs_flag &= ~VFS_NOEXEC; vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOEXEC); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_EXEC, NULL, 0); } } /* * The nbmand mount option can be changed at mount time. * We can't allow it to be toggled on live file systems or incorrect * behavior may be seen from cifs clients * * This property isn't registered via dsl_prop_register(), but this callback * will be called when a file system is first mounted */ static void nbmand_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; if (newval == FALSE) { vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NBMAND); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NONBMAND, NULL, 0); } else { vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NONBMAND); vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NBMAND, NULL, 0); } } static void snapdir_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; zfsvfs->z_show_ctldir = newval; } static void vscan_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; zfsvfs->z_vscan = newval; } static void acl_mode_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; zfsvfs->z_acl_mode = newval; } static void acl_inherit_changed_cb(void *arg, uint64_t newval) { zfsvfs_t *zfsvfs = arg; zfsvfs->z_acl_inherit = newval; } static int zfs_register_callbacks(vfs_t *vfsp) { struct dsl_dataset *ds = NULL; objset_t *os = NULL; zfsvfs_t *zfsvfs = NULL; uint64_t nbmand; boolean_t readonly = B_FALSE; boolean_t do_readonly = B_FALSE; boolean_t setuid = B_FALSE; boolean_t do_setuid = B_FALSE; boolean_t exec = B_FALSE; boolean_t do_exec = B_FALSE; #ifdef illumos boolean_t devices = B_FALSE; boolean_t do_devices = B_FALSE; #endif boolean_t xattr = B_FALSE; boolean_t do_xattr = B_FALSE; boolean_t atime = B_FALSE; boolean_t do_atime = B_FALSE; int error = 0; ASSERT(vfsp); zfsvfs = vfsp->vfs_data; ASSERT(zfsvfs); os = zfsvfs->z_os; /* * This function can be called for a snapshot when we update snapshot's * mount point, which isn't really supported. */ if (dmu_objset_is_snapshot(os)) return (EOPNOTSUPP); /* * The act of registering our callbacks will destroy any mount * options we may have. In order to enable temporary overrides * of mount options, we stash away the current values and * restore them after we register the callbacks. */ if (vfs_optionisset(vfsp, MNTOPT_RO, NULL) || !spa_writeable(dmu_objset_spa(os))) { readonly = B_TRUE; do_readonly = B_TRUE; } else if (vfs_optionisset(vfsp, MNTOPT_RW, NULL)) { readonly = B_FALSE; do_readonly = B_TRUE; } if (vfs_optionisset(vfsp, MNTOPT_NOSUID, NULL)) { setuid = B_FALSE; do_setuid = B_TRUE; } else { if (vfs_optionisset(vfsp, MNTOPT_NOSETUID, NULL)) { setuid = B_FALSE; do_setuid = B_TRUE; } else if (vfs_optionisset(vfsp, MNTOPT_SETUID, NULL)) { setuid = B_TRUE; do_setuid = B_TRUE; } } if (vfs_optionisset(vfsp, MNTOPT_NOEXEC, NULL)) { exec = B_FALSE; do_exec = B_TRUE; } else if (vfs_optionisset(vfsp, MNTOPT_EXEC, NULL)) { exec = B_TRUE; do_exec = B_TRUE; } if (vfs_optionisset(vfsp, MNTOPT_NOXATTR, NULL)) { xattr = B_FALSE; do_xattr = B_TRUE; } else if (vfs_optionisset(vfsp, MNTOPT_XATTR, NULL)) { xattr = B_TRUE; do_xattr = B_TRUE; } if (vfs_optionisset(vfsp, MNTOPT_NOATIME, NULL)) { atime = B_FALSE; do_atime = B_TRUE; } else if (vfs_optionisset(vfsp, MNTOPT_ATIME, NULL)) { atime = B_TRUE; do_atime = B_TRUE; } /* * We need to enter pool configuration here, so that we can use * dsl_prop_get_int_ds() to handle the special nbmand property below. * dsl_prop_get_integer() can not be used, because it has to acquire * spa_namespace_lock and we can not do that because we already hold * z_teardown_lock. The problem is that spa_config_sync() is called * with spa_namespace_lock held and the function calls ZFS vnode * operations to write the cache file and thus z_teardown_lock is * acquired after spa_namespace_lock. */ ds = dmu_objset_ds(os); dsl_pool_config_enter(dmu_objset_pool(os), FTAG); /* * nbmand is a special property. It can only be changed at * mount time. * * This is weird, but it is documented to only be changeable * at mount time. */ if (vfs_optionisset(vfsp, MNTOPT_NONBMAND, NULL)) { nbmand = B_FALSE; } else if (vfs_optionisset(vfsp, MNTOPT_NBMAND, NULL)) { nbmand = B_TRUE; } else if (error = dsl_prop_get_int_ds(ds, "nbmand", &nbmand) != 0) { dsl_pool_config_exit(dmu_objset_pool(os), FTAG); return (error); } /* * Register property callbacks. * * It would probably be fine to just check for i/o error from * the first prop_register(), but I guess I like to go * overboard... */ error = dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_ATIME), atime_changed_cb, zfsvfs); error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_XATTR), xattr_changed_cb, zfsvfs); error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_RECORDSIZE), blksz_changed_cb, zfsvfs); error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_READONLY), readonly_changed_cb, zfsvfs); #ifdef illumos error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_DEVICES), devices_changed_cb, zfsvfs); #endif error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_SETUID), setuid_changed_cb, zfsvfs); error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_EXEC), exec_changed_cb, zfsvfs); error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_SNAPDIR), snapdir_changed_cb, zfsvfs); error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_ACLMODE), acl_mode_changed_cb, zfsvfs); error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_ACLINHERIT), acl_inherit_changed_cb, zfsvfs); error = error ? error : dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_VSCAN), vscan_changed_cb, zfsvfs); dsl_pool_config_exit(dmu_objset_pool(os), FTAG); if (error) goto unregister; /* * Invoke our callbacks to restore temporary mount options. */ if (do_readonly) readonly_changed_cb(zfsvfs, readonly); if (do_setuid) setuid_changed_cb(zfsvfs, setuid); if (do_exec) exec_changed_cb(zfsvfs, exec); if (do_xattr) xattr_changed_cb(zfsvfs, xattr); if (do_atime) atime_changed_cb(zfsvfs, atime); nbmand_changed_cb(zfsvfs, nbmand); return (0); unregister: dsl_prop_unregister_all(ds, zfsvfs); return (error); } static int zfs_space_delta_cb(dmu_object_type_t bonustype, void *data, uint64_t *userp, uint64_t *groupp) { /* * Is it a valid type of object to track? */ if (bonustype != DMU_OT_ZNODE && bonustype != DMU_OT_SA) return (SET_ERROR(ENOENT)); /* * If we have a NULL data pointer * then assume the id's aren't changing and * return EEXIST to the dmu to let it know to * use the same ids */ if (data == NULL) return (SET_ERROR(EEXIST)); if (bonustype == DMU_OT_ZNODE) { znode_phys_t *znp = data; *userp = znp->zp_uid; *groupp = znp->zp_gid; } else { int hdrsize; sa_hdr_phys_t *sap = data; sa_hdr_phys_t sa = *sap; boolean_t swap = B_FALSE; ASSERT(bonustype == DMU_OT_SA); if (sa.sa_magic == 0) { /* * This should only happen for newly created * files that haven't had the znode data filled * in yet. */ *userp = 0; *groupp = 0; return (0); } if (sa.sa_magic == BSWAP_32(SA_MAGIC)) { sa.sa_magic = SA_MAGIC; sa.sa_layout_info = BSWAP_16(sa.sa_layout_info); swap = B_TRUE; } else { VERIFY3U(sa.sa_magic, ==, SA_MAGIC); } hdrsize = sa_hdrsize(&sa); VERIFY3U(hdrsize, >=, sizeof (sa_hdr_phys_t)); *userp = *((uint64_t *)((uintptr_t)data + hdrsize + SA_UID_OFFSET)); *groupp = *((uint64_t *)((uintptr_t)data + hdrsize + SA_GID_OFFSET)); if (swap) { *userp = BSWAP_64(*userp); *groupp = BSWAP_64(*groupp); } } return (0); } static void fuidstr_to_sid(zfsvfs_t *zfsvfs, const char *fuidstr, char *domainbuf, int buflen, uid_t *ridp) { uint64_t fuid; const char *domain; fuid = strtonum(fuidstr, NULL); domain = zfs_fuid_find_by_idx(zfsvfs, FUID_INDEX(fuid)); if (domain) (void) strlcpy(domainbuf, domain, buflen); else domainbuf[0] = '\0'; *ridp = FUID_RID(fuid); } static uint64_t zfs_userquota_prop_to_obj(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type) { switch (type) { case ZFS_PROP_USERUSED: return (DMU_USERUSED_OBJECT); case ZFS_PROP_GROUPUSED: return (DMU_GROUPUSED_OBJECT); case ZFS_PROP_USERQUOTA: return (zfsvfs->z_userquota_obj); case ZFS_PROP_GROUPQUOTA: return (zfsvfs->z_groupquota_obj); } return (0); } int zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type, uint64_t *cookiep, void *vbuf, uint64_t *bufsizep) { int error; zap_cursor_t zc; zap_attribute_t za; zfs_useracct_t *buf = vbuf; uint64_t obj; if (!dmu_objset_userspace_present(zfsvfs->z_os)) return (SET_ERROR(ENOTSUP)); obj = zfs_userquota_prop_to_obj(zfsvfs, type); if (obj == 0) { *bufsizep = 0; return (0); } for (zap_cursor_init_serialized(&zc, zfsvfs->z_os, obj, *cookiep); (error = zap_cursor_retrieve(&zc, &za)) == 0; zap_cursor_advance(&zc)) { if ((uintptr_t)buf - (uintptr_t)vbuf + sizeof (zfs_useracct_t) > *bufsizep) break; fuidstr_to_sid(zfsvfs, za.za_name, buf->zu_domain, sizeof (buf->zu_domain), &buf->zu_rid); buf->zu_space = za.za_first_integer; buf++; } if (error == ENOENT) error = 0; ASSERT3U((uintptr_t)buf - (uintptr_t)vbuf, <=, *bufsizep); *bufsizep = (uintptr_t)buf - (uintptr_t)vbuf; *cookiep = zap_cursor_serialize(&zc); zap_cursor_fini(&zc); return (error); } /* * buf must be big enough (eg, 32 bytes) */ static int id_to_fuidstr(zfsvfs_t *zfsvfs, const char *domain, uid_t rid, char *buf, boolean_t addok) { uint64_t fuid; int domainid = 0; if (domain && domain[0]) { domainid = zfs_fuid_find_by_domain(zfsvfs, domain, NULL, addok); if (domainid == -1) return (SET_ERROR(ENOENT)); } fuid = FUID_ENCODE(domainid, rid); (void) sprintf(buf, "%llx", (longlong_t)fuid); return (0); } int zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type, const char *domain, uint64_t rid, uint64_t *valp) { char buf[32]; int err; uint64_t obj; *valp = 0; if (!dmu_objset_userspace_present(zfsvfs->z_os)) return (SET_ERROR(ENOTSUP)); obj = zfs_userquota_prop_to_obj(zfsvfs, type); if (obj == 0) return (0); err = id_to_fuidstr(zfsvfs, domain, rid, buf, B_FALSE); if (err) return (err); err = zap_lookup(zfsvfs->z_os, obj, buf, 8, 1, valp); if (err == ENOENT) err = 0; return (err); } int zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type, const char *domain, uint64_t rid, uint64_t quota) { char buf[32]; int err; dmu_tx_t *tx; uint64_t *objp; boolean_t fuid_dirtied; if (type != ZFS_PROP_USERQUOTA && type != ZFS_PROP_GROUPQUOTA) return (SET_ERROR(EINVAL)); if (zfsvfs->z_version < ZPL_VERSION_USERSPACE) return (SET_ERROR(ENOTSUP)); objp = (type == ZFS_PROP_USERQUOTA) ? &zfsvfs->z_userquota_obj : &zfsvfs->z_groupquota_obj; err = id_to_fuidstr(zfsvfs, domain, rid, buf, B_TRUE); if (err) return (err); fuid_dirtied = zfsvfs->z_fuid_dirty; tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_zap(tx, *objp ? *objp : DMU_NEW_OBJECT, B_TRUE, NULL); if (*objp == 0) { dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE, zfs_userquota_prop_prefixes[type]); } if (fuid_dirtied) zfs_fuid_txhold(zfsvfs, tx); err = dmu_tx_assign(tx, TXG_WAIT); if (err) { dmu_tx_abort(tx); return (err); } mutex_enter(&zfsvfs->z_lock); if (*objp == 0) { *objp = zap_create(zfsvfs->z_os, DMU_OT_USERGROUP_QUOTA, DMU_OT_NONE, 0, tx); VERIFY(0 == zap_add(zfsvfs->z_os, MASTER_NODE_OBJ, zfs_userquota_prop_prefixes[type], 8, 1, objp, tx)); } mutex_exit(&zfsvfs->z_lock); if (quota == 0) { err = zap_remove(zfsvfs->z_os, *objp, buf, tx); if (err == ENOENT) err = 0; } else { err = zap_update(zfsvfs->z_os, *objp, buf, 8, 1, "a, tx); } ASSERT(err == 0); if (fuid_dirtied) zfs_fuid_sync(zfsvfs, tx); dmu_tx_commit(tx); return (err); } boolean_t zfs_fuid_overquota(zfsvfs_t *zfsvfs, boolean_t isgroup, uint64_t fuid) { char buf[32]; uint64_t used, quota, usedobj, quotaobj; int err; usedobj = isgroup ? DMU_GROUPUSED_OBJECT : DMU_USERUSED_OBJECT; quotaobj = isgroup ? zfsvfs->z_groupquota_obj : zfsvfs->z_userquota_obj; if (quotaobj == 0 || zfsvfs->z_replay) return (B_FALSE); (void) sprintf(buf, "%llx", (longlong_t)fuid); err = zap_lookup(zfsvfs->z_os, quotaobj, buf, 8, 1, "a); if (err != 0) return (B_FALSE); err = zap_lookup(zfsvfs->z_os, usedobj, buf, 8, 1, &used); if (err != 0) return (B_FALSE); return (used >= quota); } boolean_t zfs_owner_overquota(zfsvfs_t *zfsvfs, znode_t *zp, boolean_t isgroup) { uint64_t fuid; uint64_t quotaobj; quotaobj = isgroup ? zfsvfs->z_groupquota_obj : zfsvfs->z_userquota_obj; fuid = isgroup ? zp->z_gid : zp->z_uid; if (quotaobj == 0 || zfsvfs->z_replay) return (B_FALSE); return (zfs_fuid_overquota(zfsvfs, isgroup, fuid)); } /* * Associate this zfsvfs with the given objset, which must be owned. * This will cache a bunch of on-disk state from the objset in the * zfsvfs. */ static int zfsvfs_init(zfsvfs_t *zfsvfs, objset_t *os) { int error; uint64_t val; zfsvfs->z_max_blksz = SPA_OLD_MAXBLOCKSIZE; zfsvfs->z_show_ctldir = ZFS_SNAPDIR_VISIBLE; zfsvfs->z_os = os; error = zfs_get_zplprop(os, ZFS_PROP_VERSION, &zfsvfs->z_version); if (error != 0) return (error); if (zfsvfs->z_version > zfs_zpl_version_map(spa_version(dmu_objset_spa(os)))) { (void) printf("Can't mount a version %lld file system " "on a version %lld pool\n. Pool must be upgraded to mount " "this file system.", (u_longlong_t)zfsvfs->z_version, (u_longlong_t)spa_version(dmu_objset_spa(os))); return (SET_ERROR(ENOTSUP)); } error = zfs_get_zplprop(os, ZFS_PROP_NORMALIZE, &val); if (error != 0) return (error); zfsvfs->z_norm = (int)val; error = zfs_get_zplprop(os, ZFS_PROP_UTF8ONLY, &val); if (error != 0) return (error); zfsvfs->z_utf8 = (val != 0); error = zfs_get_zplprop(os, ZFS_PROP_CASE, &val); if (error != 0) return (error); zfsvfs->z_case = (uint_t)val; /* * Fold case on file systems that are always or sometimes case * insensitive. */ if (zfsvfs->z_case == ZFS_CASE_INSENSITIVE || zfsvfs->z_case == ZFS_CASE_MIXED) zfsvfs->z_norm |= U8_TEXTPREP_TOUPPER; zfsvfs->z_use_fuids = USE_FUIDS(zfsvfs->z_version, zfsvfs->z_os); zfsvfs->z_use_sa = USE_SA(zfsvfs->z_version, zfsvfs->z_os); uint64_t sa_obj = 0; if (zfsvfs->z_use_sa) { /* should either have both of these objects or none */ error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_SA_ATTRS, 8, 1, &sa_obj); if (error != 0) return (error); } error = sa_setup(os, sa_obj, zfs_attr_table, ZPL_END, &zfsvfs->z_attr_table); if (error != 0) return (error); if (zfsvfs->z_version >= ZPL_VERSION_SA) sa_register_update_callback(os, zfs_sa_upgrade); error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_ROOT_OBJ, 8, 1, &zfsvfs->z_root); if (error != 0) return (error); ASSERT(zfsvfs->z_root != 0); error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_UNLINKED_SET, 8, 1, &zfsvfs->z_unlinkedobj); if (error != 0) return (error); error = zap_lookup(os, MASTER_NODE_OBJ, zfs_userquota_prop_prefixes[ZFS_PROP_USERQUOTA], 8, 1, &zfsvfs->z_userquota_obj); if (error == ENOENT) zfsvfs->z_userquota_obj = 0; else if (error != 0) return (error); error = zap_lookup(os, MASTER_NODE_OBJ, zfs_userquota_prop_prefixes[ZFS_PROP_GROUPQUOTA], 8, 1, &zfsvfs->z_groupquota_obj); if (error == ENOENT) zfsvfs->z_groupquota_obj = 0; else if (error != 0) return (error); error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES, 8, 1, &zfsvfs->z_fuid_obj); if (error == ENOENT) zfsvfs->z_fuid_obj = 0; else if (error != 0) return (error); error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_SHARES_DIR, 8, 1, &zfsvfs->z_shares_dir); if (error == ENOENT) zfsvfs->z_shares_dir = 0; else if (error != 0) return (error); + /* + * Only use the name cache if we are looking for a + * name on a file system that does not require normalization + * or case folding. We can also look there if we happen to be + * on a non-normalizing, mixed sensitivity file system IF we + * are looking for the exact name (which is always the case on + * FreeBSD). + */ + zfsvfs->z_use_namecache = !zfsvfs->z_norm || + ((zfsvfs->z_case == ZFS_CASE_MIXED) && + !(zfsvfs->z_norm & ~U8_TEXTPREP_TOUPPER)); + return (0); } int zfsvfs_create(const char *osname, zfsvfs_t **zfvp) { objset_t *os; zfsvfs_t *zfsvfs; int error; /* * XXX: Fix struct statfs so this isn't necessary! * * The 'osname' is used as the filesystem's special node, which means * it must fit in statfs.f_mntfromname, or else it can't be * enumerated, so libzfs_mnttab_find() returns NULL, which causes * 'zfs unmount' to think it's not mounted when it is. */ if (strlen(osname) >= MNAMELEN) return (SET_ERROR(ENAMETOOLONG)); zfsvfs = kmem_zalloc(sizeof (zfsvfs_t), KM_SLEEP); /* * We claim to always be readonly so we can open snapshots; * other ZPL code will prevent us from writing to snapshots. */ error = dmu_objset_own(osname, DMU_OST_ZFS, B_TRUE, zfsvfs, &os); if (error) { kmem_free(zfsvfs, sizeof (zfsvfs_t)); return (error); } zfsvfs->z_vfs = NULL; zfsvfs->z_parent = zfsvfs; mutex_init(&zfsvfs->z_znodes_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&zfsvfs->z_lock, NULL, MUTEX_DEFAULT, NULL); list_create(&zfsvfs->z_all_znodes, sizeof (znode_t), offsetof(znode_t, z_link_node)); +#ifdef DIAGNOSTIC + rrm_init(&zfsvfs->z_teardown_lock, B_TRUE); +#else rrm_init(&zfsvfs->z_teardown_lock, B_FALSE); +#endif rw_init(&zfsvfs->z_teardown_inactive_lock, NULL, RW_DEFAULT, NULL); rw_init(&zfsvfs->z_fuid_lock, NULL, RW_DEFAULT, NULL); for (int i = 0; i != ZFS_OBJ_MTX_SZ; i++) mutex_init(&zfsvfs->z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL); error = zfsvfs_init(zfsvfs, os); if (error != 0) { dmu_objset_disown(os, zfsvfs); *zfvp = NULL; kmem_free(zfsvfs, sizeof (zfsvfs_t)); return (error); } *zfvp = zfsvfs; return (0); } static int zfsvfs_setup(zfsvfs_t *zfsvfs, boolean_t mounting) { int error; error = zfs_register_callbacks(zfsvfs->z_vfs); if (error) return (error); /* * Set the objset user_ptr to track its zfsvfs. */ mutex_enter(&zfsvfs->z_os->os_user_ptr_lock); dmu_objset_set_user(zfsvfs->z_os, zfsvfs); mutex_exit(&zfsvfs->z_os->os_user_ptr_lock); zfsvfs->z_log = zil_open(zfsvfs->z_os, zfs_get_data); /* * If we are not mounting (ie: online recv), then we don't * have to worry about replaying the log as we blocked all * operations out since we closed the ZIL. */ if (mounting) { boolean_t readonly; /* * During replay we remove the read only flag to * allow replays to succeed. */ readonly = zfsvfs->z_vfs->vfs_flag & VFS_RDONLY; if (readonly != 0) zfsvfs->z_vfs->vfs_flag &= ~VFS_RDONLY; else zfs_unlinked_drain(zfsvfs); /* * Parse and replay the intent log. * * Because of ziltest, this must be done after * zfs_unlinked_drain(). (Further note: ziltest * doesn't use readonly mounts, where * zfs_unlinked_drain() isn't called.) This is because * ziltest causes spa_sync() to think it's committed, * but actually it is not, so the intent log contains * many txg's worth of changes. * * In particular, if object N is in the unlinked set in * the last txg to actually sync, then it could be * actually freed in a later txg and then reallocated * in a yet later txg. This would write a "create * object N" record to the intent log. Normally, this * would be fine because the spa_sync() would have * written out the fact that object N is free, before * we could write the "create object N" intent log * record. * * But when we are in ziltest mode, we advance the "open * txg" without actually spa_sync()-ing the changes to * disk. So we would see that object N is still * allocated and in the unlinked set, and there is an * intent log record saying to allocate it. */ if (spa_writeable(dmu_objset_spa(zfsvfs->z_os))) { if (zil_replay_disable) { zil_destroy(zfsvfs->z_log, B_FALSE); } else { zfsvfs->z_replay = B_TRUE; zil_replay(zfsvfs->z_os, zfsvfs, zfs_replay_vector); zfsvfs->z_replay = B_FALSE; } } zfsvfs->z_vfs->vfs_flag |= readonly; /* restore readonly bit */ } return (0); } extern krwlock_t zfsvfs_lock; /* in zfs_znode.c */ void zfsvfs_free(zfsvfs_t *zfsvfs) { int i; /* * This is a barrier to prevent the filesystem from going away in * zfs_znode_move() until we can safely ensure that the filesystem is * not unmounted. We consider the filesystem valid before the barrier * and invalid after the barrier. */ rw_enter(&zfsvfs_lock, RW_READER); rw_exit(&zfsvfs_lock); zfs_fuid_destroy(zfsvfs); mutex_destroy(&zfsvfs->z_znodes_lock); mutex_destroy(&zfsvfs->z_lock); list_destroy(&zfsvfs->z_all_znodes); rrm_destroy(&zfsvfs->z_teardown_lock); rw_destroy(&zfsvfs->z_teardown_inactive_lock); rw_destroy(&zfsvfs->z_fuid_lock); for (i = 0; i != ZFS_OBJ_MTX_SZ; i++) mutex_destroy(&zfsvfs->z_hold_mtx[i]); kmem_free(zfsvfs, sizeof (zfsvfs_t)); } static void zfs_set_fuid_feature(zfsvfs_t *zfsvfs) { zfsvfs->z_use_fuids = USE_FUIDS(zfsvfs->z_version, zfsvfs->z_os); if (zfsvfs->z_vfs) { if (zfsvfs->z_use_fuids) { vfs_set_feature(zfsvfs->z_vfs, VFSFT_XVATTR); vfs_set_feature(zfsvfs->z_vfs, VFSFT_SYSATTR_VIEWS); vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACEMASKONACCESS); vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACLONCREATE); vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACCESS_FILTER); vfs_set_feature(zfsvfs->z_vfs, VFSFT_REPARSE); } else { vfs_clear_feature(zfsvfs->z_vfs, VFSFT_XVATTR); vfs_clear_feature(zfsvfs->z_vfs, VFSFT_SYSATTR_VIEWS); vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACEMASKONACCESS); vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACLONCREATE); vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACCESS_FILTER); vfs_clear_feature(zfsvfs->z_vfs, VFSFT_REPARSE); } } zfsvfs->z_use_sa = USE_SA(zfsvfs->z_version, zfsvfs->z_os); } static int zfs_domount(vfs_t *vfsp, char *osname) { uint64_t recordsize, fsid_guid; int error = 0; zfsvfs_t *zfsvfs; vnode_t *vp; ASSERT(vfsp); ASSERT(osname); error = zfsvfs_create(osname, &zfsvfs); if (error) return (error); zfsvfs->z_vfs = vfsp; #ifdef illumos /* Initialize the generic filesystem structure. */ vfsp->vfs_bcount = 0; vfsp->vfs_data = NULL; if (zfs_create_unique_device(&mount_dev) == -1) { error = SET_ERROR(ENODEV); goto out; } ASSERT(vfs_devismounted(mount_dev) == 0); #endif if (error = dsl_prop_get_integer(osname, "recordsize", &recordsize, NULL)) goto out; zfsvfs->z_vfs->vfs_bsize = SPA_MINBLOCKSIZE; zfsvfs->z_vfs->mnt_stat.f_iosize = recordsize; vfsp->vfs_data = zfsvfs; vfsp->mnt_flag |= MNT_LOCAL; vfsp->mnt_kern_flag |= MNTK_LOOKUP_SHARED; vfsp->mnt_kern_flag |= MNTK_SHARED_WRITES; vfsp->mnt_kern_flag |= MNTK_EXTENDED_SHARED; vfsp->mnt_kern_flag |= MNTK_NO_IOPF; /* vn_io_fault can be used */ /* * The fsid is 64 bits, composed of an 8-bit fs type, which * separates our fsid from any other filesystem types, and a * 56-bit objset unique ID. The objset unique ID is unique to * all objsets open on this system, provided by unique_create(). * The 8-bit fs type must be put in the low bits of fsid[1] * because that's where other Solaris filesystems put it. */ fsid_guid = dmu_objset_fsid_guid(zfsvfs->z_os); ASSERT((fsid_guid & ~((1ULL<<56)-1)) == 0); vfsp->vfs_fsid.val[0] = fsid_guid; vfsp->vfs_fsid.val[1] = ((fsid_guid>>32) << 8) | vfsp->mnt_vfc->vfc_typenum & 0xFF; /* * Set features for file system. */ zfs_set_fuid_feature(zfsvfs); if (zfsvfs->z_case == ZFS_CASE_INSENSITIVE) { vfs_set_feature(vfsp, VFSFT_DIRENTFLAGS); vfs_set_feature(vfsp, VFSFT_CASEINSENSITIVE); vfs_set_feature(vfsp, VFSFT_NOCASESENSITIVE); } else if (zfsvfs->z_case == ZFS_CASE_MIXED) { vfs_set_feature(vfsp, VFSFT_DIRENTFLAGS); vfs_set_feature(vfsp, VFSFT_CASEINSENSITIVE); } vfs_set_feature(vfsp, VFSFT_ZEROCOPY_SUPPORTED); if (dmu_objset_is_snapshot(zfsvfs->z_os)) { uint64_t pval; atime_changed_cb(zfsvfs, B_FALSE); readonly_changed_cb(zfsvfs, B_TRUE); if (error = dsl_prop_get_integer(osname, "xattr", &pval, NULL)) goto out; xattr_changed_cb(zfsvfs, pval); zfsvfs->z_issnap = B_TRUE; zfsvfs->z_os->os_sync = ZFS_SYNC_DISABLED; mutex_enter(&zfsvfs->z_os->os_user_ptr_lock); dmu_objset_set_user(zfsvfs->z_os, zfsvfs); mutex_exit(&zfsvfs->z_os->os_user_ptr_lock); } else { error = zfsvfs_setup(zfsvfs, B_TRUE); } vfs_mountedfrom(vfsp, osname); if (!zfsvfs->z_issnap) zfsctl_create(zfsvfs); out: if (error) { dmu_objset_disown(zfsvfs->z_os, zfsvfs); zfsvfs_free(zfsvfs); } else { atomic_inc_32(&zfs_active_fs_count); } return (error); } void zfs_unregister_callbacks(zfsvfs_t *zfsvfs) { objset_t *os = zfsvfs->z_os; if (!dmu_objset_is_snapshot(os)) dsl_prop_unregister_all(dmu_objset_ds(os), zfsvfs); } #ifdef SECLABEL /* * Convert a decimal digit string to a uint64_t integer. */ static int str_to_uint64(char *str, uint64_t *objnum) { uint64_t num = 0; while (*str) { if (*str < '0' || *str > '9') return (SET_ERROR(EINVAL)); num = num*10 + *str++ - '0'; } *objnum = num; return (0); } /* * The boot path passed from the boot loader is in the form of * "rootpool-name/root-filesystem-object-number'. Convert this * string to a dataset name: "rootpool-name/root-filesystem-name". */ static int zfs_parse_bootfs(char *bpath, char *outpath) { char *slashp; uint64_t objnum; int error; if (*bpath == 0 || *bpath == '/') return (SET_ERROR(EINVAL)); (void) strcpy(outpath, bpath); slashp = strchr(bpath, '/'); /* if no '/', just return the pool name */ if (slashp == NULL) { return (0); } /* if not a number, just return the root dataset name */ if (str_to_uint64(slashp+1, &objnum)) { return (0); } *slashp = '\0'; error = dsl_dsobj_to_dsname(bpath, objnum, outpath); *slashp = '/'; return (error); } /* * Check that the hex label string is appropriate for the dataset being * mounted into the global_zone proper. * * Return an error if the hex label string is not default or * admin_low/admin_high. For admin_low labels, the corresponding * dataset must be readonly. */ int zfs_check_global_label(const char *dsname, const char *hexsl) { if (strcasecmp(hexsl, ZFS_MLSLABEL_DEFAULT) == 0) return (0); if (strcasecmp(hexsl, ADMIN_HIGH) == 0) return (0); if (strcasecmp(hexsl, ADMIN_LOW) == 0) { /* must be readonly */ uint64_t rdonly; if (dsl_prop_get_integer(dsname, zfs_prop_to_name(ZFS_PROP_READONLY), &rdonly, NULL)) return (SET_ERROR(EACCES)); return (rdonly ? 0 : EACCES); } return (SET_ERROR(EACCES)); } /* * Determine whether the mount is allowed according to MAC check. * by comparing (where appropriate) label of the dataset against * the label of the zone being mounted into. If the dataset has * no label, create one. * * Returns 0 if access allowed, error otherwise (e.g. EACCES) */ static int zfs_mount_label_policy(vfs_t *vfsp, char *osname) { int error, retv; zone_t *mntzone = NULL; ts_label_t *mnt_tsl; bslabel_t *mnt_sl; bslabel_t ds_sl; char ds_hexsl[MAXNAMELEN]; retv = EACCES; /* assume the worst */ /* * Start by getting the dataset label if it exists. */ error = dsl_prop_get(osname, zfs_prop_to_name(ZFS_PROP_MLSLABEL), 1, sizeof (ds_hexsl), &ds_hexsl, NULL); if (error) return (SET_ERROR(EACCES)); /* * If labeling is NOT enabled, then disallow the mount of datasets * which have a non-default label already. No other label checks * are needed. */ if (!is_system_labeled()) { if (strcasecmp(ds_hexsl, ZFS_MLSLABEL_DEFAULT) == 0) return (0); return (SET_ERROR(EACCES)); } /* * Get the label of the mountpoint. If mounting into the global * zone (i.e. mountpoint is not within an active zone and the * zoned property is off), the label must be default or * admin_low/admin_high only; no other checks are needed. */ mntzone = zone_find_by_any_path(refstr_value(vfsp->vfs_mntpt), B_FALSE); if (mntzone->zone_id == GLOBAL_ZONEID) { uint64_t zoned; zone_rele(mntzone); if (dsl_prop_get_integer(osname, zfs_prop_to_name(ZFS_PROP_ZONED), &zoned, NULL)) return (SET_ERROR(EACCES)); if (!zoned) return (zfs_check_global_label(osname, ds_hexsl)); else /* * This is the case of a zone dataset being mounted * initially, before the zone has been fully created; * allow this mount into global zone. */ return (0); } mnt_tsl = mntzone->zone_slabel; ASSERT(mnt_tsl != NULL); label_hold(mnt_tsl); mnt_sl = label2bslabel(mnt_tsl); if (strcasecmp(ds_hexsl, ZFS_MLSLABEL_DEFAULT) == 0) { /* * The dataset doesn't have a real label, so fabricate one. */ char *str = NULL; if (l_to_str_internal(mnt_sl, &str) == 0 && dsl_prop_set_string(osname, zfs_prop_to_name(ZFS_PROP_MLSLABEL), ZPROP_SRC_LOCAL, str) == 0) retv = 0; if (str != NULL) kmem_free(str, strlen(str) + 1); } else if (hexstr_to_label(ds_hexsl, &ds_sl) == 0) { /* * Now compare labels to complete the MAC check. If the * labels are equal then allow access. If the mountpoint * label dominates the dataset label, allow readonly access. * Otherwise, access is denied. */ if (blequal(mnt_sl, &ds_sl)) retv = 0; else if (bldominates(mnt_sl, &ds_sl)) { vfs_setmntopt(vfsp, MNTOPT_RO, NULL, 0); retv = 0; } } label_rele(mnt_tsl); zone_rele(mntzone); return (retv); } #endif /* SECLABEL */ #ifdef OPENSOLARIS_MOUNTROOT static int zfs_mountroot(vfs_t *vfsp, enum whymountroot why) { int error = 0; static int zfsrootdone = 0; zfsvfs_t *zfsvfs = NULL; znode_t *zp = NULL; vnode_t *vp = NULL; char *zfs_bootfs; char *zfs_devid; ASSERT(vfsp); /* * The filesystem that we mount as root is defined in the * boot property "zfs-bootfs" with a format of * "poolname/root-dataset-objnum". */ if (why == ROOT_INIT) { if (zfsrootdone++) return (SET_ERROR(EBUSY)); /* * the process of doing a spa_load will require the * clock to be set before we could (for example) do * something better by looking at the timestamp on * an uberblock, so just set it to -1. */ clkset(-1); if ((zfs_bootfs = spa_get_bootprop("zfs-bootfs")) == NULL) { cmn_err(CE_NOTE, "spa_get_bootfs: can not get " "bootfs name"); return (SET_ERROR(EINVAL)); } zfs_devid = spa_get_bootprop("diskdevid"); error = spa_import_rootpool(rootfs.bo_name, zfs_devid); if (zfs_devid) spa_free_bootprop(zfs_devid); if (error) { spa_free_bootprop(zfs_bootfs); cmn_err(CE_NOTE, "spa_import_rootpool: error %d", error); return (error); } if (error = zfs_parse_bootfs(zfs_bootfs, rootfs.bo_name)) { spa_free_bootprop(zfs_bootfs); cmn_err(CE_NOTE, "zfs_parse_bootfs: error %d", error); return (error); } spa_free_bootprop(zfs_bootfs); if (error = vfs_lock(vfsp)) return (error); if (error = zfs_domount(vfsp, rootfs.bo_name)) { cmn_err(CE_NOTE, "zfs_domount: error %d", error); goto out; } zfsvfs = (zfsvfs_t *)vfsp->vfs_data; ASSERT(zfsvfs); if (error = zfs_zget(zfsvfs, zfsvfs->z_root, &zp)) { cmn_err(CE_NOTE, "zfs_zget: error %d", error); goto out; } vp = ZTOV(zp); mutex_enter(&vp->v_lock); vp->v_flag |= VROOT; mutex_exit(&vp->v_lock); rootvp = vp; /* * Leave rootvp held. The root file system is never unmounted. */ vfs_add((struct vnode *)0, vfsp, (vfsp->vfs_flag & VFS_RDONLY) ? MS_RDONLY : 0); out: vfs_unlock(vfsp); return (error); } else if (why == ROOT_REMOUNT) { readonly_changed_cb(vfsp->vfs_data, B_FALSE); vfsp->vfs_flag |= VFS_REMOUNT; /* refresh mount options */ zfs_unregister_callbacks(vfsp->vfs_data); return (zfs_register_callbacks(vfsp)); } else if (why == ROOT_UNMOUNT) { zfs_unregister_callbacks((zfsvfs_t *)vfsp->vfs_data); (void) zfs_sync(vfsp, 0, 0); return (0); } /* * if "why" is equal to anything else other than ROOT_INIT, * ROOT_REMOUNT, or ROOT_UNMOUNT, we do not support it. */ return (SET_ERROR(ENOTSUP)); } #endif /* OPENSOLARIS_MOUNTROOT */ static int getpoolname(const char *osname, char *poolname) { char *p; p = strchr(osname, '/'); if (p == NULL) { if (strlen(osname) >= MAXNAMELEN) return (ENAMETOOLONG); (void) strcpy(poolname, osname); } else { if (p - osname >= MAXNAMELEN) return (ENAMETOOLONG); (void) strncpy(poolname, osname, p - osname); poolname[p - osname] = '\0'; } return (0); } /*ARGSUSED*/ static int zfs_mount(vfs_t *vfsp) { kthread_t *td = curthread; vnode_t *mvp = vfsp->mnt_vnodecovered; cred_t *cr = td->td_ucred; char *osname; int error = 0; int canwrite; #ifdef illumos if (mvp->v_type != VDIR) return (SET_ERROR(ENOTDIR)); mutex_enter(&mvp->v_lock); if ((uap->flags & MS_REMOUNT) == 0 && (uap->flags & MS_OVERLAY) == 0 && (mvp->v_count != 1 || (mvp->v_flag & VROOT))) { mutex_exit(&mvp->v_lock); return (SET_ERROR(EBUSY)); } mutex_exit(&mvp->v_lock); /* * ZFS does not support passing unparsed data in via MS_DATA. * Users should use the MS_OPTIONSTR interface; this means * that all option parsing is already done and the options struct * can be interrogated. */ if ((uap->flags & MS_DATA) && uap->datalen > 0) #else /* !illumos */ if (!prison_allow(td->td_ucred, PR_ALLOW_MOUNT_ZFS)) return (SET_ERROR(EPERM)); if (vfs_getopt(vfsp->mnt_optnew, "from", (void **)&osname, NULL)) return (SET_ERROR(EINVAL)); #endif /* illumos */ /* * If full-owner-access is enabled and delegated administration is * turned on, we must set nosuid. */ if (zfs_super_owner && dsl_deleg_access(osname, ZFS_DELEG_PERM_MOUNT, cr) != ECANCELED) { secpolicy_fs_mount_clearopts(cr, vfsp); } /* * Check for mount privilege? * * If we don't have privilege then see if * we have local permission to allow it */ error = secpolicy_fs_mount(cr, mvp, vfsp); if (error) { if (dsl_deleg_access(osname, ZFS_DELEG_PERM_MOUNT, cr) != 0) goto out; if (!(vfsp->vfs_flag & MS_REMOUNT)) { vattr_t vattr; /* * Make sure user is the owner of the mount point * or has sufficient privileges. */ vattr.va_mask = AT_UID; vn_lock(mvp, LK_SHARED | LK_RETRY); if (VOP_GETATTR(mvp, &vattr, cr)) { VOP_UNLOCK(mvp, 0); goto out; } if (secpolicy_vnode_owner(mvp, cr, vattr.va_uid) != 0 && VOP_ACCESS(mvp, VWRITE, cr, td) != 0) { VOP_UNLOCK(mvp, 0); goto out; } VOP_UNLOCK(mvp, 0); } secpolicy_fs_mount_clearopts(cr, vfsp); } /* * Refuse to mount a filesystem if we are in a local zone and the * dataset is not visible. */ if (!INGLOBALZONE(curthread) && (!zone_dataset_visible(osname, &canwrite) || !canwrite)) { error = SET_ERROR(EPERM); goto out; } #ifdef SECLABEL error = zfs_mount_label_policy(vfsp, osname); if (error) goto out; #endif vfsp->vfs_flag |= MNT_NFS4ACLS; /* * When doing a remount, we simply refresh our temporary properties * according to those options set in the current VFS options. */ if (vfsp->vfs_flag & MS_REMOUNT) { zfsvfs_t *zfsvfs = vfsp->vfs_data; /* * Refresh mount options with z_teardown_lock blocking I/O while * the filesystem is in an inconsistent state. * The lock also serializes this code with filesystem * manipulations between entry to zfs_suspend_fs() and return * from zfs_resume_fs(). */ rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG); zfs_unregister_callbacks(zfsvfs); error = zfs_register_callbacks(vfsp); rrm_exit(&zfsvfs->z_teardown_lock, FTAG); goto out; } /* Initial root mount: try hard to import the requested root pool. */ if ((vfsp->vfs_flag & MNT_ROOTFS) != 0 && (vfsp->vfs_flag & MNT_UPDATE) == 0) { char pname[MAXNAMELEN]; error = getpoolname(osname, pname); if (error == 0) error = spa_import_rootpool(pname); if (error) goto out; } DROP_GIANT(); error = zfs_domount(vfsp, osname); PICKUP_GIANT(); #ifdef illumos /* * Add an extra VFS_HOLD on our parent vfs so that it can't * disappear due to a forced unmount. */ if (error == 0 && ((zfsvfs_t *)vfsp->vfs_data)->z_issnap) VFS_HOLD(mvp->v_vfsp); #endif out: return (error); } static int zfs_statfs(vfs_t *vfsp, struct statfs *statp) { zfsvfs_t *zfsvfs = vfsp->vfs_data; uint64_t refdbytes, availbytes, usedobjs, availobjs; statp->f_version = STATFS_VERSION; ZFS_ENTER(zfsvfs); dmu_objset_space(zfsvfs->z_os, &refdbytes, &availbytes, &usedobjs, &availobjs); /* * The underlying storage pool actually uses multiple block sizes. * We report the fragsize as the smallest block size we support, * and we report our blocksize as the filesystem's maximum blocksize. */ statp->f_bsize = SPA_MINBLOCKSIZE; statp->f_iosize = zfsvfs->z_vfs->mnt_stat.f_iosize; /* * The following report "total" blocks of various kinds in the * file system, but reported in terms of f_frsize - the * "fragment" size. */ statp->f_blocks = (refdbytes + availbytes) >> SPA_MINBLOCKSHIFT; statp->f_bfree = availbytes / statp->f_bsize; statp->f_bavail = statp->f_bfree; /* no root reservation */ /* * statvfs() should really be called statufs(), because it assumes * static metadata. ZFS doesn't preallocate files, so the best * we can do is report the max that could possibly fit in f_files, * and that minus the number actually used in f_ffree. * For f_ffree, report the smaller of the number of object available * and the number of blocks (each object will take at least a block). */ statp->f_ffree = MIN(availobjs, statp->f_bfree); statp->f_files = statp->f_ffree + usedobjs; /* * We're a zfs filesystem. */ (void) strlcpy(statp->f_fstypename, "zfs", sizeof(statp->f_fstypename)); strlcpy(statp->f_mntfromname, vfsp->mnt_stat.f_mntfromname, sizeof(statp->f_mntfromname)); strlcpy(statp->f_mntonname, vfsp->mnt_stat.f_mntonname, sizeof(statp->f_mntonname)); statp->f_namemax = ZFS_MAXNAMELEN; ZFS_EXIT(zfsvfs); return (0); } static int zfs_root(vfs_t *vfsp, int flags, vnode_t **vpp) { zfsvfs_t *zfsvfs = vfsp->vfs_data; znode_t *rootzp; int error; ZFS_ENTER(zfsvfs); error = zfs_zget(zfsvfs, zfsvfs->z_root, &rootzp); if (error == 0) *vpp = ZTOV(rootzp); ZFS_EXIT(zfsvfs); if (error == 0) { error = vn_lock(*vpp, flags); if (error != 0) { VN_RELE(*vpp); *vpp = NULL; } } return (error); } /* * Teardown the zfsvfs::z_os. * * Note, if 'unmounting' if FALSE, we return with the 'z_teardown_lock' * and 'z_teardown_inactive_lock' held. */ static int zfsvfs_teardown(zfsvfs_t *zfsvfs, boolean_t unmounting) { znode_t *zp; rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG); if (!unmounting) { /* * We purge the parent filesystem's vfsp as the parent * filesystem and all of its snapshots have their vnode's * v_vfsp set to the parent's filesystem's vfsp. Note, * 'z_parent' is self referential for non-snapshots. */ (void) dnlc_purge_vfsp(zfsvfs->z_parent->z_vfs, 0); #ifdef FREEBSD_NAMECACHE cache_purgevfs(zfsvfs->z_parent->z_vfs); #endif } /* * Close the zil. NB: Can't close the zil while zfs_inactive * threads are blocked as zil_close can call zfs_inactive. */ if (zfsvfs->z_log) { zil_close(zfsvfs->z_log); zfsvfs->z_log = NULL; } rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_WRITER); /* * If we are not unmounting (ie: online recv) and someone already * unmounted this file system while we were doing the switcheroo, * or a reopen of z_os failed then just bail out now. */ if (!unmounting && (zfsvfs->z_unmounted || zfsvfs->z_os == NULL)) { rw_exit(&zfsvfs->z_teardown_inactive_lock); rrm_exit(&zfsvfs->z_teardown_lock, FTAG); return (SET_ERROR(EIO)); } /* * At this point there are no vops active, and any new vops will * fail with EIO since we have z_teardown_lock for writer (only * relavent for forced unmount). * * Release all holds on dbufs. */ mutex_enter(&zfsvfs->z_znodes_lock); for (zp = list_head(&zfsvfs->z_all_znodes); zp != NULL; zp = list_next(&zfsvfs->z_all_znodes, zp)) if (zp->z_sa_hdl) { ASSERT(ZTOV(zp)->v_count >= 0); zfs_znode_dmu_fini(zp); } mutex_exit(&zfsvfs->z_znodes_lock); /* * If we are unmounting, set the unmounted flag and let new vops * unblock. zfs_inactive will have the unmounted behavior, and all * other vops will fail with EIO. */ if (unmounting) { zfsvfs->z_unmounted = B_TRUE; rrm_exit(&zfsvfs->z_teardown_lock, FTAG); rw_exit(&zfsvfs->z_teardown_inactive_lock); } /* * z_os will be NULL if there was an error in attempting to reopen * zfsvfs, so just return as the properties had already been * unregistered and cached data had been evicted before. */ if (zfsvfs->z_os == NULL) return (0); /* * Unregister properties. */ zfs_unregister_callbacks(zfsvfs); /* * Evict cached data */ if (dsl_dataset_is_dirty(dmu_objset_ds(zfsvfs->z_os)) && !(zfsvfs->z_vfs->vfs_flag & VFS_RDONLY)) txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0); dmu_objset_evict_dbufs(zfsvfs->z_os); return (0); } /*ARGSUSED*/ static int zfs_umount(vfs_t *vfsp, int fflag) { kthread_t *td = curthread; zfsvfs_t *zfsvfs = vfsp->vfs_data; objset_t *os; cred_t *cr = td->td_ucred; int ret; ret = secpolicy_fs_unmount(cr, vfsp); if (ret) { if (dsl_deleg_access((char *)refstr_value(vfsp->vfs_resource), ZFS_DELEG_PERM_MOUNT, cr)) return (ret); } /* * We purge the parent filesystem's vfsp as the parent filesystem * and all of its snapshots have their vnode's v_vfsp set to the * parent's filesystem's vfsp. Note, 'z_parent' is self * referential for non-snapshots. */ (void) dnlc_purge_vfsp(zfsvfs->z_parent->z_vfs, 0); /* * Unmount any snapshots mounted under .zfs before unmounting the * dataset itself. */ if (zfsvfs->z_ctldir != NULL) { if ((ret = zfsctl_umount_snapshots(vfsp, fflag, cr)) != 0) return (ret); ret = vflush(vfsp, 0, 0, td); ASSERT(ret == EBUSY); if (!(fflag & MS_FORCE)) { if (zfsvfs->z_ctldir->v_count > 1) return (EBUSY); ASSERT(zfsvfs->z_ctldir->v_count == 1); } zfsctl_destroy(zfsvfs); ASSERT(zfsvfs->z_ctldir == NULL); } if (fflag & MS_FORCE) { /* * Mark file system as unmounted before calling * vflush(FORCECLOSE). This way we ensure no future vnops * will be called and risk operating on DOOMED vnodes. */ rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG); zfsvfs->z_unmounted = B_TRUE; rrm_exit(&zfsvfs->z_teardown_lock, FTAG); } /* * Flush all the files. */ ret = vflush(vfsp, 0, (fflag & MS_FORCE) ? FORCECLOSE : 0, td); if (ret != 0) { if (!zfsvfs->z_issnap) { zfsctl_create(zfsvfs); ASSERT(zfsvfs->z_ctldir != NULL); } return (ret); } #ifdef illumos if (!(fflag & MS_FORCE)) { /* * Check the number of active vnodes in the file system. * Our count is maintained in the vfs structure, but the * number is off by 1 to indicate a hold on the vfs * structure itself. * * The '.zfs' directory maintains a reference of its * own, and any active references underneath are * reflected in the vnode count. */ if (zfsvfs->z_ctldir == NULL) { if (vfsp->vfs_count > 1) return (SET_ERROR(EBUSY)); } else { if (vfsp->vfs_count > 2 || zfsvfs->z_ctldir->v_count > 1) return (SET_ERROR(EBUSY)); } } #endif VERIFY(zfsvfs_teardown(zfsvfs, B_TRUE) == 0); os = zfsvfs->z_os; /* * z_os will be NULL if there was an error in * attempting to reopen zfsvfs. */ if (os != NULL) { /* * Unset the objset user_ptr. */ mutex_enter(&os->os_user_ptr_lock); dmu_objset_set_user(os, NULL); mutex_exit(&os->os_user_ptr_lock); /* * Finally release the objset */ dmu_objset_disown(os, zfsvfs); } /* * We can now safely destroy the '.zfs' directory node. */ if (zfsvfs->z_ctldir != NULL) zfsctl_destroy(zfsvfs); zfs_freevfs(vfsp); return (0); } static int zfs_vget(vfs_t *vfsp, ino_t ino, int flags, vnode_t **vpp) { zfsvfs_t *zfsvfs = vfsp->vfs_data; znode_t *zp; int err; /* * zfs_zget() can't operate on virtual entries like .zfs/ or * .zfs/snapshot/ directories, that's why we return EOPNOTSUPP. * This will make NFS to switch to LOOKUP instead of using VGET. */ if (ino == ZFSCTL_INO_ROOT || ino == ZFSCTL_INO_SNAPDIR || (zfsvfs->z_shares_dir != 0 && ino == zfsvfs->z_shares_dir)) return (EOPNOTSUPP); ZFS_ENTER(zfsvfs); err = zfs_zget(zfsvfs, ino, &zp); if (err == 0 && zp->z_unlinked) { - VN_RELE(ZTOV(zp)); + vrele(ZTOV(zp)); err = EINVAL; } if (err == 0) *vpp = ZTOV(zp); ZFS_EXIT(zfsvfs); if (err == 0) err = vn_lock(*vpp, flags); if (err != 0) *vpp = NULL; return (err); } static int zfs_checkexp(vfs_t *vfsp, struct sockaddr *nam, int *extflagsp, struct ucred **credanonp, int *numsecflavors, int **secflavors) { zfsvfs_t *zfsvfs = vfsp->vfs_data; /* * If this is regular file system vfsp is the same as * zfsvfs->z_parent->z_vfs, but if it is snapshot, * zfsvfs->z_parent->z_vfs represents parent file system * which we have to use here, because only this file system * has mnt_export configured. */ return (vfs_stdcheckexp(zfsvfs->z_parent->z_vfs, nam, extflagsp, credanonp, numsecflavors, secflavors)); } CTASSERT(SHORT_FID_LEN <= sizeof(struct fid)); CTASSERT(LONG_FID_LEN <= sizeof(struct fid)); static int zfs_fhtovp(vfs_t *vfsp, fid_t *fidp, int flags, vnode_t **vpp) { zfsvfs_t *zfsvfs = vfsp->vfs_data; znode_t *zp; uint64_t object = 0; uint64_t fid_gen = 0; uint64_t gen_mask; uint64_t zp_gen; int i, err; *vpp = NULL; ZFS_ENTER(zfsvfs); /* * On FreeBSD we can get snapshot's mount point or its parent file * system mount point depending if snapshot is already mounted or not. */ if (zfsvfs->z_parent == zfsvfs && fidp->fid_len == LONG_FID_LEN) { zfid_long_t *zlfid = (zfid_long_t *)fidp; uint64_t objsetid = 0; uint64_t setgen = 0; for (i = 0; i < sizeof (zlfid->zf_setid); i++) objsetid |= ((uint64_t)zlfid->zf_setid[i]) << (8 * i); for (i = 0; i < sizeof (zlfid->zf_setgen); i++) setgen |= ((uint64_t)zlfid->zf_setgen[i]) << (8 * i); ZFS_EXIT(zfsvfs); err = zfsctl_lookup_objset(vfsp, objsetid, &zfsvfs); if (err) return (SET_ERROR(EINVAL)); ZFS_ENTER(zfsvfs); } if (fidp->fid_len == SHORT_FID_LEN || fidp->fid_len == LONG_FID_LEN) { zfid_short_t *zfid = (zfid_short_t *)fidp; for (i = 0; i < sizeof (zfid->zf_object); i++) object |= ((uint64_t)zfid->zf_object[i]) << (8 * i); for (i = 0; i < sizeof (zfid->zf_gen); i++) fid_gen |= ((uint64_t)zfid->zf_gen[i]) << (8 * i); } else { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } /* * A zero fid_gen means we are in .zfs or the .zfs/snapshot * directory tree. If the object == zfsvfs->z_shares_dir, then * we are in the .zfs/shares directory tree. */ if ((fid_gen == 0 && (object == ZFSCTL_INO_ROOT || object == ZFSCTL_INO_SNAPDIR)) || (zfsvfs->z_shares_dir != 0 && object == zfsvfs->z_shares_dir)) { *vpp = zfsvfs->z_ctldir; ASSERT(*vpp != NULL); if (object == ZFSCTL_INO_SNAPDIR) { VERIFY(zfsctl_root_lookup(*vpp, "snapshot", vpp, NULL, 0, NULL, NULL, NULL, NULL, NULL) == 0); } else if (object == zfsvfs->z_shares_dir) { VERIFY(zfsctl_root_lookup(*vpp, "shares", vpp, NULL, 0, NULL, NULL, NULL, NULL, NULL) == 0); } else { - VN_HOLD(*vpp); + vref(*vpp); } ZFS_EXIT(zfsvfs); err = vn_lock(*vpp, flags); if (err != 0) *vpp = NULL; return (err); } gen_mask = -1ULL >> (64 - 8 * i); dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask); if (err = zfs_zget(zfsvfs, object, &zp)) { ZFS_EXIT(zfsvfs); return (err); } (void) sa_lookup(zp->z_sa_hdl, SA_ZPL_GEN(zfsvfs), &zp_gen, sizeof (uint64_t)); zp_gen = zp_gen & gen_mask; if (zp_gen == 0) zp_gen = 1; if (zp->z_unlinked || zp_gen != fid_gen) { dprintf("znode gen (%u) != fid gen (%u)\n", zp_gen, fid_gen); - VN_RELE(ZTOV(zp)); + vrele(ZTOV(zp)); ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } *vpp = ZTOV(zp); ZFS_EXIT(zfsvfs); err = vn_lock(*vpp, flags | LK_RETRY); if (err == 0) vnode_create_vobject(*vpp, zp->z_size, curthread); else *vpp = NULL; return (err); } /* * Block out VOPs and close zfsvfs_t::z_os * * Note, if successful, then we return with the 'z_teardown_lock' and * 'z_teardown_inactive_lock' write held. We leave ownership of the underlying * dataset and objset intact so that they can be atomically handed off during * a subsequent rollback or recv operation and the resume thereafter. */ int zfs_suspend_fs(zfsvfs_t *zfsvfs) { int error; if ((error = zfsvfs_teardown(zfsvfs, B_FALSE)) != 0) return (error); return (0); } /* * Rebuild SA and release VOPs. Note that ownership of the underlying dataset * is an invariant across any of the operations that can be performed while the * filesystem was suspended. Whether it succeeded or failed, the preconditions * are the same: the relevant objset and associated dataset are owned by * zfsvfs, held, and long held on entry. */ int zfs_resume_fs(zfsvfs_t *zfsvfs, const char *osname) { int err; znode_t *zp; ASSERT(RRM_WRITE_HELD(&zfsvfs->z_teardown_lock)); ASSERT(RW_WRITE_HELD(&zfsvfs->z_teardown_inactive_lock)); /* * We already own this, so just hold and rele it to update the * objset_t, as the one we had before may have been evicted. */ objset_t *os; VERIFY0(dmu_objset_hold(osname, zfsvfs, &os)); VERIFY3P(os->os_dsl_dataset->ds_owner, ==, zfsvfs); VERIFY(dsl_dataset_long_held(os->os_dsl_dataset)); dmu_objset_rele(os, zfsvfs); err = zfsvfs_init(zfsvfs, os); if (err != 0) goto bail; VERIFY(zfsvfs_setup(zfsvfs, B_FALSE) == 0); zfs_set_fuid_feature(zfsvfs); /* * Attempt to re-establish all the active znodes with * their dbufs. If a zfs_rezget() fails, then we'll let * any potential callers discover that via ZFS_ENTER_VERIFY_VP * when they try to use their znode. */ mutex_enter(&zfsvfs->z_znodes_lock); for (zp = list_head(&zfsvfs->z_all_znodes); zp; zp = list_next(&zfsvfs->z_all_znodes, zp)) { (void) zfs_rezget(zp); } mutex_exit(&zfsvfs->z_znodes_lock); bail: /* release the VOPs */ rw_exit(&zfsvfs->z_teardown_inactive_lock); rrm_exit(&zfsvfs->z_teardown_lock, FTAG); if (err) { /* * Since we couldn't setup the sa framework, try to force * unmount this file system. */ if (vn_vfswlock(zfsvfs->z_vfs->vfs_vnodecovered) == 0) { vfs_ref(zfsvfs->z_vfs); (void) dounmount(zfsvfs->z_vfs, MS_FORCE, curthread); } } return (err); } static void zfs_freevfs(vfs_t *vfsp) { zfsvfs_t *zfsvfs = vfsp->vfs_data; #ifdef illumos /* * If this is a snapshot, we have an extra VFS_HOLD on our parent * from zfs_mount(). Release it here. If we came through * zfs_mountroot() instead, we didn't grab an extra hold, so * skip the VFS_RELE for rootvfs. */ if (zfsvfs->z_issnap && (vfsp != rootvfs)) VFS_RELE(zfsvfs->z_parent->z_vfs); #endif zfsvfs_free(zfsvfs); atomic_dec_32(&zfs_active_fs_count); } #ifdef __i386__ static int desiredvnodes_backup; #endif static void zfs_vnodes_adjust(void) { #ifdef __i386__ int newdesiredvnodes; desiredvnodes_backup = desiredvnodes; /* * We calculate newdesiredvnodes the same way it is done in * vntblinit(). If it is equal to desiredvnodes, it means that * it wasn't tuned by the administrator and we can tune it down. */ newdesiredvnodes = min(maxproc + vm_cnt.v_page_count / 4, 2 * vm_kmem_size / (5 * (sizeof(struct vm_object) + sizeof(struct vnode)))); if (newdesiredvnodes == desiredvnodes) desiredvnodes = (3 * newdesiredvnodes) / 4; #endif } static void zfs_vnodes_adjust_back(void) { #ifdef __i386__ desiredvnodes = desiredvnodes_backup; #endif } void zfs_init(void) { printf("ZFS filesystem version: " ZPL_VERSION_STRING "\n"); /* * Initialize .zfs directory structures */ zfsctl_init(); /* * Initialize znode cache, vnode ops, etc... */ zfs_znode_init(); /* * Reduce number of vnodes. Originally number of vnodes is calculated * with UFS inode in mind. We reduce it here, because it's too big for * ZFS/i386. */ zfs_vnodes_adjust(); dmu_objset_register_type(DMU_OST_ZFS, zfs_space_delta_cb); } void zfs_fini(void) { zfsctl_fini(); zfs_znode_fini(); zfs_vnodes_adjust_back(); } int zfs_busy(void) { return (zfs_active_fs_count != 0); } int zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers) { int error; objset_t *os = zfsvfs->z_os; dmu_tx_t *tx; if (newvers < ZPL_VERSION_INITIAL || newvers > ZPL_VERSION) return (SET_ERROR(EINVAL)); if (newvers < zfsvfs->z_version) return (SET_ERROR(EINVAL)); if (zfs_spa_version_map(newvers) > spa_version(dmu_objset_spa(zfsvfs->z_os))) return (SET_ERROR(ENOTSUP)); tx = dmu_tx_create(os); dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_FALSE, ZPL_VERSION_STR); if (newvers >= ZPL_VERSION_SA && !zfsvfs->z_use_sa) { dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE, ZFS_SA_ATTRS); dmu_tx_hold_zap(tx, DMU_NEW_OBJECT, FALSE, NULL); } error = dmu_tx_assign(tx, TXG_WAIT); if (error) { dmu_tx_abort(tx); return (error); } error = zap_update(os, MASTER_NODE_OBJ, ZPL_VERSION_STR, 8, 1, &newvers, tx); if (error) { dmu_tx_commit(tx); return (error); } if (newvers >= ZPL_VERSION_SA && !zfsvfs->z_use_sa) { uint64_t sa_obj; ASSERT3U(spa_version(dmu_objset_spa(zfsvfs->z_os)), >=, SPA_VERSION_SA); sa_obj = zap_create(os, DMU_OT_SA_MASTER_NODE, DMU_OT_NONE, 0, tx); error = zap_add(os, MASTER_NODE_OBJ, ZFS_SA_ATTRS, 8, 1, &sa_obj, tx); ASSERT0(error); VERIFY(0 == sa_set_sa_object(os, sa_obj)); sa_register_update_callback(os, zfs_sa_upgrade); } spa_history_log_internal_ds(dmu_objset_ds(os), "upgrade", tx, "from %llu to %llu", zfsvfs->z_version, newvers); dmu_tx_commit(tx); zfsvfs->z_version = newvers; zfs_set_fuid_feature(zfsvfs); return (0); } /* * Read a property stored within the master node. */ int zfs_get_zplprop(objset_t *os, zfs_prop_t prop, uint64_t *value) { const char *pname; int error = ENOENT; /* * Look up the file system's value for the property. For the * version property, we look up a slightly different string. */ if (prop == ZFS_PROP_VERSION) pname = ZPL_VERSION_STR; else pname = zfs_prop_to_name(prop); if (os != NULL) error = zap_lookup(os, MASTER_NODE_OBJ, pname, 8, 1, value); if (error == ENOENT) { /* No value set, use the default value */ switch (prop) { case ZFS_PROP_VERSION: *value = ZPL_VERSION; break; case ZFS_PROP_NORMALIZE: case ZFS_PROP_UTF8ONLY: *value = 0; break; case ZFS_PROP_CASE: *value = ZFS_CASE_SENSITIVE; break; default: return (error); } error = 0; } return (error); } #ifdef _KERNEL void zfsvfs_update_fromname(const char *oldname, const char *newname) { char tmpbuf[MAXPATHLEN]; struct mount *mp; char *fromname; size_t oldlen; oldlen = strlen(oldname); mtx_lock(&mountlist_mtx); TAILQ_FOREACH(mp, &mountlist, mnt_list) { fromname = mp->mnt_stat.f_mntfromname; if (strcmp(fromname, oldname) == 0) { (void)strlcpy(fromname, newname, sizeof(mp->mnt_stat.f_mntfromname)); continue; } if (strncmp(fromname, oldname, oldlen) == 0 && (fromname[oldlen] == '/' || fromname[oldlen] == '@')) { (void)snprintf(tmpbuf, sizeof(tmpbuf), "%s%s", newname, fromname + oldlen); (void)strlcpy(fromname, tmpbuf, sizeof(mp->mnt_stat.f_mntfromname)); continue; } } mtx_unlock(&mountlist_mtx); } #endif Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c =================================================================== --- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (revision 303774) +++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c (revision 303775) @@ -1,7284 +1,6025 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2012, 2015 by Delphix. All rights reserved. * Copyright 2014 Nexenta Systems, Inc. All rights reserved. * Copyright (c) 2014 Integros [integros.com] */ /* Portions Copyright 2007 Jeremy Teo */ /* Portions Copyright 2010 Robert Milkowski */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include -#include #include #include #include #include #include #include #include #include /* * Programming rules. * * Each vnode op performs some logical unit of work. To do this, the ZPL must * properly lock its in-core state, create a DMU transaction, do the work, * record this work in the intent log (ZIL), commit the DMU transaction, * and wait for the intent log to commit if it is a synchronous operation. * Moreover, the vnode ops must work in both normal and log replay context. * The ordering of events is important to avoid deadlocks and references * to freed memory. The example below illustrates the following Big Rules: * * (1) A check must be made in each zfs thread for a mounted file system. * This is done avoiding races using ZFS_ENTER(zfsvfs). * A ZFS_EXIT(zfsvfs) is needed before all returns. Any znodes * must be checked with ZFS_VERIFY_ZP(zp). Both of these macros * can return EIO from the calling function. * * (2) VN_RELE() should always be the last thing except for zil_commit() * (if necessary) and ZFS_EXIT(). This is for 3 reasons: * First, if it's the last reference, the vnode/znode * can be freed, so the zp may point to freed memory. Second, the last * reference will call zfs_zinactive(), which may induce a lot of work -- * pushing cached pages (which acquires range locks) and syncing out * cached atime changes. Third, zfs_zinactive() may require a new tx, * which could deadlock the system if you were already holding one. * If you must call VN_RELE() within a tx then use VN_RELE_ASYNC(). * * (3) All range locks must be grabbed before calling dmu_tx_assign(), * as they can span dmu_tx_assign() calls. * * (4) If ZPL locks are held, pass TXG_NOWAIT as the second argument to * dmu_tx_assign(). This is critical because we don't want to block * while holding locks. * * If no ZPL locks are held (aside from ZFS_ENTER()), use TXG_WAIT. This * reduces lock contention and CPU usage when we must wait (note that if * throughput is constrained by the storage, nearly every transaction * must wait). * * Note, in particular, that if a lock is sometimes acquired before * the tx assigns, and sometimes after (e.g. z_lock), then failing * to use a non-blocking assign can deadlock the system. The scenario: * * Thread A has grabbed a lock before calling dmu_tx_assign(). * Thread B is in an already-assigned tx, and blocks for this lock. * Thread A calls dmu_tx_assign(TXG_WAIT) and blocks in txg_wait_open() * forever, because the previous txg can't quiesce until B's tx commits. * * If dmu_tx_assign() returns ERESTART and zfsvfs->z_assign is TXG_NOWAIT, * then drop all locks, call dmu_tx_wait(), and try again. On subsequent * calls to dmu_tx_assign(), pass TXG_WAITED rather than TXG_NOWAIT, * to indicate that this operation has already called dmu_tx_wait(). * This will ensure that we don't retry forever, waiting a short bit * each time. * * (5) If the operation succeeded, generate the intent log entry for it * before dropping locks. This ensures that the ordering of events * in the intent log matches the order in which they actually occurred. * During ZIL replay the zfs_log_* functions will update the sequence * number to indicate the zil transaction has replayed. * * (6) At the end of each vnode op, the DMU tx must always commit, * regardless of whether there were any errors. * * (7) After dropping all locks, invoke zil_commit(zilog, foid) * to ensure that synchronous semantics are provided when necessary. * * In general, this is how things should be ordered in each vnode op: * * ZFS_ENTER(zfsvfs); // exit if unmounted * top: - * zfs_dirent_lock(&dl, ...) // lock directory entry (may VN_HOLD()) + * zfs_dirent_lookup(&dl, ...) // lock directory entry (may VN_HOLD()) * rw_enter(...); // grab any other locks you need * tx = dmu_tx_create(...); // get DMU tx * dmu_tx_hold_*(); // hold each object you might modify * error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT); * if (error) { * rw_exit(...); // drop locks * zfs_dirent_unlock(dl); // unlock directory entry * VN_RELE(...); // release held vnodes * if (error == ERESTART) { * waited = B_TRUE; * dmu_tx_wait(tx); * dmu_tx_abort(tx); * goto top; * } * dmu_tx_abort(tx); // abort DMU tx * ZFS_EXIT(zfsvfs); // finished in zfs * return (error); // really out of space * } * error = do_real_work(); // do whatever this VOP does * if (error == 0) * zfs_log_*(...); // on success, make ZIL entry * dmu_tx_commit(tx); // commit DMU tx -- error or not * rw_exit(...); // drop locks * zfs_dirent_unlock(dl); // unlock directory entry * VN_RELE(...); // release held vnodes * zil_commit(zilog, foid); // synchronous when necessary * ZFS_EXIT(zfsvfs); // finished in zfs * return (error); // done, report error */ /* ARGSUSED */ static int zfs_open(vnode_t **vpp, int flag, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(*vpp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); if ((flag & FWRITE) && (zp->z_pflags & ZFS_APPENDONLY) && ((flag & FAPPEND) == 0)) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } if (!zfs_has_ctldir(zp) && zp->z_zfsvfs->z_vscan && ZTOV(zp)->v_type == VREG && !(zp->z_pflags & ZFS_AV_QUARANTINED) && zp->z_size > 0) { if (fs_vscan(*vpp, cr, 0) != 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EACCES)); } } /* Keep a count of the synchronous opens in the znode */ if (flag & (FSYNC | FDSYNC)) atomic_inc_32(&zp->z_sync_cnt); ZFS_EXIT(zfsvfs); return (0); } /* ARGSUSED */ static int zfs_close(vnode_t *vp, int flag, int count, offset_t offset, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; /* * Clean up any locks held by this process on the vp. */ cleanlocks(vp, ddi_get_pid(), 0); cleanshares(vp, ddi_get_pid()); ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); /* Decrement the synchronous opens in the znode */ if ((flag & (FSYNC | FDSYNC)) && (count == 1)) atomic_dec_32(&zp->z_sync_cnt); if (!zfs_has_ctldir(zp) && zp->z_zfsvfs->z_vscan && ZTOV(zp)->v_type == VREG && !(zp->z_pflags & ZFS_AV_QUARANTINED) && zp->z_size > 0) VERIFY(fs_vscan(vp, cr, 1) == 0); ZFS_EXIT(zfsvfs); return (0); } /* * Lseek support for finding holes (cmd == _FIO_SEEK_HOLE) and * data (cmd == _FIO_SEEK_DATA). "off" is an in/out parameter. */ static int zfs_holey(vnode_t *vp, u_long cmd, offset_t *off) { znode_t *zp = VTOZ(vp); uint64_t noff = (uint64_t)*off; /* new offset */ uint64_t file_sz; int error; boolean_t hole; file_sz = zp->z_size; if (noff >= file_sz) { return (SET_ERROR(ENXIO)); } if (cmd == _FIO_SEEK_HOLE) hole = B_TRUE; else hole = B_FALSE; error = dmu_offset_next(zp->z_zfsvfs->z_os, zp->z_id, hole, &noff); if (error == ESRCH) return (SET_ERROR(ENXIO)); /* * We could find a hole that begins after the logical end-of-file, * because dmu_offset_next() only works on whole blocks. If the * EOF falls mid-block, then indicate that the "virtual hole" * at the end of the file begins at the logical EOF, rather than * at the end of the last block. */ if (noff > file_sz) { ASSERT(hole); noff = file_sz; } if (noff < *off) return (error); *off = noff; return (error); } /* ARGSUSED */ static int zfs_ioctl(vnode_t *vp, u_long com, intptr_t data, int flag, cred_t *cred, int *rvalp, caller_context_t *ct) { offset_t off; offset_t ndata; dmu_object_info_t doi; int error; zfsvfs_t *zfsvfs; znode_t *zp; switch (com) { case _FIOFFS: { return (0); /* * The following two ioctls are used by bfu. Faking out, * necessary to avoid bfu errors. */ } case _FIOGDIO: case _FIOSDIO: { return (0); } case _FIO_SEEK_DATA: case _FIO_SEEK_HOLE: { #ifdef illumos if (ddi_copyin((void *)data, &off, sizeof (off), flag)) return (SET_ERROR(EFAULT)); #else off = *(offset_t *)data; #endif zp = VTOZ(vp); zfsvfs = zp->z_zfsvfs; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); /* offset parameter is in/out */ error = zfs_holey(vp, com, &off); ZFS_EXIT(zfsvfs); if (error) return (error); #ifdef illumos if (ddi_copyout(&off, (void *)data, sizeof (off), flag)) return (SET_ERROR(EFAULT)); #else *(offset_t *)data = off; #endif return (0); } #ifdef illumos case _FIO_COUNT_FILLED: { /* * _FIO_COUNT_FILLED adds a new ioctl command which * exposes the number of filled blocks in a * ZFS object. */ zp = VTOZ(vp); zfsvfs = zp->z_zfsvfs; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); /* * Wait for all dirty blocks for this object * to get synced out to disk, and the DMU info * updated. */ error = dmu_object_wait_synced(zfsvfs->z_os, zp->z_id); if (error) { ZFS_EXIT(zfsvfs); return (error); } /* * Retrieve fill count from DMU object. */ error = dmu_object_info(zfsvfs->z_os, zp->z_id, &doi); if (error) { ZFS_EXIT(zfsvfs); return (error); } ndata = doi.doi_fill_count; ZFS_EXIT(zfsvfs); if (ddi_copyout(&ndata, (void *)data, sizeof (ndata), flag)) return (SET_ERROR(EFAULT)); return (0); } #endif } return (SET_ERROR(ENOTTY)); } static vm_page_t page_busy(vnode_t *vp, int64_t start, int64_t off, int64_t nbytes) { vm_object_t obj; vm_page_t pp; int64_t end; /* * At present vm_page_clear_dirty extends the cleared range to DEV_BSIZE * aligned boundaries, if the range is not aligned. As a result a * DEV_BSIZE subrange with partially dirty data may get marked as clean. * It may happen that all DEV_BSIZE subranges are marked clean and thus * the whole page would be considred clean despite have some dirty data. * For this reason we should shrink the range to DEV_BSIZE aligned * boundaries before calling vm_page_clear_dirty. */ end = rounddown2(off + nbytes, DEV_BSIZE); off = roundup2(off, DEV_BSIZE); nbytes = end - off; obj = vp->v_object; zfs_vmobject_assert_wlocked(obj); for (;;) { if ((pp = vm_page_lookup(obj, OFF_TO_IDX(start))) != NULL && pp->valid) { if (vm_page_xbusied(pp)) { /* * Reference the page before unlocking and * sleeping so that the page daemon is less * likely to reclaim it. */ vm_page_reference(pp); vm_page_lock(pp); zfs_vmobject_wunlock(obj); vm_page_busy_sleep(pp, "zfsmwb"); zfs_vmobject_wlock(obj); continue; } vm_page_sbusy(pp); } else if (pp == NULL) { pp = vm_page_alloc(obj, OFF_TO_IDX(start), VM_ALLOC_SYSTEM | VM_ALLOC_IFCACHED | VM_ALLOC_SBUSY); } else { ASSERT(pp != NULL && !pp->valid); pp = NULL; } if (pp != NULL) { ASSERT3U(pp->valid, ==, VM_PAGE_BITS_ALL); vm_object_pip_add(obj, 1); pmap_remove_write(pp); if (nbytes != 0) vm_page_clear_dirty(pp, off, nbytes); } break; } return (pp); } static void page_unbusy(vm_page_t pp) { vm_page_sunbusy(pp); vm_object_pip_subtract(pp->object, 1); } static vm_page_t page_hold(vnode_t *vp, int64_t start) { vm_object_t obj; vm_page_t pp; obj = vp->v_object; zfs_vmobject_assert_wlocked(obj); for (;;) { if ((pp = vm_page_lookup(obj, OFF_TO_IDX(start))) != NULL && pp->valid) { if (vm_page_xbusied(pp)) { /* * Reference the page before unlocking and * sleeping so that the page daemon is less * likely to reclaim it. */ vm_page_reference(pp); vm_page_lock(pp); zfs_vmobject_wunlock(obj); vm_page_busy_sleep(pp, "zfsmwb"); zfs_vmobject_wlock(obj); continue; } ASSERT3U(pp->valid, ==, VM_PAGE_BITS_ALL); vm_page_lock(pp); vm_page_hold(pp); vm_page_unlock(pp); } else pp = NULL; break; } return (pp); } static void page_unhold(vm_page_t pp) { vm_page_lock(pp); vm_page_unhold(pp); vm_page_unlock(pp); } /* * When a file is memory mapped, we must keep the IO data synchronized * between the DMU cache and the memory mapped pages. What this means: * * On Write: If we find a memory mapped page, we write to *both* * the page and the dmu buffer. */ static void update_pages(vnode_t *vp, int64_t start, int len, objset_t *os, uint64_t oid, int segflg, dmu_tx_t *tx) { vm_object_t obj; struct sf_buf *sf; caddr_t va; int off; ASSERT(segflg != UIO_NOCOPY); ASSERT(vp->v_mount != NULL); obj = vp->v_object; ASSERT(obj != NULL); off = start & PAGEOFFSET; zfs_vmobject_wlock(obj); for (start &= PAGEMASK; len > 0; start += PAGESIZE) { vm_page_t pp; int nbytes = imin(PAGESIZE - off, len); if ((pp = page_busy(vp, start, off, nbytes)) != NULL) { zfs_vmobject_wunlock(obj); va = zfs_map_page(pp, &sf); (void) dmu_read(os, oid, start+off, nbytes, va+off, DMU_READ_PREFETCH);; zfs_unmap_page(sf); zfs_vmobject_wlock(obj); page_unbusy(pp); } len -= nbytes; off = 0; } vm_object_pip_wakeupn(obj, 0); zfs_vmobject_wunlock(obj); } /* * Read with UIO_NOCOPY flag means that sendfile(2) requests * ZFS to populate a range of page cache pages with data. * * NOTE: this function could be optimized to pre-allocate * all pages in advance, drain exclusive busy on all of them, * map them into contiguous KVA region and populate them * in one single dmu_read() call. */ static int mappedread_sf(vnode_t *vp, int nbytes, uio_t *uio) { znode_t *zp = VTOZ(vp); objset_t *os = zp->z_zfsvfs->z_os; struct sf_buf *sf; vm_object_t obj; vm_page_t pp; int64_t start; caddr_t va; int len = nbytes; int off; int error = 0; ASSERT(uio->uio_segflg == UIO_NOCOPY); ASSERT(vp->v_mount != NULL); obj = vp->v_object; ASSERT(obj != NULL); ASSERT((uio->uio_loffset & PAGEOFFSET) == 0); zfs_vmobject_wlock(obj); for (start = uio->uio_loffset; len > 0; start += PAGESIZE) { int bytes = MIN(PAGESIZE, len); pp = vm_page_grab(obj, OFF_TO_IDX(start), VM_ALLOC_SBUSY | VM_ALLOC_NORMAL | VM_ALLOC_IGN_SBUSY); if (pp->valid == 0) { zfs_vmobject_wunlock(obj); va = zfs_map_page(pp, &sf); error = dmu_read(os, zp->z_id, start, bytes, va, DMU_READ_PREFETCH); if (bytes != PAGESIZE && error == 0) bzero(va + bytes, PAGESIZE - bytes); zfs_unmap_page(sf); zfs_vmobject_wlock(obj); vm_page_sunbusy(pp); vm_page_lock(pp); if (error) { if (pp->wire_count == 0 && pp->valid == 0 && !vm_page_busied(pp)) vm_page_free(pp); } else { pp->valid = VM_PAGE_BITS_ALL; vm_page_activate(pp); } vm_page_unlock(pp); } else { ASSERT3U(pp->valid, ==, VM_PAGE_BITS_ALL); vm_page_sunbusy(pp); } if (error) break; uio->uio_resid -= bytes; uio->uio_offset += bytes; len -= bytes; } zfs_vmobject_wunlock(obj); return (error); } /* * When a file is memory mapped, we must keep the IO data synchronized * between the DMU cache and the memory mapped pages. What this means: * * On Read: We "read" preferentially from memory mapped pages, * else we default from the dmu buffer. * * NOTE: We will always "break up" the IO into PAGESIZE uiomoves when * the file is memory mapped. */ static int mappedread(vnode_t *vp, int nbytes, uio_t *uio) { znode_t *zp = VTOZ(vp); vm_object_t obj; int64_t start; caddr_t va; int len = nbytes; int off; int error = 0; ASSERT(vp->v_mount != NULL); obj = vp->v_object; ASSERT(obj != NULL); start = uio->uio_loffset; off = start & PAGEOFFSET; zfs_vmobject_wlock(obj); for (start &= PAGEMASK; len > 0; start += PAGESIZE) { vm_page_t pp; uint64_t bytes = MIN(PAGESIZE - off, len); if (pp = page_hold(vp, start)) { struct sf_buf *sf; caddr_t va; zfs_vmobject_wunlock(obj); va = zfs_map_page(pp, &sf); #ifdef illumos error = uiomove(va + off, bytes, UIO_READ, uio); #else error = vn_io_fault_uiomove(va + off, bytes, uio); #endif zfs_unmap_page(sf); zfs_vmobject_wlock(obj); page_unhold(pp); } else { zfs_vmobject_wunlock(obj); error = dmu_read_uio_dbuf(sa_get_db(zp->z_sa_hdl), uio, bytes); zfs_vmobject_wlock(obj); } len -= bytes; off = 0; if (error) break; } zfs_vmobject_wunlock(obj); return (error); } offset_t zfs_read_chunk_size = 1024 * 1024; /* Tunable */ /* * Read bytes from specified file into supplied buffer. * * IN: vp - vnode of file to be read from. * uio - structure supplying read location, range info, * and return buffer. * ioflag - SYNC flags; used to provide FRSYNC semantics. * cr - credentials of caller. * ct - caller context * * OUT: uio - updated offset and range, buffer filled. * * RETURN: 0 on success, error code on failure. * * Side Effects: * vp - atime updated if byte count > 0 */ /* ARGSUSED */ static int zfs_read(vnode_t *vp, uio_t *uio, int ioflag, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; ssize_t n, nbytes; int error = 0; rl_t *rl; xuio_t *xuio = NULL; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); if (zp->z_pflags & ZFS_AV_QUARANTINED) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EACCES)); } /* * Validate file offset */ if (uio->uio_loffset < (offset_t)0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } /* * Fasttrack empty reads */ if (uio->uio_resid == 0) { ZFS_EXIT(zfsvfs); return (0); } /* * Check for mandatory locks */ if (MANDMODE(zp->z_mode)) { if (error = chklock(vp, FREAD, uio->uio_loffset, uio->uio_resid, uio->uio_fmode, ct)) { ZFS_EXIT(zfsvfs); return (error); } } /* * If we're in FRSYNC mode, sync out this znode before reading it. */ if (zfsvfs->z_log && (ioflag & FRSYNC || zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)) zil_commit(zfsvfs->z_log, zp->z_id); /* * Lock the range against changes. */ rl = zfs_range_lock(zp, uio->uio_loffset, uio->uio_resid, RL_READER); /* * If we are reading past end-of-file we can skip * to the end; but we might still need to set atime. */ if (uio->uio_loffset >= zp->z_size) { error = 0; goto out; } ASSERT(uio->uio_loffset < zp->z_size); n = MIN(uio->uio_resid, zp->z_size - uio->uio_loffset); #ifdef illumos if ((uio->uio_extflg == UIO_XUIO) && (((xuio_t *)uio)->xu_type == UIOTYPE_ZEROCOPY)) { int nblk; int blksz = zp->z_blksz; uint64_t offset = uio->uio_loffset; xuio = (xuio_t *)uio; if ((ISP2(blksz))) { nblk = (P2ROUNDUP(offset + n, blksz) - P2ALIGN(offset, blksz)) / blksz; } else { ASSERT(offset + n <= blksz); nblk = 1; } (void) dmu_xuio_init(xuio, nblk); if (vn_has_cached_data(vp)) { /* * For simplicity, we always allocate a full buffer * even if we only expect to read a portion of a block. */ while (--nblk >= 0) { (void) dmu_xuio_add(xuio, dmu_request_arcbuf(sa_get_db(zp->z_sa_hdl), blksz), 0, blksz); } } } #endif /* illumos */ while (n > 0) { nbytes = MIN(n, zfs_read_chunk_size - P2PHASE(uio->uio_loffset, zfs_read_chunk_size)); #ifdef __FreeBSD__ if (uio->uio_segflg == UIO_NOCOPY) error = mappedread_sf(vp, nbytes, uio); else #endif /* __FreeBSD__ */ if (vn_has_cached_data(vp)) { error = mappedread(vp, nbytes, uio); } else { error = dmu_read_uio_dbuf(sa_get_db(zp->z_sa_hdl), uio, nbytes); } if (error) { /* convert checksum errors into IO errors */ if (error == ECKSUM) error = SET_ERROR(EIO); break; } n -= nbytes; } out: zfs_range_unlock(rl); ZFS_ACCESSTIME_STAMP(zfsvfs, zp); ZFS_EXIT(zfsvfs); return (error); } /* * Write the bytes to a file. * * IN: vp - vnode of file to be written to. * uio - structure supplying write location, range info, * and data buffer. * ioflag - FAPPEND, FSYNC, and/or FDSYNC. FAPPEND is * set if in append mode. * cr - credentials of caller. * ct - caller context (NFS/CIFS fem monitor only) * * OUT: uio - updated offset and range. * * RETURN: 0 on success, error code on failure. * * Timestamps: * vp - ctime|mtime updated if byte count > 0 */ /* ARGSUSED */ static int zfs_write(vnode_t *vp, uio_t *uio, int ioflag, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); rlim64_t limit = MAXOFFSET_T; ssize_t start_resid = uio->uio_resid; ssize_t tx_bytes; uint64_t end_size; dmu_tx_t *tx; zfsvfs_t *zfsvfs = zp->z_zfsvfs; zilog_t *zilog; offset_t woff; ssize_t n, nbytes; rl_t *rl; int max_blksz = zfsvfs->z_max_blksz; int error = 0; arc_buf_t *abuf; iovec_t *aiov = NULL; xuio_t *xuio = NULL; int i_iov = 0; int iovcnt = uio->uio_iovcnt; iovec_t *iovp = uio->uio_iov; int write_eof; int count = 0; sa_bulk_attr_t bulk[4]; uint64_t mtime[2], ctime[2]; /* * Fasttrack empty write */ n = start_resid; if (n == 0) return (0); if (limit == RLIM64_INFINITY || limit > MAXOFFSET_T) limit = MAXOFFSET_T; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, &mtime, 16); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, 16); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs), NULL, &zp->z_size, 8); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL, &zp->z_pflags, 8); /* * In a case vp->v_vfsp != zp->z_zfsvfs->z_vfs (e.g. snapshots) our * callers might not be able to detect properly that we are read-only, * so check it explicitly here. */ if (zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EROFS)); } /* * If immutable or not appending then return EPERM */ if ((zp->z_pflags & (ZFS_IMMUTABLE | ZFS_READONLY)) || ((zp->z_pflags & ZFS_APPENDONLY) && !(ioflag & FAPPEND) && (uio->uio_loffset < zp->z_size))) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } zilog = zfsvfs->z_log; /* * Validate file offset */ woff = ioflag & FAPPEND ? zp->z_size : uio->uio_loffset; if (woff < 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } /* * Check for mandatory locks before calling zfs_range_lock() * in order to prevent a deadlock with locks set via fcntl(). */ if (MANDMODE((mode_t)zp->z_mode) && (error = chklock(vp, FWRITE, woff, n, uio->uio_fmode, ct)) != 0) { ZFS_EXIT(zfsvfs); return (error); } #ifdef illumos /* * Pre-fault the pages to ensure slow (eg NFS) pages * don't hold up txg. * Skip this if uio contains loaned arc_buf. */ if ((uio->uio_extflg == UIO_XUIO) && (((xuio_t *)uio)->xu_type == UIOTYPE_ZEROCOPY)) xuio = (xuio_t *)uio; else uio_prefaultpages(MIN(n, max_blksz), uio); #endif /* * If in append mode, set the io offset pointer to eof. */ if (ioflag & FAPPEND) { /* * Obtain an appending range lock to guarantee file append * semantics. We reset the write offset once we have the lock. */ rl = zfs_range_lock(zp, 0, n, RL_APPEND); woff = rl->r_off; if (rl->r_len == UINT64_MAX) { /* * We overlocked the file because this write will cause * the file block size to increase. * Note that zp_size cannot change with this lock held. */ woff = zp->z_size; } uio->uio_loffset = woff; } else { /* * Note that if the file block size will change as a result of * this write, then this range lock will lock the entire file * so that we can re-write the block safely. */ rl = zfs_range_lock(zp, woff, n, RL_WRITER); } if (vn_rlimit_fsize(vp, uio, uio->uio_td)) { zfs_range_unlock(rl); ZFS_EXIT(zfsvfs); return (EFBIG); } if (woff >= limit) { zfs_range_unlock(rl); ZFS_EXIT(zfsvfs); return (SET_ERROR(EFBIG)); } if ((woff + n) > limit || woff > (limit - n)) n = limit - woff; /* Will this write extend the file length? */ write_eof = (woff + n > zp->z_size); end_size = MAX(zp->z_size, woff + n); /* * Write the file in reasonable size chunks. Each chunk is written * in a separate transaction; this keeps the intent log records small * and allows us to do more fine-grained space accounting. */ while (n > 0) { abuf = NULL; woff = uio->uio_loffset; if (zfs_owner_overquota(zfsvfs, zp, B_FALSE) || zfs_owner_overquota(zfsvfs, zp, B_TRUE)) { if (abuf != NULL) dmu_return_arcbuf(abuf); error = SET_ERROR(EDQUOT); break; } if (xuio && abuf == NULL) { ASSERT(i_iov < iovcnt); aiov = &iovp[i_iov]; abuf = dmu_xuio_arcbuf(xuio, i_iov); dmu_xuio_clear(xuio, i_iov); DTRACE_PROBE3(zfs_cp_write, int, i_iov, iovec_t *, aiov, arc_buf_t *, abuf); ASSERT((aiov->iov_base == abuf->b_data) || ((char *)aiov->iov_base - (char *)abuf->b_data + aiov->iov_len == arc_buf_size(abuf))); i_iov++; } else if (abuf == NULL && n >= max_blksz && woff >= zp->z_size && P2PHASE(woff, max_blksz) == 0 && zp->z_blksz == max_blksz) { /* * This write covers a full block. "Borrow" a buffer * from the dmu so that we can fill it before we enter * a transaction. This avoids the possibility of * holding up the transaction if the data copy hangs * up on a pagefault (e.g., from an NFS server mapping). */ #ifdef illumos size_t cbytes; #endif abuf = dmu_request_arcbuf(sa_get_db(zp->z_sa_hdl), max_blksz); ASSERT(abuf != NULL); ASSERT(arc_buf_size(abuf) == max_blksz); #ifdef illumos if (error = uiocopy(abuf->b_data, max_blksz, UIO_WRITE, uio, &cbytes)) { dmu_return_arcbuf(abuf); break; } ASSERT(cbytes == max_blksz); #else ssize_t resid = uio->uio_resid; error = vn_io_fault_uiomove(abuf->b_data, max_blksz, uio); if (error != 0) { uio->uio_offset -= resid - uio->uio_resid; uio->uio_resid = resid; dmu_return_arcbuf(abuf); break; } #endif } /* * Start a transaction. */ tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE); dmu_tx_hold_write(tx, zp->z_id, woff, MIN(n, max_blksz)); zfs_sa_upgrade_txholds(tx, zp); error = dmu_tx_assign(tx, TXG_WAIT); if (error) { dmu_tx_abort(tx); if (abuf != NULL) dmu_return_arcbuf(abuf); break; } /* * If zfs_range_lock() over-locked we grow the blocksize * and then reduce the lock range. This will only happen * on the first iteration since zfs_range_reduce() will * shrink down r_len to the appropriate size. */ if (rl->r_len == UINT64_MAX) { uint64_t new_blksz; if (zp->z_blksz > max_blksz) { /* * File's blocksize is already larger than the * "recordsize" property. Only let it grow to * the next power of 2. */ ASSERT(!ISP2(zp->z_blksz)); new_blksz = MIN(end_size, 1 << highbit64(zp->z_blksz)); } else { new_blksz = MIN(end_size, max_blksz); } zfs_grow_blocksize(zp, new_blksz, tx); zfs_range_reduce(rl, woff, n); } /* * XXX - should we really limit each write to z_max_blksz? * Perhaps we should use SPA_MAXBLOCKSIZE chunks? */ nbytes = MIN(n, max_blksz - P2PHASE(woff, max_blksz)); if (woff + nbytes > zp->z_size) vnode_pager_setsize(vp, woff + nbytes); if (abuf == NULL) { tx_bytes = uio->uio_resid; error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl), uio, nbytes, tx); tx_bytes -= uio->uio_resid; } else { tx_bytes = nbytes; ASSERT(xuio == NULL || tx_bytes == aiov->iov_len); /* * If this is not a full block write, but we are * extending the file past EOF and this data starts * block-aligned, use assign_arcbuf(). Otherwise, * write via dmu_write(). */ if (tx_bytes < max_blksz && (!write_eof || aiov->iov_base != abuf->b_data)) { ASSERT(xuio); dmu_write(zfsvfs->z_os, zp->z_id, woff, aiov->iov_len, aiov->iov_base, tx); dmu_return_arcbuf(abuf); xuio_stat_wbuf_copied(); } else { ASSERT(xuio || tx_bytes == max_blksz); dmu_assign_arcbuf(sa_get_db(zp->z_sa_hdl), woff, abuf, tx); } #ifdef illumos ASSERT(tx_bytes <= uio->uio_resid); uioskip(uio, tx_bytes); #endif } if (tx_bytes && vn_has_cached_data(vp)) { update_pages(vp, woff, tx_bytes, zfsvfs->z_os, zp->z_id, uio->uio_segflg, tx); } /* * If we made no progress, we're done. If we made even * partial progress, update the znode and ZIL accordingly. */ if (tx_bytes == 0) { (void) sa_update(zp->z_sa_hdl, SA_ZPL_SIZE(zfsvfs), (void *)&zp->z_size, sizeof (uint64_t), tx); dmu_tx_commit(tx); ASSERT(error != 0); break; } /* * Clear Set-UID/Set-GID bits on successful write if not * privileged and at least one of the excute bits is set. * * It would be nice to to this after all writes have * been done, but that would still expose the ISUID/ISGID * to another app after the partial write is committed. * * Note: we don't call zfs_fuid_map_id() here because * user 0 is not an ephemeral uid. */ mutex_enter(&zp->z_acl_lock); if ((zp->z_mode & (S_IXUSR | (S_IXUSR >> 3) | (S_IXUSR >> 6))) != 0 && (zp->z_mode & (S_ISUID | S_ISGID)) != 0 && secpolicy_vnode_setid_retain(vp, cr, (zp->z_mode & S_ISUID) != 0 && zp->z_uid == 0) != 0) { uint64_t newmode; zp->z_mode &= ~(S_ISUID | S_ISGID); newmode = zp->z_mode; (void) sa_update(zp->z_sa_hdl, SA_ZPL_MODE(zfsvfs), (void *)&newmode, sizeof (uint64_t), tx); } mutex_exit(&zp->z_acl_lock); zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime, B_TRUE); /* * Update the file size (zp_size) if it has changed; * account for possible concurrent updates. */ while ((end_size = zp->z_size) < uio->uio_loffset) { (void) atomic_cas_64(&zp->z_size, end_size, uio->uio_loffset); #ifdef illumos ASSERT(error == 0); #else ASSERT(error == 0 || error == EFAULT); #endif } /* * If we are replaying and eof is non zero then force * the file size to the specified eof. Note, there's no * concurrency during replay. */ if (zfsvfs->z_replay && zfsvfs->z_replay_eof != 0) zp->z_size = zfsvfs->z_replay_eof; if (error == 0) error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx); else (void) sa_bulk_update(zp->z_sa_hdl, bulk, count, tx); zfs_log_write(zilog, tx, TX_WRITE, zp, woff, tx_bytes, ioflag); dmu_tx_commit(tx); if (error != 0) break; ASSERT(tx_bytes == nbytes); n -= nbytes; #ifdef illumos if (!xuio && n > 0) uio_prefaultpages(MIN(n, max_blksz), uio); #endif } zfs_range_unlock(rl); /* * If we're in replay mode, or we made no progress, return error. * Otherwise, it's at least a partial write, so it's successful. */ if (zfsvfs->z_replay || uio->uio_resid == start_resid) { ZFS_EXIT(zfsvfs); return (error); } #ifdef __FreeBSD__ /* * EFAULT means that at least one page of the source buffer was not * available. VFS will re-try remaining I/O upon this error. */ if (error == EFAULT) { ZFS_EXIT(zfsvfs); return (error); } #endif if (ioflag & (FSYNC | FDSYNC) || zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) zil_commit(zilog, zp->z_id); ZFS_EXIT(zfsvfs); return (0); } void zfs_get_done(zgd_t *zgd, int error) { znode_t *zp = zgd->zgd_private; objset_t *os = zp->z_zfsvfs->z_os; if (zgd->zgd_db) dmu_buf_rele(zgd->zgd_db, zgd); zfs_range_unlock(zgd->zgd_rl); /* * Release the vnode asynchronously as we currently have the * txg stopped from syncing. */ VN_RELE_ASYNC(ZTOV(zp), dsl_pool_vnrele_taskq(dmu_objset_pool(os))); if (error == 0 && zgd->zgd_bp) zil_add_block(zgd->zgd_zilog, zgd->zgd_bp); kmem_free(zgd, sizeof (zgd_t)); } #ifdef DEBUG static int zil_fault_io = 0; #endif /* * Get data to generate a TX_WRITE intent log record. */ int zfs_get_data(void *arg, lr_write_t *lr, char *buf, zio_t *zio) { zfsvfs_t *zfsvfs = arg; objset_t *os = zfsvfs->z_os; znode_t *zp; uint64_t object = lr->lr_foid; uint64_t offset = lr->lr_offset; uint64_t size = lr->lr_length; blkptr_t *bp = &lr->lr_blkptr; dmu_buf_t *db; zgd_t *zgd; int error = 0; ASSERT(zio != NULL); ASSERT(size != 0); /* * Nothing to do if the file has been removed */ if (zfs_zget(zfsvfs, object, &zp) != 0) return (SET_ERROR(ENOENT)); if (zp->z_unlinked) { /* * Release the vnode asynchronously as we currently have the * txg stopped from syncing. */ VN_RELE_ASYNC(ZTOV(zp), dsl_pool_vnrele_taskq(dmu_objset_pool(os))); return (SET_ERROR(ENOENT)); } zgd = (zgd_t *)kmem_zalloc(sizeof (zgd_t), KM_SLEEP); zgd->zgd_zilog = zfsvfs->z_log; zgd->zgd_private = zp; /* * Write records come in two flavors: immediate and indirect. * For small writes it's cheaper to store the data with the * log record (immediate); for large writes it's cheaper to * sync the data and get a pointer to it (indirect) so that * we don't have to write the data twice. */ if (buf != NULL) { /* immediate write */ zgd->zgd_rl = zfs_range_lock(zp, offset, size, RL_READER); /* test for truncation needs to be done while range locked */ if (offset >= zp->z_size) { error = SET_ERROR(ENOENT); } else { error = dmu_read(os, object, offset, size, buf, DMU_READ_NO_PREFETCH); } ASSERT(error == 0 || error == ENOENT); } else { /* indirect write */ /* * Have to lock the whole block to ensure when it's * written out and it's checksum is being calculated * that no one can change the data. We need to re-check * blocksize after we get the lock in case it's changed! */ for (;;) { uint64_t blkoff; size = zp->z_blksz; blkoff = ISP2(size) ? P2PHASE(offset, size) : offset; offset -= blkoff; zgd->zgd_rl = zfs_range_lock(zp, offset, size, RL_READER); if (zp->z_blksz == size) break; offset += blkoff; zfs_range_unlock(zgd->zgd_rl); } /* test for truncation needs to be done while range locked */ if (lr->lr_offset >= zp->z_size) error = SET_ERROR(ENOENT); #ifdef DEBUG if (zil_fault_io) { error = SET_ERROR(EIO); zil_fault_io = 0; } #endif if (error == 0) error = dmu_buf_hold(os, object, offset, zgd, &db, DMU_READ_NO_PREFETCH); if (error == 0) { blkptr_t *obp = dmu_buf_get_blkptr(db); if (obp) { ASSERT(BP_IS_HOLE(bp)); *bp = *obp; } zgd->zgd_db = db; zgd->zgd_bp = bp; ASSERT(db->db_offset == offset); ASSERT(db->db_size == size); error = dmu_sync(zio, lr->lr_common.lrc_txg, zfs_get_done, zgd); ASSERT(error || lr->lr_length <= zp->z_blksz); /* * On success, we need to wait for the write I/O * initiated by dmu_sync() to complete before we can * release this dbuf. We will finish everything up * in the zfs_get_done() callback. */ if (error == 0) return (0); if (error == EALREADY) { lr->lr_common.lrc_txtype = TX_WRITE2; error = 0; } } } zfs_get_done(zgd, error); return (error); } /*ARGSUSED*/ static int zfs_access(vnode_t *vp, int mode, int flag, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; int error; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); if (flag & V_ACE_MASK) error = zfs_zaccess(zp, mode, flag, B_FALSE, cr); else error = zfs_zaccess_rwx(zp, mode, flag, cr); ZFS_EXIT(zfsvfs); return (error); } -/* - * If vnode is for a device return a specfs vnode instead. - */ static int -specvp_check(vnode_t **vpp, cred_t *cr) +zfs_dd_callback(struct mount *mp, void *arg, int lkflags, struct vnode **vpp) { - int error = 0; + int error; - if (IS_DEVVP(*vpp)) { - struct vnode *svp; - - svp = specvp(*vpp, (*vpp)->v_rdev, (*vpp)->v_type, cr); - VN_RELE(*vpp); - if (svp == NULL) - error = SET_ERROR(ENOSYS); - *vpp = svp; - } + *vpp = arg; + error = vn_lock(*vpp, lkflags); + if (error != 0) + vrele(*vpp); return (error); } +static int +zfs_lookup_lock(vnode_t *dvp, vnode_t *vp, const char *name, int lkflags) +{ + znode_t *zdp = VTOZ(dvp); + zfsvfs_t *zfsvfs = zdp->z_zfsvfs; + int error; + int ltype; + ASSERT_VOP_LOCKED(dvp, __func__); +#ifdef DIAGNOSTIC + ASSERT(!RRM_LOCK_HELD(&zfsvfs->z_teardown_lock)); +#endif + + if (name[0] == 0 || (name[0] == '.' && name[1] == 0)) { + ASSERT3P(dvp, ==, vp); + vref(dvp); + ltype = lkflags & LK_TYPE_MASK; + if (ltype != VOP_ISLOCKED(dvp)) { + if (ltype == LK_EXCLUSIVE) + vn_lock(dvp, LK_UPGRADE | LK_RETRY); + else /* if (ltype == LK_SHARED) */ + vn_lock(dvp, LK_DOWNGRADE | LK_RETRY); + + /* + * Relock for the "." case could leave us with + * reclaimed vnode. + */ + if (dvp->v_iflag & VI_DOOMED) { + vrele(dvp); + return (SET_ERROR(ENOENT)); + } + } + return (0); + } else if (name[0] == '.' && name[1] == '.' && name[2] == 0) { + /* + * Note that in this case, dvp is the child vnode, and we + * are looking up the parent vnode - exactly reverse from + * normal operation. Unlocking dvp requires some rather + * tricky unlock/relock dance to prevent mp from being freed; + * use vn_vget_ino_gen() which takes care of all that. + * + * XXX Note that there is a time window when both vnodes are + * unlocked. It is possible, although highly unlikely, that + * during that window the parent-child relationship between + * the vnodes may change, for example, get reversed. + * In that case we would have a wrong lock order for the vnodes. + * All other filesystems seem to ignore this problem, so we + * do the same here. + * A potential solution could be implemented as follows: + * - using LK_NOWAIT when locking the second vnode and retrying + * if necessary + * - checking that the parent-child relationship still holds + * after locking both vnodes and retrying if it doesn't + */ + error = vn_vget_ino_gen(dvp, zfs_dd_callback, vp, lkflags, &vp); + return (error); + } else { + error = vn_lock(vp, lkflags); + if (error != 0) + vrele(vp); + return (error); + } +} + /* * Lookup an entry in a directory, or an extended attribute directory. * If it exists, return a held vnode reference for it. * * IN: dvp - vnode of directory to search. * nm - name of entry to lookup. * pnp - full pathname to lookup [UNUSED]. * flags - LOOKUP_XATTR set if looking for an attribute. * rdir - root directory vnode [UNUSED]. * cr - credentials of caller. * ct - caller context - * direntflags - directory lookup flags - * realpnp - returned pathname. * * OUT: vpp - vnode of located entry, NULL if not found. * * RETURN: 0 on success, error code on failure. * * Timestamps: * NA */ /* ARGSUSED */ static int zfs_lookup(vnode_t *dvp, char *nm, vnode_t **vpp, struct componentname *cnp, int nameiop, cred_t *cr, kthread_t *td, int flags) { znode_t *zdp = VTOZ(dvp); + znode_t *zp; zfsvfs_t *zfsvfs = zdp->z_zfsvfs; int error = 0; - int *direntflags = NULL; - void *realpnp = NULL; - /* fast path */ - if (!(flags & (LOOKUP_XATTR | FIGNORECASE))) { - + /* fast path (should be redundant with vfs namecache) */ + if (!(flags & LOOKUP_XATTR)) { if (dvp->v_type != VDIR) { return (SET_ERROR(ENOTDIR)); } else if (zdp->z_sa_hdl == NULL) { return (SET_ERROR(EIO)); } - - if (nm[0] == 0 || (nm[0] == '.' && nm[1] == '\0')) { - error = zfs_fastaccesschk_execute(zdp, cr); - if (!error) { - *vpp = dvp; - VN_HOLD(*vpp); - return (0); - } - return (error); - } else { - vnode_t *tvp = dnlc_lookup(dvp, nm); - - if (tvp) { - error = zfs_fastaccesschk_execute(zdp, cr); - if (error) { - VN_RELE(tvp); - return (error); - } - if (tvp == DNLC_NO_VNODE) { - VN_RELE(tvp); - return (SET_ERROR(ENOENT)); - } else { - *vpp = tvp; - return (specvp_check(vpp, cr)); - } - } - } } DTRACE_PROBE2(zfs__fastpath__lookup__miss, vnode_t *, dvp, char *, nm); ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zdp); *vpp = NULL; if (flags & LOOKUP_XATTR) { #ifdef TODO /* * If the xattr property is off, refuse the lookup request. */ if (!(zfsvfs->z_vfs->vfs_flag & VFS_XATTR)) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } #endif /* * We don't allow recursive attributes.. * Maybe someday we will. */ if (zdp->z_pflags & ZFS_XATTR) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } if (error = zfs_get_xattrdir(VTOZ(dvp), vpp, cr, flags)) { ZFS_EXIT(zfsvfs); return (error); } /* * Do we have permission to get into attribute directory? */ - if (error = zfs_zaccess(VTOZ(*vpp), ACE_EXECUTE, 0, B_FALSE, cr)) { - VN_RELE(*vpp); + vrele(*vpp); *vpp = NULL; } ZFS_EXIT(zfsvfs); return (error); } - if (dvp->v_type != VDIR) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(ENOTDIR)); - } - /* * Check accessibility of directory. */ - if (error = zfs_zaccess(zdp, ACE_EXECUTE, 0, B_FALSE, cr)) { ZFS_EXIT(zfsvfs); return (error); } if (zfsvfs->z_utf8 && u8_validate(nm, strlen(nm), NULL, U8_VALIDATE_ENTIRE, &error) < 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EILSEQ)); } - error = zfs_dirlook(zdp, nm, vpp, flags, direntflags, realpnp); - if (error == 0) - error = specvp_check(vpp, cr); + /* + * The loop is retry the lookup if the parent-child relationship + * changes during the dot-dot locking complexities. + */ + for (;;) { + uint64_t parent; + error = zfs_dirlook(zdp, nm, &zp); + if (error == 0) + *vpp = ZTOV(zp); + + ZFS_EXIT(zfsvfs); + if (error != 0) + break; + + error = zfs_lookup_lock(dvp, *vpp, nm, cnp->cn_lkflags); + if (error != 0) { + /* + * If we've got a locking error, then the vnode + * got reclaimed because of a force unmount. + * We never enter doomed vnodes into the name cache. + */ + *vpp = NULL; + return (error); + } + + if ((cnp->cn_flags & ISDOTDOT) == 0) + break; + + ZFS_ENTER(zfsvfs); + if (zdp->z_sa_hdl == NULL) { + error = SET_ERROR(EIO); + } else { + error = sa_lookup(zdp->z_sa_hdl, SA_ZPL_PARENT(zfsvfs), + &parent, sizeof (parent)); + } + if (error != 0) { + ZFS_EXIT(zfsvfs); + vput(ZTOV(zp)); + break; + } + if (zp->z_id == parent) { + ZFS_EXIT(zfsvfs); + break; + } + vput(ZTOV(zp)); + } + + if (error != 0) + *vpp = NULL; + /* Translate errors and add SAVENAME when needed. */ if (cnp->cn_flags & ISLASTCN) { switch (nameiop) { case CREATE: case RENAME: if (error == ENOENT) { error = EJUSTRETURN; cnp->cn_flags |= SAVENAME; break; } /* FALLTHROUGH */ case DELETE: if (error == 0) cnp->cn_flags |= SAVENAME; break; } } - if (error == 0 && (nm[0] != '.' || nm[1] != '\0')) { - int ltype = 0; - if (cnp->cn_flags & ISDOTDOT) { - ltype = VOP_ISLOCKED(dvp); - VOP_UNLOCK(dvp, 0); - } - ZFS_EXIT(zfsvfs); - error = vn_lock(*vpp, cnp->cn_lkflags); - if (cnp->cn_flags & ISDOTDOT) - vn_lock(dvp, ltype | LK_RETRY); - if (error != 0) { - VN_RELE(*vpp); - *vpp = NULL; - return (error); - } - } else { - ZFS_EXIT(zfsvfs); - } + /* Insert name into cache (as non-existent) if appropriate. */ + if (zfsvfs->z_use_namecache && + error == ENOENT && (cnp->cn_flags & MAKEENTRY) != 0) + cache_enter(dvp, NULL, cnp); -#ifdef FREEBSD_NAMECACHE - /* - * Insert name into cache (as non-existent) if appropriate. - */ - if (error == ENOENT && (cnp->cn_flags & MAKEENTRY) != 0) - cache_enter(dvp, *vpp, cnp); - /* - * Insert name into cache if appropriate. - */ - if (error == 0 && (cnp->cn_flags & MAKEENTRY)) { + /* Insert name into cache if appropriate. */ + if (zfsvfs->z_use_namecache && + error == 0 && (cnp->cn_flags & MAKEENTRY)) { if (!(cnp->cn_flags & ISLASTCN) || (nameiop != DELETE && nameiop != RENAME)) { cache_enter(dvp, *vpp, cnp); } } -#endif return (error); } /* * Attempt to create a new entry in a directory. If the entry * already exists, truncate the file if permissible, else return * an error. Return the vp of the created or trunc'd file. * * IN: dvp - vnode of directory to put new file entry in. * name - name of new file entry. * vap - attributes of new file. * excl - flag indicating exclusive or non-exclusive mode. * mode - mode to open file with. * cr - credentials of caller. * flag - large file flag [UNUSED]. * ct - caller context * vsecp - ACL to be set * * OUT: vpp - vnode of created or trunc'd entry. * * RETURN: 0 on success, error code on failure. * * Timestamps: * dvp - ctime|mtime updated if new entry created * vp - ctime|mtime always, atime if new */ /* ARGSUSED */ static int zfs_create(vnode_t *dvp, char *name, vattr_t *vap, int excl, int mode, vnode_t **vpp, cred_t *cr, kthread_t *td) { znode_t *zp, *dzp = VTOZ(dvp); zfsvfs_t *zfsvfs = dzp->z_zfsvfs; zilog_t *zilog; objset_t *os; - zfs_dirlock_t *dl; dmu_tx_t *tx; int error; ksid_t *ksid; uid_t uid; gid_t gid = crgetgid(cr); zfs_acl_ids_t acl_ids; boolean_t fuid_dirtied; - boolean_t have_acl = B_FALSE; - boolean_t waited = B_FALSE; void *vsecp = NULL; int flag = 0; + uint64_t txtype; /* * If we have an ephemeral id, ACL, or XVATTR then * make sure file system is at proper version */ ksid = crgetsid(cr, KSID_OWNER); if (ksid) uid = ksid_getid(ksid); else uid = crgetuid(cr); if (zfsvfs->z_use_fuids == B_FALSE && (vsecp || (vap->va_mask & AT_XVATTR) || IS_EPHEMERAL(uid) || IS_EPHEMERAL(gid))) return (SET_ERROR(EINVAL)); ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(dzp); os = zfsvfs->z_os; zilog = zfsvfs->z_log; if (zfsvfs->z_utf8 && u8_validate(name, strlen(name), NULL, U8_VALIDATE_ENTIRE, &error) < 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EILSEQ)); } if (vap->va_mask & AT_XVATTR) { if ((error = secpolicy_xvattr(dvp, (xvattr_t *)vap, crgetuid(cr), cr, vap->va_type)) != 0) { ZFS_EXIT(zfsvfs); return (error); } } - getnewvnode_reserve(1); - -top: *vpp = NULL; if ((vap->va_mode & S_ISVTX) && secpolicy_vnode_stky_modify(cr)) vap->va_mode &= ~S_ISVTX; - if (*name == '\0') { - /* - * Null component name refers to the directory itself. - */ - VN_HOLD(dvp); - zp = dzp; - dl = NULL; - error = 0; - } else { - /* possible VN_HOLD(zp) */ - int zflg = 0; + error = zfs_dirent_lookup(dzp, name, &zp, ZNEW); + if (error) { + ZFS_EXIT(zfsvfs); + return (error); + } + ASSERT3P(zp, ==, NULL); - if (flag & FIGNORECASE) - zflg |= ZCILOOK; - - error = zfs_dirent_lock(&dl, dzp, name, &zp, zflg, - NULL, NULL); - if (error) { - if (have_acl) - zfs_acl_ids_free(&acl_ids); - if (strcmp(name, "..") == 0) - error = SET_ERROR(EISDIR); - getnewvnode_drop_reserve(); - ZFS_EXIT(zfsvfs); - return (error); - } + /* + * Create a new file object and update the directory + * to reference it. + */ + if (error = zfs_zaccess(dzp, ACE_ADD_FILE, 0, B_FALSE, cr)) { + goto out; } - if (zp == NULL) { - uint64_t txtype; + /* + * We only support the creation of regular files in + * extended attribute directories. + */ - /* - * Create a new file object and update the directory - * to reference it. - */ - if (error = zfs_zaccess(dzp, ACE_ADD_FILE, 0, B_FALSE, cr)) { - if (have_acl) - zfs_acl_ids_free(&acl_ids); - goto out; - } + if ((dzp->z_pflags & ZFS_XATTR) && + (vap->va_type != VREG)) { + error = SET_ERROR(EINVAL); + goto out; + } - /* - * We only support the creation of regular files in - * extended attribute directories. - */ + if ((error = zfs_acl_ids_create(dzp, 0, vap, + cr, vsecp, &acl_ids)) != 0) + goto out; - if ((dzp->z_pflags & ZFS_XATTR) && - (vap->va_type != VREG)) { - if (have_acl) - zfs_acl_ids_free(&acl_ids); - error = SET_ERROR(EINVAL); - goto out; - } + if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) { + zfs_acl_ids_free(&acl_ids); + error = SET_ERROR(EDQUOT); + goto out; + } - if (!have_acl && (error = zfs_acl_ids_create(dzp, 0, vap, - cr, vsecp, &acl_ids)) != 0) - goto out; - have_acl = B_TRUE; + getnewvnode_reserve(1); - if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) { - zfs_acl_ids_free(&acl_ids); - error = SET_ERROR(EDQUOT); - goto out; - } + tx = dmu_tx_create(os); - tx = dmu_tx_create(os); + dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes + + ZFS_SA_BASE_ATTR_SIZE); - dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes + - ZFS_SA_BASE_ATTR_SIZE); - - fuid_dirtied = zfsvfs->z_fuid_dirty; - if (fuid_dirtied) - zfs_fuid_txhold(zfsvfs, tx); - dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name); - dmu_tx_hold_sa(tx, dzp->z_sa_hdl, B_FALSE); - if (!zfsvfs->z_use_sa && - acl_ids.z_aclp->z_acl_bytes > ZFS_ACE_SPACE) { - dmu_tx_hold_write(tx, DMU_NEW_OBJECT, - 0, acl_ids.z_aclp->z_acl_bytes); - } - error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT); - if (error) { - zfs_dirent_unlock(dl); - if (error == ERESTART) { - waited = B_TRUE; - dmu_tx_wait(tx); - dmu_tx_abort(tx); - goto top; - } - zfs_acl_ids_free(&acl_ids); - dmu_tx_abort(tx); - getnewvnode_drop_reserve(); - ZFS_EXIT(zfsvfs); - return (error); - } - zfs_mknode(dzp, vap, tx, cr, 0, &zp, &acl_ids); - - if (fuid_dirtied) - zfs_fuid_sync(zfsvfs, tx); - - (void) zfs_link_create(dl, zp, tx, ZNEW); - txtype = zfs_log_create_txtype(Z_FILE, vsecp, vap); - if (flag & FIGNORECASE) - txtype |= TX_CI; - zfs_log_create(zilog, tx, txtype, dzp, zp, name, - vsecp, acl_ids.z_fuidp, vap); + fuid_dirtied = zfsvfs->z_fuid_dirty; + if (fuid_dirtied) + zfs_fuid_txhold(zfsvfs, tx); + dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name); + dmu_tx_hold_sa(tx, dzp->z_sa_hdl, B_FALSE); + if (!zfsvfs->z_use_sa && + acl_ids.z_aclp->z_acl_bytes > ZFS_ACE_SPACE) { + dmu_tx_hold_write(tx, DMU_NEW_OBJECT, + 0, acl_ids.z_aclp->z_acl_bytes); + } + error = dmu_tx_assign(tx, TXG_WAIT); + if (error) { zfs_acl_ids_free(&acl_ids); - dmu_tx_commit(tx); - } else { - int aflags = (flag & FAPPEND) ? V_APPEND : 0; + dmu_tx_abort(tx); + getnewvnode_drop_reserve(); + ZFS_EXIT(zfsvfs); + return (error); + } + zfs_mknode(dzp, vap, tx, cr, 0, &zp, &acl_ids); - if (have_acl) - zfs_acl_ids_free(&acl_ids); - have_acl = B_FALSE; + if (fuid_dirtied) + zfs_fuid_sync(zfsvfs, tx); - /* - * A directory entry already exists for this name. - */ - /* - * Can't truncate an existing file if in exclusive mode. - */ - if (excl == EXCL) { - error = SET_ERROR(EEXIST); - goto out; - } - /* - * Can't open a directory for writing. - */ - if ((ZTOV(zp)->v_type == VDIR) && (mode & S_IWRITE)) { - error = SET_ERROR(EISDIR); - goto out; - } - /* - * Verify requested access to file. - */ - if (mode && (error = zfs_zaccess_rwx(zp, mode, aflags, cr))) { - goto out; - } + (void) zfs_link_create(dzp, name, zp, tx, ZNEW); + txtype = zfs_log_create_txtype(Z_FILE, vsecp, vap); + zfs_log_create(zilog, tx, txtype, dzp, zp, name, + vsecp, acl_ids.z_fuidp, vap); + zfs_acl_ids_free(&acl_ids); + dmu_tx_commit(tx); - mutex_enter(&dzp->z_lock); - dzp->z_seq++; - mutex_exit(&dzp->z_lock); - - /* - * Truncate regular files if requested. - */ - if ((ZTOV(zp)->v_type == VREG) && - (vap->va_mask & AT_SIZE) && (vap->va_size == 0)) { - /* we can't hold any locks when calling zfs_freesp() */ - zfs_dirent_unlock(dl); - dl = NULL; - error = zfs_freesp(zp, 0, 0, mode, TRUE); - if (error == 0) { - vnevent_create(ZTOV(zp), ct); - } - } - } -out: getnewvnode_drop_reserve(); - if (dl) - zfs_dirent_unlock(dl); - if (error) { - if (zp) - VN_RELE(ZTOV(zp)); - } else { +out: + if (error == 0) { *vpp = ZTOV(zp); - error = specvp_check(vpp, cr); } if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) zil_commit(zilog, 0); ZFS_EXIT(zfsvfs); return (error); } /* * Remove an entry from a directory. * * IN: dvp - vnode of directory to remove entry from. * name - name of entry to remove. * cr - credentials of caller. * ct - caller context * flags - case flags * * RETURN: 0 on success, error code on failure. * * Timestamps: * dvp - ctime|mtime * vp - ctime (if nlink > 0) */ -uint64_t null_xattr = 0; - /*ARGSUSED*/ static int -zfs_remove(vnode_t *dvp, char *name, cred_t *cr, caller_context_t *ct, - int flags) +zfs_remove(vnode_t *dvp, vnode_t *vp, char *name, cred_t *cr) { - znode_t *zp, *dzp = VTOZ(dvp); + znode_t *dzp = VTOZ(dvp); + znode_t *zp = VTOZ(vp); znode_t *xzp; - vnode_t *vp; zfsvfs_t *zfsvfs = dzp->z_zfsvfs; zilog_t *zilog; uint64_t acl_obj, xattr_obj; - uint64_t xattr_obj_unlinked = 0; uint64_t obj = 0; - zfs_dirlock_t *dl; dmu_tx_t *tx; - boolean_t may_delete_now, delete_now = FALSE; boolean_t unlinked, toobig = FALSE; uint64_t txtype; - pathname_t *realnmp = NULL; - pathname_t realnm; int error; - int zflg = ZEXISTS; - boolean_t waited = B_FALSE; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(dzp); + ZFS_VERIFY_ZP(zp); zilog = zfsvfs->z_log; + zp = VTOZ(vp); - if (flags & FIGNORECASE) { - zflg |= ZCILOOK; - pn_alloc(&realnm); - realnmp = &realnm; - } - -top: xattr_obj = 0; xzp = NULL; - /* - * Attempt to lock directory; fail if entry doesn't exist. - */ - if (error = zfs_dirent_lock(&dl, dzp, name, &zp, zflg, - NULL, realnmp)) { - if (realnmp) - pn_free(realnmp); - ZFS_EXIT(zfsvfs); - return (error); - } - vp = ZTOV(zp); - if (error = zfs_zaccess_delete(dzp, zp, cr)) { goto out; } /* * Need to use rmdir for removing directories. */ if (vp->v_type == VDIR) { error = SET_ERROR(EPERM); goto out; } vnevent_remove(vp, dvp, name, ct); - if (realnmp) - dnlc_remove(dvp, realnmp->pn_buf); - else - dnlc_remove(dvp, name); + obj = zp->z_id; - VI_LOCK(vp); - may_delete_now = vp->v_count == 1 && !vn_has_cached_data(vp); - VI_UNLOCK(vp); + /* are there any extended attributes? */ + error = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), + &xattr_obj, sizeof (xattr_obj)); + if (error == 0 && xattr_obj) { + error = zfs_zget(zfsvfs, xattr_obj, &xzp); + ASSERT0(error); + } /* * We may delete the znode now, or we may put it in the unlinked set; * it depends on whether we're the last link, and on whether there are * other holds on the vnode. So we dmu_tx_hold() the right things to * allow for either case. */ - obj = zp->z_id; tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_zap(tx, dzp->z_id, FALSE, name); dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE); zfs_sa_upgrade_txholds(tx, zp); zfs_sa_upgrade_txholds(tx, dzp); - if (may_delete_now) { - toobig = - zp->z_size > zp->z_blksz * DMU_MAX_DELETEBLKCNT; - /* if the file is too big, only hold_free a token amount */ - dmu_tx_hold_free(tx, zp->z_id, 0, - (toobig ? DMU_MAX_ACCESS : DMU_OBJECT_END)); - } - /* are there any extended attributes? */ - error = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), - &xattr_obj, sizeof (xattr_obj)); - if (error == 0 && xattr_obj) { - error = zfs_zget(zfsvfs, xattr_obj, &xzp); - ASSERT0(error); + if (xzp) { dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE); dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE); } - mutex_enter(&zp->z_lock); - if ((acl_obj = zfs_external_acl(zp)) != 0 && may_delete_now) - dmu_tx_hold_free(tx, acl_obj, 0, DMU_OBJECT_END); - mutex_exit(&zp->z_lock); - /* charge as an update -- would be nice not to charge at all */ dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL); /* * Mark this transaction as typically resulting in a net free of space */ dmu_tx_mark_netfree(tx); - error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT); + error = dmu_tx_assign(tx, TXG_WAIT); if (error) { - zfs_dirent_unlock(dl); - VN_RELE(vp); - if (xzp) - VN_RELE(ZTOV(xzp)); - if (error == ERESTART) { - waited = B_TRUE; - dmu_tx_wait(tx); - dmu_tx_abort(tx); - goto top; - } - if (realnmp) - pn_free(realnmp); dmu_tx_abort(tx); ZFS_EXIT(zfsvfs); return (error); } /* * Remove the directory entry. */ - error = zfs_link_destroy(dl, zp, tx, zflg, &unlinked); + error = zfs_link_destroy(dzp, name, zp, tx, ZEXISTS, &unlinked); if (error) { dmu_tx_commit(tx); goto out; } if (unlinked) { - /* - * Hold z_lock so that we can make sure that the ACL obj - * hasn't changed. Could have been deleted due to - * zfs_sa_upgrade(). - */ - mutex_enter(&zp->z_lock); - VI_LOCK(vp); - (void) sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), - &xattr_obj_unlinked, sizeof (xattr_obj_unlinked)); - delete_now = may_delete_now && !toobig && - vp->v_count == 1 && !vn_has_cached_data(vp) && - xattr_obj == xattr_obj_unlinked && zfs_external_acl(zp) == - acl_obj; - VI_UNLOCK(vp); - } - - if (delete_now) { -#ifdef __FreeBSD__ - panic("zfs_remove: delete_now branch taken"); -#endif - if (xattr_obj_unlinked) { - ASSERT3U(xzp->z_links, ==, 2); - mutex_enter(&xzp->z_lock); - xzp->z_unlinked = 1; - xzp->z_links = 0; - error = sa_update(xzp->z_sa_hdl, SA_ZPL_LINKS(zfsvfs), - &xzp->z_links, sizeof (xzp->z_links), tx); - ASSERT3U(error, ==, 0); - mutex_exit(&xzp->z_lock); - zfs_unlinked_add(xzp, tx); - - if (zp->z_is_sa) - error = sa_remove(zp->z_sa_hdl, - SA_ZPL_XATTR(zfsvfs), tx); - else - error = sa_update(zp->z_sa_hdl, - SA_ZPL_XATTR(zfsvfs), &null_xattr, - sizeof (uint64_t), tx); - ASSERT0(error); - } - VI_LOCK(vp); - vp->v_count--; - ASSERT0(vp->v_count); - VI_UNLOCK(vp); - mutex_exit(&zp->z_lock); - zfs_znode_delete(zp, tx); - } else if (unlinked) { - mutex_exit(&zp->z_lock); zfs_unlinked_add(zp, tx); -#ifdef __FreeBSD__ vp->v_vflag |= VV_NOSYNC; -#endif } txtype = TX_REMOVE; - if (flags & FIGNORECASE) - txtype |= TX_CI; zfs_log_remove(zilog, tx, txtype, dzp, name, obj); dmu_tx_commit(tx); out: - if (realnmp) - pn_free(realnmp); - zfs_dirent_unlock(dl); - - if (!delete_now) - VN_RELE(vp); if (xzp) - VN_RELE(ZTOV(xzp)); + vrele(ZTOV(xzp)); if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) zil_commit(zilog, 0); ZFS_EXIT(zfsvfs); return (error); } /* * Create a new directory and insert it into dvp using the name * provided. Return a pointer to the inserted directory. * * IN: dvp - vnode of directory to add subdir to. * dirname - name of new directory. * vap - attributes of new directory. * cr - credentials of caller. * ct - caller context * flags - case flags * vsecp - ACL to be set * * OUT: vpp - vnode of created directory. * * RETURN: 0 on success, error code on failure. * * Timestamps: * dvp - ctime|mtime updated * vp - ctime|mtime|atime updated */ /*ARGSUSED*/ static int -zfs_mkdir(vnode_t *dvp, char *dirname, vattr_t *vap, vnode_t **vpp, cred_t *cr, - caller_context_t *ct, int flags, vsecattr_t *vsecp) +zfs_mkdir(vnode_t *dvp, char *dirname, vattr_t *vap, vnode_t **vpp, cred_t *cr) { znode_t *zp, *dzp = VTOZ(dvp); zfsvfs_t *zfsvfs = dzp->z_zfsvfs; zilog_t *zilog; - zfs_dirlock_t *dl; uint64_t txtype; dmu_tx_t *tx; int error; - int zf = ZNEW; ksid_t *ksid; uid_t uid; gid_t gid = crgetgid(cr); zfs_acl_ids_t acl_ids; boolean_t fuid_dirtied; - boolean_t waited = B_FALSE; ASSERT(vap->va_type == VDIR); /* * If we have an ephemeral id, ACL, or XVATTR then * make sure file system is at proper version */ ksid = crgetsid(cr, KSID_OWNER); if (ksid) uid = ksid_getid(ksid); else uid = crgetuid(cr); if (zfsvfs->z_use_fuids == B_FALSE && - (vsecp || (vap->va_mask & AT_XVATTR) || + ((vap->va_mask & AT_XVATTR) || IS_EPHEMERAL(uid) || IS_EPHEMERAL(gid))) return (SET_ERROR(EINVAL)); ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(dzp); zilog = zfsvfs->z_log; if (dzp->z_pflags & ZFS_XATTR) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } if (zfsvfs->z_utf8 && u8_validate(dirname, strlen(dirname), NULL, U8_VALIDATE_ENTIRE, &error) < 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EILSEQ)); } - if (flags & FIGNORECASE) - zf |= ZCILOOK; if (vap->va_mask & AT_XVATTR) { if ((error = secpolicy_xvattr(dvp, (xvattr_t *)vap, crgetuid(cr), cr, vap->va_type)) != 0) { ZFS_EXIT(zfsvfs); return (error); } } if ((error = zfs_acl_ids_create(dzp, 0, vap, cr, - vsecp, &acl_ids)) != 0) { + NULL, &acl_ids)) != 0) { ZFS_EXIT(zfsvfs); return (error); } - getnewvnode_reserve(1); - /* * First make sure the new directory doesn't exist. * * Existence is checked first to make sure we don't return * EACCES instead of EEXIST which can cause some applications * to fail. */ -top: *vpp = NULL; - if (error = zfs_dirent_lock(&dl, dzp, dirname, &zp, zf, - NULL, NULL)) { + if (error = zfs_dirent_lookup(dzp, dirname, &zp, ZNEW)) { zfs_acl_ids_free(&acl_ids); - getnewvnode_drop_reserve(); ZFS_EXIT(zfsvfs); return (error); } + ASSERT3P(zp, ==, NULL); if (error = zfs_zaccess(dzp, ACE_ADD_SUBDIRECTORY, 0, B_FALSE, cr)) { zfs_acl_ids_free(&acl_ids); - zfs_dirent_unlock(dl); - getnewvnode_drop_reserve(); ZFS_EXIT(zfsvfs); return (error); } if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) { zfs_acl_ids_free(&acl_ids); - zfs_dirent_unlock(dl); - getnewvnode_drop_reserve(); ZFS_EXIT(zfsvfs); return (SET_ERROR(EDQUOT)); } /* * Add a new entry to the directory. */ + getnewvnode_reserve(1); tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_zap(tx, dzp->z_id, TRUE, dirname); dmu_tx_hold_zap(tx, DMU_NEW_OBJECT, FALSE, NULL); fuid_dirtied = zfsvfs->z_fuid_dirty; if (fuid_dirtied) zfs_fuid_txhold(zfsvfs, tx); if (!zfsvfs->z_use_sa && acl_ids.z_aclp->z_acl_bytes > ZFS_ACE_SPACE) { dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, acl_ids.z_aclp->z_acl_bytes); } dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes + ZFS_SA_BASE_ATTR_SIZE); - error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT); + error = dmu_tx_assign(tx, TXG_WAIT); if (error) { - zfs_dirent_unlock(dl); - if (error == ERESTART) { - waited = B_TRUE; - dmu_tx_wait(tx); - dmu_tx_abort(tx); - goto top; - } zfs_acl_ids_free(&acl_ids); dmu_tx_abort(tx); getnewvnode_drop_reserve(); ZFS_EXIT(zfsvfs); return (error); } /* * Create new node. */ zfs_mknode(dzp, vap, tx, cr, 0, &zp, &acl_ids); if (fuid_dirtied) zfs_fuid_sync(zfsvfs, tx); /* * Now put new name in parent dir. */ - (void) zfs_link_create(dl, zp, tx, ZNEW); + (void) zfs_link_create(dzp, dirname, zp, tx, ZNEW); *vpp = ZTOV(zp); - txtype = zfs_log_create_txtype(Z_DIR, vsecp, vap); - if (flags & FIGNORECASE) - txtype |= TX_CI; - zfs_log_create(zilog, tx, txtype, dzp, zp, dirname, vsecp, + txtype = zfs_log_create_txtype(Z_DIR, NULL, vap); + zfs_log_create(zilog, tx, txtype, dzp, zp, dirname, NULL, acl_ids.z_fuidp, vap); zfs_acl_ids_free(&acl_ids); dmu_tx_commit(tx); getnewvnode_drop_reserve(); - zfs_dirent_unlock(dl); - if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) zil_commit(zilog, 0); ZFS_EXIT(zfsvfs); return (0); } /* * Remove a directory subdir entry. If the current working * directory is the same as the subdir to be removed, the * remove will fail. * * IN: dvp - vnode of directory to remove from. * name - name of directory to be removed. * cwd - vnode of current working directory. * cr - credentials of caller. * ct - caller context * flags - case flags * * RETURN: 0 on success, error code on failure. * * Timestamps: * dvp - ctime|mtime updated */ /*ARGSUSED*/ static int -zfs_rmdir(vnode_t *dvp, char *name, vnode_t *cwd, cred_t *cr, - caller_context_t *ct, int flags) +zfs_rmdir(vnode_t *dvp, vnode_t *vp, char *name, cred_t *cr) { znode_t *dzp = VTOZ(dvp); - znode_t *zp; - vnode_t *vp; + znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = dzp->z_zfsvfs; zilog_t *zilog; - zfs_dirlock_t *dl; dmu_tx_t *tx; int error; - int zflg = ZEXISTS; - boolean_t waited = B_FALSE; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(dzp); + ZFS_VERIFY_ZP(zp); zilog = zfsvfs->z_log; - if (flags & FIGNORECASE) - zflg |= ZCILOOK; -top: - zp = NULL; - /* - * Attempt to lock directory; fail if entry doesn't exist. - */ - if (error = zfs_dirent_lock(&dl, dzp, name, &zp, zflg, - NULL, NULL)) { - ZFS_EXIT(zfsvfs); - return (error); - } - - vp = ZTOV(zp); - if (error = zfs_zaccess_delete(dzp, zp, cr)) { goto out; } if (vp->v_type != VDIR) { error = SET_ERROR(ENOTDIR); goto out; } - if (vp == cwd) { - error = SET_ERROR(EINVAL); - goto out; - } - vnevent_rmdir(vp, dvp, name, ct); - /* - * Grab a lock on the directory to make sure that noone is - * trying to add (or lookup) entries while we are removing it. - */ - rw_enter(&zp->z_name_lock, RW_WRITER); - - /* - * Grab a lock on the parent pointer to make sure we play well - * with the treewalk and directory rename code. - */ - rw_enter(&zp->z_parent_lock, RW_WRITER); - tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_zap(tx, dzp->z_id, FALSE, name); dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE); dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL); zfs_sa_upgrade_txholds(tx, zp); zfs_sa_upgrade_txholds(tx, dzp); dmu_tx_mark_netfree(tx); - error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT); + error = dmu_tx_assign(tx, TXG_WAIT); if (error) { - rw_exit(&zp->z_parent_lock); - rw_exit(&zp->z_name_lock); - zfs_dirent_unlock(dl); - VN_RELE(vp); - if (error == ERESTART) { - waited = B_TRUE; - dmu_tx_wait(tx); - dmu_tx_abort(tx); - goto top; - } dmu_tx_abort(tx); ZFS_EXIT(zfsvfs); return (error); } -#ifdef FREEBSD_NAMECACHE cache_purge(dvp); -#endif - error = zfs_link_destroy(dl, zp, tx, zflg, NULL); + error = zfs_link_destroy(dzp, name, zp, tx, ZEXISTS, NULL); if (error == 0) { uint64_t txtype = TX_RMDIR; - if (flags & FIGNORECASE) - txtype |= TX_CI; zfs_log_remove(zilog, tx, txtype, dzp, name, ZFS_NO_OBJECT); } dmu_tx_commit(tx); - rw_exit(&zp->z_parent_lock); - rw_exit(&zp->z_name_lock); -#ifdef FREEBSD_NAMECACHE cache_purge(vp); -#endif out: - zfs_dirent_unlock(dl); - - VN_RELE(vp); - if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) zil_commit(zilog, 0); ZFS_EXIT(zfsvfs); return (error); } /* * Read as many directory entries as will fit into the provided * buffer from the given directory cursor position (specified in * the uio structure). * * IN: vp - vnode of directory to read. * uio - structure supplying read location, range info, * and return buffer. * cr - credentials of caller. * ct - caller context * flags - case flags * * OUT: uio - updated offset and range, buffer filled. * eofp - set to true if end-of-file detected. * * RETURN: 0 on success, error code on failure. * * Timestamps: * vp - atime updated * * Note that the low 4 bits of the cookie returned by zap is always zero. * This allows us to use the low range for "special" directory entries: * We use 0 for '.', and 1 for '..'. If this is the root of the filesystem, * we use the offset 2 for the '.zfs' directory. */ /* ARGSUSED */ static int zfs_readdir(vnode_t *vp, uio_t *uio, cred_t *cr, int *eofp, int *ncookies, u_long **cookies) { znode_t *zp = VTOZ(vp); iovec_t *iovp; edirent_t *eodp; dirent64_t *odp; zfsvfs_t *zfsvfs = zp->z_zfsvfs; objset_t *os; caddr_t outbuf; size_t bufsize; zap_cursor_t zc; zap_attribute_t zap; uint_t bytes_wanted; uint64_t offset; /* must be unsigned; checks for < 1 */ uint64_t parent; int local_eof; int outcount; int error; uint8_t prefetch; boolean_t check_sysattrs; uint8_t type; int ncooks; u_long *cooks = NULL; int flags = 0; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent))) != 0) { ZFS_EXIT(zfsvfs); return (error); } /* * If we are not given an eof variable, * use a local one. */ if (eofp == NULL) eofp = &local_eof; /* * Check for valid iov_len. */ if (uio->uio_iov->iov_len <= 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } /* * Quit if directory has been removed (posix) */ if ((*eofp = zp->z_unlinked) != 0) { ZFS_EXIT(zfsvfs); return (0); } error = 0; os = zfsvfs->z_os; offset = uio->uio_loffset; prefetch = zp->z_zn_prefetch; /* * Initialize the iterator cursor. */ if (offset <= 3) { /* * Start iteration from the beginning of the directory. */ zap_cursor_init(&zc, os, zp->z_id); } else { /* * The offset is a serialized cursor. */ zap_cursor_init_serialized(&zc, os, zp->z_id, offset); } /* * Get space to change directory entries into fs independent format. */ iovp = uio->uio_iov; bytes_wanted = iovp->iov_len; if (uio->uio_segflg != UIO_SYSSPACE || uio->uio_iovcnt != 1) { bufsize = bytes_wanted; outbuf = kmem_alloc(bufsize, KM_SLEEP); odp = (struct dirent64 *)outbuf; } else { bufsize = bytes_wanted; outbuf = NULL; odp = (struct dirent64 *)iovp->iov_base; } eodp = (struct edirent *)odp; if (ncookies != NULL) { /* * Minimum entry size is dirent size and 1 byte for a file name. */ ncooks = uio->uio_resid / (sizeof(struct dirent) - sizeof(((struct dirent *)NULL)->d_name) + 1); cooks = malloc(ncooks * sizeof(u_long), M_TEMP, M_WAITOK); *cookies = cooks; *ncookies = ncooks; } /* * If this VFS supports the system attribute view interface; and * we're looking at an extended attribute directory; and we care * about normalization conflicts on this vfs; then we must check * for normalization conflicts with the sysattr name space. */ #ifdef TODO check_sysattrs = vfs_has_feature(vp->v_vfsp, VFSFT_SYSATTR_VIEWS) && (vp->v_flag & V_XATTRDIR) && zfsvfs->z_norm && (flags & V_RDDIR_ENTFLAGS); #else check_sysattrs = 0; #endif /* * Transform to file-system independent format */ outcount = 0; while (outcount < bytes_wanted) { ino64_t objnum; ushort_t reclen; off64_t *next = NULL; /* * Special case `.', `..', and `.zfs'. */ if (offset == 0) { (void) strcpy(zap.za_name, "."); zap.za_normalization_conflict = 0; objnum = zp->z_id; type = DT_DIR; } else if (offset == 1) { (void) strcpy(zap.za_name, ".."); zap.za_normalization_conflict = 0; objnum = parent; type = DT_DIR; } else if (offset == 2 && zfs_show_ctldir(zp)) { (void) strcpy(zap.za_name, ZFS_CTLDIR_NAME); zap.za_normalization_conflict = 0; objnum = ZFSCTL_INO_ROOT; type = DT_DIR; } else { /* * Grab next entry. */ if (error = zap_cursor_retrieve(&zc, &zap)) { if ((*eofp = (error == ENOENT)) != 0) break; else goto update; } if (zap.za_integer_length != 8 || zap.za_num_integers != 1) { cmn_err(CE_WARN, "zap_readdir: bad directory " "entry, obj = %lld, offset = %lld\n", (u_longlong_t)zp->z_id, (u_longlong_t)offset); error = SET_ERROR(ENXIO); goto update; } objnum = ZFS_DIRENT_OBJ(zap.za_first_integer); /* * MacOS X can extract the object type here such as: * uint8_t type = ZFS_DIRENT_TYPE(zap.za_first_integer); */ type = ZFS_DIRENT_TYPE(zap.za_first_integer); if (check_sysattrs && !zap.za_normalization_conflict) { #ifdef TODO zap.za_normalization_conflict = xattr_sysattr_casechk(zap.za_name); #else panic("%s:%u: TODO", __func__, __LINE__); #endif } } if (flags & V_RDDIR_ACCFILTER) { /* * If we have no access at all, don't include * this entry in the returned information */ znode_t *ezp; if (zfs_zget(zp->z_zfsvfs, objnum, &ezp) != 0) goto skip_entry; if (!zfs_has_access(ezp, cr)) { - VN_RELE(ZTOV(ezp)); + vrele(ZTOV(ezp)); goto skip_entry; } - VN_RELE(ZTOV(ezp)); + vrele(ZTOV(ezp)); } if (flags & V_RDDIR_ENTFLAGS) reclen = EDIRENT_RECLEN(strlen(zap.za_name)); else reclen = DIRENT64_RECLEN(strlen(zap.za_name)); /* * Will this entry fit in the buffer? */ if (outcount + reclen > bufsize) { /* * Did we manage to fit anything in the buffer? */ if (!outcount) { error = SET_ERROR(EINVAL); goto update; } break; } if (flags & V_RDDIR_ENTFLAGS) { /* * Add extended flag entry: */ eodp->ed_ino = objnum; eodp->ed_reclen = reclen; /* NOTE: ed_off is the offset for the *next* entry */ next = &(eodp->ed_off); eodp->ed_eflags = zap.za_normalization_conflict ? ED_CASE_CONFLICT : 0; (void) strncpy(eodp->ed_name, zap.za_name, EDIRENT_NAMELEN(reclen)); eodp = (edirent_t *)((intptr_t)eodp + reclen); } else { /* * Add normal entry: */ odp->d_ino = objnum; odp->d_reclen = reclen; odp->d_namlen = strlen(zap.za_name); (void) strlcpy(odp->d_name, zap.za_name, odp->d_namlen + 1); odp->d_type = type; odp = (dirent64_t *)((intptr_t)odp + reclen); } outcount += reclen; ASSERT(outcount <= bufsize); /* Prefetch znode */ if (prefetch) dmu_prefetch(os, objnum, 0, 0, 0, ZIO_PRIORITY_SYNC_READ); skip_entry: /* * Move to the next entry, fill in the previous offset. */ if (offset > 2 || (offset == 2 && !zfs_show_ctldir(zp))) { zap_cursor_advance(&zc); offset = zap_cursor_serialize(&zc); } else { offset += 1; } if (cooks != NULL) { *cooks++ = offset; ncooks--; KASSERT(ncooks >= 0, ("ncookies=%d", ncooks)); } } zp->z_zn_prefetch = B_FALSE; /* a lookup will re-enable pre-fetching */ /* Subtract unused cookies */ if (ncookies != NULL) *ncookies -= ncooks; if (uio->uio_segflg == UIO_SYSSPACE && uio->uio_iovcnt == 1) { iovp->iov_base += outcount; iovp->iov_len -= outcount; uio->uio_resid -= outcount; } else if (error = uiomove(outbuf, (long)outcount, UIO_READ, uio)) { /* * Reset the pointer. */ offset = uio->uio_loffset; } update: zap_cursor_fini(&zc); if (uio->uio_segflg != UIO_SYSSPACE || uio->uio_iovcnt != 1) kmem_free(outbuf, bufsize); if (error == ENOENT) error = 0; ZFS_ACCESSTIME_STAMP(zfsvfs, zp); uio->uio_loffset = offset; ZFS_EXIT(zfsvfs); if (error != 0 && cookies != NULL) { free(*cookies, M_TEMP); *cookies = NULL; *ncookies = 0; } return (error); } ulong_t zfs_fsync_sync_cnt = 4; static int zfs_fsync(vnode_t *vp, int syncflag, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; (void) tsd_set(zfs_fsyncer_key, (void *)zfs_fsync_sync_cnt); if (zfsvfs->z_os->os_sync != ZFS_SYNC_DISABLED) { ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); zil_commit(zfsvfs->z_log, zp->z_id); ZFS_EXIT(zfsvfs); } return (0); } /* * Get the requested file attributes and place them in the provided * vattr structure. * * IN: vp - vnode of file. * vap - va_mask identifies requested attributes. * If AT_XVATTR set, then optional attrs are requested * flags - ATTR_NOACLCHECK (CIFS server context) * cr - credentials of caller. * ct - caller context * * OUT: vap - attribute values. * * RETURN: 0 (always succeeds). */ /* ARGSUSED */ static int zfs_getattr(vnode_t *vp, vattr_t *vap, int flags, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; int error = 0; uint32_t blksize; u_longlong_t nblocks; uint64_t links; uint64_t mtime[2], ctime[2], crtime[2], rdev; xvattr_t *xvap = (xvattr_t *)vap; /* vap may be an xvattr_t * */ xoptattr_t *xoap = NULL; boolean_t skipaclchk = (flags & ATTR_NOACLCHECK) ? B_TRUE : B_FALSE; sa_bulk_attr_t bulk[4]; int count = 0; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); zfs_fuid_map_ids(zp, cr, &vap->va_uid, &vap->va_gid); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, &mtime, 16); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, 16); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CRTIME(zfsvfs), NULL, &crtime, 16); if (vp->v_type == VBLK || vp->v_type == VCHR) SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_RDEV(zfsvfs), NULL, &rdev, 8); if ((error = sa_bulk_lookup(zp->z_sa_hdl, bulk, count)) != 0) { ZFS_EXIT(zfsvfs); return (error); } /* * If ACL is trivial don't bother looking for ACE_READ_ATTRIBUTES. * Also, if we are the owner don't bother, since owner should * always be allowed to read basic attributes of file. */ if (!(zp->z_pflags & ZFS_ACL_TRIVIAL) && (vap->va_uid != crgetuid(cr))) { if (error = zfs_zaccess(zp, ACE_READ_ATTRIBUTES, 0, skipaclchk, cr)) { ZFS_EXIT(zfsvfs); return (error); } } /* * Return all attributes. It's cheaper to provide the answer * than to determine whether we were asked the question. */ - mutex_enter(&zp->z_lock); vap->va_type = IFTOVT(zp->z_mode); vap->va_mode = zp->z_mode & ~S_IFMT; #ifdef illumos vap->va_fsid = zp->z_zfsvfs->z_vfs->vfs_dev; #else vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0]; #endif vap->va_nodeid = zp->z_id; if ((vp->v_flag & VROOT) && zfs_show_ctldir(zp)) links = zp->z_links + 1; else links = zp->z_links; vap->va_nlink = MIN(links, LINK_MAX); /* nlink_t limit! */ vap->va_size = zp->z_size; #ifdef illumos vap->va_rdev = vp->v_rdev; #else if (vp->v_type == VBLK || vp->v_type == VCHR) vap->va_rdev = zfs_cmpldev(rdev); #endif vap->va_seq = zp->z_seq; vap->va_flags = 0; /* FreeBSD: Reset chflags(2) flags. */ vap->va_filerev = zp->z_seq; /* * Add in any requested optional attributes and the create time. * Also set the corresponding bits in the returned attribute bitmap. */ if ((xoap = xva_getxoptattr(xvap)) != NULL && zfsvfs->z_use_fuids) { if (XVA_ISSET_REQ(xvap, XAT_ARCHIVE)) { xoap->xoa_archive = ((zp->z_pflags & ZFS_ARCHIVE) != 0); XVA_SET_RTN(xvap, XAT_ARCHIVE); } if (XVA_ISSET_REQ(xvap, XAT_READONLY)) { xoap->xoa_readonly = ((zp->z_pflags & ZFS_READONLY) != 0); XVA_SET_RTN(xvap, XAT_READONLY); } if (XVA_ISSET_REQ(xvap, XAT_SYSTEM)) { xoap->xoa_system = ((zp->z_pflags & ZFS_SYSTEM) != 0); XVA_SET_RTN(xvap, XAT_SYSTEM); } if (XVA_ISSET_REQ(xvap, XAT_HIDDEN)) { xoap->xoa_hidden = ((zp->z_pflags & ZFS_HIDDEN) != 0); XVA_SET_RTN(xvap, XAT_HIDDEN); } if (XVA_ISSET_REQ(xvap, XAT_NOUNLINK)) { xoap->xoa_nounlink = ((zp->z_pflags & ZFS_NOUNLINK) != 0); XVA_SET_RTN(xvap, XAT_NOUNLINK); } if (XVA_ISSET_REQ(xvap, XAT_IMMUTABLE)) { xoap->xoa_immutable = ((zp->z_pflags & ZFS_IMMUTABLE) != 0); XVA_SET_RTN(xvap, XAT_IMMUTABLE); } if (XVA_ISSET_REQ(xvap, XAT_APPENDONLY)) { xoap->xoa_appendonly = ((zp->z_pflags & ZFS_APPENDONLY) != 0); XVA_SET_RTN(xvap, XAT_APPENDONLY); } if (XVA_ISSET_REQ(xvap, XAT_NODUMP)) { xoap->xoa_nodump = ((zp->z_pflags & ZFS_NODUMP) != 0); XVA_SET_RTN(xvap, XAT_NODUMP); } if (XVA_ISSET_REQ(xvap, XAT_OPAQUE)) { xoap->xoa_opaque = ((zp->z_pflags & ZFS_OPAQUE) != 0); XVA_SET_RTN(xvap, XAT_OPAQUE); } if (XVA_ISSET_REQ(xvap, XAT_AV_QUARANTINED)) { xoap->xoa_av_quarantined = ((zp->z_pflags & ZFS_AV_QUARANTINED) != 0); XVA_SET_RTN(xvap, XAT_AV_QUARANTINED); } if (XVA_ISSET_REQ(xvap, XAT_AV_MODIFIED)) { xoap->xoa_av_modified = ((zp->z_pflags & ZFS_AV_MODIFIED) != 0); XVA_SET_RTN(xvap, XAT_AV_MODIFIED); } if (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP) && vp->v_type == VREG) { zfs_sa_get_scanstamp(zp, xvap); } if (XVA_ISSET_REQ(xvap, XAT_CREATETIME)) { uint64_t times[2]; (void) sa_lookup(zp->z_sa_hdl, SA_ZPL_CRTIME(zfsvfs), times, sizeof (times)); ZFS_TIME_DECODE(&xoap->xoa_createtime, times); XVA_SET_RTN(xvap, XAT_CREATETIME); } if (XVA_ISSET_REQ(xvap, XAT_REPARSE)) { xoap->xoa_reparse = ((zp->z_pflags & ZFS_REPARSE) != 0); XVA_SET_RTN(xvap, XAT_REPARSE); } if (XVA_ISSET_REQ(xvap, XAT_GEN)) { xoap->xoa_generation = zp->z_gen; XVA_SET_RTN(xvap, XAT_GEN); } if (XVA_ISSET_REQ(xvap, XAT_OFFLINE)) { xoap->xoa_offline = ((zp->z_pflags & ZFS_OFFLINE) != 0); XVA_SET_RTN(xvap, XAT_OFFLINE); } if (XVA_ISSET_REQ(xvap, XAT_SPARSE)) { xoap->xoa_sparse = ((zp->z_pflags & ZFS_SPARSE) != 0); XVA_SET_RTN(xvap, XAT_SPARSE); } } ZFS_TIME_DECODE(&vap->va_atime, zp->z_atime); ZFS_TIME_DECODE(&vap->va_mtime, mtime); ZFS_TIME_DECODE(&vap->va_ctime, ctime); ZFS_TIME_DECODE(&vap->va_birthtime, crtime); - mutex_exit(&zp->z_lock); sa_object_size(zp->z_sa_hdl, &blksize, &nblocks); vap->va_blksize = blksize; vap->va_bytes = nblocks << 9; /* nblocks * 512 */ if (zp->z_blksz == 0) { /* * Block size hasn't been set; suggest maximal I/O transfers. */ vap->va_blksize = zfsvfs->z_max_blksz; } ZFS_EXIT(zfsvfs); return (0); } /* * Set the file attributes to the values contained in the * vattr structure. * * IN: vp - vnode of file to be modified. * vap - new attribute values. * If AT_XVATTR set, then optional attrs are being set * flags - ATTR_UTIME set if non-default time values provided. * - ATTR_NOACLCHECK (CIFS context only). * cr - credentials of caller. * ct - caller context * * RETURN: 0 on success, error code on failure. * * Timestamps: * vp - ctime updated, mtime updated if size changed. */ /* ARGSUSED */ static int zfs_setattr(vnode_t *vp, vattr_t *vap, int flags, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; zilog_t *zilog; dmu_tx_t *tx; vattr_t oldva; xvattr_t tmpxvattr; uint_t mask = vap->va_mask; uint_t saved_mask = 0; uint64_t saved_mode; int trim_mask = 0; uint64_t new_mode; uint64_t new_uid, new_gid; uint64_t xattr_obj; uint64_t mtime[2], ctime[2]; znode_t *attrzp; int need_policy = FALSE; int err, err2; zfs_fuid_info_t *fuidp = NULL; xvattr_t *xvap = (xvattr_t *)vap; /* vap may be an xvattr_t * */ xoptattr_t *xoap; zfs_acl_t *aclp; boolean_t skipaclchk = (flags & ATTR_NOACLCHECK) ? B_TRUE : B_FALSE; boolean_t fuid_dirtied = B_FALSE; sa_bulk_attr_t bulk[7], xattr_bulk[7]; int count = 0, xattr_count = 0; if (mask == 0) return (0); if (mask & AT_NOSET) return (SET_ERROR(EINVAL)); ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); zilog = zfsvfs->z_log; /* * Make sure that if we have ephemeral uid/gid or xvattr specified * that file system is at proper version level */ if (zfsvfs->z_use_fuids == B_FALSE && (((mask & AT_UID) && IS_EPHEMERAL(vap->va_uid)) || ((mask & AT_GID) && IS_EPHEMERAL(vap->va_gid)) || (mask & AT_XVATTR))) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } if (mask & AT_SIZE && vp->v_type == VDIR) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EISDIR)); } if (mask & AT_SIZE && vp->v_type != VREG && vp->v_type != VFIFO) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } /* * If this is an xvattr_t, then get a pointer to the structure of * optional attributes. If this is NULL, then we have a vattr_t. */ xoap = xva_getxoptattr(xvap); xva_init(&tmpxvattr); /* * Immutable files can only alter immutable bit and atime */ if ((zp->z_pflags & ZFS_IMMUTABLE) && ((mask & (AT_SIZE|AT_UID|AT_GID|AT_MTIME|AT_MODE)) || ((mask & AT_XVATTR) && XVA_ISSET_REQ(xvap, XAT_CREATETIME)))) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } if ((mask & AT_SIZE) && (zp->z_pflags & ZFS_READONLY)) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } /* * Verify timestamps doesn't overflow 32 bits. * ZFS can handle large timestamps, but 32bit syscalls can't * handle times greater than 2039. This check should be removed * once large timestamps are fully supported. */ if (mask & (AT_ATIME | AT_MTIME)) { if (((mask & AT_ATIME) && TIMESPEC_OVERFLOW(&vap->va_atime)) || ((mask & AT_MTIME) && TIMESPEC_OVERFLOW(&vap->va_mtime))) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EOVERFLOW)); } } -top: attrzp = NULL; aclp = NULL; /* Can this be moved to before the top label? */ if (zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EROFS)); } /* * First validate permissions */ if (mask & AT_SIZE) { /* * XXX - Note, we are not providing any open * mode flags here (like FNDELAY), so we may * block if there are locks present... this * should be addressed in openat(). */ /* XXX - would it be OK to generate a log record here? */ err = zfs_freesp(zp, vap->va_size, 0, 0, FALSE); if (err) { ZFS_EXIT(zfsvfs); return (err); } } if (mask & (AT_ATIME|AT_MTIME) || ((mask & AT_XVATTR) && (XVA_ISSET_REQ(xvap, XAT_HIDDEN) || XVA_ISSET_REQ(xvap, XAT_READONLY) || XVA_ISSET_REQ(xvap, XAT_ARCHIVE) || XVA_ISSET_REQ(xvap, XAT_OFFLINE) || XVA_ISSET_REQ(xvap, XAT_SPARSE) || XVA_ISSET_REQ(xvap, XAT_CREATETIME) || XVA_ISSET_REQ(xvap, XAT_SYSTEM)))) { need_policy = zfs_zaccess(zp, ACE_WRITE_ATTRIBUTES, 0, skipaclchk, cr); } if (mask & (AT_UID|AT_GID)) { int idmask = (mask & (AT_UID|AT_GID)); int take_owner; int take_group; /* * NOTE: even if a new mode is being set, * we may clear S_ISUID/S_ISGID bits. */ if (!(mask & AT_MODE)) vap->va_mode = zp->z_mode; /* * Take ownership or chgrp to group we are a member of */ take_owner = (mask & AT_UID) && (vap->va_uid == crgetuid(cr)); take_group = (mask & AT_GID) && zfs_groupmember(zfsvfs, vap->va_gid, cr); /* * If both AT_UID and AT_GID are set then take_owner and * take_group must both be set in order to allow taking * ownership. * * Otherwise, send the check through secpolicy_vnode_setattr() * */ if (((idmask == (AT_UID|AT_GID)) && take_owner && take_group) || ((idmask == AT_UID) && take_owner) || ((idmask == AT_GID) && take_group)) { if (zfs_zaccess(zp, ACE_WRITE_OWNER, 0, skipaclchk, cr) == 0) { /* * Remove setuid/setgid for non-privileged users */ secpolicy_setid_clear(vap, vp, cr); trim_mask = (mask & (AT_UID|AT_GID)); } else { need_policy = TRUE; } } else { need_policy = TRUE; } } - mutex_enter(&zp->z_lock); oldva.va_mode = zp->z_mode; zfs_fuid_map_ids(zp, cr, &oldva.va_uid, &oldva.va_gid); if (mask & AT_XVATTR) { /* * Update xvattr mask to include only those attributes * that are actually changing. * * the bits will be restored prior to actually setting * the attributes so the caller thinks they were set. */ if (XVA_ISSET_REQ(xvap, XAT_APPENDONLY)) { if (xoap->xoa_appendonly != ((zp->z_pflags & ZFS_APPENDONLY) != 0)) { need_policy = TRUE; } else { XVA_CLR_REQ(xvap, XAT_APPENDONLY); XVA_SET_REQ(&tmpxvattr, XAT_APPENDONLY); } } if (XVA_ISSET_REQ(xvap, XAT_NOUNLINK)) { if (xoap->xoa_nounlink != ((zp->z_pflags & ZFS_NOUNLINK) != 0)) { need_policy = TRUE; } else { XVA_CLR_REQ(xvap, XAT_NOUNLINK); XVA_SET_REQ(&tmpxvattr, XAT_NOUNLINK); } } if (XVA_ISSET_REQ(xvap, XAT_IMMUTABLE)) { if (xoap->xoa_immutable != ((zp->z_pflags & ZFS_IMMUTABLE) != 0)) { need_policy = TRUE; } else { XVA_CLR_REQ(xvap, XAT_IMMUTABLE); XVA_SET_REQ(&tmpxvattr, XAT_IMMUTABLE); } } if (XVA_ISSET_REQ(xvap, XAT_NODUMP)) { if (xoap->xoa_nodump != ((zp->z_pflags & ZFS_NODUMP) != 0)) { need_policy = TRUE; } else { XVA_CLR_REQ(xvap, XAT_NODUMP); XVA_SET_REQ(&tmpxvattr, XAT_NODUMP); } } if (XVA_ISSET_REQ(xvap, XAT_AV_MODIFIED)) { if (xoap->xoa_av_modified != ((zp->z_pflags & ZFS_AV_MODIFIED) != 0)) { need_policy = TRUE; } else { XVA_CLR_REQ(xvap, XAT_AV_MODIFIED); XVA_SET_REQ(&tmpxvattr, XAT_AV_MODIFIED); } } if (XVA_ISSET_REQ(xvap, XAT_AV_QUARANTINED)) { if ((vp->v_type != VREG && xoap->xoa_av_quarantined) || xoap->xoa_av_quarantined != ((zp->z_pflags & ZFS_AV_QUARANTINED) != 0)) { need_policy = TRUE; } else { XVA_CLR_REQ(xvap, XAT_AV_QUARANTINED); XVA_SET_REQ(&tmpxvattr, XAT_AV_QUARANTINED); } } if (XVA_ISSET_REQ(xvap, XAT_REPARSE)) { - mutex_exit(&zp->z_lock); ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } if (need_policy == FALSE && (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP) || XVA_ISSET_REQ(xvap, XAT_OPAQUE))) { need_policy = TRUE; } } - mutex_exit(&zp->z_lock); - if (mask & AT_MODE) { if (zfs_zaccess(zp, ACE_WRITE_ACL, 0, skipaclchk, cr) == 0) { err = secpolicy_setid_setsticky_clear(vp, vap, &oldva, cr); if (err) { ZFS_EXIT(zfsvfs); return (err); } trim_mask |= AT_MODE; } else { need_policy = TRUE; } } if (need_policy) { /* * If trim_mask is set then take ownership * has been granted or write_acl is present and user * has the ability to modify mode. In that case remove * UID|GID and or MODE from mask so that * secpolicy_vnode_setattr() doesn't revoke it. */ if (trim_mask) { saved_mask = vap->va_mask; vap->va_mask &= ~trim_mask; if (trim_mask & AT_MODE) { /* * Save the mode, as secpolicy_vnode_setattr() * will overwrite it with ova.va_mode. */ saved_mode = vap->va_mode; } } err = secpolicy_vnode_setattr(cr, vp, vap, &oldva, flags, (int (*)(void *, int, cred_t *))zfs_zaccess_unix, zp); if (err) { ZFS_EXIT(zfsvfs); return (err); } if (trim_mask) { vap->va_mask |= saved_mask; if (trim_mask & AT_MODE) { /* * Recover the mode after * secpolicy_vnode_setattr(). */ vap->va_mode = saved_mode; } } } /* * secpolicy_vnode_setattr, or take ownership may have * changed va_mask */ mask = vap->va_mask; if ((mask & (AT_UID | AT_GID))) { err = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), &xattr_obj, sizeof (xattr_obj)); if (err == 0 && xattr_obj) { err = zfs_zget(zp->z_zfsvfs, xattr_obj, &attrzp); if (err) goto out2; } if (mask & AT_UID) { new_uid = zfs_fuid_create(zfsvfs, (uint64_t)vap->va_uid, cr, ZFS_OWNER, &fuidp); if (new_uid != zp->z_uid && zfs_fuid_overquota(zfsvfs, B_FALSE, new_uid)) { if (attrzp) - VN_RELE(ZTOV(attrzp)); + vrele(ZTOV(attrzp)); err = SET_ERROR(EDQUOT); goto out2; } } if (mask & AT_GID) { new_gid = zfs_fuid_create(zfsvfs, (uint64_t)vap->va_gid, cr, ZFS_GROUP, &fuidp); if (new_gid != zp->z_gid && zfs_fuid_overquota(zfsvfs, B_TRUE, new_gid)) { if (attrzp) - VN_RELE(ZTOV(attrzp)); + vrele(ZTOV(attrzp)); err = SET_ERROR(EDQUOT); goto out2; } } } tx = dmu_tx_create(zfsvfs->z_os); if (mask & AT_MODE) { uint64_t pmode = zp->z_mode; uint64_t acl_obj; new_mode = (pmode & S_IFMT) | (vap->va_mode & ~S_IFMT); if (zp->z_zfsvfs->z_acl_mode == ZFS_ACL_RESTRICTED && !(zp->z_pflags & ZFS_ACL_TRIVIAL)) { err = SET_ERROR(EPERM); goto out; } if (err = zfs_acl_chmod_setattr(zp, &aclp, new_mode)) goto out; - mutex_enter(&zp->z_lock); if (!zp->z_is_sa && ((acl_obj = zfs_external_acl(zp)) != 0)) { /* * Are we upgrading ACL from old V0 format * to V1 format? */ if (zfsvfs->z_version >= ZPL_VERSION_FUID && zfs_znode_acl_version(zp) == ZFS_ACL_VERSION_INITIAL) { dmu_tx_hold_free(tx, acl_obj, 0, DMU_OBJECT_END); dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, aclp->z_acl_bytes); } else { dmu_tx_hold_write(tx, acl_obj, 0, aclp->z_acl_bytes); } } else if (!zp->z_is_sa && aclp->z_acl_bytes > ZFS_ACE_SPACE) { dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, aclp->z_acl_bytes); } - mutex_exit(&zp->z_lock); dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE); } else { if ((mask & AT_XVATTR) && XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP)) dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE); else dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE); } if (attrzp) { dmu_tx_hold_sa(tx, attrzp->z_sa_hdl, B_FALSE); } fuid_dirtied = zfsvfs->z_fuid_dirty; if (fuid_dirtied) zfs_fuid_txhold(zfsvfs, tx); zfs_sa_upgrade_txholds(tx, zp); err = dmu_tx_assign(tx, TXG_WAIT); if (err) goto out; count = 0; /* * Set each attribute requested. * We group settings according to the locks they need to acquire. * * Note: you cannot set ctime directly, although it will be * updated as a side-effect of calling this function. */ - if (mask & (AT_UID|AT_GID|AT_MODE)) mutex_enter(&zp->z_acl_lock); - mutex_enter(&zp->z_lock); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL, &zp->z_pflags, sizeof (zp->z_pflags)); if (attrzp) { if (mask & (AT_UID|AT_GID|AT_MODE)) mutex_enter(&attrzp->z_acl_lock); - mutex_enter(&attrzp->z_lock); SA_ADD_BULK_ATTR(xattr_bulk, xattr_count, SA_ZPL_FLAGS(zfsvfs), NULL, &attrzp->z_pflags, sizeof (attrzp->z_pflags)); } if (mask & (AT_UID|AT_GID)) { if (mask & AT_UID) { SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zfsvfs), NULL, &new_uid, sizeof (new_uid)); zp->z_uid = new_uid; if (attrzp) { SA_ADD_BULK_ATTR(xattr_bulk, xattr_count, SA_ZPL_UID(zfsvfs), NULL, &new_uid, sizeof (new_uid)); attrzp->z_uid = new_uid; } } if (mask & AT_GID) { SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zfsvfs), NULL, &new_gid, sizeof (new_gid)); zp->z_gid = new_gid; if (attrzp) { SA_ADD_BULK_ATTR(xattr_bulk, xattr_count, SA_ZPL_GID(zfsvfs), NULL, &new_gid, sizeof (new_gid)); attrzp->z_gid = new_gid; } } if (!(mask & AT_MODE)) { SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL, &new_mode, sizeof (new_mode)); new_mode = zp->z_mode; } err = zfs_acl_chown_setattr(zp); ASSERT(err == 0); if (attrzp) { err = zfs_acl_chown_setattr(attrzp); ASSERT(err == 0); } } if (mask & AT_MODE) { SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL, &new_mode, sizeof (new_mode)); zp->z_mode = new_mode; ASSERT3U((uintptr_t)aclp, !=, 0); err = zfs_aclset_common(zp, aclp, cr, tx); ASSERT0(err); if (zp->z_acl_cached) zfs_acl_free(zp->z_acl_cached); zp->z_acl_cached = aclp; aclp = NULL; } if (mask & AT_ATIME) { ZFS_TIME_ENCODE(&vap->va_atime, zp->z_atime); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zfsvfs), NULL, &zp->z_atime, sizeof (zp->z_atime)); } if (mask & AT_MTIME) { ZFS_TIME_ENCODE(&vap->va_mtime, mtime); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, mtime, sizeof (mtime)); } /* XXX - shouldn't this be done *before* the ATIME/MTIME checks? */ if (mask & AT_SIZE && !(mask & AT_MTIME)) { SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, mtime, sizeof (mtime)); SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, sizeof (ctime)); zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime, B_TRUE); } else if (mask != 0) { SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, sizeof (ctime)); zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime, ctime, B_TRUE); if (attrzp) { SA_ADD_BULK_ATTR(xattr_bulk, xattr_count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, sizeof (ctime)); zfs_tstamp_update_setup(attrzp, STATE_CHANGED, mtime, ctime, B_TRUE); } } /* * Do this after setting timestamps to prevent timestamp * update from toggling bit */ if (xoap && (mask & AT_XVATTR)) { /* * restore trimmed off masks * so that return masks can be set for caller. */ if (XVA_ISSET_REQ(&tmpxvattr, XAT_APPENDONLY)) { XVA_SET_REQ(xvap, XAT_APPENDONLY); } if (XVA_ISSET_REQ(&tmpxvattr, XAT_NOUNLINK)) { XVA_SET_REQ(xvap, XAT_NOUNLINK); } if (XVA_ISSET_REQ(&tmpxvattr, XAT_IMMUTABLE)) { XVA_SET_REQ(xvap, XAT_IMMUTABLE); } if (XVA_ISSET_REQ(&tmpxvattr, XAT_NODUMP)) { XVA_SET_REQ(xvap, XAT_NODUMP); } if (XVA_ISSET_REQ(&tmpxvattr, XAT_AV_MODIFIED)) { XVA_SET_REQ(xvap, XAT_AV_MODIFIED); } if (XVA_ISSET_REQ(&tmpxvattr, XAT_AV_QUARANTINED)) { XVA_SET_REQ(xvap, XAT_AV_QUARANTINED); } if (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP)) ASSERT(vp->v_type == VREG); zfs_xvattr_set(zp, xvap, tx); } if (fuid_dirtied) zfs_fuid_sync(zfsvfs, tx); if (mask != 0) zfs_log_setattr(zilog, tx, TX_SETATTR, zp, vap, mask, fuidp); - mutex_exit(&zp->z_lock); if (mask & (AT_UID|AT_GID|AT_MODE)) mutex_exit(&zp->z_acl_lock); if (attrzp) { if (mask & (AT_UID|AT_GID|AT_MODE)) mutex_exit(&attrzp->z_acl_lock); - mutex_exit(&attrzp->z_lock); } out: if (err == 0 && attrzp) { err2 = sa_bulk_update(attrzp->z_sa_hdl, xattr_bulk, xattr_count, tx); ASSERT(err2 == 0); } if (attrzp) - VN_RELE(ZTOV(attrzp)); + vrele(ZTOV(attrzp)); if (aclp) zfs_acl_free(aclp); if (fuidp) { zfs_fuid_info_free(fuidp); fuidp = NULL; } if (err) { dmu_tx_abort(tx); - if (err == ERESTART) - goto top; } else { err2 = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx); dmu_tx_commit(tx); } out2: if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) zil_commit(zilog, 0); ZFS_EXIT(zfsvfs); return (err); } -typedef struct zfs_zlock { - krwlock_t *zl_rwlock; /* lock we acquired */ - znode_t *zl_znode; /* znode we held */ - struct zfs_zlock *zl_next; /* next in list */ -} zfs_zlock_t; - /* - * Drop locks and release vnodes that were held by zfs_rename_lock(). + * We acquire all but fdvp locks using non-blocking acquisitions. If we + * fail to acquire any lock in the path we will drop all held locks, + * acquire the new lock in a blocking fashion, and then release it and + * restart the rename. This acquire/release step ensures that we do not + * spin on a lock waiting for release. On error release all vnode locks + * and decrement references the way tmpfs_rename() would do. */ -static void -zfs_rename_unlock(zfs_zlock_t **zlpp) +static int +zfs_rename_relock(struct vnode *sdvp, struct vnode **svpp, + struct vnode *tdvp, struct vnode **tvpp, + const struct componentname *scnp, const struct componentname *tcnp) { - zfs_zlock_t *zl; + zfsvfs_t *zfsvfs; + struct vnode *nvp, *svp, *tvp; + znode_t *sdzp, *tdzp, *szp, *tzp; + const char *snm = scnp->cn_nameptr; + const char *tnm = tcnp->cn_nameptr; + int error; - while ((zl = *zlpp) != NULL) { - if (zl->zl_znode != NULL) - VN_RELE(ZTOV(zl->zl_znode)); - rw_exit(zl->zl_rwlock); - *zlpp = zl->zl_next; - kmem_free(zl, sizeof (*zl)); + VOP_UNLOCK(tdvp, 0); + if (*tvpp != NULL && *tvpp != tdvp) + VOP_UNLOCK(*tvpp, 0); + +relock: + error = vn_lock(sdvp, LK_EXCLUSIVE); + if (error) + goto out; + sdzp = VTOZ(sdvp); + + error = vn_lock(tdvp, LK_EXCLUSIVE | LK_NOWAIT); + if (error != 0) { + VOP_UNLOCK(sdvp, 0); + if (error != EBUSY) + goto out; + error = vn_lock(tdvp, LK_EXCLUSIVE); + if (error) + goto out; + VOP_UNLOCK(tdvp, 0); + goto relock; } -} + tdzp = VTOZ(tdvp); -/* - * Search back through the directory tree, using the ".." entries. - * Lock each directory in the chain to prevent concurrent renames. - * Fail any attempt to move a directory into one of its own descendants. - * XXX - z_parent_lock can overlap with map or grow locks - */ -static int -zfs_rename_lock(znode_t *szp, znode_t *tdzp, znode_t *sdzp, zfs_zlock_t **zlpp) -{ - zfs_zlock_t *zl; - znode_t *zp = tdzp; - uint64_t rootid = zp->z_zfsvfs->z_root; - uint64_t oidp = zp->z_id; - krwlock_t *rwlp = &szp->z_parent_lock; - krw_t rw = RW_WRITER; + /* + * Before using sdzp and tdzp we must ensure that they are live. + * As a porting legacy from illumos we have two things to worry + * about. One is typical for FreeBSD and it is that the vnode is + * not reclaimed (doomed). The other is that the znode is live. + * The current code can invalidate the znode without acquiring the + * corresponding vnode lock if the object represented by the znode + * and vnode is no longer valid after a rollback or receive operation. + * z_teardown_lock hidden behind ZFS_ENTER and ZFS_EXIT is the lock + * that protects the znodes from the invalidation. + */ + zfsvfs = sdzp->z_zfsvfs; + ASSERT3P(zfsvfs, ==, tdzp->z_zfsvfs); + ZFS_ENTER(zfsvfs); /* - * First pass write-locks szp and compares to zp->z_id. - * Later passes read-lock zp and compare to zp->z_parent. + * We can not use ZFS_VERIFY_ZP() here because it could directly return + * bypassing the cleanup code in the case of an error. */ - do { - if (!rw_tryenter(rwlp, rw)) { - /* - * Another thread is renaming in this path. - * Note that if we are a WRITER, we don't have any - * parent_locks held yet. - */ - if (rw == RW_READER && zp->z_id > szp->z_id) { - /* - * Drop our locks and restart - */ - zfs_rename_unlock(&zl); - *zlpp = NULL; - zp = tdzp; - oidp = zp->z_id; - rwlp = &szp->z_parent_lock; - rw = RW_WRITER; - continue; - } else { - /* - * Wait for other thread to drop its locks - */ - rw_enter(rwlp, rw); + if (tdzp->z_sa_hdl == NULL || sdzp->z_sa_hdl == NULL) { + ZFS_EXIT(zfsvfs); + VOP_UNLOCK(sdvp, 0); + VOP_UNLOCK(tdvp, 0); + error = SET_ERROR(EIO); + goto out; + } + + /* + * Re-resolve svp to be certain it still exists and fetch the + * correct vnode. + */ + error = zfs_dirent_lookup(sdzp, snm, &szp, ZEXISTS); + if (error != 0) { + /* Source entry invalid or not there. */ + ZFS_EXIT(zfsvfs); + VOP_UNLOCK(sdvp, 0); + VOP_UNLOCK(tdvp, 0); + if ((scnp->cn_flags & ISDOTDOT) != 0 || + (scnp->cn_namelen == 1 && scnp->cn_nameptr[0] == '.')) + error = SET_ERROR(EINVAL); + goto out; + } + svp = ZTOV(szp); + + /* + * Re-resolve tvp, if it disappeared we just carry on. + */ + error = zfs_dirent_lookup(tdzp, tnm, &tzp, 0); + if (error != 0) { + ZFS_EXIT(zfsvfs); + VOP_UNLOCK(sdvp, 0); + VOP_UNLOCK(tdvp, 0); + vrele(svp); + if ((tcnp->cn_flags & ISDOTDOT) != 0) + error = SET_ERROR(EINVAL); + goto out; + } + if (tzp != NULL) + tvp = ZTOV(tzp); + else + tvp = NULL; + + /* + * At present the vnode locks must be acquired before z_teardown_lock, + * although it would be more logical to use the opposite order. + */ + ZFS_EXIT(zfsvfs); + + /* + * Now try acquire locks on svp and tvp. + */ + nvp = svp; + error = vn_lock(nvp, LK_EXCLUSIVE | LK_NOWAIT); + if (error != 0) { + VOP_UNLOCK(sdvp, 0); + VOP_UNLOCK(tdvp, 0); + if (tvp != NULL) + vrele(tvp); + if (error != EBUSY) { + vrele(nvp); + goto out; + } + error = vn_lock(nvp, LK_EXCLUSIVE); + if (error != 0) { + vrele(nvp); + goto out; + } + VOP_UNLOCK(nvp, 0); + /* + * Concurrent rename race. + * XXX ? + */ + if (nvp == tdvp) { + vrele(nvp); + error = SET_ERROR(EINVAL); + goto out; + } + vrele(*svpp); + *svpp = nvp; + goto relock; + } + vrele(*svpp); + *svpp = nvp; + + if (*tvpp != NULL) + vrele(*tvpp); + *tvpp = NULL; + if (tvp != NULL) { + nvp = tvp; + error = vn_lock(nvp, LK_EXCLUSIVE | LK_NOWAIT); + if (error != 0) { + VOP_UNLOCK(sdvp, 0); + VOP_UNLOCK(tdvp, 0); + VOP_UNLOCK(*svpp, 0); + if (error != EBUSY) { + vrele(nvp); + goto out; } + error = vn_lock(nvp, LK_EXCLUSIVE); + if (error != 0) { + vrele(nvp); + goto out; + } + vput(nvp); + goto relock; } + *tvpp = nvp; + } - zl = kmem_alloc(sizeof (*zl), KM_SLEEP); - zl->zl_rwlock = rwlp; - zl->zl_znode = NULL; - zl->zl_next = *zlpp; - *zlpp = zl; + return (0); - if (oidp == szp->z_id) /* We're a descendant of szp */ - return (SET_ERROR(EINVAL)); +out: + return (error); +} - if (oidp == rootid) /* We've hit the top */ - return (0); +/* + * Note that we must use VRELE_ASYNC in this function as it walks + * up the directory tree and vrele may need to acquire an exclusive + * lock if a last reference to a vnode is dropped. + */ +static int +zfs_rename_check(znode_t *szp, znode_t *sdzp, znode_t *tdzp) +{ + zfsvfs_t *zfsvfs; + znode_t *zp, *zp1; + uint64_t parent; + int error; - if (rw == RW_READER) { /* i.e. not the first pass */ - int error = zfs_zget(zp->z_zfsvfs, oidp, &zp); - if (error) - return (error); - zl->zl_znode = zp; + zfsvfs = tdzp->z_zfsvfs; + if (tdzp == szp) + return (SET_ERROR(EINVAL)); + if (tdzp == sdzp) + return (0); + if (tdzp->z_id == zfsvfs->z_root) + return (0); + zp = tdzp; + for (;;) { + ASSERT(!zp->z_unlinked); + if ((error = sa_lookup(zp->z_sa_hdl, + SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent))) != 0) + break; + + if (parent == szp->z_id) { + error = SET_ERROR(EINVAL); + break; } - (void) sa_lookup(zp->z_sa_hdl, SA_ZPL_PARENT(zp->z_zfsvfs), - &oidp, sizeof (oidp)); - rwlp = &zp->z_parent_lock; - rw = RW_READER; + if (parent == zfsvfs->z_root) + break; + if (parent == sdzp->z_id) + break; - } while (zp->z_id != sdzp->z_id); + error = zfs_zget(zfsvfs, parent, &zp1); + if (error != 0) + break; - return (0); + if (zp != tdzp) + VN_RELE_ASYNC(ZTOV(zp), + dsl_pool_vnrele_taskq(dmu_objset_pool(zfsvfs->z_os))); + zp = zp1; + } + + if (error == ENOTDIR) + panic("checkpath: .. not a directory\n"); + if (zp != tdzp) + VN_RELE_ASYNC(ZTOV(zp), + dsl_pool_vnrele_taskq(dmu_objset_pool(zfsvfs->z_os))); + return (error); } /* * Move an entry from the provided source directory to the target * directory. Change the entry name as indicated. * * IN: sdvp - Source directory containing the "old entry". * snm - Old entry name. * tdvp - Target directory to contain the "new entry". * tnm - New entry name. * cr - credentials of caller. * ct - caller context * flags - case flags * * RETURN: 0 on success, error code on failure. * * Timestamps: * sdvp,tdvp - ctime|mtime updated */ /*ARGSUSED*/ static int -zfs_rename(vnode_t *sdvp, char *snm, vnode_t *tdvp, char *tnm, cred_t *cr, - caller_context_t *ct, int flags) +zfs_rename(vnode_t *sdvp, vnode_t **svpp, struct componentname *scnp, + vnode_t *tdvp, vnode_t **tvpp, struct componentname *tcnp, + cred_t *cr) { - znode_t *tdzp, *sdzp, *szp, *tzp; - zfsvfs_t *zfsvfs; - zilog_t *zilog; - vnode_t *realvp; - zfs_dirlock_t *sdl, *tdl; + zfsvfs_t *zfsvfs; + znode_t *sdzp, *tdzp, *szp, *tzp; + zilog_t *zilog = NULL; dmu_tx_t *tx; - zfs_zlock_t *zl; - int cmp, serr, terr; + char *snm = scnp->cn_nameptr; + char *tnm = tcnp->cn_nameptr; int error = 0; - int zflg = 0; - boolean_t waited = B_FALSE; - tdzp = VTOZ(tdvp); - ZFS_VERIFY_ZP(tdzp); - zfsvfs = tdzp->z_zfsvfs; - ZFS_ENTER(zfsvfs); - zilog = zfsvfs->z_log; - sdzp = VTOZ(sdvp); + /* Reject renames across filesystems. */ + if ((*svpp)->v_mount != tdvp->v_mount || + ((*tvpp) != NULL && (*svpp)->v_mount != (*tvpp)->v_mount)) { + error = SET_ERROR(EXDEV); + goto out; + } + if (zfsctl_is_node(tdvp)) { + error = SET_ERROR(EXDEV); + goto out; + } + /* - * In case sdzp is not valid, let's be sure to exit from the right - * zfsvfs_t. + * Lock all four vnodes to ensure safety and semantics of renaming. */ - if (sdzp->z_sa_hdl == NULL) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EIO)); + error = zfs_rename_relock(sdvp, svpp, tdvp, tvpp, scnp, tcnp); + if (error != 0) { + /* no vnodes are locked in the case of error here */ + return (error); } + tdzp = VTOZ(tdvp); + sdzp = VTOZ(sdvp); + zfsvfs = tdzp->z_zfsvfs; + zilog = zfsvfs->z_log; + /* - * We check z_zfsvfs rather than v_vfsp here, because snapshots and the - * ctldir appear to have the same v_vfsp. + * After we re-enter ZFS_ENTER() we will have to revalidate all + * znodes involved. */ - if (sdzp->z_zfsvfs != zfsvfs || zfsctl_is_node(tdvp)) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EXDEV)); - } + ZFS_ENTER(zfsvfs); if (zfsvfs->z_utf8 && u8_validate(tnm, strlen(tnm), NULL, U8_VALIDATE_ENTIRE, &error) < 0) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EILSEQ)); + error = SET_ERROR(EILSEQ); + goto unlockout; } - if (flags & FIGNORECASE) - zflg |= ZCILOOK; + /* If source and target are the same file, there is nothing to do. */ + if ((*svpp) == (*tvpp)) { + error = 0; + goto unlockout; + } -top: - szp = NULL; - tzp = NULL; - zl = NULL; + if (((*svpp)->v_type == VDIR && (*svpp)->v_mountedhere != NULL) || + ((*tvpp) != NULL && (*tvpp)->v_type == VDIR && + (*tvpp)->v_mountedhere != NULL)) { + error = SET_ERROR(EXDEV); + goto unlockout; + } /* - * This is to prevent the creation of links into attribute space - * by renaming a linked file into/outof an attribute directory. - * See the comment in zfs_link() for why this is considered bad. + * We can not use ZFS_VERIFY_ZP() here because it could directly return + * bypassing the cleanup code in the case of an error. */ - if ((tdzp->z_pflags & ZFS_XATTR) != (sdzp->z_pflags & ZFS_XATTR)) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EINVAL)); + if (tdzp->z_sa_hdl == NULL || sdzp->z_sa_hdl == NULL) { + error = SET_ERROR(EIO); + goto unlockout; } - /* - * Lock source and target directory entries. To prevent deadlock, - * a lock ordering must be defined. We lock the directory with - * the smallest object id first, or if it's a tie, the one with - * the lexically first name. - */ - if (sdzp->z_id < tdzp->z_id) { - cmp = -1; - } else if (sdzp->z_id > tdzp->z_id) { - cmp = 1; - } else { - /* - * First compare the two name arguments without - * considering any case folding. - */ - int nofold = (zfsvfs->z_norm & ~U8_TEXTPREP_TOUPPER); - - cmp = u8_strcmp(snm, tnm, 0, nofold, U8_UNICODE_LATEST, &error); - ASSERT(error == 0 || !zfsvfs->z_utf8); - if (cmp == 0) { - /* - * POSIX: "If the old argument and the new argument - * both refer to links to the same existing file, - * the rename() function shall return successfully - * and perform no other action." - */ - ZFS_EXIT(zfsvfs); - return (0); - } - /* - * If the file system is case-folding, then we may - * have some more checking to do. A case-folding file - * system is either supporting mixed case sensitivity - * access or is completely case-insensitive. Note - * that the file system is always case preserving. - * - * In mixed sensitivity mode case sensitive behavior - * is the default. FIGNORECASE must be used to - * explicitly request case insensitive behavior. - * - * If the source and target names provided differ only - * by case (e.g., a request to rename 'tim' to 'Tim'), - * we will treat this as a special case in the - * case-insensitive mode: as long as the source name - * is an exact match, we will allow this to proceed as - * a name-change request. - */ - if ((zfsvfs->z_case == ZFS_CASE_INSENSITIVE || - (zfsvfs->z_case == ZFS_CASE_MIXED && - flags & FIGNORECASE)) && - u8_strcmp(snm, tnm, 0, zfsvfs->z_norm, U8_UNICODE_LATEST, - &error) == 0) { - /* - * case preserving rename request, require exact - * name matches - */ - zflg |= ZCIEXACT; - zflg &= ~ZCILOOK; - } + szp = VTOZ(*svpp); + tzp = *tvpp == NULL ? NULL : VTOZ(*tvpp); + if (szp->z_sa_hdl == NULL || (tzp != NULL && tzp->z_sa_hdl == NULL)) { + error = SET_ERROR(EIO); + goto unlockout; } /* - * If the source and destination directories are the same, we should - * grab the z_name_lock of that directory only once. + * This is to prevent the creation of links into attribute space + * by renaming a linked file into/outof an attribute directory. + * See the comment in zfs_link() for why this is considered bad. */ - if (sdzp == tdzp) { - zflg |= ZHAVELOCK; - rw_enter(&sdzp->z_name_lock, RW_READER); + if ((tdzp->z_pflags & ZFS_XATTR) != (sdzp->z_pflags & ZFS_XATTR)) { + error = SET_ERROR(EINVAL); + goto unlockout; } - if (cmp < 0) { - serr = zfs_dirent_lock(&sdl, sdzp, snm, &szp, - ZEXISTS | zflg, NULL, NULL); - terr = zfs_dirent_lock(&tdl, - tdzp, tnm, &tzp, ZRENAMING | zflg, NULL, NULL); - } else { - terr = zfs_dirent_lock(&tdl, - tdzp, tnm, &tzp, zflg, NULL, NULL); - serr = zfs_dirent_lock(&sdl, - sdzp, snm, &szp, ZEXISTS | ZRENAMING | zflg, - NULL, NULL); - } - - if (serr) { - /* - * Source entry invalid or not there. - */ - if (!terr) { - zfs_dirent_unlock(tdl); - if (tzp) - VN_RELE(ZTOV(tzp)); - } - - if (sdzp == tdzp) - rw_exit(&sdzp->z_name_lock); - - /* - * FreeBSD: In OpenSolaris they only check if rename source is - * ".." here, because "." is handled in their lookup. This is - * not the case for FreeBSD, so we check for "." explicitly. - */ - if (strcmp(snm, ".") == 0 || strcmp(snm, "..") == 0) - serr = SET_ERROR(EINVAL); - ZFS_EXIT(zfsvfs); - return (serr); - } - if (terr) { - zfs_dirent_unlock(sdl); - VN_RELE(ZTOV(szp)); - - if (sdzp == tdzp) - rw_exit(&sdzp->z_name_lock); - - if (strcmp(tnm, "..") == 0) - terr = SET_ERROR(EINVAL); - ZFS_EXIT(zfsvfs); - return (terr); - } - /* * Must have write access at the source to remove the old entry * and write access at the target to create the new entry. * Note that if target and source are the same, this can be * done in a single check. */ - if (error = zfs_zaccess_rename(sdzp, szp, tdzp, tzp, cr)) - goto out; + goto unlockout; - if (ZTOV(szp)->v_type == VDIR) { + if ((*svpp)->v_type == VDIR) { /* + * Avoid ".", "..", and aliases of "." for obvious reasons. + */ + if ((scnp->cn_namelen == 1 && scnp->cn_nameptr[0] == '.') || + sdzp == szp || + (scnp->cn_flags | tcnp->cn_flags) & ISDOTDOT) { + error = EINVAL; + goto unlockout; + } + + /* * Check to make sure rename is valid. * Can't do a move like this: /usr/a/b to /usr/a/b/c/d */ - if (error = zfs_rename_lock(szp, tdzp, sdzp, &zl)) - goto out; + if (error = zfs_rename_check(szp, sdzp, tdzp)) + goto unlockout; } /* * Does target exist? */ if (tzp) { /* * Source and target must be the same type. */ - if (ZTOV(szp)->v_type == VDIR) { - if (ZTOV(tzp)->v_type != VDIR) { + if ((*svpp)->v_type == VDIR) { + if ((*tvpp)->v_type != VDIR) { error = SET_ERROR(ENOTDIR); - goto out; + goto unlockout; + } else { + cache_purge(tdvp); + if (sdvp != tdvp) + cache_purge(sdvp); } } else { - if (ZTOV(tzp)->v_type == VDIR) { + if ((*tvpp)->v_type == VDIR) { error = SET_ERROR(EISDIR); - goto out; + goto unlockout; } } - /* - * POSIX dictates that when the source and target - * entries refer to the same file object, rename - * must do nothing and exit without error. - */ - if (szp->z_id == tzp->z_id) { - error = 0; - goto out; - } } - vnevent_rename_src(ZTOV(szp), sdvp, snm, ct); + vnevent_rename_src(*svpp, sdvp, scnp->cn_nameptr, ct); if (tzp) - vnevent_rename_dest(ZTOV(tzp), tdvp, tnm, ct); + vnevent_rename_dest(*tvpp, tdvp, tnm, ct); /* * notify the target directory if it is not the same * as source directory. */ if (tdvp != sdvp) { vnevent_rename_dest_dir(tdvp, ct); } tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_sa(tx, szp->z_sa_hdl, B_FALSE); dmu_tx_hold_sa(tx, sdzp->z_sa_hdl, B_FALSE); dmu_tx_hold_zap(tx, sdzp->z_id, FALSE, snm); dmu_tx_hold_zap(tx, tdzp->z_id, TRUE, tnm); if (sdzp != tdzp) { dmu_tx_hold_sa(tx, tdzp->z_sa_hdl, B_FALSE); zfs_sa_upgrade_txholds(tx, tdzp); } if (tzp) { dmu_tx_hold_sa(tx, tzp->z_sa_hdl, B_FALSE); zfs_sa_upgrade_txholds(tx, tzp); } zfs_sa_upgrade_txholds(tx, szp); dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL); - error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT); + error = dmu_tx_assign(tx, TXG_WAIT); if (error) { - if (zl != NULL) - zfs_rename_unlock(&zl); - zfs_dirent_unlock(sdl); - zfs_dirent_unlock(tdl); - - if (sdzp == tdzp) - rw_exit(&sdzp->z_name_lock); - - VN_RELE(ZTOV(szp)); - if (tzp) - VN_RELE(ZTOV(tzp)); - if (error == ERESTART) { - waited = B_TRUE; - dmu_tx_wait(tx); - dmu_tx_abort(tx); - goto top; - } dmu_tx_abort(tx); - ZFS_EXIT(zfsvfs); - return (error); + goto unlockout; } + if (tzp) /* Attempt to remove the existing target */ - error = zfs_link_destroy(tdl, tzp, tx, zflg, NULL); + error = zfs_link_destroy(tdzp, tnm, tzp, tx, 0, NULL); if (error == 0) { - error = zfs_link_create(tdl, szp, tx, ZRENAMING); + error = zfs_link_create(tdzp, tnm, szp, tx, ZRENAMING); if (error == 0) { szp->z_pflags |= ZFS_AV_MODIFIED; error = sa_update(szp->z_sa_hdl, SA_ZPL_FLAGS(zfsvfs), (void *)&szp->z_pflags, sizeof (uint64_t), tx); ASSERT0(error); - error = zfs_link_destroy(sdl, szp, tx, ZRENAMING, NULL); + error = zfs_link_destroy(sdzp, snm, szp, tx, ZRENAMING, + NULL); if (error == 0) { - zfs_log_rename(zilog, tx, TX_RENAME | - (flags & FIGNORECASE ? TX_CI : 0), sdzp, - sdl->dl_name, tdzp, tdl->dl_name, szp); + zfs_log_rename(zilog, tx, TX_RENAME, sdzp, + snm, tdzp, tnm, szp); /* * Update path information for the target vnode */ - vn_renamepath(tdvp, ZTOV(szp), tnm, - strlen(tnm)); + vn_renamepath(tdvp, *svpp, tnm, strlen(tnm)); } else { /* * At this point, we have successfully created * the target name, but have failed to remove * the source name. Since the create was done * with the ZRENAMING flag, there are * complications; for one, the link count is * wrong. The easiest way to deal with this * is to remove the newly created target, and * return the original error. This must * succeed; fortunately, it is very unlikely to * fail, since we just created it. */ - VERIFY3U(zfs_link_destroy(tdl, szp, tx, + VERIFY3U(zfs_link_destroy(tdzp, tnm, szp, tx, ZRENAMING, NULL), ==, 0); } } -#ifdef FREEBSD_NAMECACHE if (error == 0) { - cache_purge(sdvp); - cache_purge(tdvp); - cache_purge(ZTOV(szp)); - if (tzp) - cache_purge(ZTOV(tzp)); + cache_purge(*svpp); + if (*tvpp != NULL) + cache_purge(*tvpp); + cache_purge_negative(tdvp); } -#endif } dmu_tx_commit(tx); -out: - if (zl != NULL) - zfs_rename_unlock(&zl); - zfs_dirent_unlock(sdl); - zfs_dirent_unlock(tdl); +unlockout: /* all 4 vnodes are locked, ZFS_ENTER called */ + ZFS_EXIT(zfsvfs); + VOP_UNLOCK(*svpp, 0); + VOP_UNLOCK(sdvp, 0); - if (sdzp == tdzp) - rw_exit(&sdzp->z_name_lock); - - - VN_RELE(ZTOV(szp)); - if (tzp) - VN_RELE(ZTOV(tzp)); - - if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) +out: /* original two vnodes are locked */ + if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS && error == 0) zil_commit(zilog, 0); - ZFS_EXIT(zfsvfs); - + if (*tvpp != NULL) + VOP_UNLOCK(*tvpp, 0); + if (tdvp != *tvpp) + VOP_UNLOCK(tdvp, 0); return (error); } /* * Insert the indicated symbolic reference entry into the directory. * * IN: dvp - Directory to contain new symbolic link. * link - Name for new symlink entry. * vap - Attributes of new entry. * cr - credentials of caller. * ct - caller context * flags - case flags * * RETURN: 0 on success, error code on failure. * * Timestamps: * dvp - ctime|mtime updated */ /*ARGSUSED*/ static int zfs_symlink(vnode_t *dvp, vnode_t **vpp, char *name, vattr_t *vap, char *link, cred_t *cr, kthread_t *td) { znode_t *zp, *dzp = VTOZ(dvp); - zfs_dirlock_t *dl; dmu_tx_t *tx; zfsvfs_t *zfsvfs = dzp->z_zfsvfs; zilog_t *zilog; uint64_t len = strlen(link); int error; - int zflg = ZNEW; zfs_acl_ids_t acl_ids; boolean_t fuid_dirtied; uint64_t txtype = TX_SYMLINK; - boolean_t waited = B_FALSE; int flags = 0; ASSERT(vap->va_type == VLNK); ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(dzp); zilog = zfsvfs->z_log; if (zfsvfs->z_utf8 && u8_validate(name, strlen(name), NULL, U8_VALIDATE_ENTIRE, &error) < 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EILSEQ)); } - if (flags & FIGNORECASE) - zflg |= ZCILOOK; if (len > MAXPATHLEN) { ZFS_EXIT(zfsvfs); return (SET_ERROR(ENAMETOOLONG)); } if ((error = zfs_acl_ids_create(dzp, 0, vap, cr, NULL, &acl_ids)) != 0) { ZFS_EXIT(zfsvfs); return (error); } - getnewvnode_reserve(1); - -top: /* * Attempt to lock directory; fail if entry already exists. */ - error = zfs_dirent_lock(&dl, dzp, name, &zp, zflg, NULL, NULL); + error = zfs_dirent_lookup(dzp, name, &zp, ZNEW); if (error) { zfs_acl_ids_free(&acl_ids); - getnewvnode_drop_reserve(); ZFS_EXIT(zfsvfs); return (error); } if (error = zfs_zaccess(dzp, ACE_ADD_FILE, 0, B_FALSE, cr)) { zfs_acl_ids_free(&acl_ids); - zfs_dirent_unlock(dl); - getnewvnode_drop_reserve(); ZFS_EXIT(zfsvfs); return (error); } if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) { zfs_acl_ids_free(&acl_ids); - zfs_dirent_unlock(dl); - getnewvnode_drop_reserve(); ZFS_EXIT(zfsvfs); return (SET_ERROR(EDQUOT)); } + + getnewvnode_reserve(1); tx = dmu_tx_create(zfsvfs->z_os); fuid_dirtied = zfsvfs->z_fuid_dirty; dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, MAX(1, len)); dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name); dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes + ZFS_SA_BASE_ATTR_SIZE + len); dmu_tx_hold_sa(tx, dzp->z_sa_hdl, B_FALSE); if (!zfsvfs->z_use_sa && acl_ids.z_aclp->z_acl_bytes > ZFS_ACE_SPACE) { dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, acl_ids.z_aclp->z_acl_bytes); } if (fuid_dirtied) zfs_fuid_txhold(zfsvfs, tx); - error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT); + error = dmu_tx_assign(tx, TXG_WAIT); if (error) { - zfs_dirent_unlock(dl); - if (error == ERESTART) { - waited = B_TRUE; - dmu_tx_wait(tx); - dmu_tx_abort(tx); - goto top; - } zfs_acl_ids_free(&acl_ids); dmu_tx_abort(tx); getnewvnode_drop_reserve(); ZFS_EXIT(zfsvfs); return (error); } /* * Create a new object for the symlink. * for version 4 ZPL datsets the symlink will be an SA attribute */ zfs_mknode(dzp, vap, tx, cr, 0, &zp, &acl_ids); if (fuid_dirtied) zfs_fuid_sync(zfsvfs, tx); - mutex_enter(&zp->z_lock); if (zp->z_is_sa) error = sa_update(zp->z_sa_hdl, SA_ZPL_SYMLINK(zfsvfs), link, len, tx); else zfs_sa_symlink(zp, link, len, tx); - mutex_exit(&zp->z_lock); zp->z_size = len; (void) sa_update(zp->z_sa_hdl, SA_ZPL_SIZE(zfsvfs), &zp->z_size, sizeof (zp->z_size), tx); /* * Insert the new object into the directory. */ - (void) zfs_link_create(dl, zp, tx, ZNEW); + (void) zfs_link_create(dzp, name, zp, tx, ZNEW); - if (flags & FIGNORECASE) - txtype |= TX_CI; zfs_log_symlink(zilog, tx, txtype, dzp, zp, name, link); *vpp = ZTOV(zp); zfs_acl_ids_free(&acl_ids); dmu_tx_commit(tx); getnewvnode_drop_reserve(); - zfs_dirent_unlock(dl); - if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) zil_commit(zilog, 0); ZFS_EXIT(zfsvfs); return (error); } /* * Return, in the buffer contained in the provided uio structure, * the symbolic path referred to by vp. * * IN: vp - vnode of symbolic link. * uio - structure to contain the link path. * cr - credentials of caller. * ct - caller context * * OUT: uio - structure containing the link path. * * RETURN: 0 on success, error code on failure. * * Timestamps: * vp - atime updated */ /* ARGSUSED */ static int zfs_readlink(vnode_t *vp, uio_t *uio, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; int error; ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(zp); - mutex_enter(&zp->z_lock); if (zp->z_is_sa) error = sa_lookup_uio(zp->z_sa_hdl, SA_ZPL_SYMLINK(zfsvfs), uio); else error = zfs_sa_readlink(zp, uio); - mutex_exit(&zp->z_lock); ZFS_ACCESSTIME_STAMP(zfsvfs, zp); ZFS_EXIT(zfsvfs); return (error); } /* * Insert a new entry into directory tdvp referencing svp. * * IN: tdvp - Directory to contain new entry. * svp - vnode of new entry. * name - name of new entry. * cr - credentials of caller. * ct - caller context * * RETURN: 0 on success, error code on failure. * * Timestamps: * tdvp - ctime|mtime updated * svp - ctime updated */ /* ARGSUSED */ static int zfs_link(vnode_t *tdvp, vnode_t *svp, char *name, cred_t *cr, caller_context_t *ct, int flags) { znode_t *dzp = VTOZ(tdvp); znode_t *tzp, *szp; zfsvfs_t *zfsvfs = dzp->z_zfsvfs; zilog_t *zilog; - zfs_dirlock_t *dl; dmu_tx_t *tx; - vnode_t *realvp; int error; - int zf = ZNEW; uint64_t parent; uid_t owner; - boolean_t waited = B_FALSE; ASSERT(tdvp->v_type == VDIR); ZFS_ENTER(zfsvfs); ZFS_VERIFY_ZP(dzp); zilog = zfsvfs->z_log; - if (VOP_REALVP(svp, &realvp, ct) == 0) - svp = realvp; - /* * POSIX dictates that we return EPERM here. * Better choices include ENOTSUP or EISDIR. */ if (svp->v_type == VDIR) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } szp = VTOZ(svp); ZFS_VERIFY_ZP(szp); if (szp->z_pflags & (ZFS_APPENDONLY | ZFS_IMMUTABLE | ZFS_READONLY)) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } - /* - * We check z_zfsvfs rather than v_vfsp here, because snapshots and the - * ctldir appear to have the same v_vfsp. - */ - if (szp->z_zfsvfs != zfsvfs || zfsctl_is_node(svp)) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EXDEV)); - } - /* Prevent links to .zfs/shares files */ if ((error = sa_lookup(szp->z_sa_hdl, SA_ZPL_PARENT(zfsvfs), &parent, sizeof (uint64_t))) != 0) { ZFS_EXIT(zfsvfs); return (error); } if (parent == zfsvfs->z_shares_dir) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } if (zfsvfs->z_utf8 && u8_validate(name, strlen(name), NULL, U8_VALIDATE_ENTIRE, &error) < 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EILSEQ)); } - if (flags & FIGNORECASE) - zf |= ZCILOOK; /* * We do not support links between attributes and non-attributes * because of the potential security risk of creating links * into "normal" file space in order to circumvent restrictions * imposed in attribute space. */ if ((szp->z_pflags & ZFS_XATTR) != (dzp->z_pflags & ZFS_XATTR)) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EINVAL)); } owner = zfs_fuid_map_id(zfsvfs, szp->z_uid, cr, ZFS_OWNER); if (owner != crgetuid(cr) && secpolicy_basic_link(svp, cr) != 0) { ZFS_EXIT(zfsvfs); return (SET_ERROR(EPERM)); } if (error = zfs_zaccess(dzp, ACE_ADD_FILE, 0, B_FALSE, cr)) { ZFS_EXIT(zfsvfs); return (error); } -top: /* * Attempt to lock directory; fail if entry already exists. */ - error = zfs_dirent_lock(&dl, dzp, name, &tzp, zf, NULL, NULL); + error = zfs_dirent_lookup(dzp, name, &tzp, ZNEW); if (error) { ZFS_EXIT(zfsvfs); return (error); } tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_sa(tx, szp->z_sa_hdl, B_FALSE); dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name); zfs_sa_upgrade_txholds(tx, szp); zfs_sa_upgrade_txholds(tx, dzp); - error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT); + error = dmu_tx_assign(tx, TXG_WAIT); if (error) { - zfs_dirent_unlock(dl); - if (error == ERESTART) { - waited = B_TRUE; - dmu_tx_wait(tx); - dmu_tx_abort(tx); - goto top; - } dmu_tx_abort(tx); ZFS_EXIT(zfsvfs); return (error); } - error = zfs_link_create(dl, szp, tx, 0); + error = zfs_link_create(dzp, name, szp, tx, 0); if (error == 0) { uint64_t txtype = TX_LINK; - if (flags & FIGNORECASE) - txtype |= TX_CI; zfs_log_link(zilog, tx, txtype, dzp, szp, name); } dmu_tx_commit(tx); - zfs_dirent_unlock(dl); - if (error == 0) { vnevent_link(svp, ct); } if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) zil_commit(zilog, 0); ZFS_EXIT(zfsvfs); return (error); } -#ifdef illumos -/* - * zfs_null_putapage() is used when the file system has been force - * unmounted. It just drops the pages. - */ -/* ARGSUSED */ -static int -zfs_null_putapage(vnode_t *vp, page_t *pp, u_offset_t *offp, - size_t *lenp, int flags, cred_t *cr) -{ - pvn_write_done(pp, B_INVAL|B_FORCE|B_ERROR); - return (0); -} -/* - * Push a page out to disk, klustering if possible. - * - * IN: vp - file to push page to. - * pp - page to push. - * flags - additional flags. - * cr - credentials of caller. - * - * OUT: offp - start of range pushed. - * lenp - len of range pushed. - * - * RETURN: 0 on success, error code on failure. - * - * NOTE: callers must have locked the page to be pushed. On - * exit, the page (and all other pages in the kluster) must be - * unlocked. - */ -/* ARGSUSED */ -static int -zfs_putapage(vnode_t *vp, page_t *pp, u_offset_t *offp, - size_t *lenp, int flags, cred_t *cr) -{ - znode_t *zp = VTOZ(vp); - zfsvfs_t *zfsvfs = zp->z_zfsvfs; - dmu_tx_t *tx; - u_offset_t off, koff; - size_t len, klen; - int err; - - off = pp->p_offset; - len = PAGESIZE; - /* - * If our blocksize is bigger than the page size, try to kluster - * multiple pages so that we write a full block (thus avoiding - * a read-modify-write). - */ - if (off < zp->z_size && zp->z_blksz > PAGESIZE) { - klen = P2ROUNDUP((ulong_t)zp->z_blksz, PAGESIZE); - koff = ISP2(klen) ? P2ALIGN(off, (u_offset_t)klen) : 0; - ASSERT(koff <= zp->z_size); - if (koff + klen > zp->z_size) - klen = P2ROUNDUP(zp->z_size - koff, (uint64_t)PAGESIZE); - pp = pvn_write_kluster(vp, pp, &off, &len, koff, klen, flags); - } - ASSERT3U(btop(len), ==, btopr(len)); - - /* - * Can't push pages past end-of-file. - */ - if (off >= zp->z_size) { - /* ignore all pages */ - err = 0; - goto out; - } else if (off + len > zp->z_size) { - int npages = btopr(zp->z_size - off); - page_t *trunc; - - page_list_break(&pp, &trunc, npages); - /* ignore pages past end of file */ - if (trunc) - pvn_write_done(trunc, flags); - len = zp->z_size - off; - } - - if (zfs_owner_overquota(zfsvfs, zp, B_FALSE) || - zfs_owner_overquota(zfsvfs, zp, B_TRUE)) { - err = SET_ERROR(EDQUOT); - goto out; - } - tx = dmu_tx_create(zfsvfs->z_os); - dmu_tx_hold_write(tx, zp->z_id, off, len); - - dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE); - zfs_sa_upgrade_txholds(tx, zp); - err = dmu_tx_assign(tx, TXG_WAIT); - if (err != 0) { - dmu_tx_abort(tx); - goto out; - } - - if (zp->z_blksz <= PAGESIZE) { - caddr_t va = zfs_map_page(pp, S_READ); - ASSERT3U(len, <=, PAGESIZE); - dmu_write(zfsvfs->z_os, zp->z_id, off, len, va, tx); - zfs_unmap_page(pp, va); - } else { - err = dmu_write_pages(zfsvfs->z_os, zp->z_id, off, len, pp, tx); - } - - if (err == 0) { - uint64_t mtime[2], ctime[2]; - sa_bulk_attr_t bulk[3]; - int count = 0; - - SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, - &mtime, 16); - SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, - &ctime, 16); - SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL, - &zp->z_pflags, 8); - zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime, - B_TRUE); - zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp, off, len, 0); - } - dmu_tx_commit(tx); - -out: - pvn_write_done(pp, (err ? B_ERROR : 0) | flags); - if (offp) - *offp = off; - if (lenp) - *lenp = len; - - return (err); -} - -/* - * Copy the portion of the file indicated from pages into the file. - * The pages are stored in a page list attached to the files vnode. - * - * IN: vp - vnode of file to push page data to. - * off - position in file to put data. - * len - amount of data to write. - * flags - flags to control the operation. - * cr - credentials of caller. - * ct - caller context. - * - * RETURN: 0 on success, error code on failure. - * - * Timestamps: - * vp - ctime|mtime updated - */ /*ARGSUSED*/ -static int -zfs_putpage(vnode_t *vp, offset_t off, size_t len, int flags, cred_t *cr, - caller_context_t *ct) -{ - znode_t *zp = VTOZ(vp); - zfsvfs_t *zfsvfs = zp->z_zfsvfs; - page_t *pp; - size_t io_len; - u_offset_t io_off; - uint_t blksz; - rl_t *rl; - int error = 0; - - ZFS_ENTER(zfsvfs); - ZFS_VERIFY_ZP(zp); - - /* - * Align this request to the file block size in case we kluster. - * XXX - this can result in pretty aggresive locking, which can - * impact simultanious read/write access. One option might be - * to break up long requests (len == 0) into block-by-block - * operations to get narrower locking. - */ - blksz = zp->z_blksz; - if (ISP2(blksz)) - io_off = P2ALIGN_TYPED(off, blksz, u_offset_t); - else - io_off = 0; - if (len > 0 && ISP2(blksz)) - io_len = P2ROUNDUP_TYPED(len + (off - io_off), blksz, size_t); - else - io_len = 0; - - if (io_len == 0) { - /* - * Search the entire vp list for pages >= io_off. - */ - rl = zfs_range_lock(zp, io_off, UINT64_MAX, RL_WRITER); - error = pvn_vplist_dirty(vp, io_off, zfs_putapage, flags, cr); - goto out; - } - rl = zfs_range_lock(zp, io_off, io_len, RL_WRITER); - - if (off > zp->z_size) { - /* past end of file */ - zfs_range_unlock(rl); - ZFS_EXIT(zfsvfs); - return (0); - } - - len = MIN(io_len, P2ROUNDUP(zp->z_size, PAGESIZE) - io_off); - - for (off = io_off; io_off < off + len; io_off += io_len) { - if ((flags & B_INVAL) || ((flags & B_ASYNC) == 0)) { - pp = page_lookup(vp, io_off, - (flags & (B_INVAL | B_FREE)) ? SE_EXCL : SE_SHARED); - } else { - pp = page_lookup_nowait(vp, io_off, - (flags & B_FREE) ? SE_EXCL : SE_SHARED); - } - - if (pp != NULL && pvn_getdirty(pp, flags)) { - int err; - - /* - * Found a dirty page to push - */ - err = zfs_putapage(vp, pp, &io_off, &io_len, flags, cr); - if (err) - error = err; - } else { - io_len = PAGESIZE; - } - } -out: - zfs_range_unlock(rl); - if ((flags & B_ASYNC) == 0 || zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS) - zil_commit(zfsvfs->z_log, zp->z_id); - ZFS_EXIT(zfsvfs); - return (error); -} -#endif /* illumos */ - -/*ARGSUSED*/ void zfs_inactive(vnode_t *vp, cred_t *cr, caller_context_t *ct) { znode_t *zp = VTOZ(vp); zfsvfs_t *zfsvfs = zp->z_zfsvfs; int error; rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_READER); if (zp->z_sa_hdl == NULL) { /* * The fs has been unmounted, or we did a * suspend/resume and this file no longer exists. */ rw_exit(&zfsvfs->z_teardown_inactive_lock); vrecycle(vp); return; } - mutex_enter(&zp->z_lock); if (zp->z_unlinked) { /* * Fast path to recycle a vnode of a removed file. */ - mutex_exit(&zp->z_lock); rw_exit(&zfsvfs->z_teardown_inactive_lock); vrecycle(vp); return; } - mutex_exit(&zp->z_lock); if (zp->z_atime_dirty && zp->z_unlinked == 0) { dmu_tx_t *tx = dmu_tx_create(zfsvfs->z_os); dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE); zfs_sa_upgrade_txholds(tx, zp); error = dmu_tx_assign(tx, TXG_WAIT); if (error) { dmu_tx_abort(tx); } else { - mutex_enter(&zp->z_lock); (void) sa_update(zp->z_sa_hdl, SA_ZPL_ATIME(zfsvfs), (void *)&zp->z_atime, sizeof (zp->z_atime), tx); zp->z_atime_dirty = 0; - mutex_exit(&zp->z_lock); dmu_tx_commit(tx); } } rw_exit(&zfsvfs->z_teardown_inactive_lock); } -#ifdef illumos -/* - * Bounds-check the seek operation. - * - * IN: vp - vnode seeking within - * ooff - old file offset - * noffp - pointer to new file offset - * ct - caller context - * - * RETURN: 0 on success, EINVAL if new offset invalid. - */ -/* ARGSUSED */ -static int -zfs_seek(vnode_t *vp, offset_t ooff, offset_t *noffp, - caller_context_t *ct) -{ - if (vp->v_type == VDIR) - return (0); - return ((*noffp < 0 || *noffp > MAXOFFSET_T) ? EINVAL : 0); -} -/* - * Pre-filter the generic locking function to trap attempts to place - * a mandatory lock on a memory mapped file. - */ -static int -zfs_frlock(vnode_t *vp, int cmd, flock64_t *bfp, int flag, offset_t offset, - flk_callback_t *flk_cbp, cred_t *cr, caller_context_t *ct) -{ - znode_t *zp = VTOZ(vp); - zfsvfs_t *zfsvfs = zp->z_zfsvfs; - - ZFS_ENTER(zfsvfs); - ZFS_VERIFY_ZP(zp); - - /* - * We are following the UFS semantics with respect to mapcnt - * here: If we see that the file is mapped already, then we will - * return an error, but we don't worry about races between this - * function and zfs_map(). - */ - if (zp->z_mapcnt > 0 && MANDMODE(zp->z_mode)) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EAGAIN)); - } - ZFS_EXIT(zfsvfs); - return (fs_frlock(vp, cmd, bfp, flag, offset, flk_cbp, cr, ct)); -} - -/* - * If we can't find a page in the cache, we will create a new page - * and fill it with file data. For efficiency, we may try to fill - * multiple pages at once (klustering) to fill up the supplied page - * list. Note that the pages to be filled are held with an exclusive - * lock to prevent access by other threads while they are being filled. - */ -static int -zfs_fillpage(vnode_t *vp, u_offset_t off, struct seg *seg, - caddr_t addr, page_t *pl[], size_t plsz, enum seg_rw rw) -{ - znode_t *zp = VTOZ(vp); - page_t *pp, *cur_pp; - objset_t *os = zp->z_zfsvfs->z_os; - u_offset_t io_off, total; - size_t io_len; - int err; - - if (plsz == PAGESIZE || zp->z_blksz <= PAGESIZE) { - /* - * We only have a single page, don't bother klustering - */ - io_off = off; - io_len = PAGESIZE; - pp = page_create_va(vp, io_off, io_len, - PG_EXCL | PG_WAIT, seg, addr); - } else { - /* - * Try to find enough pages to fill the page list - */ - pp = pvn_read_kluster(vp, off, seg, addr, &io_off, - &io_len, off, plsz, 0); - } - if (pp == NULL) { - /* - * The page already exists, nothing to do here. - */ - *pl = NULL; - return (0); - } - - /* - * Fill the pages in the kluster. - */ - cur_pp = pp; - for (total = io_off + io_len; io_off < total; io_off += PAGESIZE) { - caddr_t va; - - ASSERT3U(io_off, ==, cur_pp->p_offset); - va = zfs_map_page(cur_pp, S_WRITE); - err = dmu_read(os, zp->z_id, io_off, PAGESIZE, va, - DMU_READ_PREFETCH); - zfs_unmap_page(cur_pp, va); - if (err) { - /* On error, toss the entire kluster */ - pvn_read_done(pp, B_ERROR); - /* convert checksum errors into IO errors */ - if (err == ECKSUM) - err = SET_ERROR(EIO); - return (err); - } - cur_pp = cur_pp->p_next; - } - - /* - * Fill in the page list array from the kluster starting - * from the desired offset `off'. - * NOTE: the page list will always be null terminated. - */ - pvn_plist_init(pp, pl, plsz, off, io_len, rw); - ASSERT(pl == NULL || (*pl)->p_offset == off); - - return (0); -} - -/* - * Return pointers to the pages for the file region [off, off + len] - * in the pl array. If plsz is greater than len, this function may - * also return page pointers from after the specified region - * (i.e. the region [off, off + plsz]). These additional pages are - * only returned if they are already in the cache, or were created as - * part of a klustered read. - * - * IN: vp - vnode of file to get data from. - * off - position in file to get data from. - * len - amount of data to retrieve. - * plsz - length of provided page list. - * seg - segment to obtain pages for. - * addr - virtual address of fault. - * rw - mode of created pages. - * cr - credentials of caller. - * ct - caller context. - * - * OUT: protp - protection mode of created pages. - * pl - list of pages created. - * - * RETURN: 0 on success, error code on failure. - * - * Timestamps: - * vp - atime updated - */ -/* ARGSUSED */ -static int -zfs_getpage(vnode_t *vp, offset_t off, size_t len, uint_t *protp, - page_t *pl[], size_t plsz, struct seg *seg, caddr_t addr, - enum seg_rw rw, cred_t *cr, caller_context_t *ct) -{ - znode_t *zp = VTOZ(vp); - zfsvfs_t *zfsvfs = zp->z_zfsvfs; - page_t **pl0 = pl; - int err = 0; - - /* we do our own caching, faultahead is unnecessary */ - if (pl == NULL) - return (0); - else if (len > plsz) - len = plsz; - else - len = P2ROUNDUP(len, PAGESIZE); - ASSERT(plsz >= len); - - ZFS_ENTER(zfsvfs); - ZFS_VERIFY_ZP(zp); - - if (protp) - *protp = PROT_ALL; - - /* - * Loop through the requested range [off, off + len) looking - * for pages. If we don't find a page, we will need to create - * a new page and fill it with data from the file. - */ - while (len > 0) { - if (*pl = page_lookup(vp, off, SE_SHARED)) - *(pl+1) = NULL; - else if (err = zfs_fillpage(vp, off, seg, addr, pl, plsz, rw)) - goto out; - while (*pl) { - ASSERT3U((*pl)->p_offset, ==, off); - off += PAGESIZE; - addr += PAGESIZE; - if (len > 0) { - ASSERT3U(len, >=, PAGESIZE); - len -= PAGESIZE; - } - ASSERT3U(plsz, >=, PAGESIZE); - plsz -= PAGESIZE; - pl++; - } - } - - /* - * Fill out the page array with any pages already in the cache. - */ - while (plsz > 0 && - (*pl++ = page_lookup_nowait(vp, off, SE_SHARED))) { - off += PAGESIZE; - plsz -= PAGESIZE; - } -out: - if (err) { - /* - * Release any pages we have previously locked. - */ - while (pl > pl0) - page_unlock(*--pl); - } else { - ZFS_ACCESSTIME_STAMP(zfsvfs, zp); - } - - *pl = NULL; - - ZFS_EXIT(zfsvfs); - return (err); -} - -/* - * Request a memory map for a section of a file. This code interacts - * with common code and the VM system as follows: - * - * - common code calls mmap(), which ends up in smmap_common() - * - this calls VOP_MAP(), which takes you into (say) zfs - * - zfs_map() calls as_map(), passing segvn_create() as the callback - * - segvn_create() creates the new segment and calls VOP_ADDMAP() - * - zfs_addmap() updates z_mapcnt - */ -/*ARGSUSED*/ -static int -zfs_map(vnode_t *vp, offset_t off, struct as *as, caddr_t *addrp, - size_t len, uchar_t prot, uchar_t maxprot, uint_t flags, cred_t *cr, - caller_context_t *ct) -{ - znode_t *zp = VTOZ(vp); - zfsvfs_t *zfsvfs = zp->z_zfsvfs; - segvn_crargs_t vn_a; - int error; - - ZFS_ENTER(zfsvfs); - ZFS_VERIFY_ZP(zp); - - if ((prot & PROT_WRITE) && (zp->z_pflags & - (ZFS_IMMUTABLE | ZFS_READONLY | ZFS_APPENDONLY))) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EPERM)); - } - - if ((prot & (PROT_READ | PROT_EXEC)) && - (zp->z_pflags & ZFS_AV_QUARANTINED)) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EACCES)); - } - - if (vp->v_flag & VNOMAP) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(ENOSYS)); - } - - if (off < 0 || len > MAXOFFSET_T - off) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(ENXIO)); - } - - if (vp->v_type != VREG) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(ENODEV)); - } - - /* - * If file is locked, disallow mapping. - */ - if (MANDMODE(zp->z_mode) && vn_has_flocks(vp)) { - ZFS_EXIT(zfsvfs); - return (SET_ERROR(EAGAIN)); - } - - as_rangelock(as); - error = choose_addr(as, addrp, len, off, ADDR_VACALIGN, flags); - if (error != 0) { - as_rangeunlock(as); - ZFS_EXIT(zfsvfs); - return (error); - } - - vn_a.vp = vp; - vn_a.offset = (u_offset_t)off; - vn_a.type = flags & MAP_TYPE; - vn_a.prot = prot; - vn_a.maxprot = maxprot; - vn_a.cred = cr; - vn_a.amp = NULL; - vn_a.flags = flags & ~MAP_TYPE; - vn_a.szc = 0; - vn_a.lgrp_mem_policy_flags = 0; - - error = as_map(as, *addrp, len, segvn_create, &vn_a); - - as_rangeunlock(as); - ZFS_EXIT(zfsvfs); - return (error); -} - -/* ARGSUSED */ -static int -zfs_addmap(vnode_t *vp, offset_t off, struct as *as, caddr_t addr, - size_t len, uchar_t prot, uchar_t maxprot, uint_t flags, cred_t *cr, - caller_context_t *ct) -{ - uint64_t pages = btopr(len); - - atomic_add_64(&VTOZ(vp)->z_mapcnt, pages); - return (0); -} - -/* - * The reason we push dirty pages as part of zfs_delmap() is so that we get a - * more accurate mtime for the associated file. Since we don't have a way of - * detecting when the data was actually modified, we have to resort to - * heuristics. If an explicit msync() is done, then we mark the mtime when the - * last page is pushed. The problem occurs when the msync() call is omitted, - * which by far the most common case: - * - * open() - * mmap() - * - * munmap() - * close() - *