Index: projects/clang380-import/bin/setfacl/setfacl.1 =================================================================== --- projects/clang380-import/bin/setfacl/setfacl.1 (revision 294776) +++ projects/clang380-import/bin/setfacl/setfacl.1 (revision 294777) @@ -1,492 +1,493 @@ .\"- .\" Copyright (c) 2001 Chris D. Faulhaber .\" Copyright (c) 2011 Edward Tomasz Napierała .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd September 4, 2015 +.Dd January 23, 2016 .Dt SETFACL 1 .Os .Sh NAME .Nm setfacl .Nd set ACL information .Sh SYNOPSIS .Nm .Op Fl bdhkn .Op Fl a Ar position entries .Op Fl m Ar entries .Op Fl M Ar file .Op Fl x Ar entries | position .Op Fl X Ar file .Op Ar .Sh DESCRIPTION The .Nm utility sets discretionary access control information on the specified file(s). If no files are specified, or the list consists of the only .Sq Fl , the file names are taken from the standard input. .Pp The following options are available: .Bl -tag -width indent .It Fl a Ar position entries Modify the ACL on the specified files by inserting new ACL entries specified in .Ar entries , starting at position .Ar position , counting from zero. This option is only applicable to NFSv4 ACLs. .It Fl b -Remove all ACL entries except for the three required entries -(POSIX.1e ACLs) or six "canonical" entries (NFSv4 ACLs). +Remove all ACL entries except for the ones synthesized +from the file mode - the three mandatory entries in case +of POSIX.1e ACL. If the POSIX.1e ACL contains a .Dq Li mask entry, the permissions of the .Dq Li group entry in the resulting ACL will be set to the permission associated with both the .Dq Li group and .Dq Li mask entries of the current ACL. .It Fl d The operations apply to the default ACL entries instead of access ACL entries. Currently only directories may have default ACL's. This option is not applicable to NFSv4 ACLs. .It Fl h If the target of the operation is a symbolic link, perform the operation on the symbolic link itself, rather than following the link. .It Fl k Delete any default ACL entries on the specified files. It is not considered an error if the specified files do not have any default ACL entries.
An error will be reported if any of the specified files cannot have a default entry (i.e.\& non-directories). This option is not applicable to NFSv4 ACLs. .It Fl m Ar entries Modify the ACL on the specified file. New entries will be added, and existing entries will be modified according to the .Ar entries argument. For NFSv4 ACLs, it is recommended to use the .Fl a and .Fl x options instead. .It Fl M Ar file Modify the ACL entries on the specified files by adding new ACL entries and modifying existing ACL entries with the ACL entries specified in the file .Ar file . If .Ar file is .Fl , the input is taken from stdin. .It Fl n Do not recalculate the permissions associated with the ACL mask entry. This option is not applicable to NFSv4 ACLs. .It Fl x Ar entries | position If .Ar entries is specified, remove the ACL entries specified there from the access or default ACL of the specified files. Otherwise, remove entry at index .Ar position , counting from zero. .It Fl X Ar file Remove the ACL entries specified in the file .Ar file from the access or default ACL of the specified files. .El .Pp The above options are evaluated in the order specified on the command-line. .Sh POSIX.1e ACL ENTRIES A POSIX.1E ACL entry contains three colon-separated fields: an ACL tag, an ACL qualifier, and discretionary access permissions: .Bl -tag -width indent .It Ar "ACL tag" The ACL tag specifies the ACL entry type and consists of one of the following: .Dq Li user or .Ql u specifying the access granted to the owner of the file or a specified user; .Dq Li group or .Ql g specifying the access granted to the file owning group or a specified group; .Dq Li other or .Ql o specifying the access granted to any process that does not match any user or group ACL entry; .Dq Li mask or .Ql m specifying the maximum access granted to any ACL entry except the .Dq Li user ACL entry for the file owner and the .Dq Li other ACL entry. .It Ar "ACL qualifier" The ACL qualifier field describes the user or group associated with the ACL entry. It may consist of one of the following: uid or user name, gid or group name, or empty. For .Dq Li user ACL entries, an empty field specifies access granted to the file owner. For .Dq Li group ACL entries, an empty field specifies access granted to the file owning group. .Dq Li mask and .Dq Li other ACL entries do not use this field. .It Ar "access permissions" The access permissions field contains up to one of each of the following: .Ql r , .Ql w , and .Ql x to set read, write, and execute permissions, respectively. Each of these may be excluded or replaced with a .Ql - character to indicate no access. .El .Pp A .Dq Li mask ACL entry is required on a file with any ACL entries other than the default .Dq Li user , .Dq Li group , and .Dq Li other ACL entries. If the .Fl n option is not specified and no .Dq Li mask ACL entry was specified, the .Nm utility will apply a .Dq Li mask ACL entry consisting of the union of the permissions associated with all .Dq Li group ACL entries in the resulting ACL. .Pp Traditional POSIX interfaces acting on file system object modes have modified semantics in the presence of POSIX.1e extended ACLs. When a mask entry is present on the access ACL of an object, the mask entry is substituted for the group bits; this occurs in programs such as .Xr stat 1 or .Xr ls 1 . When the mode is modified on an object that has a mask entry, the changes applied to the group bits will actually be applied to the mask entry. 
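.Pp
For example, assuming a file
.Pa file
whose access ACL already carries a
.Dq Li mask
entry, running
.Pp
.Dl chmod g-w file
.Pp
clears the write bit in the
.Dq Li mask
entry, not in the owning group entry.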
These semantics provide for greater application compatibility: applications modifying the mode instead of the ACL will see conservative behavior, limiting the effective rights granted by all of the additional user and group entries; this occurs in programs such as .Xr chmod 1 . .Pp ACL entries applied from a file using the .Fl M or .Fl X options shall be of the following form: one ACL entry per line, as previously specified; whitespace is ignored; any text after a .Ql # is ignored (comments). .Pp When POSIX.1e ACL entries are evaluated, the access check algorithm checks the ACL entries in the following order: file owner, .Dq Li user ACL entries, file owning group, .Dq Li group ACL entries, and .Dq Li other ACL entry. .Pp Multiple ACL entries specified on the command line are separated by commas. .Pp It is possible for files and directories to inherit ACL entries from their parent directory. This is accomplished through the use of the default ACL. It should be noted that before you can specify a default ACL, the mandatory ACL entries for user, group, other and mask must be set. For more details see the examples below. Default ACLs can be created by using .Fl d . .Sh NFSv4 ACL ENTRIES An NFSv4 ACL entry contains four or five colon-separated fields: an ACL tag, an ACL qualifier (only for .Dq Li user and .Dq Li group tags), discretionary access permissions, ACL inheritance flags, and ACL type: .Bl -tag -width indent .It Ar "ACL tag" The ACL tag specifies the ACL entry type and consists of one of the following: .Dq Li user or .Ql u specifying the access granted to the specified user; .Dq Li group or .Ql g specifying the access granted to the specified group; .Dq Li owner@ specifying the access granted to the owner of the file; .Dq Li group@ specifying the access granted to the file owning group; .Dq Li everyone@ specifying everyone. Note that .Dq Li everyone@ is not the same as traditional Unix .Dq Li other - it means, literally, everyone, including file owner and owning group. .It Ar "ACL qualifier" The ACL qualifier field describes the user or group associated with the ACL entry. It may consist of one of the following: uid or user name, or gid or group name. In entries whose tag type is one of .Dq Li owner@ , .Dq Li group@ , or .Dq Li everyone@ , this field is omitted altogether, including the trailing comma. .It Ar "access permissions" Access permissions may be specified in either short or long form. Short and long forms may not be mixed. Permissions in long form are separated by the .Ql / character; in short form, they are concatenated together. Valid permissions are: .Bl -tag -width ".Dv modify_set" .It Short Long .It r read_data .It w write_data .It x execute .It p append_data .It D delete_child .It d delete .It a read_attributes .It A write_attributes .It R read_xattr .It W write_xattr .It c read_acl .It C write_acl .It o write_owner .It s synchronize .El .Pp In addition, the following permission sets may be used: .Bl -tag -width ".Dv modify_set" .It Set Permissions .It full_set all permissions, as shown above .It modify_set all permissions except write_acl and write_owner .It read_set read_data, read_attributes, read_xattr and read_acl .It write_set write_data, append_data, write_attributes and write_xattr .El .It Ar "ACL inheritance flags" Inheritance flags may be specified in either short or long form. Short and long forms may not be mixed. Access flags in long form are separated by the .Ql / character; in short form, they are concatenated together. 
Valid inheritance flags are: .Bl -tag -width ".Dv short" .It Short Long .It f file_inherit .It d dir_inherit .It i inherit_only .It n no_propagate .It I inherited .El .Pp Other than the "inherited" flag, inheritance flags may be only set on directories. .It Ar "ACL type" The ACL type field is either .Dq Li allow or .Dq Li deny . .El .Pp ACL entries applied from a file using the .Fl M or .Fl X options shall be of the following form: one ACL entry per line, as previously specified; whitespace is ignored; any text after a .Ql # is ignored (comments). .Pp NFSv4 ACL entries are evaluated in their visible order. .Pp Multiple ACL entries specified on the command line are separated by commas. .Pp Note that the file owner is always granted the read_acl, write_acl, read_attributes, and write_attributes permissions, even if the ACL would deny it. .Sh EXIT STATUS .Ex -std .Sh EXAMPLES .Dl setfacl -d -m u::rwx,g::rx,o::rx,mask::rwx dir .Dl setfacl -d -m g:admins:rwx dir .Pp The first command sets the mandatory elements of the POSIX.1e default ACL. The second command specifies that users in group admins can have read, write, and execute permissions for directory named "dir". It should be noted that any files or directories created underneath "dir" will inherit these default ACLs upon creation. .Pp .Dl setfacl -m u::rwx,g:mail:rw file .Pp Sets read, write, and execute permissions for the .Pa file owner's POSIX.1e ACL entry and read and write permissions for group mail on .Pa file . .Pp .Dl setfacl -m owner@:rwxp::allow,g:mail:rwp::allow file .Pp Semantically equal to the example above, but for NFSv4 ACL. .Pp .Dl setfacl -M file1 file2 .Pp Sets/updates the ACL entries contained in .Pa file1 on .Pa file2 . .Pp .Dl setfacl -x g:mail:rw file .Pp Remove the group mail POSIX.1e ACL entry containing read/write permissions from .Pa file . .Pp .Dl setfacl -x0 file .Pp Remove the first entry from the NFSv4 ACL from .Pa file . .Pp .Dl setfacl -bn file .Pp Remove all .Dq Li access ACL entries except for the three required from .Pa file . .Pp .Dl getfacl file1 | setfacl -b -n -M - file2 .Pp Copy ACL entries from .Pa file1 to .Pa file2 . .Sh SEE ALSO .Xr getfacl 1 , .Xr acl 3 , .Xr getextattr 8 , .Xr setextattr 8 , .Xr acl 9 , .Xr extattr 9 .Sh STANDARDS The .Nm utility is expected to be .Tn IEEE Std 1003.2c compliant. .Sh HISTORY Extended Attribute and Access Control List support was developed as part of the .Tn TrustedBSD Project and introduced in .Fx 5.0 . NFSv4 ACL support was introduced in .Fx 8.1 . .Sh AUTHORS .An -nosplit The .Nm utility was written by .An Chris D. Faulhaber Aq Mt jedgar@fxp.org . NFSv4 ACL support was implemented by .An Edward Tomasz Napierala Aq Mt trasz@FreeBSD.org . Index: projects/clang380-import/bin/sh/cd.c =================================================================== --- projects/clang380-import/bin/sh/cd.c (revision 294776) +++ projects/clang380-import/bin/sh/cd.c (revision 294777) @@ -1,421 +1,421 @@ /*- * Copyright (c) 1991, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Kenneth Almquist. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #ifndef lint #if 0 static char sccsid[] = "@(#)cd.c 8.2 (Berkeley) 5/4/95"; #endif #endif /* not lint */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include /* * The cd and pwd commands. */ #include "shell.h" #include "var.h" #include "nodes.h" /* for jobs.h */ #include "jobs.h" #include "options.h" #include "output.h" #include "memalloc.h" #include "error.h" #include "exec.h" #include "redir.h" #include "mystring.h" #include "show.h" #include "cd.h" #include "builtins.h" static int cdlogical(char *); static int cdphysical(char *); static int docd(char *, int, int); -static char *getcomponent(void); +static char *getcomponent(char **); static char *findcwd(char *); static void updatepwd(char *); static char *getpwd(void); static char *getpwd2(void); static char *curdir = NULL; /* current working directory */ -static char *prevdir; /* previous working directory */ -static char *cdcomppath; int cdcmd(int argc __unused, char **argv __unused) { const char *dest; const char *path; char *p; struct stat statb; int ch, phys, print = 0, getcwderr = 0; int rc; int errno1 = ENOENT; phys = Pflag; while ((ch = nextopt("eLP")) != '\0') { switch (ch) { case 'e': getcwderr = 1; break; case 'L': phys = 0; break; case 'P': phys = 1; break; } } if (*argptr != NULL && argptr[1] != NULL) error("too many arguments"); if ((dest = *argptr) == NULL && (dest = bltinlookup("HOME", 1)) == NULL) error("HOME not set"); if (*dest == '\0') dest = "."; if (dest[0] == '-' && dest[1] == '\0') { - dest = prevdir ? prevdir : curdir; - if (dest) - print = 1; - else - dest = "."; + dest = bltinlookup("OLDPWD", 1); + if (dest == NULL) + error("OLDPWD not set"); + print = 1; } if (dest[0] == '/' || (dest[0] == '.' && (dest[1] == '/' || dest[1] == '\0')) || (dest[0] == '.' && dest[1] == '.' && (dest[2] == '/' || dest[2] == '\0')) || (path = bltinlookup("CDPATH", 1)) == NULL) path = ""; while ((p = padvance(&path, dest)) != NULL) { if (stat(p, &statb) < 0) { if (errno != ENOENT) errno1 = errno; } else if (!S_ISDIR(statb.st_mode)) errno1 = ENOTDIR; else { if (!print) { /* * XXX - rethink */ if (p[0] == '.' && p[1] == '/' && p[2] != '\0') print = strcmp(p + 2, dest); else print = strcmp(p, dest); } rc = docd(p, print, phys); if (rc >= 0) return getcwderr ? 
rc : 0; if (errno != ENOENT) errno1 = errno; } } error("%s: %s", dest, strerror(errno1)); /*NOTREACHED*/ return 0; } /* * Actually change the directory. In an interactive shell, print the * directory name if "print" is nonzero. */ static int docd(char *dest, int print, int phys) { int rc; TRACE(("docd(\"%s\", %d, %d) called\n", dest, print, phys)); /* If logical cd fails, fall back to physical. */ if ((phys || (rc = cdlogical(dest)) < 0) && (rc = cdphysical(dest)) < 0) return (-1); if (print && iflag && curdir) out1fmt("%s\n", curdir); return (rc); } static int cdlogical(char *dest) { char *p; char *q; char *component; + char *path; struct stat statb; int first; int badstat; /* * Check each component of the path. If we find a symlink or * something we can't stat, clear curdir to force a getcwd() * next time we get the value of the current directory. */ badstat = 0; - cdcomppath = stsavestr(dest); + path = stsavestr(dest); STARTSTACKSTR(p); if (*dest == '/') { STPUTC('/', p); - cdcomppath++; + path++; } first = 1; - while ((q = getcomponent()) != NULL) { + while ((q = getcomponent(&path)) != NULL) { if (q[0] == '\0' || (q[0] == '.' && q[1] == '\0')) continue; if (! first) STPUTC('/', p); first = 0; component = q; STPUTS(q, p); if (equal(component, "..")) continue; STACKSTRNUL(p); if (lstat(stackblock(), &statb) < 0) { badstat = 1; break; } } INTOFF; if ((p = findcwd(badstat ? NULL : dest)) == NULL || chdir(p) < 0) { INTON; return (-1); } updatepwd(p); INTON; return (0); } static int cdphysical(char *dest) { char *p; int rc = 0; INTOFF; if (chdir(dest) < 0) { INTON; return (-1); } p = findcwd(NULL); if (p == NULL) { warning("warning: failed to get name of current directory"); rc = 1; } updatepwd(p); INTON; return (rc); } /* - * Get the next component of the path name pointed to by cdcomppath. - * This routine overwrites the string pointed to by cdcomppath. + * Get the next component of the path name pointed to by *path. + * This routine overwrites *path and the string pointed to by it. */ static char * -getcomponent(void) +getcomponent(char **path) { char *p; char *start; - if ((p = cdcomppath) == NULL) + if ((p = *path) == NULL) return NULL; - start = cdcomppath; + start = *path; while (*p != '/' && *p != '\0') p++; if (*p == '\0') { - cdcomppath = NULL; + *path = NULL; } else { *p++ = '\0'; - cdcomppath = p; + *path = p; } return start; } static char * findcwd(char *dir) { char *new; char *p; + char *path; /* * If our argument is NULL, we don't know the current directory * any more because we traversed a symbolic link or something * we couldn't stat(). */ if (dir == NULL || curdir == NULL) return getpwd2(); - cdcomppath = stsavestr(dir); + path = stsavestr(dir); STARTSTACKSTR(new); if (*dir != '/') { STPUTS(curdir, new); if (STTOPC(new) == '/') STUNPUTC(new); } - while ((p = getcomponent()) != NULL) { + while ((p = getcomponent(&path)) != NULL) { if (equal(p, "..")) { while (new > stackblock() && (STUNPUTC(new), *new) != '/'); } else if (*p != '\0' && ! equal(p, ".")) { STPUTC('/', new); STPUTS(p, new); } } if (new == stackblock()) STPUTC('/', new); STACKSTRNUL(new); return stackblock(); } /* * Update curdir (the name of the current directory) in response to a * cd command. We also call hashcd to let the routines in exec.c know * that the current directory has changed. 
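 * We also export PWD and OLDPWD from here; the "cd -" form of the cd
 * builtin looks the previous directory up via OLDPWD.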
*/ static void updatepwd(char *dir) { + char *prevdir; + hashcd(); /* update command hash table */ - if (prevdir) - ckfree(prevdir); + setvar("PWD", dir, VEXPORT); + setvar("OLDPWD", curdir, VEXPORT); prevdir = curdir; curdir = dir ? savestr(dir) : NULL; - setvar("PWD", curdir, VEXPORT); - setvar("OLDPWD", prevdir, VEXPORT); + ckfree(prevdir); } int pwdcmd(int argc __unused, char **argv __unused) { char *p; int ch, phys; phys = Pflag; while ((ch = nextopt("LP")) != '\0') { switch (ch) { case 'L': phys = 0; break; case 'P': phys = 1; break; } } if (*argptr != NULL) error("too many arguments"); if (!phys && getpwd()) { out1str(curdir); out1c('\n'); } else { if ((p = getpwd2()) == NULL) error(".: %s", strerror(errno)); out1str(p); out1c('\n'); } return 0; } /* * Get the current directory and cache the result in curdir. */ static char * getpwd(void) { char *p; if (curdir) return curdir; p = getpwd2(); if (p != NULL) curdir = savestr(p); return curdir; } #define MAXPWD 256 /* * Return the current directory. */ static char * getpwd2(void) { char *pwd; int i; for (i = MAXPWD;; i *= 2) { pwd = stalloc(i); if (getcwd(pwd, i) != NULL) return pwd; stunalloc(pwd); if (errno != ERANGE) break; } return NULL; } /* * Initialize PWD in a new shell. * If the shell is interactive, we need to warn if this fails. */ void pwd_init(int warn) { char *pwd; struct stat stdot, stpwd; pwd = lookupvar("PWD"); if (pwd && *pwd == '/' && stat(".", &stdot) != -1 && stat(pwd, &stpwd) != -1 && stdot.st_dev == stpwd.st_dev && stdot.st_ino == stpwd.st_ino) { if (curdir) ckfree(curdir); curdir = savestr(pwd); } if (getpwd() == NULL && warn) out2fmt_flush("sh: cannot determine working directory\n"); setvar("PWD", curdir, VEXPORT); } Index: projects/clang380-import/bin/sh/expand.c =================================================================== --- projects/clang380-import/bin/sh/expand.c (revision 294776) +++ projects/clang380-import/bin/sh/expand.c (revision 294777) @@ -1,1539 +1,1545 @@ /*- * Copyright (c) 1991, 1993 * The Regents of the University of California. All rights reserved. * Copyright (c) 1997-2005 * Herbert Xu . All rights reserved. * Copyright (c) 2010-2015 * Jilles Tjoelker . All rights reserved. * * This code is derived from software contributed to Berkeley by * Kenneth Almquist. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #ifndef lint #if 0 static char sccsid[] = "@(#)expand.c 8.5 (Berkeley) 5/15/95"; #endif #endif /* not lint */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* * Routines to expand arguments to commands. We have to deal with * backquotes, shell variables, and file metacharacters. */ #include "shell.h" #include "main.h" #include "nodes.h" #include "eval.h" #include "expand.h" #include "syntax.h" #include "parser.h" #include "jobs.h" #include "options.h" #include "var.h" #include "input.h" #include "output.h" #include "memalloc.h" #include "error.h" #include "mystring.h" #include "arith.h" #include "show.h" #include "builtins.h" enum wordstate { WORD_IDLE, WORD_WS_DELIMITED, WORD_QUOTEMARK }; struct worddest { struct arglist *list; enum wordstate state; }; static char *expdest; /* output of current string */ static struct nodelist *argbackq; /* list of back quote expressions */ -static char *argstr(char *, int, struct worddest *); -static char *exptilde(char *, int); -static char *expari(char *, int, struct worddest *); +static const char *argstr(const char *, int, struct worddest *); +static const char *exptilde(const char *, int); +static const char *expari(const char *, int, struct worddest *); static void expbackq(union node *, int, int, struct worddest *); -static void subevalvar_trim(char *, int, int, int); -static int subevalvar_misc(char *, const char *, int, int, int); -static char *evalvar(char *, int, struct worddest *); +static void subevalvar_trim(const char *, int, int, int); +static int subevalvar_misc(const char *, const char *, int, int, int); +static const char *evalvar(const char *, int, struct worddest *); static int varisset(const char *, int); static void strtodest(const char *, int, int, int, struct worddest *); static void reprocess(int, int, int, int, struct worddest *); static void varvalue(const char *, int, int, int, struct worddest *); static void expandmeta(char *, struct arglist *); static void expmeta(char *, char *, struct arglist *); static int expsortcmp(const void *, const void *); static int patmatch(const char *, const char *); static void cvtnum(int, char *); static int collate_range_cmp(wchar_t, wchar_t); void emptyarglist(struct arglist *list) { list->args = list->smallarg; list->count = 0; list->capacity = sizeof(list->smallarg) / sizeof(list->smallarg[0]); } void appendarglist(struct arglist *list, char *str) { char **newargs; int newcapacity; if (list->count >= list->capacity) { newcapacity = list->capacity * 2; if (newcapacity < 16) newcapacity = 16; if (newcapacity > INT_MAX / (int)sizeof(newargs[0])) error("Too many entries in arglist"); newargs = stalloc(newcapacity * sizeof(newargs[0])); memcpy(newargs, list->args, list->count * sizeof(newargs[0])); list->args = newargs; list->capacity = newcapacity; } list->args[list->count++] = str; } static int collate_range_cmp(wchar_t c1, wchar_t c2) { static wchar_t s1[2], s2[2]; 
s1[0] = c1; s2[0] = c2; return (wcscoll(s1, s2)); } static char * stputs_quotes(const char *data, const char *syntax, char *p) { while (*data) { CHECKSTRSPACE(2, p); if (syntax[(int)*data] == CCTL) USTPUTC(CTLESC, p); USTPUTC(*data++, p); } return (p); } #define STPUTS_QUOTES(data, syntax, p) p = stputs_quotes((data), syntax, p) static char * nextword(char c, int flag, char *p, struct worddest *dst) { int is_ws; is_ws = c == '\t' || c == '\n' || c == ' '; if (p != stackblock() || (is_ws ? dst->state == WORD_QUOTEMARK : dst->state != WORD_WS_DELIMITED) || c == '\0') { STPUTC('\0', p); if (flag & EXP_GLOB) expandmeta(grabstackstr(p), dst->list); else appendarglist(dst->list, grabstackstr(p)); dst->state = is_ws ? WORD_WS_DELIMITED : WORD_IDLE; } else if (!is_ws && dst->state == WORD_WS_DELIMITED) dst->state = WORD_IDLE; /* Reserve space while the stack string is empty. */ appendarglist(dst->list, NULL); dst->list->count--; STARTSTACKSTR(p); return p; } #define NEXTWORD(c, flag, p, dstlist) p = nextword(c, flag, p, dstlist) static char * stputs_split(const char *data, const char *syntax, int flag, char *p, struct worddest *dst) { const char *ifs; char c; ifs = ifsset() ? ifsval() : " \t\n"; while (*data) { CHECKSTRSPACE(2, p); c = *data++; if (strchr(ifs, c) != NULL) { NEXTWORD(c, flag, p, dst); continue; } if (flag & EXP_GLOB && syntax[(int)c] == CCTL) USTPUTC(CTLESC, p); USTPUTC(c, p); } return (p); } #define STPUTS_SPLIT(data, syntax, flag, p, dst) p = stputs_split((data), syntax, flag, p, dst) /* * Perform expansions on an argument, placing the resulting list of arguments * in arglist. Parameter expansion, command substitution and arithmetic * expansion are always performed; additional expansions can be requested * via flag (EXP_*). * The result is left in the stack string. * When arglist is NULL, perform here document expansion. * * Caution: this function uses global state and is not reentrant. * However, a new invocation after an interrupted invocation is safe * and will reset the global state for the new call. */ void expandarg(union node *arg, struct arglist *arglist, int flag) { struct worddest exparg; if (fflag) flag &= ~EXP_GLOB; argbackq = arg->narg.backquote; exparg.list = arglist; exparg.state = WORD_IDLE; STARTSTACKSTR(expdest); argstr(arg->narg.text, flag, &exparg); if (arglist == NULL) { STACKSTRNUL(expdest); return; /* here document expanded */ } if ((flag & EXP_SPLIT) == 0 || expdest != stackblock() || exparg.state == WORD_QUOTEMARK) { STPUTC('\0', expdest); if (flag & EXP_SPLIT) { if (flag & EXP_GLOB) expandmeta(grabstackstr(expdest), exparg.list); else appendarglist(exparg.list, grabstackstr(expdest)); } } if ((flag & EXP_SPLIT) == 0) appendarglist(arglist, grabstackstr(expdest)); } /* * Perform parameter expansion, command substitution and arithmetic * expansion, and tilde expansion if requested via EXP_TILDE/EXP_VARTILDE. * Processing ends at a CTLENDVAR or CTLENDARI character as well as '\0'. * This is used to expand word in ${var+word} etc. * If EXP_GLOB or EXP_CASE are set, keep and/or generate CTLESC * characters to allow for further processing. * * If EXP_SPLIT is set, dst receives any complete words produced. 
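 * For example, expanding "$@" with EXP_SPLIT set hands dst one complete
 * word per positional parameter.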
*/ -static char * -argstr(char *p, int flag, struct worddest *dst) +static const char * +argstr(const char *p, int flag, struct worddest *dst) { char c; int quotes = flag & (EXP_GLOB | EXP_CASE); /* do CTLESC */ int firsteq = 1; int split_lit; int lit_quoted; split_lit = flag & EXP_SPLIT_LIT; lit_quoted = flag & EXP_LIT_QUOTED; flag &= ~(EXP_SPLIT_LIT | EXP_LIT_QUOTED); if (*p == '~' && (flag & (EXP_TILDE | EXP_VARTILDE))) p = exptilde(p, flag); for (;;) { CHECKSTRSPACE(2, expdest); switch (c = *p++) { case '\0': return (p - 1); case CTLENDVAR: case CTLENDARI: return (p); case CTLQUOTEMARK: lit_quoted = 1; /* "$@" syntax adherence hack */ if (p[0] == CTLVAR && (p[1] & VSQUOTE) != 0 && p[2] == '@' && p[3] == '=') break; if ((flag & EXP_SPLIT) != 0 && expdest == stackblock()) dst->state = WORD_QUOTEMARK; break; case CTLQUOTEEND: lit_quoted = 0; break; case CTLESC: c = *p++; if (split_lit && !lit_quoted && strchr(ifsset() ? ifsval() : " \t\n", c) != NULL) { NEXTWORD(c, flag, expdest, dst); break; } if (quotes) USTPUTC(CTLESC, expdest); USTPUTC(c, expdest); break; case CTLVAR: p = evalvar(p, flag, dst); break; case CTLBACKQ: case CTLBACKQ|CTLQUOTE: expbackq(argbackq->n, c & CTLQUOTE, flag, dst); argbackq = argbackq->next; break; case CTLARI: p = expari(p, flag, dst); break; case ':': case '=': /* * sort of a hack - expand tildes in variable * assignments (after the first '=' and after ':'s). */ if (split_lit && !lit_quoted && strchr(ifsset() ? ifsval() : " \t\n", c) != NULL) { NEXTWORD(c, flag, expdest, dst); break; } USTPUTC(c, expdest); if (flag & EXP_VARTILDE && *p == '~' && (c != '=' || firsteq)) { if (c == '=') firsteq = 0; p = exptilde(p, flag); } break; default: if (split_lit && !lit_quoted && strchr(ifsset() ? ifsval() : " \t\n", c) != NULL) { NEXTWORD(c, flag, expdest, dst); break; } USTPUTC(c, expdest); } } } /* * Perform tilde expansion, placing the result in the stack string and * returning the next position in the input string to process. */ -static char * -exptilde(char *p, int flag) +static const char * +exptilde(const char *p, int flag) { - char c, *startp = p; + char c; + const char *startp = p; + const char *user; struct passwd *pw; char *home; + int len; for (;;) { c = *p; switch(c) { case CTLESC: /* This means CTL* are always considered quoted. */ case CTLVAR: case CTLBACKQ: case CTLBACKQ | CTLQUOTE: case CTLARI: case CTLENDARI: case CTLQUOTEMARK: return (startp); case ':': if ((flag & EXP_VARTILDE) == 0) break; /* FALLTHROUGH */ case '\0': case '/': case CTLENDVAR: - *p = '\0'; - if (*(startp+1) == '\0') { + len = p - startp - 1; + STPUTBIN(startp + 1, len, expdest); + STACKSTRNUL(expdest); + user = expdest - len; + if (*user == '\0') { home = lookupvar("HOME"); } else { - pw = getpwnam(startp+1); + pw = getpwnam(user); home = pw != NULL ? pw->pw_dir : NULL; } - *p = c; + STADJUST(-len, expdest); if (home == NULL || *home == '\0') return (startp); strtodest(home, flag, VSNORMAL, 1, NULL); return (p); } p++; } } /* * Expand arithmetic expression. 
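 * For example, the word $((2 + 3)) arrives here as the text between
 * CTLARI and CTLENDARI and leaves the digits "5" in the stack string.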
*/ -static char * -expari(char *p, int flag, struct worddest *dst) +static const char * +expari(const char *p, int flag, struct worddest *dst) { char *q, *start; arith_t result; int begoff; int quoted; int adj; quoted = *p++ == '"'; begoff = expdest - stackblock(); p = argstr(p, 0, NULL); STPUTC('\0', expdest); start = stackblock() + begoff; q = grabstackstr(expdest); result = arith(start); ungrabstackstr(q, expdest); start = stackblock() + begoff; adj = start - expdest; STADJUST(adj, expdest); CHECKSTRSPACE((int)(DIGITS(result) + 1), expdest); fmtstr(expdest, DIGITS(result), ARITH_FORMAT_STR, result); adj = strlen(expdest); STADJUST(adj, expdest); if (!quoted) reprocess(expdest - adj - stackblock(), flag, VSNORMAL, 0, dst); return p; } /* * Perform command substitution. */ static void expbackq(union node *cmd, int quoted, int flag, struct worddest *dst) { struct backcmd in; int i; char buf[128]; char *p; char *dest = expdest; struct nodelist *saveargbackq; char lastc; char const *syntax = quoted? DQSYNTAX : BASESYNTAX; int quotes = flag & (EXP_GLOB | EXP_CASE); size_t nnl; const char *ifs; INTOFF; saveargbackq = argbackq; p = grabstackstr(dest); evalbackcmd(cmd, &in); ungrabstackstr(p, dest); argbackq = saveargbackq; p = in.buf; lastc = '\0'; nnl = 0; if (!quoted && flag & EXP_SPLIT) ifs = ifsset() ? ifsval() : " \t\n"; else ifs = ""; /* Don't copy trailing newlines */ for (;;) { if (--in.nleft < 0) { if (in.fd < 0) break; while ((i = read(in.fd, buf, sizeof buf)) < 0 && errno == EINTR); TRACE(("expbackq: read returns %d\n", i)); if (i <= 0) break; p = buf; in.nleft = i - 1; } lastc = *p++; if (lastc == '\0') continue; if (lastc == '\n') { nnl++; } else { if (nnl > 0) { if (strchr(ifs, '\n') != NULL) { NEXTWORD('\n', flag, dest, dst); nnl = 0; } else { CHECKSTRSPACE(nnl + 2, dest); while (nnl > 0) { nnl--; USTPUTC('\n', dest); } } } if (strchr(ifs, lastc) != NULL) NEXTWORD(lastc, flag, dest, dst); else { CHECKSTRSPACE(2, dest); if (quotes && syntax[(int)lastc] == CCTL) USTPUTC(CTLESC, dest); USTPUTC(lastc, dest); } } } if (in.fd >= 0) close(in.fd); if (in.buf) ckfree(in.buf); if (in.jp) exitstatus = waitforjob(in.jp, (int *)NULL); TRACE(("expbackq: size=%td: \"%.*s\"\n", ((dest - stackblock()) - startloc), (int)((dest - stackblock()) - startloc), stackblock() + startloc)); expdest = dest; INTON; } static void recordleft(const char *str, const char *loc, char *startp) { int amount; amount = ((str - 1) - (loc - startp)) - expdest; STADJUST(amount, expdest); while (loc != str - 1) *startp++ = *loc++; } static void -subevalvar_trim(char *p, int strloc, int subtype, int startloc) +subevalvar_trim(const char *p, int strloc, int subtype, int startloc) { char *startp; char *loc = NULL; char *str; int c = 0; struct nodelist *saveargbackq = argbackq; int amount; argstr(p, EXP_CASE | EXP_TILDE, NULL); STACKSTRNUL(expdest); argbackq = saveargbackq; startp = stackblock() + startloc; str = stackblock() + strloc; switch (subtype) { case VSTRIMLEFT: for (loc = startp; loc < str; loc++) { c = *loc; *loc = '\0'; if (patmatch(str, startp)) { *loc = c; recordleft(str, loc, startp); return; } *loc = c; } break; case VSTRIMLEFTMAX: for (loc = str - 1; loc >= startp;) { c = *loc; *loc = '\0'; if (patmatch(str, startp)) { *loc = c; recordleft(str, loc, startp); return; } *loc = c; loc--; } break; case VSTRIMRIGHT: for (loc = str - 1; loc >= startp;) { if (patmatch(str, loc)) { amount = loc - expdest; STADJUST(amount, expdest); return; } loc--; } break; case VSTRIMRIGHTMAX: for (loc = startp; loc < str - 1; 
loc++) { if (patmatch(str, loc)) { amount = loc - expdest; STADJUST(amount, expdest); return; } } break; default: abort(); } amount = (expdest - stackblock() - strloc) + 1; STADJUST(-amount, expdest); } static int -subevalvar_misc(char *p, const char *var, int subtype, int startloc, +subevalvar_misc(const char *p, const char *var, int subtype, int startloc, int varflags) { char *startp; struct nodelist *saveargbackq = argbackq; int amount; argstr(p, EXP_TILDE, NULL); STACKSTRNUL(expdest); argbackq = saveargbackq; startp = stackblock() + startloc; switch (subtype) { case VSASSIGN: setvar(var, startp, 0); amount = startp - expdest; STADJUST(amount, expdest); return 1; case VSQUESTION: if (*p != CTLENDVAR) { outfmt(out2, "%s\n", startp); error((char *)NULL); } error("%.*s: parameter %snot set", (int)(p - var - 1), var, (varflags & VSNUL) ? "null or " : ""); return 0; default: abort(); } } /* * Expand a variable, and return a pointer to the next character in the * input string. */ -static char * -evalvar(char *p, int flag, struct worddest *dst) +static const char * +evalvar(const char *p, int flag, struct worddest *dst) { int subtype; int varflags; - char *var; + const char *var; const char *val; int patloc; int c; int set; int special; int startloc; int varlen; int varlenb; char buf[21]; varflags = (unsigned char)*p++; subtype = varflags & VSTYPE; var = p; special = 0; if (! is_name(*p)) special = 1; p = strchr(p, '=') + 1; again: /* jump here after setting a variable with ${var=text} */ if (varflags & VSLINENO) { set = 1; special = 1; val = NULL; } else if (special) { set = varisset(var, varflags & VSNUL); val = NULL; } else { val = bltinlookup(var, 1); if (val == NULL || ((varflags & VSNUL) && val[0] == '\0')) { val = NULL; set = 0; } else set = 1; } varlen = 0; startloc = expdest - stackblock(); if (!set && uflag && *var != '@' && *var != '*') { switch (subtype) { case VSNORMAL: case VSTRIMLEFT: case VSTRIMLEFTMAX: case VSTRIMRIGHT: case VSTRIMRIGHTMAX: case VSLENGTH: error("%.*s: parameter not set", (int)(p - var - 1), var); } } if (set && subtype != VSPLUS) { /* insert the value of the variable */ if (special) { if (varflags & VSLINENO) { if (p - var > (ptrdiff_t)sizeof(buf)) abort(); memcpy(buf, var, p - var - 1); buf[p - var - 1] = '\0'; strtodest(buf, flag, subtype, varflags & VSQUOTE, dst); } else varvalue(var, varflags & VSQUOTE, subtype, flag, dst); if (subtype == VSLENGTH) { varlenb = expdest - stackblock() - startloc; varlen = varlenb; if (localeisutf8) { val = stackblock() + startloc; for (;val != expdest; val++) if ((*val & 0xC0) == 0x80) varlen--; } STADJUST(-varlenb, expdest); } } else { if (subtype == VSLENGTH) { for (;*val; val++) if (!localeisutf8 || (*val & 0xC0) != 0x80) varlen++; } else strtodest(val, flag, subtype, varflags & VSQUOTE, dst); } } if (subtype == VSPLUS) set = ! set; switch (subtype) { case VSLENGTH: cvtnum(varlen, buf); strtodest(buf, flag, VSNORMAL, varflags & VSQUOTE, dst); break; case VSNORMAL: break; case VSPLUS: case VSMINUS: if (!set) { argstr(p, flag | (flag & EXP_SPLIT ? EXP_SPLIT_LIT : 0) | (varflags & VSQUOTE ? 
EXP_LIT_QUOTED : 0), dst); break; } break; case VSTRIMLEFT: case VSTRIMLEFTMAX: case VSTRIMRIGHT: case VSTRIMRIGHTMAX: if (!set) break; /* * Terminate the string and start recording the pattern * right after it */ STPUTC('\0', expdest); patloc = expdest - stackblock(); subevalvar_trim(p, patloc, subtype, startloc); reprocess(startloc, flag, VSNORMAL, varflags & VSQUOTE, dst); if (flag & EXP_SPLIT && *var == '@' && varflags & VSQUOTE) dst->state = WORD_QUOTEMARK; break; case VSASSIGN: case VSQUESTION: if (!set) { if (subevalvar_misc(p, var, subtype, startloc, varflags)) { varflags &= ~VSNUL; goto again; } break; } break; case VSERROR: c = p - var - 1; error("${%.*s%s}: Bad substitution", c, var, (c > 0 && *p != CTLENDVAR) ? "..." : ""); default: abort(); } if (subtype != VSNORMAL) { /* skip to end of alternative */ int nesting = 1; for (;;) { if ((c = *p++) == CTLESC) p++; else if (c == CTLBACKQ || c == (CTLBACKQ|CTLQUOTE)) { if (set) argbackq = argbackq->next; } else if (c == CTLVAR) { if ((*p++ & VSTYPE) != VSNORMAL) nesting++; } else if (c == CTLENDVAR) { if (--nesting == 0) break; } } } return p; } /* * Test whether a specialized variable is set. */ static int varisset(const char *name, int nulok) { if (*name == '!') return backgndpidset(); else if (*name == '@' || *name == '*') { if (*shellparam.p == NULL) return 0; if (nulok) { char **av; for (av = shellparam.p; *av; av++) if (**av != '\0') return 1; return 0; } } else if (is_digit(*name)) { char *ap; long num; errno = 0; num = strtol(name, NULL, 10); if (errno != 0 || num > shellparam.nparam) return 0; if (num == 0) ap = arg0; else ap = shellparam.p[num - 1]; if (nulok && (ap == NULL || *ap == '\0')) return 0; } return 1; } static void strtodest(const char *p, int flag, int subtype, int quoted, struct worddest *dst) { if (subtype == VSLENGTH || subtype == VSTRIMLEFT || subtype == VSTRIMLEFTMAX || subtype == VSTRIMRIGHT || subtype == VSTRIMRIGHTMAX) STPUTS(p, expdest); else if (flag & EXP_SPLIT && !quoted && dst != NULL) STPUTS_SPLIT(p, BASESYNTAX, flag, expdest, dst); else if (flag & (EXP_GLOB | EXP_CASE)) STPUTS_QUOTES(p, quoted ? DQSYNTAX : BASESYNTAX, expdest); else STPUTS(p, expdest); } static void reprocess(int startloc, int flag, int subtype, int quoted, struct worddest *dst) { static char *buf = NULL; static size_t buflen = 0; char *startp; size_t len, zpos, zlen; startp = stackblock() + startloc; len = expdest - startp; if (len >= SIZE_MAX / 2) abort(); INTOFF; if (len >= buflen) { ckfree(buf); buf = NULL; } if (buflen < 128) buflen = 128; while (len >= buflen) buflen <<= 1; if (buf == NULL) buf = ckmalloc(buflen); INTON; memcpy(buf, startp, len); buf[len] = '\0'; STADJUST(-len, expdest); for (zpos = 0;;) { zlen = strlen(buf + zpos); strtodest(buf + zpos, flag, subtype, quoted, dst); zpos += zlen + 1; if (zpos == len + 1) break; if (flag & EXP_SPLIT && (quoted || (zlen > 0 && zpos < len))) NEXTWORD('\0', flag, expdest, dst); } } /* * Add the value of a specialized variable to the stack string. */ static void varvalue(const char *name, int quoted, int subtype, int flag, struct worddest *dst) { int num; char *p; int i; int splitlater; char sep[2]; char **ap; char buf[(NSHORTOPTS > 10 ? 
NSHORTOPTS : 10) + 1]; if (subtype == VSLENGTH) flag &= ~EXP_FULL; splitlater = subtype == VSTRIMLEFT || subtype == VSTRIMLEFTMAX || subtype == VSTRIMRIGHT || subtype == VSTRIMRIGHTMAX; switch (*name) { case '$': num = rootpid; break; case '?': num = oexitstatus; break; case '#': num = shellparam.nparam; break; case '!': num = backgndpidval(); break; case '-': p = buf; for (i = 0 ; i < NSHORTOPTS ; i++) { if (optval[i]) *p++ = optletter[i]; } *p = '\0'; strtodest(buf, flag, subtype, quoted, dst); return; case '@': if (flag & EXP_SPLIT && quoted) { for (ap = shellparam.p ; (p = *ap++) != NULL ; ) { strtodest(p, flag, subtype, quoted, dst); if (*ap) { if (splitlater) STPUTC('\0', expdest); else NEXTWORD('\0', flag, expdest, dst); } } if (shellparam.nparam > 0) dst->state = WORD_QUOTEMARK; return; } /* FALLTHROUGH */ case '*': if (ifsset()) sep[0] = ifsval()[0]; else sep[0] = ' '; sep[1] = '\0'; for (ap = shellparam.p ; (p = *ap++) != NULL ; ) { strtodest(p, flag, subtype, quoted, dst); if (!*ap) break; if (sep[0]) strtodest(sep, flag, subtype, quoted, dst); else if (flag & EXP_SPLIT && !quoted && **ap != '\0') { if (splitlater) STPUTC('\0', expdest); else NEXTWORD('\0', flag, expdest, dst); } } return; default: if (is_digit(*name)) { num = atoi(name); if (num == 0) p = arg0; else if (num > 0 && num <= shellparam.nparam) p = shellparam.p[num - 1]; else return; strtodest(p, flag, subtype, quoted, dst); } return; } cvtnum(num, buf); strtodest(buf, flag, subtype, quoted, dst); } static char expdir[PATH_MAX]; #define expdir_end (expdir + sizeof(expdir)) /* * Perform pathname generation and remove control characters. * At this point, the only control characters should be CTLESC. * The results are stored in the list dstlist. */ static void expandmeta(char *pattern, struct arglist *dstlist) { char *p; int firstmatch; char c; firstmatch = dstlist->count; p = pattern; for (; (c = *p) != '\0'; p++) { /* fast check for meta chars */ if (c == '*' || c == '?' || c == '[') { INTOFF; expmeta(expdir, pattern, dstlist); INTON; break; } } if (dstlist->count == firstmatch) { /* * no matches */ rmescapes(pattern); appendarglist(dstlist, pattern); } else { qsort(&dstlist->args[firstmatch], dstlist->count - firstmatch, sizeof(dstlist->args[0]), expsortcmp); } } /* * Do metacharacter (i.e. *, ?, [...]) expansion. */ static void expmeta(char *enddir, char *name, struct arglist *arglist) { const char *p; const char *q; const char *start; char *endname; int metaflag; struct stat statb; DIR *dirp; struct dirent *dp; int atend; int matchdot; int esc; int namlen; metaflag = 0; start = name; for (p = name; esc = 0, *p; p += esc + 1) { if (*p == '*' || *p == '?') metaflag = 1; else if (*p == '[') { q = p + 1; if (*q == '!' 
|| *q == '^') q++; for (;;) { if (*q == CTLESC) q++; if (*q == '/' || *q == '\0') break; if (*++q == ']') { metaflag = 1; break; } } } else if (*p == '\0') break; else { if (*p == CTLESC) esc++; if (p[esc] == '/') { if (metaflag) break; start = p + esc + 1; } } } if (metaflag == 0) { /* we've reached the end of the file name */ if (enddir != expdir) metaflag++; for (p = name ; ; p++) { if (*p == CTLESC) p++; *enddir++ = *p; if (*p == '\0') break; if (enddir == expdir_end) return; } if (metaflag == 0 || lstat(expdir, &statb) >= 0) appendarglist(arglist, stsavestr(expdir)); return; } endname = name + (p - name); if (start != name) { p = name; while (p < start) { if (*p == CTLESC) p++; *enddir++ = *p++; if (enddir == expdir_end) return; } } if (enddir == expdir) { p = "."; } else if (enddir == expdir + 1 && *expdir == '/') { p = "/"; } else { p = expdir; enddir[-1] = '\0'; } if ((dirp = opendir(p)) == NULL) return; if (enddir != expdir) enddir[-1] = '/'; if (*endname == 0) { atend = 1; } else { atend = 0; *endname = '\0'; endname += esc + 1; } matchdot = 0; p = start; if (*p == CTLESC) p++; if (*p == '.') matchdot++; while (! int_pending() && (dp = readdir(dirp)) != NULL) { if (dp->d_name[0] == '.' && ! matchdot) continue; if (patmatch(start, dp->d_name)) { namlen = dp->d_namlen; if (enddir + namlen + 1 > expdir_end) continue; memcpy(enddir, dp->d_name, namlen + 1); if (atend) appendarglist(arglist, stsavestr(expdir)); else { if (dp->d_type != DT_UNKNOWN && dp->d_type != DT_DIR && dp->d_type != DT_LNK) continue; if (enddir + namlen + 2 > expdir_end) continue; enddir[namlen] = '/'; enddir[namlen + 1] = '\0'; expmeta(enddir + namlen + 1, endname, arglist); } } } closedir(dirp); if (! atend) endname[-esc - 1] = esc ? CTLESC : '/'; } static int expsortcmp(const void *p1, const void *p2) { const char *s1 = *(const char * const *)p1; const char *s2 = *(const char * const *)p2; return (strcmp(s1, s2)); } static wchar_t get_wc(const char **p) { wchar_t c; int chrlen; chrlen = mbtowc(&c, *p, 4); if (chrlen == 0) return 0; else if (chrlen == -1) c = 0; else *p += chrlen; return c; } /* * See if a character matches a character class, starting at the first colon * of "[:class:]". * If a valid character class is recognized, a pointer to the next character * after the final closing bracket is stored into *end, otherwise a null * pointer is stored into *end. */ static int match_charclass(const char *p, wchar_t chr, const char **end) { char name[20]; const char *nameend; wctype_t cclass; *end = NULL; p++; nameend = strstr(p, ":]"); if (nameend == NULL || (size_t)(nameend - p) >= sizeof(name) || nameend == p) return 0; memcpy(name, p, nameend - p); name[nameend - p] = '\0'; *end = nameend + 2; cclass = wctype(name); /* An unknown class matches nothing but is valid nevertheless. */ if (cclass == 0) return 0; return iswctype(chr, cclass); } /* * Returns true if the pattern matches the string. */ static int patmatch(const char *pattern, const char *string) { const char *p, *q, *end; const char *bt_p, *bt_q; char c; wchar_t wc, wc2; p = pattern; q = string; bt_p = NULL; bt_q = NULL; for (;;) { switch (c = *p++) { case '\0': if (*q != '\0') goto backtrack; return 1; case CTLESC: if (*q++ != *p++) goto backtrack; break; case '?': if (*q == '\0') return 0; if (localeisutf8) { wc = get_wc(&q); /* * A '?' does not match invalid UTF-8 but a * '*' does, so backtrack. 
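 * (A lone '?' therefore never consumes an invalid byte, while a '*'
 * can still swallow it during backtracking.)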
*/ if (wc == 0) goto backtrack; } else wc = (unsigned char)*q++; break; case '*': c = *p; while (c == '*') c = *++p; /* * If the pattern ends here, we know the string * matches without needing to look at the rest of it. */ if (c == '\0') return 1; /* * First try the shortest match for the '*' that * could work. We can forget any earlier '*' since * there is no way having it match more characters * can help us, given that we are already here. */ bt_p = p; bt_q = q; break; case '[': { const char *savep, *saveq; int invert, found; wchar_t chr; savep = p, saveq = q; invert = 0; if (*p == '!' || *p == '^') { invert++; p++; } found = 0; if (*q == '\0') return 0; if (localeisutf8) { chr = get_wc(&q); if (chr == 0) goto backtrack; } else chr = (unsigned char)*q++; c = *p++; do { if (c == '\0') { p = savep, q = saveq; c = '['; goto dft; } if (c == '[' && *p == ':') { found |= match_charclass(p, chr, &end); if (end != NULL) p = end; } if (c == CTLESC) c = *p++; if (localeisutf8 && c & 0x80) { p--; wc = get_wc(&p); if (wc == 0) /* bad utf-8 */ return 0; } else wc = (unsigned char)c; if (*p == '-' && p[1] != ']') { p++; if (*p == CTLESC) p++; if (localeisutf8) { wc2 = get_wc(&p); if (wc2 == 0) /* bad utf-8 */ return 0; } else wc2 = (unsigned char)*p++; if ( collate_range_cmp(chr, wc) >= 0 && collate_range_cmp(chr, wc2) <= 0 ) found = 1; } else { if (chr == wc) found = 1; } } while ((c = *p++) != ']'); if (found == invert) goto backtrack; break; } dft: default: if (*q == '\0') return 0; if (*q++ == c) break; backtrack: /* * If we have a mismatch (other than hitting the end * of the string), go back to the last '*' seen and * have it match one additional character. */ if (bt_p == NULL) return 0; if (*bt_q == '\0') return 0; bt_q++; p = bt_p; q = bt_q; break; } } } /* * Remove any CTLESC and CTLQUOTEMARK characters from a string. */ void rmescapes(char *str) { char *p, *q; p = str; while (*p != CTLESC && *p != CTLQUOTEMARK && *p != CTLQUOTEEND) { if (*p++ == '\0') return; } q = p; while (*p) { if (*p == CTLQUOTEMARK || *p == CTLQUOTEEND) { p++; continue; } if (*p == CTLESC) p++; *q++ = *p++; } *q = '\0'; } /* * See if a pattern matches in a case statement. */ int casematch(union node *pattern, const char *val) { struct stackmark smark; int result; char *p; setstackmark(&smark); argbackq = pattern->narg.backquote; STARTSTACKSTR(expdest); argstr(pattern->narg.text, EXP_TILDE | EXP_CASE, NULL); STPUTC('\0', expdest); p = grabstackstr(expdest); result = patmatch(p, val); popstackmark(&smark); return result; } /* * Our own itoa(). */ static void cvtnum(int num, char *buf) { char temp[32]; int neg = num < 0; char *p = temp + 31; temp[31] = '\0'; do { *--p = num % 10 + '0'; } while ((num /= 10) != 0); if (neg) *--p = '-'; memcpy(buf, p, temp + 32 - p); } /* * Do most of the work for wordexp(3). */ int wordexpcmd(int argc, char **argv) { size_t len; int i; out1fmt("%08x", argc - 1); for (i = 1, len = 0; i < argc; i++) len += strlen(argv[i]); out1fmt("%08x", (int)len); for (i = 1; i < argc; i++) outbin(argv[i], strlen(argv[i]) + 1, out1); return (0); } /* * Do most of the work for wordexp(3), new version. 
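 * The reply written to out1 is a one-byte status (' ' on success, 'C'
 * if command substitution was rejected), then "%016x %016zx" with the
 * word count and total byte length, then each expanded word followed
 * by a NUL.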
*/ int freebsd_wordexpcmd(int argc __unused, char **argv __unused) { struct arglist arglist; union node *args, *n; size_t len; int ch; int protected = 0; int fd = -1; int i; while ((ch = nextopt("f:p")) != '\0') { switch (ch) { case 'f': fd = number(shoptarg); break; case 'p': protected = 1; break; } } if (*argptr != NULL) error("wrong number of arguments"); if (fd < 0) error("missing fd"); INTOFF; setinputfd(fd, 1); INTON; args = parsewordexp(); popfile(); /* will also close fd */ if (protected) for (n = args; n != NULL; n = n->narg.next) { if (n->narg.backquote != NULL) { outcslow('C', out1); error("command substitution disabled"); } } outcslow(' ', out1); emptyarglist(&arglist); for (n = args; n != NULL; n = n->narg.next) expandarg(n, &arglist, EXP_FULL | EXP_TILDE); for (i = 0, len = 0; i < arglist.count; i++) len += strlen(arglist.args[i]); out1fmt("%016x %016zx", arglist.count, len); for (i = 0; i < arglist.count; i++) outbin(arglist.args[i], strlen(arglist.args[i]) + 1, out1); return (0); } Index: projects/clang380-import/cddl/lib/Makefile =================================================================== --- projects/clang380-import/cddl/lib/Makefile (revision 294776) +++ projects/clang380-import/cddl/lib/Makefile (revision 294777) @@ -1,41 +1,41 @@ # $FreeBSD$ .include SUBDIR= ${_drti} \ libavl \ libctf \ ${_libdtrace} \ libnvpair \ libumem \ libuutil \ ${_libzfs_core} \ ${_libzfs} \ ${_libzpool} \ ${_tests} .if ${MK_TESTS} != "no" _tests= tests .endif .if ${MK_ZFS} != "no" _libzfs_core= libzfs_core _libzfs= libzfs .if ${MK_LIBTHR} != "no" _libzpool= libzpool .endif .endif -.if ${MACHINE_CPUARCH} != "sparc64" +.if ${MACHINE_CPUARCH} != "sparc64" && ${MACHINE_CPUARCH} != "riscv" _drti= drti _libdtrace= libdtrace .endif SUBDIR_DEPEND_libdtrace= libctf SUBDIR_DEPEND_libzfs_core= libnvpair SUBDIR_DEPEND_libzfs= libavl libnvpair libumem libuutil libzfs_core SUBDIR_DEPEND_libzpool= libavl libnvpair libumem SUBDIR_PARALLEL= .include Index: projects/clang380-import/cddl =================================================================== --- projects/clang380-import/cddl (revision 294776) +++ projects/clang380-import/cddl (revision 294777) Property changes on: projects/clang380-import/cddl ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/cddl:r294090-294776 Index: projects/clang380-import/contrib/elftoolchain/libelf/_libelf_config.h =================================================================== --- projects/clang380-import/contrib/elftoolchain/libelf/_libelf_config.h (revision 294776) +++ projects/clang380-import/contrib/elftoolchain/libelf/_libelf_config.h (revision 294777) @@ -1,183 +1,189 @@ /*- * Copyright (c) 2008-2011 Joseph Koshy * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $Id: _libelf_config.h 3168 2015-02-24 19:17:47Z emaste $ */ #if defined(__APPLE__) || defined(__DragonFly__) #if defined(__amd64__) #define LIBELF_ARCH EM_X86_64 #define LIBELF_BYTEORDER ELFDATA2LSB #define LIBELF_CLASS ELFCLASS64 #elif defined(__i386__) #define LIBELF_ARCH EM_386 #define LIBELF_BYTEORDER ELFDATA2LSB #define LIBELF_CLASS ELFCLASS32 #endif #endif /* __DragonFly__ */ #ifdef __FreeBSD__ /* * Define LIBELF_{ARCH,BYTEORDER,CLASS} based on the machine architecture. * See also: . */ #if defined(__amd64__) #define LIBELF_ARCH EM_X86_64 #define LIBELF_BYTEORDER ELFDATA2LSB #define LIBELF_CLASS ELFCLASS64 #elif defined(__aarch64__) #define LIBELF_ARCH EM_AARCH64 #define LIBELF_BYTEORDER ELFDATA2LSB #define LIBELF_CLASS ELFCLASS64 #elif defined(__arm__) #define LIBELF_ARCH EM_ARM #if defined(__ARMEB__) /* Big-endian ARM. */ #define LIBELF_BYTEORDER ELFDATA2MSB #else #define LIBELF_BYTEORDER ELFDATA2LSB #endif #define LIBELF_CLASS ELFCLASS32 #elif defined(__i386__) #define LIBELF_ARCH EM_386 #define LIBELF_BYTEORDER ELFDATA2LSB #define LIBELF_CLASS ELFCLASS32 #elif defined(__ia64__) #define LIBELF_ARCH EM_IA_64 #define LIBELF_BYTEORDER ELFDATA2LSB #define LIBELF_CLASS ELFCLASS64 #elif defined(__mips__) #define LIBELF_ARCH EM_MIPS #if defined(__MIPSEB__) #define LIBELF_BYTEORDER ELFDATA2MSB #else #define LIBELF_BYTEORDER ELFDATA2LSB #endif #define LIBELF_CLASS ELFCLASS32 #elif defined(__powerpc__) #define LIBELF_ARCH EM_PPC #define LIBELF_BYTEORDER ELFDATA2MSB #define LIBELF_CLASS ELFCLASS32 +#elif defined(__riscv64) + +#define LIBELF_ARCH EM_RISCV +#define LIBELF_BYTEORDER ELFDATA2LSB +#define LIBELF_CLASS ELFCLASS64 + #elif defined(__sparc__) #define LIBELF_ARCH EM_SPARCV9 #define LIBELF_BYTEORDER ELFDATA2MSB #define LIBELF_CLASS ELFCLASS64 #else #error Unknown FreeBSD architecture. #endif #endif /* __FreeBSD__ */ /* * Definitions for Minix3. */ #ifdef __minix #define LIBELF_ARCH EM_386 #define LIBELF_BYTEORDER ELFDATA2LSB #define LIBELF_CLASS ELFCLASS32 #endif /* __minix */ #ifdef __NetBSD__ #include #if !defined(ARCH_ELFSIZE) #error ARCH_ELFSIZE is not defined. #endif #if ARCH_ELFSIZE == 32 #define LIBELF_ARCH ELF32_MACHDEP_ID #define LIBELF_BYTEORDER ELF32_MACHDEP_ENDIANNESS #define LIBELF_CLASS ELFCLASS32 #define Elf_Note Elf32_Nhdr #else #define LIBELF_ARCH ELF64_MACHDEP_ID #define LIBELF_BYTEORDER ELF64_MACHDEP_ENDIANNESS #define LIBELF_CLASS ELFCLASS64 #define Elf_Note Elf64_Nhdr #endif #endif /* __NetBSD__ */ #if defined(__OpenBSD__) #include #define LIBELF_ARCH ELF_TARG_MACH #define LIBELF_BYTEORDER ELF_TARG_DATA #define LIBELF_CLASS ELF_TARG_CLASS #endif /* * GNU & Linux compatibility. * * `__linux__' is defined in an environment runs the Linux kernel and glibc. * `__GNU__' is defined in an environment runs a GNU kernel (Hurd) and glibc. * `__GLIBC__' is defined for an environment that runs glibc over a non-GNU * kernel such as GNU/kFreeBSD. 
*/ #if defined(__linux__) || defined(__GNU__) || defined(__GLIBC__) #if defined(__linux__) #include "native-elf-format.h" #define LIBELF_CLASS ELFTC_CLASS #define LIBELF_ARCH ELFTC_ARCH #define LIBELF_BYTEORDER ELFTC_BYTEORDER #endif /* defined(__linux__) */ #if LIBELF_CLASS == ELFCLASS32 #define Elf_Note Elf32_Nhdr #elif LIBELF_CLASS == ELFCLASS64 #define Elf_Note Elf64_Nhdr #else #error LIBELF_CLASS needs to be one of ELFCLASS32 or ELFCLASS64 #endif #endif /* defined(__linux__) || defined(__GNU__) || defined(__GLIBC__) */ Index: projects/clang380-import/contrib/elftoolchain =================================================================== --- projects/clang380-import/contrib/elftoolchain (revision 294776) +++ projects/clang380-import/contrib/elftoolchain (revision 294777) Property changes on: projects/clang380-import/contrib/elftoolchain ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/contrib/elftoolchain:r292913-294776 Index: projects/clang380-import/contrib/gcc/config/riscv64/freebsd.h =================================================================== --- projects/clang380-import/contrib/gcc/config/riscv64/freebsd.h (nonexistent) +++ projects/clang380-import/contrib/gcc/config/riscv64/freebsd.h (revision 294777) @@ -0,0 +1,6 @@ +/* $FreeBSD$ */ + +#undef INIT_SECTION_ASM_OP +#undef FINI_SECTION_ASM_OP +#define INIT_ARRAY_SECTION_ASM_OP "\t.section\t.init_array,\"aw\",%init_array" +#define FINI_ARRAY_SECTION_ASM_OP "\t.section\t.fini_array,\"aw\",%fini_array" Property changes on: projects/clang380-import/contrib/gcc/config/riscv64/freebsd.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/contrib/gcc/config/riscv64/riscv64.h =================================================================== --- projects/clang380-import/contrib/gcc/config/riscv64/riscv64.h (nonexistent) +++ projects/clang380-import/contrib/gcc/config/riscv64/riscv64.h (revision 294777) @@ -0,0 +1 @@ +/* $FreeBSD$ */ Property changes on: projects/clang380-import/contrib/gcc/config/riscv64/riscv64.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/contrib/gcc =================================================================== --- projects/clang380-import/contrib/gcc (revision 294776) +++ projects/clang380-import/contrib/gcc (revision 294777) Property changes on: projects/clang380-import/contrib/gcc ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/contrib/gcc:r292913-294776 Index: projects/clang380-import/contrib/ofed/librdmacm/examples/build/rping/Makefile =================================================================== --- projects/clang380-import/contrib/ofed/librdmacm/examples/build/rping/Makefile (revision 294776) +++ projects/clang380-import/contrib/ofed/librdmacm/examples/build/rping/Makefile (revision 294777) @@ -1,11 +1,12 @@ # # $FreeBSD$ # .PATH: ${.CURDIR}/../.. 
PROG= rping MAN= SRCS= rping.c -LDADD+= -libverbs -lrdmacm -lpthread -LDADD+= -lmlx4 +LIBADD+= ibverbs rdmacm pthread +LIBADD+= mlx4 +LIBADD+= cxgb4 .include Index: projects/clang380-import/etc/defaults/periodic.conf =================================================================== --- projects/clang380-import/etc/defaults/periodic.conf (revision 294776) +++ projects/clang380-import/etc/defaults/periodic.conf (revision 294777) @@ -1,386 +1,391 @@ #!/bin/sh # # This is defaults/periodic.conf - a file full of useful variables that # you can set to change the default behaviour of periodic jobs on your # system. You should not edit this file! Put any overrides into one of the # $periodic_conf_files instead and you will be able to update these defaults # later without spamming your local configuration information. # # The $periodic_conf_files files should only contain values which override # values set in this file. This eases the upgrade path when defaults # are changed and new features are added. # # For a more detailed explanation of all the periodic.conf variables, please # refer to the periodic.conf(5) manual page. # # $FreeBSD$ # # What files override these defaults ? periodic_conf_files="/etc/periodic.conf /etc/periodic.conf.local" # periodic script dirs local_periodic="/usr/local/etc/periodic" # Daily options # These options are used by periodic(8) itself to determine what to do # with the output of the sub-programs that are run, and where to send # that output. $daily_output might be set to /var/log/daily.log if you # wish to log the daily output and have the files rotated by newsyslog(8) # daily_output="root" # user or /file daily_show_success="YES" # scripts returning 0 daily_show_info="YES" # scripts returning 1 daily_show_badconfig="NO" # scripts returning 2 # 100.clean-disks daily_clean_disks_enable="NO" # Delete files daily daily_clean_disks_files="[#,]* .#* a.out *.core *.CKP .emacs_[0-9]*" daily_clean_disks_days=3 # If older than this daily_clean_disks_verbose="YES" # Mention files deleted # 110.clean-tmps daily_clean_tmps_enable="NO" # Delete stuff daily daily_clean_tmps_dirs="/tmp" # Delete under here daily_clean_tmps_days="3" # If not accessed for daily_clean_tmps_ignore=".X*-lock .X11-unix .ICE-unix .font-unix .XIM-unix" daily_clean_tmps_ignore="$daily_clean_tmps_ignore quota.user quota.group .snap" daily_clean_tmps_ignore="$daily_clean_tmps_ignore .sujournal" # Don't delete these daily_clean_tmps_verbose="YES" # Mention files deleted # 120.clean-preserve daily_clean_preserve_enable="YES" # Delete files daily daily_clean_preserve_days=7 # If not modified for daily_clean_preserve_verbose="YES" # Mention files deleted # 130.clean-msgs daily_clean_msgs_enable="YES" # Delete msgs daily daily_clean_msgs_days= # If not modified for # 140.clean-rwho daily_clean_rwho_enable="YES" # Delete rwho daily daily_clean_rwho_days=7 # If not modified for daily_clean_rwho_verbose="YES" # Mention files deleted # 150.clean-hoststat daily_clean_hoststat_enable="YES" # Purge sendmail host # status cache daily # 200.backup-passwd daily_backup_passwd_enable="YES" # Backup passwd & group # 210.backup-aliases daily_backup_aliases_enable="YES" # Backup mail aliases # 300.calendar daily_calendar_enable="NO" # Run calendar -a # 310.accounting daily_accounting_enable="YES" # Rotate acct files daily_accounting_compress="NO" # Gzip rotated files daily_accounting_flags=-q # Flags to /usr/sbin/sa daily_accounting_save=3 # How many files to save # 330.news daily_news_expire_enable="YES" # Run news.expire # 
400.status-disks daily_status_disks_enable="YES" # Check disk status daily_status_disks_df_flags="-l -h" # df(1) flags for check # 401.status-graid daily_status_graid_enable="NO" # Check graid(8) # 404.status-zfs daily_status_zfs_enable="NO" # Check ZFS daily_status_zfs_zpool_list_enable="YES" # List ZFS pools # 406.status-gmirror daily_status_gmirror_enable="NO" # Check gmirror(8) # 407.status-graid3 daily_status_graid3_enable="NO" # Check graid3(8) # 408.status-gstripe daily_status_gstripe_enable="NO" # Check gstripe(8) # 409.status-gconcat daily_status_gconcat_enable="NO" # Check gconcat(8) # 420.status-network daily_status_network_enable="YES" # Check network status daily_status_network_usedns="YES" # DNS lookups are ok daily_status_network_netstat_flags="-d" # netstat(1) flags # 430.status-uptime daily_status_uptime_enable="YES" # Check system uptime # 440.status-mailq daily_status_mailq_enable="YES" # Check mail status daily_status_mailq_shorten="NO" # Shorten output daily_status_include_submit_mailq="YES" # Also submit queue # 450.status-security daily_status_security_enable="YES" # Security check # See also "Security options" below for more options daily_status_security_inline="NO" # Run inline ? daily_status_security_output="root" # user or /file # 460.status-mail-rejects daily_status_mail_rejects_enable="YES" # Check mail rejects daily_status_mail_rejects_logs=3 # How many logs to check daily_status_mail_rejects_shorten="NO" # Shorten output +# 480.leapfile-ntpd +daily_ntpd_leapfile_enable="NO" # Fetch NTP leapfile +daily_ntpd_avoid_congestion="YES" # Avoid congesting + # leapfile sources + # 480.status-ntpd daily_status_ntpd_enable="NO" # Check NTP status # 500.queuerun daily_queuerun_enable="YES" # Run mail queue daily_submit_queuerun="YES" # Also submit queue # 510.status-world-kernel daily_status_world_kernel="YES" # Check the running # userland/kernel version # 800.scrub-zfs daily_scrub_zfs_enable="NO" daily_scrub_zfs_pools="" # empty string selects all pools daily_scrub_zfs_default_threshold="35" # days between scrubs #daily_scrub_zfs_${poolname}_threshold="35" # pool specific threshold # 999.local daily_local="/etc/daily.local" # Local scripts # Weekly options # These options are used by periodic(8) itself to determine what to do # with the output of the sub-programs that are run, and where to send # that output. $weekly_output might be set to /var/log/weekly.log if you # wish to log the weekly output and have the files rotated by newsyslog(8) # weekly_output="root" # user or /file weekly_show_success="YES" # scripts returning 0 weekly_show_info="YES" # scripts returning 1 weekly_show_badconfig="NO" # scripts returning 2 # 310.locate weekly_locate_enable="YES" # Update locate weekly # 320.whatis weekly_whatis_enable="YES" # Update whatis weekly # 330.catman weekly_catman_enable="NO" # Preformat man pages # 340.noid weekly_noid_enable="NO" # Find unowned files weekly_noid_dirs="/" # Look here # 450.status-security weekly_status_security_enable="YES" # Security check # See also "Security options" above for more options weekly_status_security_inline="NO" # Run inline ? weekly_status_security_output="root" # user or /file # 999.local weekly_local="/etc/weekly.local" # Local scripts # Monthly options # These options are used by periodic(8) itself to determine what to do # with the output of the sub-programs that are run, and where to send # that output. 
$monthly_output might be set to /var/log/monthly.log if you # wish to log the monthly output and have the files rotated by newsyslog(8) # monthly_output="root" # user or /file monthly_show_success="YES" # scripts returning 0 monthly_show_info="YES" # scripts returning 1 monthly_show_badconfig="NO" # scripts returning 2 # 200.accounting monthly_accounting_enable="YES" # Login accounting # 450.status-security monthly_status_security_enable="YES" # Security check # See also "Security options" above for more options monthly_status_security_inline="NO" # Run inline ? monthly_status_security_output="root" # user or /file # 999.local monthly_local="/etc/monthly.local" # Local scripts # Security options # These options are used by the security periodic(8) scripts spawned in # daily and weekly 450.status-security. security_status_logdir="/var/log" # Directory for logs security_status_diff_flags="-b -u" # flags for diff output # Each of the security_status_*_period options below can have one of the # following values: # - NO: do not run at all # - daily: only run during the daily security status # - weekly: only run during the weekly security status # - monthly: only run during the monthly security status # Note that if periodic security scripts are run from crontab(5) directly, # they will be run unless _enable or _period is set to "NO". # 100.chksetuid security_status_chksetuid_enable="YES" security_status_chksetuid_period="daily" # 110.neggrpperm security_status_neggrpperm_enable="YES" security_status_neggrpperm_period="daily" # 200.chkmounts security_status_chkmounts_enable="YES" security_status_chkmounts_period="daily" #security_status_chkmounts_ignore="^amd:" # Don't check matching # FS types security_status_noamd="NO" # Don't check amd mounts # 300.chkuid0 security_status_chkuid0_enable="YES" security_status_chkuid0_period="daily" # 400.passwdless security_status_passwdless_enable="YES" security_status_passwdless_period="daily" # 410.logincheck security_status_logincheck_enable="YES" security_status_logincheck_period="daily" # 500.ipfwdenied security_status_ipfwdenied_enable="YES" security_status_ipfwdenied_period="daily" # 510.ipfdenied security_status_ipfdenied_enable="YES" security_status_ipfdenied_period="daily" # 520.pfdenied security_status_pfdenied_enable="YES" security_status_pfdenied_period="daily" # 550.ipfwlimit security_status_ipfwlimit_enable="YES" security_status_ipfwlimit_period="daily" # 610.ipf6denied security_status_ipf6denied_enable="YES" security_status_ipf6denied_period="daily" # 700.kernelmsg security_status_kernelmsg_enable="YES" security_status_kernelmsg_period="daily" # 800.loginfail security_status_loginfail_enable="YES" security_status_loginfail_period="daily" # 900.tcpwrap security_status_tcpwrap_enable="YES" security_status_tcpwrap_period="daily" # Define source_periodic_confs, the mechanism used by /etc/periodic/*/* # scripts to source defaults/periodic.conf overrides safely. if [ -z "${source_periodic_confs_defined}" ]; then source_periodic_confs_defined=yes # Compatibility with old daily variable names. # They can be removed in stable/11. security_daily_compat_var() { local var=$1 dailyvar value dailyvar=daily_status_security${var#security_status} periodvar=${var%enable}period eval value=\"\$$dailyvar\" [ -z "$value" ] && return echo "Warning: Variable \$$dailyvar is deprecated," \ "use \$$var instead." 
>&2 case "$value" in [Yy][Ee][Ss]) eval $var=YES eval $periodvar=daily ;; *) eval $var=\"$value\" ;; esac } check_yesno_period() { local var="$1" periodvar value period eval value=\"\$$var\" case "$value" in [Yy][Ee][Ss]) ;; *) return 1 ;; esac periodvar=${var%enable}period eval period=\"\$$periodvar\" case "$PERIODIC" in "security daily") case "$period" in [Dd][Aa][Ii][Ll][Yy]) return 0 ;; *) return 1 ;; esac ;; "security weekly") case "$period" in [Ww][Ee][Ee][Kk][Ll][Yy]) return 0 ;; *) return 1 ;; esac ;; "security monthly") case "$period" in [Mm][Oo][Nn][Tt][Hh][Ll][Yy]) return 0 ;; *) return 1 ;; esac ;; security) # Run directly from crontab(5). case "$period" in [Nn][Oo]) return 1 ;; *) return 0 ;; esac ;; '') # Script run manually. return 0 ;; *) echo "ASSERTION FAILED: Unexpected value for" \ "\$PERIODIC: '$PERIODIC'" >&2 exit 127 ;; esac } source_periodic_confs() { local i sourced_files for i in ${periodic_conf_files}; do case ${sourced_files} in *:$i:*) ;; *) sourced_files="${sourced_files}:$i:" [ -r $i ] && . $i ;; esac done } fi Index: projects/clang380-import/etc/defaults/rc.conf =================================================================== --- projects/clang380-import/etc/defaults/rc.conf (revision 294776) +++ projects/clang380-import/etc/defaults/rc.conf (revision 294777) @@ -1,709 +1,718 @@ #!/bin/sh # This is rc.conf - a file full of useful variables that you can set # to change the default startup behavior of your system. You should # not edit this file! Put any overrides into one of the ${rc_conf_files} # instead and you will be able to update these defaults later without # spamming your local configuration information. # # The ${rc_conf_files} files should only contain values which override # values set in this file. This eases the upgrade path when defaults # are changed and new features are added. # # All arguments must be in double or single quotes. # # For a more detailed explanation of all the rc.conf variables, please # refer to the rc.conf(5) manual page. # # $FreeBSD$ ############################################################## ### Important initial Boot-time options #################### ############################################################## #rc_debug="NO" # Set to YES to enable debugging output from rc.d rc_info="NO" # Enables display of informational messages at boot. rc_startmsgs="YES" # Show "Starting foo:" messages at boot rcshutdown_timeout="90" # Seconds to wait before terminating rc.shutdown early_late_divider="FILESYSTEMS" # Script that separates early/late # stages of the boot process. Make sure you know # the ramifications if you change this. # See rc.conf(5) for more details. always_force_depends="NO" # Set to check that indicated dependencies are # running during boot (can increase boot time). apm_enable="NO" # Set to YES to enable APM BIOS functions (or NO). apmd_enable="NO" # Run apmd to handle APM event from userland. apmd_flags="" # Flags to apmd (if enabled). ddb_enable="NO" # Set to YES to load ddb scripts at boot. ddb_config="/etc/ddb.conf" # ddb(8) config file. devd_enable="YES" # Run devd, to trigger programs on device tree changes. devd_flags="" # Additional flags for devd(8). #kld_list="" # Kernel modules to load after local disks are mounted kldxref_enable="NO" # Build linker.hints files with kldxref(8). kldxref_clobber="NO" # Overwrite old linker.hints at boot. kldxref_module_path="" # Override kern.module_path. A ';'-delimited list. powerd_enable="NO" # Run powerd to lower our power usage. 
powerd_flags="" # Flags to powerd (if enabled). tmpmfs="AUTO" # Set to YES to always create an mfs /tmp, NO to never tmpsize="20m" # Size of mfs /tmp if created tmpmfs_flags="-S" # Extra mdmfs options for the mfs /tmp varmfs="AUTO" # Set to YES to always create an mfs /var, NO to never varsize="32m" # Size of mfs /var if created varmfs_flags="-S" # Extra mount options for the mfs /var populate_var="AUTO" # Set to YES to always (re)populate /var, NO to never cleanvar_enable="YES" # Clean the /var directory local_startup="/usr/local/etc/rc.d" # startup script dirs. script_name_sep=" " # Change if your startup scripts' names contain spaces rc_conf_files="/etc/rc.conf /etc/rc.conf.local" # ZFS support zfs_enable="NO" # Set to YES to automatically mount ZFS file systems gptboot_enable="YES" # GPT boot success/failure reporting. # Experimental - test before enabling gbde_autoattach_all="NO" # YES automatically mounts gbde devices from fstab gbde_devices="NO" # Devices to automatically attach (list, or AUTO) gbde_attach_attempts="3" # Number of times to attempt attaching gbde devices gbde_lockdir="/etc" # Where to look for gbde lockfiles # GELI disk encryption configuration. geli_devices="" # List of devices to automatically attach in addition to # GELI devices listed in /etc/fstab. geli_tries="" # Number of times to attempt attaching geli device. # If empty, kern.geom.eli.tries will be used. geli_default_flags="" # Default flags for geli(8). geli_autodetach="YES" # Automatically detach on last close. # Providers are marked as such when all file systems are # mounted. # Example use. #geli_devices="da1 mirror/home" #geli_da1_flags="-p -k /etc/geli/da1.keys" #geli_da1_autodetach="NO" #geli_mirror_home_flags="-k /etc/geli/home.keys" root_rw_mount="YES" # Set to NO to inhibit remounting root read-write. root_hold_delay="30" # Time to wait for root mount hold release. fsck_y_enable="NO" # Set to YES to do fsck -y if the initial preen fails. fsck_y_flags="" # Additional flags for fsck -y background_fsck="YES" # Attempt to run fsck in the background where possible. background_fsck_delay="60" # Time to wait (seconds) before starting the fsck. netfs_types="nfs:NFS smbfs:SMB" # Net filesystems. extra_netfs_types="NO" # List of network extra filesystem types for delayed # mount at startup (or NO). ############################################################## ### Network configuration sub-section ###################### ############################################################## ### Basic network and firewall/security options: ### hostname="" # Set this! hostid_enable="YES" # Set host UUID. hostid_file="/etc/hostid" # File with hostuuid. nisdomainname="NO" # Set to NIS domain if using NIS (or NO). dhclient_program="/sbin/dhclient" # Path to dhcp client program. dhclient_flags="" # Extra flags to pass to dhcp client. #dhclient_flags_fxp0="" # Extra dhclient flags for fxp0 only background_dhclient="NO" # Start dhcp client in the background. #background_dhclient_fxp0="YES" # Start dhcp client on fxp0 in the background. synchronous_dhclient="NO" # Start dhclient directly on configured # interfaces during startup. defaultroute_delay="30" # Time to wait for a default route on a DHCP interface. defaultroute_carrier_delay="5" # Time to wait for carrier while waiting for a default route. netif_enable="YES" # Set to YES to initialize network interfaces netif_ipexpand_max="2048" # Maximum number of IP addrs in a range spec. 
wpa_supplicant_program="/usr/sbin/wpa_supplicant" wpa_supplicant_flags="-s" # Extra flags to pass to wpa_supplicant wpa_supplicant_conf_file="/etc/wpa_supplicant.conf" # firewall_enable="NO" # Set to YES to enable firewall functionality firewall_script="/etc/rc.firewall" # Which script to run to set up the firewall firewall_type="UNKNOWN" # Firewall type (see /etc/rc.firewall) firewall_quiet="NO" # Set to YES to suppress rule display firewall_logging="NO" # Set to YES to enable events logging firewall_logif="NO" # Set to YES to create logging-pseudo interface firewall_flags="" # Flags passed to ipfw when type is a file firewall_coscripts="" # List of executables/scripts to run after # firewall starts/stops firewall_client_net="192.0.2.0/24" # IPv4 Network address for "client" # firewall. #firewall_client_net_ipv6="2001:db8:2:1::/64" # IPv6 network prefix for # "client" firewall. firewall_simple_iif="ed1" # Inside network interface for "simple" # firewall. firewall_simple_inet="192.0.2.16/28" # Inside network address for "simple" # firewall. firewall_simple_oif="ed0" # Outside network interface for "simple" # firewall. firewall_simple_onet="192.0.2.0/28" # Outside network address for "simple" # firewall. #firewall_simple_iif_ipv6="ed1" # Inside IPv6 network interface for "simple" # firewall. #firewall_simple_inet_ipv6="2001:db8:2:800::/56" # Inside IPv6 network prefix # for "simple" firewall. #firewall_simple_oif_ipv6="ed0" # Outside IPv6 network interface for "simple" # firewall. #firewall_simple_onet_ipv6="2001:db8:2:0::/56" # Outside IPv6 network prefix # for "simple" firewall. firewall_myservices="" # List of TCP ports on which this host # offers services for "workstation" firewall. firewall_allowservices="" # List of IPs which have access to # $firewall_myservices for "workstation" # firewall. firewall_trusted="" # List of IPs which have full access to this # host for "workstation" firewall. firewall_logdeny="NO" # Set to YES to log default denied incoming # packets for "workstation" firewall. firewall_nologports="135-139,445 1026,1027 1433,1434" # List of TCP/UDP ports # for which denied incoming packets are not # logged for "workstation" firewall. firewall_nat_enable="NO" # Enable kernel NAT (if firewall_enable == YES) firewall_nat_interface="" # Public interface or IPaddress to use firewall_nat_flags="" # Additional configuration parameters dummynet_enable="NO" # Load the dummynet(4) module ip_portrange_first="NO" # Set first dynamically allocated port ip_portrange_last="NO" # Set last dynamically allocated port ike_enable="NO" # Enable IKE daemon (usually racoon or isakmpd) ike_program="/usr/local/sbin/isakmpd" # Path to IKE daemon ike_flags="" # Additional flags for IKE daemon ipsec_enable="NO" # Set to YES to run setkey on ipsec_file ipsec_file="/etc/ipsec.conf" # Name of config file for setkey natd_program="/sbin/natd" # path to natd, if you want a different one. natd_enable="NO" # Enable natd (if firewall_enable == YES). natd_interface="" # Public interface or IPaddress to use. natd_flags="" # Additional flags for natd. 
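As an illustration of the in-kernel NAT knobs above, a minimal set of rc.conf overrides might look like the following sketch (em0 is a hypothetical outside interface, not part of this change; NAT requires firewall_enable as noted above):

firewall_enable="YES"		# load and start ipfw
firewall_type="open"		# permissive base ruleset
firewall_nat_enable="YES"	# enable in-kernel NAT
firewall_nat_interface="em0"	# hypothetical public interface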
ipfilter_enable="NO" # Set to YES to enable ipfilter functionality ipfilter_program="/sbin/ipf" # where the ipfilter program lives ipfilter_rules="/etc/ipf.rules" # rules definition file for ipfilter, see # /usr/src/contrib/ipfilter/rules for examples ipfilter_flags="" # additional flags for ipfilter ipnat_enable="NO" # Set to YES to enable ipnat functionality ipnat_program="/sbin/ipnat" # where the ipnat program lives ipnat_rules="/etc/ipnat.rules" # rules definition file for ipnat ipnat_flags="" # additional flags for ipnat ipmon_enable="NO" # Set to YES for ipmon; needs ipfilter or ipnat ipmon_program="/sbin/ipmon" # where the ipfilter monitor program lives ipmon_flags="-Ds" # typically "-Ds" or "-D /var/log/ipflog" ipfs_enable="NO" # Set to YES to enable saving and restoring # of state tables at shutdown and boot ipfs_program="/sbin/ipfs" # where the ipfs program lives ipfs_flags="" # additional flags for ipfs pf_enable="NO" # Set to YES to enable packet filter (pf) pf_rules="/etc/pf.conf" # rules definition file for pf pf_program="/sbin/pfctl" # where the pfctl program lives pf_flags="" # additional flags for pfctl pflog_enable="NO" # Set to YES to enable packet filter logging pflog_logfile="/var/log/pflog" # where pflogd should store the logfile pflog_program="/sbin/pflogd" # where the pflogd program lives pflog_flags="" # additional flags for pflogd ftpproxy_enable="NO" # Set to YES to enable ftp-proxy(8) for pf ftpproxy_flags="" # additional flags for ftp-proxy(8) pfsync_enable="NO" # Expose pf state to other hosts for syncing pfsync_syncdev="" # Interface for pfsync to work through pfsync_syncpeer="" # IP address of pfsync peer host pfsync_ifconfig="" # Additional options to ifconfig(8) for pfsync tcp_extensions="YES" # Set to NO to turn off RFC1323 extensions. log_in_vain="0" # >=1 to log connects to ports w/o listeners. tcp_keepalive="YES" # Enable stale TCP connection timeout (or NO). tcp_drop_synfin="NO" # Set to YES to drop TCP packets with SYN+FIN # NOTE: this violates the TCP specification icmp_drop_redirect="NO" # Set to YES to ignore ICMP REDIRECT packets icmp_log_redirect="NO" # Set to YES to log ICMP REDIRECT packets network_interfaces="auto" # List of network interfaces (or "auto"). cloned_interfaces="" # List of cloned network interfaces to create. #cloned_interfaces="gif0 gif1 gif2 gif3" # Pre-cloning GENERIC config. #ifconfig_lo0="inet 127.0.0.1" # default loopback device configuration. #ifconfig_lo0_alias0="inet 127.0.0.254 netmask 0xffffffff" # Sample alias entry. #ifconfig_ed0_ipv6="inet6 2001:db8:1::1 prefixlen 64" # Sample IPv6 addr entry #ifconfig_ed0_alias0="inet6 2001:db8:2::1 prefixlen 64" # Sample IPv6 alias #ifconfig_fxp0_name="net0" # Change interface name from fxp0 to net0. #vlans_fxp0="101 vlan0" # vlan(4) interfaces for fxp0 device #create_args_vlan0="vlan 102" # vlan tag for vlan0 device #wlans_ath0="wlan0" # wlan(4) interfaces for ath0 device #wlandebug_wlan0="scan+auth+assoc" # Set debug flags with wlandebug(8) #ipv4_addrs_fxp0="192.168.0.1/24 192.168.1.1-5/28" # example IPv4 address entry. # #autobridge_interfaces="bridge0" # List of bridges to check #autobridge_bridge0="tap* vlan0" # Interface glob to automatically add to the bridge # # If you have any sppp(4) interfaces above, you might also want to set # the following parameters. Refer to spppcontrol(8) for their meaning. sppp_interfaces="" # List of sppp interfaces. #sppp_interfaces="...0" # example: sppp over ...
#spppconfig_...0="authproto=chap myauthname=foo myauthsecret='top secret' hisauthname=some-gw hisauthsecret='another secret'" # User ppp configuration. ppp_enable="NO" # Start user-ppp (or NO). ppp_program="/usr/sbin/ppp" # Path to user-ppp program. ppp_mode="auto" # Choice of "auto", "ddial", "direct" or "dedicated". # For details see man page for ppp(8). Default is auto. ppp_nat="YES" # Use PPP's internal network address translation or NO. ppp_profile="papchap" # Which profile to use from /etc/ppp/ppp.conf. ppp_user="root" # Which user to run ppp as # Start multiple instances of ppp at boot time #ppp_profile="profile1 profile2 profile3" # Which profiles to use #ppp_profile1_mode="ddial" # Override ppp mode for profile1 #ppp_profile2_nat="NO" # Override nat mode for profile2 # profile3 uses default ppp_mode and ppp_nat ### Network daemon (miscellaneous) ### hostapd_enable="NO" # Run hostap daemon. syslogd_enable="YES" # Run syslog daemon (or NO). syslogd_program="/usr/sbin/syslogd" # path to syslogd, if you want a different one. syslogd_flags="-s" # Flags to syslogd (if enabled). altlog_proglist="" # List of chrooted applications in /var inetd_enable="NO" # Run the network daemon dispatcher (YES/NO). inetd_program="/usr/sbin/inetd" # path to inetd, if you want a different one. inetd_flags="-wW -C 60" # Optional flags to inetd iscsid_enable="NO" # iSCSI initiator daemon. iscsictl_enable="NO" # iSCSI initiator autostart. iscsictl_flags="-Aa" # Optional flags to iscsictl. hastd_enable="NO" # Run the HAST daemon (YES/NO). hastd_program="/sbin/hastd" # path to hastd, if you want a different one. hastd_flags="" # Optional flags to hastd. ctld_enable="NO" # CAM Target Layer / iSCSI target daemon. local_unbound_enable="NO" # local caching resolver # # kerberos. Do not run the admin daemons on slave servers # kdc_enable="NO" # Run a kerberos 5 KDC (or NO). kdc_program="/usr/libexec/kdc" # path to kerberos 5 KDC kdc_flags="" # Additional flags to the kerberos 5 KDC kadmind_enable="NO" # Run kadmind (or NO) kadmind_program="/usr/libexec/kadmind" # path to kadmind kpasswdd_enable="NO" # Run kpasswdd (or NO) kpasswdd_program="/usr/libexec/kpasswdd" # path to kpasswdd kfd_enable="NO" # Run kfd (or NO) kfd_program="/usr/libexec/kfd" # path to kerberos 5 kfd daemon kfd_flags="" ipropd_master_enable="NO" # Run Heimdal incremental propagation daemon # (master daemon). ipropd_master_program="/usr/libexec/ipropd-master" ipropd_master_flags="" # Flags to ipropd-master. ipropd_master_keytab="/etc/krb5.keytab" # keytab for ipropd-master. ipropd_master_slaves="" # slave node names used for /var/heimdal/slaves. ipropd_slave_enable="NO" # Run Heimdal incremental propagation daemon # (slave daemon). ipropd_slave_program="/usr/libexec/ipropd-slave" ipropd_slave_flags="" # Flags to ipropd-slave. ipropd_slave_keytab="/etc/krb5.keytab" # keytab for ipropd-slave. ipropd_slave_master="" # master node name. gssd_enable="NO" # Run the gssd daemon (or NO). gssd_program="/usr/sbin/gssd" # Path to gssd. gssd_flags="" # Flags for gssd. rwhod_enable="NO" # Run the rwho daemon (or NO). rwhod_flags="" # Flags for rwhod rarpd_enable="NO" # Run rarpd (or NO). rarpd_flags="-a" # Flags to rarpd. bootparamd_enable="NO" # Run bootparamd (or NO). bootparamd_flags="" # Flags to bootparamd pppoed_enable="NO" # Run the PPP over Ethernet daemon. pppoed_provider="*" # Provider and ppp(8) config file entry. pppoed_flags="-P /var/run/pppoed.pid" # Flags to pppoed (if enabled). pppoed_interface="fxp0" # The interface that pppoed runs on.
sshd_enable="NO" # Enable sshd sshd_program="/usr/sbin/sshd" # path to sshd, if you want a different one. sshd_flags="" # Additional flags for sshd. ftpd_enable="NO" # Enable stand-alone ftpd. ftpd_program="/usr/libexec/ftpd" # Path to ftpd, if you want a different one. ftpd_flags="" # Additional flags to stand-alone ftpd. ### Network daemon (NFS): All need rpcbind_enable="YES" ### amd_enable="NO" # Run amd service with $amd_flags (or NO). amd_program="/usr/sbin/amd" # path to amd, if you want a different one. amd_flags="-a /.amd_mnt -l syslog /host /etc/amd.map /net /etc/amd.map" amd_map_program="NO" # Can be set to "ypcat -k amd.master" autofs_enable="NO" # Run autofs daemons. automount_flags="" # Flags to automount(8) (if autofs enabled). automountd_flags="" # Flags to automountd(8) (if autofs enabled). autounmountd_flags="" # Flags to autounmountd(8) (if autofs enabled). nfs_client_enable="NO" # This host is an NFS client (or NO). nfs_access_cache="60" # Client cache timeout in seconds nfs_server_enable="NO" # This host is an NFS server (or NO). nfs_server_flags="-u -t" # Flags to nfsd (if enabled). nfs_server_managegids="NO" # The NFS server maps gids for AUTH_SYS (or NO). mountd_enable="NO" # Run mountd (or NO). mountd_flags="-r" # Flags to mountd (if NFS server enabled). weak_mountd_authentication="NO" # Allow non-root mount requests to be served. nfs_reserved_port_only="NO" # Provide NFS only on secure port (or NO). nfs_bufpackets="" # bufspace (in packets) for client rpc_lockd_enable="NO" # Run NFS rpc.lockd needed for client/server. rpc_lockd_flags="" # Flags to rpc.lockd (if enabled). rpc_statd_enable="NO" # Run NFS rpc.statd needed for client/server. rpc_statd_flags="" # Flags to rpc.statd (if enabled). rpcbind_enable="NO" # Run the portmapper service (YES/NO). rpcbind_program="/usr/sbin/rpcbind" # path to rpcbind, if you want a different one. rpcbind_flags="" # Flags to rpcbind (if enabled). rpc_ypupdated_enable="NO" # Run if NIS master and SecureRPC (or NO). keyserv_enable="NO" # Run the SecureRPC keyserver (or NO). keyserv_flags="" # Flags to keyserv (if enabled). nfsv4_server_enable="NO" # Enable support for NFSv4 nfscbd_enable="NO" # NFSv4 client side callback daemon nfscbd_flags="" # Flags for nfscbd nfsuserd_enable="NO" # NFSv4 user/group name mapping daemon nfsuserd_flags="" # Flags for nfsuserd ### Network Time Services options: ### timed_enable="NO" # Run the time daemon (or NO). timed_flags="" # Flags to timed (if enabled). ntpdate_enable="NO" # Run ntpdate to sync time on boot (or NO). ntpdate_program="/usr/sbin/ntpdate" # path to ntpdate, if you want a different one. ntpdate_flags="-b" # Flags to ntpdate (if enabled). ntpdate_config="/etc/ntp.conf" # ntpdate(8) configuration file ntpdate_hosts="" # Whitespace-separated list of ntpdate(8) servers. ntpd_enable="NO" # Run ntpd Network Time Protocol (or NO). ntpd_program="/usr/sbin/ntpd" # path to ntpd, if you want a different one. ntpd_config="/etc/ntp.conf" # ntpd(8) configuration file ntpd_sync_on_start="NO" # Sync time on ntpd startup, even if offset is high ntpd_flags="-p /var/run/ntpd.pid -f /var/db/ntpd.drift" # Flags to ntpd (if enabled). +ntp_src_leapfile="/etc/ntp/leap-seconds" + # Initial source for ntpd leapfile +ntp_db_leapfile="/var/db/ntpd.leap-seconds.list" + # Working copy (updated weekly) leapfile +ntp_leapfile_sources="https://www.ietf.org/timezones/data/leap-seconds.list" + # Source from which to fetch leapfile +ntp_leapfile_expiry_days=30 # Check for new leapfile 30 days prior to + # expiry. 
+ntp_leapfile_fetch_verbose="NO" # Be verbose during NTP leapfile fetch # Network Information Services (NIS) options: All need rpcbind_enable="YES" ### nis_client_enable="NO" # We're an NIS client (or NO). nis_client_flags="" # Flags to ypbind (if enabled). nis_ypset_enable="NO" # Run ypset at boot time (or NO). nis_ypset_flags="" # Flags to ypset (if enabled). nis_server_enable="NO" # We're an NIS server (or NO). nis_server_flags="" # Flags to ypserv (if enabled). nis_ypxfrd_enable="NO" # Run rpc.ypxfrd at boot time (or NO). nis_ypxfrd_flags="" # Flags to rpc.ypxfrd (if enabled). nis_yppasswdd_enable="NO" # Run rpc.yppasswdd at boot time (or NO). nis_yppasswdd_flags="" # Flags to rpc.yppasswdd (if enabled). ### SNMP daemon ### # Be sure to understand the security implications of running SNMP v1/v2 # in your network. bsnmpd_enable="NO" # Run the SNMP daemon (or NO). bsnmpd_flags="" # Flags for bsnmpd. ### Network routing options: ### defaultrouter="NO" # Set to default gateway (or NO). static_arp_pairs="" # Set to static ARP list (or leave empty). static_ndp_pairs="" # Set to static NDP list (or leave empty). static_routes="" # Set to static route list (or leave empty). natm_static_routes="" # Set to static route list for NATM (or leave empty). gateway_enable="NO" # Set to YES if this host will be a gateway. routed_enable="NO" # Set to YES to enable a routing daemon. routed_program="/sbin/routed" # Name of routing daemon to use if enabled. routed_flags="-q" # Flags for routing daemon. arpproxy_all="NO" # replaces obsolete kernel option ARP_PROXYALL. forward_sourceroute="NO" # do source routing (only if gateway_enable is set to "YES") accept_sourceroute="NO" # accept source routed packets to us ### ATM interface options: ### atm_enable="NO" # Configure ATM interfaces (or NO). #atm_netif_hea0="atm 1" # Network interfaces for physical interface. #atm_sigmgr_hea0="uni31" # Signalling manager for physical interface. #atm_prefix_hea0="ILMI" # NSAP prefix (UNI interfaces only) (or ILMI). #atm_macaddr_hea0="NO" # Override physical MAC address (or NO). #atm_arpserver_atm0="0x47.0005.80.999999.9999.9999.9999.999999999999.00" # ATMARP server address (or local). #atm_scsparp_atm0="NO" # Run SCSP/ATMARP on network interface (or NO). atm_pvcs="" # Set to PVC list (or leave empty). atm_arps="" # Set to permanent ARP list (or leave empty). 
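To exercise the leapfile machinery whose rc.conf defaults are added above (ntp_src_leapfile, ntp_db_leapfile, ntp_leapfile_sources) together with the new daily_ntpd_* periodic.conf knobs from this revision, local overrides might look like the following sketch; "service ntpd fetch" is the invocation used by the new 480.leapfile-ntpd script:

# /etc/periodic.conf (hypothetical override):
daily_ntpd_leapfile_enable="YES"	# fetch the leap-seconds list daily
daily_ntpd_avoid_congestion="NO"	# skip the randomized sleep while testing
# One-off fetch from the command line:
# service ntpd fetch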
### Bluetooth ### hcsecd_enable="NO" # Enable hcsecd(8) (or NO) hcsecd_config="/etc/bluetooth/hcsecd.conf" # hcsecd(8) configuration file sdpd_enable="NO" # Enable sdpd(8) (or NO) sdpd_control="/var/run/sdp" # sdpd(8) control socket sdpd_groupname="nobody" # set sdpd(8) user/group to run as after sdpd_username="nobody" # it initializes bthidd_enable="NO" # Enable bthidd(8) (or NO) bthidd_config="/etc/bluetooth/bthidd.conf" # bthidd(8) configuration file bthidd_hids="/var/db/bthidd.hids" # bthidd(8) known HID devices file rfcomm_pppd_server_enable="NO" # Enable rfcomm_pppd(8) in server mode (or NO) rfcomm_pppd_server_profile="one two" # Profile to use from /etc/ppp/ppp.conf # #rfcomm_pppd_server_one_bdaddr="" # Override local bdaddr for 'one' rfcomm_pppd_server_one_channel="1" # Override local channel for 'one' #rfcomm_pppd_server_one_register_sp="NO" # Override SP and DUN register #rfcomm_pppd_server_one_register_dun="NO" # for 'one' # #rfcomm_pppd_server_two_bdaddr="" # Override local bdaddr for 'two' rfcomm_pppd_server_two_channel="3" # Override local channel for 'two' #rfcomm_pppd_server_two_register_sp="NO" # Override SP and DUN register #rfcomm_pppd_server_two_register_dun="NO" # for 'two' ubthidhci_enable="NO" # Switch a USB BT controller present on #ubthidhci_busnum="3" # bus 3 and addr 2 from HID mode to HCI mode. #ubthidhci_addr="2" # Check usbconfig list to find the correct # numbers for your system. ### Network link/usability verification options netwait_enable="NO" # Enable rc.d/netwait (or NO) #netwait_ip="" # Wait for ping response from any IP in this list. netwait_timeout="60" # Total number of seconds to perform pings. #netwait_if="" # Wait for active link on each intf in this list. netwait_if_timeout="30" # Total number of seconds to monitor link state. ### Miscellaneous network options: ### icmp_bmcastecho="NO" # respond to broadcast ping packets ### IPv6 options: ### ipv6_network_interfaces="auto" # List of IPv6 network interfaces # (or "auto" or "none"). ipv6_activate_all_interfaces="NO" # If NO, interfaces which have no # corresponding $ifconfig_IF_ipv6 are # marked as IFDISABLED for security # reasons. ipv6_defaultrouter="NO" # Set to IPv6 default gateway (or NO). #ipv6_defaultrouter="2002:c058:6301::" # Use this for 6to4 (RFC 3068) ipv6_static_routes="" # Set to static route list (or leave empty). #ipv6_static_routes="xxx" # An example to set fec0:0000:0000:0006::/64 # route toward loopback interface. #ipv6_route_xxx="fec0:0000:0000:0006:: -prefixlen 64 ::1" ipv6_gateway_enable="NO" # Set to YES if this host will be a gateway. ipv6_cpe_wanif="NO" # Set to the upstream interface name if this # node will work as a router to forward IPv6 # packets not explicitly addressed to itself. ipv6_privacy="NO" # Use privacy address on RA-receiving IFs # (RFC 4941) route6d_enable="NO" # Set to YES to enable an IPv6 routing daemon. route6d_program="/usr/sbin/route6d" # Name of IPv6 routing daemon. route6d_flags="" # Flags to IPv6 routing daemon. #route6d_flags="-l" # Example for route6d with only IPv6 site local # addrs. #route6d_flags="-q" # If you want to run a routing daemon on an end # node, you should stop advertisement. #ipv6_network_interfaces="ed0 ep0" # Examples for router # or static configuration for end node. # Choose correct prefix value. #ipv6_prefix_ed0="fec0:0000:0000:0001 fec0:0000:0000:0002" # Examples for rtr. #ipv6_prefix_ep0="fec0:0000:0000:0003 fec0:0000:0000:0004" # Examples for rtr. ipv6_default_interface="NO" # Default output interface for scoped addrs.
# This works only with # ipv6_gateway_enable="NO". rtsol_flags="" # Flags to IPv6 router solicitation. rtsold_enable="NO" # Set to YES to enable an IPv6 router # solicitation daemon. rtsold_flags="-a" # Flags to an IPv6 router solicitation # daemon. rtadvd_enable="NO" # Set to YES to enable an IPv6 router # advertisement daemon. If set to YES, # this router becomes a possible candidate # IPv6 default router for local subnets. rtadvd_interfaces="" # Interfaces rtadvd sends RA packets. mroute6d_enable="NO" # Do IPv6 multicast routing. mroute6d_program="/usr/local/sbin/pim6dd" # Name of IPv6 multicast # routing daemon. You need to # install it from package or # port. mroute6d_flags="" # Flags to IPv6 multicast routing daemon. stf_interface_ipv4addr="" # Local IPv4 addr for 6to4 IPv6 over IPv4 # tunneling interface. Specify this entry # to enable 6to4 interface. stf_interface_ipv4plen="0" # Prefix length for 6to4 IPv4 addr, # to limit peer addr range. Effective value # is 0-31. stf_interface_ipv6_ifid="0:0:0:1" # IPv6 interface id for stf0. # If you like, you can set "AUTO" for this. stf_interface_ipv6_slaid="0000" # IPv6 Site Level Aggregator for stf0 ipv6_ipv4mapping="NO" # Set to "YES" to enable IPv4 mapped IPv6 addr # communication. (like ::ffff:a.b.c.d) ipv6_ipfilter_rules="/etc/ipf6.rules" # rules definition file for ipfilter, # see /usr/src/contrib/ipfilter/rules # for examples ip6addrctl_enable="YES" # Set to YES to enable default address selection ip6addrctl_verbose="NO" # Set to YES to enable verbose configuration messages ip6addrctl_policy="AUTO" # A pre-defined address selection policy # (ipv4_prefer, ipv6_prefer, or AUTO) ############################################################## ### System console options ################################# ############################################################## keyboard="" # keyboard device to use (default /dev/kbd0). keymap="NO" # keymap in /usr/share/{syscons,vt}/keymaps/* (or NO). keyrate="NO" # keyboard rate to: slow, normal, fast (or NO). keybell="NO" # See kbdcontrol(1) for options. Use "off" to disable. keychange="NO" # function keys default values (or NO). cursor="NO" # cursor type {normal|blink|destructive} (or NO). scrnmap="NO" # screen map in /usr/share/syscons/scrnmaps/* (or NO). font8x16="NO" # font 8x16 from /usr/share/{syscons,vt}/fonts/* (or NO). font8x14="NO" # font 8x14 from /usr/share/{syscons,vt}/fonts/* (or NO). font8x8="NO" # font 8x8 from /usr/share/{syscons,vt}/fonts/* (or NO). blanktime="300" # blank time (in seconds) or "NO" to turn it off. saver="NO" # screen saver: Uses /boot/kernel/${saver}_saver.ko moused_nondefault_enable="YES" # Treat non-default mice as enabled unless # specifically overridden in rc.conf(5). moused_enable="NO" # Run the mouse daemon. moused_type="auto" # See man page for rc.conf(5) for available settings. moused_port="/dev/psm0" # Set to your mouse port. moused_flags="" # Any additional flags to moused. mousechar_start="NO" # if 0xd0-0xd3 default range is occupied in your # language code table, specify alternative range # start like mousechar_start=3, see vidcontrol(1) allscreens_flags="" # Set this vidcontrol mode for all virtual screens allscreens_kbdflags="" # Set this kbdcontrol mode for all virtual screens ############################################################## ### Mail Transfer Agent (MTA) options ###################### ############################################################## mta_start_script="/etc/rc.sendmail" # Script to start your chosen MTA, called by /etc/rc.
# Settings for /etc/rc.sendmail and /etc/rc.d/sendmail: sendmail_enable="NO" # Run the sendmail inbound daemon (YES/NO). sendmail_pidfile="/var/run/sendmail.pid" # sendmail pid file sendmail_procname="/usr/sbin/sendmail" # sendmail process name sendmail_flags="-L sm-mta -bd -q30m" # Flags to sendmail (as a server) sendmail_cert_create="YES" # Create a server certificate if none (YES/NO) #sendmail_cert_cn="CN" # CN of the generated certificate sendmail_submit_enable="YES" # Start a localhost-only MTA for mail submission sendmail_submit_flags="-L sm-mta -bd -q30m -ODaemonPortOptions=Addr=localhost" # Flags for localhost-only MTA sendmail_outbound_enable="YES" # Dequeue stuck mail (YES/NO). sendmail_outbound_flags="-L sm-queue -q30m" # Flags to sendmail (outbound only) sendmail_msp_queue_enable="YES" # Dequeue stuck clientmqueue mail (YES/NO). sendmail_msp_queue_flags="-L sm-msp-queue -Ac -q30m" # Flags for sendmail_msp_queue daemon. sendmail_rebuild_aliases="NO" # Run newaliases if necessary (YES/NO). ############################################################## ### Miscellaneous administrative options ################### ############################################################## auditd_enable="NO" # Run the audit daemon. auditd_program="/usr/sbin/auditd" # Path to the audit daemon. auditd_flags="" # Which options to pass to the audit daemon. auditdistd_enable="NO" # Run the auditdistd daemon. auditdistd_program="/usr/sbin/auditdistd" # Path to the auditdistd daemon. auditdistd_flags="" # Which options to pass to the auditdistd daemon. cron_enable="YES" # Run the periodic job daemon. cron_program="/usr/sbin/cron" # Which cron executable to run (if enabled). cron_dst="YES" # Handle DST transitions intelligently (YES/NO) cron_flags="" # Which options to pass to the cron daemon. lpd_enable="NO" # Run the line printer daemon. lpd_program="/usr/sbin/lpd" # path to lpd, if you want a different one. lpd_flags="" # Flags to lpd (if enabled). nscd_enable="NO" # Run the nsswitch caching daemon. chkprintcap_enable="NO" # Run chkprintcap(8) before running lpd. chkprintcap_flags="-d" # Create missing directories by default. dumpdev="AUTO" # Device to crashdump to (device name, AUTO, or NO). dumpdir="/var/crash" # Directory where crash dumps are to be stored savecore_flags="-m 10" # Used if dumpdev is enabled above, and present. # By default, only the 10 most recent kernel dumps # are saved. crashinfo_enable="YES" # Automatically generate crash dump summary. crashinfo_program="/usr/sbin/crashinfo" # Script to generate crash dump summary. quota_enable="NO" # turn on quotas on startup (or NO). check_quotas="YES" # Check quotas on startup (or NO). quotaon_flags="-a" # Turn quotas on for all file systems (if enabled) quotaoff_flags="-a" # Turn quotas off for all file systems at shutdown quotacheck_flags="-a" # Check all file system quotas (if enabled) accounting_enable="NO" # Turn on process accounting (or NO). ibcs2_enable="NO" # Ibcs2 (SCO) emulation loaded at startup (or NO). ibcs2_loaders="coff" # List of additional Ibcs2 loaders (or NO). firstboot_sentinel="/firstboot" # Scripts with "firstboot" keyword are run if # this file exists. Should be on a R/W filesystem so # the file can be deleted after the boot completes. # Emulation/compatibility services provided by /etc/rc.d/abi sysvipc_enable="NO" # Load System V IPC primitives at startup (or NO). linux_enable="NO" # Linux binary compatibility loaded at startup (or NO). svr4_enable="NO" # SysVR4 emulation loaded at startup (or NO).
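As a usage sketch for the compatibility services above, enabling the Linux ABI at boot is a single override; rc.d/abi then loads the corresponding kernel module:

# /etc/rc.conf (hypothetical override):
linux_enable="YES"	# load Linux binary compatibility at startup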
clear_tmp_enable="NO" # Clear /tmp at startup. clear_tmp_X="YES" # Clear and recreate X11-related directories in /tmp ldconfig_insecure="NO" # Set to YES to disable ldconfig security checks ldconfig_paths="/usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg" # shared library search paths ldconfig32_paths="/usr/lib32 /usr/lib32/compat" # 32-bit compatibility shared library search paths ldconfigsoft_paths="/usr/libsoft /usr/libsoft/compat /usr/local/libsoft" # soft float compatibility shared library search paths # Note: temporarily with extra stuff for transition ldconfig_paths_aout="/usr/lib/compat/aout /usr/local/lib/aout" # a.out shared library search paths ldconfig_local_dirs="/usr/local/libdata/ldconfig" # Local directories with ldconfig configuration files. ldconfig_local32_dirs="/usr/local/libdata/ldconfig32" # Local directories with 32-bit compatibility ldconfig # configuration files. ldconfig_localsoft_dirs="/usr/local/libdata/ldconfigsoft" # Local directories with soft float compatibility ldconfig # configuration files. kern_securelevel_enable="NO" # kernel security level (see security(7)) kern_securelevel="-1" # range: -1..3 ; `-1' is the most insecure # Note that setting securelevel to 0 will result # in the system booting with securelevel set to 1, as # init(8) will raise the level when rc(8) completes. update_motd="YES" # update version info in /etc/motd (or NO) entropy_boot_file="/boot/entropy" # Set to NO to disable very early # (used at early boot time) entropy caching through reboots. entropy_file="/entropy" # Set to NO to disable late (used when going multi-user) # entropy through reboots. # /var/db/entropy-file is preferred if / is not avail. entropy_dir="/var/db/entropy" # Set to NO to disable caching entropy via cron. entropy_save_sz="4096" # Size of the entropy cache files. entropy_save_num="8" # Number of entropy cache files to save. harvest_mask="511" # Entropy device harvests all but the very invasive sources. # (See 'sysctl kern.random.harvest' and random(4)) dmesg_enable="YES" # Save dmesg(8) to /var/run/dmesg.boot watchdogd_enable="NO" # Start the software watchdog daemon watchdogd_flags="" # Flags to watchdogd (if enabled) devfs_rulesets="/etc/defaults/devfs.rules /etc/devfs.rules" # Files containing # devfs(8) rules. devfs_system_ruleset="" # The name (NOT number) of a ruleset to apply to /dev devfs_set_rulesets="" # A list of /mount/dev=ruleset_name settings to # apply (must be mounted already, i.e. fstab(5)) devfs_load_rulesets="YES" # Enable to always load the default rulesets performance_cx_lowest="C2" # Online CPU idle state performance_cpu_freq="NONE" # Online CPU frequency economy_cx_lowest="Cmax" # Offline CPU idle state economy_cpu_freq="NONE" # Offline CPU frequency virecover_enable="YES" # Perform housekeeping for the vi(1) editor ugidfw_enable="NO" # Load mac_bsdextended(4) rules on boot bsdextended_script="/etc/rc.bsdextended" # Default mac_bsdextended(4) # ruleset file. newsyslog_enable="YES" # Run newsyslog at startup. newsyslog_flags="-CN" # Newsyslog flags to create marked files mixer_enable="YES" # Run the sound mixer. opensm_enable="NO" # Opensm(8) for infiniband devices defaults to off casperd_enable="YES" # casperd(8) daemon # rctl(8) requires kernel options RACCT and RCTL rctl_enable="YES" # Load rctl(8) rules on boot rctl_rules="/etc/rctl.conf" # rctl(8) ruleset. See rctl.conf(5). 
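For the rctl(8) ruleset referenced above, a hypothetical /etc/rctl.conf entry might look like this (the www subject and the 512m limit are illustrative only; syntax per rctl.conf(5)):

user:www:memoryuse:deny=512m	# deny www allocations beyond 512 MB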
iovctl_files="" # Config files for iovctl(8) ############################################################## ### Jail Configuration (see rc.conf(5) manual page) ########## ############################################################## jail_enable="NO" # Set to NO to disable starting of any jails jail_parallel_start="NO" # Start jails in the background jail_list="" # Space separated list of names of jails ############################################################## ### Define source_rc_confs, the mechanism used by /etc/rc.* ## ### scripts to source rc_conf_files overrides safely. ## ############################################################## if [ -z "${source_rc_confs_defined}" ]; then source_rc_confs_defined=yes source_rc_confs() { local i sourced_files for i in ${rc_conf_files}; do case ${sourced_files} in *:$i:*) ;; *) sourced_files="${sourced_files}:$i:" if [ -r $i ]; then . $i fi ;; esac done } fi Index: projects/clang380-import/etc/ntp.conf =================================================================== --- projects/clang380-import/etc/ntp.conf (revision 294776) +++ projects/clang380-import/etc/ntp.conf (revision 294777) @@ -1,84 +1,86 @@ # # $FreeBSD$ # # Default NTP servers for the FreeBSD operating system. # # Don't forget to enable ntpd in /etc/rc.conf with: # ntpd_enable="YES" # # The driftfile is by default /var/db/ntpd.drift, check # /etc/defaults/rc.conf on how to change the location. # # # The following three servers will give you a random set of three # NTP servers geographically close to you. # See http://www.pool.ntp.org/ for details. Note, the pool encourages # users with a static IP and good upstream NTP servers to add a server # to the pool. See http://www.pool.ntp.org/join.html if you are interested. # # The option `iburst' is used for faster initial synchronization. # server 0.freebsd.pool.ntp.org iburst server 1.freebsd.pool.ntp.org iburst server 2.freebsd.pool.ntp.org iburst #server 3.freebsd.pool.ntp.org iburst # # If you want to pick yourself which country's public NTP server # you want sync against, comment out the above servers, uncomment # the next ones and replace CC with the country's abbreviation. # Make sure that the hostnames resolve to a proper IP address! # # server 0.CC.pool.ntp.org iburst # server 1.CC.pool.ntp.org iburst # server 2.CC.pool.ntp.org iburst # # Security: # # By default, only allow time queries and block all other requests # from unauthenticated clients. # # See http://support.ntp.org/bin/view/Support/AccessRestrictions # for more information. # restrict default limited kod nomodify notrap nopeer noquery restrict -6 default limited kod nomodify notrap nopeer noquery # # Alternatively, the following rules would block all unauthorized access. # #restrict default ignore #restrict -6 default ignore # # In this case, all remote NTP time servers also need to be explicitly # allowed or they would not be able to exchange time information with # this server. # # Please note that this example doesn't work for the servers in # the pool.ntp.org domain since they return multiple A records. # #restrict 0.pool.ntp.org nomodify nopeer noquery notrap #restrict 1.pool.ntp.org nomodify nopeer noquery notrap #restrict 2.pool.ntp.org nomodify nopeer noquery notrap # # The following settings allow unrestricted access from the localhost restrict 127.0.0.1 restrict -6 ::1 restrict 127.127.1.0 # # If a server loses sync with all upstream servers, NTP clients # no longer follow that server. 
The local clock can be configured # to provide a time source when this happens, but it should usually # be configured on just one server on a network. For more details see # http://support.ntp.org/bin/view/Support/UndisciplinedLocalClock # The use of Orphan Mode may be preferable. # #server 127.127.1.0 #fudge 127.127.1.0 stratum 10 # See http://support.ntp.org/bin/view/Support/ConfiguringNTP#Section_6.14. # for documentation regarding leapfile. Updates to the file can be obtained # from ftp://time.nist.gov/pub/ or ftp://tycho.usno.navy.mil/pub/ntp/. -leapfile "/etc/ntp/leap-seconds" +# Use either leapfile in /etc/ntp or weekly updated leapfile in /var/db. +#leapfile "/etc/ntp/leap-seconds" +leapfile "/var/db/ntpd.leap-seconds.list" Index: projects/clang380-import/etc/periodic/daily/480.leapfile-ntpd =================================================================== --- projects/clang380-import/etc/periodic/daily/480.leapfile-ntpd (nonexistent) +++ projects/clang380-import/etc/periodic/daily/480.leapfile-ntpd (revision 294777) @@ -0,0 +1,28 @@ +#!/bin/sh +# +# $FreeBSD$ +# + +# If there is a global system configuration file, suck it in. +# +if [ -r /etc/defaults/periodic.conf ] +then + . /etc/defaults/periodic.conf + source_periodic_confs +fi + +case "$daily_ntpd_leapfile_enable" in + [Yy][Ee][Ss]) + case "$daily_ntpd_avoid_congestion" in + [Yy][Ee][Ss]) + # Avoid dogpiling + (sleep $(jot -r 1 0 86400); service ntpd fetch) & + ;; + *) + service ntpd fetch + ;; + esac + ;; +esac + +exit $rc Property changes on: projects/clang380-import/etc/periodic/daily/480.leapfile-ntpd ___________________________________________________________________ Added: svn:executable ## -0,0 +1 ## +* \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Index: projects/clang380-import/etc/periodic/daily/Makefile =================================================================== --- projects/clang380-import/etc/periodic/daily/Makefile (revision 294776) +++ projects/clang380-import/etc/periodic/daily/Makefile (revision 294777) @@ -1,57 +1,58 @@ # $FreeBSD$ .include FILES= 100.clean-disks \ 110.clean-tmps \ 120.clean-preserve \ 200.backup-passwd \ 210.backup-aliases \ 330.news \ 400.status-disks \ 401.status-graid \ 406.status-gmirror \ 407.status-graid3 \ 408.status-gstripe \ 409.status-gconcat \ 420.status-network \ 430.status-uptime \ 450.status-security \ 510.status-world-kernel \ 999.local # NB: keep these sorted by MK_* knobs .if ${MK_ACCT} != "no" FILES+= 310.accounting .endif .if ${MK_CALENDAR} != "no" FILES+= 300.calendar .endif .if ${MK_MAIL} != "no" FILES+= 130.clean-msgs .endif .if ${MK_NTP} != "no" -FILES+= 480.status-ntpd +FILES+= 480.status-ntpd \ + 480.leapfile-ntpd .endif .if ${MK_RCMDS} != "no" FILES+= 140.clean-rwho .endif .if ${MK_SENDMAIL} != "no" FILES+= 150.clean-hoststat \ 440.status-mailq \ 460.status-mail-rejects \ 500.queuerun .endif .if ${MK_ZFS} != "no" FILES+= 404.status-zfs \ 800.scrub-zfs .endif .include Index: projects/clang380-import/etc/rc.d/jail =================================================================== --- projects/clang380-import/etc/rc.d/jail (revision 294776) +++ projects/clang380-import/etc/rc.d/jail (revision 294777) @@ -1,575 +1,575 @@ #!/bin/sh # # $FreeBSD$ # # PROVIDE: jail # REQUIRE: LOGIN FILESYSTEMS # BEFORE: securelevel # KEYWORD: nojail shutdown . 
/etc/rc.subr name="jail" rcvar="jail_enable" start_cmd="jail_start" start_postcmd="jail_warn" stop_cmd="jail_stop" config_cmd="jail_config" console_cmd="jail_console" status_cmd="jail_status" extra_commands="config console status" : ${jail_conf:=/etc/jail.conf} : ${jail_program:=/usr/sbin/jail} : ${jail_consolecmd:=/usr/bin/login -f root} : ${jail_jexec:=/usr/sbin/jexec} : ${jail_jls:=/usr/sbin/jls} need_dad_wait= # extract_var jv name param num defval # Extract value from ${jail_$jv_$name} or ${jail_$name} and # set it to $param. If not defined, $defval is used. # When $num is [0-9]*, ${jail_$jv_$name$num} are looked up and -# $param is set by using +=. +# $param is set by using +=. $num=0 is optional (params may start at 1). # When $num is YN or NY, the value is interpret as boolean. extract_var() { local i _jv _name _param _num _def _name1 _name2 _jv=$1 _name=$2 _param=$3 _num=$4 _def=$5 case $_num in YN) _name1=jail_${_jv}_${_name} _name2=jail_${_name} eval $_name1=\"\${$_name1:-\${$_name2:-$_def}}\" if checkyesno $_name1; then echo " $_param = 1;" else echo " $_param = 0;" fi ;; NY) _name1=jail_${_jv}_${_name} _name2=jail_${_name} eval $_name1=\"\${$_name1:-\${$_name2:-$_def}}\" if checkyesno $_name1; then echo " $_param = 0;" else echo " $_param = 1;" fi ;; [0-9]*) i=$_num while : ; do _name1=jail_${_jv}_${_name}${i} _name2=jail_${_name}${i} eval _tmpargs=\"\${$_name1:-\${$_name2:-$_def}}\" if [ -n "$_tmpargs" ]; then echo " $_param += \"$_tmpargs\";" - else + elif [ $i != 0 ]; then break; fi i=$(($i + 1)) done ;; *) _name1=jail_${_jv}_${_name} _name2=jail_${_name} eval _tmpargs=\"\${$_name1:-\${$_name2:-$_def}}\" if [ -n "$_tmpargs" ]; then echo " $_param = \"$_tmpargs\";" fi ;; esac } # parse_options _j _jv # Parse options and create a temporary configuration file if necessary. # parse_options() { local _j _jv _p _j=$1 _jv=$2 _confwarn=0 if [ -z "$_j" ]; then warn "parse_options: you must specify a jail" return fi eval _jconf=\"\${jail_${_jv}_conf:-/etc/jail.${_j}.conf}\" eval _rootdir=\"\$jail_${_jv}_rootdir\" eval _hostname=\"\$jail_${_jv}_hostname\" if [ -z "$_rootdir" -o \ -z "$_hostname" ]; then if [ -r "$_jconf" ]; then _conf="$_jconf" return 0 elif [ -r "$jail_conf" ]; then _conf="$jail_conf" return 0 else warn "Invalid configuration for $_j " \ "(no jail.conf, no hostname, or no path). " \ "Jail $_j was ignored." fi return 1 fi eval _ip=\"\$jail_${_jv}_ip\" if [ -z "$_ip" ] && ! check_kern_features vimage; then warn "no ipaddress specified and no vimage support. " \ "Jail $_j was ignored." return 1 fi _conf=/var/run/jail.${_j}.conf # # To relieve confusion, show a warning message. # _confwarn=1 if [ -r "$jail_conf" -o -r "$_jconf" ]; then if ! checkyesno jail_parallel_start; then warn "$_conf is created and used for jail $_j." 
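# When the per-jail rc.conf variables are in use, the generated
# $_conf always takes precedence over any entry in $jail_conf.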
fi fi /usr/bin/install -m 0644 -o root -g wheel /dev/null $_conf || return 1 eval : \${jail_${_jv}_flags:=${jail_flags}} eval _exec=\"\$jail_${_jv}_exec\" eval _exec_start=\"\$jail_${_jv}_exec_start\" eval _exec_stop=\"\$jail_${_jv}_exec_stop\" if [ -n "${_exec}" ]; then # simple/backward-compatible execution _exec_start="${_exec}" _exec_stop="" else # flexible execution if [ -z "${_exec_start}" ]; then _exec_start="/bin/sh /etc/rc" if [ -z "${_exec_stop}" ]; then _exec_stop="/bin/sh /etc/rc.shutdown" fi fi fi eval _interface=\"\${jail_${_jv}_interface:-${jail_interface}}\" eval _parameters=\"\${jail_${_jv}_parameters:-${jail_parameters}}\" eval _fstab=\"\${jail_${_jv}_fstab:-${jail_fstab:-/etc/fstab.$_j}}\" ( date +"# Generated by rc.d/jail at %Y-%m-%d %H:%M:%S" echo "$_j {" extract_var $_jv hostname host.hostname - "" extract_var $_jv rootdir path - "" if [ -n "$_ip" ]; then extract_var $_jv interface interface - "" jail_handle_ips_option $_ip $_interface alias=0 while : ; do eval _x=\"\$jail_${_jv}_ip_multi${alias}\" [ -z "$_x" ] && break jail_handle_ips_option $_x $_interface alias=$(($alias + 1)) done case $need_dad_wait in 1) # Sleep to let DAD complete before # starting services. echo " exec.start += \"sleep " \ $(($(${SYSCTL_N} net.inet6.ip6.dad_count) + 1)) \ "\";" ;; esac # These are applicable only to non-vimage jails. extract_var $_jv fib exec.fib - "" extract_var $_jv socket_unixiproute_only \ allow.raw_sockets NY YES else echo " vnet;" extract_var $_jv vnet_interface vnet.interface - "" fi echo " exec.clean;" echo " exec.system_user = \"root\";" echo " exec.jail_user = \"root\";" extract_var $_jv exec_prestart exec.prestart 0 "" extract_var $_jv exec_poststart exec.poststart 0 "" extract_var $_jv exec_prestop exec.prestop 0 "" extract_var $_jv exec_poststop exec.poststop 0 "" echo " exec.start += \"$_exec_start\";" - extract_var $_jv exec_afterstart exec.start 1 "" + extract_var $_jv exec_afterstart exec.start 0 "" echo " exec.stop = \"$_exec_stop\";" extract_var $_jv consolelog exec.consolelog - \ /var/log/jail_${_j}_console.log if [ -r $_fstab ]; then echo " mount.fstab = \"$_fstab\";" fi eval : \${jail_${_jv}_devfs_enable:=${jail_devfs_enable:-NO}} if checkyesno jail_${_jv}_devfs_enable; then echo " mount.devfs;" eval _ruleset=\${jail_${_jv}_devfs_ruleset:-${jail_devfs_ruleset}} case $_ruleset in "") ;; [0-9]*) echo " devfs_ruleset = \"$_ruleset\";" ;; devfsrules_jail) # XXX: This is the default value, # Let jail(8) to use the default because # mount(8) only accepts an integer. # This should accept a ruleset name. ;; *) warn "devfs_ruleset must be an integer." ;; esac fi eval : \${jail_${_jv}_fdescfs_enable:=${jail_fdescfs_enable:-NO}} if checkyesno jail_${_jv}_fdescfs_enable; then echo " mount.fdescfs;" fi eval : \${jail_${_jv}_procfs_enable:=${jail_procfs_enable:-NO}} if checkyesno jail_${_jv}_procfs_enable; then echo " mount.procfs;" fi eval : \${jail_${_jv}_mount_enable:=${jail_mount_enable:-NO}} if checkyesno jail_${_jv}_mount_enable; then echo " allow.mount;" fi extract_var $_jv set_hostname_allow allow.set_hostname YN NO extract_var $_jv sysvipc_allow allow.sysvipc YN NO extract_var $_jv osreldate osreldate extract_var $_jv osrelease osrelease for _p in $_parameters; do echo " ${_p%\;};" done echo "}" ) >> $_conf return 0 } # jail_extract_address argument iface # The second argument is the string from one of the _ip # or the _multi variables. In case of a comma separated list # only one argument must be passed in at a time. 
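# As an illustration (hypothetical values): an input of "em0|10.2.2.2/24"
# parses into _iface=em0, _addr=10.2.2.2, _mask=/24 and _type=inet, while a
# bare "2001:db8::1" ends up as _type=inet6 with the default /128 mask.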
# The function alters the _type, _iface, _addr and _mask variables. # jail_extract_address() { local _i _interface _i=$1 _interface=$2 if [ -z "${_i}" ]; then warn "jail_extract_address: called without input" return fi # Check if we have an interface prefix given and split into # iFace and rest. case "${_i}" in *\|*) # ifN|.. prefix there _iface=${_i%%|*} _r=${_i##*|} ;; *) _iface="" _r=${_i} ;; esac # In case the IP has no interface given, check if we have a global one. _iface=${_iface:-${_interface}} # Set address, cut off any prefix/netmask/prefixlen. _addr=${_r} _addr=${_addr%%[/ ]*} # Theoretically we can return here if interface is not set, # as we only care about the _mask if we call ifconfig. # This is not done because we may want to sanitize IP addresses # based on _type later, and optionally change the type as well. # Extract the prefix/netmask/prefixlen part by cutting off the address. _mask=${_r} _mask=`expr "${_mask}" : "${_addr}\(.*\)"` # Identify type {inet,inet6}. case "${_addr}" in *\.*\.*\.*) _type="inet" ;; *:*) _type="inet6" ;; *) warn "jail_extract_address: type not identified" ;; esac # Handle the special /netmask instead of /prefix or # "netmask xxx" case for legacy IP. # We do NOT support shortened classful netmasks. if [ "${_type}" = "inet" ]; then case "${_mask}" in /*\.*\.*\.*) _mask=" netmask ${_mask#/}" ;; *) ;; esac # In case _mask is still not set use /32. _mask=${_mask:-/32} elif [ "${_type}" = "inet6" ]; then # In case _mask is not set for IPv6, use /128. _mask=${_mask:-/128} fi } # jail_handle_ips_option input iface # Handle a single argument input which can be a comma separated # list of addresses (theoretically with an optional interface and # prefix/netmask/prefixlen). # jail_handle_ips_option() { local _x _type _i _defif _x=$1 _defif=$2 if [ -z "${_x}" ]; then # No IP given. This can happen for the primary address # of each address family. return fi # Loop, in case we find a comma separated list, we need to handle # each argument on its own. while [ ${#_x} -gt 0 ]; do case "${_x}" in *,*) # Extract the first argument and strip it off the list. _i=`expr "${_x}" : '^\([^,]*\)'` _x=`expr "${_x}" : "^[^,]*,\(.*\)"` ;; *) _i=${_x} _x="" ;; esac _type="" _addr="" _mask="" _iface="" jail_extract_address $_i $_defif # make sure we got an address. case $_addr in "") continue ;; *) ;; esac # Append address to list of addresses for the jail command. case $_type in inet) echo " ip4.addr += \"${_iface:+${_iface}|}${_addr}${_mask}\";" ;; inet6) echo " ip6.addr += \"${_iface:+${_iface}|}${_addr}${_mask}\";" need_dad_wait=1 ;; esac done } jail_config() { local _j _jv case $1 in _ALL) return ;; esac for _j in $@; do _j=$(echo $_j | tr /. _) _jv=$(echo -n $_j | tr -c '[:alnum:]' _) if parse_options $_j $_jv; then echo "$_j: parameters are in $_conf." fi done } jail_console() { local _j _jv _cmd # One argument that is not _ALL. case $#:$1 in 0:*|1:_ALL) err 3 "Specify a jail name." ;; 1:*) ;; esac _j=$(echo $1 | tr /.
_) _jv=$(echo -n $1 | tr -c '[:alnum:]' _) shift case $# in 0) eval _cmd=\${jail_${_jv}_consolecmd:-$jail_consolecmd} ;; *) _cmd=$@ ;; esac $jail_jexec $_j $_cmd } jail_status() { $jail_jls -N } jail_start() { local _j _jv _jid _jl _id _name if [ $# = 0 ]; then return fi echo -n 'Starting jails:' case $1 in _ALL) command=$jail_program rc_flags=$jail_flags command_args="-f $jail_conf -c" _tmp=`mktemp -t jail` || exit 3 if $command $rc_flags $command_args >> $_tmp 2>&1; then $jail_jls jid name | while read _id _name; do echo -n " $_name" echo $_id > /var/run/jail_${_name}.id done else tail -1 $_tmp fi rm -f $_tmp echo '.' return ;; esac if checkyesno jail_parallel_start; then # # Start jails in parallel and then check jail id when # jail_parallel_start is YES. # _jl= for _j in $@; do _j=$(echo $_j | tr /. _) _jv=$(echo -n $_j | tr -c '[:alnum:]' _) parse_options $_j $_jv || continue _jl="$_jl $_j" eval rc_flags=\${jail_${_jv}_flags:-$jail_flags} eval command=\${jail_${_jv}_program:-$jail_program} command_args="-i -f $_conf -c $_j" $command $rc_flags $command_args \ >/dev/null 2>&1 /var/run/jail_${_j}.id else rm -f /var/run/jail_${_j}.id echo " cannot start jail " \ "\"${_hostname:-${_j}}\": " fi done else # # Start jails one-by-one when jail_parallel_start is NO. # for _j in $@; do _j=$(echo $_j | tr /. _) _jv=$(echo -n $_j | tr -c '[:alnum:]' _) parse_options $_j $_jv || continue eval rc_flags=\${jail_${_jv}_flags:-$jail_flags} eval command=\${jail_${_jv}_program:-$jail_program} command_args="-i -f $_conf -c $_j" _tmp=`mktemp -t jail` || exit 3 if $command $rc_flags $command_args \ >> $_tmp 2>&1 /var/run/jail_${_j}.id else rm -f /var/run/jail_${_j}.id echo " cannot start jail " \ "\"${_hostname:-${_j}}\": " cat $_tmp fi rm -f $_tmp done fi echo '.' } jail_stop() { local _j _jv if [ $# = 0 ]; then return fi echo -n 'Stopping jails:' case $1 in _ALL) command=$jail_program rc_flags=$jail_flags command_args="-f $jail_conf -r" $jail_jls name | while read _j; do echo -n " $_j" _tmp=`mktemp -t jail` || exit 3 $command $rc_flags $command_args $_j >> $_tmp 2>&1 if $jail_jls -j $_j > /dev/null 2>&1; then tail -1 $_tmp else rm -f /var/run/jail_${_j}.id fi rm -f $_tmp done echo '.' return ;; esac for _j in $@; do _j=$(echo $_j | tr /. _) _jv=$(echo -n $_j | tr -c '[:alnum:]' _) parse_options $_j $_jv || continue if ! $jail_jls -j $_j > /dev/null 2>&1; then continue fi eval command=\${jail_${_jv}_program:-$jail_program} echo -n " ${_hostname:-${_j}}" _tmp=`mktemp -t jail` || exit 3 $command -q -f $_conf -r $_j >> $_tmp 2>&1 if $jail_jls -j $_j > /dev/null 2>&1; then tail -1 $_tmp else rm -f /var/run/jail_${_j}.id fi rm -f $_tmp done echo '.' } jail_warn() { # To relieve confusion, show a warning message. case $_confwarn in 1) warn "Per-jail configuration via jail_* variables " \ "is obsolete. Please consider to migrate to $jail_conf." ;; esac } load_rc_config $name case $# in 1) run_rc_command $@ ${jail_list:-_ALL} ;; *) run_rc_command $@ ;; esac Index: projects/clang380-import/etc/rc.d/ntpd =================================================================== --- projects/clang380-import/etc/rc.d/ntpd (revision 294776) +++ projects/clang380-import/etc/rc.d/ntpd (revision 294777) @@ -1,53 +1,120 @@ #!/bin/sh # # $FreeBSD$ # # PROVIDE: ntpd # REQUIRE: DAEMON ntpdate FILESYSTEMS devfs # BEFORE: LOGIN # KEYWORD: nojail shutdown . 
/etc/rc.subr name="ntpd" rcvar="ntpd_enable" command="/usr/sbin/${name}" pidfile="/var/run/${name}.pid" +extra_commands="fetch" +fetch_cmd="ntpd_fetch_leapfile" start_precmd="ntpd_precmd" load_rc_config $name ntpd_precmd() { rc_flags="-c ${ntpd_config} ${ntpd_flags}" if checkyesno ntpd_sync_on_start; then rc_flags="-g $rc_flags" fi if [ -z "$ntpd_chrootdir" ]; then return 0; fi + if [ ! -f $ntp_db_leapfile ]; then + ntpd_fetch_leapfile + fi + # If running in a chroot cage, ensure that the appropriate files # exist inside the cage, as well as helper symlinks into the cage # from outside. # # As this is called after the is_running and required_dir checks # are made in run_rc_command(), we can safely assume ${ntpd_chrootdir} # exists and ntpd isn't running at this point (unless forcestart # is used). # if [ ! -c "${ntpd_chrootdir}/dev/clockctl" ]; then rm -f "${ntpd_chrootdir}/dev/clockctl" ( cd /dev ; /bin/pax -rw -pe clockctl "${ntpd_chrootdir}/dev" ) fi ln -fs "${ntpd_chrootdir}/var/db/ntp.drift" /var/db/ntp.drift + ln -fs "${ntpd_chrootdir}${ntp_tmp_leapfile}" ${ntp_tmp_leapfile} # Change run_rc_commands()'s internal copy of $ntpd_flags # rc_flags="-u ntpd:ntpd -i ${ntpd_chrootdir} $rc_flags" +} + +current_ntp_ts() { + # Seconds between 1900-01-01 and 1970-01-01 + # echo $(((70*365+17)*86400)) + ntp_to_unix=2208988800 + + echo $(($(date -u +%s)+$ntp_to_unix)) +} + +get_ntp_leapfile_ver() { + expr "$(awk '$1 == "#$" { print $2 }' "$1" 2>/dev/null)" : \ + '^\([1-9][0-9]*\)$' \| 0 +} + +get_ntp_leapfile_expiry() { + expr "$(awk '$1 == "#@" { print $2 }' "$1" 2>/dev/null)" : \ + '^\([1-9][0-9]*\)$' \| 0 +} + +ntpd_fetch_leapfile() { + local ntp_tmp_leapfile rc verbose + + if checkyesno ntp_leapfile_fetch_verbose; then + verbose=echo + else + verbose=: + fi + + ntp_tmp_leapfile="/var/run/ntpd.leap-seconds.list" + + ntp_ver_no_src=$(get_ntp_leapfile_ver $ntp_src_leapfile) + ntp_ver_no_db=$(get_ntp_leapfile_ver $ntp_db_leapfile) + $verbose ntp_src_leapfile version is $ntp_ver_no_src + $verbose ntp_db_leapfile version is $ntp_ver_no_db + + if [ "$ntp_ver_no_src" -gt "$ntp_ver_no_db" ]; then + $verbose replacing $ntp_db_leapfile with $ntp_src_leapfile + cp -p $ntp_src_leapfile $ntp_db_leapfile + ntp_ver_no_db=$ntp_ver_no_src + else + $verbose not replacing $ntp_db_leapfile with $ntp_src_leapfile + fi + ntp_leap_expiry=$(get_ntp_leapfile_expiry $ntp_db_leapfile) + ntp_leapfile_expiry_seconds=$((ntp_leapfile_expiry_days*86400)) + ntp_leap_fetch_date=$((ntp_leap_expiry-ntp_leapfile_expiry_seconds)) + if [ $(current_ntp_ts) -ge $ntp_leap_fetch_date ]; then + $verbose Within ntp leapfile expiry limit, initiating fetch + for url in $ntp_leapfile_sources ; do + $verbose fetching $url + fetch -mqo $ntp_tmp_leapfile $url && break + done + ntp_ver_no_tmp=$(get_ntp_leapfile_ver $ntp_tmp_leapfile) + if [ "$ntp_ver_no_tmp" -gt "$ntp_ver_no_db" ]; then + $verbose using $url as $ntp_db_leapfile + mv $ntp_tmp_leapfile $ntp_db_leapfile + else + $verbose using existing $ntp_db_leapfile + fi + fi } run_rc_command "$1" Index: projects/clang380-import/gnu/lib/libreadline/readline/Makefile =================================================================== --- projects/clang380-import/gnu/lib/libreadline/readline/Makefile (revision 294776) +++ projects/clang380-import/gnu/lib/libreadline/readline/Makefile (revision 294777) @@ -1,29 +1,29 @@ # $FreeBSD$ LIB= readline INTERNALLIB= yes -NO_MAN= yes +MAN= TILDESRC= tilde.c SRCS= readline.c vi_mode.c funmap.c keymaps.c parens.c search.c \ rltty.c complete.c bind.c isearch.c 
display.c signals.c \ util.c kill.c undo.c macro.c input.c callback.c terminal.c \ text.c nls.c misc.c compat.c xmalloc.c $(HISTSRC) $(TILDESRC) INSTALLED_HEADERS= readline.h chardefs.h keymaps.h history.h tilde.h \ rlstdc.h rlconf.h rltypedefs.h CFLAGS+= -I${.OBJDIR}/.. SRCDIR= ${.CURDIR}/../../../../contrib/libreadline CLEANFILES+= ${INSTALLED_HEADERS} SRCS+= ${INSTALLED_HEADERS} .for _h in ${INSTALLED_HEADERS} ${_h}: ${SRCDIR}/${_h} .NOMETA ln -sf ${.ALLSRC} ${.TARGET} .endfor LIBADD= ncursesw .include Index: projects/clang380-import/gnu/lib =================================================================== --- projects/clang380-import/gnu/lib (revision 294776) +++ projects/clang380-import/gnu/lib (revision 294777) Property changes on: projects/clang380-import/gnu/lib ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/gnu/lib:r294599-294776 Index: projects/clang380-import/lib/Makefile =================================================================== --- projects/clang380-import/lib/Makefile (revision 294776) +++ projects/clang380-import/lib/Makefile (revision 294777) @@ -1,315 +1,316 @@ # @(#)Makefile 8.1 (Berkeley) 6/4/93 # $FreeBSD$ .include # The SUBDIR_ORDERED list is a small set of libraries which are used by many # of the other libraries. These are built first with a .WAIT between them # and the main list to avoid needing a SUBDIR_DEPEND line on every library # naming just these few items. SUBDIR_ORDERED= csu \ .WAIT \ libc \ libc_nonshared \ libcompiler_rt \ ${_libclang_rt} \ ${_libcplusplus} \ ${_libcxxrt} \ libelf \ msun # The main list; please keep these sorted alphabetically. SUBDIR= ${SUBDIR_ORDERED} \ .WAIT \ libalias \ libarchive \ ${_libatm} \ libauditd \ libbegemot \ libblocksruntime \ ${_libbluetooth} \ ${_libbsnmp} \ libbsdstat \ libbsm \ libbz2 \ libcalendar \ libcam \ ${_libcapsicum} \ ${_libcasper} \ ${_libcom_err} \ libcompat \ libcrypt \ libdevctl \ libdevinfo \ libdevstat \ libdpv \ libdwarf \ libedit \ ${_libelftc} \ libevent \ libexecinfo \ libexpat \ libfetch \ libfigpar \ libgeom \ ${_libgpio} \ ${_libgssapi} \ ${_librpcsec_gss} \ ${_libiconv_modules} \ libipsec \ libjail \ libkiconv \ libkvm \ ${_libldns} \ liblzma \ ${_libmagic} \ libmemstat \ libmd \ ${_libmilter} \ ${_libmp} \ libmt \ ${_libnandfs} \ lib80211 \ libnetbsd \ ${_libnetgraph} \ ${_libngatm} \ libnv \ libopenbsd \ libopie \ libpam \ libpcap \ libpjdlog \ ${_libpmc} \ ${_libproc} \ libprocstat \ ${_libradius} \ librpcsvc \ librt \ ${_librtld_db} \ libsbuf \ ${_libsdp} \ ${_libsm} \ libsmb \ ${_libsmdb} \ ${_libsmutil} \ libsqlite3 \ libstand \ libstdbuf \ libstdthreads \ libsysdecode \ libtacplus \ ${_libtelnet} \ ${_libthr} \ libthread_db \ libucl \ libufs \ libugidfw \ libulog \ ${_libunbound} \ ${_libusbhid} \ ${_libusb} \ libutil \ ${_libvgl} \ ${_libvmmapi} \ libwrap \ libxo \ liby \ ${_libypclnt} \ libz \ ncurses \ ${_atf} \ ${_clang} \ ${_cuse} \ ${_tests} # Inter-library dependencies. When the makefile for a library contains LDADD # libraries, those libraries should be listed as build order dependencies here. 
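# For example (hypothetical entry): a library built with LIBADD= sbuf would
# be paired with a build-order line of the form
# SUBDIR_DEPEND_libfoo= libsbuf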
SUBDIR_DEPEND_libarchive= libz libbz2 libexpat liblzma libmd SUBDIR_DEPEND_libatm= libmd SUBDIR_DEPEND_libauditd= libbsm SUBDIR_DEPEND_libbsnmp= ${_libnetgraph} SUBDIR_DEPEND_libc++:= libcxxrt SUBDIR_DEPEND_libc= libcompiler_rt SUBDIR_DEPEND_libcam= libsbuf SUBDIR_DEPEND_libcapsicum= libnv SUBDIR_DEPEND_libcasper= libcapsicum libnv libpjdlog SUBDIR_DEPEND_libdevstat= libkvm SUBDIR_DEPEND_libdpv= libfigpar ncurses libutil SUBDIR_DEPEND_libedit= ncurses SUBDIR_DEPEND_libgeom= libexpat libsbuf SUBDIR_DEPEND_librpcsec_gss= libgssapi SUBDIR_DEPEND_libmagic= libz SUBDIR_DEPEND_libmemstat= libkvm SUBDIR_DEPEND_libopie= libmd SUBDIR_DEPEND_libpam= libcrypt libopie ${_libradius} librpcsvc libtacplus libutil ${_libypclnt} ${_libcom_err} SUBDIR_DEPEND_libpjdlog= libutil SUBDIR_DEPEND_libprocstat= libkvm libutil SUBDIR_DEPEND_libradius= libmd SUBDIR_DEPEND_libsmb= libkiconv SUBDIR_DEPEND_libtacplus= libmd SUBDIR_DEPEND_libulog= libmd SUBDIR_DEPEND_libunbound= ${_libldns} SUBDIR_DEPEND_liblzma= ${_libthr} # NB: keep these sorted by MK_* knobs .if ${MK_ATM} != "no" _libngatm= libngatm .endif .if ${MK_BLUETOOTH} != "no" _libbluetooth= libbluetooth _libsdp= libsdp .endif .if ${MK_BSNMP} != "no" _libbsnmp= libbsnmp .endif .if ${MK_CASPER} != "no" _libcapsicum= libcapsicum _libcasper= libcasper .endif .if ${MK_CLANG} != "no" && !defined(COMPAT_32BIT) _clang= clang .endif .if ${MK_CUSE} != "no" _cuse= libcuse .endif .if ${MK_TOOLCHAIN} != "no" _libelftc= libelftc .endif .if ${MK_FILE} != "no" _libmagic= libmagic .endif .if ${MK_GPIO} != "no" _libgpio= libgpio .endif .if ${MK_GSSAPI} != "no" _libgssapi= libgssapi _librpcsec_gss= librpcsec_gss .endif .if ${MK_ICONV} != "no" _libiconv_modules= libiconv_modules .endif .if ${MK_KERBEROS_SUPPORT} != "no" _libcom_err= libcom_err .endif .if ${MK_LDNS} != "no" _libldns= libldns .endif # The libraries under libclang_rt can only be built by clang, and only make # sense to build when clang is enabled at all. Furthermore, they can only be # built for certain architectures.
.if ${MK_CLANG} != "no" && ${COMPILER_TYPE} == "clang" && \ (${MACHINE_CPUARCH} == "aarch64" || ${MACHINE_CPUARCH} == "amd64" || \ (${MACHINE_CPUARCH} == "arm" && ${MACHINE_ARCH} != "armeb") || \ (${MACHINE_CPUARCH} == "i386")) _libclang_rt= libclang_rt .endif .if ${MK_LIBCPLUSPLUS} != "no" _libcxxrt= libcxxrt _libcplusplus= libc++ .endif .if ${MK_LIBTHR} != "no" _libthr= libthr .endif .if ${MK_NAND} != "no" _libnandfs= libnandfs .endif .if ${MK_NETGRAPH} != "no" _libnetgraph= libnetgraph .endif .if ${MK_NIS} != "no" _libypclnt= libypclnt .endif .if ${MACHINE_CPUARCH} == "i386" || ${MACHINE_CPUARCH} == "amd64" _libvgl= libvgl _libproc= libproc _librtld_db= librtld_db .endif .if ${MACHINE_CPUARCH} == "amd64" .if ${MK_BHYVE} != "no" _libvmmapi= libvmmapi .endif .endif .if ${MACHINE_CPUARCH} == "mips" _libproc= libproc _librtld_db= librtld_db .endif .if ${MACHINE_CPUARCH} == "powerpc" _libproc= libproc _librtld_db= librtld_db .endif -.if ${MACHINE_CPUARCH} == "aarch64" || ${MACHINE_CPUARCH} == "arm" +.if ${MACHINE_CPUARCH} == "aarch64" || ${MACHINE_CPUARCH} == "arm" || \ + ${MACHINE_CPUARCH} == "riscv" _libproc= libproc _librtld_db= librtld_db .endif .if ${MK_OPENSSL} != "no" _libmp= libmp .endif .if ${MK_PMC} != "no" _libpmc= libpmc .endif .if ${MK_RADIUS_SUPPORT} != "no" _libradius= libradius .endif .if ${MK_SENDMAIL} != "no" _libmilter= libmilter _libsm= libsm _libsmdb= libsmdb _libsmutil= libsmutil .endif .if ${MK_TELNET} != "no" _libtelnet= libtelnet .endif .if ${MK_TESTS_SUPPORT} != "no" _atf= atf .endif .if ${MK_TESTS} != "no" _tests= tests .endif .if ${MK_UNBOUND} != "no" _libunbound= libunbound .endif .if ${MK_USB} != "no" _libusbhid= libusbhid _libusb= libusb .endif .if !make(install) SUBDIR_PARALLEL= .endif .include Index: projects/clang380-import/lib/libc/Makefile =================================================================== --- projects/clang380-import/lib/libc/Makefile (revision 294776) +++ projects/clang380-import/lib/libc/Makefile (revision 294777) @@ -1,196 +1,196 @@ # @(#)Makefile 8.2 (Berkeley) 2/3/94 # $FreeBSD$ SHLIBDIR?= /lib .include LIBC_SRCTOP?= ${.CURDIR} # Pick the current architecture directory for libc. In general, this is # named MACHINE_CPUARCH, but some ABIs are different enough to require # their own libc, so allow a directory named MACHINE_ARCH to override this. .if exists(${LIBC_SRCTOP}/${MACHINE_ARCH}) LIBC_ARCH=${MACHINE_ARCH} .else LIBC_ARCH=${MACHINE_CPUARCH} .endif # All library objects contain FreeBSD revision strings by default; they may be # excluded as a space-saving measure. To produce a library that does # not contain these strings, add -DSTRIP_FBSDID (see ) to CFLAGS # below. Note: there are no IDs for syscall stubs whose sources are generated. # To include legacy CSRG sccsid strings, add -DLIBC_SCCS and -DSYSLIBC_SCCS # to CFLAGS below. -DSYSLIBC_SCCS affects just the system call stubs. LIB=c SHLIB_MAJOR= 7 SHLIB_LDSCRIPT=libc.ldscript SHLIB_LDSCRIPT_LINKS=libxnet.so WARNS?= 2 CFLAGS+=-I${LIBC_SRCTOP}/include -I${LIBC_SRCTOP}/../../include CFLAGS+=-I${LIBC_SRCTOP}/${LIBC_ARCH} .if ${MK_NLS} != "no" CFLAGS+=-DNLS .endif CLEANFILES+=tags INSTALL_PIC_ARCHIVE= PRECIOUSLIB= .ifndef NO_THREAD_STACK_UNWIND CANCELPOINTS_CFLAGS=-fexceptions CFLAGS+=${CANCELPOINTS_CFLAGS} .endif # # Link with static libcompiler_rt.a. 
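# (-nodefaultlibs keeps the linker from pulling in a default runtime
# library; the compiler_rt archive added below, plus ssp_nonshared when
# SSP is enabled, supplies those symbols instead.)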
# LDFLAGS+= -nodefaultlibs LIBADD+= compiler_rt .if ${MK_SSP} != "no" LIBADD+= ssp_nonshared .endif # Extras that live in either libc.a or libc_nonshared.a LIBC_NONSHARED_SRCS= # Define (empty) variables so that make doesn't give substitution # errors if the included makefiles don't change these: MDSRCS= MISRCS= MDASM= MIASM= NOASM= .include "${LIBC_SRCTOP}/${LIBC_ARCH}/Makefile.inc" .include "${LIBC_SRCTOP}/db/Makefile.inc" .include "${LIBC_SRCTOP}/compat-43/Makefile.inc" .include "${LIBC_SRCTOP}/gdtoa/Makefile.inc" .include "${LIBC_SRCTOP}/gen/Makefile.inc" .include "${LIBC_SRCTOP}/gmon/Makefile.inc" .if ${MK_ICONV} != "no" .include "${LIBC_SRCTOP}/iconv/Makefile.inc" .endif .include "${LIBC_SRCTOP}/inet/Makefile.inc" .include "${LIBC_SRCTOP}/isc/Makefile.inc" .include "${LIBC_SRCTOP}/locale/Makefile.inc" .include "${LIBC_SRCTOP}/md/Makefile.inc" .include "${LIBC_SRCTOP}/nameser/Makefile.inc" .include "${LIBC_SRCTOP}/net/Makefile.inc" .include "${LIBC_SRCTOP}/nls/Makefile.inc" .include "${LIBC_SRCTOP}/posix1e/Makefile.inc" .if ${LIBC_ARCH} != "aarch64" && \ ${LIBC_ARCH} != "amd64" && \ ${LIBC_ARCH} != "powerpc64" && \ ${LIBC_ARCH} != "riscv" && \ ${LIBC_ARCH} != "sparc64" && \ ${MACHINE_ARCH:Mmipsn32*} == "" && \ ${MACHINE_ARCH:Mmips64*} == "" .include "${LIBC_SRCTOP}/quad/Makefile.inc" .endif .include "${LIBC_SRCTOP}/regex/Makefile.inc" .include "${LIBC_SRCTOP}/resolv/Makefile.inc" .include "${LIBC_SRCTOP}/stdio/Makefile.inc" .include "${LIBC_SRCTOP}/stdlib/Makefile.inc" .include "${LIBC_SRCTOP}/stdlib/jemalloc/Makefile.inc" .include "${LIBC_SRCTOP}/stdtime/Makefile.inc" .include "${LIBC_SRCTOP}/string/Makefile.inc" .include "${LIBC_SRCTOP}/sys/Makefile.inc" .include "${LIBC_SRCTOP}/secure/Makefile.inc" .include "${LIBC_SRCTOP}/rpc/Makefile.inc" .include "${LIBC_SRCTOP}/uuid/Makefile.inc" .include "${LIBC_SRCTOP}/xdr/Makefile.inc" .if (${LIBC_ARCH} == "arm" && ${MACHINE_ARCH} != "armv6hf") ||\ ${LIBC_ARCH} == "mips" .include "${LIBC_SRCTOP}/softfloat/Makefile.inc" .endif .if ${MK_NIS} != "no" CFLAGS+= -DYP .include "${LIBC_SRCTOP}/yp/Makefile.inc" .endif .include "${LIBC_SRCTOP}/capability/Makefile.inc" .if ${MK_HESIOD} != "no" CFLAGS+= -DHESIOD .endif .if ${MK_FP_LIBC} == "no" CFLAGS+= -DNO_FLOATING_POINT .endif .if ${MK_NS_CACHING} != "no" CFLAGS+= -DNS_CACHING .endif .if defined(_FREEFALL_CONFIG) CFLAGS+=-D_FREEFALL_CONFIG .endif STATICOBJS+=${LIBC_NONSHARED_SRCS:S/.c$/.o/} VERSION_DEF=${LIBC_SRCTOP}/Versions.def SYMBOL_MAPS=${SYM_MAPS} CFLAGS+= -DSYMBOL_VERSIONING # If there are no machine dependent sources, append all the # machine-independent sources: .if empty(MDSRCS) SRCS+= ${MISRCS} .else # Append machine-dependent sources, then append machine-independent sources # for which there is no machine-dependent variant. 
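# For instance (hypothetical file set): with MDSRCS holding bcopy.S and
# MISRCS holding bcopy.c strlen.c, the loop below adds only strlen.c,
# since the machine-dependent bcopy.S already provides bcopy.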
SRCS+= ${MDSRCS} .for _src in ${MISRCS} .if ${MDSRCS:R:M${_src:R}} == "" SRCS+= ${_src} .endif .endfor .endif KQSRCS= adddi3.c anddi3.c ashldi3.c ashrdi3.c cmpdi2.c divdi3.c iordi3.c \ lshldi3.c lshrdi3.c moddi3.c muldi3.c negdi2.c notdi2.c qdivrem.c \ subdi3.c ucmpdi2.c udivdi3.c umoddi3.c xordi3.c KSRCS= bcmp.c ffs.c ffsl.c fls.c flsl.c mcount.c strcat.c strchr.c \ strcmp.c strcpy.c strlen.c strncpy.c strrchr.c libkern: libkern.gen libkern.${LIBC_ARCH} libkern.gen: ${KQSRCS} ${KSRCS} ${CP} ${LIBC_SRCTOP}/quad/quad.h ${.ALLSRC} ${DESTDIR}/sys/libkern libkern.${LIBC_ARCH}:: ${KMSRCS} .if defined(KMSRCS) && !empty(KMSRCS) ${CP} ${.ALLSRC} ${DESTDIR}/sys/libkern/${LIBC_ARCH} .endif .if ${MK_TESTS} != "no" SUBDIR+= tests .endif .include .if !defined(_SKIP_BUILD) # We need libutil.h, get it directly to avoid # recording a build dependency -CFLAGS+= -I${.CURDIR:H}/libutil +CFLAGS+= -I${SRCTOP}/lib/libutil # Same issue with libm -MSUN_ARCH_SUBDIR != ${MAKE} -B -C ${.CURDIR:H}/msun -V ARCH_SUBDIR +MSUN_ARCH_SUBDIR != ${MAKE} -B -C ${SRCTOP}/lib/msun -V ARCH_SUBDIR # unfortunately msun/src contains both private and public headers -CFLAGS+= -I${.CURDIR:H}/msun/${MSUN_ARCH_SUBDIR} +CFLAGS+= -I${SRCTOP}/lib/msun/${MSUN_ARCH_SUBDIR} .if ${MACHINE_CPUARCH} == "i386" || ${MACHINE_CPUARCH} == "amd64" -CFLAGS+= -I${.CURDIR:H}/msun/x86 +CFLAGS+= -I${SRCTOP}/lib/msun/x86 .endif -CFLAGS+= -I${.CURDIR:H}/msun/src +CFLAGS+= -I${SRCTOP}/lib/msun/src # and we do not want to record a dependency on msun .if ${.MAKE.LEVEL} > 0 GENDIRDEPS_FILTER+= N${RELDIR:H}/msun .endif .endif # Disable warnings in contributed sources. CWARNFLAGS:= ${.IMPSRC:Ngdtoa_*.c:C/^.+$/${CWARNFLAGS}/:C/^$/-w/} # XXX For now, we don't allow libc to be compiled with # -fstack-protector-all because it breaks rtld. We may want to make a librtld # in the future to circumvent this. SSP_CFLAGS:= ${SSP_CFLAGS:S/^-fstack-protector-all$/-fstack-protector/} # Disable stack protection for SSP symbols. SSP_CFLAGS:= ${.IMPSRC:N*/stack_protector.c:C/^.+$/${SSP_CFLAGS}/} # Generate stack unwinding tables for cancellation points CANCELPOINTS_CFLAGS:= ${.IMPSRC:Mcancelpoints_*:C/^.+$/${CANCELPOINTS_CFLAGS}/:C/^$//} Index: projects/clang380-import/lib/libc/gen/readpassphrase.c =================================================================== --- projects/clang380-import/lib/libc/gen/readpassphrase.c (revision 294776) +++ projects/clang380-import/lib/libc/gen/readpassphrase.c (revision 294777) @@ -1,195 +1,203 @@ /* $OpenBSD: readpassphrase.c,v 1.24 2013/11/24 23:51:29 deraadt Exp $ */ /* * Copyright (c) 2000-2002, 2007, 2010 * Todd C. Miller * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. * * Sponsored in part by the Defense Advanced Research Projects * Agency (DARPA) and Air Force Research Laboratory, Air Force * Materiel Command, USAF, under agreement number F39502-99-1-0512. 
*/ #include __FBSDID("$FreeBSD$"); #include "namespace.h" #include #include #include #include #include #include #include #include #include #include #include "un-namespace.h" #include "libc_private.h" static volatile sig_atomic_t signo[NSIG]; static void handler(int); char * readpassphrase(const char *prompt, char *buf, size_t bufsiz, int flags) { ssize_t nr; - int input, output, save_errno, i, need_restart; + int input, output, save_errno, i, need_restart, input_is_tty; char ch, *p, *end; struct termios term, oterm; struct sigaction sa, savealrm, saveint, savehup, savequit, saveterm; struct sigaction savetstp, savettin, savettou, savepipe; /* I suppose we could alloc on demand in this case (XXX). */ if (bufsiz == 0) { errno = EINVAL; return(NULL); } restart: for (i = 0; i < NSIG; i++) signo[i] = 0; nr = -1; save_errno = 0; need_restart = 0; /* * Read and write to /dev/tty if available. If not, read from * stdin and write to stderr unless a tty is required. */ - if ((flags & RPP_STDIN) || - (input = output = _open(_PATH_TTY, O_RDWR | O_CLOEXEC)) == -1) { - if (flags & RPP_REQUIRE_TTY) { - errno = ENOTTY; - return(NULL); + input_is_tty = 0; + if (!(flags & RPP_STDIN)) { + input = output = _open(_PATH_TTY, O_RDWR | O_CLOEXEC); + if (input == -1) { + if (flags & RPP_REQUIRE_TTY) { + errno = ENOTTY; + return(NULL); + } + input = STDIN_FILENO; + output = STDERR_FILENO; + } else { + input_is_tty = 1; } + } else { input = STDIN_FILENO; output = STDERR_FILENO; } /* * Turn off echo if possible. * If we are using a tty but are not the foreground pgrp this will * generate SIGTTOU, so do it *before* installing the signal handlers. */ - if (input != STDIN_FILENO && tcgetattr(input, &oterm) == 0) { + if (input_is_tty && tcgetattr(input, &oterm) == 0) { memcpy(&term, &oterm, sizeof(term)); if (!(flags & RPP_ECHO_ON)) term.c_lflag &= ~(ECHO | ECHONL); if (term.c_cc[VSTATUS] != _POSIX_VDISABLE) term.c_cc[VSTATUS] = _POSIX_VDISABLE; (void)tcsetattr(input, TCSAFLUSH|TCSASOFT, &term); } else { memset(&term, 0, sizeof(term)); term.c_lflag |= ECHO; memset(&oterm, 0, sizeof(oterm)); oterm.c_lflag |= ECHO; } /* * Catch signals that would otherwise cause the user to end * up with echo turned off in the shell. Don't worry about * things like SIGXCPU and SIGVTALRM for now. */ sigemptyset(&sa.sa_mask); sa.sa_flags = 0; /* don't restart system calls */ sa.sa_handler = handler; (void)__libc_sigaction(SIGALRM, &sa, &savealrm); (void)__libc_sigaction(SIGHUP, &sa, &savehup); (void)__libc_sigaction(SIGINT, &sa, &saveint); (void)__libc_sigaction(SIGPIPE, &sa, &savepipe); (void)__libc_sigaction(SIGQUIT, &sa, &savequit); (void)__libc_sigaction(SIGTERM, &sa, &saveterm); (void)__libc_sigaction(SIGTSTP, &sa, &savetstp); (void)__libc_sigaction(SIGTTIN, &sa, &savettin); (void)__libc_sigaction(SIGTTOU, &sa, &savettou); if (!(flags & RPP_STDIN)) (void)_write(output, prompt, strlen(prompt)); end = buf + bufsiz - 1; p = buf; while ((nr = _read(input, &ch, 1)) == 1 && ch != '\n' && ch != '\r') { if (p < end) { if ((flags & RPP_SEVENBIT)) ch &= 0x7f; if (isalpha((unsigned char)ch)) { if ((flags & RPP_FORCELOWER)) ch = (char)tolower((unsigned char)ch); if ((flags & RPP_FORCEUPPER)) ch = (char)toupper((unsigned char)ch); } *p++ = ch; } } *p = '\0'; save_errno = errno; if (!(term.c_lflag & ECHO)) (void)_write(output, "\n", 1); /* Restore old terminal settings and signals. 
*/ if (memcmp(&term, &oterm, sizeof(term)) != 0) { while (tcsetattr(input, TCSAFLUSH|TCSASOFT, &oterm) == -1 && errno == EINTR && !signo[SIGTTOU]) continue; } (void)__libc_sigaction(SIGALRM, &savealrm, NULL); (void)__libc_sigaction(SIGHUP, &savehup, NULL); (void)__libc_sigaction(SIGINT, &saveint, NULL); (void)__libc_sigaction(SIGQUIT, &savequit, NULL); (void)__libc_sigaction(SIGPIPE, &savepipe, NULL); (void)__libc_sigaction(SIGTERM, &saveterm, NULL); (void)__libc_sigaction(SIGTSTP, &savetstp, NULL); (void)__libc_sigaction(SIGTTIN, &savettin, NULL); (void)__libc_sigaction(SIGTTOU, &savettou, NULL); - if (input != STDIN_FILENO) + if (input_is_tty) (void)_close(input); /* * If we were interrupted by a signal, resend it to ourselves * now that we have restored the signal handlers. */ for (i = 0; i < NSIG; i++) { if (signo[i]) { kill(getpid(), i); switch (i) { case SIGTSTP: case SIGTTIN: case SIGTTOU: need_restart = 1; } } } if (need_restart) goto restart; if (save_errno) errno = save_errno; return(nr == -1 ? NULL : buf); } char * getpass(const char *prompt) { static char buf[_PASSWORD_LEN + 1]; if (readpassphrase(prompt, buf, sizeof(buf), RPP_ECHO_OFF) == NULL) buf[0] = '\0'; return(buf); } static void handler(int s) { signo[s] = 1; } Index: projects/clang380-import/lib/libc/net/sctp_sys_calls.c =================================================================== --- projects/clang380-import/lib/libc/net/sctp_sys_calls.c (revision 294776) +++ projects/clang380-import/lib/libc/net/sctp_sys_calls.c (revision 294777) @@ -1,1191 +1,1203 @@ /*- * Copyright (c) 2001-2007, by Cisco Systems, Inc. All rights reserved. * Copyright (c) 2008-2012, by Randall Stewart. All rights reserved. * Copyright (c) 2008-2012, by Michael Tuexen. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * a) Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * b) Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the distribution. * * c) Neither the name of Cisco Systems, Inc. nor the names of its * contributors may be used to endorse or promote products derived * from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF * THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifndef IN6_IS_ADDR_V4MAPPED #define IN6_IS_ADDR_V4MAPPED(a) \ ((*(const uint32_t *)(const void *)(&(a)->s6_addr[0]) == 0) && \ (*(const uint32_t *)(const void *)(&(a)->s6_addr[4]) == 0) && \ (*(const uint32_t *)(const void *)(&(a)->s6_addr[8]) == ntohl(0x0000ffff))) #endif #define SCTP_CONTROL_VEC_SIZE_RCV 16384 static void in6_sin6_2_sin(struct sockaddr_in *sin, struct sockaddr_in6 *sin6) { bzero(sin, sizeof(*sin)); sin->sin_len = sizeof(struct sockaddr_in); sin->sin_family = AF_INET; sin->sin_port = sin6->sin6_port; sin->sin_addr.s_addr = sin6->sin6_addr.__u6_addr.__u6_addr32[3]; } int sctp_getaddrlen(sa_family_t family) { int ret, sd; socklen_t siz; struct sctp_assoc_value av; av.assoc_value = family; siz = sizeof(av); #if defined(AF_INET) sd = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP); #elif defined(AF_INET6) sd = socket(AF_INET6, SOCK_SEQPACKET, IPPROTO_SCTP); #else sd = -1; #endif if (sd == -1) { return (-1); } ret = getsockopt(sd, IPPROTO_SCTP, SCTP_GET_ADDR_LEN, &av, &siz); close(sd); if (ret == 0) { return ((int)av.assoc_value); } else { return (-1); } } int sctp_connectx(int sd, const struct sockaddr *addrs, int addrcnt, sctp_assoc_t * id) { char *buf; int i, ret, *aa; char *cpto; const struct sockaddr *at; size_t len; /* validate the address count and list */ if ((addrs == NULL) || (addrcnt <= 0)) { errno = EINVAL; return (-1); } if ((buf = malloc(sizeof(int) + (size_t)addrcnt * sizeof(struct sockaddr_in6))) == NULL) { errno = E2BIG; return (-1); } len = sizeof(int); at = addrs; cpto = buf + sizeof(int); /* validate all the addresses and get the size */ for (i = 0; i < addrcnt; i++) { switch (at->sa_family) { case AF_INET: if (at->sa_len != sizeof(struct sockaddr_in)) { free(buf); errno = EINVAL; return (-1); } memcpy(cpto, at, sizeof(struct sockaddr_in)); cpto = ((caddr_t)cpto + sizeof(struct sockaddr_in)); len += sizeof(struct sockaddr_in); break; case AF_INET6: if (at->sa_len != sizeof(struct sockaddr_in6)) { free(buf); errno = EINVAL; return (-1); } if (IN6_IS_ADDR_V4MAPPED(&((struct sockaddr_in6 *)at)->sin6_addr)) { in6_sin6_2_sin((struct sockaddr_in *)cpto, (struct sockaddr_in6 *)at); cpto = ((caddr_t)cpto + sizeof(struct sockaddr_in)); len += sizeof(struct sockaddr_in); } else { memcpy(cpto, at, sizeof(struct sockaddr_in6)); cpto = ((caddr_t)cpto + sizeof(struct sockaddr_in6)); len += sizeof(struct sockaddr_in6); } break; default: free(buf); errno = EINVAL; return (-1); } at = (struct sockaddr *)((caddr_t)at + at->sa_len); } aa = (int *)buf; *aa = addrcnt; ret = setsockopt(sd, IPPROTO_SCTP, SCTP_CONNECT_X, (void *)buf, (socklen_t) len); if ((ret == 0) && (id != NULL)) { *id = *(sctp_assoc_t *) buf; } free(buf); return (ret); } int sctp_bindx(int sd, struct sockaddr *addrs, int addrcnt, int flags) { struct sctp_getaddresses *gaddrs; struct sockaddr *sa; struct sockaddr_in *sin; struct sockaddr_in6 *sin6; int i; size_t argsz; uint16_t sport = 0; /* validate the flags */ if ((flags != SCTP_BINDX_ADD_ADDR) && (flags != SCTP_BINDX_REM_ADDR)) { errno = EFAULT; return (-1); } /* validate the address count and list */ if ((addrcnt <= 0) || (addrs == NULL)) { errno = EINVAL; return (-1); } /* First pre-screen the addresses */ sa = addrs; for (i = 0; i < addrcnt; i++) { switch (sa->sa_family) { case AF_INET: if (sa->sa_len != sizeof(struct sockaddr_in)) { errno = EINVAL; return (-1); } sin = (struct sockaddr_in *)sa; 
if (sin->sin_port) { /* non-zero port, check or save */ if (sport) { /* Check against our port */ if (sport != sin->sin_port) { errno = EINVAL; return (-1); } } else { /* save off the port */ sport = sin->sin_port; } } break; case AF_INET6: if (sa->sa_len != sizeof(struct sockaddr_in6)) { errno = EINVAL; return (-1); } sin6 = (struct sockaddr_in6 *)sa; if (sin6->sin6_port) { /* non-zero port, check or save */ if (sport) { /* Check against our port */ if (sport != sin6->sin6_port) { errno = EINVAL; return (-1); } } else { /* save off the port */ sport = sin6->sin6_port; } } break; default: /* Invalid address family specified. */ errno = EAFNOSUPPORT; return (-1); } sa = (struct sockaddr *)((caddr_t)sa + sa->sa_len); } argsz = sizeof(struct sctp_getaddresses) + sizeof(struct sockaddr_storage); if ((gaddrs = (struct sctp_getaddresses *)malloc(argsz)) == NULL) { errno = ENOMEM; return (-1); } sa = addrs; for (i = 0; i < addrcnt; i++) { memset(gaddrs, 0, argsz); gaddrs->sget_assoc_id = 0; memcpy(gaddrs->addr, sa, sa->sa_len); /* * Now, if there was a port mentioned, assure that the first * address has that port to make sure it fails or succeeds * correctly. */ if ((i == 0) && (sport != 0)) { switch (gaddrs->addr->sa_family) { case AF_INET: sin = (struct sockaddr_in *)gaddrs->addr; sin->sin_port = sport; break; case AF_INET6: sin6 = (struct sockaddr_in6 *)gaddrs->addr; sin6->sin6_port = sport; break; } } if (setsockopt(sd, IPPROTO_SCTP, flags, gaddrs, (socklen_t) argsz) != 0) { free(gaddrs); return (-1); } sa = (struct sockaddr *)((caddr_t)sa + sa->sa_len); } free(gaddrs); return (0); } int sctp_opt_info(int sd, sctp_assoc_t id, int opt, void *arg, socklen_t * size) { if (arg == NULL) { errno = EINVAL; return (-1); } if ((id == SCTP_CURRENT_ASSOC) || (id == SCTP_ALL_ASSOC)) { errno = EINVAL; return (-1); } switch (opt) { case SCTP_RTOINFO: ((struct sctp_rtoinfo *)arg)->srto_assoc_id = id; break; case SCTP_ASSOCINFO: ((struct sctp_assocparams *)arg)->sasoc_assoc_id = id; break; case SCTP_DEFAULT_SEND_PARAM: ((struct sctp_assocparams *)arg)->sasoc_assoc_id = id; break; case SCTP_PRIMARY_ADDR: ((struct sctp_setprim *)arg)->ssp_assoc_id = id; break; case SCTP_PEER_ADDR_PARAMS: ((struct sctp_paddrparams *)arg)->spp_assoc_id = id; break; case SCTP_MAXSEG: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_AUTH_KEY: ((struct sctp_authkey *)arg)->sca_assoc_id = id; break; case SCTP_AUTH_ACTIVE_KEY: ((struct sctp_authkeyid *)arg)->scact_assoc_id = id; break; case SCTP_DELAYED_SACK: ((struct sctp_sack_info *)arg)->sack_assoc_id = id; break; case SCTP_CONTEXT: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_STATUS: ((struct sctp_status *)arg)->sstat_assoc_id = id; break; case SCTP_GET_PEER_ADDR_INFO: ((struct sctp_paddrinfo *)arg)->spinfo_assoc_id = id; break; case SCTP_PEER_AUTH_CHUNKS: ((struct sctp_authchunks *)arg)->gauth_assoc_id = id; break; case SCTP_LOCAL_AUTH_CHUNKS: ((struct sctp_authchunks *)arg)->gauth_assoc_id = id; break; case SCTP_TIMEOUTS: ((struct sctp_timeouts *)arg)->stimo_assoc_id = id; break; case SCTP_EVENT: ((struct sctp_event *)arg)->se_assoc_id = id; break; case SCTP_DEFAULT_SNDINFO: ((struct sctp_sndinfo *)arg)->snd_assoc_id = id; break; case SCTP_DEFAULT_PRINFO: ((struct sctp_default_prinfo *)arg)->pr_assoc_id = id; break; case SCTP_PEER_ADDR_THLDS: ((struct sctp_paddrthlds *)arg)->spt_assoc_id = id; break; case SCTP_REMOTE_UDP_ENCAPS_PORT: ((struct sctp_udpencaps *)arg)->sue_assoc_id = id; break; case SCTP_ECN_SUPPORTED: ((struct sctp_assoc_value 
*)arg)->assoc_id = id; break; case SCTP_PR_SUPPORTED: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_AUTH_SUPPORTED: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_ASCONF_SUPPORTED: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_RECONFIG_SUPPORTED: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_NRSACK_SUPPORTED: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_PKTDROP_SUPPORTED: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_MAX_BURST: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_ENABLE_STREAM_RESET: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; case SCTP_PR_STREAM_STATUS: ((struct sctp_prstatus *)arg)->sprstat_assoc_id = id; break; case SCTP_PR_ASSOC_STATUS: ((struct sctp_prstatus *)arg)->sprstat_assoc_id = id; break; case SCTP_MAX_CWND: ((struct sctp_assoc_value *)arg)->assoc_id = id; break; default: break; } return (getsockopt(sd, IPPROTO_SCTP, opt, arg, size)); } int sctp_getpaddrs(int sd, sctp_assoc_t id, struct sockaddr **raddrs) { struct sctp_getaddresses *addrs; struct sockaddr *sa; sctp_assoc_t asoc; caddr_t lim; socklen_t opt_len; int cnt; if (raddrs == NULL) { errno = EFAULT; return (-1); } asoc = id; opt_len = (socklen_t) sizeof(sctp_assoc_t); if (getsockopt(sd, IPPROTO_SCTP, SCTP_GET_REMOTE_ADDR_SIZE, &asoc, &opt_len) != 0) { return (-1); } /* size required is returned in 'asoc' */ opt_len = (socklen_t) ((size_t)asoc + sizeof(struct sctp_getaddresses)); addrs = calloc(1, (size_t)opt_len); if (addrs == NULL) { errno = ENOMEM; return (-1); } addrs->sget_assoc_id = id; /* Now lets get the array of addresses */ if (getsockopt(sd, IPPROTO_SCTP, SCTP_GET_PEER_ADDRESSES, addrs, &opt_len) != 0) { free(addrs); return (-1); } *raddrs = (struct sockaddr *)&addrs->addr[0]; cnt = 0; sa = (struct sockaddr *)&addrs->addr[0]; lim = (caddr_t)addrs + opt_len; while (((caddr_t)sa < lim) && (sa->sa_len > 0)) { sa = (struct sockaddr *)((caddr_t)sa + sa->sa_len); cnt++; } return (cnt); } void sctp_freepaddrs(struct sockaddr *addrs) { void *fr_addr; /* Take away the hidden association id */ fr_addr = (void *)((caddr_t)addrs - sizeof(sctp_assoc_t)); /* Now free it */ free(fr_addr); } int sctp_getladdrs(int sd, sctp_assoc_t id, struct sockaddr **raddrs) { struct sctp_getaddresses *addrs; caddr_t lim; struct sockaddr *sa; size_t size_of_addresses; socklen_t opt_len; int cnt; if (raddrs == NULL) { errno = EFAULT; return (-1); } size_of_addresses = 0; opt_len = (socklen_t) sizeof(int); if (getsockopt(sd, IPPROTO_SCTP, SCTP_GET_LOCAL_ADDR_SIZE, &size_of_addresses, &opt_len) != 0) { errno = ENOMEM; return (-1); } if (size_of_addresses == 0) { errno = ENOTCONN; return (-1); } opt_len = (socklen_t) (size_of_addresses + sizeof(struct sockaddr_storage) + sizeof(struct sctp_getaddresses)); addrs = calloc(1, (size_t)opt_len); if (addrs == NULL) { errno = ENOMEM; return (-1); } addrs->sget_assoc_id = id; /* Now lets get the array of addresses */ if (getsockopt(sd, IPPROTO_SCTP, SCTP_GET_LOCAL_ADDRESSES, addrs, &opt_len) != 0) { free(addrs); errno = ENOMEM; return (-1); } *raddrs = (struct sockaddr *)&addrs->addr[0]; cnt = 0; sa = (struct sockaddr *)&addrs->addr[0]; lim = (caddr_t)addrs + opt_len; while (((caddr_t)sa < lim) && (sa->sa_len > 0)) { sa = (struct sockaddr *)((caddr_t)sa + sa->sa_len); cnt++; } return (cnt); } void sctp_freeladdrs(struct sockaddr *addrs) { void *fr_addr; /* Take away the hidden association id */ fr_addr = (void *)((caddr_t)addrs - 
sizeof(sctp_assoc_t)); /* Now free it */ free(fr_addr); } ssize_t sctp_sendmsg(int s, const void *data, size_t len, const struct sockaddr *to, socklen_t tolen, uint32_t ppid, uint32_t flags, uint16_t stream_no, uint32_t timetolive, uint32_t context) { #ifdef SYS_sctp_generic_sendmsg struct sctp_sndrcvinfo sinfo; memset(&sinfo, 0, sizeof(struct sctp_sndrcvinfo)); sinfo.sinfo_ppid = ppid; sinfo.sinfo_flags = flags; sinfo.sinfo_stream = stream_no; sinfo.sinfo_timetolive = timetolive; sinfo.sinfo_context = context; sinfo.sinfo_assoc_id = 0; return (syscall(SYS_sctp_generic_sendmsg, s, data, len, to, tolen, &sinfo, 0)); #else struct msghdr msg; struct sctp_sndrcvinfo *sinfo; struct iovec iov; char cmsgbuf[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))]; struct cmsghdr *cmsg; struct sockaddr *who = NULL; union { struct sockaddr_in in; struct sockaddr_in6 in6; } addr; if ((tolen > 0) && ((to == NULL) || (tolen < sizeof(struct sockaddr)))) { errno = EINVAL; return (-1); } if ((to != NULL) && (tolen > 0)) { switch (to->sa_family) { case AF_INET: if (tolen != sizeof(struct sockaddr_in)) { errno = EINVAL; return (-1); } if ((to->sa_len > 0) && (to->sa_len != sizeof(struct sockaddr_in))) { errno = EINVAL; return (-1); } memcpy(&addr, to, sizeof(struct sockaddr_in)); addr.in.sin_len = sizeof(struct sockaddr_in); break; case AF_INET6: if (tolen != sizeof(struct sockaddr_in6)) { errno = EINVAL; return (-1); } if ((to->sa_len > 0) && (to->sa_len != sizeof(struct sockaddr_in6))) { errno = EINVAL; return (-1); } memcpy(&addr, to, sizeof(struct sockaddr_in6)); addr.in6.sin6_len = sizeof(struct sockaddr_in6); break; default: errno = EAFNOSUPPORT; return (-1); } who = (struct sockaddr *)&addr; } iov.iov_base = (char *)data; iov.iov_len = len; if (who) { msg.msg_name = (caddr_t)who; msg.msg_namelen = who->sa_len; } else { msg.msg_name = (caddr_t)NULL; msg.msg_namelen = 0; } msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = cmsgbuf; msg.msg_controllen = CMSG_SPACE(sizeof(struct sctp_sndrcvinfo)); msg.msg_flags = 0; cmsg = (struct cmsghdr *)cmsgbuf; cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_SNDRCV; cmsg->cmsg_len = CMSG_LEN(sizeof(struct sctp_sndrcvinfo)); sinfo = (struct sctp_sndrcvinfo *)CMSG_DATA(cmsg); memset(sinfo, 0, sizeof(struct sctp_sndrcvinfo)); sinfo->sinfo_stream = stream_no; sinfo->sinfo_ssn = 0; sinfo->sinfo_flags = flags; sinfo->sinfo_ppid = ppid; sinfo->sinfo_context = context; sinfo->sinfo_assoc_id = 0; sinfo->sinfo_timetolive = timetolive; return (sendmsg(s, &msg, 0)); #endif } sctp_assoc_t sctp_getassocid(int sd, struct sockaddr *sa) { struct sctp_paddrinfo sp; socklen_t siz; /* First get the assoc id */ siz = sizeof(sp); memset(&sp, 0, sizeof(sp)); memcpy((caddr_t)&sp.spinfo_address, sa, sa->sa_len); if (getsockopt(sd, IPPROTO_SCTP, SCTP_GET_PEER_ADDR_INFO, &sp, &siz) != 0) { /* We depend on the fact that 0 can never be returned */ return ((sctp_assoc_t) 0); } return (sp.spinfo_assoc_id); } ssize_t sctp_send(int sd, const void *data, size_t len, const struct sctp_sndrcvinfo *sinfo, int flags) { #ifdef SYS_sctp_generic_sendmsg struct sockaddr *to = NULL; return (syscall(SYS_sctp_generic_sendmsg, sd, data, len, to, 0, sinfo, flags)); #else struct msghdr msg; struct iovec iov; char cmsgbuf[CMSG_SPACE(sizeof(struct sctp_sndrcvinfo))]; struct cmsghdr *cmsg; if (sinfo == NULL) { errno = EINVAL; return (-1); } iov.iov_base = (char *)data; iov.iov_len = len; msg.msg_name = NULL; msg.msg_namelen = 0; msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = cmsgbuf; msg.msg_controllen 
= CMSG_SPACE(sizeof(struct sctp_sndrcvinfo)); msg.msg_flags = 0; cmsg = (struct cmsghdr *)cmsgbuf; cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_SNDRCV; cmsg->cmsg_len = CMSG_LEN(sizeof(struct sctp_sndrcvinfo)); memcpy(CMSG_DATA(cmsg), sinfo, sizeof(struct sctp_sndrcvinfo)); return (sendmsg(sd, &msg, flags)); #endif } ssize_t sctp_sendx(int sd, const void *msg, size_t msg_len, struct sockaddr *addrs, int addrcnt, struct sctp_sndrcvinfo *sinfo, int flags) { struct sctp_sndrcvinfo __sinfo; ssize_t ret; int i, cnt, *aa, saved_errno; char *buf; int no_end_cx = 0; size_t len, add_len; struct sockaddr *at; if (addrs == NULL) { errno = EINVAL; return (-1); } #ifdef SYS_sctp_generic_sendmsg if (addrcnt == 1) { socklen_t l; + ssize_t ret; /* * Quick way, we don't need to do a connectx so lets use the * syscall directly. */ l = addrs->sa_len; - return (syscall(SYS_sctp_generic_sendmsg, sd, - msg, msg_len, addrs, l, sinfo, flags)); + ret = syscall(SYS_sctp_generic_sendmsg, sd, + msg, msg_len, addrs, l, sinfo, flags); + if ((ret >= 0) && (sinfo != NULL)) { + sinfo->sinfo_assoc_id = sctp_getassocid(sd, addrs); + } + return (ret); } #endif len = sizeof(int); at = addrs; cnt = 0; /* validate all the addresses and get the size */ for (i = 0; i < addrcnt; i++) { if (at->sa_family == AF_INET) { add_len = sizeof(struct sockaddr_in); } else if (at->sa_family == AF_INET6) { add_len = sizeof(struct sockaddr_in6); } else { errno = EINVAL; return (-1); } len += add_len; at = (struct sockaddr *)((caddr_t)at + add_len); cnt++; } /* do we have any? */ if (cnt == 0) { errno = EINVAL; return (-1); } buf = malloc(len); if (buf == NULL) { errno = ENOMEM; return (-1); } aa = (int *)buf; *aa = cnt; aa++; memcpy((caddr_t)aa, addrs, (size_t)(len - sizeof(int))); ret = setsockopt(sd, IPPROTO_SCTP, SCTP_CONNECT_X_DELAYED, (void *)buf, (socklen_t) len); free(buf); if (ret != 0) { if (errno == EALREADY) { no_end_cx = 1; goto continue_send; } return (ret); } continue_send: if (sinfo == NULL) { sinfo = &__sinfo; memset(&__sinfo, 0, sizeof(__sinfo)); } sinfo->sinfo_assoc_id = sctp_getassocid(sd, addrs); if (sinfo->sinfo_assoc_id == 0) { (void)setsockopt(sd, IPPROTO_SCTP, SCTP_CONNECT_X_COMPLETE, (void *)addrs, (socklen_t) addrs->sa_len); errno = ENOENT; return (-1); } ret = sctp_send(sd, msg, msg_len, sinfo, flags); saved_errno = errno; if (no_end_cx == 0) (void)setsockopt(sd, IPPROTO_SCTP, SCTP_CONNECT_X_COMPLETE, (void *)addrs, (socklen_t) addrs->sa_len); errno = saved_errno; return (ret); } ssize_t sctp_sendmsgx(int sd, const void *msg, size_t len, struct sockaddr *addrs, int addrcnt, uint32_t ppid, uint32_t flags, uint16_t stream_no, uint32_t timetolive, uint32_t context) { struct sctp_sndrcvinfo sinfo; memset((void *)&sinfo, 0, sizeof(struct sctp_sndrcvinfo)); sinfo.sinfo_ppid = ppid; sinfo.sinfo_flags = flags; sinfo.sinfo_ssn = stream_no; sinfo.sinfo_timetolive = timetolive; sinfo.sinfo_context = context; return (sctp_sendx(sd, msg, len, addrs, addrcnt, &sinfo, 0)); } ssize_t sctp_recvmsg(int s, void *dbuf, size_t len, struct sockaddr *from, socklen_t * fromlen, struct sctp_sndrcvinfo *sinfo, int *msg_flags) { #ifdef SYS_sctp_generic_recvmsg struct iovec iov; iov.iov_base = dbuf; iov.iov_len = len; return (syscall(SYS_sctp_generic_recvmsg, s, &iov, 1, from, fromlen, sinfo, msg_flags)); #else ssize_t sz; struct msghdr msg; struct iovec iov; char cmsgbuf[SCTP_CONTROL_VEC_SIZE_RCV]; struct cmsghdr *cmsg; if (msg_flags == NULL) { errno = EINVAL; return (-1); } iov.iov_base = dbuf; iov.iov_len = len; msg.msg_name = 
(caddr_t)from; if (fromlen == NULL) msg.msg_namelen = 0; else msg.msg_namelen = *fromlen; msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = cmsgbuf; msg.msg_controllen = sizeof(cmsgbuf); msg.msg_flags = 0; sz = recvmsg(s, &msg, *msg_flags); *msg_flags = msg.msg_flags; if (sz <= 0) { return (sz); } if (sinfo) { sinfo->sinfo_assoc_id = 0; } if ((msg.msg_controllen > 0) && (sinfo != NULL)) { /* * parse through and see if we find the sctp_sndrcvinfo (if * the user wants it). */ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) { if (cmsg->cmsg_level != IPPROTO_SCTP) { continue; } if (cmsg->cmsg_type == SCTP_SNDRCV) { memcpy(sinfo, CMSG_DATA(cmsg), sizeof(struct sctp_sndrcvinfo)); break; } if (cmsg->cmsg_type == SCTP_EXTRCV) { /* * Let's hope that the user provided enough * enough memory. At least he asked for more * information. */ memcpy(sinfo, CMSG_DATA(cmsg), sizeof(struct sctp_extrcvinfo)); break; } } } return (sz); #endif } ssize_t sctp_recvv(int sd, const struct iovec *iov, int iovlen, struct sockaddr *from, socklen_t * fromlen, void *info, socklen_t * infolen, unsigned int *infotype, int *flags) { char cmsgbuf[SCTP_CONTROL_VEC_SIZE_RCV]; struct msghdr msg; struct cmsghdr *cmsg; ssize_t ret; struct sctp_rcvinfo *rcvinfo; struct sctp_nxtinfo *nxtinfo; if (((info != NULL) && (infolen == NULL)) || ((info == NULL) && (infolen != NULL) && (*infolen != 0)) || ((info != NULL) && (infotype == NULL))) { errno = EINVAL; return (-1); } if (infotype) { *infotype = SCTP_RECVV_NOINFO; } msg.msg_name = from; if (fromlen == NULL) { msg.msg_namelen = 0; } else { msg.msg_namelen = *fromlen; } msg.msg_iov = (struct iovec *)iov; msg.msg_iovlen = iovlen; msg.msg_control = cmsgbuf; msg.msg_controllen = sizeof(cmsgbuf); msg.msg_flags = 0; ret = recvmsg(sd, &msg, *flags); *flags = msg.msg_flags; if ((ret > 0) && (msg.msg_controllen > 0) && (infotype != NULL) && (infolen != NULL) && (*infolen > 0)) { rcvinfo = NULL; nxtinfo = NULL; for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) { if (cmsg->cmsg_level != IPPROTO_SCTP) { continue; } if (cmsg->cmsg_type == SCTP_RCVINFO) { rcvinfo = (struct sctp_rcvinfo *)CMSG_DATA(cmsg); if (nxtinfo != NULL) { break; } else { continue; } } if (cmsg->cmsg_type == SCTP_NXTINFO) { nxtinfo = (struct sctp_nxtinfo *)CMSG_DATA(cmsg); if (rcvinfo != NULL) { break; } else { continue; } } } if (rcvinfo != NULL) { if ((nxtinfo != NULL) && (*infolen >= sizeof(struct sctp_recvv_rn))) { struct sctp_recvv_rn *rn_info; rn_info = (struct sctp_recvv_rn *)info; rn_info->recvv_rcvinfo = *rcvinfo; rn_info->recvv_nxtinfo = *nxtinfo; *infolen = (socklen_t) sizeof(struct sctp_recvv_rn); *infotype = SCTP_RECVV_RN; } else if (*infolen >= sizeof(struct sctp_rcvinfo)) { memcpy(info, rcvinfo, sizeof(struct sctp_rcvinfo)); *infolen = (socklen_t) sizeof(struct sctp_rcvinfo); *infotype = SCTP_RECVV_RCVINFO; } } else if (nxtinfo != NULL) { if (*infolen >= sizeof(struct sctp_nxtinfo)) { memcpy(info, nxtinfo, sizeof(struct sctp_nxtinfo)); *infolen = (socklen_t) sizeof(struct sctp_nxtinfo); *infotype = SCTP_RECVV_NXTINFO; } } } return (ret); } ssize_t sctp_sendv(int sd, const struct iovec *iov, int iovcnt, struct sockaddr *addrs, int addrcnt, void *info, socklen_t infolen, unsigned int infotype, int flags) { ssize_t ret; int i; socklen_t addr_len; struct msghdr msg; in_port_t port; struct sctp_sendv_spa *spa_info; struct cmsghdr *cmsg; char *cmsgbuf; struct sockaddr *addr; struct sockaddr_in *addr_in; struct sockaddr_in6 *addr_in6; + sctp_assoc_t *assoc_id; if 
((addrcnt < 0) || (iovcnt < 0) || ((addrs == NULL) && (addrcnt > 0)) || ((addrs != NULL) && (addrcnt == 0)) || ((iov == NULL) && (iovcnt > 0)) || ((iov != NULL) && (iovcnt == 0))) { errno = EINVAL; return (-1); } cmsgbuf = malloc(CMSG_SPACE(sizeof(struct sctp_sndinfo)) + CMSG_SPACE(sizeof(struct sctp_prinfo)) + CMSG_SPACE(sizeof(struct sctp_authinfo)) + (size_t)addrcnt * CMSG_SPACE(sizeof(struct in6_addr))); if (cmsgbuf == NULL) { errno = ENOMEM; return (-1); } + assoc_id = NULL; msg.msg_control = cmsgbuf; msg.msg_controllen = 0; cmsg = (struct cmsghdr *)cmsgbuf; switch (infotype) { case SCTP_SENDV_NOINFO: if ((infolen != 0) || (info != NULL)) { free(cmsgbuf); errno = EINVAL; return (-1); } break; case SCTP_SENDV_SNDINFO: if ((info == NULL) || (infolen < sizeof(struct sctp_sndinfo))) { free(cmsgbuf); errno = EINVAL; return (-1); } cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_SNDINFO; cmsg->cmsg_len = CMSG_LEN(sizeof(struct sctp_sndinfo)); memcpy(CMSG_DATA(cmsg), info, sizeof(struct sctp_sndinfo)); msg.msg_controllen += CMSG_SPACE(sizeof(struct sctp_sndinfo)); cmsg = (struct cmsghdr *)((caddr_t)cmsg + CMSG_SPACE(sizeof(struct sctp_sndinfo))); + assoc_id = &(((struct sctp_sndinfo *)info)->snd_assoc_id); break; case SCTP_SENDV_PRINFO: if ((info == NULL) || (infolen < sizeof(struct sctp_prinfo))) { free(cmsgbuf); errno = EINVAL; return (-1); } cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_PRINFO; cmsg->cmsg_len = CMSG_LEN(sizeof(struct sctp_prinfo)); memcpy(CMSG_DATA(cmsg), info, sizeof(struct sctp_prinfo)); msg.msg_controllen += CMSG_SPACE(sizeof(struct sctp_prinfo)); cmsg = (struct cmsghdr *)((caddr_t)cmsg + CMSG_SPACE(sizeof(struct sctp_prinfo))); break; case SCTP_SENDV_AUTHINFO: if ((info == NULL) || (infolen < sizeof(struct sctp_authinfo))) { free(cmsgbuf); errno = EINVAL; return (-1); } cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_AUTHINFO; cmsg->cmsg_len = CMSG_LEN(sizeof(struct sctp_authinfo)); memcpy(CMSG_DATA(cmsg), info, sizeof(struct sctp_authinfo)); msg.msg_controllen += CMSG_SPACE(sizeof(struct sctp_authinfo)); cmsg = (struct cmsghdr *)((caddr_t)cmsg + CMSG_SPACE(sizeof(struct sctp_authinfo))); break; case SCTP_SENDV_SPA: if ((info == NULL) || (infolen < sizeof(struct sctp_sendv_spa))) { free(cmsgbuf); errno = EINVAL; return (-1); } spa_info = (struct sctp_sendv_spa *)info; if (spa_info->sendv_flags & SCTP_SEND_SNDINFO_VALID) { cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_SNDINFO; cmsg->cmsg_len = CMSG_LEN(sizeof(struct sctp_sndinfo)); memcpy(CMSG_DATA(cmsg), &spa_info->sendv_sndinfo, sizeof(struct sctp_sndinfo)); msg.msg_controllen += CMSG_SPACE(sizeof(struct sctp_sndinfo)); cmsg = (struct cmsghdr *)((caddr_t)cmsg + CMSG_SPACE(sizeof(struct sctp_sndinfo))); + assoc_id = &(spa_info->sendv_sndinfo.snd_assoc_id); } if (spa_info->sendv_flags & SCTP_SEND_PRINFO_VALID) { cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_PRINFO; cmsg->cmsg_len = CMSG_LEN(sizeof(struct sctp_prinfo)); memcpy(CMSG_DATA(cmsg), &spa_info->sendv_prinfo, sizeof(struct sctp_prinfo)); msg.msg_controllen += CMSG_SPACE(sizeof(struct sctp_prinfo)); cmsg = (struct cmsghdr *)((caddr_t)cmsg + CMSG_SPACE(sizeof(struct sctp_prinfo))); } if (spa_info->sendv_flags & SCTP_SEND_AUTHINFO_VALID) { cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_AUTHINFO; cmsg->cmsg_len = CMSG_LEN(sizeof(struct sctp_authinfo)); memcpy(CMSG_DATA(cmsg), &spa_info->sendv_authinfo, sizeof(struct sctp_authinfo)); msg.msg_controllen += CMSG_SPACE(sizeof(struct sctp_authinfo)); cmsg = (struct 
cmsghdr *)((caddr_t)cmsg + CMSG_SPACE(sizeof(struct sctp_authinfo))); } break; default: free(cmsgbuf); errno = EINVAL; return (-1); } addr = addrs; msg.msg_name = NULL; msg.msg_namelen = 0; for (i = 0; i < addrcnt; i++) { switch (addr->sa_family) { case AF_INET: addr_len = (socklen_t) sizeof(struct sockaddr_in); addr_in = (struct sockaddr_in *)addr; if (addr_in->sin_len != addr_len) { free(cmsgbuf); errno = EINVAL; return (-1); } if (i == 0) { port = addr_in->sin_port; } else { if (port == addr_in->sin_port) { cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_DSTADDRV4; cmsg->cmsg_len = CMSG_LEN(sizeof(struct in_addr)); memcpy(CMSG_DATA(cmsg), &addr_in->sin_addr, sizeof(struct in_addr)); msg.msg_controllen += CMSG_SPACE(sizeof(struct in_addr)); cmsg = (struct cmsghdr *)((caddr_t)cmsg + CMSG_SPACE(sizeof(struct in_addr))); } else { free(cmsgbuf); errno = EINVAL; return (-1); } } break; case AF_INET6: addr_len = (socklen_t) sizeof(struct sockaddr_in6); addr_in6 = (struct sockaddr_in6 *)addr; if (addr_in6->sin6_len != addr_len) { free(cmsgbuf); errno = EINVAL; return (-1); } if (i == 0) { port = addr_in6->sin6_port; } else { if (port == addr_in6->sin6_port) { cmsg->cmsg_level = IPPROTO_SCTP; cmsg->cmsg_type = SCTP_DSTADDRV6; cmsg->cmsg_len = CMSG_LEN(sizeof(struct in6_addr)); memcpy(CMSG_DATA(cmsg), &addr_in6->sin6_addr, sizeof(struct in6_addr)); msg.msg_controllen += CMSG_SPACE(sizeof(struct in6_addr)); cmsg = (struct cmsghdr *)((caddr_t)cmsg + CMSG_SPACE(sizeof(struct in6_addr))); } else { free(cmsgbuf); errno = EINVAL; return (-1); } } break; default: free(cmsgbuf); errno = EINVAL; return (-1); } if (i == 0) { msg.msg_name = addr; msg.msg_namelen = addr_len; } addr = (struct sockaddr *)((caddr_t)addr + addr_len); } if (msg.msg_controllen == 0) { msg.msg_control = NULL; } msg.msg_iov = (struct iovec *)iov; msg.msg_iovlen = iovcnt; msg.msg_flags = 0; ret = sendmsg(sd, &msg, flags); free(cmsgbuf); + if ((ret >= 0) && (addrs != NULL) && (assoc_id != NULL)) { + *assoc_id = sctp_getassocid(sd, addrs); + } return (ret); } #if !defined(SYS_sctp_peeloff) && !defined(HAVE_SCTP_PEELOFF_SOCKOPT) int sctp_peeloff(int sd, sctp_assoc_t assoc_id) { /* NOT supported, return invalid sd */ errno = ENOTSUP; return (-1); } #endif #if defined(SYS_sctp_peeloff) && !defined(HAVE_SCTP_PEELOFF_SOCKOPT) int sctp_peeloff(int sd, sctp_assoc_t assoc_id) { return (syscall(SYS_sctp_peeloff, sd, assoc_id)); } #endif #undef SCTP_CONTROL_VEC_SIZE_RCV Index: projects/clang380-import/lib/libc =================================================================== --- projects/clang380-import/lib/libc (revision 294776) +++ projects/clang380-import/lib/libc (revision 294777) Property changes on: projects/clang380-import/lib/libc ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/lib/libc:r294599-294776 Index: projects/clang380-import/lib/libelftc/Makefile =================================================================== --- projects/clang380-import/lib/libelftc/Makefile (revision 294776) +++ projects/clang380-import/lib/libelftc/Makefile (revision 294777) @@ -1,30 +1,30 @@ # $FreeBSD$ .include INTERNALLIB= ELFTCDIR= ${.CURDIR}/../../contrib/elftoolchain .PATH: ${ELFTCDIR}/libelftc LIB= elftc SRCS= elftc_bfdtarget.c \ elftc_copyfile.c \ elftc_demangle.c \ elftc_set_timestamps.c \ elftc_string_table.c \ elftc_version.c \ libelftc_bfdtarget.c \ libelftc_dem_arm.c \ libelftc_dem_gnu2.c \ libelftc_dem_gnu3.c \ libelftc_hash.c \ libelftc_vstr.c INCS= libelftc.h 
CFLAGS+=-I${ELFTCDIR}/libelftc -I${ELFTCDIR}/common -NO_MAN= yes +MAN= .include Index: projects/clang380-import/lib/libproc/proc_bkpt.c =================================================================== --- projects/clang380-import/lib/libproc/proc_bkpt.c (revision 294776) +++ projects/clang380-import/lib/libproc/proc_bkpt.c (revision 294777) @@ -1,259 +1,262 @@ /* * Copyright (c) 2010 The FreeBSD Foundation * All rights reserved. * * This software was developed by Rui Paulo under sponsorship from the * FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include "_libproc.h" #if defined(__aarch64__) #define AARCH64_BRK 0xd4200000 #define AARCH64_BRK_IMM16_SHIFT 5 #define AARCH64_BRK_IMM16_VAL (0xd << AARCH64_BRK_IMM16_SHIFT) #define BREAKPOINT_INSTR (AARCH64_BRK | AARCH64_BRK_IMM16_VAL) #define BREAKPOINT_INSTR_SZ 4 #elif defined(__amd64__) || defined(__i386__) #define BREAKPOINT_INSTR 0xcc /* int 0x3 */ #define BREAKPOINT_INSTR_SZ 1 #define BREAKPOINT_ADJUST_SZ BREAKPOINT_INSTR_SZ #elif defined(__arm__) #define BREAKPOINT_INSTR 0xe7ffffff /* bkpt */ #define BREAKPOINT_INSTR_SZ 4 #elif defined(__mips__) #define BREAKPOINT_INSTR 0xd /* break */ #define BREAKPOINT_INSTR_SZ 4 #elif defined(__powerpc__) #define BREAKPOINT_INSTR 0x7fe00008 /* trap */ #define BREAKPOINT_INSTR_SZ 4 +#elif defined(__riscv__) +#define BREAKPOINT_INSTR 0x00100073 /* sbreak */ +#define BREAKPOINT_INSTR_SZ 4 #else #error "Add support for your architecture" #endif static int proc_stop(struct proc_handle *phdl) { int status; if (kill(proc_getpid(phdl), SIGSTOP) == -1) { DPRINTF("kill %d", proc_getpid(phdl)); return (-1); } else if (waitpid(proc_getpid(phdl), &status, WSTOPPED) == -1) { DPRINTF("waitpid %d", proc_getpid(phdl)); return (-1); } else if (!WIFSTOPPED(status)) { DPRINTFX("waitpid: unexpected status 0x%x", status); return (-1); } return (0); } int proc_bkptset(struct proc_handle *phdl, uintptr_t address, unsigned long *saved) { struct ptrace_io_desc piod; unsigned long paddr, caddr; int ret = 0, stopped; *saved = 0; if (phdl->status == PS_DEAD || phdl->status == PS_UNDEAD || phdl->status == PS_IDLE) { errno = ENOENT; return (-1); } DPRINTFX("adding breakpoint at 0x%lx", address); stopped = 0; if (phdl->status != PS_STOP) { if (proc_stop(phdl) != 0) return (-1); stopped = 1; } /* * Read the original instruction. */ caddr = address; paddr = 0; piod.piod_op = PIOD_READ_I; piod.piod_offs = (void *)caddr; piod.piod_addr = &paddr; piod.piod_len = BREAKPOINT_INSTR_SZ; if (ptrace(PT_IO, proc_getpid(phdl), (caddr_t)&piod, 0) < 0) { DPRINTF("ERROR: couldn't read instruction at address 0x%" PRIuPTR, address); ret = -1; goto done; } *saved = paddr; /* * Write a breakpoint instruction to that address. */ caddr = address; paddr = BREAKPOINT_INSTR; piod.piod_op = PIOD_WRITE_I; piod.piod_offs = (void *)caddr; piod.piod_addr = &paddr; piod.piod_len = BREAKPOINT_INSTR_SZ; if (ptrace(PT_IO, proc_getpid(phdl), (caddr_t)&piod, 0) < 0) { DPRINTF("ERROR: couldn't write instruction at address 0x%" PRIuPTR, address); ret = -1; goto done; } done: if (stopped) /* Restart the process if we had to stop it. */ proc_continue(phdl); return (ret); } int proc_bkptdel(struct proc_handle *phdl, uintptr_t address, unsigned long saved) { struct ptrace_io_desc piod; unsigned long paddr, caddr; int ret = 0, stopped; if (phdl->status == PS_DEAD || phdl->status == PS_UNDEAD || phdl->status == PS_IDLE) { errno = ENOENT; return (-1); } DPRINTFX("removing breakpoint at 0x%lx", address); stopped = 0; if (phdl->status != PS_STOP) { if (proc_stop(phdl) != 0) return (-1); stopped = 1; } /* * Overwrite the breakpoint instruction that we setup previously. 
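 * The word saved by proc_bkptset() is written back over the
 * breakpoint, restoring the original instruction text.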
*/ caddr = address; paddr = saved; piod.piod_op = PIOD_WRITE_I; piod.piod_offs = (void *)caddr; piod.piod_addr = &paddr; piod.piod_len = BREAKPOINT_INSTR_SZ; if (ptrace(PT_IO, proc_getpid(phdl), (caddr_t)&piod, 0) < 0) { DPRINTF("ERROR: couldn't write instruction at address 0x%" PRIuPTR, address); ret = -1; } if (stopped) /* Restart the process if we had to stop it. */ proc_continue(phdl); return (ret); } /* * Decrement pc so that we delete the breakpoint at the correct * address, i.e. at the BREAKPOINT_INSTR address. * * This is only needed on some architectures where the pc value * when reading registers points at the instruction after the * breakpoint, e.g. x86. */ void proc_bkptregadj(unsigned long *pc) { (void)pc; #ifdef BREAKPOINT_ADJUST_SZ *pc = *pc - BREAKPOINT_ADJUST_SZ; #endif } /* * Step over the breakpoint. */ int proc_bkptexec(struct proc_handle *phdl, unsigned long saved) { unsigned long pc; unsigned long samesaved; int status; if (proc_regget(phdl, REG_PC, &pc) < 0) { DPRINTFX("ERROR: couldn't get PC register"); return (-1); } proc_bkptregadj(&pc); if (proc_bkptdel(phdl, pc, saved) < 0) { DPRINTFX("ERROR: couldn't delete breakpoint"); return (-1); } /* * Go back in time and step over the new instruction just * set up by proc_bkptdel(). */ proc_regset(phdl, REG_PC, pc); if (ptrace(PT_STEP, proc_getpid(phdl), (caddr_t)1, 0) < 0) { DPRINTFX("ERROR: ptrace step failed"); return (-1); } proc_wstatus(phdl); status = proc_getwstat(phdl); if (!WIFSTOPPED(status)) { DPRINTFX("ERROR: don't know why process stopped"); return (-1); } /* * Restore the breakpoint. The saved instruction should be * the same as the one that we were passed in. */ if (proc_bkptset(phdl, pc, &samesaved) < 0) { DPRINTFX("ERROR: couldn't restore breakpoint"); return (-1); } assert(samesaved == saved); return (0); } Index: projects/clang380-import/lib/libproc/proc_regs.c =================================================================== --- projects/clang380-import/lib/libproc/proc_regs.c (revision 294776) +++ projects/clang380-import/lib/libproc/proc_regs.c (revision 294777) @@ -1,145 +1,153 @@ /* * Copyright (c) 2010 The FreeBSD Foundation * All rights reserved. * * This software was developed by Rui Paulo under sponsorship from the * FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include "_libproc.h" int proc_regget(struct proc_handle *phdl, proc_reg_t reg, unsigned long *regvalue) { struct reg regs; if (phdl->status == PS_DEAD || phdl->status == PS_UNDEAD || phdl->status == PS_IDLE) { errno = ENOENT; return (-1); } memset(®s, 0, sizeof(regs)); if (ptrace(PT_GETREGS, proc_getpid(phdl), (caddr_t)®s, 0) < 0) return (-1); switch (reg) { case REG_PC: #if defined(__aarch64__) *regvalue = regs.elr; #elif defined(__amd64__) *regvalue = regs.r_rip; #elif defined(__arm__) *regvalue = regs.r_pc; #elif defined(__i386__) *regvalue = regs.r_eip; #elif defined(__mips__) *regvalue = regs.r_regs[PC]; #elif defined(__powerpc__) *regvalue = regs.pc; +#elif defined(__riscv__) + *regvalue = regs.sepc; #endif break; case REG_SP: #if defined(__aarch64__) *regvalue = regs.sp; #elif defined(__amd64__) *regvalue = regs.r_rsp; #elif defined(__arm__) *regvalue = regs.r_sp; #elif defined(__i386__) *regvalue = regs.r_esp; #elif defined(__mips__) *regvalue = regs.r_regs[SP]; #elif defined(__powerpc__) *regvalue = regs.fixreg[1]; +#elif defined(__riscv__) + *regvalue = regs.sp; #endif break; default: DPRINTFX("ERROR: no support for reg number %d", reg); return (-1); } return (0); } int proc_regset(struct proc_handle *phdl, proc_reg_t reg, unsigned long regvalue) { struct reg regs; if (phdl->status == PS_DEAD || phdl->status == PS_UNDEAD || phdl->status == PS_IDLE) { errno = ENOENT; return (-1); } if (ptrace(PT_GETREGS, proc_getpid(phdl), (caddr_t)®s, 0) < 0) return (-1); switch (reg) { case REG_PC: #if defined(__aarch64__) regs.elr = regvalue; #elif defined(__amd64__) regs.r_rip = regvalue; #elif defined(__arm__) regs.r_pc = regvalue; #elif defined(__i386__) regs.r_eip = regvalue; #elif defined(__mips__) regs.r_regs[PC] = regvalue; #elif defined(__powerpc__) regs.pc = regvalue; +#elif defined(__riscv__) + regs.sepc = regvalue; #endif break; case REG_SP: #if defined(__aarch64__) regs.sp = regvalue; #elif defined(__amd64__) regs.r_rsp = regvalue; #elif defined(__arm__) regs.r_sp = regvalue; #elif defined(__i386__) regs.r_esp = regvalue; #elif defined(__mips__) regs.r_regs[PC] = regvalue; #elif defined(__powerpc__) regs.fixreg[1] = regvalue; +#elif defined(__riscv__) + regs.sp = regvalue; #endif break; default: DPRINTFX("ERROR: no support for reg number %d", reg); return (-1); } if (ptrace(PT_SETREGS, proc_getpid(phdl), (caddr_t)®s, 0) < 0) return (-1); return (0); } Index: projects/clang380-import/libexec/rtld-elf/riscv/rtld_machdep.h =================================================================== --- projects/clang380-import/libexec/rtld-elf/riscv/rtld_machdep.h (revision 294776) +++ projects/clang380-import/libexec/rtld-elf/riscv/rtld_machdep.h (revision 294777) @@ -1,111 +1,113 @@ /*- * Copyright (c) 1999, 2000 John D. Polstra. * Copyright (c) 2015 Ruslan Bukin * All rights reserved. * * Portions of this software were developed by SRI International and the * University of Cambridge Computer Laboratory under DARPA/AFRL contract * FA8750-10-C-0237 ("CTSRD"), as part of the DARPA CRASH research programme. * * Portions of this software were developed by the University of Cambridge * Computer Laboratory as part of the CTSRD Project, with support from the * UK Higher Education Innovation Fund (HEIF). * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef RTLD_MACHDEP_H #define RTLD_MACHDEP_H 1 #include #include struct Struct_Obj_Entry; uint64_t set_gp(struct Struct_Obj_Entry *obj); /* Return the address of the .dynamic section in the dynamic linker. */ #define rtld_dynamic(obj) \ ({ \ Elf_Addr _dynamic_addr; \ __asm __volatile("lla %0, _DYNAMIC" : "=r"(_dynamic_addr)); \ (const Elf_Dyn *)_dynamic_addr; \ }) #define RTLD_IS_DYNAMIC() (1) Elf_Addr reloc_jmpslot(Elf_Addr *where, Elf_Addr target, const struct Struct_Obj_Entry *defobj, const struct Struct_Obj_Entry *obj, const Elf_Rel *rel); #define make_function_pointer(def, defobj) \ ((defobj)->relocbase + (def)->st_value) #define call_initfini_pointer(obj, target) \ ({ \ uint64_t old0; \ old0 = set_gp(obj); \ (((InitFunc)(target))()); \ __asm __volatile("mv gp, %0" :: "r"(old0)); \ }) #define call_init_pointer(obj, target) \ ({ \ uint64_t old1; \ old1 = set_gp(obj); \ (((InitArrFunc)(target))(main_argc, main_argv, environ)); \ __asm __volatile("mv gp, %0" :: "r"(old1)); \ }) /* * Lazy binding entry point, called via PLT. 
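 * On the first call through a PLT slot, control arrives here; the stub
 * is expected to save the argument registers, have the dynamic linker
 * resolve the target symbol, and then jump to the resolved address.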
*/ void _rtld_bind_start(void); /* * TLS */ #define TLS_TP_OFFSET 0x0 #define TLS_DTV_OFFSET 0x800 #define TLS_TCB_SIZE 16 #define round(size, align) \ (((size) + (align) - 1) & ~((align) - 1)) #define calculate_first_tls_offset(size, align) \ round(16, align) #define calculate_tls_offset(prev_offset, prev_size, size, align) \ round(prev_offset + prev_size, align) #define calculate_tls_end(off, size) ((off) + (size)) typedef struct { unsigned long ti_module; unsigned long ti_offset; } tls_index; extern void *__tls_get_addr(tls_index* ti); #define RTLD_DEFAULT_STACK_PF_EXEC PF_X #define RTLD_DEFAULT_STACK_EXEC PROT_EXEC +#define md_abi_variant_hook(x) + #endif Index: projects/clang380-import/sbin/ifconfig/iflagg.c =================================================================== --- projects/clang380-import/sbin/ifconfig/iflagg.c (revision 294776) +++ projects/clang380-import/sbin/ifconfig/iflagg.c (revision 294777) @@ -1,316 +1,332 @@ /*- */ #ifndef lint static const char rcsid[] = "$FreeBSD$"; #endif /* not lint */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "ifconfig.h" char lacpbuf[120]; /* LACP peer '[(a,a,a),(p,p,p)]' */ static void setlaggport(const char *val, int d, int s, const struct afswtch *afp) { struct lagg_reqport rp; bzero(&rp, sizeof(rp)); strlcpy(rp.rp_ifname, name, sizeof(rp.rp_ifname)); strlcpy(rp.rp_portname, val, sizeof(rp.rp_portname)); /* Don't choke if the port is already in this lagg. */ if (ioctl(s, SIOCSLAGGPORT, &rp) && errno != EEXIST) err(1, "SIOCSLAGGPORT"); } static void unsetlaggport(const char *val, int d, int s, const struct afswtch *afp) { struct lagg_reqport rp; bzero(&rp, sizeof(rp)); strlcpy(rp.rp_ifname, name, sizeof(rp.rp_ifname)); strlcpy(rp.rp_portname, val, sizeof(rp.rp_portname)); if (ioctl(s, SIOCSLAGGDELPORT, &rp)) err(1, "SIOCSLAGGDELPORT"); } static void setlaggproto(const char *val, int d, int s, const struct afswtch *afp) { struct lagg_protos lpr[] = LAGG_PROTOS; struct lagg_reqall ra; int i; bzero(&ra, sizeof(ra)); ra.ra_proto = LAGG_PROTO_MAX; for (i = 0; i < nitems(lpr); i++) { if (strcmp(val, lpr[i].lpr_name) == 0) { ra.ra_proto = lpr[i].lpr_proto; break; } } if (ra.ra_proto == LAGG_PROTO_MAX) errx(1, "Invalid aggregation protocol: %s", val); strlcpy(ra.ra_ifname, name, sizeof(ra.ra_ifname)); if (ioctl(s, SIOCSLAGG, &ra) != 0) err(1, "SIOCSLAGG"); } static void setlaggflowidshift(const char *val, int d, int s, const struct afswtch *afp) { struct lagg_reqopts ro; bzero(&ro, sizeof(ro)); ro.ro_opts = LAGG_OPT_FLOWIDSHIFT; strlcpy(ro.ro_ifname, name, sizeof(ro.ro_ifname)); ro.ro_flowid_shift = (int)strtol(val, NULL, 10); if (ro.ro_flowid_shift & ~LAGG_OPT_FLOWIDSHIFT_MASK) errx(1, "Invalid flowid_shift option: %s", val); if (ioctl(s, SIOCSLAGGOPTS, &ro) != 0) err(1, "SIOCSLAGGOPTS"); } static void +setlaggrr_limit(const char *val, int d, int s, const struct afswtch *afp) +{ + struct lagg_reqopts ro; + + bzero(&ro, sizeof(ro)); + strlcpy(ro.ro_ifname, name, sizeof(ro.ro_ifname)); + ro.ro_bkt = (int)strtol(val, NULL, 10); + + if (ioctl(s, SIOCSLAGGOPTS, &ro) != 0) + err(1, "SIOCSLAGG"); +} + +static void setlaggsetopt(const char *val, int d, int s, const struct afswtch *afp) { struct lagg_reqopts ro; bzero(&ro, sizeof(ro)); ro.ro_opts = d; switch (ro.ro_opts) { case LAGG_OPT_USE_FLOWID: case -LAGG_OPT_USE_FLOWID: case LAGG_OPT_LACP_STRICT: case -LAGG_OPT_LACP_STRICT: case LAGG_OPT_LACP_TXTEST: case 
-LAGG_OPT_LACP_TXTEST: case LAGG_OPT_LACP_RXTEST: case -LAGG_OPT_LACP_RXTEST: case LAGG_OPT_LACP_TIMEOUT: case -LAGG_OPT_LACP_TIMEOUT: break; default: err(1, "Invalid lagg option"); } strlcpy(ro.ro_ifname, name, sizeof(ro.ro_ifname)); if (ioctl(s, SIOCSLAGGOPTS, &ro) != 0) err(1, "SIOCSLAGGOPTS"); } static void setlagghash(const char *val, int d, int s, const struct afswtch *afp) { struct lagg_reqflags rf; char *str, *tmp, *tok; rf.rf_flags = 0; str = tmp = strdup(val); while ((tok = strsep(&tmp, ",")) != NULL) { if (strcmp(tok, "l2") == 0) rf.rf_flags |= LAGG_F_HASHL2; else if (strcmp(tok, "l3") == 0) rf.rf_flags |= LAGG_F_HASHL3; else if (strcmp(tok, "l4") == 0) rf.rf_flags |= LAGG_F_HASHL4; else errx(1, "Invalid lagghash option: %s", tok); } free(str); if (rf.rf_flags == 0) errx(1, "No lagghash options supplied"); strlcpy(rf.rf_ifname, name, sizeof(rf.rf_ifname)); if (ioctl(s, SIOCSLAGGHASH, &rf)) err(1, "SIOCSLAGGHASH"); } static char * lacp_format_mac(const uint8_t *mac, char *buf, size_t buflen) { snprintf(buf, buflen, "%02X-%02X-%02X-%02X-%02X-%02X", (int)mac[0], (int)mac[1], (int)mac[2], (int)mac[3], (int)mac[4], (int)mac[5]); return (buf); } static char * lacp_format_peer(struct lacp_opreq *req, const char *sep) { char macbuf1[20]; char macbuf2[20]; snprintf(lacpbuf, sizeof(lacpbuf), "[(%04X,%s,%04X,%04X,%04X),%s(%04X,%s,%04X,%04X,%04X)]", req->actor_prio, lacp_format_mac(req->actor_mac, macbuf1, sizeof(macbuf1)), req->actor_key, req->actor_portprio, req->actor_portno, sep, req->partner_prio, lacp_format_mac(req->partner_mac, macbuf2, sizeof(macbuf2)), req->partner_key, req->partner_portprio, req->partner_portno); return(lacpbuf); } static void lagg_status(int s) { struct lagg_protos lpr[] = LAGG_PROTOS; struct lagg_reqport rp, rpbuf[LAGG_MAX_PORTS]; struct lagg_reqall ra; struct lagg_reqopts ro; struct lagg_reqflags rf; struct lacp_opreq *lp; const char *proto = ""; int i, isport = 0; bzero(&rp, sizeof(rp)); bzero(&ra, sizeof(ra)); bzero(&ro, sizeof(ro)); strlcpy(rp.rp_ifname, name, sizeof(rp.rp_ifname)); strlcpy(rp.rp_portname, name, sizeof(rp.rp_portname)); if (ioctl(s, SIOCGLAGGPORT, &rp) == 0) isport = 1; strlcpy(ra.ra_ifname, name, sizeof(ra.ra_ifname)); ra.ra_size = sizeof(rpbuf); ra.ra_port = rpbuf; strlcpy(ro.ro_ifname, name, sizeof(ro.ro_ifname)); ioctl(s, SIOCGLAGGOPTS, &ro); strlcpy(rf.rf_ifname, name, sizeof(rf.rf_ifname)); if (ioctl(s, SIOCGLAGGFLAGS, &rf) != 0) rf.rf_flags = 0; if (ioctl(s, SIOCGLAGG, &ra) == 0) { lp = (struct lacp_opreq *)&ra.ra_lacpreq; for (i = 0; i < nitems(lpr); i++) { if (ra.ra_proto == lpr[i].lpr_proto) { proto = lpr[i].lpr_name; break; } } printf("\tlaggproto %s", proto); if (rf.rf_flags & LAGG_F_HASHMASK) { const char *sep = ""; printf(" lagghash "); if (rf.rf_flags & LAGG_F_HASHL2) { printf("%sl2", sep); sep = ","; } if (rf.rf_flags & LAGG_F_HASHL3) { printf("%sl3", sep); sep = ","; } if (rf.rf_flags & LAGG_F_HASHL4) { printf("%sl4", sep); sep = ","; } } if (isport) printf(" laggdev %s", rp.rp_ifname); putchar('\n'); if (verbose) { printf("\tlagg options:\n"); printb("\t\tflags", ro.ro_opts, LAGG_OPT_BITS); putchar('\n'); printf("\t\tflowid_shift: %d\n", ro.ro_flowid_shift); + if (ra.ra_proto == LAGG_PROTO_ROUNDROBIN) + printf("\t\trr_limit: %d\n", ro.ro_bkt); printf("\tlagg statistics:\n"); printf("\t\tactive ports: %d\n", ro.ro_active); printf("\t\tflapping: %u\n", ro.ro_flapping); if (ra.ra_proto == LAGG_PROTO_LACP) { printf("\tlag id: %s\n", lacp_format_peer(lp, "\n\t\t ")); } } for (i = 0; i < ra.ra_ports; i++) { lp = (struct 
lacp_opreq *)&rpbuf[i].rp_lacpreq; printf("\tlaggport: %s ", rpbuf[i].rp_portname); printb("flags", rpbuf[i].rp_flags, LAGG_PORT_BITS); if (verbose && ra.ra_proto == LAGG_PROTO_LACP) printb(" state", lp->actor_state, LACP_STATE_BITS); putchar('\n'); if (verbose && ra.ra_proto == LAGG_PROTO_LACP) printf("\t\t%s\n", lacp_format_peer(lp, "\n\t\t ")); } if (0 /* XXX */) { printf("\tsupported aggregation protocols:\n"); for (i = 0; i < (sizeof(lpr) / sizeof(lpr[0])); i++) printf("\t\tlaggproto %s\n", lpr[i].lpr_name); } } } static struct cmd lagg_cmds[] = { DEF_CMD_ARG("laggport", setlaggport), DEF_CMD_ARG("-laggport", unsetlaggport), DEF_CMD_ARG("laggproto", setlaggproto), DEF_CMD_ARG("lagghash", setlagghash), DEF_CMD("use_flowid", LAGG_OPT_USE_FLOWID, setlaggsetopt), DEF_CMD("-use_flowid", -LAGG_OPT_USE_FLOWID, setlaggsetopt), DEF_CMD("lacp_strict", LAGG_OPT_LACP_STRICT, setlaggsetopt), DEF_CMD("-lacp_strict", -LAGG_OPT_LACP_STRICT, setlaggsetopt), DEF_CMD("lacp_txtest", LAGG_OPT_LACP_TXTEST, setlaggsetopt), DEF_CMD("-lacp_txtest", -LAGG_OPT_LACP_TXTEST, setlaggsetopt), DEF_CMD("lacp_rxtest", LAGG_OPT_LACP_RXTEST, setlaggsetopt), DEF_CMD("-lacp_rxtest", -LAGG_OPT_LACP_RXTEST, setlaggsetopt), DEF_CMD("lacp_fast_timeout", LAGG_OPT_LACP_TIMEOUT, setlaggsetopt), DEF_CMD("-lacp_fast_timeout", -LAGG_OPT_LACP_TIMEOUT, setlaggsetopt), DEF_CMD_ARG("flowid_shift", setlaggflowidshift), + DEF_CMD_ARG("rr_limit", setlaggrr_limit), }; static struct afswtch af_lagg = { .af_name = "af_lagg", .af_af = AF_UNSPEC, .af_other_status = lagg_status, }; static __constructor void lagg_ctor(void) { int i; for (i = 0; i < nitems(lagg_cmds); i++) cmd_register(&lagg_cmds[i]); af_register(&af_lagg); } Index: projects/clang380-import/sbin/kldstat/Makefile =================================================================== --- projects/clang380-import/sbin/kldstat/Makefile (revision 294776) +++ projects/clang380-import/sbin/kldstat/Makefile (revision 294777) @@ -1,32 +1,34 @@ # # Copyright (c) 1997 Doug Rabson # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. 
# # $FreeBSD$ # PROG= kldstat MAN= kldstat.8 +LIBADD= util + .include Index: projects/clang380-import/sbin/kldstat/kldstat.8 =================================================================== --- projects/clang380-import/sbin/kldstat/kldstat.8 (revision 294776) +++ projects/clang380-import/sbin/kldstat/kldstat.8 (revision 294777) @@ -1,77 +1,81 @@ .\" .\" Copyright (c) 1997 Doug Rabson .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd January 22, 2014 +.Dd January 19, 2016 .Dt KLDSTAT 8 .Os .Sh NAME .Nm kldstat .Nd display status of dynamic kernel linker .Sh SYNOPSIS .Nm +.Op Fl h .Op Fl q .Op Fl v .Op Fl i Ar id .Op Fl n Ar filename .Nm .Op Fl q .Op Fl m Ar modname .Sh DESCRIPTION The .Nm utility displays the status of any files dynamically linked into the kernel. .Pp The following options are available: .Bl -tag -width indentXX +.It Fl h +Display the size field in a human-readable form, using unit suffixes +instead of hex values. .It Fl v Be more verbose. .It Fl i Ar id Display the status of only the file with this ID. .It Fl n Ar filename Display the status of only the file with this filename. .It Fl q Only check if module is loaded or compiled into the kernel. .It Fl m Ar modname Display the status of only the module with this modname. .El .Sh EXIT STATUS .Ex -std .Sh SEE ALSO .Xr kldstat 2 , .Xr kldload 8 , .Xr kldunload 8 .Sh HISTORY The .Nm utility first appeared in .Fx 3.0 , replacing the .Nm lkm interface. .Sh AUTHORS .An Doug Rabson Aq Mt dfr@FreeBSD.org Index: projects/clang380-import/sbin/kldstat/kldstat.c =================================================================== --- projects/clang380-import/sbin/kldstat/kldstat.c (revision 294776) +++ projects/clang380-import/sbin/kldstat/kldstat.c (revision 294777) @@ -1,166 +1,183 @@ /*- * Copyright (c) 1997 Doug Rabson * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#include
__FBSDID("$FreeBSD$");

#include
+#include <libutil.h>
#include
#include
#include
#include
#include
#include

#define POINTER_WIDTH ((int)(sizeof(void *) * 2 + 2))

static void
printmod(int modid)
{
	struct module_stat stat;

	stat.version = sizeof(struct module_stat);
	if (modstat(modid, &stat) < 0)
		warn("can't stat module id %d", modid);
	else
		printf("\t\t%2d %s\n", stat.id, stat.name);
}

static void
-printfile(int fileid, int verbose)
+printfile(int fileid, int verbose, int humanized)
{
	struct kld_file_stat stat;
	int modid;
+	char buf[5];

	stat.version = sizeof(struct kld_file_stat);
-	if (kldstat(fileid, &stat) < 0)
+	if (kldstat(fileid, &stat) < 0) {
		err(1, "can't stat file id %d", fileid);
-	else
-		printf("%2d %4d %p %-8zx %s",
-		    stat.id, stat.refs, stat.address, stat.size,
-		    stat.name);
+	} else {
+		if (humanized) {
+			humanize_number(buf, sizeof(buf), stat.size,
+			    "", HN_AUTOSCALE, HN_DECIMAL | HN_NOSPACE);
+			printf("%2d %4d %p %5s %s",
+			    stat.id, stat.refs, stat.address, buf, stat.name);
+		} else {
+			printf("%2d %4d %p %-8zx %s",
+			    stat.id, stat.refs, stat.address, stat.size, stat.name);
+		}
+	}
+
	if (verbose) {
		printf(" (%s)\n", stat.pathname);
		printf("\tContains modules:\n");
		printf("\t\tId Name\n");
		for (modid = kldfirstmod(fileid); modid > 0;
		    modid = modfnext(modid))
			printmod(modid);
	} else
		printf("\n");
}

static void
usage(void)
{
-	fprintf(stderr, "usage: kldstat [-q] [-v] [-i id] [-n filename]\n");
+	fprintf(stderr, "usage: kldstat [-h] [-q] [-v] [-i id] [-n filename]\n");
	fprintf(stderr, "       kldstat [-q] [-m modname]\n");
	exit(1);
}

int
main(int argc, char** argv)
{
	int c;
+	int humanized = 0;
	int verbose = 0;
	int fileid = 0;
	int quiet = 0;
	char* filename = NULL;
	char* modname = NULL;
	char* p;

-	while ((c = getopt(argc, argv, "i:m:n:qv")) != -1)
+	while ((c = getopt(argc, argv, "hi:m:n:qv")) != -1)
		switch (c) {
+		case 'h':
+			humanized = 1;
+			break;
		case 'i':
			fileid = (int)strtoul(optarg, &p, 10);
			if (*p != '\0')
				usage();
			break;
		case 'm':
			modname = optarg;
			break;
		case 'n':
			filename = optarg;
			break;
		case 'q':
			quiet = 1;
			break;
		case 'v':
			verbose = 1;
			break;
		default:
			usage();
		}

	argc -= optind;
	argv += optind;

	if (argc != 0)
		usage();

	if (modname != NULL) {
		int modid;
		struct module_stat stat;

		if ((modid = modfind(modname)) < 0) {
			if (!quiet)
				warn("can't find module %s", modname);
			return 1;
		} else if (quiet) {
			return 0;
		}

		stat.version = sizeof(struct module_stat);
		if (modstat(modid, &stat) < 0)
			warn("can't stat module id %d", modid);
		else {
			printf("Id Refs Name\n");
			printf("%3d %4d %s\n", stat.id, stat.refs, stat.name);
		}

		return 0;
	}
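
	/*
	 * A file name, if given, is resolved to a kld file id with
	 * kldfind(2) below; with -q this reduces to a simple
	 * loaded/not-loaded test via the exit status.
	 */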
if (filename != NULL) { if ((fileid = kldfind(filename)) < 0) { if (!quiet) warn("can't find file %s", filename); return 1; } else if (quiet) { return 0; } } - printf("Id Refs Address%*c Size Name\n", POINTER_WIDTH - 7, ' '); + if (humanized) + printf("Id Refs Address%*c Size Name\n", POINTER_WIDTH - 7, ' '); + else + printf("Id Refs Address%*c Size Name\n", POINTER_WIDTH - 7, ' '); if (fileid != 0) - printfile(fileid, verbose); + printfile(fileid, verbose, humanized); else for (fileid = kldnext(0); fileid > 0; fileid = kldnext(fileid)) - printfile(fileid, verbose); + printfile(fileid, verbose, humanized); return 0; } Index: projects/clang380-import/sbin =================================================================== --- projects/clang380-import/sbin (revision 294776) +++ projects/clang380-import/sbin (revision 294777) Property changes on: projects/clang380-import/sbin ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sbin:r294599-294776 Index: projects/clang380-import/share/dtrace/watch_kill =================================================================== --- projects/clang380-import/share/dtrace/watch_kill (revision 294776) +++ projects/clang380-import/share/dtrace/watch_kill (revision 294777) @@ -1,232 +1,232 @@ #!/usr/sbin/dtrace -s /* - - * Copyright (c) 2014-2015 Devin Teske + * Copyright (c) 2014-2016 Devin Teske * All rights reserved. * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $Title: dtrace(1) script to log process(es) entering syscall::kill $ * $FreeBSD$ */ #pragma D option quiet #pragma D option dynvarsize=16m #pragma D option switchrate=10hz /*********************************************************/ syscall::execve:entry /* probe ID 1 */ { this->caller_execname = execname; } /*********************************************************/ syscall::kill:entry /* probe ID 2 */ { this->pid_to_kill = (pid_t)arg0; this->kill_signal = (int)arg1; /* * Examine process, parent process, and grandparent process details */ /******************* CURPROC *******************/ this->proc = curthread->td_proc; this->pid0 = this->proc->p_pid; this->uid0 = this->proc->p_ucred->cr_uid; this->gid0 = this->proc->p_ucred->cr_rgid; this->p_args = this->proc->p_args; this->ar_length = this->p_args ? 
this->p_args->ar_length : 0; this->ar_args = (char *)(this->p_args ? this->p_args->ar_args : 0); this->arg0_0 = this->ar_length > 0 ? this->ar_args : stringof(this->proc->p_comm); this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg0_1 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg0_2 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg0_3 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg0_4 = this->ar_length > 0 ? "..." : ""; /******************* PPARENT *******************/ this->proc = this->proc->p_pptr; this->pid1 = this->proc->p_pid; this->uid1 = this->proc->p_ucred->cr_uid; this->gid1 = this->proc->p_ucred->cr_rgid; this->p_args = this->proc ? this->proc->p_args : 0; this->ar_length = this->p_args ? this->p_args->ar_length : 0; this->ar_args = (char *)(this->p_args ? this->p_args->ar_args : 0); this->arg1_0 = this->ar_length > 0 ? this->ar_args : stringof(this->proc->p_comm); this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg1_1 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg1_2 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg1_3 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg1_4 = this->ar_length > 0 ? "..." : ""; /******************* GPARENT *******************/ this->proc = this->proc->p_pptr; this->pid2 = this->proc->p_pid; this->uid2 = this->proc->p_ucred->cr_uid; this->gid2 = this->proc->p_ucred->cr_rgid; this->p_args = this->proc ? this->proc->p_args : 0; this->ar_length = this->p_args ? this->p_args->ar_length : 0; this->ar_args = (char *)(this->p_args ? this->p_args->ar_args : 0); this->arg2_0 = this->ar_length > 0 ? this->ar_args : stringof(this->proc->p_comm); this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg2_1 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg2_2 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg2_3 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg2_4 = this->ar_length > 0 ? "..." : ""; /******************* APARENT *******************/ this->proc = this->proc->p_pptr; this->pid3 = this->proc->p_pid; this->uid3 = this->proc->p_ucred->cr_uid; this->gid3 = this->proc->p_ucred->cr_rgid; this->p_args = this->proc ? this->proc->p_args : 0; this->ar_length = this->p_args ? 
this->p_args->ar_length : 0; this->ar_args = (char *)(this->p_args ? this->p_args->ar_args : 0); this->arg3_0 = this->ar_length > 0 ? this->ar_args : stringof(this->proc->p_comm); this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg3_1 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg3_2 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg3_3 = this->ar_length > 0 ? this->ar_args : ""; this->len = this->ar_length > 0 ? strlen(this->ar_args) + 1 : 0; this->ar_args += this->len; this->ar_length -= this->len; this->arg3_4 = this->ar_length > 0 ? "..." : ""; /***********************************************/ /* * Print process, parent, and grandparent details */ printf("%Y %s[%d]: ", timestamp + 1406598400000000000, this->caller_execname, this->pid1); printf("%s", this->arg0_0); printf("%s%s", this->arg0_1 != "" ? " " : "", this->arg0_1); printf("%s%s", this->arg0_2 != "" ? " " : "", this->arg0_2); printf("%s%s", this->arg0_3 != "" ? " " : "", this->arg0_3); printf("%s%s", this->arg0_4 != "" ? " " : "", this->arg0_4); printf(" (sending signal %u to pid %u)", this->kill_signal, this->pid_to_kill); printf("\n"); printf(" -+= %05d %d.%d %s", this->pid3, this->uid3, this->gid3, this->arg3_0); printf("%s%s", this->arg3_1 != "" ? " " : "", this->arg3_1); printf("%s%s", this->arg3_2 != "" ? " " : "", this->arg3_2); printf("%s%s", this->arg3_3 != "" ? " " : "", this->arg3_3); printf("%s%s", this->arg3_4 != "" ? " " : "", this->arg3_4); printf("%s", this->arg3_0 != "" ? "\n" : ""); printf(" \-+= %05d %d.%d %s", this->pid2, this->uid2, this->gid2, this->arg2_0); printf("%s%s", this->arg2_1 != "" ? " " : "", this->arg2_1); printf("%s%s", this->arg2_2 != "" ? " " : "", this->arg2_2); printf("%s%s", this->arg2_3 != "" ? " " : "", this->arg2_3); printf("%s%s", this->arg2_4 != "" ? " " : "", this->arg2_4); printf("%s", this->arg2_0 != "" ? "\n" : ""); printf(" \-+= %05d %d.%d %s", this->pid1, this->uid1, this->gid1, this->arg1_0); printf("%s%s", this->arg1_1 != "" ? " " : "", this->arg1_1); printf("%s%s", this->arg1_2 != "" ? " " : "", this->arg1_2); printf("%s%s", this->arg1_3 != "" ? " " : "", this->arg1_3); printf("%s%s", this->arg1_4 != "" ? " " : "", this->arg1_4); printf("%s", this->arg1_0 != "" ? "\n" : ""); printf(" \-+= %05d %d.%d %s", this->pid0, this->uid0, this->gid0, this->arg0_0); printf("%s%s", this->arg0_1 != "" ? " " : "", this->arg0_1); printf("%s%s", this->arg0_2 != "" ? " " : "", this->arg0_2); printf("%s%s", this->arg0_3 != "" ? " " : "", this->arg0_3); printf("%s%s", this->arg0_4 != "" ? " " : "", this->arg0_4); printf("%s", this->arg0_0 != "" ? "\n" : ""); } Index: projects/clang380-import/share/man/man4/lagg.4 =================================================================== --- projects/clang380-import/share/man/man4/lagg.4 (revision 294776) +++ projects/clang380-import/share/man/man4/lagg.4 (revision 294777) @@ -1,204 +1,222 @@ .\" $OpenBSD: trunk.4,v 1.18 2006/06/09 13:53:34 jmc Exp $ .\" .\" Copyright (c) 2005, 2006 Reyk Floeter .\" .\" Permission to use, copy, modify, and distribute this software for any .\" purpose with or without fee is hereby granted, provided that the above .\" copyright notice and this permission notice appear in all copies. 
.\" .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. .\" .\" $FreeBSD$ .\" -.Dd November 6, 2015 +.Dd January 23, 2016 .Dt LAGG 4 .Os .Sh NAME .Nm lagg .Nd link aggregation and link failover interface .Sh SYNOPSIS To compile this driver into the kernel, place the following line in your kernel configuration file: .Bd -ragged -offset indent .Cd "device lagg" .Ed .Pp Alternatively, to load the driver as a module at boot time, place the following line in .Xr loader.conf 5 : .Bd -literal -offset indent if_lagg_load="YES" .Ed .Sh DESCRIPTION The .Nm interface allows aggregation of multiple network interfaces as one virtual .Nm interface for the purpose of providing fault-tolerance and high-speed links. .Pp A .Nm interface can be created using the .Ic ifconfig lagg Ns Ar N Ic create command. It can use different link aggregation protocols specified using the .Ic laggproto Ar proto option. Child interfaces can be added using the .Ic laggport Ar child-iface option and removed using the .Ic -laggport Ar child-iface option. .Pp The driver currently supports the aggregation protocols .Ic failover (the default), .Ic lacp , .Ic loadbalance , .Ic roundrobin , .Ic broadcast , and .Ic none . The protocols determine which ports are used for outgoing traffic and whether a specific port accepts incoming traffic. The interface link state is used to validate if the port is active or not. .Bl -tag -width loadbalance .It Ic failover Sends traffic only through the active port. If the master port becomes unavailable, the next active port is used. The first interface added is the master port; any interfaces added after that are used as failover devices. .Pp By default, received traffic is only accepted when they are received through the active port. This constraint can be relaxed by setting the .Va net.link.lagg.failover_rx_all .Xr sysctl 8 variable to a nonzero value, which is useful for certain bridged network setups. .Ic loadbalance mode. .It Ic lacp Supports the IEEE 802.1AX (formerly 802.3ad) Link Aggregation Control Protocol (LACP) and the Marker Protocol. LACP will negotiate a set of aggregable links with the peer in to one or more Link Aggregated Groups. Each LAG is composed of ports of the same speed, set to full-duplex operation. The traffic will be balanced across the ports in the LAG with the greatest total speed, in most cases there will only be one LAG which contains all ports. In the event of changes in physical connectivity, Link Aggregation will quickly converge to a new configuration. .It Ic loadbalance Balances outgoing traffic across the active ports based on hashed protocol header information and accepts incoming traffic from any active port. This is a static setup and does not negotiate aggregation with the peer or exchange frames to monitor the link. The hash includes the Ethernet source and destination address, and, if available, the VLAN tag, and the IP source and destination address. .It Ic roundrobin Distributes outgoing traffic using a round-robin scheduler through all active ports and accepts incoming traffic from any active port. 
+Using
+.Ic roundrobin
+mode can cause unordered packet arrival at the client.
+Throughput might be limited as the client performs CPU-intensive packet
+reordering.
.It Ic broadcast
Sends frames to all ports of the LAG and receives frames on
any port of the LAG.
.It Ic none
This protocol is intended to do nothing:
it disables any traffic without disabling the
.Nm
interface itself.
.El
.Pp
Each
.Nm
interface is created at runtime using interface cloning.
This is most easily done with the
.Xr ifconfig 8
.Cm create
command or using the
.Va cloned_interfaces
variable in
.Xr rc.conf 5 .
.Pp
The MTU of the first interface to be added is used as the lagg MTU.
All additional interfaces are required to have exactly the same value.
.Pp
The
.Ic loadbalance
and
.Ic lacp
modes will use the RSS hash from the network card if available
to avoid computing one; this may give poor traffic distribution
if the hash is invalid or covers less of the protocol header information.
Local hash computation can be forced per interface by setting the
.Cm use_flowid
.Xr ifconfig 8
flag.
The default for new interfaces is set via the
.Va net.link.lagg.default_use_flowid
.Xr sysctl 8 .
.Sh EXAMPLES
Create a link aggregation using LACP with two
.Xr bge 4
Gigabit Ethernet interfaces:
.Bd -literal -offset indent
# ifconfig bge0 up
# ifconfig bge1 up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto lacp laggport bge0 laggport bge1 \e
    192.168.1.1 netmask 255.255.255.0
+.Ed
+.Pp
+Create a link aggregation using the
+.Ic roundrobin
+protocol with two
+.Xr bge 4
+Gigabit Ethernet interfaces and set a limit of 500 packets
+per interface:
+.Bd -literal -offset indent
+# ifconfig bge0 up
+# ifconfig bge1 up
+# ifconfig lagg0 create
+# ifconfig lagg0 laggproto roundrobin laggport bge0 laggport bge1 \e
+    192.168.1.1 netmask 255.255.255.0
+# ifconfig lagg0 rr_limit 500
.Ed
.Pp
The following example uses an active failover interface to set up roaming
between wired and wireless networks using two network devices.
Whenever the wired master interface is unplugged, the wireless failover
device will be used:
.Bd -literal -offset indent
# ifconfig em0 up
# ifconfig ath0 ether 00:11:22:33:44:55
# ifconfig wlan0 create wlandev ath0 ssid my_net up
# ifconfig lagg0 create
# ifconfig lagg0 laggproto failover laggport em0 laggport wlan0 \e
    192.168.1.1 netmask 255.255.255.0
.Ed
.Pp
(Note that the MAC address of the wireless device is forced to match
that of the wired device as a workaround.)
.Sh SEE ALSO
.Xr ng_one2many 4 ,
.Xr ifconfig 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
device first appeared in
.Fx 6.3 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was written under the name
.Nm trunk
by
.An Reyk Floeter Aq Mt reyk@openbsd.org .
The LACP implementation was written by
.An YAMAMOTO Takashi
for
.Nx .
.Sh BUGS
There is no way to configure LACP administrative variables, including
system and port priorities.
The current implementation always performs active-mode LACP and uses
0x8000 as system and port priorities.
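For illustration, the rr_limit handler added to iflagg.c earlier in this
diff reduces to a single SIOCSLAGGOPTS ioctl on a struct lagg_reqopts with
the new ro_bkt field set. The following minimal standalone sketch performs
the same operation; it is not part of this change, and the hard-coded
lagg0 name and the AF_INET datagram socket are illustrative assumptions
borrowed from common ifconfig practice:

/*
 * Minimal sketch, not part of this change: set the round-robin
 * packet limit on lagg0 the same way ifconfig's setlaggrr_limit()
 * does.  The interface name and socket type are assumptions.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_lagg.h>

#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct lagg_reqopts ro;
	int s;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
		err(1, "socket");

	memset(&ro, 0, sizeof(ro));
	strlcpy(ro.ro_ifname, "lagg0", sizeof(ro.ro_ifname));
	ro.ro_bkt = 500;	/* packets sent per port before moving on */

	if (ioctl(s, SIOCSLAGGOPTS, &ro) != 0)
		err(1, "SIOCSLAGGOPTS");
	close(s);
	return (0);
}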
Index: projects/clang380-import/share/man/man4 =================================================================== --- projects/clang380-import/share/man/man4 (revision 294776) +++ projects/clang380-import/share/man/man4 (revision 294777) Property changes on: projects/clang380-import/share/man/man4 ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/share/man/man4:r294599-294776 Index: projects/clang380-import/share/man/man5/ext2fs.5 =================================================================== --- projects/clang380-import/share/man/man5/ext2fs.5 (revision 294776) +++ projects/clang380-import/share/man/man5/ext2fs.5 (revision 294777) @@ -1,83 +1,89 @@ .\" .\" Copyright (c) 2006 Craig Rodrigues .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 3. The name of the author may not be used to endorse or promote products .\" derived from this software without specific prior written permission .\" .\" THIS DOCUMENTATION IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR .\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES .\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. .\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, .\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT .\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, .\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY .\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd October 1, 2013 +.Dd January 23, 2016 .Dt EXT2FS 5 .Os .Sh NAME .Nm ext2fs -.Nd "Ext2fs file system" +.Nd "ext2/ext3/ext4 file system" .Sh SYNOPSIS To link into the kernel: .Bd -ragged -offset indent .Cd "options EXT2FS" .Ed .Pp To load as a kernel loadable module: .Pp .Dl "kldload ext2fs" .Sh DESCRIPTION The .Nm driver will permit the .Fx kernel to access -.Tn Ext2 +.Tn ext2 , +.Tn ext3 , +and +.Tn ext4 file systems. +The +.Tn ext4 +support is read-only. .Sh EXAMPLES To mount a .Nm volume located on .Pa /dev/ada1s1 : .Pp .Dl "mount -t ext2fs /dev/ada1s1 /mnt" .Sh SEE ALSO .Xr nmount 2 , .Xr unmount 2 , .Xr fstab 5 , .Xr mount 8 .Sh HISTORY The .Nm driver first appeared in .Fx 2.2 . .Sh AUTHORS .An -nosplit The .Nm kernel implementation was written by .An Godmar Back or modified by him using the CSRG sources. .Pp .An John Dyson and others in the .Fx Project made modifications. .Pp This manual page was written by .An Craig Rodrigues Aq Mt rodrigc@FreeBSD.org . 
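Returning briefly to the kldstat change earlier in this diff: the -h path
is just humanize_number(3) with HN_AUTOSCALE and HN_DECIMAL | HN_NOSPACE,
linked in via the new LIBADD= util. A small self-contained sketch of the
same call follows; the sample size value is made up:

/*
 * Sketch of the humanize_number(3) call used by kldstat -h; build
 * with -lutil (hence the LIBADD= util added to the Makefile).
 */
#include <sys/types.h>

#include <libutil.h>
#include <stdio.h>

int
main(void)
{
	char buf[5];
	size_t size = 348160;	/* hypothetical module size in bytes */

	humanize_number(buf, sizeof(buf), (int64_t)size, "",
	    HN_AUTOSCALE, HN_DECIMAL | HN_NOSPACE);
	printf("%5s\n", buf);	/* prints "340K" */
	return (0);
}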
Index: projects/clang380-import/share/man/man9/hashinit.9 =================================================================== --- projects/clang380-import/share/man/man9/hashinit.9 (revision 294776) +++ projects/clang380-import/share/man/man9/hashinit.9 (revision 294777) @@ -1,176 +1,178 @@ .\" .\" Copyright (c) 2004 Joseph Koshy .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' .\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR .\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE .\" LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE .\" POSSIBILITY OF SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd October 10, 2004 +.Dd January 23, 2016 .Dt HASHINIT 9 .Os .Sh NAME .Nm hashinit , hashinit_flags , hashdestroy , phashinit .Nd manage kernel hash tables .Sh SYNOPSIS .In sys/malloc.h .In sys/systm.h .In sys/queue.h .Ft "void *" .Fn hashinit "int nelements" "struct malloc_type *type" "u_long *hashmask" .Ft "void *" .Fo hashinit_flags .Fa "int nelements" "struct malloc_type *type" "u_long *hashmask" "int flags" .Fc .Ft void .Fn hashdestroy "void *hashtbl" "struct malloc_type *type" "u_long hashmask" .Ft "void *" .Fn phashinit "int nelements" "struct malloc_type *type" "u_long *nentries" .Sh DESCRIPTION The .Fn hashinit , .Fn hashinit_flags and .Fn phashinit functions allocate space for hash tables of size given by the argument .Fa nelements . .Pp The .Fn hashinit function allocates hash tables that are sized to the largest power of two less than or equal to argument .Fa nelements . The .Fn phashinit function allocates hash tables that are sized to the largest prime number less than or equal to argument .Fa nelements . The .Fn hashinit_flags function operates like .Fn hashinit but also accepts an additional argument .Fa flags which controls various options during allocation. Allocated hash tables are contiguous arrays of .Xr LIST_HEAD 3 entries, allocated using .Xr malloc 9 , and initialized using .Xr LIST_INIT 3 . The malloc arena to be used for allocation is pointed to by argument .Fa type . .Pp The .Fn hashdestroy function frees the space occupied by the hash table pointed to by argument .Fa hashtbl . Argument .Fa type determines the malloc arena to use when freeing space. The argument .Fa hashmask should be the bit mask returned by the call to .Fn hashinit that allocated the hash table. The argument .Fa flags must be used with one of the following values.
.Pp .Bl -tag -width ".Dv HASH_NOWAIT" -offset indent -compact .It Dv HASH_NOWAIT Any malloc performed by the .Fn hashinit_flags function will not be allowed to wait, and therefore may fail. .It Dv HASH_WAITOK -Any malloc performed by the +Any malloc performed by .Fn hashinit_flags function is allowed to wait for memory. +This is also the behavior of +.Fn hashinit . .El .Sh IMPLEMENTATION NOTES The largest prime hash value chosen by .Fn phashinit is 32749. .Sh RETURN VALUES The .Fn hashinit function returns a pointer to an allocated hash table and sets the location pointed to by .Fa hashmask to the bit mask to be used for computing the correct slot in the hash table. .Pp The .Fn phashinit function returns a pointer to an allocated hash table and sets the location pointed to by .Fa nentries to the number of rows in the hash table. .Sh EXAMPLES A typical example is shown below: .Bd -literal -offset indent \&... static LIST_HEAD(foo, foo) *footable; static u_long foomask; \&... footable = hashinit(32, M_FOO, &foomask); .Ed .Pp Here we allocate a hash table with 32 entries from the malloc arena pointed to by .Dv M_FOO . The mask for the allocated hash table is returned in .Va foomask . A subsequent call to .Fn hashdestroy uses the value in .Va foomask : .Bd -literal -offset indent \&... hashdestroy(footable, M_FOO, foomask); .Ed .Sh DIAGNOSTICS The .Fn hashinit and .Fn phashinit functions will panic if argument .Fa nelements is less than or equal to zero. .Pp The .Fn hashdestroy function will panic if the hash table pointed to by .Fa hashtbl is not empty. .Sh SEE ALSO .Xr LIST_HEAD 3 , .Xr malloc 9 .Sh BUGS There is no .Fn phashdestroy function, and using .Fn hashdestroy to free a hash table allocated by .Fn phashinit usually has grave consequences. Index: projects/clang380-import/share/mk/auto.obj.mk =================================================================== --- projects/clang380-import/share/mk/auto.obj.mk (revision 294776) +++ projects/clang380-import/share/mk/auto.obj.mk (revision 294777) @@ -1,65 +1,64 @@ # $FreeBSD$ -# $Id: auto.obj.mk,v 1.10 2015/04/16 16:59:00 sjg Exp $ +# $Id: auto.obj.mk,v 1.12 2015/12/16 01:57:06 sjg Exp $ # # @(#) Copyright (c) 2004, Simon J. Gerraty # # This file is provided in the hope that it will # be of use. There is absolutely NO WARRANTY. # Permission to copy, redistribute or otherwise # use this file is hereby granted provided that # the above copyright notice and this notice are # left intact. # # Please send copies of changes and bug-fixes to: # sjg@crufty.net # ECHO_TRACE ?= echo .ifndef Mkdirs # A race condition in some versions of mkdir means that it can bail # if another process made a dir that mkdir expected to. # We repeat the mkdir -p a number of times to try and work around this. # We stop looping as soon as the dir exists. # If we get to the end of the loop, a plain mkdir will issue an error. Mkdirs= Mkdirs() { \ for d in $$*; do \ for i in 1 2 3 4 5 6; do \ mkdir -p $$d; \ test -d $$d && return 0; \ done > /dev/null 2>&1; \ mkdir $$d || exit $$?; \ done; } .endif # if MKOBJDIRS is set to auto (and NOOBJ isn't defined) do some magic... # This will automatically create objdirs as needed. # Skip it if we are just doing 'clean'. .if ${MK_AUTO_OBJ:Uno} == "yes" MKOBJDIRS= auto .endif .if !defined(NOOBJ) && !defined(NO_OBJ) && ${MKOBJDIRS:Uno} == auto # Use __objdir here so it is easier to tweak without impacting # the logic.
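# (Illustrative note, values hypothetical: with MAKEOBJDIRPREFIX=/obj and
# .CURDIR=/src/bin/ls, the default below yields __objdir=/obj/src/bin/ls;
# without MAKEOBJDIRPREFIX it falls back to ${MAKEOBJDIR} or plain "obj".)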
-.if !empty(MAKEOBJDIRPREFIX) && exists(${MAKEOBJDIRPREFIX}) +.if !empty(MAKEOBJDIRPREFIX) __objdir?= ${MAKEOBJDIRPREFIX}${.CURDIR} .endif __objdir?= ${MAKEOBJDIR:Uobj} -__objdir:= ${__objdir:tA} -.if ${.OBJDIR} != ${__objdir} +__objdir:= ${__objdir} +.if ${.OBJDIR:tA} != ${__objdir:tA} # We need to chdir, make the directory if needed .if !exists(${__objdir}/) && \ (${.TARGETS} == "" || ${.TARGETS:Nclean*:N*clean:Ndestroy*} != "") # This will actually make it... __objdir_made != echo ${__objdir}/; umask ${OBJDIR_UMASK:U002}; \ ${ECHO_TRACE} "[Creating objdir ${__objdir}...]" >&2; \ ${Mkdirs}; Mkdirs ${__objdir} -__objdir:= ${__objdir:tA} .endif # This causes make to use the specified directory as .OBJDIR .OBJDIR: ${__objdir} -.if ${.OBJDIR} != ${__objdir} && ${__objdir_made:Uno:M${__objdir}/*} != "" +.if ${.OBJDIR:tA} != ${__objdir:tA} && ${__objdir_made:Uno:M${__objdir}/*} != "" .error could not use ${__objdir}: .OBJDIR=${.OBJDIR} .endif .endif .endif Index: projects/clang380-import/share/mk/bsd.dep.mk =================================================================== --- projects/clang380-import/share/mk/bsd.dep.mk (revision 294776) +++ projects/clang380-import/share/mk/bsd.dep.mk (revision 294777) @@ -1,284 +1,286 @@ # $FreeBSD$ # # The include file handles Makefile dependencies. # # # +++ variables +++ # # CTAGS A tags file generation program [gtags] # # CTAGSFLAGS Options for ctags(1) [not set] # # DEPENDFILE dependencies file [.depend] # # GTAGSFLAGS Options for gtags(1) [-o] # # HTAGSFLAGS Options for htags(1) [not set] # # MKDEP Options for ${MKDEPCMD} [not set] # # MKDEPCMD Makefile dependency list program [mkdep] # # SRCS List of source files (c, c++, assembler) # # DPSRCS List of source files which are needed for generating # dependencies, ${SRCS} are always part of it. # # +++ targets +++ # # cleandepend: # Remove depend and tags file # # depend: # Make the dependencies for the source files, and store # them in the file ${DEPENDFILE}. # # tags: # In "ctags" mode, create a tags file for the source files. # In "gtags" mode, create a (GLOBAL) gtags file for the # source files. If HTML is defined, htags(1) is also run # after gtags(1). .if !target(____) .error bsd.dep.mk cannot be included directly. .endif CTAGS?= gtags CTAGSFLAGS?= GTAGSFLAGS?= -o HTAGSFLAGS?= _MKDEPCC:= ${CC:N${CCACHE_BIN}} # XXX: DEPFLAGS can come out once Makefile.inc1 properly passes down # CXXFLAGS. .if !empty(DEPFLAGS) _MKDEPCC+= ${DEPFLAGS} .endif MKDEPCMD?= CC='${_MKDEPCC}' mkdep DEPENDFILE?= .depend +.MAKE.DEPENDFILE= ${DEPENDFILE} DEPENDFILES= ${DEPENDFILE} # Keep `tags' here, before SRCS are mangled below for `depend'. 
.if !target(tags) && defined(SRCS) && !defined(NO_TAGS) tags: ${SRCS} .if ${CTAGS:T} == "gtags" @cd ${.CURDIR} && ${CTAGS} ${GTAGSFLAGS} ${.OBJDIR} .if defined(HTML) @cd ${.CURDIR} && htags ${HTAGSFLAGS} -d ${.OBJDIR} ${.OBJDIR} .endif .else @${CTAGS} ${CTAGSFLAGS} -f /dev/stdout \ ${.ALLSRC:N*.h} | sed "s;${.CURDIR}/;;" > ${.TARGET} .endif .endif .if defined(SRCS) CLEANFILES?= .if ${MK_FAST_DEPEND} == "yes" || !exists(${.OBJDIR}/${DEPENDFILE}) .for _S in ${SRCS:N*.[dhly]} ${_S:R}.o: ${_S} .endfor .endif # Lexical analyzers .for _LSRC in ${SRCS:M*.l:N*/*} .for _LC in ${_LSRC:R}.c ${_LC}: ${_LSRC} ${LEX} ${LFLAGS} -o${.TARGET} ${.ALLSRC} .if ${MK_FAST_DEPEND} == "yes" || !exists(${.OBJDIR}/${DEPENDFILE}) ${_LC:R}.o: ${_LC} .endif SRCS:= ${SRCS:S/${_LSRC}/${_LC}/} CLEANFILES+= ${_LC} .endfor .endfor # Yacc grammars .for _YSRC in ${SRCS:M*.y:N*/*} .for _YC in ${_YSRC:R}.c SRCS:= ${SRCS:S/${_YSRC}/${_YC}/} CLEANFILES+= ${_YC} .if !empty(YFLAGS:M-d) && !empty(SRCS:My.tab.h) .ORDER: ${_YC} y.tab.h ${_YC} y.tab.h: ${_YSRC} ${YACC} ${YFLAGS} ${.ALLSRC} cp y.tab.c ${_YC} CLEANFILES+= y.tab.c y.tab.h .elif !empty(YFLAGS:M-d) .for _YH in ${_YC:R}.h ${_YH}: ${_YC} ${_YC}: ${_YSRC} ${YACC} ${YFLAGS} -o ${_YC} ${.ALLSRC} SRCS+= ${_YH} CLEANFILES+= ${_YH} .endfor .else ${_YC}: ${_YSRC} ${YACC} ${YFLAGS} -o ${_YC} ${.ALLSRC} .endif .if ${MK_FAST_DEPEND} == "yes" || !exists(${.OBJDIR}/${DEPENDFILE}) ${_YC:R}.o: ${_YC} .endif .endfor .endfor # DTrace probe definitions .if ${SRCS:M*.d} CFLAGS+= -I${.OBJDIR} .endif .for _DSRC in ${SRCS:M*.d:N*/*} .for _D in ${_DSRC:R} -DHDRS+= ${_D}.h +SRCS+= ${_D}.h ${_D}.h: ${_DSRC} ${DTRACE} ${DTRACEFLAGS} -h -s ${.ALLSRC} SRCS:= ${SRCS:S/^${_DSRC}$//} OBJS+= ${_D}.o CLEANFILES+= ${_D}.h ${_D}.o ${_D}.o: ${_DSRC} ${OBJS:S/^${_D}.o$//} - ${DTRACE} ${DTRACEFLAGS} -G -o ${.TARGET} -s ${.ALLSRC} + @rm -f ${.TARGET} + ${DTRACE} ${DTRACEFLAGS} -G -o ${.TARGET} -s ${.ALLSRC:N*.h} .if defined(LIB) CLEANFILES+= ${_D}.So ${_D}.po ${_D}.So: ${_DSRC} ${SOBJS:S/^${_D}.So$//} - ${DTRACE} ${DTRACEFLAGS} -G -o ${.TARGET} -s ${.ALLSRC} + @rm -f ${.TARGET} + ${DTRACE} ${DTRACEFLAGS} -G -o ${.TARGET} -s ${.ALLSRC:N*.h} ${_D}.po: ${_DSRC} ${POBJS:S/^${_D}.po$//} - ${DTRACE} ${DTRACEFLAGS} -G -o ${.TARGET} -s ${.ALLSRC} + @rm -f ${.TARGET} + ${DTRACE} ${DTRACEFLAGS} -G -o ${.TARGET} -s ${.ALLSRC:N*.h} .endif .endfor .endfor -beforedepend: ${DHDRS} -beforebuild: ${DHDRS} .if ${MK_FAST_DEPEND} == "yes" && \ (${.MAKE.MODE:Mmeta} == "" || ${.MAKE.MODE:Mnofilemon} != "") DEPENDFILES+= ${DEPENDFILE}.* DEPEND_MP?= -MP # Handle OBJS=../somefile.o hacks. Just replace '/' rather than use :T to # avoid collisions. DEPEND_FILTER= C,/,_,g DEPEND_CFLAGS+= -MD ${DEPEND_MP} -MF${DEPENDFILE}.${.TARGET:${DEPEND_FILTER}} DEPEND_CFLAGS+= -MT${.TARGET} .if defined(.PARSEDIR) # Only add in DEPEND_CFLAGS for CFLAGS on files we expect from DEPENDOBJS # as those are the only ones we will include. 
DEPEND_CFLAGS_CONDITION= !empty(DEPENDOBJS:M${.TARGET:${DEPEND_FILTER}}) CFLAGS+= ${${DEPEND_CFLAGS_CONDITION}:?${DEPEND_CFLAGS}:} .else CFLAGS+= ${DEPEND_CFLAGS} .endif DEPENDSRCS= ${SRCS:M*.[cSC]} ${SRCS:M*.cxx} ${SRCS:M*.cpp} ${SRCS:M*.cc} .if !empty(DEPENDSRCS) DEPENDOBJS+= ${DEPENDSRCS:R:S,$,.o,} .endif DEPENDFILES_OBJS= ${DEPENDOBJS:O:u:${DEPEND_FILTER}:C/^/${DEPENDFILE}./} .if ${.MAKEFLAGS:M-V} == "" .for __depend_obj in ${DEPENDFILES_OBJS} .sinclude "${__depend_obj}" .endfor .endif .endif # ${MK_FAST_DEPEND} == "yes" .endif # defined(SRCS) .if ${MK_DIRDEPS_BUILD} == "yes" .include # this depend: bypasses that below # the dependency helps when bootstrapping depend: beforedepend ${DPSRCS} ${SRCS} afterdepend beforedepend: afterdepend: beforedepend .endif .if !target(depend) .if defined(SRCS) depend: beforedepend ${DEPENDFILE} afterdepend # Tell bmake not to look for generated files via .PATH .NOPATH: ${DEPENDFILE} ${DEPENDFILES_OBJS} .if ${MK_FAST_DEPEND} == "no" # Capture -include from CFLAGS. # This could be simpler with bmake :tW but needs to support fmake for MFC. _CFLAGS_INCLUDES= ${CFLAGS:Q:S/\\ /,/g:C/-include,/-include%/g:C/,/ /g:M-include*:C/%/ /g} _CXXFLAGS_INCLUDES= ${CXXFLAGS:Q:S/\\ /,/g:C/-include,/-include%/g:C/,/ /g:M-include*:C/%/ /g} # XXX: Temporary hack to workaround .depend files not tracking -include .if !empty(_CFLAGS_INCLUDES) ${OBJS} ${POBJS} ${SOBJS}: ${_CFLAGS_INCLUDES:M*.h} .endif .if !empty(_CXXFLAGS_INCLUDES) ${OBJS} ${POBJS} ${SOBJS}: ${_CXXFLAGS_INCLUDES:M*.h} .endif # Different types of sources are compiled with slightly different flags. # Split up the sources, and filter out headers and non-applicable flags. MKDEP_CFLAGS= ${CFLAGS:M-nostdinc*} ${CFLAGS:M-[BIDU]*} ${CFLAGS:M-std=*} \ ${CFLAGS:M-ansi} ${_CFLAGS_INCLUDES} MKDEP_CXXFLAGS= ${CXXFLAGS:M-nostdinc*} ${CXXFLAGS:M-[BIDU]*} \ ${CXXFLAGS:M-std=*} ${CXXFLAGS:M-ansi} ${CXXFLAGS:M-stdlib=*} \ ${_CXXFLAGS_INCLUDES} .endif # ${MK_FAST_DEPEND} == "no" DPSRCS+= ${SRCS} ${DEPENDFILE}: ${DPSRCS} .if ${MK_FAST_DEPEND} == "no" rm -f ${DEPENDFILE} .if !empty(DPSRCS:M*.[cS]) ${MKDEPCMD} -f ${DEPENDFILE} -a ${MKDEP} \ ${MKDEP_CFLAGS} ${.ALLSRC:M*.[cS]} .endif .if !empty(DPSRCS:M*.cc) || !empty(DPSRCS:M*.C) || !empty(DPSRCS:M*.cpp) || \ !empty(DPSRCS:M*.cxx) ${MKDEPCMD} -f ${DEPENDFILE} -a ${MKDEP} \ ${MKDEP_CXXFLAGS} \ ${.ALLSRC:M*.cc} ${.ALLSRC:M*.C} ${.ALLSRC:M*.cpp} ${.ALLSRC:M*.cxx} .else .endif .else : > ${.TARGET} .endif # ${MK_FAST_DEPEND} == "no" .if target(_EXTRADEPEND) _EXTRADEPEND: .USE ${DEPENDFILE}: _EXTRADEPEND .endif .ORDER: ${DEPENDFILE} afterdepend .else depend: beforedepend afterdepend .endif .if !target(beforedepend) beforedepend: .else .ORDER: beforedepend ${DEPENDFILE} .ORDER: beforedepend afterdepend .endif .if !target(afterdepend) afterdepend: .endif .endif .if !target(cleandepend) cleandepend: .if defined(SRCS) .if ${CTAGS:T} == "gtags" rm -f ${DEPENDFILES} GPATH GRTAGS GSYMS GTAGS .if defined(HTML) rm -rf HTML .endif .else rm -f ${DEPENDFILES} tags .endif .endif .endif .if !target(checkdpadd) && (defined(DPADD) || defined(LDADD)) _LDADD_FROM_DPADD= ${DPADD:R:T:C;^lib(.*)$;-l\1;g} # Ignore -Wl,--start-group/-Wl,--end-group as it might be required in the # LDADD list due to unresolved symbols _LDADD_CANONICALIZED= ${LDADD:N:R:T:C;^lib(.*)$;-l\1;g:N-Wl,--[es]*-group} checkdpadd: .if ${_LDADD_FROM_DPADD} != ${_LDADD_CANONICALIZED} @echo ${.CURDIR} @echo "DPADD -> ${_LDADD_FROM_DPADD}" @echo "LDADD -> ${_LDADD_CANONICALIZED}" .endif .endif Index: 
projects/clang380-import/share/mk/gendirdeps.mk =================================================================== --- projects/clang380-import/share/mk/gendirdeps.mk (revision 294776) +++ projects/clang380-import/share/mk/gendirdeps.mk (revision 294777) @@ -1,347 +1,347 @@ # $FreeBSD$ -# $Id: gendirdeps.mk,v 1.27 2015/06/08 20:55:11 sjg Exp $ +# $Id: gendirdeps.mk,v 1.29 2015/10/03 05:00:46 sjg Exp $ # Copyright (c) 2010-2013, Juniper Networks, Inc. # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT # OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # # This makefile [re]generates ${.MAKE.DEPENDFILE} # .include # Assumptions: # RELDIR is the relative path from ${SRCTOP} to ${_CURDIR} # (SRCTOP is ${SB}/src) # _CURDIR is the absolute version of ${.CURDIR} # _OBJDIR is the absolute version of ${.OBJDIR} # _objroot is realpath of ${_OBJTOP} without ${MACHINE} # this may be different from _OBJROOT if $SB/obj is a # symlink to another filesystem. # _objroot must be a prefix match for _objtop .MAIN: all # keep this simple .MAKE.MODE = compat all: _CURDIR ?= ${.CURDIR} _OBJDIR ?= ${.OBJDIR} _OBJTOP ?= ${OBJTOP} _OBJROOT ?= ${OBJROOT:U${_OBJTOP}} .if ${_OBJROOT:M*/} _slash=/ .else _slash= .endif _objroot ?= ${_OBJROOT:tA}${_slash} _this = ${.PARSEDIR}/${.PARSEFILE} # remember what to make _DEPENDFILE := ${_CURDIR}/${.MAKE.DEPENDFILE:T} # We do _not_ want to read our own output! .MAKE.DEPENDFILE = /dev/null # caller should have set this META_FILES ?= ${.MAKE.META.FILES} .if !empty(META_FILES) .if ${.MAKE.LEVEL} > 0 && !empty(GENDIRDEPS_FILTER) # so we can compare below .-include <${_DEPENDFILE}> # yes, I mean :U with no value _DIRDEPS := ${DIRDEPS:U:O:u} .endif META_FILES := ${META_FILES:T:O:u} .export META_FILES # pickup customizations .-include "local.gendirdeps.mk" # these are actually prefixes that we'll skip # they should all be absolute paths SKIP_GENDIRDEPS ?= .if !empty(SKIP_GENDIRDEPS) _skip_gendirdeps = egrep -v '^(${SKIP_GENDIRDEPS:O:u:ts|})' | .else _skip_gendirdeps = .endif # Below we will turn _{VAR} into ${VAR} which keeps this simple # GENDIRDEPS_FILTER_DIR_VARS is a list of dirs to be substituted for. # GENDIRDEPS_FILTER_VARS is more general. # In each case order matters.
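# Illustrative sketch (names and values hypothetical): with
#   GENDIRDEPS_FILTER_DIR_VARS = HOST_TARGET
# and HOST_TARGET=freebsd11-amd64, a dependency dir such as
#   stage/freebsd11-amd64/usr/include
# is first rewritten to stage/_{HOST_TARGET}/usr/include, and the sed
# pass near the end of this file turns _{...} into ${...}, so the
# generated dependfile refers to stage/${HOST_TARGET}/usr/include.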
.if !empty(GENDIRDEPS_FILTER_DIR_VARS) GENDIRDEPS_FILTER += ${GENDIRDEPS_FILTER_DIR_VARS:@v@S,${$v},_{${v}},@} .endif .if !empty(GENDIRDEPS_FILTER_VARS) GENDIRDEPS_FILTER += ${GENDIRDEPS_FILTER_VARS:@v@S,/${$v}/,/_{${v}}/,@:NS,//,*:u} .endif # this (*should* be set in meta.sys.mk) # is the script that extracts what we want. META2DEPS ?= ${.PARSEDIR}/meta2deps.sh META2DEPS := ${META2DEPS} .if ${DEBUG_GENDIRDEPS:Uno:@x@${RELDIR:M$x}@} != "" && ${DEBUG_GENDIRDEPS:Uno:Mmeta2d*} != "" _time = time _sh_x = sh -x _py_d = -ddd .else _time = _sh_x = _py_d = .endif .if ${META2DEPS:E} == "py" # we can afford to do this all the time. DPDEPS ?= no META2DEPS_CMD = ${_time} ${PYTHON} ${META2DEPS} ${_py_d} .if ${DPDEPS:tl} != "no" META2DEPS_CMD += -D ${DPDEPS} .endif META2DEPS_FILTER = sed 's,^src:,${SRCTOP}/,;s,^\([^/]\),${OBJTOP}/\1,' | .elif ${META2DEPS:E} == "sh" META2DEPS_CMD = ${_time} ${_sh_x} ${META2DEPS} OBJTOP=${_OBJTOP} .else META2DEPS_CMD ?= ${META2DEPS} .endif .if ${TARGET_OBJ_SPEC:U${MACHINE}} != ${MACHINE} META2DEPS_CMD += -T ${TARGET_OBJ_SPEC} .endif META2DEPS_CMD += \ -R ${RELDIR} -H ${HOST_TARGET} \ ${M2D_OBJROOTS:O:u:@o@-O $o@} M2D_OBJROOTS += ${OBJTOP} ${_OBJROOT} ${_objroot} .if defined(SB_OBJROOT) M2D_OBJROOTS += ${SB_OBJROOT} .endif .if ${.MAKE.DEPENDFILE_PREFERENCE:U${.MAKE.DEPENDFILE}:M*.${MACHINE}} == "" # meta2deps.py only groks objroot # so we need to give it what it expects # and tell it not to add machine qualifiers META2DEPS_ARGS += MACHINE=none .endif .if defined(SB_BACKING_SB) META2DEPS_CMD += -S ${SB_BACKING_SB}/src M2D_OBJROOTS += ${SB_BACKING_SB}/${SB_OBJPREFIX} .endif # we are only interested in the dirs -# sepecifically those we read something from. +# specifically those we read something from. # we canonicalize them to keep things simple # if we are using a split-fs sandbox, it gets a little messier. _objtop := ${_OBJTOP:tA} dir_list != cd ${_OBJDIR} && \ ${META2DEPS_CMD} MACHINE=${MACHINE} \ SRCTOP=${SRCTOP} RELDIR=${RELDIR} CURDIR=${_CURDIR} \ ${META2DEPS_ARGS} \ ${META_FILES:O:u} | ${META2DEPS_FILTER} ${_skip_gendirdeps} \ sed 's,//*$$,,;s,\.${HOST_TARGET}$$,.host,' .if ${dir_list:M*ERROR\:*} != "" .warning ${dir_list:tW:C,.*(ERROR),\1,} .warning Skipping ${_DEPENDFILE:S,${SRCTOP}/,,} # we are not going to update anything .else dpadd_dir_list= .if !empty(DPADD) _nonlibs := ${DPADD:T:Nlib*:N*include} .if !empty(_nonlibs) ddep_list = .for f in ${_nonlibs:@x@${DPADD:M*/$x}@} .if exists($f.dirdep) ddep_list += $f.dirdep .elif exists(${f:H}.dirdep) ddep_list += ${f:H}.dirdep .else dir_list += ${f:H:tA} dpadd_dir_list += ${f:H:tA} .endif .endfor .if !empty(ddep_list) ddeps != cat ${ddep_list:O:u} | ${META2DEPS_FILTER} ${_skip_gendirdeps} \ sed 's,//*$$,,;s,\.${HOST_TARGET}$$,.host,;s,\.${MACHINE}$$,,' .if ${DEBUG_GENDIRDEPS:Uno:@x@${RELDIR:M$x}@} != "" .info ${RELDIR}: raw_dir_list='${dir_list}' .info ${RELDIR}: ddeps='${ddeps}' .endif dir_list += ${ddeps} .endif .endif .endif # DIRDEPS represent things that had to have been built first # so they should all be under OBJTOP.
# Note that ${_OBJTOP}/bsd/include/machine will get reported # to us as $SRCTOP/bsd/sys/$MACHINE_ARCH/include meaning we # will want to visit bsd/include # so we add # ${"${dir_list:M*bsd/sys/${MACHINE_ARCH}/include}":?bsd/include:} # to GENDIRDEPS_DIR_LIST_XTRAS _objtops = ${OBJTOP} ${_OBJTOP} ${_objtop} _objtops := ${_objtops:O:u} dirdep_list = \ ${_objtops:@o@${dir_list:M$o*/*:C,$o[^/]*/,,}@} \ ${GENDIRDEPS_DIR_LIST_XTRAS} # sort longest first M2D_OBJROOTS := ${M2D_OBJROOTS:O:u:[-1..1]} # anything we use from an object dir other than ours # needs to be qualified with its . suffix # (we used the pseudo machine "host" for the HOST_TARGET). skip_ql= ${SRCTOP}* ${_objtops:@o@$o*@} .for o in ${M2D_OBJROOTS:${skip_ql:${M_ListToSkip}}} # we need := so only skip_ql to this point applies ql.$o := ${dir_list:${skip_ql:${M_ListToSkip}}:M$o*/*/*:C,$o([^/]+)/(.*),\2.\1,:S,.${HOST_TARGET},.host,} qualdir_list += ${ql.$o} .if ${DEBUG_GENDIRDEPS:Uno:@x@${RELDIR:M$x}@} != "" .info ${RELDIR}: o=$o ${ql.$o qualdir_list:L:@v@$v=${$v}@} .endif skip_ql+= $o* .endfor dirdep_list := ${dirdep_list:O:u} qualdir_list := ${qualdir_list:N*.${MACHINE}:O:u} DIRDEPS = \ ${dirdep_list:N${RELDIR}:N${RELDIR}/*} \ ${qualdir_list:N${RELDIR}.*:N${RELDIR}/*} # We only consider things below $RELDIR/ if they have a makefile. # This is the same test that _DIRDEP_USE applies. # We have to do a double test with dirdep_list as it _may_ contain # qualified dirs - if we got anything from a stage dir. # qualdir_list we know are all qualified. # It would be nice to perform this check for all of DIRDEPS, # but we cannot assume that all of the tree is present, # in fact we can only assume that RELDIR is. DIRDEPS += \ ${dirdep_list:M${RELDIR}/*:@d@${.MAKE.MAKEFILE_PREFERENCE:@m@${exists(${SRCTOP}/$d/$m):?$d:${exists(${SRCTOP}/${d:R}/$m):?$d:}}@}@} \ ${qualdir_list:M${RELDIR}/*:@d@${.MAKE.MAKEFILE_PREFERENCE:@m@${exists(${SRCTOP}/${d:R}/$m):?$d:}@}@} DIRDEPS := ${DIRDEPS:${GENDIRDEPS_FILTER:UNno:ts:}:C,//+,/,g:O:u} .if ${DEBUG_GENDIRDEPS:Uno:@x@${RELDIR:M$x}@} != "" .info ${RELDIR}: M2D_OBJROOTS=${M2D_OBJROOTS} .info ${RELDIR}: dir_list='${dir_list}' .info ${RELDIR}: dpadd_dir_list='${dpadd_dir_list}' .info ${RELDIR}: dirdep_list='${dirdep_list}' .info ${RELDIR}: qualdir_list='${qualdir_list}' .info ${RELDIR}: SKIP_GENDIRDEPS='${SKIP_GENDIRDEPS}' .info ${RELDIR}: GENDIRDEPS_FILTER='${GENDIRDEPS_FILTER}' .info ${RELDIR}: FORCE_DPADD='${DPADD}' .info ${RELDIR}: DIRDEPS='${DIRDEPS}' .endif # SRC_DIRDEPS is for checkout logic src_dirdep_list = \ ${dir_list:M${SRCTOP}/*:S,${SRCTOP}/,,} SRC_DIRDEPS = \ ${src_dirdep_list:N${RELDIR}:N${RELDIR}/*:C,(/h)/.*,,} SRC_DIRDEPS := ${SRC_DIRDEPS:${GENDIRDEPS_SRC_FILTER:UN/*:ts:}:C,//+,/,g:O:u} # if you want to capture SRC_DIRDEPS in .MAKE.DEPENDFILE put # SRC_DIRDEPS_FILE = ${_DEPENDFILE} # in local.gendirdeps.mk .if ${SRC_DIRDEPS_FILE:Uno:tl} != "no" ECHO_SRC_DIRDEPS = echo 'SRC_DIRDEPS = \'; echo '${SRC_DIRDEPS:@d@ $d \\${.newline}@}'; echo; .if ${SRC_DIRDEPS_FILE:T} == ${_DEPENDFILE:T} _include_src_dirdeps = ${ECHO_SRC_DIRDEPS} .else all: ${SRC_DIRDEPS_FILE} .if !target(${SRC_DIRDEPS_FILE}) ${SRC_DIRDEPS_FILE}: ${META_FILES} ${_this} ${META2DEPS} @(${ECHO_SRC_DIRDEPS}) > $@ .endif .endif .endif _include_src_dirdeps ?= all: ${_DEPENDFILE} # if this is going to exist it would be there by now .if !exists(.depend) CAT_DEPEND = /dev/null .endif CAT_DEPEND ?= .depend .if !empty(_DIRDEPS) && ${DIRDEPS} != ${_DIRDEPS} # we may have changed a filter .PHONY: ${_DEPENDFILE} .endif # 'cat .depend' should suffice, but
if we are mixing build modes # .depend may contain things we don't want. # The sed command at the end of the stream, allows for the filters # to output _{VAR} tokens which we will turn into proper ${VAR} references. ${_DEPENDFILE}: ${CAT_DEPEND:M.depend} ${META_FILES:O:u:@m@${exists($m):?$m:}@} ${_this} ${META2DEPS} @(${GENDIRDEPS_HEADER} echo '# Autogenerated - do NOT edit!'; echo; \ echo 'DIRDEPS = \'; \ echo '${DIRDEPS:@d@ $d \\${.newline}@}'; echo; \ ${_include_src_dirdeps} \ echo '.include '; \ echo; \ echo '.if $${DEP_RELDIR} == $${_DEP_RELDIR}'; \ echo '# local dependencies - needed for -jN in clean tree'; \ [ -s ${CAT_DEPEND} ] && { grep : ${CAT_DEPEND} | grep -v '[/\\]'; }; \ echo '.endif' ) | sed 's,_\([{(]\),$$\1,g' > $@.new${.MAKE.PID} @${InstallNew}; InstallNew -s $@.new${.MAKE.PID} .endif # meta2deps failed .elif !empty(SUBDIR) DIRDEPS := ${SUBDIR:S,^,${RELDIR}/,:O:u} all: ${_DEPENDFILE} ${_DEPENDFILE}: ${MAKEFILE} ${_this} @(${GENDIRDEPS_HEADER} echo '# Autogenerated - do NOT edit!'; echo; \ echo 'DIRDEPS = \'; \ echo '${DIRDEPS:@d@ $d \\${.newline}@}'; echo; \ echo '.include '; \ echo ) | sed 's,_\([{(]\),$$\1,g' > $@.new @${InstallNew}; InstallNew $@.new .else # nothing to do all ${_DEPENDFILE}: .endif ${_DEPENDFILE}: .PRECIOUS Index: projects/clang380-import/share/mk/host-target.mk =================================================================== --- projects/clang380-import/share/mk/host-target.mk (revision 294776) +++ projects/clang380-import/share/mk/host-target.mk (revision 294777) @@ -1,36 +1,45 @@ # $FreeBSD$ # RCSid: -# $Id: host-target.mk,v 1.7 2014/05/16 17:54:52 sjg Exp $ +# $Id: host-target.mk,v 1.11 2015/10/25 00:07:20 sjg Exp $ # Host platform information; may be overridden .if !defined(_HOST_OSNAME) _HOST_OSNAME != uname -s .export _HOST_OSNAME .endif .if !defined(_HOST_OSREL) _HOST_OSREL != uname -r .export _HOST_OSREL .endif +.if !defined(_HOST_MACHINE) +_HOST_MACHINE != uname -m +.export _HOST_MACHINE +.endif .if !defined(_HOST_ARCH) -_HOST_ARCH != uname -p 2>/dev/null || uname -m +# for NetBSD prefer $MACHINE (amd64 rather than x86_64) +.if ${_HOST_OSNAME:NNetBSD} == "" +_HOST_ARCH := ${_HOST_MACHINE} +.else +_HOST_ARCH != uname -p 2> /dev/null || uname -m # uname -p may produce garbage on linux -.if ${_HOST_ARCH:[\#]} > 1 -_HOST_ARCH != uname -m +.if ${_HOST_ARCH:[\#]} > 1 || ${_HOST_ARCH:Nunknown} == "" +_HOST_ARCH := ${_HOST_MACHINE} .endif +.endif .export _HOST_ARCH .endif .if !defined(HOST_MACHINE) -HOST_MACHINE != uname -m +HOST_MACHINE := ${_HOST_MACHINE} .export HOST_MACHINE .endif HOST_OSMAJOR := ${_HOST_OSREL:C/[^0-9].*//} -HOST_OSTYPE := ${_HOST_OSNAME}-${_HOST_OSREL:C/\([^\)]*\)//}-${_HOST_ARCH} +HOST_OSTYPE := ${_HOST_OSNAME:S,/,,g}-${_HOST_OSREL:C/\([^\)]*\)//}-${_HOST_ARCH} HOST_OS := ${_HOST_OSNAME} host_os := ${_HOST_OSNAME:tl} -HOST_TARGET := ${host_os}${HOST_OSMAJOR}-${_HOST_ARCH} +HOST_TARGET := ${host_os:S,/,,g}${HOST_OSMAJOR}-${_HOST_ARCH} # tr is insanely non-portable, accommodate the lowest common denominator TR ?= tr toLower = ${TR} 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' toUpper = ${TR} 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' Index: projects/clang380-import/share/mk/meta.subdir.mk =================================================================== --- projects/clang380-import/share/mk/meta.subdir.mk (revision 294776) +++ projects/clang380-import/share/mk/meta.subdir.mk (revision 294777) @@ -1,80 +1,80 @@ # $FreeBSD$ -# $Id: meta.subdir.mk,v 1.10 2012/07/03 05:26:46 sjg Exp $ +# $Id: 
meta.subdir.mk,v 1.11 2015/11/24 22:26:51 sjg Exp $ # # @(#) Copyright (c) 2010, Simon J. Gerraty # # This file is provided in the hope that it will # be of use. There is absolutely NO WARRANTY. # Permission to copy, redistribute or otherwise # use this file is hereby granted provided that # the above copyright notice and this notice are # left intact. # # Please send copies of changes and bug-fixes to: # sjg@crufty.net # .if !defined(NO_SUBDIR) && !empty(SUBDIR) .if make(destroy*) || make(clean*) .MAKE.MODE = compat .if !commands(destroy) .-include .endif .elif ${.MAKE.LEVEL} == 0 .MAIN: all .if !exists(${.CURDIR}/${.MAKE.DEPENDFILE:T}) || make(gendirdeps) # start with this DIRDEPS = ${SUBDIR:N.WAIT:O:u:@d@${RELDIR}/$d@} .if make(gendirdeps) .include .else # this is the cunning bit # actually it is probably a bit risky # since we may pickup subdirs which are not relevant # the alternative is a walk through the tree though # which is difficult without a sub-make. .if defined(BOOTSTRAP_DEPENDFILES) _find_name = ${.MAKE.MAKEFILE_PREFERENCE:@m@-o -name $m@:S,^-o,,1} DIRDEPS = ${_subdeps:H:O:u:@d@${RELDIR}/$d@} .elif ${.MAKE.DEPENDFILE:E} == ${MACHINE} && defined(ALL_MACHINES) # we want to find Makefile.depend.* ie for all machines # and turn the dirs into dir. _find_name = -name '${.MAKE.DEPENDFILE:T:R}*' DIRDEPS = ${_subdeps:O:u:${NIgnoreFiles}:@d@${RELDIR}/${d:H}.${d:E}@:S,.${MACHINE}$,,:S,.depend$,,} .else # much simpler _find_name = -name ${.MAKE.DEPENDFILE:T} .if ${.MAKE.DEPENDFILE:E} == ${MACHINE} _find_name += -o -name ${.MAKE.DEPENDFILE:T:R} .endif DIRDEPS = ${_subdeps:H:O:u:@d@${RELDIR}/$d@} .endif _subdeps != cd ${.CURDIR} && \ find ${SUBDIR:N.WAIT} -type f \( ${_find_name} \) -print -o \ -name .svn -prune 2> /dev/null; echo .if empty(_subdeps) DIRDEPS = .else # clean up if needed -DIRDEPS := ${DIRDEPS:S,^./,,:S,/./,/,g:${SUBDIREPS_FILTER:Uu}} +DIRDEPS := ${DIRDEPS:S,^./,,:S,/./,/,g:${SUBDIRDEPS_FILTER:Uu}} .endif # we just dealt with it, if we leave it defined, # dirdeps.mk will compute some interesting combinations. .undef ALL_MACHINES DEP_RELDIR = ${RELDIR} .include .endif .endif .else all: .PHONY .endif .endif Index: projects/clang380-import/share =================================================================== --- projects/clang380-import/share (revision 294776) +++ projects/clang380-import/share (revision 294777) Property changes on: projects/clang380-import/share ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/share:r294599-294776 Index: projects/clang380-import/sys/amd64/linux/linux.h =================================================================== --- projects/clang380-import/sys/amd64/linux/linux.h (revision 294776) +++ projects/clang380-import/sys/amd64/linux/linux.h (revision 294777) @@ -1,549 +1,549 @@ /*- * Copyright (c) 2013 Dmitry Chagin * Copyright (c) 1994-1996 SĂžren Schmidt * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer * in this position and unchanged. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. 
The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _AMD64_LINUX_H_ #define _AMD64_LINUX_H_ #include #include /* * debugging support */ extern u_char linux_debug_map[]; #define ldebug(name) isclr(linux_debug_map, LINUX_SYS_linux_ ## name) #define ARGS(nm, fmt) "linux(%ld/%ld): "#nm"("fmt")\n", \ (long)td->td_proc->p_pid, (long)td->td_tid #define LMSG(fmt) "linux(%ld/%ld): "fmt"\n", \ (long)td->td_proc->p_pid, (long)td->td_tid #define LINUX_DTRACE linuxulator #define PTRIN(v) (void *)(v) #define PTROUT(v) (uintptr_t)(v) #define CP(src,dst,fld) do { (dst).fld = (src).fld; } while (0) #define CP2(src,dst,sfld,dfld) do { (dst).dfld = (src).sfld; } while (0) #define PTRIN_CP(src,dst,fld) \ do { (dst).fld = PTRIN((src).fld); } while (0) /* * Provide a separate set of types for the Linux types. */ typedef int32_t l_int; typedef int64_t l_long; typedef int16_t l_short; typedef uint32_t l_uint; typedef uint64_t l_ulong; typedef uint16_t l_ushort; typedef l_ulong l_uintptr_t; typedef l_long l_clock_t; typedef l_int l_daddr_t; typedef l_ulong l_dev_t; typedef l_uint l_gid_t; typedef l_uint l_uid_t; typedef l_ulong l_ino_t; typedef l_int l_key_t; typedef l_long l_loff_t; typedef l_uint l_mode_t; typedef l_long l_off_t; typedef l_int l_pid_t; typedef l_ulong l_size_t; typedef l_long l_ssize_t; typedef l_long l_suseconds_t; typedef l_long l_time_t; typedef l_int l_timer_t; typedef l_int l_mqd_t; typedef l_size_t l_socklen_t; typedef l_ulong l_fd_mask; typedef struct { l_int val[2]; } l_fsid_t; typedef struct { l_time_t tv_sec; l_suseconds_t tv_usec; } l_timeval; #define l_fd_set fd_set /* * Miscellaneous */ #define LINUX_NAME_MAX 255 #define LINUX_CTL_MAXNAME 10 #define LINUX_AT_COUNT 19 /* Count of used aux entry types. 
*/ struct l___sysctl_args { l_uintptr_t name; l_int nlen; l_uintptr_t oldval; l_uintptr_t oldlenp; l_uintptr_t newval; l_size_t newlen; l_ulong __spare[4]; }; /* Scheduling policies */ #define LINUX_SCHED_OTHER 0 #define LINUX_SCHED_FIFO 1 #define LINUX_SCHED_RR 2 /* Resource limits */ #define LINUX_RLIMIT_CPU 0 #define LINUX_RLIMIT_FSIZE 1 #define LINUX_RLIMIT_DATA 2 #define LINUX_RLIMIT_STACK 3 #define LINUX_RLIMIT_CORE 4 #define LINUX_RLIMIT_RSS 5 #define LINUX_RLIMIT_NPROC 6 #define LINUX_RLIMIT_NOFILE 7 #define LINUX_RLIMIT_MEMLOCK 8 #define LINUX_RLIMIT_AS 9 /* Address space limit */ #define LINUX_RLIM_NLIMITS 10 struct l_rlimit { l_ulong rlim_cur; l_ulong rlim_max; }; /* mmap options */ #define LINUX_MAP_SHARED 0x0001 #define LINUX_MAP_PRIVATE 0x0002 #define LINUX_MAP_FIXED 0x0010 #define LINUX_MAP_ANON 0x0020 #define LINUX_MAP_GROWSDOWN 0x0100 /* * stat family of syscalls */ struct l_timespec { l_time_t tv_sec; l_long tv_nsec; }; struct l_newstat { l_dev_t st_dev; l_ino_t st_ino; l_ulong st_nlink; l_uint st_mode; l_uid_t st_uid; l_gid_t st_gid; l_uint __st_pad1; l_dev_t st_rdev; l_off_t st_size; l_long st_blksize; l_long st_blocks; struct l_timespec st_atim; struct l_timespec st_mtim; struct l_timespec st_ctim; l_long __unused1; l_long __unused2; l_long __unused3; }; /* sigaction flags */ #define LINUX_SA_NOCLDSTOP 0x00000001 #define LINUX_SA_NOCLDWAIT 0x00000002 #define LINUX_SA_SIGINFO 0x00000004 #define LINUX_SA_RESTORER 0x04000000 #define LINUX_SA_ONSTACK 0x08000000 #define LINUX_SA_RESTART 0x10000000 #define LINUX_SA_INTERRUPT 0x20000000 #define LINUX_SA_NOMASK 0x40000000 #define LINUX_SA_ONESHOT 0x80000000 /* sigprocmask actions */ #define LINUX_SIG_BLOCK 0 #define LINUX_SIG_UNBLOCK 1 #define LINUX_SIG_SETMASK 2 /* sigaltstack */ #define LINUX_MINSIGSTKSZ 2048 typedef void (*l_handler_t)(l_int); typedef struct { l_handler_t lsa_handler; l_ulong lsa_flags; l_uintptr_t lsa_restorer; l_sigset_t lsa_mask; } l_sigaction_t; typedef struct { l_uintptr_t ss_sp; l_int ss_flags; l_size_t ss_size; } l_stack_t; struct l_fpstate { u_int16_t cwd; u_int16_t swd; u_int16_t twd; u_int16_t fop; u_int64_t rip; u_int64_t rdp; u_int32_t mxcsr; u_int32_t mxcsr_mask; u_int32_t st_space[32]; u_int32_t xmm_space[64]; u_int32_t reserved2[24]; }; struct l_sigcontext { l_ulong sc_r8; l_ulong sc_r9; l_ulong sc_r10; l_ulong sc_r11; l_ulong sc_r12; l_ulong sc_r13; l_ulong sc_r14; l_ulong sc_r15; l_ulong sc_rdi; l_ulong sc_rsi; l_ulong sc_rbp; l_ulong sc_rbx; l_ulong sc_rdx; l_ulong sc_rax; l_ulong sc_rcx; l_ulong sc_rsp; l_ulong sc_rip; l_ulong sc_rflags; l_ushort sc_cs; l_ushort sc_gs; l_ushort sc_fs; l_ushort sc___pad0; l_ulong sc_err; l_ulong sc_trapno; l_sigset_t sc_mask; l_ulong sc_cr2; struct l_fpstate *sc_fpstate; l_ulong sc_reserved1[8]; }; struct l_ucontext { l_ulong uc_flags; l_uintptr_t uc_link; l_stack_t uc_stack; struct l_sigcontext uc_mcontext; l_sigset_t uc_sigmask; }; #define LINUX_SI_PREAMBLE_SIZE (4 * sizeof(int)) #define LINUX_SI_MAX_SIZE 128 #define LINUX_SI_PAD_SIZE ((LINUX_SI_MAX_SIZE - \ LINUX_SI_PREAMBLE_SIZE) / sizeof(l_int)) typedef union l_sigval { l_int sival_int; l_uintptr_t sival_ptr; } l_sigval_t; typedef struct l_siginfo { l_int lsi_signo; l_int lsi_errno; l_int lsi_code; union { l_int _pad[LINUX_SI_PAD_SIZE]; struct { l_pid_t _pid; l_uid_t _uid; } _kill; struct { l_timer_t _tid; l_int _overrun; char _pad[sizeof(l_uid_t) - sizeof(int)]; union l_sigval _sigval; l_uint _sys_private; } _timer; struct { l_pid_t _pid; /* sender's pid */ l_uid_t _uid; /* sender's uid */ union 
l_sigval _sigval; } _rt; struct { l_pid_t _pid; /* which child */ l_uid_t _uid; /* sender's uid */ l_int _status; /* exit code */ l_clock_t _utime; l_clock_t _stime; } _sigchld; struct { l_uintptr_t _addr; /* Faulting insn/memory ref. */ } _sigfault; struct { l_long _band; /* POLL_IN,POLL_OUT,POLL_MSG */ l_int _fd; } _sigpoll; } _sifields; } l_siginfo_t; #define lsi_pid _sifields._kill._pid #define lsi_uid _sifields._kill._uid #define lsi_tid _sifields._timer._tid #define lsi_overrun _sifields._timer._overrun #define lsi_sys_private _sifields._timer._sys_private #define lsi_status _sifields._sigchld._status #define lsi_utime _sifields._sigchld._utime #define lsi_stime _sifields._sigchld._stime #define lsi_value _sifields._rt._sigval #define lsi_int _sifields._rt._sigval.sival_int #define lsi_ptr _sifields._rt._sigval.sival_ptr #define lsi_addr _sifields._sigfault._addr #define lsi_band _sifields._sigpoll._band #define lsi_fd _sifields._sigpoll._fd /* * We make the stack look like Linux expects it when calling a signal * handler, but use the BSD way of calling the handler and sigreturn(). * This means that we need to pass the pointer to the handler too. * It is appended to the frame to not interfere with the rest of it. */ struct l_rt_sigframe { struct l_ucontext sf_sc; struct l_siginfo sf_si; l_handler_t sf_handler; }; /* * mount flags */ #define LINUX_MS_RDONLY 0x0001 #define LINUX_MS_NOSUID 0x0002 #define LINUX_MS_NODEV 0x0004 #define LINUX_MS_NOEXEC 0x0008 #define LINUX_MS_REMOUNT 0x0020 /* * SystemV IPC defines */ #define LINUX_IPC_RMID 0 #define LINUX_IPC_SET 1 #define LINUX_IPC_STAT 2 #define LINUX_IPC_INFO 3 #define LINUX_SHM_LOCK 11 #define LINUX_SHM_UNLOCK 12 #define LINUX_SHM_STAT 13 #define LINUX_SHM_INFO 14 #define LINUX_SHM_RDONLY 0x1000 #define LINUX_SHM_RND 0x2000 #define LINUX_SHM_REMAP 0x4000 /* semctl commands */ #define LINUX_GETPID 11 #define LINUX_GETVAL 12 #define LINUX_GETALL 13 #define LINUX_GETNCNT 14 #define LINUX_GETZCNT 15 #define LINUX_SETVAL 16 #define LINUX_SETALL 17 #define LINUX_SEM_STAT 18 #define LINUX_SEM_INFO 19 union l_semun { l_int val; l_uintptr_t buf; l_uintptr_t array; l_uintptr_t __buf; l_uintptr_t __pad; }; struct l_ipc_perm { l_key_t key; l_uid_t uid; l_gid_t gid; l_uid_t cuid; l_gid_t cgid; l_ushort mode; l_ushort seq; }; /* * Socket defines */ #define LINUX_SOL_SOCKET 1 #define LINUX_SOL_IP 0 #define LINUX_SOL_IPX 256 #define LINUX_SOL_AX25 257 #define LINUX_SOL_TCP 6 #define LINUX_SOL_UDP 17 #define LINUX_SO_DEBUG 1 #define LINUX_SO_REUSEADDR 2 #define LINUX_SO_TYPE 3 #define LINUX_SO_ERROR 4 #define LINUX_SO_DONTROUTE 5 #define LINUX_SO_BROADCAST 6 #define LINUX_SO_SNDBUF 7 #define LINUX_SO_RCVBUF 8 #define LINUX_SO_KEEPALIVE 9 #define LINUX_SO_OOBINLINE 10 #define LINUX_SO_NO_CHECK 11 #define LINUX_SO_PRIORITY 12 #define LINUX_SO_LINGER 13 #define LINUX_SO_PASSCRED 16 #define LINUX_SO_PEERCRED 17 #define LINUX_SO_RCVLOWAT 18 #define LINUX_SO_SNDLOWAT 19 #define LINUX_SO_RCVTIMEO 20 #define LINUX_SO_SNDTIMEO 21 #define LINUX_SO_TIMESTAMP 29 #define LINUX_SO_ACCEPTCONN 30 #define LINUX_IP_TOS 1 #define LINUX_IP_TTL 2 #define LINUX_IP_HDRINCL 3 #define LINUX_IP_OPTIONS 4 #define LINUX_IP_MULTICAST_IF 32 #define LINUX_IP_MULTICAST_TTL 33 #define LINUX_IP_MULTICAST_LOOP 34 #define LINUX_IP_ADD_MEMBERSHIP 35 #define LINUX_IP_DROP_MEMBERSHIP 36 struct l_sockaddr { l_ushort sa_family; char sa_data[14]; }; struct l_ifmap { l_ulong mem_start; l_ulong mem_end; l_ushort base_addr; u_char irq; u_char dma; u_char port; } __packed; #define 
LINUX_IFHWADDRLEN 6 #define LINUX_IFNAMSIZ 16 struct l_ifreq { union { char ifrn_name[LINUX_IFNAMSIZ]; } ifr_ifrn; union { struct l_sockaddr ifru_addr; struct l_sockaddr ifru_dstaddr; struct l_sockaddr ifru_broadaddr; struct l_sockaddr ifru_netmask; struct l_sockaddr ifru_hwaddr; l_short ifru_flags[1]; l_int ifru_metric; l_int ifru_mtu; struct l_ifmap ifru_map; char ifru_slave[LINUX_IFNAMSIZ]; l_uintptr_t ifru_data; } ifr_ifru; } __packed; #define ifr_name ifr_ifrn.ifrn_name /* Interface name */ #define ifr_hwaddr ifr_ifru.ifru_hwaddr /* MAC address */ struct l_ifconf { int ifc_len; union { l_uintptr_t ifcu_buf; l_uintptr_t ifcu_req; } ifc_ifcu; }; #define ifc_buf ifc_ifcu.ifcu_buf #define ifc_req ifc_ifcu.ifcu_req /* * poll() */ #define LINUX_POLLIN 0x0001 #define LINUX_POLLPRI 0x0002 #define LINUX_POLLOUT 0x0004 #define LINUX_POLLERR 0x0008 #define LINUX_POLLHUP 0x0010 #define LINUX_POLLNVAL 0x0020 #define LINUX_POLLRDNORM 0x0040 #define LINUX_POLLRDBAND 0x0080 #define LINUX_POLLWRNORM 0x0100 #define LINUX_POLLWRBAND 0x0200 #define LINUX_POLLMSG 0x0400 struct l_pollfd { l_int fd; l_short events; l_short revents; }; #define LINUX_CLONE_VM 0x00000100 #define LINUX_CLONE_FS 0x00000200 #define LINUX_CLONE_FILES 0x00000400 #define LINUX_CLONE_SIGHAND 0x00000800 #define LINUX_CLONE_PID 0x00001000 /* No longer exist in Linux */ #define LINUX_CLONE_VFORK 0x00004000 #define LINUX_CLONE_PARENT 0x00008000 #define LINUX_CLONE_THREAD 0x00010000 #define LINUX_CLONE_SETTLS 0x00080000 #define LINUX_CLONE_PARENT_SETTID 0x00100000 #define LINUX_CLONE_CHILD_CLEARTID 0x00200000 #define LINUX_CLONE_CHILD_SETTID 0x01000000 #define LINUX_ARCH_SET_GS 0x1001 #define LINUX_ARCH_SET_FS 0x1002 -#define LINUX_ARCH_GET_GS 0x1003 -#define LINUX_ARCH_GET_FS 0x1004 +#define LINUX_ARCH_GET_FS 0x1003 +#define LINUX_ARCH_GET_GS 0x1004 #define linux_copyout_rusage(r, u) copyout(r, u, sizeof(*r)) /* robust futexes */ struct linux_robust_list { l_uintptr_t next; }; struct linux_robust_list_head { struct linux_robust_list list; l_long futex_offset; l_uintptr_t pending_list; }; #endif /* !_AMD64_LINUX_H_ */ Index: projects/clang380-import/sys/amd64/linux/linux_machdep.c =================================================================== --- projects/clang380-import/sys/amd64/linux/linux_machdep.c (revision 294776) +++ projects/clang380-import/sys/amd64/linux/linux_machdep.c (revision 294777) @@ -1,433 +1,432 @@ /*- * Copyright (c) 2013 Dmitry Chagin * Copyright (c) 2004 Tim J. Robbins * Copyright (c) 2002 Doug Rabson * Copyright (c) 2000 Marcel Moolenaar * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer * in this position and unchanged. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include int linux_execve(struct thread *td, struct linux_execve_args *args) { struct image_args eargs; char *path; int error; LCONVPATHEXIST(td, args->path, &path); LINUX_CTR(execve); error = exec_copyin_args(&eargs, path, UIO_SYSSPACE, args->argp, args->envp); free(path, M_TEMP); if (error == 0) error = linux_common_execve(td, &eargs); return (error); } int linux_set_upcall_kse(struct thread *td, register_t stack) { if (stack) td->td_frame->tf_rsp = stack; /* * The newly created Linux thread returns * to the user space by the same path that a parent do. */ td->td_frame->tf_rax = 0; return (0); } #define STACK_SIZE (2 * 1024 * 1024) #define GUARD_SIZE (4 * PAGE_SIZE) int linux_mmap2(struct thread *td, struct linux_mmap2_args *args) { struct proc *p = td->td_proc; struct mmap_args /* { caddr_t addr; size_t len; int prot; int flags; int fd; long pad; off_t pos; } */ bsd_args; int error; struct file *fp; cap_rights_t rights; LINUX_CTR6(mmap2, "0x%lx, %ld, %ld, 0x%08lx, %ld, 0x%lx", args->addr, args->len, args->prot, args->flags, args->fd, args->pgoff); error = 0; bsd_args.flags = 0; fp = NULL; /* * Linux mmap(2): * You must specify exactly one of MAP_SHARED and MAP_PRIVATE */ if (! ((args->flags & LINUX_MAP_SHARED) ^ (args->flags & LINUX_MAP_PRIVATE))) return (EINVAL); if (args->flags & LINUX_MAP_SHARED) bsd_args.flags |= MAP_SHARED; if (args->flags & LINUX_MAP_PRIVATE) bsd_args.flags |= MAP_PRIVATE; if (args->flags & LINUX_MAP_FIXED) bsd_args.flags |= MAP_FIXED; if (args->flags & LINUX_MAP_ANON) bsd_args.flags |= MAP_ANON; else bsd_args.flags |= MAP_NOSYNC; if (args->flags & LINUX_MAP_GROWSDOWN) bsd_args.flags |= MAP_STACK; /* * PROT_READ, PROT_WRITE, or PROT_EXEC implies PROT_READ and PROT_EXEC * on Linux/i386. We do this to ensure maximum compatibility. * Linux/ia64 does the same in i386 emulation mode. */ bsd_args.prot = args->prot; if (bsd_args.prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) bsd_args.prot |= PROT_READ | PROT_EXEC; /* Linux does not check file descriptor when MAP_ANONYMOUS is set. */ bsd_args.fd = (bsd_args.flags & MAP_ANON) ? -1 : args->fd; if (bsd_args.fd != -1) { /* * Linux follows Solaris mmap(2) description: * The file descriptor fildes is opened with * read permission, regardless of the * protection options specified. 
*/ error = fget(td, bsd_args.fd, cap_rights_init(&rights, CAP_MMAP), &fp); if (error != 0) return (error); if (fp->f_type != DTYPE_VNODE) { fdrop(fp, td); return (EINVAL); } /* Linux mmap() just fails for O_WRONLY files */ if (!(fp->f_flag & FREAD)) { fdrop(fp, td); return (EACCES); } fdrop(fp, td); } if (args->flags & LINUX_MAP_GROWSDOWN) { /* * The Linux MAP_GROWSDOWN option does not limit auto * growth of the region. Linux mmap with this option * takes as addr the initial BOS, and as len, the initial * region size. It can then grow down from addr without * limit. However, Linux threads have an implicit internal * limit to stack size of STACK_SIZE. It's just not * enforced explicitly in Linux. But, here we impose * a limit of (STACK_SIZE - GUARD_SIZE) on the stack * region, since we can do this with our mmap. * * Our mmap with MAP_STACK takes addr as the maximum * downsize limit on BOS, and as len the max size of * the region. It then maps the top SGROWSIZ bytes, * and auto grows the region down, up to the limit * in addr. * * If we don't use the MAP_STACK option, the effect * of this code is to allocate a stack region of a * fixed size of (STACK_SIZE - GUARD_SIZE). */ if ((caddr_t)PTRIN(args->addr) + args->len > p->p_vmspace->vm_maxsaddr) { /* * Some Linux apps will attempt to mmap * thread stacks near the top of their * address space. If their TOS is greater * than vm_maxsaddr, vm_map_growstack() * will confuse the thread stack with the * process stack and deliver a SEGV if they * attempt to grow the thread stack past their * current stacksize rlimit. To avoid this, * adjust vm_maxsaddr upwards to reflect * the current stacksize rlimit rather * than the maximum possible stacksize. * It would be better to adjust the * mmap'ed region, but some apps do not check * mmap's return value. */ PROC_LOCK(p); p->p_vmspace->vm_maxsaddr = (char *)USRSTACK - lim_cur_proc(p, RLIMIT_STACK); PROC_UNLOCK(p); } /* * This gives us our maximum stack size and a new BOS. * If we're using VM_STACK, then mmap will just map * the top SGROWSIZ bytes, and let the stack grow down * to the limit at BOS. If we're not using VM_STACK * we map the full stack, since we don't have a way * to autogrow it.
*/ if (args->len > STACK_SIZE - GUARD_SIZE) { bsd_args.addr = (caddr_t)PTRIN(args->addr); bsd_args.len = args->len; } else { bsd_args.addr = (caddr_t)PTRIN(args->addr) - (STACK_SIZE - GUARD_SIZE - args->len); bsd_args.len = STACK_SIZE - GUARD_SIZE; } } else { bsd_args.addr = (caddr_t)PTRIN(args->addr); bsd_args.len = args->len; } bsd_args.pos = (off_t)args->pgoff; error = sys_mmap(td, &bsd_args); LINUX_CTR2(mmap2, "return: %d (%p)", error, td->td_retval[0]); return (error); } int linux_mprotect(struct thread *td, struct linux_mprotect_args *uap) { struct mprotect_args bsd_args; LINUX_CTR(mprotect); bsd_args.addr = uap->addr; bsd_args.len = uap->len; bsd_args.prot = uap->prot; if (bsd_args.prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) bsd_args.prot |= PROT_READ | PROT_EXEC; return (sys_mprotect(td, &bsd_args)); } int linux_iopl(struct thread *td, struct linux_iopl_args *args) { int error; LINUX_CTR(iopl); if (args->level > 3) return (EINVAL); if ((error = priv_check(td, PRIV_IO)) != 0) return (error); if ((error = securelevel_gt(td->td_ucred, 0)) != 0) return (error); td->td_frame->tf_rflags = (td->td_frame->tf_rflags & ~PSL_IOPL) | (args->level * (PSL_IOPL / 3)); return (0); } int linux_rt_sigsuspend(struct thread *td, struct linux_rt_sigsuspend_args *uap) { l_sigset_t lmask; sigset_t sigmask; int error; LINUX_CTR2(rt_sigsuspend, "%p, %ld", uap->newset, uap->sigsetsize); if (uap->sigsetsize != sizeof(l_sigset_t)) return (EINVAL); error = copyin(uap->newset, &lmask, sizeof(l_sigset_t)); if (error) return (error); linux_to_bsd_sigset(&lmask, &sigmask); return (kern_sigsuspend(td, sigmask)); } int linux_pause(struct thread *td, struct linux_pause_args *args) { struct proc *p = td->td_proc; sigset_t sigmask; LINUX_CTR(pause); PROC_LOCK(p); sigmask = td->td_sigmask; PROC_UNLOCK(p); return (kern_sigsuspend(td, sigmask)); } int linux_sigaltstack(struct thread *td, struct linux_sigaltstack_args *uap) { stack_t ss, oss; l_stack_t lss; int error; LINUX_CTR2(sigaltstack, "%p, %p", uap->uss, uap->uoss); if (uap->uss != NULL) { error = copyin(uap->uss, &lss, sizeof(l_stack_t)); if (error) return (error); ss.ss_sp = PTRIN(lss.ss_sp); ss.ss_size = lss.ss_size; ss.ss_flags = linux_to_bsd_sigaltstack(lss.ss_flags); } error = kern_sigaltstack(td, (uap->uss != NULL) ? &ss : NULL, (uap->uoss != NULL) ? 
&oss : NULL); if (!error && uap->uoss != NULL) { lss.ss_sp = PTROUT(oss.ss_sp); lss.ss_size = oss.ss_size; lss.ss_flags = bsd_to_linux_sigaltstack(oss.ss_flags); error = copyout(&lss, uap->uoss, sizeof(l_stack_t)); } return (error); } -/* XXX do all */ int linux_arch_prctl(struct thread *td, struct linux_arch_prctl_args *args) { int error; struct pcb *pcb; LINUX_CTR2(arch_prctl, "0x%x, %p", args->code, args->addr); error = ENOTSUP; pcb = td->td_pcb; switch (args->code) { case LINUX_ARCH_GET_GS: error = copyout(&pcb->pcb_gsbase, (unsigned long *)args->addr, sizeof(args->addr)); break; case LINUX_ARCH_SET_GS: if (args->addr >= VM_MAXUSER_ADDRESS) return(EPERM); break; case LINUX_ARCH_GET_FS: error = copyout(&pcb->pcb_fsbase, (unsigned long *)args->addr, sizeof(args->addr)); break; case LINUX_ARCH_SET_FS: error = linux_set_cloned_tls(td, (void *)args->addr); break; default: error = EINVAL; } return (error); } int linux_set_cloned_tls(struct thread *td, void *desc) { struct pcb *pcb; if ((uint64_t)desc >= VM_MAXUSER_ADDRESS) return (EPERM); pcb = td->td_pcb; pcb->pcb_fsbase = (register_t)desc; td->td_frame->tf_fs = _ufssel; return (0); } Index: projects/clang380-import/sys/arm/allwinner/a10_machdep.c =================================================================== --- projects/clang380-import/sys/arm/allwinner/a10_machdep.c (revision 294776) +++ projects/clang380-import/sys/arm/allwinner/a10_machdep.c (nonexistent) @@ -1,113 +0,0 @@ -/*- - * Copyright (c) 2012 Ganbold Tsagaankhuu - * All rights reserved. - * - * This code is derived from software written for Brini by Mark Brinicombe - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * 1. Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in the - * documentation and/or other materials provided with the distribution. - * - * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND - * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE - * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS - * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) - * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT - * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY - * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF - * SUCH DAMAGE. - * - * from: FreeBSD: //depot/projects/arm/src/sys/arm/ti/ti_machdep.c - */ - -#include "opt_ddb.h" -#include "opt_platform.h" - -#include -__FBSDID("$FreeBSD$"); - -#define _ARM32_BUS_DMA_PRIVATE -#include -#include -#include - -#include -#include - -#include -#include -#include -#include - -#include - -#include - -vm_offset_t -platform_lastaddr(void) -{ - - return (arm_devmap_lastaddr()); -} - -void -platform_probe_and_attach(void) -{ -} - -void -platform_gpio_init(void) -{ -} - -void -platform_late_init(void) -{ -} - -/* - * Set up static device mappings. 
- * - * This covers all the on-chip device with 1MB section mappings, which is good - * for performance (uses fewer TLB entries for device access). - * - * XXX It also covers a block of SRAM and some GPU (mali400) stuff that maybe - * shouldn't be device-mapped. The original code mapped a 4MB block, but - * perhaps a 1MB block would be more appropriate. - */ -int -platform_devmap_init(void) -{ - - arm_devmap_add_entry(0x01C00000, 0x00400000); /* 4MB */ - - return (0); -} - -struct arm32_dma_range * -bus_dma_get_range(void) -{ - return (NULL); -} - -int -bus_dma_get_range_nb(void) -{ - return (0); -} - -void -cpu_reset() -{ - a10wd_watchdog_reset(); - printf("Reset failed!\n"); - while (1); -} Property changes on: projects/clang380-import/sys/arm/allwinner/a10_machdep.c ___________________________________________________________________ Deleted: svn:eol-style ## -1 +0,0 ## -native \ No newline at end of property Deleted: svn:keywords ## -1 +0,0 ## -FreeBSD=%H \ No newline at end of property Deleted: svn:mime-type ## -1 +0,0 ## -text/plain \ No newline at end of property Index: projects/clang380-import/sys/arm/allwinner/a10_clk.c =================================================================== --- projects/clang380-import/sys/arm/allwinner/a10_clk.c (revision 294776) +++ projects/clang380-import/sys/arm/allwinner/a10_clk.c (revision 294777) @@ -1,349 +1,443 @@ /*- * Copyright (c) 2013 Ganbold Tsagaankhuu * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ /* Simple clock driver for Allwinner A10 */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include "a10_clk.h" struct a10_ccm_softc { struct resource *res; bus_space_tag_t bst; bus_space_handle_t bsh; int pll6_enabled; }; static struct a10_ccm_softc *a10_ccm_sc = NULL; #define ccm_read_4(sc, reg) \ bus_space_read_4((sc)->bst, (sc)->bsh, (reg)) #define ccm_write_4(sc, reg, val) \ bus_space_write_4((sc)->bst, (sc)->bsh, (reg), (val)) static int a10_ccm_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (ofw_bus_is_compatible(dev, "allwinner,sun4i-ccm")) { device_set_desc(dev, "Allwinner Clock Control Module"); return(BUS_PROBE_DEFAULT); } return (ENXIO); } static int a10_ccm_attach(device_t dev) { struct a10_ccm_softc *sc = device_get_softc(dev); int rid = 0; if (a10_ccm_sc) return (ENXIO); sc->res = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (!sc->res) { device_printf(dev, "could not allocate resource\n"); return (ENXIO); } sc->bst = rman_get_bustag(sc->res); sc->bsh = rman_get_bushandle(sc->res); a10_ccm_sc = sc; return (0); } static device_method_t a10_ccm_methods[] = { DEVMETHOD(device_probe, a10_ccm_probe), DEVMETHOD(device_attach, a10_ccm_attach), { 0, 0 } }; static driver_t a10_ccm_driver = { "a10_ccm", a10_ccm_methods, sizeof(struct a10_ccm_softc), }; static devclass_t a10_ccm_devclass; DRIVER_MODULE(a10_ccm, simplebus, a10_ccm_driver, a10_ccm_devclass, 0, 0); int a10_clk_usb_activate(void) { struct a10_ccm_softc *sc = a10_ccm_sc; uint32_t reg_value; if (sc == NULL) return (ENXIO); /* Gating AHB clock for USB */ reg_value = ccm_read_4(sc, CCM_AHB_GATING0); reg_value |= CCM_AHB_GATING_USB0; /* AHB clock gate usb0 */ reg_value |= CCM_AHB_GATING_EHCI0; /* AHB clock gate ehci0 */ reg_value |= CCM_AHB_GATING_EHCI1; /* AHB clock gate ehci1 */ ccm_write_4(sc, CCM_AHB_GATING0, reg_value); /* Enable clock for USB */ reg_value = ccm_read_4(sc, CCM_USB_CLK); reg_value |= CCM_USB_PHY; /* USBPHY */ reg_value |= CCM_USB0_RESET; /* disable reset for USB0 */ reg_value |= CCM_USB1_RESET; /* disable reset for USB1 */ reg_value |= CCM_USB2_RESET; /* disable reset for USB2 */ ccm_write_4(sc, CCM_USB_CLK, reg_value); return (0); } int a10_clk_usb_deactivate(void) { struct a10_ccm_softc *sc = a10_ccm_sc; uint32_t reg_value; if (sc == NULL) return (ENXIO); /* Disable clock for USB */ reg_value = ccm_read_4(sc, CCM_USB_CLK); reg_value &= ~CCM_USB_PHY; /* USBPHY */ reg_value &= ~CCM_USB0_RESET; /* reset for USB0 */ reg_value &= ~CCM_USB1_RESET; /* reset for USB1 */ reg_value &= ~CCM_USB2_RESET; /* reset for USB2 */ ccm_write_4(sc, CCM_USB_CLK, reg_value); /* Disable gating AHB clock for USB */ reg_value = ccm_read_4(sc, CCM_AHB_GATING0); reg_value &= ~CCM_AHB_GATING_USB0; /* disable AHB clock gate usb0 */ reg_value &= ~CCM_AHB_GATING_EHCI0; /* disable AHB clock gate ehci0 */ reg_value &= ~CCM_AHB_GATING_EHCI1; /* disable AHB clock gate ehci1 */ ccm_write_4(sc, CCM_AHB_GATING0, reg_value); return (0); } int a10_clk_emac_activate(void) { struct a10_ccm_softc *sc = a10_ccm_sc; uint32_t reg_value; if (sc == NULL) return (ENXIO); /* Gating AHB clock for EMAC */ reg_value = ccm_read_4(sc, CCM_AHB_GATING0); reg_value |= CCM_AHB_GATING_EMAC; ccm_write_4(sc, CCM_AHB_GATING0, reg_value); return (0); } int a10_clk_gmac_activate(phandle_t node) { char *phy_type; struct a10_ccm_softc *sc; uint32_t reg_value; sc = a10_ccm_sc; if (sc == NULL) return (ENXIO); /* Gating AHB clock for GMAC */ reg_value = 
ccm_read_4(sc, CCM_AHB_GATING1); reg_value |= CCM_AHB_GATING_GMAC; ccm_write_4(sc, CCM_AHB_GATING1, reg_value); /* Set GMAC mode. */ reg_value = CCM_GMAC_CLK_MII; if (OF_getprop_alloc(node, "phy-type", 1, (void **)&phy_type) > 0) { if (strcasecmp(phy_type, "rgmii") == 0) reg_value = CCM_GMAC_CLK_RGMII | CCM_GMAC_MODE_RGMII; else if (strcasecmp(phy_type, "rgmii-bpi") == 0) { reg_value = CCM_GMAC_CLK_RGMII | CCM_GMAC_MODE_RGMII; reg_value |= (3 << CCM_GMAC_CLK_DELAY_SHIFT); } free(phy_type, M_OFWPROP); } ccm_write_4(sc, CCM_GMAC_CLK, reg_value); return (0); } static void a10_clk_pll6_enable(void) { struct a10_ccm_softc *sc; uint32_t reg_value; /* * SATA needs PLL6 to be a 100MHz clock. * The SATA output frequency is 24MHz * n * k / m / 6. * To get to 100MHz, k & m must be equal and n must be 25. * For other uses the output frequency is 24MHz * n * k / 2. */ sc = a10_ccm_sc; if (sc->pll6_enabled) return; reg_value = ccm_read_4(sc, CCM_PLL6_CFG); reg_value &= ~CCM_PLL_CFG_BYPASS; reg_value &= ~(CCM_PLL_CFG_FACTOR_K | CCM_PLL_CFG_FACTOR_M | CCM_PLL_CFG_FACTOR_N); reg_value |= (25 << CCM_PLL_CFG_FACTOR_N_SHIFT); reg_value |= CCM_PLL6_CFG_SATA_CLKEN; reg_value |= CCM_PLL_CFG_ENABLE; ccm_write_4(sc, CCM_PLL6_CFG, reg_value); sc->pll6_enabled = 1; } static unsigned int a10_clk_pll6_get_rate(void) { struct a10_ccm_softc *sc; uint32_t k, n, reg_value; sc = a10_ccm_sc; reg_value = ccm_read_4(sc, CCM_PLL6_CFG); n = ((reg_value & CCM_PLL_CFG_FACTOR_N) >> CCM_PLL_CFG_FACTOR_N_SHIFT); k = ((reg_value & CCM_PLL_CFG_FACTOR_K) >> CCM_PLL_CFG_FACTOR_K_SHIFT) + 1; return ((CCM_CLK_REF_FREQ * n * k) / 2); } +static int +a10_clk_pll2_set_rate(unsigned int freq) +{ + struct a10_ccm_softc *sc; + uint32_t reg_value; + unsigned int prediv, postdiv, n; + + sc = a10_ccm_sc; + if (sc == NULL) + return (ENXIO); + + reg_value = ccm_read_4(sc, CCM_PLL2_CFG); + reg_value &= ~(CCM_PLL2_CFG_PREDIV | CCM_PLL2_CFG_POSTDIV | + CCM_PLL_CFG_FACTOR_N); + + /* + * Audio Codec needs PLL2 to be either 24576000 Hz or 22579200 Hz + * + * PLL2 output frequency is 24MHz * n / prediv / postdiv. + * To get as close as possible to the desired rate, we use a + * pre-divider of 21 and a post-divider of 4. With these values, + * a multiplier of 86 or 79 gets us close to the target rates. 
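A stand-alone arithmetic check of the factors quoted in the two PLL comments above (illustrative only; the driver code itself performs the register writes):

#include <stdio.h>

int
main(void)
{
	double ref = 24000000.0;	/* CCM_CLK_REF_FREQ */

	/*
	 * PLL6 for SATA: 24 MHz * n * k / m / 6; with n = 25 and k == m
	 * the k/m pair cancels and the output is exactly 100 MHz.
	 */
	printf("PLL6 SATA: %.0f Hz\n", ref * 25 / 6);

	/*
	 * PLL2 for audio: 24 MHz * n / prediv / postdiv with prediv = 21
	 * and postdiv = 4; n = 86 and n = 79 approximate the targets.
	 */
	printf("PLL2 n=86: %.0f Hz (target 24576000)\n", ref * 86 / 21 / 4);
	printf("PLL2 n=79: %.0f Hz (target 22579200)\n", ref * 79 / 21 / 4);
	return (0);
}

The PLL2 targets cannot be hit exactly from a 24 MHz reference with these divider ranges; the chosen factors land within about 0.02% of the nominal rates, which is close enough for the codec.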
+ */ + prediv = 21; + postdiv = 4; + + switch (freq) { + case 24576000: + n = 86; + reg_value |= CCM_PLL_CFG_ENABLE; + break; + case 22579200: + n = 79; + reg_value |= CCM_PLL_CFG_ENABLE; + break; + case 0: + n = 1; + reg_value &= ~CCM_PLL_CFG_ENABLE; + break; + default: + return (EINVAL); + } + + reg_value |= (prediv << CCM_PLL2_CFG_PREDIV_SHIFT); + reg_value |= (postdiv << CCM_PLL2_CFG_POSTDIV_SHIFT); + reg_value |= (n << CCM_PLL_CFG_FACTOR_N_SHIFT); + ccm_write_4(sc, CCM_PLL2_CFG, reg_value); + + return (0); +} + int a10_clk_ahci_activate(void) { struct a10_ccm_softc *sc; uint32_t reg_value; sc = a10_ccm_sc; if (sc == NULL) return (ENXIO); a10_clk_pll6_enable(); /* Gating AHB clock for SATA */ reg_value = ccm_read_4(sc, CCM_AHB_GATING0); reg_value |= CCM_AHB_GATING_SATA; ccm_write_4(sc, CCM_AHB_GATING0, reg_value); DELAY(1000); ccm_write_4(sc, CCM_SATA_CLK, CCM_PLL_CFG_ENABLE); return (0); } int a10_clk_mmc_activate(int devid) { struct a10_ccm_softc *sc; uint32_t reg_value; sc = a10_ccm_sc; if (sc == NULL) return (ENXIO); a10_clk_pll6_enable(); /* Gating AHB clock for SD/MMC */ reg_value = ccm_read_4(sc, CCM_AHB_GATING0); reg_value |= CCM_AHB_GATING_SDMMC0 << devid; ccm_write_4(sc, CCM_AHB_GATING0, reg_value); return (0); } int a10_clk_mmc_cfg(int devid, int freq) { struct a10_ccm_softc *sc; uint32_t clksrc, m, n, ophase, phase, reg_value; unsigned int pll_freq; sc = a10_ccm_sc; if (sc == NULL) return (ENXIO); freq /= 1000; if (freq <= 400) { pll_freq = CCM_CLK_REF_FREQ / 1000; clksrc = CCM_SD_CLK_SRC_SEL_OSC24M; ophase = 0; phase = 0; n = 2; } else if (freq <= 25000) { pll_freq = a10_clk_pll6_get_rate() / 1000; clksrc = CCM_SD_CLK_SRC_SEL_PLL6; ophase = 0; phase = 5; n = 2; } else if (freq <= 50000) { pll_freq = a10_clk_pll6_get_rate() / 1000; clksrc = CCM_SD_CLK_SRC_SEL_PLL6; ophase = 3; phase = 5; n = 0; } else return (EINVAL); m = ((pll_freq / (1 << n)) / (freq)) - 1; reg_value = ccm_read_4(sc, CCM_MMC0_SCLK_CFG + (devid * 4)); reg_value &= ~CCM_SD_CLK_SRC_SEL; reg_value |= (clksrc << CCM_SD_CLK_SRC_SEL_SHIFT); reg_value &= ~CCM_SD_CLK_PHASE_CTR; reg_value |= (phase << CCM_SD_CLK_PHASE_CTR_SHIFT); reg_value &= ~CCM_SD_CLK_DIV_RATIO_N; reg_value |= (n << CCM_SD_CLK_DIV_RATIO_N_SHIFT); reg_value &= ~CCM_SD_CLK_OPHASE_CTR; reg_value |= (ophase << CCM_SD_CLK_OPHASE_CTR_SHIFT); reg_value &= ~CCM_SD_CLK_DIV_RATIO_M; reg_value |= m; reg_value |= CCM_PLL_CFG_ENABLE; ccm_write_4(sc, CCM_MMC0_SCLK_CFG + (devid * 4), reg_value); + + return (0); +} + +int +a10_clk_dmac_activate(void) +{ + struct a10_ccm_softc *sc; + uint32_t reg_value; + + sc = a10_ccm_sc; + if (sc == NULL) + return (ENXIO); + + /* Gating AHB clock for DMA controller */ + reg_value = ccm_read_4(sc, CCM_AHB_GATING0); + reg_value |= CCM_AHB_GATING_DMA; + ccm_write_4(sc, CCM_AHB_GATING0, reg_value); + + return (0); +} + +int +a10_clk_codec_activate(unsigned int freq) +{ + struct a10_ccm_softc *sc; + uint32_t reg_value; + + sc = a10_ccm_sc; + if (sc == NULL) + return (ENXIO); + + a10_clk_pll2_set_rate(freq); + + /* Gating APB clock for ADDA */ + reg_value = ccm_read_4(sc, CCM_APB0_GATING); + reg_value |= CCM_APB0_GATING_ADDA; + ccm_write_4(sc, CCM_APB0_GATING, reg_value); + + /* Enable audio codec clock */ + reg_value = ccm_read_4(sc, CCM_AUDIO_CODEC_CLK); + reg_value |= CCM_AUDIO_CODEC_ENABLE; + ccm_write_4(sc, CCM_AUDIO_CODEC_CLK, reg_value); return (0); } Index: projects/clang380-import/sys/arm/allwinner/a10_clk.h =================================================================== --- 
projects/clang380-import/sys/arm/allwinner/a10_clk.h (revision 294776) +++ projects/clang380-import/sys/arm/allwinner/a10_clk.h (revision 294777) @@ -1,159 +1,172 @@ /*- * Copyright (c) 2013 Ganbold Tsagaankhuu * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _A10_CLK_H_ #define _A10_CLK_H_ #define CCM_PLL1_CFG 0x0000 #define CCM_PLL1_TUN 0x0004 #define CCM_PLL2_CFG 0x0008 #define CCM_PLL2_TUN 0x000c #define CCM_PLL3_CFG 0x0010 #define CCM_PLL3_TUN 0x0014 #define CCM_PLL4_CFG 0x0018 #define CCM_PLL4_TUN 0x001c #define CCM_PLL5_CFG 0x0020 #define CCM_PLL5_TUN 0x0024 #define CCM_PLL6_CFG 0x0028 #define CCM_PLL6_TUN 0x002c #define CCM_PLL7_CFG 0x0030 #define CCM_PLL7_TUN 0x0034 #define CCM_PLL1_TUN2 0x0038 #define CCM_PLL5_TUN2 0x003c #define CCM_PLL_LOCK_DBG 0x004c #define CCM_OSC24M_CFG 0x0050 #define CCM_CPU_AHB_APB0_CFG 0x0054 #define CCM_APB1_CLK_DIV 0x0058 #define CCM_AXI_GATING 0x005c #define CCM_AHB_GATING0 0x0060 #define CCM_AHB_GATING1 0x0064 #define CCM_APB0_GATING 0x0068 #define CCM_APB1_GATING 0x006c #define CCM_NAND_SCLK_CFG 0x0080 #define CCM_MS_SCLK_CFG 0x0084 #define CCM_MMC0_SCLK_CFG 0x0088 #define CCM_MMC1_SCLK_CFG 0x008c #define CCM_MMC2_SCLK_CFG 0x0090 #define CCM_MMC3_SCLK_CFG 0x0094 #define CCM_TS_CLK 0x0098 #define CCM_SS_CLK 0x009c #define CCM_SPI0_CLK 0x00a0 #define CCM_SPI1_CLK 0x00a4 #define CCM_SPI2_CLK 0x00a8 #define CCM_PATA_CLK 0x00ac #define CCM_IR0_CLK 0x00b0 #define CCM_IR1_CLK 0x00b4 #define CCM_IIS_CLK 0x00b8 #define CCM_AC97_CLK 0x00bc #define CCM_SPDIF_CLK 0x00c0 #define CCM_KEYPAD_CLK 0x00c4 #define CCM_SATA_CLK 0x00c8 #define CCM_USB_CLK 0x00cc #define CCM_GPS_CLK 0x00d0 #define CCM_SPI3_CLK 0x00d4 #define CCM_DRAM_CLK 0x0100 #define CCM_BE0_SCLK 0x0104 #define CCM_BE1_SCLK 0x0108 #define CCM_FE0_CLK 0x010c #define CCM_FE1_CLK 0x0110 #define CCM_MP_CLK 0x0114 #define CCM_LCD0_CH0_CLK 0x0118 #define CCM_LCD1_CH0_CLK 0x011c #define CCM_CSI_ISP_CLK 0x0120 #define CCM_TVD_CLK 0x0128 #define CCM_LCD0_CH1_CLK 0x012c #define CCM_LCD1_CH1_CLK 0x0130 #define CCM_CS0_CLK 0x0134 #define CCM_CS1_CLK 0x0138 #define CCM_VE_CLK 0x013c #define CCM_AUDIO_CODEC_CLK 0x0140 #define CCM_AVS_CLK 0x0144 #define CCM_ACE_CLK 0x0148 #define CCM_LVDS_CLK 0x014c #define CCM_HDMI_CLK 0x0150 #define CCM_MALI400_CLK 0x0154 #define CCM_GMAC_CLK 0x0164 
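The divider selection in a10_clk_mmc_cfg() earlier in this diff reduces to m = pll_freq / 2^n / freq - 1. A worked sketch of the two fast branches, taking the 300 MHz rate that a10_clk_pll6_get_rate() reports after a10_clk_pll6_enable() programs n = 25, k = 1 (24 MHz * 25 * 1 / 2):

#include <stdio.h>

int
main(void)
{
	unsigned pll_khz = 300000;	/* assumed PLL6 rate, in kHz */
	unsigned freq_khz, n, m;

	freq_khz = 25000; n = 2;	/* the <= 25 MHz branch */
	m = ((pll_khz / (1 << n)) / freq_khz) - 1;
	printf("25 MHz: n=%u m=%u -> %u kHz\n",
	    n, m, (pll_khz / (1 << n)) / (m + 1));

	freq_khz = 50000; n = 0;	/* the <= 50 MHz branch */
	m = ((pll_khz / (1 << n)) / freq_khz) - 1;
	printf("50 MHz: n=%u m=%u -> %u kHz\n",
	    n, m, (pll_khz / (1 << n)) / (m + 1));
	return (0);
}

With those inputs both branches divide down exactly (n=2, m=2 gives 25 MHz; n=0, m=5 gives 50 MHz); for other PLL6 rates the integer division simply rounds the card clock down, never up.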
#define CCM_GMAC_CLK_DELAY_SHIFT 10 #define CCM_GMAC_CLK_MODE_MASK 0x7 #define CCM_GMAC_MODE_RGMII (1 << 2) #define CCM_GMAC_CLK_MII 0x0 #define CCM_GMAC_CLK_EXT_RGMII 0x1 #define CCM_GMAC_CLK_RGMII 0x2 +/* APB0_GATING */ +#define CCM_APB0_GATING_ADDA (1 << 0) + /* AHB_GATING_REG0 */ #define CCM_AHB_GATING_USB0 (1 << 0) #define CCM_AHB_GATING_EHCI0 (1 << 1) #define CCM_AHB_GATING_EHCI1 (1 << 3) +#define CCM_AHB_GATING_DMA (1 << 6) #define CCM_AHB_GATING_SDMMC0 (1 << 8) #define CCM_AHB_GATING_EMAC (1 << 17) #define CCM_AHB_GATING_SATA (1 << 25) /* AHB_GATING_REG1 */ #define CCM_AHB_GATING_GMAC (1 << 17) #define CCM_USB_PHY (1 << 8) #define CCM_USB0_RESET (1 << 0) #define CCM_USB1_RESET (1 << 1) #define CCM_USB2_RESET (1 << 2) #define CCM_PLL_CFG_ENABLE (1U << 31) #define CCM_PLL_CFG_BYPASS (1U << 30) #define CCM_PLL_CFG_PLL5 (1U << 25) #define CCM_PLL_CFG_PLL6 (1U << 24) #define CCM_PLL_CFG_FACTOR_N 0x1f00 #define CCM_PLL_CFG_FACTOR_N_SHIFT 8 #define CCM_PLL_CFG_FACTOR_K 0x30 #define CCM_PLL_CFG_FACTOR_K_SHIFT 4 #define CCM_PLL_CFG_FACTOR_M 0x3 +#define CCM_PLL2_CFG_POSTDIV 0x3c000000 +#define CCM_PLL2_CFG_POSTDIV_SHIFT 26 +#define CCM_PLL2_CFG_PREDIV 0x1f +#define CCM_PLL2_CFG_PREDIV_SHIFT 0 + #define CCM_PLL6_CFG_SATA_CLKEN (1U << 14) #define CCM_SD_CLK_SRC_SEL 0x3000000 #define CCM_SD_CLK_SRC_SEL_SHIFT 24 #define CCM_SD_CLK_SRC_SEL_OSC24M 0 #define CCM_SD_CLK_SRC_SEL_PLL6 1 #define CCM_SD_CLK_PHASE_CTR 0x700000 #define CCM_SD_CLK_PHASE_CTR_SHIFT 20 #define CCM_SD_CLK_DIV_RATIO_N 0x30000 #define CCM_SD_CLK_DIV_RATIO_N_SHIFT 16 #define CCM_SD_CLK_OPHASE_CTR 0x700 #define CCM_SD_CLK_OPHASE_CTR_SHIFT 8 #define CCM_SD_CLK_DIV_RATIO_M 0xf +#define CCM_AUDIO_CODEC_ENABLE (1U << 31) + #define CCM_CLK_REF_FREQ 24000000U int a10_clk_usb_activate(void); int a10_clk_usb_deactivate(void); int a10_clk_emac_activate(void); int a10_clk_gmac_activate(phandle_t); int a10_clk_ahci_activate(void); int a10_clk_mmc_activate(int); int a10_clk_mmc_cfg(int, int); +int a10_clk_dmac_activate(void); +int a10_clk_codec_activate(unsigned int); #endif /* _A10_CLK_H_ */ Index: projects/clang380-import/sys/arm/allwinner/a10_common.c =================================================================== --- projects/clang380-import/sys/arm/allwinner/a10_common.c (revision 294776) +++ projects/clang380-import/sys/arm/allwinner/a10_common.c (revision 294777) @@ -1,68 +1,72 @@ /*- * Copyright (c) 2012 Ganbold Tsagaankhuu * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
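The two PLL2 rates that a10_clk_codec_activate() accepts correspond to the two standard audio sample-rate families: 24.576 MHz is 512 x 48 kHz and 22.5792 MHz is 512 x 44.1 kHz. A sketch of how a codec driver might pick the PLL rate from the sample rate (the 512*fs master-clock ratio is the usual convention, assumed here):

#include <stdio.h>

/* Map a sample rate to the PLL2 rate a10_clk_codec_activate() needs. */
static unsigned int
pll2_rate_for(unsigned int sample_rate)
{
	/* 11025, 22050, 44100, ... all divide 22.5792 MHz evenly. */
	return ((sample_rate % 11025 == 0) ? 22579200 : 24576000);
}

int
main(void)
{
	unsigned int fs;

	fs = 48000;
	printf("%u Hz -> %u (512*fs = %u)\n", fs, pll2_rate_for(fs), fs * 512);
	fs = 44100;
	printf("%u Hz -> %u (512*fs = %u)\n", fs, pll2_rate_for(fs), fs * 512);
	return (0);
}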
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include struct fdt_fixup_entry fdt_fixup_table[] = { { NULL, NULL } }; +#ifndef ARM_INTRNG + static int fdt_aintc_decode_ic(phandle_t node, pcell_t *intr, int *interrupt, int *trig, int *pol) { int offset; if (fdt_is_compatible(node, "allwinner,sun4i-ic")) offset = 0; else if (fdt_is_compatible(node, "arm,gic")) offset = 32; else return (ENXIO); *interrupt = fdt32_to_cpu(intr[0]) + offset; *trig = INTR_TRIGGER_CONFORM; *pol = INTR_POLARITY_CONFORM; return (0); } fdt_pic_decode_t fdt_pic_table[] = { &fdt_aintc_decode_ic, NULL }; + +#endif /* ARM_INTRNG */ Index: projects/clang380-import/sys/arm/allwinner/allwinner_machdep.c =================================================================== --- projects/clang380-import/sys/arm/allwinner/allwinner_machdep.c (nonexistent) +++ projects/clang380-import/sys/arm/allwinner/allwinner_machdep.c (revision 294777) @@ -0,0 +1,154 @@ +/*- + * Copyright (c) 2012 Ganbold Tsagaankhuu + * Copyright (c) 2015-2016 Emmanuel Vadot + * All rights reserved. + * + * This code is derived from software written for Brini by Mark Brinicombe + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
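fdt_aintc_decode_ic() above adds 32 when the interrupt parent is a GIC because the first 32 GIC interrupt IDs are the banked SGIs and PPIs; FDT interrupt specifiers for shared peripheral interrupts are numbered from zero. A minimal sketch of that mapping with hypothetical cell values:

#include <stdio.h>

/*
 * FDT SPI cell -> interrupt ID: SPIs start at GIC ID 32, so the decode
 * routine above applies a fixed offset of 32 for "arm,gic" and no
 * offset for the native "allwinner,sun4i-ic" controller.
 */
static int
decode_irq(int is_gic, int cell)
{
	return (cell + (is_gic ? 32 : 0));
}

int
main(void)
{
	printf("sun4i-ic cell 22 -> irq %d\n", decode_irq(0, 22));
	printf("gic      cell 22 -> irq %d\n", decode_irq(1, 22));
	return (0);
}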
+ *
+ * from: FreeBSD: //depot/projects/arm/src/sys/arm/ti/ti_machdep.c
+ */
+
+#include "opt_ddb.h"
+#include "opt_platform.h"
+
+#include
+__FBSDID("$FreeBSD$");
+
+#define _ARM32_BUS_DMA_PRIVATE
+#include
+#include
+#include
+
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+#include
+
+#include
+#include
+
+#include "platform_if.h"
+
+static u_int soc_type;
+static u_int soc_family;
+
+static int
+a10_attach(platform_t plat)
+{
+	soc_type = ALLWINNERSOC_A10;
+	soc_family = ALLWINNERSOC_SUN4I;
+	return (0);
+}
+
+static int
+a20_attach(platform_t plat)
+{
+	soc_type = ALLWINNERSOC_A20;
+	soc_family = ALLWINNERSOC_SUN7I;
+
+	return (0);
+}
+
+
+static vm_offset_t
+allwinner_lastaddr(platform_t plat)
+{
+
+	return (arm_devmap_lastaddr());
+}
+
+/*
+ * Set up static device mappings.
+ *
+ * This covers all the on-chip devices with 1MB section mappings, which is
+ * good for performance (uses fewer TLB entries for device access).
+ *
+ * XXX It also covers a block of SRAM and some GPU (mali400) stuff that maybe
+ * shouldn't be device-mapped.  The original code mapped a 4MB block, but
+ * perhaps a 1MB block would be more appropriate.
+ */
+static int
+allwinner_devmap_init(platform_t plat)
+{
+
+	arm_devmap_add_entry(0x01C00000, 0x00400000); /* 4MB */
+
+	return (0);
+}
+
+struct arm32_dma_range *
+bus_dma_get_range(void)
+{
+	return (NULL);
+}
+
+int
+bus_dma_get_range_nb(void)
+{
+	return (0);
+}
+
+void
+cpu_reset()
+{
+	a10wd_watchdog_reset();
+	printf("Reset failed!\n");
+	while (1);
+}
+
+static platform_method_t a10_methods[] = {
+	PLATFORMMETHOD(platform_attach,		a10_attach),
+	PLATFORMMETHOD(platform_lastaddr,	allwinner_lastaddr),
+	PLATFORMMETHOD(platform_devmap_init,	allwinner_devmap_init),
+
+	PLATFORMMETHOD_END,
+};
+
+static platform_method_t a20_methods[] = {
+	PLATFORMMETHOD(platform_attach,		a20_attach),
+	PLATFORMMETHOD(platform_lastaddr,	allwinner_lastaddr),
+	PLATFORMMETHOD(platform_devmap_init,	allwinner_devmap_init),
+
+	PLATFORMMETHOD_END,
+};
+
+u_int
+allwinner_soc_type(void)
+{
+	return (soc_type);
+}
+
+u_int
+allwinner_soc_family(void)
+{
+	return (soc_family);
+}
+
+FDT_PLATFORM_DEF(a10, "a10", 0, "allwinner,sun4i-a10");
+FDT_PLATFORM_DEF(a20, "a20", 0, "allwinner,sun7i-a20");

Property changes on: projects/clang380-import/sys/arm/allwinner/allwinner_machdep.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/clang380-import/sys/arm/allwinner/allwinner_machdep.h
===================================================================
--- projects/clang380-import/sys/arm/allwinner/allwinner_machdep.h	(nonexistent)
+++ projects/clang380-import/sys/arm/allwinner/allwinner_machdep.h	(revision 294777)
@@ -0,0 +1,47 @@
+/*-
+ * Copyright (c) 2015 Emmanuel Vadot
+ * All rights reserved.
+ *
+ * This code is derived from software written for Brini by Mark Brinicombe
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2.
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + * + */ + +#ifndef AW_MACHDEP_H +#define AW_MACHDEP_H + +#define ALLWINNERSOC_A10 0x10000000 +#define ALLWINNERSOC_A13 0x13000000 +#define ALLWINNERSOC_A10S 0x10000001 +#define ALLWINNERSOC_A20 0x20000000 + +#define ALLWINNERSOC_SUN4I 0x40000000 +#define ALLWINNERSOC_SUN5I 0x50000000 +#define ALLWINNERSOC_SUN7I 0x70000000 + +u_int allwinner_soc_type(void); +u_int allwinner_soc_family(void); + +#endif /* AW_MACHDEP_H */ Property changes on: projects/clang380-import/sys/arm/allwinner/allwinner_machdep.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/arm/allwinner/files.allwinner =================================================================== --- projects/clang380-import/sys/arm/allwinner/files.allwinner (revision 294776) +++ projects/clang380-import/sys/arm/allwinner/files.allwinner (revision 294777) @@ -1,16 +1,16 @@ # $FreeBSD$ kern/kern_clocksource.c standard arm/allwinner/a10_ahci.c optional ahci arm/allwinner/a10_clk.c standard arm/allwinner/a10_common.c standard arm/allwinner/a10_ehci.c optional ehci arm/allwinner/a10_gpio.c optional gpio -arm/allwinner/a10_machdep.c standard arm/allwinner/a10_mmc.c optional mmc arm/allwinner/a10_sramc.c standard arm/allwinner/a10_wdog.c standard arm/allwinner/a20/a20_cpu_cfg.c standard +arm/allwinner/allwinner_machdep.c standard arm/allwinner/if_emac.c optional emac arm/allwinner/timer.c standard #arm/allwinner/console.c standard Index: projects/clang380-import/sys/arm/arm/db_trace.c =================================================================== --- projects/clang380-import/sys/arm/arm/db_trace.c (revision 294776) +++ projects/clang380-import/sys/arm/arm/db_trace.c (revision 294777) @@ -1,183 +1,188 @@ /* $NetBSD: db_trace.c,v 1.8 2003/01/17 22:28:48 thorpej Exp $ */ /*- * Copyright (c) 2000, 2001 Ben Harris * Copyright (c) 1996 Scott K. Stevens * * Mach Operating System * Copyright (c) 1991,1990 Carnegie Mellon University * All Rights Reserved. 
* * Permission to use, copy, modify and distribute this software and its * documentation is hereby granted, provided that both the copyright * notice and this permission notice appear in all copies of the * software, derivative works or modified versions, and any portions * thereof, and that both notices appear in supporting documentation. * * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. * * Carnegie Mellon requests users of this software to return to * * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU * School of Computer Science * Carnegie Mellon University * Pittsburgh PA 15213-3890 * * any improvements or extensions that they make and grant Carnegie the * rights to redistribute these changes. */ +#include "opt_ddb.h" #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include static void db_stack_trace_cmd(struct unwind_state *state) { const char *name; db_expr_t value; db_expr_t offset; c_db_sym_t sym; u_int reg, i; char *sep; uint16_t upd_mask; bool finished; finished = false; while (!finished) { finished = unwind_stack_one(state, 1); /* Print the frame details */ sym = db_search_symbol(state->start_pc, DB_STGY_ANY, &offset); if (sym == C_DB_SYM_NULL) { value = 0; name = "(null)"; } else db_symbol_values(sym, &name, &value); db_printf("%s() at ", name); db_printsym(state->start_pc, DB_STGY_PROC); db_printf("\n"); db_printf("\t pc = 0x%08x lr = 0x%08x (", state->start_pc, state->registers[LR]); db_printsym(state->registers[LR], DB_STGY_PROC); db_printf(")\n"); db_printf("\t sp = 0x%08x fp = 0x%08x", state->registers[SP], state->registers[FP]); /* Don't print the registers we have already printed */ upd_mask = state->update_mask & ~((1 << SP) | (1 << FP) | (1 << LR) | (1 << PC)); sep = "\n\t"; for (i = 0, reg = 0; upd_mask != 0; upd_mask >>= 1, reg++) { if ((upd_mask & 1) != 0) { db_printf("%s%sr%d = 0x%08x", sep, (reg < 10) ? " " : "", reg, state->registers[reg]); i++; if (i == 2) { sep = "\n\t"; i = 0; } else sep = " "; } } db_printf("\n"); if (finished) break; /* * Stop if directed to do so, or if we've unwound back to the * kernel entry point, or if the unwind function didn't change * anything (to avoid getting stuck in this loop forever). * If the latter happens, it's an indication that the unwind * information is incorrect somehow for the function named in * the last frame printed before you see the unwind failure * message (maybe it needs a STOP_UNWINDING). 
*/ if (state->registers[PC] < VM_MIN_KERNEL_ADDRESS) { db_printf("Unable to unwind into user mode\n"); finished = true; } else if (state->update_mask == 0) { db_printf("Unwind failure (no registers changed)\n"); finished = true; } } } -/* XXX stubs */ void db_md_list_watchpoints() { + + dbg_show_watchpoint(); } int db_md_clr_watchpoint(db_expr_t addr, db_expr_t size) { - return (0); + + return (dbg_remove_watchpoint(addr, size)); } int db_md_set_watchpoint(db_expr_t addr, db_expr_t size) { - return (0); + + return (dbg_setup_watchpoint(addr, size, HW_WATCHPOINT_RW)); } int db_trace_thread(struct thread *thr, int count) { struct unwind_state state; struct pcb *ctx; if (thr != curthread) { ctx = kdb_thr_ctx(thr); state.registers[FP] = ctx->pcb_regs.sf_r11; state.registers[SP] = ctx->pcb_regs.sf_sp; state.registers[LR] = ctx->pcb_regs.sf_lr; state.registers[PC] = ctx->pcb_regs.sf_pc; db_stack_trace_cmd(&state); } else db_trace_self(); return (0); } void db_trace_self(void) { struct unwind_state state; uint32_t sp; /* Read the stack pointer */ __asm __volatile("mov %0, sp" : "=&r" (sp)); state.registers[FP] = (uint32_t)__builtin_frame_address(0); state.registers[SP] = sp; state.registers[LR] = (uint32_t)__builtin_return_address(0); state.registers[PC] = (uint32_t)db_trace_self; db_stack_trace_cmd(&state); } Index: projects/clang380-import/sys/arm/arm/debug_monitor.c =================================================================== --- projects/clang380-import/sys/arm/arm/debug_monitor.c (nonexistent) +++ projects/clang380-import/sys/arm/arm/debug_monitor.c (revision 294777) @@ -0,0 +1,943 @@ +/* + * Copyright (c) 2015 Juniper Networks Inc. + * All rights reserved. + * + * Developed by Semihalf. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include +__FBSDID("$FreeBSD$"); + +#include "opt_ddb.h" + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +enum dbg_t { + DBG_TYPE_BREAKPOINT = 0, + DBG_TYPE_WATCHPOINT = 1, +}; + +struct dbg_wb_conf { + enum dbg_t type; + enum dbg_access_t access; + db_addr_t address; + db_expr_t size; + u_int slot; +}; + +static int dbg_reset_state(void); +static int dbg_setup_breakpoint(db_expr_t, db_expr_t, u_int); +static int dbg_remove_breakpoint(u_int); +static u_int dbg_find_slot(enum dbg_t, db_expr_t); +static boolean_t dbg_check_slot_free(enum dbg_t, u_int); + +static int dbg_remove_xpoint(struct dbg_wb_conf *); +static int dbg_setup_xpoint(struct dbg_wb_conf *); + +static boolean_t dbg_capable; /* Indicates that machine is capable of using + HW watchpoints/breakpoints */ +static boolean_t dbg_ready[MAXCPU]; /* Debug arch. reset performed on this CPU */ + +static uint32_t dbg_model; /* Debug Arch. Model */ +static boolean_t dbg_ossr; /* OS Save and Restore implemented */ + +static uint32_t dbg_watchpoint_num; +static uint32_t dbg_breakpoint_num; + +static int dbg_ref_count_mme[MAXCPU]; /* Times monitor mode was enabled */ + +/* ID_DFR0 - Debug Feature Register 0 */ +#define ID_DFR0_CP_DEBUG_M_SHIFT 0 +#define ID_DFR0_CP_DEBUG_M_MASK (0xF << ID_DFR0_CP_DEBUG_M_SHIFT) +#define ID_DFR0_CP_DEBUG_M_NS (0x0) /* Not supported */ +#define ID_DFR0_CP_DEBUG_M_V6 (0x2) /* v6 Debug arch. CP14 access */ +#define ID_DFR0_CP_DEBUG_M_V6_1 (0x3) /* v6.1 Debug arch. CP14 access */ +#define ID_DFR0_CP_DEBUG_M_V7 (0x4) /* v7 Debug arch. CP14 access */ +#define ID_DFR0_CP_DEBUG_M_V7_1 (0x5) /* v7.1 Debug arch. CP14 access */ + +/* DBGDIDR - Debug ID Register */ +#define DBGDIDR_WRPS_SHIFT 28 +#define DBGDIDR_WRPS_MASK (0xF << DBGDIDR_WRPS_SHIFT) +#define DBGDIDR_WRPS_NUM(reg) \ + ((((reg) & DBGDIDR_WRPS_MASK) >> DBGDIDR_WRPS_SHIFT) + 1) + +#define DBGDIDR_BRPS_SHIFT 24 +#define DBGDIDR_BRPS_MASK (0xF << DBGDIDR_BRPS_SHIFT) +#define DBGDIDR_BRPS_NUM(reg) \ + ((((reg) & DBGDIDR_BRPS_MASK) >> DBGDIDR_BRPS_SHIFT) + 1) + +/* DBGPRSR - Device Powerdown and Reset Status Register */ +#define DBGPRSR_PU (1 << 0) /* Powerup status */ + +/* DBGOSLSR - OS Lock Status Register */ +#define DBGOSLSR_OSLM0 (1 << 0) + +/* DBGOSDLR - OS Double Lock Register */ +#define DBGPRSR_DLK (1 << 0) /* OS Double Lock set */ + +/* DBGDSCR - Debug Status and Control Register */ +#define DBGSCR_MDBG_EN (1 << 15) /* Monitor debug-mode enable */ + +/* DBGWVR - Watchpoint Value Register */ +#define DBGWVR_ADDR_MASK (~0x3U) + +/* Watchpoints/breakpoints control register bitfields */ +#define DBG_WB_CTRL_LEN_1 (0x1 << 5) +#define DBG_WB_CTRL_LEN_2 (0x3 << 5) +#define DBG_WB_CTRL_LEN_4 (0xf << 5) +#define DBG_WB_CTRL_LEN_8 (0xff << 5) +#define DBG_WB_CTRL_LEN_MASK(x) ((x) & (0xff << 5)) +#define DBG_WB_CTRL_EXEC (0x0 << 3) +#define DBG_WB_CTRL_LOAD (0x1 << 3) +#define DBG_WB_CTRL_STORE (0x2 << 3) +#define DBG_WB_CTRL_ACCESS_MASK(x) ((x) & (0x3 << 3)) + +/* Common for breakpoint and watchpoint */ +#define DBG_WB_CTRL_PL1 (0x1 << 1) +#define DBG_WB_CTRL_PL0 (0x2 << 1) +#define DBG_WB_CTRL_PLX_MASK(x) ((x) & (0x3 << 1)) +#define DBG_WB_CTRL_E (0x1 << 0) + +/* + * Watchpoint/breakpoint helpers + */ +#define DBG_BKPT_BT_SLOT 0 /* Slot for branch taken */ +#define DBG_BKPT_BNT_SLOT 1 /* Slot for branch not taken */ + +#define OP2_SHIFT 4 + +/* Opc2 numbers for coprocessor instructions */ +#define DBG_WB_BVR 4 +#define DBG_WB_BCR 5 +#define DBG_WB_WVR 6 
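The DBG_WB_CTRL_* and opc2 constants in this header region drive two things: the control words programmed into DBGWCR/DBGBCR, and the integer encoding (opc2 << OP2_SHIFT) + n that lets dbg_wb_read_reg()/dbg_wb_write_reg() below select the right "mrc p14, 0, Rt, c0, cN, opc2" case with plain arithmetic, e.g. DBG_REG_BASE_WVR + 3 selects DBGWVR3. A stand-alone sketch of the control word dbg_setup_xpoint() builds for a 4-byte read/write kernel watchpoint, with the field values copied from the defines shown here:

#include <stdio.h>

/*
 * Byte-address select in bits 8:5 for a 4-byte watchpoint, load/store
 * type in bits 4:3, privilege in bits 2:1, enable in bit 0.
 */
#define DBG_WB_CTRL_LEN_4	(0xf << 5)
#define DBG_WB_CTRL_LOAD	(0x1 << 3)
#define DBG_WB_CTRL_STORE	(0x2 << 3)
#define DBG_WB_CTRL_PL1		(0x1 << 1)
#define DBG_WB_CTRL_E		(0x1 << 0)

int
main(void)
{
	unsigned ctrl = DBG_WB_CTRL_LEN_4 | DBG_WB_CTRL_LOAD |
	    DBG_WB_CTRL_STORE | DBG_WB_CTRL_PL1 | DBG_WB_CTRL_E;

	printf("DBGWCR = 0x%08x\n", ctrl);	/* prints 0x000001fb */
	return (0);
}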
+#define DBG_WB_WCR	7
+
+#define DBG_REG_BASE_BVR	(DBG_WB_BVR << OP2_SHIFT)
+#define DBG_REG_BASE_BCR	(DBG_WB_BCR << OP2_SHIFT)
+#define DBG_REG_BASE_WVR	(DBG_WB_WVR << OP2_SHIFT)
+#define DBG_REG_BASE_WCR	(DBG_WB_WCR << OP2_SHIFT)
+
+#define DBG_WB_READ(cn, cm, op2, val) do {				\
+	__asm __volatile("mrc p14, 0, %0, " #cn "," #cm "," #op2 : "=r" (val)); \
+} while (0)
+
+#define DBG_WB_WRITE(cn, cm, op2, val) do {				\
+	__asm __volatile("mcr p14, 0, %0, " #cn "," #cm "," #op2 :: "r" (val)); \
+} while (0)
+
+#define READ_WB_REG_CASE(op2, m, val)		\
+	case (((op2) << OP2_SHIFT) + m):	\
+		DBG_WB_READ(c0, c ## m, op2, val); \
+		break
+
+#define WRITE_WB_REG_CASE(op2, m, val)		\
+	case (((op2) << OP2_SHIFT) + m):	\
+		DBG_WB_WRITE(c0, c ## m, op2, val); \
+		break
+
+#define SWITCH_CASES_READ_WB_REG(op2, val)	\
+	READ_WB_REG_CASE(op2, 0, val);		\
+	READ_WB_REG_CASE(op2, 1, val);		\
+	READ_WB_REG_CASE(op2, 2, val);		\
+	READ_WB_REG_CASE(op2, 3, val);		\
+	READ_WB_REG_CASE(op2, 4, val);		\
+	READ_WB_REG_CASE(op2, 5, val);		\
+	READ_WB_REG_CASE(op2, 6, val);		\
+	READ_WB_REG_CASE(op2, 7, val);		\
+	READ_WB_REG_CASE(op2, 8, val);		\
+	READ_WB_REG_CASE(op2, 9, val);		\
+	READ_WB_REG_CASE(op2, 10, val);		\
+	READ_WB_REG_CASE(op2, 11, val);		\
+	READ_WB_REG_CASE(op2, 12, val);		\
+	READ_WB_REG_CASE(op2, 13, val);		\
+	READ_WB_REG_CASE(op2, 14, val);		\
+	READ_WB_REG_CASE(op2, 15, val)
+
+#define SWITCH_CASES_WRITE_WB_REG(op2, val)	\
+	WRITE_WB_REG_CASE(op2, 0, val);		\
+	WRITE_WB_REG_CASE(op2, 1, val);		\
+	WRITE_WB_REG_CASE(op2, 2, val);		\
+	WRITE_WB_REG_CASE(op2, 3, val);		\
+	WRITE_WB_REG_CASE(op2, 4, val);		\
+	WRITE_WB_REG_CASE(op2, 5, val);		\
+	WRITE_WB_REG_CASE(op2, 6, val);		\
+	WRITE_WB_REG_CASE(op2, 7, val);		\
+	WRITE_WB_REG_CASE(op2, 8, val);		\
+	WRITE_WB_REG_CASE(op2, 9, val);		\
+	WRITE_WB_REG_CASE(op2, 10, val);	\
+	WRITE_WB_REG_CASE(op2, 11, val);	\
+	WRITE_WB_REG_CASE(op2, 12, val);	\
+	WRITE_WB_REG_CASE(op2, 13, val);	\
+	WRITE_WB_REG_CASE(op2, 14, val);	\
+	WRITE_WB_REG_CASE(op2, 15, val)
+
+static uint32_t
+dbg_wb_read_reg(int reg, int n)
+{
+	uint32_t val;
+
+	val = 0;
+
+	switch (reg + n) {
+	SWITCH_CASES_READ_WB_REG(DBG_WB_WVR, val);
+	SWITCH_CASES_READ_WB_REG(DBG_WB_WCR, val);
+	SWITCH_CASES_READ_WB_REG(DBG_WB_BVR, val);
+	SWITCH_CASES_READ_WB_REG(DBG_WB_BCR, val);
+	default:
+		db_printf(
+		    "trying to read from CP14 reg. using wrong opc2 %d\n",
+		    reg >> OP2_SHIFT);
+	}
+
+	return (val);
+}
+
+static void
+dbg_wb_write_reg(int reg, int n, uint32_t val)
+{
+
+	switch (reg + n) {
+	SWITCH_CASES_WRITE_WB_REG(DBG_WB_WVR, val);
+	SWITCH_CASES_WRITE_WB_REG(DBG_WB_WCR, val);
+	SWITCH_CASES_WRITE_WB_REG(DBG_WB_BVR, val);
+	SWITCH_CASES_WRITE_WB_REG(DBG_WB_BCR, val);
+	default:
+		db_printf(
+		    "trying to write to CP14 reg. using wrong opc2 %d\n",
+		    reg >> OP2_SHIFT);
+	}
+	isb();
+}
+
+boolean_t
+kdb_cpu_pc_is_singlestep(db_addr_t pc)
+{
+
+	if (dbg_find_slot(DBG_TYPE_BREAKPOINT, pc) != ~0U)
+		return (TRUE);
+
+	return (FALSE);
+}
+
+void
+kdb_cpu_set_singlestep(void)
+{
+	db_expr_t inst;
+	db_addr_t pc, brpc;
+	uint32_t wcr;
+	u_int i;
+
+	/*
+	 * Disable watchpoints; otherwise stepping over a watched
+	 * instruction will trigger a break exception instead of a
+	 * single-step exception and lock the CPU on that instruction
+	 * forever.
+ */ + for (i = 0; i < dbg_watchpoint_num; i++) { + wcr = dbg_wb_read_reg(DBG_REG_BASE_WCR, i); + if ((wcr & DBG_WB_CTRL_E) != 0) { + dbg_wb_write_reg(DBG_REG_BASE_WCR, i, + (wcr & ~DBG_WB_CTRL_E)); + } + } + + pc = PC_REGS(); + + inst = db_get_value(pc, sizeof(pc), FALSE); + if (inst_branch(inst) || inst_call(inst) || inst_return(inst)) { + brpc = branch_taken(inst, pc); + dbg_setup_breakpoint(brpc, INSN_SIZE, DBG_BKPT_BT_SLOT); + } + pc = next_instr_address(pc, 0); + dbg_setup_breakpoint(pc, INSN_SIZE, DBG_BKPT_BNT_SLOT); +} + +void +kdb_cpu_clear_singlestep(void) +{ + uint32_t wvr, wcr; + u_int i; + + dbg_remove_breakpoint(DBG_BKPT_BT_SLOT); + dbg_remove_breakpoint(DBG_BKPT_BNT_SLOT); + + /* Restore all watchpoints */ + for (i = 0; i < dbg_watchpoint_num; i++) { + wcr = dbg_wb_read_reg(DBG_REG_BASE_WCR, i); + wvr = dbg_wb_read_reg(DBG_REG_BASE_WVR, i); + /* Watchpoint considered not empty if address value is not 0 */ + if ((wvr & DBGWVR_ADDR_MASK) != 0) { + dbg_wb_write_reg(DBG_REG_BASE_WCR, i, + (wcr | DBG_WB_CTRL_E)); + } + } +} + +int +dbg_setup_watchpoint(db_expr_t addr, db_expr_t size, enum dbg_access_t access) +{ + struct dbg_wb_conf conf; + + if (access == HW_BREAKPOINT_X) { + db_printf("Invalid access type for watchpoint: %d\n", access); + return (EINVAL); + } + + conf.address = addr; + conf.size = size; + conf.access = access; + conf.type = DBG_TYPE_WATCHPOINT; + + return (dbg_setup_xpoint(&conf)); +} + +int +dbg_remove_watchpoint(db_expr_t addr, db_expr_t size __unused) +{ + struct dbg_wb_conf conf; + + conf.address = addr; + conf.type = DBG_TYPE_WATCHPOINT; + + return (dbg_remove_xpoint(&conf)); +} + +static int +dbg_setup_breakpoint(db_expr_t addr, db_expr_t size, u_int slot) +{ + struct dbg_wb_conf conf; + + conf.address = addr; + conf.size = size; + conf.access = HW_BREAKPOINT_X; + conf.type = DBG_TYPE_BREAKPOINT; + conf.slot = slot; + + return (dbg_setup_xpoint(&conf)); +} + +static int +dbg_remove_breakpoint(u_int slot) +{ + struct dbg_wb_conf conf; + + /* Slot already cleared. 
Don't recurse */ + if (dbg_check_slot_free(DBG_TYPE_BREAKPOINT, slot)) + return (0); + + conf.slot = slot; + conf.type = DBG_TYPE_BREAKPOINT; + + return (dbg_remove_xpoint(&conf)); +} + +static const char * +dbg_watchtype_str(uint32_t type) +{ + + switch (type) { + case DBG_WB_CTRL_EXEC: + return ("execute"); + case DBG_WB_CTRL_STORE: + return ("write"); + case DBG_WB_CTRL_LOAD: + return ("read"); + case DBG_WB_CTRL_LOAD | DBG_WB_CTRL_STORE: + return ("read/write"); + default: + return ("invalid"); + } +} + +static int +dbg_watchtype_len(uint32_t len) +{ + + switch (len) { + case DBG_WB_CTRL_LEN_1: + return (1); + case DBG_WB_CTRL_LEN_2: + return (2); + case DBG_WB_CTRL_LEN_4: + return (4); + case DBG_WB_CTRL_LEN_8: + return (8); + default: + return (0); + } +} + +void +dbg_show_watchpoint(void) +{ + uint32_t wcr, len, type; + uint32_t addr; + boolean_t is_enabled; + int i; + + if (!dbg_capable) { + db_printf("Architecture does not support HW " + "breakpoints/watchpoints\n"); + return; + } + + db_printf("\nhardware watchpoints:\n"); + db_printf(" watch status type len address symbol\n"); + db_printf(" ----- -------- ---------- --- ---------- ------------------\n"); + for (i = 0; i < dbg_watchpoint_num; i++) { + wcr = dbg_wb_read_reg(DBG_REG_BASE_WCR, i); + if ((wcr & DBG_WB_CTRL_E) != 0) + is_enabled = TRUE; + else + is_enabled = FALSE; + + type = DBG_WB_CTRL_ACCESS_MASK(wcr); + len = DBG_WB_CTRL_LEN_MASK(wcr); + addr = dbg_wb_read_reg(DBG_REG_BASE_WVR, i) & DBGWVR_ADDR_MASK; + db_printf(" %-5d %-8s %10s %3d 0x%08x ", i, + is_enabled ? "enabled" : "disabled", + is_enabled ? dbg_watchtype_str(type) : "", + is_enabled ? dbg_watchtype_len(len) : 0, + addr); + db_printsym((db_addr_t)addr, DB_STGY_ANY); + db_printf("\n"); + } +} + +static boolean_t +dbg_check_slot_free(enum dbg_t type, u_int slot) +{ + uint32_t cr, vr; + uint32_t max; + + switch(type) { + case DBG_TYPE_BREAKPOINT: + max = dbg_breakpoint_num; + cr = DBG_REG_BASE_BCR; + vr = DBG_REG_BASE_BVR; + break; + case DBG_TYPE_WATCHPOINT: + max = dbg_watchpoint_num; + cr = DBG_REG_BASE_WCR; + vr = DBG_REG_BASE_WVR; + break; + default: + db_printf("%s: Unsupported event type %d\n", __func__, type); + return (FALSE); + } + + if (slot >= max) { + db_printf("%s: Invalid slot number %d, max %d\n", + __func__, slot, max - 1); + return (FALSE); + } + + if ((dbg_wb_read_reg(cr, slot) & DBG_WB_CTRL_E) == 0 && + (dbg_wb_read_reg(vr, slot) & DBGWVR_ADDR_MASK) == 0) + return (TRUE); + + return (FALSE); +} + +static u_int +dbg_find_free_slot(enum dbg_t type) +{ + u_int max, i; + + switch(type) { + case DBG_TYPE_BREAKPOINT: + max = dbg_breakpoint_num; + break; + case DBG_TYPE_WATCHPOINT: + max = dbg_watchpoint_num; + break; + default: + db_printf("Unsupported debug type\n"); + return (~0U); + } + + for (i = 0; i < max; i++) { + if (dbg_check_slot_free(type, i)) + return (i); + } + + return (~0U); +} + +static u_int +dbg_find_slot(enum dbg_t type, db_expr_t addr) +{ + uint32_t reg_addr, reg_ctrl; + u_int max, i; + + switch(type) { + case DBG_TYPE_BREAKPOINT: + max = dbg_breakpoint_num; + reg_addr = DBG_REG_BASE_BVR; + reg_ctrl = DBG_REG_BASE_BCR; + break; + case DBG_TYPE_WATCHPOINT: + max = dbg_watchpoint_num; + reg_addr = DBG_REG_BASE_WVR; + reg_ctrl = DBG_REG_BASE_WCR; + break; + default: + db_printf("Unsupported debug type\n"); + return (~0U); + } + + for (i = 0; i < max; i++) { + if ((dbg_wb_read_reg(reg_addr, i) == addr) && + ((dbg_wb_read_reg(reg_ctrl, i) & DBG_WB_CTRL_E) != 0)) + return (i); + } + + return (~0U); +} + +static __inline boolean_t 
+dbg_monitor_is_enabled(void)
+{
+
+	return ((cp14_dbgdscrint_get() & DBGSCR_MDBG_EN) != 0);
+}
+
+static int
+dbg_enable_monitor(void)
+{
+	uint32_t dbg_dscr;
+
+	/* Already enabled? Just increment reference counter and return */
+	if (dbg_monitor_is_enabled()) {
+		dbg_ref_count_mme[PCPU_GET(cpuid)]++;
+		return (0);
+	}
+
+	dbg_dscr = cp14_dbgdscrint_get();
+
+	switch (dbg_model) {
+	case ID_DFR0_CP_DEBUG_M_V6:
+	case ID_DFR0_CP_DEBUG_M_V6_1:	/* fall through */
+		cp14_dbgdscr_v6_set(dbg_dscr | DBGSCR_MDBG_EN);
+		break;
+	case ID_DFR0_CP_DEBUG_M_V7:	/* fall through */
+	case ID_DFR0_CP_DEBUG_M_V7_1:
+		cp14_dbgdscr_v7_set(dbg_dscr | DBGSCR_MDBG_EN);
+		break;
+	default:
+		break;
+	}
+	isb();
+
+	/* Verify that Monitor mode is set */
+	if (dbg_monitor_is_enabled()) {
+		dbg_ref_count_mme[PCPU_GET(cpuid)]++;
+		return (0);
+	}
+
+	return (ENXIO);
+}
+
+static int
+dbg_disable_monitor(void)
+{
+	uint32_t dbg_dscr;
+
+	if (!dbg_monitor_is_enabled())
+		return (0);
+
+	if (--dbg_ref_count_mme[PCPU_GET(cpuid)] > 0)
+		return (0);
+
+	dbg_dscr = cp14_dbgdscrint_get();
+	switch (dbg_model) {
+	case ID_DFR0_CP_DEBUG_M_V6:
+	case ID_DFR0_CP_DEBUG_M_V6_1:	/* fall through */
+		dbg_dscr &= ~DBGSCR_MDBG_EN;
+		cp14_dbgdscr_v6_set(dbg_dscr);
+		break;
+	case ID_DFR0_CP_DEBUG_M_V7:	/* fall through */
+	case ID_DFR0_CP_DEBUG_M_V7_1:
+		dbg_dscr &= ~DBGSCR_MDBG_EN;
+		cp14_dbgdscr_v7_set(dbg_dscr);
+		break;
+	default:
+		return (ENXIO);
+	}
+	isb();
+
+	return (0);
+}
+
+static int
+dbg_setup_xpoint(struct dbg_wb_conf *conf)
+{
+	const char *typestr;
+	uint32_t cr_size, cr_priv, cr_access;
+	uint32_t reg_ctrl, reg_addr, ctrl, addr;
+	boolean_t is_bkpt;
+	u_int cpuid;
+	u_int i;
+	int err;
+
+	if (!dbg_capable)
+		return (ENXIO);
+
+	is_bkpt = (conf->type == DBG_TYPE_BREAKPOINT);
+	typestr = is_bkpt ? "breakpoint" : "watchpoint";
+
+	cpuid = PCPU_GET(cpuid);
+	if (!dbg_ready[cpuid]) {
+		err = dbg_reset_state();
+		if (err != 0)
+			return (err);
+		dbg_ready[cpuid] = TRUE;
+	}
+
+	if (is_bkpt) {
+		if (dbg_breakpoint_num == 0) {
+			db_printf("Breakpoints not supported on this architecture\n");
+			return (ENXIO);
+		}
+		i = conf->slot;
+		if (!dbg_check_slot_free(DBG_TYPE_BREAKPOINT, i)) {
+			/*
+			 * This should never happen. If it does it means that
+			 * there is an erroneous scenario somewhere. Still, it
+			 * can be done but let's inform the user.
+			 */
+			db_printf("ERROR: Breakpoint already set. Replacing...\n");
+		}
+	} else {
+		i = dbg_find_free_slot(DBG_TYPE_WATCHPOINT);
+		if (i == ~0U) {
+			db_printf("Cannot find slot for %s, max %d slots supported\n",
+			    typestr, dbg_watchpoint_num);
+			return (ENXIO);
+		}
+	}
+
+	/* Kernel access only */
+	cr_priv = DBG_WB_CTRL_PL1;
+
+	switch(conf->size) {
+	case 1:
+		cr_size = DBG_WB_CTRL_LEN_1;
+		break;
+	case 2:
+		cr_size = DBG_WB_CTRL_LEN_2;
+		break;
+	case 4:
+		cr_size = DBG_WB_CTRL_LEN_4;
+		break;
+	case 8:
+		cr_size = DBG_WB_CTRL_LEN_8;
+		break;
+	default:
+		db_printf("Unsupported address size for %s\n", typestr);
+		return (EINVAL);
+	}
+
+	if (is_bkpt) {
+		cr_access = DBG_WB_CTRL_EXEC;
+		reg_ctrl = DBG_REG_BASE_BCR;
+		reg_addr = DBG_REG_BASE_BVR;
+		/* Always unlinked BKPT */
+		ctrl = (cr_size | cr_access | cr_priv | DBG_WB_CTRL_E);
+	} else {
+		switch(conf->access) {
+		case HW_WATCHPOINT_R:
+			cr_access = DBG_WB_CTRL_LOAD;
+			break;
+		case HW_WATCHPOINT_W:
+			cr_access = DBG_WB_CTRL_STORE;
+			break;
+		case HW_WATCHPOINT_RW:
+			cr_access = DBG_WB_CTRL_LOAD | DBG_WB_CTRL_STORE;
+			break;
+		default:
+			db_printf("Unsupported access type for %s\n", typestr);
+			return (EINVAL);
+		}
+
+		reg_ctrl = DBG_REG_BASE_WCR;
+		reg_addr = DBG_REG_BASE_WVR;
+		ctrl = (cr_size | cr_access | cr_priv | DBG_WB_CTRL_E);
+	}
+
+	addr = conf->address;
+
+	dbg_wb_write_reg(reg_addr, i, addr);
+	dbg_wb_write_reg(reg_ctrl, i, ctrl);
+
+	return (dbg_enable_monitor());
+}
+
+static int
+dbg_remove_xpoint(struct dbg_wb_conf *conf)
+{
+	uint32_t reg_ctrl, reg_addr, addr;
+	u_int cpuid;
+	u_int i;
+	int err;
+
+	if (!dbg_capable)
+		return (ENXIO);
+
+	cpuid = PCPU_GET(cpuid);
+	if (!dbg_ready[cpuid]) {
+		err = dbg_reset_state();
+		if (err != 0)
+			return (err);
+		dbg_ready[cpuid] = TRUE;
+	}
+
+	addr = conf->address;
+
+	if (conf->type == DBG_TYPE_BREAKPOINT) {
+		i = conf->slot;
+		reg_ctrl = DBG_REG_BASE_BCR;
+		reg_addr = DBG_REG_BASE_BVR;
+	} else {
+		i = dbg_find_slot(DBG_TYPE_WATCHPOINT, addr);
+		if (i == ~0U) {
+			db_printf("Cannot find watchpoint for address 0x%x\n", addr);
+			return (EINVAL);
+		}
+		reg_ctrl = DBG_REG_BASE_WCR;
+		reg_addr = DBG_REG_BASE_WVR;
+	}
+
+	dbg_wb_write_reg(reg_ctrl, i, 0);
+	dbg_wb_write_reg(reg_addr, i, 0);
+
+	return (dbg_disable_monitor());
+}
+
+static __inline uint32_t
+dbg_get_debug_model(void)
+{
+	uint32_t dbg_m;
+
+	dbg_m = ((cpuinfo.id_dfr0 & ID_DFR0_CP_DEBUG_M_MASK) >>
+	    ID_DFR0_CP_DEBUG_M_SHIFT);
+
+	return (dbg_m);
+}
+
+static __inline boolean_t
+dbg_get_ossr(void)
+{
+
+	switch (dbg_model) {
+	case ID_DFR0_CP_DEBUG_M_V6_1:
+		if ((cp14_dbgoslsr_get() & DBGOSLSR_OSLM0) != 0)
+			return (TRUE);
+
+		return (FALSE);
+	case ID_DFR0_CP_DEBUG_M_V7_1:
+		return (TRUE);
+	default:
+		return (FALSE);
+	}
+}
+
+static __inline boolean_t
+dbg_arch_supported(void)
+{
+
+	switch (dbg_model) {
+	case ID_DFR0_CP_DEBUG_M_V6:
+	case ID_DFR0_CP_DEBUG_M_V6_1:
+	case ID_DFR0_CP_DEBUG_M_V7:
+	case ID_DFR0_CP_DEBUG_M_V7_1:	/* fall through */
+		return (TRUE);
+	default:
+		/* We only support valid v6.x/v7.x modes through CP14 */
+		return (FALSE);
+	}
+}
+
+static __inline uint32_t
+dbg_get_wrp_num(void)
+{
+	uint32_t dbg_didr;
+
+	dbg_didr = cp14_dbgdidr_get();
+
+	return (DBGDIDR_WRPS_NUM(dbg_didr));
+}
+
+static __inline uint32_t
+dbg_get_brp_num(void)
+{
+	uint32_t dbg_didr;
+
+	dbg_didr = cp14_dbgdidr_get();
+
+	return (DBGDIDR_BRPS_NUM(dbg_didr));
+}
+
+static int
+dbg_reset_state(void)
+{
+	u_int cpuid;
+	size_t i;
+	int err;
+
+	cpuid = PCPU_GET(cpuid);
+	err = 0;
+
+	switch (dbg_model) {
+	case ID_DFR0_CP_DEBUG_M_V6:
+		/* v6 Debug logic reset upon power-up */
+		return (0);
+	case ID_DFR0_CP_DEBUG_M_V6_1:
+		/* Is core power domain powered up? */
+		if ((cp14_dbgprsr_get() & DBGPRSR_PU) == 0)
+			err = ENXIO;
+
+		if (err != 0)
+			break;
+
+		if (dbg_ossr)
+			goto vectr_clr;
+		break;
+	case ID_DFR0_CP_DEBUG_M_V7:
+		break;
+	case ID_DFR0_CP_DEBUG_M_V7_1:
+		/* Is double lock set? */
+		if ((cp14_dbgosdlr_get() & DBGPRSR_DLK) != 0)
+			err = ENXIO;
+
+		break;
+	default:
+		break;
+	}
+
+	if (err != 0) {
+		db_printf("Debug facility locked (CPU%d)\n", cpuid);
+		return (err);
+	}
+
+	/*
+	 * DBGOSLAR is always implemented in the v7.1 Debug Arch., but is
+	 * optional for v7 (it depends on OS save and restore support).
+	 */
+	if ((dbg_model == ID_DFR0_CP_DEBUG_M_V7_1) || dbg_ossr) {
+		/*
+		 * Clear OS lock.
+		 * Writing any other value than 0xC5ACCESS will unlock.
+		 */
+		cp14_dbgoslar_set(0);
+		isb();
+	}
+
+vectr_clr:
+	/*
+	 * After reset we must ensure that DBGVCR has a defined value.
+	 * Disable all vector catch events. Safe to use - required in all
+	 * implementations.
+	 */
+	cp14_dbgvcr_set(0);
+	isb();
+
+	/*
+	 * We have a limited number of {watch,break}points, each of which
+	 * consists of two registers:
+	 * - the wcr/bcr register configures the corresponding
+	 *   {watch,break}point's behaviour
+	 * - the wvr/bvr register holds the address we are watching for
+	 *
+	 * Reset all breakpoints and watchpoints.
+	 */
+	for (i = 0; i < dbg_watchpoint_num; ++i) {
+		dbg_wb_write_reg(DBG_REG_BASE_WCR, i, 0);
+		dbg_wb_write_reg(DBG_REG_BASE_WVR, i, 0);
+	}
+
+	for (i = 0; i < dbg_breakpoint_num; ++i) {
+		dbg_wb_write_reg(DBG_REG_BASE_BCR, i, 0);
+		dbg_wb_write_reg(DBG_REG_BASE_BVR, i, 0);
+	}
+
+	return (0);
+}
+
+void
+dbg_monitor_init(void)
+{
+	int err;
+
+	/* Fetch ARM Debug Architecture model */
+	dbg_model = dbg_get_debug_model();
+
+	if (!dbg_arch_supported()) {
+		db_printf("ARM Debug Architecture not supported\n");
+		return;
+	}
+
+	if (bootverbose) {
+		db_printf("ARM Debug Architecture %s\n",
+		    (dbg_model == ID_DFR0_CP_DEBUG_M_V6) ? "v6" :
+		    (dbg_model == ID_DFR0_CP_DEBUG_M_V6_1) ? "v6.1" :
+		    (dbg_model == ID_DFR0_CP_DEBUG_M_V7) ? "v7" :
+		    (dbg_model == ID_DFR0_CP_DEBUG_M_V7_1) ? "v7.1" : "unknown");
+	}
+
+	/* Do we have OS Save and Restore mechanism? */
+	dbg_ossr = dbg_get_ossr();
+
+	/* Find out how many breakpoints and watchpoints we can use */
+	dbg_watchpoint_num = dbg_get_wrp_num();
+	dbg_breakpoint_num = dbg_get_brp_num();
+
+	if (bootverbose) {
+		db_printf("%d watchpoints and %d breakpoints supported\n",
+		    dbg_watchpoint_num, dbg_breakpoint_num);
+	}
+
+	err = dbg_reset_state();
+	if (err == 0) {
+		dbg_capable = TRUE;
+		return;
+	}
+
+	db_printf("HW Breakpoints/Watchpoints not enabled on CPU%d\n",
+	    PCPU_GET(cpuid));
+}

Property changes on: projects/clang380-import/sys/arm/arm/debug_monitor.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property
Index: projects/clang380-import/sys/arm/arm/elf_machdep.c
===================================================================
--- projects/clang380-import/sys/arm/arm/elf_machdep.c	(revision 294776)
+++ projects/clang380-import/sys/arm/arm/elf_machdep.c	(revision 294777)
@@ -1,283 +1,295 @@
/*-
 * Copyright 1996-1998 John D. Polstra.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include
__FBSDID("$FreeBSD$");

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

static boolean_t elf32_arm_abi_supported(struct image_params *);

struct sysentvec elf32_freebsd_sysvec = {
	.sv_size	= SYS_MAXSYSCALL,
	.sv_table	= sysent,
	.sv_mask	= 0,
	.sv_errsize	= 0,
	.sv_errtbl	= NULL,
	.sv_transtrap	= NULL,
	.sv_fixup	= __elfN(freebsd_fixup),
	.sv_sendsig	= sendsig,
	.sv_sigcode	= sigcode,
	.sv_szsigcode	= &szsigcode,
	.sv_name	= "FreeBSD ELF32",
	.sv_coredump	= __elfN(coredump),
	.sv_imgact_try	= NULL,
	.sv_minsigstksz	= MINSIGSTKSZ,
	.sv_pagesize	= PAGE_SIZE,
	.sv_minuser	= VM_MIN_ADDRESS,
	.sv_maxuser	= VM_MAXUSER_ADDRESS,
	.sv_usrstack	= USRSTACK,
	.sv_psstrings	= PS_STRINGS,
	.sv_stackprot	= VM_PROT_ALL,
	.sv_copyout_strings = exec_copyout_strings,
	.sv_setregs	= exec_setregs,
	.sv_fixlimit	= NULL,
	.sv_maxssiz	= NULL,
	.sv_flags	=
#if __ARM_ARCH >= 6
			  SV_SHP | SV_TIMEKEEP |
#endif
			  SV_ABI_FREEBSD | SV_ILP32,
	.sv_set_syscall_retval = cpu_set_syscall_retval,
	.sv_fetch_syscall_args = cpu_fetch_syscall_args,
	.sv_syscallnames = syscallnames,
	.sv_shared_page_base = SHAREDPAGE,
	.sv_shared_page_len = PAGE_SIZE,
	.sv_schedtail	= NULL,
	.sv_thread_detach = NULL,
	.sv_trap	= NULL,
};
INIT_SYSENTVEC(elf32_sysvec, &elf32_freebsd_sysvec);

static Elf32_Brandinfo freebsd_brand_info = {
	.brand		= ELFOSABI_FREEBSD,
	.machine	= EM_ARM,
	.compat_3_brand	= "FreeBSD",
	.emul_path	= NULL,
	.interp_path	= "/libexec/ld-elf.so.1",
	.sysvec		= &elf32_freebsd_sysvec,
	.interp_newpath	= NULL,
	.brand_note	= &elf32_freebsd_brandnote,
	.flags		= BI_CAN_EXEC_DYN | BI_BRAND_NOTE,
	.header_supported= elf32_arm_abi_supported,
};

SYSINIT(elf32, SI_SUB_EXEC, SI_ORDER_FIRST,
    (sysinit_cfunc_t) elf32_insert_brand_entry, &freebsd_brand_info);

static boolean_t
elf32_arm_abi_supported(struct image_params *imgp)
{
	const Elf_Ehdr *hdr = (const Elf_Ehdr *)imgp->image_header;

	/*
	 * When configured for EABI, FreeBSD supports EABI versions 4 and 5.
*/ if (EF_ARM_EABI_VERSION(hdr->e_flags) < EF_ARM_EABI_FREEBSD_MIN) { if (bootverbose) uprintf("Attempting to execute non EABI binary (rev %d) image %s", EF_ARM_EABI_VERSION(hdr->e_flags), imgp->args->fname); return (FALSE); } return (TRUE); } void elf32_dump_thread(struct thread *td __unused, void *dst __unused, size_t *off __unused) { } /* * It is possible for the compiler to emit relocations for unaligned data. * We handle this situation with these inlines. */ #define RELOC_ALIGNED_P(x) \ (((uintptr_t)(x) & (sizeof(void *) - 1)) == 0) static __inline Elf_Addr load_ptr(Elf_Addr *where) { Elf_Addr res; if (RELOC_ALIGNED_P(where)) return *where; memcpy(&res, where, sizeof(res)); return (res); } static __inline void store_ptr(Elf_Addr *where, Elf_Addr val) { if (RELOC_ALIGNED_P(where)) *where = val; else memcpy(where, &val, sizeof(val)); } #undef RELOC_ALIGNED_P /* Process one elf relocation with addend. */ static int elf_reloc_internal(linker_file_t lf, Elf_Addr relocbase, const void *data, int type, int local, elf_lookup_fn lookup) { Elf_Addr *where; Elf_Addr addr; Elf_Addr addend; Elf_Word rtype, symidx; const Elf_Rel *rel; const Elf_Rela *rela; int error; switch (type) { case ELF_RELOC_REL: rel = (const Elf_Rel *)data; where = (Elf_Addr *) (relocbase + rel->r_offset); addend = load_ptr(where); rtype = ELF_R_TYPE(rel->r_info); symidx = ELF_R_SYM(rel->r_info); break; case ELF_RELOC_RELA: rela = (const Elf_Rela *)data; where = (Elf_Addr *) (relocbase + rela->r_offset); addend = rela->r_addend; rtype = ELF_R_TYPE(rela->r_info); symidx = ELF_R_SYM(rela->r_info); break; default: panic("unknown reloc type %d\n", type); } if (local) { if (rtype == R_ARM_RELATIVE) { /* A + B */ addr = elf_relocaddr(lf, relocbase + addend); if (load_ptr(where) != addr) store_ptr(where, addr); } return (0); } switch (rtype) { case R_ARM_NONE: /* none */ break; case R_ARM_ABS32: error = lookup(lf, symidx, 1, &addr); if (error != 0) return -1; store_ptr(where, addr + load_ptr(where)); break; case R_ARM_COPY: /* none */ /* * There shouldn't be copy relocations in kernel * objects. */ printf("kldload: unexpected R_COPY relocation\n"); return -1; break; case R_ARM_JUMP_SLOT: error = lookup(lf, symidx, 1, &addr); if (error == 0) { store_ptr(where, addr); return (0); } return (-1); case R_ARM_RELATIVE: break; default: printf("kldload: unexpected relocation type %d\n", rtype); return -1; } return(0); } int elf_reloc(linker_file_t lf, Elf_Addr relocbase, const void *data, int type, elf_lookup_fn lookup) { return (elf_reloc_internal(lf, relocbase, data, type, 0, lookup)); } int elf_reloc_local(linker_file_t lf, Elf_Addr relocbase, const void *data, int type, elf_lookup_fn lookup) { return (elf_reloc_internal(lf, relocbase, data, type, 1, lookup)); } int -elf_cpu_load_file(linker_file_t lf __unused) +elf_cpu_load_file(linker_file_t lf) { /* * The pmap code does not do an icache sync upon establishing executable * mappings in the kernel pmap. It's an optimization based on the fact * that kernel memory allocations always have EXECUTABLE protection even * when the memory isn't going to hold executable code. The only time * kernel memory holding instructions does need a sync is after loading - * a kernel module, and that's when this function gets called. Normal - * data cache maintenance has already been done by the IO code, and TLB - * maintenance has been done by the pmap code, so all we have to do here - * is invalidate the instruction cache (which also invalidates the - * branch predictor cache on platforms that have one). 
+ * a kernel module, and that's when this function gets called. + * + * This syncs data and instruction caches after loading a module. We + * don't worry about the kernel itself (lf->id is 1) as locore.S did + * that on entry. Even if data cache maintenance was done by IO code, + * the relocation fixup process creates dirty cache entries that we must + * write back before doing icache sync. The instruction cache sync also + * invalidates the branch predictor cache on platforms that have one. */ + if (lf->id == 1) + return (0); +#if __ARM_ARCH >= 6 + dcache_wb_pou((vm_offset_t)lf->address, (vm_size_t)lf->size); + icache_inv_all(); +#else + cpu_dcache_wb_range((vm_offset_t)lf->address, (vm_size_t)lf->size); + cpu_l2cache_wb_range((vm_offset_t)lf->address, (vm_size_t)lf->size); cpu_icache_sync_all(); +#endif return (0); } int elf_cpu_unload_file(linker_file_t lf __unused) { return (0); } Index: projects/clang380-import/sys/arm/arm/machdep.c =================================================================== --- projects/clang380-import/sys/arm/arm/machdep.c (revision 294776) +++ projects/clang380-import/sys/arm/arm/machdep.c (revision 294777) @@ -1,1917 +1,1920 @@ /* $NetBSD: arm32_machdep.c,v 1.44 2004/03/24 15:34:47 atatat Exp $ */ /*- * Copyright (c) 2004 Olivier Houchard * Copyright (c) 1994-1998 Mark Brinicombe. * Copyright (c) 1994 Brini. * All rights reserved. * * This code is derived from software written for Brini by Mark Brinicombe * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by Mark Brinicombe * for the NetBSD Project. * 4. The name of the company nor the name of the author may be used to * endorse or promote products derived from this software without specific * prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
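The new elf_cpu_load_file() body above writes back the data cache and then invalidates the instruction cache, so that freshly relocated module text becomes visible to instruction fetch. A userland analog of the same ordering, sketched with the GCC/Clang builtin instead of the kernel's cp15 helpers (the builtin is a no-op on x86; on ARM it performs the clean and invalidate to the point of unification):

#include <stdio.h>
#include <string.h>

static unsigned char code_buf[64];

int
main(void)
{
	/* Pretend relocation processing just dirtied these bytes. */
	memset(code_buf, 0x00, sizeof(code_buf));

	/* Clean D-cache, invalidate I-cache for the range. */
	__builtin___clear_cache((char *)code_buf,
	    (char *)code_buf + sizeof(code_buf));

	puts("caches synchronized for code_buf");
	return (0);
}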
* * Machine dependent functions for kernel setup * * Created : 17/09/94 * Updated : 18/04/01 updated for new wscons */ #include "opt_compat.h" #include "opt_ddb.h" #include "opt_kstack_pages.h" #include "opt_platform.h" #include "opt_sched.h" #include "opt_timer.h" #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef FDT #include #include #endif #ifdef DDB #include #if __ARM_ARCH >= 6 #include DB_SHOW_COMMAND(cp15, db_show_cp15) { u_int reg; reg = cp15_midr_get(); db_printf("Cpu ID: 0x%08x\n", reg); reg = cp15_ctr_get(); db_printf("Current Cache Lvl ID: 0x%08x\n",reg); reg = cp15_sctlr_get(); db_printf("Ctrl: 0x%08x\n",reg); reg = cp15_actlr_get(); db_printf("Aux Ctrl: 0x%08x\n",reg); reg = cp15_id_pfr0_get(); db_printf("Processor Feat 0: 0x%08x\n", reg); reg = cp15_id_pfr1_get(); db_printf("Processor Feat 1: 0x%08x\n", reg); reg = cp15_id_dfr0_get(); db_printf("Debug Feat 0: 0x%08x\n", reg); reg = cp15_id_afr0_get(); db_printf("Auxiliary Feat 0: 0x%08x\n", reg); reg = cp15_id_mmfr0_get(); db_printf("Memory Model Feat 0: 0x%08x\n", reg); reg = cp15_id_mmfr1_get(); db_printf("Memory Model Feat 1: 0x%08x\n", reg); reg = cp15_id_mmfr2_get(); db_printf("Memory Model Feat 2: 0x%08x\n", reg); reg = cp15_id_mmfr3_get(); db_printf("Memory Model Feat 3: 0x%08x\n", reg); reg = cp15_ttbr_get(); db_printf("TTB0: 0x%08x\n", reg); } DB_SHOW_COMMAND(vtop, db_show_vtop) { u_int reg; if (have_addr) { cp15_ats1cpr_set(addr); reg = cp15_par_get(); db_printf("Physical address reg: 0x%08x\n",reg); } else db_printf("show vtop \n"); } #endif /* __ARM_ARCH >= 6 */ #endif /* DDB */ #ifdef DEBUG #define debugf(fmt, args...) printf(fmt, ##args) #else #define debugf(fmt, args...) #endif struct pcpu __pcpu[MAXCPU]; struct pcpu *pcpup = &__pcpu[0]; static struct trapframe proc0_tf; uint32_t cpu_reset_address = 0; int cold = 1; vm_offset_t vector_page; int (*_arm_memcpy)(void *, void *, int, int) = NULL; int (*_arm_bzero)(void *, int, int) = NULL; int _min_memcpy_size = 0; int _min_bzero_size = 0; extern int *end; #ifdef FDT static char *loader_envp; vm_paddr_t pmap_pa; #ifdef ARM_NEW_PMAP vm_offset_t systempage; vm_offset_t irqstack; vm_offset_t undstack; vm_offset_t abtstack; #else /* * This is the number of L2 page tables required for covering max * (hypothetical) memsize of 4GB and all kernel mappings (vectors, msgbuf, * stacks etc.), rounded up to be divisible by 4.
*/ #define KERNEL_PT_MAX 78 static struct pv_addr kernel_pt_table[KERNEL_PT_MAX]; struct pv_addr systempage; static struct pv_addr msgbufpv; struct pv_addr irqstack; struct pv_addr undstack; struct pv_addr abtstack; static struct pv_addr kernelstack; #endif #endif #if defined(LINUX_BOOT_ABI) #define LBABI_MAX_BANKS 10 uint32_t board_id; struct arm_lbabi_tag *atag_list; char linux_command_line[LBABI_MAX_COMMAND_LINE + 1]; char atags[LBABI_MAX_COMMAND_LINE * 2]; uint32_t memstart[LBABI_MAX_BANKS]; uint32_t memsize[LBABI_MAX_BANKS]; uint32_t membanks; #endif static uint32_t board_revision; /* hex representation of uint64_t */ static char board_serial[32]; SYSCTL_NODE(_hw, OID_AUTO, board, CTLFLAG_RD, 0, "Board attributes"); SYSCTL_UINT(_hw_board, OID_AUTO, revision, CTLFLAG_RD, &board_revision, 0, "Board revision"); SYSCTL_STRING(_hw_board, OID_AUTO, serial, CTLFLAG_RD, board_serial, 0, "Board serial"); int vfp_exists; SYSCTL_INT(_hw, HW_FLOATINGPT, floatingpoint, CTLFLAG_RD, &vfp_exists, 0, "Floating point support enabled"); void board_set_serial(uint64_t serial) { snprintf(board_serial, sizeof(board_serial)-1, "%016jx", serial); } void board_set_revision(uint32_t revision) { board_revision = revision; } void sendsig(catcher, ksi, mask) sig_t catcher; ksiginfo_t *ksi; sigset_t *mask; { struct thread *td; struct proc *p; struct trapframe *tf; struct sigframe *fp, frame; struct sigacts *psp; struct sysentvec *sysent; int onstack; int sig; int code; td = curthread; p = td->td_proc; PROC_LOCK_ASSERT(p, MA_OWNED); sig = ksi->ksi_signo; code = ksi->ksi_code; psp = p->p_sigacts; mtx_assert(&psp->ps_mtx, MA_OWNED); tf = td->td_frame; onstack = sigonstack(tf->tf_usr_sp); CTR4(KTR_SIG, "sendsig: td=%p (%s) catcher=%p sig=%d", td, p->p_comm, catcher, sig); /* Allocate and validate space for the signal handler context. */ if ((td->td_pflags & TDP_ALTSTACK) != 0 && !(onstack) && SIGISMEMBER(psp->ps_sigonstack, sig)) { fp = (struct sigframe *)(td->td_sigstk.ss_sp + td->td_sigstk.ss_size); #if defined(COMPAT_43) td->td_sigstk.ss_flags |= SS_ONSTACK; #endif } else fp = (struct sigframe *)td->td_frame->tf_usr_sp; /* make room on the stack */ fp--; /* make the stack aligned */ fp = (struct sigframe *)STACKALIGN(fp); /* Populate the siginfo frame. */ get_mcontext(td, &frame.sf_uc.uc_mcontext, 0); frame.sf_si = ksi->ksi_info; frame.sf_uc.uc_sigmask = *mask; frame.sf_uc.uc_stack.ss_flags = (td->td_pflags & TDP_ALTSTACK ) ? ((onstack) ? SS_ONSTACK : 0) : SS_DISABLE; frame.sf_uc.uc_stack = td->td_sigstk; mtx_unlock(&psp->ps_mtx); PROC_UNLOCK(td->td_proc); /* Copy the sigframe out to the user's stack. */ if (copyout(&frame, fp, sizeof(*fp)) != 0) { /* Process has trashed its stack. Kill it. */ CTR2(KTR_SIG, "sendsig: sigexit td=%p fp=%p", td, fp); PROC_LOCK(p); sigexit(td, SIGILL); } /* * Build context to run handler in. We invoke the handler * directly, only returning via the trampoline. Note the * trampoline version numbers are coordinated with machine- * dependent code in libc. 
*/ tf->tf_r0 = sig; tf->tf_r1 = (register_t)&fp->sf_si; tf->tf_r2 = (register_t)&fp->sf_uc; /* the trampoline uses r5 as the uc address */ tf->tf_r5 = (register_t)&fp->sf_uc; tf->tf_pc = (register_t)catcher; tf->tf_usr_sp = (register_t)fp; sysent = p->p_sysent; if (sysent->sv_sigcode_base != 0) tf->tf_usr_lr = (register_t)sysent->sv_sigcode_base; else tf->tf_usr_lr = (register_t)(sysent->sv_psstrings - *(sysent->sv_szsigcode)); /* Set the mode to enter in the signal handler */ #if __ARM_ARCH >= 7 if ((register_t)catcher & 1) tf->tf_spsr |= PSR_T; else tf->tf_spsr &= ~PSR_T; #endif CTR3(KTR_SIG, "sendsig: return td=%p pc=%#x sp=%#x", td, tf->tf_usr_lr, tf->tf_usr_sp); PROC_LOCK(p); mtx_lock(&psp->ps_mtx); } struct kva_md_info kmi; /* * arm32_vector_init: * * Initialize the vector page, and select whether or not to * relocate the vectors. * * NOTE: We expect the vector page to be mapped at its expected * destination. */ extern unsigned int page0[], page0_data[]; void arm_vector_init(vm_offset_t va, int which) { unsigned int *vectors = (int *) va; unsigned int *vectors_data = vectors + (page0_data - page0); int vec; /* * Loop through the vectors we're taking over, and copy the * vector's insn and data word. */ for (vec = 0; vec < ARM_NVEC; vec++) { if ((which & (1 << vec)) == 0) { /* Don't want to take over this vector. */ continue; } vectors[vec] = page0[vec]; vectors_data[vec] = page0_data[vec]; } /* Now sync the vectors. */ cpu_icache_sync_range(va, (ARM_NVEC * 2) * sizeof(u_int)); vector_page = va; if (va == ARM_VECTORS_HIGH) { /* * Assume the MD caller knows what it's doing here, and * really does want the vector page relocated. * * Note: This has to be done here (and not just in * cpu_setup()) because the vector page needs to be * accessible *before* cpu_startup() is called. * Think ddb(9) ... * * NOTE: If the CPU control register is not readable, * this will totally fail! We'll just assume that * any system that has high vector support has a * readable CPU control register, for now. If we * ever encounter one that does not, we'll have to * rethink this. */ cpu_control(CPU_CONTROL_VECRELOC, CPU_CONTROL_VECRELOC); } } static void cpu_startup(void *dummy) { struct pcb *pcb = thread0.td_pcb; const unsigned int mbyte = 1024 * 1024; #ifdef ARM_TP_ADDRESS #ifndef ARM_CACHE_LOCK_ENABLE vm_page_t m; #endif #endif identify_arm_cpu(); vm_ksubmap_init(&kmi); /* * Display the RAM layout. */ printf("real memory = %ju (%ju MB)\n", (uintmax_t)arm32_ptob(realmem), (uintmax_t)arm32_ptob(realmem) / mbyte); printf("avail memory = %ju (%ju MB)\n", (uintmax_t)arm32_ptob(vm_cnt.v_free_count), (uintmax_t)arm32_ptob(vm_cnt.v_free_count) / mbyte); if (bootverbose) { arm_physmem_print_tables(); arm_devmap_print_table(); } bufinit(); vm_pager_bufferinit(); pcb->pcb_regs.sf_sp = (u_int)thread0.td_kstack + USPACE_SVC_STACK_TOP; pmap_set_pcb_pagedir(pmap_kernel(), pcb); #ifndef ARM_NEW_PMAP vector_page_setprot(VM_PROT_READ); pmap_postinit(); #endif #ifdef ARM_TP_ADDRESS #ifdef ARM_CACHE_LOCK_ENABLE pmap_kenter_user(ARM_TP_ADDRESS, ARM_TP_ADDRESS); arm_lock_cache_line(ARM_TP_ADDRESS); #else m = vm_page_alloc(NULL, 0, VM_ALLOC_NOOBJ | VM_ALLOC_ZERO); pmap_kenter_user(ARM_TP_ADDRESS, VM_PAGE_TO_PHYS(m)); #endif *(uint32_t *)ARM_RAS_START = 0; *(uint32_t *)ARM_RAS_END = 0xffffffff; #endif } SYSINIT(cpu, SI_SUB_CPU, SI_ORDER_FIRST, cpu_startup, NULL); /* * Flush the D-cache for non-DMA I/O so that the I-cache can * be made coherent later. 
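sendsig() above flips the Thumb bit in the saved PSR to agree with bit 0 of the handler address, since on ARM the low bit of a code address selects the instruction set. A standalone sketch of that selection; MY_PSR_T is an assumed stand-in for the kernel's PSR_T constant:

#include <stdint.h>
#include <stdio.h>

#define MY_PSR_T	0x20u	/* assumed: CPSR Thumb state bit */

static uint32_t
spsr_for_handler(uint32_t spsr, uintptr_t catcher)
{
	if (catcher & 1)
		return (spsr | MY_PSR_T);	/* Thumb handler */
	return (spsr & ~MY_PSR_T);		/* ARM handler */
}

int
main(void)
{
	printf("arm handler:   spsr=%#x\n",
	    spsr_for_handler(0x10, 0x8000));
	printf("thumb handler: spsr=%#x\n",
	    spsr_for_handler(0x10, 0x8001));
	return (0);
}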
*/ void cpu_flush_dcache(void *ptr, size_t len) { cpu_dcache_wb_range((uintptr_t)ptr, len); #ifdef ARM_L2_PIPT cpu_l2cache_wb_range((uintptr_t)vtophys(ptr), len); #else cpu_l2cache_wb_range((uintptr_t)ptr, len); #endif } /* Get current clock frequency for the given cpu id. */ int cpu_est_clockrate(int cpu_id, uint64_t *rate) { return (ENXIO); } void cpu_idle(int busy) { CTR2(KTR_SPARE2, "cpu_idle(%d) at %d", busy, curcpu); spinlock_enter(); #ifndef NO_EVENTTIMERS if (!busy) cpu_idleclock(); #endif if (!sched_runnable()) cpu_sleep(0); #ifndef NO_EVENTTIMERS if (!busy) cpu_activeclock(); #endif spinlock_exit(); CTR2(KTR_SPARE2, "cpu_idle(%d) at %d done", busy, curcpu); } int cpu_idle_wakeup(int cpu) { return (0); } /* * Most ARM platforms don't need to do anything special to init their clocks * (they get initialized during normal device attachment), and by not defining a * cpu_initclocks() function they get this generic one. Any platform that needs * to do something special can just provide their own implementation, which will * override this one due to the weak linkage. */ void arm_generic_initclocks(void) { #ifndef NO_EVENTTIMERS #ifdef SMP if (PCPU_GET(cpuid) == 0) cpu_initclocks_bsp(); else cpu_initclocks_ap(); #else cpu_initclocks_bsp(); #endif #endif } __weak_reference(arm_generic_initclocks, cpu_initclocks); int fill_regs(struct thread *td, struct reg *regs) { struct trapframe *tf = td->td_frame; bcopy(&tf->tf_r0, regs->r, sizeof(regs->r)); regs->r_sp = tf->tf_usr_sp; regs->r_lr = tf->tf_usr_lr; regs->r_pc = tf->tf_pc; regs->r_cpsr = tf->tf_spsr; return (0); } int fill_fpregs(struct thread *td, struct fpreg *regs) { bzero(regs, sizeof(*regs)); return (0); } int set_regs(struct thread *td, struct reg *regs) { struct trapframe *tf = td->td_frame; bcopy(regs->r, &tf->tf_r0, sizeof(regs->r)); tf->tf_usr_sp = regs->r_sp; tf->tf_usr_lr = regs->r_lr; tf->tf_pc = regs->r_pc; tf->tf_spsr &= ~PSR_FLAGS; tf->tf_spsr |= regs->r_cpsr & PSR_FLAGS; return (0); } int set_fpregs(struct thread *td, struct fpreg *regs) { return (0); } int fill_dbregs(struct thread *td, struct dbreg *regs) { return (0); } int set_dbregs(struct thread *td, struct dbreg *regs) { return (0); } static int ptrace_read_int(struct thread *td, vm_offset_t addr, uint32_t *v) { if (proc_readmem(td, td->td_proc, addr, v, sizeof(*v)) != sizeof(*v)) return (ENOMEM); return (0); } static int ptrace_write_int(struct thread *td, vm_offset_t addr, uint32_t v) { if (proc_writemem(td, td->td_proc, addr, &v, sizeof(v)) != sizeof(v)) return (ENOMEM); return (0); } static u_int ptrace_get_usr_reg(void *cookie, int reg) { int ret; struct thread *td = cookie; KASSERT(((reg >= 0) && (reg <= ARM_REG_NUM_PC)), ("reg is outside range")); switch(reg) { case ARM_REG_NUM_PC: ret = td->td_frame->tf_pc; break; case ARM_REG_NUM_LR: ret = td->td_frame->tf_usr_lr; break; case ARM_REG_NUM_SP: ret = td->td_frame->tf_usr_sp; break; default: ret = *((register_t*)&td->td_frame->tf_r0 + reg); break; } return (ret); } static u_int ptrace_get_usr_int(void* cookie, vm_offset_t offset, u_int* val) { struct thread *td = cookie; u_int error; error = ptrace_read_int(td, offset, val); return (error); } /** * This function parses the current instruction opcode and decodes * any possible jump (change in PC) which might occur after * the instruction is executed.
* * @param td Thread structure of the analysed task * @param cur_instr Currently executed instruction * @param alt_next_address Pointer to the variable where * the destination address of the * jump instruction shall be stored. * * @return 0 when a jump is possible, * EINVAL otherwise */ static int ptrace_get_alternative_next(struct thread *td, uint32_t cur_instr, uint32_t *alt_next_address) { int error; if (inst_branch(cur_instr) || inst_call(cur_instr) || inst_return(cur_instr)) { error = arm_predict_branch(td, cur_instr, td->td_frame->tf_pc, alt_next_address, ptrace_get_usr_reg, ptrace_get_usr_int); return (error); } return (EINVAL); } int ptrace_single_step(struct thread *td) { struct proc *p; int error, error_alt; uint32_t cur_instr, alt_next = 0; /* TODO: This needs to be updated for Thumb-2 */ if ((td->td_frame->tf_spsr & PSR_T) != 0) return (EINVAL); KASSERT(td->td_md.md_ptrace_instr == 0, ("Didn't clear single step")); KASSERT(td->td_md.md_ptrace_instr_alt == 0, ("Didn't clear alternative single step")); p = td->td_proc; PROC_UNLOCK(p); error = ptrace_read_int(td, td->td_frame->tf_pc, &cur_instr); if (error) goto out; error = ptrace_read_int(td, td->td_frame->tf_pc + INSN_SIZE, &td->td_md.md_ptrace_instr); if (error == 0) { error = ptrace_write_int(td, td->td_frame->tf_pc + INSN_SIZE, PTRACE_BREAKPOINT); if (error) { td->td_md.md_ptrace_instr = 0; } else { td->td_md.md_ptrace_addr = td->td_frame->tf_pc + INSN_SIZE; } } error_alt = ptrace_get_alternative_next(td, cur_instr, &alt_next); if (error_alt == 0) { error_alt = ptrace_read_int(td, alt_next, &td->td_md.md_ptrace_instr_alt); if (error_alt) { td->td_md.md_ptrace_instr_alt = 0; } else { error_alt = ptrace_write_int(td, alt_next, PTRACE_BREAKPOINT); if (error_alt) td->td_md.md_ptrace_instr_alt = 0; else td->td_md.md_ptrace_addr_alt = alt_next; } } out: PROC_LOCK(p); return ((error != 0) && (error_alt != 0)); } int ptrace_clear_single_step(struct thread *td) { struct proc *p; /* TODO: This needs to be updated for Thumb-2 */ if ((td->td_frame->tf_spsr & PSR_T) != 0) return (EINVAL); if (td->td_md.md_ptrace_instr != 0) { p = td->td_proc; PROC_UNLOCK(p); ptrace_write_int(td, td->td_md.md_ptrace_addr, td->td_md.md_ptrace_instr); PROC_LOCK(p); td->td_md.md_ptrace_instr = 0; } if (td->td_md.md_ptrace_instr_alt != 0) { p = td->td_proc; PROC_UNLOCK(p); ptrace_write_int(td, td->td_md.md_ptrace_addr_alt, td->td_md.md_ptrace_instr_alt); PROC_LOCK(p); td->td_md.md_ptrace_instr_alt = 0; } return (0); } int ptrace_set_pc(struct thread *td, unsigned long addr) { td->td_frame->tf_pc = addr; return (0); } void cpu_pcpu_init(struct pcpu *pcpu, int cpuid, size_t size) { } void spinlock_enter(void) { struct thread *td; register_t cspr; td = curthread; if (td->td_md.md_spinlock_count == 0) { cspr = disable_interrupts(PSR_I | PSR_F); td->td_md.md_spinlock_count = 1; td->td_md.md_saved_cspr = cspr; } else td->td_md.md_spinlock_count++; critical_enter(); } void spinlock_exit(void) { struct thread *td; register_t cspr; td = curthread; critical_exit(); cspr = td->td_md.md_saved_cspr; td->td_md.md_spinlock_count--; if (td->td_md.md_spinlock_count == 0) restore_interrupts(cspr); } /* * Clear registers on exec */ void exec_setregs(struct thread *td, struct image_params *imgp, u_long stack) { struct trapframe *tf = td->td_frame; memset(tf, 0, sizeof(*tf)); tf->tf_usr_sp = stack; tf->tf_usr_lr = imgp->entry_addr; tf->tf_svc_lr = 0x77777777; tf->tf_pc = imgp->entry_addr; tf->tf_spsr = PSR_USR32_MODE; } /* * Get machine context.
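ptrace_single_step() above emulates single-stepping in software: it plants a breakpoint at the next sequential instruction and, when the current instruction may branch, a second one at the predicted target, restoring both words afterwards. A loose, runnable simulation of that plant-and-restore protocol over a fake text segment; BKPT_INSN and the memory model are illustrative assumptions, not the kernel's encodings:

#include <stdint.h>
#include <stdio.h>

#define BKPT_INSN	0xe7ffffffu	/* assumed breakpoint opcode */

static uint32_t mem[16];	/* fake text segment; pc is an index */
static uint32_t saved[2];	/* displaced instruction words */
static int      saved_at[2];

static void
plant(int slot, int idx)
{
	saved[slot] = mem[idx];
	saved_at[slot] = idx;
	mem[idx] = BKPT_INSN;
}

int
main(void)
{
	int pc = 3, branch_target = 9;

	plant(0, pc + 1);		/* fall-through successor */
	plant(1, branch_target);	/* predicted branch target */
	printf("breakpoints planted at %d and %d\n",
	    saved_at[0], saved_at[1]);

	/* On trap, the tracer puts both original words back. */
	mem[saved_at[0]] = saved[0];
	mem[saved_at[1]] = saved[1];
	return (0);
}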
*/ int get_mcontext(struct thread *td, mcontext_t *mcp, int clear_ret) { struct trapframe *tf = td->td_frame; __greg_t *gr = mcp->__gregs; if (clear_ret & GET_MC_CLEAR_RET) { gr[_REG_R0] = 0; gr[_REG_CPSR] = tf->tf_spsr & ~PSR_C; } else { gr[_REG_R0] = tf->tf_r0; gr[_REG_CPSR] = tf->tf_spsr; } gr[_REG_R1] = tf->tf_r1; gr[_REG_R2] = tf->tf_r2; gr[_REG_R3] = tf->tf_r3; gr[_REG_R4] = tf->tf_r4; gr[_REG_R5] = tf->tf_r5; gr[_REG_R6] = tf->tf_r6; gr[_REG_R7] = tf->tf_r7; gr[_REG_R8] = tf->tf_r8; gr[_REG_R9] = tf->tf_r9; gr[_REG_R10] = tf->tf_r10; gr[_REG_R11] = tf->tf_r11; gr[_REG_R12] = tf->tf_r12; gr[_REG_SP] = tf->tf_usr_sp; gr[_REG_LR] = tf->tf_usr_lr; gr[_REG_PC] = tf->tf_pc; return (0); } /* * Set machine context. * * However, we don't set any but the user modifiable flags, and we won't * touch the cs selector. */ int set_mcontext(struct thread *td, mcontext_t *mcp) { struct trapframe *tf = td->td_frame; const __greg_t *gr = mcp->__gregs; tf->tf_r0 = gr[_REG_R0]; tf->tf_r1 = gr[_REG_R1]; tf->tf_r2 = gr[_REG_R2]; tf->tf_r3 = gr[_REG_R3]; tf->tf_r4 = gr[_REG_R4]; tf->tf_r5 = gr[_REG_R5]; tf->tf_r6 = gr[_REG_R6]; tf->tf_r7 = gr[_REG_R7]; tf->tf_r8 = gr[_REG_R8]; tf->tf_r9 = gr[_REG_R9]; tf->tf_r10 = gr[_REG_R10]; tf->tf_r11 = gr[_REG_R11]; tf->tf_r12 = gr[_REG_R12]; tf->tf_usr_sp = gr[_REG_SP]; tf->tf_usr_lr = gr[_REG_LR]; tf->tf_pc = gr[_REG_PC]; tf->tf_spsr = gr[_REG_CPSR]; return (0); } /* * MPSAFE */ int sys_sigreturn(td, uap) struct thread *td; struct sigreturn_args /* { const struct __ucontext *sigcntxp; } */ *uap; { ucontext_t uc; int spsr; if (uap == NULL) return (EFAULT); if (copyin(uap->sigcntxp, &uc, sizeof(uc))) return (EFAULT); /* * Make sure the processor mode has not been tampered with and * interrupts have not been disabled. */ spsr = uc.uc_mcontext.__gregs[_REG_CPSR]; if ((spsr & PSR_MODE) != PSR_USR32_MODE || (spsr & (PSR_I | PSR_F)) != 0) return (EINVAL); /* Restore register context. */ set_mcontext(td, &uc.uc_mcontext); /* Restore signal mask. */ kern_sigprocmask(td, SIG_SETMASK, &uc.uc_sigmask, NULL, 0); return (EJUSTRETURN); } /* * Construct a PCB from a trapframe. This is called from kdb_trap() where * we want to start a backtrace from the function that caused us to enter * the debugger. We have the context in the trapframe, but base the trace * on the PCB. The PCB doesn't have to be perfect, as long as it contains * enough for a backtrace. 
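sys_sigreturn() above refuses any user-supplied context whose PSR would change the processor mode or mask interrupts, since that would be a privilege escalation. The same predicate in isolation; the MY_PSR_* values are assumed stand-ins for the kernel's definitions:

#include <stdint.h>
#include <stdio.h>

#define MY_PSR_MODE		0x1fu
#define MY_PSR_USR32_MODE	0x10u
#define MY_PSR_I		0x80u	/* IRQ disable */
#define MY_PSR_F		0x40u	/* FIQ disable */

static int
spsr_is_safe(uint32_t spsr)
{
	if ((spsr & MY_PSR_MODE) != MY_PSR_USR32_MODE)
		return (0);	/* not user mode */
	if ((spsr & (MY_PSR_I | MY_PSR_F)) != 0)
		return (0);	/* interrupts masked */
	return (1);
}

int
main(void)
{
	printf("usr, irqs on: %d\n", spsr_is_safe(0x10));
	printf("svc mode:     %d\n", spsr_is_safe(0x13));
	printf("irqs masked:  %d\n", spsr_is_safe(0x90));
	return (0);
}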
*/ void makectx(struct trapframe *tf, struct pcb *pcb) { pcb->pcb_regs.sf_r4 = tf->tf_r4; pcb->pcb_regs.sf_r5 = tf->tf_r5; pcb->pcb_regs.sf_r6 = tf->tf_r6; pcb->pcb_regs.sf_r7 = tf->tf_r7; pcb->pcb_regs.sf_r8 = tf->tf_r8; pcb->pcb_regs.sf_r9 = tf->tf_r9; pcb->pcb_regs.sf_r10 = tf->tf_r10; pcb->pcb_regs.sf_r11 = tf->tf_r11; pcb->pcb_regs.sf_r12 = tf->tf_r12; pcb->pcb_regs.sf_pc = tf->tf_pc; pcb->pcb_regs.sf_lr = tf->tf_usr_lr; pcb->pcb_regs.sf_sp = tf->tf_usr_sp; } /* * Fake up a boot descriptor table */ vm_offset_t fake_preload_metadata(struct arm_boot_params *abp __unused) { #ifdef DDB vm_offset_t zstart = 0, zend = 0; #endif vm_offset_t lastaddr; int i = 0; static uint32_t fake_preload[35]; fake_preload[i++] = MODINFO_NAME; fake_preload[i++] = strlen("kernel") + 1; strcpy((char*)&fake_preload[i++], "kernel"); i += 1; fake_preload[i++] = MODINFO_TYPE; fake_preload[i++] = strlen("elf kernel") + 1; strcpy((char*)&fake_preload[i++], "elf kernel"); i += 2; fake_preload[i++] = MODINFO_ADDR; fake_preload[i++] = sizeof(vm_offset_t); fake_preload[i++] = KERNVIRTADDR; fake_preload[i++] = MODINFO_SIZE; fake_preload[i++] = sizeof(uint32_t); fake_preload[i++] = (uint32_t)&end - KERNVIRTADDR; #ifdef DDB if (*(uint32_t *)KERNVIRTADDR == MAGIC_TRAMP_NUMBER) { fake_preload[i++] = MODINFO_METADATA|MODINFOMD_SSYM; fake_preload[i++] = sizeof(vm_offset_t); fake_preload[i++] = *(uint32_t *)(KERNVIRTADDR + 4); fake_preload[i++] = MODINFO_METADATA|MODINFOMD_ESYM; fake_preload[i++] = sizeof(vm_offset_t); fake_preload[i++] = *(uint32_t *)(KERNVIRTADDR + 8); lastaddr = *(uint32_t *)(KERNVIRTADDR + 8); zend = lastaddr; zstart = *(uint32_t *)(KERNVIRTADDR + 4); db_fetch_ksymtab(zstart, zend); } else #endif lastaddr = (vm_offset_t)&end; fake_preload[i++] = 0; fake_preload[i] = 0; preload_metadata = (void *)fake_preload; init_static_kenv(NULL, 0); return (lastaddr); } void pcpu0_init(void) { #if __ARM_ARCH >= 6 set_curthread(&thread0); #endif pcpu_init(pcpup, 0, sizeof(struct pcpu)); PCPU_SET(curthread, &thread0); } #if defined(LINUX_BOOT_ABI) vm_offset_t linux_parse_boot_param(struct arm_boot_params *abp) { struct arm_lbabi_tag *walker; uint32_t revision; uint64_t serial; /* * Linux boot ABI: r0 = 0, r1 is the board type (!= 0) and r2 * is atags or dtb pointer. If all of these aren't satisfied, * then punt. */ if (!(abp->abp_r0 == 0 && abp->abp_r1 != 0 && abp->abp_r2 != 0)) return 0; board_id = abp->abp_r1; walker = (struct arm_lbabi_tag *) (abp->abp_r2 + KERNVIRTADDR - abp->abp_physaddr); /* xxx - Need to also look for binary device tree */ if (ATAG_TAG(walker) != ATAG_CORE) return 0; atag_list = walker; while (ATAG_TAG(walker) != ATAG_NONE) { switch (ATAG_TAG(walker)) { case ATAG_CORE: break; case ATAG_MEM: arm_physmem_hardware_region(walker->u.tag_mem.start, walker->u.tag_mem.size); break; case ATAG_INITRD2: break; case ATAG_SERIAL: serial = walker->u.tag_sn.low | ((uint64_t)walker->u.tag_sn.high << 32); board_set_serial(serial); break; case ATAG_REVISION: revision = walker->u.tag_rev.rev; board_set_revision(revision); break; case ATAG_CMDLINE: /* XXX open question: Parse this for boothowto? 
*/ bcopy(walker->u.tag_cmd.command, linux_command_line, ATAG_SIZE(walker)); break; default: break; } walker = ATAG_NEXT(walker); } /* Save a copy for later */ bcopy(atag_list, atags, (char *)walker - (char *)atag_list + ATAG_SIZE(walker)); init_static_kenv(NULL, 0); return fake_preload_metadata(abp); } #endif #if defined(FREEBSD_BOOT_LOADER) vm_offset_t freebsd_parse_boot_param(struct arm_boot_params *abp) { vm_offset_t lastaddr = 0; void *mdp; void *kmdp; #ifdef DDB vm_offset_t ksym_start; vm_offset_t ksym_end; #endif /* * Mask metadata pointer: it is supposed to be on a page boundary. If * the first argument (mdp) doesn't point to a valid address the * bootloader must have passed us something other than the metadata * ptr, so we give up. Also give up if we cannot find the metadata section * the loader creates that we get all this data out of. */ if ((mdp = (void *)(abp->abp_r0 & ~PAGE_MASK)) == NULL) return 0; preload_metadata = mdp; kmdp = preload_search_by_type("elf kernel"); if (kmdp == NULL) return 0; boothowto = MD_FETCH(kmdp, MODINFOMD_HOWTO, int); loader_envp = MD_FETCH(kmdp, MODINFOMD_ENVP, char *); init_static_kenv(loader_envp, 0); lastaddr = MD_FETCH(kmdp, MODINFOMD_KERNEND, vm_offset_t); #ifdef DDB ksym_start = MD_FETCH(kmdp, MODINFOMD_SSYM, uintptr_t); ksym_end = MD_FETCH(kmdp, MODINFOMD_ESYM, uintptr_t); db_fetch_ksymtab(ksym_start, ksym_end); #endif return lastaddr; } #endif vm_offset_t default_parse_boot_param(struct arm_boot_params *abp) { vm_offset_t lastaddr; #if defined(LINUX_BOOT_ABI) if ((lastaddr = linux_parse_boot_param(abp)) != 0) return lastaddr; #endif #if defined(FREEBSD_BOOT_LOADER) if ((lastaddr = freebsd_parse_boot_param(abp)) != 0) return lastaddr; #endif /* Fall back to hardcoded metadata. */ lastaddr = fake_preload_metadata(abp); return lastaddr; } /* * Stub version of the boot parameter parsing routine. We are * called early in initarm, before even VM has been initialized. * This routine needs to preserve any data that the boot loader * has passed in before the kernel starts to grow past the end * of the BSS, traditionally the place boot-loaders put this data. * * Since this is called so early, before things that depend on the vm * system are set up (including access to some SoC's serial ports), about * all that can be done in this routine is to copy the arguments. * * This is the default boot parameter parsing routine. Individual * kernels/boards can override this weak function with one of their * own. We just fake metadata...
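default_parse_boot_param() above simply tries each boot-protocol parser in turn and falls back to fabricated metadata when none of them claims the boot data. The control flow in miniature, with stub parsers returning 0 for "not mine" (all addresses here are made up):

#include <stdio.h>

typedef unsigned long vm_off;

static vm_off try_linux(void)     { return (0); }	/* no ATAGs found */
static vm_off try_freebsd(void)   { return (0x1000); }	/* loader metadata found */
static vm_off fake_metadata(void) { return (0x2000); }	/* last resort */

int
main(void)
{
	vm_off last;

	if ((last = try_linux()) == 0 &&
	    (last = try_freebsd()) == 0)
		last = fake_metadata();
	printf("lastaddr = %#lx\n", last);
	return (0);
}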
*/ __weak_reference(default_parse_boot_param, parse_boot_param); /* * Initialize proc0 */ void init_proc0(vm_offset_t kstack) { proc_linkup0(&proc0, &thread0); thread0.td_kstack = kstack; thread0.td_pcb = (struct pcb *) (thread0.td_kstack + kstack_pages * PAGE_SIZE) - 1; thread0.td_pcb->pcb_flags = 0; thread0.td_pcb->pcb_vfpcpu = -1; thread0.td_pcb->pcb_vfpstate.fpscr = VFPSCR_DN; thread0.td_frame = &proc0_tf; pcpup->pc_curpcb = thread0.td_pcb; } int arm_predict_branch(void *cookie, u_int insn, register_t pc, register_t *new_pc, u_int (*fetch_reg)(void*, int), u_int (*read_int)(void*, vm_offset_t, u_int*)) { u_int addr, nregs, offset = 0; int error = 0; switch ((insn >> 24) & 0xf) { case 0x2: /* add pc, reg1, #value */ case 0x0: /* add pc, reg1, reg2, lsl #offset */ addr = fetch_reg(cookie, (insn >> 16) & 0xf); if (((insn >> 16) & 0xf) == 15) addr += 8; if (insn & 0x0200000) { offset = (insn >> 7) & 0x1e; offset = (insn & 0xff) << (32 - offset) | (insn & 0xff) >> offset; } else { offset = fetch_reg(cookie, insn & 0x0f); if ((insn & 0x0000ff0) != 0x00000000) { if (insn & 0x10) nregs = fetch_reg(cookie, (insn >> 8) & 0xf); else nregs = (insn >> 7) & 0x1f; switch ((insn >> 5) & 3) { case 0: /* lsl */ offset = offset << nregs; break; case 1: /* lsr */ offset = offset >> nregs; break; default: break; /* XXX */ } } *new_pc = addr + offset; return (0); } case 0xa: /* b ... */ case 0xb: /* bl ... */ addr = ((insn << 2) & 0x03ffffff); if (addr & 0x02000000) addr |= 0xfc000000; *new_pc = (pc + 8 + addr); return (0); case 0x7: /* ldr pc, [pc, reg, lsl #2] */ addr = fetch_reg(cookie, insn & 0xf); addr = pc + 8 + (addr << 2); error = read_int(cookie, addr, &addr); *new_pc = addr; return (error); case 0x1: /* mov pc, reg */ *new_pc = fetch_reg(cookie, insn & 0xf); return (0); case 0x4: case 0x5: /* ldr pc, [reg] */ addr = fetch_reg(cookie, (insn >> 16) & 0xf); /* ldr pc, [reg, #offset] */ if (insn & (1 << 24)) offset = insn & 0xfff; if (insn & 0x00800000) addr += offset; else addr -= offset; error = read_int(cookie, addr, &addr); *new_pc = addr; return (error); case 0x8: /* ldmxx reg, {..., pc} */ case 0x9: addr = fetch_reg(cookie, (insn >> 16) & 0xf); nregs = (insn & 0x5555) + ((insn >> 1) & 0x5555); nregs = (nregs & 0x3333) + ((nregs >> 2) & 0x3333); nregs = (nregs + (nregs >> 4)) & 0x0f0f; nregs = (nregs + (nregs >> 8)) & 0x001f; switch ((insn >> 23) & 0x3) { case 0x0: /* ldmda */ addr = addr - 0; break; case 0x1: /* ldmia */ addr = addr + 0 + ((nregs - 1) << 2); break; case 0x2: /* ldmdb */ addr = addr - 4; break; case 0x3: /* ldmib */ addr = addr + 4 + ((nregs - 1) << 2); break; } error = read_int(cookie, addr, &addr); *new_pc = addr; return (error); default: return (EINVAL); } } #ifdef ARM_NEW_PMAP void set_stackptrs(int cpu) { set_stackptr(PSR_IRQ32_MODE, irqstack + ((IRQ_STACK_SIZE * PAGE_SIZE) * (cpu + 1))); set_stackptr(PSR_ABT32_MODE, abtstack + ((ABT_STACK_SIZE * PAGE_SIZE) * (cpu + 1))); set_stackptr(PSR_UND32_MODE, undstack + ((UND_STACK_SIZE * PAGE_SIZE) * (cpu + 1))); } #else void set_stackptrs(int cpu) { set_stackptr(PSR_IRQ32_MODE, irqstack.pv_va + ((IRQ_STACK_SIZE * PAGE_SIZE) * (cpu + 1))); set_stackptr(PSR_ABT32_MODE, abtstack.pv_va + ((ABT_STACK_SIZE * PAGE_SIZE) * (cpu + 1))); set_stackptr(PSR_UND32_MODE, undstack.pv_va + ((UND_STACK_SIZE * PAGE_SIZE) * (cpu + 1))); } #endif #ifdef EFI #define efi_next_descriptor(ptr, size) \ ((struct efi_md *)(((uint8_t *) ptr) + size)) static void add_efi_map_entries(struct efi_map_header *efihdr, struct mem_region *mr, int *mrcnt, uint32_t 
*memsize) { struct efi_md *map, *p; const char *type; size_t efisz, memory_size; int ndesc, i, j; static const char *types[] = { "Reserved", "LoaderCode", "LoaderData", "BootServicesCode", "BootServicesData", "RuntimeServicesCode", "RuntimeServicesData", "ConventionalMemory", "UnusableMemory", "ACPIReclaimMemory", "ACPIMemoryNVS", "MemoryMappedIO", "MemoryMappedIOPortSpace", "PalCode" }; *mrcnt = 0; *memsize = 0; /* * Memory map data provided by UEFI via the GetMemoryMap * Boot Services API. */ efisz = roundup2(sizeof(struct efi_map_header), 0x10); map = (struct efi_md *)((uint8_t *)efihdr + efisz); if (efihdr->descriptor_size == 0) return; ndesc = efihdr->memory_size / efihdr->descriptor_size; if (boothowto & RB_VERBOSE) printf("%23s %12s %12s %8s %4s\n", "Type", "Physical", "Virtual", "#Pages", "Attr"); memory_size = 0; for (i = 0, j = 0, p = map; i < ndesc; i++, p = efi_next_descriptor(p, efihdr->descriptor_size)) { if (boothowto & RB_VERBOSE) { if (p->md_type <= EFI_MD_TYPE_PALCODE) type = types[p->md_type]; else type = ""; printf("%23s %012llx %12p %08llx ", type, p->md_phys, p->md_virt, p->md_pages); if (p->md_attr & EFI_MD_ATTR_UC) printf("UC "); if (p->md_attr & EFI_MD_ATTR_WC) printf("WC "); if (p->md_attr & EFI_MD_ATTR_WT) printf("WT "); if (p->md_attr & EFI_MD_ATTR_WB) printf("WB "); if (p->md_attr & EFI_MD_ATTR_UCE) printf("UCE "); if (p->md_attr & EFI_MD_ATTR_WP) printf("WP "); if (p->md_attr & EFI_MD_ATTR_RP) printf("RP "); if (p->md_attr & EFI_MD_ATTR_XP) printf("XP "); if (p->md_attr & EFI_MD_ATTR_RT) printf("RUNTIME"); printf("\n"); } switch (p->md_type) { case EFI_MD_TYPE_CODE: case EFI_MD_TYPE_DATA: case EFI_MD_TYPE_BS_CODE: case EFI_MD_TYPE_BS_DATA: case EFI_MD_TYPE_FREE: /* * We're allowed to use any entry with these types. */ break; default: continue; } j++; if (j >= FDT_MEM_REGIONS) break; mr[j].mr_start = p->md_phys; mr[j].mr_size = p->md_pages * PAGE_SIZE; memory_size += mr[j].mr_size; } *mrcnt = j; *memsize = memory_size; } #endif /* EFI */ #ifdef FDT static char * kenv_next(char *cp) { if (cp != NULL) { while (*cp != 0) cp++; cp++; if (*cp == 0) cp = NULL; } return (cp); } static void print_kenv(void) { char *cp; debugf("loader passed (static) kenv:\n"); if (loader_envp == NULL) { debugf(" no env, null ptr\n"); return; } debugf(" loader_envp = 0x%08x\n", (uint32_t)loader_envp); for (cp = loader_envp; cp != NULL; cp = kenv_next(cp)) debugf(" %x %s\n", (uint32_t)cp, cp); } #ifndef ARM_NEW_PMAP void * initarm(struct arm_boot_params *abp) { struct mem_region mem_regions[FDT_MEM_REGIONS]; struct pv_addr kernel_l1pt; struct pv_addr dpcpu; vm_offset_t dtbp, freemempos, l2_start, lastaddr; uint32_t memsize, l2size; char *env; void *kmdp; u_int l1pagetable; int i, j, err_devmap, mem_regions_sz; lastaddr = parse_boot_param(abp); arm_physmem_kernaddr = abp->abp_physaddr; memsize = 0; cpuinfo_init(); set_cpufuncs(); /* * Find the dtb passed in by the boot loader. */ kmdp = preload_search_by_type("elf kernel"); if (kmdp != NULL) dtbp = MD_FETCH(kmdp, MODINFOMD_DTBP, vm_offset_t); else dtbp = (vm_offset_t)NULL; #if defined(FDT_DTB_STATIC) /* * In case the device tree blob was not retrieved (from metadata) try * to use the statically embedded one. */ if (dtbp == (vm_offset_t)NULL) dtbp = (vm_offset_t)&fdt_static_dtb; #endif if (OF_install(OFW_FDT, 0) == FALSE) panic("Cannot install FDT"); if (OF_init((void *)dtbp) != 0) panic("OF_init failed with the found device tree"); /* Grab physical memory regions information from device tree. 
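kenv_next() above walks the loader-supplied static environment: consecutive NUL-terminated name=value strings, ended by an empty string. The same walk is easy to reproduce in userland:

#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* The literal's implicit terminator supplies the final empty string. */
	static const char env[] = "kernelname=kernel\0boothowto=0x800\0";
	const char *cp;

	for (cp = env; *cp != '\0'; cp += strlen(cp) + 1)
		printf("%s\n", cp);
	return (0);
}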
*/ if (fdt_get_mem_regions(mem_regions, &mem_regions_sz, &memsize) != 0) panic("Cannot get physical memory regions"); arm_physmem_hardware_regions(mem_regions, mem_regions_sz); /* Grab reserved memory regions information from device tree. */ if (fdt_get_reserved_regions(mem_regions, &mem_regions_sz) == 0) arm_physmem_exclude_regions(mem_regions, mem_regions_sz, EXFLAG_NODUMP | EXFLAG_NOALLOC); /* Platform-specific initialisation */ platform_probe_and_attach(); pcpu0_init(); /* Do basic tuning, hz etc */ init_param1(); /* Calculate number of L2 tables needed for mapping vm_page_array */ l2size = (memsize / PAGE_SIZE) * sizeof(struct vm_page); l2size = (l2size >> L1_S_SHIFT) + 1; /* * Add one table for end of kernel map, one for stacks, msgbuf and * L1 and L2 tables map and one for vectors map. */ l2size += 3; /* Make it divisible by 4 */ l2size = (l2size + 3) & ~3; freemempos = (lastaddr + PAGE_MASK) & ~PAGE_MASK; /* Define a macro to simplify memory allocation */ #define valloc_pages(var, np) \ alloc_pages((var).pv_va, (np)); \ (var).pv_pa = (var).pv_va + (abp->abp_physaddr - KERNVIRTADDR); #define alloc_pages(var, np) \ (var) = freemempos; \ freemempos += (np * PAGE_SIZE); \ memset((char *)(var), 0, ((np) * PAGE_SIZE)); while (((freemempos - L1_TABLE_SIZE) & (L1_TABLE_SIZE - 1)) != 0) freemempos += PAGE_SIZE; valloc_pages(kernel_l1pt, L1_TABLE_SIZE / PAGE_SIZE); for (i = 0, j = 0; i < l2size; ++i) { if (!(i % (PAGE_SIZE / L2_TABLE_SIZE_REAL))) { valloc_pages(kernel_pt_table[i], L2_TABLE_SIZE / PAGE_SIZE); j = i; } else { kernel_pt_table[i].pv_va = kernel_pt_table[j].pv_va + L2_TABLE_SIZE_REAL * (i - j); kernel_pt_table[i].pv_pa = kernel_pt_table[i].pv_va - KERNVIRTADDR + abp->abp_physaddr; } } /* * Allocate a page for the system page mapped to 0x00000000 * or 0xffff0000. This page will just contain the system vectors * and can be shared by all processes. */ valloc_pages(systempage, 1); /* Allocate dynamic per-cpu area. */ valloc_pages(dpcpu, DPCPU_SIZE / PAGE_SIZE); dpcpu_init((void *)dpcpu.pv_va, 0); /* Allocate stacks for all modes */ valloc_pages(irqstack, IRQ_STACK_SIZE * MAXCPU); valloc_pages(abtstack, ABT_STACK_SIZE * MAXCPU); valloc_pages(undstack, UND_STACK_SIZE * MAXCPU); valloc_pages(kernelstack, kstack_pages * MAXCPU); valloc_pages(msgbufpv, round_page(msgbufsize) / PAGE_SIZE); /* * Now we start construction of the L1 page table * We start by mapping the L2 page tables into the L1. * This means that we can replace L1 mappings later on if necessary */ l1pagetable = kernel_l1pt.pv_va; /* * Try to map as much as possible of kernel text and data using * 1MB section mapping and for the rest of initial kernel address * space use L2 coarse tables. 
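The l2size arithmetic above sizes the initial array of L2 page tables: enough 1 MiB sections to map vm_page_array, three extra tables for the kernel end, stacks, and vectors, rounded up to a multiple of four because four 1 KiB L2 tables share one backing page (as the valloc loop assumes). The same computation, runnable with assumed constants; the struct vm_page size and memory size are illustrative only:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE_	4096u
#define L1_S_SHIFT_	20		/* 1 MiB sections */
#define VM_PAGE_SZ_	100u		/* assumed sizeof(struct vm_page) */
#define MEMSIZE_	(512u * 1024 * 1024)

int
main(void)
{
	uint32_t l2size;

	l2size = (MEMSIZE_ / PAGE_SIZE_) * VM_PAGE_SZ_;	/* vm_page_array bytes */
	l2size = (l2size >> L1_S_SHIFT_) + 1;	/* sections needed to map it */
	l2size += 3;				/* kernel end, stacks, vectors */
	l2size = (l2size + 3) & ~3u;		/* round up to a multiple of 4 */
	printf("L2 tables needed: %u\n", l2size);
	return (0);
}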
* * Link L2 tables for mapping remainder of kernel (modulo 1MB) * and kernel structures */ l2_start = lastaddr & ~(L1_S_OFFSET); for (i = 0 ; i < l2size - 1; i++) pmap_link_l2pt(l1pagetable, l2_start + i * L1_S_SIZE, &kernel_pt_table[i]); pmap_curmaxkvaddr = l2_start + (l2size - 1) * L1_S_SIZE; /* Map kernel code and data */ pmap_map_chunk(l1pagetable, KERNVIRTADDR, abp->abp_physaddr, (((uint32_t)(lastaddr) - KERNVIRTADDR) + PAGE_MASK) & ~PAGE_MASK, VM_PROT_READ|VM_PROT_WRITE, PTE_CACHE); /* Map L1 directory and allocated L2 page tables */ pmap_map_chunk(l1pagetable, kernel_l1pt.pv_va, kernel_l1pt.pv_pa, L1_TABLE_SIZE, VM_PROT_READ|VM_PROT_WRITE, PTE_PAGETABLE); pmap_map_chunk(l1pagetable, kernel_pt_table[0].pv_va, kernel_pt_table[0].pv_pa, L2_TABLE_SIZE_REAL * l2size, VM_PROT_READ|VM_PROT_WRITE, PTE_PAGETABLE); /* Map allocated DPCPU, stacks and msgbuf */ pmap_map_chunk(l1pagetable, dpcpu.pv_va, dpcpu.pv_pa, freemempos - dpcpu.pv_va, VM_PROT_READ|VM_PROT_WRITE, PTE_CACHE); /* Link and map the vector page */ pmap_link_l2pt(l1pagetable, ARM_VECTORS_HIGH, &kernel_pt_table[l2size - 1]); pmap_map_entry(l1pagetable, ARM_VECTORS_HIGH, systempage.pv_pa, VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE, PTE_CACHE); /* Establish static device mappings. */ err_devmap = platform_devmap_init(); arm_devmap_bootstrap(l1pagetable, NULL); vm_max_kernel_address = platform_lastaddr(); cpu_domains((DOMAIN_CLIENT << (PMAP_DOMAIN_KERNEL * 2)) | DOMAIN_CLIENT); pmap_pa = kernel_l1pt.pv_pa; setttb(kernel_l1pt.pv_pa); cpu_tlb_flushID(); cpu_domains(DOMAIN_CLIENT << (PMAP_DOMAIN_KERNEL * 2)); /* * Now that proper page tables are installed, call cpu_setup() to enable * instruction and data caches and other chip-specific features. */ cpu_setup(); /* * Only after the SOC registers block is mapped we can perform device * tree fixups, as they may attempt to read parameters from hardware. */ OF_interpret("perform-fixup", 0); platform_gpio_init(); cninit(); debugf("initarm: console initialized\n"); debugf(" arg1 kmdp = 0x%08x\n", (uint32_t)kmdp); debugf(" boothowto = 0x%08x\n", boothowto); debugf(" dtbp = 0x%08x\n", (uint32_t)dtbp); print_kenv(); env = kern_getenv("kernelname"); if (env != NULL) { strlcpy(kernelname, env, sizeof(kernelname)); freeenv(env); } if (err_devmap != 0) printf("WARNING: could not fully configure devmap, error=%d\n", err_devmap); platform_late_init(); /* * Pages were allocated during the secondary bootstrap for the * stacks for different CPU modes. * We must now set the r13 registers in the different CPU modes to * point to these stacks. * Since the ARM stacks use STMFD etc. we must set r13 to the top end * of the stack memory. */ cpu_control(CPU_CONTROL_MMU_ENABLE, CPU_CONTROL_MMU_ENABLE); set_stackptrs(0); /* * We must now clean the cache again.... * Cleaning may be done by reading new data to displace any * dirty data in the cache. This will have happened in setttb() * but since we are boot strapping the addresses used for the read * may have just been remapped and thus the cache could be out * of sync. A re-clean after the switch will cure this. * After booting there are no gross relocations of the kernel thus * this problem will not occur after initarm(). 
*/ cpu_idcache_wbinv_all(); undefined_init(); init_proc0(kernelstack.pv_va); arm_vector_init(ARM_VECTORS_HIGH, ARM_VEC_ALL); pmap_bootstrap(freemempos, &kernel_l1pt); msgbufp = (void *)msgbufpv.pv_va; msgbufinit(msgbufp, msgbufsize); mutex_init(); /* * Exclude the kernel (and all the things we allocated which immediately * follow the kernel) from the VM allocation pool but not from crash * dumps. virtual_avail is a global variable which tracks the kva we've * "allocated" while setting up pmaps. * * Prepare the list of physical memory available to the vm subsystem. */ arm_physmem_exclude_region(abp->abp_physaddr, (virtual_avail - KERNVIRTADDR), EXFLAG_NOALLOC); arm_physmem_init_kernel_globals(); init_param2(physmem); + dbg_monitor_init(); kdb_init(); return ((void *)(kernelstack.pv_va + USPACE_SVC_STACK_TOP - sizeof(struct pcb))); } #else /* !ARM_NEW_PMAP */ void * initarm(struct arm_boot_params *abp) { struct mem_region mem_regions[FDT_MEM_REGIONS]; vm_paddr_t lastaddr; vm_offset_t dtbp, kernelstack, dpcpu; uint32_t memsize; char *env; void *kmdp; int err_devmap, mem_regions_sz; #ifdef EFI struct efi_map_header *efihdr; #endif /* get last allocated physical address */ arm_physmem_kernaddr = abp->abp_physaddr; lastaddr = parse_boot_param(abp) - KERNVIRTADDR + arm_physmem_kernaddr; memsize = 0; set_cpufuncs(); cpuinfo_init(); /* * Find the dtb passed in by the boot loader. */ kmdp = preload_search_by_type("elf kernel"); dtbp = MD_FETCH(kmdp, MODINFOMD_DTBP, vm_offset_t); #if defined(FDT_DTB_STATIC) /* * In case the device tree blob was not retrieved (from metadata) try * to use the statically embedded one. */ if (dtbp == (vm_offset_t)NULL) dtbp = (vm_offset_t)&fdt_static_dtb; #endif if (OF_install(OFW_FDT, 0) == FALSE) panic("Cannot install FDT"); if (OF_init((void *)dtbp) != 0) panic("OF_init failed with the found device tree"); #ifdef EFI efihdr = (struct efi_map_header *)preload_search_info(kmdp, MODINFO_METADATA | MODINFOMD_EFI_MAP); if (efihdr != NULL) { add_efi_map_entries(efihdr, mem_regions, &mem_regions_sz, &memsize); } else #endif { /* Grab physical memory regions information from device tree. */ if (fdt_get_mem_regions(mem_regions, &mem_regions_sz, &memsize) != 0) panic("Cannot get physical memory regions"); } arm_physmem_hardware_regions(mem_regions, mem_regions_sz); /* Grab reserved memory regions information from device tree. */ if (fdt_get_reserved_regions(mem_regions, &mem_regions_sz) == 0) arm_physmem_exclude_regions(mem_regions, mem_regions_sz, EXFLAG_NODUMP | EXFLAG_NOALLOC); /* * Set TEX remapping registers. * Setup kernel page tables and switch to kernel L1 page table. */ pmap_set_tex(); pmap_bootstrap_prepare(lastaddr); /* * Now that proper page tables are installed, call cpu_setup() to enable * instruction and data caches and other chip-specific features. */ cpu_setup(); /* Platform-specific initialisation */ platform_probe_and_attach(); pcpu0_init(); /* Do basic tuning, hz etc */ init_param1(); /* * Allocate a page for the system page mapped to 0xffff0000 * This page will just contain the system vectors and can be * shared by all processes. */ systempage = pmap_preboot_get_pages(1); /* Map the vector page. */ pmap_preboot_map_pages(systempage, ARM_VECTORS_HIGH, 1); if (virtual_end >= ARM_VECTORS_HIGH) virtual_end = ARM_VECTORS_HIGH - 1; /* Allocate dynamic per-cpu area. 
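The preboot allocations in this path (vector page, per-CPU area, mode stacks, msgbuf) all come from a zeroing bump allocator that hands out page-aligned chunks from a cursor and never frees. A userland model of that pattern; the arena size and page size are arbitrary:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PGSZ 4096u

static unsigned char arena[64 * PGSZ] __attribute__((aligned(PGSZ)));
static size_t cursor;

static void *
preboot_get_pages(size_t npages)
{
	void *p = arena + cursor;

	cursor += npages * PGSZ;
	if (cursor > sizeof(arena))
		abort();		/* out of early memory */
	memset(p, 0, npages * PGSZ);	/* early pages are handed out zeroed */
	return (p);
}

int
main(void)
{
	void *irqstack = preboot_get_pages(1);
	void *kstack   = preboot_get_pages(4);

	printf("irqstack=%p kstack=%p\n", irqstack, kstack);
	return (0);
}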
*/ dpcpu = pmap_preboot_get_vpages(DPCPU_SIZE / PAGE_SIZE); dpcpu_init((void *)dpcpu, 0); /* Allocate stacks for all modes */ irqstack = pmap_preboot_get_vpages(IRQ_STACK_SIZE * MAXCPU); abtstack = pmap_preboot_get_vpages(ABT_STACK_SIZE * MAXCPU); undstack = pmap_preboot_get_vpages(UND_STACK_SIZE * MAXCPU ); kernelstack = pmap_preboot_get_vpages(kstack_pages * MAXCPU); /* Allocate message buffer. */ msgbufp = (void *)pmap_preboot_get_vpages( round_page(msgbufsize) / PAGE_SIZE); /* * Pages were allocated during the secondary bootstrap for the * stacks for different CPU modes. * We must now set the r13 registers in the different CPU modes to * point to these stacks. * Since the ARM stacks use STMFD etc. we must set r13 to the top end * of the stack memory. */ set_stackptrs(0); mutex_init(); /* Establish static device mappings. */ err_devmap = platform_devmap_init(); arm_devmap_bootstrap(0, NULL); vm_max_kernel_address = platform_lastaddr(); /* * Only after the SOC registers block is mapped we can perform device * tree fixups, as they may attempt to read parameters from hardware. */ OF_interpret("perform-fixup", 0); platform_gpio_init(); cninit(); debugf("initarm: console initialized\n"); debugf(" arg1 kmdp = 0x%08x\n", (uint32_t)kmdp); debugf(" boothowto = 0x%08x\n", boothowto); debugf(" dtbp = 0x%08x\n", (uint32_t)dtbp); debugf(" lastaddr1: 0x%08x\n", lastaddr); print_kenv(); env = kern_getenv("kernelname"); if (env != NULL) strlcpy(kernelname, env, sizeof(kernelname)); if (err_devmap != 0) printf("WARNING: could not fully configure devmap, error=%d\n", err_devmap); platform_late_init(); /* * We must now clean the cache again.... * Cleaning may be done by reading new data to displace any * dirty data in the cache. This will have happened in setttb() * but since we are boot strapping the addresses used for the read * may have just been remapped and thus the cache could be out * of sync. A re-clean after the switch will cure this. * After booting there are no gross relocations of the kernel thus * this problem will not occur after initarm(). */ /* Set stack for exception handlers */ undefined_init(); init_proc0(kernelstack); arm_vector_init(ARM_VECTORS_HIGH, ARM_VEC_ALL); enable_interrupts(PSR_A); pmap_bootstrap(0); /* Exclude the kernel (and all the things we allocated which immediately * follow the kernel) from the VM allocation pool but not from crash * dumps. virtual_avail is a global variable which tracks the kva we've * "allocated" while setting up pmaps. * * Prepare the list of physical memory available to the vm subsystem. */ arm_physmem_exclude_region(abp->abp_physaddr, pmap_preboot_get_pages(0) - abp->abp_physaddr, EXFLAG_NOALLOC); arm_physmem_init_kernel_globals(); init_param2(physmem); /* Init message buffer. */ msgbufinit(msgbufp, msgbufsize); + dbg_monitor_init(); kdb_init(); return ((void *)STACKALIGN(thread0.td_pcb)); } #endif /* !ARM_NEW_PMAP */ #endif /* FDT */ uint32_t (*arm_cpu_fill_vdso_timehands)(struct vdso_timehands *, struct timecounter *); uint32_t cpu_fill_vdso_timehands(struct vdso_timehands *vdso_th, struct timecounter *tc) { return (arm_cpu_fill_vdso_timehands != NULL ? 
arm_cpu_fill_vdso_timehands(vdso_th, tc) : 0); } Index: projects/clang380-import/sys/arm/arm/minidump_machdep.c =================================================================== --- projects/clang380-import/sys/arm/arm/minidump_machdep.c (revision 294776) +++ projects/clang380-import/sys/arm/arm/minidump_machdep.c (revision 294777) @@ -1,518 +1,396 @@ /*- + * Copyright (c) 2006 Peter Wemm * Copyright (c) 2008 Semihalf, Grzegorz Bernacki * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * from: FreeBSD: src/sys/i386/i386/minidump_machdep.c,v 1.6 2008/08/17 23:27:27 */ #include __FBSDID("$FreeBSD$"); #include "opt_watchdog.h" #include #include #include #include #include #include #include #ifdef SW_WATCHDOG #include #endif #include #include #include #include #include #include #include #include CTASSERT(sizeof(struct kerneldumpheader) == 512); /* * Don't touch the first SIZEOF_METADATA bytes on the dump device. This * is to protect us from metadata and to protect metadata from us. */ #define SIZEOF_METADATA (64*1024) uint32_t *vm_page_dump; int vm_page_dump_size; -#ifndef ARM_NEW_PMAP - static struct kerneldumpheader kdh; static off_t dumplo; /* Handle chunked writes. 
*/ -static size_t fragsz, offset; +static size_t fragsz; static void *dump_va; static uint64_t counter, progress; CTASSERT(sizeof(*vm_page_dump) == 4); static int is_dumpable(vm_paddr_t pa) { int i; for (i = 0; dump_avail[i] != 0 || dump_avail[i + 1] != 0; i += 2) { if (pa >= dump_avail[i] && pa < dump_avail[i + 1]) return (1); } return (0); } #define PG2MB(pgs) (((pgs) + (1 << 8) - 1) >> 8) static int blk_flush(struct dumperinfo *di) { int error; if (fragsz == 0) return (0); - error = dump_write(di, (char*)dump_va + offset, 0, dumplo, fragsz - offset); - dumplo += (fragsz - offset); + error = dump_write(di, dump_va, 0, dumplo, fragsz); + dumplo += fragsz; fragsz = 0; - offset = 0; return (error); } static int blk_write(struct dumperinfo *di, char *ptr, vm_paddr_t pa, size_t sz) { size_t len; int error, i, c; u_int maxdumpsz; - maxdumpsz = di->maxiosize; - + maxdumpsz = min(di->maxiosize, MAXDUMPPGS * PAGE_SIZE); if (maxdumpsz == 0) /* seatbelt */ maxdumpsz = PAGE_SIZE; - error = 0; - if (ptr != NULL && pa != 0) { printf("cant have both va and pa!\n"); return (EINVAL); } - + if (pa != 0) { + if ((sz % PAGE_SIZE) != 0) { + printf("size not page aligned\n"); + return (EINVAL); + } + if ((pa & PAGE_MASK) != 0) { + printf("address not page aligned\n"); + return (EINVAL); + } + } if (ptr != NULL) { - /* If we're doing a virtual dump, flush any pre-existing pa pages */ + /* Flush any pre-existing pa pages before a virtual dump. */ error = blk_flush(di); if (error) return (error); } - while (sz) { - if (fragsz == 0) { - offset = pa & PAGE_MASK; - fragsz += offset; - } len = maxdumpsz - fragsz; if (len > sz) len = sz; counter += len; progress -= len; - if (counter >> 22) { printf(" %lld", PG2MB(progress >> PAGE_SHIFT)); counter &= (1<<22) - 1; } #ifdef SW_WATCHDOG wdog_kern_pat(WD_LASTVAL); #endif if (ptr) { error = dump_write(di, ptr, 0, dumplo, len); if (error) return (error); dumplo += len; ptr += len; sz -= len; } else { for (i = 0; i < len; i += PAGE_SIZE) dump_va = pmap_kenter_temporary(pa + i, (i + fragsz) >> PAGE_SHIFT); fragsz += len; pa += len; sz -= len; if (fragsz == maxdumpsz) { error = blk_flush(di); if (error) return (error); } } /* Check for user abort. */ c = cncheckc(); if (c == 0x03) return (ECANCELED); if (c != -1) printf(" (CTRL-C to abort) "); } return (0); } -static int -blk_write_cont(struct dumperinfo *di, vm_paddr_t pa, size_t sz) -{ - int error; +/* A buffer for general use. Its size must be one page at least. */ +static char dumpbuf[PAGE_SIZE]; +CTASSERT(sizeof(dumpbuf) % sizeof(pt2_entry_t) == 0); - error = blk_write(di, 0, pa, sz); - if (error) - return (error); - - error = blk_flush(di); - if (error) - return (error); - - return (0); -} - -/* A fake page table page, to avoid having to handle both 4K and 2M pages */ -static pt_entry_t fakept[NPTEPG]; - int minidumpsys(struct dumperinfo *di) { struct minidumphdr mdhdr; uint64_t dumpsize; uint32_t ptesize; uint32_t bits; uint32_t pa, prev_pa = 0, count = 0; vm_offset_t va; - pd_entry_t *pdp; - pt_entry_t *pt, *ptp; - int i, k, bit, error; + int i, bit, error; char *addr; /* * Flush caches. Note that in the SMP case this operates only on the * current CPU's L1 cache. Before we reach this point, code in either * the system shutdown or kernel debugger has called stop_cpus() to stop * all cores other than this one. Part of the ARM handling of * stop_cpus() is to call wbinv_all() on that core's local L1 cache. So * by time we get to here, all that remains is to flush the L1 for the * current CPU, then the L2. 
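The reworked blk_write()/blk_flush() above batch dump output into fragments no larger than the device's maximum I/O size and issue a device write only when a fragment fills (or at the final flush). The same batching in miniature, with a byte array standing in for the dump device:

#include <stdio.h>
#include <string.h>

#define MAXIO 8			/* stand-in for di->maxiosize */

static char device[64];
static size_t devoff, fragsz;
static char frag[MAXIO];

static void
blk_flush(void)
{
	if (fragsz == 0)
		return;
	memcpy(device + devoff, frag, fragsz);	/* one "device" write */
	devoff += fragsz;
	fragsz = 0;
}

static void
blk_write(const char *p, size_t sz)
{
	size_t len;

	while (sz > 0) {
		len = MAXIO - fragsz;	/* room left in the fragment */
		if (len > sz)
			len = sz;
		memcpy(frag + fragsz, p, len);
		fragsz += len;
		p += len;
		sz -= len;
		if (fragsz == MAXIO)
			blk_flush();
	}
}

int
main(void)
{
	blk_write("0123456789abcdefghij", 20);
	blk_flush();
	printf("%zu bytes written: %.20s\n", devoff, device);
	return (0);
}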
*/ cpu_idcache_wbinv_all(); cpu_l2cache_wbinv_all(); counter = 0; /* Walk page table pages, set bits in vm_page_dump */ ptesize = 0; - for (va = KERNBASE; va < kernel_vm_end; va += NBPDR) { - /* - * We always write a page, even if it is zero. Each - * page written corresponds to 2MB of space - */ - ptesize += L2_TABLE_SIZE_REAL; - pmap_get_pde_pte(pmap_kernel(), va, &pdp, &ptp); - if (pmap_pde_v(pdp) && pmap_pde_section(pdp)) { - /* This is a section mapping 1M page. */ - pa = (*pdp & L1_S_ADDR_MASK) | (va & ~L1_S_ADDR_MASK); - for (k = 0; k < (L1_S_SIZE / PAGE_SIZE); k++) { - if (is_dumpable(pa)) - dump_add_page(pa); - pa += PAGE_SIZE; - } - continue; - } - if (pmap_pde_v(pdp) && pmap_pde_page(pdp)) { - /* Set bit for each valid page in this 1MB block */ - addr = pmap_kenter_temporary(*pdp & L1_C_ADDR_MASK, 0); - pt = (pt_entry_t*)(addr + - (((uint32_t)*pdp & L1_C_ADDR_MASK) & PAGE_MASK)); - for (k = 0; k < 256; k++) { - if ((pt[k] & L2_TYPE_MASK) == L2_TYPE_L) { - pa = (pt[k] & L2_L_FRAME) | - (va & L2_L_OFFSET); - for (i = 0; i < 16; i++) { - if (is_dumpable(pa)) - dump_add_page(pa); - k++; - pa += PAGE_SIZE; - } - } else if ((pt[k] & L2_TYPE_MASK) == L2_TYPE_S) { - pa = (pt[k] & L2_S_FRAME) | - (va & L2_S_OFFSET); - if (is_dumpable(pa)) - dump_add_page(pa); - } - } - } else { - /* Nothing, we're going to dump a null page */ - } + for (va = KERNBASE; va < kernel_vm_end; va += PAGE_SIZE) { + pa = pmap_dump_kextract(va, NULL); + if (pa != 0 && is_dumpable(pa)) + dump_add_page(pa); + ptesize += sizeof(pt2_entry_t); } /* Calculate dump size. */ dumpsize = ptesize; dumpsize += round_page(msgbufp->msg_size); dumpsize += round_page(vm_page_dump_size); for (i = 0; i < vm_page_dump_size / sizeof(*vm_page_dump); i++) { bits = vm_page_dump[i]; while (bits) { bit = ffs(bits) - 1; pa = (((uint64_t)i * sizeof(*vm_page_dump) * NBBY) + bit) * PAGE_SIZE; /* Clear out undumpable pages now if needed */ if (is_dumpable(pa)) dumpsize += PAGE_SIZE; else dump_drop_page(pa); bits &= ~(1ul << bit); } } dumpsize += PAGE_SIZE; /* Determine dump offset on device. 
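Both the size pass above and the later memory-chunk pass enumerate dumpable pages by scanning vm_page_dump with ffs(3), clearing each bit as it is consumed. The identical walk over a small bitmap:

#include <stdio.h>
#include <strings.h>

int
main(void)
{
	unsigned int bitmap[2] = { 0x00000009, 0x80000000 };
	unsigned int bits;
	int i, bit;

	for (i = 0; i < 2; i++) {
		bits = bitmap[i];
		while (bits) {
			bit = ffs(bits) - 1;	/* lowest set bit */
			printf("page index %d\n", i * 32 + bit);
			bits &= ~(1u << bit);	/* consume it */
		}
	}
	return (0);
}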
*/ if (di->mediasize < SIZEOF_METADATA + dumpsize + sizeof(kdh) * 2) { error = ENOSPC; goto fail; } dumplo = di->mediaoffset + di->mediasize - dumpsize; dumplo -= sizeof(kdh) * 2; progress = dumpsize; /* Initialize mdhdr */ bzero(&mdhdr, sizeof(mdhdr)); strcpy(mdhdr.magic, MINIDUMP_MAGIC); mdhdr.version = MINIDUMP_VERSION; mdhdr.msgbufsize = msgbufp->msg_size; mdhdr.bitmapsize = vm_page_dump_size; mdhdr.ptesize = ptesize; mdhdr.kernbase = KERNBASE; mdhdr.arch = __ARM_ARCH; #if __ARM_ARCH >= 6 mdhdr.mmuformat = MINIDUMP_MMU_FORMAT_V6; #else mdhdr.mmuformat = MINIDUMP_MMU_FORMAT_V4; #endif mkdumpheader(&kdh, KERNELDUMPMAGIC, KERNELDUMP_ARM_VERSION, dumpsize, di->blocksize); printf("Physical memory: %u MB\n", ptoa((uintmax_t)physmem) / 1048576); printf("Dumping %llu MB:", (long long)dumpsize >> 20); /* Dump leader */ error = dump_write(di, &kdh, 0, dumplo, sizeof(kdh)); if (error) goto fail; dumplo += sizeof(kdh); /* Dump my header */ - bzero(&fakept, sizeof(fakept)); - bcopy(&mdhdr, &fakept, sizeof(mdhdr)); - error = blk_write(di, (char *)&fakept, 0, PAGE_SIZE); + bzero(dumpbuf, sizeof(dumpbuf)); + bcopy(&mdhdr, dumpbuf, sizeof(mdhdr)); + error = blk_write(di, dumpbuf, 0, PAGE_SIZE); if (error) goto fail; /* Dump msgbuf up front */ - error = blk_write(di, (char *)msgbufp->msg_ptr, 0, round_page(msgbufp->msg_size)); + error = blk_write(di, (char *)msgbufp->msg_ptr, 0, + round_page(msgbufp->msg_size)); if (error) goto fail; /* Dump bitmap */ error = blk_write(di, (char *)vm_page_dump, 0, round_page(vm_page_dump_size)); if (error) goto fail; /* Dump kernel page table pages */ - for (va = KERNBASE; va < kernel_vm_end; va += NBPDR) { - /* We always write a page, even if it is zero */ - pmap_get_pde_pte(pmap_kernel(), va, &pdp, &ptp); - - if (pmap_pde_v(pdp) && pmap_pde_section(pdp)) { - if (count) { - error = blk_write_cont(di, prev_pa, - count * L2_TABLE_SIZE_REAL); - if (error) - goto fail; - count = 0; - prev_pa = 0; - } - /* This is a single 2M block. 
Generate a fake PTP */ - pa = (*pdp & L1_S_ADDR_MASK) | (va & ~L1_S_ADDR_MASK); - for (k = 0; k < (L1_S_SIZE / PAGE_SIZE); k++) { - fakept[k] = L2_S_PROTO | (pa + (k * PAGE_SIZE)) | - L2_S_PROT(PTE_KERNEL, - VM_PROT_READ | VM_PROT_WRITE); - } - error = blk_write(di, (char *)&fakept, 0, - L2_TABLE_SIZE_REAL); - if (error) + addr = dumpbuf; + for (va = KERNBASE; va < kernel_vm_end; va += PAGE_SIZE) { + pmap_dump_kextract(va, (pt2_entry_t *)addr); + addr += sizeof(pt2_entry_t); + if (addr == dumpbuf + sizeof(dumpbuf)) { + error = blk_write(di, dumpbuf, 0, sizeof(dumpbuf)); + if (error != 0) goto fail; - /* Flush, in case we reuse fakept in the same block */ - error = blk_flush(di); - if (error) - goto fail; - continue; + addr = dumpbuf; } - if (pmap_pde_v(pdp) && pmap_pde_page(pdp)) { - pa = *pdp & L1_C_ADDR_MASK; - if (!count) { - prev_pa = pa; - count++; - } - else { - if (pa == (prev_pa + count * L2_TABLE_SIZE_REAL)) - count++; - else { - error = blk_write_cont(di, prev_pa, - count * L2_TABLE_SIZE_REAL); - if (error) - goto fail; - count = 1; - prev_pa = pa; - } - } - } else { - if (count) { - error = blk_write_cont(di, prev_pa, - count * L2_TABLE_SIZE_REAL); - if (error) - goto fail; - count = 0; - prev_pa = 0; - } - bzero(fakept, sizeof(fakept)); - error = blk_write(di, (char *)&fakept, 0, - L2_TABLE_SIZE_REAL); - if (error) - goto fail; - /* Flush, in case we reuse fakept in the same block */ - error = blk_flush(di); - if (error) - goto fail; - } } - - if (count) { - error = blk_write_cont(di, prev_pa, count * L2_TABLE_SIZE_REAL); - if (error) + if (addr != dumpbuf) { + error = blk_write(di, dumpbuf, 0, addr - dumpbuf); + if (error != 0) goto fail; - count = 0; - prev_pa = 0; } /* Dump memory chunks */ for (i = 0; i < vm_page_dump_size / sizeof(*vm_page_dump); i++) { bits = vm_page_dump[i]; while (bits) { bit = ffs(bits) - 1; pa = (((uint64_t)i * sizeof(*vm_page_dump) * NBBY) + bit) * PAGE_SIZE; if (!count) { prev_pa = pa; count++; } else { if (pa == (prev_pa + count * PAGE_SIZE)) count++; else { - error = blk_write_cont(di, prev_pa, + error = blk_write(di, NULL, prev_pa, count * PAGE_SIZE); if (error) goto fail; count = 1; prev_pa = pa; } } bits &= ~(1ul << bit); } } if (count) { - error = blk_write_cont(di, prev_pa, count * PAGE_SIZE); + error = blk_write(di, NULL, prev_pa, count * PAGE_SIZE); if (error) goto fail; count = 0; prev_pa = 0; } + error = blk_flush(di); + if (error) + goto fail; + /* Dump trailer */ error = dump_write(di, &kdh, 0, dumplo, sizeof(kdh)); if (error) goto fail; dumplo += sizeof(kdh); /* Signal completion, signoff and exit stage left. */ dump_write(di, NULL, 0, 0, 0); printf("\nDump complete\n"); return (0); fail: if (error < 0) error = -error; if (error == ECANCELED) printf("\nDump aborted\n"); else if (error == ENOSPC) printf("\nDump failed. 
Partition too small.\n"); else printf("\n** DUMP FAILED (ERROR %d) **\n", error); return (error); } - -#else /* ARM_NEW_PMAP */ - -int -minidumpsys(struct dumperinfo *di) -{ - - return (0); -} - -#endif void dump_add_page(vm_paddr_t pa) { int idx, bit; pa >>= PAGE_SHIFT; idx = pa >> 5; /* 2^5 = 32 */ bit = pa & 31; atomic_set_int(&vm_page_dump[idx], 1ul << bit); } void dump_drop_page(vm_paddr_t pa) { int idx, bit; pa >>= PAGE_SHIFT; idx = pa >> 5; /* 2^5 = 32 */ bit = pa & 31; atomic_clear_int(&vm_page_dump[idx], 1ul << bit); } Index: projects/clang380-import/sys/arm/arm/physmem.c =================================================================== --- projects/clang380-import/sys/arm/arm/physmem.c (revision 294776) +++ projects/clang380-import/sys/arm/arm/physmem.c (revision 294777) @@ -1,367 +1,376 @@ /*- * Copyright (c) 2014 Ian Lepore * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); #include "opt_ddb.h" /* * Routines for describing and initializing anything related to physical memory. */ #include #include #include #include #include /* * These structures are used internally to keep track of regions of physical * ram, and regions within the physical ram that need to be excluded. An * exclusion region can be excluded from crash dumps, from the vm pool of pages * that can be allocated, or both, depending on the exclusion flags associated * with the region. */ #define MAX_HWCNT 10 #define MAX_EXCNT 10 +#define MAX_PHYS_ADDR 0xFFFFFFFFull + struct region { vm_paddr_t addr; vm_size_t size; uint32_t flags; }; static struct region hwregions[MAX_HWCNT]; static struct region exregions[MAX_EXCNT]; static size_t hwcnt; static size_t excnt; /* * These "avail lists" are globals used to communicate physical memory layout to * other parts of the kernel. Within the arrays, each value is the starting * address of a contiguous area of physical address space. The values at even * indexes are areas that contain usable memory and the values at odd indexes * are areas that aren't usable. Each list is terminated by a pair of zero * entries.
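 */

/*
 * A short sketch of consuming such a zero-terminated avail list, here to
 * total the usable bytes it describes.  The layout (even index = start,
 * odd index = end, zero pair terminates) is exactly what this comment
 * describes; the sample values are invented.
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t toy_avail[] = {
	0x00100000, 0x08000000,		/* 1 MB .. 128 MB */
	0x10000000, 0x18000000,		/* 256 MB .. 384 MB */
	0, 0
};

int
main(void)
{
	uint64_t total;
	int i;

	total = 0;
	for (i = 0; toy_avail[i] != 0 || toy_avail[i + 1] != 0; i += 2)
		total += toy_avail[i + 1] - toy_avail[i];
	printf("%llu bytes usable\n", (unsigned long long)total);
	return (0);
}

/*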
* * dump_avail tells the dump code what regions to include in a crash dump, and * phys_avail is the way we hand all the remaining physical ram we haven't used * in early kernel init over to the vm system for allocation management. * * We size these arrays to hold twice as many available regions as we allow for * hardware memory regions, to allow for the fact that exclusions can split a * hardware region into two or more available regions. In the real world there * will typically be one or two hardware regions and two or three exclusions. * * Each available region in this list occupies two array slots (the start of the * available region and the start of the unavailable region that follows it). */ #define MAX_AVAIL_REGIONS (MAX_HWCNT * 2) #define MAX_AVAIL_ENTRIES (MAX_AVAIL_REGIONS * 2) vm_paddr_t phys_avail[MAX_AVAIL_ENTRIES + 2]; /* +2 to allow for a pair */ vm_paddr_t dump_avail[MAX_AVAIL_ENTRIES + 2]; /* of zeroes to terminate. */ /* * realmem is the total number of hardware pages, excluded or not. * Maxmem is one greater than the last physical page number. */ long realmem; long Maxmem; /* The address at which the kernel was loaded. Set early in initarm(). */ vm_paddr_t arm_physmem_kernaddr; /* * Print the contents of the physical and excluded region tables using the * provided printf-like output function (which will be either printf or * db_printf). */ static void physmem_dump_tables(int (*prfunc)(const char *, ...)) { int flags, i; uintmax_t addr, size; const unsigned int mbyte = 1024 * 1024; prfunc("Physical memory chunk(s):\n"); for (i = 0; i < hwcnt; ++i) { addr = hwregions[i].addr; size = hwregions[i].size; prfunc(" 0x%08jx - 0x%08jx, %5ju MB (%7ju pages)\n", addr, addr + size - 1, size / mbyte, size / PAGE_SIZE); } prfunc("Excluded memory regions:\n"); for (i = 0; i < excnt; ++i) { addr = exregions[i].addr; size = exregions[i].size; flags = exregions[i].flags; prfunc(" 0x%08jx - 0x%08jx, %5ju MB (%7ju pages) %s %s\n", addr, addr + size - 1, size / mbyte, size / PAGE_SIZE, (flags & EXFLAG_NOALLOC) ? "NoAlloc" : "", (flags & EXFLAG_NODUMP) ? "NoDump" : ""); } #ifdef DEBUG prfunc("Avail lists:\n"); for (i = 0; phys_avail[i] != 0; ++i) { prfunc(" phys_avail[%d] 0x%08x\n", i, phys_avail[i]); } for (i = 0; dump_avail[i] != 0; ++i) { prfunc(" dump_avail[%d] 0x%08x\n", i, dump_avail[i]); } #endif } /* * Print the contents of the static mapping table. Used for bootverbose. */ void arm_physmem_print_tables() { physmem_dump_tables(printf); } /* * Walk the list of hardware regions, processing it against the list of * exclusions that contain the given exflags, and generating an "avail list". * * Updates the value at *pavail with the sum of all pages in all hw regions. * * Returns the number of pages of non-excluded memory added to the avail list. */ static size_t regions_to_avail(vm_paddr_t *avail, uint32_t exflags, long *pavail) { size_t acnt, exi, hwi; uint64_t end, start, xend, xstart; long availmem; const struct region *exp, *hwp; realmem = 0; availmem = 0; acnt = 0; for (hwi = 0, hwp = hwregions; hwi < hwcnt; ++hwi, ++hwp) { start = hwp->addr; end = hwp->size + start; realmem += arm32_btop((vm_offset_t)(end - start)); for (exi = 0, exp = exregions; exi < excnt; ++exi, ++exp) { /* * If the excluded region does not match given flags, * continue checking with the next excluded region. */ if ((exp->flags & exflags) == 0) continue; xstart = exp->addr; xend = exp->size + xstart; /* * If the excluded region ends before this hw region, * continue checking with the next excluded region. 
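 */

/*
 * The overlap handling in this loop boils down to the classic interval
 * cases.  A standalone distillation (toy types, invented values) of
 * trimming one exclusion [xstart, xend) out of one region [start, end):
 */
#include <stdint.h>
#include <stdio.h>

/* Emit any leading fragment and return the remainder of [start, end). */
static void
toy_trim(uint64_t start, uint64_t end, uint64_t xstart, uint64_t xend,
    uint64_t *nstart, uint64_t *nend)
{

	if (xend <= start || xstart >= end) {
		*nstart = start;	/* No overlap: region survives. */
		*nend = end;
	} else if (xstart <= start && xend >= end) {
		*nstart = xend;		/* Fully covered: region vanishes. */
		*nend = xend;
	} else if (xstart > start && xend < end) {
		/* Hole in the middle: leading fragment plus remainder. */
		printf("avail: [0x%llx, 0x%llx)\n",
		    (unsigned long long)start, (unsigned long long)xstart);
		*nstart = xend;
		*nend = end;
	} else if (xstart <= start) {
		*nstart = xend;		/* Overlaps the front. */
		*nend = end;
	} else {
		*nstart = start;	/* Overlaps the back. */
		*nend = xstart;
	}
}

int
main(void)
{
	uint64_t s, e;

	toy_trim(0x1000, 0x9000, 0x3000, 0x5000, &s, &e);
	printf("remainder: [0x%llx, 0x%llx)\n",
	    (unsigned long long)s, (unsigned long long)e);
	return (0);
}

/*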
*/ if (xend <= start) continue; /* * If the excluded region begins after this hw region * we're done because both lists are sorted. */ if (xstart >= end) break; /* * If the excluded region completely covers this hw * region, shrink this hw region to zero size. */ if ((start >= xstart) && (end <= xend)) { start = xend; end = xend; break; } /* * If the excluded region falls wholly within this hw * region without abutting or overlapping the beginning * or end, create an available entry from the leading * fragment, then adjust the start of this hw region to * the end of the excluded region, and continue checking * the next excluded region because another exclusion * could affect the remainder of this hw region. */ if ((xstart > start) && (xend < end)) { avail[acnt++] = (vm_paddr_t)start; avail[acnt++] = (vm_paddr_t)xstart; availmem += arm32_btop((vm_offset_t)(xstart - start)); start = xend; continue; } /* * We know the excluded region overlaps either the start * or end of this hardware region (but not both), trim * the excluded portion off the appropriate end. */ if (xstart <= start) start = xend; else end = xstart; } /* * If the trimming actions above left a non-zero size, create an * available entry for it. */ if (end > start) { avail[acnt++] = (vm_paddr_t)start; avail[acnt++] = (vm_paddr_t)end; availmem += arm32_btop((vm_offset_t)(end - start)); } if (acnt >= MAX_AVAIL_ENTRIES) panic("Not enough space in the dump/phys_avail arrays"); } if (pavail) *pavail = availmem; return (acnt); } /* * Insertion-sort a new entry into a regions list; sorted by start address. */ static void insert_region(struct region *regions, size_t rcnt, vm_paddr_t addr, vm_size_t size, uint32_t flags) { size_t i; struct region *ep, *rp; ep = regions + rcnt; for (i = 0, rp = regions; i < rcnt; ++i, ++rp) { if (addr < rp->addr) { bcopy(rp, rp + 1, (ep - rp) * sizeof(*rp)); break; } } rp->addr = addr; rp->size = size; rp->flags = flags; } /* * Add a hardware memory region. */ void -arm_physmem_hardware_region(vm_paddr_t pa, vm_size_t sz) +arm_physmem_hardware_region(uint64_t pa, uint64_t sz) { vm_offset_t adj; /* * Filter out the page at PA 0x00000000. The VM can't handle it, as * pmap_extract() == 0 means failure. - * + */ + if (pa == 0) { + if (sz <= PAGE_SIZE) + return; + pa = PAGE_SIZE; + sz -= PAGE_SIZE; + } else if (pa > MAX_PHYS_ADDR) { + /* This range is past usable memory, ignore it */ + return; + } + + /* * Also filter out the page at the end of the physical address space -- * if addr is non-zero and addr+size is zero we wrapped to the next byte * beyond what vm_paddr_t can express. That leads to a NULL pointer * deref early in startup; work around it by leaving the last page out. * * XXX This just in: subtract out a whole megabyte, not just 1 page. * Reducing the size by anything less than 1MB results in the NULL * pointer deref in _vm_map_lock_read(). Better to give up a megabyte * than leave some folks with an unusable system while we investigate. */ - if (pa == 0) { - if (sz <= PAGE_SIZE) - return; - pa = PAGE_SIZE; - sz -= PAGE_SIZE; - } else if (pa + sz == 0) { + if ((pa + sz) > (MAX_PHYS_ADDR - 1024 * 1024)) { + sz = MAX_PHYS_ADDR - pa + 1; if (sz <= 1024 * 1024) return; sz -= 1024 * 1024; } /* * Round the starting address up to a page boundary, and truncate the * ending page down to a page boundary. */ adj = round_page(pa) - pa; pa = round_page(pa); sz = trunc_page(sz - adj); if (sz > 0 && hwcnt < nitems(hwregions)) insert_region(hwregions, hwcnt++, pa, sz, 0); } /* * Add an exclusion region. 
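 */

/*
 * insert_region() above is a textbook insertion into a sorted fixed-size
 * array: find the first element with a larger key, shift the tail up one
 * slot, and drop the new element in.  The same idiom, standalone, with
 * plain integers:
 */
#include <stdio.h>
#include <string.h>

static void
toy_insert(int *arr, size_t cnt, int val)
{
	size_t i;

	for (i = 0; i < cnt; i++) {
		if (val < arr[i]) {
			/* Shift arr[i..cnt-1] up by one slot. */
			memmove(&arr[i + 1], &arr[i],
			    (cnt - i) * sizeof(arr[0]));
			break;
		}
	}
	arr[i] = val;
}

int
main(void)
{
	int arr[8] = { 10, 30, 50 };
	size_t i;

	toy_insert(arr, 3, 20);		/* arr becomes 10, 20, 30, 50 */
	for (i = 0; i < 4; i++)
		printf("%d ", arr[i]);
	printf("\n");
	return (0);
}

/*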
*/ void arm_physmem_exclude_region(vm_paddr_t pa, vm_size_t sz, uint32_t exflags) { vm_offset_t adj; /* * Truncate the starting address down to a page boundary, and round the * ending page up to a page boundary. */ adj = pa - trunc_page(pa); pa = trunc_page(pa); sz = round_page(sz + adj); if (excnt < nitems(exregions)) insert_region(exregions, excnt++, pa, sz, exflags); } /* * Process all the regions added earlier into the global avail lists. * * Updates the kernel global 'physmem' with the number of physical pages * available for use (all pages not in any exclusion region). * * Updates the kernel global 'Maxmem' with the page number one greater than the * last page of physical memory in the system. */ void arm_physmem_init_kernel_globals(void) { size_t nextidx; regions_to_avail(dump_avail, EXFLAG_NODUMP, NULL); nextidx = regions_to_avail(phys_avail, EXFLAG_NOALLOC, &physmem); if (nextidx == 0) panic("No memory entries in phys_avail"); Maxmem = atop(phys_avail[nextidx - 1]); } #ifdef DDB #include <ddb/ddb.h> DB_SHOW_COMMAND(physmem, db_show_physmem) { physmem_dump_tables(db_printf); } #endif /* DDB */ Index: projects/clang380-import/sys/arm/arm/pmap-v6-new.c =================================================================== --- projects/clang380-import/sys/arm/arm/pmap-v6-new.c (revision 294776) +++ projects/clang380-import/sys/arm/arm/pmap-v6-new.c (revision 294777) @@ -1,6599 +1,6629 @@ /*- * Copyright (c) 1991 Regents of the University of California. * Copyright (c) 1994 John S. Dyson * Copyright (c) 1994 David Greenman * Copyright (c) 2005-2010 Alan L. Cox * Copyright (c) 2014 Svatopluk Kraus * Copyright (c) 2014 Michal Meloun * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)pmap.c 7.7 (Berkeley) 5/12/91 */ /*- * Copyright (c) 2003 Networks Associates Technology, Inc. * All rights reserved.
* * This software was developed for the FreeBSD Project by Jake Burkholder, * Safeport Network Services, and Network Associates Laboratories, the * Security Research Division of Network Associates, Inc. under * DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA * CHATS research program. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); /* * Manages physical address maps. * * Since the information managed by this module is * also stored by the logical address mapping module, * this module may throw away valid virtual-to-physical * mappings at almost any time. However, invalidations * of virtual-to-physical mappings must be done as * requested. * * In order to cope with hardware architectures which * make virtual-to-physical map invalidates expensive, * this module may delay invalidate or reduced protection * operations until such time as they are actually * necessary. This module is given full information as * to which processors are currently using which maps, * and to when physical maps must be made correct. */ #include "opt_vm.h" #include "opt_pmap.h" #include "opt_ddb.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef SMP #include #else #include #endif #ifdef DDB #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef SMP #include #endif #ifndef PMAP_SHPGPERPROC #define PMAP_SHPGPERPROC 200 #endif #ifndef DIAGNOSTIC #define PMAP_INLINE __inline #else #define PMAP_INLINE #endif #ifdef PMAP_DEBUG static void pmap_zero_page_check(vm_page_t m); void pmap_debug(int level); int pmap_pid_dump(int pid); #define PDEBUG(_lev_,_stat_) \ if (pmap_debug_level >= (_lev_)) \ ((_stat_)) #define dprintf printf int pmap_debug_level = 1; #else /* PMAP_DEBUG */ #define PDEBUG(_lev_,_stat_) /* Nothing */ #define dprintf(x, arg...) #endif /* PMAP_DEBUG */ /* * Level 2 page tables map definion ('max' is excluded). 
*/ #define PT2V_MIN_ADDRESS ((vm_offset_t)PT2MAP) #define PT2V_MAX_ADDRESS ((vm_offset_t)PT2MAP + PT2MAP_SIZE) #define UPT2V_MIN_ADDRESS ((vm_offset_t)PT2MAP) #define UPT2V_MAX_ADDRESS \ ((vm_offset_t)(PT2MAP + (KERNBASE >> PT2MAP_SHIFT))) /* * Promotion to a 1MB (PTE1) page mapping requires that the corresponding * 4KB (PTE2) page mappings have identical settings for the following fields: */ #define PTE2_PROMOTE (PTE2_V | PTE2_A | PTE2_NM | PTE2_S | PTE2_NG | \ PTE2_NX | PTE2_RO | PTE2_U | PTE2_W | \ PTE2_ATTR_MASK) #define PTE1_PROMOTE (PTE1_V | PTE1_A | PTE1_NM | PTE1_S | PTE1_NG | \ PTE1_NX | PTE1_RO | PTE1_U | PTE1_W | \ PTE1_ATTR_MASK) #define ATTR_TO_L1(l2_attr) ((((l2_attr) & L2_TEX0) ? L1_S_TEX0 : 0) | \ (((l2_attr) & L2_C) ? L1_S_C : 0) | \ (((l2_attr) & L2_B) ? L1_S_B : 0) | \ (((l2_attr) & PTE2_A) ? PTE1_A : 0) | \ (((l2_attr) & PTE2_NM) ? PTE1_NM : 0) | \ (((l2_attr) & PTE2_S) ? PTE1_S : 0) | \ (((l2_attr) & PTE2_NG) ? PTE1_NG : 0) | \ (((l2_attr) & PTE2_NX) ? PTE1_NX : 0) | \ (((l2_attr) & PTE2_RO) ? PTE1_RO : 0) | \ (((l2_attr) & PTE2_U) ? PTE1_U : 0) | \ (((l2_attr) & PTE2_W) ? PTE1_W : 0)) #define ATTR_TO_L2(l1_attr) ((((l1_attr) & L1_S_TEX0) ? L2_TEX0 : 0) | \ (((l1_attr) & L1_S_C) ? L2_C : 0) | \ (((l1_attr) & L1_S_B) ? L2_B : 0) | \ (((l1_attr) & PTE1_A) ? PTE2_A : 0) | \ (((l1_attr) & PTE1_NM) ? PTE2_NM : 0) | \ (((l1_attr) & PTE1_S) ? PTE2_S : 0) | \ (((l1_attr) & PTE1_NG) ? PTE2_NG : 0) | \ (((l1_attr) & PTE1_NX) ? PTE2_NX : 0) | \ (((l1_attr) & PTE1_RO) ? PTE2_RO : 0) | \ (((l1_attr) & PTE1_U) ? PTE2_U : 0) | \ (((l1_attr) & PTE1_W) ? PTE2_W : 0)) /* * PTE2 descriptors creation macros. */ #define PTE2_KPT(pa) PTE2_KERN(pa, PTE2_AP_KRW, pt_memattr) #define PTE2_KPT_NG(pa) PTE2_KERN_NG(pa, PTE2_AP_KRW, pt_memattr) #define PTE2_KRW(pa) PTE2_KERN(pa, PTE2_AP_KRW, PTE2_ATTR_NORMAL) #define PTE2_KRO(pa) PTE2_KERN(pa, PTE2_AP_KR, PTE2_ATTR_NORMAL) #define PV_STATS #ifdef PV_STATS #define PV_STAT(x) do { x ; } while (0) #else #define PV_STAT(x) do { } while (0) #endif /* * The boot_pt1 is used temporary in very early boot stage as L1 page table. * We can init many things with no memory allocation thanks to its static * allocation and this brings two main advantages: * (1) other cores can be started very simply, * (2) various boot loaders can be supported as its arguments can be processed * in virtual address space and can be moved to safe location before * first allocation happened. * Only disadvantage is that boot_pt1 is used only in very early boot stage. * However, the table is uninitialized and so lays in bss. Therefore kernel * image size is not influenced. * * QQQ: In the future, maybe, boot_pt1 can be used for soft reset and * CPU suspend/resume game. 
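 */

/*
 * The ATTR_TO_L1()/ATTR_TO_L2() macros above translate between the L2
 * (4 KB page) and L1 (1 MB section) descriptor encodings one flag at a
 * time, with a conditional OR per bit.  A tiny standalone sketch of the
 * same pattern, using two invented flag layouts:
 */
#include <stdint.h>
#include <stdio.h>

#define SMALL_RO	0x01	/* read-only, small-page encoding */
#define SMALL_NX	0x02	/* no-execute, small-page encoding */
#define BIG_RO		0x10	/* read-only, section encoding */
#define BIG_NX		0x80	/* no-execute, section encoding */

static uint32_t
small_to_big(uint32_t attr)
{

	return ((((attr) & SMALL_RO) ? BIG_RO : 0) |
	    (((attr) & SMALL_NX) ? BIG_NX : 0));
}

int
main(void)
{

	printf("0x%x\n", (unsigned)small_to_big(SMALL_RO | SMALL_NX));
	return (0);	/* prints 0x90 */
}

/*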
*/ extern pt1_entry_t boot_pt1[]; vm_paddr_t base_pt1; pt1_entry_t *kern_pt1; pt2_entry_t *kern_pt2tab; pt2_entry_t *PT2MAP; static uint32_t ttb_flags; static vm_memattr_t pt_memattr; ttb_entry_t pmap_kern_ttb; /* XXX use conversion function */ #define PTE2_ATTR_NORMAL VM_MEMATTR_DEFAULT #define PTE1_ATTR_NORMAL ATTR_TO_L1(PTE2_ATTR_NORMAL) struct pmap kernel_pmap_store; LIST_HEAD(pmaplist, pmap); static struct pmaplist allpmaps; static struct mtx allpmaps_lock; vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */ vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */ static vm_offset_t kernel_vm_end_new; vm_offset_t kernel_vm_end = KERNBASE + NKPT2PG * NPT2_IN_PG * PTE1_SIZE; vm_offset_t vm_max_kernel_address; vm_paddr_t kernel_l1pa; static struct rwlock __aligned(CACHE_LINE_SIZE) pvh_global_lock; /* * Data for the pv entry allocation mechanism */ static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks); static int pv_entry_count = 0, pv_entry_max = 0, pv_entry_high_water = 0; static struct md_page *pv_table; /* XXX: Is only the list in md_page used? */ static int shpgperproc = PMAP_SHPGPERPROC; struct pv_chunk *pv_chunkbase; /* KVA block for pv_chunks */ int pv_maxchunks; /* How many chunks we have KVA for */ vm_offset_t pv_vafree; /* freelist stored in the PTE */ vm_paddr_t first_managed_pa; #define pa_to_pvh(pa) (&pv_table[pte1_index(pa - first_managed_pa)]) /* * All those kernel PT submaps that BSD is so fond of */ struct sysmaps { struct mtx lock; pt2_entry_t *CMAP1; pt2_entry_t *CMAP2; pt2_entry_t *CMAP3; caddr_t CADDR1; caddr_t CADDR2; caddr_t CADDR3; }; static struct sysmaps sysmaps_pcpu[MAXCPU]; static pt2_entry_t *CMAP3; static caddr_t CADDR3; caddr_t _tmppt = 0; struct msgbuf *msgbufp = 0; /* XXX move it to machdep.c */ /* * Crashdump maps. */ static caddr_t crashdumpmap; static pt2_entry_t *PMAP1 = 0, *PMAP2; static pt2_entry_t *PADDR1 = 0, *PADDR2; #ifdef DDB static pt2_entry_t *PMAP3; static pt2_entry_t *PADDR3; static int PMAP3cpu __unused; /* for SMP only */ #endif #ifdef SMP static int PMAP1cpu; static int PMAP1changedcpu; SYSCTL_INT(_debug, OID_AUTO, PMAP1changedcpu, CTLFLAG_RD, &PMAP1changedcpu, 0, "Number of times pmap_pte2_quick changed CPU with same PMAP1"); #endif static int PMAP1changed; SYSCTL_INT(_debug, OID_AUTO, PMAP1changed, CTLFLAG_RD, &PMAP1changed, 0, "Number of times pmap_pte2_quick changed PMAP1"); static int PMAP1unchanged; SYSCTL_INT(_debug, OID_AUTO, PMAP1unchanged, CTLFLAG_RD, &PMAP1unchanged, 0, "Number of times pmap_pte2_quick didn't change PMAP1"); static struct mtx PMAP2mutex; static __inline void pt2_wirecount_init(vm_page_t m); static boolean_t pmap_demote_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va); void cache_icache_sync_fresh(vm_offset_t va, vm_paddr_t pa, vm_size_t size); /* * Function to set the debug level of the pmap code. */ #ifdef PMAP_DEBUG void pmap_debug(int level) { pmap_debug_level = level; dprintf("pmap_debug: level=%d\n", pmap_debug_level); } #endif /* PMAP_DEBUG */ /* * This table must correspond to the memory attribute configuration in vm.h. * The first entry is used for the normal system mapping. * * Device memory is always marked as shared. * Normal memory is shared only in the SMP case. * The not-outer-shareable (NOS) bits are not used yet. * Class 6 cannot be used on ARM11. */ #define TEXDEF_TYPE_SHIFT 0 #define TEXDEF_TYPE_MASK 0x3 #define TEXDEF_INNER_SHIFT 2 #define TEXDEF_INNER_MASK 0x3 #define TEXDEF_OUTER_SHIFT 4 #define TEXDEF_OUTER_MASK 0x3 #define TEXDEF_NOS_SHIFT 6 #define TEXDEF_NOS_MASK 0x1 #define TEX(t, i, o, s) \ (((t) << TEXDEF_TYPE_SHIFT) | \ ((i) << TEXDEF_INNER_SHIFT) | \ ((o) << TEXDEF_OUTER_SHIFT) | \ ((s) << TEXDEF_NOS_SHIFT)) static uint32_t tex_class[8] = { /* type inner cache outer cache */ TEX(PRRR_MEM, NMRR_WB_WA, NMRR_WB_WA, 0), /* 0 - ATTR_WB_WA */ TEX(PRRR_MEM, NMRR_NC, NMRR_NC, 0), /* 1 - ATTR_NOCACHE */ TEX(PRRR_DEV, NMRR_NC, NMRR_NC, 0), /* 2 - ATTR_DEVICE */ TEX(PRRR_SO, NMRR_NC, NMRR_NC, 0), /* 3 - ATTR_SO */ TEX(PRRR_MEM, NMRR_WT, NMRR_WT, 0), /* 4 - ATTR_WT */ TEX(PRRR_MEM, NMRR_NC, NMRR_NC, 0), /* 5 - NOT USED YET */ TEX(PRRR_MEM, NMRR_NC, NMRR_NC, 0), /* 6 - NOT USED YET */ TEX(PRRR_MEM, NMRR_NC, NMRR_NC, 0), /* 7 - NOT USED YET */ }; #undef TEX /* * Convert a TEX definition entry to TTB flags. */ static uint32_t encode_ttb_flags(int idx) { uint32_t inner, outer, nos, reg; inner = (tex_class[idx] >> TEXDEF_INNER_SHIFT) & TEXDEF_INNER_MASK; outer = (tex_class[idx] >> TEXDEF_OUTER_SHIFT) & TEXDEF_OUTER_MASK; nos = (tex_class[idx] >> TEXDEF_NOS_SHIFT) & TEXDEF_NOS_MASK; reg = nos << 5; reg |= outer << 3; if (cpuinfo.coherent_walk) reg |= (inner & 0x1) << 6; reg |= (inner & 0x2) >> 1; #ifdef SMP reg |= 1 << 1; #endif return (reg); } /* * Set TEX remapping registers on the current CPU. */ void pmap_set_tex(void) { uint32_t prrr, nmrr; uint32_t type, inner, outer, nos; int i; #ifdef PMAP_PTE_NOCACHE /* XXX fixme */ if (cpuinfo.coherent_walk) { pt_memattr = VM_MEMATTR_WB_WA; ttb_flags = encode_ttb_flags(0); } else { pt_memattr = VM_MEMATTR_NOCACHE; ttb_flags = encode_ttb_flags(1); } #else pt_memattr = VM_MEMATTR_WB_WA; ttb_flags = encode_ttb_flags(0); #endif prrr = 0; nmrr = 0; /* Build remapping register from TEX classes. */ for (i = 0; i < 8; i++) { type = (tex_class[i] >> TEXDEF_TYPE_SHIFT) & TEXDEF_TYPE_MASK; inner = (tex_class[i] >> TEXDEF_INNER_SHIFT) & TEXDEF_INNER_MASK; outer = (tex_class[i] >> TEXDEF_OUTER_SHIFT) & TEXDEF_OUTER_MASK; nos = (tex_class[i] >> TEXDEF_NOS_SHIFT) & TEXDEF_NOS_MASK; prrr |= type << (i * 2); prrr |= nos << (i + 24); nmrr |= inner << (i * 2); nmrr |= outer << (i * 2 + 16); } /* Add shareable bits for device memory. */ prrr |= PRRR_DS0 | PRRR_DS1; /* Add shareable bits for normal memory in SMP case. */ #ifdef SMP prrr |= PRRR_NS1; #endif cp15_prrr_set(prrr); cp15_nmrr_set(nmrr); /* Caches are disabled, so full TLB flush should be enough. */ tlb_flush_all_local(); } /* * KERNBASE must be a multiple of NPT2_IN_PG * PTE1_SIZE. In other words, * KERNBASE is mapped by the first L2 page table in its L2 page table page. * PT2MAP meets the same constraint, as it is placed just under KERNBASE. */ CTASSERT((KERNBASE & (NPT2_IN_PG * PTE1_SIZE - 1)) == 0); CTASSERT((KERNBASE - VM_MAXUSER_ADDRESS) >= PT2MAP_SIZE); /* * In crazy dreams, PAGE_SIZE could be a multiple of PTE2_SIZE in general. * For now, anyhow, the following check must be fulfilled. */ CTASSERT(PAGE_SIZE == PTE2_SIZE); /* * We don't want to mess up MI code with all MMU and PMAP definitions, * so some things, which depend on other ones, are defined independently. * Now, it is time to check that we don't screw something up. */ CTASSERT(PDRSHIFT == PTE1_SHIFT); /* * Check L1 and L2 page table entries definitions consistency.
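 */

/*
 * Each tex_class[] entry above packs four small fields (type, inner cache,
 * outer cache, not-outer-shareable) into one word with shift/mask pairs.
 * A standalone sketch of the pack/unpack idiom with the same field widths:
 */
#include <stdint.h>
#include <stdio.h>

#define F_TYPE_SHIFT	0
#define F_TYPE_MASK	0x3
#define F_INNER_SHIFT	2
#define F_INNER_MASK	0x3
#define F_OUTER_SHIFT	4
#define F_OUTER_MASK	0x3
#define F_NOS_SHIFT	6
#define F_NOS_MASK	0x1

static uint32_t
toy_pack(uint32_t t, uint32_t i, uint32_t o, uint32_t s)
{

	return ((t << F_TYPE_SHIFT) | (i << F_INNER_SHIFT) |
	    (o << F_OUTER_SHIFT) | (s << F_NOS_SHIFT));
}

int
main(void)
{
	uint32_t v;

	v = toy_pack(2, 1, 3, 1);
	printf("type %u inner %u outer %u nos %u\n",
	    (unsigned)((v >> F_TYPE_SHIFT) & F_TYPE_MASK),
	    (unsigned)((v >> F_INNER_SHIFT) & F_INNER_MASK),
	    (unsigned)((v >> F_OUTER_SHIFT) & F_OUTER_MASK),
	    (unsigned)((v >> F_NOS_SHIFT) & F_NOS_MASK));
	return (0);
}

/*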
*/ CTASSERT(NB_IN_PT1 == (sizeof(pt1_entry_t) * NPTE1_IN_PT1)); CTASSERT(NB_IN_PT2 == (sizeof(pt2_entry_t) * NPTE2_IN_PT2)); /* * Check L2 page tables page consistency. */ CTASSERT(PAGE_SIZE == (NPT2_IN_PG * NB_IN_PT2)); CTASSERT((1 << PT2PG_SHIFT) == NPT2_IN_PG); /* * Check PT2TAB consistency. * PT2TAB_ENTRIES is defined as a division of NPTE1_IN_PT1 by NPT2_IN_PG. * This should be done without remainder. */ CTASSERT(NPTE1_IN_PT1 == (PT2TAB_ENTRIES * NPT2_IN_PG)); /* * A PT2MAP magic. * * All level 2 page tables (PT2s) are mapped continuously and accordingly * into PT2MAP address space. As PT2 size is less than PAGE_SIZE, this can * be done only if PAGE_SIZE is a multiple of PT2 size. All PT2s in one page * must be used together, but not necessary at once. The first PT2 in a page * must map things on correctly aligned address and the others must follow * in right order. */ #define NB_IN_PT2TAB (PT2TAB_ENTRIES * sizeof(pt2_entry_t)) #define NPT2_IN_PT2TAB (NB_IN_PT2TAB / NB_IN_PT2) #define NPG_IN_PT2TAB (NB_IN_PT2TAB / PAGE_SIZE) /* * Check PT2TAB consistency. * NPT2_IN_PT2TAB is defined as a division of NB_IN_PT2TAB by NB_IN_PT2. * NPG_IN_PT2TAB is defined as a division of NB_IN_PT2TAB by PAGE_SIZE. * The both should be done without remainder. */ CTASSERT(NB_IN_PT2TAB == (NPT2_IN_PT2TAB * NB_IN_PT2)); CTASSERT(NB_IN_PT2TAB == (NPG_IN_PT2TAB * PAGE_SIZE)); /* * The implementation was made general, however, with the assumption * bellow in mind. In case of another value of NPG_IN_PT2TAB, * the code should be once more rechecked. */ CTASSERT(NPG_IN_PT2TAB == 1); /* * Get offset of PT2 in a page * associated with given PT1 index. */ static __inline u_int page_pt2off(u_int pt1_idx) { return ((pt1_idx & PT2PG_MASK) * NB_IN_PT2); } /* * Get physical address of PT2 * associated with given PT2s page and PT1 index. */ static __inline vm_paddr_t page_pt2pa(vm_paddr_t pgpa, u_int pt1_idx) { return (pgpa + page_pt2off(pt1_idx)); } /* * Get first entry of PT2 * associated with given PT2s page and PT1 index. */ static __inline pt2_entry_t * page_pt2(vm_offset_t pgva, u_int pt1_idx) { return ((pt2_entry_t *)(pgva + page_pt2off(pt1_idx))); } /* * Get virtual address of PT2s page (mapped in PT2MAP) * which holds PT2 which holds entry which maps given virtual address. 
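 */

/*
 * A worked example of the page_pt2off() arithmetic above, with toy
 * constants: with 1 KB L2 tables in 4 KB pages, four PT2s share one page,
 * so the low two bits of the PT1 index select the table within its page.
 */
#include <stdio.h>

#define TOY_NB_IN_PT2	1024			/* bytes per L2 table */
#define TOY_NPT2_IN_PG	4			/* tables per 4 KB page */
#define TOY_PT2PG_MASK	(TOY_NPT2_IN_PG - 1)

int
main(void)
{
	unsigned pt1_idx;

	/* Offsets cycle 0, 1024, 2048, 3072, 0, 1024, ... */
	for (pt1_idx = 0; pt1_idx < 6; pt1_idx++)
		printf("pt1 index %u -> offset %u in its page\n", pt1_idx,
		    (pt1_idx & TOY_PT2PG_MASK) * TOY_NB_IN_PT2);
	return (0);
}

/*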
*/ static __inline vm_offset_t pt2map_pt2pg(vm_offset_t va) { va &= ~(NPT2_IN_PG * PTE1_SIZE - 1); return ((vm_offset_t)pt2map_entry(va)); } /***************************************************************************** * * THREE pmap initialization milestones exist: * * locore.S * -> fundamental init (including MMU) in ASM * * initarm() * -> fundamental init continues in C * -> first available physical address is known * * pmap_bootstrap_prepare() -> FIRST PMAP MILESTONE (first epoch begins) * -> basic (safe) interface for physical address allocation is made * -> basic (safe) interface for virtual mapping is made * -> limited not SMP coherent work is possible * * -> more fundamental init continues in C * -> locks and some more things are available * -> all fundamental allocations and mappings are done * * pmap_bootstrap() -> SECOND PMAP MILESTONE (second epoch begins) * -> phys_avail[] and virtual_avail is set * -> control is passed to vm subsystem * -> physical and virtual address allocation are off limit * -> low level mapping functions, some SMP coherent, * are available, which cannot be used before vm subsystem * is being inited * * mi_startup() * -> vm subsystem is being inited * * pmap_init() -> THIRD PMAP MILESTONE (third epoch begins) * -> pmap is fully inited * *****************************************************************************/ /***************************************************************************** * * PMAP first stage initialization and utility functions * for pre-bootstrap epoch. * * After pmap_bootstrap_prepare() is called, the following functions * can be used: * * (1) strictly only for this stage functions for physical page allocations, * virtual space allocations, and mappings: * * vm_paddr_t pmap_preboot_get_pages(u_int num); * void pmap_preboot_map_pages(vm_paddr_t pa, vm_offset_t va, u_int num); * vm_offset_t pmap_preboot_reserve_pages(u_int num); * vm_offset_t pmap_preboot_get_vpages(u_int num); * void pmap_preboot_map_attr(vm_paddr_t pa, vm_offset_t va, vm_size_t size, * int prot, int attr); * * (2) for all stages: * * vm_paddr_t pmap_kextract(vm_offset_t va); * * NOTE: This is not SMP coherent stage. * *****************************************************************************/ #define KERNEL_P2V(pa) \ ((vm_offset_t)((pa) - arm_physmem_kernaddr + KERNVIRTADDR)) #define KERNEL_V2P(va) \ ((vm_paddr_t)((va) - KERNVIRTADDR + arm_physmem_kernaddr)) static vm_paddr_t last_paddr; /* * Pre-bootstrap epoch page allocator. */ vm_paddr_t pmap_preboot_get_pages(u_int num) { vm_paddr_t ret; ret = last_paddr; last_paddr += num * PAGE_SIZE; return (ret); } /* * The fundamental initalization of PMAP stuff. * * Some things already happened in locore.S and some things could happen * before pmap_bootstrap_prepare() is called, so let's recall what is done: * 1. Caches are disabled. * 2. We are running on virtual addresses already with 'boot_pt1' * as L1 page table. * 3. So far, all virtual addresses can be converted to physical ones and * vice versa by the following macros: * KERNEL_P2V(pa) .... physical to virtual ones, * KERNEL_V2P(va) .... virtual to physical ones. * * What is done herein: * 1. The 'boot_pt1' is replaced by real kernel L1 page table 'kern_pt1'. * 2. PT2MAP magic is brought to live. * 3. Basic preboot functions for page allocations and mappings can be used. * 4. Everything is prepared for L1 cache enabling. * * Variations: * 1. To use second TTB register, so kernel and users page tables will be * separated. 
This way process forking - pmap_pinit() - could be faster, * it saves physical pages and KVA per a process, and it's simple change. * However, it will lead, due to hardware matter, to the following: * (a) 2G space for kernel and 2G space for users. * (b) 1G space for kernel in low addresses and 3G for users above it. * A question is: Is the case (b) really an option? Note that case (b) * does save neither physical memory and KVA. */ void pmap_bootstrap_prepare(vm_paddr_t last) { vm_paddr_t pt2pg_pa, pt2tab_pa, pa, size; vm_offset_t pt2pg_va; pt1_entry_t *pte1p; pt2_entry_t *pte2p; u_int i; uint32_t actlr_mask, actlr_set; /* * Now, we are going to make real kernel mapping. Note that we are * already running on some mapping made in locore.S and we expect * that it's large enough to ensure nofault access to physical memory * allocated herein before switch. * * As kernel image and everything needed before are and will be mapped * by section mappings, we align last physical address to PTE1_SIZE. */ last_paddr = pte1_roundup(last); /* * Allocate and zero page(s) for kernel L1 page table. * * Note that it's first allocation on space which was PTE1_SIZE * aligned and as such base_pt1 is aligned to NB_IN_PT1 too. */ base_pt1 = pmap_preboot_get_pages(NPG_IN_PT1); kern_pt1 = (pt1_entry_t *)KERNEL_P2V(base_pt1); bzero((void*)kern_pt1, NB_IN_PT1); pte1_sync_range(kern_pt1, NB_IN_PT1); /* Allocate and zero page(s) for kernel PT2TAB. */ pt2tab_pa = pmap_preboot_get_pages(NPG_IN_PT2TAB); kern_pt2tab = (pt2_entry_t *)KERNEL_P2V(pt2tab_pa); bzero(kern_pt2tab, NB_IN_PT2TAB); pte2_sync_range(kern_pt2tab, NB_IN_PT2TAB); /* Allocate and zero page(s) for kernel L2 page tables. */ pt2pg_pa = pmap_preboot_get_pages(NKPT2PG); pt2pg_va = KERNEL_P2V(pt2pg_pa); size = NKPT2PG * PAGE_SIZE; bzero((void*)pt2pg_va, size); pte2_sync_range((pt2_entry_t *)pt2pg_va, size); /* * Add a physical memory segment (vm_phys_seg) corresponding to the * preallocated pages for kernel L2 page tables so that vm_page * structures representing these pages will be created. The vm_page * structures are required for promotion of the corresponding kernel * virtual addresses to section mappings. */ vm_phys_add_seg(pt2tab_pa, pmap_preboot_get_pages(0)); /* * Insert allocated L2 page table pages to PT2TAB and make * link to all PT2s in L1 page table. See how kernel_vm_end * is initialized. * * We play simple and safe. So every KVA will have underlaying * L2 page table, even kernel image mapped by sections. */ pte2p = kern_pt2tab_entry(KERNBASE); for (pa = pt2pg_pa; pa < pt2pg_pa + size; pa += PTE2_SIZE) pt2tab_store(pte2p++, PTE2_KPT(pa)); pte1p = kern_pte1(KERNBASE); for (pa = pt2pg_pa; pa < pt2pg_pa + size; pa += NB_IN_PT2) pte1_store(pte1p++, PTE1_LINK(pa)); /* Make section mappings for kernel. */ pte1p = kern_pte1(KERNBASE); for (pa = KERNEL_V2P(KERNBASE); pa < last; pa += PTE1_SIZE) pte1_store(pte1p++, PTE1_KERN(pa, PTE1_AP_KRW, ATTR_TO_L1(PTE2_ATTR_WB_WA))); /* * Get free and aligned space for PT2MAP and make L1 page table links * to L2 page tables held in PT2TAB. * * Note that pages holding PT2s are stored in PT2TAB as pt2_entry_t * descriptors and PT2TAB page(s) itself is(are) used as PT2s. Thus * each entry in PT2TAB maps all PT2s in a page. This implies that * virtual address of PT2MAP must be aligned to NPT2_IN_PG * PTE1_SIZE. 
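 */

/*
 * The loops in pmap_bootstrap_prepare() above all follow one idiom: walk a
 * physical range in fixed-size steps and store one descriptor per step.  A
 * standalone sketch (the descriptor encoding below is invented):
 */
#include <stdint.h>
#include <stdio.h>

#define TOY_STEP	4096
#define TOY_VALID	0x1

int
main(void)
{
	uint32_t table[4];
	uint32_t pa, *p;

	p = table;
	for (pa = 0x80000000u; pa < 0x80000000u + 4 * TOY_STEP;
	    pa += TOY_STEP)
		*p++ = pa | TOY_VALID;	/* one descriptor per page */
	printf("table[2] = 0x%x\n", (unsigned)table[2]);  /* 0x80002001 */
	return (0);
}

/*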
*/ PT2MAP = (pt2_entry_t *)(KERNBASE - PT2MAP_SIZE); pte1p = kern_pte1((vm_offset_t)PT2MAP); for (pa = pt2tab_pa, i = 0; i < NPT2_IN_PT2TAB; i++, pa += NB_IN_PT2) { pte1_store(pte1p++, PTE1_LINK(pa)); } /* * Store PT2TAB in PT2TAB itself, i.e. self reference mapping. * Each pmap will hold own PT2TAB, so the mapping should be not global. */ pte2p = kern_pt2tab_entry((vm_offset_t)PT2MAP); for (pa = pt2tab_pa, i = 0; i < NPG_IN_PT2TAB; i++, pa += PTE2_SIZE) { pt2tab_store(pte2p++, PTE2_KPT_NG(pa)); } /* * Choose correct L2 page table and make mappings for allocations * made herein which replaces temporary locore.S mappings after a while. * Note that PT2MAP cannot be used until we switch to kern_pt1. * * Note, that these allocations started aligned on 1M section and * kernel PT1 was allocated first. Making of mappings must follow * order of physical allocations as we've used KERNEL_P2V() macro * for virtual addresses resolution. */ pte2p = kern_pt2tab_entry((vm_offset_t)kern_pt1); pt2pg_va = KERNEL_P2V(pte2_pa(pte2_load(pte2p))); pte2p = page_pt2(pt2pg_va, pte1_index((vm_offset_t)kern_pt1)); /* Make mapping for kernel L1 page table. */ for (pa = base_pt1, i = 0; i < NPG_IN_PT1; i++, pa += PTE2_SIZE) pte2_store(pte2p++, PTE2_KPT(pa)); /* Make mapping for kernel PT2TAB. */ for (pa = pt2tab_pa, i = 0; i < NPG_IN_PT2TAB; i++, pa += PTE2_SIZE) pte2_store(pte2p++, PTE2_KPT(pa)); /* Finally, switch from 'boot_pt1' to 'kern_pt1'. */ pmap_kern_ttb = base_pt1 | ttb_flags; cpuinfo_get_actlr_modifier(&actlr_mask, &actlr_set); reinit_mmu(pmap_kern_ttb, actlr_mask, actlr_set); /* * Initialize the first available KVA. As kernel image is mapped by * sections, we are leaving some gap behind. */ virtual_avail = (vm_offset_t)kern_pt2tab + NPG_IN_PT2TAB * PAGE_SIZE; } /* * Setup L2 page table page for given KVA. * Used in pre-bootstrap epoch. * * Note that we have allocated NKPT2PG pages for L2 page tables in advance * and used them for mapping KVA starting from KERNBASE. However, this is not * enough. Vectors and devices need L2 page tables too. Note that they are * even above VM_MAX_KERNEL_ADDRESS. */ static __inline vm_paddr_t pmap_preboot_pt2pg_setup(vm_offset_t va) { pt2_entry_t *pte2p, pte2; vm_paddr_t pt2pg_pa; /* Get associated entry in PT2TAB. */ pte2p = kern_pt2tab_entry(va); /* Just return, if PT2s page exists already. */ pte2 = pt2tab_load(pte2p); if (pte2_is_valid(pte2)) return (pte2_pa(pte2)); KASSERT(va >= VM_MAX_KERNEL_ADDRESS, ("%s: NKPT2PG too small", __func__)); /* * Allocate page for PT2s and insert it to PT2TAB. * In other words, map it into PT2MAP space. */ pt2pg_pa = pmap_preboot_get_pages(1); pt2tab_store(pte2p, PTE2_KPT(pt2pg_pa)); /* Zero all PT2s in allocated page. */ bzero((void*)pt2map_pt2pg(va), PAGE_SIZE); pte2_sync_range((pt2_entry_t *)pt2map_pt2pg(va), PAGE_SIZE); return (pt2pg_pa); } /* * Setup L2 page table for given KVA. * Used in pre-bootstrap epoch. */ static void pmap_preboot_pt2_setup(vm_offset_t va) { pt1_entry_t *pte1p; vm_paddr_t pt2pg_pa, pt2_pa; /* Setup PT2's page. */ pt2pg_pa = pmap_preboot_pt2pg_setup(va); pt2_pa = page_pt2pa(pt2pg_pa, pte1_index(va)); /* Insert PT2 to PT1. */ pte1p = kern_pte1(va); pte1_store(pte1p, PTE1_LINK(pt2_pa)); } /* * Get L2 page entry associated with given KVA. * Used in pre-bootstrap epoch. */ static __inline pt2_entry_t* pmap_preboot_vtopte2(vm_offset_t va) { pt1_entry_t *pte1p; /* Setup PT2 if needed. */ pte1p = kern_pte1(va); if (!pte1_is_valid(pte1_load(pte1p))) /* XXX - sections ?! 
*/ pmap_preboot_pt2_setup(va); return (pt2map_entry(va)); } /* * Pre-bootstrap epoch page(s) mapping(s). */ void pmap_preboot_map_pages(vm_paddr_t pa, vm_offset_t va, u_int num) { u_int i; pt2_entry_t *pte2p; /* Map all the pages. */ for (i = 0; i < num; i++) { pte2p = pmap_preboot_vtopte2(va); pte2_store(pte2p, PTE2_KRW(pa)); va += PAGE_SIZE; pa += PAGE_SIZE; } } /* * Pre-bootstrap epoch virtual space alocator. */ vm_offset_t pmap_preboot_reserve_pages(u_int num) { u_int i; vm_offset_t start, va; pt2_entry_t *pte2p; /* Allocate virtual space. */ start = va = virtual_avail; virtual_avail += num * PAGE_SIZE; /* Zero the mapping. */ for (i = 0; i < num; i++) { pte2p = pmap_preboot_vtopte2(va); pte2_store(pte2p, 0); va += PAGE_SIZE; } return (start); } /* * Pre-bootstrap epoch page(s) allocation and mapping(s). */ vm_offset_t pmap_preboot_get_vpages(u_int num) { vm_paddr_t pa; vm_offset_t va; /* Allocate physical page(s). */ pa = pmap_preboot_get_pages(num); /* Allocate virtual space. */ va = virtual_avail; virtual_avail += num * PAGE_SIZE; /* Map and zero all. */ pmap_preboot_map_pages(pa, va, num); bzero((void *)va, num * PAGE_SIZE); return (va); } /* * Pre-bootstrap epoch page mapping(s) with attributes. */ void pmap_preboot_map_attr(vm_paddr_t pa, vm_offset_t va, vm_size_t size, int prot, int attr) { u_int num; u_int l1_attr, l1_prot; pt1_entry_t *pte1p; pt2_entry_t *pte2p; l1_prot = ATTR_TO_L1(prot); l1_attr = ATTR_TO_L1(attr); /* Map all the pages. */ num = round_page(size); while (num > 0) { if ((((va | pa) & PTE1_OFFSET) == 0) && (num >= PTE1_SIZE)) { pte1p = kern_pte1(va); pte1_store(pte1p, PTE1_KERN(pa, l1_prot, l1_attr)); va += PTE1_SIZE; pa += PTE1_SIZE; num -= PTE1_SIZE; } else { pte2p = pmap_preboot_vtopte2(va); pte2_store(pte2p, PTE2_KERN(pa, prot, attr)); va += PAGE_SIZE; pa += PAGE_SIZE; num -= PAGE_SIZE; } } } /* * Extract from the kernel page table the physical address * that is mapped by the given virtual address "va". */ vm_paddr_t pmap_kextract(vm_offset_t va) { vm_paddr_t pa; pt1_entry_t pte1; pt2_entry_t pte2; pte1 = pte1_load(kern_pte1(va)); if (pte1_is_section(pte1)) { pa = pte1_pa(pte1) | (va & PTE1_OFFSET); } else if (pte1_is_link(pte1)) { /* * We should beware of concurrent promotion that changes * pte1 at this point. However, it's not a problem as PT2 * page is preserved by promotion in PT2TAB. So even if * it happens, using of PT2MAP is still safe. * * QQQ: However, concurrent removing is a problem which * ends in abort on PT2MAP space. Locking must be used * to deal with this. */ pte2 = pte2_load(pt2map_entry(va)); pa = pte2_pa(pte2) | (va & PTE2_OFFSET); } else { panic("%s: va %#x pte1 %#x", __func__, va, pte1); } return (pa); } +/* + * Extract from the kernel page table the physical address + * that is mapped by the given virtual address "va". Also + * return L2 page table entry which maps the address. + * + * This is only intended to be used for panic dumps. 
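 */

/*
 * How this new interface is consumed: minidumpsys() in minidump_machdep.c
 * (earlier in this change) calls it in two passes, roughly as sketched
 * below, first with a NULL entry pointer to discover which physical pages
 * to mark for dumping, then with a buffer slot to collect one PTE per
 * kernel virtual page:
 *
 *	for (va = KERNBASE; va < kernel_vm_end; va += PAGE_SIZE) {
 *		pa = pmap_dump_kextract(va, NULL);
 *		if (pa != 0 && is_dumpable(pa))
 *			dump_add_page(pa);
 *	}
 *
 *	addr = dumpbuf;
 *	for (va = KERNBASE; va < kernel_vm_end; va += PAGE_SIZE) {
 *		pmap_dump_kextract(va, (pt2_entry_t *)addr);
 *		addr += sizeof(pt2_entry_t);
 *		... write out dumpbuf whenever it fills ...
 *	}
 */

/*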
+ */ +vm_paddr_t +pmap_dump_kextract(vm_offset_t va, pt2_entry_t *pte2p) +{ + vm_paddr_t pa; + pt1_entry_t pte1; + pt2_entry_t pte2; + + pte1 = pte1_load(kern_pte1(va)); + if (pte1_is_section(pte1)) { + pa = pte1_pa(pte1) | (va & PTE1_OFFSET); + pte2 = pa | ATTR_TO_L2(pte1) | PTE2_V; + } else if (pte1_is_link(pte1)) { + pte2 = pte2_load(pt2map_entry(va)); + pa = pte2_pa(pte2); + } else { + pte2 = 0; + pa = 0; + } + if (pte2p != NULL) + *pte2p = pte2; + return (pa); +} + /***************************************************************************** * * PMAP second stage initialization and utility functions * for bootstrap epoch. * * After pmap_bootstrap() is called, the following functions for * mappings can be used: * * void pmap_kenter(vm_offset_t va, vm_paddr_t pa); * void pmap_kremove(vm_offset_t va); * vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, * int prot); * * NOTE: This is not SMP coherent stage. And physical page allocation is not * allowed during this stage. * *****************************************************************************/ /* * Initialize kernel PMAP locks and lists, kernel_pmap itself, and * reserve various virtual spaces for temporary mappings. */ void pmap_bootstrap(vm_offset_t firstaddr) { pt2_entry_t *unused __unused; struct sysmaps *sysmaps; u_int i; /* * Initialize the kernel pmap (which is statically allocated). */ PMAP_LOCK_INIT(kernel_pmap); kernel_l1pa = (vm_paddr_t)kern_pt1; /* for libkvm */ kernel_pmap->pm_pt1 = kern_pt1; kernel_pmap->pm_pt2tab = kern_pt2tab; CPU_FILL(&kernel_pmap->pm_active); /* don't allow deactivation */ TAILQ_INIT(&kernel_pmap->pm_pvchunk); /* * Initialize the global pv list lock. */ rw_init(&pvh_global_lock, "pmap pv global"); LIST_INIT(&allpmaps); /* * Request a spin mutex so that changes to allpmaps cannot be * preempted by smp_rendezvous_cpus(). */ mtx_init(&allpmaps_lock, "allpmaps", NULL, MTX_SPIN); mtx_lock_spin(&allpmaps_lock); LIST_INSERT_HEAD(&allpmaps, kernel_pmap, pm_list); mtx_unlock_spin(&allpmaps_lock); /* * Reserve some special page table entries/VA space for temporary * mapping of pages. */ #define SYSMAP(c, p, v, n) do { \ v = (c)pmap_preboot_reserve_pages(n); \ p = pt2map_entry((vm_offset_t)v); \ } while (0) /* * Local CMAP1/CMAP2 are used for zeroing and copying pages. * Local CMAP3 is used for data cache cleaning. * Global CMAP3 is used for the idle process page zeroing. */ for (i = 0; i < MAXCPU; i++) { sysmaps = &sysmaps_pcpu[i]; mtx_init(&sysmaps->lock, "SYSMAPS", NULL, MTX_DEF); SYSMAP(caddr_t, sysmaps->CMAP1, sysmaps->CADDR1, 1); SYSMAP(caddr_t, sysmaps->CMAP2, sysmaps->CADDR2, 1); SYSMAP(caddr_t, sysmaps->CMAP3, sysmaps->CADDR3, 1); } SYSMAP(caddr_t, CMAP3, CADDR3, 1); /* * Crashdump maps. */ SYSMAP(caddr_t, unused, crashdumpmap, MAXDUMPPGS); /* * _tmppt is used for reading arbitrary physical pages via /dev/mem. */ SYSMAP(caddr_t, unused, _tmppt, 1); /* * PADDR1 and PADDR2 are used by pmap_pte2_quick() and pmap_pte2(), * respectively. PADDR3 is used by pmap_pte2_ddb(). */ SYSMAP(pt2_entry_t *, PMAP1, PADDR1, 1); SYSMAP(pt2_entry_t *, PMAP2, PADDR2, 1); #ifdef DDB SYSMAP(pt2_entry_t *, PMAP3, PADDR3, 1); #endif mtx_init(&PMAP2mutex, "PMAP2", NULL, MTX_DEF); /* * Note that in very short time in initarm(), we are going to * initialize phys_avail[] array and no futher page allocation * can happen after that until vm subsystem will be initialized. 
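 */

/*
 * The SYSMAP() macro used in pmap_bootstrap() above pairs a reserved KVA
 * range with a pointer to the PTE that maps it, so the window can later be
 * retargeted by storing a new PTE.  A standalone sketch of the same idea,
 * with invented types and a trivial table:
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t toy_pte_t;

static toy_pte_t toy_ptes[16];		/* stand-in page table */
static uintptr_t toy_next_va = 0xc1000000u;

/* Reserve n pages of KVA and hand back the PTE slot for its first page. */
static uintptr_t
toy_reserve(unsigned n, toy_pte_t **ptep)
{
	uintptr_t va;

	va = toy_next_va;
	toy_next_va += n * 4096;
	*ptep = &toy_ptes[(va >> 12) & 15];
	return (va);
}

int
main(void)
{
	toy_pte_t *cmap;
	uintptr_t caddr;

	caddr = toy_reserve(1, &cmap);
	*cmap = 0x80001000u | 1;	/* point the window at a page */
	printf("va 0x%lx -> pte 0x%x\n", (unsigned long)caddr,
	    (unsigned)*cmap);
	return (0);
}

/*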
*/ kernel_vm_end_new = kernel_vm_end; virtual_end = vm_max_kernel_address; } static void pmap_init_qpages(void) { struct pcpu *pc; int i; CPU_FOREACH(i) { pc = pcpu_find(i); pc->pc_qmap_addr = kva_alloc(PAGE_SIZE); if (pc->pc_qmap_addr == 0) panic("%s: unable to allocate KVA", __func__); } } SYSINIT(qpages_init, SI_SUB_CPU, SI_ORDER_ANY, pmap_init_qpages, NULL); /* * The function can already be use in second initialization stage. * As such, the function DOES NOT call pmap_growkernel() where PT2 * allocation can happen. So if used, be sure that PT2 for given * virtual address is allocated already! * * Add a wired page to the kva. * Note: not SMP coherent. */ static __inline void pmap_kenter_prot_attr(vm_offset_t va, vm_paddr_t pa, uint32_t prot, uint32_t attr) { pt1_entry_t *pte1p; pt2_entry_t *pte2p; pte1p = kern_pte1(va); if (!pte1_is_valid(pte1_load(pte1p))) { /* XXX - sections ?! */ /* * This is a very low level function, so PT2 and particularly * PT2PG associated with given virtual address must be already * allocated. It's a pain mainly during pmap initialization * stage. However, called after pmap initialization with * virtual address not under kernel_vm_end will lead to * the same misery. */ if (!pte2_is_valid(pte2_load(kern_pt2tab_entry(va)))) panic("%s: kernel PT2 not allocated!", __func__); } pte2p = pt2map_entry(va); pte2_store(pte2p, PTE2_KERN(pa, prot, attr)); } static __inline void pmap_kenter_attr(vm_offset_t va, vm_paddr_t pa, int attr) { pmap_kenter_prot_attr(va, pa, PTE2_AP_KRW, attr); } PMAP_INLINE void pmap_kenter(vm_offset_t va, vm_paddr_t pa) { pmap_kenter_prot_attr(va, pa, PTE2_AP_KRW, PTE2_ATTR_NORMAL); } /* * Remove a page from the kernel pagetables. * Note: not SMP coherent. */ PMAP_INLINE void pmap_kremove(vm_offset_t va) { pt2_entry_t *pte2p; pte2p = pt2map_entry(va); pte2_clear(pte2p); } /* * Share new kernel PT2PG with all pmaps. * The caller is responsible for maintaining TLB consistency. */ static void pmap_kenter_pt2tab(vm_offset_t va, pt2_entry_t npte2) { pmap_t pmap; pt2_entry_t *pte2p; mtx_lock_spin(&allpmaps_lock); LIST_FOREACH(pmap, &allpmaps, pm_list) { pte2p = pmap_pt2tab_entry(pmap, va); pt2tab_store(pte2p, npte2); } mtx_unlock_spin(&allpmaps_lock); } /* * Share new kernel PTE1 with all pmaps. * The caller is responsible for maintaining TLB consistency. */ static void pmap_kenter_pte1(vm_offset_t va, pt1_entry_t npte1) { pmap_t pmap; pt1_entry_t *pte1p; mtx_lock_spin(&allpmaps_lock); LIST_FOREACH(pmap, &allpmaps, pm_list) { pte1p = pmap_pte1(pmap, va); pte1_store(pte1p, npte1); } mtx_unlock_spin(&allpmaps_lock); } /* * Used to map a range of physical addresses into kernel * virtual address space. * * The value passed in '*virt' is a suggested virtual address for * the mapping. Architectures which can support a direct-mapped * physical to virtual region can return the appropriate address * within that region, leaving '*virt' unchanged. Other * architectures should map the pages starting at '*virt' and * update '*virt' with the first usable address after the mapped * region. * * NOTE: Read the comments above pmap_kenter_prot_attr() as * the function is used herein! */ vm_offset_t pmap_map(vm_offset_t *virt, vm_paddr_t start, vm_paddr_t end, int prot) { vm_offset_t va, sva; vm_paddr_t pte1_offset; pt1_entry_t npte1; u_int l1prot,l2prot; PDEBUG(1, printf("%s: virt = %#x, start = %#x, end = %#x (size = %#x)," " prot = %d\n", __func__, *virt, start, end, end - start, prot)); l2prot = (prot & VM_PROT_WRITE) ? 
PTE2_AP_KRW : PTE2_AP_KR; l2prot |= (prot & VM_PROT_EXECUTE) ? PTE2_X : PTE2_NX; l1prot = ATTR_TO_L1(l2prot); va = *virt; /* * Does the physical address range's size and alignment permit at * least one section mapping to be created? */ pte1_offset = start & PTE1_OFFSET; if ((end - start) - ((PTE1_SIZE - pte1_offset) & PTE1_OFFSET) >= PTE1_SIZE) { /* * Increase the starting virtual address so that its alignment * does not preclude the use of section mappings. */ if ((va & PTE1_OFFSET) < pte1_offset) va = pte1_trunc(va) + pte1_offset; else if ((va & PTE1_OFFSET) > pte1_offset) va = pte1_roundup(va) + pte1_offset; } sva = va; while (start < end) { if ((start & PTE1_OFFSET) == 0 && end - start >= PTE1_SIZE) { KASSERT((va & PTE1_OFFSET) == 0, ("%s: misaligned va %#x", __func__, va)); npte1 = PTE1_KERN(start, l1prot, PTE1_ATTR_NORMAL); pmap_kenter_pte1(va, npte1); va += PTE1_SIZE; start += PTE1_SIZE; } else { pmap_kenter_prot_attr(va, start, l2prot, PTE2_ATTR_NORMAL); va += PAGE_SIZE; start += PAGE_SIZE; } } tlb_flush_range(sva, va - sva); *virt = va; return (sva); } /* * Make a temporary mapping for a physical address. * This is only intended to be used for panic dumps. */ void * pmap_kenter_temporary(vm_paddr_t pa, int i) { vm_offset_t va; /* QQQ: 'i' should be less than or equal to MAXDUMPPGS. */ va = (vm_offset_t)crashdumpmap + (i * PAGE_SIZE); pmap_kenter(va, pa); tlb_flush_local(va); return ((void *)crashdumpmap); } /************************************* * * TLB & cache maintenance routines. * *************************************/ /* * We inline these within pmap.c for speed. */ PMAP_INLINE void pmap_tlb_flush(pmap_t pmap, vm_offset_t va) { if (pmap == kernel_pmap || !CPU_EMPTY(&pmap->pm_active)) tlb_flush(va); } PMAP_INLINE void pmap_tlb_flush_range(pmap_t pmap, vm_offset_t sva, vm_size_t size) { if (pmap == kernel_pmap || !CPU_EMPTY(&pmap->pm_active)) tlb_flush_range(sva, size); } /* * Abuse the pte2 nodes for unmapped kva to thread a kva freelist through. * Requirements: * - Must deal with pages in order to ensure that none of the PTE2_* bits * are ever set, PTE2_V in particular. * - Assumes we can write to pte2s without pte2_store() atomic ops. * - Assumes nothing will ever test these addresses for 0 to indicate * no mapping instead of correctly checking PTE2_V. * - Assumes a vm_offset_t will fit in a pte2 (true for arm). * Because PTE2_V is never set, there can be no mappings to invalidate. */ static vm_offset_t pmap_pte2list_alloc(vm_offset_t *head) { pt2_entry_t *pte2p; vm_offset_t va; va = *head; if (va == 0) panic("%s: exhausted pte2list KVA", __func__); pte2p = pt2map_entry(va); *head = *pte2p; if (*head & PTE2_V) panic("%s: va with PTE2_V set!", __func__); *pte2p = 0; return (va); } static void pmap_pte2list_free(vm_offset_t *head, vm_offset_t va) { pt2_entry_t *pte2p; if (va & PTE2_V) panic("%s: freeing va with PTE2_V set!", __func__); pte2p = pt2map_entry(va); *pte2p = *head; /* virtual! PTE2_V is 0 though */ *head = va; } static void pmap_pte2list_init(vm_offset_t *head, void *base, int npages) { int i; vm_offset_t va; *head = 0; for (i = npages - 1; i >= 0; i--) { va = (vm_offset_t)base + i * PAGE_SIZE; pmap_pte2list_free(head, va); } } /***************************************************************************** * * PMAP third and final stage initialization. * * After pmap_init() is called, the PMAP subsystem is fully initialized.
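 */

/*
 * pmap_pte2list_alloc/free() above thread a free list through the unused
 * (invalid) page table entries themselves, so the list costs no extra
 * storage.  The same trick, standalone, with an array of words doubling
 * as the links (index + 1 is stored so that 0 can mean "empty"):
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t slot[4];
static uint32_t head;		/* 0 means the list is empty */

static void
toy_free(uint32_t idx)
{

	slot[idx] = head;
	head = idx + 1;
}

static uint32_t
toy_alloc(void)
{
	uint32_t idx;

	idx = head - 1;		/* caller must check head != 0 first */
	head = slot[idx];
	slot[idx] = 0;
	return (idx);
}

int
main(void)
{
	int i;

	for (i = 3; i >= 0; i--)
		toy_free((uint32_t)i);
	printf("%u %u\n", (unsigned)toy_alloc(), (unsigned)toy_alloc());
	return (0);	/* prints "0 1" */
}

/*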
* *****************************************************************************/ SYSCTL_NODE(_vm, OID_AUTO, pmap, CTLFLAG_RD, 0, "VM/pmap parameters"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_max, CTLFLAG_RD, &pv_entry_max, 0, "Max number of PV entries"); SYSCTL_INT(_vm_pmap, OID_AUTO, shpgperproc, CTLFLAG_RD, &shpgperproc, 0, "Page share factor per proc"); static u_long nkpt2pg = NKPT2PG; SYSCTL_ULONG(_vm_pmap, OID_AUTO, nkpt2pg, CTLFLAG_RD, &nkpt2pg, 0, "Pre-allocated pages for kernel PT2s"); static int sp_enabled = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, sp_enabled, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &sp_enabled, 0, "Are large page mappings enabled?"); static SYSCTL_NODE(_vm_pmap, OID_AUTO, pte1, CTLFLAG_RD, 0, "1MB page mapping counters"); static u_long pmap_pte1_demotions; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, demotions, CTLFLAG_RD, &pmap_pte1_demotions, 0, "1MB page demotions"); static u_long pmap_pte1_mappings; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, mappings, CTLFLAG_RD, &pmap_pte1_mappings, 0, "1MB page mappings"); static u_long pmap_pte1_p_failures; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, p_failures, CTLFLAG_RD, &pmap_pte1_p_failures, 0, "1MB page promotion failures"); static u_long pmap_pte1_promotions; SYSCTL_ULONG(_vm_pmap_pte1, OID_AUTO, promotions, CTLFLAG_RD, &pmap_pte1_promotions, 0, "1MB page promotions"); static __inline ttb_entry_t pmap_ttb_get(pmap_t pmap) { return (vtophys(pmap->pm_pt1) | ttb_flags); } /* * Initialize a vm_page's machine-dependent fields. * * Variations: * 1. Pages for L2 page tables are always not managed. So, pv_list and * pt2_wirecount can share same physical space. However, proper * initialization on a page alloc for page tables and reinitialization * on the page free must be ensured. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); pt2_wirecount_init(m); m->md.pat_mode = PTE2_ATTR_NORMAL; } /* * Virtualization for faster way how to zero whole page. */ static __inline void pagezero(void *page) { bzero(page, PAGE_SIZE); } /* * Zero L2 page table page. * Use same KVA as in pmap_zero_page(). */ static __inline vm_paddr_t pmap_pt2pg_zero(vm_page_t m) { vm_paddr_t pa; struct sysmaps *sysmaps; pa = VM_PAGE_TO_PHYS(m); /* * XXX: For now, we map whole page even if it's already zero, * to sync it even if the sync is only DSB. */ sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(pa, PTE2_AP_KRW, m->md.pat_mode)); /* Even VM_ALLOC_ZERO request is only advisory. */ if ((m->flags & PG_ZERO) == 0) pagezero(sysmaps->CADDR2); pte2_sync_range((pt2_entry_t *)sysmaps->CADDR2, PAGE_SIZE); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); return (pa); } /* * Init just allocated page as L2 page table(s) holder * and return its physical address. */ static __inline vm_paddr_t pmap_pt2pg_init(pmap_t pmap, vm_offset_t va, vm_page_t m) { vm_paddr_t pa; pt2_entry_t *pte2p; /* Check page attributes. */ if (pmap_page_get_memattr(m) != pt_memattr) pmap_page_set_memattr(m, pt_memattr); /* Zero page and init wire counts. */ pa = pmap_pt2pg_zero(m); pt2_wirecount_init(m); /* * Map page to PT2MAP address space for given pmap. * Note that PT2MAP space is shared with all pmaps. */ if (pmap == kernel_pmap) pmap_kenter_pt2tab(va, PTE2_KPT(pa)); else { pte2p = pmap_pt2tab_entry(pmap, va); pt2tab_store(pte2p, PTE2_KPT_NG(pa)); } return (pa); } /* * Initialize the pmap module. 
* Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ void pmap_init(void) { vm_size_t s; pt2_entry_t *pte2p, pte2; u_int i, pte1_idx, pv_npg; PDEBUG(1, printf("%s: phys_start = %#x\n", __func__, PHYSADDR)); /* * Initialize the vm page array entries for kernel pmap's * L2 page table pages allocated in advance. */ pte1_idx = pte1_index(KERNBASE - PT2MAP_SIZE); pte2p = kern_pt2tab_entry(KERNBASE - PT2MAP_SIZE); for (i = 0; i < nkpt2pg + NPG_IN_PT2TAB; i++, pte2p++) { vm_paddr_t pa; vm_page_t m; pte2 = pte2_load(pte2p); KASSERT(pte2_is_valid(pte2), ("%s: no valid entry", __func__)); pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pa); KASSERT(m >= vm_page_array && m < &vm_page_array[vm_page_array_size], ("%s: L2 page table page is out of range", __func__)); m->pindex = pte1_idx; m->phys_addr = pa; pte1_idx += NPT2_IN_PG; } /* * Initialize the address space (zone) for the pv entries. Set a * high water mark so that the system can recover from excessive * numbers of pv entries. */ TUNABLE_INT_FETCH("vm.pmap.shpgperproc", &shpgperproc); pv_entry_max = shpgperproc * maxproc + vm_cnt.v_page_count; TUNABLE_INT_FETCH("vm.pmap.pv_entries", &pv_entry_max); pv_entry_max = roundup(pv_entry_max, _NPCPV); pv_entry_high_water = 9 * (pv_entry_max / 10); /* * Are large page mappings enabled? */ TUNABLE_INT_FETCH("vm.pmap.sp_enabled", &sp_enabled); if (sp_enabled) { KASSERT(MAXPAGESIZES > 1 && pagesizes[1] == 0, ("%s: can't assign to pagesizes[1]", __func__)); pagesizes[1] = PTE1_SIZE; } /* * Calculate the size of the pv head table for sections. * Handle the possibility that "vm_phys_segs[...].end" is zero. * Note that the table is only for sections which could be promoted. */ first_managed_pa = pte1_trunc(vm_phys_segs[0].start); pv_npg = (pte1_trunc(vm_phys_segs[vm_phys_nsegs - 1].end - PAGE_SIZE) - first_managed_pa) / PTE1_SIZE + 1; /* * Allocate memory for the pv head table for sections. */ s = (vm_size_t)(pv_npg * sizeof(struct md_page)); s = round_page(s); pv_table = (struct md_page *)kmem_malloc(kernel_arena, s, M_WAITOK | M_ZERO); for (i = 0; i < pv_npg; i++) TAILQ_INIT(&pv_table[i].pv_list); pv_maxchunks = MAX(pv_entry_max / _NPCPV, maxproc); pv_chunkbase = (struct pv_chunk *)kva_alloc(PAGE_SIZE * pv_maxchunks); if (pv_chunkbase == NULL) panic("%s: not enough kvm for pv chunks", __func__); pmap_pte2list_init(&pv_vafree, pv_chunkbase, pv_maxchunks); } /* * Add a list of wired pages to the kva * this routine is only used for temporary * kernel mappings that do not need to have * page modification or references recorded. * Note that old mappings are simply written * over. The page *must* be wired. * Note: SMP coherent. Uses a ranged shootdown IPI. */ void pmap_qenter(vm_offset_t sva, vm_page_t *ma, int count) { u_int anychanged; pt2_entry_t *epte2p, *pte2p, pte2; vm_page_t m; vm_paddr_t pa; anychanged = 0; pte2p = pt2map_entry(sva); epte2p = pte2p + count; while (pte2p < epte2p) { m = *ma++; pa = VM_PAGE_TO_PHYS(m); pte2 = pte2_load(pte2p); if ((pte2_pa(pte2) != pa) || (pte2_attr(pte2) != m->md.pat_mode)) { anychanged++; pte2_store(pte2p, PTE2_KERN(pa, PTE2_AP_KRW, m->md.pat_mode)); } pte2p++; } if (__predict_false(anychanged)) tlb_flush_range(sva, count * PAGE_SIZE); } /* * This routine tears out page mappings from the * kernel -- it is meant only for temporary mappings. * Note: SMP coherent. Uses a ranged shootdown IPI. 
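 *
 * A typical (hypothetical) usage pairs it with pmap_qenter():
 *
 *	pmap_qenter(sva, ma, count);
 *	... the pages can now be accessed through sva ...
 *	pmap_qremove(sva, count);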
*/ void pmap_qremove(vm_offset_t sva, int count) { vm_offset_t va; va = sva; while (count-- > 0) { pmap_kremove(va); va += PAGE_SIZE; } tlb_flush_range(sva, va - sva); } /* * Are we current address space or kernel? */ static __inline int pmap_is_current(pmap_t pmap) { return (pmap == kernel_pmap || (pmap == vmspace_pmap(curthread->td_proc->p_vmspace))); } /* * If the given pmap is not the current or kernel pmap, the returned * pte2 must be released by passing it to pmap_pte2_release(). */ static pt2_entry_t * pmap_pte2(pmap_t pmap, vm_offset_t va) { pt1_entry_t pte1; vm_paddr_t pt2pg_pa; pte1 = pte1_load(pmap_pte1(pmap, va)); if (pte1_is_section(pte1)) panic("%s: attempt to map PTE1", __func__); if (pte1_is_link(pte1)) { /* Are we current address space or kernel? */ if (pmap_is_current(pmap)) return (pt2map_entry(va)); /* Note that L2 page table size is not equal to PAGE_SIZE. */ pt2pg_pa = trunc_page(pte1_link_pa(pte1)); mtx_lock(&PMAP2mutex); if (pte2_pa(pte2_load(PMAP2)) != pt2pg_pa) { pte2_store(PMAP2, PTE2_KPT(pt2pg_pa)); tlb_flush((vm_offset_t)PADDR2); } return (PADDR2 + (arm32_btop(va) & (NPTE2_IN_PG - 1))); } return (NULL); } /* * Releases a pte2 that was obtained from pmap_pte2(). * Be prepared for the pte2p being NULL. */ static __inline void pmap_pte2_release(pt2_entry_t *pte2p) { if ((pt2_entry_t *)(trunc_page((vm_offset_t)pte2p)) == PADDR2) { mtx_unlock(&PMAP2mutex); } } /* * Super fast pmap_pte2 routine best used when scanning * the pv lists. This eliminates many coarse-grained * invltlb calls. Note that many of the pv list * scans are across different pmaps. It is very wasteful * to do an entire tlb flush for checking a single mapping. * * If the given pmap is not the current pmap, pvh_global_lock * must be held and curthread pinned to a CPU. */ static pt2_entry_t * pmap_pte2_quick(pmap_t pmap, vm_offset_t va) { pt1_entry_t pte1; vm_paddr_t pt2pg_pa; pte1 = pte1_load(pmap_pte1(pmap, va)); if (pte1_is_section(pte1)) panic("%s: attempt to map PTE1", __func__); if (pte1_is_link(pte1)) { /* Are we current address space or kernel? */ if (pmap_is_current(pmap)) return (pt2map_entry(va)); rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT(curthread->td_pinned > 0, ("%s: curthread not pinned", __func__)); /* Note that L2 page table size is not equal to PAGE_SIZE. */ pt2pg_pa = trunc_page(pte1_link_pa(pte1)); if (pte2_pa(pte2_load(PMAP1)) != pt2pg_pa) { pte2_store(PMAP1, PTE2_KPT(pt2pg_pa)); #ifdef SMP PMAP1cpu = PCPU_GET(cpuid); #endif tlb_flush_local((vm_offset_t)PADDR1); PMAP1changed++; } else #ifdef SMP if (PMAP1cpu != PCPU_GET(cpuid)) { PMAP1cpu = PCPU_GET(cpuid); tlb_flush_local((vm_offset_t)PADDR1); PMAP1changedcpu++; } else #endif PMAP1unchanged++; return (PADDR1 + (arm32_btop(va) & (NPTE2_IN_PG - 1))); } return (NULL); } /* * Routine: pmap_extract * Function: * Extract the physical page address associated * with the given map/virtual_address pair. */ vm_paddr_t pmap_extract(pmap_t pmap, vm_offset_t va) { vm_paddr_t pa; pt1_entry_t pte1; pt2_entry_t *pte2p; PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, va)); if (pte1_is_section(pte1)) pa = pte1_pa(pte1) | (va & PTE1_OFFSET); else if (pte1_is_link(pte1)) { pte2p = pmap_pte2(pmap, va); pa = pte2_pa(pte2_load(pte2p)) | (va & PTE2_OFFSET); pmap_pte2_release(pte2p); } else pa = 0; PMAP_UNLOCK(pmap); return (pa); } /* * Routine: pmap_extract_and_hold * Function: * Atomically extract and hold the physical page * with the given pmap and virtual address pair * if that mapping permits the given protection. 
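 *
 * A sketch of the expected calling pattern (not from this file); the
 * held page would be released with vm_page_unhold() once the caller
 * is done with it:
 *
 *	m = pmap_extract_and_hold(pmap, va, VM_PROT_READ);
 *	if (m != NULL) {
 *		... safely use the page ...
 *		vm_page_unhold(m);
 *	}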
 */
vm_page_t
pmap_extract_and_hold(pmap_t pmap, vm_offset_t va, vm_prot_t prot)
{
	vm_paddr_t pa, lockpa;
	pt1_entry_t pte1;
	pt2_entry_t pte2, *pte2p;
	vm_page_t m;

	lockpa = 0;
	m = NULL;
	PMAP_LOCK(pmap);
retry:
	pte1 = pte1_load(pmap_pte1(pmap, va));
	if (pte1_is_section(pte1)) {
		if (!(pte1 & PTE1_RO) || !(prot & VM_PROT_WRITE)) {
			pa = pte1_pa(pte1) | (va & PTE1_OFFSET);
			if (vm_page_pa_tryrelock(pmap, pa, &lockpa))
				goto retry;
			m = PHYS_TO_VM_PAGE(pa);
			vm_page_hold(m);
		}
	} else if (pte1_is_link(pte1)) {
		pte2p = pmap_pte2(pmap, va);
		pte2 = pte2_load(pte2p);
		pmap_pte2_release(pte2p);
		if (pte2_is_valid(pte2) &&
		    (!(pte2 & PTE2_RO) || !(prot & VM_PROT_WRITE))) {
			pa = pte2_pa(pte2);
			if (vm_page_pa_tryrelock(pmap, pa, &lockpa))
				goto retry;
			m = PHYS_TO_VM_PAGE(pa);
			vm_page_hold(m);
		}
	}
	PA_UNLOCK_COND(lockpa);
	PMAP_UNLOCK(pmap);
	return (m);
}

/*
 *  Grow the number of kernel L2 page table entries, if needed.
 */
void
pmap_growkernel(vm_offset_t addr)
{
	vm_page_t m;
	vm_paddr_t pt2pg_pa, pt2_pa;
	pt1_entry_t pte1;
	pt2_entry_t pte2;

	PDEBUG(1, printf("%s: addr = %#x\n", __func__, addr));

	/*
	 * All the time, kernel_vm_end is the first KVA for which the
	 * underlying L2 page table is either not allocated or not linked
	 * from the L1 page table (not considering sections). Except for
	 * two possible cases:
	 *
	 *   (1) in the very beginning, as long as pmap_growkernel() was
	 *       not called, it could be the first unused KVA (which is not
	 *       rounded up to PTE1_SIZE),
	 *
	 *   (2) when all KVA space is mapped and kernel_map->max_offset
	 *       address is not rounded up to PTE1_SIZE. (For example,
	 *       it could be 0xFFFFFFFF.)
	 */
	kernel_vm_end = pte1_roundup(kernel_vm_end);
	mtx_assert(&kernel_map->system_mtx, MA_OWNED);
	addr = roundup2(addr, PTE1_SIZE);
	if (addr - 1 >= kernel_map->max_offset)
		addr = kernel_map->max_offset;
	while (kernel_vm_end < addr) {
		pte1 = pte1_load(kern_pte1(kernel_vm_end));
		if (pte1_is_valid(pte1)) {
			kernel_vm_end += PTE1_SIZE;
			if (kernel_vm_end - 1 >= kernel_map->max_offset) {
				kernel_vm_end = kernel_map->max_offset;
				break;
			}
			continue;
		}

		/*
		 * kernel_vm_end_new is used in pmap_pinit() when kernel
		 * mappings are entered into a new pmap all at once to avoid
		 * a race between pmap_kenter_pte1() and the kernel_vm_end
		 * increase. The same applies to pmap_kenter_pt2tab().
		 */
		kernel_vm_end_new = kernel_vm_end + PTE1_SIZE;

		pte2 = pt2tab_load(kern_pt2tab_entry(kernel_vm_end));
		if (!pte2_is_valid(pte2)) {
			/*
			 * Install new PT2s page into kernel PT2TAB.
			 */
			m = vm_page_alloc(NULL,
			    pte1_index(kernel_vm_end) & ~PT2PG_MASK,
			    VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ |
			    VM_ALLOC_WIRED | VM_ALLOC_ZERO);
			if (m == NULL)
				panic("%s: no memory to grow kernel",
				    __func__);

			/*
			 * QQQ: Linking all new L2 page tables from the L1
			 *      page table now, i.e. pmap_kenter_pte1() them
			 *      at once together with pmap_kenter_pt2tab(),
			 *      could be a nice speedup. However,
			 *      pmap_growkernel() does not happen so often...
			 * QQQ: The other TTBR is another option.
			 */
			pt2pg_pa = pmap_pt2pg_init(kernel_pmap,
			    kernel_vm_end, m);
		} else
			pt2pg_pa = pte2_pa(pte2);

		pt2_pa = page_pt2pa(pt2pg_pa, pte1_index(kernel_vm_end));
		pmap_kenter_pte1(kernel_vm_end, PTE1_LINK(pt2_pa));

		kernel_vm_end = kernel_vm_end_new;
		if (kernel_vm_end - 1 >= kernel_map->max_offset) {
			kernel_vm_end = kernel_map->max_offset;
			break;
		}
	}
}

static int
kvm_size(SYSCTL_HANDLER_ARGS)
{
	unsigned long ksize = vm_max_kernel_address - KERNBASE;

	return (sysctl_handle_long(oidp, &ksize, 0, req));
}
SYSCTL_PROC(_vm, OID_AUTO, kvm_size, CTLTYPE_LONG|CTLFLAG_RD,
    0, 0, kvm_size, "IU", "Size of KVM");

static int
kvm_free(SYSCTL_HANDLER_ARGS)
{
	unsigned long kfree = vm_max_kernel_address - kernel_vm_end;

	return (sysctl_handle_long(oidp, &kfree, 0, req));
}
SYSCTL_PROC(_vm, OID_AUTO, kvm_free, CTLTYPE_LONG|CTLFLAG_RD,
    0, 0, kvm_free, "IU", "Amount of KVM free");

/***********************************************
 *
 *  Pmap allocation/deallocation routines.
 *
 ***********************************************/

/*
 *  Initialize the pmap for the swapper process.
 */
void
pmap_pinit0(pmap_t pmap)
{
	PDEBUG(1, printf("%s: pmap = %p\n", __func__, pmap));

	PMAP_LOCK_INIT(pmap);

	/*
	 * The kernel page table directory and the pmap stuff around it are
	 * already initialized - we are using them right now and here.
	 * So, finish only the PMAP structure initialization for process0 ...
	 *
	 * Since the L1 page table and PT2TAB are shared with the kernel pmap,
	 * which is already included in the list "allpmaps", this pmap does
	 * not need to be inserted into that list.
	 */
	pmap->pm_pt1 = kern_pt1;
	pmap->pm_pt2tab = kern_pt2tab;
	CPU_ZERO(&pmap->pm_active);
	PCPU_SET(curpmap, pmap);
	TAILQ_INIT(&pmap->pm_pvchunk);
	bzero(&pmap->pm_stats, sizeof pmap->pm_stats);
	CPU_SET(0, &pmap->pm_active);
}

static __inline void
pte1_copy_nosync(pt1_entry_t *spte1p, pt1_entry_t *dpte1p, vm_offset_t sva,
    vm_offset_t eva)
{
	u_int idx, count;

	idx = pte1_index(sva);
	count = (pte1_index(eva) - idx + 1) * sizeof(pt1_entry_t);
	bcopy(spte1p + idx, dpte1p + idx, count);
}

static __inline void
pt2tab_copy_nosync(pt2_entry_t *spte2p, pt2_entry_t *dpte2p, vm_offset_t sva,
    vm_offset_t eva)
{
	u_int idx, count;

	idx = pt2tab_index(sva);
	count = (pt2tab_index(eva) - idx + 1) * sizeof(pt2_entry_t);
	bcopy(spte2p + idx, dpte2p + idx, count);
}

/*
 *  Initialize a preallocated and zeroed pmap structure,
 *  such as one in a vmspace structure.
 */
int
pmap_pinit(pmap_t pmap)
{
	pt1_entry_t *pte1p;
	pt2_entry_t *pte2p;
	vm_paddr_t pa, pt2tab_pa;
	u_int i;

	PDEBUG(6, printf("%s: pmap = %p, pm_pt1 = %p\n", __func__, pmap,
	    pmap->pm_pt1));

	/*
	 * No need to allocate L2 page table space yet but we do need
	 * a valid L1 page table and PT2TAB table.
	 *
	 * Install shared kernel mappings to these tables. It's a little
	 * tricky as some parts of KVA are reserved for vectors, devices,
	 * and whatever else. These parts are supposed to be above
	 * vm_max_kernel_address. Thus two regions should be installed:
	 *
	 *   (1) <KERNBASE, kernel_vm_end),
	 *   (2) <vm_max_kernel_address, 0xFFFFFFFF>.
	 *
	 * QQQ: The second region should be stable enough to be installed
	 *      only once, at the time the tables are allocated.
	 * QQQ: Maybe copying both regions at once could be faster ...
	 * QQQ: Maybe the other TTBR is an option.
	 *
	 * Finally, install the pmap's own PT2TAB table into these tables.
	 */

	if (pmap->pm_pt1 == NULL) {
		pmap->pm_pt1 = (pt1_entry_t *)kmem_alloc_contig(kernel_arena,
		    NB_IN_PT1, M_NOWAIT | M_ZERO, 0, -1UL, NB_IN_PT1, 0,
		    pt_memattr);
		if (pmap->pm_pt1 == NULL)
			return (0);
	}
	if (pmap->pm_pt2tab == NULL) {
		/*
		 * QQQ: (1) The PT2TAB must be contiguous. If the PT2TAB is
		 *      one page only, which should be the only size for
		 *      32-bit systems, then we could allocate it with
		 *      vm_page_alloc() and all the stuff needed as for
		 *      other L2 page table pages.
		 *      (2) Note that a process PT2TAB is a special L2 page
		 *      table page. Its mapping in kernel_arena is permanent
		 *      and can be used no matter which process is current.
		 *      Its mapping in PT2MAP can be used only for the
		 *      current process.
		 */
		pmap->pm_pt2tab = (pt2_entry_t *)kmem_alloc_attr(kernel_arena,
		    NB_IN_PT2TAB, M_NOWAIT | M_ZERO, 0, -1UL, pt_memattr);
		if (pmap->pm_pt2tab == NULL) {
			/*
			 * QQQ: As struct pmap is allocated from UMA with
			 *      the UMA_ZONE_NOFREE flag, it's important to
			 *      leave no allocation in the pmap if
			 *      initialization fails.
			 */
			kmem_free(kernel_arena, (vm_offset_t)pmap->pm_pt1,
			    NB_IN_PT1);
			pmap->pm_pt1 = NULL;
			return (0);
		}
		/*
		 * QQQ: Each L2 page table page vm_page_t has its pindex set
		 *      to the pte1 index of the virtual address mapped by
		 *      the page. This is not valid for non-kernel PT2TAB
		 *      pages themselves. The pindex of these pages cannot
		 *      be altered because of the way they are allocated
		 *      now. However, it should not be a problem.
		 */
	}

	mtx_lock_spin(&allpmaps_lock);
	/*
	 * To avoid a race with pmap_kenter_pte1() and pmap_kenter_pt2tab(),
	 * kernel_vm_end_new is used here instead of kernel_vm_end.
	 */
	pte1_copy_nosync(kern_pt1, pmap->pm_pt1, KERNBASE,
	    kernel_vm_end_new - 1);
	pte1_copy_nosync(kern_pt1, pmap->pm_pt1, vm_max_kernel_address,
	    0xFFFFFFFF);
	pt2tab_copy_nosync(kern_pt2tab, pmap->pm_pt2tab, KERNBASE,
	    kernel_vm_end_new - 1);
	pt2tab_copy_nosync(kern_pt2tab, pmap->pm_pt2tab,
	    vm_max_kernel_address, 0xFFFFFFFF);
	LIST_INSERT_HEAD(&allpmaps, pmap, pm_list);
	mtx_unlock_spin(&allpmaps_lock);

	/*
	 * Store PT2MAP PT2 pages (a.k.a. PT2TAB) in the PT2TAB itself,
	 * i.e. a self-reference mapping. The PT2TAB is private, but it is
	 * mapped into the shared PT2MAP space, so the mapping must not be
	 * global.
	 */
	pt2tab_pa = vtophys(pmap->pm_pt2tab);
	pte2p = pmap_pt2tab_entry(pmap, (vm_offset_t)PT2MAP);
	for (pa = pt2tab_pa, i = 0; i < NPG_IN_PT2TAB; i++, pa += PTE2_SIZE) {
		pt2tab_store(pte2p++, PTE2_KPT_NG(pa));
	}

	/* Insert PT2MAP PT2s into pmap PT1. */
	pte1p = pmap_pte1(pmap, (vm_offset_t)PT2MAP);
	for (pa = pt2tab_pa, i = 0; i < NPT2_IN_PT2TAB; i++, pa += NB_IN_PT2) {
		pte1_store(pte1p++, PTE1_LINK(pa));
	}

	/*
	 * Now synchronize the new mappings which were made above.
	 */
	pte1_sync_range(pmap->pm_pt1, NB_IN_PT1);
	pte2_sync_range(pmap->pm_pt2tab, NB_IN_PT2TAB);

	CPU_ZERO(&pmap->pm_active);
	TAILQ_INIT(&pmap->pm_pvchunk);
	bzero(&pmap->pm_stats, sizeof pmap->pm_stats);

	return (1);
}

#ifdef INVARIANTS
static boolean_t
pt2tab_user_is_empty(pt2_entry_t *tab)
{
	u_int i, end;

	end = pt2tab_index(VM_MAXUSER_ADDRESS);
	for (i = 0; i < end; i++)
		if (tab[i] != 0)
			return (FALSE);
	return (TRUE);
}
#endif

/*
 *  Release any resources held by the given physical map.
 *  Called when a pmap initialized by pmap_pinit is being released.
 *  Should only be called if the map contains no valid mappings.
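 *
 *  Note that the PT1 and PT2TAB themselves are not freed here; since
 *  the pmap structure is expected to come from a UMA zone created with
 *  UMA_ZONE_NOFREE, they can be reused when the structure is handed
 *  out again (see the comment at the end of the function).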
 */
void
pmap_release(pmap_t pmap)
{
#ifdef INVARIANTS
	vm_offset_t start, end;
#endif
	KASSERT(pmap->pm_stats.resident_count == 0,
	    ("%s: pmap resident count %ld != 0", __func__,
	    pmap->pm_stats.resident_count));
	KASSERT(pt2tab_user_is_empty(pmap->pm_pt2tab),
	    ("%s: has allocated user PT2(s)", __func__));
	KASSERT(CPU_EMPTY(&pmap->pm_active),
	    ("%s: pmap %p is active on some CPU(s)", __func__, pmap));

	mtx_lock_spin(&allpmaps_lock);
	LIST_REMOVE(pmap, pm_list);
	mtx_unlock_spin(&allpmaps_lock);

#ifdef INVARIANTS
	start = pte1_index(KERNBASE) * sizeof(pt1_entry_t);
	end = (pte1_index(0xFFFFFFFF) + 1) * sizeof(pt1_entry_t);
	bzero((char *)pmap->pm_pt1 + start, end - start);

	start = pt2tab_index(KERNBASE) * sizeof(pt2_entry_t);
	end = (pt2tab_index(0xFFFFFFFF) + 1) * sizeof(pt2_entry_t);
	bzero((char *)pmap->pm_pt2tab + start, end - start);
#endif
	/*
	 * We are leaving the PT1 and PT2TAB allocated in the released pmap,
	 * so hopefully the UMA vmspace_zone will always be initialized with
	 * the UMA_ZONE_NOFREE flag.
	 */
}

/*********************************************************
 *
 *  L2 table pages and their page management routines.
 *
 *********************************************************/

/*
 *  Virtual interface for L2 page table wire counting.
 *
 *  Each L2 page table in a page has its own counter which counts the number
 *  of valid mappings in the table. The global page counter counts the
 *  mappings in all tables in the page, plus one for the page's own mapping
 *  in the PT2TAB.
 *
 *  During a promotion we leave the associated L2 page table counter
 *  untouched, so the table (strictly speaking, the page which holds it)
 *  is never freed if promoted.
 *
 *  If a page m->wire_count == 1 then no valid mappings exist in any L2 page
 *  table in the page and the page itself is only mapped in the PT2TAB.
 */
static __inline void
pt2_wirecount_init(vm_page_t m)
{
	u_int i;

	/*
	 * Note: A page m is allocated with the VM_ALLOC_WIRED flag and
	 *       m->wire_count should already be set correctly.
	 *       So, there is no need to set it again herein.
	 */
	for (i = 0; i < NPT2_IN_PG; i++)
		m->md.pt2_wirecount[i] = 0;
}

static __inline void
pt2_wirecount_inc(vm_page_t m, uint32_t pte1_idx)
{

	/*
	 * Note: A just-modified pte2 (i.e. already allocated)
	 *       acquires one extra reference which must be
	 *       explicitly cleared. It influences the KASSERTs herein.
	 *       All L2 page tables in a page always belong to the same
	 *       pmap, so we allow only one extra reference for the page.
	 */
	KASSERT(m->md.pt2_wirecount[pte1_idx & PT2PG_MASK] <
	    (NPTE2_IN_PT2 + 1),
	    ("%s: PT2 is overflowing ...", __func__));
	KASSERT(m->wire_count <= (NPTE2_IN_PG + 1),
	    ("%s: PT2PG is overflowing ...", __func__));

	m->wire_count++;
	m->md.pt2_wirecount[pte1_idx & PT2PG_MASK]++;
}

static __inline void
pt2_wirecount_dec(vm_page_t m, uint32_t pte1_idx)
{

	KASSERT(m->md.pt2_wirecount[pte1_idx & PT2PG_MASK] != 0,
	    ("%s: PT2 is underflowing ...", __func__));
	KASSERT(m->wire_count > 1,
	    ("%s: PT2PG is underflowing ...", __func__));

	m->wire_count--;
	m->md.pt2_wirecount[pte1_idx & PT2PG_MASK]--;
}

static __inline void
pt2_wirecount_set(vm_page_t m, uint32_t pte1_idx, uint16_t count)
{

	KASSERT(count <= NPTE2_IN_PT2,
	    ("%s: invalid count %u", __func__, count));
	KASSERT(m->wire_count > m->md.pt2_wirecount[pte1_idx & PT2PG_MASK],
	    ("%s: PT2PG corrupting (%u, %u) ...", __func__, m->wire_count,
	    m->md.pt2_wirecount[pte1_idx & PT2PG_MASK]));

	m->wire_count -= m->md.pt2_wirecount[pte1_idx & PT2PG_MASK];
	m->wire_count += count;
	m->md.pt2_wirecount[pte1_idx & PT2PG_MASK] = count;

	KASSERT(m->wire_count <= (NPTE2_IN_PG + 1),
	    ("%s: PT2PG is overflowed (%u) ...", __func__, m->wire_count));
}

static __inline uint32_t
pt2_wirecount_get(vm_page_t m, uint32_t pte1_idx)
{

	return (m->md.pt2_wirecount[pte1_idx & PT2PG_MASK]);
}

static __inline boolean_t
pt2_is_empty(vm_page_t m, vm_offset_t va)
{

	return (m->md.pt2_wirecount[pte1_index(va) & PT2PG_MASK] == 0);
}

static __inline boolean_t
pt2_is_full(vm_page_t m, vm_offset_t va)
{

	return (m->md.pt2_wirecount[pte1_index(va) & PT2PG_MASK] ==
	    NPTE2_IN_PT2);
}

static __inline boolean_t
pt2pg_is_empty(vm_page_t m)
{

	return (m->wire_count == 1);
}

/*
 *  This routine is called if the L2 page table
 *  is not mapped correctly.
 */
static vm_page_t
_pmap_allocpte2(pmap_t pmap, vm_offset_t va, u_int flags)
{
	uint32_t pte1_idx;
	pt1_entry_t *pte1p;
	pt2_entry_t pte2;
	vm_page_t m;
	vm_paddr_t pt2pg_pa, pt2_pa;

	pte1_idx = pte1_index(va);
	pte1p = pmap->pm_pt1 + pte1_idx;
	KASSERT(pte1_load(pte1p) == 0,
	    ("%s: pm_pt1[%#x] is not zero: %#x", __func__, pte1_idx,
	    pte1_load(pte1p)));

	pte2 = pt2tab_load(pmap_pt2tab_entry(pmap, va));
	if (!pte2_is_valid(pte2)) {
		/*
		 * Install new PT2s page into pmap PT2TAB.
		 */
		m = vm_page_alloc(NULL, pte1_idx & ~PT2PG_MASK,
		    VM_ALLOC_NOOBJ | VM_ALLOC_WIRED | VM_ALLOC_ZERO);
		if (m == NULL) {
			if ((flags & PMAP_ENTER_NOSLEEP) == 0) {
				PMAP_UNLOCK(pmap);
				rw_wunlock(&pvh_global_lock);
				VM_WAIT;
				rw_wlock(&pvh_global_lock);
				PMAP_LOCK(pmap);
			}

			/*
			 * Indicate the need to retry. While waiting,
			 * the L2 page table page may have been allocated.
			 */
			return (NULL);
		}
		pmap->pm_stats.resident_count++;
		pt2pg_pa = pmap_pt2pg_init(pmap, va, m);
	} else {
		pt2pg_pa = pte2_pa(pte2);
		m = PHYS_TO_VM_PAGE(pt2pg_pa);
	}

	pt2_wirecount_inc(m, pte1_idx);
	pt2_pa = page_pt2pa(pt2pg_pa, pte1_idx);
	pte1_store(pte1p, PTE1_LINK(pt2_pa));

	return (m);
}

static vm_page_t
pmap_allocpte2(pmap_t pmap, vm_offset_t va, u_int flags)
{
	u_int pte1_idx;
	pt1_entry_t *pte1p, pte1;
	vm_page_t m;

	pte1_idx = pte1_index(va);
retry:
	pte1p = pmap->pm_pt1 + pte1_idx;
	pte1 = pte1_load(pte1p);

	/*
	 * This supports switching from a 1MB page to a
	 * normal 4K page.
	 */
	if (pte1_is_section(pte1)) {
		(void)pmap_demote_pte1(pmap, pte1p, va);
		/*
		 * Reload pte1 after demotion.
		 *
		 * Note: Demotion can even fail, as either no PT2 is found
		 *       for the virtual address or the PT2PG cannot be
		 *       allocated.
		 */
		pte1 = pte1_load(pte1p);
	}

	/*
	 * If the L2 page table page is mapped, we just increment the
	 * hold count, and activate it.
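	 * Otherwise, _pmap_allocpte2() allocates (or looks up) the PT2PG;
	 * if that fails and sleeping is allowed, the lookup is simply
	 * retried from scratch (see the retry label above).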
*/ if (pte1_is_link(pte1)) { m = PHYS_TO_VM_PAGE(pte1_link_pa(pte1)); pt2_wirecount_inc(m, pte1_idx); } else { /* * Here if the PT2 isn't mapped, or if it has * been deallocated. */ m = _pmap_allocpte2(pmap, va, flags); if (m == NULL && (flags & PMAP_ENTER_NOSLEEP) == 0) goto retry; } return (m); } static __inline void pmap_free_zero_pages(struct spglist *free) { vm_page_t m; while ((m = SLIST_FIRST(free)) != NULL) { SLIST_REMOVE_HEAD(free, plinks.s.ss); /* Preserve the page's PG_ZERO setting. */ vm_page_free_toq(m); } } /* * Schedule the specified unused L2 page table page to be freed. Specifically, * add the page to the specified list of pages that will be released to the * physical memory manager after the TLB has been updated. */ static __inline void pmap_add_delayed_free_list(vm_page_t m, struct spglist *free) { /* * Put page on a list so that it is released after * *ALL* TLB shootdown is done */ #ifdef PMAP_DEBUG pmap_zero_page_check(m); #endif m->flags |= PG_ZERO; SLIST_INSERT_HEAD(free, m, plinks.s.ss); } /* * Unwire L2 page tables page. */ static void pmap_unwire_pt2pg(pmap_t pmap, vm_offset_t va, vm_page_t m) { pt1_entry_t *pte1p, opte1 __unused; pt2_entry_t *pte2p; uint32_t i; KASSERT(pt2pg_is_empty(m), ("%s: pmap %p PT2PG %p wired", __func__, pmap, m)); /* * Unmap all L2 page tables in the page from L1 page table. * * QQQ: Individual L2 page tables (except the last one) can be unmapped * earlier. However, we are doing that this way. */ KASSERT(m->pindex == (pte1_index(va) & ~PT2PG_MASK), ("%s: pmap %p va %#x PT2PG %p bad index", __func__, pmap, va, m)); pte1p = pmap->pm_pt1 + m->pindex; for (i = 0; i < NPT2_IN_PG; i++, pte1p++) { KASSERT(m->md.pt2_wirecount[i] == 0, ("%s: pmap %p PT2 %u (PG %p) wired", __func__, pmap, i, m)); opte1 = pte1_load(pte1p); if (pte1_is_link(opte1)) { pte1_clear(pte1p); /* * Flush intermediate TLB cache. */ pmap_tlb_flush(pmap, (m->pindex + i) << PTE1_SHIFT); } #ifdef INVARIANTS else KASSERT((opte1 == 0) || pte1_is_section(opte1), ("%s: pmap %p va %#x bad pte1 %x at %u", __func__, pmap, va, opte1, i)); #endif } /* * Unmap the page from PT2TAB. */ pte2p = pmap_pt2tab_entry(pmap, va); (void)pt2tab_load_clear(pte2p); pmap_tlb_flush(pmap, pt2map_pt2pg(va)); m->wire_count = 0; pmap->pm_stats.resident_count--; /* * This is a release store so that the ordinary store unmapping * the L2 page table page is globally performed before TLB shoot- * down is begun. */ atomic_subtract_rel_int(&vm_cnt.v_wire_count, 1); } /* * Decrements a L2 page table page's wire count, which is used to record the * number of valid page table entries within the page. If the wire count * drops to zero, then the page table page is unmapped. Returns TRUE if the * page table page was unmapped and FALSE otherwise. */ static __inline boolean_t pmap_unwire_pt2(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free) { pt2_wirecount_dec(m, pte1_index(va)); if (pt2pg_is_empty(m)) { /* * QQQ: Wire count is zero, so whole page should be zero and * we can set PG_ZERO flag to it. * Note that when promotion is enabled, it takes some * more efforts. See pmap_unwire_pt2_all() below. */ pmap_unwire_pt2pg(pmap, va, m); pmap_add_delayed_free_list(m, free); return (TRUE); } else return (FALSE); } /* * Drop a L2 page table page's wire count at once, which is used to record * the number of valid L2 page table entries within the page. If the wire * count drops to zero, then the L2 page table page is unmapped. 
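 *
 *  Unlike pmap_unwire_pt2(), which drops the count one mapping at a
 *  time, this is meant for the removal of a whole section mapping, so
 *  the PT2's share of the count is dropped in a single step.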
 */
static __inline void
pmap_unwire_pt2_all(pmap_t pmap, vm_offset_t va, vm_page_t m,
    struct spglist *free)
{
	u_int pte1_idx = pte1_index(va);

	KASSERT(m->pindex == (pte1_idx & ~PT2PG_MASK),
	    ("%s: PT2 page's pindex is wrong", __func__));
	KASSERT(m->wire_count > pt2_wirecount_get(m, pte1_idx),
	    ("%s: bad pt2 wire count %u > %u", __func__, m->wire_count,
	    pt2_wirecount_get(m, pte1_idx)));

	/*
	 * It's possible that the L2 page table was never used.
	 * This happens when a section was created without promotion.
	 */
	if (pt2_is_full(m, va)) {
		pt2_wirecount_set(m, pte1_idx, 0);

		/*
		 * QQQ: We clear the L2 page table now, so when the L2 page
		 *      table page is going to be freed, we can set the
		 *      PG_ZERO flag on it ... This function is called only
		 *      on section mappings, so hopefully it's not too big
		 *      an overhead.
		 *
		 * XXX: If the pmap is current, the existing PT2MAP mapping
		 *      could be used for zeroing.
		 */
		pmap_zero_page_area(m, page_pt2off(pte1_idx), NB_IN_PT2);
	}
#ifdef INVARIANTS
	else
		KASSERT(pt2_is_empty(m, va), ("%s: PT2 is not empty (%u)",
		    __func__, pt2_wirecount_get(m, pte1_idx)));
#endif
	if (pt2pg_is_empty(m)) {
		pmap_unwire_pt2pg(pmap, va, m);
		pmap_add_delayed_free_list(m, free);
	}
}

/*
 *  After removing an L2 page table entry, this routine is used to
 *  conditionally free the page, and manage the hold/wire counts.
 */
static boolean_t
pmap_unuse_pt2(pmap_t pmap, vm_offset_t va, struct spglist *free)
{
	pt1_entry_t pte1;
	vm_page_t mpte;

	if (va >= VM_MAXUSER_ADDRESS)
		return (FALSE);
	pte1 = pte1_load(pmap_pte1(pmap, va));
	mpte = PHYS_TO_VM_PAGE(pte1_link_pa(pte1));
	return (pmap_unwire_pt2(pmap, va, mpte, free));
}

/*************************************
 *
 *  Page management routines.
 *
 *************************************/

CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE);
CTASSERT(_NPCM == 11);
CTASSERT(_NPCPV == 336);

static __inline struct pv_chunk *
pv_to_chunk(pv_entry_t pv)
{

	return ((struct pv_chunk *)((uintptr_t)pv & ~(uintptr_t)PAGE_MASK));
}

#define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap)

#define	PC_FREE0_9	0xfffffffful	/* Free values for index 0 through 9 */
#define	PC_FREE10	0x0000fffful	/* Free values for index 10 */

static const uint32_t pc_freemask[_NPCM] = {
	PC_FREE0_9, PC_FREE0_9, PC_FREE0_9,
	PC_FREE0_9, PC_FREE0_9, PC_FREE0_9,
	PC_FREE0_9, PC_FREE0_9, PC_FREE0_9,
	PC_FREE0_9, PC_FREE10
};

SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_count, CTLFLAG_RD, &pv_entry_count, 0,
    "Current number of pv entries");

#ifdef PV_STATS
static int pc_chunk_count, pc_chunk_allocs, pc_chunk_frees, pc_chunk_tryfail;

SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_count, CTLFLAG_RD, &pc_chunk_count, 0,
    "Current number of pv entry chunks");
SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_allocs, CTLFLAG_RD, &pc_chunk_allocs,
    0, "Current number of pv entry chunks allocated");
SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_frees, CTLFLAG_RD, &pc_chunk_frees, 0,
    "Current number of pv entry chunks freed");
SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_tryfail, CTLFLAG_RD, &pc_chunk_tryfail,
    0, "Number of times we tried to get a chunk page but failed.");

static long pv_entry_frees, pv_entry_allocs;
static int pv_entry_spare;

SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_frees, CTLFLAG_RD, &pv_entry_frees,
    0, "Current number of pv entry frees");
SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_allocs, CTLFLAG_RD, &pv_entry_allocs,
    0, "Current number of pv entry allocs");
SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_spare, CTLFLAG_RD, &pv_entry_spare, 0,
    "Current number of spare pv entries");
#endif

/*
 *  Is the given page managed?
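 *  I.e. is it backed by a vm_page which is not marked VPO_UNMANAGED,
 *  so that it has a pv list and its reference/modify state is tracked?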
*/ static __inline boolean_t is_managed(vm_paddr_t pa) { vm_offset_t pgnum; vm_page_t m; pgnum = atop(pa); if (pgnum >= first_page) { m = PHYS_TO_VM_PAGE(pa); if (m == NULL) return (FALSE); if ((m->oflags & VPO_UNMANAGED) == 0) return (TRUE); } return (FALSE); } static __inline boolean_t pte1_is_managed(pt1_entry_t pte1) { return (is_managed(pte1_pa(pte1))); } static __inline boolean_t pte2_is_managed(pt2_entry_t pte2) { return (is_managed(pte2_pa(pte2))); } /* * We are in a serious low memory condition. Resort to * drastic measures to free some pages so we can allocate * another pv entry chunk. */ static vm_page_t pmap_pv_reclaim(pmap_t locked_pmap) { struct pch newtail; struct pv_chunk *pc; struct md_page *pvh; pt1_entry_t *pte1p; pmap_t pmap; pt2_entry_t *pte2p, tpte2; pv_entry_t pv; vm_offset_t va; vm_page_t m, m_pc; struct spglist free; uint32_t inuse; int bit, field, freed; PMAP_LOCK_ASSERT(locked_pmap, MA_OWNED); pmap = NULL; m_pc = NULL; SLIST_INIT(&free); TAILQ_INIT(&newtail); while ((pc = TAILQ_FIRST(&pv_chunks)) != NULL && (pv_vafree == 0 || SLIST_EMPTY(&free))) { TAILQ_REMOVE(&pv_chunks, pc, pc_lru); if (pmap != pc->pc_pmap) { if (pmap != NULL) { if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } pmap = pc->pc_pmap; /* Avoid deadlock and lock recursion. */ if (pmap > locked_pmap) PMAP_LOCK(pmap); else if (pmap != locked_pmap && !PMAP_TRYLOCK(pmap)) { pmap = NULL; TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); continue; } } /* * Destroy every non-wired, 4 KB page mapping in the chunk. */ freed = 0; for (field = 0; field < _NPCM; field++) { for (inuse = ~pc->pc_map[field] & pc_freemask[field]; inuse != 0; inuse &= ~(1UL << bit)) { bit = ffs(inuse) - 1; pv = &pc->pc_pventry[field * 32 + bit]; va = pv->pv_va; pte1p = pmap_pte1(pmap, va); if (pte1_is_section(pte1_load(pte1p))) continue; pte2p = pmap_pte2(pmap, va); tpte2 = pte2_load(pte2p); if ((tpte2 & PTE2_W) == 0) tpte2 = pte2_load_clear(pte2p); pmap_pte2_release(pte2p); if ((tpte2 & PTE2_W) != 0) continue; KASSERT(tpte2 != 0, ("pmap_pv_reclaim: pmap %p va %#x zero pte", pmap, va)); pmap_tlb_flush(pmap, va); m = PHYS_TO_VM_PAGE(pte2_pa(tpte2)); if (pte2_is_dirty(tpte2)) vm_page_dirty(m); if ((tpte2 & PTE2_A) != 0) vm_page_aflag_set(m, PGA_REFERENCED); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) { vm_page_aflag_clear(m, PGA_WRITEABLE); } } pc->pc_map[field] |= 1UL << bit; pmap_unuse_pt2(pmap, va, &free); freed++; } } if (freed == 0) { TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); continue; } /* Every freed mapping is for a 4 KB page. */ pmap->pm_stats.resident_count -= freed; PV_STAT(pv_entry_frees += freed); PV_STAT(pv_entry_spare += freed); pv_entry_count -= freed; TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); for (field = 0; field < _NPCM; field++) if (pc->pc_map[field] != pc_freemask[field]) { TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); /* * One freed pv entry in locked_pmap is * sufficient. */ if (pmap == locked_pmap) goto out; break; } if (field == _NPCM) { PV_STAT(pv_entry_spare -= _NPCPV); PV_STAT(pc_chunk_count--); PV_STAT(pc_chunk_frees++); /* Entire chunk is free; return it. 
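				 * The page backing the chunk is recycled
				 * below and handed back to the caller for
				 * reuse as a pv chunk page.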
*/ m_pc = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)pc)); pmap_qremove((vm_offset_t)pc, 1); pmap_pte2list_free(&pv_vafree, (vm_offset_t)pc); break; } } out: TAILQ_CONCAT(&pv_chunks, &newtail, pc_lru); if (pmap != NULL) { if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } if (m_pc == NULL && pv_vafree != 0 && SLIST_EMPTY(&free)) { m_pc = SLIST_FIRST(&free); SLIST_REMOVE_HEAD(&free, plinks.s.ss); /* Recycle a freed page table page. */ m_pc->wire_count = 1; atomic_add_int(&vm_cnt.v_wire_count, 1); } pmap_free_zero_pages(&free); return (m_pc); } static void free_pv_chunk(struct pv_chunk *pc) { vm_page_t m; TAILQ_REMOVE(&pv_chunks, pc, pc_lru); PV_STAT(pv_entry_spare -= _NPCPV); PV_STAT(pc_chunk_count--); PV_STAT(pc_chunk_frees++); /* entire chunk is free, return it */ m = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)pc)); pmap_qremove((vm_offset_t)pc, 1); vm_page_unwire(m, PQ_NONE); vm_page_free(m); pmap_pte2list_free(&pv_vafree, (vm_offset_t)pc); } /* * Free the pv_entry back to the free list. */ static void free_pv_entry(pmap_t pmap, pv_entry_t pv) { struct pv_chunk *pc; int idx, field, bit; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(pv_entry_frees++); PV_STAT(pv_entry_spare++); pv_entry_count--; pc = pv_to_chunk(pv); idx = pv - &pc->pc_pventry[0]; field = idx / 32; bit = idx % 32; pc->pc_map[field] |= 1ul << bit; for (idx = 0; idx < _NPCM; idx++) if (pc->pc_map[idx] != pc_freemask[idx]) { /* * 98% of the time, pc is already at the head of the * list. If it isn't already, move it to the head. */ if (__predict_false(TAILQ_FIRST(&pmap->pm_pvchunk) != pc)) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); } return; } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } /* * Get a new pv_entry, allocating a block from the system * when needed. */ static pv_entry_t get_pv_entry(pmap_t pmap, boolean_t try) { static const struct timeval printinterval = { 60, 0 }; static struct timeval lastprint; int bit, field; pv_entry_t pv; struct pv_chunk *pc; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); PV_STAT(pv_entry_allocs++); pv_entry_count++; if (pv_entry_count > pv_entry_high_water) if (ratecheck(&lastprint, &printinterval)) printf("Approaching the limit on PV entries, consider " "increasing either the vm.pmap.shpgperproc or the " "vm.pmap.pv_entry_max tunable.\n"); retry: pc = TAILQ_FIRST(&pmap->pm_pvchunk); if (pc != NULL) { for (field = 0; field < _NPCM; field++) { if (pc->pc_map[field]) { bit = ffs(pc->pc_map[field]) - 1; break; } } if (field < _NPCM) { pv = &pc->pc_pventry[field * 32 + bit]; pc->pc_map[field] &= ~(1ul << bit); /* If this was the last item, move it to tail */ for (field = 0; field < _NPCM; field++) if (pc->pc_map[field] != 0) { PV_STAT(pv_entry_spare--); return (pv); /* not full, return */ } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(pv_entry_spare--); return (pv); } } /* * Access to the pte2list "pv_vafree" is synchronized by the pvh * global lock. If "pv_vafree" is currently non-empty, it will * remain non-empty until pmap_pte2list_alloc() completes. 
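 * If the freelist is exhausted, or no page can be allocated for a new
 * chunk, pmap_pv_reclaim() is used below to tear down mappings and
 * recover both - unless the caller asked only to try.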
*/ if (pv_vafree == 0 || (m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) { if (try) { pv_entry_count--; PV_STAT(pc_chunk_tryfail++); return (NULL); } m = pmap_pv_reclaim(pmap); if (m == NULL) goto retry; } PV_STAT(pc_chunk_count++); PV_STAT(pc_chunk_allocs++); pc = (struct pv_chunk *)pmap_pte2list_alloc(&pv_vafree); pmap_qenter((vm_offset_t)pc, &m, 1); pc->pc_pmap = pmap; pc->pc_map[0] = pc_freemask[0] & ~1ul; /* preallocated bit 0 */ for (field = 1; field < _NPCM; field++) pc->pc_map[field] = pc_freemask[field]; TAILQ_INSERT_TAIL(&pv_chunks, pc, pc_lru); pv = &pc->pc_pventry[0]; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(pv_entry_spare += _NPCPV - 1); return (pv); } /* * Create a pv entry for page at pa for * (pmap, va). */ static void pmap_insert_entry(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); pv = get_pv_entry(pmap, FALSE); pv->pv_va = va; TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); } static __inline pv_entry_t pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (pmap == PV_PMAP(pv) && va == pv->pv_va) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); break; } } return (pv); } static void pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va) { pv_entry_t pv; pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pvh_free: pv not found")); free_pv_entry(pmap, pv); } static void pmap_remove_entry(pmap_t pmap, vm_page_t m, vm_offset_t va) { struct md_page *pvh; rw_assert(&pvh_global_lock, RA_WLOCKED); pmap_pvh_free(&m->md, pmap, va); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } } static void pmap_pv_demote_pte1(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT((pa & PTE1_OFFSET) == 0, ("pmap_pv_demote_pte1: pa is not 1mpage aligned")); /* * Transfer the 1mpage's pv entry for this mapping to the first * page's pv list. */ pvh = pa_to_pvh(pa); va = pte1_trunc(va); pv = pmap_pvh_remove(pvh, pmap, va); KASSERT(pv != NULL, ("pmap_pv_demote_pte1: pv not found")); m = PHYS_TO_VM_PAGE(pa); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); /* Instantiate the remaining NPTE2_IN_PT2 - 1 pv entries. */ va_last = va + PTE1_SIZE - PAGE_SIZE; do { m++; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_pv_demote_pte1: page %p is not managed", m)); va += PAGE_SIZE; pmap_insert_entry(pmap, va, m); } while (va < va_last); } static void pmap_pv_promote_pte1(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; vm_offset_t va_last; vm_page_t m; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT((pa & PTE1_OFFSET) == 0, ("pmap_pv_promote_pte1: pa is not 1mpage aligned")); /* * Transfer the first page's pv entry for this mapping to the * 1mpage's pv list. Aside from avoiding the cost of a call * to get_pv_entry(), a transfer avoids the possibility that * get_pv_entry() calls pmap_pv_reclaim() and that pmap_pv_reclaim() * removes one of the mappings that is being promoted. 
*/ m = PHYS_TO_VM_PAGE(pa); va = pte1_trunc(va); pv = pmap_pvh_remove(&m->md, pmap, va); KASSERT(pv != NULL, ("pmap_pv_promote_pte1: pv not found")); pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); /* Free the remaining NPTE2_IN_PT2 - 1 pv entries. */ va_last = va + PTE1_SIZE - PAGE_SIZE; do { m++; va += PAGE_SIZE; pmap_pvh_free(&m->md, pmap, va); } while (va < va_last); } /* * Conditionally create a pv entry. */ static boolean_t pmap_try_insert_pv_entry(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); if (pv_entry_count < pv_entry_high_water && (pv = get_pv_entry(pmap, TRUE)) != NULL) { pv->pv_va = va; TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); return (TRUE); } else return (FALSE); } /* * Create the pv entries for each of the pages within a section. */ static boolean_t pmap_pv_insert_pte1(pmap_t pmap, vm_offset_t va, vm_paddr_t pa) { struct md_page *pvh; pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); if (pv_entry_count < pv_entry_high_water && (pv = get_pv_entry(pmap, TRUE)) != NULL) { pv->pv_va = va; pvh = pa_to_pvh(pa); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); return (TRUE); } else return (FALSE); } /* * Tries to promote the NPTE2_IN_PT2, contiguous 4KB page mappings that are * within a single page table page (PT2) to a single 1MB page mapping. * For promotion to occur, two conditions must be met: (1) the 4KB page * mappings must map aligned, contiguous physical memory and (2) the 4KB page * mappings must have identical characteristics. * * Managed (PG_MANAGED) mappings within the kernel address space are not * promoted. The reason is that kernel PTE1s are replicated in each pmap but * pmap_remove_write(), pmap_clear_modify(), and pmap_clear_reference() only * read the PTE1 from the kernel pmap. */ static void pmap_promote_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va) { pt1_entry_t npte1; pt2_entry_t *fpte2p, fpte2, fpte2_fav; pt2_entry_t *pte2p, pte2; vm_offset_t pteva __unused; vm_page_t m __unused; PDEBUG(6, printf("%s(%p): try for va %#x pte1 %#x at %p\n", __func__, pmap, va, pte1_load(pte1p), pte1p)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * Examine the first PTE2 in the specified PT2. Abort if this PTE2 is * either invalid, unused, or does not map the first 4KB physical page * within a 1MB page. */ fpte2p = pmap_pte2_quick(pmap, pte1_trunc(va)); setpte1: fpte2 = pte2_load(fpte2p); if ((fpte2 & ((PTE2_FRAME & PTE1_OFFSET) | PTE2_A | PTE2_V)) != (PTE2_A | PTE2_V)) { pmap_pte1_p_failures++; CTR3(KTR_PMAP, "%s: failure(1) for va %#x in pmap %p", __func__, va, pmap); return; } if (pte2_is_managed(fpte2) && pmap == kernel_pmap) { pmap_pte1_p_failures++; CTR3(KTR_PMAP, "%s: failure(2) for va %#x in pmap %p", __func__, va, pmap); return; } if ((fpte2 & (PTE2_NM | PTE2_RO)) == PTE2_NM) { /* * When page is not modified, PTE2_RO can be set without * a TLB invalidation. * * Note: When modified bit is being set, then in hardware case, * the TLB entry is re-read (updated) from PT2, and in * software case (abort), the PTE2 is read from PT2 and * TLB flushed if changed. The following cmpset() solves * any race with setting this bit in both cases. */ if (!pte2_cmpset(fpte2p, fpte2, fpte2 | PTE2_RO)) goto setpte1; fpte2 |= PTE2_RO; } /* * Examine each of the other PTE2s in the specified PT2. Abort if this * PTE2 maps an unexpected 4KB physical page or does not have identical * characteristics to the first PTE2. 
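	 * The scan runs backwards, from the last PTE2 in the PT2 down to
	 * the second one; fpte2_fav holds the value expected ("favored")
	 * for the entry currently examined and is decremented by PTE2_SIZE
	 * on each step.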
*/ fpte2_fav = (fpte2 & (PTE2_FRAME | PTE2_A | PTE2_V)); fpte2_fav += PTE1_SIZE - PTE2_SIZE; /* examine from the end */ for (pte2p = fpte2p + NPTE2_IN_PT2 - 1; pte2p > fpte2p; pte2p--) { setpte2: pte2 = pte2_load(pte2p); if ((pte2 & (PTE2_FRAME | PTE2_A | PTE2_V)) != fpte2_fav) { pmap_pte1_p_failures++; CTR3(KTR_PMAP, "%s: failure(3) for va %#x in pmap %p", __func__, va, pmap); return; } if ((pte2 & (PTE2_NM | PTE2_RO)) == PTE2_NM) { /* * When page is not modified, PTE2_RO can be set * without a TLB invalidation. See note above. */ if (!pte2_cmpset(pte2p, pte2, pte2 | PTE2_RO)) goto setpte2; pte2 |= PTE2_RO; pteva = pte1_trunc(va) | (pte2 & PTE1_OFFSET & PTE2_FRAME); CTR3(KTR_PMAP, "%s: protect for va %#x in pmap %p", __func__, pteva, pmap); } if ((pte2 & PTE2_PROMOTE) != (fpte2 & PTE2_PROMOTE)) { pmap_pte1_p_failures++; CTR3(KTR_PMAP, "%s: failure(4) for va %#x in pmap %p", __func__, va, pmap); return; } fpte2_fav -= PTE2_SIZE; } /* * The page table page in its current state will stay in PT2TAB * until the PTE1 mapping the section is demoted by pmap_demote_pte1() * or destroyed by pmap_remove_pte1(). * * Note that L2 page table size is not equal to PAGE_SIZE. */ m = PHYS_TO_VM_PAGE(trunc_page(pte1_link_pa(pte1_load(pte1p)))); KASSERT(m >= vm_page_array && m < &vm_page_array[vm_page_array_size], ("%s: PT2 page is out of range", __func__)); KASSERT(m->pindex == (pte1_index(va) & ~PT2PG_MASK), ("%s: PT2 page's pindex is wrong", __func__)); /* * Get pte1 from pte2 format. */ npte1 = (fpte2 & PTE1_FRAME) | ATTR_TO_L1(fpte2) | PTE1_V; /* * Promote the pv entries. */ if (pte2_is_managed(fpte2)) pmap_pv_promote_pte1(pmap, va, pte1_pa(npte1)); /* * Map the section. */ if (pmap == kernel_pmap) pmap_kenter_pte1(va, npte1); else pte1_store(pte1p, npte1); /* * Flush old small mappings. We call single pmap_tlb_flush() in * pmap_demote_pte1() and pmap_remove_pte1(), so we must be sure that * no small mappings survive. We assume that given pmap is current and * don't play game with PTE2_NG. */ pmap_tlb_flush_range(pmap, pte1_trunc(va), PTE1_SIZE); pmap_pte1_promotions++; CTR3(KTR_PMAP, "%s: success for va %#x in pmap %p", __func__, va, pmap); PDEBUG(6, printf("%s(%p): success for va %#x pte1 %#x(%#x) at %p\n", __func__, pmap, va, npte1, pte1_load(pte1p), pte1p)); } /* * Zero L2 page table page. */ static __inline void pmap_clear_pt2(pt2_entry_t *fpte2p) { pt2_entry_t *pte2p; for (pte2p = fpte2p; pte2p < fpte2p + NPTE2_IN_PT2; pte2p++) pte2_clear(pte2p); } /* * Removes a 1MB page mapping from the kernel pmap. */ static void pmap_remove_kernel_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va) { vm_page_t m; uint32_t pte1_idx; pt2_entry_t *fpte2p; vm_paddr_t pt2_pa; PMAP_LOCK_ASSERT(pmap, MA_OWNED); m = pmap_pt2_page(pmap, va); if (m == NULL) /* * QQQ: Is this function called only on promoted pte1? * We certainly do section mappings directly * (without promotion) in kernel !!! */ panic("%s: missing pt2 page", __func__); pte1_idx = pte1_index(va); /* * Initialize the L2 page table. */ fpte2p = page_pt2(pt2map_pt2pg(va), pte1_idx); pmap_clear_pt2(fpte2p); /* * Remove the mapping. */ pt2_pa = page_pt2pa(VM_PAGE_TO_PHYS(m), pte1_idx); pmap_kenter_pte1(va, PTE1_LINK(pt2_pa)); /* * QQQ: We do not need to invalidate PT2MAP mapping * as we did not change it. I.e. the L2 page table page * was and still is mapped the same way. 
*/ } /* * Do the things to unmap a section in a process */ static void pmap_remove_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t sva, struct spglist *free) { pt1_entry_t opte1; struct md_page *pvh; vm_offset_t eva, va; vm_page_t m; PDEBUG(6, printf("%s(%p): va %#x pte1 %#x at %p\n", __func__, pmap, sva, pte1_load(pte1p), pte1p)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((sva & PTE1_OFFSET) == 0, ("%s: sva is not 1mpage aligned", __func__)); /* * Clear and invalidate the mapping. It should occupy one and only TLB * entry. So, pmap_tlb_flush() called with aligned address should be * sufficient. */ opte1 = pte1_load_clear(pte1p); pmap_tlb_flush(pmap, sva); if (pte1_is_wired(opte1)) pmap->pm_stats.wired_count -= PTE1_SIZE / PAGE_SIZE; pmap->pm_stats.resident_count -= PTE1_SIZE / PAGE_SIZE; if (pte1_is_managed(opte1)) { pvh = pa_to_pvh(pte1_pa(opte1)); pmap_pvh_free(pvh, pmap, sva); eva = sva + PTE1_SIZE; for (va = sva, m = PHYS_TO_VM_PAGE(pte1_pa(opte1)); va < eva; va += PAGE_SIZE, m++) { if (pte1_is_dirty(opte1)) vm_page_dirty(m); if (opte1 & PTE1_A) vm_page_aflag_set(m, PGA_REFERENCED); if (TAILQ_EMPTY(&m->md.pv_list) && TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } } if (pmap == kernel_pmap) { /* * L2 page table(s) can't be removed from kernel map as * kernel counts on it (stuff around pmap_growkernel()). */ pmap_remove_kernel_pte1(pmap, pte1p, sva); } else { /* * Get associated L2 page table page. * It's possible that the page was never allocated. */ m = pmap_pt2_page(pmap, sva); if (m != NULL) pmap_unwire_pt2_all(pmap, sva, m, free); } } /* * Fills L2 page table page with mappings to consecutive physical pages. */ static __inline void pmap_fill_pt2(pt2_entry_t *fpte2p, pt2_entry_t npte2) { pt2_entry_t *pte2p; for (pte2p = fpte2p; pte2p < fpte2p + NPTE2_IN_PT2; pte2p++) { pte2_store(pte2p, npte2); npte2 += PTE2_SIZE; } } /* * Tries to demote a 1MB page mapping. If demotion fails, the * 1MB page mapping is invalidated. */ static boolean_t pmap_demote_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t va) { pt1_entry_t opte1, npte1; pt2_entry_t *fpte2p, npte2; vm_paddr_t pt2pg_pa, pt2_pa; vm_page_t m; struct spglist free; uint32_t pte1_idx, isnew = 0; PDEBUG(6, printf("%s(%p): try for va %#x pte1 %#x at %p\n", __func__, pmap, va, pte1_load(pte1p), pte1p)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); opte1 = pte1_load(pte1p); KASSERT(pte1_is_section(opte1), ("%s: opte1 not a section", __func__)); if ((opte1 & PTE1_A) == 0 || (m = pmap_pt2_page(pmap, va)) == NULL) { KASSERT(!pte1_is_wired(opte1), ("%s: PT2 page for a wired mapping is missing", __func__)); /* * Invalidate the 1MB page mapping and return * "failure" if the mapping was never accessed or the * allocation of the new page table page fails. */ if ((opte1 & PTE1_A) == 0 || (m = vm_page_alloc(NULL, pte1_index(va) & ~PT2PG_MASK, VM_ALLOC_NOOBJ | VM_ALLOC_NORMAL | VM_ALLOC_WIRED)) == NULL) { SLIST_INIT(&free); pmap_remove_pte1(pmap, pte1p, pte1_trunc(va), &free); pmap_free_zero_pages(&free); CTR3(KTR_PMAP, "%s: failure for va %#x in pmap %p", __func__, va, pmap); return (FALSE); } if (va < VM_MAXUSER_ADDRESS) pmap->pm_stats.resident_count++; isnew = 1; /* * We init all L2 page tables in the page even if * we are going to change everything for one L2 page * table in a while. */ pt2pg_pa = pmap_pt2pg_init(pmap, va, m); } else { if (va < VM_MAXUSER_ADDRESS) { if (pt2_is_empty(m, va)) isnew = 1; /* Demoting section w/o promotion. 
*/ #ifdef INVARIANTS else KASSERT(pt2_is_full(m, va), ("%s: bad PT2 wire" " count %u", __func__, pt2_wirecount_get(m, pte1_index(va)))); #endif } } pt2pg_pa = VM_PAGE_TO_PHYS(m); pte1_idx = pte1_index(va); /* * If the pmap is current, then the PT2MAP can provide access to * the page table page (promoted L2 page tables are not unmapped). * Otherwise, temporarily map the L2 page table page (m) into * the kernel's address space at either PADDR1 or PADDR2. * * Note that L2 page table size is not equal to PAGE_SIZE. */ if (pmap_is_current(pmap)) fpte2p = page_pt2(pt2map_pt2pg(va), pte1_idx); else if (curthread->td_pinned > 0 && rw_wowned(&pvh_global_lock)) { if (pte2_pa(pte2_load(PMAP1)) != pt2pg_pa) { pte2_store(PMAP1, PTE2_KPT(pt2pg_pa)); #ifdef SMP PMAP1cpu = PCPU_GET(cpuid); #endif tlb_flush_local((vm_offset_t)PADDR1); PMAP1changed++; } else #ifdef SMP if (PMAP1cpu != PCPU_GET(cpuid)) { PMAP1cpu = PCPU_GET(cpuid); tlb_flush_local((vm_offset_t)PADDR1); PMAP1changedcpu++; } else #endif PMAP1unchanged++; fpte2p = page_pt2((vm_offset_t)PADDR1, pte1_idx); } else { mtx_lock(&PMAP2mutex); if (pte2_pa(pte2_load(PMAP2)) != pt2pg_pa) { pte2_store(PMAP2, PTE2_KPT(pt2pg_pa)); tlb_flush((vm_offset_t)PADDR2); } fpte2p = page_pt2((vm_offset_t)PADDR2, pte1_idx); } pt2_pa = page_pt2pa(pt2pg_pa, pte1_idx); npte1 = PTE1_LINK(pt2_pa); KASSERT((opte1 & PTE1_A) != 0, ("%s: opte1 is missing PTE1_A", __func__)); KASSERT((opte1 & (PTE1_NM | PTE1_RO)) != PTE1_NM, ("%s: opte1 has PTE1_NM", __func__)); /* * Get pte2 from pte1 format. */ npte2 = pte1_pa(opte1) | ATTR_TO_L2(opte1) | PTE2_V; /* * If the L2 page table page is new, initialize it. If the mapping * has changed attributes, update the page table entries. */ if (isnew != 0) { pt2_wirecount_set(m, pte1_idx, NPTE2_IN_PT2); pmap_fill_pt2(fpte2p, npte2); } else if ((pte2_load(fpte2p) & PTE2_PROMOTE) != (npte2 & PTE2_PROMOTE)) pmap_fill_pt2(fpte2p, npte2); KASSERT(pte2_pa(pte2_load(fpte2p)) == pte2_pa(npte2), ("%s: fpte2p and npte2 map different physical addresses", __func__)); if (fpte2p == PADDR2) mtx_unlock(&PMAP2mutex); /* * Demote the mapping. This pmap is locked. The old PTE1 has * PTE1_A set. If the old PTE1 has not PTE1_RO set, it also * has not PTE1_NM set. Thus, there is no danger of a race with * another processor changing the setting of PTE1_A and/or PTE1_NM * between the read above and the store below. */ if (pmap == kernel_pmap) pmap_kenter_pte1(va, npte1); else pte1_store(pte1p, npte1); /* * Flush old big mapping. The mapping should occupy one and only * TLB entry. So, pmap_tlb_flush() called with aligned address * should be sufficient. */ pmap_tlb_flush(pmap, pte1_trunc(va)); /* * Demote the pv entry. This depends on the earlier demotion * of the mapping. Specifically, the (re)creation of a per- * page pv entry might trigger the execution of pmap_pv_reclaim(), * which might reclaim a newly (re)created per-page pv entry * and destroy the associated mapping. In order to destroy * the mapping, the PTE1 must have already changed from mapping * the 1mpage to referencing the page table page. */ if (pte1_is_managed(opte1)) pmap_pv_demote_pte1(pmap, va, pte1_pa(opte1)); pmap_pte1_demotions++; CTR3(KTR_PMAP, "%s: success for va %#x in pmap %p", __func__, va, pmap); PDEBUG(6, printf("%s(%p): success for va %#x pte1 %#x(%#x) at %p\n", __func__, pmap, va, npte1, pte1_load(pte1p), pte1p)); return (TRUE); } /* * Insert the given physical page (p) at * the specified virtual address (v) in the * target physical map with the protection requested. 
* * If specified, the page will be wired down, meaning * that the related pte can not be reclaimed. * * NB: This is the only routine which MAY NOT lazy-evaluate * or lose information. That is, this routine must actually * insert this page into the given map NOW. */ int pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind) { pt1_entry_t *pte1p; pt2_entry_t *pte2p; pt2_entry_t npte2, opte2; pv_entry_t pv; vm_paddr_t opa, pa; vm_page_t mpte2, om; boolean_t wired; va = trunc_page(va); mpte2 = NULL; wired = (flags & PMAP_ENTER_WIRED) != 0; KASSERT(va <= vm_max_kernel_address, ("%s: toobig", __func__)); KASSERT(va < UPT2V_MIN_ADDRESS || va >= UPT2V_MAX_ADDRESS, ("%s: invalid to pmap_enter page table pages (va: 0x%x)", __func__, va)); if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); sched_pin(); /* * In the case that a page table page is not * resident, we are creating it here. */ if (va < VM_MAXUSER_ADDRESS) { mpte2 = pmap_allocpte2(pmap, va, flags); if (mpte2 == NULL) { KASSERT((flags & PMAP_ENTER_NOSLEEP) != 0, ("pmap_allocpte2 failed with sleep allowed")); sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (KERN_RESOURCE_SHORTAGE); } } pte1p = pmap_pte1(pmap, va); if (pte1_is_section(pte1_load(pte1p))) panic("%s: attempted on 1MB page", __func__); pte2p = pmap_pte2_quick(pmap, va); if (pte2p == NULL) panic("%s: invalid L1 page table entry va=%#x", __func__, va); om = NULL; pa = VM_PAGE_TO_PHYS(m); opte2 = pte2_load(pte2p); opa = pte2_pa(opte2); /* * Mapping has not changed, must be protection or wiring change. */ if (pte2_is_valid(opte2) && (opa == pa)) { /* * Wiring change, just update stats. We don't worry about * wiring PT2 pages as they remain resident as long as there * are valid mappings in them. Hence, if a user page is wired, * the PT2 page will be also. */ if (wired && !pte2_is_wired(opte2)) pmap->pm_stats.wired_count++; else if (!wired && pte2_is_wired(opte2)) pmap->pm_stats.wired_count--; /* * Remove extra pte2 reference */ if (mpte2) pt2_wirecount_dec(mpte2, pte1_index(va)); if (pte2_is_managed(opte2)) om = m; goto validate; } /* * QQQ: We think that changing physical address on writeable mapping * is not safe. Well, maybe on kernel address space with correct * locking, it can make a sense. However, we have no idea why * anyone should do that on user address space. Are we wrong? */ KASSERT((opa == 0) || (opa == pa) || !pte2_is_valid(opte2) || ((opte2 & PTE2_RO) != 0), ("%s: pmap %p va %#x(%#x) opa %#x pa %#x - gotcha %#x %#x!", __func__, pmap, va, opte2, opa, pa, flags, prot)); pv = NULL; /* * Mapping has changed, invalidate old range and fall through to * handle validating new mapping. */ if (opa) { if (pte2_is_wired(opte2)) pmap->pm_stats.wired_count--; if (pte2_is_managed(opte2)) { om = PHYS_TO_VM_PAGE(opa); pv = pmap_pvh_remove(&om->md, pmap, va); } /* * Remove extra pte2 reference */ if (mpte2 != NULL) pt2_wirecount_dec(mpte2, va >> PTE1_SHIFT); } else pmap->pm_stats.resident_count++; /* * Enter on the PV list if part of our managed memory. 
 */
	if ((m->oflags & VPO_UNMANAGED) == 0) {
		KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva,
		    ("%s: managed mapping within the clean submap", __func__));
		if (pv == NULL)
			pv = get_pv_entry(pmap, FALSE);
		pv->pv_va = va;
		TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next);
	} else if (pv != NULL)
		free_pv_entry(pmap, pv);

	/*
	 * Increment counters
	 */
	if (wired)
		pmap->pm_stats.wired_count++;

validate:
	/*
	 * Now validate mapping with desired protection/wiring.
	 */
	npte2 = PTE2(pa, PTE2_NM, m->md.pat_mode);
	if (prot & VM_PROT_WRITE) {
		if (pte2_is_managed(npte2))
			vm_page_aflag_set(m, PGA_WRITEABLE);
	} else
		npte2 |= PTE2_RO;
	if ((prot & VM_PROT_EXECUTE) == 0)
		npte2 |= PTE2_NX;
	if (wired)
		npte2 |= PTE2_W;
	if (va < VM_MAXUSER_ADDRESS)
		npte2 |= PTE2_U;
	if (pmap != kernel_pmap)
		npte2 |= PTE2_NG;

	/*
	 * If the mapping or permission bits are different, we need
	 * to update the pte2.
	 *
	 * QQQ: Think again and again what to do
	 *      if the mapping is going to be changed!
	 */
	if ((opte2 & ~(PTE2_NM | PTE2_A)) != (npte2 & ~(PTE2_NM | PTE2_A))) {
		/*
		 * Sync icache if exec permission and attribute PTE2_ATTR_WB_WA
		 * is set. Do it now, before the mapping is stored and made
		 * valid for hardware table walk. If done later, there is a
		 * race with other threads of the current process in the lazy
		 * loading case.
		 *
		 * QQQ: (1) Is there a better way or place to sync the icache?
		 *      (2) For now, we do it on a page basis.
		 */
		if ((prot & VM_PROT_EXECUTE) &&
		    (m->md.pat_mode == PTE2_ATTR_WB_WA) &&
		    ((opa != pa) || (opte2 & PTE2_NX)))
			cache_icache_sync_fresh(va, pa, PAGE_SIZE);

		npte2 |= PTE2_A;
		if (flags & VM_PROT_WRITE)
			npte2 &= ~PTE2_NM;
		if (opte2 & PTE2_V) {
			/* Change mapping with break-before-make approach. */
			opte2 = pte2_load_clear(pte2p);
			pmap_tlb_flush(pmap, va);
			pte2_store(pte2p, npte2);
			if (opte2 & PTE2_A) {
				if (pte2_is_managed(opte2))
					vm_page_aflag_set(om, PGA_REFERENCED);
			}
			if (pte2_is_dirty(opte2)) {
				if (pte2_is_managed(opte2))
					vm_page_dirty(om);
			}
			if (pte2_is_managed(opte2) &&
			    TAILQ_EMPTY(&om->md.pv_list) &&
			    ((om->flags & PG_FICTITIOUS) != 0 ||
			    TAILQ_EMPTY(&pa_to_pvh(opa)->pv_list)))
				vm_page_aflag_clear(om, PGA_WRITEABLE);
		} else
			pte2_store(pte2p, npte2);
	}
#if 0
	else {
		/*
		 * QQQ: Since both the access and modified bits are emulated
		 *      by software, this should not happen. Some analysis
		 *      is needed if it really does; a missing TLB flush
		 *      somewhere could be the reason.
		 */
		panic("%s: pmap %p va %#x opte2 %x npte2 %x !!", __func__,
		    pmap, va, opte2, npte2);
	}
#endif
	/*
	 * If both the L2 page table page and the reservation are fully
	 * populated, then attempt promotion.
	 */
	if ((mpte2 == NULL || pt2_is_full(mpte2, va)) && sp_enabled &&
	    (m->flags & PG_FICTITIOUS) == 0 &&
	    vm_reserv_level_iffullpop(m) == 0)
		pmap_promote_pte1(pmap, pte1p, va);

	sched_unpin();
	rw_wunlock(&pvh_global_lock);
	PMAP_UNLOCK(pmap);
	return (KERN_SUCCESS);
}

/*
 * Do the things to unmap a page in a process.
 */
static int
pmap_remove_pte2(pmap_t pmap, pt2_entry_t *pte2p, vm_offset_t va,
    struct spglist *free)
{
	pt2_entry_t opte2;
	vm_page_t m;

	rw_assert(&pvh_global_lock, RA_WLOCKED);
	PMAP_LOCK_ASSERT(pmap, MA_OWNED);

	/* Clear and invalidate the mapping.
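 * The pte2 is taken out atomically with pte2_load_clear() and the TLB
 * entry is flushed before anything else happens, so no CPU can observe
 * a half-removed mapping; the returned old value then carries the
 * referenced (PTE2_A) and emulated-modified state over to the vm_page.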
*/ opte2 = pte2_load_clear(pte2p); pmap_tlb_flush(pmap, va); KASSERT(pte2_is_valid(opte2), ("%s: pmap %p va %#x not link pte2 %#x", __func__, pmap, va, opte2)); if (opte2 & PTE2_W) pmap->pm_stats.wired_count -= 1; pmap->pm_stats.resident_count -= 1; if (pte2_is_managed(opte2)) { m = PHYS_TO_VM_PAGE(pte2_pa(opte2)); if (pte2_is_dirty(opte2)) vm_page_dirty(m); if (opte2 & PTE2_A) vm_page_aflag_set(m, PGA_REFERENCED); pmap_remove_entry(pmap, m, va); } return (pmap_unuse_pt2(pmap, va, free)); } /* * Remove a single page from a process address space. */ static void pmap_remove_page(pmap_t pmap, vm_offset_t va, struct spglist *free) { pt2_entry_t *pte2p; rw_assert(&pvh_global_lock, RA_WLOCKED); KASSERT(curthread->td_pinned > 0, ("%s: curthread not pinned", __func__)); PMAP_LOCK_ASSERT(pmap, MA_OWNED); if ((pte2p = pmap_pte2_quick(pmap, va)) == NULL || !pte2_is_valid(pte2_load(pte2p))) return; pmap_remove_pte2(pmap, pte2p, va, free); } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly * rounded to the page size. */ void pmap_remove(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t nextva; pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; struct spglist free; /* * Perform an unsynchronized read. This is, however, safe. */ if (pmap->pm_stats.resident_count == 0) return; SLIST_INIT(&free); rw_wlock(&pvh_global_lock); sched_pin(); PMAP_LOCK(pmap); /* * Special handling of removing one page. A very common * operation and easy to short circuit some code. */ if (sva + PAGE_SIZE == eva) { pte1 = pte1_load(pmap_pte1(pmap, sva)); if (pte1_is_link(pte1)) { pmap_remove_page(pmap, sva, &free); goto out; } } for (; sva < eva; sva = nextva) { /* * Calculate address for next L2 page table. */ nextva = pte1_trunc(sva + PTE1_SIZE); if (nextva < sva) nextva = eva; if (pmap->pm_stats.resident_count == 0) break; pte1p = pmap_pte1(pmap, sva); pte1 = pte1_load(pte1p); /* * Weed out invalid mappings. Note: we assume that the L1 page * table is always allocated, and in kernel virtual. */ if (pte1 == 0) continue; if (pte1_is_section(pte1)) { /* * Are we removing the entire large page? If not, * demote the mapping and fall through. */ if (sva + PTE1_SIZE == nextva && eva >= nextva) { pmap_remove_pte1(pmap, pte1p, sva, &free); continue; } else if (!pmap_demote_pte1(pmap, pte1p, sva)) { /* The large page mapping was destroyed. */ continue; } #ifdef INVARIANTS else { /* Update pte1 after demotion. */ pte1 = pte1_load(pte1p); } #endif } KASSERT(pte1_is_link(pte1), ("%s: pmap %p va %#x pte1 %#x at %p" " is not link", __func__, pmap, sva, pte1, pte1p)); /* * Limit our scan to either the end of the va represented * by the current L2 page table page, or to the end of the * range being removed. */ if (nextva > eva) nextva = eva; for (pte2p = pmap_pte2_quick(pmap, sva); sva != nextva; pte2p++, sva += PAGE_SIZE) { pte2 = pte2_load(pte2p); if (!pte2_is_valid(pte2)) continue; if (pmap_remove_pte2(pmap, pte2p, sva, &free)) break; } } out: sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * Routine: pmap_remove_all * Function: * Removes this physical page from * all physical maps in which it resides. * Reflects back modify bits to the pager. * * Notes: * Original versions of this routine were very * inefficient because they iteratively called * pmap_remove (slow...) 
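 * The version below first demotes every 1MB section mapping of the
 * page, so that afterwards only plain pte2s reached through the
 * per-page pv list remain to be torn down; the demotion pass is
 * essentially:
 *
 *	while ((pv = TAILQ_FIRST(&pvh->pv_list)) != NULL) {
 *		pmap = PV_PMAP(pv);
 *		PMAP_LOCK(pmap);
 *		(void)pmap_demote_pte1(pmap, pmap_pte1(pmap, pv->pv_va),
 *		    pv->pv_va);
 *		PMAP_UNLOCK(pmap);
 *	}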
*/ void pmap_remove_all(vm_page_t m) { struct md_page *pvh; pv_entry_t pv; pmap_t pmap; pt2_entry_t *pte2p, opte2; pt1_entry_t *pte1p; vm_offset_t va; struct spglist free; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); SLIST_INIT(&free); rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); while ((pv = TAILQ_FIRST(&pvh->pv_list)) != NULL) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, va); (void)pmap_demote_pte1(pmap, pte1p, va); PMAP_UNLOCK(pmap); } small_mappings: while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pmap->pm_stats.resident_count--; pte1p = pmap_pte1(pmap, pv->pv_va); KASSERT(!pte1_is_section(pte1_load(pte1p)), ("%s: found " "a 1mpage in page %p's pv list", __func__, m)); pte2p = pmap_pte2_quick(pmap, pv->pv_va); opte2 = pte2_load_clear(pte2p); pmap_tlb_flush(pmap, pv->pv_va); KASSERT(pte2_is_valid(opte2), ("%s: pmap %p va %x zero pte2", __func__, pmap, pv->pv_va)); if (pte2_is_wired(opte2)) pmap->pm_stats.wired_count--; if (opte2 & PTE2_A) vm_page_aflag_set(m, PGA_REFERENCED); /* * Update the vm_page_t clean and reference bits. */ if (pte2_is_dirty(opte2)) vm_page_dirty(m); pmap_unuse_pt2(pmap, pv->pv_va, &free); TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); free_pv_entry(pmap, pv); PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); sched_unpin(); rw_wunlock(&pvh_global_lock); pmap_free_zero_pages(&free); } /* * Just subroutine for pmap_remove_pages() to reasonably satisfy * good coding style, a.k.a. 80 character line width limit hell. */ static __inline void pmap_remove_pte1_quick(pmap_t pmap, pt1_entry_t pte1, pv_entry_t pv, struct spglist *free) { vm_paddr_t pa; vm_page_t m, mt, mpt2pg; struct md_page *pvh; pa = pte1_pa(pte1); m = PHYS_TO_VM_PAGE(pa); KASSERT(m->phys_addr == pa, ("%s: vm_page_t %p addr mismatch %#x %#x", __func__, m, m->phys_addr, pa)); KASSERT((m->flags & PG_FICTITIOUS) != 0 || m < &vm_page_array[vm_page_array_size], ("%s: bad pte1 %#x", __func__, pte1)); if (pte1_is_dirty(pte1)) { for (mt = m; mt < &m[PTE1_SIZE / PAGE_SIZE]; mt++) vm_page_dirty(mt); } pmap->pm_stats.resident_count -= PTE1_SIZE / PAGE_SIZE; pvh = pa_to_pvh(pa); TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); if (TAILQ_EMPTY(&pvh->pv_list)) { for (mt = m; mt < &m[PTE1_SIZE / PAGE_SIZE]; mt++) if (TAILQ_EMPTY(&mt->md.pv_list)) vm_page_aflag_clear(mt, PGA_WRITEABLE); } mpt2pg = pmap_pt2_page(pmap, pv->pv_va); if (mpt2pg != NULL) pmap_unwire_pt2_all(pmap, pv->pv_va, mpt2pg, free); } /* * Just subroutine for pmap_remove_pages() to reasonably satisfy * good coding style, a.k.a. 80 character line width limit hell. 
*/ static __inline void pmap_remove_pte2_quick(pmap_t pmap, pt2_entry_t pte2, pv_entry_t pv, struct spglist *free) { vm_paddr_t pa; vm_page_t m; struct md_page *pvh; pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pa); KASSERT(m->phys_addr == pa, ("%s: vm_page_t %p addr mismatch %#x %#x", __func__, m, m->phys_addr, pa)); KASSERT((m->flags & PG_FICTITIOUS) != 0 || m < &vm_page_array[vm_page_array_size], ("%s: bad pte2 %#x", __func__, pte2)); if (pte2_is_dirty(pte2)) vm_page_dirty(m); pmap->pm_stats.resident_count--; TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(pa); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } pmap_unuse_pt2(pmap, pv->pv_va, free); } /* * Remove all pages from specified address space this aids process * exit speeds. Also, this code is special cased for current process * only, but can have the more generic (and slightly slower) mode enabled. * This is much faster than pmap_remove in the case of running down * an entire address space. */ void pmap_remove_pages(pmap_t pmap) { pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; pv_entry_t pv; struct pv_chunk *pc, *npc; struct spglist free; int field, idx; int32_t bit; uint32_t inuse, bitmask; boolean_t allfree; /* * Assert that the given pmap is only active on the current * CPU. Unfortunately, we cannot block another CPU from * activating the pmap while this function is executing. */ KASSERT(pmap == vmspace_pmap(curthread->td_proc->p_vmspace), ("%s: non-current pmap %p", __func__, pmap)); #if defined(SMP) && defined(INVARIANTS) { cpuset_t other_cpus; sched_pin(); other_cpus = pmap->pm_active; CPU_CLR(PCPU_GET(cpuid), &other_cpus); sched_unpin(); KASSERT(CPU_EMPTY(&other_cpus), ("%s: pmap %p active on other cpus", __func__, pmap)); } #endif SLIST_INIT(&free); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); sched_pin(); TAILQ_FOREACH_SAFE(pc, &pmap->pm_pvchunk, pc_list, npc) { KASSERT(pc->pc_pmap == pmap, ("%s: wrong pmap %p %p", __func__, pmap, pc->pc_pmap)); allfree = TRUE; for (field = 0; field < _NPCM; field++) { inuse = (~(pc->pc_map[field])) & pc_freemask[field]; while (inuse != 0) { bit = ffs(inuse) - 1; bitmask = 1UL << bit; idx = field * 32 + bit; pv = &pc->pc_pventry[idx]; inuse &= ~bitmask; /* * Note that we cannot remove wired pages * from a process' mapping at this time */ pte1p = pmap_pte1(pmap, pv->pv_va); pte1 = pte1_load(pte1p); if (pte1_is_section(pte1)) { if (pte1_is_wired(pte1)) { allfree = FALSE; continue; } pte1_clear(pte1p); pmap_remove_pte1_quick(pmap, pte1, pv, &free); } else if (pte1_is_link(pte1)) { pte2p = pt2map_entry(pv->pv_va); pte2 = pte2_load(pte2p); if (!pte2_is_valid(pte2)) { printf("%s: pmap %p va %#x " "pte2 %#x\n", __func__, pmap, pv->pv_va, pte2); panic("bad pte2"); } if (pte2_is_wired(pte2)) { allfree = FALSE; continue; } pte2_clear(pte2p); pmap_remove_pte2_quick(pmap, pte2, pv, &free); } else { printf("%s: pmap %p va %#x pte1 %#x\n", __func__, pmap, pv->pv_va, pte1); panic("bad pte1"); } /* Mark free */ PV_STAT(pv_entry_frees++); PV_STAT(pv_entry_spare++); pv_entry_count--; pc->pc_map[field] |= bitmask; } } if (allfree) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); free_pv_chunk(pc); } } tlb_flush_all_ng_local(); sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); pmap_free_zero_pages(&free); } /* * This code makes some *MAJOR* assumptions: * 1. Current pmap & pmap exists. * 2. Not wired. * 3. Read access. * 4. No L2 page table pages. * but is *MUCH* faster than pmap_enter... 
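 * In other words, this is the cheap prefault path: it only installs
 * read-only, unwired 4KB mappings and returns NULL instead of sleeping
 * or replacing an existing pte2. The expected call pattern is the one
 * pmap_enter_quick() uses:
 *
 *	rw_wlock(&pvh_global_lock);
 *	PMAP_LOCK(pmap);
 *	(void)pmap_enter_quick_locked(pmap, va, m, prot, NULL);
 *	rw_wunlock(&pvh_global_lock);
 *	PMAP_UNLOCK(pmap);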
*/ static vm_page_t pmap_enter_quick_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, vm_page_t mpt2pg) { pt2_entry_t *pte2p, pte2; vm_paddr_t pa; struct spglist free; uint32_t l2prot; KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva || (m->oflags & VPO_UNMANAGED) != 0, ("%s: managed mapping within the clean submap", __func__)); rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * In the case that a L2 page table page is not * resident, we are creating it here. */ if (va < VM_MAXUSER_ADDRESS) { u_int pte1_idx; pt1_entry_t pte1, *pte1p; vm_paddr_t pt2_pa; /* * Get L1 page table things. */ pte1_idx = pte1_index(va); pte1p = pmap_pte1(pmap, va); pte1 = pte1_load(pte1p); if (mpt2pg && (mpt2pg->pindex == (pte1_idx & ~PT2PG_MASK))) { /* * Each of NPT2_IN_PG L2 page tables on the page can * come here. Make sure that associated L1 page table * link is established. * * QQQ: It comes that we don't establish all links to * L2 page tables for newly allocated L2 page * tables page. */ KASSERT(!pte1_is_section(pte1), ("%s: pte1 %#x is section", __func__, pte1)); if (!pte1_is_link(pte1)) { pt2_pa = page_pt2pa(VM_PAGE_TO_PHYS(mpt2pg), pte1_idx); pte1_store(pte1p, PTE1_LINK(pt2_pa)); } pt2_wirecount_inc(mpt2pg, pte1_idx); } else { /* * If the L2 page table page is mapped, we just * increment the hold count, and activate it. */ if (pte1_is_section(pte1)) { return (NULL); } else if (pte1_is_link(pte1)) { mpt2pg = PHYS_TO_VM_PAGE(pte1_link_pa(pte1)); pt2_wirecount_inc(mpt2pg, pte1_idx); } else { mpt2pg = _pmap_allocpte2(pmap, va, PMAP_ENTER_NOSLEEP); if (mpt2pg == NULL) return (NULL); } } } else { mpt2pg = NULL; } /* * This call to pt2map_entry() makes the assumption that we are * entering the page into the current pmap. In order to support * quick entry into any pmap, one would likely use pmap_pte2_quick(). * But that isn't as quick as pt2map_entry(). */ pte2p = pt2map_entry(va); pte2 = pte2_load(pte2p); if (pte2_is_valid(pte2)) { if (mpt2pg != NULL) { /* * Remove extra pte2 reference */ pt2_wirecount_dec(mpt2pg, pte1_index(va)); mpt2pg = NULL; } return (NULL); } /* * Enter on the PV list if part of our managed memory. */ if ((m->oflags & VPO_UNMANAGED) == 0 && !pmap_try_insert_pv_entry(pmap, va, m)) { if (mpt2pg != NULL) { SLIST_INIT(&free); if (pmap_unwire_pt2(pmap, va, mpt2pg, &free)) { pmap_tlb_flush(pmap, va); pmap_free_zero_pages(&free); } mpt2pg = NULL; } return (NULL); } /* * Increment counters */ pmap->pm_stats.resident_count++; /* * Now validate mapping with RO protection */ pa = VM_PAGE_TO_PHYS(m); l2prot = PTE2_RO | PTE2_NM; if (va < VM_MAXUSER_ADDRESS) l2prot |= PTE2_U | PTE2_NG; if ((prot & VM_PROT_EXECUTE) == 0) l2prot |= PTE2_NX; else if (m->md.pat_mode == PTE2_ATTR_WB_WA) { /* * Sync icache if exec permission and attribute PTE2_ATTR_WB_WA * is set. QQQ: For more info, see comments in pmap_enter(). */ cache_icache_sync_fresh(va, pa, PAGE_SIZE); } pte2_store(pte2p, PTE2(pa, l2prot, m->md.pat_mode)); return (mpt2pg); } void pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); (void)pmap_enter_quick_locked(pmap, va, m, prot, NULL); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Tries to create 1MB page mapping. Returns TRUE if successful and * FALSE otherwise. Fails if (1) a page table page cannot be allocated without * blocking, (2) a mapping already exists at the specified virtual address, or * (3) a pv entry cannot be allocated without reclaiming another pv entry. 
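 * The alignment preconditions are the caller's job; pmap_enter_object()
 * only takes this path when a whole section fits, roughly:
 *
 *	if ((va & PTE1_OFFSET) == 0 && va + PTE1_SIZE <= end &&
 *	    m->psind == 1 && sp_enabled)
 *		(void)pmap_enter_pte1(pmap, va, m, prot);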
*/ static boolean_t pmap_enter_pte1(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { pt1_entry_t *pte1p; vm_paddr_t pa; uint32_t l1prot; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_LOCK_ASSERT(pmap, MA_OWNED); pte1p = pmap_pte1(pmap, va); if (pte1_is_valid(pte1_load(pte1p))) { CTR3(KTR_PMAP, "%s: failure for va %#lx in pmap %p", __func__, va, pmap); return (FALSE); } if ((m->oflags & VPO_UNMANAGED) == 0) { /* * Abort this mapping if its PV entry could not be created. */ if (!pmap_pv_insert_pte1(pmap, va, VM_PAGE_TO_PHYS(m))) { CTR3(KTR_PMAP, "%s: failure for va %#lx in pmap %p", __func__, va, pmap); return (FALSE); } } /* * Increment counters. */ pmap->pm_stats.resident_count += PTE1_SIZE / PAGE_SIZE; /* * Map the section. * * QQQ: Why VM_PROT_WRITE is not evaluated and the mapping is * made readonly? */ pa = VM_PAGE_TO_PHYS(m); l1prot = PTE1_RO | PTE1_NM; if (va < VM_MAXUSER_ADDRESS) l1prot |= PTE1_U | PTE1_NG; if ((prot & VM_PROT_EXECUTE) == 0) l1prot |= PTE1_NX; else if (m->md.pat_mode == PTE2_ATTR_WB_WA) { /* * Sync icache if exec permission and attribute PTE2_ATTR_WB_WA * is set. QQQ: For more info, see comments in pmap_enter(). */ cache_icache_sync_fresh(va, pa, PTE1_SIZE); } pte1_store(pte1p, PTE1(pa, l1prot, ATTR_TO_L1(m->md.pat_mode))); pmap_pte1_mappings++; CTR3(KTR_PMAP, "%s: success for va %#lx in pmap %p", __func__, va, pmap); return (TRUE); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ void pmap_enter_object(pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { vm_offset_t va; vm_page_t m, mpt2pg; vm_pindex_t diff, psize; PDEBUG(6, printf("%s: pmap %p start %#x end %#x m %p prot %#x\n", __func__, pmap, start, end, m_start, prot)); VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); mpt2pg = NULL; m = m_start; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { va = start + ptoa(diff); if ((va & PTE1_OFFSET) == 0 && va + PTE1_SIZE <= end && m->psind == 1 && sp_enabled && pmap_enter_pte1(pmap, va, m, prot)) m = &m[PTE1_SIZE / PAGE_SIZE - 1]; else mpt2pg = pmap_enter_quick_locked(pmap, va, m, prot, mpt2pg); m = TAILQ_NEXT(m, listq); } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * This code maps large physical mmap regions into the * processor address space. Note that some shortcuts * are taken, but the code works. 
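 * Only OBJT_DEVICE and OBJT_SG objects are accepted, and 1MB sections
 * are used only when everything lines up: the virtual address and size
 * are section aligned, the backing pages are physically contiguous and
 * start on a 1MB physical boundary, and all pages share one memory
 * attribute. The gatekeeping check, written as an early return, is
 * equivalent to:
 *
 *	if ((addr & PTE1_OFFSET) != 0 || (size & PTE1_OFFSET) != 0)
 *		return;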
*/ void pmap_object_init_pt(pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { pt1_entry_t *pte1p; vm_paddr_t pa, pte2_pa; vm_page_t p; int pat_mode; u_int l1attr, l1prot; VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("%s: non-device object", __func__)); if ((addr & PTE1_OFFSET) == 0 && (size & PTE1_OFFSET) == 0) { if (!vm_object_populate(object, pindex, pindex + atop(size))) return; p = vm_page_lookup(object, pindex); KASSERT(p->valid == VM_PAGE_BITS_ALL, ("%s: invalid page %p", __func__, p)); pat_mode = p->md.pat_mode; /* * Abort the mapping if the first page is not physically * aligned to a 1MB page boundary. */ pte2_pa = VM_PAGE_TO_PHYS(p); if (pte2_pa & PTE1_OFFSET) return; /* * Skip the first page. Abort the mapping if the rest of * the pages are not physically contiguous or have differing * memory attributes. */ p = TAILQ_NEXT(p, listq); for (pa = pte2_pa + PAGE_SIZE; pa < pte2_pa + size; pa += PAGE_SIZE) { KASSERT(p->valid == VM_PAGE_BITS_ALL, ("%s: invalid page %p", __func__, p)); if (pa != VM_PAGE_TO_PHYS(p) || pat_mode != p->md.pat_mode) return; p = TAILQ_NEXT(p, listq); } /* * Map using 1MB pages. * * QQQ: Well, we are mapping a section, so same condition must * be hold like during promotion. It looks that only RW mapping * is done here, so readonly mapping must be done elsewhere. */ l1prot = PTE1_U | PTE1_NG | PTE1_RW | PTE1_M | PTE1_A; l1attr = ATTR_TO_L1(pat_mode); PMAP_LOCK(pmap); for (pa = pte2_pa; pa < pte2_pa + size; pa += PTE1_SIZE) { pte1p = pmap_pte1(pmap, addr); if (!pte1_is_valid(pte1_load(pte1p))) { pte1_store(pte1p, PTE1(pa, l1prot, l1attr)); pmap->pm_stats.resident_count += PTE1_SIZE / PAGE_SIZE; pmap_pte1_mappings++; } /* Else continue on if the PTE1 is already valid. */ addr += PTE1_SIZE; } PMAP_UNLOCK(pmap); } } /* * Do the things to protect a 1mpage in a process. */ static void pmap_protect_pte1(pmap_t pmap, pt1_entry_t *pte1p, vm_offset_t sva, vm_prot_t prot) { pt1_entry_t npte1, opte1; vm_offset_t eva, va; vm_page_t m; PMAP_LOCK_ASSERT(pmap, MA_OWNED); KASSERT((sva & PTE1_OFFSET) == 0, ("%s: sva is not 1mpage aligned", __func__)); retry: opte1 = npte1 = pte1_load(pte1p); if (pte1_is_managed(opte1)) { eva = sva + PTE1_SIZE; for (va = sva, m = PHYS_TO_VM_PAGE(pte1_pa(opte1)); va < eva; va += PAGE_SIZE, m++) if (pte1_is_dirty(opte1)) vm_page_dirty(m); } if ((prot & VM_PROT_WRITE) == 0) npte1 |= PTE1_RO | PTE1_NM; if ((prot & VM_PROT_EXECUTE) == 0) npte1 |= PTE1_NX; /* * QQQ: Herein, execute permission is never set. * It only can be cleared. So, no icache * syncing is needed. */ if (npte1 != opte1) { if (!pte1_cmpset(pte1p, opte1, npte1)) goto retry; pmap_tlb_flush(pmap, sva); } } /* * Set the physical protection on the * specified range of this map as requested. */ void pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { boolean_t pv_lists_locked; vm_offset_t nextva; pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, opte2, npte2; KASSERT((prot & ~VM_PROT_ALL) == 0, ("invalid prot %x", prot)); if (prot == VM_PROT_NONE) { pmap_remove(pmap, sva, eva); return; } if ((prot & (VM_PROT_WRITE | VM_PROT_EXECUTE)) == (VM_PROT_WRITE | VM_PROT_EXECUTE)) return; if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } PMAP_LOCK(pmap); for (; sva < eva; sva = nextva) { /* * Calculate address for next L2 page table. 
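 * The check below also guards against wrap at the very top of the
 * address space: for example, with sva = 0xfff00000 the 32-bit sum
 * sva + PTE1_SIZE overflows to 0, pte1_trunc() keeps it 0, and
 * nextva < sva then clamps the scan to eva instead of looping forever.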
*/ nextva = pte1_trunc(sva + PTE1_SIZE); if (nextva < sva) nextva = eva; pte1p = pmap_pte1(pmap, sva); pte1 = pte1_load(pte1p); /* * Weed out invalid mappings. Note: we assume that L1 page * page table is always allocated, and in kernel virtual. */ if (pte1 == 0) continue; if (pte1_is_section(pte1)) { /* * Are we protecting the entire large page? If not, * demote the mapping and fall through. */ if (sva + PTE1_SIZE == nextva && eva >= nextva) { pmap_protect_pte1(pmap, pte1p, sva, prot); continue; } else { if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { PMAP_UNLOCK(pmap); goto resume; } sched_pin(); } if (!pmap_demote_pte1(pmap, pte1p, sva)) { /* * The large page mapping * was destroyed. */ continue; } #ifdef INVARIANTS else { /* Update pte1 after demotion */ pte1 = pte1_load(pte1p); } #endif } } KASSERT(pte1_is_link(pte1), ("%s: pmap %p va %#x pte1 %#x at %p" " is not link", __func__, pmap, sva, pte1, pte1p)); /* * Limit our scan to either the end of the va represented * by the current L2 page table page, or to the end of the * range being protected. */ if (nextva > eva) nextva = eva; for (pte2p = pmap_pte2_quick(pmap, sva); sva != nextva; pte2p++, sva += PAGE_SIZE) { vm_page_t m; retry: opte2 = npte2 = pte2_load(pte2p); if (!pte2_is_valid(opte2)) continue; if ((prot & VM_PROT_WRITE) == 0) { if (pte2_is_managed(opte2) && pte2_is_dirty(opte2)) { m = PHYS_TO_VM_PAGE(pte2_pa(opte2)); vm_page_dirty(m); } npte2 |= PTE2_RO | PTE2_NM; } if ((prot & VM_PROT_EXECUTE) == 0) npte2 |= PTE2_NX; /* * QQQ: Herein, execute permission is never set. * It only can be cleared. So, no icache * syncing is needed. */ if (npte2 != opte2) { if (!pte2_cmpset(pte2p, opte2, npte2)) goto retry; pmap_tlb_flush(pmap, sva); } } } if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * pmap_pvh_wired_mappings: * * Return the updated number "count" of managed mappings that are wired. */ static int pmap_pvh_wired_mappings(struct md_page *pvh, int count) { pmap_t pmap; pt1_entry_t pte1; pt2_entry_t pte2; pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, pv->pv_va)); if (pte1_is_section(pte1)) { if (pte1_is_wired(pte1)) count++; } else { KASSERT(pte1_is_link(pte1), ("%s: pte1 %#x is not link", __func__, pte1)); pte2 = pte2_load(pmap_pte2_quick(pmap, pv->pv_va)); if (pte2_is_wired(pte2)) count++; } PMAP_UNLOCK(pmap); } sched_unpin(); return (count); } /* * pmap_page_wired_mappings: * * Return the number of managed mappings to the given physical page * that are wired. */ int pmap_page_wired_mappings(vm_page_t m) { int count; count = 0; if ((m->oflags & VPO_UNMANAGED) != 0) return (count); rw_wlock(&pvh_global_lock); count = pmap_pvh_wired_mappings(&m->md, count); if ((m->flags & PG_FICTITIOUS) == 0) { count = pmap_pvh_wired_mappings(pa_to_pvh(VM_PAGE_TO_PHYS(m)), count); } rw_wunlock(&pvh_global_lock); return (count); } /* * Returns TRUE if any of the given mappings were used to modify * physical memory. Otherwise, returns FALSE. Both page and 1mpage * mappings are supported. 
*/ static boolean_t pmap_is_modified_pvh(struct md_page *pvh) { pv_entry_t pv; pt1_entry_t pte1; pt2_entry_t pte2; pmap_t pmap; boolean_t rv; rw_assert(&pvh_global_lock, RA_WLOCKED); rv = FALSE; sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, pv->pv_va)); if (pte1_is_section(pte1)) { rv = pte1_is_dirty(pte1); } else { KASSERT(pte1_is_link(pte1), ("%s: pte1 %#x is not link", __func__, pte1)); pte2 = pte2_load(pmap_pte2_quick(pmap, pv->pv_va)); rv = pte2_is_dirty(pte2); } PMAP_UNLOCK(pmap); if (rv) break; } sched_unpin(); return (rv); } /* * pmap_is_modified: * * Return whether or not the specified physical page was modified * in any physical maps. */ boolean_t pmap_is_modified(vm_page_t m) { boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * concurrently set while the object is locked. Thus, if PGA_WRITEABLE * is clear, no PTE2s can have PG_M set. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (FALSE); rw_wlock(&pvh_global_lock); rv = pmap_is_modified_pvh(&m->md) || ((m->flags & PG_FICTITIOUS) == 0 && pmap_is_modified_pvh(pa_to_pvh(VM_PAGE_TO_PHYS(m)))); rw_wunlock(&pvh_global_lock); return (rv); } /* * pmap_is_prefaultable: * * Return whether or not the specified virtual address is eligible * for prefault. */ boolean_t pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr) { pt1_entry_t pte1; pt2_entry_t pte2; boolean_t rv; rv = FALSE; PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, addr)); if (pte1_is_link(pte1)) { pte2 = pte2_load(pt2map_entry(addr)); rv = !pte2_is_valid(pte2) ; } PMAP_UNLOCK(pmap); return (rv); } /* * Returns TRUE if any of the given mappings were referenced and FALSE * otherwise. Both page and 1mpage mappings are supported. */ static boolean_t pmap_is_referenced_pvh(struct md_page *pvh) { pv_entry_t pv; pt1_entry_t pte1; pt2_entry_t pte2; pmap_t pmap; boolean_t rv; rw_assert(&pvh_global_lock, RA_WLOCKED); rv = FALSE; sched_pin(); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1 = pte1_load(pmap_pte1(pmap, pv->pv_va)); if (pte1_is_section(pte1)) { rv = (pte1 & (PTE1_A | PTE1_V)) == (PTE1_A | PTE1_V); } else { pte2 = pte2_load(pmap_pte2_quick(pmap, pv->pv_va)); rv = (pte2 & (PTE2_A | PTE2_V)) == (PTE2_A | PTE2_V); } PMAP_UNLOCK(pmap); if (rv) break; } sched_unpin(); return (rv); } /* * pmap_is_referenced: * * Return whether or not the specified physical page was referenced * in any physical maps. */ boolean_t pmap_is_referenced(vm_page_t m) { boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); rw_wlock(&pvh_global_lock); rv = pmap_is_referenced_pvh(&m->md) || ((m->flags & PG_FICTITIOUS) == 0 && pmap_is_referenced_pvh(pa_to_pvh(VM_PAGE_TO_PHYS(m)))); rw_wunlock(&pvh_global_lock); return (rv); } #define PMAP_TS_REFERENCED_MAX 5 /* * pmap_ts_referenced: * * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. * * XXX: The exact number of bits to check and clear is a matter that * should be tested and standardized at some point in the future for * optimal aging of shared pages. 
*/ int pmap_ts_referenced(vm_page_t m) { struct md_page *pvh; pv_entry_t pv, pvf; pmap_t pmap; pt1_entry_t *pte1p, opte1; pt2_entry_t *pte2p; vm_paddr_t pa; int rtval = 0; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); pa = VM_PAGE_TO_PHYS(m); pvh = pa_to_pvh(pa); rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0 || (pvf = TAILQ_FIRST(&pvh->pv_list)) == NULL) goto small_mappings; pv = pvf; do { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, pv->pv_va); opte1 = pte1_load(pte1p); if ((opte1 & PTE1_A) != 0) { /* * Since this reference bit is shared by 256 4KB pages, * it should not be cleared every time it is tested. * Apply a simple "hash" function on the physical page * number, the virtual section number, and the pmap * address to select one 4KB page out of the 256 * on which testing the reference bit will result * in clearing that bit. This function is designed * to avoid the selection of the same 4KB page * for every 1MB page mapping. * * On demotion, a mapping that hasn't been referenced * is simply destroyed. To avoid the possibility of a * subsequent page fault on a demoted wired mapping, * always leave its reference bit set. Moreover, * since the section is wired, the current state of * its reference bit won't affect page replacement. */ if ((((pa >> PAGE_SHIFT) ^ (pv->pv_va >> PTE1_SHIFT) ^ (uintptr_t)pmap) & (NPTE2_IN_PG - 1)) == 0 && !pte1_is_wired(opte1)) { pte1_clear_bit(pte1p, PTE1_A); pmap_tlb_flush(pmap, pv->pv_va); } rtval++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&pvh->pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_next); } if (rtval >= PMAP_TS_REFERENCED_MAX) goto out; } while ((pv = TAILQ_FIRST(&pvh->pv_list)) != pvf); small_mappings: if ((pvf = TAILQ_FIRST(&m->md.pv_list)) == NULL) goto out; pv = pvf; do { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, pv->pv_va); KASSERT(pte1_is_link(pte1_load(pte1p)), ("%s: not found a link in page %p's pv list", __func__, m)); pte2p = pmap_pte2_quick(pmap, pv->pv_va); if ((pte2_load(pte2p) & PTE2_A) != 0) { pte2_clear_bit(pte2p, PTE2_A); pmap_tlb_flush(pmap, pv->pv_va); rtval++; } PMAP_UNLOCK(pmap); /* Rotate the PV list if it has more than one entry. */ if (TAILQ_NEXT(pv, pv_next) != NULL) { TAILQ_REMOVE(&m->md.pv_list, pv, pv_next); TAILQ_INSERT_TAIL(&m->md.pv_list, pv, pv_next); } } while ((pv = TAILQ_FIRST(&m->md.pv_list)) != pvf && rtval < PMAP_TS_REFERENCED_MAX); out: sched_unpin(); rw_wunlock(&pvh_global_lock); return (rtval); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range * must have the wired attribute set. In contrast, invalid mappings * cannot have the wired attribute set, so they are ignored. * * The wired attribute of the page table entry is not a hardware feature, * so there is no need to invalidate any TLB entries. */ void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t nextva; pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; boolean_t pv_lists_locked; if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } PMAP_LOCK(pmap); for (; sva < eva; sva = nextva) { nextva = pte1_trunc(sva + PTE1_SIZE); if (nextva < sva) nextva = eva; pte1p = pmap_pte1(pmap, sva); pte1 = pte1_load(pte1p); /* * Weed out invalid mappings. 
Note: we assume that L1 page * page table is always allocated, and in kernel virtual. */ if (pte1 == 0) continue; if (pte1_is_section(pte1)) { if (!pte1_is_wired(pte1)) panic("%s: pte1 %#x not wired", __func__, pte1); /* * Are we unwiring the entire large page? If not, * demote the mapping and fall through. */ if (sva + PTE1_SIZE == nextva && eva >= nextva) { pte1_clear_bit(pte1p, PTE1_W); pmap->pm_stats.wired_count -= PTE1_SIZE / PAGE_SIZE; continue; } else { if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { PMAP_UNLOCK(pmap); /* Repeat sva. */ goto resume; } sched_pin(); } if (!pmap_demote_pte1(pmap, pte1p, sva)) panic("%s: demotion failed", __func__); #ifdef INVARIANTS else { /* Update pte1 after demotion */ pte1 = pte1_load(pte1p); } #endif } } KASSERT(pte1_is_link(pte1), ("%s: pmap %p va %#x pte1 %#x at %p" " is not link", __func__, pmap, sva, pte1, pte1p)); /* * Limit our scan to either the end of the va represented * by the current L2 page table page, or to the end of the * range being protected. */ if (nextva > eva) nextva = eva; for (pte2p = pmap_pte2_quick(pmap, sva); sva != nextva; pte2p++, sva += PAGE_SIZE) { pte2 = pte2_load(pte2p); if (!pte2_is_valid(pte2)) continue; if (!pte2_is_wired(pte2)) panic("%s: pte2 %#x is missing PTE2_W", __func__, pte2); /* * PTE2_W must be cleared atomically. Although the pmap * lock synchronizes access to PTE2_W, another processor * could be changing PTE2_NM and/or PTE2_A concurrently. */ pte2_clear_bit(pte2p, PTE2_W); pmap->pm_stats.wired_count--; } } if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * Clear the write and modified bits in each of the given page's mappings. */ void pmap_remove_write(vm_page_t m) { struct md_page *pvh; pv_entry_t next_pv, pv; pmap_t pmap; pt1_entry_t *pte1p; pt2_entry_t *pte2p, opte2; vm_offset_t va; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, va); if (!(pte1_load(pte1p) & PTE1_RO)) (void)pmap_demote_pte1(pmap, pte1p, va); PMAP_UNLOCK(pmap); } small_mappings: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, pv->pv_va); KASSERT(!pte1_is_section(pte1_load(pte1p)), ("%s: found" " a section in page %p's pv list", __func__, m)); pte2p = pmap_pte2_quick(pmap, pv->pv_va); retry: opte2 = pte2_load(pte2p); if (!(opte2 & PTE2_RO)) { if (!pte2_cmpset(pte2p, opte2, opte2 | (PTE2_RO | PTE2_NM))) goto retry; if (pte2_is_dirty(opte2)) vm_page_dirty(m); pmap_tlb_flush(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); sched_unpin(); rw_wunlock(&pvh_global_lock); } /* * Apply the given advice to the specified range of addresses within the * given pmap. Depending on the advice, clear the referenced and/or * modified flags in each mapping and set the mapped page's dirty field. 
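 * This is the pmap side of madvise(2): for MADV_DONTNEED a dirty
 * mapping's modified state is pushed out to the vm_page first
 * (vm_page_dirty()), while MADV_FREE clears the bits without dirtying
 * the page, so its contents may be dropped. A user-level trigger would
 * be, for example:
 *
 *	madvise(addr, len, MADV_FREE);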
*/ void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice) { pt1_entry_t *pte1p, opte1; pt2_entry_t *pte2p, pte2; vm_offset_t pdnxt; vm_page_t m; boolean_t pv_lists_locked; if (advice != MADV_DONTNEED && advice != MADV_FREE) return; if (pmap_is_current(pmap)) pv_lists_locked = FALSE; else { pv_lists_locked = TRUE; resume: rw_wlock(&pvh_global_lock); sched_pin(); } PMAP_LOCK(pmap); for (; sva < eva; sva = pdnxt) { pdnxt = pte1_trunc(sva + PTE1_SIZE); if (pdnxt < sva) pdnxt = eva; pte1p = pmap_pte1(pmap, sva); opte1 = pte1_load(pte1p); if (!pte1_is_valid(opte1)) /* XXX */ continue; else if (pte1_is_section(opte1)) { if (!pte1_is_managed(opte1)) continue; if (!pv_lists_locked) { pv_lists_locked = TRUE; if (!rw_try_wlock(&pvh_global_lock)) { PMAP_UNLOCK(pmap); goto resume; } sched_pin(); } if (!pmap_demote_pte1(pmap, pte1p, sva)) { /* * The large page mapping was destroyed. */ continue; } /* * Unless the page mappings are wired, remove the * mapping to a single page so that a subsequent * access may repromote. Since the underlying L2 page * table is fully populated, this removal never * frees a L2 page table page. */ if (!pte1_is_wired(opte1)) { pte2p = pmap_pte2_quick(pmap, sva); KASSERT(pte2_is_valid(pte2_load(pte2p)), ("%s: invalid PTE2", __func__)); pmap_remove_pte2(pmap, pte2p, sva, NULL); } } if (pdnxt > eva) pdnxt = eva; for (pte2p = pmap_pte2_quick(pmap, sva); sva != pdnxt; pte2p++, sva += PAGE_SIZE) { pte2 = pte2_load(pte2p); if (!pte2_is_valid(pte2) || !pte2_is_managed(pte2)) continue; else if (pte2_is_dirty(pte2)) { if (advice == MADV_DONTNEED) { /* * Future calls to pmap_is_modified() * can be avoided by making the page * dirty now. */ m = PHYS_TO_VM_PAGE(pte2_pa(pte2)); vm_page_dirty(m); } pte2_set_bit(pte2p, PTE2_NM); pte2_clear_bit(pte2p, PTE2_A); } else if ((pte2 & PTE2_A) != 0) pte2_clear_bit(pte2p, PTE2_A); else continue; pmap_tlb_flush(pmap, sva); } } if (pv_lists_locked) { sched_unpin(); rw_wunlock(&pvh_global_lock); } PMAP_UNLOCK(pmap); } /* * Clear the modify bits on the specified physical page. */ void pmap_clear_modify(vm_page_t m) { struct md_page *pvh; pv_entry_t next_pv, pv; pmap_t pmap; pt1_entry_t *pte1p, opte1; pt2_entry_t *pte2p, opte2; vm_offset_t va; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("%s: page %p is exclusive busy", __func__, m)); /* * If the page is not PGA_WRITEABLE, then no PTE2s can have PTE2_NM * cleared. If the object containing the page is locked and the page * is not exclusive busied, then PGA_WRITEABLE cannot be concurrently * set. */ if ((m->flags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); sched_pin(); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_next, next_pv) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, va); opte1 = pte1_load(pte1p); if (!(opte1 & PTE1_RO)) { if (pmap_demote_pte1(pmap, pte1p, va) && !pte1_is_wired(opte1)) { /* * Write protect the mapping to a * single page so that a subsequent * write access may repromote. 
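 * Only one of the 256 pages covered by the old section needs this
 * treatment; the next statement locates it by sliding va from the
 * section start by m's offset within the old 1MB physical frame.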
*/ va += VM_PAGE_TO_PHYS(m) - pte1_pa(opte1); pte2p = pmap_pte2_quick(pmap, va); opte2 = pte2_load(pte2p); if ((opte2 & PTE2_V)) { pte2_set_bit(pte2p, PTE2_NM | PTE2_RO); vm_page_dirty(m); pmap_tlb_flush(pmap, va); } } } PMAP_UNLOCK(pmap); } small_mappings: TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pte1p = pmap_pte1(pmap, pv->pv_va); KASSERT(!pte1_is_section(pte1_load(pte1p)), ("%s: found" " a section in page %p's pv list", __func__, m)); pte2p = pmap_pte2_quick(pmap, pv->pv_va); if (pte2_is_dirty(pte2_load(pte2p))) { pte2_set_bit(pte2p, PTE2_NM); pmap_tlb_flush(pmap, pv->pv_va); } PMAP_UNLOCK(pmap); } sched_unpin(); rw_wunlock(&pvh_global_lock); } /* * Sets the memory attribute for the specified page. */ void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma) { struct sysmaps *sysmaps; vm_memattr_t oma; vm_paddr_t pa; oma = m->md.pat_mode; m->md.pat_mode = ma; CTR5(KTR_PMAP, "%s: page %p - 0x%08X oma: %d, ma: %d", __func__, m, VM_PAGE_TO_PHYS(m), oma, ma); if ((m->flags & PG_FICTITIOUS) != 0) return; #if 0 /* * If "m" is a normal page, flush it from the cache. * * First, try to find an existing mapping of the page by sf * buffer. sf_buf_invalidate_cache() modifies mapping and * flushes the cache. */ if (sf_buf_invalidate_cache(m, oma)) return; #endif /* * If page is not mapped by sf buffer, map the page * transient and do invalidation. */ if (ma != oma) { pa = VM_PAGE_TO_PHYS(m); sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP2) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(pa, PTE2_AP_KRW, ma)); dcache_wbinv_poc((vm_offset_t)sysmaps->CADDR2, pa, PAGE_SIZE); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } } /* * Miscellaneous support routines follow */ /* * Returns TRUE if the given page is mapped individually or as part of * a 1mpage. Otherwise, returns FALSE. */ boolean_t pmap_page_is_mapped(vm_page_t m) { boolean_t rv; if ((m->oflags & VPO_UNMANAGED) != 0) return (FALSE); rw_wlock(&pvh_global_lock); rv = !TAILQ_EMPTY(&m->md.pv_list) || ((m->flags & PG_FICTITIOUS) == 0 && !TAILQ_EMPTY(&pa_to_pvh(VM_PAGE_TO_PHYS(m))->pv_list)); rw_wunlock(&pvh_global_lock); return (rv); } /* * Returns true if the pmap's pv is one of the first * 16 pvs linked to from this page. This count may * be changed upwards or downwards in the future; it * is only necessary that true be returned for a small * subset of pmaps for proper page aging. */ boolean_t pmap_page_exists_quick(pmap_t pmap, vm_page_t m) { struct md_page *pvh; pv_entry_t pv; int loops = 0; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("%s: page %p is not managed", __func__, m)); rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } if (!rv && loops < 16 && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_next) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } } rw_wunlock(&pvh_global_lock); return (rv); } /* * pmap_zero_page zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. 
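 * The transient window is the per-CPU CMAP2/CADDR2 pair: pin the
 * thread, take the per-CPU sysmaps lock, install the mapping, zero
 * through it, and tear it down again. In essence (with pa and attr
 * standing for VM_PAGE_TO_PHYS(m) and m->md.pat_mode):
 *
 *	pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(pa, PTE2_AP_KRW, attr));
 *	pagezero(sysmaps->CADDR2);
 *	pte2_clear(sysmaps->CMAP2);
 *	tlb_flush((vm_offset_t)sysmaps->CADDR2);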
*/ void pmap_zero_page(vm_page_t m) { struct sysmaps *sysmaps; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, m->md.pat_mode)); pagezero(sysmaps->CADDR2); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * pmap_zero_page_area zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. * * off and size may not cover an area beyond a single hardware page. */ void pmap_zero_page_area(vm_page_t m, int off, int size) { struct sysmaps *sysmaps; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, m->md.pat_mode)); if (off == 0 && size == PAGE_SIZE) pagezero(sysmaps->CADDR2); else bzero(sysmaps->CADDR2 + off, size); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * pmap_zero_page_idle zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. This * is intended to be called from the vm_pagezero process only and * outside of Giant. */ void pmap_zero_page_idle(vm_page_t m) { if (pte2_load(CMAP3) != 0) panic("%s: CMAP3 busy", __func__); sched_pin(); pte2_store(CMAP3, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, m->md.pat_mode)); pagezero(CADDR3); pte2_clear(CMAP3); tlb_flush((vm_offset_t)CADDR3); sched_unpin(); } /* * pmap_copy_page copies the specified (machine independent) * page by mapping the page into virtual memory and using * bcopy to copy the page, one machine dependent page at a * time. 
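 * Two windows are used at once here: CMAP1 maps the source read-only
 * (PTE2_AP_KR | PTE2_NM), which also catches stray writes through the
 * source window, and CMAP2 maps the destination read-write; both are
 * cleared and TLB-flushed once bcopy() is done.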
*/ void pmap_copy_page(vm_page_t src, vm_page_t dst) { struct sysmaps *sysmaps; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP1) != 0) panic("%s: CMAP1 busy", __func__); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP1, PTE2_KERN_NG(VM_PAGE_TO_PHYS(src), PTE2_AP_KR | PTE2_NM, src->md.pat_mode)); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(dst), PTE2_AP_KRW, dst->md.pat_mode)); bcopy(sysmaps->CADDR1, sysmaps->CADDR2, PAGE_SIZE); pte2_clear(sysmaps->CMAP1); tlb_flush((vm_offset_t)sysmaps->CADDR1); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } int unmapped_buf_allowed = 1; void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) { struct sysmaps *sysmaps; vm_page_t a_pg, b_pg; char *a_cp, *b_cp; vm_offset_t a_pg_offset, b_pg_offset; int cnt; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP1 != 0) panic("pmap_copy_pages: CMAP1 busy"); if (*sysmaps->CMAP2 != 0) panic("pmap_copy_pages: CMAP2 busy"); while (xfersize > 0) { a_pg = ma[a_offset >> PAGE_SHIFT]; a_pg_offset = a_offset & PAGE_MASK; cnt = min(xfersize, PAGE_SIZE - a_pg_offset); b_pg = mb[b_offset >> PAGE_SHIFT]; b_pg_offset = b_offset & PAGE_MASK; cnt = min(cnt, PAGE_SIZE - b_pg_offset); pte2_store(sysmaps->CMAP1, PTE2_KERN_NG(VM_PAGE_TO_PHYS(a_pg), PTE2_AP_KR | PTE2_NM, a_pg->md.pat_mode)); tlb_flush_local((vm_offset_t)sysmaps->CADDR1); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(b_pg), PTE2_AP_KRW, b_pg->md.pat_mode)); tlb_flush_local((vm_offset_t)sysmaps->CADDR2); a_cp = sysmaps->CADDR1 + a_pg_offset; b_cp = sysmaps->CADDR2 + b_pg_offset; bcopy(a_cp, b_cp, cnt); a_offset += cnt; b_offset += cnt; xfersize -= cnt; } pte2_clear(sysmaps->CMAP1); tlb_flush((vm_offset_t)sysmaps->CADDR1); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } vm_offset_t pmap_quick_enter_page(vm_page_t m) { pt2_entry_t *pte2p; vm_offset_t qmap_addr; critical_enter(); qmap_addr = PCPU_GET(qmap_addr); pte2p = pt2map_entry(qmap_addr); KASSERT(pte2_load(pte2p) == 0, ("%s: PTE2 busy", __func__)); pte2_store(pte2p, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, pmap_page_get_memattr(m))); return (qmap_addr); } void pmap_quick_remove_page(vm_offset_t addr) { pt2_entry_t *pte2p; vm_offset_t qmap_addr; qmap_addr = PCPU_GET(qmap_addr); pte2p = pt2map_entry(qmap_addr); KASSERT(addr == qmap_addr, ("%s: invalid address", __func__)); KASSERT(pte2_load(pte2p) != 0, ("%s: PTE2 not in use", __func__)); pte2_clear(pte2p); tlb_flush(qmap_addr); critical_exit(); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. 
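 * It is typically called at fork() time to prepopulate the child's
 * page tables. Being advisory, it bails out unless the copy is 1:1
 * (dst_addr == src_addr) and the source pmap is current; copied pte2s
 * get PTE2_W and PTE2_A cleared and PTE2_NM set, so the child starts
 * with clean, unwired, unreferenced mappings.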
*/ void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { struct spglist free; vm_offset_t addr; vm_offset_t end_addr = src_addr + len; vm_offset_t nextva; if (dst_addr != src_addr) return; if (!pmap_is_current(src_pmap)) return; rw_wlock(&pvh_global_lock); if (dst_pmap < src_pmap) { PMAP_LOCK(dst_pmap); PMAP_LOCK(src_pmap); } else { PMAP_LOCK(src_pmap); PMAP_LOCK(dst_pmap); } sched_pin(); for (addr = src_addr; addr < end_addr; addr = nextva) { pt2_entry_t *src_pte2p, *dst_pte2p; vm_page_t dst_mpt2pg, src_mpt2pg; pt1_entry_t src_pte1; u_int pte1_idx; KASSERT(addr < VM_MAXUSER_ADDRESS, ("%s: invalid to pmap_copy page tables", __func__)); nextva = pte1_trunc(addr + PTE1_SIZE); if (nextva < addr) nextva = end_addr; pte1_idx = pte1_index(addr); src_pte1 = src_pmap->pm_pt1[pte1_idx]; if (pte1_is_section(src_pte1)) { if ((addr & PTE1_OFFSET) != 0 || (addr + PTE1_SIZE) > end_addr) continue; if (dst_pmap->pm_pt1[pte1_idx] == 0 && (!pte1_is_managed(src_pte1) || pmap_pv_insert_pte1(dst_pmap, addr, pte1_pa(src_pte1)))) { dst_pmap->pm_pt1[pte1_idx] = src_pte1 & ~PTE1_W; dst_pmap->pm_stats.resident_count += PTE1_SIZE / PAGE_SIZE; pmap_pte1_mappings++; } continue; } else if (!pte1_is_link(src_pte1)) continue; src_mpt2pg = PHYS_TO_VM_PAGE(pte1_link_pa(src_pte1)); /* * We leave PT2s to be linked from PT1 even if they are not * referenced until all PT2s in a page are without reference. * * QQQ: It could be changed ... */ #if 0 /* single_pt2_link_is_cleared */ KASSERT(pt2_wirecount_get(src_mpt2pg, pte1_idx) > 0, ("%s: source page table page is unused", __func__)); #else if (pt2_wirecount_get(src_mpt2pg, pte1_idx) == 0) continue; #endif if (nextva > end_addr) nextva = end_addr; src_pte2p = pt2map_entry(addr); while (addr < nextva) { pt2_entry_t temp_pte2; temp_pte2 = pte2_load(src_pte2p); /* * we only virtual copy managed pages */ if (pte2_is_managed(temp_pte2)) { dst_mpt2pg = pmap_allocpte2(dst_pmap, addr, PMAP_ENTER_NOSLEEP); if (dst_mpt2pg == NULL) goto out; dst_pte2p = pmap_pte2_quick(dst_pmap, addr); if (!pte2_is_valid(pte2_load(dst_pte2p)) && pmap_try_insert_pv_entry(dst_pmap, addr, PHYS_TO_VM_PAGE(pte2_pa(temp_pte2)))) { /* * Clear the wired, modified, and * accessed (referenced) bits * during the copy. */ temp_pte2 &= ~(PTE2_W | PTE2_A); temp_pte2 |= PTE2_NM; pte2_store(dst_pte2p, temp_pte2); dst_pmap->pm_stats.resident_count++; } else { SLIST_INIT(&free); if (pmap_unwire_pt2(dst_pmap, addr, dst_mpt2pg, &free)) { pmap_tlb_flush(dst_pmap, addr); pmap_free_zero_pages(&free); } goto out; } if (pt2_wirecount_get(dst_mpt2pg, pte1_idx) >= pt2_wirecount_get(src_mpt2pg, pte1_idx)) break; } addr += PAGE_SIZE; src_pte2p++; } } out: sched_unpin(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(src_pmap); PMAP_UNLOCK(dst_pmap); } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more section mappings. 
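 * Worked example (hypothetical numbers): for an object mapped at file
 * offset 0x00180000, pte1_offset is 0x80000; choosing *addr so that
 * (*addr & PTE1_OFFSET) == 0x80000 makes the object's 1MB-aligned
 * offsets fall on 1MB-aligned virtual addresses, letting the middle of
 * a large mapping be promoted to PTE1 sections.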
*/ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { vm_offset_t pte1_offset; if (size < PTE1_SIZE) return; if (object != NULL && (object->flags & OBJ_COLORED) != 0) offset += ptoa(object->pg_color); pte1_offset = offset & PTE1_OFFSET; if (size - ((PTE1_SIZE - pte1_offset) & PTE1_OFFSET) < PTE1_SIZE || (*addr & PTE1_OFFSET) == pte1_offset) return; if ((*addr & PTE1_OFFSET) < pte1_offset) *addr = pte1_trunc(*addr) + pte1_offset; else *addr = pte1_roundup(*addr) + pte1_offset; } void pmap_activate(struct thread *td) { pmap_t pmap, oldpmap; u_int cpuid, ttb; PDEBUG(9, printf("%s: td = %08x\n", __func__, (uint32_t)td)); critical_enter(); pmap = vmspace_pmap(td->td_proc->p_vmspace); oldpmap = PCPU_GET(curpmap); cpuid = PCPU_GET(cpuid); #if defined(SMP) CPU_CLR_ATOMIC(cpuid, &oldpmap->pm_active); CPU_SET_ATOMIC(cpuid, &pmap->pm_active); #else CPU_CLR(cpuid, &oldpmap->pm_active); CPU_SET(cpuid, &pmap->pm_active); #endif ttb = pmap_ttb_get(pmap); /* * pmap_activate is for the current thread on the current cpu */ td->td_pcb->pcb_pagedir = ttb; cp15_ttbr_set(ttb); PCPU_SET(curpmap, pmap); critical_exit(); } /* * Perform the pmap work for mincore. */ int pmap_mincore(pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; vm_paddr_t pa; boolean_t managed; int val; PMAP_LOCK(pmap); retry: pte1p = pmap_pte1(pmap, addr); pte1 = pte1_load(pte1p); if (pte1_is_section(pte1)) { pa = trunc_page(pte1_pa(pte1) | (addr & PTE1_OFFSET)); managed = pte1_is_managed(pte1); val = MINCORE_SUPER | MINCORE_INCORE; if (pte1_is_dirty(pte1)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if (pte1 & PTE1_A) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } else if (pte1_is_link(pte1)) { pte2p = pmap_pte2(pmap, addr); pte2 = pte2_load(pte2p); pmap_pte2_release(pte2p); pa = pte2_pa(pte2); managed = pte2_is_managed(pte2); val = MINCORE_INCORE; if (pte2_is_dirty(pte2)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; if (pte2 & PTE2_A) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } else { managed = FALSE; val = 0; } if ((val & (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER)) != (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER) && managed) { /* Ensure that "PHYS_TO_VM_PAGE(pa)->object" doesn't change. */ if (vm_page_pa_tryrelock(pmap, pa, locked_pa)) goto retry; } else PA_UNLOCK_COND(*locked_pa); PMAP_UNLOCK(pmap); return (val); } void pmap_kenter_device(vm_offset_t va, vm_size_t size, vm_paddr_t pa) { vm_offset_t sva; KASSERT((size & PAGE_MASK) == 0, ("%s: device mapping not page-sized", __func__)); sva = va; while (size != 0) { pmap_kenter_prot_attr(va, pa, PTE2_AP_KRW, PTE2_ATTR_DEVICE); va += PAGE_SIZE; pa += PAGE_SIZE; size -= PAGE_SIZE; } tlb_flush_range(sva, va - sva); } void pmap_kremove_device(vm_offset_t va, vm_size_t size) { vm_offset_t sva; KASSERT((size & PAGE_MASK) == 0, ("%s: device mapping not page-sized", __func__)); sva = va; while (size != 0) { pmap_kremove(va); va += PAGE_SIZE; size -= PAGE_SIZE; } tlb_flush_range(sva, va - sva); } void pmap_set_pcb_pagedir(pmap_t pmap, struct pcb *pcb) { pcb->pcb_pagedir = pmap_ttb_get(pmap); } /* * Clean L1 data cache range by physical address. * The range must be within a single page. 
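 * The range is given by physical address because the page may not be
 * mapped anywhere yet (see cache_icache_sync_fresh()), so the routine
 * reaches it through the per-CPU CMAP3/CADDR3 window.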
*/ static void pmap_dcache_wb_pou(vm_paddr_t pa, vm_size_t size, vm_memattr_t ma) { struct sysmaps *sysmaps; KASSERT(((pa & PAGE_MASK) + size) <= PAGE_SIZE, ("%s: not on single page", __func__)); sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (*sysmaps->CMAP3) panic("%s: CMAP3 busy", __func__); pte2_store(sysmaps->CMAP3, PTE2_KERN_NG(pa, PTE2_AP_KRW, ma)); dcache_wb_pou((vm_offset_t)sysmaps->CADDR3 + (pa & PAGE_MASK), size); pte2_clear(sysmaps->CMAP3); tlb_flush((vm_offset_t)sysmaps->CADDR3); sched_unpin(); mtx_unlock(&sysmaps->lock); } /* * Sync instruction cache range which is not mapped yet. */ void cache_icache_sync_fresh(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { uint32_t len, offset; vm_page_t m; /* Write back d-cache on given address range. */ offset = pa & PAGE_MASK; for ( ; size != 0; size -= len, pa += len, offset = 0) { len = min(PAGE_SIZE - offset, size); m = PHYS_TO_VM_PAGE(pa); KASSERT(m != NULL, ("%s: vm_page_t is null for %#x", __func__, pa)); pmap_dcache_wb_pou(pa, len, m->md.pat_mode); } /* * I-cache is VIPT. Only way how to flush all virtual mappings * on given physical address is to invalidate all i-cache. */ icache_inv_all(); } void pmap_sync_icache(pmap_t pmap, vm_offset_t va, vm_size_t size) { /* Write back d-cache on given address range. */ if (va >= VM_MIN_KERNEL_ADDRESS) { dcache_wb_pou(va, size); } else { uint32_t len, offset; vm_paddr_t pa; vm_page_t m; offset = va & PAGE_MASK; for ( ; size != 0; size -= len, va += len, offset = 0) { pa = pmap_extract(pmap, va); /* offset is preserved */ len = min(PAGE_SIZE - offset, size); m = PHYS_TO_VM_PAGE(pa); KASSERT(m != NULL, ("%s: vm_page_t is null for %#x", __func__, pa)); pmap_dcache_wb_pou(pa, len, m->md.pat_mode); } } /* * I-cache is VIPT. Only way how to flush all virtual mappings * on given physical address is to invalidate all i-cache. */ icache_inv_all(); } /* * The implementation of pmap_fault() uses IN_RANGE2() macro which * depends on the fact that given range size is a power of 2. */ CTASSERT(powerof2(NB_IN_PT1)); CTASSERT(powerof2(PT2MAP_SIZE)); #define IN_RANGE2(addr, start, size) \ ((vm_offset_t)(start) == ((vm_offset_t)(addr) & ~((size) - 1))) /* * Handle access and R/W emulation faults. */ int pmap_fault(pmap_t pmap, vm_offset_t far, uint32_t fsr, int idx, bool usermode) { pt1_entry_t *pte1p, pte1; pt2_entry_t *pte2p, pte2; if (pmap == NULL) pmap = kernel_pmap; /* * In kernel, we should never get abort with FAR which is in range of * pmap->pm_pt1 or PT2MAP address spaces. If it happens, stop here * and print out a useful abort message and even get to the debugger * otherwise it likely ends with never ending loop of aborts. */ if (__predict_false(IN_RANGE2(far, pmap->pm_pt1, NB_IN_PT1))) { /* * All L1 tables should always be mapped and present. * However, we check only current one herein. For user mode, * only permission abort from malicious user is not fatal. * And alignment abort as it may have higher priority. */ if (!usermode || (idx != FAULT_ALIGN && idx != FAULT_PERM_L2)) { CTR4(KTR_PMAP, "%s: pmap %#x pm_pt1 %#x far %#x", __func__, pmap, pmap->pm_pt1, far); panic("%s: pm_pt1 abort", __func__); } return (EFAULT); } if (__predict_false(IN_RANGE2(far, PT2MAP, PT2MAP_SIZE))) { /* * PT2MAP should be always mapped and present in current * L1 table. However, only existing L2 tables are mapped * in PT2MAP. For user mode, only L2 translation abort and * permission abort from malicious user is not fatal. * And alignment abort as it may have higher priority. 
*/ if (!usermode || (idx != FAULT_ALIGN && idx != FAULT_TRAN_L2 && idx != FAULT_PERM_L2)) { CTR4(KTR_PMAP, "%s: pmap %#x PT2MAP %#x far %#x", __func__, pmap, PT2MAP, far); panic("%s: PT2MAP abort", __func__); } return (EFAULT); } /* * Accesss bits for page and section. Note that the entry * is not in TLB yet, so TLB flush is not necessary. * * QQQ: This is hardware emulation, we do not call userret() * for aborts from user mode. * We do not lock PMAP, so cmpset() is a need. Hopefully, * no one removes the mapping when we are here. */ if (idx == FAULT_ACCESS_L2) { pte2p = pt2map_entry(far); pte2_seta: pte2 = pte2_load(pte2p); if (pte2_is_valid(pte2)) { if (!pte2_cmpset(pte2p, pte2, pte2 | PTE2_A)) { goto pte2_seta; } return (0); } } if (idx == FAULT_ACCESS_L1) { pte1p = pmap_pte1(pmap, far); pte1_seta: pte1 = pte1_load(pte1p); if (pte1_is_section(pte1)) { if (!pte1_cmpset(pte1p, pte1, pte1 | PTE1_A)) { goto pte1_seta; } return (0); } } /* * Handle modify bits for page and section. Note that the modify * bit is emulated by software. So PTEx_RO is software read only * bit and PTEx_NM flag is real hardware read only bit. * * QQQ: This is hardware emulation, we do not call userret() * for aborts from user mode. * We do not lock PMAP, so cmpset() is a need. Hopefully, * no one removes the mapping when we are here. */ if ((fsr & FSR_WNR) && (idx == FAULT_PERM_L2)) { pte2p = pt2map_entry(far); pte2_setrw: pte2 = pte2_load(pte2p); if (pte2_is_valid(pte2) && !(pte2 & PTE2_RO) && (pte2 & PTE2_NM)) { if (!pte2_cmpset(pte2p, pte2, pte2 & ~PTE2_NM)) { goto pte2_setrw; } tlb_flush(trunc_page(far)); return (0); } } if ((fsr & FSR_WNR) && (idx == FAULT_PERM_L1)) { pte1p = pmap_pte1(pmap, far); pte1_setrw: pte1 = pte1_load(pte1p); if (pte1_is_section(pte1) && !(pte1 & PTE1_RO) && (pte1 & PTE1_NM)) { if (!pte1_cmpset(pte1p, pte1, pte1 & ~PTE1_NM)) { goto pte1_setrw; } tlb_flush(pte1_trunc(far)); return (0); } } /* * QQQ: The previous code, mainly fast handling of access and * modify bits aborts, could be moved to ASM. Now we are * starting to deal with not fast aborts. */ #ifdef INVARIANTS /* * Read an entry in PT2TAB associated with both pmap and far. * It's safe because PT2TAB is always mapped. * * QQQ: We do not lock PMAP, so false positives could happen if * the mapping is removed concurrently. */ pte2 = pt2tab_load(pmap_pt2tab_entry(pmap, far)); if (pte2_is_valid(pte2)) { /* * Now, when we know that L2 page table is allocated, * we can use PT2MAP to get L2 page table entry. */ pte2 = pte2_load(pt2map_entry(far)); if (pte2_is_valid(pte2)) { /* * If L2 page table entry is valid, make sure that * L1 page table entry is valid too. Note that we * leave L2 page entries untouched when promoted. */ pte1 = pte1_load(pmap_pte1(pmap, far)); if (!pte1_is_valid(pte1)) { panic("%s: missing L1 page entry (%p, %#x)", __func__, pmap, far); } } } #endif return (EAGAIN); } /* !!!! REMOVE !!!! */ void pmap_pte_init_mmu_v6(void) { } void vector_page_setprot(int p) { } #if defined(PMAP_DEBUG) /* * Reusing of KVA used in pmap_zero_page function !!! 
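 */

/*
 * Editorial sketch (added, hedged): the lockless accessed/modified-bit
 * updates in pmap_fault() above all share one retry shape, shown here
 * once. This is not the commit's code: pt_entry_t, pte_load(),
 * pte_is_valid() and pte_cmpset() stand in for the pte1/pte2 variants
 * actually used, and the block is disabled so it is illustration only.
 */
#if 0
static int
pte_set_bits_retry(pt_entry_t *ptep, pt_entry_t set)
{
	pt_entry_t pte;

	do {
		pte = pte_load(ptep);		/* snapshot the entry */
		if (!pte_is_valid(pte))
			return (EFAULT);	/* mapping disappeared */
	} while (!pte_cmpset(ptep, pte, pte | set));
	return (0);				/* bit set atomically */
}
#endif

/*
 * (Original PMAP_DEBUG helper follows; per the note above, it reuses
 * the KVA of pmap_zero_page().)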
*/ static void pmap_zero_page_check(vm_page_t m) { uint32_t *p, *end; struct sysmaps *sysmaps; sched_pin(); sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)]; mtx_lock(&sysmaps->lock); if (pte2_load(sysmaps->CMAP2) != 0) panic("%s: CMAP2 busy", __func__); pte2_store(sysmaps->CMAP2, PTE2_KERN_NG(VM_PAGE_TO_PHYS(m), PTE2_AP_KRW, m->md.pat_mode)); end = (uint32_t*)(sysmaps->CADDR2 + PAGE_SIZE); for (p = (uint32_t*)sysmaps->CADDR2; p < end; p++) if (*p != 0) panic("%s: page %p not zero, va: %p", __func__, m, sysmaps->CADDR2); pte2_clear(sysmaps->CMAP2); tlb_flush((vm_offset_t)sysmaps->CADDR2); sched_unpin(); mtx_unlock(&sysmaps->lock); } int pmap_pid_dump(int pid) { pmap_t pmap; struct proc *p; int npte2 = 0; int i, j, index; sx_slock(&allproc_lock); FOREACH_PROC_IN_SYSTEM(p) { if (p->p_pid != pid || p->p_vmspace == NULL) continue; index = 0; pmap = vmspace_pmap(p->p_vmspace); for (i = 0; i < NPTE1_IN_PT1; i++) { pt1_entry_t pte1; pt2_entry_t *pte2p, pte2; vm_offset_t base, va; vm_paddr_t pa; vm_page_t m; base = i << PTE1_SHIFT; pte1 = pte1_load(&pmap->pm_pt1[i]); if (pte1_is_section(pte1)) { /* * QQQ: Do something here! */ } else if (pte1_is_link(pte1)) { for (j = 0; j < NPTE2_IN_PT2; j++) { va = base + (j << PAGE_SHIFT); if (va >= VM_MIN_KERNEL_ADDRESS) { if (index) { index = 0; printf("\n"); } sx_sunlock(&allproc_lock); return (npte2); } pte2p = pmap_pte2(pmap, va); pte2 = pte2_load(pte2p); pmap_pte2_release(pte2p); if (!pte2_is_valid(pte2)) continue; pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pa); printf("va: 0x%x, pa: 0x%x, h: %d, w:" " %d, f: 0x%x", va, pa, m->hold_count, m->wire_count, m->flags); npte2++; index++; if (index >= 2) { index = 0; printf("\n"); } else { printf(" "); } } } } } sx_sunlock(&allproc_lock); return (npte2); } #endif #ifdef DDB static pt2_entry_t * pmap_pte2_ddb(pmap_t pmap, vm_offset_t va) { pt1_entry_t pte1; vm_paddr_t pt2pg_pa; pte1 = pte1_load(pmap_pte1(pmap, va)); if (!pte1_is_link(pte1)) return (NULL); if (pmap_is_current(pmap)) return (pt2map_entry(va)); /* Note that L2 page table size is not equal to PAGE_SIZE. 
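 *
 * Editorial elaboration (added): an ARMv6/v7 coarse L2 table is 256
 * four-byte entries, i.e. 1 KB, while PAGE_SIZE is 4 KB, so four L2
 * tables share one page and a single mapped page holds NPTE2_IN_PG
 * (1024) pte2s covering 4 MB of VA. That is why the code below maps
 * trunc_page() of the link PA and indexes the window with
 * arm32_btop(va) & (NPTE2_IN_PG - 1).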
*/ pt2pg_pa = trunc_page(pte1_link_pa(pte1)); if (pte2_pa(pte2_load(PMAP3)) != pt2pg_pa) { pte2_store(PMAP3, PTE2_KPT(pt2pg_pa)); #ifdef SMP PMAP3cpu = PCPU_GET(cpuid); #endif tlb_flush_local((vm_offset_t)PADDR3); } #ifdef SMP else if (PMAP3cpu != PCPU_GET(cpuid)) { PMAP3cpu = PCPU_GET(cpuid); tlb_flush_local((vm_offset_t)PADDR3); } #endif return (PADDR3 + (arm32_btop(va) & (NPTE2_IN_PG - 1))); } static void dump_pmap(pmap_t pmap) { printf("pmap %p\n", pmap); printf(" pm_pt1: %p\n", pmap->pm_pt1); printf(" pm_pt2tab: %p\n", pmap->pm_pt2tab); printf(" pm_active: 0x%08lX\n", pmap->pm_active.__bits[0]); } DB_SHOW_COMMAND(pmaps, pmap_list_pmaps) { pmap_t pmap; LIST_FOREACH(pmap, &allpmaps, pm_list) { dump_pmap(pmap); } } static int pte2_class(pt2_entry_t pte2) { int cls; cls = (pte2 >> 2) & 0x03; cls |= (pte2 >> 4) & 0x04; return (cls); } static void dump_section(pmap_t pmap, uint32_t pte1_idx) { } static void dump_link(pmap_t pmap, uint32_t pte1_idx, boolean_t invalid_ok) { uint32_t i; vm_offset_t va; pt2_entry_t *pte2p, pte2; vm_page_t m; va = pte1_idx << PTE1_SHIFT; pte2p = pmap_pte2_ddb(pmap, va); for (i = 0; i < NPTE2_IN_PT2; i++, pte2p++, va += PAGE_SIZE) { pte2 = pte2_load(pte2p); if (pte2 == 0) continue; if (!pte2_is_valid(pte2)) { printf(" 0x%08X: 0x%08X", va, pte2); if (!invalid_ok) printf(" - not valid !!!"); printf("\n"); continue; } m = PHYS_TO_VM_PAGE(pte2_pa(pte2)); printf(" 0x%08X: 0x%08X, TEX%d, s:%d, g:%d, m:%p", va , pte2, pte2_class(pte2), !!(pte2 & PTE2_S), !(pte2 & PTE2_NG), m); if (m != NULL) { printf(" v:%d h:%d w:%d f:0x%04X\n", m->valid, m->hold_count, m->wire_count, m->flags); } else { printf("\n"); } } } static __inline boolean_t is_pv_chunk_space(vm_offset_t va) { if ((((vm_offset_t)pv_chunkbase) <= va) && (va < ((vm_offset_t)pv_chunkbase + PAGE_SIZE * pv_maxchunks))) return (TRUE); return (FALSE); } DB_SHOW_COMMAND(pmap, pmap_pmap_print) { /* XXX convert args. */ pmap_t pmap = (pmap_t)addr; pt1_entry_t pte1; pt2_entry_t pte2; vm_offset_t va, eva; vm_page_t m; uint32_t i; boolean_t invalid_ok, dump_link_ok, dump_pv_chunk; if (have_addr) { pmap_t pm; LIST_FOREACH(pm, &allpmaps, pm_list) if (pm == pmap) break; if (pm == NULL) { printf("given pmap %p is not in allpmaps list\n", pmap); return; } } else pmap = PCPU_GET(curpmap); eva = (modif[0] == 'u') ? VM_MAXUSER_ADDRESS : 0xFFFFFFFF; dump_pv_chunk = FALSE; /* XXX evaluate from modif[] */ printf("pmap: 0x%08X\n", (uint32_t)pmap); printf("PT2MAP: 0x%08X\n", (uint32_t)PT2MAP); printf("pt2tab: 0x%08X\n", (uint32_t)pmap->pm_pt2tab); for(i = 0; i < NPTE1_IN_PT1; i++) { pte1 = pte1_load(&pmap->pm_pt1[i]); if (pte1 == 0) continue; va = i << PTE1_SHIFT; if (va >= eva) break; if (pte1_is_section(pte1)) { printf("0x%08X: Section 0x%08X, s:%d g:%d\n", va, pte1, !!(pte1 & PTE1_S), !(pte1 & PTE1_NG)); dump_section(pmap, i); } else if (pte1_is_link(pte1)) { dump_link_ok = TRUE; invalid_ok = FALSE; pte2 = pte2_load(pmap_pt2tab_entry(pmap, va)); m = PHYS_TO_VM_PAGE(pte1_link_pa(pte1)); printf("0x%08X: Link 0x%08X, pt2tab: 0x%08X m: %p", va, pte1, pte2, m); if (is_pv_chunk_space(va)) { printf(" - pv_chunk space"); if (dump_pv_chunk) invalid_ok = TRUE; else dump_link_ok = FALSE; } else if (m != NULL) printf(" w:%d w2:%u", m->wire_count, pt2_wirecount_get(m, pte1_index(va))); if (pte2 == 0) printf(" !!! pt2tab entry is ZERO"); else if (pte2_pa(pte1) != pte2_pa(pte2)) printf(" !!! 
pt2tab entry is DIFFERENT - m: %p", PHYS_TO_VM_PAGE(pte2_pa(pte2))); printf("\n"); if (dump_link_ok) dump_link(pmap, i, invalid_ok); } else printf("0x%08X: Invalid entry 0x%08X\n", va, pte1); } } static void dump_pt2tab(pmap_t pmap) { uint32_t i; pt2_entry_t pte2; vm_offset_t va; vm_paddr_t pa; vm_page_t m; printf("PT2TAB:\n"); for (i = 0; i < PT2TAB_ENTRIES; i++) { pte2 = pte2_load(&pmap->pm_pt2tab[i]); if (!pte2_is_valid(pte2)) continue; va = i << PT2TAB_SHIFT; pa = pte2_pa(pte2); m = PHYS_TO_VM_PAGE(pa); printf(" 0x%08X: 0x%08X, TEX%d, s:%d, m:%p", va, pte2, pte2_class(pte2), !!(pte2 & PTE2_S), m); if (m != NULL) printf(" , h: %d, w: %d, f: 0x%04X pidx: %lld", m->hold_count, m->wire_count, m->flags, m->pindex); printf("\n"); } } DB_SHOW_COMMAND(pmap_pt2tab, pmap_pt2tab_print) { /* XXX convert args. */ pmap_t pmap = (pmap_t)addr; pt1_entry_t pte1; pt2_entry_t pte2; vm_offset_t va; uint32_t i, start; if (have_addr) { printf("supported only on current pmap\n"); return; } pmap = PCPU_GET(curpmap); printf("curpmap: 0x%08X\n", (uint32_t)pmap); printf("PT2MAP: 0x%08X\n", (uint32_t)PT2MAP); printf("pt2tab: 0x%08X\n", (uint32_t)pmap->pm_pt2tab); start = pte1_index((vm_offset_t)PT2MAP); for (i = start; i < (start + NPT2_IN_PT2TAB); i++) { pte1 = pte1_load(&pmap->pm_pt1[i]); if (pte1 == 0) continue; va = i << PTE1_SHIFT; if (pte1_is_section(pte1)) { printf("0x%08X: Section 0x%08X, s:%d\n", va, pte1, !!(pte1 & PTE1_S)); dump_section(pmap, i); } else if (pte1_is_link(pte1)) { pte2 = pte2_load(pmap_pt2tab_entry(pmap, va)); printf("0x%08X: Link 0x%08X, pt2tab: 0x%08X\n", va, pte1, pte2); if (pte2 == 0) printf(" !!! pt2tab entry is ZERO\n"); } else printf("0x%08X: Invalid entry 0x%08X\n", va, pte1); } dump_pt2tab(pmap); } #endif Index: projects/clang380-import/sys/arm/arm/pmap-v6.c =================================================================== --- projects/clang380-import/sys/arm/arm/pmap-v6.c (revision 294776) +++ projects/clang380-import/sys/arm/arm/pmap-v6.c (revision 294777) @@ -1,5419 +1,5465 @@ /* From: $NetBSD: pmap.c,v 1.148 2004/04/03 04:35:48 bsh Exp $ */ /*- * Copyright 2011 Semihalf * Copyright 2004 Olivier Houchard. * Copyright 2003 Wasabi Systems, Inc. * All rights reserved. * * Written by Steve C. Woodford for Wasabi Systems, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed for the NetBSD Project by * Wasabi Systems, Inc. * 4. The name of Wasabi Systems, Inc. may not be used to endorse * or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY WASABI SYSTEMS, INC. ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL WASABI SYSTEMS, INC * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * From: FreeBSD: src/sys/arm/arm/pmap.c,v 1.113 2009/07/24 13:50:29 */ /*- * Copyright (c) 2002-2003 Wasabi Systems, Inc. * Copyright (c) 2001 Richard Earnshaw * Copyright (c) 2001-2002 Christopher Gilbert * All rights reserved. * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the company nor the name of the author may be used to * endorse or promote products derived from this software without specific * prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /*- * Copyright (c) 1999 The NetBSD Foundation, Inc. * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by Charles M. Hannum. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (c) 1994-1998 Mark Brinicombe. * Copyright (c) 1994 Brini. 
* All rights reserved. * * This code is derived from software written for Brini by Mark Brinicombe * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by Mark Brinicombe. * 4. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * * RiscBSD kernel project * * pmap.c * * Machine dependant vm stuff * * Created : 20/09/94 */ /* * Special compilation symbols * PMAP_DEBUG - Build in pmap_debug_level code * * Note that pmap_mapdev() and pmap_unmapdev() are implemented in arm/devmap.c */ /* Include header files */ #include "opt_vm.h" #include "opt_pmap.h" #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef DEBUG extern int last_fault_code; #endif #ifdef PMAP_DEBUG #define PDEBUG(_lev_,_stat_) \ if (pmap_debug_level >= (_lev_)) \ ((_stat_)) #define dprintf printf int pmap_debug_level = 0; #define PMAP_INLINE #else /* PMAP_DEBUG */ #define PDEBUG(_lev_,_stat_) /* Nothing */ #define dprintf(x, arg...) 
#define PMAP_INLINE __inline #endif /* PMAP_DEBUG */ #ifdef PV_STATS #define PV_STAT(x) do { x ; } while (0) #else #define PV_STAT(x) do { } while (0) #endif #define pa_to_pvh(pa) (&pv_table[pa_index(pa)]) #ifdef ARM_L2_PIPT #define pmap_l2cache_wbinv_range(va, pa, size) cpu_l2cache_wbinv_range((pa), (size)) #define pmap_l2cache_inv_range(va, pa, size) cpu_l2cache_inv_range((pa), (size)) #else #define pmap_l2cache_wbinv_range(va, pa, size) cpu_l2cache_wbinv_range((va), (size)) #define pmap_l2cache_inv_range(va, pa, size) cpu_l2cache_inv_range((va), (size)) #endif extern struct pv_addr systempage; /* * Internal function prototypes */ static PMAP_INLINE struct pv_entry *pmap_find_pv(struct md_page *, pmap_t, vm_offset_t); static void pmap_free_pv_chunk(struct pv_chunk *pc); static void pmap_free_pv_entry(pmap_t pmap, pv_entry_t pv); static pv_entry_t pmap_get_pv_entry(pmap_t pmap, boolean_t try); static vm_page_t pmap_pv_reclaim(pmap_t locked_pmap); static boolean_t pmap_pv_insert_section(pmap_t, vm_offset_t, vm_paddr_t); static struct pv_entry *pmap_remove_pv(struct vm_page *, pmap_t, vm_offset_t); static int pmap_pvh_wired_mappings(struct md_page *, int); static int pmap_enter_locked(pmap_t, vm_offset_t, vm_page_t, vm_prot_t, u_int); static vm_paddr_t pmap_extract_locked(pmap_t pmap, vm_offset_t va); static void pmap_alloc_l1(pmap_t); static void pmap_free_l1(pmap_t); static void pmap_map_section(pmap_t, vm_offset_t, vm_offset_t, vm_prot_t, boolean_t); static void pmap_promote_section(pmap_t, vm_offset_t); static boolean_t pmap_demote_section(pmap_t, vm_offset_t); static boolean_t pmap_enter_section(pmap_t, vm_offset_t, vm_page_t, vm_prot_t); static void pmap_remove_section(pmap_t, vm_offset_t); static int pmap_clearbit(struct vm_page *, u_int); static struct l2_bucket *pmap_get_l2_bucket(pmap_t, vm_offset_t); static struct l2_bucket *pmap_alloc_l2_bucket(pmap_t, vm_offset_t); static void pmap_free_l2_bucket(pmap_t, struct l2_bucket *, u_int); static vm_offset_t kernel_pt_lookup(vm_paddr_t); static MALLOC_DEFINE(M_VMPMAP, "pmap", "PMAP L1"); vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */ vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */ vm_offset_t pmap_curmaxkvaddr; vm_paddr_t kernel_l1pa; vm_offset_t kernel_vm_end = 0; vm_offset_t vm_max_kernel_address; struct pmap kernel_pmap_store; /* * Resources for quickly copying and zeroing pages using virtual address space * and page table entries that are pre-allocated per-CPU by pmap_init(). */ struct czpages { struct mtx lock; pt_entry_t *srcptep; pt_entry_t *dstptep; vm_offset_t srcva; vm_offset_t dstva; }; static struct czpages cpu_czpages[MAXCPU]; static void pmap_init_l1(struct l1_ttable *, pd_entry_t *); /* * These routines are called when the CPU type is identified to set up * the PTE prototypes, cache modes, etc. * * The variables are always here, just in case LKMs need to reference * them (though, they shouldn't). */ static void pmap_set_prot(pt_entry_t *pte, vm_prot_t prot, uint8_t user); pt_entry_t pte_l1_s_cache_mode; pt_entry_t pte_l1_s_cache_mode_pt; pt_entry_t pte_l2_l_cache_mode; pt_entry_t pte_l2_l_cache_mode_pt; pt_entry_t pte_l2_s_cache_mode; pt_entry_t pte_l2_s_cache_mode_pt; struct msgbuf *msgbufp = 0; /* * Crashdump maps. */ static caddr_t crashdumpmap; extern void bcopy_page(vm_offset_t, vm_offset_t); extern void bzero_page(vm_offset_t); char *_tmppt; /* * Metadata for L1 translation tables. 
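 */

/*
 * Editorial sketch (added, hedged): one plausible way the per-CPU
 * czpages windows above are used, modeled on the pmap_grow_map() idiom
 * later in this file. It is an illustration, not the commit's
 * pmap_copy_page(), and is disabled.
 */
#if 0
static void
czpages_copy_page(vm_paddr_t src, vm_paddr_t dst)
{
	struct czpages *czp;

	sched_pin();			/* stay on this CPU's windows */
	czp = &cpu_czpages[PCPU_GET(cpuid)];
	mtx_lock(&czp->lock);
	/* Point the two windows at the source and destination frames. */
	*czp->srcptep = L2_S_PROTO | src | pte_l2_s_cache_mode | L2_S_REF;
	pmap_set_prot(czp->srcptep, VM_PROT_READ, 0);
	*czp->dstptep = L2_S_PROTO | dst | pte_l2_s_cache_mode | L2_S_REF;
	pmap_set_prot(czp->dstptep, VM_PROT_READ | VM_PROT_WRITE, 0);
	PTE_SYNC(czp->srcptep);
	PTE_SYNC(czp->dstptep);
	bcopy_page(czp->srcva, czp->dstva);
	/* Tear the windows down and drop stale translations. */
	*czp->srcptep = 0;
	*czp->dstptep = 0;
	PTE_SYNC(czp->srcptep);
	PTE_SYNC(czp->dstptep);
	cpu_tlb_flushD_SE(czp->srcva);
	cpu_tlb_flushD_SE(czp->dstva);
	mtx_unlock(&czp->lock);
	sched_unpin();
}
#endif

/*
 * (struct l1_ttable below is the "metadata for L1 translation tables"
 * announced by the original comment above.)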
*/ struct l1_ttable { /* Entry on the L1 Table list */ SLIST_ENTRY(l1_ttable) l1_link; /* Entry on the L1 Least Recently Used list */ TAILQ_ENTRY(l1_ttable) l1_lru; /* Track how many domains are allocated from this L1 */ volatile u_int l1_domain_use_count; /* * A free-list of domain numbers for this L1. * We avoid using ffs() and a bitmap to track domains since ffs() * is slow on ARM. */ u_int8_t l1_domain_first; u_int8_t l1_domain_free[PMAP_DOMAINS]; /* Physical address of this L1 page table */ vm_paddr_t l1_physaddr; /* KVA of this L1 page table */ pd_entry_t *l1_kva; }; /* * Convert a virtual address into its L1 table index. That is, the * index used to locate the L2 descriptor table pointer in an L1 table. * This is basically used to index l1->l1_kva[]. * * Each L2 descriptor table represents 1MB of VA space. */ #define L1_IDX(va) (((vm_offset_t)(va)) >> L1_S_SHIFT) /* * L1 Page Tables are tracked using a Least Recently Used list. * - New L1s are allocated from the HEAD. * - Freed L1s are added to the TAIl. * - Recently accessed L1s (where an 'access' is some change to one of * the userland pmaps which owns this L1) are moved to the TAIL. */ static TAILQ_HEAD(, l1_ttable) l1_lru_list; /* * A list of all L1 tables */ static SLIST_HEAD(, l1_ttable) l1_list; static struct mtx l1_lru_lock; /* * The l2_dtable tracks L2_BUCKET_SIZE worth of L1 slots. * * This is normally 16MB worth L2 page descriptors for any given pmap. * Reference counts are maintained for L2 descriptors so they can be * freed when empty. */ struct l2_dtable { /* The number of L2 page descriptors allocated to this l2_dtable */ u_int l2_occupancy; /* List of L2 page descriptors */ struct l2_bucket { pt_entry_t *l2b_kva; /* KVA of L2 Descriptor Table */ vm_paddr_t l2b_phys; /* Physical address of same */ u_short l2b_l1idx; /* This L2 table's L1 index */ u_short l2b_occupancy; /* How many active descriptors */ } l2_bucket[L2_BUCKET_SIZE]; }; /* pmap_kenter_internal flags */ #define KENTER_CACHE 0x1 #define KENTER_DEVICE 0x2 #define KENTER_USER 0x4 /* * Given an L1 table index, calculate the corresponding l2_dtable index * and bucket index within the l2_dtable. */ #define L2_IDX(l1idx) (((l1idx) >> L2_BUCKET_LOG2) & \ (L2_SIZE - 1)) #define L2_BUCKET(l1idx) ((l1idx) & (L2_BUCKET_SIZE - 1)) /* * Given a virtual address, this macro returns the * virtual address required to drop into the next L2 bucket. */ #define L2_NEXT_BUCKET(va) (((va) & L1_S_FRAME) + L1_S_SIZE) /* * We try to map the page tables write-through, if possible. However, not * all CPUs have a write-through cache mode, so on those we have to sync * the cache when we frob page tables. * * We try to evaluate this at compile time, if possible. However, it's * not always possible to do that, hence this run-time var. 
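 */

/*
 * Editorial note (added): L1_IDX() above is just the top VA bits.
 * With L1_S_SHIFT == 20 (1 MB sections), L1_IDX(0x40123456) == 0x401,
 * and each of the 4096 L1 slots covers 1 MB of VA, matching the
 * "one L2 descriptor table per 1MB" description given there.
 *
 * (pmap_needs_pte_sync below is the run-time variable the original
 * comment above refers to.)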
*/ int pmap_needs_pte_sync; /* * Macro to determine if a mapping might be resident in the * instruction cache and/or TLB */ #define PTE_BEEN_EXECD(pte) (L2_S_EXECUTABLE(pte) && L2_S_REFERENCED(pte)) /* * Macro to determine if a mapping might be resident in the * data cache and/or TLB */ #define PTE_BEEN_REFD(pte) (L2_S_REFERENCED(pte)) #ifndef PMAP_SHPGPERPROC #define PMAP_SHPGPERPROC 200 #endif #define pmap_is_current(pm) ((pm) == pmap_kernel() || \ curproc->p_vmspace->vm_map.pmap == (pm)) /* * Data for the pv entry allocation mechanism */ static TAILQ_HEAD(pch, pv_chunk) pv_chunks = TAILQ_HEAD_INITIALIZER(pv_chunks); static int pv_entry_count, pv_entry_max, pv_entry_high_water; static struct md_page *pv_table; static int shpgperproc = PMAP_SHPGPERPROC; struct pv_chunk *pv_chunkbase; /* KVA block for pv_chunks */ int pv_maxchunks; /* How many chunks we have KVA for */ vm_offset_t pv_vafree; /* Freelist stored in the PTE */ static __inline struct pv_chunk * pv_to_chunk(pv_entry_t pv) { return ((struct pv_chunk *)((uintptr_t)pv & ~(uintptr_t)PAGE_MASK)); } #define PV_PMAP(pv) (pv_to_chunk(pv)->pc_pmap) CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE); CTASSERT(_NPCM == 8); CTASSERT(_NPCPV == 252); #define PC_FREE0_6 0xfffffffful /* Free values for index 0 through 6 */ #define PC_FREE7 0x0ffffffful /* Free values for index 7 */ static const uint32_t pc_freemask[_NPCM] = { PC_FREE0_6, PC_FREE0_6, PC_FREE0_6, PC_FREE0_6, PC_FREE0_6, PC_FREE0_6, PC_FREE0_6, PC_FREE7 }; static SYSCTL_NODE(_vm, OID_AUTO, pmap, CTLFLAG_RD, 0, "VM/pmap parameters"); /* Superpages utilization enabled = 1 / disabled = 0 */ static int sp_enabled = 1; SYSCTL_INT(_vm_pmap, OID_AUTO, sp_enabled, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &sp_enabled, 0, "Are large page mappings enabled?"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_count, CTLFLAG_RD, &pv_entry_count, 0, "Current number of pv entries"); #ifdef PV_STATS static int pc_chunk_count, pc_chunk_allocs, pc_chunk_frees, pc_chunk_tryfail; SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_count, CTLFLAG_RD, &pc_chunk_count, 0, "Current number of pv entry chunks"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_allocs, CTLFLAG_RD, &pc_chunk_allocs, 0, "Current number of pv entry chunks allocated"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_frees, CTLFLAG_RD, &pc_chunk_frees, 0, "Current number of pv entry chunks frees"); SYSCTL_INT(_vm_pmap, OID_AUTO, pc_chunk_tryfail, CTLFLAG_RD, &pc_chunk_tryfail, 0, "Number of times tried to get a chunk page but failed."); static long pv_entry_frees, pv_entry_allocs; static int pv_entry_spare; SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_frees, CTLFLAG_RD, &pv_entry_frees, 0, "Current number of pv entry frees"); SYSCTL_LONG(_vm_pmap, OID_AUTO, pv_entry_allocs, CTLFLAG_RD, &pv_entry_allocs, 0, "Current number of pv entry allocs"); SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_spare, CTLFLAG_RD, &pv_entry_spare, 0, "Current number of spare pv entries"); #endif uma_zone_t l2zone; static uma_zone_t l2table_zone; static vm_offset_t pmap_kernel_l2dtable_kva; static vm_offset_t pmap_kernel_l2ptp_kva; static vm_paddr_t pmap_kernel_l2ptp_phys; static struct rwlock pvh_global_lock; int l1_mem_types[] = { ARM_L1S_STRONG_ORD, ARM_L1S_DEVICE_NOSHARE, ARM_L1S_DEVICE_SHARE, ARM_L1S_NRML_NOCACHE, ARM_L1S_NRML_IWT_OWT, ARM_L1S_NRML_IWB_OWB, ARM_L1S_NRML_IWBA_OWBA }; int l2l_mem_types[] = { ARM_L2L_STRONG_ORD, ARM_L2L_DEVICE_NOSHARE, ARM_L2L_DEVICE_SHARE, ARM_L2L_NRML_NOCACHE, ARM_L2L_NRML_IWT_OWT, ARM_L2L_NRML_IWB_OWB, ARM_L2L_NRML_IWBA_OWBA }; int l2s_mem_types[] = { ARM_L2S_STRONG_ORD, 
ARM_L2S_DEVICE_NOSHARE, ARM_L2S_DEVICE_SHARE, ARM_L2S_NRML_NOCACHE, ARM_L2S_NRML_IWT_OWT, ARM_L2S_NRML_IWB_OWB, ARM_L2S_NRML_IWBA_OWBA }; /* * This list exists for the benefit of pmap_map_chunk(). It keeps track * of the kernel L2 tables during bootstrap, so that pmap_map_chunk() can * find them as necessary. * * Note that the data on this list MUST remain valid after initarm() returns, * as pmap_bootstrap() uses it to contruct L2 table metadata. */ SLIST_HEAD(, pv_addr) kernel_pt_list = SLIST_HEAD_INITIALIZER(kernel_pt_list); static void pmap_init_l1(struct l1_ttable *l1, pd_entry_t *l1pt) { int i; l1->l1_kva = l1pt; l1->l1_domain_use_count = 0; l1->l1_domain_first = 0; for (i = 0; i < PMAP_DOMAINS; i++) l1->l1_domain_free[i] = i + 1; /* * Copy the kernel's L1 entries to each new L1. */ if (l1pt != pmap_kernel()->pm_l1->l1_kva) memcpy(l1pt, pmap_kernel()->pm_l1->l1_kva, L1_TABLE_SIZE); if ((l1->l1_physaddr = pmap_extract(pmap_kernel(), (vm_offset_t)l1pt)) == 0) panic("pmap_init_l1: can't get PA of L1 at %p", l1pt); SLIST_INSERT_HEAD(&l1_list, l1, l1_link); TAILQ_INSERT_TAIL(&l1_lru_list, l1, l1_lru); } static vm_offset_t kernel_pt_lookup(vm_paddr_t pa) { struct pv_addr *pv; SLIST_FOREACH(pv, &kernel_pt_list, pv_list) { if (pv->pv_pa == pa) return (pv->pv_va); } return (0); } void pmap_pte_init_mmu_v6(void) { if (PTE_PAGETABLE >= 3) pmap_needs_pte_sync = 1; pte_l1_s_cache_mode = l1_mem_types[PTE_CACHE]; pte_l2_l_cache_mode = l2l_mem_types[PTE_CACHE]; pte_l2_s_cache_mode = l2s_mem_types[PTE_CACHE]; pte_l1_s_cache_mode_pt = l1_mem_types[PTE_PAGETABLE]; pte_l2_l_cache_mode_pt = l2l_mem_types[PTE_PAGETABLE]; pte_l2_s_cache_mode_pt = l2s_mem_types[PTE_PAGETABLE]; } /* * Allocate an L1 translation table for the specified pmap. * This is called at pmap creation time. */ static void pmap_alloc_l1(pmap_t pmap) { struct l1_ttable *l1; u_int8_t domain; /* * Remove the L1 at the head of the LRU list */ mtx_lock(&l1_lru_lock); l1 = TAILQ_FIRST(&l1_lru_list); TAILQ_REMOVE(&l1_lru_list, l1, l1_lru); /* * Pick the first available domain number, and update * the link to the next number. */ domain = l1->l1_domain_first; l1->l1_domain_first = l1->l1_domain_free[domain]; /* * If there are still free domain numbers in this L1, * put it back on the TAIL of the LRU list. */ if (++l1->l1_domain_use_count < PMAP_DOMAINS) TAILQ_INSERT_TAIL(&l1_lru_list, l1, l1_lru); mtx_unlock(&l1_lru_lock); /* * Fix up the relevant bits in the pmap structure */ pmap->pm_l1 = l1; pmap->pm_domain = domain + 1; } /* * Free an L1 translation table. * This is called at pmap destruction time. */ static void pmap_free_l1(pmap_t pmap) { struct l1_ttable *l1 = pmap->pm_l1; mtx_lock(&l1_lru_lock); /* * If this L1 is currently on the LRU list, remove it. */ if (l1->l1_domain_use_count < PMAP_DOMAINS) TAILQ_REMOVE(&l1_lru_list, l1, l1_lru); /* * Free up the domain number which was allocated to the pmap */ l1->l1_domain_free[pmap->pm_domain - 1] = l1->l1_domain_first; l1->l1_domain_first = pmap->pm_domain - 1; l1->l1_domain_use_count--; /* * The L1 now must have at least 1 free domain, so add * it back to the LRU list. If the use count is zero, * put it at the head of the list, otherwise it goes * to the tail. */ if (l1->l1_domain_use_count == 0) { TAILQ_INSERT_HEAD(&l1_lru_list, l1, l1_lru); } else TAILQ_INSERT_TAIL(&l1_lru_list, l1, l1_lru); mtx_unlock(&l1_lru_lock); } /* * Returns a pointer to the L2 bucket associated with the specified pmap * and VA, or NULL if no L2 bucket exists for the address. 
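 */

/*
 * Editorial note (added): pmap_alloc_l1()/pmap_free_l1() above keep
 * the per-L1 domain numbers on an intrusive free list: l1_domain_first
 * is the head and l1_domain_free[d] names the free domain after d.
 * Right after pmap_init_l1(), first == 0 and free[i] == i + 1, so
 * successive allocations hand out 0, 1, 2, ... (stored as d + 1 in
 * pm_domain), and a freed domain is pushed back on the head for the
 * next allocation. This is the ffs()-free scheme the l1_ttable comment
 * advertises.
 *
 * (pmap_get_l2_bucket() below is the function the original comment
 * above describes.)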
*/ static PMAP_INLINE struct l2_bucket * pmap_get_l2_bucket(pmap_t pmap, vm_offset_t va) { struct l2_dtable *l2; struct l2_bucket *l2b; u_short l1idx; l1idx = L1_IDX(va); if ((l2 = pmap->pm_l2[L2_IDX(l1idx)]) == NULL || (l2b = &l2->l2_bucket[L2_BUCKET(l1idx)])->l2b_kva == NULL) return (NULL); return (l2b); } /* * Returns a pointer to the L2 bucket associated with the specified pmap * and VA. * * If no L2 bucket exists, perform the necessary allocations to put an L2 * bucket/page table in place. * * Note that if a new L2 bucket/page was allocated, the caller *must* * increment the bucket occupancy counter appropriately *before* * releasing the pmap's lock to ensure no other thread or cpu deallocates * the bucket/page in the meantime. */ static struct l2_bucket * pmap_alloc_l2_bucket(pmap_t pmap, vm_offset_t va) { struct l2_dtable *l2; struct l2_bucket *l2b; u_short l1idx; l1idx = L1_IDX(va); PMAP_ASSERT_LOCKED(pmap); rw_assert(&pvh_global_lock, RA_WLOCKED); if ((l2 = pmap->pm_l2[L2_IDX(l1idx)]) == NULL) { /* * No mapping at this address, as there is * no entry in the L1 table. * Need to allocate a new l2_dtable. */ PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); if ((l2 = uma_zalloc(l2table_zone, M_NOWAIT)) == NULL) { rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); return (NULL); } rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); if (pmap->pm_l2[L2_IDX(l1idx)] != NULL) { /* * Someone already allocated the l2_dtable while * we were doing the same. */ uma_zfree(l2table_zone, l2); l2 = pmap->pm_l2[L2_IDX(l1idx)]; } else { bzero(l2, sizeof(*l2)); /* * Link it into the parent pmap */ pmap->pm_l2[L2_IDX(l1idx)] = l2; } } l2b = &l2->l2_bucket[L2_BUCKET(l1idx)]; /* * Fetch pointer to the L2 page table associated with the address. */ if (l2b->l2b_kva == NULL) { pt_entry_t *ptep; /* * No L2 page table has been allocated. Chances are, this * is because we just allocated the l2_dtable, above. */ l2->l2_occupancy++; PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); ptep = uma_zalloc(l2zone, M_NOWAIT); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); if (l2b->l2b_kva != 0) { /* We lost the race. */ l2->l2_occupancy--; uma_zfree(l2zone, ptep); return (l2b); } l2b->l2b_phys = vtophys(ptep); if (ptep == NULL) { /* * Oops, no more L2 page tables available at this * time. We may need to deallocate the l2_dtable * if we allocated a new one above. */ l2->l2_occupancy--; if (l2->l2_occupancy == 0) { pmap->pm_l2[L2_IDX(l1idx)] = NULL; uma_zfree(l2table_zone, l2); } return (NULL); } l2b->l2b_kva = ptep; l2b->l2b_l1idx = l1idx; } return (l2b); } static PMAP_INLINE void pmap_free_l2_ptp(pt_entry_t *l2) { uma_zfree(l2zone, l2); } /* * One or more mappings in the specified L2 descriptor table have just been * invalidated. * * Garbage collect the metadata and descriptor table itself if necessary. * * The pmap lock must be acquired when this is called (not necessary * for the kernel pmap). */ static void pmap_free_l2_bucket(pmap_t pmap, struct l2_bucket *l2b, u_int count) { struct l2_dtable *l2; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep; u_short l1idx; /* * Update the bucket's reference count according to how many * PTEs the caller has just invalidated. */ l2b->l2b_occupancy -= count; /* * Note: * * Level 2 page tables allocated to the kernel pmap are never freed * as that would require checking all Level 1 page tables and * removing any references to the Level 2 page table. See also the * comment elsewhere about never freeing bootstrap L2 descriptors. * * We make do with just invalidating the mapping in the L2 table. 
* * This isn't really a big deal in practice and, in fact, leads * to a performance win over time as we don't need to continually * alloc/free. */ if (l2b->l2b_occupancy > 0 || pmap == pmap_kernel()) return; /* * There are no more valid mappings in this level 2 page table. * Go ahead and NULL-out the pointer in the bucket, then * free the page table. */ l1idx = l2b->l2b_l1idx; ptep = l2b->l2b_kva; l2b->l2b_kva = NULL; pl1pd = &pmap->pm_l1->l1_kva[l1idx]; /* * If the L1 slot matches the pmap's domain * number, then invalidate it. */ l1pd = *pl1pd & (L1_TYPE_MASK | L1_C_DOM_MASK); if (l1pd == (L1_C_DOM(pmap->pm_domain) | L1_TYPE_C)) { *pl1pd = 0; PTE_SYNC(pl1pd); cpu_tlb_flushD_SE((vm_offset_t)ptep); cpu_cpwait(); } /* * Release the L2 descriptor table back to the pool cache. */ pmap_free_l2_ptp(ptep); /* * Update the reference count in the associated l2_dtable */ l2 = pmap->pm_l2[L2_IDX(l1idx)]; if (--l2->l2_occupancy > 0) return; /* * There are no more valid mappings in any of the Level 1 * slots managed by this l2_dtable. Go ahead and NULL-out * the pointer in the parent pmap and free the l2_dtable. */ pmap->pm_l2[L2_IDX(l1idx)] = NULL; uma_zfree(l2table_zone, l2); } /* * Pool cache constructors for L2 descriptor tables, metadata and pmap * structures. */ static int pmap_l2ptp_ctor(void *mem, int size, void *arg, int flags) { struct l2_bucket *l2b; pt_entry_t *ptep, pte; vm_offset_t va = (vm_offset_t)mem & ~PAGE_MASK; /* * The mappings for these page tables were initially made using * pmap_kenter() by the pool subsystem. Therefore, the cache- * mode will not be right for page table mappings. To avoid * polluting the pmap_kenter() code with a special case for * page tables, we simply fix up the cache-mode here if it's not * correct. */ l2b = pmap_get_l2_bucket(pmap_kernel(), va); ptep = &l2b->l2b_kva[l2pte_index(va)]; pte = *ptep; cpu_idcache_wbinv_range(va, PAGE_SIZE); pmap_l2cache_wbinv_range(va, pte & L2_S_FRAME, PAGE_SIZE); if ((pte & L2_S_CACHE_MASK) != pte_l2_s_cache_mode_pt) { /* * Page tables must have the cache-mode set to * Write-Thru. */ *ptep = (pte & ~L2_S_CACHE_MASK) | pte_l2_s_cache_mode_pt; PTE_SYNC(ptep); cpu_tlb_flushD_SE(va); cpu_cpwait(); } memset(mem, 0, L2_TABLE_SIZE_REAL); return (0); } /* * Modify pte bits for all ptes corresponding to the given physical address. * We use `maskbits' rather than `clearbits' because we're always passing * constants and the latter would require an extra inversion at run-time. */ static int pmap_clearbit(struct vm_page *m, u_int maskbits) { struct l2_bucket *l2b; struct pv_entry *pv, *pve, *next_pv; struct md_page *pvh; pd_entry_t *pl1pd; pt_entry_t *ptep, npte, opte; pmap_t pmap; vm_offset_t va; u_int oflags; int count = 0; rw_wlock(&pvh_global_lock); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH_SAFE(pv, &pvh->pv_list, pv_list, next_pv) { va = pv->pv_va; pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)]; KASSERT((*pl1pd & L1_TYPE_MASK) == L1_S_PROTO, ("pmap_clearbit: valid section mapping expected")); if ((maskbits & PVF_WRITE) && (pv->pv_flags & PVF_WRITE)) (void)pmap_demote_section(pmap, va); else if ((maskbits & PVF_REF) && L1_S_REFERENCED(*pl1pd)) { if (pmap_demote_section(pmap, va)) { if ((pv->pv_flags & PVF_WIRED) == 0) { /* * Remove the mapping to a single page * so that a subsequent access may * repromote. Since the underlying * l2_bucket is fully populated, this * removal never frees an entire * l2_bucket. 
*/ va += (VM_PAGE_TO_PHYS(m) & L1_S_OFFSET); l2b = pmap_get_l2_bucket(pmap, va); KASSERT(l2b != NULL, ("pmap_clearbit: no l2 bucket for " "va 0x%#x, pmap 0x%p", va, pmap)); ptep = &l2b->l2b_kva[l2pte_index(va)]; *ptep = 0; PTE_SYNC(ptep); pmap_free_l2_bucket(pmap, l2b, 1); pve = pmap_remove_pv(m, pmap, va); KASSERT(pve != NULL, ("pmap_clearbit: " "no PV entry for managed mapping")); pmap_free_pv_entry(pmap, pve); } } } else if ((maskbits & PVF_MOD) && L1_S_WRITABLE(*pl1pd)) { if (pmap_demote_section(pmap, va)) { if ((pv->pv_flags & PVF_WIRED) == 0) { /* * Write protect the mapping to a * single page so that a subsequent * write access may repromote. */ va += (VM_PAGE_TO_PHYS(m) & L1_S_OFFSET); l2b = pmap_get_l2_bucket(pmap, va); KASSERT(l2b != NULL, ("pmap_clearbit: no l2 bucket for " "va 0x%#x, pmap 0x%p", va, pmap)); ptep = &l2b->l2b_kva[l2pte_index(va)]; if ((*ptep & L2_S_PROTO) != 0) { pve = pmap_find_pv(&m->md, pmap, va); KASSERT(pve != NULL, ("pmap_clearbit: no PV " "entry for managed mapping")); pve->pv_flags &= ~PVF_WRITE; *ptep |= L2_APX; PTE_SYNC(ptep); } } } } PMAP_UNLOCK(pmap); } small_mappings: if (TAILQ_EMPTY(&m->md.pv_list)) { rw_wunlock(&pvh_global_lock); return (0); } /* * Loop over all current mappings setting/clearing as appropos */ TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) { va = pv->pv_va; pmap = PV_PMAP(pv); oflags = pv->pv_flags; pv->pv_flags &= ~maskbits; PMAP_LOCK(pmap); l2b = pmap_get_l2_bucket(pmap, va); KASSERT(l2b != NULL, ("pmap_clearbit: no l2 bucket for " "va 0x%#x, pmap 0x%p", va, pmap)); ptep = &l2b->l2b_kva[l2pte_index(va)]; npte = opte = *ptep; if (maskbits & (PVF_WRITE | PVF_MOD)) { /* make the pte read only */ npte |= L2_APX; } if (maskbits & PVF_REF) { /* * Clear referenced flag in PTE so that we * will take a flag fault the next time the mapping * is referenced. */ npte &= ~L2_S_REF; } CTR4(KTR_PMAP,"clearbit: pmap:%p bits:%x pte:%x->%x", pmap, maskbits, opte, npte); if (npte != opte) { count++; *ptep = npte; PTE_SYNC(ptep); /* Flush the TLB entry if a current pmap. 
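 * Editorial note (added): PTE_BEEN_EXECD(opte) means the old entry was
 * executable and referenced, so stale translations may also sit in the
 * instruction TLB; cpu_tlb_flushID_SE() drops both the I- and D-side
 * entries, while cpu_tlb_flushD_SE() suffices for data-only mappings.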
*/ if (PTE_BEEN_EXECD(opte)) cpu_tlb_flushID_SE(pv->pv_va); else if (PTE_BEEN_REFD(opte)) cpu_tlb_flushD_SE(pv->pv_va); cpu_cpwait(); } PMAP_UNLOCK(pmap); } if (maskbits & PVF_WRITE) vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&pvh_global_lock); return (count); } /* * main pv_entry manipulation functions: * pmap_enter_pv: enter a mapping onto a vm_page list * pmap_remove_pv: remove a mappiing from a vm_page list * * NOTE: pmap_enter_pv expects to lock the pvh itself * pmap_remove_pv expects the caller to lock the pvh before calling */ /* * pmap_enter_pv: enter a mapping onto a vm_page's PV list * * => caller should hold the proper lock on pvh_global_lock * => caller should have pmap locked * => we will (someday) gain the lock on the vm_page's PV list * => caller should adjust ptp's wire_count before calling * => caller should not adjust pmap's wire_count */ static void pmap_enter_pv(struct vm_page *m, struct pv_entry *pve, pmap_t pmap, vm_offset_t va, u_int flags) { rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_ASSERT_LOCKED(pmap); pve->pv_va = va; pve->pv_flags = flags; TAILQ_INSERT_HEAD(&m->md.pv_list, pve, pv_list); if (pve->pv_flags & PVF_WIRED) ++pmap->pm_stats.wired_count; } /* * * pmap_find_pv: Find a pv entry * * => caller should hold lock on vm_page */ static PMAP_INLINE struct pv_entry * pmap_find_pv(struct md_page *md, pmap_t pmap, vm_offset_t va) { struct pv_entry *pv; rw_assert(&pvh_global_lock, RA_WLOCKED); TAILQ_FOREACH(pv, &md->pv_list, pv_list) if (pmap == PV_PMAP(pv) && va == pv->pv_va) break; return (pv); } /* * vector_page_setprot: * * Manipulate the protection of the vector page. */ void vector_page_setprot(int prot) { struct l2_bucket *l2b; pt_entry_t *ptep; l2b = pmap_get_l2_bucket(pmap_kernel(), vector_page); ptep = &l2b->l2b_kva[l2pte_index(vector_page)]; /* * Set referenced flag. * Vectors' page is always desired * to be allowed to reside in TLB. 
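 * Editorial note (added): presetting L2_S_REF here means the vector
 * page never takes a referenced-bit emulation fault; faulting while
 * entering an exception handler through this page would likely be
 * unrecoverable.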
*/ *ptep |= L2_S_REF; pmap_set_prot(ptep, prot|VM_PROT_EXECUTE, 0); PTE_SYNC(ptep); cpu_tlb_flushID_SE(vector_page); cpu_cpwait(); } static void pmap_set_prot(pt_entry_t *ptep, vm_prot_t prot, uint8_t user) { *ptep &= ~(L2_S_PROT_MASK | L2_XN); if (!(prot & VM_PROT_EXECUTE)) *ptep |= L2_XN; /* Set defaults first - kernel read access */ *ptep |= L2_APX; *ptep |= L2_S_PROT_R; /* Now tune APs as desired */ if (user) *ptep |= L2_S_PROT_U; if (prot & VM_PROT_WRITE) *ptep &= ~(L2_APX); } /* * pmap_remove_pv: try to remove a mapping from a pv_list * * => caller should hold proper lock on pmap_main_lock * => pmap should be locked * => caller should hold lock on vm_page [so that attrs can be adjusted] * => caller should adjust ptp's wire_count and free PTP if needed * => caller should NOT adjust pmap's wire_count * => we return the removed pve */ static struct pv_entry * pmap_remove_pv(struct vm_page *m, pmap_t pmap, vm_offset_t va) { struct pv_entry *pve; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_ASSERT_LOCKED(pmap); pve = pmap_find_pv(&m->md, pmap, va); /* find corresponding pve */ if (pve != NULL) { TAILQ_REMOVE(&m->md.pv_list, pve, pv_list); if (pve->pv_flags & PVF_WIRED) --pmap->pm_stats.wired_count; } if (TAILQ_EMPTY(&m->md.pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); return(pve); /* return removed pve */ } /* * * pmap_modify_pv: Update pv flags * * => caller should hold lock on vm_page [so that attrs can be adjusted] * => caller should NOT adjust pmap's wire_count * => we return the old flags * * Modify a physical-virtual mapping in the pv table */ static u_int pmap_modify_pv(struct vm_page *m, pmap_t pmap, vm_offset_t va, u_int clr_mask, u_int set_mask) { struct pv_entry *npv; u_int flags, oflags; PMAP_ASSERT_LOCKED(pmap); rw_assert(&pvh_global_lock, RA_WLOCKED); if ((npv = pmap_find_pv(&m->md, pmap, va)) == NULL) return (0); /* * There is at least one VA mapping this page. */ oflags = npv->pv_flags; npv->pv_flags = flags = (oflags & ~clr_mask) | set_mask; if ((flags ^ oflags) & PVF_WIRED) { if (flags & PVF_WIRED) ++pmap->pm_stats.wired_count; else --pmap->pm_stats.wired_count; } return (oflags); } /* Function to set the debug level of the pmap code */ #ifdef PMAP_DEBUG void pmap_debug(int level) { pmap_debug_level = level; dprintf("pmap_debug: level=%d\n", pmap_debug_level); } #endif /* PMAP_DEBUG */ void pmap_pinit0(struct pmap *pmap) { PDEBUG(1, printf("pmap_pinit0: pmap = %08x\n", (u_int32_t) pmap)); bcopy(kernel_pmap, pmap, sizeof(*pmap)); bzero(&pmap->pm_mtx, sizeof(pmap->pm_mtx)); PMAP_LOCK_INIT(pmap); TAILQ_INIT(&pmap->pm_pvchunk); } /* * Initialize a vm_page's machine-dependent fields. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); m->md.pv_memattr = VM_MEMATTR_DEFAULT; } static vm_offset_t pmap_ptelist_alloc(vm_offset_t *head) { pt_entry_t *pte; vm_offset_t va; va = *head; if (va == 0) return (va); /* Out of memory */ pte = vtopte(va); *head = *pte; if ((*head & L2_TYPE_MASK) != L2_TYPE_INV) panic("%s: va is not L2_TYPE_INV!", __func__); *pte = 0; return (va); } static void pmap_ptelist_free(vm_offset_t *head, vm_offset_t va) { pt_entry_t *pte; if ((va & L2_TYPE_MASK) != L2_TYPE_INV) panic("%s: freeing va that is not L2_TYPE INV!", __func__); pte = vtopte(va); *pte = *head; /* virtual! 
L2_TYPE is L2_TYPE_INV though */ *head = va; } static void pmap_ptelist_init(vm_offset_t *head, void *base, int npages) { int i; vm_offset_t va; *head = 0; for (i = npages - 1; i >= 0; i--) { va = (vm_offset_t)base + i * PAGE_SIZE; pmap_ptelist_free(head, va); } } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ void pmap_init(void) { vm_size_t s; int i, pv_npg; l2zone = uma_zcreate("L2 Table", L2_TABLE_SIZE_REAL, pmap_l2ptp_ctor, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); l2table_zone = uma_zcreate("L2 Table", sizeof(struct l2_dtable), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); /* * Are large page mappings supported and enabled? */ TUNABLE_INT_FETCH("vm.pmap.sp_enabled", &sp_enabled); if (sp_enabled) { KASSERT(MAXPAGESIZES > 1 && pagesizes[1] == 0, ("pmap_init: can't assign to pagesizes[1]")); pagesizes[1] = NBPDR; } /* * Calculate the size of the pv head table for superpages. * Handle the possibility that "vm_phys_segs[...].end" is zero. */ pv_npg = trunc_1mpage(vm_phys_segs[vm_phys_nsegs - 1].end - PAGE_SIZE) / NBPDR + 1; /* * Allocate memory for the pv head table for superpages. */ s = (vm_size_t)(pv_npg * sizeof(struct md_page)); s = round_page(s); pv_table = (struct md_page *)kmem_malloc(kernel_arena, s, M_WAITOK | M_ZERO); for (i = 0; i < pv_npg; i++) TAILQ_INIT(&pv_table[i].pv_list); /* * Initialize the address space for the pv chunks. */ TUNABLE_INT_FETCH("vm.pmap.shpgperproc", &shpgperproc); pv_entry_max = shpgperproc * maxproc + vm_cnt.v_page_count; TUNABLE_INT_FETCH("vm.pmap.pv_entries", &pv_entry_max); pv_entry_max = roundup(pv_entry_max, _NPCPV); pv_entry_high_water = 9 * (pv_entry_max / 10); pv_maxchunks = MAX(pv_entry_max / _NPCPV, maxproc); pv_chunkbase = (struct pv_chunk *)kva_alloc(PAGE_SIZE * pv_maxchunks); if (pv_chunkbase == NULL) panic("pmap_init: not enough kvm for pv chunks"); pmap_ptelist_init(&pv_vafree, pv_chunkbase, pv_maxchunks); /* * Now it is safe to enable pv_table recording. */ PDEBUG(1, printf("pmap_init: done!\n")); } SYSCTL_INT(_vm_pmap, OID_AUTO, pv_entry_max, CTLFLAG_RD, &pv_entry_max, 0, "Max number of PV entries"); SYSCTL_INT(_vm_pmap, OID_AUTO, shpgperproc, CTLFLAG_RD, &shpgperproc, 0, "Page share factor per proc"); static SYSCTL_NODE(_vm_pmap, OID_AUTO, section, CTLFLAG_RD, 0, "1MB page mapping counters"); static u_long pmap_section_demotions; SYSCTL_ULONG(_vm_pmap_section, OID_AUTO, demotions, CTLFLAG_RD, &pmap_section_demotions, 0, "1MB page demotions"); static u_long pmap_section_mappings; SYSCTL_ULONG(_vm_pmap_section, OID_AUTO, mappings, CTLFLAG_RD, &pmap_section_mappings, 0, "1MB page mappings"); static u_long pmap_section_p_failures; SYSCTL_ULONG(_vm_pmap_section, OID_AUTO, p_failures, CTLFLAG_RD, &pmap_section_p_failures, 0, "1MB page promotion failures"); static u_long pmap_section_promotions; SYSCTL_ULONG(_vm_pmap_section, OID_AUTO, promotions, CTLFLAG_RD, &pmap_section_promotions, 0, "1MB page promotions"); int pmap_fault_fixup(pmap_t pmap, vm_offset_t va, vm_prot_t ftype, int user) { struct l2_dtable *l2; struct l2_bucket *l2b; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep, pte; vm_paddr_t pa; u_int l1idx; int rv = 0; l1idx = L1_IDX(va); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); /* * Check and possibly fix-up L1 section mapping * only when superpage mappings are enabled to speed up. 
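 * Editorial note (added): the section fast path below resolves a
 * reference fault by just setting L1_S_REF on the L1 entry, while a
 * write fault on a referenced, read-only section demotes it to small
 * pages so the ordinary small-page emulation can take over on the
 * refault.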
*/ if (sp_enabled) { pl1pd = &pmap->pm_l1->l1_kva[l1idx]; l1pd = *pl1pd; if ((l1pd & L1_TYPE_MASK) == L1_S_PROTO) { /* Catch an access to the vectors section */ if (l1idx == L1_IDX(vector_page)) goto out; /* * Stay away from the kernel mappings. * None of them should fault from L1 entry. */ if (pmap == pmap_kernel()) goto out; /* * Catch a forbidden userland access */ if (user && !(l1pd & L1_S_PROT_U)) goto out; /* * Superpage is always either mapped read only * or it is modified and permitted to be written * by default. Therefore, process only reference * flag fault and demote page in case of write fault. */ if ((ftype & VM_PROT_WRITE) && !L1_S_WRITABLE(l1pd) && L1_S_REFERENCED(l1pd)) { (void)pmap_demote_section(pmap, va); goto out; } else if (!L1_S_REFERENCED(l1pd)) { /* Mark the page "referenced" */ *pl1pd = l1pd | L1_S_REF; PTE_SYNC(pl1pd); goto l1_section_out; } else goto out; } } /* * If there is no l2_dtable for this address, then the process * has no business accessing it. * * Note: This will catch userland processes trying to access * kernel addresses. */ l2 = pmap->pm_l2[L2_IDX(l1idx)]; if (l2 == NULL) goto out; /* * Likewise if there is no L2 descriptor table */ l2b = &l2->l2_bucket[L2_BUCKET(l1idx)]; if (l2b->l2b_kva == NULL) goto out; /* * Check the PTE itself. */ ptep = &l2b->l2b_kva[l2pte_index(va)]; pte = *ptep; if (pte == 0) goto out; /* * Catch a userland access to the vector page mapped at 0x0 */ if (user && !(pte & L2_S_PROT_U)) goto out; if (va == vector_page) goto out; pa = l2pte_pa(pte); CTR5(KTR_PMAP, "pmap_fault_fix: pmap:%p va:%x pte:0x%x ftype:%x user:%x", pmap, va, pte, ftype, user); if ((ftype & VM_PROT_WRITE) && !(L2_S_WRITABLE(pte)) && L2_S_REFERENCED(pte)) { /* * This looks like a good candidate for "page modified" * emulation... */ struct pv_entry *pv; struct vm_page *m; /* Extract the physical address of the page */ if ((m = PHYS_TO_VM_PAGE(pa)) == NULL) { goto out; } /* Get the current flags for this page. */ pv = pmap_find_pv(&m->md, pmap, va); if (pv == NULL) { goto out; } /* * Do the flags say this page is writable? If not then it * is a genuine write fault. If yes then the write fault is * our fault as we did not reflect the write access in the * PTE. Now we know a write has occurred we can correct this * and also set the modified bit */ if ((pv->pv_flags & PVF_WRITE) == 0) { goto out; } vm_page_dirty(m); /* Re-enable write permissions for the page */ *ptep = (pte & ~L2_APX); PTE_SYNC(ptep); rv = 1; CTR1(KTR_PMAP, "pmap_fault_fix: new pte:0x%x", *ptep); } else if (!L2_S_REFERENCED(pte)) { /* * This looks like a good candidate for "page referenced" * emulation. */ struct pv_entry *pv; struct vm_page *m; /* Extract the physical address of the page */ if ((m = PHYS_TO_VM_PAGE(pa)) == NULL) goto out; /* Get the current flags for this page. */ pv = pmap_find_pv(&m->md, pmap, va); if (pv == NULL) goto out; vm_page_aflag_set(m, PGA_REFERENCED); /* Mark the page "referenced" */ *ptep = pte | L2_S_REF; PTE_SYNC(ptep); rv = 1; CTR1(KTR_PMAP, "pmap_fault_fix: new pte:0x%x", *ptep); } /* * We know there is a valid mapping here, so simply * fix up the L1 if necessary. */ pl1pd = &pmap->pm_l1->l1_kva[l1idx]; l1pd = l2b->l2b_phys | L1_C_DOM(pmap->pm_domain) | L1_C_PROTO; if (*pl1pd != l1pd) { *pl1pd = l1pd; PTE_SYNC(pl1pd); rv = 1; } #ifdef DEBUG /* * If 'rv == 0' at this point, it generally indicates that there is a * stale TLB entry for the faulting address. This happens when two or * more processes are sharing an L1. 
Since we don't flush the TLB on * a context switch between such processes, we can take domain faults * for mappings which exist at the same VA in both processes. EVEN IF * WE'VE RECENTLY FIXED UP THE CORRESPONDING L1 in pmap_enter(), for * example. * * This is extremely likely to happen if pmap_enter() updated the L1 * entry for a recently entered mapping. In this case, the TLB is * flushed for the new mapping, but there may still be TLB entries for * other mappings belonging to other processes in the 1MB range * covered by the L1 entry. * * Since 'rv == 0', we know that the L1 already contains the correct * value, so the fault must be due to a stale TLB entry. * * Since we always need to flush the TLB anyway in the case where we * fixed up the L1, or frobbed the L2 PTE, we effectively deal with * stale TLB entries dynamically. * * However, the above condition can ONLY happen if the current L1 is * being shared. If it happens when the L1 is unshared, it indicates * that other parts of the pmap are not doing their job WRT managing * the TLB. */ if (rv == 0 && pmap->pm_l1->l1_domain_use_count == 1) { printf("fixup: pmap %p, va 0x%08x, ftype %d - nothing to do!\n", pmap, va, ftype); printf("fixup: l2 %p, l2b %p, ptep %p, pl1pd %p\n", l2, l2b, ptep, pl1pd); printf("fixup: pte 0x%x, l1pd 0x%x, last code 0x%x\n", pte, l1pd, last_fault_code); #ifdef DDB Debugger(); #endif } #endif l1_section_out: cpu_tlb_flushID_SE(va); cpu_cpwait(); rv = 1; out: rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (rv); } void pmap_postinit(void) { struct l2_bucket *l2b; struct l1_ttable *l1; pd_entry_t *pl1pt; pt_entry_t *ptep, pte; vm_offset_t va, eva; u_int loop, needed; needed = (maxproc / PMAP_DOMAINS) + ((maxproc % PMAP_DOMAINS) ? 1 : 0); needed -= 1; l1 = malloc(sizeof(*l1) * needed, M_VMPMAP, M_WAITOK); for (loop = 0; loop < needed; loop++, l1++) { /* Allocate a L1 page table */ va = (vm_offset_t)contigmalloc(L1_TABLE_SIZE, M_VMPMAP, 0, 0x0, 0xffffffff, L1_TABLE_SIZE, 0); if (va == 0) panic("Cannot allocate L1 KVM"); eva = va + L1_TABLE_SIZE; pl1pt = (pd_entry_t *)va; while (va < eva) { l2b = pmap_get_l2_bucket(pmap_kernel(), va); ptep = &l2b->l2b_kva[l2pte_index(va)]; pte = *ptep; pte = (pte & ~L2_S_CACHE_MASK) | pte_l2_s_cache_mode_pt; *ptep = pte; PTE_SYNC(ptep); cpu_tlb_flushID_SE(va); cpu_cpwait(); va += PAGE_SIZE; } pmap_init_l1(l1, pl1pt); } #ifdef DEBUG printf("pmap_postinit: Allocated %d static L1 descriptor tables\n", needed); #endif } /* * This is used to stuff certain critical values into the PCB where they * can be accessed quickly from cpu_switch() et al. 
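 * Editorial example (added): pcb_dacr packs a two-bit access code per
 * domain. Assuming the architectural client encoding (01b) and
 * PMAP_DOMAIN_KERNEL == 0, a pmap with pm_domain == 5 gets
 *
 *	pcb_dacr = (DOMAIN_CLIENT << 0) | (DOMAIN_CLIENT << 10) = 0x401
 *
 * so after cpu_domains(pcb_dacr) only the kernel's domain and the
 * pmap's own domain are accessible.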
*/ void pmap_set_pcb_pagedir(pmap_t pmap, struct pcb *pcb) { struct l2_bucket *l2b; pcb->pcb_pagedir = pmap->pm_l1->l1_physaddr; pcb->pcb_dacr = (DOMAIN_CLIENT << (PMAP_DOMAIN_KERNEL * 2)) | (DOMAIN_CLIENT << (pmap->pm_domain * 2)); if (vector_page < KERNBASE) { pcb->pcb_pl1vec = &pmap->pm_l1->l1_kva[L1_IDX(vector_page)]; l2b = pmap_get_l2_bucket(pmap, vector_page); pcb->pcb_l1vec = l2b->l2b_phys | L1_C_PROTO | L1_C_DOM(pmap->pm_domain) | L1_C_DOM(PMAP_DOMAIN_KERNEL); } else pcb->pcb_pl1vec = NULL; } void pmap_activate(struct thread *td) { pmap_t pmap; struct pcb *pcb; pmap = vmspace_pmap(td->td_proc->p_vmspace); pcb = td->td_pcb; critical_enter(); pmap_set_pcb_pagedir(pmap, pcb); if (td == curthread) { u_int cur_dacr, cur_ttb; __asm __volatile("mrc p15, 0, %0, c2, c0, 0" : "=r"(cur_ttb)); __asm __volatile("mrc p15, 0, %0, c3, c0, 0" : "=r"(cur_dacr)); cur_ttb &= ~(L1_TABLE_SIZE - 1); if (cur_ttb == (u_int)pcb->pcb_pagedir && cur_dacr == pcb->pcb_dacr) { /* * No need to switch address spaces. */ critical_exit(); return; } /* * We MUST, I repeat, MUST fix up the L1 entry corresponding * to 'vector_page' in the incoming L1 table before switching * to it otherwise subsequent interrupts/exceptions (including * domain faults!) will jump into hyperspace. */ if (pcb->pcb_pl1vec) { *pcb->pcb_pl1vec = pcb->pcb_l1vec; } cpu_domains(pcb->pcb_dacr); cpu_setttb(pcb->pcb_pagedir); } critical_exit(); } static int pmap_set_pt_cache_mode(pd_entry_t *kl1, vm_offset_t va) { pd_entry_t *pdep, pde; pt_entry_t *ptep, pte; vm_offset_t pa; int rv = 0; /* * Make sure the descriptor itself has the correct cache mode */ pdep = &kl1[L1_IDX(va)]; pde = *pdep; if (l1pte_section_p(pde)) { if ((pde & L1_S_CACHE_MASK) != pte_l1_s_cache_mode_pt) { *pdep = (pde & ~L1_S_CACHE_MASK) | pte_l1_s_cache_mode_pt; PTE_SYNC(pdep); rv = 1; } } else { pa = (vm_paddr_t)(pde & L1_C_ADDR_MASK); ptep = (pt_entry_t *)kernel_pt_lookup(pa); if (ptep == NULL) panic("pmap_bootstrap: No L2 for L2 @ va %p\n", ptep); ptep = &ptep[l2pte_index(va)]; pte = *ptep; if ((pte & L2_S_CACHE_MASK) != pte_l2_s_cache_mode_pt) { *ptep = (pte & ~L2_S_CACHE_MASK) | pte_l2_s_cache_mode_pt; PTE_SYNC(ptep); rv = 1; } } return (rv); } static void pmap_alloc_specials(vm_offset_t *availp, int pages, vm_offset_t *vap, pt_entry_t **ptep) { vm_offset_t va = *availp; struct l2_bucket *l2b; if (ptep) { l2b = pmap_get_l2_bucket(pmap_kernel(), va); if (l2b == NULL) panic("pmap_alloc_specials: no l2b for 0x%x", va); *ptep = &l2b->l2b_kva[l2pte_index(va)]; } *vap = va; *availp = va + (PAGE_SIZE * pages); } /* * Bootstrap the system enough to run with virtual memory. * * On the arm this is called after mapping has already been enabled * and just syncs the pmap module with what has already been done. 
* [We can't call it easily with mapping off since the kernel is not * mapped with PA == VA, hence we would have to relocate every address * from the linked base (virtual) address "KERNBASE" to the actual * (physical) address starting relative to 0] */ #define PMAP_STATIC_L2_SIZE 16 void pmap_bootstrap(vm_offset_t firstaddr, struct pv_addr *l1pt) { static struct l1_ttable static_l1; static struct l2_dtable static_l2[PMAP_STATIC_L2_SIZE]; struct l1_ttable *l1 = &static_l1; struct l2_dtable *l2; struct l2_bucket *l2b; struct czpages *czp; pd_entry_t pde; pd_entry_t *kernel_l1pt = (pd_entry_t *)l1pt->pv_va; pt_entry_t *ptep; vm_paddr_t pa; vm_offset_t va; vm_size_t size; int i, l1idx, l2idx, l2next = 0; PDEBUG(1, printf("firstaddr = %08x, lastaddr = %08x\n", firstaddr, vm_max_kernel_address)); virtual_avail = firstaddr; kernel_pmap->pm_l1 = l1; kernel_l1pa = l1pt->pv_pa; /* * Scan the L1 translation table created by initarm() and create * the required metadata for all valid mappings found in it. */ for (l1idx = 0; l1idx < (L1_TABLE_SIZE / sizeof(pd_entry_t)); l1idx++) { pde = kernel_l1pt[l1idx]; /* * We're only interested in Coarse mappings. * pmap_extract() can deal with section mappings without * recourse to checking L2 metadata. */ if ((pde & L1_TYPE_MASK) != L1_TYPE_C) continue; /* * Lookup the KVA of this L2 descriptor table */ pa = (vm_paddr_t)(pde & L1_C_ADDR_MASK); ptep = (pt_entry_t *)kernel_pt_lookup(pa); if (ptep == NULL) { panic("pmap_bootstrap: No L2 for va 0x%x, pa 0x%lx", (u_int)l1idx << L1_S_SHIFT, (long unsigned int)pa); } /* * Fetch the associated L2 metadata structure. * Allocate a new one if necessary. */ if ((l2 = kernel_pmap->pm_l2[L2_IDX(l1idx)]) == NULL) { if (l2next == PMAP_STATIC_L2_SIZE) panic("pmap_bootstrap: out of static L2s"); kernel_pmap->pm_l2[L2_IDX(l1idx)] = l2 = &static_l2[l2next++]; } /* * One more L1 slot tracked... */ l2->l2_occupancy++; /* * Fill in the details of the L2 descriptor in the * appropriate bucket. */ l2b = &l2->l2_bucket[L2_BUCKET(l1idx)]; l2b->l2b_kva = ptep; l2b->l2b_phys = pa; l2b->l2b_l1idx = l1idx; /* * Establish an initial occupancy count for this descriptor */ for (l2idx = 0; l2idx < (L2_TABLE_SIZE_REAL / sizeof(pt_entry_t)); l2idx++) { if ((ptep[l2idx] & L2_TYPE_MASK) != L2_TYPE_INV) { l2b->l2b_occupancy++; } } /* * Make sure the descriptor itself has the correct cache mode. * If not, fix it, but whine about the problem. Port-meisters * should consider this a clue to fix up their initarm() * function. :) */ if (pmap_set_pt_cache_mode(kernel_l1pt, (vm_offset_t)ptep)) { printf("pmap_bootstrap: WARNING! wrong cache mode for " "L2 pte @ %p\n", ptep); } } /* * Ensure the primary (kernel) L1 has the correct cache mode for * a page table. Bitch if it is not correctly set. */ for (va = (vm_offset_t)kernel_l1pt; va < ((vm_offset_t)kernel_l1pt + L1_TABLE_SIZE); va += PAGE_SIZE) { if (pmap_set_pt_cache_mode(kernel_l1pt, va)) printf("pmap_bootstrap: WARNING! wrong cache mode for " "primary L1 @ 0x%x\n", va); } cpu_dcache_wbinv_all(); cpu_l2cache_wbinv_all(); cpu_tlb_flushID(); cpu_cpwait(); PMAP_LOCK_INIT(kernel_pmap); CPU_FILL(&kernel_pmap->pm_active); kernel_pmap->pm_domain = PMAP_DOMAIN_KERNEL; TAILQ_INIT(&kernel_pmap->pm_pvchunk); /* * Initialize the global pv list lock. */ rw_init(&pvh_global_lock, "pmap pv global"); /* * Reserve some special page table entries/VA space for temporary * mapping of pages that are being copied or zeroed. 
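 *
 * A hedged sketch of the per-CPU layout initialized just below (one
 * struct czpages per CPU, up to MAXCPU):
 *
 *	czp->lock                - serializes use of this CPU's pages
 *	czp->srcva, czp->srcptep - VA and PTE slot for the source page
 *	czp->dstva, czp->dstptep - VA and PTE slot for the destination
 *
 * Keeping a private pair per CPU means the page copy/zero paths should
 * only ever contend on the per-CPU mutex, never on a global one.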
*/ for (czp = cpu_czpages, i = 0; i < MAXCPU; ++i, ++czp) { mtx_init(&czp->lock, "czpages", NULL, MTX_DEF); pmap_alloc_specials(&virtual_avail, 1, &czp->srcva, &czp->srcptep); pmap_set_pt_cache_mode(kernel_l1pt, (vm_offset_t)czp->srcptep); pmap_alloc_specials(&virtual_avail, 1, &czp->dstva, &czp->dstptep); pmap_set_pt_cache_mode(kernel_l1pt, (vm_offset_t)czp->dstptep); } size = ((vm_max_kernel_address - pmap_curmaxkvaddr) + L1_S_OFFSET) / L1_S_SIZE; pmap_alloc_specials(&virtual_avail, round_page(size * L2_TABLE_SIZE_REAL) / PAGE_SIZE, &pmap_kernel_l2ptp_kva, NULL); size = (size + (L2_BUCKET_SIZE - 1)) / L2_BUCKET_SIZE; pmap_alloc_specials(&virtual_avail, round_page(size * sizeof(struct l2_dtable)) / PAGE_SIZE, &pmap_kernel_l2dtable_kva, NULL); pmap_alloc_specials(&virtual_avail, 1, (vm_offset_t*)&_tmppt, NULL); pmap_alloc_specials(&virtual_avail, MAXDUMPPGS, (vm_offset_t *)&crashdumpmap, NULL); SLIST_INIT(&l1_list); TAILQ_INIT(&l1_lru_list); mtx_init(&l1_lru_lock, "l1 list lock", NULL, MTX_DEF); pmap_init_l1(l1, kernel_l1pt); cpu_dcache_wbinv_all(); cpu_l2cache_wbinv_all(); cpu_tlb_flushID(); cpu_cpwait(); virtual_avail = round_page(virtual_avail); virtual_end = vm_max_kernel_address; kernel_vm_end = pmap_curmaxkvaddr; pmap_set_pcb_pagedir(kernel_pmap, thread0.td_pcb); } /*************************************************** * Pmap allocation/deallocation routines. ***************************************************/ /* * Release any resources held by the given physical map. * Called when a pmap initialized by pmap_pinit is being released. * Should only be called if the map contains no valid mappings. */ void pmap_release(pmap_t pmap) { struct pcb *pcb; cpu_tlb_flushID(); cpu_cpwait(); if (vector_page < KERNBASE) { struct pcb *curpcb = PCPU_GET(curpcb); pcb = thread0.td_pcb; if (pmap_is_current(pmap)) { /* * Frob the L1 entry corresponding to the vector * page so that it contains the kernel pmap's domain * number. This will ensure pmap_remove() does not * pull the current vector page out from under us. */ critical_enter(); *pcb->pcb_pl1vec = pcb->pcb_l1vec; cpu_domains(pcb->pcb_dacr); cpu_setttb(pcb->pcb_pagedir); critical_exit(); } pmap_remove(pmap, vector_page, vector_page + PAGE_SIZE); /* * Make sure cpu_switch(), et al, DTRT. This is safe to do * since this process has no remaining mappings of its own. */ curpcb->pcb_pl1vec = pcb->pcb_pl1vec; curpcb->pcb_l1vec = pcb->pcb_l1vec; curpcb->pcb_dacr = pcb->pcb_dacr; curpcb->pcb_pagedir = pcb->pcb_pagedir; } pmap_free_l1(pmap); dprintf("pmap_release()\n"); } /* * Helper function for pmap_grow_l2_bucket() */ static __inline int pmap_grow_map(vm_offset_t va, pt_entry_t cache_mode, vm_paddr_t *pap) { struct l2_bucket *l2b; pt_entry_t *ptep; vm_paddr_t pa; struct vm_page *m; m = vm_page_alloc(NULL, 0, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED); if (m == NULL) return (1); pa = VM_PAGE_TO_PHYS(m); if (pap) *pap = pa; l2b = pmap_get_l2_bucket(pmap_kernel(), va); ptep = &l2b->l2b_kva[l2pte_index(va)]; *ptep = L2_S_PROTO | pa | cache_mode | L2_S_REF; pmap_set_prot(ptep, VM_PROT_READ | VM_PROT_WRITE, 0); PTE_SYNC(ptep); cpu_tlb_flushD_SE(va); cpu_cpwait(); return (0); } /* * This is the same as pmap_alloc_l2_bucket(), except that it is only * used by pmap_growkernel(). 
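 *
 * Unlike the general allocator it never sleeps: backing pages for the
 * kernel's l2_dtable and L2 page-table KVA are carved out of the
 * regions reserved above in pmap_bootstrap() and are populated on
 * demand with pmap_grow_map(), as the code below shows.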
*/ static __inline struct l2_bucket * pmap_grow_l2_bucket(pmap_t pmap, vm_offset_t va) { struct l2_dtable *l2; struct l2_bucket *l2b; struct l1_ttable *l1; pd_entry_t *pl1pd; u_short l1idx; vm_offset_t nva; l1idx = L1_IDX(va); if ((l2 = pmap->pm_l2[L2_IDX(l1idx)]) == NULL) { /* * No mapping at this address, as there is * no entry in the L1 table. * Need to allocate a new l2_dtable. */ nva = pmap_kernel_l2dtable_kva; if ((nva & PAGE_MASK) == 0) { /* * Need to allocate a backing page */ if (pmap_grow_map(nva, pte_l2_s_cache_mode, NULL)) return (NULL); } l2 = (struct l2_dtable *)nva; nva += sizeof(struct l2_dtable); if ((nva & PAGE_MASK) < (pmap_kernel_l2dtable_kva & PAGE_MASK)) { /* * The new l2_dtable straddles a page boundary. * Map in another page to cover it. */ if (pmap_grow_map(nva, pte_l2_s_cache_mode, NULL)) return (NULL); } pmap_kernel_l2dtable_kva = nva; /* * Link it into the parent pmap */ pmap->pm_l2[L2_IDX(l1idx)] = l2; memset(l2, 0, sizeof(*l2)); } l2b = &l2->l2_bucket[L2_BUCKET(l1idx)]; /* * Fetch pointer to the L2 page table associated with the address. */ if (l2b->l2b_kva == NULL) { pt_entry_t *ptep; /* * No L2 page table has been allocated. Chances are, this * is because we just allocated the l2_dtable, above. */ nva = pmap_kernel_l2ptp_kva; ptep = (pt_entry_t *)nva; if ((nva & PAGE_MASK) == 0) { /* * Need to allocate a backing page */ if (pmap_grow_map(nva, pte_l2_s_cache_mode_pt, &pmap_kernel_l2ptp_phys)) return (NULL); } memset(ptep, 0, L2_TABLE_SIZE_REAL); l2->l2_occupancy++; l2b->l2b_kva = ptep; l2b->l2b_l1idx = l1idx; l2b->l2b_phys = pmap_kernel_l2ptp_phys; pmap_kernel_l2ptp_kva += L2_TABLE_SIZE_REAL; pmap_kernel_l2ptp_phys += L2_TABLE_SIZE_REAL; } /* Distribute new L1 entry to all other L1s */ SLIST_FOREACH(l1, &l1_list, l1_link) { pl1pd = &l1->l1_kva[L1_IDX(va)]; *pl1pd = l2b->l2b_phys | L1_C_DOM(PMAP_DOMAIN_KERNEL) | L1_C_PROTO; PTE_SYNC(pl1pd); } cpu_tlb_flushID_SE(va); cpu_cpwait(); return (l2b); } /* * grow the number of kernel page table entries, if needed */ void pmap_growkernel(vm_offset_t addr) { pmap_t kpmap = pmap_kernel(); if (addr <= pmap_curmaxkvaddr) return; /* we are OK */ /* * whoops! we need to add kernel PTPs */ /* Map 1MB at a time */ for (; pmap_curmaxkvaddr < addr; pmap_curmaxkvaddr += L1_S_SIZE) pmap_grow_l2_bucket(kpmap, pmap_curmaxkvaddr); kernel_vm_end = pmap_curmaxkvaddr; } /* * Returns TRUE if the given page is mapped individually or as part of * a 1MB section. Otherwise, returns FALSE. */ boolean_t pmap_page_is_mapped(vm_page_t m) { boolean_t rv; if ((m->oflags & VPO_UNMANAGED) != 0) return (FALSE); rw_wlock(&pvh_global_lock); rv = !TAILQ_EMPTY(&m->md.pv_list) || ((m->flags & PG_FICTITIOUS) == 0 && !TAILQ_EMPTY(&pa_to_pvh(VM_PAGE_TO_PHYS(m))->pv_list)); rw_wunlock(&pvh_global_lock); return (rv); } /* * Remove all pages from specified address space * this aids process exit speeds. Also, this code * is special cased for current process only, but * can have the more generic (and slightly slower) * mode enabled. This is much faster than pmap_remove * in the case of running down an entire address space. 
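 *
 * The speed comes from iterating the pmap's pv chunks (pm_pvchunk)
 * instead of walking every VA in the address space, and from issuing
 * a single full TLB flush at the end rather than one flush per
 * removed mapping.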
*/ void pmap_remove_pages(pmap_t pmap) { struct pv_entry *pv; struct l2_bucket *l2b = NULL; struct pv_chunk *pc, *npc; struct md_page *pvh; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep; vm_page_t m, mt; vm_offset_t va; uint32_t inuse, bitmask; int allfree, bit, field, idx; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); TAILQ_FOREACH_SAFE(pc, &pmap->pm_pvchunk, pc_list, npc) { allfree = 1; for (field = 0; field < _NPCM; field++) { inuse = ~pc->pc_map[field] & pc_freemask[field]; while (inuse != 0) { bit = ffs(inuse) - 1; bitmask = 1ul << bit; idx = field * sizeof(inuse) * NBBY + bit; pv = &pc->pc_pventry[idx]; va = pv->pv_va; inuse &= ~bitmask; if (pv->pv_flags & PVF_WIRED) { /* Cannot remove wired pages now. */ allfree = 0; continue; } pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)]; l1pd = *pl1pd; l2b = pmap_get_l2_bucket(pmap, va); if ((l1pd & L1_TYPE_MASK) == L1_S_PROTO) { pvh = pa_to_pvh(l1pd & L1_S_FRAME); TAILQ_REMOVE(&pvh->pv_list, pv, pv_list); if (TAILQ_EMPTY(&pvh->pv_list)) { m = PHYS_TO_VM_PAGE(l1pd & L1_S_FRAME); KASSERT((vm_offset_t)m >= KERNBASE, ("Trying to access non-existent page " "va %x l1pd %x", trunc_1mpage(va), l1pd)); for (mt = m; mt < &m[L2_PTE_NUM_TOTAL]; mt++) { if (TAILQ_EMPTY(&mt->md.pv_list)) vm_page_aflag_clear(mt, PGA_WRITEABLE); } } if (l2b != NULL) { KASSERT(l2b->l2b_occupancy == L2_PTE_NUM_TOTAL, ("pmap_remove_pages: l2_bucket occupancy error")); pmap_free_l2_bucket(pmap, l2b, L2_PTE_NUM_TOTAL); } pmap->pm_stats.resident_count -= L2_PTE_NUM_TOTAL; *pl1pd = 0; PTE_SYNC(pl1pd); } else { KASSERT(l2b != NULL, ("No L2 bucket in pmap_remove_pages")); ptep = &l2b->l2b_kva[l2pte_index(va)]; m = PHYS_TO_VM_PAGE(l2pte_pa(*ptep)); KASSERT((vm_offset_t)m >= KERNBASE, ("Trying to access non-existent page " "va %x pte %x", va, *ptep)); TAILQ_REMOVE(&m->md.pv_list, pv, pv_list); if (TAILQ_EMPTY(&m->md.pv_list) && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(l2pte_pa(*ptep)); if (TAILQ_EMPTY(&pvh->pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); } *ptep = 0; PTE_SYNC(ptep); pmap_free_l2_bucket(pmap, l2b, 1); pmap->pm_stats.resident_count--; } /* Mark free */ PV_STAT(pv_entry_frees++); PV_STAT(pv_entry_spare++); pv_entry_count--; pc->pc_map[field] |= bitmask; } } if (allfree) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); pmap_free_pv_chunk(pc); } } rw_wunlock(&pvh_global_lock); cpu_tlb_flushID(); cpu_cpwait(); PMAP_UNLOCK(pmap); } static void pmap_init_qpages(void) { struct pcpu *pc; struct l2_bucket *l2b; int i; CPU_FOREACH(i) { pc = pcpu_find(i); pc->pc_qmap_addr = kva_alloc(PAGE_SIZE); if (pc->pc_qmap_addr == 0) panic("pmap_init_qpages: unable to allocate KVA"); l2b = pmap_get_l2_bucket(pmap_kernel(), pc->pc_qmap_addr); if (l2b == NULL) l2b = pmap_grow_l2_bucket(pmap_kernel(), pc->pc_qmap_addr); if (l2b == NULL) panic("pmap_alloc_specials: no l2b for 0x%x", pc->pc_qmap_addr); pc->pc_qmap_pte = &l2b->l2b_kva[l2pte_index(pc->pc_qmap_addr)]; } } SYSINIT(qpages_init, SI_SUB_CPU, SI_ORDER_ANY, pmap_init_qpages, NULL); /*************************************************** * Low level mapping routines..... ***************************************************/ #ifdef ARM_HAVE_SUPERSECTIONS /* Map a super section into the KVA. 
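 *
 * Supersections are 16MB mappings; note below how bits 32-35 of the
 * physical address are carried in bits 20-23 of the L1 descriptor,
 * and how the descriptor is replicated into every L1 table on the
 * l1_list.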
*/ void pmap_kenter_supersection(vm_offset_t va, uint64_t pa, int flags) { pd_entry_t pd = L1_S_PROTO | L1_S_SUPERSEC | (pa & L1_SUP_FRAME) | (((pa >> 32) & 0xf) << 20) | L1_S_PROT(PTE_KERNEL, VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE) | L1_S_DOM(PMAP_DOMAIN_KERNEL); struct l1_ttable *l1; vm_offset_t va0, va_end; KASSERT(((va | pa) & L1_SUP_OFFSET) == 0, ("Not a valid super section mapping")); if (flags & SECTION_CACHE) pd |= pte_l1_s_cache_mode; else if (flags & SECTION_PT) pd |= pte_l1_s_cache_mode_pt; va0 = va & L1_SUP_FRAME; va_end = va + L1_SUP_SIZE; SLIST_FOREACH(l1, &l1_list, l1_link) { va = va0; for (; va < va_end; va += L1_S_SIZE) { l1->l1_kva[L1_IDX(va)] = pd; PTE_SYNC(&l1->l1_kva[L1_IDX(va)]); } } } #endif /* Map a section into the KVA. */ void pmap_kenter_section(vm_offset_t va, vm_offset_t pa, int flags) { pd_entry_t pd = L1_S_PROTO | pa | L1_S_PROT(PTE_KERNEL, VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE) | L1_S_REF | L1_S_DOM(PMAP_DOMAIN_KERNEL); struct l1_ttable *l1; KASSERT(((va | pa) & L1_S_OFFSET) == 0, ("Not a valid section mapping")); if (flags & SECTION_CACHE) pd |= pte_l1_s_cache_mode; else if (flags & SECTION_PT) pd |= pte_l1_s_cache_mode_pt; SLIST_FOREACH(l1, &l1_list, l1_link) { l1->l1_kva[L1_IDX(va)] = pd; PTE_SYNC(&l1->l1_kva[L1_IDX(va)]); } cpu_tlb_flushID_SE(va); cpu_cpwait(); } /* * Make a temporary mapping for a physical address. This is only intended * to be used for panic dumps. */ void * pmap_kenter_temporary(vm_paddr_t pa, int i) { vm_offset_t va; va = (vm_offset_t)crashdumpmap + (i * PAGE_SIZE); pmap_kenter(va, pa); return ((void *)crashdumpmap); } /* * add a wired page to the kva * note that in order for the mapping to take effect -- you * should do a invltlb after doing the pmap_kenter... */ static PMAP_INLINE void pmap_kenter_internal(vm_offset_t va, vm_offset_t pa, int flags) { struct l2_bucket *l2b; pt_entry_t *ptep; pt_entry_t opte; PDEBUG(1, printf("pmap_kenter: va = %08x, pa = %08x\n", (uint32_t) va, (uint32_t) pa)); l2b = pmap_get_l2_bucket(pmap_kernel(), va); if (l2b == NULL) l2b = pmap_grow_l2_bucket(pmap_kernel(), va); KASSERT(l2b != NULL, ("No L2 Bucket")); ptep = &l2b->l2b_kva[l2pte_index(va)]; opte = *ptep; if (flags & KENTER_CACHE) *ptep = L2_S_PROTO | l2s_mem_types[PTE_CACHE] | pa | L2_S_REF; else if (flags & KENTER_DEVICE) *ptep = L2_S_PROTO | l2s_mem_types[PTE_DEVICE] | pa | L2_S_REF; else *ptep = L2_S_PROTO | l2s_mem_types[PTE_NOCACHE] | pa | L2_S_REF; if (flags & KENTER_CACHE) { pmap_set_prot(ptep, VM_PROT_READ | VM_PROT_WRITE, flags & KENTER_USER); } else { pmap_set_prot(ptep, VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE, 0); } PTE_SYNC(ptep); if (l2pte_valid(opte)) { if (L2_S_EXECUTABLE(opte) || L2_S_EXECUTABLE(*ptep)) cpu_tlb_flushID_SE(va); else cpu_tlb_flushD_SE(va); } else { if (opte == 0) l2b->l2b_occupancy++; } cpu_cpwait(); PDEBUG(1, printf("pmap_kenter: pte = %08x, opte = %08x, npte = %08x\n", (uint32_t) ptep, opte, *ptep)); } void pmap_kenter(vm_offset_t va, vm_paddr_t pa) { pmap_kenter_internal(va, pa, KENTER_CACHE); } void pmap_kenter_nocache(vm_offset_t va, vm_paddr_t pa) { pmap_kenter_internal(va, pa, 0); } void pmap_kenter_device(vm_offset_t va, vm_size_t size, vm_paddr_t pa) { vm_offset_t sva; KASSERT((size & PAGE_MASK) == 0, ("%s: device mapping not page-sized", __func__)); sva = va; while (size != 0) { pmap_kenter_internal(va, pa, KENTER_DEVICE); va += PAGE_SIZE; pa += PAGE_SIZE; size -= PAGE_SIZE; } } void pmap_kremove_device(vm_offset_t va, vm_size_t size) { vm_offset_t sva; KASSERT((size & PAGE_MASK) == 0, ("%s: 
device mapping not page-sized", __func__)); sva = va; while (size != 0) { pmap_kremove(va); va += PAGE_SIZE; size -= PAGE_SIZE; } } void pmap_kenter_user(vm_offset_t va, vm_paddr_t pa) { pmap_kenter_internal(va, pa, KENTER_CACHE|KENTER_USER); /* * Call pmap_fault_fixup now, to make sure we'll have no exception * at the first use of the new address, or bad things will happen, * as we use one of these addresses in the exception handlers. */ pmap_fault_fixup(pmap_kernel(), va, VM_PROT_READ|VM_PROT_WRITE, 1); } vm_paddr_t pmap_kextract(vm_offset_t va) { if (kernel_vm_end == 0) return (0); return (pmap_extract_locked(kernel_pmap, va)); } /* * remove a page from the kernel pagetables */ void pmap_kremove(vm_offset_t va) { struct l2_bucket *l2b; pt_entry_t *ptep, opte; l2b = pmap_get_l2_bucket(pmap_kernel(), va); if (!l2b) return; KASSERT(l2b != NULL, ("No L2 Bucket")); ptep = &l2b->l2b_kva[l2pte_index(va)]; opte = *ptep; if (l2pte_valid(opte)) { va = va & ~PAGE_MASK; *ptep = 0; PTE_SYNC(ptep); if (L2_S_EXECUTABLE(opte)) cpu_tlb_flushID_SE(va); else cpu_tlb_flushD_SE(va); cpu_cpwait(); } } /* * Used to map a range of physical addresses into kernel * virtual address space. * * The value passed in '*virt' is a suggested virtual address for * the mapping. Architectures which can support a direct-mapped * physical to virtual region can return the appropriate address * within that region, leaving '*virt' unchanged. Other * architectures should map the pages starting at '*virt' and * update '*virt' with the first usable address after the mapped * region. */ vm_offset_t pmap_map(vm_offset_t *virt, vm_offset_t start, vm_offset_t end, int prot) { vm_offset_t sva = *virt; vm_offset_t va = sva; PDEBUG(1, printf("pmap_map: virt = %08x, start = %08x, end = %08x, " "prot = %d\n", (uint32_t) *virt, (uint32_t) start, (uint32_t) end, prot)); while (start < end) { pmap_kenter(va, start); va += PAGE_SIZE; start += PAGE_SIZE; } *virt = va; return (sva); } /* * Add a list of wired pages to the kva * this routine is only used for temporary * kernel mappings that do not need to have * page modification or references recorded. * Note that old mappings are simply written * over. The page *must* be wired. */ void pmap_qenter(vm_offset_t va, vm_page_t *m, int count) { int i; for (i = 0; i < count; i++) { pmap_kenter_internal(va, VM_PAGE_TO_PHYS(m[i]), KENTER_CACHE); va += PAGE_SIZE; } } /* * this routine jerks page mappings from the * kernel -- it is meant only for temporary mappings. */ void pmap_qremove(vm_offset_t va, int count) { int i; for (i = 0; i < count; i++) { if (vtophys(va)) pmap_kremove(va); va += PAGE_SIZE; } } /* * pmap_object_init_pt preloads the ptes for a given object * into the specified pmap. This eliminates the blast of soft * faults on process startup and immediately after an mmap. */ void pmap_object_init_pt(pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("pmap_object_init_pt: non-device object")); } /* * pmap_is_prefaultable: * * Return whether or not the specified virtual address is elgible * for prefault. 
*/ boolean_t pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr) { pd_entry_t *pdep; pt_entry_t *ptep; if (!pmap_get_pde_pte(pmap, addr, &pdep, &ptep)) return (FALSE); KASSERT((pdep != NULL && (l1pte_section_p(*pdep) || ptep != NULL)), ("Valid mapping but no pte ?")); if (*pdep != 0 && !l1pte_section_p(*pdep)) if (*ptep == 0) return (TRUE); return (FALSE); } /* * Fetch pointers to the PDE/PTE for the given pmap/VA pair. * Returns TRUE if the mapping exists, else FALSE. * * NOTE: This function is only used by a couple of arm-specific modules. * It is not safe to take any pmap locks here, since we could be right * in the middle of debugging the pmap anyway... * * It is possible for this routine to return FALSE even though a valid * mapping does exist. This is because we don't lock, so the metadata * state may be inconsistent. * * NOTE: We can return a NULL *ptp in the case where the L1 pde is * a "section" mapping. */ boolean_t pmap_get_pde_pte(pmap_t pmap, vm_offset_t va, pd_entry_t **pdp, pt_entry_t **ptp) { struct l2_dtable *l2; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep; u_short l1idx; if (pmap->pm_l1 == NULL) return (FALSE); l1idx = L1_IDX(va); *pdp = pl1pd = &pmap->pm_l1->l1_kva[l1idx]; l1pd = *pl1pd; if (l1pte_section_p(l1pd)) { *ptp = NULL; return (TRUE); } if (pmap->pm_l2 == NULL) return (FALSE); l2 = pmap->pm_l2[L2_IDX(l1idx)]; if (l2 == NULL || (ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva) == NULL) { return (FALSE); } *ptp = &ptep[l2pte_index(va)]; return (TRUE); } /* * Routine: pmap_remove_all * Function: * Removes this physical page from * all physical maps in which it resides. * Reflects back modify bits to the pager. * * Notes: * Original versions of this routine were very * inefficient because they iteratively called * pmap_remove (slow...) 
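 *
 *		The version below first demotes any 1MB section
 *		mappings of the page, then strips the remaining 4KB
 *		mappings, and issues a single TLB flush at the end
 *		when one of the affected pmaps was active.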
*/ void pmap_remove_all(vm_page_t m) { struct md_page *pvh; pv_entry_t pv; pmap_t pmap; pt_entry_t *ptep; struct l2_bucket *l2b; boolean_t flush = FALSE; pmap_t curpmap; u_int is_exec = 0; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_all: page %p is not managed", m)); rw_wlock(&pvh_global_lock); if ((m->flags & PG_FICTITIOUS) != 0) goto small_mappings; pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); while ((pv = TAILQ_FIRST(&pvh->pv_list)) != NULL) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pd_entry_t *pl1pd; pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(pv->pv_va)]; KASSERT((*pl1pd & L1_TYPE_MASK) == L1_S_PROTO, ("pmap_remove_all: valid section mapping expected")); (void)pmap_demote_section(pmap, pv->pv_va); PMAP_UNLOCK(pmap); } small_mappings: curpmap = vmspace_pmap(curproc->p_vmspace); while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { pmap = PV_PMAP(pv); if (flush == FALSE && (pmap == curpmap || pmap == pmap_kernel())) flush = TRUE; PMAP_LOCK(pmap); l2b = pmap_get_l2_bucket(pmap, pv->pv_va); KASSERT(l2b != NULL, ("No l2 bucket")); ptep = &l2b->l2b_kva[l2pte_index(pv->pv_va)]; is_exec |= PTE_BEEN_EXECD(*ptep); *ptep = 0; if (pmap_is_current(pmap)) PTE_SYNC(ptep); pmap_free_l2_bucket(pmap, l2b, 1); pmap->pm_stats.resident_count--; TAILQ_REMOVE(&m->md.pv_list, pv, pv_list); if (pv->pv_flags & PVF_WIRED) pmap->pm_stats.wired_count--; pmap_free_pv_entry(pmap, pv); PMAP_UNLOCK(pmap); } if (flush) { if (is_exec) cpu_tlb_flushID(); else cpu_tlb_flushD(); cpu_cpwait(); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&pvh_global_lock); } int pmap_change_attr(vm_offset_t sva, vm_size_t len, int mode) { vm_offset_t base, offset, tmpva; vm_size_t size; struct l2_bucket *l2b; pt_entry_t *ptep, pte; vm_offset_t next_bucket; PMAP_LOCK(kernel_pmap); base = trunc_page(sva); offset = sva & PAGE_MASK; size = roundup(offset + len, PAGE_SIZE); for (tmpva = base; tmpva < base + size; ) { next_bucket = L2_NEXT_BUCKET(tmpva); if (next_bucket > base + size) next_bucket = base + size; l2b = pmap_get_l2_bucket(kernel_pmap, tmpva); if (l2b == NULL) { tmpva = next_bucket; continue; } ptep = &l2b->l2b_kva[l2pte_index(tmpva)]; if (*ptep == 0) { PMAP_UNLOCK(kernel_pmap); return(EINVAL); } pte = *ptep &~ L2_S_CACHE_MASK; cpu_idcache_wbinv_range(tmpva, PAGE_SIZE); pmap_l2cache_wbinv_range(tmpva, pte & L2_S_FRAME, PAGE_SIZE); *ptep = pte; cpu_tlb_flushID_SE(tmpva); cpu_cpwait(); dprintf("%s: for va:%x ptep:%x pte:%x\n", __func__, tmpva, (uint32_t)ptep, pte); tmpva += PAGE_SIZE; } PMAP_UNLOCK(kernel_pmap); return (0); } /* * Set the physical protection on the * specified range of this map as requested. */ void pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { struct l2_bucket *l2b; struct md_page *pvh; struct pv_entry *pve; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep, pte; vm_offset_t next_bucket; u_int is_exec, is_refd; int flush; if ((prot & VM_PROT_READ) == 0) { pmap_remove(pmap, sva, eva); return; } if (prot & VM_PROT_WRITE) { /* * If this is a read->write transition, just ignore it and let * vm_fault() take care of it later. */ return; } rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); /* * OK, at this point, we know we're doing write-protect operation. * If the pmap is active, write-back the range. */ flush = ((eva - sva) >= (PAGE_SIZE * 4)) ? 0 : -1; is_exec = is_refd = 0; while (sva < eva) { next_bucket = L2_NEXT_BUCKET(sva); /* * Check for large page. 
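		 * (A section mapping announces itself in the L1
		 * descriptor: (l1pd & L1_TYPE_MASK) == L1_S_PROTO, as
		 * tested just below; everything else in this loop deals
		 * with 4KB L2 mappings.)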
*/ pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(sva)]; l1pd = *pl1pd; if ((l1pd & L1_TYPE_MASK) == L1_S_PROTO) { KASSERT(pmap != pmap_kernel(), ("pmap_protect: trying to modify " "kernel section protections")); /* * Are we protecting the entire large page? If not, * demote the mapping and fall through. */ if (sva + L1_S_SIZE == next_bucket && eva >= next_bucket) { l1pd &= ~(L1_S_PROT_MASK | L1_S_XN); if (!(prot & VM_PROT_EXECUTE)) l1pd |= L1_S_XN; /* * At this point we are always setting * write-protect bit. */ l1pd |= L1_S_APX; /* All managed superpages are user pages. */ l1pd |= L1_S_PROT_U; *pl1pd = l1pd; PTE_SYNC(pl1pd); pvh = pa_to_pvh(l1pd & L1_S_FRAME); pve = pmap_find_pv(pvh, pmap, trunc_1mpage(sva)); pve->pv_flags &= ~PVF_WRITE; sva = next_bucket; continue; } else if (!pmap_demote_section(pmap, sva)) { /* The large page mapping was destroyed. */ sva = next_bucket; continue; } } if (next_bucket > eva) next_bucket = eva; l2b = pmap_get_l2_bucket(pmap, sva); if (l2b == NULL) { sva = next_bucket; continue; } ptep = &l2b->l2b_kva[l2pte_index(sva)]; while (sva < next_bucket) { if ((pte = *ptep) != 0 && L2_S_WRITABLE(pte)) { struct vm_page *m; m = PHYS_TO_VM_PAGE(l2pte_pa(pte)); pmap_set_prot(ptep, prot, !(pmap == pmap_kernel())); PTE_SYNC(ptep); pmap_modify_pv(m, pmap, sva, PVF_WRITE, 0); if (flush >= 0) { flush++; is_exec |= PTE_BEEN_EXECD(pte); is_refd |= PTE_BEEN_REFD(pte); } else { if (PTE_BEEN_EXECD(pte)) cpu_tlb_flushID_SE(sva); else if (PTE_BEEN_REFD(pte)) cpu_tlb_flushD_SE(sva); } } sva += PAGE_SIZE; ptep++; } } if (flush) { if (is_exec) cpu_tlb_flushID(); else if (is_refd) cpu_tlb_flushD(); cpu_cpwait(); } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Insert the given physical page (p) at * the specified virtual address (v) in the * target physical map with the protection requested. * * If specified, the page will be wired down, meaning * that the related pte can not be reclaimed. * * NB: This is the only routine which MAY NOT lazy-evaluate * or lose information. That is, this routine must actually * insert this page into the given map NOW. */ int pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind __unused) { struct l2_bucket *l2b; int rv; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); rv = pmap_enter_locked(pmap, va, m, prot, flags); if (rv == KERN_SUCCESS) { /* * If both the l2b_occupancy and the reservation are fully * populated, then attempt promotion. */ l2b = pmap_get_l2_bucket(pmap, va); if (l2b != NULL && l2b->l2b_occupancy == L2_PTE_NUM_TOTAL && sp_enabled && (m->flags & PG_FICTITIOUS) == 0 && vm_reserv_level_iffullpop(m) == 0) pmap_promote_section(pmap, va); } PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); return (rv); } /* * The pvh global and pmap locks must be held. 
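 *
 * A typical caller therefore brackets the call the way pmap_enter()
 * above does (a sketch, not the only valid pattern):
 *
 *	rw_wlock(&pvh_global_lock);
 *	PMAP_LOCK(pmap);
 *	rv = pmap_enter_locked(pmap, va, m, prot, flags);
 *	PMAP_UNLOCK(pmap);
 *	rw_wunlock(&pvh_global_lock);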
*/ static int pmap_enter_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags) { struct l2_bucket *l2b = NULL; struct vm_page *om; struct pv_entry *pve = NULL; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep, npte, opte; u_int nflags; u_int is_exec, is_refd; vm_paddr_t pa; u_char user; PMAP_ASSERT_LOCKED(pmap); rw_assert(&pvh_global_lock, RA_WLOCKED); if (va == vector_page) { pa = systempage.pv_pa; m = NULL; } else { if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); pa = VM_PAGE_TO_PHYS(m); } pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)]; if ((va < VM_MAXUSER_ADDRESS) && (*pl1pd & L1_TYPE_MASK) == L1_S_PROTO) { (void)pmap_demote_section(pmap, va); } user = 0; /* * Make sure userland mappings get the right permissions */ if (pmap != pmap_kernel() && va != vector_page) user = 1; nflags = 0; if (prot & VM_PROT_WRITE) nflags |= PVF_WRITE; if ((flags & PMAP_ENTER_WIRED) != 0) nflags |= PVF_WIRED; PDEBUG(1, printf("pmap_enter: pmap = %08x, va = %08x, m = %08x, " "prot = %x, flags = %x\n", (uint32_t) pmap, va, (uint32_t) m, prot, flags)); if (pmap == pmap_kernel()) { l2b = pmap_get_l2_bucket(pmap, va); if (l2b == NULL) l2b = pmap_grow_l2_bucket(pmap, va); } else { do_l2b_alloc: l2b = pmap_alloc_l2_bucket(pmap, va); if (l2b == NULL) { if ((flags & PMAP_ENTER_NOSLEEP) == 0) { PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); VM_WAIT; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); goto do_l2b_alloc; } return (KERN_RESOURCE_SHORTAGE); } } pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)]; if ((*pl1pd & L1_TYPE_MASK) == L1_S_PROTO) panic("pmap_enter: attempt to enter on 1MB page, va: %#x", va); ptep = &l2b->l2b_kva[l2pte_index(va)]; opte = *ptep; npte = pa; is_exec = is_refd = 0; if (opte) { if (l2pte_pa(opte) == pa) { /* * We're changing the attrs of an existing mapping. */ if (m != NULL) pmap_modify_pv(m, pmap, va, PVF_WRITE | PVF_WIRED, nflags); is_exec |= PTE_BEEN_EXECD(opte); is_refd |= PTE_BEEN_REFD(opte); goto validate; } if ((om = PHYS_TO_VM_PAGE(l2pte_pa(opte)))) { /* * Replacing an existing mapping with a new one. * It is part of our managed memory so we * must remove it from the PV list */ if ((pve = pmap_remove_pv(om, pmap, va))) { is_exec |= PTE_BEEN_EXECD(opte); is_refd |= PTE_BEEN_REFD(opte); if (m && ((m->oflags & VPO_UNMANAGED))) pmap_free_pv_entry(pmap, pve); } } } else { /* * Keep the stats up to date */ l2b->l2b_occupancy++; pmap->pm_stats.resident_count++; } /* * Enter on the PV list if part of our managed memory. */ if ((m && !(m->oflags & VPO_UNMANAGED))) { if ((!pve) && (pve = pmap_get_pv_entry(pmap, FALSE)) == NULL) panic("pmap_enter: no pv entries"); KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva, ("pmap_enter: managed mapping within the clean submap")); KASSERT(pve != NULL, ("No pv")); pmap_enter_pv(m, pve, pmap, va, nflags); } validate: /* Make the new PTE valid */ npte |= L2_S_PROTO; #ifdef SMP npte |= L2_SHARED; #endif /* Set defaults first - kernel read access */ npte |= L2_APX; npte |= L2_S_PROT_R; /* Set "referenced" flag */ npte |= L2_S_REF; /* Now tune APs as desired */ if (user) npte |= L2_S_PROT_U; /* * If this is not a vector_page * then continue setting mapping parameters */ if (m != NULL) { if ((m->oflags & VPO_UNMANAGED) == 0) { if (prot & (VM_PROT_ALL)) { vm_page_aflag_set(m, PGA_REFERENCED); } else { /* * Need to do page referenced emulation. 
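					 * The idea (compare
					 * pmap_kenter_user() calling
					 * pmap_fault_fixup() above) is
					 * that an access through a PTE
					 * lacking L2_S_REF faults, and
					 * pmap_fault_fixup() can then
					 * record the reference and set
					 * the bit for real.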
*/ npte &= ~L2_S_REF; } } if (prot & VM_PROT_WRITE) { if ((m->oflags & VPO_UNMANAGED) == 0) { vm_page_aflag_set(m, PGA_WRITEABLE); /* * XXX: Skip modified bit emulation for now. * The emulation reveals problems * that result in random failures * during memory allocation on some * platforms. * Therefore, the page is marked RW * immediately. */ npte &= ~(L2_APX); vm_page_dirty(m); } else npte &= ~(L2_APX); } if (!(prot & VM_PROT_EXECUTE)) npte |= L2_XN; if (m->md.pv_memattr != VM_MEMATTR_UNCACHEABLE) npte |= pte_l2_s_cache_mode; } CTR5(KTR_PMAP,"enter: pmap:%p va:%x prot:%x pte:%x->%x", pmap, va, prot, opte, npte); /* * If this is just a wiring change, the two PTEs will be * identical, so there's no need to update the page table. */ if (npte != opte) { boolean_t is_cached = pmap_is_current(pmap); *ptep = npte; PTE_SYNC(ptep); if (is_cached) { /* * We only need to frob the cache/tlb if this pmap * is current */ if (L1_IDX(va) != L1_IDX(vector_page) && l2pte_valid(npte)) { /* * This mapping is likely to be accessed as * soon as we return to userland. Fix up the * L1 entry to avoid taking another * page/domain fault. */ l1pd = l2b->l2b_phys | L1_C_DOM(pmap->pm_domain) | L1_C_PROTO; if (*pl1pd != l1pd) { *pl1pd = l1pd; PTE_SYNC(pl1pd); } } } if (is_exec) cpu_tlb_flushID_SE(va); else if (is_refd) cpu_tlb_flushD_SE(va); cpu_cpwait(); } if ((pmap != pmap_kernel()) && (pmap == &curproc->p_vmspace->vm_pmap)) cpu_icache_sync_range(va, PAGE_SIZE); return (KERN_SUCCESS); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ void pmap_enter_object(pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { vm_offset_t va; vm_page_t m; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); m = m_start; prot &= VM_PROT_READ | VM_PROT_EXECUTE; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { va = start + ptoa(diff); if ((va & L1_S_OFFSET) == 0 && L2_NEXT_BUCKET(va) <= end && m->psind == 1 && sp_enabled && pmap_enter_section(pmap, va, m, prot)) m = &m[L1_S_SIZE / PAGE_SIZE - 1]; else pmap_enter_locked(pmap, va, m, prot, PMAP_ENTER_NOSLEEP); m = TAILQ_NEXT(m, listq); } PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); } /* * this code makes some *MAJOR* assumptions: * 1. Current pmap & pmap exists. * 2. Not wired. * 3. Read access. * 4. No page table pages. * but is *MUCH* faster than pmap_enter... */ void pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { prot &= VM_PROT_READ | VM_PROT_EXECUTE; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); pmap_enter_locked(pmap, va, m, prot, PMAP_ENTER_NOSLEEP); PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range * must have the wired attribute set. 
In contrast, invalid mappings * cannot have the wired attribute set, so they are ignored. * * XXX Wired mappings of unmanaged pages cannot be counted by this pmap * implementation. */ void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { struct l2_bucket *l2b; struct md_page *pvh; pd_entry_t l1pd; pt_entry_t *ptep, pte; pv_entry_t pv; vm_offset_t next_bucket; vm_paddr_t pa; vm_page_t m; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (sva < eva) { next_bucket = L2_NEXT_BUCKET(sva); l1pd = pmap->pm_l1->l1_kva[L1_IDX(sva)]; if ((l1pd & L1_TYPE_MASK) == L1_S_PROTO) { pa = l1pd & L1_S_FRAME; m = PHYS_TO_VM_PAGE(pa); KASSERT(m != NULL && (m->oflags & VPO_UNMANAGED) == 0, ("pmap_unwire: unmanaged 1mpage %p", m)); pvh = pa_to_pvh(pa); pv = pmap_find_pv(pvh, pmap, trunc_1mpage(sva)); if ((pv->pv_flags & PVF_WIRED) == 0) panic("pmap_unwire: pv %p isn't wired", pv); /* * Are we unwiring the entire large page? If not, * demote the mapping and fall through. */ if (sva + L1_S_SIZE == next_bucket && eva >= next_bucket) { pv->pv_flags &= ~PVF_WIRED; pmap->pm_stats.wired_count -= L2_PTE_NUM_TOTAL; sva = next_bucket; continue; } else if (!pmap_demote_section(pmap, sva)) panic("pmap_unwire: demotion failed"); } if (next_bucket > eva) next_bucket = eva; l2b = pmap_get_l2_bucket(pmap, sva); if (l2b == NULL) { sva = next_bucket; continue; } for (ptep = &l2b->l2b_kva[l2pte_index(sva)]; sva < next_bucket; sva += PAGE_SIZE, ptep++) { if ((pte = *ptep) == 0 || (m = PHYS_TO_VM_PAGE(l2pte_pa(pte))) == NULL || (m->oflags & VPO_UNMANAGED) != 0) continue; pv = pmap_find_pv(&m->md, pmap, sva); if ((pv->pv_flags & PVF_WIRED) == 0) panic("pmap_unwire: pv %p isn't wired", pv); pv->pv_flags &= ~PVF_WIRED; pmap->pm_stats.wired_count--; } } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. */ void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { } /* * Routine: pmap_extract * Function: * Extract the physical page address associated * with the given map/virtual_address pair. */ vm_paddr_t pmap_extract(pmap_t pmap, vm_offset_t va) { vm_paddr_t pa; PMAP_LOCK(pmap); pa = pmap_extract_locked(pmap, va); PMAP_UNLOCK(pmap); return (pa); } static vm_paddr_t pmap_extract_locked(pmap_t pmap, vm_offset_t va) { struct l2_dtable *l2; pd_entry_t l1pd; pt_entry_t *ptep, pte; vm_paddr_t pa; u_int l1idx; if (kernel_vm_end != 0 && pmap != kernel_pmap) PMAP_ASSERT_LOCKED(pmap); l1idx = L1_IDX(va); l1pd = pmap->pm_l1->l1_kva[l1idx]; if (l1pte_section_p(l1pd)) { /* XXX: what to do about the bits > 32 ? */ if (l1pd & L1_S_SUPERSEC) pa = (l1pd & L1_SUP_FRAME) | (va & L1_SUP_OFFSET); else pa = (l1pd & L1_S_FRAME) | (va & L1_S_OFFSET); } else { /* * Note that we can't rely on the validity of the L1 * descriptor as an indication that a mapping exists. * We have to look it up in the L2 dtable. */ l2 = pmap->pm_l2[L2_IDX(l1idx)]; if (l2 == NULL || (ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva) == NULL) return (0); pte = ptep[l2pte_index(va)]; if (pte == 0) return (0); switch (pte & L2_TYPE_MASK) { case L2_TYPE_L: pa = (pte & L2_L_FRAME) | (va & L2_L_OFFSET); break; default: pa = (pte & L2_S_FRAME) | (va & L2_S_OFFSET); break; } } return (pa); } /* * Atomically extract and hold the physical page with the given * pmap and virtual address pair if that mapping permits the given * protection. 
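 *
 * On success the page is returned held; the caller is expected to
 * drop the hold when done (a sketch, assuming the usual vm_page API):
 *
 *	m = pmap_extract_and_hold(pmap, va, VM_PROT_READ);
 *	if (m != NULL) {
 *		... inspect the page ...
 *		vm_page_unhold(m);
 *	}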
* */ vm_page_t pmap_extract_and_hold(pmap_t pmap, vm_offset_t va, vm_prot_t prot) { struct l2_dtable *l2; pd_entry_t l1pd; pt_entry_t *ptep, pte; vm_paddr_t pa, paddr; vm_page_t m = NULL; u_int l1idx; l1idx = L1_IDX(va); paddr = 0; PMAP_LOCK(pmap); retry: l1pd = pmap->pm_l1->l1_kva[l1idx]; if (l1pte_section_p(l1pd)) { /* XXX: what to do about the bits > 32 ? */ if (l1pd & L1_S_SUPERSEC) pa = (l1pd & L1_SUP_FRAME) | (va & L1_SUP_OFFSET); else pa = (l1pd & L1_S_FRAME) | (va & L1_S_OFFSET); if (vm_page_pa_tryrelock(pmap, pa & PG_FRAME, &paddr)) goto retry; if (L1_S_WRITABLE(l1pd) || (prot & VM_PROT_WRITE) == 0) { m = PHYS_TO_VM_PAGE(pa); vm_page_hold(m); } } else { /* * Note that we can't rely on the validity of the L1 * descriptor as an indication that a mapping exists. * We have to look it up in the L2 dtable. */ l2 = pmap->pm_l2[L2_IDX(l1idx)]; if (l2 == NULL || (ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva) == NULL) { PMAP_UNLOCK(pmap); return (NULL); } ptep = &ptep[l2pte_index(va)]; pte = *ptep; if (pte == 0) { PMAP_UNLOCK(pmap); return (NULL); } else if ((prot & VM_PROT_WRITE) && (pte & L2_APX)) { PMAP_UNLOCK(pmap); return (NULL); } else { switch (pte & L2_TYPE_MASK) { case L2_TYPE_L: panic("extract and hold section mapping"); break; default: pa = (pte & L2_S_FRAME) | (va & L2_S_OFFSET); break; } if (vm_page_pa_tryrelock(pmap, pa & PG_FRAME, &paddr)) goto retry; m = PHYS_TO_VM_PAGE(pa); vm_page_hold(m); } } PMAP_UNLOCK(pmap); PA_UNLOCK_COND(paddr); return (m); } +vm_paddr_t +pmap_dump_kextract(vm_offset_t va, pt2_entry_t *pte2p) +{ + struct l2_dtable *l2; + pd_entry_t l1pd; + pt_entry_t *ptep, pte; + vm_paddr_t pa; + u_int l1idx; + + l1idx = L1_IDX(va); + l1pd = kernel_pmap->pm_l1->l1_kva[l1idx]; + if (l1pte_section_p(l1pd)) { + if (l1pd & L1_S_SUPERSEC) + pa = (l1pd & L1_SUP_FRAME) | (va & L1_SUP_OFFSET); + else + pa = (l1pd & L1_S_FRAME) | (va & L1_S_OFFSET); + pte = L2_S_PROTO | pa | + L2_S_PROT(PTE_KERNEL, VM_PROT_READ | VM_PROT_WRITE); + } else { + l2 = kernel_pmap->pm_l2[L2_IDX(l1idx)]; + if (l2 == NULL || + (ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva) == NULL) { + pte = 0; + pa = 0; + goto out; + } + pte = ptep[l2pte_index(va)]; + if (pte == 0) { + pa = 0; + goto out; + } + switch (pte & L2_TYPE_MASK) { + case L2_TYPE_L: + pa = (pte & L2_L_FRAME) | (va & L2_L_OFFSET); + break; + default: + pa = (pte & L2_S_FRAME) | (va & L2_S_OFFSET); + break; + } + } +out: + if (pte2p != NULL) + *pte2p = pte; + return (pa); +} + /* * Initialize a preallocated and zeroed pmap structure, * such as one in a vmspace structure. */ int pmap_pinit(pmap_t pmap) { PDEBUG(1, printf("pmap_pinit: pmap = %08x\n", (uint32_t) pmap)); pmap_alloc_l1(pmap); bzero(pmap->pm_l2, sizeof(pmap->pm_l2)); CPU_ZERO(&pmap->pm_active); TAILQ_INIT(&pmap->pm_pvchunk); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); pmap->pm_stats.resident_count = 1; if (vector_page < KERNBASE) { pmap_enter(pmap, vector_page, PHYS_TO_VM_PAGE(systempage.pv_pa), VM_PROT_READ, PMAP_ENTER_WIRED, 0); } return (1); } /*************************************************** * Superpage management routines. 
 ***************************************************/

static PMAP_INLINE struct pv_entry *
pmap_pvh_remove(struct md_page *pvh, pmap_t pmap, vm_offset_t va)
{
	pv_entry_t pv;

	rw_assert(&pvh_global_lock, RA_WLOCKED);
	pv = pmap_find_pv(pvh, pmap, va);
	if (pv != NULL)
		TAILQ_REMOVE(&pvh->pv_list, pv, pv_list);

	return (pv);
}

static void
pmap_pvh_free(struct md_page *pvh, pmap_t pmap, vm_offset_t va)
{
	pv_entry_t pv;

	pv = pmap_pvh_remove(pvh, pmap, va);
	KASSERT(pv != NULL, ("pmap_pvh_free: pv not found"));
	pmap_free_pv_entry(pmap, pv);
}

static boolean_t
pmap_pv_insert_section(pmap_t pmap, vm_offset_t va, vm_paddr_t pa)
{
	struct md_page *pvh;
	pv_entry_t pv;

	rw_assert(&pvh_global_lock, RA_WLOCKED);
	if (pv_entry_count < pv_entry_high_water &&
	    (pv = pmap_get_pv_entry(pmap, TRUE)) != NULL) {
		pv->pv_va = va;
		pvh = pa_to_pvh(pa);
		TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_list);
		return (TRUE);
	} else
		return (FALSE);
}

/*
 * Create the pv entries for each of the pages within a superpage.
 */
static void
pmap_pv_demote_section(pmap_t pmap, vm_offset_t va, vm_paddr_t pa)
{
	struct md_page *pvh;
	pv_entry_t pve, pv;
	vm_offset_t va_last;
	vm_page_t m;

	rw_assert(&pvh_global_lock, RA_WLOCKED);
	KASSERT((pa & L1_S_OFFSET) == 0,
	    ("pmap_pv_demote_section: pa is not 1mpage aligned"));

	/*
	 * Transfer the 1mpage's pv entry for this mapping to the first
	 * page's pv list.
	 */
	pvh = pa_to_pvh(pa);
	va = trunc_1mpage(va);
	pv = pmap_pvh_remove(pvh, pmap, va);
	KASSERT(pv != NULL, ("pmap_pv_demote_section: pv not found"));
	m = PHYS_TO_VM_PAGE(pa);
	TAILQ_INSERT_HEAD(&m->md.pv_list, pv, pv_list);
	/* Instantiate the remaining pv entries. */
	va_last = L2_NEXT_BUCKET(va) - PAGE_SIZE;
	do {
		m++;
		KASSERT((m->oflags & VPO_UNMANAGED) == 0,
		    ("pmap_pv_demote_section: page %p is not managed", m));
		va += PAGE_SIZE;
		pve = pmap_get_pv_entry(pmap, FALSE);
		pmap_enter_pv(m, pve, pmap, va, pv->pv_flags);
	} while (va < va_last);
}

static void
pmap_pv_promote_section(pmap_t pmap, vm_offset_t va, vm_paddr_t pa)
{
	struct md_page *pvh;
	pv_entry_t pv;
	vm_offset_t va_last;
	vm_page_t m;

	rw_assert(&pvh_global_lock, RA_WLOCKED);
	KASSERT((pa & L1_S_OFFSET) == 0,
	    ("pmap_pv_promote_section: pa is not 1mpage aligned"));

	/*
	 * Transfer the first page's pv entry for this mapping to the
	 * 1mpage's pv list.  Aside from avoiding the cost of a call
	 * to get_pv_entry(), a transfer avoids the possibility that
	 * get_pv_entry() calls pmap_pv_reclaim() and that
	 * pmap_pv_reclaim() removes one of the mappings that is being
	 * promoted.
	 */
	m = PHYS_TO_VM_PAGE(pa);
	va = trunc_1mpage(va);
	pv = pmap_pvh_remove(&m->md, pmap, va);
	KASSERT(pv != NULL, ("pmap_pv_promote_section: pv not found"));
	pvh = pa_to_pvh(pa);
	TAILQ_INSERT_TAIL(&pvh->pv_list, pv, pv_list);
	/* Free the remaining pv entries in the newly mapped section pages */
	va_last = L2_NEXT_BUCKET(va) - PAGE_SIZE;
	do {
		m++;
		va += PAGE_SIZE;
		/*
		 * Don't care about the flags; the first pv entry contains
		 * sufficient information for all of the pages, so nothing
		 * is really lost.
		 */
		pmap_pvh_free(&m->md, pmap, va);
	} while (va < va_last);
}

/*
 * Tries to create a 1MB page mapping.  Returns TRUE if successful and
 * FALSE otherwise.  Fails if (1) the page is unmanaged, or the mapping
 * is for the kernel pmap or the vectors page, (2) a mapping already
 * exists at the specified virtual address, or (3) a pv entry cannot be
 * allocated without reclaiming another pv entry.
 */
static boolean_t
pmap_enter_section(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot)
{
	pd_entry_t *pl1pd;
	vm_offset_t pa;
	struct l2_bucket *l2b;

	rw_assert(&pvh_global_lock, RA_WLOCKED);
	PMAP_ASSERT_LOCKED(pmap);

	/* Skip kernel, vectors page and unmanaged mappings */
	if ((pmap == pmap_kernel()) || (L1_IDX(va) == L1_IDX(vector_page)) ||
	    ((m->oflags & VPO_UNMANAGED) != 0)) {
		CTR2(KTR_PMAP, "pmap_enter_section: failure for va %#lx"
		    " in pmap %p", va, pmap);
		return (FALSE);
	}
	/*
	 * Check whether this is a valid section superpage entry or
	 * there is a l2_bucket associated with that L1 page directory.
	 */
	va = trunc_1mpage(va);
	pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)];
	l2b = pmap_get_l2_bucket(pmap, va);
	if ((*pl1pd & L1_S_PROTO) || (l2b != NULL)) {
		CTR2(KTR_PMAP, "pmap_enter_section: failure for va %#lx"
		    " in pmap %p", va, pmap);
		return (FALSE);
	}
	pa = VM_PAGE_TO_PHYS(m);
	/*
	 * Abort this mapping if its PV entry could not be created.
	 */
	if (!pmap_pv_insert_section(pmap, va, VM_PAGE_TO_PHYS(m))) {
		CTR2(KTR_PMAP, "pmap_enter_section: failure for va %#lx"
		    " in pmap %p", va, pmap);
		return (FALSE);
	}
	/*
	 * Increment counters.
	 */
	pmap->pm_stats.resident_count += L2_PTE_NUM_TOTAL;
	/*
	 * Regardless of the requested permissions, mark the superpage
	 * read-only.
	 */
	prot &= ~VM_PROT_WRITE;
	/*
	 * Map the superpage.
	 */
	pmap_map_section(pmap, va, pa, prot, FALSE);
	pmap_section_mappings++;
	CTR2(KTR_PMAP, "pmap_enter_section: success for va %#lx"
	    " in pmap %p", va, pmap);
	return (TRUE);
}

/*
 * pmap_remove_section: do the things to unmap a superpage in a process
 */
static void
pmap_remove_section(pmap_t pmap, vm_offset_t sva)
{
	struct md_page *pvh;
	struct l2_bucket *l2b;
	pd_entry_t *pl1pd, l1pd;
	vm_offset_t eva, va;
	vm_page_t m;

	PMAP_ASSERT_LOCKED(pmap);
	if ((pmap == pmap_kernel()) || (L1_IDX(sva) == L1_IDX(vector_page)))
		return;

	KASSERT((sva & L1_S_OFFSET) == 0,
	    ("pmap_remove_section: sva is not 1mpage aligned"));

	pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(sva)];
	l1pd = *pl1pd;

	m = PHYS_TO_VM_PAGE(l1pd & L1_S_FRAME);
	KASSERT((m != NULL && ((m->oflags & VPO_UNMANAGED) == 0)),
	    ("pmap_remove_section: no corresponding vm_page or "
	    "page unmanaged"));

	pmap->pm_stats.resident_count -= L2_PTE_NUM_TOTAL;
	pvh = pa_to_pvh(l1pd & L1_S_FRAME);
	pmap_pvh_free(pvh, pmap, sva);
	eva = L2_NEXT_BUCKET(sva);
	for (va = sva, m = PHYS_TO_VM_PAGE(l1pd & L1_S_FRAME);
	    va < eva; va += PAGE_SIZE, m++) {
		/*
		 * Mark base pages referenced but skip marking them dirty.
		 * If the superpage was writable, then all base pages were
		 * already marked dirty in pmap_fault_fixup() before
		 * promotion.  The referenced bit, however, might not have
		 * been set for each base page if the superpage was created
		 * at once rather than as a result of promotion.
		 */
		if (L1_S_REFERENCED(l1pd))
			vm_page_aflag_set(m, PGA_REFERENCED);
		if (TAILQ_EMPTY(&m->md.pv_list) &&
		    TAILQ_EMPTY(&pvh->pv_list))
			vm_page_aflag_clear(m, PGA_WRITEABLE);
	}

	l2b = pmap_get_l2_bucket(pmap, sva);
	if (l2b != NULL) {
		KASSERT(l2b->l2b_occupancy == L2_PTE_NUM_TOTAL,
		    ("pmap_remove_section: l2_bucket occupancy error"));
		pmap_free_l2_bucket(pmap, l2b, L2_PTE_NUM_TOTAL);
	}
	/* Now invalidate L1 slot */
	*pl1pd = 0;
	PTE_SYNC(pl1pd);
	if (L1_S_EXECUTABLE(l1pd))
		cpu_tlb_flushID_SE(sva);
	else
		cpu_tlb_flushD_SE(sva);
	cpu_cpwait();
}

/*
 * Tries to promote the 256 contiguous 4KB page mappings that are
 * within a single l2_bucket to a single 1MB section mapping.
 * For promotion to occur, two conditions must be met: (1) the 4KB page
 * mappings must map aligned, contiguous physical memory and (2) the 4KB page
 * mappings must have identical characteristics.
 */
static void
pmap_promote_section(pmap_t pmap, vm_offset_t va)
{
	pt_entry_t *firstptep, firstpte, oldpte, pa, *pte;
	vm_page_t m, oldm;
	vm_offset_t first_va, old_va;
	struct l2_bucket *l2b = NULL;
	vm_prot_t prot;
	struct pv_entry *pve, *first_pve;

	PMAP_ASSERT_LOCKED(pmap);

	prot = VM_PROT_ALL;
	/*
	 * Skip promoting kernel pages.  This is justified by the following:
	 * 1. Kernel is already mapped using section mappings in each pmap
	 * 2. Managed mappings within the kernel are not to be promoted anyway
	 */
	if (pmap == pmap_kernel()) {
		pmap_section_p_failures++;
		CTR2(KTR_PMAP, "pmap_promote_section: failure for va %#x"
		    " in pmap %p", va, pmap);
		return;
	}

	/* Do not attempt to promote the vectors page */
	if (L1_IDX(va) == L1_IDX(vector_page)) {
		pmap_section_p_failures++;
		CTR2(KTR_PMAP, "pmap_promote_section: failure for va %#x"
		    " in pmap %p", va, pmap);
		return;
	}
	/*
	 * Examine the first PTE in the specified l2_bucket.  Abort if this
	 * PTE is either invalid, unused, or does not map the first 4KB
	 * physical page within the 1MB page.
	 */
	first_va = trunc_1mpage(va);
	l2b = pmap_get_l2_bucket(pmap, first_va);
	KASSERT(l2b != NULL, ("pmap_promote_section: trying to promote "
	    "a nonexistent l2 bucket"));
	firstptep = &l2b->l2b_kva[0];

	firstpte = *firstptep;
	if ((l2pte_pa(firstpte) & L1_S_OFFSET) != 0) {
		pmap_section_p_failures++;
		CTR2(KTR_PMAP, "pmap_promote_section: failure for va %#x"
		    " in pmap %p", va, pmap);
		return;
	}

	if ((firstpte & (L2_S_PROTO | L2_S_REF)) != (L2_S_PROTO | L2_S_REF)) {
		pmap_section_p_failures++;
		CTR2(KTR_PMAP, "pmap_promote_section: failure for va %#x"
		    " in pmap %p", va, pmap);
		return;
	}
	/*
	 * ARM uses pv_entry to mark a particular mapping WIRED, so don't
	 * promote unmanaged pages: it is impossible to determine whether
	 * the page is wired when there is no corresponding pv_entry.
	 */
	m = PHYS_TO_VM_PAGE(l2pte_pa(firstpte));
	if (m && ((m->oflags & VPO_UNMANAGED) != 0)) {
		pmap_section_p_failures++;
		CTR2(KTR_PMAP, "pmap_promote_section: failure for va %#x"
		    " in pmap %p", va, pmap);
		return;
	}
	first_pve = pmap_find_pv(&m->md, pmap, first_va);
	/*
	 * PTE is modified only on write due to modified bit
	 * emulation.  If the entry is referenced and writable
	 * then it is modified and we don't clear write enable.
	 * Otherwise, writing is disabled in the PTE anyway and
	 * we just configure protections for the section mapping
	 * that is going to be created.
	 */
	if ((first_pve->pv_flags & PVF_WRITE) != 0) {
		if (!L2_S_WRITABLE(firstpte)) {
			first_pve->pv_flags &= ~PVF_WRITE;
			prot &= ~VM_PROT_WRITE;
		}
	} else
		prot &= ~VM_PROT_WRITE;

	if (!L2_S_EXECUTABLE(firstpte))
		prot &= ~VM_PROT_EXECUTE;

	/*
	 * Examine each of the other PTEs in the specified l2_bucket.
	 * Abort if this PTE maps an unexpected 4KB physical page or
	 * does not have identical characteristics to the first PTE.
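	 *
	 * The scan below runs backwards from the last PTE in the
	 * bucket, so the expected physical address starts at
	 * l2pte_pa(firstpte) + (L2_PTE_NUM_TOTAL - 1) * PAGE_SIZE and
	 * drops by PAGE_SIZE per step; any mismatch in address or in
	 * the L2_S_PROMOTE attribute bits aborts the promotion.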
	 */
	pa = l2pte_pa(firstpte) + ((L2_PTE_NUM_TOTAL - 1) * PAGE_SIZE);
	old_va = L2_NEXT_BUCKET(first_va) - PAGE_SIZE;
	for (pte = (firstptep + L2_PTE_NUM_TOTAL - 1); pte > firstptep;
	    pte--) {
		oldpte = *pte;
		if (l2pte_pa(oldpte) != pa) {
			pmap_section_p_failures++;
			CTR2(KTR_PMAP, "pmap_promote_section: failure for "
			    "va %#x in pmap %p", va, pmap);
			return;
		}
		if ((oldpte & L2_S_PROMOTE) != (firstpte & L2_S_PROMOTE)) {
			pmap_section_p_failures++;
			CTR2(KTR_PMAP, "pmap_promote_section: failure for "
			    "va %#x in pmap %p", va, pmap);
			return;
		}
		oldm = PHYS_TO_VM_PAGE(l2pte_pa(oldpte));
		if (oldm && ((oldm->oflags & VPO_UNMANAGED) != 0)) {
			pmap_section_p_failures++;
			CTR2(KTR_PMAP, "pmap_promote_section: failure for "
			    "va %#x in pmap %p", va, pmap);
			return;
		}

		pve = pmap_find_pv(&oldm->md, pmap, old_va);
		if (pve == NULL) {
			pmap_section_p_failures++;
			CTR2(KTR_PMAP, "pmap_promote_section: failure for "
			    "va %#x old_va %x - no pve", va, old_va);
			return;
		}
		if (!L2_S_WRITABLE(oldpte) && (pve->pv_flags & PVF_WRITE))
			pve->pv_flags &= ~PVF_WRITE;
		if (pve->pv_flags != first_pve->pv_flags) {
			pmap_section_p_failures++;
			CTR2(KTR_PMAP, "pmap_promote_section: failure for "
			    "va %#x in pmap %p", va, pmap);
			return;
		}

		old_va -= PAGE_SIZE;
		pa -= PAGE_SIZE;
	}
	/*
	 * Promote the pv entries.
	 */
	pmap_pv_promote_section(pmap, first_va, l2pte_pa(firstpte));
	/*
	 * Map the superpage.
	 */
	pmap_map_section(pmap, first_va, l2pte_pa(firstpte), prot, TRUE);
	/*
	 * Invalidate all possible TLB mappings for small
	 * pages within the newly created superpage.
	 * Rely on the first PTE's attributes since they
	 * have to be consistent across all of the base pages
	 * within the superpage.  If the page is not executable it
	 * is at least referenced.
	 * The fastest way to do that is to invalidate the whole
	 * TLB at once instead of executing 256 CP15 TLB
	 * invalidations, one entry at a time.  TLBs usually maintain
	 * only several dozen entries, so losing unrelated entries is
	 * still the less aggressive approach.
	 */
	if (L2_S_EXECUTABLE(firstpte))
		cpu_tlb_flushID();
	else
		cpu_tlb_flushD();
	cpu_cpwait();

	pmap_section_promotions++;
	CTR2(KTR_PMAP, "pmap_promote_section: success for va %#x"
	    " in pmap %p", first_va, pmap);
}

/*
 * Fills a l2_bucket with mappings to consecutive physical pages.
 */
static void
pmap_fill_l2b(struct l2_bucket *l2b, pt_entry_t newpte)
{
	pt_entry_t *ptep;
	int i;

	for (i = 0; i < L2_PTE_NUM_TOTAL; i++) {
		ptep = &l2b->l2b_kva[i];
		*ptep = newpte;
		PTE_SYNC(ptep);

		newpte += PAGE_SIZE;
	}

	l2b->l2b_occupancy = L2_PTE_NUM_TOTAL;
}

/*
 * Tries to demote a 1MB section mapping.  If demotion fails, the
 * 1MB section mapping is invalidated.
 */
static boolean_t
pmap_demote_section(pmap_t pmap, vm_offset_t va)
{
	struct l2_bucket *l2b;
	struct pv_entry *l1pdpve;
	struct md_page *pvh;
	pd_entry_t *pl1pd, l1pd, newl1pd;
	pt_entry_t *firstptep, newpte;
	vm_offset_t pa;
	vm_page_t m;

	PMAP_ASSERT_LOCKED(pmap);
	/*
	 * According to the assumptions described in pmap_promote_section,
	 * the kernel is and always should be mapped using 1MB section
	 * mappings.  What is more, managed kernel pages were not to be
	 * promoted.
	 */
*/ KASSERT(pmap != pmap_kernel() && L1_IDX(va) != L1_IDX(vector_page), ("pmap_demote_section: forbidden section mapping")); va = trunc_1mpage(va); pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)]; l1pd = *pl1pd; KASSERT((l1pd & L1_TYPE_MASK) == L1_S_PROTO, ("pmap_demote_section: not section or invalid section")); pa = l1pd & L1_S_FRAME; m = PHYS_TO_VM_PAGE(pa); KASSERT((m != NULL && (m->oflags & VPO_UNMANAGED) == 0), ("pmap_demote_section: no vm_page for selected superpage or" "unmanaged")); pvh = pa_to_pvh(pa); l1pdpve = pmap_find_pv(pvh, pmap, va); KASSERT(l1pdpve != NULL, ("pmap_demote_section: no pv entry for " "managed page")); l2b = pmap_get_l2_bucket(pmap, va); if (l2b == NULL) { KASSERT((l1pdpve->pv_flags & PVF_WIRED) == 0, ("pmap_demote_section: No l2_bucket for wired mapping")); /* * Invalidate the 1MB section mapping and return * "failure" if the mapping was never accessed or the * allocation of the new l2_bucket fails. */ if (!L1_S_REFERENCED(l1pd) || (l2b = pmap_alloc_l2_bucket(pmap, va)) == NULL) { /* Unmap and invalidate superpage. */ pmap_remove_section(pmap, trunc_1mpage(va)); CTR2(KTR_PMAP, "pmap_demote_section: failure for " "va %#x in pmap %p", va, pmap); return (FALSE); } } /* * Now we should have corresponding l2_bucket available. * Let's process it to recreate 256 PTEs for each base page * within superpage. */ newpte = pa | L1_S_DEMOTE(l1pd); if (m->md.pv_memattr != VM_MEMATTR_UNCACHEABLE) newpte |= pte_l2_s_cache_mode; /* * If the l2_bucket is new, initialize it. */ if (l2b->l2b_occupancy == 0) pmap_fill_l2b(l2b, newpte); else { firstptep = &l2b->l2b_kva[0]; KASSERT(l2pte_pa(*firstptep) == (pa), ("pmap_demote_section: firstpte and newpte map different " "physical addresses")); /* * If the mapping has changed attributes, update the page table * entries. */ if ((*firstptep & L2_S_PROMOTE) != (L1_S_DEMOTE(l1pd))) pmap_fill_l2b(l2b, newpte); } /* Demote PV entry */ pmap_pv_demote_section(pmap, va, pa); /* Now fix-up L1 */ newl1pd = l2b->l2b_phys | L1_C_DOM(pmap->pm_domain) | L1_C_PROTO; *pl1pd = newl1pd; PTE_SYNC(pl1pd); /* Invalidate old TLB mapping */ if (L1_S_EXECUTABLE(l1pd)) cpu_tlb_flushID_SE(va); else if (L1_S_REFERENCED(l1pd)) cpu_tlb_flushD_SE(va); cpu_cpwait(); pmap_section_demotions++; CTR2(KTR_PMAP, "pmap_demote_section: success for va %#x" " in pmap %p", va, pmap); return (TRUE); } /*************************************************** * page management routines. ***************************************************/ /* * We are in a serious low memory condition. Resort to * drastic measures to free some pages so we can allocate * another pv entry chunk. */ static vm_page_t pmap_pv_reclaim(pmap_t locked_pmap) { struct pch newtail; struct pv_chunk *pc; struct l2_bucket *l2b = NULL; pmap_t pmap; pd_entry_t *pl1pd; pt_entry_t *ptep; pv_entry_t pv; vm_offset_t va; vm_page_t free, m, m_pc; uint32_t inuse; int bit, field, freed, idx; PMAP_ASSERT_LOCKED(locked_pmap); pmap = NULL; free = m_pc = NULL; TAILQ_INIT(&newtail); while ((pc = TAILQ_FIRST(&pv_chunks)) != NULL && (pv_vafree == 0 || free == NULL)) { TAILQ_REMOVE(&pv_chunks, pc, pc_lru); if (pmap != pc->pc_pmap) { if (pmap != NULL) { cpu_tlb_flushID(); cpu_cpwait(); if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } pmap = pc->pc_pmap; /* Avoid deadlock and lock recursion. */ if (pmap > locked_pmap) PMAP_LOCK(pmap); else if (pmap != locked_pmap && !PMAP_TRYLOCK(pmap)) { pmap = NULL; TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); continue; } } /* * Destroy every non-wired, 4 KB page mapping in the chunk. 
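 *
 * pc_map is a bitmap of free pventry slots, so the in-use ones are
 * (~pc->pc_map[field] & pc_freemask[field]); each set bit is decoded
 * back to a pventry index as in the loop below (on this 32-bit pmap
 * that works out to field * 32 + bit):
 *
 *	bit = ffs(inuse) - 1;
 *	idx = field * sizeof(inuse) * NBBY + bit;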
*/ freed = 0; for (field = 0; field < _NPCM; field++) { for (inuse = ~pc->pc_map[field] & pc_freemask[field]; inuse != 0; inuse &= ~(1UL << bit)) { bit = ffs(inuse) - 1; idx = field * sizeof(inuse) * NBBY + bit; pv = &pc->pc_pventry[idx]; va = pv->pv_va; pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)]; if ((*pl1pd & L1_TYPE_MASK) == L1_S_PROTO) continue; if (pv->pv_flags & PVF_WIRED) continue; l2b = pmap_get_l2_bucket(pmap, va); KASSERT(l2b != NULL, ("No l2 bucket")); ptep = &l2b->l2b_kva[l2pte_index(va)]; m = PHYS_TO_VM_PAGE(l2pte_pa(*ptep)); KASSERT((vm_offset_t)m >= KERNBASE, ("Trying to access non-existent page " "va %x pte %x", va, *ptep)); *ptep = 0; PTE_SYNC(ptep); TAILQ_REMOVE(&m->md.pv_list, pv, pv_list); if (TAILQ_EMPTY(&m->md.pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); pc->pc_map[field] |= 1UL << bit; freed++; } } if (freed == 0) { TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); continue; } /* Every freed mapping is for a 4 KB page. */ pmap->pm_stats.resident_count -= freed; PV_STAT(pv_entry_frees += freed); PV_STAT(pv_entry_spare += freed); pv_entry_count -= freed; TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); for (field = 0; field < _NPCM; field++) if (pc->pc_map[field] != pc_freemask[field]) { TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&newtail, pc, pc_lru); /* * One freed pv entry in locked_pmap is * sufficient. */ if (pmap == locked_pmap) goto out; break; } if (field == _NPCM) { PV_STAT(pv_entry_spare -= _NPCPV); PV_STAT(pc_chunk_count--); PV_STAT(pc_chunk_frees++); /* Entire chunk is free; return it. */ m_pc = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)pc)); pmap_qremove((vm_offset_t)pc, 1); pmap_ptelist_free(&pv_vafree, (vm_offset_t)pc); break; } } out: TAILQ_CONCAT(&pv_chunks, &newtail, pc_lru); if (pmap != NULL) { cpu_tlb_flushID(); cpu_cpwait(); if (pmap != locked_pmap) PMAP_UNLOCK(pmap); } return (m_pc); } /* * free the pv_entry back to the free list */ static void pmap_free_pv_entry(pmap_t pmap, pv_entry_t pv) { struct pv_chunk *pc; int bit, field, idx; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_ASSERT_LOCKED(pmap); PV_STAT(pv_entry_frees++); PV_STAT(pv_entry_spare++); pv_entry_count--; pc = pv_to_chunk(pv); idx = pv - &pc->pc_pventry[0]; field = idx / (sizeof(u_long) * NBBY); bit = idx % (sizeof(u_long) * NBBY); pc->pc_map[field] |= 1ul << bit; for (idx = 0; idx < _NPCM; idx++) if (pc->pc_map[idx] != pc_freemask[idx]) { /* * 98% of the time, pc is already at the head of the * list. If it isn't already, move it to the head. 
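 * Keeping partially free chunks at the head lets pmap_get_pv_entry()
 * find a usable slot with its first TAILQ_FIRST() probe in the common
 * case.  (The index math above recovers the slot position: with a
 * 32-bit u_long, for example, pv index 37 becomes field 1, bit 5.)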
*/ if (__predict_false(TAILQ_FIRST(&pmap->pm_pvchunk) != pc)) { TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); } return; } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); pmap_free_pv_chunk(pc); } static void pmap_free_pv_chunk(struct pv_chunk *pc) { vm_page_t m; TAILQ_REMOVE(&pv_chunks, pc, pc_lru); PV_STAT(pv_entry_spare -= _NPCPV); PV_STAT(pc_chunk_count--); PV_STAT(pc_chunk_frees++); /* entire chunk is free, return it */ m = PHYS_TO_VM_PAGE(pmap_kextract((vm_offset_t)pc)); pmap_qremove((vm_offset_t)pc, 1); vm_page_unwire(m, PQ_NONE); vm_page_free(m); pmap_ptelist_free(&pv_vafree, (vm_offset_t)pc); } static pv_entry_t pmap_get_pv_entry(pmap_t pmap, boolean_t try) { static const struct timeval printinterval = { 60, 0 }; static struct timeval lastprint; struct pv_chunk *pc; pv_entry_t pv; vm_page_t m; int bit, field, idx; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_ASSERT_LOCKED(pmap); PV_STAT(pv_entry_allocs++); pv_entry_count++; if (pv_entry_count > pv_entry_high_water) if (ratecheck(&lastprint, &printinterval)) printf("%s: Approaching the limit on PV entries.\n", __func__); retry: pc = TAILQ_FIRST(&pmap->pm_pvchunk); if (pc != NULL) { for (field = 0; field < _NPCM; field++) { if (pc->pc_map[field]) { bit = ffs(pc->pc_map[field]) - 1; break; } } if (field < _NPCM) { idx = field * sizeof(pc->pc_map[field]) * NBBY + bit; pv = &pc->pc_pventry[idx]; pc->pc_map[field] &= ~(1ul << bit); /* If this was the last item, move it to tail */ for (field = 0; field < _NPCM; field++) if (pc->pc_map[field] != 0) { PV_STAT(pv_entry_spare--); return (pv); /* not full, return */ } TAILQ_REMOVE(&pmap->pm_pvchunk, pc, pc_list); TAILQ_INSERT_TAIL(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(pv_entry_spare--); return (pv); } } /* * Access to the ptelist "pv_vafree" is synchronized by the pvh * global lock. If "pv_vafree" is currently non-empty, it will * remain non-empty until pmap_ptelist_alloc() completes. */ if (pv_vafree == 0 || (m = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) { if (try) { pv_entry_count--; PV_STAT(pc_chunk_tryfail++); return (NULL); } m = pmap_pv_reclaim(pmap); if (m == NULL) goto retry; } PV_STAT(pc_chunk_count++); PV_STAT(pc_chunk_allocs++); pc = (struct pv_chunk *)pmap_ptelist_alloc(&pv_vafree); pmap_qenter((vm_offset_t)pc, &m, 1); pc->pc_pmap = pmap; pc->pc_map[0] = pc_freemask[0] & ~1ul; /* preallocated bit 0 */ for (field = 1; field < _NPCM; field++) pc->pc_map[field] = pc_freemask[field]; TAILQ_INSERT_TAIL(&pv_chunks, pc, pc_lru); pv = &pc->pc_pventry[0]; TAILQ_INSERT_HEAD(&pmap->pm_pvchunk, pc, pc_list); PV_STAT(pv_entry_spare += _NPCPV - 1); return (pv); } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly * rounded to the page size. */ #define PMAP_REMOVE_CLEAN_LIST_SIZE 3 void pmap_remove(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { struct l2_bucket *l2b; vm_offset_t next_bucket; pd_entry_t l1pd; pt_entry_t *ptep; u_int total; u_int mappings, is_exec, is_refd; int flushall = 0; /* * we lock in the pmap => pv_head direction */ rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); total = 0; while (sva < eva) { next_bucket = L2_NEXT_BUCKET(sva); /* * Check for large page. */ l1pd = pmap->pm_l1->l1_kva[L1_IDX(sva)]; if ((l1pd & L1_TYPE_MASK) == L1_S_PROTO) { KASSERT((l1pd & L1_S_DOM_MASK) != L1_S_DOM(PMAP_DOMAIN_KERNEL), ("pmap_remove: " "Trying to remove kernel section mapping")); /* * Are we removing the entire large page? 
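 * (That is the case only when sva itself sits on a 1MB boundary, i.e.
 * sva + L1_S_SIZE == next_bucket, and eva reaches at least the next
 * bucket; e.g. sva == 0x40100000 with eva >= 0x40200000.)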
If not, * demote the mapping and fall through. */ if (sva + L1_S_SIZE == next_bucket && eva >= next_bucket) { pmap_remove_section(pmap, sva); sva = next_bucket; continue; } else if (!pmap_demote_section(pmap, sva)) { /* The large page mapping was destroyed. */ sva = next_bucket; continue; } } /* * Do one L2 bucket's worth at a time. */ if (next_bucket > eva) next_bucket = eva; l2b = pmap_get_l2_bucket(pmap, sva); if (l2b == NULL) { sva = next_bucket; continue; } ptep = &l2b->l2b_kva[l2pte_index(sva)]; mappings = 0; while (sva < next_bucket) { struct vm_page *m; pt_entry_t pte; vm_paddr_t pa; pte = *ptep; if (pte == 0) { /* * Nothing here, move along */ sva += PAGE_SIZE; ptep++; continue; } pmap->pm_stats.resident_count--; pa = l2pte_pa(pte); is_exec = 0; is_refd = 1; /* * Update flags. In a number of circumstances, * we could cluster a lot of these and do a * number of sequential pages in one go. */ if ((m = PHYS_TO_VM_PAGE(pa)) != NULL) { struct pv_entry *pve; pve = pmap_remove_pv(m, pmap, sva); if (pve) { is_exec = PTE_BEEN_EXECD(pte); is_refd = PTE_BEEN_REFD(pte); pmap_free_pv_entry(pmap, pve); } } *ptep = 0; PTE_SYNC(ptep); if (pmap_is_current(pmap)) { total++; if (total < PMAP_REMOVE_CLEAN_LIST_SIZE) { if (is_exec) cpu_tlb_flushID_SE(sva); else if (is_refd) cpu_tlb_flushD_SE(sva); } else if (total == PMAP_REMOVE_CLEAN_LIST_SIZE) flushall = 1; } sva += PAGE_SIZE; ptep++; mappings++; } pmap_free_l2_bucket(pmap, l2b, mappings); } rw_wunlock(&pvh_global_lock); if (flushall) cpu_tlb_flushID(); cpu_cpwait(); PMAP_UNLOCK(pmap); } /* * pmap_zero_page() * * Zero a given physical page by mapping it at a page hook point. * In doing the zero page op, the page we zero is mapped cachable, as with * StrongARM accesses to non-cached pages are non-burst making writing * _any_ bulk data very slow. */ static void pmap_zero_page_gen(vm_page_t m, int off, int size) { struct czpages *czp; KASSERT(TAILQ_EMPTY(&m->md.pv_list), ("pmap_zero_page_gen: page has mappings")); vm_paddr_t phys = VM_PAGE_TO_PHYS(m); sched_pin(); czp = &cpu_czpages[PCPU_GET(cpuid)]; mtx_lock(&czp->lock); /* * Hook in the page, zero it. */ *czp->dstptep = L2_S_PROTO | phys | pte_l2_s_cache_mode | L2_S_REF; pmap_set_prot(czp->dstptep, VM_PROT_WRITE, 0); PTE_SYNC(czp->dstptep); cpu_tlb_flushD_SE(czp->dstva); cpu_cpwait(); if (off || size != PAGE_SIZE) bzero((void *)(czp->dstva + off), size); else bzero_page(czp->dstva); /* * Although aliasing is not possible, if we use temporary mappings with * memory that will be mapped later as non-cached or with write-through * caches, we might end up overwriting it when calling wbinv_all. So * make sure caches are clean after the operation. */ cpu_idcache_wbinv_range(czp->dstva, size); pmap_l2cache_wbinv_range(czp->dstva, phys, size); mtx_unlock(&czp->lock); sched_unpin(); } /* * pmap_zero_page zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. */ void pmap_zero_page(vm_page_t m) { pmap_zero_page_gen(m, 0, PAGE_SIZE); } /* * pmap_zero_page_area zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. * * off and size may not cover an area beyond a single hardware page. */ void pmap_zero_page_area(vm_page_t m, int off, int size) { pmap_zero_page_gen(m, off, size); } /* * pmap_zero_page_idle zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. This * is intended to be called from the vm_pagezero process only and * outside of Giant. 
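 * All three zeroing entry points funnel into pmap_zero_page_gen(); for
 * instance, clearing just the tail of a page would be (an illustrative
 * sketch, with off a page-relative offset):
 *
 *	pmap_zero_page_area(m, off, PAGE_SIZE - off);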
*/ void pmap_zero_page_idle(vm_page_t m) { pmap_zero_page(m); } /* * pmap_copy_page copies the specified (machine independent) * page by mapping the page into virtual memory and using * bcopy to copy the page, one machine dependent page at a * time. */ /* * pmap_copy_page() * * Copy one physical page into another, by mapping the pages into * hook points. The same comment regarding cachability as in * pmap_zero_page also applies here. */ void pmap_copy_page_generic(vm_paddr_t src, vm_paddr_t dst) { struct czpages *czp; sched_pin(); czp = &cpu_czpages[PCPU_GET(cpuid)]; mtx_lock(&czp->lock); /* * Map the pages into the page hook points, copy them, and purge the * cache for the appropriate page. */ *czp->srcptep = L2_S_PROTO | src | pte_l2_s_cache_mode | L2_S_REF; pmap_set_prot(czp->srcptep, VM_PROT_READ, 0); PTE_SYNC(czp->srcptep); cpu_tlb_flushD_SE(czp->srcva); *czp->dstptep = L2_S_PROTO | dst | pte_l2_s_cache_mode | L2_S_REF; pmap_set_prot(czp->dstptep, VM_PROT_READ | VM_PROT_WRITE, 0); PTE_SYNC(czp->dstptep); cpu_tlb_flushD_SE(czp->dstva); cpu_cpwait(); bcopy_page(czp->srcva, czp->dstva); /* * Although aliasing is not possible, if we use temporary mappings with * memory that will be mapped later as non-cached or with write-through * caches, we might end up overwriting it when calling wbinv_all. So * make sure caches are clean after the operation. */ cpu_idcache_wbinv_range(czp->dstva, PAGE_SIZE); pmap_l2cache_wbinv_range(czp->dstva, dst, PAGE_SIZE); mtx_unlock(&czp->lock); sched_unpin(); } int unmapped_buf_allowed = 1; void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) { vm_page_t a_pg, b_pg; vm_offset_t a_pg_offset, b_pg_offset; int cnt; struct czpages *czp; sched_pin(); czp = &cpu_czpages[PCPU_GET(cpuid)]; mtx_lock(&czp->lock); while (xfersize > 0) { a_pg = ma[a_offset >> PAGE_SHIFT]; a_pg_offset = a_offset & PAGE_MASK; cnt = min(xfersize, PAGE_SIZE - a_pg_offset); b_pg = mb[b_offset >> PAGE_SHIFT]; b_pg_offset = b_offset & PAGE_MASK; cnt = min(cnt, PAGE_SIZE - b_pg_offset); *czp->srcptep = L2_S_PROTO | VM_PAGE_TO_PHYS(a_pg) | pte_l2_s_cache_mode | L2_S_REF; pmap_set_prot(czp->srcptep, VM_PROT_READ, 0); PTE_SYNC(czp->srcptep); cpu_tlb_flushD_SE(czp->srcva); *czp->dstptep = L2_S_PROTO | VM_PAGE_TO_PHYS(b_pg) | pte_l2_s_cache_mode | L2_S_REF; pmap_set_prot(czp->dstptep, VM_PROT_READ | VM_PROT_WRITE, 0); PTE_SYNC(czp->dstptep); cpu_tlb_flushD_SE(czp->dstva); cpu_cpwait(); bcopy((char *)czp->srcva + a_pg_offset, (char *)czp->dstva + b_pg_offset, cnt); cpu_idcache_wbinv_range(czp->dstva + b_pg_offset, cnt); pmap_l2cache_wbinv_range(czp->dstva + b_pg_offset, VM_PAGE_TO_PHYS(b_pg) + b_pg_offset, cnt); xfersize -= cnt; a_offset += cnt; b_offset += cnt; } mtx_unlock(&czp->lock); sched_unpin(); } void pmap_copy_page(vm_page_t src, vm_page_t dst) { if (_arm_memcpy && PAGE_SIZE >= _min_memcpy_size && _arm_memcpy((void *)VM_PAGE_TO_PHYS(dst), (void *)VM_PAGE_TO_PHYS(src), PAGE_SIZE, IS_PHYSICAL) == 0) return; pmap_copy_page_generic(VM_PAGE_TO_PHYS(src), VM_PAGE_TO_PHYS(dst)); } vm_offset_t pmap_quick_enter_page(vm_page_t m) { pt_entry_t *qmap_pte; vm_offset_t qmap_addr; critical_enter(); qmap_addr = PCPU_GET(qmap_addr); qmap_pte = PCPU_GET(qmap_pte); KASSERT(*qmap_pte == 0, ("pmap_quick_enter_page: PTE busy")); *qmap_pte = L2_S_PROTO | VM_PAGE_TO_PHYS(m) | L2_S_REF; if (m->md.pv_memattr != VM_MEMATTR_UNCACHEABLE) *qmap_pte |= pte_l2_s_cache_mode; pmap_set_prot(qmap_pte, VM_PROT_READ | VM_PROT_WRITE, 0); PTE_SYNC(qmap_pte); 
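	/*
	 * A typical caller pairs this with pmap_quick_remove_page(), e.g.
	 * (an illustrative sketch only):
	 *
	 *	va = pmap_quick_enter_page(m);
	 *	bcopy(buf, (void *)va, PAGE_SIZE);
	 *	pmap_quick_remove_page(va);
	 *
	 * The critical section entered above pins the thread to this CPU,
	 * so the per-CPU qmap_pte/qmap_addr pair remains ours until the
	 * matching removal.
	 */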
cpu_tlb_flushD_SE(qmap_addr); cpu_cpwait(); return (qmap_addr); } void pmap_quick_remove_page(vm_offset_t addr) { pt_entry_t *qmap_pte; qmap_pte = PCPU_GET(qmap_pte); KASSERT(addr == PCPU_GET(qmap_addr), ("pmap_quick_remove_page: invalid address")); KASSERT(*qmap_pte != 0, ("pmap_quick_remove_page: PTE not in use")); cpu_idcache_wbinv_range(addr, PAGE_SIZE); pmap_l2cache_wbinv_range(addr, *qmap_pte & L2_S_FRAME, PAGE_SIZE); *qmap_pte = 0; PTE_SYNC(qmap_pte); critical_exit(); } /* * this routine returns true if a physical page resides * in the given pmap. */ boolean_t pmap_page_exists_quick(pmap_t pmap, vm_page_t m) { struct md_page *pvh; pv_entry_t pv; int loops = 0; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_page_exists_quick: page %p is not managed", m)); rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } if (!rv && loops < 16 && (m->flags & PG_FICTITIOUS) == 0) { pvh = pa_to_pvh(VM_PAGE_TO_PHYS(m)); TAILQ_FOREACH(pv, &pvh->pv_list, pv_list) { if (PV_PMAP(pv) == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } } rw_wunlock(&pvh_global_lock); return (rv); } /* * pmap_page_wired_mappings: * * Return the number of managed mappings to the given physical page * that are wired. */ int pmap_page_wired_mappings(vm_page_t m) { int count; count = 0; if ((m->oflags & VPO_UNMANAGED) != 0) return (count); rw_wlock(&pvh_global_lock); count = pmap_pvh_wired_mappings(&m->md, count); if ((m->flags & PG_FICTITIOUS) == 0) { count = pmap_pvh_wired_mappings(pa_to_pvh(VM_PAGE_TO_PHYS(m)), count); } rw_wunlock(&pvh_global_lock); return (count); } /* * pmap_pvh_wired_mappings: * * Return the updated number "count" of managed mappings that are wired. */ static int pmap_pvh_wired_mappings(struct md_page *pvh, int count) { pv_entry_t pv; rw_assert(&pvh_global_lock, RA_WLOCKED); TAILQ_FOREACH(pv, &pvh->pv_list, pv_list) { if ((pv->pv_flags & PVF_WIRED) != 0) count++; } return (count); } /* * Returns TRUE if any of the given mappings were referenced and FALSE * otherwise. Both page and section mappings are supported. */ static boolean_t pmap_is_referenced_pvh(struct md_page *pvh) { struct l2_bucket *l2b; pv_entry_t pv; pd_entry_t *pl1pd; pt_entry_t *ptep; pmap_t pmap; boolean_t rv; rw_assert(&pvh_global_lock, RA_WLOCKED); rv = FALSE; TAILQ_FOREACH(pv, &pvh->pv_list, pv_list) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(pv->pv_va)]; if ((*pl1pd & L1_TYPE_MASK) == L1_S_PROTO) rv = L1_S_REFERENCED(*pl1pd); else { l2b = pmap_get_l2_bucket(pmap, pv->pv_va); ptep = &l2b->l2b_kva[l2pte_index(pv->pv_va)]; rv = L2_S_REFERENCED(*ptep); } PMAP_UNLOCK(pmap); if (rv) break; } return (rv); } /* * pmap_is_referenced: * * Return whether or not the specified physical page was referenced * in any physical maps. */ boolean_t pmap_is_referenced(vm_page_t m) { boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_referenced: page %p is not managed", m)); rw_wlock(&pvh_global_lock); rv = pmap_is_referenced_pvh(&m->md) || ((m->flags & PG_FICTITIOUS) == 0 && pmap_is_referenced_pvh(pa_to_pvh(VM_PAGE_TO_PHYS(m)))); rw_wunlock(&pvh_global_lock); return (rv); } /* * pmap_ts_referenced: * * Return the count of reference bits for a page, clearing all of them. 
*/ int pmap_ts_referenced(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_ts_referenced: page %p is not managed", m)); return (pmap_clearbit(m, PVF_REF)); } /* * Returns TRUE if any of the given mappings were used to modify * physical memory. Otherwise, returns FALSE. Both page and 1MB section * mappings are supported. */ static boolean_t pmap_is_modified_pvh(struct md_page *pvh) { pd_entry_t *pl1pd; struct l2_bucket *l2b; pv_entry_t pv; pt_entry_t *ptep; pmap_t pmap; boolean_t rv; rw_assert(&pvh_global_lock, RA_WLOCKED); rv = FALSE; TAILQ_FOREACH(pv, &pvh->pv_list, pv_list) { pmap = PV_PMAP(pv); PMAP_LOCK(pmap); pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(pv->pv_va)]; if ((*pl1pd & L1_TYPE_MASK) == L1_S_PROTO) rv = L1_S_WRITABLE(*pl1pd); else { l2b = pmap_get_l2_bucket(pmap, pv->pv_va); ptep = &l2b->l2b_kva[l2pte_index(pv->pv_va)]; rv = L2_S_WRITABLE(*ptep); } PMAP_UNLOCK(pmap); if (rv) break; } return (rv); } boolean_t pmap_is_modified(vm_page_t m) { boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_modified: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * concurrently set while the object is locked. Thus, if PGA_WRITEABLE * is clear, no PTEs can have APX cleared. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (FALSE); rw_wlock(&pvh_global_lock); rv = pmap_is_modified_pvh(&m->md) || ((m->flags & PG_FICTITIOUS) == 0 && pmap_is_modified_pvh(pa_to_pvh(VM_PAGE_TO_PHYS(m)))); rw_wunlock(&pvh_global_lock); return (rv); } /* * Apply the given advice to the specified range of addresses within the * given pmap. Depending on the advice, clear the referenced and/or * modified flags in each mapping. */ void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice) { struct l2_bucket *l2b; struct pv_entry *pve; pd_entry_t l1pd; pt_entry_t *ptep, opte, pte; vm_offset_t next_bucket; vm_page_t m; if (advice != MADV_DONTNEED && advice != MADV_FREE) return; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); for (; sva < eva; sva = next_bucket) { next_bucket = L2_NEXT_BUCKET(sva); if (next_bucket < sva) next_bucket = eva; l1pd = pmap->pm_l1->l1_kva[L1_IDX(sva)]; if ((l1pd & L1_TYPE_MASK) == L1_S_PROTO) { if (pmap == pmap_kernel()) continue; if (!pmap_demote_section(pmap, sva)) { /* * The large page mapping was destroyed. */ continue; } /* * Unless the page mappings are wired, remove the * mapping to a single page so that a subsequent * access may repromote. Since the underlying * l2_bucket is fully populated, this removal * never frees an entire l2_bucket. 
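 * For example, MADV_FREE over one 4 KB page of a promoted 1MB region
 * demotes the section into 256 small PTEs and then drops only the PTE
 * covering sva, leaving 255 mappings resident for a later repromotion.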
*/ l2b = pmap_get_l2_bucket(pmap, sva); KASSERT(l2b != NULL, ("pmap_advise: no l2 bucket for " "va 0x%#x, pmap 0x%p", sva, pmap)); ptep = &l2b->l2b_kva[l2pte_index(sva)]; opte = *ptep; m = PHYS_TO_VM_PAGE(l2pte_pa(*ptep)); KASSERT(m != NULL, ("pmap_advise: no vm_page for demoted superpage")); pve = pmap_find_pv(&m->md, pmap, sva); KASSERT(pve != NULL, ("pmap_advise: no PV entry for managed mapping")); if ((pve->pv_flags & PVF_WIRED) == 0) { pmap_free_l2_bucket(pmap, l2b, 1); pve = pmap_remove_pv(m, pmap, sva); pmap_free_pv_entry(pmap, pve); *ptep = 0; PTE_SYNC(ptep); if (pmap_is_current(pmap)) { if (PTE_BEEN_EXECD(opte)) cpu_tlb_flushID_SE(sva); else if (PTE_BEEN_REFD(opte)) cpu_tlb_flushD_SE(sva); } } } if (next_bucket > eva) next_bucket = eva; l2b = pmap_get_l2_bucket(pmap, sva); if (l2b == NULL) continue; for (ptep = &l2b->l2b_kva[l2pte_index(sva)]; sva != next_bucket; ptep++, sva += PAGE_SIZE) { opte = pte = *ptep; if ((opte & L2_S_PROTO) == 0) continue; m = PHYS_TO_VM_PAGE(l2pte_pa(opte)); if (m == NULL || (m->oflags & VPO_UNMANAGED) != 0) continue; else if (L2_S_WRITABLE(opte)) { if (advice == MADV_DONTNEED) { /* * Don't need to mark the page * dirty as it was already marked as * such in pmap_fault_fixup() or * pmap_enter_locked(). * Just clear the state. */ } else pte |= L2_APX; pte &= ~L2_S_REF; *ptep = pte; PTE_SYNC(ptep); } else if (L2_S_REFERENCED(opte)) { pte &= ~L2_S_REF; *ptep = pte; PTE_SYNC(ptep); } else continue; if (pmap_is_current(pmap)) { if (PTE_BEEN_EXECD(opte)) cpu_tlb_flushID_SE(sva); else if (PTE_BEEN_REFD(opte)) cpu_tlb_flushD_SE(sva); } } } cpu_cpwait(); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Clear the modify bits on the specified physical page. */ void pmap_clear_modify(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_clear_modify: page %p is not managed", m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("pmap_clear_modify: page %p is exclusive busied", m)); /* * If the page is not PGA_WRITEABLE, then no mappings can be modified. * If the object containing the page is locked and the page is not * exclusive busied, then PGA_WRITEABLE cannot be concurrently set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; if (pmap_is_modified(m)) pmap_clearbit(m, PVF_MOD); } /* * Clear the write and modified bits in each of the given page's mappings. */ void pmap_remove_write(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_write: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating. 
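 * When it is set, the work reduces to a single pmap_clearbit(m,
 * PVF_WRITE) call, which walks the page's pv list and write-protects
 * every mapping it finds.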
*/ VM_OBJECT_ASSERT_WLOCKED(m->object); if (vm_page_xbusied(m) || (m->aflags & PGA_WRITEABLE) != 0) pmap_clearbit(m, PVF_WRITE); } /* * perform the pmap work for mincore */ int pmap_mincore(pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { struct l2_bucket *l2b; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep, pte; vm_paddr_t pa; vm_page_t m; int val; boolean_t managed; PMAP_LOCK(pmap); retry: pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(addr)]; l1pd = *pl1pd; if ((l1pd & L1_TYPE_MASK) == L1_S_PROTO) { pa = (l1pd & L1_S_FRAME); val = MINCORE_SUPER | MINCORE_INCORE; if (L1_S_WRITABLE(l1pd)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; managed = FALSE; m = PHYS_TO_VM_PAGE(pa); if (m != NULL && (m->oflags & VPO_UNMANAGED) == 0) managed = TRUE; if (managed) { if (L1_S_REFERENCED(l1pd)) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } } else { l2b = pmap_get_l2_bucket(pmap, addr); if (l2b == NULL) { val = 0; goto out; } ptep = &l2b->l2b_kva[l2pte_index(addr)]; pte = *ptep; if (!l2pte_valid(pte)) { val = 0; goto out; } val = MINCORE_INCORE; if (L2_S_WRITABLE(pte)) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; managed = FALSE; pa = l2pte_pa(pte); m = PHYS_TO_VM_PAGE(pa); if (m != NULL && (m->oflags & VPO_UNMANAGED) == 0) managed = TRUE; if (managed) { if (L2_S_REFERENCED(pte)) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } } if ((val & (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER)) != (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER) && managed) { /* Ensure that "PHYS_TO_VM_PAGE(pa)->object" doesn't change. */ if (vm_page_pa_tryrelock(pmap, pa, locked_pa)) goto retry; } else out: PA_UNLOCK_COND(*locked_pa); PMAP_UNLOCK(pmap); return (val); } void pmap_sync_icache(pmap_t pmap, vm_offset_t va, vm_size_t sz) { } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more superpage mappings. */ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { vm_offset_t superpage_offset; if (size < NBPDR) return; if (object != NULL && (object->flags & OBJ_COLORED) != 0) offset += ptoa(object->pg_color); superpage_offset = offset & PDRMASK; if (size - ((NBPDR - superpage_offset) & PDRMASK) < NBPDR || (*addr & PDRMASK) == superpage_offset) return; if ((*addr & PDRMASK) < superpage_offset) *addr = (*addr & ~PDRMASK) + superpage_offset; else *addr = ((*addr + PDRMASK) & ~PDRMASK) + superpage_offset; } /* * pmap_map_section: * * Create a single section mapping. */ void pmap_map_section(pmap_t pmap, vm_offset_t va, vm_offset_t pa, vm_prot_t prot, boolean_t ref) { pd_entry_t *pl1pd, l1pd; pd_entry_t fl; KASSERT(((va | pa) & L1_S_OFFSET) == 0, ("Not a valid section mapping")); fl = pte_l1_s_cache_mode; pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)]; l1pd = L1_S_PROTO | pa | L1_S_PROT(PTE_USER, prot) | fl | L1_S_DOM(pmap->pm_domain); /* Mark page referenced if this section is a result of a promotion. */ if (ref == TRUE) l1pd |= L1_S_REF; #ifdef SMP l1pd |= L1_SHARED; #endif *pl1pd = l1pd; PTE_SYNC(pl1pd); } /* * pmap_link_l2pt: * * Link the L2 page table specified by l2pv.pv_pa into the L1 * page table at the slot for "va". 
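 * A bootstrap-time usage sketch (assuming a kernel_pt_table[] array of
 * pre-allocated pv_addr entries, as board init code typically sets up):
 *
 *	pmap_link_l2pt(l1pagetable, va, &kernel_pt_table[i]);
 *
 * Each call wires the single 1MB L1 slot covering va to one L2 table.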
*/ void pmap_link_l2pt(vm_offset_t l1pt, vm_offset_t va, struct pv_addr *l2pv) { pd_entry_t *pde = (pd_entry_t *) l1pt, proto; u_int slot = va >> L1_S_SHIFT; proto = L1_S_DOM(PMAP_DOMAIN_KERNEL) | L1_C_PROTO; #ifdef VERBOSE_INIT_ARM printf("pmap_link_l2pt: pa=0x%x va=0x%x\n", l2pv->pv_pa, l2pv->pv_va); #endif pde[slot + 0] = proto | (l2pv->pv_pa + 0x000); PTE_SYNC(&pde[slot]); SLIST_INSERT_HEAD(&kernel_pt_list, l2pv, pv_list); } /* * pmap_map_entry * * Create a single page mapping. */ void pmap_map_entry(vm_offset_t l1pt, vm_offset_t va, vm_offset_t pa, int prot, int cache) { pd_entry_t *pde = (pd_entry_t *) l1pt; pt_entry_t fl; pt_entry_t *ptep; KASSERT(((va | pa) & PAGE_MASK) == 0, ("ouin")); fl = l2s_mem_types[cache]; if ((pde[va >> L1_S_SHIFT] & L1_TYPE_MASK) != L1_TYPE_C) panic("pmap_map_entry: no L2 table for VA 0x%08x", va); ptep = (pt_entry_t *)kernel_pt_lookup(pde[L1_IDX(va)] & L1_C_ADDR_MASK); if (ptep == NULL) panic("pmap_map_entry: can't find L2 table for VA 0x%08x", va); ptep[l2pte_index(va)] = L2_S_PROTO | pa | fl | L2_S_REF; pmap_set_prot(&ptep[l2pte_index(va)], prot, 0); PTE_SYNC(&ptep[l2pte_index(va)]); } /* * pmap_map_chunk: * * Map a chunk of memory using the most efficient mappings * possible (section, large page, small page) into the * provided L1 and L2 tables at the specified virtual address. */ vm_size_t pmap_map_chunk(vm_offset_t l1pt, vm_offset_t va, vm_offset_t pa, vm_size_t size, int prot, int type) { pd_entry_t *pde = (pd_entry_t *) l1pt; pt_entry_t *ptep, f1, f2s, f2l; vm_size_t resid; int i; resid = (size + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1); if (l1pt == 0) panic("pmap_map_chunk: no L1 table provided"); #ifdef VERBOSE_INIT_ARM printf("pmap_map_chunk: pa=0x%x va=0x%x size=0x%x resid=0x%x " "prot=0x%x type=%d\n", pa, va, size, resid, prot, type); #endif f1 = l1_mem_types[type]; f2l = l2l_mem_types[type]; f2s = l2s_mem_types[type]; size = resid; while (resid > 0) { /* See if we can use a section mapping. */ if (L1_S_MAPPABLE_P(va, pa, resid)) { #ifdef VERBOSE_INIT_ARM printf("S"); #endif pde[va >> L1_S_SHIFT] = L1_S_PROTO | pa | L1_S_PROT(PTE_KERNEL, prot | VM_PROT_EXECUTE) | f1 | L1_S_DOM(PMAP_DOMAIN_KERNEL) | L1_S_REF; PTE_SYNC(&pde[va >> L1_S_SHIFT]); va += L1_S_SIZE; pa += L1_S_SIZE; resid -= L1_S_SIZE; continue; } /* * Ok, we're going to use an L2 table. Make sure * one is actually in the corresponding L1 slot * for the current VA. */ if ((pde[va >> L1_S_SHIFT] & L1_TYPE_MASK) != L1_TYPE_C) panic("pmap_map_chunk: no L2 table for VA 0x%08x", va); ptep = (pt_entry_t *) kernel_pt_lookup( pde[L1_IDX(va)] & L1_C_ADDR_MASK); if (ptep == NULL) panic("pmap_map_chunk: can't find L2 table for VA " "0x%08x", va); /* See if we can use an L2 large page mapping. */ if (L2_L_MAPPABLE_P(va, pa, resid)) { #ifdef VERBOSE_INIT_ARM printf("L"); #endif for (i = 0; i < 16; i++) { ptep[l2pte_index(va) + i] = L2_L_PROTO | pa | L2_L_PROT(PTE_KERNEL, prot) | f2l; PTE_SYNC(&ptep[l2pte_index(va) + i]); } va += L2_L_SIZE; pa += L2_L_SIZE; resid -= L2_L_SIZE; continue; } /* Use a small page mapping. */ #ifdef VERBOSE_INIT_ARM printf("P"); #endif ptep[l2pte_index(va)] = L2_S_PROTO | pa | f2s | L2_S_REF; pmap_set_prot(&ptep[l2pte_index(va)], prot, 0); PTE_SYNC(&ptep[l2pte_index(va)]); va += PAGE_SIZE; pa += PAGE_SIZE; resid -= PAGE_SIZE; } #ifdef VERBOSE_INIT_ARM printf("\n"); #endif return (size); } void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma) { /* * Remember the memattr in a field that gets used to set the appropriate * bits in the PTEs as mappings are established.
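 * Compare pmap_quick_enter_page() and pmap_demote_section() above: both
 * skip pte_l2_s_cache_mode when pv_memattr == VM_MEMATTR_UNCACHEABLE,
 * so the value stored here steers every PTE later built for the page.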
*/ m->md.pv_memattr = ma; /* * It appears that this function can only be called before any mappings * for the page are established on ARM. If this ever changes, this code * will need to walk the pv_list and make each of the existing mappings * uncacheable, being careful to sync caches and PTEs (and maybe * invalidate TLB?) for any current mapping it modifies. */ if (TAILQ_FIRST(&m->md.pv_list) != NULL) panic("Can't change memattr on page with existing mappings"); } Index: projects/clang380-import/sys/arm/arm/pmap.c =================================================================== --- projects/clang380-import/sys/arm/arm/pmap.c (revision 294776) +++ projects/clang380-import/sys/arm/arm/pmap.c (revision 294777) @@ -1,4845 +1,4891 @@ /* From: $NetBSD: pmap.c,v 1.148 2004/04/03 04:35:48 bsh Exp $ */ /*- * Copyright 2004 Olivier Houchard. * Copyright 2003 Wasabi Systems, Inc. * All rights reserved. * * Written by Steve C. Woodford for Wasabi Systems, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed for the NetBSD Project by * Wasabi Systems, Inc. * 4. The name of Wasabi Systems, Inc. may not be used to endorse * or promote products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY WASABI SYSTEMS, INC. ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WASABI SYSTEMS, INC * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (c) 2002-2003 Wasabi Systems, Inc. * Copyright (c) 2001 Richard Earnshaw * Copyright (c) 2001-2002 Christopher Gilbert * All rights reserved. * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the company nor the name of the author may be used to * endorse or promote products derived from this software without specific * prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /*- * Copyright (c) 1999 The NetBSD Foundation, Inc. * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by Charles M. Hannum. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (c) 1994-1998 Mark Brinicombe. * Copyright (c) 1994 Brini. * All rights reserved. * * This code is derived from software written for Brini by Mark Brinicombe * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by Mark Brinicombe. * 4. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * * RiscBSD kernel project * * pmap.c * * Machine dependant vm stuff * * Created : 20/09/94 */ /* * Special compilation symbols * PMAP_DEBUG - Build in pmap_debug_level code * * Note that pmap_mapdev() and pmap_unmapdev() are implemented in arm/devmap.c */ /* Include header files */ #include "opt_vm.h" #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef PMAP_DEBUG #define PDEBUG(_lev_,_stat_) \ if (pmap_debug_level >= (_lev_)) \ ((_stat_)) #define dprintf printf int pmap_debug_level = 0; #define PMAP_INLINE #else /* PMAP_DEBUG */ #define PDEBUG(_lev_,_stat_) /* Nothing */ #define dprintf(x, arg...) #define PMAP_INLINE __inline #endif /* PMAP_DEBUG */ extern struct pv_addr systempage; extern int last_fault_code; /* * Internal function prototypes */ static void pmap_free_pv_entry (pv_entry_t); static pv_entry_t pmap_get_pv_entry(void); static int pmap_enter_locked(pmap_t, vm_offset_t, vm_page_t, vm_prot_t, u_int); static vm_paddr_t pmap_extract_locked(pmap_t pmap, vm_offset_t va); static void pmap_fix_cache(struct vm_page *, pmap_t, vm_offset_t); static void pmap_alloc_l1(pmap_t); static void pmap_free_l1(pmap_t); static int pmap_clearbit(struct vm_page *, u_int); static struct l2_bucket *pmap_get_l2_bucket(pmap_t, vm_offset_t); static struct l2_bucket *pmap_alloc_l2_bucket(pmap_t, vm_offset_t); static void pmap_free_l2_bucket(pmap_t, struct l2_bucket *, u_int); static vm_offset_t kernel_pt_lookup(vm_paddr_t); static MALLOC_DEFINE(M_VMPMAP, "pmap", "PMAP L1"); vm_offset_t virtual_avail; /* VA of first avail page (after kernel bss) */ vm_offset_t virtual_end; /* VA of last avail page (end of kernel AS) */ vm_offset_t pmap_curmaxkvaddr; vm_paddr_t kernel_l1pa; vm_offset_t kernel_vm_end = 0; vm_offset_t vm_max_kernel_address; struct pmap kernel_pmap_store; static pt_entry_t *csrc_pte, *cdst_pte; static vm_offset_t csrcp, cdstp, qmap_addr; static struct mtx cmtx, qmap_mtx; static void pmap_init_l1(struct l1_ttable *, pd_entry_t *); /* * These routines are called when the CPU type is identified to set up * the PTE prototypes, cache modes, etc. * * The variables are always here, just in case LKMs need to reference * them (though, they shouldn't). 
pt_entry_t pte_l1_s_cache_mode; pt_entry_t pte_l1_s_cache_mode_pt; pt_entry_t pte_l1_s_cache_mask; pt_entry_t pte_l2_l_cache_mode; pt_entry_t pte_l2_l_cache_mode_pt; pt_entry_t pte_l2_l_cache_mask; pt_entry_t pte_l2_s_cache_mode; pt_entry_t pte_l2_s_cache_mode_pt; pt_entry_t pte_l2_s_cache_mask; pt_entry_t pte_l2_s_prot_u; pt_entry_t pte_l2_s_prot_w; pt_entry_t pte_l2_s_prot_mask; pt_entry_t pte_l1_s_proto; pt_entry_t pte_l1_c_proto; pt_entry_t pte_l2_s_proto; void (*pmap_copy_page_func)(vm_paddr_t, vm_paddr_t); void (*pmap_copy_page_offs_func)(vm_paddr_t a_phys, vm_offset_t a_offs, vm_paddr_t b_phys, vm_offset_t b_offs, int cnt); void (*pmap_zero_page_func)(vm_paddr_t, int, int); struct msgbuf *msgbufp = 0; /* * Crashdump maps. */ static caddr_t crashdumpmap; extern void bcopy_page(vm_offset_t, vm_offset_t); extern void bzero_page(vm_offset_t); extern vm_offset_t alloc_firstaddr; char *_tmppt; /* * Metadata for L1 translation tables. */ struct l1_ttable { /* Entry on the L1 Table list */ SLIST_ENTRY(l1_ttable) l1_link; /* Entry on the L1 Least Recently Used list */ TAILQ_ENTRY(l1_ttable) l1_lru; /* Track how many domains are allocated from this L1 */ volatile u_int l1_domain_use_count; /* * A free-list of domain numbers for this L1. * We avoid using ffs() and a bitmap to track domains since ffs() * is slow on ARM. */ u_int8_t l1_domain_first; u_int8_t l1_domain_free[PMAP_DOMAINS]; /* Physical address of this L1 page table */ vm_paddr_t l1_physaddr; /* KVA of this L1 page table */ pd_entry_t *l1_kva; }; /* * Convert a virtual address into its L1 table index. That is, the * index used to locate the L2 descriptor table pointer in an L1 table. * This is basically used to index l1->l1_kva[]. * * Each L2 descriptor table represents 1MB of VA space. */ #define L1_IDX(va) (((vm_offset_t)(va)) >> L1_S_SHIFT) /* * L1 Page Tables are tracked using a Least Recently Used list. * - New L1s are allocated from the HEAD. * - Freed L1s are added to the TAIL. * - Recently accessed L1s (where an 'access' is some change to one of * the userland pmaps which own this L1) are moved to the TAIL. */ static TAILQ_HEAD(, l1_ttable) l1_lru_list; /* * A list of all L1 tables */ static SLIST_HEAD(, l1_ttable) l1_list; static struct mtx l1_lru_lock; /* * The l2_dtable tracks L2_BUCKET_SIZE worth of L1 slots. * * This is normally 16MB worth of L2 page descriptors for any given pmap. * Reference counts are maintained for L2 descriptors so they can be * freed when empty. */ struct l2_dtable { /* The number of L2 page descriptors allocated to this l2_dtable */ u_int l2_occupancy; /* List of L2 page descriptors */ struct l2_bucket { pt_entry_t *l2b_kva; /* KVA of L2 Descriptor Table */ vm_paddr_t l2b_phys; /* Physical address of same */ u_short l2b_l1idx; /* This L2 table's L1 index */ u_short l2b_occupancy; /* How many active descriptors */ } l2_bucket[L2_BUCKET_SIZE]; }; /* pmap_kenter_internal flags */ #define KENTER_CACHE 0x1 #define KENTER_USER 0x2 /* * Given an L1 table index, calculate the corresponding l2_dtable index * and bucket index within the l2_dtable. */ #define L2_IDX(l1idx) (((l1idx) >> L2_BUCKET_LOG2) & \ (L2_SIZE - 1)) #define L2_BUCKET(l1idx) ((l1idx) & (L2_BUCKET_SIZE - 1)) /* * Given a virtual address, this macro returns the * virtual address required to drop into the next L2 bucket. */ #define L2_NEXT_BUCKET(va) (((va) & L1_S_FRAME) + L1_S_SIZE) /* * We try to map the page tables write-through, if possible.
However, not * all CPUs have a write-through cache mode, so on those we have to sync * the cache when we frob page tables. * * We try to evaluate this at compile time, if possible. However, it's * not always possible to do that, hence this run-time var. */ int pmap_needs_pte_sync; /* * Macro to determine if a mapping might be resident in the * instruction cache and/or TLB */ #define PV_BEEN_EXECD(f) (((f) & (PVF_REF | PVF_EXEC)) == (PVF_REF | PVF_EXEC)) /* * Macro to determine if a mapping might be resident in the * data cache and/or TLB */ #define PV_BEEN_REFD(f) (((f) & PVF_REF) != 0) #ifndef PMAP_SHPGPERPROC #define PMAP_SHPGPERPROC 200 #endif #define pmap_is_current(pm) ((pm) == pmap_kernel() || \ curproc->p_vmspace->vm_map.pmap == (pm)) static uma_zone_t pvzone = NULL; uma_zone_t l2zone; static uma_zone_t l2table_zone; static vm_offset_t pmap_kernel_l2dtable_kva; static vm_offset_t pmap_kernel_l2ptp_kva; static vm_paddr_t pmap_kernel_l2ptp_phys; static int pv_entry_count=0, pv_entry_max=0, pv_entry_high_water=0; static struct rwlock pvh_global_lock; void pmap_copy_page_offs_generic(vm_paddr_t a_phys, vm_offset_t a_offs, vm_paddr_t b_phys, vm_offset_t b_offs, int cnt); #if ARM_MMU_XSCALE == 1 void pmap_copy_page_offs_xscale(vm_paddr_t a_phys, vm_offset_t a_offs, vm_paddr_t b_phys, vm_offset_t b_offs, int cnt); #endif /* * This list exists for the benefit of pmap_map_chunk(). It keeps track * of the kernel L2 tables during bootstrap, so that pmap_map_chunk() can * find them as necessary. * * Note that the data on this list MUST remain valid after initarm() returns, * as pmap_bootstrap() uses it to construct L2 table metadata. */ SLIST_HEAD(, pv_addr) kernel_pt_list = SLIST_HEAD_INITIALIZER(kernel_pt_list); static void pmap_init_l1(struct l1_ttable *l1, pd_entry_t *l1pt) { int i; l1->l1_kva = l1pt; l1->l1_domain_use_count = 0; l1->l1_domain_first = 0; for (i = 0; i < PMAP_DOMAINS; i++) l1->l1_domain_free[i] = i + 1; /* * Copy the kernel's L1 entries to each new L1. */ if (l1pt != pmap_kernel()->pm_l1->l1_kva) memcpy(l1pt, pmap_kernel()->pm_l1->l1_kva, L1_TABLE_SIZE); if ((l1->l1_physaddr = pmap_extract(pmap_kernel(), (vm_offset_t)l1pt)) == 0) panic("pmap_init_l1: can't get PA of L1 at %p", l1pt); SLIST_INSERT_HEAD(&l1_list, l1, l1_link); TAILQ_INSERT_TAIL(&l1_lru_list, l1, l1_lru); } static vm_offset_t kernel_pt_lookup(vm_paddr_t pa) { struct pv_addr *pv; SLIST_FOREACH(pv, &kernel_pt_list, pv_list) { if (pv->pv_pa == pa) return (pv->pv_va); } return (0); } #if ARM_MMU_GENERIC != 0 void pmap_pte_init_generic(void) { pte_l1_s_cache_mode = L1_S_B|L1_S_C; pte_l1_s_cache_mask = L1_S_CACHE_MASK_generic; pte_l2_l_cache_mode = L2_B|L2_C; pte_l2_l_cache_mask = L2_L_CACHE_MASK_generic; pte_l2_s_cache_mode = L2_B|L2_C; pte_l2_s_cache_mask = L2_S_CACHE_MASK_generic; /* * If we have a write-through cache, set B and C. If * we have a write-back cache, then we assume setting * only C will make those pages write-through.
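 * (On these MMUs C is the cacheable bit and B the bufferable bit: B|C
 * selects write-back where the core supports it, C alone selects
 * write-through, and clearing both disables caching entirely; hence
 * the two cases below.)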
*/ if (cpufuncs.cf_dcache_wb_range == (void *) cpufunc_nullop) { pte_l1_s_cache_mode_pt = L1_S_B|L1_S_C; pte_l2_l_cache_mode_pt = L2_B|L2_C; pte_l2_s_cache_mode_pt = L2_B|L2_C; } else { pte_l1_s_cache_mode_pt = L1_S_C; pte_l2_l_cache_mode_pt = L2_C; pte_l2_s_cache_mode_pt = L2_C; } pte_l2_s_prot_u = L2_S_PROT_U_generic; pte_l2_s_prot_w = L2_S_PROT_W_generic; pte_l2_s_prot_mask = L2_S_PROT_MASK_generic; pte_l1_s_proto = L1_S_PROTO_generic; pte_l1_c_proto = L1_C_PROTO_generic; pte_l2_s_proto = L2_S_PROTO_generic; pmap_copy_page_func = pmap_copy_page_generic; pmap_copy_page_offs_func = pmap_copy_page_offs_generic; pmap_zero_page_func = pmap_zero_page_generic; } #endif /* ARM_MMU_GENERIC != 0 */ #if ARM_MMU_XSCALE == 1 #if (ARM_NMMUS > 1) || defined (CPU_XSCALE_CORE3) static u_int xscale_use_minidata; #endif void pmap_pte_init_xscale(void) { uint32_t auxctl; int write_through = 0; pte_l1_s_cache_mode = L1_S_B|L1_S_C|L1_S_XSCALE_P; pte_l1_s_cache_mask = L1_S_CACHE_MASK_xscale; pte_l2_l_cache_mode = L2_B|L2_C; pte_l2_l_cache_mask = L2_L_CACHE_MASK_xscale; pte_l2_s_cache_mode = L2_B|L2_C; pte_l2_s_cache_mask = L2_S_CACHE_MASK_xscale; pte_l1_s_cache_mode_pt = L1_S_C; pte_l2_l_cache_mode_pt = L2_C; pte_l2_s_cache_mode_pt = L2_C; #ifdef XSCALE_CACHE_READ_WRITE_ALLOCATE /* * The XScale core has an enhanced mode where writes that * miss the cache cause a cache line to be allocated. This * is significantly faster than the traditional, write-through * behavior of this case. */ pte_l1_s_cache_mode |= L1_S_XSCALE_TEX(TEX_XSCALE_X); pte_l2_l_cache_mode |= L2_XSCALE_L_TEX(TEX_XSCALE_X); pte_l2_s_cache_mode |= L2_XSCALE_T_TEX(TEX_XSCALE_X); #endif /* XSCALE_CACHE_READ_WRITE_ALLOCATE */ #ifdef XSCALE_CACHE_WRITE_THROUGH /* * Some versions of the XScale core have various bugs in * their cache units, the work-around for which is to run * the cache in write-through mode. Unfortunately, this * has a major (negative) impact on performance. So, we * go ahead and run fast-and-loose, in the hopes that we * don't line up the planets in a way that will trip the * bugs. * * However, we give you the option to be slow-but-correct. */ write_through = 1; #elif defined(XSCALE_CACHE_WRITE_BACK) /* force write back cache mode */ write_through = 0; #elif defined(CPU_XSCALE_PXA2X0) /* * Intel PXA2[15]0 processors are known to have a bug in * write-back cache on revision 4 and earlier (stepping * A[01] and B[012]). Fixed for C0 and later. 
*/ { uint32_t id, type; id = cpufunc_id(); type = id & ~(CPU_ID_XSCALE_COREREV_MASK|CPU_ID_REVISION_MASK); if (type == CPU_ID_PXA250 || type == CPU_ID_PXA210) { if ((id & CPU_ID_REVISION_MASK) < 5) { /* write through for stepping A0-1 and B0-2 */ write_through = 1; } } } #endif /* XSCALE_CACHE_WRITE_THROUGH */ if (write_through) { pte_l1_s_cache_mode = L1_S_C; pte_l2_l_cache_mode = L2_C; pte_l2_s_cache_mode = L2_C; } #if (ARM_NMMUS > 1) xscale_use_minidata = 1; #endif pte_l2_s_prot_u = L2_S_PROT_U_xscale; pte_l2_s_prot_w = L2_S_PROT_W_xscale; pte_l2_s_prot_mask = L2_S_PROT_MASK_xscale; pte_l1_s_proto = L1_S_PROTO_xscale; pte_l1_c_proto = L1_C_PROTO_xscale; pte_l2_s_proto = L2_S_PROTO_xscale; #ifdef CPU_XSCALE_CORE3 pmap_copy_page_func = pmap_copy_page_generic; pmap_copy_page_offs_func = pmap_copy_page_offs_generic; pmap_zero_page_func = pmap_zero_page_generic; xscale_use_minidata = 0; /* Make sure it is L2-cachable */ pte_l1_s_cache_mode |= L1_S_XSCALE_TEX(TEX_XSCALE_T); pte_l1_s_cache_mode_pt = pte_l1_s_cache_mode &~ L1_S_XSCALE_P; pte_l2_l_cache_mode |= L2_XSCALE_L_TEX(TEX_XSCALE_T) ; pte_l2_l_cache_mode_pt = pte_l1_s_cache_mode; pte_l2_s_cache_mode |= L2_XSCALE_T_TEX(TEX_XSCALE_T); pte_l2_s_cache_mode_pt = pte_l2_s_cache_mode; #else pmap_copy_page_func = pmap_copy_page_xscale; pmap_copy_page_offs_func = pmap_copy_page_offs_xscale; pmap_zero_page_func = pmap_zero_page_xscale; #endif /* * Disable ECC protection of page table access, for now. */ __asm __volatile("mrc p15, 0, %0, c1, c0, 1" : "=r" (auxctl)); auxctl &= ~XSCALE_AUXCTL_P; __asm __volatile("mcr p15, 0, %0, c1, c0, 1" : : "r" (auxctl)); } /* * xscale_setup_minidata: * * Set up the mini-data cache clean area. We require the * caller to allocate the right amount of physically and * virtually contiguous space. */ extern vm_offset_t xscale_minidata_clean_addr; extern vm_size_t xscale_minidata_clean_size; /* already initialized */ void xscale_setup_minidata(vm_offset_t l1pt, vm_offset_t va, vm_paddr_t pa) { pd_entry_t *pde = (pd_entry_t *) l1pt; pt_entry_t *pte; vm_size_t size; uint32_t auxctl; xscale_minidata_clean_addr = va; /* Round it to page size. */ size = (xscale_minidata_clean_size + L2_S_OFFSET) & L2_S_FRAME; for (; size != 0; va += L2_S_SIZE, pa += L2_S_SIZE, size -= L2_S_SIZE) { pte = (pt_entry_t *) kernel_pt_lookup( pde[L1_IDX(va)] & L1_C_ADDR_MASK); if (pte == NULL) panic("xscale_setup_minidata: can't find L2 table for " "VA 0x%08x", (u_int32_t) va); pte[l2pte_index(va)] = L2_S_PROTO | pa | L2_S_PROT(PTE_KERNEL, VM_PROT_READ) | L2_C | L2_XSCALE_T_TEX(TEX_XSCALE_X); } /* * Configure the mini-data cache for write-back with * read/write-allocate. * * NOTE: In order to reconfigure the mini-data cache, we must * make sure it contains no valid data! In order to do that, * we must issue a global data cache invalidate command! * * WE ASSUME WE ARE RUNNING UN-CACHED WHEN THIS ROUTINE IS CALLED! * THIS IS VERY IMPORTANT! */ /* Invalidate data and mini-data. */ __asm __volatile("mcr p15, 0, %0, c7, c6, 0" : : "r" (0)); __asm __volatile("mrc p15, 0, %0, c1, c0, 1" : "=r" (auxctl)); auxctl = (auxctl & ~XSCALE_AUXCTL_MD_MASK) | XSCALE_AUXCTL_MD_WB_RWA; __asm __volatile("mcr p15, 0, %0, c1, c0, 1" : : "r" (auxctl)); } #endif /* * Allocate an L1 translation table for the specified pmap. * This is called at pmap creation time. 
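 * Domain numbers are recycled through a small array-based free list:
 * l1_domain_first names the first free domain and l1_domain_free[d]
 * the one following d, so an allocation is just two loads and a store.
 * For example, starting from first == 0 and free[0] == 1 (the state
 * pmap_init_l1() sets up), the first allocation hands out domain 0 and
 * leaves first == 1.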
*/ static void pmap_alloc_l1(pmap_t pm) { struct l1_ttable *l1; u_int8_t domain; /* * Remove the L1 at the head of the LRU list */ mtx_lock(&l1_lru_lock); l1 = TAILQ_FIRST(&l1_lru_list); TAILQ_REMOVE(&l1_lru_list, l1, l1_lru); /* * Pick the first available domain number, and update * the link to the next number. */ domain = l1->l1_domain_first; l1->l1_domain_first = l1->l1_domain_free[domain]; /* * If there are still free domain numbers in this L1, * put it back on the TAIL of the LRU list. */ if (++l1->l1_domain_use_count < PMAP_DOMAINS) TAILQ_INSERT_TAIL(&l1_lru_list, l1, l1_lru); mtx_unlock(&l1_lru_lock); /* * Fix up the relevant bits in the pmap structure */ pm->pm_l1 = l1; pm->pm_domain = domain + 1; } /* * Free an L1 translation table. * This is called at pmap destruction time. */ static void pmap_free_l1(pmap_t pm) { struct l1_ttable *l1 = pm->pm_l1; mtx_lock(&l1_lru_lock); /* * If this L1 is currently on the LRU list, remove it. */ if (l1->l1_domain_use_count < PMAP_DOMAINS) TAILQ_REMOVE(&l1_lru_list, l1, l1_lru); /* * Free up the domain number which was allocated to the pmap */ l1->l1_domain_free[pm->pm_domain - 1] = l1->l1_domain_first; l1->l1_domain_first = pm->pm_domain - 1; l1->l1_domain_use_count--; /* * The L1 now must have at least 1 free domain, so add * it back to the LRU list. If the use count is zero, * put it at the head of the list, otherwise it goes * to the tail. */ if (l1->l1_domain_use_count == 0) { TAILQ_INSERT_HEAD(&l1_lru_list, l1, l1_lru); } else TAILQ_INSERT_TAIL(&l1_lru_list, l1, l1_lru); mtx_unlock(&l1_lru_lock); } /* * Returns a pointer to the L2 bucket associated with the specified pmap * and VA, or NULL if no L2 bucket exists for the address. */ static PMAP_INLINE struct l2_bucket * pmap_get_l2_bucket(pmap_t pm, vm_offset_t va) { struct l2_dtable *l2; struct l2_bucket *l2b; u_short l1idx; l1idx = L1_IDX(va); if ((l2 = pm->pm_l2[L2_IDX(l1idx)]) == NULL || (l2b = &l2->l2_bucket[L2_BUCKET(l1idx)])->l2b_kva == NULL) return (NULL); return (l2b); } /* * Returns a pointer to the L2 bucket associated with the specified pmap * and VA. * * If no L2 bucket exists, perform the necessary allocations to put an L2 * bucket/page table in place. * * Note that if a new L2 bucket/page was allocated, the caller *must* * increment the bucket occupancy counter appropriately *before* * releasing the pmap's lock to ensure no other thread or cpu deallocates * the bucket/page in the meantime. */ static struct l2_bucket * pmap_alloc_l2_bucket(pmap_t pm, vm_offset_t va) { struct l2_dtable *l2; struct l2_bucket *l2b; u_short l1idx; l1idx = L1_IDX(va); PMAP_ASSERT_LOCKED(pm); rw_assert(&pvh_global_lock, RA_WLOCKED); if ((l2 = pm->pm_l2[L2_IDX(l1idx)]) == NULL) { /* * No mapping at this address, as there is * no entry in the L1 table. * Need to allocate a new l2_dtable. */ PMAP_UNLOCK(pm); rw_wunlock(&pvh_global_lock); if ((l2 = uma_zalloc(l2table_zone, M_NOWAIT)) == NULL) { rw_wlock(&pvh_global_lock); PMAP_LOCK(pm); return (NULL); } rw_wlock(&pvh_global_lock); PMAP_LOCK(pm); if (pm->pm_l2[L2_IDX(l1idx)] != NULL) { /* * Someone already allocated the l2_dtable while * we were doing the same. */ uma_zfree(l2table_zone, l2); l2 = pm->pm_l2[L2_IDX(l1idx)]; } else { bzero(l2, sizeof(*l2)); /* * Link it into the parent pmap */ pm->pm_l2[L2_IDX(l1idx)] = l2; } } l2b = &l2->l2_bucket[L2_BUCKET(l1idx)]; /* * Fetch pointer to the L2 page table associated with the address. */ if (l2b->l2b_kva == NULL) { pt_entry_t *ptep; /* * No L2 page table has been allocated. 
Chances are, this * is because we just allocated the l2_dtable, above. */ l2->l2_occupancy++; PMAP_UNLOCK(pm); rw_wunlock(&pvh_global_lock); ptep = uma_zalloc(l2zone, M_NOWAIT); rw_wlock(&pvh_global_lock); PMAP_LOCK(pm); if (l2b->l2b_kva != 0) { /* We lost the race. */ l2->l2_occupancy--; uma_zfree(l2zone, ptep); return (l2b); } l2b->l2b_phys = vtophys(ptep); if (ptep == NULL) { /* * Oops, no more L2 page tables available at this * time. We may need to deallocate the l2_dtable * if we allocated a new one above. */ l2->l2_occupancy--; if (l2->l2_occupancy == 0) { pm->pm_l2[L2_IDX(l1idx)] = NULL; uma_zfree(l2table_zone, l2); } return (NULL); } l2b->l2b_kva = ptep; l2b->l2b_l1idx = l1idx; } return (l2b); } static PMAP_INLINE void #ifndef PMAP_INCLUDE_PTE_SYNC pmap_free_l2_ptp(pt_entry_t *l2) #else pmap_free_l2_ptp(boolean_t need_sync, pt_entry_t *l2) #endif { #ifdef PMAP_INCLUDE_PTE_SYNC /* * Note: With a write-back cache, we may need to sync this * L2 table before re-using it. * This is because it may have belonged to a non-current * pmap, in which case the cache syncs would have been * skipped when the pages were being unmapped. If the * L2 table were then to be immediately re-allocated to * the *current* pmap, it may well contain stale mappings * which have not yet been cleared by a cache write-back * and so would still be visible to the mmu. */ if (need_sync) PTE_SYNC_RANGE(l2, L2_TABLE_SIZE_REAL / sizeof(pt_entry_t)); #endif uma_zfree(l2zone, l2); } /* * One or more mappings in the specified L2 descriptor table have just been * invalidated. * * Garbage collect the metadata and descriptor table itself if necessary. * * The pmap lock must be acquired when this is called (not necessary * for the kernel pmap). */ static void pmap_free_l2_bucket(pmap_t pm, struct l2_bucket *l2b, u_int count) { struct l2_dtable *l2; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep; u_short l1idx; /* * Update the bucket's reference count according to how many * PTEs the caller has just invalidated. */ l2b->l2b_occupancy -= count; /* * Note: * * Level 2 page tables allocated to the kernel pmap are never freed * as that would require checking all Level 1 page tables and * removing any references to the Level 2 page table. See also the * comment elsewhere about never freeing bootstrap L2 descriptors. * * We make do with just invalidating the mapping in the L2 table. * * This isn't really a big deal in practice and, in fact, leads * to a performance win over time as we don't need to continually * alloc/free. */ if (l2b->l2b_occupancy > 0 || pm == pmap_kernel()) return; /* * There are no more valid mappings in this level 2 page table. * Go ahead and NULL-out the pointer in the bucket, then * free the page table. */ l1idx = l2b->l2b_l1idx; ptep = l2b->l2b_kva; l2b->l2b_kva = NULL; pl1pd = &pm->pm_l1->l1_kva[l1idx]; /* * If the L1 slot matches the pmap's domain * number, then invalidate it. */ l1pd = *pl1pd & (L1_TYPE_MASK | L1_C_DOM_MASK); if (l1pd == (L1_C_DOM(pm->pm_domain) | L1_TYPE_C)) { *pl1pd = 0; PTE_SYNC(pl1pd); } /* * Release the L2 descriptor table back to the pool cache. */ #ifndef PMAP_INCLUDE_PTE_SYNC pmap_free_l2_ptp(ptep); #else pmap_free_l2_ptp(!pmap_is_current(pm), ptep); #endif /* * Update the reference count in the associated l2_dtable */ l2 = pm->pm_l2[L2_IDX(l1idx)]; if (--l2->l2_occupancy > 0) return; /* * There are no more valid mappings in any of the Level 1 * slots managed by this l2_dtable. Go ahead and NULL-out * the pointer in the parent pmap and free the l2_dtable. 
*/ pm->pm_l2[L2_IDX(l1idx)] = NULL; uma_zfree(l2table_zone, l2); } /* * Pool cache constructors for L2 descriptor tables, metadata and pmap * structures. */ static int pmap_l2ptp_ctor(void *mem, int size, void *arg, int flags) { #ifndef PMAP_INCLUDE_PTE_SYNC struct l2_bucket *l2b; pt_entry_t *ptep, pte; vm_offset_t va = (vm_offset_t)mem & ~PAGE_MASK; /* * The mappings for these page tables were initially made using * pmap_kenter() by the pool subsystem. Therefore, the cache- * mode will not be right for page table mappings. To avoid * polluting the pmap_kenter() code with a special case for * page tables, we simply fix up the cache-mode here if it's not * correct. */ l2b = pmap_get_l2_bucket(pmap_kernel(), va); ptep = &l2b->l2b_kva[l2pte_index(va)]; pte = *ptep; if ((pte & L2_S_CACHE_MASK) != pte_l2_s_cache_mode_pt) { /* * Page tables must have the cache-mode set to * Write-Thru. */ *ptep = (pte & ~L2_S_CACHE_MASK) | pte_l2_s_cache_mode_pt; PTE_SYNC(ptep); cpu_tlb_flushD_SE(va); cpu_cpwait(); } #endif memset(mem, 0, L2_TABLE_SIZE_REAL); PTE_SYNC_RANGE(mem, L2_TABLE_SIZE_REAL / sizeof(pt_entry_t)); return (0); } /* * A bunch of routines to conditionally flush the caches/TLB depending * on whether the specified pmap actually needs to be flushed at any * given time. */ static PMAP_INLINE void pmap_tlb_flushID_SE(pmap_t pm, vm_offset_t va) { if (pmap_is_current(pm)) cpu_tlb_flushID_SE(va); } static PMAP_INLINE void pmap_tlb_flushD_SE(pmap_t pm, vm_offset_t va) { if (pmap_is_current(pm)) cpu_tlb_flushD_SE(va); } static PMAP_INLINE void pmap_tlb_flushID(pmap_t pm) { if (pmap_is_current(pm)) cpu_tlb_flushID(); } static PMAP_INLINE void pmap_tlb_flushD(pmap_t pm) { if (pmap_is_current(pm)) cpu_tlb_flushD(); } static int pmap_has_valid_mapping(pmap_t pm, vm_offset_t va) { pd_entry_t *pde; pt_entry_t *ptep; if (pmap_get_pde_pte(pm, va, &pde, &ptep) && ptep && ((*ptep & L2_TYPE_MASK) != L2_TYPE_INV)) return (1); return (0); } static PMAP_INLINE void pmap_idcache_wbinv_range(pmap_t pm, vm_offset_t va, vm_size_t len) { vm_size_t rest; CTR4(KTR_PMAP, "pmap_dcache_wbinv_range: pmap %p is_kernel %d va 0x%08x" " len 0x%x ", pm, pm == pmap_kernel(), va, len); if (pmap_is_current(pm) || pm == pmap_kernel()) { rest = MIN(PAGE_SIZE - (va & PAGE_MASK), len); while (len > 0) { if (pmap_has_valid_mapping(pm, va)) { cpu_idcache_wbinv_range(va, rest); cpu_l2cache_wbinv_range(va, rest); } len -= rest; va += rest; rest = MIN(PAGE_SIZE, len); } } } static PMAP_INLINE void pmap_dcache_wb_range(pmap_t pm, vm_offset_t va, vm_size_t len, boolean_t do_inv, boolean_t rd_only) { vm_size_t rest; CTR4(KTR_PMAP, "pmap_dcache_wb_range: pmap %p is_kernel %d va 0x%08x " "len 0x%x ", pm, pm == pmap_kernel(), va, len); CTR2(KTR_PMAP, " do_inv %d rd_only %d", do_inv, rd_only); if (pmap_is_current(pm)) { rest = MIN(PAGE_SIZE - (va & PAGE_MASK), len); while (len > 0) { if (pmap_has_valid_mapping(pm, va)) { if (do_inv && rd_only) { cpu_dcache_inv_range(va, rest); cpu_l2cache_inv_range(va, rest); } else if (do_inv) { cpu_dcache_wbinv_range(va, rest); cpu_l2cache_wbinv_range(va, rest); } else if (!rd_only) { cpu_dcache_wb_range(va, rest); cpu_l2cache_wb_range(va, rest); } } len -= rest; va += rest; rest = MIN(PAGE_SIZE, len); } } } static PMAP_INLINE void pmap_idcache_wbinv_all(pmap_t pm) { if (pmap_is_current(pm)) { cpu_idcache_wbinv_all(); cpu_l2cache_wbinv_all(); } } #ifdef notyet static PMAP_INLINE void pmap_dcache_wbinv_all(pmap_t pm) { if (pmap_is_current(pm)) { cpu_dcache_wbinv_all(); cpu_l2cache_wbinv_all(); } } #endif /* * 
PTE_SYNC_CURRENT: * * Make sure the pte is written out to RAM. * We need to do this in any of the following cases: * - We're dealing with the kernel pmap * - There is no pmap active in the cache/tlb. * - The specified pmap is 'active' in the cache/tlb. */ #ifdef PMAP_INCLUDE_PTE_SYNC #define PTE_SYNC_CURRENT(pm, ptep) \ do { \ if (PMAP_NEEDS_PTE_SYNC && \ pmap_is_current(pm)) \ PTE_SYNC(ptep); \ } while (/*CONSTCOND*/0) #else #define PTE_SYNC_CURRENT(pm, ptep) /* nothing */ #endif /* * cacheable == -1 means we must make the entry uncacheable; 1 means * cacheable. */ static __inline void pmap_set_cache_entry(pv_entry_t pv, pmap_t pm, vm_offset_t va, int cacheable) { struct l2_bucket *l2b; pt_entry_t *ptep, pte; l2b = pmap_get_l2_bucket(pv->pv_pmap, pv->pv_va); ptep = &l2b->l2b_kva[l2pte_index(pv->pv_va)]; if (cacheable == 1) { pte = (*ptep & ~L2_S_CACHE_MASK) | pte_l2_s_cache_mode; if (l2pte_valid(pte)) { if (PV_BEEN_EXECD(pv->pv_flags)) { pmap_tlb_flushID_SE(pv->pv_pmap, pv->pv_va); } else if (PV_BEEN_REFD(pv->pv_flags)) { pmap_tlb_flushD_SE(pv->pv_pmap, pv->pv_va); } } } else { pte = *ptep & ~L2_S_CACHE_MASK; if ((va != pv->pv_va || pm != pv->pv_pmap) && l2pte_valid(pte)) { if (PV_BEEN_EXECD(pv->pv_flags)) { pmap_idcache_wbinv_range(pv->pv_pmap, pv->pv_va, PAGE_SIZE); pmap_tlb_flushID_SE(pv->pv_pmap, pv->pv_va); } else if (PV_BEEN_REFD(pv->pv_flags)) { pmap_dcache_wb_range(pv->pv_pmap, pv->pv_va, PAGE_SIZE, TRUE, (pv->pv_flags & PVF_WRITE) == 0); pmap_tlb_flushD_SE(pv->pv_pmap, pv->pv_va); } } } *ptep = pte; PTE_SYNC_CURRENT(pv->pv_pmap, ptep); } static void pmap_fix_cache(struct vm_page *pg, pmap_t pm, vm_offset_t va) { int pmwc = 0; int writable = 0, kwritable = 0, uwritable = 0; int entries = 0, kentries = 0, uentries = 0; struct pv_entry *pv; rw_assert(&pvh_global_lock, RA_WLOCKED); /* * The cache gets written back/invalidated on context switch. * Therefore, if a user page shares a cache entry with another mapping * of the same page or with the kernel map, and at least one of the * mappings is writable, every such mapping must be made non-cacheable * (the code below sets PVF_NC on them). */ TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) { /* generate a count of the pv_entry uses */ if (pv->pv_flags & PVF_WRITE) { if (pv->pv_pmap == pmap_kernel()) kwritable++; else if (pv->pv_pmap == pm) uwritable++; writable++; } if (pv->pv_pmap == pmap_kernel()) kentries++; else { if (pv->pv_pmap == pm) uentries++; entries++; } } /* * check if the user duplicate mapping has * been removed.
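 * (A page needs uncacheable mappings as soon as the same user pmap maps it writably more than once, or maps it writably alongside another of its own mappings; on the virtually-indexed caches this pmap serves, such aliases would otherwise occupy distinct cache lines and go incoherent.)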
*/ if ((pm != pmap_kernel()) && (((uentries > 1) && uwritable) || (uwritable > 1))) pmwc = 1; TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) { /* check for user uncacheable conditions - order is important */ if (pm != pmap_kernel() && (pv->pv_pmap == pm || pv->pv_pmap == pmap_kernel())) { if ((uentries > 1 && uwritable) || uwritable > 1) { /* user duplicate mapping */ if (pv->pv_pmap != pmap_kernel()) pv->pv_flags |= PVF_MWC; if (!(pv->pv_flags & PVF_NC)) { pv->pv_flags |= PVF_NC; pmap_set_cache_entry(pv, pm, va, -1); } continue; } else /* no longer a duplicate user */ pv->pv_flags &= ~PVF_MWC; } /* * check for kernel uncacheable conditions: * kernel writable, or kernel readable with a writable user entry */ if ((kwritable && (entries || kentries > 1)) || (kwritable > 1) || ((kwritable != writable) && kentries && (pv->pv_pmap == pmap_kernel() || (pv->pv_flags & PVF_WRITE) || (pv->pv_flags & PVF_MWC)))) { if (!(pv->pv_flags & PVF_NC)) { pv->pv_flags |= PVF_NC; pmap_set_cache_entry(pv, pm, va, -1); } continue; } /* kernel and user are cacheable */ if ((pm == pmap_kernel()) && !(pv->pv_flags & PVF_MWC) && (pv->pv_flags & PVF_NC)) { pv->pv_flags &= ~PVF_NC; if (pg->md.pv_memattr != VM_MEMATTR_UNCACHEABLE) pmap_set_cache_entry(pv, pm, va, 1); continue; } /* user is no longer shareable and writable */ if (pm != pmap_kernel() && (pv->pv_pmap == pm || pv->pv_pmap == pmap_kernel()) && !pmwc && (pv->pv_flags & PVF_NC)) { pv->pv_flags &= ~(PVF_NC | PVF_MWC); if (pg->md.pv_memattr != VM_MEMATTR_UNCACHEABLE) pmap_set_cache_entry(pv, pm, va, 1); } } if ((kwritable == 0) && (writable == 0)) { pg->md.pvh_attrs &= ~PVF_MOD; vm_page_aflag_clear(pg, PGA_WRITEABLE); return; } } /* * Modify pte bits for all ptes corresponding to the given physical address. * We use `maskbits' rather than `clearbits' because we're always passing * constants and the latter would require an extra inversion at run-time. */ static int pmap_clearbit(struct vm_page *pg, u_int maskbits) { struct l2_bucket *l2b; struct pv_entry *pv; pt_entry_t *ptep, npte, opte; pmap_t pm; vm_offset_t va; u_int oflags; int count = 0; rw_wlock(&pvh_global_lock); if (maskbits & PVF_WRITE) maskbits |= PVF_MOD; /* * Clear saved attributes (modify, reference) */ pg->md.pvh_attrs &= ~(maskbits & (PVF_MOD | PVF_REF)); if (TAILQ_EMPTY(&pg->md.pv_list)) { rw_wunlock(&pvh_global_lock); return (0); } /* * Loop over all current mappings setting/clearing as appropriate */ TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) { va = pv->pv_va; pm = pv->pv_pmap; oflags = pv->pv_flags; if (!(oflags & maskbits)) { if ((maskbits & PVF_WRITE) && (pv->pv_flags & PVF_NC)) { if (pg->md.pv_memattr != VM_MEMATTR_UNCACHEABLE) { PMAP_LOCK(pm); l2b = pmap_get_l2_bucket(pm, va); ptep = &l2b->l2b_kva[l2pte_index(va)]; *ptep |= pte_l2_s_cache_mode; PTE_SYNC(ptep); PMAP_UNLOCK(pm); } pv->pv_flags &= ~(PVF_NC | PVF_MWC); } continue; } pv->pv_flags &= ~maskbits; PMAP_LOCK(pm); l2b = pmap_get_l2_bucket(pm, va); ptep = &l2b->l2b_kva[l2pte_index(va)]; npte = opte = *ptep; if (maskbits & (PVF_WRITE|PVF_MOD)) { if ((pv->pv_flags & PVF_NC)) { /* * Entry is not cacheable: * * Don't turn caching on again if this is a * modified emulation. This would be * inconsistent with the settings created by * pmap_fix_cache(). Otherwise, it's safe * to re-enable caching. * * There's no need to call pmap_fix_cache() * here: all pages are losing their write * permission.
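 * (Concretely: caching is restored just below only when PVF_WRITE itself is being revoked; clearing PVF_MOD alone leaves the pmap_fix_cache() decision in force.)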
*/ if (maskbits & PVF_WRITE) { if (pg->md.pv_memattr != VM_MEMATTR_UNCACHEABLE) npte |= pte_l2_s_cache_mode; pv->pv_flags &= ~(PVF_NC | PVF_MWC); } } else if (opte & L2_S_PROT_W) { vm_page_dirty(pg); /* * Entry is writable/cacheable: check if the pmap * is current; if it is, flush the range, otherwise * the data won't be in the cache. */ if (PV_BEEN_EXECD(oflags)) pmap_idcache_wbinv_range(pm, pv->pv_va, PAGE_SIZE); else if (PV_BEEN_REFD(oflags)) pmap_dcache_wb_range(pm, pv->pv_va, PAGE_SIZE, (maskbits & PVF_REF) ? TRUE : FALSE, FALSE); } /* make the pte read only */ npte &= ~L2_S_PROT_W; } if (maskbits & PVF_REF) { if ((pv->pv_flags & PVF_NC) == 0 && (maskbits & (PVF_WRITE|PVF_MOD)) == 0) { /* * Check npte here; we may have already * done the wbinv above, and the validity * of the PTE is the same for opte and * npte. */ if (npte & L2_S_PROT_W) { if (PV_BEEN_EXECD(oflags)) pmap_idcache_wbinv_range(pm, pv->pv_va, PAGE_SIZE); else if (PV_BEEN_REFD(oflags)) pmap_dcache_wb_range(pm, pv->pv_va, PAGE_SIZE, TRUE, FALSE); } else if ((npte & L2_TYPE_MASK) != L2_TYPE_INV) { /* XXXJRT need idcache_inv_range */ if (PV_BEEN_EXECD(oflags)) pmap_idcache_wbinv_range(pm, pv->pv_va, PAGE_SIZE); else if (PV_BEEN_REFD(oflags)) pmap_dcache_wb_range(pm, pv->pv_va, PAGE_SIZE, TRUE, TRUE); } } /* * Make the PTE invalid so that we will take a * page fault the next time the mapping is * referenced. */ npte &= ~L2_TYPE_MASK; npte |= L2_TYPE_INV; } if (npte != opte) { count++; *ptep = npte; PTE_SYNC(ptep); /* Flush the TLB entry if a current pmap. */ if (PV_BEEN_EXECD(oflags)) pmap_tlb_flushID_SE(pm, pv->pv_va); else if (PV_BEEN_REFD(oflags)) pmap_tlb_flushD_SE(pm, pv->pv_va); } PMAP_UNLOCK(pm); } if (maskbits & PVF_WRITE) vm_page_aflag_clear(pg, PGA_WRITEABLE); rw_wunlock(&pvh_global_lock); return (count); } /* * main pv_entry manipulation functions: * pmap_enter_pv: enter a mapping onto a vm_page list * pmap_remove_pv: remove a mapping from a vm_page list * * NOTE: pmap_enter_pv expects to lock the pvh itself * pmap_remove_pv expects the caller to lock the pvh before calling */ /* * pmap_enter_pv: enter a mapping onto a vm_page's PV list * * => caller should hold the proper lock on pvh_global_lock * => caller should have pmap locked * => we will (someday) gain the lock on the vm_page's PV list * => caller should adjust ptp's wire_count before calling * => caller should not adjust pmap's wire_count */ static void pmap_enter_pv(struct vm_page *pg, struct pv_entry *pve, pmap_t pm, vm_offset_t va, u_int flags) { rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_ASSERT_LOCKED(pm); if (pg->md.pv_kva != 0) { pve->pv_pmap = kernel_pmap; pve->pv_va = pg->md.pv_kva; pve->pv_flags = PVF_WRITE | PVF_UNMAN; if (pm != kernel_pmap) PMAP_LOCK(kernel_pmap); TAILQ_INSERT_HEAD(&pg->md.pv_list, pve, pv_list); TAILQ_INSERT_HEAD(&kernel_pmap->pm_pvlist, pve, pv_plist); if (pm != kernel_pmap) PMAP_UNLOCK(kernel_pmap); pg->md.pv_kva = 0; if ((pve = pmap_get_pv_entry()) == NULL) panic("pmap_enter_pv: no pv entries"); } pve->pv_pmap = pm; pve->pv_va = va; pve->pv_flags = flags; TAILQ_INSERT_HEAD(&pg->md.pv_list, pve, pv_list); TAILQ_INSERT_HEAD(&pm->pm_pvlist, pve, pv_plist); pg->md.pvh_attrs |= flags & (PVF_REF | PVF_MOD); if (pve->pv_flags & PVF_WIRED) ++pm->pm_stats.wired_count; vm_page_aflag_set(pg, PGA_REFERENCED); } /* * * pmap_find_pv: Find a pv entry * * => caller should hold lock on vm_page */ static PMAP_INLINE struct pv_entry * pmap_find_pv(struct vm_page *pg, pmap_t pm, vm_offset_t va) { struct pv_entry *pv; rw_assert(&pvh_global_lock, RA_WLOCKED); 
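/* * A linear scan is fine here: pv lists are per-page and typically only a * few entries long, and the pvh_global_lock held by the caller keeps the * list stable while we walk it. */ 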
TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) if (pm == pv->pv_pmap && va == pv->pv_va) break; return (pv); } /* * vector_page_setprot: * * Manipulate the protection of the vector page. */ void vector_page_setprot(int prot) { struct l2_bucket *l2b; pt_entry_t *ptep; l2b = pmap_get_l2_bucket(pmap_kernel(), vector_page); ptep = &l2b->l2b_kva[l2pte_index(vector_page)]; *ptep = (*ptep & ~L1_S_PROT_MASK) | L2_S_PROT(PTE_KERNEL, prot); PTE_SYNC(ptep); cpu_tlb_flushD_SE(vector_page); cpu_cpwait(); } /* * pmap_remove_pv: try to remove a mapping from a pv_list * * => caller should hold proper lock on pvh_global_lock * => pmap should be locked * => caller should hold lock on vm_page [so that attrs can be adjusted] * => caller should adjust ptp's wire_count and free PTP if needed * => caller should NOT adjust pmap's wire_count * => we return the removed pve */ static void pmap_nuke_pv(struct vm_page *pg, pmap_t pm, struct pv_entry *pve) { struct pv_entry *pv; rw_assert(&pvh_global_lock, RA_WLOCKED); PMAP_ASSERT_LOCKED(pm); TAILQ_REMOVE(&pg->md.pv_list, pve, pv_list); TAILQ_REMOVE(&pm->pm_pvlist, pve, pv_plist); if (pve->pv_flags & PVF_WIRED) --pm->pm_stats.wired_count; if (pg->md.pvh_attrs & PVF_MOD) vm_page_dirty(pg); if (TAILQ_FIRST(&pg->md.pv_list) == NULL) pg->md.pvh_attrs &= ~PVF_REF; else vm_page_aflag_set(pg, PGA_REFERENCED); if ((pve->pv_flags & PVF_NC) && ((pm == pmap_kernel()) || (pve->pv_flags & PVF_WRITE) || !(pve->pv_flags & PVF_MWC))) pmap_fix_cache(pg, pm, 0); else if (pve->pv_flags & PVF_WRITE) { TAILQ_FOREACH(pve, &pg->md.pv_list, pv_list) if (pve->pv_flags & PVF_WRITE) break; if (!pve) { pg->md.pvh_attrs &= ~PVF_MOD; vm_page_aflag_clear(pg, PGA_WRITEABLE); } } pv = TAILQ_FIRST(&pg->md.pv_list); if (pv != NULL && (pv->pv_flags & PVF_UNMAN) && TAILQ_NEXT(pv, pv_list) == NULL) { pm = kernel_pmap; pg->md.pv_kva = pv->pv_va; /* a recursive pmap_nuke_pv */ TAILQ_REMOVE(&pg->md.pv_list, pv, pv_list); TAILQ_REMOVE(&pm->pm_pvlist, pv, pv_plist); if (pv->pv_flags & PVF_WIRED) --pm->pm_stats.wired_count; pg->md.pvh_attrs &= ~PVF_REF; pg->md.pvh_attrs &= ~PVF_MOD; vm_page_aflag_clear(pg, PGA_WRITEABLE); pmap_free_pv_entry(pv); } } static struct pv_entry * pmap_remove_pv(struct vm_page *pg, pmap_t pm, vm_offset_t va) { struct pv_entry *pve; rw_assert(&pvh_global_lock, RA_WLOCKED); pve = TAILQ_FIRST(&pg->md.pv_list); while (pve) { if (pve->pv_pmap == pm && pve->pv_va == va) { /* match? */ pmap_nuke_pv(pg, pm, pve); break; } pve = TAILQ_NEXT(pve, pv_list); } if (pve == NULL && pg->md.pv_kva == va) pg->md.pv_kva = 0; return (pve); /* return removed pve */ } /* * * pmap_modify_pv: Update pv flags * * => caller should hold lock on vm_page [so that attrs can be adjusted] * => caller should NOT adjust pmap's wire_count * => we return the old flags * * Modify a physical-virtual mapping in the pv table */ static u_int pmap_modify_pv(struct vm_page *pg, pmap_t pm, vm_offset_t va, u_int clr_mask, u_int set_mask) { struct pv_entry *npv; u_int flags, oflags; PMAP_ASSERT_LOCKED(pm); rw_assert(&pvh_global_lock, RA_WLOCKED); if ((npv = pmap_find_pv(pg, pm, va)) == NULL) return (0); /* * There is at least one VA mapping this page.
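 * (From here the update is pure bookkeeping: fold the REF/MOD bits into the page attributes, install the new pv flags, and keep the wired count and cache mode honest across any WIRED/WRITE transitions.)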
*/ if (clr_mask & (PVF_REF | PVF_MOD)) pg->md.pvh_attrs |= set_mask & (PVF_REF | PVF_MOD); oflags = npv->pv_flags; npv->pv_flags = flags = (oflags & ~clr_mask) | set_mask; if ((flags ^ oflags) & PVF_WIRED) { if (flags & PVF_WIRED) ++pm->pm_stats.wired_count; else --pm->pm_stats.wired_count; } if ((flags ^ oflags) & PVF_WRITE) pmap_fix_cache(pg, pm, 0); return (oflags); } /* Function to set the debug level of the pmap code */ #ifdef PMAP_DEBUG void pmap_debug(int level) { pmap_debug_level = level; dprintf("pmap_debug: level=%d\n", pmap_debug_level); } #endif /* PMAP_DEBUG */ void pmap_pinit0(struct pmap *pmap) { PDEBUG(1, printf("pmap_pinit0: pmap = %08x\n", (u_int32_t) pmap)); bcopy(kernel_pmap, pmap, sizeof(*pmap)); bzero(&pmap->pm_mtx, sizeof(pmap->pm_mtx)); PMAP_LOCK_INIT(pmap); } /* * Initialize a vm_page's machine-dependent fields. */ void pmap_page_init(vm_page_t m) { TAILQ_INIT(&m->md.pv_list); m->md.pv_memattr = VM_MEMATTR_DEFAULT; } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ void pmap_init(void) { int shpgperproc = PMAP_SHPGPERPROC; l2zone = uma_zcreate("L2 Table", L2_TABLE_SIZE_REAL, pmap_l2ptp_ctor, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); l2table_zone = uma_zcreate("L2 Table", sizeof(struct l2_dtable), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); /* * Initialize the PV entry allocator. */ pvzone = uma_zcreate("PV ENTRY", sizeof (struct pv_entry), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); TUNABLE_INT_FETCH("vm.pmap.shpgperproc", &shpgperproc); pv_entry_max = shpgperproc * maxproc + vm_cnt.v_page_count; uma_zone_reserve_kva(pvzone, pv_entry_max); pv_entry_high_water = 9 * (pv_entry_max / 10); /* * Now it is safe to enable pv_table recording. */ PDEBUG(1, printf("pmap_init: done!\n")); } int pmap_fault_fixup(pmap_t pm, vm_offset_t va, vm_prot_t ftype, int user) { struct l2_dtable *l2; struct l2_bucket *l2b; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep, pte; vm_paddr_t pa; u_int l1idx; int rv = 0; l1idx = L1_IDX(va); rw_wlock(&pvh_global_lock); PMAP_LOCK(pm); /* * If there is no l2_dtable for this address, then the process * has no business accessing it. * * Note: This will catch userland processes trying to access * kernel addresses. */ l2 = pm->pm_l2[L2_IDX(l1idx)]; if (l2 == NULL) goto out; /* * Likewise if there is no L2 descriptor table */ l2b = &l2->l2_bucket[L2_BUCKET(l1idx)]; if (l2b->l2b_kva == NULL) goto out; /* * Check the PTE itself. */ ptep = &l2b->l2b_kva[l2pte_index(va)]; pte = *ptep; if (pte == 0) goto out; /* * Catch a userland access to the vector page mapped at 0x0 */ if (user && (pte & L2_S_PROT_U) == 0) goto out; if (va == vector_page) goto out; pa = l2pte_pa(pte); if ((ftype & VM_PROT_WRITE) && (pte & L2_S_PROT_W) == 0) { /* * This looks like a good candidate for "page modified" * emulation... */ struct pv_entry *pv; struct vm_page *pg; /* Extract the physical address of the page */ if ((pg = PHYS_TO_VM_PAGE(pa)) == NULL) { goto out; } /* Get the current flags for this page. */ pv = pmap_find_pv(pg, pm, va); if (pv == NULL) { goto out; } /* * Do the flags say this page is writable? If not then it * is a genuine write fault. If yes then the write fault is * our fault as we did not reflect the write access in the * PTE. 
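 * (This is the classic modified-bit emulation for ARM MMUs that lack a hardware dirty bit: writable-but-clean pages are deliberately entered read-only, so the first store traps and lands here.)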
Now we know a write has occurred we can correct this * and also set the modified bit */ if ((pv->pv_flags & PVF_WRITE) == 0) { goto out; } pg->md.pvh_attrs |= PVF_REF | PVF_MOD; vm_page_dirty(pg); pv->pv_flags |= PVF_REF | PVF_MOD; /* * Re-enable write permissions for the page. No need to call * pmap_fix_cache(), since this is just a * modified-emulation fault, and the PVF_WRITE bit isn't * changing. We've already set the cacheable bits based on * the assumption that we can write to this page. */ *ptep = (pte & ~L2_TYPE_MASK) | L2_S_PROTO | L2_S_PROT_W; PTE_SYNC(ptep); rv = 1; } else if ((pte & L2_TYPE_MASK) == L2_TYPE_INV) { /* * This looks like a good candidate for "page referenced" * emulation. */ struct pv_entry *pv; struct vm_page *pg; /* Extract the physical address of the page */ if ((pg = PHYS_TO_VM_PAGE(pa)) == NULL) goto out; /* Get the current flags for this page. */ pv = pmap_find_pv(pg, pm, va); if (pv == NULL) goto out; pg->md.pvh_attrs |= PVF_REF; pv->pv_flags |= PVF_REF; *ptep = (pte & ~L2_TYPE_MASK) | L2_S_PROTO; PTE_SYNC(ptep); rv = 1; } /* * We know there is a valid mapping here, so simply * fix up the L1 if necessary. */ pl1pd = &pm->pm_l1->l1_kva[l1idx]; l1pd = l2b->l2b_phys | L1_C_DOM(pm->pm_domain) | L1_C_PROTO; if (*pl1pd != l1pd) { *pl1pd = l1pd; PTE_SYNC(pl1pd); rv = 1; } #ifdef DEBUG /* * If 'rv == 0' at this point, it generally indicates that there is a * stale TLB entry for the faulting address. This happens when two or * more processes are sharing an L1. Since we don't flush the TLB on * a context switch between such processes, we can take domain faults * for mappings which exist at the same VA in both processes. EVEN IF * WE'VE RECENTLY FIXED UP THE CORRESPONDING L1 in pmap_enter(), for * example. * * This is extremely likely to happen if pmap_enter() updated the L1 * entry for a recently entered mapping. In this case, the TLB is * flushed for the new mapping, but there may still be TLB entries for * other mappings belonging to other processes in the 1MB range * covered by the L1 entry. * * Since 'rv == 0', we know that the L1 already contains the correct * value, so the fault must be due to a stale TLB entry. * * Since we always need to flush the TLB anyway in the case where we * fixed up the L1, or frobbed the L2 PTE, we effectively deal with * stale TLB entries dynamically. * * However, the above condition can ONLY happen if the current L1 is * being shared. If it happens when the L1 is unshared, it indicates * that other parts of the pmap are not doing their job WRT managing * the TLB. */ if (rv == 0 && pm->pm_l1->l1_domain_use_count == 1) { printf("fixup: pm %p, va 0x%lx, ftype %d - nothing to do!\n", pm, (u_long)va, ftype); printf("fixup: l2 %p, l2b %p, ptep %p, pl1pd %p\n", l2, l2b, ptep, pl1pd); printf("fixup: pte 0x%x, l1pd 0x%x, last code 0x%x\n", pte, l1pd, last_fault_code); #ifdef DDB Debugger(); #endif } #endif cpu_tlb_flushID_SE(va); cpu_cpwait(); rv = 1; out: rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pm); return (rv); } void pmap_postinit(void) { struct l2_bucket *l2b; struct l1_ttable *l1; pd_entry_t *pl1pt; pt_entry_t *ptep, pte; vm_offset_t va, eva; u_int loop, needed; needed = (maxproc / PMAP_DOMAINS) + ((maxproc % PMAP_DOMAINS) ? 
1 : 0); needed -= 1; l1 = malloc(sizeof(*l1) * needed, M_VMPMAP, M_WAITOK); for (loop = 0; loop < needed; loop++, l1++) { /* Allocate a L1 page table */ va = (vm_offset_t)contigmalloc(L1_TABLE_SIZE, M_VMPMAP, 0, 0x0, 0xffffffff, L1_TABLE_SIZE, 0); if (va == 0) panic("Cannot allocate L1 KVM"); eva = va + L1_TABLE_SIZE; pl1pt = (pd_entry_t *)va; while (va < eva) { l2b = pmap_get_l2_bucket(pmap_kernel(), va); ptep = &l2b->l2b_kva[l2pte_index(va)]; pte = *ptep; pte = (pte & ~L2_S_CACHE_MASK) | pte_l2_s_cache_mode_pt; *ptep = pte; PTE_SYNC(ptep); cpu_tlb_flushD_SE(va); va += PAGE_SIZE; } pmap_init_l1(l1, pl1pt); } #ifdef DEBUG printf("pmap_postinit: Allocated %d static L1 descriptor tables\n", needed); #endif } /* * This is used to stuff certain critical values into the PCB where they * can be accessed quickly from cpu_switch() et al. */ void pmap_set_pcb_pagedir(pmap_t pm, struct pcb *pcb) { struct l2_bucket *l2b; pcb->pcb_pagedir = pm->pm_l1->l1_physaddr; pcb->pcb_dacr = (DOMAIN_CLIENT << (PMAP_DOMAIN_KERNEL * 2)) | (DOMAIN_CLIENT << (pm->pm_domain * 2)); if (vector_page < KERNBASE) { pcb->pcb_pl1vec = &pm->pm_l1->l1_kva[L1_IDX(vector_page)]; l2b = pmap_get_l2_bucket(pm, vector_page); pcb->pcb_l1vec = l2b->l2b_phys | L1_C_PROTO | L1_C_DOM(pm->pm_domain) | L1_C_DOM(PMAP_DOMAIN_KERNEL); } else pcb->pcb_pl1vec = NULL; } void pmap_activate(struct thread *td) { pmap_t pm; struct pcb *pcb; pm = vmspace_pmap(td->td_proc->p_vmspace); pcb = td->td_pcb; critical_enter(); pmap_set_pcb_pagedir(pm, pcb); if (td == curthread) { u_int cur_dacr, cur_ttb; __asm __volatile("mrc p15, 0, %0, c2, c0, 0" : "=r"(cur_ttb)); __asm __volatile("mrc p15, 0, %0, c3, c0, 0" : "=r"(cur_dacr)); cur_ttb &= ~(L1_TABLE_SIZE - 1); if (cur_ttb == (u_int)pcb->pcb_pagedir && cur_dacr == pcb->pcb_dacr) { /* * No need to switch address spaces. */ critical_exit(); return; } /* * We MUST, I repeat, MUST fix up the L1 entry corresponding * to 'vector_page' in the incoming L1 table before switching * to it otherwise subsequent interrupts/exceptions (including * domain faults!) will jump into hyperspace. */ if (pcb->pcb_pl1vec) { *pcb->pcb_pl1vec = pcb->pcb_l1vec; /* * Don't need to PTE_SYNC() at this point since * cpu_setttb() is about to flush both the cache * and the TLB. 
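 * (The ordering is the point here: the raw store to the L1 slot only has to be visible in memory by the time cpu_setttb() loads the new translation table base, and cpu_setttb() performs the required cache clean and TLB flush itself, which is why an explicit PTE_SYNC() would be redundant.)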
*/ } cpu_domains(pcb->pcb_dacr); cpu_setttb(pcb->pcb_pagedir); } critical_exit(); } static int pmap_set_pt_cache_mode(pd_entry_t *kl1, vm_offset_t va) { pd_entry_t *pdep, pde; pt_entry_t *ptep, pte; vm_offset_t pa; int rv = 0; /* * Make sure the descriptor itself has the correct cache mode */ pdep = &kl1[L1_IDX(va)]; pde = *pdep; if (l1pte_section_p(pde)) { if ((pde & L1_S_CACHE_MASK) != pte_l1_s_cache_mode_pt) { *pdep = (pde & ~L1_S_CACHE_MASK) | pte_l1_s_cache_mode_pt; PTE_SYNC(pdep); cpu_dcache_wbinv_range((vm_offset_t)pdep, sizeof(*pdep)); cpu_l2cache_wbinv_range((vm_offset_t)pdep, sizeof(*pdep)); rv = 1; } } else { pa = (vm_paddr_t)(pde & L1_C_ADDR_MASK); ptep = (pt_entry_t *)kernel_pt_lookup(pa); if (ptep == NULL) panic("pmap_bootstrap: No L2 for L2 @ va %p\n", ptep); ptep = &ptep[l2pte_index(va)]; pte = *ptep; if ((pte & L2_S_CACHE_MASK) != pte_l2_s_cache_mode_pt) { *ptep = (pte & ~L2_S_CACHE_MASK) | pte_l2_s_cache_mode_pt; PTE_SYNC(ptep); cpu_dcache_wbinv_range((vm_offset_t)ptep, sizeof(*ptep)); cpu_l2cache_wbinv_range((vm_offset_t)ptep, sizeof(*ptep)); rv = 1; } } return (rv); } static void pmap_alloc_specials(vm_offset_t *availp, int pages, vm_offset_t *vap, pt_entry_t **ptep) { vm_offset_t va = *availp; struct l2_bucket *l2b; if (ptep) { l2b = pmap_get_l2_bucket(pmap_kernel(), va); if (l2b == NULL) panic("pmap_alloc_specials: no l2b for 0x%x", va); *ptep = &l2b->l2b_kva[l2pte_index(va)]; } *vap = va; *availp = va + (PAGE_SIZE * pages); } /* * Bootstrap the system enough to run with virtual memory. * * On the arm this is called after mapping has already been enabled * and just syncs the pmap module with what has already been done. * [We can't call it easily with mapping off since the kernel is not * mapped with PA == VA, hence we would have to relocate every address * from the linked base (virtual) address "KERNBASE" to the actual * (physical) address starting relative to 0] */ #define PMAP_STATIC_L2_SIZE 16 void pmap_bootstrap(vm_offset_t firstaddr, struct pv_addr *l1pt) { static struct l1_ttable static_l1; static struct l2_dtable static_l2[PMAP_STATIC_L2_SIZE]; struct l1_ttable *l1 = &static_l1; struct l2_dtable *l2; struct l2_bucket *l2b; pd_entry_t pde; pd_entry_t *kernel_l1pt = (pd_entry_t *)l1pt->pv_va; pt_entry_t *ptep; pt_entry_t *qmap_pte; vm_paddr_t pa; vm_offset_t va; vm_size_t size; int l1idx, l2idx, l2next = 0; PDEBUG(1, printf("firstaddr = %08x, lastaddr = %08x\n", firstaddr, vm_max_kernel_address)); virtual_avail = firstaddr; kernel_pmap->pm_l1 = l1; kernel_l1pa = l1pt->pv_pa; /* * Scan the L1 translation table created by initarm() and create * the required metadata for all valid mappings found in it. */ for (l1idx = 0; l1idx < (L1_TABLE_SIZE / sizeof(pd_entry_t)); l1idx++) { pde = kernel_l1pt[l1idx]; /* * We're only interested in Coarse mappings. * pmap_extract() can deal with section mappings without * recourse to checking L2 metadata. */ if ((pde & L1_TYPE_MASK) != L1_TYPE_C) continue; /* * Lookup the KVA of this L2 descriptor table */ pa = (vm_paddr_t)(pde & L1_C_ADDR_MASK); ptep = (pt_entry_t *)kernel_pt_lookup(pa); if (ptep == NULL) { panic("pmap_bootstrap: No L2 for va 0x%x, pa 0x%lx", (u_int)l1idx << L1_S_SHIFT, (long unsigned int)pa); } /* * Fetch the associated L2 metadata structure. * Allocate a new one if necessary. */ if ((l2 = kernel_pmap->pm_l2[L2_IDX(l1idx)]) == NULL) { if (l2next == PMAP_STATIC_L2_SIZE) panic("pmap_bootstrap: out of static L2s"); kernel_pmap->pm_l2[L2_IDX(l1idx)] = l2 = &static_l2[l2next++]; } /* * One more L1 slot tracked... 
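 * (The descriptors found by this scan belong to the kernel pmap and are permanent: pmap_free_l2_bucket() deliberately never frees kernel L2 page tables.)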
*/ l2->l2_occupancy++; /* * Fill in the details of the L2 descriptor in the * appropriate bucket. */ l2b = &l2->l2_bucket[L2_BUCKET(l1idx)]; l2b->l2b_kva = ptep; l2b->l2b_phys = pa; l2b->l2b_l1idx = l1idx; /* * Establish an initial occupancy count for this descriptor */ for (l2idx = 0; l2idx < (L2_TABLE_SIZE_REAL / sizeof(pt_entry_t)); l2idx++) { if ((ptep[l2idx] & L2_TYPE_MASK) != L2_TYPE_INV) { l2b->l2b_occupancy++; } } /* * Make sure the descriptor itself has the correct cache mode. * If not, fix it, but whine about the problem. Port-meisters * should consider this a clue to fix up their initarm() * function. :) */ if (pmap_set_pt_cache_mode(kernel_l1pt, (vm_offset_t)ptep)) { printf("pmap_bootstrap: WARNING! wrong cache mode for " "L2 pte @ %p\n", ptep); } } /* * Ensure the primary (kernel) L1 has the correct cache mode for * a page table. Bitch if it is not correctly set. */ for (va = (vm_offset_t)kernel_l1pt; va < ((vm_offset_t)kernel_l1pt + L1_TABLE_SIZE); va += PAGE_SIZE) { if (pmap_set_pt_cache_mode(kernel_l1pt, va)) printf("pmap_bootstrap: WARNING! wrong cache mode for " "primary L1 @ 0x%x\n", va); } cpu_dcache_wbinv_all(); cpu_l2cache_wbinv_all(); cpu_tlb_flushID(); cpu_cpwait(); PMAP_LOCK_INIT(kernel_pmap); CPU_FILL(&kernel_pmap->pm_active); kernel_pmap->pm_domain = PMAP_DOMAIN_KERNEL; TAILQ_INIT(&kernel_pmap->pm_pvlist); /* * Initialize the global pv list lock. */ rw_init_flags(&pvh_global_lock, "pmap pv global", RW_RECURSE); /* * Reserve some special page table entries/VA space for temporary * mapping of pages. */ pmap_alloc_specials(&virtual_avail, 1, &csrcp, &csrc_pte); pmap_set_pt_cache_mode(kernel_l1pt, (vm_offset_t)csrc_pte); pmap_alloc_specials(&virtual_avail, 1, &cdstp, &cdst_pte); pmap_set_pt_cache_mode(kernel_l1pt, (vm_offset_t)cdst_pte); pmap_alloc_specials(&virtual_avail, 1, &qmap_addr, &qmap_pte); pmap_set_pt_cache_mode(kernel_l1pt, (vm_offset_t)qmap_pte); size = ((vm_max_kernel_address - pmap_curmaxkvaddr) + L1_S_OFFSET) / L1_S_SIZE; pmap_alloc_specials(&virtual_avail, round_page(size * L2_TABLE_SIZE_REAL) / PAGE_SIZE, &pmap_kernel_l2ptp_kva, NULL); size = (size + (L2_BUCKET_SIZE - 1)) / L2_BUCKET_SIZE; pmap_alloc_specials(&virtual_avail, round_page(size * sizeof(struct l2_dtable)) / PAGE_SIZE, &pmap_kernel_l2dtable_kva, NULL); pmap_alloc_specials(&virtual_avail, 1, (vm_offset_t*)&_tmppt, NULL); pmap_alloc_specials(&virtual_avail, MAXDUMPPGS, (vm_offset_t *)&crashdumpmap, NULL); SLIST_INIT(&l1_list); TAILQ_INIT(&l1_lru_list); mtx_init(&l1_lru_lock, "l1 list lock", NULL, MTX_DEF); pmap_init_l1(l1, kernel_l1pt); cpu_dcache_wbinv_all(); cpu_l2cache_wbinv_all(); virtual_avail = round_page(virtual_avail); virtual_end = vm_max_kernel_address; kernel_vm_end = pmap_curmaxkvaddr; mtx_init(&cmtx, "TMP mappings mtx", NULL, MTX_DEF); mtx_init(&qmap_mtx, "quick mapping mtx", NULL, MTX_DEF); pmap_set_pcb_pagedir(kernel_pmap, thread0.td_pcb); } /*************************************************** * Pmap allocation/deallocation routines. ***************************************************/ /* * Release any resources held by the given physical map. * Called when a pmap initialized by pmap_pinit is being released. * Should only be called if the map contains no valid mappings. 
*/ void pmap_release(pmap_t pmap) { struct pcb *pcb; pmap_idcache_wbinv_all(pmap); cpu_l2cache_wbinv_all(); pmap_tlb_flushID(pmap); cpu_cpwait(); if (vector_page < KERNBASE) { struct pcb *curpcb = PCPU_GET(curpcb); pcb = thread0.td_pcb; if (pmap_is_current(pmap)) { /* * Frob the L1 entry corresponding to the vector * page so that it contains the kernel pmap's domain * number. This will ensure pmap_remove() does not * pull the current vector page out from under us. */ critical_enter(); *pcb->pcb_pl1vec = pcb->pcb_l1vec; cpu_domains(pcb->pcb_dacr); cpu_setttb(pcb->pcb_pagedir); critical_exit(); } pmap_remove(pmap, vector_page, vector_page + PAGE_SIZE); /* * Make sure cpu_switch(), et al, DTRT. This is safe to do * since this process has no remaining mappings of its own. */ curpcb->pcb_pl1vec = pcb->pcb_pl1vec; curpcb->pcb_l1vec = pcb->pcb_l1vec; curpcb->pcb_dacr = pcb->pcb_dacr; curpcb->pcb_pagedir = pcb->pcb_pagedir; } pmap_free_l1(pmap); dprintf("pmap_release()\n"); } /* * Helper function for pmap_grow_l2_bucket() */ static __inline int pmap_grow_map(vm_offset_t va, pt_entry_t cache_mode, vm_paddr_t *pap) { struct l2_bucket *l2b; pt_entry_t *ptep; vm_paddr_t pa; struct vm_page *pg; pg = vm_page_alloc(NULL, 0, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED); if (pg == NULL) return (1); pa = VM_PAGE_TO_PHYS(pg); if (pap) *pap = pa; l2b = pmap_get_l2_bucket(pmap_kernel(), va); ptep = &l2b->l2b_kva[l2pte_index(va)]; *ptep = L2_S_PROTO | pa | cache_mode | L2_S_PROT(PTE_KERNEL, VM_PROT_READ | VM_PROT_WRITE); PTE_SYNC(ptep); return (0); } /* * This is the same as pmap_alloc_l2_bucket(), except that it is only * used by pmap_growkernel(). */ static __inline struct l2_bucket * pmap_grow_l2_bucket(pmap_t pm, vm_offset_t va) { struct l2_dtable *l2; struct l2_bucket *l2b; struct l1_ttable *l1; pd_entry_t *pl1pd; u_short l1idx; vm_offset_t nva; l1idx = L1_IDX(va); if ((l2 = pm->pm_l2[L2_IDX(l1idx)]) == NULL) { /* * No mapping at this address, as there is * no entry in the L1 table. * Need to allocate a new l2_dtable. */ nva = pmap_kernel_l2dtable_kva; if ((nva & PAGE_MASK) == 0) { /* * Need to allocate a backing page */ if (pmap_grow_map(nva, pte_l2_s_cache_mode, NULL)) return (NULL); } l2 = (struct l2_dtable *)nva; nva += sizeof(struct l2_dtable); if ((nva & PAGE_MASK) < (pmap_kernel_l2dtable_kva & PAGE_MASK)) { /* * The new l2_dtable straddles a page boundary. * Map in another page to cover it. */ if (pmap_grow_map(nva, pte_l2_s_cache_mode, NULL)) return (NULL); } pmap_kernel_l2dtable_kva = nva; /* * Link it into the parent pmap */ pm->pm_l2[L2_IDX(l1idx)] = l2; memset(l2, 0, sizeof(*l2)); } l2b = &l2->l2_bucket[L2_BUCKET(l1idx)]; /* * Fetch pointer to the L2 page table associated with the address. */ if (l2b->l2b_kva == NULL) { pt_entry_t *ptep; /* * No L2 page table has been allocated. Chances are, this * is because we just allocated the l2_dtable, above. 
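 * (The KVA handed out below comes from a simple bump allocator: pmap_kernel_l2ptp_kva advances by L2_TABLE_SIZE_REAL per table, and pmap_grow_map() maps a fresh backing page whenever the cursor crosses a page boundary; with the usual 1KB L2 tables and 4KB pages, four tables share each backing page.)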
*/ nva = pmap_kernel_l2ptp_kva; ptep = (pt_entry_t *)nva; if ((nva & PAGE_MASK) == 0) { /* * Need to allocate a backing page */ if (pmap_grow_map(nva, pte_l2_s_cache_mode_pt, &pmap_kernel_l2ptp_phys)) return (NULL); PTE_SYNC_RANGE(ptep, PAGE_SIZE / sizeof(pt_entry_t)); } memset(ptep, 0, L2_TABLE_SIZE_REAL); l2->l2_occupancy++; l2b->l2b_kva = ptep; l2b->l2b_l1idx = l1idx; l2b->l2b_phys = pmap_kernel_l2ptp_phys; pmap_kernel_l2ptp_kva += L2_TABLE_SIZE_REAL; pmap_kernel_l2ptp_phys += L2_TABLE_SIZE_REAL; } /* Distribute new L1 entry to all other L1s */ SLIST_FOREACH(l1, &l1_list, l1_link) { pl1pd = &l1->l1_kva[L1_IDX(va)]; *pl1pd = l2b->l2b_phys | L1_C_DOM(PMAP_DOMAIN_KERNEL) | L1_C_PROTO; PTE_SYNC(pl1pd); } return (l2b); } /* * grow the number of kernel page table entries, if needed */ void pmap_growkernel(vm_offset_t addr) { pmap_t kpm = pmap_kernel(); if (addr <= pmap_curmaxkvaddr) return; /* we are OK */ /* * whoops! we need to add kernel PTPs */ /* Map 1MB at a time */ for (; pmap_curmaxkvaddr < addr; pmap_curmaxkvaddr += L1_S_SIZE) pmap_grow_l2_bucket(kpm, pmap_curmaxkvaddr); /* * flush out the cache, expensive but growkernel will happen so * rarely */ cpu_dcache_wbinv_all(); cpu_l2cache_wbinv_all(); cpu_tlb_flushD(); cpu_cpwait(); kernel_vm_end = pmap_curmaxkvaddr; } /* * Remove all pages from specified address space * this aids process exit speeds. Also, this code * is special cased for current process only, but * can have the more generic (and slightly slower) * mode enabled. This is much faster than pmap_remove * in the case of running down an entire address space. */ void pmap_remove_pages(pmap_t pmap) { struct pv_entry *pv, *npv; struct l2_bucket *l2b = NULL; vm_page_t m; pt_entry_t *pt; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); cpu_idcache_wbinv_all(); cpu_l2cache_wbinv_all(); for (pv = TAILQ_FIRST(&pmap->pm_pvlist); pv; pv = npv) { if (pv->pv_flags & PVF_WIRED || pv->pv_flags & PVF_UNMAN) { /* Cannot remove wired or unmanaged pages now. */ npv = TAILQ_NEXT(pv, pv_plist); continue; } pmap->pm_stats.resident_count--; l2b = pmap_get_l2_bucket(pmap, pv->pv_va); KASSERT(l2b != NULL, ("No L2 bucket in pmap_remove_pages")); pt = &l2b->l2b_kva[l2pte_index(pv->pv_va)]; m = PHYS_TO_VM_PAGE(*pt & L2_ADDR_MASK); KASSERT((vm_offset_t)m >= KERNBASE, ("Trying to access non-existent page va %x pte %x", pv->pv_va, *pt)); *pt = 0; PTE_SYNC(pt); npv = TAILQ_NEXT(pv, pv_plist); pmap_nuke_pv(m, pmap, pv); if (TAILQ_EMPTY(&m->md.pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); pmap_free_pv_entry(pv); pmap_free_l2_bucket(pmap, l2b, 1); } rw_wunlock(&pvh_global_lock); cpu_tlb_flushID(); cpu_cpwait(); PMAP_UNLOCK(pmap); } /*************************************************** * Low level mapping routines..... ***************************************************/ #ifdef ARM_HAVE_SUPERSECTIONS /* Map a super section into the KVA. 
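 * A supersection is a single 16MB L1 mapping, so the descriptor built below * is replicated into the sixteen consecutive L1 slots it spans and into * every L1 table in the system (the l1_list walk); bits 35:32 of a wide * physical address travel in descriptor bits 23:20, which is what the * (pa >> 32) shuffle encodes.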
*/ void pmap_kenter_supersection(vm_offset_t va, uint64_t pa, int flags) { pd_entry_t pd = L1_S_PROTO | L1_S_SUPERSEC | (pa & L1_SUP_FRAME) | (((pa >> 32) & 0xf) << 20) | L1_S_PROT(PTE_KERNEL, VM_PROT_READ|VM_PROT_WRITE) | L1_S_DOM(PMAP_DOMAIN_KERNEL); struct l1_ttable *l1; vm_offset_t va0, va_end; KASSERT(((va | pa) & L1_SUP_OFFSET) == 0, ("Not a valid super section mapping")); if (flags & SECTION_CACHE) pd |= pte_l1_s_cache_mode; else if (flags & SECTION_PT) pd |= pte_l1_s_cache_mode_pt; va0 = va & L1_SUP_FRAME; va_end = va + L1_SUP_SIZE; SLIST_FOREACH(l1, &l1_list, l1_link) { va = va0; for (; va < va_end; va += L1_S_SIZE) { l1->l1_kva[L1_IDX(va)] = pd; PTE_SYNC(&l1->l1_kva[L1_IDX(va)]); } } } #endif /* Map a section into the KVA. */ void pmap_kenter_section(vm_offset_t va, vm_offset_t pa, int flags) { pd_entry_t pd = L1_S_PROTO | pa | L1_S_PROT(PTE_KERNEL, VM_PROT_READ|VM_PROT_WRITE) | L1_S_DOM(PMAP_DOMAIN_KERNEL); struct l1_ttable *l1; KASSERT(((va | pa) & L1_S_OFFSET) == 0, ("Not a valid section mapping")); if (flags & SECTION_CACHE) pd |= pte_l1_s_cache_mode; else if (flags & SECTION_PT) pd |= pte_l1_s_cache_mode_pt; SLIST_FOREACH(l1, &l1_list, l1_link) { l1->l1_kva[L1_IDX(va)] = pd; PTE_SYNC(&l1->l1_kva[L1_IDX(va)]); } } /* * Make a temporary mapping for a physical address. This is only intended * to be used for panic dumps. */ void * pmap_kenter_temporary(vm_paddr_t pa, int i) { vm_offset_t va; va = (vm_offset_t)crashdumpmap + (i * PAGE_SIZE); pmap_kenter(va, pa); return ((void *)crashdumpmap); } /* * add a wired page to the kva * note that in order for the mapping to take effect -- you * should do a invltlb after doing the pmap_kenter... */ static PMAP_INLINE void pmap_kenter_internal(vm_offset_t va, vm_offset_t pa, int flags) { struct l2_bucket *l2b; pt_entry_t *pte; pt_entry_t opte; struct pv_entry *pve; vm_page_t m; PDEBUG(1, printf("pmap_kenter: va = %08x, pa = %08x\n", (uint32_t) va, (uint32_t) pa)); l2b = pmap_get_l2_bucket(pmap_kernel(), va); if (l2b == NULL) l2b = pmap_grow_l2_bucket(pmap_kernel(), va); KASSERT(l2b != NULL, ("No L2 Bucket")); pte = &l2b->l2b_kva[l2pte_index(va)]; opte = *pte; PDEBUG(1, printf("pmap_kenter: pte = %08x, opte = %08x, npte = %08x\n", (uint32_t) pte, opte, *pte)); if (l2pte_valid(opte)) { pmap_kremove(va); } else { if (opte == 0) l2b->l2b_occupancy++; } *pte = L2_S_PROTO | pa | L2_S_PROT(PTE_KERNEL, VM_PROT_READ | VM_PROT_WRITE); if (flags & KENTER_CACHE) *pte |= pte_l2_s_cache_mode; if (flags & KENTER_USER) *pte |= L2_S_PROT_U; PTE_SYNC(pte); /* * A kernel mapping may not be the page's only mapping, so create a PV * entry to ensure proper caching. * * The existence test for the pvzone is used to delay the recording of * kernel mappings until the VM system is fully initialized. * * This expects the physical memory to have a vm_page_array entry. 
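 * (Note the pv_kva trick visible below: a page's first kernel-only mapping * is simply remembered in md.pv_kva instead of consuming a pv_entry; a real * pv_entry is allocated only once the page gains another mapping, and * pmap_enter_pv() later converts the stashed pv_kva into a proper entry.)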
*/ if (pvzone != NULL && (m = vm_phys_paddr_to_vm_page(pa)) != NULL) { rw_wlock(&pvh_global_lock); if (!TAILQ_EMPTY(&m->md.pv_list) || m->md.pv_kva != 0) { if ((pve = pmap_get_pv_entry()) == NULL) panic("pmap_kenter_internal: no pv entries"); PMAP_LOCK(pmap_kernel()); pmap_enter_pv(m, pve, pmap_kernel(), va, PVF_WRITE | PVF_UNMAN); pmap_fix_cache(m, pmap_kernel(), va); PMAP_UNLOCK(pmap_kernel()); } else { m->md.pv_kva = va; } rw_wunlock(&pvh_global_lock); } } void pmap_kenter(vm_offset_t va, vm_paddr_t pa) { pmap_kenter_internal(va, pa, KENTER_CACHE); } void pmap_kenter_nocache(vm_offset_t va, vm_paddr_t pa) { pmap_kenter_internal(va, pa, 0); } void pmap_kenter_device(vm_offset_t va, vm_size_t size, vm_paddr_t pa) { vm_offset_t sva; KASSERT((size & PAGE_MASK) == 0, ("%s: device mapping not page-sized", __func__)); sva = va; while (size != 0) { pmap_kenter_internal(va, pa, 0); va += PAGE_SIZE; pa += PAGE_SIZE; size -= PAGE_SIZE; } } void pmap_kremove_device(vm_offset_t va, vm_size_t size) { vm_offset_t sva; KASSERT((size & PAGE_MASK) == 0, ("%s: device mapping not page-sized", __func__)); sva = va; while (size != 0) { pmap_kremove(va); va += PAGE_SIZE; size -= PAGE_SIZE; } } void pmap_kenter_user(vm_offset_t va, vm_paddr_t pa) { pmap_kenter_internal(va, pa, KENTER_CACHE|KENTER_USER); /* * Call pmap_fault_fixup now, to make sure we'll have no exception * at the first use of the new address, or bad things will happen, * as we use one of these addresses in the exception handlers. */ pmap_fault_fixup(pmap_kernel(), va, VM_PROT_READ|VM_PROT_WRITE, 1); } vm_paddr_t pmap_kextract(vm_offset_t va) { return (pmap_extract_locked(kernel_pmap, va)); } /* * remove a page from the kernel pagetables */ void pmap_kremove(vm_offset_t va) { struct l2_bucket *l2b; pt_entry_t *pte, opte; struct pv_entry *pve; vm_page_t m; vm_offset_t pa; l2b = pmap_get_l2_bucket(pmap_kernel(), va); if (!l2b) return; KASSERT(l2b != NULL, ("No L2 Bucket")); pte = &l2b->l2b_kva[l2pte_index(va)]; opte = *pte; if (l2pte_valid(opte)) { /* pa = vtophys(va) taken from pmap_extract() */ switch (opte & L2_TYPE_MASK) { case L2_TYPE_L: pa = (opte & L2_L_FRAME) | (va & L2_L_OFFSET); break; default: pa = (opte & L2_S_FRAME) | (va & L2_S_OFFSET); break; } /* note: should never have to remove an allocation * before the pvzone is initialized. */ rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap_kernel()); if (pvzone != NULL && (m = vm_phys_paddr_to_vm_page(pa)) && (pve = pmap_remove_pv(m, pmap_kernel(), va))) pmap_free_pv_entry(pve); PMAP_UNLOCK(pmap_kernel()); rw_wunlock(&pvh_global_lock); va = va & ~PAGE_MASK; cpu_dcache_wbinv_range(va, PAGE_SIZE); cpu_l2cache_wbinv_range(va, PAGE_SIZE); cpu_tlb_flushD_SE(va); cpu_cpwait(); *pte = 0; } } /* * Used to map a range of physical addresses into kernel * virtual address space. * * The value passed in '*virt' is a suggested virtual address for * the mapping. Architectures which can support a direct-mapped * physical to virtual region can return the appropriate address * within that region, leaving '*virt' unchanged. Other * architectures should map the pages starting at '*virt' and * update '*virt' with the first usable address after the mapped * region.
*/ vm_offset_t pmap_map(vm_offset_t *virt, vm_offset_t start, vm_offset_t end, int prot) { vm_offset_t sva = *virt; vm_offset_t va = sva; PDEBUG(1, printf("pmap_map: virt = %08x, start = %08x, end = %08x, " "prot = %d\n", (uint32_t) *virt, (uint32_t) start, (uint32_t) end, prot)); while (start < end) { pmap_kenter(va, start); va += PAGE_SIZE; start += PAGE_SIZE; } *virt = va; return (sva); } static void pmap_wb_page(vm_page_t m) { struct pv_entry *pv; TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) pmap_dcache_wb_range(pv->pv_pmap, pv->pv_va, PAGE_SIZE, FALSE, (pv->pv_flags & PVF_WRITE) == 0); } static void pmap_inv_page(vm_page_t m) { struct pv_entry *pv; TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) pmap_dcache_wb_range(pv->pv_pmap, pv->pv_va, PAGE_SIZE, TRUE, TRUE); } /* * Add a list of wired pages to the kva * this routine is only used for temporary * kernel mappings that do not need to have * page modification or references recorded. * Note that old mappings are simply written * over. The page *must* be wired. */ void pmap_qenter(vm_offset_t va, vm_page_t *m, int count) { int i; for (i = 0; i < count; i++) { pmap_wb_page(m[i]); pmap_kenter_internal(va, VM_PAGE_TO_PHYS(m[i]), KENTER_CACHE); va += PAGE_SIZE; } } /* * this routine jerks page mappings from the * kernel -- it is meant only for temporary mappings. */ void pmap_qremove(vm_offset_t va, int count) { vm_paddr_t pa; int i; for (i = 0; i < count; i++) { pa = vtophys(va); if (pa) { pmap_inv_page(PHYS_TO_VM_PAGE(pa)); pmap_kremove(va); } va += PAGE_SIZE; } } /* * pmap_object_init_pt preloads the ptes for a given object * into the specified pmap. This eliminates the blast of soft * faults on process startup and immediately after an mmap. */ void pmap_object_init_pt(pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("pmap_object_init_pt: non-device object")); } /* * pmap_is_prefaultable: * * Return whether or not the specified virtual address is eligible * for prefault. */ boolean_t pmap_is_prefaultable(pmap_t pmap, vm_offset_t addr) { pd_entry_t *pde; pt_entry_t *pte; if (!pmap_get_pde_pte(pmap, addr, &pde, &pte)) return (FALSE); KASSERT(pte != NULL, ("Valid mapping but no pte?")); if (*pte == 0) return (TRUE); return (FALSE); } /* * Fetch pointers to the PDE/PTE for the given pmap/VA pair. * Returns TRUE if the mapping exists, else FALSE. * * NOTE: This function is only used by a couple of arm-specific modules. * It is not safe to take any pmap locks here, since we could be right * in the middle of debugging the pmap anyway... * * It is possible for this routine to return FALSE even though a valid * mapping does exist. This is because we don't lock, so the metadata * state may be inconsistent. * * NOTE: We can return a NULL *ptp in the case where the L1 pde is * a "section" mapping.
*/ boolean_t pmap_get_pde_pte(pmap_t pm, vm_offset_t va, pd_entry_t **pdp, pt_entry_t **ptp) { struct l2_dtable *l2; pd_entry_t *pl1pd, l1pd; pt_entry_t *ptep; u_short l1idx; if (pm->pm_l1 == NULL) return (FALSE); l1idx = L1_IDX(va); *pdp = pl1pd = &pm->pm_l1->l1_kva[l1idx]; l1pd = *pl1pd; if (l1pte_section_p(l1pd)) { *ptp = NULL; return (TRUE); } if (pm->pm_l2 == NULL) return (FALSE); l2 = pm->pm_l2[L2_IDX(l1idx)]; if (l2 == NULL || (ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva) == NULL) { return (FALSE); } *ptp = &ptep[l2pte_index(va)]; return (TRUE); } /* * Routine: pmap_remove_all * Function: * Removes this physical page from * all physical maps in which it resides. * Reflects back modify bits to the pager. * * Notes: * Original versions of this routine were very * inefficient because they iteratively called * pmap_remove (slow...) */ void pmap_remove_all(vm_page_t m) { pv_entry_t pv; pt_entry_t *ptep; struct l2_bucket *l2b; boolean_t flush = FALSE; pmap_t curpm; int flags = 0; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_all: page %p is not managed", m)); if (TAILQ_EMPTY(&m->md.pv_list)) return; rw_wlock(&pvh_global_lock); /* * XXX This call shouldn't exist. Iterating over the PV list twice, * once in pmap_clearbit() and again below, is both unnecessary and * inefficient. The below code should itself write back the cache * entry before it destroys the mapping. */ pmap_clearbit(m, PVF_WRITE); curpm = vmspace_pmap(curproc->p_vmspace); while ((pv = TAILQ_FIRST(&m->md.pv_list)) != NULL) { if (flush == FALSE && (pv->pv_pmap == curpm || pv->pv_pmap == pmap_kernel())) flush = TRUE; PMAP_LOCK(pv->pv_pmap); /* * Cached contents were written-back in pmap_clearbit(), * but we still have to invalidate the cache entry to make * sure stale data are not retrieved when another page will be * mapped under this virtual address. */ if (pmap_is_current(pv->pv_pmap)) { cpu_dcache_inv_range(pv->pv_va, PAGE_SIZE); if (pmap_has_valid_mapping(pv->pv_pmap, pv->pv_va)) cpu_l2cache_inv_range(pv->pv_va, PAGE_SIZE); } if (pv->pv_flags & PVF_UNMAN) { /* remove the pv entry, but do not remove the mapping * and remember this is a kernel mapped page */ m->md.pv_kva = pv->pv_va; } else { /* remove the mapping and pv entry */ l2b = pmap_get_l2_bucket(pv->pv_pmap, pv->pv_va); KASSERT(l2b != NULL, ("No l2 bucket")); ptep = &l2b->l2b_kva[l2pte_index(pv->pv_va)]; *ptep = 0; PTE_SYNC_CURRENT(pv->pv_pmap, ptep); pmap_free_l2_bucket(pv->pv_pmap, l2b, 1); pv->pv_pmap->pm_stats.resident_count--; flags |= pv->pv_flags; } pmap_nuke_pv(m, pv->pv_pmap, pv); PMAP_UNLOCK(pv->pv_pmap); pmap_free_pv_entry(pv); } if (flush) { if (PV_BEEN_EXECD(flags)) pmap_tlb_flushID(curpm); else pmap_tlb_flushD(curpm); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&pvh_global_lock); } /* * Set the physical protection on the * specified range of this map as requested. */ void pmap_protect(pmap_t pm, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { struct l2_bucket *l2b; pt_entry_t *ptep, pte; vm_offset_t next_bucket; u_int flags; int flush; CTR4(KTR_PMAP, "pmap_protect: pmap %p sva 0x%08x eva 0x%08x prot %x", pm, sva, eva, prot); if ((prot & VM_PROT_READ) == 0) { pmap_remove(pm, sva, eva); return; } if (prot & VM_PROT_WRITE) { /* * If this is a read->write transition, just ignore it and let * vm_fault() take care of it later. */ return; } rw_wlock(&pvh_global_lock); PMAP_LOCK(pm); /* * OK, at this point, we know we're doing write-protect operation. * If the pmap is active, write-back the range. 
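 * (Flushing while the mappings are still writable pushes any dirty lines to RAM before the pages become read-only; the VM system may subsequently access the physical pages through other mappings, e.g. for copy-on-write, and must see the latest data.)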
*/ pmap_dcache_wb_range(pm, sva, eva - sva, FALSE, FALSE); flush = ((eva - sva) >= (PAGE_SIZE * 4)) ? 0 : -1; flags = 0; while (sva < eva) { next_bucket = L2_NEXT_BUCKET(sva); if (next_bucket > eva) next_bucket = eva; l2b = pmap_get_l2_bucket(pm, sva); if (l2b == NULL) { sva = next_bucket; continue; } ptep = &l2b->l2b_kva[l2pte_index(sva)]; while (sva < next_bucket) { if ((pte = *ptep) != 0 && (pte & L2_S_PROT_W) != 0) { struct vm_page *pg; u_int f; pg = PHYS_TO_VM_PAGE(l2pte_pa(pte)); pte &= ~L2_S_PROT_W; *ptep = pte; PTE_SYNC(ptep); if (!(pg->oflags & VPO_UNMANAGED)) { f = pmap_modify_pv(pg, pm, sva, PVF_WRITE, 0); if (f & PVF_WRITE) vm_page_dirty(pg); } else f = 0; if (flush >= 0) { flush++; flags |= f; } else if (PV_BEEN_EXECD(f)) pmap_tlb_flushID_SE(pm, sva); else if (PV_BEEN_REFD(f)) pmap_tlb_flushD_SE(pm, sva); } sva += PAGE_SIZE; ptep++; } } if (flush) { if (PV_BEEN_EXECD(flags)) pmap_tlb_flushID(pm); else if (PV_BEEN_REFD(flags)) pmap_tlb_flushD(pm); } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pm); } /* * Insert the given physical page (p) at * the specified virtual address (v) in the * target physical map with the protection requested. * * If specified, the page will be wired down, meaning * that the related pte can not be reclaimed. * * NB: This is the only routine which MAY NOT lazy-evaluate * or lose information. That is, this routine must actually * insert this page into the given map NOW. */ int pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind __unused) { int rv; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); rv = pmap_enter_locked(pmap, va, m, prot, flags); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (rv); } /* * The pvh global and pmap locks must be held. */ static int pmap_enter_locked(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags) { struct l2_bucket *l2b = NULL; struct vm_page *opg; struct pv_entry *pve = NULL; pt_entry_t *ptep, npte, opte; u_int nflags; u_int oflags; vm_paddr_t pa; PMAP_ASSERT_LOCKED(pmap); rw_assert(&pvh_global_lock, RA_WLOCKED); if (va == vector_page) { pa = systempage.pv_pa; m = NULL; } else { if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); pa = VM_PAGE_TO_PHYS(m); } nflags = 0; if (prot & VM_PROT_WRITE) nflags |= PVF_WRITE; if (prot & VM_PROT_EXECUTE) nflags |= PVF_EXEC; if ((flags & PMAP_ENTER_WIRED) != 0) nflags |= PVF_WIRED; PDEBUG(1, printf("pmap_enter: pmap = %08x, va = %08x, m = %08x, prot = %x, " "flags = %x\n", (uint32_t) pmap, va, (uint32_t) m, prot, flags)); if (pmap == pmap_kernel()) { l2b = pmap_get_l2_bucket(pmap, va); if (l2b == NULL) l2b = pmap_grow_l2_bucket(pmap, va); } else { do_l2b_alloc: l2b = pmap_alloc_l2_bucket(pmap, va); if (l2b == NULL) { if ((flags & PMAP_ENTER_NOSLEEP) == 0) { PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); VM_WAIT; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); goto do_l2b_alloc; } return (KERN_RESOURCE_SHORTAGE); } } ptep = &l2b->l2b_kva[l2pte_index(va)]; opte = *ptep; npte = pa; oflags = 0; if (opte) { /* * There is already a mapping at this address. * If the physical address is different, lookup the * vm_page. */ if (l2pte_pa(opte) != pa) opg = PHYS_TO_VM_PAGE(l2pte_pa(opte)); else opg = m; } else opg = NULL; if ((prot & (VM_PROT_ALL)) || (!m || m->md.pvh_attrs & PVF_REF)) { /* * - The access type indicates that we don't need * to do referenced emulation. * OR * - The physical page has already been referenced * so no need to re-do referenced emulation here. 
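 * (When referenced emulation is needed, the else branch below enters the PTE as L2_TYPE_INV instead; the first touch then faults into pmap_fault_fixup(), which sets PVF_REF and upgrades the entry to a real mapping.)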
*/ npte |= L2_S_PROTO; nflags |= PVF_REF; if (m && ((prot & VM_PROT_WRITE) != 0 || (m->md.pvh_attrs & PVF_MOD))) { /* * This is a writable mapping, and the * page's mod state indicates it has * already been modified. Make it * writable from the outset. */ nflags |= PVF_MOD; if (!(m->md.pvh_attrs & PVF_MOD)) vm_page_dirty(m); } if (m && opte) vm_page_aflag_set(m, PGA_REFERENCED); } else { /* * Need to do page referenced emulation. */ npte |= L2_TYPE_INV; } if (prot & VM_PROT_WRITE) { npte |= L2_S_PROT_W; if (m != NULL && (m->oflags & VPO_UNMANAGED) == 0) vm_page_aflag_set(m, PGA_WRITEABLE); } if (m->md.pv_memattr != VM_MEMATTR_UNCACHEABLE) npte |= pte_l2_s_cache_mode; if (m && m == opg) { /* * We're changing the attrs of an existing mapping. */ oflags = pmap_modify_pv(m, pmap, va, PVF_WRITE | PVF_EXEC | PVF_WIRED | PVF_MOD | PVF_REF, nflags); /* * We may need to flush the cache if we're * doing rw-ro... */ if (pmap_is_current(pmap) && (oflags & PVF_NC) == 0 && (opte & L2_S_PROT_W) != 0 && (prot & VM_PROT_WRITE) == 0 && (opte & L2_TYPE_MASK) != L2_TYPE_INV) { cpu_dcache_wb_range(va, PAGE_SIZE); cpu_l2cache_wb_range(va, PAGE_SIZE); } } else { /* * New mapping, or changing the backing page * of an existing mapping. */ if (opg) { /* * Replacing an existing mapping with a new one. * It is part of our managed memory so we * must remove it from the PV list */ if ((pve = pmap_remove_pv(opg, pmap, va))) { /* note for patch: the oflags/invalidation was moved * because PG_FICTITIOUS pages could free the pve */ oflags = pve->pv_flags; /* * If the old mapping was valid (ref/mod * emulation creates 'invalid' mappings * initially) then make sure to frob * the cache. */ if ((oflags & PVF_NC) == 0 && l2pte_valid(opte)) { if (PV_BEEN_EXECD(oflags)) { pmap_idcache_wbinv_range(pmap, va, PAGE_SIZE); } else if (PV_BEEN_REFD(oflags)) { pmap_dcache_wb_range(pmap, va, PAGE_SIZE, TRUE, (oflags & PVF_WRITE) == 0); } } /* free/allocate a pv_entry for UNMANAGED pages if * this physical page is not/is already mapped. */ if (m && (m->oflags & VPO_UNMANAGED) && !m->md.pv_kva && TAILQ_EMPTY(&m->md.pv_list)) { pmap_free_pv_entry(pve); pve = NULL; } } else if (m && (!(m->oflags & VPO_UNMANAGED) || m->md.pv_kva || !TAILQ_EMPTY(&m->md.pv_list))) pve = pmap_get_pv_entry(); } else if (m && (!(m->oflags & VPO_UNMANAGED) || m->md.pv_kva || !TAILQ_EMPTY(&m->md.pv_list))) pve = pmap_get_pv_entry(); if (m) { if ((m->oflags & VPO_UNMANAGED)) { if (!TAILQ_EMPTY(&m->md.pv_list) || m->md.pv_kva) { KASSERT(pve != NULL, ("No pv")); nflags |= PVF_UNMAN; pmap_enter_pv(m, pve, pmap, va, nflags); } else m->md.pv_kva = va; } else { KASSERT(va < kmi.clean_sva || va >= kmi.clean_eva, ("pmap_enter: managed mapping within the clean submap")); KASSERT(pve != NULL, ("No pv")); pmap_enter_pv(m, pve, pmap, va, nflags); } } } /* * Make sure userland mappings get the right permissions */ if (pmap != pmap_kernel() && va != vector_page) { npte |= L2_S_PROT_U; } /* * Keep the stats up to date */ if (opte == 0) { l2b->l2b_occupancy++; pmap->pm_stats.resident_count++; } /* * If this is just a wiring change, the two PTEs will be * identical, so there's no need to update the page table. */ if (npte != opte) { boolean_t is_cached = pmap_is_current(pmap); *ptep = npte; if (is_cached) { /* * We only need to frob the cache/tlb if this pmap * is current */ PTE_SYNC(ptep); if (L1_IDX(va) != L1_IDX(vector_page) && l2pte_valid(npte)) { /* * This mapping is likely to be accessed as * soon as we return to userland. 
Fix up the * L1 entry to avoid taking another * page/domain fault. */ pd_entry_t *pl1pd, l1pd; pl1pd = &pmap->pm_l1->l1_kva[L1_IDX(va)]; l1pd = l2b->l2b_phys | L1_C_DOM(pmap->pm_domain) | L1_C_PROTO; if (*pl1pd != l1pd) { *pl1pd = l1pd; PTE_SYNC(pl1pd); } } } if (PV_BEEN_EXECD(oflags)) pmap_tlb_flushID_SE(pmap, va); else if (PV_BEEN_REFD(oflags)) pmap_tlb_flushD_SE(pmap, va); if (m) pmap_fix_cache(m, pmap, va); } return (KERN_SUCCESS); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ void pmap_enter_object(pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { vm_page_t m; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); m = m_start; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { pmap_enter_locked(pmap, start + ptoa(diff), m, prot & (VM_PROT_READ | VM_PROT_EXECUTE), PMAP_ENTER_NOSLEEP); m = TAILQ_NEXT(m, listq); } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * this code makes some *MAJOR* assumptions: * 1. Current pmap & pmap exists. * 2. Not wired. * 3. Read access. * 4. No page table pages. * but is *MUCH* faster than pmap_enter... */ void pmap_enter_quick(pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); pmap_enter_locked(pmap, va, m, prot & (VM_PROT_READ | VM_PROT_EXECUTE), PMAP_ENTER_NOSLEEP); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range * must have the wired attribute set. In contrast, invalid mappings * cannot have the wired attribute set, so they are ignored. * * XXX Wired mappings of unmanaged pages cannot be counted by this pmap * implementation. */ void pmap_unwire(pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { struct l2_bucket *l2b; pt_entry_t *ptep, pte; pv_entry_t pv; vm_offset_t next_bucket; vm_page_t m; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (sva < eva) { next_bucket = L2_NEXT_BUCKET(sva); if (next_bucket > eva) next_bucket = eva; l2b = pmap_get_l2_bucket(pmap, sva); if (l2b == NULL) { sva = next_bucket; continue; } for (ptep = &l2b->l2b_kva[l2pte_index(sva)]; sva < next_bucket; sva += PAGE_SIZE, ptep++) { if ((pte = *ptep) == 0 || (m = PHYS_TO_VM_PAGE(l2pte_pa(pte))) == NULL || (m->oflags & VPO_UNMANAGED) != 0) continue; pv = pmap_find_pv(m, pmap, sva); if ((pv->pv_flags & PVF_WIRED) == 0) panic("pmap_unwire: pv %p isn't wired", pv); pv->pv_flags &= ~PVF_WIRED; pmap->pm_stats.wired_count--; } } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. 
*/ void pmap_copy(pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { } /* * Routine: pmap_extract * Function: * Extract the physical page address associated * with the given map/virtual_address pair. */ vm_paddr_t pmap_extract(pmap_t pmap, vm_offset_t va) { vm_paddr_t pa; PMAP_LOCK(pmap); pa = pmap_extract_locked(pmap, va); PMAP_UNLOCK(pmap); return (pa); } static vm_paddr_t pmap_extract_locked(pmap_t pmap, vm_offset_t va) { struct l2_dtable *l2; pd_entry_t l1pd; pt_entry_t *ptep, pte; vm_paddr_t pa; u_int l1idx; if (pmap != kernel_pmap) PMAP_ASSERT_LOCKED(pmap); l1idx = L1_IDX(va); l1pd = pmap->pm_l1->l1_kva[l1idx]; if (l1pte_section_p(l1pd)) { /* * These should only happen for the kernel pmap. */ KASSERT(pmap == kernel_pmap, ("unexpected section")); /* XXX: what to do about the bits > 32 ? */ if (l1pd & L1_S_SUPERSEC) pa = (l1pd & L1_SUP_FRAME) | (va & L1_SUP_OFFSET); else pa = (l1pd & L1_S_FRAME) | (va & L1_S_OFFSET); } else { /* * Note that we can't rely on the validity of the L1 * descriptor as an indication that a mapping exists. * We have to look it up in the L2 dtable. */ l2 = pmap->pm_l2[L2_IDX(l1idx)]; if (l2 == NULL || (ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva) == NULL) return (0); pte = ptep[l2pte_index(va)]; if (pte == 0) return (0); switch (pte & L2_TYPE_MASK) { case L2_TYPE_L: pa = (pte & L2_L_FRAME) | (va & L2_L_OFFSET); break; default: pa = (pte & L2_S_FRAME) | (va & L2_S_OFFSET); break; } } return (pa); } /* * Atomically extract and hold the physical page with the given * pmap and virtual address pair if that mapping permits the given * protection. * */ vm_page_t pmap_extract_and_hold(pmap_t pmap, vm_offset_t va, vm_prot_t prot) { struct l2_dtable *l2; pd_entry_t l1pd; pt_entry_t *ptep, pte; vm_paddr_t pa, paddr; vm_page_t m = NULL; u_int l1idx; l1idx = L1_IDX(va); paddr = 0; PMAP_LOCK(pmap); retry: l1pd = pmap->pm_l1->l1_kva[l1idx]; if (l1pte_section_p(l1pd)) { /* * These should only happen for pmap_kernel() */ KASSERT(pmap == pmap_kernel(), ("huh")); /* XXX: what to do about the bits > 32 ? */ if (l1pd & L1_S_SUPERSEC) pa = (l1pd & L1_SUP_FRAME) | (va & L1_SUP_OFFSET); else pa = (l1pd & L1_S_FRAME) | (va & L1_S_OFFSET); if (vm_page_pa_tryrelock(pmap, pa & PG_FRAME, &paddr)) goto retry; if (l1pd & L1_S_PROT_W || (prot & VM_PROT_WRITE) == 0) { m = PHYS_TO_VM_PAGE(pa); vm_page_hold(m); } } else { /* * Note that we can't rely on the validity of the L1 * descriptor as an indication that a mapping exists. * We have to look it up in the L2 dtable. 
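 *
 * As a sketch (assuming the two-level layout used throughout this
 * file), the lookup amounts to:
 *
 *	l1idx = L1_IDX(va);		/* which L1 slot */
 *	l2 = pmap->pm_l2[L2_IDX(l1idx)];
 *	ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva;
 *	pte = ptep[l2pte_index(va)];
 *	pa = (pte & L2_S_FRAME) | (va & L2_S_OFFSET);	/* small page */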
*/ l2 = pmap->pm_l2[L2_IDX(l1idx)]; if (l2 == NULL || (ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva) == NULL) { PMAP_UNLOCK(pmap); return (NULL); } ptep = &ptep[l2pte_index(va)]; pte = *ptep; if (pte == 0) { PMAP_UNLOCK(pmap); return (NULL); } if (pte & L2_S_PROT_W || (prot & VM_PROT_WRITE) == 0) { switch (pte & L2_TYPE_MASK) { case L2_TYPE_L: pa = (pte & L2_L_FRAME) | (va & L2_L_OFFSET); break; default: pa = (pte & L2_S_FRAME) | (va & L2_S_OFFSET); break; } if (vm_page_pa_tryrelock(pmap, pa & PG_FRAME, &paddr)) goto retry; m = PHYS_TO_VM_PAGE(pa); vm_page_hold(m); } } PMAP_UNLOCK(pmap); PA_UNLOCK_COND(paddr); return (m); } +vm_paddr_t +pmap_dump_kextract(vm_offset_t va, pt2_entry_t *pte2p) +{ + struct l2_dtable *l2; + pd_entry_t l1pd; + pt_entry_t *ptep, pte; + vm_paddr_t pa; + u_int l1idx; + + l1idx = L1_IDX(va); + l1pd = kernel_pmap->pm_l1->l1_kva[l1idx]; + if (l1pte_section_p(l1pd)) { + if (l1pd & L1_S_SUPERSEC) + pa = (l1pd & L1_SUP_FRAME) | (va & L1_SUP_OFFSET); + else + pa = (l1pd & L1_S_FRAME) | (va & L1_S_OFFSET); + pte = L2_S_PROTO | pa | + L2_S_PROT(PTE_KERNEL, VM_PROT_READ | VM_PROT_WRITE); + } else { + l2 = kernel_pmap->pm_l2[L2_IDX(l1idx)]; + if (l2 == NULL || + (ptep = l2->l2_bucket[L2_BUCKET(l1idx)].l2b_kva) == NULL) { + pte = 0; + pa = 0; + goto out; + } + pte = ptep[l2pte_index(va)]; + if (pte == 0) { + pa = 0; + goto out; + } + switch (pte & L2_TYPE_MASK) { + case L2_TYPE_L: + pa = (pte & L2_L_FRAME) | (va & L2_L_OFFSET); + break; + default: + pa = (pte & L2_S_FRAME) | (va & L2_S_OFFSET); + break; + } + } +out: + if (pte2p != NULL) + *pte2p = pte; + return (pa); +} + /* * Initialize a preallocated and zeroed pmap structure, * such as one in a vmspace structure. */ int pmap_pinit(pmap_t pmap) { PDEBUG(1, printf("pmap_pinit: pmap = %08x\n", (uint32_t) pmap)); pmap_alloc_l1(pmap); bzero(pmap->pm_l2, sizeof(pmap->pm_l2)); CPU_ZERO(&pmap->pm_active); TAILQ_INIT(&pmap->pm_pvlist); bzero(&pmap->pm_stats, sizeof pmap->pm_stats); pmap->pm_stats.resident_count = 1; if (vector_page < KERNBASE) { pmap_enter(pmap, vector_page, PHYS_TO_VM_PAGE(systempage.pv_pa), VM_PROT_READ, PMAP_ENTER_WIRED | VM_PROT_READ, 0); } return (1); } /*************************************************** * page management routines. ***************************************************/ static void pmap_free_pv_entry(pv_entry_t pv) { pv_entry_count--; uma_zfree(pvzone, pv); } /* * get a new pv_entry, allocating a block from the system * when needed. * the memory allocation is performed bypassing the malloc code * because of the possibility of allocations at interrupt time. */ static pv_entry_t pmap_get_pv_entry(void) { pv_entry_t ret_value; pv_entry_count++; if (pv_entry_count > pv_entry_high_water) pagedaemon_wakeup(); ret_value = uma_zalloc(pvzone, M_NOWAIT); return ret_value; } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly * rounded to the page size. */ #define PMAP_REMOVE_CLEAN_LIST_SIZE 3 void pmap_remove(pmap_t pm, vm_offset_t sva, vm_offset_t eva) { struct l2_bucket *l2b; vm_offset_t next_bucket; pt_entry_t *ptep; u_int total; u_int mappings, is_exec, is_refd; int flushall = 0; /* * we lock in the pmap => pv_head direction */ rw_wlock(&pvh_global_lock); PMAP_LOCK(pm); total = 0; while (sva < eva) { /* * Do one L2 bucket's worth at a time. 
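 *
 * This is the bucket-bounded walk used throughout this file; in
 * sketch form:
 *
 *	next_bucket = L2_NEXT_BUCKET(sva);
 *	if (next_bucket > eva)
 *		next_bucket = eva;
 *
 * and then [sva, next_bucket) is visited one PAGE_SIZE step at a time.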
*/ next_bucket = L2_NEXT_BUCKET(sva); if (next_bucket > eva) next_bucket = eva; l2b = pmap_get_l2_bucket(pm, sva); if (l2b == NULL) { sva = next_bucket; continue; } ptep = &l2b->l2b_kva[l2pte_index(sva)]; mappings = 0; while (sva < next_bucket) { struct vm_page *pg; pt_entry_t pte; vm_paddr_t pa; pte = *ptep; if (pte == 0) { /* * Nothing here, move along */ sva += PAGE_SIZE; ptep++; continue; } pm->pm_stats.resident_count--; pa = l2pte_pa(pte); is_exec = 0; is_refd = 1; /* * Update flags. In a number of circumstances, * we could cluster a lot of these and do a * number of sequential pages in one go. */ if ((pg = PHYS_TO_VM_PAGE(pa)) != NULL) { struct pv_entry *pve; pve = pmap_remove_pv(pg, pm, sva); if (pve) { is_exec = PV_BEEN_EXECD(pve->pv_flags); is_refd = PV_BEEN_REFD(pve->pv_flags); pmap_free_pv_entry(pve); } } if (l2pte_valid(pte) && pmap_is_current(pm)) { if (total < PMAP_REMOVE_CLEAN_LIST_SIZE) { total++; if (is_exec) { cpu_idcache_wbinv_range(sva, PAGE_SIZE); cpu_l2cache_wbinv_range(sva, PAGE_SIZE); cpu_tlb_flushID_SE(sva); } else if (is_refd) { cpu_dcache_wbinv_range(sva, PAGE_SIZE); cpu_l2cache_wbinv_range(sva, PAGE_SIZE); cpu_tlb_flushD_SE(sva); } } else if (total == PMAP_REMOVE_CLEAN_LIST_SIZE) { /* flushall will also only get set for * for a current pmap */ cpu_idcache_wbinv_all(); cpu_l2cache_wbinv_all(); flushall = 1; total++; } } *ptep = 0; PTE_SYNC(ptep); sva += PAGE_SIZE; ptep++; mappings++; } pmap_free_l2_bucket(pm, l2b, mappings); } rw_wunlock(&pvh_global_lock); if (flushall) cpu_tlb_flushID(); PMAP_UNLOCK(pm); } /* * pmap_zero_page() * * Zero a given physical page by mapping it at a page hook point. * In doing the zero page op, the page we zero is mapped cachable, as with * StrongARM accesses to non-cached pages are non-burst making writing * _any_ bulk data very slow. */ #if ARM_MMU_GENERIC != 0 || defined(CPU_XSCALE_CORE3) void pmap_zero_page_generic(vm_paddr_t phys, int off, int size) { if (_arm_bzero && size >= _min_bzero_size && _arm_bzero((void *)(phys + off), size, IS_PHYSICAL) == 0) return; mtx_lock(&cmtx); /* * Hook in the page, zero it, invalidate the TLB as needed. * * Note the temporary zero-page mapping must be a non-cached page in * order to work without corruption when write-allocate is enabled. */ *cdst_pte = L2_S_PROTO | phys | L2_S_PROT(PTE_KERNEL, VM_PROT_WRITE); PTE_SYNC(cdst_pte); cpu_tlb_flushD_SE(cdstp); cpu_cpwait(); if (off || size != PAGE_SIZE) bzero((void *)(cdstp + off), size); else bzero_page(cdstp); mtx_unlock(&cmtx); } #endif /* ARM_MMU_GENERIC != 0 */ #if ARM_MMU_XSCALE == 1 void pmap_zero_page_xscale(vm_paddr_t phys, int off, int size) { if (_arm_bzero && size >= _min_bzero_size && _arm_bzero((void *)(phys + off), size, IS_PHYSICAL) == 0) return; mtx_lock(&cmtx); /* * Hook in the page, zero it, and purge the cache for that * zeroed page. Invalidate the TLB as needed. */ *cdst_pte = L2_S_PROTO | phys | L2_S_PROT(PTE_KERNEL, VM_PROT_WRITE) | L2_C | L2_XSCALE_T_TEX(TEX_XSCALE_X); /* mini-data */ PTE_SYNC(cdst_pte); cpu_tlb_flushD_SE(cdstp); cpu_cpwait(); if (off || size != PAGE_SIZE) bzero((void *)(cdstp + off), size); else bzero_page(cdstp); mtx_unlock(&cmtx); xscale_cache_clean_minidata(); } /* * Change the PTEs for the specified kernel mappings such that they * will use the mini data cache instead of the main data cache. 
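 *
 * In sketch form, the per-PTE rewrite below is simply:
 *
 *	if (!l2pte_minidata(pte))
 *		*ptep = pte & ~L2_B;	/* clear the bufferable bit */
 *
 * (On XScale, with the TEX X bit already set on these mappings, the
 * X=1/C=1/B=0 attribute combination selects the mini-data cache; this
 * reading of the encoding is an assumption based on the XScale manual.)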
*/ void pmap_use_minicache(vm_offset_t va, vm_size_t size) { struct l2_bucket *l2b; pt_entry_t *ptep, *sptep, pte; vm_offset_t next_bucket, eva; #if (ARM_NMMUS > 1) || defined(CPU_XSCALE_CORE3) if (xscale_use_minidata == 0) return; #endif eva = va + size; while (va < eva) { next_bucket = L2_NEXT_BUCKET(va); if (next_bucket > eva) next_bucket = eva; l2b = pmap_get_l2_bucket(pmap_kernel(), va); sptep = ptep = &l2b->l2b_kva[l2pte_index(va)]; while (va < next_bucket) { pte = *ptep; if (!l2pte_minidata(pte)) { cpu_dcache_wbinv_range(va, PAGE_SIZE); cpu_tlb_flushD_SE(va); *ptep = pte & ~L2_B; } ptep++; va += PAGE_SIZE; } PTE_SYNC_RANGE(sptep, (u_int)(ptep - sptep)); } cpu_cpwait(); } #endif /* ARM_MMU_XSCALE == 1 */ /* * pmap_zero_page zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. */ void pmap_zero_page(vm_page_t m) { pmap_zero_page_func(VM_PAGE_TO_PHYS(m), 0, PAGE_SIZE); } /* * pmap_zero_page_area zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. * * off and size may not cover an area beyond a single hardware page. */ void pmap_zero_page_area(vm_page_t m, int off, int size) { pmap_zero_page_func(VM_PAGE_TO_PHYS(m), off, size); } /* * pmap_zero_page_idle zeros the specified hardware page by mapping * the page into KVM and using bzero to clear its contents. This * is intended to be called from the vm_pagezero process only and * outside of Giant. */ void pmap_zero_page_idle(vm_page_t m) { pmap_zero_page(m); } #if 0 /* * pmap_clean_page() * * This is a local function used to work out the best strategy to clean * a single page referenced by its entry in the PV table. It should be used by * pmap_copy_page, pmap_zero page and maybe some others later on. * * Its policy is effectively: * o If there are no mappings, we don't bother doing anything with the cache. * o If there is one mapping, we clean just that page. * o If there are multiple mappings, we clean the entire cache. * * So that some functions can be further optimised, it returns 0 if it didn't * clean the entire cache, or 1 if it did. * * XXX One bug in this routine is that if the pv_entry has a single page * mapped at 0x00000000 a whole cache clean will be performed rather than * just the 1 page. Since this should not occur in everyday use and if it does * it will just result in not the most efficient clean for the page. * * We don't yet use this function but may want to. */ static int pmap_clean_page(struct pv_entry *pv, boolean_t is_src) { pmap_t pm, pm_to_clean = NULL; struct pv_entry *npv; u_int cache_needs_cleaning = 0; u_int flags = 0; vm_offset_t page_to_clean = 0; if (pv == NULL) { /* nothing mapped in so nothing to flush */ return (0); } /* * Since we flush the cache each time we change to a different * user vmspace, we only need to flush the page if it is in the * current pmap. */ if (curthread) pm = vmspace_pmap(curproc->p_vmspace); else pm = pmap_kernel(); for (npv = pv; npv; npv = TAILQ_NEXT(npv, pv_list)) { if (npv->pv_pmap == pmap_kernel() || npv->pv_pmap == pm) { flags |= npv->pv_flags; /* * The page is mapped non-cacheable in * this map. No need to flush the cache. 
*/ if (npv->pv_flags & PVF_NC) { #ifdef DIAGNOSTIC if (cache_needs_cleaning) panic("pmap_clean_page: " "cache inconsistency"); #endif break; } else if (is_src && (npv->pv_flags & PVF_WRITE) == 0) continue; if (cache_needs_cleaning) { page_to_clean = 0; break; } else { page_to_clean = npv->pv_va; pm_to_clean = npv->pv_pmap; } cache_needs_cleaning = 1; } } if (page_to_clean) { if (PV_BEEN_EXECD(flags)) pmap_idcache_wbinv_range(pm_to_clean, page_to_clean, PAGE_SIZE); else pmap_dcache_wb_range(pm_to_clean, page_to_clean, PAGE_SIZE, !is_src, (flags & PVF_WRITE) == 0); } else if (cache_needs_cleaning) { if (PV_BEEN_EXECD(flags)) pmap_idcache_wbinv_all(pm); else pmap_dcache_wbinv_all(pm); return (1); } return (0); } #endif /* * pmap_copy_page copies the specified (machine independent) * page by mapping the page into virtual memory and using * bcopy to copy the page, one machine dependent page at a * time. */ /* * pmap_copy_page() * * Copy one physical page into another, by mapping the pages into * hook points. The same comment regarding cachability as in * pmap_zero_page also applies here. */ #if ARM_MMU_GENERIC != 0 || defined (CPU_XSCALE_CORE3) void pmap_copy_page_generic(vm_paddr_t src, vm_paddr_t dst) { #if 0 struct vm_page *src_pg = PHYS_TO_VM_PAGE(src); #endif /* * Clean the source page. Hold the source page's lock for * the duration of the copy so that no other mappings can * be created while we have a potentially aliased mapping. */ #if 0 /* * XXX: Not needed while we call cpu_dcache_wbinv_all() in * pmap_copy_page(). */ (void) pmap_clean_page(TAILQ_FIRST(&src_pg->md.pv_list), TRUE); #endif /* * Map the pages into the page hook points, copy them, and purge * the cache for the appropriate page. Invalidate the TLB * as required. */ mtx_lock(&cmtx); *csrc_pte = L2_S_PROTO | src | L2_S_PROT(PTE_KERNEL, VM_PROT_READ) | pte_l2_s_cache_mode; PTE_SYNC(csrc_pte); *cdst_pte = L2_S_PROTO | dst | L2_S_PROT(PTE_KERNEL, VM_PROT_WRITE) | pte_l2_s_cache_mode; PTE_SYNC(cdst_pte); cpu_tlb_flushD_SE(csrcp); cpu_tlb_flushD_SE(cdstp); cpu_cpwait(); bcopy_page(csrcp, cdstp); mtx_unlock(&cmtx); cpu_dcache_inv_range(csrcp, PAGE_SIZE); cpu_dcache_wbinv_range(cdstp, PAGE_SIZE); cpu_l2cache_inv_range(csrcp, PAGE_SIZE); cpu_l2cache_wbinv_range(cdstp, PAGE_SIZE); } void pmap_copy_page_offs_generic(vm_paddr_t a_phys, vm_offset_t a_offs, vm_paddr_t b_phys, vm_offset_t b_offs, int cnt) { mtx_lock(&cmtx); *csrc_pte = L2_S_PROTO | a_phys | L2_S_PROT(PTE_KERNEL, VM_PROT_READ) | pte_l2_s_cache_mode; PTE_SYNC(csrc_pte); *cdst_pte = L2_S_PROTO | b_phys | L2_S_PROT(PTE_KERNEL, VM_PROT_WRITE) | pte_l2_s_cache_mode; PTE_SYNC(cdst_pte); cpu_tlb_flushD_SE(csrcp); cpu_tlb_flushD_SE(cdstp); cpu_cpwait(); bcopy((char *)csrcp + a_offs, (char *)cdstp + b_offs, cnt); mtx_unlock(&cmtx); cpu_dcache_inv_range(csrcp + a_offs, cnt); cpu_dcache_wbinv_range(cdstp + b_offs, cnt); cpu_l2cache_inv_range(csrcp + a_offs, cnt); cpu_l2cache_wbinv_range(cdstp + b_offs, cnt); } #endif /* ARM_MMU_GENERIC != 0 */ #if ARM_MMU_XSCALE == 1 void pmap_copy_page_xscale(vm_paddr_t src, vm_paddr_t dst) { #if 0 /* XXX: Only needed for pmap_clean_page(), which is commented out. */ struct vm_page *src_pg = PHYS_TO_VM_PAGE(src); #endif /* * Clean the source page. Hold the source page's lock for * the duration of the copy so that no other mappings can * be created while we have a potentially aliased mapping. */ #if 0 /* * XXX: Not needed while we call cpu_dcache_wbinv_all() in * pmap_copy_page(). 
*/ (void) pmap_clean_page(TAILQ_FIRST(&src_pg->md.pv_list), TRUE); #endif /* * Map the pages into the page hook points, copy them, and purge * the cache for the appropriate page. Invalidate the TLB * as required. */ mtx_lock(&cmtx); *csrc_pte = L2_S_PROTO | src | L2_S_PROT(PTE_KERNEL, VM_PROT_READ) | L2_C | L2_XSCALE_T_TEX(TEX_XSCALE_X); /* mini-data */ PTE_SYNC(csrc_pte); *cdst_pte = L2_S_PROTO | dst | L2_S_PROT(PTE_KERNEL, VM_PROT_WRITE) | L2_C | L2_XSCALE_T_TEX(TEX_XSCALE_X); /* mini-data */ PTE_SYNC(cdst_pte); cpu_tlb_flushD_SE(csrcp); cpu_tlb_flushD_SE(cdstp); cpu_cpwait(); bcopy_page(csrcp, cdstp); mtx_unlock(&cmtx); xscale_cache_clean_minidata(); } void pmap_copy_page_offs_xscale(vm_paddr_t a_phys, vm_offset_t a_offs, vm_paddr_t b_phys, vm_offset_t b_offs, int cnt) { mtx_lock(&cmtx); *csrc_pte = L2_S_PROTO | a_phys | L2_S_PROT(PTE_KERNEL, VM_PROT_READ) | L2_C | L2_XSCALE_T_TEX(TEX_XSCALE_X); PTE_SYNC(csrc_pte); *cdst_pte = L2_S_PROTO | b_phys | L2_S_PROT(PTE_KERNEL, VM_PROT_WRITE) | L2_C | L2_XSCALE_T_TEX(TEX_XSCALE_X); PTE_SYNC(cdst_pte); cpu_tlb_flushD_SE(csrcp); cpu_tlb_flushD_SE(cdstp); cpu_cpwait(); bcopy((char *)csrcp + a_offs, (char *)cdstp + b_offs, cnt); mtx_unlock(&cmtx); xscale_cache_clean_minidata(); } #endif /* ARM_MMU_XSCALE == 1 */ void pmap_copy_page(vm_page_t src, vm_page_t dst) { cpu_dcache_wbinv_all(); cpu_l2cache_wbinv_all(); if (_arm_memcpy && PAGE_SIZE >= _min_memcpy_size && _arm_memcpy((void *)VM_PAGE_TO_PHYS(dst), (void *)VM_PAGE_TO_PHYS(src), PAGE_SIZE, IS_PHYSICAL) == 0) return; pmap_copy_page_func(VM_PAGE_TO_PHYS(src), VM_PAGE_TO_PHYS(dst)); } /* * We have code to do unmapped I/O. However, it isn't quite right and * causes un-page-aligned I/O to devices to fail (most notably newfs * or fsck). We give up a little performance to not allow unmapped I/O * to gain stability. */ int unmapped_buf_allowed = 0; void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) { vm_page_t a_pg, b_pg; vm_offset_t a_pg_offset, b_pg_offset; int cnt; cpu_dcache_wbinv_all(); cpu_l2cache_wbinv_all(); while (xfersize > 0) { a_pg = ma[a_offset >> PAGE_SHIFT]; a_pg_offset = a_offset & PAGE_MASK; cnt = min(xfersize, PAGE_SIZE - a_pg_offset); b_pg = mb[b_offset >> PAGE_SHIFT]; b_pg_offset = b_offset & PAGE_MASK; cnt = min(cnt, PAGE_SIZE - b_pg_offset); pmap_copy_page_offs_func(VM_PAGE_TO_PHYS(a_pg), a_pg_offset, VM_PAGE_TO_PHYS(b_pg), b_pg_offset, cnt); xfersize -= cnt; a_offset += cnt; b_offset += cnt; } } vm_offset_t pmap_quick_enter_page(vm_page_t m) { /* * Don't bother with a PCPU pageframe, since we don't support * SMP for anything pre-armv7. Use pmap_kenter() to ensure * caching is handled correctly for multiple mappings of the * same physical page. */ mtx_assert(&qmap_mtx, MA_NOTOWNED); mtx_lock(&qmap_mtx); pmap_kenter(qmap_addr, VM_PAGE_TO_PHYS(m)); return (qmap_addr); } void pmap_quick_remove_page(vm_offset_t addr) { KASSERT(addr == qmap_addr, ("pmap_quick_remove_page: invalid address")); mtx_assert(&qmap_mtx, MA_OWNED); pmap_kremove(addr); mtx_unlock(&qmap_mtx); } /* * this routine returns true if a physical page resides * in the given pmap. 
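 *
 * Note that only the first 16 pv entries are examined (hence "quick"),
 * so the result is a hint rather than a definitive answer.  A caller
 * sketch (hypothetical):
 *
 *	if (!pmap_page_exists_quick(pmap, m))
 *		... m is (probably) not mapped by pmap ...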
*/ boolean_t pmap_page_exists_quick(pmap_t pmap, vm_page_t m) { pv_entry_t pv; int loops = 0; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_page_exists_quick: page %p is not managed", m)); rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) { if (pv->pv_pmap == pmap) { rv = TRUE; break; } loops++; if (loops >= 16) break; } rw_wunlock(&pvh_global_lock); return (rv); } /* * pmap_page_wired_mappings: * * Return the number of managed mappings to the given physical page * that are wired. */ int pmap_page_wired_mappings(vm_page_t m) { pv_entry_t pv; int count; count = 0; if ((m->oflags & VPO_UNMANAGED) != 0) return (count); rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) if ((pv->pv_flags & PVF_WIRED) != 0) count++; rw_wunlock(&pvh_global_lock); return (count); } /* * This function is advisory. */ void pmap_advise(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, int advice) { } /* * pmap_ts_referenced: * * Return the count of reference bits for a page, clearing all of them. */ int pmap_ts_referenced(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_ts_referenced: page %p is not managed", m)); return (pmap_clearbit(m, PVF_REF)); } boolean_t pmap_is_modified(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_modified: page %p is not managed", m)); if (m->md.pvh_attrs & PVF_MOD) return (TRUE); return(FALSE); } /* * Clear the modify bits on the specified physical page. */ void pmap_clear_modify(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_clear_modify: page %p is not managed", m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("pmap_clear_modify: page %p is exclusive busied", m)); /* * If the page is not PGA_WRITEABLE, then no mappings can be modified. * If the object containing the page is locked and the page is not * exclusive busied, then PGA_WRITEABLE cannot be concurrently set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; if (m->md.pvh_attrs & PVF_MOD) pmap_clearbit(m, PVF_MOD); } /* * pmap_is_referenced: * * Return whether or not the specified physical page was referenced * in any physical maps. */ boolean_t pmap_is_referenced(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_is_referenced: page %p is not managed", m)); return ((m->md.pvh_attrs & PVF_REF) != 0); } /* * Clear the write and modified bits in each of the given page's mappings. */ void pmap_remove_write(vm_page_t m) { KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("pmap_remove_write: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating. 
*/ VM_OBJECT_ASSERT_WLOCKED(m->object); if (vm_page_xbusied(m) || (m->aflags & PGA_WRITEABLE) != 0) pmap_clearbit(m, PVF_WRITE); } /* * perform the pmap work for mincore */ int pmap_mincore(pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { struct l2_bucket *l2b; pt_entry_t *ptep, pte; vm_paddr_t pa; vm_page_t m; int val; boolean_t managed; PMAP_LOCK(pmap); retry: l2b = pmap_get_l2_bucket(pmap, addr); if (l2b == NULL) { val = 0; goto out; } ptep = &l2b->l2b_kva[l2pte_index(addr)]; pte = *ptep; if (!l2pte_valid(pte)) { val = 0; goto out; } val = MINCORE_INCORE; if (pte & L2_S_PROT_W) val |= MINCORE_MODIFIED | MINCORE_MODIFIED_OTHER; managed = false; pa = l2pte_pa(pte); m = PHYS_TO_VM_PAGE(pa); if (m != NULL && !(m->oflags & VPO_UNMANAGED)) managed = true; if (managed) { /* * The ARM pmap tries to maintain a per-mapping * reference bit. The trouble is that it's kept in * the PV entry, not the PTE, so it's costly to access * here. You would need to acquire the pvh global * lock, call pmap_find_pv(), and introduce a custom * version of vm_page_pa_tryrelock() that releases and * reacquires the pvh global lock. In the end, I * doubt it's worthwhile. This may falsely report * the given address as referenced. */ if ((m->md.pvh_attrs & PVF_REF) != 0) val |= MINCORE_REFERENCED | MINCORE_REFERENCED_OTHER; } if ((val & (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER)) != (MINCORE_MODIFIED_OTHER | MINCORE_REFERENCED_OTHER) && managed) { /* Ensure that "PHYS_TO_VM_PAGE(pa)->object" doesn't change. */ if (vm_page_pa_tryrelock(pmap, pa, locked_pa)) goto retry; } else out: PA_UNLOCK_COND(*locked_pa); PMAP_UNLOCK(pmap); return (val); } void pmap_sync_icache(pmap_t pm, vm_offset_t va, vm_size_t sz) { } /* * Increase the starting virtual address of the given mapping if a * different alignment might result in more superpage mappings. */ void pmap_align_superpage(vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, vm_size_t size) { } #define BOOTSTRAP_DEBUG /* * pmap_map_section: * * Create a single section mapping. */ void pmap_map_section(vm_offset_t l1pt, vm_offset_t va, vm_offset_t pa, int prot, int cache) { pd_entry_t *pde = (pd_entry_t *) l1pt; pd_entry_t fl; KASSERT(((va | pa) & L1_S_OFFSET) == 0, ("ouin2")); switch (cache) { case PTE_NOCACHE: default: fl = 0; break; case PTE_CACHE: fl = pte_l1_s_cache_mode; break; case PTE_PAGETABLE: fl = pte_l1_s_cache_mode_pt; break; } pde[va >> L1_S_SHIFT] = L1_S_PROTO | pa | L1_S_PROT(PTE_KERNEL, prot) | fl | L1_S_DOM(PMAP_DOMAIN_KERNEL); PTE_SYNC(&pde[va >> L1_S_SHIFT]); } /* * pmap_link_l2pt: * * Link the L2 page table specified by l2pv.pv_pa into the L1 * page table at the slot for "va". */ void pmap_link_l2pt(vm_offset_t l1pt, vm_offset_t va, struct pv_addr *l2pv) { pd_entry_t *pde = (pd_entry_t *) l1pt, proto; u_int slot = va >> L1_S_SHIFT; proto = L1_S_DOM(PMAP_DOMAIN_KERNEL) | L1_C_PROTO; #ifdef VERBOSE_INIT_ARM printf("pmap_link_l2pt: pa=0x%x va=0x%x\n", l2pv->pv_pa, l2pv->pv_va); #endif pde[slot + 0] = proto | (l2pv->pv_pa + 0x000); PTE_SYNC(&pde[slot]); SLIST_INSERT_HEAD(&kernel_pt_list, l2pv, pv_list); } /* * pmap_map_entry * * Create a single page mapping. 
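 *
 * A hypothetical bootstrap usage (the addresses are illustrative only):
 *
 *	pmap_map_entry(l1pt, va, pa, VM_PROT_READ | VM_PROT_WRITE,
 *	    PTE_CACHE);
 *
 * installs a single small-page PTE in the L2 table already linked at
 * the L1 slot for va, and panics if no such L2 table exists.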
*/ void pmap_map_entry(vm_offset_t l1pt, vm_offset_t va, vm_offset_t pa, int prot, int cache) { pd_entry_t *pde = (pd_entry_t *) l1pt; pt_entry_t fl; pt_entry_t *pte; KASSERT(((va | pa) & PAGE_MASK) == 0, ("ouin")); switch (cache) { case PTE_NOCACHE: default: fl = 0; break; case PTE_CACHE: fl = pte_l2_s_cache_mode; break; case PTE_PAGETABLE: fl = pte_l2_s_cache_mode_pt; break; } if ((pde[va >> L1_S_SHIFT] & L1_TYPE_MASK) != L1_TYPE_C) panic("pmap_map_entry: no L2 table for VA 0x%08x", va); pte = (pt_entry_t *) kernel_pt_lookup(pde[L1_IDX(va)] & L1_C_ADDR_MASK); if (pte == NULL) panic("pmap_map_entry: can't find L2 table for VA 0x%08x", va); pte[l2pte_index(va)] = L2_S_PROTO | pa | L2_S_PROT(PTE_KERNEL, prot) | fl; PTE_SYNC(&pte[l2pte_index(va)]); } /* * pmap_map_chunk: * * Map a chunk of memory using the most efficient mappings * possible (section. large page, small page) into the * provided L1 and L2 tables at the specified virtual address. */ vm_size_t pmap_map_chunk(vm_offset_t l1pt, vm_offset_t va, vm_offset_t pa, vm_size_t size, int prot, int cache) { pd_entry_t *pde = (pd_entry_t *) l1pt; pt_entry_t *pte, f1, f2s, f2l; vm_size_t resid; int i; resid = (size + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1); if (l1pt == 0) panic("pmap_map_chunk: no L1 table provided"); #ifdef VERBOSE_INIT_ARM printf("pmap_map_chunk: pa=0x%x va=0x%x size=0x%x resid=0x%x " "prot=0x%x cache=%d\n", pa, va, size, resid, prot, cache); #endif switch (cache) { case PTE_NOCACHE: default: f1 = 0; f2l = 0; f2s = 0; break; case PTE_CACHE: f1 = pte_l1_s_cache_mode; f2l = pte_l2_l_cache_mode; f2s = pte_l2_s_cache_mode; break; case PTE_PAGETABLE: f1 = pte_l1_s_cache_mode_pt; f2l = pte_l2_l_cache_mode_pt; f2s = pte_l2_s_cache_mode_pt; break; } size = resid; while (resid > 0) { /* See if we can use a section mapping. */ if (L1_S_MAPPABLE_P(va, pa, resid)) { #ifdef VERBOSE_INIT_ARM printf("S"); #endif pde[va >> L1_S_SHIFT] = L1_S_PROTO | pa | L1_S_PROT(PTE_KERNEL, prot) | f1 | L1_S_DOM(PMAP_DOMAIN_KERNEL); PTE_SYNC(&pde[va >> L1_S_SHIFT]); va += L1_S_SIZE; pa += L1_S_SIZE; resid -= L1_S_SIZE; continue; } /* * Ok, we're going to use an L2 table. Make sure * one is actually in the corresponding L1 slot * for the current VA. */ if ((pde[va >> L1_S_SHIFT] & L1_TYPE_MASK) != L1_TYPE_C) panic("pmap_map_chunk: no L2 table for VA 0x%08x", va); pte = (pt_entry_t *) kernel_pt_lookup( pde[L1_IDX(va)] & L1_C_ADDR_MASK); if (pte == NULL) panic("pmap_map_chunk: can't find L2 table for VA" "0x%08x", va); /* See if we can use a L2 large page mapping. */ if (L2_L_MAPPABLE_P(va, pa, resid)) { #ifdef VERBOSE_INIT_ARM printf("L"); #endif for (i = 0; i < 16; i++) { pte[l2pte_index(va) + i] = L2_L_PROTO | pa | L2_L_PROT(PTE_KERNEL, prot) | f2l; PTE_SYNC(&pte[l2pte_index(va) + i]); } va += L2_L_SIZE; pa += L2_L_SIZE; resid -= L2_L_SIZE; continue; } /* Use a small page mapping. */ #ifdef VERBOSE_INIT_ARM printf("P"); #endif pte[l2pte_index(va)] = L2_S_PROTO | pa | L2_S_PROT(PTE_KERNEL, prot) | f2s; PTE_SYNC(&pte[l2pte_index(va)]); va += PAGE_SIZE; pa += PAGE_SIZE; resid -= PAGE_SIZE; } #ifdef VERBOSE_INIT_ARM printf("\n"); #endif return (size); } void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma) { /* * Remember the memattr in a field that gets used to set the appropriate * bits in the PTEs as mappings are established. */ m->md.pv_memattr = ma; /* * It appears that this function can only be called before any mappings * for the page are established on ARM. 
If this ever changes, this code
 * will need to walk the pv_list and make each of the existing mappings
 * uncacheable, being careful to sync caches and PTEs (and maybe
 * invalidate the TLB?) for any current mapping it modifies.
 */
	if (m->md.pv_kva != 0 || TAILQ_FIRST(&m->md.pv_list) != NULL)
		panic("Can't change memattr on page with existing mappings");
}
Index: projects/clang380-import/sys/arm/arm/trap-v6.c
===================================================================
--- projects/clang380-import/sys/arm/arm/trap-v6.c	(revision 294776)
+++ projects/clang380-import/sys/arm/arm/trap-v6.c	(revision 294777)
@@ -1,671 +1,671 @@
/*-
 * Copyright 2014 Olivier Houchard
 * Copyright 2014 Svatopluk Kraus
 * Copyright 2014 Michal Meloun
 * Copyright 2014 Andrew Turner
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#include "opt_ktrace.h"

#include
__FBSDID("$FreeBSD$");

#include
#include
#include
#include
#include
#include
#include
#include
#ifdef KTRACE
#include
#include
#endif

#include
#include
#include
#include
#include
#include

#include
#include
#include
#include
#include
#include
#include

#ifdef KDB
#include
#include
#endif

#ifdef KDTRACE_HOOKS
#include
#endif

extern char fusubailout[];
extern char cachebailout[];

#ifdef DEBUG
int last_fault_code;	/* For the benefit of pmap_fault_fixup() */
#endif

struct ksig {
	int sig;
	u_long code;
	vm_offset_t addr;
};

typedef int abort_func_t(struct trapframe *, u_int, u_int, u_int, u_int,
    struct thread *, struct ksig *);

static abort_func_t abort_fatal;
static abort_func_t abort_align;
static abort_func_t abort_icache;

struct abort {
	abort_func_t	*func;
	const char	*desc;
};

/*
 * How are the aborts handled?
 *
 * Undefined Code:
 *  - Always fatal, as we do not know what it means.
 * Imprecise External Abort:
 *  - Always fatal, but could be handled somehow in the future.
 *    For now, due to buggy PCIe hardware, it is ignored.
 * Precise External Abort:
 *  - Always fatal, but who knows in the future???
 * Debug Event:
 *  - Special handling.
 * External Translation Abort (L1 & L2):
 *  - Always fatal, as something is screwed up in page tables or hardware.
 * Domain Fault (L1 & L2):
 *  - Always fatal, as we do not play games with domains.
 * Alignment Fault:
 *  - Everything should be aligned in the kernel, with the exception of
 *    user-to-kernel and vice versa data copying, so if pcb_onfault is not
 *    set, it's fatal.  We generate a signal in case of an abort from user
 *    mode.
 * Instruction cache maintenance:
 *  - According to the manual, this is a translation fault during a cache
 *    maintenance operation.  So, it could be really complex in the SMP
 *    case and fuzzy too for cache operations working on virtual addresses.
 *    For now, we will consider this abort as fatal.  In fact, no cache
 *    maintenance operation should be called on unmapped virtual addresses.
 *    As cache maintenance operations (except DMB, DSB, and Flush Prefetch
 *    Buffer) are privileged, the abort is fatal for user mode as well, for
 *    now.  (This is a good place to note that cache maintenance on a
 *    virtual address fills the TLB.)
 * Access Bit (L1 & L2):
 *  - Fast hardware emulation for kernel and user mode.
 * Translation Fault (L1 & L2):
 *  - The standard fault mechanism is used, including vm_fault().
 * Permission Fault (L1 & L2):
 *  - Fast hardware emulation of modify bits; in other cases, the standard
 *    fault mechanism is used, including vm_fault().
 */

static const struct abort aborts[] = {
	{abort_fatal,	"Undefined Code (0x000)"},
	{abort_align,	"Alignment Fault"},
	{abort_fatal,	"Debug Event"},
	{NULL,		"Access Bit (L1)"},
	{NULL,		"Instruction cache maintenance"},
	{NULL,		"Translation Fault (L1)"},
	{NULL,		"Access Bit (L2)"},
	{NULL,		"Translation Fault (L2)"},
	{abort_fatal,	"External Abort"},
	{abort_fatal,	"Domain Fault (L1)"},
	{abort_fatal,	"Undefined Code (0x00A)"},
	{abort_fatal,	"Domain Fault (L2)"},
	{abort_fatal,	"External Translation Abort (L1)"},
	{NULL,		"Permission Fault (L1)"},
	{abort_fatal,	"External Translation Abort (L2)"},
	{NULL,		"Permission Fault (L2)"},
	{abort_fatal,	"TLB Conflict Abort"},
	{abort_fatal,	"Undefined Code (0x401)"},
	{abort_fatal,	"Undefined Code (0x402)"},
	{abort_fatal,	"Undefined Code (0x403)"},
	{abort_fatal,	"Undefined Code (0x404)"},
	{abort_fatal,	"Undefined Code (0x405)"},
	{abort_fatal,	"Asynchronous External Abort"},
	{abort_fatal,	"Undefined Code (0x407)"},
	{abort_fatal,	"Asynchronous Parity Error on Memory Access"},
	{abort_fatal,	"Parity Error on Memory Access"},
	{abort_fatal,	"Undefined Code (0x40A)"},
	{abort_fatal,	"Undefined Code (0x40B)"},
	{abort_fatal,	"Parity Error on Translation (L1)"},
	{abort_fatal,	"Undefined Code (0x40D)"},
	{abort_fatal,	"Parity Error on Translation (L2)"},
	{abort_fatal,	"Undefined Code (0x40F)"}
};

static __inline void
call_trapsignal(struct thread *td, int sig, int code, vm_offset_t addr)
{
	ksiginfo_t ksi;

	CTR4(KTR_TRAP, "%s: addr: %#x, sig: %d, code: %d",
	    __func__, addr, sig, code);

	/*
	 * TODO: some info would be nice to know
	 * if we are serving data or prefetch abort.
	 */
	ksiginfo_init_trap(&ksi);
	ksi.ksi_signo = sig;
	ksi.ksi_code = code;
	ksi.ksi_addr = (void *)addr;
	trapsignal(td, &ksi);
}
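/*
 * Illustrative sketch only (hypothetical, not used anywhere): how a raw
 * fault status value selects an entry in a table shaped like aborts[]
 * above.  DEMO_FSR_TO_FAULT() assumes the usual ARMv6/v7 encoding, in
 * which FSR[3:0] and FSR[10] combine into a five-bit index; the kernel's
 * real FSR_TO_FAULT() macro is authoritative.
 */
#if 0
#define	DEMO_FSR_TO_FAULT(fsr)	(((fsr) & 0xf) | (((fsr) & (1 << 10)) >> 6))

static void
demo_abort_dispatch(struct trapframe *tf, u_int fsr, u_int far)
{
	u_int idx;

	idx = DEMO_FSR_TO_FAULT(fsr);
	if (aborts[idx].func != NULL)
		(void)(aborts[idx].func)(tf, idx, fsr, far, 0, curthread,
		    NULL);
	else
		printf("abort: %s\n", aborts[idx].desc);
}
#endif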
/*
 * abort_imprecise() handles the following abort:
 *
 *  FAULT_EA_IMPREC - Imprecise External Abort
 *
 * Imprecise means that we don't know where the abort happened,
 * thus FAR is undefined.  The abort should never fire, but hot
 * plugging or accidental hardware failure can cause it.  If the
 * abort happens, it can even be in a different (thread) context.
 * Without any additional support, the abort is fatal, as we do not
 * know what really happened.
 *
 * QQQ: Some additional functionality, like pcb_onfault but global,
 *      can be implemented.  Imprecise handlers could be registered
 *      which tell us if the abort is caused by something they know
 *      about.  They should return one of three codes:
 *       FAULT_IS_MINE,
 *       FAULT_CAN_BE_MINE,
 *       FAULT_IS_NOT_MINE.
 *      The handlers should be called until one of them returns
 *      FAULT_IS_MINE or all have been called.  If all handlers return
 *      FAULT_IS_NOT_MINE, then the abort is fatal.
 */
static __inline void
abort_imprecise(struct trapframe *tf, u_int fsr, u_int prefetch, bool usermode)
{

	/*
	 * XXX - We can get an imprecise abort as a result of an access
	 * to not-present PCI/PCIe configuration space.
	 */
#if 0
	goto out;
#endif
	abort_fatal(tf, FAULT_EA_IMPREC, fsr, 0, prefetch, curthread, NULL);

	/*
	 * Returning from this function means that we ignore
	 * the abort for a good reason.  Note that an imprecise abort
	 * can fire at any time, even in user mode.
	 */
#if 0
out:
	if (usermode)
		userret(curthread, tf);
#endif
}

/*
 * abort_debug() handles the following abort:
 *
 *  FAULT_DEBUG - Debug Event
 */
static __inline void
abort_debug(struct trapframe *tf, u_int fsr, u_int prefetch, bool usermode,
    u_int far)
{

	if (usermode) {
		struct thread *td;

		td = curthread;
		call_trapsignal(td, SIGTRAP, TRAP_BRKPT, far);
		userret(td, tf);
	} else {
#ifdef KDB
-		kdb_trap(T_BREAKPOINT, 0, tf);
+		kdb_trap((prefetch) ? T_BREAKPOINT : T_WATCHPOINT, 0, tf);
#else
		printf("No debugger in kernel.\n");
#endif
	}
}

/*
 * Abort handler.
 *
 * FAR, FSR, and everything that can be lost after enabling
 * interrupts must be grabbed before the interrupts are
 * enabled.  Note that once interrupts are enabled, we
 * could even migrate to another CPU ...
 *
 * TODO: move quick cases to ASM
 */
void
abort_handler(struct trapframe *tf, int prefetch)
{
	struct thread *td;
	vm_offset_t far, va;
	int idx, rv;
	uint32_t fsr;
	struct ksig ksig;
	struct proc *p;
	struct pcb *pcb;
	struct vm_map *map;
	struct vmspace *vm;
	vm_prot_t ftype;
	bool usermode;
#ifdef INVARIANTS
	void *onfault;
#endif

	td = curthread;
	fsr = (prefetch) ? cp15_ifsr_get() : cp15_dfsr_get();
#if __ARM_ARCH >= 7
	far = (prefetch) ? cp15_ifar_get() : cp15_dfar_get();
#else
	far = (prefetch) ? TRAPF_PC(tf) : cp15_dfar_get();
#endif

	idx = FSR_TO_FAULT(fsr);
	usermode = TRAPF_USERMODE(tf);	/* Abort came from user mode? */
	if (usermode)
		td->td_frame = tf;

	CTR6(KTR_TRAP, "%s: fsr %#x (idx %u) far %#x prefetch %u usermode %d",
	    __func__, fsr, idx, far, prefetch, usermode);

	/*
	 * First, handle the aborts that are not directly related to mapping.
	 */
	if (__predict_false(idx == FAULT_EA_IMPREC)) {
		abort_imprecise(tf, fsr, prefetch, usermode);
		return;
	}

	if (__predict_false(idx == FAULT_DEBUG)) {
		abort_debug(tf, fsr, prefetch, usermode, far);
		return;
	}

	/*
	 * ARM has a set of unprivileged load and store instructions
	 * (LDRT/LDRBT/STRT/STRBT ...) which are supposed to be used in modes
	 * other than user mode, and the OS should recognize their aborts and
	 * behave appropriately.  However, there is no reasonable way to do
	 * that in general unless we restrict the handling somehow.
	 *
	 * For now, these instructions are used only in copyin()/copyout()-
	 * like functions, where user-mode buffers are checked in advance to
	 * ensure that they are not from KVA space.  Thus, no action is
	 * needed here.
	 */

#ifdef ARM_NEW_PMAP
	rv = pmap_fault(PCPU_GET(curpmap), far, fsr, idx, usermode);
	if (rv == 0) {
		return;
	} else if (rv == EFAULT) {

		call_trapsignal(td, SIGSEGV, SEGV_MAPERR, far);
		userret(td, tf);

		return;
	}
#endif
	/*
	 * Now that we have handled the imprecise and debug aborts, the rest
	 * of the aborts should really be related to mapping.
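 *
 * In outline, the mapping-related path below (a sketch, eliding the
 * special cases handled first) computes:
 *
 *	va = trunc_page(far);
 *	map = (va >= KERNBASE) ? kernel_map : &p->p_vmspace->vm_map;
 *	ftype = (fsr & FSR_WNR) ? VM_PROT_WRITE : VM_PROT_READ;
 *	if (prefetch)
 *		ftype |= VM_PROT_EXECUTE;
 *	rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL);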
*/ PCPU_INC(cnt.v_trap); #ifdef KDB if (kdb_active) { kdb_reenter(); goto out; } #endif if (__predict_false((td->td_pflags & TDP_NOFAULTING) != 0)) { /* * Due to both processor errata and lazy TLB invalidation when * access restrictions are removed from virtual pages, memory * accesses that are allowed by the physical mapping layer may * nonetheless cause one spurious page fault per virtual page. * When the thread is executing a "no faulting" section that * is bracketed by vm_fault_{disable,enable}_pagefaults(), * every page fault is treated as a spurious page fault, * unless it accesses the same virtual address as the most * recent page fault within the same "no faulting" section. */ if (td->td_md.md_spurflt_addr != far || (td->td_pflags & TDP_RESETSPUR) != 0) { td->td_md.md_spurflt_addr = far; td->td_pflags &= ~TDP_RESETSPUR; tlb_flush_local(far & ~PAGE_MASK); return; } } else { /* * If we get a page fault while in a critical section, then * it is most likely a fatal kernel page fault. The kernel * is already going to panic trying to get a sleep lock to * do the VM lookup, so just consider it a fatal trap so the * kernel can print out a useful trap message and even get * to the debugger. * * If we get a page fault while holding a non-sleepable * lock, then it is most likely a fatal kernel page fault. * If WITNESS is enabled, then it's going to whine about * bogus LORs with various VM locks, so just skip to the * fatal trap handling directly. */ if (td->td_critnest != 0 || WITNESS_CHECK(WARN_SLEEPOK | WARN_GIANTOK, NULL, "Kernel page fault") != 0) { abort_fatal(tf, idx, fsr, far, prefetch, td, &ksig); return; } } /* Re-enable interrupts if they were enabled previously. */ if (td->td_md.md_spinlock_count == 0) { if (__predict_true(tf->tf_spsr & PSR_I) == 0) enable_interrupts(PSR_I); if (__predict_true(tf->tf_spsr & PSR_F) == 0) enable_interrupts(PSR_F); } p = td->td_proc; if (usermode) { td->td_pticks = 0; if (td->td_cowgen != p->p_cowgen) thread_cow_update(td); } /* Invoke the appropriate handler, if necessary. */ if (__predict_false(aborts[idx].func != NULL)) { if ((aborts[idx].func)(tf, idx, fsr, far, prefetch, td, &ksig)) goto do_trapsignal; goto out; } /* * Don't pass faulting cache operation to vm_fault(). We don't want * to handle all vm stuff at this moment. */ pcb = td->td_pcb; if (__predict_false(pcb->pcb_onfault == cachebailout)) { tf->tf_r0 = far; /* return failing address */ tf->tf_pc = (register_t)pcb->pcb_onfault; return; } /* Handle remaining I-cache aborts. */ if (idx == FAULT_ICACHE) { if (abort_icache(tf, idx, fsr, far, prefetch, td, &ksig)) goto do_trapsignal; goto out; } /* * At this point, we're dealing with one of the following aborts: * * FAULT_TRAN_xx - Translation * FAULT_PERM_xx - Permission * * These are the main virtual memory-related faults signalled by * the MMU. */ /* fusubailout is used by [fs]uswintr to avoid page faulting. */ pcb = td->td_pcb; if (__predict_false(pcb->pcb_onfault == fusubailout)) { tf->tf_r0 = EFAULT; tf->tf_pc = (register_t)pcb->pcb_onfault; return; } va = trunc_page(far); if (va >= KERNBASE) { /* * Don't allow user-mode faults in kernel address space. */ if (usermode) goto nogo; map = kernel_map; } else { /* * This is a fault on non-kernel virtual memory. If curproc * is NULL or curproc->p_vmspace is NULL the fault is fatal. */ vm = (p != NULL) ? 
p->p_vmspace : NULL;
		if (vm == NULL)
			goto nogo;

		map = &vm->vm_map;
		if (!usermode && (td->td_intr_nesting_level != 0 ||
		    pcb->pcb_onfault == NULL)) {
			abort_fatal(tf, idx, fsr, far, prefetch, td, &ksig);
			return;
		}
	}

	ftype = (fsr & FSR_WNR) ? VM_PROT_WRITE : VM_PROT_READ;
	if (prefetch)
		ftype |= VM_PROT_EXECUTE;

#ifdef DEBUG
	last_fault_code = fsr;
#endif

#ifndef ARM_NEW_PMAP
	if (pmap_fault_fixup(vmspace_pmap(td->td_proc->p_vmspace), va, ftype,
	    usermode)) {
		goto out;
	}
#endif

#ifdef INVARIANTS
	onfault = pcb->pcb_onfault;
	pcb->pcb_onfault = NULL;
#endif

	/* Fault in the page. */
	rv = vm_fault(map, va, ftype, VM_FAULT_NORMAL);

#ifdef INVARIANTS
	pcb->pcb_onfault = onfault;
#endif

	if (__predict_true(rv == KERN_SUCCESS))
		goto out;
nogo:
	if (!usermode) {
		if (td->td_intr_nesting_level == 0 &&
		    pcb->pcb_onfault != NULL) {
			tf->tf_r0 = rv;
			tf->tf_pc = (int)pcb->pcb_onfault;
			return;
		}
		CTR2(KTR_TRAP, "%s: vm_fault() failed with %d", __func__, rv);
		abort_fatal(tf, idx, fsr, far, prefetch, td, &ksig);
		return;
	}

	ksig.sig = SIGSEGV;
	ksig.code = (rv == KERN_PROTECTION_FAILURE) ? SEGV_ACCERR : SEGV_MAPERR;
	ksig.addr = far;

do_trapsignal:
	call_trapsignal(td, ksig.sig, ksig.code, ksig.addr);
out:
	if (usermode)
		userret(td, tf);
}

/*
 * abort_fatal() handles the following data aborts:
 *
 *  FAULT_DEBUG		- Debug Event
 *  FAULT_ACCESS_xx	- Access Bit
 *  FAULT_EA_PREC	- Precise External Abort
 *  FAULT_DOMAIN_xx	- Domain Fault
 *  FAULT_EA_TRAN_xx	- External Translation Abort
 *  FAULT_EA_IMPREC	- Imprecise External Abort
 *  + all undefined codes for ABORT
 *
 * We should never see these on a properly functioning system.
 *
 * This function is also called by the other handlers if they
 * detect a fatal problem.
 *
 * Note: If 'td' is NULL, we assume we're dealing with a prefetch abort.
 */
static int
abort_fatal(struct trapframe *tf, u_int idx, u_int fsr, u_int far,
    u_int prefetch, struct thread *td, struct ksig *ksig)
{
	bool usermode;
	const char *mode;
	const char *rw_mode;

	usermode = TRAPF_USERMODE(tf);
#ifdef KDTRACE_HOOKS
	if (!usermode) {
		if (dtrace_trap_func != NULL && (*dtrace_trap_func)(tf, far))
			return (0);
	}
#endif

	mode = usermode ? "user" : "kernel";
	rw_mode = fsr & FSR_WNR ? "write" : "read";
	disable_interrupts(PSR_I|PSR_F);

	if (td != NULL) {
		printf("Fatal %s mode data abort: '%s' on %s\n", mode,
		    aborts[idx].desc, rw_mode);
		printf("trapframe: %p\nFSR=%08x, FAR=", tf, fsr);
		if (idx != FAULT_EA_IMPREC)
			printf("%08x, ", far);
		else
			printf("Invalid, ");
		printf("spsr=%08x\n", tf->tf_spsr);
	} else {
		printf("Fatal %s mode prefetch abort at 0x%08x\n", mode,
		    tf->tf_pc);
		printf("trapframe: %p, spsr=%08x\n", tf, tf->tf_spsr);
	}

	printf("r0 =%08x, r1 =%08x, r2 =%08x, r3 =%08x\n",
	    tf->tf_r0, tf->tf_r1, tf->tf_r2, tf->tf_r3);
	printf("r4 =%08x, r5 =%08x, r6 =%08x, r7 =%08x\n",
	    tf->tf_r4, tf->tf_r5, tf->tf_r6, tf->tf_r7);
	printf("r8 =%08x, r9 =%08x, r10=%08x, r11=%08x\n",
	    tf->tf_r8, tf->tf_r9, tf->tf_r10, tf->tf_r11);
	printf("r12=%08x, ", tf->tf_r12);

	if (usermode)
		printf("usp=%08x, ulr=%08x", tf->tf_usr_sp, tf->tf_usr_lr);
	else
		printf("ssp=%08x, slr=%08x", tf->tf_svc_sp, tf->tf_svc_lr);
	printf(", pc =%08x\n\n", tf->tf_pc);

#ifdef KDB
	if (debugger_on_panic || kdb_active)
		kdb_trap(fsr, 0, tf);
#endif
	panic("Fatal abort");
	/*NOTREACHED*/
}

/*
 * abort_align() handles the following data abort:
 *
 *  FAULT_ALIGN - Alignment fault
 *
 * Everything should be aligned in the kernel, with the exception of
 * user-to-kernel and vice versa data copying, so if pcb_onfault is not
 * set, it's fatal.  We generate a signal in case of an abort from user
 * mode.
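 *
 * A hypothetical user-space illustration (not kernel code): on a
 * strict-alignment configuration, the misaligned load sketched below
 * is what ends up here and is delivered as SIGBUS/BUS_ADRALN, while
 * the byte-wise copy is the safe idiom:
 *
 *	char buf[8];
 *	uint32_t v;
 *
 *	memcpy(&v, buf + 1, sizeof(v));	/* no fault: byte accesses   */
 *	v = *(uint32_t *)(buf + 1);	/* misaligned load -> SIGBUS */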
 */
static int
abort_align(struct trapframe *tf, u_int idx, u_int fsr, u_int far,
    u_int prefetch, struct thread *td, struct ksig *ksig)
{
	bool usermode;

	usermode = TRAPF_USERMODE(tf);
	if (!usermode) {
		if (td != NULL && td->td_intr_nesting_level == 0 &&
		    td->td_pcb->pcb_onfault != NULL) {
			tf->tf_r0 = EFAULT;
			tf->tf_pc = (int)td->td_pcb->pcb_onfault;
			return (0);
		}
		abort_fatal(tf, idx, fsr, far, prefetch, td, ksig);
	}
	/* Deliver a bus error signal to the process */
	ksig->code = BUS_ADRALN;
	ksig->sig = SIGBUS;
	ksig->addr = far;
	return (1);
}

/*
 * abort_icache() handles the following data abort:
 *
 *  FAULT_ICACHE - Instruction cache maintenance
 *
 * According to the manual, FAULT_ICACHE is a translation fault during a
 * cache maintenance operation.  In fact, no cache maintenance operation
 * should be called on unmapped virtual addresses.  As cache maintenance
 * operations (except DMB, DSB, and Flush Prefetch Buffer) are privileged,
 * the abort is considered fatal for now.  However, the whole matter of
 * cache maintenance operations on virtual addresses can be really complex
 * and fuzzy in the SMP case, so maybe in the future the standard fault
 * mechanism, including a vm_fault() call, should be used here.
 */
static int
abort_icache(struct trapframe *tf, u_int idx, u_int fsr, u_int far,
    u_int prefetch, struct thread *td, struct ksig *ksig)
{

	abort_fatal(tf, idx, fsr, far, prefetch, td, ksig);
	return (0);
}
Index: projects/clang380-import/sys/arm/conf/A20
===================================================================
--- projects/clang380-import/sys/arm/conf/A20	(revision 294776)
+++ projects/clang380-import/sys/arm/conf/A20	(revision 294777)
@@ -1,112 +1,115 @@
#
# A20 -- Custom configuration for the Allwinner A20 ARM SoC
#
# For more information on this file, please read the config(5) manual page,
# and/or the handbook section on Kernel Configuration Files:
#
#    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
# # $FreeBSD$ ident A20 include "std.armv6" include "../allwinner/a20/std.a20" +options ARM_INTRNG + options HZ=100 options SCHED_ULE # ULE scheduler options SMP # Enable multiple cores +options PLATFORM # Debugging for use in -current makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols options ALT_BREAK_TO_DEBUGGER #options VERBOSE_SYSINIT # Enable verbose sysinit messages options KDB # Enable kernel debugger support # For minimum debugger support (stable branch) use: #options KDB_TRACE # Print a stack trace for a panic # For full debugger support use this instead: options DDB # Enable the kernel debugger options INVARIANTS # Enable calls of extra sanity checking options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS options WITNESS # Enable checks to detect deadlocks and cycles options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed #options DIAGNOSTIC # NFS root from boopt/dhcp #options BOOTP #options BOOTP_NFSROOT #options BOOTP_COMPAT #options BOOTP_NFSV3 #options BOOTP_WIRED_TO=dwc0 # Boot device is 2nd slice on MMC/SD card options ROOTDEVNAME=\"ufs:/dev/da0s2\" # Interrupt controller device gic # MMC/SD/SDIO Card slot support device mmc # mmc/sd bus device mmcsd # mmc/sd flash cards # ATA controllers device ahci # AHCI-compatible SATA controllers #device ata # Legacy ATA/SATA controllers # Console and misc device uart device uart_ns8250 device pty device snp device md device random # Entropy device # I2C support #device iicbus #device iic # GPIO device gpio device gpioled device scbus # SCSI bus (required for ATA/SCSI) device da # Direct Access (disks) device pass # Passthrough device (direct ATA/SCSI access) # USB support options USB_HOST_ALIGN=64 # Align usb buffers to cache line size. device usb options USB_DEBUG #options USB_REQ_DEBUG #options USB_VERBOSE #device uhci #device ohci device ehci device umass # Ethernet device loop device ether device mii device bpf #device emac # 10/100 integrated EMAC controller device dwc # 10/100/1000 integrated GMAC controller # USB ethernet support, requires miibus device miibus # Flattened Device Tree options FDT # Configure using FDT/DTB data makeoptions MODULES_EXTRA=dtb/allwinner Index: projects/clang380-import/sys/arm/conf/CUBIEBOARD =================================================================== --- projects/clang380-import/sys/arm/conf/CUBIEBOARD (revision 294776) +++ projects/clang380-import/sys/arm/conf/CUBIEBOARD (revision 294777) @@ -1,109 +1,110 @@ # # CUBIEBOARD -- Custom configuration for the CUBIEBOARD ARM development # platform, check out http://www.cubieboard.org # # For more information on this file, please read the config(5) manual page, # and/or the handbook section on Kernel Configuration Files: # # http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html # # The handbook is also available locally in /usr/share/doc/handbook # if you've installed the doc distribution, otherwise always see the # FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the # latest information. # # An exhaustive list of options and more detailed explanations of the # device lines is also present in the ../../conf/NOTES and NOTES files. # If you are in doubt as to the purpose or necessity of a line, check first # in NOTES. 
# # $FreeBSD$ ident CUBIEBOARD include "std.armv6" include "../allwinner/std.a10" options HZ=100 options SCHED_4BSD # 4BSD scheduler +options PLATFORM # Debugging for use in -current makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols options ALT_BREAK_TO_DEBUGGER #options VERBOSE_SYSINIT # Enable verbose sysinit messages options KDB # Enable kernel debugger support # For minimum debugger support (stable branch) use: #options KDB_TRACE # Print a stack trace for a panic # For full debugger support use this instead: options DDB # Enable the kernel debugger options INVARIANTS # Enable calls of extra sanity checking options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS options WITNESS # Enable checks to detect deadlocks and cycles options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed #options DIAGNOSTIC # NFS root from boopt/dhcp #options BOOTP #options BOOTP_NFSROOT #options BOOTP_COMPAT #options BOOTP_NFSV3 #options BOOTP_WIRED_TO=cpsw0 # Boot device is 2nd slice on MMC/SD card options ROOTDEVNAME=\"ufs:/dev/da0s2\" # MMC/SD/SDIO Card slot support device mmc # mmc/sd bus device mmcsd # mmc/sd flash cards # ATA controllers device ahci # AHCI-compatible SATA controllers #device ata # Legacy ATA/SATA controllers # Console and misc device uart device uart_ns8250 device pty device snp device md device random # Entropy device # I2C support #device iicbus #device iic # GPIO device gpio device scbus # SCSI bus (required for ATA/SCSI) device da # Direct Access (disks) device pass # Passthrough device (direct ATA/SCSI access) # USB support options USB_HOST_ALIGN=64 # Align usb buffers to cache line size. device usb options USB_DEBUG #options USB_REQ_DEBUG #options USB_VERBOSE #device uhci #device ohci device ehci device umass # Ethernet device loop device ether device mii device bpf device emac # USB ethernet support, requires miibus device miibus # Flattened Device Tree options FDT # Configure using FDT/DTB data options FDT_DTB_STATIC makeoptions FDT_DTS_FILE=cubieboard.dts makeoptions MODULES_EXTRA=dtb/allwinner Index: projects/clang380-import/sys/arm/include/cpu-v6.h =================================================================== --- projects/clang380-import/sys/arm/include/cpu-v6.h (revision 294776) +++ projects/clang380-import/sys/arm/include/cpu-v6.h (revision 294777) @@ -1,612 +1,624 @@ /*- * Copyright 2014 Svatopluk Kraus * Copyright 2014 Michal Meloun * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef MACHINE_CPU_V6_H #define MACHINE_CPU_V6_H /* There are no user serviceable parts here, they may change without notice */ #ifndef _KERNEL #error Only include this file in the kernel #else #include #include "machine/atomic.h" #include "machine/cpufunc.h" #include "machine/cpuinfo.h" #include "machine/sysreg.h" #define CPU_ASID_KERNEL 0 vm_offset_t dcache_wb_pou_checked(vm_offset_t, vm_size_t); vm_offset_t icache_inv_pou_checked(vm_offset_t, vm_size_t); /* * Macros to generate CP15 (system control processor) read/write functions. */ #define _FX(s...) #s #define _RF0(fname, aname...) \ static __inline register_t \ fname(void) \ { \ register_t reg; \ __asm __volatile("mrc\t" _FX(aname): "=r" (reg)); \ return(reg); \ } #define _R64F0(fname, aname) \ static __inline uint64_t \ fname(void) \ { \ uint64_t reg; \ __asm __volatile("mrrc\t" _FX(aname): "=r" (reg)); \ return(reg); \ } #define _WF0(fname, aname...) \ static __inline void \ fname(void) \ { \ __asm __volatile("mcr\t" _FX(aname)); \ } #define _WF1(fname, aname...) \ static __inline void \ fname(register_t reg) \ { \ __asm __volatile("mcr\t" _FX(aname):: "r" (reg)); \ } #define _W64F1(fname, aname...) \ static __inline void \ fname(uint64_t reg) \ { \ __asm __volatile("mcrr\t" _FX(aname):: "r" (reg)); \ } /* * Raw CP15 maintenance operations * !!! not for external use !!! 
*/ /* TLB */ _WF0(_CP15_TLBIALL, CP15_TLBIALL) /* Invalidate entire unified TLB */ #if __ARM_ARCH >= 7 && defined SMP _WF0(_CP15_TLBIALLIS, CP15_TLBIALLIS) /* Invalidate entire unified TLB IS */ #endif _WF1(_CP15_TLBIASID, CP15_TLBIASID(%0)) /* Invalidate unified TLB by ASID */ #if __ARM_ARCH >= 7 && defined SMP _WF1(_CP15_TLBIASIDIS, CP15_TLBIASIDIS(%0)) /* Invalidate unified TLB by ASID IS */ #endif _WF1(_CP15_TLBIMVAA, CP15_TLBIMVAA(%0)) /* Invalidate unified TLB by MVA, all ASID */ #if __ARM_ARCH >= 7 && defined SMP _WF1(_CP15_TLBIMVAAIS, CP15_TLBIMVAAIS(%0)) /* Invalidate unified TLB by MVA, all ASID IS */ #endif _WF1(_CP15_TLBIMVA, CP15_TLBIMVA(%0)) /* Invalidate unified TLB by MVA */ _WF1(_CP15_TTB_SET, CP15_TTBR0(%0)) /* Cache and Branch predictor */ _WF0(_CP15_BPIALL, CP15_BPIALL) /* Branch predictor invalidate all */ #if __ARM_ARCH >= 7 && defined SMP _WF0(_CP15_BPIALLIS, CP15_BPIALLIS) /* Branch predictor invalidate all IS */ #endif _WF1(_CP15_BPIMVA, CP15_BPIMVA(%0)) /* Branch predictor invalidate by MVA */ _WF1(_CP15_DCCIMVAC, CP15_DCCIMVAC(%0)) /* Data cache clean and invalidate by MVA PoC */ _WF1(_CP15_DCCISW, CP15_DCCISW(%0)) /* Data cache clean and invalidate by set/way */ _WF1(_CP15_DCCMVAC, CP15_DCCMVAC(%0)) /* Data cache clean by MVA PoC */ #if __ARM_ARCH >= 7 _WF1(_CP15_DCCMVAU, CP15_DCCMVAU(%0)) /* Data cache clean by MVA PoU */ #endif _WF1(_CP15_DCCSW, CP15_DCCSW(%0)) /* Data cache clean by set/way */ _WF1(_CP15_DCIMVAC, CP15_DCIMVAC(%0)) /* Data cache invalidate by MVA PoC */ _WF1(_CP15_DCISW, CP15_DCISW(%0)) /* Data cache invalidate by set/way */ _WF0(_CP15_ICIALLU, CP15_ICIALLU) /* Instruction cache invalidate all PoU */ #if __ARM_ARCH >= 7 && defined SMP _WF0(_CP15_ICIALLUIS, CP15_ICIALLUIS) /* Instruction cache invalidate all PoU IS */ #endif _WF1(_CP15_ICIMVAU, CP15_ICIMVAU(%0)) /* Instruction cache invalidate */ /* * Publicly accessible functions */ +/* CP14 Debug Registers */ +_RF0(cp14_dbgdidr_get, CP14_DBGDIDR(%0)) +_RF0(cp14_dbgprsr_get, CP14_DBGPRSR(%0)) +_RF0(cp14_dbgoslsr_get, CP14_DBGOSLSR(%0)) +_RF0(cp14_dbgosdlr_get, CP14_DBGOSDLR(%0)) +_RF0(cp14_dbgdscrint_get, CP14_DBGDSCRint(%0)) + +_WF1(cp14_dbgdscr_v6_set, CP14_DBGDSCRext_V6(%0)) +_WF1(cp14_dbgdscr_v7_set, CP14_DBGDSCRext_V7(%0)) +_WF1(cp14_dbgvcr_set, CP14_DBGVCR(%0)) +_WF1(cp14_dbgoslar_set, CP14_DBGOSLAR(%0)) + /* Various control registers */ _RF0(cp15_cpacr_get, CP15_CPACR(%0)) _WF1(cp15_cpacr_set, CP15_CPACR(%0)) _RF0(cp15_dfsr_get, CP15_DFSR(%0)) _RF0(cp15_ifsr_get, CP15_IFSR(%0)) _WF1(cp15_prrr_set, CP15_PRRR(%0)) _WF1(cp15_nmrr_set, CP15_NMRR(%0)) _RF0(cp15_ttbr_get, CP15_TTBR0(%0)) _RF0(cp15_dfar_get, CP15_DFAR(%0)) #if __ARM_ARCH >= 7 _RF0(cp15_ifar_get, CP15_IFAR(%0)) _RF0(cp15_l2ctlr_get, CP15_L2CTLR(%0)) #endif /* ARMv6+ and XScale */ _RF0(cp15_actlr_get, CP15_ACTLR(%0)) _WF1(cp15_actlr_set, CP15_ACTLR(%0)) #if __ARM_ARCH >= 6 _WF1(cp15_ats1cpr_set, CP15_ATS1CPR(%0)) _WF1(cp15_ats1cpw_set, CP15_ATS1CPW(%0)) _RF0(cp15_par_get, CP15_PAR(%0)) _RF0(cp15_sctlr_get, CP15_SCTLR(%0)) #endif /*CPU id registers */ _RF0(cp15_midr_get, CP15_MIDR(%0)) _RF0(cp15_ctr_get, CP15_CTR(%0)) _RF0(cp15_tcmtr_get, CP15_TCMTR(%0)) _RF0(cp15_tlbtr_get, CP15_TLBTR(%0)) _RF0(cp15_mpidr_get, CP15_MPIDR(%0)) _RF0(cp15_revidr_get, CP15_REVIDR(%0)) _RF0(cp15_ccsidr_get, CP15_CCSIDR(%0)) _RF0(cp15_clidr_get, CP15_CLIDR(%0)) _RF0(cp15_aidr_get, CP15_AIDR(%0)) _WF1(cp15_csselr_set, CP15_CSSELR(%0)) _RF0(cp15_id_pfr0_get, CP15_ID_PFR0(%0)) _RF0(cp15_id_pfr1_get, CP15_ID_PFR1(%0)) _RF0(cp15_id_dfr0_get, 
CP15_ID_DFR0(%0)) _RF0(cp15_id_afr0_get, CP15_ID_AFR0(%0)) _RF0(cp15_id_mmfr0_get, CP15_ID_MMFR0(%0)) _RF0(cp15_id_mmfr1_get, CP15_ID_MMFR1(%0)) _RF0(cp15_id_mmfr2_get, CP15_ID_MMFR2(%0)) _RF0(cp15_id_mmfr3_get, CP15_ID_MMFR3(%0)) _RF0(cp15_id_isar0_get, CP15_ID_ISAR0(%0)) _RF0(cp15_id_isar1_get, CP15_ID_ISAR1(%0)) _RF0(cp15_id_isar2_get, CP15_ID_ISAR2(%0)) _RF0(cp15_id_isar3_get, CP15_ID_ISAR3(%0)) _RF0(cp15_id_isar4_get, CP15_ID_ISAR4(%0)) _RF0(cp15_id_isar5_get, CP15_ID_ISAR5(%0)) _RF0(cp15_cbar_get, CP15_CBAR(%0)) /* Performance Monitor registers */ #if __ARM_ARCH == 6 && defined(CPU_ARM1176) _RF0(cp15_pmuserenr_get, CP15_PMUSERENR(%0)) _WF1(cp15_pmuserenr_set, CP15_PMUSERENR(%0)) _RF0(cp15_pmcr_get, CP15_PMCR(%0)) _WF1(cp15_pmcr_set, CP15_PMCR(%0)) _RF0(cp15_pmccntr_get, CP15_PMCCNTR(%0)) _WF1(cp15_pmccntr_set, CP15_PMCCNTR(%0)) #elif __ARM_ARCH > 6 _RF0(cp15_pmcr_get, CP15_PMCR(%0)) _WF1(cp15_pmcr_set, CP15_PMCR(%0)) _RF0(cp15_pmcnten_get, CP15_PMCNTENSET(%0)) _WF1(cp15_pmcnten_set, CP15_PMCNTENSET(%0)) _WF1(cp15_pmcnten_clr, CP15_PMCNTENCLR(%0)) _RF0(cp15_pmovsr_get, CP15_PMOVSR(%0)) _WF1(cp15_pmovsr_set, CP15_PMOVSR(%0)) _WF1(cp15_pmswinc_set, CP15_PMSWINC(%0)) _RF0(cp15_pmselr_get, CP15_PMSELR(%0)) _WF1(cp15_pmselr_set, CP15_PMSELR(%0)) _RF0(cp15_pmccntr_get, CP15_PMCCNTR(%0)) _WF1(cp15_pmccntr_set, CP15_PMCCNTR(%0)) _RF0(cp15_pmxevtyper_get, CP15_PMXEVTYPER(%0)) _WF1(cp15_pmxevtyper_set, CP15_PMXEVTYPER(%0)) _RF0(cp15_pmxevcntr_get, CP15_PMXEVCNTRR(%0)) _WF1(cp15_pmxevcntr_set, CP15_PMXEVCNTRR(%0)) _RF0(cp15_pmuserenr_get, CP15_PMUSERENR(%0)) _WF1(cp15_pmuserenr_set, CP15_PMUSERENR(%0)) _RF0(cp15_pminten_get, CP15_PMINTENSET(%0)) _WF1(cp15_pminten_set, CP15_PMINTENSET(%0)) _WF1(cp15_pminten_clr, CP15_PMINTENCLR(%0)) #endif _RF0(cp15_tpidrurw_get, CP15_TPIDRURW(%0)) _WF1(cp15_tpidrurw_set, CP15_TPIDRURW(%0)) _RF0(cp15_tpidruro_get, CP15_TPIDRURO(%0)) _WF1(cp15_tpidruro_set, CP15_TPIDRURO(%0)) _RF0(cp15_tpidrpwr_get, CP15_TPIDRPRW(%0)) _WF1(cp15_tpidrpwr_set, CP15_TPIDRPRW(%0)) /* Generic Timer registers - only use when you know the hardware is available */ _RF0(cp15_cntfrq_get, CP15_CNTFRQ(%0)) _WF1(cp15_cntfrq_set, CP15_CNTFRQ(%0)) _RF0(cp15_cntkctl_get, CP15_CNTKCTL(%0)) _WF1(cp15_cntkctl_set, CP15_CNTKCTL(%0)) _RF0(cp15_cntp_tval_get, CP15_CNTP_TVAL(%0)) _WF1(cp15_cntp_tval_set, CP15_CNTP_TVAL(%0)) _RF0(cp15_cntp_ctl_get, CP15_CNTP_CTL(%0)) _WF1(cp15_cntp_ctl_set, CP15_CNTP_CTL(%0)) _RF0(cp15_cntv_tval_get, CP15_CNTV_TVAL(%0)) _WF1(cp15_cntv_tval_set, CP15_CNTV_TVAL(%0)) _RF0(cp15_cntv_ctl_get, CP15_CNTV_CTL(%0)) _WF1(cp15_cntv_ctl_set, CP15_CNTV_CTL(%0)) _RF0(cp15_cnthctl_get, CP15_CNTHCTL(%0)) _WF1(cp15_cnthctl_set, CP15_CNTHCTL(%0)) _RF0(cp15_cnthp_tval_get, CP15_CNTHP_TVAL(%0)) _WF1(cp15_cnthp_tval_set, CP15_CNTHP_TVAL(%0)) _RF0(cp15_cnthp_ctl_get, CP15_CNTHP_CTL(%0)) _WF1(cp15_cnthp_ctl_set, CP15_CNTHP_CTL(%0)) _R64F0(cp15_cntpct_get, CP15_CNTPCT(%Q0, %R0)) _R64F0(cp15_cntvct_get, CP15_CNTVCT(%Q0, %R0)) _R64F0(cp15_cntp_cval_get, CP15_CNTP_CVAL(%Q0, %R0)) _W64F1(cp15_cntp_cval_set, CP15_CNTP_CVAL(%Q0, %R0)) _R64F0(cp15_cntv_cval_get, CP15_CNTV_CVAL(%Q0, %R0)) _W64F1(cp15_cntv_cval_set, CP15_CNTV_CVAL(%Q0, %R0)) _R64F0(cp15_cntvoff_get, CP15_CNTVOFF(%Q0, %R0)) _W64F1(cp15_cntvoff_set, CP15_CNTVOFF(%Q0, %R0)) _R64F0(cp15_cnthp_cval_get, CP15_CNTHP_CVAL(%Q0, %R0)) _W64F1(cp15_cnthp_cval_set, CP15_CNTHP_CVAL(%Q0, %R0)) #undef _FX #undef _RF0 #undef _WF0 #undef _WF1 #if __ARM_ARCH >= 6 /* * Cache and TLB maintenance operations for armv6+ code. 
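The _RF0/_WF1 generators above stamp out one tiny inline function per CP15 register. As a concrete illustration, a sketch of what _RF0(cp15_midr_get, CP15_MIDR(%0)) boils down to after preprocessing, assuming CP15_MIDR(rr) from machine/sysreg.h yields the usual "p15, 0, rr, c0, c0, 0" operand string:

static __inline register_t
cp15_midr_get(void)
{
	register_t reg;

	/* MRC p15, 0, <Rt>, c0, c0, 0: read the Main ID Register */
	__asm __volatile("mrc\tp15, 0, %0, c0, c0, 0" : "=r" (reg));
	return (reg);
}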
The #else block * provides armv4/v5 implementations for a few of these used in common code. */ /* * TLB maintenance operations. */ /* Local (i.e. not broadcasting ) operations. */ /* Flush all TLB entries (even global). */ static __inline void tlb_flush_all_local(void) { dsb(); _CP15_TLBIALL(); dsb(); } /* Flush all not global TLB entries. */ static __inline void tlb_flush_all_ng_local(void) { dsb(); _CP15_TLBIASID(CPU_ASID_KERNEL); dsb(); } /* Flush single TLB entry (even global). */ static __inline void tlb_flush_local(vm_offset_t va) { KASSERT((va & PAGE_MASK) == 0, ("%s: va %#x not aligned", __func__, va)); dsb(); _CP15_TLBIMVA(va | CPU_ASID_KERNEL); dsb(); } /* Flush range of TLB entries (even global). */ static __inline void tlb_flush_range_local(vm_offset_t va, vm_size_t size) { vm_offset_t eva = va + size; KASSERT((va & PAGE_MASK) == 0, ("%s: va %#x not aligned", __func__, va)); KASSERT((size & PAGE_MASK) == 0, ("%s: size %#x not aligned", __func__, size)); dsb(); for (; va < eva; va += PAGE_SIZE) _CP15_TLBIMVA(va | CPU_ASID_KERNEL); dsb(); } /* Broadcasting operations. */ #if __ARM_ARCH >= 7 && defined SMP static __inline void tlb_flush_all(void) { dsb(); _CP15_TLBIALLIS(); dsb(); } static __inline void tlb_flush_all_ng(void) { dsb(); _CP15_TLBIASIDIS(CPU_ASID_KERNEL); dsb(); } static __inline void tlb_flush(vm_offset_t va) { KASSERT((va & PAGE_MASK) == 0, ("%s: va %#x not aligned", __func__, va)); dsb(); _CP15_TLBIMVAAIS(va); dsb(); } static __inline void tlb_flush_range(vm_offset_t va, vm_size_t size) { vm_offset_t eva = va + size; KASSERT((va & PAGE_MASK) == 0, ("%s: va %#x not aligned", __func__, va)); KASSERT((size & PAGE_MASK) == 0, ("%s: size %#x not aligned", __func__, size)); dsb(); for (; va < eva; va += PAGE_SIZE) _CP15_TLBIMVAAIS(va); dsb(); } #else /* SMP */ #define tlb_flush_all() tlb_flush_all_local() #define tlb_flush_all_ng() tlb_flush_all_ng_local() #define tlb_flush(va) tlb_flush_local(va) #define tlb_flush_range(va, size) tlb_flush_range_local(va, size) #endif /* SMP */ /* * Cache maintenance operations. */ /* Sync I and D caches to PoU */ static __inline void icache_sync(vm_offset_t va, vm_size_t size) { vm_offset_t eva = va + size; dsb(); va &= ~cpuinfo.dcache_line_mask; for ( ; va < eva; va += cpuinfo.dcache_line_size) { #if __ARM_ARCH >= 7 && defined SMP _CP15_DCCMVAU(va); #else _CP15_DCCMVAC(va); #endif } dsb(); #if __ARM_ARCH >= 7 && defined SMP _CP15_ICIALLUIS(); #else _CP15_ICIALLU(); #endif dsb(); isb(); } /* Invalidate I cache */ static __inline void icache_inv_all(void) { #if __ARM_ARCH >= 7 && defined SMP _CP15_ICIALLUIS(); #else _CP15_ICIALLU(); #endif dsb(); isb(); } /* Invalidate branch predictor buffer */ static __inline void bpb_inv_all(void) { #if __ARM_ARCH >= 7 && defined SMP _CP15_BPIALLIS(); #else _CP15_BPIALL(); #endif dsb(); isb(); } /* Write back D-cache to PoU */ static __inline void dcache_wb_pou(vm_offset_t va, vm_size_t size) { vm_offset_t eva = va + size; dsb(); va &= ~cpuinfo.dcache_line_mask; for ( ; va < eva; va += cpuinfo.dcache_line_size) { #if __ARM_ARCH >= 7 && defined SMP _CP15_DCCMVAU(va); #else _CP15_DCCMVAC(va); #endif } dsb(); } /* * Invalidate D-cache to PoC * * Caches are invalidated from outermost to innermost as fresh cachelines * flow in this direction. In given range, if there was no dirty cacheline * in any cache before, no stale cacheline should remain in them after this * operation finishes. 
*/ static __inline void dcache_inv_poc(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { vm_offset_t eva = va + size; dsb(); /* invalidate L2 first */ cpu_l2cache_inv_range(pa, size); /* then L1 */ va &= ~cpuinfo.dcache_line_mask; for ( ; va < eva; va += cpuinfo.dcache_line_size) { _CP15_DCIMVAC(va); } dsb(); } /* * Discard D-cache lines to PoC, prior to overwrite by DMA engine. * * Normal invalidation does L2 then L1 to ensure that stale data from L2 doesn't * flow into L1 while invalidating. This routine is intended to be used only * when invalidating a buffer before a DMA operation loads new data into memory. * The concern in this case is that dirty lines are not evicted to main memory, * overwriting the DMA data. For that reason, the L1 is done first to ensure * that an evicted L1 line doesn't flow to L2 after the L2 has been cleaned. */ static __inline void dcache_inv_poc_dma(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { vm_offset_t eva = va + size; /* invalidate L1 first */ dsb(); va &= ~cpuinfo.dcache_line_mask; for ( ; va < eva; va += cpuinfo.dcache_line_size) { _CP15_DCIMVAC(va); } dsb(); /* then L2 */ cpu_l2cache_inv_range(pa, size); } /* * Write back D-cache to PoC * * Caches are written back from innermost to outermost as dirty cachelines * flow in this direction. In given range, no dirty cacheline should remain * in any cache after this operation finishes. */ static __inline void dcache_wb_poc(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { vm_offset_t eva = va + size; dsb(); va &= ~cpuinfo.dcache_line_mask; for ( ; va < eva; va += cpuinfo.dcache_line_size) { _CP15_DCCMVAC(va); } dsb(); cpu_l2cache_wb_range(pa, size); } /* Write back and invalidate D-cache to PoC */ static __inline void dcache_wbinv_poc(vm_offset_t sva, vm_paddr_t pa, vm_size_t size) { vm_offset_t va; vm_offset_t eva = sva + size; dsb(); /* write back L1 first */ va = sva & ~cpuinfo.dcache_line_mask; for ( ; va < eva; va += cpuinfo.dcache_line_size) { _CP15_DCCMVAC(va); } dsb(); /* then write back and invalidate L2 */ cpu_l2cache_wbinv_range(pa, size); /* then invalidate L1 */ va = sva & ~cpuinfo.dcache_line_mask; for ( ; va < eva; va += cpuinfo.dcache_line_size) { _CP15_DCIMVAC(va); } dsb(); } /* Set TTB0 register */ static __inline void cp15_ttbr_set(uint32_t reg) { dsb(); _CP15_TTB_SET(reg); dsb(); _CP15_BPIALL(); dsb(); isb(); tlb_flush_all_ng_local(); } #else /* ! __ARM_ARCH >= 6 */ /* * armv4/5 compatibility shims. * * These functions provide armv4 cache maintenance using the new armv6 names. * Included here are just the functions actually used now in common code; it may * be necessary to add things here over time. * * The callers of the dcache functions expect these routines to handle address * and size values which are not aligned to cacheline boundaries; the armv4 and * armv5 asm code handles that. */ static __inline void dcache_inv_poc(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { cpu_dcache_inv_range(va, size); cpu_l2cache_inv_range(va, size); } static __inline void dcache_inv_poc_dma(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { /* See armv6 code, above, for why we do L2 before L1 in this case. 
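These PoC routines combine into the usual maintenance pattern for DMA on a non-snooping ARM core. A sketch of a hypothetical driver's buffer handling, where buf, pa, len and the dma_* calls are invented for illustration:

	/* CPU -> device: push dirty lines to the point of coherency
	 * before the device reads the buffer. */
	dcache_wb_poc((vm_offset_t)buf, pa, len);
	dma_start_device_read(pa, len);

	/* device -> CPU: discard our lines (no writeback) before the
	 * device overwrites the buffer... */
	dcache_inv_poc_dma((vm_offset_t)buf, pa, len);
	dma_start_device_write(pa, len);
	dma_wait_complete();
	/* ...and invalidate again afterwards, in case lines were pulled
	 * back in speculatively while the transfer was in flight. */
	dcache_inv_poc((vm_offset_t)buf, pa, len);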
*/ cpu_l2cache_inv_range(va, size); cpu_dcache_inv_range(va, size); } static __inline void dcache_wb_poc(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { cpu_dcache_wb_range(va, size); cpu_l2cache_wb_range(va, size); } #endif /* __ARM_ARCH >= 6 */ #endif /* _KERNEL */ #endif /* !MACHINE_CPU_V6_H */ Index: projects/clang380-import/sys/arm/include/db_machdep.h =================================================================== --- projects/clang380-import/sys/arm/include/db_machdep.h (revision 294776) +++ projects/clang380-import/sys/arm/include/db_machdep.h (revision 294777) @@ -1,98 +1,105 @@ /*- * Mach Operating System * Copyright (c) 1991,1990 Carnegie Mellon University * All Rights Reserved. * * Permission to use, copy, modify and distribute this software and its * documentation is hereby granted, provided that both the copyright * notice and this permission notice appear in all copies of the * software, derivative works or modified versions, and any portions * thereof, and that both notices appear in supporting documentation. * * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR * ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. * * Carnegie Mellon requests users of this software to return to * * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU * School of Computer Science * Carnegie Mellon University * Pittsburgh PA 15213-3890 * * any improvements or extensions that they make and grant Carnegie Mellon * the rights to redistribute these changes. * * from: FreeBSD: src/sys/i386/include/db_machdep.h,v 1.16 1999/10/04 * $FreeBSD$ */ #ifndef _MACHINE_DB_MACHDEP_H_ #define _MACHINE_DB_MACHDEP_H_ #include #include #include +#include #define T_BREAKPOINT (1) +#define T_WATCHPOINT (2) typedef vm_offset_t db_addr_t; typedef int db_expr_t; #define PC_REGS() ((db_addr_t)kdb_thrctx->pcb_regs.sf_pc) #define BKPT_INST (KERNEL_BREAKPOINT) #define BKPT_SIZE (INSN_SIZE) #define BKPT_SET(inst) (BKPT_INST) #define BKPT_SKIP do { \ kdb_frame->tf_pc += BKPT_SIZE; \ } while (0) -#define SOFTWARE_SSTEP 1 +#if __ARM_ARCH >= 6 +#define db_clear_single_step kdb_cpu_clear_singlestep +#define db_set_single_step kdb_cpu_set_singlestep +#define db_pc_is_singlestep kdb_cpu_pc_is_singlestep +#else +#define SOFTWARE_SSTEP 1 +#endif #define IS_BREAKPOINT_TRAP(type, code) (type == T_BREAKPOINT) -#define IS_WATCHPOINT_TRAP(type, code) (0) - +#define IS_WATCHPOINT_TRAP(type, code) (type == T_WATCHPOINT) #define inst_trap_return(ins) (0) /* ldmxx reg, {..., pc} 01800000 stack mode 000f0000 register 0000ffff register list */ /* mov pc, reg 0000000f register */ #define inst_return(ins) (((ins) & 0x0e108000) == 0x08108000 || \ ((ins) & 0x0ff0fff0) == 0x01a0f000 || \ ((ins) & 0x0ffffff0) == 0x012fff10) /* bx */ /* bl ... 00ffffff offset>>2 */ #define inst_call(ins) (((ins) & 0x0f000000) == 0x0b000000) /* b ... 00ffffff offset>>2 */ /* ldr pc, [pc, reg, lsl #2] 0000000f register */ #define inst_branch(ins) (((ins) & 0x0f000000) == 0x0a000000 || \ ((ins) & 0x0fdffff0) == 0x079ff100 || \ ((ins) & 0x0cd0f000) == 0x0490f000 || \ ((ins) & 0x0ffffff0) == 0x012fff30 || /* blx */ \ ((ins) & 0x0de0f000) == 0x0080f000) #define inst_load(ins) (0) #define inst_store(ins) (0) #define next_instr_address(pc, bd) ((bd) ? 
(pc) : ((pc) + INSN_SIZE)) #define DB_SMALL_VALUE_MAX (0x7fffffff) #define DB_SMALL_VALUE_MIN (-0x40001) #define DB_ELFSIZE 32 int db_validate_address(vm_offset_t); u_int branch_taken (u_int insn, db_addr_t pc); #ifdef __ARMEB__ #define BYTE_MSF (1) #endif #endif /* !_MACHINE_DB_MACHDEP_H_ */ Index: projects/clang380-import/sys/arm/include/debug_monitor.h =================================================================== --- projects/clang380-import/sys/arm/include/debug_monitor.h (nonexistent) +++ projects/clang380-import/sys/arm/include/debug_monitor.h (revision 294777) @@ -0,0 +1,80 @@ +/*- + * Copyright (c) 2014 The FreeBSD Foundation + * All rights reserved. + * + * This software was developed by Semihalf under + * the sponsorship of the FreeBSD Foundation. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD$ + */ + +#ifndef _MACHINE_DEBUG_MONITOR_H_ +#define _MACHINE_DEBUG_MONITOR_H_ + +#ifdef DDB + +#include + +enum dbg_access_t { + HW_BREAKPOINT_X = 0, + HW_WATCHPOINT_R = 1, + HW_WATCHPOINT_W = 2, + HW_WATCHPOINT_RW = HW_WATCHPOINT_R | HW_WATCHPOINT_W, +}; + +#if __ARM_ARCH >= 6 +void dbg_monitor_init(void); +void dbg_show_watchpoint(void); +int dbg_setup_watchpoint(db_expr_t, db_expr_t, enum dbg_access_t); +int dbg_remove_watchpoint(db_expr_t, db_expr_t); +#else /* __ARM_ARCH >= 6 */ +static __inline void +dbg_show_watchpoint(void) +{ +} +static __inline int +dbg_setup_watchpoint(db_expr_t addr __unused, db_expr_t size __unused, + enum dbg_access_t access __unused) +{ + return (ENXIO); +} +static __inline int +dbg_remove_watchpoint(db_expr_t addr __unused, db_expr_t size __unused) +{ + return (ENXIO); +} +static __inline void +dbg_monitor_init(void) +{ +} +#endif /* __ARM_ARCH < 6 */ + +#else /* DDB */ +static __inline void +dbg_monitor_init(void) +{ +} +#endif + +#endif /* _MACHINE_DEBUG_MONITOR_H_ */ Property changes on: projects/clang380-import/sys/arm/include/debug_monitor.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/arm/include/kdb.h =================================================================== --- projects/clang380-import/sys/arm/include/kdb.h (revision 294776) +++ projects/clang380-import/sys/arm/include/kdb.h (revision 294777) @@ -1,60 +1,67 @@ /*- * Copyright (c) 2004 Marcel Moolenaar * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
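The watchpoint interface declared in debug_monitor.h above is easiest to see in use. A minimal sketch from a DDB context, assuming a 4-byte kernel variable at addr (function name invented for illustration):

#include <machine/debug_monitor.h>

static void
watch_writes(db_expr_t addr)
{
	/* Trap any store into the 4 bytes at addr; on pre-ARMv6 cores
	 * the stub variant above simply fails with ENXIO. */
	if (dbg_setup_watchpoint(addr, 4, HW_WATCHPOINT_W) != 0)
		db_printf("no hardware watchpoint slot available\n");
}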
* * $FreeBSD$ */ #ifndef _MACHINE_KDB_H_ #define _MACHINE_KDB_H_ #include #include #include +#include #define KDB_STOPPEDPCB(pc) &stoppcbs[pc->pc_cpuid] +#if __ARM_ARCH >= 6 +extern void kdb_cpu_clear_singlestep(void); +extern void kdb_cpu_set_singlestep(void); +boolean_t kdb_cpu_pc_is_singlestep(db_addr_t); +#else static __inline void kdb_cpu_clear_singlestep(void) { } static __inline void kdb_cpu_set_singlestep(void) { } +#endif static __inline void kdb_cpu_sync_icache(unsigned char *addr, size_t size) { cpu_icache_sync_range((vm_offset_t)addr, size); } static __inline void kdb_cpu_trap(int type, int code) { } #endif /* _MACHINE_KDB_H_ */ Index: projects/clang380-import/sys/arm/include/ofw_machdep.h =================================================================== --- projects/clang380-import/sys/arm/include/ofw_machdep.h (revision 294776) +++ projects/clang380-import/sys/arm/include/ofw_machdep.h (revision 294777) @@ -1,47 +1,47 @@ /*- * Copyright (c) 2009 The FreeBSD Foundation * All rights reserved. * * This software was developed by Semihalf under sponsorship from * the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _MACHINE_OFW_MACHDEP_H_ #define _MACHINE_OFW_MACHDEP_H_ #include #include #include #include typedef uint32_t cell_t; struct mem_region { - vm_offset_t mr_start; - vm_size_t mr_size; + uint64_t mr_start; + uint64_t mr_size; }; #endif /* _MACHINE_OFW_MACHDEP_H_ */ Index: projects/clang380-import/sys/arm/include/physmem.h =================================================================== --- projects/clang380-import/sys/arm/include/physmem.h (revision 294776) +++ projects/clang380-import/sys/arm/include/physmem.h (revision 294777) @@ -1,91 +1,91 @@ /*- * Copyright (c) 2014 Ian Lepore * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
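The widening of struct mem_region just above is presumably motivated by LPAE-capable SoCs: with 64-bit fields an FDT memory node can describe RAM beyond the 32-bit physical boundary without truncation. An illustrative region, values invented:

static const struct mem_region high_ram = {
	.mr_start = 0x100000000ULL,	/* physical base at 4 GB */
	.mr_size = 0x40000000ULL,	/* 1 GB of RAM */
};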
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _MACHINE_PHYSMEM_H_ #define _MACHINE_PHYSMEM_H_ /* * The physical address at which the kernel was loaded. */ extern vm_paddr_t arm_physmem_kernaddr; /* * Routines to help configure physical ram. * * Multiple regions of contiguous physical ram can be added (in any order). * * Multiple regions of physical ram that should be excluded from crash dumps, or * memory allocation, or both, can be added (in any order). * * After all early kernel init is done and it's time to configure all * remaining non-excluded physical ram for use by other parts of the kernel, * arm_physmem_init_kernel_globals() processes the hardware regions and * exclusion regions to generate the global dump_avail and phys_avail arrays * that communicate physical ram configuration to other parts of the kernel. */ #define EXFLAG_NODUMP 0x01 #define EXFLAG_NOALLOC 0x02 -void arm_physmem_hardware_region(vm_paddr_t pa, vm_size_t sz); +void arm_physmem_hardware_region(uint64_t pa, uint64_t sz); void arm_physmem_exclude_region(vm_paddr_t pa, vm_size_t sz, uint32_t flags); void arm_physmem_init_kernel_globals(void); void arm_physmem_print_tables(void); /* * Convenience routines for FDT. */ #ifdef FDT #include static inline void arm_physmem_hardware_regions(struct mem_region * mrptr, int mrcount) { while (mrcount--) { arm_physmem_hardware_region(mrptr->mr_start, mrptr->mr_size); ++mrptr; } } static inline void arm_physmem_exclude_regions(struct mem_region * mrptr, int mrcount, uint32_t exflags) { while (mrcount--) { arm_physmem_exclude_region(mrptr->mr_start, mrptr->mr_size, exflags); ++mrptr; } } #endif /* FDT */ #endif Index: projects/clang380-import/sys/arm/include/pmap-v6.h =================================================================== --- projects/clang380-import/sys/arm/include/pmap-v6.h (revision 294776) +++ projects/clang380-import/sys/arm/include/pmap-v6.h (revision 294777) @@ -1,307 +1,309 @@ /*- * Copyright 2014 Svatopluk Kraus * Copyright 2014 Michal Meloun * Copyright (c) 1991 Regents of the University of California. * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution.
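The physmem helpers above are normally driven from a platform's early initialization. A condensed sketch of the typical sequence, with invented region values and a hypothetical kernel_size:

	/* 512 MB of RAM at physical 1 GB... */
	arm_physmem_hardware_region(0x40000000, 0x20000000);
	/* ...minus the kernel image itself... */
	arm_physmem_exclude_region(arm_physmem_kernaddr, kernel_size,
	    EXFLAG_NOALLOC);
	/* ...then publish phys_avail[] and dump_avail[]. */
	arm_physmem_init_kernel_globals();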
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * The ARM version of this file was more or less based on the i386 version, * which has the following provenance... * * Derived from hp300 version by Mike Hibler, this version by William * Jolitz uses a recursive map [a pde points to the page directory] to * map the page tables using the pagetables themselves. This is done to * reduce the impact on kernel virtual memory for lots of sparse address * space, and to reduce the cost of memory to each process. * * from: hp300: @(#)pmap.h 7.2 (Berkeley) 12/16/90 * from: @(#)pmap.h 7.4 (Berkeley) 5/12/91 * from: FreeBSD: src/sys/i386/include/pmap.h,v 1.70 2000/11/30 * * $FreeBSD$ */ #ifndef _MACHINE_PMAP_H_ #define _MACHINE_PMAP_H_ #include #include #include #include typedef uint32_t pt1_entry_t; /* L1 table entry */ typedef uint32_t pt2_entry_t; /* L2 table entry */ typedef uint32_t ttb_entry_t; /* TTB entry */ #ifdef _KERNEL #if 0 #define PMAP_PTE_NOCACHE // Use uncached page tables #endif /* * (1) During pmap bootstrap, physical pages for L2 page tables are * allocated in advance and used for a continuous KVA mapping * starting from KERNBASE. This makes things simpler. * (2) During vm subsystem initialization, only the vm subsystem itself can * allocate physical memory safely. As pmap_map() is called during * this initialization, we must be prepared for that and have some * preallocated physical pages for L2 page tables. * * Note that some more pages for L2 page tables are preallocated too * for mappings lying above VM_MAX_KERNEL_ADDRESS. */ #ifndef NKPT2PG /* * Optimally this would be defined in the board configuration, as the * definition here must be safe enough for any board, which means * really big. * * 1 GB KVA <=> 256 kernel L2 page table pages * * From real platforms: * 1 GB physical memory <=> 10 pages is enough * 2 GB physical memory <=> 21 pages is enough */ #define NKPT2PG 32 #endif extern vm_paddr_t phys_avail[]; extern vm_paddr_t dump_avail[]; extern char *_tmppt; /* poor name!
*/ extern vm_offset_t virtual_avail; extern vm_offset_t virtual_end; /* * Pmap stuff */ /* * This structure is used to hold a virtual<->physical address * association and is used mostly by bootstrap code */ struct pv_addr { SLIST_ENTRY(pv_addr) pv_list; vm_offset_t pv_va; vm_paddr_t pv_pa; }; #endif struct pv_entry; struct pv_chunk; struct md_page { TAILQ_HEAD(,pv_entry) pv_list; uint16_t pt2_wirecount[4]; int pat_mode; }; struct pmap { struct mtx pm_mtx; pt1_entry_t *pm_pt1; /* KVA of pt1 */ pt2_entry_t *pm_pt2tab; /* KVA of pt2 pages table */ TAILQ_HEAD(,pv_chunk) pm_pvchunk; /* list of mappings in pmap */ cpuset_t pm_active; /* active on cpus */ struct pmap_statistics pm_stats; /* pmap statistics */ LIST_ENTRY(pmap) pm_list; /* List of all pmaps */ }; typedef struct pmap *pmap_t; #ifdef _KERNEL extern struct pmap kernel_pmap_store; #define kernel_pmap (&kernel_pmap_store) #define PMAP_LOCK(pmap) mtx_lock(&(pmap)->pm_mtx) #define PMAP_LOCK_ASSERT(pmap, type) \ mtx_assert(&(pmap)->pm_mtx, (type)) #define PMAP_LOCK_DESTROY(pmap) mtx_destroy(&(pmap)->pm_mtx) #define PMAP_LOCK_INIT(pmap) mtx_init(&(pmap)->pm_mtx, "pmap", \ NULL, MTX_DEF | MTX_DUPOK) #define PMAP_LOCKED(pmap) mtx_owned(&(pmap)->pm_mtx) #define PMAP_MTX(pmap) (&(pmap)->pm_mtx) #define PMAP_TRYLOCK(pmap) mtx_trylock(&(pmap)->pm_mtx) #define PMAP_UNLOCK(pmap) mtx_unlock(&(pmap)->pm_mtx) #endif /* * For each vm_page_t, there is a list of all currently valid virtual * mappings of that page. An entry is a pv_entry_t, the list is pv_list. */ typedef struct pv_entry { vm_offset_t pv_va; /* virtual address for mapping */ TAILQ_ENTRY(pv_entry) pv_next; } *pv_entry_t; /* * pv_entries are allocated in chunks per-process. This avoids the * need to track per-pmap assignments. */ #define _NPCM 11 #define _NPCPV 336 struct pv_chunk { pmap_t pc_pmap; TAILQ_ENTRY(pv_chunk) pc_list; uint32_t pc_map[_NPCM]; /* bitmap; 1 = free */ TAILQ_ENTRY(pv_chunk) pc_lru; struct pv_entry pc_pventry[_NPCPV]; }; #ifdef _KERNEL struct pcb; extern ttb_entry_t pmap_kern_ttb; /* TTB for kernel pmap */ #define pmap_page_get_memattr(m) ((vm_memattr_t)(m)->md.pat_mode) #define pmap_page_is_write_mapped(m) (((m)->aflags & PGA_WRITEABLE) != 0) /* * Only the following functions or macros may be used before pmap_bootstrap() * is called: pmap_kenter(), pmap_kextract(), pmap_kremove(), vtophys(), and * vtopte2(). */ void pmap_bootstrap(vm_offset_t ); void pmap_kenter(vm_offset_t , vm_paddr_t ); void *pmap_kenter_temporary(vm_paddr_t , int ); void pmap_kremove(vm_offset_t); void *pmap_mapdev(vm_paddr_t, vm_size_t); void *pmap_mapdev_attr(vm_paddr_t, vm_size_t, int); boolean_t pmap_page_is_mapped(vm_page_t ); void pmap_page_set_memattr(vm_page_t , vm_memattr_t ); void pmap_unmapdev(vm_offset_t, vm_size_t); void pmap_kenter_device(vm_offset_t, vm_size_t, vm_paddr_t); void pmap_kremove_device(vm_offset_t, vm_size_t); void pmap_set_pcb_pagedir(pmap_t , struct pcb *); void pmap_tlb_flush(pmap_t , vm_offset_t ); void pmap_tlb_flush_range(pmap_t , vm_offset_t , vm_size_t ); void pmap_dcache_wb_range(vm_paddr_t , vm_size_t , vm_memattr_t ); vm_paddr_t pmap_kextract(vm_offset_t ); +vm_paddr_t pmap_dump_kextract(vm_offset_t, pt2_entry_t *); + int pmap_fault(pmap_t , vm_offset_t , uint32_t , int , bool); #define vtophys(va) pmap_kextract((vm_offset_t)(va)) void pmap_set_tex(void); void reinit_mmu(ttb_entry_t ttb, u_int aux_clr, u_int aux_set); /* * Pre-bootstrap epoch functions set.
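The _NPCM/_NPCPV values above are not arbitrary: assuming 32-bit pointers, each pv_chunk works out to exactly one 4 KB page. A sketch of the arithmetic (the CTASSERT is illustrative, not part of this header):

/* sizeof(struct pv_entry): 4 (pv_va) + 8 (TAILQ_ENTRY)           = 12
 * chunk header: 4 (pc_pmap) + 8 (pc_list) + 11*4 (pc_map) + 8 (pc_lru)
 *                                                                 = 64
 * entries: 336 * 12                                               = 4032
 * total: 64 + 4032                                                = 4096 */
CTASSERT(sizeof(struct pv_chunk) == PAGE_SIZE);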
*/ void pmap_bootstrap_prepare(vm_paddr_t ); vm_paddr_t pmap_preboot_get_pages(u_int ); void pmap_preboot_map_pages(vm_paddr_t , vm_offset_t , u_int ); vm_offset_t pmap_preboot_reserve_pages(u_int ); vm_offset_t pmap_preboot_get_vpages(u_int ); void pmap_preboot_map_attr(vm_paddr_t , vm_offset_t , vm_size_t , int , int ); static __inline void pmap_map_chunk(vm_offset_t l1pt, vm_offset_t va, vm_offset_t pa, vm_size_t size, int prot, int cache) { pmap_preboot_map_attr(pa, va, size, prot, cache); } /* * This structure is used by machine-dependent code to describe * static mappings of devices, created at bootstrap time. */ struct pmap_devmap { vm_offset_t pd_va; /* virtual address */ vm_paddr_t pd_pa; /* physical address */ vm_size_t pd_size; /* size of region */ vm_prot_t pd_prot; /* protection code */ int pd_cache; /* cache attributes */ }; void pmap_devmap_bootstrap(const struct pmap_devmap *); #endif /* _KERNEL */ // ----------------- TO BE DELETED --------------------------------------------- #include #ifdef _KERNEL /* * sys/arm/arm/elf_trampoline.c * sys/arm/arm/genassym.c * sys/arm/arm/machdep.c * sys/arm/arm/mp_machdep.c * sys/arm/arm/locore.S * sys/arm/arm/pmap.c * sys/arm/arm/swtch.S * sys/arm/at91/at91_machdep.c * sys/arm/cavium/cns11xx/econa_machdep.c * sys/arm/s3c2xx0/s3c24x0_machdep.c * sys/arm/xscale/ixp425/avila_machdep.c * sys/arm/xscale/i8134x/crb_machdep.c * sys/arm/xscale/i80321/ep80219_machdep.c * sys/arm/xscale/i80321/iq31244_machdep.c * sys/arm/xscale/pxa/pxa_machdep.c */ #define PMAP_DOMAIN_KERNEL 0 /* The kernel uses domain #0 */ /* * sys/arm/arm/cpufunc.c */ void pmap_pte_init_mmu_v6(void); void vector_page_setprot(int); /* * sys/arm/arm/db_interface.c * sys/arm/arm/machdep.c * sys/arm/arm/minidump_machdep.c * sys/arm/arm/pmap.c */ #define pmap_kernel() kernel_pmap /* * sys/arm/arm/bus_space_generic.c (just comment) * sys/arm/arm/devmap.c * sys/arm/arm/pmap.c (just comment) * sys/arm/at91/at91_machdep.c * sys/arm/cavium/cns11xx/econa_machdep.c * sys/arm/freescale/imx/imx6_machdep.c (just comment) * sys/arm/mv/orion/db88f5xxx.c * sys/arm/mv/mv_localbus.c * sys/arm/mv/mv_machdep.c * sys/arm/mv/mv_pci.c * sys/arm/s3c2xx0/s3c24x0_machdep.c * sys/arm/versatile/versatile_machdep.c * sys/arm/xscale/ixp425/avila_machdep.c * sys/arm/xscale/i8134x/crb_machdep.c * sys/arm/xscale/i80321/ep80219_machdep.c * sys/arm/xscale/i80321/iq31244_machdep.c * sys/arm/xscale/pxa/pxa_machdep.c */ #define PTE_DEVICE PTE2_ATTR_DEVICE #endif /* _KERNEL */ // ----------------------------------------------------------------------------- #endif /* !_MACHINE_PMAP_H_ */ Index: projects/clang380-import/sys/arm/include/pmap.h =================================================================== --- projects/clang380-import/sys/arm/include/pmap.h (revision 294776) +++ projects/clang380-import/sys/arm/include/pmap.h (revision 294777) @@ -1,706 +1,707 @@ /*- * Copyright (c) 1991 Regents of the University of California. * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department and William Jolitz of UUNET Technologies Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * Derived from hp300 version by Mike Hibler, this version by William * Jolitz uses a recursive map [a pde points to the page directory] to * map the page tables using the pagetables themselves. This is done to * reduce the impact on kernel virtual memory for lots of sparse address * space, and to reduce the cost of memory to each process. * * from: hp300: @(#)pmap.h 7.2 (Berkeley) 12/16/90 * from: @(#)pmap.h 7.4 (Berkeley) 5/12/91 * from: FreeBSD: src/sys/i386/include/pmap.h,v 1.70 2000/11/30 * * $FreeBSD$ */ #ifdef ARM_NEW_PMAP #include #else /* ARM_NEW_PMAP */ #ifndef _MACHINE_PMAP_H_ #define _MACHINE_PMAP_H_ #include #include /* * Pte related macros */ #if ARM_ARCH_6 || ARM_ARCH_7A #ifdef SMP #define PTE_NOCACHE 2 #else #define PTE_NOCACHE 1 #endif #define PTE_CACHE 6 #define PTE_DEVICE 2 #define PTE_PAGETABLE 6 #else #define PTE_NOCACHE 1 #define PTE_CACHE 2 #define PTE_DEVICE PTE_NOCACHE #define PTE_PAGETABLE 3 #endif enum mem_type { STRONG_ORD = 0, DEVICE_NOSHARE, DEVICE_SHARE, NRML_NOCACHE, NRML_IWT_OWT, NRML_IWB_OWB, NRML_IWBA_OWBA }; #ifndef LOCORE #include #include #include #include #define PDESIZE sizeof(pd_entry_t) /* for assembly files */ #define PTESIZE sizeof(pt_entry_t) /* for assembly files */ #ifdef _KERNEL #define vtophys(va) pmap_kextract((vm_offset_t)(va)) #endif #define pmap_page_get_memattr(m) ((m)->md.pv_memattr) #define pmap_page_is_write_mapped(m) (((m)->aflags & PGA_WRITEABLE) != 0) #if (ARM_MMU_V6 + ARM_MMU_V7) > 0 boolean_t pmap_page_is_mapped(vm_page_t); #else #define pmap_page_is_mapped(m) (!TAILQ_EMPTY(&(m)->md.pv_list)) #endif void pmap_page_set_memattr(vm_page_t m, vm_memattr_t ma); /* * Pmap stuff */ /* * This structure is used to hold a virtual<->physical address * association and is used mostly by bootstrap code */ struct pv_addr { SLIST_ENTRY(pv_addr) pv_list; vm_offset_t pv_va; vm_paddr_t pv_pa; }; struct pv_entry; struct pv_chunk; struct md_page { int pvh_attrs; vm_memattr_t pv_memattr; #if (ARM_MMU_V6 + ARM_MMU_V7) == 0 vm_offset_t pv_kva; /* first kernel VA mapping */ #endif TAILQ_HEAD(,pv_entry) pv_list; }; struct l1_ttable; struct l2_dtable; /* * The number of L2 descriptor tables 
which can be tracked by an l2_dtable. * A bucket size of 16 provides for 16MB of contiguous virtual address * space per l2_dtable. Most processes will, therefore, require only two or * three of these to map their whole working set. */ #define L2_BUCKET_LOG2 4 #define L2_BUCKET_SIZE (1 << L2_BUCKET_LOG2) /* * Given the above "L2-descriptors-per-l2_dtable" constant, the number * of l2_dtable structures required to track all possible page descriptors * mappable by an L1 translation table is given by the following constants: */ #define L2_LOG2 ((32 - L1_S_SHIFT) - L2_BUCKET_LOG2) #define L2_SIZE (1 << L2_LOG2) struct pmap { struct mtx pm_mtx; u_int8_t pm_domain; struct l1_ttable *pm_l1; struct l2_dtable *pm_l2[L2_SIZE]; cpuset_t pm_active; /* active on cpus */ struct pmap_statistics pm_stats; /* pmap statistics */ #if (ARM_MMU_V6 + ARM_MMU_V7) != 0 TAILQ_HEAD(,pv_chunk) pm_pvchunk; /* list of mappings in pmap */ #else TAILQ_HEAD(,pv_entry) pm_pvlist; /* list of mappings in pmap */ #endif }; typedef struct pmap *pmap_t; #ifdef _KERNEL extern struct pmap kernel_pmap_store; #define kernel_pmap (&kernel_pmap_store) #define pmap_kernel() kernel_pmap #define PMAP_ASSERT_LOCKED(pmap) \ mtx_assert(&(pmap)->pm_mtx, MA_OWNED) #define PMAP_LOCK(pmap) mtx_lock(&(pmap)->pm_mtx) #define PMAP_LOCK_DESTROY(pmap) mtx_destroy(&(pmap)->pm_mtx) #define PMAP_LOCK_INIT(pmap) mtx_init(&(pmap)->pm_mtx, "pmap", \ NULL, MTX_DEF | MTX_DUPOK) #define PMAP_OWNED(pmap) mtx_owned(&(pmap)->pm_mtx) #define PMAP_MTX(pmap) (&(pmap)->pm_mtx) #define PMAP_TRYLOCK(pmap) mtx_trylock(&(pmap)->pm_mtx) #define PMAP_UNLOCK(pmap) mtx_unlock(&(pmap)->pm_mtx) #endif /* * For each vm_page_t, there is a list of all currently valid virtual * mappings of that page. An entry is a pv_entry_t, the list is pv_list. */ typedef struct pv_entry { vm_offset_t pv_va; /* virtual address for mapping */ TAILQ_ENTRY(pv_entry) pv_list; int pv_flags; /* flags (wired, etc...) */ #if (ARM_MMU_V6 + ARM_MMU_V7) == 0 pmap_t pv_pmap; /* pmap where mapping lies */ TAILQ_ENTRY(pv_entry) pv_plist; #endif } *pv_entry_t; /* * pv_entries are allocated in chunks per-process. This avoids the * need to track per-pmap assignments. */ #define _NPCM 8 #define _NPCPV 252 struct pv_chunk { pmap_t pc_pmap; TAILQ_ENTRY(pv_chunk) pc_list; uint32_t pc_map[_NPCM]; /* bitmap; 1 = free */ uint32_t pc_dummy[3]; /* aligns pv_chunk to 4KB */ TAILQ_ENTRY(pv_chunk) pc_lru; struct pv_entry pc_pventry[_NPCPV]; }; #ifdef _KERNEL boolean_t pmap_get_pde_pte(pmap_t, vm_offset_t, pd_entry_t **, pt_entry_t **); /* * virtual address to page table entry and * to physical address. Likewise for alternate address space. * Note: these work recursively, thus vtopte of a pte will give * the corresponding pde that in turn maps it. */ /* * The current top of kernel VM.
*/ extern vm_offset_t pmap_curmaxkvaddr; struct pcb; void pmap_set_pcb_pagedir(pmap_t, struct pcb *); /* Virtual address to page table entry */ static __inline pt_entry_t * vtopte(vm_offset_t va) { pd_entry_t *pdep; pt_entry_t *ptep; if (pmap_get_pde_pte(pmap_kernel(), va, &pdep, &ptep) == FALSE) return (NULL); return (ptep); } extern vm_paddr_t phys_avail[]; extern vm_offset_t virtual_avail; extern vm_offset_t virtual_end; void pmap_bootstrap(vm_offset_t firstaddr, struct pv_addr *l1pt); int pmap_change_attr(vm_offset_t, vm_size_t, int); void pmap_kenter(vm_offset_t va, vm_paddr_t pa); void pmap_kenter_nocache(vm_offset_t va, vm_paddr_t pa); void pmap_kenter_device(vm_offset_t, vm_size_t, vm_paddr_t); void pmap_kremove_device(vm_offset_t, vm_size_t); void *pmap_kenter_temporary(vm_paddr_t pa, int i); void pmap_kenter_user(vm_offset_t va, vm_paddr_t pa); vm_paddr_t pmap_kextract(vm_offset_t va); +vm_paddr_t pmap_dump_kextract(vm_offset_t, pt2_entry_t *); void pmap_kremove(vm_offset_t); void *pmap_mapdev(vm_offset_t, vm_size_t); void pmap_unmapdev(vm_offset_t, vm_size_t); vm_page_t pmap_use_pt(pmap_t, vm_offset_t); void pmap_debug(int); #if (ARM_MMU_V6 + ARM_MMU_V7) == 0 void pmap_map_section(vm_offset_t, vm_offset_t, vm_offset_t, int, int); #endif void pmap_link_l2pt(vm_offset_t, vm_offset_t, struct pv_addr *); vm_size_t pmap_map_chunk(vm_offset_t, vm_offset_t, vm_offset_t, vm_size_t, int, int); void pmap_map_entry(vm_offset_t l1pt, vm_offset_t va, vm_offset_t pa, int prot, int cache); int pmap_fault_fixup(pmap_t, vm_offset_t, vm_prot_t, int); /* * Definitions for MMU domains */ #define PMAP_DOMAINS 15 /* 15 'user' domains (1-15) */ #define PMAP_DOMAIN_KERNEL 0 /* The kernel uses domain #0 */ /* * The new pmap ensures that page-tables are always mapping Write-Thru. * Thus, on some platforms we can run fast and loose and avoid syncing PTEs * on every change. * * Unfortunately, not all CPUs have a write-through cache mode. So we * define PMAP_NEEDS_PTE_SYNC for C code to conditionally do PTE syncs, * and if there is the chance for PTE syncs to be needed, we define * PMAP_INCLUDE_PTE_SYNC so e.g. assembly code can include (and run) * the code. */ extern int pmap_needs_pte_sync; /* * These macros define the various bit masks in the PTE. * * We use these macros since we use different bits on different processor * models. */ #define L1_S_CACHE_MASK_generic (L1_S_B|L1_S_C) #define L1_S_CACHE_MASK_xscale (L1_S_B|L1_S_C|L1_S_XSCALE_TEX(TEX_XSCALE_X)|\ L1_S_XSCALE_TEX(TEX_XSCALE_T)) #define L2_L_CACHE_MASK_generic (L2_B|L2_C) #define L2_L_CACHE_MASK_xscale (L2_B|L2_C|L2_XSCALE_L_TEX(TEX_XSCALE_X) | \ L2_XSCALE_L_TEX(TEX_XSCALE_T)) #define L2_S_PROT_U_generic (L2_AP(AP_U)) #define L2_S_PROT_W_generic (L2_AP(AP_W)) #define L2_S_PROT_MASK_generic (L2_S_PROT_U|L2_S_PROT_W) #define L2_S_PROT_U_xscale (L2_AP0(AP_U)) #define L2_S_PROT_W_xscale (L2_AP0(AP_W)) #define L2_S_PROT_MASK_xscale (L2_S_PROT_U|L2_S_PROT_W) #define L2_S_CACHE_MASK_generic (L2_B|L2_C) #define L2_S_CACHE_MASK_xscale (L2_B|L2_C|L2_XSCALE_T_TEX(TEX_XSCALE_X)| \ L2_XSCALE_T_TEX(TEX_XSCALE_X)) #define L1_S_PROTO_generic (L1_TYPE_S | L1_S_IMP) #define L1_S_PROTO_xscale (L1_TYPE_S) #define L1_C_PROTO_generic (L1_TYPE_C | L1_C_IMP2) #define L1_C_PROTO_xscale (L1_TYPE_C) #define L2_L_PROTO (L2_TYPE_L) #define L2_S_PROTO_generic (L2_TYPE_S) #define L2_S_PROTO_xscale (L2_TYPE_XSCALE_XS) /* * User-visible names for the ones that vary with MMU class. 
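The PMAP_NEEDS_PTE_SYNC discussion above pairs with the PTE_SYNC() macro defined further below; the pattern at a typical PTE update site looks roughly like this (pmap_set_pte is an invented name, shown only as a sketch):

static __inline void
pmap_set_pte(pt_entry_t *ptep, pt_entry_t npte)
{
	*ptep = npte;	/* store the new entry */
	PTE_SYNC(ptep);	/* just drains the write buffer when tables are
			 * mapped write-through; otherwise writes the
			 * cache line back (and syncs the L2) */
}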
*/ #if (ARM_MMU_V6 + ARM_MMU_V7) != 0 #define L2_AP(x) (L2_AP0(x)) #else #define L2_AP(x) (L2_AP0(x) | L2_AP1(x) | L2_AP2(x) | L2_AP3(x)) #endif #if (ARM_MMU_V6 + ARM_MMU_V7) != 0 /* * AP[2:1] access permissions model: * * AP[2](APX) - Write Disable * AP[1] - User Enable * AP[0] - Reference Flag * * AP[2] AP[1] Kernel User * 0 0 R/W N * 0 1 R/W R/W * 1 0 R N * 1 1 R R * */ #define L2_S_PROT_R (0) /* kernel read */ #define L2_S_PROT_U (L2_AP0(2)) /* user read */ #define L2_S_REF (L2_AP0(1)) /* reference flag */ #define L2_S_PROT_MASK (L2_S_PROT_U|L2_S_PROT_R|L2_APX) #define L2_S_EXECUTABLE(pte) (!(pte & L2_XN)) #define L2_S_WRITABLE(pte) (!(pte & L2_APX)) #define L2_S_REFERENCED(pte) (!!(pte & L2_S_REF)) #ifndef SMP #define L1_S_CACHE_MASK (L1_S_TEX_MASK|L1_S_B|L1_S_C) #define L2_L_CACHE_MASK (L2_L_TEX_MASK|L2_B|L2_C) #define L2_S_CACHE_MASK (L2_S_TEX_MASK|L2_B|L2_C) #else #define L1_S_CACHE_MASK (L1_S_TEX_MASK|L1_S_B|L1_S_C|L1_SHARED) #define L2_L_CACHE_MASK (L2_L_TEX_MASK|L2_B|L2_C|L2_SHARED) #define L2_S_CACHE_MASK (L2_S_TEX_MASK|L2_B|L2_C|L2_SHARED) #endif /* SMP */ #define L1_S_PROTO (L1_TYPE_S) #define L1_C_PROTO (L1_TYPE_C) #define L2_S_PROTO (L2_TYPE_S) /* * Promotion to a 1MB (SECTION) mapping requires that the corresponding * 4KB (SMALL) page mappings have identical settings for the following fields: */ #define L2_S_PROMOTE (L2_S_REF | L2_SHARED | L2_S_PROT_MASK | \ L2_XN | L2_S_PROTO) /* * In order to compare 1MB (SECTION) entry settings with the 4KB (SMALL) * page mapping it is necessary to read and shift appropriate bits from * L1 entry to positions of the corresponding bits in the L2 entry. */ #define L1_S_DEMOTE(l1pd) ((((l1pd) & L1_S_PROTO) >> 0) | \ (((l1pd) & L1_SHARED) >> 6) | \ (((l1pd) & L1_S_REF) >> 6) | \ (((l1pd) & L1_S_PROT_MASK) >> 6) | \ (((l1pd) & L1_S_XN) >> 4)) #ifndef SMP #define ARM_L1S_STRONG_ORD (0) #define ARM_L1S_DEVICE_NOSHARE (L1_S_TEX(2)) #define ARM_L1S_DEVICE_SHARE (L1_S_B) #define ARM_L1S_NRML_NOCACHE (L1_S_TEX(1)) #define ARM_L1S_NRML_IWT_OWT (L1_S_C) #define ARM_L1S_NRML_IWB_OWB (L1_S_C|L1_S_B) #define ARM_L1S_NRML_IWBA_OWBA (L1_S_TEX(1)|L1_S_C|L1_S_B) #define ARM_L2L_STRONG_ORD (0) #define ARM_L2L_DEVICE_NOSHARE (L2_L_TEX(2)) #define ARM_L2L_DEVICE_SHARE (L2_B) #define ARM_L2L_NRML_NOCACHE (L2_L_TEX(1)) #define ARM_L2L_NRML_IWT_OWT (L2_C) #define ARM_L2L_NRML_IWB_OWB (L2_C|L2_B) #define ARM_L2L_NRML_IWBA_OWBA (L2_L_TEX(1)|L2_C|L2_B) #define ARM_L2S_STRONG_ORD (0) #define ARM_L2S_DEVICE_NOSHARE (L2_S_TEX(2)) #define ARM_L2S_DEVICE_SHARE (L2_B) #define ARM_L2S_NRML_NOCACHE (L2_S_TEX(1)) #define ARM_L2S_NRML_IWT_OWT (L2_C) #define ARM_L2S_NRML_IWB_OWB (L2_C|L2_B) #define ARM_L2S_NRML_IWBA_OWBA (L2_S_TEX(1)|L2_C|L2_B) #else #define ARM_L1S_STRONG_ORD (0) #define ARM_L1S_DEVICE_NOSHARE (L1_S_TEX(2)) #define ARM_L1S_DEVICE_SHARE (L1_S_B) #define ARM_L1S_NRML_NOCACHE (L1_S_TEX(1)|L1_SHARED) #define ARM_L1S_NRML_IWT_OWT (L1_S_C|L1_SHARED) #define ARM_L1S_NRML_IWB_OWB (L1_S_C|L1_S_B|L1_SHARED) #define ARM_L1S_NRML_IWBA_OWBA (L1_S_TEX(1)|L1_S_C|L1_S_B|L1_SHARED) #define ARM_L2L_STRONG_ORD (0) #define ARM_L2L_DEVICE_NOSHARE (L2_L_TEX(2)) #define ARM_L2L_DEVICE_SHARE (L2_B) #define ARM_L2L_NRML_NOCACHE (L2_L_TEX(1)|L2_SHARED) #define ARM_L2L_NRML_IWT_OWT (L2_C|L2_SHARED) #define ARM_L2L_NRML_IWB_OWB (L2_C|L2_B|L2_SHARED) #define ARM_L2L_NRML_IWBA_OWBA (L2_L_TEX(1)|L2_C|L2_B|L2_SHARED) #define ARM_L2S_STRONG_ORD (0) #define ARM_L2S_DEVICE_NOSHARE (L2_S_TEX(2)) #define ARM_L2S_DEVICE_SHARE (L2_B) #define ARM_L2S_NRML_NOCACHE (L2_S_TEX(1)|L2_SHARED) #define 
ARM_L2S_NRML_IWT_OWT (L2_C|L2_SHARED) #define ARM_L2S_NRML_IWB_OWB (L2_C|L2_B|L2_SHARED) #define ARM_L2S_NRML_IWBA_OWBA (L2_S_TEX(1)|L2_C|L2_B|L2_SHARED) #endif /* SMP */ #elif ARM_NMMUS > 1 /* More than one MMU class configured; use variables. */ #define L2_S_PROT_U pte_l2_s_prot_u #define L2_S_PROT_W pte_l2_s_prot_w #define L2_S_PROT_MASK pte_l2_s_prot_mask #define L1_S_CACHE_MASK pte_l1_s_cache_mask #define L2_L_CACHE_MASK pte_l2_l_cache_mask #define L2_S_CACHE_MASK pte_l2_s_cache_mask #define L1_S_PROTO pte_l1_s_proto #define L1_C_PROTO pte_l1_c_proto #define L2_S_PROTO pte_l2_s_proto #elif ARM_MMU_GENERIC != 0 #define L2_S_PROT_U L2_S_PROT_U_generic #define L2_S_PROT_W L2_S_PROT_W_generic #define L2_S_PROT_MASK L2_S_PROT_MASK_generic #define L1_S_CACHE_MASK L1_S_CACHE_MASK_generic #define L2_L_CACHE_MASK L2_L_CACHE_MASK_generic #define L2_S_CACHE_MASK L2_S_CACHE_MASK_generic #define L1_S_PROTO L1_S_PROTO_generic #define L1_C_PROTO L1_C_PROTO_generic #define L2_S_PROTO L2_S_PROTO_generic #elif ARM_MMU_XSCALE == 1 #define L2_S_PROT_U L2_S_PROT_U_xscale #define L2_S_PROT_W L2_S_PROT_W_xscale #define L2_S_PROT_MASK L2_S_PROT_MASK_xscale #define L1_S_CACHE_MASK L1_S_CACHE_MASK_xscale #define L2_L_CACHE_MASK L2_L_CACHE_MASK_xscale #define L2_S_CACHE_MASK L2_S_CACHE_MASK_xscale #define L1_S_PROTO L1_S_PROTO_xscale #define L1_C_PROTO L1_C_PROTO_xscale #define L2_S_PROTO L2_S_PROTO_xscale #endif /* ARM_NMMUS > 1 */ #if defined(CPU_XSCALE_81342) || ARM_ARCH_6 || ARM_ARCH_7A #define PMAP_NEEDS_PTE_SYNC 1 #define PMAP_INCLUDE_PTE_SYNC #else #define PMAP_NEEDS_PTE_SYNC 0 #endif /* * These macros return various bits based on kernel/user and protection. * Note that the compiler will usually fold these at compile time. */ #if (ARM_MMU_V6 + ARM_MMU_V7) == 0 #define L1_S_PROT_U (L1_S_AP(AP_U)) #define L1_S_PROT_W (L1_S_AP(AP_W)) #define L1_S_PROT_MASK (L1_S_PROT_U|L1_S_PROT_W) #define L1_S_WRITABLE(pd) ((pd) & L1_S_PROT_W) #define L1_S_PROT(ku, pr) ((((ku) == PTE_USER) ? L1_S_PROT_U : 0) | \ (((pr) & VM_PROT_WRITE) ? L1_S_PROT_W : 0)) #define L2_L_PROT_U (L2_AP(AP_U)) #define L2_L_PROT_W (L2_AP(AP_W)) #define L2_L_PROT_MASK (L2_L_PROT_U|L2_L_PROT_W) #define L2_L_PROT(ku, pr) ((((ku) == PTE_USER) ? L2_L_PROT_U : 0) | \ (((pr) & VM_PROT_WRITE) ? L2_L_PROT_W : 0)) #define L2_S_PROT(ku, pr) ((((ku) == PTE_USER) ? L2_S_PROT_U : 0) | \ (((pr) & VM_PROT_WRITE) ? L2_S_PROT_W : 0)) #else #define L1_S_PROT_U (L1_S_AP(AP_U)) #define L1_S_PROT_W (L1_S_APX) /* Write disable */ #define L1_S_PROT_MASK (L1_S_PROT_W|L1_S_PROT_U) #define L1_S_REF (L1_S_AP(AP_REF)) /* Reference flag */ #define L1_S_WRITABLE(pd) (!((pd) & L1_S_PROT_W)) #define L1_S_EXECUTABLE(pd) (!((pd) & L1_S_XN)) #define L1_S_REFERENCED(pd) ((pd) & L1_S_REF) #define L1_S_PROT(ku, pr) (((((ku) == PTE_KERNEL) ? 0 : L1_S_PROT_U) | \ (((pr) & VM_PROT_WRITE) ? 0 : L1_S_PROT_W) | \ (((pr) & VM_PROT_EXECUTE) ? 0 : L1_S_XN))) #define L2_L_PROT_MASK (L2_APX|L2_AP0(0x3)) #define L2_L_PROT(ku, pr) (L2_L_PROT_MASK & ~((((ku) == PTE_KERNEL) ? L2_S_PROT_U : 0) | \ (((pr) & VM_PROT_WRITE) ? L2_APX : 0))) #define L2_S_PROT(ku, pr) (L2_S_PROT_MASK & ~((((ku) == PTE_KERNEL) ? L2_S_PROT_U : 0) | \ (((pr) & VM_PROT_WRITE) ? L2_APX : 0))) #endif /* * Macros to test if a mapping is mappable with an L1 Section mapping * or an L2 Large Page mapping. 
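The two predicates defined right after this comment are used when carving a physical range into the largest mappings that fit. A sketch, assuming the usual 1 MB sections and 64 KB large pages:

static vm_size_t
pick_mapping_size(vm_offset_t va, vm_offset_t pa, vm_size_t resid)
{
	if (L1_S_MAPPABLE_P(va, pa, resid))
		return (L1_S_SIZE);	/* 1 MB section */
	if (L2_L_MAPPABLE_P(va, pa, resid))
		return (L2_L_SIZE);	/* 64 KB large page */
	return (PAGE_SIZE);		/* 4 KB small page */
}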
*/ #define L1_S_MAPPABLE_P(va, pa, size) \ ((((va) | (pa)) & L1_S_OFFSET) == 0 && (size) >= L1_S_SIZE) #define L2_L_MAPPABLE_P(va, pa, size) \ ((((va) | (pa)) & L2_L_OFFSET) == 0 && (size) >= L2_L_SIZE) /* * Provide a fallback in case we were not able to determine it at * compile-time. */ #ifndef PMAP_NEEDS_PTE_SYNC #define PMAP_NEEDS_PTE_SYNC pmap_needs_pte_sync #define PMAP_INCLUDE_PTE_SYNC #endif #ifdef ARM_L2_PIPT #define _sync_l2(pte, size) cpu_l2cache_wb_range(vtophys(pte), size) #else #define _sync_l2(pte, size) cpu_l2cache_wb_range(pte, size) #endif #define PTE_SYNC(pte) \ do { \ if (PMAP_NEEDS_PTE_SYNC) { \ cpu_dcache_wb_range((vm_offset_t)(pte), sizeof(pt_entry_t));\ cpu_drain_writebuf(); \ _sync_l2((vm_offset_t)(pte), sizeof(pt_entry_t));\ } else \ cpu_drain_writebuf(); \ } while (/*CONSTCOND*/0) #define PTE_SYNC_RANGE(pte, cnt) \ do { \ if (PMAP_NEEDS_PTE_SYNC) { \ cpu_dcache_wb_range((vm_offset_t)(pte), \ (cnt) << 2); /* * sizeof(pt_entry_t) */ \ cpu_drain_writebuf(); \ _sync_l2((vm_offset_t)(pte), \ (cnt) << 2); /* * sizeof(pt_entry_t) */ \ } else \ cpu_drain_writebuf(); \ } while (/*CONSTCOND*/0) extern pt_entry_t pte_l1_s_cache_mode; extern pt_entry_t pte_l1_s_cache_mask; extern pt_entry_t pte_l2_l_cache_mode; extern pt_entry_t pte_l2_l_cache_mask; extern pt_entry_t pte_l2_s_cache_mode; extern pt_entry_t pte_l2_s_cache_mask; extern pt_entry_t pte_l1_s_cache_mode_pt; extern pt_entry_t pte_l2_l_cache_mode_pt; extern pt_entry_t pte_l2_s_cache_mode_pt; extern pt_entry_t pte_l2_s_prot_u; extern pt_entry_t pte_l2_s_prot_w; extern pt_entry_t pte_l2_s_prot_mask; extern pt_entry_t pte_l1_s_proto; extern pt_entry_t pte_l1_c_proto; extern pt_entry_t pte_l2_s_proto; extern void (*pmap_copy_page_func)(vm_paddr_t, vm_paddr_t); extern void (*pmap_copy_page_offs_func)(vm_paddr_t a_phys, vm_offset_t a_offs, vm_paddr_t b_phys, vm_offset_t b_offs, int cnt); extern void (*pmap_zero_page_func)(vm_paddr_t, int, int); #if (ARM_MMU_GENERIC + ARM_MMU_V6 + ARM_MMU_V7) != 0 || defined(CPU_XSCALE_81342) void pmap_copy_page_generic(vm_paddr_t, vm_paddr_t); void pmap_zero_page_generic(vm_paddr_t, int, int); void pmap_pte_init_generic(void); #if (ARM_MMU_V6 + ARM_MMU_V7) != 0 void pmap_pte_init_mmu_v6(void); #endif /* (ARM_MMU_V6 + ARM_MMU_V7) != 0 */ #endif /* (ARM_MMU_GENERIC + ARM_MMU_V6 + ARM_MMU_V7) != 0 */ #if ARM_MMU_XSCALE == 1 void pmap_copy_page_xscale(vm_paddr_t, vm_paddr_t); void pmap_zero_page_xscale(vm_paddr_t, int, int); void pmap_pte_init_xscale(void); void xscale_setup_minidata(vm_offset_t, vm_offset_t, vm_offset_t); void pmap_use_minicache(vm_offset_t, vm_size_t); #endif /* ARM_MMU_XSCALE == 1 */ #if defined(CPU_XSCALE_81342) #define ARM_HAVE_SUPERSECTIONS #endif #define PTE_KERNEL 0 #define PTE_USER 1 #define l1pte_valid(pde) ((pde) != 0) #define l1pte_section_p(pde) (((pde) & L1_TYPE_MASK) == L1_TYPE_S) #define l1pte_page_p(pde) (((pde) & L1_TYPE_MASK) == L1_TYPE_C) #define l1pte_fpage_p(pde) (((pde) & L1_TYPE_MASK) == L1_TYPE_F) #define l2pte_index(v) (((v) & L2_ADDR_BITS) >> L2_S_SHIFT) #define l2pte_valid(pte) ((pte) != 0) #define l2pte_pa(pte) ((pte) & L2_S_FRAME) #define l2pte_minidata(pte) (((pte) & \ (L2_B | L2_C | L2_XSCALE_T_TEX(TEX_XSCALE_X)))\ == (L2_C | L2_XSCALE_T_TEX(TEX_XSCALE_X))) /* L1 and L2 page table macros */ #define pmap_pde_v(pde) l1pte_valid(*(pde)) #define pmap_pde_section(pde) l1pte_section_p(*(pde)) #define pmap_pde_page(pde) l1pte_page_p(*(pde)) #define pmap_pde_fpage(pde) l1pte_fpage_p(*(pde)) #define pmap_pte_v(pte) l2pte_valid(*(pte)) #define 
pmap_pte_pa(pte) l2pte_pa(*(pte)) /* * Flags that indicate attributes of pages or mappings of pages. * * The PVF_MOD and PVF_REF flags are stored in the mdpage for each * page. PVF_WIRED, PVF_WRITE, and PVF_NC are kept in individual * pv_entry's for each page. They live in the same "namespace" so * that we can clear multiple attributes at a time. * * Note the "non-cacheable" flag generally means the page has * multiple mappings in a given address space. */ #define PVF_MOD 0x01 /* page is modified */ #define PVF_REF 0x02 /* page is referenced */ #define PVF_WIRED 0x04 /* mapping is wired */ #define PVF_WRITE 0x08 /* mapping is writable */ #define PVF_EXEC 0x10 /* mapping is executable */ #define PVF_NC 0x20 /* mapping is non-cacheable */ #define PVF_MWC 0x40 /* mapping is used multiple times in userland */ #define PVF_UNMAN 0x80 /* mapping is unmanaged */ void vector_page_setprot(int); #define SECTION_CACHE 0x1 #define SECTION_PT 0x2 void pmap_kenter_section(vm_offset_t, vm_paddr_t, int flags); #ifdef ARM_HAVE_SUPERSECTIONS void pmap_kenter_supersection(vm_offset_t, uint64_t, int flags); #endif extern char *_tmppt; void pmap_postinit(void); extern vm_paddr_t dump_avail[]; #endif /* _KERNEL */ #endif /* !LOCORE */ #endif /* !_MACHINE_PMAP_H_ */ #endif /* !ARM_NEW_PMAP */ Index: projects/clang380-import/sys/arm/include/pte.h =================================================================== --- projects/clang380-import/sys/arm/include/pte.h (revision 294776) +++ projects/clang380-import/sys/arm/include/pte.h (revision 294777) @@ -1,360 +1,361 @@ /* $NetBSD: pte.h,v 1.1 2001/11/23 17:39:04 thorpej Exp $ */ /*- * Copyright (c) 1994 Mark Brinicombe. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the RiscBSD team. * 4. The name "RiscBSD" nor the name of the author may be used to * endorse or promote products derived from this software without specific * prior written permission. * * THIS SOFTWARE IS PROVIDED BY RISCBSD ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL RISCBSD OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
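/*
 * Aside: because the PVF_* bits above share one namespace, a pv_entry's
 * attributes can be tested and cleared with a single mask operation.
 * A minimal standalone illustration (values repeated from above):
 */
#include <stdio.h>

#define PVF_MOD		0x01	/* page is modified */
#define PVF_REF		0x02	/* page is referenced */
#define PVF_WRITE	0x08	/* mapping is writable */

int
main(void)
{
	unsigned flags = PVF_MOD | PVF_REF | PVF_WRITE;

	/* Revoke write and forget mod/ref state in one step. */
	flags &= ~(PVF_WRITE | PVF_MOD | PVF_REF);

	printf("remaining flags: %#x\n", flags);	/* 0 */
	return (0);
}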
* * $FreeBSD$ */ #ifdef ARM_NEW_PMAP #include #else /* ARM_NEW_PMAP */ #ifndef _MACHINE_PTE_H_ #define _MACHINE_PTE_H_ #ifndef LOCORE typedef uint32_t pd_entry_t; /* page directory entry */ typedef uint32_t pt_entry_t; /* page table entry */ +typedef pt_entry_t pt2_entry_t; /* compatibility with v6 */ #endif #define PG_FRAME 0xfffff000 /* The PT_SIZE definition is misleading... A page table is only 0x400 * bytes long. But since VM mapping can only be done to 0x1000, a single * 1KB block cannot be steered to a va by itself. Therefore the * page tables are allocated in blocks of 4. i.e. if a 1 KB block * was allocated for a PT then the other 3KB would also get mapped * whenever the 1KB was mapped. */ #define PT_RSIZE 0x0400 /* Real page table size */ #define PT_SIZE 0x1000 #define PD_SIZE 0x4000 /* Page table types and masks */ #define L1_PAGE 0x01 /* L1 page table mapping */ #define L1_SECTION 0x02 /* L1 section mapping */ #define L1_FPAGE 0x03 /* L1 fine page mapping */ #define L1_MASK 0x03 /* Mask for L1 entry type */ #define L2_LPAGE 0x01 /* L2 large page (64KB) */ #define L2_SPAGE 0x02 /* L2 small page (4KB) */ #define L2_MASK 0x03 /* Mask for L2 entry type */ #define L2_INVAL 0x00 /* L2 invalid type */ /* L1 and L2 address masks */ #define L1_ADDR_MASK 0xfffffc00 #define L2_ADDR_MASK 0xfffff000 /* * The ARM MMU architecture was introduced with ARM v3 (previous ARM * architecture versions used an optional off-CPU memory controller * to perform address translation). * * The ARM MMU consists of a TLB and translation table walking logic. * There is typically one TLB per memory interface (or, put another * way, one TLB per software-visible cache). * * The ARM MMU is capable of mapping memory in the following chunks: * * 1M Sections (L1 table) * * 64K Large Pages (L2 table) * * 4K Small Pages (L2 table) * * 1K Tiny Pages (L2 table) * * There are two types of L2 tables: Coarse Tables and Fine Tables. * Coarse Tables can map Large and Small Pages. Fine Tables can * map Tiny Pages. * * Coarse Tables can define 4 Subpages within Large and Small pages. * Subpages define different permissions for each Subpage within * a Page. * * Coarse Tables are 1K in length. Fine tables are 4K in length. * * The Translation Table Base register holds the pointer to the * L1 Table. The L1 Table is a 16K contiguous chunk of memory * aligned to a 16K boundary. Each entry in the L1 Table maps * 1M of virtual address space, either via a Section mapping or * via an L2 Table. * * In addition, the Fast Context Switching Extension (FCSE) is available * on some ARM v4 and ARM v5 processors. FCSE is a way of eliminating * TLB/cache flushes on context switch by use of a smaller address space * and a "process ID" that modifies the virtual address before being * presented to the translation logic. */ /* ARMv6 super-sections.
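/*
 * Aside: with the 16K L1 table (4096 entries of 1MB) and 256-entry coarse
 * L2 tables described above, a translation is two indexed lookups. A
 * standalone sketch of the index arithmetic, using the address-bit masks
 * from this header (illustrative only):
 */
#include <stdio.h>
#include <stdint.h>

#define L1_ADDR_BITS	0xfff00000u	/* top 12 bits select the L1 entry */
#define L2_ADDR_BITS	0x000ff000u	/* next 8 bits select the L2 entry */

int
main(void)
{
	uint32_t va = 0xc0123456;

	printf("L1 index %u, L2 index %u, page offset %#x\n",
	    (va & L1_ADDR_BITS) >> 20,	/* 3073 */
	    (va & L2_ADDR_BITS) >> 12,	/* 35 */
	    va & 0xfffu);		/* 0x456 */
	return (0);
}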
*/ #define L1_SUP_SIZE 0x01000000 /* 16M */ #define L1_SUP_OFFSET (L1_SUP_SIZE - 1) #define L1_SUP_FRAME (~L1_SUP_OFFSET) #define L1_SUP_SHIFT 24 #define L1_S_SIZE 0x00100000 /* 1M */ #define L1_S_OFFSET (L1_S_SIZE - 1) #define L1_S_FRAME (~L1_S_OFFSET) #define L1_S_SHIFT 20 #define L2_L_SIZE 0x00010000 /* 64K */ #define L2_L_OFFSET (L2_L_SIZE - 1) #define L2_L_FRAME (~L2_L_OFFSET) #define L2_L_SHIFT 16 #define L2_S_SIZE 0x00001000 /* 4K */ #define L2_S_OFFSET (L2_S_SIZE - 1) #define L2_S_FRAME (~L2_S_OFFSET) #define L2_S_SHIFT 12 #define L2_T_SIZE 0x00000400 /* 1K */ #define L2_T_OFFSET (L2_T_SIZE - 1) #define L2_T_FRAME (~L2_T_OFFSET) #define L2_T_SHIFT 10 /* * The NetBSD VM implementation only works on whole pages (4K), * whereas the ARM MMU's Coarse tables are sized in terms of 1K * (16K L1 table, 1K L2 table). * * So, we allocate L2 tables 4 at a time, thus yielding a 4K L2 * table. */ #define L1_ADDR_BITS 0xfff00000 /* L1 PTE address bits */ #define L2_ADDR_BITS 0x000ff000 /* L2 PTE address bits */ #define L1_TABLE_SIZE 0x4000 /* 16K */ #define L2_TABLE_SIZE 0x1000 /* 4K */ /* * The new pmap deals with the 1KB coarse L2 tables by * allocating them from a pool. Until every port has been converted, * keep the old L2_TABLE_SIZE define lying around. Converted ports * should use L2_TABLE_SIZE_REAL until then. */ #define L2_TABLE_SIZE_REAL 0x400 /* 1K */ /* Total number of page table entries in L2 table */ #define L2_PTE_NUM_TOTAL (L2_TABLE_SIZE_REAL / sizeof(pt_entry_t)) /* * ARM L1 Descriptors */ #define L1_TYPE_INV 0x00 /* Invalid (fault) */ #define L1_TYPE_C 0x01 /* Coarse L2 */ #define L1_TYPE_S 0x02 /* Section */ #define L1_TYPE_F 0x03 /* Fine L2 */ #define L1_TYPE_MASK 0x03 /* mask of type bits */ /* L1 Section Descriptor */ #define L1_S_B 0x00000004 /* bufferable Section */ #define L1_S_C 0x00000008 /* cacheable Section */ #define L1_S_IMP 0x00000010 /* implementation defined */ #define L1_S_XN (1 << 4) /* execute not */ #define L1_S_DOM(x) ((x) << 5) /* domain */ #define L1_S_DOM_MASK L1_S_DOM(0xf) #define L1_S_AP(x) ((x) << 10) /* access permissions */ #define L1_S_ADDR_MASK 0xfff00000 /* phys address of section */ #define L1_S_TEX(x) (((x) & 0x7) << 12) /* Type Extension */ #define L1_S_TEX_MASK (0x7 << 12) /* Type Extension */ #define L1_S_APX (1 << 15) #define L1_SHARED (1 << 16) #define L1_S_XSCALE_P 0x00000200 /* ECC enable for this section */ #define L1_S_XSCALE_TEX(x) ((x) << 12) /* Type Extension */ #define L1_S_SUPERSEC ((1) << 18) /* Section is a super-section. 
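/*
 * Aside: a standalone sketch assembling a 1MB kernel read/write section
 * descriptor from the L1 section fields above (pre-ARMv6 AP encoding; the
 * physical address and attribute choices are hypothetical):
 */
#include <stdio.h>
#include <stdint.h>

#define L1_TYPE_S	0x02		/* Section */
#define L1_S_B		0x00000004u	/* bufferable */
#define L1_S_C		0x00000008u	/* cacheable */
#define L1_S_DOM(x)	((uint32_t)(x) << 5)
#define L1_S_AP(x)	((uint32_t)(x) << 10)
#define L1_S_ADDR_MASK	0xfff00000u
#define AP_KRW		0x01		/* kernel read/write */

int
main(void)
{
	uint32_t pa = 0x80100000;
	uint32_t pde;

	/* type + write-back cacheable + domain 0 + kernel r/w + frame */
	pde = L1_TYPE_S | L1_S_C | L1_S_B | L1_S_DOM(0) |
	    L1_S_AP(AP_KRW) | (pa & L1_S_ADDR_MASK);

	printf("section descriptor: %#010x\n", pde);	/* 0x8010040e */
	return (0);
}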
*/ /* L1 Coarse Descriptor */ #define L1_C_IMP0 0x00000004 /* implementation defined */ #define L1_C_IMP1 0x00000008 /* implementation defined */ #define L1_C_IMP2 0x00000010 /* implementation defined */ #define L1_C_DOM(x) ((x) << 5) /* domain */ #define L1_C_DOM_MASK L1_C_DOM(0xf) #define L1_C_ADDR_MASK 0xfffffc00 /* phys address of L2 Table */ #define L1_C_XSCALE_P 0x00000200 /* ECC enable for this section */ /* L1 Fine Descriptor */ #define L1_F_IMP0 0x00000004 /* implementation defined */ #define L1_F_IMP1 0x00000008 /* implementation defined */ #define L1_F_IMP2 0x00000010 /* implementation defined */ #define L1_F_DOM(x) ((x) << 5) /* domain */ #define L1_F_DOM_MASK L1_F_DOM(0xf) #define L1_F_ADDR_MASK 0xfffff000 /* phys address of L2 Table */ #define L1_F_XSCALE_P 0x00000200 /* ECC enable for this section */ /* * ARM L2 Descriptors */ #define L2_TYPE_INV 0x00 /* Invalid (fault) */ #define L2_TYPE_L 0x01 /* Large Page */ #define L2_TYPE_S 0x02 /* Small Page */ #define L2_TYPE_T 0x03 /* Tiny Page */ #define L2_TYPE_MASK 0x03 /* mask of type bits */ /* * This L2 Descriptor type is available on XScale processors * when using a Coarse L1 Descriptor. The Extended Small * Descriptor has the same format as the XScale Tiny Descriptor, * but describes a 4K page, rather than a 1K page. */ #define L2_TYPE_XSCALE_XS 0x03 /* XScale Extended Small Page */ #define L2_B 0x00000004 /* Bufferable page */ #define L2_C 0x00000008 /* Cacheable page */ #define L2_AP0(x) ((x) << 4) /* access permissions (sp 0) */ #define L2_AP1(x) ((x) << 6) /* access permissions (sp 1) */ #define L2_AP2(x) ((x) << 8) /* access permissions (sp 2) */ #define L2_AP3(x) ((x) << 10) /* access permissions (sp 3) */ #define L2_SHARED (1 << 10) #define L2_APX (1 << 9) #define L2_XN (1 << 0) #define L2_L_TEX_MASK (0x7 << 12) /* Type Extension */ #define L2_L_TEX(x) (((x) & 0x7) << 12) #define L2_S_TEX_MASK (0x7 << 6) /* Type Extension */ #define L2_S_TEX(x) (((x) & 0x7) << 6) #define L2_XSCALE_L_TEX(x) ((x) << 12) /* Type Extension */ #define L2_XSCALE_L_S(x) (1 << 15) /* Shared */ #define L2_XSCALE_T_TEX(x) ((x) << 6) /* Type Extension */ /* * Access Permissions for L1 and L2 Descriptors. */ #define AP_W 0x01 /* writable */ #define AP_REF 0x01 /* referenced flag */ #define AP_U 0x02 /* user */ /* * Short-hand for common AP_* constants. * * Note: These values assume the S (System) bit is set and * the R (ROM) bit is clear in CP15 register 1. */ #define AP_KR 0x00 /* kernel read */ #define AP_KRW 0x01 /* kernel read/write */ #define AP_KRWUR 0x02 /* kernel read/write usr read */ #define AP_KRWURW 0x03 /* kernel read/write usr read/write */ /* * Domain Types for the Domain Access Control Register. */ #define DOMAIN_FAULT 0x00 /* no access */ #define DOMAIN_CLIENT 0x01 /* client */ #define DOMAIN_RESERVED 0x02 /* reserved */ #define DOMAIN_MANAGER 0x03 /* manager */ /* * Type Extension bits for XScale processors. 
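/*
 * Aside: each L2_APx field above carries a 2-bit permission for one 1KB
 * subpage; pre-ARMv6 code normally replicates one value into all four
 * fields, as the L2_AP() helper defined earlier does. A standalone sketch
 * assembling a kernel r/w, user r/w small-page PTE (hypothetical address):
 */
#include <stdio.h>
#include <stdint.h>

#define L2_TYPE_S	0x02		/* Small Page */
#define L2_B		0x00000004u
#define L2_C		0x00000008u
#define L2_AP0(x)	((uint32_t)(x) << 4)
#define L2_AP1(x)	((uint32_t)(x) << 6)
#define L2_AP2(x)	((uint32_t)(x) << 8)
#define L2_AP3(x)	((uint32_t)(x) << 10)
#define AP_KRWURW	0x03		/* kernel r/w, user r/w */

int
main(void)
{
	uint32_t pa = 0x80234000;
	uint32_t pte;

	pte = (pa & 0xfffff000u) |
	    L2_AP0(AP_KRWURW) | L2_AP1(AP_KRWURW) |
	    L2_AP2(AP_KRWURW) | L2_AP3(AP_KRWURW) |
	    L2_C | L2_B | L2_TYPE_S;

	printf("small-page PTE: %#010x\n", pte);	/* 0x80234ffe */
	return (0);
}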
* * Behavior of C and B when X == 0: * * C B Cacheable Bufferable Write Policy Line Allocate Policy * 0 0 N N - - * 0 1 N Y - - * 1 0 Y Y Write-through Read Allocate * 1 1 Y Y Write-back Read Allocate * * Behavior of C and B when X == 1: * C B Cacheable Bufferable Write Policy Line Allocate Policy * 0 0 - - - - DO NOT USE * 0 1 N Y - - * 1 0 Mini-Data - - - * 1 1 Y Y Write-back R/W Allocate */ #define TEX_XSCALE_X 0x01 /* X modifies C and B */ #define TEX_XSCALE_E 0x02 #define TEX_XSCALE_T 0x04 /* Xscale core 3 */ /* * * Cache attributes with L2 present, S = 0 * T E X C B L1 i-cache L1 d-cache L1 DC WP L2 cacheable write coalesce * 0 0 0 0 0 N N - N N * 0 0 0 0 1 N N - N Y * 0 0 0 1 0 Y Y WT N Y * 0 0 0 1 1 Y Y WB Y Y * 0 0 1 0 0 N N - Y Y * 0 0 1 0 1 N N - N N * 0 0 1 1 0 Y Y - - N * 0 0 1 1 1 Y Y WT Y Y * 0 1 0 0 0 N N - N N * 0 1 0 0 1 N/A N/A N/A N/A N/A * 0 1 0 1 0 N/A N/A N/A N/A N/A * 0 1 0 1 1 N/A N/A N/A N/A N/A * 0 1 1 X X N/A N/A N/A N/A N/A * 1 X 0 0 0 N N - N Y * 1 X 0 0 1 Y N WB N Y * 1 X 0 1 0 Y N WT N Y * 1 X 0 1 1 Y N WB Y Y * 1 X 1 0 0 N N - Y Y * 1 X 1 0 1 Y Y WB Y Y * 1 X 1 1 0 Y Y WT Y Y * 1 X 1 1 1 Y Y WB Y Y * * * * * Cache attributes with L2 present, S = 1 * T E X C B L1 i-cache L1 d-cache L1 DC WP L2 cacheable write coalesce * 0 0 0 0 0 N N - N N * 0 0 0 0 1 N N - N Y * 0 0 0 1 0 Y Y - N Y * 0 0 0 1 1 Y Y WT Y Y * 0 0 1 0 0 N N - Y Y * 0 0 1 0 1 N N - N N * 0 0 1 1 0 Y Y - - N * 0 0 1 1 1 Y Y WT Y Y * 0 1 0 0 0 N N - N N * 0 1 0 0 1 N/A N/A N/A N/A N/A * 0 1 0 1 0 N/A N/A N/A N/A N/A * 0 1 0 1 1 N/A N/A N/A N/A N/A * 0 1 1 X X N/A N/A N/A N/A N/A * 1 X 0 0 0 N N - N Y * 1 X 0 0 1 Y N - N Y * 1 X 0 1 0 Y N - N Y * 1 X 0 1 1 Y N - Y Y * 1 X 1 0 0 N N - Y Y * 1 X 1 0 1 Y Y WT Y Y * 1 X 1 1 0 Y Y WT Y Y * 1 X 1 1 1 Y Y WT Y Y */ #endif /* !_MACHINE_PTE_H_ */ #endif /* !ARM_NEW_PMAP */ /* End of pte.h */ Index: projects/clang380-import/sys/arm/include/sysreg.h =================================================================== --- projects/clang380-import/sys/arm/include/sysreg.h (revision 294776) +++ projects/clang380-import/sys/arm/include/sysreg.h (revision 294777) @@ -1,289 +1,307 @@ /*- * Copyright 2014 Svatopluk Kraus * Copyright 2014 Michal Meloun * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
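/*
 * Aside: sysreg.h below encodes each system register as the literal operand
 * list of an mrc/mcr instruction, so consumers paste it into inline asm via
 * double stringification. A hedged sketch of that pattern (32-bit ARM only;
 * the read_midr() wrapper is hypothetical, not the kernel's own accessor):
 */
#include <stdint.h>

#define CP15_MIDR(rr)	p15, 0, rr, c0, c0, 0	/* Main ID Register */

/* Two levels so the operand list expands before it is stringified. */
#define _STR(s...)	#s
#define _XSTR(s...)	_STR(s)

static inline uint32_t
read_midr(void)
{
	uint32_t v;

	__asm __volatile("mrc " _XSTR(CP15_MIDR(%0)) : "=r" (v));
	return (v);
}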
* * $FreeBSD$ */ /* * Macros to make working with the System Control Registers simpler. * * Note that when register r0 is hard-coded in these definitions it means the * cp15 operation neither reads nor writes the register, and r0 is used only * because some syntactically-valid register name has to appear at that point to * keep the asm parser happy. */ #ifndef MACHINE_SYSREG_H #define MACHINE_SYSREG_H #include /* + * CP14 registers + */ +#if __ARM_ARCH >= 6 + +#define CP14_DBGDIDR(rr) p14, 0, rr, c0, c0, 0 /* Debug ID Register */ +#define CP14_DBGDSCRext_V6(rr) p14, 0, rr, c0, c1, 0 /* Debug Status and Ctrl Register v6 */ +#define CP14_DBGDSCRext_V7(rr) p14, 0, rr, c0, c2, 2 /* Debug Status and Ctrl Register v7 */ +#define CP14_DBGVCR(rr) p14, 0, rr, c0, c7, 0 /* Vector Catch Register */ +#define CP14_DBGOSLAR(rr) p14, 0, rr, c1, c0, 4 /* OS Lock Access Register */ +#define CP14_DBGOSLSR(rr) p14, 0, rr, c1, c1, 4 /* OS Lock Status Register */ +#define CP14_DBGOSDLR(rr) p14, 0, rr, c1, c3, 4 /* OS Double Lock Register */ +#define CP14_DBGPRSR(rr) p14, 0, rr, c1, c5, 4 /* Device Powerdown and Reset Status */ + +#define CP14_DBGDSCRint(rr) CP14_DBGDSCRext_V6(rr) /* Debug Status and Ctrl internal view */ + +#endif + +/* * CP15 C0 registers */ #define CP15_MIDR(rr) p15, 0, rr, c0, c0, 0 /* Main ID Register */ #define CP15_CTR(rr) p15, 0, rr, c0, c0, 1 /* Cache Type Register */ #define CP15_TCMTR(rr) p15, 0, rr, c0, c0, 2 /* TCM Type Register */ #define CP15_TLBTR(rr) p15, 0, rr, c0, c0, 3 /* TLB Type Register */ #define CP15_MPIDR(rr) p15, 0, rr, c0, c0, 5 /* Multiprocessor Affinity Register */ #define CP15_REVIDR(rr) p15, 0, rr, c0, c0, 6 /* Revision ID Register */ #define CP15_ID_PFR0(rr) p15, 0, rr, c0, c1, 0 /* Processor Feature Register 0 */ #define CP15_ID_PFR1(rr) p15, 0, rr, c0, c1, 1 /* Processor Feature Register 1 */ #define CP15_ID_DFR0(rr) p15, 0, rr, c0, c1, 2 /* Debug Feature Register 0 */ #define CP15_ID_AFR0(rr) p15, 0, rr, c0, c1, 3 /* Auxiliary Feature Register 0 */ #define CP15_ID_MMFR0(rr) p15, 0, rr, c0, c1, 4 /* Memory Model Feature Register 0 */ #define CP15_ID_MMFR1(rr) p15, 0, rr, c0, c1, 5 /* Memory Model Feature Register 1 */ #define CP15_ID_MMFR2(rr) p15, 0, rr, c0, c1, 6 /* Memory Model Feature Register 2 */ #define CP15_ID_MMFR3(rr) p15, 0, rr, c0, c1, 7 /* Memory Model Feature Register 3 */ #define CP15_ID_ISAR0(rr) p15, 0, rr, c0, c2, 0 /* Instruction Set Attribute Register 0 */ #define CP15_ID_ISAR1(rr) p15, 0, rr, c0, c2, 1 /* Instruction Set Attribute Register 1 */ #define CP15_ID_ISAR2(rr) p15, 0, rr, c0, c2, 2 /* Instruction Set Attribute Register 2 */ #define CP15_ID_ISAR3(rr) p15, 0, rr, c0, c2, 3 /* Instruction Set Attribute Register 3 */ #define CP15_ID_ISAR4(rr) p15, 0, rr, c0, c2, 4 /* Instruction Set Attribute Register 4 */ #define CP15_ID_ISAR5(rr) p15, 0, rr, c0, c2, 5 /* Instruction Set Attribute Register 5 */ #define CP15_CCSIDR(rr) p15, 1, rr, c0, c0, 0 /* Cache Size ID Registers */ #define CP15_CLIDR(rr) p15, 1, rr, c0, c0, 1 /* Cache Level ID Register */ #define CP15_AIDR(rr) p15, 1, rr, c0, c0, 7 /* Auxiliary ID Register */ #define CP15_CSSELR(rr) p15, 2, rr, c0, c0, 0 /* Cache Size Selection Register */ /* * CP15 C1 registers */ #define CP15_SCTLR(rr) p15, 0, rr, c1, c0, 0 /* System Control Register */ #define CP15_ACTLR(rr) p15, 0, rr, c1, c0, 1 /* IMPLEMENTATION DEFINED Auxiliary Control Register */ #define CP15_CPACR(rr) p15, 0, rr, c1, c0, 2 /* Coprocessor Access Control Register */ #define CP15_SCR(rr) p15, 0, rr, c1, c1, 0 /* Secure
Configuration Register */ #define CP15_SDER(rr) p15, 0, rr, c1, c1, 1 /* Secure Debug Enable Register */ #define CP15_NSACR(rr) p15, 0, rr, c1, c1, 2 /* Non-Secure Access Control Register */ /* * CP15 C2 registers */ #define CP15_TTBR0(rr) p15, 0, rr, c2, c0, 0 /* Translation Table Base Register 0 */ #define CP15_TTBR1(rr) p15, 0, rr, c2, c0, 1 /* Translation Table Base Register 1 */ #define CP15_TTBCR(rr) p15, 0, rr, c2, c0, 2 /* Translation Table Base Control Register */ /* * CP15 C3 registers */ #define CP15_DACR(rr) p15, 0, rr, c3, c0, 0 /* Domain Access Control Register */ /* * CP15 C5 registers */ #define CP15_DFSR(rr) p15, 0, rr, c5, c0, 0 /* Data Fault Status Register */ #if __ARM_ARCH >= 6 /* From ARMv6: */ #define CP15_IFSR(rr) p15, 0, rr, c5, c0, 1 /* Instruction Fault Status Register */ #endif #if __ARM_ARCH >= 7 /* From ARMv7: */ #define CP15_ADFSR(rr) p15, 0, rr, c5, c1, 0 /* Auxiliary Data Fault Status Register */ #define CP15_AIFSR(rr) p15, 0, rr, c5, c1, 1 /* Auxiliary Instruction Fault Status Register */ #endif /* * CP15 C6 registers */ #define CP15_DFAR(rr) p15, 0, rr, c6, c0, 0 /* Data Fault Address Register */ #if __ARM_ARCH >= 6 /* From ARMv6k: */ #define CP15_IFAR(rr) p15, 0, rr, c6, c0, 2 /* Instruction Fault Address Register */ #endif /* * CP15 C7 registers */ #if __ARM_ARCH >= 7 && defined(SMP) /* From ARMv7: */ #define CP15_ICIALLUIS p15, 0, r0, c7, c1, 0 /* Instruction cache invalidate all PoU, IS */ #define CP15_BPIALLIS p15, 0, r0, c7, c1, 6 /* Branch predictor invalidate all IS */ #endif #define CP15_PAR(rr) p15, 0, rr, c7, c4, 0 /* Physical Address Register */ #define CP15_ICIALLU p15, 0, r0, c7, c5, 0 /* Instruction cache invalidate all PoU */ #define CP15_ICIMVAU(rr) p15, 0, rr, c7, c5, 1 /* Instruction cache invalidate */ #if __ARM_ARCH == 6 /* Deprecated in ARMv7 */ #define CP15_CP15ISB p15, 0, r0, c7, c5, 4 /* ISB */ #endif #define CP15_BPIALL p15, 0, r0, c7, c5, 6 /* Branch predictor invalidate all */ #define CP15_BPIMVA p15, 0, rr, c7, c5, 7 /* Branch predictor invalidate by MVA */ #if __ARM_ARCH == 6 /* Only ARMv6: */ #define CP15_DCIALL p15, 0, r0, c7, c6, 0 /* Data cache invalidate all */ #endif #define CP15_DCIMVAC(rr) p15, 0, rr, c7, c6, 1 /* Data cache invalidate by MVA PoC */ #define CP15_DCISW(rr) p15, 0, rr, c7, c6, 2 /* Data cache invalidate by set/way */ #define CP15_ATS1CPR(rr) p15, 0, rr, c7, c8, 0 /* Stage 1 Current state PL1 read */ #define CP15_ATS1CPW(rr) p15, 0, rr, c7, c8, 1 /* Stage 1 Current state PL1 write */ #define CP15_ATS1CUR(rr) p15, 0, rr, c7, c8, 2 /* Stage 1 Current state unprivileged read */ #define CP15_ATS1CUW(rr) p15, 0, rr, c7, c8, 3 /* Stage 1 Current state unprivileged write */ #if __ARM_ARCH >= 7 /* From ARMv7: */ #define CP15_ATS12NSOPR(rr) p15, 0, rr, c7, c8, 4 /* Stages 1 and 2 Non-secure only PL1 read */ #define CP15_ATS12NSOPW(rr) p15, 0, rr, c7, c8, 5 /* Stages 1 and 2 Non-secure only PL1 write */ #define CP15_ATS12NSOUR(rr) p15, 0, rr, c7, c8, 6 /* Stages 1 and 2 Non-secure only unprivileged read */ #define CP15_ATS12NSOUW(rr) p15, 0, rr, c7, c8, 7 /* Stages 1 and 2 Non-secure only unprivileged write */ #endif #if __ARM_ARCH == 6 /* Only ARMv6: */ #define CP15_DCCALL p15, 0, r0, c7, c10, 0 /* Data cache clean all */ #endif #define CP15_DCCMVAC(rr) p15, 0, rr, c7, c10, 1 /* Data cache clean by MVA PoC */ #define CP15_DCCSW(rr) p15, 0, rr, c7, c10, 2 /* Data cache clean by set/way */ #if __ARM_ARCH == 6 /* Only ARMv6: */ #define CP15_CP15DSB p15, 0, r0, c7, c10, 4 /* DSB */ #define CP15_CP15DMB p15, 0, r0, 
c7, c10, 5 /* DMB */ #define CP15_CP15WFI p15, 0, r0, c7, c0, 4 /* WFI */ #endif #if __ARM_ARCH >= 7 /* From ARMv7: */ #define CP15_DCCMVAU(rr) p15, 0, rr, c7, c11, 1 /* Data cache clean by MVA PoU */ #endif #if __ARM_ARCH == 6 /* Only ARMv6: */ #define CP15_DCCIALL p15, 0, r0, c7, c14, 0 /* Data cache clean and invalidate all */ #endif #define CP15_DCCIMVAC(rr) p15, 0, rr, c7, c14, 1 /* Data cache clean and invalidate by MVA PoC */ #define CP15_DCCISW(rr) p15, 0, rr, c7, c14, 2 /* Data cache clean and invalidate by set/way */ /* * CP15 C8 registers */ #if __ARM_ARCH >= 7 && defined(SMP) /* From ARMv7: */ #define CP15_TLBIALLIS p15, 0, r0, c8, c3, 0 /* Invalidate entire unified TLB IS */ #define CP15_TLBIMVAIS(rr) p15, 0, rr, c8, c3, 1 /* Invalidate unified TLB by MVA IS */ #define CP15_TLBIASIDIS(rr) p15, 0, rr, c8, c3, 2 /* Invalidate unified TLB by ASID IS */ #define CP15_TLBIMVAAIS(rr) p15, 0, rr, c8, c3, 3 /* Invalidate unified TLB by MVA, all ASID IS */ #endif #define CP15_TLBIALL p15, 0, r0, c8, c7, 0 /* Invalidate entire unified TLB */ #define CP15_TLBIMVA(rr) p15, 0, rr, c8, c7, 1 /* Invalidate unified TLB by MVA */ #define CP15_TLBIASID(rr) p15, 0, rr, c8, c7, 2 /* Invalidate unified TLB by ASID */ #if __ARM_ARCH >= 6 /* From ARMv6: */ #define CP15_TLBIMVAA(rr) p15, 0, rr, c8, c7, 3 /* Invalidate unified TLB by MVA, all ASID */ #endif /* * CP15 C9 registers */ #if __ARM_ARCH == 6 && defined(CPU_ARM1176) #define CP15_PMUSERENR(rr) p15, 0, rr, c15, c9, 0 /* Access Validation Control Register */ #define CP15_PMCR(rr) p15, 0, rr, c15, c12, 0 /* Performance Monitor Control Register */ #define CP15_PMCCNTR(rr) p15, 0, rr, c15, c12, 1 /* PM Cycle Count Register */ #elif __ARM_ARCH > 6 #define CP15_L2CTLR(rr) p15, 1, rr, c9, c0, 2 /* L2 Control Register */ #define CP15_PMCR(rr) p15, 0, rr, c9, c12, 0 /* Performance Monitor Control Register */ #define CP15_PMCNTENSET(rr) p15, 0, rr, c9, c12, 1 /* PM Count Enable Set Register */ #define CP15_PMCNTENCLR(rr) p15, 0, rr, c9, c12, 2 /* PM Count Enable Clear Register */ #define CP15_PMOVSR(rr) p15, 0, rr, c9, c12, 3 /* PM Overflow Flag Status Register */ #define CP15_PMSWINC(rr) p15, 0, rr, c9, c12, 4 /* PM Software Increment Register */ #define CP15_PMSELR(rr) p15, 0, rr, c9, c12, 5 /* PM Event Counter Selection Register */ #define CP15_PMCCNTR(rr) p15, 0, rr, c9, c13, 0 /* PM Cycle Count Register */ #define CP15_PMXEVTYPER(rr) p15, 0, rr, c9, c13, 1 /* PM Event Type Select Register */ #define CP15_PMXEVCNTRR(rr) p15, 0, rr, c9, c13, 2 /* PM Event Count Register */ #define CP15_PMUSERENR(rr) p15, 0, rr, c9, c14, 0 /* PM User Enable Register */ #define CP15_PMINTENSET(rr) p15, 0, rr, c9, c14, 1 /* PM Interrupt Enable Set Register */ #define CP15_PMINTENCLR(rr) p15, 0, rr, c9, c14, 2 /* PM Interrupt Enable Clear Register */ #endif /* * CP15 C10 registers */ /* Without LPAE this is PRRR, with LPAE it's MAIR0 */ #define CP15_PRRR(rr) p15, 0, rr, c10, c2, 0 /* Primary Region Remap Register */ #define CP15_MAIR0(rr) p15, 0, rr, c10, c2, 0 /* Memory Attribute Indirection Register 0 */ /* Without LPAE this is NMRR, with LPAE it's MAIR1 */ #define CP15_NMRR(rr) p15, 0, rr, c10, c2, 1 /* Normal Memory Remap Register */ #define CP15_MAIR1(rr) p15, 0, rr, c10, c2, 1 /* Memory Attribute Indirection Register 1 */ #define CP15_AMAIR0(rr) p15, 0, rr, c10, c3, 0 /* Auxiliary Memory Attribute Indirection Register 0 */ #define CP15_AMAIR1(rr) p15, 0, rr, c10, c3, 1 /* Auxiliary Memory Attribute Indirection Register 1 */ /* * CP15 C12 registers */ #define 
CP15_VBAR(rr) p15, 0, rr, c12, c0, 0 /* Vector Base Address Register */ #define CP15_MVBAR(rr) p15, 0, rr, c12, c0, 1 /* Monitor Vector Base Address Register */ #define CP15_ISR(rr) p15, 0, rr, c12, c1, 0 /* Interrupt Status Register */ /* * CP15 C13 registers */ #define CP15_FCSEIDR(rr) p15, 0, rr, c13, c0, 0 /* FCSE Process ID Register */ #define CP15_CONTEXTIDR(rr) p15, 0, rr, c13, c0, 1 /* Context ID Register */ #define CP15_TPIDRURW(rr) p15, 0, rr, c13, c0, 2 /* User Read/Write Thread ID Register */ #define CP15_TPIDRURO(rr) p15, 0, rr, c13, c0, 3 /* User Read-Only Thread ID Register */ #define CP15_TPIDRPRW(rr) p15, 0, rr, c13, c0, 4 /* PL1 only Thread ID Register */ /* * CP15 C14 registers * These are the Generic Timer registers and may be unallocated on some SoCs. * Only use these when you know the Generic Timer is available. */ #define CP15_CNTFRQ(rr) p15, 0, rr, c14, c0, 0 /* Counter Frequency Register */ #define CP15_CNTKCTL(rr) p15, 0, rr, c14, c1, 0 /* Timer PL1 Control Register */ #define CP15_CNTP_TVAL(rr) p15, 0, rr, c14, c2, 0 /* PL1 Physical Timer Value Register */ #define CP15_CNTP_CTL(rr) p15, 0, rr, c14, c2, 1 /* PL1 Physical Timer Control Register */ #define CP15_CNTV_TVAL(rr) p15, 0, rr, c14, c3, 0 /* Virtual Timer Value Register */ #define CP15_CNTV_CTL(rr) p15, 0, rr, c14, c3, 1 /* Virtual Timer Control Register */ #define CP15_CNTHCTL(rr) p15, 4, rr, c14, c1, 0 /* Timer PL2 Control Register */ #define CP15_CNTHP_TVAL(rr) p15, 4, rr, c14, c2, 0 /* PL2 Physical Timer Value Register */ #define CP15_CNTHP_CTL(rr) p15, 4, rr, c14, c2, 1 /* PL2 Physical Timer Control Register */ /* 64-bit registers for use with mcrr/mrrc */ #define CP15_CNTPCT(rq, rr) p15, 0, rq, rr, c14 /* Physical Count Register */ #define CP15_CNTVCT(rq, rr) p15, 1, rq, rr, c14 /* Virtual Count Register */ #define CP15_CNTP_CVAL(rq, rr) p15, 2, rq, rr, c14 /* PL1 Physical Timer Compare Value Register */ #define CP15_CNTV_CVAL(rq, rr) p15, 3, rq, rr, c14 /* Virtual Timer Compare Value Register */ #define CP15_CNTVOFF(rq, rr) p15, 4, rq, rr, c14 /* Virtual Offset Register */ #define CP15_CNTHP_CVAL(rq, rr) p15, 6, rq, rr, c14 /* PL2 Physical Timer Compare Value Register */ /* * CP15 C15 registers */ #define CP15_CBAR(rr) p15, 4, rr, c15, c0, 0 /* Configuration Base Address Register */ #endif /* !MACHINE_SYSREG_H */ Index: projects/clang380-import/sys/arm64/arm64/gic.c =================================================================== --- projects/clang380-import/sys/arm64/arm64/gic.c (revision 294776) +++ projects/clang380-import/sys/arm64/arm64/gic.c (revision 294777) @@ -1,497 +1,476 @@ /*- * Copyright (c) 2011 The FreeBSD Foundation * Copyright (c) 2014 Andrew Turner * All rights reserved. * * Developed by Damjan Marion * * Based on OMAP4 GIC code by Ben Gray * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the company nor the name of the author may be used to * endorse or promote products derived from this software without specific * prior written permission. 
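/*
 * Aside: the 64-bit counter registers above take a register pair and are
 * read with mrrc instead of mrc. A sketch reading the counter frequency and
 * the virtual count (32-bit ARM only; illustrative wrappers, not the
 * kernel's own timer code):
 */
#include <stdint.h>

#define CP15_CNTFRQ(rr)		p15, 0, rr, c14, c0, 0
#define CP15_CNTVCT(rq, rr)	p15, 1, rq, rr, c14

#define _STR(s...)	#s
#define _XSTR(s...)	_STR(s)

static inline uint32_t
read_cntfrq(void)
{
	uint32_t freq;

	__asm __volatile("mrc " _XSTR(CP15_CNTFRQ(%0)) : "=r" (freq));
	return (freq);
}

static inline uint64_t
read_cntvct(void)
{
	uint32_t lo, hi;

	/* mrrc moves CNTVCT[31:0] into the first register and
	 * CNTVCT[63:32] into the second. */
	__asm __volatile("mrrc " _XSTR(CP15_CNTVCT(%0, %1))
	    : "=r" (lo), "=r" (hi));
	return (((uint64_t)hi << 32) | lo);
}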
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pic_if.h" /* We are using GICv2 register naming */ /* Distributor Registers */ #define GICD_CTLR 0x000 /* v1 ICDDCR */ #define GICD_TYPER 0x004 /* v1 ICDICTR */ #define GICD_IIDR 0x008 /* v1 ICDIIDR */ #define GICD_IGROUPR(n) (0x0080 + ((n) * 4)) /* v1 ICDISER */ #define GICD_ISENABLER(n) (0x0100 + ((n) * 4)) /* v1 ICDISER */ #define GICD_ICENABLER(n) (0x0180 + ((n) * 4)) /* v1 ICDICER */ #define GICD_ISPENDR(n) (0x0200 + ((n) * 4)) /* v1 ICDISPR */ #define GICD_ICPENDR(n) (0x0280 + ((n) * 4)) /* v1 ICDICPR */ #define GICD_ICACTIVER(n) (0x0380 + ((n) * 4)) /* v1 ICDABR */ #define GICD_IPRIORITYR(n) (0x0400 + ((n) * 4)) /* v1 ICDIPR */ #define GICD_ITARGETSR(n) (0x0800 + ((n) * 4)) /* v1 ICDIPTR */ #define GICD_ICFGR(n) (0x0C00 + ((n) * 4)) /* v1 ICDICFR */ #define GICD_SGIR(n) (0x0F00 + ((n) * 4)) /* v1 ICDSGIR */ /* CPU Registers */ #define GICC_CTLR 0x0000 /* v1 ICCICR */ #define GICC_PMR 0x0004 /* v1 ICCPMR */ #define GICC_BPR 0x0008 /* v1 ICCBPR */ #define GICC_IAR 0x000C /* v1 ICCIAR */ #define GICC_EOIR 0x0010 /* v1 ICCEOIR */ #define GICC_RPR 0x0014 /* v1 ICCRPR */ #define GICC_HPPIR 0x0018 /* v1 ICCHPIR */ #define GICC_ABPR 0x001C /* v1 ICCABPR */ #define GICC_IIDR 0x00FC /* v1 ICCIIDR*/ #define GIC_FIRST_IPI 0 /* Irqs 0-15 are SGIs/IPIs. */ #define GIC_LAST_IPI 15 #define GIC_FIRST_PPI 16 /* Irqs 16-31 are private (per */ #define GIC_LAST_PPI 31 /* core) peripheral interrupts. */ #define GIC_FIRST_SPI 32 /* Irqs 32+ are shared peripherals. */ /* First bit is a polarity bit (0 - low, 1 - high) */ #define GICD_ICFGR_POL_LOW (0 << 0) #define GICD_ICFGR_POL_HIGH (1 << 0) #define GICD_ICFGR_POL_MASK 0x1 /* Second bit is a trigger bit (0 - level, 1 - edge) */ #define GICD_ICFGR_TRIG_LVL (0 << 1) #define GICD_ICFGR_TRIG_EDGE (1 << 1) #define GICD_ICFGR_TRIG_MASK 0x2 static struct resource_spec arm_gic_spec[] = { { SYS_RES_MEMORY, 0, RF_ACTIVE }, /* Distributor registers */ { SYS_RES_MEMORY, 1, RF_ACTIVE }, /* CPU Interrupt Intf. 
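/*
 * Aside: GICD_ICFGR holds one 2-bit trigger/polarity field per interrupt,
 * sixteen interrupts per 32-bit register, which is where the (irq >> 4)
 * register index and ((irq & 0xf) * 2) shift used later in this file come
 * from. A standalone check with a hypothetical SPI number:
 */
#include <stdio.h>
#include <stdint.h>

#define GICD_ICFGR_POL_HIGH	(1u << 0)
#define GICD_ICFGR_TRIG_EDGE	(1u << 1)

int
main(void)
{
	unsigned irq = 42;	/* hypothetical SPI */
	uint32_t reg = 0;	/* stands in for a GICD_ICFGR(irq >> 4) read */

	reg |= (GICD_ICFGR_TRIG_EDGE | GICD_ICFGR_POL_HIGH) <<
	    ((irq & 0xf) * 2);

	printf("GICD_ICFGR(%u) |= %#010x\n", irq >> 4, reg); /* 2, 0x00300000 */
	return (0);
}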
registers */ { -1, 0 } }; static struct arm_gic_softc *arm_gic_sc = NULL; #define gic_c_read_4(_sc, _reg) \ bus_space_read_4((_sc)->gic_c_bst, (_sc)->gic_c_bsh, (_reg)) #define gic_c_write_4(_sc, _reg, _val) \ bus_space_write_4((_sc)->gic_c_bst, (_sc)->gic_c_bsh, (_reg), (_val)) #define gic_d_read_4(_sc, _reg) \ bus_space_read_4((_sc)->gic_d_bst, (_sc)->gic_d_bsh, (_reg)) #define gic_d_write_4(_sc, _reg, _val) \ bus_space_write_4((_sc)->gic_d_bst, (_sc)->gic_d_bsh, (_reg), (_val)) static pic_dispatch_t gic_dispatch; static pic_eoi_t gic_eoi; static pic_mask_t gic_mask_irq; static pic_unmask_t gic_unmask_irq; #ifdef SMP static void gic_init_secondary(device_t dev) { struct arm_gic_softc *sc = device_get_softc(dev); int i; for (i = 0; i < sc->nirqs; i += 4) gic_d_write_4(sc, GICD_IPRIORITYR(i >> 2), 0); /* Set all the interrupts to be in Group 0 (secure) */ for (i = 0; i < sc->nirqs; i += 32) { gic_d_write_4(sc, GICD_IGROUPR(i >> 5), 0); } /* Enable CPU interface */ gic_c_write_4(sc, GICC_CTLR, 1); /* Set priority mask register. */ gic_c_write_4(sc, GICC_PMR, 0xff); /* Enable interrupt distribution */ gic_d_write_4(sc, GICD_CTLR, 0x01); /* * Activate the timer interrupts: virtual, secure, and non-secure. */ gic_d_write_4(sc, GICD_ISENABLER(27 >> 5), (1UL << (27 & 0x1F))); gic_d_write_4(sc, GICD_ISENABLER(29 >> 5), (1UL << (29 & 0x1F))); gic_d_write_4(sc, GICD_ISENABLER(30 >> 5), (1UL << (30 & 0x1F))); } #endif int arm_gic_attach(device_t dev) { struct arm_gic_softc *sc; int i; uint32_t icciidr; if (arm_gic_sc) return (ENXIO); sc = device_get_softc(dev); if (bus_alloc_resources(dev, arm_gic_spec, sc->gic_res)) { device_printf(dev, "could not allocate resources\n"); return (ENXIO); } sc->gic_dev = dev; arm_gic_sc = sc; /* Initialize mutex */ mtx_init(&sc->mutex, "GIC lock", "", MTX_SPIN); /* Distributor Interface */ sc->gic_d_bst = rman_get_bustag(sc->gic_res[0]); sc->gic_d_bsh = rman_get_bushandle(sc->gic_res[0]); /* CPU Interface */ sc->gic_c_bst = rman_get_bustag(sc->gic_res[1]); sc->gic_c_bsh = rman_get_bushandle(sc->gic_res[1]); /* Disable interrupt forwarding to the CPU interface */ gic_d_write_4(sc, GICD_CTLR, 0x00); /* Get the number of interrupts */ sc->nirqs = gic_d_read_4(sc, GICD_TYPER); sc->nirqs = 32 * ((sc->nirqs & 0x1f) + 1); arm_register_root_pic(dev, sc->nirqs); icciidr = gic_c_read_4(sc, GICC_IIDR); device_printf(dev,"pn 0x%x, arch 0x%x, rev 0x%x, implementer 0x%x irqs %u\n", icciidr>>20, (icciidr>>16) & 0xF, (icciidr>>12) & 0xf, (icciidr & 0xfff), sc->nirqs); /* Set all global interrupts to be level triggered, active low. */ for (i = 32; i < sc->nirqs; i += 16) { gic_d_write_4(sc, GICD_ICFGR(i >> 4), 0x00000000); } /* Disable all interrupts. */ for (i = 32; i < sc->nirqs; i += 32) { gic_d_write_4(sc, GICD_ICENABLER(i >> 5), 0xFFFFFFFF); } for (i = 0; i < sc->nirqs; i += 4) { gic_d_write_4(sc, GICD_IPRIORITYR(i >> 2), 0); gic_d_write_4(sc, GICD_ITARGETSR(i >> 2), 1 << 0 | 1 << 8 | 1 << 16 | 1 << 24); } /* Set all the interrupts to be in Group 0 (secure) */ for (i = 0; i < sc->nirqs; i += 32) { gic_d_write_4(sc, GICD_IGROUPR(i >> 5), 0); } /* Enable CPU interface */ gic_c_write_4(sc, GICC_CTLR, 1); /* Set priority mask register. 
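/*
 * Aside: the ITLinesNumber field of GICD_TYPER reports the interrupt count
 * in units of 32 lines, minus one, so the attach code above recovers it as
 * 32 * ((typer & 0x1f) + 1). A standalone check with a made-up TYPER value:
 */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint32_t typer = 0x7;	/* hypothetical GICD_TYPER read */
	uint32_t nirqs = 32 * ((typer & 0x1f) + 1);

	printf("supported interrupts: %u\n", nirqs);	/* 256 */
	return (0);
}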
*/ gic_c_write_4(sc, GICC_PMR, 0xff); /* Enable interrupt distribution */ gic_d_write_4(sc, GICD_CTLR, 0x01); return (0); } static void gic_dispatch(device_t dev, struct trapframe *frame) { struct arm_gic_softc *sc = device_get_softc(dev); uint32_t active_irq; int first = 1; while (1) { active_irq = gic_c_read_4(sc, GICC_IAR); /* * Immediately EOIR the SGIs, because doing so requires the other * bits (i.e. CPU number), not just the IRQ number, and we do not * have this information later. */ if ((active_irq & 0x3ff) <= GIC_LAST_IPI) gic_c_write_4(sc, GICC_EOIR, active_irq); active_irq &= 0x3FF; if (active_irq == 0x3FF) { if (first) printf("Spurious interrupt detected\n"); return; } arm_dispatch_intr(active_irq, frame); first = 0; } } static void gic_eoi(device_t dev, u_int irq) { struct arm_gic_softc *sc = device_get_softc(dev); gic_c_write_4(sc, GICC_EOIR, irq); } void gic_mask_irq(device_t dev, u_int irq) { struct arm_gic_softc *sc = device_get_softc(dev); gic_d_write_4(sc, GICD_ICENABLER(irq >> 5), (1UL << (irq & 0x1F))); gic_c_write_4(sc, GICC_EOIR, irq); } void gic_unmask_irq(device_t dev, u_int irq) { struct arm_gic_softc *sc = device_get_softc(dev); gic_d_write_4(sc, GICD_ISENABLER(irq >> 5), (1UL << (irq & 0x1F))); } #ifdef SMP static void gic_ipi_send(device_t dev, cpuset_t cpus, u_int ipi) { struct arm_gic_softc *sc = device_get_softc(dev); uint32_t val = 0, i; for (i = 0; i < MAXCPU; i++) if (CPU_ISSET(i, &cpus)) val |= 1 << (16 + i); gic_d_write_4(sc, GICD_SGIR(0), val | ipi); } static int arm_gic_ipi_read(device_t dev, int i) { if (i != -1) { /* * The intr code will automagically give the frame pointer * if the interrupt argument is 0. */ if ((unsigned int)i > 16) return (0); return (i); } return (0x3ff); } static void arm_gic_ipi_clear(device_t dev, int ipi) { /* no-op */ } #endif static device_method_t arm_gic_methods[] = { /* Device interface */ DEVMETHOD(device_attach, arm_gic_attach), /* pic_if */ DEVMETHOD(pic_dispatch, gic_dispatch), DEVMETHOD(pic_eoi, gic_eoi), DEVMETHOD(pic_mask, gic_mask_irq), DEVMETHOD(pic_unmask, gic_unmask_irq), #ifdef SMP DEVMETHOD(pic_init_secondary, gic_init_secondary), DEVMETHOD(pic_ipi_send, gic_ipi_send), #endif { 0, 0 } }; DEFINE_CLASS_0(gic, arm_gic_driver, arm_gic_methods, sizeof(struct arm_gic_softc)); #define GICV2M_MSI_TYPER 0x008 #define MSI_TYPER_SPI_BASE(x) (((x) >> 16) & 0x3ff) #define MSI_TYPER_SPI_COUNT(x) (((x) >> 0) & 0x3ff) #define GICv2M_MSI_SETSPI_NS 0x040 #define GICV2M_MSI_IIDR 0xFCC -struct gicv2m_softc { - struct resource *sc_mem; - struct mtx sc_mutex; - u_int sc_spi_start; - u_int sc_spi_count; - u_int sc_spi_offset; -}; - static int -gicv2m_probe(device_t dev) -{ - - device_set_desc(dev, "ARM Generic Interrupt Controller MSI/MSIX"); - return (BUS_PROBE_DEFAULT); -} - -static int gicv2m_attach(device_t dev) { struct gicv2m_softc *sc; uint32_t typer; int rid; sc = device_get_softc(dev); rid = 0; sc->sc_mem = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (sc->sc_mem == NULL) { device_printf(dev, "Unable to allocate resources\n"); return (ENXIO); } typer = bus_read_4(sc->sc_mem, GICV2M_MSI_TYPER); sc->sc_spi_start = MSI_TYPER_SPI_BASE(typer); sc->sc_spi_count = MSI_TYPER_SPI_COUNT(typer); device_printf(dev, "using spi %u to %u\n", sc->sc_spi_start, sc->sc_spi_start + sc->sc_spi_count - 1); mtx_init(&sc->sc_mutex, "GICv2m lock", "", MTX_DEF); arm_register_msi_pic(dev); return (0); } static int gicv2m_alloc_msix(device_t dev, device_t pci_dev, int *pirq) { struct arm_gic_softc *psc; struct gicv2m_softc *sc;
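/*
 * Aside: gic_ipi_send() above builds the GICD_SGIR value by setting one bit
 * per destination CPU in the CPUTargetList field (bits 16-23) and placing
 * the SGI number in bits 0-3. A standalone sketch of that encoding:
 */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	unsigned ipi = 2;	/* hypothetical SGI number */
	uint32_t val = 0;

	val |= 1u << (16 + 0);	/* target CPU 0 */
	val |= 1u << (16 + 2);	/* target CPU 2 */

	printf("GICD_SGIR value: %#010x\n", val | ipi);	/* 0x00050002 */
	return (0);
}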
uint32_t reg; int irq; psc = device_get_softc(device_get_parent(dev)); sc = device_get_softc(dev); mtx_lock(&sc->sc_mutex); /* Find an unused interrupt */ KASSERT(sc->sc_spi_offset < sc->sc_spi_count, ("No free SPIs")); irq = sc->sc_spi_start + sc->sc_spi_offset; sc->sc_spi_offset++; /* Interrupts need to be edge triggered, set this */ reg = gic_d_read_4(psc, GICD_ICFGR(irq >> 4)); reg |= (GICD_ICFGR_TRIG_EDGE | GICD_ICFGR_POL_HIGH) << ((irq & 0xf) * 2); gic_d_write_4(psc, GICD_ICFGR(irq >> 4), reg); *pirq = irq; mtx_unlock(&sc->sc_mutex); return (0); } static int gicv2m_alloc_msi(device_t dev, device_t pci_dev, int count, int *irqs) { struct arm_gic_softc *psc; struct gicv2m_softc *sc; uint32_t reg; int i, irq; psc = device_get_softc(device_get_parent(dev)); sc = device_get_softc(dev); mtx_lock(&sc->sc_mutex); KASSERT(sc->sc_spi_offset + count <= sc->sc_spi_count, ("No free SPIs for %d MSI interrupts", count)); /* Find an unused interrupt */ for (i = 0; i < count; i++) { irq = sc->sc_spi_start + sc->sc_spi_offset; sc->sc_spi_offset++; /* Interrupts need to be edge triggered, set this */ reg = gic_d_read_4(psc, GICD_ICFGR(irq >> 4)); reg |= (GICD_ICFGR_TRIG_EDGE | GICD_ICFGR_POL_HIGH) << ((irq & 0xf) * 2); gic_d_write_4(psc, GICD_ICFGR(irq >> 4), reg); irqs[i] = irq; } mtx_unlock(&sc->sc_mutex); return (0); } static int gicv2m_map_msi(device_t dev, device_t pci_dev, int irq, uint64_t *addr, uint32_t *data) { struct gicv2m_softc *sc = device_get_softc(dev); *addr = vtophys(rman_get_virtual(sc->sc_mem)) + 0x40; *data = irq; return (0); } static device_method_t arm_gicv2m_methods[] = { /* Device interface */ - DEVMETHOD(device_probe, gicv2m_probe), DEVMETHOD(device_attach, gicv2m_attach), /* MSI/MSI-X */ DEVMETHOD(pic_alloc_msix, gicv2m_alloc_msix), DEVMETHOD(pic_alloc_msi, gicv2m_alloc_msi), DEVMETHOD(pic_map_msi, gicv2m_map_msi), { 0, 0 } }; -static devclass_t arm_gicv2m_devclass; - DEFINE_CLASS_0(gicv2m, arm_gicv2m_driver, arm_gicv2m_methods, sizeof(struct gicv2m_softc)); -EARLY_DRIVER_MODULE(gicv2m, gic, arm_gicv2m_driver, arm_gicv2m_devclass, - 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE); Index: projects/clang380-import/sys/arm64/arm64/gic.h =================================================================== --- projects/clang380-import/sys/arm64/arm64/gic.h (revision 294776) +++ projects/clang380-import/sys/arm64/arm64/gic.h (revision 294777) @@ -1,56 +1,66 @@ /*- * Copyright (c) 2011 The FreeBSD Foundation * Copyright (c) 2014 Andrew Turner * All rights reserved. * * Developed by Damjan Marion * * Based on OMAP4 GIC code by Ben Gray * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the company nor the name of the author may be used to * endorse or promote products derived from this software without specific * prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
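/*
 * Aside: gicv2m_map_msi() above hands a device its doorbell: the physical
 * address of the v2m frame's SETSPI register (offset 0x40) and the
 * allocated SPI number as the payload. The device raises the interrupt by
 * writing that payload to that address. A sketch of the computation with a
 * hypothetical frame address:
 */
#include <stdio.h>
#include <stdint.h>

#define GICv2M_MSI_SETSPI_NS	0x040

int
main(void)
{
	uint64_t frame_pa = 0x08020000;	/* hypothetical v2m frame */
	unsigned spi = 96;		/* hypothetical allocated SPI */

	printf("MSI address %#llx, data %u\n",
	    (unsigned long long)(frame_pa + GICv2M_MSI_SETSPI_NS), spi);
	return (0);
}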
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _ARM64_GIC_H_ #define _ARM64_GIC_H_ DECLARE_CLASS(arm_gic_driver); struct arm_gic_softc { device_t gic_dev; struct resource * gic_res[3]; bus_space_tag_t gic_c_bst; bus_space_tag_t gic_d_bst; bus_space_handle_t gic_c_bsh; bus_space_handle_t gic_d_bsh; uint8_t ver; struct mtx mutex; uint32_t nirqs; }; +DECLARE_CLASS(arm_gicv2m_driver); + +struct gicv2m_softc { + struct resource *sc_mem; + struct mtx sc_mutex; + u_int sc_spi_start; + u_int sc_spi_count; + u_int sc_spi_offset; +}; + int arm_gic_attach(device_t); #endif Index: projects/clang380-import/sys/arm64/arm64/gic_fdt.c =================================================================== --- projects/clang380-import/sys/arm64/arm64/gic_fdt.c (revision 294776) +++ projects/clang380-import/sys/arm64/arm64/gic_fdt.c (revision 294777) @@ -1,292 +1,327 @@ /*- * Copyright (c) 2015 The FreeBSD Foundation * All rights reserved. * * This software was developed by Andrew Turner under * sponsorship from the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include static struct ofw_compat_data compat_data[] = { {"arm,gic", true}, /* Non-standard, used in FreeBSD dts. 
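/*
 * Aside: gic_fill_ranges() below folds each group of #address-cells /
 * #size-cells 32-bit cells from the flattened "ranges" property into one
 * 64-bit value by shift-and-or, most significant cell first. A standalone
 * sketch of that folding for a 2-cell value:
 */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint32_t cells[2] = { 0x00000001, 0x20000000 }; /* hypothetical */
	uint64_t v = 0;
	int k;

	for (k = 0; k < 2; k++) {
		v <<= 32;
		v |= cells[k];
	}

	printf("folded value: %#llx\n", (unsigned long long)v);
	return (0);
}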
*/ {"arm,gic-400", true}, {"arm,cortex-a15-gic", true}, {"arm,cortex-a9-gic", true}, {"arm,cortex-a7-gic", true}, {"arm,arm11mp-gic", true}, {"brcm,brahma-b15-gic", true}, {"qcom,msm-qgic2", true}, {NULL, false} }; struct gic_range { uint64_t bus; uint64_t host; uint64_t size; }; struct arm_gic_fdt_softc { struct arm_gic_softc sc_gic; pcell_t sc_host_cells; pcell_t sc_addr_cells; pcell_t sc_size_cells; struct gic_range *sc_ranges; int sc_nranges; }; struct gic_devinfo { struct ofw_bus_devinfo obdinfo; struct resource_list rl; }; static int gic_fill_ranges(phandle_t node, struct arm_gic_fdt_softc *sc) { cell_t *base_ranges; ssize_t nbase_ranges; int i, j, k; nbase_ranges = OF_getproplen(node, "ranges"); if (nbase_ranges < 0) return (-1); sc->sc_nranges = nbase_ranges / sizeof(cell_t) / (sc->sc_addr_cells + sc->sc_host_cells + sc->sc_size_cells); if (sc->sc_nranges == 0) return (0); sc->sc_ranges = malloc(sc->sc_nranges * sizeof(sc->sc_ranges[0]), M_DEVBUF, M_WAITOK); base_ranges = malloc(nbase_ranges, M_DEVBUF, M_WAITOK); OF_getencprop(node, "ranges", base_ranges, nbase_ranges); for (i = 0, j = 0; i < sc->sc_nranges; i++) { sc->sc_ranges[i].bus = 0; for (k = 0; k < sc->sc_addr_cells; k++) { sc->sc_ranges[i].bus <<= 32; sc->sc_ranges[i].bus |= base_ranges[j++]; } sc->sc_ranges[i].host = 0; for (k = 0; k < sc->sc_host_cells; k++) { sc->sc_ranges[i].host <<= 32; sc->sc_ranges[i].host |= base_ranges[j++]; } sc->sc_ranges[i].size = 0; for (k = 0; k < sc->sc_size_cells; k++) { sc->sc_ranges[i].size <<= 32; sc->sc_ranges[i].size |= base_ranges[j++]; } } free(base_ranges, M_DEVBUF); return (sc->sc_nranges); } static int arm_gic_fdt_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (!ofw_bus_search_compatible(dev, compat_data)->ocd_data) return (ENXIO); device_set_desc(dev, "ARM Generic Interrupt Controller"); return (BUS_PROBE_DEFAULT); } static int arm_gic_fdt_attach(device_t dev) { struct arm_gic_fdt_softc *sc = device_get_softc(dev); phandle_t root, child; struct gic_devinfo *dinfo; device_t cdev; int err; err = arm_gic_attach(dev); if (err != 0) return (err); root = ofw_bus_get_node(dev); sc->sc_host_cells = 1; OF_getencprop(OF_parent(root), "#address-cells", &sc->sc_host_cells, sizeof(sc->sc_host_cells)); sc->sc_addr_cells = 2; OF_getencprop(root, "#address-cells", &sc->sc_addr_cells, sizeof(sc->sc_addr_cells)); sc->sc_size_cells = 2; OF_getencprop(root, "#size-cells", &sc->sc_size_cells, sizeof(sc->sc_size_cells)); if (gic_fill_ranges(root, sc) < 0) { device_printf(dev, "could not get ranges\n"); return (ENXIO); } for (child = OF_child(root); child != 0; child = OF_peer(child)) { dinfo = malloc(sizeof(*dinfo), M_DEVBUF, M_WAITOK | M_ZERO); if (ofw_bus_gen_setup_devinfo(&dinfo->obdinfo, child) != 0) { free(dinfo, M_DEVBUF); continue; } resource_list_init(&dinfo->rl); ofw_bus_reg_to_rl(dev, child, sc->sc_addr_cells, sc->sc_size_cells, &dinfo->rl); cdev = device_add_child(dev, NULL, -1); if (cdev == NULL) { device_printf(dev, "<%s>: device_add_child failed\n", dinfo->obdinfo.obd_name); resource_list_free(&dinfo->rl); ofw_bus_gen_destroy_devinfo(&dinfo->obdinfo); free(dinfo, M_DEVBUF); continue; } device_set_ivars(cdev, dinfo); } bus_generic_probe(dev); return (bus_generic_attach(dev)); } static struct resource * arm_gic_fdt_alloc_resource(device_t bus, device_t child, int type, int *rid, u_long start, u_long end, u_long count, u_int flags) { struct arm_gic_fdt_softc *sc = device_get_softc(bus); struct gic_devinfo *di; struct resource_list_entry *rle; int j; KASSERT(type == 
SYS_RES_MEMORY, ("Invalid resource type %x", type)); /* * Request for the default allocation with a given rid: use resource * list stored in the local device info. */ if ((start == 0UL) && (end == ~0UL)) { if ((di = device_get_ivars(child)) == NULL) return (NULL); if (type == SYS_RES_IOPORT) type = SYS_RES_MEMORY; rle = resource_list_find(&di->rl, type, *rid); if (rle == NULL) { if (bootverbose) device_printf(bus, "no default resources for " "rid = %d, type = %d\n", *rid, type); return (NULL); } start = rle->start; end = rle->end; count = rle->count; } /* Remap through ranges property */ for (j = 0; j < sc->sc_nranges; j++) { if (start >= sc->sc_ranges[j].bus && end < sc->sc_ranges[j].bus + sc->sc_ranges[j].size) { start -= sc->sc_ranges[j].bus; start += sc->sc_ranges[j].host; end -= sc->sc_ranges[j].bus; end += sc->sc_ranges[j].host; break; } } if (j == sc->sc_nranges && sc->sc_nranges != 0) { if (bootverbose) device_printf(bus, "Could not map resource " "%#lx-%#lx\n", start, end); return (NULL); } return (bus_generic_alloc_resource(bus, child, type, rid, start, end, count, flags)); } static const struct ofw_bus_devinfo * arm_gic_fdt_ofw_get_devinfo(device_t bus __unused, device_t child) { struct gic_devinfo *di; di = device_get_ivars(child); return (&di->obdinfo); } static device_method_t arm_gic_fdt_methods[] = { /* Device interface */ DEVMETHOD(device_probe, arm_gic_fdt_probe), DEVMETHOD(device_attach, arm_gic_fdt_attach), /* Bus interface */ DEVMETHOD(bus_add_child, bus_generic_add_child), DEVMETHOD(bus_alloc_resource, arm_gic_fdt_alloc_resource), DEVMETHOD(bus_release_resource, bus_generic_release_resource), DEVMETHOD(bus_activate_resource,bus_generic_activate_resource), /* ofw_bus interface */ DEVMETHOD(ofw_bus_get_devinfo, arm_gic_fdt_ofw_get_devinfo), DEVMETHOD(ofw_bus_get_compat, ofw_bus_gen_get_compat), DEVMETHOD(ofw_bus_get_model, ofw_bus_gen_get_model), DEVMETHOD(ofw_bus_get_name, ofw_bus_gen_get_name), DEVMETHOD(ofw_bus_get_node, ofw_bus_gen_get_node), DEVMETHOD(ofw_bus_get_type, ofw_bus_gen_get_type), DEVMETHOD_END }; DEFINE_CLASS_1(gic, arm_gic_fdt_driver, arm_gic_fdt_methods, sizeof(struct arm_gic_fdt_softc), arm_gic_driver); static devclass_t arm_gic_fdt_devclass; EARLY_DRIVER_MODULE(gic, simplebus, arm_gic_fdt_driver, arm_gic_fdt_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE); EARLY_DRIVER_MODULE(gic, ofwbus, arm_gic_fdt_driver, arm_gic_fdt_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE); + +static struct ofw_compat_data gicv2m_compat_data[] = { + {"arm,gic-v2m-frame", true}, + {NULL, false} +}; + +static int +arm_gicv2m_fdt_probe(device_t dev) +{ + + if (!ofw_bus_status_okay(dev)) + return (ENXIO); + + if (!ofw_bus_search_compatible(dev, gicv2m_compat_data)->ocd_data) + return (ENXIO); + + device_set_desc(dev, "ARM Generic Interrupt Controller MSI/MSIX"); + return (BUS_PROBE_DEFAULT); +} + +static device_method_t arm_gicv2m_fdt_methods[] = { + /* Device interface */ + DEVMETHOD(device_probe, arm_gicv2m_fdt_probe), + + /* End */ + DEVMETHOD_END +}; + +DEFINE_CLASS_1(gicv2m, arm_gicv2m_fdt_driver, arm_gicv2m_fdt_methods, + sizeof(struct gicv2m_softc), arm_gicv2m_driver); + +static devclass_t arm_gicv2m_fdt_devclass; + +EARLY_DRIVER_MODULE(gicv2m, gic, arm_gicv2m_fdt_driver, + arm_gicv2m_fdt_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE); Index: projects/clang380-import/sys/arm64/arm64/gic_v3.c =================================================================== --- projects/clang380-import/sys/arm64/arm64/gic_v3.c (revision 294776) +++
projects/clang380-import/sys/arm64/arm64/gic_v3.c (revision 294777) @@ -1,719 +1,719 @@ /*- * Copyright (c) 2015 The FreeBSD Foundation * All rights reserved. * * This software was developed by Semihalf under * the sponsorship of the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pic_if.h" #include "gic_v3_reg.h" #include "gic_v3_var.h" /* Device and PIC methods */ static void gic_v3_dispatch(device_t, struct trapframe *); static void gic_v3_eoi(device_t, u_int); static void gic_v3_mask_irq(device_t, u_int); static void gic_v3_unmask_irq(device_t, u_int); #ifdef SMP static void gic_v3_init_secondary(device_t); static void gic_v3_ipi_send(device_t, cpuset_t, u_int); #endif static device_method_t gic_v3_methods[] = { /* Device interface */ DEVMETHOD(device_detach, gic_v3_detach), /* PIC interface */ DEVMETHOD(pic_dispatch, gic_v3_dispatch), DEVMETHOD(pic_eoi, gic_v3_eoi), DEVMETHOD(pic_mask, gic_v3_mask_irq), DEVMETHOD(pic_unmask, gic_v3_unmask_irq), #ifdef SMP DEVMETHOD(pic_init_secondary, gic_v3_init_secondary), DEVMETHOD(pic_ipi_send, gic_v3_ipi_send), #endif /* End */ DEVMETHOD_END }; -DEFINE_CLASS_0(gic_v3, gic_v3_driver, gic_v3_methods, +DEFINE_CLASS_0(gic, gic_v3_driver, gic_v3_methods, sizeof(struct gic_v3_softc)); /* * Driver-specific definitions. */ MALLOC_DEFINE(M_GIC_V3, "GICv3", GIC_V3_DEVSTR); /* * Helper functions and definitions. 
*/ /* Destination registers, either Distributor or Re-Distributor */ enum gic_v3_xdist { DIST = 0, REDIST, }; /* Helper routines starting with gic_v3_ */ static int gic_v3_dist_init(struct gic_v3_softc *); static int gic_v3_redist_alloc(struct gic_v3_softc *); static int gic_v3_redist_find(struct gic_v3_softc *); static int gic_v3_redist_init(struct gic_v3_softc *); static int gic_v3_cpu_init(struct gic_v3_softc *); static void gic_v3_wait_for_rwp(struct gic_v3_softc *, enum gic_v3_xdist); /* A sequence of init functions for primary (boot) CPU */ typedef int (*gic_v3_initseq_t) (struct gic_v3_softc *); /* Primary CPU initialization sequence */ static gic_v3_initseq_t gic_v3_primary_init[] = { gic_v3_dist_init, gic_v3_redist_alloc, gic_v3_redist_init, gic_v3_cpu_init, NULL }; #ifdef SMP /* Secondary CPU initialization sequence */ static gic_v3_initseq_t gic_v3_secondary_init[] = { gic_v3_redist_init, gic_v3_cpu_init, NULL }; #endif /* * Device interface. */ int gic_v3_attach(device_t dev) { struct gic_v3_softc *sc; gic_v3_initseq_t *init_func; uint32_t typer; int rid; int err; size_t i; sc = device_get_softc(dev); sc->gic_registered = FALSE; sc->dev = dev; err = 0; /* Initialize mutex */ mtx_init(&sc->gic_mtx, "GICv3 lock", NULL, MTX_SPIN); /* * Allocate array of struct resource. * One entry for Distributor and all remaining for Re-Distributor. */ sc->gic_res = malloc( sizeof(*sc->gic_res) * (sc->gic_redists.nregions + 1), M_GIC_V3, M_WAITOK); /* Now allocate corresponding resources */ for (i = 0, rid = 0; i < (sc->gic_redists.nregions + 1); i++, rid++) { sc->gic_res[rid] = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (sc->gic_res[rid] == NULL) return (ENXIO); } /* * Distributor interface */ sc->gic_dist = sc->gic_res[0]; /* * Re-Distributor interface */ /* Allocate space under region descriptions */ sc->gic_redists.regions = malloc( sizeof(*sc->gic_redists.regions) * sc->gic_redists.nregions, M_GIC_V3, M_WAITOK); /* Fill-up bus_space information for each region. */ for (i = 0, rid = 1; i < sc->gic_redists.nregions; i++, rid++) sc->gic_redists.regions[i] = sc->gic_res[rid]; /* Get the number of supported SPI interrupts */ typer = gic_d_read(sc, 4, GICD_TYPER); sc->gic_nirqs = GICD_TYPER_I_NUM(typer); if (sc->gic_nirqs > GIC_I_NUM_MAX) sc->gic_nirqs = GIC_I_NUM_MAX; /* Get the number of supported interrupt identifier bits */ sc->gic_idbits = GICD_TYPER_IDBITS(typer); if (bootverbose) { device_printf(dev, "SPIs: %u, IDs: %u\n", sc->gic_nirqs, (1 << sc->gic_idbits) - 1); } /* Run the init sequence for the boot CPU */ for (init_func = gic_v3_primary_init; *init_func != NULL; init_func++) { err = (*init_func)(sc); if (err != 0) return (err); } /* * Full success. * Now register the PIC with the interrupt handling layer. */ arm_register_root_pic(dev, sc->gic_nirqs); sc->gic_registered = TRUE; return (0); } int gic_v3_detach(device_t dev) { struct gic_v3_softc *sc; size_t i; int rid; sc = device_get_softc(dev); if (device_is_attached(dev)) { /* * XXX: We should probably deregister PIC */ if (sc->gic_registered) panic("Trying to detach registered PIC"); } for (rid = 0; rid < (sc->gic_redists.nregions + 1); rid++) bus_release_resource(dev, SYS_RES_MEMORY, rid, sc->gic_res[rid]); for (i = 0; i < mp_ncpus; i++) free(sc->gic_redists.pcpu[i], M_GIC_V3); free(sc->gic_res, M_GIC_V3); free(sc->gic_redists.regions, M_GIC_V3); return (0); } /* * PIC interface.
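* gic_v3_dispatch() acknowledges pending interrupts by reading ICC_IAR1_EL1 and hands them to arm_dispatch_intr(); SGIs are EOIed right away before dispatch, while PPIs/SPIs/LPIs are EOIed afterwards through gic_v3_eoi().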
*/ static void gic_v3_dispatch(device_t dev, struct trapframe *frame) { uint64_t active_irq; while (1) { if (CPU_MATCH_ERRATA_CAVIUM_THUNDER_1_1) { /* * Hardware: Cavium ThunderX * Chip revision: Pass 1.0 (early version) * Pass 1.1 (production) * ERRATUM: 22978, 23154 */ __asm __volatile( "nop;nop;nop;nop;nop;nop;nop;nop; \n" "mrs %0, ICC_IAR1_EL1 \n" "nop;nop;nop;nop; \n" "dsb sy \n" : "=&r" (active_irq)); } else { active_irq = gic_icc_read(IAR1); } if (__predict_false(active_irq == ICC_IAR1_EL1_SPUR)) break; if (__predict_true((active_irq >= GIC_FIRST_PPI && active_irq <= GIC_LAST_SPI) || active_irq >= GIC_FIRST_LPI)) { arm_dispatch_intr(active_irq, frame); continue; } if (active_irq <= GIC_LAST_SGI) { gic_icc_write(EOIR1, (uint64_t)active_irq); arm_dispatch_intr(active_irq, frame); continue; } } } static void gic_v3_eoi(device_t dev, u_int irq) { gic_icc_write(EOIR1, (uint64_t)irq); } static void gic_v3_mask_irq(device_t dev, u_int irq) { struct gic_v3_softc *sc; sc = device_get_softc(dev); if (irq <= GIC_LAST_PPI) { /* SGIs and PPIs in corresponding Re-Distributor */ gic_r_write(sc, 4, GICR_SGI_BASE_SIZE + GICD_ICENABLER(irq), GICD_I_MASK(irq)); gic_v3_wait_for_rwp(sc, REDIST); } else if (irq >= GIC_FIRST_SPI && irq <= GIC_LAST_SPI) { /* SPIs in distributor */ gic_d_write(sc, 4, GICD_ICENABLER(irq), GICD_I_MASK(irq)); gic_v3_wait_for_rwp(sc, DIST); } else if (irq >= GIC_FIRST_LPI) { /* LPIs */ lpi_mask_irq(dev, irq); } else panic("%s: Unsupported IRQ number %u", __func__, irq); } static void gic_v3_unmask_irq(device_t dev, u_int irq) { struct gic_v3_softc *sc; sc = device_get_softc(dev); if (irq <= GIC_LAST_PPI) { /* SGIs and PPIs in corresponding Re-Distributor */ gic_r_write(sc, 4, GICR_SGI_BASE_SIZE + GICD_ISENABLER(irq), GICD_I_MASK(irq)); gic_v3_wait_for_rwp(sc, REDIST); } else if (irq >= GIC_FIRST_SPI && irq <= GIC_LAST_SPI) { /* SPIs in distributor */ gic_d_write(sc, 4, GICD_ISENABLER(irq), GICD_I_MASK(irq)); gic_v3_wait_for_rwp(sc, DIST); } else if (irq >= GIC_FIRST_LPI) { /* LPIs */ lpi_unmask_irq(dev, irq); } else panic("%s: Unsupported IRQ number %u", __func__, irq); } #ifdef SMP static void gic_v3_init_secondary(device_t dev) { struct gic_v3_softc *sc; gic_v3_initseq_t *init_func; int err; sc = device_get_softc(dev); /* Run the init sequence for this secondary CPU */ for (init_func = gic_v3_secondary_init; *init_func != NULL; init_func++) { err = (*init_func)(sc); if (err != 0) { device_printf(dev, "Could not initialize GIC for CPU%u\n", PCPU_GET(cpuid)); return; } } /* * Try to initialize the ITS. * If no ITS driver is attached this routine will fail, but that * is not fatal here: only LPIs will be unavailable * on the current CPU. */ if (its_init_cpu(NULL) != 0) { device_printf(dev, "Could not initialize ITS for CPU%u. " "No LPIs will arrive on this CPU\n", PCPU_GET(cpuid)); } /* * ARM64TODO: Unmask timer PPIs. To be removed when appropriate * mechanism is implemented. * Activate the timer interrupts: virtual (27), secure (29), * and non-secure (30). Use hardcoded values here as there * are currently no defines for them.
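* (These are PPI INTIDs, and PPIs are banked per CPU in each Re-Distributor, which is why this unmasking must be repeated on every secondary CPU.)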
*/ gic_v3_unmask_irq(dev, 27); gic_v3_unmask_irq(dev, 29); gic_v3_unmask_irq(dev, 30); } static void gic_v3_ipi_send(device_t dev, cpuset_t cpuset, u_int ipi) { u_int cpu; uint64_t aff, tlist; uint64_t val; uint64_t aff_mask; /* Set affinity mask to match level 3, 2 and 1 */ aff_mask = CPU_AFF1_MASK | CPU_AFF2_MASK | CPU_AFF3_MASK; /* Iterate through all CPUs in set */ while (!CPU_EMPTY(&cpuset)) { aff = tlist = 0; for (cpu = 0; cpu < mp_ncpus; cpu++) { /* Compose target list for single AFF3:AFF2:AFF1 set */ if (CPU_ISSET(cpu, &cpuset)) { if (!tlist) { /* * Save affinity of the first CPU to * send IPI to for later comparison. */ aff = CPU_AFFINITY(cpu); tlist |= (1UL << CPU_AFF0(aff)); CPU_CLR(cpu, &cpuset); } /* Check for same Affinity level 3, 2 and 1 */ if ((aff & aff_mask) == (CPU_AFFINITY(cpu) & aff_mask)) { tlist |= (1UL << CPU_AFF0(CPU_AFFINITY(cpu))); /* Clear CPU in cpuset from target list */ CPU_CLR(cpu, &cpuset); } } } if (tlist) { KASSERT((tlist & ~GICI_SGI_TLIST_MASK) == 0, ("Target list too long for GICv3 IPI")); /* Send SGI to CPUs in target list */ val = tlist; val |= (uint64_t)CPU_AFF3(aff) << GICI_SGI_AFF3_SHIFT; val |= (uint64_t)CPU_AFF2(aff) << GICI_SGI_AFF2_SHIFT; val |= (uint64_t)CPU_AFF1(aff) << GICI_SGI_AFF1_SHIFT; val |= (uint64_t)(ipi & GICI_SGI_IPI_MASK) << GICI_SGI_IPI_SHIFT; gic_icc_write(SGI1R, val); } } } #endif /* * Helper routines */ static void gic_v3_wait_for_rwp(struct gic_v3_softc *sc, enum gic_v3_xdist xdist) { struct resource *res; u_int cpuid; size_t us_left = 1000000; cpuid = PCPU_GET(cpuid); switch (xdist) { case DIST: res = sc->gic_dist; break; case REDIST: res = sc->gic_redists.pcpu[cpuid]; break; default: KASSERT(0, ("%s: Attempt to wait for unknown RWP", __func__)); return; } while ((bus_read_4(res, GICD_CTLR) & GICD_CTLR_RWP) != 0) { DELAY(1); if (us_left-- == 0) panic("GICD Register write pending for too long"); } } /* CPU interface. */ static __inline void gic_v3_cpu_priority(uint64_t mask) { /* Set priority mask */ gic_icc_write(PMR, mask & ICC_PMR_EL1_PRIO_MASK); } static int gic_v3_cpu_enable_sre(struct gic_v3_softc *sc) { uint64_t sre; u_int cpuid; cpuid = PCPU_GET(cpuid); /* * Set the SRE bit to enable access to GIC CPU interface * via system registers. */ sre = READ_SPECIALREG(icc_sre_el1); sre |= ICC_SRE_EL1_SRE; WRITE_SPECIALREG(icc_sre_el1, sre); isb(); /* * Now ensure that the bit is set. */ sre = READ_SPECIALREG(icc_sre_el1); if ((sre & ICC_SRE_EL1_SRE) == 0) { /* We are done. This was disabled in EL2 */ device_printf(sc->dev, "ERROR: CPU%u cannot enable CPU interface " "via system registers\n", cpuid); return (ENXIO); } else if (bootverbose) { device_printf(sc->dev, "CPU%u enabled CPU interface via system registers\n", cpuid); } return (0); } static int gic_v3_cpu_init(struct gic_v3_softc *sc) { int err; /* Enable access to CPU interface via system registers */ err = gic_v3_cpu_enable_sre(sc); if (err != 0) return (err); /* Priority mask to minimum - accept all interrupts */ gic_v3_cpu_priority(GIC_PRIORITY_MIN); /* Disable EOI mode */ gic_icc_clear(CTLR, ICC_CTLR_EL1_EOIMODE); /* Enable group 1 (non-secure) interrupts */ gic_icc_set(IGRPEN1, ICC_IGRPEN0_EL1_EN); return (0); } /* Distributor */ static int gic_v3_dist_init(struct gic_v3_softc *sc) { uint64_t aff; u_int i; /* * 1. Disable the Distributor */ gic_d_write(sc, 4, GICD_CTLR, 0); gic_v3_wait_for_rwp(sc, DIST); /* * 2. Configure the Distributor */ /* Set all global interrupts to be level triggered, active low.
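* Each GICD_ICFGRn register carries a 2-bit configuration field for each of 16 interrupts; writing all zeroes below selects level-sensitive triggering for the whole group.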
*/ for (i = GIC_FIRST_SPI; i < sc->gic_nirqs; i += GICD_I_PER_ICFGRn) gic_d_write(sc, 4, GICD_ICFGR(i), 0x00000000); /* Set priority for all shared interrupts */ for (i = GIC_FIRST_SPI; i < sc->gic_nirqs; i += GICD_I_PER_IPRIORITYn) { /* Set highest priority */ gic_d_write(sc, 4, GICD_IPRIORITYR(i), GIC_PRIORITY_MAX); } /* * Disable all interrupts. Leave PPI and SGIs as they are enabled in * Re-Distributor registers. */ for (i = GIC_FIRST_SPI; i < sc->gic_nirqs; i += GICD_I_PER_ISENABLERn) gic_d_write(sc, 4, GICD_ICENABLER(i), 0xFFFFFFFF); gic_v3_wait_for_rwp(sc, DIST); /* * 3. Enable Distributor */ /* Enable Distributor with ARE, Group 1 */ gic_d_write(sc, 4, GICD_CTLR, GICD_CTLR_ARE_NS | GICD_CTLR_G1A | GICD_CTLR_G1); /* * 4. Route all interrupts to boot CPU. */ aff = CPU_AFFINITY(PCPU_GET(cpuid)); for (i = GIC_FIRST_SPI; i < sc->gic_nirqs; i++) gic_d_write(sc, 4, GICD_IROUTER(i), aff); return (0); } /* Re-Distributor */ static int gic_v3_redist_alloc(struct gic_v3_softc *sc) { u_int cpuid; /* Allocate struct resource for each CPU's Re-Distributor registers */ for (cpuid = 0; cpuid < mp_ncpus; cpuid++) if (CPU_ISSET(cpuid, &all_cpus) != 0) sc->gic_redists.pcpu[cpuid] = malloc(sizeof(*sc->gic_redists.pcpu[0]), M_GIC_V3, M_WAITOK); else sc->gic_redists.pcpu[cpuid] = NULL; return (0); } static int gic_v3_redist_find(struct gic_v3_softc *sc) { struct resource r_res; bus_space_handle_t r_bsh; uint64_t aff; uint64_t typer; uint32_t pidr2; u_int cpuid; size_t i; cpuid = PCPU_GET(cpuid); aff = CPU_AFFINITY(cpuid); /* Affinity in format for comparison with typer */ aff = (CPU_AFF3(aff) << 24) | (CPU_AFF2(aff) << 16) | (CPU_AFF1(aff) << 8) | CPU_AFF0(aff); if (bootverbose) { device_printf(sc->dev, "Start searching for Re-Distributor\n"); } /* Iterate through Re-Distributor regions */ for (i = 0; i < sc->gic_redists.nregions; i++) { /* Take a copy of the region's resource */ r_res = *sc->gic_redists.regions[i]; r_bsh = rman_get_bushandle(&r_res); pidr2 = bus_read_4(&r_res, GICR_PIDR2); switch (pidr2 & GICR_PIDR2_ARCH_MASK) { case GICR_PIDR2_ARCH_GICv3: /* fall through */ case GICR_PIDR2_ARCH_GICv4: break; default: device_printf(sc->dev, "No Re-Distributor found for CPU%u\n", cpuid); return (ENODEV); } do { typer = bus_read_8(&r_res, GICR_TYPER); if ((typer >> GICR_TYPER_AFF_SHIFT) == aff) { KASSERT(sc->gic_redists.pcpu[cpuid] != NULL, ("Invalid pointer to per-CPU redistributor")); /* Copy res contents to its final destination */ *sc->gic_redists.pcpu[cpuid] = r_res; if (bootverbose) { device_printf(sc->dev, "CPU%u Re-Distributor has been found\n", cpuid); } return (0); } r_bsh += (GICR_RD_BASE_SIZE + GICR_SGI_BASE_SIZE); if ((typer & GICR_TYPER_VLPIS) != 0) { r_bsh += (GICR_VLPI_BASE_SIZE + GICR_RESERVED_SIZE); } rman_set_bushandle(&r_res, r_bsh); } while ((typer & GICR_TYPER_LAST) == 0); } device_printf(sc->dev, "No Re-Distributor found for CPU%u\n", cpuid); return (ENXIO); } static int gic_v3_redist_wake(struct gic_v3_softc *sc) { uint32_t waker; size_t us_left = 1000000; waker = gic_r_read(sc, 4, GICR_WAKER); /* Wake up Re-Distributor for this CPU */ waker &= ~GICR_WAKER_PS; gic_r_write(sc, 4, GICR_WAKER, waker); /* * When clearing ProcessorSleep bit it is required to wait for * ChildrenAsleep to become zero following the processor power-on.
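* (Hardware reports this via the GICR_WAKER_CA bit polled below; it clears once the Re-Distributor is connected to its CPU interface and able to forward interrupts.)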
*/ while ((gic_r_read(sc, 4, GICR_WAKER) & GICR_WAKER_CA) != 0) { DELAY(1); if (us_left-- == 0) { panic("Could not wake Re-Distributor for CPU%u", PCPU_GET(cpuid)); } } if (bootverbose) { device_printf(sc->dev, "CPU%u Re-Distributor woke up\n", PCPU_GET(cpuid)); } return (0); } static int gic_v3_redist_init(struct gic_v3_softc *sc) { int err; size_t i; err = gic_v3_redist_find(sc); if (err != 0) return (err); err = gic_v3_redist_wake(sc); if (err != 0) return (err); /* Disable PPIs */ gic_r_write(sc, 4, GICR_SGI_BASE_SIZE + GICR_ICENABLER0, GICR_I_ENABLER_PPI_MASK); /* Enable SGIs */ gic_r_write(sc, 4, GICR_SGI_BASE_SIZE + GICR_ISENABLER0, GICR_I_ENABLER_SGI_MASK); /* Set priority for SGIs and PPIs */ for (i = 0; i <= GIC_LAST_PPI; i += GICR_I_PER_IPRIORITYn) { gic_r_write(sc, 4, GICR_SGI_BASE_SIZE + GICD_IPRIORITYR(i), GIC_PRIORITY_MAX); } gic_v3_wait_for_rwp(sc, REDIST); return (0); } Index: projects/clang380-import/sys/arm64/arm64/gic_v3_fdt.c =================================================================== --- projects/clang380-import/sys/arm64/arm64/gic_v3_fdt.c (revision 294776) +++ projects/clang380-import/sys/arm64/arm64/gic_v3_fdt.c (revision 294777) @@ -1,310 +1,310 @@ /*- * Copyright (c) 2015 The FreeBSD Foundation * All rights reserved. * * This software was developed by Semihalf under * the sponsorship of the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include "pic_if.h" #include "gic_v3_reg.h" #include "gic_v3_var.h" /* * FDT glue.
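* The FDT front end probes on the "arm,gic-v3" compatible string and also acts as a minimal bus, so that an ITS node declared under the GIC can attach as a child device.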
*/ static int gic_v3_fdt_probe(device_t); static int gic_v3_fdt_attach(device_t); static struct resource *gic_v3_ofw_bus_alloc_res(device_t, device_t, int, int *, u_long, u_long, u_long, u_int); static const struct ofw_bus_devinfo *gic_v3_ofw_get_devinfo(device_t, device_t); static device_method_t gic_v3_fdt_methods[] = { /* Device interface */ DEVMETHOD(device_probe, gic_v3_fdt_probe), DEVMETHOD(device_attach, gic_v3_fdt_attach), /* Bus interface */ DEVMETHOD(bus_alloc_resource, gic_v3_ofw_bus_alloc_res), DEVMETHOD(bus_activate_resource, bus_generic_activate_resource), /* ofw_bus interface */ DEVMETHOD(ofw_bus_get_devinfo, gic_v3_ofw_get_devinfo), DEVMETHOD(ofw_bus_get_compat, ofw_bus_gen_get_compat), DEVMETHOD(ofw_bus_get_model, ofw_bus_gen_get_model), DEVMETHOD(ofw_bus_get_name, ofw_bus_gen_get_name), DEVMETHOD(ofw_bus_get_node, ofw_bus_gen_get_node), DEVMETHOD(ofw_bus_get_type, ofw_bus_gen_get_type), /* End */ DEVMETHOD_END }; -DEFINE_CLASS_1(gic_v3, gic_v3_fdt_driver, gic_v3_fdt_methods, +DEFINE_CLASS_1(gic, gic_v3_fdt_driver, gic_v3_fdt_methods, sizeof(struct gic_v3_softc), gic_v3_driver); static devclass_t gic_v3_fdt_devclass; EARLY_DRIVER_MODULE(gic_v3, simplebus, gic_v3_fdt_driver, gic_v3_fdt_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE); EARLY_DRIVER_MODULE(gic_v3, ofwbus, gic_v3_fdt_driver, gic_v3_fdt_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE); /* * Helper functions declarations. */ static int gic_v3_ofw_bus_attach(device_t); /* * Device interface. */ static int gic_v3_fdt_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (!ofw_bus_is_compatible(dev, "arm,gic-v3")) return (ENXIO); device_set_desc(dev, GIC_V3_DEVSTR); return (BUS_PROBE_DEFAULT); } static int gic_v3_fdt_attach(device_t dev) { struct gic_v3_softc *sc; pcell_t redist_regions; int err; sc = device_get_softc(dev); sc->dev = dev; /* * Recover number of the Re-Distributor regions. */ if (OF_getencprop(ofw_bus_get_node(dev), "#redistributor-regions", &redist_regions, sizeof(redist_regions)) <= 0) sc->gic_redists.nregions = 1; else sc->gic_redists.nregions = redist_regions; err = gic_v3_attach(dev); if (err) goto error; /* * Try to register ITS to this GIC. * GIC will act as a bus in that case. * Failure here will not affect main GIC functionality. */ if (gic_v3_ofw_bus_attach(dev) != 0) { if (bootverbose) { device_printf(dev, "Failed to attach ITS to this GIC\n"); } } return (err); error: if (bootverbose) { device_printf(dev, "Failed to attach. Error %d\n", err); } /* Failure so free resources */ gic_v3_detach(dev); return (err); } /* OFW bus interface */ struct gic_v3_ofw_devinfo { struct ofw_bus_devinfo di_dinfo; struct resource_list di_rl; }; static const struct ofw_bus_devinfo * gic_v3_ofw_get_devinfo(device_t bus __unused, device_t child) { struct gic_v3_ofw_devinfo *di; di = device_get_ivars(child); return (&di->di_dinfo); } static struct resource * gic_v3_ofw_bus_alloc_res(device_t bus, device_t child, int type, int *rid, u_long start, u_long end, u_long count, u_int flags) { struct gic_v3_ofw_devinfo *di; struct resource_list_entry *rle; int ranges_len; if ((start == 0UL) && (end == ~0UL)) { if ((di = device_get_ivars(child)) == NULL) return (NULL); if (type != SYS_RES_MEMORY) return (NULL); /* Find defaults for this rid */ rle = resource_list_find(&di->di_rl, type, *rid); if (rle == NULL) return (NULL); start = rle->start; end = rle->end; count = rle->count; } /* * XXX: No ranges remap! * Absolute address is expected. 
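* An absent or empty "ranges" property denotes an identity mapping and is accepted; a non-empty "ranges" would require bus-to-host address translation, which is not implemented, so such allocations are refused below.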
*/ if (ofw_bus_has_prop(bus, "ranges")) { ranges_len = OF_getproplen(ofw_bus_get_node(bus), "ranges"); if (ranges_len != 0) { if (bootverbose) { device_printf(child, "Ranges remap not supported\n"); } return (NULL); } } return (bus_generic_alloc_resource(bus, child, type, rid, start, end, count, flags)); } /* Helper functions */ /* * Bus capability support for GICv3. * Collects and configures device information and finally * adds the ITS device as a child of GICv3 in the Newbus hierarchy. */ static int gic_v3_ofw_bus_attach(device_t dev) { struct gic_v3_ofw_devinfo *di; device_t child; phandle_t parent, node; pcell_t addr_cells, size_cells; parent = ofw_bus_get_node(dev); if (parent > 0) { addr_cells = 2; OF_getencprop(parent, "#address-cells", &addr_cells, sizeof(addr_cells)); size_cells = 2; OF_getencprop(parent, "#size-cells", &size_cells, sizeof(size_cells)); /* Iterate through all GIC subordinates */ for (node = OF_child(parent); node > 0; node = OF_peer(node)) { /* Allocate and populate devinfo. */ di = malloc(sizeof(*di), M_GIC_V3, M_WAITOK | M_ZERO); if (ofw_bus_gen_setup_devinfo(&di->di_dinfo, node)) { if (bootverbose) { device_printf(dev, "Could not set up devinfo for ITS\n"); } free(di, M_GIC_V3); continue; } /* Initialize and populate resource list. */ resource_list_init(&di->di_rl); ofw_bus_reg_to_rl(dev, node, addr_cells, size_cells, &di->di_rl); /* Should not have any interrupts, so don't add any */ /* Add newbus device for this FDT node */ child = device_add_child(dev, NULL, -1); if (!child) { if (bootverbose) { device_printf(dev, "Could not add child: %s\n", di->di_dinfo.obd_name); } resource_list_free(&di->di_rl); ofw_bus_gen_destroy_devinfo(&di->di_dinfo); free(di, M_GIC_V3); continue; } device_set_ivars(child, di); } } return (bus_generic_attach(dev)); } static int gic_v3_its_fdt_probe(device_t dev); static device_method_t gic_v3_its_fdt_methods[] = { /* Device interface */ DEVMETHOD(device_probe, gic_v3_its_fdt_probe), /* End */ DEVMETHOD_END }; -DEFINE_CLASS_1(gic_v3_its, gic_v3_its_fdt_driver, gic_v3_its_fdt_methods, +DEFINE_CLASS_1(its, gic_v3_its_fdt_driver, gic_v3_its_fdt_methods, sizeof(struct gic_v3_its_softc), gic_v3_its_driver); static devclass_t gic_v3_its_fdt_devclass; -EARLY_DRIVER_MODULE(gic_v3_its, gic_v3, gic_v3_its_fdt_driver, +EARLY_DRIVER_MODULE(its, gic, gic_v3_its_fdt_driver, gic_v3_its_fdt_devclass, 0, 0, BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE); static int gic_v3_its_fdt_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); if (!ofw_bus_is_compatible(dev, GIC_V3_ITS_COMPSTR)) return (ENXIO); device_set_desc(dev, GIC_V3_ITS_DEVSTR); return (BUS_PROBE_DEFAULT); } Index: projects/clang380-import/sys/arm64/arm64/gic_v3_its.c =================================================================== --- projects/clang380-import/sys/arm64/arm64/gic_v3_its.c (revision 294776) +++ projects/clang380-import/sys/arm64/arm64/gic_v3_its.c (revision 294777) @@ -1,1687 +1,1687 @@ /*- * Copyright (c) 2015 The FreeBSD Foundation * All rights reserved. * * This software was developed by Semihalf under * the sponsorship of the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2.
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "gic_v3_reg.h" #include "gic_v3_var.h" #define GIC_V3_ITS_QUIRK_THUNDERX_PEM_BUS_OFFSET 144 #include "pic_if.h" /* Device and PIC methods */ static int gic_v3_its_attach(device_t); static device_method_t gic_v3_its_methods[] = { /* Device interface */ DEVMETHOD(device_attach, gic_v3_its_attach), /* * PIC interface */ /* MSI-X */ DEVMETHOD(pic_alloc_msix, gic_v3_its_alloc_msix), /* MSI */ DEVMETHOD(pic_alloc_msi, gic_v3_its_alloc_msi), DEVMETHOD(pic_map_msi, gic_v3_its_map_msi), /* End */ DEVMETHOD_END }; -DEFINE_CLASS_0(gic_v3_its, gic_v3_its_driver, gic_v3_its_methods, +DEFINE_CLASS_0(its, gic_v3_its_driver, gic_v3_its_methods, sizeof(struct gic_v3_its_softc)); MALLOC_DEFINE(M_GIC_V3_ITS, "GICv3 ITS", GIC_V3_ITS_DEVSTR); static int its_alloc_tables(struct gic_v3_its_softc *); static void its_free_tables(struct gic_v3_its_softc *); static void its_init_commandq(struct gic_v3_its_softc *); static void its_init_cpu_collection(struct gic_v3_its_softc *); static uint32_t its_get_devid(device_t); static int its_cmd_send(struct gic_v3_its_softc *, struct its_cmd_desc *); static void its_cmd_mapc(struct gic_v3_its_softc *, struct its_col *, uint8_t); static void its_cmd_mapvi(struct gic_v3_its_softc *, struct its_dev *, uint32_t, uint32_t); static void its_cmd_mapi(struct gic_v3_its_softc *, struct its_dev *, uint32_t); static void its_cmd_inv(struct gic_v3_its_softc *, struct its_dev *, uint32_t); static void its_cmd_invall(struct gic_v3_its_softc *, struct its_col *); static uint32_t its_get_devbits(device_t); static void lpi_init_conftable(struct gic_v3_its_softc *); static void lpi_bitmap_init(struct gic_v3_its_softc *); static int lpi_config_cpu(struct gic_v3_its_softc *); static void lpi_alloc_cpu_pendtables(struct gic_v3_its_softc *); const char *its_ptab_cache[] = { [GITS_BASER_CACHE_NCNB] = "(NC,NB)", [GITS_BASER_CACHE_NC] = "(NC)", [GITS_BASER_CACHE_RAWT] = "(RA,WT)", [GITS_BASER_CACHE_RAWB] = "(RA,WB)", [GITS_BASER_CACHE_WAWT] = "(WA,WT)", [GITS_BASER_CACHE_WAWB] = "(WA,WB)", [GITS_BASER_CACHE_RAWAWT] = "(RAWA,WT)", [GITS_BASER_CACHE_RAWAWB] = "(RAWA,WB)", }; const char *its_ptab_share[] = { [GITS_BASER_SHARE_NS] = "none", [GITS_BASER_SHARE_IS] = "inner", [GITS_BASER_SHARE_OS] = "outer", [GITS_BASER_SHARE_RES] = "none", }; const char *its_ptab_type[] = { [GITS_BASER_TYPE_UNIMPL] = "Unimplemented", 
[GITS_BASER_TYPE_DEV] = "Devices", [GITS_BASER_TYPE_VP] = "Virtual Processors", [GITS_BASER_TYPE_PP] = "Physical Processors", [GITS_BASER_TYPE_IC] = "Interrupt Collections", [GITS_BASER_TYPE_RES5] = "Reserved (5)", [GITS_BASER_TYPE_RES6] = "Reserved (6)", [GITS_BASER_TYPE_RES7] = "Reserved (7)", }; /* * Vendor specific quirks. * One needs to add an appropriate entry to the its_quirks[] * table if the implementation varies from the generic ARM ITS. */ /* Cavium ThunderX PCI devid acquire function */ static uint32_t its_get_devbits_thunder(device_t); static uint32_t its_get_devid_thunder(device_t); static const struct its_quirks its_quirks[] = { { /* * Hardware: Cavium ThunderX * Chip revision: Pass 1.0, Pass 1.1 */ .cpuid = CPU_ID_RAW(CPU_IMPL_CAVIUM, CPU_PART_THUNDER, 0, 0), .cpuid_mask = CPU_IMPL_MASK | CPU_PART_MASK, .devid_func = its_get_devid_thunder, .devbits_func = its_get_devbits_thunder, }, }; static struct gic_v3_its_softc *its_sc; #define gic_its_read(sc, len, reg) \ bus_read_##len(&sc->its_res[0], reg) #define gic_its_write(sc, len, reg, val) \ bus_write_##len(&sc->its_res[0], reg, val) static int gic_v3_its_attach(device_t dev) { struct gic_v3_its_softc *sc; uint64_t gits_tmp; uint32_t gits_pidr2; int rid; int ret; sc = device_get_softc(dev); /* * XXX ARM64TODO: Avoid configuration of more than one ITS * device. To be removed when multi-PIC support is added * to FreeBSD (or at least multi-ITS is implemented). Limit * supported ITS sockets to '0' only. */ if (device_get_unit(dev) != 0) { device_printf(dev, "Only a single instance of ITS is supported, exiting...\n"); return (ENXIO); } sc->its_socket = 0; /* * Initialize sleep & spin mutex for ITS */ /* Protects ITS device list and assigned LPIs bitmaps. */ mtx_init(&sc->its_mtx, "ITS sleep lock", NULL, MTX_DEF); /* Protects access to ITS command circular buffer. */ mtx_init(&sc->its_spin_mtx, "ITS spin lock", NULL, MTX_SPIN); rid = 0; sc->its_res = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &rid, RF_ACTIVE); if (sc->its_res == NULL) { device_printf(dev, "Could not allocate memory\n"); return (ENXIO); } sc->dev = dev; gits_pidr2 = gic_its_read(sc, 4, GITS_PIDR2); switch (gits_pidr2 & GITS_PIDR2_ARCH_MASK) { case GITS_PIDR2_ARCH_GICv3: /* fall through */ case GITS_PIDR2_ARCH_GICv4: if (bootverbose) { device_printf(dev, "ITS found. Architecture rev. %u\n", (u_int)(gits_pidr2 & GITS_PIDR2_ARCH_MASK) >> 4); } break; default: device_printf(dev, "No ITS found in the system\n"); gic_v3_its_detach(dev); return (ENODEV); } /* 1. Initialize commands queue */ its_init_commandq(sc); /* 2. Provide memory for any private ITS tables */ ret = its_alloc_tables(sc); if (ret != 0) { gic_v3_its_detach(dev); return (ret); } /* 3. Allocate collections. One per-CPU */ for (int cpu = 0; cpu < mp_ncpus; cpu++) if (CPU_ISSET(cpu, &all_cpus) != 0) sc->its_cols[cpu] = malloc(sizeof(*sc->its_cols[0]), M_GIC_V3_ITS, (M_WAITOK | M_ZERO)); else sc->its_cols[cpu] = NULL; /* 4. Enable ITS in GITS_CTLR */ gits_tmp = gic_its_read(sc, 4, GITS_CTLR); gic_its_write(sc, 4, GITS_CTLR, gits_tmp | GITS_CTLR_EN); /* 5. Initialize LPIs configuration table */ lpi_init_conftable(sc); /* 6. LPIs bitmap init */ lpi_bitmap_init(sc); /* 7. Allocate pending tables for all CPUs */ lpi_alloc_cpu_pendtables(sc); /* 8. CPU init */ (void)its_init_cpu(sc); /* 9. Init ITS devices list */ TAILQ_INIT(&sc->its_dev_list); arm_register_msi_pic(dev); /* * XXX ARM64TODO: We need to have ITS software context * when being called by the interrupt code (mask/unmask.
* This may be used only when one ITS is present in * the system and eventually should be removed. */ KASSERT(its_sc == NULL, ("Trying to assign its_sc that is already set")); its_sc = sc; return (0); } /* Will not detach but use it for convenience */ int gic_v3_its_detach(device_t dev) { device_t parent; struct gic_v3_softc *gic_sc; struct gic_v3_its_softc *sc; u_int cpuid; int rid = 0; sc = device_get_softc(dev); cpuid = PCPU_GET(cpuid); /* Release what's possible */ /* Command queue */ if ((void *)sc->its_cmdq_base != NULL) { contigfree((void *)sc->its_cmdq_base, ITS_CMDQ_SIZE, M_GIC_V3_ITS); } /* ITTs */ its_free_tables(sc); /* Collections */ for (cpuid = 0; cpuid < mp_ncpus; cpuid++) free(sc->its_cols[cpuid], M_GIC_V3_ITS); /* LPI config table */ parent = device_get_parent(sc->dev); gic_sc = device_get_softc(parent); if ((void *)gic_sc->gic_redists.lpis.conf_base != NULL) { contigfree((void *)gic_sc->gic_redists.lpis.conf_base, LPI_CONFTAB_SIZE, M_GIC_V3_ITS); } for (cpuid = 0; cpuid < mp_ncpus; cpuid++) if ((void *)gic_sc->gic_redists.lpis.pend_base[cpuid] != NULL) { contigfree( (void *)gic_sc->gic_redists.lpis.pend_base[cpuid], roundup2(LPI_PENDTAB_SIZE, PAGE_SIZE_64K), M_GIC_V3_ITS); } /* Resource... */ bus_release_resource(dev, SYS_RES_MEMORY, rid, sc->its_res); /* XXX ARM64TODO: Reset global pointer to ITS software context */ its_sc = NULL; return (0); } static int its_alloc_tables(struct gic_v3_its_softc *sc) { uint64_t gits_baser, gits_tmp; uint64_t type, esize, cache, share, psz; size_t page_size, npages, nitspages, nidents, tn; size_t its_tbl_size; vm_offset_t ptab_vaddr; vm_paddr_t ptab_paddr; boolean_t first = TRUE; page_size = PAGE_SIZE_64K; for (tn = 0; tn < GITS_BASER_NUM; tn++) { gits_baser = gic_its_read(sc, 8, GITS_BASER(tn)); type = GITS_BASER_TYPE(gits_baser); /* Get the Table Entry size */ esize = GITS_BASER_ESIZE(gits_baser); switch (type) { case GITS_BASER_TYPE_UNIMPL: /* fall through */ case GITS_BASER_TYPE_RES5: case GITS_BASER_TYPE_RES6: case GITS_BASER_TYPE_RES7: continue; case GITS_BASER_TYPE_DEV: nidents = (1 << its_get_devbits(sc->dev)); its_tbl_size = esize * nidents; its_tbl_size = roundup2(its_tbl_size, page_size); npages = howmany(its_tbl_size, PAGE_SIZE); break; default: npages = howmany(page_size, PAGE_SIZE); break; } /* Allocate required space */ ptab_vaddr = (vm_offset_t)contigmalloc(npages * PAGE_SIZE, M_GIC_V3_ITS, (M_WAITOK | M_ZERO), 0, ~0UL, PAGE_SIZE, 0); sc->its_ptabs[tn].ptab_vaddr = ptab_vaddr; sc->its_ptabs[tn].ptab_pgsz = PAGE_SIZE; sc->its_ptabs[tn].ptab_npages = npages; ptab_paddr = vtophys(ptab_vaddr); KASSERT((ptab_paddr & GITS_BASER_PA_MASK) == ptab_paddr, ("%s: Unaligned PA for Interrupt Translation Table", device_get_name(sc->dev))); /* Set defaults: WAWB, IS */ cache = GITS_BASER_CACHE_WAWB; share = GITS_BASER_SHARE_IS; for (;;) { nitspages = howmany(its_tbl_size, page_size); switch (page_size) { case PAGE_SIZE: /* 4KB */ psz = GITS_BASER_PSZ_4K; break; case PAGE_SIZE_16K: /* 16KB */ psz = GITS_BASER_PSZ_16K; break; case PAGE_SIZE_64K: /* 64KB */ psz = GITS_BASER_PSZ_64K; break; default: device_printf(sc->dev, "Unsupported page size: %zuKB\n", (page_size / 1024)); its_free_tables(sc); return (ENXIO); } /* Clear fields under modification first */ gits_baser &= ~(GITS_BASER_VALID | GITS_BASER_CACHE_MASK | GITS_BASER_TYPE_MASK | GITS_BASER_ESIZE_MASK | GITS_BASER_PA_MASK | GITS_BASER_SHARE_MASK | GITS_BASER_PSZ_MASK | GITS_BASER_SIZE_MASK); /* Construct register value */ gits_baser |= (type << GITS_BASER_TYPE_SHIFT) | ((esize - 1) <<
GITS_BASER_ESIZE_SHIFT) | (cache << GITS_BASER_CACHE_SHIFT) | (share << GITS_BASER_SHARE_SHIFT) | (psz << GITS_BASER_PSZ_SHIFT) | ptab_paddr | (nitspages - 1) | GITS_BASER_VALID; gic_its_write(sc, 8, GITS_BASER(tn), gits_baser); /* * Verify. * Depending on implementation we may encounter * shareability and page size mismatch. */ gits_tmp = gic_its_read(sc, 8, GITS_BASER(tn)); if (((gits_tmp ^ gits_baser) & GITS_BASER_SHARE_MASK) != 0) { share = gits_tmp & GITS_BASER_SHARE_MASK; share >>= GITS_BASER_SHARE_SHIFT; continue; } if (((gits_tmp ^ gits_baser) & GITS_BASER_PSZ_MASK) != 0) { switch (page_size) { case PAGE_SIZE_16K: /* Drop to 4KB page */ page_size = PAGE_SIZE; continue; case PAGE_SIZE_64K: /* Drop to 16KB page */ page_size = PAGE_SIZE_16K; continue; } } /* * All possible adjustments should * be applied by now so just break the loop. */ break; } /* * Do not compare Cacheability field since * it is implementation defined. */ gits_tmp &= ~GITS_BASER_CACHE_MASK; gits_baser &= ~GITS_BASER_CACHE_MASK; if (gits_tmp != gits_baser) { device_printf(sc->dev, "Could not allocate ITS tables\n"); its_free_tables(sc); return (ENXIO); } if (bootverbose) { if (first) { device_printf(sc->dev, "Allocated ITS private tables:\n"); first = FALSE; } device_printf(sc->dev, "\tPTAB%zu for %s: PA 0x%lx," " %lu entries," " cache policy %s, %s shareable," " page size %zuKB\n", tn, its_ptab_type[type], ptab_paddr, (page_size * nitspages) / esize, its_ptab_cache[cache], its_ptab_share[share], page_size / 1024); } } return (0); } static void its_free_tables(struct gic_v3_its_softc *sc) { vm_offset_t ptab_vaddr; size_t size; size_t tn; for (tn = 0; tn < GITS_BASER_NUM; tn++) { ptab_vaddr = sc->its_ptabs[tn].ptab_vaddr; if (ptab_vaddr == 0) continue; size = sc->its_ptabs[tn].ptab_pgsz; size *= sc->its_ptabs[tn].ptab_npages; if ((void *)ptab_vaddr != NULL) contigfree((void *)ptab_vaddr, size, M_GIC_V3_ITS); /* Clear the table description */ memset(&sc->its_ptabs[tn], 0, sizeof(sc->its_ptabs[tn])); } } static void its_init_commandq(struct gic_v3_its_softc *sc) { uint64_t gits_cbaser, gits_tmp; uint64_t cache, share; vm_paddr_t cmdq_paddr; device_t dev; dev = sc->dev; /* Allocate memory for command queue */ sc->its_cmdq_base = contigmalloc(ITS_CMDQ_SIZE, M_GIC_V3_ITS, (M_WAITOK | M_ZERO), 0, ~0UL, ITS_CMDQ_SIZE, 0); /* Set command queue write pointer (command queue empty) */ sc->its_cmdq_write = sc->its_cmdq_base; /* Save command queue pointer and attributes */ cmdq_paddr = vtophys(sc->its_cmdq_base); /* Set defaults: Normal Inner WAWB, IS */ cache = GITS_CBASER_CACHE_NIWAWB; share = GITS_CBASER_SHARE_IS; gits_cbaser = (cmdq_paddr | (cache << GITS_CBASER_CACHE_SHIFT) | (share << GITS_CBASER_SHARE_SHIFT) | /* Number of 4KB pages - 1 */ ((ITS_CMDQ_SIZE / PAGE_SIZE) - 1) | /* Valid bit */ GITS_CBASER_VALID); gic_its_write(sc, 8, GITS_CBASER, gits_cbaser); gits_tmp = gic_its_read(sc, 8, GITS_CBASER); if (((gits_tmp ^ gits_cbaser) & GITS_CBASER_SHARE_MASK) != 0) { if (bootverbose) { device_printf(dev, "Will use cache flushing for command queue\n"); } /* Command queue needs cache flushing */ sc->its_flags |= ITS_FLAGS_CMDQ_FLUSH; } gic_its_write(sc, 8, GITS_CWRITER, 0x0); } int its_init_cpu(struct gic_v3_its_softc *sc) { device_t parent; struct gic_v3_softc *gic_sc; /* * NULL in place of the softc pointer means that * this function was called during GICv3 secondary initialization.
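* (On the boot CPU, gic_v3_its_attach() calls this function with its own softc directly.)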
*/ if (sc == NULL) { if (device_is_attached(its_sc->dev)) { /* * XXX ARM64TODO: This is part of the workaround that * saves ITS software context for further use in * mask/unmask and here. This should be removed as soon * as the upper layer is capable of passing the ITS * context to this function. */ sc = its_sc; } else return (ENXIO); /* Skip if running secondary init on the wrong socket */ if (sc->its_socket != CPU_CURRENT_SOCKET) return (ENXIO); } /* * Check for LPIs support on this Re-Distributor. */ parent = device_get_parent(sc->dev); gic_sc = device_get_softc(parent); if ((gic_r_read(gic_sc, 4, GICR_TYPER) & GICR_TYPER_PLPIS) == 0) { if (bootverbose) { device_printf(sc->dev, "LPIs not supported on CPU%u\n", PCPU_GET(cpuid)); } return (ENXIO); } /* Configure LPIs for this CPU */ lpi_config_cpu(sc); /* Initialize collections */ its_init_cpu_collection(sc); return (0); } static void its_init_cpu_collection(struct gic_v3_its_softc *sc) { device_t parent; struct gic_v3_softc *gic_sc; uint64_t typer; uint64_t target; vm_offset_t redist_base; u_int cpuid; cpuid = PCPU_GET(cpuid); parent = device_get_parent(sc->dev); gic_sc = device_get_softc(parent); typer = gic_its_read(sc, 8, GITS_TYPER); if ((typer & GITS_TYPER_PTA) != 0) { redist_base = rman_get_bushandle(gic_sc->gic_redists.pcpu[cpuid]); /* * Target Address corresponds to the base physical * address of Re-Distributors. */ target = vtophys(redist_base); } else { /* Target Address corresponds to unique processor numbers */ typer = gic_r_read(gic_sc, 8, GICR_TYPER); target = GICR_TYPER_CPUNUM(typer); } sc->its_cols[cpuid]->col_target = target; sc->its_cols[cpuid]->col_id = cpuid; its_cmd_mapc(sc, sc->its_cols[cpuid], 1); its_cmd_invall(sc, sc->its_cols[cpuid]); } static void lpi_init_conftable(struct gic_v3_its_softc *sc) { device_t parent; struct gic_v3_softc *gic_sc; vm_offset_t conf_base; uint8_t prio_default; parent = device_get_parent(sc->dev); gic_sc = device_get_softc(parent); /* * LPI Configuration Table settings. * Notice that the Configuration Table is shared among all * Re-Distributors, so this is going to be created just once. */ conf_base = (vm_offset_t)contigmalloc(LPI_CONFTAB_SIZE, M_GIC_V3_ITS, (M_WAITOK | M_ZERO), 0, ~0UL, PAGE_SIZE_64K, 0); if (bootverbose) { device_printf(sc->dev, "LPI Configuration Table at PA: 0x%lx\n", vtophys(conf_base)); } /* * Let the default priority be aligned with all other * interrupts assuming that each interrupt is assigned * MAX priority at startup. MAX priority on the other * hand cannot be higher than 0xFC for LPIs. */ prio_default = GIC_PRIORITY_MAX; /* Write each settings byte to LPI configuration table */ memset((void *)conf_base, (prio_default & LPI_CONF_PRIO_MASK) | LPI_CONF_GROUP1, LPI_CONFTAB_SIZE); cpu_dcache_wb_range((vm_offset_t)conf_base, roundup2(LPI_CONFTAB_SIZE, PAGE_SIZE_64K)); gic_sc->gic_redists.lpis.conf_base = conf_base; } static void lpi_alloc_cpu_pendtables(struct gic_v3_its_softc *sc) { device_t parent; struct gic_v3_softc *gic_sc; vm_offset_t pend_base; u_int cpuid; parent = device_get_parent(sc->dev); gic_sc = device_get_softc(parent); /* * LPI Pending Table settings. * This has to be done for each Re-Distributor, hence for each CPU.
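* Unlike the shared Configuration Table above, every CPU gets a private Pending Table, allocated below with the 64KB alignment that GICR_PENDBASER requires.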
*/ for (cpuid = 0; cpuid < mp_ncpus; cpuid++) { /* Limit allocation to active CPUs only */ if (CPU_ISSET(cpuid, &all_cpus) == 0) continue; pend_base = (vm_offset_t)contigmalloc( roundup2(LPI_PENDTAB_SIZE, PAGE_SIZE_64K), M_GIC_V3_ITS, (M_WAITOK | M_ZERO), 0, ~0UL, PAGE_SIZE_64K, 0); /* Clean D-cache so that ITS can see zeroed pages */ cpu_dcache_wb_range((vm_offset_t)pend_base, roundup2(LPI_PENDTAB_SIZE, PAGE_SIZE_64K)); if (bootverbose) { device_printf(sc->dev, "LPI Pending Table for CPU%u at PA: 0x%lx\n", cpuid, vtophys(pend_base)); } gic_sc->gic_redists.lpis.pend_base[cpuid] = pend_base; } /* Ensure visibility of pend_base addresses on other CPUs */ wmb(); } static int lpi_config_cpu(struct gic_v3_its_softc *sc) { device_t parent; struct gic_v3_softc *gic_sc; vm_offset_t conf_base, pend_base; uint64_t gicr_xbaser, gicr_temp; uint64_t cache, share, idbits; uint32_t gicr_ctlr; u_int cpuid; parent = device_get_parent(sc->dev); gic_sc = device_get_softc(parent); cpuid = PCPU_GET(cpuid); /* Ensure data observability on the current CPU */ rmb(); conf_base = gic_sc->gic_redists.lpis.conf_base; pend_base = gic_sc->gic_redists.lpis.pend_base[cpuid]; /* Disable LPIs */ gicr_ctlr = gic_r_read(gic_sc, 4, GICR_CTLR); gicr_ctlr &= ~GICR_CTLR_LPI_ENABLE; gic_r_write(gic_sc, 4, GICR_CTLR, gicr_ctlr); /* Perform full system barrier */ dsb(sy); /* * Set GICR_PROPBASER */ /* * Find out how many bits we need for LPI identifiers. * Remark 1.: Even though we have (LPI_CONFTAB_SIZE / 8) LPIs * the notified LPI ID still starts from 8192 * (GIC_FIRST_LPI). * Remark 2.: This could be done at compile time but there * seems to be no sufficient macro. */ idbits = flsl(LPI_CONFTAB_SIZE + GIC_FIRST_LPI) - 1; /* Set defaults: Normal Inner WAWB, IS */ cache = GICR_PROPBASER_CACHE_NIWAWB; share = GICR_PROPBASER_SHARE_IS; gicr_xbaser = vtophys(conf_base) | ((idbits - 1) & GICR_PROPBASER_IDBITS_MASK) | (cache << GICR_PROPBASER_CACHE_SHIFT) | (share << GICR_PROPBASER_SHARE_SHIFT); gic_r_write(gic_sc, 8, GICR_PROPBASER, gicr_xbaser); gicr_temp = gic_r_read(gic_sc, 8, GICR_PROPBASER); if (((gicr_xbaser ^ gicr_temp) & GICR_PROPBASER_SHARE_MASK) != 0) { if (bootverbose) { device_printf(sc->dev, "Will use cache flushing for LPI " "Configuration Table\n"); } gic_sc->gic_redists.lpis.flags |= LPI_FLAGS_CONF_FLUSH; } /* * Set GICR_PENDBASER */ /* Set defaults: Normal Inner WAWB, IS */ cache = GICR_PENDBASER_CACHE_NIWAWB; share = GICR_PENDBASER_SHARE_IS; gicr_xbaser = vtophys(pend_base) | (cache << GICR_PENDBASER_CACHE_SHIFT) | (share << GICR_PENDBASER_SHARE_SHIFT); gic_r_write(gic_sc, 8, GICR_PENDBASER, gicr_xbaser); /* Enable LPIs */ gicr_ctlr = gic_r_read(gic_sc, 4, GICR_CTLR); gicr_ctlr |= GICR_CTLR_LPI_ENABLE; gic_r_write(gic_sc, 4, GICR_CTLR, gicr_ctlr); dsb(sy); return (0); } static void lpi_bitmap_init(struct gic_v3_its_softc *sc) { device_t parent; struct gic_v3_softc *gic_sc; uint32_t lpi_id_num; size_t lpi_chunks_num; size_t bits_in_chunk; parent = device_get_parent(sc->dev); gic_sc = device_get_softc(parent); lpi_id_num = (1 << gic_sc->gic_idbits) - 1; /* Subtract IDs dedicated to SGIs, PPIs and SPIs */ lpi_id_num -= GIC_FIRST_LPI; sc->its_lpi_maxid = lpi_id_num; bits_in_chunk = sizeof(*sc->its_lpi_bitmap) * NBBY; /* * Round up to the number of bits in chunk. * We will need to take care to avoid using invalid LPI IDs later.
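* Worked example (assuming gic_idbits == 16 and 64-bit bitmap chunks): (1 << 16) - 1 = 65535 IDs minus the 8192 reserved for SGIs/PPIs/SPIs leaves 57343 usable LPI IDs, which rounds up to a 57344-bit bitmap.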
*/ lpi_id_num = roundup2(lpi_id_num, bits_in_chunk); lpi_chunks_num = lpi_id_num / bits_in_chunk; sc->its_lpi_bitmap = contigmalloc((lpi_chunks_num * sizeof(*sc->its_lpi_bitmap)), M_GIC_V3_ITS, (M_WAITOK | M_ZERO), 0, ~0UL, sizeof(*sc->its_lpi_bitmap), 0); } static int lpi_alloc_chunk(struct gic_v3_its_softc *sc, struct lpi_chunk *lpic, u_int nvecs) { int fclr; /* First cleared bit */ uint8_t *bitmap; size_t nb, i; bitmap = (uint8_t *)sc->its_lpi_bitmap; fclr = 0; retry: /* Check other bits - sloooow */ for (i = 0, nb = fclr; i < nvecs; i++, nb++) { if (nb > sc->its_lpi_maxid) return (EINVAL); if (isset(bitmap, nb)) { /* Too few free bits in this area. Move on. */ fclr = nb + 1; goto retry; } } /* This area is free. Take it. */ bit_nset(bitmap, fclr, fclr + nvecs - 1); lpic->lpi_base = fclr + GIC_FIRST_LPI; lpic->lpi_num = nvecs; lpic->lpi_free = lpic->lpi_num; return (0); } static void lpi_free_chunk(struct gic_v3_its_softc *sc, struct lpi_chunk *lpic) { int start, end; uint8_t *bitmap; bitmap = (uint8_t *)sc->its_lpi_bitmap; KASSERT((lpic->lpi_free == lpic->lpi_num), ("Trying to free LPI chunk that is still in use.\n")); /* First bit of this chunk in a global bitmap */ start = lpic->lpi_base - GIC_FIRST_LPI; /* and last bit of this chunk... */ end = start + lpic->lpi_num - 1; /* Finally free this chunk */ bit_nclear(bitmap, start, end); } static void lpi_configure(struct gic_v3_its_softc *sc, struct its_dev *its_dev, uint32_t lpinum, boolean_t unmask) { device_t parent; struct gic_v3_softc *gic_sc; uint8_t *conf_byte; parent = device_get_parent(sc->dev); gic_sc = device_get_softc(parent); conf_byte = (uint8_t *)gic_sc->gic_redists.lpis.conf_base; conf_byte += (lpinum - GIC_FIRST_LPI); if (unmask) *conf_byte |= LPI_CONF_ENABLE; else *conf_byte &= ~LPI_CONF_ENABLE; if ((gic_sc->gic_redists.lpis.flags & LPI_FLAGS_CONF_FLUSH) != 0) { /* Clean D-cache under configuration byte */ cpu_dcache_wb_range((vm_offset_t)conf_byte, sizeof(*conf_byte)); } else { /* DSB inner shareable, store */ dsb(ishst); } its_cmd_inv(sc, its_dev, lpinum); } static void lpi_map_to_device(struct gic_v3_its_softc *sc, struct its_dev *its_dev, uint32_t id, uint32_t pid) { if ((pid < its_dev->lpis.lpi_base) || (pid >= (its_dev->lpis.lpi_base + its_dev->lpis.lpi_num))) panic("Trying to map invalid LPI %u for the device\n", pid); its_cmd_mapvi(sc, its_dev, id, pid); } static void lpi_xmask_irq(device_t parent, uint32_t irq, boolean_t unmask) { struct its_dev *its_dev; TAILQ_FOREACH(its_dev, &its_sc->its_dev_list, entry) { if (irq >= its_dev->lpis.lpi_base && irq < (its_dev->lpis.lpi_base + its_dev->lpis.lpi_num)) { lpi_configure(its_sc, its_dev, irq, unmask); return; } } panic("Trying to %s nonexistent LPI: %u\n", (unmask == TRUE) ? "unmask" : "mask", irq); } void lpi_unmask_irq(device_t parent, uint32_t irq) { lpi_xmask_irq(parent, irq, 1); } void lpi_mask_irq(device_t parent, uint32_t irq) { lpi_xmask_irq(parent, irq, 0); } /* * Commands handling.
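* Each ITS command is four 64-bit doublewords; the cmd_format_* helpers below pack individual fields into those doublewords and cmd_fix_endian() converts the result to the little-endian layout the ITS consumes.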
*/ static __inline void cmd_format_command(struct its_cmd *cmd, uint8_t cmd_type) { /* Command field: DW0 [7:0] */ cmd->cmd_dword[0] &= ~CMD_COMMAND_MASK; cmd->cmd_dword[0] |= cmd_type; } static __inline void cmd_format_devid(struct its_cmd *cmd, uint32_t devid) { /* Device ID field: DW0 [63:32] */ cmd->cmd_dword[0] &= ~CMD_DEVID_MASK; cmd->cmd_dword[0] |= ((uint64_t)devid << CMD_DEVID_SHIFT); } static __inline void cmd_format_size(struct its_cmd *cmd, uint16_t size) { /* Size field: DW1 [4:0] */ cmd->cmd_dword[1] &= ~CMD_SIZE_MASK; cmd->cmd_dword[1] |= (size & CMD_SIZE_MASK); } static __inline void cmd_format_id(struct its_cmd *cmd, uint32_t id) { /* ID field: DW1 [31:0] */ cmd->cmd_dword[1] &= ~CMD_ID_MASK; cmd->cmd_dword[1] |= id; } static __inline void cmd_format_pid(struct its_cmd *cmd, uint32_t pid) { /* Physical ID field: DW1 [63:32] */ cmd->cmd_dword[1] &= ~CMD_PID_MASK; cmd->cmd_dword[1] |= ((uint64_t)pid << CMD_PID_SHIFT); } static __inline void cmd_format_col(struct its_cmd *cmd, uint16_t col_id) { /* Collection field: DW2 [16:0] */ cmd->cmd_dword[2] &= ~CMD_COL_MASK; cmd->cmd_dword[2] |= col_id; } static __inline void cmd_format_target(struct its_cmd *cmd, uint64_t target) { /* Target Address field: DW2 [47:16] */ cmd->cmd_dword[2] &= ~CMD_TARGET_MASK; cmd->cmd_dword[2] |= (target & CMD_TARGET_MASK); } static __inline void cmd_format_itt(struct its_cmd *cmd, uint64_t itt) { /* ITT Address field: DW2 [47:8] */ cmd->cmd_dword[2] &= ~CMD_ITT_MASK; cmd->cmd_dword[2] |= (itt & CMD_ITT_MASK); } static __inline void cmd_format_valid(struct its_cmd *cmd, uint8_t valid) { /* Valid field: DW2 [63] */ cmd->cmd_dword[2] &= ~CMD_VALID_MASK; cmd->cmd_dword[2] |= ((uint64_t)valid << CMD_VALID_SHIFT); } static __inline void cmd_fix_endian(struct its_cmd *cmd) { size_t i; for (i = 0; i < nitems(cmd->cmd_dword); i++) cmd->cmd_dword[i] = htole64(cmd->cmd_dword[i]); } static void its_cmd_mapc(struct gic_v3_its_softc *sc, struct its_col *col, uint8_t valid) { struct its_cmd_desc desc; desc.cmd_type = ITS_CMD_MAPC; desc.cmd_desc_mapc.col = col; /* * Valid bit set - map the collection. * Valid bit cleared - unmap the collection. */ desc.cmd_desc_mapc.valid = valid; its_cmd_send(sc, &desc); } static void its_cmd_mapvi(struct gic_v3_its_softc *sc, struct its_dev *its_dev, uint32_t id, uint32_t pid) { struct its_cmd_desc desc; desc.cmd_type = ITS_CMD_MAPVI; desc.cmd_desc_mapvi.its_dev = its_dev; desc.cmd_desc_mapvi.id = id; desc.cmd_desc_mapvi.pid = pid; its_cmd_send(sc, &desc); } static void __unused its_cmd_mapi(struct gic_v3_its_softc *sc, struct its_dev *its_dev, uint32_t lpinum) { struct its_cmd_desc desc; desc.cmd_type = ITS_CMD_MAPI; desc.cmd_desc_mapi.its_dev = its_dev; desc.cmd_desc_mapi.lpinum = lpinum; its_cmd_send(sc, &desc); } static void its_cmd_mapd(struct gic_v3_its_softc *sc, struct its_dev *its_dev, uint8_t valid) { struct its_cmd_desc desc; desc.cmd_type = ITS_CMD_MAPD; desc.cmd_desc_mapd.its_dev = its_dev; desc.cmd_desc_mapd.valid = valid; its_cmd_send(sc, &desc); } static void its_cmd_inv(struct gic_v3_its_softc *sc, struct its_dev *its_dev, uint32_t lpinum) { struct its_cmd_desc desc; desc.cmd_type = ITS_CMD_INV; desc.cmd_desc_inv.lpinum = lpinum - its_dev->lpis.lpi_base; desc.cmd_desc_inv.its_dev = its_dev; its_cmd_send(sc, &desc); } static void its_cmd_invall(struct gic_v3_its_softc *sc, struct its_col *col) { struct its_cmd_desc desc; desc.cmd_type = ITS_CMD_INVALL; desc.cmd_desc_invall.col = col; its_cmd_send(sc, &desc); } /* * Helper routines for commands processing. 
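* The command queue is a ring buffer: the driver advances GITS_CWRITER as it enqueues commands while the ITS advances GITS_CREADR as it consumes them, so the queue is full when the write slot sits immediately behind the read slot.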
*/ static __inline boolean_t its_cmd_queue_full(struct gic_v3_its_softc *sc) { size_t read_idx, write_idx; write_idx = (size_t)(sc->its_cmdq_write - sc->its_cmdq_base); read_idx = gic_its_read(sc, 4, GITS_CREADR) / sizeof(struct its_cmd); /* * The queue is full when the write offset points * at the command before the current read offset. */ if (((write_idx + 1) % ITS_CMDQ_NENTRIES) == read_idx) return (TRUE); return (FALSE); } static __inline void its_cmd_sync(struct gic_v3_its_softc *sc, struct its_cmd *cmd) { if ((sc->its_flags & ITS_FLAGS_CMDQ_FLUSH) != 0) { /* Clean D-cache under command. */ cpu_dcache_wb_range((vm_offset_t)cmd, sizeof(*cmd)); } else { /* DSB inner shareable, store */ dsb(ishst); } } static struct its_cmd * its_cmd_alloc_locked(struct gic_v3_its_softc *sc) { struct its_cmd *cmd; size_t us_left; /* * XXX ARM64TODO: This is obviously a significant delay. * The reason for that is that currently the time frames for * the command to complete (and therefore free the descriptor) * are not known. */ us_left = 1000000; mtx_assert(&sc->its_spin_mtx, MA_OWNED); while (its_cmd_queue_full(sc)) { if (us_left-- == 0) { /* Timeout while waiting for free command */ device_printf(sc->dev, "Timeout while waiting for free command\n"); return (NULL); } DELAY(1); } cmd = sc->its_cmdq_write; sc->its_cmdq_write++; if (sc->its_cmdq_write == (sc->its_cmdq_base + ITS_CMDQ_NENTRIES)) { /* Wrap the queue */ sc->its_cmdq_write = sc->its_cmdq_base; } return (cmd); } static uint64_t its_cmd_prepare(struct its_cmd *cmd, struct its_cmd_desc *desc) { uint64_t target; uint8_t cmd_type; u_int size; boolean_t error; error = FALSE; cmd_type = desc->cmd_type; target = ITS_TARGET_NONE; switch (cmd_type) { case ITS_CMD_SYNC: /* Wait for previous commands completion */ target = desc->cmd_desc_sync.col->col_target; cmd_format_command(cmd, ITS_CMD_SYNC); cmd_format_target(cmd, target); break; case ITS_CMD_MAPD: /* Assign ITT to device */ target = desc->cmd_desc_mapd.its_dev->col->col_target; cmd_format_command(cmd, ITS_CMD_MAPD); cmd_format_itt(cmd, vtophys(desc->cmd_desc_mapd.its_dev->itt)); /* * Size describes number of bits to encode interrupt IDs * supported by the device minus one. * When V (valid) bit is zero, this field should be written * as zero. 
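* For example, a device with 32 LPIs yields fls(32) == 6, so Size is written as 5 and the ITT covers a 6-bit (64-entry) event ID space. */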
*/ if (desc->cmd_desc_mapd.valid != 0) { size = fls(desc->cmd_desc_mapd.its_dev->lpis.lpi_num); size = MAX(1, size) - 1; } else size = 0; cmd_format_size(cmd, size); cmd_format_devid(cmd, desc->cmd_desc_mapd.its_dev->devid); cmd_format_valid(cmd, desc->cmd_desc_mapd.valid); break; case ITS_CMD_MAPC: /* Map collection to Re-Distributor */ target = desc->cmd_desc_mapc.col->col_target; cmd_format_command(cmd, ITS_CMD_MAPC); cmd_format_col(cmd, desc->cmd_desc_mapc.col->col_id); cmd_format_valid(cmd, desc->cmd_desc_mapc.valid); cmd_format_target(cmd, target); break; case ITS_CMD_MAPVI: target = desc->cmd_desc_mapvi.its_dev->col->col_target; cmd_format_command(cmd, ITS_CMD_MAPVI); cmd_format_devid(cmd, desc->cmd_desc_mapvi.its_dev->devid); cmd_format_id(cmd, desc->cmd_desc_mapvi.id); cmd_format_pid(cmd, desc->cmd_desc_mapvi.pid); cmd_format_col(cmd, desc->cmd_desc_mapvi.its_dev->col->col_id); break; case ITS_CMD_MAPI: target = desc->cmd_desc_mapi.its_dev->col->col_target; cmd_format_command(cmd, ITS_CMD_MAPI); cmd_format_devid(cmd, desc->cmd_desc_mapi.its_dev->devid); cmd_format_id(cmd, desc->cmd_desc_mapi.lpinum); cmd_format_col(cmd, desc->cmd_desc_mapi.its_dev->col->col_id); break; case ITS_CMD_INV: target = desc->cmd_desc_inv.its_dev->col->col_target; cmd_format_command(cmd, ITS_CMD_INV); cmd_format_devid(cmd, desc->cmd_desc_inv.its_dev->devid); cmd_format_id(cmd, desc->cmd_desc_inv.lpinum); break; case ITS_CMD_INVALL: cmd_format_command(cmd, ITS_CMD_INVALL); cmd_format_col(cmd, desc->cmd_desc_invall.col->col_id); break; default: error = TRUE; break; } if (!error) cmd_fix_endian(cmd); return (target); } static __inline uint64_t its_cmd_cwriter_offset(struct gic_v3_its_softc *sc, struct its_cmd *cmd) { uint64_t off; off = (cmd - sc->its_cmdq_base) * sizeof(*cmd); return (off); } static void its_cmd_wait_completion(struct gic_v3_its_softc *sc, struct its_cmd *cmd_first, struct its_cmd *cmd_last) { uint64_t first, last, read; size_t us_left; /* * XXX ARM64TODO: This is obviously a significant delay. * The reason for that is that currently the time frames for * the command to complete are not known. 
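* (With DELAY(1) per iteration the loop below bounds the wait at roughly one second.) */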
*/ us_left = 1000000; first = its_cmd_cwriter_offset(sc, cmd_first); last = its_cmd_cwriter_offset(sc, cmd_last); for (;;) { read = gic_its_read(sc, 8, GITS_CREADR); if (read < first || read >= last) break; if (us_left-- == 0) { /* This means timeout */ device_printf(sc->dev, "Timeout while waiting for CMD completion.\n"); return; } DELAY(1); } } static int its_cmd_send(struct gic_v3_its_softc *sc, struct its_cmd_desc *desc) { struct its_cmd *cmd, *cmd_sync, *cmd_write; struct its_col col_sync; struct its_cmd_desc desc_sync; uint64_t target, cwriter; mtx_lock_spin(&sc->its_spin_mtx); cmd = its_cmd_alloc_locked(sc); if (cmd == NULL) { device_printf(sc->dev, "could not allocate ITS command\n"); mtx_unlock_spin(&sc->its_spin_mtx); return (EBUSY); } target = its_cmd_prepare(cmd, desc); its_cmd_sync(sc, cmd); if (target != ITS_TARGET_NONE) { cmd_sync = its_cmd_alloc_locked(sc); if (cmd_sync == NULL) goto end; desc_sync.cmd_type = ITS_CMD_SYNC; col_sync.col_target = target; desc_sync.cmd_desc_sync.col = &col_sync; its_cmd_prepare(cmd_sync, &desc_sync); its_cmd_sync(sc, cmd_sync); } end: /* Update GITS_CWRITER */ cwriter = its_cmd_cwriter_offset(sc, sc->its_cmdq_write); gic_its_write(sc, 8, GITS_CWRITER, cwriter); cmd_write = sc->its_cmdq_write; mtx_unlock_spin(&sc->its_spin_mtx); its_cmd_wait_completion(sc, cmd, cmd_write); return (0); } static struct its_dev * its_device_find_locked(struct gic_v3_its_softc *sc, device_t pci_dev) { struct its_dev *its_dev; mtx_assert(&sc->its_mtx, MA_OWNED); /* Find existing device if any */ TAILQ_FOREACH(its_dev, &sc->its_dev_list, entry) { if (its_dev->pci_dev == pci_dev) return (its_dev); } return (NULL); } static struct its_dev * its_device_alloc_locked(struct gic_v3_its_softc *sc, device_t pci_dev, u_int nvecs) { struct its_dev *newdev; uint64_t typer; uint32_t devid; u_int cpuid; size_t esize; mtx_assert(&sc->its_mtx, MA_OWNED); /* Find existing device if any */ newdev = its_device_find_locked(sc, pci_dev); if (newdev != NULL) return (newdev); devid = its_get_devid(pci_dev); /* There was no previously created device. Create one now */ newdev = malloc(sizeof(*newdev), M_GIC_V3_ITS, (M_NOWAIT | M_ZERO)); if (newdev == NULL) return (NULL); newdev->pci_dev = pci_dev; newdev->devid = devid; if (lpi_alloc_chunk(sc, &newdev->lpis, nvecs) != 0) { free(newdev, M_GIC_V3_ITS); return (NULL); } /* Get ITT entry size */ typer = gic_its_read(sc, 8, GITS_TYPER); esize = GITS_TYPER_ITTES(typer); /* * Allocate ITT for this device. * PA has to be 256 B aligned. At least two entries for device. */ newdev->itt = (vm_offset_t)contigmalloc( roundup2(roundup2(nvecs, 2) * esize, 0x100), M_GIC_V3_ITS, (M_NOWAIT | M_ZERO), 0, ~0UL, 0x100, 0); if (newdev->itt == 0) { lpi_free_chunk(sc, &newdev->lpis); free(newdev, M_GIC_V3_ITS); return (NULL); } /* * XXX ARM64TODO: Currently all interrupts are going * to be bound to the CPU that performs the configuration. */ cpuid = PCPU_GET(cpuid); newdev->col = sc->its_cols[cpuid]; TAILQ_INSERT_TAIL(&sc->its_dev_list, newdev, entry); /* Map device to its ITT */ its_cmd_mapd(sc, newdev, 1); return (newdev); } static __inline void its_device_asign_lpi_locked(struct gic_v3_its_softc *sc, struct its_dev *its_dev, u_int *irq) { mtx_assert(&sc->its_mtx, MA_OWNED); if (its_dev->lpis.lpi_free == 0) { panic("Requesting more LPIs than allocated for this device. 
" "LPI num: %u, free %u", its_dev->lpis.lpi_num, its_dev->lpis.lpi_free); } *irq = its_dev->lpis.lpi_base + (its_dev->lpis.lpi_num - its_dev->lpis.lpi_free); its_dev->lpis.lpi_free--; } /* * ITS quirks. * Add vendor specific PCI devid function here. */ static uint32_t its_get_devid_thunder(device_t pci_dev) { int bsf; int pem; uint32_t bus; bus = pci_get_bus(pci_dev); bsf = PCI_RID(pci_get_bus(pci_dev), pci_get_slot(pci_dev), pci_get_function(pci_dev)); /* Check if accessing internal PCIe (low bus numbers) */ if (bus < GIC_V3_ITS_QUIRK_THUNDERX_PEM_BUS_OFFSET) { return ((pci_get_domain(pci_dev) << PCI_RID_DOMAIN_SHIFT) | bsf); /* PEM otherwise */ } else { /* PEM (PCIe MAC/root complex) number is equal to domain */ pem = pci_get_domain(pci_dev); /* * Set appropriate device ID (passed by the HW along with * the transaction to memory) for different root complex * numbers using hard-coded domain portion for each group. */ if (pem < 3) return ((0x1 << PCI_RID_DOMAIN_SHIFT) | bsf); if (pem < 6) return ((0x3 << PCI_RID_DOMAIN_SHIFT) | bsf); if (pem < 9) return ((0x9 << PCI_RID_DOMAIN_SHIFT) | bsf); if (pem < 12) return ((0xB << PCI_RID_DOMAIN_SHIFT) | bsf); } return (0); } static uint32_t its_get_devbits_thunder(device_t dev) { uint32_t devid_bits; /* * GITS_TYPER[17:13] of ThunderX reports that device IDs * are to be 21 bits in length. * The entry size of the ITS table can be read from GITS_BASERn[52:48] * and on ThunderX is supposed to be 8 bytes in length (for device * table). Finally the page size that is to be used by ITS to access * this table will be set to 64KB. * * This gives 0x200000 entries of size 0x8 bytes covered by 256 pages * each of which 64KB in size. The number of pages (minus 1) should * then be written to GITS_BASERn[7:0]. In that case this value would * be 0xFF but on ThunderX the maximum value that HW accepts is 0xFD. * * Set arbitrary number of device ID bits to 20 in order to limit * the number of entries in ITS device table to 0x100000 and hence * the table size to 8MB. */ devid_bits = 20; if (bootverbose) { device_printf(dev, "Limiting number of Device ID bits implemented to %d\n", devid_bits); } return (devid_bits); } static __inline uint32_t its_get_devbits_default(device_t dev) { uint64_t gits_typer; struct gic_v3_its_softc *sc; sc = device_get_softc(dev); gits_typer = gic_its_read(sc, 8, GITS_TYPER); return (GITS_TYPER_DEVB(gits_typer)); } static uint32_t its_get_devbits(device_t dev) { const struct its_quirks *quirk; size_t i; for (i = 0; i < nitems(its_quirks); i++) { quirk = &its_quirks[i]; if (CPU_MATCH_RAW(quirk->cpuid_mask, quirk->cpuid)) { if (quirk->devbits_func != NULL) return ((*quirk->devbits_func)(dev)); } } return (its_get_devbits_default(dev)); } static __inline uint32_t its_get_devid_default(device_t pci_dev) { return (PCI_DEVID_GENERIC(pci_dev)); } static uint32_t its_get_devid(device_t pci_dev) { const struct its_quirks *quirk; size_t i; for (i = 0; i < nitems(its_quirks); i++) { quirk = &its_quirks[i]; if (CPU_MATCH_RAW(quirk->cpuid_mask, quirk->cpuid)) { if (quirk->devid_func != NULL) return ((*quirk->devid_func)(pci_dev)); } } return (its_get_devid_default(pci_dev)); } /* * Message signalled interrupts handling. */ /* * XXX ARM64TODO: Watch out for "irq" type. * * In theory GIC can handle up to (2^32 - 1) interrupt IDs whereas * we pass "irq" pointer of type integer. This is obviously wrong but * is determined by the way as PCI layer wants it to be done. 
*/ int gic_v3_its_alloc_msix(device_t dev, device_t pci_dev, int *irq) { struct gic_v3_its_softc *sc; struct its_dev *its_dev; u_int nvecs; sc = device_get_softc(dev); mtx_lock(&sc->its_mtx); nvecs = PCI_MSIX_NUM(pci_dev); /* * Allocate device as seen by ITS if not already available. * Notice that MSI-X interrupts are allocated on one-by-one basis. */ its_dev = its_device_alloc_locked(sc, pci_dev, nvecs); if (its_dev == NULL) { mtx_unlock(&sc->its_mtx); return (ENOMEM); } its_device_asign_lpi_locked(sc, its_dev, irq); mtx_unlock(&sc->its_mtx); return (0); } int gic_v3_its_alloc_msi(device_t dev, device_t pci_dev, int count, int *irqs) { struct gic_v3_its_softc *sc; struct its_dev *its_dev; sc = device_get_softc(dev); /* Allocate device as seen by ITS if not already available. */ mtx_lock(&sc->its_mtx); its_dev = its_device_alloc_locked(sc, pci_dev, count); if (its_dev == NULL) { mtx_unlock(&sc->its_mtx); return (ENOMEM); } for (; count > 0; count--) { its_device_asign_lpi_locked(sc, its_dev, irqs); irqs++; } mtx_unlock(&sc->its_mtx); return (0); } int gic_v3_its_map_msi(device_t dev, device_t pci_dev, int irq, uint64_t *addr, uint32_t *data) { struct gic_v3_its_softc *sc; bus_space_handle_t its_bsh; struct its_dev *its_dev; uint64_t its_pa; uint32_t id; sc = device_get_softc(dev); /* Verify that this device is allocated and owns this LPI */ mtx_lock(&sc->its_mtx); its_dev = its_device_find_locked(sc, pci_dev); mtx_unlock(&sc->its_mtx); if (its_dev == NULL) return (EINVAL); id = irq - its_dev->lpis.lpi_base; lpi_map_to_device(sc, its_dev, id, irq); its_bsh = rman_get_bushandle(&sc->its_res[0]); its_pa = vtophys(its_bsh); *addr = (its_pa + GITS_TRANSLATER); *data = id; return (0); } Index: projects/clang380-import/sys/boot/arm/at91/boot2/boot2.c =================================================================== --- projects/clang380-import/sys/boot/arm/at91/boot2/boot2.c (revision 294776) +++ projects/clang380-import/sys/boot/arm/at91/boot2/boot2.c (revision 294777) @@ -1,397 +1,361 @@ /*- * Copyright (c) 2008 John Hay - * Copyright (c) 2006 Warner Losh + * Copyright (c) 2006 M Warner Losh * Copyright (c) 1998 Robert Nordier * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include "lib.h" #include "board.h" +#include "paths.h" +#include "rbx.h" -#define RBX_ASKNAME 0x0 /* -a */ -#define RBX_SINGLE 0x1 /* -s */ -/* 0x2 is reserved for log2(RB_NOSYNC). */ -/* 0x3 is reserved for log2(RB_HALT). */ -/* 0x4 is reserved for log2(RB_INITNAME). */ -#define RBX_DFLTROOT 0x5 /* -r */ -/* #define RBX_KDB 0x6 -d */ -/* 0x7 is reserved for log2(RB_RDONLY). */ -/* 0x8 is reserved for log2(RB_DUMP). */ -/* 0x9 is reserved for log2(RB_MINIROOT). */ -#define RBX_CONFIG 0xa /* -c */ -#define RBX_VERBOSE 0xb /* -v */ -/* #define RBX_SERIAL 0xc -h */ -/* #define RBX_CDROM 0xd -C */ -/* 0xe is reserved for log2(RB_POWEROFF). */ -#define RBX_GDB 0xf /* -g */ -/* #define RBX_MUTE 0x10 -m */ -/* 0x11 is reserved for log2(RB_SELFTEST). */ -/* 0x12 is reserved for boot programs. */ -/* 0x13 is reserved for boot programs. 
*/ -/* #define RBX_PAUSE 0x14 -p */ -/* #define RBX_QUIET 0x15 -q */ -#define RBX_NOINTR 0x1c /* -n */ -/* 0x1d is reserved for log2(RB_MULTIPLE) and is just misnamed here. */ -/* #define RBX_DUAL 0x1d -D */ -/* 0x1f is reserved for log2(RB_BOOTINFO). */ - -/* pass: -a, -s, -r, -v, -g */ -#define RBX_MASK (OPT_SET(RBX_ASKNAME) | OPT_SET(RBX_SINGLE) | \ - OPT_SET(RBX_DFLTROOT) | \ - OPT_SET(RBX_VERBOSE) | \ - OPT_SET(RBX_GDB)) - -#define PATH_DOTCONFIG "/boot.config" -#define PATH_CONFIG "/boot/config" -//#define PATH_KERNEL "/boot/kernel/kernel" +#undef PATH_KERNEL #define PATH_KERNEL "/boot/kernel/kernel.gz.tramp" extern uint32_t _end; #define NOPT 6 - -#define OPT_SET(opt) (1 << (opt)) -#define OPT_CHECK(opt) ((opts) & OPT_SET(opt)) static const char optstr[NOPT] = "agnrsv"; static const unsigned char bootflags[NOPT] = { RBX_ASKNAME, RBX_GDB, RBX_NOINTR, RBX_DFLTROOT, RBX_SINGLE, RBX_VERBOSE }; unsigned board_id; /* board type to pass to kernel, if set by board_* code */ unsigned dsk_start; static char cmd[512]; static char kname[1024]; static uint32_t opts; static uint8_t dsk_meta; int main(void); static void load(void); static int parse(void); static int dskread(void *, unsigned, unsigned); #ifdef FIXUP_BOOT_DRV static void fixup_boot_drv(caddr_t, int, int, int); #endif #define UFS_SMALL_CGBASE #include "ufsread.c" #ifdef DEBUG #define DPRINTF(fmt, ...) printf(fmt, __VA_ARGS__) #else #define DPRINTF(fmt, ...) #endif static inline int xfsread(ufs_ino_t inode, void *buf, size_t nbyte) { if ((size_t)fsread(inode, buf, nbyte) != nbyte) return -1; return 0; } static inline void getstr(int c) { char *s; s = cmd; if (c == 0) c = getc(10000); for (;;) { switch (c) { case 0: break; case '\177': case '\b': if (s > cmd) { s--; printf("\b \b"); } break; case '\n': case '\r': *s = 0; return; default: if (s - cmd < sizeof(cmd) - 1) *s++ = c; xputchar(c); } c = getc(10000); } } int main(void) { int autoboot, c = 0; ufs_ino_t ino; dmadat = (void *)(0x20000000 + (16 << 20)); board_init(); autoboot = 1; /* Process configuration file */ if ((ino = lookup(PATH_CONFIG)) || (ino = lookup(PATH_DOTCONFIG))) fsread(ino, cmd, sizeof(cmd)); if (*cmd) { if (parse()) autoboot = 0; printf("%s: %s\n", PATH_CONFIG, cmd); /* Do not process this command twice */ *cmd = 0; } if (*kname == '\0') strcpy(kname, PATH_KERNEL); /* Present the user with the boot2 prompt. 
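 * A sketch of the loop that follows: getc(2) waits roughly two clock
 * ticks for a key; any keypress (or an earlier parse failure) cancels
 * autoboot and drops to the interactive prompt read by getstr().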
*/ for (;;) { printf("\nDefault: %s\nboot: ", kname); if (!autoboot || (OPT_CHECK(RBX_NOINTR) == 0 && (c = getc(2)) != 0)) getstr(c); xputchar('\n'); autoboot = 0; c = 0; if (parse()) xputchar('\a'); else load(); } return (1); } static void load(void) { Elf32_Ehdr eh; static Elf32_Phdr ep[2]; caddr_t p; ufs_ino_t ino; uint32_t addr; int i, j; #ifdef FIXUP_BOOT_DRV caddr_t staddr; int klen; staddr = (caddr_t)0xffffffff; klen = 0; #endif if (!(ino = lookup(kname))) { if (!ls) printf("No %s\n", kname); return; } if (xfsread(ino, &eh, sizeof(eh))) return; if (!IS_ELF(eh)) { printf("Invalid %s\n", "format"); return; } fs_off = eh.e_phoff; for (j = i = 0; i < eh.e_phnum && j < 2; i++) { if (xfsread(ino, ep + j, sizeof(ep[0]))) return; if (ep[j].p_type == PT_LOAD) j++; } for (i = 0; i < 2; i++) { p = (caddr_t)ep[i].p_paddr; fs_off = ep[i].p_offset; #ifdef FIXUP_BOOT_DRV if (staddr == (caddr_t)0xffffffff) staddr = p; klen += ep[i].p_filesz; #endif if (xfsread(ino, p, ep[i].p_filesz)) return; } addr = eh.e_entry; #ifdef FIXUP_BOOT_DRV fixup_boot_drv(staddr, klen, bootslice, bootpart); #endif ((void(*)(int, int, int, int))addr)(opts & RBX_MASK, board_id, 0, 0); } static int parse() { char *arg = cmd; char *ep, *p; int c, i; while ((c = *arg++)) { if (c == ' ' || c == '\t' || c == '\n') continue; for (p = arg; *p && *p != '\n' && *p != ' ' && *p != '\t'; p++); ep = p; if (*p) *p++ = 0; if (c == '-') { while ((c = *arg++)) { for (i = 0; c != optstr[i]; i++) if (i == NOPT - 1) return -1; opts ^= OPT_SET(bootflags[i]); } } else { arg--; if ((i = ep - arg)) { if ((size_t)i >= sizeof(kname)) return -1; memcpy(kname, arg, i + 1); } } arg = p; } return 0; } static int dskread(void *buf, unsigned lba, unsigned nblk) { struct dos_partition *dp; struct disklabel *d; char *sec; int i; if (!dsk_meta) { sec = dmadat->secbuf; dsk_start = 0; if (drvread(sec, DOSBBSECTOR, 1)) return -1; dp = (void *)(sec + DOSPARTOFF); for (i = 0; i < NDOSPART; i++) { if (dp[i].dp_typ == DOSPTYP_386BSD) break; } if (i == NDOSPART) return -1; /* * Although dp_start is aligned within the disk * partition structure, DOSPARTOFF is 446, which is * only word (2) aligned, not longword (4) aligned. * Cope by using memcpy to fetch the start of this * partition. */ memcpy(&dsk_start, &dp[1].dp_start, 4); if (drvread(sec, dsk_start + LABELSECTOR, 1)) return -1; d = (void *)(sec + LABELOFFSET); if (d->d_magic != DISKMAGIC || d->d_magic2 != DISKMAGIC) { printf("Invalid %s\n", "label"); return -1; } if (!d->d_partitions[0].p_size) { printf("Invalid %s\n", "partition"); return -1; } dsk_start += d->d_partitions[0].p_offset; dsk_start -= d->d_partitions[RAW_PART].p_offset; dsk_meta++; } return drvread(buf, dsk_start + lba, nblk); } #ifdef FIXUP_BOOT_DRV /* * fixup_boot_drv() will try to find the ROOTDEVNAME spec in the kernel * and change it to what was specified on the comandline or /boot.conf * file or to what was encountered on the disk. It will try to handle 3 * different disk layouts, raw (dangerously dedicated), slice only and * slice + partition. It will look for the following strings in the * kernel, but if it is one of the first three, the string in the kernel * must use the correct form to match the actual disk layout: * - ufs:ad0a * - ufs:ad0s1 * - ufs:ad0s1a * - ufs:ROOTDEVNAME * In the case of the first three strings, only the "a" at the end and * the "1" after the "s" will be modified, if they exist. The string * length will not be changed. In the case of the last string, the * whole string will be built up and nul, '\0' terminated. 
*/ static void fixup_boot_drv(caddr_t addr, int klen, int bs, int bp) { const u_int8_t op[] = "ufs:ROOTDEVNAME"; const u_int8_t op2[] = "ufs:ad0"; u_int8_t *p, *ps; DPRINTF("fixup_boot_drv: 0x%x, %d, slice %d, partition %d\n", (int)addr, klen, bs, bp); if (bs > 4) return; if (bp > 7) return; ps = memmem(addr, klen, op, sizeof(op)); if (ps != NULL) { p = ps + 4; /* past ufs: */ DPRINTF("Found it at 0x%x\n", (int)ps); p[0] = 'a'; p[1] = 'd'; p[2] = '0'; /* ad0 */ p += 3; if (bs > 0) { /* append slice */ *p++ = 's'; *p++ = bs + '0'; } if (disk_layout != DL_SLICE) { /* append partition */ *p++ = bp + 'a'; } *p = '\0'; } else { ps = memmem(addr, klen, op2, sizeof(op2) - 1); if (ps != NULL) { p = ps + sizeof(op2) - 1; DPRINTF("Found it at 0x%x\n", (int)ps); if (*p == 's') { /* fix slice */ p++; *p++ = bs + '0'; } if (*p == 'a') *p = bp + 'a'; } } if (ps == NULL) { printf("Could not locate \"%s\" to fix kernel boot device, " "check ROOTDEVNAME is set\n", op); return; } DPRINTF("Changed boot device to %s\n", ps); } #endif Index: projects/clang380-import/sys/boot/arm/ixp425/boot2/boot2.c =================================================================== --- projects/clang380-import/sys/boot/arm/ixp425/boot2/boot2.c (revision 294776) +++ projects/clang380-import/sys/boot/arm/ixp425/boot2/boot2.c (revision 294777) @@ -1,484 +1,446 @@ /*- * Copyright (c) 2008 John Hay * Copyright (c) 1998 Robert Nordier * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include "lib.h" +#include "paths.h" +#include "rbx.h" -#define RBX_ASKNAME 0x0 /* -a */ -#define RBX_SINGLE 0x1 /* -s */ -/* 0x2 is reserved for log2(RB_NOSYNC). */ -/* 0x3 is reserved for log2(RB_HALT). */ -/* 0x4 is reserved for log2(RB_INITNAME). */ -#define RBX_DFLTROOT 0x5 /* -r */ -/* #define RBX_KDB 0x6 -d */ -/* 0x7 is reserved for log2(RB_RDONLY). */ -/* 0x8 is reserved for log2(RB_DUMP). */ -/* 0x9 is reserved for log2(RB_MINIROOT). */ -#define RBX_CONFIG 0xa /* -c */ -#define RBX_VERBOSE 0xb /* -v */ -/* #define RBX_SERIAL 0xc -h */ -/* #define RBX_CDROM 0xd -C */ -/* 0xe is reserved for log2(RB_POWEROFF). */ -#define RBX_GDB 0xf /* -g */ -/* #define RBX_MUTE 0x10 -m */ -/* 0x11 is reserved for log2(RB_SELFTEST). */ -/* 0x12 is reserved for boot programs. */ -/* 0x13 is reserved for boot programs. */ -/* #define RBX_PAUSE 0x14 -p */ -/* #define RBX_QUIET 0x15 -q */ -#define RBX_NOINTR 0x1c /* -n */ -/* 0x1d is reserved for log2(RB_MULTIPLE) and is just misnamed here. */ -/* #define RBX_DUAL 0x1d -D */ -/* 0x1f is reserved for log2(RB_BOOTINFO). 
*/ - -/* pass: -a, -s, -r, -v, -g */ -#define RBX_MASK (OPT_SET(RBX_ASKNAME) | OPT_SET(RBX_SINGLE) | \ - OPT_SET(RBX_DFLTROOT) | \ - OPT_SET(RBX_VERBOSE) | \ - OPT_SET(RBX_GDB)) - -#define PATH_DOTCONFIG "/boot.config" -#define PATH_CONFIG "/boot/config" -#define PATH_KERNEL "/boot/kernel/kernel" - extern uint32_t _end; #define NOPT 6 - -#define OPT_SET(opt) (1 << (opt)) -#define OPT_CHECK(opt) ((opts) & OPT_SET(opt)) static const char optstr[NOPT] = "agnrsv"; static const unsigned char flags[NOPT] = { RBX_ASKNAME, RBX_GDB, RBX_NOINTR, RBX_DFLTROOT, RBX_SINGLE, RBX_VERBOSE }; static unsigned dsk_start; static char cmd[512]; static char kname[1024]; static uint32_t opts; static uint8_t dsk_meta; static int bootslice; static int bootpart; static int disk_layout; #define DL_UNKNOWN 0 #define DL_RAW 1 /* Dangerously dedicated */ #define DL_SLICE 2 /* Use only slices (DOS partitions) */ #define DL_SLICEPART 3 /* Use slices and partitions */ static void load(void); static int parse(void); static int dskread(void *, unsigned, unsigned); static int drvread(void *, unsigned, unsigned); #ifdef FIXUP_BOOT_DRV static void fixup_boot_drv(caddr_t, int, int, int); #endif #include "ufsread.c" #ifdef DEBUG #define DPRINTF(fmt, ...) printf(fmt, __VA_ARGS__) #else #define DPRINTF(fmt, ...) #endif static inline int xfsread(ufs_ino_t inode, void *buf, size_t nbyte) { if ((size_t)fsread(inode, buf, nbyte) != nbyte) return -1; return 0; } static inline void getstr(int c) { char *s; s = cmd; if (c == 0) c = getc(10000); for (;;) { switch (c) { case 0: break; case '\177': case '\b': if (s > cmd) { s--; printf("\b \b"); } break; case '\n': case '\r': *s = 0; return; default: if (s - cmd < sizeof(cmd) - 1) *s++ = c; xputchar(c); } c = getc(10000); } } int main(void) { const char *bt; int autoboot, c = 0; ufs_ino_t ino; dmadat = (void *)(0x1c0000); p_memset((char *)dmadat, 0, 32 * 1024); bt = board_init(); printf("FreeBSD ARM (%s) boot2 v%d.%d\n", bt, 0, 4); autoboot = 1; /* Process configuration file */ if ((ino = lookup(PATH_CONFIG)) || (ino = lookup(PATH_DOTCONFIG))) fsread(ino, cmd, sizeof(cmd)); if (*cmd) { if (parse()) autoboot = 0; printf("%s: %s\n", PATH_CONFIG, cmd); /* Do not process this command twice */ *cmd = 0; } if (*kname == '\0') strcpy(kname, PATH_KERNEL); /* Present the user with the boot2 prompt. 
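 * (As in the at91 variant above, a two-tick getc() window implements
 * autoboot; input that fails parse() just rings the bell and
 * re-prompts.)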
*/ for (;;) { printf("\nDefault: %s\nboot: ", kname); if (!autoboot || (OPT_CHECK(RBX_NOINTR) == 0 && (c = getc(2)) != 0)) getstr(c); xputchar('\n'); autoboot = 0; c = 0; DPRINTF("cmd is '%s'\n", cmd); if (parse()) xputchar('\a'); else load(); } } static void load(void) { Elf32_Ehdr eh; static Elf32_Phdr ep[2]; caddr_t p; ufs_ino_t ino; uint32_t addr; int i, j; #ifdef FIXUP_BOOT_DRV caddr_t staddr; int klen; staddr = (caddr_t)0xffffffff; klen = 0; #endif if (!(ino = lookup(kname))) { if (!ls) printf("No %s\n", kname); return; } DPRINTF("Found %s\n", kname); if (xfsread(ino, &eh, sizeof(eh))) return; if (!IS_ELF(eh)) { printf("Invalid %s\n", "format"); return; } fs_off = eh.e_phoff; for (j = i = 0; i < eh.e_phnum && j < 2; i++) { if (xfsread(ino, ep + j, sizeof(ep[0]))) return; if (ep[j].p_type == PT_LOAD) j++; } for (i = 0; i < 2; i++) { p = (caddr_t)(ep[i].p_paddr & 0x0fffffff); fs_off = ep[i].p_offset; #ifdef FIXUP_BOOT_DRV if (staddr == (caddr_t)0xffffffff) staddr = p; klen += ep[i].p_filesz; #endif if (xfsread(ino, p, ep[i].p_filesz)) return; } addr = eh.e_entry & 0x0fffffff; DPRINTF("Entry point %x for %s\n", addr, kname); clr_board(); #ifdef FIXUP_BOOT_DRV fixup_boot_drv(staddr, klen, bootslice, bootpart); #endif ((void(*)(int))addr)(RB_BOOTINFO /* XXX | (opts & RBX_MASK) */); } static int parse() { char *arg = cmd; char *ep, *p; int c, i; while ((c = *arg++)) { if (c == ' ' || c == '\t' || c == '\n') continue; for (p = arg; *p && *p != '\n' && *p != ' ' && *p != '\t'; p++); ep = p; if (*p) *p++ = 0; if (c == '-') { while ((c = *arg++)) { for (i = 0; c != optstr[i]; i++) if (i == NOPT - 1) return -1; opts ^= OPT_SET(flags[i]); } } else { arg--; /* look for ad0s1a:... | ad0s1:... */ if (strlen(arg) > 6 && arg[0] == 'a' && arg[1] == 'd' && arg[3] == 's' && (arg[5] == ':' || arg[6] == ':')) { /* XXX Should also handle disk. */ bootslice = arg[4] - '0'; if (bootslice < 1 || bootslice > 4) return (-1); bootpart = 0; if (arg[5] != ':') bootpart = arg[5] - 'a'; if (bootpart < 0 || bootpart > 7) return (-1); dsk_meta = 0; if (arg[5] == ':') arg += 6; else arg += 7; /* look for ad0a:... */ } else if (strlen(arg) > 4 && arg[0] == 'a' && arg[1] == 'd' && arg[2] == '0' && arg[4] == ':') { bootslice = 0; bootpart = arg[3] - 'a'; if (bootpart < 0 || bootpart > 7) return (-1); dsk_meta = 0; arg += 5; } if ((i = ep - arg)) { if ((size_t)i >= sizeof(kname)) return -1; memcpy(kname, arg, i + 1); } } arg = p; } return 0; } /* * dskread() will try to handle the disk layouts that are typically * encountered. * - raw or "Dangerously Dedicated" mode. No real slice table, just the * default one that is included with bsdlabel -B. Typically this is * used with ROOTDEVNAME=\"ufs:ad0a\". * - slice only. Only a slice table is installed with no bsd label or * bsd partition table. This is typically used with * ROOTDEVNAME=\"ufs:ad0s1\". * - slice + bsd label + partition table. This is typically done with * with fdisk + bsdlabel and is used with ROOTDEVNAME=\"ufs:ad0s1a\". */ static int dskread(void *buf, unsigned lba, unsigned nblk) { struct dos_partition *dp; struct disklabel *d; char *sec; int i; if (!dsk_meta) { sec = dmadat->secbuf; dsk_start = 0; if (drvread(sec, DOSBBSECTOR, 1)) return -1; dp = (void *)(sec + DOSPARTOFF); if (bootslice != 0) { i = bootslice - 1; if (dp[i].dp_typ != DOSPTYP_386BSD) return -1; } else { for (i = 0; i < NDOSPART; i++) { if ((dp[i].dp_typ == DOSPTYP_386BSD) && (dp[i].dp_flag == 0x80)) break; } } if (i != NDOSPART) { bootslice = i + 1; DPRINTF("Found an active fbsd slice. 
(%d)\n", i + 1); /* * Although dp_start is aligned within the disk * partition structure, DOSPARTOFF is 446, which * is only word (2) aligned, not longword (4) * aligned. Cope by using memcpy to fetch the * start of this partition. */ memcpy(&dsk_start, &dp[i].dp_start, 4); dsk_start = swap32(dsk_start); DPRINTF("dsk_start %x\n", dsk_start); if ((bootslice == 4) && (dsk_start == 0)) { disk_layout = DL_RAW; bootslice = 0; } } if (drvread(sec, dsk_start + LABELSECTOR, 1)) return -1; d = (void *)(sec + LABELOFFSET); if ((d->d_magic == DISKMAGIC && d->d_magic2 == DISKMAGIC) || (swap32(d->d_magic) == DISKMAGIC && swap32(d->d_magic2) == DISKMAGIC)) { DPRINTF("p_size = %x\n", !d->d_partitions[bootpart].p_size); if (!d->d_partitions[bootpart].p_size) { printf("Invalid partition\n"); return -1; } DPRINTF("p_offset %x, RAW %x\n", swap32(d->d_partitions[bootpart].p_offset), swap32(d->d_partitions[RAW_PART].p_offset)); dsk_start += swap32(d->d_partitions[bootpart].p_offset); dsk_start -= swap32(d->d_partitions[RAW_PART].p_offset); if ((disk_layout == DL_UNKNOWN) && (bootslice == 0)) disk_layout = DL_RAW; else if (disk_layout == DL_UNKNOWN) disk_layout = DL_SLICEPART; } else { disk_layout = DL_SLICE; DPRINTF("Invalid %s\n", "label"); } DPRINTF("bootslice %d, bootpart %d, dsk_start %u\n", bootslice, bootpart, dsk_start); dsk_meta++; } return drvread(buf, dsk_start + lba, nblk); } static int drvread(void *buf, unsigned lba, unsigned nblk) { static unsigned c = 0x2d5c7c2f; printf("%c\b", c = c << 8 | c >> 24); return (avila_read((char *)buf, lba, nblk)); } #ifdef FIXUP_BOOT_DRV /* * fixup_boot_drv() will try to find the ROOTDEVNAME spec in the kernel * and change it to what was specified on the comandline or /boot.conf * file or to what was encountered on the disk. It will try to handle 3 * different disk layouts, raw (dangerously dedicated), slice only and * slice + partition. It will look for the following strings in the * kernel, but if it is one of the first three, the string in the kernel * must use the correct form to match the actual disk layout: * - ufs:ad0a * - ufs:ad0s1 * - ufs:ad0s1a * - ufs:ROOTDEVNAME * In the case of the first three strings, only the "a" at the end and * the "1" after the "s" will be modified, if they exist. The string * length will not be changed. In the case of the last string, the * whole string will be built up and nul, '\0' terminated. 
*/ static void fixup_boot_drv(caddr_t addr, int klen, int bs, int bp) { const u_int8_t op[] = "ufs:ROOTDEVNAME"; const u_int8_t op2[] = "ufs:ad0"; u_int8_t *p, *ps; DPRINTF("fixup_boot_drv: 0x%x, %d, slice %d, partition %d\n", (int)addr, klen, bs, bp); if (bs > 4) return; if (bp > 7) return; ps = memmem(addr, klen, op, sizeof(op)); if (ps != NULL) { p = ps + 4; /* past ufs: */ DPRINTF("Found it at 0x%x\n", (int)ps); p[0] = 'a'; p[1] = 'd'; p[2] = '0'; /* ad0 */ p += 3; if (bs > 0) { /* append slice */ *p++ = 's'; *p++ = bs + '0'; } if (disk_layout != DL_SLICE) { /* append partition */ *p++ = bp + 'a'; } *p = '\0'; } else { ps = memmem(addr, klen, op2, sizeof(op2) - 1); if (ps != NULL) { p = ps + sizeof(op2) - 1; DPRINTF("Found it at 0x%x\n", (int)ps); if (*p == 's') { /* fix slice */ p++; *p++ = bs + '0'; } if (*p == 'a') *p = bp + 'a'; } } if (ps == NULL) { printf("Could not locate \"%s\" to fix kernel boot device, " "check ROOTDEVNAME is set\n", op); return; } DPRINTF("Changed boot device to %s\n", ps); } #endif Index: projects/clang380-import/sys/boot/common/paths.h =================================================================== --- projects/clang380-import/sys/boot/common/paths.h (nonexistent) +++ projects/clang380-import/sys/boot/common/paths.h (revision 294777) @@ -0,0 +1,39 @@ +/*- + * Copyright (c) 2016 M. Warner Losh + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD$ + */ + +#ifndef _PATHS_H_ +#define _PATHS_H_ + +#define PATH_DOTCONFIG "/boot.config" +#define PATH_CONFIG "/boot/config" +#define PATH_BOOT3 "/boot/loader" +#define PATH_LOADER "/boot/loader" +#define PATH_LOADER_EFI "/boot/loader.efi" +#define PATH_KERNEL "/boot/kernel/kernel" + +#endif /* _PATHS_H_ */ Property changes on: projects/clang380-import/sys/boot/common/paths.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/boot/common/rbx.h =================================================================== --- projects/clang380-import/sys/boot/common/rbx.h (nonexistent) +++ projects/clang380-import/sys/boot/common/rbx.h (revision 294777) @@ -0,0 +1,61 @@ +/*- + * Copyright (c) 1998 Robert Nordier + * All rights reserved. + * + * Redistribution and use in source and binary forms are freely + * permitted provided that the above copyright notice and this + * paragraph and the following disclaimer are duplicated in all + * such forms. + * + * This software is provided "AS IS" and without any express or + * implied warranties, including, without limitation, the implied + * warranties of merchantability and fitness for a particular + * purpose. + * + * $FreeBSD$ + */ + +#ifndef _RBX_H_ +#define _RBX_H_ + +#define RBX_ASKNAME 0x0 /* -a */ +#define RBX_SINGLE 0x1 /* -s */ +/* 0x2 is reserved for log2(RB_NOSYNC). */ +/* 0x3 is reserved for log2(RB_HALT). */ +/* 0x4 is reserved for log2(RB_INITNAME). */ +#define RBX_DFLTROOT 0x5 /* -r */ +#define RBX_KDB 0x6 /* -d */ +/* 0x7 is reserved for log2(RB_RDONLY). */ +/* 0x8 is reserved for log2(RB_DUMP). */ +/* 0x9 is reserved for log2(RB_MINIROOT). */ +#define RBX_CONFIG 0xa /* -c */ +#define RBX_VERBOSE 0xb /* -v */ +#define RBX_SERIAL 0xc /* -h */ +#define RBX_CDROM 0xd /* -C */ +/* 0xe is reserved for log2(RB_POWEROFF). */ +#define RBX_GDB 0xf /* -g */ +#define RBX_MUTE 0x10 /* -m */ +/* 0x11 is reserved for log2(RB_SELFTEST). */ +/* 0x12 is reserved for boot programs. */ +/* 0x13 is reserved for boot programs. */ +#define RBX_PAUSE 0x14 /* -p */ +#define RBX_QUIET 0x15 /* -q */ +#define RBX_NOINTR 0x1c /* -n */ +/* 0x1d is reserved for log2(RB_MULTIPLE) and is just misnamed here. */ +#define RBX_DUAL 0x1d /* -D */ +/* 0x1f is reserved for log2(RB_BOOTINFO). 
*/ + +/* pass: -a, -s, -r, -d, -c, -v, -h, -C, -g, -m, -p, -D */ +#define RBX_MASK (OPT_SET(RBX_ASKNAME) | OPT_SET(RBX_SINGLE) | \ + OPT_SET(RBX_DFLTROOT) | OPT_SET(RBX_KDB ) | \ + OPT_SET(RBX_CONFIG) | OPT_SET(RBX_VERBOSE) | \ + OPT_SET(RBX_SERIAL) | OPT_SET(RBX_CDROM) | \ + OPT_SET(RBX_GDB ) | OPT_SET(RBX_MUTE) | \ + OPT_SET(RBX_PAUSE) | OPT_SET(RBX_DUAL)) + +#define OPT_SET(opt) (1 << (opt)) +#define OPT_CHECK(opt) ((opts) & OPT_SET(opt)) + +extern uint32_t opts; + +#endif /* !_RBX_H_ */ Property changes on: projects/clang380-import/sys/boot/common/rbx.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/boot/efi/boot1/boot1.c =================================================================== --- projects/clang380-import/sys/boot/efi/boot1/boot1.c (revision 294776) +++ projects/clang380-import/sys/boot/efi/boot1/boot1.c (revision 294777) @@ -1,361 +1,396 @@ /*- * Copyright (c) 1998 Robert Nordier * All rights reserved. * Copyright (c) 2001 Robert Drehmel * All rights reserved. * Copyright (c) 2014 Nathan Whitehorn * All rights reserved. * Copyright (c) 2015 Eric McCorkle * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include "boot_module.h" +#include "paths.h" -#define _PATH_LOADER "/boot/loader.efi" - static const boot_module_t *boot_modules[] = { #ifdef EFI_ZFS_BOOT &zfs_module, #endif #ifdef EFI_UFS_BOOT &ufs_module #endif }; #define NUM_BOOT_MODULES (sizeof(boot_modules) / sizeof(boot_module_t*)) /* The initial number of handles used to query EFI for partitions. */ #define NUM_HANDLES_INIT 24 void putchar(int c); EFI_STATUS efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE* Xsystab); static void try_load(const boot_module_t* mod); static EFI_STATUS probe_handle(EFI_HANDLE h); EFI_SYSTEM_TABLE *systab; EFI_BOOT_SERVICES *bs; static EFI_HANDLE *image; static EFI_GUID BlockIoProtocolGUID = BLOCK_IO_PROTOCOL; static EFI_GUID DevicePathGUID = DEVICE_PATH_PROTOCOL; static EFI_GUID LoadedImageGUID = LOADED_IMAGE_PROTOCOL; static EFI_GUID ConsoleControlGUID = EFI_CONSOLE_CONTROL_PROTOCOL_GUID; /* * Provide Malloc / Free backed by EFIs AllocatePool / FreePool which ensures * memory is correctly aligned avoiding EFI_INVALID_PARAMETER returns from * EFI methods. */ void * Malloc(size_t len, const char *file __unused, int line __unused) { void *out; if (bs->AllocatePool(EfiLoaderData, len, &out) == EFI_SUCCESS) return (out); return (NULL); } void Free(void *buf, const char *file __unused, int line __unused) { (void)bs->FreePool(buf); } /* * This function only returns if it fails to load the kernel. If it * succeeds, it simply boots the kernel. 
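 * (On success, control transfers to the next stage via StartImage();
 * the contents of /boot.config or /boot/config, when readable, ride
 * along as an ASCII LoadOptions string.)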
*/ void try_load(const boot_module_t *mod) { - size_t bufsize; + size_t bufsize, cmdsize; void *buf; + char *cmd; dev_info_t *dev; EFI_HANDLE loaderhandle; EFI_LOADED_IMAGE *loaded_image; EFI_STATUS status; - status = mod->load(_PATH_LOADER, &dev, &buf, &bufsize); + /* + * Read in and parse the command line from /boot.config or /boot/config, + * if present. We'll pass it to the next stage via a simple ASCII + * string. loader.efi has a hack for ASCII strings, so we'll use that to + * keep the size down here. We only try to read the alternate file if + * we get EFI_NOT_FOUND because all other errors mean that the boot_module + * had trouble with the filesystem. We could return early, but we'll let + * loading the actual kernel sort all that out. Since these files are + * optional, we don't report errors in trying to read them. + */ + cmd = NULL; + cmdsize = 0; + status = mod->load(PATH_DOTCONFIG, &dev, &buf, &bufsize); if (status == EFI_NOT_FOUND) + status = mod->load(PATH_CONFIG, &dev, &buf, &bufsize); + if (status == EFI_SUCCESS) { + cmdsize = bufsize + 1; + cmd = malloc(cmdsize); + if (cmd == NULL) { + free(buf); + return; + } + memcpy(cmd, buf, bufsize); + cmd[bufsize] = '\0'; + free(buf); + } + + status = mod->load(PATH_LOADER_EFI, &dev, &buf, &bufsize); + if (status == EFI_NOT_FOUND) return; if (status != EFI_SUCCESS) { - printf("%s failed to load %s (%lu)\n", mod->name, _PATH_LOADER, - EFI_ERROR_CODE(status)); + printf("%s failed to load %s (%lu)\n", mod->name, + PATH_LOADER_EFI, EFI_ERROR_CODE(status)); return; } if ((status = bs->LoadImage(TRUE, image, dev->devpath, buf, bufsize, &loaderhandle)) != EFI_SUCCESS) { printf("Failed to load image provided by %s, size: %zu, (%lu)\n", mod->name, bufsize, EFI_ERROR_CODE(status)); return; } + if (cmd != NULL) + printf(" command args: %s\n", cmd); + if ((status = bs->HandleProtocol(loaderhandle, &LoadedImageGUID, (VOID**)&loaded_image)) != EFI_SUCCESS) { printf("Failed to query LoadedImage provided by %s (%lu)\n", mod->name, EFI_ERROR_CODE(status)); return; } loaded_image->DeviceHandle = dev->devhandle; + loaded_image->LoadOptionsSize = cmdsize; + loaded_image->LoadOptions = cmd; if ((status = bs->StartImage(loaderhandle, NULL, NULL)) != EFI_SUCCESS) { printf("Failed to start image provided by %s (%lu)\n", mod->name, EFI_ERROR_CODE(status)); + free(cmd); + loaded_image->LoadOptionsSize = 0; + loaded_image->LoadOptions = NULL; return; } } EFI_STATUS efi_main(EFI_HANDLE Ximage, EFI_SYSTEM_TABLE *Xsystab) { EFI_HANDLE *handles; EFI_STATUS status; EFI_CONSOLE_CONTROL_PROTOCOL *ConsoleControl = NULL; SIMPLE_TEXT_OUTPUT_INTERFACE *conout = NULL; UINTN i, max_dim, best_mode, cols, rows, hsize, nhandles; /* Basic initialization */ systab = Xsystab; image = Ximage; bs = Xsystab->BootServices; /* Set up the console, so printf works. */ status = bs->LocateProtocol(&ConsoleControlGUID, NULL, (VOID **)&ConsoleControl); if (status == EFI_SUCCESS) (void)ConsoleControl->SetMode(ConsoleControl, EfiConsoleControlScreenText); /* * Reset the console and find the best text mode.
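 * "Best" is judged purely by area: the loop below queries each text
 * mode in turn and keeps the one with the largest cols * rows product.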
*/ conout = systab->ConOut; conout->Reset(conout, TRUE); max_dim = best_mode = 0; for (i = 0; ; i++) { status = conout->QueryMode(conout, i, &cols, &rows); if (EFI_ERROR(status)) break; if (cols * rows > max_dim) { max_dim = cols * rows; best_mode = i; } } if (max_dim > 0) conout->SetMode(conout, best_mode); conout->EnableCursor(conout, TRUE); conout->ClearScreen(conout); printf("\n>> FreeBSD EFI boot block\n"); - printf(" Loader path: %s\n\n", _PATH_LOADER); + printf(" Loader path: %s\n\n", PATH_LOADER_EFI); printf(" Initializing modules:"); for (i = 0; i < NUM_BOOT_MODULES; i++) { if (boot_modules[i] == NULL) continue; printf(" %s", boot_modules[i]->name); if (boot_modules[i]->init != NULL) boot_modules[i]->init(); } putchar('\n'); /* Get all the device handles */ hsize = (UINTN)NUM_HANDLES_INIT * sizeof(EFI_HANDLE); if ((status = bs->AllocatePool(EfiLoaderData, hsize, (void **)&handles)) != EFI_SUCCESS) panic("Failed to allocate %d handles (%lu)", NUM_HANDLES_INIT, EFI_ERROR_CODE(status)); status = bs->LocateHandle(ByProtocol, &BlockIoProtocolGUID, NULL, &hsize, handles); switch (status) { case EFI_SUCCESS: break; case EFI_BUFFER_TOO_SMALL: (void)bs->FreePool(handles); if ((status = bs->AllocatePool(EfiLoaderData, hsize, (void **)&handles) != EFI_SUCCESS)) { panic("Failed to allocate %zu handles (%lu)", hsize / sizeof(*handles), EFI_ERROR_CODE(status)); } status = bs->LocateHandle(ByProtocol, &BlockIoProtocolGUID, NULL, &hsize, handles); if (status != EFI_SUCCESS) panic("Failed to get device handles (%lu)\n", EFI_ERROR_CODE(status)); break; default: panic("Failed to get device handles (%lu)", EFI_ERROR_CODE(status)); } /* Scan all partitions, probing with all modules. */ nhandles = hsize / sizeof(*handles); printf(" Probing %zu block devices...", nhandles); for (i = 0; i < nhandles; i++) { status = probe_handle(handles[i]); switch (status) { case EFI_UNSUPPORTED: printf("."); break; case EFI_SUCCESS: printf("+"); break; default: printf("x"); break; } } printf(" done\n"); /* Status summary. */ for (i = 0; i < NUM_BOOT_MODULES; i++) { if (boot_modules[i] != NULL) { printf(" "); boot_modules[i]->status(); } } /* Select a partition to boot by trying each module in order. */ for (i = 0; i < NUM_BOOT_MODULES; i++) if (boot_modules[i] != NULL) try_load(boot_modules[i]); /* If we get here, we're out of luck... */ panic("No bootable partitions found!"); } static EFI_STATUS probe_handle(EFI_HANDLE h) { dev_info_t *devinfo; EFI_BLOCK_IO *blkio; EFI_DEVICE_PATH *devpath; EFI_STATUS status; UINTN i; /* Figure out if we're dealing with an actual partition. 
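 * (A handle qualifies only if it exposes both the DevicePath and
 * BlockIo protocols and its media is flagged LogicalPartition; whole
 * disks are skipped.)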
*/ status = bs->HandleProtocol(h, &DevicePathGUID, (void **)&devpath); if (status == EFI_UNSUPPORTED) return (status); if (status != EFI_SUCCESS) { DPRINTF("\nFailed to query DevicePath (%lu)\n", EFI_ERROR_CODE(status)); return (status); } while (!IsDevicePathEnd(NextDevicePathNode(devpath))) devpath = NextDevicePathNode(devpath); status = bs->HandleProtocol(h, &BlockIoProtocolGUID, (void **)&blkio); if (status == EFI_UNSUPPORTED) return (status); if (status != EFI_SUCCESS) { DPRINTF("\nFailed to query BlockIoProtocol (%lu)\n", EFI_ERROR_CODE(status)); return (status); } if (!blkio->Media->LogicalPartition) return (EFI_UNSUPPORTED); /* Run through each module, see if it can load this partition */ for (i = 0; i < NUM_BOOT_MODULES; i++) { if (boot_modules[i] == NULL) continue; if ((status = bs->AllocatePool(EfiLoaderData, sizeof(*devinfo), (void **)&devinfo)) != EFI_SUCCESS) { DPRINTF("\nFailed to allocate devinfo (%lu)\n", EFI_ERROR_CODE(status)); continue; } devinfo->dev = blkio; devinfo->devpath = devpath; devinfo->devhandle = h; devinfo->devdata = NULL; devinfo->next = NULL; status = boot_modules[i]->probe(devinfo); if (status == EFI_SUCCESS) return (EFI_SUCCESS); (void)bs->FreePool(devinfo); } return (EFI_UNSUPPORTED); } void add_device(dev_info_t **devinfop, dev_info_t *devinfo) { dev_info_t *dev; if (*devinfop == NULL) { *devinfop = devinfo; return; } for (dev = *devinfop; dev->next != NULL; dev = dev->next) ; dev->next = devinfo; } void panic(const char *fmt, ...) { va_list ap; printf("panic: "); va_start(ap, fmt); vprintf(fmt, ap); va_end(ap); printf("\n"); while (1) {} } void putchar(int c) { CHAR16 buf[2]; if (c == '\n') { buf[0] = '\r'; buf[1] = 0; systab->ConOut->OutputString(systab->ConOut, buf); } buf[0] = c; buf[1] = 0; systab->ConOut->OutputString(systab->ConOut, buf); } Index: projects/clang380-import/sys/boot/efi/libefi/libefi.c =================================================================== --- projects/clang380-import/sys/boot/efi/libefi/libefi.c (revision 294776) +++ projects/clang380-import/sys/boot/efi/libefi/libefi.c (revision 294777) @@ -1,198 +1,198 @@ /*- * Copyright (c) 2000 Doug Rabson * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include EFI_HANDLE IH; EFI_SYSTEM_TABLE *ST; EFI_BOOT_SERVICES *BS; EFI_RUNTIME_SERVICES *RS; static EFI_PHYSICAL_ADDRESS heap; static UINTN heapsize; static CHAR16 * arg_skipsep(CHAR16 *argp) { - while (*argp == ' ' || *argp == '\t') + while (*argp == ' ' || *argp == '\t' || *argp == '\n') argp++; return (argp); } static CHAR16 * arg_skipword(CHAR16 *argp) { - while (*argp && *argp != ' ' && *argp != '\t') + while (*argp && *argp != ' ' && *argp != '\t' && *argp != '\n') argp++; return (argp); } void * efi_get_table(EFI_GUID *tbl) { EFI_GUID *id; int i; for (i = 0; i < ST->NumberOfTableEntries; i++) { id = &ST->ConfigurationTable[i].VendorGuid; if (!memcmp(id, tbl, sizeof(EFI_GUID))) return (ST->ConfigurationTable[i].VendorTable); } return (NULL); } void exit(EFI_STATUS exit_code) { BS->FreePages(heap, EFI_SIZE_TO_PAGES(heapsize)); BS->Exit(IH, exit_code, 0, NULL); } void efi_main(EFI_HANDLE image_handle, EFI_SYSTEM_TABLE *system_table) { static EFI_GUID image_protocol = LOADED_IMAGE_PROTOCOL; static EFI_GUID console_control_protocol = EFI_CONSOLE_CONTROL_PROTOCOL_GUID; EFI_CONSOLE_CONTROL_PROTOCOL *console_control = NULL; EFI_LOADED_IMAGE *img; CHAR16 *argp, *args, **argv; EFI_STATUS status; int argc, addprog; IH = image_handle; ST = system_table; BS = ST->BootServices; RS = ST->RuntimeServices; status = BS->LocateProtocol(&console_control_protocol, NULL, (VOID **)&console_control); if (status == EFI_SUCCESS) (void)console_control->SetMode(console_control, EfiConsoleControlScreenText); heapsize = 3 * 1024 * 1024; status = BS->AllocatePages(AllocateAnyPages, EfiLoaderData, EFI_SIZE_TO_PAGES(heapsize), &heap); if (status != EFI_SUCCESS) BS->Exit(IH, status, 0, NULL); setheap((void *)(uintptr_t)heap, (void *)(uintptr_t)(heap + heapsize)); /* Use exit() from here on... */ status = BS->HandleProtocol(IH, &image_protocol, (VOID**)&img); if (status != EFI_SUCCESS) exit(status); /* * Pre-process the (optional) load options. If the option string * is given as an ASCII string, we use a poor man's ASCII to * Unicode-16 translation. The size of the option string as given * to us includes the terminating null character. We assume the * string is an ASCII string if strlen() plus the terminating * '\0' is less than LoadOptionsSize. Even if all Unicode-16 * characters have the upper 8 bits non-zero, the terminating * null character will cause a one-off. * If the string is already in Unicode-16, we make a copy so that * we know we can always modify the string. */ if (img->LoadOptionsSize > 0 && img->LoadOptions != NULL) { if (img->LoadOptionsSize == strlen(img->LoadOptions) + 1) { args = malloc(img->LoadOptionsSize << 1); for (argc = 0; argc < img->LoadOptionsSize; argc++) args[argc] = ((char*)img->LoadOptions)[argc]; } else { args = malloc(img->LoadOptionsSize); memcpy(args, img->LoadOptions, img->LoadOptionsSize); } } else args = NULL; /* * Use a quick and dirty algorithm to build the argv vector. We * first count the number of words. Then, after allocating the * vector, we split the string up. We don't deal with quotes or * other more advanced shell features. * The EFI shell will pass the name of the image as the first * word in the argument list. This does not happen if we're * loaded by the boot manager. This is not so easy to figure * out though. The ParentHandle is not always NULL, because * there can be a function (=image) that will perform the task * for the boot manager. */ /* Part 1: Figure out if we need to add our program name. 
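 * (A sketch of the heuristic: with no load options, no parent image,
 * or no usable FilePath node we assume the boot manager launched us
 * and prepend "loader.efi" as argv[0].)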
*/ addprog = (args == NULL || img->ParentHandle == NULL || img->FilePath == NULL) ? 1 : 0; if (!addprog) { addprog = (DevicePathType(img->FilePath) != MEDIA_DEVICE_PATH || DevicePathSubType(img->FilePath) != MEDIA_FILEPATH_DP || DevicePathNodeLength(img->FilePath) <= sizeof(FILEPATH_DEVICE_PATH)) ? 1 : 0; if (!addprog) { /* XXX todo. */ } } /* Part 2: count words. */ argc = (addprog) ? 1 : 0; argp = args; while (argp != NULL && *argp != 0) { argp = arg_skipsep(argp); if (*argp == 0) break; argc++; argp = arg_skipword(argp); } /* Part 3: build vector. */ argv = malloc((argc + 1) * sizeof(CHAR16*)); argc = 0; if (addprog) argv[argc++] = (CHAR16 *)L"loader.efi"; argp = args; while (argp != NULL && *argp != 0) { argp = arg_skipsep(argp); if (*argp == 0) break; argv[argc++] = argp; argp = arg_skipword(argp); /* Terminate the words. */ if (*argp != 0) *argp++ = 0; } argv[argc] = NULL; status = main(argc, argv); exit(status); } Index: projects/clang380-import/sys/boot/efi/loader/main.c =================================================================== --- projects/clang380-import/sys/boot/efi/loader/main.c (revision 294776) +++ projects/clang380-import/sys/boot/efi/loader/main.c (revision 294777) @@ -1,582 +1,663 @@ /*- * Copyright (c) 2008-2010 Rui Paulo * Copyright (c) 2006 Marcel Moolenaar * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include +#include +#include #include #include #include #include #include #include #include #ifdef EFI_ZFS_BOOT #include #endif #include "loader_efi.h" extern char bootprog_name[]; extern char bootprog_rev[]; extern char bootprog_date[]; extern char bootprog_maker[]; struct arch_switch archsw; /* MI/MD interface boundary */ EFI_GUID acpi = ACPI_TABLE_GUID; EFI_GUID acpi20 = ACPI_20_TABLE_GUID; EFI_GUID devid = DEVICE_PATH_PROTOCOL; EFI_GUID imgid = LOADED_IMAGE_PROTOCOL; EFI_GUID mps = MPS_TABLE_GUID; EFI_GUID netid = EFI_SIMPLE_NETWORK_PROTOCOL; EFI_GUID smbios = SMBIOS_TABLE_GUID; EFI_GUID dxe = DXE_SERVICES_TABLE_GUID; EFI_GUID hoblist = HOB_LIST_TABLE_GUID; EFI_GUID memtype = MEMORY_TYPE_INFORMATION_TABLE_GUID; EFI_GUID debugimg = DEBUG_IMAGE_INFO_TABLE_GUID; EFI_GUID fdtdtb = FDT_TABLE_GUID; #ifdef EFI_ZFS_BOOT static void efi_zfs_probe(void); #endif /* * Need this because EFI uses UTF-16 unicode string constants, but we * use UTF-8. 
We can't use printf due to the possibility of \0 and we * don't support wide characters either. */ static void print_str16(const CHAR16 *str) { int i; for (i = 0; str[i]; i++) printf("%c", (char)str[i]); } +static void +cp16to8(const CHAR16 *src, char *dst, size_t len) +{ + size_t i; + + for (i = 0; i < len && src[i]; i++) + dst[i] = (char)src[i]; +} + EFI_STATUS main(int argc, CHAR16 *argv[]) { char var[128]; EFI_LOADED_IMAGE *img; EFI_GUID *guid; - int i, j, vargood, unit; + int i, j, vargood, unit, howto; struct devsw *dev; uint64_t pool_guid; UINTN k; archsw.arch_autoload = efi_autoload; archsw.arch_getdev = efi_getdev; archsw.arch_copyin = efi_copyin; archsw.arch_copyout = efi_copyout; archsw.arch_readin = efi_readin; #ifdef EFI_ZFS_BOOT /* Note this needs to be set before ZFS init. */ archsw.arch_zfs_probe = efi_zfs_probe; #endif /* * XXX Chicken-and-egg problem; we want to have console output * early, but some console attributes may depend on reading from * eg. the boot device, which we can't do yet. We can use * printf() etc. once this is done. */ cons_probe(); /* + * Parse the args to set the console settings, etc. + * boot1.efi passes these in, if it can read /boot.config or /boot/config; + * iPXE may also be set up to pass these in. + * * Loop through the args, and for each one that contains an '=' that is * not the first character, add it to the environment. This allows * loader and kernel env vars to be passed on the command line. Convert * args from UCS-2 to ASCII (16 to 8 bit) as they are copied. */ + howto = 0; for (i = 1; i < argc; i++) { - vargood = 0; - for (j = 0; argv[i][j] != 0; j++) { - if (j == sizeof(var)) { - vargood = 0; - break; + if (argv[i][0] == '-') { + for (j = 1; argv[i][j] != 0; j++) { + int ch; + + ch = argv[i][j]; + switch (ch) { + case 'a': + howto |= RB_ASKNAME; + break; + case 'd': + howto |= RB_KDB; + break; + case 'D': + howto |= RB_MULTIPLE; + break; + case 'm': + howto |= RB_MUTE; + break; + case 'h': + howto |= RB_SERIAL; + break; + case 'p': + howto |= RB_PAUSE; + break; + case 'r': + howto |= RB_DFLTROOT; + break; + case 's': + howto |= RB_SINGLE; + break; + case 'S': + if (argv[i][j + 1] == 0) { + if (i + 1 == argc) { + setenv("comconsole_speed", "115200", 1); + } else { + cp16to8(&argv[i + 1][0], var, + sizeof(var)); + setenv("comconsole_speed", var, 1); + } + i++; + break; + } else { + cp16to8(&argv[i][j + 1], var, + sizeof(var)); + setenv("comconsole_speed", var, 1); + break; + } + case 'v': + howto |= RB_VERBOSE; + break; + } } - if (j > 0 && argv[i][j] == '=') - vargood = 1; - var[j] = (char)argv[i][j]; + } else { + vargood = 0; + for (j = 0; argv[i][j] != 0; j++) { + if (j == sizeof(var)) { + vargood = 0; + break; + } + if (j > 0 && argv[i][j] == '=') + vargood = 1; + var[j] = (char)argv[i][j]; + } + if (vargood) { + var[j] = 0; + putenv(var); + } } - if (vargood) { - var[j] = 0; - putenv(var); - } + } + for (i = 0; howto_names[i].ev != NULL; i++) + if (howto & howto_names[i].mask) + setenv(howto_names[i].ev, "YES", 1); + if (howto & RB_MULTIPLE) { + if (howto & RB_SERIAL) + setenv("console", "comconsole efi" , 1); + else + setenv("console", "efi comconsole" , 1); + } else if (howto & RB_SERIAL) { + setenv("console", "comconsole" , 1); } if (efi_copy_init()) { printf("failed to allocate staging area\n"); return (EFI_BUFFER_TOO_SMALL); } /* * March through the device switch probing for things.
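 * (Each devsw entry that provides a dv_init() hook has it called once
 * here; entries without one are skipped.)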
*/ for (i = 0; devsw[i] != NULL; i++) if (devsw[i]->dv_init != NULL) (devsw[i]->dv_init)(); /* Get our loaded image protocol interface structure. */ BS->HandleProtocol(IH, &imgid, (VOID**)&img); printf("Command line arguments:"); for (i = 0; i < argc; i++) { printf(" "); print_str16(argv[i]); } printf("\n"); printf("Image base: 0x%lx\n", (u_long)img->ImageBase); printf("EFI version: %d.%02d\n", ST->Hdr.Revision >> 16, ST->Hdr.Revision & 0xffff); printf("EFI Firmware: "); /* printf doesn't understand EFI Unicode */ ST->ConOut->OutputString(ST->ConOut, ST->FirmwareVendor); printf(" (rev %d.%02d)\n", ST->FirmwareRevision >> 16, ST->FirmwareRevision & 0xffff); printf("\n"); printf("%s, Revision %s\n", bootprog_name, bootprog_rev); printf("(%s, %s)\n", bootprog_maker, bootprog_date); /* * Disable the watchdog timer. By default the boot manager sets * the timer to 5 minutes before invoking a boot option. If we * want to return to the boot manager, we have to disable the * watchdog timer and since we're an interactive program, we don't * want to wait until the user types "quit". The timer may have * fired by then. We don't care if this fails. It does not prevent * normal functioning in any way... */ BS->SetWatchdogTimer(0, 0, 0, NULL); if (efi_handle_lookup(img->DeviceHandle, &dev, &unit, &pool_guid) != 0) return (EFI_NOT_FOUND); switch (dev->dv_type) { #ifdef EFI_ZFS_BOOT case DEVT_ZFS: { struct zfs_devdesc currdev; currdev.d_dev = dev; currdev.d_unit = unit; currdev.d_type = currdev.d_dev->dv_type; currdev.d_opendata = NULL; currdev.pool_guid = pool_guid; currdev.root_guid = 0; env_setenv("currdev", EV_VOLATILE, efi_fmtdev(&currdev), efi_setcurrdev, env_nounset); env_setenv("loaddev", EV_VOLATILE, efi_fmtdev(&currdev), env_noset, env_nounset); init_zfs_bootenv(zfs_fmtdev(&currdev)); break; } #endif default: { struct devdesc currdev; currdev.d_dev = dev; currdev.d_unit = unit; currdev.d_opendata = NULL; currdev.d_type = currdev.d_dev->dv_type; env_setenv("currdev", EV_VOLATILE, efi_fmtdev(&currdev), efi_setcurrdev, env_nounset); env_setenv("loaddev", EV_VOLATILE, efi_fmtdev(&currdev), env_noset, env_nounset); break; } } setenv("LINES", "24", 1); /* optional */ for (k = 0; k < ST->NumberOfTableEntries; k++) { guid = &ST->ConfigurationTable[k].VendorGuid; if (!memcmp(guid, &smbios, sizeof(EFI_GUID))) { smbios_detect(ST->ConfigurationTable[k].VendorTable); break; } } interact(NULL); /* doesn't return */ return (EFI_SUCCESS); /* keep compiler happy */ } COMMAND_SET(reboot, "reboot", "reboot the system", command_reboot); static int command_reboot(int argc, char *argv[]) { int i; for (i = 0; devsw[i] != NULL; ++i) if (devsw[i]->dv_cleanup != NULL) (devsw[i]->dv_cleanup)(); RS->ResetSystem(EfiResetCold, EFI_SUCCESS, 23, (CHAR16 *)"Reboot from the loader"); /* NOTREACHED */ return (CMD_ERROR); } COMMAND_SET(quit, "quit", "exit the loader", command_quit); static int command_quit(int argc, char *argv[]) { exit(0); return (CMD_OK); } COMMAND_SET(memmap, "memmap", "print memory map", command_memmap); static int command_memmap(int argc, char *argv[]) { UINTN sz; EFI_MEMORY_DESCRIPTOR *map, *p; UINTN key, dsz; UINT32 dver; EFI_STATUS status; int i, ndesc; static char *types[] = { "Reserved", "LoaderCode", "LoaderData", "BootServicesCode", "BootServicesData", "RuntimeServicesCode", "RuntimeServicesData", "ConventionalMemory", "UnusableMemory", "ACPIReclaimMemory", "ACPIMemoryNVS", "MemoryMappedIO", "MemoryMappedIOPortSpace", "PalCode" }; sz = 0; status = BS->GetMemoryMap(&sz, 0, &key, &dsz, &dver); if (status 
!= EFI_BUFFER_TOO_SMALL) { printf("Can't determine memory map size\n"); return (CMD_ERROR); } map = malloc(sz); status = BS->GetMemoryMap(&sz, map, &key, &dsz, &dver); if (EFI_ERROR(status)) { printf("Can't read memory map\n"); return (CMD_ERROR); } ndesc = sz / dsz; printf("%23s %12s %12s %8s %4s\n", "Type", "Physical", "Virtual", "#Pages", "Attr"); for (i = 0, p = map; i < ndesc; i++, p = NextMemoryDescriptor(p, dsz)) { printf("%23s %012jx %012jx %08jx ", types[p->Type], (uintmax_t)p->PhysicalStart, (uintmax_t)p->VirtualStart, (uintmax_t)p->NumberOfPages); if (p->Attribute & EFI_MEMORY_UC) printf("UC "); if (p->Attribute & EFI_MEMORY_WC) printf("WC "); if (p->Attribute & EFI_MEMORY_WT) printf("WT "); if (p->Attribute & EFI_MEMORY_WB) printf("WB "); if (p->Attribute & EFI_MEMORY_UCE) printf("UCE "); if (p->Attribute & EFI_MEMORY_WP) printf("WP "); if (p->Attribute & EFI_MEMORY_RP) printf("RP "); if (p->Attribute & EFI_MEMORY_XP) printf("XP "); printf("\n"); } return (CMD_OK); } COMMAND_SET(configuration, "configuration", "print configuration tables", command_configuration); static const char * guid_to_string(EFI_GUID *guid) { static char buf[40]; sprintf(buf, "%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x", guid->Data1, guid->Data2, guid->Data3, guid->Data4[0], guid->Data4[1], guid->Data4[2], guid->Data4[3], guid->Data4[4], guid->Data4[5], guid->Data4[6], guid->Data4[7]); return (buf); } static int command_configuration(int argc, char *argv[]) { UINTN i; printf("NumberOfTableEntries=%lu\n", (unsigned long)ST->NumberOfTableEntries); for (i = 0; i < ST->NumberOfTableEntries; i++) { EFI_GUID *guid; printf(" "); guid = &ST->ConfigurationTable[i].VendorGuid; if (!memcmp(guid, &mps, sizeof(EFI_GUID))) printf("MPS Table"); else if (!memcmp(guid, &acpi, sizeof(EFI_GUID))) printf("ACPI Table"); else if (!memcmp(guid, &acpi20, sizeof(EFI_GUID))) printf("ACPI 2.0 Table"); else if (!memcmp(guid, &smbios, sizeof(EFI_GUID))) printf("SMBIOS Table"); else if (!memcmp(guid, &dxe, sizeof(EFI_GUID))) printf("DXE Table"); else if (!memcmp(guid, &hoblist, sizeof(EFI_GUID))) printf("HOB List Table"); else if (!memcmp(guid, &memtype, sizeof(EFI_GUID))) printf("Memory Type Information Table"); else if (!memcmp(guid, &debugimg, sizeof(EFI_GUID))) printf("Debug Image Info Table"); else if (!memcmp(guid, &fdtdtb, sizeof(EFI_GUID))) printf("FDT Table"); else printf("Unknown Table (%s)", guid_to_string(guid)); printf(" at %p\n", ST->ConfigurationTable[i].VendorTable); } return (CMD_OK); } COMMAND_SET(mode, "mode", "change or display EFI text modes", command_mode); static int command_mode(int argc, char *argv[]) { UINTN cols, rows; unsigned int mode; int i; char *cp; char rowenv[8]; EFI_STATUS status; SIMPLE_TEXT_OUTPUT_INTERFACE *conout; extern void HO(void); conout = ST->ConOut; if (argc > 1) { mode = strtol(argv[1], &cp, 0); if (cp[0] != '\0') { printf("Invalid mode\n"); return (CMD_ERROR); } status = conout->QueryMode(conout, mode, &cols, &rows); if (EFI_ERROR(status)) { printf("invalid mode %d\n", mode); return (CMD_ERROR); } status = conout->SetMode(conout, mode); if (EFI_ERROR(status)) { printf("couldn't set mode %d\n", mode); return (CMD_ERROR); } sprintf(rowenv, "%u", (unsigned)rows); setenv("LINES", rowenv, 1); HO(); /* set cursor */ return (CMD_OK); } printf("Current mode: %d\n", conout->Mode->Mode); for (i = 0; i <= conout->Mode->MaxMode; i++) { status = conout->QueryMode(conout, i, &cols, &rows); if (EFI_ERROR(status)) continue; printf("Mode %d: %u columns, %u rows\n", i, (unsigned)cols, 
(unsigned)rows); } if (i != 0) printf("Select a mode with the command \"mode \"\n"); return (CMD_OK); } COMMAND_SET(nvram, "nvram", "get or set NVRAM variables", command_nvram); static int command_nvram(int argc, char *argv[]) { CHAR16 var[128]; CHAR16 *data; EFI_STATUS status; EFI_GUID varguid = { 0,0,0,{0,0,0,0,0,0,0,0} }; UINTN varsz, datasz, i; SIMPLE_TEXT_OUTPUT_INTERFACE *conout; conout = ST->ConOut; /* Initiate the search */ status = RS->GetNextVariableName(&varsz, NULL, NULL); for (; status != EFI_NOT_FOUND; ) { status = RS->GetNextVariableName(&varsz, var, &varguid); //if (EFI_ERROR(status)) //break; conout->OutputString(conout, var); printf("="); datasz = 0; status = RS->GetVariable(var, &varguid, NULL, &datasz, NULL); /* XXX: check status */ data = malloc(datasz); status = RS->GetVariable(var, &varguid, NULL, &datasz, data); if (EFI_ERROR(status)) printf(""); else { for (i = 0; i < datasz; i++) { if (isalnum(data[i]) || isspace(data[i])) printf("%c", data[i]); else printf("\\x%02x", data[i]); } } /* XXX */ pager_output("\n"); free(data); } return (CMD_OK); } #ifdef EFI_ZFS_BOOT COMMAND_SET(lszfs, "lszfs", "list child datasets of a zfs dataset", command_lszfs); static int command_lszfs(int argc, char *argv[]) { int err; if (argc != 2) { command_errmsg = "wrong number of arguments"; return (CMD_ERROR); } err = zfs_list(argv[1]); if (err != 0) { command_errmsg = strerror(err); return (CMD_ERROR); } return (CMD_OK); } COMMAND_SET(reloadbe, "reloadbe", "refresh the list of ZFS Boot Environments", command_reloadbe); static int command_reloadbe(int argc, char *argv[]) { int err; char *root; if (argc > 2) { command_errmsg = "wrong number of arguments"; return (CMD_ERROR); } if (argc == 2) { err = zfs_bootenv(argv[1]); } else { root = getenv("zfs_be_root"); if (root == NULL) { return (CMD_OK); } err = zfs_bootenv(root); } if (err != 0) { command_errmsg = strerror(err); return (CMD_ERROR); } return (CMD_OK); } #endif #ifdef LOADER_FDT_SUPPORT extern int command_fdt_internal(int argc, char *argv[]); /* * Since proper fdt command handling function is defined in fdt_loader_cmd.c, * and declaring it as extern is in contradiction with COMMAND_SET() macro * (which uses static pointer), we're defining wrapper function, which * calls the proper fdt handling routine. */ static int command_fdt(int argc, char *argv[]) { return (command_fdt_internal(argc, argv)); } COMMAND_SET(fdt, "fdt", "flattened device tree handling", command_fdt); #endif #ifdef EFI_ZFS_BOOT static void efi_zfs_probe(void) { EFI_HANDLE h; u_int unit; int i; char dname[SPECNAMELEN + 1]; uint64_t guid; unit = 0; h = efi_find_handle(&efipart_dev, 0); for (i = 0; h != NULL; h = efi_find_handle(&efipart_dev, ++i)) { snprintf(dname, sizeof(dname), "%s%d:", efipart_dev.dv_name, i); if (zfs_probe_dev(dname, &guid) == 0) (void)efi_handle_update_dev(h, &zfs_dev, unit++, guid); } } #endif Index: projects/clang380-import/sys/boot/i386/boot2/boot2.c =================================================================== --- projects/clang380-import/sys/boot/i386/boot2/boot2.c (revision 294776) +++ projects/clang380-import/sys/boot/i386/boot2/boot2.c (revision 294777) @@ -1,687 +1,646 @@ /*- * Copyright (c) 1998 Robert Nordier * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. 
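Both command_memmap() and command_nvram() above lean on the same UEFI convention: services that fill a caller-supplied buffer take the size as an in/out parameter, so the caller probes once for the size and calls again with an allocated buffer. A minimal sketch of that idiom follows, assuming only the stock EFI_BOOT_SERVICES GetMemoryMap() signature and libstand malloc()/free(); get_memory_map() itself is a hypothetical helper, not part of this commit:

	/*
	 * Editor's sketch of the two-call size probe used by
	 * command_memmap() above; illustrative only.
	 */
	static EFI_MEMORY_DESCRIPTOR *
	get_memory_map(UINTN *ndesc, UINTN *dszp)
	{
		EFI_MEMORY_DESCRIPTOR *map;
		UINTN sz, key;
		UINT32 dver;
		EFI_STATUS status;

		sz = 0;
		status = BS->GetMemoryMap(&sz, NULL, &key, dszp, &dver);
		if (status != EFI_BUFFER_TOO_SMALL)
			return (NULL);	/* the probe must fail this way */
		if ((map = malloc(sz)) == NULL)
			return (NULL);	/* sz now holds the needed bytes */
		status = BS->GetMemoryMap(&sz, map, &key, dszp, &dver);
		if (EFI_ERROR(status)) {
			free(map);
			return (NULL);
		}
		*ndesc = sz / *dszp;	/* stride is dsz, not sizeof(*map) */
		return (map);
	}

The same in/out rule applies to the GetNextVariableName()/GetVariable() pair in command_nvram(): the name-size argument is modified by the firmware, so it should be re-initialized to sizeof(var) on every iteration, and the EFI_ERROR() check that the committed loop leaves commented out is what would otherwise terminate the walk cleanly.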
* * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include "boot2.h" #include "lib.h" +#include "paths.h" +#include "rbx.h" /* Define to 0 to omit serial support */ #ifndef SERIAL #define SERIAL 1 #endif #define IO_KEYBOARD 1 #define IO_SERIAL 2 #if SERIAL #define DO_KBD (ioctrl & IO_KEYBOARD) #define DO_SIO (ioctrl & IO_SERIAL) #else #define DO_KBD (1) #define DO_SIO (0) #endif #define SECOND 18 /* Circa that many ticks in a second. */ -#define RBX_ASKNAME 0x0 /* -a */ -#define RBX_SINGLE 0x1 /* -s */ -/* 0x2 is reserved for log2(RB_NOSYNC). */ -/* 0x3 is reserved for log2(RB_HALT). */ -/* 0x4 is reserved for log2(RB_INITNAME). */ -#define RBX_DFLTROOT 0x5 /* -r */ -#define RBX_KDB 0x6 /* -d */ -/* 0x7 is reserved for log2(RB_RDONLY). */ -/* 0x8 is reserved for log2(RB_DUMP). */ -/* 0x9 is reserved for log2(RB_MINIROOT). */ -#define RBX_CONFIG 0xa /* -c */ -#define RBX_VERBOSE 0xb /* -v */ -#define RBX_SERIAL 0xc /* -h */ -#define RBX_CDROM 0xd /* -C */ -/* 0xe is reserved for log2(RB_POWEROFF). */ -#define RBX_GDB 0xf /* -g */ -#define RBX_MUTE 0x10 /* -m */ -/* 0x11 is reserved for log2(RB_SELFTEST). */ -/* 0x12 is reserved for boot programs. */ -/* 0x13 is reserved for boot programs. */ -#define RBX_PAUSE 0x14 /* -p */ -#define RBX_QUIET 0x15 /* -q */ -#define RBX_NOINTR 0x1c /* -n */ -/* 0x1d is reserved for log2(RB_MULTIPLE) and is just misnamed here. */ -#define RBX_DUAL 0x1d /* -D */ -/* 0x1f is reserved for log2(RB_BOOTINFO). */ - -/* pass: -a, -s, -r, -d, -c, -v, -h, -C, -g, -m, -p, -D */ -#define RBX_MASK (OPT_SET(RBX_ASKNAME) | OPT_SET(RBX_SINGLE) | \ - OPT_SET(RBX_DFLTROOT) | OPT_SET(RBX_KDB ) | \ - OPT_SET(RBX_CONFIG) | OPT_SET(RBX_VERBOSE) | \ - OPT_SET(RBX_SERIAL) | OPT_SET(RBX_CDROM) | \ - OPT_SET(RBX_GDB ) | OPT_SET(RBX_MUTE) | \ - OPT_SET(RBX_PAUSE) | OPT_SET(RBX_DUAL)) - -#define PATH_DOTCONFIG "/boot.config" -#define PATH_CONFIG "/boot/config" -#define PATH_BOOT3 "/boot/loader" -#define PATH_KERNEL "/boot/kernel/kernel" - #define ARGS 0x900 #define NOPT 14 #define NDEV 3 #define MEM_BASE 0x12 #define MEM_EXT 0x15 #define DRV_HARD 0x80 #define DRV_MASK 0x7f #define TYPE_AD 0 #define TYPE_DA 1 #define TYPE_MAXHARD TYPE_DA #define TYPE_FD 2 -#define OPT_SET(opt) (1 << (opt)) -#define OPT_CHECK(opt) ((opts) & OPT_SET(opt)) - extern uint32_t _end; static const char optstr[NOPT] = "DhaCcdgmnpqrsv"; /* Also 'P', 'S' */ static const unsigned char flags[NOPT] = { RBX_DUAL, RBX_SERIAL, RBX_ASKNAME, RBX_CDROM, RBX_CONFIG, RBX_KDB, RBX_GDB, RBX_MUTE, RBX_NOINTR, RBX_PAUSE, RBX_QUIET, RBX_DFLTROOT, RBX_SINGLE, RBX_VERBOSE }; static const char *const dev_nm[NDEV] = {"ad", "da", "fd"}; static const unsigned char dev_maj[NDEV] = {30, 4, 2}; static struct dsk { unsigned drive; unsigned type; unsigned unit; uint8_t slice; uint8_t part; unsigned start; int init; } dsk; static char cmd[512], cmddup[512], knamebuf[1024]; static const char *kname; -static uint32_t opts; +uint32_t opts; static struct bootinfo bootinfo; #if SERIAL static int comspeed = SIOSPD; static uint8_t ioctrl = IO_KEYBOARD; #endif int main(void); void exit(int); static void load(void); static int parse(void); static int dskread(void *, unsigned, unsigned); static void printf(const char *,...); static void putchar(int); static int drvread(void *, 
unsigned, unsigned); static int keyhit(unsigned); static int xputc(int); static int xgetc(int); static inline int getc(int); static void memcpy(void *, const void *, int); static void memcpy(void *dst, const void *src, int len) { const char *s = src; char *d = dst; while (len--) *d++ = *s++; } static inline int strcmp(const char *s1, const char *s2) { for (; *s1 == *s2 && *s1; s1++, s2++); return (unsigned char)*s1 - (unsigned char)*s2; } #define UFS_SMALL_CGBASE #include "ufsread.c" static inline int xfsread(ufs_ino_t inode, void *buf, size_t nbyte) { if ((size_t)fsread(inode, buf, nbyte) != nbyte) { printf("Invalid %s\n", "format"); return -1; } return 0; } static inline void getstr(void) { char *s; int c; s = cmd; for (;;) { switch (c = xgetc(0)) { case 0: break; case '\177': case '\b': if (s > cmd) { s--; printf("\b \b"); } break; case '\n': case '\r': *s = 0; return; default: if (s - cmd < sizeof(cmd) - 1) *s++ = c; putchar(c); } } } static inline void putc(int c) { v86.addr = 0x10; v86.eax = 0xe00 | (c & 0xff); v86.ebx = 0x7; v86int(); } int main(void) { uint8_t autoboot; ufs_ino_t ino; size_t nbyte; dmadat = (void *)(roundup2(__base + (int32_t)&_end, 0x10000) - __base); v86.ctl = V86_FLAGS; v86.efl = PSL_RESERVED_DEFAULT | PSL_I; dsk.drive = *(uint8_t *)PTOV(ARGS); dsk.type = dsk.drive & DRV_HARD ? TYPE_AD : TYPE_FD; dsk.unit = dsk.drive & DRV_MASK; dsk.slice = *(uint8_t *)PTOV(ARGS + 1) + 1; bootinfo.bi_version = BOOTINFO_VERSION; bootinfo.bi_size = sizeof(bootinfo); /* Process configuration file */ autoboot = 1; if ((ino = lookup(PATH_CONFIG)) || (ino = lookup(PATH_DOTCONFIG))) { nbyte = fsread(ino, cmd, sizeof(cmd) - 1); cmd[nbyte] = '\0'; } if (*cmd) { memcpy(cmddup, cmd, sizeof(cmd)); if (parse()) autoboot = 0; if (!OPT_CHECK(RBX_QUIET)) printf("%s: %s", PATH_CONFIG, cmddup); /* Do not process this command twice */ *cmd = 0; } /* * Try to exec stage 3 boot loader. If interrupted by a keypress, * or in case of failure, try to load a kernel directly instead. */ if (!kname) { kname = PATH_BOOT3; if (autoboot && !keyhit(3*SECOND)) { load(); kname = PATH_KERNEL; } } /* Present the user with the boot2 prompt. */ for (;;) { if (!autoboot || !OPT_CHECK(RBX_QUIET)) printf("\nFreeBSD/x86 boot\n" "Default: %u:%s(%u,%c)%s\n" "boot: ", dsk.drive & DRV_MASK, dev_nm[dsk.type], dsk.unit, 'a' + dsk.part, kname); if (DO_SIO) sio_flush(); if (!autoboot || keyhit(3*SECOND)) getstr(); else if (!autoboot || !OPT_CHECK(RBX_QUIET)) putchar('\n'); autoboot = 0; if (parse()) putchar('\a'); else load(); } } /* XXX - Needed for btxld to link the boot2 binary; do not remove. 
*/ void exit(int x) { } static void load(void) { union { struct exec ex; Elf32_Ehdr eh; } hdr; static Elf32_Phdr ep[2]; static Elf32_Shdr es[2]; caddr_t p; ufs_ino_t ino; uint32_t addr; int k; uint8_t i, j; if (!(ino = lookup(kname))) { if (!ls) printf("No %s\n", kname); return; } if (xfsread(ino, &hdr, sizeof(hdr))) return; if (N_GETMAGIC(hdr.ex) == ZMAGIC) { addr = hdr.ex.a_entry & 0xffffff; p = PTOV(addr); fs_off = PAGE_SIZE; if (xfsread(ino, p, hdr.ex.a_text)) return; p += roundup2(hdr.ex.a_text, PAGE_SIZE); if (xfsread(ino, p, hdr.ex.a_data)) return; } else if (IS_ELF(hdr.eh)) { fs_off = hdr.eh.e_phoff; for (j = k = 0; k < hdr.eh.e_phnum && j < 2; k++) { if (xfsread(ino, ep + j, sizeof(ep[0]))) return; if (ep[j].p_type == PT_LOAD) j++; } for (i = 0; i < 2; i++) { p = PTOV(ep[i].p_paddr & 0xffffff); fs_off = ep[i].p_offset; if (xfsread(ino, p, ep[i].p_filesz)) return; } p += roundup2(ep[1].p_memsz, PAGE_SIZE); bootinfo.bi_symtab = VTOP(p); if (hdr.eh.e_shnum == hdr.eh.e_shstrndx + 3) { fs_off = hdr.eh.e_shoff + sizeof(es[0]) * (hdr.eh.e_shstrndx + 1); if (xfsread(ino, &es, sizeof(es))) return; for (i = 0; i < 2; i++) { *(Elf32_Word *)p = es[i].sh_size; p += sizeof(es[i].sh_size); fs_off = es[i].sh_offset; if (xfsread(ino, p, es[i].sh_size)) return; p += es[i].sh_size; } } addr = hdr.eh.e_entry & 0xffffff; bootinfo.bi_esymtab = VTOP(p); } else { printf("Invalid %s\n", "format"); return; } bootinfo.bi_kernelname = VTOP(kname); bootinfo.bi_bios_dev = dsk.drive; __exec((caddr_t)addr, RB_BOOTINFO | (opts & RBX_MASK), MAKEBOOTDEV(dev_maj[dsk.type], dsk.slice, dsk.unit, dsk.part), 0, 0, 0, VTOP(&bootinfo)); } static int parse() { char *arg = cmd; char *ep, *p, *q; const char *cp; unsigned int drv; int c, i, j; size_t k; while ((c = *arg++)) { if (c == ' ' || c == '\t' || c == '\n') continue; for (p = arg; *p && *p != '\n' && *p != ' ' && *p != '\t'; p++); ep = p; if (*p) *p++ = 0; if (c == '-') { while ((c = *arg++)) { if (c == 'P') { if (*(uint8_t *)PTOV(0x496) & 0x10) { cp = "yes"; } else { opts |= OPT_SET(RBX_DUAL) | OPT_SET(RBX_SERIAL); cp = "no"; } printf("Keyboard: %s\n", cp); continue; #if SERIAL } else if (c == 'S') { j = 0; while ((unsigned int)(i = *arg++ - '0') <= 9) j = j * 10 + i; if (j > 0 && i == -'0') { comspeed = j; break; } /* Fall through to error below ('S' not in optstr[]). */ #endif } for (i = 0; c != optstr[i]; i++) if (i == NOPT - 1) return -1; opts ^= OPT_SET(flags[i]); } #if SERIAL ioctrl = OPT_CHECK(RBX_DUAL) ? (IO_SERIAL|IO_KEYBOARD) : OPT_CHECK(RBX_SERIAL) ? IO_SERIAL : IO_KEYBOARD; if (DO_SIO) { if (sio_init(115200 / comspeed) != 0) ioctrl &= ~IO_SERIAL; } #endif } else { for (q = arg--; *q && *q != '('; q++); if (*q) { drv = -1; if (arg[1] == ':') { drv = *arg - '0'; if (drv > 9) return (-1); arg += 2; } if (q - arg != 2) return -1; for (i = 0; arg[0] != dev_nm[i][0] || arg[1] != dev_nm[i][1]; i++) if (i == NDEV - 1) return -1; dsk.type = i; arg += 3; dsk.unit = *arg - '0'; if (arg[1] != ',' || dsk.unit > 9) return -1; arg += 2; dsk.slice = WHOLE_DISK_SLICE; if (arg[1] == ',') { dsk.slice = *arg - '0' + 1; if (dsk.slice > NDOSPART + 1) return -1; arg += 2; } if (arg[1] != ')') return -1; dsk.part = *arg - 'a'; if (dsk.part > 7) return (-1); arg += 2; if (drv == -1) drv = dsk.unit; dsk.drive = (dsk.type <= TYPE_MAXHARD ? 
DRV_HARD : 0) + drv; dsk_meta = 0; } k = ep - arg; if (k > 0) { if (k >= sizeof(knamebuf)) return -1; memcpy(knamebuf, arg, k + 1); kname = knamebuf; } } arg = p; } return 0; } static int dskread(void *buf, unsigned lba, unsigned nblk) { struct dos_partition *dp; struct disklabel *d; char *sec; unsigned i; uint8_t sl; const char *reason; if (!dsk_meta) { sec = dmadat->secbuf; dsk.start = 0; if (drvread(sec, DOSBBSECTOR, 1)) return -1; dp = (void *)(sec + DOSPARTOFF); sl = dsk.slice; if (sl < BASE_SLICE) { for (i = 0; i < NDOSPART; i++) if (dp[i].dp_typ == DOSPTYP_386BSD && (dp[i].dp_flag & 0x80 || sl < BASE_SLICE)) { sl = BASE_SLICE + i; if (dp[i].dp_flag & 0x80 || dsk.slice == COMPATIBILITY_SLICE) break; } if (dsk.slice == WHOLE_DISK_SLICE) dsk.slice = sl; } if (sl != WHOLE_DISK_SLICE) { if (sl != COMPATIBILITY_SLICE) dp += sl - BASE_SLICE; if (dp->dp_typ != DOSPTYP_386BSD) { reason = "slice"; goto error; } dsk.start = dp->dp_start; } if (drvread(sec, dsk.start + LABELSECTOR, 1)) return -1; d = (void *)(sec + LABELOFFSET); if (d->d_magic != DISKMAGIC || d->d_magic2 != DISKMAGIC) { if (dsk.part != RAW_PART) { reason = "label"; goto error; } } else { if (!dsk.init) { if (d->d_type == DTYPE_SCSI) dsk.type = TYPE_DA; dsk.init++; } if (dsk.part >= d->d_npartitions || !d->d_partitions[dsk.part].p_size) { reason = "partition"; goto error; } dsk.start += d->d_partitions[dsk.part].p_offset; dsk.start -= d->d_partitions[RAW_PART].p_offset; } } return drvread(buf, dsk.start + lba, nblk); error: printf("Invalid %s\n", reason); return -1; } static void printf(const char *fmt,...) { va_list ap; static char buf[10]; char *s; unsigned u; int c; va_start(ap, fmt); while ((c = *fmt++)) { if (c == '%') { c = *fmt++; switch (c) { case 'c': putchar(va_arg(ap, int)); continue; case 's': for (s = va_arg(ap, char *); *s; s++) putchar(*s); continue; case 'u': u = va_arg(ap, unsigned); s = buf; do *s++ = '0' + u % 10U; while (u /= 10U); while (--s >= buf) putchar(*s); continue; } } putchar(c); } va_end(ap); return; } static void putchar(int c) { if (c == '\n') xputc('\r'); xputc(c); } static int drvread(void *buf, unsigned lba, unsigned nblk) { static unsigned c = 0x2d5c7c2f; if (!OPT_CHECK(RBX_QUIET)) { xputc(c = c << 8 | c >> 24); xputc('\b'); } v86.ctl = V86_ADDR | V86_CALLF | V86_FLAGS; v86.addr = XREADORG; /* call to xread in boot1 */ v86.es = VTOPSEG(buf); v86.eax = lba; v86.ebx = VTOPOFF(buf); v86.ecx = lba >> 16; v86.edx = nblk << 8 | dsk.drive; v86int(); v86.ctl = V86_FLAGS; if (V86_CY(v86.efl)) { printf("error %u lba %u\n", v86.eax >> 8 & 0xff, lba); return -1; } return 0; } static int keyhit(unsigned ticks) { uint32_t t0, t1; if (OPT_CHECK(RBX_NOINTR)) return 0; t0 = 0; for (;;) { if (xgetc(1)) return 1; t1 = *(uint32_t *)PTOV(0x46c); if (!t0) t0 = t1; if ((uint32_t)(t1 - t0) >= ticks) return 0; } } static int xputc(int c) { if (DO_KBD) putc(c); if (DO_SIO) sio_putc(c); return c; } static int getc(int fn) { v86.addr = 0x16; v86.eax = fn << 8; v86int(); return fn == 0 ? v86.eax & 0xff : !V86_ZR(v86.efl); } static int xgetc(int fn) { if (OPT_CHECK(RBX_NOINTR)) return 0; for (;;) { if (DO_KBD && getc(1)) return fn ? 1 : getc(0); if (DO_SIO && sio_ischar()) return fn ? 
1 : sio_getc(); if (fn) return 0; } } Index: projects/clang380-import/sys/boot/i386/common/rbx.h =================================================================== --- projects/clang380-import/sys/boot/i386/common/rbx.h (revision 294776) +++ projects/clang380-import/sys/boot/i386/common/rbx.h (nonexistent) @@ -1,61 +0,0 @@ -/*- - * Copyright (c) 1998 Robert Nordier - * All rights reserved. - * - * Redistribution and use in source and binary forms are freely - * permitted provided that the above copyright notice and this - * paragraph and the following disclaimer are duplicated in all - * such forms. - * - * This software is provided "AS IS" and without any express or - * implied warranties, including, without limitation, the implied - * warranties of merchantability and fitness for a particular - * purpose. - * - * $FreeBSD$ - */ - -#ifndef _RBX_H_ -#define _RBX_H_ - -#define RBX_ASKNAME 0x0 /* -a */ -#define RBX_SINGLE 0x1 /* -s */ -/* 0x2 is reserved for log2(RB_NOSYNC). */ -/* 0x3 is reserved for log2(RB_HALT). */ -/* 0x4 is reserved for log2(RB_INITNAME). */ -#define RBX_DFLTROOT 0x5 /* -r */ -#define RBX_KDB 0x6 /* -d */ -/* 0x7 is reserved for log2(RB_RDONLY). */ -/* 0x8 is reserved for log2(RB_DUMP). */ -/* 0x9 is reserved for log2(RB_MINIROOT). */ -#define RBX_CONFIG 0xa /* -c */ -#define RBX_VERBOSE 0xb /* -v */ -#define RBX_SERIAL 0xc /* -h */ -#define RBX_CDROM 0xd /* -C */ -/* 0xe is reserved for log2(RB_POWEROFF). */ -#define RBX_GDB 0xf /* -g */ -#define RBX_MUTE 0x10 /* -m */ -/* 0x11 is reserved for log2(RB_SELFTEST). */ -/* 0x12 is reserved for boot programs. */ -/* 0x13 is reserved for boot programs. */ -#define RBX_PAUSE 0x14 /* -p */ -#define RBX_QUIET 0x15 /* -q */ -#define RBX_NOINTR 0x1c /* -n */ -/* 0x1d is reserved for log2(RB_MULTIPLE) and is just misnamed here. */ -#define RBX_DUAL 0x1d /* -D */ -/* 0x1f is reserved for log2(RB_BOOTINFO). */ - -/* pass: -a, -s, -r, -d, -c, -v, -h, -C, -g, -m, -p, -D */ -#define RBX_MASK (OPT_SET(RBX_ASKNAME) | OPT_SET(RBX_SINGLE) | \ - OPT_SET(RBX_DFLTROOT) | OPT_SET(RBX_KDB ) | \ - OPT_SET(RBX_CONFIG) | OPT_SET(RBX_VERBOSE) | \ - OPT_SET(RBX_SERIAL) | OPT_SET(RBX_CDROM) | \ - OPT_SET(RBX_GDB ) | OPT_SET(RBX_MUTE) | \ - OPT_SET(RBX_PAUSE) | OPT_SET(RBX_DUAL)) - -#define OPT_SET(opt) (1 << (opt)) -#define OPT_CHECK(opt) ((opts) & OPT_SET(opt)) - -extern uint32_t opts; - -#endif /* !_RBX_H_ */ Property changes on: projects/clang380-import/sys/boot/i386/common/rbx.h ___________________________________________________________________ Deleted: svn:eol-style ## -1 +0,0 ## -native \ No newline at end of property Deleted: svn:keywords ## -1 +0,0 ## -FreeBSD=%H \ No newline at end of property Deleted: svn:mime-type ## -1 +0,0 ## -text/plain \ No newline at end of property Index: projects/clang380-import/sys/boot/i386/gptboot/gptboot.c =================================================================== --- projects/clang380-import/sys/boot/i386/gptboot/gptboot.c (revision 294776) +++ projects/clang380-import/sys/boot/i386/gptboot/gptboot.c (revision 294777) @@ -1,439 +1,435 @@ /*- * Copyright (c) 1998 Robert Nordier * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include "lib.h" #include "rbx.h" #include "drv.h" #include "util.h" #include "cons.h" #include "gpt.h" - -#define PATH_DOTCONFIG "/boot.config" -#define PATH_CONFIG "/boot/config" -#define PATH_BOOT3 "/boot/loader" -#define PATH_KERNEL "/boot/kernel/kernel" +#include "paths.h" #define ARGS 0x900 #define NOPT 14 #define NDEV 3 #define MEM_BASE 0x12 #define MEM_EXT 0x15 #define DRV_HARD 0x80 #define DRV_MASK 0x7f #define TYPE_AD 0 #define TYPE_DA 1 #define TYPE_MAXHARD TYPE_DA #define TYPE_FD 2 extern uint32_t _end; static const uuid_t freebsd_ufs_uuid = GPT_ENT_TYPE_FREEBSD_UFS; static const char optstr[NOPT] = "DhaCcdgmnpqrsv"; /* Also 'P', 'S' */ static const unsigned char flags[NOPT] = { RBX_DUAL, RBX_SERIAL, RBX_ASKNAME, RBX_CDROM, RBX_CONFIG, RBX_KDB, RBX_GDB, RBX_MUTE, RBX_NOINTR, RBX_PAUSE, RBX_QUIET, RBX_DFLTROOT, RBX_SINGLE, RBX_VERBOSE }; uint32_t opts; static const char *const dev_nm[NDEV] = {"ad", "da", "fd"}; static const unsigned char dev_maj[NDEV] = {30, 4, 2}; static struct dsk dsk; static char kname[1024]; static int comspeed = SIOSPD; static struct bootinfo bootinfo; void exit(int); static void load(void); static int parse(char *, int *); static int dskread(void *, daddr_t, unsigned); static uint32_t memsize(void); #include "ufsread.c" static inline int xfsread(ufs_ino_t inode, void *buf, size_t nbyte) { if ((size_t)fsread(inode, buf, nbyte) != nbyte) { printf("Invalid %s\n", "format"); return (-1); } return (0); } static inline uint32_t memsize(void) { v86.addr = MEM_EXT; v86.eax = 0x8800; v86int(); return (v86.eax); } static int gptinit(void) { if (gptread(&freebsd_ufs_uuid, &dsk, dmadat->secbuf) == -1) { printf("%s: unable to load GPT\n", BOOTPROG); return (-1); } if (gptfind(&freebsd_ufs_uuid, &dsk, dsk.part) == -1) { printf("%s: no UFS partition was found\n", BOOTPROG); return (-1); } dsk_meta = 0; return (0); } int main(void) { char cmd[512], cmdtmp[512]; ssize_t sz; int autoboot, dskupdated; ufs_ino_t ino; dmadat = (void *)(roundup2(__base + (int32_t)&_end, 0x10000) - __base); v86.ctl = V86_FLAGS; v86.efl = PSL_RESERVED_DEFAULT | PSL_I; dsk.drive = *(uint8_t *)PTOV(ARGS); dsk.type = dsk.drive & DRV_HARD ? TYPE_AD : TYPE_FD; dsk.unit = dsk.drive & DRV_MASK; dsk.part = -1; dsk.start = 0; bootinfo.bi_version = BOOTINFO_VERSION; bootinfo.bi_size = sizeof(bootinfo); bootinfo.bi_basemem = 0; /* XXX will be filled by loader or kernel */ bootinfo.bi_extmem = memsize(); bootinfo.bi_memsizes_valid++; /* Process configuration file */ if (gptinit() != 0) return (-1); autoboot = 1; *cmd = '\0'; for (;;) { *kname = '\0'; if ((ino = lookup(PATH_CONFIG)) || (ino = lookup(PATH_DOTCONFIG))) { sz = fsread(ino, cmd, sizeof(cmd) - 1); cmd[(sz < 0) ? 0 : sz] = '\0'; } if (*cmd != '\0') { memcpy(cmdtmp, cmd, sizeof(cmdtmp)); if (parse(cmdtmp, &dskupdated)) break; if (dskupdated && gptinit() != 0) break; if (!OPT_CHECK(RBX_QUIET)) printf("%s: %s", PATH_CONFIG, cmd); *cmd = '\0'; } if (autoboot && keyhit(3)) { if (*kname == '\0') memcpy(kname, PATH_BOOT3, sizeof(PATH_BOOT3)); break; } autoboot = 0; /* * Try to exec stage 3 boot loader. If interrupted by a * keypress, or in case of failure, try to load a kernel * directly instead. 
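 * The resulting order is: a name set via /boot.config (if any), then
 * PATH_BOOT3 (/boot/loader), then PATH_KERNEL (/boot/kernel/kernel);
 * once all of those fail, gptbootfailed() records the attempt and
 * gptfind() advances to the next freebsd-ufs partition, if one exists.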
*/ if (*kname != '\0') load(); memcpy(kname, PATH_BOOT3, sizeof(PATH_BOOT3)); load(); memcpy(kname, PATH_KERNEL, sizeof(PATH_KERNEL)); load(); gptbootfailed(&dsk); if (gptfind(&freebsd_ufs_uuid, &dsk, -1) == -1) break; dsk_meta = 0; } /* Present the user with the boot2 prompt. */ for (;;) { if (!OPT_CHECK(RBX_QUIET)) { printf("\nFreeBSD/x86 boot\n" "Default: %u:%s(%up%u)%s\n" "boot: ", dsk.drive & DRV_MASK, dev_nm[dsk.type], dsk.unit, dsk.part, kname); } if (ioctrl & IO_SERIAL) sio_flush(); *cmd = '\0'; if (keyhit(0)) getstr(cmd, sizeof(cmd)); else if (!OPT_CHECK(RBX_QUIET)) putchar('\n'); if (parse(cmd, &dskupdated)) { putchar('\a'); continue; } if (dskupdated && gptinit() != 0) continue; load(); } /* NOTREACHED */ } /* XXX - Needed for btxld to link the boot2 binary; do not remove. */ void exit(int x) { } static void load(void) { union { struct exec ex; Elf32_Ehdr eh; } hdr; static Elf32_Phdr ep[2]; static Elf32_Shdr es[2]; caddr_t p; ufs_ino_t ino; uint32_t addr, x; int fmt, i, j; if (!(ino = lookup(kname))) { if (!ls) { printf("%s: No %s on %u:%s(%up%u)\n", BOOTPROG, kname, dsk.drive & DRV_MASK, dev_nm[dsk.type], dsk.unit, dsk.part); } return; } if (xfsread(ino, &hdr, sizeof(hdr))) return; if (N_GETMAGIC(hdr.ex) == ZMAGIC) fmt = 0; else if (IS_ELF(hdr.eh)) fmt = 1; else { printf("Invalid %s\n", "format"); return; } if (fmt == 0) { addr = hdr.ex.a_entry & 0xffffff; p = PTOV(addr); fs_off = PAGE_SIZE; if (xfsread(ino, p, hdr.ex.a_text)) return; p += roundup2(hdr.ex.a_text, PAGE_SIZE); if (xfsread(ino, p, hdr.ex.a_data)) return; p += hdr.ex.a_data + roundup2(hdr.ex.a_bss, PAGE_SIZE); bootinfo.bi_symtab = VTOP(p); memcpy(p, &hdr.ex.a_syms, sizeof(hdr.ex.a_syms)); p += sizeof(hdr.ex.a_syms); if (hdr.ex.a_syms) { if (xfsread(ino, p, hdr.ex.a_syms)) return; p += hdr.ex.a_syms; if (xfsread(ino, p, sizeof(int))) return; x = *(uint32_t *)p; p += sizeof(int); x -= sizeof(int); if (xfsread(ino, p, x)) return; p += x; } } else { fs_off = hdr.eh.e_phoff; for (j = i = 0; i < hdr.eh.e_phnum && j < 2; i++) { if (xfsread(ino, ep + j, sizeof(ep[0]))) return; if (ep[j].p_type == PT_LOAD) j++; } for (i = 0; i < 2; i++) { p = PTOV(ep[i].p_paddr & 0xffffff); fs_off = ep[i].p_offset; if (xfsread(ino, p, ep[i].p_filesz)) return; } p += roundup2(ep[1].p_memsz, PAGE_SIZE); bootinfo.bi_symtab = VTOP(p); if (hdr.eh.e_shnum == hdr.eh.e_shstrndx + 3) { fs_off = hdr.eh.e_shoff + sizeof(es[0]) * (hdr.eh.e_shstrndx + 1); if (xfsread(ino, &es, sizeof(es))) return; for (i = 0; i < 2; i++) { memcpy(p, &es[i].sh_size, sizeof(es[i].sh_size)); p += sizeof(es[i].sh_size); fs_off = es[i].sh_offset; if (xfsread(ino, p, es[i].sh_size)) return; p += es[i].sh_size; } } addr = hdr.eh.e_entry & 0xffffff; } bootinfo.bi_esymtab = VTOP(p); bootinfo.bi_kernelname = VTOP(kname); bootinfo.bi_bios_dev = dsk.drive; __exec((caddr_t)addr, RB_BOOTINFO | (opts & RBX_MASK), MAKEBOOTDEV(dev_maj[dsk.type], dsk.part + 1, dsk.unit, 0xff), 0, 0, 0, VTOP(&bootinfo)); } static int parse(char *cmdstr, int *dskupdated) { char *arg = cmdstr; char *ep, *p, *q; const char *cp; unsigned int drv; int c, i, j; *dskupdated = 0; while ((c = *arg++)) { if (c == ' ' || c == '\t' || c == '\n') continue; for (p = arg; *p && *p != '\n' && *p != ' ' && *p != '\t'; p++); ep = p; if (*p) *p++ = 0; if (c == '-') { while ((c = *arg++)) { if (c == 'P') { if (*(uint8_t *)PTOV(0x496) & 0x10) { cp = "yes"; } else { opts |= OPT_SET(RBX_DUAL) | OPT_SET(RBX_SERIAL); cp = "no"; } printf("Keyboard: %s\n", cp); continue; } else if (c == 'S') { j = 0; while ((unsigned int)(i = 
*arg++ - '0') <= 9) j = j * 10 + i; if (j > 0 && i == -'0') { comspeed = j; break; } /* Fall through to error below ('S' not in optstr[]). */ } for (i = 0; c != optstr[i]; i++) if (i == NOPT - 1) return -1; opts ^= OPT_SET(flags[i]); } ioctrl = OPT_CHECK(RBX_DUAL) ? (IO_SERIAL|IO_KEYBOARD) : OPT_CHECK(RBX_SERIAL) ? IO_SERIAL : IO_KEYBOARD; if (ioctrl & IO_SERIAL) { if (sio_init(115200 / comspeed) != 0) ioctrl &= ~IO_SERIAL; } } else { for (q = arg--; *q && *q != '('; q++); if (*q) { drv = -1; if (arg[1] == ':') { drv = *arg - '0'; if (drv > 9) return (-1); arg += 2; } if (q - arg != 2) return -1; for (i = 0; arg[0] != dev_nm[i][0] || arg[1] != dev_nm[i][1]; i++) if (i == NDEV - 1) return -1; dsk.type = i; arg += 3; dsk.unit = *arg - '0'; if (arg[1] != 'p' || dsk.unit > 9) return -1; arg += 2; dsk.part = *arg - '0'; if (dsk.part < 1 || dsk.part > 9) return -1; arg++; if (arg[0] != ')') return -1; arg++; if (drv == -1) drv = dsk.unit; dsk.drive = (dsk.type <= TYPE_MAXHARD ? DRV_HARD : 0) + drv; *dskupdated = 1; } if ((i = ep - arg)) { if ((size_t)i >= sizeof(kname)) return -1; memcpy(kname, arg, i + 1); } } arg = p; } return 0; } static int dskread(void *buf, daddr_t lba, unsigned nblk) { return drvread(&dsk, buf, lba + dsk.start, nblk); } Index: projects/clang380-import/sys/boot/i386/zfsboot/zfsboot.c =================================================================== --- projects/clang380-import/sys/boot/i386/zfsboot/zfsboot.c (revision 294776) +++ projects/clang380-import/sys/boot/i386/zfsboot/zfsboot.c (revision 294777) @@ -1,836 +1,832 @@ /*- * Copyright (c) 1998 Robert Nordier * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. 
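The -S digit loop in parse() above (shared verbatim by boot2, gptboot and zfsboot) is terse enough to deserve a gloss: the loop exits with i == -'0' exactly when it consumed the terminating NUL, i.e. when the argument was all digits. An equivalent, more explicit form; parse_speed() is a hypothetical name used only for this sketch:

	/* Editor's sketch: what the "-S<speed>" parser does. */
	static int
	parse_speed(const char *arg, int *comspeed)
	{
		int j;

		for (j = 0; *arg >= '0' && *arg <= '9'; arg++)
			j = j * 10 + (*arg - '0');
		if (j > 0 && *arg == '\0') {	/* nonzero and all digits */
			*comspeed = j;
			return (0);
		}
		return (-1);	/* falls through to the optstr[] error */
	}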
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #ifdef GPT #include #endif #include #include #include #include #include #include #include #include #include #include "lib.h" #include "rbx.h" #include "drv.h" #include "util.h" #include "cons.h" #include "bootargs.h" +#include "paths.h" #include "libzfs.h" - -#define PATH_DOTCONFIG "/boot.config" -#define PATH_CONFIG "/boot/config" -#define PATH_BOOT3 "/boot/zfsloader" -#define PATH_KERNEL "/boot/kernel/kernel" #define ARGS 0x900 #define NOPT 14 #define NDEV 3 #define BIOS_NUMDRIVES 0x475 #define DRV_HARD 0x80 #define DRV_MASK 0x7f #define TYPE_AD 0 #define TYPE_DA 1 #define TYPE_MAXHARD TYPE_DA #define TYPE_FD 2 extern uint32_t _end; #ifdef GPT static const uuid_t freebsd_zfs_uuid = GPT_ENT_TYPE_FREEBSD_ZFS; #endif static const char optstr[NOPT] = "DhaCcdgmnpqrsv"; /* Also 'P', 'S' */ static const unsigned char flags[NOPT] = { RBX_DUAL, RBX_SERIAL, RBX_ASKNAME, RBX_CDROM, RBX_CONFIG, RBX_KDB, RBX_GDB, RBX_MUTE, RBX_NOINTR, RBX_PAUSE, RBX_QUIET, RBX_DFLTROOT, RBX_SINGLE, RBX_VERBOSE }; uint32_t opts; static const char *const dev_nm[NDEV] = {"ad", "da", "fd"}; static const unsigned char dev_maj[NDEV] = {30, 4, 2}; static char cmd[512]; static char cmddup[512]; static char kname[1024]; static char rootname[256]; static int comspeed = SIOSPD; static struct bootinfo bootinfo; static uint32_t bootdev; static struct zfs_boot_args zfsargs; static struct zfsmount zfsmount; vm_offset_t high_heap_base; uint32_t bios_basemem, bios_extmem, high_heap_size; static struct bios_smap smap; /* * The minimum amount of memory to reserve in bios_extmem for the heap. */ #define HEAP_MIN (3 * 1024 * 1024) static char *heap_next; static char *heap_end; /* Buffers that must not span a 64k boundary. */ #define READ_BUF_SIZE 8192 struct dmadat { char rdbuf[READ_BUF_SIZE]; /* for reading large things */ char secbuf[READ_BUF_SIZE]; /* for MBR/disklabel */ }; static struct dmadat *dmadat; void exit(int); static void load(void); static int parse(void); static void bios_getmem(void); static void * malloc(size_t n) { char *p = heap_next; if (p + n > heap_end) { printf("malloc failure\n"); for (;;) ; return 0; } heap_next += n; return p; } static char * strdup(const char *s) { char *p = malloc(strlen(s) + 1); strcpy(p, s); return p; } #include "zfsimpl.c" /* * Read from a dnode (which must be from a ZPL filesystem). */ static int zfs_read(spa_t *spa, const dnode_phys_t *dnode, off_t *offp, void *start, size_t size) { const znode_phys_t *zp = (const znode_phys_t *) dnode->dn_bonus; size_t n; int rc; n = size; if (*offp + n > zp->zp_size) n = zp->zp_size - *offp; rc = dnode_read(spa, dnode, *offp, start, n); if (rc) return (-1); *offp += n; return (n); } /* * Current ZFS pool */ static spa_t *spa; static spa_t *primary_spa; static vdev_t *primary_vdev; /* * A wrapper for dskread that doesn't have to worry about whether the * buffer pointer crosses a 64k boundary. 
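 * (Reads are staged through dmadat->rdbuf in READ_BUF_SIZE chunks and
 * copied out with memcpy(), because legacy BIOS/DMA transfers must not
 * cross a 64 KiB physical boundary. The same constraint drives the
 * dmadat placement in main(): roundup2(__base + (int32_t)&_end, 0x10000)
 * advances to the next 64 KiB boundary past _end; e.g. an end address
 * of 0x12345 rounds up to 0x20000.)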
*/ static int vdev_read(vdev_t *vdev, void *priv, off_t off, void *buf, size_t bytes) { char *p; daddr_t lba; unsigned int nb; struct dsk *dsk = (struct dsk *) priv; if ((off & (DEV_BSIZE - 1)) || (bytes & (DEV_BSIZE - 1))) return -1; p = buf; lba = off / DEV_BSIZE; lba += dsk->start; while (bytes > 0) { nb = bytes / DEV_BSIZE; if (nb > READ_BUF_SIZE / DEV_BSIZE) nb = READ_BUF_SIZE / DEV_BSIZE; if (drvread(dsk, dmadat->rdbuf, lba, nb)) return -1; memcpy(p, dmadat->rdbuf, nb * DEV_BSIZE); p += nb * DEV_BSIZE; lba += nb; bytes -= nb * DEV_BSIZE; } return 0; } static int xfsread(const dnode_phys_t *dnode, off_t *offp, void *buf, size_t nbyte) { if ((size_t)zfs_read(spa, dnode, offp, buf, nbyte) != nbyte) { printf("Invalid format\n"); return -1; } return 0; } static void bios_getmem(void) { uint64_t size; /* Parse system memory map */ v86.ebx = 0; do { v86.ctl = V86_FLAGS; v86.addr = 0x15; /* int 0x15 function 0xe820*/ v86.eax = 0xe820; v86.ecx = sizeof(struct bios_smap); v86.edx = SMAP_SIG; v86.es = VTOPSEG(&smap); v86.edi = VTOPOFF(&smap); v86int(); if (V86_CY(v86.efl) || (v86.eax != SMAP_SIG)) break; /* look for a low-memory segment that's large enough */ if ((smap.type == SMAP_TYPE_MEMORY) && (smap.base == 0) && (smap.length >= (512 * 1024))) bios_basemem = smap.length; /* look for the first segment in 'extended' memory */ if ((smap.type == SMAP_TYPE_MEMORY) && (smap.base == 0x100000)) { bios_extmem = smap.length; } /* * Look for the largest segment in 'extended' memory beyond * 1MB but below 4GB. */ if ((smap.type == SMAP_TYPE_MEMORY) && (smap.base > 0x100000) && (smap.base < 0x100000000ull)) { size = smap.length; /* * If this segment crosses the 4GB boundary, truncate it. */ if (smap.base + size > 0x100000000ull) size = 0x100000000ull - smap.base; if (size > high_heap_size) { high_heap_size = size; high_heap_base = smap.base; } } } while (v86.ebx != 0); /* Fall back to the old compatibility function for base memory */ if (bios_basemem == 0) { v86.ctl = 0; v86.addr = 0x12; /* int 0x12 */ v86int(); bios_basemem = (v86.eax & 0xffff) * 1024; } /* Fall back through several compatibility functions for extended memory */ if (bios_extmem == 0) { v86.ctl = V86_FLAGS; v86.addr = 0x15; /* int 0x15 function 0xe801*/ v86.eax = 0xe801; v86int(); if (!V86_CY(v86.efl)) { bios_extmem = ((v86.ecx & 0xffff) + ((v86.edx & 0xffff) * 64)) * 1024; } } if (bios_extmem == 0) { v86.ctl = 0; v86.addr = 0x15; /* int 0x15 function 0x88*/ v86.eax = 0x8800; v86int(); bios_extmem = (v86.eax & 0xffff) * 1024; } /* * If we have extended memory and did not find a suitable heap * region in the SMAP, use the last 3MB of 'extended' memory as a * high heap candidate. */ if (bios_extmem >= HEAP_MIN && high_heap_size < HEAP_MIN) { high_heap_size = HEAP_MIN; high_heap_base = bios_extmem + 0x100000 - HEAP_MIN; } } /* * Try to detect a device supported by the legacy int13 BIOS */ static int int13probe(int drive) { v86.ctl = V86_FLAGS; v86.addr = 0x13; v86.eax = 0x800; v86.edx = drive; v86int(); if (!V86_CY(v86.efl) && /* carry clear */ ((v86.edx & 0xff) != (drive & DRV_MASK))) { /* unit # OK */ if ((v86.ecx & 0x3f) == 0) { /* absurd sector size */ return(0); /* skip device */ } return (1); } return(0); } /* * We call this when we find a ZFS vdev - ZFS consumes the dsk * structure so we must make a new one. 
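 * (vdev_probe() keeps the priv pointer it is handed and uses it for
 * every later vdev_read() on that vdev, so reusing this dsk for
 * further probing would silently redirect an already-registered
 * vdev's reads.)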
*/ static struct dsk * copy_dsk(struct dsk *dsk) { struct dsk *newdsk; newdsk = malloc(sizeof(struct dsk)); *newdsk = *dsk; return (newdsk); } static void probe_drive(struct dsk *dsk) { #ifdef GPT struct gpt_hdr hdr; struct gpt_ent *ent; daddr_t slba, elba; unsigned part, entries_per_sec; #endif struct dos_partition *dp; char *sec; unsigned i; /* * If we find a vdev on the whole disk, stop here. Otherwise dig * out the partition table and probe each slice/partition * in turn for a vdev. */ if (vdev_probe(vdev_read, dsk, NULL) == 0) return; sec = dmadat->secbuf; dsk->start = 0; #ifdef GPT /* * First check for GPT. */ if (drvread(dsk, sec, 1, 1)) { return; } memcpy(&hdr, sec, sizeof(hdr)); if (memcmp(hdr.hdr_sig, GPT_HDR_SIG, sizeof(hdr.hdr_sig)) != 0 || hdr.hdr_lba_self != 1 || hdr.hdr_revision < 0x00010000 || hdr.hdr_entsz < sizeof(*ent) || DEV_BSIZE % hdr.hdr_entsz != 0) { goto trymbr; } /* * Probe all GPT partitions for the presense of ZFS pools. We * return the spa_t for the first we find (if requested). This * will have the effect of booting from the first pool on the * disk. */ entries_per_sec = DEV_BSIZE / hdr.hdr_entsz; slba = hdr.hdr_lba_table; elba = slba + hdr.hdr_entries / entries_per_sec; while (slba < elba) { dsk->start = 0; if (drvread(dsk, sec, slba, 1)) return; for (part = 0; part < entries_per_sec; part++) { ent = (struct gpt_ent *)(sec + part * hdr.hdr_entsz); if (memcmp(&ent->ent_type, &freebsd_zfs_uuid, sizeof(uuid_t)) == 0) { dsk->start = ent->ent_lba_start; if (vdev_probe(vdev_read, dsk, NULL) == 0) { /* * This slice had a vdev. We need a new dsk * structure now since the vdev now owns this one. */ dsk = copy_dsk(dsk); } } } slba++; } return; trymbr: #endif if (drvread(dsk, sec, DOSBBSECTOR, 1)) return; dp = (void *)(sec + DOSPARTOFF); for (i = 0; i < NDOSPART; i++) { if (!dp[i].dp_typ) continue; dsk->start = dp[i].dp_start; if (vdev_probe(vdev_read, dsk, NULL) == 0) { /* * This slice had a vdev. We need a new dsk structure now * since the vdev now owns this one. */ dsk = copy_dsk(dsk); } } } int main(void) { int autoboot, i; dnode_phys_t dn; off_t off; struct dsk *dsk; dmadat = (void *)(roundup2(__base + (int32_t)&_end, 0x10000) - __base); bios_getmem(); if (high_heap_size > 0) { heap_end = PTOV(high_heap_base + high_heap_size); heap_next = PTOV(high_heap_base); } else { heap_next = (char *) dmadat + sizeof(*dmadat); heap_end = (char *) PTOV(bios_basemem); } dsk = malloc(sizeof(struct dsk)); dsk->drive = *(uint8_t *)PTOV(ARGS); dsk->type = dsk->drive & DRV_HARD ? TYPE_AD : TYPE_FD; dsk->unit = dsk->drive & DRV_MASK; dsk->slice = *(uint8_t *)PTOV(ARGS + 1) + 1; dsk->part = 0; dsk->start = 0; dsk->init = 0; bootinfo.bi_version = BOOTINFO_VERSION; bootinfo.bi_size = sizeof(bootinfo); bootinfo.bi_basemem = bios_basemem / 1024; bootinfo.bi_extmem = bios_extmem / 1024; bootinfo.bi_memsizes_valid++; bootinfo.bi_bios_dev = dsk->drive; bootdev = MAKEBOOTDEV(dev_maj[dsk->type], dsk->slice, dsk->unit, dsk->part), /* Process configuration file */ autoboot = 1; zfs_init(); /* * Probe the boot drive first - we will try to boot from whatever * pool we find on that drive. */ probe_drive(dsk); /* * Probe the rest of the drives that the bios knows about. This * will find any other available pools and it may fill in missing * vdevs for the boot pool. 
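 * A worked instance of the GPT arithmetic in probe_drive() above:
 * with 512-byte sectors (DEV_BSIZE) and the usual 128-byte entries,
 * entries_per_sec = 512 / 128 = 4, so a default 128-entry table spans
 * 128 / 4 = 32 sectors, i.e. LBAs hdr_lba_table through
 * hdr_lba_table + 31 are scanned four entries at a time.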
*/ #ifndef VIRTUALBOX for (i = 0; i < *(unsigned char *)PTOV(BIOS_NUMDRIVES); i++) #else for (i = 0; i < MAXBDDEV; i++) #endif { if ((i | DRV_HARD) == *(uint8_t *)PTOV(ARGS)) continue; if (!int13probe(i | DRV_HARD)) break; dsk = malloc(sizeof(struct dsk)); dsk->drive = i | DRV_HARD; dsk->type = dsk->drive & TYPE_AD; dsk->unit = i; dsk->slice = 0; dsk->part = 0; dsk->start = 0; dsk->init = 0; probe_drive(dsk); } /* * The first discovered pool, if any, is the pool. */ spa = spa_get_primary(); if (!spa) { printf("%s: No ZFS pools located, can't boot\n", BOOTPROG); for (;;) ; } primary_spa = spa; primary_vdev = spa_get_primary_vdev(spa); if (zfs_spa_init(spa) != 0 || zfs_mount(spa, 0, &zfsmount) != 0) { printf("%s: failed to mount default pool %s\n", BOOTPROG, spa->spa_name); autoboot = 0; } else if (zfs_lookup(&zfsmount, PATH_CONFIG, &dn) == 0 || zfs_lookup(&zfsmount, PATH_DOTCONFIG, &dn) == 0) { off = 0; zfs_read(spa, &dn, &off, cmd, sizeof(cmd)); } if (*cmd) { /* * Note that parse() is destructive to cmd[] and we also want * to honor RBX_QUIET option that could be present in cmd[]. */ memcpy(cmddup, cmd, sizeof(cmd)); if (parse()) autoboot = 0; if (!OPT_CHECK(RBX_QUIET)) printf("%s: %s\n", PATH_CONFIG, cmddup); /* Do not process this command twice */ *cmd = 0; } /* * Try to exec stage 3 boot loader. If interrupted by a keypress, * or in case of failure, try to load a kernel directly instead. */ if (autoboot && !*kname) { memcpy(kname, PATH_BOOT3, sizeof(PATH_BOOT3)); if (!keyhit(3)) { load(); memcpy(kname, PATH_KERNEL, sizeof(PATH_KERNEL)); } } /* Present the user with the boot2 prompt. */ for (;;) { if (!autoboot || !OPT_CHECK(RBX_QUIET)) { printf("\nFreeBSD/x86 boot\n"); if (zfs_rlookup(spa, zfsmount.rootobj, rootname) != 0) printf("Default: %s/<0x%llx>:%s\n" "boot: ", spa->spa_name, zfsmount.rootobj, kname); else if (rootname[0] != '\0') printf("Default: %s/%s:%s\n" "boot: ", spa->spa_name, rootname, kname); else printf("Default: %s:%s\n" "boot: ", spa->spa_name, kname); } if (ioctrl & IO_SERIAL) sio_flush(); if (!autoboot || keyhit(5)) getstr(cmd, sizeof(cmd)); else if (!autoboot || !OPT_CHECK(RBX_QUIET)) putchar('\n'); autoboot = 0; if (parse()) putchar('\a'); else load(); } } /* XXX - Needed for btxld to link the boot2 binary; do not remove. 
*/ void exit(int x) { } static void load(void) { union { struct exec ex; Elf32_Ehdr eh; } hdr; static Elf32_Phdr ep[2]; static Elf32_Shdr es[2]; caddr_t p; dnode_phys_t dn; off_t off; uint32_t addr, x; int fmt, i, j; if (zfs_lookup(&zfsmount, kname, &dn)) { printf("\nCan't find %s\n", kname); return; } off = 0; if (xfsread(&dn, &off, &hdr, sizeof(hdr))) return; if (N_GETMAGIC(hdr.ex) == ZMAGIC) fmt = 0; else if (IS_ELF(hdr.eh)) fmt = 1; else { printf("Invalid %s\n", "format"); return; } if (fmt == 0) { addr = hdr.ex.a_entry & 0xffffff; p = PTOV(addr); off = PAGE_SIZE; if (xfsread(&dn, &off, p, hdr.ex.a_text)) return; p += roundup2(hdr.ex.a_text, PAGE_SIZE); if (xfsread(&dn, &off, p, hdr.ex.a_data)) return; p += hdr.ex.a_data + roundup2(hdr.ex.a_bss, PAGE_SIZE); bootinfo.bi_symtab = VTOP(p); memcpy(p, &hdr.ex.a_syms, sizeof(hdr.ex.a_syms)); p += sizeof(hdr.ex.a_syms); if (hdr.ex.a_syms) { if (xfsread(&dn, &off, p, hdr.ex.a_syms)) return; p += hdr.ex.a_syms; if (xfsread(&dn, &off, p, sizeof(int))) return; x = *(uint32_t *)p; p += sizeof(int); x -= sizeof(int); if (xfsread(&dn, &off, p, x)) return; p += x; } } else { off = hdr.eh.e_phoff; for (j = i = 0; i < hdr.eh.e_phnum && j < 2; i++) { if (xfsread(&dn, &off, ep + j, sizeof(ep[0]))) return; if (ep[j].p_type == PT_LOAD) j++; } for (i = 0; i < 2; i++) { p = PTOV(ep[i].p_paddr & 0xffffff); off = ep[i].p_offset; if (xfsread(&dn, &off, p, ep[i].p_filesz)) return; } p += roundup2(ep[1].p_memsz, PAGE_SIZE); bootinfo.bi_symtab = VTOP(p); if (hdr.eh.e_shnum == hdr.eh.e_shstrndx + 3) { off = hdr.eh.e_shoff + sizeof(es[0]) * (hdr.eh.e_shstrndx + 1); if (xfsread(&dn, &off, &es, sizeof(es))) return; for (i = 0; i < 2; i++) { memcpy(p, &es[i].sh_size, sizeof(es[i].sh_size)); p += sizeof(es[i].sh_size); off = es[i].sh_offset; if (xfsread(&dn, &off, p, es[i].sh_size)) return; p += es[i].sh_size; } } addr = hdr.eh.e_entry & 0xffffff; } bootinfo.bi_esymtab = VTOP(p); bootinfo.bi_kernelname = VTOP(kname); zfsargs.size = sizeof(zfsargs); zfsargs.pool = zfsmount.spa->spa_guid; zfsargs.root = zfsmount.rootobj; zfsargs.primary_pool = primary_spa->spa_guid; if (primary_vdev != NULL) zfsargs.primary_vdev = primary_vdev->v_guid; else printf("failed to detect primary vdev\n"); __exec((caddr_t)addr, RB_BOOTINFO | (opts & RBX_MASK), bootdev, KARGS_FLAGS_ZFS | KARGS_FLAGS_EXTARG, (uint32_t) spa->spa_guid, (uint32_t) (spa->spa_guid >> 32), VTOP(&bootinfo), zfsargs); } static int zfs_mount_ds(char *dsname) { uint64_t newroot; spa_t *newspa; char *q; q = strchr(dsname, '/'); if (q) *q++ = '\0'; newspa = spa_find_by_name(dsname); if (newspa == NULL) { printf("\nCan't find ZFS pool %s\n", dsname); return -1; } if (zfs_spa_init(newspa)) return -1; newroot = 0; if (q) { if (zfs_lookup_dataset(newspa, q, &newroot)) { printf("\nCan't find dataset %s in ZFS pool %s\n", q, newspa->spa_name); return -1; } } if (zfs_mount(newspa, newroot, &zfsmount)) { printf("\nCan't mount ZFS dataset\n"); return -1; } spa = newspa; return (0); } static int parse(void) { char *arg = cmd; char *ep, *p, *q; const char *cp; int c, i, j; while ((c = *arg++)) { if (c == ' ' || c == '\t' || c == '\n') continue; for (p = arg; *p && *p != '\n' && *p != ' ' && *p != '\t'; p++); ep = p; if (*p) *p++ = 0; if (c == '-') { while ((c = *arg++)) { if (c == 'P') { if (*(uint8_t *)PTOV(0x496) & 0x10) { cp = "yes"; } else { opts |= OPT_SET(RBX_DUAL) | OPT_SET(RBX_SERIAL); cp = "no"; } printf("Keyboard: %s\n", cp); continue; } else if (c == 'S') { j = 0; while ((unsigned int)(i = *arg++ - '0') <= 9) j = j * 10 + i; 
if (j > 0 && i == -'0') { comspeed = j; break; } /* Fall through to error below ('S' not in optstr[]). */ } for (i = 0; c != optstr[i]; i++) if (i == NOPT - 1) return -1; opts ^= OPT_SET(flags[i]); } ioctrl = OPT_CHECK(RBX_DUAL) ? (IO_SERIAL|IO_KEYBOARD) : OPT_CHECK(RBX_SERIAL) ? IO_SERIAL : IO_KEYBOARD; if (ioctrl & IO_SERIAL) { if (sio_init(115200 / comspeed) != 0) ioctrl &= ~IO_SERIAL; } } if (c == '?') { dnode_phys_t dn; if (zfs_lookup(&zfsmount, arg, &dn) == 0) { zap_list(spa, &dn); } return -1; } else { arg--; /* * Report pool status if the comment is 'status'. Lets * hope no-one wants to load /status as a kernel. */ if (!strcmp(arg, "status")) { spa_all_status(); return -1; } /* * If there is "zfs:" prefix simply ignore it. */ if (strncmp(arg, "zfs:", 4) == 0) arg += 4; /* * If there is a colon, switch pools. */ q = strchr(arg, ':'); if (q) { *q++ = '\0'; if (zfs_mount_ds(arg) != 0) return -1; arg = q; } if ((i = ep - arg)) { if ((size_t)i >= sizeof(kname)) return -1; memcpy(kname, arg, i + 1); } } arg = p; } return 0; } Index: projects/clang380-import/sys/boot/mips/beri/boot2/boot2.c =================================================================== --- projects/clang380-import/sys/boot/mips/beri/boot2/boot2.c (revision 294776) +++ projects/clang380-import/sys/boot/mips/beri/boot2/boot2.c (revision 294777) @@ -1,701 +1,661 @@ /*- * Copyright (c) 2013-2014 Robert N. M. Watson * All rights reserved. * * This software was developed by SRI International and the University of * Cambridge Computer Laboratory under DARPA/AFRL contract (FA8750-10-C-0237) * ("CTSRD"), as part of the DARPA CRASH research programme. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * Copyright (c) 1998 Robert Nordier * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. 
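For reference, zfsboot's parse() above accepts three special forms at the boot: prompt besides the option flags: an optional "zfs:" prefix is stripped, the word "status" prints spa_all_status(), and a leading "?" lists a directory via zap_list(). A pool/dataset switch plus kernel path looks like this (the pool and boot-environment names are hypothetical):

	boot: status
	boot: ?
	boot: zfs:tank/ROOT/default:/boot/kernel/kernel

The last form calls zfs_mount_ds("tank/ROOT/default"), which splits the string at the first '/' to find the pool, mounts the dataset, and only then takes the remainder after the ':' as the kernel name.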
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include "paths.h" +#include "rbx.h" + static int beri_argc; static const char **beri_argv, **beri_envv; static uint64_t beri_memsize; #define IO_KEYBOARD 1 #define IO_SERIAL 2 #define SECOND 1 /* Circa that many ticks in a second. */ -#define RBX_ASKNAME 0x0 /* -a */ -#define RBX_SINGLE 0x1 /* -s */ -/* 0x2 is reserved for log2(RB_NOSYNC). */ -/* 0x3 is reserved for log2(RB_HALT). */ -/* 0x4 is reserved for log2(RB_INITNAME). */ -#define RBX_DFLTROOT 0x5 /* -r */ -#define RBX_KDB 0x6 /* -d */ -/* 0x7 is reserved for log2(RB_RDONLY). */ -/* 0x8 is reserved for log2(RB_DUMP). */ -/* 0x9 is reserved for log2(RB_MINIROOT). */ -#define RBX_CONFIG 0xa /* -c */ -#define RBX_VERBOSE 0xb /* -v */ -#define RBX_SERIAL 0xc /* -h */ -#define RBX_CDROM 0xd /* -C */ -/* 0xe is reserved for log2(RB_POWEROFF). */ -#define RBX_GDB 0xf /* -g */ -#define RBX_MUTE 0x10 /* -m */ -/* 0x11 is reserved for log2(RB_SELFTEST). */ -/* 0x12 is reserved for boot programs. */ -/* 0x13 is reserved for boot programs. */ -#define RBX_PAUSE 0x14 /* -p */ -#define RBX_QUIET 0x15 /* -q */ -#define RBX_NOINTR 0x1c /* -n */ -/* 0x1d is reserved for log2(RB_MULTIPLE) and is just misnamed here. */ -#define RBX_DUAL 0x1d /* -D */ -/* 0x1f is reserved for log2(RB_BOOTINFO). */ - -/* pass: -a, -s, -r, -d, -c, -v, -h, -C, -g, -m, -p, -D */ -#define RBX_MASK (OPT_SET(RBX_ASKNAME) | OPT_SET(RBX_SINGLE) | \ - OPT_SET(RBX_DFLTROOT) | OPT_SET(RBX_KDB ) | \ - OPT_SET(RBX_CONFIG) | OPT_SET(RBX_VERBOSE) | \ - OPT_SET(RBX_SERIAL) | OPT_SET(RBX_CDROM) | \ - OPT_SET(RBX_GDB ) | OPT_SET(RBX_MUTE) | \ - OPT_SET(RBX_PAUSE) | OPT_SET(RBX_DUAL)) - -#define PATH_DOTCONFIG "/boot.config" -#define PATH_CONFIG "/boot/config" -#define PATH_BOOT3 "/boot/loader" -#define PATH_KERNEL "/boot/kernel/kernel" - #define ARGS 0x900 #define NOPT 14 #define MEM_BASE 0x12 #define MEM_EXT 0x15 /* * XXXRW: I think this has to do with whether boot2 expects a partition * table? */ #define DRV_HARD 0x80 #define DRV_MASK 0x7f /* Default to using CFI flash. */ #define TYPE_DEFAULT BOOTINFO_DEV_TYPE_SDCARD /* Hard-coded assumption about location of JTAG-loaded kernel. */ #define DRAM_KERNEL_ADDR ((void *)mips_phys_to_cached(0x20000)) - -#define OPT_SET(opt) (1 << (opt)) -#define OPT_CHECK(opt) ((opts) & OPT_SET(opt)) extern uint32_t _end; static const char optstr[NOPT] = "DhaCcdgmnpqrsv"; /* Also 'P', 'S' */ static const unsigned char flags[NOPT] = { RBX_DUAL, RBX_SERIAL, RBX_ASKNAME, RBX_CDROM, RBX_CONFIG, RBX_KDB, RBX_GDB, RBX_MUTE, RBX_NOINTR, RBX_PAUSE, RBX_QUIET, RBX_DFLTROOT, RBX_SINGLE, RBX_VERBOSE }; /* These must match BOOTINFO_DEV_TYPE constants. */ static const char *const dev_nm[] = {"dram", "cfi", "sdcard"}; static const u_int dev_nm_count = sizeof(dev_nm) / sizeof(dev_nm[0]); static struct dmadat __dmadat; static struct dsk { unsigned type; /* BOOTINFO_DEV_TYPE_x object type. */ uintptr_t unitptr; /* Unit number or pointer to object. 
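 * (Today this is just a unit index; the TODO/XXXRW notes in parse()
 * below contemplate also passing a DRAM kernel pointer through it,
 * which is why it is a uintptr_t rather than a plain unsigned.)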
*/ uint8_t slice; uint8_t part; #if 0 unsigned start; int init; #endif } dsk; static char cmd[512], cmddup[512], knamebuf[1024]; static const char *kname; static uint32_t opts; #if 0 static int comspeed = SIOSPD; #endif struct bootinfo bootinfo; static uint8_t ioctrl = IO_KEYBOARD; void exit(int); void putchar(int); static void boot_fromdram(void); static void boot_fromfs(void); static void load(void); static int parse(void); static int dskread(void *, unsigned, unsigned); static int xputc(int); static int xgetc(int); #define UFS_SMALL_CGBASE #include "ufsread.c" static inline int xfsread(ufs_ino_t inode, void *buf, size_t nbyte) { if ((size_t)fsread(inode, buf, nbyte) != nbyte) { printf("Invalid %s\n", "format"); return -1; } return 0; } static inline void getstr(void) { char *s; int c; s = cmd; for (;;) { switch (c = xgetc(0)) { case 0: break; case '\177': case '\b': if (s > cmd) { s--; printf("\b \b"); } break; case '\n': case '\r': putchar('\n'); *s = 0; return; default: if (s - cmd < sizeof(cmd) - 1) *s++ = c; putchar(c); } } } int main(u_int argc, const char *argv[], const char *envv[], uint64_t memsize) { uint8_t autoboot; ufs_ino_t ino; size_t nbyte; /* Arguments from Miniboot. */ beri_argc = argc; beri_argv = argv; beri_envv = envv; beri_memsize = memsize; dmadat = &__dmadat; #if 0 /* XXXRW: more here. */ v86.ctl = V86_FLAGS; v86.efl = PSL_RESERVED_DEFAULT | PSL_I; dsk.drive = *(uint8_t *)PTOV(ARGS); #endif dsk.type = TYPE_DEFAULT; #if 0 dsk.unit = dsk.drive & DRV_MASK; dsk.slice = *(uint8_t *)PTOV(ARGS + 1) + 1; #endif bootinfo.bi_version = BOOTINFO_VERSION; bootinfo.bi_size = sizeof(bootinfo); /* Process configuration file */ autoboot = 1; if ((ino = lookup(PATH_CONFIG)) || (ino = lookup(PATH_DOTCONFIG))) { nbyte = fsread(ino, cmd, sizeof(cmd) - 1); cmd[nbyte] = '\0'; } if (*cmd) { memcpy(cmddup, cmd, sizeof(cmd)); if (parse()) autoboot = 0; if (!OPT_CHECK(RBX_QUIET)) printf("%s: %s", PATH_CONFIG, cmddup); /* Do not process this command twice */ *cmd = 0; } /* * Try to exec stage 3 boot loader. If interrupted by a keypress, * or in case of failure, try to load a kernel directly instead. */ if (!kname) { kname = PATH_BOOT3; if (autoboot && !keyhit(3*SECOND)) { boot_fromfs(); kname = PATH_KERNEL; } } /* Present the user with the boot2 prompt. */ for (;;) { if (!autoboot || !OPT_CHECK(RBX_QUIET)) printf("\nFreeBSD/mips boot\n" "Default: %s%ju:%s\n" "boot: ", dev_nm[dsk.type], dsk.unitptr, kname); #if 0 if (ioctrl & IO_SERIAL) sio_flush(); #endif if (!autoboot || keyhit(3*SECOND)) getstr(); else if (!autoboot || !OPT_CHECK(RBX_QUIET)) putchar('\n'); autoboot = 0; if (parse()) putchar('\a'); else load(); } } /* XXX - Needed for btxld to link the boot2 binary; do not remove. */ void exit(int x) { } static void boot(void *entryp, int argc, const char *argv[], const char *envv[]) { bootinfo.bi_kernelname = (bi_ptr_t)kname; bootinfo.bi_boot2opts = opts & RBX_MASK; bootinfo.bi_boot_dev_type = dsk.type; bootinfo.bi_boot_dev_unitptr = dsk.unitptr; bootinfo.bi_memsize = beri_memsize; #if 0 /* * XXXRW: A possible future way to distinguish Miniboot passing a memory * size vs DTB..? */ if (beri_memsize <= BERI_MEMVSDTB) bootinfo.bi_memsize = beri_memsize; else bootinfo.bi_dtb = beri_memsize; #endif ((void(*)(int, const char **, const char **, void *))entryp)(argc, argv, envv, &bootinfo); } /* * Boot a kernel that has mysteriously (i.e., by JTAG) appeared in DRAM; * assume that it is already properly relocated, etc, and invoke its entry * address without question or concern. 
*/ static void boot_fromdram(void) { void *kaddr = DRAM_KERNEL_ADDR; /* XXXRW: Something better here. */ Elf64_Ehdr *ehp = kaddr; if (!IS_ELF(*ehp)) { printf("Invalid %s\n", "format"); return; } boot((void *)ehp->e_entry, beri_argc, beri_argv, beri_envv); } static void boot_fromfs(void) { union { Elf64_Ehdr eh; } hdr; static Elf64_Phdr ep[2]; #if 0 static Elf64_Shdr es[2]; #endif caddr_t p; ufs_ino_t ino; uint64_t addr; int i, j; if (!(ino = lookup(kname))) { if (!ls) printf("No %s\n", kname); return; } if (xfsread(ino, &hdr, sizeof(hdr))) return; if (IS_ELF(hdr.eh)) { fs_off = hdr.eh.e_phoff; for (j = i = 0; i < hdr.eh.e_phnum && j < 2; i++) { if (xfsread(ino, ep + j, sizeof(ep[0]))) return; if (ep[j].p_type == PT_LOAD) j++; } for (i = 0; i < 2; i++) { p = (caddr_t)ep[i].p_paddr; fs_off = ep[i].p_offset; if (xfsread(ino, p, ep[i].p_filesz)) return; } p += roundup2(ep[1].p_memsz, PAGE_SIZE); #if 0 bootinfo.bi_symtab = VTOP(p); if (hdr.eh.e_shnum == hdr.eh.e_shstrndx + 3) { fs_off = hdr.eh.e_shoff + sizeof(es[0]) * (hdr.eh.e_shstrndx + 1); if (xfsread(ino, &es, sizeof(es))) return; for (i = 0; i < 2; i++) { *(Elf32_Word *)p = es[i].sh_size; p += sizeof(es[i].sh_size); fs_off = es[i].sh_offset; if (xfsread(ino, p, es[i].sh_size)) return; p += es[i].sh_size; } } #endif addr = hdr.eh.e_entry; #if 0 bootinfo.bi_esymtab = VTOP(p); #endif } else { printf("Invalid %s\n", "format"); return; } boot((void *)addr, beri_argc, beri_argv, beri_envv); } static void load(void) { switch (dsk.type) { case BOOTINFO_DEV_TYPE_DRAM: boot_fromdram(); break; default: boot_fromfs(); break; } } static int parse() { char *arg = cmd; char *ep, *p, *q; char unit; size_t len; const char *cp; #if 0 int c, i, j; #else int c, i; #endif while ((c = *arg++)) { if (c == ' ' || c == '\t' || c == '\n') continue; for (p = arg; *p && *p != '\n' && *p != ' ' && *p != '\t'; p++); ep = p; if (*p) *p++ = 0; if (c == '-') { while ((c = *arg++)) { if (c == 'P') { cp = "yes"; #if 0 } else { opts |= OPT_SET(RBX_DUAL) | OPT_SET(RBX_SERIAL); cp = "no"; } #endif printf("Keyboard: %s\n", cp); continue; #if 0 } else if (c == 'S') { j = 0; while ((unsigned int)(i = *arg++ - '0') <= 9) j = j * 10 + i; if (j > 0 && i == -'0') { comspeed = j; break; } /* Fall through to error below ('S' not in optstr[]). */ #endif } for (i = 0; c != optstr[i]; i++) if (i == NOPT - 1) return -1; opts ^= OPT_SET(flags[i]); } ioctrl = OPT_CHECK(RBX_DUAL) ? (IO_SERIAL|IO_KEYBOARD) : OPT_CHECK(RBX_SERIAL) ? IO_SERIAL : IO_KEYBOARD; #if 0 if (ioctrl & IO_SERIAL) { if (sio_init(115200 / comspeed) != 0) ioctrl &= ~IO_SERIAL; } #endif } else { /*- * Parse a device/kernel name. Format(s): * * path * deviceX:path * * NB: Utterly incomprehensible but space-efficient ARM/i386 * parsing removed in favour of larger but easier-to-read C. This * is still not great, however -- e.g., relating to unit handling. * * TODO: it would be nice if a DRAM pointer could be specified * here. * * XXXRW: Pick up pieces here. */ /* * Search for a parens; if none, then it's just a path. * Otherwise, it's a devicename. */ arg--; q = strsep(&arg, ":"); if (arg != NULL) { len = strlen(q); if (len < 2) { printf("Invalid device: name too short\n"); return (-1); } /* * First, handle one-digit unit. */ unit = q[len-1]; if (unit < '0' || unit > '9') { printf("Invalid device: invalid unit\n", q, unit); return (-1); } unit -= '0'; q[len-1] = '\0'; /* * Next, find matching device. 
*/ for (i = 0; i < dev_nm_count; i++) { if (strcmp(q, dev_nm[i]) == 0) break; } if (i == dev_nm_count) { printf("Invalid device: no driver match\n"); return (-1); } dsk.type = i; dsk.unitptr = unit; /* Someday: also a DRAM pointer? */ } else arg = q; if ((i = ep - arg)) { if ((size_t)i >= sizeof(knamebuf)) return -1; memcpy(knamebuf, arg, i + 1); kname = knamebuf; } } arg = p; } return 0; } static int drvread(void *buf, unsigned lba, unsigned nblk) { /* XXXRW: eventually, we may want to pass 'drive' and 'unit' here. */ switch (dsk.type) { case BOOTINFO_DEV_TYPE_CFI: return (cfi_read(buf, lba, nblk)); case BOOTINFO_DEV_TYPE_SDCARD: return (altera_sdcard_read(buf, lba, nblk)); default: return (-1); } } static int dskread(void *buf, unsigned lba, unsigned nblk) { #if 0 /* * XXXRW: For now, assume no partition table around the file system; it's * just in raw flash. */ struct dos_partition *dp; struct disklabel *d; char *sec; unsigned i; uint8_t sl; if (!dsk_meta) { sec = dmadat->secbuf; dsk.start = 0; if (drvread(sec, DOSBBSECTOR, 1)) return -1; dp = (void *)(sec + DOSPARTOFF); sl = dsk.slice; if (sl < BASE_SLICE) { for (i = 0; i < NDOSPART; i++) if (dp[i].dp_typ == DOSPTYP_386BSD && (dp[i].dp_flag & 0x80 || sl < BASE_SLICE)) { sl = BASE_SLICE + i; if (dp[i].dp_flag & 0x80 || dsk.slice == COMPATIBILITY_SLICE) break; } if (dsk.slice == WHOLE_DISK_SLICE) dsk.slice = sl; } if (sl != WHOLE_DISK_SLICE) { if (sl != COMPATIBILITY_SLICE) dp += sl - BASE_SLICE; if (dp->dp_typ != DOSPTYP_386BSD) { printf("Invalid %s\n", "slice"); return -1; } dsk.start = le32toh(dp->dp_start); } if (drvread(sec, dsk.start + LABELSECTOR, 1)) return -1; d = (void *)(sec + LABELOFFSET); if (le32toh(d->d_magic) != DISKMAGIC || le32toh(d->d_magic2) != DISKMAGIC) { if (dsk.part != RAW_PART) { printf("Invalid %s\n", "label"); return -1; } } else { if (!dsk.init) { if (le16toh(d->d_type) == DTYPE_SCSI) dsk.type = TYPE_DA; dsk.init++; } if (dsk.part >= le16toh(d->d_npartitions) || !(le32toh(d->d_partitions[dsk.part].p_size))) { printf("Invalid %s\n", "partition"); return -1; } dsk.start += le32toh(d->d_partitions[dsk.part].p_offset); dsk.start -= le32toh(d->d_partitions[RAW_PART].p_offset); } } return drvread(buf, dsk.start + lba, nblk); #else return drvread(buf, lba, nblk); #endif } void putchar(int c) { if (c == '\n') xputc('\r'); xputc(c); } static int xputc(int c) { if (ioctrl & IO_KEYBOARD) putc(c); #if 0 if (ioctrl & IO_SERIAL) sio_putc(c); #endif return c; } static int xgetc(int fn) { if (OPT_CHECK(RBX_NOINTR)) return 0; for (;;) { if (ioctrl & IO_KEYBOARD && keyhit(0)) return fn ? 1 : getc(); #if 0 if (ioctrl & IO_SERIAL && sio_ischar()) return fn ? 1 : sio_getc(); #endif if (fn) return 0; } } Index: projects/clang380-import/sys/boot/pc98/boot2/boot2.c =================================================================== --- projects/clang380-import/sys/boot/pc98/boot2/boot2.c (revision 294776) +++ projects/clang380-import/sys/boot/pc98/boot2/boot2.c (revision 294777) @@ -1,844 +1,803 @@ /*- * Copyright (c) 2008-2009 TAKAHASHI Yoshihiro * Copyright (c) 1998 Robert Nordier * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include "boot2.h" #include "lib.h" +#include "paths.h" +#include "rbx.h" /* Define to 0 to omit serial support */ #ifndef SERIAL #define SERIAL 0 #endif #define IO_KEYBOARD 1 #define IO_SERIAL 2 #if SERIAL #define DO_KBD (ioctrl & IO_KEYBOARD) #define DO_SIO (ioctrl & IO_SERIAL) #else #define DO_KBD (1) #define DO_SIO (0) #endif #define SECOND 1 /* Circa that many ticks in a second. */ -#define RBX_ASKNAME 0x0 /* -a */ -#define RBX_SINGLE 0x1 /* -s */ -/* 0x2 is reserved for log2(RB_NOSYNC). */ -/* 0x3 is reserved for log2(RB_HALT). */ -/* 0x4 is reserved for log2(RB_INITNAME). */ -#define RBX_DFLTROOT 0x5 /* -r */ -#define RBX_KDB 0x6 /* -d */ -/* 0x7 is reserved for log2(RB_RDONLY). */ -/* 0x8 is reserved for log2(RB_DUMP). */ -/* 0x9 is reserved for log2(RB_MINIROOT). */ -#define RBX_CONFIG 0xa /* -c */ -#define RBX_VERBOSE 0xb /* -v */ -#define RBX_SERIAL 0xc /* -h */ -#define RBX_CDROM 0xd /* -C */ -/* 0xe is reserved for log2(RB_POWEROFF). */ -#define RBX_GDB 0xf /* -g */ -#define RBX_MUTE 0x10 /* -m */ -/* 0x11 is reserved for log2(RB_SELFTEST). */ -/* 0x12 is reserved for boot programs. */ -/* 0x13 is reserved for boot programs. */ -#define RBX_PAUSE 0x14 /* -p */ -#define RBX_QUIET 0x15 /* -q */ -#define RBX_NOINTR 0x1c /* -n */ -/* 0x1d is reserved for log2(RB_MULTIPLE) and is just misnamed here. */ -#define RBX_DUAL 0x1d /* -D */ -/* 0x1f is reserved for log2(RB_BOOTINFO). */ - -/* pass: -a, -s, -r, -d, -c, -v, -h, -C, -g, -m, -p, -D */ -#define RBX_MASK (OPT_SET(RBX_ASKNAME) | OPT_SET(RBX_SINGLE) | \ - OPT_SET(RBX_DFLTROOT) | OPT_SET(RBX_KDB ) | \ - OPT_SET(RBX_CONFIG) | OPT_SET(RBX_VERBOSE) | \ - OPT_SET(RBX_SERIAL) | OPT_SET(RBX_CDROM) | \ - OPT_SET(RBX_GDB ) | OPT_SET(RBX_MUTE) | \ - OPT_SET(RBX_PAUSE) | OPT_SET(RBX_DUAL)) - -#define PATH_DOTCONFIG "/boot.config" -#define PATH_CONFIG "/boot/config" -#define PATH_BOOT3 "/boot/loader" -#define PATH_KERNEL "/boot/kernel/kernel" - #define ARGS 0x900 #define NOPT 14 #define NDEV 3 #define DRV_DISK 0xf0 #define DRV_UNIT 0x0f #define TYPE_AD 0 #define TYPE_DA 1 #define TYPE_FD 2 - -#define OPT_SET(opt) (1 << (opt)) -#define OPT_CHECK(opt) ((opts) & OPT_SET(opt)) extern uint32_t _end; static const char optstr[NOPT] = "DhaCcdgmnpqrsv"; /* Also 'P', 'S' */ static const unsigned char flags[NOPT] = { RBX_DUAL, RBX_SERIAL, RBX_ASKNAME, RBX_CDROM, RBX_CONFIG, RBX_KDB, RBX_GDB, RBX_MUTE, RBX_NOINTR, RBX_PAUSE, RBX_QUIET, RBX_DFLTROOT, RBX_SINGLE, RBX_VERBOSE }; static const char *const dev_nm[NDEV] = {"ad", "da", "fd"}; static const unsigned char dev_maj[NDEV] = {30, 4, 2}; static const unsigned char dev_daua[NDEV] = {0x80, 0xa0, 0x90}; static struct dsk { unsigned daua; unsigned type; unsigned disk; unsigned unit; unsigned head; unsigned sec; uint8_t slice; uint8_t part; unsigned start; } dsk; static char cmd[512], cmddup[512], knamebuf[1024]; static const char *kname; static uint32_t opts; static struct bootinfo bootinfo; #if SERIAL static int comspeed = SIOSPD; static uint8_t ioctrl = IO_KEYBOARD; #endif int main(void); void exit(int); static void load(void); static int parse(void); static int dskread(void *, unsigned, unsigned); static void printf(const char *,...); static void putchar(int); static int drvread(void *, unsigned); static int keyhit(unsigned); static int xputc(int); static int xgetc(int); static inline int getc(int); static void memcpy(void *, const void *, int); static void memcpy(void 
*dst, const void *src, int len) { const char *s = src; char *d = dst; while (len--) *d++ = *s++; } static inline int strcmp(const char *s1, const char *s2) { for (; *s1 == *s2 && *s1; s1++, s2++); return (unsigned char)*s1 - (unsigned char)*s2; } #define UFS_SMALL_CGBASE #include "ufsread.c" static inline int xfsread(ufs_ino_t inode, void *buf, size_t nbyte) { if ((size_t)fsread(inode, buf, nbyte) != nbyte) { printf("Invalid %s\n", "format"); return -1; } return 0; } static inline void getstr(void) { char *s; int c; s = cmd; for (;;) { switch (c = xgetc(0)) { case 0: break; case '\177': case '\b': if (s > cmd) { s--; printf("\b \b"); } break; case '\n': case '\r': *s = 0; return; default: if (s - cmd < sizeof(cmd) - 1) *s++ = c; putchar(c); } } } static inline void putc(int c) { v86.ctl = V86_ADDR | V86_CALLF | V86_FLAGS; v86.addr = PUTCORG; /* call to putc in boot1 */ v86.eax = c; v86int(); v86.ctl = V86_FLAGS; } static inline int is_scsi_hd(void) { if ((*(u_char *)PTOV(0x482) >> dsk.unit) & 0x01) return 1; return 0; } static inline void fix_sector_size(void) { u_char *p; p = (u_char *)PTOV(0x460 + dsk.unit * 4); /* SCSI equipment parameter */ if ((p[0] & 0x1f) == 7) { /* SCSI MO */ if (!(p[3] & 0x30)) { /* 256B / sector */ p[3] |= 0x10; /* forced set 512B / sector */ p[3 + 0xa1000] |= 0x10; } } } static inline uint32_t get_diskinfo(void) { if (dsk.disk == 0x30) { /* 1440KB FD */ /* 80 cylinders, 2 heads, 18 sectors */ return (80 << 16) | (2 << 8) | 18; } else if (dsk.disk == 0x90) { /* 1200KB FD */ /* 80 cylinders, 2 heads, 15 sectors */ return (80 << 16) | (2 << 8) | 15; } else if (dsk.disk == 0x80 || is_scsi_hd()) { /* IDE or SCSI HDD */ v86.addr = 0x1b; v86.eax = 0x8400 | dsk.daua; v86int(); return (v86.ecx << 16) | v86.edx; } /* SCSI MO or CD */ fix_sector_size(); /* SCSI MO */ /* other SCSI devices */ return (65535 << 16) | (8 << 8) | 32; } static void set_dsk(void) { uint32_t di; di = get_diskinfo(); dsk.head = (di >> 8) & 0xff; dsk.sec = di & 0xff; dsk.start = 0; } #ifdef GET_BIOSGEOM static uint32_t bd_getbigeom(int bunit) { int hds = 0; int unit = 0x80; /* IDE HDD */ u_int addr = 0x55d; while (unit < 0xa7) { if (*(u_char *)PTOV(addr) & (1 << (unit & 0x0f))) if (hds++ == bunit) break; if (unit >= 0xA0) { int media = ((unsigned *)PTOV(0x460))[unit & 0x0F] & 0x1F; if (media == 7 && hds++ == bunit) /* SCSI MO */ return(0xFFFE0820); /* C:65535 H:8 S:32 */ } if (++unit == 0x84) { unit = 0xA0; /* SCSI HDD */ addr = 0x482; } } if (unit == 0xa7) return 0x4F020F; /* 1200KB FD C:80 H:2 S:15 */ v86.addr = 0x1b; v86.eax = 0x8400 | unit; v86int(); if (V86_CY(v86.efl)) return 0x4F020F; /* 1200KB FD C:80 H:2 S:15 */ return ((v86.ecx & 0xffff) << 16) | (v86.edx & 0xffff); } #endif static int check_slice(void) { struct pc98_partition *dp; char *sec; unsigned i, cyl; sec = dmadat->secbuf; cyl = *(uint16_t *)PTOV(ARGS); set_dsk(); if (dsk.type == TYPE_FD) return (WHOLE_DISK_SLICE); if (drvread(sec, PC98_BBSECTOR)) return (WHOLE_DISK_SLICE); /* Read error */ dp = (void *)(sec + PC98_PARTOFF); for (i = 0; i < PC98_NPARTS; i++) { if (dp[i].dp_mid == DOSMID_386BSD) { if (dp[i].dp_scyl <= cyl && cyl <= dp[i].dp_ecyl) return (BASE_SLICE + i); } } return (WHOLE_DISK_SLICE); } int main(void) { #ifdef GET_BIOSGEOM int i; #endif uint8_t autoboot; ufs_ino_t ino; size_t nbyte; dmadat = (void *)(roundup2(__base + (int32_t)&_end, 0x10000) - __base); v86.ctl = V86_FLAGS; v86.efl = PSL_RESERVED_DEFAULT | PSL_I; dsk.daua = *(uint8_t *)PTOV(0x584); dsk.disk = dsk.daua & DRV_DISK; dsk.unit = dsk.daua & DRV_UNIT; if 
(dsk.disk == 0x80) dsk.type = TYPE_AD; else if (dsk.disk == 0xa0) dsk.type = TYPE_DA; else /* if (dsk.disk == 0x30 || dsk.disk == 0x90) */ dsk.type = TYPE_FD; dsk.slice = check_slice(); #ifdef GET_BIOSGEOM for (i = 0; i < N_BIOS_GEOM; i++) bootinfo.bi_bios_geom[i] = bd_getbigeom(i); #endif bootinfo.bi_version = BOOTINFO_VERSION; bootinfo.bi_size = sizeof(bootinfo); /* Process configuration file */ autoboot = 1; if ((ino = lookup(PATH_CONFIG)) || (ino = lookup(PATH_DOTCONFIG))) { nbyte = fsread(ino, cmd, sizeof(cmd) - 1); cmd[nbyte] = '\0'; } if (*cmd) { memcpy(cmddup, cmd, sizeof(cmd)); if (parse()) autoboot = 0; if (!OPT_CHECK(RBX_QUIET)) printf("%s: %s", PATH_CONFIG, cmddup); /* Do not process this command twice */ *cmd = 0; } /* * Try to exec stage 3 boot loader. If interrupted by a keypress, * or in case of failure, try to load a kernel directly instead. */ if (!kname) { kname = PATH_BOOT3; if (autoboot && !keyhit(3*SECOND)) { load(); kname = PATH_KERNEL; } } /* Present the user with the boot2 prompt. */ for (;;) { if (!autoboot || !OPT_CHECK(RBX_QUIET)) printf("\nFreeBSD/pc98 boot\n" "Default: %u:%s(%u,%c)%s\n" "boot: ", dsk.unit, dev_nm[dsk.type], dsk.unit, 'a' + dsk.part, kname); if (DO_SIO) sio_flush(); if (!autoboot || keyhit(3*SECOND)) getstr(); else if (!autoboot || !OPT_CHECK(RBX_QUIET)) putchar('\n'); autoboot = 0; if (parse()) putchar('\a'); else load(); } } /* XXX - Needed for btxld to link the boot2 binary; do not remove. */ void exit(int x) { } static void load(void) { union { struct exec ex; Elf32_Ehdr eh; } hdr; static Elf32_Phdr ep[2]; static Elf32_Shdr es[2]; caddr_t p; ufs_ino_t ino; uint32_t addr; int k; uint8_t i, j; if (!(ino = lookup(kname))) { if (!ls) printf("No %s\n", kname); return; } if (xfsread(ino, &hdr, sizeof(hdr))) return; if (N_GETMAGIC(hdr.ex) == ZMAGIC) { addr = hdr.ex.a_entry & 0xffffff; p = PTOV(addr); fs_off = PAGE_SIZE; if (xfsread(ino, p, hdr.ex.a_text)) return; p += roundup2(hdr.ex.a_text, PAGE_SIZE); if (xfsread(ino, p, hdr.ex.a_data)) return; } else if (IS_ELF(hdr.eh)) { fs_off = hdr.eh.e_phoff; for (j = k = 0; k < hdr.eh.e_phnum && j < 2; k++) { if (xfsread(ino, ep + j, sizeof(ep[0]))) return; if (ep[j].p_type == PT_LOAD) j++; } for (i = 0; i < 2; i++) { p = PTOV(ep[i].p_paddr & 0xffffff); fs_off = ep[i].p_offset; if (xfsread(ino, p, ep[i].p_filesz)) return; } p += roundup2(ep[1].p_memsz, PAGE_SIZE); bootinfo.bi_symtab = VTOP(p); if (hdr.eh.e_shnum == hdr.eh.e_shstrndx + 3) { fs_off = hdr.eh.e_shoff + sizeof(es[0]) * (hdr.eh.e_shstrndx + 1); if (xfsread(ino, &es, sizeof(es))) return; for (i = 0; i < 2; i++) { *(Elf32_Word *)p = es[i].sh_size; p += sizeof(es[i].sh_size); fs_off = es[i].sh_offset; if (xfsread(ino, p, es[i].sh_size)) return; p += es[i].sh_size; } } addr = hdr.eh.e_entry & 0xffffff; bootinfo.bi_esymtab = VTOP(p); } else { printf("Invalid %s\n", "format"); return; } bootinfo.bi_kernelname = VTOP(kname); bootinfo.bi_bios_dev = dsk.daua; __exec((caddr_t)addr, RB_BOOTINFO | (opts & RBX_MASK), MAKEBOOTDEV(dev_maj[dsk.type], dsk.slice, dsk.unit, dsk.part), 0, 0, 0, VTOP(&bootinfo)); } static int parse() { char *arg = cmd; char *ep, *p, *q; const char *cp; unsigned int drv; int c, i, j; size_t k; while ((c = *arg++)) { if (c == ' ' || c == '\t' || c == '\n') continue; for (p = arg; *p && *p != '\n' && *p != ' ' && *p != '\t'; p++); ep = p; if (*p) *p++ = 0; if (c == '-') { while ((c = *arg++)) { if (c == 'P') { if (*(uint8_t *)PTOV(0x481) & 0x48) { cp = "yes"; } else { opts |= OPT_SET(RBX_DUAL) | OPT_SET(RBX_SERIAL); cp = "no"; } 
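/* The test above consults a flag byte in the PC-98 BIOS work area at 0x481; when no keyboard is reported there, -P falls back to enabling both the serial and dual-console options so input remains available. */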
printf("Keyboard: %s\n", cp); continue; #if SERIAL } else if (c == 'S') { j = 0; while ((unsigned int)(i = *arg++ - '0') <= 9) j = j * 10 + i; if (j > 0 && i == -'0') { comspeed = j; break; } /* Fall through to error below ('S' not in optstr[]). */ #endif } for (i = 0; c != optstr[i]; i++) if (i == NOPT - 1) return -1; opts ^= OPT_SET(flags[i]); } #if SERIAL ioctrl = OPT_CHECK(RBX_DUAL) ? (IO_SERIAL|IO_KEYBOARD) : OPT_CHECK(RBX_SERIAL) ? IO_SERIAL : IO_KEYBOARD; if (DO_SIO) { if (sio_init(115200 / comspeed) != 0) ioctrl &= ~IO_SERIAL; } #endif } else { for (q = arg--; *q && *q != '('; q++); if (*q) { drv = -1; if (arg[1] == ':') { drv = *arg - '0'; if (drv > 9) return (-1); arg += 2; } if (q - arg != 2) return -1; for (i = 0; arg[0] != dev_nm[i][0] || arg[1] != dev_nm[i][1]; i++) if (i == NDEV - 1) return -1; dsk.type = i; arg += 3; dsk.unit = *arg - '0'; if (arg[1] != ',' || dsk.unit > 9) return -1; arg += 2; dsk.slice = WHOLE_DISK_SLICE; if (arg[1] == ',') { dsk.slice = *arg - '0' + 1; if (dsk.slice > PC98_NPARTS + 1) return -1; arg += 2; } if (arg[1] != ')') return -1; dsk.part = *arg - 'a'; if (dsk.part > 7) return (-1); arg += 2; if (drv == -1) drv = dsk.unit; dsk.disk = dev_daua[dsk.type]; dsk.daua = dsk.disk | dsk.unit; dsk_meta = 0; } k = ep - arg; if (k > 0) { if (k >= sizeof(knamebuf)) return -1; memcpy(knamebuf, arg, k + 1); kname = knamebuf; } } arg = p; } return 0; } static int dskread(void *buf, unsigned lba, unsigned nblk) { struct pc98_partition *dp; struct disklabel *d; char *sec; unsigned i; uint8_t sl; u_char *p; const char *reason; if (!dsk_meta) { sec = dmadat->secbuf; set_dsk(); if (dsk.type == TYPE_FD) goto unsliced; if (drvread(sec, PC98_BBSECTOR)) return -1; dp = (void *)(sec + PC98_PARTOFF); sl = dsk.slice; if (sl < BASE_SLICE) { for (i = 0; i < PC98_NPARTS; i++) if (dp[i].dp_mid == DOSMID_386BSD) { sl = BASE_SLICE + i; break; } dsk.slice = sl; } if (sl != WHOLE_DISK_SLICE) { dp += sl - BASE_SLICE; if (dp->dp_mid != DOSMID_386BSD) { reason = "slice"; goto error; } dsk.start = dp->dp_scyl * dsk.head * dsk.sec + dp->dp_shd * dsk.sec + dp->dp_ssect; } if (drvread(sec, dsk.start + LABELSECTOR)) return -1; d = (void *)(sec + LABELOFFSET); if (d->d_magic != DISKMAGIC || d->d_magic2 != DISKMAGIC) { if (dsk.part != RAW_PART) { reason = "label"; goto error; } } else { if (dsk.part >= d->d_npartitions || !d->d_partitions[dsk.part].p_size) { reason = "partition"; goto error; } dsk.start += d->d_partitions[dsk.part].p_offset; dsk.start -= d->d_partitions[RAW_PART].p_offset; } unsliced: ; } for (p = buf; nblk; p += 512, lba++, nblk--) { if ((i = drvread(p, dsk.start + lba))) return i; } return 0; error: printf("Invalid %s\n", reason); return -1; } static void printf(const char *fmt,...) 
{ va_list ap; static char buf[10]; char *s; unsigned u; int c; va_start(ap, fmt); while ((c = *fmt++)) { if (c == '%') { c = *fmt++; switch (c) { case 'c': putchar(va_arg(ap, int)); continue; case 's': for (s = va_arg(ap, char *); *s; s++) putchar(*s); continue; case 'u': u = va_arg(ap, unsigned); s = buf; do *s++ = '0' + u % 10U; while (u /= 10U); while (--s >= buf) putchar(*s); continue; } } putchar(c); } va_end(ap); return; } static void putchar(int c) { if (c == '\n') xputc('\r'); xputc(c); } static int drvread(void *buf, unsigned lba) { static unsigned c = 0x2d5c7c2f; unsigned bpc, x, cyl, head, sec; bpc = dsk.sec * dsk.head; cyl = lba / bpc; x = lba % bpc; head = x / dsk.sec; sec = x % dsk.sec; if (!OPT_CHECK(RBX_QUIET)) { xputc(c = c << 8 | c >> 24); xputc('\b'); } v86.ctl = V86_ADDR | V86_CALLF | V86_FLAGS; v86.addr = READORG; /* call to read in boot1 */ v86.ecx = cyl; v86.edx = (head << 8) | sec; v86.edi = lba; v86.ebx = 512; v86.es = VTOPSEG(buf); v86.ebp = VTOPOFF(buf); v86int(); v86.ctl = V86_FLAGS; if (V86_CY(v86.efl)) { printf("error %u c/h/s %u/%u/%u lba %u\n", v86.eax >> 8 & 0xff, cyl, head, sec, lba); return -1; } return 0; } static inline void delay(void) { int i; i = 800; do { outb(0x5f, 0); /* about 600ns */ } while (--i >= 0); } static int keyhit(unsigned sec) { unsigned i; if (OPT_CHECK(RBX_NOINTR)) return 0; for (i = 0; i < sec * 1000; i++) { if (xgetc(1)) return 1; delay(); } return 0; } static int xputc(int c) { if (DO_KBD) putc(c); if (DO_SIO) sio_putc(c); return c; } static int getc(int fn) { v86.addr = 0x18; v86.eax = fn << 8; v86int(); if (fn) return (v86.ebx >> 8) & 0x01; else return v86.eax & 0xff; } static int xgetc(int fn) { if (OPT_CHECK(RBX_NOINTR)) return 0; for (;;) { if (DO_KBD && getc(1)) return fn ? 1 : getc(0); if (DO_SIO && sio_ischar()) return fn ? 1 : sio_getc(); if (fn) return 0; } } Index: projects/clang380-import/sys/boot/powerpc/boot1.chrp/boot1.c =================================================================== --- projects/clang380-import/sys/boot/powerpc/boot1.chrp/boot1.c (revision 294776) +++ projects/clang380-import/sys/boot/powerpc/boot1.chrp/boot1.c (revision 294777) @@ -1,767 +1,766 @@ /*- * Copyright (c) 1998 Robert Nordier * All rights reserved. * Copyright (c) 2001 Robert Drehmel * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include -#define _PATH_LOADER "/boot/loader" -#define _PATH_KERNEL "/boot/kernel/kernel" +#include "paths.h" #define BSIZEMAX 16384 typedef int putc_func_t(char c, void *arg); typedef int32_t ofwh_t; struct sp_data { char *sp_buf; u_int sp_len; u_int sp_size; }; static const char digits[] = "0123456789abcdef"; static char bootpath[128]; static char bootargs[128]; static ofwh_t bootdev; static struct fs fs; static char blkbuf[BSIZEMAX]; static unsigned int fsblks; static uint32_t fs_off; int main(int ac, char **av); static void exit(int) __dead2; static void load(const char *); static int dskread(void *, u_int64_t, int); static void usage(void); static void bcopy(const void *src, void *dst, size_t len); static void bzero(void *b, size_t len); static int domount(const char *device, int quiet); static void panic(const char *fmt, ...) __dead2; static int printf(const char *fmt, ...); static int putchar(char c, void *arg); static int vprintf(const char *fmt, va_list ap); static int vsnprintf(char *str, size_t sz, const char *fmt, va_list ap); static int __printf(const char *fmt, putc_func_t *putc, void *arg, va_list ap); static int __putc(char c, void *arg); static int __puts(const char *s, putc_func_t *putc, void *arg); static int __sputc(char c, void *arg); static char *__uitoa(char *buf, u_int val, int base); static char *__ultoa(char *buf, u_long val, int base); void __syncicache(void *, int); /* * Open Firmware interface functions */ typedef u_int32_t ofwcell_t; typedef u_int32_t u_ofwh_t; typedef int (*ofwfp_t)(void *); ofwfp_t ofw; /* the prom Open Firmware entry */ ofwh_t chosenh; void ofw_init(void *, int, int (*)(void *), char *, int); static ofwh_t ofw_finddevice(const char *); static ofwh_t ofw_open(const char *); static int ofw_close(ofwh_t); static int ofw_getprop(ofwh_t, const char *, void *, size_t); static int ofw_setprop(ofwh_t, const char *, void *, size_t); static int ofw_read(ofwh_t, void *, size_t); static int ofw_write(ofwh_t, const void *, size_t); static int ofw_claim(void *virt, size_t len, u_int align); static int ofw_seek(ofwh_t, u_int64_t); static void ofw_exit(void) __dead2; ofwh_t bootdevh; ofwh_t stdinh, stdouth; __asm(" \n\ .data \n\ .align 4 \n\ stack: \n\ .space 16384 \n\ \n\ .text \n\ .globl _start \n\ _start: \n\ lis %r1,stack@ha \n\ addi %r1,%r1,stack@l \n\ addi %r1,%r1,8192 \n\ \n\ b ofw_init \n\ "); void ofw_init(void *vpd, int res, int (*openfirm)(void *), char *arg, int argl) { char *av[16]; char *p; int ac; ofw = openfirm; chosenh = ofw_finddevice("/chosen"); ofw_getprop(chosenh, "stdin", &stdinh, sizeof(stdinh)); ofw_getprop(chosenh, "stdout", &stdouth, sizeof(stdouth)); ofw_getprop(chosenh, "bootargs", bootargs, sizeof(bootargs)); ofw_getprop(chosenh, "bootpath", bootpath, sizeof(bootpath)); bootargs[sizeof(bootargs) - 1] = '\0'; bootpath[sizeof(bootpath) - 1] = '\0'; p = bootpath; while (*p != '\0') { if (*p == ':') { *(++p) = '\0'; break; } p++; } ac = 0; p = bootargs; for (;;) { while (*p == ' ' && *p != '\0') p++; if (*p == '\0' || ac >= 16) break; av[ac++] = p; while (*p != ' ' && *p != '\0') p++; if (*p != '\0') *p++ = '\0'; } exit(main(ac, av)); } static ofwh_t ofw_finddevice(const char *name) { ofwcell_t args[] = { (ofwcell_t)"finddevice", 1, 1, (ofwcell_t)name, 0 }; if ((*ofw)(args)) { printf("ofw_finddevice: name=\"%s\"\n", name); return (1); } return (args[4]); } static int ofw_getprop(ofwh_t ofwh, const char *name, void *buf, size_t len) { ofwcell_t args[] = { 
(ofwcell_t)"getprop", 4, 1, (u_ofwh_t)ofwh, (ofwcell_t)name, (ofwcell_t)buf, len, 0 }; if ((*ofw)(args)) { printf("ofw_getprop: ofwh=0x%x buf=%p len=%u\n", ofwh, buf, len); return (1); } return (0); } static int ofw_setprop(ofwh_t ofwh, const char *name, void *buf, size_t len) { ofwcell_t args[] = { (ofwcell_t)"setprop", 4, 1, (u_ofwh_t)ofwh, (ofwcell_t)name, (ofwcell_t)buf, len, 0 }; if ((*ofw)(args)) { printf("ofw_setprop: ofwh=0x%x buf=%p len=%u\n", ofwh, buf, len); return (1); } return (0); } static ofwh_t ofw_open(const char *path) { ofwcell_t args[] = { (ofwcell_t)"open", 1, 1, (ofwcell_t)path, 0 }; if ((*ofw)(args)) { printf("ofw_open: path=\"%s\"\n", path); return (-1); } return (args[4]); } static int ofw_close(ofwh_t devh) { ofwcell_t args[] = { (ofwcell_t)"close", 1, 0, (u_ofwh_t)devh }; if ((*ofw)(args)) { printf("ofw_close: devh=0x%x\n", devh); return (1); } return (0); } static int ofw_claim(void *virt, size_t len, u_int align) { ofwcell_t args[] = { (ofwcell_t)"claim", 3, 1, (ofwcell_t)virt, len, align, 0, 0 }; if ((*ofw)(args)) { printf("ofw_claim: virt=%p len=%u\n", virt, len); return (1); } return (0); } static int ofw_read(ofwh_t devh, void *buf, size_t len) { ofwcell_t args[] = { (ofwcell_t)"read", 3, 1, (u_ofwh_t)devh, (ofwcell_t)buf, len, 0 }; if ((*ofw)(args)) { printf("ofw_read: devh=0x%x buf=%p len=%u\n", devh, buf, len); return (1); } return (0); } static int ofw_write(ofwh_t devh, const void *buf, size_t len) { ofwcell_t args[] = { (ofwcell_t)"write", 3, 1, (u_ofwh_t)devh, (ofwcell_t)buf, len, 0 }; if ((*ofw)(args)) { printf("ofw_write: devh=0x%x buf=%p len=%u\n", devh, buf, len); return (1); } return (0); } static int ofw_seek(ofwh_t devh, u_int64_t off) { ofwcell_t args[] = { (ofwcell_t)"seek", 3, 1, (u_ofwh_t)devh, off >> 32, off, 0 }; if ((*ofw)(args)) { printf("ofw_seek: devh=0x%x off=0x%lx\n", devh, off); return (1); } return (0); } static void ofw_exit(void) { ofwcell_t args[3]; args[0] = (ofwcell_t)"exit"; args[1] = 0; args[2] = 0; for (;;) (*ofw)(args); } static void bcopy(const void *src, void *dst, size_t len) { const char *s = src; char *d = dst; while (len-- != 0) *d++ = *s++; } static void memcpy(void *dst, const void *src, size_t len) { bcopy(src, dst, len); } static void bzero(void *b, size_t len) { char *p = b; while (len-- != 0) *p++ = 0; } static int strcmp(const char *s1, const char *s2) { for (; *s1 == *s2 && *s1; s1++, s2++) ; return ((u_char)*s1 - (u_char)*s2); } #include "ufsread.c" int main(int ac, char **av) { const char *path; char bootpath_full[255]; int i, len; - path = _PATH_LOADER; + path = PATH_LOADER; for (i = 0; i < ac; i++) { switch (av[i][0]) { case '-': switch (av[i][1]) { default: usage(); } break; default: path = av[i]; break; } } printf(" \n>> FreeBSD/powerpc Open Firmware boot block\n" " Boot path: %s\n" " Boot loader: %s\n", bootpath, path); len = 0; while (bootpath[len] != '\0') len++; memcpy(bootpath_full,bootpath,len+1); if (bootpath_full[len-1] == ':') { for (i = 0; i < 16; i++) { if (i < 10) { bootpath_full[len] = i + '0'; bootpath_full[len+1] = '\0'; } else { bootpath_full[len] = '1'; bootpath_full[len+1] = i - 10 + '0'; bootpath_full[len+2] = '\0'; } if (domount(bootpath_full,1) >= 0) break; if (bootdev > 0) ofw_close(bootdev); } if (i >= 16) panic("domount"); } else { if (domount(bootpath_full,0) == -1) panic("domount"); } printf(" Boot volume: %s\n",bootpath_full); ofw_setprop(chosenh, "bootargs", bootpath_full, len+2); load(path); return (1); } static void usage(void) { printf("usage: boot device 
[/path/to/loader]\n"); exit(1); } static void exit(int code) { ofw_exit(); } static struct dmadat __dmadat; static int domount(const char *device, int quiet) { dmadat = &__dmadat; if ((bootdev = ofw_open(device)) == -1) { printf("domount: can't open device\n"); return (-1); } if (fsread(0, NULL, 0)) { if (!quiet) printf("domount: can't read superblock\n"); return (-1); } return (0); } static void load(const char *fname) { Elf32_Ehdr eh; Elf32_Phdr ph; caddr_t p; ufs_ino_t ino; int i; if ((ino = lookup(fname)) == 0) { printf("File %s not found\n", fname); return; } if (fsread(ino, &eh, sizeof(eh)) != sizeof(eh)) { printf("Can't read elf header\n"); return; } if (!IS_ELF(eh)) { printf("Not an ELF file\n"); return; } for (i = 0; i < eh.e_phnum; i++) { fs_off = eh.e_phoff + i * eh.e_phentsize; if (fsread(ino, &ph, sizeof(ph)) != sizeof(ph)) { printf("Can't read program header %d\n", i); return; } if (ph.p_type != PT_LOAD) continue; fs_off = ph.p_offset; p = (caddr_t)ph.p_vaddr; ofw_claim(p,(ph.p_filesz > ph.p_memsz) ? ph.p_filesz : ph.p_memsz,0); if (fsread(ino, p, ph.p_filesz) != ph.p_filesz) { printf("Can't read content of section %d\n", i); return; } if (ph.p_filesz != ph.p_memsz) bzero(p + ph.p_filesz, ph.p_memsz - ph.p_filesz); __syncicache(p, ph.p_memsz); } ofw_close(bootdev); (*(void (*)(void *, int, ofwfp_t, char *, int))eh.e_entry)(NULL, 0, ofw,NULL,0); } static int dskread(void *buf, u_int64_t lba, int nblk) { /* * The Open Firmware should open the correct partition for us. * That means, if we read from offset zero on an open instance handle, * we should read from offset zero of that partition. */ ofw_seek(bootdev, lba * DEV_BSIZE); ofw_read(bootdev, buf, nblk * DEV_BSIZE); return (0); } static void panic(const char *fmt, ...) { char buf[128]; va_list ap; va_start(ap, fmt); vsnprintf(buf, sizeof buf, fmt, ap); printf("panic: %s\n", buf); va_end(ap); exit(1); } static int printf(const char *fmt, ...) 
{ va_list ap; int ret; va_start(ap, fmt); ret = vprintf(fmt, ap); va_end(ap); return (ret); } static int putchar(char c, void *arg) { char buf; if (c == '\n') { buf = '\r'; ofw_write(stdouth, &buf, 1); } buf = c; ofw_write(stdouth, &buf, 1); return (1); } static int vprintf(const char *fmt, va_list ap) { int ret; ret = __printf(fmt, putchar, 0, ap); return (ret); } static int vsnprintf(char *str, size_t sz, const char *fmt, va_list ap) { struct sp_data sp; int ret; sp.sp_buf = str; sp.sp_len = 0; sp.sp_size = sz; ret = __printf(fmt, __sputc, &sp, ap); return (ret); } static int __printf(const char *fmt, putc_func_t *putc, void *arg, va_list ap) { char buf[(sizeof(long) * 8) + 1]; char *nbuf; u_long ul; u_int ui; int lflag; int sflag; char *s; int pad; int ret; int c; nbuf = &buf[sizeof buf - 1]; ret = 0; while ((c = *fmt++) != 0) { if (c != '%') { ret += putc(c, arg); continue; } lflag = 0; sflag = 0; pad = 0; reswitch: c = *fmt++; switch (c) { case '#': sflag = 1; goto reswitch; case '%': ret += putc('%', arg); break; case 'c': c = va_arg(ap, int); ret += putc(c, arg); break; case 'd': if (lflag == 0) { ui = (u_int)va_arg(ap, int); if (ui < (int)ui) { ui = -ui; ret += putc('-', arg); } s = __uitoa(nbuf, ui, 10); } else { ul = (u_long)va_arg(ap, long); if (ul < (long)ul) { ul = -ul; ret += putc('-', arg); } s = __ultoa(nbuf, ul, 10); } ret += __puts(s, putc, arg); break; case 'l': lflag = 1; goto reswitch; case 'o': if (lflag == 0) { ui = (u_int)va_arg(ap, u_int); s = __uitoa(nbuf, ui, 8); } else { ul = (u_long)va_arg(ap, u_long); s = __ultoa(nbuf, ul, 8); } ret += __puts(s, putc, arg); break; case 'p': ul = (u_long)va_arg(ap, void *); s = __ultoa(nbuf, ul, 16); ret += __puts("0x", putc, arg); ret += __puts(s, putc, arg); break; case 's': s = va_arg(ap, char *); ret += __puts(s, putc, arg); break; case 'u': if (lflag == 0) { ui = va_arg(ap, u_int); s = __uitoa(nbuf, ui, 10); } else { ul = va_arg(ap, u_long); s = __ultoa(nbuf, ul, 10); } ret += __puts(s, putc, arg); break; case 'x': if (lflag == 0) { ui = va_arg(ap, u_int); s = __uitoa(nbuf, ui, 16); } else { ul = va_arg(ap, u_long); s = __ultoa(nbuf, ul, 16); } if (sflag) ret += __puts("0x", putc, arg); ret += __puts(s, putc, arg); break; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': pad = pad * 10 + c - '0'; goto reswitch; default: break; } } return (ret); } static int __sputc(char c, void *arg) { struct sp_data *sp; sp = arg; if (sp->sp_len < sp->sp_size) sp->sp_buf[sp->sp_len++] = c; sp->sp_buf[sp->sp_len] = '\0'; return (1); } static int __puts(const char *s, putc_func_t *putc, void *arg) { const char *p; int ret; ret = 0; for (p = s; *p != '\0'; p++) ret += putc(*p, arg); return (ret); } static char * __uitoa(char *buf, u_int ui, int base) { char *p; p = buf; *p = '\0'; do *--p = digits[ui % base]; while ((ui /= base) != 0); return (p); } static char * __ultoa(char *buf, u_long ul, int base) { char *p; p = buf; *p = '\0'; do *--p = digits[ul % base]; while ((ul /= base) != 0); return (p); } Index: projects/clang380-import/sys/boot/powerpc/boot1.chrp =================================================================== --- projects/clang380-import/sys/boot/powerpc/boot1.chrp (revision 294776) +++ projects/clang380-import/sys/boot/powerpc/boot1.chrp (revision 294777) Property changes on: projects/clang380-import/sys/boot/powerpc/boot1.chrp ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged 
/head/sys/boot/powerpc/boot1.chrp:r293686-294776 Index: projects/clang380-import/sys/boot/sparc64/boot1/boot1.c =================================================================== --- projects/clang380-import/sys/boot/sparc64/boot1/boot1.c (revision 294776) +++ projects/clang380-import/sys/boot/sparc64/boot1/boot1.c (revision 294777) @@ -1,751 +1,751 @@ /*- * Copyright (c) 1998 Robert Nordier * All rights reserved. * Copyright (c) 2001 Robert Drehmel * All rights reserved. * * Redistribution and use in source and binary forms are freely * permitted provided that the above copyright notice and this * paragraph and the following disclaimer are duplicated in all * such forms. * * This software is provided "AS IS" and without any express or * implied warranties, including, without limitation, the implied * warranties of merchantability and fitness for a particular * purpose. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include -#define _PATH_LOADER "/boot/loader" -#define _PATH_KERNEL "/boot/kernel/kernel" +#include "paths.h" + #define READ_BUF_SIZE 8192 typedef int putc_func_t(char c, void *arg); typedef int32_t ofwh_t; struct sp_data { char *sp_buf; u_int sp_len; u_int sp_size; }; static const char digits[] = "0123456789abcdef"; static char bootpath[128]; static char bootargs[128]; static ofwh_t bootdev; static uint32_t fs_off; int main(int ac, char **av); static void exit(int) __dead2; static void usage(void); #ifdef ZFSBOOT static void loadzfs(void); static int zbread(char *buf, off_t off, size_t bytes); #else static void load(const char *); #endif static void bcopy(const void *src, void *dst, size_t len); static void bzero(void *b, size_t len); static int domount(const char *device); static int dskread(void *buf, u_int64_t lba, int nblk); static void panic(const char *fmt, ...) __dead2; static int printf(const char *fmt, ...); static int putchar(char c, void *arg); static int vprintf(const char *fmt, va_list ap); static int vsnprintf(char *str, size_t sz, const char *fmt, va_list ap); static int __printf(const char *fmt, putc_func_t *putc, void *arg, va_list ap); static int __puts(const char *s, putc_func_t *putc, void *arg); static int __sputc(char c, void *arg); static char *__uitoa(char *buf, u_int val, int base); static char *__ultoa(char *buf, u_long val, int base); /* * Open Firmware interface functions */ typedef u_int64_t ofwcell_t; typedef u_int32_t u_ofwh_t; typedef int (*ofwfp_t)(ofwcell_t []); static ofwfp_t ofw; /* the PROM Open Firmware entry */ void ofw_init(int, int, int, int, ofwfp_t); static ofwh_t ofw_finddevice(const char *); static ofwh_t ofw_open(const char *); static int ofw_getprop(ofwh_t, const char *, void *, size_t); static int ofw_read(ofwh_t, void *, size_t); static int ofw_write(ofwh_t, const void *, size_t); static int ofw_seek(ofwh_t, u_int64_t); static void ofw_exit(void) __dead2; static ofwh_t stdinh, stdouth; /* * This has to stay here, as the PROM seems to ignore the * entry point specified in the a.out header. 
(or elftoaout is broken) */ void ofw_init(int d, int d1, int d2, int d3, ofwfp_t ofwaddr) { ofwh_t chosenh; char *av[16]; char *p; int ac; ofw = ofwaddr; chosenh = ofw_finddevice("/chosen"); ofw_getprop(chosenh, "stdin", &stdinh, sizeof(stdinh)); ofw_getprop(chosenh, "stdout", &stdouth, sizeof(stdouth)); ofw_getprop(chosenh, "bootargs", bootargs, sizeof(bootargs)); ofw_getprop(chosenh, "bootpath", bootpath, sizeof(bootpath)); bootargs[sizeof(bootargs) - 1] = '\0'; bootpath[sizeof(bootpath) - 1] = '\0'; ac = 0; p = bootargs; for (;;) { while (*p == ' ' && *p != '\0') p++; if (*p == '\0' || ac >= 16) break; av[ac++] = p; while (*p != ' ' && *p != '\0') p++; if (*p != '\0') *p++ = '\0'; } exit(main(ac, av)); } static ofwh_t ofw_finddevice(const char *name) { ofwcell_t args[] = { (ofwcell_t)"finddevice", 1, 1, (ofwcell_t)name, 0 }; if ((*ofw)(args)) { printf("ofw_finddevice: name=\"%s\"\n", name); return (1); } return (args[4]); } static int ofw_getprop(ofwh_t ofwh, const char *name, void *buf, size_t len) { ofwcell_t args[] = { (ofwcell_t)"getprop", 4, 1, (u_ofwh_t)ofwh, (ofwcell_t)name, (ofwcell_t)buf, len, 0 }; if ((*ofw)(args)) { printf("ofw_getprop: ofwh=0x%x buf=%p len=%u\n", ofwh, buf, len); return (1); } return (0); } static ofwh_t ofw_open(const char *path) { ofwcell_t args[] = { (ofwcell_t)"open", 1, 1, (ofwcell_t)path, 0 }; if ((*ofw)(args)) { printf("ofw_open: path=\"%s\"\n", path); return (-1); } return (args[4]); } static int ofw_close(ofwh_t devh) { ofwcell_t args[] = { (ofwcell_t)"close", 1, 0, (u_ofwh_t)devh }; if ((*ofw)(args)) { printf("ofw_close: devh=0x%x\n", devh); return (1); } return (0); } static int ofw_read(ofwh_t devh, void *buf, size_t len) { ofwcell_t args[] = { (ofwcell_t)"read", 3, 1, (u_ofwh_t)devh, (ofwcell_t)buf, len, 0 }; if ((*ofw)(args)) { printf("ofw_read: devh=0x%x buf=%p len=%u\n", devh, buf, len); return (1); } return (0); } static int ofw_write(ofwh_t devh, const void *buf, size_t len) { ofwcell_t args[] = { (ofwcell_t)"write", 3, 1, (u_ofwh_t)devh, (ofwcell_t)buf, len, 0 }; if ((*ofw)(args)) { printf("ofw_write: devh=0x%x buf=%p len=%u\n", devh, buf, len); return (1); } return (0); } static int ofw_seek(ofwh_t devh, u_int64_t off) { ofwcell_t args[] = { (ofwcell_t)"seek", 3, 1, (u_ofwh_t)devh, off >> 32, off, 0 }; if ((*ofw)(args)) { printf("ofw_seek: devh=0x%x off=0x%lx\n", devh, off); return (1); } return (0); } static void ofw_exit(void) { ofwcell_t args[3]; args[0] = (ofwcell_t)"exit"; args[1] = 0; args[2] = 0; for (;;) (*ofw)(args); } static void bcopy(const void *src, void *dst, size_t len) { const char *s = src; char *d = dst; while (len-- != 0) *d++ = *s++; } static void memcpy(void *dst, const void *src, size_t len) { bcopy(src, dst, len); } static void bzero(void *b, size_t len) { char *p = b; while (len-- != 0) *p++ = 0; } static int strcmp(const char *s1, const char *s2) { for (; *s1 == *s2 && *s1; s1++, s2++) ; return ((u_char)*s1 - (u_char)*s2); } int main(int ac, char **av) { const char *path; int i; - path = _PATH_LOADER; + path = PATH_LOADER; for (i = 0; i < ac; i++) { switch (av[i][0]) { case '-': switch (av[i][1]) { default: usage(); } break; default: path = av[i]; break; } } #ifdef ZFSBOOT printf(" \n>> FreeBSD/sparc64 ZFS boot block\n Boot path: %s\n", bootpath); #else printf(" \n>> FreeBSD/sparc64 boot block\n Boot path: %s\n" " Boot loader: %s\n", bootpath, path); #endif if (domount(bootpath) == -1) panic("domount"); #ifdef ZFSBOOT loadzfs(); #else load(path); #endif return (1); } static void usage(void) { printf("usage: boot 
device [/path/to/loader]\n"); exit(1); } static void exit(int code) { ofw_exit(); } #ifdef ZFSBOOT #define VDEV_BOOT_OFFSET (2 * 256 * 1024) static char zbuf[READ_BUF_SIZE]; static int zbread(char *buf, off_t off, size_t bytes) { size_t len; off_t poff; off_t soff; char *p; unsigned int nb; unsigned int lb; p = buf; soff = VDEV_BOOT_OFFSET + off; lb = (soff + bytes + DEV_BSIZE - 1) / DEV_BSIZE; poff = soff; while (poff < soff + bytes) { nb = lb - poff / DEV_BSIZE; if (nb > READ_BUF_SIZE / DEV_BSIZE) nb = READ_BUF_SIZE / DEV_BSIZE; if (dskread(zbuf, poff / DEV_BSIZE, nb)) break; if ((poff / DEV_BSIZE + nb) * DEV_BSIZE > soff + bytes) len = soff + bytes - poff; else len = (poff / DEV_BSIZE + nb) * DEV_BSIZE - poff; memcpy(p, zbuf + poff % DEV_BSIZE, len); p += len; poff += len; } return (poff - soff); } static void loadzfs(void) { Elf64_Ehdr eh; Elf64_Phdr ph; caddr_t p; int i; if (zbread((char *)&eh, 0, sizeof(eh)) != sizeof(eh)) { printf("Can't read elf header\n"); return; } if (!IS_ELF(eh)) { printf("Not an ELF file\n"); return; } for (i = 0; i < eh.e_phnum; i++) { fs_off = eh.e_phoff + i * eh.e_phentsize; if (zbread((char *)&ph, fs_off, sizeof(ph)) != sizeof(ph)) { printf("Can't read program header %d\n", i); return; } if (ph.p_type != PT_LOAD) continue; fs_off = ph.p_offset; p = (caddr_t)ph.p_vaddr; if (zbread(p, fs_off, ph.p_filesz) != ph.p_filesz) { printf("Can't read content of section %d\n", i); return; } if (ph.p_filesz != ph.p_memsz) bzero(p + ph.p_filesz, ph.p_memsz - ph.p_filesz); } ofw_close(bootdev); (*(void (*)(int, int, int, int, ofwfp_t))eh.e_entry)(0, 0, 0, 0, ofw); } #else #include "ufsread.c" static struct dmadat __dmadat; static void load(const char *fname) { Elf64_Ehdr eh; Elf64_Phdr ph; caddr_t p; ufs_ino_t ino; int i; if ((ino = lookup(fname)) == 0) { printf("File %s not found\n", fname); return; } if (fsread(ino, &eh, sizeof(eh)) != sizeof(eh)) { printf("Can't read elf header\n"); return; } if (!IS_ELF(eh)) { printf("Not an ELF file\n"); return; } for (i = 0; i < eh.e_phnum; i++) { fs_off = eh.e_phoff + i * eh.e_phentsize; if (fsread(ino, &ph, sizeof(ph)) != sizeof(ph)) { printf("Can't read program header %d\n", i); return; } if (ph.p_type != PT_LOAD) continue; fs_off = ph.p_offset; p = (caddr_t)ph.p_vaddr; if (fsread(ino, p, ph.p_filesz) != ph.p_filesz) { printf("Can't read content of section %d\n", i); return; } if (ph.p_filesz != ph.p_memsz) bzero(p + ph.p_filesz, ph.p_memsz - ph.p_filesz); } ofw_close(bootdev); (*(void (*)(int, int, int, int, ofwfp_t))eh.e_entry)(0, 0, 0, 0, ofw); } #endif /* ZFSBOOT */ static int domount(const char *device) { if ((bootdev = ofw_open(device)) == -1) { printf("domount: can't open device\n"); return (-1); } #ifndef ZFSBOOT dmadat = &__dmadat; if (fsread(0, NULL, 0)) { printf("domount: can't read superblock\n"); return (-1); } #endif return (0); } static int dskread(void *buf, u_int64_t lba, int nblk) { /* * The Open Firmware should open the correct partition for us. * That means, if we read from offset zero on an open instance handle, * we should read from offset zero of that partition. */ ofw_seek(bootdev, lba * DEV_BSIZE); ofw_read(bootdev, buf, nblk * DEV_BSIZE); return (0); } static void panic(const char *fmt, ...) { char buf[128]; va_list ap; va_start(ap, fmt); vsnprintf(buf, sizeof buf, fmt, ap); printf("panic: %s\n", buf); va_end(ap); exit(1); } static int printf(const char *fmt, ...) 
{ va_list ap; int ret; va_start(ap, fmt); ret = vprintf(fmt, ap); va_end(ap); return (ret); } static int putchar(char c, void *arg) { char buf; if (c == '\n') { buf = '\r'; ofw_write(stdouth, &buf, 1); } buf = c; ofw_write(stdouth, &buf, 1); return (1); } static int vprintf(const char *fmt, va_list ap) { int ret; ret = __printf(fmt, putchar, 0, ap); return (ret); } static int vsnprintf(char *str, size_t sz, const char *fmt, va_list ap) { struct sp_data sp; int ret; sp.sp_buf = str; sp.sp_len = 0; sp.sp_size = sz; ret = __printf(fmt, __sputc, &sp, ap); return (ret); } static int __printf(const char *fmt, putc_func_t *putc, void *arg, va_list ap) { char buf[(sizeof(long) * 8) + 1]; char *nbuf; u_long ul; u_int ui; int lflag; int sflag; char *s; int pad; int ret; int c; nbuf = &buf[sizeof buf - 1]; ret = 0; while ((c = *fmt++) != 0) { if (c != '%') { ret += putc(c, arg); continue; } lflag = 0; sflag = 0; pad = 0; reswitch: c = *fmt++; switch (c) { case '#': sflag = 1; goto reswitch; case '%': ret += putc('%', arg); break; case 'c': c = va_arg(ap, int); ret += putc(c, arg); break; case 'd': if (lflag == 0) { ui = (u_int)va_arg(ap, int); if (ui < (int)ui) { ui = -ui; ret += putc('-', arg); } s = __uitoa(nbuf, ui, 10); } else { ul = (u_long)va_arg(ap, long); if (ul < (long)ul) { ul = -ul; ret += putc('-', arg); } s = __ultoa(nbuf, ul, 10); } ret += __puts(s, putc, arg); break; case 'l': lflag = 1; goto reswitch; case 'o': if (lflag == 0) { ui = (u_int)va_arg(ap, u_int); s = __uitoa(nbuf, ui, 8); } else { ul = (u_long)va_arg(ap, u_long); s = __ultoa(nbuf, ul, 8); } ret += __puts(s, putc, arg); break; case 'p': ul = (u_long)va_arg(ap, void *); s = __ultoa(nbuf, ul, 16); ret += __puts("0x", putc, arg); ret += __puts(s, putc, arg); break; case 's': s = va_arg(ap, char *); ret += __puts(s, putc, arg); break; case 'u': if (lflag == 0) { ui = va_arg(ap, u_int); s = __uitoa(nbuf, ui, 10); } else { ul = va_arg(ap, u_long); s = __ultoa(nbuf, ul, 10); } ret += __puts(s, putc, arg); break; case 'x': if (lflag == 0) { ui = va_arg(ap, u_int); s = __uitoa(nbuf, ui, 16); } else { ul = va_arg(ap, u_long); s = __ultoa(nbuf, ul, 16); } if (sflag) ret += __puts("0x", putc, arg); ret += __puts(s, putc, arg); break; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': pad = pad * 10 + c - '0'; goto reswitch; default: break; } } return (ret); } static int __sputc(char c, void *arg) { struct sp_data *sp; sp = arg; if (sp->sp_len < sp->sp_size) sp->sp_buf[sp->sp_len++] = c; sp->sp_buf[sp->sp_len] = '\0'; return (1); } static int __puts(const char *s, putc_func_t *putc, void *arg) { const char *p; int ret; ret = 0; for (p = s; *p != '\0'; p++) ret += putc(*p, arg); return (ret); } static char * __uitoa(char *buf, u_int ui, int base) { char *p; p = buf; *p = '\0'; do *--p = digits[ui % base]; while ((ui /= base) != 0); return (p); } static char * __ultoa(char *buf, u_long ul, int base) { char *p; p = buf; *p = '\0'; do *--p = digits[ul % base]; while ((ul /= base) != 0); return (p); } Index: projects/clang380-import/sys/boot/usb/tools/Makefile =================================================================== --- projects/clang380-import/sys/boot/usb/tools/Makefile (revision 294776) +++ projects/clang380-import/sys/boot/usb/tools/Makefile (revision 294777) @@ -1,10 +1,10 @@ # $FreeBSD$ PROG= sysinit -NO_MAN= +MAN= CFLAGS+= -I${.CURDIR}/../../kshim BINDIR?= /usr/bin .include Index: projects/clang380-import/sys/boot 
=================================================================== --- projects/clang380-import/sys/boot (revision 294776) +++ projects/clang380-import/sys/boot (revision 294777) Property changes on: projects/clang380-import/sys/boot ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys/boot:r294599-294776 Index: projects/clang380-import/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c =================================================================== --- projects/clang380-import/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c (revision 294776) +++ projects/clang380-import/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c (revision 294777) @@ -1,2127 +1,2130 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. * If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2011, 2015 by Delphix. All rights reserved. */ /* Copyright (c) 2013 by Saso Kiselkov. All rights reserved. */ /* Copyright (c) 2013, Joyent, Inc. All rights reserved. */ /* Copyright (c) 2014, Nexenta Systems, Inc. All rights reserved. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef _KERNEL #include #include #endif /* * Enable/disable nopwrite feature. 
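 * (When enabled, a rewrite of a block whose strong checksum shows the
 * new contents are byte-identical to the data already on disk can be
 * elided; see dmu_write_policy() for the policy check.)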
*/ int zfs_nopwrite_enabled = 1; SYSCTL_DECL(_vfs_zfs); SYSCTL_INT(_vfs_zfs, OID_AUTO, nopwrite_enabled, CTLFLAG_RDTUN, &zfs_nopwrite_enabled, 0, "Enable nopwrite feature"); const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = { { DMU_BSWAP_UINT8, TRUE, "unallocated" }, { DMU_BSWAP_ZAP, TRUE, "object directory" }, { DMU_BSWAP_UINT64, TRUE, "object array" }, { DMU_BSWAP_UINT8, TRUE, "packed nvlist" }, { DMU_BSWAP_UINT64, TRUE, "packed nvlist size" }, { DMU_BSWAP_UINT64, TRUE, "bpobj" }, { DMU_BSWAP_UINT64, TRUE, "bpobj header" }, { DMU_BSWAP_UINT64, TRUE, "SPA space map header" }, { DMU_BSWAP_UINT64, TRUE, "SPA space map" }, { DMU_BSWAP_UINT64, TRUE, "ZIL intent log" }, { DMU_BSWAP_DNODE, TRUE, "DMU dnode" }, { DMU_BSWAP_OBJSET, TRUE, "DMU objset" }, { DMU_BSWAP_UINT64, TRUE, "DSL directory" }, { DMU_BSWAP_ZAP, TRUE, "DSL directory child map"}, { DMU_BSWAP_ZAP, TRUE, "DSL dataset snap map" }, { DMU_BSWAP_ZAP, TRUE, "DSL props" }, { DMU_BSWAP_UINT64, TRUE, "DSL dataset" }, { DMU_BSWAP_ZNODE, TRUE, "ZFS znode" }, { DMU_BSWAP_OLDACL, TRUE, "ZFS V0 ACL" }, { DMU_BSWAP_UINT8, FALSE, "ZFS plain file" }, { DMU_BSWAP_ZAP, TRUE, "ZFS directory" }, { DMU_BSWAP_ZAP, TRUE, "ZFS master node" }, { DMU_BSWAP_ZAP, TRUE, "ZFS delete queue" }, { DMU_BSWAP_UINT8, FALSE, "zvol object" }, { DMU_BSWAP_ZAP, TRUE, "zvol prop" }, { DMU_BSWAP_UINT8, FALSE, "other uint8[]" }, { DMU_BSWAP_UINT64, FALSE, "other uint64[]" }, { DMU_BSWAP_ZAP, TRUE, "other ZAP" }, { DMU_BSWAP_ZAP, TRUE, "persistent error log" }, { DMU_BSWAP_UINT8, TRUE, "SPA history" }, { DMU_BSWAP_UINT64, TRUE, "SPA history offsets" }, { DMU_BSWAP_ZAP, TRUE, "Pool properties" }, { DMU_BSWAP_ZAP, TRUE, "DSL permissions" }, { DMU_BSWAP_ACL, TRUE, "ZFS ACL" }, { DMU_BSWAP_UINT8, TRUE, "ZFS SYSACL" }, { DMU_BSWAP_UINT8, TRUE, "FUID table" }, { DMU_BSWAP_UINT64, TRUE, "FUID table size" }, { DMU_BSWAP_ZAP, TRUE, "DSL dataset next clones"}, { DMU_BSWAP_ZAP, TRUE, "scan work queue" }, { DMU_BSWAP_ZAP, TRUE, "ZFS user/group used" }, { DMU_BSWAP_ZAP, TRUE, "ZFS user/group quota" }, { DMU_BSWAP_ZAP, TRUE, "snapshot refcount tags"}, { DMU_BSWAP_ZAP, TRUE, "DDT ZAP algorithm" }, { DMU_BSWAP_ZAP, TRUE, "DDT statistics" }, { DMU_BSWAP_UINT8, TRUE, "System attributes" }, { DMU_BSWAP_ZAP, TRUE, "SA master node" }, { DMU_BSWAP_ZAP, TRUE, "SA attr registration" }, { DMU_BSWAP_ZAP, TRUE, "SA attr layouts" }, { DMU_BSWAP_ZAP, TRUE, "scan translations" }, { DMU_BSWAP_UINT8, FALSE, "deduplicated block" }, { DMU_BSWAP_ZAP, TRUE, "DSL deadlist map" }, { DMU_BSWAP_UINT64, TRUE, "DSL deadlist map hdr" }, { DMU_BSWAP_ZAP, TRUE, "DSL dir clones" }, { DMU_BSWAP_UINT64, TRUE, "bpobj subobj" } }; const dmu_object_byteswap_info_t dmu_ot_byteswap[DMU_BSWAP_NUMFUNCS] = { { byteswap_uint8_array, "uint8" }, { byteswap_uint16_array, "uint16" }, { byteswap_uint32_array, "uint32" }, { byteswap_uint64_array, "uint64" }, { zap_byteswap, "zap" }, { dnode_buf_byteswap, "dnode" }, { dmu_objset_byteswap, "objset" }, { zfs_znode_byteswap, "znode" }, { zfs_oldacl_byteswap, "oldacl" }, { zfs_acl_byteswap, "acl" } }; int dmu_buf_hold_noread(objset_t *os, uint64_t object, uint64_t offset, void *tag, dmu_buf_t **dbp) { dnode_t *dn; uint64_t blkid; dmu_buf_impl_t *db; int err; err = dnode_hold(os, object, FTAG, &dn); if (err) return (err); blkid = dbuf_whichblock(dn, 0, offset); rw_enter(&dn->dn_struct_rwlock, RW_READER); db = dbuf_hold(dn, blkid, tag); rw_exit(&dn->dn_struct_rwlock); dnode_rele(dn, FTAG); if (db == NULL) { *dbp = NULL; return (SET_ERROR(EIO)); } *dbp = &db->db; return (err); } int 
dmu_buf_hold(objset_t *os, uint64_t object, uint64_t offset, void *tag, dmu_buf_t **dbp, int flags) { int err; int db_flags = DB_RF_CANFAIL; if (flags & DMU_READ_NO_PREFETCH) db_flags |= DB_RF_NOPREFETCH; err = dmu_buf_hold_noread(os, object, offset, tag, dbp); if (err == 0) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)(*dbp); err = dbuf_read(db, NULL, db_flags); if (err != 0) { dbuf_rele(db, tag); *dbp = NULL; } } return (err); } int dmu_bonus_max(void) { return (DN_MAX_BONUSLEN); } int dmu_set_bonus(dmu_buf_t *db_fake, int newsize, dmu_tx_t *tx) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake; dnode_t *dn; int error; DB_DNODE_ENTER(db); dn = DB_DNODE(db); if (dn->dn_bonus != db) { error = SET_ERROR(EINVAL); } else if (newsize < 0 || newsize > db_fake->db_size) { error = SET_ERROR(EINVAL); } else { dnode_setbonuslen(dn, newsize, tx); error = 0; } DB_DNODE_EXIT(db); return (error); } int dmu_set_bonustype(dmu_buf_t *db_fake, dmu_object_type_t type, dmu_tx_t *tx) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake; dnode_t *dn; int error; DB_DNODE_ENTER(db); dn = DB_DNODE(db); if (!DMU_OT_IS_VALID(type)) { error = SET_ERROR(EINVAL); } else if (dn->dn_bonus != db) { error = SET_ERROR(EINVAL); } else { dnode_setbonus_type(dn, type, tx); error = 0; } DB_DNODE_EXIT(db); return (error); } dmu_object_type_t dmu_get_bonustype(dmu_buf_t *db_fake) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake; dnode_t *dn; dmu_object_type_t type; DB_DNODE_ENTER(db); dn = DB_DNODE(db); type = dn->dn_bonustype; DB_DNODE_EXIT(db); return (type); } int dmu_rm_spill(objset_t *os, uint64_t object, dmu_tx_t *tx) { dnode_t *dn; int error; error = dnode_hold(os, object, FTAG, &dn); dbuf_rm_spill(dn, tx); rw_enter(&dn->dn_struct_rwlock, RW_WRITER); dnode_rm_spill(dn, tx); rw_exit(&dn->dn_struct_rwlock); dnode_rele(dn, FTAG); return (error); } /* * returns ENOENT, EIO, or 0. */ int dmu_bonus_hold(objset_t *os, uint64_t object, void *tag, dmu_buf_t **dbp) { dnode_t *dn; dmu_buf_impl_t *db; int error; error = dnode_hold(os, object, FTAG, &dn); if (error) return (error); rw_enter(&dn->dn_struct_rwlock, RW_READER); if (dn->dn_bonus == NULL) { rw_exit(&dn->dn_struct_rwlock); rw_enter(&dn->dn_struct_rwlock, RW_WRITER); if (dn->dn_bonus == NULL) dbuf_create_bonus(dn); } db = dn->dn_bonus; /* as long as the bonus buf is held, the dnode will be held */ if (refcount_add(&db->db_holds, tag) == 1) { VERIFY(dnode_add_ref(dn, db)); atomic_inc_32(&dn->dn_dbufs_count); } /* * Wait to drop dn_struct_rwlock until after adding the bonus dbuf's * hold and incrementing the dbuf count to ensure that dnode_move() sees * a dnode hold for every dbuf. */ rw_exit(&dn->dn_struct_rwlock); dnode_rele(dn, FTAG); VERIFY(0 == dbuf_read(db, NULL, DB_RF_MUST_SUCCEED | DB_RF_NOPREFETCH)); *dbp = &db->db; return (0); } /* * returns ENOENT, EIO, or 0. * * This interface will allocate a blank spill dbuf when a spill blk * doesn't already exist on the dnode. * * if you only want to find an already existing spill db, then * dmu_spill_hold_existing() should be used. 
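 * (Both holders funnel into dmu_spill_hold_by_dnode() below; a caller
 * that already holds dn_struct_rwlock passes DB_RF_HAVESTRUCT so the
 * lock is not taken a second time.)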
*/ int dmu_spill_hold_by_dnode(dnode_t *dn, uint32_t flags, void *tag, dmu_buf_t **dbp) { dmu_buf_impl_t *db = NULL; int err; if ((flags & DB_RF_HAVESTRUCT) == 0) rw_enter(&dn->dn_struct_rwlock, RW_READER); db = dbuf_hold(dn, DMU_SPILL_BLKID, tag); if ((flags & DB_RF_HAVESTRUCT) == 0) rw_exit(&dn->dn_struct_rwlock); ASSERT(db != NULL); err = dbuf_read(db, NULL, flags); if (err == 0) *dbp = &db->db; else dbuf_rele(db, tag); return (err); }
int dmu_spill_hold_existing(dmu_buf_t *bonus, void *tag, dmu_buf_t **dbp) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)bonus; dnode_t *dn; int err; DB_DNODE_ENTER(db); dn = DB_DNODE(db); if (spa_version(dn->dn_objset->os_spa) < SPA_VERSION_SA) { err = SET_ERROR(EINVAL); } else { rw_enter(&dn->dn_struct_rwlock, RW_READER); if (!dn->dn_have_spill) { err = SET_ERROR(ENOENT); } else { err = dmu_spill_hold_by_dnode(dn, DB_RF_HAVESTRUCT | DB_RF_CANFAIL, tag, dbp); } rw_exit(&dn->dn_struct_rwlock); } DB_DNODE_EXIT(db); return (err); }
int dmu_spill_hold_by_bonus(dmu_buf_t *bonus, void *tag, dmu_buf_t **dbp) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)bonus; dnode_t *dn; int err; DB_DNODE_ENTER(db); dn = DB_DNODE(db); err = dmu_spill_hold_by_dnode(dn, DB_RF_CANFAIL, tag, dbp); DB_DNODE_EXIT(db); return (err); }
/* * Note: longer-term, we should modify all of the dmu_buf_*() interfaces * to take a held dnode rather than <os, object> -- the lookup is wasteful, * and can induce severe lock contention when writing to several files * whose dnodes are in the same block. */ static int dmu_buf_hold_array_by_dnode(dnode_t *dn, uint64_t offset, uint64_t length, boolean_t read, void *tag, int *numbufsp, dmu_buf_t ***dbpp, uint32_t flags) { dmu_buf_t **dbp; uint64_t blkid, nblks, i; uint32_t dbuf_flags; int err; zio_t *zio; ASSERT(length <= DMU_MAX_ACCESS); /* * Note: We directly notify the prefetch code of this read, so that * we can tell it about the multi-block read. dbuf_read() only knows * about the one block it is accessing. */
dbuf_flags = DB_RF_CANFAIL | DB_RF_NEVERWAIT | DB_RF_HAVESTRUCT | DB_RF_NOPREFETCH; rw_enter(&dn->dn_struct_rwlock, RW_READER); if (dn->dn_datablkshift) { int blkshift = dn->dn_datablkshift; nblks = (P2ROUNDUP(offset + length, 1ULL << blkshift) - P2ALIGN(offset, 1ULL << blkshift)) >> blkshift; } else { if (offset + length > dn->dn_datablksz) { zfs_panic_recover("zfs: accessing past end of object " "%llx/%llx (size=%u access=%llu+%llu)", (longlong_t)dn->dn_objset-> os_dsl_dataset->ds_object, (longlong_t)dn->dn_object, dn->dn_datablksz, (longlong_t)offset, (longlong_t)length); rw_exit(&dn->dn_struct_rwlock); return (SET_ERROR(EIO)); } nblks = 1; } dbp = kmem_zalloc(sizeof (dmu_buf_t *) * nblks, KM_SLEEP); zio = zio_root(dn->dn_objset->os_spa, NULL, NULL, ZIO_FLAG_CANFAIL); blkid = dbuf_whichblock(dn, 0, offset); for (i = 0; i < nblks; i++) { dmu_buf_impl_t *db = dbuf_hold(dn, blkid + i, tag); if (db == NULL) { rw_exit(&dn->dn_struct_rwlock); dmu_buf_rele_array(dbp, nblks, tag); zio_nowait(zio); return (SET_ERROR(EIO)); } /* initiate async i/o */ if (read) (void) dbuf_read(db, zio, dbuf_flags); #ifdef _KERNEL else curthread->td_ru.ru_oublock++; #endif dbp[i] = &db->db; } if ((flags & DMU_READ_NO_PREFETCH) == 0 && read && length <= zfetch_array_rd_sz) { dmu_zfetch(&dn->dn_zfetch, blkid, nblks); } rw_exit(&dn->dn_struct_rwlock); /* wait for async i/o */ err = zio_wait(zio); if (err) { dmu_buf_rele_array(dbp, nblks, tag); return (err); } /* wait for other io to complete */ if (read) { for (i = 0; i < nblks; i++) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)dbp[i]; mutex_enter(&db->db_mtx); while (db->db_state == DB_READ || db->db_state == DB_FILL) cv_wait(&db->db_changed, &db->db_mtx); if (db->db_state == DB_UNCACHED) err = SET_ERROR(EIO); mutex_exit(&db->db_mtx); if (err) { dmu_buf_rele_array(dbp, nblks, tag); return (err); } } } *numbufsp = nblks; *dbpp = dbp; return (0); }
static int dmu_buf_hold_array(objset_t *os, uint64_t object, uint64_t offset, uint64_t length, int read, void *tag, int *numbufsp, dmu_buf_t ***dbpp) { dnode_t *dn; int err; err = dnode_hold(os, object, FTAG, &dn); if (err) return (err); err = dmu_buf_hold_array_by_dnode(dn, offset, length, read, tag, numbufsp, dbpp, DMU_READ_PREFETCH); dnode_rele(dn, FTAG); return (err); }
int dmu_buf_hold_array_by_bonus(dmu_buf_t *db_fake, uint64_t offset, uint64_t length, boolean_t read, void *tag, int *numbufsp, dmu_buf_t ***dbpp) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake; dnode_t *dn; int err; DB_DNODE_ENTER(db); dn = DB_DNODE(db); err = dmu_buf_hold_array_by_dnode(dn, offset, length, read, tag, numbufsp, dbpp, DMU_READ_PREFETCH); DB_DNODE_EXIT(db); return (err); }
void dmu_buf_rele_array(dmu_buf_t **dbp_fake, int numbufs, void *tag) { int i; dmu_buf_impl_t **dbp = (dmu_buf_impl_t **)dbp_fake; if (numbufs == 0) return; for (i = 0; i < numbufs; i++) { if (dbp[i]) dbuf_rele(dbp[i], tag); } kmem_free(dbp, sizeof (dmu_buf_t *) * numbufs); }
/* * Issue prefetch i/os for the given blocks. If level is greater than 0, the * indirect blocks prefetched will be those that point to the blocks containing * the data starting at offset, and continuing to offset + len. * * Note that if the indirect blocks above the blocks being prefetched are not in * cache, they will be asynchronously read in.
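 * (Worked level-0 example: with 8 KB data blocks, offset 4096 and
 * len 12288 span bytes 4096-16383, i.e. blocks 0 and 1, so two
 * prefetch i/os are issued.)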
*/ void dmu_prefetch(objset_t *os, uint64_t object, int64_t level, uint64_t offset, uint64_t len, zio_priority_t pri) { dnode_t *dn; uint64_t blkid; int nblks, err; if (len == 0) { /* they're interested in the bonus buffer */ dn = DMU_META_DNODE(os); if (object == 0 || object >= DN_MAX_OBJECT) return; rw_enter(&dn->dn_struct_rwlock, RW_READER); blkid = dbuf_whichblock(dn, level, object * sizeof (dnode_phys_t)); dbuf_prefetch(dn, level, blkid, pri, 0); rw_exit(&dn->dn_struct_rwlock); return; } /* * XXX - Note, if the dnode for the requested object is not * already cached, we will do a *synchronous* read in the * dnode_hold() call. The same is true for any indirects. */ err = dnode_hold(os, object, FTAG, &dn); if (err != 0) return; rw_enter(&dn->dn_struct_rwlock, RW_READER); /* * offset + len - 1 is the last byte we want to prefetch for, and offset * is the first. Then dbuf_whichblock(dn, level, off + len - 1) is the * last block we want to prefetch, and dbuf_whichblock(dn, level, * offset) is the first. Then the number we need to prefetch is the * last - first + 1. */ if (level > 0 || dn->dn_datablkshift != 0) { nblks = dbuf_whichblock(dn, level, offset + len - 1) - dbuf_whichblock(dn, level, offset) + 1; } else { nblks = (offset < dn->dn_datablksz); } if (nblks != 0) { blkid = dbuf_whichblock(dn, level, offset); for (int i = 0; i < nblks; i++) dbuf_prefetch(dn, level, blkid + i, pri, 0); } rw_exit(&dn->dn_struct_rwlock); dnode_rele(dn, FTAG); } /* * Get the next "chunk" of file data to free. We traverse the file from * the end so that the file gets shorter over time (if we crash in the * middle, this will leave us in a better state). We find allocated file * data by simply searching the allocated level 1 indirects. * * On input, *start should be the first offset that does not need to be * freed (e.g. "offset + length"). On return, *start will be the first * offset that should be freed. */ static int get_next_chunk(dnode_t *dn, uint64_t *start, uint64_t minimum) { uint64_t maxblks = DMU_MAX_ACCESS >> (dn->dn_indblkshift + 1); /* bytes of data covered by a level-1 indirect block */ uint64_t iblkrange = dn->dn_datablksz * EPB(dn->dn_indblkshift, SPA_BLKPTRSHIFT); ASSERT3U(minimum, <=, *start); if (*start - minimum <= iblkrange * maxblks) { *start = minimum; return (0); } ASSERT(ISP2(iblkrange)); for (uint64_t blks = 0; *start > minimum && blks < maxblks; blks++) { int err; /* * dnode_next_offset(BACKWARDS) will find an allocated L1 * indirect block at or before the input offset. We must * decrement *start so that it is at the end of the region * to search.
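 * (On entry to each iteration *start is one past the region left to
 * search, so after the decrement it points at the last byte that the
 * backwards search may still return.)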
*/ (*start)--; err = dnode_next_offset(dn, DNODE_FIND_BACKWARDS, start, 2, 1, 0); /* if there are no indirect blocks before start, we are done */ if (err == ESRCH) { *start = minimum; break; } else if (err != 0) { return (err); } /* set start to the beginning of this L1 indirect */ *start = P2ALIGN(*start, iblkrange); } if (*start < minimum) *start = minimum; return (0); } static int dmu_free_long_range_impl(objset_t *os, dnode_t *dn, uint64_t offset, uint64_t length) { uint64_t object_size = (dn->dn_maxblkid + 1) * dn->dn_datablksz; int err; if (offset >= object_size) return (0); if (length == DMU_OBJECT_END || offset + length > object_size) length = object_size - offset; while (length != 0) { uint64_t chunk_end, chunk_begin; chunk_end = chunk_begin = offset + length; /* move chunk_begin backwards to the beginning of this chunk */ err = get_next_chunk(dn, &chunk_begin, offset); if (err) return (err); ASSERT3U(chunk_begin, >=, offset); ASSERT3U(chunk_begin, <=, chunk_end); dmu_tx_t *tx = dmu_tx_create(os); dmu_tx_hold_free(tx, dn->dn_object, chunk_begin, chunk_end - chunk_begin); /* * Mark this transaction as typically resulting in a net * reduction in space used. */ dmu_tx_mark_netfree(tx); err = dmu_tx_assign(tx, TXG_WAIT); if (err) { dmu_tx_abort(tx); return (err); } dnode_free_range(dn, chunk_begin, chunk_end - chunk_begin, tx); dmu_tx_commit(tx); length -= chunk_end - chunk_begin; } return (0); } int dmu_free_long_range(objset_t *os, uint64_t object, uint64_t offset, uint64_t length) { dnode_t *dn; int err; err = dnode_hold(os, object, FTAG, &dn); if (err != 0) return (err); err = dmu_free_long_range_impl(os, dn, offset, length); /* * It is important to zero out the maxblkid when freeing the entire * file, so that (a) subsequent calls to dmu_free_long_range_impl() * will take the fast path, and (b) dnode_reallocate() can verify * that the entire file has been freed. */ if (err == 0 && offset == 0 && length == DMU_OBJECT_END) dn->dn_maxblkid = 0; dnode_rele(dn, FTAG); return (err); } int dmu_free_long_object(objset_t *os, uint64_t object) { dmu_tx_t *tx; int err; err = dmu_free_long_range(os, object, 0, DMU_OBJECT_END); if (err != 0) return (err); tx = dmu_tx_create(os); dmu_tx_hold_bonus(tx, object); dmu_tx_hold_free(tx, object, 0, DMU_OBJECT_END); dmu_tx_mark_netfree(tx); err = dmu_tx_assign(tx, TXG_WAIT); if (err == 0) { err = dmu_object_free(os, object, tx); dmu_tx_commit(tx); } else { dmu_tx_abort(tx); } return (err); } int dmu_free_range(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, dmu_tx_t *tx) { dnode_t *dn; int err = dnode_hold(os, object, FTAG, &dn); if (err) return (err); ASSERT(offset < UINT64_MAX); ASSERT(size == -1ULL || size <= UINT64_MAX - offset); dnode_free_range(dn, offset, size, tx); dnode_rele(dn, FTAG); return (0); } int dmu_read(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, void *buf, uint32_t flags) { dnode_t *dn; dmu_buf_t **dbp; int numbufs, err; err = dnode_hold(os, object, FTAG, &dn); if (err) return (err); /* * Deal with odd block sizes, where there can't be data past the first * block. If we ever do the tail block optimization, we will need to * handle that here as well. */ if (dn->dn_maxblkid == 0) { int newsz = offset > dn->dn_datablksz ? 0 : MIN(size, dn->dn_datablksz - offset); bzero((char *)buf + newsz, size - newsz); size = newsz; } while (size > 0) { uint64_t mylen = MIN(size, DMU_MAX_ACCESS / 2); int i; /* * NB: we could do this block-at-a-time, but it's nice * to be reading in parallel. 
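 * (dmu_buf_hold_array_by_dnode() initiates the reads for every block
 * in the chunk, up to DMU_MAX_ACCESS / 2 bytes at a time, before
 * waiting on any of them, so the per-block latencies overlap.)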
*/ err = dmu_buf_hold_array_by_dnode(dn, offset, mylen, TRUE, FTAG, &numbufs, &dbp, flags); if (err) break; for (i = 0; i < numbufs; i++) { int tocpy; int bufoff; dmu_buf_t *db = dbp[i]; ASSERT(size > 0); bufoff = offset - db->db_offset; tocpy = (int)MIN(db->db_size - bufoff, size); bcopy((char *)db->db_data + bufoff, buf, tocpy); offset += tocpy; size -= tocpy; buf = (char *)buf + tocpy; } dmu_buf_rele_array(dbp, numbufs, FTAG); } dnode_rele(dn, FTAG); return (err); } void dmu_write(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, const void *buf, dmu_tx_t *tx) { dmu_buf_t **dbp; int numbufs, i; if (size == 0) return; VERIFY(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)); for (i = 0; i < numbufs; i++) { int tocpy; int bufoff; dmu_buf_t *db = dbp[i]; ASSERT(size > 0); bufoff = offset - db->db_offset; tocpy = (int)MIN(db->db_size - bufoff, size); ASSERT(i == 0 || i == numbufs-1 || tocpy == db->db_size); if (tocpy == db->db_size) dmu_buf_will_fill(db, tx); else dmu_buf_will_dirty(db, tx); bcopy(buf, (char *)db->db_data + bufoff, tocpy); if (tocpy == db->db_size) dmu_buf_fill_done(db, tx); offset += tocpy; size -= tocpy; buf = (char *)buf + tocpy; } dmu_buf_rele_array(dbp, numbufs, FTAG); } void dmu_prealloc(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, dmu_tx_t *tx) { dmu_buf_t **dbp; int numbufs, i; if (size == 0) return; VERIFY(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)); for (i = 0; i < numbufs; i++) { dmu_buf_t *db = dbp[i]; dmu_buf_will_not_fill(db, tx); } dmu_buf_rele_array(dbp, numbufs, FTAG); } void dmu_write_embedded(objset_t *os, uint64_t object, uint64_t offset, void *data, uint8_t etype, uint8_t comp, int uncompressed_size, int compressed_size, int byteorder, dmu_tx_t *tx) { dmu_buf_t *db; ASSERT3U(etype, <, NUM_BP_EMBEDDED_TYPES); ASSERT3U(comp, <, ZIO_COMPRESS_FUNCTIONS); VERIFY0(dmu_buf_hold_noread(os, object, offset, FTAG, &db)); dmu_buf_write_embedded(db, data, (bp_embedded_type_t)etype, (enum zio_compress)comp, uncompressed_size, compressed_size, byteorder, tx); dmu_buf_rele(db, FTAG); } /* * DMU support for xuio */ kstat_t *xuio_ksp = NULL; int dmu_xuio_init(xuio_t *xuio, int nblk) { dmu_xuio_t *priv; uio_t *uio = &xuio->xu_uio; uio->uio_iovcnt = nblk; uio->uio_iov = kmem_zalloc(nblk * sizeof (iovec_t), KM_SLEEP); priv = kmem_zalloc(sizeof (dmu_xuio_t), KM_SLEEP); priv->cnt = nblk; priv->bufs = kmem_zalloc(nblk * sizeof (arc_buf_t *), KM_SLEEP); priv->iovp = uio->uio_iov; XUIO_XUZC_PRIV(xuio) = priv; if (XUIO_XUZC_RW(xuio) == UIO_READ) XUIOSTAT_INCR(xuiostat_onloan_rbuf, nblk); else XUIOSTAT_INCR(xuiostat_onloan_wbuf, nblk); return (0); } void dmu_xuio_fini(xuio_t *xuio) { dmu_xuio_t *priv = XUIO_XUZC_PRIV(xuio); int nblk = priv->cnt; kmem_free(priv->iovp, nblk * sizeof (iovec_t)); kmem_free(priv->bufs, nblk * sizeof (arc_buf_t *)); kmem_free(priv, sizeof (dmu_xuio_t)); if (XUIO_XUZC_RW(xuio) == UIO_READ) XUIOSTAT_INCR(xuiostat_onloan_rbuf, -nblk); else XUIOSTAT_INCR(xuiostat_onloan_wbuf, -nblk); } /* * Initialize iov[priv->next] and priv->bufs[priv->next] with { off, n, abuf } * and increase priv->next by 1. 
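 * (The xuio must have been sized with dmu_xuio_init() first; the
 * ASSERTs below check that priv->next is within priv->cnt and that
 * { off, n } lies within abuf.)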
*/ int dmu_xuio_add(xuio_t *xuio, arc_buf_t *abuf, offset_t off, size_t n) { struct iovec *iov; uio_t *uio = &xuio->xu_uio; dmu_xuio_t *priv = XUIO_XUZC_PRIV(xuio); int i = priv->next++; ASSERT(i < priv->cnt); ASSERT(off + n <= arc_buf_size(abuf)); iov = uio->uio_iov + i; iov->iov_base = (char *)abuf->b_data + off; iov->iov_len = n; priv->bufs[i] = abuf; return (0); } int dmu_xuio_cnt(xuio_t *xuio) { dmu_xuio_t *priv = XUIO_XUZC_PRIV(xuio); return (priv->cnt); } arc_buf_t * dmu_xuio_arcbuf(xuio_t *xuio, int i) { dmu_xuio_t *priv = XUIO_XUZC_PRIV(xuio); ASSERT(i < priv->cnt); return (priv->bufs[i]); } void dmu_xuio_clear(xuio_t *xuio, int i) { dmu_xuio_t *priv = XUIO_XUZC_PRIV(xuio); ASSERT(i < priv->cnt); priv->bufs[i] = NULL; } static void xuio_stat_init(void) { xuio_ksp = kstat_create("zfs", 0, "xuio_stats", "misc", KSTAT_TYPE_NAMED, sizeof (xuio_stats) / sizeof (kstat_named_t), KSTAT_FLAG_VIRTUAL); if (xuio_ksp != NULL) { xuio_ksp->ks_data = &xuio_stats; kstat_install(xuio_ksp); } } static void xuio_stat_fini(void) { if (xuio_ksp != NULL) { kstat_delete(xuio_ksp); xuio_ksp = NULL; } } void xuio_stat_wbuf_copied() { XUIOSTAT_BUMP(xuiostat_wbuf_copied); } void xuio_stat_wbuf_nocopy() { XUIOSTAT_BUMP(xuiostat_wbuf_nocopy); } #ifdef _KERNEL static int dmu_read_uio_dnode(dnode_t *dn, uio_t *uio, uint64_t size) { dmu_buf_t **dbp; int numbufs, i, err; xuio_t *xuio = NULL; /* * NB: we could do this block-at-a-time, but it's nice * to be reading in parallel. */ err = dmu_buf_hold_array_by_dnode(dn, uio->uio_loffset, size, TRUE, FTAG, &numbufs, &dbp, 0); if (err) return (err); #ifdef UIO_XUIO if (uio->uio_extflg == UIO_XUIO) xuio = (xuio_t *)uio; #endif for (i = 0; i < numbufs; i++) { int tocpy; int bufoff; dmu_buf_t *db = dbp[i]; ASSERT(size > 0); bufoff = uio->uio_loffset - db->db_offset; tocpy = (int)MIN(db->db_size - bufoff, size); if (xuio) { dmu_buf_impl_t *dbi = (dmu_buf_impl_t *)db; arc_buf_t *dbuf_abuf = dbi->db_buf; arc_buf_t *abuf = dbuf_loan_arcbuf(dbi); err = dmu_xuio_add(xuio, abuf, bufoff, tocpy); if (!err) { uio->uio_resid -= tocpy; uio->uio_loffset += tocpy; } if (abuf == dbuf_abuf) XUIOSTAT_BUMP(xuiostat_rbuf_nocopy); else XUIOSTAT_BUMP(xuiostat_rbuf_copied); } else { err = uiomove((char *)db->db_data + bufoff, tocpy, UIO_READ, uio); } if (err) break; size -= tocpy; } dmu_buf_rele_array(dbp, numbufs, FTAG); return (err); } /* * Read 'size' bytes into the uio buffer. * From object zdb->db_object. * Starting at offset uio->uio_loffset. * * If the caller already has a dbuf in the target object * (e.g. its bonus buffer), this routine is faster than dmu_read_uio(), * because we don't have to find the dnode_t for the object. */ int dmu_read_uio_dbuf(dmu_buf_t *zdb, uio_t *uio, uint64_t size) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)zdb; dnode_t *dn; int err; if (size == 0) return (0); DB_DNODE_ENTER(db); dn = DB_DNODE(db); err = dmu_read_uio_dnode(dn, uio, size); DB_DNODE_EXIT(db); return (err); } /* * Read 'size' bytes into the uio buffer. * From the specified object * Starting at offset uio->uio_loffset. 
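 * (Unlike dmu_read_uio_dbuf() above, this variant must look the
 * dnode up by object number via dnode_hold().)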
*/ int dmu_read_uio(objset_t *os, uint64_t object, uio_t *uio, uint64_t size) { dnode_t *dn; int err; if (size == 0) return (0); err = dnode_hold(os, object, FTAG, &dn); if (err) return (err); err = dmu_read_uio_dnode(dn, uio, size); dnode_rele(dn, FTAG); return (err); } static int dmu_write_uio_dnode(dnode_t *dn, uio_t *uio, uint64_t size, dmu_tx_t *tx) { dmu_buf_t **dbp; int numbufs; int err = 0; int i; err = dmu_buf_hold_array_by_dnode(dn, uio->uio_loffset, size, FALSE, FTAG, &numbufs, &dbp, DMU_READ_PREFETCH); if (err) return (err); for (i = 0; i < numbufs; i++) { int tocpy; int bufoff; dmu_buf_t *db = dbp[i]; ASSERT(size > 0); bufoff = uio->uio_loffset - db->db_offset; tocpy = (int)MIN(db->db_size - bufoff, size); ASSERT(i == 0 || i == numbufs-1 || tocpy == db->db_size); if (tocpy == db->db_size) dmu_buf_will_fill(db, tx); else dmu_buf_will_dirty(db, tx); /* * XXX uiomove could block forever (e.g. nfs-backed * pages). There needs to be a uiolockdown() function * to lock the pages in memory, so that uiomove won't * block. */ err = uiomove((char *)db->db_data + bufoff, tocpy, UIO_WRITE, uio); if (tocpy == db->db_size) dmu_buf_fill_done(db, tx); if (err) break; size -= tocpy; } dmu_buf_rele_array(dbp, numbufs, FTAG); return (err); } /* * Write 'size' bytes from the uio buffer. * To object zdb->db_object. * Starting at offset uio->uio_loffset. * * If the caller already has a dbuf in the target object * (e.g. its bonus buffer), this routine is faster than dmu_write_uio(), * because we don't have to find the dnode_t for the object. */ int dmu_write_uio_dbuf(dmu_buf_t *zdb, uio_t *uio, uint64_t size, dmu_tx_t *tx) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)zdb; dnode_t *dn; int err; if (size == 0) return (0); DB_DNODE_ENTER(db); dn = DB_DNODE(db); err = dmu_write_uio_dnode(dn, uio, size, tx); DB_DNODE_EXIT(db); return (err); } /* * Write 'size' bytes from the uio buffer. * To the specified object. * Starting at offset uio->uio_loffset.
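 * (As on the read side, this variant pays a dnode_hold(); prefer
 * dmu_write_uio_dbuf() above when a dbuf in the target object is
 * already held.)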
*/ int dmu_write_uio(objset_t *os, uint64_t object, uio_t *uio, uint64_t size, dmu_tx_t *tx) { dnode_t *dn; int err; if (size == 0) return (0); err = dnode_hold(os, object, FTAG, &dn); if (err) return (err); err = dmu_write_uio_dnode(dn, uio, size, tx); dnode_rele(dn, FTAG); return (err); } #ifdef illumos int dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, page_t *pp, dmu_tx_t *tx) { dmu_buf_t **dbp; int numbufs, i; int err; if (size == 0) return (0); err = dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp); if (err) return (err); for (i = 0; i < numbufs; i++) { int tocpy, copied, thiscpy; int bufoff; dmu_buf_t *db = dbp[i]; caddr_t va; ASSERT(size > 0); ASSERT3U(db->db_size, >=, PAGESIZE); bufoff = offset - db->db_offset; tocpy = (int)MIN(db->db_size - bufoff, size); ASSERT(i == 0 || i == numbufs-1 || tocpy == db->db_size); if (tocpy == db->db_size) dmu_buf_will_fill(db, tx); else dmu_buf_will_dirty(db, tx); for (copied = 0; copied < tocpy; copied += PAGESIZE) { ASSERT3U(pp->p_offset, ==, db->db_offset + bufoff); thiscpy = MIN(PAGESIZE, tocpy - copied); va = zfs_map_page(pp, S_READ); bcopy(va, (char *)db->db_data + bufoff, thiscpy); zfs_unmap_page(pp, va); pp = pp->p_next; bufoff += PAGESIZE; } if (tocpy == db->db_size) dmu_buf_fill_done(db, tx); offset += tocpy; size -= tocpy; } dmu_buf_rele_array(dbp, numbufs, FTAG); return (err); } #else /* !illumos */ int dmu_write_pages(objset_t *os, uint64_t object, uint64_t offset, uint64_t size, vm_page_t *ma, dmu_tx_t *tx) { dmu_buf_t **dbp; struct sf_buf *sf; int numbufs, i; int err; if (size == 0) return (0); err = dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp); if (err) return (err); for (i = 0; i < numbufs; i++) { int tocpy, copied, thiscpy; int bufoff; dmu_buf_t *db = dbp[i]; caddr_t va; ASSERT(size > 0); ASSERT3U(db->db_size, >=, PAGESIZE); bufoff = offset - db->db_offset; tocpy = (int)MIN(db->db_size - bufoff, size); ASSERT(i == 0 || i == numbufs-1 || tocpy == db->db_size); if (tocpy == db->db_size) dmu_buf_will_fill(db, tx); else dmu_buf_will_dirty(db, tx); for (copied = 0; copied < tocpy; copied += PAGESIZE) { ASSERT3U(ptoa((*ma)->pindex), ==, db->db_offset + bufoff); thiscpy = MIN(PAGESIZE, tocpy - copied); va = zfs_map_page(*ma, &sf); bcopy(va, (char *)db->db_data + bufoff, thiscpy); zfs_unmap_page(sf); ma += 1; bufoff += PAGESIZE; } if (tocpy == db->db_size) dmu_buf_fill_done(db, tx); offset += tocpy; size -= tocpy; } dmu_buf_rele_array(dbp, numbufs, FTAG); return (err); } #endif /* illumos */ #endif /* _KERNEL */ /* * Allocate a loaned anonymous arc buffer. */ arc_buf_t * dmu_request_arcbuf(dmu_buf_t *handle, int size) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)handle; return (arc_loan_buf(db->db_objset->os_spa, size)); } /* * Free a loaned arc buffer. */ void dmu_return_arcbuf(arc_buf_t *buf) { arc_return_buf(buf, FTAG); VERIFY(arc_buf_remove_ref(buf, FTAG)); } /* * When possible directly assign passed loaned arc buffer to a dbuf. * If this is not possible copy the contents of passed arc buf via * dmu_write(). 
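 * (When the buffer can be assigned directly, the write is charged to
 * the calling thread's ru_oublock below, just as buffered writes are
 * charged in dmu_buf_hold_array_by_dnode().)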
*/ void dmu_assign_arcbuf(dmu_buf_t *handle, uint64_t offset, arc_buf_t *buf, dmu_tx_t *tx) { dmu_buf_impl_t *dbuf = (dmu_buf_impl_t *)handle; dnode_t *dn; dmu_buf_impl_t *db; uint32_t blksz = (uint32_t)arc_buf_size(buf); uint64_t blkid; DB_DNODE_ENTER(dbuf); dn = DB_DNODE(dbuf); rw_enter(&dn->dn_struct_rwlock, RW_READER); blkid = dbuf_whichblock(dn, 0, offset); VERIFY((db = dbuf_hold(dn, blkid, FTAG)) != NULL); rw_exit(&dn->dn_struct_rwlock); DB_DNODE_EXIT(dbuf); /* * We can only assign if the offset is aligned, the arc buf is the * same size as the dbuf, and the dbuf is not metadata. It * can't be metadata because the loaned arc buf comes from the * user-data kmem arena. */ if (offset == db->db.db_offset && blksz == db->db.db_size && DBUF_GET_BUFC_TYPE(db) == ARC_BUFC_DATA) { +#ifdef _KERNEL + curthread->td_ru.ru_oublock++; +#endif dbuf_assign_arcbuf(db, buf, tx); dbuf_rele(db, FTAG); } else { objset_t *os; uint64_t object; DB_DNODE_ENTER(dbuf); dn = DB_DNODE(dbuf); os = dn->dn_objset; object = dn->dn_object; DB_DNODE_EXIT(dbuf); dbuf_rele(db, FTAG); dmu_write(os, object, offset, blksz, buf->b_data, tx); dmu_return_arcbuf(buf); XUIOSTAT_BUMP(xuiostat_wbuf_copied); } } typedef struct { dbuf_dirty_record_t *dsa_dr; dmu_sync_cb_t *dsa_done; zgd_t *dsa_zgd; dmu_tx_t *dsa_tx; } dmu_sync_arg_t; /* ARGSUSED */ static void dmu_sync_ready(zio_t *zio, arc_buf_t *buf, void *varg) { dmu_sync_arg_t *dsa = varg; dmu_buf_t *db = dsa->dsa_zgd->zgd_db; blkptr_t *bp = zio->io_bp; if (zio->io_error == 0) { if (BP_IS_HOLE(bp)) { /* * A block of zeros may compress to a hole, but the * block size still needs to be known for replay. */ BP_SET_LSIZE(bp, db->db_size); } else if (!BP_IS_EMBEDDED(bp)) { ASSERT(BP_GET_LEVEL(bp) == 0); bp->blk_fill = 1; } } } static void dmu_sync_late_arrival_ready(zio_t *zio) { dmu_sync_ready(zio, NULL, zio->io_private); } /* ARGSUSED */ static void dmu_sync_done(zio_t *zio, arc_buf_t *buf, void *varg) { dmu_sync_arg_t *dsa = varg; dbuf_dirty_record_t *dr = dsa->dsa_dr; dmu_buf_impl_t *db = dr->dr_dbuf; mutex_enter(&db->db_mtx); ASSERT(dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC); if (zio->io_error == 0) { dr->dt.dl.dr_nopwrite = !!(zio->io_flags & ZIO_FLAG_NOPWRITE); if (dr->dt.dl.dr_nopwrite) { blkptr_t *bp = zio->io_bp; blkptr_t *bp_orig = &zio->io_bp_orig; uint8_t chksum = BP_GET_CHECKSUM(bp_orig); ASSERT(BP_EQUAL(bp, bp_orig)); ASSERT(zio->io_prop.zp_compress != ZIO_COMPRESS_OFF); ASSERT(zio_checksum_table[chksum].ci_flags & ZCHECKSUM_FLAG_NOPWRITE); } dr->dt.dl.dr_overridden_by = *zio->io_bp; dr->dt.dl.dr_override_state = DR_OVERRIDDEN; dr->dt.dl.dr_copies = zio->io_prop.zp_copies; /* * Old style holes are filled with all zeros, whereas * new-style holes maintain their lsize, type, level, * and birth time (see zio_write_compress). While we * need to reset the BP_SET_LSIZE() call that happened * in dmu_sync_ready for old style holes, we do *not* * want to wipe out the information contained in new * style holes. Thus, only zero out the block pointer if * it's an old style hole. 
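 * (Old style holes are identified below by blk_birth == 0; new style
 * holes carry a nonzero birth txg and are left intact.)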
*/ if (BP_IS_HOLE(&dr->dt.dl.dr_overridden_by) && dr->dt.dl.dr_overridden_by.blk_birth == 0) BP_ZERO(&dr->dt.dl.dr_overridden_by); } else { dr->dt.dl.dr_override_state = DR_NOT_OVERRIDDEN; } cv_broadcast(&db->db_changed); mutex_exit(&db->db_mtx); dsa->dsa_done(dsa->dsa_zgd, zio->io_error); kmem_free(dsa, sizeof (*dsa)); } static void dmu_sync_late_arrival_done(zio_t *zio) { blkptr_t *bp = zio->io_bp; dmu_sync_arg_t *dsa = zio->io_private; blkptr_t *bp_orig = &zio->io_bp_orig; if (zio->io_error == 0 && !BP_IS_HOLE(bp)) { /* * If we didn't allocate a new block (i.e. ZIO_FLAG_NOPWRITE) * then there is nothing to do here. Otherwise, free the * newly allocated block in this txg. */ if (zio->io_flags & ZIO_FLAG_NOPWRITE) { ASSERT(BP_EQUAL(bp, bp_orig)); } else { ASSERT(BP_IS_HOLE(bp_orig) || !BP_EQUAL(bp, bp_orig)); ASSERT(zio->io_bp->blk_birth == zio->io_txg); ASSERT(zio->io_txg > spa_syncing_txg(zio->io_spa)); zio_free(zio->io_spa, zio->io_txg, zio->io_bp); } } dmu_tx_commit(dsa->dsa_tx); dsa->dsa_done(dsa->dsa_zgd, zio->io_error); kmem_free(dsa, sizeof (*dsa)); } static int dmu_sync_late_arrival(zio_t *pio, objset_t *os, dmu_sync_cb_t *done, zgd_t *zgd, zio_prop_t *zp, zbookmark_phys_t *zb) { dmu_sync_arg_t *dsa; dmu_tx_t *tx; tx = dmu_tx_create(os); dmu_tx_hold_space(tx, zgd->zgd_db->db_size); if (dmu_tx_assign(tx, TXG_WAIT) != 0) { dmu_tx_abort(tx); /* Make zl_get_data do txg_wait_synced() */ return (SET_ERROR(EIO)); } dsa = kmem_alloc(sizeof (dmu_sync_arg_t), KM_SLEEP); dsa->dsa_dr = NULL; dsa->dsa_done = done; dsa->dsa_zgd = zgd; dsa->dsa_tx = tx; zio_nowait(zio_write(pio, os->os_spa, dmu_tx_get_txg(tx), zgd->zgd_bp, zgd->zgd_db->db_data, zgd->zgd_db->db_size, zp, dmu_sync_late_arrival_ready, NULL, dmu_sync_late_arrival_done, dsa, ZIO_PRIORITY_SYNC_WRITE, ZIO_FLAG_CANFAIL, zb)); return (0); } /* * Intent log support: sync the block associated with db to disk. * N.B. and XXX: the caller is responsible for making sure that the * data isn't changing while dmu_sync() is writing it. * * Return values: * * EEXIST: this txg has already been synced, so there's nothing to do. * The caller should not log the write. * * ENOENT: the block was dbuf_free_range()'d, so there's nothing to do. * The caller should not log the write. * * EALREADY: this block is already in the process of being synced. * The caller should track its progress (somehow). * * EIO: could not do the I/O. * The caller should do a txg_wait_synced(). * * 0: the I/O has been initiated. * The caller should log this blkptr in the done callback. * It is possible that the I/O will fail, in which case * the error will be reported to the done callback and * propagated to pio from zio_done(). */ int dmu_sync(zio_t *pio, uint64_t txg, dmu_sync_cb_t *done, zgd_t *zgd) { blkptr_t *bp = zgd->zgd_bp; dmu_buf_impl_t *db = (dmu_buf_impl_t *)zgd->zgd_db; objset_t *os = db->db_objset; dsl_dataset_t *ds = os->os_dsl_dataset; dbuf_dirty_record_t *dr; dmu_sync_arg_t *dsa; zbookmark_phys_t zb; zio_prop_t zp; dnode_t *dn; ASSERT(pio != NULL); ASSERT(txg != 0); SET_BOOKMARK(&zb, ds->ds_object, db->db.db_object, db->db_level, db->db_blkid); DB_DNODE_ENTER(db); dn = DB_DNODE(db); dmu_write_policy(os, dn, db->db_level, WP_DMU_SYNC, &zp); DB_DNODE_EXIT(db); /* * If we're frozen (running ziltest), we always need to generate a bp. */ if (txg > spa_freeze_txg(os->os_spa)) return (dmu_sync_late_arrival(pio, os, done, zgd, &zp, &zb)); /* * Grabbing db_mtx now provides a barrier between dbuf_sync_leaf() * and us.
If we determine that this txg is not yet syncing, * but it begins to sync a moment later, that's OK because the * sync thread will block in dbuf_sync_leaf() until we drop db_mtx. */ mutex_enter(&db->db_mtx); if (txg <= spa_last_synced_txg(os->os_spa)) { /* * This txg has already synced. There's nothing to do. */ mutex_exit(&db->db_mtx); return (SET_ERROR(EEXIST)); } if (txg <= spa_syncing_txg(os->os_spa)) { /* * This txg is currently syncing, so we can't mess with * the dirty record anymore; just write a new log block. */ mutex_exit(&db->db_mtx); return (dmu_sync_late_arrival(pio, os, done, zgd, &zp, &zb)); } dr = db->db_last_dirty; while (dr && dr->dr_txg != txg) dr = dr->dr_next; if (dr == NULL) { /* * There's no dr for this dbuf, so it must have been freed. * There's no need to log writes to freed blocks, so we're done. */ mutex_exit(&db->db_mtx); return (SET_ERROR(ENOENT)); } ASSERT(dr->dr_next == NULL || dr->dr_next->dr_txg < txg); /* * Assume the on-disk data is X, the current syncing data (in * txg - 1) is Y, and the current in-memory data is Z (currently * in dmu_sync). * * We usually want to perform a nopwrite if X and Z are the * same. However, if Y is different (i.e. the BP is going to * change before this write takes effect), then a nopwrite will * be incorrect - we would override with X, which could have * been freed when Y was written. * * (Note that this is not a concern when we are nop-writing from * syncing context, because X and Y must be identical, because * all previous txgs have been synced.) * * Therefore, we disable nopwrite if the current BP could change * before this TXG. There are two ways it could change: by * being dirty (dr_next is non-NULL), or by being freed * (dnode_block_freed()). This behavior is verified by * zio_done(), which VERIFYs that the override BP is identical * to the on-disk BP. */ DB_DNODE_ENTER(db); dn = DB_DNODE(db); if (dr->dr_next != NULL || dnode_block_freed(dn, db->db_blkid)) zp.zp_nopwrite = B_FALSE; DB_DNODE_EXIT(db); ASSERT(dr->dr_txg == txg); if (dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC || dr->dt.dl.dr_override_state == DR_OVERRIDDEN) { /* * We have already issued a sync write for this buffer, * or this buffer has already been synced. It could not * have been dirtied since, or we would have cleared the state. */ mutex_exit(&db->db_mtx); return (SET_ERROR(EALREADY)); } ASSERT(dr->dt.dl.dr_override_state == DR_NOT_OVERRIDDEN); dr->dt.dl.dr_override_state = DR_IN_DMU_SYNC; mutex_exit(&db->db_mtx); dsa = kmem_alloc(sizeof (dmu_sync_arg_t), KM_SLEEP); dsa->dsa_dr = dr; dsa->dsa_done = done; dsa->dsa_zgd = zgd; dsa->dsa_tx = NULL; zio_nowait(arc_write(pio, os->os_spa, txg, bp, dr->dt.dl.dr_data, DBUF_IS_L2CACHEABLE(db), DBUF_IS_L2COMPRESSIBLE(db), &zp, dmu_sync_ready, NULL, dmu_sync_done, dsa, ZIO_PRIORITY_SYNC_WRITE, ZIO_FLAG_CANFAIL, &zb)); return (0); } int dmu_object_set_blocksize(objset_t *os, uint64_t object, uint64_t size, int ibs, dmu_tx_t *tx) { dnode_t *dn; int err; err = dnode_hold(os, object, FTAG, &dn); if (err) return (err); err = dnode_set_blksz(dn, size, ibs, tx); dnode_rele(dn, FTAG); return (err); } void dmu_object_set_checksum(objset_t *os, uint64_t object, uint8_t checksum, dmu_tx_t *tx) { dnode_t *dn; /* * Send streams include each object's checksum function. This * check ensures that the receiving system can understand the * checksum function transmitted. 
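 * (This is why the first ASSERT below bounds the value by
 * ZIO_CHECKSUM_LEGACY_FUNCTIONS rather than ZIO_CHECKSUM_FUNCTIONS.)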
*/ ASSERT3U(checksum, <, ZIO_CHECKSUM_LEGACY_FUNCTIONS); VERIFY0(dnode_hold(os, object, FTAG, &dn)); ASSERT3U(checksum, <, ZIO_CHECKSUM_FUNCTIONS); dn->dn_checksum = checksum; dnode_setdirty(dn, tx); dnode_rele(dn, FTAG); } void dmu_object_set_compress(objset_t *os, uint64_t object, uint8_t compress, dmu_tx_t *tx) { dnode_t *dn; /* * Send streams include each object's compression function. This * check ensures that the receiving system can understand the * compression function transmitted. */ ASSERT3U(compress, <, ZIO_COMPRESS_LEGACY_FUNCTIONS); VERIFY0(dnode_hold(os, object, FTAG, &dn)); dn->dn_compress = compress; dnode_setdirty(dn, tx); dnode_rele(dn, FTAG); } int zfs_mdcomp_disable = 0; SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_disable, CTLFLAG_RWTUN, &zfs_mdcomp_disable, 0, "Disable metadata compression"); /* * When the "redundant_metadata" property is set to "most", only indirect * blocks of this level and higher will have an additional ditto block. */ int zfs_redundant_metadata_most_ditto_level = 2; void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp, zio_prop_t *zp) { dmu_object_type_t type = dn ? dn->dn_type : DMU_OT_OBJSET; boolean_t ismd = (level > 0 || DMU_OT_IS_METADATA(type) || (wp & WP_SPILL)); enum zio_checksum checksum = os->os_checksum; enum zio_compress compress = os->os_compress; enum zio_checksum dedup_checksum = os->os_dedup_checksum; boolean_t dedup = B_FALSE; boolean_t nopwrite = B_FALSE; boolean_t dedup_verify = os->os_dedup_verify; int copies = os->os_copies; /* * We maintain different write policies for each of the following * types of data: * 1. metadata * 2. preallocated blocks (i.e. level-0 blocks of a dump device) * 3. all other level 0 blocks */ if (ismd) { if (zfs_mdcomp_disable) { compress = ZIO_COMPRESS_EMPTY; } else { /* * XXX -- we should design a compression algorithm * that specializes in arrays of bps. */ compress = zio_compress_select(os->os_spa, ZIO_COMPRESS_ON, ZIO_COMPRESS_ON); } /* * Metadata always gets checksummed. If the data * checksum is multi-bit correctable, and it's not a * ZBT-style checksum, then it's suitable for metadata * as well. Otherwise, the metadata checksum defaults * to fletcher4. */ if (!(zio_checksum_table[checksum].ci_flags & ZCHECKSUM_FLAG_METADATA) || (zio_checksum_table[checksum].ci_flags & ZCHECKSUM_FLAG_EMBEDDED)) checksum = ZIO_CHECKSUM_FLETCHER_4; if (os->os_redundant_metadata == ZFS_REDUNDANT_METADATA_ALL || (os->os_redundant_metadata == ZFS_REDUNDANT_METADATA_MOST && (level >= zfs_redundant_metadata_most_ditto_level || DMU_OT_IS_METADATA(type) || (wp & WP_SPILL)))) copies++; } else if (wp & WP_NOFILL) { ASSERT(level == 0); /* * If we're writing preallocated blocks, we aren't actually * writing them so don't set any policy properties. These * blocks are currently only used by an external subsystem * outside of zfs (i.e. dump) and not written by the zio * pipeline. */ compress = ZIO_COMPRESS_OFF; checksum = ZIO_CHECKSUM_NOPARITY; } else { compress = zio_compress_select(os->os_spa, dn->dn_compress, compress); checksum = (dedup_checksum == ZIO_CHECKSUM_OFF) ? zio_checksum_select(dn->dn_checksum, checksum) : dedup_checksum; /* * Determine dedup setting. If we are in dmu_sync(), * we won't actually dedup now because that's all * done in syncing context; but we do want to use the * dedup checksum. If the checksum is not strong * enough to ensure unique signatures, force * dedup_verify. */ if (dedup_checksum != ZIO_CHECKSUM_OFF) { dedup = (wp & WP_DMU_SYNC) ?
B_FALSE : B_TRUE; if (!(zio_checksum_table[checksum].ci_flags & ZCHECKSUM_FLAG_DEDUP)) dedup_verify = B_TRUE; } /* * Enable nopwrite if we have a secure enough checksum * algorithm (see comment in zio_nop_write) and * compression is enabled. We don't enable nopwrite if * dedup is enabled as the two features are mutually * exclusive. */ nopwrite = (!dedup && (zio_checksum_table[checksum].ci_flags & ZCHECKSUM_FLAG_NOPWRITE) && compress != ZIO_COMPRESS_OFF && zfs_nopwrite_enabled); } zp->zp_checksum = checksum; zp->zp_compress = compress; zp->zp_type = (wp & WP_SPILL) ? dn->dn_bonustype : type; zp->zp_level = level; zp->zp_copies = MIN(copies, spa_max_replication(os->os_spa)); zp->zp_dedup = dedup; zp->zp_dedup_verify = dedup && dedup_verify; zp->zp_nopwrite = nopwrite; } int dmu_offset_next(objset_t *os, uint64_t object, boolean_t hole, uint64_t *off) { dnode_t *dn; int err; /* * Sync any current changes before * we go trundling through the block pointers. */ err = dmu_object_wait_synced(os, object); if (err) { return (err); } err = dnode_hold(os, object, FTAG, &dn); if (err) { return (err); } err = dnode_next_offset(dn, (hole ? DNODE_FIND_HOLE : 0), off, 1, 1, 0); dnode_rele(dn, FTAG); return (err); } /* * Given the ZFS object, if it contains any dirty nodes * this function flushes all dirty blocks to disk. This * ensures the DMU object info is updated. A more efficient * future version might just find the TXG with the maximum * ID and wait for that to be synced. */ int dmu_object_wait_synced(objset_t *os, uint64_t object) { dnode_t *dn; int error, i; error = dnode_hold(os, object, FTAG, &dn); if (error) { return (error); } for (i = 0; i < TXG_SIZE; i++) { if (list_link_active(&dn->dn_dirty_link[i])) { break; } } dnode_rele(dn, FTAG); if (i != TXG_SIZE) { txg_wait_synced(dmu_objset_pool(os), 0); } return (0); } void dmu_object_info_from_dnode(dnode_t *dn, dmu_object_info_t *doi) { dnode_phys_t *dnp; rw_enter(&dn->dn_struct_rwlock, RW_READER); mutex_enter(&dn->dn_mtx); dnp = dn->dn_phys; doi->doi_data_block_size = dn->dn_datablksz; doi->doi_metadata_block_size = dn->dn_indblkshift ? 1ULL << dn->dn_indblkshift : 0; doi->doi_type = dn->dn_type; doi->doi_bonus_type = dn->dn_bonustype; doi->doi_bonus_size = dn->dn_bonuslen; doi->doi_indirection = dn->dn_nlevels; doi->doi_checksum = dn->dn_checksum; doi->doi_compress = dn->dn_compress; doi->doi_nblkptr = dn->dn_nblkptr; doi->doi_physical_blocks_512 = (DN_USED_BYTES(dnp) + 256) >> 9; doi->doi_max_offset = (dn->dn_maxblkid + 1) * dn->dn_datablksz; doi->doi_fill_count = 0; for (int i = 0; i < dnp->dn_nblkptr; i++) doi->doi_fill_count += BP_GET_FILL(&dnp->dn_blkptr[i]); mutex_exit(&dn->dn_mtx); rw_exit(&dn->dn_struct_rwlock); } /* * Get information on a DMU object. * If doi is NULL, just indicates whether the object exists. */ int dmu_object_info(objset_t *os, uint64_t object, dmu_object_info_t *doi) { dnode_t *dn; int err = dnode_hold(os, object, FTAG, &dn); if (err) return (err); if (doi != NULL) dmu_object_info_from_dnode(dn, doi); dnode_rele(dn, FTAG); return (0); } /* * As above, but faster; can be used when you have a held dbuf in hand. */ void dmu_object_info_from_db(dmu_buf_t *db_fake, dmu_object_info_t *doi) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake; DB_DNODE_ENTER(db); dmu_object_info_from_dnode(DB_DNODE(db), doi); DB_DNODE_EXIT(db); } /* * Faster still when you only care about the size. * This is specifically optimized for zfs_getattr().
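 * (It avoids the dn_struct_rwlock/dn_mtx acquisition done by
 * dmu_object_info_from_dnode() and copies out only the two fields
 * zfs_getattr() needs.)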
*/ void dmu_object_size_from_db(dmu_buf_t *db_fake, uint32_t *blksize, u_longlong_t *nblk512) { dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake; dnode_t *dn; DB_DNODE_ENTER(db); dn = DB_DNODE(db); *blksize = dn->dn_datablksz; /* add 1 for dnode space */ *nblk512 = ((DN_USED_BYTES(dn->dn_phys) + SPA_MINBLOCKSIZE/2) >> SPA_MINBLOCKSHIFT) + 1; DB_DNODE_EXIT(db); } void byteswap_uint64_array(void *vbuf, size_t size) { uint64_t *buf = vbuf; size_t count = size >> 3; int i; ASSERT((size & 7) == 0); for (i = 0; i < count; i++) buf[i] = BSWAP_64(buf[i]); } void byteswap_uint32_array(void *vbuf, size_t size) { uint32_t *buf = vbuf; size_t count = size >> 2; int i; ASSERT((size & 3) == 0); for (i = 0; i < count; i++) buf[i] = BSWAP_32(buf[i]); } void byteswap_uint16_array(void *vbuf, size_t size) { uint16_t *buf = vbuf; size_t count = size >> 1; int i; ASSERT((size & 1) == 0); for (i = 0; i < count; i++) buf[i] = BSWAP_16(buf[i]); } /* ARGSUSED */ void byteswap_uint8_array(void *vbuf, size_t size) { } void dmu_init(void) { zfs_dbgmsg_init(); sa_cache_init(); xuio_stat_init(); dmu_objset_init(); dnode_init(); dbuf_init(); zfetch_init(); zio_compress_init(); l2arc_init(); arc_init(); } void dmu_fini(void) { arc_fini(); /* arc depends on l2arc, so arc must go first */ l2arc_fini(); zfetch_fini(); zio_compress_fini(); dbuf_fini(); dnode_fini(); dmu_objset_fini(); xuio_stat_fini(); sa_cache_fini(); zfs_dbgmsg_fini(); } Index: projects/clang380-import/sys/cddl/contrib/opensolaris =================================================================== --- projects/clang380-import/sys/cddl/contrib/opensolaris (revision 294776) +++ projects/clang380-import/sys/cddl/contrib/opensolaris (revision 294777) Property changes on: projects/clang380-import/sys/cddl/contrib/opensolaris ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys/cddl/contrib/opensolaris:r294599-294776 Index: projects/clang380-import/sys/conf/files =================================================================== --- projects/clang380-import/sys/conf/files (revision 294776) +++ projects/clang380-import/sys/conf/files (revision 294777) @@ -1,4306 +1,4315 @@ # $FreeBSD$ # # The long compile-with and dependency lines are required because of # limitations in config: backslash-newline doesn't work in strings, and # dependency lines other than the first are silently ignored. 
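# As a purely hypothetical illustration of the pattern (a device "foo"
# and its awk script are not in the tree), a generated header would be
# declared on one logical line using backslash continuations:
#
# foo_tbl.h optional foo \
#	dependency "$S/tools/foo2h.awk $S/dev/foo/foo_tbl" \
#	compile-with "${AWK} -f $S/tools/foo2h.awk $S/dev/foo/foo_tbl" \
#	no-obj no-implicit-rule before-depend \
#	clean "foo_tbl.h"
#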
# acpi_quirks.h optional acpi \ dependency "$S/tools/acpi_quirks2h.awk $S/dev/acpica/acpi_quirks" \ compile-with "${AWK} -f $S/tools/acpi_quirks2h.awk $S/dev/acpica/acpi_quirks" \ no-obj no-implicit-rule before-depend \ clean "acpi_quirks.h" # # The 'fdt_dtb_file' target covers an actual DTB file name, which is derived # from the specified source (DTS) file: .dts -> .dtb # fdt_dtb_file optional fdt fdt_dtb_static \ compile-with "sh -c 'MACHINE=${MACHINE} $S/tools/fdt/make_dtb.sh $S ${FDT_DTS_FILE} ${.CURDIR}'" \ no-obj no-implicit-rule before-depend \ clean "${FDT_DTS_FILE:R}.dtb" fdt_static_dtb.h optional fdt fdt_dtb_static \ compile-with "sh -c 'MACHINE=${MACHINE} $S/tools/fdt/make_dtbh.sh ${FDT_DTS_FILE} ${.CURDIR}'" \ dependency "fdt_dtb_file" \ no-obj no-implicit-rule before-depend \ clean "fdt_static_dtb.h" feeder_eq_gen.h optional sound \ dependency "$S/tools/sound/feeder_eq_mkfilter.awk" \ compile-with "${AWK} -f $S/tools/sound/feeder_eq_mkfilter.awk -- ${FEEDER_EQ_PRESETS} > feeder_eq_gen.h" \ no-obj no-implicit-rule before-depend \ clean "feeder_eq_gen.h" feeder_rate_gen.h optional sound \ dependency "$S/tools/sound/feeder_rate_mkfilter.awk" \ compile-with "${AWK} -f $S/tools/sound/feeder_rate_mkfilter.awk -- ${FEEDER_RATE_PRESETS} > feeder_rate_gen.h" \ no-obj no-implicit-rule before-depend \ clean "feeder_rate_gen.h" snd_fxdiv_gen.h optional sound \ dependency "$S/tools/sound/snd_fxdiv_gen.awk" \ compile-with "${AWK} -f $S/tools/sound/snd_fxdiv_gen.awk -- > snd_fxdiv_gen.h" \ no-obj no-implicit-rule before-depend \ clean "snd_fxdiv_gen.h" miidevs.h optional miibus | mii \ dependency "$S/tools/miidevs2h.awk $S/dev/mii/miidevs" \ compile-with "${AWK} -f $S/tools/miidevs2h.awk $S/dev/mii/miidevs" \ no-obj no-implicit-rule before-depend \ clean "miidevs.h" pccarddevs.h standard \ dependency "$S/tools/pccarddevs2h.awk $S/dev/pccard/pccarddevs" \ compile-with "${AWK} -f $S/tools/pccarddevs2h.awk $S/dev/pccard/pccarddevs" \ no-obj no-implicit-rule before-depend \ clean "pccarddevs.h" teken_state.h optional sc | vt \ dependency "$S/teken/gensequences $S/teken/sequences" \ compile-with "${AWK} -f $S/teken/gensequences $S/teken/sequences > teken_state.h" \ no-obj no-implicit-rule before-depend \ clean "teken_state.h" usbdevs.h optional usb \ dependency "$S/tools/usbdevs2h.awk $S/dev/usb/usbdevs" \ compile-with "${AWK} -f $S/tools/usbdevs2h.awk $S/dev/usb/usbdevs -h" \ no-obj no-implicit-rule before-depend \ clean "usbdevs.h" usbdevs_data.h optional usb \ dependency "$S/tools/usbdevs2h.awk $S/dev/usb/usbdevs" \ compile-with "${AWK} -f $S/tools/usbdevs2h.awk $S/dev/usb/usbdevs -d" \ no-obj no-implicit-rule before-depend \ clean "usbdevs_data.h" cam/cam.c optional scbus cam/cam_compat.c optional scbus cam/cam_periph.c optional scbus cam/cam_queue.c optional scbus cam/cam_sim.c optional scbus cam/cam_xpt.c optional scbus cam/ata/ata_all.c optional scbus cam/ata/ata_xpt.c optional scbus cam/ata/ata_pmp.c optional scbus cam/scsi/scsi_xpt.c optional scbus cam/scsi/scsi_all.c optional scbus cam/scsi/scsi_cd.c optional cd cam/scsi/scsi_ch.c optional ch cam/ata/ata_da.c optional ada | da cam/ctl/ctl.c optional ctl cam/ctl/ctl_backend.c optional ctl cam/ctl/ctl_backend_block.c optional ctl cam/ctl/ctl_backend_ramdisk.c optional ctl cam/ctl/ctl_cmd_table.c optional ctl cam/ctl/ctl_frontend.c optional ctl cam/ctl/ctl_frontend_cam_sim.c optional ctl cam/ctl/ctl_frontend_ioctl.c optional ctl cam/ctl/ctl_frontend_iscsi.c optional ctl cam/ctl/ctl_ha.c optional ctl cam/ctl/ctl_scsi_all.c optional ctl 
cam/ctl/ctl_tpc.c optional ctl cam/ctl/ctl_tpc_local.c optional ctl cam/ctl/ctl_error.c optional ctl cam/ctl/ctl_util.c optional ctl cam/ctl/scsi_ctl.c optional ctl cam/scsi/scsi_da.c optional da cam/scsi/scsi_low.c optional ct | ncv | nsp | stg cam/scsi/scsi_pass.c optional pass cam/scsi/scsi_pt.c optional pt cam/scsi/scsi_sa.c optional sa cam/scsi/scsi_enc.c optional ses cam/scsi/scsi_enc_ses.c optional ses cam/scsi/scsi_enc_safte.c optional ses cam/scsi/scsi_sg.c optional sg cam/scsi/scsi_targ_bh.c optional targbh cam/scsi/scsi_target.c optional targ cam/scsi/smp_all.c optional scbus # shared between zfs and dtrace cddl/compat/opensolaris/kern/opensolaris.c optional zfs | dtrace compile-with "${CDDL_C}" cddl/compat/opensolaris/kern/opensolaris_cmn_err.c optional zfs | dtrace compile-with "${CDDL_C}" cddl/compat/opensolaris/kern/opensolaris_kmem.c optional zfs | dtrace compile-with "${CDDL_C}" cddl/compat/opensolaris/kern/opensolaris_misc.c optional zfs | dtrace compile-with "${CDDL_C}" cddl/compat/opensolaris/kern/opensolaris_sunddi.c optional zfs | dtrace compile-with "${CDDL_C}" cddl/compat/opensolaris/kern/opensolaris_taskq.c optional zfs | dtrace compile-with "${CDDL_C}" # zfs specific cddl/compat/opensolaris/kern/opensolaris_acl.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_dtrace.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_kobj.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_kstat.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_lookup.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_policy.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_string.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_sysevent.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_uio.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_vfs.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_vm.c optional zfs compile-with "${ZFS_C}" cddl/compat/opensolaris/kern/opensolaris_zone.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/acl/acl_common.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/avl/avl.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/nvpair/opensolaris_fnvpair.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/nvpair/opensolaris_nvpair.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/nvpair/opensolaris_nvpair_alloc_fixed.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/unicode/u8_textprep.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zfeature_common.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zfs_comutil.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zfs_deleg.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zfs_fletcher.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zfs_ioctl_compat.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zfs_namecheck.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zfs_prop.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zpool_prop.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/common/zfs/zprop_common.c 
optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/gfs.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/vnode.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/blkptr.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/bplist.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/bpobj.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/bqueue.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/ddt_zap.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_diff.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_zfetch.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c optional zfs compile-with "${ZFS_C}" \ warning "kernel contains CDDL licensed ZFS filesystem" cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_bookmark.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deleg.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_destroy.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dir.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_synctask.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/gzip.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/multilist.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/range_tree.c optional zfs compile-with "${ZFS_C}" 
cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/rrwlock.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/sha256.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/spa_config.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/spa_errlog.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/spa_history.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/space_reftree.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/uberblock.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/unique.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_cache.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_missing.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_raidz.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_root.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfeature.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_byteswap.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_debug.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fm.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_log.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_onexit.c optional zfs compile-with 
"${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_replay.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_sa.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zio_checksum.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zio_inject.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zle.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/os/callb.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/os/fm.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/os/list.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/os/nvpair_alloc_system.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/adler32.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/deflate.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/inffast.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/inflate.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/inftrees.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/opensolaris_crc32.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/trees.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/zmod.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/zmod_subr.c optional zfs compile-with "${ZFS_C}" cddl/contrib/opensolaris/uts/common/zmod/zutil.c optional zfs compile-with "${ZFS_C}" # dtrace specific cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c optional dtrace compile-with "${DTRACE_C}" \ warning "kernel contains CDDL licensed DTRACE" cddl/dev/dtmalloc/dtmalloc.c optional dtmalloc | dtraceall compile-with "${CDDL_C}" cddl/dev/profile/profile.c optional dtrace_profile | dtraceall compile-with "${CDDL_C}" cddl/dev/sdt/sdt.c optional dtrace_sdt | dtraceall compile-with "${CDDL_C}" cddl/dev/fbt/fbt.c optional dtrace_fbt | dtraceall compile-with "${FBT_C}" cddl/dev/systrace/systrace.c optional dtrace_systrace | dtraceall compile-with "${CDDL_C}" cddl/dev/prototype.c optional dtrace_prototype | dtraceall compile-with "${CDDL_C}" fs/nfsclient/nfs_clkdtrace.c optional dtnfscl nfscl | dtraceall nfscl compile-with "${CDDL_C}" compat/cloudabi/cloudabi_clock.c optional compat_cloudabi64 compat/cloudabi/cloudabi_errno.c optional compat_cloudabi64 compat/cloudabi/cloudabi_fd.c optional compat_cloudabi64 compat/cloudabi/cloudabi_file.c optional compat_cloudabi64 compat/cloudabi/cloudabi_futex.c optional 
compat_cloudabi64 compat/cloudabi/cloudabi_mem.c optional compat_cloudabi64 compat/cloudabi/cloudabi_proc.c optional compat_cloudabi64 compat/cloudabi/cloudabi_random.c optional compat_cloudabi64 compat/cloudabi/cloudabi_sock.c optional compat_cloudabi64 compat/cloudabi/cloudabi_thread.c optional compat_cloudabi64 compat/cloudabi64/cloudabi64_fd.c optional compat_cloudabi64 compat/cloudabi64/cloudabi64_module.c optional compat_cloudabi64 compat/cloudabi64/cloudabi64_poll.c optional compat_cloudabi64 compat/cloudabi64/cloudabi64_sock.c optional compat_cloudabi64 compat/cloudabi64/cloudabi64_syscalls.c optional compat_cloudabi64 compat/cloudabi64/cloudabi64_sysent.c optional compat_cloudabi64 compat/cloudabi64/cloudabi64_thread.c optional compat_cloudabi64 compat/freebsd32/freebsd32_capability.c optional compat_freebsd32 compat/freebsd32/freebsd32_ioctl.c optional compat_freebsd32 compat/freebsd32/freebsd32_misc.c optional compat_freebsd32 compat/freebsd32/freebsd32_syscalls.c optional compat_freebsd32 compat/freebsd32/freebsd32_sysent.c optional compat_freebsd32 contrib/dev/acpica/common/ahids.c optional acpi acpi_debug contrib/dev/acpica/common/ahuuids.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbcmds.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbconvert.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbdisply.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbexec.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbfileio.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbhistry.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbinput.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbmethod.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbnames.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbobject.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbstats.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbtest.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbutils.c optional acpi acpi_debug contrib/dev/acpica/components/debugger/dbxface.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmbuffer.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmcstyle.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmdeferred.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmnames.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmopcode.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmresrc.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmresrcl.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmresrcl2.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmresrcs.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmutils.c optional acpi acpi_debug contrib/dev/acpica/components/disassembler/dmwalk.c optional acpi acpi_debug contrib/dev/acpica/components/dispatcher/dsargs.c optional acpi contrib/dev/acpica/components/dispatcher/dscontrol.c optional acpi contrib/dev/acpica/components/dispatcher/dsdebug.c optional acpi contrib/dev/acpica/components/dispatcher/dsfield.c optional acpi contrib/dev/acpica/components/dispatcher/dsinit.c optional acpi contrib/dev/acpica/components/dispatcher/dsmethod.c optional acpi contrib/dev/acpica/components/dispatcher/dsmthdat.c 
optional acpi contrib/dev/acpica/components/dispatcher/dsobject.c optional acpi contrib/dev/acpica/components/dispatcher/dsopcode.c optional acpi contrib/dev/acpica/components/dispatcher/dsutils.c optional acpi contrib/dev/acpica/components/dispatcher/dswexec.c optional acpi contrib/dev/acpica/components/dispatcher/dswload.c optional acpi contrib/dev/acpica/components/dispatcher/dswload2.c optional acpi contrib/dev/acpica/components/dispatcher/dswscope.c optional acpi contrib/dev/acpica/components/dispatcher/dswstate.c optional acpi contrib/dev/acpica/components/events/evevent.c optional acpi contrib/dev/acpica/components/events/evglock.c optional acpi contrib/dev/acpica/components/events/evgpe.c optional acpi contrib/dev/acpica/components/events/evgpeblk.c optional acpi contrib/dev/acpica/components/events/evgpeinit.c optional acpi contrib/dev/acpica/components/events/evgpeutil.c optional acpi contrib/dev/acpica/components/events/evhandler.c optional acpi contrib/dev/acpica/components/events/evmisc.c optional acpi contrib/dev/acpica/components/events/evregion.c optional acpi contrib/dev/acpica/components/events/evrgnini.c optional acpi contrib/dev/acpica/components/events/evsci.c optional acpi contrib/dev/acpica/components/events/evxface.c optional acpi contrib/dev/acpica/components/events/evxfevnt.c optional acpi contrib/dev/acpica/components/events/evxfgpe.c optional acpi contrib/dev/acpica/components/events/evxfregn.c optional acpi contrib/dev/acpica/components/executer/exconfig.c optional acpi contrib/dev/acpica/components/executer/exconvrt.c optional acpi contrib/dev/acpica/components/executer/excreate.c optional acpi contrib/dev/acpica/components/executer/exdebug.c optional acpi contrib/dev/acpica/components/executer/exdump.c optional acpi contrib/dev/acpica/components/executer/exfield.c optional acpi contrib/dev/acpica/components/executer/exfldio.c optional acpi contrib/dev/acpica/components/executer/exmisc.c optional acpi contrib/dev/acpica/components/executer/exmutex.c optional acpi contrib/dev/acpica/components/executer/exnames.c optional acpi contrib/dev/acpica/components/executer/exoparg1.c optional acpi contrib/dev/acpica/components/executer/exoparg2.c optional acpi contrib/dev/acpica/components/executer/exoparg3.c optional acpi contrib/dev/acpica/components/executer/exoparg6.c optional acpi contrib/dev/acpica/components/executer/exprep.c optional acpi contrib/dev/acpica/components/executer/exregion.c optional acpi contrib/dev/acpica/components/executer/exresnte.c optional acpi contrib/dev/acpica/components/executer/exresolv.c optional acpi contrib/dev/acpica/components/executer/exresop.c optional acpi contrib/dev/acpica/components/executer/exstore.c optional acpi contrib/dev/acpica/components/executer/exstoren.c optional acpi contrib/dev/acpica/components/executer/exstorob.c optional acpi contrib/dev/acpica/components/executer/exsystem.c optional acpi contrib/dev/acpica/components/executer/exutils.c optional acpi contrib/dev/acpica/components/hardware/hwacpi.c optional acpi contrib/dev/acpica/components/hardware/hwesleep.c optional acpi contrib/dev/acpica/components/hardware/hwgpe.c optional acpi contrib/dev/acpica/components/hardware/hwpci.c optional acpi contrib/dev/acpica/components/hardware/hwregs.c optional acpi contrib/dev/acpica/components/hardware/hwsleep.c optional acpi contrib/dev/acpica/components/hardware/hwtimer.c optional acpi contrib/dev/acpica/components/hardware/hwvalid.c optional acpi contrib/dev/acpica/components/hardware/hwxface.c optional acpi 
contrib/dev/acpica/components/hardware/hwxfsleep.c optional acpi contrib/dev/acpica/components/namespace/nsaccess.c optional acpi contrib/dev/acpica/components/namespace/nsalloc.c optional acpi contrib/dev/acpica/components/namespace/nsarguments.c optional acpi contrib/dev/acpica/components/namespace/nsconvert.c optional acpi contrib/dev/acpica/components/namespace/nsdump.c optional acpi contrib/dev/acpica/components/namespace/nseval.c optional acpi contrib/dev/acpica/components/namespace/nsinit.c optional acpi contrib/dev/acpica/components/namespace/nsload.c optional acpi contrib/dev/acpica/components/namespace/nsnames.c optional acpi contrib/dev/acpica/components/namespace/nsobject.c optional acpi contrib/dev/acpica/components/namespace/nsparse.c optional acpi contrib/dev/acpica/components/namespace/nspredef.c optional acpi contrib/dev/acpica/components/namespace/nsprepkg.c optional acpi contrib/dev/acpica/components/namespace/nsrepair.c optional acpi contrib/dev/acpica/components/namespace/nsrepair2.c optional acpi contrib/dev/acpica/components/namespace/nssearch.c optional acpi contrib/dev/acpica/components/namespace/nsutils.c optional acpi contrib/dev/acpica/components/namespace/nswalk.c optional acpi contrib/dev/acpica/components/namespace/nsxfeval.c optional acpi contrib/dev/acpica/components/namespace/nsxfname.c optional acpi contrib/dev/acpica/components/namespace/nsxfobj.c optional acpi contrib/dev/acpica/components/parser/psargs.c optional acpi contrib/dev/acpica/components/parser/psloop.c optional acpi contrib/dev/acpica/components/parser/psobject.c optional acpi contrib/dev/acpica/components/parser/psopcode.c optional acpi contrib/dev/acpica/components/parser/psopinfo.c optional acpi contrib/dev/acpica/components/parser/psparse.c optional acpi contrib/dev/acpica/components/parser/psscope.c optional acpi contrib/dev/acpica/components/parser/pstree.c optional acpi contrib/dev/acpica/components/parser/psutils.c optional acpi contrib/dev/acpica/components/parser/pswalk.c optional acpi contrib/dev/acpica/components/parser/psxface.c optional acpi contrib/dev/acpica/components/resources/rsaddr.c optional acpi contrib/dev/acpica/components/resources/rscalc.c optional acpi contrib/dev/acpica/components/resources/rscreate.c optional acpi contrib/dev/acpica/components/resources/rsdump.c optional acpi acpi_debug contrib/dev/acpica/components/resources/rsdumpinfo.c optional acpi contrib/dev/acpica/components/resources/rsinfo.c optional acpi contrib/dev/acpica/components/resources/rsio.c optional acpi contrib/dev/acpica/components/resources/rsirq.c optional acpi contrib/dev/acpica/components/resources/rslist.c optional acpi contrib/dev/acpica/components/resources/rsmemory.c optional acpi contrib/dev/acpica/components/resources/rsmisc.c optional acpi contrib/dev/acpica/components/resources/rsserial.c optional acpi contrib/dev/acpica/components/resources/rsutils.c optional acpi contrib/dev/acpica/components/resources/rsxface.c optional acpi contrib/dev/acpica/components/tables/tbdata.c optional acpi contrib/dev/acpica/components/tables/tbfadt.c optional acpi contrib/dev/acpica/components/tables/tbfind.c optional acpi contrib/dev/acpica/components/tables/tbinstal.c optional acpi contrib/dev/acpica/components/tables/tbprint.c optional acpi contrib/dev/acpica/components/tables/tbutils.c optional acpi contrib/dev/acpica/components/tables/tbxface.c optional acpi contrib/dev/acpica/components/tables/tbxfload.c optional acpi contrib/dev/acpica/components/tables/tbxfroot.c optional acpi 
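# Note: the ACPICA debugger, disassembler and dump sources in this block are
# guarded by "optional acpi acpi_debug", i.e. both tokens must be
# configured before they are compiled in.  As a sketch (assuming the usual
# option spelling), a kernel config would enable them with:
#	device		acpi
#	options 	ACPI_DEBUG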
contrib/dev/acpica/components/utilities/utaddress.c optional acpi contrib/dev/acpica/components/utilities/utalloc.c optional acpi contrib/dev/acpica/components/utilities/utbuffer.c optional acpi contrib/dev/acpica/components/utilities/utcache.c optional acpi contrib/dev/acpica/components/utilities/utcopy.c optional acpi contrib/dev/acpica/components/utilities/utdebug.c optional acpi contrib/dev/acpica/components/utilities/utdecode.c optional acpi contrib/dev/acpica/components/utilities/utdelete.c optional acpi contrib/dev/acpica/components/utilities/uterror.c optional acpi contrib/dev/acpica/components/utilities/uteval.c optional acpi contrib/dev/acpica/components/utilities/utexcep.c optional acpi contrib/dev/acpica/components/utilities/utglobal.c optional acpi contrib/dev/acpica/components/utilities/uthex.c optional acpi contrib/dev/acpica/components/utilities/utids.c optional acpi contrib/dev/acpica/components/utilities/utinit.c optional acpi contrib/dev/acpica/components/utilities/utlock.c optional acpi contrib/dev/acpica/components/utilities/utmath.c optional acpi contrib/dev/acpica/components/utilities/utmisc.c optional acpi contrib/dev/acpica/components/utilities/utmutex.c optional acpi contrib/dev/acpica/components/utilities/utnonansi.c optional acpi contrib/dev/acpica/components/utilities/utobject.c optional acpi contrib/dev/acpica/components/utilities/utosi.c optional acpi contrib/dev/acpica/components/utilities/utownerid.c optional acpi contrib/dev/acpica/components/utilities/utpredef.c optional acpi contrib/dev/acpica/components/utilities/utresrc.c optional acpi contrib/dev/acpica/components/utilities/utstate.c optional acpi contrib/dev/acpica/components/utilities/utstring.c optional acpi contrib/dev/acpica/components/utilities/utuuid.c optional acpi acpi_debug contrib/dev/acpica/components/utilities/utxface.c optional acpi contrib/dev/acpica/components/utilities/utxferror.c optional acpi contrib/dev/acpica/components/utilities/utxfinit.c optional acpi #contrib/dev/acpica/components/utilities/utxfmutex.c optional acpi contrib/ipfilter/netinet/fil.c optional ipfilter inet \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_auth.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_fil_freebsd.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_frag.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_log.c optional ipfilter inet \ compile-with "${NORMAL_C} -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_nat.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_proxy.c optional ipfilter inet \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_state.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_lookup.c optional ipfilter inet \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN} -Wno-unused -Wno-error -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_pool.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_htable.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_sync.c optional ipfilter inet \ compile-with "${NORMAL_C} 
-Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/mlfk_ipl.c optional ipfilter inet \ compile-with "${NORMAL_C} -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_nat6.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_rules.c optional ipfilter inet \ compile-with "${NORMAL_C} -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_scan.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/ip_dstlist.c optional ipfilter inet \ compile-with "${NORMAL_C} -Wno-unused -I$S/contrib/ipfilter" contrib/ipfilter/netinet/radix_ipf.c optional ipfilter inet \ compile-with "${NORMAL_C} -I$S/contrib/ipfilter" contrib/libfdt/fdt.c optional fdt contrib/libfdt/fdt_ro.c optional fdt contrib/libfdt/fdt_rw.c optional fdt contrib/libfdt/fdt_strerror.c optional fdt contrib/libfdt/fdt_sw.c optional fdt contrib/libfdt/fdt_wip.c optional fdt contrib/libnv/dnvlist.c standard contrib/libnv/nvlist.c standard contrib/libnv/nvpair.c standard contrib/ngatm/netnatm/api/cc_conn.c optional ngatm_ccatm \ compile-with "${NORMAL_C_NOWERROR} -I$S/contrib/ngatm" contrib/ngatm/netnatm/api/cc_data.c optional ngatm_ccatm \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/api/cc_dump.c optional ngatm_ccatm \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/api/cc_port.c optional ngatm_ccatm \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/api/cc_sig.c optional ngatm_ccatm \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/api/cc_user.c optional ngatm_ccatm \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/api/unisap.c optional ngatm_ccatm \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/misc/straddr.c optional ngatm_atmbase \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/misc/unimsg_common.c optional ngatm_atmbase \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/msg/traffic.c optional ngatm_atmbase \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/msg/uni_ie.c optional ngatm_atmbase \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/msg/uni_msg.c optional ngatm_atmbase \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/saal/saal_sscfu.c optional ngatm_sscfu \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/saal/saal_sscop.c optional ngatm_sscop \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/sig/sig_call.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/sig/sig_coord.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/sig/sig_party.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/sig/sig_print.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/sig/sig_reset.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/sig/sig_uni.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/sig/sig_unimsgcpy.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" contrib/ngatm/netnatm/sig/sig_verify.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" crypto/blowfish/bf_ecb.c optional ipsec crypto/blowfish/bf_skey.c optional crypto | ipsec crypto/camellia/camellia.c optional crypto | ipsec 
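# Note on the dependency syntax used throughout this file: within an
# "optional" clause, whitespace-separated tokens are all required (AND),
# while "|" introduces an alternative dependency list (OR).  For example,
# an entry such as
#	crypto/des/des_ecb.c	optional crypto | ipsec | netsmb
# is compiled when any one of the "crypto", "ipsec" or "netsmb" tokens is
# configured, whereas an entry like "optional bktr pci smbus" (see the bktr
# entries below) requires all of its tokens at once.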
crypto/camellia/camellia-api.c optional crypto | ipsec crypto/des/des_ecb.c optional crypto | ipsec | netsmb crypto/des/des_setkey.c optional crypto | ipsec | netsmb crypto/rc4/rc4.c optional netgraph_mppc_encryption | kgssapi crypto/rijndael/rijndael-alg-fst.c optional crypto | geom_bde | \ ipsec | random !random_loadable | wlan_ccmp crypto/rijndael/rijndael-api-fst.c optional geom_bde | random !random_loadable crypto/rijndael/rijndael-api.c optional crypto | ipsec | wlan_ccmp crypto/sha1.c optional carp | crypto | ipsec | \ netgraph_mppc_encryption | sctp crypto/sha2/sha256c.c optional crypto | geom_bde | ipsec | random !random_loadable | \ sctp | zfs crypto/sha2/sha512c.c optional crypto | geom_bde | ipsec | zfs crypto/siphash/siphash.c optional inet | inet6 crypto/siphash/siphash_test.c optional inet | inet6 ddb/db_access.c optional ddb ddb/db_break.c optional ddb ddb/db_capture.c optional ddb ddb/db_command.c optional ddb ddb/db_examine.c optional ddb ddb/db_expr.c optional ddb ddb/db_input.c optional ddb ddb/db_lex.c optional ddb ddb/db_main.c optional ddb ddb/db_output.c optional ddb ddb/db_print.c optional ddb ddb/db_ps.c optional ddb ddb/db_run.c optional ddb ddb/db_script.c optional ddb ddb/db_sym.c optional ddb ddb/db_thread.c optional ddb ddb/db_textdump.c optional ddb ddb/db_variables.c optional ddb ddb/db_watch.c optional ddb ddb/db_write_cmd.c optional ddb dev/aac/aac.c optional aac dev/aac/aac_cam.c optional aacp aac dev/aac/aac_debug.c optional aac dev/aac/aac_disk.c optional aac dev/aac/aac_linux.c optional aac compat_linux dev/aac/aac_pci.c optional aac pci dev/aacraid/aacraid.c optional aacraid dev/aacraid/aacraid_cam.c optional aacraid scbus dev/aacraid/aacraid_debug.c optional aacraid dev/aacraid/aacraid_linux.c optional aacraid compat_linux dev/aacraid/aacraid_pci.c optional aacraid pci dev/acpi_support/acpi_wmi.c optional acpi_wmi acpi dev/acpi_support/acpi_asus.c optional acpi_asus acpi dev/acpi_support/acpi_asus_wmi.c optional acpi_asus_wmi acpi dev/acpi_support/acpi_fujitsu.c optional acpi_fujitsu acpi dev/acpi_support/acpi_hp.c optional acpi_hp acpi dev/acpi_support/acpi_ibm.c optional acpi_ibm acpi dev/acpi_support/acpi_panasonic.c optional acpi_panasonic acpi dev/acpi_support/acpi_sony.c optional acpi_sony acpi dev/acpi_support/acpi_toshiba.c optional acpi_toshiba acpi dev/acpi_support/atk0110.c optional aibs acpi dev/acpica/Osd/OsdDebug.c optional acpi dev/acpica/Osd/OsdHardware.c optional acpi dev/acpica/Osd/OsdInterrupt.c optional acpi dev/acpica/Osd/OsdMemory.c optional acpi dev/acpica/Osd/OsdSchedule.c optional acpi dev/acpica/Osd/OsdStream.c optional acpi dev/acpica/Osd/OsdSynch.c optional acpi dev/acpica/Osd/OsdTable.c optional acpi dev/acpica/acpi.c optional acpi dev/acpica/acpi_acad.c optional acpi dev/acpica/acpi_battery.c optional acpi dev/acpica/acpi_button.c optional acpi dev/acpica/acpi_cmbat.c optional acpi dev/acpica/acpi_cpu.c optional acpi dev/acpica/acpi_ec.c optional acpi dev/acpica/acpi_isab.c optional acpi isa dev/acpica/acpi_lid.c optional acpi dev/acpica/acpi_package.c optional acpi dev/acpica/acpi_pci.c optional acpi pci dev/acpica/acpi_pci_link.c optional acpi pci dev/acpica/acpi_pcib.c optional acpi pci dev/acpica/acpi_pcib_acpi.c optional acpi pci dev/acpica/acpi_pcib_pci.c optional acpi pci dev/acpica/acpi_perf.c optional acpi dev/acpica/acpi_powerres.c optional acpi dev/acpica/acpi_quirk.c optional acpi dev/acpica/acpi_resource.c optional acpi dev/acpica/acpi_smbat.c optional acpi dev/acpica/acpi_thermal.c optional acpi 
dev/acpica/acpi_throttle.c optional acpi dev/acpica/acpi_timer.c optional acpi dev/acpica/acpi_video.c optional acpi_video acpi dev/acpica/acpi_dock.c optional acpi_dock acpi dev/adlink/adlink.c optional adlink dev/advansys/adv_eisa.c optional adv eisa dev/advansys/adv_pci.c optional adv pci dev/advansys/advansys.c optional adv dev/advansys/advlib.c optional adv dev/advansys/advmcode.c optional adv dev/advansys/adw_pci.c optional adw pci dev/advansys/adwcam.c optional adw dev/advansys/adwlib.c optional adw dev/advansys/adwmcode.c optional adw dev/ae/if_ae.c optional ae pci dev/age/if_age.c optional age pci dev/agp/agp.c optional agp pci dev/agp/agp_if.m optional agp pci dev/aha/aha.c optional aha dev/aha/aha_isa.c optional aha isa dev/aha/aha_mca.c optional aha mca dev/ahb/ahb.c optional ahb eisa dev/ahci/ahci.c optional ahci dev/ahci/ahciem.c optional ahci dev/ahci/ahci_pci.c optional ahci pci dev/aic/aic.c optional aic dev/aic/aic_pccard.c optional aic pccard dev/aic7xxx/ahc_eisa.c optional ahc eisa dev/aic7xxx/ahc_isa.c optional ahc isa dev/aic7xxx/ahc_pci.c optional ahc pci \ compile-with "${NORMAL_C} ${NO_WCONSTANT_CONVERSION}" dev/aic7xxx/ahd_pci.c optional ahd pci \ compile-with "${NORMAL_C} ${NO_WCONSTANT_CONVERSION}" dev/aic7xxx/aic7770.c optional ahc dev/aic7xxx/aic79xx.c optional ahd pci dev/aic7xxx/aic79xx_osm.c optional ahd pci dev/aic7xxx/aic79xx_pci.c optional ahd pci dev/aic7xxx/aic79xx_reg_print.c optional ahd pci ahd_reg_pretty_print dev/aic7xxx/aic7xxx.c optional ahc dev/aic7xxx/aic7xxx_93cx6.c optional ahc dev/aic7xxx/aic7xxx_osm.c optional ahc dev/aic7xxx/aic7xxx_pci.c optional ahc pci dev/aic7xxx/aic7xxx_reg_print.c optional ahc ahc_reg_pretty_print dev/alc/if_alc.c optional alc pci dev/ale/if_ale.c optional ale pci dev/alpm/alpm.c optional alpm pci dev/altera/avgen/altera_avgen.c optional altera_avgen dev/altera/avgen/altera_avgen_fdt.c optional altera_avgen fdt dev/altera/avgen/altera_avgen_nexus.c optional altera_avgen dev/altera/sdcard/altera_sdcard.c optional altera_sdcard dev/altera/sdcard/altera_sdcard_disk.c optional altera_sdcard dev/altera/sdcard/altera_sdcard_io.c optional altera_sdcard dev/altera/sdcard/altera_sdcard_fdt.c optional altera_sdcard fdt dev/altera/sdcard/altera_sdcard_nexus.c optional altera_sdcard dev/altera/pio/pio.c optional altera_pio dev/altera/pio/pio_if.m optional altera_pio dev/amdpm/amdpm.c optional amdpm pci | nfpm pci dev/amdsmb/amdsmb.c optional amdsmb pci dev/amr/amr.c optional amr dev/amr/amr_cam.c optional amrp amr dev/amr/amr_disk.c optional amr dev/amr/amr_linux.c optional amr compat_linux dev/amr/amr_pci.c optional amr pci dev/an/if_an.c optional an dev/an/if_an_isa.c optional an isa dev/an/if_an_pccard.c optional an pccard dev/an/if_an_pci.c optional an pci # dev/ata/ata_if.m optional ata | atacore dev/ata/ata-all.c optional ata | atacore dev/ata/ata-dma.c optional ata | atacore dev/ata/ata-lowlevel.c optional ata | atacore dev/ata/ata-sata.c optional ata | atacore dev/ata/ata-card.c optional ata pccard | atapccard dev/ata/ata-cbus.c optional ata pc98 | atapc98 dev/ata/ata-isa.c optional ata isa | ataisa dev/ata/ata-pci.c optional ata pci | atapci dev/ata/chipsets/ata-acard.c optional ata pci | ataacard dev/ata/chipsets/ata-acerlabs.c optional ata pci | ataacerlabs dev/ata/chipsets/ata-amd.c optional ata pci | ataamd dev/ata/chipsets/ata-ati.c optional ata pci | ataati dev/ata/chipsets/ata-cenatek.c optional ata pci | atacenatek dev/ata/chipsets/ata-cypress.c optional ata pci | atacypress dev/ata/chipsets/ata-cyrix.c 
optional ata pci | atacyrix dev/ata/chipsets/ata-highpoint.c optional ata pci | atahighpoint dev/ata/chipsets/ata-intel.c optional ata pci | ataintel dev/ata/chipsets/ata-ite.c optional ata pci | ataite dev/ata/chipsets/ata-jmicron.c optional ata pci | atajmicron dev/ata/chipsets/ata-marvell.c optional ata pci | atamarvell dev/ata/chipsets/ata-micron.c optional ata pci | atamicron dev/ata/chipsets/ata-national.c optional ata pci | atanational dev/ata/chipsets/ata-netcell.c optional ata pci | atanetcell dev/ata/chipsets/ata-nvidia.c optional ata pci | atanvidia dev/ata/chipsets/ata-promise.c optional ata pci | atapromise dev/ata/chipsets/ata-serverworks.c optional ata pci | ataserverworks dev/ata/chipsets/ata-siliconimage.c optional ata pci | atasiliconimage | ataati dev/ata/chipsets/ata-sis.c optional ata pci | atasis dev/ata/chipsets/ata-via.c optional ata pci | atavia # dev/ath/if_ath_pci.c optional ath_pci pci \ compile-with "${NORMAL_C} -I$S/dev/ath" # dev/ath/if_ath_ahb.c optional ath_ahb \ compile-with "${NORMAL_C} -I$S/dev/ath" # dev/ath/if_ath.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_alq.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_beacon.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_btcoex.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_debug.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_descdma.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_keycache.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_ioctl.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_led.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_lna_div.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_tx.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_tx_edma.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_tx_ht.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_tdma.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_sysctl.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_rx.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_rx_edma.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/if_ath_spectral.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ah_osdep.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" # dev/ath/ath_hal/ah.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ath_hal/ah_eeprom_v1.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ath_hal/ah_eeprom_v3.c optional ath_hal | ath_ar5211 | ath_ar5212 \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ath_hal/ah_eeprom_v14.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ath_hal/ah_eeprom_v4k.c \ optional ath_hal | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ath_hal/ah_eeprom_9287.c \ optional ath_hal | ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ath_hal/ah_regdomain.c optional ath \ compile-with "${NORMAL_C} ${NO_WSHIFT_COUNT_NEGATIVE} ${NO_WSHIFT_COUNT_OVERFLOW} -I$S/dev/ath" # ar5210 dev/ath/ath_hal/ar5210/ar5210_attach.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_beacon.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} 
-I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_interrupts.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_keycache.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_misc.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_phy.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_power.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_recv.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_reset.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5210/ar5210_xmit.c optional ath_hal | ath_ar5210 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ar5211 dev/ath/ath_hal/ar5211/ar5211_attach.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_beacon.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_interrupts.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_keycache.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_misc.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_phy.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_power.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_recv.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_reset.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5211/ar5211_xmit.c optional ath_hal | ath_ar5211 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ar5212 dev/ath/ath_hal/ar5212/ar5212_ani.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_attach.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_beacon.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_eeprom.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_gpio.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_interrupts.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ 
compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_keycache.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_misc.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_phy.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_power.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_recv.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_reset.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_rfgain.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5212_xmit.c \ optional ath_hal | ath_ar5212 | ath_ar5416 | ath_ar9160 | ath_ar9280 | \ ath_ar9285 ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ar5416 (depends on ar5212) dev/ath/ath_hal/ar5416/ar5416_ani.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_attach.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_beacon.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_btcoex.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_cal.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_cal_iq.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_cal_adcgain.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_cal_adcdc.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_eeprom.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar5416_gpio.c \ optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \ ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" 
dev/ath/ath_hal/ar5416/ar5416_interrupts.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_keycache.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_misc.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_phy.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_power.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_radar.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_recv.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_reset.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_spectral.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
dev/ath/ath_hal/ar5416/ar5416_xmit.c \
	optional ath_hal | ath_ar5416 | ath_ar9160 | ath_ar9280 | ath_ar9285 | \
	ath_ar9287 \
	compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal"
# ar9130 (depends upon ar5416) - also requires AH_SUPPORT_AR9130
#
# Since this is an embedded MAC SoC, there's no need to compile it into the
# default HAL.
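# (illustrative sketch, not part of this commit) A config for such an
# embedded SoC would therefore select the chip support directly instead of
# the full HAL, along the lines of:
#	device		ath_ar9130
#	options 	AH_SUPPORT_AR9130
# while generic kernels keep "device ath_hal" and pick up every chip listed
# via the "optional ath_hal | ..." entries.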
dev/ath/ath_hal/ar9001/ar9130_attach.c optional ath_ar9130 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9001/ar9130_phy.c optional ath_ar9130 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9001/ar9130_eeprom.c optional ath_ar9130 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ar9160 (depends on ar5416) dev/ath/ath_hal/ar9001/ar9160_attach.c optional ath_hal | ath_ar9160 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ar9280 (depends on ar5416) dev/ath/ath_hal/ar9002/ar9280_attach.c optional ath_hal | ath_ar9280 | \ ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9280_olc.c optional ath_hal | ath_ar9280 | \ ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ar9285 (depends on ar5416 and ar9280) dev/ath/ath_hal/ar9002/ar9285_attach.c optional ath_hal | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9285_btcoex.c optional ath_hal | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9285_reset.c optional ath_hal | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9285_cal.c optional ath_hal | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9285_phy.c optional ath_hal | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9285_diversity.c optional ath_hal | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ar9287 (depends on ar5416) dev/ath/ath_hal/ar9002/ar9287_attach.c optional ath_hal | ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9287_reset.c optional ath_hal | ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9287_cal.c optional ath_hal | ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9287_olc.c optional ath_hal | ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ar9300 contrib/dev/ath/ath_hal/ar9300/ar9300_ani.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_attach.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_beacon.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_eeprom.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal ${NO_WCONSTANT_CONVERSION}" contrib/dev/ath/ath_hal/ar9300/ar9300_freebsd.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_gpio.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_interrupts.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_keycache.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} 
-I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_mci.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_misc.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_paprd.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_phy.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_power.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_radar.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_radio.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_recv.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_recv_ds.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_reset.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal ${NO_WSOMETIMES_UNINITIALIZED} -Wno-unused-function" contrib/dev/ath/ath_hal/ar9300/ar9300_stub.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_stub_funcs.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_spectral.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_timer.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_xmit.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" contrib/dev/ath/ath_hal/ar9300/ar9300_xmit_ds.c optional ath_hal | ath_ar9300 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal -I$S/contrib/dev/ath/ath_hal" # rf backends dev/ath/ath_hal/ar5212/ar2316.c optional ath_rf2316 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar2317.c optional ath_rf2317 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar2413.c optional ath_hal | ath_rf2413 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar2425.c optional ath_hal | ath_rf2425 | ath_rf2417 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5111.c optional ath_hal | ath_rf5111 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5112.c optional ath_hal | ath_rf5112 \ compile-with "${NORMAL_C} -I$S/dev/ath 
-I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5212/ar5413.c optional ath_hal | ath_rf5413 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar5416/ar2133.c optional ath_hal | ath_ar5416 | \ ath_ar9130 | ath_ar9160 | ath_ar9280 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9280.c optional ath_hal | ath_ar9280 | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9285.c optional ath_hal | ath_ar9285 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" dev/ath/ath_hal/ar9002/ar9287.c optional ath_hal | ath_ar9287 \ compile-with "${NORMAL_C} -I$S/dev/ath -I$S/dev/ath/ath_hal" # ath rate control algorithms dev/ath/ath_rate/amrr/amrr.c optional ath_rate_amrr \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ath_rate/onoe/onoe.c optional ath_rate_onoe \ compile-with "${NORMAL_C} -I$S/dev/ath" dev/ath/ath_rate/sample/sample.c optional ath_rate_sample \ compile-with "${NORMAL_C} -I$S/dev/ath" # ath DFS modules dev/ath/ath_dfs/null/dfs_null.c optional ath \ compile-with "${NORMAL_C} -I$S/dev/ath" # dev/bce/if_bce.c optional bce dev/bfe/if_bfe.c optional bfe dev/bge/if_bge.c optional bge dev/bktr/bktr_audio.c optional bktr pci dev/bktr/bktr_card.c optional bktr pci dev/bktr/bktr_core.c optional bktr pci dev/bktr/bktr_i2c.c optional bktr pci smbus dev/bktr/bktr_os.c optional bktr pci dev/bktr/bktr_tuner.c optional bktr pci dev/bktr/msp34xx.c optional bktr pci dev/buslogic/bt.c optional bt dev/buslogic/bt_eisa.c optional bt eisa dev/buslogic/bt_isa.c optional bt isa dev/buslogic/bt_mca.c optional bt mca dev/buslogic/bt_pci.c optional bt pci dev/bwi/bwimac.c optional bwi dev/bwi/bwiphy.c optional bwi dev/bwi/bwirf.c optional bwi dev/bwi/if_bwi.c optional bwi dev/bwi/if_bwi_pci.c optional bwi pci # XXX Work around clang warning, until maintainer approves fix. 
dev/bwn/if_bwn.c optional bwn siba_bwn \ compile-with "${NORMAL_C} ${NO_WSOMETIMES_UNINITIALIZED}" dev/cardbus/cardbus.c optional cardbus dev/cardbus/cardbus_cis.c optional cardbus dev/cardbus/cardbus_device.c optional cardbus dev/cas/if_cas.c optional cas dev/cfi/cfi_bus_fdt.c optional cfi fdt dev/cfi/cfi_bus_nexus.c optional cfi dev/cfi/cfi_core.c optional cfi dev/cfi/cfi_dev.c optional cfi dev/cfi/cfi_disk.c optional cfid dev/ciss/ciss.c optional ciss dev/cm/smc90cx6.c optional cm dev/cmx/cmx.c optional cmx dev/cmx/cmx_pccard.c optional cmx pccard dev/cpufreq/ichss.c optional cpufreq dev/cs/if_cs.c optional cs dev/cs/if_cs_isa.c optional cs isa dev/cs/if_cs_pccard.c optional cs pccard dev/cxgb/cxgb_main.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/cxgb_sge.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_mc5.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_vsc7323.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_vsc8211.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_ael1002.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_aq100x.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_mv88e1xxx.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_xgmac.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_t3_hw.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/common/cxgb_tn1010.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/sys/uipc_mvec.c optional cxgb pci \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/cxgb_t3fw.c optional cxgb cxgb_t3fw \ compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgbe/t4_mp_ring.c optional cxgbe pci \ compile-with "${NORMAL_C} -I$S/dev/cxgbe" dev/cxgbe/t4_main.c optional cxgbe pci \ compile-with "${NORMAL_C} -I$S/dev/cxgbe" dev/cxgbe/t4_netmap.c optional cxgbe pci \ compile-with "${NORMAL_C} -I$S/dev/cxgbe" dev/cxgbe/t4_sge.c optional cxgbe pci \ compile-with "${NORMAL_C} -I$S/dev/cxgbe" dev/cxgbe/t4_l2t.c optional cxgbe pci \ compile-with "${NORMAL_C} -I$S/dev/cxgbe" dev/cxgbe/t4_tracer.c optional cxgbe pci \ compile-with "${NORMAL_C} -I$S/dev/cxgbe" dev/cxgbe/common/t4_hw.c optional cxgbe pci \ compile-with "${NORMAL_C} -I$S/dev/cxgbe" t4fw_cfg.c optional cxgbe \ compile-with "${AWK} -f $S/tools/fw_stub.awk t4fw_cfg.fw:t4fw_cfg t4fw_cfg_uwire.fw:t4fw_cfg_uwire t4fw.fw:t4fw -mt4fw_cfg -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "t4fw_cfg.c" t4fw_cfg.fwo optional cxgbe \ dependency "t4fw_cfg.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "t4fw_cfg.fwo" t4fw_cfg.fw optional cxgbe \ dependency "$S/dev/cxgbe/firmware/t4fw_cfg.txt" \ compile-with "${CP} ${.ALLSRC} ${.TARGET}" \ no-obj no-implicit-rule \ clean "t4fw_cfg.fw" t4fw_cfg_uwire.fwo optional cxgbe \ dependency "t4fw_cfg_uwire.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "t4fw_cfg_uwire.fwo" t4fw_cfg_uwire.fw optional cxgbe \ dependency "$S/dev/cxgbe/firmware/t4fw_cfg_uwire.txt" \ compile-with "${CP} ${.ALLSRC} ${.TARGET}" \ no-obj no-implicit-rule \ clean "t4fw_cfg_uwire.fw" t4fw.fwo optional cxgbe \ dependency "t4fw.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "t4fw.fwo" t4fw.fw optional cxgbe \ dependency "$S/dev/cxgbe/firmware/t4fw-1.14.4.0.bin.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule 
\ clean "t4fw.fw" t5fw_cfg.c optional cxgbe \ compile-with "${AWK} -f $S/tools/fw_stub.awk t5fw_cfg.fw:t5fw_cfg t5fw.fw:t5fw -mt5fw_cfg -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "t5fw_cfg.c" t5fw_cfg.fwo optional cxgbe \ dependency "t5fw_cfg.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "t5fw_cfg.fwo" t5fw_cfg.fw optional cxgbe \ dependency "$S/dev/cxgbe/firmware/t5fw_cfg.txt" \ compile-with "${CP} ${.ALLSRC} ${.TARGET}" \ no-obj no-implicit-rule \ clean "t5fw_cfg.fw" t5fw.fwo optional cxgbe \ dependency "t5fw.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "t5fw.fwo" t5fw.fw optional cxgbe \ dependency "$S/dev/cxgbe/firmware/t5fw-1.14.4.0.bin.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "t5fw.fw" dev/cy/cy.c optional cy dev/cy/cy_isa.c optional cy isa dev/cy/cy_pci.c optional cy pci dev/cyapa/cyapa.c optional cyapa smbus dev/dc/if_dc.c optional dc pci dev/dc/dcphy.c optional dc pci dev/dc/pnphy.c optional dc pci dev/dcons/dcons.c optional dcons dev/dcons/dcons_crom.c optional dcons_crom dev/dcons/dcons_os.c optional dcons dev/de/if_de.c optional de pci dev/digi/CX.c optional digi_CX dev/digi/CX_PCI.c optional digi_CX_PCI dev/digi/EPCX.c optional digi_EPCX dev/digi/EPCX_PCI.c optional digi_EPCX_PCI dev/digi/Xe.c optional digi_Xe dev/digi/Xem.c optional digi_Xem dev/digi/Xr.c optional digi_Xr dev/digi/digi.c optional digi dev/digi/digi_isa.c optional digi isa dev/digi/digi_pci.c optional digi pci dev/dpt/dpt_eisa.c optional dpt eisa dev/dpt/dpt_pci.c optional dpt pci dev/dpt/dpt_scsi.c optional dpt dev/drm/ati_pcigart.c optional drm dev/drm/drm_agpsupport.c optional drm dev/drm/drm_auth.c optional drm dev/drm/drm_bufs.c optional drm dev/drm/drm_context.c optional drm dev/drm/drm_dma.c optional drm dev/drm/drm_drawable.c optional drm dev/drm/drm_drv.c optional drm dev/drm/drm_fops.c optional drm dev/drm/drm_hashtab.c optional drm dev/drm/drm_ioctl.c optional drm dev/drm/drm_irq.c optional drm dev/drm/drm_lock.c optional drm dev/drm/drm_memory.c optional drm dev/drm/drm_mm.c optional drm dev/drm/drm_pci.c optional drm dev/drm/drm_scatter.c optional drm dev/drm/drm_sman.c optional drm dev/drm/drm_sysctl.c optional drm dev/drm/drm_vm.c optional drm dev/drm/i915_dma.c optional i915drm dev/drm/i915_drv.c optional i915drm dev/drm/i915_irq.c optional i915drm dev/drm/i915_mem.c optional i915drm dev/drm/i915_suspend.c optional i915drm dev/drm/mach64_dma.c optional mach64drm dev/drm/mach64_drv.c optional mach64drm dev/drm/mach64_irq.c optional mach64drm dev/drm/mach64_state.c optional mach64drm dev/drm/mga_dma.c optional mgadrm dev/drm/mga_drv.c optional mgadrm dev/drm/mga_irq.c optional mgadrm dev/drm/mga_state.c optional mgadrm dev/drm/mga_warp.c optional mgadrm dev/drm/r128_cce.c optional r128drm \ compile-with "${NORMAL_C} ${NO_WCONSTANT_CONVERSION}" dev/drm/r128_drv.c optional r128drm dev/drm/r128_irq.c optional r128drm dev/drm/r128_state.c optional r128drm dev/drm/r300_cmdbuf.c optional radeondrm dev/drm/r600_blit.c optional radeondrm dev/drm/r600_cp.c optional radeondrm \ compile-with "${NORMAL_C} ${NO_WCONSTANT_CONVERSION}" dev/drm/radeon_cp.c optional radeondrm \ compile-with "${NORMAL_C} ${NO_WCONSTANT_CONVERSION}" dev/drm/radeon_cs.c optional radeondrm dev/drm/radeon_drv.c optional radeondrm dev/drm/radeon_irq.c optional radeondrm dev/drm/radeon_mem.c optional radeondrm dev/drm/radeon_state.c optional radeondrm dev/drm/savage_bci.c optional savagedrm dev/drm/savage_drv.c optional savagedrm dev/drm/savage_state.c 
optional savagedrm dev/drm/sis_drv.c optional sisdrm dev/drm/sis_ds.c optional sisdrm dev/drm/sis_mm.c optional sisdrm dev/drm/tdfx_drv.c optional tdfxdrm dev/drm/via_dma.c optional viadrm dev/drm/via_dmablit.c optional viadrm dev/drm/via_drv.c optional viadrm dev/drm/via_irq.c optional viadrm dev/drm/via_map.c optional viadrm dev/drm/via_mm.c optional viadrm dev/drm/via_verifier.c optional viadrm dev/drm/via_video.c optional viadrm dev/ed/if_ed.c optional ed dev/ed/if_ed_novell.c optional ed dev/ed/if_ed_rtl80x9.c optional ed dev/ed/if_ed_pccard.c optional ed pccard dev/ed/if_ed_pci.c optional ed pci dev/eisa/eisa_if.m standard dev/eisa/eisaconf.c optional eisa dev/e1000/if_em.c optional em \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/if_lem.c optional em \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/if_igb.c optional igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_80003es2lan.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_82540.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_82541.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_82542.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_82543.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_82571.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_82575.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_ich8lan.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_i210.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_api.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_mac.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_manage.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_nvm.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_phy.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_vf.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_mbx.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/e1000/e1000_osdep.c optional em | igb \ compile-with "${NORMAL_C} -I$S/dev/e1000" dev/et/if_et.c optional et dev/en/if_en_pci.c optional en pci dev/en/midway.c optional en dev/ep/if_ep.c optional ep dev/ep/if_ep_eisa.c optional ep eisa dev/ep/if_ep_isa.c optional ep isa dev/ep/if_ep_mca.c optional ep mca dev/ep/if_ep_pccard.c optional ep pccard dev/esp/esp_pci.c optional esp pci dev/esp/ncr53c9x.c optional esp dev/etherswitch/arswitch/arswitch.c optional arswitch dev/etherswitch/arswitch/arswitch_reg.c optional arswitch dev/etherswitch/arswitch/arswitch_phy.c optional arswitch dev/etherswitch/arswitch/arswitch_8216.c optional arswitch dev/etherswitch/arswitch/arswitch_8226.c optional arswitch dev/etherswitch/arswitch/arswitch_8316.c optional arswitch dev/etherswitch/arswitch/arswitch_8327.c optional arswitch dev/etherswitch/arswitch/arswitch_7240.c optional arswitch dev/etherswitch/arswitch/arswitch_9340.c optional arswitch dev/etherswitch/arswitch/arswitch_vlans.c optional arswitch dev/etherswitch/etherswitch.c optional etherswitch dev/etherswitch/etherswitch_if.m optional etherswitch dev/etherswitch/ip17x/ip17x.c optional ip17x dev/etherswitch/ip17x/ip175c.c optional ip17x dev/etherswitch/ip17x/ip175d.c optional ip17x dev/etherswitch/ip17x/ip17x_phy.c optional 
ip17x
dev/etherswitch/ip17x/ip17x_vlans.c	optional ip17x
dev/etherswitch/miiproxy.c	optional miiproxy
dev/etherswitch/rtl8366/rtl8366rb.c	optional rtl8366rb
dev/etherswitch/ukswitch/ukswitch.c	optional ukswitch
dev/ex/if_ex.c	optional ex
dev/ex/if_ex_isa.c	optional ex isa
dev/ex/if_ex_pccard.c	optional ex pccard
dev/exca/exca.c	optional cbb
+dev/extres/clk/clk.c	optional ext_resources clk
+dev/extres/clk/clkdev_if.m	optional ext_resources clk
+dev/extres/clk/clknode_if.m	optional ext_resources clk
+dev/extres/clk/clk_div.c	optional ext_resources clk
+dev/extres/clk/clk_fixed.c	optional ext_resources clk
+dev/extres/clk/clk_gate.c	optional ext_resources clk
+dev/extres/clk/clk_mux.c	optional ext_resources clk
+dev/extres/hwreset/hwreset.c	optional ext_resources hwreset
+dev/extres/hwreset/hwreset_if.m	optional ext_resources hwreset
dev/fatm/if_fatm.c	optional fatm pci
dev/fb/fbd.c	optional fbd | vt
dev/fb/fb_if.m	standard
dev/fb/splash.c	optional sc splash
dev/fdt/fdt_clock.c	optional fdt fdt_clock
dev/fdt/fdt_clock_if.m	optional fdt fdt_clock
dev/fdt/fdt_common.c	optional fdt
dev/fdt/fdt_pinctrl.c	optional fdt fdt_pinctrl
dev/fdt/fdt_pinctrl_if.m	optional fdt fdt_pinctrl
-dev/fdt/fdt_slicer.c	optional fdt cfi | fdt nand
+dev/fdt/fdt_slicer.c	optional fdt cfi | fdt nand | fdt mx25l
dev/fdt/fdt_static_dtb.S	optional fdt fdt_dtb_static \
	dependency "$S/boot/fdt/dts/${MACHINE}/${FDT_DTS_FILE}"
dev/fdt/simplebus.c	optional fdt
dev/fe/if_fe.c	optional fe
dev/fe/if_fe_pccard.c	optional fe pccard
dev/filemon/filemon.c	optional filemon
dev/firewire/firewire.c	optional firewire
dev/firewire/fwcrom.c	optional firewire
dev/firewire/fwdev.c	optional firewire
dev/firewire/fwdma.c	optional firewire
dev/firewire/fwmem.c	optional firewire
dev/firewire/fwohci.c	optional firewire
dev/firewire/fwohci_pci.c	optional firewire pci
dev/firewire/if_fwe.c	optional fwe
dev/firewire/if_fwip.c	optional fwip
dev/firewire/sbp.c	optional sbp
dev/firewire/sbp_targ.c	optional sbp_targ
dev/flash/at45d.c	optional at45d
dev/flash/mx25l.c	optional mx25l
dev/fxp/if_fxp.c	optional fxp
dev/fxp/inphy.c	optional fxp
dev/gem/if_gem.c	optional gem
dev/gem/if_gem_pci.c	optional gem pci
dev/gem/if_gem_sbus.c	optional gem sbus
dev/gpio/gpiobacklight.c	optional gpiobacklight fdt
dev/gpio/gpiobus.c	optional gpio \
	dependency "gpiobus_if.h"
dev/gpio/gpioc.c	optional gpio \
	dependency "gpio_if.h"
dev/gpio/gpioiic.c	optional gpioiic
dev/gpio/gpioled.c	optional gpioled
dev/gpio/gpio_if.m	optional gpio
dev/gpio/gpiobus_if.m	optional gpio
dev/gpio/ofw_gpiobus.c	optional fdt gpio
dev/hatm/if_hatm.c	optional hatm pci
dev/hatm/if_hatm_intr.c	optional hatm pci
dev/hatm/if_hatm_ioctl.c	optional hatm pci
dev/hatm/if_hatm_rx.c	optional hatm pci
dev/hatm/if_hatm_tx.c	optional hatm pci
dev/hifn/hifn7751.c	optional hifn
dev/hme/if_hme.c	optional hme
dev/hme/if_hme_pci.c	optional hme pci
dev/hme/if_hme_sbus.c	optional hme sbus
dev/hptiop/hptiop.c	optional hptiop scbus
dev/hwpmc/hwpmc_logging.c	optional hwpmc
dev/hwpmc/hwpmc_mod.c	optional hwpmc
dev/hwpmc/hwpmc_soft.c	optional hwpmc
dev/ichiic/ig4_iic.c	optional ig4 smbus
dev/ichiic/ig4_pci.c	optional ig4 pci smbus
dev/ichsmb/ichsmb.c	optional ichsmb
dev/ichsmb/ichsmb_pci.c	optional ichsmb pci
dev/ida/ida.c	optional ida
dev/ida/ida_disk.c	optional ida
dev/ida/ida_eisa.c	optional ida eisa
dev/ida/ida_pci.c	optional ida pci
dev/ie/if_ie.c	optional ie isa nowerror
dev/ie/if_ie_isa.c	optional ie isa
dev/iicbus/ad7418.c	optional ad7418
dev/iicbus/ds1307.c	optional ds1307
dev/iicbus/ds133x.c	optional ds133x
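# (illustrative sketch, not part of this commit) The dev/extres entries
# added above hang off the "ext_resources" umbrella token plus one token
# per framework, so a kernel config wanting the new clock and reset
# frameworks would add something like:
#	options 	EXT_RESOURCES
#	device		clk
#	device		hwreset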
dev/iicbus/ds1374.c optional ds1374 dev/iicbus/ds1672.c optional ds1672 dev/iicbus/ds3231.c optional ds3231 dev/iicbus/icee.c optional icee dev/iicbus/if_ic.c optional ic dev/iicbus/iic.c optional iic dev/iicbus/iicbb.c optional iicbb dev/iicbus/iicbb_if.m optional iicbb dev/iicbus/iicbus.c optional iicbus dev/iicbus/iicbus_if.m optional iicbus dev/iicbus/iiconf.c optional iicbus dev/iicbus/iicsmb.c optional iicsmb \ dependency "iicbus_if.h" dev/iicbus/iicoc.c optional iicoc dev/iicbus/lm75.c optional lm75 dev/iicbus/pcf8563.c optional pcf8563 dev/iicbus/s35390a.c optional s35390a dev/iir/iir.c optional iir dev/iir/iir_ctrl.c optional iir dev/iir/iir_pci.c optional iir pci dev/intpm/intpm.c optional intpm pci # XXX Work around clang warning, until maintainer approves fix. dev/ips/ips.c optional ips \ compile-with "${NORMAL_C} ${NO_WSOMETIMES_UNINITIALIZED}" dev/ips/ips_commands.c optional ips dev/ips/ips_disk.c optional ips dev/ips/ips_ioctl.c optional ips dev/ips/ips_pci.c optional ips pci dev/ipw/if_ipw.c optional ipw ipwbssfw.c optional ipwbssfw | ipwfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk ipw_bss.fw:ipw_bss:130 -lintel_ipw -mipw_bss -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "ipwbssfw.c" ipw_bss.fwo optional ipwbssfw | ipwfw \ dependency "ipw_bss.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "ipw_bss.fwo" ipw_bss.fw optional ipwbssfw | ipwfw \ dependency "$S/contrib/dev/ipw/ipw2100-1.3.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "ipw_bss.fw" ipwibssfw.c optional ipwibssfw | ipwfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk ipw_ibss.fw:ipw_ibss:130 -lintel_ipw -mipw_ibss -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "ipwibssfw.c" ipw_ibss.fwo optional ipwibssfw | ipwfw \ dependency "ipw_ibss.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "ipw_ibss.fwo" ipw_ibss.fw optional ipwibssfw | ipwfw \ dependency "$S/contrib/dev/ipw/ipw2100-1.3-i.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "ipw_ibss.fw" ipwmonitorfw.c optional ipwmonitorfw | ipwfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk ipw_monitor.fw:ipw_monitor:130 -lintel_ipw -mipw_monitor -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "ipwmonitorfw.c" ipw_monitor.fwo optional ipwmonitorfw | ipwfw \ dependency "ipw_monitor.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "ipw_monitor.fwo" ipw_monitor.fw optional ipwmonitorfw | ipwfw \ dependency "$S/contrib/dev/ipw/ipw2100-1.3-p.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "ipw_monitor.fw" dev/iscsi/icl.c optional iscsi | ctl dev/iscsi/icl_conn_if.m optional iscsi | ctl dev/iscsi/icl_proxy.c optional iscsi | ctl dev/iscsi/icl_soft.c optional iscsi | ctl dev/iscsi/iscsi.c optional iscsi scbus dev/iscsi_initiator/iscsi.c optional iscsi_initiator scbus dev/iscsi_initiator/iscsi_subr.c optional iscsi_initiator scbus dev/iscsi_initiator/isc_cam.c optional iscsi_initiator scbus dev/iscsi_initiator/isc_soc.c optional iscsi_initiator scbus dev/iscsi_initiator/isc_sm.c optional iscsi_initiator scbus dev/iscsi_initiator/isc_subr.c optional iscsi_initiator scbus dev/ismt/ismt.c optional ismt dev/isl/isl.c optional isl smbus dev/isp/isp.c optional isp dev/isp/isp_freebsd.c optional isp dev/isp/isp_library.c optional isp dev/isp/isp_pci.c optional isp pci dev/isp/isp_sbus.c optional isp sbus dev/isp/isp_target.c optional isp dev/ispfw/ispfw.c optional ispfw dev/iwi/if_iwi.c optional iwi iwibssfw.c optional iwibssfw | 
iwifw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwi_bss.fw:iwi_bss:300 -lintel_iwi -miwi_bss -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwibssfw.c" iwi_bss.fwo optional iwibssfw | iwifw \ dependency "iwi_bss.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwi_bss.fwo" iwi_bss.fw optional iwibssfw | iwifw \ dependency "$S/contrib/dev/iwi/ipw2200-bss.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwi_bss.fw" iwiibssfw.c optional iwiibssfw | iwifw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwi_ibss.fw:iwi_ibss:300 -lintel_iwi -miwi_ibss -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwiibssfw.c" iwi_ibss.fwo optional iwiibssfw | iwifw \ dependency "iwi_ibss.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwi_ibss.fwo" iwi_ibss.fw optional iwiibssfw | iwifw \ dependency "$S/contrib/dev/iwi/ipw2200-ibss.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwi_ibss.fw" iwimonitorfw.c optional iwimonitorfw | iwifw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwi_monitor.fw:iwi_monitor:300 -lintel_iwi -miwi_monitor -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwimonitorfw.c" iwi_monitor.fwo optional iwimonitorfw | iwifw \ dependency "iwi_monitor.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwi_monitor.fwo" iwi_monitor.fw optional iwimonitorfw | iwifw \ dependency "$S/contrib/dev/iwi/ipw2200-sniffer.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwi_monitor.fw" dev/iwm/if_iwm.c optional iwm dev/iwm/if_iwm_binding.c optional iwm dev/iwm/if_iwm_mac_ctxt.c optional iwm dev/iwm/if_iwm_pcie_trans.c optional iwm dev/iwm/if_iwm_phy_ctxt.c optional iwm dev/iwm/if_iwm_phy_db.c optional iwm dev/iwm/if_iwm_power.c optional iwm dev/iwm/if_iwm_scan.c optional iwm dev/iwm/if_iwm_time_event.c optional iwm dev/iwm/if_iwm_util.c optional iwm iwm3160fw.c optional iwm3160fw | iwmfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwm3160.fw:iwm3160fw -miwm3160fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwm3160fw.c" iwm3160fw.fwo optional iwm3160fw | iwmfw \ dependency "iwm3160.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwm3160fw.fwo" iwm3160.fw optional iwm3160fw | iwmfw \ dependency "$S/contrib/dev/iwm/iwm-3160-9.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwm3160.fw" iwm7260fw.c optional iwm7260fw | iwmfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwm7260.fw:iwm7260fw -miwm7260fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwm7260fw.c" iwm7260fw.fwo optional iwm7260fw | iwmfw \ dependency "iwm7260.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwm7260fw.fwo" iwm7260.fw optional iwm7260fw | iwmfw \ dependency "$S/contrib/dev/iwm/iwm-7260-9.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwm7260.fw" iwm7265fw.c optional iwm7265fw | iwmfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwm7265.fw:iwm7265fw -miwm7265fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwm7265fw.c" iwm7265fw.fwo optional iwm7265fw | iwmfw \ dependency "iwm7265.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwm7265fw.fwo" iwm7265.fw optional iwm7265fw | iwmfw \ dependency "$S/contrib/dev/iwm/iwm-7265-9.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwm7265.fw" dev/iwn/if_iwn.c optional iwn iwn1000fw.c optional iwn1000fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk 
iwn1000.fw:iwn1000fw -miwn1000fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn1000fw.c" iwn1000fw.fwo optional iwn1000fw | iwnfw \ dependency "iwn1000.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn1000fw.fwo" iwn1000.fw optional iwn1000fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-1000-39.31.5.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn1000.fw" iwn100fw.c optional iwn100fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn100.fw:iwn100fw -miwn100fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn100fw.c" iwn100fw.fwo optional iwn100fw | iwnfw \ dependency "iwn100.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn100fw.fwo" iwn100.fw optional iwn100fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-100-39.31.5.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn100.fw" iwn105fw.c optional iwn105fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn105.fw:iwn105fw -miwn105fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn105fw.c" iwn105fw.fwo optional iwn105fw | iwnfw \ dependency "iwn105.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn105fw.fwo" iwn105.fw optional iwn105fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-105-6-18.168.6.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn105.fw" iwn135fw.c optional iwn135fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn135.fw:iwn135fw -miwn135fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn135fw.c" iwn135fw.fwo optional iwn135fw | iwnfw \ dependency "iwn135.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn135fw.fwo" iwn135.fw optional iwn135fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-135-6-18.168.6.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn135.fw" iwn2000fw.c optional iwn2000fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn2000.fw:iwn2000fw -miwn2000fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn2000fw.c" iwn2000fw.fwo optional iwn2000fw | iwnfw \ dependency "iwn2000.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn2000fw.fwo" iwn2000.fw optional iwn2000fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-2000-18.168.6.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn2000.fw" iwn2030fw.c optional iwn2030fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn2030.fw:iwn2030fw -miwn2030fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn2030fw.c" iwn2030fw.fwo optional iwn2030fw | iwnfw \ dependency "iwn2030.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn2030fw.fwo" iwn2030.fw optional iwn2030fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwnwifi-2030-18.168.6.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn2030.fw" iwn4965fw.c optional iwn4965fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn4965.fw:iwn4965fw -miwn4965fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn4965fw.c" iwn4965fw.fwo optional iwn4965fw | iwnfw \ dependency "iwn4965.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn4965fw.fwo" iwn4965.fw optional iwn4965fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-4965-228.61.2.24.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn4965.fw" iwn5000fw.c optional iwn5000fw | iwnfw \ compile-with "${AWK} -f 
$S/tools/fw_stub.awk iwn5000.fw:iwn5000fw -miwn5000fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn5000fw.c" iwn5000fw.fwo optional iwn5000fw | iwnfw \ dependency "iwn5000.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn5000fw.fwo" iwn5000.fw optional iwn5000fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-5000-8.83.5.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn5000.fw" iwn5150fw.c optional iwn5150fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn5150.fw:iwn5150fw -miwn5150fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn5150fw.c" iwn5150fw.fwo optional iwn5150fw | iwnfw \ dependency "iwn5150.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn5150fw.fwo" iwn5150.fw optional iwn5150fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-5150-8.24.2.2.fw.uu"\ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn5150.fw" iwn6000fw.c optional iwn6000fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn6000.fw:iwn6000fw -miwn6000fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn6000fw.c" iwn6000fw.fwo optional iwn6000fw | iwnfw \ dependency "iwn6000.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn6000fw.fwo" iwn6000.fw optional iwn6000fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-6000-9.221.4.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn6000.fw" iwn6000g2afw.c optional iwn6000g2afw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn6000g2a.fw:iwn6000g2afw -miwn6000g2afw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn6000g2afw.c" iwn6000g2afw.fwo optional iwn6000g2afw | iwnfw \ dependency "iwn6000g2a.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn6000g2afw.fwo" iwn6000g2a.fw optional iwn6000g2afw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-6000g2a-18.168.6.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn6000g2a.fw" iwn6000g2bfw.c optional iwn6000g2bfw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn6000g2b.fw:iwn6000g2bfw -miwn6000g2bfw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn6000g2bfw.c" iwn6000g2bfw.fwo optional iwn6000g2bfw | iwnfw \ dependency "iwn6000g2b.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn6000g2bfw.fwo" iwn6000g2b.fw optional iwn6000g2bfw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-6000g2b-18.168.6.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn6000g2b.fw" iwn6050fw.c optional iwn6050fw | iwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk iwn6050.fw:iwn6050fw -miwn6050fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "iwn6050fw.c" iwn6050fw.fwo optional iwn6050fw | iwnfw \ dependency "iwn6050.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "iwn6050fw.fwo" iwn6050.fw optional iwn6050fw | iwnfw \ dependency "$S/contrib/dev/iwn/iwlwifi-6050-41.28.5.1.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "iwn6050.fw" dev/ixgb/if_ixgb.c optional ixgb dev/ixgb/ixgb_ee.c optional ixgb dev/ixgb/ixgb_hw.c optional ixgb dev/ixgbe/if_ix.c optional ix inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe -DSMP" dev/ixgbe/if_ixv.c optional ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe -DSMP" dev/ixgbe/ix_txrx.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_osdep.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" 
dev/ixgbe/ixgbe_phy.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_api.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_common.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_mbx.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_vf.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_82598.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_82599.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_x540.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_x550.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_dcb.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_dcb_82598.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/ixgbe/ixgbe_dcb_82599.c optional ix inet | ixv inet \ compile-with "${NORMAL_C} -I$S/dev/ixgbe" dev/jme/if_jme.c optional jme pci dev/joy/joy.c optional joy dev/joy/joy_isa.c optional joy isa dev/kbd/kbd.c optional atkbd | pckbd | sc | ukbd | vt dev/kbdmux/kbdmux.c optional kbdmux dev/ksyms/ksyms.c optional ksyms dev/le/am7990.c optional le dev/le/am79900.c optional le dev/le/if_le_pci.c optional le pci dev/le/lance.c optional le dev/led/led.c standard dev/lge/if_lge.c optional lge dev/lmc/if_lmc.c optional lmc dev/malo/if_malo.c optional malo dev/malo/if_malohal.c optional malo dev/malo/if_malo_pci.c optional malo pci dev/mc146818/mc146818.c optional mc146818 dev/mca/mca_bus.c optional mca dev/mcd/mcd.c optional mcd isa nowerror dev/mcd/mcd_isa.c optional mcd isa nowerror dev/md/md.c optional md dev/mdio/mdio_if.m optional miiproxy | mdio dev/mdio/mdio.c optional miiproxy | mdio dev/mem/memdev.c optional mem dev/mem/memutil.c optional mem dev/mfi/mfi.c optional mfi dev/mfi/mfi_debug.c optional mfi dev/mfi/mfi_pci.c optional mfi pci dev/mfi/mfi_disk.c optional mfi dev/mfi/mfi_syspd.c optional mfi dev/mfi/mfi_tbolt.c optional mfi dev/mfi/mfi_linux.c optional mfi compat_linux dev/mfi/mfi_cam.c optional mfip scbus dev/mii/acphy.c optional miibus | acphy dev/mii/amphy.c optional miibus | amphy dev/mii/atphy.c optional miibus | atphy dev/mii/axphy.c optional miibus | axphy dev/mii/bmtphy.c optional miibus | bmtphy dev/mii/brgphy.c optional miibus | brgphy dev/mii/ciphy.c optional miibus | ciphy dev/mii/e1000phy.c optional miibus | e1000phy dev/mii/gentbi.c optional miibus | gentbi dev/mii/icsphy.c optional miibus | icsphy dev/mii/ip1000phy.c optional miibus | ip1000phy dev/mii/jmphy.c optional miibus | jmphy dev/mii/lxtphy.c optional miibus | lxtphy dev/mii/mii.c optional miibus | mii dev/mii/mii_bitbang.c optional miibus | mii_bitbang dev/mii/mii_physubr.c optional miibus | mii dev/mii/miibus_if.m optional miibus | mii dev/mii/mlphy.c optional miibus | mlphy dev/mii/nsgphy.c optional miibus | nsgphy dev/mii/nsphy.c optional miibus | nsphy dev/mii/nsphyter.c optional miibus | nsphyter dev/mii/pnaphy.c optional miibus | pnaphy dev/mii/qsphy.c optional miibus | qsphy dev/mii/rdcphy.c optional miibus | rdcphy dev/mii/rgephy.c optional miibus | rgephy dev/mii/rlphy.c optional miibus | rlphy dev/mii/rlswitch.c optional rlswitch dev/mii/smcphy.c optional miibus | smcphy dev/mii/smscphy.c optional miibus | smscphy dev/mii/tdkphy.c optional miibus | tdkphy 
dev/mii/tlphy.c optional miibus | tlphy dev/mii/truephy.c optional miibus | truephy dev/mii/ukphy.c optional miibus | mii dev/mii/ukphy_subr.c optional miibus | mii dev/mii/xmphy.c optional miibus | xmphy dev/mk48txx/mk48txx.c optional mk48txx dev/mlx/mlx.c optional mlx dev/mlx/mlx_disk.c optional mlx dev/mlx/mlx_pci.c optional mlx pci dev/mly/mly.c optional mly dev/mmc/mmc.c optional mmc dev/mmc/mmcbr_if.m standard dev/mmc/mmcbus_if.m standard dev/mmc/mmcsd.c optional mmcsd dev/mn/if_mn.c optional mn pci dev/mpr/mpr.c optional mpr dev/mpr/mpr_config.c optional mpr # XXX Work around clang warning, until maintainer approves fix. dev/mpr/mpr_mapping.c optional mpr \ compile-with "${NORMAL_C} ${NO_WSOMETIMES_UNINITIALIZED}" dev/mpr/mpr_pci.c optional mpr pci dev/mpr/mpr_sas.c optional mpr \ compile-with "${NORMAL_C} ${NO_WUNNEEDED_INTERNAL_DECL}" dev/mpr/mpr_sas_lsi.c optional mpr dev/mpr/mpr_table.c optional mpr dev/mpr/mpr_user.c optional mpr dev/mps/mps.c optional mps dev/mps/mps_config.c optional mps # XXX Work around clang warning, until maintainer approves fix. dev/mps/mps_mapping.c optional mps \ compile-with "${NORMAL_C} ${NO_WSOMETIMES_UNINITIALIZED}" dev/mps/mps_pci.c optional mps pci dev/mps/mps_sas.c optional mps \ compile-with "${NORMAL_C} ${NO_WUNNEEDED_INTERNAL_DECL}" dev/mps/mps_sas_lsi.c optional mps dev/mps/mps_table.c optional mps dev/mps/mps_user.c optional mps dev/mpt/mpt.c optional mpt dev/mpt/mpt_cam.c optional mpt dev/mpt/mpt_debug.c optional mpt dev/mpt/mpt_pci.c optional mpt pci dev/mpt/mpt_raid.c optional mpt dev/mpt/mpt_user.c optional mpt dev/mrsas/mrsas.c optional mrsas dev/mrsas/mrsas_cam.c optional mrsas dev/mrsas/mrsas_ioctl.c optional mrsas dev/mrsas/mrsas_fp.c optional mrsas dev/msk/if_msk.c optional msk dev/mvs/mvs.c optional mvs dev/mvs/mvs_if.m optional mvs dev/mvs/mvs_pci.c optional mvs pci dev/mwl/if_mwl.c optional mwl dev/mwl/if_mwl_pci.c optional mwl pci dev/mwl/mwlhal.c optional mwl mwlfw.c optional mwlfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk mw88W8363.fw:mw88W8363fw mwlboot.fw:mwlboot -mmwl -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "mwlfw.c" mw88W8363.fwo optional mwlfw \ dependency "mw88W8363.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "mw88W8363.fwo" mw88W8363.fw optional mwlfw \ dependency "$S/contrib/dev/mwl/mw88W8363.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "mw88W8363.fw" mwlboot.fwo optional mwlfw \ dependency "mwlboot.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "mwlboot.fwo" mwlboot.fw optional mwlfw \ dependency "$S/contrib/dev/mwl/mwlboot.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "mwlboot.fw" dev/mxge/if_mxge.c optional mxge pci dev/mxge/mxge_eth_z8e.c optional mxge pci dev/mxge/mxge_ethp_z8e.c optional mxge pci dev/mxge/mxge_rss_eth_z8e.c optional mxge pci dev/mxge/mxge_rss_ethp_z8e.c optional mxge pci dev/my/if_my.c optional my dev/nand/nand.c optional nand dev/nand/nand_bbt.c optional nand dev/nand/nand_cdev.c optional nand dev/nand/nand_generic.c optional nand dev/nand/nand_geom.c optional nand dev/nand/nand_id.c optional nand dev/nand/nandbus.c optional nand dev/nand/nandbus_if.m optional nand dev/nand/nand_if.m optional nand dev/nand/nandsim.c optional nandsim nand dev/nand/nandsim_chip.c optional nandsim nand dev/nand/nandsim_ctrl.c optional nandsim nand dev/nand/nandsim_log.c optional nandsim nand dev/nand/nandsim_swap.c optional nandsim nand dev/nand/nfc_if.m optional nand dev/ncr/ncr.c optional ncr pci 
dev/ncv/ncr53c500.c optional ncv dev/ncv/ncr53c500_pccard.c optional ncv pccard dev/netmap/netmap.c optional netmap dev/netmap/netmap_freebsd.c optional netmap dev/netmap/netmap_generic.c optional netmap dev/netmap/netmap_mbq.c optional netmap dev/netmap/netmap_mem2.c optional netmap dev/netmap/netmap_monitor.c optional netmap dev/netmap/netmap_offloadings.c optional netmap dev/netmap/netmap_pipe.c optional netmap dev/netmap/netmap_vale.c optional netmap # compile-with "${NORMAL_C} -Wconversion -Wextra" dev/nfsmb/nfsmb.c optional nfsmb pci dev/nge/if_nge.c optional nge dev/nxge/if_nxge.c optional nxge \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN}" dev/nxge/xgehal/xgehal-device.c optional nxge \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN}" dev/nxge/xgehal/xgehal-mm.c optional nxge dev/nxge/xgehal/xge-queue.c optional nxge dev/nxge/xgehal/xgehal-driver.c optional nxge \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN}" dev/nxge/xgehal/xgehal-ring.c optional nxge \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN}" dev/nxge/xgehal/xgehal-channel.c optional nxge \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN}" dev/nxge/xgehal/xgehal-fifo.c optional nxge \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN}" dev/nxge/xgehal/xgehal-stats.c optional nxge \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN}" dev/nxge/xgehal/xgehal-config.c optional nxge dev/nxge/xgehal/xgehal-mgmt.c optional nxge \ compile-with "${NORMAL_C} ${NO_WSELF_ASSIGN}" dev/nmdm/nmdm.c optional nmdm dev/nsp/nsp.c optional nsp dev/nsp/nsp_pccard.c optional nsp pccard dev/null/null.c standard dev/oce/oce_hw.c optional oce pci dev/oce/oce_if.c optional oce pci dev/oce/oce_mbox.c optional oce pci dev/oce/oce_queue.c optional oce pci dev/oce/oce_sysctl.c optional oce pci dev/oce/oce_util.c optional oce pci dev/ofw/ofw_bus_if.m optional fdt dev/ofw/ofw_bus_subr.c optional fdt dev/ofw/ofw_fdt.c optional fdt dev/ofw/ofw_if.m optional fdt dev/ofw/ofw_iicbus.c optional fdt iicbus dev/ofw/ofw_subr.c optional fdt dev/ofw/ofwbus.c optional fdt dev/ofw/openfirm.c optional fdt dev/ofw/openfirmio.c optional fdt dev/ow/ow.c optional ow \ dependency "owll_if.h" \ dependency "own_if.h" dev/ow/owll_if.m optional ow dev/ow/own_if.m optional ow dev/ow/ow_temp.c optional ow_temp dev/ow/owc_gpiobus.c optional owc gpio dev/patm/if_patm.c optional patm pci dev/patm/if_patm_attach.c optional patm pci dev/patm/if_patm_intr.c optional patm pci dev/patm/if_patm_ioctl.c optional patm pci dev/patm/if_patm_rtables.c optional patm pci dev/patm/if_patm_rx.c optional patm pci dev/patm/if_patm_tx.c optional patm pci dev/pbio/pbio.c optional pbio isa dev/pccard/card_if.m standard dev/pccard/pccard.c optional pccard dev/pccard/pccard_cis.c optional pccard dev/pccard/pccard_cis_quirks.c optional pccard dev/pccard/pccard_device.c optional pccard dev/pccard/power_if.m standard dev/pccbb/pccbb.c optional cbb dev/pccbb/pccbb_isa.c optional cbb isa dev/pccbb/pccbb_pci.c optional cbb pci dev/pcf/pcf.c optional pcf dev/pci/eisa_pci.c optional pci eisa dev/pci/fixup_pci.c optional pci dev/pci/hostb_pci.c optional pci dev/pci/ignore_pci.c optional pci dev/pci/isa_pci.c optional pci isa dev/pci/pci.c optional pci dev/pci/pci_if.m standard dev/pci/pci_iov.c optional pci pci_iov dev/pci/pci_iov_if.m standard dev/pci/pci_iov_schema.c optional pci pci_iov dev/pci/pci_pci.c optional pci dev/pci/pci_subr.c optional pci dev/pci/pci_user.c optional pci dev/pci/pcib_if.m standard dev/pci/pcib_support.c standard dev/pci/vga_pci.c optional pci dev/pcn/if_pcn.c optional pcn pci dev/pdq/if_fea.c optional 
fea eisa dev/pdq/if_fpa.c optional fpa pci dev/pdq/pdq.c optional nowerror fea eisa | fpa pci dev/pdq/pdq_ifsubr.c optional nowerror fea eisa | fpa pci dev/pms/freebsd/driver/ini/src/agtiapi.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sadisc.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/mpi.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/saframe.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sahw.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sainit.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/saint.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sampicmd.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sampirsp.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/saphy.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/saport.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sasata.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sasmp.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sassp.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/satimer.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/sautil.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/saioctlcmd.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sallsdk/spc/mpidebug.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/discovery/dm/dminit.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/discovery/dm/dmsmp.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/discovery/dm/dmdisc.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/discovery/dm/dmport.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/discovery/dm/dmtimer.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/discovery/dm/dmmisc.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sat/src/sminit.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sat/src/smmisc.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sat/src/smsat.c optional pmspcv \ compile-with 
"${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sat/src/smsatcb.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sat/src/smsathw.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/sat/src/smtimer.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdinit.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdmisc.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdesgl.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdport.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdint.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdioctl.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdhw.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/ossacmnapi.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tddmcmnapi.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdsmcmnapi.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/common/tdtimers.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/sas/ini/itdio.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/sas/ini/itdcb.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/sas/ini/itdinit.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/sas/ini/itddisc.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/sata/host/sat.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/sata/host/ossasat.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/pms/RefTisa/tisa/sassata/sata/host/sathw.c optional pmspcv \ compile-with "${NORMAL_C} -Wunused-variable -Woverflow -Wparentheses -w" dev/ppbus/if_plip.c optional plip dev/ppbus/immio.c optional vpo dev/ppbus/lpbb.c optional lpbb dev/ppbus/lpt.c optional lpt dev/ppbus/pcfclock.c optional pcfclock dev/ppbus/ppb_1284.c optional ppbus dev/ppbus/ppb_base.c optional ppbus dev/ppbus/ppb_msq.c optional ppbus dev/ppbus/ppbconf.c optional ppbus dev/ppbus/ppbus_if.m optional ppbus dev/ppbus/ppi.c optional ppi dev/ppbus/pps.c optional pps dev/ppbus/vpo.c optional vpo dev/ppbus/vpoio.c optional vpo dev/ppc/ppc.c optional ppc dev/ppc/ppc_acpi.c optional ppc acpi dev/ppc/ppc_isa.c optional ppc isa dev/ppc/ppc_pci.c optional ppc pci dev/ppc/ppc_puc.c optional ppc puc 
dev/proto/proto_bus_isa.c optional proto acpi | proto isa dev/proto/proto_bus_pci.c optional proto pci dev/proto/proto_busdma.c optional proto dev/proto/proto_core.c optional proto dev/pst/pst-iop.c optional pst dev/pst/pst-pci.c optional pst pci dev/pst/pst-raid.c optional pst dev/pty/pty.c optional pty dev/puc/puc.c optional puc dev/puc/puc_cfg.c optional puc dev/puc/puc_pccard.c optional puc pccard dev/puc/puc_pci.c optional puc pci dev/puc/pucdata.c optional puc pci dev/quicc/quicc_core.c optional quicc dev/ral/rt2560.c optional ral dev/ral/rt2661.c optional ral dev/ral/rt2860.c optional ral dev/ral/if_ral_pci.c optional ral pci rt2561fw.c optional rt2561fw | ralfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk rt2561.fw:rt2561fw -mrt2561 -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "rt2561fw.c" rt2561fw.fwo optional rt2561fw | ralfw \ dependency "rt2561.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "rt2561fw.fwo" rt2561.fw optional rt2561fw | ralfw \ dependency "$S/contrib/dev/ral/rt2561.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "rt2561.fw" rt2561sfw.c optional rt2561sfw | ralfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk rt2561s.fw:rt2561sfw -mrt2561s -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "rt2561sfw.c" rt2561sfw.fwo optional rt2561sfw | ralfw \ dependency "rt2561s.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "rt2561sfw.fwo" rt2561s.fw optional rt2561sfw | ralfw \ dependency "$S/contrib/dev/ral/rt2561s.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "rt2561s.fw" rt2661fw.c optional rt2661fw | ralfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk rt2661.fw:rt2661fw -mrt2661 -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "rt2661fw.c" rt2661fw.fwo optional rt2661fw | ralfw \ dependency "rt2661.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "rt2661fw.fwo" rt2661.fw optional rt2661fw | ralfw \ dependency "$S/contrib/dev/ral/rt2661.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "rt2661.fw" rt2860fw.c optional rt2860fw | ralfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk rt2860.fw:rt2860fw -mrt2860 -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "rt2860fw.c" rt2860fw.fwo optional rt2860fw | ralfw \ dependency "rt2860.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "rt2860fw.fwo" rt2860.fw optional rt2860fw | ralfw \ dependency "$S/contrib/dev/ral/rt2860.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "rt2860.fw" dev/random/random_infra.c optional random dev/random/random_harvestq.c optional random dev/random/randomdev.c optional random random_yarrow | \ random !random_yarrow !random_loadable dev/random/yarrow.c optional random random_yarrow dev/random/fortuna.c optional random !random_yarrow !random_loadable dev/random/hash.c optional random random_yarrow | \ random !random_yarrow !random_loadable dev/rc/rc.c optional rc dev/rccgpio/rccgpio.c optional rccgpio gpio dev/re/if_re.c optional re dev/rl/if_rl.c optional rl pci dev/rndtest/rndtest.c optional rndtest dev/rp/rp.c optional rp dev/rp/rp_isa.c optional rp isa dev/rp/rp_pci.c optional rp pci dev/safe/safe.c optional safe dev/scc/scc_if.m optional scc dev/scc/scc_bfe_ebus.c optional scc ebus dev/scc/scc_bfe_quicc.c optional scc quicc dev/scc/scc_bfe_sbus.c optional scc fhc | scc sbus dev/scc/scc_core.c optional scc dev/scc/scc_dev_quicc.c optional scc quicc dev/scc/scc_dev_sab82532.c optional scc 
dev/scc/scc_dev_z8530.c optional scc dev/scd/scd.c optional scd isa dev/scd/scd_isa.c optional scd isa dev/sdhci/sdhci.c optional sdhci dev/sdhci/sdhci_if.m optional sdhci dev/sdhci/sdhci_pci.c optional sdhci pci dev/sf/if_sf.c optional sf pci dev/sge/if_sge.c optional sge pci dev/si/si.c optional si dev/si/si2_z280.c optional si dev/si/si3_t225.c optional si dev/si/si_eisa.c optional si eisa dev/si/si_isa.c optional si isa dev/si/si_pci.c optional si pci dev/siba/siba.c optional siba dev/siba/siba_bwn.c optional siba_bwn pci dev/siba/siba_cc.c optional siba dev/siba/siba_core.c optional siba | siba_bwn pci dev/siba/siba_pcib.c optional siba pci dev/siis/siis.c optional siis pci dev/sis/if_sis.c optional sis pci dev/sk/if_sk.c optional sk pci dev/smbus/smb.c optional smb dev/smbus/smbconf.c optional smbus dev/smbus/smbus.c optional smbus dev/smbus/smbus_if.m optional smbus dev/smc/if_smc.c optional smc dev/smc/if_smc_fdt.c optional smc fdt dev/sn/if_sn.c optional sn dev/sn/if_sn_isa.c optional sn isa dev/sn/if_sn_pccard.c optional sn pccard dev/snp/snp.c optional snp dev/sound/clone.c optional sound dev/sound/unit.c optional sound dev/sound/isa/ad1816.c optional snd_ad1816 isa dev/sound/isa/ess.c optional snd_ess isa dev/sound/isa/gusc.c optional snd_gusc isa dev/sound/isa/mss.c optional snd_mss isa dev/sound/isa/sb16.c optional snd_sb16 isa dev/sound/isa/sb8.c optional snd_sb8 isa dev/sound/isa/sbc.c optional snd_sbc isa dev/sound/isa/sndbuf_dma.c optional sound isa dev/sound/pci/als4000.c optional snd_als4000 pci dev/sound/pci/atiixp.c optional snd_atiixp pci dev/sound/pci/cmi.c optional snd_cmi pci dev/sound/pci/cs4281.c optional snd_cs4281 pci dev/sound/pci/csa.c optional snd_csa pci dev/sound/pci/csapcm.c optional snd_csa pci dev/sound/pci/ds1.c optional snd_ds1 pci dev/sound/pci/emu10k1.c optional snd_emu10k1 pci dev/sound/pci/emu10kx.c optional snd_emu10kx pci dev/sound/pci/emu10kx-pcm.c optional snd_emu10kx pci dev/sound/pci/emu10kx-midi.c optional snd_emu10kx pci dev/sound/pci/envy24.c optional snd_envy24 pci dev/sound/pci/envy24ht.c optional snd_envy24ht pci dev/sound/pci/es137x.c optional snd_es137x pci dev/sound/pci/fm801.c optional snd_fm801 pci dev/sound/pci/ich.c optional snd_ich pci dev/sound/pci/maestro.c optional snd_maestro pci dev/sound/pci/maestro3.c optional snd_maestro3 pci dev/sound/pci/neomagic.c optional snd_neomagic pci dev/sound/pci/solo.c optional snd_solo pci dev/sound/pci/spicds.c optional snd_spicds pci dev/sound/pci/t4dwave.c optional snd_t4dwave pci dev/sound/pci/via8233.c optional snd_via8233 pci dev/sound/pci/via82c686.c optional snd_via82c686 pci dev/sound/pci/vibes.c optional snd_vibes pci dev/sound/pci/hda/hdaa.c optional snd_hda pci dev/sound/pci/hda/hdaa_patches.c optional snd_hda pci dev/sound/pci/hda/hdac.c optional snd_hda pci dev/sound/pci/hda/hdac_if.m optional snd_hda pci dev/sound/pci/hda/hdacc.c optional snd_hda pci dev/sound/pci/hdspe.c optional snd_hdspe pci dev/sound/pci/hdspe-pcm.c optional snd_hdspe pci dev/sound/pcm/ac97.c optional sound dev/sound/pcm/ac97_if.m optional sound dev/sound/pcm/ac97_patch.c optional sound dev/sound/pcm/buffer.c optional sound \ dependency "snd_fxdiv_gen.h" dev/sound/pcm/channel.c optional sound dev/sound/pcm/channel_if.m optional sound dev/sound/pcm/dsp.c optional sound dev/sound/pcm/feeder.c optional sound dev/sound/pcm/feeder_chain.c optional sound dev/sound/pcm/feeder_eq.c optional sound \ dependency "feeder_eq_gen.h" \ dependency "snd_fxdiv_gen.h" dev/sound/pcm/feeder_if.m optional sound 
dev/sound/pcm/feeder_format.c optional sound \ dependency "snd_fxdiv_gen.h" dev/sound/pcm/feeder_matrix.c optional sound \ dependency "snd_fxdiv_gen.h" dev/sound/pcm/feeder_mixer.c optional sound \ dependency "snd_fxdiv_gen.h" dev/sound/pcm/feeder_rate.c optional sound \ dependency "feeder_rate_gen.h" \ dependency "snd_fxdiv_gen.h" dev/sound/pcm/feeder_volume.c optional sound \ dependency "snd_fxdiv_gen.h" dev/sound/pcm/mixer.c optional sound dev/sound/pcm/mixer_if.m optional sound dev/sound/pcm/sndstat.c optional sound dev/sound/pcm/sound.c optional sound dev/sound/pcm/vchan.c optional sound dev/sound/usb/uaudio.c optional snd_uaudio usb dev/sound/usb/uaudio_pcm.c optional snd_uaudio usb dev/sound/midi/midi.c optional sound dev/sound/midi/mpu401.c optional sound dev/sound/midi/mpu_if.m optional sound dev/sound/midi/mpufoi_if.m optional sound dev/sound/midi/sequencer.c optional sound dev/sound/midi/synth_if.m optional sound dev/spibus/ofw_spibus.c optional fdt spibus dev/spibus/spibus.c optional spibus \ dependency "spibus_if.h" dev/spibus/spibus_if.m optional spibus dev/ste/if_ste.c optional ste pci dev/stg/tmc18c30.c optional stg dev/stg/tmc18c30_isa.c optional stg isa dev/stg/tmc18c30_pccard.c optional stg pccard dev/stg/tmc18c30_pci.c optional stg pci dev/stg/tmc18c30_subr.c optional stg dev/stge/if_stge.c optional stge dev/streams/streams.c optional streams dev/sym/sym_hipd.c optional sym \ dependency "$S/dev/sym/sym_{conf,defs}.h" dev/syscons/blank/blank_saver.c optional blank_saver dev/syscons/daemon/daemon_saver.c optional daemon_saver dev/syscons/dragon/dragon_saver.c optional dragon_saver dev/syscons/fade/fade_saver.c optional fade_saver dev/syscons/fire/fire_saver.c optional fire_saver dev/syscons/green/green_saver.c optional green_saver dev/syscons/logo/logo.c optional logo_saver dev/syscons/logo/logo_saver.c optional logo_saver dev/syscons/rain/rain_saver.c optional rain_saver dev/syscons/schistory.c optional sc dev/syscons/scmouse.c optional sc dev/syscons/scterm.c optional sc dev/syscons/scvidctl.c optional sc dev/syscons/snake/snake_saver.c optional snake_saver dev/syscons/star/star_saver.c optional star_saver dev/syscons/syscons.c optional sc dev/syscons/sysmouse.c optional sc dev/syscons/warp/warp_saver.c optional warp_saver dev/tdfx/tdfx_linux.c optional tdfx_linux tdfx compat_linux dev/tdfx/tdfx_pci.c optional tdfx pci dev/ti/if_ti.c optional ti pci dev/tl/if_tl.c optional tl pci dev/trm/trm.c optional trm dev/twa/tw_cl_init.c optional twa \ compile-with "${NORMAL_C} -I$S/dev/twa" dev/twa/tw_cl_intr.c optional twa \ compile-with "${NORMAL_C} -I$S/dev/twa" dev/twa/tw_cl_io.c optional twa \ compile-with "${NORMAL_C} -I$S/dev/twa" dev/twa/tw_cl_misc.c optional twa \ compile-with "${NORMAL_C} -I$S/dev/twa" dev/twa/tw_osl_cam.c optional twa \ compile-with "${NORMAL_C} -I$S/dev/twa" dev/twa/tw_osl_freebsd.c optional twa \ compile-with "${NORMAL_C} -I$S/dev/twa" dev/twe/twe.c optional twe dev/twe/twe_freebsd.c optional twe dev/tws/tws.c optional tws dev/tws/tws_cam.c optional tws dev/tws/tws_hdm.c optional tws dev/tws/tws_services.c optional tws dev/tws/tws_user.c optional tws dev/tx/if_tx.c optional tx dev/txp/if_txp.c optional txp dev/uart/uart_bus_acpi.c optional uart acpi dev/uart/uart_bus_ebus.c optional uart ebus dev/uart/uart_bus_fdt.c optional uart fdt dev/uart/uart_bus_isa.c optional uart isa dev/uart/uart_bus_pccard.c optional uart pccard dev/uart/uart_bus_pci.c optional uart pci dev/uart/uart_bus_puc.c optional uart puc dev/uart/uart_bus_scc.c optional uart scc 
dev/uart/uart_core.c optional uart dev/uart/uart_dbg.c optional uart gdb dev/uart/uart_dev_ns8250.c optional uart uart_ns8250 dev/uart/uart_dev_pl011.c optional uart pl011 dev/uart/uart_dev_quicc.c optional uart quicc dev/uart/uart_dev_sab82532.c optional uart uart_sab82532 dev/uart/uart_dev_sab82532.c optional uart scc dev/uart/uart_dev_z8530.c optional uart uart_z8530 dev/uart/uart_dev_z8530.c optional uart scc dev/uart/uart_if.m optional uart dev/uart/uart_subr.c optional uart dev/uart/uart_tty.c optional uart dev/ubsec/ubsec.c optional ubsec # # USB controller drivers # dev/usb/controller/at91dci.c optional at91dci dev/usb/controller/at91dci_atmelarm.c optional at91dci at91rm9200 dev/usb/controller/musb_otg.c optional musb dev/usb/controller/musb_otg_atmelarm.c optional musb at91rm9200 dev/usb/controller/dwc_otg.c optional dwcotg dev/usb/controller/dwc_otg_fdt.c optional dwcotg fdt dev/usb/controller/ehci.c optional ehci dev/usb/controller/ehci_pci.c optional ehci pci dev/usb/controller/ohci.c optional ohci dev/usb/controller/ohci_pci.c optional ohci pci dev/usb/controller/uhci.c optional uhci dev/usb/controller/uhci_pci.c optional uhci pci dev/usb/controller/xhci.c optional xhci dev/usb/controller/xhci_pci.c optional xhci pci dev/usb/controller/saf1761_otg.c optional saf1761otg dev/usb/controller/saf1761_otg_fdt.c optional saf1761otg fdt dev/usb/controller/uss820dci.c optional uss820dci dev/usb/controller/uss820dci_atmelarm.c optional uss820dci at91rm9200 dev/usb/controller/usb_controller.c optional usb # # USB storage drivers # dev/usb/storage/umass.c optional umass dev/usb/storage/urio.c optional urio dev/usb/storage/ustorage_fs.c optional usfs # # USB core # dev/usb/usb_busdma.c optional usb dev/usb/usb_core.c optional usb dev/usb/usb_debug.c optional usb dev/usb/usb_dev.c optional usb dev/usb/usb_device.c optional usb dev/usb/usb_dynamic.c optional usb dev/usb/usb_error.c optional usb dev/usb/usb_generic.c optional usb dev/usb/usb_handle_request.c optional usb dev/usb/usb_hid.c optional usb dev/usb/usb_hub.c optional usb dev/usb/usb_if.m optional usb dev/usb/usb_lookup.c optional usb dev/usb/usb_mbuf.c optional usb dev/usb/usb_msctest.c optional usb dev/usb/usb_parse.c optional usb dev/usb/usb_pf.c optional usb dev/usb/usb_process.c optional usb dev/usb/usb_request.c optional usb dev/usb/usb_transfer.c optional usb dev/usb/usb_util.c optional usb # # USB network drivers # dev/usb/net/if_aue.c optional aue dev/usb/net/if_axe.c optional axe dev/usb/net/if_axge.c optional axge dev/usb/net/if_cdce.c optional cdce dev/usb/net/if_cue.c optional cue dev/usb/net/if_ipheth.c optional ipheth dev/usb/net/if_kue.c optional kue dev/usb/net/if_mos.c optional mos dev/usb/net/if_rue.c optional rue dev/usb/net/if_smsc.c optional smsc dev/usb/net/if_udav.c optional udav dev/usb/net/if_ure.c optional ure dev/usb/net/if_usie.c optional usie dev/usb/net/if_urndis.c optional urndis dev/usb/net/ruephy.c optional rue dev/usb/net/usb_ethernet.c optional uether | aue | axe | axge | cdce | \ cue | ipheth | kue | mos | rue | \ smsc | udav | ure | urndis dev/usb/net/uhso.c optional uhso # # USB WLAN drivers # dev/usb/wlan/if_rsu.c optional rsu rsu-rtl8712fw.c optional rsu-rtl8712fw | rsufw \ compile-with "${AWK} -f $S/tools/fw_stub.awk rsu-rtl8712fw.fw:rsu-rtl8712fw:120 -mrsu-rtl8712fw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "rsu-rtl8712fw.c" rsu-rtl8712fw.fwo optional rsu-rtl8712fw | rsufw \ dependency "rsu-rtl8712fw.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean 
"rsu-rtl8712fw.fwo" rsu-rtl8712fw.fw optional rsu-rtl8712.fw | rsufw \ dependency "$S/contrib/dev/rsu/rsu-rtl8712fw.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "rsu-rtl8712fw.fw" dev/usb/wlan/if_rum.c optional rum dev/usb/wlan/if_run.c optional run runfw.c optional runfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk run.fw:runfw -mrunfw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "runfw.c" runfw.fwo optional runfw \ dependency "run.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "runfw.fwo" run.fw optional runfw \ dependency "$S/contrib/dev/run/rt2870.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "run.fw" dev/usb/wlan/if_uath.c optional uath dev/usb/wlan/if_upgt.c optional upgt dev/usb/wlan/if_ural.c optional ural dev/usb/wlan/if_urtw.c optional urtw dev/usb/wlan/if_urtwn.c optional urtwn urtwn-rtl8188eufw.c optional urtwn-rtl8188eufw | urtwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk urtwn-rtl8188eufw.fw:urtwn-rtl8188eufw:111 -murtwn-rtl8188eufw -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "urtwn-rtl8188eufw.c" urtwn-rtl8188eufw.fwo optional urtwn-rtl8188eufw | urtwnfw \ dependency "urtwn-rtl8188eufw.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "urtwn-rtl8188eufw.fwo" urtwn-rtl8188eufw.fw optional urtwn-rtl8188eufw | urtwnfw \ dependency "$S/contrib/dev/urtwn/urtwn-rtl8188eufw.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "urtwn-rtl8188eufw.fw" urtwn-rtl8192cfwT.c optional urtwn-rtl8192cfwT | urtwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk urtwn-rtl8192cfwT.fw:urtwn-rtl8192cfwT:111 -murtwn-rtl8192cfwT -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "urtwn-rtl8192cfwT.c" urtwn-rtl8192cfwT.fwo optional urtwn-rtl8192cfwT | urtwnfw \ dependency "urtwn-rtl8192cfwT.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "urtwn-rtl8192cfwT.fwo" urtwn-rtl8192cfwT.fw optional urtwn-rtl8192cfwT | urtwnfw \ dependency "$S/contrib/dev/urtwn/urtwn-rtl8192cfwT.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "urtwn-rtl8192cfwT.fw" urtwn-rtl8192cfwU.c optional urtwn-rtl8192cfwU | urtwnfw \ compile-with "${AWK} -f $S/tools/fw_stub.awk urtwn-rtl8192cfwU.fw:urtwn-rtl8192cfwU:111 -murtwn-rtl8192cfwU -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "urtwn-rtl8192cfwU.c" urtwn-rtl8192cfwU.fwo optional urtwn-rtl8192cfwU | urtwnfw \ dependency "urtwn-rtl8192cfwU.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "urtwn-rtl8192cfwU.fwo" urtwn-rtl8192cfwU.fw optional urtwn-rtl8192cfwU | urtwnfw \ dependency "$S/contrib/dev/urtwn/urtwn-rtl8192cfwU.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "urtwn-rtl8192cfwU.fw" dev/usb/wlan/if_zyd.c optional zyd # # USB serial and parallel port drivers # dev/usb/serial/u3g.c optional u3g dev/usb/serial/uark.c optional uark dev/usb/serial/ubsa.c optional ubsa dev/usb/serial/ubser.c optional ubser dev/usb/serial/uchcom.c optional uchcom dev/usb/serial/ucycom.c optional ucycom dev/usb/serial/ufoma.c optional ufoma dev/usb/serial/uftdi.c optional uftdi dev/usb/serial/ugensa.c optional ugensa dev/usb/serial/uipaq.c optional uipaq dev/usb/serial/ulpt.c optional ulpt dev/usb/serial/umcs.c optional umcs dev/usb/serial/umct.c optional umct dev/usb/serial/umodem.c optional umodem dev/usb/serial/umoscom.c optional umoscom dev/usb/serial/uplcom.c optional uplcom dev/usb/serial/uslcom.c optional uslcom dev/usb/serial/uvisor.c optional uvisor 
dev/usb/serial/uvscom.c optional uvscom dev/usb/serial/usb_serial.c optional ucom | u3g | uark | ubsa | ubser | \ uchcom | ucycom | ufoma | uftdi | \ ugensa | uipaq | umcs | umct | \ umodem | umoscom | uplcom | usie | \ uslcom | uvisor | uvscom # # USB misc drivers # dev/usb/misc/ufm.c optional ufm dev/usb/misc/udbp.c optional udbp dev/usb/misc/ugold.c optional ugold dev/usb/misc/uled.c optional uled # # USB input drivers # dev/usb/input/atp.c optional atp dev/usb/input/uep.c optional uep dev/usb/input/uhid.c optional uhid dev/usb/input/ukbd.c optional ukbd dev/usb/input/ums.c optional ums dev/usb/input/wsp.c optional wsp # # USB quirks # dev/usb/quirk/usb_quirk.c optional usb # # USB templates # dev/usb/template/usb_template.c optional usb_template dev/usb/template/usb_template_audio.c optional usb_template dev/usb/template/usb_template_cdce.c optional usb_template dev/usb/template/usb_template_kbd.c optional usb_template dev/usb/template/usb_template_modem.c optional usb_template dev/usb/template/usb_template_mouse.c optional usb_template dev/usb/template/usb_template_msc.c optional usb_template dev/usb/template/usb_template_mtp.c optional usb_template dev/usb/template/usb_template_phone.c optional usb_template dev/usb/template/usb_template_serialnet.c optional usb_template dev/usb/template/usb_template_midi.c optional usb_template # # USB video drivers # dev/usb/video/udl.c optional udl # # USB END # dev/videomode/videomode.c optional videomode dev/videomode/edid.c optional videomode dev/videomode/pickmode.c optional videomode dev/videomode/vesagtf.c optional videomode dev/utopia/idtphy.c optional utopia dev/utopia/suni.c optional utopia dev/utopia/utopia.c optional utopia dev/vge/if_vge.c optional vge dev/viapm/viapm.c optional viapm pci dev/virtio/virtio.c optional virtio dev/virtio/virtqueue.c optional virtio dev/virtio/virtio_bus_if.m optional virtio dev/virtio/virtio_if.m optional virtio dev/virtio/pci/virtio_pci.c optional virtio_pci dev/virtio/mmio/virtio_mmio.c optional virtio_mmio dev/virtio/mmio/virtio_mmio_if.m optional virtio_mmio dev/virtio/network/if_vtnet.c optional vtnet dev/virtio/block/virtio_blk.c optional virtio_blk dev/virtio/balloon/virtio_balloon.c optional virtio_balloon dev/virtio/scsi/virtio_scsi.c optional virtio_scsi dev/virtio/random/virtio_random.c optional virtio_random dev/virtio/console/virtio_console.c optional virtio_console dev/vkbd/vkbd.c optional vkbd dev/vr/if_vr.c optional vr pci dev/vt/colors/vt_termcolors.c optional vt dev/vt/font/vt_font_default.c optional vt dev/vt/font/vt_mouse_cursor.c optional vt dev/vt/hw/efifb/efifb.c optional vt_efifb dev/vt/hw/fb/vt_fb.c optional vt dev/vt/hw/vga/vt_vga.c optional vt vt_vga dev/vt/logo/logo_freebsd.c optional vt splash dev/vt/logo/logo_beastie.c optional vt splash dev/vt/vt_buf.c optional vt dev/vt/vt_consolectl.c optional vt dev/vt/vt_core.c optional vt dev/vt/vt_cpulogos.c optional vt splash dev/vt/vt_font.c optional vt dev/vt/vt_sysmouse.c optional vt dev/vte/if_vte.c optional vte pci dev/vx/if_vx.c optional vx dev/vx/if_vx_eisa.c optional vx eisa dev/vx/if_vx_pci.c optional vx pci dev/vxge/vxge.c optional vxge dev/vxge/vxgehal/vxgehal-ifmsg.c optional vxge dev/vxge/vxgehal/vxgehal-mrpcim.c optional vxge dev/vxge/vxgehal/vxge-queue.c optional vxge dev/vxge/vxgehal/vxgehal-ring.c optional vxge dev/vxge/vxgehal/vxgehal-swapper.c optional vxge dev/vxge/vxgehal/vxgehal-mgmt.c optional vxge dev/vxge/vxgehal/vxgehal-srpcim.c optional vxge dev/vxge/vxgehal/vxgehal-config.c optional vxge 
dev/vxge/vxgehal/vxgehal-blockpool.c optional vxge dev/vxge/vxgehal/vxgehal-doorbells.c optional vxge dev/vxge/vxgehal/vxgehal-mgmtaux.c optional vxge dev/vxge/vxgehal/vxgehal-device.c optional vxge dev/vxge/vxgehal/vxgehal-mm.c optional vxge dev/vxge/vxgehal/vxgehal-driver.c optional vxge dev/vxge/vxgehal/vxgehal-virtualpath.c optional vxge dev/vxge/vxgehal/vxgehal-channel.c optional vxge dev/vxge/vxgehal/vxgehal-fifo.c optional vxge dev/watchdog/watchdog.c standard dev/wb/if_wb.c optional wb pci dev/wds/wd7000.c optional wds isa dev/wi/if_wi.c optional wi dev/wi/if_wi_pccard.c optional wi pccard dev/wi/if_wi_pci.c optional wi pci dev/wl/if_wl.c optional wl isa dev/wpi/if_wpi.c optional wpi pci wpifw.c optional wpifw \ compile-with "${AWK} -f $S/tools/fw_stub.awk wpi.fw:wpifw:153229 -mwpi -c${.TARGET}" \ no-implicit-rule before-depend local \ clean "wpifw.c" wpifw.fwo optional wpifw \ dependency "wpi.fw" \ compile-with "${NORMAL_FWO}" \ no-implicit-rule \ clean "wpifw.fwo" wpi.fw optional wpifw \ dependency "$S/contrib/dev/wpi/iwlwifi-3945-15.32.2.9.fw.uu" \ compile-with "${NORMAL_FW}" \ no-obj no-implicit-rule \ clean "wpi.fw" dev/xe/if_xe.c optional xe dev/xe/if_xe_pccard.c optional xe pccard dev/xen/balloon/balloon.c optional xenhvm dev/xen/blkfront/blkfront.c optional xenhvm dev/xen/blkback/blkback.c optional xenhvm dev/xen/console/xen_console.c optional xenhvm dev/xen/control/control.c optional xenhvm dev/xen/grant_table/grant_table.c optional xenhvm dev/xen/netback/netback.c optional xenhvm dev/xen/netfront/netfront.c optional xenhvm dev/xen/xenpci/xenpci.c optional xenpci dev/xen/timer/timer.c optional xenhvm dev/xen/pvcpu/pvcpu.c optional xenhvm dev/xen/xenstore/xenstore.c optional xenhvm dev/xen/xenstore/xenstore_dev.c optional xenhvm dev/xen/xenstore/xenstored_dev.c optional xenhvm dev/xen/evtchn/evtchn_dev.c optional xenhvm dev/xen/privcmd/privcmd.c optional xenhvm dev/xen/debug/debug.c optional xenhvm dev/xl/if_xl.c optional xl pci dev/xl/xlphy.c optional xl pci fs/autofs/autofs.c optional autofs fs/autofs/autofs_vfsops.c optional autofs fs/autofs/autofs_vnops.c optional autofs fs/deadfs/dead_vnops.c standard fs/devfs/devfs_devs.c standard fs/devfs/devfs_dir.c standard fs/devfs/devfs_rule.c standard fs/devfs/devfs_vfsops.c standard fs/devfs/devfs_vnops.c standard fs/fdescfs/fdesc_vfsops.c optional fdescfs fs/fdescfs/fdesc_vnops.c optional fdescfs fs/fifofs/fifo_vnops.c standard fs/cuse/cuse.c optional cuse fs/fuse/fuse_device.c optional fuse fs/fuse/fuse_file.c optional fuse fs/fuse/fuse_internal.c optional fuse fs/fuse/fuse_io.c optional fuse fs/fuse/fuse_ipc.c optional fuse fs/fuse/fuse_main.c optional fuse fs/fuse/fuse_node.c optional fuse fs/fuse/fuse_vfsops.c optional fuse fs/fuse/fuse_vnops.c optional fuse fs/msdosfs/msdosfs_conv.c optional msdosfs fs/msdosfs/msdosfs_denode.c optional msdosfs fs/msdosfs/msdosfs_fat.c optional msdosfs fs/msdosfs/msdosfs_fileno.c optional msdosfs fs/msdosfs/msdosfs_iconv.c optional msdosfs_iconv fs/msdosfs/msdosfs_lookup.c optional msdosfs fs/msdosfs/msdosfs_vfsops.c optional msdosfs fs/msdosfs/msdosfs_vnops.c optional msdosfs fs/nandfs/bmap.c optional nandfs fs/nandfs/nandfs_alloc.c optional nandfs fs/nandfs/nandfs_bmap.c optional nandfs fs/nandfs/nandfs_buffer.c optional nandfs fs/nandfs/nandfs_cleaner.c optional nandfs fs/nandfs/nandfs_cpfile.c optional nandfs fs/nandfs/nandfs_dat.c optional nandfs fs/nandfs/nandfs_dir.c optional nandfs fs/nandfs/nandfs_ifile.c optional nandfs fs/nandfs/nandfs_segment.c optional nandfs 
fs/nandfs/nandfs_subr.c optional nandfs fs/nandfs/nandfs_sufile.c optional nandfs fs/nandfs/nandfs_vfsops.c optional nandfs fs/nandfs/nandfs_vnops.c optional nandfs fs/nfs/nfs_commonkrpc.c optional nfscl | nfsd fs/nfs/nfs_commonsubs.c optional nfscl | nfsd fs/nfs/nfs_commonport.c optional nfscl | nfsd fs/nfs/nfs_commonacl.c optional nfscl | nfsd fs/nfsclient/nfs_clcomsubs.c optional nfscl fs/nfsclient/nfs_clsubs.c optional nfscl fs/nfsclient/nfs_clstate.c optional nfscl fs/nfsclient/nfs_clkrpc.c optional nfscl fs/nfsclient/nfs_clrpcops.c optional nfscl fs/nfsclient/nfs_clvnops.c optional nfscl fs/nfsclient/nfs_clnode.c optional nfscl fs/nfsclient/nfs_clvfsops.c optional nfscl fs/nfsclient/nfs_clport.c optional nfscl fs/nfsclient/nfs_clbio.c optional nfscl fs/nfsclient/nfs_clnfsiod.c optional nfscl fs/nfsserver/nfs_fha_new.c optional nfsd inet fs/nfsserver/nfs_nfsdsocket.c optional nfsd inet fs/nfsserver/nfs_nfsdsubs.c optional nfsd inet fs/nfsserver/nfs_nfsdstate.c optional nfsd inet fs/nfsserver/nfs_nfsdkrpc.c optional nfsd inet fs/nfsserver/nfs_nfsdserv.c optional nfsd inet fs/nfsserver/nfs_nfsdport.c optional nfsd inet fs/nfsserver/nfs_nfsdcache.c optional nfsd inet fs/nullfs/null_subr.c optional nullfs fs/nullfs/null_vfsops.c optional nullfs fs/nullfs/null_vnops.c optional nullfs fs/procfs/procfs.c optional procfs fs/procfs/procfs_ctl.c optional procfs fs/procfs/procfs_dbregs.c optional procfs fs/procfs/procfs_fpregs.c optional procfs fs/procfs/procfs_ioctl.c optional procfs fs/procfs/procfs_map.c optional procfs fs/procfs/procfs_mem.c optional procfs fs/procfs/procfs_note.c optional procfs fs/procfs/procfs_osrel.c optional procfs fs/procfs/procfs_regs.c optional procfs fs/procfs/procfs_rlimit.c optional procfs fs/procfs/procfs_status.c optional procfs fs/procfs/procfs_type.c optional procfs fs/pseudofs/pseudofs.c optional pseudofs fs/pseudofs/pseudofs_fileno.c optional pseudofs fs/pseudofs/pseudofs_vncache.c optional pseudofs fs/pseudofs/pseudofs_vnops.c optional pseudofs fs/smbfs/smbfs_io.c optional smbfs fs/smbfs/smbfs_node.c optional smbfs fs/smbfs/smbfs_smb.c optional smbfs fs/smbfs/smbfs_subr.c optional smbfs fs/smbfs/smbfs_vfsops.c optional smbfs fs/smbfs/smbfs_vnops.c optional smbfs fs/udf/osta.c optional udf fs/udf/udf_iconv.c optional udf_iconv fs/udf/udf_vfsops.c optional udf fs/udf/udf_vnops.c optional udf fs/unionfs/union_subr.c optional unionfs fs/unionfs/union_vfsops.c optional unionfs fs/unionfs/union_vnops.c optional unionfs fs/tmpfs/tmpfs_vnops.c optional tmpfs fs/tmpfs/tmpfs_fifoops.c optional tmpfs fs/tmpfs/tmpfs_vfsops.c optional tmpfs fs/tmpfs/tmpfs_subr.c optional tmpfs gdb/gdb_cons.c optional gdb gdb/gdb_main.c optional gdb gdb/gdb_packet.c optional gdb geom/bde/g_bde.c optional geom_bde geom/bde/g_bde_crypt.c optional geom_bde geom/bde/g_bde_lock.c optional geom_bde geom/bde/g_bde_work.c optional geom_bde geom/cache/g_cache.c optional geom_cache geom/concat/g_concat.c optional geom_concat geom/eli/g_eli.c optional geom_eli geom/eli/g_eli_crypto.c optional geom_eli geom/eli/g_eli_ctl.c optional geom_eli geom/eli/g_eli_hmac.c optional geom_eli geom/eli/g_eli_integrity.c optional geom_eli geom/eli/g_eli_key.c optional geom_eli geom/eli/g_eli_key_cache.c optional geom_eli geom/eli/g_eli_privacy.c optional geom_eli geom/eli/pkcs5v2.c optional geom_eli geom/gate/g_gate.c optional geom_gate geom/geom_aes.c optional geom_aes geom/geom_bsd.c optional geom_bsd geom/geom_bsd_enc.c optional geom_bsd | geom_part_bsd geom/geom_ccd.c optional ccd | geom_ccd geom/geom_ctl.c 
standard
geom/geom_dev.c standard
geom/geom_disk.c standard
geom/geom_dump.c standard
geom/geom_event.c standard
geom/geom_fox.c optional geom_fox
-geom/geom_flashmap.c optional fdt cfi | fdt nand
+geom/geom_flashmap.c optional fdt cfi | fdt nand | fdt mx25l
geom/geom_io.c standard
geom/geom_kern.c standard
geom/geom_map.c optional geom_map
geom/geom_mbr.c optional geom_mbr
geom/geom_mbr_enc.c optional geom_mbr
geom/geom_pc98.c optional geom_pc98
geom/geom_pc98_enc.c optional geom_pc98
geom/geom_redboot.c optional geom_redboot
geom/geom_slice.c standard
geom/geom_subr.c standard
geom/geom_sunlabel.c optional geom_sunlabel
geom/geom_sunlabel_enc.c optional geom_sunlabel
geom/geom_vfs.c standard
geom/geom_vol_ffs.c optional geom_vol
geom/journal/g_journal.c optional geom_journal
geom/journal/g_journal_ufs.c optional geom_journal
geom/label/g_label.c optional geom_label | geom_label_gpt
geom/label/g_label_ext2fs.c optional geom_label
geom/label/g_label_iso9660.c optional geom_label
geom/label/g_label_msdosfs.c optional geom_label
geom/label/g_label_ntfs.c optional geom_label
geom/label/g_label_reiserfs.c optional geom_label
geom/label/g_label_ufs.c optional geom_label
geom/label/g_label_gpt.c optional geom_label | geom_label_gpt
geom/label/g_label_disk_ident.c optional geom_label
geom/linux_lvm/g_linux_lvm.c optional geom_linux_lvm
geom/mirror/g_mirror.c optional geom_mirror
geom/mirror/g_mirror_ctl.c optional geom_mirror
geom/mountver/g_mountver.c optional geom_mountver
geom/multipath/g_multipath.c optional geom_multipath
geom/nop/g_nop.c optional geom_nop
geom/part/g_part.c standard
geom/part/g_part_if.m standard
geom/part/g_part_apm.c optional geom_part_apm
geom/part/g_part_bsd.c optional geom_part_bsd
geom/part/g_part_bsd64.c optional geom_part_bsd64
geom/part/g_part_ebr.c optional geom_part_ebr
geom/part/g_part_gpt.c optional geom_part_gpt
geom/part/g_part_ldm.c optional geom_part_ldm
geom/part/g_part_mbr.c optional geom_part_mbr
geom/part/g_part_pc98.c optional geom_part_pc98
geom/part/g_part_vtoc8.c optional geom_part_vtoc8
geom/raid/g_raid.c optional geom_raid
geom/raid/g_raid_ctl.c optional geom_raid
geom/raid/g_raid_md_if.m optional geom_raid
geom/raid/g_raid_tr_if.m optional geom_raid
geom/raid/md_ddf.c optional geom_raid
geom/raid/md_intel.c optional geom_raid
geom/raid/md_jmicron.c optional geom_raid
geom/raid/md_nvidia.c optional geom_raid
geom/raid/md_promise.c optional geom_raid
geom/raid/md_sii.c optional geom_raid
geom/raid/tr_concat.c optional geom_raid
geom/raid/tr_raid0.c optional geom_raid
geom/raid/tr_raid1.c optional geom_raid
geom/raid/tr_raid1e.c optional geom_raid
geom/raid/tr_raid5.c optional geom_raid
geom/raid3/g_raid3.c optional geom_raid3
geom/raid3/g_raid3_ctl.c optional geom_raid3
geom/shsec/g_shsec.c optional geom_shsec
geom/stripe/g_stripe.c optional geom_stripe
geom/uncompress/g_uncompress.c optional geom_uncompress
contrib/xz-embedded/freebsd/xz_malloc.c \
	optional xz_embedded | geom_uncompress \
	compile-with "${NORMAL_C} -I$S/contrib/xz-embedded/freebsd/ -I$S/contrib/xz-embedded/linux/lib/xz/ -I$S/contrib/xz-embedded/linux/include/linux/"
contrib/xz-embedded/linux/lib/xz/xz_crc32.c \
	optional xz_embedded | geom_uncompress \
	compile-with "${NORMAL_C} -I$S/contrib/xz-embedded/freebsd/ -I$S/contrib/xz-embedded/linux/lib/xz/ -I$S/contrib/xz-embedded/linux/include/linux/"
contrib/xz-embedded/linux/lib/xz/xz_dec_bcj.c \
	optional xz_embedded | geom_uncompress \
	compile-with "${NORMAL_C} -I$S/contrib/xz-embedded/freebsd/ -I$S/contrib/xz-embedded/linux/lib/xz/
-I$S/contrib/xz-embedded/linux/include/linux/" contrib/xz-embedded/linux/lib/xz/xz_dec_lzma2.c \ optional xz_embedded | geom_uncompress \ compile-with "${NORMAL_C} -I$S/contrib/xz-embedded/freebsd/ -I$S/contrib/xz-embedded/linux/lib/xz/ -I$S/contrib/xz-embedded/linux/include/linux/" contrib/xz-embedded/linux/lib/xz/xz_dec_stream.c \ optional xz_embedded | geom_uncompress \ compile-with "${NORMAL_C} -I$S/contrib/xz-embedded/freebsd/ -I$S/contrib/xz-embedded/linux/lib/xz/ -I$S/contrib/xz-embedded/linux/include/linux/" geom/uzip/g_uzip.c optional geom_uzip geom/vinum/geom_vinum.c optional geom_vinum geom/vinum/geom_vinum_create.c optional geom_vinum geom/vinum/geom_vinum_drive.c optional geom_vinum geom/vinum/geom_vinum_plex.c optional geom_vinum geom/vinum/geom_vinum_volume.c optional geom_vinum geom/vinum/geom_vinum_subr.c optional geom_vinum geom/vinum/geom_vinum_raid5.c optional geom_vinum geom/vinum/geom_vinum_share.c optional geom_vinum geom/vinum/geom_vinum_list.c optional geom_vinum geom/vinum/geom_vinum_rm.c optional geom_vinum geom/vinum/geom_vinum_init.c optional geom_vinum geom/vinum/geom_vinum_state.c optional geom_vinum geom/vinum/geom_vinum_rename.c optional geom_vinum geom/vinum/geom_vinum_move.c optional geom_vinum geom/vinum/geom_vinum_events.c optional geom_vinum geom/virstor/binstream.c optional geom_virstor geom/virstor/g_virstor.c optional geom_virstor geom/virstor/g_virstor_md.c optional geom_virstor geom/zero/g_zero.c optional geom_zero fs/ext2fs/ext2_alloc.c optional ext2fs fs/ext2fs/ext2_balloc.c optional ext2fs fs/ext2fs/ext2_bmap.c optional ext2fs fs/ext2fs/ext2_extents.c optional ext2fs fs/ext2fs/ext2_inode.c optional ext2fs fs/ext2fs/ext2_inode_cnv.c optional ext2fs fs/ext2fs/ext2_hash.c optional ext2fs fs/ext2fs/ext2_htree.c optional ext2fs fs/ext2fs/ext2_lookup.c optional ext2fs fs/ext2fs/ext2_subr.c optional ext2fs fs/ext2fs/ext2_vfsops.c optional ext2fs fs/ext2fs/ext2_vnops.c optional ext2fs gnu/fs/reiserfs/reiserfs_hashes.c optional reiserfs \ warning "kernel contains GPL contaminated ReiserFS filesystem" gnu/fs/reiserfs/reiserfs_inode.c optional reiserfs gnu/fs/reiserfs/reiserfs_item_ops.c optional reiserfs gnu/fs/reiserfs/reiserfs_namei.c optional reiserfs gnu/fs/reiserfs/reiserfs_prints.c optional reiserfs gnu/fs/reiserfs/reiserfs_stree.c optional reiserfs gnu/fs/reiserfs/reiserfs_vfsops.c optional reiserfs gnu/fs/reiserfs/reiserfs_vnops.c optional reiserfs # isa/isa_if.m standard isa/isa_common.c optional isa isa/isahint.c optional isa isa/pnp.c optional isa isapnp isa/pnpparse.c optional isa isapnp fs/cd9660/cd9660_bmap.c optional cd9660 fs/cd9660/cd9660_lookup.c optional cd9660 fs/cd9660/cd9660_node.c optional cd9660 fs/cd9660/cd9660_rrip.c optional cd9660 fs/cd9660/cd9660_util.c optional cd9660 fs/cd9660/cd9660_vfsops.c optional cd9660 fs/cd9660/cd9660_vnops.c optional cd9660 fs/cd9660/cd9660_iconv.c optional cd9660_iconv kern/bus_if.m standard kern/clock_if.m standard kern/cpufreq_if.m standard kern/device_if.m standard kern/imgact_binmisc.c optional imagact_binmisc kern/imgact_elf.c standard kern/imgact_elf32.c optional compat_freebsd32 kern/imgact_shell.c standard kern/inflate.c optional gzip kern/init_main.c standard kern/init_sysent.c standard kern/ksched.c optional _kposix_priority_scheduling kern/kern_acct.c standard kern/kern_alq.c optional alq kern/kern_clock.c standard kern/kern_condvar.c standard kern/kern_conf.c standard kern/kern_cons.c standard kern/kern_cpu.c standard kern/kern_cpuset.c standard kern/kern_context.c standard 
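# The reiserfs_hashes.c entry above carries a warning directive; the
# quoted message is emitted when a kernel that includes the option is
# configured and built, and it is the conventional way to flag
# licensing or maturity caveats.  A sketch with a hypothetical
# filesystem option (name and message are illustrative only):
#
#gnu/fs/examplefs/examplefs_vfsops.c optional examplefs \
#	warning "kernel contains GPL contaminated examplefs filesystem"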
kern/kern_descrip.c standard kern/kern_dtrace.c optional kdtrace_hooks kern/kern_dump.c standard kern/kern_environment.c standard kern/kern_et.c standard kern/kern_event.c standard kern/kern_exec.c standard kern/kern_exit.c standard kern/kern_fail.c standard kern/kern_ffclock.c standard kern/kern_fork.c standard kern/kern_gzio.c optional gzio kern/kern_hhook.c standard kern/kern_idle.c standard kern/kern_intr.c standard kern/kern_jail.c standard kern/kern_khelp.c standard kern/kern_kthread.c standard kern/kern_ktr.c optional ktr kern/kern_ktrace.c standard kern/kern_linker.c standard kern/kern_lock.c standard kern/kern_lockf.c standard kern/kern_lockstat.c optional kdtrace_hooks kern/kern_loginclass.c standard kern/kern_malloc.c standard kern/kern_mbuf.c standard kern/kern_mib.c standard kern/kern_module.c standard kern/kern_mtxpool.c standard kern/kern_mutex.c standard kern/kern_ntptime.c standard kern/kern_numa.c standard kern/kern_osd.c standard kern/kern_physio.c standard kern/kern_pmc.c standard kern/kern_poll.c optional device_polling kern/kern_priv.c standard kern/kern_proc.c standard kern/kern_procctl.c standard kern/kern_prot.c standard kern/kern_racct.c standard kern/kern_rangelock.c standard kern/kern_rctl.c standard kern/kern_resource.c standard kern/kern_rmlock.c standard kern/kern_rwlock.c standard kern/kern_sdt.c optional kdtrace_hooks kern/kern_sema.c standard kern/kern_sendfile.c standard kern/kern_sharedpage.c standard kern/kern_shutdown.c standard kern/kern_sig.c standard kern/kern_switch.c standard kern/kern_sx.c standard kern/kern_synch.c standard kern/kern_syscalls.c standard kern/kern_sysctl.c standard kern/kern_tc.c standard kern/kern_thr.c standard kern/kern_thread.c standard kern/kern_time.c standard kern/kern_timeout.c standard kern/kern_umtx.c standard kern/kern_uuid.c standard kern/kern_xxx.c standard kern/link_elf.c standard kern/linker_if.m standard kern/md4c.c optional netsmb kern/md5c.c standard kern/p1003_1b.c standard kern/posix4_mib.c standard kern/sched_4bsd.c optional sched_4bsd kern/sched_ule.c optional sched_ule kern/serdev_if.m standard kern/stack_protector.c standard \ compile-with "${NORMAL_C:N-fstack-protector*}" kern/subr_acl_nfs4.c optional ufs_acl | zfs kern/subr_acl_posix1e.c optional ufs_acl kern/subr_autoconf.c standard kern/subr_blist.c standard kern/subr_bus.c standard kern/subr_bus_dma.c standard kern/subr_bufring.c standard kern/subr_capability.c standard kern/subr_clock.c standard kern/subr_counter.c standard kern/subr_devstat.c standard kern/subr_disk.c standard kern/subr_eventhandler.c standard kern/subr_fattime.c standard kern/subr_firmware.c optional firmware kern/subr_hash.c standard kern/subr_hints.c standard kern/subr_kdb.c standard kern/subr_kobj.c standard kern/subr_lock.c standard kern/subr_log.c standard kern/subr_mbpool.c optional libmbpool kern/subr_mchain.c optional libmchain kern/subr_module.c standard kern/subr_msgbuf.c standard kern/subr_param.c standard kern/subr_pcpu.c standard kern/subr_pctrie.c standard kern/subr_power.c standard kern/subr_prf.c standard kern/subr_prof.c standard kern/subr_rman.c standard kern/subr_rtc.c standard kern/subr_sbuf.c standard kern/subr_scanf.c standard kern/subr_sglist.c standard kern/subr_sleepqueue.c standard kern/subr_smp.c standard kern/subr_stack.c optional ddb | stack | ktr kern/subr_taskqueue.c standard kern/subr_terminal.c optional vt kern/subr_trap.c standard kern/subr_turnstile.c standard kern/subr_uio.c standard kern/subr_unit.c standard kern/subr_vmem.c standard 
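# The stack_protector.c entry above adjusts the standard compile
# command with a make(1) variable modifier: ${NORMAL_C:N-fstack-protector*}
# expands ${NORMAL_C} with every word matching -fstack-protector*
# filtered out, so the file that provides the SSP support code is
# itself built without stack-protector instrumentation (arm's
# stdatomic.c uses the same trick with :N-Wmissing-prototypes).
# A sketch of the idiom with a hypothetical file and flag:
#
#kern/subr_example.c standard \
#	compile-with "${NORMAL_C:N-Wcast-qual*}"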
kern/subr_witness.c optional witness kern/sys_capability.c standard kern/sys_generic.c standard kern/sys_pipe.c standard kern/sys_procdesc.c standard kern/sys_process.c standard kern/sys_socket.c standard kern/syscalls.c standard kern/sysv_ipc.c standard kern/sysv_msg.c optional sysvmsg kern/sysv_sem.c optional sysvsem kern/sysv_shm.c optional sysvshm kern/tty.c standard kern/tty_compat.c optional compat_43tty kern/tty_info.c standard kern/tty_inq.c standard kern/tty_outq.c standard kern/tty_pts.c standard kern/tty_tty.c standard kern/tty_ttydisc.c standard kern/uipc_accf.c standard kern/uipc_debug.c optional ddb kern/uipc_domain.c standard kern/uipc_mbuf.c standard kern/uipc_mbuf2.c standard kern/uipc_mbufhash.c standard kern/uipc_mqueue.c optional p1003_1b_mqueue kern/uipc_sem.c optional p1003_1b_semaphores kern/uipc_shm.c standard kern/uipc_sockbuf.c standard kern/uipc_socket.c standard kern/uipc_syscalls.c standard kern/uipc_usrreq.c standard kern/vfs_acl.c standard kern/vfs_aio.c optional vfs_aio kern/vfs_bio.c standard kern/vfs_cache.c standard kern/vfs_cluster.c standard kern/vfs_default.c standard kern/vfs_export.c standard kern/vfs_extattr.c standard kern/vfs_hash.c standard kern/vfs_init.c standard kern/vfs_lookup.c standard kern/vfs_mount.c standard kern/vfs_mountroot.c standard kern/vfs_subr.c standard kern/vfs_syscalls.c standard kern/vfs_vnops.c standard # # Kernel GSS-API # gssd.h optional kgssapi \ dependency "$S/kgssapi/gssd.x" \ compile-with "RPCGEN_CPP='${CPP}' rpcgen -hM $S/kgssapi/gssd.x | grep -v pthread.h > gssd.h" \ no-obj no-implicit-rule before-depend local \ clean "gssd.h" gssd_xdr.c optional kgssapi \ dependency "$S/kgssapi/gssd.x gssd.h" \ compile-with "RPCGEN_CPP='${CPP}' rpcgen -c $S/kgssapi/gssd.x -o gssd_xdr.c" \ no-implicit-rule before-depend local \ clean "gssd_xdr.c" gssd_clnt.c optional kgssapi \ dependency "$S/kgssapi/gssd.x gssd.h" \ compile-with "RPCGEN_CPP='${CPP}' rpcgen -lM $S/kgssapi/gssd.x | grep -v string.h > gssd_clnt.c" \ no-implicit-rule before-depend local \ clean "gssd_clnt.c" kgssapi/gss_accept_sec_context.c optional kgssapi kgssapi/gss_add_oid_set_member.c optional kgssapi kgssapi/gss_acquire_cred.c optional kgssapi kgssapi/gss_canonicalize_name.c optional kgssapi kgssapi/gss_create_empty_oid_set.c optional kgssapi kgssapi/gss_delete_sec_context.c optional kgssapi kgssapi/gss_display_status.c optional kgssapi kgssapi/gss_export_name.c optional kgssapi kgssapi/gss_get_mic.c optional kgssapi kgssapi/gss_init_sec_context.c optional kgssapi kgssapi/gss_impl.c optional kgssapi kgssapi/gss_import_name.c optional kgssapi kgssapi/gss_names.c optional kgssapi kgssapi/gss_pname_to_uid.c optional kgssapi kgssapi/gss_release_buffer.c optional kgssapi kgssapi/gss_release_cred.c optional kgssapi kgssapi/gss_release_name.c optional kgssapi kgssapi/gss_release_oid_set.c optional kgssapi kgssapi/gss_set_cred_option.c optional kgssapi kgssapi/gss_test_oid_set_member.c optional kgssapi kgssapi/gss_unwrap.c optional kgssapi kgssapi/gss_verify_mic.c optional kgssapi kgssapi/gss_wrap.c optional kgssapi kgssapi/gss_wrap_size_limit.c optional kgssapi kgssapi/gssd_prot.c optional kgssapi kgssapi/krb5/krb5_mech.c optional kgssapi kgssapi/krb5/kcrypto.c optional kgssapi kgssapi/krb5/kcrypto_aes.c optional kgssapi kgssapi/krb5/kcrypto_arcfour.c optional kgssapi kgssapi/krb5/kcrypto_des.c optional kgssapi kgssapi/krb5/kcrypto_des3.c optional kgssapi kgssapi/kgss_if.m optional kgssapi kgssapi/gsstest.c optional kgssapi_debug # These files in libkern/ are those needed 
by all architectures.  Some
# of the files in libkern/ are only needed on some architectures, e.g.,
# libkern/divdi3.c is needed by i386 but not alpha.  Also, some of these
# routines may be optimized for a particular platform.  In either case,
# the file should be moved to conf/files.<arch> from here.
#
libkern/arc4random.c standard
libkern/asprintf.c standard
libkern/bcd.c standard
libkern/bsearch.c standard
libkern/crc32.c standard
libkern/explicit_bzero.c standard
libkern/fnmatch.c standard
libkern/iconv.c optional libiconv
libkern/iconv_converter_if.m optional libiconv
libkern/iconv_ucs.c optional libiconv
libkern/iconv_xlat.c optional libiconv
libkern/iconv_xlat16.c optional libiconv
libkern/inet_aton.c standard
libkern/inet_ntoa.c standard
libkern/inet_ntop.c standard
libkern/inet_pton.c standard
libkern/jenkins_hash.c standard
libkern/murmur3_32.c standard
libkern/mcount.c optional profiling-routine
libkern/memcchr.c standard
libkern/memchr.c standard
libkern/memcmp.c standard
libkern/memmem.c optional gdb
libkern/qsort.c standard
libkern/qsort_r.c standard
libkern/random.c standard
libkern/scanc.c standard
libkern/strcasecmp.c standard
libkern/strcat.c standard
libkern/strchr.c standard
libkern/strcmp.c standard
libkern/strcpy.c standard
libkern/strcspn.c standard
libkern/strdup.c standard
libkern/strndup.c standard
libkern/strlcat.c standard
libkern/strlcpy.c standard
libkern/strlen.c standard
libkern/strncmp.c standard
libkern/strncpy.c standard
libkern/strnlen.c standard
libkern/strrchr.c standard
libkern/strsep.c standard
libkern/strspn.c standard
libkern/strstr.c standard
libkern/strtol.c standard
libkern/strtoq.c standard
libkern/strtoul.c standard
libkern/strtouq.c standard
libkern/strvalid.c standard
libkern/timingsafe_bcmp.c standard
libkern/zlib.c optional crypto | geom_uzip | ipsec | \
	mxge | netgraph_deflate | \
	ddb_ctf | gzio | geom_uncompress
net/altq/altq_cbq.c optional altq
net/altq/altq_cdnr.c optional altq
net/altq/altq_codel.c optional altq
net/altq/altq_hfsc.c optional altq
net/altq/altq_fairq.c optional altq
net/altq/altq_priq.c optional altq
net/altq/altq_red.c optional altq
net/altq/altq_rio.c optional altq
net/altq/altq_rmclass.c optional altq
net/altq/altq_subr.c optional altq
net/bpf.c standard
net/bpf_buffer.c optional bpf
net/bpf_jitter.c optional bpf_jitter
net/bpf_filter.c optional bpf | netgraph_bpf
net/bpf_zerocopy.c optional bpf
net/bridgestp.c optional bridge | if_bridge
net/flowtable.c optional flowtable inet | flowtable inet6
net/ieee8023ad_lacp.c optional lagg
net/if.c standard
net/if_arcsubr.c optional arcnet
net/if_atmsubr.c optional atm
net/if_bridge.c optional bridge inet | if_bridge inet
net/if_clone.c standard
net/if_dead.c standard
net/if_debug.c optional ddb
net/if_disc.c optional disc
net/if_edsc.c optional edsc
net/if_enc.c optional enc inet | enc inet6
net/if_epair.c optional epair
net/if_ethersubr.c optional ether
net/if_fddisubr.c optional fddi
net/if_fwsubr.c optional fwip
net/if_gif.c optional gif inet | gif inet6 | \
	netgraph_gif inet | netgraph_gif inet6
net/if_gre.c optional gre inet | gre inet6
net/if_iso88025subr.c optional token
net/if_lagg.c optional lagg
net/if_loop.c optional loop
net/if_llatbl.c standard
net/if_me.c optional me inet
net/if_media.c standard
net/if_mib.c standard
net/if_spppfr.c optional sppp | netgraph_sppp
net/if_spppsubr.c optional sppp | netgraph_sppp
net/if_stf.c optional stf inet inet6
net/if_tun.c optional tun
net/if_tap.c optional tap
net/if_vlan.c optional vlan
net/if_vxlan.c optional vxlan inet | vxlan
inet6 net/mppcc.c optional netgraph_mppc_compression net/mppcd.c optional netgraph_mppc_compression net/netisr.c standard net/pfil.c optional ether | inet net/radix.c standard net/radix_mpath.c standard net/raw_cb.c standard net/raw_usrreq.c standard net/route.c standard net/rss_config.c optional inet rss | inet6 rss net/rtsock.c standard net/slcompress.c optional netgraph_vjc | sppp | \ netgraph_sppp net/toeplitz.c optional inet rss | inet6 rss net/vnet.c optional vimage net80211/ieee80211.c optional wlan net80211/ieee80211_acl.c optional wlan wlan_acl net80211/ieee80211_action.c optional wlan net80211/ieee80211_ageq.c optional wlan net80211/ieee80211_adhoc.c optional wlan \ compile-with "${NORMAL_C} -Wno-unused-function" net80211/ieee80211_ageq.c optional wlan net80211/ieee80211_amrr.c optional wlan | wlan_amrr net80211/ieee80211_crypto.c optional wlan \ compile-with "${NORMAL_C} -Wno-unused-function" net80211/ieee80211_crypto_ccmp.c optional wlan wlan_ccmp net80211/ieee80211_crypto_none.c optional wlan net80211/ieee80211_crypto_tkip.c optional wlan wlan_tkip net80211/ieee80211_crypto_wep.c optional wlan wlan_wep net80211/ieee80211_ddb.c optional wlan ddb net80211/ieee80211_dfs.c optional wlan net80211/ieee80211_freebsd.c optional wlan net80211/ieee80211_hostap.c optional wlan \ compile-with "${NORMAL_C} -Wno-unused-function" net80211/ieee80211_ht.c optional wlan net80211/ieee80211_hwmp.c optional wlan ieee80211_support_mesh net80211/ieee80211_input.c optional wlan net80211/ieee80211_ioctl.c optional wlan net80211/ieee80211_mesh.c optional wlan ieee80211_support_mesh \ compile-with "${NORMAL_C} -Wno-unused-function" net80211/ieee80211_monitor.c optional wlan net80211/ieee80211_node.c optional wlan net80211/ieee80211_output.c optional wlan net80211/ieee80211_phy.c optional wlan net80211/ieee80211_power.c optional wlan net80211/ieee80211_proto.c optional wlan net80211/ieee80211_radiotap.c optional wlan net80211/ieee80211_ratectl.c optional wlan net80211/ieee80211_ratectl_none.c optional wlan net80211/ieee80211_regdomain.c optional wlan net80211/ieee80211_rssadapt.c optional wlan wlan_rssadapt net80211/ieee80211_scan.c optional wlan net80211/ieee80211_scan_sta.c optional wlan net80211/ieee80211_sta.c optional wlan \ compile-with "${NORMAL_C} -Wno-unused-function" net80211/ieee80211_superg.c optional wlan ieee80211_support_superg net80211/ieee80211_scan_sw.c optional wlan net80211/ieee80211_tdma.c optional wlan ieee80211_support_tdma net80211/ieee80211_wds.c optional wlan net80211/ieee80211_xauth.c optional wlan wlan_xauth net80211/ieee80211_alq.c optional wlan ieee80211_alq netgraph/atm/ccatm/ng_ccatm.c optional ngatm_ccatm \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" netgraph/atm/ng_atm.c optional ngatm_atm netgraph/atm/ngatmbase.c optional ngatm_atmbase \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" netgraph/atm/sscfu/ng_sscfu.c optional ngatm_sscfu \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" netgraph/atm/sscop/ng_sscop.c optional ngatm_sscop \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" netgraph/atm/uni/ng_uni.c optional ngatm_uni \ compile-with "${NORMAL_C} -I$S/contrib/ngatm" netgraph/bluetooth/common/ng_bluetooth.c optional netgraph_bluetooth netgraph/bluetooth/drivers/bt3c/ng_bt3c_pccard.c optional netgraph_bluetooth_bt3c netgraph/bluetooth/drivers/h4/ng_h4.c optional netgraph_bluetooth_h4 netgraph/bluetooth/drivers/ubt/ng_ubt.c optional netgraph_bluetooth_ubt usb netgraph/bluetooth/drivers/ubtbcmfw/ubtbcmfw.c optional netgraph_bluetooth_ubtbcmfw usb 
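# The netgraph/atm entries above override compile-with to add a
# private include path; ${NORMAL_C} is the regular compile command,
# and the extra -I lets contrib code keep its own header layout (the
# xz-embedded and OFED entries elsewhere in this file do the same).
# A sketch, with a hypothetical contrib subtree:
#
#contrib/example/example_core.c optional example \
#	compile-with "${NORMAL_C} -I$S/contrib/example/include"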
netgraph/bluetooth/hci/ng_hci_cmds.c optional netgraph_bluetooth_hci netgraph/bluetooth/hci/ng_hci_evnt.c optional netgraph_bluetooth_hci netgraph/bluetooth/hci/ng_hci_main.c optional netgraph_bluetooth_hci netgraph/bluetooth/hci/ng_hci_misc.c optional netgraph_bluetooth_hci netgraph/bluetooth/hci/ng_hci_ulpi.c optional netgraph_bluetooth_hci netgraph/bluetooth/l2cap/ng_l2cap_cmds.c optional netgraph_bluetooth_l2cap netgraph/bluetooth/l2cap/ng_l2cap_evnt.c optional netgraph_bluetooth_l2cap netgraph/bluetooth/l2cap/ng_l2cap_llpi.c optional netgraph_bluetooth_l2cap netgraph/bluetooth/l2cap/ng_l2cap_main.c optional netgraph_bluetooth_l2cap netgraph/bluetooth/l2cap/ng_l2cap_misc.c optional netgraph_bluetooth_l2cap netgraph/bluetooth/l2cap/ng_l2cap_ulpi.c optional netgraph_bluetooth_l2cap netgraph/bluetooth/socket/ng_btsocket.c optional netgraph_bluetooth_socket netgraph/bluetooth/socket/ng_btsocket_hci_raw.c optional netgraph_bluetooth_socket netgraph/bluetooth/socket/ng_btsocket_l2cap.c optional netgraph_bluetooth_socket netgraph/bluetooth/socket/ng_btsocket_l2cap_raw.c optional netgraph_bluetooth_socket netgraph/bluetooth/socket/ng_btsocket_rfcomm.c optional netgraph_bluetooth_socket netgraph/bluetooth/socket/ng_btsocket_sco.c optional netgraph_bluetooth_socket netgraph/netflow/netflow.c optional netgraph_netflow netgraph/netflow/netflow_v9.c optional netgraph_netflow netgraph/netflow/ng_netflow.c optional netgraph_netflow netgraph/ng_UI.c optional netgraph_UI netgraph/ng_async.c optional netgraph_async netgraph/ng_atmllc.c optional netgraph_atmllc netgraph/ng_base.c optional netgraph netgraph/ng_bpf.c optional netgraph_bpf netgraph/ng_bridge.c optional netgraph_bridge netgraph/ng_car.c optional netgraph_car netgraph/ng_cisco.c optional netgraph_cisco netgraph/ng_deflate.c optional netgraph_deflate netgraph/ng_device.c optional netgraph_device netgraph/ng_echo.c optional netgraph_echo netgraph/ng_eiface.c optional netgraph_eiface netgraph/ng_ether.c optional netgraph_ether netgraph/ng_ether_echo.c optional netgraph_ether_echo netgraph/ng_frame_relay.c optional netgraph_frame_relay netgraph/ng_gif.c optional netgraph_gif inet6 | netgraph_gif inet netgraph/ng_gif_demux.c optional netgraph_gif_demux netgraph/ng_hole.c optional netgraph_hole netgraph/ng_iface.c optional netgraph_iface netgraph/ng_ip_input.c optional netgraph_ip_input netgraph/ng_ipfw.c optional netgraph_ipfw inet ipfirewall netgraph/ng_ksocket.c optional netgraph_ksocket netgraph/ng_l2tp.c optional netgraph_l2tp netgraph/ng_lmi.c optional netgraph_lmi netgraph/ng_mppc.c optional netgraph_mppc_compression | \ netgraph_mppc_encryption netgraph/ng_nat.c optional netgraph_nat inet libalias netgraph/ng_one2many.c optional netgraph_one2many netgraph/ng_parse.c optional netgraph netgraph/ng_patch.c optional netgraph_patch netgraph/ng_pipe.c optional netgraph_pipe netgraph/ng_ppp.c optional netgraph_ppp netgraph/ng_pppoe.c optional netgraph_pppoe netgraph/ng_pptpgre.c optional netgraph_pptpgre netgraph/ng_pred1.c optional netgraph_pred1 netgraph/ng_rfc1490.c optional netgraph_rfc1490 netgraph/ng_socket.c optional netgraph_socket netgraph/ng_split.c optional netgraph_split netgraph/ng_sppp.c optional netgraph_sppp netgraph/ng_tag.c optional netgraph_tag netgraph/ng_tcpmss.c optional netgraph_tcpmss netgraph/ng_tee.c optional netgraph_tee netgraph/ng_tty.c optional netgraph_tty netgraph/ng_vjc.c optional netgraph_vjc netgraph/ng_vlan.c optional netgraph_vlan netinet/accf_data.c optional accept_filter_data inet netinet/accf_dns.c optional 
accept_filter_dns inet netinet/accf_http.c optional accept_filter_http inet netinet/if_atm.c optional atm netinet/if_ether.c optional inet ether netinet/igmp.c optional inet netinet/in.c optional inet netinet/in_debug.c optional inet ddb netinet/in_kdtrace.c optional inet | inet6 netinet/ip_carp.c optional inet carp | inet6 carp netinet/in_fib.c optional inet netinet/in_gif.c optional gif inet | netgraph_gif inet netinet/ip_gre.c optional gre inet netinet/ip_id.c optional inet netinet/in_mcast.c optional inet netinet/in_pcb.c optional inet | inet6 netinet/in_pcbgroup.c optional inet pcbgroup | inet6 pcbgroup netinet/in_proto.c optional inet | inet6 netinet/in_rmx.c optional inet netinet/in_rss.c optional inet rss netinet/ip_divert.c optional inet ipdivert ipfirewall netinet/ip_ecn.c optional inet | inet6 netinet/ip_encap.c optional inet | inet6 netinet/ip_fastfwd.c optional inet netinet/ip_icmp.c optional inet | inet6 netinet/ip_input.c optional inet netinet/ip_ipsec.c optional inet ipsec netinet/ip_mroute.c optional mrouting inet netinet/ip_options.c optional inet netinet/ip_output.c optional inet netinet/ip_reass.c optional inet netinet/raw_ip.c optional inet | inet6 netinet/cc/cc.c optional inet | inet6 netinet/cc/cc_newreno.c optional inet | inet6 netinet/sctp_asconf.c optional inet sctp | inet6 sctp netinet/sctp_auth.c optional inet sctp | inet6 sctp netinet/sctp_bsd_addr.c optional inet sctp | inet6 sctp netinet/sctp_cc_functions.c optional inet sctp | inet6 sctp netinet/sctp_crc32.c optional inet sctp | inet6 sctp netinet/sctp_indata.c optional inet sctp | inet6 sctp netinet/sctp_input.c optional inet sctp | inet6 sctp netinet/sctp_output.c optional inet sctp | inet6 sctp netinet/sctp_pcb.c optional inet sctp | inet6 sctp netinet/sctp_peeloff.c optional inet sctp | inet6 sctp netinet/sctp_ss_functions.c optional inet sctp | inet6 sctp netinet/sctp_syscalls.c optional inet sctp | inet6 sctp netinet/sctp_sysctl.c optional inet sctp | inet6 sctp netinet/sctp_timer.c optional inet sctp | inet6 sctp netinet/sctp_usrreq.c optional inet sctp | inet6 sctp netinet/sctputil.c optional inet sctp | inet6 sctp netinet/siftr.c optional inet siftr alq | inet6 siftr alq netinet/tcp_debug.c optional tcpdebug netinet/tcp_fastopen.c optional inet tcp_rfc7413 | inet6 tcp_rfc7413 netinet/tcp_hostcache.c optional inet | inet6 netinet/tcp_input.c optional inet | inet6 netinet/tcp_lro.c optional inet | inet6 netinet/tcp_output.c optional inet | inet6 netinet/tcp_offload.c optional tcp_offload inet | tcp_offload inet6 netinet/tcp_pcap.c optional inet tcppcap | inet6 tcppcap netinet/tcp_reass.c optional inet | inet6 netinet/tcp_sack.c optional inet | inet6 netinet/tcp_subr.c optional inet | inet6 netinet/tcp_syncache.c optional inet | inet6 netinet/tcp_timer.c optional inet | inet6 netinet/tcp_timewait.c optional inet | inet6 netinet/tcp_usrreq.c optional inet | inet6 netinet/udp_usrreq.c optional inet | inet6 netinet/libalias/alias.c optional libalias inet | netgraph_nat inet netinet/libalias/alias_db.c optional libalias inet | netgraph_nat inet netinet/libalias/alias_mod.c optional libalias | netgraph_nat netinet/libalias/alias_proxy.c optional libalias inet | netgraph_nat inet netinet/libalias/alias_util.c optional libalias inet | netgraph_nat inet netinet/libalias/alias_sctp.c optional libalias inet | netgraph_nat inet netinet6/dest6.c optional inet6 netinet6/frag6.c optional inet6 netinet6/icmp6.c optional inet6 netinet6/in6.c optional inet6 netinet6/in6_cksum.c optional inet6 netinet6/in6_fib.c 
optional inet6 netinet6/in6_gif.c optional gif inet6 | netgraph_gif inet6 netinet6/in6_ifattach.c optional inet6 netinet6/in6_mcast.c optional inet6 netinet6/in6_pcb.c optional inet6 netinet6/in6_pcbgroup.c optional inet6 pcbgroup netinet6/in6_proto.c optional inet6 netinet6/in6_rmx.c optional inet6 netinet6/in6_rss.c optional inet6 rss netinet6/in6_src.c optional inet6 netinet6/ip6_forward.c optional inet6 netinet6/ip6_gre.c optional gre inet6 netinet6/ip6_id.c optional inet6 netinet6/ip6_input.c optional inet6 netinet6/ip6_mroute.c optional mrouting inet6 netinet6/ip6_output.c optional inet6 netinet6/ip6_ipsec.c optional inet6 ipsec netinet6/mld6.c optional inet6 netinet6/nd6.c optional inet6 netinet6/nd6_nbr.c optional inet6 netinet6/nd6_rtr.c optional inet6 netinet6/raw_ip6.c optional inet6 netinet6/route6.c optional inet6 netinet6/scope6.c optional inet6 netinet6/sctp6_usrreq.c optional inet6 sctp netinet6/udp6_usrreq.c optional inet6 netipsec/ipsec.c optional ipsec inet | ipsec inet6 netipsec/ipsec_input.c optional ipsec inet | ipsec inet6 netipsec/ipsec_mbuf.c optional ipsec inet | ipsec inet6 netipsec/ipsec_output.c optional ipsec inet | ipsec inet6 netipsec/key.c optional ipsec inet | ipsec inet6 netipsec/key_debug.c optional ipsec inet | ipsec inet6 netipsec/keysock.c optional ipsec inet | ipsec inet6 netipsec/xform_ah.c optional ipsec inet | ipsec inet6 netipsec/xform_esp.c optional ipsec inet | ipsec inet6 netipsec/xform_ipcomp.c optional ipsec inet | ipsec inet6 netipsec/xform_tcp.c optional ipsec inet tcp_signature | \ ipsec inet6 tcp_signature netnatm/natm.c optional natm netnatm/natm_pcb.c optional natm netnatm/natm_proto.c optional natm netpfil/ipfw/dn_heap.c optional inet dummynet netpfil/ipfw/dn_sched_fifo.c optional inet dummynet netpfil/ipfw/dn_sched_prio.c optional inet dummynet netpfil/ipfw/dn_sched_qfq.c optional inet dummynet netpfil/ipfw/dn_sched_rr.c optional inet dummynet netpfil/ipfw/dn_sched_wf2q.c optional inet dummynet netpfil/ipfw/ip_dummynet.c optional inet dummynet netpfil/ipfw/ip_dn_io.c optional inet dummynet netpfil/ipfw/ip_dn_glue.c optional inet dummynet netpfil/ipfw/ip_fw2.c optional inet ipfirewall netpfil/ipfw/ip_fw_dynamic.c optional inet ipfirewall netpfil/ipfw/ip_fw_log.c optional inet ipfirewall netpfil/ipfw/ip_fw_pfil.c optional inet ipfirewall netpfil/ipfw/ip_fw_sockopt.c optional inet ipfirewall netpfil/ipfw/ip_fw_table.c optional inet ipfirewall netpfil/ipfw/ip_fw_table_algo.c optional inet ipfirewall netpfil/ipfw/ip_fw_table_value.c optional inet ipfirewall netpfil/ipfw/ip_fw_iface.c optional inet ipfirewall netpfil/ipfw/ip_fw_nat.c optional inet ipfirewall_nat netpfil/pf/if_pflog.c optional pflog pf inet netpfil/pf/if_pfsync.c optional pfsync pf inet netpfil/pf/pf.c optional pf inet netpfil/pf/pf_if.c optional pf inet netpfil/pf/pf_ioctl.c optional pf inet netpfil/pf/pf_lb.c optional pf inet netpfil/pf/pf_norm.c optional pf inet netpfil/pf/pf_osfp.c optional pf inet netpfil/pf/pf_ruleset.c optional pf inet netpfil/pf/pf_table.c optional pf inet netpfil/pf/in4_cksum.c optional pf inet netsmb/smb_conn.c optional netsmb netsmb/smb_crypt.c optional netsmb netsmb/smb_dev.c optional netsmb netsmb/smb_iod.c optional netsmb netsmb/smb_rq.c optional netsmb netsmb/smb_smb.c optional netsmb netsmb/smb_subr.c optional netsmb netsmb/smb_trantcp.c optional netsmb netsmb/smb_usr.c optional netsmb nfs/bootp_subr.c optional bootp nfscl nfs/krpc_subr.c optional bootp nfscl nfs/nfs_diskless.c optional nfscl nfs_root nfs/nfs_fha.c optional nfsd 
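# The dependency expressions above follow the usual convention:
# space-separated terms on one side of a | must all be present in the
# kernel configuration, while | separates alternatives any one of
# which pulls the file in.  A few hypothetical entries spelling that
# out (names are illustrative only):
#
#netpfil/example/example_filter.c optional example inet
#	# all listed terms required (AND)
#kern/subr_example.c optional foo | bar
#	# any alternative suffices (OR)
#netinet/example_ipsec.c optional example inet | example inet6
#	# AND binds tighter than |, as in the sctp entries above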
nfs/nfs_lock.c optional nfscl | nfslockd | nfsd nfs/nfs_nfssvc.c optional nfscl | nfsd nlm/nlm_advlock.c optional nfslockd | nfsd nlm/nlm_prot_clnt.c optional nfslockd | nfsd nlm/nlm_prot_impl.c optional nfslockd | nfsd nlm/nlm_prot_server.c optional nfslockd | nfsd nlm/nlm_prot_svc.c optional nfslockd | nfsd nlm/nlm_prot_xdr.c optional nfslockd | nfsd nlm/sm_inter_xdr.c optional nfslockd | nfsd # Linux Kernel Programming Interface compat/linuxkpi/common/src/linux_kmod.c optional compat_linuxkpi \ no-depend compile-with "${LINUXKPI_C}" compat/linuxkpi/common/src/linux_compat.c optional compat_linuxkpi \ no-depend compile-with "${LINUXKPI_C}" compat/linuxkpi/common/src/linux_pci.c optional compat_linuxkpi pci \ no-depend compile-with "${LINUXKPI_C}" compat/linuxkpi/common/src/linux_idr.c optional compat_linuxkpi \ no-depend compile-with "${LINUXKPI_C}" compat/linuxkpi/common/src/linux_radix.c optional compat_linuxkpi \ no-depend compile-with "${LINUXKPI_C}" compat/linuxkpi/common/src/linux_usb.c optional compat_linuxkpi usb \ no-depend compile-with "${LINUXKPI_C}" # OpenFabrics Enterprise Distribution (Infiniband) ofed/drivers/infiniband/core/addr.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/agent.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/cache.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" # XXX Mad.c must be ordered before cm.c for sysinit sets to occur in # the correct order. ofed/drivers/infiniband/core/mad.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/cm.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/ -Wno-unused-function" ofed/drivers/infiniband/core/cma.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/device.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/fmr_pool.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/iwcm.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/mad_rmpp.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/multicast.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/packer.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/peer_mem.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/sa_query.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/smi.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/sysfs.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/ucm.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/ucma.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/ud_header.c optional ofed \ 
no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/umem.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/user_mad.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/uverbs_cmd.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/uverbs_main.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/uverbs_marshall.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/core/verbs.c optional ofed \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/core/" ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c optional ipoib \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/ipoib/" #ofed/drivers/infiniband/ulp/ipoib/ipoib_fs.c optional ipoib \ # no-depend \ # compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/ipoib/" ofed/drivers/infiniband/ulp/ipoib/ipoib_ib.c optional ipoib \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/ipoib/" ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c optional ipoib \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/ipoib/" ofed/drivers/infiniband/ulp/ipoib/ipoib_multicast.c optional ipoib \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/ipoib/" ofed/drivers/infiniband/ulp/ipoib/ipoib_verbs.c optional ipoib \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/ipoib/" #ofed/drivers/infiniband/ulp/ipoib/ipoib_vlan.c optional ipoib \ # no-depend \ # compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/ipoib/" ofed/drivers/infiniband/ulp/sdp/sdp_bcopy.c optional sdp inet \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/sdp/" ofed/drivers/infiniband/ulp/sdp/sdp_main.c optional sdp inet \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/sdp/" ofed/drivers/infiniband/ulp/sdp/sdp_rx.c optional sdp inet \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/sdp/" ofed/drivers/infiniband/ulp/sdp/sdp_cma.c optional sdp inet \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/sdp/" ofed/drivers/infiniband/ulp/sdp/sdp_tx.c optional sdp inet \ no-depend \ compile-with "${OFED_C} -I$S/ofed/drivers/infiniband/ulp/sdp/" ofed/drivers/infiniband/hw/mlx4/alias_GUID.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/mcg.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/sysfs.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/cm.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/ah.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/cq.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/doorbell.c optional mlx4ib \ no-depend 
obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/mad.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/main.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/mlx4_exp.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/mr.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/qp.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/srq.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/infiniband/hw/mlx4/wc.c optional mlx4ib \ no-depend obj-prefix "mlx4ib_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/infiniband/hw/mlx4/" ofed/drivers/net/mlx4/alloc.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/catas.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/cmd.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/cq.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/eq.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/fw.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/icm.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/intf.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/main.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/mcg.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/ -Wno-unused" ofed/drivers/net/mlx4/mr.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/pd.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/port.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/profile.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/qp.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/reset.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/sense.c 
optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/srq.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/resource_tracker.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/sys_tune.c optional mlx4ib | mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/en_cq.c optional mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/en_main.c optional mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/en_netdev.c optional mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/en_port.c optional mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/en_resources.c optional mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/en_rx.c optional mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" ofed/drivers/net/mlx4/en_tx.c optional mlxen \ no-depend obj-prefix "mlx4_" \ compile-with "${OFED_C_NOIMP} -I$S/ofed/drivers/net/mlx4/" dev/mlx5/mlx5_core/mlx5_alloc.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_cmd.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_cq.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_eq.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_flow_table.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_fw.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_health.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_mad.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_main.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_mcg.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_mr.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_pagealloc.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_pd.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_port.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_qp.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_srq.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_transobj.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_uar.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_vport.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_core/mlx5_wq.c optional mlx5 pci \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_en/mlx5_en_ethtool.c optional mlx5en pci inet inet6 \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_en/mlx5_en_main.c optional mlx5en pci inet inet6 \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_en/mlx5_en_tx.c optional mlx5en pci inet inet6 \ no-depend compile-with 
"${OFED_C}" dev/mlx5/mlx5_en/mlx5_en_flow_table.c optional mlx5en pci inet inet6 \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_en/mlx5_en_rx.c optional mlx5en pci inet inet6 \ no-depend compile-with "${OFED_C}" dev/mlx5/mlx5_en/mlx5_en_txrx.c optional mlx5en pci inet inet6 \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_allocator.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_av.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_catas.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_cmd.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_cq.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_eq.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_mad.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_main.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_mcg.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_memfree.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_mr.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_pd.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_profile.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_provider.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_qp.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_reset.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_srq.c optional mthca \ no-depend compile-with "${OFED_C}" ofed/drivers/infiniband/hw/mthca/mthca_uar.c optional mthca \ no-depend compile-with "${OFED_C}" # crypto support opencrypto/cast.c optional crypto | ipsec opencrypto/criov.c optional crypto | ipsec opencrypto/crypto.c optional crypto | ipsec opencrypto/cryptodev.c optional cryptodev opencrypto/cryptodev_if.m optional crypto | ipsec opencrypto/cryptosoft.c optional crypto | ipsec opencrypto/cryptodeflate.c optional crypto | ipsec opencrypto/gmac.c optional crypto | ipsec opencrypto/gfmult.c optional crypto | ipsec opencrypto/rmd160.c optional crypto | ipsec opencrypto/skipjack.c optional crypto | ipsec opencrypto/xform.c optional crypto | ipsec rpc/auth_none.c optional krpc | nfslockd | nfscl | nfsd rpc/auth_unix.c optional krpc | nfslockd | nfscl | nfsd rpc/authunix_prot.c optional krpc | nfslockd | nfscl | nfsd rpc/clnt_bck.c optional krpc | nfslockd | nfscl | nfsd rpc/clnt_dg.c optional krpc | nfslockd | nfscl | nfsd rpc/clnt_rc.c optional krpc | nfslockd | nfscl | nfsd rpc/clnt_vc.c optional krpc | nfslockd | nfscl | nfsd rpc/getnetconfig.c optional krpc | nfslockd | nfscl | nfsd rpc/replay.c optional krpc | nfslockd | nfscl | nfsd rpc/rpc_callmsg.c optional krpc | nfslockd | nfscl | nfsd rpc/rpc_generic.c optional krpc | nfslockd | nfscl | nfsd rpc/rpc_prot.c optional krpc | nfslockd | nfscl | nfsd rpc/rpcb_clnt.c optional krpc | nfslockd | nfscl | nfsd rpc/rpcb_prot.c optional krpc | nfslockd | nfscl | nfsd rpc/svc.c optional krpc | nfslockd | nfscl | nfsd rpc/svc_auth.c optional krpc | nfslockd | nfscl | nfsd rpc/svc_auth_unix.c optional 
krpc | nfslockd | nfscl | nfsd rpc/svc_dg.c optional krpc | nfslockd | nfscl | nfsd rpc/svc_generic.c optional krpc | nfslockd | nfscl | nfsd rpc/svc_vc.c optional krpc | nfslockd | nfscl | nfsd rpc/rpcsec_gss/rpcsec_gss.c optional krpc kgssapi | nfslockd kgssapi | nfscl kgssapi | nfsd kgssapi rpc/rpcsec_gss/rpcsec_gss_conf.c optional krpc kgssapi | nfslockd kgssapi | nfscl kgssapi | nfsd kgssapi rpc/rpcsec_gss/rpcsec_gss_misc.c optional krpc kgssapi | nfslockd kgssapi | nfscl kgssapi | nfsd kgssapi rpc/rpcsec_gss/rpcsec_gss_prot.c optional krpc kgssapi | nfslockd kgssapi | nfscl kgssapi | nfsd kgssapi rpc/rpcsec_gss/svc_rpcsec_gss.c optional krpc kgssapi | nfslockd kgssapi | nfscl kgssapi | nfsd kgssapi security/audit/audit.c optional audit security/audit/audit_arg.c optional audit security/audit/audit_bsm.c optional audit security/audit/audit_bsm_klib.c optional audit security/audit/audit_pipe.c optional audit security/audit/audit_syscalls.c standard security/audit/audit_trigger.c optional audit security/audit/audit_worker.c optional audit security/audit/bsm_domain.c optional audit security/audit/bsm_errno.c optional audit security/audit/bsm_fcntl.c optional audit security/audit/bsm_socket_type.c optional audit security/audit/bsm_token.c optional audit security/mac/mac_audit.c optional mac audit security/mac/mac_cred.c optional mac security/mac/mac_framework.c optional mac security/mac/mac_inet.c optional mac inet | mac inet6 security/mac/mac_inet6.c optional mac inet6 security/mac/mac_label.c optional mac security/mac/mac_net.c optional mac security/mac/mac_pipe.c optional mac security/mac/mac_posix_sem.c optional mac security/mac/mac_posix_shm.c optional mac security/mac/mac_priv.c optional mac security/mac/mac_process.c optional mac security/mac/mac_socket.c optional mac security/mac/mac_syscalls.c standard security/mac/mac_system.c optional mac security/mac/mac_sysv_msg.c optional mac security/mac/mac_sysv_sem.c optional mac security/mac/mac_sysv_shm.c optional mac security/mac/mac_vfs.c optional mac security/mac_biba/mac_biba.c optional mac_biba security/mac_bsdextended/mac_bsdextended.c optional mac_bsdextended security/mac_bsdextended/ugidfw_system.c optional mac_bsdextended security/mac_bsdextended/ugidfw_vnode.c optional mac_bsdextended security/mac_ifoff/mac_ifoff.c optional mac_ifoff security/mac_lomac/mac_lomac.c optional mac_lomac security/mac_mls/mac_mls.c optional mac_mls security/mac_none/mac_none.c optional mac_none security/mac_partition/mac_partition.c optional mac_partition security/mac_portacl/mac_portacl.c optional mac_portacl security/mac_seeotheruids/mac_seeotheruids.c optional mac_seeotheruids security/mac_stub/mac_stub.c optional mac_stub security/mac_test/mac_test.c optional mac_test teken/teken.c optional sc | vt ufs/ffs/ffs_alloc.c optional ffs ufs/ffs/ffs_balloc.c optional ffs ufs/ffs/ffs_inode.c optional ffs ufs/ffs/ffs_snapshot.c optional ffs ufs/ffs/ffs_softdep.c optional ffs ufs/ffs/ffs_subr.c optional ffs ufs/ffs/ffs_tables.c optional ffs ufs/ffs/ffs_vfsops.c optional ffs ufs/ffs/ffs_vnops.c optional ffs ufs/ffs/ffs_rawread.c optional ffs directio ufs/ffs/ffs_suspend.c optional ffs ufs/ufs/ufs_acl.c optional ffs ufs/ufs/ufs_bmap.c optional ffs ufs/ufs/ufs_dirhash.c optional ffs ufs/ufs/ufs_extattr.c optional ffs ufs/ufs/ufs_gjournal.c optional ffs UFS_GJOURNAL ufs/ufs/ufs_inode.c optional ffs ufs/ufs/ufs_lookup.c optional ffs ufs/ufs/ufs_quota.c optional ffs ufs/ufs/ufs_vfsops.c optional ffs ufs/ufs/ufs_vnops.c optional ffs vm/default_pager.c standard 
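# The gssd.h/gssd_xdr.c/gssd_clnt.c rules earlier in this file show
# the pattern for build-time generated sources: dependency names the
# rpcgen input, no-obj marks a product with no object file of its own,
# no-implicit-rule and local keep the generated file in the kernel
# build directory under the rule given, before-depend orders the
# generation ahead of "make depend", and clean removes the artifact.
# A condensed sketch for a hypothetical foo.x protocol definition:
#
#foo.h optional kfoo \
#	dependency "$S/kfoo/foo.x" \
#	compile-with "RPCGEN_CPP='${CPP}' rpcgen -hM $S/kfoo/foo.x > foo.h" \
#	no-obj no-implicit-rule before-depend local \
#	clean "foo.h"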
vm/device_pager.c standard
vm/phys_pager.c standard
vm/redzone.c optional DEBUG_REDZONE
vm/sg_pager.c standard
vm/swap_pager.c standard
vm/uma_core.c standard
vm/uma_dbg.c standard
vm/memguard.c optional DEBUG_MEMGUARD
vm/vm_fault.c standard
vm/vm_glue.c standard
vm/vm_init.c standard
vm/vm_kern.c standard
vm/vm_map.c standard
vm/vm_meter.c standard
vm/vm_mmap.c standard
vm/vm_object.c standard
vm/vm_page.c standard
vm/vm_pageout.c standard
vm/vm_pager.c standard
vm/vm_phys.c standard
vm/vm_radix.c standard
vm/vm_reserv.c standard
vm/vm_domain.c standard
vm/vm_unix.c standard
vm/vm_zeroidle.c standard
vm/vnode_pager.c standard
xen/features.c optional xenhvm
xen/xenbus/xenbus_if.m optional xenhvm
xen/xenbus/xenbus.c optional xenhvm
xen/xenbus/xenbusb_if.m optional xenhvm
xen/xenbus/xenbusb.c optional xenhvm
xen/xenbus/xenbusb_front.c optional xenhvm
xen/xenbus/xenbusb_back.c optional xenhvm
xen/xenmem/xenmem_if.m optional xenhvm
xdr/xdr.c optional krpc | nfslockd | nfscl | nfsd
xdr/xdr_array.c optional krpc | nfslockd | nfscl | nfsd
xdr/xdr_mbuf.c optional krpc | nfslockd | nfscl | nfsd
xdr/xdr_mem.c optional krpc | nfslockd | nfscl | nfsd
xdr/xdr_reference.c optional krpc | nfslockd | nfscl | nfsd
xdr/xdr_sizeof.c optional krpc | nfslockd | nfscl | nfsd
Index: projects/clang380-import/sys/conf/files.arm
===================================================================
--- projects/clang380-import/sys/conf/files.arm	(revision 294776)
+++ projects/clang380-import/sys/conf/files.arm	(revision 294777)
@@ -1,138 +1,139 @@
# $FreeBSD$
arm/arm/autoconf.c standard
arm/arm/bcopy_page.S standard
arm/arm/bcopyinout.S standard
arm/arm/blockio.S standard
arm/arm/bus_space_asm_generic.S standard
arm/arm/bus_space_base.c optional fdt
arm/arm/bus_space_generic.c standard
arm/arm/busdma_machdep.c optional !armv6
arm/arm/busdma_machdep-v6.c optional armv6
arm/arm/copystr.S standard
arm/arm/cpufunc.c standard
arm/arm/cpufunc_asm.S standard
arm/arm/cpufunc_asm_arm9.S optional cpu_arm9
arm/arm/cpufunc_asm_arm10.S optional cpu_arm9e
arm/arm/cpufunc_asm_arm11.S optional cpu_arm1176
arm/arm/cpufunc_asm_arm11x6.S optional cpu_arm1176
arm/arm/cpufunc_asm_armv4.S optional cpu_arm9 | cpu_arm9e | cpu_fa526 | cpu_xscale_80321 | cpu_xscale_pxa2x0 | cpu_xscale_ixp425 | cpu_xscale_80219 | cpu_xscale_81342
arm/arm/cpufunc_asm_armv5_ec.S optional cpu_arm9e
arm/arm/cpufunc_asm_armv6.S optional cpu_arm1176
arm/arm/cpufunc_asm_armv7.S optional cpu_cortexa | cpu_krait | cpu_mv_pj4b
arm/arm/cpufunc_asm_fa526.S optional cpu_fa526
arm/arm/cpufunc_asm_pj4b.S optional cpu_mv_pj4b
arm/arm/cpufunc_asm_sheeva.S optional cpu_arm9e
arm/arm/cpufunc_asm_xscale.S optional cpu_xscale_80321 | cpu_xscale_pxa2x0 | cpu_xscale_ixp425 | cpu_xscale_80219 | cpu_xscale_81342
arm/arm/cpufunc_asm_xscale_c3.S optional cpu_xscale_81342
arm/arm/cpuinfo.c standard
arm/arm/cpu_asm-v6.S optional armv6
arm/arm/db_disasm.c optional ddb
arm/arm/db_interface.c optional ddb
arm/arm/db_trace.c optional ddb
+arm/arm/debug_monitor.c optional ddb armv6
arm/arm/devmap.c standard
arm/arm/disassem.c optional ddb
arm/arm/dump_machdep.c standard
arm/arm/elf_machdep.c standard
arm/arm/elf_note.S standard
arm/arm/exception.S standard
arm/arm/fiq.c standard
arm/arm/fiq_subr.S standard
arm/arm/fusu.S standard
arm/arm/gdb_machdep.c optional gdb
arm/arm/generic_timer.c optional generic_timer
arm/arm/gic.c optional gic
arm/arm/hdmi_if.m optional hdmi
arm/arm/identcpu.c standard
arm/arm/in_cksum.c optional inet | inet6
arm/arm/in_cksum_arm.S optional inet | inet6
arm/arm/intr.c
optional !arm_intrng kern/subr_intr.c optional arm_intrng arm/arm/locore.S standard no-obj arm/arm/machdep.c standard arm/arm/machdep_intr.c standard arm/arm/mem.c optional mem arm/arm/minidump_machdep.c optional mem arm/arm/mp_machdep.c optional smp arm/arm/mpcore_timer.c optional mpcore_timer arm/arm/nexus.c standard arm/arm/ofw_machdep.c optional fdt arm/arm/physmem.c standard kern/pic_if.m optional arm_intrng arm/arm/pl190.c optional pl190 arm/arm/pl310.c optional pl310 arm/arm/platform.c optional platform arm/arm/platform_if.m optional platform arm/arm/pmap.c optional !armv6 arm/arm/pmap-v6.c optional armv6 !arm_new_pmap arm/arm/pmap-v6-new.c optional armv6 arm_new_pmap arm/arm/pmu.c optional pmu | fdt hwpmc arm/arm/sc_machdep.c optional sc arm/arm/setcpsr.S standard arm/arm/setstack.s standard arm/arm/stack_machdep.c optional ddb | stack arm/arm/stdatomic.c standard \ compile-with "${NORMAL_C:N-Wmissing-prototypes}" arm/arm/support.S standard arm/arm/swtch.S standard arm/arm/sys_machdep.c standard arm/arm/syscall.c standard arm/arm/trap.c optional !armv6 arm/arm/trap-v6.c optional armv6 arm/arm/uio_machdep.c standard arm/arm/undefined.c standard arm/arm/unwind.c optional ddb | kdtrace_hooks arm/arm/vm_machdep.c standard arm/arm/vfp.c standard board_id.h standard \ dependency "$S/arm/conf/genboardid.awk $S/arm/conf/mach-types" \ compile-with "${AWK} -f $S/arm/conf/genboardid.awk $S/arm/conf/mach-types > board_id.h" \ no-obj no-implicit-rule before-depend \ clean "board_id.h" cddl/compat/opensolaris/kern/opensolaris_atomic.c optional zfs | dtrace compile-with "${CDDL_C}" cddl/dev/dtrace/arm/dtrace_asm.S optional dtrace compile-with "${DTRACE_S}" cddl/dev/dtrace/arm/dtrace_subr.c optional dtrace compile-with "${DTRACE_C}" cddl/dev/fbt/arm/fbt_isa.c optional dtrace_fbt | dtraceall compile-with "${FBT_C}" crypto/blowfish/bf_enc.c optional crypto | ipsec crypto/des/des_enc.c optional crypto | ipsec | netsmb dev/dwc/if_dwc.c optional dwc dev/dwc/if_dwc_if.m optional dwc dev/fb/fb.c optional sc dev/fdt/fdt_arm_platform.c optional platform fdt dev/hwpmc/hwpmc_arm.c optional hwpmc dev/hwpmc/hwpmc_armv7.c optional hwpmc armv6 dev/psci/psci.c optional psci dev/psci/psci_arm.S optional psci dev/syscons/scgfbrndr.c optional sc dev/syscons/scterm-teken.c optional sc dev/syscons/scvtb.c optional sc dev/uart/uart_cpu_fdt.c optional uart fdt font.h optional sc \ compile-with "uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x16.fnt && file2c 'u_char dflt_font_16[16*256] = {' '};' < ${SC_DFLT_FONT}-8x16 > font.h && uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x14.fnt && file2c 'u_char dflt_font_14[14*256] = {' '};' < ${SC_DFLT_FONT}-8x14 >> font.h && uudecode < /usr/share/syscons/fonts/${SC_DFLT_FONT}-8x8.fnt && file2c 'u_char dflt_font_8[8*256] = {' '};' < ${SC_DFLT_FONT}-8x8 >> font.h" \ no-obj no-implicit-rule before-depend \ clean "font.h ${SC_DFLT_FONT}-8x14 ${SC_DFLT_FONT}-8x16 ${SC_DFLT_FONT}-8x8" kern/subr_busdma_bufalloc.c standard kern/subr_sfbuf.c standard libkern/arm/aeabi_unwind.c standard libkern/arm/divsi3.S standard libkern/arm/ffs.S standard libkern/arm/ldivmod.S standard libkern/arm/ldivmod_helper.c standard libkern/arm/memclr.S standard libkern/arm/memcpy.S standard libkern/arm/memset.S standard libkern/arm/muldi3.c standard libkern/ashldi3.c standard libkern/ashrdi3.c standard libkern/divdi3.c standard libkern/ffsl.c standard libkern/ffsll.c standard libkern/fls.c standard libkern/flsl.c standard libkern/flsll.c standard libkern/lshrdi3.c standard libkern/moddi3.c 
standard libkern/qdivrem.c standard libkern/ucmpdi2.c standard libkern/udivdi3.c standard libkern/umoddi3.c standard Index: projects/clang380-import/sys/conf/options =================================================================== --- projects/clang380-import/sys/conf/options (revision 294776) +++ projects/clang380-import/sys/conf/options (revision 294777) @@ -1,965 +1,966 @@ # $FreeBSD$ # # On the handling of kernel options # # All kernel options should be listed in NOTES, with suitable # descriptions. Negative options (options that make some code not # compile) should be commented out; LINT (generated from NOTES) should # compile as much code as possible. Try to structure option-using # code so that a single option only switches code on, or only switches # code off, to make it possible to have a full compile-test. If # necessary, you can check for COMPILING_LINT to get maximum code # coverage. # # All new options shall also be listed in either "conf/options" or # "conf/options.<machine>". Options that affect a single source-file # <xxx>.[c|s] should be directed into "opt_<xxx>.h", while options # that affect multiple files should either go in "opt_global.h" if # this is a kernel-wide option (used just about everywhere), or in # "opt_<option-name-in-lower-case>.h" if it affects only some files. # Note that the effect of listing only an option without a # header-file-name in conf/options (and cousins) is that the last # convention is followed. # # This handling scheme is not yet fully implemented. # # # Format of this file: # Option name filename # # If filename is missing, the default is # opt_<name-of-option-in-lower-case>.h AAC_DEBUG opt_aac.h AACRAID_DEBUG opt_aacraid.h AHC_ALLOW_MEMIO opt_aic7xxx.h AHC_TMODE_ENABLE opt_aic7xxx.h AHC_DUMP_EEPROM opt_aic7xxx.h AHC_DEBUG opt_aic7xxx.h AHC_DEBUG_OPTS opt_aic7xxx.h AHC_REG_PRETTY_PRINT opt_aic7xxx.h AHD_DEBUG opt_aic79xx.h AHD_DEBUG_OPTS opt_aic79xx.h AHD_TMODE_ENABLE opt_aic79xx.h AHD_REG_PRETTY_PRINT opt_aic79xx.h ADW_ALLOW_MEMIO opt_adw.h TWA_DEBUG opt_twa.h TWA_FLASH_FIRMWARE opt_twa.h # Debugging options. ALT_BREAK_TO_DEBUGGER opt_kdb.h BREAK_TO_DEBUGGER opt_kdb.h DDB DDB_BUFR_SIZE opt_ddb.h DDB_CAPTURE_DEFAULTBUFSIZE opt_ddb.h DDB_CAPTURE_MAXBUFSIZE opt_ddb.h DDB_CTF opt_ddb.h DDB_NUMSYM opt_ddb.h GDB KDB opt_global.h KDB_TRACE opt_kdb.h KDB_UNATTENDED opt_kdb.h KLD_DEBUG opt_kld.h SYSCTL_DEBUG opt_sysctl.h EARLY_PRINTF opt_global.h TEXTDUMP_PREFERRED opt_ddb.h TEXTDUMP_VERBOSE opt_ddb.h # Miscellaneous options.
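Each entry above pairs an option name with the opt_*.h header that config(8) generates for it; DDB_BUFR_SIZE, for instance, lands in opt_ddb.h. A minimal sketch of how a source file consumes such a header follows (the generated header contents shown in the comment are illustrative, not copied from config(8) output):

    /*
     * Sketch: a kernel config containing "options DDB_BUFR_SIZE=8192"
     * leads config(8) to emit an opt_ddb.h along the lines of
     *     #define DDB_BUFR_SIZE 8192
     * A source file includes the generated header before testing the
     * option, typically providing a fallback when it is unset.
     */
    #include "opt_ddb.h"

    #ifndef DDB_BUFR_SIZE
    #define DDB_BUFR_SIZE 0   /* buffering disabled unless configured */
    #endif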
ADAPTIVE_LOCKMGRS ALQ ALTERA_SDCARD_FAST_SIM opt_altera_sdcard.h ATSE_CFI_HACK opt_cfi.h AUDIT opt_global.h BOOTHOWTO opt_global.h BOOTVERBOSE opt_global.h CALLOUT_PROFILING CAPABILITIES opt_capsicum.h CAPABILITY_MODE opt_capsicum.h COMPAT_43 opt_compat.h COMPAT_43TTY opt_compat.h COMPAT_FREEBSD4 opt_compat.h COMPAT_FREEBSD5 opt_compat.h COMPAT_FREEBSD6 opt_compat.h COMPAT_FREEBSD7 opt_compat.h COMPAT_FREEBSD9 opt_compat.h COMPAT_FREEBSD10 opt_compat.h COMPAT_CLOUDABI64 opt_dontuse.h COMPAT_LINUXKPI opt_compat.h COMPILING_LINT opt_global.h CY_PCI_FASTINTR DEADLKRES opt_watchdog.h +EXT_RESOURCES opt_global.h DIRECTIO FILEMON opt_dontuse.h FFCLOCK FULL_PREEMPTION opt_sched.h GZIO opt_gzio.h IMAGACT_BINMISC opt_dontuse.h IPI_PREEMPTION opt_sched.h GEOM_AES opt_geom.h GEOM_BDE opt_geom.h GEOM_BSD opt_geom.h GEOM_CACHE opt_geom.h GEOM_CONCAT opt_geom.h GEOM_ELI opt_geom.h GEOM_FOX opt_geom.h GEOM_GATE opt_geom.h GEOM_JOURNAL opt_geom.h GEOM_LABEL opt_geom.h GEOM_LABEL_GPT opt_geom.h GEOM_LINUX_LVM opt_geom.h GEOM_MAP opt_geom.h GEOM_MBR opt_geom.h GEOM_MIRROR opt_geom.h GEOM_MOUNTVER opt_geom.h GEOM_MULTIPATH opt_geom.h GEOM_NOP opt_geom.h GEOM_PART_APM opt_geom.h GEOM_PART_BSD opt_geom.h GEOM_PART_BSD64 opt_geom.h GEOM_PART_EBR opt_geom.h GEOM_PART_EBR_COMPAT opt_geom.h GEOM_PART_GPT opt_geom.h GEOM_PART_LDM opt_geom.h GEOM_PART_MBR opt_geom.h GEOM_PART_PC98 opt_geom.h GEOM_PART_VTOC8 opt_geom.h GEOM_PC98 opt_geom.h GEOM_RAID opt_geom.h GEOM_RAID3 opt_geom.h GEOM_SHSEC opt_geom.h GEOM_STRIPE opt_geom.h GEOM_SUNLABEL opt_geom.h GEOM_UNCOMPRESS opt_geom.h GEOM_UNCOMPRESS_DEBUG opt_geom.h GEOM_UZIP opt_geom.h GEOM_VINUM opt_geom.h GEOM_VIRSTOR opt_geom.h GEOM_VOL opt_geom.h GEOM_ZERO opt_geom.h KDTRACE_HOOKS opt_global.h KDTRACE_FRAME opt_kdtrace.h KN_HASHSIZE opt_kqueue.h KSTACK_MAX_PAGES KSTACK_PAGES KSTACK_USAGE_PROF KTRACE KTRACE_REQUEST_POOL opt_ktrace.h LIBICONV MAC opt_global.h MAC_BIBA opt_dontuse.h MAC_BSDEXTENDED opt_dontuse.h MAC_IFOFF opt_dontuse.h MAC_LOMAC opt_dontuse.h MAC_MLS opt_dontuse.h MAC_NONE opt_dontuse.h MAC_PARTITION opt_dontuse.h MAC_PORTACL opt_dontuse.h MAC_SEEOTHERUIDS opt_dontuse.h MAC_STATIC opt_mac.h MAC_STUB opt_dontuse.h MAC_TEST opt_dontuse.h MD_ROOT opt_md.h MD_ROOT_FSTYPE opt_md.h MD_ROOT_SIZE opt_md.h MFI_DEBUG opt_mfi.h MFI_DECODE_LOG opt_mfi.h MPROF_BUFFERS opt_mprof.h MPROF_HASH_SIZE opt_mprof.h NEW_PCIB opt_global.h NO_ADAPTIVE_MUTEXES opt_adaptive_mutexes.h NO_ADAPTIVE_RWLOCKS NO_ADAPTIVE_SX NO_EVENTTIMERS opt_timer.h NO_SYSCTL_DESCR opt_global.h NSWBUF_MIN opt_swap.h MBUF_PACKET_ZONE_DISABLE opt_global.h PANIC_REBOOT_WAIT_TIME opt_panic.h PCI_IOV opt_global.h PPC_DEBUG opt_ppc.h PPC_PROBE_CHIPSET opt_ppc.h PPS_SYNC opt_ntp.h PREEMPTION opt_sched.h QUOTA SCHED_4BSD opt_sched.h SCHED_STATS opt_sched.h SCHED_ULE opt_sched.h SLEEPQUEUE_PROFILING SLHCI_DEBUG opt_slhci.h SPX_HACK STACK opt_stack.h SUIDDIR MSGMNB opt_sysvipc.h MSGMNI opt_sysvipc.h MSGSEG opt_sysvipc.h MSGSSZ opt_sysvipc.h MSGTQL opt_sysvipc.h SEMMNI opt_sysvipc.h SEMMNS opt_sysvipc.h SEMMNU opt_sysvipc.h SEMMSL opt_sysvipc.h SEMOPM opt_sysvipc.h SEMUME opt_sysvipc.h SHMALL opt_sysvipc.h SHMMAX opt_sysvipc.h SHMMAXPGS opt_sysvipc.h SHMMIN opt_sysvipc.h SHMMNI opt_sysvipc.h SHMSEG opt_sysvipc.h SYSVMSG opt_sysvipc.h SYSVSEM opt_sysvipc.h SYSVSHM opt_sysvipc.h SW_WATCHDOG opt_watchdog.h TURNSTILE_PROFILING UMTX_PROFILING VFS_AIO VERBOSE_SYSINIT WLCACHE opt_wavelan.h WLDEBUG opt_wavelan.h # POSIX kernel options P1003_1B_MQUEUE opt_posix.h P1003_1B_SEMAPHORES opt_posix.h _KPOSIX_PRIORITY_SCHEDULING 
 
opt_posix.h # Do we want the config file compiled into the kernel? INCLUDE_CONFIG_FILE opt_config.h # Options for static filesystems. These should only be used at config # time, since the corresponding lkms cannot work if there are any static # dependencies. Unusability is enforced by hiding the defines for the # options in a never-included header. AUTOFS opt_dontuse.h CD9660 opt_dontuse.h EXT2FS opt_dontuse.h FDESCFS opt_dontuse.h FFS opt_dontuse.h FUSE opt_dontuse.h MSDOSFS opt_dontuse.h NANDFS opt_dontuse.h NULLFS opt_dontuse.h PROCFS opt_dontuse.h PSEUDOFS opt_dontuse.h REISERFS opt_dontuse.h SMBFS opt_dontuse.h TMPFS opt_dontuse.h UDF opt_dontuse.h UNIONFS opt_dontuse.h ZFS opt_dontuse.h # Pseudofs debugging PSEUDOFS_TRACE opt_pseudofs.h # In-kernel GSS-API KGSSAPI opt_kgssapi.h KGSSAPI_DEBUG opt_kgssapi.h # These static filesystems have one slightly bogus static dependency in # sys/i386/i386/autoconf.c. If any of these filesystems are # statically compiled into the kernel, code for mounting them as root # filesystems will be enabled - but look below. # NFSCL - client # NFSD - server NFSCL opt_nfs.h NFSD opt_nfs.h # filesystems and libiconv bridge CD9660_ICONV opt_dontuse.h MSDOSFS_ICONV opt_dontuse.h UDF_ICONV opt_dontuse.h # If you are following the conditions in the copyright, # you can enable soft-updates which will speed up a lot of things # and make the system safer from crashes at the same time. # Otherwise a STUB module will be compiled in. SOFTUPDATES opt_ffs.h # On small, embedded systems, it can be useful to turn off support for # snapshots. It saves about 30-40k for a feature that would be lightly # used, if it is used at all. NO_FFS_SNAPSHOT opt_ffs.h # Enabling this option turns on support for Access Control Lists in UFS, # which can be used to support high security configurations. Depends on # UFS_EXTATTR. UFS_ACL opt_ufs.h # Enabling this option turns on support for extended attributes in UFS-based # filesystems, which can be used to support high security configurations # as well as new filesystem features. UFS_EXTATTR opt_ufs.h UFS_EXTATTR_AUTOSTART opt_ufs.h # Enable fast hash lookups for large directories on UFS-based filesystems. UFS_DIRHASH opt_ufs.h # Enable gjournal-based UFS journal. UFS_GJOURNAL opt_ufs.h # We plan to remove the static dependencies above, with a # <filesystem>_ROOT option to control if it is usable as root. This list # allows these options to be present in config files already (though # they won't make any difference yet). NFS_ROOT opt_nfsroot.h # SMB/CIFS requester NETSMB opt_netsmb.h # Options used only in subr_param.c. HZ opt_param.h MAXFILES opt_param.h NBUF opt_param.h NSFBUFS opt_param.h VM_BCACHE_SIZE_MAX opt_param.h VM_SWZONE_SIZE_MAX opt_param.h MAXUSERS DFLDSIZ opt_param.h MAXDSIZ opt_param.h MAXSSIZ opt_param.h # Generic SCSI options. CAM_MAX_HIGHPOWER opt_cam.h CAMDEBUG opt_cam.h CAM_DEBUG_COMPILE opt_cam.h CAM_DEBUG_DELAY opt_cam.h CAM_DEBUG_BUS opt_cam.h CAM_DEBUG_TARGET opt_cam.h CAM_DEBUG_LUN opt_cam.h CAM_DEBUG_FLAGS opt_cam.h CAM_BOOT_DELAY opt_cam.h SCSI_DELAY opt_scsi.h SCSI_NO_SENSE_STRINGS opt_scsi.h SCSI_NO_OP_STRINGS opt_scsi.h # Options used only in cam/ata/ata_da.c ADA_TEST_FAILURE opt_ada.h ATA_STATIC_ID opt_ada.h # Options used only in cam/scsi/scsi_cd.c CHANGER_MIN_BUSY_SECONDS opt_cd.h CHANGER_MAX_BUSY_SECONDS opt_cd.h # Options used only in cam/scsi/scsi_sa.c.
SA_IO_TIMEOUT opt_sa.h SA_SPACE_TIMEOUT opt_sa.h SA_REWIND_TIMEOUT opt_sa.h SA_ERASE_TIMEOUT opt_sa.h SA_1FM_AT_EOD opt_sa.h # Options used only in cam/scsi/scsi_pt.c SCSI_PT_DEFAULT_TIMEOUT opt_pt.h # Options used only in cam/scsi/scsi_ses.c SES_ENABLE_PASSTHROUGH opt_ses.h # Options used in dev/sym/ (Symbios SCSI driver). SYM_SETUP_LP_PROBE_MAP opt_sym.h #-Low Priority Probe Map (bits) # Allows the ncr to take precedence # 1 (1<<0) -> 810a, 860 # 2 (1<<1) -> 825a, 875, 885, 895 # 4 (1<<2) -> 895a, 896, 1510d SYM_SETUP_SCSI_DIFF opt_sym.h #-HVD support for 825a, 875, 885 # disabled:0 (default), enabled:1 SYM_SETUP_PCI_PARITY opt_sym.h #-PCI parity checking # disabled:0, enabled:1 (default) SYM_SETUP_MAX_LUN opt_sym.h #-Number of LUNs supported # default:8, range:[1..64] # Options used only in dev/ncr/* SCSI_NCR_DEBUG opt_ncr.h SCSI_NCR_MAX_SYNC opt_ncr.h SCSI_NCR_MAX_WIDE opt_ncr.h SCSI_NCR_MYADDR opt_ncr.h # Options used only in dev/isp/* ISP_TARGET_MODE opt_isp.h ISP_FW_CRASH_DUMP opt_isp.h ISP_DEFAULT_ROLES opt_isp.h ISP_INTERNAL_TARGET opt_isp.h # Options used only in dev/iscsi ISCSI_INITIATOR_DEBUG opt_iscsi_initiator.h # Net stuff. ACCEPT_FILTER_DATA ACCEPT_FILTER_DNS ACCEPT_FILTER_HTTP ALTQ opt_global.h ALTQ_CBQ opt_altq.h ALTQ_CDNR opt_altq.h ALTQ_CODEL opt_altq.h ALTQ_DEBUG opt_altq.h ALTQ_HFSC opt_altq.h ALTQ_FAIRQ opt_altq.h ALTQ_NOPCC opt_altq.h ALTQ_PRIQ opt_altq.h ALTQ_RED opt_altq.h ALTQ_RIO opt_altq.h BOOTP opt_bootp.h BOOTP_BLOCKSIZE opt_bootp.h BOOTP_COMPAT opt_bootp.h BOOTP_NFSROOT opt_bootp.h BOOTP_NFSV3 opt_bootp.h BOOTP_WIRED_TO opt_bootp.h DEVICE_POLLING DUMMYNET opt_ipdn.h INET opt_inet.h INET6 opt_inet6.h IPDIVERT IPFILTER opt_ipfilter.h IPFILTER_DEFAULT_BLOCK opt_ipfilter.h IPFILTER_LOG opt_ipfilter.h IPFILTER_LOOKUP opt_ipfilter.h IPFIREWALL opt_ipfw.h IPFIREWALL_DEFAULT_TO_ACCEPT opt_ipfw.h IPFIREWALL_NAT opt_ipfw.h IPFIREWALL_VERBOSE opt_ipfw.h IPFIREWALL_VERBOSE_LIMIT opt_ipfw.h IPSEC opt_ipsec.h IPSEC_DEBUG opt_ipsec.h IPSEC_FILTERTUNNEL opt_ipsec.h IPSEC_NAT_T opt_ipsec.h IPSTEALTH KRPC LIBALIAS LIBMBPOOL LIBMCHAIN MBUF_PROFILING MBUF_STRESS_TEST MROUTING opt_mrouting.h NFSLOCKD PCBGROUP opt_pcbgroup.h PF_DEFAULT_TO_DROP opt_pf.h RADIX_MPATH opt_mpath.h ROUTETABLES opt_route.h RSS opt_rss.h SLIP_IFF_OPTS opt_slip.h TCPDEBUG TCPPCAP opt_global.h SIFTR TCP_OFFLOAD opt_inet.h # Enable code to dispatch TCP offloading TCP_RFC7413 opt_inet.h TCP_RFC7413_MAX_KEYS opt_inet.h TCP_SIGNATURE opt_inet.h VLAN_ARRAY opt_vlan.h XBONEHACK FLOWTABLE opt_route.h FLOWTABLE_HASH_ALL opt_route.h # # SCTP # SCTP opt_sctp.h SCTP_DEBUG opt_sctp.h # Enable debug printfs SCTP_WITH_NO_CSUM opt_sctp.h # Use this at your peril SCTP_LOCK_LOGGING opt_sctp.h # Log to KTR lock activity SCTP_MBUF_LOGGING opt_sctp.h # Log to KTR general mbuf aloc/free SCTP_MBCNT_LOGGING opt_sctp.h # Log to KTR mbcnt activity SCTP_PACKET_LOGGING opt_sctp.h # Log to a packet buffer last N packets SCTP_LTRACE_CHUNKS opt_sctp.h # Log to KTR chunks processed SCTP_LTRACE_ERRORS opt_sctp.h # Log to KTR error returns. SCTP_USE_PERCPU_STAT opt_sctp.h # Use per cpu stats. SCTP_MCORE_INPUT opt_sctp.h # Have multiple input threads for input mbufs SCTP_LOCAL_TRACE_BUF opt_sctp.h # Use tracebuffer exported via sysctl SCTP_DETAILED_STR_STATS opt_sctp.h # Use per PR-SCTP policy stream stats # # # # Netgraph(4). Use option NETGRAPH to enable the base netgraph code. # Each netgraph node type can be either be compiled into the kernel # or loaded dynamically. To get the former, include the corresponding # option below. 
Each type has its own man page, e.g. ng_async(4). NETGRAPH NETGRAPH_DEBUG opt_netgraph.h NETGRAPH_ASYNC opt_netgraph.h NETGRAPH_ATMLLC opt_netgraph.h NETGRAPH_ATM_ATMPIF opt_netgraph.h NETGRAPH_BLUETOOTH opt_netgraph.h NETGRAPH_BLUETOOTH_BT3C opt_netgraph.h NETGRAPH_BLUETOOTH_H4 opt_netgraph.h NETGRAPH_BLUETOOTH_HCI opt_netgraph.h NETGRAPH_BLUETOOTH_L2CAP opt_netgraph.h NETGRAPH_BLUETOOTH_SOCKET opt_netgraph.h NETGRAPH_BLUETOOTH_UBT opt_netgraph.h NETGRAPH_BLUETOOTH_UBTBCMFW opt_netgraph.h NETGRAPH_BPF opt_netgraph.h NETGRAPH_BRIDGE opt_netgraph.h NETGRAPH_CAR opt_netgraph.h NETGRAPH_CISCO opt_netgraph.h NETGRAPH_DEFLATE opt_netgraph.h NETGRAPH_DEVICE opt_netgraph.h NETGRAPH_ECHO opt_netgraph.h NETGRAPH_EIFACE opt_netgraph.h NETGRAPH_ETHER opt_netgraph.h NETGRAPH_ETHER_ECHO opt_netgraph.h NETGRAPH_FEC opt_netgraph.h NETGRAPH_FRAME_RELAY opt_netgraph.h NETGRAPH_GIF opt_netgraph.h NETGRAPH_GIF_DEMUX opt_netgraph.h NETGRAPH_HOLE opt_netgraph.h NETGRAPH_IFACE opt_netgraph.h NETGRAPH_IP_INPUT opt_netgraph.h NETGRAPH_IPFW opt_netgraph.h NETGRAPH_KSOCKET opt_netgraph.h NETGRAPH_L2TP opt_netgraph.h NETGRAPH_LMI opt_netgraph.h # MPPC compression requires proprietary files (not included) NETGRAPH_MPPC_COMPRESSION opt_netgraph.h NETGRAPH_MPPC_ENCRYPTION opt_netgraph.h NETGRAPH_NAT opt_netgraph.h NETGRAPH_NETFLOW opt_netgraph.h NETGRAPH_ONE2MANY opt_netgraph.h NETGRAPH_PATCH opt_netgraph.h NETGRAPH_PIPE opt_netgraph.h NETGRAPH_PPP opt_netgraph.h NETGRAPH_PPPOE opt_netgraph.h NETGRAPH_PPTPGRE opt_netgraph.h NETGRAPH_PRED1 opt_netgraph.h NETGRAPH_RFC1490 opt_netgraph.h NETGRAPH_SOCKET opt_netgraph.h NETGRAPH_SPLIT opt_netgraph.h NETGRAPH_SPPP opt_netgraph.h NETGRAPH_TAG opt_netgraph.h NETGRAPH_TCPMSS opt_netgraph.h NETGRAPH_TEE opt_netgraph.h NETGRAPH_TTY opt_netgraph.h NETGRAPH_UI opt_netgraph.h NETGRAPH_VJC opt_netgraph.h NETGRAPH_VLAN opt_netgraph.h # NgATM options NGATM_ATM opt_netgraph.h NGATM_ATMBASE opt_netgraph.h NGATM_SSCOP opt_netgraph.h NGATM_SSCFU opt_netgraph.h NGATM_UNI opt_netgraph.h NGATM_CCATM opt_netgraph.h # DRM options DRM_DEBUG opt_drm.h TI_SF_BUF_JUMBO opt_ti.h TI_JUMBO_HDRSPLIT opt_ti.h # XXX Conflict: # of devices vs network protocol (Native ATM). # This makes "atm.h" unusable. NATM # DPT driver debug flags DPT_MEASURE_PERFORMANCE opt_dpt.h DPT_RESET_HBA opt_dpt.h # Misc debug flags. Most of these should probably be replaced with # 'DEBUG', and then let people recompile just the interesting modules # with 'make CC="cc -DDEBUG"'. 
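The recompile-with-DEBUG approach suggested in the comment above usually reduces to a conditional print macro in each module. A minimal sketch of the common pattern, assuming nothing beyond the kernel's printf (the DPRINTF name is illustrative):

    /* Compiled in only when DEBUG is defined, e.g. make CC="cc -DDEBUG". */
    #ifdef DEBUG
    #define DPRINTF(fmt, ...) printf("%s: " fmt, __func__, ##__VA_ARGS__)
    #else
    #define DPRINTF(fmt, ...) do { } while (0)
    #endif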
CLUSTERDEBUG opt_debug_cluster.h DEBUG_1284 opt_ppb_1284.h VP0_DEBUG opt_vpo.h LPT_DEBUG opt_lpt.h PLIP_DEBUG opt_plip.h LOCKF_DEBUG opt_debug_lockf.h SI_DEBUG opt_debug_si.h IFMEDIA_DEBUG opt_ifmedia.h # Fb options FB_DEBUG opt_fb.h FB_INSTALL_CDEV opt_fb.h # ppbus related options PERIPH_1284 opt_ppb_1284.h DONTPROBE_1284 opt_ppb_1284.h # smbus related options ENABLE_ALART opt_intpm.h # These cause changes all over the kernel BLKDEV_IOSIZE opt_global.h BURN_BRIDGES opt_global.h DEBUG opt_global.h DEBUG_LOCKS opt_global.h DEBUG_VFS_LOCKS opt_global.h DFLTPHYS opt_global.h DIAGNOSTIC opt_global.h INVARIANT_SUPPORT opt_global.h INVARIANTS opt_global.h MAXCPU opt_global.h MAXMEMDOM opt_global.h MAXPHYS opt_global.h MCLSHIFT opt_global.h MUTEX_DEBUG opt_global.h MUTEX_NOINLINE opt_global.h LOCK_PROFILING opt_global.h LOCK_PROFILING_FAST opt_global.h MSIZE opt_global.h REGRESSION opt_global.h RWLOCK_NOINLINE opt_global.h SX_NOINLINE opt_global.h VFS_BIO_DEBUG opt_global.h # These are VM related options VM_KMEM_SIZE opt_vm.h VM_KMEM_SIZE_SCALE opt_vm.h VM_KMEM_SIZE_MAX opt_vm.h VM_NRESERVLEVEL opt_vm.h VM_LEVEL_0_ORDER opt_vm.h NO_SWAPPING opt_vm.h MALLOC_MAKE_FAILURES opt_vm.h MALLOC_PROFILE opt_vm.h MALLOC_DEBUG_MAXZONES opt_vm.h # The MemGuard replacement allocator used for tamper-after-free detection DEBUG_MEMGUARD opt_vm.h # The RedZone malloc(9) protection DEBUG_REDZONE opt_vm.h # Standard SMP options SMP opt_global.h # Size of the kernel message buffer MSGBUF_SIZE opt_msgbuf.h # NFS options NFS_MINATTRTIMO opt_nfs.h NFS_MAXATTRTIMO opt_nfs.h NFS_MINDIRATTRTIMO opt_nfs.h NFS_MAXDIRATTRTIMO opt_nfs.h NFS_DEBUG opt_nfs.h # For the Bt848/Bt848A/Bt849/Bt878/Bt879 driver OVERRIDE_CARD opt_bktr.h OVERRIDE_TUNER opt_bktr.h OVERRIDE_DBX opt_bktr.h OVERRIDE_MSP opt_bktr.h BROOKTREE_SYSTEM_DEFAULT opt_bktr.h BROOKTREE_ALLOC_PAGES opt_bktr.h BKTR_OVERRIDE_CARD opt_bktr.h BKTR_OVERRIDE_TUNER opt_bktr.h BKTR_OVERRIDE_DBX opt_bktr.h BKTR_OVERRIDE_MSP opt_bktr.h BKTR_SYSTEM_DEFAULT opt_bktr.h BKTR_ALLOC_PAGES opt_bktr.h BKTR_USE_PLL opt_bktr.h BKTR_GPIO_ACCESS opt_bktr.h BKTR_NO_MSP_RESET opt_bktr.h BKTR_430_FX_MODE opt_bktr.h BKTR_SIS_VIA_MODE opt_bktr.h BKTR_USE_FREEBSD_SMBUS opt_bktr.h BKTR_NEW_MSP34XX_DRIVER opt_bktr.h # Options for uart(4) UART_PPS_ON_CTS opt_uart.h UART_POLL_FREQ opt_uart.h UART_DEV_TOLERANCE_PCT opt_uart.h # options for bus/device framework BUS_DEBUG opt_bus.h # options for USB support USB_DEBUG opt_usb.h USB_HOST_ALIGN opt_usb.h USB_REQ_DEBUG opt_usb.h USB_TEMPLATE opt_usb.h USB_VERBOSE opt_usb.h USB_DMA_SINGLE_ALLOC opt_usb.h USB_EHCI_BIG_ENDIAN_DESC opt_usb.h U3G_DEBUG opt_u3g.h UKBD_DFLT_KEYMAP opt_ukbd.h UPLCOM_INTR_INTERVAL opt_uplcom.h UVSCOM_DEFAULT_OPKTSIZE opt_uvscom.h UVSCOM_INTR_INTERVAL opt_uvscom.h # Embedded system options INIT_PATH ROOTDEVNAME FDC_DEBUG opt_fdc.h PCFCLOCK_VERBOSE opt_pcfclock.h PCFCLOCK_MAX_RETRIES opt_pcfclock.h KTR opt_global.h KTR_ALQ opt_ktr.h KTR_MASK opt_ktr.h KTR_CPUMASK opt_ktr.h KTR_COMPILE opt_global.h KTR_BOOT_ENTRIES opt_global.h KTR_ENTRIES opt_global.h KTR_VERBOSE opt_ktr.h WITNESS opt_global.h WITNESS_KDB opt_witness.h WITNESS_NO_VNODE opt_witness.h WITNESS_SKIPSPIN opt_witness.h WITNESS_COUNT opt_witness.h OPENSOLARIS_WITNESS opt_global.h # options for ACPI support ACPI_DEBUG opt_acpi.h ACPI_MAX_TASKS opt_acpi.h ACPI_MAX_THREADS opt_acpi.h ACPI_DMAR opt_acpi.h DEV_ACPI opt_acpi.h # ISA support DEV_ISA opt_isa.h ISAPNP opt_isa.h # various 'device presence' options. 
DEV_BPF opt_bpf.h DEV_CARP opt_carp.h DEV_MCA opt_mca.h DEV_NETMAP opt_global.h DEV_PCI opt_pci.h DEV_PF opt_pf.h DEV_PFLOG opt_pf.h DEV_PFSYNC opt_pf.h DEV_RANDOM opt_global.h DEV_SPLASH opt_splash.h DEV_VLAN opt_vlan.h # EISA support DEV_EISA opt_eisa.h EISA_SLOTS opt_eisa.h # ed driver ED_HPP opt_ed.h ED_3C503 opt_ed.h ED_SIC opt_ed.h # bce driver BCE_DEBUG opt_bce.h BCE_NVRAM_WRITE_SUPPORT opt_bce.h SOCKBUF_DEBUG opt_global.h # options for ubsec driver UBSEC_DEBUG opt_ubsec.h UBSEC_RNDTEST opt_ubsec.h UBSEC_NO_RNG opt_ubsec.h # options for hifn driver HIFN_DEBUG opt_hifn.h HIFN_RNDTEST opt_hifn.h # options for safenet driver SAFE_DEBUG opt_safe.h SAFE_NO_RNG opt_safe.h SAFE_RNDTEST opt_safe.h # syscons/vt options MAXCONS opt_syscons.h SC_ALT_MOUSE_IMAGE opt_syscons.h SC_CUT_SPACES2TABS opt_syscons.h SC_CUT_SEPCHARS opt_syscons.h SC_DEBUG_LEVEL opt_syscons.h SC_DFLT_FONT opt_syscons.h SC_DISABLE_KDBKEY opt_syscons.h SC_DISABLE_REBOOT opt_syscons.h SC_HISTORY_SIZE opt_syscons.h SC_KERNEL_CONS_ATTR opt_syscons.h SC_KERNEL_CONS_REV_ATTR opt_syscons.h SC_MOUSE_CHAR opt_syscons.h SC_NO_CUTPASTE opt_syscons.h SC_NO_FONT_LOADING opt_syscons.h SC_NO_HISTORY opt_syscons.h SC_NO_MODE_CHANGE opt_syscons.h SC_NO_SUSPEND_VTYSWITCH opt_syscons.h SC_NO_SYSMOUSE opt_syscons.h SC_NORM_ATTR opt_syscons.h SC_NORM_REV_ATTR opt_syscons.h SC_PIXEL_MODE opt_syscons.h SC_RENDER_DEBUG opt_syscons.h SC_TWOBUTTON_MOUSE opt_syscons.h VT_ALT_TO_ESC_HACK opt_syscons.h VT_FB_DEFAULT_WIDTH opt_syscons.h VT_FB_DEFAULT_HEIGHT opt_syscons.h VT_MAXWINDOWS opt_syscons.h VT_TWOBUTTON_MOUSE opt_syscons.h DEV_SC opt_syscons.h DEV_VT opt_syscons.h # teken terminal emulator options TEKEN_CONS25 opt_teken.h TEKEN_UTF8 opt_teken.h TERMINAL_KERN_ATTR opt_teken.h TERMINAL_NORM_ATTR opt_teken.h # options for printf PRINTF_BUFR_SIZE opt_printf.h # kbd options KBD_DISABLE_KEYMAP_LOAD opt_kbd.h KBD_INSTALL_CDEV opt_kbd.h KBD_MAXRETRY opt_kbd.h KBD_MAXWAIT opt_kbd.h KBD_RESETDELAY opt_kbd.h KBDIO_DEBUG opt_kbd.h # options for the Atheros driver ATH_DEBUG opt_ath.h ATH_TXBUF opt_ath.h ATH_RXBUF opt_ath.h ATH_DIAGAPI opt_ath.h ATH_TX99_DIAG opt_ath.h ATH_ENABLE_11N opt_ath.h ATH_ENABLE_DFS opt_ath.h ATH_EEPROM_FIRMWARE opt_ath.h ATH_ENABLE_RADIOTAP_VENDOR_EXT opt_ath.h ATH_DEBUG_ALQ opt_ath.h ATH_KTR_INTR_DEBUG opt_ath.h # options for the Atheros hal AH_SUPPORT_AR5416 opt_ah.h # XXX For now, this breaks non-AR9130 chipsets, so only use it # XXX when actually targetting AR9130. 
AH_SUPPORT_AR9130 opt_ah.h # This is required for AR933x SoC support AH_SUPPORT_AR9330 opt_ah.h AH_SUPPORT_AR9340 opt_ah.h AH_SUPPORT_QCA9530 opt_ah.h AH_SUPPORT_QCA9550 opt_ah.h AH_DEBUG opt_ah.h AH_ASSERT opt_ah.h AH_DEBUG_ALQ opt_ah.h AH_REGOPS_FUNC opt_ah.h AH_WRITE_REGDOMAIN opt_ah.h AH_DEBUG_COUNTRY opt_ah.h AH_WRITE_EEPROM opt_ah.h AH_PRIVATE_DIAG opt_ah.h AH_NEED_DESC_SWAP opt_ah.h AH_USE_INIPDGAIN opt_ah.h AH_MAXCHAN opt_ah.h AH_RXCFG_SDMAMW_4BYTES opt_ah.h AH_INTERRUPT_DEBUGGING opt_ah.h # AR5416 and later interrupt mitigation # XXX do not use this for AR9130 AH_AR5416_INTERRUPT_MITIGATION opt_ah.h # options for the Broadcom BCM43xx driver (bwi) BWI_DEBUG opt_bwi.h BWI_DEBUG_VERBOSE opt_bwi.h # options for the Marvell 8335 wireless driver MALO_DEBUG opt_malo.h MALO_TXBUF opt_malo.h MALO_RXBUF opt_malo.h # options for the Marvell wireless driver MWL_DEBUG opt_mwl.h MWL_TXBUF opt_mwl.h MWL_RXBUF opt_mwl.h MWL_DIAGAPI opt_mwl.h MWL_AGGR_SIZE opt_mwl.h MWL_TX_NODROP opt_mwl.h # Options for the Intel 802.11ac wireless driver IWM_DEBUG opt_iwm.h # Options for the Intel 802.11n wireless driver IWN_DEBUG opt_iwn.h # Options for the Intel 3945ABG wireless driver WPI_DEBUG opt_wpi.h # dcons options DCONS_BUF_SIZE opt_dcons.h DCONS_POLL_HZ opt_dcons.h DCONS_FORCE_CONSOLE opt_dcons.h DCONS_FORCE_GDB opt_dcons.h # HWPMC options HWPMC_DEBUG opt_global.h HWPMC_HOOKS HWPMC_MIPS_BACKTRACE opt_hwpmc_hooks.h # XBOX options for FreeBSD/i386, but some files are MI XBOX opt_xbox.h # Interrupt filtering INTR_FILTER # 802.11 support layer IEEE80211_DEBUG opt_wlan.h IEEE80211_DEBUG_REFCNT opt_wlan.h IEEE80211_AMPDU_AGE opt_wlan.h IEEE80211_SUPPORT_MESH opt_wlan.h IEEE80211_SUPPORT_SUPERG opt_wlan.h IEEE80211_SUPPORT_TDMA opt_wlan.h IEEE80211_ALQ opt_wlan.h IEEE80211_DFS_DEBUG opt_wlan.h # 802.11 TDMA support TDMA_SLOTLEN_DEFAULT opt_tdma.h TDMA_SLOTCNT_DEFAULT opt_tdma.h TDMA_BINTVAL_DEFAULT opt_tdma.h TDMA_TXRATE_11B_DEFAULT opt_tdma.h TDMA_TXRATE_11G_DEFAULT opt_tdma.h TDMA_TXRATE_11A_DEFAULT opt_tdma.h TDMA_TXRATE_TURBO_DEFAULT opt_tdma.h TDMA_TXRATE_HALF_DEFAULT opt_tdma.h TDMA_TXRATE_QUARTER_DEFAULT opt_tdma.h TDMA_TXRATE_11NA_DEFAULT opt_tdma.h TDMA_TXRATE_11NG_DEFAULT opt_tdma.h # VideoMode PICKMODE_DEBUG opt_videomode.h # Network stack virtualization options VIMAGE opt_global.h VNET_DEBUG opt_global.h # Common Flash Interface (CFI) options CFI_SUPPORT_STRATAFLASH opt_cfi.h CFI_ARMEDANDDANGEROUS opt_cfi.h # Sound options SND_DEBUG opt_snd.h SND_DIAGNOSTIC opt_snd.h SND_FEEDER_MULTIFORMAT opt_snd.h SND_FEEDER_FULL_MULTIFORMAT opt_snd.h SND_FEEDER_RATE_HP opt_snd.h SND_PCM_64 opt_snd.h SND_OLDSTEREO opt_snd.h X86BIOS # Flattened device tree options FDT opt_platform.h FDT_DTB_STATIC opt_platform.h # OFED Infiniband stack OFED opt_ofed.h OFED_DEBUG_INIT opt_ofed.h SDP opt_ofed.h SDP_DEBUG opt_ofed.h IPOIB opt_ofed.h IPOIB_DEBUG opt_ofed.h IPOIB_CM opt_ofed.h # Resource Accounting RACCT opt_global.h RACCT_DEFAULT_TO_DISABLED opt_global.h # Resource Limits RCTL opt_global.h # Random number generator(s) # Which CSPRNG hash we get. # If Yarrow is not chosen, Fortuna is selected. RANDOM_YARROW opt_global.h # With this, no entropy processor is loaded, but the entropy # harvesting infrastructure is present. This means an entropy # processor may be loaded as a module. RANDOM_LOADABLE opt_global.h # This turns on high-rate and potentially expensive harvesting in # the uma slab allocator. 
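Since the RANDOM_* knobs live in opt_global.h, the CSPRNG choice described above reduces to a plain preprocessor test; a sketch of the selection, with the macro name illustrative rather than taken from the tree:

    /* Yarrow if requested; otherwise Fortuna is the default CSPRNG. */
    #if defined(RANDOM_YARROW)
    #define RANDOM_CSPRNG_NAME "yarrow"
    #else
    #define RANDOM_CSPRNG_NAME "fortuna"
    #endif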
RANDOM_ENABLE_UMA opt_global.h # Intel em(4) driver EM_MULTIQUEUE opt_em.h Index: projects/clang380-import/sys/conf =================================================================== --- projects/clang380-import/sys/conf (revision 294776) +++ projects/clang380-import/sys/conf (revision 294777) Property changes on: projects/clang380-import/sys/conf ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys/conf:r294599-294776 Index: projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb.h =================================================================== --- projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb.h (revision 294776) +++ projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb.h (revision 294777) @@ -1,177 +1,178 @@ /************************************************************************** Copyright (c) 2007, 2008 Chelsio Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Neither the name of the Chelsio Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. $FreeBSD$ ***************************************************************************/ #ifndef __IWCH_H__ #define __IWCH_H__ struct iwch_pd; struct iwch_cq; struct iwch_qp; struct iwch_mr; enum t3ctype { T3A = 0, T3B, T3C }; #define PAGE_MASK_IWARP (~(PAGE_SIZE-1)) struct iwch_rnic_attributes { u32 vendor_id; u32 vendor_part_id; u32 max_qps; u32 max_wrs; /* Max for any SQ/RQ */ u32 max_sge_per_wr; u32 max_sge_per_rdma_write_wr; /* for RDMA Write WR */ u32 max_cqs; u32 max_cqes_per_cq; u32 max_mem_regs; u32 max_phys_buf_entries; /* for phys buf list */ u32 max_pds; /* * The memory page sizes supported by this RNIC. * Bit position i in bitmap indicates page of * size (4k)^i. Phys block list mode unsupported. */ u32 mem_pgsizes_bitmask; u64 max_mr_size; u8 can_resize_wq; /* * The maximum number of RDMA Reads that can be outstanding * per QP with this RNIC as the target. */ u32 max_rdma_reads_per_qp; /* * The maximum number of resources used for RDMA Reads * by this RNIC with this RNIC as the target. */ u32 max_rdma_read_resources; /* * The max depth per QP for initiation of RDMA Read * by this RNIC. 
*/ u32 max_rdma_read_qp_depth; /* * The maximum depth for initiation of RDMA Read * operations by this RNIC on all QPs */ u32 max_rdma_read_depth; u8 rq_overflow_handled; u32 can_modify_ird; u32 can_modify_ord; u32 max_mem_windows; u32 stag0_value; u8 zbva_support; u8 local_invalidate_fence; u32 cq_overflow_detection; }; struct iwch_dev { struct ib_device ibdev; struct cxio_rdev rdev; u32 device_cap_flags; struct iwch_rnic_attributes attr; struct idr cqidr; struct idr qpidr; struct idr mmidr; struct mtx lock; TAILQ_ENTRY(iwch_dev) entry; }; #ifndef container_of #define container_of(p, stype, field) ((stype *)(((uint8_t *)(p)) - offsetof(stype, field))) #endif static inline struct iwch_dev *to_iwch_dev(struct ib_device *ibdev) { return container_of(ibdev, struct iwch_dev, ibdev); } static inline int t3b_device(const struct iwch_dev *rhp __unused) { return (0); } static inline int t3a_device(const struct iwch_dev *rhp __unused) { return (0); } static inline struct iwch_cq *get_chp(struct iwch_dev *rhp, u32 cqid) { return idr_find(&rhp->cqidr, cqid); } static inline struct iwch_qp *get_qhp(struct iwch_dev *rhp, u32 qpid) { return idr_find(&rhp->qpidr, qpid); } static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid) { return idr_find(&rhp->mmidr, mmid); } static inline int insert_handle(struct iwch_dev *rhp, struct idr *idr, void *handle, u32 id) { int ret; u32 newid; do { if (!idr_pre_get(idr, GFP_KERNEL)) { return -ENOMEM; } mtx_lock(&rhp->lock); ret = idr_get_new_above(idr, handle, id, &newid); WARN_ON(ret != 0); WARN_ON(!ret && newid != id); mtx_unlock(&rhp->lock); } while (ret == -EAGAIN); return ret; } static inline void remove_handle(struct iwch_dev *rhp, struct idr *idr, u32 id) { mtx_lock(&rhp->lock); idr_remove(idr, id); mtx_unlock(&rhp->lock); } void iwch_ev_dispatch(struct iwch_dev *, struct mbuf *); +void process_newconn(struct iw_cm_id *parent_cm_id, struct socket *child_so); #endif Index: projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c =================================================================== --- projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (revision 294776) +++ projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (revision 294777) @@ -1,1719 +1,1686 @@ /************************************************************************** Copyright (c) 2007, Chelsio Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Neither the name of the Chelsio Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ***************************************************************************/ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #ifdef TCP_OFFLOAD #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef KTR static char *states[] = { "idle", "listen", "connecting", "mpa_wait_req", "mpa_req_sent", "mpa_req_rcvd", "mpa_rep_sent", "fpdu_mode", "aborting", "closing", "moribund", "dead", NULL, }; #endif SYSCTL_NODE(_hw, OID_AUTO, iw_cxgb, CTLFLAG_RD, 0, "iw_cxgb driver parameters"); static int ep_timeout_secs = 60; SYSCTL_INT(_hw_iw_cxgb, OID_AUTO, ep_timeout_secs, CTLFLAG_RWTUN, &ep_timeout_secs, 0, "CM Endpoint operation timeout in seconds (default=60)"); static int mpa_rev = 1; SYSCTL_INT(_hw_iw_cxgb, OID_AUTO, mpa_rev, CTLFLAG_RWTUN, &mpa_rev, 0, "MPA Revision, 0 supports amso1100, 1 is spec compliant. (default=1)"); static int markers_enabled = 0; SYSCTL_INT(_hw_iw_cxgb, OID_AUTO, markers_enabled, CTLFLAG_RWTUN, &markers_enabled, 0, "Enable MPA MARKERS (default(0)=disabled)"); static int crc_enabled = 1; SYSCTL_INT(_hw_iw_cxgb, OID_AUTO, crc_enabled, CTLFLAG_RWTUN, &crc_enabled, 0, "Enable MPA CRC (default(1)=enabled)"); static int rcv_win = 256 * 1024; SYSCTL_INT(_hw_iw_cxgb, OID_AUTO, rcv_win, CTLFLAG_RWTUN, &rcv_win, 0, "TCP receive window in bytes (default=256KB)"); static int snd_win = 32 * 1024; SYSCTL_INT(_hw_iw_cxgb, OID_AUTO, snd_win, CTLFLAG_RWTUN, &snd_win, 0, "TCP send window in bytes (default=32KB)"); static unsigned int nocong = 0; SYSCTL_UINT(_hw_iw_cxgb, OID_AUTO, nocong, CTLFLAG_RWTUN, &nocong, 0, "Turn off congestion control (default=0)"); static unsigned int cong_flavor = 1; SYSCTL_UINT(_hw_iw_cxgb, OID_AUTO, cong_flavor, CTLFLAG_RWTUN, &cong_flavor, 0, "TCP Congestion control flavor (default=1)"); static void ep_timeout(void *arg); static void connect_reply_upcall(struct iwch_ep *ep, int status); static int iwch_so_upcall(struct socket *so, void *arg, int waitflag); /* * Cruft to offload socket upcalls onto thread. 
*/ static struct mtx req_lock; static TAILQ_HEAD(iwch_ep_list, iwch_ep_common) req_list; static struct task iw_cxgb_task; static struct taskqueue *iw_cxgb_taskq; static void process_req(void *ctx, int pending); static void start_ep_timer(struct iwch_ep *ep) { CTR2(KTR_IW_CXGB, "%s ep %p", __FUNCTION__, ep); if (callout_pending(&ep->timer)) { CTR2(KTR_IW_CXGB, "%s stopped / restarted timer ep %p", __FUNCTION__, ep); callout_deactivate(&ep->timer); callout_drain(&ep->timer); } else { /* * XXX this looks racy */ get_ep(&ep->com); callout_init(&ep->timer, 1); } callout_reset(&ep->timer, ep_timeout_secs * hz, ep_timeout, ep); } static void stop_ep_timer(struct iwch_ep *ep) { CTR2(KTR_IW_CXGB, "%s ep %p", __FUNCTION__, ep); if (!callout_pending(&ep->timer)) { CTR3(KTR_IW_CXGB, "%s timer stopped when its not running! ep %p state %u\n", __func__, ep, ep->com.state); return; } callout_drain(&ep->timer); put_ep(&ep->com); } static int set_tcpinfo(struct iwch_ep *ep) { struct socket *so = ep->com.so; struct inpcb *inp = sotoinpcb(so); struct tcpcb *tp; struct toepcb *toep; int rc = 0; INP_WLOCK(inp); tp = intotcpcb(inp); if ((tp->t_flags & TF_TOE) == 0) { rc = EINVAL; printf("%s: connection NOT OFFLOADED!\n", __func__); goto done; } toep = tp->t_toe; ep->hwtid = toep->tp_tid; ep->snd_seq = tp->snd_nxt; ep->rcv_seq = tp->rcv_nxt; ep->emss = tp->t_maxseg; if (ep->emss < 128) ep->emss = 128; done: INP_WUNLOCK(inp); return (rc); } static enum iwch_ep_state state_read(struct iwch_ep_common *epc) { enum iwch_ep_state state; mtx_lock(&epc->lock); state = epc->state; mtx_unlock(&epc->lock); return state; } static void __state_set(struct iwch_ep_common *epc, enum iwch_ep_state new) { epc->state = new; } static void state_set(struct iwch_ep_common *epc, enum iwch_ep_state new) { mtx_lock(&epc->lock); CTR3(KTR_IW_CXGB, "%s - %s -> %s", __FUNCTION__, states[epc->state], states[new]); __state_set(epc, new); mtx_unlock(&epc->lock); return; } static void * alloc_ep(int size, int flags) { struct iwch_ep_common *epc; epc = malloc(size, M_DEVBUF, flags); if (epc) { memset(epc, 0, size); refcount_init(&epc->refcount, 1); mtx_init(&epc->lock, "iwch_epc lock", NULL, MTX_DEF|MTX_DUPOK); cv_init(&epc->waitq, "iwch_epc cv"); } CTR2(KTR_IW_CXGB, "%s alloc ep %p", __FUNCTION__, epc); return epc; } void __free_ep(struct iwch_ep_common *epc) { CTR3(KTR_IW_CXGB, "%s ep %p state %s", __FUNCTION__, epc, states[state_read(epc)]); - KASSERT(!epc->so, ("%s warning ep->so %p \n", __FUNCTION__, epc->so)); KASSERT(!epc->entry.tqe_prev, ("%s epc %p still on req list!\n", __FUNCTION__, epc)); free(epc, M_DEVBUF); } static int find_route(__be32 local_ip, __be32 peer_ip, __be16 local_port, __be16 peer_port, u8 tos, struct nhop4_extended *pnh4) { struct in_addr addr; addr.s_addr = peer_ip; return (fib4_lookup_nh_ext(RT_DEFAULT_FIB, addr, NHR_REF, 0, pnh4)); } static void close_socket(struct iwch_ep_common *epc, int close) { CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, epc, epc->so, states[epc->state]); SOCK_LOCK(epc->so); soupcall_clear(epc->so, SO_RCV); SOCK_UNLOCK(epc->so); if (close) soclose(epc->so); else soshutdown(epc->so, SHUT_WR|SHUT_RD); epc->so = NULL; } static void shutdown_socket(struct iwch_ep_common *epc) { CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, epc, epc->so, states[epc->state]); soshutdown(epc->so, SHUT_WR); } static void abort_socket(struct iwch_ep *ep) { struct sockopt sopt; int err; struct linger l; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, 
states[ep->com.state]); l.l_onoff = 1; l.l_linger = 0; /* linger_time of 0 forces RST to be sent */ sopt.sopt_dir = SOPT_SET; sopt.sopt_level = SOL_SOCKET; sopt.sopt_name = SO_LINGER; sopt.sopt_val = (caddr_t)&l; sopt.sopt_valsize = sizeof l; sopt.sopt_td = NULL; err = sosetopt(ep->com.so, &sopt); if (err) printf("%s can't set linger to 0, no RST! err %d\n", __FUNCTION__, err); } static void send_mpa_req(struct iwch_ep *ep) { int mpalen; struct mpa_message *mpa; struct mbuf *m; int err; CTR3(KTR_IW_CXGB, "%s ep %p pd_len %d", __FUNCTION__, ep, ep->plen); mpalen = sizeof(*mpa) + ep->plen; m = m_gethdr(mpalen, M_NOWAIT); if (m == NULL) { connect_reply_upcall(ep, -ENOMEM); return; } mpa = mtod(m, struct mpa_message *); m->m_len = mpalen; m->m_pkthdr.len = mpalen; memset(mpa, 0, sizeof(*mpa)); memcpy(mpa->key, MPA_KEY_REQ, sizeof(mpa->key)); mpa->flags = (crc_enabled ? MPA_CRC : 0) | (markers_enabled ? MPA_MARKERS : 0); mpa->private_data_size = htons(ep->plen); mpa->revision = mpa_rev; if (ep->plen) memcpy(mpa->private_data, ep->mpa_pkt + sizeof(*mpa), ep->plen); err = sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, ep->com.thread); if (err) { m_freem(m); connect_reply_upcall(ep, -ENOMEM); return; } start_ep_timer(ep); state_set(&ep->com, MPA_REQ_SENT); return; } static int send_mpa_reject(struct iwch_ep *ep, const void *pdata, u8 plen) { int mpalen; struct mpa_message *mpa; struct mbuf *m; int err; CTR3(KTR_IW_CXGB, "%s ep %p plen %d", __FUNCTION__, ep, plen); mpalen = sizeof(*mpa) + plen; m = m_gethdr(mpalen, M_NOWAIT); if (m == NULL) { printf("%s - cannot alloc mbuf!\n", __FUNCTION__); return (-ENOMEM); } mpa = mtod(m, struct mpa_message *); m->m_len = mpalen; m->m_pkthdr.len = mpalen; memset(mpa, 0, sizeof(*mpa)); memcpy(mpa->key, MPA_KEY_REP, sizeof(mpa->key)); mpa->flags = MPA_REJECT; mpa->revision = mpa_rev; mpa->private_data_size = htons(plen); if (plen) memcpy(mpa->private_data, pdata, plen); err = sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, ep->com.thread); PANIC_IF(err); return 0; } static int send_mpa_reply(struct iwch_ep *ep, const void *pdata, u8 plen) { int mpalen; struct mpa_message *mpa; struct mbuf *m; CTR4(KTR_IW_CXGB, "%s ep %p so %p plen %d", __FUNCTION__, ep, ep->com.so, plen); mpalen = sizeof(*mpa) + plen; m = m_gethdr(mpalen, M_NOWAIT); if (m == NULL) { printf("%s - cannot alloc mbuf!\n", __FUNCTION__); return (-ENOMEM); } mpa = mtod(m, struct mpa_message *); m->m_len = mpalen; m->m_pkthdr.len = mpalen; memset(mpa, 0, sizeof(*mpa)); memcpy(mpa->key, MPA_KEY_REP, sizeof(mpa->key)); mpa->flags = (ep->mpa_attr.crc_enabled ? MPA_CRC : 0) | (markers_enabled ? 
MPA_MARKERS : 0); mpa->revision = mpa_rev; mpa->private_data_size = htons(plen); if (plen) memcpy(mpa->private_data, pdata, plen); state_set(&ep->com, MPA_REP_SENT); return sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, ep->com.thread); } static void close_complete_upcall(struct iwch_ep *ep) { struct iw_cm_event event; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_CLOSE; if (ep->com.cm_id) { CTR3(KTR_IW_CXGB, "close complete delivered ep %p cm_id %p tid %d", ep, ep->com.cm_id, ep->hwtid); ep->com.cm_id->event_handler(ep->com.cm_id, &event); ep->com.cm_id->rem_ref(ep->com.cm_id); ep->com.cm_id = NULL; ep->com.qp = NULL; } } static void abort_connection(struct iwch_ep *ep) { CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); state_set(&ep->com, ABORTING); abort_socket(ep); close_socket(&ep->com, 0); close_complete_upcall(ep); state_set(&ep->com, DEAD); put_ep(&ep->com); } static void peer_close_upcall(struct iwch_ep *ep) { struct iw_cm_event event; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_DISCONNECT; if (ep->com.cm_id) { CTR3(KTR_IW_CXGB, "peer close delivered ep %p cm_id %p tid %d", ep, ep->com.cm_id, ep->hwtid); ep->com.cm_id->event_handler(ep->com.cm_id, &event); } } static void peer_abort_upcall(struct iwch_ep *ep) { struct iw_cm_event event; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_CLOSE; event.status = ECONNRESET; if (ep->com.cm_id) { CTR3(KTR_IW_CXGB, "abort delivered ep %p cm_id %p tid %d", ep, ep->com.cm_id, ep->hwtid); ep->com.cm_id->event_handler(ep->com.cm_id, &event); ep->com.cm_id->rem_ref(ep->com.cm_id); ep->com.cm_id = NULL; ep->com.qp = NULL; } } static void connect_reply_upcall(struct iwch_ep *ep, int status) { struct iw_cm_event event; CTR5(KTR_IW_CXGB, "%s ep %p so %p state %s status %d", __FUNCTION__, ep, ep->com.so, states[ep->com.state], status); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_CONNECT_REPLY; event.status = status; event.local_addr = ep->com.local_addr; event.remote_addr = ep->com.remote_addr; if ((status == 0) || (status == ECONNREFUSED)) { event.private_data_len = ep->plen; event.private_data = ep->mpa_pkt + sizeof(struct mpa_message); } if (ep->com.cm_id) { CTR4(KTR_IW_CXGB, "%s ep %p tid %d status %d", __FUNCTION__, ep, ep->hwtid, status); ep->com.cm_id->event_handler(ep->com.cm_id, &event); } if (status < 0) { ep->com.cm_id->rem_ref(ep->com.cm_id); ep->com.cm_id = NULL; ep->com.qp = NULL; } } static void connect_request_upcall(struct iwch_ep *ep) { struct iw_cm_event event; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_CONNECT_REQUEST; event.local_addr = ep->com.local_addr; event.remote_addr = ep->com.remote_addr; event.private_data_len = ep->plen; event.private_data = ep->mpa_pkt + sizeof(struct mpa_message); event.provider_data = ep; event.so = ep->com.so; if (state_read(&ep->parent_ep->com) != DEAD) { get_ep(&ep->com); ep->parent_ep->com.cm_id->event_handler( ep->parent_ep->com.cm_id, &event); } put_ep(&ep->parent_ep->com); } static void established_upcall(struct iwch_ep *ep) { struct iw_cm_event event; CTR4(KTR_IW_CXGB, "%s 
ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_ESTABLISHED; if (ep->com.cm_id) { CTR3(KTR_IW_CXGB, "%s ep %p tid %d", __FUNCTION__, ep, ep->hwtid); ep->com.cm_id->event_handler(ep->com.cm_id, &event); } } static void process_mpa_reply(struct iwch_ep *ep) { struct mpa_message *mpa; u16 plen; struct iwch_qp_attributes attrs; enum iwch_qp_attr_mask mask; int err; struct mbuf *top, *m; int flags = MSG_DONTWAIT; struct uio uio; int len; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); /* * Stop mpa timer. If it expired, then the state has * changed and we bail since ep_timeout already aborted * the connection. */ stop_ep_timer(ep); if (state_read(&ep->com) != MPA_REQ_SENT) return; uio.uio_resid = len = 1000000; uio.uio_td = ep->com.thread; err = soreceive(ep->com.so, NULL, &uio, &top, NULL, &flags); if (err) { if (err == EWOULDBLOCK) { start_ep_timer(ep); return; } err = -err; goto err; } if (ep->com.so->so_rcv.sb_mb) { printf("%s data after soreceive called! so %p sb_mb %p top %p\n", __FUNCTION__, ep->com.so, ep->com.so->so_rcv.sb_mb, top); } m = top; do { /* * If we get more than the supported amount of private data * then we must fail this connection. */ if (ep->mpa_pkt_len + m->m_len > sizeof(ep->mpa_pkt)) { err = (-EINVAL); goto err; } /* * copy the new data into our accumulation buffer. */ m_copydata(m, 0, m->m_len, &(ep->mpa_pkt[ep->mpa_pkt_len])); ep->mpa_pkt_len += m->m_len; if (!m->m_next) m = m->m_nextpkt; else m = m->m_next; } while (m); m_freem(top); /* * if we don't even have the mpa message, then bail. */ if (ep->mpa_pkt_len < sizeof(*mpa)) return; mpa = (struct mpa_message *)ep->mpa_pkt; /* Validate MPA header. */ if (mpa->revision != mpa_rev) { CTR2(KTR_IW_CXGB, "%s bad mpa rev %d", __FUNCTION__, mpa->revision); err = EPROTO; goto err; } if (memcmp(mpa->key, MPA_KEY_REP, sizeof(mpa->key))) { CTR2(KTR_IW_CXGB, "%s bad mpa key |%16s|", __FUNCTION__, mpa->key); err = EPROTO; goto err; } plen = ntohs(mpa->private_data_size); /* * Fail if there's too much private data. */ if (plen > MPA_MAX_PRIVATE_DATA) { CTR2(KTR_IW_CXGB, "%s plen too big %d", __FUNCTION__, plen); err = EPROTO; goto err; } /* * If plen does not account for pkt size */ if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) { CTR2(KTR_IW_CXGB, "%s pkt too big %d", __FUNCTION__, ep->mpa_pkt_len); err = EPROTO; goto err; } ep->plen = (u8) plen; /* * If we don't have all the pdata yet, then bail. * We'll continue process when more data arrives. */ if (ep->mpa_pkt_len < (sizeof(*mpa) + plen)) return; if (mpa->flags & MPA_REJECT) { err = ECONNREFUSED; goto err; } /* * If we get here we have accumulated the entire mpa * start reply message including private data. And * the MPA header is valid. */ CTR1(KTR_IW_CXGB, "%s mpa rpl looks good!", __FUNCTION__); state_set(&ep->com, FPDU_MODE); ep->mpa_attr.initiator = 1; ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 
1 : 0; ep->mpa_attr.version = mpa_rev; if (set_tcpinfo(ep)) { printf("%s set_tcpinfo error\n", __FUNCTION__); goto err; } CTR5(KTR_IW_CXGB, "%s - crc_enabled=%d, recv_marker_enabled=%d, " "xmit_marker_enabled=%d, version=%d", __FUNCTION__, ep->mpa_attr.crc_enabled, ep->mpa_attr.recv_marker_enabled, ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version); attrs.mpa_attr = ep->mpa_attr; attrs.max_ird = ep->ird; attrs.max_ord = ep->ord; attrs.llp_stream_handle = ep; attrs.next_state = IWCH_QP_STATE_RTS; mask = IWCH_QP_ATTR_NEXT_STATE | IWCH_QP_ATTR_LLP_STREAM_HANDLE | IWCH_QP_ATTR_MPA_ATTR | IWCH_QP_ATTR_MAX_IRD | IWCH_QP_ATTR_MAX_ORD; /* bind QP and TID with INIT_WR */ err = iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1); if (!err) goto out; err: abort_connection(ep); out: connect_reply_upcall(ep, err); return; } static void process_mpa_request(struct iwch_ep *ep) { struct mpa_message *mpa; u16 plen; int flags = MSG_DONTWAIT; struct mbuf *top, *m; int err; struct uio uio; int len; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); /* * Stop mpa timer. If it expired, then the state has * changed and we bail since ep_timeout already aborted * the connection. */ stop_ep_timer(ep); if (state_read(&ep->com) != MPA_REQ_WAIT) return; uio.uio_resid = len = 1000000; uio.uio_td = ep->com.thread; err = soreceive(ep->com.so, NULL, &uio, &top, NULL, &flags); if (err) { if (err == EWOULDBLOCK) { start_ep_timer(ep); return; } err = -err; goto err; } m = top; do { /* * If we get more than the supported amount of private data * then we must fail this connection. */ if (ep->mpa_pkt_len + m->m_len > sizeof(ep->mpa_pkt)) { CTR2(KTR_IW_CXGB, "%s mpa message too big %d", __FUNCTION__, ep->mpa_pkt_len + m->m_len); goto err; } /* * Copy the new data into our accumulation buffer. */ m_copydata(m, 0, m->m_len, &(ep->mpa_pkt[ep->mpa_pkt_len])); ep->mpa_pkt_len += m->m_len; if (!m->m_next) m = m->m_nextpkt; else m = m->m_next; } while (m); m_freem(top); /* * If we don't even have the mpa message, then bail. * We'll continue process when more data arrives. */ if (ep->mpa_pkt_len < sizeof(*mpa)) { start_ep_timer(ep); CTR2(KTR_IW_CXGB, "%s not enough header %d...waiting...", __FUNCTION__, ep->mpa_pkt_len); return; } mpa = (struct mpa_message *) ep->mpa_pkt; /* * Validate MPA Header. */ if (mpa->revision != mpa_rev) { CTR2(KTR_IW_CXGB, "%s bad mpa rev %d", __FUNCTION__, mpa->revision); goto err; } if (memcmp(mpa->key, MPA_KEY_REQ, sizeof(mpa->key))) { CTR2(KTR_IW_CXGB, "%s bad mpa key |%16s|", __FUNCTION__, mpa->key); goto err; } plen = ntohs(mpa->private_data_size); /* * Fail if there's too much private data. */ if (plen > MPA_MAX_PRIVATE_DATA) { CTR2(KTR_IW_CXGB, "%s plen too big %d", __FUNCTION__, plen); goto err; } /* * If plen does not account for pkt size */ if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) { CTR2(KTR_IW_CXGB, "%s more data after private data %d", __FUNCTION__, ep->mpa_pkt_len); goto err; } ep->plen = (u8) plen; /* * If we don't have all the pdata yet, then bail. */ if (ep->mpa_pkt_len < (sizeof(*mpa) + plen)) { start_ep_timer(ep); CTR2(KTR_IW_CXGB, "%s more mpa msg to come %d", __FUNCTION__, ep->mpa_pkt_len); return; } /* * If we get here we have accumulated the entire mpa * start reply message including private data. */ ep->mpa_attr.initiator = 0; ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 
1 : 0; ep->mpa_attr.version = mpa_rev; if (set_tcpinfo(ep)) { printf("%s set_tcpinfo error\n", __FUNCTION__); goto err; } CTR5(KTR_IW_CXGB, "%s - crc_enabled=%d, recv_marker_enabled=%d, " "xmit_marker_enabled=%d, version=%d", __FUNCTION__, ep->mpa_attr.crc_enabled, ep->mpa_attr.recv_marker_enabled, ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version); state_set(&ep->com, MPA_REQ_RCVD); /* drive upcall */ connect_request_upcall(ep); return; err: abort_connection(ep); return; } static void process_peer_close(struct iwch_ep *ep) { struct iwch_qp_attributes attrs; int disconnect = 1; int release = 0; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); mtx_lock(&ep->com.lock); switch (ep->com.state) { case MPA_REQ_WAIT: __state_set(&ep->com, CLOSING); break; case MPA_REQ_SENT: __state_set(&ep->com, CLOSING); connect_reply_upcall(ep, -ECONNRESET); break; case MPA_REQ_RCVD: /* * We're gonna mark this puppy DEAD, but keep * the reference on it until the ULP accepts or * rejects the CR. */ __state_set(&ep->com, CLOSING); break; case MPA_REP_SENT: __state_set(&ep->com, CLOSING); break; case FPDU_MODE: start_ep_timer(ep); __state_set(&ep->com, CLOSING); attrs.next_state = IWCH_QP_STATE_CLOSING; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1); peer_close_upcall(ep); break; case ABORTING: disconnect = 0; break; case CLOSING: __state_set(&ep->com, MORIBUND); disconnect = 0; break; case MORIBUND: stop_ep_timer(ep); if (ep->com.cm_id && ep->com.qp) { attrs.next_state = IWCH_QP_STATE_IDLE; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1); } close_socket(&ep->com, 0); close_complete_upcall(ep); __state_set(&ep->com, DEAD); release = 1; disconnect = 0; break; case DEAD: disconnect = 0; break; default: PANIC_IF(1); } mtx_unlock(&ep->com.lock); if (disconnect) iwch_ep_disconnect(ep, 0, M_NOWAIT); if (release) put_ep(&ep->com); return; } static void process_conn_error(struct iwch_ep *ep) { struct iwch_qp_attributes attrs; int ret; mtx_lock(&ep->com.lock); CTR3(KTR_IW_CXGB, "%s ep %p state %u", __func__, ep, ep->com.state); switch (ep->com.state) { case MPA_REQ_WAIT: stop_ep_timer(ep); break; case MPA_REQ_SENT: stop_ep_timer(ep); connect_reply_upcall(ep, -ECONNRESET); break; case MPA_REP_SENT: ep->com.rpl_err = ECONNRESET; CTR1(KTR_IW_CXGB, "waking up ep %p", ep); break; case MPA_REQ_RCVD: /* * We're gonna mark this puppy DEAD, but keep * the reference on it until the ULP accepts or * rejects the CR. 
*/ break; case MORIBUND: case CLOSING: stop_ep_timer(ep); /*FALLTHROUGH*/ case FPDU_MODE: if (ep->com.cm_id && ep->com.qp) { attrs.next_state = IWCH_QP_STATE_ERROR; ret = iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1); if (ret) log(LOG_ERR, "%s - qp <- error failed!\n", __FUNCTION__); } peer_abort_upcall(ep); break; case ABORTING: break; case DEAD: mtx_unlock(&ep->com.lock); CTR2(KTR_IW_CXGB, "%s so_error %d IN DEAD STATE!!!!", __FUNCTION__, ep->com.so->so_error); return; default: PANIC_IF(1); break; } if (ep->com.state != ABORTING) { close_socket(&ep->com, 0); __state_set(&ep->com, DEAD); put_ep(&ep->com); } mtx_unlock(&ep->com.lock); return; } static void process_close_complete(struct iwch_ep *ep) { struct iwch_qp_attributes attrs; int release = 0; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); PANIC_IF(!ep); /* The cm_id may be null if we failed to connect */ mtx_lock(&ep->com.lock); switch (ep->com.state) { case CLOSING: __state_set(&ep->com, MORIBUND); break; case MORIBUND: stop_ep_timer(ep); if ((ep->com.cm_id) && (ep->com.qp)) { attrs.next_state = IWCH_QP_STATE_IDLE; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1); } if (ep->parent_ep) close_socket(&ep->com, 1); else close_socket(&ep->com, 0); close_complete_upcall(ep); __state_set(&ep->com, DEAD); release = 1; break; case ABORTING: break; case DEAD: default: PANIC_IF(1); break; } mtx_unlock(&ep->com.lock); if (release) put_ep(&ep->com); return; } /* * T3A does 3 things when a TERM is received: * 1) send up a CPL_RDMA_TERMINATE message with the TERM packet * 2) generate an async event on the QP with the TERMINATE opcode * 3) post a TERMINATE opcde cqe into the associated CQ. * * For (1), we save the message in the qp for later consumer consumption. * For (2), we move the QP into TERMINATE, post a QP event and disconnect. * For (3), we toss the CQE in cxio_poll_cq(). * * terminate() handles case (1)... 
*/ static int terminate(struct sge_qset *qs, struct rsp_desc *r, struct mbuf *m) { struct adapter *sc = qs->adap; struct tom_data *td = sc->tom_softc; uint32_t hash = *((uint32_t *)r + 1); unsigned int tid = ntohl(hash) >> 8 & 0xfffff; struct toepcb *toep = lookup_tid(&td->tid_maps, tid); struct socket *so = toep->tp_inp->inp_socket; struct iwch_ep *ep = so->so_rcv.sb_upcallarg; if (state_read(&ep->com) != FPDU_MODE) goto done; m_adj(m, sizeof(struct cpl_rdma_terminate)); CTR4(KTR_IW_CXGB, "%s: tid %u, ep %p, saved %d bytes", __func__, tid, ep, m->m_len); m_copydata(m, 0, m->m_len, ep->com.qp->attr.terminate_buffer); ep->com.qp->attr.terminate_msg_len = m->m_len; ep->com.qp->attr.is_terminate_local = 0; done: m_freem(m); return (0); } static int ec_status(struct sge_qset *qs, struct rsp_desc *r, struct mbuf *m) { struct adapter *sc = qs->adap; struct tom_data *td = sc->tom_softc; struct cpl_rdma_ec_status *rep = mtod(m, void *); unsigned int tid = GET_TID(rep); struct toepcb *toep = lookup_tid(&td->tid_maps, tid); struct socket *so = toep->tp_inp->inp_socket; struct iwch_ep *ep = so->so_rcv.sb_upcallarg; if (rep->status) { struct iwch_qp_attributes attrs; CTR1(KTR_IW_CXGB, "%s BAD CLOSE - Aborting", __FUNCTION__); stop_ep_timer(ep); attrs.next_state = IWCH_QP_STATE_ERROR; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1); abort_connection(ep); } m_freem(m); return (0); } static void ep_timeout(void *arg) { struct iwch_ep *ep = (struct iwch_ep *)arg; struct iwch_qp_attributes attrs; int err = 0; int abort = 1; mtx_lock(&ep->com.lock); CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); switch (ep->com.state) { case MPA_REQ_SENT: __state_set(&ep->com, ABORTING); connect_reply_upcall(ep, -ETIMEDOUT); break; case MPA_REQ_WAIT: __state_set(&ep->com, ABORTING); break; case CLOSING: case MORIBUND: if (ep->com.cm_id && ep->com.qp) err = 1; __state_set(&ep->com, ABORTING); break; default: CTR3(KTR_IW_CXGB, "%s unexpected state ep %p state %u\n", __func__, ep, ep->com.state); abort = 0; } mtx_unlock(&ep->com.lock); if (err){ attrs.next_state = IWCH_QP_STATE_ERROR; iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1); } if (abort) abort_connection(ep); put_ep(&ep->com); } int iwch_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len) { int err; struct iwch_ep *ep = to_ep(cm_id); CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); if (state_read(&ep->com) == DEAD) { put_ep(&ep->com); return (-ECONNRESET); } PANIC_IF(state_read(&ep->com) != MPA_REQ_RCVD); if (mpa_rev == 0) { abort_connection(ep); } else { err = send_mpa_reject(ep, pdata, pdata_len); err = soshutdown(ep->com.so, 3); } put_ep(&ep->com); return 0; } int iwch_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) { int err; struct iwch_qp_attributes attrs; enum iwch_qp_attr_mask mask; struct iwch_ep *ep = to_ep(cm_id); struct iwch_dev *h = to_iwch_dev(cm_id->device); struct iwch_qp *qp = get_qhp(h, conn_param->qpn); CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); if (state_read(&ep->com) == DEAD) { err = -ECONNRESET; goto err; } PANIC_IF(state_read(&ep->com) != MPA_REQ_RCVD); PANIC_IF(!qp); if ((conn_param->ord > qp->rhp->attr.max_rdma_read_qp_depth) || (conn_param->ird > qp->rhp->attr.max_rdma_reads_per_qp)) { abort_connection(ep); err = -EINVAL; goto err; } cm_id->add_ref(cm_id); ep->com.cm_id = cm_id; 
ep->com.qp = qp; ep->com.rpl_err = 0; ep->com.rpl_done = 0; ep->ird = conn_param->ird; ep->ord = conn_param->ord; CTR3(KTR_IW_CXGB, "%s ird %d ord %d", __FUNCTION__, ep->ird, ep->ord); /* bind QP to EP and move to RTS */ attrs.mpa_attr = ep->mpa_attr; attrs.max_ird = ep->ird; attrs.max_ord = ep->ord; attrs.llp_stream_handle = ep; attrs.next_state = IWCH_QP_STATE_RTS; /* bind QP and TID with INIT_WR */ mask = IWCH_QP_ATTR_NEXT_STATE | IWCH_QP_ATTR_LLP_STREAM_HANDLE | IWCH_QP_ATTR_MPA_ATTR | IWCH_QP_ATTR_MAX_IRD | IWCH_QP_ATTR_MAX_ORD; err = iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1); if (err) goto err1; err = send_mpa_reply(ep, conn_param->private_data, conn_param->private_data_len); if (err) goto err1; state_set(&ep->com, FPDU_MODE); established_upcall(ep); put_ep(&ep->com); return 0; err1: ep->com.cm_id = NULL; ep->com.qp = NULL; cm_id->rem_ref(cm_id); err: put_ep(&ep->com); return err; } static int init_sock(struct iwch_ep_common *epc) { int err; struct sockopt sopt; int on=1; SOCK_LOCK(epc->so); soupcall_set(epc->so, SO_RCV, iwch_so_upcall, epc); epc->so->so_state |= SS_NBIO; SOCK_UNLOCK(epc->so); sopt.sopt_dir = SOPT_SET; sopt.sopt_level = IPPROTO_TCP; sopt.sopt_name = TCP_NODELAY; sopt.sopt_val = (caddr_t)&on; sopt.sopt_valsize = sizeof on; sopt.sopt_td = NULL; err = sosetopt(epc->so, &sopt); if (err) printf("%s can't set TCP_NODELAY err %d\n", __FUNCTION__, err); return 0; } static int is_loopback_dst(struct iw_cm_id *cm_id) { uint16_t port = cm_id->remote_addr.sin_port; int ifa_present; cm_id->remote_addr.sin_port = 0; ifa_present = ifa_ifwithaddr_check( (struct sockaddr *)&cm_id->remote_addr); cm_id->remote_addr.sin_port = port; return (ifa_present); } int iwch_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) { int err = 0; struct iwch_dev *h = to_iwch_dev(cm_id->device); struct iwch_ep *ep; struct nhop4_extended nh4; struct toedev *tdev; if (is_loopback_dst(cm_id)) { err = -ENOSYS; goto out; } ep = alloc_ep(sizeof(*ep), M_NOWAIT); if (!ep) { printf("%s - cannot alloc ep.\n", __FUNCTION__); err = (-ENOMEM); goto out; } callout_init(&ep->timer, 1); ep->plen = conn_param->private_data_len; if (ep->plen) memcpy(ep->mpa_pkt + sizeof(struct mpa_message), conn_param->private_data, ep->plen); ep->ird = conn_param->ird; ep->ord = conn_param->ord; cm_id->add_ref(cm_id); ep->com.cm_id = cm_id; ep->com.qp = get_qhp(h, conn_param->qpn); ep->com.thread = curthread; PANIC_IF(!ep->com.qp); CTR4(KTR_IW_CXGB, "%s qpn 0x%x qp %p cm_id %p", __FUNCTION__, conn_param->qpn, ep->com.qp, cm_id); ep->com.so = cm_id->so; err = init_sock(&ep->com); if (err) goto fail2; /* find a route */ err = find_route(cm_id->local_addr.sin_addr.s_addr, cm_id->remote_addr.sin_addr.s_addr, cm_id->local_addr.sin_port, cm_id->remote_addr.sin_port, IPTOS_LOWDELAY, &nh4); if (err) { printf("%s - cannot find route.\n", __FUNCTION__); err = EHOSTUNREACH; goto fail2; } if (!(nh4.nh_ifp->if_flags & IFCAP_TOE)) { printf("%s - interface not TOE capable.\n", __FUNCTION__); fib4_free_nh_ext(RT_DEFAULT_FIB, &nh4); goto fail2; } tdev = TOEDEV(nh4.nh_ifp); if (tdev == NULL) { printf("%s - No toedev for interface.\n", __FUNCTION__); fib4_free_nh_ext(RT_DEFAULT_FIB, &nh4); goto fail2; } fib4_free_nh_ext(RT_DEFAULT_FIB, &nh4); state_set(&ep->com, CONNECTING); ep->com.local_addr = cm_id->local_addr; ep->com.remote_addr = cm_id->remote_addr; err = soconnect(ep->com.so, (struct sockaddr *)&ep->com.remote_addr, ep->com.thread); if (!err) goto out; fail2: put_ep(&ep->com); out: return err; } int 
-iwch_create_listen(struct iw_cm_id *cm_id, int backlog) +iwch_create_listen_ep(struct iw_cm_id *cm_id, int backlog) { int err = 0; struct iwch_listen_ep *ep; ep = alloc_ep(sizeof(*ep), M_NOWAIT); if (!ep) { printf("%s - cannot alloc ep.\n", __FUNCTION__); err = ENOMEM; goto out; } CTR2(KTR_IW_CXGB, "%s ep %p", __FUNCTION__, ep); cm_id->add_ref(cm_id); ep->com.cm_id = cm_id; ep->backlog = backlog; ep->com.local_addr = cm_id->local_addr; ep->com.thread = curthread; state_set(&ep->com, LISTEN); ep->com.so = cm_id->so; - err = init_sock(&ep->com); - if (err) - goto fail; - - err = solisten(ep->com.so, ep->backlog, ep->com.thread); - if (!err) { - cm_id->provider_data = ep; - goto out; - } - close_socket(&ep->com, 0); -fail: - cm_id->rem_ref(cm_id); - put_ep(&ep->com); + cm_id->provider_data = ep; out: return err; } -int -iwch_destroy_listen(struct iw_cm_id *cm_id) +void +iwch_destroy_listen_ep(struct iw_cm_id *cm_id) { struct iwch_listen_ep *ep = to_listen_ep(cm_id); CTR2(KTR_IW_CXGB, "%s ep %p", __FUNCTION__, ep); state_set(&ep->com, DEAD); - close_socket(&ep->com, 0); cm_id->rem_ref(cm_id); put_ep(&ep->com); - return 0; + return; } int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, int flags) { int close = 0; mtx_lock(&ep->com.lock); PANIC_IF(!ep); PANIC_IF(!ep->com.so); CTR5(KTR_IW_CXGB, "%s ep %p so %p state %s, abrupt %d", __FUNCTION__, ep, ep->com.so, states[ep->com.state], abrupt); switch (ep->com.state) { case MPA_REQ_WAIT: case MPA_REQ_SENT: case MPA_REQ_RCVD: case MPA_REP_SENT: case FPDU_MODE: close = 1; if (abrupt) ep->com.state = ABORTING; else { ep->com.state = CLOSING; start_ep_timer(ep); } break; case CLOSING: close = 1; if (abrupt) { stop_ep_timer(ep); ep->com.state = ABORTING; } else ep->com.state = MORIBUND; break; case MORIBUND: case ABORTING: case DEAD: CTR3(KTR_IW_CXGB, "%s ignoring disconnect ep %p state %u\n", __func__, ep, ep->com.state); break; default: panic("unknown state: %d\n", ep->com.state); break; } mtx_unlock(&ep->com.lock); if (close) { if (abrupt) abort_connection(ep); else { if (!ep->parent_ep) __state_set(&ep->com, MORIBUND); shutdown_socket(&ep->com); } } return 0; } static void process_data(struct iwch_ep *ep) { struct sockaddr_in *local, *remote; CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); switch (state_read(&ep->com)) { case MPA_REQ_SENT: process_mpa_reply(ep); break; case MPA_REQ_WAIT: /* * XXX * Set local and remote addrs here because when we * dequeue the newly accepted socket, they aren't set * yet in the pcb! */ in_getsockaddr(ep->com.so, (struct sockaddr **)&local); in_getpeeraddr(ep->com.so, (struct sockaddr **)&remote); CTR3(KTR_IW_CXGB, "%s local %s remote %s", __FUNCTION__, inet_ntoa(local->sin_addr), inet_ntoa(remote->sin_addr)); ep->com.local_addr = *local; ep->com.remote_addr = *remote; free(local, M_SONAME); free(remote, M_SONAME); process_mpa_request(ep); break; default: if (sbavail(&ep->com.so->so_rcv)) printf("%s Unexpected streaming data." 
" ep %p state %d so %p so_state %x so_rcv.sb_cc %u so_rcv.sb_mb %p\n", __FUNCTION__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state, sbavail(&ep->com.so->so_rcv), ep->com.so->so_rcv.sb_mb); break; } return; } static void process_connected(struct iwch_ep *ep) { CTR4(KTR_IW_CXGB, "%s ep %p so %p state %s", __FUNCTION__, ep, ep->com.so, states[ep->com.state]); if ((ep->com.so->so_state & SS_ISCONNECTED) && !ep->com.so->so_error) { send_mpa_req(ep); } else { connect_reply_upcall(ep, -ep->com.so->so_error); close_socket(&ep->com, 0); state_set(&ep->com, DEAD); put_ep(&ep->com); } } -static struct socket * -dequeue_socket(struct socket *head, struct sockaddr_in **remote, struct iwch_ep *child_ep) +void +process_newconn(struct iw_cm_id *parent_cm_id, struct socket *child_so) { - struct socket *so; - - ACCEPT_LOCK(); - so = TAILQ_FIRST(&head->so_comp); - if (!so) { - ACCEPT_UNLOCK(); - return NULL; - } - TAILQ_REMOVE(&head->so_comp, so, so_list); - head->so_qlen--; - SOCK_LOCK(so); - so->so_qstate &= ~SQ_COMP; - so->so_head = NULL; - soref(so); - soupcall_set(so, SO_RCV, iwch_so_upcall, child_ep); - so->so_state |= SS_NBIO; - PANIC_IF(!(so->so_state & SS_ISCONNECTED)); - PANIC_IF(so->so_error); - SOCK_UNLOCK(so); - ACCEPT_UNLOCK(); - soaccept(so, (struct sockaddr **)remote); - return so; -} - -static void -process_newconn(struct iwch_ep *parent_ep) -{ - struct socket *child_so; struct iwch_ep *child_ep; + struct sockaddr_in *local; struct sockaddr_in *remote; + struct iwch_ep *parent_ep = parent_cm_id->provider_data; CTR3(KTR_IW_CXGB, "%s parent ep %p so %p", __FUNCTION__, parent_ep, parent_ep->com.so); + if (!child_so) { + log(LOG_ERR, "%s - invalid child socket!\n", __func__); + return; + } child_ep = alloc_ep(sizeof(*child_ep), M_NOWAIT); if (!child_ep) { log(LOG_ERR, "%s - failed to allocate ep entry!\n", __FUNCTION__); return; } - child_so = dequeue_socket(parent_ep->com.so, &remote, child_ep); - if (!child_so) { - log(LOG_ERR, "%s - failed to dequeue child socket!\n", - __FUNCTION__); - __free_ep(&child_ep->com); - return; - } + SOCKBUF_LOCK(&child_so->so_rcv); + soupcall_set(child_so, SO_RCV, iwch_so_upcall, child_ep); + SOCKBUF_UNLOCK(&child_so->so_rcv); + + in_getsockaddr(child_so, (struct sockaddr **)&local); + in_getpeeraddr(child_so, (struct sockaddr **)&remote); + CTR3(KTR_IW_CXGB, "%s remote addr %s port %d", __FUNCTION__, inet_ntoa(remote->sin_addr), ntohs(remote->sin_port)); child_ep->com.tdev = parent_ep->com.tdev; child_ep->com.local_addr.sin_family = parent_ep->com.local_addr.sin_family; child_ep->com.local_addr.sin_port = parent_ep->com.local_addr.sin_port; child_ep->com.local_addr.sin_addr.s_addr = parent_ep->com.local_addr.sin_addr.s_addr; child_ep->com.local_addr.sin_len = parent_ep->com.local_addr.sin_len; child_ep->com.remote_addr.sin_family = remote->sin_family; child_ep->com.remote_addr.sin_port = remote->sin_port; child_ep->com.remote_addr.sin_addr.s_addr = remote->sin_addr.s_addr; child_ep->com.remote_addr.sin_len = remote->sin_len; child_ep->com.so = child_so; child_ep->com.cm_id = NULL; child_ep->com.thread = parent_ep->com.thread; child_ep->parent_ep = parent_ep; + free(local, M_SONAME); free(remote, M_SONAME); get_ep(&parent_ep->com); - child_ep->parent_ep = parent_ep; callout_init(&child_ep->timer, 1); state_set(&child_ep->com, MPA_REQ_WAIT); start_ep_timer(child_ep); /* maybe the request has already been queued up on the socket... 
*/ process_mpa_request(child_ep); } static int iwch_so_upcall(struct socket *so, void *arg, int waitflag) { struct iwch_ep *ep = arg; CTR6(KTR_IW_CXGB, "%s so %p so state %x ep %p ep state(%d)=%s", __FUNCTION__, so, so->so_state, ep, ep->com.state, states[ep->com.state]); mtx_lock(&req_lock); if (ep && ep->com.so && !ep->com.entry.tqe_prev) { get_ep(&ep->com); TAILQ_INSERT_TAIL(&req_list, &ep->com, entry); taskqueue_enqueue(iw_cxgb_taskq, &iw_cxgb_task); } mtx_unlock(&req_lock); return (SU_OK); } static void process_socket_event(struct iwch_ep *ep) { int state = state_read(&ep->com); struct socket *so = ep->com.so; CTR6(KTR_IW_CXGB, "%s so %p so state %x ep %p ep state(%d)=%s", __FUNCTION__, so, so->so_state, ep, ep->com.state, states[ep->com.state]); if (state == CONNECTING) { process_connected(ep); return; } if (state == LISTEN) { - process_newconn(ep); + /* socket listening events are handled at IWCM */ + CTR3(KTR_IW_CXGB, "%s Invalid ep state:%u, ep:%p", __func__, + ep->com.state, ep); + BUG(); return; } /* connection error */ if (so->so_error) { process_conn_error(ep); return; } /* peer close */ if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && state < CLOSING) { process_peer_close(ep); return; } /* close complete */ if (so->so_state & (SS_ISDISCONNECTED)) { process_close_complete(ep); return; } /* rx data */ process_data(ep); return; } static void process_req(void *ctx, int pending) { struct iwch_ep_common *epc; CTR1(KTR_IW_CXGB, "%s enter", __FUNCTION__); mtx_lock(&req_lock); while (!TAILQ_EMPTY(&req_list)) { epc = TAILQ_FIRST(&req_list); TAILQ_REMOVE(&req_list, epc, entry); epc->entry.tqe_prev = NULL; mtx_unlock(&req_lock); if (epc->so) process_socket_event((struct iwch_ep *)epc); put_ep(epc); mtx_lock(&req_lock); } mtx_unlock(&req_lock); } int iwch_cm_init(void) { TAILQ_INIT(&req_list); mtx_init(&req_lock, "iw_cxgb req_list lock", NULL, MTX_DEF); iw_cxgb_taskq = taskqueue_create("iw_cxgb_taskq", M_NOWAIT, taskqueue_thread_enqueue, &iw_cxgb_taskq); if (iw_cxgb_taskq == NULL) { printf("failed to allocate iw_cxgb taskqueue\n"); return (ENOMEM); } taskqueue_start_threads(&iw_cxgb_taskq, 1, PI_NET, "iw_cxgb taskq"); TASK_INIT(&iw_cxgb_task, 0, process_req, NULL); return (0); } void iwch_cm_term(void) { taskqueue_drain(iw_cxgb_taskq, &iw_cxgb_task); taskqueue_free(iw_cxgb_taskq); } void iwch_cm_init_cpl(struct adapter *sc) { t3_register_cpl_handler(sc, CPL_RDMA_TERMINATE, terminate); t3_register_cpl_handler(sc, CPL_RDMA_EC_STATUS, ec_status); } void iwch_cm_term_cpl(struct adapter *sc) { t3_register_cpl_handler(sc, CPL_RDMA_TERMINATE, NULL); t3_register_cpl_handler(sc, CPL_RDMA_EC_STATUS, NULL); } #endif Index: projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.h =================================================================== --- projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.h (revision 294776) +++ projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.h (revision 294777) @@ -1,248 +1,248 @@ /************************************************************************** Copyright (c) 2007, 2008 Chelsio Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. 
Neither the name of the Chelsio Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. $FreeBSD$ ***************************************************************************/ #ifndef _IWCH_CM_H_ #define _IWCH_CM_H_ #include #include #include #include #include #define MPA_KEY_REQ "MPA ID Req Frame" #define MPA_KEY_REP "MPA ID Rep Frame" #define MPA_MAX_PRIVATE_DATA 256 #define MPA_REV 0 /* XXX - amso1100 uses rev 0 ! */ #define MPA_REJECT 0x20 #define MPA_CRC 0x40 #define MPA_MARKERS 0x80 #define MPA_FLAGS_MASK 0xE0 #define put_ep(ep) { \ CTR4(KTR_IW_CXGB, "put_ep (via %s:%u) ep %p refcnt %d", __FUNCTION__, __LINE__, \ ep, atomic_load_acq_int(&((ep)->refcount))); \ if (refcount_release(&((ep)->refcount))) \ __free_ep(ep); \ } #define get_ep(ep) { \ CTR4(KTR_IW_CXGB, "get_ep (via %s:%u) ep %p, refcnt %d", __FUNCTION__, __LINE__, \ ep, atomic_load_acq_int(&((ep)->refcount))); \ refcount_acquire(&((ep)->refcount)); \ } struct mpa_message { u8 key[16]; u8 flags; u8 revision; __be16 private_data_size; u8 private_data[0]; }; struct terminate_message { u8 layer_etype; u8 ecode; __be16 hdrct_rsvd; u8 len_hdrs[0]; }; #define TERM_MAX_LENGTH (sizeof(struct terminate_message) + 2 + 18 + 28) enum iwch_layers_types { LAYER_RDMAP = 0x00, LAYER_DDP = 0x10, LAYER_MPA = 0x20, RDMAP_LOCAL_CATA = 0x00, RDMAP_REMOTE_PROT = 0x01, RDMAP_REMOTE_OP = 0x02, DDP_LOCAL_CATA = 0x00, DDP_TAGGED_ERR = 0x01, DDP_UNTAGGED_ERR = 0x02, DDP_LLP = 0x03 }; enum iwch_rdma_ecodes { RDMAP_INV_STAG = 0x00, RDMAP_BASE_BOUNDS = 0x01, RDMAP_ACC_VIOL = 0x02, RDMAP_STAG_NOT_ASSOC = 0x03, RDMAP_TO_WRAP = 0x04, RDMAP_INV_VERS = 0x05, RDMAP_INV_OPCODE = 0x06, RDMAP_STREAM_CATA = 0x07, RDMAP_GLOBAL_CATA = 0x08, RDMAP_CANT_INV_STAG = 0x09, RDMAP_UNSPECIFIED = 0xff }; enum iwch_ddp_ecodes { DDPT_INV_STAG = 0x00, DDPT_BASE_BOUNDS = 0x01, DDPT_STAG_NOT_ASSOC = 0x02, DDPT_TO_WRAP = 0x03, DDPT_INV_VERS = 0x04, DDPU_INV_QN = 0x01, DDPU_INV_MSN_NOBUF = 0x02, DDPU_INV_MSN_RANGE = 0x03, DDPU_INV_MO = 0x04, DDPU_MSG_TOOBIG = 0x05, DDPU_INV_VERS = 0x06 }; enum iwch_mpa_ecodes { MPA_CRC_ERR = 0x02, MPA_MARKER_ERR = 0x03 }; enum iwch_ep_state { IDLE = 0, LISTEN, CONNECTING, MPA_REQ_WAIT, MPA_REQ_SENT, MPA_REQ_RCVD, MPA_REP_SENT, FPDU_MODE, ABORTING, CLOSING, MORIBUND, DEAD, }; enum iwch_ep_flags { PEER_ABORT_IN_PROGRESS = (1 << 0), ABORT_REQ_IN_PROGRESS = (1 << 1), }; struct iwch_ep_common { TAILQ_ENTRY(iwch_ep_common) entry; struct iw_cm_id *cm_id; struct iwch_qp *qp; struct toedev *tdev; enum iwch_ep_state state; u_int refcount; struct cv waitq; struct mtx lock; struct sockaddr_in local_addr; struct sockaddr_in remote_addr; int rpl_err; int rpl_done; struct thread *thread; struct socket *so; }; struct 
iwch_listen_ep { struct iwch_ep_common com; unsigned int stid; int backlog; }; struct iwch_ep { struct iwch_ep_common com; struct iwch_ep *parent_ep; struct callout timer; unsigned int atid; u32 hwtid; u32 snd_seq; u32 rcv_seq; struct l2t_entry *l2t; struct mbuf *mpa_mbuf; struct iwch_mpa_attributes mpa_attr; unsigned int mpa_pkt_len; u8 mpa_pkt[sizeof(struct mpa_message) + MPA_MAX_PRIVATE_DATA]; u8 tos; u16 emss; u16 plen; u32 ird; u32 ord; u32 flags; }; static inline struct iwch_ep *to_ep(struct iw_cm_id *cm_id) { return cm_id->provider_data; } static inline struct iwch_listen_ep *to_listen_ep(struct iw_cm_id *cm_id) { return cm_id->provider_data; } static inline int compute_wscale(int win) { int wscale = 0; while (wscale < 14 && (65535 << wscale) < win) wscale++; return wscale; } #endif Index: projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c =================================================================== --- projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c (revision 294776) +++ projects/clang380-import/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c (revision 294777) #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); #include "opt_inet.h" #ifdef TCP_OFFLOAD #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static int iwch_modify_port(struct ib_device *ibdev, u8 port, int port_modify_mask, struct ib_port_modify *props) { return (-ENOSYS); } static struct ib_ah * iwch_ah_create(struct ib_pd *pd, struct ib_ah_attr *ah_attr) { return ERR_PTR(-ENOSYS); } static int iwch_ah_destroy(struct ib_ah *ah) { return (-ENOSYS); } static int iwch_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { return (-ENOSYS); } static int iwch_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { return (-ENOSYS); } static int iwch_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, struct ib_wc *in_wc, struct ib_grh *in_grh, struct ib_mad *in_mad, struct ib_mad *out_mad) { return (-ENOSYS); } static int iwch_dealloc_ucontext(struct ib_ucontext *context) { struct iwch_dev *rhp = to_iwch_dev(context->device); struct iwch_ucontext *ucontext = to_iwch_ucontext(context); struct iwch_mm_entry *mm, *tmp; CTR2(KTR_IW_CXGB, "%s context %p", __FUNCTION__, context); TAILQ_FOREACH_SAFE(mm, &ucontext->mmaps, entry, tmp) { TAILQ_REMOVE(&ucontext->mmaps, mm, entry); cxfree(mm); } cxio_release_ucontext(&rhp->rdev, &ucontext->uctx); cxfree(ucontext); return 0; } static struct ib_ucontext * iwch_alloc_ucontext(struct ib_device *ibdev, struct ib_udata *udata) { struct iwch_ucontext *context; struct iwch_dev *rhp = to_iwch_dev(ibdev); CTR2(KTR_IW_CXGB, "%s ibdev %p", __FUNCTION__, ibdev); context = malloc(sizeof(*context), M_DEVBUF, M_ZERO|M_NOWAIT); if (!context) return ERR_PTR(-ENOMEM); cxio_init_ucontext(&rhp->rdev, &context->uctx); TAILQ_INIT(&context->mmaps); mtx_init(&context->mmap_lock, "ucontext mmap", NULL, MTX_DEF); return &context->ibucontext; } static int iwch_destroy_cq(struct ib_cq *ib_cq) { struct iwch_cq *chp; CTR2(KTR_IW_CXGB, "%s ib_cq %p", __FUNCTION__, ib_cq); chp = to_iwch_cq(ib_cq); remove_handle(chp->rhp, &chp->rhp->cqidr, chp->cq.cqid); mtx_lock(&chp->lock); if (--chp->refcnt) msleep(chp, &chp->lock, 0, "iwch_destroy_cq", 0); mtx_unlock(&chp->lock); cxio_destroy_cq(&chp->rhp->rdev, &chp->cq); cxfree(chp); return 0; } static struct ib_cq * iwch_create_cq(struct ib_device *ibdev, struct ib_cq_init_attr *attr, struct ib_ucontext *ib_context, struct ib_udata *udata) { struct iwch_dev *rhp; struct iwch_cq *chp; struct iwch_create_cq_resp uresp; struct iwch_create_cq_req ureq; struct iwch_ucontext *ucontext = NULL;
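	/*
	 * Editorial note on the user-mapped CQ path below: when ib_context
	 * is non-NULL the create response hands the ring back via an mmap
	 * cookie.  The per-context key counter is advanced by PAGE_SIZE per
	 * object and the matching iwch_mm_entry is queued so a later mmap()
	 * against that key can locate the queue memory.
	 */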
static int warned; size_t resplen; int entries = attr->cqe; CTR3(KTR_IW_CXGB, "%s ib_dev %p entries %d", __FUNCTION__, ibdev, entries); rhp = to_iwch_dev(ibdev); chp = malloc(sizeof(*chp), M_DEVBUF, M_NOWAIT|M_ZERO); if (!chp) { return ERR_PTR(-ENOMEM); } if (ib_context) { ucontext = to_iwch_ucontext(ib_context); if (!t3a_device(rhp)) { if (ib_copy_from_udata(&ureq, udata, sizeof (ureq))) { cxfree(chp); return ERR_PTR(-EFAULT); } chp->user_rptr_addr = (u32 /*__user */*)(unsigned long)ureq.user_rptr_addr; } } if (t3a_device(rhp)) { /* * T3A: Add some fluff to handle extra CQEs inserted * for various errors. * Additional CQE possibilities: * TERMINATE, * incoming RDMA WRITE Failures * incoming RDMA READ REQUEST FAILUREs * NOTE: We cannot ensure the CQ won't overflow. */ entries += 16; } entries = roundup_pow_of_two(entries); chp->cq.size_log2 = ilog2(entries); if (cxio_create_cq(&rhp->rdev, &chp->cq, !ucontext)) { cxfree(chp); return ERR_PTR(-ENOMEM); } chp->rhp = rhp; chp->ibcq.cqe = 1 << chp->cq.size_log2; mtx_init(&chp->lock, "cxgb cq", NULL, MTX_DEF|MTX_DUPOK); chp->refcnt = 1; if (insert_handle(rhp, &rhp->cqidr, chp, chp->cq.cqid)) { cxio_destroy_cq(&chp->rhp->rdev, &chp->cq); cxfree(chp); return ERR_PTR(-ENOMEM); } if (ucontext) { struct iwch_mm_entry *mm; mm = kmalloc(sizeof *mm, M_NOWAIT); if (!mm) { iwch_destroy_cq(&chp->ibcq); return ERR_PTR(-ENOMEM); } uresp.cqid = chp->cq.cqid; uresp.size_log2 = chp->cq.size_log2; mtx_lock(&ucontext->mmap_lock); uresp.key = ucontext->key; ucontext->key += PAGE_SIZE; mtx_unlock(&ucontext->mmap_lock); mm->key = uresp.key; mm->addr = vtophys(chp->cq.queue); if (udata->outlen < sizeof uresp) { if (!warned++) CTR1(KTR_IW_CXGB, "%s Warning - " "downlevel libcxgb3 (non-fatal).\n", __func__); mm->len = PAGE_ALIGN((1UL << uresp.size_log2) * sizeof(struct t3_cqe)); resplen = sizeof(struct iwch_create_cq_resp_v0); } else { mm->len = PAGE_ALIGN(((1UL << uresp.size_log2) + 1) * sizeof(struct t3_cqe)); uresp.memsize = mm->len; resplen = sizeof uresp; } if (ib_copy_to_udata(udata, &uresp, resplen)) { cxfree(mm); iwch_destroy_cq(&chp->ibcq); return ERR_PTR(-EFAULT); } insert_mmap(ucontext, mm); } CTR4(KTR_IW_CXGB, "created cqid 0x%0x chp %p size 0x%0x, dma_addr 0x%0llx", chp->cq.cqid, chp, (1 << chp->cq.size_log2), (unsigned long long) chp->cq.dma_addr); return &chp->ibcq; } static int iwch_resize_cq(struct ib_cq *cq __unused, int cqe __unused, struct ib_udata *udata __unused) { return (-ENOSYS); } static int iwch_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags) { struct iwch_dev *rhp; struct iwch_cq *chp; enum t3_cq_opcode cq_op; int err; u32 rptr; chp = to_iwch_cq(ibcq); rhp = chp->rhp; if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED) cq_op = CQ_ARM_SE; else cq_op = CQ_ARM_AN; if (chp->user_rptr_addr) { if (copyin(chp->user_rptr_addr, &rptr, sizeof(rptr))) return (-EFAULT); mtx_lock(&chp->lock); chp->cq.rptr = rptr; } else mtx_lock(&chp->lock); CTR2(KTR_IW_CXGB, "%s rptr 0x%x", __FUNCTION__, chp->cq.rptr); err = cxio_hal_cq_op(&rhp->rdev, &chp->cq, cq_op, 0); mtx_unlock(&chp->lock); if (err < 0) log(LOG_ERR, "Error %d rearming CQID 0x%x\n", err, chp->cq.cqid); if (err > 0 && !(flags & IB_CQ_REPORT_MISSED_EVENTS)) err = 0; return err; } static int iwch_mmap(struct ib_ucontext *context __unused, struct vm_area_struct *vma __unused) { return (-ENOSYS); } static int iwch_deallocate_pd(struct ib_pd *pd) { struct iwch_dev *rhp; struct iwch_pd *php; php = to_iwch_pd(pd); rhp = php->rhp; CTR3(KTR_IW_CXGB, "%s ibpd %p pdid 0x%x", __FUNCTION__, pd, 
php->pdid); cxio_hal_put_pdid(rhp->rdev.rscp, php->pdid); cxfree(php); return 0; } static struct ib_pd *iwch_allocate_pd(struct ib_device *ibdev, struct ib_ucontext *context, struct ib_udata *udata) { struct iwch_pd *php; u32 pdid; struct iwch_dev *rhp; CTR2(KTR_IW_CXGB, "%s ibdev %p", __FUNCTION__, ibdev); rhp = (struct iwch_dev *) ibdev; pdid = cxio_hal_get_pdid(rhp->rdev.rscp); if (!pdid) return ERR_PTR(-EINVAL); php = malloc(sizeof(*php), M_DEVBUF, M_ZERO|M_NOWAIT); if (!php) { cxio_hal_put_pdid(rhp->rdev.rscp, pdid); return ERR_PTR(-ENOMEM); } php->pdid = pdid; php->rhp = rhp; if (context) { if (ib_copy_to_udata(udata, &php->pdid, sizeof (__u32))) { iwch_deallocate_pd(&php->ibpd); return ERR_PTR(-EFAULT); } } CTR3(KTR_IW_CXGB, "%s pdid 0x%0x ptr 0x%p", __FUNCTION__, pdid, php); return &php->ibpd; } static int iwch_dereg_mr(struct ib_mr *ib_mr) { struct iwch_dev *rhp; struct iwch_mr *mhp; u32 mmid; CTR2(KTR_IW_CXGB, "%s ib_mr %p", __FUNCTION__, ib_mr); /* There can be no memory windows */ if (atomic_load_acq_int(&ib_mr->usecnt.counter)) return (-EINVAL); mhp = to_iwch_mr(ib_mr); rhp = mhp->rhp; mmid = mhp->attr.stag >> 8; cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size, mhp->attr.pbl_addr); iwch_free_pbl(mhp); remove_handle(rhp, &rhp->mmidr, mmid); if (mhp->kva) cxfree((void *) (unsigned long) mhp->kva); if (mhp->umem) ib_umem_release(mhp->umem); CTR3(KTR_IW_CXGB, "%s mmid 0x%x ptr %p", __FUNCTION__, mmid, mhp); cxfree(mhp); return 0; } static struct ib_mr *iwch_register_phys_mem(struct ib_pd *pd, struct ib_phys_buf *buffer_list, int num_phys_buf, int acc, u64 *iova_start) { __be64 *page_list; int shift; u64 total_size; int npages; struct iwch_dev *rhp; struct iwch_pd *php; struct iwch_mr *mhp; int ret; CTR2(KTR_IW_CXGB, "%s ib_pd %p", __FUNCTION__, pd); php = to_iwch_pd(pd); rhp = php->rhp; mhp = malloc(sizeof(*mhp), M_DEVBUF, M_ZERO|M_NOWAIT); if (!mhp) return ERR_PTR(-ENOMEM); mhp->rhp = rhp; /* First check that we have enough alignment */ if ((*iova_start & ~PAGE_MASK) != (buffer_list[0].addr & ~PAGE_MASK)) { ret = -EINVAL; goto err; } if (num_phys_buf > 1 && ((buffer_list[0].addr + buffer_list[0].size) & ~PAGE_MASK)) { ret = -EINVAL; goto err; } ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start, &total_size, &npages, &shift, &page_list); if (ret) goto err; ret = iwch_alloc_pbl(mhp, npages); if (ret) { cxfree(page_list); goto err_pbl; } ret = iwch_write_pbl(mhp, page_list, npages, 0); cxfree(page_list); if (ret) goto err; mhp->attr.pdid = php->pdid; mhp->attr.zbva = 0; mhp->attr.perms = iwch_ib_to_tpt_access(acc); mhp->attr.va_fbo = *iova_start; mhp->attr.page_size = shift - 12; mhp->attr.len = (u32) total_size; mhp->attr.pbl_size = npages; ret = iwch_register_mem(rhp, php, mhp, shift); if (ret) goto err_pbl; return &mhp->ibmr; err_pbl: iwch_free_pbl(mhp); err: cxfree(mhp); return ERR_PTR(ret); } static int iwch_reregister_phys_mem(struct ib_mr *mr, int mr_rereg_mask, struct ib_pd *pd, struct ib_phys_buf *buffer_list, int num_phys_buf, int acc, u64 * iova_start) { struct iwch_mr mh, *mhp; struct iwch_pd *php; struct iwch_dev *rhp; __be64 *page_list = NULL; int shift = 0; u64 total_size; int npages = 0; int ret; CTR3(KTR_IW_CXGB, "%s ib_mr %p ib_pd %p", __FUNCTION__, mr, pd); /* There can be no memory windows */ if (atomic_load_acq_int(&mr->usecnt.counter)) return (-EINVAL); mhp = to_iwch_mr(mr); rhp = mhp->rhp; php = to_iwch_pd(mr->pd); /* make sure we are on the same adapter */ if (rhp != php->rhp) return (-EINVAL); memcpy(&mh, mhp, sizeof *mhp); if 
(mr_rereg_mask & IB_MR_REREG_PD) php = to_iwch_pd(pd); if (mr_rereg_mask & IB_MR_REREG_ACCESS) mh.attr.perms = iwch_ib_to_tpt_access(acc); if (mr_rereg_mask & IB_MR_REREG_TRANS) { ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start, &total_size, &npages, &shift, &page_list); if (ret) return ret; } ret = iwch_reregister_mem(rhp, php, &mh, shift, npages); cxfree(page_list); if (ret) { return ret; } if (mr_rereg_mask & IB_MR_REREG_PD) mhp->attr.pdid = php->pdid; if (mr_rereg_mask & IB_MR_REREG_ACCESS) mhp->attr.perms = iwch_ib_to_tpt_access(acc); if (mr_rereg_mask & IB_MR_REREG_TRANS) { mhp->attr.zbva = 0; mhp->attr.va_fbo = *iova_start; mhp->attr.page_size = shift - 12; mhp->attr.len = (u32) total_size; mhp->attr.pbl_size = npages; } return 0; } static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt, int acc, struct ib_udata *udata, int mr_id) { __be64 *pages; int shift, n, len; int i, k, entry; int err = 0; struct iwch_dev *rhp; struct iwch_pd *php; struct iwch_mr *mhp; struct iwch_reg_user_mr_resp uresp; struct scatterlist *sg; CTR2(KTR_IW_CXGB, "%s ib_pd %p", __FUNCTION__, pd); php = to_iwch_pd(pd); rhp = php->rhp; mhp = malloc(sizeof(*mhp), M_DEVBUF, M_NOWAIT|M_ZERO); if (!mhp) return ERR_PTR(-ENOMEM); mhp->rhp = rhp; mhp->umem = ib_umem_get(pd->uobject->context, start, length, acc, 0); if (IS_ERR(mhp->umem)) { err = PTR_ERR(mhp->umem); cxfree(mhp); return ERR_PTR(-err); } shift = ffs(mhp->umem->page_size) - 1; n = mhp->umem->nmap; err = iwch_alloc_pbl(mhp, n); if (err) goto err; pages = (__be64 *) kmalloc(n * sizeof(u64), M_NOWAIT); if (!pages) { err = -ENOMEM; goto err_pbl; } i = n = 0; for_each_sg(mhp->umem->sg_head.sgl, sg, mhp->umem->nmap, entry) { len = sg_dma_len(sg) >> shift; for (k = 0; k < len; ++k) { pages[i++] = cpu_to_be64(sg_dma_address(sg) + mhp->umem->page_size * k); if (i == PAGE_SIZE / sizeof *pages) { err = iwch_write_pbl(mhp, pages, i, n); if (err) goto pbl_done; n += i; i = 0; } } } #if 0 TAILQ_FOREACH(chunk, &mhp->umem->chunk_list, entry) for (j = 0; j < chunk->nmap; ++j) { len = sg_dma_len(&chunk->page_list[j]) >> shift; for (k = 0; k < len; ++k) { pages[i++] = htobe64(sg_dma_address( &chunk->page_list[j]) + mhp->umem->page_size * k); if (i == PAGE_SIZE / sizeof *pages) { err = iwch_write_pbl(mhp, pages, i, n); if (err) goto pbl_done; n += i; i = 0; } } } #endif if (i) err = iwch_write_pbl(mhp, pages, i, n); pbl_done: cxfree(pages); if (err) goto err_pbl; mhp->attr.pdid = php->pdid; mhp->attr.zbva = 0; mhp->attr.perms = iwch_ib_to_tpt_access(acc); mhp->attr.va_fbo = virt; mhp->attr.page_size = shift - 12; mhp->attr.len = (u32) length; err = iwch_register_mem(rhp, php, mhp, shift); if (err) goto err_pbl; if (udata && !t3a_device(rhp)) { uresp.pbl_addr = (mhp->attr.pbl_addr - rhp->rdev.rnic_info.pbl_base) >> 3; CTR2(KTR_IW_CXGB, "%s user resp pbl_addr 0x%x", __FUNCTION__, uresp.pbl_addr); if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) { iwch_dereg_mr(&mhp->ibmr); err = EFAULT; goto err; } } return &mhp->ibmr; err_pbl: iwch_free_pbl(mhp); err: ib_umem_release(mhp->umem); cxfree(mhp); return ERR_PTR(-err); } static struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, int acc) { struct ib_phys_buf bl; u64 kva; struct ib_mr *ibmr; CTR2(KTR_IW_CXGB, "%s ib_pd %p", __FUNCTION__, pd); /* * T3 only supports 32 bits of size. 
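 * The DMA MR set up below is therefore one flat window: base address 0,
 * size 0xffffffff, with a zeroed iova_start.  (Editorial gloss of the
 * code that follows, not text from the original commit.)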
*/ bl.size = 0xffffffff; bl.addr = 0; kva = 0; ibmr = iwch_register_phys_mem(pd, &bl, 1, acc, &kva); return ibmr; } static struct ib_mw *iwch_alloc_mw(struct ib_pd *pd, enum ib_mw_type type) { struct iwch_dev *rhp; struct iwch_pd *php; struct iwch_mw *mhp; u32 mmid; u32 stag = 0; int ret; php = to_iwch_pd(pd); rhp = php->rhp; mhp = malloc(sizeof(*mhp), M_DEVBUF, M_ZERO|M_NOWAIT); if (!mhp) return ERR_PTR(-ENOMEM); ret = cxio_allocate_window(&rhp->rdev, &stag, php->pdid); if (ret) { cxfree(mhp); return ERR_PTR(-ret); } mhp->rhp = rhp; mhp->attr.pdid = php->pdid; mhp->attr.type = TPT_MW; mhp->attr.stag = stag; mmid = (stag) >> 8; mhp->ibmw.rkey = stag; if (insert_handle(rhp, &rhp->mmidr, mhp, mmid)) { cxio_deallocate_window(&rhp->rdev, mhp->attr.stag); cxfree(mhp); return ERR_PTR(-ENOMEM); } CTR4(KTR_IW_CXGB, "%s mmid 0x%x mhp %p stag 0x%x", __FUNCTION__, mmid, mhp, stag); return &(mhp->ibmw); } static int iwch_dealloc_mw(struct ib_mw *mw) { struct iwch_dev *rhp; struct iwch_mw *mhp; u32 mmid; mhp = to_iwch_mw(mw); rhp = mhp->rhp; mmid = (mw->rkey) >> 8; cxio_deallocate_window(&rhp->rdev, mhp->attr.stag); remove_handle(rhp, &rhp->mmidr, mmid); cxfree(mhp); CTR4(KTR_IW_CXGB, "%s ib_mw %p mmid 0x%x ptr %p", __FUNCTION__, mw, mmid, mhp); return 0; } static int iwch_destroy_qp(struct ib_qp *ib_qp) { struct iwch_dev *rhp; struct iwch_qp *qhp; struct iwch_qp_attributes attrs; struct iwch_ucontext *ucontext; qhp = to_iwch_qp(ib_qp); rhp = qhp->rhp; attrs.next_state = IWCH_QP_STATE_ERROR; iwch_modify_qp(rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 0); mtx_lock(&qhp->lock); if (qhp->ep) msleep(qhp, &qhp->lock, 0, "iwch_destroy_qp1", 0); mtx_unlock(&qhp->lock); remove_handle(rhp, &rhp->qpidr, qhp->wq.qpid); mtx_lock(&qhp->lock); if (--qhp->refcnt) msleep(qhp, &qhp->lock, 0, "iwch_destroy_qp2", 0); mtx_unlock(&qhp->lock); ucontext = ib_qp->uobject ? to_iwch_ucontext(ib_qp->uobject->context) : NULL; cxio_destroy_qp(&rhp->rdev, &qhp->wq, ucontext ? &ucontext->uctx : &rhp->rdev.uctx); CTR4(KTR_IW_CXGB, "%s ib_qp %p qpid 0x%0x qhp %p", __FUNCTION__, ib_qp, qhp->wq.qpid, qhp); cxfree(qhp); return 0; } static struct ib_qp *iwch_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs, struct ib_udata *udata) { struct iwch_dev *rhp; struct iwch_qp *qhp; struct iwch_pd *php; struct iwch_cq *schp; struct iwch_cq *rchp; struct iwch_create_qp_resp uresp; int wqsize, sqsize, rqsize; struct iwch_ucontext *ucontext; CTR2(KTR_IW_CXGB, "%s ib_pd %p", __FUNCTION__, pd); if (attrs->qp_type != IB_QPT_RC) return ERR_PTR(-EINVAL); php = to_iwch_pd(pd); rhp = php->rhp; schp = get_chp(rhp, ((struct iwch_cq *) attrs->send_cq)->cq.cqid); rchp = get_chp(rhp, ((struct iwch_cq *) attrs->recv_cq)->cq.cqid); if (!schp || !rchp) return ERR_PTR(-EINVAL); /* The RQT size must be # of entries + 1 rounded up to a power of two */ rqsize = roundup_pow_of_two(attrs->cap.max_recv_wr); if (rqsize == attrs->cap.max_recv_wr) rqsize = roundup_pow_of_two(attrs->cap.max_recv_wr+1); /* T3 doesn't support RQT depth < 16 */ if (rqsize < 16) rqsize = 16; if (rqsize > T3_MAX_RQ_SIZE) return ERR_PTR(-EINVAL); if (attrs->cap.max_inline_data > T3_MAX_INLINE) return ERR_PTR(-EINVAL); /* * NOTE: The SQ and total WQ sizes don't need to be * a power of two. However, all the code assumes * they are. EG: Q_FREECNT() and friends. 
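 *
 * Worked example (editorial): max_recv_wr = 16 gives
 * rqsize = roundup_pow_of_two(17) = 32, since the RQT needs entries + 1;
 * max_send_wr = 100 gives sqsize = 128; and
 * wqsize = roundup_pow_of_two(32 + 128) = 256.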
*/ sqsize = roundup_pow_of_two(attrs->cap.max_send_wr); wqsize = roundup_pow_of_two(rqsize + sqsize); CTR4(KTR_IW_CXGB, "%s wqsize %d sqsize %d rqsize %d", __FUNCTION__, wqsize, sqsize, rqsize); qhp = malloc(sizeof(*qhp), M_DEVBUF, M_ZERO|M_NOWAIT); if (!qhp) return ERR_PTR(-ENOMEM); qhp->wq.size_log2 = ilog2(wqsize); qhp->wq.rq_size_log2 = ilog2(rqsize); qhp->wq.sq_size_log2 = ilog2(sqsize); ucontext = pd->uobject ? to_iwch_ucontext(pd->uobject->context) : NULL; if (cxio_create_qp(&rhp->rdev, !udata, &qhp->wq, ucontext ? &ucontext->uctx : &rhp->rdev.uctx)) { cxfree(qhp); return ERR_PTR(-ENOMEM); } attrs->cap.max_recv_wr = rqsize - 1; attrs->cap.max_send_wr = sqsize; attrs->cap.max_inline_data = T3_MAX_INLINE; qhp->rhp = rhp; qhp->attr.pd = php->pdid; qhp->attr.scq = ((struct iwch_cq *) attrs->send_cq)->cq.cqid; qhp->attr.rcq = ((struct iwch_cq *) attrs->recv_cq)->cq.cqid; qhp->attr.sq_num_entries = attrs->cap.max_send_wr; qhp->attr.rq_num_entries = attrs->cap.max_recv_wr; qhp->attr.sq_max_sges = attrs->cap.max_send_sge; qhp->attr.sq_max_sges_rdma_write = attrs->cap.max_send_sge; qhp->attr.rq_max_sges = attrs->cap.max_recv_sge; qhp->attr.state = IWCH_QP_STATE_IDLE; qhp->attr.next_state = IWCH_QP_STATE_IDLE; /* * XXX - These don't get passed in from the openib user * at create time. The CM sets them via a QP modify. * Need to fix... I think the CM should */ qhp->attr.enable_rdma_read = 1; qhp->attr.enable_rdma_write = 1; qhp->attr.enable_bind = 1; qhp->attr.max_ord = 1; qhp->attr.max_ird = 1; mtx_init(&qhp->lock, "cxgb qp", NULL, MTX_DEF|MTX_DUPOK); qhp->refcnt = 1; if (insert_handle(rhp, &rhp->qpidr, qhp, qhp->wq.qpid)) { cxio_destroy_qp(&rhp->rdev, &qhp->wq, ucontext ? &ucontext->uctx : &rhp->rdev.uctx); cxfree(qhp); return ERR_PTR(-ENOMEM); } if (udata) { struct iwch_mm_entry *mm1, *mm2; mm1 = kmalloc(sizeof *mm1, M_NOWAIT); if (!mm1) { iwch_destroy_qp(&qhp->ibqp); return ERR_PTR(-ENOMEM); } mm2 = kmalloc(sizeof *mm2, M_NOWAIT); if (!mm2) { cxfree(mm1); iwch_destroy_qp(&qhp->ibqp); return ERR_PTR(-ENOMEM); } uresp.qpid = qhp->wq.qpid; uresp.size_log2 = qhp->wq.size_log2; uresp.sq_size_log2 = qhp->wq.sq_size_log2; uresp.rq_size_log2 = qhp->wq.rq_size_log2; mtx_lock(&ucontext->mmap_lock); uresp.key = ucontext->key; ucontext->key += PAGE_SIZE; uresp.db_key = ucontext->key; ucontext->key += PAGE_SIZE; mtx_unlock(&ucontext->mmap_lock); if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) { cxfree(mm1); cxfree(mm2); iwch_destroy_qp(&qhp->ibqp); return ERR_PTR(-EFAULT); } mm1->key = uresp.key; mm1->addr = vtophys(qhp->wq.queue); mm1->len = PAGE_ALIGN(wqsize * sizeof (union t3_wr)); insert_mmap(ucontext, mm1); mm2->key = uresp.db_key; mm2->addr = qhp->wq.udb & PAGE_MASK; mm2->len = PAGE_SIZE; insert_mmap(ucontext, mm2); } qhp->ibqp.qp_num = qhp->wq.qpid; callout_init(&(qhp->timer), 1); CTR6(KTR_IW_CXGB, "sq_num_entries %d, rq_num_entries %d " "qpid 0x%0x qhp %p dma_addr 0x%llx size %d", qhp->attr.sq_num_entries, qhp->attr.rq_num_entries, qhp->wq.qpid, qhp, (unsigned long long) qhp->wq.dma_addr, 1 << qhp->wq.size_log2); return &qhp->ibqp; } static int iwch_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, struct ib_udata *udata) { struct iwch_dev *rhp; struct iwch_qp *qhp; enum iwch_qp_attr_mask mask = 0; struct iwch_qp_attributes attrs; CTR2(KTR_IW_CXGB, "%s ib_qp %p", __FUNCTION__, ibqp); /* iwarp does not support the RTR state */ if ((attr_mask & IB_QP_STATE) && (attr->qp_state == IB_QPS_RTR)) attr_mask &= ~IB_QP_STATE; /* Make sure we still have something left to do */ 
if (!attr_mask) return 0; memset(&attrs, 0, sizeof attrs); qhp = to_iwch_qp(ibqp); rhp = qhp->rhp; attrs.next_state = iwch_convert_state(attr->qp_state); attrs.enable_rdma_read = (attr->qp_access_flags & IB_ACCESS_REMOTE_READ) ? 1 : 0; attrs.enable_rdma_write = (attr->qp_access_flags & IB_ACCESS_REMOTE_WRITE) ? 1 : 0; attrs.enable_bind = (attr->qp_access_flags & IB_ACCESS_MW_BIND) ? 1 : 0; mask |= (attr_mask & IB_QP_STATE) ? IWCH_QP_ATTR_NEXT_STATE : 0; mask |= (attr_mask & IB_QP_ACCESS_FLAGS) ? (IWCH_QP_ATTR_ENABLE_RDMA_READ | IWCH_QP_ATTR_ENABLE_RDMA_WRITE | IWCH_QP_ATTR_ENABLE_RDMA_BIND) : 0; return iwch_modify_qp(rhp, qhp, mask, &attrs, 0); } void iwch_qp_add_ref(struct ib_qp *qp) { CTR2(KTR_IW_CXGB, "%s ib_qp %p", __FUNCTION__, qp); mtx_lock(&to_iwch_qp(qp)->lock); to_iwch_qp(qp)->refcnt++; mtx_unlock(&to_iwch_qp(qp)->lock); } void iwch_qp_rem_ref(struct ib_qp *qp) { CTR2(KTR_IW_CXGB, "%s ib_qp %p", __FUNCTION__, qp); mtx_lock(&to_iwch_qp(qp)->lock); if (--to_iwch_qp(qp)->refcnt == 0) wakeup(to_iwch_qp(qp)); mtx_unlock(&to_iwch_qp(qp)->lock); } static struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn) { CTR3(KTR_IW_CXGB, "%s ib_dev %p qpn 0x%x", __FUNCTION__, dev, qpn); return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn); } static int iwch_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 * pkey) { CTR2(KTR_IW_CXGB, "%s ibdev %p", __FUNCTION__, ibdev); *pkey = 0; return 0; } static int iwch_query_gid(struct ib_device *ibdev, u8 port, int index, union ib_gid *gid) { struct iwch_dev *dev; struct port_info *pi; struct adapter *sc; CTR5(KTR_IW_CXGB, "%s ibdev %p, port %d, index %d, gid %p", __FUNCTION__, ibdev, port, index, gid); dev = to_iwch_dev(ibdev); sc = dev->rdev.adap; PANIC_IF(port == 0 || port > 2); pi = &sc->port[port - 1]; memset(&(gid->raw[0]), 0, sizeof(gid->raw)); memcpy(&(gid->raw[0]), pi->hw_addr, 6); return 0; } static int iwch_query_device(struct ib_device *ibdev, struct ib_device_attr *props) { struct iwch_dev *dev; struct adapter *sc; CTR2(KTR_IW_CXGB, "%s ibdev %p", __FUNCTION__, ibdev); dev = to_iwch_dev(ibdev); sc = dev->rdev.adap; memset(props, 0, sizeof *props); memcpy(&props->sys_image_guid, sc->port[0].hw_addr, 6); props->device_cap_flags = dev->device_cap_flags; props->page_size_cap = dev->attr.mem_pgsizes_bitmask; props->vendor_id = pci_get_vendor(sc->dev); props->vendor_part_id = pci_get_device(sc->dev); props->max_mr_size = dev->attr.max_mr_size; props->max_qp = dev->attr.max_qps; props->max_qp_wr = dev->attr.max_wrs; props->max_sge = dev->attr.max_sge_per_wr; props->max_sge_rd = 1; props->max_qp_rd_atom = dev->attr.max_rdma_reads_per_qp; props->max_qp_init_rd_atom = dev->attr.max_rdma_reads_per_qp; props->max_cq = dev->attr.max_cqs; props->max_cqe = dev->attr.max_cqes_per_cq; props->max_mr = dev->attr.max_mem_regs; props->max_pd = dev->attr.max_pds; props->local_ca_ack_delay = 0; return 0; } static int iwch_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr *props) { CTR2(KTR_IW_CXGB, "%s ibdev %p", __FUNCTION__, ibdev); memset(props, 0, sizeof(struct ib_port_attr)); props->max_mtu = IB_MTU_4096; props->active_mtu = IB_MTU_2048; props->state = IB_PORT_ACTIVE; props->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_SNMP_TUNNEL_SUP | IB_PORT_REINIT_SUP | IB_PORT_DEVICE_MGMT_SUP | IB_PORT_VENDOR_CLASS_SUP | IB_PORT_BOOT_MGMT_SUP; props->gid_tbl_len = 1; props->pkey_tbl_len = 1; props->active_width = 2; props->active_speed = 2; props->max_msg_sz = -1; return 0; } int iwch_register_device(struct iwch_dev *dev) { int ret; struct adapter 
*sc = dev->rdev.adap; CTR2(KTR_IW_CXGB, "%s iwch_dev %p", __FUNCTION__, dev); strlcpy(dev->ibdev.name, "cxgb3_%d", IB_DEVICE_NAME_MAX); memset(&dev->ibdev.node_guid, 0, sizeof(dev->ibdev.node_guid)); memcpy(&dev->ibdev.node_guid, sc->port[0].hw_addr, 6); dev->device_cap_flags = (IB_DEVICE_LOCAL_DMA_LKEY | IB_DEVICE_MEM_WINDOW); dev->ibdev.uverbs_cmd_mask = (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) | (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) | (1ull << IB_USER_VERBS_CMD_QUERY_PORT) | (1ull << IB_USER_VERBS_CMD_ALLOC_PD) | (1ull << IB_USER_VERBS_CMD_DEALLOC_PD) | (1ull << IB_USER_VERBS_CMD_REG_MR) | (1ull << IB_USER_VERBS_CMD_DEREG_MR) | (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | (1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) | (1ull << IB_USER_VERBS_CMD_CREATE_QP) | (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | (1ull << IB_USER_VERBS_CMD_POLL_CQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | (1ull << IB_USER_VERBS_CMD_POST_SEND) | (1ull << IB_USER_VERBS_CMD_POST_RECV); dev->ibdev.node_type = RDMA_NODE_RNIC; memcpy(dev->ibdev.node_desc, IWCH_NODE_DESC, sizeof(IWCH_NODE_DESC)); dev->ibdev.phys_port_cnt = sc->params.nports; dev->ibdev.num_comp_vectors = 1; dev->ibdev.dma_device = dev->rdev.adap->dev; dev->ibdev.query_device = iwch_query_device; dev->ibdev.query_port = iwch_query_port; dev->ibdev.modify_port = iwch_modify_port; dev->ibdev.query_pkey = iwch_query_pkey; dev->ibdev.query_gid = iwch_query_gid; dev->ibdev.alloc_ucontext = iwch_alloc_ucontext; dev->ibdev.dealloc_ucontext = iwch_dealloc_ucontext; dev->ibdev.mmap = iwch_mmap; dev->ibdev.alloc_pd = iwch_allocate_pd; dev->ibdev.dealloc_pd = iwch_deallocate_pd; dev->ibdev.create_ah = iwch_ah_create; dev->ibdev.destroy_ah = iwch_ah_destroy; dev->ibdev.create_qp = iwch_create_qp; dev->ibdev.modify_qp = iwch_ib_modify_qp; dev->ibdev.destroy_qp = iwch_destroy_qp; dev->ibdev.create_cq = iwch_create_cq; dev->ibdev.destroy_cq = iwch_destroy_cq; dev->ibdev.resize_cq = iwch_resize_cq; dev->ibdev.poll_cq = iwch_poll_cq; dev->ibdev.get_dma_mr = iwch_get_dma_mr; dev->ibdev.reg_phys_mr = iwch_register_phys_mem; dev->ibdev.rereg_phys_mr = iwch_reregister_phys_mem; dev->ibdev.reg_user_mr = iwch_reg_user_mr; dev->ibdev.dereg_mr = iwch_dereg_mr; dev->ibdev.alloc_mw = iwch_alloc_mw; dev->ibdev.bind_mw = iwch_bind_mw; dev->ibdev.dealloc_mw = iwch_dealloc_mw; dev->ibdev.attach_mcast = iwch_multicast_attach; dev->ibdev.detach_mcast = iwch_multicast_detach; dev->ibdev.process_mad = iwch_process_mad; dev->ibdev.req_notify_cq = iwch_arm_cq; dev->ibdev.post_send = iwch_post_send; dev->ibdev.post_recv = iwch_post_receive; dev->ibdev.uverbs_abi_ver = IWCH_UVERBS_ABI_VERSION; dev->ibdev.iwcm = kmalloc(sizeof(struct iw_cm_verbs), M_NOWAIT); if (!dev->ibdev.iwcm) return (ENOMEM); dev->ibdev.iwcm->connect = iwch_connect; dev->ibdev.iwcm->accept = iwch_accept_cr; dev->ibdev.iwcm->reject = iwch_reject_cr; - dev->ibdev.iwcm->create_listen = iwch_create_listen; - dev->ibdev.iwcm->destroy_listen = iwch_destroy_listen; + dev->ibdev.iwcm->create_listen_ep = iwch_create_listen_ep; + dev->ibdev.iwcm->destroy_listen_ep = iwch_destroy_listen_ep; + dev->ibdev.iwcm->newconn = process_newconn; dev->ibdev.iwcm->add_ref = iwch_qp_add_ref; dev->ibdev.iwcm->rem_ref = iwch_qp_rem_ref; dev->ibdev.iwcm->get_qp = iwch_get_qp; ret = ib_register_device(&dev->ibdev, NULL); if (ret) goto bail1; return (0); bail1: cxfree(dev->ibdev.iwcm); return (ret); } void iwch_unregister_device(struct iwch_dev *dev) { 
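	/*
	 * Editorial note: teardown mirrors iwch_register_device() above -
	 * detach from the ib core first, then free the iw_cm_verbs table
	 * that registration allocated (now carrying the create_listen_ep/
	 * destroy_listen_ep/newconn handlers introduced by this change).
	 */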
ib_unregister_device(&dev->ibdev); cxfree(dev->ibdev.iwcm); return; } #endif Index: projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/cm.c =================================================================== --- projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/cm.c (revision 294776) +++ projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/cm.c (revision 294777) @@ -1,2441 +1,2406 @@ /* - * Copyright (c) 2009-2013 Chelsio, Inc. All rights reserved. + * Copyright (c) 2009-2013, 2016 Chelsio, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. 
*/ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #ifdef TCP_OFFLOAD #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include struct sge_iq; struct rss_header; #include #include "offload.h" #include "tom/t4_tom.h" #define TOEPCB(so) ((struct toepcb *)(so_sototcpcb((so))->t_toe)) #include "iw_cxgbe.h" #include #include #include #include #include #include static spinlock_t req_lock; static TAILQ_HEAD(c4iw_ep_list, c4iw_ep_common) req_list; static struct work_struct c4iw_task; static struct workqueue_struct *c4iw_taskq; static LIST_HEAD(timeout_list); static spinlock_t timeout_lock; static void process_req(struct work_struct *ctx); static void start_ep_timer(struct c4iw_ep *ep); static void stop_ep_timer(struct c4iw_ep *ep); static int set_tcpinfo(struct c4iw_ep *ep); static enum c4iw_ep_state state_read(struct c4iw_ep_common *epc); static void __state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state tostate); static void state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state tostate); static void *alloc_ep(int size, gfp_t flags); void __free_ep(struct c4iw_ep_common *epc); static int find_route(__be32 local_ip, __be32 peer_ip, __be16 local_port, __be16 peer_port, u8 tos, struct nhop4_extended *pnh4); static int close_socket(struct c4iw_ep_common *epc, int close); static int shutdown_socket(struct c4iw_ep_common *epc); static void abort_socket(struct c4iw_ep *ep); static void send_mpa_req(struct c4iw_ep *ep); static int send_mpa_reject(struct c4iw_ep *ep, const void *pdata, u8 plen); static int send_mpa_reply(struct c4iw_ep *ep, const void *pdata, u8 plen); static void close_complete_upcall(struct c4iw_ep *ep, int status); static int abort_connection(struct c4iw_ep *ep); static void peer_close_upcall(struct c4iw_ep *ep); static void peer_abort_upcall(struct c4iw_ep *ep); static void connect_reply_upcall(struct c4iw_ep *ep, int status); static int connect_request_upcall(struct c4iw_ep *ep); static void established_upcall(struct c4iw_ep *ep); static void process_mpa_reply(struct c4iw_ep *ep); static void process_mpa_request(struct c4iw_ep *ep); static void process_peer_close(struct c4iw_ep *ep); static void process_conn_error(struct c4iw_ep *ep); static void process_close_complete(struct c4iw_ep *ep); static void ep_timeout(unsigned long arg); static void init_sock(struct c4iw_ep_common *epc); static void process_data(struct c4iw_ep *ep); static void process_connected(struct c4iw_ep *ep); -static struct socket * dequeue_socket(struct socket *head, struct sockaddr_in **remote, struct c4iw_ep *child_ep); -static void process_newconn(struct c4iw_ep *parent_ep); static int c4iw_so_upcall(struct socket *so, void *arg, int waitflag); static void process_socket_event(struct c4iw_ep *ep); static void release_ep_resources(struct c4iw_ep *ep); #define START_EP_TIMER(ep) \ do { \ CTR3(KTR_IW_CXGBE, "start_ep_timer (%s:%d) ep %p", \ __func__, __LINE__, (ep)); \ start_ep_timer(ep); \ } while (0) #define STOP_EP_TIMER(ep) \ do { \ CTR3(KTR_IW_CXGBE, "stop_ep_timer (%s:%d) ep %p", \ __func__, __LINE__, (ep)); \ stop_ep_timer(ep); \ } while (0) #ifdef KTR static char *states[] = { "idle", "listen", "connecting", "mpa_wait_req", "mpa_req_sent", "mpa_req_rcvd", "mpa_rep_sent", "fpdu_mode", "aborting", "closing", "moribund", "dead", NULL, }; #endif static void process_req(struct work_struct *ctx) { struct c4iw_ep_common *epc; spin_lock(&req_lock); while (!TAILQ_EMPTY(&req_list)) { epc = 
TAILQ_FIRST(&req_list); TAILQ_REMOVE(&req_list, epc, entry); epc->entry.tqe_prev = NULL; spin_unlock(&req_lock); if (epc->so) process_socket_event((struct c4iw_ep *)epc); c4iw_put_ep(epc); spin_lock(&req_lock); } spin_unlock(&req_lock); } /* * XXX: doesn't belong here in the iWARP driver. * XXX: assumes that the connection was offloaded by cxgbe/t4_tom if TF_TOE is * set. Is this a valid assumption for active open? */ static int set_tcpinfo(struct c4iw_ep *ep) { struct socket *so = ep->com.so; struct inpcb *inp = sotoinpcb(so); struct tcpcb *tp; struct toepcb *toep; int rc = 0; INP_WLOCK(inp); tp = intotcpcb(inp); if ((tp->t_flags & TF_TOE) == 0) { rc = EINVAL; log(LOG_ERR, "%s: connection not offloaded (so %p, ep %p)\n", __func__, so, ep); goto done; } toep = TOEPCB(so); ep->hwtid = toep->tid; ep->snd_seq = tp->snd_nxt; ep->rcv_seq = tp->rcv_nxt; ep->emss = max(tp->t_maxseg, 128); done: INP_WUNLOCK(inp); return (rc); } static int find_route(__be32 local_ip, __be32 peer_ip, __be16 local_port, __be16 peer_port, u8 tos, struct nhop4_extended *pnh4) { struct in_addr addr; int err; CTR5(KTR_IW_CXGBE, "%s:frtB %x, %x, %d, %d", __func__, local_ip, peer_ip, ntohs(local_port), ntohs(peer_port)); addr.s_addr = peer_ip; err = fib4_lookup_nh_ext(RT_DEFAULT_FIB, addr, NHR_REF, 0, pnh4); CTR2(KTR_IW_CXGBE, "%s:frtE %d", __func__, err); return err; } static int close_socket(struct c4iw_ep_common *epc, int close) { struct socket *so = epc->so; int rc; CTR4(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s", __func__, epc, so, states[epc->state]); SOCK_LOCK(so); soupcall_clear(so, SO_RCV); SOCK_UNLOCK(so); if (close) rc = soclose(so); else rc = soshutdown(so, SHUT_WR | SHUT_RD); epc->so = NULL; return (rc); } static int shutdown_socket(struct c4iw_ep_common *epc) { CTR4(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s", __func__, epc->so, epc, states[epc->state]); return (soshutdown(epc->so, SHUT_WR)); } static void abort_socket(struct c4iw_ep *ep) { struct sockopt sopt; int rc; struct linger l; CTR4(KTR_IW_CXGBE, "%s ep %p so %p state %s", __func__, ep, ep->com.so, states[ep->com.state]); l.l_onoff = 1; l.l_linger = 0; /* linger_time of 0 forces RST to be sent */ sopt.sopt_dir = SOPT_SET; sopt.sopt_level = SOL_SOCKET; sopt.sopt_name = SO_LINGER; sopt.sopt_val = (caddr_t)&l; sopt.sopt_valsize = sizeof l; sopt.sopt_td = NULL; rc = sosetopt(ep->com.so, &sopt); if (rc) { log(LOG_ERR, "%s: can't set linger to 0, no RST! err %d\n", __func__, rc); } } static void process_peer_close(struct c4iw_ep *ep) { struct c4iw_qp_attributes attrs; int disconnect = 1; int release = 0; CTR4(KTR_IW_CXGBE, "%s:ppcB ep %p so %p state %s", __func__, ep, ep->com.so, states[ep->com.state]); mutex_lock(&ep->com.mutex); switch (ep->com.state) { case MPA_REQ_WAIT: CTR2(KTR_IW_CXGBE, "%s:ppc1 %p MPA_REQ_WAIT CLOSING", __func__, ep); __state_set(&ep->com, CLOSING); break; case MPA_REQ_SENT: CTR2(KTR_IW_CXGBE, "%s:ppc2 %p MPA_REQ_SENT CLOSING", __func__, ep); __state_set(&ep->com, DEAD); connect_reply_upcall(ep, -ECONNABORTED); disconnect = 0; STOP_EP_TIMER(ep); close_socket(&ep->com, 0); ep->com.cm_id->rem_ref(ep->com.cm_id); ep->com.cm_id = NULL; ep->com.qp = NULL; release = 1; break; case MPA_REQ_RCVD: /* * We're gonna mark this puppy DEAD, but keep * the reference on it until the ULP accepts or * rejects the CR. 
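 * (Editorial note, inferred from the reference-counting pattern rather
 * than shown in this hunk: the extra c4iw_get_ep() reference taken here
 * is expected to be dropped on the accept/reject path once the consumer
 * decides.)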
*/ CTR2(KTR_IW_CXGBE, "%s:ppc3 %p MPA_REQ_RCVD CLOSING", __func__, ep); __state_set(&ep->com, CLOSING); c4iw_get_ep(&ep->com); break; case MPA_REP_SENT: CTR2(KTR_IW_CXGBE, "%s:ppc4 %p MPA_REP_SENT CLOSING", __func__, ep); __state_set(&ep->com, CLOSING); break; case FPDU_MODE: CTR2(KTR_IW_CXGBE, "%s:ppc5 %p FPDU_MODE CLOSING", __func__, ep); START_EP_TIMER(ep); __state_set(&ep->com, CLOSING); attrs.next_state = C4IW_QP_STATE_CLOSING; c4iw_modify_qp(ep->com.dev, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); peer_close_upcall(ep); break; case ABORTING: CTR2(KTR_IW_CXGBE, "%s:ppc6 %p ABORTING (disconn)", __func__, ep); disconnect = 0; break; case CLOSING: CTR2(KTR_IW_CXGBE, "%s:ppc7 %p CLOSING MORIBUND", __func__, ep); __state_set(&ep->com, MORIBUND); disconnect = 0; break; case MORIBUND: CTR2(KTR_IW_CXGBE, "%s:ppc8 %p MORIBUND DEAD", __func__, ep); STOP_EP_TIMER(ep); if (ep->com.cm_id && ep->com.qp) { attrs.next_state = C4IW_QP_STATE_IDLE; c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); } close_socket(&ep->com, 0); close_complete_upcall(ep, 0); __state_set(&ep->com, DEAD); release = 1; disconnect = 0; break; case DEAD: CTR2(KTR_IW_CXGBE, "%s:ppc9 %p DEAD (disconn)", __func__, ep); disconnect = 0; break; default: panic("%s: ep %p state %d", __func__, ep, ep->com.state); break; } mutex_unlock(&ep->com.mutex); if (disconnect) { CTR2(KTR_IW_CXGBE, "%s:ppca %p", __func__, ep); c4iw_ep_disconnect(ep, 0, M_NOWAIT); } if (release) { CTR2(KTR_IW_CXGBE, "%s:ppcb %p", __func__, ep); c4iw_put_ep(&ep->com); } CTR2(KTR_IW_CXGBE, "%s:ppcE %p", __func__, ep); return; } static void process_conn_error(struct c4iw_ep *ep) { struct c4iw_qp_attributes attrs; int ret; int state; state = state_read(&ep->com); CTR5(KTR_IW_CXGBE, "%s:pceB ep %p so %p so->so_error %u state %s", __func__, ep, ep->com.so, ep->com.so->so_error, states[ep->com.state]); switch (state) { case MPA_REQ_WAIT: STOP_EP_TIMER(ep); break; case MPA_REQ_SENT: STOP_EP_TIMER(ep); connect_reply_upcall(ep, -ECONNRESET); break; case MPA_REP_SENT: ep->com.rpl_err = ECONNRESET; CTR1(KTR_IW_CXGBE, "waking up ep %p", ep); break; case MPA_REQ_RCVD: /* * We're gonna mark this puppy DEAD, but keep * the reference on it until the ULP accepts or * rejects the CR. 
*/ c4iw_get_ep(&ep->com); break; case MORIBUND: case CLOSING: STOP_EP_TIMER(ep); /*FALLTHROUGH*/ case FPDU_MODE: if (ep->com.cm_id && ep->com.qp) { attrs.next_state = C4IW_QP_STATE_ERROR; ret = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); if (ret) log(LOG_ERR, "%s - qp <- error failed!\n", __func__); } peer_abort_upcall(ep); break; case ABORTING: break; case DEAD: CTR2(KTR_IW_CXGBE, "%s so_error %d IN DEAD STATE!!!!", __func__, ep->com.so->so_error); return; default: panic("%s: ep %p state %d", __func__, ep, state); break; } if (state != ABORTING) { CTR2(KTR_IW_CXGBE, "%s:pce1 %p", __func__, ep); close_socket(&ep->com, 0); state_set(&ep->com, DEAD); c4iw_put_ep(&ep->com); } CTR2(KTR_IW_CXGBE, "%s:pceE %p", __func__, ep); return; } static void process_close_complete(struct c4iw_ep *ep) { struct c4iw_qp_attributes attrs; int release = 0; CTR4(KTR_IW_CXGBE, "%s:pccB ep %p so %p state %s", __func__, ep, ep->com.so, states[ep->com.state]); /* The cm_id may be null if we failed to connect */ mutex_lock(&ep->com.mutex); switch (ep->com.state) { case CLOSING: CTR2(KTR_IW_CXGBE, "%s:pcc1 %p CLOSING MORIBUND", __func__, ep); __state_set(&ep->com, MORIBUND); break; case MORIBUND: CTR2(KTR_IW_CXGBE, "%s:pcc1 %p MORIBUND DEAD", __func__, ep); STOP_EP_TIMER(ep); if ((ep->com.cm_id) && (ep->com.qp)) { CTR2(KTR_IW_CXGBE, "%s:pcc2 %p QP_STATE_IDLE", __func__, ep); attrs.next_state = C4IW_QP_STATE_IDLE; c4iw_modify_qp(ep->com.dev, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); } if (ep->parent_ep) { CTR2(KTR_IW_CXGBE, "%s:pcc3 %p", __func__, ep); close_socket(&ep->com, 1); } else { CTR2(KTR_IW_CXGBE, "%s:pcc4 %p", __func__, ep); close_socket(&ep->com, 0); } close_complete_upcall(ep, 0); __state_set(&ep->com, DEAD); release = 1; break; case ABORTING: CTR2(KTR_IW_CXGBE, "%s:pcc5 %p ABORTING", __func__, ep); break; case DEAD: default: CTR2(KTR_IW_CXGBE, "%s:pcc6 %p DEAD", __func__, ep); panic("%s:pcc6 %p DEAD", __func__, ep); break; } mutex_unlock(&ep->com.mutex); if (release) { CTR2(KTR_IW_CXGBE, "%s:pcc7 %p", __func__, ep); c4iw_put_ep(&ep->com); } CTR2(KTR_IW_CXGBE, "%s:pccE %p", __func__, ep); return; } static void init_sock(struct c4iw_ep_common *epc) { int rc; struct sockopt sopt; struct socket *so = epc->so; int on = 1; SOCK_LOCK(so); soupcall_set(so, SO_RCV, c4iw_so_upcall, epc); so->so_state |= SS_NBIO; SOCK_UNLOCK(so); sopt.sopt_dir = SOPT_SET; sopt.sopt_level = IPPROTO_TCP; sopt.sopt_name = TCP_NODELAY; sopt.sopt_val = (caddr_t)&on; sopt.sopt_valsize = sizeof on; sopt.sopt_td = NULL; rc = sosetopt(so, &sopt); if (rc) { log(LOG_ERR, "%s: can't set TCP_NODELAY on so %p (%d)\n", __func__, so, rc); } } static void process_data(struct c4iw_ep *ep) { struct sockaddr_in *local, *remote; CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__, ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv)); switch (state_read(&ep->com)) { case MPA_REQ_SENT: process_mpa_reply(ep); break; case MPA_REQ_WAIT: in_getsockaddr(ep->com.so, (struct sockaddr **)&local); in_getpeeraddr(ep->com.so, (struct sockaddr **)&remote); ep->com.local_addr = *local; ep->com.remote_addr = *remote; free(local, M_SONAME); free(remote, M_SONAME); process_mpa_request(ep); break; default: if (sbused(&ep->com.so->so_rcv)) log(LOG_ERR, "%s: Unexpected streaming data. 
ep %p, " "state %d, so %p, so_state 0x%x, sbused %u\n", __func__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state, sbused(&ep->com.so->so_rcv)); break; } } static void process_connected(struct c4iw_ep *ep) { if ((ep->com.so->so_state & SS_ISCONNECTED) && !ep->com.so->so_error) send_mpa_req(ep); else { connect_reply_upcall(ep, -ep->com.so->so_error); close_socket(&ep->com, 0); state_set(&ep->com, DEAD); c4iw_put_ep(&ep->com); } } -static struct socket * -dequeue_socket(struct socket *head, struct sockaddr_in **remote, - struct c4iw_ep *child_ep) +void +process_newconn(struct iw_cm_id *parent_cm_id, struct socket *child_so) { - struct socket *so; - - ACCEPT_LOCK(); - so = TAILQ_FIRST(&head->so_comp); - if (!so) { - ACCEPT_UNLOCK(); - return (NULL); - } - TAILQ_REMOVE(&head->so_comp, so, so_list); - head->so_qlen--; - SOCK_LOCK(so); - so->so_qstate &= ~SQ_COMP; - so->so_head = NULL; - soref(so); - soupcall_set(so, SO_RCV, c4iw_so_upcall, child_ep); - so->so_state |= SS_NBIO; - SOCK_UNLOCK(so); - ACCEPT_UNLOCK(); - soaccept(so, (struct sockaddr **)remote); - - return (so); -} - -static void -process_newconn(struct c4iw_ep *parent_ep) -{ - struct socket *child_so; struct c4iw_ep *child_ep; + struct sockaddr_in *local; struct sockaddr_in *remote; + struct c4iw_ep *parent_ep = parent_cm_id->provider_data; + if (!child_so) { + CTR4(KTR_IW_CXGBE, + "%s: parent so %p, parent ep %p, child so %p, invalid so", + __func__, parent_ep->com.so, parent_ep, child_so); + log(LOG_ERR, "%s: invalid child socket\n", __func__); + return; + } child_ep = alloc_ep(sizeof(*child_ep), M_NOWAIT); if (!child_ep) { CTR3(KTR_IW_CXGBE, "%s: parent so %p, parent ep %p, ENOMEM", __func__, parent_ep->com.so, parent_ep); log(LOG_ERR, "%s: failed to allocate ep entry\n", __func__); return; } + SOCKBUF_LOCK(&child_so->so_rcv); + soupcall_set(child_so, SO_RCV, c4iw_so_upcall, child_ep); + SOCKBUF_UNLOCK(&child_so->so_rcv); - child_so = dequeue_socket(parent_ep->com.so, &remote, child_ep); - if (!child_so) { - CTR4(KTR_IW_CXGBE, - "%s: parent so %p, parent ep %p, child ep %p, dequeue err", - __func__, parent_ep->com.so, parent_ep, child_ep); - log(LOG_ERR, "%s: failed to dequeue child socket\n", __func__); - __free_ep(&child_ep->com); - return; - - } - CTR5(KTR_IW_CXGBE, "%s: parent so %p, parent ep %p, child so %p, child ep %p", __func__, parent_ep->com.so, parent_ep, child_so, child_ep); - child_ep->com.local_addr = parent_ep->com.local_addr; + in_getsockaddr(child_so, (struct sockaddr **)&local); + in_getpeeraddr(child_so, (struct sockaddr **)&remote); + + child_ep->com.local_addr = *local; child_ep->com.remote_addr = *remote; child_ep->com.dev = parent_ep->com.dev; child_ep->com.so = child_so; child_ep->com.cm_id = NULL; child_ep->com.thread = parent_ep->com.thread; child_ep->parent_ep = parent_ep; + free(local, M_SONAME); free(remote, M_SONAME); + c4iw_get_ep(&parent_ep->com); - child_ep->parent_ep = parent_ep; init_timer(&child_ep->timer); state_set(&child_ep->com, MPA_REQ_WAIT); START_EP_TIMER(child_ep); /* maybe the request has already been queued up on the socket... 
*/ process_mpa_request(child_ep); + return; } static int c4iw_so_upcall(struct socket *so, void *arg, int waitflag) { struct c4iw_ep *ep = arg; spin_lock(&req_lock); CTR6(KTR_IW_CXGBE, "%s: so %p, so_state 0x%x, ep %p, ep_state %s, tqe_prev %p", __func__, so, so->so_state, ep, states[ep->com.state], ep->com.entry.tqe_prev); if (ep && ep->com.so && !ep->com.entry.tqe_prev) { KASSERT(ep->com.so == so, ("%s: XXX review.", __func__)); c4iw_get_ep(&ep->com); TAILQ_INSERT_TAIL(&req_list, &ep->com, entry); queue_work(c4iw_taskq, &c4iw_task); } spin_unlock(&req_lock); return (SU_OK); } static void process_socket_event(struct c4iw_ep *ep) { int state = state_read(&ep->com); struct socket *so = ep->com.so; CTR6(KTR_IW_CXGBE, "process_socket_event: so %p, so_state 0x%x, " "so_err %d, sb_state 0x%x, ep %p, ep_state %s", so, so->so_state, so->so_error, so->so_rcv.sb_state, ep, states[state]); if (state == CONNECTING) { process_connected(ep); return; } if (state == LISTEN) { - process_newconn(ep); + /* socket listening events are handled at IWCM */ + CTR3(KTR_IW_CXGBE, "%s Invalid ep state:%u, ep:%p", __func__, + ep->com.state, ep); + BUG(); return; } /* connection error */ if (so->so_error) { process_conn_error(ep); return; } /* peer close */ if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && state < CLOSING) { process_peer_close(ep); return; } /* close complete */ if (so->so_state & SS_ISDISCONNECTED) { process_close_complete(ep); return; } /* rx data */ process_data(ep); } SYSCTL_NODE(_hw, OID_AUTO, iw_cxgbe, CTLFLAG_RD, 0, "iw_cxgbe driver parameters"); int db_delay_usecs = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, db_delay_usecs, CTLFLAG_RWTUN, &db_delay_usecs, 0, "Usecs to delay awaiting db fifo to drain"); static int dack_mode = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, dack_mode, CTLFLAG_RWTUN, &dack_mode, 0, "Delayed ack mode (default = 1)"); int c4iw_max_read_depth = 8; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, c4iw_max_read_depth, CTLFLAG_RWTUN, &c4iw_max_read_depth, 0, "Per-connection max ORD/IRD (default = 8)"); static int enable_tcp_timestamps; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, enable_tcp_timestamps, CTLFLAG_RWTUN, &enable_tcp_timestamps, 0, "Enable tcp timestamps (default = 0)"); static int enable_tcp_sack; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, enable_tcp_sack, CTLFLAG_RWTUN, &enable_tcp_sack, 0, "Enable tcp SACK (default = 0)"); static int enable_tcp_window_scaling = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, enable_tcp_window_scaling, CTLFLAG_RWTUN, &enable_tcp_window_scaling, 0, "Enable tcp window scaling (default = 1)"); int c4iw_debug = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, c4iw_debug, CTLFLAG_RWTUN, &c4iw_debug, 0, "Enable debug logging (default = 0)"); static int peer2peer; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, peer2peer, CTLFLAG_RWTUN, &peer2peer, 0, "Support peer2peer ULPs (default = 0)"); static int p2p_type = FW_RI_INIT_P2PTYPE_READ_REQ; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, p2p_type, CTLFLAG_RWTUN, &p2p_type, 0, "RDMAP opcode to use for the RTR message: 1 = RDMA_READ 0 = RDMA_WRITE (default 1)"); static int ep_timeout_secs = 60; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, ep_timeout_secs, CTLFLAG_RWTUN, &ep_timeout_secs, 0, "CM Endpoint operation timeout in seconds (default = 60)"); static int mpa_rev = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, mpa_rev, CTLFLAG_RWTUN, &mpa_rev, 0, "MPA Revision, 0 supports amso1100, 1 is RFC5044 spec compliant, 2 is IETF MPA Peer Connect Draft compliant (default = 1)"); static int markers_enabled; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, markers_enabled, CTLFLAG_RWTUN, &markers_enabled, 0, 
"Enable MPA MARKERS (default(0) = disabled)"); static int crc_enabled = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, crc_enabled, CTLFLAG_RWTUN, &crc_enabled, 0, "Enable MPA CRC (default(1) = enabled)"); static int rcv_win = 256 * 1024; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, rcv_win, CTLFLAG_RWTUN, &rcv_win, 0, "TCP receive window in bytes (default = 256KB)"); static int snd_win = 128 * 1024; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, snd_win, CTLFLAG_RWTUN, &snd_win, 0, "TCP send window in bytes (default = 128KB)"); int db_fc_threshold = 2000; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, db_fc_threshold, CTLFLAG_RWTUN, &db_fc_threshold, 0, "QP count/threshold that triggers automatic"); static void start_ep_timer(struct c4iw_ep *ep) { if (timer_pending(&ep->timer)) { CTR2(KTR_IW_CXGBE, "%s: ep %p, already started", __func__, ep); printk(KERN_ERR "%s timer already started! ep %p\n", __func__, ep); return; } clear_bit(TIMEOUT, &ep->com.flags); c4iw_get_ep(&ep->com); ep->timer.expires = jiffies + ep_timeout_secs * HZ; ep->timer.data = (unsigned long)ep; ep->timer.function = ep_timeout; add_timer(&ep->timer); } static void stop_ep_timer(struct c4iw_ep *ep) { del_timer_sync(&ep->timer); if (!test_and_set_bit(TIMEOUT, &ep->com.flags)) { c4iw_put_ep(&ep->com); } } static enum c4iw_ep_state state_read(struct c4iw_ep_common *epc) { enum c4iw_ep_state state; mutex_lock(&epc->mutex); state = epc->state; mutex_unlock(&epc->mutex); return (state); } static void __state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state new) { epc->state = new; } static void state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state new) { mutex_lock(&epc->mutex); __state_set(epc, new); mutex_unlock(&epc->mutex); } static void * alloc_ep(int size, gfp_t gfp) { struct c4iw_ep_common *epc; epc = kzalloc(size, gfp); if (epc == NULL) return (NULL); kref_init(&epc->kref); mutex_init(&epc->mutex); c4iw_init_wr_wait(&epc->wr_wait); return (epc); } void __free_ep(struct c4iw_ep_common *epc) { CTR2(KTR_IW_CXGBE, "%s:feB %p", __func__, epc); KASSERT(!epc->so, ("%s warning ep->so %p \n", __func__, epc->so)); KASSERT(!epc->entry.tqe_prev, ("%s epc %p still on req list!\n", __func__, epc)); free(epc, M_DEVBUF); CTR2(KTR_IW_CXGBE, "%s:feE %p", __func__, epc); } void _c4iw_free_ep(struct kref *kref) { struct c4iw_ep *ep; struct c4iw_ep_common *epc; ep = container_of(kref, struct c4iw_ep, com.kref); epc = &ep->com; - KASSERT(!epc->so, ("%s ep->so %p", __func__, epc->so)); KASSERT(!epc->entry.tqe_prev, ("%s epc %p still on req list", __func__, epc)); kfree(ep); } static void release_ep_resources(struct c4iw_ep *ep) { CTR2(KTR_IW_CXGBE, "%s:rerB %p", __func__, ep); set_bit(RELEASE_RESOURCES, &ep->com.flags); c4iw_put_ep(&ep->com); CTR2(KTR_IW_CXGBE, "%s:rerE %p", __func__, ep); } static void send_mpa_req(struct c4iw_ep *ep) { int mpalen; struct mpa_message *mpa; struct mpa_v2_conn_params mpa_v2_params; struct mbuf *m; char mpa_rev_to_use = mpa_rev; int err; if (ep->retry_with_mpa_v1) mpa_rev_to_use = 1; mpalen = sizeof(*mpa) + ep->plen; if (mpa_rev_to_use == 2) mpalen += sizeof(struct mpa_v2_conn_params); mpa = malloc(mpalen, M_CXGBE, M_NOWAIT); if (mpa == NULL) { failed: connect_reply_upcall(ep, -ENOMEM); return; } memset(mpa, 0, mpalen); memcpy(mpa->key, MPA_KEY_REQ, sizeof(mpa->key)); mpa->flags = (crc_enabled ? MPA_CRC : 0) | (markers_enabled ? MPA_MARKERS : 0) | (mpa_rev_to_use == 2 ? 
MPA_ENHANCED_RDMA_CONN : 0); mpa->private_data_size = htons(ep->plen); mpa->revision = mpa_rev_to_use; if (mpa_rev_to_use == 1) { ep->tried_with_mpa_v1 = 1; ep->retry_with_mpa_v1 = 0; } if (mpa_rev_to_use == 2) { mpa->private_data_size += htons(sizeof(struct mpa_v2_conn_params)); mpa_v2_params.ird = htons((u16)ep->ird); mpa_v2_params.ord = htons((u16)ep->ord); if (peer2peer) { mpa_v2_params.ird |= htons(MPA_V2_PEER2PEER_MODEL); if (p2p_type == FW_RI_INIT_P2PTYPE_RDMA_WRITE) { mpa_v2_params.ord |= htons(MPA_V2_RDMA_WRITE_RTR); } else if (p2p_type == FW_RI_INIT_P2PTYPE_READ_REQ) { mpa_v2_params.ord |= htons(MPA_V2_RDMA_READ_RTR); } } memcpy(mpa->private_data, &mpa_v2_params, sizeof(struct mpa_v2_conn_params)); if (ep->plen) { memcpy(mpa->private_data + sizeof(struct mpa_v2_conn_params), ep->mpa_pkt + sizeof(*mpa), ep->plen); } } else { if (ep->plen) memcpy(mpa->private_data, ep->mpa_pkt + sizeof(*mpa), ep->plen); CTR2(KTR_IW_CXGBE, "%s:smr7 %p", __func__, ep); } m = m_getm(NULL, mpalen, M_NOWAIT, MT_DATA); if (m == NULL) { free(mpa, M_CXGBE); goto failed; } m_copyback(m, 0, mpalen, (void *)mpa); free(mpa, M_CXGBE); err = sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, ep->com.thread); if (err) goto failed; START_EP_TIMER(ep); state_set(&ep->com, MPA_REQ_SENT); ep->mpa_attr.initiator = 1; } static int send_mpa_reject(struct c4iw_ep *ep, const void *pdata, u8 plen) { int mpalen ; struct mpa_message *mpa; struct mpa_v2_conn_params mpa_v2_params; struct mbuf *m; int err; CTR4(KTR_IW_CXGBE, "%s:smrejB %p %u %d", __func__, ep, ep->hwtid, ep->plen); mpalen = sizeof(*mpa) + plen; if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) { mpalen += sizeof(struct mpa_v2_conn_params); CTR4(KTR_IW_CXGBE, "%s:smrej1 %p %u %d", __func__, ep, ep->mpa_attr.version, mpalen); } mpa = malloc(mpalen, M_CXGBE, M_NOWAIT); if (mpa == NULL) return (-ENOMEM); memset(mpa, 0, mpalen); memcpy(mpa->key, MPA_KEY_REP, sizeof(mpa->key)); mpa->flags = MPA_REJECT; mpa->revision = mpa_rev; mpa->private_data_size = htons(plen); if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) { mpa->flags |= MPA_ENHANCED_RDMA_CONN; mpa->private_data_size += htons(sizeof(struct mpa_v2_conn_params)); mpa_v2_params.ird = htons(((u16)ep->ird) | (peer2peer ? MPA_V2_PEER2PEER_MODEL : 0)); mpa_v2_params.ord = htons(((u16)ep->ord) | (peer2peer ? (p2p_type == FW_RI_INIT_P2PTYPE_RDMA_WRITE ? MPA_V2_RDMA_WRITE_RTR : p2p_type == FW_RI_INIT_P2PTYPE_READ_REQ ? 
				MPA_V2_RDMA_READ_RTR : 0) : 0));
		memcpy(mpa->private_data, &mpa_v2_params,
			sizeof(struct mpa_v2_conn_params));

		if (ep->plen)
			memcpy(mpa->private_data +
				sizeof(struct mpa_v2_conn_params), pdata, plen);

		CTR5(KTR_IW_CXGBE, "%s:smrej3 %p %d %d %d", __func__, ep,
			mpa_v2_params.ird, mpa_v2_params.ord, ep->plen);
	} else if (plen)
		memcpy(mpa->private_data, pdata, plen);

	m = m_getm(NULL, mpalen, M_NOWAIT, MT_DATA);
	if (m == NULL) {
		free(mpa, M_CXGBE);
		return (-ENOMEM);
	}
	m_copyback(m, 0, mpalen, (void *)mpa);
	free(mpa, M_CXGBE);

	err = -sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT,
		ep->com.thread);
	if (!err)
		ep->snd_seq += mpalen;
	CTR4(KTR_IW_CXGBE, "%s:smrejE %p %u %d", __func__, ep, ep->hwtid, err);
	return err;
}

static int
send_mpa_reply(struct c4iw_ep *ep, const void *pdata, u8 plen)
{
	int mpalen;
	struct mpa_message *mpa;
	struct mbuf *m;
	struct mpa_v2_conn_params mpa_v2_params;
	int err;

	CTR2(KTR_IW_CXGBE, "%s:smrepB %p", __func__, ep);

	mpalen = sizeof(*mpa) + plen;

	if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) {

		CTR3(KTR_IW_CXGBE, "%s:smrep1 %p %d", __func__, ep,
			ep->mpa_attr.version);
		mpalen += sizeof(struct mpa_v2_conn_params);
	}

	mpa = malloc(mpalen, M_CXGBE, M_NOWAIT);
	if (mpa == NULL)
		return (-ENOMEM);

	memset(mpa, 0, sizeof(*mpa));
	memcpy(mpa->key, MPA_KEY_REP, sizeof(mpa->key));
	mpa->flags = (ep->mpa_attr.crc_enabled ? MPA_CRC : 0) |
		(markers_enabled ? MPA_MARKERS : 0);
	mpa->revision = ep->mpa_attr.version;
	mpa->private_data_size = htons(plen);

	if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) {

		mpa->flags |= MPA_ENHANCED_RDMA_CONN;
		mpa->private_data_size +=
			htons(sizeof(struct mpa_v2_conn_params));
		mpa_v2_params.ird = htons((u16)ep->ird);
		mpa_v2_params.ord = htons((u16)ep->ord);
		CTR5(KTR_IW_CXGBE, "%s:smrep3 %p %d %d %d", __func__, ep,
			ep->mpa_attr.version, mpa_v2_params.ird,
			mpa_v2_params.ord);

		if (peer2peer && (ep->mpa_attr.p2p_type !=
			FW_RI_INIT_P2PTYPE_DISABLED)) {

			mpa_v2_params.ird |= htons(MPA_V2_PEER2PEER_MODEL);

			if (p2p_type == FW_RI_INIT_P2PTYPE_RDMA_WRITE) {

				mpa_v2_params.ord |=
					htons(MPA_V2_RDMA_WRITE_RTR);
				CTR5(KTR_IW_CXGBE, "%s:smrep4 %p %d %d %d",
					__func__, ep, p2p_type,
					mpa_v2_params.ird, mpa_v2_params.ord);
			} else if (p2p_type == FW_RI_INIT_P2PTYPE_READ_REQ) {

				mpa_v2_params.ord |=
					htons(MPA_V2_RDMA_READ_RTR);
				CTR5(KTR_IW_CXGBE, "%s:smrep5 %p %d %d %d",
					__func__, ep, p2p_type,
					mpa_v2_params.ird, mpa_v2_params.ord);
			}
		}

		memcpy(mpa->private_data, &mpa_v2_params,
			sizeof(struct mpa_v2_conn_params));

		if (ep->plen)
			memcpy(mpa->private_data +
				sizeof(struct mpa_v2_conn_params), pdata, plen);
	} else if (plen)
		memcpy(mpa->private_data, pdata, plen);

	m = m_getm(NULL, mpalen, M_NOWAIT, MT_DATA);
	if (m == NULL) {
		free(mpa, M_CXGBE);
		return (-ENOMEM);
	}
	m_copyback(m, 0, mpalen, (void *)mpa);
	free(mpa, M_CXGBE);

	state_set(&ep->com, MPA_REP_SENT);
	ep->snd_seq += mpalen;
	err = -sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT,
		ep->com.thread);
	CTR3(KTR_IW_CXGBE, "%s:smrepE %p %d", __func__, ep, err);
	return err;
}

static void
close_complete_upcall(struct c4iw_ep *ep, int status)
{
	struct iw_cm_event event;

	CTR2(KTR_IW_CXGBE, "%s:ccuB %p", __func__, ep);
	memset(&event, 0, sizeof(event));
	event.event = IW_CM_EVENT_CLOSE;
	event.status = status;

	if (ep->com.cm_id) {

		CTR2(KTR_IW_CXGBE, "%s:ccu1 %p", __func__, ep);
		ep->com.cm_id->event_handler(ep->com.cm_id, &event);
		ep->com.cm_id->rem_ref(ep->com.cm_id);
		ep->com.cm_id = NULL;
		ep->com.qp = NULL;
		set_bit(CLOSE_UPCALL, &ep->com.history);
	}
	CTR2(KTR_IW_CXGBE, "%s:ccuE %p", __func__, ep);
}

static int
abort_connection(struct c4iw_ep *ep)
{
	int err;

	CTR2(KTR_IW_CXGBE, "%s:abB %p", __func__, ep);
	state_set(&ep->com, ABORTING);
	abort_socket(ep);
	err = close_socket(&ep->com, 0);
	set_bit(ABORT_CONN, &ep->com.history);
	CTR2(KTR_IW_CXGBE, "%s:abE %p", __func__, ep);
	return err;
}

static void
peer_close_upcall(struct c4iw_ep *ep)
{
	struct iw_cm_event event;

	CTR2(KTR_IW_CXGBE, "%s:pcuB %p", __func__, ep);
	memset(&event, 0, sizeof(event));
	event.event = IW_CM_EVENT_DISCONNECT;

	if (ep->com.cm_id) {

		CTR2(KTR_IW_CXGBE, "%s:pcu1 %p", __func__, ep);
		ep->com.cm_id->event_handler(ep->com.cm_id, &event);
		set_bit(DISCONN_UPCALL, &ep->com.history);
	}
	CTR2(KTR_IW_CXGBE, "%s:pcuE %p", __func__, ep);
}

static void
peer_abort_upcall(struct c4iw_ep *ep)
{
	struct iw_cm_event event;

	CTR2(KTR_IW_CXGBE, "%s:pauB %p", __func__, ep);
	memset(&event, 0, sizeof(event));
	event.event = IW_CM_EVENT_CLOSE;
	event.status = -ECONNRESET;

	if (ep->com.cm_id) {

		CTR2(KTR_IW_CXGBE, "%s:pau1 %p", __func__, ep);
		ep->com.cm_id->event_handler(ep->com.cm_id, &event);
		ep->com.cm_id->rem_ref(ep->com.cm_id);
		ep->com.cm_id = NULL;
		ep->com.qp = NULL;
		set_bit(ABORT_UPCALL, &ep->com.history);
	}
	CTR2(KTR_IW_CXGBE, "%s:pauE %p", __func__, ep);
}

static void
connect_reply_upcall(struct c4iw_ep *ep, int status)
{
	struct iw_cm_event event;

	CTR3(KTR_IW_CXGBE, "%s:cruB %p %d", __func__, ep, status);
	memset(&event, 0, sizeof(event));
	event.event = IW_CM_EVENT_CONNECT_REPLY;
	event.status = (status == -ECONNABORTED) ? -ECONNRESET : status;
	event.local_addr = ep->com.local_addr;
	event.remote_addr = ep->com.remote_addr;

	if ((status == 0) || (status == -ECONNREFUSED)) {

		if (!ep->tried_with_mpa_v1) {

			CTR2(KTR_IW_CXGBE, "%s:cru1 %p", __func__, ep);
			/* this means MPA_v2 is used */
			event.private_data_len = ep->plen -
				sizeof(struct mpa_v2_conn_params);
			event.private_data = ep->mpa_pkt +
				sizeof(struct mpa_message) +
				sizeof(struct mpa_v2_conn_params);
		} else {

			CTR2(KTR_IW_CXGBE, "%s:cru2 %p", __func__, ep);
			/* this means MPA_v1 is used */
			event.private_data_len = ep->plen;
			event.private_data = ep->mpa_pkt +
				sizeof(struct mpa_message);
		}
	}

	if (ep->com.cm_id) {

		CTR2(KTR_IW_CXGBE, "%s:cru3 %p", __func__, ep);
		set_bit(CONN_RPL_UPCALL, &ep->com.history);
		ep->com.cm_id->event_handler(ep->com.cm_id, &event);
	}

	if (status == -ECONNABORTED) {

		CTR3(KTR_IW_CXGBE, "%s:cruE %p %d", __func__, ep, status);
		return;
	}

	if (status < 0) {

		CTR3(KTR_IW_CXGBE, "%s:cru4 %p %d", __func__, ep, status);
		ep->com.cm_id->rem_ref(ep->com.cm_id);
		ep->com.cm_id = NULL;
		ep->com.qp = NULL;
	}
	CTR2(KTR_IW_CXGBE, "%s:cruE %p", __func__, ep);
}

static int
connect_request_upcall(struct c4iw_ep *ep)
{
	struct iw_cm_event event;
	int ret;

	CTR3(KTR_IW_CXGBE, "%s: ep %p, mpa_v1 %d", __func__, ep,
	    ep->tried_with_mpa_v1);

	memset(&event, 0, sizeof(event));
	event.event = IW_CM_EVENT_CONNECT_REQUEST;
	event.local_addr = ep->com.local_addr;
	event.remote_addr = ep->com.remote_addr;
	event.provider_data = ep;
	event.so = ep->com.so;

	if (!ep->tried_with_mpa_v1) {
		/* this means MPA_v2 is used */
		event.ord = ep->ord;
		event.ird = ep->ird;
		event.private_data_len = ep->plen -
			sizeof(struct mpa_v2_conn_params);
		event.private_data = ep->mpa_pkt +
			sizeof(struct mpa_message) +
			sizeof(struct mpa_v2_conn_params);
	} else {
		/* this means MPA_v1 is used.
Send max supported */ event.ord = c4iw_max_read_depth; event.ird = c4iw_max_read_depth; event.private_data_len = ep->plen; event.private_data = ep->mpa_pkt + sizeof(struct mpa_message); } c4iw_get_ep(&ep->com); ret = ep->parent_ep->com.cm_id->event_handler(ep->parent_ep->com.cm_id, &event); if(ret) c4iw_put_ep(&ep->com); set_bit(CONNREQ_UPCALL, &ep->com.history); c4iw_put_ep(&ep->parent_ep->com); return ret; } static void established_upcall(struct c4iw_ep *ep) { struct iw_cm_event event; CTR2(KTR_IW_CXGBE, "%s:euB %p", __func__, ep); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_ESTABLISHED; event.ird = ep->ird; event.ord = ep->ord; if (ep->com.cm_id) { CTR2(KTR_IW_CXGBE, "%s:eu1 %p", __func__, ep); ep->com.cm_id->event_handler(ep->com.cm_id, &event); set_bit(ESTAB_UPCALL, &ep->com.history); } CTR2(KTR_IW_CXGBE, "%s:euE %p", __func__, ep); } static void process_mpa_reply(struct c4iw_ep *ep) { struct mpa_message *mpa; struct mpa_v2_conn_params *mpa_v2_params; u16 plen; u16 resp_ird, resp_ord; u8 rtr_mismatch = 0, insuff_ird = 0; struct c4iw_qp_attributes attrs; enum c4iw_qp_attr_mask mask; int err; struct mbuf *top, *m; int flags = MSG_DONTWAIT; struct uio uio; CTR2(KTR_IW_CXGBE, "%s:pmrB %p", __func__, ep); /* * Stop mpa timer. If it expired, then the state has * changed and we bail since ep_timeout already aborted * the connection. */ STOP_EP_TIMER(ep); if (state_read(&ep->com) != MPA_REQ_SENT) return; uio.uio_resid = 1000000; uio.uio_td = ep->com.thread; err = soreceive(ep->com.so, NULL, &uio, &top, NULL, &flags); if (err) { if (err == EWOULDBLOCK) { CTR2(KTR_IW_CXGBE, "%s:pmr1 %p", __func__, ep); START_EP_TIMER(ep); return; } err = -err; CTR2(KTR_IW_CXGBE, "%s:pmr2 %p", __func__, ep); goto err; } if (ep->com.so->so_rcv.sb_mb) { CTR2(KTR_IW_CXGBE, "%s:pmr3 %p", __func__, ep); printf("%s data after soreceive called! so %p sb_mb %p top %p\n", __func__, ep->com.so, ep->com.so->so_rcv.sb_mb, top); } m = top; do { CTR2(KTR_IW_CXGBE, "%s:pmr4 %p", __func__, ep); /* * If we get more than the supported amount of private data * then we must fail this connection. */ if (ep->mpa_pkt_len + m->m_len > sizeof(ep->mpa_pkt)) { CTR3(KTR_IW_CXGBE, "%s:pmr5 %p %d", __func__, ep, ep->mpa_pkt_len + m->m_len); err = (-EINVAL); goto err; } /* * copy the new data into our accumulation buffer. */ m_copydata(m, 0, m->m_len, &(ep->mpa_pkt[ep->mpa_pkt_len])); ep->mpa_pkt_len += m->m_len; if (!m->m_next) m = m->m_nextpkt; else m = m->m_next; } while (m); m_freem(top); /* * if we don't even have the mpa message, then bail. */ if (ep->mpa_pkt_len < sizeof(*mpa)) return; mpa = (struct mpa_message *) ep->mpa_pkt; /* Validate MPA header. */ if (mpa->revision > mpa_rev) { CTR4(KTR_IW_CXGBE, "%s:pmr6 %p %d %d", __func__, ep, mpa->revision, mpa_rev); printk(KERN_ERR MOD "%s MPA version mismatch. Local = %d, " " Received = %d\n", __func__, mpa_rev, mpa->revision); err = -EPROTO; goto err; } if (memcmp(mpa->key, MPA_KEY_REP, sizeof(mpa->key))) { CTR2(KTR_IW_CXGBE, "%s:pmr7 %p", __func__, ep); err = -EPROTO; goto err; } plen = ntohs(mpa->private_data_size); /* * Fail if there's too much private data. */ if (plen > MPA_MAX_PRIVATE_DATA) { CTR2(KTR_IW_CXGBE, "%s:pmr8 %p", __func__, ep); err = -EPROTO; goto err; } /* * If plen does not account for pkt size */ if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) { CTR2(KTR_IW_CXGBE, "%s:pmr9 %p", __func__, ep); err = -EPROTO; goto err; } ep->plen = (u8) plen; /* * If we don't have all the pdata yet, then bail. * We'll continue process when more data arrives. 
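 * (the bytes read so far remain accumulated in ep->mpa_pkt, so the
 * next receive upcall re-enters this function and resumes parsing)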
*/ if (ep->mpa_pkt_len < (sizeof(*mpa) + plen)) { CTR2(KTR_IW_CXGBE, "%s:pmra %p", __func__, ep); return; } if (mpa->flags & MPA_REJECT) { CTR2(KTR_IW_CXGBE, "%s:pmrb %p", __func__, ep); err = -ECONNREFUSED; goto err; } /* * If we get here we have accumulated the entire mpa * start reply message including private data. And * the MPA header is valid. */ state_set(&ep->com, FPDU_MODE); ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0; ep->mpa_attr.version = mpa->revision; ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_DISABLED; if (mpa->revision == 2) { CTR2(KTR_IW_CXGBE, "%s:pmrc %p", __func__, ep); ep->mpa_attr.enhanced_rdma_conn = mpa->flags & MPA_ENHANCED_RDMA_CONN ? 1 : 0; if (ep->mpa_attr.enhanced_rdma_conn) { CTR2(KTR_IW_CXGBE, "%s:pmrd %p", __func__, ep); mpa_v2_params = (struct mpa_v2_conn_params *) (ep->mpa_pkt + sizeof(*mpa)); resp_ird = ntohs(mpa_v2_params->ird) & MPA_V2_IRD_ORD_MASK; resp_ord = ntohs(mpa_v2_params->ord) & MPA_V2_IRD_ORD_MASK; /* * This is a double-check. Ideally, below checks are * not required since ird/ord stuff has been taken * care of in c4iw_accept_cr */ if ((ep->ird < resp_ord) || (ep->ord > resp_ird)) { CTR2(KTR_IW_CXGBE, "%s:pmre %p", __func__, ep); err = -ENOMEM; ep->ird = resp_ord; ep->ord = resp_ird; insuff_ird = 1; } if (ntohs(mpa_v2_params->ird) & MPA_V2_PEER2PEER_MODEL) { CTR2(KTR_IW_CXGBE, "%s:pmrf %p", __func__, ep); if (ntohs(mpa_v2_params->ord) & MPA_V2_RDMA_WRITE_RTR) { CTR2(KTR_IW_CXGBE, "%s:pmrg %p", __func__, ep); ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_RDMA_WRITE; } else if (ntohs(mpa_v2_params->ord) & MPA_V2_RDMA_READ_RTR) { CTR2(KTR_IW_CXGBE, "%s:pmrh %p", __func__, ep); ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_READ_REQ; } } } } else { CTR2(KTR_IW_CXGBE, "%s:pmri %p", __func__, ep); if (mpa->revision == 1) { CTR2(KTR_IW_CXGBE, "%s:pmrj %p", __func__, ep); if (peer2peer) { CTR2(KTR_IW_CXGBE, "%s:pmrk %p", __func__, ep); ep->mpa_attr.p2p_type = p2p_type; } } } if (set_tcpinfo(ep)) { CTR2(KTR_IW_CXGBE, "%s:pmrl %p", __func__, ep); printf("%s set_tcpinfo error\n", __func__); goto err; } CTR6(KTR_IW_CXGBE, "%s - crc_enabled = %d, recv_marker_enabled = %d, " "xmit_marker_enabled = %d, version = %d p2p_type = %d", __func__, ep->mpa_attr.crc_enabled, ep->mpa_attr.recv_marker_enabled, ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version, ep->mpa_attr.p2p_type); /* * If responder's RTR does not match with that of initiator, assign * FW_RI_INIT_P2PTYPE_DISABLED in mpa attributes so that RTR is not * generated when moving QP to RTS state. 
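 * The RTR type we expect locally comes from the p2p_type sysctl; the
 * peer's is taken from the MPA v2 RTR bits parsed above.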
 * A TERM message will be sent after QP has moved to RTS state
 */
	if ((ep->mpa_attr.version == 2) && peer2peer &&
		(ep->mpa_attr.p2p_type != p2p_type)) {

		CTR2(KTR_IW_CXGBE, "%s:pmrm %p", __func__, ep);
		ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_DISABLED;
		rtr_mismatch = 1;
	}


	//ep->ofld_txq = TOEPCB(ep->com.so)->ofld_txq;
	attrs.mpa_attr = ep->mpa_attr;
	attrs.max_ird = ep->ird;
	attrs.max_ord = ep->ord;
	attrs.llp_stream_handle = ep;
	attrs.next_state = C4IW_QP_STATE_RTS;

	mask = C4IW_QP_ATTR_NEXT_STATE |
		C4IW_QP_ATTR_LLP_STREAM_HANDLE | C4IW_QP_ATTR_MPA_ATTR |
		C4IW_QP_ATTR_MAX_IRD | C4IW_QP_ATTR_MAX_ORD;

	/* bind QP and TID with INIT_WR */
	err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1);

	if (err) {

		CTR2(KTR_IW_CXGBE, "%s:pmrn %p", __func__, ep);
		goto err;
	}

	/*
	 * If responder's RTR requirement did not match with what initiator
	 * supports, generate TERM message
	 */
	if (rtr_mismatch) {

		CTR2(KTR_IW_CXGBE, "%s:pmro %p", __func__, ep);
		printk(KERN_ERR "%s: RTR mismatch, sending TERM\n", __func__);
		attrs.layer_etype = LAYER_MPA | DDP_LLP;
		attrs.ecode = MPA_NOMATCH_RTR;
		attrs.next_state = C4IW_QP_STATE_TERMINATE;
		err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
			C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
		err = -ENOMEM;
		goto out;
	}

	/*
	 * Generate TERM if initiator IRD is not sufficient for responder
	 * provided ORD. Currently, we do the same behaviour even when
	 * responder provided IRD is also not sufficient as regards to
	 * initiator ORD.
	 */
	if (insuff_ird) {

		CTR2(KTR_IW_CXGBE, "%s:pmrp %p", __func__, ep);
		printk(KERN_ERR "%s: Insufficient IRD, sending TERM\n",
				__func__);
		attrs.layer_etype = LAYER_MPA | DDP_LLP;
		attrs.ecode = MPA_INSUFF_IRD;
		attrs.next_state = C4IW_QP_STATE_TERMINATE;
		err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
				C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
		err = -ENOMEM;
		goto out;
	}
	goto out;
err:
	state_set(&ep->com, ABORTING);
	abort_connection(ep);
out:
	connect_reply_upcall(ep, err);
	CTR2(KTR_IW_CXGBE, "%s:pmrE %p", __func__, ep);
	return;
}

static void
process_mpa_request(struct c4iw_ep *ep)
{
	struct mpa_message *mpa;
	u16 plen;
	int flags = MSG_DONTWAIT;
	int rc;
	struct iovec iov;
	struct uio uio;
	enum c4iw_ep_state state = state_read(&ep->com);

	CTR3(KTR_IW_CXGBE, "%s: ep %p, state %s", __func__, ep, states[state]);

	if (state != MPA_REQ_WAIT)
		return;

	iov.iov_base = &ep->mpa_pkt[ep->mpa_pkt_len];
	iov.iov_len = sizeof(ep->mpa_pkt) - ep->mpa_pkt_len;
	uio.uio_iov = &iov;
	uio.uio_iovcnt = 1;
	uio.uio_offset = 0;
	uio.uio_resid = sizeof(ep->mpa_pkt) - ep->mpa_pkt_len;
	uio.uio_segflg = UIO_SYSSPACE;
	uio.uio_rw = UIO_READ;
	uio.uio_td = NULL; /* uio.uio_td = ep->com.thread; */

	rc = soreceive(ep->com.so, NULL, &uio, NULL, NULL, &flags);
	if (rc == EAGAIN)
		return;
	else if (rc) {
abort:
		STOP_EP_TIMER(ep);
		abort_connection(ep);
		return;
	}
	KASSERT(uio.uio_offset > 0, ("%s: soreceive on so %p read no data",
	    __func__, ep->com.so));

	ep->mpa_pkt_len += uio.uio_offset;

	/*
	 * If we get more than the supported amount of private data then we must
	 * fail this connection.  XXX: check so_rcv->sb_cc, or peek with another
	 * soreceive, or increase the size of mpa_pkt by 1 and abort if the last
	 * byte is filled by the soreceive above.
	 */

	/* Don't even have the MPA message.  Wait for more data to arrive. */
	if (ep->mpa_pkt_len < sizeof(*mpa))
		return;
	mpa = (struct mpa_message *) ep->mpa_pkt;

	/*
	 * Validate MPA Header.
	 */
	if (mpa->revision > mpa_rev) {
		log(LOG_ERR, "%s: MPA version mismatch. Local = %d,"
Local = %d," " Received = %d\n", __func__, mpa_rev, mpa->revision); goto abort; } if (memcmp(mpa->key, MPA_KEY_REQ, sizeof(mpa->key))) goto abort; /* * Fail if there's too much private data. */ plen = ntohs(mpa->private_data_size); if (plen > MPA_MAX_PRIVATE_DATA) goto abort; /* * If plen does not account for pkt size */ if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) goto abort; ep->plen = (u8) plen; /* * If we don't have all the pdata yet, then bail. */ if (ep->mpa_pkt_len < (sizeof(*mpa) + plen)) return; /* * If we get here we have accumulated the entire mpa * start reply message including private data. */ ep->mpa_attr.initiator = 0; ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0; ep->mpa_attr.version = mpa->revision; if (mpa->revision == 1) ep->tried_with_mpa_v1 = 1; ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_DISABLED; if (mpa->revision == 2) { ep->mpa_attr.enhanced_rdma_conn = mpa->flags & MPA_ENHANCED_RDMA_CONN ? 1 : 0; if (ep->mpa_attr.enhanced_rdma_conn) { struct mpa_v2_conn_params *mpa_v2_params; u16 ird, ord; mpa_v2_params = (void *)&ep->mpa_pkt[sizeof(*mpa)]; ird = ntohs(mpa_v2_params->ird); ord = ntohs(mpa_v2_params->ord); ep->ird = ird & MPA_V2_IRD_ORD_MASK; ep->ord = ord & MPA_V2_IRD_ORD_MASK; if (ird & MPA_V2_PEER2PEER_MODEL && peer2peer) { if (ord & MPA_V2_RDMA_WRITE_RTR) { ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_RDMA_WRITE; } else if (ord & MPA_V2_RDMA_READ_RTR) { ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_READ_REQ; } } } } else if (mpa->revision == 1 && peer2peer) ep->mpa_attr.p2p_type = p2p_type; if (set_tcpinfo(ep)) goto abort; CTR5(KTR_IW_CXGBE, "%s: crc_enabled = %d, recv_marker_enabled = %d, " "xmit_marker_enabled = %d, version = %d", __func__, ep->mpa_attr.crc_enabled, ep->mpa_attr.recv_marker_enabled, ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version); state_set(&ep->com, MPA_REQ_RCVD); STOP_EP_TIMER(ep); /* drive upcall */ mutex_lock(&ep->parent_ep->com.mutex); if (ep->parent_ep->com.state != DEAD) { if(connect_request_upcall(ep)) { abort_connection(ep); } }else abort_connection(ep); mutex_unlock(&ep->parent_ep->com.mutex); } /* * Upcall from the adapter indicating data has been transmitted. * For us its just the single MPA request or reply. We can now free * the skb holding the mpa message. 
*/ int c4iw_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len) { int err; struct c4iw_ep *ep = to_ep(cm_id); CTR2(KTR_IW_CXGBE, "%s:crcB %p", __func__, ep); if (state_read(&ep->com) == DEAD) { CTR2(KTR_IW_CXGBE, "%s:crc1 %p", __func__, ep); c4iw_put_ep(&ep->com); return -ECONNRESET; } set_bit(ULP_REJECT, &ep->com.history); BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD); if (mpa_rev == 0) { CTR2(KTR_IW_CXGBE, "%s:crc2 %p", __func__, ep); abort_connection(ep); } else { CTR2(KTR_IW_CXGBE, "%s:crc3 %p", __func__, ep); err = send_mpa_reject(ep, pdata, pdata_len); err = soshutdown(ep->com.so, 3); } c4iw_put_ep(&ep->com); CTR2(KTR_IW_CXGBE, "%s:crc4 %p", __func__, ep); return 0; } int c4iw_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) { int err; struct c4iw_qp_attributes attrs; enum c4iw_qp_attr_mask mask; struct c4iw_ep *ep = to_ep(cm_id); struct c4iw_dev *h = to_c4iw_dev(cm_id->device); struct c4iw_qp *qp = get_qhp(h, conn_param->qpn); CTR2(KTR_IW_CXGBE, "%s:cacB %p", __func__, ep); if (state_read(&ep->com) == DEAD) { CTR2(KTR_IW_CXGBE, "%s:cac1 %p", __func__, ep); err = -ECONNRESET; goto err; } BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD); BUG_ON(!qp); set_bit(ULP_ACCEPT, &ep->com.history); if ((conn_param->ord > c4iw_max_read_depth) || (conn_param->ird > c4iw_max_read_depth)) { CTR2(KTR_IW_CXGBE, "%s:cac2 %p", __func__, ep); abort_connection(ep); err = -EINVAL; goto err; } if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) { CTR2(KTR_IW_CXGBE, "%s:cac3 %p", __func__, ep); if (conn_param->ord > ep->ird) { CTR2(KTR_IW_CXGBE, "%s:cac4 %p", __func__, ep); ep->ird = conn_param->ird; ep->ord = conn_param->ord; send_mpa_reject(ep, conn_param->private_data, conn_param->private_data_len); abort_connection(ep); err = -ENOMEM; goto err; } if (conn_param->ird > ep->ord) { CTR2(KTR_IW_CXGBE, "%s:cac5 %p", __func__, ep); if (!ep->ord) { CTR2(KTR_IW_CXGBE, "%s:cac6 %p", __func__, ep); conn_param->ird = 1; } else { CTR2(KTR_IW_CXGBE, "%s:cac7 %p", __func__, ep); abort_connection(ep); err = -ENOMEM; goto err; } } } ep->ird = conn_param->ird; ep->ord = conn_param->ord; if (ep->mpa_attr.version != 2) { CTR2(KTR_IW_CXGBE, "%s:cac8 %p", __func__, ep); if (peer2peer && ep->ird == 0) { CTR2(KTR_IW_CXGBE, "%s:cac9 %p", __func__, ep); ep->ird = 1; } } cm_id->add_ref(cm_id); ep->com.cm_id = cm_id; ep->com.qp = qp; //ep->ofld_txq = TOEPCB(ep->com.so)->ofld_txq; /* bind QP to EP and move to RTS */ attrs.mpa_attr = ep->mpa_attr; attrs.max_ird = ep->ird; attrs.max_ord = ep->ord; attrs.llp_stream_handle = ep; attrs.next_state = C4IW_QP_STATE_RTS; /* bind QP and TID with INIT_WR */ mask = C4IW_QP_ATTR_NEXT_STATE | C4IW_QP_ATTR_LLP_STREAM_HANDLE | C4IW_QP_ATTR_MPA_ATTR | C4IW_QP_ATTR_MAX_IRD | C4IW_QP_ATTR_MAX_ORD; err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1); if (err) { CTR2(KTR_IW_CXGBE, "%s:caca %p", __func__, ep); goto err1; } err = send_mpa_reply(ep, conn_param->private_data, conn_param->private_data_len); if (err) { CTR2(KTR_IW_CXGBE, "%s:caca %p", __func__, ep); goto err1; } state_set(&ep->com, FPDU_MODE); established_upcall(ep); c4iw_put_ep(&ep->com); CTR2(KTR_IW_CXGBE, "%s:cacE %p", __func__, ep); return 0; err1: ep->com.cm_id = NULL; ep->com.qp = NULL; cm_id->rem_ref(cm_id); err: c4iw_put_ep(&ep->com); CTR2(KTR_IW_CXGBE, "%s:cacE err %p", __func__, ep); return err; } int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) { int err = 0; struct c4iw_dev *dev = to_c4iw_dev(cm_id->device); struct c4iw_ep *ep = NULL; 
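	/* nh4 is filled in by find_route() below and released with
	 * fib4_free_nh_ext() on both the success and failure paths. */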
struct nhop4_extended nh4; struct toedev *tdev; CTR2(KTR_IW_CXGBE, "%s:ccB %p", __func__, cm_id); if ((conn_param->ord > c4iw_max_read_depth) || (conn_param->ird > c4iw_max_read_depth)) { CTR2(KTR_IW_CXGBE, "%s:cc1 %p", __func__, cm_id); err = -EINVAL; goto out; } ep = alloc_ep(sizeof(*ep), M_NOWAIT); if (!ep) { CTR2(KTR_IW_CXGBE, "%s:cc2 %p", __func__, cm_id); printk(KERN_ERR MOD "%s - cannot alloc ep.\n", __func__); err = -ENOMEM; goto out; } init_timer(&ep->timer); ep->plen = conn_param->private_data_len; if (ep->plen) { CTR2(KTR_IW_CXGBE, "%s:cc3 %p", __func__, ep); memcpy(ep->mpa_pkt + sizeof(struct mpa_message), conn_param->private_data, ep->plen); } ep->ird = conn_param->ird; ep->ord = conn_param->ord; if (peer2peer && ep->ord == 0) { CTR2(KTR_IW_CXGBE, "%s:cc4 %p", __func__, ep); ep->ord = 1; } cm_id->add_ref(cm_id); ep->com.dev = dev; ep->com.cm_id = cm_id; ep->com.qp = get_qhp(dev, conn_param->qpn); if (!ep->com.qp) { CTR2(KTR_IW_CXGBE, "%s:cc5 %p", __func__, ep); err = -EINVAL; goto fail2; } ep->com.thread = curthread; ep->com.so = cm_id->so; init_sock(&ep->com); /* find a route */ err = find_route( cm_id->local_addr.sin_addr.s_addr, cm_id->remote_addr.sin_addr.s_addr, cm_id->local_addr.sin_port, cm_id->remote_addr.sin_port, 0, &nh4); if (err) { CTR2(KTR_IW_CXGBE, "%s:cc7 %p", __func__, ep); printk(KERN_ERR MOD "%s - cannot find route.\n", __func__); err = -EHOSTUNREACH; goto fail2; } if (!(nh4.nh_ifp->if_capenable & IFCAP_TOE)) { CTR2(KTR_IW_CXGBE, "%s:cc8 %p", __func__, ep); printf("%s - interface not TOE capable.\n", __func__); close_socket(&ep->com, 0); err = -ENOPROTOOPT; goto fail3; } tdev = TOEDEV(nh4.nh_ifp); if (tdev == NULL) { CTR2(KTR_IW_CXGBE, "%s:cc9 %p", __func__, ep); printf("%s - No toedev for interface.\n", __func__); goto fail3; } fib4_free_nh_ext(RT_DEFAULT_FIB, &nh4); state_set(&ep->com, CONNECTING); ep->tos = 0; ep->com.local_addr = cm_id->local_addr; ep->com.remote_addr = cm_id->remote_addr; err = soconnect(ep->com.so, (struct sockaddr *)&ep->com.remote_addr, ep->com.thread); if (!err) { CTR2(KTR_IW_CXGBE, "%s:cca %p", __func__, ep); goto out; } else { close_socket(&ep->com, 0); goto fail2; } fail3: CTR2(KTR_IW_CXGBE, "%s:ccb %p", __func__, ep); fib4_free_nh_ext(RT_DEFAULT_FIB, &nh4); fail2: cm_id->rem_ref(cm_id); c4iw_put_ep(&ep->com); out: CTR2(KTR_IW_CXGBE, "%s:ccE %p", __func__, ep); return err; } /* - * iwcm->create_listen. Returns -errno on failure. + * iwcm->create_listen_ep. Returns -errno on failure. 
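+ * Listening itself (solisten() and the accept upcall) is now driven by
+ * the iw_cm layer, which hands accepted sockets to process_newconn();
+ * this function only records the listen endpoint state.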
*/ int -c4iw_create_listen(struct iw_cm_id *cm_id, int backlog) +c4iw_create_listen_ep(struct iw_cm_id *cm_id, int backlog) { int rc; struct c4iw_dev *dev = to_c4iw_dev(cm_id->device); struct c4iw_listen_ep *ep; struct socket *so = cm_id->so; ep = alloc_ep(sizeof(*ep), GFP_KERNEL); CTR5(KTR_IW_CXGBE, "%s: cm_id %p, lso %p, ep %p, inp %p", __func__, cm_id, so, ep, so->so_pcb); if (ep == NULL) { log(LOG_ERR, "%s: failed to alloc memory for endpoint\n", __func__); rc = ENOMEM; goto failed; } cm_id->add_ref(cm_id); ep->com.cm_id = cm_id; ep->com.dev = dev; ep->backlog = backlog; ep->com.local_addr = cm_id->local_addr; ep->com.thread = curthread; state_set(&ep->com, LISTEN); ep->com.so = so; - init_sock(&ep->com); - rc = solisten(so, ep->backlog, ep->com.thread); - if (rc != 0) { - log(LOG_ERR, "%s: failed to start listener: %d\n", __func__, - rc); - close_socket(&ep->com, 0); - cm_id->rem_ref(cm_id); - c4iw_put_ep(&ep->com); - goto failed; - } - cm_id->provider_data = ep; return (0); failed: CTR3(KTR_IW_CXGBE, "%s: cm_id %p, FAILED (%d)", __func__, cm_id, rc); return (-rc); } -int -c4iw_destroy_listen(struct iw_cm_id *cm_id) +void +c4iw_destroy_listen_ep(struct iw_cm_id *cm_id) { - int rc; struct c4iw_listen_ep *ep = to_listen_ep(cm_id); - CTR4(KTR_IW_CXGBE, "%s: cm_id %p, so %p, inp %p", __func__, cm_id, - cm_id->so, cm_id->so->so_pcb); + CTR4(KTR_IW_CXGBE, "%s: cm_id %p, so %p, state %s", __func__, cm_id, + cm_id->so, states[ep->com.state]); state_set(&ep->com, DEAD); - rc = close_socket(&ep->com, 0); cm_id->rem_ref(cm_id); c4iw_put_ep(&ep->com); - return (rc); + return; } int c4iw_ep_disconnect(struct c4iw_ep *ep, int abrupt, gfp_t gfp) { int ret = 0; int close = 0; int fatal = 0; struct c4iw_rdev *rdev; mutex_lock(&ep->com.mutex); CTR2(KTR_IW_CXGBE, "%s:cedB %p", __func__, ep); rdev = &ep->com.dev->rdev; if (c4iw_fatal_error(rdev)) { CTR2(KTR_IW_CXGBE, "%s:ced1 %p", __func__, ep); fatal = 1; close_complete_upcall(ep, -ECONNRESET); ep->com.state = DEAD; } CTR3(KTR_IW_CXGBE, "%s:ced2 %p %s", __func__, ep, states[ep->com.state]); switch (ep->com.state) { case MPA_REQ_WAIT: case MPA_REQ_SENT: case MPA_REQ_RCVD: case MPA_REP_SENT: case FPDU_MODE: close = 1; if (abrupt) ep->com.state = ABORTING; else { ep->com.state = CLOSING; START_EP_TIMER(ep); } set_bit(CLOSE_SENT, &ep->com.flags); break; case CLOSING: if (!test_and_set_bit(CLOSE_SENT, &ep->com.flags)) { close = 1; if (abrupt) { STOP_EP_TIMER(ep); ep->com.state = ABORTING; } else ep->com.state = MORIBUND; } break; case MORIBUND: case ABORTING: case DEAD: CTR3(KTR_IW_CXGBE, "%s ignoring disconnect ep %p state %u", __func__, ep, ep->com.state); break; default: BUG(); break; } mutex_unlock(&ep->com.mutex); if (close) { CTR2(KTR_IW_CXGBE, "%s:ced3 %p", __func__, ep); if (abrupt) { CTR2(KTR_IW_CXGBE, "%s:ced4 %p", __func__, ep); set_bit(EP_DISC_ABORT, &ep->com.history); ret = abort_connection(ep); } else { CTR2(KTR_IW_CXGBE, "%s:ced5 %p", __func__, ep); set_bit(EP_DISC_CLOSE, &ep->com.history); if (!ep->parent_ep) __state_set(&ep->com, MORIBUND); ret = shutdown_socket(&ep->com); } if (ret) { fatal = 1; } } if (fatal) { release_ep_resources(ep); CTR2(KTR_IW_CXGBE, "%s:ced6 %p", __func__, ep); } CTR2(KTR_IW_CXGBE, "%s:cedE %p", __func__, ep); return ret; } #ifdef C4IW_EP_REDIRECT int c4iw_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new, struct l2t_entry *l2t) { struct c4iw_ep *ep = ctx; if (ep->dst != old) return 0; PDBG("%s ep %p redirect to dst %p l2t %p\n", __func__, ep, new, l2t); dst_hold(new); cxgb4_l2t_release(ep->l2t); 
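	/* Adopt the new l2t entry and dst; the reference on the old dst is
	 * dropped below. */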
	ep->l2t = l2t;
	dst_release(old);
	ep->dst = new;
	return 1;
}
#endif

static void
ep_timeout(unsigned long arg)
{
	struct c4iw_ep *ep = (struct c4iw_ep *)arg;
	int kickit = 0;

	CTR2(KTR_IW_CXGBE, "%s:etB %p", __func__, ep);
	spin_lock(&timeout_lock);

	if (!test_and_set_bit(TIMEOUT, &ep->com.flags)) {

		list_add_tail(&ep->entry, &timeout_list);
		kickit = 1;
	}
	spin_unlock(&timeout_lock);

	if (kickit) {

		CTR2(KTR_IW_CXGBE, "%s:et1 %p", __func__, ep);
		queue_work(c4iw_taskq, &c4iw_task);
	}
	CTR2(KTR_IW_CXGBE, "%s:etE %p", __func__, ep);
}

static int
fw6_wr_rpl(struct adapter *sc, const __be64 *rpl)
{
	uint64_t val = be64toh(*rpl);
	int ret;
	struct c4iw_wr_wait *wr_waitp;

	ret = (int)((val >> 8) & 0xff);
	wr_waitp = (struct c4iw_wr_wait *)rpl[1];
	CTR3(KTR_IW_CXGBE, "%s wr_waitp %p ret %u", __func__, wr_waitp, ret);

	if (wr_waitp)
		c4iw_wake_up(wr_waitp, ret ? -ret : 0);

	return (0);
}

static int
fw6_cqe_handler(struct adapter *sc, const __be64 *rpl)
{
	struct t4_cqe cqe = *(const struct t4_cqe *)(&rpl[0]);

	CTR2(KTR_IW_CXGBE, "%s rpl %p", __func__, rpl);
	c4iw_ev_dispatch(sc->iwarp_softc, &cqe);

	return (0);
}

static int
terminate(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
{
	struct adapter *sc = iq->adapter;
	const struct cpl_rdma_terminate *cpl = mtod(m, const void *);
	unsigned int tid = GET_TID(cpl);
	struct c4iw_qp_attributes attrs;
	struct toepcb *toep = lookup_tid(sc, tid);
	struct socket *so;
	struct c4iw_ep *ep;

	INP_WLOCK(toep->inp);
	so = inp_inpcbtosocket(toep->inp);
	ep = so->so_rcv.sb_upcallarg;
	INP_WUNLOCK(toep->inp);

	CTR2(KTR_IW_CXGBE, "%s:tB %p", __func__, ep);

	if (ep && ep->com.qp) {

		printk(KERN_WARNING MOD "TERM received tid %u qpid %u\n", tid,
				ep->com.qp->wq.sq.qid);

		attrs.next_state = C4IW_QP_STATE_TERMINATE;
		c4iw_modify_qp(ep->com.dev, ep->com.qp,
				C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
	} else
		printk(KERN_WARNING MOD "TERM received tid %u no ep/qp\n", tid);
	CTR2(KTR_IW_CXGBE, "%s:tE %p", __func__, ep);

	return 0;
}

void
c4iw_cm_init_cpl(struct adapter *sc)
{

	t4_register_cpl_handler(sc, CPL_RDMA_TERMINATE, terminate);
	t4_register_fw_msg_handler(sc, FW6_TYPE_WR_RPL, fw6_wr_rpl);
	t4_register_fw_msg_handler(sc, FW6_TYPE_CQE, fw6_cqe_handler);
	t4_register_an_handler(sc, c4iw_ev_handler);
}

void
c4iw_cm_term_cpl(struct adapter *sc)
{

	t4_register_cpl_handler(sc, CPL_RDMA_TERMINATE, NULL);
	t4_register_fw_msg_handler(sc, FW6_TYPE_WR_RPL, NULL);
	t4_register_fw_msg_handler(sc, FW6_TYPE_CQE, NULL);
}

int __init c4iw_cm_init(void)
{

	TAILQ_INIT(&req_list);
	spin_lock_init(&req_lock);
	INIT_LIST_HEAD(&timeout_list);
	spin_lock_init(&timeout_lock);

	INIT_WORK(&c4iw_task, process_req);

	c4iw_taskq = create_singlethread_workqueue("iw_cxgbe");
	if (!c4iw_taskq)
		return -ENOMEM;

	return 0;
}

void __exit c4iw_cm_term(void)
{

	WARN_ON(!TAILQ_EMPTY(&req_list));
	WARN_ON(!list_empty(&timeout_list));
	flush_workqueue(c4iw_taskq);
	destroy_workqueue(c4iw_taskq);
}
#endif
Index: projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/iw_cxgbe.h
===================================================================
--- projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/iw_cxgbe.h (revision 294776)
+++ projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/iw_cxgbe.h (revision 294777)
@@ -1,1042 +1,1044 @@
 /*
- * Copyright (c) 2009-2013 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2009-2013, 2016 Chelsio, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.
You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. * * $FreeBSD$ */ #ifndef __IW_CXGB4_H__ #define __IW_CXGB4_H__ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #undef prefetch #include "common/common.h" #include "common/t4_msg.h" #include "common/t4_regs.h" #include "common/t4_tcb.h" #include "t4_l2t.h" #define DRV_NAME "iw_cxgbe" #define MOD DRV_NAME ":" #define KTR_IW_CXGBE KTR_SPARE3 extern int c4iw_debug; #define PDBG(fmt, args...) \ do { \ if (c4iw_debug) \ printf(MOD fmt, ## args); \ } while (0) #include "t4.h" static inline void *cplhdr(struct mbuf *m) { return mtod(m, void*); } #define PBL_OFF(rdev_p, a) ((a) - (rdev_p)->adap->vres.pbl.start) #define RQT_OFF(rdev_p, a) ((a) - (rdev_p)->adap->vres.rq.start) #define C4IW_ID_TABLE_F_RANDOM 1 /* Pseudo-randomize the id's returned */ #define C4IW_ID_TABLE_F_EMPTY 2 /* Table is initially empty */ struct c4iw_id_table { u32 flags; u32 start; /* logical minimal id */ u32 last; /* hint for find */ u32 max; spinlock_t lock; unsigned long *table; }; struct c4iw_resource { struct c4iw_id_table tpt_table; struct c4iw_id_table qid_table; struct c4iw_id_table pdid_table; }; struct c4iw_qid_list { struct list_head entry; u32 qid; }; struct c4iw_dev_ucontext { struct list_head qpids; struct list_head cqids; struct mutex lock; }; enum c4iw_rdev_flags { T4_FATAL_ERROR = (1<<0), }; struct c4iw_stat { u64 total; u64 cur; u64 max; u64 fail; }; struct c4iw_stats { struct mutex lock; struct c4iw_stat qid; struct c4iw_stat pd; struct c4iw_stat stag; struct c4iw_stat pbl; struct c4iw_stat rqt; u64 db_full; u64 db_empty; u64 db_drop; u64 db_state_transitions; }; struct c4iw_rdev { struct adapter *adap; struct c4iw_resource resource; unsigned long qpshift; u32 qpmask; unsigned long cqshift; u32 cqmask; struct c4iw_dev_ucontext uctx; struct gen_pool *pbl_pool; struct gen_pool *rqt_pool; u32 flags; struct c4iw_stats stats; }; static inline int c4iw_fatal_error(struct c4iw_rdev *rdev) { return rdev->flags & T4_FATAL_ERROR; } static inline int c4iw_num_stags(struct c4iw_rdev *rdev) { return min((int)T4_MAX_NUM_STAG, (int)(rdev->adap->vres.stag.size >> 5)); } #define C4IW_WR_TO (10*HZ) struct c4iw_wr_wait { int ret; atomic_t completion; }; static inline void c4iw_init_wr_wait(struct c4iw_wr_wait *wr_waitp) { wr_waitp->ret = 0; 
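	/* c4iw_wake_up() sets completion to 1 and wakeup()s any thread
	 * sleeping on wr_waitp in c4iw_wait_for_reply(). */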
atomic_set(&wr_waitp->completion, 0); } static inline void c4iw_wake_up(struct c4iw_wr_wait *wr_waitp, int ret) { wr_waitp->ret = ret; atomic_set(&wr_waitp->completion, 1); wakeup(wr_waitp); } static inline int c4iw_wait_for_reply(struct c4iw_rdev *rdev, struct c4iw_wr_wait *wr_waitp, u32 hwtid, u32 qpid, const char *func) { struct adapter *sc = rdev->adap; unsigned to = C4IW_WR_TO; while (!atomic_read(&wr_waitp->completion)) { tsleep(wr_waitp, 0, "c4iw_wait", to); if (SIGPENDING(curthread)) { printf("%s - Device %s not responding - " "tid %u qpid %u\n", func, device_get_nameunit(sc->dev), hwtid, qpid); if (c4iw_fatal_error(rdev)) { wr_waitp->ret = -EIO; break; } to = to << 2; } } if (wr_waitp->ret) CTR4(KTR_IW_CXGBE, "%s: FW reply %d tid %u qpid %u", device_get_nameunit(sc->dev), wr_waitp->ret, hwtid, qpid); return (wr_waitp->ret); } enum db_state { NORMAL = 0, FLOW_CONTROL = 1, RECOVERY = 2 }; struct c4iw_dev { struct ib_device ibdev; struct c4iw_rdev rdev; u32 device_cap_flags; struct idr cqidr; struct idr qpidr; struct idr mmidr; spinlock_t lock; struct dentry *debugfs_root; enum db_state db_state; int qpcnt; }; static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev) { return container_of(ibdev, struct c4iw_dev, ibdev); } static inline struct c4iw_dev *rdev_to_c4iw_dev(struct c4iw_rdev *rdev) { return container_of(rdev, struct c4iw_dev, rdev); } static inline struct c4iw_cq *get_chp(struct c4iw_dev *rhp, u32 cqid) { return idr_find(&rhp->cqidr, cqid); } static inline struct c4iw_qp *get_qhp(struct c4iw_dev *rhp, u32 qpid) { return idr_find(&rhp->qpidr, qpid); } static inline struct c4iw_mr *get_mhp(struct c4iw_dev *rhp, u32 mmid) { return idr_find(&rhp->mmidr, mmid); } static inline int _insert_handle(struct c4iw_dev *rhp, struct idr *idr, void *handle, u32 id, int lock) { int ret; int newid; do { if (!idr_pre_get(idr, lock ? 
GFP_KERNEL : GFP_ATOMIC)) return -ENOMEM; if (lock) spin_lock_irq(&rhp->lock); ret = idr_get_new_above(idr, handle, id, &newid); BUG_ON(!ret && newid != id); if (lock) spin_unlock_irq(&rhp->lock); } while (ret == -EAGAIN); return ret; } static inline int insert_handle(struct c4iw_dev *rhp, struct idr *idr, void *handle, u32 id) { return _insert_handle(rhp, idr, handle, id, 1); } static inline int insert_handle_nolock(struct c4iw_dev *rhp, struct idr *idr, void *handle, u32 id) { return _insert_handle(rhp, idr, handle, id, 0); } static inline void _remove_handle(struct c4iw_dev *rhp, struct idr *idr, u32 id, int lock) { if (lock) spin_lock_irq(&rhp->lock); idr_remove(idr, id); if (lock) spin_unlock_irq(&rhp->lock); } static inline void remove_handle(struct c4iw_dev *rhp, struct idr *idr, u32 id) { _remove_handle(rhp, idr, id, 1); } static inline void remove_handle_nolock(struct c4iw_dev *rhp, struct idr *idr, u32 id) { _remove_handle(rhp, idr, id, 0); } struct c4iw_pd { struct ib_pd ibpd; u32 pdid; struct c4iw_dev *rhp; }; static inline struct c4iw_pd *to_c4iw_pd(struct ib_pd *ibpd) { return container_of(ibpd, struct c4iw_pd, ibpd); } struct tpt_attributes { u64 len; u64 va_fbo; enum fw_ri_mem_perms perms; u32 stag; u32 pdid; u32 qpid; u32 pbl_addr; u32 pbl_size; u32 state:1; u32 type:2; u32 rsvd:1; u32 remote_invaliate_disable:1; u32 zbva:1; u32 mw_bind_enable:1; u32 page_size:5; }; struct c4iw_mr { struct ib_mr ibmr; struct ib_umem *umem; struct c4iw_dev *rhp; u64 kva; struct tpt_attributes attr; }; static inline struct c4iw_mr *to_c4iw_mr(struct ib_mr *ibmr) { return container_of(ibmr, struct c4iw_mr, ibmr); } struct c4iw_mw { struct ib_mw ibmw; struct c4iw_dev *rhp; u64 kva; struct tpt_attributes attr; }; static inline struct c4iw_mw *to_c4iw_mw(struct ib_mw *ibmw) { return container_of(ibmw, struct c4iw_mw, ibmw); } struct c4iw_fr_page_list { struct ib_fast_reg_page_list ibpl; DECLARE_PCI_UNMAP_ADDR(mapping); dma_addr_t dma_addr; struct c4iw_dev *dev; int size; }; static inline struct c4iw_fr_page_list *to_c4iw_fr_page_list( struct ib_fast_reg_page_list *ibpl) { return container_of(ibpl, struct c4iw_fr_page_list, ibpl); } struct c4iw_cq { struct ib_cq ibcq; struct c4iw_dev *rhp; struct t4_cq cq; spinlock_t lock; spinlock_t comp_handler_lock; atomic_t refcnt; wait_queue_head_t wait; }; static inline struct c4iw_cq *to_c4iw_cq(struct ib_cq *ibcq) { return container_of(ibcq, struct c4iw_cq, ibcq); } struct c4iw_mpa_attributes { u8 initiator; u8 recv_marker_enabled; u8 xmit_marker_enabled; u8 crc_enabled; u8 enhanced_rdma_conn; u8 version; u8 p2p_type; }; struct c4iw_qp_attributes { u32 scq; u32 rcq; u32 sq_num_entries; u32 rq_num_entries; u32 sq_max_sges; u32 sq_max_sges_rdma_write; u32 rq_max_sges; u32 state; u8 enable_rdma_read; u8 enable_rdma_write; u8 enable_bind; u8 enable_mmid0_fastreg; u32 max_ord; u32 max_ird; u32 pd; u32 next_state; char terminate_buffer[52]; u32 terminate_msg_len; u8 is_terminate_local; struct c4iw_mpa_attributes mpa_attr; struct c4iw_ep *llp_stream_handle; u8 layer_etype; u8 ecode; u16 sq_db_inc; u16 rq_db_inc; }; struct c4iw_qp { struct ib_qp ibqp; struct c4iw_dev *rhp; struct c4iw_ep *ep; struct c4iw_qp_attributes attr; struct t4_wq wq; spinlock_t lock; struct mutex mutex; atomic_t refcnt; wait_queue_head_t wait; struct timer_list timer; }; static inline struct c4iw_qp *to_c4iw_qp(struct ib_qp *ibqp) { return container_of(ibqp, struct c4iw_qp, ibqp); } struct c4iw_ucontext { struct ib_ucontext ibucontext; struct c4iw_dev_ucontext uctx; u32 key; spinlock_t 
mmap_lock; struct list_head mmaps; }; static inline struct c4iw_ucontext *to_c4iw_ucontext(struct ib_ucontext *c) { return container_of(c, struct c4iw_ucontext, ibucontext); } struct c4iw_mm_entry { struct list_head entry; u64 addr; u32 key; unsigned len; }; static inline struct c4iw_mm_entry *remove_mmap(struct c4iw_ucontext *ucontext, u32 key, unsigned len) { struct list_head *pos, *nxt; struct c4iw_mm_entry *mm; spin_lock(&ucontext->mmap_lock); list_for_each_safe(pos, nxt, &ucontext->mmaps) { mm = list_entry(pos, struct c4iw_mm_entry, entry); if (mm->key == key && mm->len == len) { list_del_init(&mm->entry); spin_unlock(&ucontext->mmap_lock); CTR4(KTR_IW_CXGBE, "%s key 0x%x addr 0x%llx len %d", __func__, key, (unsigned long long) mm->addr, mm->len); return mm; } } spin_unlock(&ucontext->mmap_lock); return NULL; } static inline void insert_mmap(struct c4iw_ucontext *ucontext, struct c4iw_mm_entry *mm) { spin_lock(&ucontext->mmap_lock); CTR4(KTR_IW_CXGBE, "%s key 0x%x addr 0x%llx len %d", __func__, mm->key, (unsigned long long) mm->addr, mm->len); list_add_tail(&mm->entry, &ucontext->mmaps); spin_unlock(&ucontext->mmap_lock); } enum c4iw_qp_attr_mask { C4IW_QP_ATTR_NEXT_STATE = 1 << 0, C4IW_QP_ATTR_SQ_DB = 1<<1, C4IW_QP_ATTR_RQ_DB = 1<<2, C4IW_QP_ATTR_ENABLE_RDMA_READ = 1 << 7, C4IW_QP_ATTR_ENABLE_RDMA_WRITE = 1 << 8, C4IW_QP_ATTR_ENABLE_RDMA_BIND = 1 << 9, C4IW_QP_ATTR_MAX_ORD = 1 << 11, C4IW_QP_ATTR_MAX_IRD = 1 << 12, C4IW_QP_ATTR_LLP_STREAM_HANDLE = 1 << 22, C4IW_QP_ATTR_STREAM_MSG_BUFFER = 1 << 23, C4IW_QP_ATTR_MPA_ATTR = 1 << 24, C4IW_QP_ATTR_QP_CONTEXT_ACTIVATE = 1 << 25, C4IW_QP_ATTR_VALID_MODIFY = (C4IW_QP_ATTR_ENABLE_RDMA_READ | C4IW_QP_ATTR_ENABLE_RDMA_WRITE | C4IW_QP_ATTR_MAX_ORD | C4IW_QP_ATTR_MAX_IRD | C4IW_QP_ATTR_LLP_STREAM_HANDLE | C4IW_QP_ATTR_STREAM_MSG_BUFFER | C4IW_QP_ATTR_MPA_ATTR | C4IW_QP_ATTR_QP_CONTEXT_ACTIVATE) }; int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp, enum c4iw_qp_attr_mask mask, struct c4iw_qp_attributes *attrs, int internal); enum c4iw_qp_state { C4IW_QP_STATE_IDLE, C4IW_QP_STATE_RTS, C4IW_QP_STATE_ERROR, C4IW_QP_STATE_TERMINATE, C4IW_QP_STATE_CLOSING, C4IW_QP_STATE_TOT }; static inline int c4iw_convert_state(enum ib_qp_state ib_state) { switch (ib_state) { case IB_QPS_RESET: case IB_QPS_INIT: return C4IW_QP_STATE_IDLE; case IB_QPS_RTS: return C4IW_QP_STATE_RTS; case IB_QPS_SQD: return C4IW_QP_STATE_CLOSING; case IB_QPS_SQE: return C4IW_QP_STATE_TERMINATE; case IB_QPS_ERR: return C4IW_QP_STATE_ERROR; default: return -1; } } static inline int to_ib_qp_state(int c4iw_qp_state) { switch (c4iw_qp_state) { case C4IW_QP_STATE_IDLE: return IB_QPS_INIT; case C4IW_QP_STATE_RTS: return IB_QPS_RTS; case C4IW_QP_STATE_CLOSING: return IB_QPS_SQD; case C4IW_QP_STATE_TERMINATE: return IB_QPS_SQE; case C4IW_QP_STATE_ERROR: return IB_QPS_ERR; } return IB_QPS_ERR; } static inline u32 c4iw_ib_to_tpt_access(int a) { return (a & IB_ACCESS_REMOTE_WRITE ? FW_RI_MEM_ACCESS_REM_WRITE : 0) | (a & IB_ACCESS_REMOTE_READ ? FW_RI_MEM_ACCESS_REM_READ : 0) | (a & IB_ACCESS_LOCAL_WRITE ? FW_RI_MEM_ACCESS_LOCAL_WRITE : 0) | FW_RI_MEM_ACCESS_LOCAL_READ; } static inline u32 c4iw_ib_to_tpt_bind_access(int acc) { return (acc & IB_ACCESS_REMOTE_WRITE ? FW_RI_MEM_ACCESS_REM_WRITE : 0) | (acc & IB_ACCESS_REMOTE_READ ? 
FW_RI_MEM_ACCESS_REM_READ : 0); } enum c4iw_mmid_state { C4IW_STAG_STATE_VALID, C4IW_STAG_STATE_INVALID }; #define C4IW_NODE_DESC "iw_cxgbe Chelsio Communications" #define MPA_KEY_REQ "MPA ID Req Frame" #define MPA_KEY_REP "MPA ID Rep Frame" #define MPA_MAX_PRIVATE_DATA 256 #define MPA_ENHANCED_RDMA_CONN 0x10 #define MPA_REJECT 0x20 #define MPA_CRC 0x40 #define MPA_MARKERS 0x80 #define MPA_FLAGS_MASK 0xE0 #define MPA_V2_PEER2PEER_MODEL 0x8000 #define MPA_V2_ZERO_LEN_FPDU_RTR 0x4000 #define MPA_V2_RDMA_WRITE_RTR 0x8000 #define MPA_V2_RDMA_READ_RTR 0x4000 #define MPA_V2_IRD_ORD_MASK 0x3FFF #define c4iw_put_ep(ep) { \ CTR4(KTR_IW_CXGBE, "put_ep (%s:%u) ep %p, refcnt %d", \ __func__, __LINE__, ep, atomic_read(&(ep)->kref.refcount)); \ WARN_ON(atomic_read(&(ep)->kref.refcount) < 1); \ kref_put(&((ep)->kref), _c4iw_free_ep); \ } #define c4iw_get_ep(ep) { \ CTR4(KTR_IW_CXGBE, "get_ep (%s:%u) ep %p, refcnt %d", \ __func__, __LINE__, ep, atomic_read(&(ep)->kref.refcount)); \ kref_get(&((ep)->kref)); \ } void _c4iw_free_ep(struct kref *kref); struct mpa_message { u8 key[16]; u8 flags; u8 revision; __be16 private_data_size; u8 private_data[0]; }; struct mpa_v2_conn_params { __be16 ird; __be16 ord; }; struct terminate_message { u8 layer_etype; u8 ecode; __be16 hdrct_rsvd; u8 len_hdrs[0]; }; #define TERM_MAX_LENGTH (sizeof(struct terminate_message) + 2 + 18 + 28) enum c4iw_layers_types { LAYER_RDMAP = 0x00, LAYER_DDP = 0x10, LAYER_MPA = 0x20, RDMAP_LOCAL_CATA = 0x00, RDMAP_REMOTE_PROT = 0x01, RDMAP_REMOTE_OP = 0x02, DDP_LOCAL_CATA = 0x00, DDP_TAGGED_ERR = 0x01, DDP_UNTAGGED_ERR = 0x02, DDP_LLP = 0x03 }; enum c4iw_rdma_ecodes { RDMAP_INV_STAG = 0x00, RDMAP_BASE_BOUNDS = 0x01, RDMAP_ACC_VIOL = 0x02, RDMAP_STAG_NOT_ASSOC = 0x03, RDMAP_TO_WRAP = 0x04, RDMAP_INV_VERS = 0x05, RDMAP_INV_OPCODE = 0x06, RDMAP_STREAM_CATA = 0x07, RDMAP_GLOBAL_CATA = 0x08, RDMAP_CANT_INV_STAG = 0x09, RDMAP_UNSPECIFIED = 0xff }; enum c4iw_ddp_ecodes { DDPT_INV_STAG = 0x00, DDPT_BASE_BOUNDS = 0x01, DDPT_STAG_NOT_ASSOC = 0x02, DDPT_TO_WRAP = 0x03, DDPT_INV_VERS = 0x04, DDPU_INV_QN = 0x01, DDPU_INV_MSN_NOBUF = 0x02, DDPU_INV_MSN_RANGE = 0x03, DDPU_INV_MO = 0x04, DDPU_MSG_TOOBIG = 0x05, DDPU_INV_VERS = 0x06 }; enum c4iw_mpa_ecodes { MPA_CRC_ERR = 0x02, MPA_MARKER_ERR = 0x03, MPA_LOCAL_CATA = 0x05, MPA_INSUFF_IRD = 0x06, MPA_NOMATCH_RTR = 0x07, }; enum c4iw_ep_state { IDLE = 0, LISTEN, CONNECTING, MPA_REQ_WAIT, MPA_REQ_SENT, MPA_REQ_RCVD, MPA_REP_SENT, FPDU_MODE, ABORTING, CLOSING, MORIBUND, DEAD, }; enum c4iw_ep_flags { PEER_ABORT_IN_PROGRESS = 0, ABORT_REQ_IN_PROGRESS = 1, RELEASE_RESOURCES = 2, CLOSE_SENT = 3, TIMEOUT = 4 }; enum c4iw_ep_history { ACT_OPEN_REQ = 0, ACT_OFLD_CONN = 1, ACT_OPEN_RPL = 2, ACT_ESTAB = 3, PASS_ACCEPT_REQ = 4, PASS_ESTAB = 5, ABORT_UPCALL = 6, ESTAB_UPCALL = 7, CLOSE_UPCALL = 8, ULP_ACCEPT = 9, ULP_REJECT = 10, TIMEDOUT = 11, PEER_ABORT = 12, PEER_CLOSE = 13, CONNREQ_UPCALL = 14, ABORT_CONN = 15, DISCONN_UPCALL = 16, EP_DISC_CLOSE = 17, EP_DISC_ABORT = 18, CONN_RPL_UPCALL = 19, ACT_RETRY_NOMEM = 20, ACT_RETRY_INUSE = 21 }; struct c4iw_ep_common { TAILQ_ENTRY(c4iw_ep_common) entry; /* Work queue attachment */ struct iw_cm_id *cm_id; struct c4iw_qp *qp; struct c4iw_dev *dev; enum c4iw_ep_state state; struct kref kref; struct mutex mutex; struct sockaddr_in local_addr; struct sockaddr_in remote_addr; struct c4iw_wr_wait wr_wait; unsigned long flags; unsigned long history; int rpl_err; int rpl_done; struct thread *thread; struct socket *so; }; struct c4iw_listen_ep { struct c4iw_ep_common com; unsigned int 
stid; int backlog; }; struct c4iw_ep { struct c4iw_ep_common com; struct c4iw_ep *parent_ep; struct timer_list timer; struct list_head entry; unsigned int atid; u32 hwtid; u32 snd_seq; u32 rcv_seq; struct l2t_entry *l2t; struct dst_entry *dst; struct c4iw_mpa_attributes mpa_attr; u8 mpa_pkt[sizeof(struct mpa_message) + MPA_MAX_PRIVATE_DATA]; unsigned int mpa_pkt_len; u32 ird; u32 ord; u32 smac_idx; u32 tx_chan; u32 mtu; u16 mss; u16 emss; u16 plen; u16 rss_qid; u16 txq_idx; u16 ctrlq_idx; u8 tos; u8 retry_with_mpa_v1; u8 tried_with_mpa_v1; }; static inline struct c4iw_ep *to_ep(struct iw_cm_id *cm_id) { return cm_id->provider_data; } static inline struct c4iw_listen_ep *to_listen_ep(struct iw_cm_id *cm_id) { return cm_id->provider_data; } static inline int compute_wscale(int win) { int wscale = 0; while (wscale < 14 && (65535 << wscale) < win) wscale++; return (wscale); } struct gen_pool { blist_t gen_list; daddr_t gen_base; int gen_chunk_shift; struct mutex gen_lock; }; static __inline struct gen_pool * gen_pool_create(daddr_t base, u_int chunk_shift, u_int len) { struct gen_pool *gp; gp = malloc(sizeof(struct gen_pool), M_DEVBUF, M_NOWAIT); if (gp == NULL) return (NULL); memset(gp, 0, sizeof(struct gen_pool)); gp->gen_list = blist_create(len >> chunk_shift, M_NOWAIT); if (gp->gen_list == NULL) { free(gp, M_DEVBUF); return (NULL); } blist_free(gp->gen_list, 0, len >> chunk_shift); gp->gen_base = base; gp->gen_chunk_shift = chunk_shift; //mutex_init(&gp->gen_lock, "genpool", NULL, MTX_DUPOK|MTX_DEF); mutex_init(&gp->gen_lock); return (gp); } static __inline unsigned long gen_pool_alloc(struct gen_pool *gp, int size) { int chunks; daddr_t blkno; chunks = (size + (1 << gp->gen_chunk_shift) - 1) >> gp->gen_chunk_shift; mutex_lock(&gp->gen_lock); blkno = blist_alloc(gp->gen_list, chunks); mutex_unlock(&gp->gen_lock); if (blkno == SWAPBLK_NONE) return (0); return (gp->gen_base + ((1 << gp->gen_chunk_shift) * blkno)); } static __inline void gen_pool_free(struct gen_pool *gp, daddr_t address, int size) { int chunks; daddr_t blkno; chunks = (size + (1 << gp->gen_chunk_shift) - 1) >> gp->gen_chunk_shift; blkno = (address - gp->gen_base) / (1 << gp->gen_chunk_shift); mutex_lock(&gp->gen_lock); blist_free(gp->gen_list, blkno, chunks); mutex_unlock(&gp->gen_lock); } static __inline void gen_pool_destroy(struct gen_pool *gp) { blist_destroy(gp->gen_list); free(gp, M_DEVBUF); } #if defined(__i386__) || defined(__amd64__) #define L1_CACHE_BYTES 128 #else #define L1_CACHE_BYTES 32 #endif static inline int idr_for_each(struct idr *idp, int (*fn)(int id, void *p, void *data), void *data) { int n, id, max, error = 0; struct idr_layer *p; struct idr_layer *pa[MAX_LEVEL]; struct idr_layer **paa = &pa[0]; n = idp->layers * IDR_BITS; p = idp->top; max = 1 << n; id = 0; while (id < max) { while (n > 0 && p) { n -= IDR_BITS; *paa++ = p; p = p->ary[(id >> n) & IDR_MASK]; } if (p) { error = fn(id, (void *)p, data); if (error) break; } id += 1 << n; while (n < fls(id)) { n += IDR_BITS; p = *--paa; } } return error; } void c4iw_cm_init_cpl(struct adapter *); void c4iw_cm_term_cpl(struct adapter *); void your_reg_device(struct c4iw_dev *dev); #define SGE_CTRLQ_NUM 0 extern int spg_creds; /* Status Page size in credit units (1 unit = 64) */ #endif Index: projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/provider.c =================================================================== --- projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/provider.c (revision 294776) +++ projects/clang380-import/sys/dev/cxgbe/iw_cxgbe/provider.c (revision 294777) @@ -1,501 +1,502 @@ /* - * Copyright (c)
2009-2013 Chelsio, Inc. All rights reserved. + * Copyright (c) 2009-2013, 2016 Chelsio, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); #include "opt_inet.h" #ifdef TCP_OFFLOAD #include #include #include #include #include "iw_cxgbe.h" #include "user.h" static int fastreg_support = 1; module_param(fastreg_support, int, 0644); MODULE_PARM_DESC(fastreg_support, "Advertise fastreg support (default = 1)"); static int c4iw_modify_port(struct ib_device *ibdev, u8 port, int port_modify_mask, struct ib_port_modify *props) { return -ENOSYS; } static struct ib_ah *c4iw_ah_create(struct ib_pd *pd, struct ib_ah_attr *ah_attr) { return ERR_PTR(-ENOSYS); } static int c4iw_ah_destroy(struct ib_ah *ah) { return -ENOSYS; } static int c4iw_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { return -ENOSYS; } static int c4iw_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) { return -ENOSYS; } static int c4iw_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num, struct ib_wc *in_wc, struct ib_grh *in_grh, struct ib_mad *in_mad, struct ib_mad *out_mad) { return -ENOSYS; } static int c4iw_dealloc_ucontext(struct ib_ucontext *context) { struct c4iw_dev *rhp = to_c4iw_dev(context->device); struct c4iw_ucontext *ucontext = to_c4iw_ucontext(context); struct c4iw_mm_entry *mm, *tmp; CTR2(KTR_IW_CXGBE, "%s context %p", __func__, context); list_for_each_entry_safe(mm, tmp, &ucontext->mmaps, entry) kfree(mm); c4iw_release_dev_ucontext(&rhp->rdev, &ucontext->uctx); kfree(ucontext); return 0; } static struct ib_ucontext *c4iw_alloc_ucontext(struct ib_device *ibdev, struct ib_udata *udata) { struct c4iw_ucontext *context; struct c4iw_dev *rhp = to_c4iw_dev(ibdev); CTR2(KTR_IW_CXGBE, "%s ibdev %p", __func__, ibdev); context = kzalloc(sizeof(*context), GFP_KERNEL); if (!context) return ERR_PTR(-ENOMEM); c4iw_init_dev_ucontext(&rhp->rdev, &context->uctx); INIT_LIST_HEAD(&context->mmaps); spin_lock_init(&context->mmap_lock); return &context->ibucontext; } #ifdef DOT5 static inline pgprot_t t4_pgprot_wc(pgprot_t prot) { return pgprot_writecombine(prot); } #endif static int c4iw_mmap(struct ib_ucontext *context, struct vm_area_struct *vma) { int len = vma->vm_end - vma->vm_start; u32 key = vma->vm_pgoff << PAGE_SHIFT; struct
c4iw_rdev *rdev; int ret = 0; struct c4iw_mm_entry *mm; struct c4iw_ucontext *ucontext; u64 addr, paddr; u64 va_regs_res = 0, va_udbs_res = 0; u64 len_regs_res = 0, len_udbs_res = 0; CTR3(KTR_IW_CXGBE, "%s:1 ctx %p vma %p", __func__, context, vma); CTR4(KTR_IW_CXGBE, "%s:1a pgoff 0x%lx key 0x%x len %d", __func__, vma->vm_pgoff, key, len); if (vma->vm_start & (PAGE_SIZE-1)) { CTR3(KTR_IW_CXGBE, "%s:2 unaligned vm_start %u vma %p", __func__, vma->vm_start, vma); return -EINVAL; } rdev = &(to_c4iw_dev(context->device)->rdev); ucontext = to_c4iw_ucontext(context); mm = remove_mmap(ucontext, key, len); if (!mm) { CTR4(KTR_IW_CXGBE, "%s:3 ucontext %p key %u len %u", __func__, ucontext, key, len); return -EINVAL; } addr = mm->addr; kfree(mm); va_regs_res = (u64)rman_get_virtual(rdev->adap->regs_res); len_regs_res = (u64)rman_get_size(rdev->adap->regs_res); va_udbs_res = (u64)rman_get_virtual(rdev->adap->udbs_res); len_udbs_res = (u64)rman_get_size(rdev->adap->udbs_res); CTR6(KTR_IW_CXGBE, "%s:4 addr %p, masync region %p:%p, udb region %p:%p", __func__, addr, va_regs_res, va_regs_res+len_regs_res, va_udbs_res, va_udbs_res+len_udbs_res); if (addr >= va_regs_res && addr < va_regs_res + len_regs_res) { CTR4(KTR_IW_CXGBE, "%s:5 MA_SYNC addr %p region %p, reglen %u", __func__, addr, va_regs_res, len_regs_res); /* * MA_SYNC register... */ paddr = vtophys(addr); vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); ret = io_remap_pfn_range(vma, vma->vm_start, paddr >> PAGE_SHIFT, len, vma->vm_page_prot); } else { if (addr >= va_udbs_res && addr < va_udbs_res + len_udbs_res) { /* * Map user DB or OCQP memory... */ paddr = vtophys(addr); CTR4(KTR_IW_CXGBE, "%s:6 USER DB-GTS addr %p region %p, reglen %u", __func__, addr, va_udbs_res, len_udbs_res); #ifdef DOT5 if (is_t5(rdev->lldi.adapter_type) && map_udb_as_wc) vma->vm_page_prot = t4_pgprot_wc(vma->vm_page_prot); else #endif vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); ret = io_remap_pfn_range(vma, vma->vm_start, paddr >> PAGE_SHIFT, len, vma->vm_page_prot); } else { /* * Map WQ or CQ contig dma memory... 
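* The key passed in vm_pgoff selects an address that was handed out earlier and recorded on the ucontext mmap list via insert_mmap(); remove_mmap() has already validated it above. Addresses outside the register and user doorbell BAR ranges are assumed to be host memory backing a WQ or CQ, so they are mapped with the unmodified vm_page_prot.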
*/ CTR4(KTR_IW_CXGBE, "%s:7 WQ/CQ addr %p vm_start %u vma %p", __func__, addr, vma->vm_start, vma); ret = io_remap_pfn_range(vma, vma->vm_start, addr >> PAGE_SHIFT, len, vma->vm_page_prot); } } CTR4(KTR_IW_CXGBE, "%s:8 ctx %p vma %p ret %u", __func__, context, vma, ret); return ret; } static int c4iw_deallocate_pd(struct ib_pd *pd) { struct c4iw_pd *php = to_c4iw_pd(pd); struct c4iw_dev *rhp = php->rhp; CTR3(KTR_IW_CXGBE, "%s: pd %p, pdid 0x%x", __func__, pd, php->pdid); c4iw_put_resource(&rhp->rdev.resource.pdid_table, php->pdid); mutex_lock(&rhp->rdev.stats.lock); rhp->rdev.stats.pd.cur--; mutex_unlock(&rhp->rdev.stats.lock); kfree(php); return (0); } static struct ib_pd * c4iw_allocate_pd(struct ib_device *ibdev, struct ib_ucontext *context, struct ib_udata *udata) { struct c4iw_pd *php; u32 pdid; struct c4iw_dev *rhp; CTR4(KTR_IW_CXGBE, "%s: ibdev %p, context %p, data %p", __func__, ibdev, context, udata); rhp = (struct c4iw_dev *) ibdev; pdid = c4iw_get_resource(&rhp->rdev.resource.pdid_table); if (!pdid) return ERR_PTR(-EINVAL); php = kzalloc(sizeof(*php), GFP_KERNEL); if (!php) { c4iw_put_resource(&rhp->rdev.resource.pdid_table, pdid); return ERR_PTR(-ENOMEM); } php->pdid = pdid; php->rhp = rhp; if (context) { if (ib_copy_to_udata(udata, &php->pdid, sizeof(u32))) { c4iw_deallocate_pd(&php->ibpd); return ERR_PTR(-EFAULT); } } mutex_lock(&rhp->rdev.stats.lock); rhp->rdev.stats.pd.cur++; if (rhp->rdev.stats.pd.cur > rhp->rdev.stats.pd.max) rhp->rdev.stats.pd.max = rhp->rdev.stats.pd.cur; mutex_unlock(&rhp->rdev.stats.lock); CTR6(KTR_IW_CXGBE, "%s: ibdev %p, context %p, data %p, pddid 0x%x, pd %p", __func__, ibdev, context, udata, pdid, php); return (&php->ibpd); } static int c4iw_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 *pkey) { CTR5(KTR_IW_CXGBE, "%s ibdev %p, port %d, index %d, pkey %p", __func__, ibdev, port, index, pkey); *pkey = 0; return (0); } static int c4iw_query_gid(struct ib_device *ibdev, u8 port, int index, union ib_gid *gid) { struct c4iw_dev *dev; struct port_info *pi; struct adapter *sc; CTR5(KTR_IW_CXGBE, "%s ibdev %p, port %d, index %d, gid %p", __func__, ibdev, port, index, gid); memset(&gid->raw[0], 0, sizeof(gid->raw)); dev = to_c4iw_dev(ibdev); sc = dev->rdev.adap; if (port == 0 || port > sc->params.nports) return (-EINVAL); pi = sc->port[port - 1]; memcpy(&gid->raw[0], pi->vi[0].hw_addr, ETHER_ADDR_LEN); return (0); } static int c4iw_query_device(struct ib_device *ibdev, struct ib_device_attr *props) { struct c4iw_dev *dev = to_c4iw_dev(ibdev); struct adapter *sc = dev->rdev.adap; CTR3(KTR_IW_CXGBE, "%s ibdev %p, props %p", __func__, ibdev, props); memset(props, 0, sizeof *props); memcpy(&props->sys_image_guid, sc->port[0]->vi[0].hw_addr, ETHER_ADDR_LEN); props->hw_ver = sc->params.chipid; props->fw_ver = sc->params.fw_vers; props->device_cap_flags = dev->device_cap_flags; props->page_size_cap = T4_PAGESIZE_MASK; props->vendor_id = pci_get_vendor(sc->dev); props->vendor_part_id = pci_get_device(sc->dev); props->max_mr_size = T4_MAX_MR_SIZE; props->max_qp = T4_MAX_NUM_QP; props->max_qp_wr = T4_MAX_QP_DEPTH; props->max_sge = T4_MAX_RECV_SGE; props->max_sge_rd = 1; props->max_qp_rd_atom = c4iw_max_read_depth; props->max_qp_init_rd_atom = c4iw_max_read_depth; props->max_cq = T4_MAX_NUM_CQ; props->max_cqe = T4_MAX_CQ_DEPTH; props->max_mr = c4iw_num_stags(&dev->rdev); props->max_pd = T4_MAX_NUM_PD; props->local_ca_ack_delay = 0; props->max_fast_reg_page_list_len = T4_MAX_FR_DEPTH; return (0); } /* * Returns -errno on failure. 
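* Port attributes are synthesized from the underlying ifnet: active_mtu is derived from if_mtu, and link_ok maps to IB_PORT_ACTIVE or IB_PORT_DOWN.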
*/ static int c4iw_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr *props) { struct c4iw_dev *dev; struct adapter *sc; struct port_info *pi; struct ifnet *ifp; CTR4(KTR_IW_CXGBE, "%s ibdev %p, port %d, props %p", __func__, ibdev, port, props); dev = to_c4iw_dev(ibdev); sc = dev->rdev.adap; if (port > sc->params.nports) return (-EINVAL); pi = sc->port[port - 1]; ifp = pi->vi[0].ifp; memset(props, 0, sizeof(struct ib_port_attr)); props->max_mtu = IB_MTU_4096; if (ifp->if_mtu >= 4096) props->active_mtu = IB_MTU_4096; else if (ifp->if_mtu >= 2048) props->active_mtu = IB_MTU_2048; else if (ifp->if_mtu >= 1024) props->active_mtu = IB_MTU_1024; else if (ifp->if_mtu >= 512) props->active_mtu = IB_MTU_512; else props->active_mtu = IB_MTU_256; props->state = pi->link_cfg.link_ok ? IB_PORT_ACTIVE : IB_PORT_DOWN; props->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_SNMP_TUNNEL_SUP | IB_PORT_REINIT_SUP | IB_PORT_DEVICE_MGMT_SUP | IB_PORT_VENDOR_CLASS_SUP | IB_PORT_BOOT_MGMT_SUP; props->gid_tbl_len = 1; props->pkey_tbl_len = 1; props->active_width = 2; props->active_speed = 2; props->max_msg_sz = -1; return 0; } /* * Returns -errno on error. */ int c4iw_register_device(struct c4iw_dev *dev) { struct adapter *sc = dev->rdev.adap; struct ib_device *ibdev = &dev->ibdev; struct iw_cm_verbs *iwcm; int ret; CTR3(KTR_IW_CXGBE, "%s c4iw_dev %p, adapter %p", __func__, dev, sc); BUG_ON(!sc->port[0]); strlcpy(ibdev->name, device_get_nameunit(sc->dev), sizeof(ibdev->name)); memset(&ibdev->node_guid, 0, sizeof(ibdev->node_guid)); memcpy(&ibdev->node_guid, sc->port[0]->vi[0].hw_addr, ETHER_ADDR_LEN); ibdev->owner = THIS_MODULE; dev->device_cap_flags = IB_DEVICE_LOCAL_DMA_LKEY | IB_DEVICE_MEM_WINDOW; if (fastreg_support) dev->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS; ibdev->local_dma_lkey = 0; ibdev->uverbs_cmd_mask = (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) | (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) | (1ull << IB_USER_VERBS_CMD_QUERY_PORT) | (1ull << IB_USER_VERBS_CMD_ALLOC_PD) | (1ull << IB_USER_VERBS_CMD_DEALLOC_PD) | (1ull << IB_USER_VERBS_CMD_REG_MR) | (1ull << IB_USER_VERBS_CMD_DEREG_MR) | (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | (1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) | (1ull << IB_USER_VERBS_CMD_CREATE_QP) | (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | (1ull << IB_USER_VERBS_CMD_QUERY_QP) | (1ull << IB_USER_VERBS_CMD_POLL_CQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_QP) | (1ull << IB_USER_VERBS_CMD_POST_SEND) | (1ull << IB_USER_VERBS_CMD_POST_RECV); ibdev->node_type = RDMA_NODE_RNIC; strlcpy(ibdev->node_desc, C4IW_NODE_DESC, sizeof(ibdev->node_desc)); ibdev->phys_port_cnt = sc->params.nports; ibdev->num_comp_vectors = 1; ibdev->dma_device = sc->dev; ibdev->query_device = c4iw_query_device; ibdev->query_port = c4iw_query_port; ibdev->modify_port = c4iw_modify_port; ibdev->query_pkey = c4iw_query_pkey; ibdev->query_gid = c4iw_query_gid; ibdev->alloc_ucontext = c4iw_alloc_ucontext; ibdev->dealloc_ucontext = c4iw_dealloc_ucontext; ibdev->mmap = c4iw_mmap; ibdev->alloc_pd = c4iw_allocate_pd; ibdev->dealloc_pd = c4iw_deallocate_pd; ibdev->create_ah = c4iw_ah_create; ibdev->destroy_ah = c4iw_ah_destroy; ibdev->create_qp = c4iw_create_qp; ibdev->modify_qp = c4iw_ib_modify_qp; ibdev->query_qp = c4iw_ib_query_qp; ibdev->destroy_qp = c4iw_destroy_qp; ibdev->create_cq = c4iw_create_cq; ibdev->destroy_cq = c4iw_destroy_cq; ibdev->resize_cq = c4iw_resize_cq; ibdev->poll_cq = c4iw_poll_cq; ibdev->get_dma_mr = 
c4iw_get_dma_mr; ibdev->reg_phys_mr = c4iw_register_phys_mem; ibdev->rereg_phys_mr = c4iw_reregister_phys_mem; ibdev->reg_user_mr = c4iw_reg_user_mr; ibdev->dereg_mr = c4iw_dereg_mr; ibdev->alloc_mw = c4iw_alloc_mw; ibdev->bind_mw = c4iw_bind_mw; ibdev->dealloc_mw = c4iw_dealloc_mw; ibdev->alloc_fast_reg_mr = c4iw_alloc_fast_reg_mr; ibdev->alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl; ibdev->free_fast_reg_page_list = c4iw_free_fastreg_pbl; ibdev->attach_mcast = c4iw_multicast_attach; ibdev->detach_mcast = c4iw_multicast_detach; ibdev->process_mad = c4iw_process_mad; ibdev->req_notify_cq = c4iw_arm_cq; ibdev->post_send = c4iw_post_send; ibdev->post_recv = c4iw_post_receive; ibdev->uverbs_abi_ver = C4IW_UVERBS_ABI_VERSION; iwcm = kmalloc(sizeof(*iwcm), GFP_KERNEL); if (iwcm == NULL) return (-ENOMEM); iwcm->connect = c4iw_connect; iwcm->accept = c4iw_accept_cr; iwcm->reject = c4iw_reject_cr; - iwcm->create_listen = c4iw_create_listen; - iwcm->destroy_listen = c4iw_destroy_listen; + iwcm->create_listen_ep = c4iw_create_listen_ep; + iwcm->destroy_listen_ep = c4iw_destroy_listen_ep; + iwcm->newconn = process_newconn; iwcm->add_ref = c4iw_qp_add_ref; iwcm->rem_ref = c4iw_qp_rem_ref; iwcm->get_qp = c4iw_get_qp; ibdev->iwcm = iwcm; ret = ib_register_device(&dev->ibdev, NULL); if (ret) kfree(iwcm); return (ret); } void c4iw_unregister_device(struct c4iw_dev *dev) { CTR3(KTR_IW_CXGBE, "%s c4iw_dev %p, adapter %p", __func__, dev, dev->rdev.adap); ib_unregister_device(&dev->ibdev); kfree(dev->ibdev.iwcm); return; } #endif Index: projects/clang380-import/sys/dev/extres/hwreset/hwreset.c =================================================================== --- projects/clang380-import/sys/dev/extres/hwreset/hwreset.c (nonexistent) +++ projects/clang380-import/sys/dev/extres/hwreset/hwreset.c (revision 294777) @@ -0,0 +1,186 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD$ + */ +#include "opt_platform.h" +#include +#include +#include +#include +#include +#include + +#ifdef FDT +#include +#include +#endif + +#include + +#include "hwreset_if.h" + +struct hwreset { + device_t consumer_dev; /* consumer device */ + device_t provider_dev; /* provider device */ + int rst_id; /* reset id */ +}; + +MALLOC_DEFINE(M_HWRESET, "hwreset", "Reset framework"); + +int +hwreset_assert(hwreset_t rst) +{ + + return (HWRESET_ASSERT(rst->provider_dev, rst->rst_id, true)); +} + +int +hwreset_deassert(hwreset_t rst) +{ + + return (HWRESET_ASSERT(rst->provider_dev, rst->rst_id, false)); +} + +int +hwreset_is_asserted(hwreset_t rst, bool *value) +{ + + return (HWRESET_IS_ASSERTED(rst->provider_dev, rst->rst_id, value)); +} + +void +hwreset_release(hwreset_t rst) +{ + free(rst, M_HWRESET); +} + +int +hwreset_get_by_id(device_t consumer_dev, device_t provider_dev, intptr_t id, + hwreset_t *rst_out) +{ + hwreset_t rst; + + /* Create handle */ + rst = malloc(sizeof(struct hwreset), M_HWRESET, + M_WAITOK | M_ZERO); + rst->consumer_dev = consumer_dev; + rst->provider_dev = provider_dev; + rst->rst_id = id; + *rst_out = rst; + return (0); +} + +#ifdef FDT +int +hwreset_default_ofw_map(device_t provider_dev, phandle_t xref, int ncells, + pcell_t *cells, intptr_t *id) +{ + if (ncells == 0) + *id = 1; + else if (ncells == 1) + *id = cells[0]; + else + return (ERANGE); + + return (0); +} + +int +hwreset_get_by_ofw_idx(device_t consumer_dev, int idx, hwreset_t *rst) +{ + phandle_t cnode, xnode; + pcell_t *cells; + device_t rstdev; + int ncells, rv; + intptr_t id; + + cnode = ofw_bus_get_node(consumer_dev); + if (cnode <= 0) { + device_printf(consumer_dev, + "%s called on not ofw based device\n", __func__); + return (ENXIO); + } + + rv = ofw_bus_parse_xref_list_alloc(cnode, "resets", "#reset-cells", + idx, &xnode, &ncells, &cells); + if (rv != 0) + return (rv); + + /* Translate provider to device */ + rstdev = OF_device_from_xref(xnode); + if (rstdev == NULL) { + free(cells, M_OFWPROP); + return (ENODEV); + } + /* Map reset to number */ + rv = HWRESET_MAP(rstdev, xnode, ncells, cells, &id); + free(cells, M_OFWPROP); + if (rv != 0) + return (rv); + + return (hwreset_get_by_id(consumer_dev, rstdev, id, rst)); +} + +int +hwreset_get_by_ofw_name(device_t consumer_dev, char *name, hwreset_t *rst) +{ + int rv, idx; + phandle_t cnode; + + cnode = ofw_bus_get_node(consumer_dev); + if (cnode <= 0) { + device_printf(consumer_dev, + "%s called on not ofw based device\n", __func__); + return (ENXIO); + } + rv = ofw_bus_find_string_index(cnode, "reset-names", name, &idx); + if (rv != 0) + return (rv); + return (hwreset_get_by_ofw_idx(consumer_dev, idx, rst)); +} + +void +hwreset_register_ofw_provider(device_t provider_dev) +{ + phandle_t xref, node; + + node = ofw_bus_get_node(provider_dev); + if (node <= 0) + panic("%s called on not ofw based device.\n", __func__); + + xref = OF_xref_from_node(node); + OF_device_register_xref(xref, provider_dev); +} + +void +hwreset_unregister_ofw_provider(device_t provider_dev) +{ + phandle_t xref; + + xref = OF_xref_from_device(provider_dev); + OF_device_register_xref(xref, NULL); +} +#endif Property changes on: projects/clang380-import/sys/dev/extres/hwreset/hwreset.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of
property Index: projects/clang380-import/sys/dev/extres/hwreset/hwreset.h =================================================================== --- projects/clang380-import/sys/dev/extres/hwreset/hwreset.h (nonexistent) +++ projects/clang380-import/sys/dev/extres/hwreset/hwreset.h (revision 294777) @@ -0,0 +1,67 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef DEV_EXTRES_HWRESET_HWRESET_H +#define DEV_EXTRES_HWRESET_HWRESET_H + +#include "opt_platform.h" +#include +#ifdef FDT +#include +#endif + +typedef struct hwreset *hwreset_t; + +/* + * Provider interface + */ +#ifdef FDT +int hwreset_default_ofw_map(device_t provider_dev, phandle_t xref, int ncells, + pcell_t *cells, intptr_t *id); +void hwreset_register_ofw_provider(device_t provider_dev); +void hwreset_unregister_ofw_provider(device_t provider_dev); +#endif + +/* + * Consumer interface + */ +int hwreset_get_by_id(device_t consumer_dev, device_t provider_dev, intptr_t id, + hwreset_t *rst); +void hwreset_release(hwreset_t rst); + +int hwreset_assert(hwreset_t rst); +int hwreset_deassert(hwreset_t rst); +int hwreset_is_asserted(hwreset_t rst, bool *value); + +#ifdef FDT +int hwreset_get_by_ofw_name(device_t consumer_dev, char *name, hwreset_t *rst); +int hwreset_get_by_ofw_idx(device_t consumer_dev, int idx, hwreset_t *rst); +#endif + + + +#endif /* DEV_EXTRES_HWRESET_HWRESET_H */ Property changes on: projects/clang380-import/sys/dev/extres/hwreset/hwreset.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/hwreset/hwreset_if.m =================================================================== --- projects/clang380-import/sys/dev/extres/hwreset/hwreset_if.m (nonexistent) +++ projects/clang380-import/sys/dev/extres/hwreset/hwreset_if.m (revision 294777) @@ -0,0 +1,72 @@ +#- +# Copyright 2016 Michal Meloun +# All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. +# +# $FreeBSD$ +# + +#ifdef FDT +#include +#include +#endif + +INTERFACE hwreset; + +#ifdef FDT +HEADER { +int hwreset_default_ofw_map(device_t , phandle_t, int, pcell_t *, intptr_t *); +} + +# +# map fdt property cells to reset id +# Returns 0 on success or a standard errno value. +# +METHOD int map { + device_t provider_dev; + phandle_t xref; + int ncells; + pcell_t *cells; + intptr_t *id; +} DEFAULT hwreset_default_ofw_map; +#endif + +# +# Assert/deassert given reset. +# Returns 0 on success or a standard errno value. +# +METHOD int assert { + device_t provider_dev; + intptr_t id; + bool value; +}; + +# +# Get actual status of given reset. +# Returns 0 on success or a standard errno value. +# +METHOD int is_asserted { + device_t provider_dev; + intptr_t id; + bool *value; +}; Property changes on: projects/clang380-import/sys/dev/extres/hwreset/hwreset_if.m ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk.c =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk.c (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk.c (revision 294777) @@ -0,0 +1,1261 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +__FBSDID("$FreeBSD$"); + +#include "opt_platform.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#ifdef FDT +#include +#include +#include +#endif +#include + +MALLOC_DEFINE(M_CLOCK, "clocks", "Clock framework"); + +/* Forward declarations. */ +struct clk; +struct clknodenode; +struct clkdom; + +typedef TAILQ_HEAD(clknode_list, clknode) clknode_list_t; +typedef TAILQ_HEAD(clkdom_list, clkdom) clkdom_list_t; + +/* Default clock methods. */ +static int clknode_method_init(struct clknode *clk, device_t dev); +static int clknode_method_recalc_freq(struct clknode *clk, uint64_t *freq); +static int clknode_method_set_freq(struct clknode *clk, uint64_t fin, + uint64_t *fout, int flags, int *stop); +static int clknode_method_set_gate(struct clknode *clk, bool enable); +static int clknode_method_set_mux(struct clknode *clk, int idx); + +/* + * Clock controller methods. + */ +static clknode_method_t clknode_methods[] = { + CLKNODEMETHOD(clknode_init, clknode_method_init), + CLKNODEMETHOD(clknode_recalc_freq, clknode_method_recalc_freq), + CLKNODEMETHOD(clknode_set_freq, clknode_method_set_freq), + CLKNODEMETHOD(clknode_set_gate, clknode_method_set_gate), + CLKNODEMETHOD(clknode_set_mux, clknode_method_set_mux), + + CLKNODEMETHOD_END +}; +DEFINE_CLASS_0(clknode, clknode_class, clknode_methods, 0); + +/* + * Clock node - basic element for modeling SOC clock graph. It holds the clock + * provider's data about the clock, and the links for the clock's membership in + * various lists. + */ +struct clknode { + KOBJ_FIELDS; + + /* Clock nodes topology. */ + struct clkdom *clkdom; /* Owning clock domain */ + TAILQ_ENTRY(clknode) clkdom_link; /* Domain list entry */ + TAILQ_ENTRY(clknode) clklist_link; /* Global list entry */ + + /* String based parent list. */ + const char **parent_names; /* Array of parent names */ + int parent_cnt; /* Number of parents */ + int parent_idx; /* Parent index or -1 */ + + /* Cache for already resolved names. */ + struct clknode **parents; /* Array of potential parents */ + struct clknode *parent; /* Current parent */ + + /* Parent/child relationship links. */ + clknode_list_t children; /* List of our children */ + TAILQ_ENTRY(clknode) sibling_link; /* Our entry in parent's list */ + + /* Details of this device. */ + void *softc; /* Instance softc */ + const char *name; /* Globally unique name */ + intptr_t id; /* Per domain unique id */ + int flags; /* CLK_FLAG_* */ + struct sx lock; /* Lock for this clock */ + int ref_cnt; /* Reference counter */ + int enable_cnt; /* Enabled counter */ + + /* Cached values. 
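+ * A freq of zero marks the cache as invalid; clknode_get_freq() recomputes and re-caches the value on demand, while clknode_refresh_cache() pushes updates down to all children.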
*/ + uint64_t freq; /* Actual frequency */ +}; + +/* + * Per consumer data, information about how a consumer is using a clock node. + * A pointer to this structure is used as a handle in the consumer interface. + */ +struct clk { + device_t dev; + struct clknode *clknode; + int enable_cnt; +}; + +/* + * Clock domain - a group of clocks provided by one clock device. + */ +struct clkdom { + device_t dev; /* Link to provider device */ + TAILQ_ENTRY(clkdom) link; /* Global domain list entry */ + clknode_list_t clknode_list; /* All clocks in the domain */ + +#ifdef FDT + clknode_ofw_mapper_func *ofw_mapper; /* Find clock using FDT xref */ +#endif +}; + +/* + * The system-wide list of clock domains. + */ +static clkdom_list_t clkdom_list = TAILQ_HEAD_INITIALIZER(clkdom_list); + +/* + * Each clock node is linked on a system-wide list and can be searched by name. + */ +static clknode_list_t clknode_list = TAILQ_HEAD_INITIALIZER(clknode_list); + +/* + * Locking - we use three levels of locking: + * - First, the topology lock is taken. This one protects all lists. + * - The second level is the per-clknode lock. It protects clknode data. + * - The third level is outside of this file; it protects clock device registers. + * The first two levels use sleepable locks; a clock device can use a mutex or + * an sx lock. + */ +static struct sx clk_topo_lock; +SX_SYSINIT(clock_topology, &clk_topo_lock, "Clock topology lock"); + +#define CLK_TOPO_SLOCK() sx_slock(&clk_topo_lock) +#define CLK_TOPO_XLOCK() sx_xlock(&clk_topo_lock) +#define CLK_TOPO_UNLOCK() sx_unlock(&clk_topo_lock) +#define CLK_TOPO_ASSERT() sx_assert(&clk_topo_lock, SA_LOCKED) +#define CLK_TOPO_XASSERT() sx_assert(&clk_topo_lock, SA_XLOCKED) + +#define CLKNODE_SLOCK(_sc) sx_slock(&((_sc)->lock)) +#define CLKNODE_XLOCK(_sc) sx_xlock(&((_sc)->lock)) +#define CLKNODE_UNLOCK(_sc) sx_unlock(&((_sc)->lock)) + +static void clknode_adjust_parent(struct clknode *clknode, int idx); + +/* + * Default clock methods for base class. + */ +static int +clknode_method_init(struct clknode *clknode, device_t dev) +{ + + return (0); +} + +static int +clknode_method_recalc_freq(struct clknode *clknode, uint64_t *freq) +{ + + return (0); +} + +static int +clknode_method_set_freq(struct clknode *clknode, uint64_t fin, uint64_t *fout, + int flags, int *stop) +{ + + *stop = 0; + return (0); +} + +static int +clknode_method_set_gate(struct clknode *clk, bool enable) +{ + + return (0); +} + +static int +clknode_method_set_mux(struct clknode *clk, int idx) +{ + + return (0); +} + +/* + * Internal functions. + */ + +/* + * Duplicate an array of parent names. + * + * Compute total size and allocate a single block which holds both the array of + * pointers to strings and the copied strings themselves. Returns a pointer to + * the start of the block where the array of copied string pointers lives. + * + * XXX Revisit this, no need for the DECONST stuff.
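+ * + * Memory layout: the block starts with the num pointer slots, followed immediately by the NUL-terminated string copies they point to.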
+ */ +static const char ** +strdup_list(const char **names, int num) +{ + size_t len, slen; + const char **outptr, *ptr; + int i; + + len = sizeof(char *) * num; + for (i = 0; i < num; i++) { + if (names[i] == NULL) + continue; + slen = strlen(names[i]); + if (slen == 0) + panic("Clock parent names array has an empty string"); + len += slen + 1; + } + outptr = malloc(len, M_CLOCK, M_WAITOK | M_ZERO); + ptr = (char *)(outptr + num); + for (i = 0; i < num; i++) { + if (names[i] == NULL) + continue; + outptr[i] = ptr; + slen = strlen(names[i]) + 1; + bcopy(names[i], __DECONST(void *, outptr[i]), slen); + ptr += slen; + } + return (outptr); +} + +/* + * Recompute the cached frequency for this node and all its children. + */ +static int +clknode_refresh_cache(struct clknode *clknode, uint64_t freq) +{ + int rv; + struct clknode *entry; + + CLK_TOPO_XASSERT(); + + /* Compute generated frequency. */ + rv = CLKNODE_RECALC_FREQ(clknode, &freq); + if (rv != 0) { + /* XXX If an error happens while refreshing children + * this leaves the world in a partially-updated state. + * Panic for now. + */ + panic("clknode_refresh_cache failed for '%s'\n", + clknode->name); + return (rv); + } + /* Refresh cache for this node. */ + clknode->freq = freq; + + /* Refresh cache for all children. */ + TAILQ_FOREACH(entry, &(clknode->children), sibling_link) { + rv = clknode_refresh_cache(entry, freq); + if (rv != 0) + return (rv); + } + return (0); +} + +/* + * Public interface. + */ + +struct clknode * +clknode_find_by_name(const char *name) +{ + struct clknode *entry; + + CLK_TOPO_ASSERT(); + + TAILQ_FOREACH(entry, &clknode_list, clklist_link) { + if (strcmp(entry->name, name) == 0) + return (entry); + } + return (NULL); +} + +struct clknode * +clknode_find_by_id(struct clkdom *clkdom, intptr_t id) +{ + struct clknode *entry; + + CLK_TOPO_ASSERT(); + + TAILQ_FOREACH(entry, &clkdom->clknode_list, clkdom_link) { + if (entry->id == id) + return (entry); + } + + return (NULL); +} + +/* -------------------------------------------------------------------------- */ +/* + * Clock domain functions + */ + +/* Find the clock domain associated with a device in the global list. */ +struct clkdom * +clkdom_get_by_dev(const device_t dev) +{ + struct clkdom *entry; + + CLK_TOPO_ASSERT(); + + TAILQ_FOREACH(entry, &clkdom_list, link) { + if (entry->dev == dev) + return (entry); + } + return (NULL); +} + + +#ifdef FDT +/* Default DT mapper. */ +static int +clknode_default_ofw_map(struct clkdom *clkdom, uint32_t ncells, + phandle_t *cells, struct clknode **clk) +{ + + CLK_TOPO_ASSERT(); + + if (ncells == 0) + *clk = clknode_find_by_id(clkdom, 1); + else if (ncells == 1) + *clk = clknode_find_by_id(clkdom, cells[0]); + else + return (ERANGE); + + if (*clk == NULL) + return (ENXIO); + return (0); +} +#endif + +/* + * Create a clock domain. + */ +struct clkdom * +clkdom_create(device_t dev) +{ + struct clkdom *clkdom; + + clkdom = malloc(sizeof(struct clkdom), M_CLOCK, M_WAITOK | M_ZERO); + clkdom->dev = dev; + TAILQ_INIT(&clkdom->clknode_list); +#ifdef FDT + clkdom->ofw_mapper = clknode_default_ofw_map; +#endif + + return (clkdom); +} + +void +clkdom_unlock(struct clkdom *clkdom) +{ + + CLK_TOPO_UNLOCK(); +} + +void +clkdom_xlock(struct clkdom *clkdom) +{ + + CLK_TOPO_XLOCK(); +} + +/* + * Finalize initialization of clock domain. The topology lock is taken and + * released internally. + * + * XXX Revisit failure handling.
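+ * + * On success the domain becomes globally visible: each node is linked onto the system-wide clock list and every parent name is resolved to its clknode pointer.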
+ */ +int +clkdom_finit(struct clkdom *clkdom) +{ + struct clknode *clknode; + int i, rv; +#ifdef FDT + phandle_t node; + + + if ((node = ofw_bus_get_node(clkdom->dev)) == -1) { + device_printf(clkdom->dev, + "%s called on not ofw based device\n", __func__); + return (ENXIO); + } +#endif + rv = 0; + + /* Make clock domain globally visible. */ + CLK_TOPO_XLOCK(); + TAILQ_INSERT_TAIL(&clkdom_list, clkdom, link); +#ifdef FDT + OF_device_register_xref(OF_xref_from_node(node), clkdom->dev); +#endif + + /* Register all clock names into global list. */ + TAILQ_FOREACH(clknode, &clkdom->clknode_list, clkdom_link) { + TAILQ_INSERT_TAIL(&clknode_list, clknode, clklist_link); + } + /* + * At this point all domain nodes must be registered and all + * parents must be valid. + */ + TAILQ_FOREACH(clknode, &clkdom->clknode_list, clkdom_link) { + if (clknode->parent_cnt == 0) + continue; + for (i = 0; i < clknode->parent_cnt; i++) { + if (clknode->parents[i] != NULL) + continue; + if (clknode->parent_names[i] == NULL) + continue; + clknode->parents[i] = clknode_find_by_name( + clknode->parent_names[i]); + if (clknode->parents[i] == NULL) { + device_printf(clkdom->dev, + "Clock %s has unknown parent: %s\n", + clknode->name, clknode->parent_names[i]); + rv = ENODEV; + } + } + + /* If parent index is not set yet... */ + if (clknode->parent_idx == CLKNODE_IDX_NONE) { + device_printf(clkdom->dev, + "Clock %s has no parent idx set\n", + clknode->name); + rv = ENXIO; + continue; + } + if (clknode->parents[clknode->parent_idx] == NULL) { + device_printf(clkdom->dev, + "Clock %s has unknown parent (idx %d): %s\n", + clknode->name, clknode->parent_idx, + clknode->parent_names[clknode->parent_idx]); + rv = ENXIO; + continue; + } + clknode_adjust_parent(clknode, clknode->parent_idx); + } + CLK_TOPO_UNLOCK(); + return (rv); +} + +/* Dump clock domain. */ +void +clkdom_dump(struct clkdom * clkdom) +{ + struct clknode *clknode; + int rv; + uint64_t freq; + + CLK_TOPO_SLOCK(); + TAILQ_FOREACH(clknode, &clkdom->clknode_list, clkdom_link) { + rv = clknode_get_freq(clknode, &freq); + printf("Clock: %s, parent: %s(%d), freq: %llu\n", clknode->name, + clknode->parent == NULL ? "(NULL)" : clknode->parent->name, + clknode->parent_idx, + ((rv == 0) ? freq : rv)); + } + CLK_TOPO_UNLOCK(); +} + +/* + * Create and initialize clock object, but do not register it. + */ +struct clknode * +clknode_create(struct clkdom * clkdom, clknode_class_t clknode_class, + const struct clknode_init_def *def) +{ + struct clknode *clknode; + + KASSERT(def->name != NULL, ("clock name is NULL")); + KASSERT(def->name[0] != '\0', ("clock name is empty")); +#ifdef INVARIANTS + CLK_TOPO_SLOCK(); + if (clknode_find_by_name(def->name) != NULL) + panic("Duplicated clock registration: %s\n", def->name); + CLK_TOPO_UNLOCK(); +#endif + + /* Create object and initialize it. */ + clknode = malloc(sizeof(struct clknode), M_CLOCK, M_WAITOK | M_ZERO); + kobj_init((kobj_t)clknode, (kobj_class_t)clknode_class); + sx_init(&clknode->lock, "Clocknode lock"); + + /* Allocate softc if required. */ + if (clknode_class->size > 0) { + clknode->softc = malloc(clknode_class->size, + M_CLOCK, M_WAITOK | M_ZERO); + } + + /* Prepare array for ptrs to parent clocks. */ + clknode->parents = malloc(sizeof(struct clknode *) * def->parent_cnt, + M_CLOCK, M_WAITOK | M_ZERO); + + /* Copy all strings unless they're flagged as static.
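+ * With CLK_NODE_STATIC_STRINGS the caller guarantees that the supplied name and parent_names storage outlives the node, so the pointers are referenced directly instead of being duplicated.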
*/ + if (def->flags & CLK_NODE_STATIC_STRINGS) { + clknode->name = def->name; + clknode->parent_names = def->parent_names; + } else { + clknode->name = strdup(def->name, M_CLOCK); + clknode->parent_names = + strdup_list(def->parent_names, def->parent_cnt); + } + + /* Rest of init. */ + clknode->id = def->id; + clknode->clkdom = clkdom; + clknode->flags = def->flags; + clknode->parent_cnt = def->parent_cnt; + clknode->parent = NULL; + clknode->parent_idx = CLKNODE_IDX_NONE; + TAILQ_INIT(&clknode->children); + + return (clknode); +} + +/* + * Register clock object into clock domain hierarchy. + */ +struct clknode * +clknode_register(struct clkdom * clkdom, struct clknode *clknode) +{ + int rv; + + rv = CLKNODE_INIT(clknode, clknode_get_device(clknode)); + if (rv != 0) { + printf(" CLKNODE_INIT failed: %d\n", rv); + return (NULL); + } + + TAILQ_INSERT_TAIL(&clkdom->clknode_list, clknode, clkdom_link); + + return (clknode); +} + +/* + * Clock providers interface. + */ + +/* + * Reparent clock node. + */ +static void +clknode_adjust_parent(struct clknode *clknode, int idx) +{ + + CLK_TOPO_XASSERT(); + + if (clknode->parent_cnt == 0) + return; + if ((idx == CLKNODE_IDX_NONE) || (idx >= clknode->parent_cnt)) + panic("Invalid clock parent index\n"); + + if (clknode->parents[idx] == NULL) + panic("%s: Attempt to set invalid parent %d for clock %s", + __func__, idx, clknode->name); + + /* Remove me from old children list. */ + if (clknode->parent != NULL) { + TAILQ_REMOVE(&clknode->parent->children, clknode, sibling_link); + } + + /* Insert into children list of new parent. */ + clknode->parent_idx = idx; + clknode->parent = clknode->parents[idx]; + TAILQ_INSERT_TAIL(&clknode->parent->children, clknode, sibling_link); +} + +/* + * Set parent index - init function. + */ +void +clknode_init_parent_idx(struct clknode *clknode, int idx) +{ + + if (clknode->parent_cnt == 0) { + clknode->parent_idx = CLKNODE_IDX_NONE; + clknode->parent = NULL; + return; + } + if ((idx == CLKNODE_IDX_NONE) || + (idx >= clknode->parent_cnt) || + (clknode->parent_names[idx] == NULL)) + panic("%s: Invalid clock parent index: %d\n", __func__, idx); + + clknode->parent_idx = idx; +} + +int +clknode_set_parent_by_idx(struct clknode *clknode, int idx) +{ + int rv; + uint64_t freq; + int oldidx; + + /* We have exclusive topology lock, node lock is not needed. */ + CLK_TOPO_XASSERT(); + + if (clknode->parent_cnt == 0) + return (0); + + if (clknode->parent_idx == idx) + return (0); + + oldidx = clknode->parent_idx; + clknode_adjust_parent(clknode, idx); + rv = CLKNODE_SET_MUX(clknode, idx); + if (rv != 0) { + clknode_adjust_parent(clknode, oldidx); + return (rv); + } + rv = clknode_get_freq(clknode->parent, &freq); + if (rv != 0) + return (rv); + rv = clknode_refresh_cache(clknode, freq); + return (rv); +} + +int +clknode_set_parent_by_name(struct clknode *clknode, const char *name) +{ + int rv; + uint64_t freq; + int oldidx, idx; + + /* We have exclusive topology lock, node lock is not needed. */ + CLK_TOPO_XASSERT(); + + if (clknode->parent_cnt == 0) + return (0); + + /* + * If this node doesn't have a mux, pass the request through to the + * parent. This feature is used in clock domain initialization and + * allows us to set the clock source and target frequency on the tail + * node of the clock chain.
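+ * For example, a parent change requested on a leaf node climbs the chain until it reaches the nearest node that actually has a mux.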
+ */ + if (clknode->parent_cnt == 1) { + rv = clknode_set_parent_by_name(clknode->parent, name); + return (rv); + } + + for (idx = 0; idx < clknode->parent_cnt; idx++) { + if (clknode->parent_names[idx] == NULL) + continue; + if (strcmp(clknode->parent_names[idx], name) == 0) + break; + } + if (idx >= clknode->parent_cnt) { + return (ENXIO); + } + if (clknode->parent_idx == idx) + return (0); + + oldidx = clknode->parent_idx; + clknode_adjust_parent(clknode, idx); + rv = CLKNODE_SET_MUX(clknode, idx); + if (rv != 0) { + clknode_adjust_parent(clknode, oldidx); + CLKNODE_UNLOCK(clknode); + return (rv); + } + rv = clknode_get_freq(clknode->parent, &freq); + if (rv != 0) + return (rv); + rv = clknode_refresh_cache(clknode, freq); + return (rv); +} + +struct clknode * +clknode_get_parent(struct clknode *clknode) +{ + + return (clknode->parent); +} + +const char * +clknode_get_name(struct clknode *clknode) +{ + + return (clknode->name); +} + +const char ** +clknode_get_parent_names(struct clknode *clknode) +{ + + return (clknode->parent_names); +} + +int +clknode_get_parents_num(struct clknode *clknode) +{ + + return (clknode->parent_cnt); +} + +int +clknode_get_parent_idx(struct clknode *clknode) +{ + + return (clknode->parent_idx); +} + +int +clknode_get_flags(struct clknode *clknode) +{ + + return (clknode->flags); +} + + +void * +clknode_get_softc(struct clknode *clknode) +{ + + return (clknode->softc); +} + +device_t +clknode_get_device(struct clknode *clknode) +{ + + return (clknode->clkdom->dev); +} + +#ifdef FDT +void +clkdom_set_ofw_mapper(struct clkdom * clkdom, clknode_ofw_mapper_func *map) +{ + + clkdom->ofw_mapper = map; +} +#endif + +/* + * Real consumers executive + */ +int +clknode_get_freq(struct clknode *clknode, uint64_t *freq) +{ + int rv; + + CLK_TOPO_ASSERT(); + + /* Use cached value, if it exists. */ + *freq = clknode->freq; + if (*freq != 0) + return (0); + + /* Get frequency from parent, if the clock has a parent. */ + if (clknode->parent_cnt > 0) { + rv = clknode_get_freq(clknode->parent, freq); + if (rv != 0) { + return (rv); + } + } + + /* And recalculate my output frequency. */ + CLKNODE_XLOCK(clknode); + rv = CLKNODE_RECALC_FREQ(clknode, freq); + if (rv != 0) { + CLKNODE_UNLOCK(clknode); + printf("Cannot get frequency for clk: %s, error: %d\n", + clknode->name, rv); + return (rv); + } + + /* Save new frequency to cache. */ + clknode->freq = *freq; + CLKNODE_UNLOCK(clknode); + return (0); +} + +int +clknode_set_freq(struct clknode *clknode, uint64_t freq, int flags, + int enablecnt) +{ + int rv, done; + uint64_t parent_freq; + + /* We have exclusive topology lock, node lock is not needed. */ + CLK_TOPO_XASSERT(); + + parent_freq = 0; + + /* + * We can set frequency only if + * clock is disabled + * OR + * clock is glitch free and is enabled by calling consumer only + */ + if ((clknode->enable_cnt > 1) && + ((clknode->enable_cnt > enablecnt) || + !(clknode->flags & CLK_NODE_GLITCH_FREE))) { + return (EBUSY); + } + + /* Get frequency from parent, if the clock has a parent. */ + if (clknode->parent_cnt > 0) { + rv = clknode_get_freq(clknode->parent, &parent_freq); + if (rv != 0) { + return (rv); + } + } + + /* Set frequency for this clock. 
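+ * The node's set_freq method may round the requested frequency; via 'done' it reports whether the change was fully handled at this node, otherwise the unchanged request is forwarded to the parent below.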
*/ + rv = CLKNODE_SET_FREQ(clknode, parent_freq, &freq, flags, &done); + if (rv != 0) { + printf("Cannot set frequency for clk: %s, error: %d\n", + clknode->name, rv); + if ((flags & CLK_SET_DRYRUN) == 0) + clknode_refresh_cache(clknode, parent_freq); + return (rv); + } + + if (done) { + /* Success - invalidate frequency cache for all children. */ + clknode->freq = freq; + if ((flags & CLK_SET_DRYRUN) == 0) + clknode_refresh_cache(clknode, parent_freq); + } else if (clknode->parent != NULL) { + /* Nothing changed, pass request to parent. */ + rv = clknode_set_freq(clknode->parent, freq, flags, enablecnt); + } else { + /* End of chain without action. */ + printf("Cannot set frequency for clk: %s, end of chain\n", + clknode->name); + rv = ENXIO; + } + + return (rv); +} + +int +clknode_enable(struct clknode *clknode) +{ + int rv; + + CLK_TOPO_ASSERT(); + + /* Enable clock for each node in chain, starting from source. */ + if (clknode->parent_cnt > 0) { + rv = clknode_enable(clknode->parent); + if (rv != 0) { + return (rv); + } + } + + /* Handle this node */ + CLKNODE_XLOCK(clknode); + if (clknode->enable_cnt == 0) { + rv = CLKNODE_SET_GATE(clknode, 1); + if (rv != 0) { + CLKNODE_UNLOCK(clknode); + return (rv); + } + } + clknode->enable_cnt++; + CLKNODE_UNLOCK(clknode); + return (0); +} + +int +clknode_disable(struct clknode *clknode) +{ + int rv; + + CLK_TOPO_ASSERT(); + rv = 0; + + CLKNODE_XLOCK(clknode); + /* Disable clock for each node in chain, starting from consumer. */ + if ((clknode->enable_cnt == 1) && + ((clknode->flags & CLK_NODE_CANNOT_STOP) == 0)) { + rv = CLKNODE_SET_GATE(clknode, 0); + if (rv != 0) { + CLKNODE_UNLOCK(clknode); + return (rv); + } + } + clknode->enable_cnt--; + CLKNODE_UNLOCK(clknode); + + if (clknode->parent_cnt > 0) { + rv = clknode_disable(clknode->parent); + } + return (rv); +} + +int +clknode_stop(struct clknode *clknode, int depth) +{ + int rv; + + CLK_TOPO_ASSERT(); + rv = 0; + + CLKNODE_XLOCK(clknode); + /* The first node cannot be enabled. */ + if ((clknode->enable_cnt != 0) && (depth == 0)) { + CLKNODE_UNLOCK(clknode); + return (EBUSY); + } + /* Stop clock for each node in chain, starting from consumer. */ + if ((clknode->enable_cnt == 0) && + ((clknode->flags & CLK_NODE_CANNOT_STOP) == 0)) { + rv = CLKNODE_SET_GATE(clknode, 0); + if (rv != 0) { + CLKNODE_UNLOCK(clknode); + return (rv); + } + } + CLKNODE_UNLOCK(clknode); + + if (clknode->parent_cnt > 0) + rv = clknode_stop(clknode->parent, depth + 1); + return (rv); +} + +/* -------------------------------------------------------------------------- + * + * Clock consumers interface. 
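+ * + * A minimal consumer usage sketch (the clock name "core" is hypothetical; error handling elided): + * + * clk_t clk; + * uint64_t freq; + * + * clk_get_by_ofw_name(dev, "core", &clk); + * clk_enable(clk); + * clk_get_freq(clk, &freq); + * ... + * clk_disable(clk); + * clk_release(clk);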
+ * + */ +/* Helper function for clk_get*() */ +static clk_t +clk_create(struct clknode *clknode, device_t dev) +{ + struct clk *clk; + + CLK_TOPO_ASSERT(); + + clk = malloc(sizeof(struct clk), M_CLOCK, M_WAITOK); + clk->dev = dev; + clk->clknode = clknode; + clk->enable_cnt = 0; + clknode->ref_cnt++; + + return (clk); +} + +int +clk_get_freq(clk_t clk, uint64_t *freq) +{ + int rv; + struct clknode *clknode; + + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + + CLK_TOPO_SLOCK(); + rv = clknode_get_freq(clknode, freq); + CLK_TOPO_UNLOCK(); + return (rv); +} + +int +clk_set_freq(clk_t clk, uint64_t freq, int flags) +{ + int rv; + struct clknode *clknode; + + flags &= CLK_SET_USER_MASK; + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + + CLK_TOPO_XLOCK(); + rv = clknode_set_freq(clknode, freq, flags, clk->enable_cnt); + CLK_TOPO_UNLOCK(); + return (rv); +} + +int +clk_test_freq(clk_t clk, uint64_t freq, int flags) +{ + int rv; + struct clknode *clknode; + + flags &= CLK_SET_USER_MASK; + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + + CLK_TOPO_XLOCK(); + rv = clknode_set_freq(clknode, freq, flags | CLK_SET_DRYRUN, 0); + CLK_TOPO_UNLOCK(); + return (rv); +} + +int +clk_get_parent(clk_t clk, clk_t *parent) +{ + struct clknode *clknode; + struct clknode *parentnode; + + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + + CLK_TOPO_SLOCK(); + parentnode = clknode_get_parent(clknode); + if (parentnode == NULL) { + CLK_TOPO_UNLOCK(); + return (ENODEV); + } + *parent = clk_create(parentnode, clk->dev); + CLK_TOPO_UNLOCK(); + return (0); +} + +int +clk_set_parent_by_clk(clk_t clk, clk_t parent) +{ + int rv; + struct clknode *clknode; + struct clknode *parentnode; + + clknode = clk->clknode; + parentnode = parent->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + KASSERT(parentnode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + CLK_TOPO_XLOCK(); + rv = clknode_set_parent_by_name(clknode, parentnode->name); + CLK_TOPO_UNLOCK(); + return (rv); +} + +int +clk_enable(clk_t clk) +{ + int rv; + struct clknode *clknode; + + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + CLK_TOPO_SLOCK(); + rv = clknode_enable(clknode); + if (rv == 0) + clk->enable_cnt++; + CLK_TOPO_UNLOCK(); + return (rv); +} + +int +clk_disable(clk_t clk) +{ + int rv; + struct clknode *clknode; + + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + KASSERT(clk->enable_cnt > 0, + ("Attempt to disable already disabled clock: %s\n", clknode->name)); + CLK_TOPO_SLOCK(); + rv = clknode_disable(clknode); + if (rv == 0) + clk->enable_cnt--; + CLK_TOPO_UNLOCK(); + return (rv); +} + +int +clk_stop(clk_t clk) +{ + int rv; + struct clknode *clknode; + + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + KASSERT(clk->enable_cnt == 0, + ("Attempt to stop already enabled clock: %s\n", clknode->name)); + + CLK_TOPO_SLOCK(); + rv = clknode_stop(clknode, 0); + CLK_TOPO_UNLOCK(); + return (rv); +} + +int +clk_release(clk_t clk) +{ + struct clknode 
*clknode; + + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + CLK_TOPO_SLOCK(); + while (clk->enable_cnt > 0) { + clknode_disable(clknode); + clk->enable_cnt--; + } + CLKNODE_XLOCK(clknode); + clknode->ref_cnt--; + CLKNODE_UNLOCK(clknode); + CLK_TOPO_UNLOCK(); + + free(clk, M_CLOCK); + return (0); +} + +const char * +clk_get_name(clk_t clk) +{ + const char *name; + struct clknode *clknode; + + clknode = clk->clknode; + KASSERT(clknode->ref_cnt > 0, + ("Attempt to access unreferenced clock: %s\n", clknode->name)); + name = clknode_get_name(clknode); + return (name); +} + +int +clk_get_by_name(device_t dev, const char *name, clk_t *clk) +{ + struct clknode *clknode; + + CLK_TOPO_SLOCK(); + clknode = clknode_find_by_name(name); + if (clknode == NULL) { + CLK_TOPO_UNLOCK(); + return (ENODEV); + } + *clk = clk_create(clknode, dev); + CLK_TOPO_UNLOCK(); + return (0); +} + +int +clk_get_by_id(device_t dev, struct clkdom *clkdom, intptr_t id, clk_t *clk) +{ + struct clknode *clknode; + + CLK_TOPO_SLOCK(); + + clknode = clknode_find_by_id(clkdom, id); + if (clknode == NULL) { + CLK_TOPO_UNLOCK(); + return (ENODEV); + } + *clk = clk_create(clknode, dev); + CLK_TOPO_UNLOCK(); + + return (0); +} + +#ifdef FDT + +int +clk_get_by_ofw_index(device_t dev, int idx, clk_t *clk) +{ + phandle_t cnode, parent, *cells; + device_t clockdev; + int ncells, rv; + struct clkdom *clkdom; + struct clknode *clknode; + + *clk = NULL; + + cnode = ofw_bus_get_node(dev); + if (cnode <= 0) { + device_printf(dev, "%s called on not ofw based device\n", + __func__); + return (ENXIO); + } + + rv = ofw_bus_parse_xref_list_alloc(cnode, "clocks", "#clock-cells", idx, + &parent, &ncells, &cells); + if (rv != 0) { + return (rv); + } + + clockdev = OF_device_from_xref(parent); + if (clockdev == NULL) { + rv = ENODEV; + goto done; + } + + CLK_TOPO_SLOCK(); + clkdom = clkdom_get_by_dev(clockdev); + if (clkdom == NULL){ + CLK_TOPO_UNLOCK(); + rv = ENXIO; + goto done; + } + + rv = clkdom->ofw_mapper(clkdom, ncells, cells, &clknode); + if (rv == 0) { + *clk = clk_create(clknode, dev); + } + CLK_TOPO_UNLOCK(); + +done: + if (cells != NULL) + free(cells, M_OFWPROP); + return (rv); +} + +int +clk_get_by_ofw_name(device_t dev, const char *name, clk_t *clk) +{ + int rv, idx; + phandle_t cnode; + + cnode = ofw_bus_get_node(dev); + if (cnode <= 0) { + device_printf(dev, "%s called on not ofw based device\n", + __func__); + return (ENXIO); + } + rv = ofw_bus_find_string_index(cnode, "clock-names", name, &idx); + if (rv != 0) + return (rv); + return (clk_get_by_ofw_index(dev, idx, clk)); +} +#endif Property changes on: projects/clang380-import/sys/dev/extres/clk/clk.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk.h =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk.h (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk.h (revision 294777) @@ -0,0 +1,136 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef _DEV_EXTRES_CLK_H_ +#define _DEV_EXTRES_CLK_H_ +#include "opt_platform.h" + +#include +#ifdef FDT +#include +#endif +#include "clknode_if.h" + +#define CLKNODE_IDX_NONE -1 /* Not-selected index */ + +/* clknode flags. */ +#define CLK_NODE_STATIC_STRINGS 0x00000001 /* Static name strings */ +#define CLK_NODE_GLITCH_FREE 0x00000002 /* Freq can change w/o stop */ +#define CLK_NODE_CANNOT_STOP 0x00000004 /* Clock cannot be disabled */ + +/* Flags passed to clk_set_freq() and clknode_set_freq(). */ +#define CLK_SET_ROUND_UP 0x00000001 +#define CLK_SET_ROUND_DOWN 0x00000002 +#define CLK_SET_USER_MASK 0x0000FFFF +#define CLK_SET_DRYRUN 0x00010000 + +typedef struct clk *clk_t; + +/* Initialization parameters for clocknode creation. */ +struct clknode_init_def { + const char *name; + intptr_t id; + const char **parent_names; + int parent_cnt; + int flags; +}; + +/* + * Shorthands for constructing method tables. + */ +#define CLKNODEMETHOD KOBJMETHOD +#define CLKNODEMETHOD_END KOBJMETHOD_END +#define clknode_method_t kobj_method_t +#define clknode_class_t kobj_class_t +DECLARE_CLASS(clknode_class); + +/* + * Clock domain functions. + */ +struct clkdom *clkdom_create(device_t dev); +int clkdom_finit(struct clkdom *clkdom); +void clkdom_dump(struct clkdom * clkdom); +void clkdom_unlock(struct clkdom *clkdom); +void clkdom_xlock(struct clkdom *clkdom); + +/* + * Clock providers interface. 
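 *
 * A minimal provider sketch (illustrative only, not part of this header;
 * the "osc" name, id and 24 MHz rate are made up, error checking is
 * omitted, "dev" stands for a driver's device_t in its attach method, and
 * a NULL device mutex is assumed acceptable for a read-only fixed clock;
 * see clk_fixed.h):
 *
 *	struct clk_fixed_def def = {
 *		.clkdef.name = "osc",
 *		.clkdef.id = 1,
 *		.freq = 24000000,
 *	};
 *	struct clkdom *clkdom;
 *
 *	clkdom = clkdom_create(dev);
 *	clknode_fixed_register(clkdom, &def, NULL);
 *	clkdom_finit(clkdom);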
+ */ +struct clkdom *clkdom_get_by_dev(const device_t dev); + +struct clknode *clknode_create(struct clkdom *clkdom, + clknode_class_t clknode_class, const struct clknode_init_def *def); +struct clknode *clknode_register(struct clkdom *cldom, struct clknode *clk); +#ifdef FDT +typedef int clknode_ofw_mapper_func(struct clkdom *clkdom, uint32_t ncells, + phandle_t *cells, struct clknode **clk); +void clkdom_set_ofw_mapper(struct clkdom *clkdom, clknode_ofw_mapper_func *cmp); +#endif + +void clknode_init_parent_idx(struct clknode *clknode, int idx); +int clknode_set_parent_by_idx(struct clknode *clk, int idx); +int clknode_set_parent_by_name(struct clknode *clk, const char *name); +const char *clknode_get_name(struct clknode *clk); +const char **clknode_get_parent_names(struct clknode *clk); +int clknode_get_parents_num(struct clknode *clk); +int clknode_get_parent_idx(struct clknode *clk); +struct clknode *clknode_get_parent(struct clknode *clk); +int clknode_get_flags(struct clknode *clk); +void *clknode_get_softc(struct clknode *clk); +device_t clknode_get_device(struct clknode *clk); +struct clknode *clknode_find_by_name(const char *name); +struct clknode *clknode_find_by_id(struct clkdom *clkdom, intptr_t id); +int clknode_get_freq(struct clknode *clknode, uint64_t *freq); +int clknode_set_freq(struct clknode *clknode, uint64_t freq, int flags, + int enablecnt); +int clknode_enable(struct clknode *clknode); +int clknode_disable(struct clknode *clknode); +int clknode_stop(struct clknode *clknode, int depth); + +/* + * Clock consumers interface. + */ +int clk_get_by_name(device_t dev, const char *name, clk_t *clk); +int clk_get_by_id(device_t dev, struct clkdom *clkdom, intptr_t id, clk_t *clk); +int clk_release(clk_t clk); +int clk_get_freq(clk_t clk, uint64_t *freq); +int clk_set_freq(clk_t clk, uint64_t freq, int flags); +int clk_test_freq(clk_t clk, uint64_t freq, int flags); +int clk_enable(clk_t clk); +int clk_disable(clk_t clk); +int clk_stop(clk_t clk); +int clk_get_parent(clk_t clk, clk_t *parent); +int clk_set_parent_by_clk(clk_t clk, clk_t parent); +const char *clk_get_name(clk_t clk); + +#ifdef FDT +int clk_get_by_ofw_index(device_t dev, int idx, clk_t *clk); +int clk_get_by_ofw_name(device_t dev, const char *name, clk_t *clk); +#endif + +#endif /* _DEV_EXTRES_CLK_H_ */ Property changes on: projects/clang380-import/sys/dev/extres/clk/clk.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk_div.c =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk_div.c (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk_div.c (revision 294777) @@ -0,0 +1,209 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include +__FBSDID("$FreeBSD$"); + + +#include +#include +#include +#include +#include + +#include + +#include + +#include "clkdev_if.h" + +#define WR4(_clk, off, val) \ + CLKDEV_WRITE_4(clknode_get_device(_clk), off, val) +#define RD4(_clk, off, val) \ + CLKDEV_READ_4(clknode_get_device(_clk), off, val) +#define MD4(_clk, off, clr, set ) \ + CLKDEV_MODIFY_4(clknode_get_device(_clk), off, clr, set) + +static int clknode_div_init(struct clknode *clk, device_t dev); +static int clknode_div_recalc(struct clknode *clk, uint64_t *req); +static int clknode_div_set_freq(struct clknode *clknode, uint64_t fin, + uint64_t *fout, int flag, int *stop); + +struct clknode_div_sc { + struct mtx *mtx; + struct resource *mem_res; + uint32_t offset; + uint32_t i_shift; + uint32_t i_mask; + uint32_t i_width; + uint32_t f_shift; + uint32_t f_mask; + uint32_t f_width; + int div_flags; + uint32_t divider; /* in natural form */ +}; + +static clknode_method_t clknode_div_methods[] = { + /* Device interface */ + CLKNODEMETHOD(clknode_init, clknode_div_init), + CLKNODEMETHOD(clknode_recalc_freq, clknode_div_recalc), + CLKNODEMETHOD(clknode_set_freq, clknode_div_set_freq), + CLKNODEMETHOD_END +}; +DEFINE_CLASS_1(clknode_div, clknode_div_class, clknode_div_methods, + sizeof(struct clknode_div_sc), clknode_class); + +static int +clknode_div_init(struct clknode *clk, device_t dev) +{ + uint32_t reg; + struct clknode_div_sc *sc; + uint32_t i_div, f_div; + int rv; + + sc = clknode_get_softc(clk); + + rv = RD4(clk, sc->offset, ®); + if (rv != 0) + return (rv); + + i_div = (reg >> sc->i_shift) & sc->i_mask; + if (!(sc->div_flags & CLK_DIV_ZERO_BASED)) + i_div++; + f_div = (reg >> sc->f_shift) & sc->f_mask; + sc->divider = i_div << sc->f_width | f_div; + clknode_init_parent_idx(clk, 0); + return(0); +} + +static int +clknode_div_recalc(struct clknode *clk, uint64_t *freq) +{ + struct clknode_div_sc *sc; + + sc = clknode_get_softc(clk); + if (sc->divider == 0) { + printf("%s: %s divider is zero!\n", clknode_get_name(clk), + __func__); + *freq = 0; + return(EINVAL); + } + *freq = (*freq << sc->f_width) / sc->divider; + return (0); +} + +static int +clknode_div_set_freq(struct clknode *clk, uint64_t fin, uint64_t *fout, + int flags, int *stop) +{ + struct clknode_div_sc *sc; + uint64_t divider, _fin, _fout; + uint32_t reg, i_div, f_div, hw_i_div; + int rv; + + sc = clknode_get_softc(clk); + + /* For fractional divider. */ + _fin = fin << sc->f_width; + divider = (_fin + *fout / 2) / *fout; + _fout = _fin / divider; + + /* Rounding. 
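+	 * A worked example (numbers purely illustrative): with fin = 48 MHz
+	 * and f_width = 4, _fin = fin << 4 = 768000000; a request of
+	 * *fout = 3500000 gives divider = 219, i.e. an integer part of
+	 * 13 (219 >> 4) and a fraction of 11/16 (219 & 0xf), so
+	 * _fout = 768000000 / 219 = 3506849.  The CLK_SET_ROUND_UP and
+	 * CLK_SET_ROUND_DOWN checks below then adjust the divider by one
+	 * when this first guess landed on the wrong side of the request.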
*/ + if ((flags & CLK_SET_ROUND_UP) && (*fout < _fout)) + divider--; + else if ((flags & CLK_SET_ROUND_DOWN) && (*fout > _fout)) + divider++; + + /* Break divider into integer and fractional parts. */ + i_div = divider >> sc->f_width; + f_div = divider & sc->f_mask; + + if (i_div == 0) { + printf("%s: %s integer divider is zero!\n", + clknode_get_name(clk), __func__); + return(EINVAL); + } + + hw_i_div = i_div; + if (!(sc->div_flags & CLK_DIV_ZERO_BASED)) + hw_i_div--; + + *stop = 1; + if (hw_i_div > sc->i_mask) { + /* XXX Or only return error? */ + printf("%s: %s integer divider is too big: %u\n", + clknode_get_name(clk), __func__, hw_i_div); + hw_i_div = sc->i_mask; + *stop = 0; + } + + i_div = hw_i_div; + if (!(sc->div_flags & CLK_DIV_ZERO_BASED)) + i_div++; + divider = i_div << sc->f_width | f_div; + + if ((flags & CLK_SET_DRYRUN) == 0) { + if ((*stop != 0) && + ((flags & (CLK_SET_ROUND_UP | CLK_SET_ROUND_DOWN)) == 0) && + (*fout != (_fin / divider))) + return (ERANGE); + + rv = MD4(clk, sc->offset, + (sc->i_mask << sc->i_shift) | (sc->f_mask << sc->f_shift), + (i_div << sc->i_shift) | (f_div << sc->f_shift)); + if (rv != 0) + return (rv); + RD4(clk, sc->offset, ®); + sc->divider = divider; + } + + *fout = _fin / divider; + return (0); +} + +int +clknode_div_register(struct clkdom *clkdom, struct clk_div_def *clkdef) +{ + struct clknode *clk; + struct clknode_div_sc *sc; + + clk = clknode_create(clkdom, &clknode_div_class, &clkdef->clkdef); + if (clk == NULL) + return (1); + + sc = clknode_get_softc(clk); + sc->offset = clkdef->offset; + sc->i_shift = clkdef->i_shift; + sc->i_width = clkdef->i_width; + sc->i_mask = (1 << clkdef->i_width) - 1; + sc->f_shift = clkdef->f_shift; + sc->f_width = clkdef->f_width; + sc->f_mask = (1 << clkdef->f_width) - 1; + sc->div_flags = clkdef->div_flags; + + clknode_register(clkdom, clk); + return (0); +} Property changes on: projects/clang380-import/sys/dev/extres/clk/clk_div.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk_div.h =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk_div.h (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk_div.h (revision 294777) @@ -0,0 +1,48 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef _DEV_EXTRES_CLK_DIV_H_ +#define _DEV_EXTRES_CLK_DIV_H_ + +#include + +#define CLK_DIV_ZERO_BASED 0x0001 /* Zero based divider. */ + +struct clk_div_def { + struct clknode_init_def clkdef; + uint32_t offset; /* Divider register offset */ + uint32_t i_shift; /* Pos of div bits in reg */ + uint32_t i_width; /* Width of div bit field */ + uint32_t f_shift; /* Fractional divide bits, */ + uint32_t f_width; /* set to 0 for int divider */ + int div_flags; /* Divider-specific flags */ +}; + +int clknode_div_register(struct clkdom *clkdom, struct clk_div_def *clkdef); + +#endif /*_DEV_EXTRES_CLK_DIV_H_*/ Property changes on: projects/clang380-import/sys/dev/extres/clk/clk_div.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk_fixed.c =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk_fixed.c (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk_fixed.c (revision 294777) @@ -0,0 +1,114 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include +__FBSDID("$FreeBSD$"); + + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#define DEVICE_LOCK(_sc) mtx_lock((_sc)->mtx) +#define DEVICE_UNLOCK(_sc) mtx_unlock((_sc)->mtx) + +static int clknode_fixed_init(struct clknode *clk, device_t dev); +static int clknode_fixed_recalc(struct clknode *clk, uint64_t *freq); +struct clknode_fixed_sc { + struct mtx *mtx; + int fixed_flags; + uint64_t freq; + uint32_t mult; + uint32_t div; +}; + +static clknode_method_t clknode_fixed_methods[] = { + /* Device interface */ + CLKNODEMETHOD(clknode_init, clknode_fixed_init), + CLKNODEMETHOD(clknode_recalc_freq, clknode_fixed_recalc), + CLKNODEMETHOD_END +}; +DEFINE_CLASS_1(clknode_fixed, clknode_fixed_class, clknode_fixed_methods, + sizeof(struct clknode_fixed_sc), clknode_class); + +static int +clknode_fixed_init(struct clknode *clk, device_t dev) +{ + struct clknode_fixed_sc *sc; + + sc = clknode_get_softc(clk); + if (sc->freq == 0) + clknode_init_parent_idx(clk, 0); + return(0); +} +static int +clknode_fixed_recalc(struct clknode *clk, uint64_t *freq) +{ + struct clknode_fixed_sc *sc; + + sc = clknode_get_softc(clk); + if (sc->freq != 0) + *freq = sc->freq; + else if ((sc->mult != 0) && (sc->div != 0)) + *freq = (*freq / sc->div) * sc->mult; + else + *freq = 0; + return (0); +} + +int +clknode_fixed_register(struct clkdom *clkdom, struct clk_fixed_def *clkdef, + struct mtx *dev_mtx) +{ + struct clknode *clk; + struct clknode_fixed_sc *sc; + + if ((clkdef->freq == 0) && (clkdef->clkdef.parent_cnt == 0)) + panic("fixed clk: Frequency is not defined for clock source"); + clk = clknode_create(clkdom, &clknode_fixed_class, &clkdef->clkdef); + if (clk == NULL) + return (1); + + sc = clknode_get_softc(clk); + sc->mtx = dev_mtx; + sc->fixed_flags = clkdef->fixed_flags; + sc->freq = clkdef->freq; + sc->mult = clkdef->mult; + sc->div = clkdef->div; + + clknode_register(clkdom, clk); + return (0); +} Property changes on: projects/clang380-import/sys/dev/extres/clk/clk_fixed.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk_fixed.h =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk_fixed.h (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk_fixed.h (revision 294777) @@ -0,0 +1,53 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef _DEV_EXTRES_CLK_FIXED_H_ +#define _DEV_EXTRES_CLK_FIXED_H_ + +#include + +/* + * A fixed clock can represent several different real-world objects, including + * an oscillator with a fixed output frequency, a fixed divider (multiplier and + * divisor must both be > 0), or a phase-fractional divider within a PLL + * (however the code currently divides first, then multiplies, potentially + * leading to different roundoff errors than the hardware PLL). + */ + +struct clk_fixed_def { + struct clknode_init_def clkdef; + uint64_t freq; + uint32_t mult; + uint32_t div; + int fixed_flags; +}; + +int clknode_fixed_register(struct clkdom *clkdom, struct clk_fixed_def *clkdef, + struct mtx *dev_mtx); + +#endif /*_DEV_EXTRES_CLK_FIXED_H_*/ Property changes on: projects/clang380-import/sys/dev/extres/clk/clk_fixed.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk_gate.c =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk_gate.c (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk_gate.c (revision 294777) @@ -0,0 +1,126 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include +__FBSDID("$FreeBSD$"); + + +#include +#include +#include +#include +#include + +#include + +#include + +#include "clkdev_if.h" + +#define WR4(_clk, off, val) \ + CLKDEV_WRITE_4(clknode_get_device(_clk), off, val) +#define RD4(_clk, off, val) \ + CLKDEV_READ_4(clknode_get_device(_clk), off, val) +#define MD4(_clk, off, clr, set ) \ + CLKDEV_MODIFY_4(clknode_get_device(_clk), off, clr, set) + + +static int clknode_gate_init(struct clknode *clk, device_t dev); +static int clknode_gate_set_gate(struct clknode *clk, bool enable); +struct clknode_gate_sc { + uint32_t offset; + uint32_t shift; + uint32_t mask; + uint32_t on_value; + uint32_t off_value; + int gate_flags; + bool ungated; +}; + +static clknode_method_t clknode_gate_methods[] = { + /* Device interface */ + CLKNODEMETHOD(clknode_init, clknode_gate_init), + CLKNODEMETHOD(clknode_set_gate, clknode_gate_set_gate), + CLKNODEMETHOD_END +}; +DEFINE_CLASS_1(clknode_gate, clknode_gate_class, clknode_gate_methods, + sizeof(struct clknode_gate_sc), clknode_class); + +static int +clknode_gate_init(struct clknode *clk, device_t dev) +{ + uint32_t reg; + struct clknode_gate_sc *sc; + int rv; + + sc = clknode_get_softc(clk); + rv = RD4(clk, sc->offset, ®); + if (rv != 0) + return (rv); + reg = (reg >> sc->shift) & sc->mask; + sc->ungated = reg == sc->on_value ? 1 : 0; + clknode_init_parent_idx(clk, 0); + return(0); +} + +static int +clknode_gate_set_gate(struct clknode *clk, bool enable) +{ + uint32_t reg; + struct clknode_gate_sc *sc; + int rv; + + sc = clknode_get_softc(clk); + sc->ungated = enable; + rv = MD4(clk, sc->offset, sc->mask << sc->shift, + (sc->ungated ? sc->on_value : sc->off_value) << sc->shift); + if (rv != 0) + return (rv); + RD4(clk, sc->offset, ®); + return(0); +} + +int +clknode_gate_register(struct clkdom *clkdom, struct clk_gate_def *clkdef) +{ + struct clknode *clk; + struct clknode_gate_sc *sc; + + clk = clknode_create(clkdom, &clknode_gate_class, &clkdef->clkdef); + if (clk == NULL) + return (1); + + sc = clknode_get_softc(clk); + sc->offset = clkdef->offset; + sc->shift = clkdef->shift; + sc->mask = clkdef->mask; + sc->on_value = clkdef->on_value; + sc->off_value = clkdef->off_value; + sc->gate_flags = clkdef->gate_flags; + + clknode_register(clkdom, clk); + return (0); +} Property changes on: projects/clang380-import/sys/dev/extres/clk/clk_gate.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk_gate.h =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk_gate.h (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk_gate.h (revision 294777) @@ -0,0 +1,46 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef _DEV_EXTRES_CLK_GATE_H_ +#define _DEV_EXTRES_CLK_GATE_H_ + +#include + +struct clk_gate_def { + struct clknode_init_def clkdef; + uint32_t offset; + uint32_t shift; + uint32_t mask; + uint32_t on_value; + uint32_t off_value; + int gate_flags; +}; + +int clknode_gate_register(struct clkdom *clkdom, struct clk_gate_def *clkdef); + +#endif /* _DEV_EXTRES_CLK_GATE_H_ */ Property changes on: projects/clang380-import/sys/dev/extres/clk/clk_gate.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk_mux.c =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk_mux.c (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk_mux.c (revision 294777) @@ -0,0 +1,122 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#include +__FBSDID("$FreeBSD$"); + + +#include +#include +#include +#include +#include + +#include + +#include + +#include "clkdev_if.h" + +#define WR4(_clk, off, val) \ + CLKDEV_WRITE_4(clknode_get_device(_clk), off, val) +#define RD4(_clk, off, val) \ + CLKDEV_READ_4(clknode_get_device(_clk), off, val) +#define MD4(_clk, off, clr, set ) \ + CLKDEV_MODIFY_4(clknode_get_device(_clk), off, clr, set) + +static int clknode_mux_init(struct clknode *clk, device_t dev); +static int clknode_mux_set_mux(struct clknode *clk, int idx); + +struct clknode_mux_sc { + uint32_t offset; + uint32_t shift; + uint32_t mask; + int mux_flags; +}; + +static clknode_method_t clknode_mux_methods[] = { + /* Device interface */ + CLKNODEMETHOD(clknode_init, clknode_mux_init), + CLKNODEMETHOD(clknode_set_mux, clknode_mux_set_mux), + CLKNODEMETHOD_END +}; +DEFINE_CLASS_1(clknode_mux, clknode_mux_class, clknode_mux_methods, + sizeof(struct clknode_mux_sc), clknode_class); + + +static int +clknode_mux_init(struct clknode *clk, device_t dev) +{ + uint32_t reg; + struct clknode_mux_sc *sc; + int rv; + + sc = clknode_get_softc(clk); + + rv = RD4(clk, sc->offset, ®); + if (rv != 0) + return (rv); + reg = (reg >> sc->shift) & sc->mask; + clknode_init_parent_idx(clk, reg); + return(0); +} + +static int +clknode_mux_set_mux(struct clknode *clk, int idx) +{ + uint32_t reg; + struct clknode_mux_sc *sc; + int rv; + + sc = clknode_get_softc(clk); + + rv = MD4(clk, sc->offset, sc->mask << sc->shift, + (idx & sc->mask) << sc->shift); + if (rv != 0) + return (rv); + RD4(clk, sc->offset, ®); + return(0); +} + +int +clknode_mux_register(struct clkdom *clkdom, struct clk_mux_def *clkdef) +{ + struct clknode *clk; + struct clknode_mux_sc *sc; + + clk = clknode_create(clkdom, &clknode_mux_class, &clkdef->clkdef); + if (clk == NULL) + return (1); + + sc = clknode_get_softc(clk); + sc->offset = clkdef->offset; + sc->shift = clkdef->shift; + sc->mask = (1 << clkdef->width) - 1; + sc->mux_flags = clkdef->mux_flags; + + clknode_register(clkdom, clk); + return (0); +} Property changes on: projects/clang380-import/sys/dev/extres/clk/clk_mux.c ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clk_mux.h =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clk_mux.h (nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clk_mux.h (revision 294777) @@ -0,0 +1,43 @@ +/*- + * Copyright 2016 Michal Meloun + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ +#ifndef _DEV_EXTRES_CLK_MUX_H_ +#define _DEV_EXTRES_CLK_MUX_H_ + +#include + +struct clk_mux_def { + struct clknode_init_def clkdef; + uint32_t offset; + uint32_t shift; + uint32_t width; + int mux_flags; +}; + +int clknode_mux_register(struct clkdom *clkdom, struct clk_mux_def *clkdef); + +#endif /* _DEV_EXTRES_CLK_MUX_H_ */ Property changes on: projects/clang380-import/sys/dev/extres/clk/clk_mux.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clkdev_if.m =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clkdev_if.m	(nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clkdev_if.m	(revision 294777) @@ -0,0 +1,59 @@ +#- +# Copyright 2016 Michal Meloun +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE.
+# +# $FreeBSD$ +# + +#include + +INTERFACE clkdev; + +# +# Write single register +# +METHOD int write_4 { + device_t dev; + bus_addr_t addr; + uint32_t val; +}; + +# +# Read single register +# +METHOD int read_4 { + device_t dev; + bus_addr_t addr; + uint32_t *val; +}; + +# +# Modify single register +# +METHOD int modify_4 { + device_t dev; + bus_addr_t addr; + uint32_t clear_mask; + uint32_t set_mask; +}; Property changes on: projects/clang380-import/sys/dev/extres/clk/clkdev_if.m ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/extres/clk/clknode_if.m =================================================================== --- projects/clang380-import/sys/dev/extres/clk/clknode_if.m	(nonexistent) +++ projects/clang380-import/sys/dev/extres/clk/clknode_if.m	(revision 294777) @@ -0,0 +1,79 @@ +#- +# Copyright 2016 Michal Meloun +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# 1. Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# 2. Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. +# +# $FreeBSD$ +# + +INTERFACE clknode; + +HEADER { + struct clknode; } + +# +# Initialize clock node, get snapshot of cached values +# +METHOD int init { + struct clknode *clk; + device_t dev; +}; + +# +# Recalculate frequency +# req - in/out recalculated frequency +# +METHOD int recalc_freq { + struct clknode *clk; + uint64_t *freq; +}; + +# +# Set frequency +# fin - parent (input) frequency. +# fout - requested output frequency.
If the clock cannot change the frequency, +# then it must return the new requested frequency for its parent +METHOD int set_freq { + struct clknode *clk; + uint64_t fin; + uint64_t *fout; + int flags; + int *done; +}; + +# +# Enable/disable clock +# +METHOD int set_gate { + struct clknode *clk; + bool enable; +}; + +# +# Set multiplexer +# +METHOD int set_mux { + struct clknode *clk; + int idx; +}; Property changes on: projects/clang380-import/sys/dev/extres/clk/clknode_if.m ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/dev/hyperv/netvsc/hv_net_vsc.c =================================================================== --- projects/clang380-import/sys/dev/hyperv/netvsc/hv_net_vsc.c	(revision 294776) +++ projects/clang380-import/sys/dev/hyperv/netvsc/hv_net_vsc.c	(revision 294777) @@ -1,1031 +1,1033 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2010-2012 Citrix Inc. * Copyright (c) 2012 NetApp Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
* * $FreeBSD$ */ /** * HyperV vmbus network VSC (virtual services client) module * */ #include #include #include #include #include #include #include #include #include #include #include "hv_net_vsc.h" #include "hv_rndis.h" #include "hv_rndis_filter.h" MALLOC_DEFINE(M_NETVSC, "netvsc", "Hyper-V netvsc driver"); /* * Forward declarations */ static void hv_nv_on_channel_callback(void *context); static int hv_nv_init_send_buffer_with_net_vsp(struct hv_device *device); static int hv_nv_init_rx_buffer_with_net_vsp(struct hv_device *device); static int hv_nv_destroy_send_buffer(netvsc_dev *net_dev); static int hv_nv_destroy_rx_buffer(netvsc_dev *net_dev); static int hv_nv_connect_to_vsp(struct hv_device *device); static void hv_nv_on_send_completion(netvsc_dev *net_dev, struct hv_device *device, hv_vm_packet_descriptor *pkt); static void hv_nv_on_receive(netvsc_dev *net_dev, struct hv_device *device, hv_vm_packet_descriptor *pkt); /* * */ static inline netvsc_dev * hv_nv_alloc_net_device(struct hv_device *device) { netvsc_dev *net_dev; hn_softc_t *sc = device_get_softc(device->device); net_dev = malloc(sizeof(netvsc_dev), M_NETVSC, M_NOWAIT | M_ZERO); if (net_dev == NULL) { return (NULL); } net_dev->dev = device; net_dev->destroy = FALSE; sc->net_dev = net_dev; return (net_dev); } /* * */ static inline netvsc_dev * hv_nv_get_outbound_net_device(struct hv_device *device) { hn_softc_t *sc = device_get_softc(device->device); netvsc_dev *net_dev = sc->net_dev; if ((net_dev != NULL) && net_dev->destroy) { return (NULL); } return (net_dev); } /* * */ static inline netvsc_dev * hv_nv_get_inbound_net_device(struct hv_device *device) { hn_softc_t *sc = device_get_softc(device->device); netvsc_dev *net_dev = sc->net_dev; if (net_dev == NULL) { return (net_dev); } /* * When the device is being destroyed, we only * permit incoming packets if and only if there * are outstanding sends. */ if (net_dev->destroy && net_dev->num_outstanding_sends == 0) { return (NULL); } return (net_dev); } int hv_nv_get_next_send_section(netvsc_dev *net_dev) { unsigned long bitsmap_words = net_dev->bitsmap_words; unsigned long *bitsmap = net_dev->send_section_bitsmap; unsigned long idx; int ret = NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX; int i; for (i = 0; i < bitsmap_words; i++) { idx = ffs(~bitsmap[i]); if (0 == idx) continue; idx--; if (i * BITS_PER_LONG + idx >= net_dev->send_section_count) return (ret); if (synch_test_and_set_bit(idx, &bitsmap[i])) continue; ret = i * BITS_PER_LONG + idx; break; } return (ret); } /* * Net VSC initialize receive buffer with net VSP * * Net VSP: Network virtual service provider (the host-side peer of this * VSC), also known as the Hyper-V extensible switch and the synthetic * data path. */ static int hv_nv_init_rx_buffer_with_net_vsp(struct hv_device *device) { netvsc_dev *net_dev; nvsp_msg *init_pkt; int ret = 0; net_dev = hv_nv_get_outbound_net_device(device); if (!net_dev) { return (ENODEV); } net_dev->rx_buf = contigmalloc(net_dev->rx_buf_size, M_NETVSC, M_ZERO, 0UL, BUS_SPACE_MAXADDR, PAGE_SIZE, 0); if (net_dev->rx_buf == NULL) { ret = ENOMEM; goto cleanup; } /* * Establish the GPADL handle for this buffer on this channel. * Note: This call uses the vmbus connection rather than the * channel to establish the gpadl handle. * GPADL: Guest physical address descriptor list.
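 *
 * Establishing the GPADL shares the guest-physical pages backing
 * rx_buf with the host, so the VSP can deposit incoming frames into
 * them directly; the returned handle identifies the mapping in later
 * messages and in the eventual teardown.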
*/ ret = hv_vmbus_channel_establish_gpadl( device->channel, net_dev->rx_buf, net_dev->rx_buf_size, &net_dev->rx_buf_gpadl_handle); if (ret != 0) { goto cleanup; } /* sema_wait(&ext->channel_init_sema); KYS CHECK */ /* Notify the NetVsp of the gpadl handle */ init_pkt = &net_dev->channel_init_packet; memset(init_pkt, 0, sizeof(nvsp_msg)); init_pkt->hdr.msg_type = nvsp_msg_1_type_send_rx_buf; init_pkt->msgs.vers_1_msgs.send_rx_buf.gpadl_handle = net_dev->rx_buf_gpadl_handle; init_pkt->msgs.vers_1_msgs.send_rx_buf.id = NETVSC_RECEIVE_BUFFER_ID; /* Send the gpadl notification request */ ret = hv_vmbus_channel_send_packet(device->channel, init_pkt, sizeof(nvsp_msg), (uint64_t)(uintptr_t)init_pkt, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if (ret != 0) { goto cleanup; } sema_wait(&net_dev->channel_init_sema); /* Check the response */ if (init_pkt->msgs.vers_1_msgs.send_rx_buf_complete.status != nvsp_status_success) { ret = EINVAL; goto cleanup; } net_dev->rx_section_count = init_pkt->msgs.vers_1_msgs.send_rx_buf_complete.num_sections; net_dev->rx_sections = malloc(net_dev->rx_section_count * sizeof(nvsp_1_rx_buf_section), M_NETVSC, M_NOWAIT); if (net_dev->rx_sections == NULL) { ret = EINVAL; goto cleanup; } memcpy(net_dev->rx_sections, init_pkt->msgs.vers_1_msgs.send_rx_buf_complete.sections, net_dev->rx_section_count * sizeof(nvsp_1_rx_buf_section)); /* * For first release, there should only be 1 section that represents * the entire receive buffer */ if (net_dev->rx_section_count != 1 || net_dev->rx_sections->offset != 0) { ret = EINVAL; goto cleanup; } goto exit; cleanup: hv_nv_destroy_rx_buffer(net_dev); exit: return (ret); } /* * Net VSC initialize send buffer with net VSP */ static int hv_nv_init_send_buffer_with_net_vsp(struct hv_device *device) { netvsc_dev *net_dev; nvsp_msg *init_pkt; int ret = 0; net_dev = hv_nv_get_outbound_net_device(device); if (!net_dev) { return (ENODEV); } net_dev->send_buf = contigmalloc(net_dev->send_buf_size, M_NETVSC, M_ZERO, 0UL, BUS_SPACE_MAXADDR, PAGE_SIZE, 0); if (net_dev->send_buf == NULL) { ret = ENOMEM; goto cleanup; } /* * Establish the gpadl handle for this buffer on this channel. * Note: This call uses the vmbus connection rather than the * channel to establish the gpadl handle. 
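 *
 * Once the VSP acknowledges this buffer, it is treated as an array of
 * equal-sized sections (send_section_size); free sections are tracked
 * in the send_section_bitsmap allocated below.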
*/ ret = hv_vmbus_channel_establish_gpadl(device->channel, net_dev->send_buf, net_dev->send_buf_size, &net_dev->send_buf_gpadl_handle); if (ret != 0) { goto cleanup; } /* Notify the NetVsp of the gpadl handle */ init_pkt = &net_dev->channel_init_packet; memset(init_pkt, 0, sizeof(nvsp_msg)); init_pkt->hdr.msg_type = nvsp_msg_1_type_send_send_buf; init_pkt->msgs.vers_1_msgs.send_rx_buf.gpadl_handle = net_dev->send_buf_gpadl_handle; init_pkt->msgs.vers_1_msgs.send_rx_buf.id = NETVSC_SEND_BUFFER_ID; /* Send the gpadl notification request */ ret = hv_vmbus_channel_send_packet(device->channel, init_pkt, sizeof(nvsp_msg), (uint64_t)(uintptr_t)init_pkt, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if (ret != 0) { goto cleanup; } sema_wait(&net_dev->channel_init_sema); /* Check the response */ if (init_pkt->msgs.vers_1_msgs.send_send_buf_complete.status != nvsp_status_success) { ret = EINVAL; goto cleanup; } net_dev->send_section_size = init_pkt->msgs.vers_1_msgs.send_send_buf_complete.section_size; net_dev->send_section_count = net_dev->send_buf_size / net_dev->send_section_size; net_dev->bitsmap_words = howmany(net_dev->send_section_count, BITS_PER_LONG); net_dev->send_section_bitsmap = malloc(net_dev->bitsmap_words * sizeof(long), M_NETVSC, M_NOWAIT | M_ZERO); if (NULL == net_dev->send_section_bitsmap) { ret = ENOMEM; goto cleanup; } goto exit; cleanup: hv_nv_destroy_send_buffer(net_dev); exit: return (ret); } /* * Net VSC destroy receive buffer */ static int hv_nv_destroy_rx_buffer(netvsc_dev *net_dev) { nvsp_msg *revoke_pkt; int ret = 0; /* * If we got a section count, it means we received a * send_rx_buf_complete msg * (ie sent nvsp_msg_1_type_send_rx_buf msg) therefore, * we need to send a revoke msg here */ if (net_dev->rx_section_count) { /* Send the revoke receive buffer */ revoke_pkt = &net_dev->revoke_packet; memset(revoke_pkt, 0, sizeof(nvsp_msg)); revoke_pkt->hdr.msg_type = nvsp_msg_1_type_revoke_rx_buf; revoke_pkt->msgs.vers_1_msgs.revoke_rx_buf.id = NETVSC_RECEIVE_BUFFER_ID; ret = hv_vmbus_channel_send_packet(net_dev->dev->channel, revoke_pkt, sizeof(nvsp_msg), (uint64_t)(uintptr_t)revoke_pkt, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, 0); /* * If we failed here, we might as well return and have a leak * rather than continue and a bugchk */ if (ret != 0) { return (ret); } } /* Tear down the gpadl on the vsp end */ if (net_dev->rx_buf_gpadl_handle) { ret = hv_vmbus_channel_teardown_gpdal(net_dev->dev->channel, net_dev->rx_buf_gpadl_handle); /* * If we failed here, we might as well return and have a leak * rather than continue and a bugchk */ if (ret != 0) { return (ret); } net_dev->rx_buf_gpadl_handle = 0; } if (net_dev->rx_buf) { /* Free up the receive buffer */ contigfree(net_dev->rx_buf, net_dev->rx_buf_size, M_NETVSC); net_dev->rx_buf = NULL; } if (net_dev->rx_sections) { free(net_dev->rx_sections, M_NETVSC); net_dev->rx_sections = NULL; net_dev->rx_section_count = 0; } return (ret); } /* * Net VSC destroy send buffer */ static int hv_nv_destroy_send_buffer(netvsc_dev *net_dev) { nvsp_msg *revoke_pkt; int ret = 0; /* * If we got a section size, it means we received a * send_send_buf_complete msg * (ie sent nvsp_msg_1_type_send_send_buf msg) therefore, * we need to send a revoke msg here */ if (net_dev->send_section_size) { /* Send the revoke send buffer */ revoke_pkt = &net_dev->revoke_packet; memset(revoke_pkt, 0, sizeof(nvsp_msg)); revoke_pkt->hdr.msg_type = nvsp_msg_1_type_revoke_send_buf; revoke_pkt->msgs.vers_1_msgs.revoke_send_buf.id =
NETVSC_SEND_BUFFER_ID; ret = hv_vmbus_channel_send_packet(net_dev->dev->channel, revoke_pkt, sizeof(nvsp_msg), (uint64_t)(uintptr_t)revoke_pkt, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, 0); /* * If we failed here, we might as well return and have a leak * rather than continue and a bugchk */ if (ret != 0) { return (ret); } } /* Tear down the gpadl on the vsp end */ if (net_dev->send_buf_gpadl_handle) { ret = hv_vmbus_channel_teardown_gpdal(net_dev->dev->channel, net_dev->send_buf_gpadl_handle); /* * If we failed here, we might as well return and have a leak * rather than continue and a bugchk */ if (ret != 0) { return (ret); } net_dev->send_buf_gpadl_handle = 0; } if (net_dev->send_buf) { /* Free up the send buffer */ contigfree(net_dev->send_buf, net_dev->send_buf_size, M_NETVSC); net_dev->send_buf = NULL; } if (net_dev->send_section_bitsmap) { free(net_dev->send_section_bitsmap, M_NETVSC); } return (ret); } /* * Attempt to negotiate the caller-specified NVSP version * * For NVSP v2, Server 2008 R2 does not set * init_pkt->msgs.init_msgs.init_compl.negotiated_prot_vers * to the negotiated version, so we cannot rely on that. */ static int hv_nv_negotiate_nvsp_protocol(struct hv_device *device, netvsc_dev *net_dev, uint32_t nvsp_ver) { nvsp_msg *init_pkt; int ret; init_pkt = &net_dev->channel_init_packet; memset(init_pkt, 0, sizeof(nvsp_msg)); init_pkt->hdr.msg_type = nvsp_msg_type_init; /* * Specify parameter as the only acceptable protocol version */ init_pkt->msgs.init_msgs.init.p1.protocol_version = nvsp_ver; init_pkt->msgs.init_msgs.init.protocol_version_2 = nvsp_ver; /* Send the init request */ ret = hv_vmbus_channel_send_packet(device->channel, init_pkt, sizeof(nvsp_msg), (uint64_t)(uintptr_t)init_pkt, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); if (ret != 0) return (-1); sema_wait(&net_dev->channel_init_sema); if (init_pkt->msgs.init_msgs.init_compl.status != nvsp_status_success) return (EINVAL); return (0); } /* * Send NDIS version 2 config packet containing MTU. * * Not valid for NDIS version 1. */ static int hv_nv_send_ndis_config(struct hv_device *device, uint32_t mtu) { netvsc_dev *net_dev; nvsp_msg *init_pkt; int ret; net_dev = hv_nv_get_outbound_net_device(device); if (!net_dev) return (-ENODEV); /* * Set up configuration packet, write MTU * Indicate we are capable of handling VLAN tags */ init_pkt = &net_dev->channel_init_packet; memset(init_pkt, 0, sizeof(nvsp_msg)); init_pkt->hdr.msg_type = nvsp_msg_2_type_send_ndis_config; init_pkt->msgs.vers_2_msgs.send_ndis_config.mtu = mtu; init_pkt-> msgs.vers_2_msgs.send_ndis_config.capabilities.u1.u2.ieee8021q = 1; /* Send the configuration packet */ ret = hv_vmbus_channel_send_packet(device->channel, init_pkt, sizeof(nvsp_msg), (uint64_t)(uintptr_t)init_pkt, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, 0); if (ret != 0) return (-EINVAL); return (0); } /* * Net VSC connect to VSP */ static int hv_nv_connect_to_vsp(struct hv_device *device) { netvsc_dev *net_dev; nvsp_msg *init_pkt; uint32_t ndis_version; uint32_t protocol_list[] = { NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2, NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5 }; int i; int protocol_number = nitems(protocol_list); int ret = 0; device_t dev = device->device; hn_softc_t *sc = device_get_softc(dev); struct ifnet *ifp = sc->hn_ifp; net_dev = hv_nv_get_outbound_net_device(device); if (!net_dev) { return (ENODEV); } /* * Negotiate the NVSP version. Try the latest NVSP first.
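 *
 * protocol_list above is ordered oldest to newest, so the loop below
 * walks it backwards (NVSP 5, 4, 2, 1) and keeps the first version
 * the host accepts.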
*/ for (i = protocol_number - 1; i >= 0; i--) { if (hv_nv_negotiate_nvsp_protocol(device, net_dev, protocol_list[i]) == 0) { net_dev->nvsp_version = protocol_list[i]; if (bootverbose) device_printf(dev, "Netvsc: got version 0x%x\n", net_dev->nvsp_version); break; } } if (i < 0) { if (bootverbose) device_printf(dev, "failed to negotiate a valid " "protocol.\n"); return (EPROTO); } /* * Set the MTU if supported by this NVSP protocol version * This needs to be right after the NVSP init message per Haiyang */ if (net_dev->nvsp_version >= NVSP_PROTOCOL_VERSION_2) ret = hv_nv_send_ndis_config(device, ifp->if_mtu); /* * Send the NDIS version */ init_pkt = &net_dev->channel_init_packet; memset(init_pkt, 0, sizeof(nvsp_msg)); if (net_dev->nvsp_version <= NVSP_PROTOCOL_VERSION_4) { ndis_version = NDIS_VERSION_6_1; } else { ndis_version = NDIS_VERSION_6_30; } init_pkt->hdr.msg_type = nvsp_msg_1_type_send_ndis_vers; init_pkt->msgs.vers_1_msgs.send_ndis_vers.ndis_major_vers = (ndis_version & 0xFFFF0000) >> 16; init_pkt->msgs.vers_1_msgs.send_ndis_vers.ndis_minor_vers = ndis_version & 0xFFFF; /* Send the init request */ ret = hv_vmbus_channel_send_packet(device->channel, init_pkt, sizeof(nvsp_msg), (uint64_t)(uintptr_t)init_pkt, HV_VMBUS_PACKET_TYPE_DATA_IN_BAND, 0); if (ret != 0) { goto cleanup; } /* * TODO: BUGBUG - We have to wait for the above msg since the netvsp * uses KMCL which acknowledges packet (completion packet) * since our Vmbus always set the * HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED flag */ /* sema_wait(&NetVscChannel->channel_init_sema); */ /* Post the big receive buffer to NetVSP */ if (net_dev->nvsp_version <= NVSP_PROTOCOL_VERSION_2) net_dev->rx_buf_size = NETVSC_RECEIVE_BUFFER_SIZE_LEGACY; else net_dev->rx_buf_size = NETVSC_RECEIVE_BUFFER_SIZE; net_dev->send_buf_size = NETVSC_SEND_BUFFER_SIZE; ret = hv_nv_init_rx_buffer_with_net_vsp(device); if (ret == 0) ret = hv_nv_init_send_buffer_with_net_vsp(device); cleanup: return (ret); } /* * Net VSC disconnect from VSP */ static void hv_nv_disconnect_from_vsp(netvsc_dev *net_dev) { hv_nv_destroy_rx_buffer(net_dev); hv_nv_destroy_send_buffer(net_dev); } /* * Net VSC on device add * * Callback when the device belonging to this driver is added */ netvsc_dev * hv_nv_on_device_add(struct hv_device *device, void *additional_info) { netvsc_dev *net_dev; int ret = 0; net_dev = hv_nv_alloc_net_device(device); if (!net_dev) goto cleanup; /* Initialize the NetVSC channel extension */ sema_init(&net_dev->channel_init_sema, 0, "netdev_sema"); /* * Open the channel */ ret = hv_vmbus_channel_open(device->channel, NETVSC_DEVICE_RING_BUFFER_SIZE, NETVSC_DEVICE_RING_BUFFER_SIZE, NULL, 0, hv_nv_on_channel_callback, device); if (ret != 0) goto cleanup; /* * Connect with the NetVsp */ ret = hv_nv_connect_to_vsp(device); if (ret != 0) goto close; return (net_dev); close: /* Now, we can close the channel safely */ hv_vmbus_channel_close(device->channel); cleanup: /* * Free the packet buffers on the netvsc device packet queue. * Release other resources. 
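 *
 * The error paths unwind in reverse order of setup: "close" is
 * reached only after the channel was opened, while "cleanup" just
 * destroys the semaphore and frees net_dev.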
     */
    if (net_dev) {
        sema_destroy(&net_dev->channel_init_sema);
        free(net_dev, M_NETVSC);
    }

    return (NULL);
}

/*
 * Net VSC on device remove
 */
int
hv_nv_on_device_remove(struct hv_device *device, boolean_t destroy_channel)
{
    hn_softc_t *sc = device_get_softc(device->device);
    netvsc_dev *net_dev = sc->net_dev;

    /* Stop outbound traffic, i.e., sends and receive completions */
    mtx_lock(&device->channel->inbound_lock);
    net_dev->destroy = TRUE;
    mtx_unlock(&device->channel->inbound_lock);

    /* Wait for all send completions */
    while (net_dev->num_outstanding_sends) {
        DELAY(100);
    }

    hv_nv_disconnect_from_vsp(net_dev);

    /* At this point, no one should be accessing net_dev except in here */

    /* Now, we can close the channel safely */
    if (!destroy_channel) {
        device->channel->state =
            HV_CHANNEL_CLOSING_NONDESTRUCTIVE_STATE;
    }

    hv_vmbus_channel_close(device->channel);

    sema_destroy(&net_dev->channel_init_sema);
    free(net_dev, M_NETVSC);

    return (0);
}

/*
 * Net VSC on send completion
 */
static void
hv_nv_on_send_completion(netvsc_dev *net_dev, struct hv_device *device,
    hv_vm_packet_descriptor *pkt)
{
    nvsp_msg *nvsp_msg_pkt;
    netvsc_packet *net_vsc_pkt;

    nvsp_msg_pkt =
        (nvsp_msg *)((unsigned long)pkt + (pkt->data_offset8 << 3));

    if (nvsp_msg_pkt->hdr.msg_type == nvsp_msg_type_init_complete
        || nvsp_msg_pkt->hdr.msg_type ==
            nvsp_msg_1_type_send_rx_buf_complete
        || nvsp_msg_pkt->hdr.msg_type ==
            nvsp_msg_1_type_send_send_buf_complete) {
        /* Copy the response back */
        memcpy(&net_dev->channel_init_packet, nvsp_msg_pkt,
            sizeof(nvsp_msg));
        sema_post(&net_dev->channel_init_sema);
    } else if (nvsp_msg_pkt->hdr.msg_type ==
        nvsp_msg_1_type_send_rndis_pkt_complete) {
        /* Get the send context */
        net_vsc_pkt =
            (netvsc_packet *)(unsigned long)pkt->transaction_id;
        if (NULL != net_vsc_pkt) {
            if (net_vsc_pkt->send_buf_section_idx !=
                NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX) {
                synch_change_bit(net_vsc_pkt->send_buf_section_idx,
                    net_dev->send_section_bitsmap);
            }
            /* Notify the layer above us */
            net_vsc_pkt->compl.send.on_send_completion(
                net_vsc_pkt->compl.send.send_completion_context);
        }

        atomic_subtract_int(&net_dev->num_outstanding_sends, 1);
    }
}

/*
 * Net VSC on send
 * Sends a packet on the specified Hyper-V device.
 * Returns 0 on success, non-zero on failure.
 */
int
hv_nv_on_send(struct hv_device *device, netvsc_packet *pkt)
{
    netvsc_dev *net_dev;
    nvsp_msg send_msg;
    int ret;

    net_dev = hv_nv_get_outbound_net_device(device);
    if (!net_dev)
        return (ENODEV);

    send_msg.hdr.msg_type = nvsp_msg_1_type_send_rndis_pkt;
    if (pkt->is_data_pkt) {
        /* 0 is RMC_DATA */
        send_msg.msgs.vers_1_msgs.send_rndis_pkt.chan_type = 0;
    } else {
        /* 1 is RMC_CONTROL */
        send_msg.msgs.vers_1_msgs.send_rndis_pkt.chan_type = 1;
    }

    send_msg.msgs.vers_1_msgs.send_rndis_pkt.send_buf_section_idx =
        pkt->send_buf_section_idx;
    send_msg.msgs.vers_1_msgs.send_rndis_pkt.send_buf_section_size =
        pkt->send_buf_section_size;

    if (pkt->page_buf_count) {
        ret = hv_vmbus_channel_send_packet_pagebuffer(device->channel,
            pkt->page_buffers, pkt->page_buf_count,
            &send_msg, sizeof(nvsp_msg),
            (uint64_t)(uintptr_t)pkt);
    } else {
        ret = hv_vmbus_channel_send_packet(device->channel,
            &send_msg, sizeof(nvsp_msg),
            (uint64_t)(uintptr_t)pkt,
            HV_VMBUS_PACKET_TYPE_DATA_IN_BAND,
            HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
    }

    /* Record outstanding send only if send_packet() succeeded */
    if (ret == 0)
        atomic_add_int(&net_dev->num_outstanding_sends, 1);

    return (ret);
}
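
/*
 * A minimal sketch, not part of the original driver, of the send and
 * completion round-trip implemented above.  hn_example_send() and
 * hn_example_send_done() are hypothetical names used for illustration
 * only: hv_nv_on_send() passes the netvsc_packet pointer as the VMBus
 * transaction id, and hv_nv_on_send_completion() recovers that pointer
 * from pkt->transaction_id to fire the saved callback.
 */
static void
hn_example_send_done(void *context)
{
    netvsc_packet *pkt = context;

    /* The packet and any buffers it references may be reclaimed now. */
    free(pkt, M_NETVSC);
}

static int
hn_example_send(struct hv_device *device, netvsc_packet *pkt)
{
    pkt->is_data_pkt = TRUE;
    /* No chimney section and no page buffers in this sketch. */
    pkt->send_buf_section_idx =
        NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX;
    pkt->send_buf_section_size = 0;
    pkt->page_buf_count = 0;
    pkt->compl.send.on_send_completion = hn_example_send_done;
    pkt->compl.send.send_completion_context = pkt;
    pkt->compl.send.send_completion_tid = (uint64_t)(uintptr_t)pkt;

    return (hv_nv_on_send(device, pkt));
}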
/*
 * Net VSC on receive
 *
 * In the FreeBSD Hyper-V virtual world, this function deals exclusively
 * with virtual addresses.
 */
static void
hv_nv_on_receive(netvsc_dev *net_dev, struct hv_device *device,
    hv_vm_packet_descriptor *pkt)
{
    hv_vm_transfer_page_packet_header *vm_xfer_page_pkt;
    nvsp_msg *nvsp_msg_pkt;
    netvsc_packet vsc_pkt;
    netvsc_packet *net_vsc_pkt = &vsc_pkt;
    device_t dev = device->device;
    int count = 0;
    int i = 0;
    int status = nvsp_status_success;

    /*
     * All inbound packets other than send completion should be
     * transfer page packets.
     */
    if (pkt->type != HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES) {
        device_printf(dev, "packet type %d is invalid!\n", pkt->type);
        return;
    }

    nvsp_msg_pkt =
        (nvsp_msg *)((unsigned long)pkt + (pkt->data_offset8 << 3));

    /* Make sure this is a valid nvsp packet */
    if (nvsp_msg_pkt->hdr.msg_type != nvsp_msg_1_type_send_rndis_pkt) {
        device_printf(dev, "packet hdr type %d is invalid!\n",
            nvsp_msg_pkt->hdr.msg_type);
        return;
    }

    vm_xfer_page_pkt = (hv_vm_transfer_page_packet_header *)pkt;

    if (vm_xfer_page_pkt->transfer_page_set_id !=
        NETVSC_RECEIVE_BUFFER_ID) {
        device_printf(dev, "transfer_page_set_id %d is invalid!\n",
            vm_xfer_page_pkt->transfer_page_set_id);
        return;
    }

    count = vm_xfer_page_pkt->range_count;
    net_vsc_pkt->device = device;

    /* Each range represents 1 RNDIS pkt that contains 1 Ethernet frame */
    for (i = 0; i < count; i++) {
        net_vsc_pkt->status = nvsp_status_success;
        net_vsc_pkt->data = (void *)((unsigned long)net_dev->rx_buf +
            vm_xfer_page_pkt->ranges[i].byte_offset);
        net_vsc_pkt->tot_data_buf_len =
            vm_xfer_page_pkt->ranges[i].byte_count;

        hv_rf_on_receive(net_dev, device, net_vsc_pkt);
        if (net_vsc_pkt->status != nvsp_status_success) {
            status = nvsp_status_failure;
        }
    }

    /*
     * Moved completion call back here so that all received
     * messages (not just data messages) will trigger a response
     * message back to the host.
     */
    hv_nv_on_receive_completion(device,
        vm_xfer_page_pkt->d.transaction_id, status);
    hv_rf_receive_rollup(net_dev);
}

/*
 * Net VSC on receive completion
 *
 * Send a receive completion packet to the RNDIS device (i.e., NetVSP)
 */
void
hv_nv_on_receive_completion(struct hv_device *device, uint64_t tid,
    uint32_t status)
{
    nvsp_msg rx_comp_msg;
    int retries = 0;
    int ret = 0;

    rx_comp_msg.hdr.msg_type = nvsp_msg_1_type_send_rndis_pkt_complete;

    /* Pass in the status */
    rx_comp_msg.msgs.vers_1_msgs.send_rndis_pkt_complete.status = status;

retry_send_cmplt:
    /* Send the completion */
    ret = hv_vmbus_channel_send_packet(device->channel, &rx_comp_msg,
        sizeof(nvsp_msg), tid, HV_VMBUS_PACKET_TYPE_COMPLETION, 0);
    if (ret == 0) {
        /* success */
        /* no-op */
    } else if (ret == EAGAIN) {
        /* no more room... wait a bit and attempt to retry 3 times */
        retries++;

        if (retries < 4) {
            DELAY(100);
            goto retry_send_cmplt;
        }
    }
}

/*
 * Net VSC on channel callback
 */
static void
hv_nv_on_channel_callback(void *context)
{
    struct hv_device *device = (struct hv_device *)context;
    netvsc_dev *net_dev;
    device_t dev = device->device;
    uint32_t bytes_rxed;
    uint64_t request_id;
    hv_vm_packet_descriptor *desc;
    uint8_t *buffer;
    int bufferlen = NETVSC_PACKET_SIZE;
    int ret = 0;

    net_dev = hv_nv_get_inbound_net_device(device);
    if (net_dev == NULL)
        return;

    buffer = net_dev->callback_buf;

    do {
        ret = hv_vmbus_channel_recv_packet_raw(device->channel,
            buffer, bufferlen, &bytes_rxed, &request_id);
        if (ret == 0) {
            if (bytes_rxed > 0) {
                desc = (hv_vm_packet_descriptor *)buffer;
                switch (desc->type) {
                case HV_VMBUS_PACKET_TYPE_COMPLETION:
                    hv_nv_on_send_completion(net_dev, device, desc);
                    break;
                case HV_VMBUS_PACKET_TYPE_DATA_USING_TRANSFER_PAGES:
                    hv_nv_on_receive(net_dev, device, desc);
                    break;
                default:
                    device_printf(dev,
                        "hv_cb recv unknown type %d packet\n",
                        desc->type);
                    break;
                }
            } else {
                break;
            }
        } else if (ret == ENOBUFS) {
            /* Handle large packet */
            if (bufferlen > NETVSC_PACKET_SIZE) {
                free(buffer, M_NETVSC);
                buffer = NULL;
            }

            /* alloc new buffer */
            buffer = malloc(bytes_rxed, M_NETVSC, M_NOWAIT);
            if (buffer == NULL) {
                device_printf(dev,
                    "hv_cb malloc buffer failed, len=%u\n",
                    bytes_rxed);
                bufferlen = 0;
                break;
            }
            bufferlen = bytes_rxed;
        }
    } while (1);

    if (bufferlen > NETVSC_PACKET_SIZE)
        free(buffer, M_NETVSC);
+
+    hv_rf_channel_rollup(net_dev);
}
Index: projects/clang380-import/sys/dev/hyperv/netvsc/hv_net_vsc.h
===================================================================
--- projects/clang380-import/sys/dev/hyperv/netvsc/hv_net_vsc.h	(revision 294776)
+++ projects/clang380-import/sys/dev/hyperv/netvsc/hv_net_vsc.h	(revision 294777)
@@ -1,1035 +1,1059 @@
/*-
 * Copyright (c) 2009-2012 Microsoft Corp.
 * Copyright (c) 2010-2012 Citrix Inc.
 * Copyright (c) 2012 NetApp Inc.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice unmodified, this list of conditions, and the following
 *    disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * $FreeBSD$
 */

/*
 * HyperV vmbus (virtual machine bus) network VSC (virtual services client)
 * header file
 *
 * (Updated from unencumbered NvspProtocol.h)
 */

#ifndef __HV_NET_VSC_H__
#define __HV_NET_VSC_H__

-#include #include #include #include +#include #include +#include +#include +#include + #include #include #include #include #include

MALLOC_DECLARE(M_NETVSC);

#define NVSP_INVALID_PROTOCOL_VERSION		(0xFFFFFFFF)

#define NVSP_PROTOCOL_VERSION_1			2
#define NVSP_PROTOCOL_VERSION_2			0x30002
#define NVSP_PROTOCOL_VERSION_4			0x40000
#define NVSP_PROTOCOL_VERSION_5			0x50000
#define NVSP_MIN_PROTOCOL_VERSION		(NVSP_PROTOCOL_VERSION_1)
#define NVSP_MAX_PROTOCOL_VERSION		(NVSP_PROTOCOL_VERSION_2)

#define NVSP_PROTOCOL_VERSION_CURRENT		NVSP_PROTOCOL_VERSION_2

#define VERSION_4_OFFLOAD_SIZE			22

#define NVSP_OPERATIONAL_STATUS_OK		(0x00000000)
#define NVSP_OPERATIONAL_STATUS_DEGRADED	(0x00000001)
#define NVSP_OPERATIONAL_STATUS_NONRECOVERABLE	(0x00000002)
#define NVSP_OPERATIONAL_STATUS_NO_CONTACT	(0x00000003)
#define NVSP_OPERATIONAL_STATUS_LOST_COMMUNICATION (0x00000004)

/*
 * Maximum number of transfer pages (packets) the VSP will use on a receive
 */
#define NVSP_MAX_PACKETS_PER_RECEIVE	375

typedef enum nvsp_msg_type_ {
	nvsp_msg_type_none = 0,

	/*
	 * Init Messages
	 */
	nvsp_msg_type_init = 1,
	nvsp_msg_type_init_complete = 2,

	nvsp_version_msg_start = 100,

	/*
	 * Version 1 Messages
	 */
	nvsp_msg_1_type_send_ndis_vers = nvsp_version_msg_start,

	nvsp_msg_1_type_send_rx_buf,
	nvsp_msg_1_type_send_rx_buf_complete,
	nvsp_msg_1_type_revoke_rx_buf,

	nvsp_msg_1_type_send_send_buf,
	nvsp_msg_1_type_send_send_buf_complete,
	nvsp_msg_1_type_revoke_send_buf,

	nvsp_msg_1_type_send_rndis_pkt,
	nvsp_msg_1_type_send_rndis_pkt_complete,

	/*
	 * Version 2 Messages
	 */
	nvsp_msg_2_type_send_chimney_delegated_buf,
	nvsp_msg_2_type_send_chimney_delegated_buf_complete,
	nvsp_msg_2_type_revoke_chimney_delegated_buf,

	nvsp_msg_2_type_resume_chimney_rx_indication,

	nvsp_msg_2_type_terminate_chimney,
	nvsp_msg_2_type_terminate_chimney_complete,

	nvsp_msg_2_type_indicate_chimney_event,

	nvsp_msg_2_type_send_chimney_packet,
	nvsp_msg_2_type_send_chimney_packet_complete,

	nvsp_msg_2_type_post_chimney_rx_request,
	nvsp_msg_2_type_post_chimney_rx_request_complete,

	nvsp_msg_2_type_alloc_rx_buf,
	nvsp_msg_2_type_alloc_rx_buf_complete,

	nvsp_msg_2_type_free_rx_buf,

	nvsp_msg_2_send_vmq_rndis_pkt,
	nvsp_msg_2_send_vmq_rndis_pkt_complete,

	nvsp_msg_2_type_send_ndis_config,

	nvsp_msg_2_type_alloc_chimney_handle,
	nvsp_msg_2_type_alloc_chimney_handle_complete,
} nvsp_msg_type;

typedef enum nvsp_status_ {
	nvsp_status_none = 0,
	nvsp_status_success,
	nvsp_status_failure,
	/* Deprecated */
	nvsp_status_prot_vers_range_too_new,
	/* Deprecated */
	nvsp_status_prot_vers_range_too_old,
	nvsp_status_invalid_rndis_pkt,
	nvsp_status_busy,
	nvsp_status_max,
} nvsp_status;

typedef struct nvsp_msg_hdr_ {
	uint32_t	msg_type;
} __packed nvsp_msg_hdr;

/*
 * Init Messages
 */

/*
 * This message is used by the VSC to initialize the channel
 * after the channel has been opened.  This message should
 * never include anything other than versioning (i.e. this
 * message will be the same forever).
 *
 * Forever is a long time.  The values have been redefined
 * in Win7 to indicate major and minor protocol version
 * number.
 */
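
/*
 * A minimal sketch, not part of the original header: with the Win7
 * redefinition described above, a 32-bit NVSP version carries the major
 * number in the high 16 bits and the minor number in the low 16 bits,
 * so NVSP_PROTOCOL_VERSION_2 (0x30002) decodes as major 3, minor 2.
 * These helper macros are hypothetical and shown for illustration only;
 * the wire structure itself follows.
 */
#define NVSP_EXAMPLE_VERSION_MAJOR(ver)	(((ver) >> 16) & 0xffff)
#define NVSP_EXAMPLE_VERSION_MINOR(ver)	((ver) & 0xffff)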
typedef struct nvsp_msg_init_ {
	union {
		struct {
			uint16_t	minor_protocol_version;
			uint16_t	major_protocol_version;
		} s;
		/* Formerly min_protocol_version */
		uint32_t	protocol_version;
	} p1;
	/* Formerly max_protocol_version */
	uint32_t	protocol_version_2;
} __packed nvsp_msg_init;

/*
 * This message is used by the VSP to complete the initialization
 * of the channel.  This message should never include anything other
 * than versioning (i.e. this message will be the same forever).
 */
typedef struct nvsp_msg_init_complete_ {
	/* Deprecated */
	uint32_t	negotiated_prot_vers;
	uint32_t	max_mdl_chain_len;
	uint32_t	status;
} __packed nvsp_msg_init_complete;

typedef union nvsp_msg_init_uber_ {
	nvsp_msg_init		init;
	nvsp_msg_init_complete	init_compl;
} __packed nvsp_msg_init_uber;

/*
 * Version 1 Messages
 */

/*
 * This message is used by the VSC to send the NDIS version
 * to the VSP.  The VSP can use this information when handling
 * OIDs sent by the VSC.
 */
typedef struct nvsp_1_msg_send_ndis_version_ {
	uint32_t	ndis_major_vers;
	/* Deprecated */
	uint32_t	ndis_minor_vers;
} __packed nvsp_1_msg_send_ndis_version;

/*
 * This message is used by the VSC to send a receive buffer
 * to the VSP.  The VSP can then use the receive buffer to
 * send data to the VSC.
 */
typedef struct nvsp_1_msg_send_rx_buf_ {
	uint32_t	gpadl_handle;
	uint16_t	id;
} __packed nvsp_1_msg_send_rx_buf;

typedef struct nvsp_1_rx_buf_section_ {
	uint32_t	offset;
	uint32_t	sub_allocation_size;
	uint32_t	num_sub_allocations;
	uint32_t	end_offset;
} __packed nvsp_1_rx_buf_section;

/*
 * This message is used by the VSP to acknowledge a receive
 * buffer sent by the VSC.  This message must be sent by the
 * VSP before the VSP uses the receive buffer.
 */
typedef struct nvsp_1_msg_send_rx_buf_complete_ {
	uint32_t	status;
	uint32_t	num_sections;

	/*
	 * The receive buffer is split into two parts, a large
	 * suballocation section and a small suballocation
	 * section.  These sections are then suballocated by a
	 * certain size.
	 *
	 * For example, the following break up of the receive
	 * buffer has 6 large suballocations and 10 small
	 * suballocations.
	 *
	 * |            Large Section          |  |   Small Section   |
	 * ------------------------------------------------------------
	 * |     |     |     |     |     |     |  | | | | | | | | | | |
	 * |                                      |
	 * LargeOffset                            SmallOffset
	 */
	nvsp_1_rx_buf_section	sections[1];
} __packed nvsp_1_msg_send_rx_buf_complete;

/*
 * This message is sent by the VSC to revoke the receive buffer.
 * After the VSP completes this transaction, the VSP should never
 * use the receive buffer again.
 */
typedef struct nvsp_1_msg_revoke_rx_buf_ {
	uint16_t	id;
} __packed nvsp_1_msg_revoke_rx_buf;

/*
 * This message is used by the VSC to send a send buffer
 * to the VSP.  The VSC can then use the send buffer to
 * send data to the VSP.
 */
typedef struct nvsp_1_msg_send_send_buf_ {
	uint32_t	gpadl_handle;
	uint16_t	id;
} __packed nvsp_1_msg_send_send_buf;

/*
 * This message is used by the VSP to acknowledge a send
 * buffer sent by the VSC.  This message must be sent by the
 * VSP before the VSP uses the sent buffer.
 */
typedef struct nvsp_1_msg_send_send_buf_complete_ {
	uint32_t	status;

	/*
	 * The VSC gets to choose the size of the send buffer and
	 * the VSP gets to choose the section size of the buffer.
	 * This was done to enable dynamic reconfigurations when
	 * the cost of GPA-direct buffers decreases.
	 */
	uint32_t	section_size;
} __packed nvsp_1_msg_send_send_buf_complete;
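
/*
 * A minimal sketch, not part of the original header: once the VSP has
 * returned section_size in nvsp_1_msg_send_send_buf_complete above, a
 * chimney send section is located by plain arithmetic; section idx
 * starts idx * section_size bytes into the send buffer.  The helper
 * name is hypothetical, for illustration only.
 */
static __inline void *
nvsp_example_send_section(void *send_buf, uint32_t section_size,
    uint32_t idx)
{

	return ((uint8_t *)send_buf + (uintptr_t)idx * section_size);
}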
/*
 * This message is sent by the VSC to revoke the send buffer.
 * After the VSP completes this transaction, the VSP should never
 * use the send buffer again.
 */
typedef struct nvsp_1_msg_revoke_send_buf_ {
	uint16_t	id;
} __packed nvsp_1_msg_revoke_send_buf;

/*
 * This message is used by both the VSP and the VSC to send
 * an RNDIS message to the opposite channel endpoint.
 */
typedef struct nvsp_1_msg_send_rndis_pkt_ {
	/*
	 * This field is specified by RNDIS.  It assumes there are
	 * two different channels of communication.  However,
	 * the Network VSP only has one.  Therefore, the channel
	 * travels with the RNDIS packet.
	 */
	uint32_t	chan_type;

	/*
	 * This field is used to send part or all of the data
	 * through a send buffer.  This value specifies an
	 * index into the send buffer.  If the index is
	 * 0xFFFFFFFF, then the send buffer is not being used
	 * and all of the data was sent through other VMBus
	 * mechanisms.
	 */
	uint32_t	send_buf_section_idx;
	uint32_t	send_buf_section_size;
} __packed nvsp_1_msg_send_rndis_pkt;

/*
 * This message is used by both the VSP and the VSC to complete
 * an RNDIS message to the opposite channel endpoint.  At this
 * point, the initiator of this message cannot use any resources
 * associated with the original RNDIS packet.
 */
typedef struct nvsp_1_msg_send_rndis_pkt_complete_ {
	uint32_t	status;
} __packed nvsp_1_msg_send_rndis_pkt_complete;

/*
 * Version 2 Messages
 */

/*
 * This message is used by the VSC to send the NDIS version
 * to the VSP.  The VSP can use this information when handling
 * OIDs sent by the VSC.
 */
typedef struct nvsp_2_netvsc_capabilities_ {
	union {
		uint64_t	as_uint64;
		struct {
			uint64_t	vmq           : 1;
			uint64_t	chimney       : 1;
			uint64_t	sriov         : 1;
			uint64_t	ieee8021q     : 1;
			uint64_t	correlationid : 1;
			uint64_t	teaming       : 1;
		} u2;
	} u1;
} __packed nvsp_2_netvsc_capabilities;

typedef struct nvsp_2_msg_send_ndis_config_ {
	uint32_t			mtu;
	uint32_t			reserved;
	nvsp_2_netvsc_capabilities	capabilities;
} __packed nvsp_2_msg_send_ndis_config;

/*
 * NvspMessage2TypeSendChimneyDelegatedBuffer
 */
typedef struct nvsp_2_msg_send_chimney_buf_ {
	/*
	 * On WIN7 beta, delegated_obj_max_size is defined as a uint32_t
	 * Since WIN7 RC, it was split into two uint16_t.  To have the same
	 * struct layout, delegated_obj_max_size shall be the first field.
	 */
	uint16_t	delegated_obj_max_size;

	/*
	 * The revision # of chimney protocol used between NVSC and NVSP.
	 *
	 * This revision is NOT related to the chimney revision between
	 * NDIS protocol and miniport drivers.
	 */
	uint16_t	revision;

	uint32_t	gpadl_handle;
} __packed nvsp_2_msg_send_chimney_buf;

/* Unsupported chimney revision 0 (only present in WIN7 beta) */
#define NVSP_CHIMNEY_REVISION_0		0

/* WIN7 Beta Chimney QFE */
#define NVSP_CHIMNEY_REVISION_1		1

/* The chimney revision since WIN7 RC */
#define NVSP_CHIMNEY_REVISION_2		2

/*
 * NvspMessage2TypeSendChimneyDelegatedBufferComplete
 */
typedef struct nvsp_2_msg_send_chimney_buf_complete_ {
	uint32_t	status;

	/*
	 * Maximum number of outstanding sends and pre-posted receives.
	 *
	 * NVSC should not post more than SendQuota/ReceiveQuota packets.
	 * Otherwise, it can block the non-chimney path for an indefinite
	 * amount of time.
	 * (since chimney sends/receives are affected by the remote peer).
	 *
	 * Note: NVSP enforces the quota restrictions on a per-VMBCHANNEL
	 * basis.  It doesn't enforce the restriction separately for chimney
	 * send/receive.  If NVSC doesn't voluntarily enforce "SendQuota",
	 * it may kill its own network connectivity.
*/ uint32_t send_quota; uint32_t rx_quota; } __packed nvsp_2_msg_send_chimney_buf_complete; /* * NvspMessage2TypeRevokeChimneyDelegatedBuffer */ typedef struct nvsp_2_msg_revoke_chimney_buf_ { uint32_t gpadl_handle; } __packed nvsp_2_msg_revoke_chimney_buf; #define NVSP_CHIMNEY_OBJECT_TYPE_NEIGHBOR 0 #define NVSP_CHIMNEY_OBJECT_TYPE_PATH4 1 #define NVSP_CHIMNEY_OBJECT_TYPE_PATH6 2 #define NVSP_CHIMNEY_OBJECT_TYPE_TCP 3 /* * NvspMessage2TypeAllocateChimneyHandle */ typedef struct nvsp_2_msg_alloc_chimney_handle_ { uint64_t vsc_context; uint32_t object_type; } __packed nvsp_2_msg_alloc_chimney_handle; /* * NvspMessage2TypeAllocateChimneyHandleComplete */ typedef struct nvsp_2_msg_alloc_chimney_handle_complete_ { uint32_t vsp_handle; } __packed nvsp_2_msg_alloc_chimney_handle_complete; /* * NvspMessage2TypeResumeChimneyRXIndication */ typedef struct nvsp_2_msg_resume_chimney_rx_indication { /* * Handle identifying the offloaded connection */ uint32_t vsp_tcp_handle; } __packed nvsp_2_msg_resume_chimney_rx_indication; #define NVSP_2_MSG_TERMINATE_CHIMNEY_FLAGS_FIRST_STAGE (0x01u) #define NVSP_2_MSG_TERMINATE_CHIMNEY_FLAGS_RESERVED (~(0x01u)) /* * NvspMessage2TypeTerminateChimney */ typedef struct nvsp_2_msg_terminate_chimney_ { /* * Handle identifying the offloaded object */ uint32_t vsp_handle; /* * Terminate Offload Flags * Bit 0: * When set to 0, terminate the offload at the destination NIC * Bit 1-31: Reserved, shall be zero */ uint32_t flags; union { /* * This field is valid only when bit 0 of flags is clear. * It specifies the index into the premapped delegated * object buffer. The buffer was sent through the * NvspMessage2TypeSendChimneyDelegatedBuffer * message at initialization time. * * NVSP will write the delegated state into the delegated * buffer upon upload completion. */ uint32_t index; /* * This field is valid only when bit 0 of flags is set. * * The seqence number of the most recently accepted RX * indication when VSC sets its TCP context into * "terminating" state. * * This allows NVSP to determines if there are any in-flight * RX indications for which the acceptance state is still * undefined. */ uint64_t last_accepted_rx_seq_no; } f0; } __packed nvsp_2_msg_terminate_chimney; #define NVSP_TERMINATE_CHIMNEY_COMPLETE_FLAG_DATA_CORRUPTED 0x0000001u /* * NvspMessage2TypeTerminateChimneyComplete */ typedef struct nvsp_2_msg_terminate_chimney_complete_ { uint64_t vsc_context; uint32_t flags; } __packed nvsp_2_msg_terminate_chimney_complete; /* * NvspMessage2TypeIndicateChimneyEvent */ typedef struct nvsp_2_msg_indicate_chimney_event_ { /* * When VscTcpContext is 0, event_type is an NDIS_STATUS event code * Otherwise, EventType is an TCP connection event (defined in * NdisTcpOffloadEventHandler chimney DDK document). */ uint32_t event_type; /* * When VscTcpContext is 0, EventType is an NDIS_STATUS event code * Otherwise, EventType is an TCP connection event specific information * (defined in NdisTcpOffloadEventHandler chimney DDK document). */ uint32_t event_specific_info; /* * If not 0, the event is per-TCP connection event. This field * contains the VSC's TCP context. * If 0, the event indication is global. 
*/ uint64_t vsc_tcp_context; } __packed nvsp_2_msg_indicate_chimney_event; #define NVSP_1_CHIMNEY_SEND_INVALID_OOB_INDEX 0xffffu #define NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX 0xffffffff /* * NvspMessage2TypeSendChimneyPacket */ typedef struct nvsp_2_msg_send_chimney_pkt_ { /* * Identify the TCP connection for which this chimney send is */ uint32_t vsp_tcp_handle; /* * This field is used to send part or all of the data * through a send buffer. This values specifies an * index into the send buffer. If the index is * 0xFFFF, then the send buffer is not being used * and all of the data was sent through other VMBus * mechanisms. */ uint16_t send_buf_section_index; uint16_t send_buf_section_size; /* * OOB Data Index * This an index to the OOB data buffer. If the index is 0xFFFFFFFF, * then there is no OOB data. * * This field shall be always 0xFFFFFFFF for now. It is reserved for * the future. */ uint16_t oob_data_index; /* * DisconnectFlags = 0 * Normal chimney send. See MiniportTcpOffloadSend for details. * * DisconnectFlags = TCP_DISCONNECT_GRACEFUL_CLOSE (0x01) * Graceful disconnect. See MiniportTcpOffloadDisconnect for details. * * DisconnectFlags = TCP_DISCONNECT_ABORTIVE_CLOSE (0x02) * Abortive disconnect. See MiniportTcpOffloadDisconnect for details. */ uint16_t disconnect_flags; uint32_t seq_no; } __packed nvsp_2_msg_send_chimney_pkt; /* * NvspMessage2TypeSendChimneyPacketComplete */ typedef struct nvsp_2_msg_send_chimney_pkt_complete_ { /* * The NDIS_STATUS for the chimney send */ uint32_t status; /* * Number of bytes that have been sent to the peer (and ACKed by the peer). */ uint32_t bytes_transferred; } __packed nvsp_2_msg_send_chimney_pkt_complete; #define NVSP_1_CHIMNEY_RECV_FLAG_NO_PUSH 0x0001u #define NVSP_1_CHIMNEY_RECV_INVALID_OOB_INDEX 0xffffu /* * NvspMessage2TypePostChimneyRecvRequest */ typedef struct nvsp_2_msg_post_chimney_rx_request_ { /* * Identify the TCP connection which this chimney receive request * is for. */ uint32_t vsp_tcp_handle; /* * OOB Data Index * This an index to the OOB data buffer. If the index is 0xFFFFFFFF, * then there is no OOB data. * * This field shall be always 0xFFFFFFFF for now. It is reserved for * the future. */ uint32_t oob_data_index; /* * Bit 0 * When it is set, this is a "no-push" receive. * When it is clear, this is a "push" receive. * * Bit 1-15: Reserved and shall be zero */ uint16_t flags; /* * For debugging and diagnoses purpose. * The SeqNo is per TCP connection and starts from 0. */ uint32_t seq_no; } __packed nvsp_2_msg_post_chimney_rx_request; /* * NvspMessage2TypePostChimneyRecvRequestComplete */ typedef struct nvsp_2_msg_post_chimney_rx_request_complete_ { /* * The NDIS_STATUS for the chimney send */ uint32_t status; /* * Number of bytes that have been sent to the peer (and ACKed by * the peer). 
*/ uint32_t bytes_xferred; } __packed nvsp_2_msg_post_chimney_rx_request_complete; /* * NvspMessage2TypeAllocateReceiveBuffer */ typedef struct nvsp_2_msg_alloc_rx_buf_ { /* * Allocation ID to match the allocation request and response */ uint32_t allocation_id; /* * Length of the VM shared memory receive buffer that needs to * be allocated */ uint32_t length; } __packed nvsp_2_msg_alloc_rx_buf; /* * NvspMessage2TypeAllocateReceiveBufferComplete */ typedef struct nvsp_2_msg_alloc_rx_buf_complete_ { /* * The NDIS_STATUS code for buffer allocation */ uint32_t status; /* * Allocation ID from NVSP_2_MESSAGE_ALLOCATE_RECEIVE_BUFFER */ uint32_t allocation_id; /* * GPADL handle for the allocated receive buffer */ uint32_t gpadl_handle; /* * Receive buffer ID that is further used in * NvspMessage2SendVmqRndisPacket */ uint64_t rx_buf_id; } __packed nvsp_2_msg_alloc_rx_buf_complete; /* * NvspMessage2TypeFreeReceiveBuffer */ typedef struct nvsp_2_msg_free_rx_buf_ { /* * Receive buffer ID previous returned in * NvspMessage2TypeAllocateReceiveBufferComplete message */ uint64_t rx_buf_id; } __packed nvsp_2_msg_free_rx_buf; /* * This structure is used in defining the buffers in * NVSP_2_MESSAGE_SEND_VMQ_RNDIS_PACKET structure */ typedef struct nvsp_xfer_page_range_ { /* * Specifies the ID of the receive buffer that has the buffer. This * ID can be the general receive buffer ID specified in * NvspMessage1TypeSendReceiveBuffer or it can be the shared memory * receive buffer ID allocated by the VSC and specified in * NvspMessage2TypeAllocateReceiveBufferComplete message */ uint64_t xfer_page_set_id; /* * Number of bytes */ uint32_t byte_count; /* * Offset in bytes from the beginning of the buffer */ uint32_t byte_offset; } __packed nvsp_xfer_page_range; /* * NvspMessage2SendVmqRndisPacket */ typedef struct nvsp_2_msg_send_vmq_rndis_pkt_ { /* * This field is specified by RNIDS. They assume there's * two different channels of communication. However, * the Network VSP only has one. Therefore, the channel * travels with the RNDIS packet. It must be RMC_DATA */ uint32_t channel_type; /* * Only the Range element corresponding to the RNDIS header of * the first RNDIS message in the multiple RNDIS messages sent * in one NVSP message. Information about the data portions as well * as the subsequent RNDIS messages in the same NVSP message are * embedded in the RNDIS header itself */ nvsp_xfer_page_range range; } __packed nvsp_2_msg_send_vmq_rndis_pkt; /* * This message is used by the VSC to complete * a RNDIS VMQ message to the VSP. At this point, * the initiator of this message can use any resources * associated with the original RNDIS VMQ packet. 
*/ typedef struct nvsp_2_msg_send_vmq_rndis_pkt_complete_ { uint32_t status; } __packed nvsp_2_msg_send_vmq_rndis_pkt_complete; typedef union nvsp_1_msg_uber_ { nvsp_1_msg_send_ndis_version send_ndis_vers; nvsp_1_msg_send_rx_buf send_rx_buf; nvsp_1_msg_send_rx_buf_complete send_rx_buf_complete; nvsp_1_msg_revoke_rx_buf revoke_rx_buf; nvsp_1_msg_send_send_buf send_send_buf; nvsp_1_msg_send_send_buf_complete send_send_buf_complete; nvsp_1_msg_revoke_send_buf revoke_send_buf; nvsp_1_msg_send_rndis_pkt send_rndis_pkt; nvsp_1_msg_send_rndis_pkt_complete send_rndis_pkt_complete; } __packed nvsp_1_msg_uber; typedef union nvsp_2_msg_uber_ { nvsp_2_msg_send_ndis_config send_ndis_config; nvsp_2_msg_send_chimney_buf send_chimney_buf; nvsp_2_msg_send_chimney_buf_complete send_chimney_buf_complete; nvsp_2_msg_revoke_chimney_buf revoke_chimney_buf; nvsp_2_msg_resume_chimney_rx_indication resume_chimney_rx_indication; nvsp_2_msg_terminate_chimney terminate_chimney; nvsp_2_msg_terminate_chimney_complete terminate_chimney_complete; nvsp_2_msg_indicate_chimney_event indicate_chimney_event; nvsp_2_msg_send_chimney_pkt send_chimney_packet; nvsp_2_msg_send_chimney_pkt_complete send_chimney_packet_complete; nvsp_2_msg_post_chimney_rx_request post_chimney_rx_request; nvsp_2_msg_post_chimney_rx_request_complete post_chimney_rx_request_complete; nvsp_2_msg_alloc_rx_buf alloc_rx_buffer; nvsp_2_msg_alloc_rx_buf_complete alloc_rx_buffer_complete; nvsp_2_msg_free_rx_buf free_rx_buffer; nvsp_2_msg_send_vmq_rndis_pkt send_vmq_rndis_pkt; nvsp_2_msg_send_vmq_rndis_pkt_complete send_vmq_rndis_pkt_complete; nvsp_2_msg_alloc_chimney_handle alloc_chimney_handle; nvsp_2_msg_alloc_chimney_handle_complete alloc_chimney_handle_complete; } __packed nvsp_2_msg_uber; typedef union nvsp_all_msgs_ { nvsp_msg_init_uber init_msgs; nvsp_1_msg_uber vers_1_msgs; nvsp_2_msg_uber vers_2_msgs; } __packed nvsp_all_msgs; /* * ALL Messages */ typedef struct nvsp_msg_ { nvsp_msg_hdr hdr; nvsp_all_msgs msgs; } __packed nvsp_msg; /* * The following arguably belongs in a separate header file */ /* * Defines */ #define NETVSC_SEND_BUFFER_SIZE (1024*1024*15) /* 15M */ #define NETVSC_SEND_BUFFER_ID 0xface #define NETVSC_RECEIVE_BUFFER_SIZE_LEGACY (1024*1024*15) /* 15MB */ #define NETVSC_RECEIVE_BUFFER_SIZE (1024*1024*16) /* 16MB */ #define NETVSC_RECEIVE_BUFFER_ID 0xcafe #define NETVSC_RECEIVE_SG_COUNT 1 /* Preallocated receive packets */ #define NETVSC_RECEIVE_PACKETLIST_COUNT 256 /* * Maximum MTU we permit to be configured for a netvsc interface. * When the code was developed, a max MTU of 12232 was tested and * proven to work. 9K is a reasonable maximum for an Ethernet. 
*/ #define NETVSC_MAX_CONFIGURABLE_MTU (9 * 1024) #define NETVSC_PACKET_SIZE PAGE_SIZE /* * Data types */ /* * Per netvsc channel-specific */ typedef struct netvsc_dev_ { struct hv_device *dev; int num_outstanding_sends; /* Send buffer allocated by us but manages by NetVSP */ void *send_buf; uint32_t send_buf_size; uint32_t send_buf_gpadl_handle; uint32_t send_section_size; uint32_t send_section_count; unsigned long bitsmap_words; unsigned long *send_section_bitsmap; /* Receive buffer allocated by us but managed by NetVSP */ void *rx_buf; uint32_t rx_buf_size; uint32_t rx_buf_gpadl_handle; uint32_t rx_section_count; nvsp_1_rx_buf_section *rx_sections; /* Used for NetVSP initialization protocol */ struct sema channel_init_sema; nvsp_msg channel_init_packet; nvsp_msg revoke_packet; /*uint8_t hw_mac_addr[HW_MACADDR_LEN];*/ /* Holds rndis device info */ void *extension; hv_bool_uint8_t destroy; /* Negotiated NVSP version */ uint32_t nvsp_version; uint8_t callback_buf[NETVSC_PACKET_SIZE]; } netvsc_dev; typedef void (*pfn_on_send_rx_completion)(void *); #define NETVSC_DEVICE_RING_BUFFER_SIZE (128 * PAGE_SIZE) #define NETVSC_PACKET_MAXPAGE 32 #define NETVSC_VLAN_PRIO_MASK 0xe000 #define NETVSC_VLAN_PRIO_SHIFT 13 #define NETVSC_VLAN_VID_MASK 0x0fff #define TYPE_IPV4 2 #define TYPE_IPV6 4 #define TYPE_TCP 2 #define TYPE_UDP 4 #define TRANSPORT_TYPE_NOT_IP 0 #define TRANSPORT_TYPE_IPV4_TCP ((TYPE_IPV4 << 16) | TYPE_TCP) #define TRANSPORT_TYPE_IPV4_UDP ((TYPE_IPV4 << 16) | TYPE_UDP) #define TRANSPORT_TYPE_IPV6_TCP ((TYPE_IPV6 << 16) | TYPE_TCP) #define TRANSPORT_TYPE_IPV6_UDP ((TYPE_IPV6 << 16) | TYPE_UDP) #ifdef __LP64__ #define BITS_PER_LONG 64 #else #define BITS_PER_LONG 32 #endif typedef struct netvsc_packet_ { struct hv_device *device; hv_bool_uint8_t is_data_pkt; /* One byte */ uint16_t vlan_tci; uint32_t status; /* Completion */ union { struct { uint64_t rx_completion_tid; void *rx_completion_context; /* This is no longer used */ pfn_on_send_rx_completion on_rx_completion; } rx; struct { uint64_t send_completion_tid; void *send_completion_context; /* Still used in netvsc and filter code */ pfn_on_send_rx_completion on_send_completion; } send; } compl; uint32_t send_buf_section_idx; uint32_t send_buf_section_size; void *rndis_mesg; uint32_t tot_data_buf_len; void *data; uint32_t page_buf_count; hv_vmbus_page_buffer page_buffers[NETVSC_PACKET_MAXPAGE]; } netvsc_packet; typedef struct { uint8_t mac_addr[6]; /* Assumption unsigned long */ hv_bool_uint8_t link_state; } netvsc_device_info; +struct hn_txdesc; +SLIST_HEAD(hn_txdesc_list, hn_txdesc); + /* * Device-specific softc structure */ typedef struct hn_softc { struct ifnet *hn_ifp; struct ifmedia hn_media; device_t hn_dev; uint8_t hn_unit; int hn_carrier; int hn_if_flags; struct mtx hn_lock; int hn_initdone; /* See hv_netvsc_drv_freebsd.c for rules on how to use */ int temp_unusable; struct hv_device *hn_dev_obj; netvsc_dev *net_dev; + int hn_txdesc_cnt; + struct hn_txdesc *hn_txdesc; + bus_dma_tag_t hn_tx_data_dtag; + bus_dma_tag_t hn_tx_rndis_dtag; + int hn_tx_chimney_size; + int hn_tx_chimney_max; + + struct mtx hn_txlist_spin; + struct hn_txdesc_list hn_txlist; + int hn_txdesc_avail; + int hn_txeof; + struct lro_ctrl hn_lro; int hn_lro_hiwat; /* Trust tcp segments verification on host side */ int hn_trust_hosttcp; u_long hn_csum_ip; u_long hn_csum_tcp; u_long hn_csum_trusted; u_long hn_lro_tried; u_long hn_small_pkts; + u_long hn_no_txdescs; + u_long hn_send_failed; + u_long hn_txdma_failed; + u_long hn_tx_collapsed; + u_long hn_tx_chimney; 
} hn_softc_t; /* * Externs */ extern int hv_promisc_mode; void netvsc_linkstatus_callback(struct hv_device *device_obj, uint32_t status); void netvsc_xmit_completion(void *context); void hv_nv_on_receive_completion(struct hv_device *device, uint64_t tid, uint32_t status); netvsc_dev *hv_nv_on_device_add(struct hv_device *device, void *additional_info); int hv_nv_on_device_remove(struct hv_device *device, boolean_t destroy_channel); int hv_nv_on_send(struct hv_device *device, netvsc_packet *pkt); int hv_nv_get_next_send_section(netvsc_dev *net_dev); #endif /* __HV_NET_VSC_H__ */ Index: projects/clang380-import/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c =================================================================== --- projects/clang380-import/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c (revision 294776) +++ projects/clang380-import/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c (revision 294777) @@ -1,1574 +1,1930 @@ /*- * Copyright (c) 2010-2012 Citrix Inc. * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (c) 2004-2006 Kip Macy * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet6.h" #include "opt_inet.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "hv_net_vsc.h" #include "hv_rndis.h" #include "hv_rndis_filter.h" /* Short for Hyper-V network interface */ #define NETVSC_DEVNAME "hn" /* * It looks like offset 0 of buf is reserved to hold the softc pointer. * The sc pointer evidently not needed, and is not presently populated. * The packet offset is where the netvsc_packet starts in the buffer. */ #define HV_NV_SC_PTR_OFFSET_IN_BUF 0 #define HV_NV_PACKET_OFFSET_IN_BUF 16 +/* YYY should get it from the underlying channel */ +#define HN_TX_DESC_CNT 512 + +#define HN_RNDIS_MSG_LEN \ + (sizeof(rndis_msg) + \ + RNDIS_VLAN_PPI_SIZE + \ + RNDIS_TSO_PPI_SIZE + \ + RNDIS_CSUM_PPI_SIZE) +#define HN_RNDIS_MSG_BOUNDARY PAGE_SIZE +#define HN_RNDIS_MSG_ALIGN CACHE_LINE_SIZE + +#define HN_TX_DATA_BOUNDARY PAGE_SIZE +#define HN_TX_DATA_MAXSIZE IP_MAXPACKET +#define HN_TX_DATA_SEGSIZE PAGE_SIZE +#define HN_TX_DATA_SEGCNT_MAX \ + (NETVSC_PACKET_MAXPAGE - HV_RF_NUM_TX_RESERVED_PAGE_BUFS) + +struct hn_txdesc { + SLIST_ENTRY(hn_txdesc) link; + struct mbuf *m; + struct hn_softc *sc; + int refs; + uint32_t flags; /* HN_TXD_FLAG_ */ + netvsc_packet netvsc_pkt; /* XXX to be removed */ + + bus_dmamap_t data_dmap; + + bus_addr_t rndis_msg_paddr; + rndis_msg *rndis_msg; + bus_dmamap_t rndis_msg_dmap; +}; + +#define HN_TXD_FLAG_ONLIST 0x1 +#define HN_TXD_FLAG_DMAMAP 0x2 + /* * A unified flag for all outbound check sum flags is useful, * and it helps avoiding unnecessary check sum calculation in * network forwarding scenario. */ #define HV_CSUM_FOR_OUTBOUND \ (CSUM_IP|CSUM_IP_UDP|CSUM_IP_TCP|CSUM_IP_SCTP|CSUM_IP_TSO| \ CSUM_IP_ISCSI|CSUM_IP6_UDP|CSUM_IP6_TCP|CSUM_IP6_SCTP| \ CSUM_IP6_TSO|CSUM_IP6_ISCSI) /* XXX move to netinet/tcp_lro.h */ #define HN_LRO_HIWAT_MAX 65535 #define HN_LRO_HIWAT_DEF HN_LRO_HIWAT_MAX /* YYY 2*MTU is a bit rough, but should be good enough. */ #define HN_LRO_HIWAT_MTULIM(ifp) (2 * (ifp)->if_mtu) #define HN_LRO_HIWAT_ISVALID(sc, hiwat) \ ((hiwat) >= HN_LRO_HIWAT_MTULIM((sc)->hn_ifp) || \ (hiwat) <= HN_LRO_HIWAT_MAX) /* * Be aware that this sleepable mutex will exhibit WITNESS errors when * certain TCP and ARP code paths are taken. This appears to be a * well-known condition, as all other drivers checked use a sleeping * mutex to protect their transmit paths. * Also Be aware that mutexes do not play well with semaphores, and there * is a conflicting semaphore in a certain channel code path. 
*/ #define NV_LOCK_INIT(_sc, _name) \ mtx_init(&(_sc)->hn_lock, _name, MTX_NETWORK_LOCK, MTX_DEF) #define NV_LOCK(_sc) mtx_lock(&(_sc)->hn_lock) #define NV_LOCK_ASSERT(_sc) mtx_assert(&(_sc)->hn_lock, MA_OWNED) #define NV_UNLOCK(_sc) mtx_unlock(&(_sc)->hn_lock) #define NV_LOCK_DESTROY(_sc) mtx_destroy(&(_sc)->hn_lock) /* * Globals */ int hv_promisc_mode = 0; /* normal mode by default */ /* Trust tcp segements verification on host side. */ -static int hn_trust_hosttcp = 0; +static int hn_trust_hosttcp = 1; TUNABLE_INT("dev.hn.trust_hosttcp", &hn_trust_hosttcp); +#if __FreeBSD_version >= 1100045 +/* Limit TSO burst size */ +static int hn_tso_maxlen = 0; +TUNABLE_INT("dev.hn.tso_maxlen", &hn_tso_maxlen); +#endif + +/* Limit chimney send size */ +static int hn_tx_chimney_size = 0; +TUNABLE_INT("dev.hn.tx_chimney_size", &hn_tx_chimney_size); + /* * Forward declarations */ static void hn_stop(hn_softc_t *sc); static void hn_ifinit_locked(hn_softc_t *sc); static void hn_ifinit(void *xsc); static int hn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data); -static int hn_start_locked(struct ifnet *ifp); +static void hn_start_locked(struct ifnet *ifp); static void hn_start(struct ifnet *ifp); static int hn_ifmedia_upd(struct ifnet *ifp); static void hn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr); #ifdef HN_LRO_HIWAT static int hn_lro_hiwat_sysctl(SYSCTL_HANDLER_ARGS); #endif +static int hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS); static int hn_check_iplen(const struct mbuf *, int); +static int hn_create_tx_ring(struct hn_softc *sc); +static void hn_destroy_tx_ring(struct hn_softc *sc); static __inline void hn_set_lro_hiwat(struct hn_softc *sc, int hiwat) { sc->hn_lro_hiwat = hiwat; #ifdef HN_LRO_HIWAT sc->hn_lro.lro_hiwat = sc->hn_lro_hiwat; #endif } /* * NetVsc get message transport protocol type */ static uint32_t get_transport_proto_type(struct mbuf *m_head) { uint32_t ret_val = TRANSPORT_TYPE_NOT_IP; uint16_t ether_type = 0; int ether_len = 0; struct ether_vlan_header *eh; #ifdef INET struct ip *iph; #endif #ifdef INET6 struct ip6_hdr *ip6; #endif eh = mtod(m_head, struct ether_vlan_header*); if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) { ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN; ether_type = eh->evl_proto; } else { ether_len = ETHER_HDR_LEN; ether_type = eh->evl_encap_proto; } switch (ntohs(ether_type)) { #ifdef INET6 case ETHERTYPE_IPV6: ip6 = (struct ip6_hdr *)(m_head->m_data + ether_len); if (IPPROTO_TCP == ip6->ip6_nxt) { ret_val = TRANSPORT_TYPE_IPV6_TCP; } else if (IPPROTO_UDP == ip6->ip6_nxt) { ret_val = TRANSPORT_TYPE_IPV6_UDP; } break; #endif #ifdef INET case ETHERTYPE_IP: iph = (struct ip *)(m_head->m_data + ether_len); if (IPPROTO_TCP == iph->ip_p) { ret_val = TRANSPORT_TYPE_IPV4_TCP; } else if (IPPROTO_UDP == iph->ip_p) { ret_val = TRANSPORT_TYPE_IPV4_UDP; } break; #endif default: ret_val = TRANSPORT_TYPE_NOT_IP; break; } return (ret_val); } static int hn_ifmedia_upd(struct ifnet *ifp __unused) { return EOPNOTSUPP; } static void hn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr) { struct hn_softc *sc = ifp->if_softc; ifmr->ifm_status = IFM_AVALID; ifmr->ifm_active = IFM_ETHER; if (!sc->hn_carrier) { ifmr->ifm_active |= IFM_NONE; return; } ifmr->ifm_status |= IFM_ACTIVE; ifmr->ifm_active |= IFM_10G_T | IFM_FDX; } /* {F8615163-DF3E-46c5-913F-F2D2F965ED0E} */ static const hv_guid g_net_vsc_device_type = { .data = {0x63, 0x51, 0x61, 0xF8, 0x3E, 0xDF, 0xc5, 0x46, 0x91, 0x3F, 0xF2, 0xD2, 0xF9, 0x65, 0xED, 0x0E} }; /* * Standard probe entry point. 
* */ static int netvsc_probe(device_t dev) { const char *p; p = vmbus_get_type(dev); if (!memcmp(p, &g_net_vsc_device_type.data, sizeof(hv_guid))) { device_set_desc(dev, "Synthetic Network Interface"); if (bootverbose) printf("Netvsc probe... DONE \n"); return (BUS_PROBE_DEFAULT); } return (ENXIO); } /* * Standard attach entry point. * * Called when the driver is loaded. It allocates needed resources, * and initializes the "hardware" and software. */ static int netvsc_attach(device_t dev) { struct hv_device *device_ctx = vmbus_get_devctx(dev); netvsc_device_info device_info; hn_softc_t *sc; int unit = device_get_unit(dev); - struct ifnet *ifp; + struct ifnet *ifp = NULL; struct sysctl_oid_list *child; struct sysctl_ctx_list *ctx; - int ret; + int error; +#if __FreeBSD_version >= 1100045 + int tso_maxlen; +#endif sc = device_get_softc(dev); if (sc == NULL) { return (ENOMEM); } bzero(sc, sizeof(hn_softc_t)); sc->hn_unit = unit; sc->hn_dev = dev; sc->hn_lro_hiwat = HN_LRO_HIWAT_DEF; sc->hn_trust_hosttcp = hn_trust_hosttcp; + error = hn_create_tx_ring(sc); + if (error) + goto failed; + NV_LOCK_INIT(sc, "NetVSCLock"); sc->hn_dev_obj = device_ctx; ifp = sc->hn_ifp = if_alloc(IFT_ETHER); ifp->if_softc = sc; if_initname(ifp, device_get_name(dev), device_get_unit(dev)); ifp->if_dunit = unit; ifp->if_dname = NETVSC_DEVNAME; ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST; ifp->if_ioctl = hn_ioctl; ifp->if_start = hn_start; ifp->if_init = hn_ifinit; /* needed by hv_rf_on_device_add() code */ ifp->if_mtu = ETHERMTU; IFQ_SET_MAXLEN(&ifp->if_snd, 512); ifp->if_snd.ifq_drv_maxlen = 511; IFQ_SET_READY(&ifp->if_snd); ifmedia_init(&sc->hn_media, 0, hn_ifmedia_upd, hn_ifmedia_sts); ifmedia_add(&sc->hn_media, IFM_ETHER | IFM_AUTO, 0, NULL); ifmedia_set(&sc->hn_media, IFM_ETHER | IFM_AUTO); /* XXX ifmedia_set really should do this for us */ sc->hn_media.ifm_media = sc->hn_media.ifm_cur->ifm_media; /* * Tell upper layers that we support full VLAN capability. */ ifp->if_hdrlen = sizeof(struct ether_vlan_header); ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO | IFCAP_LRO; ifp->if_capenable |= IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO | IFCAP_LRO; /* * Only enable UDP checksum offloading when it is on 2012R2 or * later. UDP checksum offloading doesn't work on earlier * Windows releases. 
*/ if (hv_vmbus_protocal_version >= HV_VMBUS_VERSION_WIN8_1) ifp->if_hwassist = CSUM_TCP | CSUM_UDP | CSUM_TSO; else ifp->if_hwassist = CSUM_TCP | CSUM_TSO; - ret = hv_rf_on_device_add(device_ctx, &device_info); - if (ret != 0) { - if_free(ifp); + error = hv_rf_on_device_add(device_ctx, &device_info); + if (error) + goto failed; - return (ret); - } if (device_info.link_state == 0) { sc->hn_carrier = 1; } #if defined(INET) || defined(INET6) tcp_lro_init(&sc->hn_lro); /* Driver private LRO settings */ sc->hn_lro.ifp = ifp; #ifdef HN_LRO_HIWAT sc->hn_lro.lro_hiwat = sc->hn_lro_hiwat; #endif #endif /* INET || INET6 */ +#if __FreeBSD_version >= 1100045 + tso_maxlen = hn_tso_maxlen; + if (tso_maxlen <= 0 || tso_maxlen > IP_MAXPACKET) + tso_maxlen = IP_MAXPACKET; + + ifp->if_hw_tsomaxsegcount = HN_TX_DATA_SEGCNT_MAX; + ifp->if_hw_tsomaxsegsize = PAGE_SIZE; + ifp->if_hw_tsomax = tso_maxlen - + (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN); +#endif + ether_ifattach(ifp, device_info.mac_addr); +#if __FreeBSD_version >= 1100045 + if_printf(ifp, "TSO: %u/%u/%u\n", ifp->if_hw_tsomax, + ifp->if_hw_tsomaxsegcount, ifp->if_hw_tsomaxsegsize); +#endif + + sc->hn_tx_chimney_max = sc->net_dev->send_section_size; + sc->hn_tx_chimney_size = sc->hn_tx_chimney_max; + if (hn_tx_chimney_size > 0 && + hn_tx_chimney_size < sc->hn_tx_chimney_max) + sc->hn_tx_chimney_size = hn_tx_chimney_size; + ctx = device_get_sysctl_ctx(dev); child = SYSCTL_CHILDREN(device_get_sysctl_tree(dev)); SYSCTL_ADD_U64(ctx, child, OID_AUTO, "lro_queued", CTLFLAG_RW, &sc->hn_lro.lro_queued, 0, "LRO queued"); SYSCTL_ADD_U64(ctx, child, OID_AUTO, "lro_flushed", CTLFLAG_RW, &sc->hn_lro.lro_flushed, 0, "LRO flushed"); SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "lro_tried", CTLFLAG_RW, &sc->hn_lro_tried, "# of LRO tries"); #ifdef HN_LRO_HIWAT SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "lro_hiwat", CTLTYPE_INT | CTLFLAG_RW, sc, 0, hn_lro_hiwat_sysctl, "I", "LRO high watermark"); #endif SYSCTL_ADD_INT(ctx, child, OID_AUTO, "trust_hosttcp", CTLFLAG_RW, &sc->hn_trust_hosttcp, 0, "Trust tcp segement verification on host side, " "when csum info is missing"); SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_ip", CTLFLAG_RW, &sc->hn_csum_ip, "RXCSUM IP"); SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_tcp", CTLFLAG_RW, &sc->hn_csum_tcp, "RXCSUM TCP"); SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_trusted", CTLFLAG_RW, &sc->hn_csum_trusted, "# of TCP segements that we trust host's csum verification"); SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "small_pkts", CTLFLAG_RW, &sc->hn_small_pkts, "# of small packets received"); + SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "no_txdescs", + CTLFLAG_RW, &sc->hn_no_txdescs, "# of times short of TX descs"); + SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "send_failed", + CTLFLAG_RW, &sc->hn_send_failed, "# of hyper-v sending failure"); + SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "txdma_failed", + CTLFLAG_RW, &sc->hn_txdma_failed, "# of TX DMA failure"); + SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_collapsed", + CTLFLAG_RW, &sc->hn_tx_collapsed, "# of TX mbuf collapsed"); + SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "tx_chimney", + CTLFLAG_RW, &sc->hn_tx_chimney, "# of chimney send"); + SYSCTL_ADD_INT(ctx, child, OID_AUTO, "txdesc_cnt", + CTLFLAG_RD, &sc->hn_txdesc_cnt, 0, "# of total TX descs"); + SYSCTL_ADD_INT(ctx, child, OID_AUTO, "txdesc_avail", + CTLFLAG_RD, &sc->hn_txdesc_avail, 0, "# of available TX descs"); + SYSCTL_ADD_INT(ctx, child, OID_AUTO, "tx_chimney_max", + CTLFLAG_RD, &sc->hn_tx_chimney_max, 0, + "Chimney send packet size upper boundary"); + 
SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "tx_chimney_size", + CTLTYPE_INT | CTLFLAG_RW, sc, 0, hn_tx_chimney_size_sysctl, + "I", "Chimney send packet size limit"); if (unit == 0) { struct sysctl_ctx_list *dc_ctx; struct sysctl_oid_list *dc_child; devclass_t dc; /* * Add sysctl nodes for devclass */ dc = device_get_devclass(dev); dc_ctx = devclass_get_sysctl_ctx(dc); dc_child = SYSCTL_CHILDREN(devclass_get_sysctl_tree(dc)); SYSCTL_ADD_INT(dc_ctx, dc_child, OID_AUTO, "trust_hosttcp", CTLFLAG_RD, &hn_trust_hosttcp, 0, "Trust tcp segement verification on host side, " "when csum info is missing (global setting)"); + SYSCTL_ADD_INT(dc_ctx, dc_child, OID_AUTO, "tx_chimney_size", + CTLFLAG_RD, &hn_tx_chimney_size, 0, + "Chimney send packet size limit"); +#if __FreeBSD_version >= 1100045 + SYSCTL_ADD_INT(dc_ctx, dc_child, OID_AUTO, "tso_maxlen", + CTLFLAG_RD, &hn_tso_maxlen, 0, "TSO burst limit"); +#endif } return (0); +failed: + hn_destroy_tx_ring(sc); + if (ifp != NULL) + if_free(ifp); + return (error); } /* * Standard detach entry point */ static int netvsc_detach(device_t dev) { struct hn_softc *sc = device_get_softc(dev); struct hv_device *hv_device = vmbus_get_devctx(dev); if (bootverbose) printf("netvsc_detach\n"); /* * XXXKYS: Need to clean up all our * driver state; this is the driver * unloading. */ /* * XXXKYS: Need to stop outgoing traffic and unregister * the netdevice. */ hv_rf_on_device_remove(hv_device, HV_RF_NV_DESTROY_CHANNEL); ifmedia_removeall(&sc->hn_media); #if defined(INET) || defined(INET6) tcp_lro_free(&sc->hn_lro); #endif + hn_destroy_tx_ring(sc); return (0); } /* * Standard shutdown entry point */ static int netvsc_shutdown(device_t dev) { return (0); } +static __inline int +hn_txdesc_dmamap_load(struct hn_softc *sc, struct hn_txdesc *txd, + struct mbuf **m_head, bus_dma_segment_t *segs, int *nsegs) +{ + struct mbuf *m = *m_head; + int error; + + error = bus_dmamap_load_mbuf_sg(sc->hn_tx_data_dtag, txd->data_dmap, + m, segs, nsegs, BUS_DMA_NOWAIT); + if (error == EFBIG) { + struct mbuf *m_new; + + m_new = m_collapse(m, M_NOWAIT, HN_TX_DATA_SEGCNT_MAX); + if (m_new == NULL) + return ENOBUFS; + else + *m_head = m = m_new; + sc->hn_tx_collapsed++; + + error = bus_dmamap_load_mbuf_sg(sc->hn_tx_data_dtag, + txd->data_dmap, m, segs, nsegs, BUS_DMA_NOWAIT); + } + if (!error) { + bus_dmamap_sync(sc->hn_tx_data_dtag, txd->data_dmap, + BUS_DMASYNC_PREWRITE); + txd->flags |= HN_TXD_FLAG_DMAMAP; + } + return error; +} + +static __inline void +hn_txdesc_dmamap_unload(struct hn_softc *sc, struct hn_txdesc *txd) +{ + + if (txd->flags & HN_TXD_FLAG_DMAMAP) { + bus_dmamap_sync(sc->hn_tx_data_dtag, + txd->data_dmap, BUS_DMASYNC_POSTWRITE); + bus_dmamap_unload(sc->hn_tx_data_dtag, + txd->data_dmap); + txd->flags &= ~HN_TXD_FLAG_DMAMAP; + } +} + +static __inline int +hn_txdesc_put(struct hn_softc *sc, struct hn_txdesc *txd) +{ + + KASSERT((txd->flags & HN_TXD_FLAG_ONLIST) == 0, + ("put an onlist txd %#x", txd->flags)); + + KASSERT(txd->refs > 0, ("invalid txd refs %d", txd->refs)); + if (atomic_fetchadd_int(&txd->refs, -1) != 1) + return 0; + + hn_txdesc_dmamap_unload(sc, txd); + if (txd->m != NULL) { + m_freem(txd->m); + txd->m = NULL; + } + + txd->flags |= HN_TXD_FLAG_ONLIST; + + mtx_lock_spin(&sc->hn_txlist_spin); + KASSERT(sc->hn_txdesc_avail >= 0 && + sc->hn_txdesc_avail < sc->hn_txdesc_cnt, + ("txdesc_put: invalid txd avail %d", sc->hn_txdesc_avail)); + sc->hn_txdesc_avail++; + SLIST_INSERT_HEAD(&sc->hn_txlist, txd, link); + mtx_unlock_spin(&sc->hn_txlist_spin); + + return 1; +} + +static 
__inline struct hn_txdesc * +hn_txdesc_get(struct hn_softc *sc) +{ + struct hn_txdesc *txd; + + mtx_lock_spin(&sc->hn_txlist_spin); + txd = SLIST_FIRST(&sc->hn_txlist); + if (txd != NULL) { + KASSERT(sc->hn_txdesc_avail > 0, + ("txdesc_get: invalid txd avail %d", sc->hn_txdesc_avail)); + sc->hn_txdesc_avail--; + SLIST_REMOVE_HEAD(&sc->hn_txlist, link); + } + mtx_unlock_spin(&sc->hn_txlist_spin); + + if (txd != NULL) { + KASSERT(txd->m == NULL && txd->refs == 0 && + (txd->flags & HN_TXD_FLAG_ONLIST), ("invalid txd")); + txd->flags &= ~HN_TXD_FLAG_ONLIST; + txd->refs = 1; + } + return txd; +} + +static __inline void +hn_txdesc_hold(struct hn_txdesc *txd) +{ + + /* 0->1 transition will never work */ + KASSERT(txd->refs > 0, ("invalid refs %d", txd->refs)); + atomic_add_int(&txd->refs, 1); +} + /* * Send completion processing * * Note: It looks like offset 0 of buf is reserved to hold the softc * pointer. The sc pointer is not currently needed in this function, and * it is not presently populated by the TX function. */ void netvsc_xmit_completion(void *context) { - netvsc_packet *packet = (netvsc_packet *)context; - struct mbuf *mb; - uint8_t *buf; + netvsc_packet *packet = context; + struct hn_txdesc *txd; + struct hn_softc *sc; - mb = (struct mbuf *)(uintptr_t)packet->compl.send.send_completion_tid; - buf = ((uint8_t *)packet) - HV_NV_PACKET_OFFSET_IN_BUF; + txd = (struct hn_txdesc *)(uintptr_t) + packet->compl.send.send_completion_tid; - free(buf, M_NETVSC); + sc = txd->sc; + sc->hn_txeof = 1; + hn_txdesc_put(sc, txd); +} - if (mb != NULL) { - m_freem(mb); - } +void +netvsc_channel_rollup(struct hv_device *device_ctx) +{ + struct hn_softc *sc = device_get_softc(device_ctx->device); + struct ifnet *ifp; + + if (!sc->hn_txeof) + return; + + sc->hn_txeof = 0; + ifp = sc->hn_ifp; + NV_LOCK(sc); + ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; + hn_start_locked(ifp); + NV_UNLOCK(sc); } /* * Start a transmit of one or more packets */ -static int +static void hn_start_locked(struct ifnet *ifp) { hn_softc_t *sc = ifp->if_softc; struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev); netvsc_dev *net_dev = sc->net_dev; - device_t dev = device_ctx->device; - uint8_t *buf; - netvsc_packet *packet; - struct mbuf *m_head, *m; - struct mbuf *mc_head = NULL; struct ether_vlan_header *eh; rndis_msg *rndis_mesg; rndis_packet *rndis_pkt; rndis_per_packet_info *rppi; ndis_8021q_info *rppi_vlan_info; rndis_tcp_ip_csum_info *csum_info; rndis_tcp_tso_info *tso_info; int ether_len; - int i; - int num_frags; - int len; - int retries = 0; - int ret = 0; uint32_t rndis_msg_size = 0; uint32_t trans_proto_type; - uint32_t send_buf_section_idx = - NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX; - while (!IFQ_DRV_IS_EMPTY(&sc->hn_ifp->if_snd)) { - IFQ_DRV_DEQUEUE(&sc->hn_ifp->if_snd, m_head); - if (m_head == NULL) { - break; - } + if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) != + IFF_DRV_RUNNING) + return; - len = 0; - num_frags = 0; + while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) { + bus_dma_segment_t segs[HN_TX_DATA_SEGCNT_MAX]; + int error, nsegs, i, send_failed = 0; + struct hn_txdesc *txd; + netvsc_packet *packet; + struct mbuf *m_head; - /* Walk the mbuf list computing total length and num frags */ - for (m = m_head; m != NULL; m = m->m_next) { - if (m->m_len != 0) { - num_frags++; - len += m->m_len; - } - } + IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head); + if (m_head == NULL) + break; - /* - * Reserve the number of pages requested. 
Currently, - * one page is reserved for the message in the RNDIS - * filter packet - */ - num_frags += HV_RF_NUM_TX_RESERVED_PAGE_BUFS; - - /* If exceeds # page_buffers in netvsc_packet */ - if (num_frags > NETVSC_PACKET_MAXPAGE) { - device_printf(dev, "exceed max page buffers,%d,%d\n", - num_frags, NETVSC_PACKET_MAXPAGE); - m_freem(m_head); - if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); - return (EINVAL); + txd = hn_txdesc_get(sc); + if (txd == NULL) { + sc->hn_no_txdescs++; + IF_PREPEND(&ifp->if_snd, m_head); + ifp->if_drv_flags |= IFF_DRV_OACTIVE; + break; } - /* - * Allocate a buffer with space for a netvsc packet plus a - * number of reserved areas. First comes a (currently 16 - * bytes, currently unused) reserved data area. Second is - * the netvsc_packet. Third is an area reserved for an - * rndis_filter_packet struct. Fourth (optional) is a - * rndis_per_packet_info struct. - * Changed malloc to M_NOWAIT to avoid sleep under spin lock. - * No longer reserving extra space for page buffers, as they - * are already part of the netvsc_packet. - */ - buf = malloc(HV_NV_PACKET_OFFSET_IN_BUF + - sizeof(netvsc_packet) + - sizeof(rndis_msg) + - RNDIS_VLAN_PPI_SIZE + - RNDIS_TSO_PPI_SIZE + - RNDIS_CSUM_PPI_SIZE, - M_NETVSC, M_ZERO | M_NOWAIT); - if (buf == NULL) { - device_printf(dev, "hn:malloc packet failed\n"); - m_freem(m_head); - if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); - return (ENOMEM); - } - - packet = (netvsc_packet *)(buf + HV_NV_PACKET_OFFSET_IN_BUF); - *(vm_offset_t *)buf = HV_NV_SC_PTR_OFFSET_IN_BUF; - + packet = &txd->netvsc_pkt; packet->is_data_pkt = TRUE; - - /* Set up the rndis header */ - packet->page_buf_count = num_frags; - /* Initialize it from the mbuf */ - packet->tot_data_buf_len = len; + packet->tot_data_buf_len = m_head->m_pkthdr.len; /* * extension points to the area reserved for the * rndis_filter_packet, which is placed just after * the netvsc_packet (and rppi struct, if present; * length is updated later). */ - packet->rndis_mesg = packet + 1; - rndis_mesg = (rndis_msg *)packet->rndis_mesg; + rndis_mesg = txd->rndis_msg; + /* XXX not necessary */ + memset(rndis_mesg, 0, HN_RNDIS_MSG_LEN); rndis_mesg->ndis_msg_type = REMOTE_NDIS_PACKET_MSG; rndis_pkt = &rndis_mesg->msg.packet; rndis_pkt->data_offset = sizeof(rndis_packet); rndis_pkt->data_length = packet->tot_data_buf_len; rndis_pkt->per_pkt_info_offset = sizeof(rndis_packet); rndis_msg_size = RNDIS_MESSAGE_SIZE(rndis_packet); /* * If the Hyper-V infrastructure needs to embed a VLAN tag, * initialize netvsc_packet and rppi struct values as needed. */ if (m_head->m_flags & M_VLANTAG) { /* * set up some additional fields so the Hyper-V infrastructure will stuff the VLAN tag * into the frame. 
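 *
 * As a minimal sketch (assuming only the structures declared in
 * hv_rndis.h and the layout produced by hv_set_rppi_data() below),
 * the finished message carries the tag as a per-packet-info record:
 *
 *	rndis_msg
 *	  rndis_packet            data_offset bumped by RNDIS_VLAN_PPI_SIZE
 *	  rndis_per_packet_info   type = ieee_8021q_info
 *	  ndis_8021q_info         u1.s1.vlan_id = ether_vtag & 0xfff
 *	  ... packet data ...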
*/ - packet->vlan_tci = m_head->m_pkthdr.ether_vtag; - rndis_msg_size += RNDIS_VLAN_PPI_SIZE; rppi = hv_set_rppi_data(rndis_mesg, RNDIS_VLAN_PPI_SIZE, ieee_8021q_info); /* VLAN info immediately follows rppi struct */ rppi_vlan_info = (ndis_8021q_info *)((char*)rppi + rppi->per_packet_info_offset); /* FreeBSD does not support CFI or priority */ rppi_vlan_info->u1.s1.vlan_id = - packet->vlan_tci & 0xfff; + m_head->m_pkthdr.ether_vtag & 0xfff; } /* Only check the flags for outbound and ignore the ones for inbound */ if (0 == (m_head->m_pkthdr.csum_flags & HV_CSUM_FOR_OUTBOUND)) { goto pre_send; } eh = mtod(m_head, struct ether_vlan_header*); if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) { ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN; } else { ether_len = ETHER_HDR_LEN; } trans_proto_type = get_transport_proto_type(m_head); if (TRANSPORT_TYPE_NOT_IP == trans_proto_type) { goto pre_send; } /* * A TSO packet does not need send side checksum * offload setup. */ if (m_head->m_pkthdr.csum_flags & CSUM_TSO) { goto do_tso; } /* setup checksum offload */ rndis_msg_size += RNDIS_CSUM_PPI_SIZE; rppi = hv_set_rppi_data(rndis_mesg, RNDIS_CSUM_PPI_SIZE, tcpip_chksum_info); csum_info = (rndis_tcp_ip_csum_info *)((char*)rppi + rppi->per_packet_info_offset); if (trans_proto_type & (TYPE_IPV4 << 16)) { csum_info->xmit.is_ipv4 = 1; } else { csum_info->xmit.is_ipv6 = 1; } if (trans_proto_type & TYPE_TCP) { csum_info->xmit.tcp_csum = 1; csum_info->xmit.tcp_header_offset = 0; } else if (trans_proto_type & TYPE_UDP) { csum_info->xmit.udp_csum = 1; } goto pre_send; do_tso: /* setup TCP segmentation offload */ rndis_msg_size += RNDIS_TSO_PPI_SIZE; rppi = hv_set_rppi_data(rndis_mesg, RNDIS_TSO_PPI_SIZE, tcp_large_send_info); tso_info = (rndis_tcp_tso_info *)((char *)rppi + rppi->per_packet_info_offset); tso_info->lso_v2_xmit.type = RNDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE; #ifdef INET if (trans_proto_type & (TYPE_IPV4 << 16)) { struct ip *ip = (struct ip *)(m_head->m_data + ether_len); unsigned long iph_len = ip->ip_hl << 2; struct tcphdr *th = (struct tcphdr *)((caddr_t)ip + iph_len); tso_info->lso_v2_xmit.ip_version = RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV4; ip->ip_len = 0; ip->ip_sum = 0; th->th_sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htons(IPPROTO_TCP)); } #endif #if defined(INET6) && defined(INET) else #endif #ifdef INET6 { struct ip6_hdr *ip6 = (struct ip6_hdr *)(m_head->m_data + ether_len); struct tcphdr *th = (struct tcphdr *)(ip6 + 1); tso_info->lso_v2_xmit.ip_version = RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV6; ip6->ip6_plen = 0; th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0); } #endif tso_info->lso_v2_xmit.tcp_header_offset = 0; tso_info->lso_v2_xmit.mss = m_head->m_pkthdr.tso_segsz; pre_send: rndis_mesg->msg_len = packet->tot_data_buf_len + rndis_msg_size; packet->tot_data_buf_len = rndis_mesg->msg_len; /* send packet with send buffer */ - if (packet->tot_data_buf_len < net_dev->send_section_size) { + if (packet->tot_data_buf_len < sc->hn_tx_chimney_size) { + uint32_t send_buf_section_idx; + send_buf_section_idx = hv_nv_get_next_send_section(net_dev); if (send_buf_section_idx != NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX) { - char *dest = ((char *)net_dev->send_buf + - send_buf_section_idx * - net_dev->send_section_size); + uint8_t *dest = ((uint8_t *)net_dev->send_buf + + (send_buf_section_idx * + net_dev->send_section_size)); memcpy(dest, rndis_mesg, rndis_msg_size); dest += rndis_msg_size; - for (m = m_head; m != NULL; m = m->m_next) { - if (m->m_len) { - memcpy(dest, - (void *)mtod(m, 
vm_offset_t), - m->m_len); - dest += m->m_len; - } - } + m_copydata(m_head, 0, m_head->m_pkthdr.len, + dest); + packet->send_buf_section_idx = send_buf_section_idx; packet->send_buf_section_size = packet->tot_data_buf_len; packet->page_buf_count = 0; + sc->hn_tx_chimney++; goto do_send; } } + error = hn_txdesc_dmamap_load(sc, txd, &m_head, segs, &nsegs); + if (error) { + int freed; + + /* + * This mbuf is not linked w/ the txd yet, so free + * it now. + */ + m_freem(m_head); + freed = hn_txdesc_put(sc, txd); + KASSERT(freed != 0, + ("fail to free txd upon txdma error")); + + sc->hn_txdma_failed++; + if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); + continue; + } + + packet->page_buf_count = nsegs + + HV_RF_NUM_TX_RESERVED_PAGE_BUFS; + /* send packet with page buffer */ - packet->page_buffers[0].pfn = - atop(hv_get_phys_addr(rndis_mesg)); + packet->page_buffers[0].pfn = atop(txd->rndis_msg_paddr); packet->page_buffers[0].offset = - (unsigned long)rndis_mesg & PAGE_MASK; + txd->rndis_msg_paddr & PAGE_MASK; packet->page_buffers[0].length = rndis_msg_size; /* * Fill the page buffers with mbuf info starting at index * HV_RF_NUM_TX_RESERVED_PAGE_BUFS. */ - i = HV_RF_NUM_TX_RESERVED_PAGE_BUFS; - for (m = m_head; m != NULL; m = m->m_next) { - if (m->m_len) { - vm_offset_t paddr = - vtophys(mtod(m, vm_offset_t)); - packet->page_buffers[i].pfn = - paddr >> PAGE_SHIFT; - packet->page_buffers[i].offset = - paddr & (PAGE_SIZE - 1); - packet->page_buffers[i].length = m->m_len; - i++; - } + for (i = 0; i < nsegs; ++i) { + hv_vmbus_page_buffer *pb = &packet->page_buffers[ + i + HV_RF_NUM_TX_RESERVED_PAGE_BUFS]; + + pb->pfn = atop(segs[i].ds_addr); + pb->offset = segs[i].ds_addr & PAGE_MASK; + pb->length = segs[i].ds_len; } packet->send_buf_section_idx = NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX; packet->send_buf_section_size = 0; do_send: + txd->m = m_head; - /* - * If bpf, copy the mbuf chain. This is less expensive than - * it appears; the mbuf clusters are not copied, only their - * reference counts are incremented. - * Needed to avoid a race condition where the completion - * callback is invoked, freeing the mbuf chain, before the - * bpf_mtap code has a chance to run. - */ - if (ifp->if_bpf) { - mc_head = m_copypacket(m_head, M_NOWAIT); - } -retry_send: /* Set the completion routine */ packet->compl.send.on_send_completion = netvsc_xmit_completion; packet->compl.send.send_completion_context = packet; - packet->compl.send.send_completion_tid = (uint64_t)(uintptr_t)m_head; + packet->compl.send.send_completion_tid = + (uint64_t)(uintptr_t)txd; - /* Removed critical_enter(), does not appear necessary */ - ret = hv_nv_on_send(device_ctx, packet); - if (ret == 0) { +again: + /* + * Make sure that txd is not freed before ETHER_BPF_MTAP. + */ + hn_txdesc_hold(txd); + error = hv_nv_on_send(device_ctx, packet); + if (!error) { + ETHER_BPF_MTAP(ifp, m_head); if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); - /* if bpf && mc_head, call bpf_mtap code */ - if (mc_head) { - ETHER_BPF_MTAP(ifp, mc_head); - } - } else { - retries++; - if (retries < 4) { - goto retry_send; - } + } + hn_txdesc_put(sc, txd); - IF_PREPEND(&ifp->if_snd, m_head); - ifp->if_drv_flags |= IFF_DRV_OACTIVE; + if (__predict_false(error)) { + int freed; /* - * Null the mbuf pointer so the completion function - * does not free the mbuf chain. We just pushed the - * mbuf chain back on the if_snd queue. + * This should "really rarely" happen. + * + * XXX Too many RX to be acked or too many sideband + * commands to run? 
Ask netvsc_channel_rollup() + * to kick start later. */ - packet->compl.send.send_completion_tid = 0; + sc->hn_txeof = 1; + if (!send_failed) { + sc->hn_send_failed++; + send_failed = 1; + /* + * Try sending again after setting hn_txeof, + * in case we missed the last + * netvsc_channel_rollup(). + */ + goto again; + } + if_printf(ifp, "send failed\n"); /* - * Release the resources since we will not get any - * send completion + * This mbuf will be prepended; don't free it + * in hn_txdesc_put(); only unload it from the + * DMA map in hn_txdesc_put(), if it was loaded. */ - netvsc_xmit_completion(packet); - if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); - } + txd->m = NULL; + freed = hn_txdesc_put(sc, txd); + KASSERT(freed != 0, + ("fail to free txd upon send error")); - /* if bpf && mc_head, free the mbuf chain copy */ - if (mc_head) { - m_freem(mc_head); + sc->hn_send_failed++; + IF_PREPEND(&ifp->if_snd, m_head); + ifp->if_drv_flags |= IFF_DRV_OACTIVE; + break; } } - - return (ret); } /* * Link up/down notification */ void netvsc_linkstatus_callback(struct hv_device *device_obj, uint32_t status) { hn_softc_t *sc = device_get_softc(device_obj->device); if (sc == NULL) { return; } if (status == 1) { sc->hn_carrier = 1; } else { sc->hn_carrier = 0; } } /* * Append the specified data to the indicated mbuf chain. * Extend the mbuf chain if the new data does not fit in * existing space. * * This is a minor rewrite of m_append() from sys/kern/uipc_mbuf.c. * There should be an equivalent in the kernel mbuf code, * but there does not appear to be one yet. * * Differs from m_append() in that additional mbufs are * allocated with cluster size MJUMPAGESIZE, and filled * accordingly. * * Return 1 if able to complete the job; otherwise 0. */ static int hv_m_append(struct mbuf *m0, int len, c_caddr_t cp) { struct mbuf *m, *n; int remainder, space; for (m = m0; m->m_next != NULL; m = m->m_next) ; remainder = len; space = M_TRAILINGSPACE(m); if (space > 0) { /* * Copy into available space. */ if (space > remainder) space = remainder; bcopy(cp, mtod(m, caddr_t) + m->m_len, space); m->m_len += space; cp += space; remainder -= space; } while (remainder > 0) { /* * Allocate a new mbuf; could check space * and allocate a cluster instead. */ n = m_getjcl(M_NOWAIT, m->m_type, 0, MJUMPAGESIZE); if (n == NULL) break; n->m_len = min(MJUMPAGESIZE, remainder); bcopy(cp, mtod(n, caddr_t), n->m_len); cp += n->m_len; remainder -= n->m_len; m->m_next = n; m = n; } if (m0->m_flags & M_PKTHDR) m0->m_pkthdr.len += len - remainder; return (remainder == 0); } /* * Called when we receive a data packet from the "wire" on the * specified device * * Note: This is no longer used as a callback */ int netvsc_recv(struct hv_device *device_ctx, netvsc_packet *packet, rndis_tcp_ip_csum_info *csum_info) { hn_softc_t *sc = (hn_softc_t *)device_get_softc(device_ctx->device); struct mbuf *m_new; struct ifnet *ifp; device_t dev = device_ctx->device; int size, do_lro = 0; if (sc == NULL) { return (0); /* TODO: KYS how can this be! */ } ifp = sc->hn_ifp; if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) { return (0); } /* * Bail out if packet contains more data than configured MTU. 
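 * Otherwise copy it out; a rough sketch of the sizing strategy used
 * below (MHLEN, MCLBYTES and MJUMPAGESIZE are the stock FreeBSD mbuf
 * sizes, copied here only as an illustration):
 *
 *	len <= MHLEN           m_gethdr() + memcpy into the mbuf itself
 *	len <= MCLBYTES (2K)   m_getjcl() 2K cluster + hv_m_append()
 *	len >  MCLBYTES        m_getjcl() 4K cluster + hv_m_append()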
*/ if (packet->tot_data_buf_len > (ifp->if_mtu + ETHER_HDR_LEN)) { return (0); } else if (packet->tot_data_buf_len <= MHLEN) { m_new = m_gethdr(M_NOWAIT, MT_DATA); if (m_new == NULL) return (0); memcpy(mtod(m_new, void *), packet->data, packet->tot_data_buf_len); m_new->m_pkthdr.len = m_new->m_len = packet->tot_data_buf_len; sc->hn_small_pkts++; } else { /* * Get an mbuf with a cluster. For packets 2K or less, * get a standard 2K cluster. For anything larger, get a * 4K cluster. Any buffers larger than 4K can cause problems * if looped around to the Hyper-V TX channel, so avoid them. */ size = MCLBYTES; if (packet->tot_data_buf_len > MCLBYTES) { /* 4096 */ size = MJUMPAGESIZE; } m_new = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, size); if (m_new == NULL) { device_printf(dev, "alloc mbuf failed.\n"); return (0); } hv_m_append(m_new, packet->tot_data_buf_len, packet->data); } m_new->m_pkthdr.rcvif = ifp; /* receive side checksum offload */ if (NULL != csum_info) { /* IP csum offload */ if (csum_info->receive.ip_csum_succeeded) { m_new->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID); sc->hn_csum_ip++; } /* TCP csum offload */ if (csum_info->receive.tcp_csum_succeeded) { m_new->m_pkthdr.csum_flags |= (CSUM_DATA_VALID | CSUM_PSEUDO_HDR); m_new->m_pkthdr.csum_data = 0xffff; sc->hn_csum_tcp++; } if (csum_info->receive.ip_csum_succeeded && csum_info->receive.tcp_csum_succeeded) do_lro = 1; } else { const struct ether_header *eh; uint16_t etype; int hoff; hoff = sizeof(*eh); if (m_new->m_len < hoff) goto skip; eh = mtod(m_new, struct ether_header *); etype = ntohs(eh->ether_type); if (etype == ETHERTYPE_VLAN) { const struct ether_vlan_header *evl; hoff = sizeof(*evl); if (m_new->m_len < hoff) goto skip; evl = mtod(m_new, struct ether_vlan_header *); etype = ntohs(evl->evl_proto); } if (etype == ETHERTYPE_IP) { int pr; pr = hn_check_iplen(m_new, hoff); if (pr == IPPROTO_TCP) { if (sc->hn_trust_hosttcp) { sc->hn_csum_trusted++; m_new->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR); m_new->m_pkthdr.csum_data = 0xffff; } /* Rely on SW csum verification though... */ do_lro = 1; } } } skip: if ((packet->vlan_tci != 0) && (ifp->if_capenable & IFCAP_VLAN_HWTAGGING) != 0) { m_new->m_pkthdr.ether_vtag = packet->vlan_tci; m_new->m_flags |= M_VLANTAG; } /* * Note: Moved RX completion back to hv_nv_on_receive() so all * messages (not just data messages) will trigger a response. */ if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); if ((ifp->if_capenable & IFCAP_LRO) && do_lro) { #if defined(INET) || defined(INET6) struct lro_ctrl *lro = &sc->hn_lro; if (lro->lro_cnt) { sc->hn_lro_tried++; if (tcp_lro_rx(lro, m_new, 0) == 0) { /* DONE! */ return 0; } } #endif } /* We're not holding the lock here, so don't release it */ (*ifp->if_input)(ifp, m_new); return (0); } void netvsc_recv_rollup(struct hv_device *device_ctx) { #if defined(INET) || defined(INET6) hn_softc_t *sc = device_get_softc(device_ctx->device); struct lro_ctrl *lro = &sc->hn_lro; struct lro_entry *queued; while ((queued = SLIST_FIRST(&lro->lro_active)) != NULL) { SLIST_REMOVE_HEAD(&lro->lro_active, next); tcp_lro_flush(lro, queued); } #endif } /* * Rules for using sc->temp_unusable: * 1. sc->temp_unusable can only be read or written while holding NV_LOCK() * 2. code reading sc->temp_unusable under NV_LOCK(), and finding * sc->temp_unusable set, must release NV_LOCK() and exit * 3. to retain exclusive control of the interface, * sc->temp_unusable must be set by code before releasing NV_LOCK() * 4. 
only code setting sc->temp_unusable can clear sc->temp_unusable * 5. code setting sc->temp_unusable must eventually clear sc->temp_unusable */ /* * Standard ioctl entry point. Called when the user wants to configure * the interface. */ static int hn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { hn_softc_t *sc = ifp->if_softc; struct ifreq *ifr = (struct ifreq *)data; #ifdef INET struct ifaddr *ifa = (struct ifaddr *)data; #endif netvsc_device_info device_info; struct hv_device *hn_dev; int mask, error = 0; int retry_cnt = 500; switch(cmd) { case SIOCSIFADDR: #ifdef INET if (ifa->ifa_addr->sa_family == AF_INET) { ifp->if_flags |= IFF_UP; if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) hn_ifinit(sc); arp_ifinit(ifp, ifa); } else #endif error = ether_ioctl(ifp, cmd, data); break; case SIOCSIFMTU: hn_dev = vmbus_get_devctx(sc->hn_dev); /* Check MTU value change */ if (ifp->if_mtu == ifr->ifr_mtu) break; if (ifr->ifr_mtu > NETVSC_MAX_CONFIGURABLE_MTU) { error = EINVAL; break; } /* Obtain and record requested MTU */ ifp->if_mtu = ifr->ifr_mtu; /* * Make sure that LRO high watermark is still valid, * after MTU change (the 2*MTU limit). */ if (!HN_LRO_HIWAT_ISVALID(sc, sc->hn_lro_hiwat)) hn_set_lro_hiwat(sc, HN_LRO_HIWAT_MTULIM(ifp)); do { NV_LOCK(sc); if (!sc->temp_unusable) { sc->temp_unusable = TRUE; retry_cnt = -1; } NV_UNLOCK(sc); if (retry_cnt > 0) { retry_cnt--; DELAY(5 * 1000); } } while (retry_cnt > 0); if (retry_cnt == 0) { error = EINVAL; break; } /* We must remove and add back the device to cause the new * MTU to take effect. This includes tearing down, but not * deleting the channel, then bringing it back up. */ error = hv_rf_on_device_remove(hn_dev, HV_RF_NV_RETAIN_CHANNEL); if (error) { NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); break; } error = hv_rf_on_device_add(hn_dev, &device_info); if (error) { NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); break; } + sc->hn_tx_chimney_max = sc->net_dev->send_section_size; + if (sc->hn_tx_chimney_size > sc->hn_tx_chimney_max) + sc->hn_tx_chimney_size = sc->hn_tx_chimney_max; hn_ifinit_locked(sc); NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); break; case SIOCSIFFLAGS: do { NV_LOCK(sc); if (!sc->temp_unusable) { sc->temp_unusable = TRUE; retry_cnt = -1; } NV_UNLOCK(sc); if (retry_cnt > 0) { retry_cnt--; DELAY(5 * 1000); } } while (retry_cnt > 0); if (retry_cnt == 0) { error = EINVAL; break; } if (ifp->if_flags & IFF_UP) { /* * If only the state of the PROMISC flag changed, * then just use the 'set promisc mode' command * instead of reinitializing the entire NIC. Doing * a full re-init means reloading the firmware and * waiting for it to start up, which may take a * second or two. */ #ifdef notyet /* Fixme: Promiscuous mode? 
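 * Were this filled in, one plausible shape (hypothetical: it assumes
 * plumbing from the softc to the rndis_device consumed by
 * hv_rf_set_packet_filter() in hv_rndis_filter.c) would toggle
 * NDIS_PACKET_TYPE_PROMISCUOUS in the RNDIS packet filter:
 *
 *	hv_rf_set_packet_filter(rndis_dev,
 *	    NDIS_PACKET_TYPE_DIRECTED | NDIS_PACKET_TYPE_BROADCAST |
 *	    NDIS_PACKET_TYPE_ALL_MULTICAST |
 *	    NDIS_PACKET_TYPE_PROMISCUOUS);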
*/ if (ifp->if_drv_flags & IFF_DRV_RUNNING && ifp->if_flags & IFF_PROMISC && !(sc->hn_if_flags & IFF_PROMISC)) { /* do something here for Hyper-V */ } else if (ifp->if_drv_flags & IFF_DRV_RUNNING && !(ifp->if_flags & IFF_PROMISC) && sc->hn_if_flags & IFF_PROMISC) { /* do something here for Hyper-V */ } else #endif hn_ifinit_locked(sc); } else { if (ifp->if_drv_flags & IFF_DRV_RUNNING) { hn_stop(sc); } } NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); sc->hn_if_flags = ifp->if_flags; error = 0; break; case SIOCSIFCAP: mask = ifr->ifr_reqcap ^ ifp->if_capenable; if (mask & IFCAP_TXCSUM) { if (IFCAP_TXCSUM & ifp->if_capenable) { ifp->if_capenable &= ~IFCAP_TXCSUM; ifp->if_hwassist &= ~(CSUM_TCP | CSUM_UDP); } else { ifp->if_capenable |= IFCAP_TXCSUM; /* * Only enable UDP checksum offloading on * Windows Server 2012R2 or later releases. */ if (hv_vmbus_protocal_version >= HV_VMBUS_VERSION_WIN8_1) { ifp->if_hwassist |= (CSUM_TCP | CSUM_UDP); } else { ifp->if_hwassist |= CSUM_TCP; } } } if (mask & IFCAP_RXCSUM) { if (IFCAP_RXCSUM & ifp->if_capenable) { ifp->if_capenable &= ~IFCAP_RXCSUM; } else { ifp->if_capenable |= IFCAP_RXCSUM; } } if (mask & IFCAP_LRO) ifp->if_capenable ^= IFCAP_LRO; if (mask & IFCAP_TSO4) { ifp->if_capenable ^= IFCAP_TSO4; ifp->if_hwassist ^= CSUM_IP_TSO; } if (mask & IFCAP_TSO6) { ifp->if_capenable ^= IFCAP_TSO6; ifp->if_hwassist ^= CSUM_IP6_TSO; } error = 0; break; case SIOCADDMULTI: case SIOCDELMULTI: #ifdef notyet /* Fixme: Multicast mode? */ if (ifp->if_drv_flags & IFF_DRV_RUNNING) { NV_LOCK(sc); netvsc_setmulti(sc); NV_UNLOCK(sc); error = 0; } #endif error = EINVAL; break; case SIOCSIFMEDIA: case SIOCGIFMEDIA: error = ifmedia_ioctl(ifp, ifr, &sc->hn_media, cmd); break; default: error = ether_ioctl(ifp, cmd, data); break; } return (error); } /* * */ static void hn_stop(hn_softc_t *sc) { struct ifnet *ifp; int ret; struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev); ifp = sc->hn_ifp; if (bootverbose) printf(" Closing Device ...\n"); ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); if_link_state_change(ifp, LINK_STATE_DOWN); sc->hn_initdone = 0; ret = hv_rf_on_close(device_ctx); } /* * FreeBSD transmit entry point */ static void hn_start(struct ifnet *ifp) { hn_softc_t *sc; sc = ifp->if_softc; NV_LOCK(sc); if (sc->temp_unusable) { NV_UNLOCK(sc); return; } hn_start_locked(ifp); NV_UNLOCK(sc); } /* * */ static void hn_ifinit_locked(hn_softc_t *sc) { struct ifnet *ifp; struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev); int ret; ifp = sc->hn_ifp; if (ifp->if_drv_flags & IFF_DRV_RUNNING) { return; } hv_promisc_mode = 1; ret = hv_rf_on_open(device_ctx); if (ret != 0) { return; } else { sc->hn_initdone = 1; } ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; if_link_state_change(ifp, LINK_STATE_UP); } /* * */ static void hn_ifinit(void *xsc) { hn_softc_t *sc = xsc; NV_LOCK(sc); if (sc->temp_unusable) { NV_UNLOCK(sc); return; } sc->temp_unusable = TRUE; NV_UNLOCK(sc); hn_ifinit_locked(sc); NV_LOCK(sc); sc->temp_unusable = FALSE; NV_UNLOCK(sc); } #ifdef LATER /* * */ static void hn_watchdog(struct ifnet *ifp) { hn_softc_t *sc; sc = ifp->if_softc; printf("hn%d: watchdog timeout -- resetting\n", sc->hn_unit); hn_ifinit(sc); /*???*/ if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); } #endif #ifdef HN_LRO_HIWAT static int hn_lro_hiwat_sysctl(SYSCTL_HANDLER_ARGS) { struct hn_softc *sc = arg1; int hiwat, error; hiwat = sc->hn_lro_hiwat; error = sysctl_handle_int(oidp, &hiwat, 0, req); if (error || req->newptr == NULL) return 
error; if (!HN_LRO_HIWAT_ISVALID(sc, hiwat)) return EINVAL; if (sc->hn_lro_hiwat != hiwat) hn_set_lro_hiwat(sc, hiwat); return 0; } #endif /* HN_LRO_HIWAT */ static int +hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS) +{ + struct hn_softc *sc = arg1; + int chimney_size, error; + + chimney_size = sc->hn_tx_chimney_size; + error = sysctl_handle_int(oidp, &chimney_size, 0, req); + if (error || req->newptr == NULL) + return error; + + if (chimney_size > sc->hn_tx_chimney_max || chimney_size <= 0) + return EINVAL; + + if (sc->hn_tx_chimney_size != chimney_size) + sc->hn_tx_chimney_size = chimney_size; + return 0; +} + +static int hn_check_iplen(const struct mbuf *m, int hoff) { const struct ip *ip; int len, iphlen, iplen; const struct tcphdr *th; int thoff; /* TCP data offset */ len = hoff + sizeof(struct ip); /* The packet must be at least the size of an IP header. */ if (m->m_pkthdr.len < len) return IPPROTO_DONE; /* The fixed IP header must reside completely in the first mbuf. */ if (m->m_len < len) return IPPROTO_DONE; ip = mtodo(m, hoff); /* Bound check the packet's stated IP header length. */ iphlen = ip->ip_hl << 2; if (iphlen < sizeof(struct ip)) /* minimum header length */ return IPPROTO_DONE; /* The full IP header must reside completely in the one mbuf. */ if (m->m_len < hoff + iphlen) return IPPROTO_DONE; iplen = ntohs(ip->ip_len); /* * Check that the amount of data in the buffers is at * least as much as the IP header would have us expect. */ if (m->m_pkthdr.len < hoff + iplen) return IPPROTO_DONE; /* * Ignore IP fragments. */ if (ntohs(ip->ip_off) & (IP_OFFMASK | IP_MF)) return IPPROTO_DONE; /* * The TCP/IP or UDP/IP header must be entirely contained within * the first fragment of a packet. */ switch (ip->ip_p) { case IPPROTO_TCP: if (iplen < iphlen + sizeof(struct tcphdr)) return IPPROTO_DONE; if (m->m_len < hoff + iphlen + sizeof(struct tcphdr)) return IPPROTO_DONE; th = (const struct tcphdr *)((const uint8_t *)ip + iphlen); thoff = th->th_off << 2; if (thoff < sizeof(struct tcphdr) || thoff + iphlen > iplen) return IPPROTO_DONE; if (m->m_len < hoff + iphlen + thoff) return IPPROTO_DONE; break; case IPPROTO_UDP: if (iplen < iphlen + sizeof(struct udphdr)) return IPPROTO_DONE; if (m->m_len < hoff + iphlen + sizeof(struct udphdr)) return IPPROTO_DONE; break; default: if (iplen < iphlen) return IPPROTO_DONE; break; } return ip->ip_p; +} + +static void +hn_dma_map_paddr(void *arg, bus_dma_segment_t *segs, int nseg, int error) +{ + bus_addr_t *paddr = arg; + + if (error) + return; + + KASSERT(nseg == 1, ("too many segments %d!", nseg)); + *paddr = segs->ds_addr; +} + +static int +hn_create_tx_ring(struct hn_softc *sc) +{ + bus_dma_tag_t parent_dtag; + int error, i; + + sc->hn_txdesc_cnt = HN_TX_DESC_CNT; + sc->hn_txdesc = malloc(sizeof(struct hn_txdesc) * sc->hn_txdesc_cnt, + M_NETVSC, M_WAITOK | M_ZERO); + SLIST_INIT(&sc->hn_txlist); + mtx_init(&sc->hn_txlist_spin, "hn txlist", NULL, MTX_SPIN); + + parent_dtag = bus_get_dma_tag(sc->hn_dev); + + /* DMA tag for RNDIS messages. 
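+ *
+ * A sketch of the per-descriptor DMA setup this function builds
+ * (names are from this file; this is an illustration, not extra
+ * code):
+ *
+ *	bus_get_dma_tag(dev)	parent tag inherited from vmbus
+ *	  hn_tx_rndis_dtag	1 segment, HN_RNDIS_MSG_LEN max; backs
+ *				txd->rndis_msg via bus_dmamem_alloc()
+ *	  hn_tx_data_dtag	up to HN_TX_DATA_SEGCNT_MAX segments;
+ *				backs txd->data_dmap, loaded per mbuf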
*/ + error = bus_dma_tag_create(parent_dtag, /* parent */ + HN_RNDIS_MSG_ALIGN, /* alignment */ + HN_RNDIS_MSG_BOUNDARY, /* boundary */ + BUS_SPACE_MAXADDR, /* lowaddr */ + BUS_SPACE_MAXADDR, /* highaddr */ + NULL, NULL, /* filter, filterarg */ + HN_RNDIS_MSG_LEN, /* maxsize */ + 1, /* nsegments */ + HN_RNDIS_MSG_LEN, /* maxsegsize */ + 0, /* flags */ + NULL, /* lockfunc */ + NULL, /* lockfuncarg */ + &sc->hn_tx_rndis_dtag); + if (error) { + device_printf(sc->hn_dev, "failed to create rndis dmatag\n"); + return error; + } + + /* DMA tag for data. */ + error = bus_dma_tag_create(parent_dtag, /* parent */ + 1, /* alignment */ + HN_TX_DATA_BOUNDARY, /* boundary */ + BUS_SPACE_MAXADDR, /* lowaddr */ + BUS_SPACE_MAXADDR, /* highaddr */ + NULL, NULL, /* filter, filterarg */ + HN_TX_DATA_MAXSIZE, /* maxsize */ + HN_TX_DATA_SEGCNT_MAX, /* nsegments */ + HN_TX_DATA_SEGSIZE, /* maxsegsize */ + 0, /* flags */ + NULL, /* lockfunc */ + NULL, /* lockfuncarg */ + &sc->hn_tx_data_dtag); + if (error) { + device_printf(sc->hn_dev, "failed to create data dmatag\n"); + return error; + } + + for (i = 0; i < sc->hn_txdesc_cnt; ++i) { + struct hn_txdesc *txd = &sc->hn_txdesc[i]; + + txd->sc = sc; + + /* + * Allocate and load RNDIS messages. + */ + error = bus_dmamem_alloc(sc->hn_tx_rndis_dtag, + (void **)&txd->rndis_msg, + BUS_DMA_WAITOK | BUS_DMA_COHERENT, + &txd->rndis_msg_dmap); + if (error) { + device_printf(sc->hn_dev, + "failed to allocate rndis_msg, %d\n", i); + return error; + } + + error = bus_dmamap_load(sc->hn_tx_rndis_dtag, + txd->rndis_msg_dmap, + txd->rndis_msg, HN_RNDIS_MSG_LEN, + hn_dma_map_paddr, &txd->rndis_msg_paddr, + BUS_DMA_NOWAIT); + if (error) { + device_printf(sc->hn_dev, + "failed to load rndis_msg, %d\n", i); + bus_dmamem_free(sc->hn_tx_rndis_dtag, + txd->rndis_msg, txd->rndis_msg_dmap); + return error; + } + + /* DMA map for TX data. 
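+ *
+ * The map created here follows this lifecycle (assembled from the
+ * surrounding code as an illustration):
+ *
+ *	hn_create_tx_ring()       bus_dmamap_create()
+ *	hn_txdesc_dmamap_load()   bus_dmamap_load_mbuf_sg(), retried
+ *	                          after m_collapse() on EFBIG, then
+ *	                          BUS_DMASYNC_PREWRITE
+ *	hn_txdesc_dmamap_unload() BUS_DMASYNC_POSTWRITE + unload
+ *	hn_destroy_tx_ring()      bus_dmamap_destroy()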
*/ + error = bus_dmamap_create(sc->hn_tx_data_dtag, 0, + &txd->data_dmap); + if (error) { + device_printf(sc->hn_dev, + "failed to allocate tx data dmamap\n"); + bus_dmamap_unload(sc->hn_tx_rndis_dtag, + txd->rndis_msg_dmap); + bus_dmamem_free(sc->hn_tx_rndis_dtag, + txd->rndis_msg, txd->rndis_msg_dmap); + return error; + } + + /* All set, put it to list */ + txd->flags |= HN_TXD_FLAG_ONLIST; + SLIST_INSERT_HEAD(&sc->hn_txlist, txd, link); + } + sc->hn_txdesc_avail = sc->hn_txdesc_cnt; + + return 0; +} + +static void +hn_destroy_tx_ring(struct hn_softc *sc) +{ + struct hn_txdesc *txd; + + while ((txd = SLIST_FIRST(&sc->hn_txlist)) != NULL) { + KASSERT(txd->m == NULL, ("still has mbuf installed")); + KASSERT((txd->flags & HN_TXD_FLAG_DMAMAP) == 0, + ("still dma mapped")); + SLIST_REMOVE_HEAD(&sc->hn_txlist, link); + + bus_dmamap_unload(sc->hn_tx_rndis_dtag, + txd->rndis_msg_dmap); + bus_dmamem_free(sc->hn_tx_rndis_dtag, + txd->rndis_msg, txd->rndis_msg_dmap); + + bus_dmamap_destroy(sc->hn_tx_data_dtag, txd->data_dmap); + } + + if (sc->hn_tx_data_dtag != NULL) + bus_dma_tag_destroy(sc->hn_tx_data_dtag); + if (sc->hn_tx_rndis_dtag != NULL) + bus_dma_tag_destroy(sc->hn_tx_rndis_dtag); + free(sc->hn_txdesc, M_NETVSC); + mtx_destroy(&sc->hn_txlist_spin); } static device_method_t netvsc_methods[] = { /* Device interface */ DEVMETHOD(device_probe, netvsc_probe), DEVMETHOD(device_attach, netvsc_attach), DEVMETHOD(device_detach, netvsc_detach), DEVMETHOD(device_shutdown, netvsc_shutdown), { 0, 0 } }; static driver_t netvsc_driver = { NETVSC_DEVNAME, netvsc_methods, sizeof(hn_softc_t) }; static devclass_t netvsc_devclass; DRIVER_MODULE(hn, vmbus, netvsc_driver, netvsc_devclass, 0, 0); MODULE_VERSION(hn, 1); MODULE_DEPEND(hn, vmbus, 1, 1, 1); Index: projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis.h =================================================================== --- projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis.h (revision 294776) +++ projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis.h (revision 294777) @@ -1,1061 +1,1062 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2010-2012 Citrix Inc. * Copyright (c) 2012 NetApp Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
* * $FreeBSD$ */ #ifndef __HV_RNDIS_H__ #define __HV_RNDIS_H__ /* * NDIS protocol version numbers */ #define NDIS_VERSION_5_0 0x00050000 #define NDIS_VERSION_5_1 0x00050001 #define NDIS_VERSION_6_0 0x00060000 #define NDIS_VERSION_6_1 0x00060001 #define NDIS_VERSION_6_30 0x0006001e #define NDIS_VERSION (NDIS_VERSION_5_1) /* * Status codes */ #define STATUS_SUCCESS (0x00000000L) #define STATUS_UNSUCCESSFUL (0xC0000001L) #define STATUS_PENDING (0x00000103L) #define STATUS_INSUFFICIENT_RESOURCES (0xC000009AL) #define STATUS_BUFFER_OVERFLOW (0x80000005L) #define STATUS_NOT_SUPPORTED (0xC00000BBL) #define RNDIS_STATUS_SUCCESS (STATUS_SUCCESS) #define RNDIS_STATUS_PENDING (STATUS_PENDING) #define RNDIS_STATUS_NOT_RECOGNIZED (0x00010001L) #define RNDIS_STATUS_NOT_COPIED (0x00010002L) #define RNDIS_STATUS_NOT_ACCEPTED (0x00010003L) #define RNDIS_STATUS_CALL_ACTIVE (0x00010007L) #define RNDIS_STATUS_ONLINE (0x40010003L) #define RNDIS_STATUS_RESET_START (0x40010004L) #define RNDIS_STATUS_RESET_END (0x40010005L) #define RNDIS_STATUS_RING_STATUS (0x40010006L) #define RNDIS_STATUS_CLOSED (0x40010007L) #define RNDIS_STATUS_WAN_LINE_UP (0x40010008L) #define RNDIS_STATUS_WAN_LINE_DOWN (0x40010009L) #define RNDIS_STATUS_WAN_FRAGMENT (0x4001000AL) #define RNDIS_STATUS_MEDIA_CONNECT (0x4001000BL) #define RNDIS_STATUS_MEDIA_DISCONNECT (0x4001000CL) #define RNDIS_STATUS_HARDWARE_LINE_UP (0x4001000DL) #define RNDIS_STATUS_HARDWARE_LINE_DOWN (0x4001000EL) #define RNDIS_STATUS_INTERFACE_UP (0x4001000FL) #define RNDIS_STATUS_INTERFACE_DOWN (0x40010010L) #define RNDIS_STATUS_MEDIA_BUSY (0x40010011L) #define RNDIS_STATUS_MEDIA_SPECIFIC_INDICATION (0x40010012L) #define RNDIS_STATUS_WW_INDICATION RNDIS_STATUS_MEDIA_SPECIFIC_INDICATION #define RNDIS_STATUS_LINK_SPEED_CHANGE (0x40010013L) #define RNDIS_STATUS_NOT_RESETTABLE (0x80010001L) #define RNDIS_STATUS_SOFT_ERRORS (0x80010003L) #define RNDIS_STATUS_HARD_ERRORS (0x80010004L) #define RNDIS_STATUS_BUFFER_OVERFLOW (STATUS_BUFFER_OVERFLOW) #define RNDIS_STATUS_FAILURE (STATUS_UNSUCCESSFUL) #define RNDIS_STATUS_RESOURCES (STATUS_INSUFFICIENT_RESOURCES) #define RNDIS_STATUS_CLOSING (0xC0010002L) #define RNDIS_STATUS_BAD_VERSION (0xC0010004L) #define RNDIS_STATUS_BAD_CHARACTERISTICS (0xC0010005L) #define RNDIS_STATUS_ADAPTER_NOT_FOUND (0xC0010006L) #define RNDIS_STATUS_OPEN_FAILED (0xC0010007L) #define RNDIS_STATUS_DEVICE_FAILED (0xC0010008L) #define RNDIS_STATUS_MULTICAST_FULL (0xC0010009L) #define RNDIS_STATUS_MULTICAST_EXISTS (0xC001000AL) #define RNDIS_STATUS_MULTICAST_NOT_FOUND (0xC001000BL) #define RNDIS_STATUS_REQUEST_ABORTED (0xC001000CL) #define RNDIS_STATUS_RESET_IN_PROGRESS (0xC001000DL) #define RNDIS_STATUS_CLOSING_INDICATING (0xC001000EL) #define RNDIS_STATUS_NOT_SUPPORTED (STATUS_NOT_SUPPORTED) #define RNDIS_STATUS_INVALID_PACKET (0xC001000FL) #define RNDIS_STATUS_OPEN_LIST_FULL (0xC0010010L) #define RNDIS_STATUS_ADAPTER_NOT_READY (0xC0010011L) #define RNDIS_STATUS_ADAPTER_NOT_OPEN (0xC0010012L) #define RNDIS_STATUS_NOT_INDICATING (0xC0010013L) #define RNDIS_STATUS_INVALID_LENGTH (0xC0010014L) #define RNDIS_STATUS_INVALID_DATA (0xC0010015L) #define RNDIS_STATUS_BUFFER_TOO_SHORT (0xC0010016L) #define RNDIS_STATUS_INVALID_OID (0xC0010017L) #define RNDIS_STATUS_ADAPTER_REMOVED (0xC0010018L) #define RNDIS_STATUS_UNSUPPORTED_MEDIA (0xC0010019L) #define RNDIS_STATUS_GROUP_ADDRESS_IN_USE (0xC001001AL) #define RNDIS_STATUS_FILE_NOT_FOUND (0xC001001BL) #define RNDIS_STATUS_ERROR_READING_FILE (0xC001001CL) #define RNDIS_STATUS_ALREADY_MAPPED (0xC001001DL) #define 
RNDIS_STATUS_RESOURCE_CONFLICT (0xC001001EL) #define RNDIS_STATUS_NO_CABLE (0xC001001FL) #define RNDIS_STATUS_INVALID_SAP (0xC0010020L) #define RNDIS_STATUS_SAP_IN_USE (0xC0010021L) #define RNDIS_STATUS_INVALID_ADDRESS (0xC0010022L) #define RNDIS_STATUS_VC_NOT_ACTIVATED (0xC0010023L) #define RNDIS_STATUS_DEST_OUT_OF_ORDER (0xC0010024L) #define RNDIS_STATUS_VC_NOT_AVAILABLE (0xC0010025L) #define RNDIS_STATUS_CELLRATE_NOT_AVAILABLE (0xC0010026L) #define RNDIS_STATUS_INCOMPATABLE_QOS (0xC0010027L) #define RNDIS_STATUS_AAL_PARAMS_UNSUPPORTED (0xC0010028L) #define RNDIS_STATUS_NO_ROUTE_TO_DESTINATION (0xC0010029L) #define RNDIS_STATUS_TOKEN_RING_OPEN_ERROR (0xC0011000L) /* * Object Identifiers used by NdisRequest Query/Set Information */ /* * General Objects */ #define RNDIS_OID_GEN_SUPPORTED_LIST 0x00010101 #define RNDIS_OID_GEN_HARDWARE_STATUS 0x00010102 #define RNDIS_OID_GEN_MEDIA_SUPPORTED 0x00010103 #define RNDIS_OID_GEN_MEDIA_IN_USE 0x00010104 #define RNDIS_OID_GEN_MAXIMUM_LOOKAHEAD 0x00010105 #define RNDIS_OID_GEN_MAXIMUM_FRAME_SIZE 0x00010106 #define RNDIS_OID_GEN_LINK_SPEED 0x00010107 #define RNDIS_OID_GEN_TRANSMIT_BUFFER_SPACE 0x00010108 #define RNDIS_OID_GEN_RECEIVE_BUFFER_SPACE 0x00010109 #define RNDIS_OID_GEN_TRANSMIT_BLOCK_SIZE 0x0001010A #define RNDIS_OID_GEN_RECEIVE_BLOCK_SIZE 0x0001010B #define RNDIS_OID_GEN_VENDOR_ID 0x0001010C #define RNDIS_OID_GEN_VENDOR_DESCRIPTION 0x0001010D #define RNDIS_OID_GEN_CURRENT_PACKET_FILTER 0x0001010E #define RNDIS_OID_GEN_CURRENT_LOOKAHEAD 0x0001010F #define RNDIS_OID_GEN_DRIVER_VERSION 0x00010110 #define RNDIS_OID_GEN_MAXIMUM_TOTAL_SIZE 0x00010111 #define RNDIS_OID_GEN_PROTOCOL_OPTIONS 0x00010112 #define RNDIS_OID_GEN_MAC_OPTIONS 0x00010113 #define RNDIS_OID_GEN_MEDIA_CONNECT_STATUS 0x00010114 #define RNDIS_OID_GEN_MAXIMUM_SEND_PACKETS 0x00010115 #define RNDIS_OID_GEN_VENDOR_DRIVER_VERSION 0x00010116 #define RNDIS_OID_GEN_NETWORK_LAYER_ADDRESSES 0x00010118 #define RNDIS_OID_GEN_TRANSPORT_HEADER_OFFSET 0x00010119 #define RNDIS_OID_GEN_MACHINE_NAME 0x0001021A #define RNDIS_OID_GEN_RNDIS_CONFIG_PARAMETER 0x0001021B #define RNDIS_OID_GEN_XMIT_OK 0x00020101 #define RNDIS_OID_GEN_RCV_OK 0x00020102 #define RNDIS_OID_GEN_XMIT_ERROR 0x00020103 #define RNDIS_OID_GEN_RCV_ERROR 0x00020104 #define RNDIS_OID_GEN_RCV_NO_BUFFER 0x00020105 #define RNDIS_OID_GEN_DIRECTED_BYTES_XMIT 0x00020201 #define RNDIS_OID_GEN_DIRECTED_FRAMES_XMIT 0x00020202 #define RNDIS_OID_GEN_MULTICAST_BYTES_XMIT 0x00020203 #define RNDIS_OID_GEN_MULTICAST_FRAMES_XMIT 0x00020204 #define RNDIS_OID_GEN_BROADCAST_BYTES_XMIT 0x00020205 #define RNDIS_OID_GEN_BROADCAST_FRAMES_XMIT 0x00020206 #define RNDIS_OID_GEN_DIRECTED_BYTES_RCV 0x00020207 #define RNDIS_OID_GEN_DIRECTED_FRAMES_RCV 0x00020208 #define RNDIS_OID_GEN_MULTICAST_BYTES_RCV 0x00020209 #define RNDIS_OID_GEN_MULTICAST_FRAMES_RCV 0x0002020A #define RNDIS_OID_GEN_BROADCAST_BYTES_RCV 0x0002020B #define RNDIS_OID_GEN_BROADCAST_FRAMES_RCV 0x0002020C #define RNDIS_OID_GEN_RCV_CRC_ERROR 0x0002020D #define RNDIS_OID_GEN_TRANSMIT_QUEUE_LENGTH 0x0002020E #define RNDIS_OID_GEN_GET_TIME_CAPS 0x0002020F #define RNDIS_OID_GEN_GET_NETCARD_TIME 0x00020210 /* * These are connection-oriented general OIDs. * These replace the above OIDs for connection-oriented media. 
*/ #define RNDIS_OID_GEN_CO_SUPPORTED_LIST 0x00010101 #define RNDIS_OID_GEN_CO_HARDWARE_STATUS 0x00010102 #define RNDIS_OID_GEN_CO_MEDIA_SUPPORTED 0x00010103 #define RNDIS_OID_GEN_CO_MEDIA_IN_USE 0x00010104 #define RNDIS_OID_GEN_CO_LINK_SPEED 0x00010105 #define RNDIS_OID_GEN_CO_VENDOR_ID 0x00010106 #define RNDIS_OID_GEN_CO_VENDOR_DESCRIPTION 0x00010107 #define RNDIS_OID_GEN_CO_DRIVER_VERSION 0x00010108 #define RNDIS_OID_GEN_CO_PROTOCOL_OPTIONS 0x00010109 #define RNDIS_OID_GEN_CO_MAC_OPTIONS 0x0001010A #define RNDIS_OID_GEN_CO_MEDIA_CONNECT_STATUS 0x0001010B #define RNDIS_OID_GEN_CO_VENDOR_DRIVER_VERSION 0x0001010C #define RNDIS_OID_GEN_CO_MINIMUM_LINK_SPEED 0x0001010D #define RNDIS_OID_GEN_CO_GET_TIME_CAPS 0x00010201 #define RNDIS_OID_GEN_CO_GET_NETCARD_TIME 0x00010202 /* * These are connection-oriented statistics OIDs. */ #define RNDIS_OID_GEN_CO_XMIT_PDUS_OK 0x00020101 #define RNDIS_OID_GEN_CO_RCV_PDUS_OK 0x00020102 #define RNDIS_OID_GEN_CO_XMIT_PDUS_ERROR 0x00020103 #define RNDIS_OID_GEN_CO_RCV_PDUS_ERROR 0x00020104 #define RNDIS_OID_GEN_CO_RCV_PDUS_NO_BUFFER 0x00020105 #define RNDIS_OID_GEN_CO_RCV_CRC_ERROR 0x00020201 #define RNDIS_OID_GEN_CO_TRANSMIT_QUEUE_LENGTH 0x00020202 #define RNDIS_OID_GEN_CO_BYTES_XMIT 0x00020203 #define RNDIS_OID_GEN_CO_BYTES_RCV 0x00020204 #define RNDIS_OID_GEN_CO_BYTES_XMIT_OUTSTANDING 0x00020205 #define RNDIS_OID_GEN_CO_NETCARD_LOAD 0x00020206 /* * These are objects for Connection-oriented media call-managers. */ #define RNDIS_OID_CO_ADD_PVC 0xFF000001 #define RNDIS_OID_CO_DELETE_PVC 0xFF000002 #define RNDIS_OID_CO_GET_CALL_INFORMATION 0xFF000003 #define RNDIS_OID_CO_ADD_ADDRESS 0xFF000004 #define RNDIS_OID_CO_DELETE_ADDRESS 0xFF000005 #define RNDIS_OID_CO_GET_ADDRESSES 0xFF000006 #define RNDIS_OID_CO_ADDRESS_CHANGE 0xFF000007 #define RNDIS_OID_CO_SIGNALING_ENABLED 0xFF000008 #define RNDIS_OID_CO_SIGNALING_DISABLED 0xFF000009 /* * 802.3 Objects (Ethernet) */ #define RNDIS_OID_802_3_PERMANENT_ADDRESS 0x01010101 #define RNDIS_OID_802_3_CURRENT_ADDRESS 0x01010102 #define RNDIS_OID_802_3_MULTICAST_LIST 0x01010103 #define RNDIS_OID_802_3_MAXIMUM_LIST_SIZE 0x01010104 #define RNDIS_OID_802_3_MAC_OPTIONS 0x01010105 /* * */ #define NDIS_802_3_MAC_OPTION_PRIORITY 0x00000001 #define RNDIS_OID_802_3_RCV_ERROR_ALIGNMENT 0x01020101 #define RNDIS_OID_802_3_XMIT_ONE_COLLISION 0x01020102 #define RNDIS_OID_802_3_XMIT_MORE_COLLISIONS 0x01020103 #define RNDIS_OID_802_3_XMIT_DEFERRED 0x01020201 #define RNDIS_OID_802_3_XMIT_MAX_COLLISIONS 0x01020202 #define RNDIS_OID_802_3_RCV_OVERRUN 0x01020203 #define RNDIS_OID_802_3_XMIT_UNDERRUN 0x01020204 #define RNDIS_OID_802_3_XMIT_HEARTBEAT_FAILURE 0x01020205 #define RNDIS_OID_802_3_XMIT_TIMES_CRS_LOST 0x01020206 #define RNDIS_OID_802_3_XMIT_LATE_COLLISIONS 0x01020207 /* * RNDIS MP custom OID for test */ #define OID_RNDISMP_GET_RECEIVE_BUFFERS 0xFFA0C90D // Query only /* * Remote NDIS message types */ #define REMOTE_NDIS_PACKET_MSG 0x00000001 #define REMOTE_NDIS_INITIALIZE_MSG 0x00000002 #define REMOTE_NDIS_HALT_MSG 0x00000003 #define REMOTE_NDIS_QUERY_MSG 0x00000004 #define REMOTE_NDIS_SET_MSG 0x00000005 #define REMOTE_NDIS_RESET_MSG 0x00000006 #define REMOTE_NDIS_INDICATE_STATUS_MSG 0x00000007 #define REMOTE_NDIS_KEEPALIVE_MSG 0x00000008 #define REMOTE_CONDIS_MP_CREATE_VC_MSG 0x00008001 #define REMOTE_CONDIS_MP_DELETE_VC_MSG 0x00008002 #define REMOTE_CONDIS_MP_ACTIVATE_VC_MSG 0x00008005 #define REMOTE_CONDIS_MP_DEACTIVATE_VC_MSG 0x00008006 #define REMOTE_CONDIS_INDICATE_STATUS_MSG 0x00008007 /* * Remote NDIS message completion types */ 
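/*
 * Note the pairing between the message types above and the completion
 * types below: each completion code listed here is its request code
 * with the high bit set.  A hypothetical helper (not part of these
 * headers) would be:
 *
 *	#define REMOTE_NDIS_CMPLT(type)	((type) | 0x80000000)
 *
 * e.g. REMOTE_NDIS_CMPLT(REMOTE_NDIS_QUERY_MSG) == 0x80000004 ==
 * REMOTE_NDIS_QUERY_CMPLT.
 */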
#define REMOTE_NDIS_INITIALIZE_CMPLT 0x80000002 #define REMOTE_NDIS_QUERY_CMPLT 0x80000004 #define REMOTE_NDIS_SET_CMPLT 0x80000005 #define REMOTE_NDIS_RESET_CMPLT 0x80000006 #define REMOTE_NDIS_KEEPALIVE_CMPLT 0x80000008 #define REMOTE_CONDIS_MP_CREATE_VC_CMPLT 0x80008001 #define REMOTE_CONDIS_MP_DELETE_VC_CMPLT 0x80008002 #define REMOTE_CONDIS_MP_ACTIVATE_VC_CMPLT 0x80008005 #define REMOTE_CONDIS_MP_DEACTIVATE_VC_CMPLT 0x80008006 /* * Reserved message type for private communication between lower-layer * host driver and remote device, if necessary. */ #define REMOTE_NDIS_BUS_MSG 0xff000001 /* * Defines for DeviceFlags in rndis_initialize_complete */ #define RNDIS_DF_CONNECTIONLESS 0x00000001 #define RNDIS_DF_CONNECTION_ORIENTED 0x00000002 #define RNDIS_DF_RAW_DATA 0x00000004 /* * Remote NDIS medium types. */ #define RNDIS_MEDIUM_802_3 0x00000000 #define RNDIS_MEDIUM_802_5 0x00000001 #define RNDIS_MEDIUM_FDDI 0x00000002 #define RNDIS_MEDIUM_WAN 0x00000003 #define RNDIS_MEDIUM_LOCAL_TALK 0x00000004 #define RNDIS_MEDIUM_ARCNET_RAW 0x00000006 #define RNDIS_MEDIUM_ARCNET_878_2 0x00000007 #define RNDIS_MEDIUM_ATM 0x00000008 #define RNDIS_MEDIUM_WIRELESS_WAN 0x00000009 #define RNDIS_MEDIUM_IRDA 0x0000000a #define RNDIS_MEDIUM_CO_WAN 0x0000000b /* Not a real medium, defined as an upper bound */ #define RNDIS_MEDIUM_MAX 0x0000000d /* * Remote NDIS medium connection states. */ #define RNDIS_MEDIA_STATE_CONNECTED 0x00000000 #define RNDIS_MEDIA_STATE_DISCONNECTED 0x00000001 /* * Remote NDIS version numbers */ #define RNDIS_MAJOR_VERSION 0x00000001 #define RNDIS_MINOR_VERSION 0x00000000 /* * Remote NDIS offload parameters */ #define RNDIS_OBJECT_TYPE_DEFAULT 0x80 #define RNDIS_OFFLOAD_PARAMETERS_REVISION_3 3 #define RNDIS_OFFLOAD_PARAMETERS_NO_CHANGE 0 #define RNDIS_OFFLOAD_PARAMETERS_LSOV2_DISABLED 1 #define RNDIS_OFFLOAD_PARAMETERS_LSOV2_ENABLED 2 #define RNDIS_OFFLOAD_PARAMETERS_LSOV1_ENABLED 2 #define RNDIS_OFFLOAD_PARAMETERS_RSC_DISABLED 1 #define RNDIS_OFFLOAD_PARAMETERS_RSC_ENABLED 2 #define RNDIS_OFFLOAD_PARAMETERS_TX_RX_DISABLED 1 #define RNDIS_OFFLOAD_PARAMETERS_TX_ENABLED_RX_DISABLED 2 #define RNDIS_OFFLOAD_PARAMETERS_RX_ENABLED_TX_DISABLED 3 #define RNDIS_OFFLOAD_PARAMETERS_TX_RX_ENABLED 4 #define RNDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE 1 #define RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV4 0 #define RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV6 1 #define RNDIS_OID_TCP_OFFLOAD_CURRENT_CONFIG 0xFC01020B /* query only */ #define RNDIS_OID_TCP_OFFLOAD_PARAMETERS 0xFC01020C /* set only */ #define RNDIS_OID_TCP_OFFLOAD_HARDWARE_CAPABILITIES 0xFC01020D/* query only */ #define RNDIS_OID_TCP_CONNECTION_OFFLOAD_CURRENT_CONFIG 0xFC01020E /* query only */ #define RNDIS_OID_TCP_CONNECTION_OFFLOAD_HARDWARE_CAPABILITIES 0xFC01020F /* query */ #define RNDIS_OID_OFFLOAD_ENCAPSULATION 0x0101010A /* set/query */ /* * NdisInitialize message */ typedef struct rndis_initialize_request_ { /* RNDIS request ID */ uint32_t request_id; uint32_t major_version; uint32_t minor_version; uint32_t max_xfer_size; } rndis_initialize_request; /* * Response to NdisInitialize */ typedef struct rndis_initialize_complete_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS status */ uint32_t status; uint32_t major_version; uint32_t minor_version; uint32_t device_flags; /* RNDIS medium */ uint32_t medium; uint32_t max_pkts_per_msg; uint32_t max_xfer_size; uint32_t pkt_align_factor; uint32_t af_list_offset; uint32_t af_list_size; } rndis_initialize_complete; /* * Call manager devices only: Information about an address family * supported by the device 
is appended to the response to NdisInitialize. */ typedef struct rndis_co_address_family_ { /* RNDIS AF */ uint32_t address_family; uint32_t major_version; uint32_t minor_version; } rndis_co_address_family; /* * NdisHalt message */ typedef struct rndis_halt_request_ { /* RNDIS request ID */ uint32_t request_id; } rndis_halt_request; /* * NdisQueryRequest message */ typedef struct rndis_query_request_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS OID */ uint32_t oid; uint32_t info_buffer_length; uint32_t info_buffer_offset; /* RNDIS handle */ uint32_t device_vc_handle; } rndis_query_request; /* * Response to NdisQueryRequest */ typedef struct rndis_query_complete_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS status */ uint32_t status; uint32_t info_buffer_length; uint32_t info_buffer_offset; } rndis_query_complete; /* * NdisSetRequest message */ typedef struct rndis_set_request_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS OID */ uint32_t oid; uint32_t info_buffer_length; uint32_t info_buffer_offset; /* RNDIS handle */ uint32_t device_vc_handle; } rndis_set_request; /* * Response to NdisSetRequest */ typedef struct rndis_set_complete_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS status */ uint32_t status; } rndis_set_complete; /* * NdisReset message */ typedef struct rndis_reset_request_ { uint32_t reserved; } rndis_reset_request; /* * Response to NdisReset */ typedef struct rndis_reset_complete_ { /* RNDIS status */ uint32_t status; uint32_t addressing_reset; } rndis_reset_complete; /* * NdisMIndicateStatus message */ typedef struct rndis_indicate_status_ { /* RNDIS status */ uint32_t status; uint32_t status_buf_length; uint32_t status_buf_offset; } rndis_indicate_status; /* * Diagnostic information passed as the status buffer in * rndis_indicate_status messages signifying error conditions. */ typedef struct rndis_diagnostic_info_ { /* RNDIS status */ uint32_t diag_status; uint32_t error_offset; } rndis_diagnostic_info; /* * NdisKeepAlive message */ typedef struct rndis_keepalive_request_ { /* RNDIS request ID */ uint32_t request_id; } rndis_keepalive_request; /* * Response to NdisKeepAlive */ typedef struct rndis_keepalive_complete_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS status */ uint32_t status; } rndis_keepalive_complete; /* * Data message. All offset fields contain byte offsets from the beginning * of the rndis_packet structure. All length fields are in bytes. * VcHandle is set to 0 for connectionless data, otherwise it * contains the VC handle. */ typedef struct rndis_packet_ { uint32_t data_offset; uint32_t data_length; uint32_t oob_data_offset; uint32_t oob_data_length; uint32_t num_oob_data_elements; uint32_t per_pkt_info_offset; uint32_t per_pkt_info_length; /* RNDIS handle */ uint32_t vc_handle; uint32_t reserved; } rndis_packet; typedef struct rndis_packet_ex_ { uint32_t data_offset; uint32_t data_length; uint32_t oob_data_offset; uint32_t oob_data_length; uint32_t num_oob_data_elements; uint32_t per_pkt_info_offset; uint32_t per_pkt_info_length; /* RNDIS handle */ uint32_t vc_handle; uint32_t reserved; uint64_t data_buf_id; uint32_t data_buf_offset; uint64_t next_header_buf_id; uint32_t next_header_byte_offset; uint32_t next_header_byte_count; } rndis_packet_ex; /* * Optional Out of Band data associated with a Data message. */ typedef struct rndis_oobd_ { uint32_t size; /* RNDIS class ID */ uint32_t type; uint32_t class_info_offset; } rndis_oobd; /* * Packet extension field contents associated with a Data message. 
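 *
 * Consumers walk these records as in the sketch below (modelled on
 * hv_get_ppi_data() in hv_rndis_filter.c):
 *
 *	ppi = (rndis_per_packet_info *)((char *)rpkt +
 *	    rpkt->per_pkt_info_offset);
 *	for (len = rpkt->per_pkt_info_length; len > 0; len -= ppi->size) {
 *		if (ppi->type == type)
 *			return ((char *)ppi + ppi->per_packet_info_offset);
 *		ppi = (rndis_per_packet_info *)((char *)ppi + ppi->size);
 *	}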
*/ typedef struct rndis_per_packet_info_ { uint32_t size; uint32_t type; uint32_t per_packet_info_offset; } rndis_per_packet_info; typedef enum ndis_per_pkt_infotype_ { tcpip_chksum_info, ipsec_info, tcp_large_send_info, classification_handle_info, ndis_reserved, sgl_info, ieee_8021q_info, original_pkt_info, pkt_cancel_id, original_netbuf_list, cached_netbuf_list, short_pkt_padding_info, max_perpkt_info } ndis_per_pkt_infotype; typedef struct ndis_8021q_info_ { union { struct { uint32_t user_pri : 3; /* User Priority */ uint32_t cfi : 1; /* Canonical Format ID */ uint32_t vlan_id : 12; uint32_t reserved : 16; } s1; uint32_t value; } u1; } ndis_8021q_info; struct rndis_object_header { uint8_t type; uint8_t revision; uint16_t size; }; typedef struct rndis_offload_params_ { struct rndis_object_header header; uint8_t ipv4_csum; uint8_t tcp_ipv4_csum; uint8_t udp_ipv4_csum; uint8_t tcp_ipv6_csum; uint8_t udp_ipv6_csum; uint8_t lso_v1; uint8_t ip_sec_v1; uint8_t lso_v2_ipv4; uint8_t lso_v2_ipv6; uint8_t tcp_connection_ipv4; uint8_t tcp_connection_ipv6; uint32_t flags; uint8_t ip_sec_v2; uint8_t ip_sec_v2_ipv4; struct { uint8_t rsc_ipv4; uint8_t rsc_ipv6; }; struct { uint8_t encapsulated_packet_task_offload; uint8_t encapsulation_types; }; } rndis_offload_params; typedef struct rndis_tcp_ip_csum_info_ { union { struct { uint32_t is_ipv4:1; uint32_t is_ipv6:1; uint32_t tcp_csum:1; uint32_t udp_csum:1; uint32_t ip_header_csum:1; uint32_t reserved:11; uint32_t tcp_header_offset:10; } xmit; struct { uint32_t tcp_csum_failed:1; uint32_t udp_csum_failed:1; uint32_t ip_csum_failed:1; uint32_t tcp_csum_succeeded:1; uint32_t udp_csum_succeeded:1; uint32_t ip_csum_succeeded:1; uint32_t loopback:1; uint32_t tcp_csum_value_invalid:1; uint32_t ip_csum_value_invalid:1; } receive; uint32_t value; }; } rndis_tcp_ip_csum_info; typedef struct rndis_tcp_tso_info_ { union { struct { uint32_t unused:30; uint32_t type:1; uint32_t reserved2:1; } xmit; struct { uint32_t mss:20; uint32_t tcp_header_offset:10; uint32_t type:1; uint32_t reserved2:1; } lso_v1_xmit; struct { uint32_t tcp_payload:30; uint32_t type:1; uint32_t reserved2:1; } lso_v1_xmit_complete; struct { uint32_t mss:20; uint32_t tcp_header_offset:10; uint32_t type:1; uint32_t ip_version:1; } lso_v2_xmit; struct { uint32_t reserved:30; uint32_t type:1; uint32_t reserved2:1; } lso_v2_xmit_complete; uint32_t value; }; } rndis_tcp_tso_info; #define RNDIS_VLAN_PPI_SIZE (sizeof(rndis_per_packet_info) + \ sizeof(ndis_8021q_info)) #define RNDIS_CSUM_PPI_SIZE (sizeof(rndis_per_packet_info) + \ sizeof(rndis_tcp_ip_csum_info)) #define RNDIS_TSO_PPI_SIZE (sizeof(rndis_per_packet_info) + \ sizeof(rndis_tcp_tso_info)) /* * Format of Information buffer passed in a SetRequest for the OID * OID_GEN_RNDIS_CONFIG_PARAMETER. */ typedef struct rndis_config_parameter_info_ { uint32_t parameter_name_offset; uint32_t parameter_name_length; uint32_t parameter_type; uint32_t parameter_value_offset; uint32_t parameter_value_length; } rndis_config_parameter_info; /* * Values for ParameterType in rndis_config_parameter_info */ #define RNDIS_CONFIG_PARAM_TYPE_INTEGER 0 #define RNDIS_CONFIG_PARAM_TYPE_STRING 2 /* * CONDIS Miniport messages for connection oriented devices * that do not implement a call manager. 
*/ /* * CoNdisMiniportCreateVc message */ typedef struct rcondis_mp_create_vc_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS handle */ uint32_t ndis_vc_handle; } rcondis_mp_create_vc; /* * Response to CoNdisMiniportCreateVc */ typedef struct rcondis_mp_create_vc_complete_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS handle */ uint32_t device_vc_handle; /* RNDIS status */ uint32_t status; } rcondis_mp_create_vc_complete; /* * CoNdisMiniportDeleteVc message */ typedef struct rcondis_mp_delete_vc_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS handle */ uint32_t device_vc_handle; } rcondis_mp_delete_vc; /* * Response to CoNdisMiniportDeleteVc */ typedef struct rcondis_mp_delete_vc_complete_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS status */ uint32_t status; } rcondis_mp_delete_vc_complete; /* * CoNdisMiniportQueryRequest message */ typedef struct rcondis_mp_query_request_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS request type */ uint32_t request_type; /* RNDIS OID */ uint32_t oid; /* RNDIS handle */ uint32_t device_vc_handle; uint32_t info_buf_length; uint32_t info_buf_offset; } rcondis_mp_query_request; /* * CoNdisMiniportSetRequest message */ typedef struct rcondis_mp_set_request_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS request type */ uint32_t request_type; /* RNDIS OID */ uint32_t oid; /* RNDIS handle */ uint32_t device_vc_handle; uint32_t info_buf_length; uint32_t info_buf_offset; } rcondis_mp_set_request; /* * CoNdisIndicateStatus message */ typedef struct rcondis_indicate_status_ { /* RNDIS handle */ uint32_t ndis_vc_handle; /* RNDIS status */ uint32_t status; uint32_t status_buf_length; uint32_t status_buf_offset; } rcondis_indicate_status; /* * CONDIS Call/VC parameters */ typedef struct rcondis_specific_parameters_ { uint32_t parameter_type; uint32_t parameter_length; uint32_t parameter_offset; } rcondis_specific_parameters; typedef struct rcondis_media_parameters_ { uint32_t flags; uint32_t reserved1; uint32_t reserved2; rcondis_specific_parameters media_specific; } rcondis_media_parameters; typedef struct rndis_flowspec_ { uint32_t token_rate; uint32_t token_bucket_size; uint32_t peak_bandwidth; uint32_t latency; uint32_t delay_variation; uint32_t service_type; uint32_t max_sdu_size; uint32_t minimum_policed_size; } rndis_flowspec; typedef struct rcondis_call_manager_parameters_ { rndis_flowspec transmit; rndis_flowspec receive; rcondis_specific_parameters call_mgr_specific; } rcondis_call_manager_parameters; /* * CoNdisMiniportActivateVc message */ typedef struct rcondis_mp_activate_vc_request_ { /* RNDIS request ID */ uint32_t request_id; uint32_t flags; /* RNDIS handle */ uint32_t device_vc_handle; uint32_t media_params_offset; uint32_t media_params_length; uint32_t call_mgr_params_offset; uint32_t call_mgr_params_length; } rcondis_mp_activate_vc_request; /* * Response to CoNdisMiniportActivateVc */ typedef struct rcondis_mp_activate_vc_complete_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS status */ uint32_t status; } rcondis_mp_activate_vc_complete; /* * CoNdisMiniportDeactivateVc message */ typedef struct rcondis_mp_deactivate_vc_request_ { /* RNDIS request ID */ uint32_t request_id; uint32_t flags; /* RNDIS handle */ uint32_t device_vc_handle; } rcondis_mp_deactivate_vc_request; /* * Response to CoNdisMiniportDeactivateVc */ typedef struct rcondis_mp_deactivate_vc_complete_ { /* RNDIS request ID */ uint32_t request_id; /* RNDIS status */ uint32_t status; } rcondis_mp_deactivate_vc_complete; /* * 
union with all of the RNDIS messages */ typedef union rndis_msg_container_ { rndis_packet packet; rndis_initialize_request init_request; rndis_halt_request halt_request; rndis_query_request query_request; rndis_set_request set_request; rndis_reset_request reset_request; rndis_keepalive_request keepalive_request; rndis_indicate_status indicate_status; rndis_initialize_complete init_complete; rndis_query_complete query_complete; rndis_set_complete set_complete; rndis_reset_complete reset_complete; rndis_keepalive_complete keepalive_complete; rcondis_mp_create_vc co_miniport_create_vc; rcondis_mp_delete_vc co_miniport_delete_vc; rcondis_indicate_status co_miniport_status; rcondis_mp_activate_vc_request co_miniport_activate_vc; rcondis_mp_deactivate_vc_request co_miniport_deactivate_vc; rcondis_mp_create_vc_complete co_miniport_create_vc_complete; rcondis_mp_delete_vc_complete co_miniport_delete_vc_complete; rcondis_mp_activate_vc_complete co_miniport_activate_vc_complete; rcondis_mp_deactivate_vc_complete co_miniport_deactivate_vc_complete; rndis_packet_ex packet_ex; } rndis_msg_container; /* * Remote NDIS message format */ typedef struct rndis_msg_ { uint32_t ndis_msg_type; /* * Total length of this message, from the beginning * of the rndis_msg struct, in bytes. */ uint32_t msg_len; /* Actual message */ rndis_msg_container msg; } rndis_msg; /* * Handy macros */ /* * get the size of an RNDIS message. Pass in the message type, * rndis_set_request, rndis_packet for example */ #define RNDIS_MESSAGE_SIZE(message) \ (sizeof(message) + (sizeof(rndis_msg) - sizeof(rndis_msg_container))) /* * get pointer to info buffer with message pointer */ #define MESSAGE_TO_INFO_BUFFER(message) \ (((PUCHAR)(message)) + message->InformationBufferOffset) /* * get pointer to status buffer with message pointer */ #define MESSAGE_TO_STATUS_BUFFER(message) \ (((PUCHAR)(message)) + message->StatusBufferOffset) /* * get pointer to OOBD buffer with message pointer */ #define MESSAGE_TO_OOBD_BUFFER(message) \ (((PUCHAR)(message)) + message->OOBDataOffset) /* * get pointer to data buffer with message pointer */ #define MESSAGE_TO_DATA_BUFFER(message) \ (((PUCHAR)(message)) + message->PerPacketInfoOffset) /* * get pointer to contained message from NDIS_MESSAGE pointer */ #define RNDIS_MESSAGE_PTR_TO_MESSAGE_PTR(rndis_message) \ ((void *) &rndis_message->Message) /* * get pointer to contained message from NDIS_MESSAGE pointer */ #define RNDIS_MESSAGE_RAW_PTR_TO_MESSAGE_PTR(rndis_message) \ ((void *) rndis_message) /* * Structures used in OID_RNDISMP_GET_RECEIVE_BUFFERS */ #define RNDISMP_RECEIVE_BUFFER_ELEM_FLAG_VMQ_RECEIVE_BUFFER 0x00000001 typedef struct rndismp_rx_buf_elem_ { uint32_t flags; uint32_t length; uint64_t rx_buf_id; uint32_t gpadl_handle; void *rx_buf; } rndismp_rx_buf_elem; typedef struct rndismp_rx_bufs_info_ { uint32_t num_rx_bufs; rndismp_rx_buf_elem rx_buf_elems[1]; } rndismp_rx_bufs_info; #define RNDIS_HEADER_SIZE (sizeof(rndis_msg) - sizeof(rndis_msg_container)) #define NDIS_PACKET_TYPE_DIRECTED 0x00000001 #define NDIS_PACKET_TYPE_MULTICAST 0x00000002 #define NDIS_PACKET_TYPE_ALL_MULTICAST 0x00000004 #define NDIS_PACKET_TYPE_BROADCAST 0x00000008 #define NDIS_PACKET_TYPE_SOURCE_ROUTING 0x00000010 #define NDIS_PACKET_TYPE_PROMISCUOUS 0x00000020 #define NDIS_PACKET_TYPE_SMT 0x00000040 #define NDIS_PACKET_TYPE_ALL_LOCAL 0x00000080 #define NDIS_PACKET_TYPE_GROUP 0x00000100 #define NDIS_PACKET_TYPE_ALL_FUNCTIONAL 0x00000200 #define NDIS_PACKET_TYPE_FUNCTIONAL 0x00000400 #define NDIS_PACKET_TYPE_MAC_FRAME 
0x00000800 /* * Externs */ int netvsc_recv(struct hv_device *device_ctx, netvsc_packet *packet, rndis_tcp_ip_csum_info *csum_info); void netvsc_recv_rollup(struct hv_device *device_ctx); +void netvsc_channel_rollup(struct hv_device *device_ctx); void* hv_set_rppi_data(rndis_msg *rndis_mesg, uint32_t rppi_size, int pkt_type); void* hv_get_ppi_data(rndis_packet *rpkt, uint32_t type); #endif /* __HV_RNDIS_H__ */ Index: projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis_filter.c =================================================================== --- projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis_filter.c (revision 294776) +++ projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis_filter.c (revision 294777) @@ -1,976 +1,994 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2010-2012 Citrix Inc. * Copyright (c) 2012 NetApp Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "hv_net_vsc.h" #include "hv_rndis.h" #include "hv_rndis_filter.h" /* * Forward declarations */ static int hv_rf_send_request(rndis_device *device, rndis_request *request, uint32_t message_type); static void hv_rf_receive_response(rndis_device *device, rndis_msg *response); static void hv_rf_receive_indicate_status(rndis_device *device, rndis_msg *response); static void hv_rf_receive_data(rndis_device *device, rndis_msg *message, netvsc_packet *pkt); static int hv_rf_query_device(rndis_device *device, uint32_t oid, void *result, uint32_t *result_size); static inline int hv_rf_query_device_mac(rndis_device *device); static inline int hv_rf_query_device_link_status(rndis_device *device); static int hv_rf_set_packet_filter(rndis_device *device, uint32_t new_filter); static int hv_rf_init_device(rndis_device *device); static int hv_rf_open_device(rndis_device *device); static int hv_rf_close_device(rndis_device *device); static void hv_rf_on_send_request_completion(void *context); static void hv_rf_on_send_request_halt_completion(void *context); int hv_rf_send_offload_request(struct hv_device *device, rndis_offload_params *offloads); /* * Set the Per-Packet-Info with the specified type */ void * hv_set_rppi_data(rndis_msg *rndis_mesg, uint32_t rppi_size, int pkt_type) { rndis_packet *rndis_pkt; rndis_per_packet_info *rppi; rndis_pkt = &rndis_mesg->msg.packet; rndis_pkt->data_offset += rppi_size; rppi = (rndis_per_packet_info *)((char *)rndis_pkt + rndis_pkt->per_pkt_info_offset + rndis_pkt->per_pkt_info_length); rppi->size = rppi_size; rppi->type = pkt_type; rppi->per_packet_info_offset = sizeof(rndis_per_packet_info); rndis_pkt->per_pkt_info_length += rppi_size; return (rppi); } /* * Get the Per-Packet-Info with the specified type * return NULL if not found. */ void * hv_get_ppi_data(rndis_packet *rpkt, uint32_t type) { rndis_per_packet_info *ppi; int len; if (rpkt->per_pkt_info_offset == 0) return (NULL); ppi = (rndis_per_packet_info *)((unsigned long)rpkt + rpkt->per_pkt_info_offset); len = rpkt->per_pkt_info_length; while (len > 0) { if (ppi->type == type) return (void *)((unsigned long)ppi + ppi->per_packet_info_offset); len -= ppi->size; ppi = (rndis_per_packet_info *)((unsigned long)ppi + ppi->size); } return (NULL); } /* * Allow module_param to work and override to switch to promiscuous mode. 
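 *
 * The knob is consulted once, in hv_rf_open_device(): a value of 1
 * asks the host for NDIS_PACKET_TYPE_PROMISCUOUS, while any other
 * value installs the usual directed/multicast/broadcast filter.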
*/ static inline rndis_device * hv_get_rndis_device(void) { rndis_device *device; device = malloc(sizeof(rndis_device), M_NETVSC, M_NOWAIT | M_ZERO); if (device == NULL) { return (NULL); } mtx_init(&device->req_lock, "HV-FRL", NULL, MTX_SPIN | MTX_RECURSE); /* Same effect as STAILQ_HEAD_INITIALIZER() static initializer */ STAILQ_INIT(&device->myrequest_list); device->state = RNDIS_DEV_UNINITIALIZED; return (device); } /* * */ static inline void hv_put_rndis_device(rndis_device *device) { mtx_destroy(&device->req_lock); free(device, M_NETVSC); } /* * */ static inline rndis_request * hv_rndis_request(rndis_device *device, uint32_t message_type, uint32_t message_length) { rndis_request *request; rndis_msg *rndis_mesg; rndis_set_request *set; request = malloc(sizeof(rndis_request), M_NETVSC, M_NOWAIT | M_ZERO); if (request == NULL) { return (NULL); } sema_init(&request->wait_sema, 0, "rndis sema"); rndis_mesg = &request->request_msg; rndis_mesg->ndis_msg_type = message_type; rndis_mesg->msg_len = message_length; /* * Set the request id. This field is always after the rndis header * for request/response packet types so we just use the set_request * as a template. */ set = &rndis_mesg->msg.set_request; set->request_id = atomic_fetchadd_int(&device->new_request_id, 1); /* Increment to get the new value (call above returns old value) */ set->request_id += 1; /* Add to the request list */ mtx_lock_spin(&device->req_lock); STAILQ_INSERT_TAIL(&device->myrequest_list, request, mylist_entry); mtx_unlock_spin(&device->req_lock); return (request); } /* * */ static inline void hv_put_rndis_request(rndis_device *device, rndis_request *request) { mtx_lock_spin(&device->req_lock); /* Fixme: Has O(n) performance */ /* * XXXKYS: Use Doubly linked lists. */ STAILQ_REMOVE(&device->myrequest_list, request, rndis_request_, mylist_entry); mtx_unlock_spin(&device->req_lock); sema_destroy(&request->wait_sema); free(request, M_NETVSC); } /* * */ static int hv_rf_send_request(rndis_device *device, rndis_request *request, uint32_t message_type) { int ret; netvsc_packet *packet; /* Set up the packet to send it */ packet = &request->pkt; packet->is_data_pkt = FALSE; packet->tot_data_buf_len = request->request_msg.msg_len; packet->page_buf_count = 1; packet->page_buffers[0].pfn = hv_get_phys_addr(&request->request_msg) >> PAGE_SHIFT; packet->page_buffers[0].length = request->request_msg.msg_len; packet->page_buffers[0].offset = (unsigned long)&request->request_msg & (PAGE_SIZE - 1); packet->compl.send.send_completion_context = request; /* packet */ if (message_type != REMOTE_NDIS_HALT_MSG) { packet->compl.send.on_send_completion = hv_rf_on_send_request_completion; } else { packet->compl.send.on_send_completion = hv_rf_on_send_request_halt_completion; } packet->compl.send.send_completion_tid = (unsigned long)device; packet->send_buf_section_idx = NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX; packet->send_buf_section_size = 0; ret = hv_nv_on_send(device->net_dev->dev, packet); return (ret); } /* * RNDIS filter receive response */ static void hv_rf_receive_response(rndis_device *device, rndis_msg *response) { rndis_request *request = NULL; rndis_request *next_request; boolean_t found = FALSE; mtx_lock_spin(&device->req_lock); request = STAILQ_FIRST(&device->myrequest_list); while (request != NULL) { /* * All request/response message contains request_id as the * first field */ if (request->request_msg.msg.init_request.request_id == response->msg.init_complete.request_id) { found = TRUE; break; } next_request = 
STAILQ_NEXT(request, mylist_entry); request = next_request; } mtx_unlock_spin(&device->req_lock); if (found) { if (response->msg_len <= sizeof(rndis_msg)) { memcpy(&request->response_msg, response, response->msg_len); } else { if (response->ndis_msg_type == REMOTE_NDIS_RESET_CMPLT) { /* Does not have a request id field */ request->response_msg.msg.reset_complete.status = STATUS_BUFFER_OVERFLOW; } else { request->response_msg.msg.init_complete.status = STATUS_BUFFER_OVERFLOW; } } sema_post(&request->wait_sema); } } int hv_rf_send_offload_request(struct hv_device *device, rndis_offload_params *offloads) { rndis_request *request; rndis_set_request *set; rndis_offload_params *offload_req; rndis_set_complete *set_complete; rndis_device *rndis_dev; hn_softc_t *sc = device_get_softc(device->device); device_t dev = device->device; netvsc_dev *net_dev = sc->net_dev; uint32_t vsp_version = net_dev->nvsp_version; uint32_t extlen = sizeof(rndis_offload_params); int ret; if (vsp_version <= NVSP_PROTOCOL_VERSION_4) { extlen = VERSION_4_OFFLOAD_SIZE; /* On NVSP_PROTOCOL_VERSION_4 and below, we do not support * UDP checksum offload. */ offloads->udp_ipv4_csum = 0; offloads->udp_ipv6_csum = 0; } rndis_dev = net_dev->extension; request = hv_rndis_request(rndis_dev, REMOTE_NDIS_SET_MSG, RNDIS_MESSAGE_SIZE(rndis_set_request) + extlen); if (!request) return (ENOMEM); set = &request->request_msg.msg.set_request; set->oid = RNDIS_OID_TCP_OFFLOAD_PARAMETERS; set->info_buffer_length = extlen; set->info_buffer_offset = sizeof(rndis_set_request); set->device_vc_handle = 0; offload_req = (rndis_offload_params *)((unsigned long)set + set->info_buffer_offset); *offload_req = *offloads; offload_req->header.type = RNDIS_OBJECT_TYPE_DEFAULT; offload_req->header.revision = RNDIS_OFFLOAD_PARAMETERS_REVISION_3; offload_req->header.size = extlen; ret = hv_rf_send_request(rndis_dev, request, REMOTE_NDIS_SET_MSG); if (ret != 0) { device_printf(dev, "hv send offload request failed, ret=%d!\n", ret); goto cleanup; } ret = sema_timedwait(&request->wait_sema, 500); if (ret != 0) { device_printf(dev, "hv send offload request timeout\n"); goto cleanup; } set_complete = &request->response_msg.msg.set_complete; if (set_complete->status == RNDIS_STATUS_SUCCESS) { device_printf(dev, "hv send offload request succeeded\n"); ret = 0; } else { if (set_complete->status == STATUS_NOT_SUPPORTED) { device_printf(dev, "HV Not support offload\n"); ret = 0; } else { ret = set_complete->status; } } cleanup: if (request) hv_put_rndis_request(rndis_dev, request); return (ret); } /* * RNDIS filter receive indicate status */ static void hv_rf_receive_indicate_status(rndis_device *device, rndis_msg *response) { rndis_indicate_status *indicate = &response->msg.indicate_status; switch(indicate->status) { case RNDIS_STATUS_MEDIA_CONNECT: netvsc_linkstatus_callback(device->net_dev->dev, 1); break; case RNDIS_STATUS_MEDIA_DISCONNECT: netvsc_linkstatus_callback(device->net_dev->dev, 0); break; default: /* TODO: */ device_printf(device->net_dev->dev->device, "unknown status %d received\n", indicate->status); break; } } /* * RNDIS filter receive data */ static void hv_rf_receive_data(rndis_device *device, rndis_msg *message, netvsc_packet *pkt) { rndis_packet *rndis_pkt; ndis_8021q_info *rppi_vlan_info; uint32_t data_offset; rndis_tcp_ip_csum_info *csum_info = NULL; device_t dev = device->net_dev->dev->device; rndis_pkt = &message->msg.packet; /* * Fixme: Handle multiple rndis pkt msgs that may be enclosed in this * netvsc packet (ie tot_data_buf_len != 
message_length) */ /* Remove rndis header, then pass data packet up the stack */ data_offset = RNDIS_HEADER_SIZE + rndis_pkt->data_offset; pkt->tot_data_buf_len -= data_offset; if (pkt->tot_data_buf_len < rndis_pkt->data_length) { pkt->status = nvsp_status_failure; device_printf(dev, "total length %u is less than data length %u\n", pkt->tot_data_buf_len, rndis_pkt->data_length); return; } pkt->tot_data_buf_len = rndis_pkt->data_length; pkt->data = (void *)((unsigned long)pkt->data + data_offset); rppi_vlan_info = hv_get_ppi_data(rndis_pkt, ieee_8021q_info); if (rppi_vlan_info) { pkt->vlan_tci = rppi_vlan_info->u1.s1.vlan_id; } else { pkt->vlan_tci = 0; } csum_info = hv_get_ppi_data(rndis_pkt, tcpip_chksum_info); netvsc_recv(device->net_dev->dev, pkt, csum_info); } /* * RNDIS filter on receive */ int hv_rf_on_receive(netvsc_dev *net_dev, struct hv_device *device, netvsc_packet *pkt) { rndis_device *rndis_dev; rndis_msg *rndis_hdr; /* Make sure the rndis device state is initialized */ if (net_dev->extension == NULL) { pkt->status = nvsp_status_failure; return (ENODEV); } rndis_dev = (rndis_device *)net_dev->extension; if (rndis_dev->state == RNDIS_DEV_UNINITIALIZED) { pkt->status = nvsp_status_failure; return (EINVAL); } rndis_hdr = pkt->data; switch (rndis_hdr->ndis_msg_type) { /* data message */ case REMOTE_NDIS_PACKET_MSG: hv_rf_receive_data(rndis_dev, rndis_hdr, pkt); break; /* completion messages */ case REMOTE_NDIS_INITIALIZE_CMPLT: case REMOTE_NDIS_QUERY_CMPLT: case REMOTE_NDIS_SET_CMPLT: case REMOTE_NDIS_RESET_CMPLT: case REMOTE_NDIS_KEEPALIVE_CMPLT: hv_rf_receive_response(rndis_dev, rndis_hdr); break; /* notification message */ case REMOTE_NDIS_INDICATE_STATUS_MSG: hv_rf_receive_indicate_status(rndis_dev, rndis_hdr); break; default: printf("hv_rf_on_receive(): Unknown msg_type 0x%x\n", rndis_hdr->ndis_msg_type); break; } return (0); } /* * RNDIS filter query device */ static int hv_rf_query_device(rndis_device *device, uint32_t oid, void *result, uint32_t *result_size) { rndis_request *request; uint32_t in_result_size = *result_size; rndis_query_request *query; rndis_query_complete *query_complete; int ret = 0; *result_size = 0; request = hv_rndis_request(device, REMOTE_NDIS_QUERY_MSG, RNDIS_MESSAGE_SIZE(rndis_query_request)); if (request == NULL) { ret = -1; goto cleanup; } /* Set up the rndis query */ query = &request->request_msg.msg.query_request; query->oid = oid; query->info_buffer_offset = sizeof(rndis_query_request); query->info_buffer_length = 0; query->device_vc_handle = 0; ret = hv_rf_send_request(device, request, REMOTE_NDIS_QUERY_MSG); if (ret != 0) { /* Fixme: printf added */ printf("RNDISFILTER request failed to Send!\n"); goto cleanup; } sema_wait(&request->wait_sema); /* Copy the response back */ query_complete = &request->response_msg.msg.query_complete; if (query_complete->info_buffer_length > in_result_size) { ret = EINVAL; goto cleanup; } memcpy(result, (void *)((unsigned long)query_complete + query_complete->info_buffer_offset), query_complete->info_buffer_length); *result_size = query_complete->info_buffer_length; cleanup: if (request != NULL) hv_put_rndis_request(device, request); return (ret); } /* * RNDIS filter query device MAC address */ static inline int hv_rf_query_device_mac(rndis_device *device) { uint32_t size = HW_MACADDR_LEN; return (hv_rf_query_device(device, RNDIS_OID_802_3_PERMANENT_ADDRESS, device->hw_mac_addr, &size)); } /* * RNDIS filter query device link status */ static inline int hv_rf_query_device_link_status(rndis_device *device) { 
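	/* The media-connect OID yields a single 4-byte connect-state word. */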
uint32_t size = sizeof(uint32_t); return (hv_rf_query_device(device, RNDIS_OID_GEN_MEDIA_CONNECT_STATUS, &device->link_status, &size)); } /* * RNDIS filter set packet filter * Sends an rndis request with the new filter, then waits for a response * from the host. * Returns zero on success, non-zero on failure. */ static int hv_rf_set_packet_filter(rndis_device *device, uint32_t new_filter) { rndis_request *request; rndis_set_request *set; rndis_set_complete *set_complete; uint32_t status; int ret; request = hv_rndis_request(device, REMOTE_NDIS_SET_MSG, RNDIS_MESSAGE_SIZE(rndis_set_request) + sizeof(uint32_t)); if (request == NULL) { ret = -1; goto cleanup; } /* Set up the rndis set */ set = &request->request_msg.msg.set_request; set->oid = RNDIS_OID_GEN_CURRENT_PACKET_FILTER; set->info_buffer_length = sizeof(uint32_t); set->info_buffer_offset = sizeof(rndis_set_request); memcpy((void *)((unsigned long)set + sizeof(rndis_set_request)), &new_filter, sizeof(uint32_t)); ret = hv_rf_send_request(device, request, REMOTE_NDIS_SET_MSG); if (ret != 0) { goto cleanup; } /* * Wait for the response from the host. Another thread will signal * us when the response has arrived. In the failure case, * sema_timedwait() returns a non-zero status after waiting 5 seconds. */ ret = sema_timedwait(&request->wait_sema, 500); if (ret == 0) { /* Response received, check status */ set_complete = &request->response_msg.msg.set_complete; status = set_complete->status; if (status != RNDIS_STATUS_SUCCESS) { /* Bad response status, return error */ ret = -2; } } else { /* * We cannot deallocate the request since we may still * receive a send completion for it. */ goto exit; } cleanup: if (request != NULL) { hv_put_rndis_request(device, request); } exit: return (ret); } /* * RNDIS filter init device */ static int hv_rf_init_device(rndis_device *device) { rndis_request *request; rndis_initialize_request *init; rndis_initialize_complete *init_complete; uint32_t status; int ret; request = hv_rndis_request(device, REMOTE_NDIS_INITIALIZE_MSG, RNDIS_MESSAGE_SIZE(rndis_initialize_request)); if (!request) { ret = -1; goto cleanup; } /* Set up the rndis set */ init = &request->request_msg.msg.init_request; init->major_version = RNDIS_MAJOR_VERSION; init->minor_version = RNDIS_MINOR_VERSION; /* * Per the RNDIS document, this should be set to the max MTU * plus the header size. However, 2048 works fine, so leaving * it as is. 
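 * (A standard 1500-byte Ethernet frame plus the RNDIS data-message
 * header comes to well under 2048 bytes, which is presumably why the
 * smaller constant has never been a problem in practice.)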
*/ init->max_xfer_size = 2048; device->state = RNDIS_DEV_INITIALIZING; ret = hv_rf_send_request(device, request, REMOTE_NDIS_INITIALIZE_MSG); if (ret != 0) { device->state = RNDIS_DEV_UNINITIALIZED; goto cleanup; } sema_wait(&request->wait_sema); init_complete = &request->response_msg.msg.init_complete; status = init_complete->status; if (status == RNDIS_STATUS_SUCCESS) { device->state = RNDIS_DEV_INITIALIZED; ret = 0; } else { device->state = RNDIS_DEV_UNINITIALIZED; ret = -1; } cleanup: if (request) { hv_put_rndis_request(device, request); } return (ret); } #define HALT_COMPLETION_WAIT_COUNT 25 /* * RNDIS filter halt device */ static int hv_rf_halt_device(rndis_device *device) { rndis_request *request; rndis_halt_request *halt; int i, ret; /* Attempt to do a rndis device halt */ request = hv_rndis_request(device, REMOTE_NDIS_HALT_MSG, RNDIS_MESSAGE_SIZE(rndis_halt_request)); if (request == NULL) { return (-1); } /* initialize "poor man's semaphore" */ request->halt_complete_flag = 0; /* Set up the rndis set */ halt = &request->request_msg.msg.halt_request; halt->request_id = atomic_fetchadd_int(&device->new_request_id, 1); /* Increment to get the new value (call above returns old value) */ halt->request_id += 1; ret = hv_rf_send_request(device, request, REMOTE_NDIS_HALT_MSG); if (ret != 0) { return (-1); } /* * Wait for halt response from halt callback. We must wait for * the transaction response before freeing the request and other * resources. */ for (i=HALT_COMPLETION_WAIT_COUNT; i > 0; i--) { if (request->halt_complete_flag != 0) { break; } DELAY(400); } if (i == 0) { return (-1); } device->state = RNDIS_DEV_UNINITIALIZED; if (request != NULL) { hv_put_rndis_request(device, request); } return (0); } /* * RNDIS filter open device */ static int hv_rf_open_device(rndis_device *device) { int ret; if (device->state != RNDIS_DEV_INITIALIZED) { return (0); } if (hv_promisc_mode != 1) { ret = hv_rf_set_packet_filter(device, NDIS_PACKET_TYPE_BROADCAST | NDIS_PACKET_TYPE_ALL_MULTICAST | NDIS_PACKET_TYPE_DIRECTED); } else { ret = hv_rf_set_packet_filter(device, NDIS_PACKET_TYPE_PROMISCUOUS); } if (ret == 0) { device->state = RNDIS_DEV_DATAINITIALIZED; } return (ret); } /* * RNDIS filter close device */ static int hv_rf_close_device(rndis_device *device) { int ret; if (device->state != RNDIS_DEV_DATAINITIALIZED) { return (0); } ret = hv_rf_set_packet_filter(device, 0); if (ret == 0) { device->state = RNDIS_DEV_INITIALIZED; } return (ret); } /* * RNDIS filter on device add */ int hv_rf_on_device_add(struct hv_device *device, void *additl_info) { int ret; netvsc_dev *net_dev; rndis_device *rndis_dev; rndis_offload_params offloads; netvsc_device_info *dev_info = (netvsc_device_info *)additl_info; device_t dev = device->device; rndis_dev = hv_get_rndis_device(); if (rndis_dev == NULL) { return (ENOMEM); } /* * Let the inner driver handle this first to create the netvsc channel * NOTE! Once the channel is created, we may get a receive callback * (hv_rf_on_receive()) before this call is completed. * Note: Earlier code used a function pointer here. 
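 * hv_rf_on_receive() copes with that window: it fails the packet with
 * nvsp_status_failure while net_dev->extension is still NULL or the
 * rndis device is still RNDIS_DEV_UNINITIALIZED.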
*/ net_dev = hv_nv_on_device_add(device, additl_info); if (!net_dev) { hv_put_rndis_device(rndis_dev); return (ENOMEM); } /* * Initialize the rndis device */ net_dev->extension = rndis_dev; rndis_dev->net_dev = net_dev; /* Send the rndis initialization message */ ret = hv_rf_init_device(rndis_dev); if (ret != 0) { /* * TODO: If rndis init failed, we will need to shut down * the channel */ } /* Get the mac address */ ret = hv_rf_query_device_mac(rndis_dev); if (ret != 0) { /* TODO: shut down rndis device and the channel */ } /* config csum offload and send request to host */ memset(&offloads, 0, sizeof(offloads)); offloads.ipv4_csum = RNDIS_OFFLOAD_PARAMETERS_TX_RX_ENABLED; offloads.tcp_ipv4_csum = RNDIS_OFFLOAD_PARAMETERS_TX_RX_ENABLED; offloads.udp_ipv4_csum = RNDIS_OFFLOAD_PARAMETERS_TX_RX_ENABLED; offloads.tcp_ipv6_csum = RNDIS_OFFLOAD_PARAMETERS_TX_RX_ENABLED; offloads.udp_ipv6_csum = RNDIS_OFFLOAD_PARAMETERS_TX_RX_ENABLED; offloads.lso_v2_ipv4 = RNDIS_OFFLOAD_PARAMETERS_LSOV2_ENABLED; ret = hv_rf_send_offload_request(device, &offloads); if (ret != 0) { /* TODO: shut down rndis device and the channel */ device_printf(dev, "hv_rf_send_offload_request failed, ret=%d\n", ret); } memcpy(dev_info->mac_addr, rndis_dev->hw_mac_addr, HW_MACADDR_LEN); hv_rf_query_device_link_status(rndis_dev); dev_info->link_state = rndis_dev->link_status; return (ret); } /* * RNDIS filter on device remove */ int hv_rf_on_device_remove(struct hv_device *device, boolean_t destroy_channel) { hn_softc_t *sc = device_get_softc(device->device); netvsc_dev *net_dev = sc->net_dev; rndis_device *rndis_dev = (rndis_device *)net_dev->extension; int ret; /* Halt and release the rndis device */ ret = hv_rf_halt_device(rndis_dev); hv_put_rndis_device(rndis_dev); net_dev->extension = NULL; /* Pass control to inner driver to remove the device */ ret |= hv_nv_on_device_remove(device, destroy_channel); return (ret); } /* * RNDIS filter on open */ int hv_rf_on_open(struct hv_device *device) { hn_softc_t *sc = device_get_softc(device->device); netvsc_dev *net_dev = sc->net_dev; return (hv_rf_open_device((rndis_device *)net_dev->extension)); } /* * RNDIS filter on close */ int hv_rf_on_close(struct hv_device *device) { hn_softc_t *sc = device_get_softc(device->device); netvsc_dev *net_dev = sc->net_dev; return (hv_rf_close_device((rndis_device *)net_dev->extension)); } /* * RNDIS filter on send request completion callback */ static void hv_rf_on_send_request_completion(void *context) { } /* * RNDIS filter on send request (halt only) completion callback */ static void hv_rf_on_send_request_halt_completion(void *context) { rndis_request *request = context; /* * Notify hv_rf_halt_device() about halt completion. * The halt code must wait for completion before freeing * the transaction resources. */ request->halt_complete_flag = 1; } /* * RNDIS filter when "all" reception is done */ void hv_rf_receive_rollup(netvsc_dev *net_dev) { rndis_device *rndis_dev; rndis_dev = (rndis_device *)net_dev->extension; netvsc_recv_rollup(rndis_dev->net_dev->dev); } + +void +hv_rf_channel_rollup(netvsc_dev *net_dev) +{ + rndis_device *rndis_dev; + + rndis_dev = (rndis_device *)net_dev->extension; + + /* + * This could be called pretty early, so we need + * to make sure everything has been setup. 
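+ * A NULL anywhere along rndis_dev->net_dev->dev means the attach in
+ * hv_rf_on_device_add() has not finished wiring the path up yet, so
+ * the rollup is simply dropped instead of dereferencing it.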
+ */ + if (rndis_dev == NULL || + rndis_dev->net_dev == NULL || + rndis_dev->net_dev->dev == NULL) + return; + netvsc_channel_rollup(rndis_dev->net_dev->dev); +} Index: projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis_filter.h =================================================================== --- projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis_filter.h (revision 294776) +++ projects/clang380-import/sys/dev/hyperv/netvsc/hv_rndis_filter.h (revision 294777) @@ -1,108 +1,109 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2010-2012 Citrix Inc. * Copyright (c) 2012 NetApp Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef __HV_RNDIS_FILTER_H__ #define __HV_RNDIS_FILTER_H__ /* * Defines */ /* Destroy or preserve channel on filter/netvsc teardown */ #define HV_RF_NV_DESTROY_CHANNEL TRUE #define HV_RF_NV_RETAIN_CHANNEL FALSE /* * Number of page buffers to reserve for the RNDIS filter packet in the * transmitted message. */ #define HV_RF_NUM_TX_RESERVED_PAGE_BUFS 1 /* * Data types */ typedef enum { RNDIS_DEV_UNINITIALIZED = 0, RNDIS_DEV_INITIALIZING, RNDIS_DEV_INITIALIZED, RNDIS_DEV_DATAINITIALIZED, } rndis_device_state; typedef struct rndis_request_ { STAILQ_ENTRY(rndis_request_) mylist_entry; struct sema wait_sema; /* * Fixme: We assumed a fixed size response here. If we do ever * need to handle a bigger response, we can either define a max * response message or add a response buffer variable above this field */ rndis_msg response_msg; /* Simplify allocation by having a netvsc packet inline */ netvsc_packet pkt; hv_vmbus_page_buffer buffer; /* Fixme: We assumed a fixed size request here. */ rndis_msg request_msg; /* Fixme: Poor man's semaphore. 
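 * hv_rf_halt_device() initializes this to zero and then busy-waits on
 * it with DELAY(); hv_rf_on_send_request_halt_completion() sets it to
 * one from the send-completion callback.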
*/ uint32_t halt_complete_flag; } rndis_request; typedef struct rndis_device_ { netvsc_dev *net_dev; rndis_device_state state; uint32_t link_status; uint32_t new_request_id; struct mtx req_lock; STAILQ_HEAD(RQ, rndis_request_) myrequest_list; uint8_t hw_mac_addr[HW_MACADDR_LEN]; } rndis_device; /* * Externs */ int hv_rf_on_receive(netvsc_dev *net_dev, struct hv_device *device, netvsc_packet *pkt); void hv_rf_receive_rollup(netvsc_dev *net_dev); +void hv_rf_channel_rollup(netvsc_dev *net_dev); int hv_rf_on_device_add(struct hv_device *device, void *additl_info); int hv_rf_on_device_remove(struct hv_device *device, boolean_t destroy_channel); int hv_rf_on_open(struct hv_device *device); int hv_rf_on_close(struct hv_device *device); #endif /* __HV_RNDIS_FILTER_H__ */ Index: projects/clang380-import/sys/dev/hyperv/vmbus/hv_channel.c =================================================================== --- projects/clang380-import/sys/dev/hyperv/vmbus/hv_channel.c (revision 294776) +++ projects/clang380-import/sys/dev/hyperv/vmbus/hv_channel.c (revision 294777) @@ -1,881 +1,878 @@ /*- * Copyright (c) 2009-2012 Microsoft Corp. * Copyright (c) 2012 NetApp Inc. * Copyright (c) 2012 Citrix Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include "hv_vmbus_priv.h" static int vmbus_channel_create_gpadl_header( /* must be phys and virt contiguous*/ void* contig_buffer, /* page-size multiple */ uint32_t size, hv_vmbus_channel_msg_info** msg_info, uint32_t* message_count); static void vmbus_channel_set_event(hv_vmbus_channel* channel); /** * @brief Trigger an event notification on the specified channel */ static void vmbus_channel_set_event(hv_vmbus_channel *channel) { hv_vmbus_monitor_page *monitor_page; if (channel->offer_msg.monitor_allocated) { /* Each uint32_t represents 32 channels */ synch_set_bit((channel->offer_msg.child_rel_id & 31), ((uint32_t *)hv_vmbus_g_connection.send_interrupt_page + ((channel->offer_msg.child_rel_id >> 5)))); monitor_page = (hv_vmbus_monitor_page *) hv_vmbus_g_connection.monitor_pages; monitor_page++; /* Get the child to parent monitor page */ synch_set_bit(channel->monitor_bit, (uint32_t *)&monitor_page-> trigger_group[channel->monitor_group].u.pending); } else { hv_vmbus_set_event(channel); } } /** * @brief Open the specified channel */ int hv_vmbus_channel_open( hv_vmbus_channel* new_channel, uint32_t send_ring_buffer_size, uint32_t recv_ring_buffer_size, void* user_data, uint32_t user_data_len, hv_vmbus_pfn_channel_callback pfn_on_channel_callback, void* context) { int ret = 0; void *in, *out; hv_vmbus_channel_open_channel* open_msg; hv_vmbus_channel_msg_info* open_info; mtx_lock(&new_channel->sc_lock); if (new_channel->state == HV_CHANNEL_OPEN_STATE) { new_channel->state = HV_CHANNEL_OPENING_STATE; } else { mtx_unlock(&new_channel->sc_lock); if(bootverbose) printf("VMBUS: Trying to open channel <%p> which in " "%d state.\n", new_channel, new_channel->state); return (EINVAL); } mtx_unlock(&new_channel->sc_lock); new_channel->on_channel_callback = pfn_on_channel_callback; new_channel->channel_callback_context = context; /* Allocate the ring buffer */ out = contigmalloc((send_ring_buffer_size + recv_ring_buffer_size), M_DEVBUF, M_ZERO, 0UL, BUS_SPACE_MAXADDR, PAGE_SIZE, 0); KASSERT(out != NULL, ("Error VMBUS: contigmalloc failed to allocate Ring Buffer!")); if (out == NULL) return (ENOMEM); in = ((uint8_t *) out + send_ring_buffer_size); new_channel->ring_buffer_pages = out; new_channel->ring_buffer_page_count = (send_ring_buffer_size + recv_ring_buffer_size) >> PAGE_SHIFT; new_channel->ring_buffer_size = send_ring_buffer_size + recv_ring_buffer_size; hv_vmbus_ring_buffer_init( &new_channel->outbound, out, send_ring_buffer_size); hv_vmbus_ring_buffer_init( &new_channel->inbound, in, recv_ring_buffer_size); /** * Establish the gpadl for the ring buffer */ new_channel->ring_buffer_gpadl_handle = 0; ret = hv_vmbus_channel_establish_gpadl(new_channel, new_channel->outbound.ring_buffer, send_ring_buffer_size + recv_ring_buffer_size, &new_channel->ring_buffer_gpadl_handle); /** * Create and init the channel open message */ open_info = (hv_vmbus_channel_msg_info*) malloc( sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_open_channel), M_DEVBUF, M_NOWAIT); KASSERT(open_info != NULL, ("Error VMBUS: malloc failed to allocate Open Channel message!")); if (open_info == NULL) return (ENOMEM); sema_init(&open_info->wait_sema, 0, "Open Info Sema"); open_msg = (hv_vmbus_channel_open_channel*) open_info->msg; open_msg->header.message_type = HV_CHANNEL_MESSAGE_OPEN_CHANNEL; open_msg->open_id = new_channel->offer_msg.child_rel_id; open_msg->child_rel_id = 
new_channel->offer_msg.child_rel_id; open_msg->ring_buffer_gpadl_handle = new_channel->ring_buffer_gpadl_handle; open_msg->downstream_ring_buffer_page_offset = send_ring_buffer_size >> PAGE_SHIFT; open_msg->target_vcpu = new_channel->target_vcpu; if (user_data_len) memcpy(open_msg->user_data, user_data, user_data_len); mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_INSERT_TAIL( &hv_vmbus_g_connection.channel_msg_anchor, open_info, msg_list_entry); mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); ret = hv_vmbus_post_message( open_msg, sizeof(hv_vmbus_channel_open_channel)); if (ret != 0) goto cleanup; ret = sema_timedwait(&open_info->wait_sema, 500); /* KYS 5 seconds */ if (ret) { if(bootverbose) printf("VMBUS: channel <%p> open timeout.\n", new_channel); goto cleanup; } if (open_info->response.open_result.status == 0) { new_channel->state = HV_CHANNEL_OPENED_STATE; if(bootverbose) printf("VMBUS: channel <%p> open success.\n", new_channel); } else { if(bootverbose) printf("Error VMBUS: channel <%p> open failed - %d!\n", new_channel, open_info->response.open_result.status); } cleanup: mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_REMOVE( &hv_vmbus_g_connection.channel_msg_anchor, open_info, msg_list_entry); mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); sema_destroy(&open_info->wait_sema); free(open_info, M_DEVBUF); return (ret); } /** * @brief Create a gpadl for the specified buffer */ static int vmbus_channel_create_gpadl_header( void* contig_buffer, uint32_t size, /* page-size multiple */ hv_vmbus_channel_msg_info** msg_info, uint32_t* message_count) { int i; int page_count; unsigned long long pfn; uint32_t msg_size; hv_vmbus_channel_gpadl_header* gpa_header; hv_vmbus_channel_gpadl_body* gpadl_body; hv_vmbus_channel_msg_info* msg_header; hv_vmbus_channel_msg_info* msg_body; int pfnSum, pfnCount, pfnLeft, pfnCurr, pfnSize; page_count = size >> PAGE_SHIFT; pfn = hv_get_phys_addr(contig_buffer) >> PAGE_SHIFT; /*do we need a gpadl body msg */ pfnSize = HV_MAX_SIZE_CHANNEL_MESSAGE - sizeof(hv_vmbus_channel_gpadl_header) - sizeof(hv_gpa_range); pfnCount = pfnSize / sizeof(uint64_t); if (page_count > pfnCount) { /* if(we need a gpadl body) */ /* fill in the header */ msg_size = sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_gpadl_header) + sizeof(hv_gpa_range) + pfnCount * sizeof(uint64_t); msg_header = malloc(msg_size, M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT( msg_header != NULL, ("Error VMBUS: malloc failed to allocate Gpadl Message!")); if (msg_header == NULL) return (ENOMEM); TAILQ_INIT(&msg_header->sub_msg_list_anchor); msg_header->message_size = msg_size; gpa_header = (hv_vmbus_channel_gpadl_header*) msg_header->msg; gpa_header->range_count = 1; gpa_header->range_buf_len = sizeof(hv_gpa_range) + page_count * sizeof(uint64_t); gpa_header->range[0].byte_offset = 0; gpa_header->range[0].byte_count = size; for (i = 0; i < pfnCount; i++) { gpa_header->range[0].pfn_array[i] = pfn + i; } *msg_info = msg_header; *message_count = 1; pfnSum = pfnCount; pfnLeft = page_count - pfnCount; /* * figure out how many pfns we can fit */ pfnSize = HV_MAX_SIZE_CHANNEL_MESSAGE - sizeof(hv_vmbus_channel_gpadl_body); pfnCount = pfnSize / sizeof(uint64_t); /* * fill in the body */ while (pfnLeft) { if (pfnLeft > pfnCount) { pfnCurr = pfnCount; } else { pfnCurr = pfnLeft; } msg_size = sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_gpadl_body) + pfnCurr * sizeof(uint64_t); msg_body = malloc(msg_size, M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT( 
msg_body != NULL, ("Error VMBUS: malloc failed to allocate Gpadl msg_body!")); if (msg_body == NULL) return (ENOMEM); msg_body->message_size = msg_size; (*message_count)++; gpadl_body = (hv_vmbus_channel_gpadl_body*) msg_body->msg; /* * gpadl_body->gpadl = kbuffer; */ for (i = 0; i < pfnCurr; i++) { gpadl_body->pfn[i] = pfn + pfnSum + i; } TAILQ_INSERT_TAIL( &msg_header->sub_msg_list_anchor, msg_body, msg_list_entry); pfnSum += pfnCurr; pfnLeft -= pfnCurr; } } else { /* else everything fits in a header */ msg_size = sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_gpadl_header) + sizeof(hv_gpa_range) + page_count * sizeof(uint64_t); msg_header = malloc(msg_size, M_DEVBUF, M_NOWAIT | M_ZERO); KASSERT( msg_header != NULL, ("Error VMBUS: malloc failed to allocate Gpadl Message!")); if (msg_header == NULL) return (ENOMEM); msg_header->message_size = msg_size; gpa_header = (hv_vmbus_channel_gpadl_header*) msg_header->msg; gpa_header->range_count = 1; gpa_header->range_buf_len = sizeof(hv_gpa_range) + page_count * sizeof(uint64_t); gpa_header->range[0].byte_offset = 0; gpa_header->range[0].byte_count = size; for (i = 0; i < page_count; i++) { gpa_header->range[0].pfn_array[i] = pfn + i; } *msg_info = msg_header; *message_count = 1; } return (0); } /** * @brief Establish a GPADL for the specified buffer */ int hv_vmbus_channel_establish_gpadl( hv_vmbus_channel* channel, void* contig_buffer, uint32_t size, /* page-size multiple */ uint32_t* gpadl_handle) { int ret = 0; hv_vmbus_channel_gpadl_header* gpadl_msg; hv_vmbus_channel_gpadl_body* gpadl_body; hv_vmbus_channel_msg_info* msg_info; hv_vmbus_channel_msg_info* sub_msg_info; uint32_t msg_count; hv_vmbus_channel_msg_info* curr; uint32_t next_gpadl_handle; next_gpadl_handle = hv_vmbus_g_connection.next_gpadl_handle; atomic_add_int((int*) &hv_vmbus_g_connection.next_gpadl_handle, 1); ret = vmbus_channel_create_gpadl_header( contig_buffer, size, &msg_info, &msg_count); if(ret != 0) { /* if(allocation failed) return immediately */ /* reverse atomic_add_int above */ atomic_subtract_int((int*) &hv_vmbus_g_connection.next_gpadl_handle, 1); return ret; } sema_init(&msg_info->wait_sema, 0, "Open Info Sema"); gpadl_msg = (hv_vmbus_channel_gpadl_header*) msg_info->msg; gpadl_msg->header.message_type = HV_CHANNEL_MESSAGEL_GPADL_HEADER; gpadl_msg->child_rel_id = channel->offer_msg.child_rel_id; gpadl_msg->gpadl = next_gpadl_handle; mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_INSERT_TAIL( &hv_vmbus_g_connection.channel_msg_anchor, msg_info, msg_list_entry); mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); ret = hv_vmbus_post_message( gpadl_msg, msg_info->message_size - (uint32_t) sizeof(hv_vmbus_channel_msg_info)); if (ret != 0) goto cleanup; if (msg_count > 1) { TAILQ_FOREACH(curr, &msg_info->sub_msg_list_anchor, msg_list_entry) { sub_msg_info = curr; gpadl_body = (hv_vmbus_channel_gpadl_body*) sub_msg_info->msg; gpadl_body->header.message_type = HV_CHANNEL_MESSAGE_GPADL_BODY; gpadl_body->gpadl = next_gpadl_handle; ret = hv_vmbus_post_message( gpadl_body, sub_msg_info->message_size - (uint32_t) sizeof(hv_vmbus_channel_msg_info)); /* if (the post message failed) give up and clean up */ if(ret != 0) goto cleanup; } } ret = sema_timedwait(&msg_info->wait_sema, 500); /* KYS 5 seconds*/ if (ret != 0) goto cleanup; *gpadl_handle = gpadl_msg->gpadl; cleanup: mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_REMOVE(&hv_vmbus_g_connection.channel_msg_anchor, msg_info, msg_list_entry); 
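	/* msg_info is now unreachable from the anchor; safe to tear down. */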
mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); sema_destroy(&msg_info->wait_sema); free(msg_info, M_DEVBUF); return (ret); } /** * @brief Teardown the specified GPADL handle */ int hv_vmbus_channel_teardown_gpdal( hv_vmbus_channel* channel, uint32_t gpadl_handle) { int ret = 0; hv_vmbus_channel_gpadl_teardown* msg; hv_vmbus_channel_msg_info* info; info = (hv_vmbus_channel_msg_info *) malloc( sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_gpadl_teardown), M_DEVBUF, M_NOWAIT); KASSERT(info != NULL, ("Error VMBUS: malloc failed to allocate Gpadl Teardown Msg!")); if (info == NULL) { ret = ENOMEM; goto cleanup; } sema_init(&info->wait_sema, 0, "Open Info Sema"); msg = (hv_vmbus_channel_gpadl_teardown*) info->msg; msg->header.message_type = HV_CHANNEL_MESSAGE_GPADL_TEARDOWN; msg->child_rel_id = channel->offer_msg.child_rel_id; msg->gpadl = gpadl_handle; mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_INSERT_TAIL(&hv_vmbus_g_connection.channel_msg_anchor, info, msg_list_entry); mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); ret = hv_vmbus_post_message(msg, sizeof(hv_vmbus_channel_gpadl_teardown)); if (ret != 0) goto cleanup; ret = sema_timedwait(&info->wait_sema, 500); /* KYS 5 seconds */ cleanup: /* * Received a torndown response */ mtx_lock_spin(&hv_vmbus_g_connection.channel_msg_lock); TAILQ_REMOVE(&hv_vmbus_g_connection.channel_msg_anchor, info, msg_list_entry); mtx_unlock_spin(&hv_vmbus_g_connection.channel_msg_lock); sema_destroy(&info->wait_sema); free(info, M_DEVBUF); return (ret); } static void hv_vmbus_channel_close_internal(hv_vmbus_channel *channel) { int ret = 0; hv_vmbus_channel_close_channel* msg; hv_vmbus_channel_msg_info* info; channel->state = HV_CHANNEL_OPEN_STATE; channel->sc_creation_callback = NULL; /* * Grab the lock to prevent race condition when a packet received * and unloading driver is in the process. */ mtx_lock(&channel->inbound_lock); channel->on_channel_callback = NULL; mtx_unlock(&channel->inbound_lock); /** * Send a closing message */ info = (hv_vmbus_channel_msg_info *) malloc( sizeof(hv_vmbus_channel_msg_info) + sizeof(hv_vmbus_channel_close_channel), M_DEVBUF, M_NOWAIT); KASSERT(info != NULL, ("VMBUS: malloc failed hv_vmbus_channel_close!")); if(info == NULL) return; msg = (hv_vmbus_channel_close_channel*) info->msg; msg->header.message_type = HV_CHANNEL_MESSAGE_CLOSE_CHANNEL; msg->child_rel_id = channel->offer_msg.child_rel_id; ret = hv_vmbus_post_message( msg, sizeof(hv_vmbus_channel_close_channel)); /* Tear down the gpadl for the channel's ring buffer */ if (channel->ring_buffer_gpadl_handle) { hv_vmbus_channel_teardown_gpdal(channel, channel->ring_buffer_gpadl_handle); } /* TODO: Send a msg to release the childRelId */ /* cleanup the ring buffers for this channel */ hv_ring_buffer_cleanup(&channel->outbound); hv_ring_buffer_cleanup(&channel->inbound); contigfree(channel->ring_buffer_pages, channel->ring_buffer_size, M_DEVBUF); free(info, M_DEVBUF); } /** * @brief Close the specified channel */ void hv_vmbus_channel_close(hv_vmbus_channel *channel) { hv_vmbus_channel* sub_channel; if (channel->primary_channel != NULL) { /* * We only close multi-channels when the primary is * closed. */ return; } /* * Close all multi-channels first. */ TAILQ_FOREACH(sub_channel, &channel->sc_list_anchor, sc_list_entry) { if (sub_channel->state != HV_CHANNEL_OPENED_STATE) continue; hv_vmbus_channel_close_internal(sub_channel); } /* * Then close the primary channel. 
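 * (Sub-channels are recognized by their primary_channel back-pointer,
 * which is what the early return above keys on.)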
*/ hv_vmbus_channel_close_internal(channel); } /** * @brief Send the specified buffer on the given channel */ int hv_vmbus_channel_send_packet( hv_vmbus_channel* channel, void* buffer, uint32_t buffer_len, uint64_t request_id, hv_vmbus_packet_type type, uint32_t flags) { int ret = 0; hv_vm_packet_descriptor desc; uint32_t packet_len; uint64_t aligned_data; uint32_t packet_len_aligned; boolean_t need_sig; hv_vmbus_sg_buffer_list buffer_list[3]; packet_len = sizeof(hv_vm_packet_descriptor) + buffer_len; packet_len_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t)); aligned_data = 0; /* Setup the descriptor */ desc.type = type; /* HV_VMBUS_PACKET_TYPE_DATA_IN_BAND; */ desc.flags = flags; /* HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED */ /* in 8-bytes granularity */ desc.data_offset8 = sizeof(hv_vm_packet_descriptor) >> 3; desc.length8 = (uint16_t) (packet_len_aligned >> 3); desc.transaction_id = request_id; buffer_list[0].data = &desc; buffer_list[0].length = sizeof(hv_vm_packet_descriptor); buffer_list[1].data = buffer; buffer_list[1].length = buffer_len; buffer_list[2].data = &aligned_data; buffer_list[2].length = packet_len_aligned - packet_len; ret = hv_ring_buffer_write(&channel->outbound, buffer_list, 3, &need_sig); /* TODO: We should determine if this is optional */ if (ret == 0 && need_sig) { vmbus_channel_set_event(channel); } return (ret); } /** * @brief Send a range of single-page buffer packets using * a GPADL Direct packet type */ int hv_vmbus_channel_send_packet_pagebuffer( hv_vmbus_channel* channel, hv_vmbus_page_buffer page_buffers[], uint32_t page_count, void* buffer, uint32_t buffer_len, uint64_t request_id) { int ret = 0; - int i = 0; boolean_t need_sig; uint32_t packet_len; + uint32_t page_buflen; uint32_t packetLen_aligned; - hv_vmbus_sg_buffer_list buffer_list[3]; + hv_vmbus_sg_buffer_list buffer_list[4]; hv_vmbus_channel_packet_page_buffer desc; uint32_t descSize; uint64_t alignedData = 0; if (page_count > HV_MAX_PAGE_BUFFER_COUNT) return (EINVAL); /* * Adjust the size down since hv_vmbus_channel_packet_page_buffer * is the largest size we support */ - descSize = sizeof(hv_vmbus_channel_packet_page_buffer) - - ((HV_MAX_PAGE_BUFFER_COUNT - page_count) * - sizeof(hv_vmbus_page_buffer)); - packet_len = descSize + buffer_len; + descSize = __offsetof(hv_vmbus_channel_packet_page_buffer, range); + page_buflen = sizeof(hv_vmbus_page_buffer) * page_count; + packet_len = descSize + page_buflen + buffer_len; packetLen_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t)); /* Setup the descriptor */ desc.type = HV_VMBUS_PACKET_TYPE_DATA_USING_GPA_DIRECT; desc.flags = HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED; - desc.data_offset8 = descSize >> 3; /* in 8-bytes granularity */ + /* in 8-bytes granularity */ + desc.data_offset8 = (descSize + page_buflen) >> 3; desc.length8 = (uint16_t) (packetLen_aligned >> 3); desc.transaction_id = request_id; desc.range_count = page_count; - for (i = 0; i < page_count; i++) { - desc.range[i].length = page_buffers[i].length; - desc.range[i].offset = page_buffers[i].offset; - desc.range[i].pfn = page_buffers[i].pfn; - } - buffer_list[0].data = &desc; buffer_list[0].length = descSize; - buffer_list[1].data = buffer; - buffer_list[1].length = buffer_len; + buffer_list[1].data = page_buffers; + buffer_list[1].length = page_buflen; - buffer_list[2].data = &alignedData; - buffer_list[2].length = packetLen_aligned - packet_len; + buffer_list[2].data = buffer; + buffer_list[2].length = buffer_len; - ret = hv_ring_buffer_write(&channel->outbound, 
buffer_list, 3, + buffer_list[3].data = &alignedData; + buffer_list[3].length = packetLen_aligned - packet_len; + + ret = hv_ring_buffer_write(&channel->outbound, buffer_list, 4, &need_sig); /* TODO: We should determine if this is optional */ if (ret == 0 && need_sig) { vmbus_channel_set_event(channel); } return (ret); } /** * @brief Send a multi-page buffer packet using a GPADL Direct packet type */ int hv_vmbus_channel_send_packet_multipagebuffer( hv_vmbus_channel* channel, hv_vmbus_multipage_buffer* multi_page_buffer, void* buffer, uint32_t buffer_len, uint64_t request_id) { int ret = 0; uint32_t desc_size; boolean_t need_sig; uint32_t packet_len; uint32_t packet_len_aligned; uint32_t pfn_count; uint64_t aligned_data = 0; hv_vmbus_sg_buffer_list buffer_list[3]; hv_vmbus_channel_packet_multipage_buffer desc; pfn_count = HV_NUM_PAGES_SPANNED( multi_page_buffer->offset, multi_page_buffer->length); if ((pfn_count == 0) || (pfn_count > HV_MAX_MULTIPAGE_BUFFER_COUNT)) return (EINVAL); /* * Adjust the size down since hv_vmbus_channel_packet_multipage_buffer * is the largest size we support */ desc_size = sizeof(hv_vmbus_channel_packet_multipage_buffer) - ((HV_MAX_MULTIPAGE_BUFFER_COUNT - pfn_count) * sizeof(uint64_t)); packet_len = desc_size + buffer_len; packet_len_aligned = HV_ALIGN_UP(packet_len, sizeof(uint64_t)); /* * Setup the descriptor */ desc.type = HV_VMBUS_PACKET_TYPE_DATA_USING_GPA_DIRECT; desc.flags = HV_VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED; desc.data_offset8 = desc_size >> 3; /* in 8-bytes granularity */ desc.length8 = (uint16_t) (packet_len_aligned >> 3); desc.transaction_id = request_id; desc.range_count = 1; desc.range.length = multi_page_buffer->length; desc.range.offset = multi_page_buffer->offset; memcpy(desc.range.pfn_array, multi_page_buffer->pfn_array, pfn_count * sizeof(uint64_t)); buffer_list[0].data = &desc; buffer_list[0].length = desc_size; buffer_list[1].data = buffer; buffer_list[1].length = buffer_len; buffer_list[2].data = &aligned_data; buffer_list[2].length = packet_len_aligned - packet_len; ret = hv_ring_buffer_write(&channel->outbound, buffer_list, 3, &need_sig); /* TODO: We should determine if this is optional */ if (ret == 0 && need_sig) { vmbus_channel_set_event(channel); } return (ret); } /** * @brief Retrieve the user packet on the specified channel */ int hv_vmbus_channel_recv_packet( hv_vmbus_channel* channel, void* Buffer, uint32_t buffer_len, uint32_t* buffer_actual_len, uint64_t* request_id) { int ret; uint32_t user_len; uint32_t packet_len; hv_vm_packet_descriptor desc; *buffer_actual_len = 0; *request_id = 0; ret = hv_ring_buffer_peek(&channel->inbound, &desc, sizeof(hv_vm_packet_descriptor)); if (ret != 0) return (0); packet_len = desc.length8 << 3; user_len = packet_len - (desc.data_offset8 << 3); *buffer_actual_len = user_len; if (user_len > buffer_len) return (EINVAL); *request_id = desc.transaction_id; /* Copy over the packet to the user buffer */ ret = hv_ring_buffer_read(&channel->inbound, Buffer, user_len, (desc.data_offset8 << 3)); return (0); } /** * @brief Retrieve the raw packet on the specified channel */ int hv_vmbus_channel_recv_packet_raw( hv_vmbus_channel* channel, void* buffer, uint32_t buffer_len, uint32_t* buffer_actual_len, uint64_t* request_id) { int ret; uint32_t packetLen; uint32_t userLen; hv_vm_packet_descriptor desc; *buffer_actual_len = 0; *request_id = 0; ret = hv_ring_buffer_peek( &channel->inbound, &desc, sizeof(hv_vm_packet_descriptor)); if (ret != 0) return (0); packetLen = desc.length8 << 3; userLen = 
packetLen - (desc.data_offset8 << 3); *buffer_actual_len = packetLen; if (packetLen > buffer_len) return (ENOBUFS); *request_id = desc.transaction_id; /* Copy over the entire packet to the user buffer */ ret = hv_ring_buffer_read(&channel->inbound, buffer, packetLen, 0); return (0); } Index: projects/clang380-import/sys/dev/hyperv =================================================================== --- projects/clang380-import/sys/dev/hyperv (revision 294776) +++ projects/clang380-import/sys/dev/hyperv (revision 294777) Property changes on: projects/clang380-import/sys/dev/hyperv ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys/dev/hyperv:r294599-294776 Index: projects/clang380-import/sys/dev/ixgbe/ixgbe_osdep.h =================================================================== --- projects/clang380-import/sys/dev/ixgbe/ixgbe_osdep.h (revision 294776) +++ projects/clang380-import/sys/dev/ixgbe/ixgbe_osdep.h (revision 294777) @@ -1,215 +1,238 @@ /****************************************************************************** Copyright (c) 2001-2015, Intel Corporation All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
******************************************************************************/ /*$FreeBSD$*/ #ifndef _IXGBE_OS_H_ #define _IXGBE_OS_H_ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define ASSERT(x) if(!(x)) panic("IXGBE: x") #define EWARN(H, W, S) printf(W) +enum { + IXGBE_ERROR_SOFTWARE, + IXGBE_ERROR_POLLING, + IXGBE_ERROR_INVALID_STATE, + IXGBE_ERROR_UNSUPPORTED, + IXGBE_ERROR_ARGUMENT, + IXGBE_ERROR_CAUTION, +}; + /* The happy-fun DELAY macro is defined in /usr/src/sys/i386/include/clock.h */ #define usec_delay(x) DELAY(x) #define msec_delay(x) DELAY(1000*(x)) #define DBG 0 #define MSGOUT(S, A, B) printf(S "\n", A, B) #define DEBUGFUNC(F) DEBUGOUT(F); #if DBG #define DEBUGOUT(S) printf(S "\n") #define DEBUGOUT1(S,A) printf(S "\n",A) #define DEBUGOUT2(S,A,B) printf(S "\n",A,B) #define DEBUGOUT3(S,A,B,C) printf(S "\n",A,B,C) #define DEBUGOUT4(S,A,B,C,D) printf(S "\n",A,B,C,D) #define DEBUGOUT5(S,A,B,C,D,E) printf(S "\n",A,B,C,D,E) #define DEBUGOUT6(S,A,B,C,D,E,F) printf(S "\n",A,B,C,D,E,F) #define DEBUGOUT7(S,A,B,C,D,E,F,G) printf(S "\n",A,B,C,D,E,F,G) - #define ERROR_REPORT1(S,A) printf(S "\n",A) - #define ERROR_REPORT2(S,A,B) printf(S "\n",A,B) - #define ERROR_REPORT3(S,A,B,C) printf(S "\n",A,B,C) + #define ERROR_REPORT1 ERROR_REPORT + #define ERROR_REPORT2 ERROR_REPORT + #define ERROR_REPORT3 ERROR_REPORT + #define ERROR_REPORT(level, format, arg...) do { \ + switch (level) { \ + case IXGBE_ERROR_SOFTWARE: \ + case IXGBE_ERROR_CAUTION: \ + case IXGBE_ERROR_POLLING: \ + case IXGBE_ERROR_INVALID_STATE: \ + case IXGBE_ERROR_UNSUPPORTED: \ + case IXGBE_ERROR_ARGUMENT: \ + device_printf(ixgbe_dev_from_hw(hw), format, ## arg); \ + break; \ + default: \ + break; \ + } \ + } while (0) #else #define DEBUGOUT(S) #define DEBUGOUT1(S,A) #define DEBUGOUT2(S,A,B) #define DEBUGOUT3(S,A,B,C) #define DEBUGOUT4(S,A,B,C,D) #define DEBUGOUT5(S,A,B,C,D,E) #define DEBUGOUT6(S,A,B,C,D,E,F) #define DEBUGOUT7(S,A,B,C,D,E,F,G) #define ERROR_REPORT1(S,A) #define ERROR_REPORT2(S,A,B) #define ERROR_REPORT3(S,A,B,C) #endif #define FALSE 0 #define false 0 /* shared code requires this */ #define TRUE 1 #define true 1 #define CMD_MEM_WRT_INVALIDATE 0x0010 /* BIT_4 */ #define PCI_COMMAND_REGISTER PCIR_COMMAND /* Shared code dropped this define.. 
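 * (presumably kept here because the FreeBSD glue still uses it)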
*/ #define IXGBE_INTEL_VENDOR_ID 0x8086 /* Bunch of defines for shared code bogosity */ #define UNREFERENCED_PARAMETER(_p) #define UNREFERENCED_1PARAMETER(_p) #define UNREFERENCED_2PARAMETER(_p, _q) #define UNREFERENCED_3PARAMETER(_p, _q, _r) #define UNREFERENCED_4PARAMETER(_p, _q, _r, _s) #define IXGBE_NTOHL(_i) ntohl(_i) #define IXGBE_NTOHS(_i) ntohs(_i) /* XXX these need to be revisited */ #define IXGBE_CPU_TO_LE32 htole32 #define IXGBE_LE32_TO_CPUS(x) #define IXGBE_CPU_TO_BE16 htobe16 #define IXGBE_CPU_TO_BE32 htobe32 typedef uint8_t u8; typedef int8_t s8; typedef uint16_t u16; typedef int16_t s16; typedef uint32_t u32; typedef int32_t s32; typedef uint64_t u64; #ifndef __bool_true_false_are_defined typedef boolean_t bool; #endif /* shared code requires this */ #define __le16 u16 #define __le32 u32 #define __le64 u64 #define __be16 u16 #define __be32 u32 #define __be64 u64 #define le16_to_cpu #if __FreeBSD_version < 800000 #if defined(__i386__) || defined(__amd64__) #define mb() __asm volatile("mfence" ::: "memory") #define wmb() __asm volatile("sfence" ::: "memory") #define rmb() __asm volatile("lfence" ::: "memory") #else #define mb() #define rmb() #define wmb() #endif #endif #if defined(__i386__) || defined(__amd64__) static __inline void prefetch(void *x) { __asm volatile("prefetcht0 %0" :: "m" (*(unsigned long *)x)); } #else #define prefetch(x) #endif /* * Optimized bcopy thanks to Luigi Rizzo's investigative work. Assumes * non-overlapping regions and 32-byte padding on both src and dst. */ static __inline int ixgbe_bcopy(void *restrict _src, void *restrict _dst, int l) { uint64_t *src = _src; uint64_t *dst = _dst; for (; l > 0; l -= 32) { *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; } return (0); } struct ixgbe_osdep { bus_space_tag_t mem_bus_space_tag; bus_space_handle_t mem_bus_space_handle; }; /* These routines need struct ixgbe_hw declared */ struct ixgbe_hw; device_t ixgbe_dev_from_hw(struct ixgbe_hw *hw); /* These routines are needed by the shared code */ extern u16 ixgbe_read_pci_cfg(struct ixgbe_hw *, u32); #define IXGBE_READ_PCIE_WORD ixgbe_read_pci_cfg extern void ixgbe_write_pci_cfg(struct ixgbe_hw *, u32, u16); #define IXGBE_WRITE_PCIE_WORD ixgbe_write_pci_cfg #define IXGBE_WRITE_FLUSH(a) IXGBE_READ_REG(a, IXGBE_STATUS) extern u32 ixgbe_read_reg(struct ixgbe_hw *, u32); #define IXGBE_READ_REG(a, reg) ixgbe_read_reg(a, reg) extern void ixgbe_write_reg(struct ixgbe_hw *, u32, u32); #define IXGBE_WRITE_REG(a, reg, val) ixgbe_write_reg(a, reg, val) extern u32 ixgbe_read_reg_array(struct ixgbe_hw *, u32, u32); #define IXGBE_READ_REG_ARRAY(a, reg, offset) \ ixgbe_read_reg_array(a, reg, offset) extern void ixgbe_write_reg_array(struct ixgbe_hw *, u32, u32, u32); #define IXGBE_WRITE_REG_ARRAY(a, reg, offset, val) \ ixgbe_write_reg_array(a, reg, offset, val) #endif /* _IXGBE_OS_H_ */ Index: projects/clang380-import/sys/dev/ofw/openfirm.c =================================================================== --- projects/clang380-import/sys/dev/ofw/openfirm.c (revision 294776) +++ projects/clang380-import/sys/dev/ofw/openfirm.c (revision 294777) @@ -1,796 +1,798 @@ /* $NetBSD: Locore.c,v 1.7 2000/08/20 07:04:59 tsubai Exp $ */ /*- * Copyright (C) 1995, 1996 Wolfgang Solfrank. * Copyright (C) 1995, 1996 TooLs GmbH. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. 
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by TooLs GmbH. * 4. The name of TooLs GmbH may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY TOOLS GMBH ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ /*- * Copyright (C) 2000 Benno Rice. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY Benno Rice ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL TOOLS GMBH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_platform.h" #include #include #include #include #include #include #include #include #include #include #include #include "ofw_if.h" static void OF_putchar(int c, void *arg); MALLOC_DEFINE(M_OFWPROP, "openfirm", "Open Firmware properties"); static ihandle_t stdout; static ofw_def_t *ofw_def_impl = NULL; static ofw_t ofw_obj; static struct ofw_kobj ofw_kernel_obj; static struct kobj_ops ofw_kernel_kops; struct xrefinfo { phandle_t xref; phandle_t node; device_t dev; SLIST_ENTRY(xrefinfo) next_entry; }; static SLIST_HEAD(, xrefinfo) xreflist = SLIST_HEAD_INITIALIZER(xreflist); static struct mtx xreflist_lock; static boolean_t xref_init_done; #define FIND_BY_XREF 0 #define FIND_BY_NODE 1 #define FIND_BY_DEV 2 /* * xref-phandle-device lookup helper routines. 
* * As soon as we are able to use malloc(), walk the node tree and build a list * of info that cross-references node handles, xref handles, and device_t * instances. This list exists primarily to allow association of a device_t * with an xref handle, but it is also used to speed up translation between xref * and node handles. Before malloc() is available we have to recursively search * the node tree each time we want to translate between a node and xref handle. * Afterwards we can do the translations by searching this much shorter list. */ static void xrefinfo_create(phandle_t node) { struct xrefinfo * xi; phandle_t child, xref; /* * Recursively descend from parent, looking for nodes with a property * named either "phandle", "ibm,phandle", or "linux,phandle". For each * such node found create an entry in the xreflist. */ for (child = OF_child(node); child != 0; child = OF_peer(child)) { xrefinfo_create(child); if (OF_getencprop(child, "phandle", &xref, sizeof(xref)) == -1 && OF_getencprop(child, "ibm,phandle", &xref, sizeof(xref)) == -1 && OF_getencprop(child, "linux,phandle", &xref, sizeof(xref)) == -1) continue; xi = malloc(sizeof(*xi), M_OFWPROP, M_WAITOK | M_ZERO); xi->node = child; xi->xref = xref; SLIST_INSERT_HEAD(&xreflist, xi, next_entry); } } static void xrefinfo_init(void *unused) { /* * There is no locking during this init because it runs much earlier * than any of the clients/consumers of the xref list data, but we do * initialize the mutex that will be used for access later. */ mtx_init(&xreflist_lock, "OF xreflist lock", NULL, MTX_DEF); xrefinfo_create(OF_peer(0)); xref_init_done = true; } SYSINIT(xrefinfo, SI_SUB_KMEM, SI_ORDER_ANY, xrefinfo_init, NULL); static struct xrefinfo * xrefinfo_find(uintptr_t key, int find_by) { struct xrefinfo *rv, *xi; rv = NULL; mtx_lock(&xreflist_lock); SLIST_FOREACH(xi, &xreflist, next_entry) { if ((find_by == FIND_BY_XREF && (phandle_t)key == xi->xref) || (find_by == FIND_BY_NODE && (phandle_t)key == xi->node) || (find_by == FIND_BY_DEV && key == (uintptr_t)xi->dev)) { rv = xi; break; } } mtx_unlock(&xreflist_lock); return (rv); } static struct xrefinfo * xrefinfo_add(phandle_t node, phandle_t xref, device_t dev) { struct xrefinfo *xi; xi = malloc(sizeof(*xi), M_OFWPROP, M_WAITOK); xi->node = node; xi->xref = xref; xi->dev = dev; mtx_lock(&xreflist_lock); SLIST_INSERT_HEAD(&xreflist, xi, next_entry); mtx_unlock(&xreflist_lock); return (xi); } /* * OFW install routines. Highest priority wins, equal priority also * overrides allowing last-set to win. */ SET_DECLARE(ofw_set, ofw_def_t); boolean_t OF_install(char *name, int prio) { ofw_def_t *ofwp, **ofwpp; static int curr_prio = 0; /* * Try and locate the OFW kobj corresponding to the name. */ SET_FOREACH(ofwpp, ofw_set) { ofwp = *ofwpp; if (ofwp->name && !strcmp(ofwp->name, name) && prio >= curr_prio) { curr_prio = prio; ofw_def_impl = ofwp; return (TRUE); } } return (FALSE); } /* Initializer */ int OF_init(void *cookie) { phandle_t chosen; int rv; if (ofw_def_impl == NULL) return (-1); ofw_obj = &ofw_kernel_obj; /* * Take care of compiling the selected class, and * then statically initialize the OFW object. 
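/*
 * A hedged sketch of the bring-up sequence (the implementation name
 * "ofw_fdt" and the cookie are illustrative assumptions): a kobj
 * implementation from the ofw_set linker set is selected by name and
 * priority with OF_install(), after which OF_init() compiles the class
 * statically and hands the platform-specific cookie to its init method.
 */
static void
example_platform_bringup(void *cookie)
{

	if (OF_install("ofw_fdt", 0) == FALSE)
		panic("no Open Firmware implementation available");
	if (OF_init(cookie) != 0)
		panic("Open Firmware initialization failed");
}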
*/ kobj_class_compile_static(ofw_def_impl, &ofw_kernel_kops); kobj_init_static((kobj_t)ofw_obj, ofw_def_impl); rv = OFW_INIT(ofw_obj, cookie); if ((chosen = OF_finddevice("/chosen")) != -1) if (OF_getencprop(chosen, "stdout", &stdout, sizeof(stdout)) == -1) stdout = -1; return (rv); } static void OF_putchar(int c, void *arg __unused) { char cbuf; if (c == '\n') { cbuf = '\r'; OF_write(stdout, &cbuf, 1); } cbuf = c; OF_write(stdout, &cbuf, 1); } void OF_printf(const char *fmt, ...) { va_list va; va_start(va, fmt); (void)kvprintf(fmt, OF_putchar, NULL, 10, va); va_end(va); } /* * Generic functions */ /* Test to see if a service exists. */ int OF_test(const char *name) { if (ofw_def_impl == NULL) return (-1); return (OFW_TEST(ofw_obj, name)); } int OF_interpret(const char *cmd, int nreturns, ...) { va_list ap; cell_t slots[16]; int i = 0; int status; if (ofw_def_impl == NULL) return (-1); status = OFW_INTERPRET(ofw_obj, cmd, nreturns, slots); if (status == -1) return (status); va_start(ap, nreturns); while (i < nreturns) *va_arg(ap, cell_t *) = slots[i++]; va_end(ap); return (status); } /* * Device tree functions */ /* Return the next sibling of this node or 0. */ phandle_t OF_peer(phandle_t node) { if (ofw_def_impl == NULL) return (0); return (OFW_PEER(ofw_obj, node)); } /* Return the first child of this node or 0. */ phandle_t OF_child(phandle_t node) { if (ofw_def_impl == NULL) return (0); return (OFW_CHILD(ofw_obj, node)); } /* Return the parent of this node or 0. */ phandle_t OF_parent(phandle_t node) { if (ofw_def_impl == NULL) return (0); return (OFW_PARENT(ofw_obj, node)); } /* Return the package handle that corresponds to an instance handle. */ phandle_t OF_instance_to_package(ihandle_t instance) { if (ofw_def_impl == NULL) return (-1); return (OFW_INSTANCE_TO_PACKAGE(ofw_obj, instance)); } /* Get the length of a property of a package. */ ssize_t OF_getproplen(phandle_t package, const char *propname) { if (ofw_def_impl == NULL) return (-1); return (OFW_GETPROPLEN(ofw_obj, package, propname)); } /* Check existence of a property of a package. */ int OF_hasprop(phandle_t package, const char *propname) { return (OF_getproplen(package, propname) >= 0 ? 1 : 0); } /* Get the value of a property of a package. */ ssize_t OF_getprop(phandle_t package, const char *propname, void *buf, size_t buflen) { if (ofw_def_impl == NULL) return (-1); return (OFW_GETPROP(ofw_obj, package, propname, buf, buflen)); } ssize_t OF_getencprop(phandle_t node, const char *propname, pcell_t *buf, size_t len) { ssize_t retval; int i; KASSERT(len % 4 == 0, ("Need a multiple of 4 bytes")); retval = OF_getprop(node, propname, buf, len); + if (retval <= 0) + return (retval); + for (i = 0; i < len/4; i++) buf[i] = be32toh(buf[i]); return (retval); } /* * Recursively search the node and its parent for the given property, working * downward from the node to the device tree root. Returns the value of the * first match. */ ssize_t OF_searchprop(phandle_t node, const char *propname, void *buf, size_t len) { ssize_t rv; for (; node != 0; node = OF_parent(node)) if ((rv = OF_getprop(node, propname, buf, len)) != -1) return (rv); return (-1); } ssize_t OF_searchencprop(phandle_t node, const char *propname, void *buf, size_t len) { ssize_t rv; for (; node != 0; node = OF_parent(node)) if ((rv = OF_getencprop(node, propname, buf, len)) != -1) return (rv); return (-1); } /* * Store the value of a property of a package into newly allocated memory * (using the M_OFWPROP malloc pool and M_WAITOK). 
elsz is the size of a * single element, the number of elements is returned in number. */ ssize_t OF_getprop_alloc(phandle_t package, const char *propname, int elsz, void **buf) { int len; *buf = NULL; if ((len = OF_getproplen(package, propname)) == -1 || len % elsz != 0) return (-1); *buf = malloc(len, M_OFWPROP, M_WAITOK); if (OF_getprop(package, propname, *buf, len) == -1) { free(*buf, M_OFWPROP); *buf = NULL; return (-1); } return (len / elsz); } ssize_t OF_getencprop_alloc(phandle_t package, const char *name, int elsz, void **buf) { ssize_t retval; pcell_t *cell; int i; retval = OF_getprop_alloc(package, name, elsz, buf); if (retval == -1) return (-1); if (retval * elsz % 4 != 0) { free(*buf, M_OFWPROP); *buf = NULL; return (-1); } cell = *buf; for (i = 0; i < retval * elsz / 4; i++) cell[i] = be32toh(cell[i]); return (retval); } /* Get the next property of a package. */ int OF_nextprop(phandle_t package, const char *previous, char *buf, size_t size) { if (ofw_def_impl == NULL) return (-1); return (OFW_NEXTPROP(ofw_obj, package, previous, buf, size)); } /* Set the value of a property of a package. */ int OF_setprop(phandle_t package, const char *propname, const void *buf, size_t len) { if (ofw_def_impl == NULL) return (-1); return (OFW_SETPROP(ofw_obj, package, propname, buf, len)); } /* Convert a device specifier to a fully qualified pathname. */ ssize_t OF_canon(const char *device, char *buf, size_t len) { if (ofw_def_impl == NULL) return (-1); return (OFW_CANON(ofw_obj, device, buf, len)); } /* Return a package handle for the specified device. */ phandle_t OF_finddevice(const char *device) { if (ofw_def_impl == NULL) return (-1); return (OFW_FINDDEVICE(ofw_obj, device)); } /* Return the fully qualified pathname corresponding to an instance. */ ssize_t OF_instance_to_path(ihandle_t instance, char *buf, size_t len) { if (ofw_def_impl == NULL) return (-1); return (OFW_INSTANCE_TO_PATH(ofw_obj, instance, buf, len)); } /* Return the fully qualified pathname corresponding to a package. */ ssize_t OF_package_to_path(phandle_t package, char *buf, size_t len) { if (ofw_def_impl == NULL) return (-1); return (OFW_PACKAGE_TO_PATH(ofw_obj, package, buf, len)); } /* Look up effective phandle (see FDT/PAPR spec) */ static phandle_t OF_child_xref_phandle(phandle_t parent, phandle_t xref) { phandle_t child, rxref; /* * Recursively descend from parent, looking for a node with a property * named either "phandle", "ibm,phandle", or "linux,phandle" that * matches the xref we are looking for. 
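/*
 * A usage sketch (the property name and the node's origin are assumptions):
 * cell-sized properties are stored big-endian in the device tree, so
 * callers that want host-endian values use OF_getencprop() rather than raw
 * OF_getprop().  The return value is a byte count or -1; the guard added
 * above keeps the byte-swapping loop from running when the underlying
 * OF_getprop() fails.
 */
static uint32_t
example_first_reg_cell(phandle_t node)
{
	pcell_t reg[4];
	ssize_t len;

	len = OF_getencprop(node, "reg", reg, sizeof(reg));
	if (len < (ssize_t)sizeof(pcell_t))
		return (0);
	return (reg[0]);	/* already converted to host byte order */
}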
*/ for (child = OF_child(parent); child != 0; child = OF_peer(child)) { rxref = OF_child_xref_phandle(child, xref); if (rxref != -1) return (rxref); if (OF_getencprop(child, "phandle", &rxref, sizeof(rxref)) == -1 && OF_getencprop(child, "ibm,phandle", &rxref, sizeof(rxref)) == -1 && OF_getencprop(child, "linux,phandle", &rxref, sizeof(rxref)) == -1) continue; if (rxref == xref) return (child); } return (-1); } phandle_t OF_node_from_xref(phandle_t xref) { struct xrefinfo *xi; phandle_t node; if (xref_init_done) { if ((xi = xrefinfo_find(xref, FIND_BY_XREF)) == NULL) return (xref); return (xi->node); } if ((node = OF_child_xref_phandle(OF_peer(0), xref)) == -1) return (xref); return (node); } phandle_t OF_xref_from_node(phandle_t node) { struct xrefinfo *xi; phandle_t xref; if (xref_init_done) { if ((xi = xrefinfo_find(node, FIND_BY_NODE)) == NULL) return (node); return (xi->xref); } - if (OF_getencprop(node, "phandle", &xref, sizeof(xref)) == - -1 && OF_getencprop(node, "ibm,phandle", &xref, - sizeof(xref)) == -1 && OF_getencprop(node, - "linux,phandle", &xref, sizeof(xref)) == -1) + if (OF_getencprop(node, "phandle", &xref, sizeof(xref)) == -1 && + OF_getencprop(node, "ibm,phandle", &xref, sizeof(xref)) == -1 && + OF_getencprop(node, "linux,phandle", &xref, sizeof(xref)) == -1) return (node); return (xref); } device_t OF_device_from_xref(phandle_t xref) { struct xrefinfo *xi; if (xref_init_done) { if ((xi = xrefinfo_find(xref, FIND_BY_XREF)) == NULL) return (NULL); return (xi->dev); } panic("Attempt to find device before xreflist_init"); } phandle_t OF_xref_from_device(device_t dev) { struct xrefinfo *xi; if (xref_init_done) { if ((xi = xrefinfo_find((uintptr_t)dev, FIND_BY_DEV)) == NULL) return (0); return (xi->xref); } panic("Attempt to find xref before xreflist_init"); } int OF_device_register_xref(phandle_t xref, device_t dev) { struct xrefinfo *xi; /* * If the given xref handle doesn't already exist in the list then we * add a list entry. In theory this can only happen on a system where * nodes don't contain phandle properties and xref and node handles are * synonymous, so the xref handle is added as the node handle as well. */ if (xref_init_done) { if ((xi = xrefinfo_find(xref, FIND_BY_XREF)) == NULL) xrefinfo_add(xref, xref, dev); else xi->dev = dev; return (0); } panic("Attempt to register device before xreflist_init"); } /* Call the method in the scope of a given instance. */ int OF_call_method(const char *method, ihandle_t instance, int nargs, int nreturns, ...) { va_list ap; cell_t args_n_results[12]; int n, status; if (nargs > 6 || ofw_def_impl == NULL) return (-1); va_start(ap, nreturns); for (n = 0; n < nargs; n++) args_n_results[n] = va_arg(ap, cell_t); status = OFW_CALL_METHOD(ofw_obj, instance, method, nargs, nreturns, args_n_results); if (status != 0) return (status); for (; n < nargs + nreturns; n++) *va_arg(ap, cell_t *) = args_n_results[n]; va_end(ap); return (0); } /* * Device I/O functions */ /* Open an instance for a device. */ ihandle_t OF_open(const char *device) { if (ofw_def_impl == NULL) return (0); return (OFW_OPEN(ofw_obj, device)); } /* Close an instance. */ void OF_close(ihandle_t instance) { if (ofw_def_impl == NULL) return; OFW_CLOSE(ofw_obj, instance); } /* Read from an instance. */ ssize_t OF_read(ihandle_t instance, void *addr, size_t len) { if (ofw_def_impl == NULL) return (-1); return (OFW_READ(ofw_obj, instance, addr, len)); } /* Write to an instance. 
*/ ssize_t OF_write(ihandle_t instance, const void *addr, size_t len) { if (ofw_def_impl == NULL) return (-1); return (OFW_WRITE(ofw_obj, instance, addr, len)); } /* Seek to a position. */ int OF_seek(ihandle_t instance, uint64_t pos) { if (ofw_def_impl == NULL) return (-1); return (OFW_SEEK(ofw_obj, instance, pos)); } /* * Memory functions */ /* Claim an area of memory. */ void * OF_claim(void *virt, size_t size, u_int align) { if (ofw_def_impl == NULL) return ((void *)-1); return (OFW_CLAIM(ofw_obj, virt, size, align)); } /* Release an area of memory. */ void OF_release(void *virt, size_t size) { if (ofw_def_impl == NULL) return; OFW_RELEASE(ofw_obj, virt, size); } /* * Control transfer functions */ /* Suspend and drop back to the Open Firmware interface. */ void OF_enter() { if (ofw_def_impl == NULL) return; OFW_ENTER(ofw_obj); } /* Shut down and drop back to the Open Firmware interface. */ void OF_exit() { if (ofw_def_impl == NULL) panic("OF_exit: Open Firmware not available"); /* Should not return */ OFW_EXIT(ofw_obj); for (;;) /* just in case */ ; } Index: projects/clang380-import/sys/dev/sound/pci/hdspe.h =================================================================== --- projects/clang380-import/sys/dev/sound/pci/hdspe.h (revision 294776) +++ projects/clang380-import/sys/dev/sound/pci/hdspe.h (revision 294777) @@ -1,179 +1,205 @@ /*- - * Copyright (c) 2012 Ruslan Bukin + * Copyright (c) 2012-2016 Ruslan Bukin * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ -#define PCI_VENDOR_XILINX 0x10ee -#define PCI_DEVICE_XILINX_HDSPE 0x3fc6 /* AIO, MADI, AES, RayDAT */ -#define PCI_CLASS_REVISION 0x08 -#define PCI_REVISION_AIO 212 -#define PCI_REVISION_RAYDAT 211 +#define PCI_VENDOR_XILINX 0x10ee +#define PCI_DEVICE_XILINX_HDSPE 0x3fc6 /* AIO, MADI, AES, RayDAT */ +#define PCI_CLASS_REVISION 0x08 +#define PCI_REVISION_AIO 212 +#define PCI_REVISION_RAYDAT 211 -#define AIO 0 -#define RAYDAT 1 +#define AIO 0 +#define RAYDAT 1 /* Hardware mixer */ -#define HDSPE_OUT_ENABLE_BASE 512 -#define HDSPE_IN_ENABLE_BASE 768 -#define HDSPE_MIXER_BASE 32768 -#define HDSPE_MAX_GAIN 32768 +#define HDSPE_OUT_ENABLE_BASE 512 +#define HDSPE_IN_ENABLE_BASE 768 +#define HDSPE_MIXER_BASE 32768 +#define HDSPE_MAX_GAIN 32768 /* Buffer */ -#define HDSPE_PAGE_ADDR_BUF_OUT 8192 -#define HDSPE_PAGE_ADDR_BUF_IN (HDSPE_PAGE_ADDR_BUF_OUT + 64 * 16 * 4) -#define HDSPE_BUF_POSITION_MASK 0x000FFC0 +#define HDSPE_PAGE_ADDR_BUF_OUT 8192 +#define HDSPE_PAGE_ADDR_BUF_IN (HDSPE_PAGE_ADDR_BUF_OUT + 64 * 16 * 4) +#define HDSPE_BUF_POSITION_MASK 0x000FFC0 /* Frequency */ -#define HDSPE_FREQ_0 (1<<6) -#define HDSPE_FREQ_1 (1<<7) -#define HDSPE_FREQ_DOUBLE (1<<8) -#define HDSPE_FREQ_QUAD (1<<31) +#define HDSPE_FREQ_0 (1 << 6) +#define HDSPE_FREQ_1 (1 << 7) +#define HDSPE_FREQ_DOUBLE (1 << 8) +#define HDSPE_FREQ_QUAD (1 << 31) -#define HDSPE_FREQ_32000 HDSPE_FREQ_0 -#define HDSPE_FREQ_44100 HDSPE_FREQ_1 -#define HDSPE_FREQ_48000 (HDSPE_FREQ_0 | HDSPE_FREQ_1) -#define HDSPE_FREQ_MASK (HDSPE_FREQ_0 | HDSPE_FREQ_1 | \ +#define HDSPE_FREQ_32000 HDSPE_FREQ_0 +#define HDSPE_FREQ_44100 HDSPE_FREQ_1 +#define HDSPE_FREQ_48000 (HDSPE_FREQ_0 | HDSPE_FREQ_1) +#define HDSPE_FREQ_MASK (HDSPE_FREQ_0 | HDSPE_FREQ_1 | \ HDSPE_FREQ_DOUBLE | HDSPE_FREQ_QUAD) -#define HDSPE_FREQ_MASK_DEFAULT HDSPE_FREQ_48000 -#define HDSPE_FREQ_REG 256 -#define HDSPE_FREQ_AIO 104857600000000ULL +#define HDSPE_FREQ_MASK_DEFAULT HDSPE_FREQ_48000 +#define HDSPE_FREQ_REG 256 +#define HDSPE_FREQ_AIO 104857600000000ULL -#define HDSPE_SPEED_DEFAULT 48000 +#define HDSPE_SPEED_DEFAULT 48000 /* Latency */ -#define HDSPE_LAT_0 (1<<1) -#define HDSPE_LAT_1 (1<<2) -#define HDSPE_LAT_2 (1<<3) -#define HDSPE_LAT_MASK (HDSPE_LAT_0 | HDSPE_LAT_1 | HDSPE_LAT_2) -#define HDSPE_LAT_BYTES_MAX (4096 * 4) -#define HDSPE_LAT_BYTES_MIN (32 * 4) -#define hdspe_encode_latency(x) (((x)<<1) & HDSPE_LAT_MASK) +#define HDSPE_LAT_0 (1 << 1) +#define HDSPE_LAT_1 (1 << 2) +#define HDSPE_LAT_2 (1 << 3) +#define HDSPE_LAT_MASK (HDSPE_LAT_0 | HDSPE_LAT_1 | HDSPE_LAT_2) +#define HDSPE_LAT_BYTES_MAX (4096 * 4) +#define HDSPE_LAT_BYTES_MIN (32 * 4) +#define hdspe_encode_latency(x) (((x)<<1) & HDSPE_LAT_MASK) +/* Gain */ +#define HDSP_ADGain0 (1 << 25) +#define HDSP_ADGain1 (1 << 26) +#define HDSP_DAGain0 (1 << 27) +#define HDSP_DAGain1 (1 << 28) +#define HDSP_PhoneGain0 (1 << 29) +#define HDSP_PhoneGain1 (1 << 30) + +#define HDSP_ADGainMask (HDSP_ADGain0 | HDSP_ADGain1) +#define HDSP_ADGainMinus10dBV (HDSP_ADGainMask) +#define HDSP_ADGainPlus4dBu (HDSP_ADGain0) +#define HDSP_ADGainLowGain 0 + +#define HDSP_DAGainMask (HDSP_DAGain0 | HDSP_DAGain1) +#define HDSP_DAGainHighGain (HDSP_DAGainMask) +#define HDSP_DAGainPlus4dBu (HDSP_DAGain0) +#define HDSP_DAGainMinus10dBV 0 + +#define HDSP_PhoneGainMask (HDSP_PhoneGain0|HDSP_PhoneGain1) +#define HDSP_PhoneGain0dB HDSP_PhoneGainMask +#define HDSP_PhoneGainMinus6dB (HDSP_PhoneGain0) +#define HDSP_PhoneGainMinus12dB 0 + +#define HDSPM_statusRegister 0 +#define HDSPM_statusRegister2 192 + /* Settings */ -#define 
HDSPE_SETTINGS_REG 0 -#define HDSPE_CONTROL_REG 64 -#define HDSPE_STATUS_REG 0 -#define HDSPE_ENABLE (1<<0) -#define HDSPM_CLOCK_MODE_MASTER (1<<4) +#define HDSPE_SETTINGS_REG 0 +#define HDSPE_CONTROL_REG 64 +#define HDSPE_STATUS_REG 0 +#define HDSPE_ENABLE (1 << 0) +#define HDSPM_CLOCK_MODE_MASTER (1 << 4) /* Interrupts */ -#define HDSPE_AUDIO_IRQ_PENDING (1<<0) -#define HDSPE_AUDIO_INT_ENABLE (1<<5) -#define HDSPE_INTERRUPT_ACK 96 +#define HDSPE_AUDIO_IRQ_PENDING (1 << 0) +#define HDSPE_AUDIO_INT_ENABLE (1 << 5) +#define HDSPE_INTERRUPT_ACK 96 /* Channels */ -#define HDSPE_MAX_SLOTS 64 /* Mono channels */ -#define HDSPE_MAX_CHANS (HDSPE_MAX_SLOTS / 2) /* Stereo pairs */ +#define HDSPE_MAX_SLOTS 64 /* Mono channels */ +#define HDSPE_MAX_CHANS (HDSPE_MAX_SLOTS / 2) /* Stereo pairs */ -#define HDSPE_CHANBUF_SAMPLES (16 * 1024) -#define HDSPE_CHANBUF_SIZE (4 * HDSPE_CHANBUF_SAMPLES) -#define HDSPE_DMASEGSIZE (HDSPE_CHANBUF_SIZE * HDSPE_MAX_SLOTS) +#define HDSPE_CHANBUF_SAMPLES (16 * 1024) +#define HDSPE_CHANBUF_SIZE (4 * HDSPE_CHANBUF_SAMPLES) +#define HDSPE_DMASEGSIZE (HDSPE_CHANBUF_SIZE * HDSPE_MAX_SLOTS) struct hdspe_channel { uint32_t left; uint32_t right; char *descr; uint32_t play; uint32_t rec; }; static MALLOC_DEFINE(M_HDSPE, "hdspe", "hdspe audio"); /* Channel registers */ struct sc_chinfo { struct snd_dbuf *buffer; struct pcm_channel *channel; struct sc_pcminfo *parent; /* Channel information */ uint32_t dir; uint32_t format; uint32_t lslot; uint32_t rslot; uint32_t lvol; uint32_t rvol; /* Buffer */ uint32_t *data; uint32_t size; /* Flags */ uint32_t run; }; /* PCM device private data */ struct sc_pcminfo { device_t dev; uint32_t (*ih) (struct sc_pcminfo *scp); uint32_t chnum; struct sc_chinfo chan[HDSPE_MAX_CHANS]; struct sc_info *sc; struct hdspe_channel *hc; }; /* HDSPe device private data */ struct sc_info { device_t dev; struct mtx *lock; uint32_t ctrl_register; uint32_t settings_register; uint32_t type; /* Control/Status register */ struct resource *cs; int csid; bus_space_tag_t cst; bus_space_handle_t csh; struct resource *irq; int irqid; void *ih; bus_dma_tag_t dmat; /* Play/Record DMA buffers */ uint32_t *pbuf; uint32_t *rbuf; uint32_t bufsize; bus_dmamap_t pmap; bus_dmamap_t rmap; uint32_t period; uint32_t speed; }; -#define hdspe_read_1(sc, regno) \ +#define hdspe_read_1(sc, regno) \ bus_space_read_1((sc)->cst, (sc)->csh, (regno)) -#define hdspe_read_2(sc, regno) \ +#define hdspe_read_2(sc, regno) \ bus_space_read_2((sc)->cst, (sc)->csh, (regno)) -#define hdspe_read_4(sc, regno) \ +#define hdspe_read_4(sc, regno) \ bus_space_read_4((sc)->cst, (sc)->csh, (regno)) -#define hdspe_write_1(sc, regno, data) \ +#define hdspe_write_1(sc, regno, data) \ bus_space_write_1((sc)->cst, (sc)->csh, (regno), (data)) -#define hdspe_write_2(sc, regno, data) \ +#define hdspe_write_2(sc, regno, data) \ bus_space_write_2((sc)->cst, (sc)->csh, (regno), (data)) -#define hdspe_write_4(sc, regno, data) \ +#define hdspe_write_4(sc, regno, data) \ bus_space_write_4((sc)->cst, (sc)->csh, (regno), (data)) Index: projects/clang380-import/sys/dev/uart/uart_tty.c =================================================================== --- projects/clang380-import/sys/dev/uart/uart_tty.c (revision 294776) +++ projects/clang380-import/sys/dev/uart/uart_tty.c (revision 294777) @@ -1,429 +1,439 @@ /*- * Copyright (c) 2003 Marcel Moolenaar * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "uart_if.h" static cn_probe_t uart_cnprobe; static cn_init_t uart_cninit; static cn_term_t uart_cnterm; static cn_getc_t uart_cngetc; static cn_putc_t uart_cnputc; static cn_grab_t uart_cngrab; static cn_ungrab_t uart_cnungrab; +static tsw_open_t uart_tty_open; +static tsw_close_t uart_tty_close; +static tsw_outwakeup_t uart_tty_outwakeup; +static tsw_inwakeup_t uart_tty_inwakeup; +static tsw_ioctl_t uart_tty_ioctl; +static tsw_param_t uart_tty_param; +static tsw_modem_t uart_tty_modem; +static tsw_free_t uart_tty_free; +static tsw_busy_t uart_tty_busy; + CONSOLE_DRIVER(uart); static struct uart_devinfo uart_console; static void uart_cnprobe(struct consdev *cp) { cp->cn_pri = CN_DEAD; KASSERT(uart_console.cookie == NULL, ("foo")); if (uart_cpu_getdev(UART_DEV_CONSOLE, &uart_console)) return; if (uart_probe(&uart_console)) return; strlcpy(cp->cn_name, uart_driver_name, sizeof(cp->cn_name)); cp->cn_pri = (boothowto & RB_SERIAL) ? CN_REMOTE : CN_NORMAL; cp->cn_arg = &uart_console; } static void uart_cninit(struct consdev *cp) { struct uart_devinfo *di; /* * Jedi trick: we need to be able to define cn_dev before we go * single- or multi-user. The problem is that we don't know at * this time what the device will be. Hence, we need to link from * the uart_devinfo to the consdev that corresponds to it so that * we can define cn_dev in uart_bus_attach() when we find the * device during bus enumeration. That's when we'll know what * the unit number will be. 
*/ di = cp->cn_arg; KASSERT(di->cookie == NULL, ("foo")); di->cookie = cp; di->type = UART_DEV_CONSOLE; uart_add_sysdev(di); uart_init(di); } static void uart_cnterm(struct consdev *cp) { uart_term(cp->cn_arg); } static void uart_cngrab(struct consdev *cp) { uart_grab(cp->cn_arg); } static void uart_cnungrab(struct consdev *cp) { uart_ungrab(cp->cn_arg); } static void uart_cnputc(struct consdev *cp, int c) { uart_putc(cp->cn_arg, c); } static int uart_cngetc(struct consdev *cp) { return (uart_poll(cp->cn_arg)); } static int uart_tty_open(struct tty *tp) { struct uart_softc *sc; sc = tty_softc(tp); if (sc == NULL || sc->sc_leaving) return (ENXIO); sc->sc_opened = 1; return (0); } static void uart_tty_close(struct tty *tp) { struct uart_softc *sc; sc = tty_softc(tp); - if (sc == NULL || sc->sc_leaving || !sc->sc_opened) + if (sc == NULL || sc->sc_leaving || !sc->sc_opened) return; if (sc->sc_hwiflow) UART_IOCTL(sc, UART_IOCTL_IFLOW, 0); if (sc->sc_hwoflow) UART_IOCTL(sc, UART_IOCTL_OFLOW, 0); if (sc->sc_sysdev == NULL) UART_SETSIG(sc, SER_DDTR | SER_DRTS); wakeup(sc); sc->sc_opened = 0; - return; } static void uart_tty_outwakeup(struct tty *tp) { struct uart_softc *sc; sc = tty_softc(tp); if (sc == NULL || sc->sc_leaving) return; if (sc->sc_txbusy) return; /* * Respect RTS/CTS (output) flow control if enabled and not already * handled by hardware. */ if ((tp->t_termios.c_cflag & CCTS_OFLOW) && !sc->sc_hwoflow && !(sc->sc_hwsig & SER_CTS)) return; sc->sc_txdatasz = ttydisc_getc(tp, sc->sc_txbuf, sc->sc_txfifosz); if (sc->sc_txdatasz != 0) UART_TRANSMIT(sc); } static void uart_tty_inwakeup(struct tty *tp) { struct uart_softc *sc; sc = tty_softc(tp); if (sc == NULL || sc->sc_leaving) return; if (sc->sc_isquelch) { if ((tp->t_termios.c_cflag & CRTS_IFLOW) && !sc->sc_hwiflow) UART_SETSIG(sc, SER_DRTS|SER_RTS); sc->sc_isquelch = 0; uart_sched_softih(sc, SER_INT_RXREADY); } } static int -uart_tty_ioctl(struct tty *tp, u_long cmd, caddr_t data, struct thread *td) +uart_tty_ioctl(struct tty *tp, u_long cmd, caddr_t data, + struct thread *td __unused) { struct uart_softc *sc; sc = tty_softc(tp); switch (cmd) { case TIOCSBRK: UART_IOCTL(sc, UART_IOCTL_BREAK, 1); return (0); case TIOCCBRK: UART_IOCTL(sc, UART_IOCTL_BREAK, 0); return (0); default: return pps_ioctl(cmd, data, &sc->sc_pps); } } static int uart_tty_param(struct tty *tp, struct termios *t) { struct uart_softc *sc; int databits, parity, stopbits; sc = tty_softc(tp); if (sc == NULL || sc->sc_leaving) return (ENODEV); if (t->c_ispeed != t->c_ospeed && t->c_ospeed != 0) return (EINVAL); if (t->c_ospeed == 0) { UART_SETSIG(sc, SER_DDTR | SER_DRTS); return (0); } switch (t->c_cflag & CSIZE) { case CS5: databits = 5; break; case CS6: databits = 6; break; case CS7: databits = 7; break; default: databits = 8; break; } stopbits = (t->c_cflag & CSTOPB) ? 2 : 1; if (t->c_cflag & PARENB) - parity = (t->c_cflag & PARODD) ? UART_PARITY_ODD - : UART_PARITY_EVEN; + parity = (t->c_cflag & PARODD) ? UART_PARITY_ODD : + UART_PARITY_EVEN; else parity = UART_PARITY_NONE; if (UART_PARAM(sc, t->c_ospeed, databits, stopbits, parity) != 0) return (EINVAL); UART_SETSIG(sc, SER_DDTR | SER_DTR); /* Set input flow control state. */ if (!sc->sc_hwiflow) { if ((t->c_cflag & CRTS_IFLOW) && sc->sc_isquelch) UART_SETSIG(sc, SER_DRTS); else UART_SETSIG(sc, SER_DRTS | SER_RTS); } else UART_IOCTL(sc, UART_IOCTL_IFLOW, (t->c_cflag & CRTS_IFLOW)); /* Set output flow control state. 
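/*
 * A user-level sketch of what eventually reaches uart_tty_param() (the
 * device path is an assumption): the CSIZE/CSTOPB/PARENB bits set below
 * are exactly what the switch above translates into databits, stopbits,
 * and parity for UART_PARAM().
 */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

static int
example_configure_8n1(const char *path)	/* e.g. "/dev/ttyu0" */
{
	struct termios t;
	int fd;

	if ((fd = open(path, O_RDWR | O_NOCTTY)) < 0)
		return (-1);
	if (tcgetattr(fd, &t) != 0) {
		close(fd);
		return (-1);
	}
	cfmakeraw(&t);
	t.c_cflag &= ~(CSIZE | CSTOPB | PARENB);
	t.c_cflag |= CS8;	/* 8 data bits, 1 stop bit, no parity */
	cfsetspeed(&t, B115200);
	if (tcsetattr(fd, TCSANOW, &t) != 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}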
*/ if (sc->sc_hwoflow) UART_IOCTL(sc, UART_IOCTL_OFLOW, (t->c_cflag & CCTS_OFLOW)); return (0); } static int uart_tty_modem(struct tty *tp, int biton, int bitoff) { struct uart_softc *sc; sc = tty_softc(tp); if (biton != 0 || bitoff != 0) - UART_SETSIG(sc, SER_DELTA(bitoff|biton) | biton); + UART_SETSIG(sc, SER_DELTA(bitoff | biton) | biton); return (sc->sc_hwsig); } void uart_tty_intr(void *arg) { struct uart_softc *sc = arg; struct tty *tp; int c, err = 0, pend, sig, xc; if (sc->sc_leaving) return; pend = atomic_readandclear_32(&sc->sc_ttypend); if (!(pend & SER_INT_MASK)) return; tp = sc->sc_u.u_tty.tp; tty_lock(tp); if (pend & SER_INT_RXREADY) { while (!uart_rx_empty(sc) && !sc->sc_isquelch) { xc = uart_rx_peek(sc); c = xc & 0xff; if (xc & UART_STAT_FRAMERR) err |= TRE_FRAMING; if (xc & UART_STAT_OVERRUN) err |= TRE_OVERRUN; if (xc & UART_STAT_PARERR) err |= TRE_PARITY; if (ttydisc_rint(tp, c, err) != 0) { sc->sc_isquelch = 1; if ((tp->t_termios.c_cflag & CRTS_IFLOW) && !sc->sc_hwiflow) UART_SETSIG(sc, SER_DRTS); } else uart_rx_next(sc); } } if (pend & SER_INT_BREAK) ttydisc_rint(tp, 0, TRE_BREAK); if (pend & SER_INT_SIGCHG) { sig = pend & SER_INT_SIGMASK; if (sig & SER_DDCD) ttydisc_modem(tp, sig & SER_DCD); if (sig & SER_DCTS) uart_tty_outwakeup(tp); } if (pend & SER_INT_TXIDLE) uart_tty_outwakeup(tp); ttydisc_rint_done(tp); tty_unlock(tp); } static void -uart_tty_free(void *arg) +uart_tty_free(void *arg __unused) { /* * XXX: uart(4) could reuse the device unit number before it is * being freed by the TTY layer. We should use this hook to free * the device unit number, but unfortunately newbus does not * seem to support such a construct. */ } static bool uart_tty_busy(struct tty *tp) { struct uart_softc *sc; - + sc = tty_softc(tp); if (sc == NULL || sc->sc_leaving) return (FALSE); return (sc->sc_txbusy); } static struct ttydevsw uart_tty_class = { .tsw_flags = TF_INITLOCK|TF_CALLOUT, .tsw_open = uart_tty_open, .tsw_close = uart_tty_close, .tsw_outwakeup = uart_tty_outwakeup, .tsw_inwakeup = uart_tty_inwakeup, .tsw_ioctl = uart_tty_ioctl, .tsw_param = uart_tty_param, .tsw_modem = uart_tty_modem, .tsw_free = uart_tty_free, .tsw_busy = uart_tty_busy, }; int uart_tty_attach(struct uart_softc *sc) { struct tty *tp; int unit; sc->sc_u.u_tty.tp = tp = tty_alloc(&uart_tty_class, sc); unit = device_get_unit(sc->sc_dev); if (sc->sc_sysdev != NULL && sc->sc_sysdev->type == UART_DEV_CONSOLE) { sprintf(((struct consdev *)sc->sc_sysdev->cookie)->cn_name, "ttyu%r", unit); tty_init_console(tp, sc->sc_sysdev->baudrate); } swi_add(&tty_intr_event, uart_driver_name, uart_tty_intr, sc, SWI_TTY, INTR_TYPE_TTY, &sc->sc_softih); tty_makedev(tp, NULL, "u%r", unit); return (0); } int uart_tty_detach(struct uart_softc *sc) { struct tty *tp; tp = sc->sc_u.u_tty.tp; tty_lock(tp); swi_remove(sc->sc_softih); tty_rel_gone(tp); return (0); } struct mtx * uart_tty_getlock(struct uart_softc *sc) { if (sc->sc_u.u_tty.tp != NULL) return (tty_getlock(sc->sc_u.u_tty.tp)); else return (NULL); } Index: projects/clang380-import/sys/fs/ext2fs/ext2_alloc.c =================================================================== --- projects/clang380-import/sys/fs/ext2fs/ext2_alloc.c (revision 294776) +++ projects/clang380-import/sys/fs/ext2fs/ext2_alloc.c (revision 294777) @@ -1,1110 +1,1111 @@ /*- * modified for Lites 1.1 * * Aug 1995, Godmar Back (gback@cs.utah.edu) * University of Utah, Department of Computer Science */ /*- * Copyright (c) 1982, 1986, 1989, 1993 * The Regents of the University of California. 
All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)ffs_alloc.c 8.8 (Berkeley) 2/21/94 * $FreeBSD$ */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include static daddr_t ext2_alloccg(struct inode *, int, daddr_t, int); static daddr_t ext2_clusteralloc(struct inode *, int, daddr_t, int); static u_long ext2_dirpref(struct inode *); static void ext2_fserr(struct m_ext2fs *, uid_t, char *); static u_long ext2_hashalloc(struct inode *, int, long, int, daddr_t (*)(struct inode *, int, daddr_t, int)); static daddr_t ext2_nodealloccg(struct inode *, int, daddr_t, int); static daddr_t ext2_mapsearch(struct m_ext2fs *, char *, daddr_t); /* * Allocate a block in the filesystem. * * A preference may be optionally specified. If a preference is given * the following hierarchy is used to allocate a block: * 1) allocate the requested block. * 2) allocate a rotationally optimal block in the same cylinder. * 3) allocate a block in the same cylinder group. * 4) quadratically rehash into other cylinder groups, until an * available block is located. * If no block preference is given the following hierarchy is used * to allocate a block: * 1) allocate a block in the cylinder group that contains the * inode for the file. * 2) quadratically rehash into other cylinder groups, until an * available block is located. */
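/*
 * An illustration of the probe order described above (standalone sketch,
 * not driver code): after the preferred group cg is tried, the quadratic
 * rehash visits cg + 1, cg + 3, cg + 7, ... (mod ngroups), and a final
 * brute-force pass covers any groups still unvisited.
 */
static int
example_probe_order(int cg, int ngroups)
{
	int i, tried;

	tried = 1;			/* the preferred group goes first */
	for (i = 1; i < ngroups; i *= 2) {
		cg = (cg + i) % ngroups;
		tried++;		/* a real allocator probes cg here */
	}
	return (tried);
}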
int ext2_alloc(struct inode *ip, daddr_t lbn, e4fs_daddr_t bpref, int size, struct ucred *cred, e4fs_daddr_t *bnp) { struct m_ext2fs *fs; struct ext2mount *ump; int32_t bno; int cg; *bnp = 0; fs = ip->i_e2fs; ump = ip->i_ump; mtx_assert(EXT2_MTX(ump), MA_OWNED); #ifdef INVARIANTS if ((u_int)size > fs->e2fs_bsize || blkoff(fs, size) != 0) { vn_printf(ip->i_devvp, "bsize = %lu, size = %d, fs = %s\n", (long unsigned int)fs->e2fs_bsize, size, fs->e2fs_fsmnt); panic("ext2_alloc: bad size"); } if (cred == NOCRED) panic("ext2_alloc: missing credential"); #endif /* INVARIANTS */ if (size == fs->e2fs_bsize && fs->e2fs->e2fs_fbcount == 0) goto nospace; if (cred->cr_uid != 0 && fs->e2fs->e2fs_fbcount < fs->e2fs->e2fs_rbcount) goto nospace; if (bpref >= fs->e2fs->e2fs_bcount) bpref = 0; if (bpref == 0) cg = ino_to_cg(fs, ip->i_number); else cg = dtog(fs, bpref); bno = (daddr_t)ext2_hashalloc(ip, cg, bpref, fs->e2fs_bsize, ext2_alloccg); if (bno > 0) { /* set next_alloc fields as done in block_getblk */ ip->i_next_alloc_block = lbn; ip->i_next_alloc_goal = bno; ip->i_blocks += btodb(fs->e2fs_bsize); ip->i_flag |= IN_CHANGE | IN_UPDATE; *bnp = bno; return (0); } nospace: EXT2_UNLOCK(ump); ext2_fserr(fs, cred->cr_uid, "filesystem full"); uprintf("\n%s: write failed, filesystem is full\n", fs->e2fs_fsmnt); return (ENOSPC); } /* * Reallocate a sequence of blocks into a contiguous sequence of blocks. * * The vnode and an array of buffer pointers for a range of sequential * logical blocks to be made contiguous is given. The allocator attempts * to find a range of sequential blocks starting as close as possible to * an fs_rotdelay offset from the end of the allocation for the logical * block immediately preceding the current range. If successful, the * physical block numbers in the buffer pointers and in the inode are * changed to reflect the new allocation. If unsuccessful, the allocation * is left unchanged. The success in doing the reallocation is returned. * Note that the error return is not reflected back to the user. Rather * the previous block allocation will be used. */ static SYSCTL_NODE(_vfs, OID_AUTO, ext2fs, CTLFLAG_RW, 0, "EXT2FS filesystem"); static int doasyncfree = 1; SYSCTL_INT(_vfs_ext2fs, OID_AUTO, doasyncfree, CTLFLAG_RW, &doasyncfree, 0, "Use asynchronous writes to update block pointers when freeing blocks"); static int doreallocblks = 1; SYSCTL_INT(_vfs_ext2fs, OID_AUTO, doreallocblks, CTLFLAG_RW, &doreallocblks, 0, ""); int ext2_reallocblks(struct vop_reallocblks_args *ap) { struct m_ext2fs *fs; struct inode *ip; struct vnode *vp; struct buf *sbp, *ebp; uint32_t *bap, *sbap, *ebap = 0; struct ext2mount *ump; struct cluster_save *buflist; struct indir start_ap[NIADDR + 1], end_ap[NIADDR + 1], *idp; e2fs_lbn_t start_lbn, end_lbn; int soff; e2fs_daddr_t newblk, blkno; int i, len, start_lvl, end_lvl, pref, ssize; if (doreallocblks == 0) return (ENOSPC); vp = ap->a_vp; ip = VTOI(vp); fs = ip->i_e2fs; ump = ip->i_ump; if (fs->e2fs_contigsumsize <= 0) return (ENOSPC); buflist = ap->a_buflist; len = buflist->bs_nchildren; start_lbn = buflist->bs_children[0]->b_lblkno; end_lbn = start_lbn + len - 1; #ifdef INVARIANTS for (i = 1; i < len; i++) if (buflist->bs_children[i]->b_lblkno != start_lbn + i) panic("ext2_reallocblks: non-cluster"); #endif /* * If the cluster crosses the boundary for the first indirect * block, leave space for the indirect block. Indirect blocks * are initially laid out in a position after the last direct * block. 
Block reallocation would usually destroy locality by * moving the indirect block out of the way to make room for * data blocks if we didn't compensate here. We should also do * this for other indirect block boundaries, but it is only * important for the first one. */ if (start_lbn < NDADDR && end_lbn >= NDADDR) return (ENOSPC); /* * If the latest allocation is in a new cylinder group, assume that * the filesystem has decided to move and do not force it back to * the previous cylinder group. */ if (dtog(fs, dbtofsb(fs, buflist->bs_children[0]->b_blkno)) != dtog(fs, dbtofsb(fs, buflist->bs_children[len - 1]->b_blkno))) return (ENOSPC); if (ext2_getlbns(vp, start_lbn, start_ap, &start_lvl) || ext2_getlbns(vp, end_lbn, end_ap, &end_lvl)) return (ENOSPC); /* * Get the starting offset and block map for the first block. */ if (start_lvl == 0) { sbap = &ip->i_db[0]; soff = start_lbn; } else { idp = &start_ap[start_lvl - 1]; if (bread(vp, idp->in_lbn, (int)fs->e2fs_bsize, NOCRED, &sbp)) { brelse(sbp); return (ENOSPC); } sbap = (u_int *)sbp->b_data; soff = idp->in_off; } /* * If the block range spans two block maps, get the second map. */ if (end_lvl == 0 || (idp = &end_ap[end_lvl - 1])->in_off + 1 >= len) { ssize = len; } else { #ifdef INVARIANTS if (start_ap[start_lvl-1].in_lbn == idp->in_lbn) panic("ext2_reallocblks: start == end"); #endif ssize = len - (idp->in_off + 1); if (bread(vp, idp->in_lbn, (int)fs->e2fs_bsize, NOCRED, &ebp)) goto fail; ebap = (u_int *)ebp->b_data; } /* * Find the preferred location for the cluster. */ EXT2_LOCK(ump); pref = ext2_blkpref(ip, start_lbn, soff, sbap, 0); /* * Search the block map looking for an allocation of the desired size. */ if ((newblk = (e2fs_daddr_t)ext2_hashalloc(ip, dtog(fs, pref), pref, len, ext2_clusteralloc)) == 0) { EXT2_UNLOCK(ump); goto fail; } /* * We have found a new contiguous block. * * First we have to replace the old block pointers with the new * block pointers in the inode and indirect blocks associated * with the file. */ #ifdef DEBUG printf("realloc: ino %ju, lbns %jd-%jd\n\told:", (uintmax_t)ip->i_number, (intmax_t)start_lbn, (intmax_t)end_lbn); #endif /* DEBUG */ blkno = newblk; for (bap = &sbap[soff], i = 0; i < len; i++, blkno += fs->e2fs_fpb) { if (i == ssize) { bap = ebap; soff = -i; } #ifdef INVARIANTS if (buflist->bs_children[i]->b_blkno != fsbtodb(fs, *bap)) panic("ext2_reallocblks: alloc mismatch"); #endif #ifdef DEBUG printf(" %d,", *bap); #endif /* DEBUG */ *bap++ = blkno; } /* * Next we must write out the modified inode and indirect blocks. * For strict correctness, the writes should be synchronous since * the old block values may have been written to disk. In practice * they are almost never written, but if we are concerned about * strict correctness, the `doasyncfree' flag should be set to zero. * * The test on `doasyncfree' should be changed to test a flag * that shows whether the associated buffers and inodes have * been written. The flag should be set when the cluster is * started and cleared whenever the buffer or inode is flushed. * We can then check below to see if it is set, and do the * synchronous write only when it has been cleared. */ if (sbap != &ip->i_db[0]) { if (doasyncfree) bdwrite(sbp); else bwrite(sbp); } else { ip->i_flag |= IN_CHANGE | IN_UPDATE; if (!doasyncfree) ext2_update(vp, 1); } if (ssize < len) { if (doasyncfree) bdwrite(ebp); else bwrite(ebp); } /* * Last, free the old blocks and assign the new blocks to the buffers. 
*/ #ifdef DEBUG printf("\n\tnew:"); #endif /* DEBUG */ for (blkno = newblk, i = 0; i < len; i++, blkno += fs->e2fs_fpb) { ext2_blkfree(ip, dbtofsb(fs, buflist->bs_children[i]->b_blkno), fs->e2fs_bsize); buflist->bs_children[i]->b_blkno = fsbtodb(fs, blkno); #ifdef DEBUG printf(" %d,", blkno); #endif /* DEBUG */ } #ifdef DEBUG printf("\n"); #endif /* DEBUG */ return (0); fail: if (ssize < len) brelse(ebp); if (sbap != &ip->i_db[0]) brelse(sbp); return (ENOSPC); } /* * Allocate an inode in the filesystem. * */ int ext2_valloc(struct vnode *pvp, int mode, struct ucred *cred, struct vnode **vpp) { struct timespec ts; struct inode *pip; struct m_ext2fs *fs; struct inode *ip; struct ext2mount *ump; ino_t ino, ipref; int i, error, cg; *vpp = NULL; pip = VTOI(pvp); fs = pip->i_e2fs; ump = pip->i_ump; EXT2_LOCK(ump); if (fs->e2fs->e2fs_ficount == 0) goto noinodes; /* * If it is a directory then obtain a cylinder group based on * ext2_dirpref else obtain it using ino_to_cg. The preferred inode is * always the next inode. */ if ((mode & IFMT) == IFDIR) { cg = ext2_dirpref(pip); if (fs->e2fs_contigdirs[cg] < 255) fs->e2fs_contigdirs[cg]++; } else { cg = ino_to_cg(fs, pip->i_number); if (fs->e2fs_contigdirs[cg] > 0) fs->e2fs_contigdirs[cg]--; } ipref = cg * fs->e2fs->e2fs_ipg + 1; ino = (ino_t)ext2_hashalloc(pip, cg, (long)ipref, mode, ext2_nodealloccg); if (ino == 0) goto noinodes; error = VFS_VGET(pvp->v_mount, ino, LK_EXCLUSIVE, vpp); if (error) { ext2_vfree(pvp, ino, mode); return (error); } ip = VTOI(*vpp); /* * The question is whether using VGET was such a good idea at all: * Linux doesn't read the old inode in when it is allocating a * new one. I will set at least i_size and i_blocks to zero. */ + ip->i_flag = 0; ip->i_size = 0; ip->i_blocks = 0; ip->i_mode = 0; ip->i_flags = 0; /* now we want to make sure that the block pointers are zeroed out */ for (i = 0; i < NDADDR; i++) ip->i_db[i] = 0; for (i = 0; i < NIADDR; i++) ip->i_ib[i] = 0; /* * Set up a new generation number for this inode. * XXX check if this makes sense in ext2 */ if (ip->i_gen == 0 || ++ip->i_gen == 0) ip->i_gen = random() / 2 + 1; vfs_timestamp(&ts); ip->i_birthtime = ts.tv_sec; ip->i_birthnsec = ts.tv_nsec; /* printf("ext2_valloc: allocated inode %d\n", ino); */ return (0); noinodes: EXT2_UNLOCK(ump); ext2_fserr(fs, cred->cr_uid, "out of inodes"); uprintf("\n%s: create/symlink failed, no inodes free\n", fs->e2fs_fsmnt); return (ENOSPC); } /* * Find a cylinder to place a directory. * * The policy implemented by this algorithm is to allocate a * directory inode in the same cylinder group as its parent * directory, but also to reserve space for its files' inodes * and data. Restrict the number of directories which may be * allocated one after another in the same cylinder group * without intervening allocation of files. * * If we allocate a first level directory then force allocation * in another cylinder group. * */ static u_long ext2_dirpref(struct inode *pip) { struct m_ext2fs *fs; int cg, prefcg, cgsize; u_int avgifree, avgbfree, avgndir, curdirsize; u_int minifree, minbfree, maxndir; u_int mincg, minndir; u_int dirsize, maxcontigdirs; mtx_assert(EXT2_MTX(pip->i_ump), MA_OWNED); fs = pip->i_e2fs; avgifree = fs->e2fs->e2fs_ficount / fs->e2fs_gcount; avgbfree = fs->e2fs->e2fs_fbcount / fs->e2fs_gcount; avgndir = fs->e2fs_total_dir / fs->e2fs_gcount; /* * Force allocation in another cg if creating a first level dir. 
*/ ASSERT_VOP_LOCKED(ITOV(pip), "ext2fs_dirpref"); if (ITOV(pip)->v_vflag & VV_ROOT) { prefcg = arc4random() % fs->e2fs_gcount; mincg = prefcg; minndir = fs->e2fs_ipg; for (cg = prefcg; cg < fs->e2fs_gcount; cg++) if (fs->e2fs_gd[cg].ext2bgd_ndirs < minndir && fs->e2fs_gd[cg].ext2bgd_nifree >= avgifree && fs->e2fs_gd[cg].ext2bgd_nbfree >= avgbfree) { mincg = cg; minndir = fs->e2fs_gd[cg].ext2bgd_ndirs; } for (cg = 0; cg < prefcg; cg++) if (fs->e2fs_gd[cg].ext2bgd_ndirs < minndir && fs->e2fs_gd[cg].ext2bgd_nifree >= avgifree && fs->e2fs_gd[cg].ext2bgd_nbfree >= avgbfree) { mincg = cg; minndir = fs->e2fs_gd[cg].ext2bgd_ndirs; } return (mincg); } /* * Count various limits which are used for * optimal allocation of a directory inode. */ maxndir = min(avgndir + fs->e2fs_ipg / 16, fs->e2fs_ipg); minifree = avgifree - avgifree / 4; if (minifree < 1) minifree = 1; minbfree = avgbfree - avgbfree / 4; if (minbfree < 1) minbfree = 1; cgsize = fs->e2fs_fsize * fs->e2fs_fpg; dirsize = AVGDIRSIZE; curdirsize = avgndir ? (cgsize - avgbfree * fs->e2fs_bsize) / avgndir : 0; if (dirsize < curdirsize) dirsize = curdirsize; maxcontigdirs = min((avgbfree * fs->e2fs_bsize) / dirsize, 255); maxcontigdirs = min(maxcontigdirs, fs->e2fs_ipg / AFPDIR); if (maxcontigdirs == 0) maxcontigdirs = 1; /* * Limit number of dirs in one cg and reserve space for * regular files, but only if we have no deficit in * inodes or space. */ prefcg = ino_to_cg(fs, pip->i_number); for (cg = prefcg; cg < fs->e2fs_gcount; cg++) if (fs->e2fs_gd[cg].ext2bgd_ndirs < maxndir && fs->e2fs_gd[cg].ext2bgd_nifree >= minifree && fs->e2fs_gd[cg].ext2bgd_nbfree >= minbfree) { if (fs->e2fs_contigdirs[cg] < maxcontigdirs) return (cg); } for (cg = 0; cg < prefcg; cg++) if (fs->e2fs_gd[cg].ext2bgd_ndirs < maxndir && fs->e2fs_gd[cg].ext2bgd_nifree >= minifree && fs->e2fs_gd[cg].ext2bgd_nbfree >= minbfree) { if (fs->e2fs_contigdirs[cg] < maxcontigdirs) return (cg); } /* * This is a backstop when we have a deficit in space. */ for (cg = prefcg; cg < fs->e2fs_gcount; cg++) if (fs->e2fs_gd[cg].ext2bgd_nifree >= avgifree) return (cg); for (cg = 0; cg < prefcg; cg++) if (fs->e2fs_gd[cg].ext2bgd_nifree >= avgifree) break; return (cg); } /* * Select the desired position for the next block in a file. * * we try to mimic what Remy does in inode_getblk/block_getblk * * we note: blocknr == 0 means that we're about to allocate either * a direct block or a pointer block at the first level of indirection * (In other words, stuff that will go in i_db[] or i_ib[]) * * blocknr != 0 means that we're allocating a block that is none * of the above. Then, blocknr tells us the number of the block * that will hold the pointer */ e4fs_daddr_t ext2_blkpref(struct inode *ip, e2fs_lbn_t lbn, int indx, e2fs_daddr_t *bap, e2fs_daddr_t blocknr) { int tmp; mtx_assert(EXT2_MTX(ip->i_ump), MA_OWNED); /* if the next block is actually what we thought it is, then set the goal to what we thought it should be */ if (ip->i_next_alloc_block == lbn && ip->i_next_alloc_goal != 0) return ip->i_next_alloc_goal; /* now check whether we were provided with an array that basically tells us previous blocks to which we want to stay close by */ if (bap) for (tmp = indx - 1; tmp >= 0; tmp--) if (bap[tmp]) return bap[tmp]; /* else let's fall back to the blocknr, or, if there is none, follow the rule that a block should be allocated near its inode */ return blocknr ? 
blocknr : (e2fs_daddr_t)(ip->i_block_group * EXT2_BLOCKS_PER_GROUP(ip->i_e2fs)) + ip->i_e2fs->e2fs->e2fs_first_dblock; } /* * Implement the cylinder overflow algorithm. * * The policy implemented by this algorithm is: * 1) allocate the block in its requested cylinder group. * 2) quadratically rehash on the cylinder group number. * 3) brute force search for a free block. */ static u_long ext2_hashalloc(struct inode *ip, int cg, long pref, int size, daddr_t (*allocator)(struct inode *, int, daddr_t, int)) { struct m_ext2fs *fs; ino_t result; int i, icg = cg; mtx_assert(EXT2_MTX(ip->i_ump), MA_OWNED); fs = ip->i_e2fs; /* * 1: preferred cylinder group */ result = (*allocator)(ip, cg, pref, size); if (result) return (result); /* * 2: quadratic rehash */ for (i = 1; i < fs->e2fs_gcount; i *= 2) { cg += i; if (cg >= fs->e2fs_gcount) cg -= fs->e2fs_gcount; result = (*allocator)(ip, cg, 0, size); if (result) return (result); } /* * 3: brute force search * Note that we start at i == 2, since 0 was checked initially, * and 1 is always checked in the quadratic rehash. */ cg = (icg + 2) % fs->e2fs_gcount; for (i = 2; i < fs->e2fs_gcount; i++) { result = (*allocator)(ip, cg, 0, size); if (result) return (result); cg++; if (cg == fs->e2fs_gcount) cg = 0; } return (0); } /* * Determine whether a block can be allocated. * * Check to see if a block of the appropriate size is available, * and if it is, allocate it. */ static daddr_t ext2_alloccg(struct inode *ip, int cg, daddr_t bpref, int size) { struct m_ext2fs *fs; struct buf *bp; struct ext2mount *ump; daddr_t bno, runstart, runlen; int bit, loc, end, error, start; char *bbp; /* XXX ondisk32 */ fs = ip->i_e2fs; ump = ip->i_ump; if (fs->e2fs_gd[cg].ext2bgd_nbfree == 0) return (0); EXT2_UNLOCK(ump); error = bread(ip->i_devvp, fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), (int)fs->e2fs_bsize, NOCRED, &bp); if (error) { brelse(bp); EXT2_LOCK(ump); return (0); } if (fs->e2fs_gd[cg].ext2bgd_nbfree == 0) { /* * Another thread allocated the last block in this * group while we were waiting for the buffer. */ brelse(bp); EXT2_LOCK(ump); return (0); } bbp = (char *)bp->b_data; if (dtog(fs, bpref) != cg) bpref = 0; if (bpref != 0) { bpref = dtogd(fs, bpref); /* * if the requested block is available, use it */ if (isclr(bbp, bpref)) { bno = bpref; goto gotit; } } /* * no blocks in the requested cylinder, so take next * available one in this cylinder group. * first try to get 8 contiguous blocks, then fall back to a single * block. */ if (bpref) start = dtogd(fs, bpref) / NBBY; else start = 0; end = howmany(fs->e2fs->e2fs_fpg, NBBY) - start; retry: runlen = 0; runstart = 0; for (loc = start; loc < end; loc++) { if (bbp[loc] == (char)0xff) { runlen = 0; continue; } /* Start of a run, find the number of high clear bits. */ if (runlen == 0) { bit = fls(bbp[loc]); runlen = NBBY - bit; runstart = loc * NBBY + bit; } else if (bbp[loc] == 0) { /* Continue a run. */ runlen += NBBY; } else { /* * Finish the current run. If it isn't long * enough, start a new one. */ bit = ffs(bbp[loc]) - 1; runlen += bit; if (runlen >= 8) { bno = runstart; goto gotit; } /* Run was too short, start a new one. */ bit = fls(bbp[loc]); runlen = NBBY - bit; runstart = loc * NBBY + bit; } /* If the current run is long enough, use it. 
*/ if (runlen >= 8) { bno = runstart; goto gotit; } } if (start != 0) { end = start; start = 0; goto retry; } bno = ext2_mapsearch(fs, bbp, bpref); if (bno < 0) { brelse(bp); EXT2_LOCK(ump); return (0); } gotit: #ifdef INVARIANTS if (isset(bbp, bno)) { printf("ext2fs_alloccgblk: cg=%d bno=%jd fs=%s\n", cg, (intmax_t)bno, fs->e2fs_fsmnt); panic("ext2fs_alloccg: dup alloc"); } #endif setbit(bbp, bno); EXT2_LOCK(ump); ext2_clusteracct(fs, bbp, cg, bno, -1); fs->e2fs->e2fs_fbcount--; fs->e2fs_gd[cg].ext2bgd_nbfree--; fs->e2fs_fmod = 1; EXT2_UNLOCK(ump); bdwrite(bp); return (cg * fs->e2fs->e2fs_fpg + fs->e2fs->e2fs_first_dblock + bno); } /* * Determine whether a cluster can be allocated. */ static daddr_t ext2_clusteralloc(struct inode *ip, int cg, daddr_t bpref, int len) { struct m_ext2fs *fs; struct ext2mount *ump; struct buf *bp; char *bbp; int bit, error, got, i, loc, run; int32_t *lp; daddr_t bno; fs = ip->i_e2fs; ump = ip->i_ump; if (fs->e2fs_maxcluster[cg] < len) return (0); EXT2_UNLOCK(ump); error = bread(ip->i_devvp, fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), (int)fs->e2fs_bsize, NOCRED, &bp); if (error) goto fail_lock; bbp = (char *)bp->b_data; EXT2_LOCK(ump); /* * Check to see if a cluster of the needed size (or bigger) is * available in this cylinder group. */ lp = &fs->e2fs_clustersum[cg].cs_sum[len]; for (i = len; i <= fs->e2fs_contigsumsize; i++) if (*lp++ > 0) break; if (i > fs->e2fs_contigsumsize) { /* * Update the cluster summary information to reflect * the true maximum-sized cluster so that future cluster * allocation requests can avoid reading the bitmap only * to find no cluster. */ lp = &fs->e2fs_clustersum[cg].cs_sum[len - 1]; for (i = len - 1; i > 0; i--) if (*lp-- > 0) break; fs->e2fs_maxcluster[cg] = i; goto fail; } EXT2_UNLOCK(ump); /* Search the bitmap to find a big enough cluster like in FFS. */ if (dtog(fs, bpref) != cg) bpref = 0; if (bpref != 0) bpref = dtogd(fs, bpref); loc = bpref / NBBY; bit = 1 << (bpref % NBBY); for (run = 0, got = bpref; got < fs->e2fs->e2fs_fpg; got++) { if ((bbp[loc] & bit) != 0) run = 0; else { run++; if (run == len) break; } if ((got & (NBBY - 1)) != (NBBY - 1)) bit <<= 1; else { loc++; bit = 1; } } if (got >= fs->e2fs->e2fs_fpg) goto fail_lock; /* Allocate the cluster that we found. */ for (i = 1; i < len; i++) if (!isclr(bbp, got - run + i)) panic("ext2_clusteralloc: map mismatch"); bno = got - run + 1; if (bno >= fs->e2fs->e2fs_fpg) panic("ext2_clusteralloc: allocated out of group"); EXT2_LOCK(ump); for (i = 0; i < len; i += fs->e2fs_fpb) { setbit(bbp, bno + i); ext2_clusteracct(fs, bbp, cg, bno + i, -1); fs->e2fs->e2fs_fbcount--; fs->e2fs_gd[cg].ext2bgd_nbfree--; } fs->e2fs_fmod = 1; EXT2_UNLOCK(ump); bdwrite(bp); return (cg * fs->e2fs->e2fs_fpg + fs->e2fs->e2fs_first_dblock + bno); fail_lock: EXT2_LOCK(ump); fail: brelse(bp); return (0); } /* * Determine whether an inode can be allocated. * * Check to see if an inode is available, and if it is, * allocate it in the specified cylinder group. */
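/*
 * An illustration of the bitmap idiom that ext2_nodealloccg() below and
 * ext2_mapsearch() share (standalone sketch with hypothetical names):
 * memcchr() skips bytes that are all ones, i.e. bytes with no free bit,
 * and ffs() of the complement then picks the lowest clear bit in the
 * first byte that has one.
 */
static int
example_first_free_bit(char *bmap, int nbytes)
{
	char *loc;

	loc = memcchr(bmap, 0xff, nbytes);
	if (loc == NULL)
		return (-1);	/* every bit in the map is taken */
	return ((loc - bmap) * NBBY + ffs(~*loc) - 1);
}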
*/ static daddr_t ext2_nodealloccg(struct inode *ip, int cg, daddr_t ipref, int mode) { struct m_ext2fs *fs; struct buf *bp; struct ext2mount *ump; int error, start, len; char *ibp, *loc; ipref--; /* to avoid a lot of (ipref -1) */ if (ipref == -1) ipref = 0; fs = ip->i_e2fs; ump = ip->i_ump; if (fs->e2fs_gd[cg].ext2bgd_nifree == 0) return (0); EXT2_UNLOCK(ump); error = bread(ip->i_devvp, fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_i_bitmap), (int)fs->e2fs_bsize, NOCRED, &bp); if (error) { brelse(bp); EXT2_LOCK(ump); return (0); } if (fs->e2fs_gd[cg].ext2bgd_nifree == 0) { /* * Another thread allocated the last i-node in this * group while we were waiting for the buffer. */ brelse(bp); EXT2_LOCK(ump); return (0); } ibp = (char *)bp->b_data; if (ipref) { ipref %= fs->e2fs->e2fs_ipg; if (isclr(ibp, ipref)) goto gotit; } start = ipref / NBBY; len = howmany(fs->e2fs->e2fs_ipg - ipref, NBBY); loc = memcchr(&ibp[start], 0xff, len); if (loc == NULL) { len = start + 1; start = 0; loc = memcchr(&ibp[start], 0xff, len); if (loc == NULL) { printf("cg = %d, ipref = %lld, fs = %s\n", cg, (long long)ipref, fs->e2fs_fsmnt); panic("ext2fs_nodealloccg: map corrupted"); /* NOTREACHED */ } } ipref = (loc - ibp) * NBBY + ffs(~*loc) - 1; gotit: setbit(ibp, ipref); EXT2_LOCK(ump); fs->e2fs_gd[cg].ext2bgd_nifree--; fs->e2fs->e2fs_ficount--; fs->e2fs_fmod = 1; if ((mode & IFMT) == IFDIR) { fs->e2fs_gd[cg].ext2bgd_ndirs++; fs->e2fs_total_dir++; } EXT2_UNLOCK(ump); bdwrite(bp); return (cg * fs->e2fs->e2fs_ipg + ipref +1); } /* * Free a block or fragment. * */ void ext2_blkfree(struct inode *ip, e4fs_daddr_t bno, long size) { struct m_ext2fs *fs; struct buf *bp; struct ext2mount *ump; int cg, error; char *bbp; fs = ip->i_e2fs; ump = ip->i_ump; cg = dtog(fs, bno); if ((u_int)bno >= fs->e2fs->e2fs_bcount) { printf("bad block %lld, ino %ju\n", (long long)bno, (uintmax_t)ip->i_number); ext2_fserr(fs, ip->i_uid, "bad block"); return; } error = bread(ip->i_devvp, fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_b_bitmap), (int)fs->e2fs_bsize, NOCRED, &bp); if (error) { brelse(bp); return; } bbp = (char *)bp->b_data; bno = dtogd(fs, bno); if (isclr(bbp, bno)) { printf("block = %lld, fs = %s\n", (long long)bno, fs->e2fs_fsmnt); panic("ext2_blkfree: freeing free block"); } clrbit(bbp, bno); EXT2_LOCK(ump); ext2_clusteracct(fs, bbp, cg, bno, 1); fs->e2fs->e2fs_fbcount++; fs->e2fs_gd[cg].ext2bgd_nbfree++; fs->e2fs_fmod = 1; EXT2_UNLOCK(ump); bdwrite(bp); } /* * Free an inode. * */ int ext2_vfree(struct vnode *pvp, ino_t ino, int mode) { struct m_ext2fs *fs; struct inode *pip; struct buf *bp; struct ext2mount *ump; int error, cg; char * ibp; pip = VTOI(pvp); fs = pip->i_e2fs; ump = pip->i_ump; if ((u_int)ino > fs->e2fs_ipg * fs->e2fs_gcount) panic("ext2_vfree: range: devvp = %p, ino = %ju, fs = %s", pip->i_devvp, (uintmax_t)ino, fs->e2fs_fsmnt); cg = ino_to_cg(fs, ino); error = bread(pip->i_devvp, fsbtodb(fs, fs->e2fs_gd[cg].ext2bgd_i_bitmap), (int)fs->e2fs_bsize, NOCRED, &bp); if (error) { brelse(bp); return (0); } ibp = (char *)bp->b_data; ino = (ino - 1) % fs->e2fs->e2fs_ipg; if (isclr(ibp, ino)) { printf("ino = %llu, fs = %s\n", (unsigned long long)ino, fs->e2fs_fsmnt); if (fs->e2fs_ronly == 0) panic("ext2_vfree: freeing free inode"); } clrbit(ibp, ino); EXT2_LOCK(ump); fs->e2fs->e2fs_ficount++; fs->e2fs_gd[cg].ext2bgd_nifree++; if ((mode & IFMT) == IFDIR) { fs->e2fs_gd[cg].ext2bgd_ndirs--; fs->e2fs_total_dir--; } fs->e2fs_fmod = 1; EXT2_UNLOCK(ump); bdwrite(bp); return (0); } /* * Find a block in the specified cylinder group. 
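*/

/*
 * Editorial aside: the memcchr()-based scan used by ext2_mapsearch()
 * below and by ext2_nodealloccg() above. memcchr(buf, 0xff, len)
 * returns a pointer to the first byte that is not 0xff, i.e. the first
 * byte with a clear bit; the bit's index within that byte is then
 * ffs(~byte) - 1. Note that memcchr(3) is a FreeBSD extension, not
 * ISO C; first_clear_bit() is an illustrative name.
 */
#include <stdio.h>
#include <string.h>			/* memcchr() on FreeBSD */
#include <strings.h>			/* ffs() */

#define NBBY	8

static long
first_clear_bit(const unsigned char *map, size_t len)
{
	const unsigned char *loc;

	loc = memcchr(map, 0xff, len);
	if (loc == NULL)
		return (-1);		/* every bit is set */
	return ((loc - map) * NBBY + ffs(~*loc & 0xff) - 1);
}

int
main(void)
{
	unsigned char map[] = { 0xff, 0xff, 0xbf };	/* bit 22 clear */

	printf("%ld\n", first_clear_bit(map, sizeof(map)));	/* prints 22 */
	return (0);
}

/*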
* * It is a panic if a request is made to find a block if none are * available. */ static daddr_t ext2_mapsearch(struct m_ext2fs *fs, char *bbp, daddr_t bpref) { char *loc; int start, len; /* * find the fragment by searching through the free block * map for an appropriate bit pattern */ if (bpref) start = dtogd(fs, bpref) / NBBY; else start = 0; len = howmany(fs->e2fs->e2fs_fpg, NBBY) - start; loc = memcchr(&bbp[start], 0xff, len); if (loc == NULL) { len = start + 1; start = 0; loc = memcchr(&bbp[start], 0xff, len); if (loc == NULL) { printf("start = %d, len = %d, fs = %s\n", start, len, fs->e2fs_fsmnt); panic("ext2_mapsearch: map corrupted"); /* NOTREACHED */ } } return ((loc - bbp) * NBBY + ffs(~*loc) - 1); } /* * Fserr prints the name of a filesystem with an error diagnostic. * * The form of the error message is: * fs: error message */ static void ext2_fserr(struct m_ext2fs *fs, uid_t uid, char *cp) { log(LOG_ERR, "uid %u on %s: %s\n", uid, fs->e2fs_fsmnt, cp); } int cg_has_sb(int i) { int a3, a5, a7; if (i == 0 || i == 1) return 1; for (a3 = 3, a5 = 5, a7 = 7; a3 <= i || a5 <= i || a7 <= i; a3 *= 3, a5 *= 5, a7 *= 7) if (i == a3 || i == a5 || i == a7) return 1; return 0; } Index: projects/clang380-import/sys/fs/ext2fs/ext2_dinode.h =================================================================== --- projects/clang380-import/sys/fs/ext2fs/ext2_dinode.h (revision 294776) +++ projects/clang380-import/sys/fs/ext2fs/ext2_dinode.h (revision 294777) @@ -1,137 +1,137 @@ /*- * Copyright (c) 2009 Aditya Sarawgi * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _FS_EXT2FS_EXT2_DINODE_H_ #define _FS_EXT2FS_EXT2_DINODE_H_ /* * Special inode numbers * The root inode is the root of the file system. Inode 0 can't be used for * normal purposes and bad blocks are normally linked to inode 1, thus * the root inode is 2. * Inode 3 to 10 are reserved in ext2fs. 
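*/

/*
 * Editorial aside on the preceding ext2_alloc.c hunk: cg_has_sb()
 * implements the sparse-superblock rule -- superblock backups live
 * only in groups 0 and 1 and in groups whose number is a power of
 * 3, 5 or 7. A quick stand-alone enumeration of those groups:
 */
#include <stdio.h>

static int
cg_has_sb(int i)
{
	int a3, a5, a7;

	if (i == 0 || i == 1)
		return (1);
	for (a3 = 3, a5 = 5, a7 = 7; a3 <= i || a5 <= i || a7 <= i;
	    a3 *= 3, a5 *= 5, a7 *= 7)
		if (i == a3 || i == a5 || i == a7)
			return (1);
	return (0);
}

int
main(void)
{
	int i;

	for (i = 0; i < 100; i++)	/* prints: 0 1 3 5 7 9 25 27 49 81 */
		if (cg_has_sb(i))
			printf("%d ", i);
	printf("\n");
	return (0);
}

/* The reserved inode numbers follow.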
*/ #define EXT2_BADBLKINO ((ino_t)1) #define EXT2_ROOTINO ((ino_t)2) #define EXT2_ACLIDXINO ((ino_t)3) #define EXT2_ACLDATAINO ((ino_t)4) #define EXT2_BOOTLOADERINO ((ino_t)5) #define EXT2_UNDELDIRINO ((ino_t)6) #define EXT2_RESIZEINO ((ino_t)7) #define EXT2_JOURNALINO ((ino_t)8) #define EXT2_EXCLUDEINO ((ino_t)9) #define EXT2_REPLICAINO ((ino_t)10) #define EXT2_FIRSTINO ((ino_t)11) /* * Inode flags * The system supports EXT2_IMMUTABLE, EXT2_APPEND and EXT2_NODUMP flags. - * The current implementation also uses EXT4_INDEX, EXT4_EXTENTS and - * EXT4_HUGE_FILE with some restrictions, imposed the lack of write + * The current implementation also uses EXT3_INDEX, EXT4_EXTENTS and + * EXT4_HUGE_FILE with some restrictions imposed by the lack of write * support. */ #define EXT2_SECRM 0x00000001 /* Secure deletion */ #define EXT2_UNRM 0x00000002 /* Undelete */ #define EXT2_COMPR 0x00000004 /* Compress file */ #define EXT2_SYNC 0x00000008 /* Synchronous updates */ #define EXT2_IMMUTABLE 0x00000010 /* Immutable file */ #define EXT2_APPEND 0x00000020 /* Writes to file may only append */ #define EXT2_NODUMP 0x00000040 /* Do not dump file */ #define EXT2_NOATIME 0x00000080 /* Do not update atime */ -#define EXT4_INDEX 0x00001000 /* Hash-indexed directory */ +#define EXT3_INDEX 0x00001000 /* Hash-indexed directory */ #define EXT4_IMAGIC 0x00002000 /* AFS directory */ #define EXT4_JOURNAL_DATA 0x00004000 /* File data should be journaled */ #define EXT4_NOTAIL 0x00008000 /* File tail should not be merged */ #define EXT4_DIRSYNC 0x00010000 /* Dirsync behaviour */ #define EXT4_TOPDIR 0x00020000 /* Top of directory hierarchies*/ #define EXT4_HUGE_FILE 0x00040000 /* Set to each huge file */ #define EXT4_EXTENTS 0x00080000 /* Inode uses extents */ #define EXT4_EOFBLOCKS 0x00400000 /* Blocks allocated beyond EOF */ /* * Definitions for nanosecond timestamps. * Ext3 inode versioning, 2006-12-13. */ #define EXT3_EPOCH_BITS 2 #define EXT3_EPOCH_MASK ((1 << EXT3_EPOCH_BITS) - 1) #define EXT3_NSEC_MASK (~0UL << EXT3_EPOCH_BITS) #define E2DI_HAS_XTIME(ip) (EXT2_HAS_RO_COMPAT_FEATURE(ip->i_e2fs, \ EXT2F_ROCOMPAT_EXTRA_ISIZE)) #define E2DI_HAS_HUGE_FILE(ip) (EXT2_HAS_RO_COMPAT_FEATURE(ip->i_e2fs, \ EXT2F_ROCOMPAT_HUGE_FILE)) /* * Constants relative to the data blocks */ #define EXT2_NDIR_BLOCKS 12 #define EXT2_IND_BLOCK EXT2_NDIR_BLOCKS #define EXT2_DIND_BLOCK (EXT2_IND_BLOCK + 1) #define EXT2_TIND_BLOCK (EXT2_DIND_BLOCK + 1) #define EXT2_N_BLOCKS (EXT2_TIND_BLOCK + 1) #define EXT2_MAXSYMLINKLEN (EXT2_N_BLOCKS * sizeof(uint32_t)) /* * Structure of an inode on the disk */ struct ext2fs_dinode { uint16_t e2di_mode; /* 0: IFMT, permissions; see below. 
*/ uint16_t e2di_uid; /* 2: Owner UID */ uint32_t e2di_size; /* 4: Size (in bytes) */ uint32_t e2di_atime; /* 8: Access time */ uint32_t e2di_ctime; /* 12: Change time */ uint32_t e2di_mtime; /* 16: Modification time */ uint32_t e2di_dtime; /* 20: Deletion time */ uint16_t e2di_gid; /* 24: Owner GID */ uint16_t e2di_nlink; /* 26: File link count */ uint32_t e2di_nblock; /* 28: Blocks count */ uint32_t e2di_flags; /* 32: Status flags (chflags) */ uint32_t e2di_version; /* 36: Low 32 bits inode version */ uint32_t e2di_blocks[EXT2_N_BLOCKS]; /* 40: disk blocks */ uint32_t e2di_gen; /* 100: generation number */ uint32_t e2di_facl; /* 104: Low EA block */ uint32_t e2di_size_high; /* 108: Upper bits of file size */ uint32_t e2di_faddr; /* 112: Fragment address (obsolete) */ uint16_t e2di_nblock_high; /* 116: Blocks count bits 47:32 */ uint16_t e2di_facl_high; /* 118: File EA bits 47:32 */ uint16_t e2di_uid_high; /* 120: Owner UID top 16 bits */ uint16_t e2di_gid_high; /* 122: Owner GID top 16 bits */ uint16_t e2di_chksum_lo; /* 124: Lower inode checksum */ uint16_t e2di_lx_reserved; /* 126: Unused */ uint16_t e2di_extra_isize; /* 128: Size of this inode */ uint16_t e2di_chksum_hi; /* 130: High inode checksum */ uint32_t e2di_ctime_extra; /* 132: Extra change time */ uint32_t e2di_mtime_extra; /* 136: Extra modification time */ uint32_t e2di_atime_extra; /* 140: Extra access time */ uint32_t e2di_crtime; /* 144: Creation (birth)time */ uint32_t e2di_crtime_extra; /* 148: Extra creation (birth)time */ uint32_t e2di_version_hi; /* 152: High bits of inode version */ }; #endif /* !_FS_EXT2FS_EXT2_DINODE_H_ */ Index: projects/clang380-import/sys/fs/ext2fs/ext2_htree.c =================================================================== --- projects/clang380-import/sys/fs/ext2fs/ext2_htree.c (revision 294776) +++ projects/clang380-import/sys/fs/ext2fs/ext2_htree.c (revision 294777) @@ -1,899 +1,899 @@ /*- * Copyright (c) 2010, 2012 Zheng Liu * Copyright (c) 2012, Vyacheslav Matyushin * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static void ext2_append_entry(char *block, uint32_t blksize, struct ext2fs_direct_2 *last_entry, struct ext2fs_direct_2 *new_entry); static int ext2_htree_append_block(struct vnode *vp, char *data, struct componentname *cnp, uint32_t blksize); static int ext2_htree_check_next(struct inode *ip, uint32_t hash, const char *name, struct ext2fs_htree_lookup_info *info); static int ext2_htree_cmp_sort_entry(const void *e1, const void *e2); static int ext2_htree_find_leaf(struct inode *ip, const char *name, int namelen, uint32_t *hash, uint8_t *hash_version, struct ext2fs_htree_lookup_info *info); static uint32_t ext2_htree_get_block(struct ext2fs_htree_entry *ep); static uint16_t ext2_htree_get_count(struct ext2fs_htree_entry *ep); static uint32_t ext2_htree_get_hash(struct ext2fs_htree_entry *ep); static uint16_t ext2_htree_get_limit(struct ext2fs_htree_entry *ep); static void ext2_htree_insert_entry_to_level(struct ext2fs_htree_lookup_level *level, uint32_t hash, uint32_t blk); static void ext2_htree_insert_entry(struct ext2fs_htree_lookup_info *info, uint32_t hash, uint32_t blk); static uint32_t ext2_htree_node_limit(struct inode *ip); static void ext2_htree_set_block(struct ext2fs_htree_entry *ep, uint32_t blk); static void ext2_htree_set_count(struct ext2fs_htree_entry *ep, uint16_t cnt); static void ext2_htree_set_hash(struct ext2fs_htree_entry *ep, uint32_t hash); static void ext2_htree_set_limit(struct ext2fs_htree_entry *ep, uint16_t limit); static int ext2_htree_split_dirblock(char *block1, char *block2, uint32_t blksize, uint32_t *hash_seed, uint8_t hash_version, uint32_t *split_hash, struct ext2fs_direct_2 *entry); static void ext2_htree_release(struct ext2fs_htree_lookup_info *info); static uint32_t ext2_htree_root_limit(struct inode *ip, int len); static int ext2_htree_writebuf(struct ext2fs_htree_lookup_info *info); int ext2_htree_has_idx(struct inode *ip) { if (EXT2_HAS_COMPAT_FEATURE(ip->i_e2fs, EXT2F_COMPAT_DIRHASHINDEX) && - ip->i_flag & IN_E4INDEX) + ip->i_flag & IN_E3INDEX) return (1); else return (0); } static int ext2_htree_check_next(struct inode *ip, uint32_t hash, const char *name, struct ext2fs_htree_lookup_info *info) { struct vnode *vp = ITOV(ip); struct ext2fs_htree_lookup_level *level; struct buf *bp; uint32_t next_hash; int idx = info->h_levels_num - 1; int levels = 0; do { level = &info->h_levels[idx]; level->h_entry++; if (level->h_entry < level->h_entries + ext2_htree_get_count(level->h_entries)) break; if (idx == 0) return (0); idx--; levels++; } while (1); next_hash = ext2_htree_get_hash(level->h_entry); if ((hash & 1) == 0) { if (hash != (next_hash & ~1)) return (0); } while (levels > 0) { levels--; if (ext2_blkatoff(vp, ext2_htree_get_block(level->h_entry) * ip->i_e2fs->e2fs_bsize, NULL, &bp) != 0) return (0); level = &info->h_levels[idx + 1]; brelse(level->h_bp); level->h_bp = bp; level->h_entry = level->h_entries = ((struct ext2fs_htree_node *)bp->b_data)->h_entries; } return (1); } static uint32_t ext2_htree_get_block(struct ext2fs_htree_entry *ep) { return (ep->h_blk & 0x00FFFFFF); } static void ext2_htree_set_block(struct ext2fs_htree_entry *ep, uint32_t blk) { ep->h_blk = blk; } static uint16_t ext2_htree_get_count(struct ext2fs_htree_entry *ep) { return (((struct ext2fs_htree_count *)(ep))->h_entries_num); } static void ext2_htree_set_count(struct 
ext2fs_htree_entry *ep, uint16_t cnt) { ((struct ext2fs_htree_count *)(ep))->h_entries_num = cnt; } static uint32_t ext2_htree_get_hash(struct ext2fs_htree_entry *ep) { return (ep->h_hash); } static uint16_t ext2_htree_get_limit(struct ext2fs_htree_entry *ep) { return (((struct ext2fs_htree_count *)(ep))->h_entries_max); } static void ext2_htree_set_hash(struct ext2fs_htree_entry *ep, uint32_t hash) { ep->h_hash = hash; } static void ext2_htree_set_limit(struct ext2fs_htree_entry *ep, uint16_t limit) { ((struct ext2fs_htree_count *)(ep))->h_entries_max = limit; } static void ext2_htree_release(struct ext2fs_htree_lookup_info *info) { int i; for (i = 0; i < info->h_levels_num; i++) { struct buf *bp = info->h_levels[i].h_bp; if (bp != NULL) brelse(bp); } } static uint32_t ext2_htree_root_limit(struct inode *ip, int len) { uint32_t space; space = ip->i_e2fs->e2fs_bsize - EXT2_DIR_REC_LEN(1) - EXT2_DIR_REC_LEN(2) - len; return (space / sizeof(struct ext2fs_htree_entry)); } static uint32_t ext2_htree_node_limit(struct inode *ip) { struct m_ext2fs *fs; uint32_t space; fs = ip->i_e2fs; space = fs->e2fs_bsize - EXT2_DIR_REC_LEN(0); return (space / sizeof(struct ext2fs_htree_entry)); } static int ext2_htree_find_leaf(struct inode *ip, const char *name, int namelen, uint32_t *hash, uint8_t *hash_ver, struct ext2fs_htree_lookup_info *info) { struct vnode *vp; struct ext2fs *fs; struct m_ext2fs *m_fs; struct buf *bp = NULL; struct ext2fs_htree_root *rootp; struct ext2fs_htree_entry *entp, *start, *end, *middle, *found; struct ext2fs_htree_lookup_level *level_info; uint32_t hash_major = 0, hash_minor = 0; uint32_t levels, cnt; uint8_t hash_version; if (name == NULL || info == NULL) return (-1); vp = ITOV(ip); fs = ip->i_e2fs->e2fs; m_fs = ip->i_e2fs; if (ext2_blkatoff(vp, 0, NULL, &bp) != 0) return (-1); info->h_levels_num = 1; info->h_levels[0].h_bp = bp; rootp = (struct ext2fs_htree_root *)bp->b_data; if (rootp->h_info.h_hash_version != EXT2_HTREE_LEGACY && rootp->h_info.h_hash_version != EXT2_HTREE_HALF_MD4 && rootp->h_info.h_hash_version != EXT2_HTREE_TEA) goto error; hash_version = rootp->h_info.h_hash_version; if (hash_version <= EXT2_HTREE_TEA) hash_version += m_fs->e2fs_uhash; *hash_ver = hash_version; ext2_htree_hash(name, namelen, fs->e3fs_hash_seed, hash_version, &hash_major, &hash_minor); *hash = hash_major; if ((levels = rootp->h_info.h_ind_levels) > 1) goto error; entp = (struct ext2fs_htree_entry *)(((char *)&rootp->h_info) + rootp->h_info.h_info_len); if (ext2_htree_get_limit(entp) != ext2_htree_root_limit(ip, rootp->h_info.h_info_len)) goto error; while (1) { cnt = ext2_htree_get_count(entp); if (cnt == 0 || cnt > ext2_htree_get_limit(entp)) goto error; start = entp + 1; end = entp + cnt - 1; while (start <= end) { middle = start + (end - start) / 2; if (ext2_htree_get_hash(middle) > hash_major) end = middle - 1; else start = middle + 1; } found = start - 1; level_info = &(info->h_levels[info->h_levels_num - 1]); level_info->h_bp = bp; level_info->h_entries = entp; level_info->h_entry = found; if (levels == 0) return (0); levels--; if (ext2_blkatoff(vp, ext2_htree_get_block(found) * m_fs->e2fs_bsize, NULL, &bp) != 0) goto error; entp = ((struct ext2fs_htree_node *)bp->b_data)->h_entries; info->h_levels_num++; info->h_levels[info->h_levels_num - 1].h_bp = bp; } error: ext2_htree_release(info); return (-1); } /* * Try to lookup a directory entry in HTree index */ int ext2_htree_lookup(struct inode *ip, const char *name, int namelen, struct buf **bpp, int *entryoffp, doff_t *offp, doff_t 
*prevoffp, doff_t *endusefulp, struct ext2fs_searchslot *ss) { struct vnode *vp; struct ext2fs_htree_lookup_info info; struct ext2fs_htree_entry *leaf_node; struct m_ext2fs *m_fs; struct buf *bp; uint32_t blk; uint32_t dirhash; uint32_t bsize; uint8_t hash_version; int search_next; int found = 0; m_fs = ip->i_e2fs; bsize = m_fs->e2fs_bsize; vp = ITOV(ip); /* TODO: print error msg because we don't lookup '.' and '..' */ memset(&info, 0, sizeof(info)); if (ext2_htree_find_leaf(ip, name, namelen, &dirhash, &hash_version, &info)) return (-1); do { leaf_node = info.h_levels[info.h_levels_num - 1].h_entry; blk = ext2_htree_get_block(leaf_node); if (ext2_blkatoff(vp, blk * bsize, NULL, &bp) != 0) { ext2_htree_release(&info); return (-1); } *offp = blk * bsize; *entryoffp = 0; *prevoffp = blk * bsize; *endusefulp = blk * bsize; if (ss->slotstatus == NONE) { ss->slotoffset = -1; ss->slotfreespace = 0; } if (ext2_search_dirblock(ip, bp->b_data, &found, name, namelen, entryoffp, offp, prevoffp, endusefulp, ss) != 0) { brelse(bp); ext2_htree_release(&info); return (-1); } if (found) { *bpp = bp; ext2_htree_release(&info); return (0); } brelse(bp); search_next = ext2_htree_check_next(ip, dirhash, name, &info); } while (search_next); ext2_htree_release(&info); return (ENOENT); } static int ext2_htree_append_block(struct vnode *vp, char *data, struct componentname *cnp, uint32_t blksize) { struct iovec aiov; struct uio auio; struct inode *dp = VTOI(vp); uint64_t cursize, newsize; int error; cursize = roundup(dp->i_size, blksize); newsize = cursize + blksize; auio.uio_offset = cursize; auio.uio_resid = blksize; aiov.iov_len = blksize; aiov.iov_base = data; auio.uio_iov = &aiov; auio.uio_iovcnt = 1; auio.uio_rw = UIO_WRITE; auio.uio_segflg = UIO_SYSSPACE; error = VOP_WRITE(vp, &auio, IO_SYNC, cnp->cn_cred); if (!error) dp->i_size = newsize; return (error); } static int ext2_htree_writebuf(struct ext2fs_htree_lookup_info *info) { int i, error; for (i = 0; i < info->h_levels_num; i++) { struct buf *bp = info->h_levels[i].h_bp; error = bwrite(bp); if (error) return (error); } return (0); } static void ext2_htree_insert_entry_to_level(struct ext2fs_htree_lookup_level *level, uint32_t hash, uint32_t blk) { struct ext2fs_htree_entry *target; int entries_num; target = level->h_entry + 1; entries_num = ext2_htree_get_count(level->h_entries); memmove(target + 1, target, (char *)(level->h_entries + entries_num) - (char *)target); ext2_htree_set_block(target, blk); ext2_htree_set_hash(target, hash); ext2_htree_set_count(level->h_entries, entries_num + 1); } /* * Insert an index entry to the index node. */ static void ext2_htree_insert_entry(struct ext2fs_htree_lookup_info *info, uint32_t hash, uint32_t blk) { struct ext2fs_htree_lookup_level *level; level = &info->h_levels[info->h_levels_num - 1]; ext2_htree_insert_entry_to_level(level, hash, blk); } /* * Compare two entry sort descriptors by name hash value. * This is used together with qsort. */ static int ext2_htree_cmp_sort_entry(const void *e1, const void *e2) { const struct ext2fs_htree_sort_entry *entry1, *entry2; entry1 = (const struct ext2fs_htree_sort_entry *)e1; entry2 = (const struct ext2fs_htree_sort_entry *)e2; if (entry1->h_hash < entry2->h_hash) return (-1); if (entry1->h_hash > entry2->h_hash) return (1); return (0); } /* * Append an entry to the end of the directory block. 
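*/

/*
 * Editorial aside: how ext2_htree_cmp_sort_entry() above plugs into
 * qsort(3). Directory entries are reduced to (hash, offset, size)
 * descriptors and ordered by name hash before a block is split. The
 * struct below mirrors ext2fs_htree_sort_entry, but the field values
 * are fabricated for the demo.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

struct sort_entry {
	uint32_t h_hash;
	uint16_t h_offset;
	uint16_t h_size;
};

static int
cmp_sort_entry(const void *e1, const void *e2)
{
	const struct sort_entry *entry1 = e1, *entry2 = e2;

	if (entry1->h_hash < entry2->h_hash)
		return (-1);
	if (entry1->h_hash > entry2->h_hash)
		return (1);
	return (0);
}

int
main(void)
{
	struct sort_entry e[] = {
		{ 0x9c00aa10, 0, 12 },
		{ 0x1d00bb20, 12, 16 },
		{ 0x7e00cc30, 28, 12 },
	};
	int i;

	qsort(e, 3, sizeof(e[0]), cmp_sort_entry);
	for (i = 0; i < 3; i++)		/* ascending hash order */
		printf("hash %#x at offset %u\n", e[i].h_hash,
		    (unsigned)e[i].h_offset);
	return (0);
}

/* The append helper follows.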
*/ static void ext2_append_entry(char *block, uint32_t blksize, struct ext2fs_direct_2 *last_entry, struct ext2fs_direct_2 *new_entry) { uint16_t entry_len; entry_len = EXT2_DIR_REC_LEN(last_entry->e2d_namlen); last_entry->e2d_reclen = entry_len; last_entry = (struct ext2fs_direct_2 *)((char *)last_entry + entry_len); new_entry->e2d_reclen = block + blksize - (char *)last_entry; memcpy(last_entry, new_entry, EXT2_DIR_REC_LEN(new_entry->e2d_namlen)); } /* * Move half of entries from the old directory block to the new one. */ static int ext2_htree_split_dirblock(char *block1, char *block2, uint32_t blksize, uint32_t *hash_seed, uint8_t hash_version, uint32_t *split_hash, struct ext2fs_direct_2 *entry) { int entry_cnt = 0; int size = 0; int i, k; uint32_t offset; uint16_t entry_len = 0; uint32_t entry_hash; struct ext2fs_direct_2 *ep, *last; char *dest; struct ext2fs_htree_sort_entry *sort_info; ep = (struct ext2fs_direct_2 *)block1; dest = block2; sort_info = (struct ext2fs_htree_sort_entry *) ((char *)block2 + blksize); /* * Calculate name hash value for the entry which is to be added. */ ext2_htree_hash(entry->e2d_name, entry->e2d_namlen, hash_seed, hash_version, &entry_hash, NULL); /* * Fill in directory entry sort descriptors. */ while ((char *)ep < block1 + blksize) { if (ep->e2d_ino && ep->e2d_namlen) { entry_cnt++; sort_info--; sort_info->h_size = ep->e2d_reclen; sort_info->h_offset = (char *)ep - block1; ext2_htree_hash(ep->e2d_name, ep->e2d_namlen, hash_seed, hash_version, &sort_info->h_hash, NULL); } ep = (struct ext2fs_direct_2 *) ((char *)ep + ep->e2d_reclen); } /* * Sort directory entry descriptors by name hash value. */ qsort(sort_info, entry_cnt, sizeof(struct ext2fs_htree_sort_entry), ext2_htree_cmp_sort_entry); /* * Count the number of entries to move to directory block 2. */ for (i = entry_cnt - 1; i >= 0; i--) { if (sort_info[i].h_size + size > blksize / 2) break; size += sort_info[i].h_size; } *split_hash = sort_info[i + 1].h_hash; /* * Set collision bit. */ if (*split_hash == sort_info[i].h_hash) *split_hash += 1; /* * Move half of directory entries from block 1 to block 2. */ for (k = i + 1; k < entry_cnt; k++) { ep = (struct ext2fs_direct_2 *)((char *)block1 + sort_info[k].h_offset); entry_len = EXT2_DIR_REC_LEN(ep->e2d_namlen); memcpy(dest, ep, entry_len); ((struct ext2fs_direct_2 *)dest)->e2d_reclen = entry_len; /* Mark directory entry as unused. */ ep->e2d_ino = 0; dest += entry_len; } dest -= entry_len; /* Shrink directory entries in block 1. */ last = (struct ext2fs_direct_2 *)block1; entry_len = 0; for (offset = 0; offset < blksize; ) { ep = (struct ext2fs_direct_2 *)(block1 + offset); offset += ep->e2d_reclen; if (ep->e2d_ino) { last = (struct ext2fs_direct_2 *) ((char *)last + entry_len); entry_len = EXT2_DIR_REC_LEN(ep->e2d_namlen); memcpy((void *)last, (void *)ep, entry_len); last->e2d_reclen = entry_len; } } if (entry_hash >= *split_hash) { /* Add entry to block 2. */ ext2_append_entry(block2, blksize, (struct ext2fs_direct_2 *)dest, entry); /* Adjust length field of last entry of block 1. */ last->e2d_reclen = block1 + blksize - (char *)last; } else { /* Add entry to block 1. */ ext2_append_entry(block1, blksize, last, entry); /* Adjust length field of last entry of block 2. 
*/ ((struct ext2fs_direct_2 *)dest)->e2d_reclen = block2 + blksize - dest; } return (0); } /* * Create an HTree index for a directory */ int ext2_htree_create_index(struct vnode *vp, struct componentname *cnp, struct ext2fs_direct_2 *new_entry) { struct buf *bp = NULL; struct inode *dp; struct ext2fs *fs; struct m_ext2fs *m_fs; struct ext2fs_direct_2 *ep, *dotdot; struct ext2fs_htree_root *root; struct ext2fs_htree_lookup_info info; uint32_t blksize, dirlen, split_hash; uint8_t hash_version; char *buf1 = NULL; char *buf2 = NULL; int error = 0; dp = VTOI(vp); fs = dp->i_e2fs->e2fs; m_fs = dp->i_e2fs; blksize = m_fs->e2fs_bsize; buf1 = malloc(blksize, M_TEMP, M_WAITOK | M_ZERO); buf2 = malloc(blksize, M_TEMP, M_WAITOK | M_ZERO); if ((error = ext2_blkatoff(vp, 0, NULL, &bp)) != 0) goto out; root = (struct ext2fs_htree_root *)bp->b_data; dotdot = (struct ext2fs_direct_2 *)((char *)&(root->h_dotdot)); ep = (struct ext2fs_direct_2 *)((char *)dotdot + dotdot->e2d_reclen); dirlen = (char *)root + blksize - (char *)ep; memcpy(buf1, ep, dirlen); ep = (struct ext2fs_direct_2 *)buf1; while ((char *)ep < buf1 + dirlen) ep = (struct ext2fs_direct_2 *) ((char *)ep + ep->e2d_reclen); ep->e2d_reclen = buf1 + blksize - (char *)ep; - dp->i_flag |= IN_E4INDEX; + dp->i_flag |= IN_E3INDEX; /* * Initialize index root. */ dotdot->e2d_reclen = blksize - EXT2_DIR_REC_LEN(1); memset(&root->h_info, 0, sizeof(root->h_info)); root->h_info.h_hash_version = fs->e3fs_def_hash_version; root->h_info.h_info_len = sizeof(root->h_info); ext2_htree_set_block(root->h_entries, 1); ext2_htree_set_count(root->h_entries, 1); ext2_htree_set_limit(root->h_entries, ext2_htree_root_limit(dp, sizeof(root->h_info))); memset(&info, 0, sizeof(info)); info.h_levels_num = 1; info.h_levels[0].h_entries = root->h_entries; info.h_levels[0].h_entry = root->h_entries; hash_version = root->h_info.h_hash_version; if (hash_version <= EXT2_HTREE_TEA) hash_version += m_fs->e2fs_uhash; ext2_htree_split_dirblock(buf1, buf2, blksize, fs->e3fs_hash_seed, hash_version, &split_hash, new_entry); ext2_htree_insert_entry(&info, split_hash, 2); /* * Write directory block 0. */ if (DOINGASYNC(vp)) { bdwrite(bp); error = 0; } else { error = bwrite(bp); } dp->i_flag |= IN_CHANGE | IN_UPDATE; if (error) goto out; /* * Write directory block 1. */ error = ext2_htree_append_block(vp, buf1, cnp, blksize); if (error) goto out1; /* * Write directory block 2. */ error = ext2_htree_append_block(vp, buf2, cnp, blksize); free(buf1, M_TEMP); free(buf2, M_TEMP); return (error); out: if (bp != NULL) brelse(bp); out1: free(buf1, M_TEMP); free(buf2, M_TEMP); return (error); } /* * Add an entry to the directory using htree index. 
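*/

/*
 * Editorial aside: how ext2_htree_split_dirblock() above picks the
 * split hash. The sorted descriptors are taken from the high end
 * until roughly half the block would move; the split hash is the
 * hash of the first entry that moves, bumped by one (the collision
 * bit) when it equals the hash of the last entry that stays. The
 * sizes and hashes below are fabricated, with a 64-byte "block".
 */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint32_t hash[] = { 0x10, 0x40, 0x40, 0x80 };	/* sorted by hash */
	int size[] = { 16, 16, 16, 16 };
	int i, used = 0, blksize = 64;
	uint32_t split_hash;

	for (i = 3; i >= 0; i--) {	/* fill block 2 from the tail */
		if (used + size[i] > blksize / 2)
			break;
		used += size[i];
	}
	split_hash = hash[i + 1];
	if (split_hash == hash[i])	/* collision across the split */
		split_hash += 1;
	printf("split hash %#x\n", split_hash);	/* prints 0x41 */
	return (0);
}

/* ext2_htree_add_entry() follows.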
*/ int ext2_htree_add_entry(struct vnode *dvp, struct ext2fs_direct_2 *entry, struct componentname *cnp) { struct ext2fs_htree_entry *entries, *leaf_node; struct ext2fs_htree_lookup_info info; struct buf *bp = NULL; struct ext2fs *fs; struct m_ext2fs *m_fs; struct inode *ip; uint16_t ent_num; uint32_t dirhash, split_hash; uint32_t blksize, blknum; uint64_t cursize, dirsize; uint8_t hash_version; char *newdirblock = NULL; char *newidxblock = NULL; struct ext2fs_htree_node *dst_node; struct ext2fs_htree_entry *dst_entries; struct ext2fs_htree_entry *root_entires; struct buf *dst_bp = NULL; int error, write_bp = 0, write_dst_bp = 0, write_info = 0; ip = VTOI(dvp); m_fs = ip->i_e2fs; fs = m_fs->e2fs; blksize = m_fs->e2fs_bsize; if (ip->i_count != 0) return ext2_add_entry(dvp, entry); /* Target directory block is full, split it */ memset(&info, 0, sizeof(info)); error = ext2_htree_find_leaf(ip, entry->e2d_name, entry->e2d_namlen, &dirhash, &hash_version, &info); if (error) return (error); entries = info.h_levels[info.h_levels_num - 1].h_entries; ent_num = ext2_htree_get_count(entries); if (ent_num == ext2_htree_get_limit(entries)) { /* Split the index node. */ root_entires = info.h_levels[0].h_entries; newidxblock = malloc(blksize, M_TEMP, M_WAITOK | M_ZERO); dst_node = (struct ext2fs_htree_node *)newidxblock; dst_entries = dst_node->h_entries; memset(&dst_node->h_fake_dirent, 0, sizeof(dst_node->h_fake_dirent)); dst_node->h_fake_dirent.e2d_reclen = blksize; cursize = roundup(ip->i_size, blksize); dirsize = cursize + blksize; blknum = dirsize / blksize - 1; error = ext2_htree_append_block(dvp, newidxblock, cnp, blksize); if (error) goto finish; error = ext2_blkatoff(dvp, cursize, NULL, &dst_bp); if (error) goto finish; dst_node = (struct ext2fs_htree_node *)dst_bp->b_data; dst_entries = dst_node->h_entries; if (info.h_levels_num == 2) { uint16_t src_ent_num, dst_ent_num; if (ext2_htree_get_count(root_entires) == ext2_htree_get_limit(root_entires)) { /* Directory index is full */ error = EIO; goto finish; } src_ent_num = ent_num / 2; dst_ent_num = ent_num - src_ent_num; split_hash = ext2_htree_get_hash(entries + src_ent_num); /* Move half of index entries to the new index node */ memcpy(dst_entries, entries + src_ent_num, dst_ent_num * sizeof(struct ext2fs_htree_entry)); ext2_htree_set_count(entries, src_ent_num); ext2_htree_set_count(dst_entries, dst_ent_num); ext2_htree_set_limit(dst_entries, ext2_htree_node_limit(ip)); if (info.h_levels[1].h_entry >= entries + src_ent_num) { struct buf *tmp = info.h_levels[1].h_bp; info.h_levels[1].h_bp = dst_bp; dst_bp = tmp; info.h_levels[1].h_entry = info.h_levels[1].h_entry - (entries + src_ent_num) + dst_entries; info.h_levels[1].h_entries = dst_entries; } ext2_htree_insert_entry_to_level(&info.h_levels[0], split_hash, blknum); /* Write new index node to disk */ error = bwrite(dst_bp); ip->i_flag |= IN_CHANGE | IN_UPDATE; if (error) goto finish; write_dst_bp = 1; } else { /* Create second level for htree index */ struct ext2fs_htree_root *idx_root; memcpy(dst_entries, entries, ent_num * sizeof(struct ext2fs_htree_entry)); ext2_htree_set_limit(dst_entries, ext2_htree_node_limit(ip)); idx_root = (struct ext2fs_htree_root *) info.h_levels[0].h_bp->b_data; idx_root->h_info.h_ind_levels = 1; ext2_htree_set_count(entries, 1); ext2_htree_set_block(entries, blknum); info.h_levels_num = 2; info.h_levels[1].h_entries = dst_entries; info.h_levels[1].h_entry = info.h_levels[0].h_entry - info.h_levels[0].h_entries + dst_entries; info.h_levels[1].h_bp = dst_bp; dst_bp 
= NULL; } } leaf_node = info.h_levels[info.h_levels_num - 1].h_entry; blknum = ext2_htree_get_block(leaf_node); error = ext2_blkatoff(dvp, blknum * blksize, NULL, &bp); if (error) goto finish; /* Split target directory block */ newdirblock = malloc(blksize, M_TEMP, M_WAITOK | M_ZERO); ext2_htree_split_dirblock((char *)bp->b_data, newdirblock, blksize, fs->e3fs_hash_seed, hash_version, &split_hash, entry); cursize = roundup(ip->i_size, blksize); dirsize = cursize + blksize; blknum = dirsize / blksize - 1; /* Add index entry for the new directory block */ ext2_htree_insert_entry(&info, split_hash, blknum); /* Write the new directory block to the end of the directory */ error = ext2_htree_append_block(dvp, newdirblock, cnp, blksize); if (error) goto finish; /* Write the target directory block */ error = bwrite(bp); ip->i_flag |= IN_CHANGE | IN_UPDATE; if (error) goto finish; write_bp = 1; /* Write the index block */ error = ext2_htree_writebuf(&info); if (!error) write_info = 1; finish: if (dst_bp != NULL && !write_dst_bp) brelse(dst_bp); if (bp != NULL && !write_bp) brelse(bp); if (newdirblock != NULL) free(newdirblock, M_TEMP); if (newidxblock != NULL) free(newidxblock, M_TEMP); if (!write_info) ext2_htree_release(&info); return (error); } Index: projects/clang380-import/sys/fs/ext2fs/ext2_inode_cnv.c =================================================================== --- projects/clang380-import/sys/fs/ext2fs/ext2_inode_cnv.c (revision 294776) +++ projects/clang380-import/sys/fs/ext2fs/ext2_inode_cnv.c (revision 294777) @@ -1,175 +1,173 @@ /*- * Copyright (c) 1995 The University of Utah and * the Computer Systems Laboratory at the University of Utah (CSL). * All rights reserved. * * Permission to use, copy, modify and distribute this software is hereby * granted provided that (1) source code retains these copyright, permission, * and disclaimer notices, and (2) redistributions including binaries * reproduce the notices in supporting documentation, and (3) all advertising * materials mentioning features or use of this software display the following * acknowledgement: ``This product includes software developed by the * Computer Systems Laboratory at the University of Utah.'' * * THE UNIVERSITY OF UTAH AND CSL ALLOW FREE USE OF THIS SOFTWARE IN ITS "AS * IS" CONDITION. THE UNIVERSITY OF UTAH AND CSL DISCLAIM ANY LIABILITY OF * ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. * * CSL requests users of this software to return to csl-dist@cs.utah.edu any * improvements that they make and grant CSL redistribution rights. 
* * Utah $Hdr$ * $FreeBSD$ */ /* * routines to convert on disk ext2 inodes into inodes and back */ #include #include #include #include #include #include #include #include #include #include #include #define XTIME_TO_NSEC(x) ((x & EXT3_NSEC_MASK) >> 2) #define NSEC_TO_XTIME(t) (le32toh(t << 2) & EXT3_NSEC_MASK) #ifdef EXT2FS_DEBUG void ext2_print_inode(struct inode *in) { int i; struct ext4_extent_header *ehp; struct ext4_extent *ep; printf( "Inode: %5ju", (uintmax_t)in->i_number); printf( /* "Inode: %5d" */ " Type: %10s Mode: 0x%o Flags: 0x%x Version: %d\n", "n/a", in->i_mode, in->i_flags, in->i_gen); printf("User: %5u Group: %5u Size: %ju\n", in->i_uid, in->i_gid, (uintmax_t)in->i_size); printf("Links: %3d Blockcount: %ju\n", in->i_nlink, (uintmax_t)in->i_blocks); printf( "ctime: 0x%x", in->i_ctime); printf( "atime: 0x%x", in->i_atime); printf( "mtime: 0x%x", in->i_mtime); if (E2DI_HAS_XTIME(in)) printf("crtime %#x ", in->i_birthtime); printf("BLOCKS:"); for (i = 0; i < (in->i_blocks <= 24 ? (in->i_blocks + 1) / 2 : 12); i++) printf(" %d", in->i_db[i]); printf("\n"); printf("Extents:\n"); ehp = (struct ext4_extent_header *)in->i_db; printf("Header (magic 0x%x entries %d max %d depth %d gen %d)\n", ehp->eh_magic, ehp->eh_ecount, ehp->eh_max, ehp->eh_depth, ehp->eh_gen); ep = (struct ext4_extent *)(char *)(ehp + 1); printf("Index (blk %d len %d start_lo %d start_hi %d)\n", ep->e_blk, ep->e_len, ep->e_start_lo, ep->e_start_hi); printf("\n"); } #endif /* EXT2FS_DEBUG */ /* * raw ext2 inode to inode */ void ext2_ei2i(struct ext2fs_dinode *ei, struct inode *ip) { int i; ip->i_nlink = ei->e2di_nlink; /* Godmar thinks - if the link count is zero, then the inode is unused - according to ext2 standards. Ufs marks this fact by setting i_mode to zero - why ? I can see that this might lead to problems in an undelete. */ ip->i_mode = ei->e2di_nlink ? ei->e2di_mode : 0; ip->i_size = ei->e2di_size; if (S_ISREG(ip->i_mode)) ip->i_size |= ((u_int64_t)ei->e2di_size_high) << 32; ip->i_atime = ei->e2di_atime; ip->i_mtime = ei->e2di_mtime; ip->i_ctime = ei->e2di_ctime; if (E2DI_HAS_XTIME(ip)) { ip->i_atimensec = XTIME_TO_NSEC(ei->e2di_atime_extra); ip->i_mtimensec = XTIME_TO_NSEC(ei->e2di_mtime_extra); ip->i_ctimensec = XTIME_TO_NSEC(ei->e2di_ctime_extra); ip->i_birthtime = ei->e2di_crtime; ip->i_birthnsec = XTIME_TO_NSEC(ei->e2di_crtime_extra); } ip->i_flags = 0; ip->i_flags |= (ei->e2di_flags & EXT2_APPEND) ? SF_APPEND : 0; ip->i_flags |= (ei->e2di_flags & EXT2_IMMUTABLE) ? SF_IMMUTABLE : 0; ip->i_flags |= (ei->e2di_flags & EXT2_NODUMP) ? UF_NODUMP : 0; - ip->i_flag |= (ei->e2di_flags & EXT4_INDEX) ? IN_E4INDEX : 0; + ip->i_flag |= (ei->e2di_flags & EXT3_INDEX) ? IN_E3INDEX : 0; ip->i_flag |= (ei->e2di_flags & EXT4_EXTENTS) ? IN_E4EXTENTS : 0; ip->i_blocks = ei->e2di_nblock; if (E2DI_HAS_HUGE_FILE(ip)) { ip->i_blocks |= (uint64_t)ei->e2di_nblock_high << 32; if (ei->e2di_flags & EXT4_HUGE_FILE) ip->i_blocks = fsbtodb(ip->i_e2fs, ip->i_blocks); } ip->i_gen = ei->e2di_gen; ip->i_uid = ei->e2di_uid; ip->i_gid = ei->e2di_gid; /* XXX use memcpy */ for(i = 0; i < NDADDR; i++) ip->i_db[i] = ei->e2di_blocks[i]; for(i = 0; i < NIADDR; i++) ip->i_ib[i] = ei->e2di_blocks[EXT2_NDIR_BLOCKS + i]; } /* * inode to raw ext2 inode */ void ext2_i2ei(struct inode *ip, struct ext2fs_dinode *ei) { int i; ei->e2di_mode = ip->i_mode; ei->e2di_nlink = ip->i_nlink; /* Godmar thinks: if dtime is nonzero, ext2 says this inode has been deleted, this would correspond to a zero link count */ ei->e2di_dtime = ei->e2di_nlink ? 
0 : ip->i_mtime; ei->e2di_size = ip->i_size; if (S_ISREG(ip->i_mode)) ei->e2di_size_high = ip->i_size >> 32; ei->e2di_atime = ip->i_atime; ei->e2di_mtime = ip->i_mtime; ei->e2di_ctime = ip->i_ctime; - if (E2DI_HAS_XTIME(ip)) { - ei->e2di_ctime_extra = NSEC_TO_XTIME(ip->i_ctimensec); - ei->e2di_mtime_extra = NSEC_TO_XTIME(ip->i_mtimensec); - ei->e2di_atime_extra = NSEC_TO_XTIME(ip->i_atimensec); - ei->e2di_crtime = ip->i_birthtime; - ei->e2di_crtime_extra = NSEC_TO_XTIME(ip->i_birthnsec); - } + ei->e2di_ctime_extra = NSEC_TO_XTIME(ip->i_ctimensec); + ei->e2di_mtime_extra = NSEC_TO_XTIME(ip->i_mtimensec); + ei->e2di_atime_extra = NSEC_TO_XTIME(ip->i_atimensec); + ei->e2di_crtime = ip->i_birthtime; + ei->e2di_crtime_extra = NSEC_TO_XTIME(ip->i_birthnsec); ei->e2di_flags = 0; ei->e2di_flags |= (ip->i_flags & SF_APPEND) ? EXT2_APPEND: 0; ei->e2di_flags |= (ip->i_flags & SF_IMMUTABLE) ? EXT2_IMMUTABLE: 0; ei->e2di_flags |= (ip->i_flags & UF_NODUMP) ? EXT2_NODUMP: 0; - ei->e2di_flags |= (ip->i_flag & IN_E4INDEX) ? EXT4_INDEX: 0; + ei->e2di_flags |= (ip->i_flag & IN_E3INDEX) ? EXT3_INDEX: 0; ei->e2di_flags |= (ip->i_flag & IN_E4EXTENTS) ? EXT4_EXTENTS: 0; ei->e2di_nblock = ip->i_blocks & 0xffffffff; ei->e2di_nblock_high = ip->i_blocks >> 32 & 0xffff; ei->e2di_gen = ip->i_gen; ei->e2di_uid = ip->i_uid; ei->e2di_gid = ip->i_gid; /* XXX use memcpy */ for(i = 0; i < NDADDR; i++) ei->e2di_blocks[i] = ip->i_db[i]; for(i = 0; i < NIADDR; i++) ei->e2di_blocks[EXT2_NDIR_BLOCKS + i] = ip->i_ib[i]; } Index: projects/clang380-import/sys/fs/ext2fs/ext2_lookup.c =================================================================== --- projects/clang380-import/sys/fs/ext2fs/ext2_lookup.c (revision 294776) +++ projects/clang380-import/sys/fs/ext2fs/ext2_lookup.c (revision 294777) @@ -1,1236 +1,1236 @@ /*- * modified for Lites 1.1 * * Aug 1995, Godmar Back (gback@cs.utah.edu) * University of Utah, Department of Computer Science */ /*- * Copyright (c) 1989, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)ufs_lookup.c 8.6 (Berkeley) 4/1/94 * $FreeBSD$ */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INVARIANTS static int dirchk = 1; #else static int dirchk = 0; #endif static SYSCTL_NODE(_vfs, OID_AUTO, e2fs, CTLFLAG_RD, 0, "EXT2FS filesystem"); SYSCTL_INT(_vfs_e2fs, OID_AUTO, dircheck, CTLFLAG_RW, &dirchk, 0, ""); /* DIRBLKSIZE in ffs is DEV_BSIZE (in most cases 512) while it is the native blocksize in ext2fs - thus, a #define is no longer appropriate */ #undef DIRBLKSIZ static u_char ext2_ft_to_dt[] = { DT_UNKNOWN, /* EXT2_FT_UNKNOWN */ DT_REG, /* EXT2_FT_REG_FILE */ DT_DIR, /* EXT2_FT_DIR */ DT_CHR, /* EXT2_FT_CHRDEV */ DT_BLK, /* EXT2_FT_BLKDEV */ DT_FIFO, /* EXT2_FT_FIFO */ DT_SOCK, /* EXT2_FT_SOCK */ DT_LNK, /* EXT2_FT_SYMLINK */ }; #define FTTODT(ft) \ ((ft) < nitems(ext2_ft_to_dt) ? ext2_ft_to_dt[(ft)] : DT_UNKNOWN) static u_char dt_to_ext2_ft[] = { EXT2_FT_UNKNOWN, /* DT_UNKNOWN */ EXT2_FT_FIFO, /* DT_FIFO */ EXT2_FT_CHRDEV, /* DT_CHR */ EXT2_FT_UNKNOWN, /* unused */ EXT2_FT_DIR, /* DT_DIR */ EXT2_FT_UNKNOWN, /* unused */ EXT2_FT_BLKDEV, /* DT_BLK */ EXT2_FT_UNKNOWN, /* unused */ EXT2_FT_REG_FILE, /* DT_REG */ EXT2_FT_UNKNOWN, /* unused */ EXT2_FT_SYMLINK, /* DT_LNK */ EXT2_FT_UNKNOWN, /* unused */ EXT2_FT_SOCK, /* DT_SOCK */ EXT2_FT_UNKNOWN, /* unused */ EXT2_FT_UNKNOWN, /* DT_WHT */ }; #define DTTOFT(dt) \ ((dt) < nitems(dt_to_ext2_ft) ? dt_to_ext2_ft[(dt)] : EXT2_FT_UNKNOWN) static int ext2_dirbadentry(struct vnode *dp, struct ext2fs_direct_2 *de, int entryoffsetinblock); static int ext2_is_dot_entry(struct componentname *cnp); static int ext2_lookup_ino(struct vnode *vdp, struct vnode **vpp, struct componentname *cnp, ino_t *dd_ino); static int ext2_is_dot_entry(struct componentname *cnp) { if (cnp->cn_namelen <= 2 && cnp->cn_nameptr[0] == '.' && (cnp->cn_nameptr[1] == '.' || cnp->cn_nameptr[1] == '\0')) return (1); return (0); } /* * Vnode op for reading directories. 
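*/

/*
 * Editorial aside: the bounded table-lookup pattern behind FTTODT()
 * and DTTOFT() above -- an out-of-range ext2 file type degrades to
 * DT_UNKNOWN instead of indexing past the table. The numeric values
 * below are believed to match the classic BSD DT_ codes (DT_UNKNOWN
 * 0, DT_FIFO 1, DT_CHR 2, DT_DIR 4, DT_BLK 6, DT_REG 8, DT_LNK 10,
 * DT_SOCK 12); treat them as illustrative.
 */
#include <stdio.h>

#define nitems(x)	(sizeof(x) / sizeof((x)[0]))

static const unsigned char ft_to_dt[] = { 0, 8, 4, 2, 6, 1, 12, 10 };
#define FTTODT(ft)	((ft) < nitems(ft_to_dt) ? ft_to_dt[(ft)] : 0)

int
main(void)
{
	/* prints 8 4 0: regular file, directory, out of range -> unknown */
	printf("%d %d %d\n", FTTODT(1), FTTODT(2), FTTODT(200));
	return (0);
}

/* ext2_readdir() follows.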
*/ int ext2_readdir(struct vop_readdir_args *ap) { struct vnode *vp = ap->a_vp; struct uio *uio = ap->a_uio; struct buf *bp; struct inode *ip; struct ext2fs_direct_2 *dp, *edp; u_long *cookies; struct dirent dstdp; off_t offset, startoffset; size_t readcnt, skipcnt; ssize_t startresid; int ncookies; int DIRBLKSIZ = VTOI(ap->a_vp)->i_e2fs->e2fs_bsize; int error; if (uio->uio_offset < 0) return (EINVAL); ip = VTOI(vp); if (ap->a_ncookies != NULL) { ncookies = uio->uio_resid; if (uio->uio_offset >= ip->i_size) ncookies = 0; else if (ip->i_size - uio->uio_offset < ncookies) ncookies = ip->i_size - uio->uio_offset; ncookies = ncookies / (offsetof(struct ext2fs_direct_2, e2d_namlen) + 4) + 1; cookies = malloc(ncookies * sizeof(*cookies), M_TEMP, M_WAITOK); *ap->a_ncookies = ncookies; *ap->a_cookies = cookies; } else { ncookies = 0; cookies = NULL; } offset = startoffset = uio->uio_offset; startresid = uio->uio_resid; error = 0; while (error == 0 && uio->uio_resid > 0 && uio->uio_offset < ip->i_size) { error = ext2_blkatoff(vp, uio->uio_offset, NULL, &bp); if (error) break; if (bp->b_offset + bp->b_bcount > ip->i_size) readcnt = ip->i_size - bp->b_offset; else readcnt = bp->b_bcount; skipcnt = (size_t)(uio->uio_offset - bp->b_offset) & ~(size_t)(DIRBLKSIZ - 1); offset = bp->b_offset + skipcnt; dp = (struct ext2fs_direct_2 *)&bp->b_data[skipcnt]; edp = (struct ext2fs_direct_2 *)&bp->b_data[readcnt]; while (error == 0 && uio->uio_resid > 0 && dp < edp) { if (dp->e2d_reclen <= offsetof(struct ext2fs_direct_2, e2d_namlen) || (caddr_t)dp + dp->e2d_reclen > (caddr_t)edp) { error = EIO; break; } /*- * "New" ext2fs directory entries differ in 3 ways * from ufs on-disk ones: * - the name is not necessarily NUL-terminated. * - the file type field always exists and always * follows the name length field. * - the file type is encoded in a different way. * * "Old" ext2fs directory entries need no special * conversions, since they are binary compatible * with "new" entries having a file type of 0 (i.e., * EXT2_FT_UNKNOWN). Splitting the old name length * field didn't make a mess like it did in ufs, * because ext2fs uses a machine-independent disk * layout. */ dstdp.d_namlen = dp->e2d_namlen; dstdp.d_type = FTTODT(dp->e2d_type); if (offsetof(struct ext2fs_direct_2, e2d_namlen) + dstdp.d_namlen > dp->e2d_reclen) { error = EIO; break; } if (offset < startoffset || dp->e2d_ino == 0) goto nextentry; dstdp.d_fileno = dp->e2d_ino; dstdp.d_reclen = GENERIC_DIRSIZ(&dstdp); bcopy(dp->e2d_name, dstdp.d_name, dstdp.d_namlen); dstdp.d_name[dstdp.d_namlen] = '\0'; if (dstdp.d_reclen > uio->uio_resid) { if (uio->uio_resid == startresid) error = EINVAL; else error = EJUSTRETURN; break; } /* Advance dp. */ error = uiomove((caddr_t)&dstdp, dstdp.d_reclen, uio); if (error) break; if (cookies != NULL) { KASSERT(ncookies > 0, ("ext2_readdir: cookies buffer too small")); *cookies = offset + dp->e2d_reclen; cookies++; ncookies--; } nextentry: offset += dp->e2d_reclen; dp = (struct ext2fs_direct_2 *)((caddr_t)dp + dp->e2d_reclen); } bqrelse(bp); uio->uio_offset = offset; } /* We need to correct uio_offset. */ uio->uio_offset = offset; if (error == EJUSTRETURN) error = 0; if (ap->a_ncookies != NULL) { if (error == 0) { ap->a_ncookies -= ncookies; } else { free(*ap->a_cookies, M_TEMP); *ap->a_ncookies = 0; *ap->a_cookies = NULL; } } if (error == 0 && ap->a_eofflag) *ap->a_eofflag = ip->i_size <= uio->uio_offset; return (error); } /* * Convert a component of a pathname into a pointer to a locked inode. 
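*/

/*
 * Editorial aside: walking on-disk ext2 directory records by
 * e2d_reclen, as ext2_readdir() above does. The struct mirrors the
 * ext2fs_direct_2 layout (inode, reclen, namlen, type, name); its
 * 4+2+1+1-byte header has no padding on common ABIs. The block
 * contents are fabricated, and on-disk names are not NUL-terminated,
 * hence the "%.*s".
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

struct e2dir {
	uint32_t d_ino;
	uint16_t d_reclen;
	uint8_t  d_namlen;
	uint8_t  d_type;
	char     d_name[];
};

int
main(void)
{
	uint32_t blkbuf[16] = { 0 };		/* one aligned 64-byte "block" */
	unsigned char *blk = (unsigned char *)blkbuf;
	struct e2dir *dp;
	size_t off;

	dp = (struct e2dir *)blk;		/* "." entry, minimal 12 bytes */
	dp->d_ino = 2;
	dp->d_reclen = 12;
	dp->d_namlen = 1;
	memcpy(dp->d_name, ".", 1);
	dp = (struct e2dir *)(blk + 12);	/* last entry claims the rest */
	dp->d_ino = 11;
	dp->d_reclen = 52;
	dp->d_namlen = 5;
	memcpy(dp->d_name, "hello", 5);

	for (off = 0; off < sizeof(blkbuf); off += dp->d_reclen) {
		dp = (struct e2dir *)(blk + off);
		if (dp->d_reclen == 0)		/* corrupt block: stop */
			break;
		printf("ino %u name %.*s\n", dp->d_ino, dp->d_namlen,
		    dp->d_name);
	}
	return (0);
}

/*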
* This is a very central and rather complicated routine. * If the file system is not maintained in a strict tree hierarchy, * this can result in a deadlock situation (see comments in code below). * * The cnp->cn_nameiop argument is LOOKUP, CREATE, RENAME, or DELETE depending * on whether the name is to be looked up, created, renamed, or deleted. * When CREATE, RENAME, or DELETE is specified, information usable in * creating, renaming, or deleting a directory entry may be calculated. * If flag has LOCKPARENT or'ed into it and the target of the pathname * exists, lookup returns both the target and its parent directory locked. * When creating or renaming and LOCKPARENT is specified, the target may * not be ".". When deleting and LOCKPARENT is specified, the target may * be "."., but the caller must check to ensure it does an vrele and vput * instead of two vputs. * * Overall outline of ext2_lookup: * * search for name in directory, to found or notfound * notfound: * if creating, return locked directory, leaving info on available slots * else return error * found: * if at end of path and deleting, return information to allow delete * if at end of path and rewriting (RENAME and LOCKPARENT), lock target * inode and return info to allow rewrite * if not at end, add name to cache; if at end and neither creating * nor deleting, add name to cache */ int ext2_lookup(struct vop_cachedlookup_args *ap) { return (ext2_lookup_ino(ap->a_dvp, ap->a_vpp, ap->a_cnp, NULL)); } static int ext2_lookup_ino(struct vnode *vdp, struct vnode **vpp, struct componentname *cnp, ino_t *dd_ino) { struct inode *dp; /* inode for directory being searched */ struct buf *bp; /* a buffer of directory entries */ struct ext2fs_direct_2 *ep; /* the current directory entry */ int entryoffsetinblock; /* offset of ep in bp's buffer */ struct ext2fs_searchslot ss; doff_t i_diroff; /* cached i_diroff value */ doff_t i_offset; /* cached i_offset value */ int numdirpasses; /* strategy for directory search */ doff_t endsearch; /* offset to end directory search */ doff_t prevoff; /* prev entry dp->i_offset */ struct vnode *pdp; /* saved dp during symlink work */ struct vnode *tdp; /* returned by VFS_VGET */ doff_t enduseful; /* pointer past last used dir slot */ u_long bmask; /* block offset mask */ int error; struct ucred *cred = cnp->cn_cred; int flags = cnp->cn_flags; int nameiop = cnp->cn_nameiop; ino_t ino, ino1; int ltype; int entry_found = 0; int DIRBLKSIZ = VTOI(vdp)->i_e2fs->e2fs_bsize; if (vpp != NULL) *vpp = NULL; dp = VTOI(vdp); bmask = VFSTOEXT2(vdp->v_mount)->um_mountp->mnt_stat.f_iosize - 1; restart: bp = NULL; ss.slotoffset = -1; /* * We now have a segment name to search for, and a directory to search. * * Suppress search for slots unless creating * file and at end of pathname, in which case * we watch for a place to put the new file in * case it doesn't already exist. */ i_diroff = dp->i_diroff; ss.slotstatus = FOUND; ss.slotfreespace = ss.slotsize = ss.slotneeded = 0; if ((nameiop == CREATE || nameiop == RENAME) && (flags & ISLASTCN)) { ss.slotstatus = NONE; ss.slotneeded = EXT2_DIR_REC_LEN(cnp->cn_namelen); /* was ss.slotneeded = (sizeof(struct direct) - MAXNAMLEN + cnp->cn_namelen + 3) &~ 3; */ } /* * Try to lookup dir entry using htree directory index. * * If we got an error or we want to find '.' or '..' entry, * we will fall back to linear search. 
*/ if (!ext2_is_dot_entry(cnp) && ext2_htree_has_idx(dp)) { numdirpasses = 1; entryoffsetinblock = 0; switch (ext2_htree_lookup(dp, cnp->cn_nameptr, cnp->cn_namelen, &bp, &entryoffsetinblock, &i_offset, &prevoff, &enduseful, &ss)) { case 0: ep = (struct ext2fs_direct_2 *)((char *)bp->b_data + (i_offset & bmask)); goto foundentry; case ENOENT: i_offset = roundup2(dp->i_size, DIRBLKSIZ); goto notfound; default: /* * Something failed; just fallback to do a linear * search. */ break; } } /* * If there is cached information on a previous search of * this directory, pick up where we last left off. * We cache only lookups as these are the most common * and have the greatest payoff. Caching CREATE has little * benefit as it usually must search the entire directory * to determine that the entry does not exist. Caching the * location of the last DELETE or RENAME has not reduced * profiling time and hence has been removed in the interest * of simplicity. */ if (nameiop != LOOKUP || i_diroff == 0 || i_diroff > dp->i_size) { entryoffsetinblock = 0; i_offset = 0; numdirpasses = 1; } else { i_offset = i_diroff; if ((entryoffsetinblock = i_offset & bmask) && (error = ext2_blkatoff(vdp, (off_t)i_offset, NULL, &bp))) return (error); numdirpasses = 2; nchstats.ncs_2passes++; } prevoff = i_offset; endsearch = roundup2(dp->i_size, DIRBLKSIZ); enduseful = 0; searchloop: while (i_offset < endsearch) { /* * If necessary, get the next directory block. */ if (bp != NULL) brelse(bp); error = ext2_blkatoff(vdp, (off_t)i_offset, NULL, &bp); if (error != 0) return (error); entryoffsetinblock = 0; /* * If still looking for a slot, and at a DIRBLKSIZE * boundary, have to start looking for free space again. */ if (ss.slotstatus == NONE && (entryoffsetinblock & (DIRBLKSIZ - 1)) == 0) { ss.slotoffset = -1; ss.slotfreespace = 0; } error = ext2_search_dirblock(dp, bp->b_data, &entry_found, cnp->cn_nameptr, cnp->cn_namelen, &entryoffsetinblock, &i_offset, &prevoff, &enduseful, &ss); if (error != 0) { brelse(bp); return (error); } if (entry_found) { ep = (struct ext2fs_direct_2 *)((char *)bp->b_data + (entryoffsetinblock & bmask)); foundentry: ino = ep->e2d_ino; goto found; } } notfound: /* * If we started in the middle of the directory and failed * to find our target, we must check the beginning as well. */ if (numdirpasses == 2) { numdirpasses--; i_offset = 0; endsearch = i_diroff; goto searchloop; } if (bp != NULL) brelse(bp); /* * If creating, and at end of pathname and current * directory has not been removed, then can consider * allowing file to be created. */ if ((nameiop == CREATE || nameiop == RENAME) && (flags & ISLASTCN) && dp->i_nlink != 0) { /* * Access for write is interpreted as allowing * creation of files in the directory. */ if ((error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_thread)) != 0) return (error); /* * Return an indication of where the new directory * entry should be put. If we didn't find a slot, * then set dp->i_count to 0 indicating * that the new slot belongs at the end of the * directory. If we found a slot, then the new entry * can be put in the range from dp->i_offset to * dp->i_offset + dp->i_count. 
*/ if (ss.slotstatus == NONE) { dp->i_offset = roundup2(dp->i_size, DIRBLKSIZ); dp->i_count = 0; enduseful = dp->i_offset; } else { dp->i_offset = ss.slotoffset; dp->i_count = ss.slotsize; if (enduseful < ss.slotoffset + ss.slotsize) enduseful = ss.slotoffset + ss.slotsize; } dp->i_endoff = roundup2(enduseful, DIRBLKSIZ); /* * We return with the directory locked, so that * the parameters we set up above will still be * valid if we actually decide to do a direnter(). * We return ni_vp == NULL to indicate that the entry * does not currently exist; we leave a pointer to * the (locked) directory inode in ndp->ni_dvp. * The pathname buffer is saved so that the name * can be obtained later. * * NB - if the directory is unlocked, then this * information cannot be used. */ cnp->cn_flags |= SAVENAME; return (EJUSTRETURN); } /* * Insert name into cache (as non-existent) if appropriate. */ if ((cnp->cn_flags & MAKEENTRY) != 0) cache_enter(vdp, NULL, cnp); return (ENOENT); found: if (dd_ino != NULL) *dd_ino = ino; if (numdirpasses == 2) nchstats.ncs_pass2++; /* * Check that directory length properly reflects presence * of this entry. */ if (entryoffsetinblock + EXT2_DIR_REC_LEN(ep->e2d_namlen) > dp->i_size) { ext2_dirbad(dp, i_offset, "i_size too small"); dp->i_size = entryoffsetinblock+EXT2_DIR_REC_LEN(ep->e2d_namlen); dp->i_flag |= IN_CHANGE | IN_UPDATE; } brelse(bp); /* * Found component in pathname. * If the final component of path name, save information * in the cache as to where the entry was found. */ if ((flags & ISLASTCN) && nameiop == LOOKUP) dp->i_diroff = i_offset &~ (DIRBLKSIZ - 1); /* * If deleting, and at end of pathname, return * parameters which can be used to remove file. */ if (nameiop == DELETE && (flags & ISLASTCN)) { if (flags & LOCKPARENT) ASSERT_VOP_ELOCKED(vdp, __FUNCTION__); /* * Write access to directory required to delete files. */ if ((error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_thread)) != 0) return (error); /* * Return pointer to current entry in dp->i_offset, * and distance past previous entry (if there * is a previous entry in this block) in dp->i_count. * Save directory inode pointer in ndp->ni_dvp for dirremove(). * * Technically we shouldn't be setting these in the * WANTPARENT case (first lookup in rename()), but any * lookups that will result in directory changes will * overwrite these. */ dp->i_offset = i_offset; if ((dp->i_offset & (DIRBLKSIZ - 1)) == 0) dp->i_count = 0; else dp->i_count = dp->i_offset - prevoff; if (dd_ino != NULL) return (0); if (dp->i_number == ino) { VREF(vdp); *vpp = vdp; return (0); } if ((error = VFS_VGET(vdp->v_mount, ino, LK_EXCLUSIVE, &tdp)) != 0) return (error); /* * If directory is "sticky", then user must own * the directory, or the file in it, else she * may not delete it (unless she's root). This * implements append-only directories. */ if ((dp->i_mode & ISVTX) && cred->cr_uid != 0 && cred->cr_uid != dp->i_uid && VTOI(tdp)->i_uid != cred->cr_uid) { vput(tdp); return (EPERM); } *vpp = tdp; return (0); } /* * If rewriting (RENAME), return the inode and the * information required to rewrite the present directory * Must get inode of directory entry to verify it's a * regular file, or empty directory. */ if (nameiop == RENAME && (flags & ISLASTCN)) { if ((error = VOP_ACCESS(vdp, VWRITE, cred, cnp->cn_thread)) != 0) return (error); /* * Careful about locking second inode. * This can only occur if the target is ".". 
*/ dp->i_offset = i_offset; if (dp->i_number == ino) return (EISDIR); if (dd_ino != NULL) return (0); if ((error = VFS_VGET(vdp->v_mount, ino, LK_EXCLUSIVE, &tdp)) != 0) return (error); *vpp = tdp; cnp->cn_flags |= SAVENAME; return (0); } if (dd_ino != NULL) return (0); /* * Step through the translation in the name. We do not `vput' the * directory because we may need it again if a symbolic link * is relative to the current directory. Instead we save it * unlocked as "pdp". We must get the target inode before unlocking * the directory to insure that the inode will not be removed * before we get it. We prevent deadlock by always fetching * inodes from the root, moving down the directory tree. Thus * when following backward pointers ".." we must unlock the * parent directory before getting the requested directory. * There is a potential race condition here if both the current * and parent directories are removed before the VFS_VGET for the * inode associated with ".." returns. We hope that this occurs * infrequently since we cannot avoid this race condition without * implementing a sophisticated deadlock detection algorithm. * Note also that this simple deadlock detection scheme will not * work if the file system has any hard links other than ".." * that point backwards in the directory structure. */ pdp = vdp; if (flags & ISDOTDOT) { error = vn_vget_ino(pdp, ino, cnp->cn_lkflags, &tdp); if (pdp->v_iflag & VI_DOOMED) { if (error == 0) vput(tdp); error = ENOENT; } if (error) return (error); /* * Recheck that ".." entry in the vdp directory points * to the inode we looked up before vdp lock was * dropped. */ error = ext2_lookup_ino(pdp, NULL, cnp, &ino1); if (error) { vput(tdp); return (error); } if (ino1 != ino) { vput(tdp); goto restart; } *vpp = tdp; } else if (dp->i_number == ino) { VREF(vdp); /* we want ourself, ie "." */ /* * When we lookup "." we still can be asked to lock it * differently. */ ltype = cnp->cn_lkflags & LK_TYPE_MASK; if (ltype != VOP_ISLOCKED(vdp)) { if (ltype == LK_EXCLUSIVE) vn_lock(vdp, LK_UPGRADE | LK_RETRY); else /* if (ltype == LK_SHARED) */ vn_lock(vdp, LK_DOWNGRADE | LK_RETRY); } *vpp = vdp; } else { if ((error = VFS_VGET(vdp->v_mount, ino, cnp->cn_lkflags, &tdp)) != 0) return (error); *vpp = tdp; } /* * Insert name into cache if appropriate. */ if (cnp->cn_flags & MAKEENTRY) cache_enter(vdp, *vpp, cnp); return (0); } int ext2_search_dirblock(struct inode *ip, void *data, int *foundp, const char *name, int namelen, int *entryoffsetinblockp, doff_t *offp, doff_t *prevoffp, doff_t *endusefulp, struct ext2fs_searchslot *ssp) { struct vnode *vdp; struct ext2fs_direct_2 *ep, *top; uint32_t bsize = ip->i_e2fs->e2fs_bsize; int offset = *entryoffsetinblockp; int namlen; vdp = ITOV(ip); ep = (struct ext2fs_direct_2 *)((char *)data + offset); top = (struct ext2fs_direct_2 *)((char *)data + bsize - EXT2_DIR_REC_LEN(0)); while (ep < top) { /* * Full validation checks are slow, so we only check * enough to insure forward progress through the * directory. Complete checks can be run by setting * "vfs.e2fs.dirchk" to be true. */ if (ep->e2d_reclen == 0 || (dirchk && ext2_dirbadentry(vdp, ep, offset))) { int i; ext2_dirbad(ip, *offp, "mangled entry"); i = bsize - (offset & (bsize - 1)); *offp += i; offset += i; continue; } /* * If an appropriate sized slot has not yet been found, * check to see if one is available. Also accumulate space * in the current block so that we can determine if * compaction is viable. 
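 *
 * Aside (illustrative, not from this commit): the FOUND/COMPACT
 * bookkeeping that follows can be sketched over a flat array of
 * records.  struct drec and the names are stand-ins, not the on-disk
 * layout:
 */
#if 0	/* sketch only; hypothetical types */
#include <stddef.h>
#include <stdint.h>

enum slotstatus { S_NONE, S_COMPACT, S_FOUND };

struct drec {
	uint32_t ino;		/* 0 means the record is free */
	uint16_t reclen;	/* space the record occupies */
	uint16_t needlen;	/* space its name actually needs */
};

static enum slotstatus
find_slot(const struct drec *e, size_t n, int needed,
    size_t *sloto, int *slots)
{
	size_t off = 0, first = (size_t)-1;
	int freespace = 0;

	for (size_t i = 0; i < n; off += e[i].reclen, i++) {
		/* Slack: the whole record if free, else its tail padding. */
		int slack = e[i].reclen - (e[i].ino ? e[i].needlen : 0);

		if (slack >= needed) {		/* one record suffices */
			*sloto = off;
			*slots = e[i].reclen;
			return (S_FOUND);
		}
		if (slack > 0) {		/* accumulate toward compaction */
			if (first == (size_t)-1)
				first = off;
			freespace += slack;
			if (freespace >= needed) {
				*sloto = first;
				*slots = (int)(off + e[i].reclen - first);
				return (S_COMPACT);
			}
		}
	}
	return (S_NONE);
}
#endif
/*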
*/ if (ssp->slotstatus != FOUND) { int size = ep->e2d_reclen; if (ep->e2d_ino != 0) size -= EXT2_DIR_REC_LEN(ep->e2d_namlen); if (size > 0) { if (size >= ssp->slotneeded) { ssp->slotstatus = FOUND; ssp->slotoffset = *offp; ssp->slotsize = ep->e2d_reclen; } else if (ssp->slotstatus == NONE) { ssp->slotfreespace += size; if (ssp->slotoffset == -1) ssp->slotoffset = *offp; if (ssp->slotfreespace >= ssp->slotneeded) { ssp->slotstatus = COMPACT; ssp->slotsize = *offp + ep->e2d_reclen - ssp->slotoffset; } } } } /* * Check for a name match. */ if (ep->e2d_ino) { namlen = ep->e2d_namlen; if (namlen == namelen && !bcmp(name, ep->e2d_name, (unsigned)namlen)) { /* * Save directory entry's inode number and * reclen in ndp->ni_ufs area, and release * directory buffer. */ *foundp = 1; return (0); } } *prevoffp = *offp; *offp += ep->e2d_reclen; offset += ep->e2d_reclen; *entryoffsetinblockp = offset; if (ep->e2d_ino) *endusefulp = *offp; /* * Get pointer to the next entry. */ ep = (struct ext2fs_direct_2 *)((char *)data + offset); } return (0); } void ext2_dirbad(struct inode *ip, doff_t offset, char *how) { struct mount *mp; mp = ITOV(ip)->v_mount; if ((mp->mnt_flag & MNT_RDONLY) == 0) panic("ext2_dirbad: %s: bad dir ino %ju at offset %ld: %s\n", mp->mnt_stat.f_mntonname, (uintmax_t)ip->i_number, (long)offset, how); else (void)printf("%s: bad dir ino %ju at offset %ld: %s\n", mp->mnt_stat.f_mntonname, (uintmax_t)ip->i_number, (long)offset, how); } /* * Do consistency checking on a directory entry: * record length must be multiple of 4 * entry must fit in rest of its DIRBLKSIZ block * record must be large enough to contain entry * name is not longer than MAXNAMLEN * name must be as long as advertised, and null terminated */ /* * changed so that it confirms to ext2_check_dir_entry */ static int ext2_dirbadentry(struct vnode *dp, struct ext2fs_direct_2 *de, int entryoffsetinblock) { int DIRBLKSIZ = VTOI(dp)->i_e2fs->e2fs_bsize; char * error_msg = NULL; if (de->e2d_reclen < EXT2_DIR_REC_LEN(1)) error_msg = "rec_len is smaller than minimal"; else if (de->e2d_reclen % 4 != 0) error_msg = "rec_len % 4 != 0"; else if (de->e2d_reclen < EXT2_DIR_REC_LEN(de->e2d_namlen)) error_msg = "reclen is too small for name_len"; else if (entryoffsetinblock + de->e2d_reclen > DIRBLKSIZ) error_msg = "directory entry across blocks"; /* else LATER if (de->inode > dir->i_sb->u.ext2_sb.s_es->s_inodes_count) error_msg = "inode out of bounds"; */ if (error_msg != NULL) { printf("bad directory entry: %s\n", error_msg); printf("offset=%d, inode=%lu, rec_len=%u, name_len=%u\n", entryoffsetinblock, (unsigned long)de->e2d_ino, de->e2d_reclen, de->e2d_namlen); } return error_msg == NULL ? 0 : 1; } /* * Write a directory entry after a call to namei, using the parameters * that it left in nameidata. The argument ip is the inode which the new * directory entry will refer to. Dvp is a pointer to the directory to * be written, which was left locked by namei. Remaining parameters * (dp->i_offset, dp->i_count) indicate how the space for the new * entry is to be obtained. 
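 *
 * Aside (illustrative, not from this commit): the validity checks in
 * ext2_dirbadentry() above rest on the record-length arithmetic: an
 * 8-byte header (inode, rec_len, name_len, file_type) plus the name,
 * rounded up to a 4-byte boundary.  DIR_REC_LEN below mirrors how
 * EXT2_DIR_REC_LEN is conventionally defined; the asserts are a
 * worked example:
 */
#if 0	/* sketch only */
#include <assert.h>

#define DIR_REC_LEN(namelen)	(((namelen) + 8 + 3) & ~3)

int
main(void)
{
	assert(DIR_REC_LEN(1) == 12);	/* "." fits the minimum record */
	assert(DIR_REC_LEN(2) == 12);	/* so does ".." */
	assert(DIR_REC_LEN(5) == 16);	/* next 4-byte step */
	assert(DIR_REC_LEN(255) == 264);	/* longest ext2 name */
	return (0);
}
#endif
/*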
*/ int ext2_direnter(struct inode *ip, struct vnode *dvp, struct componentname *cnp) { struct inode *dp; struct ext2fs_direct_2 newdir; struct iovec aiov; struct uio auio; int error, newentrysize; int DIRBLKSIZ = ip->i_e2fs->e2fs_bsize; #ifdef INVARIANTS if ((cnp->cn_flags & SAVENAME) == 0) panic("ext2_direnter: missing name"); #endif dp = VTOI(dvp); newdir.e2d_ino = ip->i_number; newdir.e2d_namlen = cnp->cn_namelen; if (EXT2_HAS_INCOMPAT_FEATURE(ip->i_e2fs, EXT2F_INCOMPAT_FTYPE)) newdir.e2d_type = DTTOFT(IFTODT(ip->i_mode)); else newdir.e2d_type = EXT2_FT_UNKNOWN; bcopy(cnp->cn_nameptr, newdir.e2d_name, (unsigned)cnp->cn_namelen + 1); newentrysize = EXT2_DIR_REC_LEN(newdir.e2d_namlen); if (ext2_htree_has_idx(dp)) { error = ext2_htree_add_entry(dvp, &newdir, cnp); if (error) { - dp->i_flag &= ~IN_E4INDEX; + dp->i_flag &= ~IN_E3INDEX; dp->i_flag |= IN_CHANGE | IN_UPDATE; } return (error); } if (EXT2_HAS_COMPAT_FEATURE(ip->i_e2fs, EXT2F_COMPAT_DIRHASHINDEX) && !ext2_htree_has_idx(dp)) { if ((dp->i_size / DIRBLKSIZ) == 1 && dp->i_offset == DIRBLKSIZ) { /* * Making indexed directory when one block is not * enough to save all entries. */ return ext2_htree_create_index(dvp, cnp, &newdir); } } if (dp->i_count == 0) { /* * If dp->i_count is 0, then namei could find no * space in the directory. Here, dp->i_offset will * be on a directory block boundary and we will write the * new entry into a fresh block. */ if (dp->i_offset & (DIRBLKSIZ - 1)) panic("ext2_direnter: newblk"); auio.uio_offset = dp->i_offset; newdir.e2d_reclen = DIRBLKSIZ; auio.uio_resid = newentrysize; aiov.iov_len = newentrysize; aiov.iov_base = (caddr_t)&newdir; auio.uio_iov = &aiov; auio.uio_iovcnt = 1; auio.uio_rw = UIO_WRITE; auio.uio_segflg = UIO_SYSSPACE; auio.uio_td = (struct thread *)0; error = VOP_WRITE(dvp, &auio, IO_SYNC, cnp->cn_cred); if (DIRBLKSIZ > VFSTOEXT2(dvp->v_mount)->um_mountp->mnt_stat.f_bsize) /* XXX should grow with balloc() */ panic("ext2_direnter: frag size"); else if (!error) { dp->i_size = roundup2(dp->i_size, DIRBLKSIZ); dp->i_flag |= IN_CHANGE; } return (error); } error = ext2_add_entry(dvp, &newdir); if (!error && dp->i_endoff && dp->i_endoff < dp->i_size) error = ext2_truncate(dvp, (off_t)dp->i_endoff, IO_SYNC, cnp->cn_cred, cnp->cn_thread); return (error); } /* * Insert an entry into the directory block. * Compact the contents. */ int ext2_add_entry(struct vnode *dvp, struct ext2fs_direct_2 *entry) { struct ext2fs_direct_2 *ep, *nep; struct inode *dp; struct buf *bp; u_int dsize; int error, loc, newentrysize, spacefree; char *dirbuf; dp = VTOI(dvp); /* * If dp->i_count is non-zero, then namei found space * for the new entry in the range dp->i_offset to * dp->i_offset + dp->i_count in the directory. * To use this space, we may have to compact the entries located * there, by copying them together towards the beginning of the * block, leaving the free space in one usable chunk at the end. */ /* * Increase size of directory if entry eats into new space. * This should never push the size past a new multiple of * DIRBLKSIZE. * * N.B. - THIS IS AN ARTIFACT OF 4.2 AND SHOULD NEVER HAPPEN. */ if (dp->i_offset + dp->i_count > dp->i_size) dp->i_size = dp->i_offset + dp->i_count; /* * Get the block containing the space for the new directory entry. */ if ((error = ext2_blkatoff(dvp, (off_t)dp->i_offset, &dirbuf, &bp)) != 0) return (error); /* * Find space for the new entry. In the simple case, the entry at * offset base will have the space. 
If it does not, then namei * arranged that compacting the region dp->i_offset to * dp->i_offset + dp->i_count would yield the * space. */ newentrysize = EXT2_DIR_REC_LEN(entry->e2d_namlen); ep = (struct ext2fs_direct_2 *)dirbuf; dsize = EXT2_DIR_REC_LEN(ep->e2d_namlen); spacefree = ep->e2d_reclen - dsize; for (loc = ep->e2d_reclen; loc < dp->i_count; ) { nep = (struct ext2fs_direct_2 *)(dirbuf + loc); if (ep->e2d_ino) { /* trim the existing slot */ ep->e2d_reclen = dsize; ep = (struct ext2fs_direct_2 *)((char *)ep + dsize); } else { /* overwrite; nothing there; header is ours */ spacefree += dsize; } dsize = EXT2_DIR_REC_LEN(nep->e2d_namlen); spacefree += nep->e2d_reclen - dsize; loc += nep->e2d_reclen; bcopy((caddr_t)nep, (caddr_t)ep, dsize); } /* * Update the pointer fields in the previous entry (if any), * copy in the new entry, and write out the block. */ if (ep->e2d_ino == 0) { if (spacefree + dsize < newentrysize) panic("ext2_direnter: compact1"); entry->e2d_reclen = spacefree + dsize; } else { if (spacefree < newentrysize) panic("ext2_direnter: compact2"); entry->e2d_reclen = spacefree; ep->e2d_reclen = dsize; ep = (struct ext2fs_direct_2 *)((char *)ep + dsize); } bcopy((caddr_t)entry, (caddr_t)ep, (u_int)newentrysize); if (DOINGASYNC(dvp)) { bdwrite(bp); error = 0; } else { error = bwrite(bp); } dp->i_flag |= IN_CHANGE | IN_UPDATE; return (error); } /* * Remove a directory entry after a call to namei, using * the parameters which it left in nameidata. The entry * dp->i_offset contains the offset into the directory of the * entry to be eliminated. The dp->i_count field contains the * size of the previous record in the directory. If this * is 0, the first entry is being deleted, so we need only * zero the inode number to mark the entry as free. If the * entry is not the first in the directory, we must reclaim * the space of the now empty record by adding the record size * to the size of the previous entry. */ int ext2_dirremove(struct vnode *dvp, struct componentname *cnp) { struct inode *dp; struct ext2fs_direct_2 *ep, *rep; struct buf *bp; int error; dp = VTOI(dvp); if (dp->i_count == 0) { /* * First entry in block: set d_ino to zero. */ if ((error = ext2_blkatoff(dvp, (off_t)dp->i_offset, (char **)&ep, &bp)) != 0) return (error); ep->e2d_ino = 0; error = bwrite(bp); dp->i_flag |= IN_CHANGE | IN_UPDATE; return (error); } /* * Collapse new free space into previous entry. */ if ((error = ext2_blkatoff(dvp, (off_t)(dp->i_offset - dp->i_count), (char **)&ep, &bp)) != 0) return (error); /* Set 'rep' to the entry being removed. */ if (dp->i_count == 0) rep = ep; else rep = (struct ext2fs_direct_2 *)((char *)ep + ep->e2d_reclen); ep->e2d_reclen += rep->e2d_reclen; if (DOINGASYNC(dvp) && dp->i_count != 0) bdwrite(bp); else error = bwrite(bp); dp->i_flag |= IN_CHANGE | IN_UPDATE; return (error); } /* * Rewrite an existing directory entry to point at the inode * supplied. The parameters describing the directory entry are * set up by a call to namei. 
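 *
 * Aside (illustrative, not from this commit): the two deletion cases
 * in ext2_dirremove() above, sketched on a raw block.  struct rec is
 * a stub with just the fields that matter, not the on-disk layout:
 */
#if 0	/* sketch only; hypothetical types */
#include <stdint.h>

struct rec {
	uint32_t ino;
	uint16_t reclen;
};

static void
remove_rec(char *block, unsigned off, unsigned prevlen)
{
	struct rec *victim = (struct rec *)(block + off);

	if (prevlen == 0) {
		/* First record of its block: just mark it free. */
		victim->ino = 0;
	} else {
		/* The previous record absorbs the victim's space. */
		struct rec *prev = (struct rec *)(block + off - prevlen);

		prev->reclen += victim->reclen;
	}
}
#endif
/*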
*/ int ext2_dirrewrite(struct inode *dp, struct inode *ip, struct componentname *cnp) { struct buf *bp; struct ext2fs_direct_2 *ep; struct vnode *vdp = ITOV(dp); int error; if ((error = ext2_blkatoff(vdp, (off_t)dp->i_offset, (char **)&ep, &bp)) != 0) return (error); ep->e2d_ino = ip->i_number; if (EXT2_HAS_INCOMPAT_FEATURE(ip->i_e2fs, EXT2F_INCOMPAT_FTYPE)) ep->e2d_type = DTTOFT(IFTODT(ip->i_mode)); else ep->e2d_type = EXT2_FT_UNKNOWN; error = bwrite(bp); dp->i_flag |= IN_CHANGE | IN_UPDATE; return (error); } /* * Check if a directory is empty or not. * Inode supplied must be locked. * * Using a struct dirtemplate here is not precisely * what we want, but better than using a struct direct. * * NB: does not handle corrupted directories. */ int ext2_dirempty(struct inode *ip, ino_t parentino, struct ucred *cred) { off_t off; struct dirtemplate dbuf; struct ext2fs_direct_2 *dp = (struct ext2fs_direct_2 *)&dbuf; int error, namlen; ssize_t count; #define MINDIRSIZ (sizeof(struct dirtemplate) / 2) for (off = 0; off < ip->i_size; off += dp->e2d_reclen) { error = vn_rdwr(UIO_READ, ITOV(ip), (caddr_t)dp, MINDIRSIZ, off, UIO_SYSSPACE, IO_NODELOCKED | IO_NOMACCHECK, cred, NOCRED, &count, (struct thread *)0); /* * Since we read MINDIRSIZ, residual must * be 0 unless we're at end of file. */ if (error || count != 0) return (0); /* avoid infinite loops */ if (dp->e2d_reclen == 0) return (0); /* skip empty entries */ if (dp->e2d_ino == 0) continue; /* accept only "." and ".." */ namlen = dp->e2d_namlen; if (namlen > 2) return (0); if (dp->e2d_name[0] != '.') return (0); /* * At this point namlen must be 1 or 2. * 1 implies ".", 2 implies ".." if second * char is also "." */ if (namlen == 1) continue; if (dp->e2d_name[1] == '.' && dp->e2d_ino == parentino) continue; return (0); } return (1); } /* * Check if source directory is in the path of the target directory. * Target is supplied locked, source is unlocked. * The target is always vput before returning. */ int ext2_checkpath(struct inode *source, struct inode *target, struct ucred *cred) { struct vnode *vp; int error, namlen; struct dirtemplate dirbuf; vp = ITOV(target); if (target->i_number == source->i_number) { error = EEXIST; goto out; } if (target->i_number == EXT2_ROOTINO) { error = 0; goto out; } for (;;) { if (vp->v_type != VDIR) { error = ENOTDIR; break; } error = vn_rdwr(UIO_READ, vp, (caddr_t)&dirbuf, sizeof(struct dirtemplate), (off_t)0, UIO_SYSSPACE, IO_NODELOCKED | IO_NOMACCHECK, cred, NOCRED, NULL, NULL); if (error != 0) break; namlen = dirbuf.dotdot_type; /* like ufs little-endian */ if (namlen != 2 || dirbuf.dotdot_name[0] != '.' || dirbuf.dotdot_name[1] != '.') { error = ENOTDIR; break; } if (dirbuf.dotdot_ino == source->i_number) { error = EINVAL; break; } if (dirbuf.dotdot_ino == EXT2_ROOTINO) break; vput(vp); if ((error = VFS_VGET(vp->v_mount, dirbuf.dotdot_ino, LK_EXCLUSIVE, &vp)) != 0) { vp = NULL; break; } } out: if (error == ENOTDIR) printf("checkpath: .. not a directory\n"); if (vp != NULL) vput(vp); return (error); } Index: projects/clang380-import/sys/fs/ext2fs/inode.h =================================================================== --- projects/clang380-import/sys/fs/ext2fs/inode.h (revision 294776) +++ projects/clang380-import/sys/fs/ext2fs/inode.h (revision 294777) @@ -1,188 +1,188 @@ /*- * Copyright (c) 1982, 1989, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. 
* All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)inode.h 8.9 (Berkeley) 5/14/95 * $FreeBSD$ */ #ifndef _FS_EXT2FS_INODE_H_ #define _FS_EXT2FS_INODE_H_ #include #include #include #include #include /* * This must agree with the definition in . */ #define doff_t int32_t #define NDADDR 12 /* Direct addresses in inode. */ #define NIADDR 3 /* Indirect addresses in inode. */ /* * The size of physical and logical block numbers in EXT2FS. */ typedef uint32_t e2fs_daddr_t; typedef int64_t e2fs_lbn_t; typedef int64_t e4fs_daddr_t; /* * The inode is used to describe each active (or recently active) file in the * EXT2FS filesystem. It is composed of two types of information. The first * part is the information that is needed only while the file is active (such * as the identity of the file and linkage to speed its lookup). The second * part is the permanent meta-data associated with the file which is read in * from the permanent dinode from long term storage when the file becomes * active, and is put back when the file is no longer being used. */ struct inode { struct vnode *i_vnode;/* Vnode associated with this inode. */ struct ext2mount *i_ump; uint32_t i_flag; /* flags, see below */ ino_t i_number; /* The identity of the inode. */ struct m_ext2fs *i_e2fs; /* EXT2FS */ u_quad_t i_modrev; /* Revision level for NFS lease. */ /* * Side effects; used during directory lookup. */ int32_t i_count; /* Size of free slot in directory. */ doff_t i_endoff; /* End of useful stuff in directory. */ doff_t i_diroff; /* Offset in dir, where we found last entry. */ doff_t i_offset; /* Offset of free space in directory. */ uint32_t i_block_group; uint32_t i_next_alloc_block; uint32_t i_next_alloc_goal; /* Fields from struct dinode in UFS. */ uint16_t i_mode; /* IFMT, permissions; see below. */ int16_t i_nlink; /* File link count. */ uint32_t i_uid; /* File owner. 
*/ uint32_t i_gid; /* File group. */ uint64_t i_size; /* File byte count. */ uint64_t i_blocks; /* Blocks actually held. */ int32_t i_atime; /* Last access time. */ int32_t i_mtime; /* Last modified time. */ int32_t i_ctime; /* Last inode change time. */ int32_t i_birthtime; /* Inode creation time. */ int32_t i_mtimensec; /* Last modified time. */ int32_t i_atimensec; /* Last access time. */ int32_t i_ctimensec; /* Last inode change time. */ int32_t i_birthnsec; /* Inode creation time. */ uint32_t i_gen; /* Generation number. */ uint32_t i_flags; /* Status flags (chflags). */ uint32_t i_db[NDADDR]; /* Direct disk blocks. */ uint32_t i_ib[NIADDR]; /* Indirect disk blocks. */ struct ext4_extent_cache i_ext_cache; /* cache for ext4 extent */ }; /* * The di_db fields may be overlaid with other information for * file types that do not have associated disk storage. Block * and character devices overlay the first data block with their * dev_t value. Short symbolic links place their path in the * di_db area. */ #define i_shortlink i_db #define i_rdev i_db[0] /* File permissions. */ #define IEXEC 0000100 /* Executable. */ #define IWRITE 0000200 /* Writeable. */ #define IREAD 0000400 /* Readable. */ #define ISVTX 0001000 /* Sticky bit. */ #define ISGID 0002000 /* Set-gid. */ #define ISUID 0004000 /* Set-uid. */ /* File types. */ #define IFMT 0170000 /* Mask of file type. */ #define IFIFO 0010000 /* Named pipe (fifo). */ #define IFCHR 0020000 /* Character device. */ #define IFDIR 0040000 /* Directory file. */ #define IFBLK 0060000 /* Block device. */ #define IFREG 0100000 /* Regular file. */ #define IFLNK 0120000 /* Symbolic link. */ #define IFSOCK 0140000 /* UNIX domain socket. */ #define IFWHT 0160000 /* Whiteout. */ /* These flags are kept in i_flag. */ #define IN_ACCESS 0x0001 /* Access time update request. */ #define IN_CHANGE 0x0002 /* Inode change time update request. */ #define IN_UPDATE 0x0004 /* Modification time update request. */ #define IN_MODIFIED 0x0008 /* Inode has been modified. */ #define IN_RENAME 0x0010 /* Inode is being renamed. */ #define IN_HASHED 0x0020 /* Inode is on hash list */ #define IN_LAZYMOD 0x0040 /* Modified, but don't write yet. */ #define IN_SPACECOUNTED 0x0080 /* Blocks to be freed in free count. */ #define IN_LAZYACCESS 0x0100 /* Process IN_ACCESS after the suspension finished */ /* * These are translation flags for some attributes that Ext4 * passes as inode flags but that we cannot pass directly. */ -#define IN_E4INDEX 0x010000 +#define IN_E3INDEX 0x010000 #define IN_E4EXTENTS 0x020000 #define i_devvp i_ump->um_devvp #ifdef _KERNEL /* * Structure used to pass around logical block paths generated by * ext2_getlbns and used by truncate and bmap code. */ struct indir { e2fs_lbn_t in_lbn; /* Logical block number. */ int in_off; /* Offset in buffer. */ }; /* Convert between inode pointers and vnode pointers. */ #define VTOI(vp) ((struct inode *)(vp)->v_data) #define ITOV(ip) ((ip)->i_vnode) /* This overlays the fid structure (see mount.h). */ struct ufid { uint16_t ufid_len; /* Length of structure. */ uint16_t ufid_pad; /* Force 32-bit alignment. */ ino_t ufid_ino; /* File number (ino). */ uint32_t ufid_gen; /* Generation number. 
*/ }; #endif /* _KERNEL */ #endif /* !_FS_EXT2FS_INODE_H_ */ Index: projects/clang380-import/sys/geom/geom_flashmap.c =================================================================== --- projects/clang380-import/sys/geom/geom_flashmap.c (revision 294776) +++ projects/clang380-import/sys/geom/geom_flashmap.c (revision 294777) @@ -1,268 +1,272 @@ /*- * Copyright (c) 2012 Semihalf * Copyright (c) 2009 Jakub Klama * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define FLASHMAP_CLASS_NAME "Flashmap" struct g_flashmap_slice { off_t sl_start; off_t sl_end; const char *sl_name; STAILQ_ENTRY(g_flashmap_slice) sl_link; }; STAILQ_HEAD(g_flashmap_head, g_flashmap_slice); static void g_flashmap_print(struct g_flashmap_slice *); static int g_flashmap_modify(struct g_geom *, const char *, int, struct g_flashmap_head *); static int g_flashmap_start(struct bio *); static int g_flashmap_ioctl(struct g_provider *, u_long, void *, int, struct thread *); static void g_flashmap_dumpconf(struct sbuf *, const char *, struct g_geom *, struct g_consumer *, struct g_provider *); static struct g_geom *g_flashmap_taste(struct g_class *, struct g_provider *, int); static void g_flashmap_config(struct gctl_req *, struct g_class *, const char *); static int g_flashmap_load(device_t, struct g_flashmap_head *); static int (*flash_fill_slices)(device_t, struct flash_slice *, int *) = fdt_flash_fill_slices; MALLOC_DECLARE(M_FLASHMAP); MALLOC_DEFINE(M_FLASHMAP, "geom_flashmap", "GEOM flash memory slicer class"); static void g_flashmap_print(struct g_flashmap_slice *slice) { printf("%08jx-%08jx: %s (%juKB)\n", (uintmax_t)slice->sl_start, (uintmax_t)slice->sl_end, slice->sl_name, (uintmax_t)(slice->sl_end - slice->sl_start) / 1024); } static int g_flashmap_modify(struct g_geom *gp, const char *devname, int secsize, struct g_flashmap_head *slices) { struct g_flashmap_slice *slice; int i, error; g_topology_assert(); i = 0; STAILQ_FOREACH(slice, slices, sl_link) { if (bootverbose) { printf("%s: slice ", devname); g_flashmap_print(slice); } error = g_slice_config(gp, i++, G_SLICE_CONFIG_CHECK, slice->sl_start, slice->sl_end - slice->sl_start + 1, 
secsize, "%ss.%s", gp->name, slice->sl_name); if (error) return (error); } i = 0; STAILQ_FOREACH(slice, slices, sl_link) { error = g_slice_config(gp, i++, G_SLICE_CONFIG_SET, slice->sl_start, slice->sl_end - slice->sl_start + 1, secsize, "%ss.%s", gp->name, slice->sl_name); if (error) return (error); } return (0); } static int g_flashmap_start(struct bio *bp) { return (0); } static void g_flashmap_dumpconf(struct sbuf *sb, const char *indent, struct g_geom *gp, struct g_consumer *cp __unused, struct g_provider *pp) { struct g_slicer *gsp; gsp = gp->softc; g_slice_dumpconf(sb, indent, gp, cp, pp); } static int g_flashmap_ioctl(struct g_provider *pp, u_long cmd, void *data, int fflag, struct thread *td) { struct g_consumer *cp; struct g_geom *gp; if (cmd != NAND_IO_GET_CHIP_PARAM) return (ENOIOCTL); cp = LIST_FIRST(&pp->geom->consumer); if (cp == NULL) return (ENOIOCTL); gp = cp->provider->geom; if (gp->ioctl == NULL) return (ENOIOCTL); return (gp->ioctl(cp->provider, cmd, data, fflag, td)); } static struct g_geom * g_flashmap_taste(struct g_class *mp, struct g_provider *pp, int flags) { struct g_geom *gp = NULL; struct g_consumer *cp; struct g_flashmap_head head; struct g_flashmap_slice *slice, *slice_temp; device_t dev; int nslices, size; g_trace(G_T_TOPOLOGY, "flashmap_taste(%s,%s)", mp->name, pp->name); g_topology_assert(); if (flags == G_TF_NORMAL && strcmp(pp->geom->class->name, G_DISK_CLASS_NAME) != 0) return (NULL); gp = g_slice_new(mp, FLASH_SLICES_MAX_NUM, pp, &cp, NULL, 0, g_flashmap_start); if (gp == NULL) return (NULL); STAILQ_INIT(&head); do { size = sizeof(device_t); if (g_io_getattr("NAND::device", cp, &size, &dev)) { size = sizeof(device_t); - if (g_io_getattr("CFI::device", cp, &size, &dev)) - break; + if (g_io_getattr("CFI::device", cp, &size, &dev)) { + size = sizeof(device_t); + if (g_io_getattr("SPI::device", cp, &size, + &dev)) + break; + } } nslices = g_flashmap_load(dev, &head); if (nslices == 0) break; g_flashmap_modify(gp, cp->provider->name, cp->provider->sectorsize, &head); } while (0); g_access(cp, -1, 0, 0); STAILQ_FOREACH_SAFE(slice, &head, sl_link, slice_temp) { free(slice, M_FLASHMAP); } if (LIST_EMPTY(&gp->provider)) { g_slice_spoiled(cp); return (NULL); } return (gp); } static void g_flashmap_config(struct gctl_req *req, struct g_class *mp, const char *verb) { gctl_error(req, "unknown config verb"); } static int g_flashmap_load(device_t dev, struct g_flashmap_head *head) { struct flash_slice *slices; struct g_flashmap_slice *slice; uint32_t i, buf_size; int nslices = 0; buf_size = sizeof(struct flash_slice) * FLASH_SLICES_MAX_NUM; slices = malloc(buf_size, M_FLASHMAP, M_WAITOK | M_ZERO); if (flash_fill_slices && flash_fill_slices(dev, slices, &nslices) == 0) { for (i = 0; i < nslices; i++) { slice = malloc(sizeof(struct g_flashmap_slice), M_FLASHMAP, M_WAITOK); slice->sl_name = slices[i].label; slice->sl_start = slices[i].base; slice->sl_end = slices[i].base + slices[i].size - 1; STAILQ_INSERT_TAIL(head, slice, sl_link); } } free(slices, M_FLASHMAP); return (nslices); } void flash_register_slicer(int (*slicer)(device_t, struct flash_slice *, int *)) { flash_fill_slices = slicer; } static struct g_class g_flashmap_class = { .name = FLASHMAP_CLASS_NAME, .version = G_VERSION, .taste = g_flashmap_taste, .dumpconf = g_flashmap_dumpconf, .ioctl = g_flashmap_ioctl, .ctlreq = g_flashmap_config, }; DECLARE_GEOM_CLASS(g_flashmap_class, g_flashmap); Index: projects/clang380-import/sys/kern/kern_sysctl.c 
=================================================================== --- projects/clang380-import/sys/kern/kern_sysctl.c (revision 294776) +++ projects/clang380-import/sys/kern/kern_sysctl.c (revision 294777) @@ -1,1995 +1,1995 @@ /*- * Copyright (c) 1982, 1986, 1989, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Mike Karels at Berkeley Software Design, Inc. * * Quite extensively rewritten by Poul-Henning Kamp of the FreeBSD * project, to make these variables more userfriendly. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)kern_sysctl.c 8.4 (Berkeley) 4/14/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_capsicum.h" #include "opt_compat.h" #include "opt_ktrace.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef KTRACE #include #endif #include #include #include #include static MALLOC_DEFINE(M_SYSCTL, "sysctl", "sysctl internal magic"); static MALLOC_DEFINE(M_SYSCTLOID, "sysctloid", "sysctl dynamic oids"); static MALLOC_DEFINE(M_SYSCTLTMP, "sysctltmp", "sysctl temp output buffer"); /* * The sysctllock protects the MIB tree. It also protects sysctl * contexts used with dynamic sysctls. The sysctl_register_oid() and * sysctl_unregister_oid() routines require the sysctllock to already * be held, so the sysctl_wlock() and sysctl_wunlock() routines are * provided for the few places in the kernel which need to use that * API rather than using the dynamic API. Use of the dynamic API is * strongly encouraged for most code. * * The sysctlmemlock is used to limit the amount of user memory wired for * sysctl requests. This is implemented by serializing any userland * sysctl requests larger than a single page via an exclusive lock. 
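 *
 * Aside (illustrative, not from this commit): a userland analogue of
 * the scheme just described, with pthread primitives standing in for
 * rm(9) and sx(9):
 */
#if 0	/* sketch only; pthread analogue, not the kernel locks */
#include <pthread.h>

static pthread_rwlock_t tree_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t mem_lock = PTHREAD_MUTEX_INITIALIZER;

static void
lookup(void (*walk)(void))
{
	pthread_rwlock_rdlock(&tree_lock);	/* ~ SYSCTL_RLOCK() */
	walk();					/* many readers in parallel */
	pthread_rwlock_unlock(&tree_lock);	/* ~ SYSCTL_RUNLOCK() */
}

static void
oversized_request(void (*handle)(void))
{
	/* ~ sysctlmemlock: one page-exceeding request at a time. */
	pthread_mutex_lock(&mem_lock);
	handle();
	pthread_mutex_unlock(&mem_lock);
}
#endif
/*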
*/ static struct rmlock sysctllock; static struct sx sysctlmemlock; #define SYSCTL_WLOCK() rm_wlock(&sysctllock) #define SYSCTL_WUNLOCK() rm_wunlock(&sysctllock) #define SYSCTL_RLOCK(tracker) rm_rlock(&sysctllock, (tracker)) #define SYSCTL_RUNLOCK(tracker) rm_runlock(&sysctllock, (tracker)) #define SYSCTL_WLOCKED() rm_wowned(&sysctllock) #define SYSCTL_ASSERT_LOCKED() rm_assert(&sysctllock, RA_LOCKED) #define SYSCTL_ASSERT_WLOCKED() rm_assert(&sysctllock, RA_WLOCKED) #define SYSCTL_ASSERT_RLOCKED() rm_assert(&sysctllock, RA_RLOCKED) #define SYSCTL_INIT() rm_init_flags(&sysctllock, "sysctl lock", \ RM_SLEEPABLE) #define SYSCTL_SLEEP(ch, wmesg, timo) \ rm_sleep(ch, &sysctllock, 0, wmesg, timo) static int sysctl_root(SYSCTL_HANDLER_ARGS); /* Root list */ struct sysctl_oid_list sysctl__children = SLIST_HEAD_INITIALIZER(&sysctl__children); static int sysctl_remove_oid_locked(struct sysctl_oid *oidp, int del, int recurse); static int sysctl_old_kernel(struct sysctl_req *, const void *, size_t); static int sysctl_new_kernel(struct sysctl_req *, void *, size_t); static struct sysctl_oid * sysctl_find_oidname(const char *name, struct sysctl_oid_list *list) { struct sysctl_oid *oidp; SYSCTL_ASSERT_LOCKED(); SLIST_FOREACH(oidp, list, oid_link) { if (strcmp(oidp->oid_name, name) == 0) { return (oidp); } } return (NULL); } /* * Initialization of the MIB tree. * * Order by number in each list. */ void sysctl_wlock(void) { SYSCTL_WLOCK(); } void sysctl_wunlock(void) { SYSCTL_WUNLOCK(); } static int sysctl_root_handler_locked(struct sysctl_oid *oid, void *arg1, intmax_t arg2, struct sysctl_req *req, struct rm_priotracker *tracker) { int error; if (oid->oid_kind & CTLFLAG_DYN) atomic_add_int(&oid->oid_running, 1); if (tracker != NULL) SYSCTL_RUNLOCK(tracker); else SYSCTL_WUNLOCK(); if (!(oid->oid_kind & CTLFLAG_MPSAFE)) mtx_lock(&Giant); error = oid->oid_handler(oid, arg1, arg2, req); if (!(oid->oid_kind & CTLFLAG_MPSAFE)) mtx_unlock(&Giant); + KFAIL_POINT_ERROR(_debug_fail_point, sysctl_running, error); + if (tracker != NULL) SYSCTL_RLOCK(tracker); else SYSCTL_WLOCK(); if (oid->oid_kind & CTLFLAG_DYN) { if (atomic_fetchadd_int(&oid->oid_running, -1) == 1 && (oid->oid_kind & CTLFLAG_DYING) != 0) wakeup(&oid->oid_running); } return (error); } static void sysctl_load_tunable_by_oid_locked(struct sysctl_oid *oidp) { struct sysctl_req req; struct sysctl_oid *curr; char *penv = NULL; char path[64]; ssize_t rem = sizeof(path); ssize_t len; uint8_t val_8; uint16_t val_16; uint32_t val_32; int val_int; long val_long; int64_t val_64; quad_t val_quad; int error; path[--rem] = 0; for (curr = oidp; curr != NULL; curr = SYSCTL_PARENT(curr)) { len = strlen(curr->oid_name); rem -= len; if (curr != oidp) rem -= 1; if (rem < 0) { printf("OID path exceeds %d bytes\n", (int)sizeof(path)); return; } memcpy(path + rem, curr->oid_name, len); if (curr != oidp) path[rem + len] = '.'; } memset(&req, 0, sizeof(req)); req.td = curthread; req.oldfunc = sysctl_old_kernel; req.newfunc = sysctl_new_kernel; req.lock = REQ_UNWIRED; switch (oidp->oid_kind & CTLTYPE) { case CTLTYPE_INT: if (getenv_int(path + rem, &val_int) == 0) return; req.newlen = sizeof(val_int); req.newptr = &val_int; break; case CTLTYPE_UINT: if (getenv_uint(path + rem, (unsigned int *)&val_int) == 0) return; req.newlen = sizeof(val_int); req.newptr = &val_int; break; case CTLTYPE_LONG: if (getenv_long(path + rem, &val_long) == 0) return; req.newlen = sizeof(val_long); req.newptr = &val_long; break; case CTLTYPE_ULONG: if (getenv_ulong(path + rem, (unsigned long 
*)&val_long) == 0) return; req.newlen = sizeof(val_long); req.newptr = &val_long; break; case CTLTYPE_S8: if (getenv_int(path + rem, &val_int) == 0) return; val_8 = val_int; req.newlen = sizeof(val_8); req.newptr = &val_8; break; case CTLTYPE_S16: if (getenv_int(path + rem, &val_int) == 0) return; val_16 = val_int; req.newlen = sizeof(val_16); req.newptr = &val_16; break; case CTLTYPE_S32: if (getenv_long(path + rem, &val_long) == 0) return; val_32 = val_long; req.newlen = sizeof(val_32); req.newptr = &val_32; break; case CTLTYPE_S64: if (getenv_quad(path + rem, &val_quad) == 0) return; val_64 = val_quad; req.newlen = sizeof(val_64); req.newptr = &val_64; break; case CTLTYPE_U8: if (getenv_uint(path + rem, (unsigned int *)&val_int) == 0) return; val_8 = val_int; req.newlen = sizeof(val_8); req.newptr = &val_8; break; case CTLTYPE_U16: if (getenv_uint(path + rem, (unsigned int *)&val_int) == 0) return; val_16 = val_int; req.newlen = sizeof(val_16); req.newptr = &val_16; break; case CTLTYPE_U32: if (getenv_ulong(path + rem, (unsigned long *)&val_long) == 0) return; val_32 = val_long; req.newlen = sizeof(val_32); req.newptr = &val_32; break; case CTLTYPE_U64: /* XXX there is no getenv_uquad() */ if (getenv_quad(path + rem, &val_quad) == 0) return; val_64 = val_quad; req.newlen = sizeof(val_64); req.newptr = &val_64; break; case CTLTYPE_STRING: penv = kern_getenv(path + rem); if (penv == NULL) return; req.newlen = strlen(penv); req.newptr = penv; break; default: return; } error = sysctl_root_handler_locked(oidp, oidp->oid_arg1, oidp->oid_arg2, &req, NULL); if (error != 0) printf("Setting sysctl %s failed: %d\n", path + rem, error); if (penv != NULL) freeenv(penv); } void sysctl_register_oid(struct sysctl_oid *oidp) { struct sysctl_oid_list *parent = oidp->oid_parent; struct sysctl_oid *p; struct sysctl_oid *q; int oid_number; int timeout = 2; /* * First check if another oid with the same name already * exists in the parent's list. */ SYSCTL_ASSERT_WLOCKED(); p = sysctl_find_oidname(oidp->oid_name, parent); if (p != NULL) { if ((p->oid_kind & CTLTYPE) == CTLTYPE_NODE) { p->oid_refcnt++; return; } else { printf("can't re-use a leaf (%s)!\n", p->oid_name); return; } } /* get current OID number */ oid_number = oidp->oid_number; #if (OID_AUTO >= 0) #error "OID_AUTO is expected to be a negative value" #endif /* * Any negative OID number qualifies as OID_AUTO. Valid OID * numbers should always be positive. * * NOTE: DO NOT change the starting value here, change it in * , and make sure it is at least 256 to * accomodate e.g. net.inet.raw as a static sysctl node. */ if (oid_number < 0) { static int newoid; /* * By decrementing the next OID number we spend less * time inserting the OIDs into a sorted list. */ if (--newoid < CTL_AUTO_START) newoid = 0x7fffffff; oid_number = newoid; } /* * Insert the OID into the parent's list sorted by OID number. 
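 *
 * Aside (illustrative, not from this commit): the retry loop that
 * follows is, at heart, a predecessor-tracking sorted insert.  A
 * generic sketch with <sys/queue.h>, minus the OID-number collision
 * and renumbering handling:
 */
#if 0	/* sketch only */
#include <sys/queue.h>
#include <stddef.h>

struct node {
	int key;
	SLIST_ENTRY(node) link;
};
SLIST_HEAD(nodelist, node);

static void
sorted_insert(struct nodelist *head, struct node *n)
{
	struct node *p, *q = NULL;

	SLIST_FOREACH(p, head, link) {
		if (n->key < p->key)
			break;		/* insert before p */
		q = p;			/* remember the predecessor */
	}
	if (q != NULL)
		SLIST_INSERT_AFTER(q, n, link);
	else
		SLIST_INSERT_HEAD(head, n, link);
}
#endif
/*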
*/ retry: q = NULL; SLIST_FOREACH(p, parent, oid_link) { /* check if the current OID number is in use */ if (oid_number == p->oid_number) { /* get the next valid OID number */ if (oid_number < CTL_AUTO_START || oid_number == 0x7fffffff) { /* wraparound - restart */ oid_number = CTL_AUTO_START; /* don't loop forever */ if (!timeout--) panic("sysctl: Out of OID numbers\n"); goto retry; } else { oid_number++; } } else if (oid_number < p->oid_number) break; q = p; } /* check for non-auto OID number collision */ if (oidp->oid_number >= 0 && oidp->oid_number < CTL_AUTO_START && oid_number >= CTL_AUTO_START) { printf("sysctl: OID number(%d) is already in use for '%s'\n", oidp->oid_number, oidp->oid_name); } /* update the OID number, if any */ oidp->oid_number = oid_number; if (q != NULL) SLIST_INSERT_AFTER(q, oidp, oid_link); else SLIST_INSERT_HEAD(parent, oidp, oid_link); if ((oidp->oid_kind & CTLTYPE) != CTLTYPE_NODE && #ifdef VIMAGE (oidp->oid_kind & CTLFLAG_VNET) == 0 && #endif (oidp->oid_kind & CTLFLAG_TUN) != 0 && (oidp->oid_kind & CTLFLAG_NOFETCH) == 0) { /* only fetch value once */ oidp->oid_kind |= CTLFLAG_NOFETCH; /* try to fetch value from kernel environment */ sysctl_load_tunable_by_oid_locked(oidp); } } void sysctl_unregister_oid(struct sysctl_oid *oidp) { struct sysctl_oid *p; int error; SYSCTL_ASSERT_WLOCKED(); error = ENOENT; if (oidp->oid_number == OID_AUTO) { error = EINVAL; } else { SLIST_FOREACH(p, oidp->oid_parent, oid_link) { if (p == oidp) { SLIST_REMOVE(oidp->oid_parent, oidp, sysctl_oid, oid_link); error = 0; break; } } } /* * This can happen when a module fails to register and is * being unloaded afterwards. It should not be a panic() * for normal use. */ if (error) printf("%s: failed to unregister sysctl\n", __func__); } /* Initialize a new context to keep track of dynamically added sysctls. */ int sysctl_ctx_init(struct sysctl_ctx_list *c) { if (c == NULL) { return (EINVAL); } /* * No locking here, the caller is responsible for not adding * new nodes to a context until after this function has * returned. */ TAILQ_INIT(c); return (0); } /* Free the context, and destroy all dynamic oids registered in this context */ int sysctl_ctx_free(struct sysctl_ctx_list *clist) { struct sysctl_ctx_entry *e, *e1; int error; error = 0; /* * First perform a "dry run" to check if it's ok to remove oids. * XXX FIXME * XXX This algorithm is a hack. But I don't know any * XXX better solution for now... */ SYSCTL_WLOCK(); TAILQ_FOREACH(e, clist, link) { error = sysctl_remove_oid_locked(e->entry, 0, 0); if (error) break; } /* * Restore deregistered entries, either from the end, * or from the place where error occured. 
* e contains the entry that was not unregistered */ if (error) e1 = TAILQ_PREV(e, sysctl_ctx_list, link); else e1 = TAILQ_LAST(clist, sysctl_ctx_list); while (e1 != NULL) { sysctl_register_oid(e1->entry); e1 = TAILQ_PREV(e1, sysctl_ctx_list, link); } if (error) { SYSCTL_WUNLOCK(); return(EBUSY); } /* Now really delete the entries */ e = TAILQ_FIRST(clist); while (e != NULL) { e1 = TAILQ_NEXT(e, link); error = sysctl_remove_oid_locked(e->entry, 1, 0); if (error) panic("sysctl_remove_oid: corrupt tree, entry: %s", e->entry->oid_name); free(e, M_SYSCTLOID); e = e1; } SYSCTL_WUNLOCK(); return (error); } /* Add an entry to the context */ struct sysctl_ctx_entry * sysctl_ctx_entry_add(struct sysctl_ctx_list *clist, struct sysctl_oid *oidp) { struct sysctl_ctx_entry *e; SYSCTL_ASSERT_WLOCKED(); if (clist == NULL || oidp == NULL) return(NULL); e = malloc(sizeof(struct sysctl_ctx_entry), M_SYSCTLOID, M_WAITOK); e->entry = oidp; TAILQ_INSERT_HEAD(clist, e, link); return (e); } /* Find an entry in the context */ struct sysctl_ctx_entry * sysctl_ctx_entry_find(struct sysctl_ctx_list *clist, struct sysctl_oid *oidp) { struct sysctl_ctx_entry *e; SYSCTL_ASSERT_WLOCKED(); if (clist == NULL || oidp == NULL) return(NULL); TAILQ_FOREACH(e, clist, link) { if(e->entry == oidp) return(e); } return (e); } /* * Delete an entry from the context. * NOTE: this function doesn't free oidp! You have to remove it * with sysctl_remove_oid(). */ int sysctl_ctx_entry_del(struct sysctl_ctx_list *clist, struct sysctl_oid *oidp) { struct sysctl_ctx_entry *e; if (clist == NULL || oidp == NULL) return (EINVAL); SYSCTL_WLOCK(); e = sysctl_ctx_entry_find(clist, oidp); if (e != NULL) { TAILQ_REMOVE(clist, e, link); SYSCTL_WUNLOCK(); free(e, M_SYSCTLOID); return (0); } else { SYSCTL_WUNLOCK(); return (ENOENT); } } /* * Remove dynamically created sysctl trees. * oidp - top of the tree to be removed * del - if 0 - just deregister, otherwise free up entries as well * recurse - if != 0 traverse the subtree to be deleted */ int sysctl_remove_oid(struct sysctl_oid *oidp, int del, int recurse) { int error; SYSCTL_WLOCK(); error = sysctl_remove_oid_locked(oidp, del, recurse); SYSCTL_WUNLOCK(); return (error); } int sysctl_remove_name(struct sysctl_oid *parent, const char *name, int del, int recurse) { struct sysctl_oid *p, *tmp; int error; error = ENOENT; SYSCTL_WLOCK(); SLIST_FOREACH_SAFE(p, SYSCTL_CHILDREN(parent), oid_link, tmp) { if (strcmp(p->oid_name, name) == 0) { error = sysctl_remove_oid_locked(p, del, recurse); break; } } SYSCTL_WUNLOCK(); return (error); } static int sysctl_remove_oid_locked(struct sysctl_oid *oidp, int del, int recurse) { struct sysctl_oid *p, *tmp; int error; SYSCTL_ASSERT_WLOCKED(); if (oidp == NULL) return(EINVAL); if ((oidp->oid_kind & CTLFLAG_DYN) == 0) { printf("can't remove non-dynamic nodes!\n"); return (EINVAL); } /* * WARNING: normal method to do this should be through * sysctl_ctx_free(). Use recursing as the last resort * method to purge your sysctl tree of leftovers... * However, if some other code still references these nodes, * it will panic. 
*/ if ((oidp->oid_kind & CTLTYPE) == CTLTYPE_NODE) { if (oidp->oid_refcnt == 1) { SLIST_FOREACH_SAFE(p, SYSCTL_CHILDREN(oidp), oid_link, tmp) { if (!recurse) { printf("Warning: failed attempt to " "remove oid %s with child %s\n", oidp->oid_name, p->oid_name); return (ENOTEMPTY); } error = sysctl_remove_oid_locked(p, del, recurse); if (error) return (error); } } } if (oidp->oid_refcnt > 1 ) { oidp->oid_refcnt--; } else { if (oidp->oid_refcnt == 0) { printf("Warning: bad oid_refcnt=%u (%s)!\n", oidp->oid_refcnt, oidp->oid_name); return (EINVAL); } sysctl_unregister_oid(oidp); if (del) { /* * Wait for all threads running the handler to drain. * This preserves the previous behavior when the * sysctl lock was held across a handler invocation, * and is necessary for module unload correctness. */ while (oidp->oid_running > 0) { oidp->oid_kind |= CTLFLAG_DYING; SYSCTL_SLEEP(&oidp->oid_running, "oidrm", 0); } if (oidp->oid_descr) free(__DECONST(char *, oidp->oid_descr), M_SYSCTLOID); free(__DECONST(char *, oidp->oid_name), M_SYSCTLOID); free(oidp, M_SYSCTLOID); } } return (0); } /* * Create new sysctls at run time. * clist may point to a valid context initialized with sysctl_ctx_init(). */ struct sysctl_oid * sysctl_add_oid(struct sysctl_ctx_list *clist, struct sysctl_oid_list *parent, int number, const char *name, int kind, void *arg1, intmax_t arg2, int (*handler)(SYSCTL_HANDLER_ARGS), const char *fmt, const char *descr) { struct sysctl_oid *oidp; /* You have to hook up somewhere.. */ if (parent == NULL) return(NULL); /* Check if the node already exists, otherwise create it */ SYSCTL_WLOCK(); oidp = sysctl_find_oidname(name, parent); if (oidp != NULL) { if ((oidp->oid_kind & CTLTYPE) == CTLTYPE_NODE) { oidp->oid_refcnt++; /* Update the context */ if (clist != NULL) sysctl_ctx_entry_add(clist, oidp); SYSCTL_WUNLOCK(); return (oidp); } else { SYSCTL_WUNLOCK(); printf("can't re-use a leaf (%s)!\n", name); return (NULL); } } oidp = malloc(sizeof(struct sysctl_oid), M_SYSCTLOID, M_WAITOK|M_ZERO); oidp->oid_parent = parent; SLIST_INIT(&oidp->oid_children); oidp->oid_number = number; oidp->oid_refcnt = 1; oidp->oid_name = strdup(name, M_SYSCTLOID); oidp->oid_handler = handler; oidp->oid_kind = CTLFLAG_DYN | kind; oidp->oid_arg1 = arg1; oidp->oid_arg2 = arg2; oidp->oid_fmt = fmt; if (descr != NULL) oidp->oid_descr = strdup(descr, M_SYSCTLOID); /* Update the context, if used */ if (clist != NULL) sysctl_ctx_entry_add(clist, oidp); /* Register this oid */ sysctl_register_oid(oidp); SYSCTL_WUNLOCK(); return (oidp); } /* * Rename an existing oid. */ void sysctl_rename_oid(struct sysctl_oid *oidp, const char *name) { char *newname; char *oldname; newname = strdup(name, M_SYSCTLOID); SYSCTL_WLOCK(); oldname = __DECONST(char *, oidp->oid_name); oidp->oid_name = newname; SYSCTL_WUNLOCK(); free(oldname, M_SYSCTLOID); } /* * Reparent an existing oid. */ int sysctl_move_oid(struct sysctl_oid *oid, struct sysctl_oid_list *parent) { struct sysctl_oid *oidp; SYSCTL_WLOCK(); if (oid->oid_parent == parent) { SYSCTL_WUNLOCK(); return (0); } oidp = sysctl_find_oidname(oid->oid_name, parent); if (oidp != NULL) { SYSCTL_WUNLOCK(); return (EEXIST); } sysctl_unregister_oid(oid); oid->oid_parent = parent; oid->oid_number = OID_AUTO; sysctl_register_oid(oid); SYSCTL_WUNLOCK(); return (0); } /* * Register the kernel's oids on startup. 
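 *
 * Aside (illustrative, not from this commit): "the kernel's oids" are
 * the statically declared ones.  Each SYSCTL_* macro drops its
 * struct sysctl_oid into the sysctl_set linker set that
 * sysctl_register_all() walks once at boot.  A schematic declaration;
 * the parent node and names here are placeholders:
 */
#if 0	/* schematic declaration, not part of this file */
static int example_value;
SYSCTL_INT(_debug, OID_AUTO, example, CTLFLAG_RW,
    &example_value, 0, "Tunable registered from the linker set at boot");
#endif
/*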
*/ SET_DECLARE(sysctl_set, struct sysctl_oid); static void sysctl_register_all(void *arg) { struct sysctl_oid **oidp; sx_init(&sysctlmemlock, "sysctl mem"); SYSCTL_INIT(); SYSCTL_WLOCK(); SET_FOREACH(oidp, sysctl_set) sysctl_register_oid(*oidp); SYSCTL_WUNLOCK(); } SYSINIT(sysctl, SI_SUB_KMEM, SI_ORDER_FIRST, sysctl_register_all, 0); /* * "Staff-functions" * * These functions implement a presently undocumented interface * used by the sysctl program to walk the tree, and get the type * so it can print the value. * This interface is under work and consideration, and should probably * be killed with a big axe by the first person who can find the time. * (be aware though, that the proper interface isn't as obvious as it * may seem, there are various conflicting requirements. * * {0,0} printf the entire MIB-tree. * {0,1,...} return the name of the "..." OID. * {0,2,...} return the next OID. * {0,3} return the OID of the name in "new" * {0,4,...} return the kind & format info for the "..." OID. * {0,5,...} return the description the "..." OID. */ #ifdef SYSCTL_DEBUG static void sysctl_sysctl_debug_dump_node(struct sysctl_oid_list *l, int i) { int k; struct sysctl_oid *oidp; SYSCTL_ASSERT_LOCKED(); SLIST_FOREACH(oidp, l, oid_link) { for (k=0; koid_number, oidp->oid_name); printf("%c%c", oidp->oid_kind & CTLFLAG_RD ? 'R':' ', oidp->oid_kind & CTLFLAG_WR ? 'W':' '); if (oidp->oid_handler) printf(" *Handler"); switch (oidp->oid_kind & CTLTYPE) { case CTLTYPE_NODE: printf(" Node\n"); if (!oidp->oid_handler) { sysctl_sysctl_debug_dump_node( SYSCTL_CHILDREN(oidp), i + 2); } break; case CTLTYPE_INT: printf(" Int\n"); break; case CTLTYPE_UINT: printf(" u_int\n"); break; case CTLTYPE_LONG: printf(" Long\n"); break; case CTLTYPE_ULONG: printf(" u_long\n"); break; case CTLTYPE_STRING: printf(" String\n"); break; case CTLTYPE_S8: printf(" int8_t\n"); break; case CTLTYPE_S16: printf(" int16_t\n"); break; case CTLTYPE_S32: printf(" int32_t\n"); break; case CTLTYPE_S64: printf(" int64_t\n"); break; case CTLTYPE_U8: printf(" uint8_t\n"); break; case CTLTYPE_U16: printf(" uint16_t\n"); break; case CTLTYPE_U32: printf(" uint32_t\n"); break; case CTLTYPE_U64: printf(" uint64_t\n"); break; case CTLTYPE_OPAQUE: printf(" Opaque/struct\n"); break; default: printf("\n"); } } } static int sysctl_sysctl_debug(SYSCTL_HANDLER_ARGS) { struct rm_priotracker tracker; int error; error = priv_check(req->td, PRIV_SYSCTL_DEBUG); if (error) return (error); SYSCTL_RLOCK(&tracker); sysctl_sysctl_debug_dump_node(&sysctl__children, 0); SYSCTL_RUNLOCK(&tracker); return (ENOENT); } SYSCTL_PROC(_sysctl, 0, debug, CTLTYPE_STRING|CTLFLAG_RD|CTLFLAG_MPSAFE, 0, 0, sysctl_sysctl_debug, "-", ""); #endif static int sysctl_sysctl_name(SYSCTL_HANDLER_ARGS) { int *name = (int *) arg1; u_int namelen = arg2; int error = 0; struct sysctl_oid *oid; struct sysctl_oid_list *lsp = &sysctl__children, *lsp2; struct rm_priotracker tracker; char buf[10]; SYSCTL_RLOCK(&tracker); while (namelen) { if (!lsp) { snprintf(buf,sizeof(buf),"%d",*name); if (req->oldidx) error = SYSCTL_OUT(req, ".", 1); if (!error) error = SYSCTL_OUT(req, buf, strlen(buf)); if (error) goto out; namelen--; name++; continue; } lsp2 = 0; SLIST_FOREACH(oid, lsp, oid_link) { if (oid->oid_number != *name) continue; if (req->oldidx) error = SYSCTL_OUT(req, ".", 1); if (!error) error = SYSCTL_OUT(req, oid->oid_name, strlen(oid->oid_name)); if (error) goto out; namelen--; name++; if ((oid->oid_kind & CTLTYPE) != CTLTYPE_NODE) break; if (oid->oid_handler) break; lsp2 = SYSCTL_CHILDREN(oid); 
break; } lsp = lsp2; } error = SYSCTL_OUT(req, "", 1); out: SYSCTL_RUNLOCK(&tracker); return (error); } /* * XXXRW/JA: Shouldn't return name data for nodes that we don't permit in * capability mode. */ static SYSCTL_NODE(_sysctl, 1, name, CTLFLAG_RD | CTLFLAG_MPSAFE | CTLFLAG_CAPRD, sysctl_sysctl_name, ""); static int sysctl_sysctl_next_ls(struct sysctl_oid_list *lsp, int *name, u_int namelen, int *next, int *len, int level, struct sysctl_oid **oidpp) { struct sysctl_oid *oidp; SYSCTL_ASSERT_LOCKED(); *len = level; SLIST_FOREACH(oidp, lsp, oid_link) { *next = oidp->oid_number; *oidpp = oidp; if (oidp->oid_kind & CTLFLAG_SKIP) continue; if (!namelen) { if ((oidp->oid_kind & CTLTYPE) != CTLTYPE_NODE) return (0); if (oidp->oid_handler) /* We really should call the handler here...*/ return (0); lsp = SYSCTL_CHILDREN(oidp); if (!sysctl_sysctl_next_ls(lsp, 0, 0, next+1, len, level+1, oidpp)) return (0); goto emptynode; } if (oidp->oid_number < *name) continue; if (oidp->oid_number > *name) { if ((oidp->oid_kind & CTLTYPE) != CTLTYPE_NODE) return (0); if (oidp->oid_handler) return (0); lsp = SYSCTL_CHILDREN(oidp); if (!sysctl_sysctl_next_ls(lsp, name+1, namelen-1, next+1, len, level+1, oidpp)) return (0); goto next; } if ((oidp->oid_kind & CTLTYPE) != CTLTYPE_NODE) continue; if (oidp->oid_handler) continue; lsp = SYSCTL_CHILDREN(oidp); if (!sysctl_sysctl_next_ls(lsp, name+1, namelen-1, next+1, len, level+1, oidpp)) return (0); next: namelen = 1; emptynode: *len = level; } return (1); } static int sysctl_sysctl_next(SYSCTL_HANDLER_ARGS) { int *name = (int *) arg1; u_int namelen = arg2; int i, j, error; struct sysctl_oid *oid; struct sysctl_oid_list *lsp = &sysctl__children; struct rm_priotracker tracker; int newoid[CTL_MAXNAME]; SYSCTL_RLOCK(&tracker); i = sysctl_sysctl_next_ls(lsp, name, namelen, newoid, &j, 1, &oid); SYSCTL_RUNLOCK(&tracker); if (i) return (ENOENT); error = SYSCTL_OUT(req, newoid, j * sizeof (int)); return (error); } /* * XXXRW/JA: Shouldn't return next data for nodes that we don't permit in * capability mode. 
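 *
 * Aside (illustrative, not from this commit): from userland the
 * {0,2,...} interface above drives tree walks such as sysctl -a.  A
 * runnable sketch of a step-and-print loop, in the spirit of
 * sbin/sysctl:
 */
#if 0	/* userland sketch, not kernel code */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	int oid[CTL_MAXNAME], qoid[CTL_MAXNAME + 2];
	char name[BUFSIZ];
	size_t len, oidlen;

	oid[0] = CTL_KERN;			/* start below "kern" */
	oidlen = 1;
	for (int i = 0; i < 10; i++) {		/* a few steps for the demo */
		qoid[0] = 0;			/* sysctl internal magic */
		qoid[1] = 2;			/* {0,2,...}: next OID */
		memcpy(qoid + 2, oid, oidlen * sizeof(int));
		len = sizeof(oid);
		if (sysctl(qoid, (u_int)(oidlen + 2), oid, &len, NULL, 0) != 0)
			break;			/* walked off the tree */
		oidlen = len / sizeof(int);

		qoid[1] = 1;			/* {0,1,...}: its name */
		memcpy(qoid + 2, oid, oidlen * sizeof(int));
		len = sizeof(name);
		if (sysctl(qoid, (u_int)(oidlen + 2), name, &len, NULL, 0) == 0)
			printf("%s\n", name);
	}
	return (0);
}
#endif
/*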
*/ static SYSCTL_NODE(_sysctl, 2, next, CTLFLAG_RD | CTLFLAG_MPSAFE | CTLFLAG_CAPRD, sysctl_sysctl_next, ""); static int name2oid(char *name, int *oid, int *len, struct sysctl_oid **oidpp) { struct sysctl_oid *oidp; struct sysctl_oid_list *lsp = &sysctl__children; char *p; SYSCTL_ASSERT_LOCKED(); for (*len = 0; *len < CTL_MAXNAME;) { p = strsep(&name, "."); oidp = SLIST_FIRST(lsp); for (;; oidp = SLIST_NEXT(oidp, oid_link)) { if (oidp == NULL) return (ENOENT); if (strcmp(p, oidp->oid_name) == 0) break; } *oid++ = oidp->oid_number; (*len)++; if (name == NULL || *name == '\0') { if (oidpp) *oidpp = oidp; return (0); } if ((oidp->oid_kind & CTLTYPE) != CTLTYPE_NODE) break; if (oidp->oid_handler) break; lsp = SYSCTL_CHILDREN(oidp); } return (ENOENT); } static int sysctl_sysctl_name2oid(SYSCTL_HANDLER_ARGS) { char *p; int error, oid[CTL_MAXNAME], len = 0; struct sysctl_oid *op = 0; struct rm_priotracker tracker; if (!req->newlen) return (ENOENT); if (req->newlen >= MAXPATHLEN) /* XXX arbitrary, undocumented */ return (ENAMETOOLONG); p = malloc(req->newlen+1, M_SYSCTL, M_WAITOK); error = SYSCTL_IN(req, p, req->newlen); if (error) { free(p, M_SYSCTL); return (error); } p [req->newlen] = '\0'; SYSCTL_RLOCK(&tracker); error = name2oid(p, oid, &len, &op); SYSCTL_RUNLOCK(&tracker); free(p, M_SYSCTL); if (error) return (error); error = SYSCTL_OUT(req, oid, len * sizeof *oid); return (error); } /* * XXXRW/JA: Shouldn't return name2oid data for nodes that we don't permit in * capability mode. */ SYSCTL_PROC(_sysctl, 3, name2oid, CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_ANYBODY | CTLFLAG_MPSAFE | CTLFLAG_CAPRW, 0, 0, sysctl_sysctl_name2oid, "I", ""); static int sysctl_sysctl_oidfmt(SYSCTL_HANDLER_ARGS) { struct sysctl_oid *oid; struct rm_priotracker tracker; int error; SYSCTL_RLOCK(&tracker); error = sysctl_find_oid(arg1, arg2, &oid, NULL, req); if (error) goto out; if (oid->oid_fmt == NULL) { error = ENOENT; goto out; } error = SYSCTL_OUT(req, &oid->oid_kind, sizeof(oid->oid_kind)); if (error) goto out; error = SYSCTL_OUT(req, oid->oid_fmt, strlen(oid->oid_fmt) + 1); out: SYSCTL_RUNLOCK(&tracker); return (error); } static SYSCTL_NODE(_sysctl, 4, oidfmt, CTLFLAG_RD|CTLFLAG_MPSAFE|CTLFLAG_CAPRD, sysctl_sysctl_oidfmt, ""); static int sysctl_sysctl_oiddescr(SYSCTL_HANDLER_ARGS) { struct sysctl_oid *oid; struct rm_priotracker tracker; int error; SYSCTL_RLOCK(&tracker); error = sysctl_find_oid(arg1, arg2, &oid, NULL, req); if (error) goto out; if (oid->oid_descr == NULL) { error = ENOENT; goto out; } error = SYSCTL_OUT(req, oid->oid_descr, strlen(oid->oid_descr) + 1); out: SYSCTL_RUNLOCK(&tracker); return (error); } static SYSCTL_NODE(_sysctl, 5, oiddescr, CTLFLAG_RD|CTLFLAG_MPSAFE|CTLFLAG_CAPRD, sysctl_sysctl_oiddescr, ""); /* * Default "handler" functions. */ /* * Handle an int8_t, signed or unsigned. * Two cases: * a variable: point arg1 at it. * a constant: pass it in arg2. */ int sysctl_handle_8(SYSCTL_HANDLER_ARGS) { int8_t tmpout; int error = 0; /* * Attempt to get a coherent snapshot by making a copy of the data. */ if (arg1) tmpout = *(int8_t *)arg1; else tmpout = arg2; error = SYSCTL_OUT(req, &tmpout, sizeof(tmpout)); if (error || !req->newptr) return (error); if (!arg1) error = EPERM; else error = SYSCTL_IN(req, arg1, sizeof(tmpout)); return (error); } /* * Handle an int16_t, signed or unsigned. * Two cases: * a variable: point arg1 at it. * a constant: pass it in arg2. 
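 *
 * Aside (illustrative, not from this commit): the two cases read like
 * this at declaration sites.  Schematic; the parent node and leaf
 * names are placeholders:
 */
#if 0	/* schematic declarations, not part of this file */
static int live_value;

/* Variable: arg1 points at storage, arg2 is ignored. */
SYSCTL_PROC(_debug, OID_AUTO, live,
    CTLTYPE_INT | CTLFLAG_RW | CTLFLAG_MPSAFE,
    &live_value, 0, sysctl_handle_int, "I", "backed by live_value");

/* Constant: arg1 is NULL, the value rides in arg2; writes get EPERM. */
SYSCTL_PROC(_debug, OID_AUTO, fixed,
    CTLTYPE_INT | CTLFLAG_RD | CTLFLAG_MPSAFE,
    NULL, 42, sysctl_handle_int, "I", "always 42");
#endif
/*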
*/ int sysctl_handle_16(SYSCTL_HANDLER_ARGS) { int16_t tmpout; int error = 0; /* * Attempt to get a coherent snapshot by making a copy of the data. */ if (arg1) tmpout = *(int16_t *)arg1; else tmpout = arg2; error = SYSCTL_OUT(req, &tmpout, sizeof(tmpout)); if (error || !req->newptr) return (error); if (!arg1) error = EPERM; else error = SYSCTL_IN(req, arg1, sizeof(tmpout)); return (error); } /* * Handle an int32_t, signed or unsigned. * Two cases: * a variable: point arg1 at it. * a constant: pass it in arg2. */ int sysctl_handle_32(SYSCTL_HANDLER_ARGS) { int32_t tmpout; int error = 0; /* * Attempt to get a coherent snapshot by making a copy of the data. */ if (arg1) tmpout = *(int32_t *)arg1; else tmpout = arg2; error = SYSCTL_OUT(req, &tmpout, sizeof(tmpout)); if (error || !req->newptr) return (error); if (!arg1) error = EPERM; else error = SYSCTL_IN(req, arg1, sizeof(tmpout)); return (error); } /* * Handle an int, signed or unsigned. * Two cases: * a variable: point arg1 at it. * a constant: pass it in arg2. */ int sysctl_handle_int(SYSCTL_HANDLER_ARGS) { int tmpout, error = 0; /* * Attempt to get a coherent snapshot by making a copy of the data. */ if (arg1) tmpout = *(int *)arg1; else tmpout = arg2; error = SYSCTL_OUT(req, &tmpout, sizeof(int)); if (error || !req->newptr) return (error); if (!arg1) error = EPERM; else error = SYSCTL_IN(req, arg1, sizeof(int)); return (error); } /* * Based on on sysctl_handle_int() convert milliseconds into ticks. * Note: this is used by TCP. */ int sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) { int error, s, tt; tt = *(int *)arg1; s = (int)((int64_t)tt * 1000 / hz); error = sysctl_handle_int(oidp, &s, 0, req); if (error || !req->newptr) return (error); tt = (int)((int64_t)s * hz / 1000); if (tt < 1) return (EINVAL); *(int *)arg1 = tt; return (0); } /* * Handle a long, signed or unsigned. * Two cases: * a variable: point arg1 at it. * a constant: pass it in arg2. */ int sysctl_handle_long(SYSCTL_HANDLER_ARGS) { int error = 0; long tmplong; #ifdef SCTL_MASK32 int tmpint; #endif /* * Attempt to get a coherent snapshot by making a copy of the data. */ if (arg1) tmplong = *(long *)arg1; else tmplong = arg2; #ifdef SCTL_MASK32 if (req->flags & SCTL_MASK32) { tmpint = tmplong; error = SYSCTL_OUT(req, &tmpint, sizeof(int)); } else #endif error = SYSCTL_OUT(req, &tmplong, sizeof(long)); if (error || !req->newptr) return (error); if (!arg1) error = EPERM; #ifdef SCTL_MASK32 else if (req->flags & SCTL_MASK32) { error = SYSCTL_IN(req, &tmpint, sizeof(int)); *(long *)arg1 = (long)tmpint; } #endif else error = SYSCTL_IN(req, arg1, sizeof(long)); return (error); } /* * Handle a 64 bit int, signed or unsigned. * Two cases: * a variable: point arg1 at it. * a constant: pass it in arg2. */ int sysctl_handle_64(SYSCTL_HANDLER_ARGS) { int error = 0; uint64_t tmpout; /* * Attempt to get a coherent snapshot by making a copy of the data. */ if (arg1) tmpout = *(uint64_t *)arg1; else tmpout = arg2; error = SYSCTL_OUT(req, &tmpout, sizeof(uint64_t)); if (error || !req->newptr) return (error); if (!arg1) error = EPERM; else error = SYSCTL_IN(req, arg1, sizeof(uint64_t)); return (error); } /* * Handle our generic '\0' terminated 'C' string. * Two cases: * a variable string: point arg1 at it, arg2 is max length. * a constant string: point arg1 at it, arg2 is zero. 
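 *
 * These default handlers are normally reached through the declaration
 * macros rather than called directly; a minimal, hypothetical sketch
 * (the "foo" names are illustrative and not part of this file):
 *
 *	static char foo_name[32] = "default";
 *	SYSCTL_STRING(_debug, OID_AUTO, foo_name, CTLFLAG_RW,
 *	    foo_name, sizeof(foo_name), "Illustrative string knob");
 *
 *	static int foo_ticks;	(kept in ticks, exported as milliseconds)
 *	SYSCTL_PROC(_debug, OID_AUTO, foo_msec, CTLTYPE_INT | CTLFLAG_RW,
 *	    &foo_ticks, 0, sysctl_msec_to_ticks, "I",
 *	    "Illustrative interval, in milliseconds");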
*/ int sysctl_handle_string(SYSCTL_HANDLER_ARGS) { size_t outlen; int error = 0, ro_string = 0; /* * A zero-length buffer indicates a fixed size read-only * string: */ if (arg2 == 0) { arg2 = strlen((char *)arg1) + 1; ro_string = 1; } if (req->oldptr != NULL) { char *tmparg; if (ro_string) { tmparg = arg1; } else { /* try to make a coherent snapshot of the string */ tmparg = malloc(arg2, M_SYSCTLTMP, M_WAITOK); memcpy(tmparg, arg1, arg2); } outlen = strnlen(tmparg, arg2 - 1) + 1; error = SYSCTL_OUT(req, tmparg, outlen); if (!ro_string) free(tmparg, M_SYSCTLTMP); } else { outlen = strnlen((char *)arg1, arg2 - 1) + 1; error = SYSCTL_OUT(req, NULL, outlen); } if (error || !req->newptr) return (error); if ((req->newlen - req->newidx) >= arg2) { error = EINVAL; } else { arg2 = (req->newlen - req->newidx); error = SYSCTL_IN(req, arg1, arg2); ((char *)arg1)[arg2] = '\0'; } return (error); } /* * Handle any kind of opaque data. * arg1 points to it, arg2 is the size. */ int sysctl_handle_opaque(SYSCTL_HANDLER_ARGS) { int error, tries; u_int generation; struct sysctl_req req2; /* * Attempt to get a coherent snapshot, by using the thread * pre-emption counter updated from within mi_switch() to * determine if we were pre-empted during a bcopy() or * copyout(). Make 3 attempts at doing this before giving up. * If we encounter an error, stop immediately. */ tries = 0; req2 = *req; retry: generation = curthread->td_generation; error = SYSCTL_OUT(req, arg1, arg2); if (error) return (error); tries++; if (generation != curthread->td_generation && tries < 3) { *req = req2; goto retry; } error = SYSCTL_IN(req, arg1, arg2); return (error); } /* * Transfer functions to/from kernel space. * XXX: rather untested at this point */ static int sysctl_old_kernel(struct sysctl_req *req, const void *p, size_t l) { size_t i = 0; if (req->oldptr) { i = l; if (req->oldlen <= req->oldidx) i = 0; else if (i > req->oldlen - req->oldidx) i = req->oldlen - req->oldidx; if (i > 0) bcopy(p, (char *)req->oldptr + req->oldidx, i); } req->oldidx += l; if (req->oldptr && i != l) return (ENOMEM); return (0); } static int sysctl_new_kernel(struct sysctl_req *req, void *p, size_t l) { if (!req->newptr) return (0); if (req->newlen - req->newidx < l) return (EINVAL); bcopy((char *)req->newptr + req->newidx, p, l); req->newidx += l; return (0); } int kernel_sysctl(struct thread *td, int *name, u_int namelen, void *old, size_t *oldlenp, void *new, size_t newlen, size_t *retval, int flags) { int error = 0; struct sysctl_req req; bzero(&req, sizeof req); req.td = td; req.flags = flags; if (oldlenp) { req.oldlen = *oldlenp; } req.validlen = req.oldlen; if (old) { req.oldptr= old; } if (new != NULL) { req.newlen = newlen; req.newptr = new; } req.oldfunc = sysctl_old_kernel; req.newfunc = sysctl_new_kernel; req.lock = REQ_UNWIRED; error = sysctl_root(0, name, namelen, &req); if (req.lock == REQ_WIRED && req.validlen > 0) vsunlock(req.oldptr, req.validlen); if (error && error != ENOMEM) return (error); if (retval) { if (req.oldptr && req.oldidx > req.validlen) *retval = req.validlen; else *retval = req.oldidx; } return (error); } int kernel_sysctlbyname(struct thread *td, char *name, void *old, size_t *oldlenp, void *new, size_t newlen, size_t *retval, int flags) { int oid[CTL_MAXNAME]; size_t oidlen, plen; int error; oid[0] = 0; /* sysctl internal magic */ oid[1] = 3; /* name2oid */ oidlen = sizeof(oid); error = kernel_sysctl(td, oid, 2, oid, &oidlen, (void *)name, strlen(name), &plen, flags); if (error) return (error); error = kernel_sysctl(td, 
oid, plen / sizeof(int), old, oldlenp, new, newlen, retval, flags); return (error); } /* * Transfer function to/from user space. */ static int sysctl_old_user(struct sysctl_req *req, const void *p, size_t l) { size_t i, len, origidx; int error; origidx = req->oldidx; req->oldidx += l; if (req->oldptr == NULL) return (0); /* * If we have not wired the user supplied buffer and we are currently * holding locks, drop a witness warning, as it's possible that * write operations to the user page can sleep. */ if (req->lock != REQ_WIRED) WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "sysctl_old_user()"); i = l; len = req->validlen; if (len <= origidx) i = 0; else { if (i > len - origidx) i = len - origidx; if (req->lock == REQ_WIRED) { error = copyout_nofault(p, (char *)req->oldptr + origidx, i); } else error = copyout(p, (char *)req->oldptr + origidx, i); if (error != 0) return (error); } if (i < l) return (ENOMEM); return (0); } static int sysctl_new_user(struct sysctl_req *req, void *p, size_t l) { int error; if (!req->newptr) return (0); if (req->newlen - req->newidx < l) return (EINVAL); WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "sysctl_new_user()"); error = copyin((char *)req->newptr + req->newidx, p, l); req->newidx += l; return (error); } /* * Wire the user space destination buffer. If set to a value greater than * zero, the len parameter limits the maximum amount of wired memory. */ int sysctl_wire_old_buffer(struct sysctl_req *req, size_t len) { int ret; size_t wiredlen; wiredlen = (len > 0 && len < req->oldlen) ? len : req->oldlen; ret = 0; if (req->lock != REQ_WIRED && req->oldptr && req->oldfunc == sysctl_old_user) { if (wiredlen != 0) { ret = vslock(req->oldptr, wiredlen); if (ret != 0) { if (ret != ENOMEM) return (ret); wiredlen = 0; } } req->lock = REQ_WIRED; req->validlen = wiredlen; } return (0); } int sysctl_find_oid(int *name, u_int namelen, struct sysctl_oid **noid, int *nindx, struct sysctl_req *req) { struct sysctl_oid_list *lsp; struct sysctl_oid *oid; int indx; SYSCTL_ASSERT_LOCKED(); lsp = &sysctl__children; indx = 0; while (indx < CTL_MAXNAME) { SLIST_FOREACH(oid, lsp, oid_link) { if (oid->oid_number == name[indx]) break; } if (oid == NULL) return (ENOENT); indx++; if ((oid->oid_kind & CTLTYPE) == CTLTYPE_NODE) { if (oid->oid_handler != NULL || indx == namelen) { *noid = oid; if (nindx != NULL) *nindx = indx; KASSERT((oid->oid_kind & CTLFLAG_DYING) == 0, ("%s found DYING node %p", __func__, oid)); return (0); } lsp = SYSCTL_CHILDREN(oid); } else if (indx == namelen) { *noid = oid; if (nindx != NULL) *nindx = indx; KASSERT((oid->oid_kind & CTLFLAG_DYING) == 0, ("%s found DYING node %p", __func__, oid)); return (0); } else { return (ENOTDIR); } } return (ENOENT); } /* * Traverse our tree, and find the right node, execute whatever it points * to, and return the resulting error code. */ static int sysctl_root(SYSCTL_HANDLER_ARGS) { struct sysctl_oid *oid; struct rm_priotracker tracker; int error, indx, lvl; SYSCTL_RLOCK(&tracker); error = sysctl_find_oid(arg1, arg2, &oid, &indx, req); if (error) goto out; if ((oid->oid_kind & CTLTYPE) == CTLTYPE_NODE) { /* * You can't call a sysctl when it's a node, but has * no handler. Inform the user that it's a node. * The indx may or may not be the same as namelen. */ if (oid->oid_handler == NULL) { error = EISDIR; goto out; } } /* Is this sysctl writable? 
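 * Aside: a handler that emits data while holding a non-sleepable lock
 * should wire the destination first with sysctl_wire_old_buffer(),
 * defined above.  A hypothetical sketch, with foo_lock and foo_stats
 * purely illustrative:
 *
 *	error = sysctl_wire_old_buffer(req, 0);
 *	if (error != 0)
 *		return (error);
 *	mtx_lock(&foo_lock);
 *	error = SYSCTL_OUT(req, &foo_stats, sizeof(foo_stats));
 *	mtx_unlock(&foo_lock);
 *	return (error);
 *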
*/ if (req->newptr && !(oid->oid_kind & CTLFLAG_WR)) { error = EPERM; goto out; } KASSERT(req->td != NULL, ("sysctl_root(): req->td == NULL")); #ifdef CAPABILITY_MODE /* * If the process is in capability mode, then don't permit reading or * writing unless specifically granted for the node. */ if (IN_CAPABILITY_MODE(req->td)) { if ((req->oldptr && !(oid->oid_kind & CTLFLAG_CAPRD)) || (req->newptr && !(oid->oid_kind & CTLFLAG_CAPWR))) { error = EPERM; goto out; } } #endif /* Is this sysctl sensitive to securelevels? */ if (req->newptr && (oid->oid_kind & CTLFLAG_SECURE)) { lvl = (oid->oid_kind & CTLMASK_SECURE) >> CTLSHIFT_SECURE; error = securelevel_gt(req->td->td_ucred, lvl); if (error) goto out; } /* Is this sysctl writable by only privileged users? */ if (req->newptr && !(oid->oid_kind & CTLFLAG_ANYBODY)) { int priv; if (oid->oid_kind & CTLFLAG_PRISON) priv = PRIV_SYSCTL_WRITEJAIL; #ifdef VIMAGE else if ((oid->oid_kind & CTLFLAG_VNET) && prison_owns_vnet(req->td->td_ucred)) priv = PRIV_SYSCTL_WRITEJAIL; #endif else priv = PRIV_SYSCTL_WRITE; error = priv_check(req->td, priv); if (error) goto out; } if (!oid->oid_handler) { error = EINVAL; goto out; } if ((oid->oid_kind & CTLTYPE) == CTLTYPE_NODE) { arg1 = (int *)arg1 + indx; arg2 -= indx; } else { arg1 = oid->oid_arg1; arg2 = oid->oid_arg2; } #ifdef MAC error = mac_system_check_sysctl(req->td->td_ucred, oid, arg1, arg2, req); if (error != 0) goto out; #endif #ifdef VIMAGE if ((oid->oid_kind & CTLFLAG_VNET) && arg1 != NULL) arg1 = (void *)(curvnet->vnet_data_base + (uintptr_t)arg1); #endif error = sysctl_root_handler_locked(oid, arg1, arg2, req, &tracker); - - KFAIL_POINT_ERROR(_debug_fail_point, sysctl_running, error); out: SYSCTL_RUNLOCK(&tracker); return (error); } #ifndef _SYS_SYSPROTO_H_ struct sysctl_args { int *name; u_int namelen; void *old; size_t *oldlenp; void *new; size_t newlen; }; #endif int sys___sysctl(struct thread *td, struct sysctl_args *uap) { int error, i, name[CTL_MAXNAME]; size_t j; if (uap->namelen > CTL_MAXNAME || uap->namelen < 2) return (EINVAL); error = copyin(uap->name, &name, uap->namelen * sizeof(int)); if (error) return (error); error = userland_sysctl(td, name, uap->namelen, uap->old, uap->oldlenp, 0, uap->new, uap->newlen, &j, 0); if (error && error != ENOMEM) return (error); if (uap->oldlenp) { i = copyout(&j, uap->oldlenp, sizeof(j)); if (i) return (i); } return (error); } /* * This is used from various compatibility syscalls too. That's why name * must be in kernel space. 
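 *
 * Because userland_sysctl() below tolerates ENOMEM and clamps *retval
 * to the bytes that fit, userland consumers normally probe for the
 * size and retry with headroom.  A hypothetical userland sketch using
 * the libc wrapper (assumes <sys/sysctl.h>, <err.h>, <errno.h> and
 * <stdlib.h>):
 *
 *	size_t len;
 *	void *buf = NULL;
 *
 *	if (sysctlbyname("kern.ttys", NULL, &len, NULL, 0) == -1)
 *		err(1, "kern.ttys");
 *	for (;;) {
 *		len += len / 8;		(headroom: the set may grow)
 *		buf = realloc(buf, len);
 *		if (sysctlbyname("kern.ttys", buf, &len, NULL, 0) == 0)
 *			break;
 *		if (errno != ENOMEM)
 *			err(1, "kern.ttys");
 *	}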
*/ int userland_sysctl(struct thread *td, int *name, u_int namelen, void *old, size_t *oldlenp, int inkernel, void *new, size_t newlen, size_t *retval, int flags) { int error = 0, memlocked; struct sysctl_req req; bzero(&req, sizeof req); req.td = td; req.flags = flags; if (oldlenp) { if (inkernel) { req.oldlen = *oldlenp; } else { error = copyin(oldlenp, &req.oldlen, sizeof(*oldlenp)); if (error) return (error); } } req.validlen = req.oldlen; if (old) { if (!useracc(old, req.oldlen, VM_PROT_WRITE)) return (EFAULT); req.oldptr= old; } if (new != NULL) { if (!useracc(new, newlen, VM_PROT_READ)) return (EFAULT); req.newlen = newlen; req.newptr = new; } req.oldfunc = sysctl_old_user; req.newfunc = sysctl_new_user; req.lock = REQ_UNWIRED; #ifdef KTRACE if (KTRPOINT(curthread, KTR_SYSCTL)) ktrsysctl(name, namelen); #endif if (req.oldptr && req.oldlen > PAGE_SIZE) { memlocked = 1; sx_xlock(&sysctlmemlock); } else memlocked = 0; CURVNET_SET(TD_TO_VNET(td)); for (;;) { req.oldidx = 0; req.newidx = 0; error = sysctl_root(0, name, namelen, &req); if (error != EAGAIN) break; kern_yield(PRI_USER); } CURVNET_RESTORE(); if (req.lock == REQ_WIRED && req.validlen > 0) vsunlock(req.oldptr, req.validlen); if (memlocked) sx_xunlock(&sysctlmemlock); if (error && error != ENOMEM) return (error); if (retval) { if (req.oldptr && req.oldidx > req.validlen) *retval = req.validlen; else *retval = req.oldidx; } return (error); } /* * Drain into a sysctl struct. The user buffer should be wired if a page * fault would cause issue. */ static int sbuf_sysctl_drain(void *arg, const char *data, int len) { struct sysctl_req *req = arg; int error; error = SYSCTL_OUT(req, data, len); KASSERT(error >= 0, ("Got unexpected negative value %d", error)); return (error == 0 ? len : -error); } struct sbuf * sbuf_new_for_sysctl(struct sbuf *s, char *buf, int length, struct sysctl_req *req) { /* Supply a default buffer size if none given. */ if (buf == NULL && length == 0) length = 64; s = sbuf_new(s, buf, length, SBUF_FIXEDLEN | SBUF_INCLUDENUL); sbuf_set_drain(s, sbuf_sysctl_drain, req); return (s); } Index: projects/clang380-import/sys/kern/tty.c =================================================================== --- projects/clang380-import/sys/kern/tty.c (revision 294776) +++ projects/clang380-import/sys/kern/tty.c (revision 294777) @@ -1,2271 +1,2300 @@ /*- * Copyright (c) 2008 Ed Schouten * All rights reserved. * * Portions of this software were developed under sponsorship from Snow * B.V., the Netherlands. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_capsicum.h" #include "opt_compat.h" #include #include #include #include #include #include #include #include #ifdef COMPAT_43TTY #include #endif /* COMPAT_43TTY */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define TTYDEFCHARS #include #undef TTYDEFCHARS #include #include #include static MALLOC_DEFINE(M_TTY, "tty", "tty device"); static void tty_rel_free(struct tty *tp); static TAILQ_HEAD(, tty) tty_list = TAILQ_HEAD_INITIALIZER(tty_list); static struct sx tty_list_sx; SX_SYSINIT(tty_list, &tty_list_sx, "tty list"); static unsigned int tty_list_count = 0; /* Character device of /dev/console. */ static struct cdev *dev_console; static const char *dev_console_filename; /* * Flags that are supported and stored by this implementation. */ #define TTYSUP_IFLAG (IGNBRK|BRKINT|IGNPAR|PARMRK|INPCK|ISTRIP|\ INLCR|IGNCR|ICRNL|IXON|IXOFF|IXANY|IMAXBEL) #define TTYSUP_OFLAG (OPOST|ONLCR|TAB3|ONOEOT|OCRNL|ONOCR|ONLRET) #define TTYSUP_LFLAG (ECHOKE|ECHOE|ECHOK|ECHO|ECHONL|ECHOPRT|\ ECHOCTL|ISIG|ICANON|ALTWERASE|IEXTEN|TOSTOP|\ FLUSHO|NOKERNINFO|NOFLSH) #define TTYSUP_CFLAG (CIGNORE|CSIZE|CSTOPB|CREAD|PARENB|PARODD|\ HUPCL|CLOCAL|CCTS_OFLOW|CRTS_IFLOW|CDTR_IFLOW|\ CDSR_OFLOW|CCAR_OFLOW) #define TTY_CALLOUT(tp,d) (dev2unit(d) & TTYUNIT_CALLOUT) /* * Set TTY buffer sizes. */ #define TTYBUF_MAX 65536 static void tty_watermarks(struct tty *tp) { size_t bs = 0; /* Provide an input buffer for 0.2 seconds of data. */ if (tp->t_termios.c_cflag & CREAD) bs = MIN(tp->t_termios.c_ispeed / 5, TTYBUF_MAX); ttyinq_setsize(&tp->t_inq, tp, bs); /* Set low watermark at 10% (when 90% is available). */ tp->t_inlow = (ttyinq_getallocatedsize(&tp->t_inq) * 9) / 10; /* Provide an output buffer for 0.2 seconds of data. */ bs = MIN(tp->t_termios.c_ospeed / 5, TTYBUF_MAX); ttyoutq_setsize(&tp->t_outq, tp, bs); /* Set low watermark at 10% (when 90% is available). */ tp->t_outlow = (ttyoutq_getallocatedsize(&tp->t_outq) * 9) / 10; } static int tty_drain(struct tty *tp, int leaving) { size_t bytesused; int error, revokecnt; if (ttyhook_hashook(tp, getc_inject)) /* buffer is inaccessible */ return (0); while (ttyoutq_bytesused(&tp->t_outq) > 0 || ttydevsw_busy(tp)) { ttydevsw_outwakeup(tp); /* Could be handled synchronously. */ bytesused = ttyoutq_bytesused(&tp->t_outq); if (bytesused == 0 && !ttydevsw_busy(tp)) return (0); /* Wait for data to be drained. */ if (leaving) { revokecnt = tp->t_revokecnt; error = tty_timedwait(tp, &tp->t_outwait, hz); switch (error) { case ERESTART: if (revokecnt != tp->t_revokecnt) error = 0; break; case EWOULDBLOCK: if (ttyoutq_bytesused(&tp->t_outq) < bytesused) error = 0; break; } } else error = tty_wait(tp, &tp->t_outwait); if (error) return (error); } return (0); } /* * Though ttydev_enter() and ttydev_leave() seem to be related, they * don't have to be used together. 
ttydev_enter() is used by the cdev * operations to prevent an actual operation from being processed when * the TTY has been abandoned. ttydev_leave() is used by ttydev_open() * and ttydev_close() to determine whether per-TTY data should be * deallocated. */ static __inline int ttydev_enter(struct tty *tp) { + tty_lock(tp); if (tty_gone(tp) || !tty_opened(tp)) { /* Device is already gone. */ tty_unlock(tp); return (ENXIO); } return (0); } static void ttydev_leave(struct tty *tp) { + tty_lock_assert(tp, MA_OWNED); if (tty_opened(tp) || tp->t_flags & TF_OPENCLOSE) { /* Device is still opened somewhere. */ tty_unlock(tp); return; } tp->t_flags |= TF_OPENCLOSE; /* Stop asynchronous I/O. */ funsetown(&tp->t_sigio); /* Remove console TTY. */ if (constty == tp) constty_clear(); /* Drain any output. */ MPASS((tp->t_flags & TF_STOPPED) == 0); if (!tty_gone(tp)) tty_drain(tp, 1); ttydisc_close(tp); /* Free i/o queues now since they might be large. */ ttyinq_free(&tp->t_inq); tp->t_inlow = 0; ttyoutq_free(&tp->t_outq); tp->t_outlow = 0; knlist_clear(&tp->t_inpoll.si_note, 1); knlist_clear(&tp->t_outpoll.si_note, 1); if (!tty_gone(tp)) ttydevsw_close(tp); tp->t_flags &= ~TF_OPENCLOSE; cv_broadcast(&tp->t_dcdwait); tty_rel_free(tp); } /* * Operations that are exposed through the character device in /dev. */ static int -ttydev_open(struct cdev *dev, int oflags, int devtype, struct thread *td) +ttydev_open(struct cdev *dev, int oflags, int devtype __unused, + struct thread *td) { struct tty *tp; int error; tp = dev->si_drv1; error = 0; tty_lock(tp); if (tty_gone(tp)) { /* Device is already gone. */ tty_unlock(tp); return (ENXIO); } /* * Block when other processes are currently opening or closing * the TTY. */ while (tp->t_flags & TF_OPENCLOSE) { error = tty_wait(tp, &tp->t_dcdwait); if (error != 0) { tty_unlock(tp); return (error); } } tp->t_flags |= TF_OPENCLOSE; /* * Make sure the "tty" and "cua" device cannot be opened at the - * same time. + * same time. The console is a "tty" device. */ if (TTY_CALLOUT(tp, dev)) { - if (tp->t_flags & TF_OPENED_IN) { + if (tp->t_flags & (TF_OPENED_CONS | TF_OPENED_IN)) { error = EBUSY; goto done; } } else { if (tp->t_flags & TF_OPENED_OUT) { error = EBUSY; goto done; } } if (tp->t_flags & TF_EXCLUDE && priv_check(td, PRIV_TTY_EXCLUSIVE)) { error = EBUSY; goto done; } if (!tty_opened(tp)) { /* Set proper termios flags. */ if (TTY_CALLOUT(tp, dev)) tp->t_termios = tp->t_termios_init_out; else tp->t_termios = tp->t_termios_init_in; ttydevsw_param(tp, &tp->t_termios); /* Prevent modem control on callout devices and /dev/console. */ if (TTY_CALLOUT(tp, dev) || dev == dev_console) tp->t_termios.c_cflag |= CLOCAL; ttydevsw_modem(tp, SER_DTR|SER_RTS, 0); error = ttydevsw_open(tp); if (error != 0) goto done; ttydisc_open(tp); tty_watermarks(tp); /* XXXGL: drops lock */ } /* Wait for Carrier Detect. 
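 * When CLOCAL is clear, this open blocks until the driver reports DCD;
 * dial-out callers avoid the wait by using the matching cua device or
 * by opening non-blocking.  A userland sketch (device name
 * illustrative; assumes <fcntl.h>):
 *
 *	int fd = open("/dev/ttyu0", O_RDWR | O_NONBLOCK);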
*/ if ((oflags & O_NONBLOCK) == 0 && (tp->t_termios.c_cflag & CLOCAL) == 0) { while ((ttydevsw_modem(tp, 0, 0) & SER_DCD) == 0) { error = tty_wait(tp, &tp->t_dcdwait); if (error != 0) goto done; } } if (dev == dev_console) tp->t_flags |= TF_OPENED_CONS; else if (TTY_CALLOUT(tp, dev)) tp->t_flags |= TF_OPENED_OUT; else tp->t_flags |= TF_OPENED_IN; + MPASS((tp->t_flags & (TF_OPENED_CONS | TF_OPENED_IN)) == 0 || + (tp->t_flags & TF_OPENED_OUT) == 0); done: tp->t_flags &= ~TF_OPENCLOSE; cv_broadcast(&tp->t_dcdwait); ttydev_leave(tp); return (error); } static int -ttydev_close(struct cdev *dev, int fflag, int devtype, struct thread *td) +ttydev_close(struct cdev *dev, int fflag, int devtype __unused, + struct thread *td __unused) { struct tty *tp = dev->si_drv1; tty_lock(tp); /* * Don't actually close the device if it is being used as the * console. */ - MPASS((tp->t_flags & TF_OPENED) != TF_OPENED); + MPASS((tp->t_flags & (TF_OPENED_CONS | TF_OPENED_IN)) == 0 || + (tp->t_flags & TF_OPENED_OUT) == 0); if (dev == dev_console) tp->t_flags &= ~TF_OPENED_CONS; else tp->t_flags &= ~(TF_OPENED_IN|TF_OPENED_OUT); if (tp->t_flags & TF_OPENED) { tty_unlock(tp); return (0); } /* * This can only be called once. The callin and the callout * devices cannot be opened at the same time. */ tp->t_flags &= ~(TF_EXCLUDE|TF_STOPPED); /* Properly wake up threads that are stuck - revoke(). */ tp->t_revokecnt++; tty_wakeup(tp, FREAD|FWRITE); cv_broadcast(&tp->t_bgwait); cv_broadcast(&tp->t_dcdwait); ttydev_leave(tp); return (0); } static __inline int tty_is_ctty(struct tty *tp, struct proc *p) { + tty_lock_assert(tp, MA_OWNED); return (p->p_session == tp->t_session && p->p_flag & P_CONTROLT); } int tty_wait_background(struct tty *tp, struct thread *td, int sig) { struct proc *p = td->td_proc; struct pgrp *pg; ksiginfo_t ksi; int error; MPASS(sig == SIGTTIN || sig == SIGTTOU); tty_lock_assert(tp, MA_OWNED); for (;;) { PROC_LOCK(p); /* * The process should only sleep, when: * - This terminal is the controling terminal * - Its process group is not the foreground process * group * - The parent process isn't waiting for the child to * exit * - the signal to send to the process isn't masked */ if (!tty_is_ctty(tp, p) || p->p_pgrp == tp->t_pgrp) { /* Allow the action to happen. */ PROC_UNLOCK(p); return (0); } if (SIGISMEMBER(p->p_sigacts->ps_sigignore, sig) || SIGISMEMBER(td->td_sigmask, sig)) { /* Only allow them in write()/ioctl(). */ PROC_UNLOCK(p); return (sig == SIGTTOU ? 0 : EIO); } pg = p->p_pgrp; if (p->p_flag & P_PPWAIT || pg->pg_jobc == 0) { /* Don't allow the action to happen. */ PROC_UNLOCK(p); return (EIO); } PROC_UNLOCK(p); /* * Send the signal and sleep until we're the new * foreground process group. */ if (sig != 0) { ksiginfo_init(&ksi); ksi.ksi_code = SI_KERNEL; ksi.ksi_signo = sig; sig = 0; } PGRP_LOCK(pg); pgsignal(pg, ksi.ksi_signo, 1, &ksi); PGRP_UNLOCK(pg); error = tty_wait(tp, &tp->t_bgwait); if (error) return (error); } } static int ttydev_read(struct cdev *dev, struct uio *uio, int ioflag) { struct tty *tp = dev->si_drv1; int error; error = ttydev_enter(tp); if (error) goto done; error = ttydisc_read(tp, uio, ioflag); tty_unlock(tp); /* * The read() call should not throw an error when the device is * being destroyed. Silently convert it to an EOF. 
*/ done: if (error == ENXIO) error = 0; return (error); } static int ttydev_write(struct cdev *dev, struct uio *uio, int ioflag) { struct tty *tp = dev->si_drv1; int error; error = ttydev_enter(tp); if (error) return (error); if (tp->t_termios.c_lflag & TOSTOP) { error = tty_wait_background(tp, curthread, SIGTTOU); if (error) goto done; } if (ioflag & IO_NDELAY && tp->t_flags & TF_BUSY_OUT) { /* Allow non-blocking writes to bypass serialization. */ error = ttydisc_write(tp, uio, ioflag); } else { /* Serialize write() calls. */ while (tp->t_flags & TF_BUSY_OUT) { error = tty_wait(tp, &tp->t_outserwait); if (error) goto done; } tp->t_flags |= TF_BUSY_OUT; error = ttydisc_write(tp, uio, ioflag); tp->t_flags &= ~TF_BUSY_OUT; cv_signal(&tp->t_outserwait); } done: tty_unlock(tp); return (error); } static int ttydev_ioctl(struct cdev *dev, u_long cmd, caddr_t data, int fflag, struct thread *td) { struct tty *tp = dev->si_drv1; int error; error = ttydev_enter(tp); if (error) return (error); switch (cmd) { case TIOCCBRK: case TIOCCONS: case TIOCDRAIN: case TIOCEXCL: case TIOCFLUSH: case TIOCNXCL: case TIOCSBRK: case TIOCSCTTY: case TIOCSETA: case TIOCSETAF: case TIOCSETAW: case TIOCSPGRP: case TIOCSTART: case TIOCSTAT: case TIOCSTI: case TIOCSTOP: case TIOCSWINSZ: #if 0 case TIOCSDRAINWAIT: case TIOCSETD: #endif #ifdef COMPAT_43TTY case TIOCLBIC: case TIOCLBIS: case TIOCLSET: case TIOCSETC: case OTIOCSETD: case TIOCSETN: case TIOCSETP: case TIOCSLTC: #endif /* COMPAT_43TTY */ /* * If the ioctl() causes the TTY to be modified, let it * wait in the background. */ error = tty_wait_background(tp, curthread, SIGTTOU); if (error) goto done; } if (cmd == TIOCSETA || cmd == TIOCSETAW || cmd == TIOCSETAF) { struct termios *old = &tp->t_termios; struct termios *new = (struct termios *)data; struct termios *lock = TTY_CALLOUT(tp, dev) ? &tp->t_termios_lock_out : &tp->t_termios_lock_in; int cc; /* * Lock state devices. Just overwrite the values of the * commands that are currently in use. */ new->c_iflag = (old->c_iflag & lock->c_iflag) | (new->c_iflag & ~lock->c_iflag); new->c_oflag = (old->c_oflag & lock->c_oflag) | (new->c_oflag & ~lock->c_oflag); new->c_cflag = (old->c_cflag & lock->c_cflag) | (new->c_cflag & ~lock->c_cflag); new->c_lflag = (old->c_lflag & lock->c_lflag) | (new->c_lflag & ~lock->c_lflag); for (cc = 0; cc < NCCS; ++cc) if (lock->c_cc[cc]) new->c_cc[cc] = old->c_cc[cc]; if (lock->c_ispeed) new->c_ispeed = old->c_ispeed; if (lock->c_ospeed) new->c_ospeed = old->c_ospeed; } error = tty_ioctl(tp, cmd, data, fflag, td); done: tty_unlock(tp); return (error); } static int ttydev_poll(struct cdev *dev, int events, struct thread *td) { struct tty *tp = dev->si_drv1; int error, revents = 0; error = ttydev_enter(tp); if (error) return ((events & (POLLIN|POLLRDNORM)) | POLLHUP); if (events & (POLLIN|POLLRDNORM)) { /* See if we can read something. */ if (ttydisc_read_poll(tp) > 0) revents |= events & (POLLIN|POLLRDNORM); } if (tp->t_flags & TF_ZOMBIE) { /* Hangup flag on zombie state. */ revents |= POLLHUP; } else if (events & (POLLOUT|POLLWRNORM)) { /* See if we can write something. 
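 * From userland this surfaces through poll(2) and select(2); a sketch
 * of a caller waiting for writability (ttyfd illustrative):
 *
 *	struct pollfd pfd = { .fd = ttyfd, .events = POLLOUT };
 *	(void)poll(&pfd, 1, INFTIM);
 *
 * with POLLHUP reported once the terminal has turned zombie, as above.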
*/ if (ttydisc_write_poll(tp) > 0) revents |= events & (POLLOUT|POLLWRNORM); } if (revents == 0) { if (events & (POLLIN|POLLRDNORM)) selrecord(td, &tp->t_inpoll); if (events & (POLLOUT|POLLWRNORM)) selrecord(td, &tp->t_outpoll); } tty_unlock(tp); return (revents); } static int ttydev_mmap(struct cdev *dev, vm_ooffset_t offset, vm_paddr_t *paddr, int nprot, vm_memattr_t *memattr) { struct tty *tp = dev->si_drv1; int error; /* Handle mmap() through the driver. */ error = ttydev_enter(tp); if (error) return (-1); error = ttydevsw_mmap(tp, offset, paddr, nprot, memattr); tty_unlock(tp); return (error); } /* * kqueue support. */ static void tty_kqops_read_detach(struct knote *kn) { struct tty *tp = kn->kn_hook; knlist_remove(&tp->t_inpoll.si_note, kn, 0); } static int -tty_kqops_read_event(struct knote *kn, long hint) +tty_kqops_read_event(struct knote *kn, long hint __unused) { struct tty *tp = kn->kn_hook; tty_lock_assert(tp, MA_OWNED); if (tty_gone(tp) || tp->t_flags & TF_ZOMBIE) { kn->kn_flags |= EV_EOF; return (1); } else { kn->kn_data = ttydisc_read_poll(tp); return (kn->kn_data > 0); } } static void tty_kqops_write_detach(struct knote *kn) { struct tty *tp = kn->kn_hook; knlist_remove(&tp->t_outpoll.si_note, kn, 0); } static int -tty_kqops_write_event(struct knote *kn, long hint) +tty_kqops_write_event(struct knote *kn, long hint __unused) { struct tty *tp = kn->kn_hook; tty_lock_assert(tp, MA_OWNED); if (tty_gone(tp)) { kn->kn_flags |= EV_EOF; return (1); } else { kn->kn_data = ttydisc_write_poll(tp); return (kn->kn_data > 0); } } static struct filterops tty_kqops_read = { .f_isfd = 1, .f_detach = tty_kqops_read_detach, .f_event = tty_kqops_read_event, }; + static struct filterops tty_kqops_write = { .f_isfd = 1, .f_detach = tty_kqops_write_detach, .f_event = tty_kqops_write_event, }; static int ttydev_kqfilter(struct cdev *dev, struct knote *kn) { struct tty *tp = dev->si_drv1; int error; error = ttydev_enter(tp); if (error) return (error); switch (kn->kn_filter) { case EVFILT_READ: kn->kn_hook = tp; kn->kn_fop = &tty_kqops_read; knlist_add(&tp->t_inpoll.si_note, kn, 1); break; case EVFILT_WRITE: kn->kn_hook = tp; kn->kn_fop = &tty_kqops_write; knlist_add(&tp->t_outpoll.si_note, kn, 1); break; default: error = EINVAL; break; } tty_unlock(tp); return (error); } static struct cdevsw ttydev_cdevsw = { .d_version = D_VERSION, .d_open = ttydev_open, .d_close = ttydev_close, .d_read = ttydev_read, .d_write = ttydev_write, .d_ioctl = ttydev_ioctl, .d_kqfilter = ttydev_kqfilter, .d_poll = ttydev_poll, .d_mmap = ttydev_mmap, .d_name = "ttydev", .d_flags = D_TTY, }; /* * Init/lock-state devices */ static int -ttyil_open(struct cdev *dev, int oflags, int devtype, struct thread *td) +ttyil_open(struct cdev *dev, int oflags __unused, int devtype __unused, + struct thread *td) { struct tty *tp; int error; tp = dev->si_drv1; error = 0; tty_lock(tp); if (tty_gone(tp)) error = ENODEV; tty_unlock(tp); return (error); } static int -ttyil_close(struct cdev *dev, int flag, int mode, struct thread *td) +ttyil_close(struct cdev *dev __unused, int flag __unused, int mode __unused, + struct thread *td __unused) { + return (0); } static int -ttyil_rdwr(struct cdev *dev, struct uio *uio, int ioflag) +ttyil_rdwr(struct cdev *dev __unused, struct uio *uio __unused, + int ioflag __unused) { + return (ENODEV); } static int ttyil_ioctl(struct cdev *dev, u_long cmd, caddr_t data, int fflag, struct thread *td) { struct tty *tp = dev->si_drv1; int error; tty_lock(tp); if (tty_gone(tp)) { error = ENODEV; goto done; } 
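	/*
	 * The init/lock-state devices carry only termios state: the .init
	 * node seeds the termios applied at first open, and the .lock node
	 * masks the fields userland may change (see the TIOCSETA merge in
	 * ttydev_ioctl() above).  A hypothetical userland sketch setting
	 * raw 115200 defaults (device name illustrative; assumes <fcntl.h>
	 * and <termios.h>):
	 *
	 *	struct termios t;
	 *	int fd = open("/dev/ttyu0.init", O_RDWR);
	 *
	 *	tcgetattr(fd, &t);
	 *	cfmakeraw(&t);
	 *	cfsetspeed(&t, B115200);
	 *	tcsetattr(fd, TCSANOW, &t);
	 */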
error = ttydevsw_cioctl(tp, dev2unit(dev), cmd, data, td); if (error != ENOIOCTL) goto done; error = 0; switch (cmd) { case TIOCGETA: /* Obtain terminal flags through tcgetattr(). */ *(struct termios*)data = *(struct termios*)dev->si_drv2; break; case TIOCSETA: /* Set terminal flags through tcsetattr(). */ error = priv_check(td, PRIV_TTY_SETA); if (error) break; *(struct termios*)dev->si_drv2 = *(struct termios*)data; break; case TIOCGETD: *(int *)data = TTYDISC; break; case TIOCGWINSZ: bzero(data, sizeof(struct winsize)); break; default: error = ENOTTY; } done: tty_unlock(tp); return (error); } static struct cdevsw ttyil_cdevsw = { .d_version = D_VERSION, .d_open = ttyil_open, .d_close = ttyil_close, .d_read = ttyil_rdwr, .d_write = ttyil_rdwr, .d_ioctl = ttyil_ioctl, .d_name = "ttyil", .d_flags = D_TTY, }; static void tty_init_termios(struct tty *tp) { struct termios *t = &tp->t_termios_init_in; t->c_cflag = TTYDEF_CFLAG; t->c_iflag = TTYDEF_IFLAG; t->c_lflag = TTYDEF_LFLAG; t->c_oflag = TTYDEF_OFLAG; t->c_ispeed = TTYDEF_SPEED; t->c_ospeed = TTYDEF_SPEED; memcpy(&t->c_cc, ttydefchars, sizeof ttydefchars); tp->t_termios_init_out = *t; } void tty_init_console(struct tty *tp, speed_t s) { struct termios *ti = &tp->t_termios_init_in; struct termios *to = &tp->t_termios_init_out; if (s != 0) { ti->c_ispeed = ti->c_ospeed = s; to->c_ispeed = to->c_ospeed = s; } ti->c_cflag |= CLOCAL; to->c_cflag |= CLOCAL; } /* * Standard device routine implementations, mostly meant for * pseudo-terminal device drivers. When a driver creates a new terminal * device class, missing routines are patched. */ static int -ttydevsw_defopen(struct tty *tp) +ttydevsw_defopen(struct tty *tp __unused) { return (0); } static void -ttydevsw_defclose(struct tty *tp) +ttydevsw_defclose(struct tty *tp __unused) { + } static void -ttydevsw_defoutwakeup(struct tty *tp) +ttydevsw_defoutwakeup(struct tty *tp __unused) { panic("Terminal device has output, while not implemented"); } static void -ttydevsw_definwakeup(struct tty *tp) +ttydevsw_definwakeup(struct tty *tp __unused) { + } static int -ttydevsw_defioctl(struct tty *tp, u_long cmd, caddr_t data, struct thread *td) +ttydevsw_defioctl(struct tty *tp __unused, u_long cmd __unused, + caddr_t data __unused, struct thread *td __unused) { return (ENOIOCTL); } static int -ttydevsw_defcioctl(struct tty *tp, int unit, u_long cmd, caddr_t data, struct thread *td) +ttydevsw_defcioctl(struct tty *tp __unused, int unit __unused, + u_long cmd __unused, caddr_t data __unused, struct thread *td __unused) { return (ENOIOCTL); } static int -ttydevsw_defparam(struct tty *tp, struct termios *t) +ttydevsw_defparam(struct tty *tp __unused, struct termios *t) { /* * Allow the baud rate to be adjusted for pseudo-devices, but at * least restrict it to 115200 to prevent excessive buffer * usage. Also disallow 0, to prevent foot shooting. */ if (t->c_ispeed < B50) t->c_ispeed = B50; else if (t->c_ispeed > B115200) t->c_ispeed = B115200; if (t->c_ospeed < B50) t->c_ospeed = B50; else if (t->c_ospeed > B115200) t->c_ospeed = B115200; t->c_cflag |= CREAD; return (0); } static int -ttydevsw_defmodem(struct tty *tp, int sigon, int sigoff) +ttydevsw_defmodem(struct tty *tp __unused, int sigon __unused, + int sigoff __unused) { /* Simulate a carrier to make the TTY layer happy. 
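 * Pseudo-terminals thus always appear to have carrier; a driver that
 * tracks real modem lines supplies its own tsw_modem instead.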
*/ return (SER_DCD); } static int -ttydevsw_defmmap(struct tty *tp, vm_ooffset_t offset, vm_paddr_t *paddr, - int nprot, vm_memattr_t *memattr) +ttydevsw_defmmap(struct tty *tp __unused, vm_ooffset_t offset __unused, + vm_paddr_t *paddr __unused, int nprot __unused, + vm_memattr_t *memattr __unused) { return (-1); } static void -ttydevsw_defpktnotify(struct tty *tp, char event) +ttydevsw_defpktnotify(struct tty *tp __unused, char event __unused) { + } static void -ttydevsw_deffree(void *softc) +ttydevsw_deffree(void *softc __unused) { panic("Terminal device freed without a free-handler"); } static bool ttydevsw_defbusy(struct tty *tp __unused) { return (FALSE); } /* * TTY allocation and deallocation. TTY devices can be deallocated when * the driver doesn't use it anymore, when the TTY isn't a session's * controlling TTY and when the device node isn't opened through devfs. */ struct tty * tty_alloc(struct ttydevsw *tsw, void *sc) { return (tty_alloc_mutex(tsw, sc, NULL)); } struct tty * tty_alloc_mutex(struct ttydevsw *tsw, void *sc, struct mtx *mutex) { struct tty *tp; /* Make sure the driver defines all routines. */ #define PATCH_FUNC(x) do { \ if (tsw->tsw_ ## x == NULL) \ tsw->tsw_ ## x = ttydevsw_def ## x; \ } while (0) PATCH_FUNC(open); PATCH_FUNC(close); PATCH_FUNC(outwakeup); PATCH_FUNC(inwakeup); PATCH_FUNC(ioctl); PATCH_FUNC(cioctl); PATCH_FUNC(param); PATCH_FUNC(modem); PATCH_FUNC(mmap); PATCH_FUNC(pktnotify); PATCH_FUNC(free); PATCH_FUNC(busy); #undef PATCH_FUNC tp = malloc(sizeof(struct tty), M_TTY, M_WAITOK|M_ZERO); tp->t_devsw = tsw; tp->t_devswsoftc = sc; tp->t_flags = tsw->tsw_flags; tty_init_termios(tp); cv_init(&tp->t_inwait, "ttyin"); cv_init(&tp->t_outwait, "ttyout"); cv_init(&tp->t_outserwait, "ttyosr"); cv_init(&tp->t_bgwait, "ttybg"); cv_init(&tp->t_dcdwait, "ttydcd"); /* Allow drivers to use a custom mutex to lock the TTY. */ if (mutex != NULL) { tp->t_mtx = mutex; } else { tp->t_mtx = &tp->t_mtxobj; mtx_init(&tp->t_mtxobj, "ttymtx", NULL, MTX_DEF); } knlist_init_mtx(&tp->t_inpoll.si_note, tp->t_mtx); knlist_init_mtx(&tp->t_outpoll.si_note, tp->t_mtx); return (tp); } static void tty_dealloc(void *arg) { struct tty *tp = arg; /* * ttyydev_leave() usually frees the i/o queues earlier, but it is * not always called between queue allocation and here. The queues * may be allocated by ioctls on a pty control device without the * corresponding pty slave device ever being open, or after it is * closed. */ ttyinq_free(&tp->t_inq); ttyoutq_free(&tp->t_outq); seldrain(&tp->t_inpoll); seldrain(&tp->t_outpoll); knlist_destroy(&tp->t_inpoll.si_note); knlist_destroy(&tp->t_outpoll.si_note); cv_destroy(&tp->t_inwait); cv_destroy(&tp->t_outwait); cv_destroy(&tp->t_bgwait); cv_destroy(&tp->t_dcdwait); cv_destroy(&tp->t_outserwait); if (tp->t_mtx == &tp->t_mtxobj) mtx_destroy(&tp->t_mtxobj); ttydevsw_free(tp); free(tp, M_TTY); } static void tty_rel_free(struct tty *tp) { struct cdev *dev; tty_lock_assert(tp, MA_OWNED); #define TF_ACTIVITY (TF_GONE|TF_OPENED|TF_HOOK|TF_OPENCLOSE) if (tp->t_sessioncnt != 0 || (tp->t_flags & TF_ACTIVITY) != TF_GONE) { /* TTY is still in use. */ tty_unlock(tp); return; } /* TTY can be deallocated. 
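 * For reference, the matching driver-side lifecycle is roughly (names
 * illustrative):
 *
 *	tp = tty_alloc(&foo_ttydevsw, sc);
 *	tty_makedev(tp, NULL, "foo%u", unit);
 *	...
 *	tty_lock(tp);
 *	tty_rel_gone(tp);	(drops the lock via tty_rel_free())
 *
 * after which the cdev is destroyed here once the session references,
 * open bits and hooks have all drained.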
*/ dev = tp->t_dev; tp->t_dev = NULL; tty_unlock(tp); if (dev != NULL) { sx_xlock(&tty_list_sx); TAILQ_REMOVE(&tty_list, tp, t_list); tty_list_count--; sx_xunlock(&tty_list_sx); destroy_dev_sched_cb(dev, tty_dealloc, tp); } } void tty_rel_pgrp(struct tty *tp, struct pgrp *pg) { + MPASS(tp->t_sessioncnt > 0); tty_lock_assert(tp, MA_OWNED); if (tp->t_pgrp == pg) tp->t_pgrp = NULL; tty_unlock(tp); } void tty_rel_sess(struct tty *tp, struct session *sess) { + MPASS(tp->t_sessioncnt > 0); /* Current session has left. */ if (tp->t_session == sess) { tp->t_session = NULL; MPASS(tp->t_pgrp == NULL); } tp->t_sessioncnt--; tty_rel_free(tp); } void tty_rel_gone(struct tty *tp) { + MPASS(!tty_gone(tp)); /* Simulate carrier removal. */ ttydisc_modem(tp, 0); /* Wake up all blocked threads. */ tty_wakeup(tp, FREAD|FWRITE); cv_broadcast(&tp->t_bgwait); cv_broadcast(&tp->t_dcdwait); tp->t_flags |= TF_GONE; tty_rel_free(tp); } /* * Exposing information about current TTY's through sysctl */ static void tty_to_xtty(struct tty *tp, struct xtty *xt) { + tty_lock_assert(tp, MA_OWNED); xt->xt_size = sizeof(struct xtty); xt->xt_insize = ttyinq_getsize(&tp->t_inq); xt->xt_incc = ttyinq_bytescanonicalized(&tp->t_inq); xt->xt_inlc = ttyinq_bytesline(&tp->t_inq); xt->xt_inlow = tp->t_inlow; xt->xt_outsize = ttyoutq_getsize(&tp->t_outq); xt->xt_outcc = ttyoutq_bytesused(&tp->t_outq); xt->xt_outlow = tp->t_outlow; xt->xt_column = tp->t_column; xt->xt_pgid = tp->t_pgrp ? tp->t_pgrp->pg_id : 0; xt->xt_sid = tp->t_session ? tp->t_session->s_sid : 0; xt->xt_flags = tp->t_flags; xt->xt_dev = tp->t_dev ? dev2udev(tp->t_dev) : NODEV; } static int sysctl_kern_ttys(SYSCTL_HANDLER_ARGS) { unsigned long lsize; struct xtty *xtlist, *xt; struct tty *tp; int error; sx_slock(&tty_list_sx); lsize = tty_list_count * sizeof(struct xtty); if (lsize == 0) { sx_sunlock(&tty_list_sx); return (0); } xtlist = xt = malloc(lsize, M_TTY, M_WAITOK); TAILQ_FOREACH(tp, &tty_list, t_list) { tty_lock(tp); tty_to_xtty(tp, xt); tty_unlock(tp); xt++; } sx_sunlock(&tty_list_sx); error = SYSCTL_OUT(req, xtlist, lsize); free(xtlist, M_TTY); return (error); } SYSCTL_PROC(_kern, OID_AUTO, ttys, CTLTYPE_OPAQUE|CTLFLAG_RD|CTLFLAG_MPSAFE, 0, 0, sysctl_kern_ttys, "S,xtty", "List of TTYs"); /* * Device node creation. Device has been set up, now we can expose it to * the user. */ int tty_makedevf(struct tty *tp, struct ucred *cred, int flags, const char *fmt, ...) { va_list ap; struct make_dev_args args; struct cdev *dev, *init, *lock, *cua, *cinit, *clock; const char *prefix = "tty"; char name[SPECNAMELEN - 3]; /* for "tty" and "cua". */ uid_t uid; gid_t gid; mode_t mode; int error; /* Remove "tty" prefix from devices like PTY's. */ if (tp->t_flags & TF_NOPREFIX) prefix = ""; va_start(ap, fmt); vsnrprintf(name, sizeof name, 32, fmt, ap); va_end(ap); if (cred == NULL) { /* System device. */ uid = UID_ROOT; gid = GID_WHEEL; mode = S_IRUSR|S_IWUSR; } else { /* User device. */ uid = cred->cr_ruid; gid = GID_TTY; mode = S_IRUSR|S_IWUSR|S_IWGRP; } flags = flags & TTYMK_CLONING ? MAKEDEV_REF : 0; flags |= MAKEDEV_CHECKNAME; /* Master call-in device. */ make_dev_args_init(&args); args.mda_flags = flags; args.mda_devsw = &ttydev_cdevsw; args.mda_cr = cred; args.mda_uid = uid; args.mda_gid = gid; args.mda_mode = mode; args.mda_si_drv1 = tp; error = make_dev_s(&args, &dev, "%s%s", prefix, name); if (error != 0) return (error); tp->t_dev = dev; init = lock = cua = cinit = clock = NULL; /* Slave call-in devices. 
*/ if (tp->t_flags & TF_INITLOCK) { args.mda_devsw = &ttyil_cdevsw; args.mda_unit = TTYUNIT_INIT; args.mda_si_drv1 = tp; args.mda_si_drv2 = &tp->t_termios_init_in; error = make_dev_s(&args, &init, "%s%s.init", prefix, name); if (error != 0) goto fail; dev_depends(dev, init); args.mda_unit = TTYUNIT_LOCK; args.mda_si_drv2 = &tp->t_termios_lock_in; error = make_dev_s(&args, &lock, "%s%s.lock", prefix, name); if (error != 0) goto fail; dev_depends(dev, lock); } /* Call-out devices. */ if (tp->t_flags & TF_CALLOUT) { make_dev_args_init(&args); args.mda_flags = flags; args.mda_devsw = &ttydev_cdevsw; args.mda_cr = cred; args.mda_uid = UID_UUCP; args.mda_gid = GID_DIALER; args.mda_mode = 0660; args.mda_unit = TTYUNIT_CALLOUT; args.mda_si_drv1 = tp; error = make_dev_s(&args, &cua, "cua%s", name); if (error != 0) goto fail; dev_depends(dev, cua); /* Slave call-out devices. */ if (tp->t_flags & TF_INITLOCK) { args.mda_devsw = &ttyil_cdevsw; args.mda_unit = TTYUNIT_CALLOUT | TTYUNIT_INIT; args.mda_si_drv2 = &tp->t_termios_init_out; error = make_dev_s(&args, &cinit, "cua%s.init", name); if (error != 0) goto fail; dev_depends(dev, cinit); args.mda_unit = TTYUNIT_CALLOUT | TTYUNIT_LOCK; args.mda_si_drv2 = &tp->t_termios_lock_out; error = make_dev_s(&args, &clock, "cua%s.lock", name); if (error != 0) goto fail; dev_depends(dev, clock); } } sx_xlock(&tty_list_sx); TAILQ_INSERT_TAIL(&tty_list, tp, t_list); tty_list_count++; sx_xunlock(&tty_list_sx); return (0); fail: destroy_dev(dev); if (init) destroy_dev(init); if (lock) destroy_dev(lock); if (cinit) destroy_dev(cinit); if (clock) destroy_dev(clock); return (error); } /* * Signalling processes. */ void tty_signal_sessleader(struct tty *tp, int sig) { struct proc *p; tty_lock_assert(tp, MA_OWNED); MPASS(sig >= 1 && sig < NSIG); /* Make signals start output again. */ tp->t_flags &= ~TF_STOPPED; if (tp->t_session != NULL && tp->t_session->s_leader != NULL) { p = tp->t_session->s_leader; PROC_LOCK(p); kern_psignal(p, sig); PROC_UNLOCK(p); } } void tty_signal_pgrp(struct tty *tp, int sig) { ksiginfo_t ksi; tty_lock_assert(tp, MA_OWNED); MPASS(sig >= 1 && sig < NSIG); /* Make signals start output again. */ tp->t_flags &= ~TF_STOPPED; if (sig == SIGINFO && !(tp->t_termios.c_lflag & NOKERNINFO)) tty_info(tp); if (tp->t_pgrp != NULL) { ksiginfo_init(&ksi); ksi.ksi_signo = sig; ksi.ksi_code = SI_KERNEL; PGRP_LOCK(tp->t_pgrp); pgsignal(tp->t_pgrp, sig, 1, &ksi); PGRP_UNLOCK(tp->t_pgrp); } } void tty_wakeup(struct tty *tp, int flags) { + if (tp->t_flags & TF_ASYNC && tp->t_sigio != NULL) pgsigio(&tp->t_sigio, SIGIO, (tp->t_session != NULL)); if (flags & FWRITE) { cv_broadcast(&tp->t_outwait); selwakeup(&tp->t_outpoll); KNOTE_LOCKED(&tp->t_outpoll.si_note, 0); } if (flags & FREAD) { cv_broadcast(&tp->t_inwait); selwakeup(&tp->t_inpoll); KNOTE_LOCKED(&tp->t_inpoll.si_note, 0); } } int tty_wait(struct tty *tp, struct cv *cv) { int error; int revokecnt = tp->t_revokecnt; tty_lock_assert(tp, MA_OWNED|MA_NOTRECURSED); MPASS(!tty_gone(tp)); error = cv_wait_sig(cv, tp->t_mtx); /* Bail out when the device slipped away. */ if (tty_gone(tp)) return (ENXIO); /* Restart the system call when we may have been revoked. */ if (tp->t_revokecnt != revokecnt) return (ERESTART); return (error); } int tty_timedwait(struct tty *tp, struct cv *cv, int hz) { int error; int revokecnt = tp->t_revokecnt; tty_lock_assert(tp, MA_OWNED|MA_NOTRECURSED); MPASS(!tty_gone(tp)); error = cv_timedwait_sig(cv, tp->t_mtx, hz); /* Bail out when the device slipped away. 
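 * (as in tty_wait() above: ENXIO once the device is gone, ERESTART
 * when a revoke(2) raced the sleep so that the system call is
 * restarted)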
*/ if (tty_gone(tp)) return (ENXIO); /* Restart the system call when we may have been revoked. */ if (tp->t_revokecnt != revokecnt) return (ERESTART); return (error); } void tty_flush(struct tty *tp, int flags) { + if (flags & FWRITE) { tp->t_flags &= ~TF_HIWAT_OUT; ttyoutq_flush(&tp->t_outq); tty_wakeup(tp, FWRITE); ttydevsw_pktnotify(tp, TIOCPKT_FLUSHWRITE); } if (flags & FREAD) { tty_hiwat_in_unblock(tp); ttyinq_flush(&tp->t_inq); ttydevsw_inwakeup(tp); ttydevsw_pktnotify(tp, TIOCPKT_FLUSHREAD); } } void tty_set_winsize(struct tty *tp, const struct winsize *wsz) { if (memcmp(&tp->t_winsize, wsz, sizeof(*wsz)) == 0) return; tp->t_winsize = *wsz; tty_signal_pgrp(tp, SIGWINCH); } static int tty_generic_ioctl(struct tty *tp, u_long cmd, void *data, int fflag, struct thread *td) { int error; switch (cmd) { /* * Modem commands. * The SER_* and TIOCM_* flags are the same, but one bit * shifted. I don't know why. */ case TIOCSDTR: ttydevsw_modem(tp, SER_DTR, 0); return (0); case TIOCCDTR: ttydevsw_modem(tp, 0, SER_DTR); return (0); case TIOCMSET: { int bits = *(int *)data; ttydevsw_modem(tp, (bits & (TIOCM_DTR | TIOCM_RTS)) >> 1, ((~bits) & (TIOCM_DTR | TIOCM_RTS)) >> 1); return (0); } case TIOCMBIS: { int bits = *(int *)data; ttydevsw_modem(tp, (bits & (TIOCM_DTR | TIOCM_RTS)) >> 1, 0); return (0); } case TIOCMBIC: { int bits = *(int *)data; ttydevsw_modem(tp, 0, (bits & (TIOCM_DTR | TIOCM_RTS)) >> 1); return (0); } case TIOCMGET: *(int *)data = TIOCM_LE + (ttydevsw_modem(tp, 0, 0) << 1); return (0); case FIOASYNC: if (*(int *)data) tp->t_flags |= TF_ASYNC; else tp->t_flags &= ~TF_ASYNC; return (0); case FIONBIO: /* This device supports non-blocking operation. */ return (0); case FIONREAD: *(int *)data = ttyinq_bytescanonicalized(&tp->t_inq); return (0); case FIONWRITE: case TIOCOUTQ: *(int *)data = ttyoutq_bytesused(&tp->t_outq); return (0); case FIOSETOWN: if (tp->t_session != NULL && !tty_is_ctty(tp, td->td_proc)) /* Not allowed to set ownership. */ return (ENOTTY); /* Temporarily unlock the TTY to set ownership. */ tty_unlock(tp); error = fsetown(*(int *)data, &tp->t_sigio); tty_lock(tp); return (error); case FIOGETOWN: if (tp->t_session != NULL && !tty_is_ctty(tp, td->td_proc)) /* Not allowed to set ownership. */ return (ENOTTY); /* Get ownership. */ *(int *)data = fgetown(&tp->t_sigio); return (0); case TIOCGETA: /* Obtain terminal flags through tcgetattr(). */ *(struct termios*)data = tp->t_termios; return (0); case TIOCSETA: case TIOCSETAW: case TIOCSETAF: { struct termios *t = data; /* * Who makes up these funny rules? According to POSIX, * input baud rate is set equal to the output baud rate * when zero. */ if (t->c_ispeed == 0) t->c_ispeed = t->c_ospeed; /* Discard any unsupported bits. */ t->c_iflag &= TTYSUP_IFLAG; t->c_oflag &= TTYSUP_OFLAG; t->c_lflag &= TTYSUP_LFLAG; t->c_cflag &= TTYSUP_CFLAG; /* Set terminal flags through tcsetattr(). */ if (cmd == TIOCSETAW || cmd == TIOCSETAF) { error = tty_drain(tp, 0); if (error) return (error); if (cmd == TIOCSETAF) tty_flush(tp, FREAD); } /* * Only call param() when the flags really change. */ if ((t->c_cflag & CIGNORE) == 0 && (tp->t_termios.c_cflag != t->c_cflag || ((tp->t_termios.c_iflag ^ t->c_iflag) & (IXON|IXOFF|IXANY)) || tp->t_termios.c_ispeed != t->c_ispeed || tp->t_termios.c_ospeed != t->c_ospeed)) { error = ttydevsw_param(tp, t); if (error) return (error); /* XXX: CLOCAL? 
*/ tp->t_termios.c_cflag = t->c_cflag & ~CIGNORE; tp->t_termios.c_ispeed = t->c_ispeed; tp->t_termios.c_ospeed = t->c_ospeed; /* Baud rate has changed - update watermarks. */ tty_watermarks(tp); } /* Copy new non-device driver parameters. */ tp->t_termios.c_iflag = t->c_iflag; tp->t_termios.c_oflag = t->c_oflag; tp->t_termios.c_lflag = t->c_lflag; memcpy(&tp->t_termios.c_cc, t->c_cc, sizeof t->c_cc); ttydisc_optimize(tp); if ((t->c_lflag & ICANON) == 0) { /* * When in non-canonical mode, wake up all * readers. Canonicalize any partial input. VMIN * and VTIME could also be adjusted. */ ttyinq_canonicalize(&tp->t_inq); tty_wakeup(tp, FREAD); } /* * For packet mode: notify the PTY consumer that VSTOP * and VSTART may have been changed. */ if (tp->t_termios.c_iflag & IXON && tp->t_termios.c_cc[VSTOP] == CTRL('S') && tp->t_termios.c_cc[VSTART] == CTRL('Q')) ttydevsw_pktnotify(tp, TIOCPKT_DOSTOP); else ttydevsw_pktnotify(tp, TIOCPKT_NOSTOP); return (0); } case TIOCGETD: /* For compatibility - we only support TTYDISC. */ *(int *)data = TTYDISC; return (0); case TIOCGPGRP: if (!tty_is_ctty(tp, td->td_proc)) return (ENOTTY); if (tp->t_pgrp != NULL) *(int *)data = tp->t_pgrp->pg_id; else *(int *)data = NO_PID; return (0); case TIOCGSID: if (!tty_is_ctty(tp, td->td_proc)) return (ENOTTY); MPASS(tp->t_session); *(int *)data = tp->t_session->s_sid; return (0); case TIOCSCTTY: { struct proc *p = td->td_proc; /* XXX: This looks awful. */ tty_unlock(tp); sx_xlock(&proctree_lock); tty_lock(tp); if (!SESS_LEADER(p)) { /* Only the session leader may do this. */ sx_xunlock(&proctree_lock); return (EPERM); } if (tp->t_session != NULL && tp->t_session == p->p_session) { /* This is already our controlling TTY. */ sx_xunlock(&proctree_lock); return (0); } if (p->p_session->s_ttyp != NULL || (tp->t_session != NULL && tp->t_session->s_ttyvp != NULL && tp->t_session->s_ttyvp->v_type != VBAD)) { /* * There is already a relation between a TTY and * a session, or the caller is not the session * leader. * * Allow the TTY to be stolen when the vnode is * invalid, but the reference to the TTY is * still active. This allows immediate reuse of * TTYs of which the session leader has been * killed or the TTY revoked. */ sx_xunlock(&proctree_lock); return (EPERM); } /* Connect the session to the TTY. */ tp->t_session = p->p_session; tp->t_session->s_ttyp = tp; tp->t_sessioncnt++; sx_xunlock(&proctree_lock); /* Assign foreground process group. */ tp->t_pgrp = p->p_pgrp; PROC_LOCK(p); p->p_flag |= P_CONTROLT; PROC_UNLOCK(p); return (0); } case TIOCSPGRP: { struct pgrp *pg; /* * XXX: Temporarily unlock the TTY to locate the process * group. This code would be lot nicer if we would ever * decompose proctree_lock. */ tty_unlock(tp); sx_slock(&proctree_lock); pg = pgfind(*(int *)data); if (pg != NULL) PGRP_UNLOCK(pg); if (pg == NULL || pg->pg_session != td->td_proc->p_session) { sx_sunlock(&proctree_lock); tty_lock(tp); return (EPERM); } tty_lock(tp); /* * Determine if this TTY is the controlling TTY after * relocking the TTY. */ if (!tty_is_ctty(tp, td->td_proc)) { sx_sunlock(&proctree_lock); return (ENOTTY); } tp->t_pgrp = pg; sx_sunlock(&proctree_lock); /* Wake up the background process groups. */ cv_broadcast(&tp->t_bgwait); return (0); } case TIOCFLUSH: { int flags = *(int *)data; if (flags == 0) flags = (FREAD|FWRITE); else flags &= (FREAD|FWRITE); tty_flush(tp, flags); return (0); } case TIOCDRAIN: /* Drain TTY output. */ return tty_drain(tp, 0); case TIOCCONS: /* Set terminal as console TTY. 
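 * Console-redirecting programs enable this with a boolean ioctl; a
 * userland sketch (fd illustrative):
 *
 *	int on = 1;
 *
 *	if (ioctl(fd, TIOCCONS, &on) == -1)
 *		err(1, "TIOCCONS");
 *
 * after which kernel console output is routed to this terminal until
 * the flag is cleared or the terminal is closed.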
*/ if (*(int *)data) { error = priv_check(td, PRIV_TTY_CONSOLE); if (error) return (error); /* * XXX: constty should really need to be locked! * XXX: allow disconnected constty's to be stolen! */ if (constty == tp) return (0); if (constty != NULL) return (EBUSY); tty_unlock(tp); constty_set(tp); tty_lock(tp); } else if (constty == tp) { constty_clear(); } return (0); case TIOCGWINSZ: /* Obtain window size. */ *(struct winsize*)data = tp->t_winsize; return (0); case TIOCSWINSZ: /* Set window size. */ tty_set_winsize(tp, data); return (0); case TIOCEXCL: tp->t_flags |= TF_EXCLUDE; return (0); case TIOCNXCL: tp->t_flags &= ~TF_EXCLUDE; return (0); case TIOCSTOP: tp->t_flags |= TF_STOPPED; ttydevsw_pktnotify(tp, TIOCPKT_STOP); return (0); case TIOCSTART: tp->t_flags &= ~TF_STOPPED; ttydevsw_outwakeup(tp); ttydevsw_pktnotify(tp, TIOCPKT_START); return (0); case TIOCSTAT: tty_info(tp); return (0); case TIOCSTI: if ((fflag & FREAD) == 0 && priv_check(td, PRIV_TTY_STI)) return (EPERM); if (!tty_is_ctty(tp, td->td_proc) && priv_check(td, PRIV_TTY_STI)) return (EACCES); ttydisc_rint(tp, *(char *)data, 0); ttydisc_rint_done(tp); return (0); } #ifdef COMPAT_43TTY return tty_ioctl_compat(tp, cmd, data, fflag, td); #else /* !COMPAT_43TTY */ return (ENOIOCTL); #endif /* COMPAT_43TTY */ } int tty_ioctl(struct tty *tp, u_long cmd, void *data, int fflag, struct thread *td) { int error; tty_lock_assert(tp, MA_OWNED); if (tty_gone(tp)) return (ENXIO); error = ttydevsw_ioctl(tp, cmd, data, td); if (error == ENOIOCTL) error = tty_generic_ioctl(tp, cmd, data, fflag, td); return (error); } dev_t tty_udev(struct tty *tp) { + if (tp->t_dev) - return dev2udev(tp->t_dev); + return (dev2udev(tp->t_dev)); else - return NODEV; + return (NODEV); } int tty_checkoutq(struct tty *tp) { /* 256 bytes should be enough to print a log message. */ return (ttyoutq_bytesleft(&tp->t_outq) >= 256); } void tty_hiwat_in_block(struct tty *tp) { if ((tp->t_flags & TF_HIWAT_IN) == 0 && tp->t_termios.c_iflag & IXOFF && tp->t_termios.c_cc[VSTOP] != _POSIX_VDISABLE) { /* * Input flow control. Only enter the high watermark when we * can successfully store the VSTOP character. */ if (ttyoutq_write_nofrag(&tp->t_outq, &tp->t_termios.c_cc[VSTOP], 1) == 0) tp->t_flags |= TF_HIWAT_IN; } else { /* No input flow control. */ tp->t_flags |= TF_HIWAT_IN; } } void tty_hiwat_in_unblock(struct tty *tp) { if (tp->t_flags & TF_HIWAT_IN && tp->t_termios.c_iflag & IXOFF && tp->t_termios.c_cc[VSTART] != _POSIX_VDISABLE) { /* * Input flow control. Only leave the high watermark when we * can successfully store the VSTART character. */ if (ttyoutq_write_nofrag(&tp->t_outq, &tp->t_termios.c_cc[VSTART], 1) == 0) tp->t_flags &= ~TF_HIWAT_IN; } else { /* No input flow control. */ tp->t_flags &= ~TF_HIWAT_IN; } if (!tty_gone(tp)) ttydevsw_inwakeup(tp); } /* * TTY hooks interface. */ static int ttyhook_defrint(struct tty *tp, char c, int flags) { if (ttyhook_rint_bypass(tp, &c, 1) != 1) return (-1); return (0); } int -ttyhook_register(struct tty **rtp, struct proc *p, int fd, - struct ttyhook *th, void *softc) +ttyhook_register(struct tty **rtp, struct proc *p, int fd, struct ttyhook *th, + void *softc) { struct tty *tp; struct file *fp; struct cdev *dev; struct cdevsw *cdp; struct filedesc *fdp; cap_rights_t rights; int error, ref; /* Validate the file descriptor. 
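 * Registration by a snoop-style consumer looks roughly like (names
 * illustrative):
 *
 *	error = ttyhook_register(&sc->tty, td->td_proc, fd,
 *	    &foo_hook, sc);
 *
 * where fd is resolved in the target process to a tty(4) cdev below.
 */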
*/ fdp = p->p_fd; error = fget_unlocked(fdp, fd, cap_rights_init(&rights, CAP_TTYHOOK), &fp, NULL); if (error != 0) return (error); if (fp->f_ops == &badfileops) { error = EBADF; goto done1; } /* * Make sure the vnode is bound to a character device. * Unlocked check for the vnode type is ok there, because we * only shall prevent calling devvn_refthread on the file that * never has been opened over a character device. */ if (fp->f_type != DTYPE_VNODE || fp->f_vnode->v_type != VCHR) { error = EINVAL; goto done1; } /* Make sure it is a TTY. */ cdp = devvn_refthread(fp->f_vnode, &dev, &ref); if (cdp == NULL) { error = ENXIO; goto done1; } if (dev != fp->f_data) { error = ENXIO; goto done2; } if (cdp != &ttydev_cdevsw) { error = ENOTTY; goto done2; } tp = dev->si_drv1; /* Try to attach the hook to the TTY. */ error = EBUSY; tty_lock(tp); MPASS((tp->t_hook == NULL) == ((tp->t_flags & TF_HOOK) == 0)); if (tp->t_flags & TF_HOOK) goto done3; tp->t_flags |= TF_HOOK; tp->t_hook = th; tp->t_hooksoftc = softc; *rtp = tp; error = 0; /* Maybe we can switch into bypass mode now. */ ttydisc_optimize(tp); /* Silently convert rint() calls to rint_bypass() when possible. */ if (!ttyhook_hashook(tp, rint) && ttyhook_hashook(tp, rint_bypass)) th->th_rint = ttyhook_defrint; done3: tty_unlock(tp); done2: dev_relthread(dev, ref); done1: fdrop(fp, curthread); return (error); } void ttyhook_unregister(struct tty *tp) { tty_lock_assert(tp, MA_OWNED); MPASS(tp->t_flags & TF_HOOK); /* Disconnect the hook. */ tp->t_flags &= ~TF_HOOK; tp->t_hook = NULL; /* Maybe we need to leave bypass mode. */ ttydisc_optimize(tp); /* Maybe deallocate the TTY as well. */ tty_rel_free(tp); } /* * /dev/console handling. */ static int ttyconsdev_open(struct cdev *dev, int oflags, int devtype, struct thread *td) { struct tty *tp; /* System has no console device. */ if (dev_console_filename == NULL) return (ENXIO); /* Look up corresponding TTY by device name. */ sx_slock(&tty_list_sx); TAILQ_FOREACH(tp, &tty_list, t_list) { if (strcmp(dev_console_filename, tty_devname(tp)) == 0) { dev_console->si_drv1 = tp; break; } } sx_sunlock(&tty_list_sx); /* System console has no TTY associated. */ if (dev_console->si_drv1 == NULL) return (ENXIO); return (ttydev_open(dev, oflags, devtype, td)); } static int ttyconsdev_write(struct cdev *dev, struct uio *uio, int ioflag) { log_console(uio); return (ttydev_write(dev, uio, ioflag)); } /* * /dev/console is a little different than normal TTY's. When opened, * it determines which TTY to use. When data gets written to it, it * will be logged in the kernel message buffer. */ static struct cdevsw ttyconsdev_cdevsw = { .d_version = D_VERSION, .d_open = ttyconsdev_open, .d_close = ttydev_close, .d_read = ttydev_read, .d_write = ttyconsdev_write, .d_ioctl = ttydev_ioctl, .d_kqfilter = ttydev_kqfilter, .d_poll = ttydev_poll, .d_mmap = ttydev_mmap, .d_name = "ttyconsdev", .d_flags = D_TTY, }; static void -ttyconsdev_init(void *unused) +ttyconsdev_init(void *unused __unused) { dev_console = make_dev_credf(MAKEDEV_ETERNAL, &ttyconsdev_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0600, "console"); } SYSINIT(tty, SI_SUB_DRIVERS, SI_ORDER_FIRST, ttyconsdev_init, NULL); void ttyconsdev_select(const char *name) { dev_console_filename = name; } /* * Debugging routines. */ #include "opt_ddb.h" #ifdef DDB #include #include -static struct { +static const struct { int flag; char val; } ttystates[] = { #if 0 { TF_NOPREFIX, 'N' }, #endif { TF_INITLOCK, 'I' }, { TF_CALLOUT, 'C' }, /* Keep these together -> 'Oi' and 'Oo'. 
*/ { TF_OPENED, 'O' }, { TF_OPENED_IN, 'i' }, { TF_OPENED_OUT, 'o' }, { TF_OPENED_CONS, 'c' }, { TF_GONE, 'G' }, { TF_OPENCLOSE, 'B' }, { TF_ASYNC, 'Y' }, { TF_LITERAL, 'L' }, /* Keep these together -> 'Hi' and 'Ho'. */ { TF_HIWAT, 'H' }, { TF_HIWAT_IN, 'i' }, { TF_HIWAT_OUT, 'o' }, { TF_STOPPED, 'S' }, { TF_EXCLUDE, 'X' }, { TF_BYPASS, 'l' }, { TF_ZOMBIE, 'Z' }, { TF_HOOK, 's' }, /* Keep these together -> 'bi' and 'bo'. */ { TF_BUSY, 'b' }, { TF_BUSY_IN, 'i' }, { TF_BUSY_OUT, 'o' }, { 0, '\0'}, }; #define TTY_FLAG_BITS \ - "\20\1NOPREFIX\2INITLOCK\3CALLOUT\4OPENED_IN\5OPENED_OUT\6GONE" \ - "\7OPENCLOSE\10ASYNC\11LITERAL\12HIWAT_IN\13HIWAT_OUT\14STOPPED" \ - "\15EXCLUDE\16BYPASS\17ZOMBIE\20HOOK" + "\20\1NOPREFIX\2INITLOCK\3CALLOUT\4OPENED_IN" \ + "\5OPENED_OUT\6OPENED_CONS\7GONE\10OPENCLOSE" \ + "\11ASYNC\12LITERAL\13HIWAT_IN\14HIWAT_OUT" \ + "\15STOPPED\16EXCLUDE\17BYPASS\20ZOMBIE" \ + "\21HOOK\22BUSY_IN\23BUSY_OUT" #define DB_PRINTSYM(name, addr) \ db_printf("%s " #name ": ", sep); \ db_printsym((db_addr_t) addr, DB_STGY_ANY); \ db_printf("\n"); static void _db_show_devsw(const char *sep, const struct ttydevsw *tsw) { + db_printf("%sdevsw: ", sep); db_printsym((db_addr_t)tsw, DB_STGY_ANY); db_printf(" (%p)\n", tsw); DB_PRINTSYM(open, tsw->tsw_open); DB_PRINTSYM(close, tsw->tsw_close); DB_PRINTSYM(outwakeup, tsw->tsw_outwakeup); DB_PRINTSYM(inwakeup, tsw->tsw_inwakeup); DB_PRINTSYM(ioctl, tsw->tsw_ioctl); DB_PRINTSYM(param, tsw->tsw_param); DB_PRINTSYM(modem, tsw->tsw_modem); DB_PRINTSYM(mmap, tsw->tsw_mmap); DB_PRINTSYM(pktnotify, tsw->tsw_pktnotify); DB_PRINTSYM(free, tsw->tsw_free); } + static void _db_show_hooks(const char *sep, const struct ttyhook *th) { + db_printf("%shook: ", sep); db_printsym((db_addr_t)th, DB_STGY_ANY); db_printf(" (%p)\n", th); if (th == NULL) return; DB_PRINTSYM(rint, th->th_rint); DB_PRINTSYM(rint_bypass, th->th_rint_bypass); DB_PRINTSYM(rint_done, th->th_rint_done); DB_PRINTSYM(rint_poll, th->th_rint_poll); DB_PRINTSYM(getc_inject, th->th_getc_inject); DB_PRINTSYM(getc_capture, th->th_getc_capture); DB_PRINTSYM(getc_poll, th->th_getc_poll); DB_PRINTSYM(close, th->th_close); } static void _db_show_termios(const char *name, const struct termios *t) { db_printf("%s: iflag 0x%x oflag 0x%x cflag 0x%x " "lflag 0x%x ispeed %u ospeed %u\n", name, t->c_iflag, t->c_oflag, t->c_cflag, t->c_lflag, t->c_ispeed, t->c_ospeed); } /* DDB command to show TTY statistics. */ DB_SHOW_COMMAND(tty, db_show_tty) { struct tty *tp; if (!have_addr) { db_printf("usage: show tty \n"); return; } tp = (struct tty *)addr; - db_printf("0x%p: %s\n", tp, tty_devname(tp)); + db_printf("%p: %s\n", tp, tty_devname(tp)); db_printf("\tmtx: %p\n", tp->t_mtx); - db_printf("\tflags: %b\n", tp->t_flags, TTY_FLAG_BITS); + db_printf("\tflags: 0x%b\n", tp->t_flags, TTY_FLAG_BITS); db_printf("\trevokecnt: %u\n", tp->t_revokecnt); /* Buffering mechanisms. 
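 *
 * The DB_SHOW_COMMAND(tty, ...) entry above is reachable from the
 * ddb(4) prompt; a session might look like this (address and device
 * name made up, output trimmed):
 *
 *	db> show tty 0xfffff80002f2a400
 *	0xfffff80002f2a400: ttyv0
 *		mtx: 0xfffff80002f2a4d8
 *		flags: 0x6<INITLOCK,CALLOUT>
 *
 * All fields come straight from the struct tty instance at the given
 * address.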
*/ db_printf("\tinq: %p begin %u linestart %u reprint %u end %u " "nblocks %u quota %u\n", &tp->t_inq, tp->t_inq.ti_begin, tp->t_inq.ti_linestart, tp->t_inq.ti_reprint, tp->t_inq.ti_end, tp->t_inq.ti_nblocks, tp->t_inq.ti_quota); db_printf("\toutq: %p begin %u end %u nblocks %u quota %u\n", &tp->t_outq, tp->t_outq.to_begin, tp->t_outq.to_end, tp->t_outq.to_nblocks, tp->t_outq.to_quota); db_printf("\tinlow: %zu\n", tp->t_inlow); db_printf("\toutlow: %zu\n", tp->t_outlow); _db_show_termios("\ttermios", &tp->t_termios); db_printf("\twinsize: row %u col %u xpixel %u ypixel %u\n", tp->t_winsize.ws_row, tp->t_winsize.ws_col, tp->t_winsize.ws_xpixel, tp->t_winsize.ws_ypixel); db_printf("\tcolumn: %u\n", tp->t_column); db_printf("\twritepos: %u\n", tp->t_writepos); db_printf("\tcompatflags: 0x%x\n", tp->t_compatflags); /* Init/lock-state devices. */ _db_show_termios("\ttermios_init_in", &tp->t_termios_init_in); _db_show_termios("\ttermios_init_out", &tp->t_termios_init_out); _db_show_termios("\ttermios_lock_in", &tp->t_termios_lock_in); _db_show_termios("\ttermios_lock_out", &tp->t_termios_lock_out); /* Hooks */ _db_show_devsw("\t", tp->t_devsw); _db_show_hooks("\t", tp->t_hook); /* Process info. */ db_printf("\tpgrp: %p gid %d jobc %d\n", tp->t_pgrp, tp->t_pgrp ? tp->t_pgrp->pg_id : 0, tp->t_pgrp ? tp->t_pgrp->pg_jobc : 0); db_printf("\tsession: %p", tp->t_session); if (tp->t_session != NULL) db_printf(" count %u leader %p tty %p sid %d login %s", tp->t_session->s_count, tp->t_session->s_leader, tp->t_session->s_ttyp, tp->t_session->s_sid, tp->t_session->s_login); db_printf("\n"); db_printf("\tsessioncnt: %u\n", tp->t_sessioncnt); db_printf("\tdevswsoftc: %p\n", tp->t_devswsoftc); db_printf("\thooksoftc: %p\n", tp->t_hooksoftc); db_printf("\tdev: %p\n", tp->t_dev); } /* DDB command to list TTYs. */ DB_SHOW_ALL_COMMAND(ttys, db_show_all_ttys) { struct tty *tp; size_t isiz, osiz; int i, j; /* Make the output look like `pstat -t'. */ db_printf("PTR "); #if defined(__LP64__) db_printf(" "); #endif db_printf(" LINE INQ CAN LIN LOW OUTQ USE LOW " "COL SESS PGID STATE\n"); TAILQ_FOREACH(tp, &tty_list, t_list) { isiz = tp->t_inq.ti_nblocks * TTYINQ_DATASIZE; osiz = tp->t_outq.to_nblocks * TTYOUTQ_DATASIZE; - db_printf("%p %10s %5zu %4u %4u %4zu %5zu %4u %4zu %5u %5d %5d ", - tp, - tty_devname(tp), - isiz, + db_printf("%p %10s %5zu %4u %4u %4zu %5zu %4u %4zu %5u %5d " + "%5d ", tp, tty_devname(tp), isiz, tp->t_inq.ti_linestart - tp->t_inq.ti_begin, tp->t_inq.ti_end - tp->t_inq.ti_linestart, - isiz - tp->t_inlow, - osiz, + isiz - tp->t_inlow, osiz, tp->t_outq.to_end - tp->t_outq.to_begin, - osiz - tp->t_outlow, - MIN(tp->t_column, 99999), + osiz - tp->t_outlow, MIN(tp->t_column, 99999), tp->t_session ? tp->t_session->s_sid : 0, tp->t_pgrp ? tp->t_pgrp->pg_id : 0); /* Flag bits. */ for (i = j = 0; ttystates[i].flag; i++) if (tp->t_flags & ttystates[i].flag) { db_printf("%c", ttystates[i].val); j++; } if (j == 0) db_printf("-"); db_printf("\n"); } } #endif /* DDB */ Index: projects/clang380-import/sys/kern/vfs_export.c =================================================================== --- projects/clang380-import/sys/kern/vfs_export.c (revision 294776) +++ projects/clang380-import/sys/kern/vfs_export.c (revision 294777) @@ -1,520 +1,520 @@ /*- * Copyright (c) 1989, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. 
* All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)vfs_subr.c 8.31 (Berkeley) 5/26/95 */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include static MALLOC_DEFINE(M_NETADDR, "export_host", "Export host address structure"); static struct radix_node_head *vfs_create_addrlist_af( struct radix_node_head **prnh, int off); static void vfs_free_addrlist(struct netexport *nep); static int vfs_free_netcred(struct radix_node *rn, void *w); static void vfs_free_addrlist_af(struct radix_node_head **prnh); static int vfs_hang_addrlist(struct mount *mp, struct netexport *nep, struct export_args *argp); static struct netcred *vfs_export_lookup(struct mount *, struct sockaddr *); /* * Network address lookup element */ struct netcred { struct radix_node netc_rnodes[2]; int netc_exflags; struct ucred *netc_anon; int netc_numsecflavors; int netc_secflavors[MAXSECFLAVORS]; }; /* * Network export information */ struct netexport { struct netcred ne_defexported; /* Default export */ struct radix_node_head *ne4; struct radix_node_head *ne6; }; /* * Build hash lists of net addresses and hang them off the mount point. * Called by vfs_export() to set up the lists of export addresses. */ static int vfs_hang_addrlist(struct mount *mp, struct netexport *nep, struct export_args *argp) { register struct netcred *np; register struct radix_node_head *rnh; register int i; struct radix_node *rn; struct sockaddr *saddr, *smask = 0; #if defined(INET6) || defined(INET) int off; #endif int error; /* * XXX: This routine converts from a `struct xucred' * (argp->ex_anon) to a `struct ucred' (np->netc_anon). This * operation is questionable; for example, what should be done * with fields like cr_uidinfo and cr_prison? 
Currently, this * routine does not touch them (leaves them as NULL). */ if (argp->ex_anon.cr_version != XUCRED_VERSION) { vfs_mount_error(mp, "ex_anon.cr_version: %d != %d", argp->ex_anon.cr_version, XUCRED_VERSION); return (EINVAL); } if (argp->ex_addrlen == 0) { if (mp->mnt_flag & MNT_DEFEXPORTED) { vfs_mount_error(mp, "MNT_DEFEXPORTED already set for mount %p", mp); return (EPERM); } np = &nep->ne_defexported; np->netc_exflags = argp->ex_flags; np->netc_anon = crget(); np->netc_anon->cr_uid = argp->ex_anon.cr_uid; crsetgroups(np->netc_anon, argp->ex_anon.cr_ngroups, argp->ex_anon.cr_groups); np->netc_anon->cr_prison = &prison0; prison_hold(np->netc_anon->cr_prison); np->netc_numsecflavors = argp->ex_numsecflavors; bcopy(argp->ex_secflavors, np->netc_secflavors, sizeof(np->netc_secflavors)); MNT_ILOCK(mp); mp->mnt_flag |= MNT_DEFEXPORTED; MNT_IUNLOCK(mp); return (0); } #if MSIZE <= 256 if (argp->ex_addrlen > MLEN) { vfs_mount_error(mp, "ex_addrlen %d is greater than %d", argp->ex_addrlen, MLEN); return (EINVAL); } #endif i = sizeof(struct netcred) + argp->ex_addrlen + argp->ex_masklen; np = (struct netcred *) malloc(i, M_NETADDR, M_WAITOK | M_ZERO); saddr = (struct sockaddr *) (np + 1); if ((error = copyin(argp->ex_addr, saddr, argp->ex_addrlen))) goto out; if (saddr->sa_family == AF_UNSPEC || saddr->sa_family > AF_MAX) { error = EINVAL; vfs_mount_error(mp, "Invalid saddr->sa_family: %d"); goto out; } if (saddr->sa_len > argp->ex_addrlen) saddr->sa_len = argp->ex_addrlen; if (argp->ex_masklen) { smask = (struct sockaddr *)((caddr_t)saddr + argp->ex_addrlen); error = copyin(argp->ex_mask, smask, argp->ex_masklen); if (error) goto out; if (smask->sa_len > argp->ex_masklen) smask->sa_len = argp->ex_masklen; } rnh = NULL; switch (saddr->sa_family) { #ifdef INET case AF_INET: if ((rnh = nep->ne4) == NULL) { off = offsetof(struct sockaddr_in, sin_addr) << 3; rnh = vfs_create_addrlist_af(&nep->ne4, off); } break; #endif #ifdef INET6 case AF_INET6: if ((rnh = nep->ne6) == NULL) { off = offsetof(struct sockaddr_in6, sin6_addr) << 3; rnh = vfs_create_addrlist_af(&nep->ne6, off); } break; #endif } if (rnh == NULL) { error = ENOBUFS; vfs_mount_error(mp, "%s %s %d", "Unable to initialize radix node head ", "for address family", saddr->sa_family); goto out; } RADIX_NODE_HEAD_LOCK(rnh); - rn = (*rnh->rnh_addaddr)(saddr, smask, rnh, np->netc_rnodes); + rn = (*rnh->rnh_addaddr)(saddr, smask, &rnh->rh, np->netc_rnodes); RADIX_NODE_HEAD_UNLOCK(rnh); if (rn == NULL || np != (struct netcred *)rn) { /* already exists */ error = EPERM; vfs_mount_error(mp, "netcred already exists for given addr/mask"); goto out; } np->netc_exflags = argp->ex_flags; np->netc_anon = crget(); np->netc_anon->cr_uid = argp->ex_anon.cr_uid; crsetgroups(np->netc_anon, argp->ex_anon.cr_ngroups, argp->ex_anon.cr_groups); np->netc_anon->cr_prison = &prison0; prison_hold(np->netc_anon->cr_prison); np->netc_numsecflavors = argp->ex_numsecflavors; bcopy(argp->ex_secflavors, np->netc_secflavors, sizeof(np->netc_secflavors)); return (0); out: free(np, M_NETADDR); return (error); } /* Helper for vfs_free_addrlist. 
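 *
 * Note that the per-family radix heads above are created with their
 * key offset expressed in bits, which is why vfs_hang_addrlist()
 * shifts the byte offset left by three before calling the constructor:
 *
 *	off = offsetof(struct sockaddr_in, sin_addr) << 3;
 *	rnh = vfs_create_addrlist_af(&nep->ne4, off);
 *
 * so key comparison starts at the address field rather than at the
 * sockaddr header.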
*/ /* ARGSUSED */ static int vfs_free_netcred(struct radix_node *rn, void *w) { struct radix_node_head *rnh = (struct radix_node_head *) w; struct ucred *cred; - (*rnh->rnh_deladdr) (rn->rn_key, rn->rn_mask, rnh); + (*rnh->rnh_deladdr) (rn->rn_key, rn->rn_mask, &rnh->rh); cred = ((struct netcred *)rn)->netc_anon; if (cred != NULL) crfree(cred); free(rn, M_NETADDR); return (0); } static struct radix_node_head * vfs_create_addrlist_af(struct radix_node_head **prnh, int off) { if (rn_inithead((void **)prnh, off) == 0) return (NULL); RADIX_NODE_HEAD_LOCK_INIT(*prnh); return (*prnh); } static void vfs_free_addrlist_af(struct radix_node_head **prnh) { struct radix_node_head *rnh; rnh = *prnh; RADIX_NODE_HEAD_LOCK(rnh); - (*rnh->rnh_walktree) (rnh, vfs_free_netcred, rnh); + (*rnh->rnh_walktree)(&rnh->rh, vfs_free_netcred, &rnh->rh); RADIX_NODE_HEAD_UNLOCK(rnh); RADIX_NODE_HEAD_DESTROY(rnh); free(rnh, M_RTABLE); prnh = NULL; } /* * Free the net address hash lists that are hanging off the mount points. */ static void vfs_free_addrlist(struct netexport *nep) { struct ucred *cred; if (nep->ne4 != NULL) vfs_free_addrlist_af(&nep->ne4); if (nep->ne6 != NULL) vfs_free_addrlist_af(&nep->ne6); cred = nep->ne_defexported.netc_anon; if (cred != NULL) crfree(cred); } /* * High level function to manipulate export options on a mount point * and the passed in netexport. * Struct export_args *argp is the variable used to twiddle options, * the structure is described in sys/mount.h */ int vfs_export(struct mount *mp, struct export_args *argp) { struct netexport *nep; int error; if (argp->ex_numsecflavors < 0 || argp->ex_numsecflavors >= MAXSECFLAVORS) return (EINVAL); error = 0; lockmgr(&mp->mnt_explock, LK_EXCLUSIVE, NULL); nep = mp->mnt_export; if (argp->ex_flags & MNT_DELEXPORT) { if (nep == NULL) { error = ENOENT; goto out; } if (mp->mnt_flag & MNT_EXPUBLIC) { vfs_setpublicfs(NULL, NULL, NULL); MNT_ILOCK(mp); mp->mnt_flag &= ~MNT_EXPUBLIC; MNT_IUNLOCK(mp); } vfs_free_addrlist(nep); mp->mnt_export = NULL; free(nep, M_MOUNT); nep = NULL; MNT_ILOCK(mp); mp->mnt_flag &= ~(MNT_EXPORTED | MNT_DEFEXPORTED); MNT_IUNLOCK(mp); } if (argp->ex_flags & MNT_EXPORTED) { if (nep == NULL) { nep = malloc(sizeof(struct netexport), M_MOUNT, M_WAITOK | M_ZERO); mp->mnt_export = nep; } if (argp->ex_flags & MNT_EXPUBLIC) { if ((error = vfs_setpublicfs(mp, nep, argp)) != 0) goto out; MNT_ILOCK(mp); mp->mnt_flag |= MNT_EXPUBLIC; MNT_IUNLOCK(mp); } if ((error = vfs_hang_addrlist(mp, nep, argp))) goto out; MNT_ILOCK(mp); mp->mnt_flag |= MNT_EXPORTED; MNT_IUNLOCK(mp); } out: lockmgr(&mp->mnt_explock, LK_RELEASE, NULL); /* * Once we have executed the vfs_export() command, we do * not want to keep the "export" option around in the * options list, since that will cause subsequent MNT_UPDATE * calls to fail. The export information is saved in * mp->mnt_export, so we can safely delete the "export" mount option * here. */ vfs_deleteopt(mp->mnt_optnew, "export"); vfs_deleteopt(mp->mnt_opt, "export"); return (error); } /* * Set the publicly exported filesystem (WebNFS). Currently, only * one public filesystem is possible in the spec (RFC 2054 and 2055) */ int vfs_setpublicfs(struct mount *mp, struct netexport *nep, struct export_args *argp) { int error; struct vnode *rvp; char *cp; /* * mp == NULL -> invalidate the current info, the FS is * no longer exported. May be called from either vfs_export * or unmount, so check if it hasn't already been done. 
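 *
 * As a sketch of the vfs_export() entry point above, tearing down an
 * export amounts to (all other export_args fields zeroed):
 *
 *	struct export_args ea = { .ex_flags = MNT_DELEXPORT };
 *	error = vfs_export(mp, &ea);
 *
 * while re-exporting passes MNT_EXPORTED plus the address/mask data
 * consumed by vfs_hang_addrlist().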
*/ if (mp == NULL) { if (nfs_pub.np_valid) { nfs_pub.np_valid = 0; if (nfs_pub.np_index != NULL) { free(nfs_pub.np_index, M_TEMP); nfs_pub.np_index = NULL; } } return (0); } /* * Only one allowed at a time. */ if (nfs_pub.np_valid != 0 && mp != nfs_pub.np_mount) return (EBUSY); /* * Get real filehandle for root of exported FS. */ bzero(&nfs_pub.np_handle, sizeof(nfs_pub.np_handle)); nfs_pub.np_handle.fh_fsid = mp->mnt_stat.f_fsid; if ((error = VFS_ROOT(mp, LK_EXCLUSIVE, &rvp))) return (error); if ((error = VOP_VPTOFH(rvp, &nfs_pub.np_handle.fh_fid))) return (error); vput(rvp); /* * If an indexfile was specified, pull it in. */ if (argp->ex_indexfile != NULL) { if (nfs_pub.np_index != NULL) nfs_pub.np_index = malloc(MAXNAMLEN + 1, M_TEMP, M_WAITOK); error = copyinstr(argp->ex_indexfile, nfs_pub.np_index, MAXNAMLEN, (size_t *)0); if (!error) { /* * Check for illegal filenames. */ for (cp = nfs_pub.np_index; *cp; cp++) { if (*cp == '/') { error = EINVAL; break; } } } if (error) { free(nfs_pub.np_index, M_TEMP); nfs_pub.np_index = NULL; return (error); } } nfs_pub.np_mount = mp; nfs_pub.np_valid = 1; return (0); } /* * Used by the filesystems to determine if a given network address * (passed in 'nam') is present in their exports list, returns a pointer * to struct netcred so that the filesystem can examine it for * access rights (read/write/etc). */ static struct netcred * vfs_export_lookup(struct mount *mp, struct sockaddr *nam) { struct netexport *nep; register struct netcred *np; register struct radix_node_head *rnh; struct sockaddr *saddr; nep = mp->mnt_export; if (nep == NULL) return (NULL); np = NULL; if (mp->mnt_flag & MNT_EXPORTED) { /* * Lookup in the export list first. */ if (nam != NULL) { saddr = nam; rnh = NULL; switch (saddr->sa_family) { case AF_INET: rnh = nep->ne4; break; case AF_INET6: rnh = nep->ne6; break; } if (rnh != NULL) { RADIX_NODE_HEAD_RLOCK(rnh); np = (struct netcred *) - (*rnh->rnh_matchaddr)(saddr, rnh); + (*rnh->rnh_matchaddr)(saddr, &rnh->rh); RADIX_NODE_HEAD_RUNLOCK(rnh); if (np && np->netc_rnodes->rn_flags & RNF_ROOT) np = NULL; } } /* * If no address match, use the default if it exists. */ if (np == NULL && mp->mnt_flag & MNT_DEFEXPORTED) np = &nep->ne_defexported; } return (np); } /* * XXX: This comment comes from the deprecated ufs_check_export() * XXX: and may not entirely apply, but lacking something better: * This is the generic part of fhtovp called after the underlying * filesystem has validated the file handle. * * Verify that a host should have access to a filesystem. */ int vfs_stdcheckexp(struct mount *mp, struct sockaddr *nam, int *extflagsp, struct ucred **credanonp, int *numsecflavors, int **secflavors) { struct netcred *np; lockmgr(&mp->mnt_explock, LK_SHARED, NULL); np = vfs_export_lookup(mp, nam); if (np == NULL) { lockmgr(&mp->mnt_explock, LK_RELEASE, NULL); *credanonp = NULL; return (EACCES); } *extflagsp = np->netc_exflags; if ((*credanonp = np->netc_anon) != NULL) crhold(*credanonp); if (numsecflavors) *numsecflavors = np->netc_numsecflavors; if (secflavors) *secflavors = np->netc_secflavors; lockmgr(&mp->mnt_explock, LK_RELEASE, NULL); return (0); } Index: projects/clang380-import/sys/mips/conf/AR934X_BASE =================================================================== --- projects/clang380-import/sys/mips/conf/AR934X_BASE (revision 294776) +++ projects/clang380-import/sys/mips/conf/AR934X_BASE (revision 294777) @@ -1,130 +1,129 @@ # # AR91XX -- Kernel configuration base file for the Atheros AR913x SoC. 
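#
# A board configuration built on top of this file would normally just
# include it and override selected options, e.g. (hypothetical ident):
#
#	include	AR934X_BASE
#	ident	MYBOARD
#	options	INET6
#
# See config(8) for the include directive.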
# # This file (and the hints file accompanying it) are not designed to be # used by themselves. Instead, users of this file should create a kernel # config file which includes this file (which gets the basic hints), then # override the default options (adding devices as needed) and adding # hints as needed (for example, the GPIO and LAN PHY.) # # $FreeBSD$ # machine mips mips ident AR934X_BASE cpu CPU_MIPS74KC makeoptions KERNLOADADDR=0x80050000 options HZ=1000 files "../atheros/files.ar71xx" hints "AR934X_BASE.hints" makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols makeoptions MODULES_OVERRIDE="gpio ar71xx if_gif if_gre if_vlan if_bridge bridgestp usb wlan wlan_xauth wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_rssadapt wlan_amrr ath ath_ahb hwpmc ipfw ipfw_nat libalias" # makeoptions MODULES_OVERRIDE="" options DDB options KDB options ALQ options SCHED_4BSD #4BSD scheduler options INET #InterNETworking #options INET6 #InterNETworking #options NFSCL #Network Filesystem Client options PSEUDOFS #Pseudo-filesystem framework options _KPOSIX_PRIORITY_SCHEDULING #Posix P1003_1B real-time extensions # Don't include the SCSI/CAM strings in the default build options SCSI_NO_SENSE_STRINGS options SCSI_NO_OP_STRINGS # .. And no sysctl strings options NO_SYSCTL_DESCR # Limit IO size options NBUF=128 # Limit UMTX hash size # options UMTX_NUM_CHAINS=64 # PMC #options HWPMC_HOOKS #device hwpmc #device hwpmc_mips24k # options NFS_LEGACYRPC # Debugging for use in -current #options INVARIANTS #options INVARIANT_SUPPORT #options WITNESS #options WITNESS_SKIPSPIN options FFS #Berkeley Fast Filesystem #options SOFTUPDATES #Enable FFS soft updates support #options UFS_ACL #Support for access control lists #options UFS_DIRHASH #Improve performance on big directories options NO_FFS_SNAPSHOT # We don't require snapshot support # Wireless NIC cards options IEEE80211_DEBUG options IEEE80211_SUPPORT_MESH options IEEE80211_SUPPORT_TDMA options IEEE80211_SUPPORT_SUPERG options IEEE80211_ALQ # 802.11 ALQ logging support device wlan # 802.11 support device wlan_wep # 802.11 WEP support device wlan_ccmp # 802.11 CCMP support device wlan_tkip # 802.11 TKIP support device wlan_xauth # 802.11 hostap support # ath(4) device ath # Atheros network device device ath_rate_sample device ath_ahb # Atheros host bus glue options ATH_DEBUG options ATH_DIAGAPI option ATH_ENABLE_11N -option AH_DEBUG_ALQ #device ath_hal device ath_ar9300 # AR9330 HAL; no need for the others option AH_DEBUG option AH_SUPPORT_AR5416 # 11n HAL support option AH_SUPPORT_AR9340 # Chipset support option AH_DEBUG_ALQ option AH_AR5416_INTERRUPT_MITIGATION device mii device arge device usb options USB_EHCI_BIG_ENDIAN_DESC # handle big-endian byte order options USB_DEBUG options USB_HOST_ALIGN=32 # AR71XX (MIPS in general?) requires this device ehci device pci device ar724x_pci device scbus device umass device da device spibus device ar71xx_spi device mx25l device ar71xx_wdog device uart device uart_ar71xx # XXX for now; later a separate APB mux is needed to demux PCI/WLAN interrupts. 
device ar71xx_apb device loop device ether device md device bpf device random device if_bridge device gpio device gpioled Index: projects/clang380-import/sys/net/if_lagg.c =================================================================== --- projects/clang380-import/sys/net/if_lagg.c (revision 294776) +++ projects/clang380-import/sys/net/if_lagg.c (revision 294777) @@ -1,2219 +1,2240 @@ /* $OpenBSD: if_trunk.c,v 1.30 2007/01/31 06:20:19 reyk Exp $ */ /* * Copyright (c) 2005, 2006 Reyk Floeter * Copyright (c) 2007 Andrew Thompson - * Copyright (c) 2014 Marcelo Araujo + * Copyright (c) 2014, 2016 Marcelo Araujo * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if defined(INET) || defined(INET6) #include #include #endif #ifdef INET #include #include #endif #ifdef INET6 #include #include #include #endif #include #include #include /* Special flags we should propagate to the lagg ports. 
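 *
 * The table that follows makes the propagation data-driven: each
 * interface flag is paired with the kernel helper that applies it to
 * a port,
 *
 *	{IFF_PROMISC, ifpromisc},
 *	{IFF_ALLMULTI, if_allmulti},
 *	{0, NULL}
 *
 * and lagg_setflags() simply walks the entries until the NULL
 * sentinel.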
*/ static struct { int flag; int (*func)(struct ifnet *, int); } lagg_pflags[] = { {IFF_PROMISC, ifpromisc}, {IFF_ALLMULTI, if_allmulti}, {0, NULL} }; VNET_DEFINE(SLIST_HEAD(__trhead, lagg_softc), lagg_list); /* list of laggs */ #define V_lagg_list VNET(lagg_list) static VNET_DEFINE(struct mtx, lagg_list_mtx); #define V_lagg_list_mtx VNET(lagg_list_mtx) #define LAGG_LIST_LOCK_INIT(x) mtx_init(&V_lagg_list_mtx, \ "if_lagg list", NULL, MTX_DEF) #define LAGG_LIST_LOCK_DESTROY(x) mtx_destroy(&V_lagg_list_mtx) #define LAGG_LIST_LOCK(x) mtx_lock(&V_lagg_list_mtx) #define LAGG_LIST_UNLOCK(x) mtx_unlock(&V_lagg_list_mtx) eventhandler_tag lagg_detach_cookie = NULL; static int lagg_clone_create(struct if_clone *, int, caddr_t); static void lagg_clone_destroy(struct ifnet *); static VNET_DEFINE(struct if_clone *, lagg_cloner); #define V_lagg_cloner VNET(lagg_cloner) static const char laggname[] = "lagg"; static void lagg_lladdr(struct lagg_softc *, uint8_t *); static void lagg_capabilities(struct lagg_softc *); static void lagg_port_lladdr(struct lagg_port *, uint8_t *, lagg_llqtype); static void lagg_port_setlladdr(void *, int); static int lagg_port_create(struct lagg_softc *, struct ifnet *); static int lagg_port_destroy(struct lagg_port *, int); static struct mbuf *lagg_input(struct ifnet *, struct mbuf *); static void lagg_linkstate(struct lagg_softc *); static void lagg_port_state(struct ifnet *, int); static int lagg_port_ioctl(struct ifnet *, u_long, caddr_t); static int lagg_port_output(struct ifnet *, struct mbuf *, const struct sockaddr *, struct route *); static void lagg_port_ifdetach(void *arg __unused, struct ifnet *); #ifdef LAGG_PORT_STACKING static int lagg_port_checkstacking(struct lagg_softc *); #endif static void lagg_port2req(struct lagg_port *, struct lagg_reqport *); static void lagg_init(void *); static void lagg_stop(struct lagg_softc *); static int lagg_ioctl(struct ifnet *, u_long, caddr_t); static int lagg_ether_setmulti(struct lagg_softc *); static int lagg_ether_cmdmulti(struct lagg_port *, int); static int lagg_setflag(struct lagg_port *, int, int, int (*func)(struct ifnet *, int)); static int lagg_setflags(struct lagg_port *, int status); static uint64_t lagg_get_counter(struct ifnet *ifp, ift_counter cnt); static int lagg_transmit(struct ifnet *, struct mbuf *); static void lagg_qflush(struct ifnet *); static int lagg_media_change(struct ifnet *); static void lagg_media_status(struct ifnet *, struct ifmediareq *); static struct lagg_port *lagg_link_active(struct lagg_softc *, struct lagg_port *); /* Simple round robin */ static void lagg_rr_attach(struct lagg_softc *); static int lagg_rr_start(struct lagg_softc *, struct mbuf *); static struct mbuf *lagg_rr_input(struct lagg_softc *, struct lagg_port *, struct mbuf *); /* Active failover */ static int lagg_fail_start(struct lagg_softc *, struct mbuf *); static struct mbuf *lagg_fail_input(struct lagg_softc *, struct lagg_port *, struct mbuf *); /* Loadbalancing */ static void lagg_lb_attach(struct lagg_softc *); static void lagg_lb_detach(struct lagg_softc *); static int lagg_lb_port_create(struct lagg_port *); static void lagg_lb_port_destroy(struct lagg_port *); static int lagg_lb_start(struct lagg_softc *, struct mbuf *); static struct mbuf *lagg_lb_input(struct lagg_softc *, struct lagg_port *, struct mbuf *); static int lagg_lb_porttable(struct lagg_softc *, struct lagg_port *); /* Broadcast */ static int lagg_bcast_start(struct lagg_softc *, struct mbuf *); static struct mbuf *lagg_bcast_input(struct lagg_softc 
*, struct lagg_port *, struct mbuf *); /* 802.3ad LACP */ static void lagg_lacp_attach(struct lagg_softc *); static void lagg_lacp_detach(struct lagg_softc *); static int lagg_lacp_start(struct lagg_softc *, struct mbuf *); static struct mbuf *lagg_lacp_input(struct lagg_softc *, struct lagg_port *, struct mbuf *); static void lagg_lacp_lladdr(struct lagg_softc *); /* lagg protocol table */ static const struct lagg_proto { lagg_proto pr_num; void (*pr_attach)(struct lagg_softc *); void (*pr_detach)(struct lagg_softc *); int (*pr_start)(struct lagg_softc *, struct mbuf *); struct mbuf * (*pr_input)(struct lagg_softc *, struct lagg_port *, struct mbuf *); int (*pr_addport)(struct lagg_port *); void (*pr_delport)(struct lagg_port *); void (*pr_linkstate)(struct lagg_port *); void (*pr_init)(struct lagg_softc *); void (*pr_stop)(struct lagg_softc *); void (*pr_lladdr)(struct lagg_softc *); void (*pr_request)(struct lagg_softc *, void *); void (*pr_portreq)(struct lagg_port *, void *); } lagg_protos[] = { { .pr_num = LAGG_PROTO_NONE }, { .pr_num = LAGG_PROTO_ROUNDROBIN, .pr_attach = lagg_rr_attach, .pr_start = lagg_rr_start, .pr_input = lagg_rr_input, }, { .pr_num = LAGG_PROTO_FAILOVER, .pr_start = lagg_fail_start, .pr_input = lagg_fail_input, }, { .pr_num = LAGG_PROTO_LOADBALANCE, .pr_attach = lagg_lb_attach, .pr_detach = lagg_lb_detach, .pr_start = lagg_lb_start, .pr_input = lagg_lb_input, .pr_addport = lagg_lb_port_create, .pr_delport = lagg_lb_port_destroy, }, { .pr_num = LAGG_PROTO_LACP, .pr_attach = lagg_lacp_attach, .pr_detach = lagg_lacp_detach, .pr_start = lagg_lacp_start, .pr_input = lagg_lacp_input, .pr_addport = lacp_port_create, .pr_delport = lacp_port_destroy, .pr_linkstate = lacp_linkstate, .pr_init = lacp_init, .pr_stop = lacp_stop, .pr_lladdr = lagg_lacp_lladdr, .pr_request = lacp_req, .pr_portreq = lacp_portreq, }, { .pr_num = LAGG_PROTO_BROADCAST, .pr_start = lagg_bcast_start, .pr_input = lagg_bcast_input, }, }; SYSCTL_DECL(_net_link); SYSCTL_NODE(_net_link, OID_AUTO, lagg, CTLFLAG_RW, 0, "Link Aggregation"); /* Allow input on any failover links */ static VNET_DEFINE(int, lagg_failover_rx_all); #define V_lagg_failover_rx_all VNET(lagg_failover_rx_all) SYSCTL_INT(_net_link_lagg, OID_AUTO, failover_rx_all, CTLFLAG_RW | CTLFLAG_VNET, &VNET_NAME(lagg_failover_rx_all), 0, "Accept input from any interface in a failover lagg"); /* Default value for using flowid */ static VNET_DEFINE(int, def_use_flowid) = 1; #define V_def_use_flowid VNET(def_use_flowid) SYSCTL_INT(_net_link_lagg, OID_AUTO, default_use_flowid, CTLFLAG_RWTUN, &VNET_NAME(def_use_flowid), 0, "Default setting for using flow id for load sharing"); /* Default value for flowid shift */ static VNET_DEFINE(int, def_flowid_shift) = 16; #define V_def_flowid_shift VNET(def_flowid_shift) SYSCTL_INT(_net_link_lagg, OID_AUTO, default_flowid_shift, CTLFLAG_RWTUN, &VNET_NAME(def_flowid_shift), 0, "Default setting for flowid shift for load sharing"); static void vnet_lagg_init(const void *unused __unused) { LAGG_LIST_LOCK_INIT(); SLIST_INIT(&V_lagg_list); V_lagg_cloner = if_clone_simple(laggname, lagg_clone_create, lagg_clone_destroy, 0); } VNET_SYSINIT(vnet_lagg_init, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_lagg_init, NULL); static void vnet_lagg_uninit(const void *unused __unused) { if_clone_detach(V_lagg_cloner); LAGG_LIST_LOCK_DESTROY(); } VNET_SYSUNINIT(vnet_lagg_uninit, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY, vnet_lagg_uninit, NULL); static int lagg_modevent(module_t mod, int type, void *data) { switch (type) { case 
MOD_LOAD: lagg_input_p = lagg_input; lagg_linkstate_p = lagg_port_state; lagg_detach_cookie = EVENTHANDLER_REGISTER( ifnet_departure_event, lagg_port_ifdetach, NULL, EVENTHANDLER_PRI_ANY); break; case MOD_UNLOAD: EVENTHANDLER_DEREGISTER(ifnet_departure_event, lagg_detach_cookie); lagg_input_p = NULL; lagg_linkstate_p = NULL; break; default: return (EOPNOTSUPP); } return (0); } static moduledata_t lagg_mod = { "if_lagg", lagg_modevent, 0 }; DECLARE_MODULE(if_lagg, lagg_mod, SI_SUB_PSEUDO, SI_ORDER_ANY); MODULE_VERSION(if_lagg, 1); static void lagg_proto_attach(struct lagg_softc *sc, lagg_proto pr) { KASSERT(sc->sc_proto == LAGG_PROTO_NONE, ("%s: sc %p has proto", __func__, sc)); if (sc->sc_ifflags & IFF_DEBUG) if_printf(sc->sc_ifp, "using proto %u\n", pr); if (lagg_protos[pr].pr_attach != NULL) lagg_protos[pr].pr_attach(sc); sc->sc_proto = pr; } static void lagg_proto_detach(struct lagg_softc *sc) { lagg_proto pr; LAGG_WLOCK_ASSERT(sc); pr = sc->sc_proto; sc->sc_proto = LAGG_PROTO_NONE; if (lagg_protos[pr].pr_detach != NULL) lagg_protos[pr].pr_detach(sc); else LAGG_WUNLOCK(sc); } static int lagg_proto_start(struct lagg_softc *sc, struct mbuf *m) { return (lagg_protos[sc->sc_proto].pr_start(sc, m)); } static struct mbuf * lagg_proto_input(struct lagg_softc *sc, struct lagg_port *lp, struct mbuf *m) { return (lagg_protos[sc->sc_proto].pr_input(sc, lp, m)); } static int lagg_proto_addport(struct lagg_softc *sc, struct lagg_port *lp) { if (lagg_protos[sc->sc_proto].pr_addport == NULL) return (0); else return (lagg_protos[sc->sc_proto].pr_addport(lp)); } static void lagg_proto_delport(struct lagg_softc *sc, struct lagg_port *lp) { if (lagg_protos[sc->sc_proto].pr_delport != NULL) lagg_protos[sc->sc_proto].pr_delport(lp); } static void lagg_proto_linkstate(struct lagg_softc *sc, struct lagg_port *lp) { if (lagg_protos[sc->sc_proto].pr_linkstate != NULL) lagg_protos[sc->sc_proto].pr_linkstate(lp); } static void lagg_proto_init(struct lagg_softc *sc) { if (lagg_protos[sc->sc_proto].pr_init != NULL) lagg_protos[sc->sc_proto].pr_init(sc); } static void lagg_proto_stop(struct lagg_softc *sc) { if (lagg_protos[sc->sc_proto].pr_stop != NULL) lagg_protos[sc->sc_proto].pr_stop(sc); } static void lagg_proto_lladdr(struct lagg_softc *sc) { if (lagg_protos[sc->sc_proto].pr_lladdr != NULL) lagg_protos[sc->sc_proto].pr_lladdr(sc); } static void lagg_proto_request(struct lagg_softc *sc, void *v) { if (lagg_protos[sc->sc_proto].pr_request != NULL) lagg_protos[sc->sc_proto].pr_request(sc, v); } static void lagg_proto_portreq(struct lagg_softc *sc, struct lagg_port *lp, void *v) { if (lagg_protos[sc->sc_proto].pr_portreq != NULL) lagg_protos[sc->sc_proto].pr_portreq(lp, v); } /* * This routine is run via an vlan * config EVENT */ static void lagg_register_vlan(void *arg, struct ifnet *ifp, u_int16_t vtag) { struct lagg_softc *sc = ifp->if_softc; struct lagg_port *lp; struct rm_priotracker tracker; if (ifp->if_softc != arg) /* Not our event */ return; LAGG_RLOCK(sc, &tracker); if (!SLIST_EMPTY(&sc->sc_ports)) { SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) EVENTHANDLER_INVOKE(vlan_config, lp->lp_ifp, vtag); } LAGG_RUNLOCK(sc, &tracker); } /* * This routine is run via an vlan * unconfig EVENT */ static void lagg_unregister_vlan(void *arg, struct ifnet *ifp, u_int16_t vtag) { struct lagg_softc *sc = ifp->if_softc; struct lagg_port *lp; struct rm_priotracker tracker; if (ifp->if_softc != arg) /* Not our event */ return; LAGG_RLOCK(sc, &tracker); if (!SLIST_EMPTY(&sc->sc_ports)) { SLIST_FOREACH(lp, &sc->sc_ports, 
lp_entries) EVENTHANDLER_INVOKE(vlan_unconfig, lp->lp_ifp, vtag); } LAGG_RUNLOCK(sc, &tracker); } static int lagg_clone_create(struct if_clone *ifc, int unit, caddr_t params) { struct lagg_softc *sc; struct ifnet *ifp; static const u_char eaddr[6]; /* 00:00:00:00:00:00 */ sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO); ifp = sc->sc_ifp = if_alloc(IFT_ETHER); if (ifp == NULL) { free(sc, M_DEVBUF); return (ENOSPC); } if (V_def_use_flowid) sc->sc_opts |= LAGG_OPT_USE_FLOWID; sc->flowid_shift = V_def_flowid_shift; /* Hash all layers by default */ sc->sc_flags = MBUF_HASHFLAG_L2|MBUF_HASHFLAG_L3|MBUF_HASHFLAG_L4; lagg_proto_attach(sc, LAGG_PROTO_DEFAULT); LAGG_LOCK_INIT(sc); SLIST_INIT(&sc->sc_ports); TASK_INIT(&sc->sc_lladdr_task, 0, lagg_port_setlladdr, sc); /* Initialise pseudo media types */ ifmedia_init(&sc->sc_media, 0, lagg_media_change, lagg_media_status); ifmedia_add(&sc->sc_media, IFM_ETHER | IFM_AUTO, 0, NULL); ifmedia_set(&sc->sc_media, IFM_ETHER | IFM_AUTO); if_initname(ifp, laggname, unit); ifp->if_softc = sc; ifp->if_transmit = lagg_transmit; ifp->if_qflush = lagg_qflush; ifp->if_init = lagg_init; ifp->if_ioctl = lagg_ioctl; ifp->if_get_counter = lagg_get_counter; ifp->if_flags = IFF_SIMPLEX | IFF_BROADCAST | IFF_MULTICAST; ifp->if_capenable = ifp->if_capabilities = IFCAP_HWSTATS; /* * Attach as an ordinary ethernet device, children will be attached * as special device IFT_IEEE8023ADLAG. */ ether_ifattach(ifp, eaddr); sc->vlan_attach = EVENTHANDLER_REGISTER(vlan_config, lagg_register_vlan, sc, EVENTHANDLER_PRI_FIRST); sc->vlan_detach = EVENTHANDLER_REGISTER(vlan_unconfig, lagg_unregister_vlan, sc, EVENTHANDLER_PRI_FIRST); /* Insert into the global list of laggs */ LAGG_LIST_LOCK(); SLIST_INSERT_HEAD(&V_lagg_list, sc, sc_entries); LAGG_LIST_UNLOCK(); return (0); } static void lagg_clone_destroy(struct ifnet *ifp) { struct lagg_softc *sc = (struct lagg_softc *)ifp->if_softc; struct lagg_port *lp; LAGG_WLOCK(sc); lagg_stop(sc); ifp->if_flags &= ~IFF_UP; EVENTHANDLER_DEREGISTER(vlan_config, sc->vlan_attach); EVENTHANDLER_DEREGISTER(vlan_unconfig, sc->vlan_detach); /* Shutdown and remove lagg ports */ while ((lp = SLIST_FIRST(&sc->sc_ports)) != NULL) lagg_port_destroy(lp, 1); /* Unhook the aggregation protocol */ lagg_proto_detach(sc); LAGG_UNLOCK_ASSERT(sc); ifmedia_removeall(&sc->sc_media); ether_ifdetach(ifp); if_free(ifp); LAGG_LIST_LOCK(); SLIST_REMOVE(&V_lagg_list, sc, lagg_softc, sc_entries); LAGG_LIST_UNLOCK(); taskqueue_drain(taskqueue_swi, &sc->sc_lladdr_task); LAGG_LOCK_DESTROY(sc); free(sc, M_DEVBUF); } /* * Set link-layer address on the lagg interface itself. * * Set noinline to be dtrace-friendly */ static __noinline void lagg_lladdr(struct lagg_softc *sc, uint8_t *lladdr) { struct ifnet *ifp = sc->sc_ifp; struct lagg_port lp; if (memcmp(lladdr, IF_LLADDR(ifp), ETHER_ADDR_LEN) == 0) return; LAGG_WLOCK_ASSERT(sc); /* * Set the link layer address on the lagg interface. * lagg_proto_lladdr() notifies the MAC change to * the aggregation protocol. iflladdr_event handler which * may trigger gratuitous ARPs for INET will be handled in * a taskqueue. */ bcopy(lladdr, IF_LLADDR(ifp), ETHER_ADDR_LEN); lagg_proto_lladdr(sc); /* * Send notification request for lagg interface * itself. Note that new lladdr is already set. 
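 *
 * The request is queued rather than applied inline: lagg_port_lladdr()
 * below enqueues a lagg_llq entry and kicks the task initialized at
 * clone time,
 *
 *	TASK_INIT(&sc->sc_lladdr_task, 0, lagg_port_setlladdr, sc);
 *	...
 *	taskqueue_enqueue(taskqueue_swi, &sc->sc_lladdr_task);
 *
 * keeping if_setlladdr() out of the lagg lock and avoiding the LOR
 * noted at lagg_port_setlladdr().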
*/ bzero(&lp, sizeof(lp)); lp.lp_ifp = sc->sc_ifp; lp.lp_softc = sc; /* Do not request lladdr change */ lagg_port_lladdr(&lp, lladdr, LAGG_LLQTYPE_VIRT); } static void lagg_capabilities(struct lagg_softc *sc) { struct lagg_port *lp; int cap = ~0, ena = ~0; u_long hwa = ~0UL; struct ifnet_hw_tsomax hw_tsomax; LAGG_WLOCK_ASSERT(sc); memset(&hw_tsomax, 0, sizeof(hw_tsomax)); /* Get capabilities from the lagg ports */ SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { cap &= lp->lp_ifp->if_capabilities; ena &= lp->lp_ifp->if_capenable; hwa &= lp->lp_ifp->if_hwassist; if_hw_tsomax_common(lp->lp_ifp, &hw_tsomax); } cap = (cap == ~0 ? 0 : cap); ena = (ena == ~0 ? 0 : ena); hwa = (hwa == ~0 ? 0 : hwa); if (sc->sc_ifp->if_capabilities != cap || sc->sc_ifp->if_capenable != ena || sc->sc_ifp->if_hwassist != hwa || if_hw_tsomax_update(sc->sc_ifp, &hw_tsomax) != 0) { sc->sc_ifp->if_capabilities = cap; sc->sc_ifp->if_capenable = ena; sc->sc_ifp->if_hwassist = hwa; getmicrotime(&sc->sc_ifp->if_lastchange); if (sc->sc_ifflags & IFF_DEBUG) if_printf(sc->sc_ifp, "capabilities 0x%08x enabled 0x%08x\n", cap, ena); } } /* * Enqueue interface lladdr notification. * If request is already queued, it is updated. * If setting lladdr is also desired, @do_change has to be set to 1. * * Set noinline to be dtrace-friendly */ static __noinline void lagg_port_lladdr(struct lagg_port *lp, uint8_t *lladdr, lagg_llqtype llq_type) { struct lagg_softc *sc = lp->lp_softc; struct ifnet *ifp = lp->lp_ifp; struct lagg_llq *llq; LAGG_WLOCK_ASSERT(sc); /* * Do not enqueue requests where lladdr is the same for * "physical" interfaces (e.g. ports in lagg) */ if (llq_type == LAGG_LLQTYPE_PHYS && memcmp(IF_LLADDR(ifp), lladdr, ETHER_ADDR_LEN) == 0) return; /* Check to make sure its not already queued to be changed */ SLIST_FOREACH(llq, &sc->sc_llq_head, llq_entries) { if (llq->llq_ifp == ifp) { /* Update lladdr, it may have changed */ bcopy(lladdr, llq->llq_lladdr, ETHER_ADDR_LEN); return; } } llq = malloc(sizeof(struct lagg_llq), M_DEVBUF, M_NOWAIT | M_ZERO); if (llq == NULL) /* XXX what to do */ return; llq->llq_ifp = ifp; llq->llq_type = llq_type; bcopy(lladdr, llq->llq_lladdr, ETHER_ADDR_LEN); /* XXX: We should insert to tail */ SLIST_INSERT_HEAD(&sc->sc_llq_head, llq, llq_entries); taskqueue_enqueue(taskqueue_swi, &sc->sc_lladdr_task); } /* * Set the interface MAC address from a taskqueue to avoid a LOR. * * Set noinline to be dtrace-friendly */ static __noinline void lagg_port_setlladdr(void *arg, int pending) { struct lagg_softc *sc = (struct lagg_softc *)arg; struct lagg_llq *llq, *head; struct ifnet *ifp; /* Grab a local reference of the queue and remove it from the softc */ LAGG_WLOCK(sc); head = SLIST_FIRST(&sc->sc_llq_head); SLIST_FIRST(&sc->sc_llq_head) = NULL; LAGG_WUNLOCK(sc); /* * Traverse the queue and set the lladdr on each ifp. It is safe to do * unlocked as we have the only reference to it. */ for (llq = head; llq != NULL; llq = head) { ifp = llq->llq_ifp; CURVNET_SET(ifp->if_vnet); /* * Set the link layer address on the laggport interface. * Note that if_setlladdr() or iflladdr_event handler * may result in arp transmission / lltable updates. 
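 *
 * The queue was detached wholesale before this loop -- take the lock,
 * steal the list head, drop the lock, then walk the private copy:
 *
 *	LAGG_WLOCK(sc);
 *	head = SLIST_FIRST(&sc->sc_llq_head);
 *	SLIST_FIRST(&sc->sc_llq_head) = NULL;
 *	LAGG_WUNLOCK(sc);
 *
 * so each entry here is processed without any other thread being able
 * to reach it.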
*/ if (llq->llq_type == LAGG_LLQTYPE_PHYS) if_setlladdr(ifp, llq->llq_lladdr, ETHER_ADDR_LEN); else EVENTHANDLER_INVOKE(iflladdr_event, ifp); CURVNET_RESTORE(); head = SLIST_NEXT(llq, llq_entries); free(llq, M_DEVBUF); } } static int lagg_port_create(struct lagg_softc *sc, struct ifnet *ifp) { struct lagg_softc *sc_ptr; struct lagg_port *lp, *tlp; int error, i; uint64_t *pval; LAGG_WLOCK_ASSERT(sc); /* Limit the maximal number of lagg ports */ if (sc->sc_count >= LAGG_MAX_PORTS) return (ENOSPC); /* Check if port has already been associated to a lagg */ if (ifp->if_lagg != NULL) { /* Port is already in the current lagg? */ lp = (struct lagg_port *)ifp->if_lagg; if (lp->lp_softc == sc) return (EEXIST); return (EBUSY); } /* XXX Disallow non-ethernet interfaces (this should be any of 802) */ if (ifp->if_type != IFT_ETHER && ifp->if_type != IFT_L2VLAN) return (EPROTONOSUPPORT); /* Allow the first Ethernet member to define the MTU */ if (SLIST_EMPTY(&sc->sc_ports)) sc->sc_ifp->if_mtu = ifp->if_mtu; else if (sc->sc_ifp->if_mtu != ifp->if_mtu) { if_printf(sc->sc_ifp, "invalid MTU for %s\n", ifp->if_xname); return (EINVAL); } if ((lp = malloc(sizeof(struct lagg_port), M_DEVBUF, M_NOWAIT|M_ZERO)) == NULL) return (ENOMEM); /* Check if port is a stacked lagg */ LAGG_LIST_LOCK(); SLIST_FOREACH(sc_ptr, &V_lagg_list, sc_entries) { if (ifp == sc_ptr->sc_ifp) { LAGG_LIST_UNLOCK(); free(lp, M_DEVBUF); return (EINVAL); /* XXX disable stacking for the moment, its untested */ #ifdef LAGG_PORT_STACKING lp->lp_flags |= LAGG_PORT_STACK; if (lagg_port_checkstacking(sc_ptr) >= LAGG_MAX_STACKING) { LAGG_LIST_UNLOCK(); free(lp, M_DEVBUF); return (E2BIG); } #endif } } LAGG_LIST_UNLOCK(); /* Change the interface type */ lp->lp_iftype = ifp->if_type; ifp->if_type = IFT_IEEE8023ADLAG; ifp->if_lagg = lp; lp->lp_ioctl = ifp->if_ioctl; ifp->if_ioctl = lagg_port_ioctl; lp->lp_output = ifp->if_output; ifp->if_output = lagg_port_output; lp->lp_ifp = ifp; lp->lp_softc = sc; /* Save port link layer address */ bcopy(IF_LLADDR(ifp), lp->lp_lladdr, ETHER_ADDR_LEN); if (SLIST_EMPTY(&sc->sc_ports)) { sc->sc_primary = lp; /* First port in lagg. Update/notify lagg lladdress */ lagg_lladdr(sc, IF_LLADDR(ifp)); } else { /* * Update link layer address for this port and * send notifications to other subsystems. */ lagg_port_lladdr(lp, IF_LLADDR(sc->sc_ifp), LAGG_LLQTYPE_PHYS); } /* * Insert into the list of ports. * Keep ports sorted by if_index. It is handy, when configuration * is predictable and `ifconfig laggN create ...` command * will lead to the same result each time. */ SLIST_FOREACH(tlp, &sc->sc_ports, lp_entries) { if (tlp->lp_ifp->if_index < ifp->if_index && ( SLIST_NEXT(tlp, lp_entries) == NULL || SLIST_NEXT(tlp, lp_entries)->lp_ifp->if_index > ifp->if_index)) break; } if (tlp != NULL) SLIST_INSERT_AFTER(tlp, lp, lp_entries); else SLIST_INSERT_HEAD(&sc->sc_ports, lp, lp_entries); sc->sc_count++; /* Update lagg capabilities */ lagg_capabilities(sc); lagg_linkstate(sc); /* Read port counters */ pval = lp->port_counters.val; for (i = 0; i < IFCOUNTERS; i++, pval++) *pval = ifp->if_get_counter(ifp, i); /* Add multicast addresses and interface flags to this port */ lagg_ether_cmdmulti(lp, 1); lagg_setflags(lp, 1); if ((error = lagg_proto_addport(sc, lp)) != 0) { /* Remove the port, without calling pr_delport. 
*/ lagg_port_destroy(lp, 0); return (error); } return (0); } #ifdef LAGG_PORT_STACKING static int lagg_port_checkstacking(struct lagg_softc *sc) { struct lagg_softc *sc_ptr; struct lagg_port *lp; int m = 0; LAGG_WLOCK_ASSERT(sc); SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { if (lp->lp_flags & LAGG_PORT_STACK) { sc_ptr = (struct lagg_softc *)lp->lp_ifp->if_softc; m = MAX(m, lagg_port_checkstacking(sc_ptr)); } } return (m + 1); } #endif static int lagg_port_destroy(struct lagg_port *lp, int rundelport) { struct lagg_softc *sc = lp->lp_softc; struct lagg_port *lp_ptr, *lp0; struct lagg_llq *llq; struct ifnet *ifp = lp->lp_ifp; uint64_t *pval, vdiff; int i; LAGG_WLOCK_ASSERT(sc); if (rundelport) lagg_proto_delport(sc, lp); /* * Remove multicast addresses and interface flags from this port and * reset the MAC address, skip if the interface is being detached. */ if (!lp->lp_detaching) { lagg_ether_cmdmulti(lp, 0); lagg_setflags(lp, 0); lagg_port_lladdr(lp, lp->lp_lladdr, LAGG_LLQTYPE_PHYS); } /* Restore interface */ ifp->if_type = lp->lp_iftype; ifp->if_ioctl = lp->lp_ioctl; ifp->if_output = lp->lp_output; ifp->if_lagg = NULL; /* Update detached port counters */ pval = lp->port_counters.val; for (i = 0; i < IFCOUNTERS; i++, pval++) { vdiff = ifp->if_get_counter(ifp, i) - *pval; sc->detached_counters.val[i] += vdiff; } /* Finally, remove the port from the lagg */ SLIST_REMOVE(&sc->sc_ports, lp, lagg_port, lp_entries); sc->sc_count--; /* Update the primary interface */ if (lp == sc->sc_primary) { uint8_t lladdr[ETHER_ADDR_LEN]; if ((lp0 = SLIST_FIRST(&sc->sc_ports)) == NULL) { bzero(&lladdr, ETHER_ADDR_LEN); } else { bcopy(lp0->lp_lladdr, lladdr, ETHER_ADDR_LEN); } lagg_lladdr(sc, lladdr); /* Mark lp0 as new primary */ sc->sc_primary = lp0; /* * Enqueue lladdr update/notification for each port * (new primary needs update as well, to switch from * old lladdr to its 'real' one). 
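 *
 * Losing the primary therefore demotes the lagg to the next port's
 * address, or to all-zeros when the last port leaves:
 *
 *	if ((lp0 = SLIST_FIRST(&sc->sc_ports)) == NULL) {
 *		bzero(&lladdr, ETHER_ADDR_LEN);
 *	} else {
 *		bcopy(lp0->lp_lladdr, lladdr, ETHER_ADDR_LEN);
 *	}
 *
 * after which every remaining port is queued for the same update.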
*/ SLIST_FOREACH(lp_ptr, &sc->sc_ports, lp_entries) lagg_port_lladdr(lp_ptr, lladdr, LAGG_LLQTYPE_PHYS); } /* Remove any pending lladdr changes from the queue */ if (lp->lp_detaching) { SLIST_FOREACH(llq, &sc->sc_llq_head, llq_entries) { if (llq->llq_ifp == ifp) { SLIST_REMOVE(&sc->sc_llq_head, llq, lagg_llq, llq_entries); free(llq, M_DEVBUF); break; /* Only appears once */ } } } if (lp->lp_ifflags) if_printf(ifp, "%s: lp_ifflags unclean\n", __func__); free(lp, M_DEVBUF); /* Update lagg capabilities */ lagg_capabilities(sc); lagg_linkstate(sc); return (0); } static int lagg_port_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct lagg_reqport *rp = (struct lagg_reqport *)data; struct lagg_softc *sc; struct lagg_port *lp = NULL; int error = 0; struct rm_priotracker tracker; /* Should be checked by the caller */ if (ifp->if_type != IFT_IEEE8023ADLAG || (lp = ifp->if_lagg) == NULL || (sc = lp->lp_softc) == NULL) goto fallback; switch (cmd) { case SIOCGLAGGPORT: if (rp->rp_portname[0] == '\0' || ifunit(rp->rp_portname) != ifp) { error = EINVAL; break; } LAGG_RLOCK(sc, &tracker); if ((lp = ifp->if_lagg) == NULL || lp->lp_softc != sc) { error = ENOENT; LAGG_RUNLOCK(sc, &tracker); break; } lagg_port2req(lp, rp); LAGG_RUNLOCK(sc, &tracker); break; case SIOCSIFCAP: if (lp->lp_ioctl == NULL) { error = EINVAL; break; } error = (*lp->lp_ioctl)(ifp, cmd, data); if (error) break; /* Update lagg interface capabilities */ LAGG_WLOCK(sc); lagg_capabilities(sc); LAGG_WUNLOCK(sc); break; case SIOCSIFMTU: /* Do not allow the MTU to be changed once joined */ error = EINVAL; break; default: goto fallback; } return (error); fallback: if (lp->lp_ioctl != NULL) return ((*lp->lp_ioctl)(ifp, cmd, data)); return (EINVAL); } /* * Requests counter @cnt data. * * Counter value is calculated the following way: * 1) for each port, sum difference between current and "initial" measurements. * 2) add lagg logical interface counters. * 3) add data from detached_counters array. * * We also do the following things on ports attach/detach: * 1) On port attach we store all counters it has into port_counter array. * 2) On port detach we add the different between "initial" and * current counters data to detached_counters array. */ static uint64_t lagg_get_counter(struct ifnet *ifp, ift_counter cnt) { struct lagg_softc *sc; struct lagg_port *lp; struct ifnet *lpifp; struct rm_priotracker tracker; uint64_t newval, oldval, vsum; /* Revise this when we've got non-generic counters. */ KASSERT(cnt < IFCOUNTERS, ("%s: invalid cnt %d", __func__, cnt)); sc = (struct lagg_softc *)ifp->if_softc; LAGG_RLOCK(sc, &tracker); vsum = 0; SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { /* Saved attached value */ oldval = lp->port_counters.val[cnt]; /* current value */ lpifp = lp->lp_ifp; newval = lpifp->if_get_counter(lpifp, cnt); /* Calculate diff and save new */ vsum += newval - oldval; } /* * Add counter data which might be added by upper * layer protocols operating on logical interface. */ vsum += if_get_counter_default(ifp, cnt); /* * Add counter data from detached ports counters */ vsum += sc->detached_counters.val[cnt]; LAGG_RUNLOCK(sc, &tracker); return (vsum); } /* * For direct output to child ports. 
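 *
 * (On the counter scheme above: with lp_now() standing in for a
 * port's if_get_counter method -- the name is illustrative only --
 * the aggregate at any instant is
 *
 *	vsum = 0;
 *	SLIST_FOREACH(lp, &sc->sc_ports, lp_entries)
 *		vsum += lp_now(lp, cnt) - lp->port_counters.val[cnt];
 *	vsum += if_get_counter_default(ifp, cnt);
 *	vsum += sc->detached_counters.val[cnt];
 *
 * i.e. per-port deltas since attach, plus the logical interface's own
 * counters, plus whatever departed ports left behind.)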
*/ static int lagg_port_output(struct ifnet *ifp, struct mbuf *m, const struct sockaddr *dst, struct route *ro) { struct lagg_port *lp = ifp->if_lagg; switch (dst->sa_family) { case pseudo_AF_HDRCMPLT: case AF_UNSPEC: return ((*lp->lp_output)(ifp, m, dst, ro)); } /* drop any other frames */ m_freem(m); return (ENETDOWN); } static void lagg_port_ifdetach(void *arg __unused, struct ifnet *ifp) { struct lagg_port *lp; struct lagg_softc *sc; if ((lp = ifp->if_lagg) == NULL) return; /* If the ifnet is just being renamed, don't do anything. */ if (ifp->if_flags & IFF_RENAMING) return; sc = lp->lp_softc; LAGG_WLOCK(sc); lp->lp_detaching = 1; lagg_port_destroy(lp, 1); LAGG_WUNLOCK(sc); } static void lagg_port2req(struct lagg_port *lp, struct lagg_reqport *rp) { struct lagg_softc *sc = lp->lp_softc; strlcpy(rp->rp_ifname, sc->sc_ifname, sizeof(rp->rp_ifname)); strlcpy(rp->rp_portname, lp->lp_ifp->if_xname, sizeof(rp->rp_portname)); rp->rp_prio = lp->lp_prio; rp->rp_flags = lp->lp_flags; lagg_proto_portreq(sc, lp, &rp->rp_psc); /* Add protocol specific flags */ switch (sc->sc_proto) { case LAGG_PROTO_FAILOVER: if (lp == sc->sc_primary) rp->rp_flags |= LAGG_PORT_MASTER; if (lp == lagg_link_active(sc, sc->sc_primary)) rp->rp_flags |= LAGG_PORT_ACTIVE; break; case LAGG_PROTO_ROUNDROBIN: case LAGG_PROTO_LOADBALANCE: case LAGG_PROTO_BROADCAST: if (LAGG_PORTACTIVE(lp)) rp->rp_flags |= LAGG_PORT_ACTIVE; break; case LAGG_PROTO_LACP: /* LACP has a different definition of active */ if (lacp_isactive(lp)) rp->rp_flags |= LAGG_PORT_ACTIVE; if (lacp_iscollecting(lp)) rp->rp_flags |= LAGG_PORT_COLLECTING; if (lacp_isdistributing(lp)) rp->rp_flags |= LAGG_PORT_DISTRIBUTING; break; } } static void lagg_init(void *xsc) { struct lagg_softc *sc = (struct lagg_softc *)xsc; struct ifnet *ifp = sc->sc_ifp; struct lagg_port *lp; if (ifp->if_drv_flags & IFF_DRV_RUNNING) return; LAGG_WLOCK(sc); ifp->if_drv_flags |= IFF_DRV_RUNNING; /* * Update the port lladdrs if needed. * This might be if_setlladdr() notification * that lladdr has been changed. 
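 *
 * (Aside: lagg_port_output() above is what keeps stray traffic from
 * bypassing the aggregate -- only link-level frames pass through:
 *
 *	switch (dst->sa_family) {
 *	case pseudo_AF_HDRCMPLT:
 *	case AF_UNSPEC:
 *		return ((*lp->lp_output)(ifp, m, dst, ro));
 *	}
 *	m_freem(m);
 *	return (ENETDOWN);
 *
 * so writers such as bpf(4), which use AF_UNSPEC, keep working on a
 * port while anything else is dropped.)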
*/ SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) lagg_port_lladdr(lp, IF_LLADDR(ifp), LAGG_LLQTYPE_PHYS); lagg_proto_init(sc); LAGG_WUNLOCK(sc); } static void lagg_stop(struct lagg_softc *sc) { struct ifnet *ifp = sc->sc_ifp; LAGG_WLOCK_ASSERT(sc); if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0) return; ifp->if_drv_flags &= ~IFF_DRV_RUNNING; lagg_proto_stop(sc); } static int lagg_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct lagg_softc *sc = (struct lagg_softc *)ifp->if_softc; struct lagg_reqall *ra = (struct lagg_reqall *)data; struct lagg_reqopts *ro = (struct lagg_reqopts *)data; struct lagg_reqport *rp = (struct lagg_reqport *)data, rpbuf; struct lagg_reqflags *rf = (struct lagg_reqflags *)data; struct ifreq *ifr = (struct ifreq *)data; struct lagg_port *lp; struct ifnet *tpif; struct thread *td = curthread; char *buf, *outbuf; int count, buflen, len, error = 0; struct rm_priotracker tracker; bzero(&rpbuf, sizeof(rpbuf)); switch (cmd) { case SIOCGLAGG: LAGG_RLOCK(sc, &tracker); count = 0; SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) count++; buflen = count * sizeof(struct lagg_reqport); LAGG_RUNLOCK(sc, &tracker); outbuf = malloc(buflen, M_TEMP, M_WAITOK | M_ZERO); LAGG_RLOCK(sc, &tracker); ra->ra_proto = sc->sc_proto; lagg_proto_request(sc, &ra->ra_psc); count = 0; buf = outbuf; len = min(ra->ra_size, buflen); SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { if (len < sizeof(rpbuf)) break; lagg_port2req(lp, &rpbuf); memcpy(buf, &rpbuf, sizeof(rpbuf)); count++; buf += sizeof(rpbuf); len -= sizeof(rpbuf); } LAGG_RUNLOCK(sc, &tracker); ra->ra_ports = count; ra->ra_size = count * sizeof(rpbuf); error = copyout(outbuf, ra->ra_port, ra->ra_size); free(outbuf, M_TEMP); break; case SIOCSLAGG: error = priv_check(td, PRIV_NET_LAGG); if (error) break; if (ra->ra_proto < 1 || ra->ra_proto >= LAGG_PROTO_MAX) { error = EPROTONOSUPPORT; break; } LAGG_WLOCK(sc); lagg_proto_detach(sc); LAGG_UNLOCK_ASSERT(sc); lagg_proto_attach(sc, ra->ra_proto); break; case SIOCGLAGGOPTS: ro->ro_opts = sc->sc_opts; if (sc->sc_proto == LAGG_PROTO_LACP) { struct lacp_softc *lsc; lsc = (struct lacp_softc *)sc->sc_psc; if (lsc->lsc_debug.lsc_tx_test != 0) ro->ro_opts |= LAGG_OPT_LACP_TXTEST; if (lsc->lsc_debug.lsc_rx_test != 0) ro->ro_opts |= LAGG_OPT_LACP_RXTEST; if (lsc->lsc_strict_mode != 0) ro->ro_opts |= LAGG_OPT_LACP_STRICT; if (lsc->lsc_fast_timeout != 0) ro->ro_opts |= LAGG_OPT_LACP_TIMEOUT; ro->ro_active = sc->sc_active; } else { ro->ro_active = 0; SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) ro->ro_active += LAGG_PORTACTIVE(lp); } + ro->ro_bkt = sc->sc_bkt; ro->ro_flapping = sc->sc_flapping; ro->ro_flowid_shift = sc->flowid_shift; break; case SIOCSLAGGOPTS: + if (sc->sc_proto == LAGG_PROTO_ROUNDROBIN) { + if (ro->ro_bkt == 0) + sc->sc_bkt = 1; // Minimum 1 packet per iface. + else + sc->sc_bkt = ro->ro_bkt; + } error = priv_check(td, PRIV_NET_LAGG); if (error) break; if (ro->ro_opts == 0) break; /* * Set options. LACP options are stored in sc->sc_psc, * not in sc_opts. 
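 *
 * The encoding is signed: a positive LAGG_OPT_* value requests a set
 * and its negation a clear, which the generic path applies as
 *
 *	else if (ro->ro_opts > 0)
 *		sc->sc_opts |= ro->ro_opts;
 *	else
 *		sc->sc_opts &= ~ro->ro_opts;
 *
 * The new ro_bkt field handled at the top of SIOCSLAGGOPTS sets the
 * round-robin burst: how many packets go out one port before the
 * scheduler moves to the next (minimum one).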
*/ int valid, lacp; switch (ro->ro_opts) { case LAGG_OPT_USE_FLOWID: case -LAGG_OPT_USE_FLOWID: case LAGG_OPT_FLOWIDSHIFT: valid = 1; lacp = 0; break; case LAGG_OPT_LACP_TXTEST: case -LAGG_OPT_LACP_TXTEST: case LAGG_OPT_LACP_RXTEST: case -LAGG_OPT_LACP_RXTEST: case LAGG_OPT_LACP_STRICT: case -LAGG_OPT_LACP_STRICT: case LAGG_OPT_LACP_TIMEOUT: case -LAGG_OPT_LACP_TIMEOUT: valid = lacp = 1; break; default: valid = lacp = 0; break; } LAGG_WLOCK(sc); + if (valid == 0 || (lacp == 1 && sc->sc_proto != LAGG_PROTO_LACP)) { /* Invalid combination of options specified. */ error = EINVAL; LAGG_WUNLOCK(sc); break; /* Return from SIOCSLAGGOPTS. */ } /* * Store new options into sc->sc_opts except for * FLOWIDSHIFT and LACP options. */ if (lacp == 0) { if (ro->ro_opts == LAGG_OPT_FLOWIDSHIFT) sc->flowid_shift = ro->ro_flowid_shift; else if (ro->ro_opts > 0) sc->sc_opts |= ro->ro_opts; else sc->sc_opts &= ~ro->ro_opts; } else { struct lacp_softc *lsc; struct lacp_port *lp; lsc = (struct lacp_softc *)sc->sc_psc; switch (ro->ro_opts) { case LAGG_OPT_LACP_TXTEST: lsc->lsc_debug.lsc_tx_test = 1; break; case -LAGG_OPT_LACP_TXTEST: lsc->lsc_debug.lsc_tx_test = 0; break; case LAGG_OPT_LACP_RXTEST: lsc->lsc_debug.lsc_rx_test = 1; break; case -LAGG_OPT_LACP_RXTEST: lsc->lsc_debug.lsc_rx_test = 0; break; case LAGG_OPT_LACP_STRICT: lsc->lsc_strict_mode = 1; break; case -LAGG_OPT_LACP_STRICT: lsc->lsc_strict_mode = 0; break; case LAGG_OPT_LACP_TIMEOUT: LACP_LOCK(lsc); LIST_FOREACH(lp, &lsc->lsc_ports, lp_next) lp->lp_state |= LACP_STATE_TIMEOUT; LACP_UNLOCK(lsc); lsc->lsc_fast_timeout = 1; break; case -LAGG_OPT_LACP_TIMEOUT: LACP_LOCK(lsc); LIST_FOREACH(lp, &lsc->lsc_ports, lp_next) lp->lp_state &= ~LACP_STATE_TIMEOUT; LACP_UNLOCK(lsc); lsc->lsc_fast_timeout = 0; break; } } LAGG_WUNLOCK(sc); break; case SIOCGLAGGFLAGS: rf->rf_flags = 0; LAGG_RLOCK(sc, &tracker); if (sc->sc_flags & MBUF_HASHFLAG_L2) rf->rf_flags |= LAGG_F_HASHL2; if (sc->sc_flags & MBUF_HASHFLAG_L3) rf->rf_flags |= LAGG_F_HASHL3; if (sc->sc_flags & MBUF_HASHFLAG_L4) rf->rf_flags |= LAGG_F_HASHL4; LAGG_RUNLOCK(sc, &tracker); break; case SIOCSLAGGHASH: error = priv_check(td, PRIV_NET_LAGG); if (error) break; if ((rf->rf_flags & LAGG_F_HASHMASK) == 0) { error = EINVAL; break; } LAGG_WLOCK(sc); sc->sc_flags = 0; if (rf->rf_flags & LAGG_F_HASHL2) sc->sc_flags |= MBUF_HASHFLAG_L2; if (rf->rf_flags & LAGG_F_HASHL3) sc->sc_flags |= MBUF_HASHFLAG_L3; if (rf->rf_flags & LAGG_F_HASHL4) sc->sc_flags |= MBUF_HASHFLAG_L4; LAGG_WUNLOCK(sc); break; case SIOCGLAGGPORT: if (rp->rp_portname[0] == '\0' || (tpif = ifunit(rp->rp_portname)) == NULL) { error = EINVAL; break; } LAGG_RLOCK(sc, &tracker); if ((lp = (struct lagg_port *)tpif->if_lagg) == NULL || lp->lp_softc != sc) { error = ENOENT; LAGG_RUNLOCK(sc, &tracker); break; } lagg_port2req(lp, rp); LAGG_RUNLOCK(sc, &tracker); break; case SIOCSLAGGPORT: error = priv_check(td, PRIV_NET_LAGG); if (error) break; if (rp->rp_portname[0] == '\0' || (tpif = ifunit(rp->rp_portname)) == NULL) { error = EINVAL; break; } #ifdef INET6 /* * A laggport interface should not have inet6 address * because two interfaces with a valid link-local * scope zone must not be merged in any form. This * restriction is needed to prevent violation of * link-local scope zone. Attempts to add a laggport * interface which has inet6 addresses triggers * removal of all inet6 addresses on the member * interface. 
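 *
 * Concretely, in6ifa_llaonifp() below checks the candidate port for a
 * link-local address and, when one is present, in6_ifdetach() strips
 * the port's inet6 state before the join; the if_printf() warning is
 * the only user-visible trace of the removal.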
*/ if (in6ifa_llaonifp(tpif)) { in6_ifdetach(tpif); if_printf(sc->sc_ifp, "IPv6 addresses on %s have been removed " "before adding it as a member to prevent " "IPv6 address scope violation.\n", tpif->if_xname); } #endif LAGG_WLOCK(sc); error = lagg_port_create(sc, tpif); LAGG_WUNLOCK(sc); break; case SIOCSLAGGDELPORT: error = priv_check(td, PRIV_NET_LAGG); if (error) break; if (rp->rp_portname[0] == '\0' || (tpif = ifunit(rp->rp_portname)) == NULL) { error = EINVAL; break; } LAGG_WLOCK(sc); if ((lp = (struct lagg_port *)tpif->if_lagg) == NULL || lp->lp_softc != sc) { error = ENOENT; LAGG_WUNLOCK(sc); break; } error = lagg_port_destroy(lp, 1); LAGG_WUNLOCK(sc); break; case SIOCSIFFLAGS: /* Set flags on ports too */ LAGG_WLOCK(sc); SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { lagg_setflags(lp, 1); } LAGG_WUNLOCK(sc); if (!(ifp->if_flags & IFF_UP) && (ifp->if_drv_flags & IFF_DRV_RUNNING)) { /* * If interface is marked down and it is running, * then stop and disable it. */ LAGG_WLOCK(sc); lagg_stop(sc); LAGG_WUNLOCK(sc); } else if ((ifp->if_flags & IFF_UP) && !(ifp->if_drv_flags & IFF_DRV_RUNNING)) { /* * If interface is marked up and it is stopped, then * start it. */ (*ifp->if_init)(sc); } break; case SIOCADDMULTI: case SIOCDELMULTI: LAGG_WLOCK(sc); error = lagg_ether_setmulti(sc); LAGG_WUNLOCK(sc); break; case SIOCSIFMEDIA: case SIOCGIFMEDIA: error = ifmedia_ioctl(ifp, ifr, &sc->sc_media, cmd); break; case SIOCSIFCAP: case SIOCSIFMTU: /* Do not allow the MTU or caps to be directly changed */ error = EINVAL; break; default: error = ether_ioctl(ifp, cmd, data); break; } return (error); } static int lagg_ether_setmulti(struct lagg_softc *sc) { struct lagg_port *lp; LAGG_WLOCK_ASSERT(sc); SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { /* First, remove any existing filter entries. */ lagg_ether_cmdmulti(lp, 0); /* copy all addresses from the lagg interface to the port */ lagg_ether_cmdmulti(lp, 1); } return (0); } static int lagg_ether_cmdmulti(struct lagg_port *lp, int set) { struct lagg_softc *sc = lp->lp_softc; struct ifnet *ifp = lp->lp_ifp; struct ifnet *scifp = sc->sc_ifp; struct lagg_mc *mc; struct ifmultiaddr *ifma; int error; LAGG_WLOCK_ASSERT(sc); if (set) { IF_ADDR_WLOCK(scifp); TAILQ_FOREACH(ifma, &scifp->if_multiaddrs, ifma_link) { if (ifma->ifma_addr->sa_family != AF_LINK) continue; mc = malloc(sizeof(struct lagg_mc), M_DEVBUF, M_NOWAIT); if (mc == NULL) { IF_ADDR_WUNLOCK(scifp); return (ENOMEM); } bcopy(ifma->ifma_addr, &mc->mc_addr, ifma->ifma_addr->sa_len); mc->mc_addr.sdl_index = ifp->if_index; mc->mc_ifma = NULL; SLIST_INSERT_HEAD(&lp->lp_mc_head, mc, mc_entries); } IF_ADDR_WUNLOCK(scifp); SLIST_FOREACH (mc, &lp->lp_mc_head, mc_entries) { error = if_addmulti(ifp, (struct sockaddr *)&mc->mc_addr, &mc->mc_ifma); if (error) return (error); } } else { while ((mc = SLIST_FIRST(&lp->lp_mc_head)) != NULL) { SLIST_REMOVE(&lp->lp_mc_head, mc, lagg_mc, mc_entries); if (mc->mc_ifma && !lp->lp_detaching) if_delmulti_ifma(mc->mc_ifma); free(mc, M_DEVBUF); } } return (0); } /* Handle a ref counted flag that should be set on the lagg port as well */ static int lagg_setflag(struct lagg_port *lp, int flag, int status, int (*func)(struct ifnet *, int)) { struct lagg_softc *sc = lp->lp_softc; struct ifnet *scifp = sc->sc_ifp; struct ifnet *ifp = lp->lp_ifp; int error; LAGG_WLOCK_ASSERT(sc); status = status ? (scifp->if_flags & flag) : 0; /* Now "status" contains the flag value or 0 */ /* * See if recorded ports status is different from what * we want it to be. If it is, flip it. 
We record the port's * status in lp_ifflags so that we won't clear a flag * we haven't set ourselves. In fact, we don't clear or set the port's * flags directly, but get or release references to them. * That is why we can be sure that the recorded flags are still * in accord with the port's actual flags. */ if (status != (lp->lp_ifflags & flag)) { error = (*func)(ifp, status); if (error) return (error); lp->lp_ifflags &= ~flag; lp->lp_ifflags |= status; } return (0); } /* * Handle IFF_* flags that require certain changes on the lagg port: * if "status" is true, update the port's flags to match those of the lagg; * if "status" is false, forcibly clear the flags set on the port. */ static int lagg_setflags(struct lagg_port *lp, int status) { int error, i; for (i = 0; lagg_pflags[i].flag; i++) { error = lagg_setflag(lp, lagg_pflags[i].flag, status, lagg_pflags[i].func); if (error) return (error); } return (0); } static int lagg_transmit(struct ifnet *ifp, struct mbuf *m) { struct lagg_softc *sc = (struct lagg_softc *)ifp->if_softc; int error, len, mcast; struct rm_priotracker tracker; len = m->m_pkthdr.len; mcast = (m->m_flags & (M_MCAST | M_BCAST)) ? 1 : 0; LAGG_RLOCK(sc, &tracker); /* We need a Tx algorithm and at least one port */ if (sc->sc_proto == LAGG_PROTO_NONE || sc->sc_count == 0) { LAGG_RUNLOCK(sc, &tracker); m_freem(m); if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); return (ENXIO); } ETHER_BPF_MTAP(ifp, m); error = lagg_proto_start(sc, m); LAGG_RUNLOCK(sc, &tracker); if (error != 0) if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); return (error); } /* * The ifp->if_qflush entry point for lagg(4) is a no-op. */ static void lagg_qflush(struct ifnet *ifp __unused) { } static struct mbuf * lagg_input(struct ifnet *ifp, struct mbuf *m) { struct lagg_port *lp = ifp->if_lagg; struct lagg_softc *sc = lp->lp_softc; struct ifnet *scifp = sc->sc_ifp; struct rm_priotracker tracker; LAGG_RLOCK(sc, &tracker); if ((scifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || (lp->lp_flags & LAGG_PORT_DISABLED) || sc->sc_proto == LAGG_PROTO_NONE) { LAGG_RUNLOCK(sc, &tracker); m_freem(m); return (NULL); } ETHER_BPF_MTAP(scifp, m); if (lp->lp_detaching != 0) { m_freem(m); m = NULL; } else m = lagg_proto_input(sc, lp, m); if (m != NULL) { if (scifp->if_flags & IFF_MONITOR) { m_freem(m); m = NULL; } } LAGG_RUNLOCK(sc, &tracker); return (m); } static int lagg_media_change(struct ifnet *ifp) { struct lagg_softc *sc = (struct lagg_softc *)ifp->if_softc; if (sc->sc_ifflags & IFF_DEBUG) printf("%s\n", __func__); /* Ignore */ return (0); } static void lagg_media_status(struct ifnet *ifp, struct ifmediareq *imr) { struct lagg_softc *sc = (struct lagg_softc *)ifp->if_softc; struct lagg_port *lp; struct rm_priotracker tracker; imr->ifm_status = IFM_AVALID; imr->ifm_active = IFM_ETHER | IFM_AUTO; LAGG_RLOCK(sc, &tracker); SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { if (LAGG_PORTACTIVE(lp)) imr->ifm_status |= IFM_ACTIVE; } LAGG_RUNLOCK(sc, &tracker); } static void lagg_linkstate(struct lagg_softc *sc) { struct lagg_port *lp; int new_link = LINK_STATE_DOWN; uint64_t speed; /* Our link is considered up if at least one of our ports is active */ SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { if (lp->lp_ifp->if_link_state == LINK_STATE_UP) { new_link = LINK_STATE_UP; break; } } if_link_state_change(sc->sc_ifp, new_link); /* Update if_baudrate to reflect the max possible speed */ switch (sc->sc_proto) { case LAGG_PROTO_FAILOVER: sc->sc_ifp->if_baudrate = sc->sc_primary != NULL ?
sc->sc_primary->lp_ifp->if_baudrate : 0; break; case LAGG_PROTO_ROUNDROBIN: case LAGG_PROTO_LOADBALANCE: case LAGG_PROTO_BROADCAST: speed = 0; SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) speed += lp->lp_ifp->if_baudrate; sc->sc_ifp->if_baudrate = speed; break; case LAGG_PROTO_LACP: /* LACP updates if_baudrate itself */ break; } } static void lagg_port_state(struct ifnet *ifp, int state) { struct lagg_port *lp = (struct lagg_port *)ifp->if_lagg; struct lagg_softc *sc = NULL; if (lp != NULL) sc = lp->lp_softc; if (sc == NULL) return; LAGG_WLOCK(sc); lagg_linkstate(sc); lagg_proto_linkstate(sc, lp); LAGG_WUNLOCK(sc); } struct lagg_port * lagg_link_active(struct lagg_softc *sc, struct lagg_port *lp) { struct lagg_port *lp_next, *rval = NULL; /* int new_link = LINK_STATE_DOWN; */ LAGG_RLOCK_ASSERT(sc); /* * Search for a port which reports an active link state. */ if (lp == NULL) goto search; if (LAGG_PORTACTIVE(lp)) { rval = lp; goto found; } if ((lp_next = SLIST_NEXT(lp, lp_entries)) != NULL && LAGG_PORTACTIVE(lp_next)) { rval = lp_next; goto found; } search: SLIST_FOREACH(lp_next, &sc->sc_ports, lp_entries) { if (LAGG_PORTACTIVE(lp_next)) { rval = lp_next; goto found; } } found: if (rval != NULL) { /* * The IEEE 802.1D standard assumes that a lagg with * multiple ports is always full duplex. This is valid * for load sharing laggs and if at least two links * are active. Unfortunately, checking the latter would * be too expensive at this point. XXX if ((sc->sc_capabilities & IFCAP_LAGG_FULLDUPLEX) && (sc->sc_count > 1)) new_link = LINK_STATE_FULL_DUPLEX; else new_link = rval->lp_link_state; */ } return (rval); } int lagg_enqueue(struct ifnet *ifp, struct mbuf *m) { return (ifp->if_transmit)(ifp, m); } /* * Simple round robin aggregation */ static void lagg_rr_attach(struct lagg_softc *sc) { sc->sc_capabilities = IFCAP_LAGG_FULLDUPLEX; sc->sc_seq = 0; + sc->sc_bkt_count = sc->sc_bkt; } static int lagg_rr_start(struct lagg_softc *sc, struct mbuf *m) { struct lagg_port *lp; uint32_t p; - p = atomic_fetchadd_32(&sc->sc_seq, 1); + if (sc->sc_bkt_count == 0 && sc->sc_bkt > 0) + sc->sc_bkt_count = sc->sc_bkt; + + if (sc->sc_bkt > 0) { + atomic_subtract_int(&sc->sc_bkt_count, 1); + if (atomic_cmpset_int(&sc->sc_bkt_count, 0, sc->sc_bkt)) + p = atomic_fetchadd_32(&sc->sc_seq, 1); + else + p = sc->sc_seq; + } else + p = atomic_fetchadd_32(&sc->sc_seq, 1); + p %= sc->sc_count; lp = SLIST_FIRST(&sc->sc_ports); + while (p--) lp = SLIST_NEXT(lp, lp_entries); /* * Check the port's link state. This will return the next active * port if the link is down or the port is NULL.
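 * Note that with a bucket size (sc_bkt) of N, sc_seq above advances
 * only once every N frames, so roughly N consecutive frames are
 * handed to the same port before the round robin moves on to the
 * next one.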
*/ if ((lp = lagg_link_active(sc, lp)) == NULL) { m_freem(m); return (ENETDOWN); } /* Send mbuf */ return (lagg_enqueue(lp->lp_ifp, m)); } static struct mbuf * lagg_rr_input(struct lagg_softc *sc, struct lagg_port *lp, struct mbuf *m) { struct ifnet *ifp = sc->sc_ifp; /* Just pass in the packet to our lagg device */ m->m_pkthdr.rcvif = ifp; return (m); } /* * Broadcast mode */ static int lagg_bcast_start(struct lagg_softc *sc, struct mbuf *m) { int active_ports = 0; int errors = 0; int ret; struct lagg_port *lp, *last = NULL; struct mbuf *m0; SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) { if (!LAGG_PORTACTIVE(lp)) continue; active_ports++; if (last != NULL) { m0 = m_copym(m, 0, M_COPYALL, M_NOWAIT); if (m0 == NULL) { ret = ENOBUFS; errors++; break; } ret = lagg_enqueue(last->lp_ifp, m0); if (ret != 0) errors++; } last = lp; } if (last == NULL) { m_freem(m); return (ENOENT); } if ((last = lagg_link_active(sc, last)) == NULL) { m_freem(m); return (ENETDOWN); } ret = lagg_enqueue(last->lp_ifp, m); if (ret != 0) errors++; if (errors == 0) return (ret); return (0); } static struct mbuf* lagg_bcast_input(struct lagg_softc *sc, struct lagg_port *lp, struct mbuf *m) { struct ifnet *ifp = sc->sc_ifp; /* Just pass in the packet to our lagg device */ m->m_pkthdr.rcvif = ifp; return (m); } /* * Active failover */ static int lagg_fail_start(struct lagg_softc *sc, struct mbuf *m) { struct lagg_port *lp; /* Use the master port if active or the next available port */ if ((lp = lagg_link_active(sc, sc->sc_primary)) == NULL) { m_freem(m); return (ENETDOWN); } /* Send mbuf */ return (lagg_enqueue(lp->lp_ifp, m)); } static struct mbuf * lagg_fail_input(struct lagg_softc *sc, struct lagg_port *lp, struct mbuf *m) { struct ifnet *ifp = sc->sc_ifp; struct lagg_port *tmp_tp; if (lp == sc->sc_primary || V_lagg_failover_rx_all) { m->m_pkthdr.rcvif = ifp; return (m); } if (!LAGG_PORTACTIVE(sc->sc_primary)) { tmp_tp = lagg_link_active(sc, sc->sc_primary); /* * If tmp_tp is null, we've received a packet when all * our links are down. Weird, but process it anyway.
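 * Otherwise, frames received on a non-primary port are only
 * accepted here while the primary link is down and this port is
 * the one failover has currently selected; the failover_rx_all
 * knob (V_lagg_failover_rx_all, checked above) relaxes this and
 * accepts frames on any port regardless of the primary's state.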
*/ if ((tmp_tp == NULL || tmp_tp == lp)) { m->m_pkthdr.rcvif = ifp; return (m); } } m_freem(m); return (NULL); } /* * Loadbalancing */ static void lagg_lb_attach(struct lagg_softc *sc) { struct lagg_port *lp; struct lagg_lb *lb; lb = malloc(sizeof(struct lagg_lb), M_DEVBUF, M_WAITOK | M_ZERO); sc->sc_capabilities = IFCAP_LAGG_FULLDUPLEX; lb->lb_key = m_ether_tcpip_hash_init(); sc->sc_psc = lb; SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) lagg_lb_port_create(lp); } static void lagg_lb_detach(struct lagg_softc *sc) { struct lagg_lb *lb; lb = (struct lagg_lb *)sc->sc_psc; LAGG_WUNLOCK(sc); if (lb != NULL) free(lb, M_DEVBUF); } static int lagg_lb_porttable(struct lagg_softc *sc, struct lagg_port *lp) { struct lagg_lb *lb = (struct lagg_lb *)sc->sc_psc; struct lagg_port *lp_next; int i = 0; bzero(&lb->lb_ports, sizeof(lb->lb_ports)); SLIST_FOREACH(lp_next, &sc->sc_ports, lp_entries) { if (lp_next == lp) continue; if (i >= LAGG_MAX_PORTS) return (EINVAL); if (sc->sc_ifflags & IFF_DEBUG) printf("%s: port %s at index %d\n", sc->sc_ifname, lp_next->lp_ifp->if_xname, i); lb->lb_ports[i++] = lp_next; } return (0); } static int lagg_lb_port_create(struct lagg_port *lp) { struct lagg_softc *sc = lp->lp_softc; return (lagg_lb_porttable(sc, NULL)); } static void lagg_lb_port_destroy(struct lagg_port *lp) { struct lagg_softc *sc = lp->lp_softc; lagg_lb_porttable(sc, lp); } static int lagg_lb_start(struct lagg_softc *sc, struct mbuf *m) { struct lagg_lb *lb = (struct lagg_lb *)sc->sc_psc; struct lagg_port *lp = NULL; uint32_t p = 0; if ((sc->sc_opts & LAGG_OPT_USE_FLOWID) && M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) p = m->m_pkthdr.flowid >> sc->flowid_shift; else p = m_ether_tcpip_hash(sc->sc_flags, m, lb->lb_key); p %= sc->sc_count; lp = lb->lb_ports[p]; /* * Check the port's link state. This will return the next active * port if the link is down or the port is NULL. 
*/ if ((lp = lagg_link_active(sc, lp)) == NULL) { m_freem(m); return (ENETDOWN); } /* Send mbuf */ return (lagg_enqueue(lp->lp_ifp, m)); } static struct mbuf * lagg_lb_input(struct lagg_softc *sc, struct lagg_port *lp, struct mbuf *m) { struct ifnet *ifp = sc->sc_ifp; /* Just pass in the packet to our lagg device */ m->m_pkthdr.rcvif = ifp; return (m); } /* * 802.3ad LACP */ static void lagg_lacp_attach(struct lagg_softc *sc) { struct lagg_port *lp; lacp_attach(sc); SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) lacp_port_create(lp); } static void lagg_lacp_detach(struct lagg_softc *sc) { struct lagg_port *lp; void *psc; SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) lacp_port_destroy(lp); psc = sc->sc_psc; sc->sc_psc = NULL; LAGG_WUNLOCK(sc); lacp_detach(psc); } static void lagg_lacp_lladdr(struct lagg_softc *sc) { struct lagg_port *lp; /* purge all the lacp ports */ SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) lacp_port_destroy(lp); /* add them back in */ SLIST_FOREACH(lp, &sc->sc_ports, lp_entries) lacp_port_create(lp); } static int lagg_lacp_start(struct lagg_softc *sc, struct mbuf *m) { struct lagg_port *lp; lp = lacp_select_tx_port(sc, m); if (lp == NULL) { m_freem(m); return (ENETDOWN); } /* Send mbuf */ return (lagg_enqueue(lp->lp_ifp, m)); } static struct mbuf * lagg_lacp_input(struct lagg_softc *sc, struct lagg_port *lp, struct mbuf *m) { struct ifnet *ifp = sc->sc_ifp; struct ether_header *eh; u_short etype; eh = mtod(m, struct ether_header *); etype = ntohs(eh->ether_type); /* Tap off LACP control messages */ if ((m->m_flags & M_VLANTAG) == 0 && etype == ETHERTYPE_SLOW) { m = lacp_input(lp, m); if (m == NULL) return (NULL); } /* * If the port is not collecting or not in the active aggregator then * free and return. */ if (lacp_iscollecting(lp) == 0 || lacp_isactive(lp) == 0) { m_freem(m); return (NULL); } m->m_pkthdr.rcvif = ifp; return (m); } Index: projects/clang380-import/sys/net/if_lagg.h =================================================================== --- projects/clang380-import/sys/net/if_lagg.h (revision 294776) +++ projects/clang380-import/sys/net/if_lagg.h (revision 294777) @@ -1,292 +1,295 @@ /* $OpenBSD: if_trunk.h,v 1.11 2007/01/31 06:20:19 reyk Exp $ */ /* * Copyright (c) 2005, 2006 Reyk Floeter * * Permission to use, copy, modify, and distribute this software for any * purpose with or without fee is hereby granted, provided that the above * copyright notice and this permission notice appear in all copies. * * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 
* * $FreeBSD$ */ #ifndef _NET_LAGG_H #define _NET_LAGG_H /* * Global definitions */ #define LAGG_MAX_PORTS 32 /* logically */ #define LAGG_MAX_NAMESIZE 32 /* name of a protocol */ #define LAGG_MAX_STACKING 4 /* maximum number of stacked laggs */ /* Lagg flags */ #define LAGG_F_HASHL2 0x00000001 /* hash layer 2 */ #define LAGG_F_HASHL3 0x00000002 /* hash layer 3 */ #define LAGG_F_HASHL4 0x00000004 /* hash layer 4 */ #define LAGG_F_HASHMASK 0x00000007 /* Port flags */ #define LAGG_PORT_SLAVE 0x00000000 /* normal enslaved port */ #define LAGG_PORT_MASTER 0x00000001 /* primary port */ #define LAGG_PORT_STACK 0x00000002 /* stacked lagg port */ #define LAGG_PORT_ACTIVE 0x00000004 /* port is active */ #define LAGG_PORT_COLLECTING 0x00000008 /* port is receiving frames */ #define LAGG_PORT_DISTRIBUTING 0x00000010 /* port is sending frames */ #define LAGG_PORT_DISABLED 0x00000020 /* port is disabled */ #define LAGG_PORT_BITS "\20\01MASTER\02STACK\03ACTIVE\04COLLECTING" \ "\05DISTRIBUTING\06DISABLED" /* Supported lagg PROTOs */ typedef enum { LAGG_PROTO_NONE = 0, /* no lagg protocol defined */ LAGG_PROTO_ROUNDROBIN, /* simple round robin */ LAGG_PROTO_FAILOVER, /* active failover */ LAGG_PROTO_LOADBALANCE, /* loadbalance */ LAGG_PROTO_LACP, /* 802.3ad lacp */ LAGG_PROTO_BROADCAST, /* broadcast */ LAGG_PROTO_MAX, } lagg_proto; struct lagg_protos { const char *lpr_name; lagg_proto lpr_proto; }; #define LAGG_PROTO_DEFAULT LAGG_PROTO_FAILOVER #define LAGG_PROTOS { \ { "failover", LAGG_PROTO_FAILOVER }, \ { "lacp", LAGG_PROTO_LACP }, \ { "loadbalance", LAGG_PROTO_LOADBALANCE }, \ { "roundrobin", LAGG_PROTO_ROUNDROBIN }, \ { "broadcast", LAGG_PROTO_BROADCAST }, \ { "none", LAGG_PROTO_NONE }, \ { "default", LAGG_PROTO_DEFAULT } \ } /* * lagg ioctls. */ /* * LACP current operational parameters structure. 
*/ struct lacp_opreq { uint16_t actor_prio; uint8_t actor_mac[ETHER_ADDR_LEN]; uint16_t actor_key; uint16_t actor_portprio; uint16_t actor_portno; uint8_t actor_state; uint16_t partner_prio; uint8_t partner_mac[ETHER_ADDR_LEN]; uint16_t partner_key; uint16_t partner_portprio; uint16_t partner_portno; uint8_t partner_state; }; /* lagg port settings */ struct lagg_reqport { char rp_ifname[IFNAMSIZ]; /* name of the lagg */ char rp_portname[IFNAMSIZ]; /* name of the port */ u_int32_t rp_prio; /* port priority */ u_int32_t rp_flags; /* port flags */ union { struct lacp_opreq rpsc_lacp; } rp_psc; #define rp_lacpreq rp_psc.rpsc_lacp }; #define SIOCGLAGGPORT _IOWR('i', 140, struct lagg_reqport) #define SIOCSLAGGPORT _IOW('i', 141, struct lagg_reqport) #define SIOCSLAGGDELPORT _IOW('i', 142, struct lagg_reqport) /* lagg, ports and options */ struct lagg_reqall { char ra_ifname[IFNAMSIZ]; /* name of the lagg */ u_int ra_proto; /* lagg protocol */ size_t ra_size; /* size of buffer */ struct lagg_reqport *ra_port; /* allocated buffer */ int ra_ports; /* total port count */ union { struct lacp_opreq rpsc_lacp; } ra_psc; #define ra_lacpreq ra_psc.rpsc_lacp }; #define SIOCGLAGG _IOWR('i', 143, struct lagg_reqall) #define SIOCSLAGG _IOW('i', 144, struct lagg_reqall) struct lagg_reqflags { char rf_ifname[IFNAMSIZ]; /* name of the lagg */ uint32_t rf_flags; /* lagg protocol */ }; #define SIOCGLAGGFLAGS _IOWR('i', 145, struct lagg_reqflags) #define SIOCSLAGGHASH _IOW('i', 146, struct lagg_reqflags) struct lagg_reqopts { char ro_ifname[IFNAMSIZ]; /* name of the lagg */ int ro_opts; /* Option bitmap */ #define LAGG_OPT_NONE 0x00 #define LAGG_OPT_USE_FLOWID 0x01 /* enable use of flowid */ /* Pseudo flags which are used in ro_opts but not stored into sc_opts. */ #define LAGG_OPT_FLOWIDSHIFT 0x02 /* set flowid shift */ #define LAGG_OPT_FLOWIDSHIFT_MASK 0x1f /* flowid is uint32_t */ #define LAGG_OPT_LACP_STRICT 0x10 /* LACP strict mode */ #define LAGG_OPT_LACP_TXTEST 0x20 /* LACP debug: txtest */ #define LAGG_OPT_LACP_RXTEST 0x40 /* LACP debug: rxtest */ #define LAGG_OPT_LACP_TIMEOUT 0x80 /* LACP timeout */ u_int ro_count; /* number of ports */ u_int ro_active; /* active port count */ u_int ro_flapping; /* number of flapping */ int ro_flowid_shift; /* shift the flowid */ + uint32_t ro_bkt; /* packet bucket for roundrobin */ }; #define SIOCGLAGGOPTS _IOWR('i', 152, struct lagg_reqopts) #define SIOCSLAGGOPTS _IOW('i', 153, struct lagg_reqopts) #define LAGG_OPT_BITS "\020\001USE_FLOWID\005LACP_STRICT" \ "\006LACP_TXTEST\007LACP_RXTEST" #ifdef _KERNEL /* * Internal kernel part */ #define LAGG_PORTACTIVE(_tp) ( \ ((_tp)->lp_ifp->if_link_state == LINK_STATE_UP) && \ ((_tp)->lp_ifp->if_flags & IFF_UP) \ ) struct lagg_ifreq { union { struct ifreq ifreq; struct { char ifr_name[IFNAMSIZ]; struct sockaddr_storage ifr_ss; } ifreq_storage; } ifreq; }; #define sc_ifflags sc_ifp->if_flags /* flags */ #define sc_ifname sc_ifp->if_xname /* name */ #define sc_capabilities sc_ifp->if_capabilities /* capabilities */ #define IFCAP_LAGG_MASK 0xffff0000 /* private capabilities */ #define IFCAP_LAGG_FULLDUPLEX 0x00010000 /* full duplex with >1 ports */ /* Private data used by the loadbalancing protocol */ struct lagg_lb { u_int32_t lb_key; struct lagg_port *lb_ports[LAGG_MAX_PORTS]; }; struct lagg_mc { struct sockaddr_dl mc_addr; struct ifmultiaddr *mc_ifma; SLIST_ENTRY(lagg_mc) mc_entries; }; typedef enum { LAGG_LLQTYPE_PHYS = 0, /* Task related to physical (underlying) port */ LAGG_LLQTYPE_VIRT, /* Task related to lagg interface itself 
*/ } lagg_llqtype; /* List of interfaces to have the MAC address modified */ struct lagg_llq { struct ifnet *llq_ifp; uint8_t llq_lladdr[ETHER_ADDR_LEN]; lagg_llqtype llq_type; SLIST_ENTRY(lagg_llq) llq_entries; }; struct lagg_counters { uint64_t val[IFCOUNTERS]; }; struct lagg_softc { struct ifnet *sc_ifp; /* virtual interface */ struct rmlock sc_mtx; int sc_proto; /* lagg protocol */ u_int sc_count; /* number of ports */ u_int sc_active; /* active port count */ u_int sc_flapping; /* number of flapping * events */ struct lagg_port *sc_primary; /* primary port */ struct ifmedia sc_media; /* media config */ void *sc_psc; /* protocol data */ uint32_t sc_seq; /* sequence counter */ uint32_t sc_flags; SLIST_HEAD(__tplhd, lagg_port) sc_ports; /* list of interfaces */ SLIST_ENTRY(lagg_softc) sc_entries; struct task sc_lladdr_task; SLIST_HEAD(__llqhd, lagg_llq) sc_llq_head; /* interfaces to program the lladdr on */ eventhandler_tag vlan_attach; eventhandler_tag vlan_detach; struct callout sc_callout; u_int sc_opts; int flowid_shift; /* shift the flowid */ + uint32_t sc_bkt; /* packet bucket for roundrobin */ + uint32_t sc_bkt_count; /* packet bucket count for roundrobin */ struct lagg_counters detached_counters; /* detached ports sum */ }; struct lagg_port { struct ifnet *lp_ifp; /* physical interface */ struct lagg_softc *lp_softc; /* parent lagg */ uint8_t lp_lladdr[ETHER_ADDR_LEN]; u_char lp_iftype; /* interface type */ uint32_t lp_prio; /* port priority */ uint32_t lp_flags; /* port flags */ int lp_ifflags; /* saved ifp flags */ void *lh_cookie; /* if state hook */ void *lp_psc; /* protocol data */ int lp_detaching; /* ifnet is detaching */ SLIST_HEAD(__mclhd, lagg_mc) lp_mc_head; /* multicast addresses */ /* Redirected callbacks */ int (*lp_ioctl)(struct ifnet *, u_long, caddr_t); int (*lp_output)(struct ifnet *, struct mbuf *, const struct sockaddr *, struct route *); struct lagg_counters port_counters; /* ifp counters copy */ SLIST_ENTRY(lagg_port) lp_entries; }; #define LAGG_LOCK_INIT(_sc) rm_init(&(_sc)->sc_mtx, "if_lagg rmlock") #define LAGG_LOCK_DESTROY(_sc) rm_destroy(&(_sc)->sc_mtx) #define LAGG_RLOCK(_sc, _p) rm_rlock(&(_sc)->sc_mtx, (_p)) #define LAGG_WLOCK(_sc) rm_wlock(&(_sc)->sc_mtx) #define LAGG_RUNLOCK(_sc, _p) rm_runlock(&(_sc)->sc_mtx, (_p)) #define LAGG_WUNLOCK(_sc) rm_wunlock(&(_sc)->sc_mtx) #define LAGG_RLOCK_ASSERT(_sc) rm_assert(&(_sc)->sc_mtx, RA_RLOCKED) #define LAGG_WLOCK_ASSERT(_sc) rm_assert(&(_sc)->sc_mtx, RA_WLOCKED) #define LAGG_UNLOCK_ASSERT(_sc) rm_assert(&(_sc)->sc_mtx, RA_UNLOCKED) extern struct mbuf *(*lagg_input_p)(struct ifnet *, struct mbuf *); extern void (*lagg_linkstate_p)(struct ifnet *, int ); int lagg_enqueue(struct ifnet *, struct mbuf *); SYSCTL_DECL(_net_link_lagg); #endif /* _KERNEL */ #endif /* _NET_LAGG_H */ Index: projects/clang380-import/sys/net/radix.c =================================================================== --- projects/clang380-import/sys/net/radix.c (revision 294776) +++ projects/clang380-import/sys/net/radix.c (revision 294777) @@ -1,1208 +1,1209 @@ /*- * Copyright (c) 1988, 1989, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2.
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)radix.c 8.5 (Berkeley) 5/19/95 * $FreeBSD$ */ /* * Routines to build and maintain radix trees for routing lookups. */ #include #ifdef _KERNEL #include #include #include #include #include #include #include #include "opt_mpath.h" #ifdef RADIX_MPATH #include #endif #else /* !_KERNEL */ #include #include #include #define log(x, arg...) fprintf(stderr, ## arg) #define panic(x) fprintf(stderr, "PANIC: %s", x), exit(1) #define min(a, b) ((a) < (b) ? (a) : (b) ) #include #endif /* !_KERNEL */ -static int rn_walktree_from(struct radix_node_head *h, void *a, void *m, - walktree_f_t *f, void *w); -static int rn_walktree(struct radix_node_head *, walktree_f_t *, void *); static struct radix_node - *rn_insert(void *, struct radix_node_head *, int *, + *rn_insert(void *, struct radix_head *, int *, struct radix_node [2]), *rn_newpair(void *, int, struct radix_node[2]), *rn_search(void *, struct radix_node *), *rn_search_m(void *, struct radix_node *, void *); +static struct radix_node *rn_addmask(void *, struct radix_mask_head *, int,int); -static void rn_detachhead_internal(void **head); -static int rn_inithead_internal(void **head, int off); +static void rn_detachhead_internal(struct radix_head *); #define RADIX_MAX_KEY_LEN 32 static char rn_zeros[RADIX_MAX_KEY_LEN]; static char rn_ones[RADIX_MAX_KEY_LEN] = { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, }; static int rn_lexobetter(void *m_arg, void *n_arg); static struct radix_mask * rn_new_radix_mask(struct radix_node *tt, struct radix_mask *next); static int rn_satisfies_leaf(char *trial, struct radix_node *leaf, int skip); /* * The data structure for the keys is a radix tree with one way * branching removed. The index rn_bit at an internal node n represents a bit * position to be tested. The tree is arranged so that all descendants * of a node n have keys whose bits all agree up to position rn_bit - 1. * (We say the index of n is rn_bit.) * * There is at least one descendant which has a one bit at position rn_bit, * and at least one with a zero there. * * A route is determined by a pair of key and mask. We require that the * bit-wise logical and of the key and mask be the key.
* We define the index of a route associated with a mask to be * the first bit number in the mask where 0 occurs (with bit number 0 * representing the highest order bit). * * We say a mask is normal if every bit is 0, past the index of the mask. * If a node n has a descendant (k, m) with index(m) == index(n) == rn_bit, * and m is a normal mask, then the route applies to every descendant of n. * If the index(m) < rn_bit, this implies the last few bits of k * before bit b are all 0, (and hence consequently true of every descendant * of n), so the route applies to all descendants of the node as well. * * Similar logic shows that a non-normal mask m such that * index(m) <= index(n) could potentially apply to many children of n. * Thus, for each non-host route, we attach its mask to a list at an internal * node as high in the tree as we can go. * * The present version of the code makes use of normal routes in short- * circuiting an explicit mask and compare operation when testing whether * a key satisfies a normal route, and also in remembering the unique leaf * that governs a subtree. */ /* * Most of the functions in this code assume that the key/mask arguments * are sockaddr-like structures, where the first byte is a u_char * indicating the size of the entire structure. * * To make the assumption more explicit, we use the LEN() macro to access * this field. It is safe to pass an expression with side effects * to LEN() as the argument is evaluated only once. * We cast the result to int as this is the dominant usage. */ #define LEN(x) ( (int) (*(const u_char *)(x)) ) /* * XXX THIS NEEDS TO BE FIXED * In the code, pointers to keys and masks are passed as either * 'void *' (because callers pass pointers of various kinds), or * 'caddr_t' (which is fine for pointer arithmetics, but not very * clean when you dereference it to access data). Furthermore, caddr_t * is really 'char *', while the natural type to operate on keys and * masks would be 'u_char'. This mismatch requires a lot of casts and * intermediate variables to adapt types, which clutter the code. */ /* * Search for a node in the tree matching the key. */ static struct radix_node * rn_search(void *v_arg, struct radix_node *head) { struct radix_node *x; caddr_t v; for (x = head, v = v_arg; x->rn_bit >= 0;) { if (x->rn_bmask & v[x->rn_offset]) x = x->rn_right; else x = x->rn_left; } return (x); } /* * Same as above, but with an additional mask. * XXX note this function is used only once. */ static struct radix_node * rn_search_m(void *v_arg, struct radix_node *head, void *m_arg) { struct radix_node *x; caddr_t v = v_arg, m = m_arg; for (x = head; x->rn_bit >= 0;) { if ((x->rn_bmask & m[x->rn_offset]) && (x->rn_bmask & v[x->rn_offset])) x = x->rn_right; else x = x->rn_left; } return (x); } int rn_refines(void *m_arg, void *n_arg) { caddr_t m = m_arg, n = n_arg; caddr_t lim, lim2 = lim = n + LEN(n); int longer = LEN(n++) - LEN(m++); int masks_are_equal = 1; if (longer > 0) lim -= longer; while (n < lim) { if (*n & ~(*m)) return (0); if (*n++ != *m++) masks_are_equal = 0; } while (n < lim2) if (*n++) return (0); if (masks_are_equal && (longer < 0)) for (lim2 = m - longer; m < lim2; ) if (*m++) return (1); return (!masks_are_equal); } /* * Search for exact match in given @head. * Assume host bits are cleared in @v_arg if @m_arg is not NULL. * Note that prefixes with /32 or /128 masks are treated differently * from host routes.
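 * E.g. an entry for 10.0.0.1/32 is stored with an explicit, interned
 * netmask (rn_mask != NULL), while a host route to the same address
 * carries no mask at all; a lookup with m_arg == NULL therefore only
 * matches the latter, as the rn_mask check below shows.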
*/ struct radix_node * -rn_lookup(void *v_arg, void *m_arg, struct radix_node_head *head) +rn_lookup(void *v_arg, void *m_arg, struct radix_head *head) { struct radix_node *x; caddr_t netmask; if (m_arg != NULL) { /* * Most common case: search exact prefix/mask */ x = rn_addmask(m_arg, head->rnh_masks, 1, head->rnh_treetop->rn_offset); if (x == NULL) return (NULL); netmask = x->rn_key; x = rn_match(v_arg, head); while (x != NULL && x->rn_mask != netmask) x = x->rn_dupedkey; return (x); } /* * Search for host address. */ if ((x = rn_match(v_arg, head)) == NULL) return (NULL); /* Check if found key is the same */ if (LEN(x->rn_key) != LEN(v_arg) || bcmp(x->rn_key, v_arg, LEN(v_arg))) return (NULL); /* Check if this is not host route */ if (x->rn_mask != NULL) return (NULL); return (x); } static int rn_satisfies_leaf(char *trial, struct radix_node *leaf, int skip) { char *cp = trial, *cp2 = leaf->rn_key, *cp3 = leaf->rn_mask; char *cplim; int length = min(LEN(cp), LEN(cp2)); if (cp3 == NULL) cp3 = rn_ones; else length = min(length, LEN(cp3)); cplim = cp + length; cp3 += skip; cp2 += skip; for (cp += skip; cp < cplim; cp++, cp2++, cp3++) if ((*cp ^ *cp2) & *cp3) return (0); return (1); } /* * Search for longest-prefix match in given @head */ struct radix_node * -rn_match(void *v_arg, struct radix_node_head *head) +rn_match(void *v_arg, struct radix_head *head) { caddr_t v = v_arg; struct radix_node *t = head->rnh_treetop, *x; caddr_t cp = v, cp2; caddr_t cplim; struct radix_node *saved_t, *top = t; int off = t->rn_offset, vlen = LEN(cp), matched_off; int test, b, rn_bit; /* * Open code rn_search(v, top) to avoid overhead of extra * subroutine call. */ for (; t->rn_bit >= 0; ) { if (t->rn_bmask & cp[t->rn_offset]) t = t->rn_right; else t = t->rn_left; } /* * See if we match exactly as a host destination * or at least learn how many bits match, for normal mask finesse. * * It doesn't hurt us to limit how many bytes to check * to the length of the mask, since if it matches we had a genuine * match and the leaf we have is the most specific one anyway; * if it didn't match with a shorter length it would fail * with a long one. This wins big for class B&C netmasks which * are probably the most common case... */ if (t->rn_mask) vlen = *(u_char *)t->rn_mask; cp += off; cp2 = t->rn_key + off; cplim = v + vlen; for (; cp < cplim; cp++, cp2++) if (*cp != *cp2) goto on1; /* * This extra grot is in case we are explicitly asked * to look up the default. Ugh! * * Never return the root node itself, it seems to cause a * lot of confusion. */ if (t->rn_flags & RNF_ROOT) t = t->rn_dupedkey; return (t); on1: test = (*cp ^ *cp2) & 0xff; /* find first bit that differs */ for (b = 7; (test >>= 1) > 0;) b--; matched_off = cp - v; b += matched_off << 3; rn_bit = -1 - b; /* * If there is a host route in a duped-key chain, it will be first. */ if ((saved_t = t)->rn_mask == 0) t = t->rn_dupedkey; for (; t; t = t->rn_dupedkey) /* * Even if we don't match exactly as a host, * we may match if the leaf we wound up at is * a route to a net. */ if (t->rn_flags & RNF_NORMAL) { if (rn_bit <= t->rn_bit) return (t); } else if (rn_satisfies_leaf(v, t, matched_off)) return (t); t = saved_t; /* start searching up the tree */ do { struct radix_mask *m; t = t->rn_parent; m = t->rn_mklist; /* * If non-contiguous masks ever become important * we can restore the masking and open coding of * the search and satisfaction test and put the * calculation of "off" back before the "do". 
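 * For normal (contiguous) masks that expense is avoided entirely:
 * for an RNF_NORMAL entry it is enough to compare rn_bit with rm_bit,
 * as the loop below does, which is the short-circuit mentioned in the
 * comment at the top of this file.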
*/ while (m) { if (m->rm_flags & RNF_NORMAL) { if (rn_bit <= m->rm_bit) return (m->rm_leaf); } else { off = min(t->rn_offset, matched_off); x = rn_search_m(v, t, m->rm_mask); while (x && x->rn_mask != m->rm_mask) x = x->rn_dupedkey; if (x && rn_satisfies_leaf(v, x, off)) return (x); } m = m->rm_mklist; } } while (t != top); return (0); } #ifdef RN_DEBUG int rn_nodenum; struct radix_node *rn_clist; int rn_saveinfo; int rn_debug = 1; #endif /* * Whenever we add a new leaf to the tree, we also add a parent node, * so we allocate them as an array of two elements: the first one must be * the leaf (see RNTORT() in route.c), the second one is the parent. * This routine initializes the relevant fields of the nodes, so that * the leaf is the left child of the parent node, and both nodes have * (almost) all fields filled as appropriate. * (XXX some fields are left unset, see the '#if 0' section). * The function returns a pointer to the parent node. */ static struct radix_node * rn_newpair(void *v, int b, struct radix_node nodes[2]) { struct radix_node *tt = nodes, *t = tt + 1; t->rn_bit = b; t->rn_bmask = 0x80 >> (b & 7); t->rn_left = tt; t->rn_offset = b >> 3; #if 0 /* XXX perhaps we should fill these fields as well. */ t->rn_parent = t->rn_right = NULL; tt->rn_mask = NULL; tt->rn_dupedkey = NULL; tt->rn_bmask = 0; #endif tt->rn_bit = -1; tt->rn_key = (caddr_t)v; tt->rn_parent = t; tt->rn_flags = t->rn_flags = RNF_ACTIVE; tt->rn_mklist = t->rn_mklist = 0; #ifdef RN_DEBUG tt->rn_info = rn_nodenum++; t->rn_info = rn_nodenum++; tt->rn_twin = t; tt->rn_ybro = rn_clist; rn_clist = tt; #endif return (t); } static struct radix_node * -rn_insert(void *v_arg, struct radix_node_head *head, int *dupentry, +rn_insert(void *v_arg, struct radix_head *head, int *dupentry, struct radix_node nodes[2]) { caddr_t v = v_arg; struct radix_node *top = head->rnh_treetop; int head_off = top->rn_offset, vlen = LEN(v); struct radix_node *t = rn_search(v_arg, top); caddr_t cp = v + head_off; int b; struct radix_node *p, *tt, *x; /* * Find first bit at which v and t->rn_key differ */ caddr_t cp2 = t->rn_key + head_off; int cmp_res; caddr_t cplim = v + vlen; while (cp < cplim) if (*cp2++ != *cp++) goto on1; *dupentry = 1; return (t); on1: *dupentry = 0; cmp_res = (cp[-1] ^ cp2[-1]) & 0xff; for (b = (cp - v) << 3; cmp_res; b--) cmp_res >>= 1; x = top; cp = v; do { p = x; if (cp[x->rn_offset] & x->rn_bmask) x = x->rn_right; else x = x->rn_left; } while (b > (unsigned) x->rn_bit); /* x->rn_bit < b && x->rn_bit >= 0 */ #ifdef RN_DEBUG if (rn_debug) log(LOG_DEBUG, "rn_insert: Going In:\n"), traverse(p); #endif t = rn_newpair(v_arg, b, nodes); tt = t->rn_left; if ((cp[p->rn_offset] & p->rn_bmask) == 0) p->rn_left = t; else p->rn_right = t; x->rn_parent = t; t->rn_parent = p; /* frees x, p as temp vars below */ if ((cp[t->rn_offset] & t->rn_bmask) == 0) { t->rn_right = x; } else { t->rn_right = tt; t->rn_left = x; } #ifdef RN_DEBUG if (rn_debug) log(LOG_DEBUG, "rn_insert: Coming Out:\n"), traverse(p); #endif return (tt); } struct radix_node * -rn_addmask(void *n_arg, struct radix_node_head *maskhead, int search, int skip) +rn_addmask(void *n_arg, struct radix_mask_head *maskhead, int search, int skip) { unsigned char *netmask = n_arg; unsigned char *cp, *cplim; struct radix_node *x; int b = 0, mlen, j; int maskduplicated, isnormal; struct radix_node *saved_x; unsigned char addmask_key[RADIX_MAX_KEY_LEN]; if ((mlen = LEN(netmask)) > RADIX_MAX_KEY_LEN) mlen = RADIX_MAX_KEY_LEN; if (skip == 0) skip = 1; if (mlen <= skip) - return
(maskhead->rnh_nodes); + return (maskhead->mask_nodes); bzero(addmask_key, RADIX_MAX_KEY_LEN); if (skip > 1) bcopy(rn_ones + 1, addmask_key + 1, skip - 1); bcopy(netmask + skip, addmask_key + skip, mlen - skip); /* * Trim trailing zeroes. */ for (cp = addmask_key + mlen; (cp > addmask_key) && cp[-1] == 0;) cp--; mlen = cp - addmask_key; if (mlen <= skip) - return (maskhead->rnh_nodes); + return (maskhead->mask_nodes); *addmask_key = mlen; - x = rn_search(addmask_key, maskhead->rnh_treetop); + x = rn_search(addmask_key, maskhead->head.rnh_treetop); if (bcmp(addmask_key, x->rn_key, mlen) != 0) x = 0; if (x || search) return (x); R_Zalloc(x, struct radix_node *, RADIX_MAX_KEY_LEN + 2 * sizeof (*x)); if ((saved_x = x) == 0) return (0); netmask = cp = (unsigned char *)(x + 2); bcopy(addmask_key, cp, mlen); - x = rn_insert(cp, maskhead, &maskduplicated, x); + x = rn_insert(cp, &maskhead->head, &maskduplicated, x); if (maskduplicated) { log(LOG_ERR, "rn_addmask: mask impossibly already in tree"); R_Free(saved_x); return (x); } /* * Calculate index of mask, and check for normalcy. * First find the first byte with a 0 bit, then if there are * more bits left (remember we already trimmed the trailing 0's), * the bits should be contiguous, otherwise we have got * a non-contiguous mask. */ #define CONTIG(_c) (((~(_c) + 1) & (_c)) == (unsigned char)(~(_c) + 1)) cplim = netmask + mlen; isnormal = 1; for (cp = netmask + skip; (cp < cplim) && *(u_char *)cp == 0xff;) cp++; if (cp != cplim) { for (j = 0x80; (j & *cp) != 0; j >>= 1) b++; if (!CONTIG(*cp) || cp != (cplim - 1)) isnormal = 0; } b += (cp - netmask) << 3; x->rn_bit = -1 - b; if (isnormal) x->rn_flags |= RNF_NORMAL; return (x); } static int /* XXX: arbitrary ordering for non-contiguous masks */ rn_lexobetter(void *m_arg, void *n_arg) { u_char *mp = m_arg, *np = n_arg, *lim; if (LEN(mp) > LEN(np)) return (1); /* not really, but need to check longer one first */ if (LEN(mp) == LEN(np)) for (lim = mp + LEN(mp); mp < lim;) if (*mp++ > *np++) return (1); return (0); } static struct radix_mask * rn_new_radix_mask(struct radix_node *tt, struct radix_mask *next) { struct radix_mask *m; R_Malloc(m, struct radix_mask *, sizeof (struct radix_mask)); if (m == NULL) { log(LOG_ERR, "Failed to allocate route mask\n"); return (0); } bzero(m, sizeof(*m)); m->rm_bit = tt->rn_bit; m->rm_flags = tt->rn_flags; if (tt->rn_flags & RNF_NORMAL) m->rm_leaf = tt; else m->rm_mask = tt->rn_mask; m->rm_mklist = next; tt->rn_mklist = m; return (m); } struct radix_node * -rn_addroute(void *v_arg, void *n_arg, struct radix_node_head *head, +rn_addroute(void *v_arg, void *n_arg, struct radix_head *head, struct radix_node treenodes[2]) { caddr_t v = (caddr_t)v_arg, netmask = (caddr_t)n_arg; struct radix_node *t, *x = 0, *tt; struct radix_node *saved_tt, *top = head->rnh_treetop; short b = 0, b_leaf = 0; int keyduplicated; caddr_t mmask; struct radix_mask *m, **mp; /* * In dealing with non-contiguous masks, there may be * many different routes which have the same mask. * We will find it useful to have a unique pointer to * the mask to speed avoiding duplicate references at * nodes and possibly save time in calculating indices. 
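 * Since rn_addmask() interns every mask in head->rnh_masks, equal
 * masks share a single rn_key, so the equality tests against
 * "netmask" below reduce to cheap pointer comparisons.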
*/ if (netmask) { x = rn_addmask(netmask, head->rnh_masks, 0, top->rn_offset); if (x == NULL) return (0); b_leaf = x->rn_bit; b = -1 - x->rn_bit; netmask = x->rn_key; } /* * Deal with duplicated keys: attach node to previous instance */ saved_tt = tt = rn_insert(v, head, &keyduplicated, treenodes); if (keyduplicated) { for (t = tt; tt; t = tt, tt = tt->rn_dupedkey) { #ifdef RADIX_MPATH /* permit multipath, if enabled for the family */ if (rn_mpath_capable(head) && netmask == tt->rn_mask) { /* * go down to the end of multipaths, so that * new entry goes into the end of rn_dupedkey * chain. */ do { t = tt; tt = tt->rn_dupedkey; } while (tt && t->rn_mask == tt->rn_mask); break; } #endif if (tt->rn_mask == netmask) return (0); if (netmask == 0 || (tt->rn_mask && ((b_leaf < tt->rn_bit) /* index(netmask) > node */ || rn_refines(netmask, tt->rn_mask) || rn_lexobetter(netmask, tt->rn_mask)))) break; } /* * If the mask is not duplicated, we wouldn't * find it among possible duplicate key entries * anyway, so the above test doesn't hurt. * * We sort the masks for a duplicated key the same way as * in a masklist -- most specific to least specific. * This may require the unfortunate nuisance of relocating * the head of the list. * * We also reverse, or doubly link the list through the * parent pointer. */ if (tt == saved_tt) { struct radix_node *xx = x; /* link in at head of list */ (tt = treenodes)->rn_dupedkey = t; tt->rn_flags = t->rn_flags; tt->rn_parent = x = t->rn_parent; t->rn_parent = tt; /* parent */ if (x->rn_left == t) x->rn_left = tt; else x->rn_right = tt; saved_tt = tt; x = xx; } else { (tt = treenodes)->rn_dupedkey = t->rn_dupedkey; t->rn_dupedkey = tt; tt->rn_parent = t; /* parent */ if (tt->rn_dupedkey) /* parent */ tt->rn_dupedkey->rn_parent = tt; /* parent */ } #ifdef RN_DEBUG t=tt+1; tt->rn_info = rn_nodenum++; t->rn_info = rn_nodenum++; tt->rn_twin = t; tt->rn_ybro = rn_clist; rn_clist = tt; #endif tt->rn_key = (caddr_t) v; tt->rn_bit = -1; tt->rn_flags = RNF_ACTIVE; } /* * Put mask in tree. */ if (netmask) { tt->rn_mask = netmask; tt->rn_bit = x->rn_bit; tt->rn_flags |= x->rn_flags & RNF_NORMAL; } t = saved_tt->rn_parent; if (keyduplicated) goto on2; b_leaf = -1 - t->rn_bit; if (t->rn_right == saved_tt) x = t->rn_left; else x = t->rn_right; /* Promote general routes from below */ if (x->rn_bit < 0) { for (mp = &t->rn_mklist; x; x = x->rn_dupedkey) if (x->rn_mask && (x->rn_bit >= b_leaf) && x->rn_mklist == 0) { *mp = m = rn_new_radix_mask(x, 0); if (m) mp = &m->rm_mklist; } } else if (x->rn_mklist) { /* * Skip over masks whose index is > that of new node */ for (mp = &x->rn_mklist; (m = *mp); mp = &m->rm_mklist) if (m->rm_bit >= b_leaf) break; t->rn_mklist = m; *mp = 0; } on2: /* Add new route to highest possible ancestor's list */ if ((netmask == 0) || (b > t->rn_bit )) return (tt); /* can't lift at all */ b_leaf = tt->rn_bit; do { x = t; t = t->rn_parent; } while (b <= t->rn_bit && x != top); /* * Search through routes associated with node to * insert new route according to index. * Need same criteria as when sorting dupedkeys to avoid * double loop on deletion. 
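 * That is, the mask list on an internal node is kept ordered from
 * most specific to least specific, using the same rn_refines()/
 * rn_lexobetter() criteria that ordered the dupedkey chain above.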
*/ for (mp = &x->rn_mklist; (m = *mp); mp = &m->rm_mklist) { if (m->rm_bit < b_leaf) continue; if (m->rm_bit > b_leaf) break; if (m->rm_flags & RNF_NORMAL) { mmask = m->rm_leaf->rn_mask; if (tt->rn_flags & RNF_NORMAL) { #if !defined(RADIX_MPATH) log(LOG_ERR, "Non-unique normal route, mask not entered\n"); #endif return (tt); } } else mmask = m->rm_mask; if (mmask == netmask) { m->rm_refs++; tt->rn_mklist = m; return (tt); } if (rn_refines(netmask, mmask) || rn_lexobetter(netmask, mmask)) break; } *mp = rn_new_radix_mask(tt, *mp); return (tt); } struct radix_node * -rn_delete(void *v_arg, void *netmask_arg, struct radix_node_head *head) +rn_delete(void *v_arg, void *netmask_arg, struct radix_head *head) { struct radix_node *t, *p, *x, *tt; struct radix_mask *m, *saved_m, **mp; struct radix_node *dupedkey, *saved_tt, *top; caddr_t v, netmask; int b, head_off, vlen; v = v_arg; netmask = netmask_arg; x = head->rnh_treetop; tt = rn_search(v, x); head_off = x->rn_offset; vlen = LEN(v); saved_tt = tt; top = x; if (tt == 0 || bcmp(v + head_off, tt->rn_key + head_off, vlen - head_off)) return (0); /* * Delete our route from mask lists. */ if (netmask) { x = rn_addmask(netmask, head->rnh_masks, 1, head_off); if (x == NULL) return (0); netmask = x->rn_key; while (tt->rn_mask != netmask) if ((tt = tt->rn_dupedkey) == 0) return (0); } if (tt->rn_mask == 0 || (saved_m = m = tt->rn_mklist) == 0) goto on1; if (tt->rn_flags & RNF_NORMAL) { if (m->rm_leaf != tt || m->rm_refs > 0) { log(LOG_ERR, "rn_delete: inconsistent annotation\n"); return (0); /* dangling ref could cause disaster */ } } else { if (m->rm_mask != tt->rn_mask) { log(LOG_ERR, "rn_delete: inconsistent annotation\n"); goto on1; } if (--m->rm_refs >= 0) goto on1; } b = -1 - tt->rn_bit; t = saved_tt->rn_parent; if (b > t->rn_bit) goto on1; /* Wasn't lifted at all */ do { x = t; t = t->rn_parent; } while (b <= t->rn_bit && x != top); for (mp = &x->rn_mklist; (m = *mp); mp = &m->rm_mklist) if (m == saved_m) { *mp = m->rm_mklist; R_Free(m); break; } if (m == 0) { log(LOG_ERR, "rn_delete: couldn't find our annotation\n"); if (tt->rn_flags & RNF_NORMAL) return (0); /* Dangling ref to us */ } on1: /* * Eliminate us from tree */ if (tt->rn_flags & RNF_ROOT) return (0); #ifdef RN_DEBUG /* Get us out of the creation list */ for (t = rn_clist; t && t->rn_ybro != tt; t = t->rn_ybro) {} if (t) t->rn_ybro = tt->rn_ybro; #endif t = tt->rn_parent; dupedkey = saved_tt->rn_dupedkey; if (dupedkey) { /* * Here, tt is the deletion target and * saved_tt is the head of the dupekey chain. */ if (tt == saved_tt) { /* remove from head of chain */ x = dupedkey; x->rn_parent = t; if (t->rn_left == tt) t->rn_left = x; else t->rn_right = x; } else { /* find node in front of tt on the chain */ for (x = p = saved_tt; p && p->rn_dupedkey != tt;) p = p->rn_dupedkey; if (p) { p->rn_dupedkey = tt->rn_dupedkey; if (tt->rn_dupedkey) /* parent */ tt->rn_dupedkey->rn_parent = p; /* parent */ } else log(LOG_ERR, "rn_delete: couldn't find us\n"); } t = tt + 1; if (t->rn_flags & RNF_ACTIVE) { #ifndef RN_DEBUG *++x = *t; p = t->rn_parent; #else b = t->rn_info; *++x = *t; t->rn_info = b; p = t->rn_parent; #endif if (p->rn_left == t) p->rn_left = x; else p->rn_right = x; x->rn_left->rn_parent = x; x->rn_right->rn_parent = x; } goto out; } if (t->rn_left == tt) x = t->rn_right; else x = t->rn_left; p = t->rn_parent; if (p->rn_right == t) p->rn_right = x; else p->rn_left = x; x->rn_parent = p; /* * Demote routes attached to us. 
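 * The leaf's companion internal node t is removed from the tree
 * along with it, so any radix_mask entries hung on t must be handed
 * down to the surviving sibling subtree x (or released) first.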
*/ if (t->rn_mklist) { if (x->rn_bit >= 0) { for (mp = &x->rn_mklist; (m = *mp);) mp = &m->rm_mklist; *mp = t->rn_mklist; } else { /* If there are any key,mask pairs in a sibling duped-key chain, some subset will appear sorted in the same order attached to our mklist */ for (m = t->rn_mklist; m && x; x = x->rn_dupedkey) if (m == x->rn_mklist) { struct radix_mask *mm = m->rm_mklist; x->rn_mklist = 0; if (--(m->rm_refs) < 0) R_Free(m); m = mm; } if (m) log(LOG_ERR, "rn_delete: Orphaned Mask %p at %p\n", m, x); } } /* * We may be holding an active internal node in the tree. */ x = tt + 1; if (t != x) { #ifndef RN_DEBUG *t = *x; #else b = t->rn_info; *t = *x; t->rn_info = b; #endif t->rn_left->rn_parent = t; t->rn_right->rn_parent = t; p = x->rn_parent; if (p->rn_left == x) p->rn_left = t; else p->rn_right = t; } out: tt->rn_flags &= ~RNF_ACTIVE; tt[1].rn_flags &= ~RNF_ACTIVE; return (tt); } /* * This is the same as rn_walktree() except for the parameters and the * exit. */ -static int -rn_walktree_from(struct radix_node_head *h, void *a, void *m, +int +rn_walktree_from(struct radix_head *h, void *a, void *m, walktree_f_t *f, void *w) { int error; struct radix_node *base, *next; u_char *xa = (u_char *)a; u_char *xm = (u_char *)m; struct radix_node *rn, *last = NULL; /* shut up gcc */ int stopping = 0; int lastb; KASSERT(m != NULL, ("%s: mask needs to be specified", __func__)); /* * rn_search_m is sort-of-open-coded here. We cannot use the * function because we need to keep track of the last node seen. */ /* printf("about to search\n"); */ for (rn = h->rnh_treetop; rn->rn_bit >= 0; ) { last = rn; /* printf("rn_bit %d, rn_bmask %x, xm[rn_offset] %x\n", rn->rn_bit, rn->rn_bmask, xm[rn->rn_offset]); */ if (!(rn->rn_bmask & xm[rn->rn_offset])) { break; } if (rn->rn_bmask & xa[rn->rn_offset]) { rn = rn->rn_right; } else { rn = rn->rn_left; } } /* printf("done searching\n"); */ /* * Two cases: either we stepped off the end of our mask, * in which case last == rn, or we reached a leaf, in which * case we want to start from the leaf. */ if (rn->rn_bit >= 0) rn = last; lastb = last->rn_bit; /* printf("rn %p, lastb %d\n", rn, lastb);*/ /* * This gets complicated because we may delete the node * while applying the function f to it, so we need to calculate * the successor node in advance. */ while (rn->rn_bit >= 0) rn = rn->rn_left; while (!stopping) { /* printf("node %p (%d)\n", rn, rn->rn_bit); */ base = rn; /* If at right child go back up, otherwise, go right */ while (rn->rn_parent->rn_right == rn && !(rn->rn_flags & RNF_ROOT)) { rn = rn->rn_parent; /* if went up beyond last, stop */ if (rn->rn_bit <= lastb) { stopping = 1; /* printf("up too far\n"); */ /* * XXX we should jump to the 'Process leaves' * part, because the values of 'rn' and 'next' * we compute will not be used. Not a big deal * because this loop will terminate, but it is * inefficient and hard to understand! */ } } /* * At the top of the tree, no need to traverse the right * half, prevent the traversal of the entire tree in the * case of default route. 
*/ if (rn->rn_parent->rn_flags & RNF_ROOT) stopping = 1; /* Find the next *leaf* since next node might vanish, too */ for (rn = rn->rn_parent->rn_right; rn->rn_bit >= 0;) rn = rn->rn_left; next = rn; /* Process leaves */ while ((rn = base) != 0) { base = rn->rn_dupedkey; /* printf("leaf %p\n", rn); */ if (!(rn->rn_flags & RNF_ROOT) && (error = (*f)(rn, w))) return (error); } rn = next; if (rn->rn_flags & RNF_ROOT) { /* printf("root, stopping"); */ stopping = 1; } } return (0); } -static int -rn_walktree(struct radix_node_head *h, walktree_f_t *f, void *w) +int +rn_walktree(struct radix_head *h, walktree_f_t *f, void *w) { int error; struct radix_node *base, *next; struct radix_node *rn = h->rnh_treetop; /* * This gets complicated because we may delete the node * while applying the function f to it, so we need to calculate * the successor node in advance. */ /* First time through node, go left */ while (rn->rn_bit >= 0) rn = rn->rn_left; for (;;) { base = rn; /* If at right child go back up, otherwise, go right */ while (rn->rn_parent->rn_right == rn && (rn->rn_flags & RNF_ROOT) == 0) rn = rn->rn_parent; /* Find the next *leaf* since next node might vanish, too */ for (rn = rn->rn_parent->rn_right; rn->rn_bit >= 0;) rn = rn->rn_left; next = rn; /* Process leaves */ while ((rn = base)) { base = rn->rn_dupedkey; if (!(rn->rn_flags & RNF_ROOT) && (error = (*f)(rn, w))) return (error); } rn = next; if (rn->rn_flags & RNF_ROOT) return (0); } /* NOTREACHED */ } /* - * Allocate and initialize an empty tree. This has 3 nodes, which are - * part of the radix_node_head (in the order <left,root,right>) and are + * Initialize an empty tree. This has 3 nodes, which are passed + * via base_nodes (in the order <left,root,right>) and are * marked RNF_ROOT so they cannot be freed. * The leaves have all-zero and all-one keys, with significant * bits starting at 'off'. - * Return 1 on success, 0 on error. */ -static int -rn_inithead_internal(void **head, int off) +void +rn_inithead_internal(struct radix_head *rh, struct radix_node *base_nodes, int off) { - struct radix_node_head *rnh; struct radix_node *t, *tt, *ttt; - if (*head) - return (1); - R_Zalloc(rnh, struct radix_node_head *, sizeof (*rnh)); - if (rnh == 0) - return (0); - *head = rnh; - t = rn_newpair(rn_zeros, off, rnh->rnh_nodes); - ttt = rnh->rnh_nodes + 2; + + t = rn_newpair(rn_zeros, off, base_nodes); + ttt = base_nodes + 2; t->rn_right = ttt; t->rn_parent = t; - tt = t->rn_left; /* ... which in turn is rnh->rnh_nodes */ + tt = t->rn_left; /* ... which in turn is base_nodes */ tt->rn_flags = t->rn_flags = RNF_ROOT | RNF_ACTIVE; tt->rn_bit = -1 - off; *ttt = *tt; ttt->rn_key = rn_ones; - rnh->rnh_addaddr = rn_addroute; - rnh->rnh_deladdr = rn_delete; - rnh->rnh_matchaddr = rn_match; - rnh->rnh_lookup = rn_lookup; - rnh->rnh_walktree = rn_walktree; - rnh->rnh_walktree_from = rn_walktree_from; - rnh->rnh_treetop = t; - return (1); + + rh->rnh_treetop = t; } static void -rn_detachhead_internal(void **head) +rn_detachhead_internal(struct radix_head *head) { - struct radix_node_head *rnh; - KASSERT((head != NULL && *head != NULL), + KASSERT((head != NULL), ("%s: head already freed", __func__)); - rnh = *head; /* Free nodes.
*/ - R_Free(rnh); - - *head = NULL; + R_Free(head); } +/* Functions used by 'struct radix_node_head' users */ + int rn_inithead(void **head, int off) { struct radix_node_head *rnh; + struct radix_mask_head *rmh; + rnh = *head; + rmh = NULL; + if (*head != NULL) return (1); - if (rn_inithead_internal(head, off) == 0) + R_Zalloc(rnh, struct radix_node_head *, sizeof (*rnh)); + R_Zalloc(rmh, struct radix_mask_head *, sizeof (*rmh)); + if (rnh == NULL || rmh == NULL) { + if (rnh != NULL) + R_Free(rnh); + if (rmh != NULL) + R_Free(rmh); return (0); + } - rnh = (struct radix_node_head *)(*head); + /* Init trees */ + rn_inithead_internal(&rnh->rh, rnh->rnh_nodes, off); + rn_inithead_internal(&rmh->head, rmh->mask_nodes, 0); + *head = rnh; + rnh->rh.rnh_masks = rmh; - if (rn_inithead_internal((void **)&rnh->rnh_masks, 0) == 0) { - rn_detachhead_internal(head); - return (0); - } + /* Finally, set base callbacks */ + rnh->rnh_addaddr = rn_addroute; + rnh->rnh_deladdr = rn_delete; + rnh->rnh_matchaddr = rn_match; + rnh->rnh_lookup = rn_lookup; + rnh->rnh_walktree = rn_walktree; + rnh->rnh_walktree_from = rn_walktree_from; return (1); } static int rn_freeentry(struct radix_node *rn, void *arg) { - struct radix_node_head * const rnh = arg; + struct radix_head * const rnh = arg; struct radix_node *x; x = (struct radix_node *)rn_delete(rn + 2, NULL, rnh); if (x != NULL) R_Free(x); return (0); } int rn_detachhead(void **head) { struct radix_node_head *rnh; KASSERT((head != NULL && *head != NULL), ("%s: head already freed", __func__)); - rnh = *head; + rnh = (struct radix_node_head *)(*head); - rn_walktree(rnh->rnh_masks, rn_freeentry, rnh->rnh_masks); - rn_detachhead_internal((void **)&rnh->rnh_masks); - rn_detachhead_internal(head); + rn_walktree(&rnh->rh.rnh_masks->head, rn_freeentry, rnh->rh.rnh_masks); + rn_detachhead_internal(&rnh->rh.rnh_masks->head); + rn_detachhead_internal(&rnh->rh); + + *head = NULL; + return (1); } Index: projects/clang380-import/sys/net/radix.h =================================================================== --- projects/clang380-import/sys/net/radix.h (revision 294776) +++ projects/clang380-import/sys/net/radix.h (revision 294777) @@ -1,168 +1,187 @@ /*- * Copyright (c) 1988, 1989, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)radix.h 8.2 (Berkeley) 10/31/94 * $FreeBSD$ */ #ifndef _RADIX_H_ #define _RADIX_H_ #ifdef _KERNEL #include #include #include #endif #ifdef MALLOC_DECLARE MALLOC_DECLARE(M_RTABLE); #endif /* * Radix search tree node layout. */ struct radix_node { struct radix_mask *rn_mklist; /* list of masks contained in subtree */ struct radix_node *rn_parent; /* parent */ short rn_bit; /* bit offset; -1-index(netmask) */ char rn_bmask; /* node: mask for bit test*/ u_char rn_flags; /* enumerated next */ #define RNF_NORMAL 1 /* leaf contains normal route */ #define RNF_ROOT 2 /* leaf is root leaf for tree */ #define RNF_ACTIVE 4 /* This node is alive (for rtfree) */ union { struct { /* leaf only data: */ caddr_t rn_Key; /* object of search */ caddr_t rn_Mask; /* netmask, if present */ struct radix_node *rn_Dupedkey; } rn_leaf; struct { /* node only data: */ int rn_Off; /* where to start compare */ struct radix_node *rn_L;/* progeny */ struct radix_node *rn_R;/* progeny */ } rn_node; } rn_u; #ifdef RN_DEBUG int rn_info; struct radix_node *rn_twin; struct radix_node *rn_ybro; #endif }; #define rn_dupedkey rn_u.rn_leaf.rn_Dupedkey #define rn_key rn_u.rn_leaf.rn_Key #define rn_mask rn_u.rn_leaf.rn_Mask #define rn_offset rn_u.rn_node.rn_Off #define rn_left rn_u.rn_node.rn_L #define rn_right rn_u.rn_node.rn_R /* * Annotations to tree concerning potential routes applying to subtrees. */ struct radix_mask { short rm_bit; /* bit offset; -1-index(netmask) */ char rm_unused; /* cf. rn_bmask */ u_char rm_flags; /* cf. rn_flags */ struct radix_mask *rm_mklist; /* more masks to try */ union { caddr_t rmu_mask; /* the mask */ struct radix_node *rmu_leaf; /* for normal routes */ } rm_rmu; int rm_refs; /* # of references to this struct */ }; #define rm_mask rm_rmu.rmu_mask #define rm_leaf rm_rmu.rmu_leaf /* extra field would make 32 bytes */ +struct radix_head; + typedef int walktree_f_t(struct radix_node *, void *); +typedef struct radix_node *rn_matchaddr_f_t(void *v, + struct radix_head *head); +typedef struct radix_node *rn_addaddr_f_t(void *v, void *mask, + struct radix_head *head, struct radix_node nodes[]); +typedef struct radix_node *rn_deladdr_f_t(void *v, void *mask, + struct radix_head *head); +typedef struct radix_node *rn_lookup_f_t(void *v, void *mask, + struct radix_head *head); +typedef int rn_walktree_t(struct radix_head *head, walktree_f_t *f, + void *w); +typedef int rn_walktree_from_t(struct radix_head *head, + void *a, void *m, walktree_f_t *f, void *w); +typedef void rn_close_t(struct radix_node *rn, struct radix_head *head); -struct radix_node_head { +struct radix_mask_head; + +struct radix_head { struct radix_node *rnh_treetop; - u_int rnh_gen; /* generation counter */ - int rnh_multipath; /* multipath capable ? 
*/ - struct radix_node *(*rnh_addaddr) /* add based on sockaddr */ - (void *v, void *mask, - struct radix_node_head *head, struct radix_node nodes[]); - struct radix_node *(*rnh_deladdr) /* remove based on sockaddr */ - (void *v, void *mask, struct radix_node_head *head); - struct radix_node *(*rnh_matchaddr) /* longest match for sockaddr */ - (void *v, struct radix_node_head *head); - struct radix_node *(*rnh_lookup) /*exact match for sockaddr*/ - (void *v, void *mask, struct radix_node_head *head); - int (*rnh_walktree) /* traverse tree */ - (struct radix_node_head *head, walktree_f_t *f, void *w); - int (*rnh_walktree_from) /* traverse tree below a */ - (struct radix_node_head *head, void *a, void *m, - walktree_f_t *f, void *w); - void (*rnh_close) /* do something when the last ref drops */ - (struct radix_node *rn, struct radix_node_head *head); + struct radix_mask_head *rnh_masks; /* Storage for our masks */ +}; + +struct radix_node_head { + struct radix_head rh; + rn_matchaddr_f_t *rnh_matchaddr; /* longest match for sockaddr */ + rn_addaddr_f_t *rnh_addaddr; /* add based on sockaddr*/ + rn_deladdr_f_t *rnh_deladdr; /* remove based on sockaddr */ + rn_lookup_f_t *rnh_lookup; /* exact match for sockaddr */ + rn_walktree_t *rnh_walktree; /* traverse tree */ + rn_walktree_from_t *rnh_walktree_from; /* traverse tree below a */ + rn_close_t *rnh_close; /*do something when the last ref drops*/ struct radix_node rnh_nodes[3]; /* empty tree for common case */ - struct radix_node_head *rnh_masks; /* Storage for our masks */ #ifdef _KERNEL struct rwlock rnh_lock; /* locks entire radix tree */ #endif }; +struct radix_mask_head { + struct radix_head head; + struct radix_node mask_nodes[3]; +}; + +void rn_inithead_internal(struct radix_head *rh, struct radix_node *base_nodes, + int off); + #ifndef _KERNEL #define R_Malloc(p, t, n) (p = (t) malloc((unsigned int)(n))) #define R_Zalloc(p, t, n) (p = (t) calloc(1,(unsigned int)(n))) #define R_Free(p) free((char *)p); #else #define R_Malloc(p, t, n) (p = (t) malloc((unsigned long)(n), M_RTABLE, M_NOWAIT)) #define R_Zalloc(p, t, n) (p = (t) malloc((unsigned long)(n), M_RTABLE, M_NOWAIT | M_ZERO)) #define R_Free(p) free((caddr_t)p, M_RTABLE); #define RADIX_NODE_HEAD_LOCK_INIT(rnh) \ rw_init_flags(&(rnh)->rnh_lock, "radix node head", 0) #define RADIX_NODE_HEAD_LOCK(rnh) rw_wlock(&(rnh)->rnh_lock) #define RADIX_NODE_HEAD_UNLOCK(rnh) rw_wunlock(&(rnh)->rnh_lock) #define RADIX_NODE_HEAD_RLOCK(rnh) rw_rlock(&(rnh)->rnh_lock) #define RADIX_NODE_HEAD_RUNLOCK(rnh) rw_runlock(&(rnh)->rnh_lock) #define RADIX_NODE_HEAD_LOCK_TRY_UPGRADE(rnh) rw_try_upgrade(&(rnh)->rnh_lock) #define RADIX_NODE_HEAD_DESTROY(rnh) rw_destroy(&(rnh)->rnh_lock) #define RADIX_NODE_HEAD_LOCK_ASSERT(rnh) rw_assert(&(rnh)->rnh_lock, RA_LOCKED) #define RADIX_NODE_HEAD_WLOCK_ASSERT(rnh) rw_assert(&(rnh)->rnh_lock, RA_WLOCKED) #endif /* _KERNEL */ int rn_inithead(void **, int); int rn_detachhead(void **); int rn_refines(void *, void *); -struct radix_node - *rn_addmask(void *, struct radix_node_head *, int, int), - *rn_addroute (void *, void *, struct radix_node_head *, - struct radix_node [2]), - *rn_delete(void *, void *, struct radix_node_head *), - *rn_lookup (void *v_arg, void *m_arg, - struct radix_node_head *head), - *rn_match(void *, struct radix_node_head *); +struct radix_node *rn_addroute(void *, void *, struct radix_head *, + struct radix_node[2]); +struct radix_node *rn_delete(void *, void *, struct radix_head *); +struct radix_node *rn_lookup (void *v_arg, void *m_arg, + struct 
radix_head *head); +struct radix_node *rn_match(void *, struct radix_head *); +int rn_walktree_from(struct radix_head *h, void *a, void *m, + walktree_f_t *f, void *w); +int rn_walktree(struct radix_head *, walktree_f_t *, void *); #endif /* _RADIX_H_ */ Index: projects/clang380-import/sys/net/radix_mpath.c =================================================================== --- projects/clang380-import/sys/net/radix_mpath.c (revision 294776) +++ projects/clang380-import/sys/net/radix_mpath.c (revision 294777) @@ -1,314 +1,322 @@ /* $KAME: radix_mpath.c,v 1.17 2004/11/08 10:29:39 itojun Exp $ */ /* * Copyright (C) 2001 WIDE Project. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * THE AUTHORS DO NOT GUARANTEE THAT THIS SOFTWARE DOES NOT INFRINGE * ANY OTHERS' INTELLECTUAL PROPERTIES. IN NO EVENT SHALL THE AUTHORS * BE LIABLE FOR ANY INFRINGEMENT OF ANY OTHERS' INTELLECTUAL * PROPERTIES. 
*/ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include +#include #include #include /* * give some jitter to hash, to avoid synchronization between routers */ static uint32_t hashjitter; int -rn_mpath_capable(struct radix_node_head *rnh) +rt_mpath_capable(struct rib_head *rnh) { return rnh->rnh_multipath; } +int +rn_mpath_capable(struct radix_head *rh) +{ + + return (rt_mpath_capable((struct rib_head *)rh)); +} + struct radix_node * rn_mpath_next(struct radix_node *rn) { struct radix_node *next; if (!rn->rn_dupedkey) return NULL; next = rn->rn_dupedkey; if (rn->rn_mask == next->rn_mask) return next; else return NULL; } uint32_t rn_mpath_count(struct radix_node *rn) { uint32_t i = 0; struct rtentry *rt; while (rn != NULL) { rt = (struct rtentry *)rn; i += rt->rt_weight; rn = rn_mpath_next(rn); } return (i); } struct rtentry * rt_mpath_matchgate(struct rtentry *rt, struct sockaddr *gate) { struct radix_node *rn; if (!gate || !rt->rt_gateway) return NULL; /* beyond here, we use rn as the master copy */ rn = (struct radix_node *)rt; do { rt = (struct rtentry *)rn; /* * we are removing an address alias that has * the same prefix as another address * we need to compare the interface address because * rt_gateway is a special sockaddr_dl structure */ if (rt->rt_gateway->sa_family == AF_LINK) { if (!memcmp(rt->rt_ifa->ifa_addr, gate, gate->sa_len)) break; } /* * Check for other options: * 1) Routes with 'real' IPv4/IPv6 gateway * 2) Loopback host routes (another AF_LINK/sockaddr_dl check) * */ if (rt->rt_gateway->sa_len == gate->sa_len && !memcmp(rt->rt_gateway, gate, gate->sa_len)) break; } while ((rn = rn_mpath_next(rn)) != NULL); return (struct rtentry *)rn; } /* * go through the chain and unlink "rt" from the list * the caller will free "rt" */ int rt_mpath_deldup(struct rtentry *headrt, struct rtentry *rt) { struct radix_node *t, *tt; if (!headrt || !rt) return (0); t = (struct radix_node *)headrt; tt = rn_mpath_next(t); while (tt) { if (tt == (struct radix_node *)rt) { t->rn_dupedkey = tt->rn_dupedkey; tt->rn_dupedkey = NULL; tt->rn_flags &= ~RNF_ACTIVE; tt[1].rn_flags &= ~RNF_ACTIVE; return (1); } t = tt; tt = rn_mpath_next((struct radix_node *)t); } return (0); } /* * check if we have the same key/mask/gateway on the table already. * Assume @rt rt_key host bits are cleared according to @netmask */ int -rt_mpath_conflict(struct radix_node_head *rnh, struct rtentry *rt, +rt_mpath_conflict(struct rib_head *rnh, struct rtentry *rt, struct sockaddr *netmask) { struct radix_node *rn, *rn1; struct rtentry *rt1; rn = (struct radix_node *)rt; - rn1 = rnh->rnh_lookup(rt_key(rt), netmask, rnh); + rn1 = rnh->rnh_lookup(rt_key(rt), netmask, &rnh->head); if (!rn1 || rn1->rn_flags & RNF_ROOT) return (0); /* key/mask are the same. compare gateway for all multipaths */ do { rt1 = (struct rtentry *)rn1; /* sanity: no use in comparing the same thing */ if (rn1 == rn) continue; if (rt1->rt_gateway->sa_family == AF_LINK) { if (rt1->rt_ifa->ifa_addr->sa_len != rt->rt_ifa->ifa_addr->sa_len || bcmp(rt1->rt_ifa->ifa_addr, rt->rt_ifa->ifa_addr, rt1->rt_ifa->ifa_addr->sa_len)) continue; } else { if (rt1->rt_gateway->sa_len != rt->rt_gateway->sa_len || bcmp(rt1->rt_gateway, rt->rt_gateway, rt1->rt_gateway->sa_len)) continue; } /* all key/mask/gateway are the same. conflicting entry.
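The functions above all lean on one invariant: routes sharing a key sit on the rn_dupedkey chain, and rn_mpath_next() stops as soon as the mask changes, so a walk stays within a single prefix. The resulting idiom, sketched with illustrative variable names:

    struct radix_node *rn;
    struct rtentry *sibling;

    /* Visit rt and every multipath sibling sharing its key and mask. */
    for (rn = (struct radix_node *)rt; rn != NULL; rn = rn_mpath_next(rn)) {
            sibling = (struct rtentry *)rn;
            /* ... compare sibling->rt_gateway, sum sibling->rt_weight, ... */
    }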
*/ return (EEXIST); } while ((rn1 = rn_mpath_next(rn1)) != NULL); return (0); } static struct rtentry * rt_mpath_selectrte(struct rtentry *rte, uint32_t hash) { struct radix_node *rn0, *rn; uint32_t total_weight; struct rtentry *rt; int64_t weight; /* beyond here, we use rn as the master copy */ rn0 = rn = (struct radix_node *)rte; rt = rte; /* gw selection by Modulo-N Hash (RFC2991) XXX need improvement? */ total_weight = rn_mpath_count(rn0); hash += hashjitter; hash %= total_weight; for (weight = abs((int32_t)hash); rt != NULL && weight >= rt->rt_weight; weight -= rt->rt_weight) { /* stay within the multipath routes */ if (rn->rn_dupedkey && rn->rn_mask != rn->rn_dupedkey->rn_mask) break; rn = rn->rn_dupedkey; rt = (struct rtentry *)rn; } return (rt); } struct rtentry * rt_mpath_select(struct rtentry *rte, uint32_t hash) { if (rn_mpath_next((struct radix_node *)rte) == NULL) return (rte); return (rt_mpath_selectrte(rte, hash)); } void rtalloc_mpath_fib(struct route *ro, uint32_t hash, u_int fibnum) { struct rtentry *rt; /* * XXX we don't attempt to lookup cached route again; what should * be done for sendto(3) case? */ if (ro->ro_rt && ro->ro_rt->rt_ifp && (ro->ro_rt->rt_flags & RTF_UP) && RT_LINK_IS_UP(ro->ro_rt->rt_ifp)) return; ro->ro_rt = rtalloc1_fib(&ro->ro_dst, 1, 0, fibnum); /* if the route does not exist or it is not multipath, don't care */ if (ro->ro_rt == NULL) return; if (rn_mpath_next((struct radix_node *)ro->ro_rt) == NULL) { RT_UNLOCK(ro->ro_rt); return; } rt = rt_mpath_selectrte(ro->ro_rt, hash); /* XXX try filling rt_gwroute and avoid unreachable gw */ /* gw selection has failed - there must be only zero weight routes */ if (!rt) { RT_UNLOCK(ro->ro_rt); ro->ro_rt = NULL; return; } if (ro->ro_rt != rt) { RTFREE_LOCKED(ro->ro_rt); ro->ro_rt = rt; RT_LOCK(ro->ro_rt); RT_ADDREF(ro->ro_rt); } RT_UNLOCK(ro->ro_rt); } extern int in6_inithead(void **head, int off); extern int in_inithead(void **head, int off); #ifdef INET int rn4_mpath_inithead(void **head, int off) { - struct radix_node_head *rnh; + struct rib_head *rnh; hashjitter = arc4random(); if (in_inithead(head, off) == 1) { - rnh = (struct radix_node_head *)*head; + rnh = (struct rib_head *)*head; rnh->rnh_multipath = 1; return 1; } else return 0; } #endif #ifdef INET6 int rn6_mpath_inithead(void **head, int off) { - struct radix_node_head *rnh; + struct rib_head *rnh; hashjitter = arc4random(); if (in6_inithead(head, off) == 1) { - rnh = (struct radix_node_head *)*head; + rnh = (struct rib_head *)*head; rnh->rnh_multipath = 1; return 1; } else return 0; } #endif Index: projects/clang380-import/sys/net/radix_mpath.h =================================================================== --- projects/clang380-import/sys/net/radix_mpath.h (revision 294776) +++ projects/clang380-import/sys/net/radix_mpath.h (revision 294777) @@ -1,64 +1,63 @@ /* $KAME: radix_mpath.h,v 1.10 2004/11/06 15:44:28 itojun Exp $ */ /* * Copyright (C) 2001 WIDE Project. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. 
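To make the Modulo-N gateway selection in rt_mpath_selectrte() concrete: with three sibling routes of weights 3, 2 and 1, rn_mpath_count() yields a total_weight of 6, so after the jitter is mixed in, hash % 6 values 0-2 settle on the first route, 3-4 on the second, and 5 on the third. Every packet carrying the same flow hash therefore keeps picking the same next hop, in proportion to the configured weights.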
Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * THE AUTHORS DO NOT GUARANTEE THAT THIS SOFTWARE DOES NOT INFRINGE * ANY OTHERS' INTELLECTUAL PROPERTIES. IN NO EVENT SHALL THE AUTHORS * BE LIABLE FOR ANY INFRINGEMENT OF ANY OTHERS' INTELLECTUAL * PROPERTIES. */ /* $FreeBSD$ */ #ifndef _NET_RADIX_MPATH_H_ #define _NET_RADIX_MPATH_H_ #ifdef _KERNEL /* * Radix tree API with multipath support */ struct route; struct rtentry; struct sockaddr; -int rn_mpath_capable(struct radix_node_head *); +struct rib_head; +int rt_mpath_capable(struct rib_head *); +int rn_mpath_capable(struct radix_head *); struct radix_node *rn_mpath_next(struct radix_node *); u_int32_t rn_mpath_count(struct radix_node *); struct rtentry *rt_mpath_matchgate(struct rtentry *, struct sockaddr *); -int rt_mpath_conflict(struct radix_node_head *, struct rtentry *, +int rt_mpath_conflict(struct rib_head *, struct rtentry *, struct sockaddr *); void rtalloc_mpath_fib(struct route *, u_int32_t, u_int); -#define rtalloc_mpath(_route, _hash) rtalloc_mpath_fib((_route), (_hash), 0) struct rtentry *rt_mpath_select(struct rtentry *, uint32_t); -struct radix_node *rn_mpath_lookup(void *, void *, - struct radix_node_head *); int rt_mpath_deldup(struct rtentry *, struct rtentry *); int rn4_mpath_inithead(void **, int); int rn6_mpath_inithead(void **, int); #endif #endif /* _NET_RADIX_MPATH_H_ */ Index: projects/clang380-import/sys/net/route.c =================================================================== --- projects/clang380-import/sys/net/route.c (revision 294776) +++ projects/clang380-import/sys/net/route.c (revision 294777) @@ -1,2241 +1,2279 @@ /*- * Copyright (c) 1980, 1986, 1991, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)route.c 8.3.1.1 (Berkeley) 2/23/95 * $FreeBSD$ */ /************************************************************************ * Note: In this file a 'fib' is a "forwarding information base" * * Which is the new name for an in kernel routing (next hop) table. * ***********************************************************************/ #include "opt_inet.h" #include "opt_inet6.h" #include "opt_route.h" #include "opt_sctp.h" #include "opt_mrouting.h" #include "opt_mpath.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #ifdef RADIX_MPATH #include #endif #include #include #include #define RT_MAXFIBS UINT16_MAX /* Kernel config default option. */ #ifdef ROUTETABLES #if ROUTETABLES <= 0 #error "ROUTETABLES defined too low" #endif #if ROUTETABLES > RT_MAXFIBS #error "ROUTETABLES defined too big" #endif #define RT_NUMFIBS ROUTETABLES #endif /* ROUTETABLES */ /* Initialize to default if not otherwise set. */ #ifndef RT_NUMFIBS #define RT_NUMFIBS 1 #endif #if defined(INET) || defined(INET6) #ifdef SCTP extern void sctp_addr_change(struct ifaddr *ifa, int cmd); #endif /* SCTP */ #endif /* This is read-only.. */ u_int rt_numfibs = RT_NUMFIBS; SYSCTL_UINT(_net, OID_AUTO, fibs, CTLFLAG_RDTUN, &rt_numfibs, 0, ""); /* * By default add routes to all fibs for new interfaces. * Once this is set to 0 then only allocate routes on interface * changes for the FIB of the caller when adding a new set of addresses * to an interface. XXX this is a shotgun approach to a problem that needs * a more fine grained solution.. that will come. * XXX also has the problems getting the FIB from curthread which will not * always work given the fib can be overridden and prefixes can be added * from the network stack context. */ VNET_DEFINE(u_int, rt_add_addr_allfibs) = 1; SYSCTL_UINT(_net, OID_AUTO, add_addr_allfibs, CTLFLAG_RWTUN | CTLFLAG_VNET, &VNET_NAME(rt_add_addr_allfibs), 0, ""); VNET_DEFINE(struct rtstat, rtstat); #define V_rtstat VNET(rtstat) -VNET_DEFINE(struct radix_node_head *, rt_tables); +VNET_DEFINE(struct rib_head *, rt_tables); #define V_rt_tables VNET(rt_tables) VNET_DEFINE(int, rttrash); /* routes not in table but not freed */ #define V_rttrash VNET(rttrash) /* * Convert a 'struct radix_node *' to a 'struct rtentry *'. * The operation can be done safely (in this code) because a * 'struct rtentry' starts with two 'struct radix_node''s, the first * one representing leaf nodes in the routing tree, which is * what the code in radix.c passes us as a 'struct radix_node'. * * But because there are a lot of assumptions in this conversion, * do not cast explicitly, but always use the macro below.
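Abridged, the layout that conversion relies on looks like this (see struct rtentry in net/route.h; only the leading member matters here):

    struct rtentry {
            struct radix_node rt_nodes[2];  /* tree glue; rt_nodes[0] is the leaf */
            /* ... the remaining rtentry fields ... */
    };

    /* so a leaf radix_node pointer doubles as the rtentry: */
    rt = RNTORT(rn);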
*/ #define RNTORT(p) ((struct rtentry *)(p)) static VNET_DEFINE(uma_zone_t, rtzone); /* Routing table UMA zone. */ #define V_rtzone VNET(rtzone) -static int rtrequest1_fib_change(struct radix_node_head *, struct rt_addrinfo *, +static int rtrequest1_fib_change(struct rib_head *, struct rt_addrinfo *, struct rtentry **, u_int); static void rt_setmetrics(const struct rt_addrinfo *, struct rtentry *); static int rt_ifdelroute(const struct rtentry *rt, void *arg); -static struct rtentry *rt_unlinkrte(struct radix_node_head *rnh, +static struct rtentry *rt_unlinkrte(struct rib_head *rnh, struct rt_addrinfo *info, int *perror); static void rt_notifydelete(struct rtentry *rt, struct rt_addrinfo *info); #ifdef RADIX_MPATH -static struct radix_node *rt_mpath_unlink(struct radix_node_head *rnh, +static struct radix_node *rt_mpath_unlink(struct rib_head *rnh, struct rt_addrinfo *info, struct rtentry *rto, int *perror); #endif static int rt_exportinfo(struct rtentry *rt, struct rt_addrinfo *info, int flags); struct if_mtuinfo { struct ifnet *ifp; int mtu; }; static int if_updatemtu_cb(struct radix_node *, void *); /* * handler for net.my_fibnum */ static int sysctl_my_fibnum(SYSCTL_HANDLER_ARGS) { int fibnum; int error; fibnum = curthread->td_proc->p_fibnum; error = sysctl_handle_int(oidp, &fibnum, 0, req); return (error); } SYSCTL_PROC(_net, OID_AUTO, my_fibnum, CTLTYPE_INT|CTLFLAG_RD, NULL, 0, &sysctl_my_fibnum, "I", "default FIB of caller"); -static __inline struct radix_node_head ** +static __inline struct rib_head ** rt_tables_get_rnh_ptr(int table, int fam) { - struct radix_node_head **rnh; + struct rib_head **rnh; KASSERT(table >= 0 && table < rt_numfibs, ("%s: table out of bounds.", __func__)); KASSERT(fam >= 0 && fam < (AF_MAX+1), ("%s: fam out of bounds.", __func__)); /* rnh is [fib=0][af=0]. */ - rnh = (struct radix_node_head **)V_rt_tables; + rnh = (struct rib_head **)V_rt_tables; /* Get the offset to the requested table and fam. */ rnh += table * (AF_MAX+1) + fam; return (rnh); } -struct radix_node_head * +struct rib_head * rt_tables_get_rnh(int table, int fam) { return (*rt_tables_get_rnh_ptr(table, fam)); } /* * route initialization must occur before ip6_init2(), which happens at * SI_ORDER_MIDDLE. */ static void route_init(void) { /* whack the tunable ints into line.
*/ if (rt_numfibs > RT_MAXFIBS) rt_numfibs = RT_MAXFIBS; if (rt_numfibs == 0) rt_numfibs = 1; } SYSINIT(route_init, SI_SUB_PROTO_DOMAIN, SI_ORDER_THIRD, route_init, 0); static int rtentry_zinit(void *mem, int size, int how) { struct rtentry *rt = mem; rt->rt_pksent = counter_u64_alloc(how); if (rt->rt_pksent == NULL) return (ENOMEM); RT_LOCK_INIT(rt); return (0); } static void rtentry_zfini(void *mem, int size) { struct rtentry *rt = mem; RT_LOCK_DESTROY(rt); counter_u64_free(rt->rt_pksent); } static int rtentry_ctor(void *mem, int size, void *arg, int how) { struct rtentry *rt = mem; bzero(rt, offsetof(struct rtentry, rt_endzero)); counter_u64_zero(rt->rt_pksent); rt->rt_chain = NULL; return (0); } static void rtentry_dtor(void *mem, int size, void *arg) { struct rtentry *rt = mem; RT_UNLOCK_COND(rt); } static void vnet_route_init(const void *unused __unused) { struct domain *dom; - struct radix_node_head **rnh; + struct rib_head **rnh; int table; int fam; V_rt_tables = malloc(rt_numfibs * (AF_MAX+1) * - sizeof(struct radix_node_head *), M_RTABLE, M_WAITOK|M_ZERO); + sizeof(struct rib_head *), M_RTABLE, M_WAITOK|M_ZERO); V_rtzone = uma_zcreate("rtentry", sizeof(struct rtentry), rtentry_ctor, rtentry_dtor, rtentry_zinit, rtentry_zfini, UMA_ALIGN_PTR, 0); for (dom = domains; dom; dom = dom->dom_next) { if (dom->dom_rtattach == NULL) continue; for (table = 0; table < rt_numfibs; table++) { fam = dom->dom_family; if (table != 0 && fam != AF_INET6 && fam != AF_INET) break; rnh = rt_tables_get_rnh_ptr(table, fam); if (rnh == NULL) panic("%s: rnh NULL", __func__); dom->dom_rtattach((void **)rnh, 0); } } } VNET_SYSINIT(vnet_route_init, SI_SUB_PROTO_DOMAIN, SI_ORDER_FOURTH, vnet_route_init, 0); #ifdef VIMAGE static void vnet_route_uninit(const void *unused __unused) { int table; int fam; struct domain *dom; - struct radix_node_head **rnh; + struct rib_head **rnh; for (dom = domains; dom; dom = dom->dom_next) { if (dom->dom_rtdetach == NULL) continue; for (table = 0; table < rt_numfibs; table++) { fam = dom->dom_family; if (table != 0 && fam != AF_INET6 && fam != AF_INET) break; rnh = rt_tables_get_rnh_ptr(table, fam); if (rnh == NULL) panic("%s: rnh NULL", __func__); dom->dom_rtdetach((void **)rnh, 0); } } free(V_rt_tables, M_RTABLE); uma_zdestroy(V_rtzone); } VNET_SYSUNINIT(vnet_route_uninit, SI_SUB_PROTO_DOMAIN, SI_ORDER_THIRD, vnet_route_uninit, 0); #endif +struct rib_head * +rt_table_init(int offset) +{ + struct rib_head *rh; + + rh = malloc(sizeof(struct rib_head), M_RTABLE, M_WAITOK | M_ZERO); + + /* TODO: These details should be hidden inside radix.c */ + /* Init masks tree */ + rn_inithead_internal(&rh->head, rh->rnh_nodes, offset); + rn_inithead_internal(&rh->rmhead.head, rh->rmhead.mask_nodes, 0); + rh->head.rnh_masks = &rh->rmhead; + + /* Init locks */ + rw_init(&rh->rib_lock, "rib head lock"); + + /* Finally, set base callbacks */ + rh->rnh_addaddr = rn_addroute; + rh->rnh_deladdr = rn_delete; + rh->rnh_matchaddr = rn_match; + rh->rnh_lookup = rn_lookup; + rh->rnh_walktree = rn_walktree; + rh->rnh_walktree_from = rn_walktree_from; + + return (rh); +} + +void +rt_table_destroy(struct rib_head *rh) +{ + + /* Assume table is already empty */ + rw_destroy(&rh->rib_lock); + free(rh, M_RTABLE); +} + + #ifndef _SYS_SYSPROTO_H_ struct setfib_args { int fibnum; }; #endif int sys_setfib(struct thread *td, struct setfib_args *uap) { if (uap->fibnum < 0 || uap->fibnum >= rt_numfibs) return EINVAL; td->td_proc->p_fibnum = uap->fibnum; return (0); } /* * Packet routing routines.
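rt_table_init() above is the rib_head counterpart of rn_inithead(); since the generic radix_head is embedded (as 'head') in struct rib_head, callers hand &rh->head to the rn_*() primitives. A sketch of the intended pairing (the IPv4 offset is illustrative):

    struct rib_head *rh;

    rh = rt_table_init(8 * offsetof(struct sockaddr_in, sin_addr));
    /* M_WAITOK: rt_table_init() cannot return NULL */
    /* ... rh->rnh_addaddr(ndst, netmask, &rh->head, rt->rt_nodes), ... */
    rt_table_destroy(rh);   /* only once the table is empty again */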
*/ void rtalloc_ign_fib(struct route *ro, u_long ignore, u_int fibnum) { struct rtentry *rt; if ((rt = ro->ro_rt) != NULL) { if (rt->rt_ifp != NULL && rt->rt_flags & RTF_UP) return; RTFREE(rt); ro->ro_rt = NULL; } ro->ro_rt = rtalloc1_fib(&ro->ro_dst, 1, ignore, fibnum); if (ro->ro_rt) RT_UNLOCK(ro->ro_rt); } /* * Look up the route that matches the address given * Or, at least try.. Create a cloned route if needed. * * The returned route, if any, is locked. */ struct rtentry * rtalloc1(struct sockaddr *dst, int report, u_long ignflags) { return (rtalloc1_fib(dst, report, ignflags, RT_DEFAULT_FIB)); } struct rtentry * rtalloc1_fib(struct sockaddr *dst, int report, u_long ignflags, u_int fibnum) { - struct radix_node_head *rnh; + struct rib_head *rh; struct radix_node *rn; struct rtentry *newrt; struct rt_addrinfo info; int err = 0, msgtype = RTM_MISS; KASSERT((fibnum < rt_numfibs), ("rtalloc1_fib: bad fibnum")); - rnh = rt_tables_get_rnh(fibnum, dst->sa_family); + rh = rt_tables_get_rnh(fibnum, dst->sa_family); newrt = NULL; - if (rnh == NULL) + if (rh == NULL) goto miss; /* * Look up the address in the table for that Address Family */ - RADIX_NODE_HEAD_RLOCK(rnh); - rn = rnh->rnh_matchaddr(dst, rnh); + RIB_RLOCK(rh); + rn = rh->rnh_matchaddr(dst, &rh->head); if (rn && ((rn->rn_flags & RNF_ROOT) == 0)) { newrt = RNTORT(rn); RT_LOCK(newrt); RT_ADDREF(newrt); - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rh); return (newrt); } else - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rh); /* * Either we hit the root or couldn't find any match, * Which basically means * "caint get there frm here" */ miss: V_rtstat.rts_unreach++; if (report) { /* * If required, report the failure to the supervising * Authorities. * For a delete, this is not an error. (report == 0) */ bzero(&info, sizeof(info)); info.rti_info[RTAX_DST] = dst; rt_missmsg_fib(msgtype, &info, 0, err, fibnum); } return (newrt); } /* * Remove a reference count from an rtentry. * If the count gets low enough, take it out of the routing table */ void rtfree(struct rtentry *rt) { - struct radix_node_head *rnh; + struct rib_head *rnh; KASSERT(rt != NULL,("%s: NULL rt", __func__)); rnh = rt_tables_get_rnh(rt->rt_fibnum, rt_key(rt)->sa_family); KASSERT(rnh != NULL,("%s: NULL rnh", __func__)); RT_LOCK_ASSERT(rt); /* * The callers should use RTFREE_LOCKED() or RTFREE(), so * we should come here exactly with the last reference. */ RT_REMREF(rt); if (rt->rt_refcnt > 0) { log(LOG_DEBUG, "%s: %p has %d refs\n", __func__, rt, rt->rt_refcnt); goto done; } /* * On last reference give the "close method" a chance * to cleanup private state. This also permits (for * IPv4 and IPv6) a chance to decide if the routing table * entry should be purged immediately or at a later time. * When an immediate purge is to happen the close routine * typically calls rtexpunge which clears the RTF_UP flag * on the entry so that the code below reclaims the storage. */ if (rt->rt_refcnt == 0 && rnh->rnh_close) - rnh->rnh_close((struct radix_node *)rt, rnh); + rnh->rnh_close((struct radix_node *)rt, &rnh->head); /* * If we are no longer "up" (and ref == 0) * then we can free the resources associated * with the route. */ if ((rt->rt_flags & RTF_UP) == 0) { if (rt->rt_nodes->rn_flags & (RNF_ACTIVE | RNF_ROOT)) panic("rtfree 2"); /* * the rtentry must have been removed from the routing table * so it is represented in rttrash.. remove that now. 
*/ V_rttrash--; #ifdef DIAGNOSTIC if (rt->rt_refcnt < 0) { printf("rtfree: %p not freed (neg refs)\n", rt); goto done; } #endif /* * release references on items we hold them on.. * e.g. other routes and ifaddrs. */ if (rt->rt_ifa) ifa_free(rt->rt_ifa); /* * The key is separately alloc'd so free it (see rt_setgate()). * This also frees the gateway, as they are always malloc'd * together. */ R_Free(rt_key(rt)); /* * and the rtentry itself of course */ uma_zfree(V_rtzone, rt); return; } done: RT_UNLOCK(rt); } /* * Force a routing table entry to the specified * destination to go through the given gateway. * Normally called as a result of a routing redirect * message from the network layer. */ void rtredirect_fib(struct sockaddr *dst, struct sockaddr *gateway, struct sockaddr *netmask, int flags, struct sockaddr *src, u_int fibnum) { struct rtentry *rt; int error = 0; short *stat = NULL; struct rt_addrinfo info; struct ifaddr *ifa; - struct radix_node_head *rnh; + struct rib_head *rnh; ifa = NULL; rnh = rt_tables_get_rnh(fibnum, dst->sa_family); if (rnh == NULL) { error = EAFNOSUPPORT; goto out; } /* verify the gateway is directly reachable */ if ((ifa = ifa_ifwithnet(gateway, 0, fibnum)) == NULL) { error = ENETUNREACH; goto out; } rt = rtalloc1_fib(dst, 0, 0UL, fibnum); /* NB: rt is locked */ /* * If the redirect isn't from our current router for this dst, * it's either old or wrong. If it redirects us to ourselves, * we have a routing loop, perhaps as a result of an interface * going down recently. */ if (!(flags & RTF_DONE) && rt) { if (!sa_equal(src, rt->rt_gateway)) { error = EINVAL; goto done; } if (rt->rt_ifa != ifa && ifa->ifa_addr->sa_family != AF_LINK) { error = EINVAL; goto done; } } if ((flags & RTF_GATEWAY) && ifa_ifwithaddr_check(gateway)) { error = EHOSTUNREACH; goto done; } /* * Create a new entry if we just got back a wildcard entry * or the lookup failed. This is necessary for hosts * which use routing redirects generated by smart gateways * to dynamically build the routing tables. */ if (rt == NULL || (rt_mask(rt) && rt_mask(rt)->sa_len < 2)) goto create; /* * Don't listen to the redirect if it's * for a route to an interface. */ if (rt->rt_flags & RTF_GATEWAY) { if (((rt->rt_flags & RTF_HOST) == 0) && (flags & RTF_HOST)) { /* * Changing from route to net => route to host. * Create new route, rather than smashing route to net. */ create: if (rt != NULL) RTFREE_LOCKED(rt); flags |= RTF_DYNAMIC; bzero((caddr_t)&info, sizeof(info)); info.rti_info[RTAX_DST] = dst; info.rti_info[RTAX_GATEWAY] = gateway; info.rti_info[RTAX_NETMASK] = netmask; info.rti_ifa = ifa; info.rti_flags = flags; error = rtrequest1_fib(RTM_ADD, &info, &rt, fibnum); if (rt != NULL) { RT_LOCK(rt); flags = rt->rt_flags; } stat = &V_rtstat.rts_dynamic; } else { /* * Smash the current notion of the gateway to * this destination. Should check about netmask!!! */ if ((flags & RTF_GATEWAY) == 0) rt->rt_flags &= ~RTF_GATEWAY; rt->rt_flags |= RTF_MODIFIED; flags |= RTF_MODIFIED; stat = &V_rtstat.rts_newgateway; /* * add the key and gateway (in one malloc'd chunk).
*/ RT_UNLOCK(rt); - RADIX_NODE_HEAD_LOCK(rnh); + RIB_WLOCK(rnh); RT_LOCK(rt); rt_setgate(rt, rt_key(rt), gateway); - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WUNLOCK(rnh); } } else error = EHOSTUNREACH; done: if (rt) RTFREE_LOCKED(rt); out: if (error) V_rtstat.rts_badredirect++; else if (stat != NULL) (*stat)++; bzero((caddr_t)&info, sizeof(info)); info.rti_info[RTAX_DST] = dst; info.rti_info[RTAX_GATEWAY] = gateway; info.rti_info[RTAX_NETMASK] = netmask; info.rti_info[RTAX_AUTHOR] = src; rt_missmsg_fib(RTM_REDIRECT, &info, flags, error, fibnum); if (ifa != NULL) ifa_free(ifa); } /* * Routing table ioctl interface. */ int rtioctl_fib(u_long req, caddr_t data, u_int fibnum) { /* * If more ioctl commands are added here, make sure the proper * super-user checks are being performed because it is possible for * prison-root to make it this far if raw sockets have been enabled * in jails. */ #ifdef INET /* Multicast goop, grrr... */ return mrt_ioctl ? mrt_ioctl(req, data, fibnum) : EOPNOTSUPP; #else /* INET */ return ENXIO; #endif /* INET */ } struct ifaddr * ifa_ifwithroute(int flags, const struct sockaddr *dst, struct sockaddr *gateway, u_int fibnum) { struct ifaddr *ifa; int not_found = 0; if ((flags & RTF_GATEWAY) == 0) { /* * If we are adding a route to an interface, * and the interface is a pt to pt link * we should search for the destination * as our clue to the interface. Otherwise * we can use the local address. */ ifa = NULL; if (flags & RTF_HOST) ifa = ifa_ifwithdstaddr(dst, fibnum); if (ifa == NULL) ifa = ifa_ifwithaddr(gateway); } else { /* * If we are adding a route to a remote net * or host, the gateway may still be on the * other end of a pt to pt link. */ ifa = ifa_ifwithdstaddr(gateway, fibnum); } if (ifa == NULL) ifa = ifa_ifwithnet(gateway, 0, fibnum); if (ifa == NULL) { struct rtentry *rt = rtalloc1_fib(gateway, 0, 0, fibnum); if (rt == NULL) return (NULL); /* * dismiss a gateway that is reachable only * through the default router */ switch (gateway->sa_family) { case AF_INET: if (satosin(rt_key(rt))->sin_addr.s_addr == INADDR_ANY) not_found = 1; break; case AF_INET6: if (IN6_IS_ADDR_UNSPECIFIED(&satosin6(rt_key(rt))->sin6_addr)) not_found = 1; break; default: break; } if (!not_found && rt->rt_ifa != NULL) { ifa = rt->rt_ifa; ifa_ref(ifa); } RT_REMREF(rt); RT_UNLOCK(rt); if (not_found || ifa == NULL) return (NULL); } if (ifa->ifa_addr->sa_family != dst->sa_family) { struct ifaddr *oifa = ifa; ifa = ifaof_ifpforaddr(dst, ifa->ifa_ifp); if (ifa == NULL) ifa = oifa; else ifa_free(oifa); } return (ifa); } /* * Do appropriate manipulations of a routing tree given * all the bits of info needed */ int rtrequest_fib(int req, struct sockaddr *dst, struct sockaddr *gateway, struct sockaddr *netmask, int flags, struct rtentry **ret_nrt, u_int fibnum) { struct rt_addrinfo info; if (dst->sa_len == 0) return(EINVAL); bzero((caddr_t)&info, sizeof(info)); info.rti_flags = flags; info.rti_info[RTAX_DST] = dst; info.rti_info[RTAX_GATEWAY] = gateway; info.rti_info[RTAX_NETMASK] = netmask; return rtrequest1_fib(req, &info, ret_nrt, fibnum); } /* * Copy most of @rt data into @info. * * If @flags contains NHR_COPY, copies dst,netmask and gw to the * pointers specified by @info structure. Assume such pointers * are zeroed sockaddr-like structures with sa_len field initialized * to reflect size of the provided buffer. If no NHR_COPY is specified, * point dst,netmask and gw @info fields to appropriate @rt values. * * If @flags contains NHR_REF, do refcounting on rt_ifp. * * Returns 0 on success.
*/ int rt_exportinfo(struct rtentry *rt, struct rt_addrinfo *info, int flags) { struct rt_metrics *rmx; struct sockaddr *src, *dst; int sa_len; if (flags & NHR_COPY) { /* Copy destination if dst is non-zero */ src = rt_key(rt); dst = info->rti_info[RTAX_DST]; sa_len = src->sa_len; if (dst != NULL) { if (src->sa_len > dst->sa_len) return (ENOMEM); memcpy(dst, src, src->sa_len); info->rti_addrs |= RTA_DST; } /* Copy mask if set && dst is non-zero */ src = rt_mask(rt); dst = info->rti_info[RTAX_NETMASK]; if (src != NULL && dst != NULL) { /* * Radix stores different value in sa_len, * assume rt_mask() to have the same length * as rt_key() */ if (sa_len > dst->sa_len) return (ENOMEM); memcpy(dst, src, src->sa_len); info->rti_addrs |= RTA_NETMASK; } /* Copy gateway if set && dst is non-zero */ src = rt->rt_gateway; dst = info->rti_info[RTAX_GATEWAY]; if ((rt->rt_flags & RTF_GATEWAY) && src != NULL && dst != NULL){ if (src->sa_len > dst->sa_len) return (ENOMEM); memcpy(dst, src, src->sa_len); info->rti_addrs |= RTA_GATEWAY; } } else { info->rti_info[RTAX_DST] = rt_key(rt); info->rti_addrs |= RTA_DST; if (rt_mask(rt) != NULL) { info->rti_info[RTAX_NETMASK] = rt_mask(rt); info->rti_addrs |= RTA_NETMASK; } if (rt->rt_flags & RTF_GATEWAY) { info->rti_info[RTAX_GATEWAY] = rt->rt_gateway; info->rti_addrs |= RTA_GATEWAY; } } rmx = info->rti_rmx; if (rmx != NULL) { info->rti_mflags |= RTV_MTU; rmx->rmx_mtu = rt->rt_mtu; } info->rti_flags = rt->rt_flags; info->rti_ifp = rt->rt_ifp; info->rti_ifa = rt->rt_ifa; if (flags & NHR_REF) { /* Do 'traditional' refcounting */ if_ref(info->rti_ifp); } return (0); } /* * Looks up the route entry for @dst in RIB database for fib @fibnum. * Exports entry data to @info using rt_exportinfo(). * * If @flags contains NHR_REF, refcounting is performed on rt_ifp. * All references can be released later by calling rib_free_info() * * Returns 0 on success. * Returns ENOENT for lookup failure, ENOMEM for export failure. */ int rib_lookup_info(uint32_t fibnum, const struct sockaddr *dst, uint32_t flags, uint32_t flowid, struct rt_addrinfo *info) { - struct radix_node_head *rh; + struct rib_head *rh; struct radix_node *rn; struct rtentry *rt; int error; KASSERT((fibnum < rt_numfibs), ("rib_lookup_rte: bad fibnum")); rh = rt_tables_get_rnh(fibnum, dst->sa_family); if (rh == NULL) return (ENOENT); - RADIX_NODE_HEAD_RLOCK(rh); - rn = rh->rnh_matchaddr(__DECONST(void *, dst), rh); + RIB_RLOCK(rh); + rn = rh->rnh_matchaddr(__DECONST(void *, dst), &rh->head); if (rn != NULL && ((rn->rn_flags & RNF_ROOT) == 0)) { rt = RNTORT(rn); /* Ensure route & ifp is UP */ if (RT_LINK_IS_UP(rt->rt_ifp)) { flags = (flags & NHR_REF) | NHR_COPY; error = rt_exportinfo(rt, info, flags); - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (error); } } - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (ENOENT); } /* * Releases all references acquired by rib_lookup_info() when * called with NHR_REF flags. */ void rib_free_info(struct rt_addrinfo *info) { if_rele(info->rti_ifp); } /* * Iterates over all existing fibs in the system, calling * @setwa_f function prior to traversing each fib. * Calls @wa_f function for each element in current fib. * If af is not AF_UNSPEC, iterates over fibs in particular * address family. */ void rt_foreach_fib_walk(int af, rt_setwarg_t *setwa_f, rt_walktree_f_t *wa_f, void *arg) { - struct radix_node_head *rnh; + struct rib_head *rnh; uint32_t fibnum; int i; for (fibnum = 0; fibnum < rt_numfibs; fibnum++) { /* Do we want some specific family?
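A sketch of a rib_lookup_info() caller honoring the contract spelled out above rt_exportinfo(): with NHR_COPY semantics the destination buffers must be zeroed and carry a preset sa_len, and an NHR_REF reference is returned via rti_ifp until rib_free_info() drops it (fibnum and dst are assumed to be in scope):

    struct rt_addrinfo info;
    struct sockaddr_storage dst_ss, mask_ss;
    int error;

    bzero(&info, sizeof(info));
    bzero(&dst_ss, sizeof(dst_ss));
    bzero(&mask_ss, sizeof(mask_ss));
    dst_ss.ss_len = mask_ss.ss_len = sizeof(struct sockaddr_storage);
    info.rti_info[RTAX_DST] = (struct sockaddr *)&dst_ss;
    info.rti_info[RTAX_NETMASK] = (struct sockaddr *)&mask_ss;

    error = rib_lookup_info(fibnum, dst, NHR_REF, 0, &info);
    if (error == 0) {
            /* ... consume the copied prefix and info.rti_ifp ... */
            rib_free_info(&info);   /* releases the NHR_REF ifp reference */
    }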
*/ if (af != AF_UNSPEC) { rnh = rt_tables_get_rnh(fibnum, af); if (rnh == NULL) continue; if (setwa_f != NULL) setwa_f(rnh, fibnum, af, arg); - RADIX_NODE_HEAD_LOCK(rnh); - rnh->rnh_walktree(rnh, (walktree_f_t *)wa_f, arg); - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WLOCK(rnh); + rnh->rnh_walktree(&rnh->head, (walktree_f_t *)wa_f,arg); + RIB_WUNLOCK(rnh); continue; } for (i = 1; i <= AF_MAX; i++) { rnh = rt_tables_get_rnh(fibnum, i); if (rnh == NULL) continue; if (setwa_f != NULL) setwa_f(rnh, fibnum, i, arg); - RADIX_NODE_HEAD_LOCK(rnh); - rnh->rnh_walktree(rnh, (walktree_f_t *)wa_f, arg); - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WLOCK(rnh); + rnh->rnh_walktree(&rnh->head, (walktree_f_t *)wa_f,arg); + RIB_WUNLOCK(rnh); } } } struct rt_delinfo { struct rt_addrinfo info; - struct radix_node_head *rnh; + struct rib_head *rnh; struct rtentry *head; }; /* * Conditionally unlinks @rn from radix tree based * on info data passed in @arg. */ static int rt_checkdelroute(struct radix_node *rn, void *arg) { struct rt_delinfo *di; struct rt_addrinfo *info; struct rtentry *rt; int error; di = (struct rt_delinfo *)arg; rt = (struct rtentry *)rn; info = &di->info; error = 0; info->rti_info[RTAX_DST] = rt_key(rt); info->rti_info[RTAX_NETMASK] = rt_mask(rt); info->rti_info[RTAX_GATEWAY] = rt->rt_gateway; rt = rt_unlinkrte(di->rnh, info, &error); if (rt == NULL) { /* Either not allowed or not matched. Skip entry */ return (0); } /* Entry was unlinked. Add to the list and return */ rt->rt_chain = di->head; di->head = rt; return (0); } /* * Iterates over all existing fibs in system. * Deletes each element for which @filter_f function returned * non-zero value. * If @af is not AF_UNSPEC, iterates over fibs in particular * address family. */ void rt_foreach_fib_walk_del(int af, rt_filter_f_t *filter_f, void *arg) { - struct radix_node_head *rnh; + struct rib_head *rnh; struct rt_delinfo di; struct rtentry *rt; uint32_t fibnum; int i, start, end; bzero(&di, sizeof(di)); di.info.rti_filter = filter_f; di.info.rti_filterdata = arg; for (fibnum = 0; fibnum < rt_numfibs; fibnum++) { /* Do we want some specific family? */ if (af != AF_UNSPEC) { start = af; end = af; } else { start = 1; end = AF_MAX; } for (i = start; i <= end; i++) { rnh = rt_tables_get_rnh(fibnum, i); if (rnh == NULL) continue; di.rnh = rnh; - RADIX_NODE_HEAD_LOCK(rnh); - rnh->rnh_walktree(rnh, rt_checkdelroute, &di); - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WLOCK(rnh); + rnh->rnh_walktree(&rnh->head, rt_checkdelroute, &di); + RIB_WUNLOCK(rnh); if (di.head == NULL) continue; /* We might have something to reclaim */ while (di.head != NULL) { rt = di.head; di.head = rt->rt_chain; rt->rt_chain = NULL; /* TODO std rt -> rt_addrinfo export */ di.info.rti_info[RTAX_DST] = rt_key(rt); di.info.rti_info[RTAX_NETMASK] = rt_mask(rt); rt_notifydelete(rt, &di.info); RTFREE_LOCKED(rt); } } } } /* * Delete Routes for a Network Interface * * Called for each routing entry via the rnh->rnh_walktree() call above * to delete all route entries referencing a detaching network interface. 
* * Arguments: * rt pointer to rtentry * arg argument passed to rnh->rnh_walktree() - detaching interface * * Returns: * 0 successful * errno failed - reason indicated */ static int rt_ifdelroute(const struct rtentry *rt, void *arg) { struct ifnet *ifp = arg; if (rt->rt_ifp != ifp) return (0); /* * Protect (sorta) against walktree recursion problems * with cloned routes */ if ((rt->rt_flags & RTF_UP) == 0) return (0); return (1); } /* * Delete all remaining routes using this interface * Unfortunately the only way to do this is to slog through * the entire routing table looking for routes which point * to this interface...oh well... */ void rt_flushifroutes(struct ifnet *ifp) { rt_foreach_fib_walk_del(AF_UNSPEC, rt_ifdelroute, ifp); } /* * Conditionally unlinks rtentry matching data inside @info from @rnh. * Returns unlinked, locked and referenced @rtentry on success, * Returns NULL and sets @perror to: * ESRCH - if prefix was not found, * EADDRINUSE - if trying to delete PINNED route without appropriate flag. * ENOENT - if supplied filter function returned 0 (not matched). */ static struct rtentry * -rt_unlinkrte(struct radix_node_head *rnh, struct rt_addrinfo *info, int *perror) +rt_unlinkrte(struct rib_head *rnh, struct rt_addrinfo *info, int *perror) { struct sockaddr *dst, *netmask; struct rtentry *rt; struct radix_node *rn; dst = info->rti_info[RTAX_DST]; netmask = info->rti_info[RTAX_NETMASK]; - rt = (struct rtentry *)rnh->rnh_lookup(dst, netmask, rnh); + rt = (struct rtentry *)rnh->rnh_lookup(dst, netmask, &rnh->head); if (rt == NULL) { *perror = ESRCH; return (NULL); } if ((info->rti_flags & RTF_PINNED) == 0) { /* Check if target route can be deleted */ if (rt->rt_flags & RTF_PINNED) { *perror = EADDRINUSE; return (NULL); } } if (info->rti_filter != NULL) { if (info->rti_filter(rt, info->rti_filterdata) == 0) { /* Not matched */ *perror = ENOENT; return (NULL); } /* * Filter function requested rte deletion. * Ease the caller's work by filling in remaining info * from that particular entry. */ info->rti_info[RTAX_GATEWAY] = rt->rt_gateway; } /* * Remove the item from the tree and return it. * Complain if it is not there and do no more processing. */ *perror = ESRCH; #ifdef RADIX_MPATH - if (rn_mpath_capable(rnh)) + if (rt_mpath_capable(rnh)) rn = rt_mpath_unlink(rnh, info, rt, perror); else #endif - rn = rnh->rnh_deladdr(dst, netmask, rnh); + rn = rnh->rnh_deladdr(dst, netmask, &rnh->head); if (rn == NULL) return (NULL); if (rn->rn_flags & (RNF_ACTIVE | RNF_ROOT)) panic ("rtrequest delete"); rt = RNTORT(rn); RT_LOCK(rt); RT_ADDREF(rt); rt->rt_flags &= ~RTF_UP; *perror = 0; return (rt); } static void rt_notifydelete(struct rtentry *rt, struct rt_addrinfo *info) { struct ifaddr *ifa; /* * give the protocol a chance to keep things in sync. */ ifa = rt->rt_ifa; if (ifa != NULL && ifa->ifa_rtrequest != NULL) ifa->ifa_rtrequest(RTM_DELETE, rt, info); /* * One more rtentry floating around that is not * linked to the routing table. rttrash will be decremented * when RTFREE(rt) is eventually called. */ V_rttrash++; } /* * These (questionable) definitions of apparent local variables apply * to the next two functions. XXXXXX!!! */ #define dst info->rti_info[RTAX_DST] #define gateway info->rti_info[RTAX_GATEWAY] #define netmask info->rti_info[RTAX_NETMASK] #define ifaaddr info->rti_info[RTAX_IFA] #define ifpaddr info->rti_info[RTAX_IFP] #define flags info->rti_flags /* * Look up rt_addrinfo for a specific fib. Note that if rti_ifa is defined, * it will be referenced so the caller must free it.
*/ int rt_getifa_fib(struct rt_addrinfo *info, u_int fibnum) { struct ifaddr *ifa; int error = 0; /* * ifp may be specified by sockaddr_dl * when protocol address is ambiguous. */ if (info->rti_ifp == NULL && ifpaddr != NULL && ifpaddr->sa_family == AF_LINK && (ifa = ifa_ifwithnet(ifpaddr, 0, fibnum)) != NULL) { info->rti_ifp = ifa->ifa_ifp; ifa_free(ifa); } if (info->rti_ifa == NULL && ifaaddr != NULL) info->rti_ifa = ifa_ifwithaddr(ifaaddr); if (info->rti_ifa == NULL) { struct sockaddr *sa; sa = ifaaddr != NULL ? ifaaddr : (gateway != NULL ? gateway : dst); if (sa != NULL && info->rti_ifp != NULL) info->rti_ifa = ifaof_ifpforaddr(sa, info->rti_ifp); else if (dst != NULL && gateway != NULL) info->rti_ifa = ifa_ifwithroute(flags, dst, gateway, fibnum); else if (sa != NULL) info->rti_ifa = ifa_ifwithroute(flags, sa, sa, fibnum); } if ((ifa = info->rti_ifa) != NULL) { if (info->rti_ifp == NULL) info->rti_ifp = ifa->ifa_ifp; } else error = ENETUNREACH; return (error); } static int if_updatemtu_cb(struct radix_node *rn, void *arg) { struct rtentry *rt; struct if_mtuinfo *ifmtu; rt = (struct rtentry *)rn; ifmtu = (struct if_mtuinfo *)arg; if (rt->rt_ifp != ifmtu->ifp) return (0); if (rt->rt_mtu >= ifmtu->mtu) { /* We have to decrease mtu regardless of flags */ rt->rt_mtu = ifmtu->mtu; return (0); } /* * New MTU is bigger. Check if we are allowed to alter it */ if ((rt->rt_flags & (RTF_FIXEDMTU | RTF_GATEWAY | RTF_HOST)) != 0) { /* * Skip routes with user-supplied MTU and * non-interface routes */ return (0); } /* We are safe to update route MTU */ rt->rt_mtu = ifmtu->mtu; return (0); } void rt_updatemtu(struct ifnet *ifp) { struct if_mtuinfo ifmtu; - struct radix_node_head *rnh; + struct rib_head *rnh; int i, j; ifmtu.ifp = ifp; /* * Try to update rt_mtu for all routes using this interface * Unfortunately the only way to do this is to traverse all * routing tables in all fibs/domains. */ for (i = 1; i <= AF_MAX; i++) { ifmtu.mtu = if_getmtu_family(ifp, i); for (j = 0; j < rt_numfibs; j++) { rnh = rt_tables_get_rnh(j, i); if (rnh == NULL) continue; - RADIX_NODE_HEAD_LOCK(rnh); - rnh->rnh_walktree(rnh, if_updatemtu_cb, &ifmtu); - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WLOCK(rnh); + rnh->rnh_walktree(&rnh->head, if_updatemtu_cb, &ifmtu); + RIB_WUNLOCK(rnh); } } } #if 0 int p_sockaddr(char *buf, int buflen, struct sockaddr *s); int rt_print(char *buf, int buflen, struct rtentry *rt); int p_sockaddr(char *buf, int buflen, struct sockaddr *s) { void *paddr = NULL; switch (s->sa_family) { case AF_INET: paddr = &((struct sockaddr_in *)s)->sin_addr; break; case AF_INET6: paddr = &((struct sockaddr_in6 *)s)->sin6_addr; break; } if (paddr == NULL) return (0); if (inet_ntop(s->sa_family, paddr, buf, buflen) == NULL) return (0); return (strlen(buf)); } int rt_print(char *buf, int buflen, struct rtentry *rt) { struct sockaddr *addr, *mask; int i = 0; addr = rt_key(rt); mask = rt_mask(rt); i = p_sockaddr(buf, buflen, addr); if (!(rt->rt_flags & RTF_HOST)) { buf[i++] = '/'; i += p_sockaddr(buf + i, buflen - i, mask); } if (rt->rt_flags & RTF_GATEWAY) { buf[i++] = '>'; i += p_sockaddr(buf + i, buflen - i, rt->rt_gateway); } return (i); } #endif #ifdef RADIX_MPATH /* * Deletes key for single-path routes, unlinks rtentry with * gateway specified in @info from multi-path routes. * * Returns the unlinked entry. In case of failure, returns NULL * and sets @perror to ESRCH.
*/ static struct radix_node * -rt_mpath_unlink(struct radix_node_head *rnh, struct rt_addrinfo *info, +rt_mpath_unlink(struct rib_head *rnh, struct rt_addrinfo *info, struct rtentry *rto, int *perror) { /* * if we got multipath routes, we require users to specify * a matching RTAX_GATEWAY. */ struct rtentry *rt; // *rto = NULL; struct radix_node *rn; struct sockaddr *gw; gw = info->rti_info[RTAX_GATEWAY]; rt = rt_mpath_matchgate(rto, gw); if (rt == NULL) { *perror = ESRCH; return (NULL); } /* * this is the first entry in the chain */ if (rto == rt) { rn = rn_mpath_next((struct radix_node *)rt); /* * there is another entry, now it's active */ if (rn) { rto = RNTORT(rn); RT_LOCK(rto); rto->rt_flags |= RTF_UP; RT_UNLOCK(rto); } else if (rt->rt_flags & RTF_GATEWAY) { /* * For gateway routes, we need to * make sure that we are deleting * the correct gateway. * rt_mpath_matchgate() does not * check the case when there is only * one route in the chain. */ if (gw && (rt->rt_gateway->sa_len != gw->sa_len || memcmp(rt->rt_gateway, gw, gw->sa_len))) { *perror = ESRCH; return (NULL); } } /* * use the normal delete code to remove * the first entry */ - rn = rnh->rnh_deladdr(dst, netmask, rnh); + rn = rnh->rnh_deladdr(dst, netmask, &rnh->head); *perror = 0; return (rn); } /* * if the entry is 2nd and on up */ if (rt_mpath_deldup(rto, rt) == 0) panic ("rtrequest1: rt_mpath_deldup"); *perror = 0; rn = (struct radix_node *)rt; return (rn); } #endif #ifdef FLOWTABLE static struct rtentry * -rt_flowtable_check_route(struct radix_node_head *rnh, struct rt_addrinfo *info) +rt_flowtable_check_route(struct rib_head *rnh, struct rt_addrinfo *info) { #if defined(INET6) || defined(INET) struct radix_node *rn; #endif struct rtentry *rt0; rt0 = NULL; /* "flow-table" only supports IPv6 and IPv4 at the moment. */ switch (dst->sa_family) { #ifdef INET6 case AF_INET6: #endif #ifdef INET case AF_INET: #endif #if defined(INET6) || defined(INET) - rn = rnh->rnh_matchaddr(dst, rnh); + rn = rnh->rnh_matchaddr(dst, &rnh->head); if (rn && ((rn->rn_flags & RNF_ROOT) == 0)) { struct sockaddr *mask; u_char *m, *n; int len; /* * compare mask to see if the new route is * more specific than the existing one */ rt0 = RNTORT(rn); RT_LOCK(rt0); RT_ADDREF(rt0); RT_UNLOCK(rt0); /* * A host route is already present, so * leave the flow-table entries as is. */ if (rt0->rt_flags & RTF_HOST) { RTFREE(rt0); rt0 = NULL; } else if (!(flags & RTF_HOST) && netmask) { mask = rt_mask(rt0); len = mask->sa_len; m = (u_char *)mask; n = (u_char *)netmask; while (len-- > 0) { if (*n != *m) break; n++; m++; } if (len == 0 || (*n < *m)) { RTFREE(rt0); rt0 = NULL; } } } #endif/* INET6 || INET */ } return (rt0); } #endif int rtrequest1_fib(int req, struct rt_addrinfo *info, struct rtentry **ret_nrt, u_int fibnum) { int error = 0; struct rtentry *rt, *rt_old; #ifdef FLOWTABLE struct rtentry *rt0; #endif struct radix_node *rn; - struct radix_node_head *rnh; + struct rib_head *rnh; struct ifaddr *ifa; struct sockaddr *ndst; struct sockaddr_storage mdst; KASSERT((fibnum < rt_numfibs), ("rtrequest1_fib: bad fibnum")); KASSERT((flags & RTF_RNH_LOCKED) == 0, ("rtrequest1_fib: locked")); switch (dst->sa_family) { case AF_INET6: case AF_INET: /* We support multiple FIBs.
*/ break; default: fibnum = RT_DEFAULT_FIB; break; } /* * Find the correct routing tree to use for this Address Family */ rnh = rt_tables_get_rnh(fibnum, dst->sa_family); if (rnh == NULL) return (EAFNOSUPPORT); /* * If we are adding a host route then we don't want to put * a netmask in the tree, nor do we want to clone it. */ if (flags & RTF_HOST) netmask = NULL; switch (req) { case RTM_DELETE: if (netmask) { rt_maskedcopy(dst, (struct sockaddr *)&mdst, netmask); dst = (struct sockaddr *)&mdst; } - RADIX_NODE_HEAD_LOCK(rnh); + RIB_WLOCK(rnh); rt = rt_unlinkrte(rnh, info, &error); - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WUNLOCK(rnh); if (error != 0) return (error); rt_notifydelete(rt, info); /* * If the caller wants it, then it can have it, * but it's up to it to free the rtentry as we won't be * doing it. */ if (ret_nrt) { *ret_nrt = rt; RT_UNLOCK(rt); } else RTFREE_LOCKED(rt); break; case RTM_RESOLVE: /* * resolve was only used for route cloning * here for compat */ break; case RTM_ADD: if ((flags & RTF_GATEWAY) && !gateway) return (EINVAL); if (dst && gateway && (dst->sa_family != gateway->sa_family) && (gateway->sa_family != AF_UNSPEC) && (gateway->sa_family != AF_LINK)) return (EINVAL); if (info->rti_ifa == NULL) { error = rt_getifa_fib(info, fibnum); if (error) return (error); } else ifa_ref(info->rti_ifa); ifa = info->rti_ifa; rt = uma_zalloc(V_rtzone, M_NOWAIT); if (rt == NULL) { ifa_free(ifa); return (ENOBUFS); } rt->rt_flags = RTF_UP | flags; rt->rt_fibnum = fibnum; /* * Add the gateway. Possibly re-malloc-ing the storage for it. */ if ((error = rt_setgate(rt, dst, gateway)) != 0) { ifa_free(ifa); uma_zfree(V_rtzone, rt); return (error); } /* * point to the (possibly newly malloc'd) dest address. */ ndst = (struct sockaddr *)rt_key(rt); /* * make sure it contains the value we want (masked if needed). */ if (netmask) { rt_maskedcopy(dst, ndst, netmask); } else bcopy(dst, ndst, dst->sa_len); /* * We use the ifa reference returned by rt_getifa_fib(). * This moved from below so that rnh->rnh_addaddr() can * examine the ifa and ifa->ifa_ifp if it so desires. 
*/ rt->rt_ifa = ifa; rt->rt_ifp = ifa->ifa_ifp; rt->rt_weight = 1; rt_setmetrics(info, rt); - RADIX_NODE_HEAD_LOCK(rnh); + RIB_WLOCK(rnh); RT_LOCK(rt); #ifdef RADIX_MPATH /* do not permit exactly the same dst/mask/gw pair */ - if (rn_mpath_capable(rnh) && + if (rt_mpath_capable(rnh) && rt_mpath_conflict(rnh, rt, netmask)) { - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WUNLOCK(rnh); ifa_free(rt->rt_ifa); R_Free(rt_key(rt)); uma_zfree(V_rtzone, rt); return (EEXIST); } #endif #ifdef FLOWTABLE rt0 = rt_flowtable_check_route(rnh, info); #endif /* FLOWTABLE */ /* XXX mtu manipulation will be done in rnh_addaddr -- itojun */ - rn = rnh->rnh_addaddr(ndst, netmask, rnh, rt->rt_nodes); + rn = rnh->rnh_addaddr(ndst, netmask, &rnh->head, rt->rt_nodes); rt_old = NULL; if (rn == NULL && (info->rti_flags & RTF_PINNED) != 0) { /* * Force removal and re-try addition * TODO: better multipath&pinned support */ struct sockaddr *info_dst = info->rti_info[RTAX_DST]; info->rti_info[RTAX_DST] = ndst; /* Do not delete existing PINNED(interface) routes */ info->rti_flags &= ~RTF_PINNED; rt_old = rt_unlinkrte(rnh, info, &error); info->rti_flags |= RTF_PINNED; info->rti_info[RTAX_DST] = info_dst; if (rt_old != NULL) - rn = rnh->rnh_addaddr(ndst, netmask, rnh, + rn = rnh->rnh_addaddr(ndst, netmask, &rnh->head, rt->rt_nodes); } - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WUNLOCK(rnh); if (rt_old != NULL) RT_UNLOCK(rt_old); /* * If it still failed to go into the tree, * then un-make it (this should be a function) */ if (rn == NULL) { ifa_free(rt->rt_ifa); R_Free(rt_key(rt)); uma_zfree(V_rtzone, rt); #ifdef FLOWTABLE if (rt0 != NULL) RTFREE(rt0); #endif return (EEXIST); } #ifdef FLOWTABLE else if (rt0 != NULL) { flowtable_route_flush(dst->sa_family, rt0); RTFREE(rt0); } #endif if (rt_old != NULL) { rt_notifydelete(rt_old, info); RTFREE(rt_old); } /* * If this protocol has something to add to this then * allow it to do that as well. */ if (ifa->ifa_rtrequest) ifa->ifa_rtrequest(req, rt, info); /* * actually return a resultant rtentry and * give the caller a single reference. */ if (ret_nrt) { *ret_nrt = rt; RT_ADDREF(rt); } RT_UNLOCK(rt); break; case RTM_CHANGE: - RADIX_NODE_HEAD_LOCK(rnh); + RIB_WLOCK(rnh); error = rtrequest1_fib_change(rnh, info, ret_nrt, fibnum); - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WUNLOCK(rnh); break; default: error = EOPNOTSUPP; } return (error); } #undef dst #undef gateway #undef netmask #undef ifaaddr #undef ifpaddr #undef flags static int -rtrequest1_fib_change(struct radix_node_head *rnh, struct rt_addrinfo *info, +rtrequest1_fib_change(struct rib_head *rnh, struct rt_addrinfo *info, struct rtentry **ret_nrt, u_int fibnum) { struct rtentry *rt = NULL; int error = 0; int free_ifa = 0; int family, mtu; struct if_mtuinfo ifmtu; rt = (struct rtentry *)rnh->rnh_lookup(info->rti_info[RTAX_DST], - info->rti_info[RTAX_NETMASK], rnh); + info->rti_info[RTAX_NETMASK], &rnh->head); if (rt == NULL) return (ESRCH); #ifdef RADIX_MPATH /* * If we got multipath routes, * we require users to specify a matching RTAX_GATEWAY. 
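 *
 * An illustrative (hypothetical, not actual kernel code) caller
 * disambiguating one of several multipath routes would therefore
 * fill in the gateway slot before issuing the request:
 *
 *	struct rt_addrinfo info;
 *
 *	bzero(&info, sizeof(info));
 *	info.rti_info[RTAX_DST] = dst;
 *	info.rti_info[RTAX_NETMASK] = netmask;
 *	info.rti_info[RTAX_GATEWAY] = gw;	/* selects one path */
 *	error = rtrequest1_fib(RTM_CHANGE, &info, NULL, fibnum);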
*/ - if (rn_mpath_capable(rnh)) { + if (rt_mpath_capable(rnh)) { rt = rt_mpath_matchgate(rt, info->rti_info[RTAX_GATEWAY]); if (rt == NULL) return (ESRCH); } #endif RT_LOCK(rt); rt_setmetrics(info, rt); /* * New gateway could require new ifaddr, ifp; * flags may also be different; ifp may be specified * by ll sockaddr when protocol address is ambiguous */ if (((rt->rt_flags & RTF_GATEWAY) && info->rti_info[RTAX_GATEWAY] != NULL) || info->rti_info[RTAX_IFP] != NULL || (info->rti_info[RTAX_IFA] != NULL && !sa_equal(info->rti_info[RTAX_IFA], rt->rt_ifa->ifa_addr))) { error = rt_getifa_fib(info, fibnum); if (info->rti_ifa != NULL) free_ifa = 1; if (error != 0) goto bad; } /* Check if outgoing interface has changed */ if (info->rti_ifa != NULL && info->rti_ifa != rt->rt_ifa && rt->rt_ifa != NULL && rt->rt_ifa->ifa_rtrequest != NULL) { rt->rt_ifa->ifa_rtrequest(RTM_DELETE, rt, info); ifa_free(rt->rt_ifa); } /* Update gateway address */ if (info->rti_info[RTAX_GATEWAY] != NULL) { error = rt_setgate(rt, rt_key(rt), info->rti_info[RTAX_GATEWAY]); if (error != 0) goto bad; rt->rt_flags &= ~RTF_GATEWAY; rt->rt_flags |= (RTF_GATEWAY & info->rti_flags); } if (info->rti_ifa != NULL && info->rti_ifa != rt->rt_ifa) { ifa_ref(info->rti_ifa); rt->rt_ifa = info->rti_ifa; rt->rt_ifp = info->rti_ifp; } /* Allow some flags to be toggled on change. */ rt->rt_flags &= ~RTF_FMASK; rt->rt_flags |= info->rti_flags & RTF_FMASK; if (rt->rt_ifa && rt->rt_ifa->ifa_rtrequest != NULL) rt->rt_ifa->ifa_rtrequest(RTM_ADD, rt, info); /* Alter route MTU if necessary */ if (rt->rt_ifp != NULL) { family = info->rti_info[RTAX_DST]->sa_family; mtu = if_getmtu_family(rt->rt_ifp, family); /* Set default MTU */ if (rt->rt_mtu == 0) rt->rt_mtu = mtu; if (rt->rt_mtu != mtu) { /* Check if we really need to update */ ifmtu.ifp = rt->rt_ifp; ifmtu.mtu = mtu; if_updatemtu_cb(rt->rt_nodes, &ifmtu); } } if (ret_nrt) { *ret_nrt = rt; RT_ADDREF(rt); } bad: RT_UNLOCK(rt); if (free_ifa != 0) ifa_free(info->rti_ifa); return (error); } static void rt_setmetrics(const struct rt_addrinfo *info, struct rtentry *rt) { if (info->rti_mflags & RTV_MTU) { if (info->rti_rmx->rmx_mtu != 0) { /* * MTU was explicitly provided by user. * Keep it. */ rt->rt_flags |= RTF_FIXEDMTU; } else { /* * User explicitly sets MTU to 0. * Assume rollback to default. */ rt->rt_flags &= ~RTF_FIXEDMTU; } rt->rt_mtu = info->rti_rmx->rmx_mtu; } if (info->rti_mflags & RTV_WEIGHT) rt->rt_weight = info->rti_rmx->rmx_weight; /* Kernel -> userland timebase conversion. */ if (info->rti_mflags & RTV_EXPIRE) rt->rt_expire = info->rti_rmx->rmx_expire ? info->rti_rmx->rmx_expire - time_second + time_uptime : 0; } int rt_setgate(struct rtentry *rt, struct sockaddr *dst, struct sockaddr *gate) { /* XXX dst may be overwritten, can we move this to below */ int dlen = SA_SIZE(dst), glen = SA_SIZE(gate); /* * Prepare to store the gateway in rt->rt_gateway. * Both dst and gateway are stored one after the other in the same * malloc'd chunk. If we have room, we can reuse the old buffer, * rt_gateway already points to the right place. * Otherwise, malloc a new block and update the 'dst' address. */ if (rt->rt_gateway == NULL || glen > SA_SIZE(rt->rt_gateway)) { caddr_t new; R_Malloc(new, caddr_t, dlen + glen); if (new == NULL) return ENOBUFS; /* * XXX note, we copy from *dst and not *rt_key(rt) because * rt_setgate() can be called to initialize a newly * allocated route entry, in which case rt_key(rt) == NULL * (and also rt->rt_gateway == NULL). * Free()/free() handle a NULL argument just fine. 
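 *
 * The resulting chunk layout (illustrative) is:
 *
 *	new                   new + dlen
 *	|<------ dlen ------->|<------ glen ------->|
 *	+---------------------+---------------------+
 *	| dst = rt_key(rt)    | gate = rt_gateway   |
 *	+---------------------+---------------------+
 *
 * so a single R_Free(rt_key(rt)) releases both sockaddrs.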
*/ bcopy(dst, new, dlen); R_Free(rt_key(rt)); /* free old block, if any */ rt_key(rt) = (struct sockaddr *)new; rt->rt_gateway = (struct sockaddr *)(new + dlen); } /* * Copy the new gateway value into the memory chunk. */ bcopy(gate, rt->rt_gateway, glen); return (0); } void rt_maskedcopy(struct sockaddr *src, struct sockaddr *dst, struct sockaddr *netmask) { u_char *cp1 = (u_char *)src; u_char *cp2 = (u_char *)dst; u_char *cp3 = (u_char *)netmask; u_char *cplim = cp2 + *cp3; u_char *cplim2 = cp2 + *cp1; *cp2++ = *cp1++; *cp2++ = *cp1++; /* copies sa_len & sa_family */ cp3 += 2; if (cplim > cplim2) cplim = cplim2; while (cp2 < cplim) *cp2++ = *cp1++ & *cp3++; if (cp2 < cplim2) bzero((caddr_t)cp2, (unsigned)(cplim2 - cp2)); } /* * Set up a routing table entry, normally * for an interface. */ #define _SOCKADDR_TMPSIZE 128 /* Not too big.. kernel stack size is limited */ static inline int rtinit1(struct ifaddr *ifa, int cmd, int flags, int fibnum) { struct sockaddr *dst; struct sockaddr *netmask; struct rtentry *rt = NULL; struct rt_addrinfo info; int error = 0; int startfib, endfib; char tempbuf[_SOCKADDR_TMPSIZE]; int didwork = 0; int a_failure = 0; static struct sockaddr_dl null_sdl = {sizeof(null_sdl), AF_LINK}; - struct radix_node_head *rnh; + struct rib_head *rnh; if (flags & RTF_HOST) { dst = ifa->ifa_dstaddr; netmask = NULL; } else { dst = ifa->ifa_addr; netmask = ifa->ifa_netmask; } if (dst->sa_len == 0) return(EINVAL); switch (dst->sa_family) { case AF_INET6: case AF_INET: /* We support multiple FIBs. */ break; default: fibnum = RT_DEFAULT_FIB; break; } if (fibnum == RT_ALL_FIBS) { if (V_rt_add_addr_allfibs == 0 && cmd == (int)RTM_ADD) startfib = endfib = ifa->ifa_ifp->if_fib; else { startfib = 0; endfib = rt_numfibs - 1; } } else { KASSERT((fibnum < rt_numfibs), ("rtinit1: bad fibnum")); startfib = fibnum; endfib = fibnum; } /* * If it's a delete, check that if it exists, * it's on the correct interface or we might scrub * a route to another ifa which would * be confusing at best and possibly worse. */ if (cmd == RTM_DELETE) { /* * It's a delete, so it should already exist.. * If it's a net, mask off the host bits * (Assuming we have a mask) * XXX this is kinda inet specific.. */ if (netmask != NULL) { rt_maskedcopy(dst, (struct sockaddr *)tempbuf, netmask); dst = (struct sockaddr *)tempbuf; } } /* * Now go through all the requested tables (fibs) and do the * requested action. Realistically, this will either be fib 0 * for protocols that don't do multiple tables or all the * tables for those that do. */ for ( fibnum = startfib; fibnum <= endfib; fibnum++) { if (cmd == RTM_DELETE) { struct radix_node *rn; /* * Look up an rtentry that is in the routing tree and * contains the correct info. 
*/ rnh = rt_tables_get_rnh(fibnum, dst->sa_family); if (rnh == NULL) /* this table doesn't exist but others might */ continue; - RADIX_NODE_HEAD_RLOCK(rnh); - rn = rnh->rnh_lookup(dst, netmask, rnh); + RIB_RLOCK(rnh); + rn = rnh->rnh_lookup(dst, netmask, &rnh->head); #ifdef RADIX_MPATH - if (rn_mpath_capable(rnh)) { + if (rt_mpath_capable(rnh)) { if (rn == NULL) error = ESRCH; else { rt = RNTORT(rn); /* * for interface route the * rt->rt_gateway is sockaddr_intf * for cloning ARP entries, so * rt_mpath_matchgate must use the * interface address */ rt = rt_mpath_matchgate(rt, ifa->ifa_addr); if (rt == NULL) error = ESRCH; } } #endif error = (rn == NULL || (rn->rn_flags & RNF_ROOT) || RNTORT(rn)->rt_ifa != ifa); - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rnh); if (error) { /* this is only an error if bad on ALL tables */ continue; } } /* * Do the actual request */ bzero((caddr_t)&info, sizeof(info)); info.rti_ifa = ifa; info.rti_flags = flags | (ifa->ifa_flags & ~IFA_RTSELF) | RTF_PINNED; info.rti_info[RTAX_DST] = dst; /* * doing this for compatibility reasons */ if (cmd == RTM_ADD) info.rti_info[RTAX_GATEWAY] = (struct sockaddr *)&null_sdl; else info.rti_info[RTAX_GATEWAY] = ifa->ifa_addr; info.rti_info[RTAX_NETMASK] = netmask; error = rtrequest1_fib(cmd, &info, &rt, fibnum); if (error == 0 && rt != NULL) { /* * notify any listening routing agents of the change */ RT_LOCK(rt); #ifdef RADIX_MPATH /* * in case address alias finds the first address * e.g. ifconfig bge0 192.0.2.246/24 * e.g. ifconfig bge0 192.0.2.247/24 * the address set in the route is 192.0.2.246 * so we need to replace it with 192.0.2.247 */ if (memcmp(rt->rt_ifa->ifa_addr, ifa->ifa_addr, ifa->ifa_addr->sa_len)) { ifa_free(rt->rt_ifa); ifa_ref(ifa); rt->rt_ifp = ifa->ifa_ifp; rt->rt_ifa = ifa; } #endif /* * doing this for compatibility reasons */ if (cmd == RTM_ADD) { ((struct sockaddr_dl *)rt->rt_gateway)->sdl_type = rt->rt_ifp->if_type; ((struct sockaddr_dl *)rt->rt_gateway)->sdl_index = rt->rt_ifp->if_index; } RT_ADDREF(rt); RT_UNLOCK(rt); rt_newaddrmsg_fib(cmd, ifa, error, rt, fibnum); RT_LOCK(rt); RT_REMREF(rt); if (cmd == RTM_DELETE) { /* * If we are deleting, and we found an entry, * then it's been removed from the tree.. * now throw it away. */ RTFREE_LOCKED(rt); } else { if (cmd == RTM_ADD) { /* * We just wanted to add it.. * we don't actually need a reference. */ RT_REMREF(rt); } RT_UNLOCK(rt); } didwork = 1; } if (error) a_failure = error; } if (cmd == RTM_DELETE) { if (didwork) { error = 0; } else { /* we only give an error if it wasn't in any table */ error = ((flags & RTF_HOST) ? EHOSTUNREACH : ENETUNREACH); } } else { if (a_failure) { /* return an error if any of them failed */ error = a_failure; } } return (error); } /* * Set up a routing table entry, normally * for an interface. */ int rtinit(struct ifaddr *ifa, int cmd, int flags) { struct sockaddr *dst; int fib = RT_DEFAULT_FIB; if (flags & RTF_HOST) { dst = ifa->ifa_dstaddr; } else { dst = ifa->ifa_addr; } switch (dst->sa_family) { case AF_INET6: case AF_INET: /* We do support multiple FIBs. */ fib = RT_ALL_FIBS; break; } return (rtinit1(ifa, cmd, flags, fib)); } /* * Announce interface address arrival/withdraw * Returns 0 on success. 
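 *
 * A minimal (illustrative) call from address-configuration code,
 * announcing to every fib:
 *
 *	error = rt_addrmsg(RTM_ADD, ifa, RT_ALL_FIBS);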
*/ int rt_addrmsg(int cmd, struct ifaddr *ifa, int fibnum) { KASSERT(cmd == RTM_ADD || cmd == RTM_DELETE, ("unexpected cmd %d", cmd)); KASSERT(fibnum == RT_ALL_FIBS || (fibnum >= 0 && fibnum < rt_numfibs), ("%s: fib out of range 0 <=%d<%d", __func__, fibnum, rt_numfibs)); #if defined(INET) || defined(INET6) #ifdef SCTP /* * notify the SCTP stack * this will only get called when an address is added/deleted * XXX pass the ifaddr struct instead of ifa->ifa_addr... */ sctp_addr_change(ifa, cmd); #endif /* SCTP */ #endif return (rtsock_addrmsg(cmd, ifa, fibnum)); } /* * Announce route addition/removal. * Users of this function MUST validate input data BEFORE calling. * However we have to be able to handle invalid data: * if some userland app sends us an "invalid" route message (invalid mask, * no dst, wrong address families, etc...) we need to pass it back * to the app (and any other rtsock consumers) with the rtm_errno field set * to a non-zero value. * Returns 0 on success. */ int rt_routemsg(int cmd, struct ifnet *ifp, int error, struct rtentry *rt, int fibnum) { KASSERT(cmd == RTM_ADD || cmd == RTM_DELETE, ("unexpected cmd %d", cmd)); KASSERT(fibnum == RT_ALL_FIBS || (fibnum >= 0 && fibnum < rt_numfibs), ("%s: fib out of range 0 <=%d<%d", __func__, fibnum, rt_numfibs)); KASSERT(rt_key(rt) != NULL, ("%s: rt_key must be supplied", __func__)); return (rtsock_routemsg(cmd, ifp, error, rt, fibnum)); } void rt_newaddrmsg(int cmd, struct ifaddr *ifa, int error, struct rtentry *rt) { rt_newaddrmsg_fib(cmd, ifa, error, rt, RT_ALL_FIBS); } /* * This is called to generate messages from the routing socket * indicating a network interface has had addresses associated with it. */ void rt_newaddrmsg_fib(int cmd, struct ifaddr *ifa, int error, struct rtentry *rt, int fibnum) { KASSERT(cmd == RTM_ADD || cmd == RTM_DELETE, ("unexpected cmd %u", cmd)); KASSERT(fibnum == RT_ALL_FIBS || (fibnum >= 0 && fibnum < rt_numfibs), ("%s: fib out of range 0 <=%d<%d", __func__, fibnum, rt_numfibs)); if (cmd == RTM_ADD) { rt_addrmsg(cmd, ifa, fibnum); if (rt != NULL) rt_routemsg(cmd, ifa->ifa_ifp, error, rt, fibnum); } else { if (rt != NULL) rt_routemsg(cmd, ifa->ifa_ifp, error, rt, fibnum); rt_addrmsg(cmd, ifa, fibnum); } } Index: projects/clang380-import/sys/net/route.h =================================================================== --- projects/clang380-import/sys/net/route.h (revision 294776) +++ projects/clang380-import/sys/net/route.h (revision 294777) @@ -1,479 +1,465 @@ /*- * Copyright (c) 1980, 1986, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)route.h 8.4 (Berkeley) 1/9/95 * $FreeBSD$ */ #ifndef _NET_ROUTE_H_ #define _NET_ROUTE_H_ #include #include /* * Kernel resident routing tables. * * The routing tables are initialized when interface addresses * are set by making entries for all directly connected interfaces. */ /* * Struct route consists of a destination address, * a route entry pointer, and a link-layer prepend data pointer along * with its length. */ struct route { struct rtentry *ro_rt; char *ro_prepend; uint16_t ro_plen; uint16_t ro_flags; uint16_t ro_mtu; /* saved ro_rt mtu */ uint16_t spare; struct sockaddr ro_dst; }; #define RT_L2_ME_BIT 2 /* dst L2 addr is our address */ #define RT_MAY_LOOP_BIT 3 /* dst may require loop copy */ #define RT_HAS_HEADER_BIT 4 /* mbuf already has its header prepended */ #define RT_CACHING_CONTEXT 0x1 /* XXX: not used anywhere */ #define RT_NORTREF 0x2 /* doesn't hold reference on ro_rt */ #define RT_L2_ME (1 << RT_L2_ME_BIT) /* 0x0004 */ #define RT_MAY_LOOP (1 << RT_MAY_LOOP_BIT) /* 0x0008 */ #define RT_HAS_HEADER (1 << RT_HAS_HEADER_BIT) /* 0x0010 */ #define RT_REJECT 0x0020 /* Destination is reject */ #define RT_BLACKHOLE 0x0040 /* Destination is blackhole */ #define RT_HAS_GW 0x0080 /* Destination has GW */ struct rt_metrics { u_long rmx_locks; /* Kernel must leave these values alone */ u_long rmx_mtu; /* MTU for this path */ u_long rmx_hopcount; /* max hops expected */ u_long rmx_expire; /* lifetime for route, e.g. redirect */ u_long rmx_recvpipe; /* inbound delay-bandwidth product */ u_long rmx_sendpipe; /* outbound delay-bandwidth product */ u_long rmx_ssthresh; /* outbound gateway buffer limit */ u_long rmx_rtt; /* estimated round trip time */ u_long rmx_rttvar; /* estimated rtt variance */ u_long rmx_pksent; /* packets sent using this route */ u_long rmx_weight; /* route weight */ u_long rmx_filler[3]; /* will be used for T/TCP later */ }; /* * rmx_rtt and rmx_rttvar are stored as microseconds; * RTTTOPRHZ(rtt) converts to a value suitable for use * by a protocol slowtimo counter. */ #define RTM_RTTUNIT 1000000 /* units for rtt, rttvar, as units per sec */ #define RTTTOPRHZ(r) ((r) / (RTM_RTTUNIT / PR_SLOWHZ)) /* lle state is exported in rmx_state rt_metrics field */ #define rmx_state rmx_weight #define RT_DEFAULT_FIB 0 /* Explicitly mark fib=0 restricted cases */ #define RT_ALL_FIBS -1 /* Announce event for every fib */ #ifdef _KERNEL extern u_int rt_numfibs; /* number of usable routing tables */ VNET_DECLARE(u_int, rt_add_addr_allfibs); /* Announce interfaces to all fibs */ #define V_rt_add_addr_allfibs VNET(rt_add_addr_allfibs) #endif /* * We distinguish between routes to hosts and routes to networks, * preferring the former if available. For each route we infer * the interface to use from the gateway address supplied when * the route was entered. Routes that forward packets through * gateways are marked so that the output routines know to address the * gateway rather than the ultimate destination. 
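 *
 * For example (illustrative), a network route installed via route(8)
 * as "route add -net 198.51.100.0/24 192.0.2.1" carries RTF_GATEWAY,
 * so the output path resolves the L2 address of 192.0.2.1 rather
 * than that of the packet's final destination.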
*/ #ifndef RNF_NORMAL #include #ifdef RADIX_MPATH #include #endif #endif #if defined(_KERNEL) || defined(_WANT_RTENTRY) struct rtentry { struct radix_node rt_nodes[2]; /* tree glue, and other values */ /* * XXX struct rtentry must begin with a struct radix_node (or two!) * because the code does some casts of a 'struct radix_node *' * to a 'struct rtentry *' */ #define rt_key(r) (*((struct sockaddr **)(&(r)->rt_nodes->rn_key))) #define rt_mask(r) (*((struct sockaddr **)(&(r)->rt_nodes->rn_mask))) struct sockaddr *rt_gateway; /* value */ struct ifnet *rt_ifp; /* the answer: interface to use */ struct ifaddr *rt_ifa; /* the answer: interface address to use */ int rt_flags; /* up/down?, host/net */ int rt_refcnt; /* # held references */ u_int rt_fibnum; /* which FIB */ u_long rt_mtu; /* MTU for this path */ u_long rt_weight; /* absolute weight */ u_long rt_expire; /* lifetime for route, e.g. redirect */ #define rt_endzero rt_pksent counter_u64_t rt_pksent; /* packets sent using this route */ struct mtx rt_mtx; /* mutex for routing entry */ struct rtentry *rt_chain; /* pointer to next rtentry to delete */ }; #endif /* _KERNEL || _WANT_RTENTRY */ #define RTF_UP 0x1 /* route usable */ #define RTF_GATEWAY 0x2 /* destination is a gateway */ #define RTF_HOST 0x4 /* host entry (net otherwise) */ #define RTF_REJECT 0x8 /* host or net unreachable */ #define RTF_DYNAMIC 0x10 /* created dynamically (by redirect) */ #define RTF_MODIFIED 0x20 /* modified dynamically (by redirect) */ #define RTF_DONE 0x40 /* message confirmed */ /* 0x80 unused, was RTF_DELCLONE */ /* 0x100 unused, was RTF_CLONING */ #define RTF_XRESOLVE 0x200 /* external daemon resolves name */ #define RTF_LLINFO 0x400 /* DEPRECATED - exists ONLY for backward compatibility */ #define RTF_LLDATA 0x400 /* used by apps to add/del L2 entries */ #define RTF_STATIC 0x800 /* manually added */ #define RTF_BLACKHOLE 0x1000 /* just discard pkts (during updates) */ #define RTF_PROTO2 0x4000 /* protocol specific routing flag */ #define RTF_PROTO1 0x8000 /* protocol specific routing flag */ /* 0x10000 unused, was RTF_PRCLONING */ /* 0x20000 unused, was RTF_WASCLONED */ #define RTF_PROTO3 0x40000 /* protocol specific routing flag */ #define RTF_FIXEDMTU 0x80000 /* MTU was explicitly specified */ #define RTF_PINNED 0x100000 /* route is immutable */ #define RTF_LOCAL 0x200000 /* route represents a local address */ #define RTF_BROADCAST 0x400000 /* route represents a bcast address */ #define RTF_MULTICAST 0x800000 /* route represents a mcast address */ /* 0x8000000 and up unassigned */ #define RTF_STICKY 0x10000000 /* always route dst->src */ #define RTF_RNH_LOCKED 0x40000000 /* unused */ #define RTF_GWFLAG_COMPAT 0x80000000 /* a compatibility bit for interacting with existing routing apps */ /* Mask of RTF flags that are allowed to be modified by RTM_CHANGE. */ #define RTF_FMASK \ (RTF_PROTO1 | RTF_PROTO2 | RTF_PROTO3 | RTF_BLACKHOLE | \ RTF_REJECT | RTF_STATIC | RTF_STICKY) /* * fib_ nexthop API flags. 
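 *
 * A sketch (not actual consumer code; nh_flags is a hypothetical
 * variable holding NHF_ bits from a lookup result) of how these
 * flags might be interpreted:
 *
 *	if (nh_flags & (NHF_REJECT | NHF_BLACKHOLE))
 *		return (EHOSTUNREACH);	/* or silently drop */
 *	if (nh_flags & NHF_GATEWAY)
 *		/* address the nexthop, not the final destination */ ;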
*/ /* Consumer-visible nexthop info flags */ #define NHF_REJECT 0x0010 /* RTF_REJECT */ #define NHF_BLACKHOLE 0x0020 /* RTF_BLACKHOLE */ #define NHF_REDIRECT 0x0040 /* RTF_DYNAMIC|RTF_MODIFIED */ #define NHF_DEFAULT 0x0080 /* Default route */ #define NHF_BROADCAST 0x0100 /* RTF_BROADCAST */ #define NHF_GATEWAY 0x0200 /* RTF_GATEWAY */ /* Nexthop request flags */ #define NHR_IFAIF 0x01 /* Return ifa_ifp interface */ #define NHR_REF 0x02 /* For future use */ /* Control plane route request flags */ #define NHR_COPY 0x100 /* Copy rte data */ -/* rte<>nhop translation */ -static inline uint16_t -fib_rte_to_nh_flags(int rt_flags) -{ - uint16_t res; - - res = (rt_flags & RTF_REJECT) ? NHF_REJECT : 0; - res |= (rt_flags & RTF_BLACKHOLE) ? NHF_BLACKHOLE : 0; - res |= (rt_flags & (RTF_DYNAMIC|RTF_MODIFIED)) ? NHF_REDIRECT : 0; - res |= (rt_flags & RTF_BROADCAST) ? NHF_BROADCAST : 0; - res |= (rt_flags & RTF_GATEWAY) ? NHF_GATEWAY : 0; - - return (res); -} - #ifdef _KERNEL /* rte<>ro_flags translation */ static inline void rt_update_ro_flags(struct route *ro) { int rt_flags = ro->ro_rt->rt_flags; ro->ro_flags &= ~ (RT_REJECT|RT_BLACKHOLE|RT_HAS_GW); ro->ro_flags |= (rt_flags & RTF_REJECT) ? RT_REJECT : 0; ro->ro_flags |= (rt_flags & RTF_BLACKHOLE) ? RT_BLACKHOLE : 0; ro->ro_flags |= (rt_flags & RTF_GATEWAY) ? RT_HAS_GW : 0; } #endif /* * Routing statistics. */ struct rtstat { short rts_badredirect; /* bogus redirect calls */ short rts_dynamic; /* routes created by redirects */ short rts_newgateway; /* routes modified by redirects */ short rts_unreach; /* lookups which failed */ short rts_wildcard; /* lookups satisfied by a wildcard */ }; /* * Structures for routing messages. */ struct rt_msghdr { u_short rtm_msglen; /* to skip over non-understood messages */ u_char rtm_version; /* future binary compatibility */ u_char rtm_type; /* message type */ u_short rtm_index; /* index for associated ifp */ int rtm_flags; /* flags, incl. kern & message, e.g. DONE */ int rtm_addrs; /* bitmask identifying sockaddrs in msg */ pid_t rtm_pid; /* identify sender */ int rtm_seq; /* for sender to identify action */ int rtm_errno; /* why failed */ int rtm_fmask; /* bitmask used in RTM_CHANGE message */ u_long rtm_inits; /* which metrics we are initializing */ struct rt_metrics rtm_rmx; /* metrics themselves */ }; #define RTM_VERSION 5 /* Up the ante and ignore older versions */ /* * Message types. */ #define RTM_ADD 0x1 /* Add Route */ #define RTM_DELETE 0x2 /* Delete Route */ #define RTM_CHANGE 0x3 /* Change Metrics or flags */ #define RTM_GET 0x4 /* Report Metrics */ #define RTM_LOSING 0x5 /* Kernel Suspects Partitioning */ #define RTM_REDIRECT 0x6 /* Told to use different route */ #define RTM_MISS 0x7 /* Lookup failed on this address */ #define RTM_LOCK 0x8 /* fix specified metrics */ /* 0x9 */ /* 0xa */ #define RTM_RESOLVE 0xb /* req to resolve dst to LL addr */ #define RTM_NEWADDR 0xc /* address being added to iface */ #define RTM_DELADDR 0xd /* address being removed from iface */ #define RTM_IFINFO 0xe /* iface going up/down etc. */ #define RTM_NEWMADDR 0xf /* mcast group membership being added to if */ #define RTM_DELMADDR 0x10 /* mcast group membership being deleted */ #define RTM_IFANNOUNCE 0x11 /* iface arrival/departure */ #define RTM_IEEE80211 0x12 /* IEEE80211 wireless event */ /* * Bitmask values for rtm_inits and rmx_locks. 
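 *
 * E.g. (illustrative) a routing-socket client pinning the path MTU
 * sets the metric and marks it initialized:
 *
 *	rtm->rtm_rmx.rmx_mtu = 1280;
 *	rtm->rtm_inits |= RTV_MTU;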
*/ #define RTV_MTU 0x1 /* init or lock _mtu */ #define RTV_HOPCOUNT 0x2 /* init or lock _hopcount */ #define RTV_EXPIRE 0x4 /* init or lock _expire */ #define RTV_RPIPE 0x8 /* init or lock _recvpipe */ #define RTV_SPIPE 0x10 /* init or lock _sendpipe */ #define RTV_SSTHRESH 0x20 /* init or lock _ssthresh */ #define RTV_RTT 0x40 /* init or lock _rtt */ #define RTV_RTTVAR 0x80 /* init or lock _rttvar */ #define RTV_WEIGHT 0x100 /* init or lock _weight */ /* * Bitmask values for rtm_addrs. */ #define RTA_DST 0x1 /* destination sockaddr present */ #define RTA_GATEWAY 0x2 /* gateway sockaddr present */ #define RTA_NETMASK 0x4 /* netmask sockaddr present */ #define RTA_GENMASK 0x8 /* cloning mask sockaddr present */ #define RTA_IFP 0x10 /* interface name sockaddr present */ #define RTA_IFA 0x20 /* interface addr sockaddr present */ #define RTA_AUTHOR 0x40 /* sockaddr for author of redirect */ #define RTA_BRD 0x80 /* for NEWADDR, broadcast or p-p dest addr */ /* * Index offsets for sockaddr array for alternate internal encoding. */ #define RTAX_DST 0 /* destination sockaddr present */ #define RTAX_GATEWAY 1 /* gateway sockaddr present */ #define RTAX_NETMASK 2 /* netmask sockaddr present */ #define RTAX_GENMASK 3 /* cloning mask sockaddr present */ #define RTAX_IFP 4 /* interface name sockaddr present */ #define RTAX_IFA 5 /* interface addr sockaddr present */ #define RTAX_AUTHOR 6 /* sockaddr for author of redirect */ #define RTAX_BRD 7 /* for NEWADDR, broadcast or p-p dest addr */ #define RTAX_MAX 8 /* size of array to allocate */ typedef int rt_filter_f_t(const struct rtentry *, void *); struct rt_addrinfo { int rti_addrs; /* RTA_ bitmask of present sockaddrs */ int rti_flags; /* Route RTF_ flags */ struct sockaddr *rti_info[RTAX_MAX]; /* Sockaddr data */ struct ifaddr *rti_ifa; /* value of rt_ifa addr */ struct ifnet *rti_ifp; /* route interface */ rt_filter_f_t *rti_filter; /* filter function */ void *rti_filterdata; /* filter parameters */ u_long rti_mflags; /* metrics RTV_ flags */ u_long rti_spare; /* Will be used for fib */ struct rt_metrics *rti_rmx; /* Pointer to route metrics */ }; /* * This macro returns the size of a struct sockaddr when passed * through a routing socket. Basically we round up sa_len to * a multiple of sizeof(long), with a minimum of sizeof(long). * The check for a NULL pointer is just a convenience, probably never used. * The case sa_len == 0 should only apply to empty structures. */ #define SA_SIZE(sa) \ ( (!(sa) || ((struct sockaddr *)(sa))->sa_len == 0) ? 
\ sizeof(long) : \ 1 + ( (((struct sockaddr *)(sa))->sa_len - 1) | (sizeof(long) - 1) ) ) #define sa_equal(a, b) ( \ (((const struct sockaddr *)(a))->sa_len == ((const struct sockaddr *)(b))->sa_len) && \ (bcmp((a), (b), ((const struct sockaddr *)(b))->sa_len) == 0)) #ifdef _KERNEL #define RT_LINK_IS_UP(ifp) (!((ifp)->if_capabilities & IFCAP_LINKSTATE) \ || (ifp)->if_link_state == LINK_STATE_UP) #define RT_LOCK_INIT(_rt) \ mtx_init(&(_rt)->rt_mtx, "rtentry", NULL, MTX_DEF | MTX_DUPOK) #define RT_LOCK(_rt) mtx_lock(&(_rt)->rt_mtx) #define RT_UNLOCK(_rt) mtx_unlock(&(_rt)->rt_mtx) #define RT_LOCK_DESTROY(_rt) mtx_destroy(&(_rt)->rt_mtx) #define RT_LOCK_ASSERT(_rt) mtx_assert(&(_rt)->rt_mtx, MA_OWNED) #define RT_UNLOCK_COND(_rt) do { \ if (mtx_owned(&(_rt)->rt_mtx)) \ mtx_unlock(&(_rt)->rt_mtx); \ } while (0) #define RT_ADDREF(_rt) do { \ RT_LOCK_ASSERT(_rt); \ KASSERT((_rt)->rt_refcnt >= 0, \ ("negative refcnt %d", (_rt)->rt_refcnt)); \ (_rt)->rt_refcnt++; \ } while (0) #define RT_REMREF(_rt) do { \ RT_LOCK_ASSERT(_rt); \ KASSERT((_rt)->rt_refcnt > 0, \ ("bogus refcnt %d", (_rt)->rt_refcnt)); \ (_rt)->rt_refcnt--; \ } while (0) #define RTFREE_LOCKED(_rt) do { \ if ((_rt)->rt_refcnt <= 1) \ rtfree(_rt); \ else { \ RT_REMREF(_rt); \ RT_UNLOCK(_rt); \ } \ /* guard against invalid refs */ \ _rt = 0; \ } while (0) #define RTFREE(_rt) do { \ RT_LOCK(_rt); \ RTFREE_LOCKED(_rt); \ } while (0) #define RO_RTFREE(_ro) do { \ if ((_ro)->ro_rt) { \ if ((_ro)->ro_flags & RT_NORTREF) { \ (_ro)->ro_flags &= ~RT_NORTREF; \ (_ro)->ro_rt = NULL; \ } else { \ RT_LOCK((_ro)->ro_rt); \ RTFREE_LOCKED((_ro)->ro_rt); \ } \ } \ } while (0) -struct radix_node_head *rt_tables_get_rnh(int, int); - struct ifmultiaddr; +struct rib_head; void rt_ieee80211msg(struct ifnet *, int, void *, size_t); void rt_ifannouncemsg(struct ifnet *, int); void rt_ifmsg(struct ifnet *); void rt_missmsg(int, struct rt_addrinfo *, int, int); void rt_missmsg_fib(int, struct rt_addrinfo *, int, int, int); void rt_newaddrmsg(int, struct ifaddr *, int, struct rtentry *); void rt_newaddrmsg_fib(int, struct ifaddr *, int, struct rtentry *, int); int rt_addrmsg(int, struct ifaddr *, int); int rt_routemsg(int, struct ifnet *ifp, int, struct rtentry *, int); void rt_newmaddrmsg(int, struct ifmultiaddr *); int rt_setgate(struct rtentry *, struct sockaddr *, struct sockaddr *); void rt_maskedcopy(struct sockaddr *, struct sockaddr *, struct sockaddr *); +struct rib_head *rt_table_init(int); +void rt_table_destroy(struct rib_head *); int rtsock_addrmsg(int, struct ifaddr *, int); int rtsock_routemsg(int, struct ifnet *ifp, int, struct rtentry *, int); /* * Note the following locking behavior: * * rtalloc1() returns a locked rtentry * * rtfree() and RTFREE_LOCKED() require a locked rtentry * * RTFREE() uses an unlocked entry. 
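 *
 * The common lookup/release pattern is therefore (sketch only):
 *
 *	rt = rtalloc1(dst, 1, 0UL);	/* returned locked + referenced */
 *	if (rt != NULL) {
 *		/* ... examine rt ... */
 *		RTFREE_LOCKED(rt);	/* drops the reference and lock */
 *	}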
*/ void rtfree(struct rtentry *); void rt_updatemtu(struct ifnet *); typedef int rt_walktree_f_t(struct rtentry *, void *); -typedef void rt_setwarg_t(struct radix_node_head *, uint32_t, int, void *); +typedef void rt_setwarg_t(struct rib_head *, uint32_t, int, void *); void rt_foreach_fib_walk(int af, rt_setwarg_t *, rt_walktree_f_t *, void *); void rt_foreach_fib_walk_del(int af, rt_filter_f_t *filter_f, void *arg); void rt_flushifroutes(struct ifnet *ifp); /* XXX MRT COMPAT VERSIONS THAT SET UNIVERSE to 0 */ /* These are used by old code not yet converted to use multiple FIBs */ struct rtentry *rtalloc1(struct sockaddr *, int, u_long); int rtinit(struct ifaddr *, int, int); /* XXX MRT NEW VERSIONS THAT USE FIBs * For now the protocol independent versions are the same as the AF_INET ones * but this will change. */ int rt_getifa_fib(struct rt_addrinfo *, u_int fibnum); void rtalloc_ign_fib(struct route *ro, u_long ignflags, u_int fibnum); struct rtentry *rtalloc1_fib(struct sockaddr *, int, u_long, u_int); int rtioctl_fib(u_long, caddr_t, u_int); void rtredirect_fib(struct sockaddr *, struct sockaddr *, struct sockaddr *, int, struct sockaddr *, u_int); int rtrequest_fib(int, struct sockaddr *, struct sockaddr *, struct sockaddr *, int, struct rtentry **, u_int); int rtrequest1_fib(int, struct rt_addrinfo *, struct rtentry **, u_int); int rib_lookup_info(uint32_t, const struct sockaddr *, uint32_t, uint32_t, struct rt_addrinfo *); void rib_free_info(struct rt_addrinfo *info); #endif #endif Index: projects/clang380-import/sys/net/route_var.h =================================================================== --- projects/clang380-import/sys/net/route_var.h (nonexistent) +++ projects/clang380-import/sys/net/route_var.h (revision 294777) @@ -0,0 +1,76 @@ +/*- + * Copyright (c) 2015-2016 + * Alexander V. Chernikov + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD$ + */ + +#ifndef _NET_ROUTE_VAR_H_ +#define _NET_ROUTE_VAR_H_ + +struct rib_head { + struct radix_head head; + rn_matchaddr_f_t *rnh_matchaddr; /* longest match for sockaddr */ + rn_addaddr_f_t *rnh_addaddr; /* add based on sockaddr */ + rn_deladdr_f_t *rnh_deladdr; /* remove based on sockaddr */ + rn_lookup_f_t *rnh_lookup; /* exact match for sockaddr */ + rn_walktree_t *rnh_walktree; /* traverse tree */ + rn_walktree_from_t *rnh_walktree_from; /* traverse tree below a given node */ + rn_close_t *rnh_close; /* do something when the last ref drops */ + u_int rnh_gen; /* generation counter */ + int rnh_multipath; /* multipath capable? */ + struct radix_node rnh_nodes[3]; /* empty tree for common case */ + struct rwlock rib_lock; /* config/data path lock */ + struct radix_mask_head rmhead; /* masks radix head */ +}; + +#define RIB_RLOCK(rh) rw_rlock(&(rh)->rib_lock) +#define RIB_RUNLOCK(rh) rw_runlock(&(rh)->rib_lock) +#define RIB_WLOCK(rh) rw_wlock(&(rh)->rib_lock) +#define RIB_WUNLOCK(rh) rw_wunlock(&(rh)->rib_lock) +#define RIB_LOCK_ASSERT(rh) rw_assert(&(rh)->rib_lock, RA_LOCKED) +#define RIB_WLOCK_ASSERT(rh) rw_assert(&(rh)->rib_lock, RA_WLOCKED) + +struct rib_head *rt_tables_get_rnh(int fib, int family); + +/* rte<>nhop translation */ +static inline uint16_t +fib_rte_to_nh_flags(int rt_flags) +{ + uint16_t res; + + res = (rt_flags & RTF_REJECT) ? NHF_REJECT : 0; + res |= (rt_flags & RTF_BLACKHOLE) ? NHF_BLACKHOLE : 0; + res |= (rt_flags & (RTF_DYNAMIC|RTF_MODIFIED)) ? NHF_REDIRECT : 0; + res |= (rt_flags & RTF_BROADCAST) ? NHF_BROADCAST : 0; + res |= (rt_flags & RTF_GATEWAY) ? NHF_GATEWAY : 0; + + return (res); +} + + +#endif Property changes on: projects/clang380-import/sys/net/route_var.h ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: projects/clang380-import/sys/net/rtsock.c =================================================================== --- projects/clang380-import/sys/net/rtsock.c (revision 294776) +++ projects/clang380-import/sys/net/rtsock.c (revision 294777) @@ -1,1925 +1,1927 @@ /*- * Copyright (c) 1988, 1991, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)rtsock.c 8.7 (Berkeley) 10/12/95 * $FreeBSD$ */ #include "opt_compat.h" #include "opt_mpath.h" #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #ifdef INET6 #include #include #endif #ifdef COMPAT_FREEBSD32 #include #include struct if_msghdr32 { uint16_t ifm_msglen; uint8_t ifm_version; uint8_t ifm_type; int32_t ifm_addrs; int32_t ifm_flags; uint16_t ifm_index; struct if_data ifm_data; }; struct if_msghdrl32 { uint16_t ifm_msglen; uint8_t ifm_version; uint8_t ifm_type; int32_t ifm_addrs; int32_t ifm_flags; uint16_t ifm_index; uint16_t _ifm_spare1; uint16_t ifm_len; uint16_t ifm_data_off; struct if_data ifm_data; }; struct ifa_msghdrl32 { uint16_t ifam_msglen; uint8_t ifam_version; uint8_t ifam_type; int32_t ifam_addrs; int32_t ifam_flags; uint16_t ifam_index; uint16_t _ifam_spare1; uint16_t ifam_len; uint16_t ifam_data_off; int32_t ifam_metric; struct if_data ifam_data; }; #endif /* COMPAT_FREEBSD32 */ MALLOC_DEFINE(M_RTABLE, "routetbl", "routing tables"); /* NB: these are not modified */ static struct sockaddr route_src = { 2, PF_ROUTE, }; static struct sockaddr sa_zero = { sizeof(sa_zero), AF_INET, }; /* These are external hooks for CARP. */ int (*carp_get_vhid_p)(struct ifaddr *); /* * Used by rtsock/raw_input callback code to decide whether to filter the update * notification to a socket bound to a particular FIB. 
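 *
 * The producer side marks the mbuf (see route_output() below):
 *
 *	M_SETFIB(m, fibnum);
 *	m->m_flags |= RTS_FILTER_FIB;
 *
 * and raw_input_rts_cb() then skips sockets bound to another fib.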
*/ #define RTS_FILTER_FIB M_PROTO8 typedef struct { int ip_count; /* attached w/ AF_INET */ int ip6_count; /* attached w/ AF_INET6 */ int any_count; /* total attached */ } route_cb_t; static VNET_DEFINE(route_cb_t, route_cb); #define V_route_cb VNET(route_cb) struct mtx rtsock_mtx; MTX_SYSINIT(rtsock, &rtsock_mtx, "rtsock route_cb lock", MTX_DEF); #define RTSOCK_LOCK() mtx_lock(&rtsock_mtx) #define RTSOCK_UNLOCK() mtx_unlock(&rtsock_mtx) #define RTSOCK_LOCK_ASSERT() mtx_assert(&rtsock_mtx, MA_OWNED) static SYSCTL_NODE(_net, OID_AUTO, route, CTLFLAG_RD, 0, ""); struct walkarg { int w_tmemsize; int w_op, w_arg; caddr_t w_tmem; struct sysctl_req *w_req; }; static void rts_input(struct mbuf *m); static struct mbuf *rtsock_msg_mbuf(int type, struct rt_addrinfo *rtinfo); static int rtsock_msg_buffer(int type, struct rt_addrinfo *rtinfo, struct walkarg *w, int *plen); static int rt_xaddrs(caddr_t cp, caddr_t cplim, struct rt_addrinfo *rtinfo); static int sysctl_dumpentry(struct radix_node *rn, void *vw); static int sysctl_iflist(int af, struct walkarg *w); static int sysctl_ifmalist(int af, struct walkarg *w); static int route_output(struct mbuf *m, struct socket *so, ...); static void rt_getmetrics(const struct rtentry *rt, struct rt_metrics *out); static void rt_dispatch(struct mbuf *, sa_family_t); static struct sockaddr *rtsock_fix_netmask(struct sockaddr *dst, struct sockaddr *smask, struct sockaddr_storage *dmask); static struct netisr_handler rtsock_nh = { .nh_name = "rtsock", .nh_handler = rts_input, .nh_proto = NETISR_ROUTE, .nh_policy = NETISR_POLICY_SOURCE, }; static int sysctl_route_netisr_maxqlen(SYSCTL_HANDLER_ARGS) { int error, qlimit; netisr_getqlimit(&rtsock_nh, &qlimit); error = sysctl_handle_int(oidp, &qlimit, 0, req); if (error || !req->newptr) return (error); if (qlimit < 1) return (EINVAL); return (netisr_setqlimit(&rtsock_nh, qlimit)); } SYSCTL_PROC(_net_route, OID_AUTO, netisr_maxqlen, CTLTYPE_INT|CTLFLAG_RW, 0, 0, sysctl_route_netisr_maxqlen, "I", "maximum routing socket dispatch queue length"); static void rts_init(void) { int tmp; if (TUNABLE_INT_FETCH("net.route.netisr_maxqlen", &tmp)) rtsock_nh.nh_qlimit = tmp; netisr_register(&rtsock_nh); } SYSINIT(rtsock, SI_SUB_PROTO_DOMAIN, SI_ORDER_THIRD, rts_init, 0); static int raw_input_rts_cb(struct mbuf *m, struct sockproto *proto, struct sockaddr *src, struct rawcb *rp) { int fibnum; KASSERT(m != NULL, ("%s: m is NULL", __func__)); KASSERT(proto != NULL, ("%s: proto is NULL", __func__)); KASSERT(rp != NULL, ("%s: rp is NULL", __func__)); /* No filtering requested. */ if ((m->m_flags & RTS_FILTER_FIB) == 0) return (0); /* Check if it is a rts and the fib matches the one of the socket. */ fibnum = M_GETFIB(m); if (proto->sp_family != PF_ROUTE || rp->rcb_socket == NULL || rp->rcb_socket->so_fibnum == fibnum) return (0); /* Filtering requested and no match, the socket shall be skipped. */ return (1); } static void rts_input(struct mbuf *m) { struct sockproto route_proto; unsigned short *family; struct m_tag *tag; route_proto.sp_family = PF_ROUTE; tag = m_tag_find(m, PACKET_TAG_RTSOCKFAM, NULL); if (tag != NULL) { family = (unsigned short *)(tag + 1); route_proto.sp_protocol = *family; m_tag_delete(m, tag); } else route_proto.sp_protocol = 0; raw_input_ext(m, &route_proto, &route_src, raw_input_rts_cb); } /* * It really doesn't make any sense at all for this code to share much * with raw_usrreq.c, since its functionality is so restricted. 
XXX */ static void rts_abort(struct socket *so) { raw_usrreqs.pru_abort(so); } static void rts_close(struct socket *so) { raw_usrreqs.pru_close(so); } /* pru_accept is EOPNOTSUPP */ static int rts_attach(struct socket *so, int proto, struct thread *td) { struct rawcb *rp; int error; KASSERT(so->so_pcb == NULL, ("rts_attach: so_pcb != NULL")); /* XXX */ rp = malloc(sizeof *rp, M_PCB, M_WAITOK | M_ZERO); if (rp == NULL) return ENOBUFS; so->so_pcb = (caddr_t)rp; so->so_fibnum = td->td_proc->p_fibnum; error = raw_attach(so, proto); rp = sotorawcb(so); if (error) { so->so_pcb = NULL; free(rp, M_PCB); return error; } RTSOCK_LOCK(); switch(rp->rcb_proto.sp_protocol) { case AF_INET: V_route_cb.ip_count++; break; case AF_INET6: V_route_cb.ip6_count++; break; } V_route_cb.any_count++; RTSOCK_UNLOCK(); soisconnected(so); so->so_options |= SO_USELOOPBACK; return 0; } static int rts_bind(struct socket *so, struct sockaddr *nam, struct thread *td) { return (raw_usrreqs.pru_bind(so, nam, td)); /* xxx just EINVAL */ } static int rts_connect(struct socket *so, struct sockaddr *nam, struct thread *td) { return (raw_usrreqs.pru_connect(so, nam, td)); /* XXX just EINVAL */ } /* pru_connect2 is EOPNOTSUPP */ /* pru_control is EOPNOTSUPP */ static void rts_detach(struct socket *so) { struct rawcb *rp = sotorawcb(so); KASSERT(rp != NULL, ("rts_detach: rp == NULL")); RTSOCK_LOCK(); switch(rp->rcb_proto.sp_protocol) { case AF_INET: V_route_cb.ip_count--; break; case AF_INET6: V_route_cb.ip6_count--; break; } V_route_cb.any_count--; RTSOCK_UNLOCK(); raw_usrreqs.pru_detach(so); } static int rts_disconnect(struct socket *so) { return (raw_usrreqs.pru_disconnect(so)); } /* pru_listen is EOPNOTSUPP */ static int rts_peeraddr(struct socket *so, struct sockaddr **nam) { return (raw_usrreqs.pru_peeraddr(so, nam)); } /* pru_rcvd is EOPNOTSUPP */ /* pru_rcvoob is EOPNOTSUPP */ static int rts_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam, struct mbuf *control, struct thread *td) { return (raw_usrreqs.pru_send(so, flags, m, nam, control, td)); } /* pru_sense is null */ static int rts_shutdown(struct socket *so) { return (raw_usrreqs.pru_shutdown(so)); } static int rts_sockaddr(struct socket *so, struct sockaddr **nam) { return (raw_usrreqs.pru_sockaddr(so, nam)); } static struct pr_usrreqs route_usrreqs = { .pru_abort = rts_abort, .pru_attach = rts_attach, .pru_bind = rts_bind, .pru_connect = rts_connect, .pru_detach = rts_detach, .pru_disconnect = rts_disconnect, .pru_peeraddr = rts_peeraddr, .pru_send = rts_send, .pru_shutdown = rts_shutdown, .pru_sockaddr = rts_sockaddr, .pru_close = rts_close, }; #ifndef _SOCKADDR_UNION_DEFINED #define _SOCKADDR_UNION_DEFINED /* * The union of all possible address formats we handle. */ union sockaddr_union { struct sockaddr sa; struct sockaddr_in sin; struct sockaddr_in6 sin6; }; #endif /* _SOCKADDR_UNION_DEFINED */ static int rtm_get_jailed(struct rt_addrinfo *info, struct ifnet *ifp, struct rtentry *rt, union sockaddr_union *saun, struct ucred *cred) { /* First, see if the returned address is part of the jail. */ if (prison_if(cred, rt->rt_ifa->ifa_addr) == 0) { info->rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr; return (0); } switch (info->rti_info[RTAX_DST]->sa_family) { #ifdef INET case AF_INET: { struct in_addr ia; struct ifaddr *ifa; int found; found = 0; /* * Try to find an address on the given outgoing interface * that belongs to the jail. 
*/ IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { struct sockaddr *sa; sa = ifa->ifa_addr; if (sa->sa_family != AF_INET) continue; ia = ((struct sockaddr_in *)sa)->sin_addr; if (prison_check_ip4(cred, &ia) == 0) { found = 1; break; } } IF_ADDR_RUNLOCK(ifp); if (!found) { /* * As a last resort return the 'default' jail address. */ ia = ((struct sockaddr_in *)rt->rt_ifa->ifa_addr)-> sin_addr; if (prison_get_ip4(cred, &ia) != 0) return (ESRCH); } bzero(&saun->sin, sizeof(struct sockaddr_in)); saun->sin.sin_len = sizeof(struct sockaddr_in); saun->sin.sin_family = AF_INET; saun->sin.sin_addr.s_addr = ia.s_addr; info->rti_info[RTAX_IFA] = (struct sockaddr *)&saun->sin; break; } #endif #ifdef INET6 case AF_INET6: { struct in6_addr ia6; struct ifaddr *ifa; int found; found = 0; /* * Try to find an address on the given outgoing interface * that belongs to the jail. */ IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { struct sockaddr *sa; sa = ifa->ifa_addr; if (sa->sa_family != AF_INET6) continue; bcopy(&((struct sockaddr_in6 *)sa)->sin6_addr, &ia6, sizeof(struct in6_addr)); if (prison_check_ip6(cred, &ia6) == 0) { found = 1; break; } } IF_ADDR_RUNLOCK(ifp); if (!found) { /* * As a last resort return the 'default' jail address. */ ia6 = ((struct sockaddr_in6 *)rt->rt_ifa->ifa_addr)-> sin6_addr; if (prison_get_ip6(cred, &ia6) != 0) return (ESRCH); } bzero(&saun->sin6, sizeof(struct sockaddr_in6)); saun->sin6.sin6_len = sizeof(struct sockaddr_in6); saun->sin6.sin6_family = AF_INET6; bcopy(&ia6, &saun->sin6.sin6_addr, sizeof(struct in6_addr)); if (sa6_recoverscope(&saun->sin6) != 0) return (ESRCH); info->rti_info[RTAX_IFA] = (struct sockaddr *)&saun->sin6; break; } #endif default: return (ESRCH); } return (0); } /*ARGSUSED*/ static int route_output(struct mbuf *m, struct socket *so, ...) { struct rt_msghdr *rtm = NULL; struct rtentry *rt = NULL; - struct radix_node_head *rnh; + struct rib_head *rnh; struct rt_addrinfo info; struct sockaddr_storage ss; #ifdef INET6 struct sockaddr_in6 *sin6; int i, rti_need_deembed = 0; #endif int alloc_len = 0, len, error = 0, fibnum; struct ifnet *ifp = NULL; union sockaddr_union saun; sa_family_t saf = AF_UNSPEC; struct rawcb *rp = NULL; struct walkarg w; fibnum = so->so_fibnum; #define senderr(e) { error = e; goto flush;} if (m == NULL || ((m->m_len < sizeof(long)) && (m = m_pullup(m, sizeof(long))) == NULL)) return (ENOBUFS); if ((m->m_flags & M_PKTHDR) == 0) panic("route_output"); len = m->m_pkthdr.len; if (len < sizeof(*rtm) || len != mtod(m, struct rt_msghdr *)->rtm_msglen) senderr(EINVAL); /* * Most current messages are in the 200-240 byte range, so minimize * possible re-allocation on reply by using a larger buffer * aligned on a 1k boundary. */ alloc_len = roundup2(len, 1024); if ((rtm = malloc(alloc_len, M_TEMP, M_NOWAIT)) == NULL) senderr(ENOBUFS); m_copydata(m, 0, len, (caddr_t)rtm); bzero(&info, sizeof(info)); bzero(&w, sizeof(w)); if (rtm->rtm_version != RTM_VERSION) { /* Do not touch message since format is unknown */ free(rtm, M_TEMP); rtm = NULL; senderr(EPROTONOSUPPORT); } /* * Starting from here, it is possible * to alter the original message and insert * the caller PID and error value. */ rtm->rtm_pid = curproc->p_pid; info.rti_addrs = rtm->rtm_addrs; info.rti_mflags = rtm->rtm_inits; info.rti_rmx = &rtm->rtm_rmx; /* * rt_xaddrs() performs s6_addr[2] := sin6_scope_id for AF_INET6 * link-local address because rtrequest requires addresses with * embedded scope id. 
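 *
 * E.g. a link-local destination supplied as fe80::1 with
 * sin6_scope_id == 3 is folded into the kernel-internal form
 * fe80:3::1 before lookup; sa6_recoverscope() reverses the
 * embedding before replies are sent back to userland.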
*/ if (rt_xaddrs((caddr_t)(rtm + 1), len + (caddr_t)rtm, &info)) senderr(EINVAL); info.rti_flags = rtm->rtm_flags; if (info.rti_info[RTAX_DST] == NULL || info.rti_info[RTAX_DST]->sa_family >= AF_MAX || (info.rti_info[RTAX_GATEWAY] != NULL && info.rti_info[RTAX_GATEWAY]->sa_family >= AF_MAX)) senderr(EINVAL); saf = info.rti_info[RTAX_DST]->sa_family; /* * Verify that the caller has the appropriate privilege; RTM_GET * is the only operation the non-superuser is allowed. */ if (rtm->rtm_type != RTM_GET) { error = priv_check(curthread, PRIV_NET_ROUTE); if (error) senderr(error); } /* * The given gateway address may be an interface address. * For example, when issuing a "route change" command on a route * entry that was created from a tunnel, the gateway address * given may be the local end point. In this case the * RTF_GATEWAY flag must be cleared or the destination will * not be reachable even though there is no error message. */ if (info.rti_info[RTAX_GATEWAY] != NULL && info.rti_info[RTAX_GATEWAY]->sa_family != AF_LINK) { struct rt_addrinfo ginfo; struct sockaddr *gdst; bzero(&ginfo, sizeof(ginfo)); bzero(&ss, sizeof(ss)); ss.ss_len = sizeof(ss); ginfo.rti_info[RTAX_GATEWAY] = (struct sockaddr *)&ss; gdst = info.rti_info[RTAX_GATEWAY]; /* * A host route through the loopback interface is * installed for each interface address. In pre-8.0 * releases the interface address of a PPP link type * is not reachable locally. This behavior is fixed as * part of the new L2/L3 redesign and rewrite work. The * signature of this interface address route is the * AF_LINK sa_family type of the rt_gateway, and the * rt_ifp has the IFF_LOOPBACK flag set. */ if (rib_lookup_info(fibnum, gdst, NHR_REF, 0, &ginfo) == 0) { if (ss.ss_family == AF_LINK && ginfo.rti_ifp->if_flags & IFF_LOOPBACK) { info.rti_flags &= ~RTF_GATEWAY; info.rti_flags |= RTF_GWFLAG_COMPAT; } rib_free_info(&ginfo); } } switch (rtm->rtm_type) { struct rtentry *saved_nrt; case RTM_ADD: case RTM_CHANGE: if (info.rti_info[RTAX_GATEWAY] == NULL) senderr(EINVAL); saved_nrt = NULL; /* support for new ARP code */ if (info.rti_info[RTAX_GATEWAY]->sa_family == AF_LINK && (rtm->rtm_flags & RTF_LLDATA) != 0) { error = lla_rt_output(rtm, &info); #ifdef INET6 if (error == 0) rti_need_deembed = (V_deembed_scopeid) ? 1 : 0; #endif break; } error = rtrequest1_fib(rtm->rtm_type, &info, &saved_nrt, fibnum); if (error == 0 && saved_nrt != NULL) { #ifdef INET6 rti_need_deembed = (V_deembed_scopeid) ? 1 : 0; #endif RT_LOCK(saved_nrt); rtm->rtm_index = saved_nrt->rt_ifp->if_index; RT_REMREF(saved_nrt); RT_UNLOCK(saved_nrt); } break; case RTM_DELETE: saved_nrt = NULL; /* support for new ARP code */ if (info.rti_info[RTAX_GATEWAY] && (info.rti_info[RTAX_GATEWAY]->sa_family == AF_LINK) && (rtm->rtm_flags & RTF_LLDATA) != 0) { error = lla_rt_output(rtm, &info); #ifdef INET6 if (error == 0) rti_need_deembed = (V_deembed_scopeid) ? 1 : 0; #endif break; } error = rtrequest1_fib(RTM_DELETE, &info, &saved_nrt, fibnum); if (error == 0) { RT_LOCK(saved_nrt); rt = saved_nrt; goto report; } #ifdef INET6 /* rt_msg2() will not be used when RTM_DELETE fails. */ rti_need_deembed = (V_deembed_scopeid) ? 1 : 0; #endif break; case RTM_GET: rnh = rt_tables_get_rnh(fibnum, saf); if (rnh == NULL) senderr(EAFNOSUPPORT); - RADIX_NODE_HEAD_RLOCK(rnh); + RIB_RLOCK(rnh); if (info.rti_info[RTAX_NETMASK] == NULL && rtm->rtm_type == RTM_GET) { /* * Provide longest prefix match for * address lookup (no mask). 
* 'route -n get addr' */ rt = (struct rtentry *) rnh->rnh_matchaddr( - info.rti_info[RTAX_DST], rnh); + info.rti_info[RTAX_DST], &rnh->head); } else rt = (struct rtentry *) rnh->rnh_lookup( info.rti_info[RTAX_DST], - info.rti_info[RTAX_NETMASK], rnh); + info.rti_info[RTAX_NETMASK], &rnh->head); if (rt == NULL) { - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rnh); senderr(ESRCH); } #ifdef RADIX_MPATH /* * for RTM_CHANGE/LOCK, if we got multipath routes, * we require users to specify a matching RTAX_GATEWAY. * * for RTM_GET, gate is optional even with multipath. * if gate == NULL the first match is returned. * (no need to call rt_mpath_matchgate if gate == NULL) */ - if (rn_mpath_capable(rnh) && + if (rt_mpath_capable(rnh) && (rtm->rtm_type != RTM_GET || info.rti_info[RTAX_GATEWAY])) { rt = rt_mpath_matchgate(rt, info.rti_info[RTAX_GATEWAY]); if (!rt) { - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rnh); senderr(ESRCH); } } #endif /* * If performing proxied L2 entry insertion, and * the actual PPP host entry is found, perform * another search to retrieve the prefix route of * the local end point of the PPP link. */ if (rtm->rtm_flags & RTF_ANNOUNCE) { struct sockaddr laddr; if (rt->rt_ifp != NULL && rt->rt_ifp->if_type == IFT_PROPVIRTUAL) { struct ifaddr *ifa; ifa = ifa_ifwithnet(info.rti_info[RTAX_DST], 1, RT_ALL_FIBS); if (ifa != NULL) rt_maskedcopy(ifa->ifa_addr, &laddr, ifa->ifa_netmask); } else rt_maskedcopy(rt->rt_ifa->ifa_addr, &laddr, rt->rt_ifa->ifa_netmask); /* * refactor rt and no lock operation necessary */ - rt = (struct rtentry *)rnh->rnh_matchaddr(&laddr, rnh); + rt = (struct rtentry *)rnh->rnh_matchaddr(&laddr, + &rnh->head); if (rt == NULL) { - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rnh); senderr(ESRCH); } } RT_LOCK(rt); RT_ADDREF(rt); - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rnh); report: RT_LOCK_ASSERT(rt); if ((rt->rt_flags & RTF_HOST) == 0 ? jailed_without_vnet(curthread->td_ucred) : prison_if(curthread->td_ucred, rt_key(rt)) != 0) { RT_UNLOCK(rt); senderr(ESRCH); } info.rti_info[RTAX_DST] = rt_key(rt); info.rti_info[RTAX_GATEWAY] = rt->rt_gateway; info.rti_info[RTAX_NETMASK] = rtsock_fix_netmask(rt_key(rt), rt_mask(rt), &ss); info.rti_info[RTAX_GENMASK] = 0; if (rtm->rtm_addrs & (RTA_IFP | RTA_IFA)) { ifp = rt->rt_ifp; if (ifp) { info.rti_info[RTAX_IFP] = ifp->if_addr->ifa_addr; error = rtm_get_jailed(&info, ifp, rt, &saun, curthread->td_ucred); if (error != 0) { RT_UNLOCK(rt); senderr(error); } if (ifp->if_flags & IFF_POINTOPOINT) info.rti_info[RTAX_BRD] = rt->rt_ifa->ifa_dstaddr; rtm->rtm_index = ifp->if_index; } else { info.rti_info[RTAX_IFP] = NULL; info.rti_info[RTAX_IFA] = NULL; } } else if ((ifp = rt->rt_ifp) != NULL) { rtm->rtm_index = ifp->if_index; } /* Check if we need to realloc storage */ rtsock_msg_buffer(rtm->rtm_type, &info, NULL, &len); if (len > alloc_len) { struct rt_msghdr *new_rtm; new_rtm = malloc(len, M_TEMP, M_NOWAIT); if (new_rtm == NULL) { RT_UNLOCK(rt); senderr(ENOBUFS); } bcopy(rtm, new_rtm, rtm->rtm_msglen); free(rtm, M_TEMP); rtm = new_rtm; alloc_len = len; } w.w_tmem = (caddr_t)rtm; w.w_tmemsize = alloc_len; rtsock_msg_buffer(rtm->rtm_type, &info, &w, &len); if (rt->rt_flags & RTF_GWFLAG_COMPAT) rtm->rtm_flags = RTF_GATEWAY | (rt->rt_flags & ~RTF_GWFLAG_COMPAT); else rtm->rtm_flags = rt->rt_flags; rt_getmetrics(rt, &rtm->rtm_rmx); rtm->rtm_addrs = info.rti_addrs; RT_UNLOCK(rt); break; default: senderr(EOPNOTSUPP); } flush: if (rt != NULL) RTFREE(rt); /* * Check to see if we don't want our own messages. 
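 *
 * Userland opts out of this echo by clearing SO_USELOOPBACK on its
 * routing socket, e.g. (illustrative):
 *
 *	int off = 0;
 *
 *	setsockopt(s, SOL_SOCKET, SO_USELOOPBACK, &off, sizeof(off));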
*/ if ((so->so_options & SO_USELOOPBACK) == 0) { if (V_route_cb.any_count <= 1) { if (rtm != NULL) free(rtm, M_TEMP); m_freem(m); return (error); } /* There is another listener, so construct message */ rp = sotorawcb(so); } if (rtm != NULL) { #ifdef INET6 if (rti_need_deembed) { /* sin6_scope_id is recovered before sending rtm. */ sin6 = (struct sockaddr_in6 *)&ss; for (i = 0; i < RTAX_MAX; i++) { if (info.rti_info[i] == NULL) continue; if (info.rti_info[i]->sa_family != AF_INET6) continue; bcopy(info.rti_info[i], sin6, sizeof(*sin6)); if (sa6_recoverscope(sin6) == 0) bcopy(sin6, info.rti_info[i], sizeof(*sin6)); } } #endif if (error != 0) rtm->rtm_errno = error; else rtm->rtm_flags |= RTF_DONE; m_copyback(m, 0, rtm->rtm_msglen, (caddr_t)rtm); if (m->m_pkthdr.len < rtm->rtm_msglen) { m_freem(m); m = NULL; } else if (m->m_pkthdr.len > rtm->rtm_msglen) m_adj(m, rtm->rtm_msglen - m->m_pkthdr.len); free(rtm, M_TEMP); } if (m != NULL) { M_SETFIB(m, fibnum); m->m_flags |= RTS_FILTER_FIB; if (rp) { /* * XXX ensure we don't get a copy by * invalidating our protocol */ unsigned short family = rp->rcb_proto.sp_family; rp->rcb_proto.sp_family = 0; rt_dispatch(m, saf); rp->rcb_proto.sp_family = family; } else rt_dispatch(m, saf); } return (error); } static void rt_getmetrics(const struct rtentry *rt, struct rt_metrics *out) { bzero(out, sizeof(*out)); out->rmx_mtu = rt->rt_mtu; out->rmx_weight = rt->rt_weight; out->rmx_pksent = counter_u64_fetch(rt->rt_pksent); /* Kernel -> userland timebase conversion. */ out->rmx_expire = rt->rt_expire ? rt->rt_expire - time_uptime + time_second : 0; } /* * Extract the addresses of the passed sockaddrs. * Do a little sanity checking so as to avoid bad memory references. * This data is derived straight from userland. */ static int rt_xaddrs(caddr_t cp, caddr_t cplim, struct rt_addrinfo *rtinfo) { struct sockaddr *sa; int i; for (i = 0; i < RTAX_MAX && cp < cplim; i++) { if ((rtinfo->rti_addrs & (1 << i)) == 0) continue; sa = (struct sockaddr *)cp; /* * It won't fit. */ if (cp + sa->sa_len > cplim) return (EINVAL); /* * There are no more; quit now. * If there are more bits, they are in error. * I've seen this; route(1) can evidently generate these, * and it causes the kernel to core dump. * For compatibility, if we see this, point to a safe address. */ if (sa->sa_len == 0) { rtinfo->rti_info[i] = &sa_zero; return (0); /* should be EINVAL but for compat */ } /* accept it */ #ifdef INET6 if (sa->sa_family == AF_INET6) sa6_embedscope((struct sockaddr_in6 *)sa, V_ip6_use_defzone); #endif rtinfo->rti_info[i] = sa; cp += SA_SIZE(sa); } return (0); } /* * Fill in @dmask with valid netmask leaving original @smask * intact. Mostly used with radix netmasks. */ static struct sockaddr * rtsock_fix_netmask(struct sockaddr *dst, struct sockaddr *smask, struct sockaddr_storage *dmask) { if (dst == NULL || smask == NULL) return (NULL); memset(dmask, 0, dst->sa_len); memcpy(dmask, smask, smask->sa_len); dmask->ss_len = dst->sa_len; dmask->ss_family = dst->sa_family; return ((struct sockaddr *)dmask); } /* * Writes information related to @rtinfo object to newly-allocated mbuf. * Assumes MCLBYTES is enough to construct any message. * Used for OS notifications of various events (if/ifa announces, etc.) * * Returns allocated mbuf or NULL on failure.
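* The message header and all addresses are packed into a single mbuf, * cluster-backed when the total length exceeds MHLEN.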
*/ static struct mbuf * rtsock_msg_mbuf(int type, struct rt_addrinfo *rtinfo) { struct rt_msghdr *rtm; struct mbuf *m; int i; struct sockaddr *sa; #ifdef INET6 struct sockaddr_storage ss; struct sockaddr_in6 *sin6; #endif int len, dlen; switch (type) { case RTM_DELADDR: case RTM_NEWADDR: len = sizeof(struct ifa_msghdr); break; case RTM_DELMADDR: case RTM_NEWMADDR: len = sizeof(struct ifma_msghdr); break; case RTM_IFINFO: len = sizeof(struct if_msghdr); break; case RTM_IFANNOUNCE: case RTM_IEEE80211: len = sizeof(struct if_announcemsghdr); break; default: len = sizeof(struct rt_msghdr); } /* XXXGL: can we use MJUMPAGESIZE cluster here? */ KASSERT(len <= MCLBYTES, ("%s: message too big", __func__)); if (len > MHLEN) m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR); else m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) return (m); m->m_pkthdr.len = m->m_len = len; rtm = mtod(m, struct rt_msghdr *); bzero((caddr_t)rtm, len); for (i = 0; i < RTAX_MAX; i++) { if ((sa = rtinfo->rti_info[i]) == NULL) continue; rtinfo->rti_addrs |= (1 << i); dlen = SA_SIZE(sa); #ifdef INET6 if (V_deembed_scopeid && sa->sa_family == AF_INET6) { sin6 = (struct sockaddr_in6 *)&ss; bcopy(sa, sin6, sizeof(*sin6)); if (sa6_recoverscope(sin6) == 0) sa = (struct sockaddr *)sin6; } #endif m_copyback(m, len, dlen, (caddr_t)sa); len += dlen; } if (m->m_pkthdr.len != len) { m_freem(m); return (NULL); } rtm->rtm_msglen = len; rtm->rtm_version = RTM_VERSION; rtm->rtm_type = type; return (m); } /* * Writes information related to @rtinfo object to preallocated buffer. * Stores needed size in @plen. If @w is NULL, calculates size without * writing. * Used for sysctl dumps and rtsock answers (RTM_DEL/RTM_GET) generation. * * Returns 0 on success. * */ static int rtsock_msg_buffer(int type, struct rt_addrinfo *rtinfo, struct walkarg *w, int *plen) { int i; int len, buflen = 0, dlen; caddr_t cp = NULL; struct rt_msghdr *rtm = NULL; #ifdef INET6 struct sockaddr_storage ss; struct sockaddr_in6 *sin6; #endif switch (type) { case RTM_DELADDR: case RTM_NEWADDR: if (w != NULL && w->w_op == NET_RT_IFLISTL) { #ifdef COMPAT_FREEBSD32 if (w->w_req->flags & SCTL_MASK32) len = sizeof(struct ifa_msghdrl32); else #endif len = sizeof(struct ifa_msghdrl); } else len = sizeof(struct ifa_msghdr); break; case RTM_IFINFO: #ifdef COMPAT_FREEBSD32 if (w != NULL && w->w_req->flags & SCTL_MASK32) { if (w->w_op == NET_RT_IFLISTL) len = sizeof(struct if_msghdrl32); else len = sizeof(struct if_msghdr32); break; } #endif if (w != NULL && w->w_op == NET_RT_IFLISTL) len = sizeof(struct if_msghdrl); else len = sizeof(struct if_msghdr); break; case RTM_NEWMADDR: len = sizeof(struct ifma_msghdr); break; default: len = sizeof(struct rt_msghdr); } if (w != NULL) { rtm = (struct rt_msghdr *)w->w_tmem; buflen = w->w_tmemsize - len; cp = (caddr_t)w->w_tmem + len; } rtinfo->rti_addrs = 0; for (i = 0; i < RTAX_MAX; i++) { struct sockaddr *sa; if ((sa = rtinfo->rti_info[i]) == NULL) continue; rtinfo->rti_addrs |= (1 << i); dlen = SA_SIZE(sa); if (cp != NULL && buflen >= dlen) { #ifdef INET6 if (V_deembed_scopeid && sa->sa_family == AF_INET6) { sin6 = (struct sockaddr_in6 *)&ss; bcopy(sa, sin6, sizeof(*sin6)); if (sa6_recoverscope(sin6) == 0) sa = (struct sockaddr *)sin6; } #endif bcopy((caddr_t)sa, cp, (unsigned)dlen); cp += dlen; buflen -= dlen; } else if (cp != NULL) { /* * Buffer too small. Count needed size * and return with error. 
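* Setting cp to NULL below makes the remainder of the loop account * for the required size only.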
*/ cp = NULL; } len += dlen; } if (cp != NULL) { dlen = ALIGN(len) - len; if (buflen < dlen) cp = NULL; else buflen -= dlen; } len = ALIGN(len); if (cp != NULL) { /* fill header iff buffer is large enough */ rtm->rtm_version = RTM_VERSION; rtm->rtm_type = type; rtm->rtm_msglen = len; } *plen = len; if (w != NULL && cp == NULL) return (ENOBUFS); return (0); } /* * This routine is called to generate a message from the routing * socket indicating that a redirect has occurred, a routing lookup * has failed, or that a protocol has detected timeouts to a particular * destination. */ void rt_missmsg_fib(int type, struct rt_addrinfo *rtinfo, int flags, int error, int fibnum) { struct rt_msghdr *rtm; struct mbuf *m; struct sockaddr *sa = rtinfo->rti_info[RTAX_DST]; if (V_route_cb.any_count == 0) return; m = rtsock_msg_mbuf(type, rtinfo); if (m == NULL) return; if (fibnum != RT_ALL_FIBS) { KASSERT(fibnum >= 0 && fibnum < rt_numfibs, ("%s: fibnum out " "of range 0 <= %d < %d", __func__, fibnum, rt_numfibs)); M_SETFIB(m, fibnum); m->m_flags |= RTS_FILTER_FIB; } rtm = mtod(m, struct rt_msghdr *); rtm->rtm_flags = RTF_DONE | flags; rtm->rtm_errno = error; rtm->rtm_addrs = rtinfo->rti_addrs; rt_dispatch(m, sa ? sa->sa_family : AF_UNSPEC); } void rt_missmsg(int type, struct rt_addrinfo *rtinfo, int flags, int error) { rt_missmsg_fib(type, rtinfo, flags, error, RT_ALL_FIBS); } /* * This routine is called to generate a message from the routing * socket indicating that the status of a network interface has changed. */ void rt_ifmsg(struct ifnet *ifp) { struct if_msghdr *ifm; struct mbuf *m; struct rt_addrinfo info; if (V_route_cb.any_count == 0) return; bzero((caddr_t)&info, sizeof(info)); m = rtsock_msg_mbuf(RTM_IFINFO, &info); if (m == NULL) return; ifm = mtod(m, struct if_msghdr *); ifm->ifm_index = ifp->if_index; ifm->ifm_flags = ifp->if_flags | ifp->if_drv_flags; if_data_copy(ifp, &ifm->ifm_data); ifm->ifm_addrs = 0; rt_dispatch(m, AF_UNSPEC); } /* * Announce interface address arrival/withdrawal. * Please do not call directly, use rt_addrmsg(). * Assume input data to be valid. * Returns 0 on success. */ int rtsock_addrmsg(int cmd, struct ifaddr *ifa, int fibnum) { struct rt_addrinfo info; struct sockaddr *sa; int ncmd; struct mbuf *m; struct ifa_msghdr *ifam; struct ifnet *ifp = ifa->ifa_ifp; struct sockaddr_storage ss; if (V_route_cb.any_count == 0) return (0); ncmd = cmd == RTM_ADD ? RTM_NEWADDR : RTM_DELADDR; bzero((caddr_t)&info, sizeof(info)); info.rti_info[RTAX_IFA] = sa = ifa->ifa_addr; info.rti_info[RTAX_IFP] = ifp->if_addr->ifa_addr; info.rti_info[RTAX_NETMASK] = rtsock_fix_netmask( info.rti_info[RTAX_IFP], ifa->ifa_netmask, &ss); info.rti_info[RTAX_BRD] = ifa->ifa_dstaddr; if ((m = rtsock_msg_mbuf(ncmd, &info)) == NULL) return (ENOBUFS); ifam = mtod(m, struct ifa_msghdr *); ifam->ifam_index = ifp->if_index; ifam->ifam_metric = ifa->ifa_ifp->if_metric; ifam->ifam_flags = ifa->ifa_flags; ifam->ifam_addrs = info.rti_addrs; if (fibnum != RT_ALL_FIBS) { M_SETFIB(m, fibnum); m->m_flags |= RTS_FILTER_FIB; } rt_dispatch(m, sa ? sa->sa_family : AF_UNSPEC); return (0); } /* * Announce route addition/removal. * Please do not call directly, use rt_routemsg(). * Note that @rt data MAY be inconsistent/invalid: * if some userland app sends us an "invalid" route message (invalid mask, * no dst, wrong address families, etc...) we need to pass it back * to the app (and any other rtsock consumers) with the rtm_errno field * set to a non-zero value. * * Returns 0 on success.
*/ int rtsock_routemsg(int cmd, struct ifnet *ifp, int error, struct rtentry *rt, int fibnum) { struct rt_addrinfo info; struct sockaddr *sa; struct mbuf *m; struct rt_msghdr *rtm; struct sockaddr_storage ss; if (V_route_cb.any_count == 0) return (0); bzero((caddr_t)&info, sizeof(info)); info.rti_info[RTAX_DST] = sa = rt_key(rt); info.rti_info[RTAX_NETMASK] = rtsock_fix_netmask(sa, rt_mask(rt), &ss); info.rti_info[RTAX_GATEWAY] = rt->rt_gateway; if ((m = rtsock_msg_mbuf(cmd, &info)) == NULL) return (ENOBUFS); rtm = mtod(m, struct rt_msghdr *); rtm->rtm_index = ifp->if_index; rtm->rtm_flags |= rt->rt_flags; rtm->rtm_errno = error; rtm->rtm_addrs = info.rti_addrs; if (fibnum != RT_ALL_FIBS) { M_SETFIB(m, fibnum); m->m_flags |= RTS_FILTER_FIB; } rt_dispatch(m, sa ? sa->sa_family : AF_UNSPEC); return (0); } /* * This is the analogue of rt_newaddrmsg, which performs the same * function but for multicast group memberships. This is easier since * there is no route state to worry about. */ void rt_newmaddrmsg(int cmd, struct ifmultiaddr *ifma) { struct rt_addrinfo info; struct mbuf *m = NULL; struct ifnet *ifp = ifma->ifma_ifp; struct ifma_msghdr *ifmam; if (V_route_cb.any_count == 0) return; bzero((caddr_t)&info, sizeof(info)); info.rti_info[RTAX_IFA] = ifma->ifma_addr; info.rti_info[RTAX_IFP] = ifp ? ifp->if_addr->ifa_addr : NULL; /* * If a link-layer address is present, present it as a ``gateway'' * (similarly to how ARP entries, e.g., are presented). */ info.rti_info[RTAX_GATEWAY] = ifma->ifma_lladdr; m = rtsock_msg_mbuf(cmd, &info); if (m == NULL) return; ifmam = mtod(m, struct ifma_msghdr *); KASSERT(ifp != NULL, ("%s: link-layer multicast address w/o ifp\n", __func__)); ifmam->ifmam_index = ifp->if_index; ifmam->ifmam_addrs = info.rti_addrs; rt_dispatch(m, ifma->ifma_addr ? ifma->ifma_addr->sa_family : AF_UNSPEC); } static struct mbuf * rt_makeifannouncemsg(struct ifnet *ifp, int type, int what, struct rt_addrinfo *info) { struct if_announcemsghdr *ifan; struct mbuf *m; if (V_route_cb.any_count == 0) return NULL; bzero((caddr_t)info, sizeof(*info)); m = rtsock_msg_mbuf(type, info); if (m != NULL) { ifan = mtod(m, struct if_announcemsghdr *); ifan->ifan_index = ifp->if_index; strlcpy(ifan->ifan_name, ifp->if_xname, sizeof(ifan->ifan_name)); ifan->ifan_what = what; } return m; } /* * This is called to generate routing socket messages indicating * IEEE80211 wireless events. * XXX we piggyback on the RTM_IFANNOUNCE msg format in a clumsy way. */ void rt_ieee80211msg(struct ifnet *ifp, int what, void *data, size_t data_len) { struct mbuf *m; struct rt_addrinfo info; m = rt_makeifannouncemsg(ifp, RTM_IEEE80211, what, &info); if (m != NULL) { /* * Append the ieee80211 data. Try to stick it in the * mbuf containing the ifannounce msg; otherwise allocate * a new mbuf and append. * * NB: we assume m is a single mbuf. */ if (data_len > M_TRAILINGSPACE(m)) { struct mbuf *n = m_get(M_NOWAIT, MT_DATA); if (n == NULL) { m_freem(m); return; } bcopy(data, mtod(n, void *), data_len); n->m_len = data_len; m->m_next = n; } else if (data_len > 0) { bcopy(data, mtod(m, u_int8_t *) + m->m_len, data_len); m->m_len += data_len; } if (m->m_flags & M_PKTHDR) m->m_pkthdr.len += data_len; mtod(m, struct if_announcemsghdr *)->ifan_msglen += data_len; rt_dispatch(m, AF_UNSPEC); } } /* * This is called to generate routing socket messages indicating * network interface arrival and departure.
*/ void rt_ifannouncemsg(struct ifnet *ifp, int what) { struct mbuf *m; struct rt_addrinfo info; m = rt_makeifannouncemsg(ifp, RTM_IFANNOUNCE, what, &info); if (m != NULL) rt_dispatch(m, AF_UNSPEC); } static void rt_dispatch(struct mbuf *m, sa_family_t saf) { struct m_tag *tag; /* * Preserve the family from the sockaddr, if any, in an m_tag for * use when injecting the mbuf into the routing socket buffer from * the netisr. */ if (saf != AF_UNSPEC) { tag = m_tag_get(PACKET_TAG_RTSOCKFAM, sizeof(unsigned short), M_NOWAIT); if (tag == NULL) { m_freem(m); return; } *(unsigned short *)(tag + 1) = saf; m_tag_prepend(m, tag); } #ifdef VIMAGE if (V_loif) m->m_pkthdr.rcvif = V_loif; else { m_freem(m); return; } #endif netisr_queue(NETISR_ROUTE, m); /* mbuf is free'd on failure. */ } /* * This is used in dumping the kernel table via sysctl(). */ static int sysctl_dumpentry(struct radix_node *rn, void *vw) { struct walkarg *w = vw; struct rtentry *rt = (struct rtentry *)rn; int error = 0, size; struct rt_addrinfo info; struct sockaddr_storage ss; if (w->w_op == NET_RT_FLAGS && !(rt->rt_flags & w->w_arg)) return 0; if ((rt->rt_flags & RTF_HOST) == 0 ? jailed_without_vnet(w->w_req->td->td_ucred) : prison_if(w->w_req->td->td_ucred, rt_key(rt)) != 0) return (0); bzero((caddr_t)&info, sizeof(info)); info.rti_info[RTAX_DST] = rt_key(rt); info.rti_info[RTAX_GATEWAY] = rt->rt_gateway; info.rti_info[RTAX_NETMASK] = rtsock_fix_netmask(rt_key(rt), rt_mask(rt), &ss); info.rti_info[RTAX_GENMASK] = 0; if (rt->rt_ifp) { info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr; info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr; if (rt->rt_ifp->if_flags & IFF_POINTOPOINT) info.rti_info[RTAX_BRD] = rt->rt_ifa->ifa_dstaddr; } if ((error = rtsock_msg_buffer(RTM_GET, &info, w, &size)) != 0) return (error); if (w->w_req && w->w_tmem) { struct rt_msghdr *rtm = (struct rt_msghdr *)w->w_tmem; if (rt->rt_flags & RTF_GWFLAG_COMPAT) rtm->rtm_flags = RTF_GATEWAY | (rt->rt_flags & ~RTF_GWFLAG_COMPAT); else rtm->rtm_flags = rt->rt_flags; rt_getmetrics(rt, &rtm->rtm_rmx); rtm->rtm_index = rt->rt_ifp->if_index; rtm->rtm_errno = rtm->rtm_pid = rtm->rtm_seq = 0; rtm->rtm_addrs = info.rti_addrs; error = SYSCTL_OUT(w->w_req, (caddr_t)rtm, size); return (error); } return (error); } static int sysctl_iflist_ifml(struct ifnet *ifp, struct rt_addrinfo *info, struct walkarg *w, int len) { struct if_msghdrl *ifm; struct if_data *ifd; ifm = (struct if_msghdrl *)w->w_tmem; #ifdef COMPAT_FREEBSD32 if (w->w_req->flags & SCTL_MASK32) { struct if_msghdrl32 *ifm32; ifm32 = (struct if_msghdrl32 *)ifm; ifm32->ifm_addrs = info->rti_addrs; ifm32->ifm_flags = ifp->if_flags | ifp->if_drv_flags; ifm32->ifm_index = ifp->if_index; ifm32->_ifm_spare1 = 0; ifm32->ifm_len = sizeof(*ifm32); ifm32->ifm_data_off = offsetof(struct if_msghdrl32, ifm_data); ifd = &ifm32->ifm_data; } else #endif { ifm->ifm_addrs = info->rti_addrs; ifm->ifm_flags = ifp->if_flags | ifp->if_drv_flags; ifm->ifm_index = ifp->if_index; ifm->_ifm_spare1 = 0; ifm->ifm_len = sizeof(*ifm); ifm->ifm_data_off = offsetof(struct if_msghdrl, ifm_data); ifd = &ifm->ifm_data; } if_data_copy(ifp, ifd); return (SYSCTL_OUT(w->w_req, (caddr_t)ifm, len)); } static int sysctl_iflist_ifm(struct ifnet *ifp, struct rt_addrinfo *info, struct walkarg *w, int len) { struct if_msghdr *ifm; struct if_data *ifd; ifm = (struct if_msghdr *)w->w_tmem; #ifdef COMPAT_FREEBSD32 if (w->w_req->flags & SCTL_MASK32) { struct if_msghdr32 *ifm32; ifm32 = (struct if_msghdr32 *)ifm; ifm32->ifm_addrs = info->rti_addrs; 
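/* Remaining fields of the 32-bit compat header; struct if_msghdr32 is laid out as 32-bit binaries expect, rather than as the native struct if_msghdr. */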
ifm32->ifm_flags = ifp->if_flags | ifp->if_drv_flags; ifm32->ifm_index = ifp->if_index; ifd = &ifm32->ifm_data; } else #endif { ifm->ifm_addrs = info->rti_addrs; ifm->ifm_flags = ifp->if_flags | ifp->if_drv_flags; ifm->ifm_index = ifp->if_index; ifd = &ifm->ifm_data; } if_data_copy(ifp, ifd); return (SYSCTL_OUT(w->w_req, (caddr_t)ifm, len)); } static int sysctl_iflist_ifaml(struct ifaddr *ifa, struct rt_addrinfo *info, struct walkarg *w, int len) { struct ifa_msghdrl *ifam; struct if_data *ifd; ifam = (struct ifa_msghdrl *)w->w_tmem; #ifdef COMPAT_FREEBSD32 if (w->w_req->flags & SCTL_MASK32) { struct ifa_msghdrl32 *ifam32; ifam32 = (struct ifa_msghdrl32 *)ifam; ifam32->ifam_addrs = info->rti_addrs; ifam32->ifam_flags = ifa->ifa_flags; ifam32->ifam_index = ifa->ifa_ifp->if_index; ifam32->_ifam_spare1 = 0; ifam32->ifam_len = sizeof(*ifam32); ifam32->ifam_data_off = offsetof(struct ifa_msghdrl32, ifam_data); ifam32->ifam_metric = ifa->ifa_ifp->if_metric; ifd = &ifam32->ifam_data; } else #endif { ifam->ifam_addrs = info->rti_addrs; ifam->ifam_flags = ifa->ifa_flags; ifam->ifam_index = ifa->ifa_ifp->if_index; ifam->_ifam_spare1 = 0; ifam->ifam_len = sizeof(*ifam); ifam->ifam_data_off = offsetof(struct ifa_msghdrl, ifam_data); ifam->ifam_metric = ifa->ifa_ifp->if_metric; ifd = &ifam->ifam_data; } bzero(ifd, sizeof(*ifd)); ifd->ifi_datalen = sizeof(struct if_data); ifd->ifi_ipackets = counter_u64_fetch(ifa->ifa_ipackets); ifd->ifi_opackets = counter_u64_fetch(ifa->ifa_opackets); ifd->ifi_ibytes = counter_u64_fetch(ifa->ifa_ibytes); ifd->ifi_obytes = counter_u64_fetch(ifa->ifa_obytes); /* Fixup if_data carp(4) vhid. */ if (carp_get_vhid_p != NULL) ifd->ifi_vhid = (*carp_get_vhid_p)(ifa); return (SYSCTL_OUT(w->w_req, w->w_tmem, len)); } static int sysctl_iflist_ifam(struct ifaddr *ifa, struct rt_addrinfo *info, struct walkarg *w, int len) { struct ifa_msghdr *ifam; ifam = (struct ifa_msghdr *)w->w_tmem; ifam->ifam_addrs = info->rti_addrs; ifam->ifam_flags = ifa->ifa_flags; ifam->ifam_index = ifa->ifa_ifp->if_index; ifam->ifam_metric = ifa->ifa_ifp->if_metric; return (SYSCTL_OUT(w->w_req, w->w_tmem, len)); } static int sysctl_iflist(int af, struct walkarg *w) { struct ifnet *ifp; struct ifaddr *ifa; struct rt_addrinfo info; int len, error = 0; struct sockaddr_storage ss; bzero((caddr_t)&info, sizeof(info)); IFNET_RLOCK_NOSLEEP(); TAILQ_FOREACH(ifp, &V_ifnet, if_link) { if (w->w_arg && w->w_arg != ifp->if_index) continue; IF_ADDR_RLOCK(ifp); ifa = ifp->if_addr; info.rti_info[RTAX_IFP] = ifa->ifa_addr; error = rtsock_msg_buffer(RTM_IFINFO, &info, w, &len); if (error != 0) goto done; info.rti_info[RTAX_IFP] = NULL; if (w->w_req && w->w_tmem) { if (w->w_op == NET_RT_IFLISTL) error = sysctl_iflist_ifml(ifp, &info, w, len); else error = sysctl_iflist_ifm(ifp, &info, w, len); if (error) goto done; } while ((ifa = TAILQ_NEXT(ifa, ifa_link)) != NULL) { if (af && af != ifa->ifa_addr->sa_family) continue; if (prison_if(w->w_req->td->td_ucred, ifa->ifa_addr) != 0) continue; info.rti_info[RTAX_IFA] = ifa->ifa_addr; info.rti_info[RTAX_NETMASK] = rtsock_fix_netmask( ifa->ifa_addr, ifa->ifa_netmask, &ss); info.rti_info[RTAX_BRD] = ifa->ifa_dstaddr; error = rtsock_msg_buffer(RTM_NEWADDR, &info, w, &len); if (error != 0) goto done; if (w->w_req && w->w_tmem) { if (w->w_op == NET_RT_IFLISTL) error = sysctl_iflist_ifaml(ifa, &info, w, len); else error = sysctl_iflist_ifam(ifa, &info, w, len); if (error) goto done; } } IF_ADDR_RUNLOCK(ifp); info.rti_info[RTAX_IFA] = NULL; info.rti_info[RTAX_NETMASK] = NULL; 
info.rti_info[RTAX_BRD] = NULL; } done: if (ifp != NULL) IF_ADDR_RUNLOCK(ifp); IFNET_RUNLOCK_NOSLEEP(); return (error); } static int sysctl_ifmalist(int af, struct walkarg *w) { struct ifnet *ifp; struct ifmultiaddr *ifma; struct rt_addrinfo info; int len, error = 0; struct ifaddr *ifa; bzero((caddr_t)&info, sizeof(info)); IFNET_RLOCK_NOSLEEP(); TAILQ_FOREACH(ifp, &V_ifnet, if_link) { if (w->w_arg && w->w_arg != ifp->if_index) continue; ifa = ifp->if_addr; info.rti_info[RTAX_IFP] = ifa ? ifa->ifa_addr : NULL; IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) { if (af && af != ifma->ifma_addr->sa_family) continue; if (prison_if(w->w_req->td->td_ucred, ifma->ifma_addr) != 0) continue; info.rti_info[RTAX_IFA] = ifma->ifma_addr; info.rti_info[RTAX_GATEWAY] = (ifma->ifma_addr->sa_family != AF_LINK) ? ifma->ifma_lladdr : NULL; error = rtsock_msg_buffer(RTM_NEWMADDR, &info, w, &len); if (error != 0) goto done; if (w->w_req && w->w_tmem) { struct ifma_msghdr *ifmam; ifmam = (struct ifma_msghdr *)w->w_tmem; ifmam->ifmam_index = ifma->ifma_ifp->if_index; ifmam->ifmam_flags = 0; ifmam->ifmam_addrs = info.rti_addrs; error = SYSCTL_OUT(w->w_req, w->w_tmem, len); if (error) { IF_ADDR_RUNLOCK(ifp); goto done; } } } IF_ADDR_RUNLOCK(ifp); } done: IFNET_RUNLOCK_NOSLEEP(); return (error); } static int sysctl_rtsock(SYSCTL_HANDLER_ARGS) { int *name = (int *)arg1; u_int namelen = arg2; - struct radix_node_head *rnh = NULL; /* silence compiler. */ + struct rib_head *rnh = NULL; /* silence compiler. */ int i, lim, error = EINVAL; int fib = 0; u_char af; struct walkarg w; name++; namelen--; if (req->newptr) return (EPERM); if (name[1] == NET_RT_DUMP) { if (namelen == 3) fib = req->td->td_proc->p_fibnum; else if (namelen == 4) fib = (name[3] == RT_ALL_FIBS) ? req->td->td_proc->p_fibnum : name[3]; else return ((namelen < 3) ? EISDIR : ENOTDIR); if (fib < 0 || fib >= rt_numfibs) return (EINVAL); } else if (namelen != 3) return ((namelen < 3) ? EISDIR : ENOTDIR); af = name[0]; if (af > AF_MAX) return (EINVAL); bzero(&w, sizeof(w)); w.w_op = name[1]; w.w_arg = name[2]; w.w_req = req; error = sysctl_wire_old_buffer(req, 0); if (error) return (error); /* * Allocate reply buffer in advance. * All rtsock messages have a maximum length of u_short. */ w.w_tmemsize = 65536; w.w_tmem = malloc(w.w_tmemsize, M_TEMP, M_WAITOK); switch (w.w_op) { case NET_RT_DUMP: case NET_RT_FLAGS: if (af == 0) { /* dump all tables */ i = 1; lim = AF_MAX; } else /* dump only one table */ i = lim = af; /* * take care of llinfo entries; the caller must * specify an AF */ if (w.w_op == NET_RT_FLAGS && (w.w_arg == 0 || w.w_arg & RTF_LLINFO)) { if (af != 0) error = lltable_sysctl_dumparp(af, w.w_req); else error = EINVAL; break; } /* * take care of routing entries */ for (error = 0; error == 0 && i <= lim; i++) { rnh = rt_tables_get_rnh(fib, i); if (rnh != NULL) { - RADIX_NODE_HEAD_RLOCK(rnh); - error = rnh->rnh_walktree(rnh, + RIB_RLOCK(rnh); + error = rnh->rnh_walktree(&rnh->head, sysctl_dumpentry, &w); - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rnh); } else if (af != 0) error = EAFNOSUPPORT; } break; case NET_RT_IFLIST: case NET_RT_IFLISTL: error = sysctl_iflist(af, &w); break; case NET_RT_IFMALIST: error = sysctl_ifmalist(af, &w); break; } free(w.w_tmem, M_TEMP); return (error); } static SYSCTL_NODE(_net, PF_ROUTE, routetable, CTLFLAG_RD, sysctl_rtsock, ""); /* * Definitions of protocols supported in the ROUTE domain.
*/ static struct domain routedomain; /* or at least forward */ static struct protosw routesw[] = { { .pr_type = SOCK_RAW, .pr_domain = &routedomain, .pr_flags = PR_ATOMIC|PR_ADDR, .pr_output = route_output, .pr_ctlinput = raw_ctlinput, .pr_init = raw_init, .pr_usrreqs = &route_usrreqs } }; static struct domain routedomain = { .dom_family = PF_ROUTE, .dom_name = "route", .dom_protosw = routesw, .dom_protoswNPROTOSW = &routesw[sizeof(routesw)/sizeof(routesw[0])] }; VNET_DOMAIN_SET(route); Index: projects/clang380-import/sys/net/vnet.c =================================================================== --- projects/clang380-import/sys/net/vnet.c (revision 294776) +++ projects/clang380-import/sys/net/vnet.c (revision 294777) @@ -1,782 +1,781 @@ /*- * Copyright (c) 2004-2009 University of Zagreb * Copyright (c) 2006-2009 FreeBSD Foundation * All rights reserved. * * This software was developed by the University of Zagreb and the * FreeBSD Foundation under sponsorship by the Stichting NLnet and the * FreeBSD Foundation. * * Copyright (c) 2009 Jeffrey Roberson * Copyright (c) 2009 Robert N. M. Watson * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_kdb.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef DDB #include #include #endif #include #include #include /*- * This file implements core functions for virtual network stacks: * * - Virtual network stack management functions. * * - Virtual network stack memory allocator, which virtualizes global * variables in the network stack * * - Virtualized SYSINIT's/SYSUNINIT's, which allow network stack subsystems * to register startup/shutdown events to be run for each virtual network * stack instance. */ FEATURE(vimage, "VIMAGE kernel virtualization"); static MALLOC_DEFINE(M_VNET, "vnet", "network stack control block"); /* * The virtual network stack list has two read-write locks, one sleepable and * the other not, so that the list can be stabilized and walked in a variety * of network stack contexts. Both must be acquired exclusively to modify * the list, but a read lock of either lock is sufficient to walk the list.
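* Walkers that may sleep take the sx lock; those that may not take the * rwlock. A typical read-side walk, as in vnet_data_copy() below, is: * * VNET_LIST_RLOCK(); * LIST_FOREACH(vnet, &vnet_head, vnet_le) * ...; * VNET_LIST_RUNLOCK();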
*/ struct rwlock vnet_rwlock; struct sx vnet_sxlock; #define VNET_LIST_WLOCK() do { \ sx_xlock(&vnet_sxlock); \ rw_wlock(&vnet_rwlock); \ } while (0) #define VNET_LIST_WUNLOCK() do { \ rw_wunlock(&vnet_rwlock); \ sx_xunlock(&vnet_sxlock); \ } while (0) struct vnet_list_head vnet_head; struct vnet *vnet0; /* * The virtual network stack allocator provides storage for virtualized * global variables. These variables are defined/declared using the * VNET_DEFINE()/VNET_DECLARE() macros, which place them in the 'set_vnet' * linker set. The details of the implementation are somewhat subtle, but * allow the majority of network subsystems to remain virtualization-agnostic. * * The virtual network stack allocator handles variables in the base kernel * vs. modules in similar but different ways. In both cases, virtualized * global variables are marked as such by being declared to be part of the * vnet linker set. These "master" copies of global variables serve two * functions: * * (1) They contain static initialization or "default" values for global * variables which will be propagated to each virtual network stack * instance when created. As with normal global variables, they default * to zero-filled. * * (2) They act as unique global names by which the variable can be referred * to, regardless of network stack instance. The single global symbol * will be used to calculate the location of a per-virtual instance * variable at run-time. * * Each virtual network stack instance has a complete copy of each * virtualized global variable, stored in a malloc'd block of memory * referred to by vnet->vnet_data_mem. Critical to the design is that each * per-instance memory block is laid out identically to the master block so * that the offset of each global variable is the same across all blocks. To * optimize run-time access, a precalculated 'base' address, * vnet->vnet_data_base, is stored in each vnet, and is the amount that can * be added to the address of a 'master' instance of a variable to get to the * per-vnet instance. * * Virtualized global variables are handled in a similar manner, but as each * module has its own 'set_vnet' linker set, and we want to keep all * virtualized globals together, we reserve space in the kernel's linker set * for potential module variables using a per-vnet character array, * 'modspace'. The virtual network stack allocator maintains a free list to * track what space in the array is free (all, initially) and as modules are * linked, allocates portions of the space to specific globals. The kernel * module linker queries the virtual network stack allocator and will * bind references of the global to the location during linking. It also * calls into the virtual network stack allocator, once the memory is * initialized, in order to propagate the new static initializations to all * existing virtual network stack instances so that the soon-to-be executing * module will find every network stack instance with proper default values. */ /* * Number of bytes of data in the 'set_vnet' linker set, and hence the total * size of all kernel virtualized global variables, and the malloc(9) type * that will be used to allocate it. */ #define VNET_BYTES (VNET_STOP - VNET_START) static MALLOC_DEFINE(M_VNET_DATA, "vnet_data", "VNET data"); /* * VNET_MODMIN is the minimum number of bytes we will reserve for the sum of * global variables across all loaded modules.
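* (Loadable modules carve their virtualized globals out of this arena via * vnet_data_alloc(), called by the kernel linker; see below.)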
As this actually sizes an * array declared as a virtualized global variable in the kernel itself, and * we want the virtualized global variable space to be page-sized, we may * have more space than that in practice. */ #define VNET_MODMIN 8192 #define VNET_SIZE roundup2(VNET_BYTES, PAGE_SIZE) -#define VNET_MODSIZE (VNET_SIZE - (VNET_BYTES - VNET_MODMIN)) /* * Space to store virtualized global variables from loadable kernel modules, * and the free list to manage it. */ static VNET_DEFINE(char, modspace[VNET_MODMIN]); /* * Global lists of subsystem constructors and destructors for vnets. They are * registered via VNET_SYSINIT() and VNET_SYSUNINIT(). Both lists are * protected by the vnet_sysinit_sxlock global lock. */ static TAILQ_HEAD(vnet_sysinit_head, vnet_sysinit) vnet_constructors = TAILQ_HEAD_INITIALIZER(vnet_constructors); static TAILQ_HEAD(vnet_sysuninit_head, vnet_sysinit) vnet_destructors = TAILQ_HEAD_INITIALIZER(vnet_destructors); struct sx vnet_sysinit_sxlock; #define VNET_SYSINIT_WLOCK() sx_xlock(&vnet_sysinit_sxlock); #define VNET_SYSINIT_WUNLOCK() sx_xunlock(&vnet_sysinit_sxlock); #define VNET_SYSINIT_RLOCK() sx_slock(&vnet_sysinit_sxlock); #define VNET_SYSINIT_RUNLOCK() sx_sunlock(&vnet_sysinit_sxlock); struct vnet_data_free { uintptr_t vnd_start; int vnd_len; TAILQ_ENTRY(vnet_data_free) vnd_link; }; static MALLOC_DEFINE(M_VNET_DATA_FREE, "vnet_data_free", "VNET resource accounting"); static TAILQ_HEAD(, vnet_data_free) vnet_data_free_head = TAILQ_HEAD_INITIALIZER(vnet_data_free_head); static struct sx vnet_data_free_lock; SDT_PROVIDER_DEFINE(vnet); SDT_PROBE_DEFINE1(vnet, functions, vnet_alloc, entry, "int"); SDT_PROBE_DEFINE2(vnet, functions, vnet_alloc, alloc, "int", "struct vnet *"); SDT_PROBE_DEFINE2(vnet, functions, vnet_alloc, return, "int", "struct vnet *"); SDT_PROBE_DEFINE2(vnet, functions, vnet_destroy, entry, "int", "struct vnet *"); SDT_PROBE_DEFINE1(vnet, functions, vnet_destroy, return, "int"); #ifdef DDB static void db_show_vnet_print_vs(struct vnet_sysinit *, int); #endif /* * Allocate a virtual network stack. */ struct vnet * vnet_alloc(void) { struct vnet *vnet; SDT_PROBE1(vnet, functions, vnet_alloc, entry, __LINE__); vnet = malloc(sizeof(struct vnet), M_VNET, M_WAITOK | M_ZERO); vnet->vnet_magic_n = VNET_MAGIC_N; SDT_PROBE2(vnet, functions, vnet_alloc, alloc, __LINE__, vnet); /* * Allocate storage for virtualized global variables and copy in * initial values from our 'master' copy. */ vnet->vnet_data_mem = malloc(VNET_SIZE, M_VNET_DATA, M_WAITOK); memcpy(vnet->vnet_data_mem, (void *)VNET_START, VNET_BYTES); /* * All use of vnet-specific data will immediately subtract VNET_START * from the base memory pointer, so pre-calculate that now to avoid * it on each use. */ vnet->vnet_data_base = (uintptr_t)vnet->vnet_data_mem - VNET_START; /* Initialize / attach vnet module instances. */ CURVNET_SET_QUIET(vnet); vnet_sysinit(); CURVNET_RESTORE(); VNET_LIST_WLOCK(); LIST_INSERT_HEAD(&vnet_head, vnet, vnet_le); VNET_LIST_WUNLOCK(); SDT_PROBE2(vnet, functions, vnet_alloc, return, __LINE__, vnet); return (vnet); } /* * Destroy a virtual network stack. */ void vnet_destroy(struct vnet *vnet) { struct ifnet *ifp, *nifp; SDT_PROBE2(vnet, functions, vnet_destroy, entry, __LINE__, vnet); KASSERT(vnet->vnet_sockcnt == 0, ("%s: vnet still has sockets", __func__)); VNET_LIST_WLOCK(); LIST_REMOVE(vnet, vnet_le); VNET_LIST_WUNLOCK(); CURVNET_SET_QUIET(vnet); /* Return all inherited interfaces to their parent vnets.
*/ TAILQ_FOREACH_SAFE(ifp, &V_ifnet, if_link, nifp) { if (ifp->if_home_vnet != ifp->if_vnet) if_vmove(ifp, ifp->if_home_vnet); } vnet_sysuninit(); CURVNET_RESTORE(); /* * Release storage for the virtual network stack instance. */ free(vnet->vnet_data_mem, M_VNET_DATA); vnet->vnet_data_mem = NULL; vnet->vnet_data_base = 0; vnet->vnet_magic_n = 0xdeadbeef; free(vnet, M_VNET); SDT_PROBE1(vnet, functions, vnet_destroy, return, __LINE__); } /* * Boot time initialization and allocation of virtual network stacks. */ static void vnet_init_prelink(void *arg) { rw_init(&vnet_rwlock, "vnet_rwlock"); sx_init(&vnet_sxlock, "vnet_sxlock"); sx_init(&vnet_sysinit_sxlock, "vnet_sysinit_sxlock"); LIST_INIT(&vnet_head); } SYSINIT(vnet_init_prelink, SI_SUB_VNET_PRELINK, SI_ORDER_FIRST, vnet_init_prelink, NULL); static void vnet0_init(void *arg) { /* Warn people before takeoff - in case we crash early. */ printf("WARNING: VIMAGE (virtualized network stack) is a highly " "experimental feature.\n"); /* * We MUST clear curvnet in vnet_init_done() before going SMP, * otherwise CURVNET_SET() macros would scream about unnecessary * curvnet recursions. */ curvnet = prison0.pr_vnet = vnet0 = vnet_alloc(); } SYSINIT(vnet0_init, SI_SUB_VNET, SI_ORDER_FIRST, vnet0_init, NULL); static void vnet_init_done(void *unused) { curvnet = NULL; } SYSINIT(vnet_init_done, SI_SUB_VNET_DONE, SI_ORDER_FIRST, vnet_init_done, NULL); /* * Once on boot, initialize the modspace freelist to entirely cover modspace. */ static void vnet_data_startup(void *dummy __unused) { struct vnet_data_free *df; df = malloc(sizeof(*df), M_VNET_DATA_FREE, M_WAITOK | M_ZERO); df->vnd_start = (uintptr_t)&VNET_NAME(modspace); df->vnd_len = VNET_MODMIN; TAILQ_INSERT_HEAD(&vnet_data_free_head, df, vnd_link); sx_init(&vnet_data_free_lock, "vnet_data alloc lock"); } SYSINIT(vnet_data, SI_SUB_KLD, SI_ORDER_FIRST, vnet_data_startup, 0); /* * When a module is loaded and requires storage for a virtualized global * variable, allocate space from the modspace free list. This interface * should be used only by the kernel linker. */ void * vnet_data_alloc(int size) { struct vnet_data_free *df; void *s; s = NULL; size = roundup2(size, sizeof(void *)); sx_xlock(&vnet_data_free_lock); TAILQ_FOREACH(df, &vnet_data_free_head, vnd_link) { if (df->vnd_len < size) continue; if (df->vnd_len == size) { s = (void *)df->vnd_start; TAILQ_REMOVE(&vnet_data_free_head, df, vnd_link); free(df, M_VNET_DATA_FREE); break; } s = (void *)df->vnd_start; df->vnd_len -= size; df->vnd_start = df->vnd_start + size; break; } sx_xunlock(&vnet_data_free_lock); return (s); } /* * Free space for a virtualized global variable on module unload. */ void vnet_data_free(void *start_arg, int size) { struct vnet_data_free *df; struct vnet_data_free *dn; uintptr_t start; uintptr_t end; size = roundup2(size, sizeof(void *)); start = (uintptr_t)start_arg; end = start + size; /* * Free a region of space and merge it with as many neighbors as * possible. Keeping the list sorted simplifies this operation. */ sx_xlock(&vnet_data_free_lock); TAILQ_FOREACH(df, &vnet_data_free_head, vnd_link) { if (df->vnd_start > end) break; /* * If we expand at the end of an entry we may have to merge * it with the one following it as well.
*/ if (df->vnd_start + df->vnd_len == start) { df->vnd_len += size; dn = TAILQ_NEXT(df, vnd_link); if (df->vnd_start + df->vnd_len == dn->vnd_start) { df->vnd_len += dn->vnd_len; TAILQ_REMOVE(&vnet_data_free_head, dn, vnd_link); free(dn, M_VNET_DATA_FREE); } sx_xunlock(&vnet_data_free_lock); return; } if (df->vnd_start == end) { df->vnd_start = start; df->vnd_len += size; sx_xunlock(&vnet_data_free_lock); return; } } dn = malloc(sizeof(*df), M_VNET_DATA_FREE, M_WAITOK | M_ZERO); dn->vnd_start = start; dn->vnd_len = size; if (df) TAILQ_INSERT_BEFORE(df, dn, vnd_link); else TAILQ_INSERT_TAIL(&vnet_data_free_head, dn, vnd_link); sx_xunlock(&vnet_data_free_lock); } /* * When a new virtualized global variable has been allocated, propagate its * initial value to each already-allocated virtual network stack instance. */ void vnet_data_copy(void *start, int size) { struct vnet *vnet; VNET_LIST_RLOCK(); LIST_FOREACH(vnet, &vnet_head, vnet_le) memcpy((void *)((uintptr_t)vnet->vnet_data_base + (uintptr_t)start), start, size); VNET_LIST_RUNLOCK(); } /* * Support for special SYSINIT handlers registered via VNET_SYSINIT() * and VNET_SYSUNINIT(). */ void vnet_register_sysinit(void *arg) { struct vnet_sysinit *vs, *vs2; struct vnet *vnet; vs = arg; KASSERT(vs->subsystem > SI_SUB_VNET, ("vnet sysinit too early")); /* Add the constructor to the global list of vnet constructors. */ VNET_SYSINIT_WLOCK(); TAILQ_FOREACH(vs2, &vnet_constructors, link) { if (vs2->subsystem > vs->subsystem) break; if (vs2->subsystem == vs->subsystem && vs2->order > vs->order) break; } if (vs2 != NULL) TAILQ_INSERT_BEFORE(vs2, vs, link); else TAILQ_INSERT_TAIL(&vnet_constructors, vs, link); /* * Invoke the constructor on all the existing vnets when it is * registered. */ VNET_FOREACH(vnet) { CURVNET_SET_QUIET(vnet); vs->func(vs->arg); CURVNET_RESTORE(); } VNET_SYSINIT_WUNLOCK(); } void vnet_deregister_sysinit(void *arg) { struct vnet_sysinit *vs; vs = arg; /* Remove the constructor from the global list of vnet constructors. */ VNET_SYSINIT_WLOCK(); TAILQ_REMOVE(&vnet_constructors, vs, link); VNET_SYSINIT_WUNLOCK(); } void vnet_register_sysuninit(void *arg) { struct vnet_sysinit *vs, *vs2; vs = arg; /* Add the destructor to the global list of vnet destructors. */ VNET_SYSINIT_WLOCK(); TAILQ_FOREACH(vs2, &vnet_destructors, link) { if (vs2->subsystem > vs->subsystem) break; if (vs2->subsystem == vs->subsystem && vs2->order > vs->order) break; } if (vs2 != NULL) TAILQ_INSERT_BEFORE(vs2, vs, link); else TAILQ_INSERT_TAIL(&vnet_destructors, vs, link); VNET_SYSINIT_WUNLOCK(); } void vnet_deregister_sysuninit(void *arg) { struct vnet_sysinit *vs; struct vnet *vnet; vs = arg; /* * Invoke the destructor on all the existing vnets when it is * deregistered. */ VNET_SYSINIT_WLOCK(); VNET_FOREACH(vnet) { CURVNET_SET_QUIET(vnet); vs->func(vs->arg); CURVNET_RESTORE(); } /* Remove the destructor from the global list of vnet destructors. */ TAILQ_REMOVE(&vnet_destructors, vs, link); VNET_SYSINIT_WUNLOCK(); } /* * Invoke all registered vnet constructors on the current vnet. Used during * vnet construction. The caller is responsible for ensuring the new vnet is * the current vnet and that the vnet_sysinit_sxlock lock is locked. */ void vnet_sysinit(void) { struct vnet_sysinit *vs; VNET_SYSINIT_RLOCK(); TAILQ_FOREACH(vs, &vnet_constructors, link) { vs->func(vs->arg); } VNET_SYSINIT_RUNLOCK(); } /* * Invoke all registered vnet destructors on the current vnet. Used during * vnet destruction. 
The caller is responsible for ensuring the dying vnet is * the current vnet and that the vnet_sysinit_sxlock lock is locked. */ void vnet_sysuninit(void) { struct vnet_sysinit *vs; VNET_SYSINIT_RLOCK(); TAILQ_FOREACH_REVERSE(vs, &vnet_destructors, vnet_sysuninit_head, link) { vs->func(vs->arg); } VNET_SYSINIT_RUNLOCK(); } /* * EVENTHANDLER(9) extensions. */ /* * Invoke the eventhandler function originally registered with the possibly * registered argument for all virtual network stack instances. * * This iterator can only be used for eventhandlers that do not take any * additional arguments, as we do ignore the variadic arguments from the * EVENTHANDLER_INVOKE() call. */ void vnet_global_eventhandler_iterator_func(void *arg, ...) { VNET_ITERATOR_DECL(vnet_iter); struct eventhandler_entry_vimage *v_ee; /* * There is a bug here in that we should actually cast things to * (struct eventhandler_entry_ ## name *) but that's not easily * possible in here so just re-using the variadic version we * defined for the generic vimage case. */ v_ee = arg; VNET_LIST_RLOCK(); VNET_FOREACH(vnet_iter) { CURVNET_SET(vnet_iter); ((vimage_iterator_func_t)v_ee->func)(v_ee->ee_arg); CURVNET_RESTORE(); } VNET_LIST_RUNLOCK(); } #ifdef VNET_DEBUG struct vnet_recursion { SLIST_ENTRY(vnet_recursion) vnr_le; const char *prev_fn; const char *where_fn; int where_line; struct vnet *old_vnet; struct vnet *new_vnet; }; static SLIST_HEAD(, vnet_recursion) vnet_recursions = SLIST_HEAD_INITIALIZER(vnet_recursions); static void vnet_print_recursion(struct vnet_recursion *vnr, int brief) { if (!brief) printf("CURVNET_SET() recursion in "); printf("%s() line %d, prev in %s()", vnr->where_fn, vnr->where_line, vnr->prev_fn); if (brief) printf(", "); else printf("\n "); printf("%p -> %p\n", vnr->old_vnet, vnr->new_vnet); } void vnet_log_recursion(struct vnet *old_vnet, const char *old_fn, int line) { struct vnet_recursion *vnr; /* Skip already logged recursion events. */ SLIST_FOREACH(vnr, &vnet_recursions, vnr_le) if (vnr->prev_fn == old_fn && vnr->where_fn == curthread->td_vnet_lpush && vnr->where_line == line && (vnr->old_vnet == vnr->new_vnet) == (curvnet == old_vnet)) return; vnr = malloc(sizeof(*vnr), M_VNET, M_NOWAIT | M_ZERO); if (vnr == NULL) panic("%s: malloc failed", __func__); vnr->prev_fn = old_fn; vnr->where_fn = curthread->td_vnet_lpush; vnr->where_line = line; vnr->old_vnet = old_vnet; vnr->new_vnet = curvnet; SLIST_INSERT_HEAD(&vnet_recursions, vnr, vnr_le); vnet_print_recursion(vnr, 0); #ifdef KDB kdb_backtrace(); #endif } #endif /* VNET_DEBUG */ /* * DDB(4). */ #ifdef DDB DB_SHOW_COMMAND(vnets, db_show_vnets) { VNET_ITERATOR_DECL(vnet_iter); VNET_FOREACH(vnet_iter) { db_printf("vnet = %p\n", vnet_iter); db_printf(" vnet_magic_n = 0x%x (%s, orig 0x%x)\n", vnet_iter->vnet_magic_n, (vnet_iter->vnet_magic_n == VNET_MAGIC_N) ? "ok" : "mismatch", VNET_MAGIC_N); db_printf(" vnet_ifcnt = %u\n", vnet_iter->vnet_ifcnt); db_printf(" vnet_sockcnt = %u\n", vnet_iter->vnet_sockcnt); db_printf(" vnet_data_mem = %p\n", vnet_iter->vnet_data_mem); db_printf(" vnet_data_base = 0x%jx\n", (uintmax_t)vnet_iter->vnet_data_base); db_printf("\n"); if (db_pager_quit) break; } } static void db_show_vnet_print_vs(struct vnet_sysinit *vs, int ddb) { const char *vsname, *funcname; c_db_sym_t sym; db_expr_t offset; #define xprint(...)
\ if (ddb) \ db_printf(__VA_ARGS__); \ else \ printf(__VA_ARGS__) if (vs == NULL) { xprint("%s: no vnet_sysinit * given\n", __func__); return; } sym = db_search_symbol((vm_offset_t)vs, DB_STGY_ANY, &offset); db_symbol_values(sym, &vsname, NULL); sym = db_search_symbol((vm_offset_t)vs->func, DB_STGY_PROC, &offset); db_symbol_values(sym, &funcname, NULL); xprint("%s(%p)\n", (vsname != NULL) ? vsname : "", vs); xprint(" 0x%08x 0x%08x\n", vs->subsystem, vs->order); xprint(" %p(%s)(%p)\n", vs->func, (funcname != NULL) ? funcname : "", vs->arg); #undef xprint } DB_SHOW_COMMAND(vnet_sysinit, db_show_vnet_sysinit) { struct vnet_sysinit *vs; db_printf("VNET_SYSINIT vs Name(Ptr)\n"); db_printf(" Subsystem Order\n"); db_printf(" Function(Name)(Arg)\n"); TAILQ_FOREACH(vs, &vnet_constructors, link) { db_show_vnet_print_vs(vs, 1); if (db_pager_quit) break; } } DB_SHOW_COMMAND(vnet_sysuninit, db_show_vnet_sysuninit) { struct vnet_sysinit *vs; db_printf("VNET_SYSUNINIT vs Name(Ptr)\n"); db_printf(" Subsystem Order\n"); db_printf(" Function(Name)(Arg)\n"); TAILQ_FOREACH_REVERSE(vs, &vnet_destructors, vnet_sysuninit_head, link) { db_show_vnet_print_vs(vs, 1); if (db_pager_quit) break; } } #ifdef VNET_DEBUG DB_SHOW_COMMAND(vnetrcrs, db_show_vnetrcrs) { struct vnet_recursion *vnr; SLIST_FOREACH(vnr, &vnet_recursions, vnr_le) vnet_print_recursion(vnr, 1); } #endif #endif /* DDB */ Index: projects/clang380-import/sys/net80211/ieee80211_ioctl.c =================================================================== --- projects/clang380-import/sys/net80211/ieee80211_ioctl.c (revision 294776) +++ projects/clang380-import/sys/net80211/ieee80211_ioctl.c (revision 294777) @@ -1,3389 +1,3393 @@ /*- * Copyright (c) 2001 Atsushi Onoe * Copyright (c) 2002-2009 Sam Leffler, Errno Consulting * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); /* * IEEE 802.11 ioctl support (FreeBSD-specific) */ #include "opt_inet.h" #include "opt_wlan.h" #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET #include #include #endif #include #include #include #include #define IS_UP_AUTO(_vap) \ (IFNET_IS_UP_RUNNING((_vap)->iv_ifp) && \ (_vap)->iv_roaming == IEEE80211_ROAMING_AUTO) static const uint8_t zerobssid[IEEE80211_ADDR_LEN]; static struct ieee80211_channel *findchannel(struct ieee80211com *, int ieee, int mode); static int ieee80211_scanreq(struct ieee80211vap *, struct ieee80211_scan_req *); -static __noinline int +static int ieee80211_ioctl_getkey(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_node *ni; struct ieee80211req_key ik; struct ieee80211_key *wk; const struct ieee80211_cipher *cip; u_int kid; int error; if (ireq->i_len != sizeof(ik)) return EINVAL; error = copyin(ireq->i_data, &ik, sizeof(ik)); if (error) return error; kid = ik.ik_keyix; if (kid == IEEE80211_KEYIX_NONE) { ni = ieee80211_find_vap_node(&ic->ic_sta, vap, ik.ik_macaddr); if (ni == NULL) return ENOENT; wk = &ni->ni_ucastkey; } else { if (kid >= IEEE80211_WEP_NKID) return EINVAL; wk = &vap->iv_nw_keys[kid]; IEEE80211_ADDR_COPY(&ik.ik_macaddr, vap->iv_bss->ni_macaddr); ni = NULL; } cip = wk->wk_cipher; ik.ik_type = cip->ic_cipher; ik.ik_keylen = wk->wk_keylen; ik.ik_flags = wk->wk_flags & (IEEE80211_KEY_XMIT | IEEE80211_KEY_RECV); if (wk->wk_keyix == vap->iv_def_txkey) ik.ik_flags |= IEEE80211_KEY_DEFAULT; if (priv_check(curthread, PRIV_NET80211_GETKEY) == 0) { /* NB: only root can read key data */ ik.ik_keyrsc = wk->wk_keyrsc[IEEE80211_NONQOS_TID]; ik.ik_keytsc = wk->wk_keytsc; memcpy(ik.ik_keydata, wk->wk_key, wk->wk_keylen); if (cip->ic_cipher == IEEE80211_CIPHER_TKIP) { memcpy(ik.ik_keydata+wk->wk_keylen, wk->wk_key + IEEE80211_KEYBUF_SIZE, IEEE80211_MICBUF_SIZE); ik.ik_keylen += IEEE80211_MICBUF_SIZE; } } else { ik.ik_keyrsc = 0; ik.ik_keytsc = 0; memset(ik.ik_keydata, 0, sizeof(ik.ik_keydata)); } if (ni != NULL) ieee80211_free_node(ni); return copyout(&ik, ireq->i_data, sizeof(ik)); } -static __noinline int +static int ieee80211_ioctl_getchanlist(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; if (sizeof(ic->ic_chan_active) < ireq->i_len) ireq->i_len = sizeof(ic->ic_chan_active); return copyout(&ic->ic_chan_active, ireq->i_data, ireq->i_len); } -static __noinline int +static int ieee80211_ioctl_getchaninfo(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; uint32_t space; space = __offsetof(struct ieee80211req_chaninfo, ic_chans[ic->ic_nchans]); if (space > ireq->i_len) space = ireq->i_len; /* XXX assumes compatible layout */ return copyout(&ic->ic_nchans, ireq->i_data, space); } -static __noinline int +static int ieee80211_ioctl_getwpaie(struct ieee80211vap *vap, struct ieee80211req *ireq, int req) { struct ieee80211_node *ni; - struct ieee80211req_wpaie2 wpaie; + struct ieee80211req_wpaie2 *wpaie; int error; if (ireq->i_len < IEEE80211_ADDR_LEN) return EINVAL; - error = copyin(ireq->i_data, wpaie.wpa_macaddr, IEEE80211_ADDR_LEN); + wpaie = IEEE80211_MALLOC(sizeof(*wpaie), M_TEMP, + IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); + if (wpaie == NULL) + return ENOMEM; + error = copyin(ireq->i_data, wpaie->wpa_macaddr, IEEE80211_ADDR_LEN); if (error != 0) - return error; - ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, 
vap, wpaie.wpa_macaddr); - if (ni == NULL) - return ENOENT; - memset(wpaie.wpa_ie, 0, sizeof(wpaie.wpa_ie)); + goto bad; + ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, wpaie->wpa_macaddr); + if (ni == NULL) { + error = ENOENT; + goto bad; + } if (ni->ni_ies.wpa_ie != NULL) { int ielen = ni->ni_ies.wpa_ie[1] + 2; - if (ielen > sizeof(wpaie.wpa_ie)) - ielen = sizeof(wpaie.wpa_ie); - memcpy(wpaie.wpa_ie, ni->ni_ies.wpa_ie, ielen); + if (ielen > sizeof(wpaie->wpa_ie)) + ielen = sizeof(wpaie->wpa_ie); + memcpy(wpaie->wpa_ie, ni->ni_ies.wpa_ie, ielen); } if (req == IEEE80211_IOC_WPAIE2) { - memset(wpaie.rsn_ie, 0, sizeof(wpaie.rsn_ie)); if (ni->ni_ies.rsn_ie != NULL) { int ielen = ni->ni_ies.rsn_ie[1] + 2; - if (ielen > sizeof(wpaie.rsn_ie)) - ielen = sizeof(wpaie.rsn_ie); - memcpy(wpaie.rsn_ie, ni->ni_ies.rsn_ie, ielen); + if (ielen > sizeof(wpaie->rsn_ie)) + ielen = sizeof(wpaie->rsn_ie); + memcpy(wpaie->rsn_ie, ni->ni_ies.rsn_ie, ielen); } if (ireq->i_len > sizeof(struct ieee80211req_wpaie2)) ireq->i_len = sizeof(struct ieee80211req_wpaie2); } else { /* compatibility op, may overwrite wpa ie */ /* XXX check ic_flags? */ if (ni->ni_ies.rsn_ie != NULL) { int ielen = ni->ni_ies.rsn_ie[1] + 2; - if (ielen > sizeof(wpaie.wpa_ie)) - ielen = sizeof(wpaie.wpa_ie); - memcpy(wpaie.wpa_ie, ni->ni_ies.rsn_ie, ielen); + if (ielen > sizeof(wpaie->wpa_ie)) + ielen = sizeof(wpaie->wpa_ie); + memcpy(wpaie->wpa_ie, ni->ni_ies.rsn_ie, ielen); } if (ireq->i_len > sizeof(struct ieee80211req_wpaie)) ireq->i_len = sizeof(struct ieee80211req_wpaie); } ieee80211_free_node(ni); - return copyout(&wpaie, ireq->i_data, ireq->i_len); + error = copyout(wpaie, ireq->i_data, ireq->i_len); +bad: + IEEE80211_FREE(wpaie, M_TEMP); + return error; } -static __noinline int +static int ieee80211_ioctl_getstastats(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211_node *ni; uint8_t macaddr[IEEE80211_ADDR_LEN]; const size_t off = __offsetof(struct ieee80211req_sta_stats, is_stats); int error; if (ireq->i_len < off) return EINVAL; error = copyin(ireq->i_data, macaddr, IEEE80211_ADDR_LEN); if (error != 0) return error; ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, macaddr); if (ni == NULL) return ENOENT; if (ireq->i_len > sizeof(struct ieee80211req_sta_stats)) ireq->i_len = sizeof(struct ieee80211req_sta_stats); /* NB: copy out only the statistics */ error = copyout(&ni->ni_stats, (uint8_t *) ireq->i_data + off, ireq->i_len - off); ieee80211_free_node(ni); return error; } struct scanreq { struct ieee80211req_scan_result *sr; size_t space; }; static size_t scan_space(const struct ieee80211_scan_entry *se, int *ielen) { size_t len; *ielen = se->se_ies.len; /* * NB: ie's can be no more than 255 bytes and the max 802.11 * packet is <3Kbytes so we are sure this doesn't overflow * 16-bits; if this is a concern we can drop the ie's. 
*/ len = sizeof(struct ieee80211req_scan_result) + se->se_ssid[1] + se->se_meshid[1] + *ielen; return roundup(len, sizeof(uint32_t)); } static void get_scan_space(void *arg, const struct ieee80211_scan_entry *se) { struct scanreq *req = arg; int ielen; req->space += scan_space(se, &ielen); } -static __noinline void +static void get_scan_result(void *arg, const struct ieee80211_scan_entry *se) { struct scanreq *req = arg; struct ieee80211req_scan_result *sr; int ielen, len, nr, nxr; uint8_t *cp; len = scan_space(se, &ielen); if (len > req->space) return; sr = req->sr; KASSERT(len <= 65535 && ielen <= 65535, ("len %u ssid %u ie %u", len, se->se_ssid[1], ielen)); sr->isr_len = len; sr->isr_ie_off = sizeof(struct ieee80211req_scan_result); sr->isr_ie_len = ielen; sr->isr_freq = se->se_chan->ic_freq; sr->isr_flags = se->se_chan->ic_flags; sr->isr_rssi = se->se_rssi; sr->isr_noise = se->se_noise; sr->isr_intval = se->se_intval; sr->isr_capinfo = se->se_capinfo; sr->isr_erp = se->se_erp; IEEE80211_ADDR_COPY(sr->isr_bssid, se->se_bssid); nr = min(se->se_rates[1], IEEE80211_RATE_MAXSIZE); memcpy(sr->isr_rates, se->se_rates+2, nr); nxr = min(se->se_xrates[1], IEEE80211_RATE_MAXSIZE - nr); memcpy(sr->isr_rates+nr, se->se_xrates+2, nxr); sr->isr_nrates = nr + nxr; /* copy SSID */ sr->isr_ssid_len = se->se_ssid[1]; cp = ((uint8_t *)sr) + sr->isr_ie_off; memcpy(cp, se->se_ssid+2, sr->isr_ssid_len); /* copy mesh id */ cp += sr->isr_ssid_len; sr->isr_meshid_len = se->se_meshid[1]; memcpy(cp, se->se_meshid+2, sr->isr_meshid_len); cp += sr->isr_meshid_len; if (ielen) memcpy(cp, se->se_ies.data, ielen); req->space -= len; req->sr = (struct ieee80211req_scan_result *)(((uint8_t *)sr) + len); } -static __noinline int +static int ieee80211_ioctl_getscanresults(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct scanreq req; int error; if (ireq->i_len < sizeof(struct scanreq)) return EFAULT; error = 0; req.space = 0; ieee80211_scan_iterate(vap, get_scan_space, &req); if (req.space > ireq->i_len) req.space = ireq->i_len; if (req.space > 0) { uint32_t space; void *p; space = req.space; /* XXX M_WAITOK after driver lock released */ p = IEEE80211_MALLOC(space, M_TEMP, IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); if (p == NULL) return ENOMEM; req.sr = p; ieee80211_scan_iterate(vap, get_scan_result, &req); ireq->i_len = space - req.space; error = copyout(p, ireq->i_data, ireq->i_len); IEEE80211_FREE(p, M_TEMP); } else ireq->i_len = 0; return error; } struct stainforeq { struct ieee80211vap *vap; struct ieee80211req_sta_info *si; size_t space; }; static size_t sta_space(const struct ieee80211_node *ni, size_t *ielen) { *ielen = ni->ni_ies.len; return roundup(sizeof(struct ieee80211req_sta_info) + *ielen, sizeof(uint32_t)); } static void get_sta_space(void *arg, struct ieee80211_node *ni) { struct stainforeq *req = arg; size_t ielen; if (req->vap != ni->ni_vap) return; if (ni->ni_vap->iv_opmode == IEEE80211_M_HOSTAP && ni->ni_associd == 0) /* only associated stations */ return; req->space += sta_space(ni, &ielen); } -static __noinline void +static void get_sta_info(void *arg, struct ieee80211_node *ni) { struct stainforeq *req = arg; struct ieee80211vap *vap = ni->ni_vap; struct ieee80211req_sta_info *si; size_t ielen, len; uint8_t *cp; if (req->vap != ni->ni_vap) return; if (vap->iv_opmode == IEEE80211_M_HOSTAP && ni->ni_associd == 0) /* only associated stations */ return; if (ni->ni_chan == IEEE80211_CHAN_ANYC) /* XXX bogus entry */ return; len = sta_space(ni, &ielen); if (len > req->space) return; si = req->si; 
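/* Fill in the fixed-size record header first; the variable-length IEs are appended at isi_ie_off below. */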
si->isi_len = len; si->isi_ie_off = sizeof(struct ieee80211req_sta_info); si->isi_ie_len = ielen; si->isi_freq = ni->ni_chan->ic_freq; si->isi_flags = ni->ni_chan->ic_flags; si->isi_state = ni->ni_flags; si->isi_authmode = ni->ni_authmode; vap->iv_ic->ic_node_getsignal(ni, &si->isi_rssi, &si->isi_noise); vap->iv_ic->ic_node_getmimoinfo(ni, &si->isi_mimo); si->isi_capinfo = ni->ni_capinfo; si->isi_erp = ni->ni_erp; IEEE80211_ADDR_COPY(si->isi_macaddr, ni->ni_macaddr); si->isi_nrates = ni->ni_rates.rs_nrates; if (si->isi_nrates > 15) si->isi_nrates = 15; memcpy(si->isi_rates, ni->ni_rates.rs_rates, si->isi_nrates); si->isi_txrate = ni->ni_txrate; if (si->isi_txrate & IEEE80211_RATE_MCS) { const struct ieee80211_mcs_rates *mcs = &ieee80211_htrates[ni->ni_txrate &~ IEEE80211_RATE_MCS]; if (IEEE80211_IS_CHAN_HT40(ni->ni_chan)) { if (ni->ni_flags & IEEE80211_NODE_SGI40) si->isi_txmbps = mcs->ht40_rate_800ns; else si->isi_txmbps = mcs->ht40_rate_400ns; } else { if (ni->ni_flags & IEEE80211_NODE_SGI20) si->isi_txmbps = mcs->ht20_rate_800ns; else si->isi_txmbps = mcs->ht20_rate_400ns; } } else si->isi_txmbps = si->isi_txrate; si->isi_associd = ni->ni_associd; si->isi_txpower = ni->ni_txpower; si->isi_vlan = ni->ni_vlan; if (ni->ni_flags & IEEE80211_NODE_QOS) { memcpy(si->isi_txseqs, ni->ni_txseqs, sizeof(ni->ni_txseqs)); memcpy(si->isi_rxseqs, ni->ni_rxseqs, sizeof(ni->ni_rxseqs)); } else { si->isi_txseqs[0] = ni->ni_txseqs[IEEE80211_NONQOS_TID]; si->isi_rxseqs[0] = ni->ni_rxseqs[IEEE80211_NONQOS_TID]; } /* NB: leave all cases in case we relax ni_associd == 0 check */ if (ieee80211_node_is_authorized(ni)) si->isi_inact = vap->iv_inact_run; else if (ni->ni_associd != 0 || (vap->iv_opmode == IEEE80211_M_WDS && (vap->iv_flags_ext & IEEE80211_FEXT_WDSLEGACY))) si->isi_inact = vap->iv_inact_auth; else si->isi_inact = vap->iv_inact_init; si->isi_inact = (si->isi_inact - ni->ni_inact) * IEEE80211_INACT_WAIT; si->isi_localid = ni->ni_mllid; si->isi_peerid = ni->ni_mlpid; si->isi_peerstate = ni->ni_mlstate; if (ielen) { cp = ((uint8_t *)si) + si->isi_ie_off; memcpy(cp, ni->ni_ies.data, ielen); } req->si = (struct ieee80211req_sta_info *)(((uint8_t *)si) + len); req->space -= len; } -static __noinline int +static int getstainfo_common(struct ieee80211vap *vap, struct ieee80211req *ireq, struct ieee80211_node *ni, size_t off) { struct ieee80211com *ic = vap->iv_ic; struct stainforeq req; size_t space; void *p; int error; error = 0; req.space = 0; req.vap = vap; if (ni == NULL) ieee80211_iterate_nodes(&ic->ic_sta, get_sta_space, &req); else get_sta_space(&req, ni); if (req.space > ireq->i_len) req.space = ireq->i_len; if (req.space > 0) { space = req.space; /* XXX M_WAITOK after driver lock released */ p = IEEE80211_MALLOC(space, M_TEMP, IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); if (p == NULL) { error = ENOMEM; goto bad; } req.si = p; if (ni == NULL) ieee80211_iterate_nodes(&ic->ic_sta, get_sta_info, &req); else get_sta_info(&req, ni); ireq->i_len = space - req.space; error = copyout(p, (uint8_t *) ireq->i_data+off, ireq->i_len); IEEE80211_FREE(p, M_TEMP); } else ireq->i_len = 0; bad: if (ni != NULL) ieee80211_free_node(ni); return error; } -static __noinline int +static int ieee80211_ioctl_getstainfo(struct ieee80211vap *vap, struct ieee80211req *ireq) { uint8_t macaddr[IEEE80211_ADDR_LEN]; const size_t off = __offsetof(struct ieee80211req_sta_req, info); struct ieee80211_node *ni; int error; if (ireq->i_len < sizeof(struct ieee80211req_sta_req)) return EFAULT; error = copyin(ireq->i_data, macaddr, 
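/*
 * NB: getstainfo_common() sizes the reply with one node-table walk
 * (get_sta_space) and fills it with a second (get_sta_info).  Nodes
 * may come and go between the two passes, which is why get_sta_info()
 * re-checks "len > req->space" and silently skips records that no
 * longer fit; the caller learns the bytes actually used from the
 * returned i_len.  The IEEE80211_M_NOWAIT allocation mirrors the scan
 * result path above, since a driver lock may still be held here (see
 * the "XXX M_WAITOK after driver lock released" notes).
 */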
IEEE80211_ADDR_LEN); if (error != 0) return error; if (IEEE80211_ADDR_EQ(macaddr, vap->iv_ifp->if_broadcastaddr)) { ni = NULL; } else { ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, macaddr); if (ni == NULL) return ENOENT; } return getstainfo_common(vap, ireq, ni, off); } -static __noinline int +static int ieee80211_ioctl_getstatxpow(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211_node *ni; struct ieee80211req_sta_txpow txpow; int error; if (ireq->i_len != sizeof(txpow)) return EINVAL; error = copyin(ireq->i_data, &txpow, sizeof(txpow)); if (error != 0) return error; ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, txpow.it_macaddr); if (ni == NULL) return ENOENT; txpow.it_txpow = ni->ni_txpower; error = copyout(&txpow, ireq->i_data, sizeof(txpow)); ieee80211_free_node(ni); return error; } -static __noinline int +static int ieee80211_ioctl_getwmeparam(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_wme_state *wme = &ic->ic_wme; struct wmeParams *wmep; int ac; if ((ic->ic_caps & IEEE80211_C_WME) == 0) return EINVAL; ac = (ireq->i_len & IEEE80211_WMEPARAM_VAL); if (ac >= WME_NUM_AC) ac = WME_AC_BE; if (ireq->i_len & IEEE80211_WMEPARAM_BSS) wmep = &wme->wme_wmeBssChanParams.cap_wmeParams[ac]; else wmep = &wme->wme_wmeChanParams.cap_wmeParams[ac]; switch (ireq->i_type) { case IEEE80211_IOC_WME_CWMIN: /* WME: CWmin */ ireq->i_val = wmep->wmep_logcwmin; break; case IEEE80211_IOC_WME_CWMAX: /* WME: CWmax */ ireq->i_val = wmep->wmep_logcwmax; break; case IEEE80211_IOC_WME_AIFS: /* WME: AIFS */ ireq->i_val = wmep->wmep_aifsn; break; case IEEE80211_IOC_WME_TXOPLIMIT: /* WME: txops limit */ ireq->i_val = wmep->wmep_txopLimit; break; case IEEE80211_IOC_WME_ACM: /* WME: ACM (bss only) */ wmep = &wme->wme_wmeBssChanParams.cap_wmeParams[ac]; ireq->i_val = wmep->wmep_acm; break; case IEEE80211_IOC_WME_ACKPOLICY: /* WME: ACK policy (!bss only)*/ wmep = &wme->wme_wmeChanParams.cap_wmeParams[ac]; ireq->i_val = !wmep->wmep_noackPolicy; break; } return 0; } -static __noinline int +static int ieee80211_ioctl_getmaccmd(struct ieee80211vap *vap, struct ieee80211req *ireq) { const struct ieee80211_aclator *acl = vap->iv_acl; return (acl == NULL ? EINVAL : acl->iac_getioctl(vap, ireq)); } -static __noinline int +static int ieee80211_ioctl_getcurchan(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_channel *c; if (ireq->i_len != sizeof(struct ieee80211_channel)) return EINVAL; /* * vap's may have different operating channels when HT is * in use. When in RUN state report the vap-specific channel. * Otherwise return curchan. 
*/ if (vap->iv_state == IEEE80211_S_RUN || vap->iv_state == IEEE80211_S_SLEEP) c = vap->iv_bss->ni_chan; else c = ic->ic_curchan; return copyout(c, ireq->i_data, sizeof(*c)); } static int getappie(const struct ieee80211_appie *aie, struct ieee80211req *ireq) { if (aie == NULL) return EINVAL; /* NB: truncate, caller can check length */ if (ireq->i_len > aie->ie_len) ireq->i_len = aie->ie_len; return copyout(aie->ie_data, ireq->i_data, ireq->i_len); } static int ieee80211_ioctl_getappie(struct ieee80211vap *vap, struct ieee80211req *ireq) { uint8_t fc0; fc0 = ireq->i_val & 0xff; if ((fc0 & IEEE80211_FC0_TYPE_MASK) != IEEE80211_FC0_TYPE_MGT) return EINVAL; /* NB: could check iv_opmode and reject but hardly worth the effort */ switch (fc0 & IEEE80211_FC0_SUBTYPE_MASK) { case IEEE80211_FC0_SUBTYPE_BEACON: return getappie(vap->iv_appie_beacon, ireq); case IEEE80211_FC0_SUBTYPE_PROBE_RESP: return getappie(vap->iv_appie_proberesp, ireq); case IEEE80211_FC0_SUBTYPE_ASSOC_RESP: return getappie(vap->iv_appie_assocresp, ireq); case IEEE80211_FC0_SUBTYPE_PROBE_REQ: return getappie(vap->iv_appie_probereq, ireq); case IEEE80211_FC0_SUBTYPE_ASSOC_REQ: return getappie(vap->iv_appie_assocreq, ireq); case IEEE80211_FC0_SUBTYPE_BEACON|IEEE80211_FC0_SUBTYPE_PROBE_RESP: return getappie(vap->iv_appie_wpa, ireq); } return EINVAL; } -static __noinline int +static int ieee80211_ioctl_getregdomain(struct ieee80211vap *vap, const struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; if (ireq->i_len != sizeof(ic->ic_regdomain)) return EINVAL; return copyout(&ic->ic_regdomain, ireq->i_data, sizeof(ic->ic_regdomain)); } -static __noinline int +static int ieee80211_ioctl_getroam(struct ieee80211vap *vap, const struct ieee80211req *ireq) { size_t len = ireq->i_len; /* NB: accept short requests for backwards compat */ if (len > sizeof(vap->iv_roamparms)) len = sizeof(vap->iv_roamparms); return copyout(vap->iv_roamparms, ireq->i_data, len); } -static __noinline int +static int ieee80211_ioctl_gettxparams(struct ieee80211vap *vap, const struct ieee80211req *ireq) { size_t len = ireq->i_len; /* NB: accept short requests for backwards compat */ if (len > sizeof(vap->iv_txparms)) len = sizeof(vap->iv_txparms); return copyout(vap->iv_txparms, ireq->i_data, len); } -static __noinline int +static int ieee80211_ioctl_getdevcaps(struct ieee80211com *ic, const struct ieee80211req *ireq) { struct ieee80211_devcaps_req *dc; struct ieee80211req_chaninfo *ci; int maxchans, error; maxchans = 1 + ((ireq->i_len - sizeof(struct ieee80211_devcaps_req)) / sizeof(struct ieee80211_channel)); /* NB: require 1 so we know ic_nchans is accessible */ if (maxchans < 1) return EINVAL; /* constrain max request size, 2K channels is ~24Kbytes */ if (maxchans > 2048) maxchans = 2048; dc = (struct ieee80211_devcaps_req *) IEEE80211_MALLOC(IEEE80211_DEVCAPS_SIZE(maxchans), M_TEMP, IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); if (dc == NULL) return ENOMEM; dc->dc_drivercaps = ic->ic_caps; dc->dc_cryptocaps = ic->ic_cryptocaps; dc->dc_htcaps = ic->ic_htcaps; ci = &dc->dc_chaninfo; ic->ic_getradiocaps(ic, maxchans, &ci->ic_nchans, ci->ic_chans); KASSERT(ci->ic_nchans <= maxchans, ("nchans %d maxchans %d", ci->ic_nchans, maxchans)); ieee80211_sort_channels(ci->ic_chans, ci->ic_nchans); error = copyout(dc, ireq->i_data, IEEE80211_DEVCAPS_SPACE(dc)); IEEE80211_FREE(dc, M_TEMP); return error; } -static __noinline int +static int ieee80211_ioctl_getstavlan(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211_node *ni; struct 
ieee80211req_sta_vlan vlan; int error; if (ireq->i_len != sizeof(vlan)) return EINVAL; error = copyin(ireq->i_data, &vlan, sizeof(vlan)); if (error != 0) return error; if (!IEEE80211_ADDR_EQ(vlan.sv_macaddr, zerobssid)) { ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, vlan.sv_macaddr); if (ni == NULL) return ENOENT; } else ni = ieee80211_ref_node(vap->iv_bss); vlan.sv_vlan = ni->ni_vlan; error = copyout(&vlan, ireq->i_data, sizeof(vlan)); ieee80211_free_node(ni); return error; } /* * Dummy ioctl get handler so the linker set is defined. */ static int dummy_ioctl_get(struct ieee80211vap *vap, struct ieee80211req *ireq) { return ENOSYS; } IEEE80211_IOCTL_GET(dummy, dummy_ioctl_get); static int ieee80211_ioctl_getdefault(struct ieee80211vap *vap, struct ieee80211req *ireq) { ieee80211_ioctl_getfunc * const *get; int error; SET_FOREACH(get, ieee80211_ioctl_getset) { error = (*get)(vap, ireq); if (error != ENOSYS) return error; } return EINVAL; } -/* - * When building the kernel with -O2 on the i386 architecture, gcc - * seems to want to inline this function into ieee80211_ioctl() - * (which is the only routine that calls it). When this happens, - * ieee80211_ioctl() ends up consuming an additional 2K of stack - * space. (Exactly why it needs so much is unclear.) The problem - * is that it's possible for ieee80211_ioctl() to invoke other - * routines (including driver init functions) which could then find - * themselves perilously close to exhausting the stack. - * - * To avoid this, we deliberately prevent gcc from inlining this - * routine. Another way to avoid this is to use less agressive - * optimization when compiling this file (i.e. -O instead of -O2) - * but special-casing the compilation of this one module in the - * build system would be awkward. 
- */ -static __noinline int +static int ieee80211_ioctl_get80211(struct ieee80211vap *vap, u_long cmd, struct ieee80211req *ireq) { #define MS(_v, _f) (((_v) & _f) >> _f##_S) struct ieee80211com *ic = vap->iv_ic; u_int kid, len; uint8_t tmpkey[IEEE80211_KEYBUF_SIZE]; char tmpssid[IEEE80211_NWID_LEN]; int error = 0; switch (ireq->i_type) { case IEEE80211_IOC_SSID: switch (vap->iv_state) { case IEEE80211_S_INIT: case IEEE80211_S_SCAN: ireq->i_len = vap->iv_des_ssid[0].len; memcpy(tmpssid, vap->iv_des_ssid[0].ssid, ireq->i_len); break; default: ireq->i_len = vap->iv_bss->ni_esslen; memcpy(tmpssid, vap->iv_bss->ni_essid, ireq->i_len); break; } error = copyout(tmpssid, ireq->i_data, ireq->i_len); break; case IEEE80211_IOC_NUMSSIDS: ireq->i_val = 1; break; case IEEE80211_IOC_WEP: if ((vap->iv_flags & IEEE80211_F_PRIVACY) == 0) ireq->i_val = IEEE80211_WEP_OFF; else if (vap->iv_flags & IEEE80211_F_DROPUNENC) ireq->i_val = IEEE80211_WEP_ON; else ireq->i_val = IEEE80211_WEP_MIXED; break; case IEEE80211_IOC_WEPKEY: kid = (u_int) ireq->i_val; if (kid >= IEEE80211_WEP_NKID) return EINVAL; len = (u_int) vap->iv_nw_keys[kid].wk_keylen; /* NB: only root can read WEP keys */ if (priv_check(curthread, PRIV_NET80211_GETKEY) == 0) { bcopy(vap->iv_nw_keys[kid].wk_key, tmpkey, len); } else { bzero(tmpkey, len); } ireq->i_len = len; error = copyout(tmpkey, ireq->i_data, len); break; case IEEE80211_IOC_NUMWEPKEYS: ireq->i_val = IEEE80211_WEP_NKID; break; case IEEE80211_IOC_WEPTXKEY: ireq->i_val = vap->iv_def_txkey; break; case IEEE80211_IOC_AUTHMODE: if (vap->iv_flags & IEEE80211_F_WPA) ireq->i_val = IEEE80211_AUTH_WPA; else ireq->i_val = vap->iv_bss->ni_authmode; break; case IEEE80211_IOC_CHANNEL: ireq->i_val = ieee80211_chan2ieee(ic, ic->ic_curchan); break; case IEEE80211_IOC_POWERSAVE: if (vap->iv_flags & IEEE80211_F_PMGTON) ireq->i_val = IEEE80211_POWERSAVE_ON; else ireq->i_val = IEEE80211_POWERSAVE_OFF; break; case IEEE80211_IOC_POWERSAVESLEEP: ireq->i_val = ic->ic_lintval; break; case IEEE80211_IOC_RTSTHRESHOLD: ireq->i_val = vap->iv_rtsthreshold; break; case IEEE80211_IOC_PROTMODE: ireq->i_val = ic->ic_protmode; break; case IEEE80211_IOC_TXPOWER: /* * Tx power limit is the min of max regulatory * power, any user-set limit, and the max the * radio can do. 
*/ ireq->i_val = 2*ic->ic_curchan->ic_maxregpower; if (ireq->i_val > ic->ic_txpowlimit) ireq->i_val = ic->ic_txpowlimit; if (ireq->i_val > ic->ic_curchan->ic_maxpower) ireq->i_val = ic->ic_curchan->ic_maxpower; break; case IEEE80211_IOC_WPA: switch (vap->iv_flags & IEEE80211_F_WPA) { case IEEE80211_F_WPA1: ireq->i_val = 1; break; case IEEE80211_F_WPA2: ireq->i_val = 2; break; case IEEE80211_F_WPA1 | IEEE80211_F_WPA2: ireq->i_val = 3; break; default: ireq->i_val = 0; break; } break; case IEEE80211_IOC_CHANLIST: error = ieee80211_ioctl_getchanlist(vap, ireq); break; case IEEE80211_IOC_ROAMING: ireq->i_val = vap->iv_roaming; break; case IEEE80211_IOC_PRIVACY: ireq->i_val = (vap->iv_flags & IEEE80211_F_PRIVACY) != 0; break; case IEEE80211_IOC_DROPUNENCRYPTED: ireq->i_val = (vap->iv_flags & IEEE80211_F_DROPUNENC) != 0; break; case IEEE80211_IOC_COUNTERMEASURES: ireq->i_val = (vap->iv_flags & IEEE80211_F_COUNTERM) != 0; break; case IEEE80211_IOC_WME: ireq->i_val = (vap->iv_flags & IEEE80211_F_WME) != 0; break; case IEEE80211_IOC_HIDESSID: ireq->i_val = (vap->iv_flags & IEEE80211_F_HIDESSID) != 0; break; case IEEE80211_IOC_APBRIDGE: ireq->i_val = (vap->iv_flags & IEEE80211_F_NOBRIDGE) == 0; break; case IEEE80211_IOC_WPAKEY: error = ieee80211_ioctl_getkey(vap, ireq); break; case IEEE80211_IOC_CHANINFO: error = ieee80211_ioctl_getchaninfo(vap, ireq); break; case IEEE80211_IOC_BSSID: if (ireq->i_len != IEEE80211_ADDR_LEN) return EINVAL; if (vap->iv_state == IEEE80211_S_RUN || vap->iv_state == IEEE80211_S_SLEEP) { error = copyout(vap->iv_opmode == IEEE80211_M_WDS ? vap->iv_bss->ni_macaddr : vap->iv_bss->ni_bssid, ireq->i_data, ireq->i_len); } else error = copyout(vap->iv_des_bssid, ireq->i_data, ireq->i_len); break; case IEEE80211_IOC_WPAIE: - error = ieee80211_ioctl_getwpaie(vap, ireq, ireq->i_type); - break; case IEEE80211_IOC_WPAIE2: error = ieee80211_ioctl_getwpaie(vap, ireq, ireq->i_type); break; case IEEE80211_IOC_SCAN_RESULTS: error = ieee80211_ioctl_getscanresults(vap, ireq); break; case IEEE80211_IOC_STA_STATS: error = ieee80211_ioctl_getstastats(vap, ireq); break; case IEEE80211_IOC_TXPOWMAX: ireq->i_val = vap->iv_bss->ni_txpower; break; case IEEE80211_IOC_STA_TXPOW: error = ieee80211_ioctl_getstatxpow(vap, ireq); break; case IEEE80211_IOC_STA_INFO: error = ieee80211_ioctl_getstainfo(vap, ireq); break; case IEEE80211_IOC_WME_CWMIN: /* WME: CWmin */ case IEEE80211_IOC_WME_CWMAX: /* WME: CWmax */ case IEEE80211_IOC_WME_AIFS: /* WME: AIFS */ case IEEE80211_IOC_WME_TXOPLIMIT: /* WME: txops limit */ case IEEE80211_IOC_WME_ACM: /* WME: ACM (bss only) */ case IEEE80211_IOC_WME_ACKPOLICY: /* WME: ACK policy (!bss only) */ error = ieee80211_ioctl_getwmeparam(vap, ireq); break; case IEEE80211_IOC_DTIM_PERIOD: ireq->i_val = vap->iv_dtim_period; break; case IEEE80211_IOC_BEACON_INTERVAL: /* NB: get from ic_bss for station mode */ ireq->i_val = vap->iv_bss->ni_intval; break; case IEEE80211_IOC_PUREG: ireq->i_val = (vap->iv_flags & IEEE80211_F_PUREG) != 0; break; case IEEE80211_IOC_QUIET: ireq->i_val = vap->iv_quiet; break; case IEEE80211_IOC_QUIET_COUNT: ireq->i_val = vap->iv_quiet_count; break; case IEEE80211_IOC_QUIET_PERIOD: ireq->i_val = vap->iv_quiet_period; break; case IEEE80211_IOC_QUIET_DUR: ireq->i_val = vap->iv_quiet_duration; break; case IEEE80211_IOC_QUIET_OFFSET: ireq->i_val = vap->iv_quiet_offset; break; case IEEE80211_IOC_BGSCAN: ireq->i_val = (vap->iv_flags & IEEE80211_F_BGSCAN) != 0; break; case IEEE80211_IOC_BGSCAN_IDLE: ireq->i_val = vap->iv_bgscanidle*hz/1000; /* ms */ break; case 
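/*
 * NB: with the change above, IEEE80211_IOC_WPAIE now simply falls
 * through to IEEE80211_IOC_WPAIE2.  Both hand ireq->i_type down to
 * ieee80211_ioctl_getwpaie(), which can presumably distinguish the
 * two request flavors by that argument, so the duplicated call and
 * break were dead weight.
 */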
IEEE80211_IOC_BGSCAN_INTERVAL: ireq->i_val = vap->iv_bgscanintvl/hz; /* seconds */ break; case IEEE80211_IOC_SCANVALID: ireq->i_val = vap->iv_scanvalid/hz; /* seconds */ break; case IEEE80211_IOC_FRAGTHRESHOLD: ireq->i_val = vap->iv_fragthreshold; break; case IEEE80211_IOC_MACCMD: error = ieee80211_ioctl_getmaccmd(vap, ireq); break; case IEEE80211_IOC_BURST: ireq->i_val = (vap->iv_flags & IEEE80211_F_BURST) != 0; break; case IEEE80211_IOC_BMISSTHRESHOLD: ireq->i_val = vap->iv_bmissthreshold; break; case IEEE80211_IOC_CURCHAN: error = ieee80211_ioctl_getcurchan(vap, ireq); break; case IEEE80211_IOC_SHORTGI: ireq->i_val = 0; if (vap->iv_flags_ht & IEEE80211_FHT_SHORTGI20) ireq->i_val |= IEEE80211_HTCAP_SHORTGI20; if (vap->iv_flags_ht & IEEE80211_FHT_SHORTGI40) ireq->i_val |= IEEE80211_HTCAP_SHORTGI40; break; case IEEE80211_IOC_AMPDU: ireq->i_val = 0; if (vap->iv_flags_ht & IEEE80211_FHT_AMPDU_TX) ireq->i_val |= 1; if (vap->iv_flags_ht & IEEE80211_FHT_AMPDU_RX) ireq->i_val |= 2; break; case IEEE80211_IOC_AMPDU_LIMIT: if (vap->iv_opmode == IEEE80211_M_HOSTAP) ireq->i_val = vap->iv_ampdu_rxmax; else if (vap->iv_state == IEEE80211_S_RUN || vap->iv_state == IEEE80211_S_SLEEP) ireq->i_val = MS(vap->iv_bss->ni_htparam, IEEE80211_HTCAP_MAXRXAMPDU); else ireq->i_val = vap->iv_ampdu_limit; break; case IEEE80211_IOC_AMPDU_DENSITY: if (vap->iv_opmode == IEEE80211_M_STA && (vap->iv_state == IEEE80211_S_RUN || vap->iv_state == IEEE80211_S_SLEEP)) ireq->i_val = MS(vap->iv_bss->ni_htparam, IEEE80211_HTCAP_MPDUDENSITY); else ireq->i_val = vap->iv_ampdu_density; break; case IEEE80211_IOC_AMSDU: ireq->i_val = 0; if (vap->iv_flags_ht & IEEE80211_FHT_AMSDU_TX) ireq->i_val |= 1; if (vap->iv_flags_ht & IEEE80211_FHT_AMSDU_RX) ireq->i_val |= 2; break; case IEEE80211_IOC_AMSDU_LIMIT: ireq->i_val = vap->iv_amsdu_limit; /* XXX truncation? 
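 * -- likely moot today, since i_val is a signed 16-bit field and the
 * usual HT A-MSDU limits (3839 or 7935 bytes) fit comfortably; a
 * wider future limit would clip here, though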
*/ break; case IEEE80211_IOC_PUREN: ireq->i_val = (vap->iv_flags_ht & IEEE80211_FHT_PUREN) != 0; break; case IEEE80211_IOC_DOTH: ireq->i_val = (vap->iv_flags & IEEE80211_F_DOTH) != 0; break; case IEEE80211_IOC_REGDOMAIN: error = ieee80211_ioctl_getregdomain(vap, ireq); break; case IEEE80211_IOC_ROAM: error = ieee80211_ioctl_getroam(vap, ireq); break; case IEEE80211_IOC_TXPARAMS: error = ieee80211_ioctl_gettxparams(vap, ireq); break; case IEEE80211_IOC_HTCOMPAT: ireq->i_val = (vap->iv_flags_ht & IEEE80211_FHT_HTCOMPAT) != 0; break; case IEEE80211_IOC_DWDS: ireq->i_val = (vap->iv_flags & IEEE80211_F_DWDS) != 0; break; case IEEE80211_IOC_INACTIVITY: ireq->i_val = (vap->iv_flags_ext & IEEE80211_FEXT_INACT) != 0; break; case IEEE80211_IOC_APPIE: error = ieee80211_ioctl_getappie(vap, ireq); break; case IEEE80211_IOC_WPS: ireq->i_val = (vap->iv_flags_ext & IEEE80211_FEXT_WPS) != 0; break; case IEEE80211_IOC_TSN: ireq->i_val = (vap->iv_flags_ext & IEEE80211_FEXT_TSN) != 0; break; case IEEE80211_IOC_DFS: ireq->i_val = (vap->iv_flags_ext & IEEE80211_FEXT_DFS) != 0; break; case IEEE80211_IOC_DOTD: ireq->i_val = (vap->iv_flags_ext & IEEE80211_FEXT_DOTD) != 0; break; case IEEE80211_IOC_DEVCAPS: error = ieee80211_ioctl_getdevcaps(ic, ireq); break; case IEEE80211_IOC_HTPROTMODE: ireq->i_val = ic->ic_htprotmode; break; case IEEE80211_IOC_HTCONF: if (vap->iv_flags_ht & IEEE80211_FHT_HT) { ireq->i_val = 1; if (vap->iv_flags_ht & IEEE80211_FHT_USEHT40) ireq->i_val |= 2; } else ireq->i_val = 0; break; case IEEE80211_IOC_STA_VLAN: error = ieee80211_ioctl_getstavlan(vap, ireq); break; case IEEE80211_IOC_SMPS: if (vap->iv_opmode == IEEE80211_M_STA && (vap->iv_state == IEEE80211_S_RUN || vap->iv_state == IEEE80211_S_SLEEP)) { if (vap->iv_bss->ni_flags & IEEE80211_NODE_MIMO_RTS) ireq->i_val = IEEE80211_HTCAP_SMPS_DYNAMIC; else if (vap->iv_bss->ni_flags & IEEE80211_NODE_MIMO_PS) ireq->i_val = IEEE80211_HTCAP_SMPS_ENA; else ireq->i_val = IEEE80211_HTCAP_SMPS_OFF; } else ireq->i_val = vap->iv_htcaps & IEEE80211_HTCAP_SMPS; break; case IEEE80211_IOC_RIFS: if (vap->iv_opmode == IEEE80211_M_STA && (vap->iv_state == IEEE80211_S_RUN || vap->iv_state == IEEE80211_S_SLEEP)) ireq->i_val = (vap->iv_bss->ni_flags & IEEE80211_NODE_RIFS) != 0; else ireq->i_val = (vap->iv_flags_ht & IEEE80211_FHT_RIFS) != 0; break; default: error = ieee80211_ioctl_getdefault(vap, ireq); break; } return error; #undef MS } -static __noinline int +static int ieee80211_ioctl_setkey(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211req_key ik; struct ieee80211_node *ni; struct ieee80211_key *wk; uint16_t kid; int error, i; if (ireq->i_len != sizeof(ik)) return EINVAL; error = copyin(ireq->i_data, &ik, sizeof(ik)); if (error) return error; /* NB: cipher support is verified by ieee80211_crypt_newkey */ /* NB: this also checks ik->ik_keylen > sizeof(wk->wk_key) */ if (ik.ik_keylen > sizeof(ik.ik_keydata)) return E2BIG; kid = ik.ik_keyix; if (kid == IEEE80211_KEYIX_NONE) { /* XXX unicast keys currently must be tx/rx */ if (ik.ik_flags != (IEEE80211_KEY_XMIT | IEEE80211_KEY_RECV)) return EINVAL; if (vap->iv_opmode == IEEE80211_M_STA) { ni = ieee80211_ref_node(vap->iv_bss); if (!IEEE80211_ADDR_EQ(ik.ik_macaddr, ni->ni_bssid)) { ieee80211_free_node(ni); return EADDRNOTAVAIL; } } else { ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, ik.ik_macaddr); if (ni == NULL) return ENOENT; } wk = &ni->ni_ucastkey; } else { if (kid >= IEEE80211_WEP_NKID) return EINVAL; wk = &vap->iv_nw_keys[kid]; /* * Global slots start off w/o any assigned 
key index. * Force one here for consistency with IEEE80211_IOC_WEPKEY. */ if (wk->wk_keyix == IEEE80211_KEYIX_NONE) wk->wk_keyix = kid; ni = NULL; } error = 0; ieee80211_key_update_begin(vap); if (ieee80211_crypto_newkey(vap, ik.ik_type, ik.ik_flags, wk)) { wk->wk_keylen = ik.ik_keylen; /* NB: MIC presence is implied by cipher type */ if (wk->wk_keylen > IEEE80211_KEYBUF_SIZE) wk->wk_keylen = IEEE80211_KEYBUF_SIZE; for (i = 0; i < IEEE80211_TID_SIZE; i++) wk->wk_keyrsc[i] = ik.ik_keyrsc; wk->wk_keytsc = 0; /* new key, reset */ memset(wk->wk_key, 0, sizeof(wk->wk_key)); memcpy(wk->wk_key, ik.ik_keydata, ik.ik_keylen); IEEE80211_ADDR_COPY(wk->wk_macaddr, ni != NULL ? ni->ni_macaddr : ik.ik_macaddr); if (!ieee80211_crypto_setkey(vap, wk)) error = EIO; else if ((ik.ik_flags & IEEE80211_KEY_DEFAULT)) vap->iv_def_txkey = kid; } else error = ENXIO; ieee80211_key_update_end(vap); if (ni != NULL) ieee80211_free_node(ni); return error; } -static __noinline int +static int ieee80211_ioctl_delkey(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211req_del_key dk; int kid, error; if (ireq->i_len != sizeof(dk)) return EINVAL; error = copyin(ireq->i_data, &dk, sizeof(dk)); if (error) return error; kid = dk.idk_keyix; /* XXX uint8_t -> uint16_t */ if (dk.idk_keyix == (uint8_t) IEEE80211_KEYIX_NONE) { struct ieee80211_node *ni; if (vap->iv_opmode == IEEE80211_M_STA) { ni = ieee80211_ref_node(vap->iv_bss); if (!IEEE80211_ADDR_EQ(dk.idk_macaddr, ni->ni_bssid)) { ieee80211_free_node(ni); return EADDRNOTAVAIL; } } else { ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, dk.idk_macaddr); if (ni == NULL) return ENOENT; } /* XXX error return */ ieee80211_node_delucastkey(ni); ieee80211_free_node(ni); } else { if (kid >= IEEE80211_WEP_NKID) return EINVAL; /* XXX error return */ ieee80211_crypto_delkey(vap, &vap->iv_nw_keys[kid]); } return 0; } struct mlmeop { struct ieee80211vap *vap; int op; int reason; }; static void mlmedebug(struct ieee80211vap *vap, const uint8_t mac[IEEE80211_ADDR_LEN], int op, int reason) { #ifdef IEEE80211_DEBUG static const struct { int mask; const char *opstr; } ops[] = { { 0, "op#0" }, { IEEE80211_MSG_IOCTL | IEEE80211_MSG_STATE | IEEE80211_MSG_ASSOC, "assoc" }, { IEEE80211_MSG_IOCTL | IEEE80211_MSG_STATE | IEEE80211_MSG_ASSOC, "disassoc" }, { IEEE80211_MSG_IOCTL | IEEE80211_MSG_STATE | IEEE80211_MSG_AUTH, "deauth" }, { IEEE80211_MSG_IOCTL | IEEE80211_MSG_STATE | IEEE80211_MSG_AUTH, "authorize" }, { IEEE80211_MSG_IOCTL | IEEE80211_MSG_STATE | IEEE80211_MSG_AUTH, "unauthorize" }, }; if (op == IEEE80211_MLME_AUTH) { IEEE80211_NOTE_MAC(vap, IEEE80211_MSG_IOCTL | IEEE80211_MSG_STATE | IEEE80211_MSG_AUTH, mac, "station authenticate %s via MLME (reason %d)", reason == IEEE80211_STATUS_SUCCESS ? 
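/*
 * A hedged sketch of the request consumed by ieee80211_ioctl_setkey()
 * above -- here installing a global CCMP key in slot 0; the key
 * material and the surrounding ioctl plumbing are illustrative only:
 *
 *	struct ieee80211req_key ik;
 *	memset(&ik, 0, sizeof(ik));
 *	ik.ik_type = IEEE80211_CIPHER_AES_CCM;
 *	ik.ik_keyix = 0;	// a global slot, not IEEE80211_KEYIX_NONE
 *	ik.ik_flags = IEEE80211_KEY_XMIT | IEEE80211_KEY_RECV |
 *	    IEEE80211_KEY_DEFAULT;	// also becomes the tx default key
 *	ik.ik_keylen = 16;
 *	memcpy(ik.ik_keydata, key_material, 16);  // key_material assumed
 */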
"ACCEPT" : "REJECT", reason); } else if (!(IEEE80211_MLME_ASSOC <= op && op <= IEEE80211_MLME_AUTH)) { IEEE80211_NOTE_MAC(vap, IEEE80211_MSG_ANY, mac, "unknown MLME request %d (reason %d)", op, reason); } else if (reason == IEEE80211_STATUS_SUCCESS) { IEEE80211_NOTE_MAC(vap, ops[op].mask, mac, "station %s via MLME", ops[op].opstr); } else { IEEE80211_NOTE_MAC(vap, ops[op].mask, mac, "station %s via MLME (reason %d)", ops[op].opstr, reason); } #endif /* IEEE80211_DEBUG */ } static void domlme(void *arg, struct ieee80211_node *ni) { struct mlmeop *mop = arg; struct ieee80211vap *vap = ni->ni_vap; if (vap != mop->vap) return; /* * NB: if ni_associd is zero then the node is already cleaned * up and we don't need to do this (we're safely holding a * reference but should otherwise not modify it's state). */ if (ni->ni_associd == 0) return; mlmedebug(vap, ni->ni_macaddr, mop->op, mop->reason); if (mop->op == IEEE80211_MLME_DEAUTH) { IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_DEAUTH, mop->reason); } else { IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_DISASSOC, mop->reason); } ieee80211_node_leave(ni); } static int setmlme_dropsta(struct ieee80211vap *vap, const uint8_t mac[IEEE80211_ADDR_LEN], struct mlmeop *mlmeop) { struct ieee80211_node_table *nt = &vap->iv_ic->ic_sta; struct ieee80211_node *ni; int error = 0; /* NB: the broadcast address means do 'em all */ if (!IEEE80211_ADDR_EQ(mac, vap->iv_ifp->if_broadcastaddr)) { IEEE80211_NODE_LOCK(nt); ni = ieee80211_find_node_locked(nt, mac); IEEE80211_NODE_UNLOCK(nt); /* * Don't do the node update inside the node * table lock. This unfortunately causes LORs * with drivers and their TX paths. */ if (ni != NULL) { domlme(mlmeop, ni); ieee80211_free_node(ni); } else error = ENOENT; } else { ieee80211_iterate_nodes(nt, domlme, mlmeop); } return error; } -static __noinline int +static int setmlme_common(struct ieee80211vap *vap, int op, const uint8_t mac[IEEE80211_ADDR_LEN], int reason) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_node_table *nt = &ic->ic_sta; struct ieee80211_node *ni; struct mlmeop mlmeop; int error; error = 0; switch (op) { case IEEE80211_MLME_DISASSOC: case IEEE80211_MLME_DEAUTH: switch (vap->iv_opmode) { case IEEE80211_M_STA: mlmedebug(vap, vap->iv_bss->ni_macaddr, op, reason); /* XXX not quite right */ ieee80211_new_state(vap, IEEE80211_S_INIT, reason); break; case IEEE80211_M_HOSTAP: mlmeop.vap = vap; mlmeop.op = op; mlmeop.reason = reason; error = setmlme_dropsta(vap, mac, &mlmeop); break; case IEEE80211_M_WDS: /* XXX user app should send raw frame? */ if (op != IEEE80211_MLME_DEAUTH) { error = EINVAL; break; } #if 0 /* XXX accept any address, simplifies user code */ if (!IEEE80211_ADDR_EQ(mac, vap->iv_bss->ni_macaddr)) { error = EINVAL; break; } #endif mlmedebug(vap, vap->iv_bss->ni_macaddr, op, reason); ni = ieee80211_ref_node(vap->iv_bss); IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_DEAUTH, reason); ieee80211_free_node(ni); break; case IEEE80211_M_MBSS: IEEE80211_NODE_LOCK(nt); ni = ieee80211_find_node_locked(nt, mac); /* * Don't do the node update inside the node * table lock. This unfortunately causes LORs * with drivers and their TX paths. 
*/ IEEE80211_NODE_UNLOCK(nt); if (ni != NULL) { ieee80211_node_leave(ni); ieee80211_free_node(ni); } else { error = ENOENT; } break; default: error = EINVAL; break; } break; case IEEE80211_MLME_AUTHORIZE: case IEEE80211_MLME_UNAUTHORIZE: if (vap->iv_opmode != IEEE80211_M_HOSTAP && vap->iv_opmode != IEEE80211_M_WDS) { error = EINVAL; break; } IEEE80211_NODE_LOCK(nt); ni = ieee80211_find_vap_node_locked(nt, vap, mac); /* * Don't do the node update inside the node * table lock. This unfortunately causes LORs * with drivers and their TX paths. */ IEEE80211_NODE_UNLOCK(nt); if (ni != NULL) { mlmedebug(vap, mac, op, reason); if (op == IEEE80211_MLME_AUTHORIZE) ieee80211_node_authorize(ni); else ieee80211_node_unauthorize(ni); ieee80211_free_node(ni); } else error = ENOENT; break; case IEEE80211_MLME_AUTH: if (vap->iv_opmode != IEEE80211_M_HOSTAP) { error = EINVAL; break; } IEEE80211_NODE_LOCK(nt); ni = ieee80211_find_vap_node_locked(nt, vap, mac); /* * Don't do the node update inside the node * table lock. This unfortunately causes LORs * with drivers and their TX paths. */ IEEE80211_NODE_UNLOCK(nt); if (ni != NULL) { mlmedebug(vap, mac, op, reason); if (reason == IEEE80211_STATUS_SUCCESS) { IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, 2); /* * For shared key auth, just continue the * exchange. Otherwise when 802.1x is not in * use mark the port authorized at this point * so traffic can flow. */ if (ni->ni_authmode != IEEE80211_AUTH_8021X && ni->ni_challenge == NULL) ieee80211_node_authorize(ni); } else { vap->iv_stats.is_rx_acl++; ieee80211_send_error(ni, ni->ni_macaddr, IEEE80211_FC0_SUBTYPE_AUTH, 2|(reason<<16)); ieee80211_node_leave(ni); } ieee80211_free_node(ni); } else error = ENOENT; break; default: error = EINVAL; break; } return error; } struct scanlookup { const uint8_t *mac; int esslen; const uint8_t *essid; const struct ieee80211_scan_entry *se; }; /* * Match mac address and any ssid. 
*/ static void mlmelookup(void *arg, const struct ieee80211_scan_entry *se) { struct scanlookup *look = arg; if (!IEEE80211_ADDR_EQ(look->mac, se->se_macaddr)) return; if (look->esslen != 0) { if (se->se_ssid[1] != look->esslen) return; if (memcmp(look->essid, se->se_ssid+2, look->esslen)) return; } look->se = se; } -static __noinline int +static int setmlme_assoc_sta(struct ieee80211vap *vap, const uint8_t mac[IEEE80211_ADDR_LEN], int ssid_len, const uint8_t ssid[IEEE80211_NWID_LEN]) { struct scanlookup lookup; KASSERT(vap->iv_opmode == IEEE80211_M_STA, ("expected opmode STA not %s", ieee80211_opmode_name[vap->iv_opmode])); /* NB: this is racey if roaming is !manual */ lookup.se = NULL; lookup.mac = mac; lookup.esslen = ssid_len; lookup.essid = ssid; ieee80211_scan_iterate(vap, mlmelookup, &lookup); if (lookup.se == NULL) return ENOENT; mlmedebug(vap, mac, IEEE80211_MLME_ASSOC, 0); if (!ieee80211_sta_join(vap, lookup.se->se_chan, lookup.se)) return EIO; /* XXX unique but could be better */ return 0; } -static __noinline int +static int setmlme_assoc_adhoc(struct ieee80211vap *vap, const uint8_t mac[IEEE80211_ADDR_LEN], int ssid_len, const uint8_t ssid[IEEE80211_NWID_LEN]) { - struct ieee80211_scan_req sr; + struct ieee80211_scan_req *sr; + int error; KASSERT(vap->iv_opmode == IEEE80211_M_IBSS || vap->iv_opmode == IEEE80211_M_AHDEMO, ("expected opmode IBSS or AHDEMO not %s", ieee80211_opmode_name[vap->iv_opmode])); if (ssid_len == 0) return EINVAL; + sr = IEEE80211_MALLOC(sizeof(*sr), M_TEMP, + IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); + if (sr == NULL) + return ENOMEM; + /* NB: IEEE80211_IOC_SSID call missing for ap_scan=2. */ memset(vap->iv_des_ssid[0].ssid, 0, IEEE80211_NWID_LEN); vap->iv_des_ssid[0].len = ssid_len; memcpy(vap->iv_des_ssid[0].ssid, ssid, ssid_len); vap->iv_des_nssid = 1; - memset(&sr, 0, sizeof(sr)); - sr.sr_flags = IEEE80211_IOC_SCAN_ACTIVE | IEEE80211_IOC_SCAN_ONCE; - sr.sr_duration = IEEE80211_IOC_SCAN_FOREVER; - memcpy(sr.sr_ssid[0].ssid, ssid, ssid_len); - sr.sr_ssid[0].len = ssid_len; - sr.sr_nssid = 1; + sr->sr_flags = IEEE80211_IOC_SCAN_ACTIVE | IEEE80211_IOC_SCAN_ONCE; + sr->sr_duration = IEEE80211_IOC_SCAN_FOREVER; + memcpy(sr->sr_ssid[0].ssid, ssid, ssid_len); + sr->sr_ssid[0].len = ssid_len; + sr->sr_nssid = 1; - return ieee80211_scanreq(vap, &sr); + error = ieee80211_scanreq(vap, sr); + + IEEE80211_FREE(sr, M_TEMP); + return error; } -static __noinline int +static int ieee80211_ioctl_setmlme(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211req_mlme mlme; int error; if (ireq->i_len != sizeof(mlme)) return EINVAL; error = copyin(ireq->i_data, &mlme, sizeof(mlme)); if (error) return error; if (vap->iv_opmode == IEEE80211_M_STA && mlme.im_op == IEEE80211_MLME_ASSOC) return setmlme_assoc_sta(vap, mlme.im_macaddr, vap->iv_des_ssid[0].len, vap->iv_des_ssid[0].ssid); else if ((vap->iv_opmode == IEEE80211_M_IBSS || vap->iv_opmode == IEEE80211_M_AHDEMO) && mlme.im_op == IEEE80211_MLME_ASSOC) return setmlme_assoc_adhoc(vap, mlme.im_macaddr, mlme.im_ssid_len, mlme.im_ssid); else return setmlme_common(vap, mlme.im_op, mlme.im_macaddr, mlme.im_reason); } -static __noinline int +static int ieee80211_ioctl_macmac(struct ieee80211vap *vap, struct ieee80211req *ireq) { uint8_t mac[IEEE80211_ADDR_LEN]; const struct ieee80211_aclator *acl = vap->iv_acl; int error; if (ireq->i_len != sizeof(mac)) return EINVAL; error = copyin(ireq->i_data, mac, ireq->i_len); if (error) return error; if (acl == NULL) { acl = ieee80211_aclator_get("mac"); if (acl == NULL || 
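/*
 * The conversion above (and the matching one in
 * ieee80211_ioctl_scanreq() below) moves struct ieee80211_scan_req
 * off the kernel stack: with room for IEEE80211_SCAN_MAX_SSID
 * embedded SSIDs the request is bulky, and the ioctl path can already
 * be deep by the time a driver gets involved.  IEEE80211_M_NOWAIT
 * matches the other allocations in this file -- see the
 * "XXX M_WAITOK after driver lock released" notes above -- so callers
 * must tolerate ENOMEM.
 */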
!acl->iac_attach(vap)) return EINVAL; vap->iv_acl = acl; } if (ireq->i_type == IEEE80211_IOC_ADDMAC) acl->iac_add(vap, mac); else acl->iac_remove(vap, mac); return 0; } -static __noinline int +static int ieee80211_ioctl_setmaccmd(struct ieee80211vap *vap, struct ieee80211req *ireq) { const struct ieee80211_aclator *acl = vap->iv_acl; switch (ireq->i_val) { case IEEE80211_MACCMD_POLICY_OPEN: case IEEE80211_MACCMD_POLICY_ALLOW: case IEEE80211_MACCMD_POLICY_DENY: case IEEE80211_MACCMD_POLICY_RADIUS: if (acl == NULL) { acl = ieee80211_aclator_get("mac"); if (acl == NULL || !acl->iac_attach(vap)) return EINVAL; vap->iv_acl = acl; } acl->iac_setpolicy(vap, ireq->i_val); break; case IEEE80211_MACCMD_FLUSH: if (acl != NULL) acl->iac_flush(vap); /* NB: silently ignore when not in use */ break; case IEEE80211_MACCMD_DETACH: if (acl != NULL) { vap->iv_acl = NULL; acl->iac_detach(vap); } break; default: if (acl == NULL) return EINVAL; else return acl->iac_setioctl(vap, ireq); } return 0; } -static __noinline int +static int ieee80211_ioctl_setchanlist(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; uint8_t *chanlist, *list; int i, nchan, maxchan, error; if (ireq->i_len > sizeof(ic->ic_chan_active)) ireq->i_len = sizeof(ic->ic_chan_active); list = IEEE80211_MALLOC(ireq->i_len + IEEE80211_CHAN_BYTES, M_TEMP, IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); if (list == NULL) return ENOMEM; error = copyin(ireq->i_data, list, ireq->i_len); if (error) { IEEE80211_FREE(list, M_TEMP); return error; } nchan = 0; chanlist = list + ireq->i_len; /* NB: zero'd already */ maxchan = ireq->i_len * NBBY; for (i = 0; i < ic->ic_nchans; i++) { const struct ieee80211_channel *c = &ic->ic_channels[i]; /* * Calculate the intersection of the user list and the * available channels so users can do things like specify * 1-255 to get all available channels. */ if (c->ic_ieee < maxchan && isset(list, c->ic_ieee)) { setbit(chanlist, c->ic_ieee); nchan++; } } if (nchan == 0) { IEEE80211_FREE(list, M_TEMP); return EINVAL; } if (ic->ic_bsschan != IEEE80211_CHAN_ANYC && /* XXX */ isclr(chanlist, ic->ic_bsschan->ic_ieee)) ic->ic_bsschan = IEEE80211_CHAN_ANYC; memcpy(ic->ic_chan_active, chanlist, IEEE80211_CHAN_BYTES); ieee80211_scan_flush(vap); IEEE80211_FREE(list, M_TEMP); return ENETRESET; } -static __noinline int +static int ieee80211_ioctl_setstastats(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211_node *ni; uint8_t macaddr[IEEE80211_ADDR_LEN]; int error; /* * NB: we could copyin ieee80211req_sta_stats so apps * could make selective changes but that's overkill; * just clear all stats for now. */ if (ireq->i_len < IEEE80211_ADDR_LEN) return EINVAL; error = copyin(ireq->i_data, macaddr, IEEE80211_ADDR_LEN); if (error != 0) return error; ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, macaddr); if (ni == NULL) return ENOENT; /* XXX require ni_vap == vap? 
*/ memset(&ni->ni_stats, 0, sizeof(ni->ni_stats)); ieee80211_free_node(ni); return 0; } -static __noinline int +static int ieee80211_ioctl_setstatxpow(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211_node *ni; struct ieee80211req_sta_txpow txpow; int error; if (ireq->i_len != sizeof(txpow)) return EINVAL; error = copyin(ireq->i_data, &txpow, sizeof(txpow)); if (error != 0) return error; ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, txpow.it_macaddr); if (ni == NULL) return ENOENT; ni->ni_txpower = txpow.it_txpow; ieee80211_free_node(ni); return error; } -static __noinline int +static int ieee80211_ioctl_setwmeparam(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_wme_state *wme = &ic->ic_wme; struct wmeParams *wmep, *chanp; int isbss, ac, aggrmode; if ((ic->ic_caps & IEEE80211_C_WME) == 0) return EOPNOTSUPP; isbss = (ireq->i_len & IEEE80211_WMEPARAM_BSS); ac = (ireq->i_len & IEEE80211_WMEPARAM_VAL); aggrmode = (wme->wme_flags & WME_F_AGGRMODE); if (ac >= WME_NUM_AC) ac = WME_AC_BE; if (isbss) { chanp = &wme->wme_bssChanParams.cap_wmeParams[ac]; wmep = &wme->wme_wmeBssChanParams.cap_wmeParams[ac]; } else { chanp = &wme->wme_chanParams.cap_wmeParams[ac]; wmep = &wme->wme_wmeChanParams.cap_wmeParams[ac]; } switch (ireq->i_type) { case IEEE80211_IOC_WME_CWMIN: /* WME: CWmin */ wmep->wmep_logcwmin = ireq->i_val; if (!isbss || !aggrmode) chanp->wmep_logcwmin = ireq->i_val; break; case IEEE80211_IOC_WME_CWMAX: /* WME: CWmax */ wmep->wmep_logcwmax = ireq->i_val; if (!isbss || !aggrmode) chanp->wmep_logcwmax = ireq->i_val; break; case IEEE80211_IOC_WME_AIFS: /* WME: AIFS */ wmep->wmep_aifsn = ireq->i_val; if (!isbss || !aggrmode) chanp->wmep_aifsn = ireq->i_val; break; case IEEE80211_IOC_WME_TXOPLIMIT: /* WME: txops limit */ wmep->wmep_txopLimit = ireq->i_val; if (!isbss || !aggrmode) chanp->wmep_txopLimit = ireq->i_val; break; case IEEE80211_IOC_WME_ACM: /* WME: ACM (bss only) */ wmep->wmep_acm = ireq->i_val; if (!aggrmode) chanp->wmep_acm = ireq->i_val; break; case IEEE80211_IOC_WME_ACKPOLICY: /* WME: ACK policy (!bss only)*/ wmep->wmep_noackPolicy = chanp->wmep_noackPolicy = (ireq->i_val) == 0; break; } ieee80211_wme_updateparams(vap); return 0; } static int find11gchannel(struct ieee80211com *ic, int start, int freq) { const struct ieee80211_channel *c; int i; for (i = start+1; i < ic->ic_nchans; i++) { c = &ic->ic_channels[i]; if (c->ic_freq == freq && IEEE80211_IS_CHAN_ANYG(c)) return 1; } /* NB: should not be needed but in case things are mis-sorted */ for (i = 0; i < start; i++) { c = &ic->ic_channels[i]; if (c->ic_freq == freq && IEEE80211_IS_CHAN_ANYG(c)) return 1; } return 0; } static struct ieee80211_channel * findchannel(struct ieee80211com *ic, int ieee, int mode) { static const u_int chanflags[IEEE80211_MODE_MAX] = { [IEEE80211_MODE_AUTO] = 0, [IEEE80211_MODE_11A] = IEEE80211_CHAN_A, [IEEE80211_MODE_11B] = IEEE80211_CHAN_B, [IEEE80211_MODE_11G] = IEEE80211_CHAN_G, [IEEE80211_MODE_FH] = IEEE80211_CHAN_FHSS, [IEEE80211_MODE_TURBO_A] = IEEE80211_CHAN_108A, [IEEE80211_MODE_TURBO_G] = IEEE80211_CHAN_108G, [IEEE80211_MODE_STURBO_A] = IEEE80211_CHAN_STURBO, [IEEE80211_MODE_HALF] = IEEE80211_CHAN_HALF, [IEEE80211_MODE_QUARTER] = IEEE80211_CHAN_QUARTER, /* NB: handled specially below */ [IEEE80211_MODE_11NA] = IEEE80211_CHAN_A, [IEEE80211_MODE_11NG] = IEEE80211_CHAN_G, }; u_int modeflags; int i; modeflags = chanflags[mode]; for (i = 0; i < ic->ic_nchans; i++) { struct ieee80211_channel *c = 
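/*
 * For the WME ioctls handled above, i_len is not a buffer length: it
 * encodes the access category in IEEE80211_WMEPARAM_VAL plus an
 * optional IEEE80211_WMEPARAM_BSS flag selecting the advertised (BSS)
 * parameter set, while the new value travels in i_val.  A hedged
 * sketch of a caller lowering CWmin for the video queue (the ioctl
 * plumbing is illustrative only):
 *
 *	struct ieee80211req ireq;
 *	memset(&ireq, 0, sizeof(ireq));
 *	ireq.i_type = IEEE80211_IOC_WME_CWMIN;
 *	ireq.i_len = WME_AC_VI;		// operating parameters, not BSS
 *	ireq.i_val = 3;			// log2 of the new CWmin
 *	// then issued via SIOCS80211 with i_name set to the vap's ifnet
 */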
&ic->ic_channels[i]; if (c->ic_ieee != ieee) continue; if (mode == IEEE80211_MODE_AUTO) { /* ignore turbo channels for autoselect */ if (IEEE80211_IS_CHAN_TURBO(c)) continue; /* * XXX special-case 11b/g channels so we * always select the g channel if both * are present. * XXX prefer HT to non-HT? */ if (!IEEE80211_IS_CHAN_B(c) || !find11gchannel(ic, i, c->ic_freq)) return c; } else { /* must check HT specially */ if ((mode == IEEE80211_MODE_11NA || mode == IEEE80211_MODE_11NG) && !IEEE80211_IS_CHAN_HT(c)) continue; if ((c->ic_flags & modeflags) == modeflags) return c; } } return NULL; } /* * Check the specified channel against any desired mode (aka netband). * This is only used (presently) when operating in hostap mode * to enforce consistency. */ static int check_mode_consistency(const struct ieee80211_channel *c, int mode) { KASSERT(c != IEEE80211_CHAN_ANYC, ("oops, no channel")); switch (mode) { case IEEE80211_MODE_11B: return (IEEE80211_IS_CHAN_B(c)); case IEEE80211_MODE_11G: return (IEEE80211_IS_CHAN_ANYG(c) && !IEEE80211_IS_CHAN_HT(c)); case IEEE80211_MODE_11A: return (IEEE80211_IS_CHAN_A(c) && !IEEE80211_IS_CHAN_HT(c)); case IEEE80211_MODE_STURBO_A: return (IEEE80211_IS_CHAN_STURBO(c)); case IEEE80211_MODE_11NA: return (IEEE80211_IS_CHAN_HTA(c)); case IEEE80211_MODE_11NG: return (IEEE80211_IS_CHAN_HTG(c)); } return 1; } /* * Common code to set the current channel. If the device * is up and running this may result in an immediate channel * change or a kick of the state machine. */ static int setcurchan(struct ieee80211vap *vap, struct ieee80211_channel *c) { struct ieee80211com *ic = vap->iv_ic; int error; if (c != IEEE80211_CHAN_ANYC) { if (IEEE80211_IS_CHAN_RADAR(c)) return EBUSY; /* XXX better code? */ if (vap->iv_opmode == IEEE80211_M_HOSTAP) { if (IEEE80211_IS_CHAN_NOHOSTAP(c)) return EINVAL; if (!check_mode_consistency(c, vap->iv_des_mode)) return EINVAL; } else if (vap->iv_opmode == IEEE80211_M_IBSS) { if (IEEE80211_IS_CHAN_NOADHOC(c)) return EINVAL; } if ((vap->iv_state == IEEE80211_S_RUN || vap->iv_state == IEEE80211_S_SLEEP) && vap->iv_bss->ni_chan == c) return 0; /* NB: nothing to do */ } vap->iv_des_chan = c; error = 0; if (vap->iv_opmode == IEEE80211_M_MONITOR && vap->iv_des_chan != IEEE80211_CHAN_ANYC) { /* * Monitor mode can switch directly. */ if (IFNET_IS_UP_RUNNING(vap->iv_ifp)) { /* XXX need state machine for other vap's to follow */ ieee80211_setcurchan(ic, vap->iv_des_chan); vap->iv_bss->ni_chan = ic->ic_curchan; } else ic->ic_curchan = vap->iv_des_chan; ic->ic_rt = ieee80211_get_ratetable(ic->ic_curchan); } else { /* * Need to go through the state machine in case we * need to reassociate or the like. The state machine * will pick up the desired channel and avoid scanning. */ if (IS_UP_AUTO(vap)) ieee80211_new_state(vap, IEEE80211_S_SCAN, 0); else if (vap->iv_des_chan != IEEE80211_CHAN_ANYC) { /* * When not up+running and a real channel has * been specified fix the current channel so * there is immediate feedback; e.g. via ifconfig. */ ic->ic_curchan = vap->iv_des_chan; ic->ic_rt = ieee80211_get_ratetable(ic->ic_curchan); } } return error; } /* * Old api for setting the current channel; this is * deprecated because channel numbers are ambiguous.
*/ -static __noinline int +static int ieee80211_ioctl_setchannel(struct ieee80211vap *vap, const struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_channel *c; /* XXX 0xffff overflows 16-bit signed */ if (ireq->i_val == 0 || ireq->i_val == (int16_t) IEEE80211_CHAN_ANY) { c = IEEE80211_CHAN_ANYC; } else { struct ieee80211_channel *c2; c = findchannel(ic, ireq->i_val, vap->iv_des_mode); if (c == NULL) { c = findchannel(ic, ireq->i_val, IEEE80211_MODE_AUTO); if (c == NULL) return EINVAL; } /* * Fine-tune channel selection based on desired mode: * if 11b is requested, find the 11b version of any * 11g channel returned, * if static turbo, find the turbo version of any * 11a channel returned, * if 11na is requested, find the ht version of any * 11a channel returned, * if 11ng is requested, find the ht version of any * 11g channel returned, * otherwise we should be ok with what we've got. */ switch (vap->iv_des_mode) { case IEEE80211_MODE_11B: if (IEEE80211_IS_CHAN_ANYG(c)) { c2 = findchannel(ic, ireq->i_val, IEEE80211_MODE_11B); /* NB: should not happen, =>'s 11g w/o 11b */ if (c2 != NULL) c = c2; } break; case IEEE80211_MODE_TURBO_A: if (IEEE80211_IS_CHAN_A(c)) { c2 = findchannel(ic, ireq->i_val, IEEE80211_MODE_TURBO_A); if (c2 != NULL) c = c2; } break; case IEEE80211_MODE_11NA: if (IEEE80211_IS_CHAN_A(c)) { c2 = findchannel(ic, ireq->i_val, IEEE80211_MODE_11NA); if (c2 != NULL) c = c2; } break; case IEEE80211_MODE_11NG: if (IEEE80211_IS_CHAN_ANYG(c)) { c2 = findchannel(ic, ireq->i_val, IEEE80211_MODE_11NG); if (c2 != NULL) c = c2; } break; default: /* NB: no static turboG */ break; } } return setcurchan(vap, c); } /* * New/current api for setting the current channel; a complete * channel description is provided so there is no ambiguity in * identifying the channel.
*/ -static __noinline int +static int ieee80211_ioctl_setcurchan(struct ieee80211vap *vap, const struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_channel chan, *c; int error; if (ireq->i_len != sizeof(chan)) return EINVAL; error = copyin(ireq->i_data, &chan, sizeof(chan)); if (error != 0) return error; /* XXX 0xffff overflows 16-bit signed */ if (chan.ic_freq == 0 || chan.ic_freq == IEEE80211_CHAN_ANY) { c = IEEE80211_CHAN_ANYC; } else { c = ieee80211_find_channel(ic, chan.ic_freq, chan.ic_flags); if (c == NULL) return EINVAL; } return setcurchan(vap, c); } -static __noinline int +static int ieee80211_ioctl_setregdomain(struct ieee80211vap *vap, const struct ieee80211req *ireq) { struct ieee80211_regdomain_req *reg; int nchans, error; nchans = 1 + ((ireq->i_len - sizeof(struct ieee80211_regdomain_req)) / sizeof(struct ieee80211_channel)); if (!(1 <= nchans && nchans <= IEEE80211_CHAN_MAX)) { IEEE80211_DPRINTF(vap, IEEE80211_MSG_IOCTL, "%s: bad # chans, i_len %d nchans %d\n", __func__, ireq->i_len, nchans); return EINVAL; } reg = (struct ieee80211_regdomain_req *) IEEE80211_MALLOC(IEEE80211_REGDOMAIN_SIZE(nchans), M_TEMP, IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); if (reg == NULL) { IEEE80211_DPRINTF(vap, IEEE80211_MSG_IOCTL, "%s: no memory, nchans %d\n", __func__, nchans); return ENOMEM; } error = copyin(ireq->i_data, reg, IEEE80211_REGDOMAIN_SIZE(nchans)); if (error == 0) { /* NB: validate inline channel count against storage size */ if (reg->chaninfo.ic_nchans != nchans) { IEEE80211_DPRINTF(vap, IEEE80211_MSG_IOCTL, "%s: chan cnt mismatch, %d != %d\n", __func__, reg->chaninfo.ic_nchans, nchans); error = EINVAL; } else error = ieee80211_setregdomain(vap, reg); } IEEE80211_FREE(reg, M_TEMP); return (error == 0 ? ENETRESET : error); } static int ieee80211_ioctl_setroam(struct ieee80211vap *vap, const struct ieee80211req *ireq) { if (ireq->i_len != sizeof(vap->iv_roamparms)) return EINVAL; /* XXX validate params */ /* XXX? ENETRESET to push to device? */ return copyin(ireq->i_data, vap->iv_roamparms, sizeof(vap->iv_roamparms)); } static int checkrate(const struct ieee80211_rateset *rs, int rate) { int i; if (rate == IEEE80211_FIXED_RATE_NONE) return 1; for (i = 0; i < rs->rs_nrates; i++) if ((rs->rs_rates[i] & IEEE80211_RATE_VAL) == rate) return 1; return 0; } static int checkmcs(int mcs) { if (mcs == IEEE80211_FIXED_RATE_NONE) return 1; if ((mcs & IEEE80211_RATE_MCS) == 0) /* MCS always have 0x80 set */ return 0; return (mcs & 0x7f) <= 15; /* XXX could search ht rate set */ } -static __noinline int +static int ieee80211_ioctl_settxparams(struct ieee80211vap *vap, const struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_txparams_req parms; /* XXX stack use? 
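 * -- the request embeds one ieee80211_txparam per PHY mode, so it
 * is far smaller than the ieee80211_scan_req moved off the stack
 * elsewhere in this change; presumably why it was left in place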
*/ struct ieee80211_txparam *src, *dst; const struct ieee80211_rateset *rs; int error, mode, changed, is11n, nmodes; /* NB: accept short requests for backwards compat */ if (ireq->i_len > sizeof(parms)) return EINVAL; error = copyin(ireq->i_data, &parms, ireq->i_len); if (error != 0) return error; nmodes = ireq->i_len / sizeof(struct ieee80211_txparam); changed = 0; /* validate parameters and check if anything changed */ for (mode = IEEE80211_MODE_11A; mode < nmodes; mode++) { if (isclr(ic->ic_modecaps, mode)) continue; src = &parms.params[mode]; dst = &vap->iv_txparms[mode]; rs = &ic->ic_sup_rates[mode]; /* NB: 11n maps to legacy */ is11n = (mode == IEEE80211_MODE_11NA || mode == IEEE80211_MODE_11NG); if (src->ucastrate != dst->ucastrate) { if (!checkrate(rs, src->ucastrate) && (!is11n || !checkmcs(src->ucastrate))) return EINVAL; changed++; } if (src->mcastrate != dst->mcastrate) { if (!checkrate(rs, src->mcastrate) && (!is11n || !checkmcs(src->mcastrate))) return EINVAL; changed++; } if (src->mgmtrate != dst->mgmtrate) { if (!checkrate(rs, src->mgmtrate) && (!is11n || !checkmcs(src->mgmtrate))) return EINVAL; changed++; } if (src->maxretry != dst->maxretry) /* NB: no bounds */ changed++; } if (changed) { /* * Copy new parameters in place and notify the * driver so it can push state to the device. */ for (mode = IEEE80211_MODE_11A; mode < nmodes; mode++) { if (isset(ic->ic_modecaps, mode)) vap->iv_txparms[mode] = parms.params[mode]; } /* XXX could be more intelligent, e.g. don't reset if setting not being used */ return ENETRESET; } return 0; } /* * Application Information Element support. */ static int setappie(struct ieee80211_appie **aie, const struct ieee80211req *ireq) { struct ieee80211_appie *app = *aie; struct ieee80211_appie *napp; int error; if (ireq->i_len == 0) { /* delete any existing ie */ if (app != NULL) { *aie = NULL; /* XXX racey */ IEEE80211_FREE(app, M_80211_NODE_IE); } return 0; } if (!(2 <= ireq->i_len && ireq->i_len <= IEEE80211_MAX_APPIE)) return EINVAL; /* * Allocate a new appie structure and copy in the user data. * When done swap in the new structure. Note that we do not * guard against users holding a ref to the old structure; * this must be handled outside this code. 
* * XXX bad bad bad */ napp = (struct ieee80211_appie *) IEEE80211_MALLOC( sizeof(struct ieee80211_appie) + ireq->i_len, M_80211_NODE_IE, IEEE80211_M_NOWAIT); if (napp == NULL) return ENOMEM; /* XXX holding ic lock */ error = copyin(ireq->i_data, napp->ie_data, ireq->i_len); if (error) { IEEE80211_FREE(napp, M_80211_NODE_IE); return error; } napp->ie_len = ireq->i_len; *aie = napp; if (app != NULL) IEEE80211_FREE(app, M_80211_NODE_IE); return 0; } static void setwparsnie(struct ieee80211vap *vap, uint8_t *ie, int space) { /* validate data is present as best we can */ if (space == 0 || 2+ie[1] > space) return; if (ie[0] == IEEE80211_ELEMID_VENDOR) vap->iv_wpa_ie = ie; else if (ie[0] == IEEE80211_ELEMID_RSN) vap->iv_rsn_ie = ie; } -static __noinline int +static int ieee80211_ioctl_setappie_locked(struct ieee80211vap *vap, const struct ieee80211req *ireq, int fc0) { int error; IEEE80211_LOCK_ASSERT(vap->iv_ic); switch (fc0 & IEEE80211_FC0_SUBTYPE_MASK) { case IEEE80211_FC0_SUBTYPE_BEACON: if (vap->iv_opmode != IEEE80211_M_HOSTAP && vap->iv_opmode != IEEE80211_M_IBSS) { error = EINVAL; break; } error = setappie(&vap->iv_appie_beacon, ireq); if (error == 0) ieee80211_beacon_notify(vap, IEEE80211_BEACON_APPIE); break; case IEEE80211_FC0_SUBTYPE_PROBE_RESP: error = setappie(&vap->iv_appie_proberesp, ireq); break; case IEEE80211_FC0_SUBTYPE_ASSOC_RESP: if (vap->iv_opmode == IEEE80211_M_HOSTAP) error = setappie(&vap->iv_appie_assocresp, ireq); else error = EINVAL; break; case IEEE80211_FC0_SUBTYPE_PROBE_REQ: error = setappie(&vap->iv_appie_probereq, ireq); break; case IEEE80211_FC0_SUBTYPE_ASSOC_REQ: if (vap->iv_opmode == IEEE80211_M_STA) error = setappie(&vap->iv_appie_assocreq, ireq); else error = EINVAL; break; case (IEEE80211_APPIE_WPA & IEEE80211_FC0_SUBTYPE_MASK): error = setappie(&vap->iv_appie_wpa, ireq); if (error == 0) { /* * Must split single blob of data into separate * WPA and RSN ie's because they go in different * locations in the mgt frames. * XXX use IEEE80211_IOC_WPA2 so user code does split */ vap->iv_wpa_ie = NULL; vap->iv_rsn_ie = NULL; if (vap->iv_appie_wpa != NULL) { struct ieee80211_appie *appie = vap->iv_appie_wpa; uint8_t *data = appie->ie_data; /* XXX ie length validate is painful, cheat */ setwparsnie(vap, data, appie->ie_len); setwparsnie(vap, data + 2 + data[1], appie->ie_len - (2 + data[1])); } if (vap->iv_opmode == IEEE80211_M_HOSTAP || vap->iv_opmode == IEEE80211_M_IBSS) { /* * Must rebuild beacon frame as the update * mechanism doesn't handle WPA/RSN ie's. * Could extend it but it doesn't normally * change; this is just to deal with hostapd * plumbing the ie after the interface is up. 
*/ error = ENETRESET; } } break; default: error = EINVAL; break; } return error; } -static __noinline int +static int ieee80211_ioctl_setappie(struct ieee80211vap *vap, const struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; int error; uint8_t fc0; fc0 = ireq->i_val & 0xff; if ((fc0 & IEEE80211_FC0_TYPE_MASK) != IEEE80211_FC0_TYPE_MGT) return EINVAL; /* NB: could check iv_opmode and reject but hardly worth the effort */ IEEE80211_LOCK(ic); error = ieee80211_ioctl_setappie_locked(vap, ireq, fc0); IEEE80211_UNLOCK(ic); return error; } -static __noinline int +static int ieee80211_ioctl_chanswitch(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; struct ieee80211_chanswitch_req csr; struct ieee80211_channel *c; int error; if (ireq->i_len != sizeof(csr)) return EINVAL; error = copyin(ireq->i_data, &csr, sizeof(csr)); if (error != 0) return error; /* XXX adhoc mode not supported */ if (vap->iv_opmode != IEEE80211_M_HOSTAP || (vap->iv_flags & IEEE80211_F_DOTH) == 0) return EOPNOTSUPP; c = ieee80211_find_channel(ic, csr.csa_chan.ic_freq, csr.csa_chan.ic_flags); if (c == NULL) return ENOENT; IEEE80211_LOCK(ic); if ((ic->ic_flags & IEEE80211_F_CSAPENDING) == 0) ieee80211_csa_startswitch(ic, c, csr.csa_mode, csr.csa_count); else if (csr.csa_count == 0) ieee80211_csa_cancelswitch(ic); else error = EBUSY; IEEE80211_UNLOCK(ic); return error; } static int ieee80211_scanreq(struct ieee80211vap *vap, struct ieee80211_scan_req *sr) { #define IEEE80211_IOC_SCAN_FLAGS \ (IEEE80211_IOC_SCAN_NOPICK | IEEE80211_IOC_SCAN_ACTIVE | \ IEEE80211_IOC_SCAN_PICK1ST | IEEE80211_IOC_SCAN_BGSCAN | \ IEEE80211_IOC_SCAN_ONCE | IEEE80211_IOC_SCAN_NOBCAST | \ IEEE80211_IOC_SCAN_NOJOIN | IEEE80211_IOC_SCAN_FLUSH | \ IEEE80211_IOC_SCAN_CHECK) struct ieee80211com *ic = vap->iv_ic; int error, i; /* convert duration */ if (sr->sr_duration == IEEE80211_IOC_SCAN_FOREVER) sr->sr_duration = IEEE80211_SCAN_FOREVER; else { if (sr->sr_duration < IEEE80211_IOC_SCAN_DURATION_MIN || sr->sr_duration > IEEE80211_IOC_SCAN_DURATION_MAX) return EINVAL; sr->sr_duration = msecs_to_ticks(sr->sr_duration); if (sr->sr_duration < 1) sr->sr_duration = 1; } /* convert min/max channel dwell */ if (sr->sr_mindwell != 0) { sr->sr_mindwell = msecs_to_ticks(sr->sr_mindwell); if (sr->sr_mindwell < 1) sr->sr_mindwell = 1; } if (sr->sr_maxdwell != 0) { sr->sr_maxdwell = msecs_to_ticks(sr->sr_maxdwell); if (sr->sr_maxdwell < 1) sr->sr_maxdwell = 1; } /* NB: silently reduce ssid count to what is supported */ if (sr->sr_nssid > IEEE80211_SCAN_MAX_SSID) sr->sr_nssid = IEEE80211_SCAN_MAX_SSID; for (i = 0; i < sr->sr_nssid; i++) if (sr->sr_ssid[i].len > IEEE80211_NWID_LEN) return EINVAL; /* cleanse flags just in case, could reject if invalid flags */ sr->sr_flags &= IEEE80211_IOC_SCAN_FLAGS; /* * Add an implicit NOPICK if the vap is not marked UP. This * allows applications to scan without joining a bss (or picking * a channel and setting up a bss) and without forcing manual * roaming mode--you just need to mark the parent device UP. */ if ((vap->iv_ifp->if_flags & IFF_UP) == 0) sr->sr_flags |= IEEE80211_IOC_SCAN_NOPICK; IEEE80211_DPRINTF(vap, IEEE80211_MSG_SCAN, "%s: flags 0x%x%s duration 0x%x mindwell %u maxdwell %u nssid %d\n", __func__, sr->sr_flags, (vap->iv_ifp->if_flags & IFF_UP) == 0 ? 
" (!IFF_UP)" : "", sr->sr_duration, sr->sr_mindwell, sr->sr_maxdwell, sr->sr_nssid); /* * If we are in INIT state then the driver has never had a chance * to setup hardware state to do a scan; we must use the state * machine to get us up to the SCAN state but once we reach SCAN * state we then want to use the supplied params. Stash the * parameters in the vap and mark IEEE80211_FEXT_SCANREQ; the * state machines will recognize this and use the stashed params * to issue the scan request. * * Otherwise just invoke the scan machinery directly. */ IEEE80211_LOCK(ic); if (vap->iv_state == IEEE80211_S_INIT) { /* NB: clobbers previous settings */ vap->iv_scanreq_flags = sr->sr_flags; vap->iv_scanreq_duration = sr->sr_duration; vap->iv_scanreq_nssid = sr->sr_nssid; for (i = 0; i < sr->sr_nssid; i++) { vap->iv_scanreq_ssid[i].len = sr->sr_ssid[i].len; memcpy(vap->iv_scanreq_ssid[i].ssid, sr->sr_ssid[i].ssid, sr->sr_ssid[i].len); } vap->iv_flags_ext |= IEEE80211_FEXT_SCANREQ; IEEE80211_UNLOCK(ic); ieee80211_new_state(vap, IEEE80211_S_SCAN, 0); } else { vap->iv_flags_ext &= ~IEEE80211_FEXT_SCANREQ; IEEE80211_UNLOCK(ic); if (sr->sr_flags & IEEE80211_IOC_SCAN_CHECK) { error = ieee80211_check_scan(vap, sr->sr_flags, sr->sr_duration, sr->sr_mindwell, sr->sr_maxdwell, sr->sr_nssid, /* NB: cheat, we assume structures are compatible */ (const struct ieee80211_scan_ssid *) &sr->sr_ssid[0]); } else { error = ieee80211_start_scan(vap, sr->sr_flags, sr->sr_duration, sr->sr_mindwell, sr->sr_maxdwell, sr->sr_nssid, /* NB: cheat, we assume structures are compatible */ (const struct ieee80211_scan_ssid *) &sr->sr_ssid[0]); } if (error == 0) return EINPROGRESS; } return 0; #undef IEEE80211_IOC_SCAN_FLAGS } -static __noinline int +static int ieee80211_ioctl_scanreq(struct ieee80211vap *vap, struct ieee80211req *ireq) { - struct ieee80211_scan_req sr; /* XXX off stack? */ + struct ieee80211_scan_req *sr; int error; - if (ireq->i_len != sizeof(sr)) + if (ireq->i_len != sizeof(*sr)) return EINVAL; - error = copyin(ireq->i_data, &sr, sizeof(sr)); + sr = IEEE80211_MALLOC(sizeof(*sr), M_TEMP, + IEEE80211_M_NOWAIT | IEEE80211_M_ZERO); + if (sr == NULL) + return ENOMEM; + error = copyin(ireq->i_data, sr, sizeof(*sr)); if (error != 0) - return error; - return ieee80211_scanreq(vap, &sr); + goto bad; + error = ieee80211_scanreq(vap, sr); +bad: + IEEE80211_FREE(sr, M_TEMP); + return error; } -static __noinline int +static int ieee80211_ioctl_setstavlan(struct ieee80211vap *vap, struct ieee80211req *ireq) { struct ieee80211_node *ni; struct ieee80211req_sta_vlan vlan; int error; if (ireq->i_len != sizeof(vlan)) return EINVAL; error = copyin(ireq->i_data, &vlan, sizeof(vlan)); if (error != 0) return error; if (!IEEE80211_ADDR_EQ(vlan.sv_macaddr, zerobssid)) { ni = ieee80211_find_vap_node(&vap->iv_ic->ic_sta, vap, vlan.sv_macaddr); if (ni == NULL) return ENOENT; } else ni = ieee80211_ref_node(vap->iv_bss); ni->ni_vlan = vlan.sv_vlan; ieee80211_free_node(ni); return error; } static int isvap11g(const struct ieee80211vap *vap) { const struct ieee80211_node *bss = vap->iv_bss; return bss->ni_chan != IEEE80211_CHAN_ANYC && IEEE80211_IS_CHAN_ANYG(bss->ni_chan); } static int isvapht(const struct ieee80211vap *vap) { const struct ieee80211_node *bss = vap->iv_bss; return bss->ni_chan != IEEE80211_CHAN_ANYC && IEEE80211_IS_CHAN_HT(bss->ni_chan); } /* * Dummy ioctl set handler so the linker set is defined. 
*/ static int dummy_ioctl_set(struct ieee80211vap *vap, struct ieee80211req *ireq) { return ENOSYS; } IEEE80211_IOCTL_SET(dummy, dummy_ioctl_set); static int ieee80211_ioctl_setdefault(struct ieee80211vap *vap, struct ieee80211req *ireq) { ieee80211_ioctl_setfunc * const *set; int error; SET_FOREACH(set, ieee80211_ioctl_setset) { error = (*set)(vap, ireq); if (error != ENOSYS) return error; } return EINVAL; } -static __noinline int +static int ieee80211_ioctl_set80211(struct ieee80211vap *vap, u_long cmd, struct ieee80211req *ireq) { struct ieee80211com *ic = vap->iv_ic; int error; const struct ieee80211_authenticator *auth; uint8_t tmpkey[IEEE80211_KEYBUF_SIZE]; char tmpssid[IEEE80211_NWID_LEN]; uint8_t tmpbssid[IEEE80211_ADDR_LEN]; struct ieee80211_key *k; u_int kid; uint32_t flags; error = 0; switch (ireq->i_type) { case IEEE80211_IOC_SSID: if (ireq->i_val != 0 || ireq->i_len > IEEE80211_NWID_LEN) return EINVAL; error = copyin(ireq->i_data, tmpssid, ireq->i_len); if (error) break; memset(vap->iv_des_ssid[0].ssid, 0, IEEE80211_NWID_LEN); vap->iv_des_ssid[0].len = ireq->i_len; memcpy(vap->iv_des_ssid[0].ssid, tmpssid, ireq->i_len); vap->iv_des_nssid = (ireq->i_len > 0); error = ENETRESET; break; case IEEE80211_IOC_WEP: switch (ireq->i_val) { case IEEE80211_WEP_OFF: vap->iv_flags &= ~IEEE80211_F_PRIVACY; vap->iv_flags &= ~IEEE80211_F_DROPUNENC; break; case IEEE80211_WEP_ON: vap->iv_flags |= IEEE80211_F_PRIVACY; vap->iv_flags |= IEEE80211_F_DROPUNENC; break; case IEEE80211_WEP_MIXED: vap->iv_flags |= IEEE80211_F_PRIVACY; vap->iv_flags &= ~IEEE80211_F_DROPUNENC; break; } error = ENETRESET; break; case IEEE80211_IOC_WEPKEY: kid = (u_int) ireq->i_val; if (kid >= IEEE80211_WEP_NKID) return EINVAL; k = &vap->iv_nw_keys[kid]; if (ireq->i_len == 0) { /* zero-len =>'s delete any existing key */ (void) ieee80211_crypto_delkey(vap, k); break; } if (ireq->i_len > sizeof(tmpkey)) return EINVAL; memset(tmpkey, 0, sizeof(tmpkey)); error = copyin(ireq->i_data, tmpkey, ireq->i_len); if (error) break; ieee80211_key_update_begin(vap); k->wk_keyix = kid; /* NB: force fixed key id */ if (ieee80211_crypto_newkey(vap, IEEE80211_CIPHER_WEP, IEEE80211_KEY_XMIT | IEEE80211_KEY_RECV, k)) { k->wk_keylen = ireq->i_len; memcpy(k->wk_key, tmpkey, sizeof(tmpkey)); IEEE80211_ADDR_COPY(k->wk_macaddr, vap->iv_myaddr); if (!ieee80211_crypto_setkey(vap, k)) error = EINVAL; } else error = EINVAL; ieee80211_key_update_end(vap); break; case IEEE80211_IOC_WEPTXKEY: kid = (u_int) ireq->i_val; if (kid >= IEEE80211_WEP_NKID && (uint16_t) kid != IEEE80211_KEYIX_NONE) return EINVAL; vap->iv_def_txkey = kid; break; case IEEE80211_IOC_AUTHMODE: switch (ireq->i_val) { case IEEE80211_AUTH_WPA: case IEEE80211_AUTH_8021X: /* 802.1x */ case IEEE80211_AUTH_OPEN: /* open */ case IEEE80211_AUTH_SHARED: /* shared-key */ case IEEE80211_AUTH_AUTO: /* auto */ auth = ieee80211_authenticator_get(ireq->i_val); if (auth == NULL) return EINVAL; break; default: return EINVAL; } switch (ireq->i_val) { case IEEE80211_AUTH_WPA: /* WPA w/ 802.1x */ vap->iv_flags |= IEEE80211_F_PRIVACY; ireq->i_val = IEEE80211_AUTH_8021X; break; case IEEE80211_AUTH_OPEN: /* open */ vap->iv_flags &= ~(IEEE80211_F_WPA|IEEE80211_F_PRIVACY); break; case IEEE80211_AUTH_SHARED: /* shared-key */ case IEEE80211_AUTH_8021X: /* 802.1x */ vap->iv_flags &= ~IEEE80211_F_WPA; /* both require a key so mark the PRIVACY capability */ vap->iv_flags |= IEEE80211_F_PRIVACY; break; case IEEE80211_AUTH_AUTO: /* auto */ vap->iv_flags &= ~IEEE80211_F_WPA; /* XXX PRIVACY handling? 
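 *
 * Editor's sketch: every key change in the WEPKEY case above is
 * bracketed by ieee80211_key_update_begin()/_end() so the driver can
 * quiesce traffic around the update.  A minimal sketch reusing only
 * the crypto calls already present in this file, deleting all WEP
 * key slots:
 */
static void
example_delete_wep_keys(struct ieee80211vap *vap)
{
	u_int kid;

	ieee80211_key_update_begin(vap);
	for (kid = 0; kid < IEEE80211_WEP_NKID; kid++)
		(void) ieee80211_crypto_delkey(vap, &vap->iv_nw_keys[kid]);
	ieee80211_key_update_end(vap);
}
/*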
*/ /* XXX what's the right way to do this? */ break; } /* NB: authenticator attach/detach happens on state change */ vap->iv_bss->ni_authmode = ireq->i_val; /* XXX mixed/mode/usage? */ vap->iv_auth = auth; error = ENETRESET; break; case IEEE80211_IOC_CHANNEL: error = ieee80211_ioctl_setchannel(vap, ireq); break; case IEEE80211_IOC_POWERSAVE: switch (ireq->i_val) { case IEEE80211_POWERSAVE_OFF: if (vap->iv_flags & IEEE80211_F_PMGTON) { ieee80211_syncflag(vap, -IEEE80211_F_PMGTON); error = ERESTART; } break; case IEEE80211_POWERSAVE_ON: if ((vap->iv_caps & IEEE80211_C_PMGT) == 0) error = EOPNOTSUPP; else if ((vap->iv_flags & IEEE80211_F_PMGTON) == 0) { ieee80211_syncflag(vap, IEEE80211_F_PMGTON); error = ERESTART; } break; default: error = EINVAL; break; } break; case IEEE80211_IOC_POWERSAVESLEEP: if (ireq->i_val < 0) return EINVAL; ic->ic_lintval = ireq->i_val; error = ERESTART; break; case IEEE80211_IOC_RTSTHRESHOLD: if (!(IEEE80211_RTS_MIN <= ireq->i_val && ireq->i_val <= IEEE80211_RTS_MAX)) return EINVAL; vap->iv_rtsthreshold = ireq->i_val; error = ERESTART; break; case IEEE80211_IOC_PROTMODE: if (ireq->i_val > IEEE80211_PROT_RTSCTS) return EINVAL; ic->ic_protmode = (enum ieee80211_protmode)ireq->i_val; /* NB: if not operating in 11g this can wait */ if (ic->ic_bsschan != IEEE80211_CHAN_ANYC && IEEE80211_IS_CHAN_ANYG(ic->ic_bsschan)) error = ERESTART; break; case IEEE80211_IOC_TXPOWER: if ((ic->ic_caps & IEEE80211_C_TXPMGT) == 0) return EOPNOTSUPP; if (!(IEEE80211_TXPOWER_MIN <= ireq->i_val && ireq->i_val <= IEEE80211_TXPOWER_MAX)) return EINVAL; ic->ic_txpowlimit = ireq->i_val; error = ERESTART; break; case IEEE80211_IOC_ROAMING: if (!(IEEE80211_ROAMING_DEVICE <= ireq->i_val && ireq->i_val <= IEEE80211_ROAMING_MANUAL)) return EINVAL; vap->iv_roaming = (enum ieee80211_roamingmode)ireq->i_val; /* XXXX reset? */ break; case IEEE80211_IOC_PRIVACY: if (ireq->i_val) { /* XXX check for key state? */ vap->iv_flags |= IEEE80211_F_PRIVACY; } else vap->iv_flags &= ~IEEE80211_F_PRIVACY; /* XXX ERESTART? */ break; case IEEE80211_IOC_DROPUNENCRYPTED: if (ireq->i_val) vap->iv_flags |= IEEE80211_F_DROPUNENC; else vap->iv_flags &= ~IEEE80211_F_DROPUNENC; /* XXX ERESTART? */ break; case IEEE80211_IOC_WPAKEY: error = ieee80211_ioctl_setkey(vap, ireq); break; case IEEE80211_IOC_DELKEY: error = ieee80211_ioctl_delkey(vap, ireq); break; case IEEE80211_IOC_MLME: error = ieee80211_ioctl_setmlme(vap, ireq); break; case IEEE80211_IOC_COUNTERMEASURES: if (ireq->i_val) { if ((vap->iv_flags & IEEE80211_F_WPA) == 0) return EOPNOTSUPP; vap->iv_flags |= IEEE80211_F_COUNTERM; } else vap->iv_flags &= ~IEEE80211_F_COUNTERM; /* XXX ERESTART? 
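 *
 * Editor's note: ieee80211_syncflag() and ieee80211_syncflag_ht(),
 * used throughout this switch, encode set-vs-clear in the sign of the
 * flag argument: a positive value sets the bit, a negative value (as
 * in -IEEE80211_F_PMGTON in the POWERSAVE case above) clears it.  A
 * minimal sketch of a hypothetical toggle helper built on that
 * convention:
 */
static void
example_toggle_wme(struct ieee80211vap *vap, int enable)
{
	/* the sign of the argument selects set vs. clear */
	ieee80211_syncflag(vap, enable ? IEEE80211_F_WME : -IEEE80211_F_WME);
}
/*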
*/ break; case IEEE80211_IOC_WPA: if (ireq->i_val > 3) return EINVAL; /* XXX verify ciphers available */ flags = vap->iv_flags & ~IEEE80211_F_WPA; switch (ireq->i_val) { case 0: /* wpa_supplicant calls this to clear the WPA config */ break; case 1: if (!(vap->iv_caps & IEEE80211_C_WPA1)) return EOPNOTSUPP; flags |= IEEE80211_F_WPA1; break; case 2: if (!(vap->iv_caps & IEEE80211_C_WPA2)) return EOPNOTSUPP; flags |= IEEE80211_F_WPA2; break; case 3: if ((vap->iv_caps & IEEE80211_C_WPA) != IEEE80211_C_WPA) return EOPNOTSUPP; flags |= IEEE80211_F_WPA1 | IEEE80211_F_WPA2; break; default: /* Can't set any -> error */ return EOPNOTSUPP; } vap->iv_flags = flags; error = ERESTART; /* NB: can change beacon frame */ break; case IEEE80211_IOC_WME: if (ireq->i_val) { if ((vap->iv_caps & IEEE80211_C_WME) == 0) return EOPNOTSUPP; ieee80211_syncflag(vap, IEEE80211_F_WME); } else ieee80211_syncflag(vap, -IEEE80211_F_WME); error = ERESTART; /* NB: can change beacon frame */ break; case IEEE80211_IOC_HIDESSID: if (ireq->i_val) vap->iv_flags |= IEEE80211_F_HIDESSID; else vap->iv_flags &= ~IEEE80211_F_HIDESSID; error = ERESTART; /* XXX ENETRESET? */ break; case IEEE80211_IOC_APBRIDGE: if (ireq->i_val == 0) vap->iv_flags |= IEEE80211_F_NOBRIDGE; else vap->iv_flags &= ~IEEE80211_F_NOBRIDGE; break; case IEEE80211_IOC_BSSID: if (ireq->i_len != sizeof(tmpbssid)) return EINVAL; error = copyin(ireq->i_data, tmpbssid, ireq->i_len); if (error) break; IEEE80211_ADDR_COPY(vap->iv_des_bssid, tmpbssid); if (IEEE80211_ADDR_EQ(vap->iv_des_bssid, zerobssid)) vap->iv_flags &= ~IEEE80211_F_DESBSSID; else vap->iv_flags |= IEEE80211_F_DESBSSID; error = ENETRESET; break; case IEEE80211_IOC_CHANLIST: error = ieee80211_ioctl_setchanlist(vap, ireq); break; #define OLD_IEEE80211_IOC_SCAN_REQ 23 #ifdef OLD_IEEE80211_IOC_SCAN_REQ case OLD_IEEE80211_IOC_SCAN_REQ: IEEE80211_DPRINTF(vap, IEEE80211_MSG_SCAN, "%s: active scan request\n", __func__); /* * If we are in INIT state then the driver has never * had a chance to setup hardware state to do a scan; * use the state machine to get us up to the SCAN state. * Otherwise just invoke the scan machinery to start * a one-time scan.
*/ if (vap->iv_state == IEEE80211_S_INIT) ieee80211_new_state(vap, IEEE80211_S_SCAN, 0); else (void) ieee80211_start_scan(vap, IEEE80211_SCAN_ACTIVE | IEEE80211_SCAN_NOPICK | IEEE80211_SCAN_ONCE, IEEE80211_SCAN_FOREVER, 0, 0, /* XXX use ioctl params */ vap->iv_des_nssid, vap->iv_des_ssid); break; #endif /* OLD_IEEE80211_IOC_SCAN_REQ */ case IEEE80211_IOC_SCAN_REQ: error = ieee80211_ioctl_scanreq(vap, ireq); break; case IEEE80211_IOC_SCAN_CANCEL: IEEE80211_DPRINTF(vap, IEEE80211_MSG_SCAN, "%s: cancel scan\n", __func__); ieee80211_cancel_scan(vap); break; case IEEE80211_IOC_HTCONF: if (ireq->i_val & 1) ieee80211_syncflag_ht(vap, IEEE80211_FHT_HT); else ieee80211_syncflag_ht(vap, -IEEE80211_FHT_HT); if (ireq->i_val & 2) ieee80211_syncflag_ht(vap, IEEE80211_FHT_USEHT40); else ieee80211_syncflag_ht(vap, -IEEE80211_FHT_USEHT40); error = ENETRESET; break; case IEEE80211_IOC_ADDMAC: case IEEE80211_IOC_DELMAC: error = ieee80211_ioctl_macmac(vap, ireq); break; case IEEE80211_IOC_MACCMD: error = ieee80211_ioctl_setmaccmd(vap, ireq); break; case IEEE80211_IOC_STA_STATS: error = ieee80211_ioctl_setstastats(vap, ireq); break; case IEEE80211_IOC_STA_TXPOW: error = ieee80211_ioctl_setstatxpow(vap, ireq); break; case IEEE80211_IOC_WME_CWMIN: /* WME: CWmin */ case IEEE80211_IOC_WME_CWMAX: /* WME: CWmax */ case IEEE80211_IOC_WME_AIFS: /* WME: AIFS */ case IEEE80211_IOC_WME_TXOPLIMIT: /* WME: txops limit */ case IEEE80211_IOC_WME_ACM: /* WME: ACM (bss only) */ case IEEE80211_IOC_WME_ACKPOLICY: /* WME: ACK policy (!bss only) */ error = ieee80211_ioctl_setwmeparam(vap, ireq); break; case IEEE80211_IOC_DTIM_PERIOD: if (vap->iv_opmode != IEEE80211_M_HOSTAP && vap->iv_opmode != IEEE80211_M_MBSS && vap->iv_opmode != IEEE80211_M_IBSS) return EINVAL; if (IEEE80211_DTIM_MIN <= ireq->i_val && ireq->i_val <= IEEE80211_DTIM_MAX) { vap->iv_dtim_period = ireq->i_val; error = ENETRESET; /* requires restart */ } else error = EINVAL; break; case IEEE80211_IOC_BEACON_INTERVAL: if (vap->iv_opmode != IEEE80211_M_HOSTAP && vap->iv_opmode != IEEE80211_M_MBSS && vap->iv_opmode != IEEE80211_M_IBSS) return EINVAL; if (IEEE80211_BINTVAL_MIN <= ireq->i_val && ireq->i_val <= IEEE80211_BINTVAL_MAX) { ic->ic_bintval = ireq->i_val; error = ENETRESET; /* requires restart */ } else error = EINVAL; break; case IEEE80211_IOC_PUREG: if (ireq->i_val) vap->iv_flags |= IEEE80211_F_PUREG; else vap->iv_flags &= ~IEEE80211_F_PUREG; /* NB: reset only if we're operating on an 11g channel */ if (isvap11g(vap)) error = ENETRESET; break; case IEEE80211_IOC_QUIET: vap->iv_quiet= ireq->i_val; break; case IEEE80211_IOC_QUIET_COUNT: vap->iv_quiet_count=ireq->i_val; break; case IEEE80211_IOC_QUIET_PERIOD: vap->iv_quiet_period=ireq->i_val; break; case IEEE80211_IOC_QUIET_OFFSET: vap->iv_quiet_offset=ireq->i_val; break; case IEEE80211_IOC_QUIET_DUR: if(ireq->i_val < vap->iv_bss->ni_intval) vap->iv_quiet_duration = ireq->i_val; else error = EINVAL; break; case IEEE80211_IOC_BGSCAN: if (ireq->i_val) { if ((vap->iv_caps & IEEE80211_C_BGSCAN) == 0) return EOPNOTSUPP; vap->iv_flags |= IEEE80211_F_BGSCAN; } else vap->iv_flags &= ~IEEE80211_F_BGSCAN; break; case IEEE80211_IOC_BGSCAN_IDLE: if (ireq->i_val >= IEEE80211_BGSCAN_IDLE_MIN) vap->iv_bgscanidle = ireq->i_val*hz/1000; else error = EINVAL; break; case IEEE80211_IOC_BGSCAN_INTERVAL: if (ireq->i_val >= IEEE80211_BGSCAN_INTVAL_MIN) vap->iv_bgscanintvl = ireq->i_val*hz; else error = EINVAL; break; case IEEE80211_IOC_SCANVALID: if (ireq->i_val >= IEEE80211_SCAN_VALID_MIN) vap->iv_scanvalid = ireq->i_val*hz; else error 
= EINVAL; break; case IEEE80211_IOC_FRAGTHRESHOLD: if ((vap->iv_caps & IEEE80211_C_TXFRAG) == 0 && ireq->i_val != IEEE80211_FRAG_MAX) return EOPNOTSUPP; if (!(IEEE80211_FRAG_MIN <= ireq->i_val && ireq->i_val <= IEEE80211_FRAG_MAX)) return EINVAL; vap->iv_fragthreshold = ireq->i_val; error = ERESTART; break; case IEEE80211_IOC_BURST: if (ireq->i_val) { if ((vap->iv_caps & IEEE80211_C_BURST) == 0) return EOPNOTSUPP; ieee80211_syncflag(vap, IEEE80211_F_BURST); } else ieee80211_syncflag(vap, -IEEE80211_F_BURST); error = ERESTART; break; case IEEE80211_IOC_BMISSTHRESHOLD: if (!(IEEE80211_HWBMISS_MIN <= ireq->i_val && ireq->i_val <= IEEE80211_HWBMISS_MAX)) return EINVAL; vap->iv_bmissthreshold = ireq->i_val; error = ERESTART; break; case IEEE80211_IOC_CURCHAN: error = ieee80211_ioctl_setcurchan(vap, ireq); break; case IEEE80211_IOC_SHORTGI: if (ireq->i_val) { #define IEEE80211_HTCAP_SHORTGI \ (IEEE80211_HTCAP_SHORTGI20 | IEEE80211_HTCAP_SHORTGI40) if (((ireq->i_val ^ vap->iv_htcaps) & IEEE80211_HTCAP_SHORTGI) != 0) return EINVAL; if (ireq->i_val & IEEE80211_HTCAP_SHORTGI20) vap->iv_flags_ht |= IEEE80211_FHT_SHORTGI20; if (ireq->i_val & IEEE80211_HTCAP_SHORTGI40) vap->iv_flags_ht |= IEEE80211_FHT_SHORTGI40; #undef IEEE80211_HTCAP_SHORTGI } else vap->iv_flags_ht &= ~(IEEE80211_FHT_SHORTGI20 | IEEE80211_FHT_SHORTGI40); error = ERESTART; break; case IEEE80211_IOC_AMPDU: if (ireq->i_val && (vap->iv_htcaps & IEEE80211_HTC_AMPDU) == 0) return EINVAL; if (ireq->i_val & 1) vap->iv_flags_ht |= IEEE80211_FHT_AMPDU_TX; else vap->iv_flags_ht &= ~IEEE80211_FHT_AMPDU_TX; if (ireq->i_val & 2) vap->iv_flags_ht |= IEEE80211_FHT_AMPDU_RX; else vap->iv_flags_ht &= ~IEEE80211_FHT_AMPDU_RX; /* NB: reset only if we're operating on an 11n channel */ if (isvapht(vap)) error = ERESTART; break; case IEEE80211_IOC_AMPDU_LIMIT: if (!(IEEE80211_HTCAP_MAXRXAMPDU_8K <= ireq->i_val && ireq->i_val <= IEEE80211_HTCAP_MAXRXAMPDU_64K)) return EINVAL; if (vap->iv_opmode == IEEE80211_M_HOSTAP) vap->iv_ampdu_rxmax = ireq->i_val; else vap->iv_ampdu_limit = ireq->i_val; error = ERESTART; break; case IEEE80211_IOC_AMPDU_DENSITY: if (!(IEEE80211_HTCAP_MPDUDENSITY_NA <= ireq->i_val && ireq->i_val <= IEEE80211_HTCAP_MPDUDENSITY_16)) return EINVAL; vap->iv_ampdu_density = ireq->i_val; error = ERESTART; break; case IEEE80211_IOC_AMSDU: if (ireq->i_val && (vap->iv_htcaps & IEEE80211_HTC_AMSDU) == 0) return EINVAL; if (ireq->i_val & 1) vap->iv_flags_ht |= IEEE80211_FHT_AMSDU_TX; else vap->iv_flags_ht &= ~IEEE80211_FHT_AMSDU_TX; if (ireq->i_val & 2) vap->iv_flags_ht |= IEEE80211_FHT_AMSDU_RX; else vap->iv_flags_ht &= ~IEEE80211_FHT_AMSDU_RX; /* NB: reset only if we're operating on an 11n channel */ if (isvapht(vap)) error = ERESTART; break; case IEEE80211_IOC_AMSDU_LIMIT: /* XXX validate */ vap->iv_amsdu_limit = ireq->i_val; /* XXX truncation? 
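 *
 * Editor's note: for IEEE80211_IOC_AMPDU and IEEE80211_IOC_AMSDU above,
 * i_val is a two-bit mask: bit 0 enables the transmit side, bit 1 the
 * receive side.  A hypothetical userland caller (the socket "s" and
 * interface name are assumptions) could enable both directions with:
 */
#if 0	/* userland sketch, not part of this change */
	struct ieee80211req ireq;

	memset(&ireq, 0, sizeof(ireq));
	strlcpy(ireq.i_name, "wlan0", sizeof(ireq.i_name));
	ireq.i_type = IEEE80211_IOC_AMPDU;
	ireq.i_val = 1 | 2;	/* bit 0: A-MPDU TX, bit 1: A-MPDU RX */
	if (ioctl(s, SIOCS80211, &ireq) < 0)
		err(1, "SIOCS80211");
#endif
/*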
*/ break; case IEEE80211_IOC_PUREN: if (ireq->i_val) { if ((vap->iv_flags_ht & IEEE80211_FHT_HT) == 0) return EINVAL; vap->iv_flags_ht |= IEEE80211_FHT_PUREN; } else vap->iv_flags_ht &= ~IEEE80211_FHT_PUREN; /* NB: reset only if we're operating on an 11n channel */ if (isvapht(vap)) error = ERESTART; break; case IEEE80211_IOC_DOTH: if (ireq->i_val) { #if 0 /* XXX no capability */ if ((vap->iv_caps & IEEE80211_C_DOTH) == 0) return EOPNOTSUPP; #endif vap->iv_flags |= IEEE80211_F_DOTH; } else vap->iv_flags &= ~IEEE80211_F_DOTH; error = ENETRESET; break; case IEEE80211_IOC_REGDOMAIN: error = ieee80211_ioctl_setregdomain(vap, ireq); break; case IEEE80211_IOC_ROAM: error = ieee80211_ioctl_setroam(vap, ireq); break; case IEEE80211_IOC_TXPARAMS: error = ieee80211_ioctl_settxparams(vap, ireq); break; case IEEE80211_IOC_HTCOMPAT: if (ireq->i_val) { if ((vap->iv_flags_ht & IEEE80211_FHT_HT) == 0) return EOPNOTSUPP; vap->iv_flags_ht |= IEEE80211_FHT_HTCOMPAT; } else vap->iv_flags_ht &= ~IEEE80211_FHT_HTCOMPAT; /* NB: reset only if we're operating on an 11n channel */ if (isvapht(vap)) error = ERESTART; break; case IEEE80211_IOC_DWDS: if (ireq->i_val) { /* NB: DWDS only makes sense for WDS-capable devices */ if ((ic->ic_caps & IEEE80211_C_WDS) == 0) return EOPNOTSUPP; /* NB: DWDS is used only with ap+sta vaps */ if (vap->iv_opmode != IEEE80211_M_HOSTAP && vap->iv_opmode != IEEE80211_M_STA) return EINVAL; vap->iv_flags |= IEEE80211_F_DWDS; if (vap->iv_opmode == IEEE80211_M_STA) vap->iv_flags_ext |= IEEE80211_FEXT_4ADDR; } else { vap->iv_flags &= ~IEEE80211_F_DWDS; if (vap->iv_opmode == IEEE80211_M_STA) vap->iv_flags_ext &= ~IEEE80211_FEXT_4ADDR; } break; case IEEE80211_IOC_INACTIVITY: if (ireq->i_val) vap->iv_flags_ext |= IEEE80211_FEXT_INACT; else vap->iv_flags_ext &= ~IEEE80211_FEXT_INACT; break; case IEEE80211_IOC_APPIE: error = ieee80211_ioctl_setappie(vap, ireq); break; case IEEE80211_IOC_WPS: if (ireq->i_val) { if ((vap->iv_caps & IEEE80211_C_WPA) == 0) return EOPNOTSUPP; vap->iv_flags_ext |= IEEE80211_FEXT_WPS; } else vap->iv_flags_ext &= ~IEEE80211_FEXT_WPS; break; case IEEE80211_IOC_TSN: if (ireq->i_val) { if ((vap->iv_caps & IEEE80211_C_WPA) == 0) return EOPNOTSUPP; vap->iv_flags_ext |= IEEE80211_FEXT_TSN; } else vap->iv_flags_ext &= ~IEEE80211_FEXT_TSN; break; case IEEE80211_IOC_CHANSWITCH: error = ieee80211_ioctl_chanswitch(vap, ireq); break; case IEEE80211_IOC_DFS: if (ireq->i_val) { if ((vap->iv_caps & IEEE80211_C_DFS) == 0) return EOPNOTSUPP; /* NB: DFS requires 11h support */ if ((vap->iv_flags & IEEE80211_F_DOTH) == 0) return EINVAL; vap->iv_flags_ext |= IEEE80211_FEXT_DFS; } else vap->iv_flags_ext &= ~IEEE80211_FEXT_DFS; break; case IEEE80211_IOC_DOTD: if (ireq->i_val) vap->iv_flags_ext |= IEEE80211_FEXT_DOTD; else vap->iv_flags_ext &= ~IEEE80211_FEXT_DOTD; if (vap->iv_opmode == IEEE80211_M_STA) error = ENETRESET; break; case IEEE80211_IOC_HTPROTMODE: if (ireq->i_val > IEEE80211_PROT_RTSCTS) return EINVAL; ic->ic_htprotmode = ireq->i_val ? 
IEEE80211_PROT_RTSCTS : IEEE80211_PROT_NONE; /* NB: if not operating in 11n this can wait */ if (isvapht(vap)) error = ERESTART; break; case IEEE80211_IOC_STA_VLAN: error = ieee80211_ioctl_setstavlan(vap, ireq); break; case IEEE80211_IOC_SMPS: if ((ireq->i_val &~ IEEE80211_HTCAP_SMPS) != 0 || ireq->i_val == 0x0008) /* value of 2 is reserved */ return EINVAL; if (ireq->i_val != IEEE80211_HTCAP_SMPS_OFF && (vap->iv_htcaps & IEEE80211_HTC_SMPS) == 0) return EOPNOTSUPP; vap->iv_htcaps = (vap->iv_htcaps &~ IEEE80211_HTCAP_SMPS) | ireq->i_val; /* NB: if not operating in 11n this can wait */ if (isvapht(vap)) error = ERESTART; break; case IEEE80211_IOC_RIFS: if (ireq->i_val != 0) { if ((vap->iv_htcaps & IEEE80211_HTC_RIFS) == 0) return EOPNOTSUPP; vap->iv_flags_ht |= IEEE80211_FHT_RIFS; } else vap->iv_flags_ht &= ~IEEE80211_FHT_RIFS; /* NB: if not operating in 11n this can wait */ if (isvapht(vap)) error = ERESTART; break; default: error = ieee80211_ioctl_setdefault(vap, ireq); break; } /* * The convention is that ENETRESET means an operation * requires a complete re-initialization of the device (e.g. * changing something that affects the association state). * ERESTART means the request may be handled with only a * reload of the hardware state. We hand ERESTART requests * to the iv_reset callback so the driver can decide. If * a device does not fill in iv_reset then it defaults to one * that returns ENETRESET. Otherwise a driver may return * ENETRESET (in which case a full reset will be done) or * 0 to mean there's no need to do anything (e.g. when the * change has no effect on the driver/device). */ if (error == ERESTART) error = IFNET_IS_UP_RUNNING(vap->iv_ifp) ? vap->iv_reset(vap, ireq->i_type) : 0; if (error == ENETRESET) { /* XXX need to re-think AUTO handling */ if (IS_UP_AUTO(vap)) ieee80211_init(vap); error = 0; } return error; } int ieee80211_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct ieee80211vap *vap = ifp->if_softc; struct ieee80211com *ic = vap->iv_ic; int error = 0; struct ifreq *ifr; struct ifaddr *ifa; /* XXX */ switch (cmd) { case SIOCSIFFLAGS: IEEE80211_LOCK(ic); if ((ifp->if_flags ^ vap->iv_ifflags) & IFF_PROMISC) ieee80211_promisc(vap, ifp->if_flags & IFF_PROMISC); if ((ifp->if_flags ^ vap->iv_ifflags) & IFF_ALLMULTI) ieee80211_allmulti(vap, ifp->if_flags & IFF_ALLMULTI); vap->iv_ifflags = ifp->if_flags; if (ifp->if_flags & IFF_UP) { /* * Bring ourselves up unless we're already operational. * If we're the first vap and the parent is not up * then it will automatically be brought up as a * side-effect of bringing ourselves up. */ if (vap->iv_state == IEEE80211_S_INIT) ieee80211_start_locked(vap); } else if (ifp->if_drv_flags & IFF_DRV_RUNNING) { /* * Stop ourselves. If we are the last vap to be * marked down the parent will also be taken down.
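 *
 * Editor's sketch: per the ERESTART/ENETRESET convention documented
 * above ieee80211_ioctl_set80211(), a driver may supply iv_reset to
 * absorb ERESTART requests.  A hypothetical implementation (the
 * command argument is the ireq->i_type forwarded by the code above;
 * the u_long type mirrors that call and is an assumption):
 */
static int
example_vap_reset(struct ieee80211vap *vap, u_long cmd)
{
	switch (cmd) {
	case IEEE80211_IOC_RTSTHRESHOLD:
		/* push vap->iv_rtsthreshold to the hardware here */
		return 0;	/* handled; no re-init required */
	default:
		return ENETRESET;	/* escalate to a full re-init */
	}
}
/*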
*/ ieee80211_stop_locked(vap); } IEEE80211_UNLOCK(ic); /* Wait for parent ioctl handler if it was queued */ ieee80211_waitfor_parent(ic); break; case SIOCADDMULTI: case SIOCDELMULTI: ieee80211_runtask(ic, &ic->ic_mcast_task); break; case SIOCSIFMEDIA: case SIOCGIFMEDIA: ifr = (struct ifreq *)data; error = ifmedia_ioctl(ifp, ifr, &vap->iv_media, cmd); break; case SIOCG80211: error = ieee80211_ioctl_get80211(vap, cmd, (struct ieee80211req *) data); break; case SIOCS80211: error = priv_check(curthread, PRIV_NET80211_MANAGE); if (error == 0) error = ieee80211_ioctl_set80211(vap, cmd, (struct ieee80211req *) data); break; case SIOCG80211STATS: ifr = (struct ifreq *)data; copyout(&vap->iv_stats, ifr->ifr_data, sizeof (vap->iv_stats)); break; case SIOCSIFMTU: ifr = (struct ifreq *)data; if (!(IEEE80211_MTU_MIN <= ifr->ifr_mtu && ifr->ifr_mtu <= IEEE80211_MTU_MAX)) error = EINVAL; else ifp->if_mtu = ifr->ifr_mtu; break; case SIOCSIFADDR: /* * XXX Handle this directly so we can suppress if_init calls. * XXX This should be done in ether_ioctl but for the moment * XXX there are too many other parts of the system that * XXX set IFF_UP and so suppress if_init being called when * XXX it should be. */ ifa = (struct ifaddr *) data; switch (ifa->ifa_addr->sa_family) { #ifdef INET case AF_INET: if ((ifp->if_flags & IFF_UP) == 0) { ifp->if_flags |= IFF_UP; ifp->if_init(ifp->if_softc); } arp_ifinit(ifp, ifa); break; #endif default: if ((ifp->if_flags & IFF_UP) == 0) { ifp->if_flags |= IFF_UP; ifp->if_init(ifp->if_softc); } break; } break; default: /* * Pass unknown ioctls first to the driver, and if it * returns ENOTTY, then to the generic Ethernet handler. */ if (ic->ic_ioctl != NULL && (error = ic->ic_ioctl(ic, cmd, data)) != ENOTTY) break; error = ether_ioctl(ifp, cmd, data); break; } return (error); } Index: projects/clang380-import/sys/netinet/in_fib.c =================================================================== --- projects/clang380-import/sys/netinet/in_fib.c (revision 294776) +++ projects/clang380-import/sys/netinet/in_fib.c (revision 294777) @@ -1,232 +1,233 @@ /*- * Copyright (c) 2015 * Alexander V. Chernikov * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_route.h" #include "opt_mpath.h" #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #ifdef RADIX_MPATH #include #endif #include #include #include #ifdef INET static void fib4_rte_to_nh_basic(struct rtentry *rte, struct in_addr dst, uint32_t flags, struct nhop4_basic *pnh4); static void fib4_rte_to_nh_extended(struct rtentry *rte, struct in_addr dst, uint32_t flags, struct nhop4_extended *pnh4); #define RNTORT(p) ((struct rtentry *)(p)) static void fib4_rte_to_nh_basic(struct rtentry *rte, struct in_addr dst, uint32_t flags, struct nhop4_basic *pnh4) { struct sockaddr_in *gw; if ((flags & NHR_IFAIF) != 0) pnh4->nh_ifp = rte->rt_ifa->ifa_ifp; else pnh4->nh_ifp = rte->rt_ifp; pnh4->nh_mtu = min(rte->rt_mtu, rte->rt_ifp->if_mtu); if (rte->rt_flags & RTF_GATEWAY) { gw = (struct sockaddr_in *)rte->rt_gateway; pnh4->nh_addr = gw->sin_addr; } else pnh4->nh_addr = dst; /* Set flags */ pnh4->nh_flags = fib_rte_to_nh_flags(rte->rt_flags); gw = (struct sockaddr_in *)rt_key(rte); if (gw->sin_addr.s_addr == 0) pnh4->nh_flags |= NHF_DEFAULT; /* TODO: Handle RTF_BROADCAST here */ } static void fib4_rte_to_nh_extended(struct rtentry *rte, struct in_addr dst, uint32_t flags, struct nhop4_extended *pnh4) { struct sockaddr_in *gw; struct in_ifaddr *ia; if ((flags & NHR_IFAIF) != 0) pnh4->nh_ifp = rte->rt_ifa->ifa_ifp; else pnh4->nh_ifp = rte->rt_ifp; pnh4->nh_mtu = min(rte->rt_mtu, rte->rt_ifp->if_mtu); if (rte->rt_flags & RTF_GATEWAY) { gw = (struct sockaddr_in *)rte->rt_gateway; pnh4->nh_addr = gw->sin_addr; } else pnh4->nh_addr = dst; /* Set flags */ pnh4->nh_flags = fib_rte_to_nh_flags(rte->rt_flags); gw = (struct sockaddr_in *)rt_key(rte); if (gw->sin_addr.s_addr == 0) pnh4->nh_flags |= NHF_DEFAULT; /* XXX: Set RTF_BROADCAST if GW address is broadcast */ ia = ifatoia(rte->rt_ifa); pnh4->nh_src = IA_SIN(ia)->sin_addr; } /* * Performs IPv4 route table lookup on @dst. Returns 0 on success. * Stores nexthop info in the provided @pnh4 structure. * Note that * - nh_ifp cannot be safely dereferenced * - nh_ifp represents logical transmit interface (rt_ifp) (e.g. if * looking up an address on interface "ix0", a pointer to the "lo0" * interface will be returned instead of "ix0") * - nh_ifp represents "address" interface if NHR_IFAIF flag is passed * - however mtu from "transmit" interface will be returned.
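 *
 * Editor's sketch: a minimal consumer of the contract documented here,
 * fetching only the path MTU (the fib number and destination are the
 * caller's):
 */
static int
example_path_mtu(uint32_t fibnum, struct in_addr dst, uint32_t *mtu)
{
	struct nhop4_basic nh4;
	int error;

	error = fib4_lookup_nh_basic(fibnum, dst, 0, 0, &nh4);
	if (error != 0)
		return (error);
	/* NB: nh4.nh_ifp must not be dereferenced, per the note above. */
	*mtu = nh4.nh_mtu;
	return (0);
}
/*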
*/ int fib4_lookup_nh_basic(uint32_t fibnum, struct in_addr dst, uint32_t flags, uint32_t flowid, struct nhop4_basic *pnh4) { - struct radix_node_head *rh; + struct rib_head *rh; struct radix_node *rn; struct sockaddr_in sin; struct rtentry *rte; KASSERT((fibnum < rt_numfibs), ("fib4_lookup_nh_basic: bad fibnum")); rh = rt_tables_get_rnh(fibnum, AF_INET); if (rh == NULL) return (ENOENT); /* Prepare lookup key */ memset(&sin, 0, sizeof(sin)); sin.sin_len = sizeof(struct sockaddr_in); sin.sin_addr = dst; - RADIX_NODE_HEAD_RLOCK(rh); - rn = rh->rnh_matchaddr((void *)&sin, rh); + RIB_RLOCK(rh); + rn = rh->rnh_matchaddr((void *)&sin, &rh->head); if (rn != NULL && ((rn->rn_flags & RNF_ROOT) == 0)) { rte = RNTORT(rn); /* Ensure route & ifp is UP */ if (RT_LINK_IS_UP(rte->rt_ifp)) { fib4_rte_to_nh_basic(rte, dst, flags, pnh4); - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (0); } } - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (ENOENT); } /* * Performs IPv4 route table lookup on @dst. Returns 0 on success. * Stores extended nexthop info in the provided @pnh4 structure. * Note that * - nh_ifp cannot be safely dereferenced unless NHR_REF is specified. * - in that case you need to call fib4_free_nh_ext() * - nh_ifp represents logical transmit interface (rt_ifp) (e.g. if * looking up the address of interface "ix0", a pointer to the "lo0" * interface will be returned instead of "ix0") * - nh_ifp represents "address" interface if NHR_IFAIF flag is passed * - however mtu from "transmit" interface will be returned. */ int fib4_lookup_nh_ext(uint32_t fibnum, struct in_addr dst, uint32_t flags, uint32_t flowid, struct nhop4_extended *pnh4) { - struct radix_node_head *rh; + struct rib_head *rh; struct radix_node *rn; struct sockaddr_in sin; struct rtentry *rte; KASSERT((fibnum < rt_numfibs), ("fib4_lookup_nh_ext: bad fibnum")); rh = rt_tables_get_rnh(fibnum, AF_INET); if (rh == NULL) return (ENOENT); /* Prepare lookup key */ memset(&sin, 0, sizeof(sin)); sin.sin_len = sizeof(struct sockaddr_in); sin.sin_addr = dst; - RADIX_NODE_HEAD_RLOCK(rh); - rn = rh->rnh_matchaddr((void *)&sin, rh); + RIB_RLOCK(rh); + rn = rh->rnh_matchaddr((void *)&sin, &rh->head); if (rn != NULL && ((rn->rn_flags & RNF_ROOT) == 0)) { rte = RNTORT(rn); #ifdef RADIX_MPATH rte = rt_mpath_select(rte, flowid); if (rte == NULL) { - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (ENOENT); } #endif /* Ensure route & ifp is UP */ if (RT_LINK_IS_UP(rte->rt_ifp)) { fib4_rte_to_nh_extended(rte, dst, flags, pnh4); if ((flags & NHR_REF) != 0) { /* TODO: lwref on egress ifp's ? */ } - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (0); } } - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (ENOENT); } void fib4_free_nh_ext(uint32_t fibnum, struct nhop4_extended *pnh4) { } #endif Index: projects/clang380-import/sys/netinet/in_rmx.c =================================================================== --- projects/clang380-import/sys/netinet/in_rmx.c (revision 294776) +++ projects/clang380-import/sys/netinet/in_rmx.c (revision 294777) @@ -1,204 +1,204 @@ /*- * Copyright 1994, 1995 Massachusetts Institute of Technology * * Permission to use, copy, modify, and distribute this software and * its documentation for any purpose and without fee is hereby * granted, provided that both the above copyright notice and this * permission notice appear in all copies, that both the above * copyright notice and this permission notice appear in all * supporting documentation, and that the name of M.I.T.
not be used * in advertising or publicity pertaining to distribution of the * software without specific, written prior permission. M.I.T. makes * no representations about the suitability of this software for any * purpose. It is provided "as is" without express or implied * warranty. * * THIS SOFTWARE IS PROVIDED BY M.I.T. ``AS IS''. M.I.T. DISCLAIMS * ALL EXPRESS OR IMPLIED WARRANTIES WITH REGARD TO THIS SOFTWARE, * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT * SHALL M.I.T. BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include extern int in_inithead(void **head, int off); #ifdef VIMAGE extern int in_detachhead(void **head, int off); #endif /* * Do what we need to do when inserting a route. */ static struct radix_node * -in_addroute(void *v_arg, void *n_arg, struct radix_node_head *head, +in_addroute(void *v_arg, void *n_arg, struct radix_head *head, struct radix_node *treenodes) { struct rtentry *rt = (struct rtentry *)treenodes; struct sockaddr_in *sin = (struct sockaddr_in *)rt_key(rt); - RADIX_NODE_HEAD_WLOCK_ASSERT(head); /* * A little bit of help for both IP output and input: * For host routes, we make sure that RTF_BROADCAST * is set for anything that looks like a broadcast address. * This way, we can avoid an expensive call to in_broadcast() * in ip_output() most of the time (because the route passed * to ip_output() is almost always a host route). * * We also do the same for local addresses, with the thought * that this might one day be used to speed up ip_input(). * * We also mark routes to multicast addresses as such, because * it's easy to do and might be useful (but this is much more * dubious since it's so easy to inspect the address). */ if (rt->rt_flags & RTF_HOST) { if (in_broadcast(sin->sin_addr, rt->rt_ifp)) { rt->rt_flags |= RTF_BROADCAST; } else if (satosin(rt->rt_ifa->ifa_addr)->sin_addr.s_addr == sin->sin_addr.s_addr) { rt->rt_flags |= RTF_LOCAL; } } if (IN_MULTICAST(ntohl(sin->sin_addr.s_addr))) rt->rt_flags |= RTF_MULTICAST; if (rt->rt_ifp != NULL) { /* * Check route MTU: * inherit interface MTU if not set or * check if MTU is too large. */ if (rt->rt_mtu == 0) { rt->rt_mtu = rt->rt_ifp->if_mtu; } else if (rt->rt_mtu > rt->rt_ifp->if_mtu) rt->rt_mtu = rt->rt_ifp->if_mtu; } return (rn_addroute(v_arg, n_arg, head, treenodes)); } static int _in_rt_was_here; /* * Initialize our routing tree. 
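 *
 * Editor's sketch: in_inithead() below installs in_addroute() as the
 * table's rnh_addaddr hook; the general pattern is to wrap
 * rn_addroute() and post-process the new rtentry.  A minimal
 * hypothetical hook that only performs the MTU inheritance/clamp step
 * described above:
 */
static struct radix_node *
example_addroute(void *v_arg, void *n_arg, struct radix_head *head,
    struct radix_node *treenodes)
{
	struct rtentry *rt = (struct rtentry *)treenodes;

	/* inherit the interface MTU if unset, clamp if too large */
	if (rt->rt_ifp != NULL &&
	    (rt->rt_mtu == 0 || rt->rt_mtu > rt->rt_ifp->if_mtu))
		rt->rt_mtu = rt->rt_ifp->if_mtu;
	return (rn_addroute(v_arg, n_arg, head, treenodes));
}
/*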
*/ int in_inithead(void **head, int off) { - struct radix_node_head *rnh; + struct rib_head *rh; - if (!rn_inithead(head, 32)) - return 0; + rh = rt_table_init(32); + if (rh == NULL) + return (0); - rnh = *head; - RADIX_NODE_HEAD_LOCK_INIT(rnh); + rh->rnh_addaddr = in_addroute; + *head = (void *)rh; - rnh->rnh_addaddr = in_addroute; if (_in_rt_was_here == 0 ) { _in_rt_was_here = 1; } return 1; } #ifdef VIMAGE int in_detachhead(void **head, int off) { return (rn_detachhead(head)); } #endif /* * This zaps old routes when the interface goes down or interface * address is deleted. In the latter case, it deletes static routes * that point to this address. If we don't do this, we may end up * using the old address in the future. The ones we always want to * get rid of are things like ARP entries, since the user might down * the interface, walk over to a completely different network, and * plug back in. */ struct in_ifadown_arg { struct ifaddr *ifa; int del; }; static int in_ifadownkill(const struct rtentry *rt, void *xap) { struct in_ifadown_arg *ap = xap; if (rt->rt_ifa != ap->ifa) return (0); if ((rt->rt_flags & RTF_STATIC) != 0 && ap->del == 0) return (0); return (1); } void in_ifadown(struct ifaddr *ifa, int delete) { struct in_ifadown_arg arg; KASSERT(ifa->ifa_addr->sa_family == AF_INET, ("%s: wrong family", __func__)); arg.ifa = ifa; arg.del = delete; rt_foreach_fib_walk_del(AF_INET, in_ifadownkill, &arg); ifa->ifa_flags &= ~IFA_ROUTE; /* XXXlocking? */ } /* * inet versions of rt functions. These have fib extensions and * for now will just reference the _fib variants. * eventually this order will be reversed, */ void in_rtalloc_ign(struct route *ro, u_long ignflags, u_int fibnum) { rtalloc_ign_fib(ro, ignflags, fibnum); } void in_rtredirect(struct sockaddr *dst, struct sockaddr *gateway, struct sockaddr *netmask, int flags, struct sockaddr *src, u_int fibnum) { rtredirect_fib(dst, gateway, netmask, flags, src, fibnum); } Index: projects/clang380-import/sys/netinet/in_var.h =================================================================== --- projects/clang380-import/sys/netinet/in_var.h (revision 294776) +++ projects/clang380-import/sys/netinet/in_var.h (revision 294777) @@ -1,397 +1,396 @@ /*- * Copyright (c) 1985, 1986, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)in_var.h 8.2 (Berkeley) 1/9/95 * $FreeBSD$ */ #ifndef _NETINET_IN_VAR_H_ #define _NETINET_IN_VAR_H_ /* * Argument structure for SIOCAIFADDR. */ struct in_aliasreq { char ifra_name[IFNAMSIZ]; /* if name, e.g. "en0" */ struct sockaddr_in ifra_addr; struct sockaddr_in ifra_broadaddr; #define ifra_dstaddr ifra_broadaddr struct sockaddr_in ifra_mask; int ifra_vhid; }; #ifdef _KERNEL #include #include #include struct igmp_ifsoftc; struct in_multi; struct lltable; /* * IPv4 per-interface state. */ struct in_ifinfo { struct lltable *ii_llt; /* ARP state */ struct igmp_ifsoftc *ii_igmp; /* IGMP state */ struct in_multi *ii_allhosts; /* 224.0.0.1 membership */ }; /* * Interface address, Internet version. One of these structures * is allocated for each Internet address on an interface. * The ifaddr structure contains the protocol-independent part * of the structure and is assumed to be first. */ struct in_ifaddr { struct ifaddr ia_ifa; /* protocol-independent info */ #define ia_ifp ia_ifa.ifa_ifp #define ia_flags ia_ifa.ifa_flags /* ia_subnet{,mask} in host order */ u_long ia_subnet; /* subnet address */ u_long ia_subnetmask; /* mask of subnet */ LIST_ENTRY(in_ifaddr) ia_hash; /* entry in bucket of inet addresses */ TAILQ_ENTRY(in_ifaddr) ia_link; /* list of internet addresses */ struct sockaddr_in ia_addr; /* reserve space for interface name */ struct sockaddr_in ia_dstaddr; /* reserve space for broadcast addr */ #define ia_broadaddr ia_dstaddr struct sockaddr_in ia_sockmask; /* reserve space for general netmask */ }; /* * Given a pointer to an in_ifaddr (ifaddr), * return a pointer to the addr as a sockaddr_in. */ #define IA_SIN(ia) (&(((struct in_ifaddr *)(ia))->ia_addr)) #define IA_DSTSIN(ia) (&(((struct in_ifaddr *)(ia))->ia_dstaddr)) #define IA_MASKSIN(ia) (&(((struct in_ifaddr *)(ia))->ia_sockmask)) #define IN_LNAOF(in, ifa) \ ((ntohl((in).s_addr) & ~((struct in_ifaddr *)(ifa)->ia_subnetmask)) extern u_char inetctlerrmap[]; #define LLTABLE(ifp) \ ((struct in_ifinfo *)(ifp)->if_afdata[AF_INET])->ii_llt /* * Hash table for IP addresses. 
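 *
 * Editor's sketch: a minimal ownership test built on the hash and lock
 * macros defined below (no reference is taken, so the result is only a
 * hint once the lock is dropped; a real consumer would ifa_ref() the
 * entry while still locked):
 */
static __inline int
example_is_local_addr(struct in_addr addr)
{
	struct rm_priotracker tracker;
	struct in_ifaddr *ia;

	IN_IFADDR_RLOCK(&tracker);
	INADDR_TO_IFADDR(addr, ia);
	IN_IFADDR_RUNLOCK(&tracker);
	return (ia != NULL);
}
/*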
*/ TAILQ_HEAD(in_ifaddrhead, in_ifaddr); LIST_HEAD(in_ifaddrhashhead, in_ifaddr); VNET_DECLARE(struct in_ifaddrhashhead *, in_ifaddrhashtbl); VNET_DECLARE(struct in_ifaddrhead, in_ifaddrhead); VNET_DECLARE(u_long, in_ifaddrhmask); /* mask for hash table */ #define V_in_ifaddrhashtbl VNET(in_ifaddrhashtbl) #define V_in_ifaddrhead VNET(in_ifaddrhead) #define V_in_ifaddrhmask VNET(in_ifaddrhmask) #define INADDR_NHASH_LOG2 9 #define INADDR_NHASH (1 << INADDR_NHASH_LOG2) #define INADDR_HASHVAL(x) fnv_32_buf((&(x)), sizeof(x), FNV1_32_INIT) #define INADDR_HASH(x) \ (&V_in_ifaddrhashtbl[INADDR_HASHVAL(x) & V_in_ifaddrhmask]) extern struct rmlock in_ifaddr_lock; #define IN_IFADDR_LOCK_ASSERT() rm_assert(&in_ifaddr_lock, RA_LOCKED) #define IN_IFADDR_RLOCK(t) rm_rlock(&in_ifaddr_lock, (t)) #define IN_IFADDR_RLOCK_ASSERT() rm_assert(&in_ifaddr_lock, RA_RLOCKED) #define IN_IFADDR_RUNLOCK(t) rm_runlock(&in_ifaddr_lock, (t)) #define IN_IFADDR_WLOCK() rm_wlock(&in_ifaddr_lock) #define IN_IFADDR_WLOCK_ASSERT() rm_assert(&in_ifaddr_lock, RA_WLOCKED) #define IN_IFADDR_WUNLOCK() rm_wunlock(&in_ifaddr_lock) /* * Macro for finding the internet address structure (in_ifaddr) * corresponding to one of our IP addresses (in_addr). */ #define INADDR_TO_IFADDR(addr, ia) \ /* struct in_addr addr; */ \ /* struct in_ifaddr *ia; */ \ do { \ \ LIST_FOREACH(ia, INADDR_HASH((addr).s_addr), ia_hash) \ if (IA_SIN(ia)->sin_addr.s_addr == (addr).s_addr) \ break; \ } while (0) /* * Macro for finding the interface (ifnet structure) corresponding to one * of our IP addresses. */ #define INADDR_TO_IFP(addr, ifp) \ /* struct in_addr addr; */ \ /* struct ifnet *ifp; */ \ { \ struct in_ifaddr *ia; \ \ INADDR_TO_IFADDR(addr, ia); \ (ifp) = (ia == NULL) ? NULL : ia->ia_ifp; \ } /* * Macro for finding the internet address structure (in_ifaddr) corresponding * to a given interface (ifnet structure). */ #define IFP_TO_IA(ifp, ia, t) \ /* struct ifnet *ifp; */ \ /* struct in_ifaddr *ia; */ \ /* struct rm_priotracker *t; */ \ do { \ IN_IFADDR_RLOCK((t)); \ for ((ia) = TAILQ_FIRST(&V_in_ifaddrhead); \ (ia) != NULL && (ia)->ia_ifp != (ifp); \ (ia) = TAILQ_NEXT((ia), ia_link)) \ continue; \ if ((ia) != NULL) \ ifa_ref(&(ia)->ia_ifa); \ IN_IFADDR_RUNLOCK((t)); \ } while (0) /* * Legacy IPv4 IGMP per-link structure. */ struct router_info { struct ifnet *rti_ifp; int rti_type; /* type of router which is querier on this interface */ int rti_time; /* # of slow timeouts since last old query */ SLIST_ENTRY(router_info) rti_list; }; /* * IPv4 multicast IGMP-layer source entry. */ struct ip_msource { RB_ENTRY(ip_msource) ims_link; /* RB tree links */ in_addr_t ims_haddr; /* host byte order */ struct ims_st { uint16_t ex; /* # of exclusive members */ uint16_t in; /* # of inclusive members */ } ims_st[2]; /* state at t0, t1 */ uint8_t ims_stp; /* pending query */ }; /* * IPv4 multicast PCB-layer source entry. */ struct in_msource { RB_ENTRY(ip_msource) ims_link; /* RB tree links */ in_addr_t ims_haddr; /* host byte order */ uint8_t imsl_st[2]; /* state before/at commit */ }; RB_HEAD(ip_msource_tree, ip_msource); /* define struct ip_msource_tree */ static __inline int ip_msource_cmp(const struct ip_msource *a, const struct ip_msource *b) { if (a->ims_haddr < b->ims_haddr) return (-1); if (a->ims_haddr == b->ims_haddr) return (0); return (1); } RB_PROTOTYPE(ip_msource_tree, ip_msource, ims_link, ip_msource_cmp); /* * IPv4 multicast PCB-layer group filter descriptor. 
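 *
 * Editor's sketch: because the tree above is keyed on ims_haddr via
 * ip_msource_cmp(), a source lookup is a stack-local RB_FIND (the
 * matching RB_GENERATE lives in the implementation file):
 */
static __inline struct ip_msource *
example_find_source(struct ip_msource_tree *srcs, in_addr_t haddr)
{
	struct ip_msource find;

	find.ims_haddr = haddr;		/* NB: host byte order */
	return (RB_FIND(ip_msource_tree, srcs, &find));
}
/*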
*/ struct in_mfilter { struct ip_msource_tree imf_sources; /* source list for (S,G) */ u_long imf_nsrc; /* # of source entries */ uint8_t imf_st[2]; /* state before/at commit */ }; /* * IPv4 group descriptor. * * For every entry on an ifnet's if_multiaddrs list which represents * an IP multicast group, there is one of these structures. * * If any source filters are present, then a node will exist in the RB-tree * to permit fast lookup by source whenever an operation takes place. * This permits pre-order traversal when we issue reports. * Source filter trees are kept separately from the socket layer to * greatly simplify locking. * * When IGMPv3 is active, inm_timer is the response to group query timer. * The state-change timer inm_sctimer is separate; whenever state changes * for the group the state change record is generated and transmitted, * and kept if retransmissions are necessary. * * FUTURE: inm_link is now only used when groups are being purged * on a detaching ifnet. It could be demoted to a SLIST_ENTRY, but * because it is at the very start of the struct, we can't do this * w/o breaking the ABI for ifmcstat. */ struct in_multi { LIST_ENTRY(in_multi) inm_link; /* to-be-released by in_ifdetach */ struct in_addr inm_addr; /* IP multicast address, convenience */ struct ifnet *inm_ifp; /* back pointer to ifnet */ struct ifmultiaddr *inm_ifma; /* back pointer to ifmultiaddr */ u_int inm_timer; /* IGMPv1/v2 group / v3 query timer */ u_int inm_state; /* state of the membership */ void *inm_rti; /* unused, legacy field */ u_int inm_refcount; /* reference count */ /* New fields for IGMPv3 follow. */ struct igmp_ifsoftc *inm_igi; /* IGMP info */ SLIST_ENTRY(in_multi) inm_nrele; /* to-be-released by IGMP */ struct ip_msource_tree inm_srcs; /* tree of sources */ u_long inm_nsrc; /* # of tree entries */ struct mbufq inm_scq; /* queue of pending * state-change packets */ struct timeval inm_lastgsrtv; /* Time of last G-S-R query */ uint16_t inm_sctimer; /* state-change timer */ uint16_t inm_scrv; /* state-change rexmit count */ /* * SSM state counters which track state at T0 (the time the last * state-change report's RV timer went to zero) and T1 * (time of pending report, i.e. now). * Used for computing IGMPv3 state-change reports. Several refcounts * are maintained here to optimize for common use-cases. */ struct inm_st { uint16_t iss_fmode; /* IGMP filter mode */ uint16_t iss_asm; /* # of ASM listeners */ uint16_t iss_ex; /* # of exclusive members */ uint16_t iss_in; /* # of inclusive members */ uint16_t iss_rec; /* # of recorded sources */ } inm_st[2]; /* state at t0, t1 */ }; /* * Helper function to derive the filter mode on a source entry * from its internal counters. Predicates are: * A source is only excluded if all listeners exclude it. * A source is only included if no listeners exclude it, * and at least one listener includes it. * May be used by ifmcstat(8). */ static __inline uint8_t ims_get_mode(const struct in_multi *inm, const struct ip_msource *ims, uint8_t t) { t = !!t; if (inm->inm_st[t].iss_ex > 0 && inm->inm_st[t].iss_ex == ims->ims_st[t].ex) return (MCAST_EXCLUDE); else if (ims->ims_st[t].in > 0 && ims->ims_st[t].ex == 0) return (MCAST_INCLUDE); return (MCAST_UNDEFINED); } #ifdef SYSCTL_DECL SYSCTL_DECL(_net_inet); SYSCTL_DECL(_net_inet_ip); SYSCTL_DECL(_net_inet_raw); #endif /* * Lock macros for IPv4 layer multicast address lists. IPv4 lock goes * before link layer multicast locks in the lock order. 
In most cases, * consumers of IN_*_MULTI() macros should acquire the locks before * calling them; users of the in_{add,del}multi() functions should not. */ extern struct mtx in_multi_mtx; #define IN_MULTI_LOCK() mtx_lock(&in_multi_mtx) #define IN_MULTI_UNLOCK() mtx_unlock(&in_multi_mtx) #define IN_MULTI_LOCK_ASSERT() mtx_assert(&in_multi_mtx, MA_OWNED) #define IN_MULTI_UNLOCK_ASSERT() mtx_assert(&in_multi_mtx, MA_NOTOWNED) /* Acquire an in_multi record. */ static __inline void inm_acquire_locked(struct in_multi *inm) { IN_MULTI_LOCK_ASSERT(); ++inm->inm_refcount; } /* * Return values for imo_multi_filter(). */ #define MCAST_PASS 0 /* Pass */ #define MCAST_NOTGMEMBER 1 /* This host not a member of group */ #define MCAST_NOTSMEMBER 2 /* This host excluded source */ #define MCAST_MUTED 3 /* [deprecated] */ struct rtentry; struct route; struct ip_moptions; -struct radix_node_head; struct in_multi *inm_lookup_locked(struct ifnet *, const struct in_addr); struct in_multi *inm_lookup(struct ifnet *, const struct in_addr); int imo_multi_filter(const struct ip_moptions *, const struct ifnet *, const struct sockaddr *, const struct sockaddr *); void inm_commit(struct in_multi *); void inm_clear_recorded(struct in_multi *); void inm_print(const struct in_multi *); int inm_record_source(struct in_multi *inm, const in_addr_t); void inm_release(struct in_multi *); void inm_release_locked(struct in_multi *); struct in_multi * in_addmulti(struct in_addr *, struct ifnet *); void in_delmulti(struct in_multi *); int in_joingroup(struct ifnet *, const struct in_addr *, /*const*/ struct in_mfilter *, struct in_multi **); int in_joingroup_locked(struct ifnet *, const struct in_addr *, /*const*/ struct in_mfilter *, struct in_multi **); int in_leavegroup(struct in_multi *, /*const*/ struct in_mfilter *); int in_leavegroup_locked(struct in_multi *, /*const*/ struct in_mfilter *); int in_control(struct socket *, u_long, caddr_t, struct ifnet *, struct thread *); int in_addprefix(struct in_ifaddr *, int); int in_scrubprefix(struct in_ifaddr *, u_int); void ip_input(struct mbuf *); void ip_direct_input(struct mbuf *); void in_ifadown(struct ifaddr *ifa, int); struct mbuf *ip_tryforward(struct mbuf *); void *in_domifattach(struct ifnet *); void in_domifdetach(struct ifnet *, void *); /* XXX */ void in_rtalloc_ign(struct route *ro, u_long ignflags, u_int fibnum); void in_rtredirect(struct sockaddr *, struct sockaddr *, struct sockaddr *, int, struct sockaddr *, u_int); #endif /* _KERNEL */ /* INET6 stuff */ #include #endif /* _NETINET_IN_VAR_H_ */ Index: projects/clang380-import/sys/netinet/tcp_subr.c =================================================================== --- projects/clang380-import/sys/netinet/tcp_subr.c (revision 294776) +++ projects/clang380-import/sys/netinet/tcp_subr.c (revision 294777) @@ -1,2922 +1,2913 @@ /*- * Copyright (c) 1982, 1986, 1988, 1990, 1993, 1995 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. 
Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)tcp_subr.c 8.2 (Berkeley) 5/24/95 */ #include __FBSDID("$FreeBSD$"); #include "opt_compat.h" #include "opt_inet.h" #include "opt_inet6.h" #include "opt_ipsec.h" #include "opt_tcpdebug.h" #include #include #include #include #include #include #include #include #include #include #include #include #ifdef INET6 #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #ifdef INET6 #include +#include #include #include #include #include #endif #ifdef TCP_RFC7413 #include #endif #include #include #include #include #include #include #include #ifdef INET6 #include #endif #include #ifdef TCPPCAP #include #endif #ifdef TCPDEBUG #include #endif #ifdef INET6 #include #endif #ifdef TCP_OFFLOAD #include #endif #ifdef IPSEC #include #include #ifdef INET6 #include #endif #include #include #endif /*IPSEC*/ #include #include #include VNET_DEFINE(int, tcp_mssdflt) = TCP_MSS; #ifdef INET6 VNET_DEFINE(int, tcp_v6mssdflt) = TCP6_MSS; #endif struct rwlock tcp_function_lock; static int sysctl_net_inet_tcp_mss_check(SYSCTL_HANDLER_ARGS) { int error, new; new = V_tcp_mssdflt; error = sysctl_handle_int(oidp, &new, 0, req); if (error == 0 && req->newptr) { if (new < TCP_MINMSS) error = EINVAL; else V_tcp_mssdflt = new; } return (error); } SYSCTL_PROC(_net_inet_tcp, TCPCTL_MSSDFLT, mssdflt, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, &VNET_NAME(tcp_mssdflt), 0, &sysctl_net_inet_tcp_mss_check, "I", "Default TCP Maximum Segment Size"); #ifdef INET6 static int sysctl_net_inet_tcp_mss_v6_check(SYSCTL_HANDLER_ARGS) { int error, new; new = V_tcp_v6mssdflt; error = sysctl_handle_int(oidp, &new, 0, req); if (error == 0 && req->newptr) { if (new < TCP_MINMSS) error = EINVAL; else V_tcp_v6mssdflt = new; } return (error); } SYSCTL_PROC(_net_inet_tcp, TCPCTL_V6MSSDFLT, v6mssdflt, CTLFLAG_VNET | CTLTYPE_INT | CTLFLAG_RW, &VNET_NAME(tcp_v6mssdflt), 0, &sysctl_net_inet_tcp_mss_v6_check, "I", "Default TCP Maximum Segment Size for IPv6"); #endif /* INET6 */ /* * Minimum MSS we accept and use. This prevents DoS attacks where * we are forced to a ridiculous low MSS like 20 and send hundreds * of packets instead of one. The effect scales with the available * bandwidth and quickly saturates the CPU and network interface * with packet generation and sending. Set to zero to disable MINMSS * checking. This setting prevents us from sending too small packets. 
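 *
 * Editor's sketch: the mssdflt handlers above show the usual shape of
 * a validated VNET integer sysctl; a hypothetical handler enforcing a
 * lower bound on tcp_minmss would look like:
 */
static int
example_sysctl_minmss_check(SYSCTL_HANDLER_ARGS)
{
	int error, new;

	new = V_tcp_minmss;
	error = sysctl_handle_int(oidp, &new, 0, req);
	if (error == 0 && req->newptr != NULL) {
		if (new < 0)
			error = EINVAL;	/* reject; keep the old value */
		else
			V_tcp_minmss = new;
	}
	return (error);
}
/*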
*/ VNET_DEFINE(int, tcp_minmss) = TCP_MINMSS; SYSCTL_INT(_net_inet_tcp, OID_AUTO, minmss, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(tcp_minmss), 0, "Minimum TCP Maximum Segment Size"); VNET_DEFINE(int, tcp_do_rfc1323) = 1; SYSCTL_INT(_net_inet_tcp, TCPCTL_DO_RFC1323, rfc1323, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(tcp_do_rfc1323), 0, "Enable rfc1323 (high performance TCP) extensions"); static int tcp_log_debug = 0; SYSCTL_INT(_net_inet_tcp, OID_AUTO, log_debug, CTLFLAG_RW, &tcp_log_debug, 0, "Log errors caused by incoming TCP segments"); static int tcp_tcbhashsize; SYSCTL_INT(_net_inet_tcp, OID_AUTO, tcbhashsize, CTLFLAG_RDTUN | CTLFLAG_NOFETCH, &tcp_tcbhashsize, 0, "Size of TCP control-block hashtable"); static int do_tcpdrain = 1; SYSCTL_INT(_net_inet_tcp, OID_AUTO, do_tcpdrain, CTLFLAG_RW, &do_tcpdrain, 0, "Enable tcp_drain routine for extra help when low on mbufs"); SYSCTL_UINT(_net_inet_tcp, OID_AUTO, pcbcount, CTLFLAG_VNET | CTLFLAG_RD, &VNET_NAME(tcbinfo.ipi_count), 0, "Number of active PCBs"); static VNET_DEFINE(int, icmp_may_rst) = 1; #define V_icmp_may_rst VNET(icmp_may_rst) SYSCTL_INT(_net_inet_tcp, OID_AUTO, icmp_may_rst, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(icmp_may_rst), 0, "Certain ICMP unreachable messages may abort connections in SYN_SENT"); static VNET_DEFINE(int, tcp_isn_reseed_interval) = 0; #define V_tcp_isn_reseed_interval VNET(tcp_isn_reseed_interval) SYSCTL_INT(_net_inet_tcp, OID_AUTO, isn_reseed_interval, CTLFLAG_VNET | CTLFLAG_RW, &VNET_NAME(tcp_isn_reseed_interval), 0, "Seconds between reseeding of ISN secret"); static int tcp_soreceive_stream; SYSCTL_INT(_net_inet_tcp, OID_AUTO, soreceive_stream, CTLFLAG_RDTUN, &tcp_soreceive_stream, 0, "Using soreceive_stream for TCP sockets"); #ifdef TCP_SIGNATURE static int tcp_sig_checksigs = 1; SYSCTL_INT(_net_inet_tcp, OID_AUTO, signature_verify_input, CTLFLAG_RW, &tcp_sig_checksigs, 0, "Verify RFC2385 digests on inbound traffic"); #endif VNET_DEFINE(uma_zone_t, sack_hole_zone); #define V_sack_hole_zone VNET(sack_hole_zone) VNET_DEFINE(struct hhook_head *, tcp_hhh[HHOOK_TCP_LAST+1]); static struct inpcb *tcp_notify(struct inpcb *, int); static struct inpcb *tcp_mtudisc_notify(struct inpcb *, int); static void tcp_mtudisc(struct inpcb *, int); static char * tcp_log_addr(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr, const void *ip6hdr); static void tcp_timer_discard(struct tcpcb *, uint32_t); static struct tcp_function_block tcp_def_funcblk = { "default", tcp_output, tcp_do_segment, tcp_default_ctloutput, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 0, 0 }; struct tcp_funchead t_functions; static struct tcp_function_block *tcp_func_set_ptr = &tcp_def_funcblk; static struct tcp_function_block * find_tcp_functions_locked(struct tcp_function_set *fs) { struct tcp_function *f; struct tcp_function_block *blk=NULL; TAILQ_FOREACH(f, &t_functions, tf_next) { if (strcmp(f->tf_fb->tfb_tcp_block_name, fs->function_set_name) == 0) { blk = f->tf_fb; break; } } return(blk); } static struct tcp_function_block * find_tcp_fb_locked(struct tcp_function_block *blk, struct tcp_function **s) { struct tcp_function_block *rblk=NULL; struct tcp_function *f; TAILQ_FOREACH(f, &t_functions, tf_next) { if (f->tf_fb == blk) { rblk = blk; if (s) { *s = f; } break; } } return (rblk); } struct tcp_function_block * find_and_ref_tcp_functions(struct tcp_function_set *fs) { struct tcp_function_block *blk; rw_rlock(&tcp_function_lock); blk = find_tcp_functions_locked(fs); if (blk) refcount_acquire(&blk->tfb_refcnt); rw_runlock(&tcp_function_lock); 
return(blk); } struct tcp_function_block * find_and_ref_tcp_fb(struct tcp_function_block *blk) { struct tcp_function_block *rblk; rw_rlock(&tcp_function_lock); rblk = find_tcp_fb_locked(blk, NULL); if (rblk) refcount_acquire(&rblk->tfb_refcnt); rw_runlock(&tcp_function_lock); return(rblk); } static int sysctl_net_inet_default_tcp_functions(SYSCTL_HANDLER_ARGS) { int error=ENOENT; struct tcp_function_set fs; struct tcp_function_block *blk; memset(&fs, 0, sizeof(fs)); rw_rlock(&tcp_function_lock); blk = find_tcp_fb_locked(tcp_func_set_ptr, NULL); if (blk) { /* Found him */ strcpy(fs.function_set_name, blk->tfb_tcp_block_name); fs.pcbcnt = blk->tfb_refcnt; } rw_runlock(&tcp_function_lock); error = sysctl_handle_string(oidp, fs.function_set_name, sizeof(fs.function_set_name), req); /* Check for error or no change */ if (error != 0 || req->newptr == NULL) return(error); rw_wlock(&tcp_function_lock); blk = find_tcp_functions_locked(&fs); if ((blk == NULL) || (blk->tfb_flags & TCP_FUNC_BEING_REMOVED)) { error = ENOENT; goto done; } tcp_func_set_ptr = blk; done: rw_wunlock(&tcp_function_lock); return (error); } SYSCTL_PROC(_net_inet_tcp, OID_AUTO, functions_default, CTLTYPE_STRING | CTLFLAG_RW, NULL, 0, sysctl_net_inet_default_tcp_functions, "A", "Set/get the default TCP functions"); static int sysctl_net_inet_list_available(SYSCTL_HANDLER_ARGS) { int error, cnt, linesz; struct tcp_function *f; char *buffer, *cp; size_t bufsz, outsz; cnt = 0; rw_rlock(&tcp_function_lock); TAILQ_FOREACH(f, &t_functions, tf_next) { cnt++; } rw_runlock(&tcp_function_lock); bufsz = (cnt+2) * (TCP_FUNCTION_NAME_LEN_MAX + 12) + 1; buffer = malloc(bufsz, M_TEMP, M_WAITOK); error = 0; cp = buffer; linesz = snprintf(cp, bufsz, "\n%-32s%c %s\n", "Stack", 'D', "PCB count"); cp += linesz; bufsz -= linesz; outsz = linesz; rw_rlock(&tcp_function_lock); TAILQ_FOREACH(f, &t_functions, tf_next) { linesz = snprintf(cp, bufsz, "%-32s%c %u\n", f->tf_fb->tfb_tcp_block_name, (f->tf_fb == tcp_func_set_ptr) ? '*' : ' ', f->tf_fb->tfb_refcnt); if (linesz >= bufsz) { error = EOVERFLOW; break; } cp += linesz; bufsz -= linesz; outsz += linesz; } rw_runlock(&tcp_function_lock); if (error == 0) error = sysctl_handle_string(oidp, buffer, outsz + 1, req); free(buffer, M_TEMP); return (error); } SYSCTL_PROC(_net_inet_tcp, OID_AUTO, functions_available, CTLTYPE_STRING|CTLFLAG_RD, NULL, 0, sysctl_net_inet_list_available, "A", "list available TCP Function sets"); /* * Target size of TCP PCB hash tables. Must be a power of two. * * Note that this can be overridden by the kernel environment * variable net.inet.tcp.tcbhashsize */ #ifndef TCBHASHSIZE #define TCBHASHSIZE 0 #endif /* * XXX * Callouts should be moved into struct tcp directly. They are currently * separate because the tcpcb structure is exported to userland for sysctl * parsing purposes, which do not know about callouts. */ struct tcpcb_mem { struct tcpcb tcb; struct tcp_timer tt; struct cc_var ccv; struct osd osd; }; static VNET_DEFINE(uma_zone_t, tcpcb_zone); #define V_tcpcb_zone VNET(tcpcb_zone) MALLOC_DEFINE(M_TCPLOG, "tcplog", "TCP address and flags print buffers"); MALLOC_DEFINE(M_TCPFUNCTIONS, "tcpfunc", "TCP function set memory"); static struct mtx isn_mtx; #define ISN_LOCK_INIT() mtx_init(&isn_mtx, "isn_mtx", NULL, MTX_DEF) #define ISN_LOCK() mtx_lock(&isn_mtx) #define ISN_UNLOCK() mtx_unlock(&isn_mtx) /* * TCP initialization. 
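 *
 * Editor's note: maketcp_hashsize() below rounds up with fls(), e.g.
 * fls(600) == 10 so 600 becomes 1 << 10 == 1024; near INT_MAX the
 * shift wraps to a value smaller than the input, which the
 * "hashsize < size" test catches by backing off one power of two.  A
 * hypothetical standalone equivalent:
 */
static int
example_next_pow2(int size)
{
	int hs = 1 << fls(size);	/* next power of two above size */

	return (hs < size ? 1 << (fls(size) - 1) : hs);
}
/*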
*/ static void tcp_zone_change(void *tag) { uma_zone_set_max(V_tcbinfo.ipi_zone, maxsockets); uma_zone_set_max(V_tcpcb_zone, maxsockets); tcp_tw_zone_change(); } static int tcp_inpcb_init(void *mem, int size, int flags) { struct inpcb *inp = mem; INP_LOCK_INIT(inp, "inp", "tcpinp"); return (0); } /* * Take a value and get the next power of 2 that doesn't overflow. * Used to size the tcp_inpcb hash buckets. */ static int maketcp_hashsize(int size) { int hashsize; /* * auto tune. * get the next power of 2 higher than maxsockets. */ hashsize = 1 << fls(size); /* catch overflow, and just go one power of 2 smaller */ if (hashsize < size) { hashsize = 1 << (fls(size) - 1); } return (hashsize); } int register_tcp_functions(struct tcp_function_block *blk, int wait) { struct tcp_function_block *lblk; struct tcp_function *n; struct tcp_function_set fs; if ((blk->tfb_tcp_output == NULL) || (blk->tfb_tcp_do_segment == NULL) || (blk->tfb_tcp_ctloutput == NULL) || (strlen(blk->tfb_tcp_block_name) == 0)) { /* * These functions are required and you * need a name. */ return (EINVAL); } if (blk->tfb_tcp_timer_stop_all || blk->tfb_tcp_timers_left || blk->tfb_tcp_timer_activate || blk->tfb_tcp_timer_active || blk->tfb_tcp_timer_stop) { /* * If you define one timer function you * must have them all. */ if ((blk->tfb_tcp_timer_stop_all == NULL) || (blk->tfb_tcp_timers_left == NULL) || (blk->tfb_tcp_timer_activate == NULL) || (blk->tfb_tcp_timer_active == NULL) || (blk->tfb_tcp_timer_stop == NULL)) { return (EINVAL); } } n = malloc(sizeof(struct tcp_function), M_TCPFUNCTIONS, wait); if (n == NULL) { return (ENOMEM); } n->tf_fb = blk; strcpy(fs.function_set_name, blk->tfb_tcp_block_name); rw_wlock(&tcp_function_lock); lblk = find_tcp_functions_locked(&fs); if (lblk) { /* Duplicate name space not allowed */ rw_wunlock(&tcp_function_lock); free(n, M_TCPFUNCTIONS); return (EALREADY); } refcount_init(&blk->tfb_refcnt, 0); blk->tfb_flags = 0; TAILQ_INSERT_TAIL(&t_functions, n, tf_next); rw_wunlock(&tcp_function_lock); return(0); } int deregister_tcp_functions(struct tcp_function_block *blk) { struct tcp_function_block *lblk; struct tcp_function *f; int error=ENOENT; if (strcmp(blk->tfb_tcp_block_name, "default") == 0) { /* You can't un-register the default */ return (EPERM); } rw_wlock(&tcp_function_lock); if (blk == tcp_func_set_ptr) { /* You can't free the current default */ rw_wunlock(&tcp_function_lock); return (EBUSY); } if (blk->tfb_refcnt) { /* Still tcb attached, mark it. */ blk->tfb_flags |= TCP_FUNC_BEING_REMOVED; rw_wunlock(&tcp_function_lock); return (EBUSY); } lblk = find_tcp_fb_locked(blk, &f); if (lblk) { /* Found */ TAILQ_REMOVE(&t_functions, f, tf_next); f->tf_fb = NULL; free(f, M_TCPFUNCTIONS); error = 0; } rw_wunlock(&tcp_function_lock); return (error); } void tcp_init(void) { const char *tcbhash_tuneable; int hashsize; tcbhash_tuneable = "net.inet.tcp.tcbhashsize"; if (hhook_head_register(HHOOK_TYPE_TCP, HHOOK_TCP_EST_IN, &V_tcp_hhh[HHOOK_TCP_EST_IN], HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0) printf("%s: WARNING: unable to register helper hook\n", __func__); if (hhook_head_register(HHOOK_TYPE_TCP, HHOOK_TCP_EST_OUT, &V_tcp_hhh[HHOOK_TCP_EST_OUT], HHOOK_NOWAIT|HHOOK_HEADISINVNET) != 0) printf("%s: WARNING: unable to register helper hook\n", __func__); hashsize = TCBHASHSIZE; TUNABLE_INT_FETCH(tcbhash_tuneable, &hashsize); if (hashsize == 0) { /* * Auto tune the hash size based on maxsockets. 
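maketcp_hashsize() above leans on fls(): since fls(size) returns the position of the highest set bit, 1 << fls(size) is the smallest power of two strictly greater than size, and the hashsize < size test catches the case where that shift overflowed the int, backing off one power of two. A standalone sketch of the same computation with worked values (fls(3) is in FreeBSD's <strings.h>; on other systems substitute a __builtin_clz-based equivalent):

#include <stdio.h>
#include <strings.h>	/* fls(3) */

/* Same logic as maketcp_hashsize(): round up to a power of two,
 * stepping back down if the shift overflowed. */
static int
next_pow2(int size)
{
	int hashsize;

	hashsize = 1 << fls(size);
	if (hashsize < size)
		hashsize = 1 << (fls(size) - 1);
	return (hashsize);
}

int
main(void)
{
	/* 9 -> 16, 1000 -> 1024, and 512 -> 1024 (strictly greater). */
	printf("%d %d %d\n", next_pow2(9), next_pow2(1000), next_pow2(512));
	return (0);
}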
* A perfect hash would have a 1:1 mapping * (hashsize = maxsockets) however it's been * suggested that O(2) average is better. */ hashsize = maketcp_hashsize(maxsockets / 4); /* * Our historical default is 512, * do not autotune lower than this. */ if (hashsize < 512) hashsize = 512; if (bootverbose && IS_DEFAULT_VNET(curvnet)) printf("%s: %s auto tuned to %d\n", __func__, tcbhash_tuneable, hashsize); } /* * We require the hashsize to be a power of two. * Previously if it was not a power of two we would just reset it * back to 512, which could be a nasty surprise if you did not notice * the error message. * Instead what we do is round it up to the next power of two above * the specified hash value. */ if (!powerof2(hashsize)) { int oldhashsize = hashsize; hashsize = maketcp_hashsize(hashsize); /* prevent absurdly low value */ if (hashsize < 16) hashsize = 16; printf("%s: WARNING: TCB hash size not a power of 2, " "clipped from %d to %d.\n", __func__, oldhashsize, hashsize); } in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, hashsize, hashsize, "tcp_inpcb", tcp_inpcb_init, NULL, UMA_ZONE_NOFREE, IPI_HASHFIELDS_4TUPLE); /* * These have to be type stable for the benefit of the timers. */ V_tcpcb_zone = uma_zcreate("tcpcb", sizeof(struct tcpcb_mem), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); uma_zone_set_max(V_tcpcb_zone, maxsockets); uma_zone_set_warning(V_tcpcb_zone, "kern.ipc.maxsockets limit reached"); tcp_tw_init(); syncache_init(); tcp_hc_init(); TUNABLE_INT_FETCH("net.inet.tcp.sack.enable", &V_tcp_do_sack); V_sack_hole_zone = uma_zcreate("sackhole", sizeof(struct sackhole), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); /* Skip initialization of globals for non-default instances. */ if (!IS_DEFAULT_VNET(curvnet)) return; tcp_reass_global_init(); /* XXX virtualize those below?
*/ tcp_delacktime = TCPTV_DELACK; tcp_keepinit = TCPTV_KEEP_INIT; tcp_keepidle = TCPTV_KEEP_IDLE; tcp_keepintvl = TCPTV_KEEPINTVL; tcp_maxpersistidle = TCPTV_KEEP_IDLE; tcp_msl = TCPTV_MSL; tcp_rexmit_min = TCPTV_MIN; if (tcp_rexmit_min < 1) tcp_rexmit_min = 1; tcp_rexmit_slop = TCPTV_CPU_VAR; tcp_finwait2_timeout = TCPTV_FINWAIT2_TIMEOUT; tcp_tcbhashsize = hashsize; /* Setup the tcp function block list */ TAILQ_INIT(&t_functions); rw_init_flags(&tcp_function_lock, "tcp_func_lock" , 0); register_tcp_functions(&tcp_def_funcblk, M_WAITOK); if (tcp_soreceive_stream) { #ifdef INET tcp_usrreqs.pru_soreceive = soreceive_stream; #endif #ifdef INET6 tcp6_usrreqs.pru_soreceive = soreceive_stream; #endif /* INET6 */ } #ifdef INET6 #define TCP_MINPROTOHDR (sizeof(struct ip6_hdr) + sizeof(struct tcphdr)) #else /* INET6 */ #define TCP_MINPROTOHDR (sizeof(struct tcpiphdr)) #endif /* INET6 */ if (max_protohdr < TCP_MINPROTOHDR) max_protohdr = TCP_MINPROTOHDR; if (max_linkhdr + TCP_MINPROTOHDR > MHLEN) panic("tcp_init"); #undef TCP_MINPROTOHDR ISN_LOCK_INIT(); EVENTHANDLER_REGISTER(shutdown_pre_sync, tcp_fini, NULL, SHUTDOWN_PRI_DEFAULT); EVENTHANDLER_REGISTER(maxsockets_change, tcp_zone_change, NULL, EVENTHANDLER_PRI_ANY); #ifdef TCPPCAP tcp_pcap_init(); #endif #ifdef TCP_RFC7413 tcp_fastopen_init(); #endif } #ifdef VIMAGE void tcp_destroy(void) { int error; #ifdef TCP_RFC7413 tcp_fastopen_destroy(); #endif tcp_hc_destroy(); syncache_destroy(); tcp_tw_destroy(); in_pcbinfo_destroy(&V_tcbinfo); uma_zdestroy(V_sack_hole_zone); uma_zdestroy(V_tcpcb_zone); error = hhook_head_deregister(V_tcp_hhh[HHOOK_TCP_EST_IN]); if (error != 0) { printf("%s: WARNING: unable to deregister helper hook " "type=%d, id=%d: error %d returned\n", __func__, HHOOK_TYPE_TCP, HHOOK_TCP_EST_IN, error); } error = hhook_head_deregister(V_tcp_hhh[HHOOK_TCP_EST_OUT]); if (error != 0) { printf("%s: WARNING: unable to deregister helper hook " "type=%d, id=%d: error %d returned\n", __func__, HHOOK_TYPE_TCP, HHOOK_TCP_EST_OUT, error); } } #endif void tcp_fini(void *xtp) { } /* * Fill in the IP and TCP headers for an outgoing packet, given the tcpcb. * tcp_template used to store this data in mbufs, but we now recopy it out * of the tcpcb each time to conserve mbufs. */ void tcpip_fillheaders(struct inpcb *inp, void *ip_ptr, void *tcp_ptr) { struct tcphdr *th = (struct tcphdr *)tcp_ptr; INP_WLOCK_ASSERT(inp); #ifdef INET6 if ((inp->inp_vflag & INP_IPV6) != 0) { struct ip6_hdr *ip6; ip6 = (struct ip6_hdr *)ip_ptr; ip6->ip6_flow = (ip6->ip6_flow & ~IPV6_FLOWINFO_MASK) | (inp->inp_flow & IPV6_FLOWINFO_MASK); ip6->ip6_vfc = (ip6->ip6_vfc & ~IPV6_VERSION_MASK) | (IPV6_VERSION & IPV6_VERSION_MASK); ip6->ip6_nxt = IPPROTO_TCP; ip6->ip6_plen = htons(sizeof(struct tcphdr)); ip6->ip6_src = inp->in6p_laddr; ip6->ip6_dst = inp->in6p_faddr; } #endif /* INET6 */ #if defined(INET6) && defined(INET) else #endif #ifdef INET { struct ip *ip; ip = (struct ip *)ip_ptr; ip->ip_v = IPVERSION; ip->ip_hl = 5; ip->ip_tos = inp->inp_ip_tos; ip->ip_len = 0; ip->ip_id = 0; ip->ip_off = 0; ip->ip_ttl = inp->inp_ip_ttl; ip->ip_sum = 0; ip->ip_p = IPPROTO_TCP; ip->ip_src = inp->inp_laddr; ip->ip_dst = inp->inp_faddr; } #endif /* INET */ th->th_sport = inp->inp_lport; th->th_dport = inp->inp_fport; th->th_seq = 0; th->th_ack = 0; th->th_x2 = 0; th->th_off = 5; th->th_flags = 0; th->th_win = 0; th->th_urp = 0; th->th_sum = 0; /* in_pseudo() is called later for ipv4 */ } /* * Create template to be used to send tcp packets on a connection. 
* Allocates an mbuf and fills in a skeletal tcp/ip header. The only * use for this function is in keepalives, which use tcp_respond. */ struct tcptemp * tcpip_maketemplate(struct inpcb *inp) { struct tcptemp *t; t = malloc(sizeof(*t), M_TEMP, M_NOWAIT); if (t == NULL) return (NULL); tcpip_fillheaders(inp, (void *)&t->tt_ipgen, (void *)&t->tt_t); return (t); } /* * Send a single message to the TCP at address specified by * the given TCP/IP header. If m == NULL, then we make a copy * of the tcpiphdr at th and send directly to the addressed host. * This is used to force keep alive messages out using the TCP * template for a connection. If flags are given then we send * a message back to the TCP which originated the segment th, * and discard the mbuf containing it and any other attached mbufs. * * In any case the ack and sequence number of the transmitted * segment are as specified by the parameters. * * NOTE: If m != NULL, then th must point to *inside* the mbuf. */ void tcp_respond(struct tcpcb *tp, void *ipgen, struct tcphdr *th, struct mbuf *m, tcp_seq ack, tcp_seq seq, int flags) { int tlen; int win = 0; struct ip *ip; struct tcphdr *nth; #ifdef INET6 struct ip6_hdr *ip6; int isipv6; #endif /* INET6 */ int ipflags = 0; struct inpcb *inp; KASSERT(tp != NULL || m != NULL, ("tcp_respond: tp and m both NULL")); #ifdef INET6 isipv6 = ((struct ip *)ipgen)->ip_v == (IPV6_VERSION >> 4); ip6 = ipgen; #endif /* INET6 */ ip = ipgen; if (tp != NULL) { inp = tp->t_inpcb; KASSERT(inp != NULL, ("tcp control block w/o inpcb")); INP_WLOCK_ASSERT(inp); } else inp = NULL; if (tp != NULL) { if (!(flags & TH_RST)) { win = sbspace(&inp->inp_socket->so_rcv); if (win > (long)TCP_MAXWIN << tp->rcv_scale) win = (long)TCP_MAXWIN << tp->rcv_scale; } } if (m == NULL) { m = m_gethdr(M_NOWAIT, MT_DATA); if (m == NULL) return; tlen = 0; m->m_data += max_linkhdr; #ifdef INET6 if (isipv6) { bcopy((caddr_t)ip6, mtod(m, caddr_t), sizeof(struct ip6_hdr)); ip6 = mtod(m, struct ip6_hdr *); nth = (struct tcphdr *)(ip6 + 1); } else #endif /* INET6 */ { bcopy((caddr_t)ip, mtod(m, caddr_t), sizeof(struct ip)); ip = mtod(m, struct ip *); nth = (struct tcphdr *)(ip + 1); } bcopy((caddr_t)th, (caddr_t)nth, sizeof(struct tcphdr)); flags = TH_ACK; } else { /* * reuse the mbuf. * XXX MRT We inherit the FIB, which is lucky. */ m_freem(m->m_next); m->m_next = NULL; m->m_data = (caddr_t)ipgen; /* m_len is set later */ tlen = 0; #define xchg(a,b,type) { type t; t=a; a=b; b=t; } #ifdef INET6 if (isipv6) { xchg(ip6->ip6_dst, ip6->ip6_src, struct in6_addr); nth = (struct tcphdr *)(ip6 + 1); } else #endif /* INET6 */ { xchg(ip->ip_dst.s_addr, ip->ip_src.s_addr, uint32_t); nth = (struct tcphdr *)(ip + 1); } if (th != nth) { /* * this is usually the case when an extension header * exists between the IPv6 header and the * TCP header.
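Note how tcp_respond() computes the advertised window above: the free receive-buffer space is clamped to TCP_MAXWIN << rcv_scale, and the 16-bit th_win field is later filled with the clamped value shifted back down by rcv_scale. The arithmetic in isolation (TCP_MAXWIN is 65535):

#include <stdint.h>
#include <stdio.h>

#define TCP_MAXWIN	65535

/* Turn available receive-buffer space into the on-the-wire 16-bit
 * window value, honouring the receive window scale. */
static uint16_t
advertised_window(long space, int rcv_scale)
{
	if (space > (long)TCP_MAXWIN << rcv_scale)
		space = (long)TCP_MAXWIN << rcv_scale;
	return ((uint16_t)(space >> rcv_scale));
}

int
main(void)
{
	/* 1 MiB of space, scale 4: clamped to 65535 << 4, wire value 65535. */
	printf("%u\n", advertised_window(1L << 20, 4));
	return (0);
}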
*/ nth->th_sport = th->th_sport; nth->th_dport = th->th_dport; } xchg(nth->th_dport, nth->th_sport, uint16_t); #undef xchg } #ifdef INET6 if (isipv6) { ip6->ip6_flow = 0; ip6->ip6_vfc = IPV6_VERSION; ip6->ip6_nxt = IPPROTO_TCP; tlen += sizeof (struct ip6_hdr) + sizeof (struct tcphdr); ip6->ip6_plen = htons(tlen - sizeof(*ip6)); } #endif #if defined(INET) && defined(INET6) else #endif #ifdef INET { tlen += sizeof (struct tcpiphdr); ip->ip_len = htons(tlen); ip->ip_ttl = V_ip_defttl; if (V_path_mtu_discovery) ip->ip_off |= htons(IP_DF); } #endif m->m_len = tlen; m->m_pkthdr.len = tlen; m->m_pkthdr.rcvif = NULL; #ifdef MAC if (inp != NULL) { /* * Packet is associated with a socket, so allow the * label of the response to reflect the socket label. */ INP_WLOCK_ASSERT(inp); mac_inpcb_create_mbuf(inp, m); } else { /* * Packet is not associated with a socket, so possibly * update the label in place. */ mac_netinet_tcp_reply(m); } #endif nth->th_seq = htonl(seq); nth->th_ack = htonl(ack); nth->th_x2 = 0; nth->th_off = sizeof (struct tcphdr) >> 2; nth->th_flags = flags; if (tp != NULL) nth->th_win = htons((u_short) (win >> tp->rcv_scale)); else nth->th_win = htons((u_short)win); nth->th_urp = 0; m->m_pkthdr.csum_data = offsetof(struct tcphdr, th_sum); #ifdef INET6 if (isipv6) { m->m_pkthdr.csum_flags = CSUM_TCP_IPV6; nth->th_sum = in6_cksum_pseudo(ip6, tlen - sizeof(struct ip6_hdr), IPPROTO_TCP, 0); ip6->ip6_hlim = in6_selecthlim(tp != NULL ? tp->t_inpcb : NULL, NULL); } #endif /* INET6 */ #if defined(INET6) && defined(INET) else #endif #ifdef INET { m->m_pkthdr.csum_flags = CSUM_TCP; nth->th_sum = in_pseudo(ip->ip_src.s_addr, ip->ip_dst.s_addr, htons((u_short)(tlen - sizeof(struct ip) + ip->ip_p))); } #endif /* INET */ #ifdef TCPDEBUG if (tp == NULL || (inp->inp_socket->so_options & SO_DEBUG)) tcp_trace(TA_OUTPUT, 0, tp, mtod(m, void *), th, 0); #endif TCP_PROBE3(debug__input, tp, th, mtod(m, const char *)); if (flags & TH_RST) TCP_PROBE5(accept__refused, NULL, NULL, mtod(m, const char *), tp, nth); TCP_PROBE5(send, NULL, tp, mtod(m, const char *), tp, nth); #ifdef INET6 if (isipv6) (void) ip6_output(m, NULL, NULL, ipflags, NULL, NULL, inp); #endif /* INET6 */ #if defined(INET) && defined(INET6) else #endif #ifdef INET (void) ip_output(m, NULL, NULL, ipflags, NULL, inp); #endif } /* * Create a new TCP control block, making an * empty reassembly queue and hooking it to the argument * protocol control block. The `inp' parameter must have * come from the zone allocator set up in tcp_init(). */ struct tcpcb * tcp_newtcpcb(struct inpcb *inp) { struct tcpcb_mem *tm; struct tcpcb *tp; #ifdef INET6 int isipv6 = (inp->inp_vflag & INP_IPV6) != 0; #endif /* INET6 */ tm = uma_zalloc(V_tcpcb_zone, M_NOWAIT | M_ZERO); if (tm == NULL) return (NULL); tp = &tm->tcb; /* Initialise cc_var struct for this tcpcb. */ tp->ccv = &tm->ccv; tp->ccv->type = IPPROTO_TCP; tp->ccv->ccvc.tcp = tp; rw_rlock(&tcp_function_lock); tp->t_fb = tcp_func_set_ptr; refcount_acquire(&tp->t_fb->tfb_refcnt); rw_runlock(&tcp_function_lock); if (tp->t_fb->tfb_tcp_fb_init) { (*tp->t_fb->tfb_tcp_fb_init)(tp); } /* * Use the current system default CC algorithm. 
*/ CC_LIST_RLOCK(); KASSERT(!STAILQ_EMPTY(&cc_list), ("cc_list is empty!")); CC_ALGO(tp) = CC_DEFAULT(); CC_LIST_RUNLOCK(); if (CC_ALGO(tp)->cb_init != NULL) if (CC_ALGO(tp)->cb_init(tp->ccv) > 0) { if (tp->t_fb->tfb_tcp_fb_fini) (*tp->t_fb->tfb_tcp_fb_fini)(tp); refcount_release(&tp->t_fb->tfb_refcnt); uma_zfree(V_tcpcb_zone, tm); return (NULL); } tp->osd = &tm->osd; if (khelp_init_osd(HELPER_CLASS_TCP, tp->osd)) { if (tp->t_fb->tfb_tcp_fb_fini) (*tp->t_fb->tfb_tcp_fb_fini)(tp); refcount_release(&tp->t_fb->tfb_refcnt); uma_zfree(V_tcpcb_zone, tm); return (NULL); } #ifdef VIMAGE tp->t_vnet = inp->inp_vnet; #endif tp->t_timers = &tm->tt; /* LIST_INIT(&tp->t_segq); */ /* XXX covered by M_ZERO */ tp->t_maxseg = #ifdef INET6 isipv6 ? V_tcp_v6mssdflt : #endif /* INET6 */ V_tcp_mssdflt; /* Set up our timeouts. */ callout_init(&tp->t_timers->tt_rexmt, 1); callout_init(&tp->t_timers->tt_persist, 1); callout_init(&tp->t_timers->tt_keep, 1); callout_init(&tp->t_timers->tt_2msl, 1); callout_init(&tp->t_timers->tt_delack, 1); if (V_tcp_do_rfc1323) tp->t_flags = (TF_REQ_SCALE|TF_REQ_TSTMP); if (V_tcp_do_sack) tp->t_flags |= TF_SACK_PERMIT; TAILQ_INIT(&tp->snd_holes); /* * The tcpcb will hold a reference on its inpcb until tcp_discardcb() * is called. */ in_pcbref(inp); /* Reference for tcpcb */ tp->t_inpcb = inp; /* * Init srtt to TCPTV_SRTTBASE (0), so we can tell that we have no * rtt estimate. Set rttvar so that srtt + 4 * rttvar gives * reasonable initial retransmit time. */ tp->t_srtt = TCPTV_SRTTBASE; tp->t_rttvar = ((TCPTV_RTOBASE - TCPTV_SRTTBASE) << TCP_RTTVAR_SHIFT) / 4; tp->t_rttmin = tcp_rexmit_min; tp->t_rxtcur = TCPTV_RTOBASE; tp->snd_cwnd = TCP_MAXWIN << TCP_MAX_WINSHIFT; tp->snd_ssthresh = TCP_MAXWIN << TCP_MAX_WINSHIFT; tp->t_rcvtime = ticks; /* * IPv4 TTL initialization is necessary for an IPv6 socket as well, * because the socket may be bound to an IPv6 wildcard address, * which may match an IPv4-mapped IPv6 address. */ inp->inp_ip_ttl = V_ip_defttl; inp->inp_ppcb = tp; #ifdef TCPPCAP /* * Init the TCP PCAP queues. */ tcp_pcap_tcpcb_init(tp); #endif return (tp); /* XXX */ } /* * Switch the congestion control algorithm back to NewReno for any active * control blocks using an algorithm which is about to go away. * This ensures the CC framework can allow the unload to proceed without leaving * any dangling pointers which would trigger a panic. * Returning non-zero would inform the CC framework that something went wrong * and it would be unsafe to allow the unload to proceed. However, there is no * way for this to occur with this implementation so we always return zero. */ int tcp_ccalgounload(struct cc_algo *unload_algo) { struct cc_algo *tmpalgo; struct inpcb *inp; struct tcpcb *tp; VNET_ITERATOR_DECL(vnet_iter); /* * Check all active control blocks across all network stacks and change * any that are using "unload_algo" back to NewReno. If "unload_algo" * requires cleanup code to be run, call it. */ VNET_LIST_RLOCK(); VNET_FOREACH(vnet_iter) { CURVNET_SET(vnet_iter); INP_INFO_WLOCK(&V_tcbinfo); /* * New connections already part way through being initialised * with the CC algo we're removing will not race with this code * because the INP_INFO_WLOCK is held during initialisation. We * therefore don't enter the loop below until the connection * list has stabilised. */ LIST_FOREACH(inp, &V_tcb, inp_list) { INP_WLOCK(inp); /* Important to skip tcptw structs. 
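The retransmit-timer seeding in tcp_newtcpcb() above is worth spelling out: srtt starts at TCPTV_SRTTBASE (0, meaning "no estimate yet") and rttvar is chosen as (TCPTV_RTOBASE - TCPTV_SRTTBASE) / 4 in scaled fixed point, so that the classic srtt + 4 * rttvar retransmit formula evaluates to exactly TCPTV_RTOBASE for the first timeout. With the fixed-point scaling stripped away (the kernel works in ticks; milliseconds are used here purely for illustration):

#include <stdio.h>

int
main(void)
{
	int srtt = 0;			/* TCPTV_SRTTBASE: no RTT estimate */
	int rto_base = 3000;		/* TCPTV_RTOBASE, as milliseconds */
	int rttvar = (rto_base - srtt) / 4;

	/* The standard formula lands exactly on the base RTO. */
	printf("initial RTO = %d ms\n", srtt + 4 * rttvar);
	return (0);
}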
*/ if (!(inp->inp_flags & INP_TIMEWAIT) && (tp = intotcpcb(inp)) != NULL) { /* * By holding INP_WLOCK here, we are assured * that the connection is not currently * executing inside the CC module's functions * i.e. it is safe to make the switch back to * NewReno. */ if (CC_ALGO(tp) == unload_algo) { tmpalgo = CC_ALGO(tp); /* NewReno does not require any init. */ CC_ALGO(tp) = &newreno_cc_algo; if (tmpalgo->cb_destroy != NULL) tmpalgo->cb_destroy(tp->ccv); } } INP_WUNLOCK(inp); } INP_INFO_WUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); } VNET_LIST_RUNLOCK(); return (0); } /* * Drop a TCP connection, reporting * the specified error. If connection is synchronized, * then send a RST to peer. */ struct tcpcb * tcp_drop(struct tcpcb *tp, int errno) { struct socket *so = tp->t_inpcb->inp_socket; INP_INFO_LOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(tp->t_inpcb); if (TCPS_HAVERCVDSYN(tp->t_state)) { tcp_state_change(tp, TCPS_CLOSED); (void) tp->t_fb->tfb_tcp_output(tp); TCPSTAT_INC(tcps_drops); } else TCPSTAT_INC(tcps_conndrops); if (errno == ETIMEDOUT && tp->t_softerror) errno = tp->t_softerror; so->so_error = errno; return (tcp_close(tp)); } void tcp_discardcb(struct tcpcb *tp) { struct inpcb *inp = tp->t_inpcb; struct socket *so = inp->inp_socket; #ifdef INET6 int isipv6 = (inp->inp_vflag & INP_IPV6) != 0; #endif /* INET6 */ int released; INP_WLOCK_ASSERT(inp); /* * Make sure that all of our timers are stopped before we delete the * PCB. * * If stopping a timer fails, we schedule a discard function in the same * callout, and the last discard function called will take care of * deleting the tcpcb. */ tcp_timer_stop(tp, TT_REXMT); tcp_timer_stop(tp, TT_PERSIST); tcp_timer_stop(tp, TT_KEEP); tcp_timer_stop(tp, TT_2MSL); tcp_timer_stop(tp, TT_DELACK); if (tp->t_fb->tfb_tcp_timer_stop_all) { /* Call the stop-all function of the methods */ tp->t_fb->tfb_tcp_timer_stop_all(tp); } /* * If we got enough samples through the srtt filter, * save the rtt and rttvar in the routing entry. * 'Enough' is arbitrarily defined as 4 rtt samples. * 4 samples is enough for the srtt filter to converge * to within enough % of the correct value; fewer samples * and we could save a bogus rtt. The danger is not high * as tcp quickly recovers from everything. * XXX: Works very well but needs some more statistics! */ if (tp->t_rttupdated >= 4) { struct hc_metrics_lite metrics; u_long ssthresh; bzero(&metrics, sizeof(metrics)); /* * Always update the ssthresh when the conditions below * are satisfied. This gives us a better start value * for congestion avoidance on new connections. * ssthresh is only set if packet loss occurred on a session. * * XXXRW: 'so' may be NULL here, and/or socket buffer may be * being torn down. Ideally this code would not use 'so'. */ ssthresh = tp->snd_ssthresh; if (ssthresh != 0 && ssthresh < so->so_snd.sb_hiwat / 2) { /* * convert the limit from user data bytes to * packets then to packet data bytes. */ ssthresh = (ssthresh + tp->t_maxseg / 2) / tp->t_maxseg; if (ssthresh < 2) ssthresh = 2; ssthresh *= (u_long)(tp->t_maxseg + #ifdef INET6 (isipv6 ?
sizeof (struct ip6_hdr) + sizeof (struct tcphdr) : #endif sizeof (struct tcpiphdr) #ifdef INET6 ) #endif ); } else ssthresh = 0; metrics.rmx_ssthresh = ssthresh; metrics.rmx_rtt = tp->t_srtt; metrics.rmx_rttvar = tp->t_rttvar; metrics.rmx_cwnd = tp->snd_cwnd; metrics.rmx_sendpipe = 0; metrics.rmx_recvpipe = 0; tcp_hc_update(&inp->inp_inc, &metrics); } /* free the reassembly queue, if any */ tcp_reass_flush(tp); #ifdef TCP_OFFLOAD /* Disconnect offload device, if any. */ if (tp->t_flags & TF_TOE) tcp_offload_detach(tp); #endif tcp_free_sackholes(tp); #ifdef TCPPCAP /* Free the TCP PCAP queues. */ tcp_pcap_drain(&(tp->t_inpkts)); tcp_pcap_drain(&(tp->t_outpkts)); #endif /* Allow the CC algorithm to clean up after itself. */ if (CC_ALGO(tp)->cb_destroy != NULL) CC_ALGO(tp)->cb_destroy(tp->ccv); khelp_destroy_osd(tp->osd); CC_ALGO(tp) = NULL; inp->inp_ppcb = NULL; if ((tp->t_timers->tt_flags & TT_MASK) == 0) { /* We own the last reference on tcpcb, let's free it. */ if ((tp->t_fb->tfb_tcp_timers_left) && (tp->t_fb->tfb_tcp_timers_left(tp))) { /* Some fb timers left running! */ return; } if (tp->t_fb->tfb_tcp_fb_fini) (*tp->t_fb->tfb_tcp_fb_fini)(tp); refcount_release(&tp->t_fb->tfb_refcnt); tp->t_inpcb = NULL; uma_zfree(V_tcpcb_zone, tp); released = in_pcbrele_wlocked(inp); KASSERT(!released, ("%s: inp %p should not have been released " "here", __func__, inp)); } } void tcp_timer_2msl_discard(void *xtp) { tcp_timer_discard((struct tcpcb *)xtp, TT_2MSL); } void tcp_timer_keep_discard(void *xtp) { tcp_timer_discard((struct tcpcb *)xtp, TT_KEEP); } void tcp_timer_persist_discard(void *xtp) { tcp_timer_discard((struct tcpcb *)xtp, TT_PERSIST); } void tcp_timer_rexmt_discard(void *xtp) { tcp_timer_discard((struct tcpcb *)xtp, TT_REXMT); } void tcp_timer_delack_discard(void *xtp) { tcp_timer_discard((struct tcpcb *)xtp, TT_DELACK); } void tcp_timer_discard(struct tcpcb *tp, uint32_t timer_type) { struct inpcb *inp; CURVNET_SET(tp->t_vnet); INP_INFO_RLOCK(&V_tcbinfo); inp = tp->t_inpcb; KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, tp)); INP_WLOCK(inp); KASSERT((tp->t_timers->tt_flags & TT_STOPPED) != 0, ("%s: tcpcb has to be stopped here", __func__)); KASSERT((tp->t_timers->tt_flags & timer_type) != 0, ("%s: discard callout should be running", __func__)); tp->t_timers->tt_flags &= ~timer_type; if ((tp->t_timers->tt_flags & TT_MASK) == 0) { /* We own the last reference on this tcpcb, let's free it. */ if ((tp->t_fb->tfb_tcp_timers_left) && (tp->t_fb->tfb_tcp_timers_left(tp))) { /* Some fb timers left running! */ goto leave; } if (tp->t_fb->tfb_tcp_fb_fini) (*tp->t_fb->tfb_tcp_fb_fini)(tp); refcount_release(&tp->t_fb->tfb_refcnt); tp->t_inpcb = NULL; uma_zfree(V_tcpcb_zone, tp); if (in_pcbrele_wlocked(inp)) { INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); return; } } leave: INP_WUNLOCK(inp); INP_INFO_RUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); } /* * Attempt to close a TCP control block, marking it as dropped, and freeing * the socket if we hold the only reference. */ struct tcpcb * tcp_close(struct tcpcb *tp) { struct inpcb *inp = tp->t_inpcb; struct socket *so; INP_INFO_LOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); #ifdef TCP_OFFLOAD if (tp->t_state == TCPS_LISTEN) tcp_offload_listen_stop(tp); #endif #ifdef TCP_RFC7413 /* * This releases the TFO pending counter resource for TFO listen * sockets as well as passively-created TFO sockets that transition * from SYN_RECEIVED to CLOSED. 
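The ssthresh conversion in tcp_discardcb() above re-quantizes the cached value in whole segments: divide by the MSS with rounding, floor at two segments, then multiply back up including per-packet header overhead. A worked example, with t_maxseg = 1460 and the 40-byte IPv4 tcpiphdr:

#include <stdio.h>

int
main(void)
{
	unsigned long ssthresh = 8000;	/* user data bytes */
	unsigned long maxseg = 1460;	/* MSS */
	unsigned long hdrlen = 40;	/* sizeof(struct tcpiphdr), IPv4 */

	/* Round to the nearest whole segment, never fewer than two. */
	ssthresh = (ssthresh + maxseg / 2) / maxseg;	/* -> 5 */
	if (ssthresh < 2)
		ssthresh = 2;
	/* Back to bytes, now counting header overhead per packet. */
	ssthresh *= (maxseg + hdrlen);
	printf("cached ssthresh = %lu bytes\n", ssthresh);	/* 7500 */
	return (0);
}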
*/ if (tp->t_tfo_pending) { tcp_fastopen_decrement_counter(tp->t_tfo_pending); tp->t_tfo_pending = NULL; } #endif in_pcbdrop(inp); TCPSTAT_INC(tcps_closed); KASSERT(inp->inp_socket != NULL, ("tcp_close: inp_socket NULL")); so = inp->inp_socket; soisdisconnected(so); if (inp->inp_flags & INP_SOCKREF) { KASSERT(so->so_state & SS_PROTOREF, ("tcp_close: !SS_PROTOREF")); inp->inp_flags &= ~INP_SOCKREF; INP_WUNLOCK(inp); ACCEPT_LOCK(); SOCK_LOCK(so); so->so_state &= ~SS_PROTOREF; sofree(so); return (NULL); } return (tp); } void tcp_drain(void) { VNET_ITERATOR_DECL(vnet_iter); if (!do_tcpdrain) return; VNET_LIST_RLOCK_NOSLEEP(); VNET_FOREACH(vnet_iter) { CURVNET_SET(vnet_iter); struct inpcb *inpb; struct tcpcb *tcpb; /* * Walk the tcpcbs, if any exist, and flush the reassembly queue, * if there is one... * XXX: The "Net/3" implementation doesn't imply that the TCP * reassembly queue should be flushed, but in a situation * where we're really low on mbufs, this is potentially * useful. */ INP_INFO_WLOCK(&V_tcbinfo); LIST_FOREACH(inpb, V_tcbinfo.ipi_listhead, inp_list) { if (inpb->inp_flags & INP_TIMEWAIT) continue; INP_WLOCK(inpb); if ((tcpb = intotcpcb(inpb)) != NULL) { tcp_reass_flush(tcpb); tcp_clean_sackreport(tcpb); } INP_WUNLOCK(inpb); } INP_INFO_WUNLOCK(&V_tcbinfo); CURVNET_RESTORE(); } VNET_LIST_RUNLOCK_NOSLEEP(); } /* * Notify a tcp user of an asynchronous error; * store error as soft error, but wake up user * (for now, won't do anything until can select for soft error). * * Do not wake up user since there currently is no mechanism for * reporting soft errors (yet - a kqueue filter may be added). */ static struct inpcb * tcp_notify(struct inpcb *inp, int error) { struct tcpcb *tp; INP_INFO_LOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); if ((inp->inp_flags & INP_TIMEWAIT) || (inp->inp_flags & INP_DROPPED)) return (inp); tp = intotcpcb(inp); KASSERT(tp != NULL, ("tcp_notify: tp == NULL")); /* * Ignore some errors if we are hooked up. * If connection hasn't completed, has retransmitted several times, * and receives a second error, give up now. This is better * than waiting a long time to establish a connection that * can never complete. */ if (tp->t_state == TCPS_ESTABLISHED && (error == EHOSTUNREACH || error == ENETUNREACH || error == EHOSTDOWN)) { return (inp); } else if (tp->t_state < TCPS_ESTABLISHED && tp->t_rxtshift > 3 && tp->t_softerror) { tp = tcp_drop(tp, error); if (tp != NULL) return (inp); else return (NULL); } else { tp->t_softerror = error; return (inp); } #if 0 wakeup( &so->so_timeo); sorwakeup(so); sowwakeup(so); #endif } static int tcp_pcblist(SYSCTL_HANDLER_ARGS) { int error, i, m, n, pcb_count; struct inpcb *inp, **inp_list; inp_gen_t gencnt; struct xinpgen xig; /* * The process of preparing the TCB list is too time-consuming and * resource-intensive to repeat twice on every request. */ if (req->oldptr == NULL) { n = V_tcbinfo.ipi_count + syncache_pcbcount(); n += imax(n / 8, 10); req->oldidx = 2 * (sizeof xig) + n * sizeof(struct xtcpcb); return (0); } if (req->newptr != NULL) return (EPERM); /* * OK, now we're committed to doing something.
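The req->oldptr == NULL branch of tcp_pcblist() above implements the usual two-step sysctl protocol: on the sizing pass it reports the space needed, padded by an eighth (and at least ten extra slots) so the estimate remains sufficient while the caller allocates and comes back. The estimate in sketch form (the record sizes below are illustrative, not the real struct sizes):

#include <stdio.h>

#define imax(a, b)	((a) > (b) ? (a) : (b))

/* Two header records bracket the data; pad the count with slack for
 * connections created between the sizing pass and the real read. */
static size_t
estimate(size_t hdrsz, size_t recsz, int count)
{
	count += imax(count / 8, 10);
	return (2 * hdrsz + (size_t)count * recsz);
}

int
main(void)
{
	printf("%zu bytes\n", estimate(24, 744, 100));
	return (0);
}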
*/ INP_LIST_RLOCK(&V_tcbinfo); gencnt = V_tcbinfo.ipi_gencnt; n = V_tcbinfo.ipi_count; INP_LIST_RUNLOCK(&V_tcbinfo); m = syncache_pcbcount(); error = sysctl_wire_old_buffer(req, 2 * (sizeof xig) + (n + m) * sizeof(struct xtcpcb)); if (error != 0) return (error); xig.xig_len = sizeof xig; xig.xig_count = n + m; xig.xig_gen = gencnt; xig.xig_sogen = so_gencnt; error = SYSCTL_OUT(req, &xig, sizeof xig); if (error) return (error); error = syncache_pcblist(req, m, &pcb_count); if (error) return (error); inp_list = malloc(n * sizeof *inp_list, M_TEMP, M_WAITOK); if (inp_list == NULL) return (ENOMEM); INP_INFO_WLOCK(&V_tcbinfo); for (inp = LIST_FIRST(V_tcbinfo.ipi_listhead), i = 0; inp != NULL && i < n; inp = LIST_NEXT(inp, inp_list)) { INP_WLOCK(inp); if (inp->inp_gencnt <= gencnt) { /* * XXX: This use of cr_cansee(), introduced with * TCP state changes, is not quite right, but for * now, better than nothing. */ if (inp->inp_flags & INP_TIMEWAIT) { if (intotw(inp) != NULL) error = cr_cansee(req->td->td_ucred, intotw(inp)->tw_cred); else error = EINVAL; /* Skip this inp. */ } else error = cr_canseeinpcb(req->td->td_ucred, inp); if (error == 0) { in_pcbref(inp); inp_list[i++] = inp; } } INP_WUNLOCK(inp); } INP_INFO_WUNLOCK(&V_tcbinfo); n = i; error = 0; for (i = 0; i < n; i++) { inp = inp_list[i]; INP_RLOCK(inp); if (inp->inp_gencnt <= gencnt) { struct xtcpcb xt; void *inp_ppcb; bzero(&xt, sizeof(xt)); xt.xt_len = sizeof xt; /* XXX should avoid extra copy */ bcopy(inp, &xt.xt_inp, sizeof *inp); inp_ppcb = inp->inp_ppcb; if (inp_ppcb == NULL) bzero((char *) &xt.xt_tp, sizeof xt.xt_tp); else if (inp->inp_flags & INP_TIMEWAIT) { bzero((char *) &xt.xt_tp, sizeof xt.xt_tp); xt.xt_tp.t_state = TCPS_TIME_WAIT; } else { bcopy(inp_ppcb, &xt.xt_tp, sizeof xt.xt_tp); if (xt.xt_tp.t_timers) tcp_timer_to_xtimer(&xt.xt_tp, xt.xt_tp.t_timers, &xt.xt_timer); } if (inp->inp_socket != NULL) sotoxsocket(inp->inp_socket, &xt.xt_socket); else { bzero(&xt.xt_socket, sizeof xt.xt_socket); xt.xt_socket.xso_protocol = IPPROTO_TCP; } xt.xt_inp.inp_gencnt = inp->inp_gencnt; INP_RUNLOCK(inp); error = SYSCTL_OUT(req, &xt, sizeof xt); } else INP_RUNLOCK(inp); } INP_INFO_RLOCK(&V_tcbinfo); for (i = 0; i < n; i++) { inp = inp_list[i]; INP_RLOCK(inp); if (!in_pcbrele_rlocked(inp)) INP_RUNLOCK(inp); } INP_INFO_RUNLOCK(&V_tcbinfo); if (!error) { /* * Give the user an updated idea of our state. * If the generation differs from what we told * her before, she knows that something happened * while we were processing this request, and it * might be necessary to retry. 
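tcp_pcblist() brackets its output with two xinpgen records carrying the pcbinfo generation count: one written before the walk (above) and one after (below). A consumer that sees different generations in the two headers knows connections came or went mid-walk and that the snapshot may need to be retried. The convention reduced to its essentials (illustrative types, not the real xinpgen layout):

#include <stdbool.h>
#include <stdint.h>

struct snapshot_hdr {
	uint64_t gen;		/* generation count when written */
};

/* Equal leading and trailing generations mean the records between
 * the two headers form a consistent snapshot. */
static bool
snapshot_consistent(const struct snapshot_hdr *head,
    const struct snapshot_hdr *tail)
{
	return (head->gen == tail->gen);
}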
*/ INP_LIST_RLOCK(&V_tcbinfo); xig.xig_gen = V_tcbinfo.ipi_gencnt; xig.xig_sogen = so_gencnt; xig.xig_count = V_tcbinfo.ipi_count + pcb_count; INP_LIST_RUNLOCK(&V_tcbinfo); error = SYSCTL_OUT(req, &xig, sizeof xig); } free(inp_list, M_TEMP); return (error); } SYSCTL_PROC(_net_inet_tcp, TCPCTL_PCBLIST, pcblist, CTLTYPE_OPAQUE | CTLFLAG_RD, NULL, 0, tcp_pcblist, "S,xtcpcb", "List of active TCP connections"); #ifdef INET static int tcp_getcred(SYSCTL_HANDLER_ARGS) { struct xucred xuc; struct sockaddr_in addrs[2]; struct inpcb *inp; int error; error = priv_check(req->td, PRIV_NETINET_GETCRED); if (error) return (error); error = SYSCTL_IN(req, addrs, sizeof(addrs)); if (error) return (error); inp = in_pcblookup(&V_tcbinfo, addrs[1].sin_addr, addrs[1].sin_port, addrs[0].sin_addr, addrs[0].sin_port, INPLOOKUP_RLOCKPCB, NULL); if (inp != NULL) { if (inp->inp_socket == NULL) error = ENOENT; if (error == 0) error = cr_canseeinpcb(req->td->td_ucred, inp); if (error == 0) cru2x(inp->inp_cred, &xuc); INP_RUNLOCK(inp); } else error = ENOENT; if (error == 0) error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred)); return (error); } SYSCTL_PROC(_net_inet_tcp, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW|CTLFLAG_PRISON, 0, 0, tcp_getcred, "S,xucred", "Get the xucred of a TCP connection"); #endif /* INET */ #ifdef INET6 static int tcp6_getcred(SYSCTL_HANDLER_ARGS) { struct xucred xuc; struct sockaddr_in6 addrs[2]; struct inpcb *inp; int error; #ifdef INET int mapped = 0; #endif error = priv_check(req->td, PRIV_NETINET_GETCRED); if (error) return (error); error = SYSCTL_IN(req, addrs, sizeof(addrs)); if (error) return (error); if ((error = sa6_embedscope(&addrs[0], V_ip6_use_defzone)) != 0 || (error = sa6_embedscope(&addrs[1], V_ip6_use_defzone)) != 0) { return (error); } if (IN6_IS_ADDR_V4MAPPED(&addrs[0].sin6_addr)) { #ifdef INET if (IN6_IS_ADDR_V4MAPPED(&addrs[1].sin6_addr)) mapped = 1; else #endif return (EINVAL); } #ifdef INET if (mapped == 1) inp = in_pcblookup(&V_tcbinfo, *(struct in_addr *)&addrs[1].sin6_addr.s6_addr[12], addrs[1].sin6_port, *(struct in_addr *)&addrs[0].sin6_addr.s6_addr[12], addrs[0].sin6_port, INPLOOKUP_RLOCKPCB, NULL); else #endif inp = in6_pcblookup(&V_tcbinfo, &addrs[1].sin6_addr, addrs[1].sin6_port, &addrs[0].sin6_addr, addrs[0].sin6_port, INPLOOKUP_RLOCKPCB, NULL); if (inp != NULL) { if (inp->inp_socket == NULL) error = ENOENT; if (error == 0) error = cr_canseeinpcb(req->td->td_ucred, inp); if (error == 0) cru2x(inp->inp_cred, &xuc); INP_RUNLOCK(inp); } else error = ENOENT; if (error == 0) error = SYSCTL_OUT(req, &xuc, sizeof(struct xucred)); return (error); } SYSCTL_PROC(_net_inet6_tcp6, OID_AUTO, getcred, CTLTYPE_OPAQUE|CTLFLAG_RW|CTLFLAG_PRISON, 0, 0, tcp6_getcred, "S,xucred", "Get the xucred of a TCP6 connection"); #endif /* INET6 */ #ifdef INET void tcp_ctlinput(int cmd, struct sockaddr *sa, void *vip) { struct ip *ip = vip; struct tcphdr *th; struct in_addr faddr; struct inpcb *inp; struct tcpcb *tp; struct inpcb *(*notify)(struct inpcb *, int) = tcp_notify; struct icmp *icp; struct in_conninfo inc; tcp_seq icmp_tcp_seq; int mtu; faddr = ((struct sockaddr_in *)sa)->sin_addr; if (sa->sa_family != AF_INET || faddr.s_addr == INADDR_ANY) return; if (cmd == PRC_MSGSIZE) notify = tcp_mtudisc_notify; else if (V_icmp_may_rst && (cmd == PRC_UNREACH_ADMIN_PROHIB || cmd == PRC_UNREACH_PORT || cmd == PRC_TIMXCEED_INTRANS) && ip) notify = tcp_drop_syn_sent; /* * Redirects don't need to be handled up here. 
*/ else if (PRC_IS_REDIRECT(cmd)) return; /* * Hostdead is ugly because it goes linearly through all PCBs. * XXX: We never get this from ICMP, otherwise it makes an * excellent DoS attack on machines with many connections. */ else if (cmd == PRC_HOSTDEAD) ip = NULL; else if ((unsigned)cmd >= PRC_NCMDS || inetctlerrmap[cmd] == 0) return; if (ip == NULL) { in_pcbnotifyall(&V_tcbinfo, faddr, inetctlerrmap[cmd], notify); return; } icp = (struct icmp *)((caddr_t)ip - offsetof(struct icmp, icmp_ip)); th = (struct tcphdr *)((caddr_t)ip + (ip->ip_hl << 2)); INP_INFO_RLOCK(&V_tcbinfo); inp = in_pcblookup(&V_tcbinfo, faddr, th->th_dport, ip->ip_src, th->th_sport, INPLOOKUP_WLOCKPCB, NULL); if (inp != NULL) { if (!(inp->inp_flags & INP_TIMEWAIT) && !(inp->inp_flags & INP_DROPPED) && !(inp->inp_socket == NULL)) { icmp_tcp_seq = ntohl(th->th_seq); tp = intotcpcb(inp); if (SEQ_GEQ(icmp_tcp_seq, tp->snd_una) && SEQ_LT(icmp_tcp_seq, tp->snd_max)) { if (cmd == PRC_MSGSIZE) { /* * MTU discovery: * If we got a needfrag set the MTU * in the route to the suggested new * value (if given) and then notify. */ mtu = ntohs(icp->icmp_nextmtu); /* * If no alternative MTU was * proposed, try the next smaller * one. */ if (!mtu) mtu = ip_next_mtu( ntohs(ip->ip_len), 1); if (mtu < V_tcp_minmss + sizeof(struct tcpiphdr)) mtu = V_tcp_minmss + sizeof(struct tcpiphdr); /* * Only process the offered MTU if it * is smaller than the current one. */ if (mtu < tp->t_maxseg + sizeof(struct tcpiphdr)) { bzero(&inc, sizeof(inc)); inc.inc_faddr = faddr; inc.inc_fibnum = inp->inp_inc.inc_fibnum; tcp_hc_updatemtu(&inc, mtu); tcp_mtudisc(inp, mtu); } } else inp = (*notify)(inp, inetctlerrmap[cmd]); } } if (inp != NULL) INP_WUNLOCK(inp); } else { bzero(&inc, sizeof(inc)); inc.inc_fport = th->th_dport; inc.inc_lport = th->th_sport; inc.inc_faddr = faddr; inc.inc_laddr = ip->ip_src; syncache_unreach(&inc, th); } INP_INFO_RUNLOCK(&V_tcbinfo); } #endif /* INET */ #ifdef INET6 void tcp6_ctlinput(int cmd, struct sockaddr *sa, void *d) { struct tcphdr th; struct inpcb *(*notify)(struct inpcb *, int) = tcp_notify; struct ip6_hdr *ip6; struct mbuf *m; struct ip6ctlparam *ip6cp = NULL; const struct sockaddr_in6 *sa6_src = NULL; int off; struct tcp_portonly { u_int16_t th_sport; u_int16_t th_dport; } *thp; if (sa->sa_family != AF_INET6 || sa->sa_len != sizeof(struct sockaddr_in6)) return; if (cmd == PRC_MSGSIZE) notify = tcp_mtudisc_notify; else if (!PRC_IS_REDIRECT(cmd) && ((unsigned)cmd >= PRC_NCMDS || inet6ctlerrmap[cmd] == 0)) return; /* if the parameter is from icmp6, decode it. */ if (d != NULL) { ip6cp = (struct ip6ctlparam *)d; m = ip6cp->ip6c_m; ip6 = ip6cp->ip6c_ip6; off = ip6cp->ip6c_off; sa6_src = ip6cp->ip6c_src; } else { m = NULL; ip6 = NULL; off = 0; /* fool gcc */ sa6_src = &sa6_any; } if (ip6 != NULL) { struct in_conninfo inc; /* * XXX: We assume that when IPV6 is non NULL, * M and OFF are valid. 
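The PRC_MSGSIZE handling above boils down to a simple policy: trust the router's suggested MTU when one is present, otherwise fall back to the next smaller plateau for the failing packet size, never go below the minimum usable segment, and only ever shrink the current value. A compact restatement (the plateau table is the classic RFC 1191 list; the kernel's ip_next_mtu() is the authoritative version):

#include <stdio.h>

static const int plateaus[] = { 65535, 32000, 17914, 8166, 4352, 2002,
	1492, 1006, 508, 296, 68 };

/* Next plateau strictly below the failing packet size. */
static int
next_smaller_mtu(int len)
{
	for (unsigned int i = 0; i < sizeof(plateaus) / sizeof(plateaus[0]); i++)
		if (plateaus[i] < len)
			return (plateaus[i]);
	return (68);	/* IPv4 minimum */
}

/* Clamp policy mirrored from tcp_ctlinput(): floor at minimum MSS
 * plus headers, and never grow the MTU from an ICMP hint. */
static int
pmtu_update(int suggested, int pktlen, int cur, int minmss, int hdrlen)
{
	int mtu = (suggested != 0) ? suggested : next_smaller_mtu(pktlen);

	if (mtu < minmss + hdrlen)
		mtu = minmss + hdrlen;
	return (mtu < cur ? mtu : cur);
}

int
main(void)
{
	/* No suggested MTU, a 1500-byte packet failed: step down to 1492. */
	printf("%d\n", pmtu_update(0, 1500, 1500, 216, 40));
	return (0);
}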
*/ /* check if we can safely examine src and dst ports */ if (m->m_pkthdr.len < off + sizeof(*thp)) return; bzero(&th, sizeof(th)); m_copydata(m, off, sizeof(*thp), (caddr_t)&th); in6_pcbnotify(&V_tcbinfo, sa, th.th_dport, (struct sockaddr *)ip6cp->ip6c_src, th.th_sport, cmd, NULL, notify); bzero(&inc, sizeof(inc)); inc.inc_fport = th.th_dport; inc.inc_lport = th.th_sport; inc.inc6_faddr = ((struct sockaddr_in6 *)sa)->sin6_addr; inc.inc6_laddr = ip6cp->ip6c_src->sin6_addr; inc.inc_flags |= INC_ISIPV6; INP_INFO_RLOCK(&V_tcbinfo); syncache_unreach(&inc, &th); INP_INFO_RUNLOCK(&V_tcbinfo); } else in6_pcbnotify(&V_tcbinfo, sa, 0, (const struct sockaddr *)sa6_src, 0, cmd, NULL, notify); } #endif /* INET6 */ /* * Following is where TCP initial sequence number generation occurs. * * There are two places where we must use initial sequence numbers: * 1. In SYN-ACK packets. * 2. In SYN packets. * * All ISNs for SYN-ACK packets are generated by the syncache. See * tcp_syncache.c for details. * * The ISNs in SYN packets must be monotonic; TIME_WAIT recycling * depends on this property. In addition, these ISNs should be * unguessable so as to prevent connection hijacking. To satisfy * the requirements of this situation, the algorithm outlined in * RFC 1948 is used, with only small modifications. * * Implementation details: * * Time is based off the system timer, and is corrected so that it * increases by one megabyte per second. This allows for proper * recycling on high speed LANs while still leaving over an hour * before rollover. * * As reading the *exact* system time is too expensive to be done * whenever setting up a TCP connection, we increment the time * offset in two ways. First, a small random positive increment * is added to isn_offset for each connection that is set up. * Second, the function tcp_isn_tick fires once per clock tick * and increments isn_offset as necessary so that sequence numbers * are incremented at approximately ISN_BYTES_PER_SECOND. The * random positive increments serve only to ensure that the same * exact sequence number is never sent out twice (as could otherwise * happen when a port is recycled in less than the system tick * interval.) * * net.inet.tcp.isn_reseed_interval controls the number of seconds * between seeding of isn_secret. This is normally set to zero, * as reseeding should not be necessary. * * Locking of the global variables isn_secret, isn_last_reseed, isn_offset, * isn_offset_old, and isn_ctx is performed using the TCP pcbinfo lock. In * general, this means holding an exclusive (write) lock. */ #define ISN_BYTES_PER_SECOND 1048576 #define ISN_STATIC_INCREMENT 4096 #define ISN_RANDOM_INCREMENT (4096 - 1) static VNET_DEFINE(u_char, isn_secret[32]); static VNET_DEFINE(int, isn_last); static VNET_DEFINE(int, isn_last_reseed); static VNET_DEFINE(u_int32_t, isn_offset); static VNET_DEFINE(u_int32_t, isn_offset_old); #define V_isn_secret VNET(isn_secret) #define V_isn_last VNET(isn_last) #define V_isn_last_reseed VNET(isn_last_reseed) #define V_isn_offset VNET(isn_offset) #define V_isn_offset_old VNET(isn_offset_old) tcp_seq tcp_new_isn(struct tcpcb *tp) { MD5_CTX isn_ctx; u_int32_t md5_buffer[4]; tcp_seq new_isn; u_int32_t projected_offset; INP_WLOCK_ASSERT(tp->t_inpcb); ISN_LOCK(); /* Seed if this is the first use, reseed if requested. 
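The block comment above describes the RFC 1948 scheme: hash the connection 4-tuple together with a periodically reseeded secret, then add a monotonically advancing offset so ISNs both look unguessable and keep increasing. Stripped of the locking, VNET and tick bookkeeping, the core computation looks roughly like this (a userland sketch using OpenSSL's MD5; the kernel uses its own MD5 and adds a further arc4random() increment):

#include <stdint.h>
#include <openssl/md5.h>

static unsigned char isn_secret[32];	/* seeded from a CSPRNG, reseeded on request */
static uint32_t isn_offset;		/* advanced ~1 MB/s plus per-connection bumps */

/* RFC 1948 style: ISN = offset + hash(4-tuple, secret). */
static uint32_t
new_isn(uint32_t laddr, uint16_t lport, uint32_t faddr, uint16_t fport)
{
	MD5_CTX ctx;
	uint32_t digest[4];

	MD5_Init(&ctx);
	MD5_Update(&ctx, &fport, sizeof(fport));
	MD5_Update(&ctx, &lport, sizeof(lport));
	MD5_Update(&ctx, &faddr, sizeof(faddr));
	MD5_Update(&ctx, &laddr, sizeof(laddr));
	MD5_Update(&ctx, isn_secret, sizeof(isn_secret));
	MD5_Final((unsigned char *)digest, &ctx);

	isn_offset += 4096;	/* ISN_STATIC_INCREMENT; random part omitted here */
	return (digest[0] + isn_offset);
}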
*/ if ((V_isn_last_reseed == 0) || ((V_tcp_isn_reseed_interval > 0) && (((u_int)V_isn_last_reseed + (u_int)V_tcp_isn_reseed_interval*hz) < (u_int)ticks))) { read_random(&V_isn_secret, sizeof(V_isn_secret)); V_isn_last_reseed = ticks; } /* Compute the md5 hash and return the ISN. */ MD5Init(&isn_ctx); MD5Update(&isn_ctx, (u_char *) &tp->t_inpcb->inp_fport, sizeof(u_short)); MD5Update(&isn_ctx, (u_char *) &tp->t_inpcb->inp_lport, sizeof(u_short)); #ifdef INET6 if ((tp->t_inpcb->inp_vflag & INP_IPV6) != 0) { MD5Update(&isn_ctx, (u_char *) &tp->t_inpcb->in6p_faddr, sizeof(struct in6_addr)); MD5Update(&isn_ctx, (u_char *) &tp->t_inpcb->in6p_laddr, sizeof(struct in6_addr)); } else #endif { MD5Update(&isn_ctx, (u_char *) &tp->t_inpcb->inp_faddr, sizeof(struct in_addr)); MD5Update(&isn_ctx, (u_char *) &tp->t_inpcb->inp_laddr, sizeof(struct in_addr)); } MD5Update(&isn_ctx, (u_char *) &V_isn_secret, sizeof(V_isn_secret)); MD5Final((u_char *) &md5_buffer, &isn_ctx); new_isn = (tcp_seq) md5_buffer[0]; V_isn_offset += ISN_STATIC_INCREMENT + (arc4random() & ISN_RANDOM_INCREMENT); if (ticks != V_isn_last) { projected_offset = V_isn_offset_old + ISN_BYTES_PER_SECOND / hz * (ticks - V_isn_last); if (SEQ_GT(projected_offset, V_isn_offset)) V_isn_offset = projected_offset; V_isn_offset_old = V_isn_offset; V_isn_last = ticks; } new_isn += V_isn_offset; ISN_UNLOCK(); return (new_isn); } /* * When a specific ICMP unreachable message is received and the * connection state is SYN-SENT, drop the connection. This behavior * is controlled by the icmp_may_rst sysctl. */ struct inpcb * tcp_drop_syn_sent(struct inpcb *inp, int errno) { struct tcpcb *tp; INP_INFO_RLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); if ((inp->inp_flags & INP_TIMEWAIT) || (inp->inp_flags & INP_DROPPED)) return (inp); tp = intotcpcb(inp); if (tp->t_state != TCPS_SYN_SENT) return (inp); tp = tcp_drop(tp, errno); if (tp != NULL) return (inp); else return (NULL); } /* * When `need fragmentation' ICMP is received, update our idea of the MSS * based on the new value. Also nudge TCP to send something, since we * know the packet we just sent was dropped. * This duplicates some code in the tcp_mss() function in tcp_input.c. */ static struct inpcb * tcp_mtudisc_notify(struct inpcb *inp, int error) { tcp_mtudisc(inp, -1); return (inp); } static void tcp_mtudisc(struct inpcb *inp, int mtuoffer) { struct tcpcb *tp; struct socket *so; INP_WLOCK_ASSERT(inp); if ((inp->inp_flags & INP_TIMEWAIT) || (inp->inp_flags & INP_DROPPED)) return; tp = intotcpcb(inp); KASSERT(tp != NULL, ("tcp_mtudisc: tp == NULL")); tcp_mss_update(tp, -1, mtuoffer, NULL, NULL); so = inp->inp_socket; SOCKBUF_LOCK(&so->so_snd); /* If the mss is larger than the socket buffer, decrease the mss. */ if (so->so_snd.sb_hiwat < tp->t_maxseg) tp->t_maxseg = so->so_snd.sb_hiwat; SOCKBUF_UNLOCK(&so->so_snd); TCPSTAT_INC(tcps_mturesent); tp->t_rtttime = 0; tp->snd_nxt = tp->snd_una; tcp_free_sackholes(tp); tp->snd_recover = tp->snd_max; if (tp->t_flags & TF_SACK_PERMIT) EXIT_FASTRECOVERY(tp->t_flags); tp->t_fb->tfb_tcp_output(tp); } #ifdef INET /* * Look-up the routing entry to the peer of this inpcb. If no route * is found and it cannot be allocated, then return 0. This routine * is called by TCP routines that access the rmx structure and by * tcp_mss_update to get the peer/interface MTU. 
*/ u_long tcp_maxmtu(struct in_conninfo *inc, struct tcp_ifcap *cap) { - struct route sro; - struct sockaddr_in *dst; + struct nhop4_extended nh4; struct ifnet *ifp; u_long maxmtu = 0; KASSERT(inc != NULL, ("tcp_maxmtu with NULL in_conninfo pointer")); - bzero(&sro, sizeof(sro)); if (inc->inc_faddr.s_addr != INADDR_ANY) { - dst = (struct sockaddr_in *)&sro.ro_dst; - dst->sin_family = AF_INET; - dst->sin_len = sizeof(*dst); - dst->sin_addr = inc->inc_faddr; - in_rtalloc_ign(&sro, 0, inc->inc_fibnum); - } - if (sro.ro_rt != NULL) { - ifp = sro.ro_rt->rt_ifp; - if (sro.ro_rt->rt_mtu == 0) - maxmtu = ifp->if_mtu; - else - maxmtu = min(sro.ro_rt->rt_mtu, ifp->if_mtu); + if (fib4_lookup_nh_ext(inc->inc_fibnum, inc->inc_faddr, + NHR_REF, 0, &nh4) != 0) + return (0); + + ifp = nh4.nh_ifp; + maxmtu = nh4.nh_mtu; + /* Report additional interface capabilities. */ if (cap != NULL) { if (ifp->if_capenable & IFCAP_TSO4 && ifp->if_hwassist & CSUM_TSO) { cap->ifcap |= CSUM_TSO; cap->tsomax = ifp->if_hw_tsomax; cap->tsomaxsegcount = ifp->if_hw_tsomaxsegcount; cap->tsomaxsegsize = ifp->if_hw_tsomaxsegsize; } } - RTFREE(sro.ro_rt); + fib4_free_nh_ext(inc->inc_fibnum, &nh4); } return (maxmtu); } #endif /* INET */ #ifdef INET6 u_long tcp_maxmtu6(struct in_conninfo *inc, struct tcp_ifcap *cap) { - struct route_in6 sro6; + struct nhop6_extended nh6; + struct in6_addr dst6; + uint32_t scopeid; struct ifnet *ifp; u_long maxmtu = 0; KASSERT(inc != NULL, ("tcp_maxmtu6 with NULL in_conninfo pointer")); - bzero(&sro6, sizeof(sro6)); if (!IN6_IS_ADDR_UNSPECIFIED(&inc->inc6_faddr)) { - sro6.ro_dst.sin6_family = AF_INET6; - sro6.ro_dst.sin6_len = sizeof(struct sockaddr_in6); - sro6.ro_dst.sin6_addr = inc->inc6_faddr; - in6_rtalloc_ign(&sro6, 0, inc->inc_fibnum); - } - if (sro6.ro_rt != NULL) { - ifp = sro6.ro_rt->rt_ifp; - if (sro6.ro_rt->rt_mtu == 0) - maxmtu = IN6_LINKMTU(sro6.ro_rt->rt_ifp); - else - maxmtu = min(sro6.ro_rt->rt_mtu, - IN6_LINKMTU(sro6.ro_rt->rt_ifp)); + in6_splitscope(&inc->inc6_faddr, &dst6, &scopeid); + if (fib6_lookup_nh_ext(inc->inc_fibnum, &dst6, scopeid, 0, + 0, &nh6) != 0) + return (0); + ifp = nh6.nh_ifp; + maxmtu = nh6.nh_mtu; + /* Report additional interface capabilities. */ if (cap != NULL) { if (ifp->if_capenable & IFCAP_TSO6 && ifp->if_hwassist & CSUM_TSO) { cap->ifcap |= CSUM_TSO; cap->tsomax = ifp->if_hw_tsomax; cap->tsomaxsegcount = ifp->if_hw_tsomaxsegcount; cap->tsomaxsegsize = ifp->if_hw_tsomaxsegsize; } } - RTFREE(sro6.ro_rt); + fib6_free_nh_ext(inc->inc_fibnum, &nh6); } return (maxmtu); } #endif /* INET6 */ /* * Calculate effective SMSS per RFC5681 definition for a given TCP * connection at its current state, taking into account SACK and etc. */ u_int tcp_maxseg(const struct tcpcb *tp) { u_int optlen; if (tp->t_flags & TF_NOOPT) return (tp->t_maxseg); /* * Here we have a simplified code from tcp_addoptions(), * without a proper loop, and having most of paddings hardcoded. * We might make mistakes with padding here in some edge cases, * but this is harmless, since result of tcp_maxseg() is used * only in cwnd and ssthresh estimations. 
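tcp_maxseg(), whose body follows, computes the effective SMSS of RFC 5681 by subtracting the expected option bytes, each rounded up to a 4-byte boundary by the PAD macro, from t_maxseg. A worked standalone version for an established connection carrying timestamps plus two SACK blocks (option lengths per RFCs 1323 and 2018):

#include <stdio.h>

#define PAD(len)	((((len) / 4) + !!((len) % 4)) * 4)	/* round up to 4 */

int
main(void)
{
	unsigned int maxseg = 1460;
	unsigned int optlen = 12;	/* TCPOLEN_TSTAMP_APPA: timestamps + 2 NOPs */

	optlen += 2 + 2 * 8;		/* TCPOLEN_SACKHDR + 2 * TCPOLEN_SACK */
	optlen = PAD(optlen);		/* 30 -> 32 */
	printf("effective SMSS = %u\n", maxseg - optlen);	/* 1428 */
	return (0);
}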
*/ #define PAD(len) ((((len) / 4) + !!((len) % 4)) * 4) if (TCPS_HAVEESTABLISHED(tp->t_state)) { if (tp->t_flags & TF_RCVD_TSTMP) optlen = TCPOLEN_TSTAMP_APPA; else optlen = 0; #ifdef TCP_SIGNATURE if (tp->t_flags & TF_SIGNATURE) optlen += PAD(TCPOLEN_SIGNATURE); #endif if ((tp->t_flags & TF_SACK_PERMIT) && tp->rcv_numsacks > 0) { optlen += TCPOLEN_SACKHDR; optlen += tp->rcv_numsacks * TCPOLEN_SACK; optlen = PAD(optlen); } } else { if (tp->t_flags & TF_REQ_TSTMP) optlen = TCPOLEN_TSTAMP_APPA; else optlen = PAD(TCPOLEN_MAXSEG); if (tp->t_flags & TF_REQ_SCALE) optlen += PAD(TCPOLEN_WINDOW); #ifdef TCP_SIGNATURE if (tp->t_flags & TF_SIGNATURE) optlen += PAD(TCPOLEN_SIGNATURE); #endif if (tp->t_flags & TF_SACK_PERMIT) optlen += PAD(TCPOLEN_SACK_PERMITTED); } #undef PAD optlen = min(optlen, TCP_MAXOLEN); return (tp->t_maxseg - optlen); } #ifdef IPSEC /* compute ESP/AH header size for TCP, including outer IP header. */ size_t ipsec_hdrsiz_tcp(struct tcpcb *tp) { struct inpcb *inp; struct mbuf *m; size_t hdrsiz; struct ip *ip; #ifdef INET6 struct ip6_hdr *ip6; #endif struct tcphdr *th; if ((tp == NULL) || ((inp = tp->t_inpcb) == NULL) || (!key_havesp(IPSEC_DIR_OUTBOUND))) return (0); m = m_gethdr(M_NOWAIT, MT_DATA); if (!m) return (0); #ifdef INET6 if ((inp->inp_vflag & INP_IPV6) != 0) { ip6 = mtod(m, struct ip6_hdr *); th = (struct tcphdr *)(ip6 + 1); m->m_pkthdr.len = m->m_len = sizeof(struct ip6_hdr) + sizeof(struct tcphdr); tcpip_fillheaders(inp, ip6, th); hdrsiz = ipsec_hdrsiz(m, IPSEC_DIR_OUTBOUND, inp); } else #endif /* INET6 */ { ip = mtod(m, struct ip *); th = (struct tcphdr *)(ip + 1); m->m_pkthdr.len = m->m_len = sizeof(struct tcpiphdr); tcpip_fillheaders(inp, ip, th); hdrsiz = ipsec_hdrsiz(m, IPSEC_DIR_OUTBOUND, inp); } m_free(m); return (hdrsiz); } #endif /* IPSEC */ #ifdef TCP_SIGNATURE /* * Callback function invoked by m_apply() to digest TCP segment data * contained within an mbuf chain. */ static int tcp_signature_apply(void *fstate, void *data, u_int len) { MD5Update(fstate, (u_char *)data, len); return (0); } /* * XXX The key is retrieved from the system's PF_KEY SADB, by keying a * search with the destination IP address, and a 'magic SPI' to be * determined by the application. This is hardcoded elsewhere to 1179 */ struct secasvar * tcp_get_sav(struct mbuf *m, u_int direction) { union sockaddr_union dst; struct secasvar *sav; struct ip *ip; #ifdef INET6 struct ip6_hdr *ip6; char ip6buf[INET6_ADDRSTRLEN]; #endif /* Extract the destination from the IP header in the mbuf. */ bzero(&dst, sizeof(union sockaddr_union)); ip = mtod(m, struct ip *); #ifdef INET6 ip6 = NULL; /* Make the compiler happy. */ #endif switch (ip->ip_v) { #ifdef INET case IPVERSION: dst.sa.sa_len = sizeof(struct sockaddr_in); dst.sa.sa_family = AF_INET; dst.sin.sin_addr = (direction == IPSEC_DIR_INBOUND) ? ip->ip_src : ip->ip_dst; break; #endif #ifdef INET6 case (IPV6_VERSION >> 4): ip6 = mtod(m, struct ip6_hdr *); dst.sa.sa_len = sizeof(struct sockaddr_in6); dst.sa.sa_family = AF_INET6; dst.sin6.sin6_addr = (direction == IPSEC_DIR_INBOUND) ? ip6->ip6_src : ip6->ip6_dst; break; #endif default: return (NULL); /* NOTREACHED */ break; } /* Look up an SADB entry which matches the address of the peer. */ sav = KEY_ALLOCSA(&dst, IPPROTO_TCP, htonl(TCP_SIG_SPI)); if (sav == NULL) { ipseclog((LOG_ERR, "%s: SADB lookup failed for %s\n", __func__, (ip->ip_v == IPVERSION) ? inet_ntoa(dst.sin.sin_addr) : #ifdef INET6 (ip->ip_v == (IPV6_VERSION >> 4)) ? 
ip6_sprintf(ip6buf, &dst.sin6.sin6_addr) : #endif "(unsupported)")); } return (sav); } /* * Compute TCP-MD5 hash of a TCP segment. (RFC2385) * * Parameters: * m pointer to head of mbuf chain * len length of TCP segment data, excluding options * optlen length of TCP segment options * buf pointer to storage for computed MD5 digest * sav pointer to security association * * We do this over ip, tcphdr, segment data, and the key in the SADB. * When called from tcp_input(), we can be sure that th_sum has been * zeroed out and verified already. * * Releases reference to SADB key before return. * * Return 0 if successful, otherwise return -1. * */ int tcp_signature_do_compute(struct mbuf *m, int len, int optlen, u_char *buf, struct secasvar *sav) { #ifdef INET struct ippseudo ippseudo; #endif MD5_CTX ctx; int doff; struct ip *ip; #ifdef INET struct ipovly *ipovly; #endif struct tcphdr *th; #ifdef INET6 struct ip6_hdr *ip6; struct in6_addr in6; uint32_t plen; uint16_t nhdr; #endif u_short savecsum; KASSERT(m != NULL, ("NULL mbuf chain")); KASSERT(buf != NULL, ("NULL signature pointer")); /* Extract the destination from the IP header in the mbuf. */ ip = mtod(m, struct ip *); #ifdef INET6 ip6 = NULL; /* Make the compiler happy. */ #endif MD5Init(&ctx); /* * Step 1: Update MD5 hash with IP(v6) pseudo-header. * * XXX The ippseudo header MUST be digested in network byte order, * or else we'll fail the regression test. Assume all fields we've * been doing arithmetic on have been in host byte order. * XXX One cannot depend on ipovly->ih_len here. When called from * tcp_output(), the underlying ip_len member has not yet been set. */ switch (ip->ip_v) { #ifdef INET case IPVERSION: ipovly = (struct ipovly *)ip; ippseudo.ippseudo_src = ipovly->ih_src; ippseudo.ippseudo_dst = ipovly->ih_dst; ippseudo.ippseudo_pad = 0; ippseudo.ippseudo_p = IPPROTO_TCP; ippseudo.ippseudo_len = htons(len + sizeof(struct tcphdr) + optlen); MD5Update(&ctx, (char *)&ippseudo, sizeof(struct ippseudo)); th = (struct tcphdr *)((u_char *)ip + sizeof(struct ip)); doff = sizeof(struct ip) + sizeof(struct tcphdr) + optlen; break; #endif #ifdef INET6 /* * RFC 2385, 2.0 Proposal * For IPv6, the pseudo-header is as described in RFC 2460, namely the * 128-bit source IPv6 address, 128-bit destination IPv6 address, zero- * extended next header value (to form 32 bits), and 32-bit segment * length. * Note: Upper-Layer Packet Length comes before Next Header. */ case (IPV6_VERSION >> 4): in6 = ip6->ip6_src; in6_clearscope(&in6); MD5Update(&ctx, (char *)&in6, sizeof(struct in6_addr)); in6 = ip6->ip6_dst; in6_clearscope(&in6); MD5Update(&ctx, (char *)&in6, sizeof(struct in6_addr)); plen = htonl(len + sizeof(struct tcphdr) + optlen); MD5Update(&ctx, (char *)&plen, sizeof(uint32_t)); nhdr = 0; MD5Update(&ctx, (char *)&nhdr, sizeof(uint8_t)); MD5Update(&ctx, (char *)&nhdr, sizeof(uint8_t)); MD5Update(&ctx, (char *)&nhdr, sizeof(uint8_t)); nhdr = IPPROTO_TCP; MD5Update(&ctx, (char *)&nhdr, sizeof(uint8_t)); th = (struct tcphdr *)((u_char *)ip6 + sizeof(struct ip6_hdr)); doff = sizeof(struct ip6_hdr) + sizeof(struct tcphdr) + optlen; break; #endif default: KEY_FREESAV(&sav); return (-1); /* NOTREACHED */ break; } /* * Step 2: Update MD5 hash with TCP header, excluding options. * The TCP checksum must be set to zero. */ savecsum = th->th_sum; th->th_sum = 0; MD5Update(&ctx, (char *)th, sizeof(struct tcphdr)); th->th_sum = savecsum; /* * Step 3: Update MD5 hash with TCP segment data. * Use m_apply() to avoid an early m_pullup().
*/ if (len > 0) m_apply(m, doff, len, tcp_signature_apply, &ctx); /* * Step 4: Update MD5 hash with shared secret. */ MD5Update(&ctx, sav->key_auth->key_data, _KEYLEN(sav->key_auth)); MD5Final(buf, &ctx); key_sa_recordxfer(sav, m); KEY_FREESAV(&sav); return (0); } /* * Compute TCP-MD5 hash of a TCP segment. (RFC2385) * * Return 0 if successful, otherwise return -1. */ int tcp_signature_compute(struct mbuf *m, int _unused, int len, int optlen, u_char *buf, u_int direction) { struct secasvar *sav; if ((sav = tcp_get_sav(m, direction)) == NULL) return (-1); return (tcp_signature_do_compute(m, len, optlen, buf, sav)); } /* * Verify the TCP-MD5 hash of a TCP segment. (RFC2385) * * Parameters: * m pointer to head of mbuf chain * len length of TCP segment data, excluding options * optlen length of TCP segment options * buf pointer to storage for computed MD5 digest * direction direction of flow (IPSEC_DIR_INBOUND or OUTBOUND) * * Return 1 if successful, otherwise return 0. */ int tcp_signature_verify(struct mbuf *m, int off0, int tlen, int optlen, struct tcpopt *to, struct tcphdr *th, u_int tcpbflag) { char tmpdigest[TCP_SIGLEN]; if (tcp_sig_checksigs == 0) return (1); if ((tcpbflag & TF_SIGNATURE) == 0) { if ((to->to_flags & TOF_SIGNATURE) != 0) { /* * If this socket is not expecting signature but * the segment contains signature just fail. */ TCPSTAT_INC(tcps_sig_err_sigopt); TCPSTAT_INC(tcps_sig_rcvbadsig); return (0); } /* Signature is not expected, and not present in segment. */ return (1); } /* * If this socket is expecting signature but the segment does not * contain any just fail. */ if ((to->to_flags & TOF_SIGNATURE) == 0) { TCPSTAT_INC(tcps_sig_err_nosigopt); TCPSTAT_INC(tcps_sig_rcvbadsig); return (0); } if (tcp_signature_compute(m, off0, tlen, optlen, &tmpdigest[0], IPSEC_DIR_INBOUND) == -1) { TCPSTAT_INC(tcps_sig_err_buildsig); TCPSTAT_INC(tcps_sig_rcvbadsig); return (0); } if (bcmp(to->to_signature, &tmpdigest[0], TCP_SIGLEN) != 0) { TCPSTAT_INC(tcps_sig_rcvbadsig); return (0); } TCPSTAT_INC(tcps_sig_rcvgoodsig); return (1); } #endif /* TCP_SIGNATURE */ static int sysctl_drop(SYSCTL_HANDLER_ARGS) { /* addrs[0] is a foreign socket, addrs[1] is a local one. 
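tcp_signature_do_compute() above digests, in order: the pseudo-header, the TCP header with its checksum zeroed, the segment payload (options are deliberately excluded), and finally the shared key, per RFC 2385. The same sequence over flat buffers, as a userland sketch with OpenSSL (the kernel walks mbuf chains with m_apply() instead; the struct layout here is illustrative):

#include <stddef.h>
#include <string.h>
#include <openssl/md5.h>

struct pseudo4 {			/* IPv4 pseudo-header, network byte order */
	unsigned char src[4], dst[4];
	unsigned char pad, proto;
	unsigned short len;
};

static void
tcpmd5_digest(const struct pseudo4 *ph, unsigned char *tcphdr,
    const unsigned char *payload, size_t paylen,
    const unsigned char *key, size_t keylen, unsigned char out[16])
{
	MD5_CTX ctx;
	unsigned char saved[2];

	MD5_Init(&ctx);
	MD5_Update(&ctx, ph, sizeof(*ph));	/* 1: pseudo-header */
	memcpy(saved, tcphdr + 16, 2);		/* th_sum lives at offset 16 */
	memset(tcphdr + 16, 0, 2);		/* checksum must be zero */
	MD5_Update(&ctx, tcphdr, 20);		/* 2: header, options excluded */
	memcpy(tcphdr + 16, saved, 2);
	MD5_Update(&ctx, payload, paylen);	/* 3: segment data */
	MD5_Update(&ctx, key, keylen);		/* 4: shared secret */
	MD5_Final(out, &ctx);
}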
*/ struct sockaddr_storage addrs[2]; struct inpcb *inp; struct tcpcb *tp; struct tcptw *tw; struct sockaddr_in *fin, *lin; #ifdef INET6 struct sockaddr_in6 *fin6, *lin6; #endif int error; inp = NULL; fin = lin = NULL; #ifdef INET6 fin6 = lin6 = NULL; #endif error = 0; if (req->oldptr != NULL || req->oldlen != 0) return (EINVAL); if (req->newptr == NULL) return (EPERM); if (req->newlen < sizeof(addrs)) return (ENOMEM); error = SYSCTL_IN(req, &addrs, sizeof(addrs)); if (error) return (error); switch (addrs[0].ss_family) { #ifdef INET6 case AF_INET6: fin6 = (struct sockaddr_in6 *)&addrs[0]; lin6 = (struct sockaddr_in6 *)&addrs[1]; if (fin6->sin6_len != sizeof(struct sockaddr_in6) || lin6->sin6_len != sizeof(struct sockaddr_in6)) return (EINVAL); if (IN6_IS_ADDR_V4MAPPED(&fin6->sin6_addr)) { if (!IN6_IS_ADDR_V4MAPPED(&lin6->sin6_addr)) return (EINVAL); in6_sin6_2_sin_in_sock((struct sockaddr *)&addrs[0]); in6_sin6_2_sin_in_sock((struct sockaddr *)&addrs[1]); fin = (struct sockaddr_in *)&addrs[0]; lin = (struct sockaddr_in *)&addrs[1]; break; } error = sa6_embedscope(fin6, V_ip6_use_defzone); if (error) return (error); error = sa6_embedscope(lin6, V_ip6_use_defzone); if (error) return (error); break; #endif #ifdef INET case AF_INET: fin = (struct sockaddr_in *)&addrs[0]; lin = (struct sockaddr_in *)&addrs[1]; if (fin->sin_len != sizeof(struct sockaddr_in) || lin->sin_len != sizeof(struct sockaddr_in)) return (EINVAL); break; #endif default: return (EINVAL); } INP_INFO_RLOCK(&V_tcbinfo); switch (addrs[0].ss_family) { #ifdef INET6 case AF_INET6: inp = in6_pcblookup(&V_tcbinfo, &fin6->sin6_addr, fin6->sin6_port, &lin6->sin6_addr, lin6->sin6_port, INPLOOKUP_WLOCKPCB, NULL); break; #endif #ifdef INET case AF_INET: inp = in_pcblookup(&V_tcbinfo, fin->sin_addr, fin->sin_port, lin->sin_addr, lin->sin_port, INPLOOKUP_WLOCKPCB, NULL); break; #endif } if (inp != NULL) { if (inp->inp_flags & INP_TIMEWAIT) { /* * XXXRW: There currently exists a state where an * inpcb is present, but its timewait state has been * discarded. For now, don't allow dropping of this * type of inpcb. */ tw = intotw(inp); if (tw != NULL) tcp_twclose(tw, 0); else INP_WUNLOCK(inp); } else if (!(inp->inp_flags & INP_DROPPED) && !(inp->inp_socket->so_options & SO_ACCEPTCONN)) { tp = intotcpcb(inp); tp = tcp_drop(tp, ECONNABORTED); if (tp != NULL) INP_WUNLOCK(inp); } else INP_WUNLOCK(inp); } else error = ESRCH; INP_INFO_RUNLOCK(&V_tcbinfo); return (error); } SYSCTL_PROC(_net_inet_tcp, TCPCTL_DROP, drop, CTLFLAG_VNET | CTLTYPE_STRUCT | CTLFLAG_WR | CTLFLAG_SKIP, NULL, 0, sysctl_drop, "", "Drop TCP connection"); /* * Generate a standardized TCP log line for use throughout the * tcp subsystem. Memory allocation is done with M_NOWAIT to * allow use in the interrupt context. * * NB: The caller MUST free(s, M_TCPLOG) the returned string. * NB: The function may return NULL if memory allocation failed. * * Due to header inclusion and ordering limitations the struct ip * and ip6_hdr pointers have to be passed as void pointers. */ char * tcp_log_vain(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr, const void *ip6hdr) { /* Is logging enabled? */ if (tcp_log_in_vain == 0) return (NULL); return (tcp_log_addr(inc, th, ip4hdr, ip6hdr)); } char * tcp_log_addrs(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr, const void *ip6hdr) { /* Is logging enabled? 
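tcp_log_addr() below sizes its buffer for the worst case before formatting anything: the fixed scaffolding of the log line plus room for two addresses and the flags string, with a final assertion that nothing ran past the end. The same defensive pattern in miniature:

#include <stdio.h>
#include <stdlib.h>

#define ADDRSTRLEN	46	/* INET6_ADDRSTRLEN */

int
main(void)
{
	/* Worst case counted up front: fixed text, two addresses, flags. */
	size_t size = sizeof("TCP: []:65535 to []:65535 tcpflags 0x2<>") +
	    2 * ADDRSTRLEN + 16;
	char *s = calloc(1, size);

	if (s == NULL)
		return (1);
	snprintf(s, size, "TCP: [%s]:%d to [%s]:%d", "10.0.0.1", 50332,
	    "10.0.0.2", 80);
	puts(s);
	free(s);
	return (0);
}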
*/ if (tcp_log_debug == 0) return (NULL); return (tcp_log_addr(inc, th, ip4hdr, ip6hdr)); } static char * tcp_log_addr(struct in_conninfo *inc, struct tcphdr *th, void *ip4hdr, const void *ip6hdr) { char *s, *sp; size_t size; struct ip *ip; #ifdef INET6 const struct ip6_hdr *ip6; ip6 = (const struct ip6_hdr *)ip6hdr; #endif /* INET6 */ ip = (struct ip *)ip4hdr; /* * The log line looks like this: * "TCP: [1.2.3.4]:50332 to [1.2.3.4]:80 tcpflags 0x2" */ size = sizeof("TCP: []:12345 to []:12345 tcpflags 0x2<>") + sizeof(PRINT_TH_FLAGS) + 1 + #ifdef INET6 2 * INET6_ADDRSTRLEN; #else 2 * INET_ADDRSTRLEN; #endif /* INET6 */ s = malloc(size, M_TCPLOG, M_ZERO|M_NOWAIT); if (s == NULL) return (NULL); strcat(s, "TCP: ["); sp = s + strlen(s); if (inc && ((inc->inc_flags & INC_ISIPV6) == 0)) { inet_ntoa_r(inc->inc_faddr, sp); sp = s + strlen(s); sprintf(sp, "]:%i to [", ntohs(inc->inc_fport)); sp = s + strlen(s); inet_ntoa_r(inc->inc_laddr, sp); sp = s + strlen(s); sprintf(sp, "]:%i", ntohs(inc->inc_lport)); #ifdef INET6 } else if (inc) { ip6_sprintf(sp, &inc->inc6_faddr); sp = s + strlen(s); sprintf(sp, "]:%i to [", ntohs(inc->inc_fport)); sp = s + strlen(s); ip6_sprintf(sp, &inc->inc6_laddr); sp = s + strlen(s); sprintf(sp, "]:%i", ntohs(inc->inc_lport)); } else if (ip6 && th) { ip6_sprintf(sp, &ip6->ip6_src); sp = s + strlen(s); sprintf(sp, "]:%i to [", ntohs(th->th_sport)); sp = s + strlen(s); ip6_sprintf(sp, &ip6->ip6_dst); sp = s + strlen(s); sprintf(sp, "]:%i", ntohs(th->th_dport)); #endif /* INET6 */ #ifdef INET } else if (ip && th) { inet_ntoa_r(ip->ip_src, sp); sp = s + strlen(s); sprintf(sp, "]:%i to [", ntohs(th->th_sport)); sp = s + strlen(s); inet_ntoa_r(ip->ip_dst, sp); sp = s + strlen(s); sprintf(sp, "]:%i", ntohs(th->th_dport)); #endif /* INET */ } else { free(s, M_TCPLOG); return (NULL); } sp = s + strlen(s); if (th) sprintf(sp, " tcpflags 0x%b", th->th_flags, PRINT_TH_FLAGS); if (*(s + size - 1) != '\0') panic("%s: string too long", __func__); return (s); } /* * A subroutine which makes it easy to track TCP state changes with DTrace. * This function shouldn't be called for t_state initializations that don't * correspond to actual TCP state transitions. */ void tcp_state_change(struct tcpcb *tp, int newstate) { #if defined(KDTRACE_HOOKS) int pstate = tp->t_state; #endif tp->t_state = newstate; TCP_PROBE6(state__change, NULL, tp, NULL, tp, NULL, pstate); } Index: projects/clang380-import/sys/netinet6/in6_fib.c =================================================================== --- projects/clang380-import/sys/netinet6/in6_fib.c (revision 294776) +++ projects/clang380-import/sys/netinet6/in6_fib.c (revision 294777) @@ -1,275 +1,276 @@ /*- * Copyright (c) 2015 * Alexander V. Chernikov * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include "opt_route.h" #include "opt_mpath.h" #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #ifdef RADIX_MPATH #include #endif #include #include #include #include #include #include #include #include #include #ifdef INET6 static void fib6_rte_to_nh_extended(struct rtentry *rte, const struct in6_addr *dst, uint32_t flags, struct nhop6_extended *pnh6); static void fib6_rte_to_nh_basic(struct rtentry *rte, const struct in6_addr *dst, uint32_t flags, struct nhop6_basic *pnh6); static struct ifnet *fib6_get_ifaifp(struct rtentry *rte); #define RNTORT(p) ((struct rtentry *)(p)) /* * Gets real interface for the @rte. * Returns rt_ifp for !IFF_LOOPBACK routers. * Extracts "real" address interface from interface address * loopback routes. */ static struct ifnet * fib6_get_ifaifp(struct rtentry *rte) { struct ifnet *ifp; struct sockaddr_dl *sdl; ifp = rte->rt_ifp; if ((ifp->if_flags & IFF_LOOPBACK) && rte->rt_gateway->sa_family == AF_LINK) { sdl = (struct sockaddr_dl *)rte->rt_gateway; return (ifnet_byindex(sdl->sdl_index)); } return (ifp); } static void fib6_rte_to_nh_basic(struct rtentry *rte, const struct in6_addr *dst, uint32_t flags, struct nhop6_basic *pnh6) { struct sockaddr_in6 *gw; /* Do explicit nexthop zero unless we're copying it */ memset(pnh6, 0, sizeof(*pnh6)); if ((flags & NHR_IFAIF) != 0) pnh6->nh_ifp = fib6_get_ifaifp(rte); else pnh6->nh_ifp = rte->rt_ifp; pnh6->nh_mtu = min(rte->rt_mtu, IN6_LINKMTU(rte->rt_ifp)); if (rte->rt_flags & RTF_GATEWAY) { gw = (struct sockaddr_in6 *)rte->rt_gateway; pnh6->nh_addr = gw->sin6_addr; in6_clearscope(&pnh6->nh_addr); } else pnh6->nh_addr = *dst; /* Set flags */ pnh6->nh_flags = fib_rte_to_nh_flags(rte->rt_flags); gw = (struct sockaddr_in6 *)rt_key(rte); if (IN6_IS_ADDR_UNSPECIFIED(&gw->sin6_addr)) pnh6->nh_flags |= NHF_DEFAULT; } static void fib6_rte_to_nh_extended(struct rtentry *rte, const struct in6_addr *dst, uint32_t flags, struct nhop6_extended *pnh6) { struct sockaddr_in6 *gw; /* Do explicit nexthop zero unless we're copying it */ memset(pnh6, 0, sizeof(*pnh6)); if ((flags & NHR_IFAIF) != 0) pnh6->nh_ifp = fib6_get_ifaifp(rte); else pnh6->nh_ifp = rte->rt_ifp; pnh6->nh_mtu = min(rte->rt_mtu, IN6_LINKMTU(rte->rt_ifp)); if (rte->rt_flags & RTF_GATEWAY) { gw = (struct sockaddr_in6 *)rte->rt_gateway; pnh6->nh_addr = gw->sin6_addr; in6_clearscope(&pnh6->nh_addr); } else pnh6->nh_addr = *dst; /* Set flags */ pnh6->nh_flags = fib_rte_to_nh_flags(rte->rt_flags); gw = (struct sockaddr_in6 *)rt_key(rte); if (IN6_IS_ADDR_UNSPECIFIED(&gw->sin6_addr)) pnh6->nh_flags |= NHF_DEFAULT; } /* * Performs IPv6 route table lookup on @dst. 
Returns 0 on success.
 * Stores basic nexthop info into provided @pnh6 structure.
 * Note that
 * - nh_ifp cannot be safely dereferenced
 * - nh_ifp represents the logical transmit interface (rt_ifp) by default
 *   (e.g. when looking up an address on interface "ix0", a pointer to
 *   "ix0" is returned instead of "lo0")
 * - nh_ifp represents the "address" interface if the NHR_IFAIF flag is passed
 * - the MTU of the logical transmit interface will be returned
 * - the scope will be embedded in nh_addr
 */
int
fib6_lookup_nh_basic(uint32_t fibnum, const struct in6_addr *dst,
    uint32_t scopeid, uint32_t flags, uint32_t flowid,
    struct nhop6_basic *pnh6)
{
-	struct radix_node_head *rh;
+	struct rib_head *rh;
	struct radix_node *rn;
	struct sockaddr_in6 sin6;
	struct rtentry *rte;

	KASSERT((fibnum < rt_numfibs), ("fib6_lookup_nh_basic: bad fibnum"));
	rh = rt_tables_get_rnh(fibnum, AF_INET6);
	if (rh == NULL)
		return (ENOENT);

	/* Prepare lookup key */
	memset(&sin6, 0, sizeof(sin6));
	sin6.sin6_addr = *dst;
	sin6.sin6_len = sizeof(struct sockaddr_in6);

	/* Assume scopeid is valid and embed it directly */
	if (IN6_IS_SCOPE_LINKLOCAL(dst))
		sin6.sin6_addr.s6_addr16[1] = htons(scopeid & 0xffff);

-	RADIX_NODE_HEAD_RLOCK(rh);
-	rn = rh->rnh_matchaddr((void *)&sin6, rh);
+	RIB_RLOCK(rh);
+	rn = rh->rnh_matchaddr((void *)&sin6, &rh->head);
	if (rn != NULL && ((rn->rn_flags & RNF_ROOT) == 0)) {
		rte = RNTORT(rn);
		/* Ensure route & ifp is UP */
		if (RT_LINK_IS_UP(rte->rt_ifp)) {
			fib6_rte_to_nh_basic(rte, &sin6.sin6_addr, flags,
			    pnh6);
-			RADIX_NODE_HEAD_RUNLOCK(rh);
+			RIB_RUNLOCK(rh);
			return (0);
		}
	}
-	RADIX_NODE_HEAD_RUNLOCK(rh);
+	RIB_RUNLOCK(rh);
	return (ENOENT);
}

/*
 * Performs IPv6 route table lookup on @dst. Returns 0 on success.
 * Stores extended nexthop info into provided @pnh6 structure.
 * Note that
 * - nh_ifp cannot be safely dereferenced unless NHR_REF is specified.
 * - in that case you need to call fib6_free_nh_ext()
 * - nh_ifp represents logical transmit interface (rt_ifp) by default
 * - nh_ifp represents "address" interface if NHR_IFAIF flag is passed
 * - mtu from logical transmit interface will be returned.
* - scope will be embedded in nh_addr */ int fib6_lookup_nh_ext(uint32_t fibnum, const struct in6_addr *dst,uint32_t scopeid, uint32_t flags, uint32_t flowid, struct nhop6_extended *pnh6) { - struct radix_node_head *rh; + struct rib_head *rh; struct radix_node *rn; struct sockaddr_in6 sin6; struct rtentry *rte; KASSERT((fibnum < rt_numfibs), ("fib6_lookup_nh_ext: bad fibnum")); rh = rt_tables_get_rnh(fibnum, AF_INET6); if (rh == NULL) return (ENOENT); /* Prepare lookup key */ memset(&sin6, 0, sizeof(sin6)); sin6.sin6_len = sizeof(struct sockaddr_in6); sin6.sin6_addr = *dst; /* Assume scopeid is valid and embed it directly */ if (IN6_IS_SCOPE_LINKLOCAL(dst)) sin6.sin6_addr.s6_addr16[1] = htons(scopeid & 0xffff); - RADIX_NODE_HEAD_RLOCK(rh); - rn = rh->rnh_matchaddr((void *)&sin6, rh); + RIB_RLOCK(rh); + rn = rh->rnh_matchaddr((void *)&sin6, &rh->head); if (rn != NULL && ((rn->rn_flags & RNF_ROOT) == 0)) { rte = RNTORT(rn); #ifdef RADIX_MPATH rte = rt_mpath_select(rte, flowid); if (rte == NULL) { - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (ENOENT); } #endif /* Ensure route & ifp is UP */ if (RT_LINK_IS_UP(rte->rt_ifp)) { fib6_rte_to_nh_extended(rte, &sin6.sin6_addr, flags, pnh6); if ((flags & NHR_REF) != 0) { /* TODO: Do lwref on egress ifp's */ } - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (0); } } - RADIX_NODE_HEAD_RUNLOCK(rh); + RIB_RUNLOCK(rh); return (ENOENT); } void fib6_free_nh_ext(uint32_t fibnum, struct nhop6_extended *pnh6) { } #endif Index: projects/clang380-import/sys/netinet6/in6_rmx.c =================================================================== --- projects/clang380-import/sys/netinet6/in6_rmx.c (revision 294776) +++ projects/clang380-import/sys/netinet6/in6_rmx.c (revision 294777) @@ -1,283 +1,282 @@ /*- * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $KAME: in6_rmx.c,v 1.11 2001/07/26 06:53:16 jinmei Exp $ */ /*- * Copyright 1994, 1995 Massachusetts Institute of Technology * * Permission to use, copy, modify, and distribute this software and * its documentation for any purpose and without fee is hereby * granted, provided that both the above copyright notice and this * permission notice appear in all copies, that both the above * copyright notice and this permission notice appear in all * supporting documentation, and that the name of M.I.T. not be used * in advertising or publicity pertaining to distribution of the * software without specific, written prior permission. M.I.T. makes * no representations about the suitability of this software for any * purpose. It is provided "as is" without express or implied * warranty. * * THIS SOFTWARE IS PROVIDED BY M.I.T. ``AS IS''. M.I.T. DISCLAIMS * ALL EXPRESS OR IMPLIED WARRANTIES WITH REGARD TO THIS SOFTWARE, * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT * SHALL M.I.T. BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include extern int in6_inithead(void **head, int off); #ifdef VIMAGE extern int in6_detachhead(void **head, int off); #endif /* * Do what we need to do when inserting a route. */ static struct radix_node * -in6_addroute(void *v_arg, void *n_arg, struct radix_node_head *head, +in6_addroute(void *v_arg, void *n_arg, struct radix_head *head, struct radix_node *treenodes) { struct rtentry *rt = (struct rtentry *)treenodes; struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)rt_key(rt); - RADIX_NODE_HEAD_WLOCK_ASSERT(head); if (IN6_IS_ADDR_MULTICAST(&sin6->sin6_addr)) rt->rt_flags |= RTF_MULTICAST; /* * A little bit of help for both IPv6 output and input: * For local addresses, we make sure that RTF_LOCAL is set, * with the thought that this might one day be used to speed up * ip_input(). * * We also mark routes to multicast addresses as such, because * it's easy to do and might be useful (but this is much more * dubious since it's so easy to inspect the address). (This * is done above.) * * XXX * should elaborate the code. */ if (rt->rt_flags & RTF_HOST) { if (IN6_ARE_ADDR_EQUAL(&satosin6(rt->rt_ifa->ifa_addr) ->sin6_addr, &sin6->sin6_addr)) { rt->rt_flags |= RTF_LOCAL; } } if (rt->rt_ifp != NULL) { /* * Check route MTU: * inherit interface MTU if not set or * check if MTU is too large. */ if (rt->rt_mtu == 0) { rt->rt_mtu = IN6_LINKMTU(rt->rt_ifp); } else if (rt->rt_mtu > IN6_LINKMTU(rt->rt_ifp)) rt->rt_mtu = IN6_LINKMTU(rt->rt_ifp); } return (rn_addroute(v_arg, n_arg, head, treenodes)); } /* * Age old PMTUs. 
*/ struct mtuex_arg { - struct radix_node_head *rnh; + struct rib_head *rnh; time_t nextstop; }; static VNET_DEFINE(struct callout, rtq_mtutimer); #define V_rtq_mtutimer VNET(rtq_mtutimer) static int in6_mtuexpire(struct rtentry *rt, void *rock) { struct mtuex_arg *ap = rock; if (rt->rt_expire && !(rt->rt_flags & RTF_PROBEMTU)) { if (rt->rt_expire <= time_uptime) { rt->rt_flags |= RTF_PROBEMTU; } else { ap->nextstop = lmin(ap->nextstop, rt->rt_expire); } } return (0); } #define MTUTIMO_DEFAULT (60*1) static void -in6_mtutimo_setwa(struct radix_node_head *rnh, uint32_t fibum, int af, +in6_mtutimo_setwa(struct rib_head *rnh, uint32_t fibum, int af, void *_arg) { struct mtuex_arg *arg; arg = (struct mtuex_arg *)_arg; arg->rnh = rnh; } static void in6_mtutimo(void *rock) { CURVNET_SET_QUIET((struct vnet *) rock); struct timeval atv; struct mtuex_arg arg; rt_foreach_fib_walk(AF_INET6, in6_mtutimo_setwa, in6_mtuexpire, &arg); atv.tv_sec = MTUTIMO_DEFAULT; atv.tv_usec = 0; callout_reset(&V_rtq_mtutimer, tvtohz(&atv), in6_mtutimo, rock); CURVNET_RESTORE(); } /* * Initialize our routing tree. */ static VNET_DEFINE(int, _in6_rt_was_here); #define V__in6_rt_was_here VNET(_in6_rt_was_here) int in6_inithead(void **head, int off) { - struct radix_node_head *rnh; + struct rib_head *rh; - if (!rn_inithead(head, offsetof(struct sockaddr_in6, sin6_addr) << 3)) + rh = rt_table_init(offsetof(struct sockaddr_in6, sin6_addr) << 3); + if (rh == NULL) return (0); - rnh = *head; - RADIX_NODE_HEAD_LOCK_INIT(rnh); - - rnh->rnh_addaddr = in6_addroute; + rh->rnh_addaddr = in6_addroute; + *head = (void *)rh; if (V__in6_rt_was_here == 0) { callout_init(&V_rtq_mtutimer, 1); in6_mtutimo(curvnet); /* kick off timeout first time */ V__in6_rt_was_here = 1; } return (1); } #ifdef VIMAGE int in6_detachhead(void **head, int off) { callout_drain(&V_rtq_mtutimer); return (rn_detachhead(head)); } #endif /* * Extended API for IPv6 FIB support. */ void in6_rtredirect(struct sockaddr *dst, struct sockaddr *gw, struct sockaddr *nm, int flags, struct sockaddr *src, u_int fibnum) { rtredirect_fib(dst, gw, nm, flags, src, fibnum); } int in6_rtrequest(int req, struct sockaddr *dst, struct sockaddr *gw, struct sockaddr *mask, int flags, struct rtentry **ret_nrt, u_int fibnum) { return (rtrequest_fib(req, dst, gw, mask, flags, ret_nrt, fibnum)); } void in6_rtalloc(struct route_in6 *ro, u_int fibnum) { rtalloc_ign_fib((struct route *)ro, 0ul, fibnum); } void in6_rtalloc_ign(struct route_in6 *ro, u_long ignflags, u_int fibnum) { rtalloc_ign_fib((struct route *)ro, ignflags, fibnum); } struct rtentry * in6_rtalloc1(struct sockaddr *dst, int report, u_long ignflags, u_int fibnum) { return (rtalloc1_fib(dst, report, ignflags, fibnum)); } Index: projects/clang380-import/sys/netinet6/nd6_rtr.c =================================================================== --- projects/clang380-import/sys/netinet6/nd6_rtr.c (revision 294776) +++ projects/clang380-import/sys/netinet6/nd6_rtr.c (revision 294777) @@ -1,2137 +1,2138 @@ /*- * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the project nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $KAME: nd6_rtr.c,v 1.111 2001/04/27 01:37:15 jinmei Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include #include #include #include static int rtpref(struct nd_defrouter *); static struct nd_defrouter *defrtrlist_update(struct nd_defrouter *); static int prelist_update(struct nd_prefixctl *, struct nd_defrouter *, struct mbuf *, int); static struct in6_ifaddr *in6_ifadd(struct nd_prefixctl *, int); static struct nd_pfxrouter *pfxrtr_lookup(struct nd_prefix *, struct nd_defrouter *); static void pfxrtr_add(struct nd_prefix *, struct nd_defrouter *); static void pfxrtr_del(struct nd_pfxrouter *); static struct nd_pfxrouter *find_pfxlist_reachable_router (struct nd_prefix *); static void defrouter_delreq(struct nd_defrouter *); static void nd6_rtmsg(int, struct rtentry *); static int in6_init_prefix_ltimes(struct nd_prefix *); static void in6_init_address_ltimes(struct nd_prefix *, struct in6_addrlifetime *); static int nd6_prefix_onlink(struct nd_prefix *); static int nd6_prefix_offlink(struct nd_prefix *); static int rt6_deleteroute(const struct rtentry *, void *); VNET_DECLARE(int, nd6_recalc_reachtm_interval); #define V_nd6_recalc_reachtm_interval VNET(nd6_recalc_reachtm_interval) static VNET_DEFINE(struct ifnet *, nd6_defifp); VNET_DEFINE(int, nd6_defifindex); #define V_nd6_defifp VNET(nd6_defifp) VNET_DEFINE(int, ip6_use_tempaddr) = 0; VNET_DEFINE(int, ip6_desync_factor); VNET_DEFINE(u_int32_t, ip6_temp_preferred_lifetime) = DEF_TEMP_PREFERRED_LIFETIME; VNET_DEFINE(u_int32_t, ip6_temp_valid_lifetime) = DEF_TEMP_VALID_LIFETIME; VNET_DEFINE(int, ip6_temp_regen_advance) = TEMPADDR_REGEN_ADVANCE; /* RTPREF_MEDIUM has to be 0! */ #define RTPREF_HIGH 1 #define RTPREF_MEDIUM 0 #define RTPREF_LOW (-1) #define RTPREF_RESERVED (-2) #define RTPREF_INVALID (-3) /* internal */ /* * Receive Router Solicitation Message - just for routers. * Router solicitation/advertisement is mostly managed by userland program * (rtadvd) so here we have no function like nd6_ra_output(). 
* * Based on RFC 2461 */ void nd6_rs_input(struct mbuf *m, int off, int icmp6len) { struct ifnet *ifp = m->m_pkthdr.rcvif; struct ip6_hdr *ip6 = mtod(m, struct ip6_hdr *); struct nd_router_solicit *nd_rs; struct in6_addr saddr6 = ip6->ip6_src; char *lladdr = NULL; int lladdrlen = 0; union nd_opts ndopts; char ip6bufs[INET6_ADDRSTRLEN], ip6bufd[INET6_ADDRSTRLEN]; /* * Accept RS only when V_ip6_forwarding=1 and the interface has * no ND6_IFF_ACCEPT_RTADV. */ if (!V_ip6_forwarding || ND_IFINFO(ifp)->flags & ND6_IFF_ACCEPT_RTADV) goto freeit; /* Sanity checks */ if (ip6->ip6_hlim != 255) { nd6log((LOG_ERR, "nd6_rs_input: invalid hlim (%d) from %s to %s on %s\n", ip6->ip6_hlim, ip6_sprintf(ip6bufs, &ip6->ip6_src), ip6_sprintf(ip6bufd, &ip6->ip6_dst), if_name(ifp))); goto bad; } /* * Don't update the neighbor cache, if src = ::. * This indicates that the src has no IP address assigned yet. */ if (IN6_IS_ADDR_UNSPECIFIED(&saddr6)) goto freeit; #ifndef PULLDOWN_TEST IP6_EXTHDR_CHECK(m, off, icmp6len,); nd_rs = (struct nd_router_solicit *)((caddr_t)ip6 + off); #else IP6_EXTHDR_GET(nd_rs, struct nd_router_solicit *, m, off, icmp6len); if (nd_rs == NULL) { ICMP6STAT_INC(icp6s_tooshort); return; } #endif icmp6len -= sizeof(*nd_rs); nd6_option_init(nd_rs + 1, icmp6len, &ndopts); if (nd6_options(&ndopts) < 0) { nd6log((LOG_INFO, "nd6_rs_input: invalid ND option, ignored\n")); /* nd6_options have incremented stats */ goto freeit; } if (ndopts.nd_opts_src_lladdr) { lladdr = (char *)(ndopts.nd_opts_src_lladdr + 1); lladdrlen = ndopts.nd_opts_src_lladdr->nd_opt_len << 3; } if (lladdr && ((ifp->if_addrlen + 2 + 7) & ~7) != lladdrlen) { nd6log((LOG_INFO, "nd6_rs_input: lladdrlen mismatch for %s " "(if %d, RS packet %d)\n", ip6_sprintf(ip6bufs, &saddr6), ifp->if_addrlen, lladdrlen - 2)); goto bad; } nd6_cache_lladdr(ifp, &saddr6, lladdr, lladdrlen, ND_ROUTER_SOLICIT, 0); freeit: m_freem(m); return; bad: ICMP6STAT_INC(icp6s_badrs); m_freem(m); } /* * Receive Router Advertisement Message. * * Based on RFC 2461 * TODO: on-link bit on prefix information * TODO: ND_RA_FLAG_{OTHER,MANAGED} processing */ void nd6_ra_input(struct mbuf *m, int off, int icmp6len) { struct ifnet *ifp = m->m_pkthdr.rcvif; struct nd_ifinfo *ndi = ND_IFINFO(ifp); struct ip6_hdr *ip6 = mtod(m, struct ip6_hdr *); struct nd_router_advert *nd_ra; struct in6_addr saddr6 = ip6->ip6_src; int mcast = 0; union nd_opts ndopts; struct nd_defrouter *dr; char ip6bufs[INET6_ADDRSTRLEN], ip6bufd[INET6_ADDRSTRLEN]; /* * We only accept RAs only when the per-interface flag * ND6_IFF_ACCEPT_RTADV is on the receiving interface. 
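Before the RA-specific processing continues below, the link-layer option length check that appears in both nd6_rs_input() and nd6_ra_input() deserves a worked example: ((ifp->if_addrlen + 2 + 7) & ~7) is the expected size of a source link-layer address option, i.e. a 2-octet option header plus the hardware address, rounded up to RFC 4861's 8-octet option units. A small standalone illustration:

/*
 * Worked example of the ND option-length check (illustrative only).
 * For Ethernet, if_addrlen == 6, so (6 + 2 + 7) & ~7 == 8 octets;
 * the option advertises nd_opt_len == 1 (in 8-octet units), and
 * lladdrlen == 1 << 3 == 8 matches.
 */
#include <stdio.h>

int
main(void)
{
	int if_addrlen = 6;		/* Ethernet MAC */
	int nd_opt_len = 1;		/* from the received option */
	int lladdrlen = nd_opt_len << 3;

	printf("expected %d, got %d\n",
	    (if_addrlen + 2 + 7) & ~7, lladdrlen);
	return (0);
}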
*/ if (!(ndi->flags & ND6_IFF_ACCEPT_RTADV)) goto freeit; if (ip6->ip6_hlim != 255) { nd6log((LOG_ERR, "nd6_ra_input: invalid hlim (%d) from %s to %s on %s\n", ip6->ip6_hlim, ip6_sprintf(ip6bufs, &ip6->ip6_src), ip6_sprintf(ip6bufd, &ip6->ip6_dst), if_name(ifp))); goto bad; } if (!IN6_IS_ADDR_LINKLOCAL(&saddr6)) { nd6log((LOG_ERR, "nd6_ra_input: src %s is not link-local\n", ip6_sprintf(ip6bufs, &saddr6))); goto bad; } #ifndef PULLDOWN_TEST IP6_EXTHDR_CHECK(m, off, icmp6len,); nd_ra = (struct nd_router_advert *)((caddr_t)ip6 + off); #else IP6_EXTHDR_GET(nd_ra, struct nd_router_advert *, m, off, icmp6len); if (nd_ra == NULL) { ICMP6STAT_INC(icp6s_tooshort); return; } #endif icmp6len -= sizeof(*nd_ra); nd6_option_init(nd_ra + 1, icmp6len, &ndopts); if (nd6_options(&ndopts) < 0) { nd6log((LOG_INFO, "nd6_ra_input: invalid ND option, ignored\n")); /* nd6_options have incremented stats */ goto freeit; } { struct nd_defrouter dr0; u_int32_t advreachable = nd_ra->nd_ra_reachable; /* remember if this is a multicasted advertisement */ if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst)) mcast = 1; bzero(&dr0, sizeof(dr0)); dr0.rtaddr = saddr6; dr0.flags = nd_ra->nd_ra_flags_reserved; /* * Effectively-disable routes from RA messages when * ND6_IFF_NO_RADR enabled on the receiving interface or * (ip6.forwarding == 1 && ip6.rfc6204w3 != 1). */ if (ndi->flags & ND6_IFF_NO_RADR) dr0.rtlifetime = 0; else if (V_ip6_forwarding && !V_ip6_rfc6204w3) dr0.rtlifetime = 0; else dr0.rtlifetime = ntohs(nd_ra->nd_ra_router_lifetime); dr0.expire = time_uptime + dr0.rtlifetime; dr0.ifp = ifp; /* unspecified or not? (RFC 2461 6.3.4) */ if (advreachable) { advreachable = ntohl(advreachable); if (advreachable <= MAX_REACHABLE_TIME && ndi->basereachable != advreachable) { ndi->basereachable = advreachable; ndi->reachable = ND_COMPUTE_RTIME(ndi->basereachable); ndi->recalctm = V_nd6_recalc_reachtm_interval; /* reset */ } } if (nd_ra->nd_ra_retransmit) ndi->retrans = ntohl(nd_ra->nd_ra_retransmit); if (nd_ra->nd_ra_curhoplimit) { if (ndi->chlim < nd_ra->nd_ra_curhoplimit) ndi->chlim = nd_ra->nd_ra_curhoplimit; else if (ndi->chlim != nd_ra->nd_ra_curhoplimit) { log(LOG_ERR, "RA with a lower CurHopLimit sent from " "%s on %s (current = %d, received = %d). 
" "Ignored.\n", ip6_sprintf(ip6bufs, &ip6->ip6_src), if_name(ifp), ndi->chlim, nd_ra->nd_ra_curhoplimit); } } dr = defrtrlist_update(&dr0); } /* * prefix */ if (ndopts.nd_opts_pi) { struct nd_opt_hdr *pt; struct nd_opt_prefix_info *pi = NULL; struct nd_prefixctl pr; for (pt = (struct nd_opt_hdr *)ndopts.nd_opts_pi; pt <= (struct nd_opt_hdr *)ndopts.nd_opts_pi_end; pt = (struct nd_opt_hdr *)((caddr_t)pt + (pt->nd_opt_len << 3))) { if (pt->nd_opt_type != ND_OPT_PREFIX_INFORMATION) continue; pi = (struct nd_opt_prefix_info *)pt; if (pi->nd_opt_pi_len != 4) { nd6log((LOG_INFO, "nd6_ra_input: invalid option " "len %d for prefix information option, " "ignored\n", pi->nd_opt_pi_len)); continue; } if (128 < pi->nd_opt_pi_prefix_len) { nd6log((LOG_INFO, "nd6_ra_input: invalid prefix " "len %d for prefix information option, " "ignored\n", pi->nd_opt_pi_prefix_len)); continue; } if (IN6_IS_ADDR_MULTICAST(&pi->nd_opt_pi_prefix) || IN6_IS_ADDR_LINKLOCAL(&pi->nd_opt_pi_prefix)) { nd6log((LOG_INFO, "nd6_ra_input: invalid prefix " "%s, ignored\n", ip6_sprintf(ip6bufs, &pi->nd_opt_pi_prefix))); continue; } bzero(&pr, sizeof(pr)); pr.ndpr_prefix.sin6_family = AF_INET6; pr.ndpr_prefix.sin6_len = sizeof(pr.ndpr_prefix); pr.ndpr_prefix.sin6_addr = pi->nd_opt_pi_prefix; pr.ndpr_ifp = (struct ifnet *)m->m_pkthdr.rcvif; pr.ndpr_raf_onlink = (pi->nd_opt_pi_flags_reserved & ND_OPT_PI_FLAG_ONLINK) ? 1 : 0; pr.ndpr_raf_auto = (pi->nd_opt_pi_flags_reserved & ND_OPT_PI_FLAG_AUTO) ? 1 : 0; pr.ndpr_plen = pi->nd_opt_pi_prefix_len; pr.ndpr_vltime = ntohl(pi->nd_opt_pi_valid_time); pr.ndpr_pltime = ntohl(pi->nd_opt_pi_preferred_time); (void)prelist_update(&pr, dr, m, mcast); } } /* * MTU */ if (ndopts.nd_opts_mtu && ndopts.nd_opts_mtu->nd_opt_mtu_len == 1) { u_long mtu; u_long maxmtu; mtu = (u_long)ntohl(ndopts.nd_opts_mtu->nd_opt_mtu_mtu); /* lower bound */ if (mtu < IPV6_MMTU) { nd6log((LOG_INFO, "nd6_ra_input: bogus mtu option " "mtu=%lu sent from %s, ignoring\n", mtu, ip6_sprintf(ip6bufs, &ip6->ip6_src))); goto skip; } /* upper bound */ maxmtu = (ndi->maxmtu && ndi->maxmtu < ifp->if_mtu) ? ndi->maxmtu : ifp->if_mtu; if (mtu <= maxmtu) { int change = (ndi->linkmtu != mtu); ndi->linkmtu = mtu; if (change) /* in6_maxmtu may change */ in6_setmaxmtu(); } else { nd6log((LOG_INFO, "nd6_ra_input: bogus mtu " "mtu=%lu sent from %s; " "exceeds maxmtu %lu, ignoring\n", mtu, ip6_sprintf(ip6bufs, &ip6->ip6_src), maxmtu)); } } skip: /* * Source link layer address */ { char *lladdr = NULL; int lladdrlen = 0; if (ndopts.nd_opts_src_lladdr) { lladdr = (char *)(ndopts.nd_opts_src_lladdr + 1); lladdrlen = ndopts.nd_opts_src_lladdr->nd_opt_len << 3; } if (lladdr && ((ifp->if_addrlen + 2 + 7) & ~7) != lladdrlen) { nd6log((LOG_INFO, "nd6_ra_input: lladdrlen mismatch for %s " "(if %d, RA packet %d)\n", ip6_sprintf(ip6bufs, &saddr6), ifp->if_addrlen, lladdrlen - 2)); goto bad; } nd6_cache_lladdr(ifp, &saddr6, lladdr, lladdrlen, ND_ROUTER_ADVERT, 0); /* * Installing a link-layer address might change the state of the * router's neighbor cache, which might also affect our on-link * detection of adveritsed prefixes. */ pfxlist_onlink_check(); } freeit: m_freem(m); return; bad: ICMP6STAT_INC(icp6s_badra); m_freem(m); } /* * default router list proccessing sub routines */ /* tell the change to user processes watching the routing socket. 
*/ static void nd6_rtmsg(int cmd, struct rtentry *rt) { struct rt_addrinfo info; struct ifnet *ifp; struct ifaddr *ifa; bzero((caddr_t)&info, sizeof(info)); info.rti_info[RTAX_DST] = rt_key(rt); info.rti_info[RTAX_GATEWAY] = rt->rt_gateway; info.rti_info[RTAX_NETMASK] = rt_mask(rt); ifp = rt->rt_ifp; if (ifp != NULL) { IF_ADDR_RLOCK(ifp); ifa = TAILQ_FIRST(&ifp->if_addrhead); info.rti_info[RTAX_IFP] = ifa->ifa_addr; ifa_ref(ifa); IF_ADDR_RUNLOCK(ifp); info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr; } else ifa = NULL; rt_missmsg_fib(cmd, &info, rt->rt_flags, 0, rt->rt_fibnum); if (ifa != NULL) ifa_free(ifa); } static void defrouter_addreq(struct nd_defrouter *new) { struct sockaddr_in6 def, mask, gate; struct rtentry *newrt = NULL; int error; bzero(&def, sizeof(def)); bzero(&mask, sizeof(mask)); bzero(&gate, sizeof(gate)); def.sin6_len = mask.sin6_len = gate.sin6_len = sizeof(struct sockaddr_in6); def.sin6_family = gate.sin6_family = AF_INET6; gate.sin6_addr = new->rtaddr; error = in6_rtrequest(RTM_ADD, (struct sockaddr *)&def, (struct sockaddr *)&gate, (struct sockaddr *)&mask, RTF_GATEWAY, &newrt, RT_DEFAULT_FIB); if (newrt) { nd6_rtmsg(RTM_ADD, newrt); /* tell user process */ RTFREE(newrt); } if (error == 0) new->installed = 1; return; } struct nd_defrouter * defrouter_lookup(struct in6_addr *addr, struct ifnet *ifp) { struct nd_defrouter *dr; TAILQ_FOREACH(dr, &V_nd_defrouter, dr_entry) { if (dr->ifp == ifp && IN6_ARE_ADDR_EQUAL(addr, &dr->rtaddr)) return (dr); } return (NULL); /* search failed */ } /* * Remove the default route for a given router. * This is just a subroutine function for defrouter_select(), and should * not be called from anywhere else. */ static void defrouter_delreq(struct nd_defrouter *dr) { struct sockaddr_in6 def, mask, gate; struct rtentry *oldrt = NULL; bzero(&def, sizeof(def)); bzero(&mask, sizeof(mask)); bzero(&gate, sizeof(gate)); def.sin6_len = mask.sin6_len = gate.sin6_len = sizeof(struct sockaddr_in6); def.sin6_family = gate.sin6_family = AF_INET6; gate.sin6_addr = dr->rtaddr; in6_rtrequest(RTM_DELETE, (struct sockaddr *)&def, (struct sockaddr *)&gate, (struct sockaddr *)&mask, RTF_GATEWAY, &oldrt, RT_DEFAULT_FIB); if (oldrt) { nd6_rtmsg(RTM_DELETE, oldrt); RTFREE(oldrt); } dr->installed = 0; } /* * remove all default routes from default router list */ void defrouter_reset(void) { struct nd_defrouter *dr; TAILQ_FOREACH(dr, &V_nd_defrouter, dr_entry) defrouter_delreq(dr); /* * XXX should we also nuke any default routers in the kernel, by * going through them by rtalloc1()? */ } void defrtrlist_del(struct nd_defrouter *dr) { struct nd_defrouter *deldr = NULL; struct nd_prefix *pr; /* * Flush all the routing table entries that use the router * as a next hop. */ if (ND_IFINFO(dr->ifp)->flags & ND6_IFF_ACCEPT_RTADV) rt6_flush(&dr->rtaddr, dr->ifp); if (dr->installed) { deldr = dr; defrouter_delreq(dr); } TAILQ_REMOVE(&V_nd_defrouter, dr, dr_entry); /* * Also delete all the pointers to the router in each prefix lists. */ LIST_FOREACH(pr, &V_nd_prefix, ndpr_entry) { struct nd_pfxrouter *pfxrtr; if ((pfxrtr = pfxrtr_lookup(pr, dr)) != NULL) pfxrtr_del(pfxrtr); } pfxlist_onlink_check(); /* * If the router is the primary one, choose a new one. * Note that defrouter_select() will remove the current gateway * from the routing table. 
*/ if (deldr) defrouter_select(); free(dr, M_IP6NDP); } /* * Default Router Selection according to Section 6.3.6 of RFC 2461 and * draft-ietf-ipngwg-router-selection: * 1) Routers that are reachable or probably reachable should be preferred. * If we have more than one (probably) reachable router, prefer ones * with the highest router preference. * 2) When no routers on the list are known to be reachable or * probably reachable, routers SHOULD be selected in a round-robin * fashion, regardless of router preference values. * 3) If the Default Router List is empty, assume that all * destinations are on-link. * * We assume nd_defrouter is sorted by router preference value. * Since the code below covers both with and without router preference cases, * we do not need to classify the cases by ifdef. * * At this moment, we do not try to install more than one default router, * even when the multipath routing is available, because we're not sure about * the benefits for stub hosts comparing to the risk of making the code * complicated and the possibility of introducing bugs. */ void defrouter_select(void) { struct nd_defrouter *dr, *selected_dr = NULL, *installed_dr = NULL; struct llentry *ln = NULL; /* * Let's handle easy case (3) first: * If default router list is empty, there's nothing to be done. */ if (TAILQ_EMPTY(&V_nd_defrouter)) return; /* * Search for a (probably) reachable router from the list. * We just pick up the first reachable one (if any), assuming that * the ordering rule of the list described in defrtrlist_update(). */ TAILQ_FOREACH(dr, &V_nd_defrouter, dr_entry) { IF_AFDATA_RLOCK(dr->ifp); if (selected_dr == NULL && (ln = nd6_lookup(&dr->rtaddr, 0, dr->ifp)) && ND6_IS_LLINFO_PROBREACH(ln)) { selected_dr = dr; } IF_AFDATA_RUNLOCK(dr->ifp); if (ln != NULL) { LLE_RUNLOCK(ln); ln = NULL; } if (dr->installed && installed_dr == NULL) installed_dr = dr; else if (dr->installed && installed_dr) { /* this should not happen. warn for diagnosis. */ log(LOG_ERR, "defrouter_select: more than one router" " is installed\n"); } } /* * If none of the default routers was found to be reachable, * round-robin the list regardless of preference. * Otherwise, if we have an installed router, check if the selected * (reachable) router should really be preferred to the installed one. * We only prefer the new router when the old one is not reachable * or when the new one has a really higher preference value. */ if (selected_dr == NULL) { if (installed_dr == NULL || !TAILQ_NEXT(installed_dr, dr_entry)) selected_dr = TAILQ_FIRST(&V_nd_defrouter); else selected_dr = TAILQ_NEXT(installed_dr, dr_entry); } else if (installed_dr) { IF_AFDATA_RLOCK(installed_dr->ifp); if ((ln = nd6_lookup(&installed_dr->rtaddr, 0, installed_dr->ifp)) && ND6_IS_LLINFO_PROBREACH(ln) && rtpref(selected_dr) <= rtpref(installed_dr)) { selected_dr = installed_dr; } IF_AFDATA_RUNLOCK(installed_dr->ifp); if (ln != NULL) LLE_RUNLOCK(ln); } /* * If the selected router is different than the installed one, * remove the installed router and install the selected one. * Note that the selected router is never NULL here. 
*/ if (installed_dr != selected_dr) { if (installed_dr) defrouter_delreq(installed_dr); defrouter_addreq(selected_dr); } return; } /* * for default router selection * regards router-preference field as a 2-bit signed integer */ static int rtpref(struct nd_defrouter *dr) { switch (dr->flags & ND_RA_FLAG_RTPREF_MASK) { case ND_RA_FLAG_RTPREF_HIGH: return (RTPREF_HIGH); case ND_RA_FLAG_RTPREF_MEDIUM: case ND_RA_FLAG_RTPREF_RSV: return (RTPREF_MEDIUM); case ND_RA_FLAG_RTPREF_LOW: return (RTPREF_LOW); default: /* * This case should never happen. If it did, it would mean a * serious bug of kernel internal. We thus always bark here. * Or, can we even panic? */ log(LOG_ERR, "rtpref: impossible RA flag %x\n", dr->flags); return (RTPREF_INVALID); } /* NOTREACHED */ } static struct nd_defrouter * defrtrlist_update(struct nd_defrouter *new) { struct nd_defrouter *dr, *n; if ((dr = defrouter_lookup(&new->rtaddr, new->ifp)) != NULL) { /* entry exists */ if (new->rtlifetime == 0) { defrtrlist_del(dr); dr = NULL; } else { int oldpref = rtpref(dr); /* override */ dr->flags = new->flags; /* xxx flag check */ dr->rtlifetime = new->rtlifetime; dr->expire = new->expire; /* * If the preference does not change, there's no need * to sort the entries. Also make sure the selected * router is still installed in the kernel. */ if (dr->installed && rtpref(new) == oldpref) return (dr); /* * preferred router may be changed, so relocate * this router. * XXX: calling TAILQ_REMOVE directly is a bad manner. * However, since defrtrlist_del() has many side * effects, we intentionally do so here. * defrouter_select() below will handle routing * changes later. */ TAILQ_REMOVE(&V_nd_defrouter, dr, dr_entry); n = dr; goto insert; } return (dr); } /* entry does not exist */ if (new->rtlifetime == 0) return (NULL); n = (struct nd_defrouter *)malloc(sizeof(*n), M_IP6NDP, M_NOWAIT); if (n == NULL) return (NULL); bzero(n, sizeof(*n)); *n = *new; insert: /* * Insert the new router in the Default Router List; * The Default Router List should be in the descending order * of router-preferece. Routers with the same preference are * sorted in the arriving time order. 
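For reference while reading rtpref() above: the RA "Prf" field is a two-bit signed integer (RFC 4191 section 2.1). The standalone sketch below decodes it with literal hex values that mirror the ND_RA_FLAG_RTPREF_* constants from <netinet/icmp6.h>; treat the exact values as an assumption of this illustration rather than normative.

/*
 * Illustrative decoding of the 2-bit signed router preference:
 * 01 = +1 (high), 00 = 0 (medium), 11 = -1 (low), 10 = reserved
 * and treated as medium on receive, matching rtpref() above.
 */
#include <stdio.h>

static int
prf_to_int(unsigned char ra_flags)
{
	switch (ra_flags & 0x18) {	/* assumed ND_RA_FLAG_RTPREF_MASK */
	case 0x08:
		return (1);		/* high */
	case 0x18:
		return (-1);		/* low */
	case 0x00:
	default:
		return (0);		/* medium, incl. reserved 0x10 */
	}
}

int
main(void)
{
	printf("%d %d %d %d\n", prf_to_int(0x08), prf_to_int(0x00),
	    prf_to_int(0x18), prf_to_int(0x10));
	return (0);
}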
*/ /* insert at the end of the group */ TAILQ_FOREACH(dr, &V_nd_defrouter, dr_entry) { if (rtpref(n) > rtpref(dr)) break; } if (dr) TAILQ_INSERT_BEFORE(dr, n, dr_entry); else TAILQ_INSERT_TAIL(&V_nd_defrouter, n, dr_entry); defrouter_select(); return (n); } static struct nd_pfxrouter * pfxrtr_lookup(struct nd_prefix *pr, struct nd_defrouter *dr) { struct nd_pfxrouter *search; LIST_FOREACH(search, &pr->ndpr_advrtrs, pfr_entry) { if (search->router == dr) break; } return (search); } static void pfxrtr_add(struct nd_prefix *pr, struct nd_defrouter *dr) { struct nd_pfxrouter *new; new = (struct nd_pfxrouter *)malloc(sizeof(*new), M_IP6NDP, M_NOWAIT); if (new == NULL) return; bzero(new, sizeof(*new)); new->router = dr; LIST_INSERT_HEAD(&pr->ndpr_advrtrs, new, pfr_entry); pfxlist_onlink_check(); } static void pfxrtr_del(struct nd_pfxrouter *pfr) { LIST_REMOVE(pfr, pfr_entry); free(pfr, M_IP6NDP); } struct nd_prefix * nd6_prefix_lookup(struct nd_prefixctl *key) { struct nd_prefix *search; LIST_FOREACH(search, &V_nd_prefix, ndpr_entry) { if (key->ndpr_ifp == search->ndpr_ifp && key->ndpr_plen == search->ndpr_plen && in6_are_prefix_equal(&key->ndpr_prefix.sin6_addr, &search->ndpr_prefix.sin6_addr, key->ndpr_plen)) { break; } } return (search); } int nd6_prelist_add(struct nd_prefixctl *pr, struct nd_defrouter *dr, struct nd_prefix **newp) { struct nd_prefix *new = NULL; int error = 0; char ip6buf[INET6_ADDRSTRLEN]; new = (struct nd_prefix *)malloc(sizeof(*new), M_IP6NDP, M_NOWAIT); if (new == NULL) return(ENOMEM); bzero(new, sizeof(*new)); new->ndpr_ifp = pr->ndpr_ifp; new->ndpr_prefix = pr->ndpr_prefix; new->ndpr_plen = pr->ndpr_plen; new->ndpr_vltime = pr->ndpr_vltime; new->ndpr_pltime = pr->ndpr_pltime; new->ndpr_flags = pr->ndpr_flags; if ((error = in6_init_prefix_ltimes(new)) != 0) { free(new, M_IP6NDP); return(error); } new->ndpr_lastupdate = time_uptime; if (newp != NULL) *newp = new; /* initialization */ LIST_INIT(&new->ndpr_advrtrs); in6_prefixlen2mask(&new->ndpr_mask, new->ndpr_plen); /* make prefix in the canonical form */ IN6_MASK_ADDR(&new->ndpr_prefix.sin6_addr, &new->ndpr_mask); /* link ndpr_entry to nd_prefix list */ LIST_INSERT_HEAD(&V_nd_prefix, new, ndpr_entry); /* ND_OPT_PI_FLAG_ONLINK processing */ if (new->ndpr_raf_onlink) { int e; if ((e = nd6_prefix_onlink(new)) != 0) { nd6log((LOG_ERR, "nd6_prelist_add: failed to make " "the prefix %s/%d on-link on %s (errno=%d)\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen, if_name(pr->ndpr_ifp), e)); /* proceed anyway. XXX: is it correct? */ } } if (dr) pfxrtr_add(new, dr); return 0; } void prelist_remove(struct nd_prefix *pr) { struct nd_pfxrouter *pfr, *next; int e; char ip6buf[INET6_ADDRSTRLEN]; /* make sure to invalidate the prefix until it is really freed. */ pr->ndpr_vltime = 0; pr->ndpr_pltime = 0; /* * Though these flags are now meaningless, we'd rather keep the value * of pr->ndpr_raf_onlink and pr->ndpr_raf_auto not to confuse users * when executing "ndp -p". */ if ((pr->ndpr_stateflags & NDPRF_ONLINK) != 0 && (e = nd6_prefix_offlink(pr)) != 0) { nd6log((LOG_ERR, "prelist_remove: failed to make %s/%d offlink " "on %s, errno=%d\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen, if_name(pr->ndpr_ifp), e)); /* what should we do? */ } if (pr->ndpr_refcnt > 0) return; /* notice here? 
*/ /* unlink ndpr_entry from nd_prefix list */ LIST_REMOVE(pr, ndpr_entry); /* free list of routers that adversed the prefix */ LIST_FOREACH_SAFE(pfr, &pr->ndpr_advrtrs, pfr_entry, next) { free(pfr, M_IP6NDP); } free(pr, M_IP6NDP); pfxlist_onlink_check(); } /* * dr - may be NULL */ static int prelist_update(struct nd_prefixctl *new, struct nd_defrouter *dr, struct mbuf *m, int mcast) { struct in6_ifaddr *ia6 = NULL, *ia6_match = NULL; struct ifaddr *ifa; struct ifnet *ifp = new->ndpr_ifp; struct nd_prefix *pr; int error = 0; int newprefix = 0; int auth; struct in6_addrlifetime lt6_tmp; char ip6buf[INET6_ADDRSTRLEN]; auth = 0; if (m) { /* * Authenticity for NA consists authentication for * both IP header and IP datagrams, doesn't it ? */ #if defined(M_AUTHIPHDR) && defined(M_AUTHIPDGM) auth = ((m->m_flags & M_AUTHIPHDR) && (m->m_flags & M_AUTHIPDGM)); #endif } if ((pr = nd6_prefix_lookup(new)) != NULL) { /* * nd6_prefix_lookup() ensures that pr and new have the same * prefix on a same interface. */ /* * Update prefix information. Note that the on-link (L) bit * and the autonomous (A) bit should NOT be changed from 1 * to 0. */ if (new->ndpr_raf_onlink == 1) pr->ndpr_raf_onlink = 1; if (new->ndpr_raf_auto == 1) pr->ndpr_raf_auto = 1; if (new->ndpr_raf_onlink) { pr->ndpr_vltime = new->ndpr_vltime; pr->ndpr_pltime = new->ndpr_pltime; (void)in6_init_prefix_ltimes(pr); /* XXX error case? */ pr->ndpr_lastupdate = time_uptime; } if (new->ndpr_raf_onlink && (pr->ndpr_stateflags & NDPRF_ONLINK) == 0) { int e; if ((e = nd6_prefix_onlink(pr)) != 0) { nd6log((LOG_ERR, "prelist_update: failed to make " "the prefix %s/%d on-link on %s " "(errno=%d)\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen, if_name(pr->ndpr_ifp), e)); /* proceed anyway. XXX: is it correct? */ } } if (dr && pfxrtr_lookup(pr, dr) == NULL) pfxrtr_add(pr, dr); } else { struct nd_prefix *newpr = NULL; newprefix = 1; if (new->ndpr_vltime == 0) goto end; if (new->ndpr_raf_onlink == 0 && new->ndpr_raf_auto == 0) goto end; error = nd6_prelist_add(new, dr, &newpr); if (error != 0 || newpr == NULL) { nd6log((LOG_NOTICE, "prelist_update: " "nd6_prelist_add failed for %s/%d on %s " "errno=%d, returnpr=%p\n", ip6_sprintf(ip6buf, &new->ndpr_prefix.sin6_addr), new->ndpr_plen, if_name(new->ndpr_ifp), error, newpr)); goto end; /* we should just give up in this case. */ } /* * XXX: from the ND point of view, we can ignore a prefix * with the on-link bit being zero. However, we need a * prefix structure for references from autoconfigured * addresses. Thus, we explicitly make sure that the prefix * itself expires now. */ if (newpr->ndpr_raf_onlink == 0) { newpr->ndpr_vltime = 0; newpr->ndpr_pltime = 0; in6_init_prefix_ltimes(newpr); } pr = newpr; } /* * Address autoconfiguration based on Section 5.5.3 of RFC 2462. * Note that pr must be non NULL at this point. */ /* 5.5.3 (a). Ignore the prefix without the A bit set. */ if (!new->ndpr_raf_auto) goto end; /* * 5.5.3 (b). the link-local prefix should have been ignored in * nd6_ra_input. */ /* 5.5.3 (c). Consistency check on lifetimes: pltime <= vltime. */ if (new->ndpr_pltime > new->ndpr_vltime) { error = EINVAL; /* XXX: won't be used */ goto end; } /* * 5.5.3 (d). If the prefix advertised is not equal to the prefix of * an address configured by stateless autoconfiguration already in the * list of addresses associated with the interface, and the Valid * Lifetime is not 0, form an address. We first check if we have * a matching prefix. 
* Note: we apply a clarification in rfc2462bis-02 here. We only * consider autoconfigured addresses while RFC2462 simply said * "address". */ IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { struct in6_ifaddr *ifa6; u_int32_t remaininglifetime; if (ifa->ifa_addr->sa_family != AF_INET6) continue; ifa6 = (struct in6_ifaddr *)ifa; /* * We only consider autoconfigured addresses as per rfc2462bis. */ if (!(ifa6->ia6_flags & IN6_IFF_AUTOCONF)) continue; /* * Spec is not clear here, but I believe we should concentrate * on unicast (i.e. not anycast) addresses. * XXX: other ia6_flags? detached or duplicated? */ if ((ifa6->ia6_flags & IN6_IFF_ANYCAST) != 0) continue; /* * Ignore the address if it is not associated with a prefix * or is associated with a prefix that is different from this * one. (pr is never NULL here) */ if (ifa6->ia6_ndpr != pr) continue; if (ia6_match == NULL) /* remember the first one */ ia6_match = ifa6; /* * An already autoconfigured address matched. Now that we * are sure there is at least one matched address, we can * proceed to 5.5.3. (e): update the lifetimes according to the * "two hours" rule and the privacy extension. * We apply some clarifications in rfc2462bis: * - use remaininglifetime instead of storedlifetime as a * variable name * - remove the dead code in the "two-hour" rule */ #define TWOHOUR (120*60) lt6_tmp = ifa6->ia6_lifetime; if (lt6_tmp.ia6t_vltime == ND6_INFINITE_LIFETIME) remaininglifetime = ND6_INFINITE_LIFETIME; else if (time_uptime - ifa6->ia6_updatetime > lt6_tmp.ia6t_vltime) { /* * The case of "invalid" address. We should usually * not see this case. */ remaininglifetime = 0; } else remaininglifetime = lt6_tmp.ia6t_vltime - (time_uptime - ifa6->ia6_updatetime); /* when not updating, keep the current stored lifetime. */ lt6_tmp.ia6t_vltime = remaininglifetime; if (TWOHOUR < new->ndpr_vltime || remaininglifetime < new->ndpr_vltime) { lt6_tmp.ia6t_vltime = new->ndpr_vltime; } else if (remaininglifetime <= TWOHOUR) { if (auth) { lt6_tmp.ia6t_vltime = new->ndpr_vltime; } } else { /* * new->ndpr_vltime <= TWOHOUR && * TWOHOUR < remaininglifetime */ lt6_tmp.ia6t_vltime = TWOHOUR; } /* The 2 hour rule is not imposed for preferred lifetime. */ lt6_tmp.ia6t_pltime = new->ndpr_pltime; in6_init_address_ltimes(pr, <6_tmp); /* * We need to treat lifetimes for temporary addresses * differently, according to * draft-ietf-ipv6-privacy-addrs-v2-01.txt 3.3 (1); * we only update the lifetimes when they are in the maximum * intervals. */ if ((ifa6->ia6_flags & IN6_IFF_TEMPORARY) != 0) { u_int32_t maxvltime, maxpltime; if (V_ip6_temp_valid_lifetime > (u_int32_t)((time_uptime - ifa6->ia6_createtime) + V_ip6_desync_factor)) { maxvltime = V_ip6_temp_valid_lifetime - (time_uptime - ifa6->ia6_createtime) - V_ip6_desync_factor; } else maxvltime = 0; if (V_ip6_temp_preferred_lifetime > (u_int32_t)((time_uptime - ifa6->ia6_createtime) + V_ip6_desync_factor)) { maxpltime = V_ip6_temp_preferred_lifetime - (time_uptime - ifa6->ia6_createtime) - V_ip6_desync_factor; } else maxpltime = 0; if (lt6_tmp.ia6t_vltime == ND6_INFINITE_LIFETIME || lt6_tmp.ia6t_vltime > maxvltime) { lt6_tmp.ia6t_vltime = maxvltime; } if (lt6_tmp.ia6t_pltime == ND6_INFINITE_LIFETIME || lt6_tmp.ia6t_pltime > maxpltime) { lt6_tmp.ia6t_pltime = maxpltime; } } ifa6->ia6_lifetime = lt6_tmp; ifa6->ia6_updatetime = time_uptime; } IF_ADDR_RUNLOCK(ifp); if (ia6_match == NULL && new->ndpr_vltime) { int ifidlen; /* * 5.5.3 (d) (continued) * No address matched and the valid lifetime is non-zero. 
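The lifetime clamping just performed is the "two hours" rule of RFC 4862 section 5.5.3(e): an unauthenticated RA may extend a valid lifetime freely, but may shorten it only down to two hours, unless fewer than two hours already remain. A hedged standalone restatement of that decision (infinite-lifetime and bookkeeping details elided):

/*
 * remaining:  seconds of valid lifetime left on the address
 * advertised: ndpr_vltime from the RA prefix option
 * auth:       nonzero if the RA was authenticated
 */
#define	TWOHOUR	(120 * 60)

static unsigned int
new_vltime(unsigned int remaining, unsigned int advertised, int auth)
{
	if (advertised > TWOHOUR || advertised > remaining)
		return (advertised);	/* raise, or lower to >= 2 hours */
	if (remaining <= TWOHOUR)
		return (auth ? advertised : remaining);
	return (TWOHOUR);	/* clamp an unauthenticated decrease */
}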
* Create a new address.
	 */

	/*
	 * Prefix Length check:
	 * If the sum of the prefix length and interface identifier
	 * length does not equal 128 bits, the Prefix Information
	 * option MUST be ignored. The length of the interface
	 * identifier is defined in a separate link-type specific
	 * document.
	 */
	ifidlen = in6_if2idlen(ifp);
	if (ifidlen < 0) {
		/* this should not happen, so we always log it. */
		log(LOG_ERR, "prelist_update: IFID undefined (%s)\n",
		    if_name(ifp));
		goto end;
	}
	if (ifidlen + pr->ndpr_plen != 128) {
		nd6log((LOG_INFO,
		    "prelist_update: invalid prefixlen "
		    "%d for %s, ignored\n",
		    pr->ndpr_plen, if_name(ifp)));
		goto end;
	}

	if ((ia6 = in6_ifadd(new, mcast)) != NULL) {
		/*
		 * note that we should use pr (not new) for reference.
		 */
		pr->ndpr_refcnt++;
		ia6->ia6_ndpr = pr;

		/*
		 * RFC 3041 3.3 (2).
		 * When a new public address is created as described
		 * in RFC2462, also create a new temporary address.
		 *
		 * RFC 3041 3.5.
		 * When an interface connects to a new link, a new
		 * randomized interface identifier should be generated
		 * immediately together with a new set of temporary
		 * addresses. Thus, we specify 1 as the 2nd arg of
		 * in6_tmpifadd().
		 */
		if (V_ip6_use_tempaddr) {
			int e;
			if ((e = in6_tmpifadd(ia6, 1, 1)) != 0) {
				nd6log((LOG_NOTICE, "prelist_update: "
				    "failed to create a temporary "
				    "address, errno=%d\n", e));
			}
		}
		ifa_free(&ia6->ia_ifa);

		/*
		 * A newly added address might affect the status
		 * of other addresses, so we check and update it.
		 * XXX: what if address duplication happens?
		 */
		pfxlist_onlink_check();
	} else {
		/* just set an error. do not bark here. */
		error = EADDRNOTAVAIL;	/* XXX: might be unused. */
	}

 end:
	return error;
}

/*
 * A supplement function used in the on-link detection below;
 * detect if a given prefix has a (probably) reachable advertising router.
 * XXX: lengthy function name...
 */
static struct nd_pfxrouter *
find_pfxlist_reachable_router(struct nd_prefix *pr)
{
	struct nd_pfxrouter *pfxrtr;
	struct llentry *ln;
	int canreach;

	LIST_FOREACH(pfxrtr, &pr->ndpr_advrtrs, pfr_entry) {
		IF_AFDATA_RLOCK(pfxrtr->router->ifp);
		ln = nd6_lookup(&pfxrtr->router->rtaddr, 0,
		    pfxrtr->router->ifp);
		IF_AFDATA_RUNLOCK(pfxrtr->router->ifp);
		if (ln == NULL)
			continue;
		canreach = ND6_IS_LLINFO_PROBREACH(ln);
		LLE_RUNLOCK(ln);
		if (canreach)
			break;
	}
	return (pfxrtr);
}

/*
 * Check if each prefix in the prefix list has at least one available router
 * that advertised the prefix (a router is "available" if its neighbor cache
 * entry is reachable or probably reachable).
 * If the check fails, the prefix may be off-link, because, for example,
 * we have moved from the network but the lifetime of the prefix has not
 * expired yet. So we should not use the prefix if there is another prefix
 * that has an available router.
 * But, if there is no prefix that has an available router, we still regard
 * all the prefixes as on-link. This is because we can't tell if all the
 * routers are simply dead or if we really moved from the network and there
 * is no router around us.
 */
void
pfxlist_onlink_check()
{
	struct nd_prefix *pr;
	struct in6_ifaddr *ifa;
	struct nd_defrouter *dr;
	struct nd_pfxrouter *pfxrtr = NULL;

	/*
	 * Check if there is a prefix that has a reachable advertising
	 * router.
	 */
	LIST_FOREACH(pr, &V_nd_prefix, ndpr_entry) {
		if (pr->ndpr_raf_onlink && find_pfxlist_reachable_router(pr))
			break;
	}

	/*
	 * If we have no such prefix, check whether we still have a router
	 * that does not advertise any prefixes.
*/ if (pr == NULL) { TAILQ_FOREACH(dr, &V_nd_defrouter, dr_entry) { struct nd_prefix *pr0; LIST_FOREACH(pr0, &V_nd_prefix, ndpr_entry) { if ((pfxrtr = pfxrtr_lookup(pr0, dr)) != NULL) break; } if (pfxrtr != NULL) break; } } if (pr != NULL || (!TAILQ_EMPTY(&V_nd_defrouter) && pfxrtr == NULL)) { /* * There is at least one prefix that has a reachable router, * or at least a router which probably does not advertise * any prefixes. The latter would be the case when we move * to a new link where we have a router that does not provide * prefixes and we configure an address by hand. * Detach prefixes which have no reachable advertising * router, and attach other prefixes. */ LIST_FOREACH(pr, &V_nd_prefix, ndpr_entry) { /* XXX: a link-local prefix should never be detached */ if (IN6_IS_ADDR_LINKLOCAL(&pr->ndpr_prefix.sin6_addr)) continue; /* * we aren't interested in prefixes without the L bit * set. */ if (pr->ndpr_raf_onlink == 0) continue; if (pr->ndpr_raf_auto == 0) continue; if ((pr->ndpr_stateflags & NDPRF_DETACHED) == 0 && find_pfxlist_reachable_router(pr) == NULL) pr->ndpr_stateflags |= NDPRF_DETACHED; if ((pr->ndpr_stateflags & NDPRF_DETACHED) != 0 && find_pfxlist_reachable_router(pr) != 0) pr->ndpr_stateflags &= ~NDPRF_DETACHED; } } else { /* there is no prefix that has a reachable router */ LIST_FOREACH(pr, &V_nd_prefix, ndpr_entry) { if (IN6_IS_ADDR_LINKLOCAL(&pr->ndpr_prefix.sin6_addr)) continue; if (pr->ndpr_raf_onlink == 0) continue; if (pr->ndpr_raf_auto == 0) continue; if ((pr->ndpr_stateflags & NDPRF_DETACHED) != 0) pr->ndpr_stateflags &= ~NDPRF_DETACHED; } } /* * Remove each interface route associated with a (just) detached * prefix, and reinstall the interface route for a (just) attached * prefix. Note that all attempt of reinstallation does not * necessarily success, when a same prefix is shared among multiple * interfaces. Such cases will be handled in nd6_prefix_onlink, * so we don't have to care about them. */ LIST_FOREACH(pr, &V_nd_prefix, ndpr_entry) { int e; char ip6buf[INET6_ADDRSTRLEN]; if (IN6_IS_ADDR_LINKLOCAL(&pr->ndpr_prefix.sin6_addr)) continue; if (pr->ndpr_raf_onlink == 0) continue; if (pr->ndpr_raf_auto == 0) continue; if ((pr->ndpr_stateflags & NDPRF_DETACHED) != 0 && (pr->ndpr_stateflags & NDPRF_ONLINK) != 0) { if ((e = nd6_prefix_offlink(pr)) != 0) { nd6log((LOG_ERR, "pfxlist_onlink_check: failed to " "make %s/%d offlink, errno=%d\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen, e)); } } if ((pr->ndpr_stateflags & NDPRF_DETACHED) == 0 && (pr->ndpr_stateflags & NDPRF_ONLINK) == 0 && pr->ndpr_raf_onlink) { if ((e = nd6_prefix_onlink(pr)) != 0) { nd6log((LOG_ERR, "pfxlist_onlink_check: failed to " "make %s/%d onlink, errno=%d\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen, e)); } } } /* * Changes on the prefix status might affect address status as well. * Make sure that all addresses derived from an attached prefix are * attached, and that all addresses derived from a detached prefix are * detached. Note, however, that a manually configured address should * always be attached. * The precise detection logic is same as the one for prefixes. * * XXXRW: in6_ifaddrhead locking. */ TAILQ_FOREACH(ifa, &V_in6_ifaddrhead, ia_link) { if (!(ifa->ia6_flags & IN6_IFF_AUTOCONF)) continue; if (ifa->ia6_ndpr == NULL) { /* * This can happen when we first configure the address * (i.e. the address exists, but the prefix does not). * XXX: complicated relationships... 
*/ continue; } if (find_pfxlist_reachable_router(ifa->ia6_ndpr)) break; } if (ifa) { TAILQ_FOREACH(ifa, &V_in6_ifaddrhead, ia_link) { if ((ifa->ia6_flags & IN6_IFF_AUTOCONF) == 0) continue; if (ifa->ia6_ndpr == NULL) /* XXX: see above. */ continue; if (find_pfxlist_reachable_router(ifa->ia6_ndpr)) { if (ifa->ia6_flags & IN6_IFF_DETACHED) { ifa->ia6_flags &= ~IN6_IFF_DETACHED; ifa->ia6_flags |= IN6_IFF_TENTATIVE; nd6_dad_start((struct ifaddr *)ifa, 0); } } else { ifa->ia6_flags |= IN6_IFF_DETACHED; } } } else { TAILQ_FOREACH(ifa, &V_in6_ifaddrhead, ia_link) { if ((ifa->ia6_flags & IN6_IFF_AUTOCONF) == 0) continue; if (ifa->ia6_flags & IN6_IFF_DETACHED) { ifa->ia6_flags &= ~IN6_IFF_DETACHED; ifa->ia6_flags |= IN6_IFF_TENTATIVE; /* Do we need a delay in this case? */ nd6_dad_start((struct ifaddr *)ifa, 0); } } } } static int nd6_prefix_onlink_rtrequest(struct nd_prefix *pr, struct ifaddr *ifa) { static struct sockaddr_dl null_sdl = {sizeof(null_sdl), AF_LINK}; - struct radix_node_head *rnh; + struct rib_head *rnh; struct rtentry *rt; struct sockaddr_in6 mask6; u_long rtflags; int error, a_failure, fibnum; /* * in6_ifinit() sets nd6_rtrequest to ifa_rtrequest for all ifaddrs. * ifa->ifa_rtrequest = nd6_rtrequest; */ bzero(&mask6, sizeof(mask6)); mask6.sin6_len = sizeof(mask6); mask6.sin6_addr = pr->ndpr_mask; rtflags = (ifa->ifa_flags & ~IFA_RTSELF) | RTF_UP; a_failure = 0; for (fibnum = 0; fibnum < rt_numfibs; fibnum++) { rt = NULL; error = in6_rtrequest(RTM_ADD, (struct sockaddr *)&pr->ndpr_prefix, ifa->ifa_addr, (struct sockaddr *)&mask6, rtflags, &rt, fibnum); if (error == 0) { KASSERT(rt != NULL, ("%s: in6_rtrequest return no " "error(%d) but rt is NULL, pr=%p, ifa=%p", __func__, error, pr, ifa)); rnh = rt_tables_get_rnh(rt->rt_fibnum, AF_INET6); /* XXX what if rhn == NULL? */ - RADIX_NODE_HEAD_LOCK(rnh); + RIB_WLOCK(rnh); RT_LOCK(rt); if (rt_setgate(rt, rt_key(rt), (struct sockaddr *)&null_sdl) == 0) { struct sockaddr_dl *dl; dl = (struct sockaddr_dl *)rt->rt_gateway; dl->sdl_type = rt->rt_ifp->if_type; dl->sdl_index = rt->rt_ifp->if_index; } - RADIX_NODE_HEAD_UNLOCK(rnh); + RIB_WUNLOCK(rnh); nd6_rtmsg(RTM_ADD, rt); RT_UNLOCK(rt); pr->ndpr_stateflags |= NDPRF_ONLINK; } else { char ip6buf[INET6_ADDRSTRLEN]; char ip6bufg[INET6_ADDRSTRLEN]; char ip6bufm[INET6_ADDRSTRLEN]; struct sockaddr_in6 *sin6; sin6 = (struct sockaddr_in6 *)ifa->ifa_addr; nd6log((LOG_ERR, "nd6_prefix_onlink: failed to add " "route for a prefix (%s/%d) on %s, gw=%s, mask=%s, " "flags=%lx errno = %d\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen, if_name(pr->ndpr_ifp), ip6_sprintf(ip6bufg, &sin6->sin6_addr), ip6_sprintf(ip6bufm, &mask6.sin6_addr), rtflags, error)); /* Save last error to return, see rtinit(). */ a_failure = error; } if (rt != NULL) { RT_LOCK(rt); RT_REMREF(rt); RT_UNLOCK(rt); } } /* Return the last error we got. */ return (a_failure); } static int nd6_prefix_onlink(struct nd_prefix *pr) { struct ifaddr *ifa; struct ifnet *ifp = pr->ndpr_ifp; struct nd_prefix *opr; int error = 0; char ip6buf[INET6_ADDRSTRLEN]; /* sanity check */ if ((pr->ndpr_stateflags & NDPRF_ONLINK) != 0) { nd6log((LOG_ERR, "nd6_prefix_onlink: %s/%d is already on-link\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen)); return (EEXIST); } /* * Add the interface route associated with the prefix. Before * installing the route, check if there's the same prefix on another * interface, and the prefix has already installed the interface route. 
* Although such a configuration is expected to be rare, we explicitly * allow it. */ LIST_FOREACH(opr, &V_nd_prefix, ndpr_entry) { if (opr == pr) continue; if ((opr->ndpr_stateflags & NDPRF_ONLINK) == 0) continue; if (opr->ndpr_plen == pr->ndpr_plen && in6_are_prefix_equal(&pr->ndpr_prefix.sin6_addr, &opr->ndpr_prefix.sin6_addr, pr->ndpr_plen)) return (0); } /* * We prefer link-local addresses as the associated interface address. */ /* search for a link-local addr */ ifa = (struct ifaddr *)in6ifa_ifpforlinklocal(ifp, IN6_IFF_NOTREADY | IN6_IFF_ANYCAST); if (ifa == NULL) { /* XXX: freebsd does not have ifa_ifwithaf */ IF_ADDR_RLOCK(ifp); TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family == AF_INET6) break; } if (ifa != NULL) ifa_ref(ifa); IF_ADDR_RUNLOCK(ifp); /* should we care about ia6_flags? */ } if (ifa == NULL) { /* * This can still happen, when, for example, we receive an RA * containing a prefix with the L bit set and the A bit clear, * after removing all IPv6 addresses on the receiving * interface. This should, of course, be rare though. */ nd6log((LOG_NOTICE, "nd6_prefix_onlink: failed to find any ifaddr" " to add route for a prefix(%s/%d) on %s\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen, if_name(ifp))); return (0); } error = nd6_prefix_onlink_rtrequest(pr, ifa); if (ifa != NULL) ifa_free(ifa); return (error); } static int nd6_prefix_offlink(struct nd_prefix *pr) { int error = 0; struct ifnet *ifp = pr->ndpr_ifp; struct nd_prefix *opr; struct sockaddr_in6 sa6, mask6; struct rtentry *rt; char ip6buf[INET6_ADDRSTRLEN]; int fibnum, a_failure; /* sanity check */ if ((pr->ndpr_stateflags & NDPRF_ONLINK) == 0) { nd6log((LOG_ERR, "nd6_prefix_offlink: %s/%d is already off-link\n", ip6_sprintf(ip6buf, &pr->ndpr_prefix.sin6_addr), pr->ndpr_plen)); return (EEXIST); } bzero(&sa6, sizeof(sa6)); sa6.sin6_family = AF_INET6; sa6.sin6_len = sizeof(sa6); bcopy(&pr->ndpr_prefix.sin6_addr, &sa6.sin6_addr, sizeof(struct in6_addr)); bzero(&mask6, sizeof(mask6)); mask6.sin6_family = AF_INET6; mask6.sin6_len = sizeof(sa6); bcopy(&pr->ndpr_mask, &mask6.sin6_addr, sizeof(struct in6_addr)); a_failure = 0; for (fibnum = 0; fibnum < rt_numfibs; fibnum++) { rt = NULL; error = in6_rtrequest(RTM_DELETE, (struct sockaddr *)&sa6, NULL, (struct sockaddr *)&mask6, 0, &rt, fibnum); if (error == 0) { /* report the route deletion to the routing socket. */ if (rt != NULL) nd6_rtmsg(RTM_DELETE, rt); } else { /* Save last error to return, see rtinit(). */ a_failure = error; } if (rt != NULL) { RTFREE(rt); } } error = a_failure; a_failure = 1; if (error == 0) { pr->ndpr_stateflags &= ~NDPRF_ONLINK; /* * There might be the same prefix on another interface, * the prefix which could not be on-link just because we have * the interface route (see comments in nd6_prefix_onlink). * If there's one, try to make the prefix on-link on the * interface. */ LIST_FOREACH(opr, &V_nd_prefix, ndpr_entry) { if (opr == pr) continue; if ((opr->ndpr_stateflags & NDPRF_ONLINK) != 0) continue; /* * KAME specific: detached prefixes should not be * on-link. 
*/ if ((opr->ndpr_stateflags & NDPRF_DETACHED) != 0) continue; if (opr->ndpr_plen == pr->ndpr_plen && in6_are_prefix_equal(&pr->ndpr_prefix.sin6_addr, &opr->ndpr_prefix.sin6_addr, pr->ndpr_plen)) { int e; if ((e = nd6_prefix_onlink(opr)) != 0) { nd6log((LOG_ERR, "nd6_prefix_offlink: failed to " "recover a prefix %s/%d from %s " "to %s (errno = %d)\n", ip6_sprintf(ip6buf, &opr->ndpr_prefix.sin6_addr), opr->ndpr_plen, if_name(ifp), if_name(opr->ndpr_ifp), e)); } else a_failure = 0; } } } else { /* XXX: can we still set the NDPRF_ONLINK flag? */ nd6log((LOG_ERR, "nd6_prefix_offlink: failed to delete route: " "%s/%d on %s (errno = %d)\n", ip6_sprintf(ip6buf, &sa6.sin6_addr), pr->ndpr_plen, if_name(ifp), error)); } if (a_failure) lltable_prefix_free(AF_INET6, (struct sockaddr *)&sa6, (struct sockaddr *)&mask6, LLE_STATIC); return (error); } static struct in6_ifaddr * in6_ifadd(struct nd_prefixctl *pr, int mcast) { struct ifnet *ifp = pr->ndpr_ifp; struct ifaddr *ifa; struct in6_aliasreq ifra; struct in6_ifaddr *ia, *ib; int error, plen0; struct in6_addr mask; int prefixlen = pr->ndpr_plen; int updateflags; char ip6buf[INET6_ADDRSTRLEN]; in6_prefixlen2mask(&mask, prefixlen); /* * find a link-local address (will be interface ID). * Is it really mandatory? Theoretically, a global or a site-local * address can be configured without a link-local address, if we * have a unique interface identifier... * * it is not mandatory to have a link-local address, we can generate * interface identifier on the fly. we do this because: * (1) it should be the easiest way to find interface identifier. * (2) RFC2462 5.4 suggesting the use of the same interface identifier * for multiple addresses on a single interface, and possible shortcut * of DAD. we omitted DAD for this reason in the past. * (3) a user can prevent autoconfiguration of global address * by removing link-local address by hand (this is partly because we * don't have other way to control the use of IPv6 on an interface. * this has been our design choice - cf. NRL's "ifconfig auto"). * (4) it is easier to manage when an interface has addresses * with the same interface identifier, than to have multiple addresses * with different interface identifiers. */ ifa = (struct ifaddr *)in6ifa_ifpforlinklocal(ifp, 0); /* 0 is OK? */ if (ifa) ib = (struct in6_ifaddr *)ifa; else return NULL; /* prefixlen + ifidlen must be equal to 128 */ plen0 = in6_mask2len(&ib->ia_prefixmask.sin6_addr, NULL); if (prefixlen != plen0) { ifa_free(ifa); nd6log((LOG_INFO, "in6_ifadd: wrong prefixlen for %s " "(prefix=%d ifid=%d)\n", if_name(ifp), prefixlen, 128 - plen0)); return NULL; } /* make ifaddr */ in6_prepare_ifra(&ifra, &pr->ndpr_prefix.sin6_addr, &mask); IN6_MASK_ADDR(&ifra.ifra_addr.sin6_addr, &mask); /* interface ID */ ifra.ifra_addr.sin6_addr.s6_addr32[0] |= (ib->ia_addr.sin6_addr.s6_addr32[0] & ~mask.s6_addr32[0]); ifra.ifra_addr.sin6_addr.s6_addr32[1] |= (ib->ia_addr.sin6_addr.s6_addr32[1] & ~mask.s6_addr32[1]); ifra.ifra_addr.sin6_addr.s6_addr32[2] |= (ib->ia_addr.sin6_addr.s6_addr32[2] & ~mask.s6_addr32[2]); ifra.ifra_addr.sin6_addr.s6_addr32[3] |= (ib->ia_addr.sin6_addr.s6_addr32[3] & ~mask.s6_addr32[3]); ifa_free(ifa); /* lifetimes. */ ifra.ifra_lifetime.ia6t_vltime = pr->ndpr_vltime; ifra.ifra_lifetime.ia6t_pltime = pr->ndpr_pltime; /* XXX: scope zone ID? */ ifra.ifra_flags |= IN6_IFF_AUTOCONF; /* obey autoconf */ /* * Make sure that we do not have this address already. 
This should * usually not happen, but we can still see this case, e.g., if we * have manually configured the exact address to be configured. */ ifa = (struct ifaddr *)in6ifa_ifpwithaddr(ifp, &ifra.ifra_addr.sin6_addr); if (ifa != NULL) { ifa_free(ifa); /* this should be rare enough to make an explicit log */ log(LOG_INFO, "in6_ifadd: %s is already configured\n", ip6_sprintf(ip6buf, &ifra.ifra_addr.sin6_addr)); return (NULL); } /* * Allocate ifaddr structure, link into chain, etc. * If we are going to create a new address upon receiving a multicasted * RA, we need to impose a random delay before starting DAD. * [draft-ietf-ipv6-rfc2462bis-02.txt, Section 5.4.2] */ updateflags = 0; if (mcast) updateflags |= IN6_IFAUPDATE_DADDELAY; if ((error = in6_update_ifa(ifp, &ifra, NULL, updateflags)) != 0) { nd6log((LOG_ERR, "in6_ifadd: failed to make ifaddr %s on %s (errno=%d)\n", ip6_sprintf(ip6buf, &ifra.ifra_addr.sin6_addr), if_name(ifp), error)); return (NULL); /* ifaddr must not have been allocated. */ } ia = in6ifa_ifpwithaddr(ifp, &ifra.ifra_addr.sin6_addr); /* * XXXRW: Assumption of non-NULLness here might not be true with * fine-grained locking -- should we validate it? Or just return * earlier ifa rather than looking it up again? */ return (ia); /* this is always non-NULL and referenced. */ } /* * ia0 - corresponding public address */ int in6_tmpifadd(const struct in6_ifaddr *ia0, int forcegen, int delay) { struct ifnet *ifp = ia0->ia_ifa.ifa_ifp; struct in6_ifaddr *newia; struct in6_aliasreq ifra; int error; int trylimit = 3; /* XXX: adhoc value */ int updateflags; u_int32_t randid[2]; time_t vltime0, pltime0; in6_prepare_ifra(&ifra, &ia0->ia_addr.sin6_addr, &ia0->ia_prefixmask.sin6_addr); ifra.ifra_addr = ia0->ia_addr; /* XXX: do we need this ? */ /* clear the old IFID */ IN6_MASK_ADDR(&ifra.ifra_addr.sin6_addr, &ifra.ifra_prefixmask.sin6_addr); again: if (in6_get_tmpifid(ifp, (u_int8_t *)randid, (const u_int8_t *)&ia0->ia_addr.sin6_addr.s6_addr[8], forcegen)) { nd6log((LOG_NOTICE, "in6_tmpifadd: failed to find a good " "random IFID\n")); return (EINVAL); } ifra.ifra_addr.sin6_addr.s6_addr32[2] |= (randid[0] & ~(ifra.ifra_prefixmask.sin6_addr.s6_addr32[2])); ifra.ifra_addr.sin6_addr.s6_addr32[3] |= (randid[1] & ~(ifra.ifra_prefixmask.sin6_addr.s6_addr32[3])); /* * in6_get_tmpifid() quite likely provided a unique interface ID. * However, we may still have a chance to see collision, because * there may be a time lag between generation of the ID and generation * of the address. So, we'll do one more sanity check. */ if (in6_localip(&ifra.ifra_addr.sin6_addr) != 0) { if (trylimit-- > 0) { forcegen = 1; goto again; } /* Give up. Something strange should have happened. */ nd6log((LOG_NOTICE, "in6_tmpifadd: failed to " "find a unique random IFID\n")); return (EEXIST); } /* * The Valid Lifetime is the lower of the Valid Lifetime of the * public address or TEMP_VALID_LIFETIME. * The Preferred Lifetime is the lower of the Preferred Lifetime * of the public address or TEMP_PREFERRED_LIFETIME - * DESYNC_FACTOR. */ if (ia0->ia6_lifetime.ia6t_vltime != ND6_INFINITE_LIFETIME) { vltime0 = IFA6_IS_INVALID(ia0) ? 0 : (ia0->ia6_lifetime.ia6t_vltime - (time_uptime - ia0->ia6_updatetime)); if (vltime0 > V_ip6_temp_valid_lifetime) vltime0 = V_ip6_temp_valid_lifetime; } else vltime0 = V_ip6_temp_valid_lifetime; if (ia0->ia6_lifetime.ia6t_pltime != ND6_INFINITE_LIFETIME) { pltime0 = IFA6_IS_DEPRECATED(ia0) ? 
0 : (ia0->ia6_lifetime.ia6t_pltime - (time_uptime - ia0->ia6_updatetime)); if (pltime0 > V_ip6_temp_preferred_lifetime - V_ip6_desync_factor){ pltime0 = V_ip6_temp_preferred_lifetime - V_ip6_desync_factor; } } else pltime0 = V_ip6_temp_preferred_lifetime - V_ip6_desync_factor; ifra.ifra_lifetime.ia6t_vltime = vltime0; ifra.ifra_lifetime.ia6t_pltime = pltime0; /* * A temporary address is created only if this calculated Preferred * Lifetime is greater than REGEN_ADVANCE time units. */ if (ifra.ifra_lifetime.ia6t_pltime <= V_ip6_temp_regen_advance) return (0); /* XXX: scope zone ID? */ ifra.ifra_flags |= (IN6_IFF_AUTOCONF|IN6_IFF_TEMPORARY); /* allocate ifaddr structure, link into chain, etc. */ updateflags = 0; if (delay) updateflags |= IN6_IFAUPDATE_DADDELAY; if ((error = in6_update_ifa(ifp, &ifra, NULL, updateflags)) != 0) return (error); newia = in6ifa_ifpwithaddr(ifp, &ifra.ifra_addr.sin6_addr); if (newia == NULL) { /* XXX: can it happen? */ nd6log((LOG_ERR, "in6_tmpifadd: ifa update succeeded, but we got " "no ifaddr\n")); return (EINVAL); /* XXX */ } newia->ia6_ndpr = ia0->ia6_ndpr; newia->ia6_ndpr->ndpr_refcnt++; ifa_free(&newia->ia_ifa); /* * A newly added address might affect the status of other addresses. * XXX: when the temporary address is generated with a new public * address, the onlink check is redundant. However, it would be safe * to do the check explicitly everywhere a new address is generated, * and, in fact, we surely need the check when we create a new * temporary address due to deprecation of an old temporary address. */ pfxlist_onlink_check(); return (0); } static int in6_init_prefix_ltimes(struct nd_prefix *ndpr) { if (ndpr->ndpr_pltime == ND6_INFINITE_LIFETIME) ndpr->ndpr_preferred = 0; else ndpr->ndpr_preferred = time_uptime + ndpr->ndpr_pltime; if (ndpr->ndpr_vltime == ND6_INFINITE_LIFETIME) ndpr->ndpr_expire = 0; else ndpr->ndpr_expire = time_uptime + ndpr->ndpr_vltime; return 0; } static void in6_init_address_ltimes(struct nd_prefix *new, struct in6_addrlifetime *lt6) { /* init ia6t_expire */ if (lt6->ia6t_vltime == ND6_INFINITE_LIFETIME) lt6->ia6t_expire = 0; else { lt6->ia6t_expire = time_uptime; lt6->ia6t_expire += lt6->ia6t_vltime; } /* init ia6t_preferred */ if (lt6->ia6t_pltime == ND6_INFINITE_LIFETIME) lt6->ia6t_preferred = 0; else { lt6->ia6t_preferred = time_uptime; lt6->ia6t_preferred += lt6->ia6t_pltime; } } /* * Delete all the routing table entries that use the specified gateway. * XXX: this function causes a search through all entries of the routing * table, so it shouldn't be called when acting as a router. */ void rt6_flush(struct in6_addr *gateway, struct ifnet *ifp) { /* We care only about link-local addresses */ if (!IN6_IS_ADDR_LINKLOCAL(gateway)) return; /* XXX Do we really need to walk any but the default FIB? */ rt_foreach_fib_walk_del(AF_INET6, rt6_deleteroute, (void *)gateway); } static int rt6_deleteroute(const struct rtentry *rt, void *arg) { #define SIN6(s) ((struct sockaddr_in6 *)s) struct in6_addr *gate = (struct in6_addr *)arg; if (rt->rt_gateway == NULL || rt->rt_gateway->sa_family != AF_INET6) return (0); if (!IN6_ARE_ADDR_EQUAL(gate, &SIN6(rt->rt_gateway)->sin6_addr)) { return (0); } /* * Do not delete a static route. * XXX: this seems to be a bit ad-hoc. Should we consider the * 'cloned' bit instead? */ if ((rt->rt_flags & RTF_STATIC) != 0) return (0); /* * We delete only host routes. This means, in particular, that we do * not delete the default route.
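 * (Illustrative example: when rt6_flush() runs for an expired default
 * router, say gateway fe80::1%em0, the RTF_HOST routes cloned through
 * that gateway are removed, while ::/0 itself and any RTF_STATIC
 * entries are left alone.)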
*/ if ((rt->rt_flags & RTF_HOST) == 0) return (0); return (1); #undef SIN6 } int nd6_setdefaultiface(int ifindex) { int error = 0; if (ifindex < 0 || V_if_index < ifindex) return (EINVAL); if (ifindex != 0 && !ifnet_byindex(ifindex)) return (EINVAL); if (V_nd6_defifindex != ifindex) { V_nd6_defifindex = ifindex; if (V_nd6_defifindex > 0) V_nd6_defifp = ifnet_byindex(V_nd6_defifindex); else V_nd6_defifp = NULL; /* * Our current implementation assumes one-to-one maping between * interfaces and links, so it would be natural to use the * default interface as the default link. */ scope6_setdefault(V_nd6_defifp); } return (error); } Index: projects/clang380-import/sys/netpfil/ipfw/dn_sched_qfq.c =================================================================== --- projects/clang380-import/sys/netpfil/ipfw/dn_sched_qfq.c (revision 294776) +++ projects/clang380-import/sys/netpfil/ipfw/dn_sched_qfq.c (revision 294777) @@ -1,870 +1,874 @@ /* * Copyright (c) 2010 Fabio Checconi, Luigi Rizzo, Paolo Valente * All rights reserved * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ /* * $FreeBSD$ */ #ifdef _KERNEL #include #include #include #include #include #include #include /* IFNAMSIZ */ #include #include /* ipfw_rule_ref */ #include /* flow_id */ #include #include #include #include #else #include #endif #ifdef QFQ_DEBUG struct qfq_sched; static void dump_sched(struct qfq_sched *q, const char *msg); #define NO(x) x #else #define NO(x) #endif #define DN_SCHED_QFQ 4 // XXX Where? typedef unsigned long bitmap; /* * bitmaps ops are critical. Some linux versions have __fls * and the bitmap ops. Some machines have ffs + * NOTE: fls() returns 1 for the least significant bit, + * __fls() returns 0 for the same case. 
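+ * (Illustrative example: with the fls() below, fls(0x10) == 5 while
+ * __fls(0x10) == 4.)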
+ * We use the base-0 version __fls() to match the description in + * the ToN QFQ paper */ #if defined(_WIN32) || (defined(__MIPSEL__) && defined(LINUX_24)) int fls(unsigned int n) { int i = 0; for (i = 0; n > 0; n >>= 1, i++) ; return i; } #endif #if !defined(_KERNEL) || defined( __FreeBSD__ ) || defined(_WIN32) || (defined(__MIPSEL__) && defined(LINUX_24)) static inline unsigned long __fls(unsigned long word) { return fls(word) - 1; } #endif #if !defined(_KERNEL) || !defined(__linux__) #ifdef QFQ_DEBUG int test_bit(int ix, bitmap *p) { if (ix < 0 || ix > 31) D("bad index %d", ix); return *p & (1<<ix); } void __set_bit(int ix, bitmap *p) { if (ix < 0 || ix > 31) D("bad index %d", ix); *p |= (1<<ix); } void __clear_bit(int ix, bitmap *p) { if (ix < 0 || ix > 31) D("bad index %d", ix); *p &= ~(1<<ix); } #else /* !QFQ_DEBUG */ #define test_bit(ix, pData) ((*pData) & (1<<(ix))) #define __set_bit(ix, pData) (*pData) |= (1<<(ix)) #define __clear_bit(ix, pData) (*pData) &= ~(1<<(ix)) #endif /* !QFQ_DEBUG */ #endif /* !__linux__ */ /*-------------------------------------------*/ /* Virtual time computations. S, F and V are all computed in fixed point arithmetic with FRAC_BITS decimal bits. QFQ_MAX_INDEX is the maximum index allowed for a group. We need one bit per index. QFQ_MAX_WSHIFT is the maximum power of two supported as a weight. The layout of the bits is as below: [ MTU_SHIFT ][ FRAC_BITS ] [ MAX_INDEX ][ MIN_SLOT_SHIFT ] ^.__grp->index = 0 *.__grp->slot_shift where MIN_SLOT_SHIFT is derived by difference from the others. The max group index corresponds to Lmax/w_min, where Lmax=1<<MTU_SHIFT, w_min = 1. From this, and knowing how many groups (MAX_INDEX) we want, we can derive the shift corresponding to each group. Because we often need to compute F = S + len/w_i and V = V + len/wsum, instead of storing w_i we store the value inv_w = (1<<FRAC_BITS)/w_i. A few words on the class->group mapping. Class weights are * in the range [1, QFQ_MAX_WEIGHT], we need to map each class i to the * group with the smallest index that can support the L_i / r_i * configured for the class. * * grp->index is the index of the group; and grp->slot_shift * is the shift for the corresponding (scaled) sigma_i. * * When computing the group index, we do (len< 0; } /* Round a precise timestamp to its slotted value. */ static inline uint64_t qfq_round_down(uint64_t ts, unsigned int shift) { return ts & ~((1ULL << shift) - 1); } /* return the pointer to the group with lowest index in the bitmap */ static inline struct qfq_group *qfq_ffs(struct qfq_sched *q, unsigned long bitmap) { int index = ffs(bitmap) - 1; // zero-based return &q->groups[index]; } /* * Calculate a flow index, given its weight and maximum packet length. * index = log_2(maxlen/weight) but we need to apply the scaling. * This is used only once at flow creation. */ static int qfq_calc_index(uint32_t inv_w, unsigned int maxlen) { uint64_t slot_size = (uint64_t)maxlen * inv_w; unsigned long size_map; int index = 0; size_map = (unsigned long)(slot_size >> QFQ_MIN_SLOT_SHIFT); if (!size_map) goto out; index = __fls(size_map) + 1; // basically a log_2() index -= !(slot_size - (1ULL << (index + QFQ_MIN_SLOT_SHIFT - 1))); if (index < 0) index = 0; out: ND("W = %d, L = %d, I = %d\n", ONE_FP/inv_w, maxlen, index); return index; } /*---- end support functions ----*/ /*-------- API calls --------------------------------*/ /* * Validate and copy parameters from flowset. */ static int qfq_new_queue(struct dn_queue *_q) { struct qfq_sched *q = (struct qfq_sched *)(_q->_si + 1); struct qfq_class *cl = (struct qfq_class *)_q; int i; uint32_t w; /* approximated weight */ /* import parameters from the flowset. They should be correct * already. */ w = _q->fs->fs.par[0]; cl->lmax = _q->fs->fs.par[1]; if (!w || w > QFQ_MAX_WEIGHT) { w = 1; D("rounding weight to 1"); } cl->inv_w = ONE_FP/w; w = ONE_FP/cl->inv_w; if (q->wsum + w > QFQ_MAX_WSUM) return EINVAL; i = qfq_calc_index(cl->inv_w, cl->lmax); cl->grp = &q->groups[i]; q->wsum += w; q->iwsum = ONE_FP / q->wsum; /* XXX note theory */ // XXX cl->S = q->V; ? return 0; } /* remove an empty queue */ static int qfq_free_queue(struct dn_queue *_q) { struct qfq_sched *q = (struct qfq_sched *)(_q->_si + 1); struct qfq_class *cl = (struct qfq_class *)_q; if (cl->inv_w) { q->wsum -= ONE_FP/cl->inv_w; if (q->wsum != 0) q->iwsum = ONE_FP / q->wsum; cl->inv_w = 0; /* reset weight to avoid running twice */ } return 0; } /* Calculate a mask to mimic what would be ffs_from().
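 * Worked example (illustrative): mask_from(0xb4, 3) computes
 * 0xb4 & ~((1UL << 3) - 1) = 0xb4 & ~0x07 = 0xb0, clearing the bits
 * for group indexes 0..2 while keeping bits 4, 5 and 7.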
*/ static inline unsigned long mask_from(unsigned long bitmap, int from) { return bitmap & ~((1UL << from) - 1); } /* * The state computation relies on ER=0, IR=1, EB=2, IB=3 * First compute eligibility comparing grp->S, q->V, * then check if someone is blocking us and possibly add EB */ static inline unsigned int qfq_calc_state(struct qfq_sched *q, struct qfq_group *grp) { /* if S > V we are not eligible */ unsigned int state = qfq_gt(grp->S, q->V); unsigned long mask = mask_from(q->bitmaps[ER], grp->index); struct qfq_group *next; if (mask) { next = qfq_ffs(q, mask); if (qfq_gt(grp->F, next->F)) state |= EB; } return state; } /* * In principle * q->bitmaps[dst] |= q->bitmaps[src] & mask; * q->bitmaps[src] &= ~mask; * but we should make sure that src != dst */ static inline void qfq_move_groups(struct qfq_sched *q, unsigned long mask, int src, int dst) { q->bitmaps[dst] |= q->bitmaps[src] & mask; q->bitmaps[src] &= ~mask; } static inline void qfq_unblock_groups(struct qfq_sched *q, int index, uint64_t old_finish) { unsigned long mask = mask_from(q->bitmaps[ER], index + 1); struct qfq_group *next; if (mask) { next = qfq_ffs(q, mask); if (!qfq_gt(next->F, old_finish)) return; } mask = (1UL << index) - 1; qfq_move_groups(q, mask, EB, ER); qfq_move_groups(q, mask, IB, IR); } /* * perhaps * old_V ^= q->V; old_V >>= QFQ_MIN_SLOT_SHIFT; if (old_V) { ... } * */ static inline void qfq_make_eligible(struct qfq_sched *q, uint64_t old_V) { unsigned long mask, vslot, old_vslot; vslot = q->V >> QFQ_MIN_SLOT_SHIFT; old_vslot = old_V >> QFQ_MIN_SLOT_SHIFT; if (vslot != old_vslot) { - /* should be 1ULL not 2ULL */ - mask = (1ULL << (__fls(vslot ^ old_vslot))) - 1; + /* must be 2ULL, see ToN QFQ article fig.5, we use base-0 fls */ + mask = (2ULL << (__fls(vslot ^ old_vslot))) - 1; qfq_move_groups(q, mask, IR, ER); qfq_move_groups(q, mask, IB, EB); } } /* * XXX we should make sure that slot becomes less than 32. * This is guaranteed by the input values. * roundedS is always cl->S rounded on grp->slot_shift bits. */ static inline void qfq_slot_insert(struct qfq_group *grp, struct qfq_class *cl, uint64_t roundedS) { uint64_t slot = (roundedS - grp->S) >> grp->slot_shift; unsigned int i = (grp->front + slot) % QFQ_MAX_SLOTS; cl->next = grp->slots[i]; grp->slots[i] = cl; __set_bit(slot, &grp->full_slots); } /* * remove the entry from the slot */ static inline void qfq_front_slot_remove(struct qfq_group *grp) { struct qfq_class **h = &grp->slots[grp->front]; *h = (*h)->next; if (!*h) __clear_bit(0, &grp->full_slots); } /* * Returns the first full queue in a group. As a side effect, * adjust the bucket list so the first non-empty bucket is at * position 0 in full_slots. */ static inline struct qfq_class * qfq_slot_scan(struct qfq_group *grp) { int i; ND("grp %d full %x", grp->index, grp->full_slots); if (!grp->full_slots) return NULL; i = ffs(grp->full_slots) - 1; // zero-based if (i > 0) { grp->front = (grp->front + i) % QFQ_MAX_SLOTS; grp->full_slots >>= i; } return grp->slots[grp->front]; } /* * adjust the bucket list. When the start time of a group decreases, * we move the index down (modulo QFQ_MAX_SLOTS) so we don't need to * move the objects. The mask of occupied slots must be shifted * because we use ffs() to find the first non-empty slot. * This covers decreases in the group's start time, but what about * increases of the start time ? 
* Here too we should make sure that i is less than 32 */ static inline void qfq_slot_rotate(struct qfq_sched *q, struct qfq_group *grp, uint64_t roundedS) { unsigned int i = (grp->S - roundedS) >> grp->slot_shift; grp->full_slots <<= i; grp->front = (grp->front - i) % QFQ_MAX_SLOTS; } static inline void qfq_update_eligible(struct qfq_sched *q, uint64_t old_V) { bitmap ineligible; ineligible = q->bitmaps[IR] | q->bitmaps[IB]; if (ineligible) { if (!q->bitmaps[ER]) { struct qfq_group *grp; grp = qfq_ffs(q, ineligible); if (qfq_gt(grp->S, q->V)) q->V = grp->S; } qfq_make_eligible(q, old_V); } } /* * Updates the class, returns true if also the group needs to be updated. */ static inline int qfq_update_class(struct qfq_sched *q, struct qfq_group *grp, struct qfq_class *cl) { cl->S = cl->F; if (cl->_q.mq.head == NULL) { qfq_front_slot_remove(grp); } else { unsigned int len; uint64_t roundedS; len = cl->_q.mq.head->m_pkthdr.len; cl->F = cl->S + (uint64_t)len * cl->inv_w; roundedS = qfq_round_down(cl->S, grp->slot_shift); if (roundedS == grp->S) return 0; qfq_front_slot_remove(grp); qfq_slot_insert(grp, cl, roundedS); } return 1; } static struct mbuf * qfq_dequeue(struct dn_sch_inst *si) { struct qfq_sched *q = (struct qfq_sched *)(si + 1); struct qfq_group *grp; struct qfq_class *cl; struct mbuf *m; uint64_t old_V; NO(q->loops++;) if (!q->bitmaps[ER]) { NO(if (q->queued) dump_sched(q, "start dequeue");) return NULL; } grp = qfq_ffs(q, q->bitmaps[ER]); cl = grp->slots[grp->front]; /* extract from the first bucket in the bucket list */ m = dn_dequeue(&cl->_q); if (!m) { D("BUG/* non-workconserving leaf */"); return NULL; } NO(q->queued--;) old_V = q->V; q->V += (uint64_t)m->m_pkthdr.len * q->iwsum; ND("m is %p F 0x%llx V now 0x%llx", m, cl->F, q->V); if (qfq_update_class(q, grp, cl)) { uint64_t old_F = grp->F; cl = qfq_slot_scan(grp); if (!cl) { /* group gone, remove from ER */ __clear_bit(grp->index, &q->bitmaps[ER]); // grp->S = grp->F + 1; // XXX debugging only } else { uint64_t roundedS = qfq_round_down(cl->S, grp->slot_shift); unsigned int s; if (grp->S == roundedS) goto skip_unblock; grp->S = roundedS; grp->F = roundedS + (2ULL << grp->slot_shift); /* remove from ER and put in the new set */ __clear_bit(grp->index, &q->bitmaps[ER]); s = qfq_calc_state(q, grp); __set_bit(grp->index, &q->bitmaps[s]); } /* we need to unblock even if the group has gone away */ qfq_unblock_groups(q, grp->index, old_F); } skip_unblock: qfq_update_eligible(q, old_V); NO(if (!q->bitmaps[ER] && q->queued) dump_sched(q, "end dequeue");) return m; } /* * Assign a reasonable start time for a new flow k in group i. * Admissible values for \hat(F) are multiples of \sigma_i * no greater than V+\sigma_i . Larger values mean that * we had a wraparound so we consider the timestamp to be stale. * * If F is not stale and F >= V then we set S = F. * Otherwise we should assign S = V, but this may violate * the ordering in ER. So, if we have groups in ER, set S to * the F_j of the first group j which would be blocking us. * We are guaranteed not to move S backward because * otherwise our group i would still be blocked. 
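 *
 * Worked example (illustrative numbers): with slot_shift = 3, so that
 * sigma_i = 8, and V = 100: limit = qfq_round_down(100, 3) + 8 = 104.
 * A stored F of 260 is ahead of V, but roundedF = 256 exceeds limit,
 * so the timestamp is treated as stale and S is taken from V (or from
 * the F of the first blocking group in ER) rather than from F.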
*/ static inline void qfq_update_start(struct qfq_sched *q, struct qfq_class *cl) { unsigned long mask; uint64_t limit, roundedF; int slot_shift = cl->grp->slot_shift; roundedF = qfq_round_down(cl->F, slot_shift); limit = qfq_round_down(q->V, slot_shift) + (1ULL << slot_shift); if (!qfq_gt(cl->F, q->V) || qfq_gt(roundedF, limit)) { /* timestamp was stale */ mask = mask_from(q->bitmaps[ER], cl->grp->index); if (mask) { struct qfq_group *next = qfq_ffs(q, mask); if (qfq_gt(roundedF, next->F)) { /* from pv 71261956973ba9e0637848a5adb4a5819b4bae83 */ if (qfq_gt(limit, next->F)) cl->S = next->F; else /* preserve timestamp correctness */ cl->S = limit; return; } } cl->S = q->V; } else { /* timestamp is not stale */ cl->S = cl->F; } } static int qfq_enqueue(struct dn_sch_inst *si, struct dn_queue *_q, struct mbuf *m) { struct qfq_sched *q = (struct qfq_sched *)(si + 1); struct qfq_group *grp; struct qfq_class *cl = (struct qfq_class *)_q; uint64_t roundedS; int s; NO(q->loops++;) DX(4, "len %d flow %p inv_w 0x%x grp %d", m->m_pkthdr.len, _q, cl->inv_w, cl->grp->index); /* XXX verify that the packet obeys the parameters */ if (m != _q->mq.head) { if (dn_enqueue(_q, m, 0)) /* packet was dropped */ return 1; NO(q->queued++;) if (m != _q->mq.head) return 0; } /* If reach this point, queue q was idle */ grp = cl->grp; qfq_update_start(q, cl); /* adjust start time */ /* compute new finish time and rounded start. */ cl->F = cl->S + (uint64_t)(m->m_pkthdr.len) * cl->inv_w; roundedS = qfq_round_down(cl->S, grp->slot_shift); /* * insert cl in the correct bucket. * If cl->S >= grp->S we don't need to adjust the * bucket list and simply go to the insertion phase. * Otherwise grp->S is decreasing, we must make room * in the bucket list, and also recompute the group state. * Finally, if there were no flows in this group and nobody * was in ER make sure to adjust V. */ if (grp->full_slots) { if (!qfq_gt(grp->S, cl->S)) goto skip_update; /* create a slot for this cl->S */ qfq_slot_rotate(q, grp, roundedS); /* group was surely ineligible, remove */ __clear_bit(grp->index, &q->bitmaps[IR]); __clear_bit(grp->index, &q->bitmaps[IB]); } else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V)) q->V = roundedS; grp->S = roundedS; grp->F = roundedS + (2ULL << grp->slot_shift); // i.e. 2\sigma_i s = qfq_calc_state(q, grp); __set_bit(grp->index, &q->bitmaps[s]); ND("new state %d 0x%x", s, q->bitmaps[s]); ND("S %llx F %llx V %llx", cl->S, cl->F, q->V); skip_update: qfq_slot_insert(grp, cl, roundedS); return 0; } #if 0 static inline void qfq_slot_remove(struct qfq_sched *q, struct qfq_group *grp, struct qfq_class *cl, struct qfq_class **pprev) { unsigned int i, offset; uint64_t roundedS; roundedS = qfq_round_down(cl->S, grp->slot_shift); offset = (roundedS - grp->S) >> grp->slot_shift; i = (grp->front + offset) % QFQ_MAX_SLOTS; #ifdef notyet if (!pprev) { pprev = &grp->slots[i]; while (*pprev && *pprev != cl) pprev = &(*pprev)->next; } #endif *pprev = cl->next; if (!grp->slots[i]) __clear_bit(offset, &grp->full_slots); } /* * called to forcibly destroy a queue. * If the queue is not in the front bucket, or if it has * other queues in the front bucket, we can simply remove * the queue with no other side effects. * Otherwise we must propagate the event up. * XXX description to be completed. 
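 * (Illustrative summary of the cases handled below: if the destroyed
 * class leaves its group with no queued classes at all, the group is
 * removed from every set, possibly unblocking lower-index groups; if
 * only the front bucket became empty, the group's S and F are
 * recomputed from the next occupied bucket and the group is moved to
 * the set computed by qfq_calc_state().)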
*/ static void qfq_deactivate_class(struct qfq_sched *q, struct qfq_class *cl, struct qfq_class **pprev) { struct qfq_group *grp = &q->groups[cl->index]; unsigned long mask; uint64_t roundedS; int s; cl->F = cl->S; // not needed if the class goes away. qfq_slot_remove(q, grp, cl, pprev); if (!grp->full_slots) { /* nothing left in the group, remove from all sets. * Do ER last because if we were blocking other groups * we must unblock them. */ __clear_bit(grp->index, &q->bitmaps[IR]); __clear_bit(grp->index, &q->bitmaps[EB]); __clear_bit(grp->index, &q->bitmaps[IB]); if (test_bit(grp->index, &q->bitmaps[ER]) && !(q->bitmaps[ER] & ~((1UL << grp->index) - 1))) { mask = q->bitmaps[ER] & ((1UL << grp->index) - 1); if (mask) mask = ~((1UL << __fls(mask)) - 1); else mask = ~0UL; qfq_move_groups(q, mask, EB, ER); qfq_move_groups(q, mask, IB, IR); } __clear_bit(grp->index, &q->bitmaps[ER]); } else if (!grp->slots[grp->front]) { cl = qfq_slot_scan(grp); roundedS = qfq_round_down(cl->S, grp->slot_shift); if (grp->S != roundedS) { __clear_bit(grp->index, &q->bitmaps[ER]); __clear_bit(grp->index, &q->bitmaps[IR]); __clear_bit(grp->index, &q->bitmaps[EB]); __clear_bit(grp->index, &q->bitmaps[IB]); grp->S = roundedS; grp->F = roundedS + (2ULL << grp->slot_shift); s = qfq_calc_state(q, grp); __set_bit(grp->index, &q->bitmaps[s]); } } qfq_update_eligible(q, q->V); } #endif static int qfq_new_fsk(struct dn_fsk *f) { ipdn_bound_var(&f->fs.par[0], 1, 1, QFQ_MAX_WEIGHT, "qfq weight"); ipdn_bound_var(&f->fs.par[1], 1500, 1, 2000, "qfq maxlen"); ND("weight %d len %d\n", f->fs.par[0], f->fs.par[1]); return 0; } /* * initialize a new scheduler instance */ static int qfq_new_sched(struct dn_sch_inst *si) { struct qfq_sched *q = (struct qfq_sched *)(si + 1); struct qfq_group *grp; int i; for (i = 0; i <= QFQ_MAX_INDEX; i++) { grp = &q->groups[i]; grp->index = i; grp->slot_shift = QFQ_MTU_SHIFT + FRAC_BITS - (QFQ_MAX_INDEX - i); } return 0; } /* * QFQ scheduler descriptor */ static struct dn_alg qfq_desc = { _SI( .type = ) DN_SCHED_QFQ, _SI( .name = ) "QFQ", _SI( .flags = ) DN_MULTIQUEUE, _SI( .schk_datalen = ) 0, _SI( .si_datalen = ) sizeof(struct qfq_sched), _SI( .q_datalen = ) sizeof(struct qfq_class) - sizeof(struct dn_queue), _SI( .enqueue = ) qfq_enqueue, _SI( .dequeue = ) qfq_dequeue, _SI( .config = ) NULL, _SI( .destroy = ) NULL, _SI( .new_sched = ) qfq_new_sched, _SI( .free_sched = ) NULL, _SI( .new_fsk = ) qfq_new_fsk, _SI( .free_fsk = ) NULL, _SI( .new_queue = ) qfq_new_queue, _SI( .free_queue = ) qfq_free_queue, }; DECLARE_DNSCHED_MODULE(dn_qfq, &qfq_desc); #ifdef QFQ_DEBUG static void dump_groups(struct qfq_sched *q, uint32_t mask) { int i, j; for (i = 0; i < QFQ_MAX_INDEX + 1; i++) { struct qfq_group *g = &q->groups[i]; if (0 == (mask & (1<slots[j]) D(" bucket %d %p", j, g->slots[j]); } D("full_slots 0x%x", g->full_slots); D(" %2d S 0x%20llx F 0x%llx %c", i, g->S, g->F, mask & (1<loops, q->queued, q->V); D(" ER 0x%08x", q->bitmaps[ER]); D(" EB 0x%08x", q->bitmaps[EB]); D(" IR 0x%08x", q->bitmaps[IR]); D(" IB 0x%08x", q->bitmaps[IB]); dump_groups(q, 0xffffffff); }; #endif /* QFQ_DEBUG */ Index: projects/clang380-import/sys/netpfil/ipfw/ip_fw_table_algo.c =================================================================== --- projects/clang380-import/sys/netpfil/ipfw/ip_fw_table_algo.c (revision 294776) +++ projects/clang380-import/sys/netpfil/ipfw/ip_fw_table_algo.c (revision 294777) @@ -1,4106 +1,4107 @@ /*- * Copyright (c) 2014 Yandex LLC * Copyright (c) 2014 Alexander V. 
Chernikov * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); /* * Lookup table algorithms. * */ #include "opt_ipfw.h" #include "opt_inet.h" #ifndef INET #error IPFIREWALL requires INET. #endif /* INET */ #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include /* ip_fw.h requires IFNAMSIZ */ #include #include +#include #include #include #include /* struct ipfw_rule_ref */ #include #include #include #include /* * IPFW table lookup algorithms. * * What is needed to add another table algo? * * Algo init: * * struct table_algo has to be filled with: * name: "type:algoname" format, e.g. "addr:radix". Currently * there are the following types: "addr", "iface", "number" and "flow". * type: one of IPFW_TABLE_* types * flags: one or more TA_FLAGS_* * ta_buf_size: size of structure used to store add/del item state. * Needs to be less than TA_BUF_SZ. * callbacks: see below for description. * * ipfw_add_table_algo / ipfw_del_table_algo has to be called * * Callbacks description: * * -init: request to initialize new table instance. * typedef int (ta_init)(struct ip_fw_chain *ch, void **ta_state, * struct table_info *ti, char *data, uint8_t tflags); * MANDATORY, unlocked. (M_WAITOK). Returns 0 on success. * * Allocate all structures needed for normal operations. * * Caller may want to parse @data for some algo-specific * options provided by userland. * * Caller may want to save configuration state pointer to @ta_state * * Caller needs to save desired runtime structure pointer(s) * inside @ti fields. Note that it is not correct to save * @ti pointer at this moment. Use -change_ti hook for that. * * Caller has to fill in ti->lookup to appropriate function * pointer. * * * * -destroy: request to destroy table instance. * typedef void (ta_destroy)(void *ta_state, struct table_info *ti); * MANDATORY, unlocked. (M_WAITOK). * * Frees all table entries and all tables structures allocated by -init. * * * * -prepare_add: request to allocate state for adding new entry. * typedef int (ta_prepare_add)(struct ip_fw_chain *ch, struct tentry_info *tei, * void *ta_buf); * MANDATORY, unlocked. (M_WAITOK). Returns 0 on success. 
* * Allocates state and fills it in with all necessary data (EXCEPT value) * from @tei to minimize operations needed to be done under WLOCK. * "value" field has to be copied to new entry in @add callback. * Buffer ta_buf of size ta->ta_buf_sz may be used to store * allocated state. * * * * -prepare_del: request to set state for deleting existing entry. * typedef int (ta_prepare_del)(struct ip_fw_chain *ch, struct tentry_info *tei, * void *ta_buf); * MANDATORY, locked, UH. (M_NOWAIT). Returns 0 on success. * * Buffer ta_buf of size ta->ta_buf_sz may be used to store * allocated state. Caller should use on-stack ta_buf allocation * instead of doing malloc(). * * * * -add: request to insert new entry into runtime/config structures. * typedef int (ta_add)(void *ta_state, struct table_info *ti, * struct tentry_info *tei, void *ta_buf, uint32_t *pnum); * MANDATORY, UH+WLOCK. (M_NOWAIT). Returns 0 on success. * * Insert new entry using previously-allocated state in @ta_buf. * * @tei may have the following flags: * TEI_FLAGS_UPDATE: request to add or update entry. * TEI_FLAGS_DONTADD: request to update (but not add) entry. * * Caller is required to do the following: * copy real entry value from @tei * entry added: return 0, set 1 to @pnum * entry updated: return 0, store 0 to @pnum, store old value in @tei, * add TEI_FLAGS_UPDATED flag to @tei. * entry exists: return EEXIST * entry not found: return ENOENT * other error: return non-zero error code. * * * * -del: request to delete existing entry from runtime/config structures. * typedef int (ta_del)(void *ta_state, struct table_info *ti, * struct tentry_info *tei, void *ta_buf, uint32_t *pnum); * MANDATORY, UH+WLOCK. (M_NOWAIT). Returns 0 on success. * * Delete the entry using the state previously set up in @ta_buf. * * Caller is required to do the following: * entry deleted: return 0, set 1 to @pnum, store old value in @tei. * entry not found: return ENOENT * other error: return non-zero error code. * * * * -flush_entry: flush entry state created by -prepare_add / -del / others * typedef void (ta_flush_entry)(struct ip_fw_chain *ch, * struct tentry_info *tei, void *ta_buf); * MANDATORY, may be locked. (M_NOWAIT). * * Delete state allocated by: * -prepare_add (-add returned EEXIST|UPDATED) * -prepare_del (if any) * -del * * Caller is required to handle empty @ta_buf correctly. * * * -find_tentry: finds entry specified by key @tei * typedef int ta_find_tentry(void *ta_state, struct table_info *ti, * ipfw_obj_tentry *tent); * OPTIONAL, locked (UH). (M_NOWAIT). Returns 0 on success. * * Finds entry specified by given key. * * Caller is required to do the following: * entry found: returns 0, export entry to @tent * entry not found: returns ENOENT * * * -need_modify: checks if @ti has enough space to hold another @count items. * typedef int (ta_need_modify)(void *ta_state, struct table_info *ti, * uint32_t count, uint64_t *pflags); * OPTIONAL, locked (UH). (M_NOWAIT). Returns 0 if it has. * * Checks if given table has enough space to add @count items without * a resize. Caller may use @pflags to store desired modification data. * * * * -prepare_mod: allocate structures for table modification. * typedef int (ta_prepare_mod)(void *ta_buf, uint64_t *pflags); * OPTIONAL(need_modify), unlocked. (M_WAITOK). Returns 0 on success. * * Allocate all needed state for table modification. Caller * should use `struct mod_item` to store new state in @ta_buf. * Up to TA_BUF_SZ (128 bytes) can be stored in @ta_buf.
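 *
 * An illustrative add-path sequence, assuming the framework drives the
 * callbacks as described above (a sketch, not a callback itself):
 *
 *	ta->prepare_add(ch, tei, ta_buf);	// unlocked, M_WAITOK
 *	// ... framework takes UH + WLOCK ...
 *	error = ta->add(ta_state, ti, tei, ta_buf, &num); // M_NOWAIT
 *	// ... framework drops the locks ...
 *	ta->flush_entry(ch, tei, ta_buf);	// frees leftover state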
* * * * -fill_mod: copy some data to the new state. * typedef int (ta_fill_mod)(void *ta_state, struct table_info *ti, * void *ta_buf, uint64_t *pflags); * OPTIONAL(need_modify), locked (UH). (M_NOWAIT). Returns 0 on success. * * Copy as much data as we can to minimize changes under WLOCK. * For example, an array can be merged inside this callback. * * * * -modify: perform final modification. * typedef void (ta_modify)(void *ta_state, struct table_info *ti, * void *ta_buf, uint64_t pflags); * OPTIONAL(need_modify), locked (UH+WLOCK). (M_NOWAIT). * * Performs all changes necessary to switch to new structures. * * Caller should save old pointers to @ta_buf storage. * * * * -flush_mod: flush table modification state. * typedef void (ta_flush_mod)(void *ta_buf); * OPTIONAL(need_modify), unlocked. (M_WAITOK). * * Performs flush for the following: * - prepare_mod (modification was not necessary) * - modify (for the old state) * * * * -change_ti: monitor table info pointer changes * typedef void (ta_change_ti)(void *ta_state, struct table_info *ti); * OPTIONAL, locked (UH). (M_NOWAIT). * * Called when the @ti pointer changes. Called immediately after -init * to set initial state. * * * * -foreach: calls @f for each table entry * typedef void ta_foreach(void *ta_state, struct table_info *ti, * ta_foreach_f *f, void *arg); * MANDATORY, locked(UH). (M_NOWAIT). * * Runs callback with specified argument for each table entry. * Typically used for dumping table entries. * * * * -dump_tentry: dump table entry in current @tentry format. * typedef int ta_dump_tentry(void *ta_state, struct table_info *ti, void *e, * ipfw_obj_tentry *tent); * MANDATORY, locked(UH). (M_NOWAIT). Returns 0 on success. * * Dumps entry @e to @tent. * * * -print_config: prints custom algorithm options into the buffer. * typedef void (ta_print_config)(void *ta_state, struct table_info *ti, * char *buf, size_t bufsize); * OPTIONAL. locked(UH). (M_NOWAIT). * * Prints custom algorithm options in a format suitable to pass * back to the -init callback. * * * * -dump_tinfo: dumps algo-specific info. * typedef void ta_dump_tinfo(void *ta_state, struct table_info *ti, * ipfw_ta_tinfo *tinfo); * OPTIONAL. locked(UH). (M_NOWAIT). * * Dumps options like item size, hash size, etc. */ MALLOC_DEFINE(M_IPFW_TBL, "ipfw_tbl", "IpFw tables"); /* * Utility structures/functions common to more than one algo */ struct mod_item { void *main_ptr; size_t size; void *main_ptr6; size_t size6; }; static int badd(const void *key, void *item, void *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *)); static int bdel(const void *key, void *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *)); /* * ADDR implementation using radix * */ /* * The radix code expects addr and mask to be arrays of bytes, * with the first byte being the length of the array. rn_inithead * is called with the offset in bits of the lookup key within the * array. If we use a sockaddr_in as the underlying type, * sin_len is conveniently located at offset 0, sin_addr is at * offset 4 and normally aligned.
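 * (Illustrative arithmetic: with the usual struct sockaddr_in layout,
 * sin_len and sin_family take one byte each and sin_port two, so
 * KEY_LEN_INET below is 4 + 4 = 8 bytes and OFF_LEN_INET is
 * 8 * 4 = 32, i.e. radix comparisons start at bit offset 32, right at
 * the IPv4 address.)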
* But for portability, let's avoid assumption and make the code explicit */ #define KEY_LEN(v) *((uint8_t *)&(v)) /* * Do not require radix to compare more than actual IPv4/IPv6 address */ #define KEY_LEN_INET (offsetof(struct sockaddr_in, sin_addr) + sizeof(in_addr_t)) #define KEY_LEN_INET6 (offsetof(struct sa_in6, sin6_addr) + sizeof(struct in6_addr)) #define OFF_LEN_INET (8 * offsetof(struct sockaddr_in, sin_addr)) #define OFF_LEN_INET6 (8 * offsetof(struct sa_in6, sin6_addr)) struct radix_addr_entry { struct radix_node rn[2]; struct sockaddr_in addr; uint32_t value; uint8_t masklen; }; struct sa_in6 { uint8_t sin6_len; uint8_t sin6_family; uint8_t pad[2]; struct in6_addr sin6_addr; }; struct radix_addr_xentry { struct radix_node rn[2]; struct sa_in6 addr6; uint32_t value; uint8_t masklen; }; struct radix_cfg { struct radix_node_head *head4; struct radix_node_head *head6; size_t count4; size_t count6; }; struct ta_buf_radix { void *ent_ptr; struct sockaddr *addr_ptr; struct sockaddr *mask_ptr; union { struct { struct sockaddr_in sa; struct sockaddr_in ma; } a4; struct { struct sa_in6 sa; struct sa_in6 ma; } a6; } addr; }; static int ta_lookup_radix(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val); static int ta_init_radix(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti, char *data, uint8_t tflags); static int flush_radix_entry(struct radix_node *rn, void *arg); static void ta_destroy_radix(void *ta_state, struct table_info *ti); static void ta_dump_radix_tinfo(void *ta_state, struct table_info *ti, ipfw_ta_tinfo *tinfo); static int ta_dump_radix_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent); static int ta_find_radix_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent); static void ta_foreach_radix(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg); static void tei_to_sockaddr_ent(struct tentry_info *tei, struct sockaddr *sa, struct sockaddr *ma, int *set_mask); static int ta_prepare_add_radix(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_add_radix(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum); static int ta_prepare_del_radix(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_del_radix(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum); static void ta_flush_radix_entry(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_need_modify_radix(void *ta_state, struct table_info *ti, uint32_t count, uint64_t *pflags); static int ta_lookup_radix(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val) { struct radix_node_head *rnh; if (keylen == sizeof(in_addr_t)) { struct radix_addr_entry *ent; struct sockaddr_in sa; KEY_LEN(sa) = KEY_LEN_INET; sa.sin_addr.s_addr = *((in_addr_t *)key); rnh = (struct radix_node_head *)ti->state; - ent = (struct radix_addr_entry *)(rnh->rnh_matchaddr(&sa, rnh)); + ent = (struct radix_addr_entry *)(rnh->rnh_matchaddr(&sa, &rnh->rh)); if (ent != NULL) { *val = ent->value; return (1); } } else { struct radix_addr_xentry *xent; struct sa_in6 sa6; KEY_LEN(sa6) = KEY_LEN_INET6; memcpy(&sa6.sin6_addr, key, sizeof(struct in6_addr)); rnh = (struct radix_node_head *)ti->xstate; - xent = (struct radix_addr_xentry *)(rnh->rnh_matchaddr(&sa6, rnh)); + xent = (struct radix_addr_xentry *)(rnh->rnh_matchaddr(&sa6, &rnh->rh)); if (xent != NULL) { *val = xent->value; return (1); } } return (0); } /* * New 
table */ static int ta_init_radix(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti, char *data, uint8_t tflags) { struct radix_cfg *cfg; if (!rn_inithead(&ti->state, OFF_LEN_INET)) return (ENOMEM); if (!rn_inithead(&ti->xstate, OFF_LEN_INET6)) { rn_detachhead(&ti->state); return (ENOMEM); } cfg = malloc(sizeof(struct radix_cfg), M_IPFW, M_WAITOK | M_ZERO); *ta_state = cfg; ti->lookup = ta_lookup_radix; return (0); } static int flush_radix_entry(struct radix_node *rn, void *arg) { struct radix_node_head * const rnh = arg; struct radix_addr_entry *ent; ent = (struct radix_addr_entry *) - rnh->rnh_deladdr(rn->rn_key, rn->rn_mask, rnh); + rnh->rnh_deladdr(rn->rn_key, rn->rn_mask, &rnh->rh); if (ent != NULL) free(ent, M_IPFW_TBL); return (0); } static void ta_destroy_radix(void *ta_state, struct table_info *ti) { struct radix_cfg *cfg; struct radix_node_head *rnh; cfg = (struct radix_cfg *)ta_state; rnh = (struct radix_node_head *)(ti->state); - rnh->rnh_walktree(rnh, flush_radix_entry, rnh); + rnh->rnh_walktree(&rnh->rh, flush_radix_entry, rnh); rn_detachhead(&ti->state); rnh = (struct radix_node_head *)(ti->xstate); - rnh->rnh_walktree(rnh, flush_radix_entry, rnh); + rnh->rnh_walktree(&rnh->rh, flush_radix_entry, rnh); rn_detachhead(&ti->xstate); free(cfg, M_IPFW); } /* * Provide algo-specific table info */ static void ta_dump_radix_tinfo(void *ta_state, struct table_info *ti, ipfw_ta_tinfo *tinfo) { struct radix_cfg *cfg; cfg = (struct radix_cfg *)ta_state; tinfo->flags = IPFW_TATFLAGS_AFDATA | IPFW_TATFLAGS_AFITEM; tinfo->taclass4 = IPFW_TACLASS_RADIX; tinfo->count4 = cfg->count4; tinfo->itemsize4 = sizeof(struct radix_addr_entry); tinfo->taclass6 = IPFW_TACLASS_RADIX; tinfo->count6 = cfg->count6; tinfo->itemsize6 = sizeof(struct radix_addr_xentry); } static int ta_dump_radix_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent) { struct radix_addr_entry *n; #ifdef INET6 struct radix_addr_xentry *xn; #endif n = (struct radix_addr_entry *)e; /* Guess IPv4/IPv6 radix by sockaddr family */ if (n->addr.sin_family == AF_INET) { tent->k.addr.s_addr = n->addr.sin_addr.s_addr; tent->masklen = n->masklen; tent->subtype = AF_INET; tent->v.kidx = n->value; #ifdef INET6 } else { xn = (struct radix_addr_xentry *)e; memcpy(&tent->k, &xn->addr6.sin6_addr, sizeof(struct in6_addr)); tent->masklen = xn->masklen; tent->subtype = AF_INET6; tent->v.kidx = xn->value; #endif } return (0); } static int ta_find_radix_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent) { struct radix_node_head *rnh; void *e; e = NULL; if (tent->subtype == AF_INET) { struct sockaddr_in sa; KEY_LEN(sa) = KEY_LEN_INET; sa.sin_addr.s_addr = tent->k.addr.s_addr; rnh = (struct radix_node_head *)ti->state; - e = rnh->rnh_matchaddr(&sa, rnh); + e = rnh->rnh_matchaddr(&sa, &rnh->rh); } else { struct sa_in6 sa6; KEY_LEN(sa6) = KEY_LEN_INET6; memcpy(&sa6.sin6_addr, &tent->k.addr6, sizeof(struct in6_addr)); rnh = (struct radix_node_head *)ti->xstate; - e = rnh->rnh_matchaddr(&sa6, rnh); + e = rnh->rnh_matchaddr(&sa6, &rnh->rh); } if (e != NULL) { ta_dump_radix_tentry(ta_state, ti, e, tent); return (0); } return (ENOENT); } static void ta_foreach_radix(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg) { struct radix_node_head *rnh; rnh = (struct radix_node_head *)(ti->state); - rnh->rnh_walktree(rnh, (walktree_f_t *)f, arg); + rnh->rnh_walktree(&rnh->rh, (walktree_f_t *)f, arg); rnh = (struct radix_node_head *)(ti->xstate); - rnh->rnh_walktree(rnh, (walktree_f_t *)f, arg); + 
rnh->rnh_walktree(&rnh->rh, (walktree_f_t *)f, arg); } #ifdef INET6 static inline void ipv6_writemask(struct in6_addr *addr6, uint8_t mask); static inline void ipv6_writemask(struct in6_addr *addr6, uint8_t mask) { uint32_t *cp; for (cp = (uint32_t *)addr6; mask >= 32; mask -= 32) *cp++ = 0xFFFFFFFF; *cp = htonl(mask ? ~((1 << (32 - mask)) - 1) : 0); } #endif static void tei_to_sockaddr_ent(struct tentry_info *tei, struct sockaddr *sa, struct sockaddr *ma, int *set_mask) { int mlen; #ifdef INET struct sockaddr_in *addr, *mask; #endif #ifdef INET6 struct sa_in6 *addr6, *mask6; #endif in_addr_t a4; mlen = tei->masklen; if (tei->subtype == AF_INET) { #ifdef INET addr = (struct sockaddr_in *)sa; mask = (struct sockaddr_in *)ma; /* Set 'total' structure length */ KEY_LEN(*addr) = KEY_LEN_INET; KEY_LEN(*mask) = KEY_LEN_INET; addr->sin_family = AF_INET; mask->sin_addr.s_addr = htonl(mlen ? ~((1 << (32 - mlen)) - 1) : 0); a4 = *((in_addr_t *)tei->paddr); addr->sin_addr.s_addr = a4 & mask->sin_addr.s_addr; if (mlen != 32) *set_mask = 1; else *set_mask = 0; #endif #ifdef INET6 } else if (tei->subtype == AF_INET6) { /* IPv6 case */ addr6 = (struct sa_in6 *)sa; mask6 = (struct sa_in6 *)ma; /* Set 'total' structure length */ KEY_LEN(*addr6) = KEY_LEN_INET6; KEY_LEN(*mask6) = KEY_LEN_INET6; addr6->sin6_family = AF_INET6; ipv6_writemask(&mask6->sin6_addr, mlen); memcpy(&addr6->sin6_addr, tei->paddr, sizeof(struct in6_addr)); APPLY_MASK(&addr6->sin6_addr, &mask6->sin6_addr); if (mlen != 128) *set_mask = 1; else *set_mask = 0; #endif } } static int ta_prepare_add_radix(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf) { struct ta_buf_radix *tb; struct radix_addr_entry *ent; #ifdef INET6 struct radix_addr_xentry *xent; #endif struct sockaddr *addr, *mask; int mlen, set_mask; tb = (struct ta_buf_radix *)ta_buf; mlen = tei->masklen; set_mask = 0; if (tei->subtype == AF_INET) { #ifdef INET if (mlen > 32) return (EINVAL); ent = malloc(sizeof(*ent), M_IPFW_TBL, M_WAITOK | M_ZERO); ent->masklen = mlen; addr = (struct sockaddr *)&ent->addr; mask = (struct sockaddr *)&tb->addr.a4.ma; tb->ent_ptr = ent; #endif #ifdef INET6 } else if (tei->subtype == AF_INET6) { /* IPv6 case */ if (mlen > 128) return (EINVAL); xent = malloc(sizeof(*xent), M_IPFW_TBL, M_WAITOK | M_ZERO); xent->masklen = mlen; addr = (struct sockaddr *)&xent->addr6; mask = (struct sockaddr *)&tb->addr.a6.ma; tb->ent_ptr = xent; #endif } else { /* Unknown CIDR type */ return (EINVAL); } tei_to_sockaddr_ent(tei, addr, mask, &set_mask); /* Set pointers */ tb->addr_ptr = addr; if (set_mask != 0) tb->mask_ptr = mask; return (0); } static int ta_add_radix(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum) { struct radix_cfg *cfg; struct radix_node_head *rnh; struct radix_node *rn; struct ta_buf_radix *tb; uint32_t *old_value, value; cfg = (struct radix_cfg *)ta_state; tb = (struct ta_buf_radix *)ta_buf; /* Save current entry value from @tei */ if (tei->subtype == AF_INET) { rnh = ti->state; ((struct radix_addr_entry *)tb->ent_ptr)->value = tei->value; } else { rnh = ti->xstate; ((struct radix_addr_xentry *)tb->ent_ptr)->value = tei->value; } /* Search for an entry first */ - rn = rnh->rnh_lookup(tb->addr_ptr, tb->mask_ptr, rnh); + rn = rnh->rnh_lookup(tb->addr_ptr, tb->mask_ptr, &rnh->rh); if (rn != NULL) { if ((tei->flags & TEI_FLAGS_UPDATE) == 0) return (EEXIST); /* Record already exists. 
Update value if we're asked to */ if (tei->subtype == AF_INET) old_value = &((struct radix_addr_entry *)rn)->value; else old_value = &((struct radix_addr_xentry *)rn)->value; value = *old_value; *old_value = tei->value; tei->value = value; /* Indicate that update has happened instead of addition */ tei->flags |= TEI_FLAGS_UPDATED; *pnum = 0; return (0); } if ((tei->flags & TEI_FLAGS_DONTADD) != 0) return (EFBIG); - rn = rnh->rnh_addaddr(tb->addr_ptr, tb->mask_ptr, rnh, tb->ent_ptr); + rn = rnh->rnh_addaddr(tb->addr_ptr, tb->mask_ptr, &rnh->rh,tb->ent_ptr); if (rn == NULL) { /* Unknown error */ return (EINVAL); } if (tei->subtype == AF_INET) cfg->count4++; else cfg->count6++; tb->ent_ptr = NULL; *pnum = 1; return (0); } static int ta_prepare_del_radix(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf) { struct ta_buf_radix *tb; struct sockaddr *addr, *mask; int mlen, set_mask; tb = (struct ta_buf_radix *)ta_buf; mlen = tei->masklen; set_mask = 0; if (tei->subtype == AF_INET) { if (mlen > 32) return (EINVAL); addr = (struct sockaddr *)&tb->addr.a4.sa; mask = (struct sockaddr *)&tb->addr.a4.ma; #ifdef INET6 } else if (tei->subtype == AF_INET6) { if (mlen > 128) return (EINVAL); addr = (struct sockaddr *)&tb->addr.a6.sa; mask = (struct sockaddr *)&tb->addr.a6.ma; #endif } else return (EINVAL); tei_to_sockaddr_ent(tei, addr, mask, &set_mask); tb->addr_ptr = addr; if (set_mask != 0) tb->mask_ptr = mask; return (0); } static int ta_del_radix(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum) { struct radix_cfg *cfg; struct radix_node_head *rnh; struct radix_node *rn; struct ta_buf_radix *tb; cfg = (struct radix_cfg *)ta_state; tb = (struct ta_buf_radix *)ta_buf; if (tei->subtype == AF_INET) rnh = ti->state; else rnh = ti->xstate; - rn = rnh->rnh_deladdr(tb->addr_ptr, tb->mask_ptr, rnh); + rn = rnh->rnh_deladdr(tb->addr_ptr, tb->mask_ptr, &rnh->rh); if (rn == NULL) return (ENOENT); /* Save entry value to @tei */ if (tei->subtype == AF_INET) tei->value = ((struct radix_addr_entry *)rn)->value; else tei->value = ((struct radix_addr_xentry *)rn)->value; tb->ent_ptr = rn; if (tei->subtype == AF_INET) cfg->count4--; else cfg->count6--; *pnum = 1; return (0); } static void ta_flush_radix_entry(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf) { struct ta_buf_radix *tb; tb = (struct ta_buf_radix *)ta_buf; if (tb->ent_ptr != NULL) free(tb->ent_ptr, M_IPFW_TBL); } static int ta_need_modify_radix(void *ta_state, struct table_info *ti, uint32_t count, uint64_t *pflags) { /* * radix does not require additional memory allocations * other than the nodes themselves. Adding new masks to the tree * does, but we don't have any API to call for that (and we don't * know which sizes we would need).
*/ return (0); } struct table_algo addr_radix = { .name = "addr:radix", .type = IPFW_TABLE_ADDR, .flags = TA_FLAG_DEFAULT, .ta_buf_size = sizeof(struct ta_buf_radix), .init = ta_init_radix, .destroy = ta_destroy_radix, .prepare_add = ta_prepare_add_radix, .prepare_del = ta_prepare_del_radix, .add = ta_add_radix, .del = ta_del_radix, .flush_entry = ta_flush_radix_entry, .foreach = ta_foreach_radix, .dump_tentry = ta_dump_radix_tentry, .find_tentry = ta_find_radix_tentry, .dump_tinfo = ta_dump_radix_tinfo, .need_modify = ta_need_modify_radix, }; /* * addr:hash cmds * * * ti->data: * [inv.mask4][inv.mask6][log2hsize4][log2hsize6] * [ 8][ 8[ 8][ 8] * * inv.mask4: 32 - mask * inv.mask6: * 1) _slow lookup: mask * 2) _aligned: (128 - mask) / 8 * 3) _64: 8 * * * pflags: * [v4=1/v6=0][hsize] * [ 32][ 32] */ struct chashentry; SLIST_HEAD(chashbhead, chashentry); struct chash_cfg { struct chashbhead *head4; struct chashbhead *head6; size_t size4; size_t size6; size_t items4; size_t items6; uint8_t mask4; uint8_t mask6; }; struct chashentry { SLIST_ENTRY(chashentry) next; uint32_t value; uint32_t type; union { uint32_t a4; /* Host format */ struct in6_addr a6; /* Network format */ } a; }; struct ta_buf_chash { void *ent_ptr; struct chashentry ent; }; #ifdef INET static __inline uint32_t hash_ip(uint32_t addr, int hsize); #endif #ifdef INET6 static __inline uint32_t hash_ip6(struct in6_addr *addr6, int hsize); static __inline uint16_t hash_ip64(struct in6_addr *addr6, int hsize); static __inline uint32_t hash_ip6_slow(struct in6_addr *addr6, void *key, int mask, int hsize); static __inline uint32_t hash_ip6_al(struct in6_addr *addr6, void *key, int mask, int hsize); #endif static int ta_lookup_chash_slow(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val); static int ta_lookup_chash_aligned(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val); static int ta_lookup_chash_64(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val); static int chash_parse_opts(struct chash_cfg *cfg, char *data); static void ta_print_chash_config(void *ta_state, struct table_info *ti, char *buf, size_t bufsize); static int ta_log2(uint32_t v); static int ta_init_chash(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti, char *data, uint8_t tflags); static void ta_destroy_chash(void *ta_state, struct table_info *ti); static void ta_dump_chash_tinfo(void *ta_state, struct table_info *ti, ipfw_ta_tinfo *tinfo); static int ta_dump_chash_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent); static uint32_t hash_ent(struct chashentry *ent, int af, int mlen, uint32_t size); static int tei_to_chash_ent(struct tentry_info *tei, struct chashentry *ent); static int ta_find_chash_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent); static void ta_foreach_chash(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg); static int ta_prepare_add_chash(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_add_chash(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum); static int ta_prepare_del_chash(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_del_chash(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum); static void ta_flush_chash_entry(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_need_modify_chash(void *ta_state, struct table_info *ti, uint32_t count, uint64_t *pflags); 
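#if 0
/*
 * Worked example for the ti->data layout above (an illustrative sketch,
 * never built; chash_demo_pack() is a hypothetical helper, not part of
 * this file). For "addr:hash masks=/24,/64" with both hash tables at
 * 128 buckets, ta_init_chash() below stores exactly this word.
 */
static uint32_t
chash_demo_pack(void)
{
	uint32_t data;

	data = (32 - 24) << 24 | (128 - 64) << 16 | (7 << 8 | 7);
	/*
	 * data == 0x08400707; the lookup side recovers:
	 *   imask4 = data >> 24;                   8: shift the v4 key by 8, a /24
	 *   hsize4 = 1 << ((data & 0xFFFF) >> 8);  128 buckets
	 *   hsize6 = 1 << (data & 0xFF);           128 buckets
	 */
	return (data);
}
#endif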
static int ta_prepare_mod_chash(void *ta_buf, uint64_t *pflags); static int ta_fill_mod_chash(void *ta_state, struct table_info *ti, void *ta_buf, uint64_t *pflags); static void ta_modify_chash(void *ta_state, struct table_info *ti, void *ta_buf, uint64_t pflags); static void ta_flush_mod_chash(void *ta_buf); #ifdef INET static __inline uint32_t hash_ip(uint32_t addr, int hsize) { return (addr % (hsize - 1)); } #endif #ifdef INET6 static __inline uint32_t hash_ip6(struct in6_addr *addr6, int hsize) { uint32_t i; i = addr6->s6_addr32[0] ^ addr6->s6_addr32[1] ^ addr6->s6_addr32[2] ^ addr6->s6_addr32[3]; return (i % (hsize - 1)); } static __inline uint16_t hash_ip64(struct in6_addr *addr6, int hsize) { uint32_t i; i = addr6->s6_addr32[0] ^ addr6->s6_addr32[1]; return (i % (hsize - 1)); } static __inline uint32_t hash_ip6_slow(struct in6_addr *addr6, void *key, int mask, int hsize) { struct in6_addr mask6; ipv6_writemask(&mask6, mask); memcpy(addr6, key, sizeof(struct in6_addr)); APPLY_MASK(addr6, &mask6); return (hash_ip6(addr6, hsize)); } static __inline uint32_t hash_ip6_al(struct in6_addr *addr6, void *key, int mask, int hsize) { uint64_t *paddr; paddr = (uint64_t *)addr6; *paddr = 0; *(paddr + 1) = 0; memcpy(addr6, key, mask); return (hash_ip6(addr6, hsize)); } #endif static int ta_lookup_chash_slow(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val) { struct chashbhead *head; struct chashentry *ent; uint16_t hash, hsize; uint8_t imask; if (keylen == sizeof(in_addr_t)) { #ifdef INET head = (struct chashbhead *)ti->state; imask = ti->data >> 24; hsize = 1 << ((ti->data & 0xFFFF) >> 8); uint32_t a; a = ntohl(*((in_addr_t *)key)); a = a >> imask; hash = hash_ip(a, hsize); SLIST_FOREACH(ent, &head[hash], next) { if (ent->a.a4 == a) { *val = ent->value; return (1); } } #endif } else { #ifdef INET6 /* IPv6: worst scenario: non-round mask */ struct in6_addr addr6; head = (struct chashbhead *)ti->xstate; imask = (ti->data & 0xFF0000) >> 16; hsize = 1 << (ti->data & 0xFF); hash = hash_ip6_slow(&addr6, key, imask, hsize); SLIST_FOREACH(ent, &head[hash], next) { if (memcmp(&ent->a.a6, &addr6, 16) == 0) { *val = ent->value; return (1); } } #endif } return (0); } static int ta_lookup_chash_aligned(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val) { struct chashbhead *head; struct chashentry *ent; uint16_t hash, hsize; uint8_t imask; if (keylen == sizeof(in_addr_t)) { #ifdef INET head = (struct chashbhead *)ti->state; imask = ti->data >> 24; hsize = 1 << ((ti->data & 0xFFFF) >> 8); uint32_t a; a = ntohl(*((in_addr_t *)key)); a = a >> imask; hash = hash_ip(a, hsize); SLIST_FOREACH(ent, &head[hash], next) { if (ent->a.a4 == a) { *val = ent->value; return (1); } } #endif } else { #ifdef INET6 /* IPv6: aligned to 8bit mask */ struct in6_addr addr6; uint64_t *paddr, *ptmp; head = (struct chashbhead *)ti->xstate; imask = (ti->data & 0xFF0000) >> 16; hsize = 1 << (ti->data & 0xFF); hash = hash_ip6_al(&addr6, key, imask, hsize); paddr = (uint64_t *)&addr6; SLIST_FOREACH(ent, &head[hash], next) { ptmp = (uint64_t *)&ent->a.a6; if (paddr[0] == ptmp[0] && paddr[1] == ptmp[1]) { *val = ent->value; return (1); } } #endif } return (0); } static int ta_lookup_chash_64(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val) { struct chashbhead *head; struct chashentry *ent; uint16_t hash, hsize; uint8_t imask; if (keylen == sizeof(in_addr_t)) { #ifdef INET head = (struct chashbhead *)ti->state; imask = ti->data >> 24; hsize = 1 << ((ti->data & 0xFFFF) >> 8); uint32_t a; a = 
ntohl(*((in_addr_t *)key)); a = a >> imask; hash = hash_ip(a, hsize); SLIST_FOREACH(ent, &head[hash], next) { if (ent->a.a4 == a) { *val = ent->value; return (1); } } #endif } else { #ifdef INET6 /* IPv6: /64 */ uint64_t a6, *paddr; head = (struct chashbhead *)ti->xstate; paddr = (uint64_t *)key; hsize = 1 << (ti->data & 0xFF); a6 = *paddr; hash = hash_ip64((struct in6_addr *)key, hsize); SLIST_FOREACH(ent, &head[hash], next) { paddr = (uint64_t *)&ent->a.a6; if (a6 == *paddr) { *val = ent->value; return (1); } } #endif } return (0); } static int chash_parse_opts(struct chash_cfg *cfg, char *data) { char *pdel, *pend, *s; int mask4, mask6; mask4 = cfg->mask4; mask6 = cfg->mask6; if (data == NULL) return (0); if ((pdel = strchr(data, ' ')) == NULL) return (0); while (*pdel == ' ') pdel++; if (strncmp(pdel, "masks=", 6) != 0) return (EINVAL); if ((s = strchr(pdel, ' ')) != NULL) *s++ = '\0'; pdel += 6; /* Need /XX[,/YY] */ if (*pdel++ != '/') return (EINVAL); mask4 = strtol(pdel, &pend, 10); if (*pend == ',') { /* ,/YY */ pdel = pend + 1; if (*pdel++ != '/') return (EINVAL); mask6 = strtol(pdel, &pend, 10); if (*pend != '\0') return (EINVAL); } else if (*pend != '\0') return (EINVAL); if (mask4 < 0 || mask4 > 32 || mask6 < 0 || mask6 > 128) return (EINVAL); cfg->mask4 = mask4; cfg->mask6 = mask6; return (0); } static void ta_print_chash_config(void *ta_state, struct table_info *ti, char *buf, size_t bufsize) { struct chash_cfg *cfg; cfg = (struct chash_cfg *)ta_state; if (cfg->mask4 != 32 || cfg->mask6 != 128) snprintf(buf, bufsize, "%s masks=/%d,/%d", "addr:hash", cfg->mask4, cfg->mask6); else snprintf(buf, bufsize, "%s", "addr:hash"); } static int ta_log2(uint32_t v) { uint32_t r; r = 0; while (v >>= 1) r++; return (r); } /* * New table. * We assume 'data' to be either NULL or the following format: * 'addr:hash [masks=/32[,/128]]' */ static int ta_init_chash(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti, char *data, uint8_t tflags) { int error, i; uint32_t hsize; struct chash_cfg *cfg; cfg = malloc(sizeof(struct chash_cfg), M_IPFW, M_WAITOK | M_ZERO); cfg->mask4 = 32; cfg->mask6 = 128; if ((error = chash_parse_opts(cfg, data)) != 0) { free(cfg, M_IPFW); return (error); } cfg->size4 = 128; cfg->size6 = 128; cfg->head4 = malloc(sizeof(struct chashbhead) * cfg->size4, M_IPFW, M_WAITOK | M_ZERO); cfg->head6 = malloc(sizeof(struct chashbhead) * cfg->size6, M_IPFW, M_WAITOK | M_ZERO); for (i = 0; i < cfg->size4; i++) SLIST_INIT(&cfg->head4[i]); for (i = 0; i < cfg->size6; i++) SLIST_INIT(&cfg->head6[i]); *ta_state = cfg; ti->state = cfg->head4; ti->xstate = cfg->head6; /* Store data depending on v6 mask length */ hsize = ta_log2(cfg->size4) << 8 | ta_log2(cfg->size6); if (cfg->mask6 == 64) { ti->data = (32 - cfg->mask4) << 24 | (128 - cfg->mask6) << 16| hsize; ti->lookup = ta_lookup_chash_64; } else if ((cfg->mask6 % 8) == 0) { ti->data = (32 - cfg->mask4) << 24 | cfg->mask6 << 13 | hsize; ti->lookup = ta_lookup_chash_aligned; } else { /* don't do that! 
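 * a non-byte-aligned IPv6 mask forces this _slow lookup path,
 * which must apply a full 128-bit mask on every packet lookup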
*/ ti->data = (32 - cfg->mask4) << 24 | cfg->mask6 << 16 | hsize; ti->lookup = ta_lookup_chash_slow; } return (0); } static void ta_destroy_chash(void *ta_state, struct table_info *ti) { struct chash_cfg *cfg; struct chashentry *ent, *ent_next; int i; cfg = (struct chash_cfg *)ta_state; for (i = 0; i < cfg->size4; i++) SLIST_FOREACH_SAFE(ent, &cfg->head4[i], next, ent_next) free(ent, M_IPFW_TBL); for (i = 0; i < cfg->size6; i++) SLIST_FOREACH_SAFE(ent, &cfg->head6[i], next, ent_next) free(ent, M_IPFW_TBL); free(cfg->head4, M_IPFW); free(cfg->head6, M_IPFW); free(cfg, M_IPFW); } static void ta_dump_chash_tinfo(void *ta_state, struct table_info *ti, ipfw_ta_tinfo *tinfo) { struct chash_cfg *cfg; cfg = (struct chash_cfg *)ta_state; tinfo->flags = IPFW_TATFLAGS_AFDATA | IPFW_TATFLAGS_AFITEM; tinfo->taclass4 = IPFW_TACLASS_HASH; tinfo->size4 = cfg->size4; tinfo->count4 = cfg->items4; tinfo->itemsize4 = sizeof(struct chashentry); tinfo->taclass6 = IPFW_TACLASS_HASH; tinfo->size6 = cfg->size6; tinfo->count6 = cfg->items6; tinfo->itemsize6 = sizeof(struct chashentry); } static int ta_dump_chash_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent) { struct chash_cfg *cfg; struct chashentry *ent; cfg = (struct chash_cfg *)ta_state; ent = (struct chashentry *)e; if (ent->type == AF_INET) { tent->k.addr.s_addr = htonl(ent->a.a4 << (32 - cfg->mask4)); tent->masklen = cfg->mask4; tent->subtype = AF_INET; tent->v.kidx = ent->value; #ifdef INET6 } else { memcpy(&tent->k, &ent->a.a6, sizeof(struct in6_addr)); tent->masklen = cfg->mask6; tent->subtype = AF_INET6; tent->v.kidx = ent->value; #endif } return (0); } static uint32_t hash_ent(struct chashentry *ent, int af, int mlen, uint32_t size) { uint32_t hash; hash = 0; if (af == AF_INET) { #ifdef INET hash = hash_ip(ent->a.a4, size); #endif } else { #ifdef INET6 if (mlen == 64) hash = hash_ip64(&ent->a.a6, size); else hash = hash_ip6(&ent->a.a6, size); #endif } return (hash); } static int tei_to_chash_ent(struct tentry_info *tei, struct chashentry *ent) { int mlen; #ifdef INET6 struct in6_addr mask6; #endif mlen = tei->masklen; if (tei->subtype == AF_INET) { #ifdef INET if (mlen > 32) return (EINVAL); ent->type = AF_INET; /* Calculate masked address */ ent->a.a4 = ntohl(*((in_addr_t *)tei->paddr)) >> (32 - mlen); #endif #ifdef INET6 } else if (tei->subtype == AF_INET6) { /* IPv6 case */ if (mlen > 128) return (EINVAL); ent->type = AF_INET6; ipv6_writemask(&mask6, mlen); memcpy(&ent->a.a6, tei->paddr, sizeof(struct in6_addr)); APPLY_MASK(&ent->a.a6, &mask6); #endif } else { /* Unknown CIDR type */ return (EINVAL); } return (0); } static int ta_find_chash_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent) { struct chash_cfg *cfg; struct chashbhead *head; struct chashentry ent, *tmp; struct tentry_info tei; int error; uint32_t hash; cfg = (struct chash_cfg *)ta_state; memset(&ent, 0, sizeof(ent)); memset(&tei, 0, sizeof(tei)); if (tent->subtype == AF_INET) { tei.paddr = &tent->k.addr; tei.masklen = cfg->mask4; tei.subtype = AF_INET; if ((error = tei_to_chash_ent(&tei, &ent)) != 0) return (error); head = cfg->head4; hash = hash_ent(&ent, AF_INET, cfg->mask4, cfg->size4); /* Check for existence */ SLIST_FOREACH(tmp, &head[hash], next) { if (tmp->a.a4 != ent.a.a4) continue; ta_dump_chash_tentry(ta_state, ti, tmp, tent); return (0); } } else { tei.paddr = &tent->k.addr6; tei.masklen = cfg->mask6; tei.subtype = AF_INET6; if ((error = tei_to_chash_ent(&tei, &ent)) != 0) return (error); head = cfg->head6; hash = 
hash_ent(&ent, AF_INET6, cfg->mask6, cfg->size6);

		/* Check for existence */
		SLIST_FOREACH(tmp, &head[hash], next) {
			if (memcmp(&tmp->a.a6, &ent.a.a6, 16) != 0)
				continue;
			ta_dump_chash_tentry(ta_state, ti, tmp, tent);
			return (0);
		}
	}

	return (ENOENT);
}

static void
ta_foreach_chash(void *ta_state, struct table_info *ti, ta_foreach_f *f,
    void *arg)
{
	struct chash_cfg *cfg;
	struct chashentry *ent, *ent_next;
	int i;

	cfg = (struct chash_cfg *)ta_state;

	for (i = 0; i < cfg->size4; i++)
		SLIST_FOREACH_SAFE(ent, &cfg->head4[i], next, ent_next)
			f(ent, arg);

	for (i = 0; i < cfg->size6; i++)
		SLIST_FOREACH_SAFE(ent, &cfg->head6[i], next, ent_next)
			f(ent, arg);
}

static int
ta_prepare_add_chash(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_chash *tb;
	struct chashentry *ent;
	int error;

	tb = (struct ta_buf_chash *)ta_buf;

	ent = malloc(sizeof(*ent), M_IPFW_TBL, M_WAITOK | M_ZERO);

	error = tei_to_chash_ent(tei, ent);
	if (error != 0) {
		free(ent, M_IPFW_TBL);
		return (error);
	}
	tb->ent_ptr = ent;

	return (0);
}

static int
ta_add_chash(void *ta_state, struct table_info *ti, struct tentry_info *tei,
    void *ta_buf, uint32_t *pnum)
{
	struct chash_cfg *cfg;
	struct chashbhead *head;
	struct chashentry *ent, *tmp;
	struct ta_buf_chash *tb;
	int exists;
	uint32_t hash, value;

	cfg = (struct chash_cfg *)ta_state;
	tb = (struct ta_buf_chash *)ta_buf;
	ent = (struct chashentry *)tb->ent_ptr;
	hash = 0;
	exists = 0;

	/* Read current value from @tei */
	ent->value = tei->value;

	if (tei->subtype == AF_INET) {
		if (tei->masklen != cfg->mask4)
			return (EINVAL);
		head = cfg->head4;
		hash = hash_ent(ent, AF_INET, cfg->mask4, cfg->size4);

		/* Check for existence */
		SLIST_FOREACH(tmp, &head[hash], next) {
			if (tmp->a.a4 == ent->a.a4) {
				exists = 1;
				break;
			}
		}
	} else {
		if (tei->masklen != cfg->mask6)
			return (EINVAL);
		head = cfg->head6;
		hash = hash_ent(ent, AF_INET6, cfg->mask6, cfg->size6);

		/* Check for existence */
		SLIST_FOREACH(tmp, &head[hash], next) {
			if (memcmp(&tmp->a.a6, &ent->a.a6, 16) == 0) {
				exists = 1;
				break;
			}
		}
	}

	if (exists == 1) {
		if ((tei->flags & TEI_FLAGS_UPDATE) == 0)
			return (EEXIST);
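		/*
		 * Illustrative note (not part of the original source):
		 * on update the old value is handed back through
		 * tei->value, presumably so the caller can release the
		 * previously referenced value object.
		 */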
		/* Record already exists.  Update value if we're asked to. */
		value = tmp->value;
		tmp->value = tei->value;
		tei->value = value;
		/* Indicate that update has happened instead of addition */
		tei->flags |= TEI_FLAGS_UPDATED;
		*pnum = 0;
	} else {
		if ((tei->flags & TEI_FLAGS_DONTADD) != 0)
			return (EFBIG);
		SLIST_INSERT_HEAD(&head[hash], ent, next);
		tb->ent_ptr = NULL;
		*pnum = 1;

		/* Update counters */
		if (tei->subtype == AF_INET)
			cfg->items4++;
		else
			cfg->items6++;
	}

	return (0);
}

static int
ta_prepare_del_chash(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_chash *tb;

	tb = (struct ta_buf_chash *)ta_buf;

	return (tei_to_chash_ent(tei, &tb->ent));
}

static int
ta_del_chash(void *ta_state, struct table_info *ti, struct tentry_info *tei,
    void *ta_buf, uint32_t *pnum)
{
	struct chash_cfg *cfg;
	struct chashbhead *head;
	struct chashentry *tmp, *tmp_next, *ent;
	struct ta_buf_chash *tb;
	uint32_t hash;

	cfg = (struct chash_cfg *)ta_state;
	tb = (struct ta_buf_chash *)ta_buf;
	ent = &tb->ent;

	if (tei->subtype == AF_INET) {
		if (tei->masklen != cfg->mask4)
			return (EINVAL);
		head = cfg->head4;
		hash = hash_ent(ent, AF_INET, cfg->mask4, cfg->size4);

		SLIST_FOREACH_SAFE(tmp, &head[hash], next, tmp_next) {
			if (tmp->a.a4 != ent->a.a4)
				continue;
			SLIST_REMOVE(&head[hash], tmp, chashentry, next);
			cfg->items4--;
			tb->ent_ptr = tmp;
			tei->value = tmp->value;
			*pnum = 1;
			return (0);
		}
	} else {
		if (tei->masklen != cfg->mask6)
			return (EINVAL);
		head = cfg->head6;
		hash = hash_ent(ent, AF_INET6, cfg->mask6, cfg->size6);

		SLIST_FOREACH_SAFE(tmp, &head[hash], next, tmp_next) {
			if (memcmp(&tmp->a.a6, &ent->a.a6, 16) != 0)
				continue;
			SLIST_REMOVE(&head[hash], tmp, chashentry, next);
			cfg->items6--;
			tb->ent_ptr = tmp;
			tei->value = tmp->value;
			*pnum = 1;
			return (0);
		}
	}

	return (ENOENT);
}

static void
ta_flush_chash_entry(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_chash *tb;

	tb = (struct ta_buf_chash *)ta_buf;

	if (tb->ent_ptr != NULL)
		free(tb->ent_ptr, M_IPFW_TBL);
}

/*
 * Hash growing callbacks.
 */

static int
ta_need_modify_chash(void *ta_state, struct table_info *ti, uint32_t count,
    uint64_t *pflags)
{
	struct chash_cfg *cfg;
	uint64_t data;

	/*
	 * Since we don't know the exact number of IPv4/IPv6 records in
	 * @count, ignore the @count value entirely.  Check current hash
	 * sizes and return appropriate data.
	 */
	cfg = (struct chash_cfg *)ta_state;
	data = 0;
	if (cfg->items4 > cfg->size4 && cfg->size4 < 65536)
		data |= (cfg->size4 * 2) << 16;
	if (cfg->items6 > cfg->size6 && cfg->size6 < 65536)
		data |= cfg->size6 * 2;

	if (data != 0) {
		*pflags = data;
		return (1);
	}

	return (0);
}

/*
 * Allocate new, larger chash.
 */
static int
ta_prepare_mod_chash(void *ta_buf, uint64_t *pflags)
{
	struct mod_item *mi;
	struct chashbhead *head;
	int i;

	mi = (struct mod_item *)ta_buf;

	memset(mi, 0, sizeof(struct mod_item));
	mi->size = (*pflags >> 16) & 0xFFFF;
	mi->size6 = *pflags & 0xFFFF;
	if (mi->size > 0) {
		head = malloc(sizeof(struct chashbhead) * mi->size,
		    M_IPFW, M_WAITOK | M_ZERO);
		for (i = 0; i < mi->size; i++)
			SLIST_INIT(&head[i]);
		mi->main_ptr = head;
	}

	if (mi->size6 > 0) {
		head = malloc(sizeof(struct chashbhead) * mi->size6,
		    M_IPFW, M_WAITOK | M_ZERO);
		for (i = 0; i < mi->size6; i++)
			SLIST_INIT(&head[i]);
		mi->main_ptr6 = head;
	}

	return (0);
}

/*
 * Copy data from old runtime array to new one.
 */
static int
ta_fill_mod_chash(void *ta_state, struct table_info *ti, void *ta_buf,
    uint64_t *pflags)
{

	/* It is not possible to rehash if we're not holding the WLOCK. */
	return (0);
}
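/*
 * Illustrative note (not part of the original source): the resize
 * sequence driven by the table framework is roughly
 *
 *	need_modify()	- decide new size(s), returns 1 if growth needed
 *	prepare_mod()	- preallocate new bucket arrays (may sleep)
 *	fill_mod()	- copy data (a no-op here, see above)
 *	modify()	- re-link entries and swap pointers under the lock
 *	flush_mod()	- free whichever arrays are now unused
 */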
*/ static void ta_modify_chash(void *ta_state, struct table_info *ti, void *ta_buf, uint64_t pflags) { struct mod_item *mi; struct chash_cfg *cfg; struct chashbhead *old_head, *new_head; struct chashentry *ent, *ent_next; int af, i, mlen; uint32_t nhash; size_t old_size, new_size; mi = (struct mod_item *)ta_buf; cfg = (struct chash_cfg *)ta_state; /* Check which hash we need to grow and do we still need that */ if (mi->size > 0 && cfg->size4 < mi->size) { new_head = (struct chashbhead *)mi->main_ptr; new_size = mi->size; old_size = cfg->size4; old_head = ti->state; mlen = cfg->mask4; af = AF_INET; for (i = 0; i < old_size; i++) { SLIST_FOREACH_SAFE(ent, &old_head[i], next, ent_next) { nhash = hash_ent(ent, af, mlen, new_size); SLIST_INSERT_HEAD(&new_head[nhash], ent, next); } } ti->state = new_head; cfg->head4 = new_head; cfg->size4 = mi->size; mi->main_ptr = old_head; } if (mi->size6 > 0 && cfg->size6 < mi->size6) { new_head = (struct chashbhead *)mi->main_ptr6; new_size = mi->size6; old_size = cfg->size6; old_head = ti->xstate; mlen = cfg->mask6; af = AF_INET6; for (i = 0; i < old_size; i++) { SLIST_FOREACH_SAFE(ent, &old_head[i], next, ent_next) { nhash = hash_ent(ent, af, mlen, new_size); SLIST_INSERT_HEAD(&new_head[nhash], ent, next); } } ti->xstate = new_head; cfg->head6 = new_head; cfg->size6 = mi->size6; mi->main_ptr6 = old_head; } /* Update lower 32 bits with new values */ ti->data &= 0xFFFFFFFF00000000; ti->data |= ta_log2(cfg->size4) << 8 | ta_log2(cfg->size6); } /* * Free unneded array. */ static void ta_flush_mod_chash(void *ta_buf) { struct mod_item *mi; mi = (struct mod_item *)ta_buf; if (mi->main_ptr != NULL) free(mi->main_ptr, M_IPFW); if (mi->main_ptr6 != NULL) free(mi->main_ptr6, M_IPFW); } struct table_algo addr_hash = { .name = "addr:hash", .type = IPFW_TABLE_ADDR, .ta_buf_size = sizeof(struct ta_buf_chash), .init = ta_init_chash, .destroy = ta_destroy_chash, .prepare_add = ta_prepare_add_chash, .prepare_del = ta_prepare_del_chash, .add = ta_add_chash, .del = ta_del_chash, .flush_entry = ta_flush_chash_entry, .foreach = ta_foreach_chash, .dump_tentry = ta_dump_chash_tentry, .find_tentry = ta_find_chash_tentry, .print_config = ta_print_chash_config, .dump_tinfo = ta_dump_chash_tinfo, .need_modify = ta_need_modify_chash, .prepare_mod = ta_prepare_mod_chash, .fill_mod = ta_fill_mod_chash, .modify = ta_modify_chash, .flush_mod = ta_flush_mod_chash, }; /* * Iface table cmds. * * Implementation: * * Runtime part: * - sorted array of "struct ifidx" pointed by ti->state. * Array is allocated with rounding up to IFIDX_CHUNK. Only existing * interfaces are stored in array, however its allocated size is * sufficient to hold all table records if needed. * - current array size is stored in ti->data * * Table data: * - "struct iftable_cfg" is allocated to store table state (ta_state). * - All table records are stored inside namedobj instance. 
* */ struct ifidx { uint16_t kidx; uint16_t spare; uint32_t value; }; #define DEFAULT_IFIDX_SIZE 64 struct iftable_cfg; struct ifentry { struct named_object no; struct ipfw_ifc ic; struct iftable_cfg *icfg; uint32_t value; int linked; }; struct iftable_cfg { struct namedobj_instance *ii; struct ip_fw_chain *ch; struct table_info *ti; void *main_ptr; size_t size; /* Number of items allocated in array */ size_t count; /* Number of all items */ size_t used; /* Number of items _active_ now */ }; struct ta_buf_ifidx { struct ifentry *ife; uint32_t value; }; int compare_ifidx(const void *k, const void *v); static struct ifidx * ifidx_find(struct table_info *ti, void *key); static int ta_lookup_ifidx(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val); static int ta_init_ifidx(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti, char *data, uint8_t tflags); static void ta_change_ti_ifidx(void *ta_state, struct table_info *ti); static void destroy_ifidx_locked(struct namedobj_instance *ii, struct named_object *no, void *arg); static void ta_destroy_ifidx(void *ta_state, struct table_info *ti); static void ta_dump_ifidx_tinfo(void *ta_state, struct table_info *ti, ipfw_ta_tinfo *tinfo); static int ta_prepare_add_ifidx(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_add_ifidx(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum); static int ta_prepare_del_ifidx(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_del_ifidx(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum); static void ta_flush_ifidx_entry(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static void if_notifier(struct ip_fw_chain *ch, void *cbdata, uint16_t ifindex); static int ta_need_modify_ifidx(void *ta_state, struct table_info *ti, uint32_t count, uint64_t *pflags); static int ta_prepare_mod_ifidx(void *ta_buf, uint64_t *pflags); static int ta_fill_mod_ifidx(void *ta_state, struct table_info *ti, void *ta_buf, uint64_t *pflags); static void ta_modify_ifidx(void *ta_state, struct table_info *ti, void *ta_buf, uint64_t pflags); static void ta_flush_mod_ifidx(void *ta_buf); static int ta_dump_ifidx_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent); static int ta_find_ifidx_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent); static void foreach_ifidx(struct namedobj_instance *ii, struct named_object *no, void *arg); static void ta_foreach_ifidx(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg); int compare_ifidx(const void *k, const void *v) { const struct ifidx *ifidx; uint16_t key; key = *((const uint16_t *)k); ifidx = (const struct ifidx *)v; if (key < ifidx->kidx) return (-1); else if (key > ifidx->kidx) return (1); return (0); } /* * Adds item @item with key @key into ascending-sorted array @base. * Assumes @base has enough additional storage. * * Returns 1 on success, 0 on duplicate key. */ static int badd(const void *key, void *item, void *base, size_t nmemb, size_t size, int (*compar) (const void *, const void *)) { int min, max, mid, shift, res; caddr_t paddr; if (nmemb == 0) { memcpy(base, item, size); return (1); } /* Binary search */ min = 0; max = nmemb - 1; mid = 0; while (min <= max) { mid = (min + max) / 2; res = compar(key, (const void *)((caddr_t)base + mid * size)); if (res == 0) return (0); if (res > 0) min = mid + 1; else max = mid - 1; } /* Item not found. 
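 * mid now holds the index of the last slot probed; one more compare
 * decides whether the new item sorts before or after that slot.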
 */
	res = compar(key, (const void *)((caddr_t)base + mid * size));
	if (res > 0)
		shift = mid + 1;
	else
		shift = mid;

	paddr = (caddr_t)base + shift * size;
	if (nmemb > shift)
		memmove(paddr + size, paddr, (nmemb - shift) * size);

	memcpy(paddr, item, size);

	return (1);
}

/*
 * Deletes item with key @key from ascending-sorted array @base.
 *
 * Returns 1 on success, 0 for non-existent key.
 */
static int
bdel(const void *key, void *base, size_t nmemb, size_t size,
    int (*compar) (const void *, const void *))
{
	caddr_t item;
	size_t sz;

	item = (caddr_t)bsearch(key, base, nmemb, size, compar);

	if (item == NULL)
		return (0);

	/* Number of bytes stored after the item being deleted */
	sz = (caddr_t)base + nmemb * size - (item + size);

	if (sz > 0)
		memmove(item, item + size, sz);

	return (1);
}

static struct ifidx *
ifidx_find(struct table_info *ti, void *key)
{
	struct ifidx *ifi;

	ifi = bsearch(key, ti->state, ti->data, sizeof(struct ifidx),
	    compare_ifidx);

	return (ifi);
}

static int
ta_lookup_ifidx(struct table_info *ti, void *key, uint32_t keylen,
    uint32_t *val)
{
	struct ifidx *ifi;

	ifi = ifidx_find(ti, key);

	if (ifi != NULL) {
		*val = ifi->value;
		return (1);
	}

	return (0);
}

static int
ta_init_ifidx(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti,
    char *data, uint8_t tflags)
{
	struct iftable_cfg *icfg;

	icfg = malloc(sizeof(struct iftable_cfg), M_IPFW, M_WAITOK | M_ZERO);

	icfg->ii = ipfw_objhash_create(DEFAULT_IFIDX_SIZE);
	icfg->size = DEFAULT_IFIDX_SIZE;
	icfg->main_ptr = malloc(sizeof(struct ifidx) * icfg->size, M_IPFW,
	    M_WAITOK | M_ZERO);
	icfg->ch = ch;

	*ta_state = icfg;
	ti->state = icfg->main_ptr;
	ti->lookup = ta_lookup_ifidx;

	return (0);
}

/*
 * Handle tableinfo @ti pointer change (on table array resize).
 */
static void
ta_change_ti_ifidx(void *ta_state, struct table_info *ti)
{
	struct iftable_cfg *icfg;

	icfg = (struct iftable_cfg *)ta_state;
	icfg->ti = ti;
}

static void
destroy_ifidx_locked(struct namedobj_instance *ii, struct named_object *no,
    void *arg)
{
	struct ifentry *ife;
	struct ip_fw_chain *ch;

	ch = (struct ip_fw_chain *)arg;
	ife = (struct ifentry *)no;

	ipfw_iface_del_notify(ch, &ife->ic);
	ipfw_iface_unref(ch, &ife->ic);
	free(ife, M_IPFW_TBL);
}

/*
 * Destroys table @ti
 */
static void
ta_destroy_ifidx(void *ta_state, struct table_info *ti)
{
	struct iftable_cfg *icfg;
	struct ip_fw_chain *ch;

	icfg = (struct iftable_cfg *)ta_state;
	ch = icfg->ch;

	if (icfg->main_ptr != NULL)
		free(icfg->main_ptr, M_IPFW);

	IPFW_UH_WLOCK(ch);
	ipfw_objhash_foreach(icfg->ii, destroy_ifidx_locked, ch);
	IPFW_UH_WUNLOCK(ch);

	ipfw_objhash_destroy(icfg->ii);

	free(icfg, M_IPFW);
}

/*
 * Provide algo-specific table info
 */
static void
ta_dump_ifidx_tinfo(void *ta_state, struct table_info *ti,
    ipfw_ta_tinfo *tinfo)
{
	struct iftable_cfg *cfg;

	cfg = (struct iftable_cfg *)ta_state;

	tinfo->taclass4 = IPFW_TACLASS_ARRAY;
	tinfo->size4 = cfg->size;
	tinfo->count4 = cfg->used;
	tinfo->itemsize4 = sizeof(struct ifidx);
}

/*
 * Prepare state to add to the table:
 * allocate ifentry and reference needed interface.
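 * (illustrative note, not part of the original source: the reference is
 * taken by name through the ipfw_iface subsystem, so the interface does
 * not have to exist at insertion time)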
 */
static int
ta_prepare_add_ifidx(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_ifidx *tb;
	char *ifname;
	struct ifentry *ife;

	tb = (struct ta_buf_ifidx *)ta_buf;

	/* Check if string is terminated */
	ifname = (char *)tei->paddr;
	if (strnlen(ifname, IF_NAMESIZE) == IF_NAMESIZE)
		return (EINVAL);

	ife = malloc(sizeof(struct ifentry), M_IPFW_TBL, M_WAITOK | M_ZERO);
	ife->ic.cb = if_notifier;
	ife->ic.cbdata = ife;

	if (ipfw_iface_ref(ch, ifname, &ife->ic) != 0) {
		free(ife, M_IPFW_TBL);
		return (EINVAL);
	}

	/* Use ipfw_iface 'ifname' field as stable storage */
	ife->no.name = ife->ic.iface->ifname;

	tb->ife = ife;

	return (0);
}

static int
ta_add_ifidx(void *ta_state, struct table_info *ti, struct tentry_info *tei,
    void *ta_buf, uint32_t *pnum)
{
	struct iftable_cfg *icfg;
	struct ifentry *ife, *tmp;
	struct ta_buf_ifidx *tb;
	struct ipfw_iface *iif;
	struct ifidx *ifi;
	char *ifname;
	uint32_t value;

	tb = (struct ta_buf_ifidx *)ta_buf;
	ifname = (char *)tei->paddr;
	icfg = (struct iftable_cfg *)ta_state;
	ife = tb->ife;

	ife->icfg = icfg;
	ife->value = tei->value;

	tmp = (struct ifentry *)ipfw_objhash_lookup_name(icfg->ii, 0, ifname);
	if (tmp != NULL) {
		if ((tei->flags & TEI_FLAGS_UPDATE) == 0)
			return (EEXIST);

		/* Exchange values in @tmp and @tei */
		value = tmp->value;
		tmp->value = tei->value;
		tei->value = value;

		iif = tmp->ic.iface;
		if (iif->resolved != 0) {
			/* We have to update the runtime value, too */
			ifi = ifidx_find(ti, &iif->ifindex);
			ifi->value = ife->value;
		}

		/* Indicate that update has happened instead of addition */
		tei->flags |= TEI_FLAGS_UPDATED;
		*pnum = 0;

		return (0);
	}

	if ((tei->flags & TEI_FLAGS_DONTADD) != 0)
		return (EFBIG);

	/* Link to internal list */
	ipfw_objhash_add(icfg->ii, &ife->no);

	/* Link notifier (possibly running its callback) */
	ipfw_iface_add_notify(icfg->ch, &ife->ic);
	icfg->count++;

	tb->ife = NULL;
	*pnum = 1;

	return (0);
}

/*
 * Prepare to delete key from table.
 * Do basic interface name checks.
 */
static int
ta_prepare_del_ifidx(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_ifidx *tb;
	char *ifname;

	tb = (struct ta_buf_ifidx *)ta_buf;

	/* Check if string is terminated */
	ifname = (char *)tei->paddr;
	if (strnlen(ifname, IF_NAMESIZE) == IF_NAMESIZE)
		return (EINVAL);

	return (0);
}

/*
 * Remove key from both the configuration list and
 * the runtime array.  Remove the interface notification.
 */
static int
ta_del_ifidx(void *ta_state, struct table_info *ti, struct tentry_info *tei,
    void *ta_buf, uint32_t *pnum)
{
	struct iftable_cfg *icfg;
	struct ifentry *ife;
	struct ta_buf_ifidx *tb;
	char *ifname;
	uint16_t ifindex;
	int res;

	tb = (struct ta_buf_ifidx *)ta_buf;
	ifname = (char *)tei->paddr;
	icfg = (struct iftable_cfg *)ta_state;

	ife = (struct ifentry *)ipfw_objhash_lookup_name(icfg->ii, 0, ifname);

	if (ife == NULL)
		return (ENOENT);

	if (ife->linked != 0) {
		/* We have to remove the item from the runtime */
		ifindex = ife->ic.iface->ifindex;

		res = bdel(&ifindex, icfg->main_ptr, icfg->used,
		    sizeof(struct ifidx), compare_ifidx);

		KASSERT(res == 1, ("index %d does not exist", ifindex));
		icfg->used--;
		ti->data = icfg->used;
		ife->linked = 0;
	}

	/* Unlink from local list */
	ipfw_objhash_del(icfg->ii, &ife->no);

	/* Unlink notifier and deref */
	ipfw_iface_del_notify(icfg->ch, &ife->ic);
	ipfw_iface_unref(icfg->ch, &ife->ic);

	icfg->count--;
	tei->value = ife->value;

	tb->ife = ife;
	*pnum = 1;

	return (0);
}

/*
 * Flush deleted entry.
 * Drops interface reference and frees entry.
 */
static void
ta_flush_ifidx_entry(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_ifidx *tb;

	tb = (struct ta_buf_ifidx *)ta_buf;

	if (tb->ife != NULL)
		free(tb->ife, M_IPFW_TBL);
}

/*
 * Handle interface announce/withdrawal for a particular table.
 * Every real runtime array modification happens here.
 */
static void
if_notifier(struct ip_fw_chain *ch, void *cbdata, uint16_t ifindex)
{
	struct ifentry *ife;
	struct ifidx ifi;
	struct iftable_cfg *icfg;
	struct table_info *ti;
	int res;

	ife = (struct ifentry *)cbdata;
	icfg = ife->icfg;
	ti = icfg->ti;

	KASSERT(ti != NULL, ("ti=NULL, check change_ti handler"));

	if (ife->linked == 0 && ifindex != 0) {
		/* Interface announce */
		ifi.kidx = ifindex;
		ifi.spare = 0;
		ifi.value = ife->value;
		res = badd(&ifindex, &ifi, icfg->main_ptr, icfg->used,
		    sizeof(struct ifidx), compare_ifidx);
		KASSERT(res == 1, ("index %d already exists", ifindex));
		icfg->used++;
		ti->data = icfg->used;
		ife->linked = 1;
	} else if (ife->linked != 0 && ifindex == 0) {
		/* Interface withdrawal */
		ifindex = ife->ic.iface->ifindex;

		res = bdel(&ifindex, icfg->main_ptr, icfg->used,
		    sizeof(struct ifidx), compare_ifidx);

		KASSERT(res == 1, ("index %d does not exist", ifindex));
		icfg->used--;
		ti->data = icfg->used;
		ife->linked = 0;
	}
}

/*
 * Table growing callbacks.
 */

static int
ta_need_modify_ifidx(void *ta_state, struct table_info *ti, uint32_t count,
    uint64_t *pflags)
{
	struct iftable_cfg *cfg;
	uint32_t size;

	cfg = (struct iftable_cfg *)ta_state;

	size = cfg->size;
	while (size < cfg->count + count)
		size *= 2;

	if (size != cfg->size) {
		*pflags = size;
		return (1);
	}

	return (0);
}

/*
 * Allocate new, larger runtime ifidx array.
 */
static int
ta_prepare_mod_ifidx(void *ta_buf, uint64_t *pflags)
{
	struct mod_item *mi;

	mi = (struct mod_item *)ta_buf;

	memset(mi, 0, sizeof(struct mod_item));
	mi->size = *pflags;
	mi->main_ptr = malloc(sizeof(struct ifidx) * mi->size, M_IPFW,
	    M_WAITOK | M_ZERO);

	return (0);
}

/*
 * Copy data from old runtime array to new one.
 */
static int
ta_fill_mod_ifidx(void *ta_state, struct table_info *ti, void *ta_buf,
    uint64_t *pflags)
{
	struct mod_item *mi;
	struct iftable_cfg *icfg;

	mi = (struct mod_item *)ta_buf;
	icfg = (struct iftable_cfg *)ta_state;

	/* Check if we still need to grow the array */
	if (icfg->size >= mi->size) {
		*pflags = 0;
		return (0);
	}

	memcpy(mi->main_ptr, icfg->main_ptr, icfg->used *
	    sizeof(struct ifidx));

	return (0);
}

/*
 * Switch old & new arrays.
 */
static void
ta_modify_ifidx(void *ta_state, struct table_info *ti, void *ta_buf,
    uint64_t pflags)
{
	struct mod_item *mi;
	struct iftable_cfg *icfg;
	void *old_ptr;

	mi = (struct mod_item *)ta_buf;
	icfg = (struct iftable_cfg *)ta_state;

	old_ptr = icfg->main_ptr;
	icfg->main_ptr = mi->main_ptr;
	icfg->size = mi->size;
	ti->state = icfg->main_ptr;

	mi->main_ptr = old_ptr;
}

/*
 * Free unneeded array.
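 * (whichever of the two arrays modify() above left in mi->main_ptr)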
*/ static void ta_flush_mod_ifidx(void *ta_buf) { struct mod_item *mi; mi = (struct mod_item *)ta_buf; if (mi->main_ptr != NULL) free(mi->main_ptr, M_IPFW); } static int ta_dump_ifidx_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent) { struct ifentry *ife; ife = (struct ifentry *)e; tent->masklen = 8 * IF_NAMESIZE; memcpy(&tent->k, ife->no.name, IF_NAMESIZE); tent->v.kidx = ife->value; return (0); } static int ta_find_ifidx_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent) { struct iftable_cfg *icfg; struct ifentry *ife; char *ifname; icfg = (struct iftable_cfg *)ta_state; ifname = tent->k.iface; if (strnlen(ifname, IF_NAMESIZE) == IF_NAMESIZE) return (EINVAL); ife = (struct ifentry *)ipfw_objhash_lookup_name(icfg->ii, 0, ifname); if (ife != NULL) { ta_dump_ifidx_tentry(ta_state, ti, ife, tent); return (0); } return (ENOENT); } struct wa_ifidx { ta_foreach_f *f; void *arg; }; static void foreach_ifidx(struct namedobj_instance *ii, struct named_object *no, void *arg) { struct ifentry *ife; struct wa_ifidx *wa; ife = (struct ifentry *)no; wa = (struct wa_ifidx *)arg; wa->f(ife, wa->arg); } static void ta_foreach_ifidx(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg) { struct iftable_cfg *icfg; struct wa_ifidx wa; icfg = (struct iftable_cfg *)ta_state; wa.f = f; wa.arg = arg; ipfw_objhash_foreach(icfg->ii, foreach_ifidx, &wa); } struct table_algo iface_idx = { .name = "iface:array", .type = IPFW_TABLE_INTERFACE, .flags = TA_FLAG_DEFAULT, .ta_buf_size = sizeof(struct ta_buf_ifidx), .init = ta_init_ifidx, .destroy = ta_destroy_ifidx, .prepare_add = ta_prepare_add_ifidx, .prepare_del = ta_prepare_del_ifidx, .add = ta_add_ifidx, .del = ta_del_ifidx, .flush_entry = ta_flush_ifidx_entry, .foreach = ta_foreach_ifidx, .dump_tentry = ta_dump_ifidx_tentry, .find_tentry = ta_find_ifidx_tentry, .dump_tinfo = ta_dump_ifidx_tinfo, .need_modify = ta_need_modify_ifidx, .prepare_mod = ta_prepare_mod_ifidx, .fill_mod = ta_fill_mod_ifidx, .modify = ta_modify_ifidx, .flush_mod = ta_flush_mod_ifidx, .change_ti = ta_change_ti_ifidx, }; /* * Number array cmds. * * Implementation: * * Runtime part: * - sorted array of "struct numarray" pointed by ti->state. * Array is allocated with rounding up to NUMARRAY_CHUNK. 
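 * - entries are kept sorted by number; both the add and del paths use
 *   the badd()/bdel() binary-search helpers defined above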
 * - current array size is stored in ti->data
 *
 */

struct numarray {
	uint32_t number;
	uint32_t value;
};

struct numarray_cfg {
	void	*main_ptr;
	size_t	size;	/* Number of items allocated in array */
	size_t	used;	/* Number of items _active_ now */
};

struct ta_buf_numarray {
	struct numarray na;
};

int compare_numarray(const void *k, const void *v);
static struct numarray *numarray_find(struct table_info *ti, void *key);
static int ta_lookup_numarray(struct table_info *ti, void *key,
    uint32_t keylen, uint32_t *val);
static int ta_init_numarray(struct ip_fw_chain *ch, void **ta_state,
    struct table_info *ti, char *data, uint8_t tflags);
static void ta_destroy_numarray(void *ta_state, struct table_info *ti);
static void ta_dump_numarray_tinfo(void *ta_state, struct table_info *ti,
    ipfw_ta_tinfo *tinfo);
static int ta_prepare_add_numarray(struct ip_fw_chain *ch,
    struct tentry_info *tei, void *ta_buf);
static int ta_add_numarray(void *ta_state, struct table_info *ti,
    struct tentry_info *tei, void *ta_buf, uint32_t *pnum);
static int ta_del_numarray(void *ta_state, struct table_info *ti,
    struct tentry_info *tei, void *ta_buf, uint32_t *pnum);
static void ta_flush_numarray_entry(struct ip_fw_chain *ch,
    struct tentry_info *tei, void *ta_buf);
static int ta_need_modify_numarray(void *ta_state, struct table_info *ti,
    uint32_t count, uint64_t *pflags);
static int ta_prepare_mod_numarray(void *ta_buf, uint64_t *pflags);
static int ta_fill_mod_numarray(void *ta_state, struct table_info *ti,
    void *ta_buf, uint64_t *pflags);
static void ta_modify_numarray(void *ta_state, struct table_info *ti,
    void *ta_buf, uint64_t pflags);
static void ta_flush_mod_numarray(void *ta_buf);
static int ta_dump_numarray_tentry(void *ta_state, struct table_info *ti,
    void *e, ipfw_obj_tentry *tent);
static int ta_find_numarray_tentry(void *ta_state, struct table_info *ti,
    ipfw_obj_tentry *tent);
static void ta_foreach_numarray(void *ta_state, struct table_info *ti,
    ta_foreach_f *f, void *arg);

int
compare_numarray(const void *k, const void *v)
{
	const struct numarray *na;
	uint32_t key;

	key = *((const uint32_t *)k);
	na = (const struct numarray *)v;

	if (key < na->number)
		return (-1);
	else if (key > na->number)
		return (1);

	return (0);
}

static struct numarray *
numarray_find(struct table_info *ti, void *key)
{
	struct numarray *ri;

	/* Use the same comparator as the badd()/bdel() insertion paths */
	ri = bsearch(key, ti->state, ti->data, sizeof(struct numarray),
	    compare_numarray);

	return (ri);
}

static int
ta_lookup_numarray(struct table_info *ti, void *key, uint32_t keylen,
    uint32_t *val)
{
	struct numarray *ri;

	ri = numarray_find(ti, key);

	if (ri != NULL) {
		*val = ri->value;
		return (1);
	}

	return (0);
}

static int
ta_init_numarray(struct ip_fw_chain *ch, void **ta_state,
    struct table_info *ti, char *data, uint8_t tflags)
{
	struct numarray_cfg *cfg;

	cfg = malloc(sizeof(*cfg), M_IPFW, M_WAITOK | M_ZERO);

	cfg->size = 16;
	cfg->main_ptr = malloc(sizeof(struct numarray) * cfg->size, M_IPFW,
	    M_WAITOK | M_ZERO);

	*ta_state = cfg;
	ti->state = cfg->main_ptr;
	ti->lookup = ta_lookup_numarray;

	return (0);
}

/*
 * Destroys table @ti
 */
static void
ta_destroy_numarray(void *ta_state, struct table_info *ti)
{
	struct numarray_cfg *cfg;

	cfg = (struct numarray_cfg *)ta_state;

	if (cfg->main_ptr != NULL)
		free(cfg->main_ptr, M_IPFW);

	free(cfg, M_IPFW);
}

/*
 * Provide algo-specific table info
 */
static void
ta_dump_numarray_tinfo(void *ta_state, struct table_info *ti,
    ipfw_ta_tinfo *tinfo)
{
	struct numarray_cfg *cfg;

	cfg = (struct numarray_cfg *)ta_state;

	tinfo->taclass4 = IPFW_TACLASS_ARRAY;
	tinfo->size4 = cfg->size;
	tinfo->count4 = cfg->used;
	tinfo->itemsize4 = sizeof(struct numarray);
}

/*
 * Prepare for addition/deletion to an array.
 */
static int
ta_prepare_add_numarray(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_numarray *tb;

	tb = (struct ta_buf_numarray *)ta_buf;

	tb->na.number = *((uint32_t *)tei->paddr);

	return (0);
}

static int
ta_add_numarray(void *ta_state, struct table_info *ti,
    struct tentry_info *tei, void *ta_buf, uint32_t *pnum)
{
	struct numarray_cfg *cfg;
	struct ta_buf_numarray *tb;
	struct numarray *ri;
	int res;
	uint32_t value;

	tb = (struct ta_buf_numarray *)ta_buf;
	cfg = (struct numarray_cfg *)ta_state;

	/* Read current value from @tei */
	tb->na.value = tei->value;

	ri = numarray_find(ti, &tb->na.number);

	if (ri != NULL) {
		if ((tei->flags & TEI_FLAGS_UPDATE) == 0)
			return (EEXIST);

		/* Exchange values between ri and @tei */
		value = ri->value;
		ri->value = tei->value;
		tei->value = value;

		/* Indicate that update has happened instead of addition */
		tei->flags |= TEI_FLAGS_UPDATED;
		*pnum = 0;
		return (0);
	}

	if ((tei->flags & TEI_FLAGS_DONTADD) != 0)
		return (EFBIG);

	res = badd(&tb->na.number, &tb->na, cfg->main_ptr, cfg->used,
	    sizeof(struct numarray), compare_numarray);

	KASSERT(res == 1, ("number %u already exists", tb->na.number));
	cfg->used++;
	ti->data = cfg->used;
	*pnum = 1;

	return (0);
}

/*
 * Remove the specified number from the runtime array.
 */
static int
ta_del_numarray(void *ta_state, struct table_info *ti,
    struct tentry_info *tei, void *ta_buf, uint32_t *pnum)
{
	struct numarray_cfg *cfg;
	struct ta_buf_numarray *tb;
	struct numarray *ri;
	int res;

	tb = (struct ta_buf_numarray *)ta_buf;
	cfg = (struct numarray_cfg *)ta_state;

	ri = numarray_find(ti, &tb->na.number);
	if (ri == NULL)
		return (ENOENT);

	tei->value = ri->value;

	res = bdel(&tb->na.number, cfg->main_ptr, cfg->used,
	    sizeof(struct numarray), compare_numarray);

	KASSERT(res == 1, ("number %u does not exist", tb->na.number));
	cfg->used--;
	ti->data = cfg->used;
	*pnum = 1;

	return (0);
}

static void
ta_flush_numarray_entry(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{

	/* We don't have any state, do nothing */
}

/*
 * Table growing callbacks.
 */

static int
ta_need_modify_numarray(void *ta_state, struct table_info *ti, uint32_t count,
    uint64_t *pflags)
{
	struct numarray_cfg *cfg;
	size_t size;

	cfg = (struct numarray_cfg *)ta_state;

	size = cfg->size;
	while (size < cfg->used + count)
		size *= 2;

	if (size != cfg->size) {
		*pflags = size;
		return (1);
	}

	return (0);
}

/*
 * Allocate new, larger runtime array.
 */
static int
ta_prepare_mod_numarray(void *ta_buf, uint64_t *pflags)
{
	struct mod_item *mi;

	mi = (struct mod_item *)ta_buf;

	memset(mi, 0, sizeof(struct mod_item));
	mi->size = *pflags;
	mi->main_ptr = malloc(sizeof(struct numarray) * mi->size, M_IPFW,
	    M_WAITOK | M_ZERO);

	return (0);
}

/*
 * Copy data from old runtime array to new one.
 */
static int
ta_fill_mod_numarray(void *ta_state, struct table_info *ti, void *ta_buf,
    uint64_t *pflags)
{
	struct mod_item *mi;
	struct numarray_cfg *cfg;

	mi = (struct mod_item *)ta_buf;
	cfg = (struct numarray_cfg *)ta_state;

	/* Check if we still need to grow the array */
	if (cfg->size >= mi->size) {
		*pflags = 0;
		return (0);
	}

	memcpy(mi->main_ptr, cfg->main_ptr, cfg->used *
	    sizeof(struct numarray));

	return (0);
}
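/*
 * Illustrative note (not part of the original source): the copy above
 * can run before the writer lock is taken because packet lookups keep
 * using the old array until modify() swaps ti->state; modify() itself
 * only exchanges pointers.
 */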
/*
 * Switch old & new arrays.
 */
static void
ta_modify_numarray(void *ta_state, struct table_info *ti, void *ta_buf,
    uint64_t pflags)
{
	struct mod_item *mi;
	struct numarray_cfg *cfg;
	void *old_ptr;

	mi = (struct mod_item *)ta_buf;
	cfg = (struct numarray_cfg *)ta_state;

	old_ptr = cfg->main_ptr;
	cfg->main_ptr = mi->main_ptr;
	cfg->size = mi->size;
	ti->state = cfg->main_ptr;

	mi->main_ptr = old_ptr;
}

/*
 * Free unneeded array.
 */
static void
ta_flush_mod_numarray(void *ta_buf)
{
	struct mod_item *mi;

	mi = (struct mod_item *)ta_buf;

	if (mi->main_ptr != NULL)
		free(mi->main_ptr, M_IPFW);
}

static int
ta_dump_numarray_tentry(void *ta_state, struct table_info *ti, void *e,
    ipfw_obj_tentry *tent)
{
	struct numarray *na;

	na = (struct numarray *)e;

	tent->k.key = na->number;
	tent->v.kidx = na->value;

	return (0);
}

static int
ta_find_numarray_tentry(void *ta_state, struct table_info *ti,
    ipfw_obj_tentry *tent)
{
	struct numarray_cfg *cfg;
	struct numarray *ri;

	cfg = (struct numarray_cfg *)ta_state;

	ri = numarray_find(ti, &tent->k.key);

	if (ri != NULL) {
		ta_dump_numarray_tentry(ta_state, ti, ri, tent);
		return (0);
	}

	return (ENOENT);
}

static void
ta_foreach_numarray(void *ta_state, struct table_info *ti, ta_foreach_f *f,
    void *arg)
{
	struct numarray_cfg *cfg;
	struct numarray *array;
	int i;

	cfg = (struct numarray_cfg *)ta_state;
	array = cfg->main_ptr;

	for (i = 0; i < cfg->used; i++)
		f(&array[i], arg);
}

struct table_algo number_array = {
	.name		= "number:array",
	.type		= IPFW_TABLE_NUMBER,
	.ta_buf_size	= sizeof(struct ta_buf_numarray),
	.init		= ta_init_numarray,
	.destroy	= ta_destroy_numarray,
	/* The add-side prepare handler covers deletion, too */
	.prepare_add	= ta_prepare_add_numarray,
	.prepare_del	= ta_prepare_add_numarray,
	.add		= ta_add_numarray,
	.del		= ta_del_numarray,
	.flush_entry	= ta_flush_numarray_entry,
	.foreach	= ta_foreach_numarray,
	.dump_tentry	= ta_dump_numarray_tentry,
	.find_tentry	= ta_find_numarray_tentry,
	.dump_tinfo	= ta_dump_numarray_tinfo,
	.need_modify	= ta_need_modify_numarray,
	.prepare_mod	= ta_prepare_mod_numarray,
	.fill_mod	= ta_fill_mod_numarray,
	.modify		= ta_modify_numarray,
	.flush_mod	= ta_flush_mod_numarray,
};

/*
 * flow:hash cmds
 *
 * ti->data: hash table size (number of buckets)
 *
 * pflags: new hash table size (on grow)
 */

struct fhashentry;

SLIST_HEAD(fhashbhead, fhashentry);

struct fhashentry {
	SLIST_ENTRY(fhashentry)	next;
	uint8_t		af;
	uint8_t		proto;
	uint16_t	spare0;
	uint16_t	dport;
	uint16_t	sport;
	uint32_t	value;
	uint32_t	spare1;
};

struct fhashentry4 {
	struct fhashentry	e;
	struct in_addr		dip;
	struct in_addr		sip;
};

struct fhashentry6 {
	struct fhashentry	e;
	struct in6_addr		dip6;
	struct in6_addr		sip6;
};

struct fhash_cfg {
	struct fhashbhead	*head;
	size_t			size;
	size_t			items;
	struct fhashentry4	fe4;
	struct fhashentry6	fe6;
};

struct ta_buf_fhash {
	void	*ent_ptr;
	struct fhashentry6 fe6;
};

static __inline int cmp_flow_ent(struct fhashentry *a,
    struct fhashentry *b, size_t sz);
static __inline uint32_t hash_flow4(struct fhashentry4 *f, int hsize);
static __inline uint32_t hash_flow6(struct fhashentry6 *f, int hsize);
static uint32_t hash_flow_ent(struct fhashentry *ent, uint32_t size);
static int ta_lookup_fhash(struct table_info *ti, void *key, uint32_t keylen,
    uint32_t *val);
static int ta_init_fhash(struct ip_fw_chain *ch, void **ta_state,
    struct table_info *ti, char *data, uint8_t tflags);
static void ta_destroy_fhash(void *ta_state, struct table_info *ti);
static void ta_dump_fhash_tinfo(void *ta_state, struct table_info *ti,
ipfw_ta_tinfo *tinfo); static int ta_dump_fhash_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent); static int tei_to_fhash_ent(struct tentry_info *tei, struct fhashentry *ent); static int ta_find_fhash_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent); static void ta_foreach_fhash(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg); static int ta_prepare_add_fhash(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_add_fhash(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum); static int ta_prepare_del_fhash(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_del_fhash(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum); static void ta_flush_fhash_entry(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf); static int ta_need_modify_fhash(void *ta_state, struct table_info *ti, uint32_t count, uint64_t *pflags); static int ta_prepare_mod_fhash(void *ta_buf, uint64_t *pflags); static int ta_fill_mod_fhash(void *ta_state, struct table_info *ti, void *ta_buf, uint64_t *pflags); static void ta_modify_fhash(void *ta_state, struct table_info *ti, void *ta_buf, uint64_t pflags); static void ta_flush_mod_fhash(void *ta_buf); static __inline int cmp_flow_ent(struct fhashentry *a, struct fhashentry *b, size_t sz) { uint64_t *ka, *kb; ka = (uint64_t *)(&a->next + 1); kb = (uint64_t *)(&b->next + 1); if (*ka == *kb && (memcmp(a + 1, b + 1, sz) == 0)) return (1); return (0); } static __inline uint32_t hash_flow4(struct fhashentry4 *f, int hsize) { uint32_t i; i = (f->dip.s_addr) ^ (f->sip.s_addr) ^ (f->e.dport) ^ (f->e.sport); return (i % (hsize - 1)); } static __inline uint32_t hash_flow6(struct fhashentry6 *f, int hsize) { uint32_t i; i = (f->dip6.__u6_addr.__u6_addr32[2]) ^ (f->dip6.__u6_addr.__u6_addr32[3]) ^ (f->sip6.__u6_addr.__u6_addr32[2]) ^ (f->sip6.__u6_addr.__u6_addr32[3]) ^ (f->e.dport) ^ (f->e.sport); return (i % (hsize - 1)); } static uint32_t hash_flow_ent(struct fhashentry *ent, uint32_t size) { uint32_t hash; if (ent->af == AF_INET) { hash = hash_flow4((struct fhashentry4 *)ent, size); } else { hash = hash_flow6((struct fhashentry6 *)ent, size); } return (hash); } static int ta_lookup_fhash(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val) { struct fhashbhead *head; struct fhashentry *ent; struct fhashentry4 *m4; struct ipfw_flow_id *id; uint16_t hash, hsize; id = (struct ipfw_flow_id *)key; head = (struct fhashbhead *)ti->state; hsize = ti->data; m4 = (struct fhashentry4 *)ti->xstate; if (id->addr_type == 4) { struct fhashentry4 f; /* Copy hash mask */ f = *m4; f.dip.s_addr &= id->dst_ip; f.sip.s_addr &= id->src_ip; f.e.dport &= id->dst_port; f.e.sport &= id->src_port; f.e.proto &= id->proto; hash = hash_flow4(&f, hsize); SLIST_FOREACH(ent, &head[hash], next) { if (cmp_flow_ent(ent, &f.e, 2 * 4) != 0) { *val = ent->value; return (1); } } } else if (id->addr_type == 6) { struct fhashentry6 f; uint64_t *fp, *idp; /* Copy hash mask */ f = *((struct fhashentry6 *)(m4 + 1)); /* Handle lack of __u6_addr.__u6_addr64 */ fp = (uint64_t *)&f.dip6; idp = (uint64_t *)&id->dst_ip6; /* src IPv6 is stored after dst IPv6 */ *fp++ &= *idp++; *fp++ &= *idp++; *fp++ &= *idp++; *fp &= *idp; f.e.dport &= id->dst_port; f.e.sport &= id->src_port; f.e.proto &= id->proto; hash = hash_flow6(&f, hsize); SLIST_FOREACH(ent, &head[hash], next) { if (cmp_flow_ent(ent, &f.e, 2 * 16) != 0) { *val = 
ent->value; return (1); } } } return (0); } /* * New table. */ static int ta_init_fhash(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti, char *data, uint8_t tflags) { int i; struct fhash_cfg *cfg; struct fhashentry4 *fe4; struct fhashentry6 *fe6; cfg = malloc(sizeof(struct fhash_cfg), M_IPFW, M_WAITOK | M_ZERO); cfg->size = 512; cfg->head = malloc(sizeof(struct fhashbhead) * cfg->size, M_IPFW, M_WAITOK | M_ZERO); for (i = 0; i < cfg->size; i++) SLIST_INIT(&cfg->head[i]); /* Fill in fe masks based on @tflags */ fe4 = &cfg->fe4; fe6 = &cfg->fe6; if (tflags & IPFW_TFFLAG_SRCIP) { memset(&fe4->sip, 0xFF, sizeof(fe4->sip)); memset(&fe6->sip6, 0xFF, sizeof(fe6->sip6)); } if (tflags & IPFW_TFFLAG_DSTIP) { memset(&fe4->dip, 0xFF, sizeof(fe4->dip)); memset(&fe6->dip6, 0xFF, sizeof(fe6->dip6)); } if (tflags & IPFW_TFFLAG_SRCPORT) { memset(&fe4->e.sport, 0xFF, sizeof(fe4->e.sport)); memset(&fe6->e.sport, 0xFF, sizeof(fe6->e.sport)); } if (tflags & IPFW_TFFLAG_DSTPORT) { memset(&fe4->e.dport, 0xFF, sizeof(fe4->e.dport)); memset(&fe6->e.dport, 0xFF, sizeof(fe6->e.dport)); } if (tflags & IPFW_TFFLAG_PROTO) { memset(&fe4->e.proto, 0xFF, sizeof(fe4->e.proto)); memset(&fe6->e.proto, 0xFF, sizeof(fe6->e.proto)); } fe4->e.af = AF_INET; fe6->e.af = AF_INET6; *ta_state = cfg; ti->state = cfg->head; ti->xstate = &cfg->fe4; ti->data = cfg->size; ti->lookup = ta_lookup_fhash; return (0); } static void ta_destroy_fhash(void *ta_state, struct table_info *ti) { struct fhash_cfg *cfg; struct fhashentry *ent, *ent_next; int i; cfg = (struct fhash_cfg *)ta_state; for (i = 0; i < cfg->size; i++) SLIST_FOREACH_SAFE(ent, &cfg->head[i], next, ent_next) free(ent, M_IPFW_TBL); free(cfg->head, M_IPFW); free(cfg, M_IPFW); } /* * Provide algo-specific table info */ static void ta_dump_fhash_tinfo(void *ta_state, struct table_info *ti, ipfw_ta_tinfo *tinfo) { struct fhash_cfg *cfg; cfg = (struct fhash_cfg *)ta_state; tinfo->flags = IPFW_TATFLAGS_AFITEM; tinfo->taclass4 = IPFW_TACLASS_HASH; tinfo->size4 = cfg->size; tinfo->count4 = cfg->items; tinfo->itemsize4 = sizeof(struct fhashentry4); tinfo->itemsize6 = sizeof(struct fhashentry6); } static int ta_dump_fhash_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent) { struct fhash_cfg *cfg; struct fhashentry *ent; struct fhashentry4 *fe4; #ifdef INET6 struct fhashentry6 *fe6; #endif struct tflow_entry *tfe; cfg = (struct fhash_cfg *)ta_state; ent = (struct fhashentry *)e; tfe = &tent->k.flow; tfe->af = ent->af; tfe->proto = ent->proto; tfe->dport = htons(ent->dport); tfe->sport = htons(ent->sport); tent->v.kidx = ent->value; tent->subtype = ent->af; if (ent->af == AF_INET) { fe4 = (struct fhashentry4 *)ent; tfe->a.a4.sip.s_addr = htonl(fe4->sip.s_addr); tfe->a.a4.dip.s_addr = htonl(fe4->dip.s_addr); tent->masklen = 32; #ifdef INET6 } else { fe6 = (struct fhashentry6 *)ent; tfe->a.a6.sip6 = fe6->sip6; tfe->a.a6.dip6 = fe6->dip6; tent->masklen = 128; #endif } return (0); } static int tei_to_fhash_ent(struct tentry_info *tei, struct fhashentry *ent) { #ifdef INET struct fhashentry4 *fe4; #endif #ifdef INET6 struct fhashentry6 *fe6; #endif struct tflow_entry *tfe; tfe = (struct tflow_entry *)tei->paddr; ent->af = tei->subtype; ent->proto = tfe->proto; ent->dport = ntohs(tfe->dport); ent->sport = ntohs(tfe->sport); if (tei->subtype == AF_INET) { #ifdef INET fe4 = (struct fhashentry4 *)ent; fe4->sip.s_addr = ntohl(tfe->a.a4.sip.s_addr); fe4->dip.s_addr = ntohl(tfe->a.a4.dip.s_addr); #endif #ifdef INET6 } else if (tei->subtype == AF_INET6) { fe6 = 
(struct fhashentry6 *)ent; fe6->sip6 = tfe->a.a6.sip6; fe6->dip6 = tfe->a.a6.dip6; #endif } else { /* Unknown CIDR type */ return (EINVAL); } return (0); } static int ta_find_fhash_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent) { struct fhash_cfg *cfg; struct fhashbhead *head; struct fhashentry *ent, *tmp; struct fhashentry6 fe6; struct tentry_info tei; int error; uint32_t hash; size_t sz; cfg = (struct fhash_cfg *)ta_state; ent = &fe6.e; memset(&fe6, 0, sizeof(fe6)); memset(&tei, 0, sizeof(tei)); tei.paddr = &tent->k.flow; tei.subtype = tent->subtype; if ((error = tei_to_fhash_ent(&tei, ent)) != 0) return (error); head = cfg->head; hash = hash_flow_ent(ent, cfg->size); if (tei.subtype == AF_INET) sz = 2 * sizeof(struct in_addr); else sz = 2 * sizeof(struct in6_addr); /* Check for existence */ SLIST_FOREACH(tmp, &head[hash], next) { if (cmp_flow_ent(tmp, ent, sz) != 0) { ta_dump_fhash_tentry(ta_state, ti, tmp, tent); return (0); } } return (ENOENT); } static void ta_foreach_fhash(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg) { struct fhash_cfg *cfg; struct fhashentry *ent, *ent_next; int i; cfg = (struct fhash_cfg *)ta_state; for (i = 0; i < cfg->size; i++) SLIST_FOREACH_SAFE(ent, &cfg->head[i], next, ent_next) f(ent, arg); } static int ta_prepare_add_fhash(struct ip_fw_chain *ch, struct tentry_info *tei, void *ta_buf) { struct ta_buf_fhash *tb; struct fhashentry *ent; size_t sz; int error; tb = (struct ta_buf_fhash *)ta_buf; if (tei->subtype == AF_INET) sz = sizeof(struct fhashentry4); else if (tei->subtype == AF_INET6) sz = sizeof(struct fhashentry6); else return (EINVAL); ent = malloc(sz, M_IPFW_TBL, M_WAITOK | M_ZERO); error = tei_to_fhash_ent(tei, ent); if (error != 0) { free(ent, M_IPFW_TBL); return (error); } tb->ent_ptr = ent; return (0); } static int ta_add_fhash(void *ta_state, struct table_info *ti, struct tentry_info *tei, void *ta_buf, uint32_t *pnum) { struct fhash_cfg *cfg; struct fhashbhead *head; struct fhashentry *ent, *tmp; struct ta_buf_fhash *tb; int exists; uint32_t hash, value; size_t sz; cfg = (struct fhash_cfg *)ta_state; tb = (struct ta_buf_fhash *)ta_buf; ent = (struct fhashentry *)tb->ent_ptr; exists = 0; /* Read current value from @tei */ ent->value = tei->value; head = cfg->head; hash = hash_flow_ent(ent, cfg->size); if (tei->subtype == AF_INET) sz = 2 * sizeof(struct in_addr); else sz = 2 * sizeof(struct in6_addr); /* Check for existence */ SLIST_FOREACH(tmp, &head[hash], next) { if (cmp_flow_ent(tmp, ent, sz) != 0) { exists = 1; break; } } if (exists == 1) { if ((tei->flags & TEI_FLAGS_UPDATE) == 0) return (EEXIST); /* Record already exists. 
Update value if we're asked to. */

		/* Exchange values between tmp and @tei */
		value = tmp->value;
		tmp->value = tei->value;
		tei->value = value;

		/* Indicate that update has happened instead of addition */
		tei->flags |= TEI_FLAGS_UPDATED;
		*pnum = 0;
	} else {
		if ((tei->flags & TEI_FLAGS_DONTADD) != 0)
			return (EFBIG);

		SLIST_INSERT_HEAD(&head[hash], ent, next);
		tb->ent_ptr = NULL;
		*pnum = 1;

		/* Update counters and check if we need to grow hash */
		cfg->items++;
	}

	return (0);
}

static int
ta_prepare_del_fhash(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_fhash *tb;

	tb = (struct ta_buf_fhash *)ta_buf;

	return (tei_to_fhash_ent(tei, &tb->fe6.e));
}

static int
ta_del_fhash(void *ta_state, struct table_info *ti, struct tentry_info *tei,
    void *ta_buf, uint32_t *pnum)
{
	struct fhash_cfg *cfg;
	struct fhashbhead *head;
	struct fhashentry *ent, *tmp;
	struct ta_buf_fhash *tb;
	uint32_t hash;
	size_t sz;

	cfg = (struct fhash_cfg *)ta_state;
	tb = (struct ta_buf_fhash *)ta_buf;
	ent = &tb->fe6.e;

	head = cfg->head;
	hash = hash_flow_ent(ent, cfg->size);

	if (tei->subtype == AF_INET)
		sz = 2 * sizeof(struct in_addr);
	else
		sz = 2 * sizeof(struct in6_addr);

	/* Check for existence */
	SLIST_FOREACH(tmp, &head[hash], next) {
		if (cmp_flow_ent(tmp, ent, sz) == 0)
			continue;

		SLIST_REMOVE(&head[hash], tmp, fhashentry, next);
		tei->value = tmp->value;
		*pnum = 1;
		cfg->items--;
		tb->ent_ptr = tmp;
		return (0);
	}

	return (ENOENT);
}

static void
ta_flush_fhash_entry(struct ip_fw_chain *ch, struct tentry_info *tei,
    void *ta_buf)
{
	struct ta_buf_fhash *tb;

	tb = (struct ta_buf_fhash *)ta_buf;

	if (tb->ent_ptr != NULL)
		free(tb->ent_ptr, M_IPFW_TBL);
}

/*
 * Hash growing callbacks.
 */

static int
ta_need_modify_fhash(void *ta_state, struct table_info *ti, uint32_t count,
    uint64_t *pflags)
{
	struct fhash_cfg *cfg;

	cfg = (struct fhash_cfg *)ta_state;

	if (cfg->items > cfg->size && cfg->size < 65536) {
		*pflags = cfg->size * 2;
		return (1);
	}

	return (0);
}

/*
 * Allocate new, larger fhash.
 */
static int
ta_prepare_mod_fhash(void *ta_buf, uint64_t *pflags)
{
	struct mod_item *mi;
	struct fhashbhead *head;
	int i;

	mi = (struct mod_item *)ta_buf;

	memset(mi, 0, sizeof(struct mod_item));
	mi->size = *pflags;
	head = malloc(sizeof(struct fhashbhead) * mi->size, M_IPFW,
	    M_WAITOK | M_ZERO);
	for (i = 0; i < mi->size; i++)
		SLIST_INIT(&head[i]);
	mi->main_ptr = head;

	return (0);
}

/*
 * Copy data from old runtime array to new one.
 */
static int
ta_fill_mod_fhash(void *ta_state, struct table_info *ti, void *ta_buf,
    uint64_t *pflags)
{

	/* It is not possible to rehash if we're not holding the WLOCK. */
	return (0);
}

/*
 * Switch old & new arrays.
 */
static void
ta_modify_fhash(void *ta_state, struct table_info *ti, void *ta_buf,
    uint64_t pflags)
{
	struct mod_item *mi;
	struct fhash_cfg *cfg;
	struct fhashbhead *old_head, *new_head;
	struct fhashentry *ent, *ent_next;
	int i;
	uint32_t nhash;
	size_t old_size;

	mi = (struct mod_item *)ta_buf;
	cfg = (struct fhash_cfg *)ta_state;

	old_size = cfg->size;
	old_head = ti->state;

	new_head = (struct fhashbhead *)mi->main_ptr;
	for (i = 0; i < old_size; i++) {
		SLIST_FOREACH_SAFE(ent, &old_head[i], next, ent_next) {
			nhash = hash_flow_ent(ent, mi->size);
			SLIST_INSERT_HEAD(&new_head[nhash], ent, next);
		}
	}

	ti->state = new_head;
	ti->data = mi->size;
	cfg->head = new_head;
	cfg->size = mi->size;

	mi->main_ptr = old_head;
}

/*
 * Free unneeded array.
*/ static void ta_flush_mod_fhash(void *ta_buf) { struct mod_item *mi; mi = (struct mod_item *)ta_buf; if (mi->main_ptr != NULL) free(mi->main_ptr, M_IPFW); } struct table_algo flow_hash = { .name = "flow:hash", .type = IPFW_TABLE_FLOW, .flags = TA_FLAG_DEFAULT, .ta_buf_size = sizeof(struct ta_buf_fhash), .init = ta_init_fhash, .destroy = ta_destroy_fhash, .prepare_add = ta_prepare_add_fhash, .prepare_del = ta_prepare_del_fhash, .add = ta_add_fhash, .del = ta_del_fhash, .flush_entry = ta_flush_fhash_entry, .foreach = ta_foreach_fhash, .dump_tentry = ta_dump_fhash_tentry, .find_tentry = ta_find_fhash_tentry, .dump_tinfo = ta_dump_fhash_tinfo, .need_modify = ta_need_modify_fhash, .prepare_mod = ta_prepare_mod_fhash, .fill_mod = ta_fill_mod_fhash, .modify = ta_modify_fhash, .flush_mod = ta_flush_mod_fhash, }; /* * Kernel fibs bindings. * * Implementation: * * Runtime part: * - fully relies on route API * - fib number is stored in ti->data * */ static int ta_lookup_kfib(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val); static int kfib_parse_opts(int *pfib, char *data); static void ta_print_kfib_config(void *ta_state, struct table_info *ti, char *buf, size_t bufsize); static int ta_init_kfib(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti, char *data, uint8_t tflags); static void ta_destroy_kfib(void *ta_state, struct table_info *ti); static void ta_dump_kfib_tinfo(void *ta_state, struct table_info *ti, ipfw_ta_tinfo *tinfo); static int contigmask(uint8_t *p, int len); static int ta_dump_kfib_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent); static int ta_dump_kfib_tentry_int(struct sockaddr *paddr, struct sockaddr *pmask, ipfw_obj_tentry *tent); static int ta_find_kfib_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent); static void ta_foreach_kfib(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg); static int ta_lookup_kfib(struct table_info *ti, void *key, uint32_t keylen, uint32_t *val) { #ifdef INET struct nhop4_basic nh4; struct in_addr in; #endif #ifdef INET6 struct nhop6_basic nh6; #endif int error; error = ENOENT; #ifdef INET if (keylen == 4) { in.s_addr = *(in_addr_t *)key; error = fib4_lookup_nh_basic(ti->data, in, 0, 0, &nh4); } #endif #ifdef INET6 if (keylen == 6) error = fib6_lookup_nh_basic(ti->data, (struct in6_addr *)key, 0, 0, 0, &nh6); #endif if (error != 0) return (0); *val = 0; return (1); } /* Parse 'fib=%d' */ static int kfib_parse_opts(int *pfib, char *data) { char *pdel, *pend, *s; int fibnum; if (data == NULL) return (0); if ((pdel = strchr(data, ' ')) == NULL) return (0); while (*pdel == ' ') pdel++; if (strncmp(pdel, "fib=", 4) != 0) return (EINVAL); if ((s = strchr(pdel, ' ')) != NULL) *s++ = '\0'; pdel += 4; /* Need \d+ */ fibnum = strtol(pdel, &pend, 10); if (*pend != '\0') return (EINVAL); *pfib = fibnum; return (0); } static void ta_print_kfib_config(void *ta_state, struct table_info *ti, char *buf, size_t bufsize) { if (ti->data != 0) snprintf(buf, bufsize, "%s fib=%lu", "addr:kfib", ti->data); else snprintf(buf, bufsize, "%s", "addr:kfib"); } static int ta_init_kfib(struct ip_fw_chain *ch, void **ta_state, struct table_info *ti, char *data, uint8_t tflags) { int error, fibnum; fibnum = 0; if ((error = kfib_parse_opts(&fibnum, data)) != 0) return (error); if (fibnum >= rt_numfibs) return (E2BIG); ti->data = fibnum; ti->lookup = ta_lookup_kfib; return (0); } /* * Destroys table @ti */ static void ta_destroy_kfib(void *ta_state, struct table_info *ti) { } /* * Provide 
algo-specific table info */ static void ta_dump_kfib_tinfo(void *ta_state, struct table_info *ti, ipfw_ta_tinfo *tinfo) { tinfo->flags = IPFW_TATFLAGS_AFDATA; tinfo->taclass4 = IPFW_TACLASS_RADIX; tinfo->count4 = 0; tinfo->itemsize4 = sizeof(struct rtentry); tinfo->taclass6 = IPFW_TACLASS_RADIX; tinfo->count6 = 0; tinfo->itemsize6 = sizeof(struct rtentry); } static int contigmask(uint8_t *p, int len) { int i, n; for (i = 0; i < len ; i++) if ( (p[i/8] & (1 << (7 - (i%8)))) == 0) /* first bit unset */ break; for (n= i + 1; n < len; n++) if ( (p[n/8] & (1 << (7 - (n % 8)))) != 0) return (-1); /* mask not contiguous */ return (i); } static int ta_dump_kfib_tentry(void *ta_state, struct table_info *ti, void *e, ipfw_obj_tentry *tent) { struct rtentry *rte; rte = (struct rtentry *)e; return ta_dump_kfib_tentry_int(rt_key(rte), rt_mask(rte), tent); } static int ta_dump_kfib_tentry_int(struct sockaddr *paddr, struct sockaddr *pmask, ipfw_obj_tentry *tent) { #ifdef INET struct sockaddr_in *addr, *mask; #endif #ifdef INET6 struct sockaddr_in6 *addr6, *mask6; #endif int len; len = 0; /* Guess IPv4/IPv6 radix by sockaddr family */ #ifdef INET if (paddr->sa_family == AF_INET) { addr = (struct sockaddr_in *)paddr; mask = (struct sockaddr_in *)pmask; tent->k.addr.s_addr = addr->sin_addr.s_addr; len = 32; if (mask != NULL) len = contigmask((uint8_t *)&mask->sin_addr, 32); if (len == -1) len = 0; tent->masklen = len; tent->subtype = AF_INET; tent->v.kidx = 0; /* Do we need to put GW here? */ } #endif #ifdef INET6 if (paddr->sa_family == AF_INET6) { addr6 = (struct sockaddr_in6 *)paddr; mask6 = (struct sockaddr_in6 *)pmask; memcpy(&tent->k, &addr6->sin6_addr, sizeof(struct in6_addr)); len = 128; if (mask6 != NULL) len = contigmask((uint8_t *)&mask6->sin6_addr, 128); if (len == -1) len = 0; tent->masklen = len; tent->subtype = AF_INET6; tent->v.kidx = 0; } #endif return (0); } static int ta_find_kfib_tentry(void *ta_state, struct table_info *ti, ipfw_obj_tentry *tent) { struct rt_addrinfo info; struct sockaddr_in6 key6, dst6, mask6; struct sockaddr *dst, *key, *mask; /* Prepare sockaddr for prefix/mask and info */ bzero(&dst6, sizeof(dst6)); dst6.sin6_len = sizeof(dst6); dst = (struct sockaddr *)&dst6; bzero(&mask6, sizeof(mask6)); mask6.sin6_len = sizeof(mask6); mask = (struct sockaddr *)&mask6; bzero(&info, sizeof(info)); info.rti_info[RTAX_DST] = dst; info.rti_info[RTAX_NETMASK] = mask; /* Prepare the lookup key */ bzero(&key6, sizeof(key6)); key6.sin6_family = tent->subtype; key = (struct sockaddr *)&key6; if (tent->subtype == AF_INET) { ((struct sockaddr_in *)&key6)->sin_addr = tent->k.addr; key6.sin6_len = sizeof(struct sockaddr_in); } else { key6.sin6_addr = tent->k.addr6; key6.sin6_len = sizeof(struct sockaddr_in6); } if (rib_lookup_info(ti->data, key, 0, 0, &info) != 0) return (ENOENT); if ((info.rti_addrs & RTA_NETMASK) == 0) mask = NULL; ta_dump_kfib_tentry_int(dst, mask, tent); return (0); } static void ta_foreach_kfib(void *ta_state, struct table_info *ti, ta_foreach_f *f, void *arg) { - struct radix_node_head *rnh; + struct rib_head *rh; int error; - rnh = rt_tables_get_rnh(ti->data, AF_INET); - if (rnh != NULL) { - RADIX_NODE_HEAD_RLOCK(rnh); - error = rnh->rnh_walktree(rnh, (walktree_f_t *)f, arg); - RADIX_NODE_HEAD_RUNLOCK(rnh); + rh = rt_tables_get_rnh(ti->data, AF_INET); + if (rh != NULL) { + RIB_RLOCK(rh); + error = rh->rnh_walktree(&rh->head, (walktree_f_t *)f, arg); + RIB_RUNLOCK(rh); } - rnh = rt_tables_get_rnh(ti->data, AF_INET6); - if (rnh != NULL) { - RADIX_NODE_HEAD_RLOCK(rnh); - 
error = rnh->rnh_walktree(rnh, (walktree_f_t *)f, arg); - RADIX_NODE_HEAD_RUNLOCK(rnh); + rh = rt_tables_get_rnh(ti->data, AF_INET6); + if (rh != NULL) { + RIB_RLOCK(rh); + error = rh->rnh_walktree(&rh->head, (walktree_f_t *)f, arg); + RIB_RUNLOCK(rh); } } struct table_algo addr_kfib = { .name = "addr:kfib", .type = IPFW_TABLE_ADDR, .flags = TA_FLAG_READONLY, .ta_buf_size = 0, .init = ta_init_kfib, .destroy = ta_destroy_kfib, .foreach = ta_foreach_kfib, .dump_tentry = ta_dump_kfib_tentry, .find_tentry = ta_find_kfib_tentry, .dump_tinfo = ta_dump_kfib_tinfo, .print_config = ta_print_kfib_config, }; void ipfw_table_algo_init(struct ip_fw_chain *ch) { size_t sz; /* * Register all algorithms presented here. */ sz = sizeof(struct table_algo); ipfw_add_table_algo(ch, &addr_radix, sz, &addr_radix.idx); ipfw_add_table_algo(ch, &addr_hash, sz, &addr_hash.idx); ipfw_add_table_algo(ch, &iface_idx, sz, &iface_idx.idx); ipfw_add_table_algo(ch, &number_array, sz, &number_array.idx); ipfw_add_table_algo(ch, &flow_hash, sz, &flow_hash.idx); ipfw_add_table_algo(ch, &addr_kfib, sz, &addr_kfib.idx); } void ipfw_table_algo_destroy(struct ip_fw_chain *ch) { ipfw_del_table_algo(ch, addr_radix.idx); ipfw_del_table_algo(ch, addr_hash.idx); ipfw_del_table_algo(ch, iface_idx.idx); ipfw_del_table_algo(ch, number_array.idx); ipfw_del_table_algo(ch, flow_hash.idx); ipfw_del_table_algo(ch, addr_kfib.idx); } Index: projects/clang380-import/sys/netpfil/pf/pf_table.c =================================================================== --- projects/clang380-import/sys/netpfil/pf/pf_table.c (revision 294776) +++ projects/clang380-import/sys/netpfil/pf/pf_table.c (revision 294777) @@ -1,2193 +1,2193 @@ /*- * Copyright (c) 2002 Cedric Berger * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * - Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials provided * with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE * COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. 
* * $OpenBSD: pf_table.c,v 1.79 2008/10/08 06:24:50 mcbride Exp $ */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #define ACCEPT_FLAGS(flags, oklist) \ do { \ if ((flags & ~(oklist)) & \ PFR_FLAG_ALLMASK) \ return (EINVAL); \ } while (0) #define FILLIN_SIN(sin, addr) \ do { \ (sin).sin_len = sizeof(sin); \ (sin).sin_family = AF_INET; \ (sin).sin_addr = (addr); \ } while (0) #define FILLIN_SIN6(sin6, addr) \ do { \ (sin6).sin6_len = sizeof(sin6); \ (sin6).sin6_family = AF_INET6; \ (sin6).sin6_addr = (addr); \ } while (0) #define SWAP(type, a1, a2) \ do { \ type tmp = a1; \ a1 = a2; \ a2 = tmp; \ } while (0) #define SUNION2PF(su, af) (((af)==AF_INET) ? \ (struct pf_addr *)&(su)->sin.sin_addr : \ (struct pf_addr *)&(su)->sin6.sin6_addr) #define AF_BITS(af) (((af)==AF_INET)?32:128) #define ADDR_NETWORK(ad) ((ad)->pfra_net < AF_BITS((ad)->pfra_af)) #define KENTRY_NETWORK(ke) ((ke)->pfrke_net < AF_BITS((ke)->pfrke_af)) #define KENTRY_RNF_ROOT(ke) \ ((((struct radix_node *)(ke))->rn_flags & RNF_ROOT) != 0) #define NO_ADDRESSES (-1) #define ENQUEUE_UNMARKED_ONLY (1) #define INVERT_NEG_FLAG (1) struct pfr_walktree { enum pfrw_op { PFRW_MARK, PFRW_SWEEP, PFRW_ENQUEUE, PFRW_GET_ADDRS, PFRW_GET_ASTATS, PFRW_POOL_GET, PFRW_DYNADDR_UPDATE } pfrw_op; union { struct pfr_addr *pfrw1_addr; struct pfr_astats *pfrw1_astats; struct pfr_kentryworkq *pfrw1_workq; struct pfr_kentry *pfrw1_kentry; struct pfi_dynaddr *pfrw1_dyn; } pfrw_1; int pfrw_free; }; #define pfrw_addr pfrw_1.pfrw1_addr #define pfrw_astats pfrw_1.pfrw1_astats #define pfrw_workq pfrw_1.pfrw1_workq #define pfrw_kentry pfrw_1.pfrw1_kentry #define pfrw_dyn pfrw_1.pfrw1_dyn #define pfrw_cnt pfrw_free #define senderr(e) do { rv = (e); goto _bad; } while (0) static MALLOC_DEFINE(M_PFTABLE, "pf_table", "pf(4) tables structures"); static VNET_DEFINE(uma_zone_t, pfr_kentry_z); #define V_pfr_kentry_z VNET(pfr_kentry_z) static VNET_DEFINE(uma_zone_t, pfr_kcounters_z); #define V_pfr_kcounters_z VNET(pfr_kcounters_z) static struct pf_addr pfr_ffaddr = { .addr32 = { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff } }; static void pfr_copyout_addr(struct pfr_addr *, struct pfr_kentry *ke); static int pfr_validate_addr(struct pfr_addr *); static void pfr_enqueue_addrs(struct pfr_ktable *, struct pfr_kentryworkq *, int *, int); static void pfr_mark_addrs(struct pfr_ktable *); static struct pfr_kentry *pfr_lookup_addr(struct pfr_ktable *, struct pfr_addr *, int); static struct pfr_kentry *pfr_create_kentry(struct pfr_addr *); static void pfr_destroy_kentries(struct pfr_kentryworkq *); static void pfr_destroy_kentry(struct pfr_kentry *); static void pfr_insert_kentries(struct pfr_ktable *, struct pfr_kentryworkq *, long); static void pfr_remove_kentries(struct pfr_ktable *, struct pfr_kentryworkq *); static void pfr_clstats_kentries(struct pfr_kentryworkq *, long, int); static void pfr_reset_feedback(struct pfr_addr *, int); static void pfr_prepare_network(union sockaddr_union *, int, int); static int pfr_route_kentry(struct pfr_ktable *, struct pfr_kentry *); static int pfr_unroute_kentry(struct pfr_ktable *, struct pfr_kentry *); static int pfr_walktree(struct radix_node *, void *); static int pfr_validate_table(struct pfr_table *, int, int); static int pfr_fix_anchor(char *); static void pfr_commit_ktable(struct pfr_ktable *, long); static void pfr_insert_ktables(struct pfr_ktableworkq *); static void 
pfr_insert_ktable(struct pfr_ktable *); static void pfr_setflags_ktables(struct pfr_ktableworkq *); static void pfr_setflags_ktable(struct pfr_ktable *, int); static void pfr_clstats_ktables(struct pfr_ktableworkq *, long, int); static void pfr_clstats_ktable(struct pfr_ktable *, long, int); static struct pfr_ktable *pfr_create_ktable(struct pfr_table *, long, int); static void pfr_destroy_ktables(struct pfr_ktableworkq *, int); static void pfr_destroy_ktable(struct pfr_ktable *, int); static int pfr_ktable_compare(struct pfr_ktable *, struct pfr_ktable *); static struct pfr_ktable *pfr_lookup_table(struct pfr_table *); static void pfr_clean_node_mask(struct pfr_ktable *, struct pfr_kentryworkq *); static int pfr_table_count(struct pfr_table *, int); static int pfr_skip_table(struct pfr_table *, struct pfr_ktable *, int); static struct pfr_kentry *pfr_kentry_byidx(struct pfr_ktable *, int, int); static RB_PROTOTYPE(pfr_ktablehead, pfr_ktable, pfrkt_tree, pfr_ktable_compare); static RB_GENERATE(pfr_ktablehead, pfr_ktable, pfrkt_tree, pfr_ktable_compare); struct pfr_ktablehead pfr_ktables; struct pfr_table pfr_nulltable; int pfr_ktable_cnt; void pfr_initialize(void) { V_pfr_kentry_z = uma_zcreate("pf table entries", sizeof(struct pfr_kentry), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); V_pfr_kcounters_z = uma_zcreate("pf table counters", sizeof(struct pfr_kcounters), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); V_pf_limits[PF_LIMIT_TABLE_ENTRIES].zone = V_pfr_kentry_z; V_pf_limits[PF_LIMIT_TABLE_ENTRIES].limit = PFR_KENTRY_HIWAT; } void pfr_cleanup(void) { uma_zdestroy(V_pfr_kentry_z); uma_zdestroy(V_pfr_kcounters_z); } int pfr_clr_addrs(struct pfr_table *tbl, int *ndel, int flags) { struct pfr_ktable *kt; struct pfr_kentryworkq workq; PF_RULES_WASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY); if (pfr_validate_table(tbl, 0, flags & PFR_FLAG_USERIOCTL)) return (EINVAL); kt = pfr_lookup_table(tbl); if (kt == NULL || !(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (ESRCH); if (kt->pfrkt_flags & PFR_TFLAG_CONST) return (EPERM); pfr_enqueue_addrs(kt, &workq, ndel, 0); if (!(flags & PFR_FLAG_DUMMY)) { pfr_remove_kentries(kt, &workq); KASSERT(kt->pfrkt_cnt == 0, ("%s: non-null pfrkt_cnt", __func__)); } return (0); } int pfr_add_addrs(struct pfr_table *tbl, struct pfr_addr *addr, int size, int *nadd, int flags) { struct pfr_ktable *kt, *tmpkt; struct pfr_kentryworkq workq; struct pfr_kentry *p, *q; struct pfr_addr *ad; int i, rv, xadd = 0; long tzero = time_second; PF_RULES_WASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY | PFR_FLAG_FEEDBACK); if (pfr_validate_table(tbl, 0, flags & PFR_FLAG_USERIOCTL)) return (EINVAL); kt = pfr_lookup_table(tbl); if (kt == NULL || !(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (ESRCH); if (kt->pfrkt_flags & PFR_TFLAG_CONST) return (EPERM); tmpkt = pfr_create_ktable(&pfr_nulltable, 0, 0); if (tmpkt == NULL) return (ENOMEM); SLIST_INIT(&workq); for (i = 0, ad = addr; i < size; i++, ad++) { if (pfr_validate_addr(ad)) senderr(EINVAL); p = pfr_lookup_addr(kt, ad, 1); q = pfr_lookup_addr(tmpkt, ad, 1); if (flags & PFR_FLAG_FEEDBACK) { if (q != NULL) ad->pfra_fback = PFR_FB_DUPLICATE; else if (p == NULL) ad->pfra_fback = PFR_FB_ADDED; else if (p->pfrke_not != ad->pfra_not) ad->pfra_fback = PFR_FB_CONFLICT; else ad->pfra_fback = PFR_FB_NONE; } if (p == NULL && q == NULL) { p = pfr_create_kentry(ad); if (p == NULL) senderr(ENOMEM); if (pfr_route_kentry(tmpkt, p)) { pfr_destroy_kentry(p); ad->pfra_fback = PFR_FB_NONE; } else { SLIST_INSERT_HEAD(&workq, p, pfrke_workq); xadd++; } } } 
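/*
 * Every accepted address was routed into the scratch table tmpkt above,
 * purely for duplicate detection within this batch; unhook the radix
 * nodes before the workq entries are inserted into the real table.
 */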
pfr_clean_node_mask(tmpkt, &workq); if (!(flags & PFR_FLAG_DUMMY)) pfr_insert_kentries(kt, &workq, tzero); else pfr_destroy_kentries(&workq); if (nadd != NULL) *nadd = xadd; pfr_destroy_ktable(tmpkt, 0); return (0); _bad: pfr_clean_node_mask(tmpkt, &workq); pfr_destroy_kentries(&workq); if (flags & PFR_FLAG_FEEDBACK) pfr_reset_feedback(addr, size); pfr_destroy_ktable(tmpkt, 0); return (rv); } int pfr_del_addrs(struct pfr_table *tbl, struct pfr_addr *addr, int size, int *ndel, int flags) { struct pfr_ktable *kt; struct pfr_kentryworkq workq; struct pfr_kentry *p; struct pfr_addr *ad; int i, rv, xdel = 0, log = 1; PF_RULES_WASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY | PFR_FLAG_FEEDBACK); if (pfr_validate_table(tbl, 0, flags & PFR_FLAG_USERIOCTL)) return (EINVAL); kt = pfr_lookup_table(tbl); if (kt == NULL || !(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (ESRCH); if (kt->pfrkt_flags & PFR_TFLAG_CONST) return (EPERM); /* * there are two algorithms to choose from here. * with: * n: number of addresses to delete * N: number of addresses in the table * * one is O(N) and is better for large 'n' * one is O(n*LOG(N)) and is better for small 'n' * * the following code tries to decide which one is best. */ for (i = kt->pfrkt_cnt; i > 0; i >>= 1) log++; if (size > kt->pfrkt_cnt/log) { /* full table scan */ pfr_mark_addrs(kt); } else { /* iterate over addresses to delete */ for (i = 0, ad = addr; i < size; i++, ad++) { if (pfr_validate_addr(ad)) return (EINVAL); p = pfr_lookup_addr(kt, ad, 1); if (p != NULL) p->pfrke_mark = 0; } } SLIST_INIT(&workq); for (i = 0, ad = addr; i < size; i++, ad++) { if (pfr_validate_addr(ad)) senderr(EINVAL); p = pfr_lookup_addr(kt, ad, 1); if (flags & PFR_FLAG_FEEDBACK) { if (p == NULL) ad->pfra_fback = PFR_FB_NONE; else if (p->pfrke_not != ad->pfra_not) ad->pfra_fback = PFR_FB_CONFLICT; else if (p->pfrke_mark) ad->pfra_fback = PFR_FB_DUPLICATE; else ad->pfra_fback = PFR_FB_DELETED; } if (p != NULL && p->pfrke_not == ad->pfra_not && !p->pfrke_mark) { p->pfrke_mark = 1; SLIST_INSERT_HEAD(&workq, p, pfrke_workq); xdel++; } } if (!(flags & PFR_FLAG_DUMMY)) pfr_remove_kentries(kt, &workq); if (ndel != NULL) *ndel = xdel; return (0); _bad: if (flags & PFR_FLAG_FEEDBACK) pfr_reset_feedback(addr, size); return (rv); } int pfr_set_addrs(struct pfr_table *tbl, struct pfr_addr *addr, int size, int *size2, int *nadd, int *ndel, int *nchange, int flags, u_int32_t ignore_pfrt_flags) { struct pfr_ktable *kt, *tmpkt; struct pfr_kentryworkq addq, delq, changeq; struct pfr_kentry *p, *q; struct pfr_addr ad; int i, rv, xadd = 0, xdel = 0, xchange = 0; long tzero = time_second; PF_RULES_WASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY | PFR_FLAG_FEEDBACK); if (pfr_validate_table(tbl, ignore_pfrt_flags, flags & PFR_FLAG_USERIOCTL)) return (EINVAL); kt = pfr_lookup_table(tbl); if (kt == NULL || !(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (ESRCH); if (kt->pfrkt_flags & PFR_TFLAG_CONST) return (EPERM); tmpkt = pfr_create_ktable(&pfr_nulltable, 0, 0); if (tmpkt == NULL) return (ENOMEM); pfr_mark_addrs(kt); SLIST_INIT(&addq); SLIST_INIT(&delq); SLIST_INIT(&changeq); for (i = 0; i < size; i++) { /* * XXXGL: understand pf_if usage of this function * and make ad a moving pointer */ bcopy(addr + i, &ad, sizeof(ad)); if (pfr_validate_addr(&ad)) senderr(EINVAL); ad.pfra_fback = PFR_FB_NONE; p = pfr_lookup_addr(kt, &ad, 1); if (p != NULL) { if (p->pfrke_mark) { ad.pfra_fback = PFR_FB_DUPLICATE; goto _skip; } p->pfrke_mark = 1; if (p->pfrke_not != ad.pfra_not) { SLIST_INSERT_HEAD(&changeq, p, pfrke_workq);
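/* queued so pfr_clstats_kentries() can flip the negation flag below */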
ad.pfra_fback = PFR_FB_CHANGED; xchange++; } } else { q = pfr_lookup_addr(tmpkt, &ad, 1); if (q != NULL) { ad.pfra_fback = PFR_FB_DUPLICATE; goto _skip; } p = pfr_create_kentry(&ad); if (p == NULL) senderr(ENOMEM); if (pfr_route_kentry(tmpkt, p)) { pfr_destroy_kentry(p); ad.pfra_fback = PFR_FB_NONE; } else { SLIST_INSERT_HEAD(&addq, p, pfrke_workq); ad.pfra_fback = PFR_FB_ADDED; xadd++; } } _skip: if (flags & PFR_FLAG_FEEDBACK) bcopy(&ad, addr + i, sizeof(ad)); } pfr_enqueue_addrs(kt, &delq, &xdel, ENQUEUE_UNMARKED_ONLY); if ((flags & PFR_FLAG_FEEDBACK) && *size2) { if (*size2 < size+xdel) { *size2 = size+xdel; senderr(0); } i = 0; SLIST_FOREACH(p, &delq, pfrke_workq) { pfr_copyout_addr(&ad, p); ad.pfra_fback = PFR_FB_DELETED; bcopy(&ad, addr + size + i, sizeof(ad)); i++; } } pfr_clean_node_mask(tmpkt, &addq); if (!(flags & PFR_FLAG_DUMMY)) { pfr_insert_kentries(kt, &addq, tzero); pfr_remove_kentries(kt, &delq); pfr_clstats_kentries(&changeq, tzero, INVERT_NEG_FLAG); } else pfr_destroy_kentries(&addq); if (nadd != NULL) *nadd = xadd; if (ndel != NULL) *ndel = xdel; if (nchange != NULL) *nchange = xchange; if ((flags & PFR_FLAG_FEEDBACK) && size2) *size2 = size+xdel; pfr_destroy_ktable(tmpkt, 0); return (0); _bad: pfr_clean_node_mask(tmpkt, &addq); pfr_destroy_kentries(&addq); if (flags & PFR_FLAG_FEEDBACK) pfr_reset_feedback(addr, size); pfr_destroy_ktable(tmpkt, 0); return (rv); } int pfr_tst_addrs(struct pfr_table *tbl, struct pfr_addr *addr, int size, int *nmatch, int flags) { struct pfr_ktable *kt; struct pfr_kentry *p; struct pfr_addr *ad; int i, xmatch = 0; PF_RULES_RASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_REPLACE); if (pfr_validate_table(tbl, 0, 0)) return (EINVAL); kt = pfr_lookup_table(tbl); if (kt == NULL || !(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (ESRCH); for (i = 0, ad = addr; i < size; i++, ad++) { if (pfr_validate_addr(ad)) return (EINVAL); if (ADDR_NETWORK(ad)) return (EINVAL); p = pfr_lookup_addr(kt, ad, 0); if (flags & PFR_FLAG_REPLACE) pfr_copyout_addr(ad, p); ad->pfra_fback = (p == NULL) ? PFR_FB_NONE : (p->pfrke_not ? 
PFR_FB_NOTMATCH : PFR_FB_MATCH); if (p != NULL && !p->pfrke_not) xmatch++; } if (nmatch != NULL) *nmatch = xmatch; return (0); } int pfr_get_addrs(struct pfr_table *tbl, struct pfr_addr *addr, int *size, int flags) { struct pfr_ktable *kt; struct pfr_walktree w; int rv; PF_RULES_RASSERT(); ACCEPT_FLAGS(flags, 0); if (pfr_validate_table(tbl, 0, 0)) return (EINVAL); kt = pfr_lookup_table(tbl); if (kt == NULL || !(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (ESRCH); if (kt->pfrkt_cnt > *size) { *size = kt->pfrkt_cnt; return (0); } bzero(&w, sizeof(w)); w.pfrw_op = PFRW_GET_ADDRS; w.pfrw_addr = addr; w.pfrw_free = kt->pfrkt_cnt; - rv = kt->pfrkt_ip4->rnh_walktree(kt->pfrkt_ip4, pfr_walktree, &w); + rv = kt->pfrkt_ip4->rnh_walktree(&kt->pfrkt_ip4->rh, pfr_walktree, &w); if (!rv) - rv = kt->pfrkt_ip6->rnh_walktree(kt->pfrkt_ip6, pfr_walktree, - &w); + rv = kt->pfrkt_ip6->rnh_walktree(&kt->pfrkt_ip6->rh, + pfr_walktree, &w); if (rv) return (rv); KASSERT(w.pfrw_free == 0, ("%s: corruption detected (%d)", __func__, w.pfrw_free)); *size = kt->pfrkt_cnt; return (0); } int pfr_get_astats(struct pfr_table *tbl, struct pfr_astats *addr, int *size, int flags) { struct pfr_ktable *kt; struct pfr_walktree w; struct pfr_kentryworkq workq; int rv; long tzero = time_second; PF_RULES_RASSERT(); /* XXX PFR_FLAG_CLSTATS disabled */ ACCEPT_FLAGS(flags, 0); if (pfr_validate_table(tbl, 0, 0)) return (EINVAL); kt = pfr_lookup_table(tbl); if (kt == NULL || !(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (ESRCH); if (kt->pfrkt_cnt > *size) { *size = kt->pfrkt_cnt; return (0); } bzero(&w, sizeof(w)); w.pfrw_op = PFRW_GET_ASTATS; w.pfrw_astats = addr; w.pfrw_free = kt->pfrkt_cnt; - rv = kt->pfrkt_ip4->rnh_walktree(kt->pfrkt_ip4, pfr_walktree, &w); + rv = kt->pfrkt_ip4->rnh_walktree(&kt->pfrkt_ip4->rh, pfr_walktree, &w); if (!rv) - rv = kt->pfrkt_ip6->rnh_walktree(kt->pfrkt_ip6, pfr_walktree, - &w); + rv = kt->pfrkt_ip6->rnh_walktree(&kt->pfrkt_ip6->rh, + pfr_walktree, &w); if (!rv && (flags & PFR_FLAG_CLSTATS)) { pfr_enqueue_addrs(kt, &workq, NULL, 0); pfr_clstats_kentries(&workq, tzero, 0); } if (rv) return (rv); if (w.pfrw_free) { printf("pfr_get_astats: corruption detected (%d).\n", w.pfrw_free); return (ENOTTY); } *size = kt->pfrkt_cnt; return (0); } int pfr_clr_astats(struct pfr_table *tbl, struct pfr_addr *addr, int size, int *nzero, int flags) { struct pfr_ktable *kt; struct pfr_kentryworkq workq; struct pfr_kentry *p; struct pfr_addr *ad; int i, rv, xzero = 0; PF_RULES_WASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY | PFR_FLAG_FEEDBACK); if (pfr_validate_table(tbl, 0, 0)) return (EINVAL); kt = pfr_lookup_table(tbl); if (kt == NULL || !(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (ESRCH); SLIST_INIT(&workq); for (i = 0, ad = addr; i < size; i++, ad++) { if (pfr_validate_addr(ad)) senderr(EINVAL); p = pfr_lookup_addr(kt, ad, 1); if (flags & PFR_FLAG_FEEDBACK) { ad->pfra_fback = (p != NULL) ? 
PFR_FB_CLEARED : PFR_FB_NONE; } if (p != NULL) { SLIST_INSERT_HEAD(&workq, p, pfrke_workq); xzero++; } } if (!(flags & PFR_FLAG_DUMMY)) pfr_clstats_kentries(&workq, 0, 0); if (nzero != NULL) *nzero = xzero; return (0); _bad: if (flags & PFR_FLAG_FEEDBACK) pfr_reset_feedback(addr, size); return (rv); } static int pfr_validate_addr(struct pfr_addr *ad) { int i; switch (ad->pfra_af) { #ifdef INET case AF_INET: if (ad->pfra_net > 32) return (-1); break; #endif /* INET */ #ifdef INET6 case AF_INET6: if (ad->pfra_net > 128) return (-1); break; #endif /* INET6 */ default: return (-1); } if (ad->pfra_net < 128 && (((caddr_t)ad)[ad->pfra_net/8] & (0xFF >> (ad->pfra_net%8)))) return (-1); for (i = (ad->pfra_net+7)/8; i < sizeof(ad->pfra_u); i++) if (((caddr_t)ad)[i]) return (-1); if (ad->pfra_not && ad->pfra_not != 1) return (-1); if (ad->pfra_fback) return (-1); return (0); } static void pfr_enqueue_addrs(struct pfr_ktable *kt, struct pfr_kentryworkq *workq, int *naddr, int sweep) { struct pfr_walktree w; SLIST_INIT(workq); bzero(&w, sizeof(w)); w.pfrw_op = sweep ? PFRW_SWEEP : PFRW_ENQUEUE; w.pfrw_workq = workq; if (kt->pfrkt_ip4 != NULL) - if (kt->pfrkt_ip4->rnh_walktree(kt->pfrkt_ip4, pfr_walktree, - &w)) + if (kt->pfrkt_ip4->rnh_walktree(&kt->pfrkt_ip4->rh, + pfr_walktree, &w)) printf("pfr_enqueue_addrs: IPv4 walktree failed.\n"); if (kt->pfrkt_ip6 != NULL) - if (kt->pfrkt_ip6->rnh_walktree(kt->pfrkt_ip6, pfr_walktree, - &w)) + if (kt->pfrkt_ip6->rnh_walktree(&kt->pfrkt_ip6->rh, + pfr_walktree, &w)) printf("pfr_enqueue_addrs: IPv6 walktree failed.\n"); if (naddr != NULL) *naddr = w.pfrw_cnt; } static void pfr_mark_addrs(struct pfr_ktable *kt) { struct pfr_walktree w; bzero(&w, sizeof(w)); w.pfrw_op = PFRW_MARK; - if (kt->pfrkt_ip4->rnh_walktree(kt->pfrkt_ip4, pfr_walktree, &w)) + if (kt->pfrkt_ip4->rnh_walktree(&kt->pfrkt_ip4->rh, pfr_walktree, &w)) printf("pfr_mark_addrs: IPv4 walktree failed.\n"); - if (kt->pfrkt_ip6->rnh_walktree(kt->pfrkt_ip6, pfr_walktree, &w)) + if (kt->pfrkt_ip6->rnh_walktree(&kt->pfrkt_ip6->rh, pfr_walktree, &w)) printf("pfr_mark_addrs: IPv6 walktree failed.\n"); } static struct pfr_kentry * pfr_lookup_addr(struct pfr_ktable *kt, struct pfr_addr *ad, int exact) { union sockaddr_union sa, mask; - struct radix_node_head *head = NULL; + struct radix_head *head = NULL; struct pfr_kentry *ke; PF_RULES_ASSERT(); bzero(&sa, sizeof(sa)); if (ad->pfra_af == AF_INET) { FILLIN_SIN(sa.sin, ad->pfra_ip4addr); - head = kt->pfrkt_ip4; + head = &kt->pfrkt_ip4->rh; } else if ( ad->pfra_af == AF_INET6 ) { FILLIN_SIN6(sa.sin6, ad->pfra_ip6addr); - head = kt->pfrkt_ip6; + head = &kt->pfrkt_ip6->rh; } if (ADDR_NETWORK(ad)) { pfr_prepare_network(&mask, ad->pfra_af, ad->pfra_net); ke = (struct pfr_kentry *)rn_lookup(&sa, &mask, head); if (ke && KENTRY_RNF_ROOT(ke)) ke = NULL; } else { ke = (struct pfr_kentry *)rn_match(&sa, head); if (ke && KENTRY_RNF_ROOT(ke)) ke = NULL; if (exact && ke && KENTRY_NETWORK(ke)) ke = NULL; } return (ke); } static struct pfr_kentry * pfr_create_kentry(struct pfr_addr *ad) { struct pfr_kentry *ke; ke = uma_zalloc(V_pfr_kentry_z, M_NOWAIT | M_ZERO); if (ke == NULL) return (NULL); if (ad->pfra_af == AF_INET) FILLIN_SIN(ke->pfrke_sa.sin, ad->pfra_ip4addr); else if (ad->pfra_af == AF_INET6) FILLIN_SIN6(ke->pfrke_sa.sin6, ad->pfra_ip6addr); ke->pfrke_af = ad->pfra_af; ke->pfrke_net = ad->pfra_net; ke->pfrke_not = ad->pfra_not; return (ke); } static void pfr_destroy_kentries(struct pfr_kentryworkq *workq) { struct pfr_kentry *p, *q; for (p = SLIST_FIRST(workq); p != 
NULL; p = q) { q = SLIST_NEXT(p, pfrke_workq); pfr_destroy_kentry(p); } } static void pfr_destroy_kentry(struct pfr_kentry *ke) { if (ke->pfrke_counters) uma_zfree(V_pfr_kcounters_z, ke->pfrke_counters); uma_zfree(V_pfr_kentry_z, ke); } static void pfr_insert_kentries(struct pfr_ktable *kt, struct pfr_kentryworkq *workq, long tzero) { struct pfr_kentry *p; int rv, n = 0; SLIST_FOREACH(p, workq, pfrke_workq) { rv = pfr_route_kentry(kt, p); if (rv) { printf("pfr_insert_kentries: cannot route entry " "(code=%d).\n", rv); break; } p->pfrke_tzero = tzero; n++; } kt->pfrkt_cnt += n; } int pfr_insert_kentry(struct pfr_ktable *kt, struct pfr_addr *ad, long tzero) { struct pfr_kentry *p; int rv; p = pfr_lookup_addr(kt, ad, 1); if (p != NULL) return (0); p = pfr_create_kentry(ad); if (p == NULL) return (ENOMEM); rv = pfr_route_kentry(kt, p); if (rv) return (rv); p->pfrke_tzero = tzero; kt->pfrkt_cnt++; return (0); } static void pfr_remove_kentries(struct pfr_ktable *kt, struct pfr_kentryworkq *workq) { struct pfr_kentry *p; int n = 0; SLIST_FOREACH(p, workq, pfrke_workq) { pfr_unroute_kentry(kt, p); n++; } kt->pfrkt_cnt -= n; pfr_destroy_kentries(workq); } static void pfr_clean_node_mask(struct pfr_ktable *kt, struct pfr_kentryworkq *workq) { struct pfr_kentry *p; SLIST_FOREACH(p, workq, pfrke_workq) pfr_unroute_kentry(kt, p); } static void pfr_clstats_kentries(struct pfr_kentryworkq *workq, long tzero, int negchange) { struct pfr_kentry *p; SLIST_FOREACH(p, workq, pfrke_workq) { if (negchange) p->pfrke_not = !p->pfrke_not; if (p->pfrke_counters) { uma_zfree(V_pfr_kcounters_z, p->pfrke_counters); p->pfrke_counters = NULL; } p->pfrke_tzero = tzero; } } static void pfr_reset_feedback(struct pfr_addr *addr, int size) { struct pfr_addr *ad; int i; for (i = 0, ad = addr; i < size; i++, ad++) ad->pfra_fback = PFR_FB_NONE; } static void pfr_prepare_network(union sockaddr_union *sa, int af, int net) { int i; bzero(sa, sizeof(*sa)); if (af == AF_INET) { sa->sin.sin_len = sizeof(sa->sin); sa->sin.sin_family = AF_INET; sa->sin.sin_addr.s_addr = net ? htonl(-1 << (32-net)) : 0; } else if (af == AF_INET6) { sa->sin6.sin6_len = sizeof(sa->sin6); sa->sin6.sin6_family = AF_INET6; for (i = 0; i < 4; i++) { if (net <= 32) { sa->sin6.sin6_addr.s6_addr32[i] = net ? htonl(-1 << (32-net)) : 0; break; } sa->sin6.sin6_addr.s6_addr32[i] = 0xFFFFFFFF; net -= 32; } } } static int pfr_route_kentry(struct pfr_ktable *kt, struct pfr_kentry *ke) { union sockaddr_union mask; struct radix_node *rn; - struct radix_node_head *head = NULL; + struct radix_head *head = NULL; PF_RULES_WASSERT(); bzero(ke->pfrke_node, sizeof(ke->pfrke_node)); if (ke->pfrke_af == AF_INET) - head = kt->pfrkt_ip4; + head = &kt->pfrkt_ip4->rh; else if (ke->pfrke_af == AF_INET6) - head = kt->pfrkt_ip6; + head = &kt->pfrkt_ip6->rh; if (KENTRY_NETWORK(ke)) { pfr_prepare_network(&mask, ke->pfrke_af, ke->pfrke_net); rn = rn_addroute(&ke->pfrke_sa, &mask, head, ke->pfrke_node); } else rn = rn_addroute(&ke->pfrke_sa, NULL, head, ke->pfrke_node); return (rn == NULL ? 
-1 : 0); } static int pfr_unroute_kentry(struct pfr_ktable *kt, struct pfr_kentry *ke) { union sockaddr_union mask; struct radix_node *rn; - struct radix_node_head *head = NULL; + struct radix_head *head = NULL; if (ke->pfrke_af == AF_INET) - head = kt->pfrkt_ip4; + head = &kt->pfrkt_ip4->rh; else if (ke->pfrke_af == AF_INET6) - head = kt->pfrkt_ip6; + head = &kt->pfrkt_ip6->rh; if (KENTRY_NETWORK(ke)) { pfr_prepare_network(&mask, ke->pfrke_af, ke->pfrke_net); rn = rn_delete(&ke->pfrke_sa, &mask, head); } else rn = rn_delete(&ke->pfrke_sa, NULL, head); if (rn == NULL) { printf("pfr_unroute_kentry: delete failed.\n"); return (-1); } return (0); } static void pfr_copyout_addr(struct pfr_addr *ad, struct pfr_kentry *ke) { bzero(ad, sizeof(*ad)); if (ke == NULL) return; ad->pfra_af = ke->pfrke_af; ad->pfra_net = ke->pfrke_net; ad->pfra_not = ke->pfrke_not; if (ad->pfra_af == AF_INET) ad->pfra_ip4addr = ke->pfrke_sa.sin.sin_addr; else if (ad->pfra_af == AF_INET6) ad->pfra_ip6addr = ke->pfrke_sa.sin6.sin6_addr; } static int pfr_walktree(struct radix_node *rn, void *arg) { struct pfr_kentry *ke = (struct pfr_kentry *)rn; struct pfr_walktree *w = arg; switch (w->pfrw_op) { case PFRW_MARK: ke->pfrke_mark = 0; break; case PFRW_SWEEP: if (ke->pfrke_mark) break; /* FALLTHROUGH */ case PFRW_ENQUEUE: SLIST_INSERT_HEAD(w->pfrw_workq, ke, pfrke_workq); w->pfrw_cnt++; break; case PFRW_GET_ADDRS: if (w->pfrw_free-- > 0) { pfr_copyout_addr(w->pfrw_addr, ke); w->pfrw_addr++; } break; case PFRW_GET_ASTATS: if (w->pfrw_free-- > 0) { struct pfr_astats as; pfr_copyout_addr(&as.pfras_a, ke); if (ke->pfrke_counters) { bcopy(ke->pfrke_counters->pfrkc_packets, as.pfras_packets, sizeof(as.pfras_packets)); bcopy(ke->pfrke_counters->pfrkc_bytes, as.pfras_bytes, sizeof(as.pfras_bytes)); } else { bzero(as.pfras_packets, sizeof(as.pfras_packets)); bzero(as.pfras_bytes, sizeof(as.pfras_bytes)); as.pfras_a.pfra_fback = PFR_FB_NOCOUNT; } as.pfras_tzero = ke->pfrke_tzero; bcopy(&as, w->pfrw_astats, sizeof(as)); w->pfrw_astats++; } break; case PFRW_POOL_GET: if (ke->pfrke_not) break; /* negative entries are ignored */ if (!w->pfrw_cnt--) { w->pfrw_kentry = ke; return (1); /* finish search */ } break; case PFRW_DYNADDR_UPDATE: { union sockaddr_union pfr_mask; if (ke->pfrke_af == AF_INET) { if (w->pfrw_dyn->pfid_acnt4++ > 0) break; pfr_prepare_network(&pfr_mask, AF_INET, ke->pfrke_net); w->pfrw_dyn->pfid_addr4 = *SUNION2PF(&ke->pfrke_sa, AF_INET); w->pfrw_dyn->pfid_mask4 = *SUNION2PF(&pfr_mask, AF_INET); } else if (ke->pfrke_af == AF_INET6){ if (w->pfrw_dyn->pfid_acnt6++ > 0) break; pfr_prepare_network(&pfr_mask, AF_INET6, ke->pfrke_net); w->pfrw_dyn->pfid_addr6 = *SUNION2PF(&ke->pfrke_sa, AF_INET6); w->pfrw_dyn->pfid_mask6 = *SUNION2PF(&pfr_mask, AF_INET6); } break; } } return (0); } int pfr_clr_tables(struct pfr_table *filter, int *ndel, int flags) { struct pfr_ktableworkq workq; struct pfr_ktable *p; int xdel = 0; ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY | PFR_FLAG_ALLRSETS); if (pfr_fix_anchor(filter->pfrt_anchor)) return (EINVAL); if (pfr_table_count(filter, flags) < 0) return (ENOENT); SLIST_INIT(&workq); RB_FOREACH(p, pfr_ktablehead, &pfr_ktables) { if (pfr_skip_table(filter, p, flags)) continue; if (!strcmp(p->pfrkt_anchor, PF_RESERVED_ANCHOR)) continue; if (!(p->pfrkt_flags & PFR_TFLAG_ACTIVE)) continue; p->pfrkt_nflags = p->pfrkt_flags & ~PFR_TFLAG_ACTIVE; SLIST_INSERT_HEAD(&workq, p, pfrkt_workq); xdel++; } if (!(flags & PFR_FLAG_DUMMY)) pfr_setflags_ktables(&workq); if (ndel != NULL) *ndel = xdel; return (0); } int 
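/*
 * Create any tables in @tbl that do not exist yet and reactivate ones
 * that are present but inactive; anchored tables get their root table
 * found or created on demand.
 */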
pfr_add_tables(struct pfr_table *tbl, int size, int *nadd, int flags) { struct pfr_ktableworkq addq, changeq; struct pfr_ktable *p, *q, *r, key; int i, rv, xadd = 0; long tzero = time_second; ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY); SLIST_INIT(&addq); SLIST_INIT(&changeq); for (i = 0; i < size; i++) { bcopy(tbl+i, &key.pfrkt_t, sizeof(key.pfrkt_t)); if (pfr_validate_table(&key.pfrkt_t, PFR_TFLAG_USRMASK, flags & PFR_FLAG_USERIOCTL)) senderr(EINVAL); key.pfrkt_flags |= PFR_TFLAG_ACTIVE; p = RB_FIND(pfr_ktablehead, &pfr_ktables, &key); if (p == NULL) { p = pfr_create_ktable(&key.pfrkt_t, tzero, 1); if (p == NULL) senderr(ENOMEM); SLIST_FOREACH(q, &addq, pfrkt_workq) { if (!pfr_ktable_compare(p, q)) goto _skip; } SLIST_INSERT_HEAD(&addq, p, pfrkt_workq); xadd++; if (!key.pfrkt_anchor[0]) goto _skip; /* find or create root table */ bzero(key.pfrkt_anchor, sizeof(key.pfrkt_anchor)); r = RB_FIND(pfr_ktablehead, &pfr_ktables, &key); if (r != NULL) { p->pfrkt_root = r; goto _skip; } SLIST_FOREACH(q, &addq, pfrkt_workq) { if (!pfr_ktable_compare(&key, q)) { p->pfrkt_root = q; goto _skip; } } key.pfrkt_flags = 0; r = pfr_create_ktable(&key.pfrkt_t, 0, 1); if (r == NULL) senderr(ENOMEM); SLIST_INSERT_HEAD(&addq, r, pfrkt_workq); p->pfrkt_root = r; } else if (!(p->pfrkt_flags & PFR_TFLAG_ACTIVE)) { SLIST_FOREACH(q, &changeq, pfrkt_workq) if (!pfr_ktable_compare(&key, q)) goto _skip; p->pfrkt_nflags = (p->pfrkt_flags & ~PFR_TFLAG_USRMASK) | key.pfrkt_flags; SLIST_INSERT_HEAD(&changeq, p, pfrkt_workq); xadd++; } _skip: ; } if (!(flags & PFR_FLAG_DUMMY)) { pfr_insert_ktables(&addq); pfr_setflags_ktables(&changeq); } else pfr_destroy_ktables(&addq, 0); if (nadd != NULL) *nadd = xadd; return (0); _bad: pfr_destroy_ktables(&addq, 0); return (rv); } int pfr_del_tables(struct pfr_table *tbl, int size, int *ndel, int flags) { struct pfr_ktableworkq workq; struct pfr_ktable *p, *q, key; int i, xdel = 0; ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY); SLIST_INIT(&workq); for (i = 0; i < size; i++) { bcopy(tbl+i, &key.pfrkt_t, sizeof(key.pfrkt_t)); if (pfr_validate_table(&key.pfrkt_t, 0, flags & PFR_FLAG_USERIOCTL)) return (EINVAL); p = RB_FIND(pfr_ktablehead, &pfr_ktables, &key); if (p != NULL && (p->pfrkt_flags & PFR_TFLAG_ACTIVE)) { SLIST_FOREACH(q, &workq, pfrkt_workq) if (!pfr_ktable_compare(p, q)) goto _skip; p->pfrkt_nflags = p->pfrkt_flags & ~PFR_TFLAG_ACTIVE; SLIST_INSERT_HEAD(&workq, p, pfrkt_workq); xdel++; } _skip: ; } if (!(flags & PFR_FLAG_DUMMY)) pfr_setflags_ktables(&workq); if (ndel != NULL) *ndel = xdel; return (0); } int pfr_get_tables(struct pfr_table *filter, struct pfr_table *tbl, int *size, int flags) { struct pfr_ktable *p; int n, nn; PF_RULES_RASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_ALLRSETS); if (pfr_fix_anchor(filter->pfrt_anchor)) return (EINVAL); n = nn = pfr_table_count(filter, flags); if (n < 0) return (ENOENT); if (n > *size) { *size = n; return (0); } RB_FOREACH(p, pfr_ktablehead, &pfr_ktables) { if (pfr_skip_table(filter, p, flags)) continue; if (n-- <= 0) continue; bcopy(&p->pfrkt_t, tbl++, sizeof(*tbl)); } KASSERT(n == 0, ("%s: corruption detected (%d)", __func__, n)); *size = nn; return (0); } int pfr_get_tstats(struct pfr_table *filter, struct pfr_tstats *tbl, int *size, int flags) { struct pfr_ktable *p; struct pfr_ktableworkq workq; int n, nn; long tzero = time_second; /* XXX PFR_FLAG_CLSTATS disabled */ ACCEPT_FLAGS(flags, PFR_FLAG_ALLRSETS); if (pfr_fix_anchor(filter->pfrt_anchor)) return (EINVAL); n = nn = pfr_table_count(filter, flags); if (n < 0) return (ENOENT); if (n > *size) { 
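/* caller's buffer is too small; just report the size needed */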
*size = n; return (0); } SLIST_INIT(&workq); RB_FOREACH(p, pfr_ktablehead, &pfr_ktables) { if (pfr_skip_table(filter, p, flags)) continue; if (n-- <= 0) continue; bcopy(&p->pfrkt_ts, tbl++, sizeof(*tbl)); SLIST_INSERT_HEAD(&workq, p, pfrkt_workq); } if (flags & PFR_FLAG_CLSTATS) pfr_clstats_ktables(&workq, tzero, flags & PFR_FLAG_ADDRSTOO); KASSERT(n == 0, ("%s: corruption detected (%d)", __func__, n)); *size = nn; return (0); } int pfr_clr_tstats(struct pfr_table *tbl, int size, int *nzero, int flags) { struct pfr_ktableworkq workq; struct pfr_ktable *p, key; int i, xzero = 0; long tzero = time_second; ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY | PFR_FLAG_ADDRSTOO); SLIST_INIT(&workq); for (i = 0; i < size; i++) { bcopy(tbl + i, &key.pfrkt_t, sizeof(key.pfrkt_t)); if (pfr_validate_table(&key.pfrkt_t, 0, 0)) return (EINVAL); p = RB_FIND(pfr_ktablehead, &pfr_ktables, &key); if (p != NULL) { SLIST_INSERT_HEAD(&workq, p, pfrkt_workq); xzero++; } } if (!(flags & PFR_FLAG_DUMMY)) pfr_clstats_ktables(&workq, tzero, flags & PFR_FLAG_ADDRSTOO); if (nzero != NULL) *nzero = xzero; return (0); } int pfr_set_tflags(struct pfr_table *tbl, int size, int setflag, int clrflag, int *nchange, int *ndel, int flags) { struct pfr_ktableworkq workq; struct pfr_ktable *p, *q, key; int i, xchange = 0, xdel = 0; ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY); if ((setflag & ~PFR_TFLAG_USRMASK) || (clrflag & ~PFR_TFLAG_USRMASK) || (setflag & clrflag)) return (EINVAL); SLIST_INIT(&workq); for (i = 0; i < size; i++) { bcopy(tbl + i, &key.pfrkt_t, sizeof(key.pfrkt_t)); if (pfr_validate_table(&key.pfrkt_t, 0, flags & PFR_FLAG_USERIOCTL)) return (EINVAL); p = RB_FIND(pfr_ktablehead, &pfr_ktables, &key); if (p != NULL && (p->pfrkt_flags & PFR_TFLAG_ACTIVE)) { p->pfrkt_nflags = (p->pfrkt_flags | setflag) & ~clrflag; if (p->pfrkt_nflags == p->pfrkt_flags) goto _skip; SLIST_FOREACH(q, &workq, pfrkt_workq) if (!pfr_ktable_compare(p, q)) goto _skip; SLIST_INSERT_HEAD(&workq, p, pfrkt_workq); if ((p->pfrkt_flags & PFR_TFLAG_PERSIST) && (clrflag & PFR_TFLAG_PERSIST) && !(p->pfrkt_flags & PFR_TFLAG_REFERENCED)) xdel++; else xchange++; } _skip: ; } if (!(flags & PFR_FLAG_DUMMY)) pfr_setflags_ktables(&workq); if (nchange != NULL) *nchange = xchange; if (ndel != NULL) *ndel = xdel; return (0); } int pfr_ina_begin(struct pfr_table *trs, u_int32_t *ticket, int *ndel, int flags) { struct pfr_ktableworkq workq; struct pfr_ktable *p; struct pf_ruleset *rs; int xdel = 0; ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY); rs = pf_find_or_create_ruleset(trs->pfrt_anchor); if (rs == NULL) return (ENOMEM); SLIST_INIT(&workq); RB_FOREACH(p, pfr_ktablehead, &pfr_ktables) { if (!(p->pfrkt_flags & PFR_TFLAG_INACTIVE) || pfr_skip_table(trs, p, 0)) continue; p->pfrkt_nflags = p->pfrkt_flags & ~PFR_TFLAG_INACTIVE; SLIST_INSERT_HEAD(&workq, p, pfrkt_workq); xdel++; } if (!(flags & PFR_FLAG_DUMMY)) { pfr_setflags_ktables(&workq); if (ticket != NULL) *ticket = ++rs->tticket; rs->topen = 1; } else pf_remove_if_empty_ruleset(rs); if (ndel != NULL) *ndel = xdel; return (0); } int pfr_ina_define(struct pfr_table *tbl, struct pfr_addr *addr, int size, int *nadd, int *naddr, u_int32_t ticket, int flags) { struct pfr_ktableworkq tableq; struct pfr_kentryworkq addrq; struct pfr_ktable *kt, *rt, *shadow, key; struct pfr_kentry *p; struct pfr_addr *ad; struct pf_ruleset *rs; int i, rv, xadd = 0, xaddr = 0; PF_RULES_WASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY | PFR_FLAG_ADDRSTOO); if (size && !(flags & PFR_FLAG_ADDRSTOO)) return (EINVAL); if (pfr_validate_table(tbl, PFR_TFLAG_USRMASK, flags & 
PFR_FLAG_USERIOCTL)) return (EINVAL); rs = pf_find_ruleset(tbl->pfrt_anchor); if (rs == NULL || !rs->topen || ticket != rs->tticket) return (EBUSY); tbl->pfrt_flags |= PFR_TFLAG_INACTIVE; SLIST_INIT(&tableq); kt = RB_FIND(pfr_ktablehead, &pfr_ktables, (struct pfr_ktable *)tbl); if (kt == NULL) { kt = pfr_create_ktable(tbl, 0, 1); if (kt == NULL) return (ENOMEM); SLIST_INSERT_HEAD(&tableq, kt, pfrkt_workq); xadd++; if (!tbl->pfrt_anchor[0]) goto _skip; /* find or create root table */ bzero(&key, sizeof(key)); strlcpy(key.pfrkt_name, tbl->pfrt_name, sizeof(key.pfrkt_name)); rt = RB_FIND(pfr_ktablehead, &pfr_ktables, &key); if (rt != NULL) { kt->pfrkt_root = rt; goto _skip; } rt = pfr_create_ktable(&key.pfrkt_t, 0, 1); if (rt == NULL) { pfr_destroy_ktables(&tableq, 0); return (ENOMEM); } SLIST_INSERT_HEAD(&tableq, rt, pfrkt_workq); kt->pfrkt_root = rt; } else if (!(kt->pfrkt_flags & PFR_TFLAG_INACTIVE)) xadd++; _skip: shadow = pfr_create_ktable(tbl, 0, 0); if (shadow == NULL) { pfr_destroy_ktables(&tableq, 0); return (ENOMEM); } SLIST_INIT(&addrq); for (i = 0, ad = addr; i < size; i++, ad++) { if (pfr_validate_addr(ad)) senderr(EINVAL); if (pfr_lookup_addr(shadow, ad, 1) != NULL) continue; p = pfr_create_kentry(ad); if (p == NULL) senderr(ENOMEM); if (pfr_route_kentry(shadow, p)) { pfr_destroy_kentry(p); continue; } SLIST_INSERT_HEAD(&addrq, p, pfrke_workq); xaddr++; } if (!(flags & PFR_FLAG_DUMMY)) { if (kt->pfrkt_shadow != NULL) pfr_destroy_ktable(kt->pfrkt_shadow, 1); kt->pfrkt_flags |= PFR_TFLAG_INACTIVE; pfr_insert_ktables(&tableq); shadow->pfrkt_cnt = (flags & PFR_FLAG_ADDRSTOO) ? xaddr : NO_ADDRESSES; kt->pfrkt_shadow = shadow; } else { pfr_clean_node_mask(shadow, &addrq); pfr_destroy_ktable(shadow, 0); pfr_destroy_ktables(&tableq, 0); pfr_destroy_kentries(&addrq); } if (nadd != NULL) *nadd = xadd; if (naddr != NULL) *naddr = xaddr; return (0); _bad: pfr_destroy_ktable(shadow, 0); pfr_destroy_ktables(&tableq, 0); pfr_destroy_kentries(&addrq); return (rv); } int pfr_ina_rollback(struct pfr_table *trs, u_int32_t ticket, int *ndel, int flags) { struct pfr_ktableworkq workq; struct pfr_ktable *p; struct pf_ruleset *rs; int xdel = 0; PF_RULES_WASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY); rs = pf_find_ruleset(trs->pfrt_anchor); if (rs == NULL || !rs->topen || ticket != rs->tticket) return (0); SLIST_INIT(&workq); RB_FOREACH(p, pfr_ktablehead, &pfr_ktables) { if (!(p->pfrkt_flags & PFR_TFLAG_INACTIVE) || pfr_skip_table(trs, p, 0)) continue; p->pfrkt_nflags = p->pfrkt_flags & ~PFR_TFLAG_INACTIVE; SLIST_INSERT_HEAD(&workq, p, pfrkt_workq); xdel++; } if (!(flags & PFR_FLAG_DUMMY)) { pfr_setflags_ktables(&workq); rs->topen = 0; pf_remove_if_empty_ruleset(rs); } if (ndel != NULL) *ndel = xdel; return (0); } int pfr_ina_commit(struct pfr_table *trs, u_int32_t ticket, int *nadd, int *nchange, int flags) { struct pfr_ktable *p, *q; struct pfr_ktableworkq workq; struct pf_ruleset *rs; int xadd = 0, xchange = 0; long tzero = time_second; PF_RULES_WASSERT(); ACCEPT_FLAGS(flags, PFR_FLAG_DUMMY); rs = pf_find_ruleset(trs->pfrt_anchor); if (rs == NULL || !rs->topen || ticket != rs->tticket) return (EBUSY); SLIST_INIT(&workq); RB_FOREACH(p, pfr_ktablehead, &pfr_ktables) { if (!(p->pfrkt_flags & PFR_TFLAG_INACTIVE) || pfr_skip_table(trs, p, 0)) continue; SLIST_INSERT_HEAD(&workq, p, pfrkt_workq); if (p->pfrkt_flags & PFR_TFLAG_ACTIVE) xchange++; else xadd++; } if (!(flags & PFR_FLAG_DUMMY)) { for (p = SLIST_FIRST(&workq); p != NULL; p = q) { q = SLIST_NEXT(p, pfrkt_workq); pfr_commit_ktable(p, tzero); } 
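/* all shadow tables are now live; close the transaction ticket */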
rs->topen = 0; pf_remove_if_empty_ruleset(rs); } if (nadd != NULL) *nadd = xadd; if (nchange != NULL) *nchange = xchange; return (0); } static void pfr_commit_ktable(struct pfr_ktable *kt, long tzero) { struct pfr_ktable *shadow = kt->pfrkt_shadow; int nflags; PF_RULES_WASSERT(); if (shadow->pfrkt_cnt == NO_ADDRESSES) { if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) pfr_clstats_ktable(kt, tzero, 1); } else if (kt->pfrkt_flags & PFR_TFLAG_ACTIVE) { /* kt might contain addresses */ struct pfr_kentryworkq addrq, addq, changeq, delq, garbageq; struct pfr_kentry *p, *q, *next; struct pfr_addr ad; pfr_enqueue_addrs(shadow, &addrq, NULL, 0); pfr_mark_addrs(kt); SLIST_INIT(&addq); SLIST_INIT(&changeq); SLIST_INIT(&delq); SLIST_INIT(&garbageq); pfr_clean_node_mask(shadow, &addrq); for (p = SLIST_FIRST(&addrq); p != NULL; p = next) { next = SLIST_NEXT(p, pfrke_workq); /* XXX */ pfr_copyout_addr(&ad, p); q = pfr_lookup_addr(kt, &ad, 1); if (q != NULL) { if (q->pfrke_not != p->pfrke_not) SLIST_INSERT_HEAD(&changeq, q, pfrke_workq); q->pfrke_mark = 1; SLIST_INSERT_HEAD(&garbageq, p, pfrke_workq); } else { p->pfrke_tzero = tzero; SLIST_INSERT_HEAD(&addq, p, pfrke_workq); } } pfr_enqueue_addrs(kt, &delq, NULL, ENQUEUE_UNMARKED_ONLY); pfr_insert_kentries(kt, &addq, tzero); pfr_remove_kentries(kt, &delq); pfr_clstats_kentries(&changeq, tzero, INVERT_NEG_FLAG); pfr_destroy_kentries(&garbageq); } else { /* kt cannot contain addresses */ SWAP(struct radix_node_head *, kt->pfrkt_ip4, shadow->pfrkt_ip4); SWAP(struct radix_node_head *, kt->pfrkt_ip6, shadow->pfrkt_ip6); SWAP(int, kt->pfrkt_cnt, shadow->pfrkt_cnt); pfr_clstats_ktable(kt, tzero, 1); } nflags = ((shadow->pfrkt_flags & PFR_TFLAG_USRMASK) | (kt->pfrkt_flags & PFR_TFLAG_SETMASK) | PFR_TFLAG_ACTIVE) & ~PFR_TFLAG_INACTIVE; pfr_destroy_ktable(shadow, 0); kt->pfrkt_shadow = NULL; pfr_setflags_ktable(kt, nflags); } static int pfr_validate_table(struct pfr_table *tbl, int allowedflags, int no_reserved) { int i; if (!tbl->pfrt_name[0]) return (-1); if (no_reserved && !strcmp(tbl->pfrt_anchor, PF_RESERVED_ANCHOR)) return (-1); if (tbl->pfrt_name[PF_TABLE_NAME_SIZE-1]) return (-1); for (i = strlen(tbl->pfrt_name); i < PF_TABLE_NAME_SIZE; i++) if (tbl->pfrt_name[i]) return (-1); if (pfr_fix_anchor(tbl->pfrt_anchor)) return (-1); if (tbl->pfrt_flags & ~allowedflags) return (-1); return (0); } /* * Rewrite anchors referenced by tables to remove slashes * and check for validity. */ static int pfr_fix_anchor(char *anchor) { size_t siz = MAXPATHLEN; int i; if (anchor[0] == '/') { char *path; int off; path = anchor; off = 1; while (*++path == '/') off++; bcopy(path, anchor, siz - off); memset(anchor + siz - off, 0, off); } if (anchor[siz - 1]) return (-1); for (i = strlen(anchor); i < siz; i++) if (anchor[i]) return (-1); return (0); } static int pfr_table_count(struct pfr_table *filter, int flags) { struct pf_ruleset *rs; PF_RULES_ASSERT(); if (flags & PFR_FLAG_ALLRSETS) return (pfr_ktable_cnt); if (filter->pfrt_anchor[0]) { rs = pf_find_ruleset(filter->pfrt_anchor); return ((rs != NULL) ? 
rs->tables : -1); } return (pf_main_ruleset.tables); } static int pfr_skip_table(struct pfr_table *filter, struct pfr_ktable *kt, int flags) { if (flags & PFR_FLAG_ALLRSETS) return (0); if (strcmp(filter->pfrt_anchor, kt->pfrkt_anchor)) return (1); return (0); } static void pfr_insert_ktables(struct pfr_ktableworkq *workq) { struct pfr_ktable *p; SLIST_FOREACH(p, workq, pfrkt_workq) pfr_insert_ktable(p); } static void pfr_insert_ktable(struct pfr_ktable *kt) { PF_RULES_WASSERT(); RB_INSERT(pfr_ktablehead, &pfr_ktables, kt); pfr_ktable_cnt++; if (kt->pfrkt_root != NULL) if (!kt->pfrkt_root->pfrkt_refcnt[PFR_REFCNT_ANCHOR]++) pfr_setflags_ktable(kt->pfrkt_root, kt->pfrkt_root->pfrkt_flags|PFR_TFLAG_REFDANCHOR); } static void pfr_setflags_ktables(struct pfr_ktableworkq *workq) { struct pfr_ktable *p, *q; for (p = SLIST_FIRST(workq); p; p = q) { q = SLIST_NEXT(p, pfrkt_workq); pfr_setflags_ktable(p, p->pfrkt_nflags); } } static void pfr_setflags_ktable(struct pfr_ktable *kt, int newf) { struct pfr_kentryworkq addrq; PF_RULES_WASSERT(); if (!(newf & PFR_TFLAG_REFERENCED) && !(newf & PFR_TFLAG_PERSIST)) newf &= ~PFR_TFLAG_ACTIVE; if (!(newf & PFR_TFLAG_ACTIVE)) newf &= ~PFR_TFLAG_USRMASK; if (!(newf & PFR_TFLAG_SETMASK)) { RB_REMOVE(pfr_ktablehead, &pfr_ktables, kt); if (kt->pfrkt_root != NULL) if (!--kt->pfrkt_root->pfrkt_refcnt[PFR_REFCNT_ANCHOR]) pfr_setflags_ktable(kt->pfrkt_root, kt->pfrkt_root->pfrkt_flags & ~PFR_TFLAG_REFDANCHOR); pfr_destroy_ktable(kt, 1); pfr_ktable_cnt--; return; } if (!(newf & PFR_TFLAG_ACTIVE) && kt->pfrkt_cnt) { pfr_enqueue_addrs(kt, &addrq, NULL, 0); pfr_remove_kentries(kt, &addrq); } if (!(newf & PFR_TFLAG_INACTIVE) && kt->pfrkt_shadow != NULL) { pfr_destroy_ktable(kt->pfrkt_shadow, 1); kt->pfrkt_shadow = NULL; } kt->pfrkt_flags = newf; } static void pfr_clstats_ktables(struct pfr_ktableworkq *workq, long tzero, int recurse) { struct pfr_ktable *p; SLIST_FOREACH(p, workq, pfrkt_workq) pfr_clstats_ktable(p, tzero, recurse); } static void pfr_clstats_ktable(struct pfr_ktable *kt, long tzero, int recurse) { struct pfr_kentryworkq addrq; if (recurse) { pfr_enqueue_addrs(kt, &addrq, NULL, 0); pfr_clstats_kentries(&addrq, tzero, 0); } bzero(kt->pfrkt_packets, sizeof(kt->pfrkt_packets)); bzero(kt->pfrkt_bytes, sizeof(kt->pfrkt_bytes)); kt->pfrkt_match = kt->pfrkt_nomatch = 0; kt->pfrkt_tzero = tzero; } static struct pfr_ktable * pfr_create_ktable(struct pfr_table *tbl, long tzero, int attachruleset) { struct pfr_ktable *kt; struct pf_ruleset *rs; PF_RULES_WASSERT(); kt = malloc(sizeof(*kt), M_PFTABLE, M_NOWAIT|M_ZERO); if (kt == NULL) return (NULL); kt->pfrkt_t = *tbl; if (attachruleset) { rs = pf_find_or_create_ruleset(tbl->pfrt_anchor); if (!rs) { pfr_destroy_ktable(kt, 0); return (NULL); } kt->pfrkt_rs = rs; rs->tables++; } if (!rn_inithead((void **)&kt->pfrkt_ip4, offsetof(struct sockaddr_in, sin_addr) * 8) || !rn_inithead((void **)&kt->pfrkt_ip6, offsetof(struct sockaddr_in6, sin6_addr) * 8)) { pfr_destroy_ktable(kt, 0); return (NULL); } kt->pfrkt_tzero = tzero; return (kt); } static void pfr_destroy_ktables(struct pfr_ktableworkq *workq, int flushaddr) { struct pfr_ktable *p, *q; for (p = SLIST_FIRST(workq); p; p = q) { q = SLIST_NEXT(p, pfrkt_workq); pfr_destroy_ktable(p, flushaddr); } } static void pfr_destroy_ktable(struct pfr_ktable *kt, int flushaddr) { struct pfr_kentryworkq addrq; if (flushaddr) { pfr_enqueue_addrs(kt, &addrq, NULL, 0); pfr_clean_node_mask(kt, &addrq); pfr_destroy_kentries(&addrq); } if (kt->pfrkt_ip4 != NULL) rn_detachhead((void 
**)&kt->pfrkt_ip4); if (kt->pfrkt_ip6 != NULL) rn_detachhead((void **)&kt->pfrkt_ip6); if (kt->pfrkt_shadow != NULL) pfr_destroy_ktable(kt->pfrkt_shadow, flushaddr); if (kt->pfrkt_rs != NULL) { kt->pfrkt_rs->tables--; pf_remove_if_empty_ruleset(kt->pfrkt_rs); } free(kt, M_PFTABLE); } static int pfr_ktable_compare(struct pfr_ktable *p, struct pfr_ktable *q) { int d; if ((d = strncmp(p->pfrkt_name, q->pfrkt_name, PF_TABLE_NAME_SIZE))) return (d); return (strcmp(p->pfrkt_anchor, q->pfrkt_anchor)); } static struct pfr_ktable * pfr_lookup_table(struct pfr_table *tbl) { /* struct pfr_ktable starts like a struct pfr_table */ return (RB_FIND(pfr_ktablehead, &pfr_ktables, (struct pfr_ktable *)tbl)); } int pfr_match_addr(struct pfr_ktable *kt, struct pf_addr *a, sa_family_t af) { struct pfr_kentry *ke = NULL; int match; PF_RULES_RASSERT(); if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE) && kt->pfrkt_root != NULL) kt = kt->pfrkt_root; if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (0); switch (af) { #ifdef INET case AF_INET: { struct sockaddr_in sin; bzero(&sin, sizeof(sin)); sin.sin_len = sizeof(sin); sin.sin_family = AF_INET; sin.sin_addr.s_addr = a->addr32[0]; - ke = (struct pfr_kentry *)rn_match(&sin, kt->pfrkt_ip4); + ke = (struct pfr_kentry *)rn_match(&sin, &kt->pfrkt_ip4->rh); if (ke && KENTRY_RNF_ROOT(ke)) ke = NULL; break; } #endif /* INET */ #ifdef INET6 case AF_INET6: { struct sockaddr_in6 sin6; bzero(&sin6, sizeof(sin6)); sin6.sin6_len = sizeof(sin6); sin6.sin6_family = AF_INET6; bcopy(a, &sin6.sin6_addr, sizeof(sin6.sin6_addr)); - ke = (struct pfr_kentry *)rn_match(&sin6, kt->pfrkt_ip6); + ke = (struct pfr_kentry *)rn_match(&sin6, &kt->pfrkt_ip6->rh); if (ke && KENTRY_RNF_ROOT(ke)) ke = NULL; break; } #endif /* INET6 */ } match = (ke && !ke->pfrke_not); if (match) kt->pfrkt_match++; else kt->pfrkt_nomatch++; return (match); } void pfr_update_stats(struct pfr_ktable *kt, struct pf_addr *a, sa_family_t af, u_int64_t len, int dir_out, int op_pass, int notrule) { struct pfr_kentry *ke = NULL; if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE) && kt->pfrkt_root != NULL) kt = kt->pfrkt_root; if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return; switch (af) { #ifdef INET case AF_INET: { struct sockaddr_in sin; bzero(&sin, sizeof(sin)); sin.sin_len = sizeof(sin); sin.sin_family = AF_INET; sin.sin_addr.s_addr = a->addr32[0]; - ke = (struct pfr_kentry *)rn_match(&sin, kt->pfrkt_ip4); + ke = (struct pfr_kentry *)rn_match(&sin, &kt->pfrkt_ip4->rh); if (ke && KENTRY_RNF_ROOT(ke)) ke = NULL; break; } #endif /* INET */ #ifdef INET6 case AF_INET6: { struct sockaddr_in6 sin6; bzero(&sin6, sizeof(sin6)); sin6.sin6_len = sizeof(sin6); sin6.sin6_family = AF_INET6; bcopy(a, &sin6.sin6_addr, sizeof(sin6.sin6_addr)); - ke = (struct pfr_kentry *)rn_match(&sin6, kt->pfrkt_ip6); + ke = (struct pfr_kentry *)rn_match(&sin6, &kt->pfrkt_ip6->rh); if (ke && KENTRY_RNF_ROOT(ke)) ke = NULL; break; } #endif /* INET6 */ default: panic("%s: unknown address family %u", __func__, af); } if ((ke == NULL || ke->pfrke_not) != notrule) { if (op_pass != PFR_OP_PASS) printf("pfr_update_stats: assertion failed.\n"); op_pass = PFR_OP_XPASS; } kt->pfrkt_packets[dir_out][op_pass]++; kt->pfrkt_bytes[dir_out][op_pass] += len; if (ke != NULL && op_pass != PFR_OP_XPASS && (kt->pfrkt_flags & PFR_TFLAG_COUNTERS)) { if (ke->pfrke_counters == NULL) ke->pfrke_counters = uma_zalloc(V_pfr_kcounters_z, M_NOWAIT | M_ZERO); if (ke->pfrke_counters != NULL) { ke->pfrke_counters->pfrkc_packets[dir_out][op_pass]++; ke->pfrke_counters->pfrkc_bytes[dir_out][op_pass] +=
len; } } } struct pfr_ktable * pfr_attach_table(struct pf_ruleset *rs, char *name) { struct pfr_ktable *kt, *rt; struct pfr_table tbl; struct pf_anchor *ac = rs->anchor; PF_RULES_WASSERT(); bzero(&tbl, sizeof(tbl)); strlcpy(tbl.pfrt_name, name, sizeof(tbl.pfrt_name)); if (ac != NULL) strlcpy(tbl.pfrt_anchor, ac->path, sizeof(tbl.pfrt_anchor)); kt = pfr_lookup_table(&tbl); if (kt == NULL) { kt = pfr_create_ktable(&tbl, time_second, 1); if (kt == NULL) return (NULL); if (ac != NULL) { bzero(tbl.pfrt_anchor, sizeof(tbl.pfrt_anchor)); rt = pfr_lookup_table(&tbl); if (rt == NULL) { rt = pfr_create_ktable(&tbl, 0, 1); if (rt == NULL) { pfr_destroy_ktable(kt, 0); return (NULL); } pfr_insert_ktable(rt); } kt->pfrkt_root = rt; } pfr_insert_ktable(kt); } if (!kt->pfrkt_refcnt[PFR_REFCNT_RULE]++) pfr_setflags_ktable(kt, kt->pfrkt_flags|PFR_TFLAG_REFERENCED); return (kt); } void pfr_detach_table(struct pfr_ktable *kt) { PF_RULES_WASSERT(); KASSERT(kt->pfrkt_refcnt[PFR_REFCNT_RULE] > 0, ("%s: refcount %d\n", __func__, kt->pfrkt_refcnt[PFR_REFCNT_RULE])); if (!--kt->pfrkt_refcnt[PFR_REFCNT_RULE]) pfr_setflags_ktable(kt, kt->pfrkt_flags&~PFR_TFLAG_REFERENCED); } int pfr_pool_get(struct pfr_ktable *kt, int *pidx, struct pf_addr *counter, sa_family_t af) { struct pf_addr *addr, *cur, *mask; union sockaddr_union uaddr, umask; struct pfr_kentry *ke, *ke2 = NULL; int idx = -1, use_counter = 0; switch (af) { case AF_INET: uaddr.sin.sin_len = sizeof(struct sockaddr_in); uaddr.sin.sin_family = AF_INET; break; case AF_INET6: uaddr.sin6.sin6_len = sizeof(struct sockaddr_in6); uaddr.sin6.sin6_family = AF_INET6; break; } addr = SUNION2PF(&uaddr, af); if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE) && kt->pfrkt_root != NULL) kt = kt->pfrkt_root; if (!(kt->pfrkt_flags & PFR_TFLAG_ACTIVE)) return (-1); if (pidx != NULL) idx = *pidx; if (counter != NULL && idx >= 0) use_counter = 1; if (idx < 0) idx = 0; _next_block: ke = pfr_kentry_byidx(kt, idx, af); if (ke == NULL) { kt->pfrkt_nomatch++; return (1); } pfr_prepare_network(&umask, af, ke->pfrke_net); cur = SUNION2PF(&ke->pfrke_sa, af); mask = SUNION2PF(&umask, af); if (use_counter) { /* is supplied address within block? 
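(i.e., does the counter saved by the previous call still fall inside this block?)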
*/ if (!PF_MATCHA(0, cur, mask, counter, af)) { /* no, go to next block in table */ idx++; use_counter = 0; goto _next_block; } PF_ACPY(addr, counter, af); } else { /* use first address of block */ PF_ACPY(addr, cur, af); } if (!KENTRY_NETWORK(ke)) { /* this is a single IP address - no possible nested block */ PF_ACPY(counter, addr, af); *pidx = idx; kt->pfrkt_match++; return (0); } for (;;) { /* we don't want to use a nested block */ switch (af) { case AF_INET: ke2 = (struct pfr_kentry *)rn_match(&uaddr, - kt->pfrkt_ip4); + &kt->pfrkt_ip4->rh); break; case AF_INET6: ke2 = (struct pfr_kentry *)rn_match(&uaddr, - kt->pfrkt_ip6); + &kt->pfrkt_ip6->rh); break; } /* no need to check KENTRY_RNF_ROOT() here */ if (ke2 == ke) { /* lookup returned the same block - perfect */ PF_ACPY(counter, addr, af); *pidx = idx; kt->pfrkt_match++; return (0); } /* we need to increase the counter past the nested block */ pfr_prepare_network(&umask, AF_INET, ke2->pfrke_net); PF_POOLMASK(addr, addr, SUNION2PF(&umask, af), &pfr_ffaddr, af); PF_AINC(addr, af); if (!PF_MATCHA(0, cur, mask, addr, af)) { /* ok, we reached the end of our main block */ /* go to next block in table */ idx++; use_counter = 0; goto _next_block; } } } static struct pfr_kentry * pfr_kentry_byidx(struct pfr_ktable *kt, int idx, int af) { struct pfr_walktree w; bzero(&w, sizeof(w)); w.pfrw_op = PFRW_POOL_GET; w.pfrw_cnt = idx; switch (af) { #ifdef INET case AF_INET: - kt->pfrkt_ip4->rnh_walktree(kt->pfrkt_ip4, pfr_walktree, &w); + kt->pfrkt_ip4->rnh_walktree(&kt->pfrkt_ip4->rh, pfr_walktree, &w); return (w.pfrw_kentry); #endif /* INET */ #ifdef INET6 case AF_INET6: - kt->pfrkt_ip6->rnh_walktree(kt->pfrkt_ip6, pfr_walktree, &w); + kt->pfrkt_ip6->rnh_walktree(&kt->pfrkt_ip6->rh, pfr_walktree, &w); return (w.pfrw_kentry); #endif /* INET6 */ default: return (NULL); } } void pfr_dynaddr_update(struct pfr_ktable *kt, struct pfi_dynaddr *dyn) { struct pfr_walktree w; bzero(&w, sizeof(w)); w.pfrw_op = PFRW_DYNADDR_UPDATE; w.pfrw_dyn = dyn; dyn->pfid_acnt4 = 0; dyn->pfid_acnt6 = 0; if (!dyn->pfid_af || dyn->pfid_af == AF_INET) - kt->pfrkt_ip4->rnh_walktree(kt->pfrkt_ip4, pfr_walktree, &w); + kt->pfrkt_ip4->rnh_walktree(&kt->pfrkt_ip4->rh, pfr_walktree, &w); if (!dyn->pfid_af || dyn->pfid_af == AF_INET6) - kt->pfrkt_ip6->rnh_walktree(kt->pfrkt_ip6, pfr_walktree, &w); + kt->pfrkt_ip6->rnh_walktree(&kt->pfrkt_ip6->rh, pfr_walktree, &w); } Index: projects/clang380-import/sys/nfs/bootp_subr.c =================================================================== --- projects/clang380-import/sys/nfs/bootp_subr.c (revision 294776) +++ projects/clang380-import/sys/nfs/bootp_subr.c (revision 294777) @@ -1,1863 +1,1866 @@ /*- * Copyright (c) 1995 Gordon Ross, Adam Glass * Copyright (c) 1992 Regents of the University of California. * All rights reserved. * * This software was developed by the Computer Systems Engineering group * at Lawrence Berkeley Laboratory under DARPA contract BG 91-66 and * contributed to Berkeley. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. 
All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Lawrence Berkeley Laboratory and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * based on: * nfs/krpc_subr.c * $NetBSD: krpc_subr.c,v 1.10 1995/08/08 20:43:43 gwr Exp $ */ #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); #include "opt_bootp.h" #include "opt_nfs.h" #include "opt_rootdevname.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#ifdef BOOTP_DEBUG +#include <net/route_var.h> +#endif #include #include #include #include #include #include #include #include #include #include #define BOOTP_MIN_LEN 300 /* Minimum size of bootp udp packet */ #ifndef BOOTP_SETTLE_DELAY #define BOOTP_SETTLE_DELAY 3 #endif /* * Wait 10 seconds for interface appearance * USB ethernet adapters might require some time to pop up */ #ifndef BOOTP_IFACE_WAIT_TIMEOUT #define BOOTP_IFACE_WAIT_TIMEOUT 10 #endif /* * What is the longest we will wait before re-sending a request? * Note this is also the frequency of "RPC timeout" messages. * The re-send loop counts up linearly to this maximum, so the * first complaint will happen after (1+2+3+4+5)=15 seconds. 
*/ #define MAX_RESEND_DELAY 5 /* seconds */ /* Definitions from RFC951 */ struct bootp_packet { u_int8_t op; u_int8_t htype; u_int8_t hlen; u_int8_t hops; u_int32_t xid; u_int16_t secs; u_int16_t flags; struct in_addr ciaddr; struct in_addr yiaddr; struct in_addr siaddr; struct in_addr giaddr; unsigned char chaddr[16]; char sname[64]; char file[128]; unsigned char vend[1222]; }; struct bootpc_ifcontext { STAILQ_ENTRY(bootpc_ifcontext) next; struct bootp_packet call; struct bootp_packet reply; int replylen; int overload; union { struct ifreq _ifreq; struct in_aliasreq _in_alias_req; } _req; #define ireq _req._ifreq #define iareq _req._in_alias_req struct ifnet *ifp; struct sockaddr_dl *sdl; struct sockaddr_in myaddr; struct sockaddr_in netmask; struct sockaddr_in gw; int gotgw; int gotnetmask; int gotrootpath; int outstanding; int sentmsg; u_int32_t xid; enum { IF_BOOTP_UNRESOLVED, IF_BOOTP_RESOLVED, IF_BOOTP_FAILED, IF_DHCP_UNRESOLVED, IF_DHCP_OFFERED, IF_DHCP_RESOLVED, IF_DHCP_FAILED, } state; int dhcpquerytype; /* dhcp type sent */ struct in_addr dhcpserver; int gotdhcpserver; }; #define TAG_MAXLEN 1024 struct bootpc_tagcontext { char buf[TAG_MAXLEN + 1]; int overload; int badopt; int badtag; int foundopt; int taglen; }; struct bootpc_globalcontext { STAILQ_HEAD(, bootpc_ifcontext) interfaces; u_int32_t xid; int any_root_overrides; int gotrootpath; int gotgw; int ifnum; int secs; int starttime; struct bootp_packet reply; int replylen; struct bootpc_ifcontext *setrootfs; struct bootpc_ifcontext *sethostname; struct bootpc_tagcontext tmptag; struct bootpc_tagcontext tag; }; #define IPPORT_BOOTPC 68 #define IPPORT_BOOTPS 67 #define BOOTP_REQUEST 1 #define BOOTP_REPLY 2 /* Common tags */ #define TAG_PAD 0 /* Pad option, implicit length 1 */ #define TAG_SUBNETMASK 1 /* RFC 950 subnet mask */ #define TAG_ROUTERS 3 /* Routers (in order of preference) */ #define TAG_HOSTNAME 12 /* Client host name */ #define TAG_ROOT 17 /* Root path */ /* DHCP specific tags */ #define TAG_OVERLOAD 52 /* Option Overload */ #define TAG_MAXMSGSIZE 57 /* Maximum DHCP Message Size */ #define TAG_END 255 /* End Option (i.e. 
no more options) */ /* Overload values */ #define OVERLOAD_FILE 1 #define OVERLOAD_SNAME 2 /* Site specific tags: */ #define TAG_ROOTOPTS 130 #define TAG_COOKIE 134 /* ascii info for userland, via sysctl */ #define TAG_DHCP_MSGTYPE 53 #define TAG_DHCP_REQ_ADDR 50 #define TAG_DHCP_SERVERID 54 #define TAG_DHCP_LEASETIME 51 #define TAG_VENDOR_INDENTIFIER 60 #define DHCP_NOMSG 0 #define DHCP_DISCOVER 1 #define DHCP_OFFER 2 #define DHCP_REQUEST 3 #define DHCP_ACK 5 /* NFS read/write block size */ #ifndef BOOTP_BLOCKSIZE #define BOOTP_BLOCKSIZE 8192 #endif static char bootp_cookie[128]; static struct socket *bootp_so; SYSCTL_STRING(_kern, OID_AUTO, bootp_cookie, CTLFLAG_RD, bootp_cookie, 0, "Cookie (T134) supplied by bootp server"); /* mountd RPC */ static int md_mount(struct sockaddr_in *mdsin, char *path, u_char *fhp, int *fhsizep, struct nfs_args *args, struct thread *td); static int setfs(struct sockaddr_in *addr, char *path, char *p, const struct in_addr *siaddr); static int getdec(char **ptr); static int getip(char **ptr, struct in_addr *ip); static void mountopts(struct nfs_args *args, char *p); static int xdr_opaque_decode(struct mbuf **ptr, u_char *buf, int len); static int xdr_int_decode(struct mbuf **ptr, int *iptr); static void print_in_addr(struct in_addr addr); static void print_sin_addr(struct sockaddr_in *addr); static void clear_sinaddr(struct sockaddr_in *sin); static void allocifctx(struct bootpc_globalcontext *gctx); static void bootpc_compose_query(struct bootpc_ifcontext *ifctx, struct thread *td); static unsigned char *bootpc_tag(struct bootpc_tagcontext *tctx, struct bootp_packet *bp, int len, int tag); static void bootpc_tag_helper(struct bootpc_tagcontext *tctx, unsigned char *start, int len, int tag); #ifdef BOOTP_DEBUG void bootpboot_p_sa(struct sockaddr *sa, struct sockaddr *ma); void bootpboot_p_rtentry(struct rtentry *rt); void bootpboot_p_tree(struct radix_node *rn); void bootpboot_p_rtlist(void); void bootpboot_p_if(struct ifnet *ifp, struct ifaddr *ifa); void bootpboot_p_iflist(void); #endif static int bootpc_call(struct bootpc_globalcontext *gctx, struct thread *td); static void bootpc_fakeup_interface(struct bootpc_ifcontext *ifctx, struct thread *td); static int bootpc_adjust_interface(struct bootpc_ifcontext *ifctx, struct bootpc_globalcontext *gctx, struct thread *td); static void bootpc_decode_reply(struct nfsv3_diskless *nd, struct bootpc_ifcontext *ifctx, struct bootpc_globalcontext *gctx); static int bootpc_received(struct bootpc_globalcontext *gctx, struct bootpc_ifcontext *ifctx); static __inline int bootpc_ifctx_isresolved(struct bootpc_ifcontext *ifctx); static __inline int bootpc_ifctx_isunresolved(struct bootpc_ifcontext *ifctx); static __inline int bootpc_ifctx_isfailed(struct bootpc_ifcontext *ifctx); /* * In order to have multiple active interfaces with address 0.0.0.0 * and be able to send data to a selected interface, we first set * mask to /8 on all interfaces, and temporarily set it to /0 when * doing sosend(). 
#ifdef BOOTP_DEBUG void bootpboot_p_sa(struct sockaddr *sa, struct sockaddr *ma) { if (sa == NULL) { printf("(sockaddr *) <null>"); return; } switch (sa->sa_family) { case AF_INET: { struct sockaddr_in *sin; sin = (struct sockaddr_in *) sa; printf("inet "); print_sin_addr(sin); if (ma != NULL) { sin = (struct sockaddr_in *) ma; printf(" mask "); print_sin_addr(sin); } } break; case AF_LINK: { struct sockaddr_dl *sli; int i; sli = (struct sockaddr_dl *) sa; printf("link %.*s ", sli->sdl_nlen, sli->sdl_data); for (i = 0; i < sli->sdl_alen; i++) { if (i > 0) printf(":"); printf("%x", ((unsigned char *) LLADDR(sli))[i]); } } break; default: printf("af%d", sa->sa_family); } } void bootpboot_p_rtentry(struct rtentry *rt) { bootpboot_p_sa(rt_key(rt), rt_mask(rt)); printf(" "); bootpboot_p_sa(rt->rt_gateway, NULL); printf(" "); printf("flags %x", (unsigned short) rt->rt_flags); printf(" %d", (int) rt->rt_expire); printf(" %s\n", rt->rt_ifp->if_xname); } void bootpboot_p_tree(struct radix_node *rn) { while (rn != NULL) { if (rn->rn_bit < 0) { if ((rn->rn_flags & RNF_ROOT) != 0) { } else { bootpboot_p_rtentry((struct rtentry *) rn); } rn = rn->rn_dupedkey; } else { bootpboot_p_tree(rn->rn_left); bootpboot_p_tree(rn->rn_right); return; } } } void bootpboot_p_rtlist(void) { - struct radix_node_head *rnh; + struct rib_head *rnh; printf("Routing table:\n"); rnh = rt_tables_get_rnh(0, AF_INET); if (rnh == NULL) return; - RADIX_NODE_HEAD_RLOCK(rnh); /* could sleep XXX */ + RIB_RLOCK(rnh); /* could sleep XXX */ bootpboot_p_tree(rnh->rnh_treetop); - RADIX_NODE_HEAD_RUNLOCK(rnh); + RIB_RUNLOCK(rnh); } void bootpboot_p_if(struct ifnet *ifp, struct ifaddr *ifa) { printf("%s flags %x, addr ", ifp->if_xname, ifp->if_flags); print_sin_addr((struct sockaddr_in *) ifa->ifa_addr); printf(", broadcast "); print_sin_addr((struct sockaddr_in *) ifa->ifa_dstaddr); printf(", netmask "); print_sin_addr((struct sockaddr_in *) ifa->ifa_netmask); printf("\n"); } void bootpboot_p_iflist(void) { struct ifnet *ifp; struct ifaddr *ifa; printf("Interface list:\n"); IFNET_RLOCK(); for (ifp = TAILQ_FIRST(&V_ifnet); ifp != NULL; ifp = TAILQ_NEXT(ifp, if_link)) { for (ifa = TAILQ_FIRST(&ifp->if_addrhead); ifa != NULL; ifa = TAILQ_NEXT(ifa, ifa_link)) if (ifa->ifa_addr->sa_family == AF_INET) bootpboot_p_if(ifp, ifa); } IFNET_RUNLOCK(); } #endif /* defined(BOOTP_DEBUG) */ static void clear_sinaddr(struct sockaddr_in *sin) { bzero(sin, sizeof(*sin)); sin->sin_len = sizeof(*sin); sin->sin_family = AF_INET; sin->sin_addr.s_addr = INADDR_ANY; /* XXX: htonl(INADDR_ANY) ? 
*/ sin->sin_port = 0; } static void allocifctx(struct bootpc_globalcontext *gctx) { struct bootpc_ifcontext *ifctx; ifctx = malloc(sizeof(*ifctx), M_TEMP, M_WAITOK | M_ZERO); ifctx->xid = gctx->xid; #ifdef BOOTP_NO_DHCP ifctx->state = IF_BOOTP_UNRESOLVED; #else ifctx->state = IF_DHCP_UNRESOLVED; #endif gctx->xid += 0x100; STAILQ_INSERT_TAIL(&gctx->interfaces, ifctx, next); } static __inline int bootpc_ifctx_isresolved(struct bootpc_ifcontext *ifctx) { if (ifctx->state == IF_BOOTP_RESOLVED || ifctx->state == IF_DHCP_RESOLVED) return 1; return 0; } static __inline int bootpc_ifctx_isunresolved(struct bootpc_ifcontext *ifctx) { if (ifctx->state == IF_BOOTP_UNRESOLVED || ifctx->state == IF_DHCP_UNRESOLVED) return 1; return 0; } static __inline int bootpc_ifctx_isfailed(struct bootpc_ifcontext *ifctx) { if (ifctx->state == IF_BOOTP_FAILED || ifctx->state == IF_DHCP_FAILED) return 1; return 0; } static int bootpc_received(struct bootpc_globalcontext *gctx, struct bootpc_ifcontext *ifctx) { unsigned char dhcpreplytype; char *p; /* * Need timeout for fallback to less * desirable alternative. */ /* This call used for the side effect (badopt flag) */ (void) bootpc_tag(&gctx->tmptag, &gctx->reply, gctx->replylen, TAG_END); /* If packet is invalid, ignore it */ if (gctx->tmptag.badopt != 0) return 0; p = bootpc_tag(&gctx->tmptag, &gctx->reply, gctx->replylen, TAG_DHCP_MSGTYPE); if (p != NULL) dhcpreplytype = *p; else dhcpreplytype = DHCP_NOMSG; switch (ifctx->dhcpquerytype) { case DHCP_DISCOVER: if (dhcpreplytype != DHCP_OFFER /* Normal DHCP offer */ #ifndef BOOTP_FORCE_DHCP && dhcpreplytype != DHCP_NOMSG /* Fallback to BOOTP */ #endif ) return 0; break; case DHCP_REQUEST: if (dhcpreplytype != DHCP_ACK) return 0; case DHCP_NOMSG: break; } /* Ignore packet unless it gives us a root tag we didn't have */ if ((ifctx->state == IF_BOOTP_RESOLVED || (ifctx->dhcpquerytype == DHCP_DISCOVER && (ifctx->state == IF_DHCP_OFFERED || ifctx->state == IF_DHCP_RESOLVED))) && (bootpc_tag(&gctx->tmptag, &ifctx->reply, ifctx->replylen, TAG_ROOT) != NULL || bootpc_tag(&gctx->tmptag, &gctx->reply, gctx->replylen, TAG_ROOT) == NULL)) return 0; bcopy(&gctx->reply, &ifctx->reply, gctx->replylen); ifctx->replylen = gctx->replylen; /* XXX: Only reset if 'perfect' response */ if (ifctx->state == IF_BOOTP_UNRESOLVED) ifctx->state = IF_BOOTP_RESOLVED; else if (ifctx->state == IF_DHCP_UNRESOLVED && ifctx->dhcpquerytype == DHCP_DISCOVER) { if (dhcpreplytype == DHCP_OFFER) ifctx->state = IF_DHCP_OFFERED; else ifctx->state = IF_BOOTP_RESOLVED; /* Fallback */ } else if (ifctx->state == IF_DHCP_OFFERED && ifctx->dhcpquerytype == DHCP_REQUEST) ifctx->state = IF_DHCP_RESOLVED; if (ifctx->dhcpquerytype == DHCP_DISCOVER && ifctx->state != IF_BOOTP_RESOLVED) { p = bootpc_tag(&gctx->tmptag, &ifctx->reply, ifctx->replylen, TAG_DHCP_SERVERID); if (p != NULL && gctx->tmptag.taglen == 4) { memcpy(&ifctx->dhcpserver, p, 4); ifctx->gotdhcpserver = 1; } else ifctx->gotdhcpserver = 0; return 1; } ifctx->gotrootpath = (bootpc_tag(&gctx->tmptag, &ifctx->reply, ifctx->replylen, TAG_ROOT) != NULL); ifctx->gotgw = (bootpc_tag(&gctx->tmptag, &ifctx->reply, ifctx->replylen, TAG_ROUTERS) != NULL); ifctx->gotnetmask = (bootpc_tag(&gctx->tmptag, &ifctx->reply, ifctx->replylen, TAG_SUBNETMASK) != NULL); return 1; } static int bootpc_call(struct bootpc_globalcontext *gctx, struct thread *td) { struct sockaddr_in *sin, dst; struct uio auio; struct sockopt sopt; struct iovec aio; int error, on, rcvflg, timo, len; time_t atimo; time_t rtimo; struct timeval tv; 
struct bootpc_ifcontext *ifctx; int outstanding; int gotrootpath; int retry; const char *s; tv.tv_sec = 1; tv.tv_usec = 0; bzero(&sopt, sizeof(sopt)); sopt.sopt_dir = SOPT_SET; sopt.sopt_level = SOL_SOCKET; sopt.sopt_name = SO_RCVTIMEO; sopt.sopt_val = &tv; sopt.sopt_valsize = sizeof tv; error = sosetopt(bootp_so, &sopt); if (error != 0) goto out; /* * Enable broadcast. */ on = 1; sopt.sopt_name = SO_BROADCAST; sopt.sopt_val = &on; sopt.sopt_valsize = sizeof on; error = sosetopt(bootp_so, &sopt); if (error != 0) goto out; /* * Disable routing. */ on = 1; sopt.sopt_name = SO_DONTROUTE; sopt.sopt_val = &on; sopt.sopt_valsize = sizeof on; error = sosetopt(bootp_so, &sopt); if (error != 0) goto out; /* * Bind the local endpoint to a bootp client port. */ sin = &dst; clear_sinaddr(sin); sin->sin_port = htons(IPPORT_BOOTPC); error = sobind(bootp_so, (struct sockaddr *)sin, td); if (error != 0) { printf("bind failed\n"); goto out; } /* * Setup socket address for the server. */ sin = &dst; clear_sinaddr(sin); sin->sin_addr.s_addr = INADDR_BROADCAST; sin->sin_port = htons(IPPORT_BOOTPS); /* * Send it, repeatedly, until a reply is received, * but delay each re-send by an increasing amount. * If the delay hits the maximum, start complaining. */ timo = 0; rtimo = 0; for (;;) { outstanding = 0; gotrootpath = 0; STAILQ_FOREACH(ifctx, &gctx->interfaces, next) { if (bootpc_ifctx_isresolved(ifctx) != 0 && bootpc_tag(&gctx->tmptag, &ifctx->reply, ifctx->replylen, TAG_ROOT) != NULL) gotrootpath = 1; } STAILQ_FOREACH(ifctx, &gctx->interfaces, next) { struct in_aliasreq *ifra = &ifctx->iareq; sin = (struct sockaddr_in *)&ifra->ifra_mask; ifctx->outstanding = 0; if (bootpc_ifctx_isresolved(ifctx) != 0 && gotrootpath != 0) { continue; } if (bootpc_ifctx_isfailed(ifctx) != 0) continue; outstanding++; ifctx->outstanding = 1; /* Proceed to next step in DHCP negotiation */ if ((ifctx->state == IF_DHCP_OFFERED && ifctx->dhcpquerytype != DHCP_REQUEST) || (ifctx->state == IF_DHCP_UNRESOLVED && ifctx->dhcpquerytype != DHCP_DISCOVER) || (ifctx->state == IF_BOOTP_UNRESOLVED && ifctx->dhcpquerytype != DHCP_NOMSG)) { ifctx->sentmsg = 0; bootpc_compose_query(ifctx, td); } /* Send BOOTP request (or re-send). */ if (ifctx->sentmsg == 0) { switch(ifctx->dhcpquerytype) { case DHCP_DISCOVER: s = "DHCP Discover"; break; case DHCP_REQUEST: s = "DHCP Request"; break; case DHCP_NOMSG: default: s = "BOOTP Query"; break; } printf("Sending %s packet from " "interface %s (%*D)\n", s, ifctx->ireq.ifr_name, ifctx->sdl->sdl_alen, (unsigned char *) LLADDR(ifctx->sdl), ":"); ifctx->sentmsg = 1; } aio.iov_base = (caddr_t) &ifctx->call; aio.iov_len = sizeof(ifctx->call); auio.uio_iov = &aio; auio.uio_iovcnt = 1; auio.uio_segflg = UIO_SYSSPACE; auio.uio_rw = UIO_WRITE; auio.uio_offset = 0; auio.uio_resid = sizeof(ifctx->call); auio.uio_td = td; /* Set netmask to 0.0.0.0 */ clear_sinaddr(sin); error = ifioctl(bootp_so, SIOCAIFADDR, (caddr_t)ifra, td); if (error != 0) panic("%s: SIOCAIFADDR, error=%d", __func__, error); error = sosend(bootp_so, (struct sockaddr *) &dst, &auio, NULL, NULL, 0, td); if (error != 0) printf("%s: sosend: %d state %08x\n", __func__, error, (int )bootp_so->so_state); /* Set netmask to 255.0.0.0 */ sin->sin_addr.s_addr = htonl(IN_CLASSA_NET); error = ifioctl(bootp_so, SIOCAIFADDR, (caddr_t)ifra, td); if (error != 0) panic("%s: SIOCAIFADDR, error=%d", __func__, error); } if (outstanding == 0 && (rtimo == 0 || time_second >= rtimo)) { error = 0; goto out; } /* Determine new timeout. 
*/ if (timo < MAX_RESEND_DELAY) timo++; else { printf("DHCP/BOOTP timeout for server "); print_sin_addr(&dst); printf("\n"); } /* * Wait for up to timo seconds for a reply. * The socket receive timeout was set to 1 second. */ atimo = timo + time_second; while (time_second < atimo) { aio.iov_base = (caddr_t) &gctx->reply; aio.iov_len = sizeof(gctx->reply); auio.uio_iov = &aio; auio.uio_iovcnt = 1; auio.uio_segflg = UIO_SYSSPACE; auio.uio_rw = UIO_READ; auio.uio_offset = 0; auio.uio_resid = sizeof(gctx->reply); auio.uio_td = td; rcvflg = 0; error = soreceive(bootp_so, NULL, &auio, NULL, NULL, &rcvflg); gctx->secs = time_second - gctx->starttime; STAILQ_FOREACH(ifctx, &gctx->interfaces, next) { if (bootpc_ifctx_isresolved(ifctx) != 0 || bootpc_ifctx_isfailed(ifctx) != 0) continue; ifctx->call.secs = htons(gctx->secs); } if (error == EWOULDBLOCK) continue; if (error != 0) goto out; len = sizeof(gctx->reply) - auio.uio_resid; /* Do we have the required number of bytes ? */ if (len < BOOTP_MIN_LEN) continue; gctx->replylen = len; /* Is it a reply? */ if (gctx->reply.op != BOOTP_REPLY) continue; /* Is this an answer to our query */ STAILQ_FOREACH(ifctx, &gctx->interfaces, next) { if (gctx->reply.xid != ifctx->call.xid) continue; /* Same HW address size ? */ if (gctx->reply.hlen != ifctx->call.hlen) continue; /* Correct HW address ? */ if (bcmp(gctx->reply.chaddr, ifctx->call.chaddr, ifctx->call.hlen) != 0) continue; break; } if (ifctx != NULL) { s = bootpc_tag(&gctx->tmptag, &gctx->reply, gctx->replylen, TAG_DHCP_MSGTYPE); if (s != NULL) { switch (*s) { case DHCP_OFFER: s = "DHCP Offer"; break; case DHCP_ACK: s = "DHCP Ack"; break; default: s = "DHCP (unexpected)"; break; } } else s = "BOOTP Reply"; printf("Received %s packet" " on %s from ", s, ifctx->ireq.ifr_name); print_in_addr(gctx->reply.siaddr); if (gctx->reply.giaddr.s_addr != htonl(INADDR_ANY)) { printf(" via "); print_in_addr(gctx->reply.giaddr); } if (bootpc_received(gctx, ifctx) != 0) { printf(" (accepted)"); if (ifctx->outstanding) { ifctx->outstanding = 0; outstanding--; } /* Network settle delay */ if (outstanding == 0) atimo = time_second + BOOTP_SETTLE_DELAY; } else printf(" (ignored)"); if (ifctx->gotrootpath || gctx->any_root_overrides) { gotrootpath = 1; rtimo = time_second + BOOTP_SETTLE_DELAY; if (ifctx->gotrootpath) printf(" (got root path)"); } printf("\n"); } } /* while secs */ #ifdef BOOTP_TIMEOUT if (gctx->secs > BOOTP_TIMEOUT && BOOTP_TIMEOUT > 0) break; #endif /* Force a retry if halfway in DHCP negotiation */ retry = 0; STAILQ_FOREACH(ifctx, &gctx->interfaces, next) if (ifctx->state == IF_DHCP_OFFERED) { if (ifctx->dhcpquerytype == DHCP_DISCOVER) retry = 1; else ifctx->state = IF_DHCP_UNRESOLVED; } if (retry != 0) continue; if (gotrootpath != 0) { gctx->gotrootpath = gotrootpath; if (rtimo != 0 && time_second >= rtimo) break; } } /* forever send/receive */ /* * XXX: These are errors of varying seriousness being silently * ignored */ STAILQ_FOREACH(ifctx, &gctx->interfaces, next) if (bootpc_ifctx_isresolved(ifctx) == 0) { printf("%s timeout for interface %s\n", ifctx->dhcpquerytype != DHCP_NOMSG ? 
"DHCP" : "BOOTP", ifctx->ireq.ifr_name); } if (gctx->gotrootpath != 0) { #if 0 printf("Got a root path, ignoring remaining timeout\n"); #endif error = 0; goto out; } #ifndef BOOTP_NFSROOT STAILQ_FOREACH(ifctx, &gctx->interfaces, next) if (bootpc_ifctx_isresolved(ifctx) != 0) { error = 0; goto out; } #endif error = ETIMEDOUT; out: return (error); } static void bootpc_fakeup_interface(struct bootpc_ifcontext *ifctx, struct thread *td) { struct ifreq *ifr; struct in_aliasreq *ifra; struct sockaddr_in *sin; int error; ifr = &ifctx->ireq; ifra = &ifctx->iareq; /* * Bring up the interface. * * Get the old interface flags and or IFF_UP into them; if * IFF_UP set blindly, interface selection can be clobbered. */ error = ifioctl(bootp_so, SIOCGIFFLAGS, (caddr_t)ifr, td); if (error != 0) panic("%s: SIOCGIFFLAGS, error=%d", __func__, error); ifr->ifr_flags |= IFF_UP; error = ifioctl(bootp_so, SIOCSIFFLAGS, (caddr_t)ifr, td); if (error != 0) panic("%s: SIOCSIFFLAGS, error=%d", __func__, error); /* * Do enough of ifconfig(8) so that the chosen interface * can talk to the servers. Set address to 0.0.0.0/8 and * broadcast address to local broadcast. */ sin = (struct sockaddr_in *)&ifra->ifra_addr; clear_sinaddr(sin); sin = (struct sockaddr_in *)&ifra->ifra_mask; clear_sinaddr(sin); sin->sin_addr.s_addr = htonl(IN_CLASSA_NET); sin = (struct sockaddr_in *)&ifra->ifra_broadaddr; clear_sinaddr(sin); sin->sin_addr.s_addr = htonl(INADDR_BROADCAST); error = ifioctl(bootp_so, SIOCAIFADDR, (caddr_t)ifra, td); if (error != 0) panic("%s: SIOCAIFADDR, error=%d", __func__, error); } static void bootpc_shutdown_interface(struct bootpc_ifcontext *ifctx, struct thread *td) { struct ifreq *ifr; struct sockaddr_in *sin; int error; ifr = &ifctx->ireq; printf("Shutdown interface %s\n", ifctx->ireq.ifr_name); error = ifioctl(bootp_so, SIOCGIFFLAGS, (caddr_t)ifr, td); if (error != 0) panic("%s: SIOCGIFFLAGS, error=%d", __func__, error); ifr->ifr_flags &= ~IFF_UP; error = ifioctl(bootp_so, SIOCSIFFLAGS, (caddr_t)ifr, td); if (error != 0) panic("%s: SIOCSIFFLAGS, error=%d", __func__, error); sin = (struct sockaddr_in *) &ifr->ifr_addr; clear_sinaddr(sin); error = ifioctl(bootp_so, SIOCDIFADDR, (caddr_t) ifr, td); if (error != 0) panic("%s: SIOCDIFADDR, error=%d", __func__, error); } static int bootpc_adjust_interface(struct bootpc_ifcontext *ifctx, struct bootpc_globalcontext *gctx, struct thread *td) { int error; struct sockaddr_in defdst; struct sockaddr_in defmask; struct sockaddr_in *sin; struct ifreq *ifr; struct in_aliasreq *ifra; struct sockaddr_in *myaddr; struct sockaddr_in *netmask; struct sockaddr_in *gw; ifr = &ifctx->ireq; ifra = &ifctx->iareq; myaddr = &ifctx->myaddr; netmask = &ifctx->netmask; gw = &ifctx->gw; if (bootpc_ifctx_isresolved(ifctx) == 0) { /* Shutdown interfaces where BOOTP failed */ bootpc_shutdown_interface(ifctx, td); return (0); } printf("Adjusted interface %s\n", ifctx->ireq.ifr_name); /* * Do enough of ifconfig(8) so that the chosen interface * can talk to the servers. 
(just set the address) */ sin = (struct sockaddr_in *) &ifr->ifr_addr; clear_sinaddr(sin); error = ifioctl(bootp_so, SIOCDIFADDR, (caddr_t) ifr, td); if (error != 0) panic("%s: SIOCDIFADDR, error=%d", __func__, error); bcopy(myaddr, &ifra->ifra_addr, sizeof(*myaddr)); bcopy(netmask, &ifra->ifra_mask, sizeof(*netmask)); clear_sinaddr(&ifra->ifra_broadaddr); ifra->ifra_broadaddr.sin_addr.s_addr = myaddr->sin_addr.s_addr | ~netmask->sin_addr.s_addr; error = ifioctl(bootp_so, SIOCAIFADDR, (caddr_t)ifra, td); if (error != 0) panic("%s: SIOCAIFADDR, error=%d", __func__, error); /* Add new default route */ if (ifctx->gotgw != 0 || gctx->gotgw == 0) { clear_sinaddr(&defdst); clear_sinaddr(&defmask); /* XXX MRT just table 0 */ error = rtrequest_fib(RTM_ADD, (struct sockaddr *) &defdst, (struct sockaddr *) gw, (struct sockaddr *) &defmask, (RTF_UP | RTF_GATEWAY | RTF_STATIC), NULL, RT_DEFAULT_FIB); if (error != 0) { printf("%s: RTM_ADD, error=%d\n", __func__, error); return (error); } } return (0); } static int setfs(struct sockaddr_in *addr, char *path, char *p, const struct in_addr *siaddr) { if (getip(&p, &addr->sin_addr) == 0) { if (siaddr != NULL && *p == '/') bcopy(siaddr, &addr->sin_addr, sizeof(struct in_addr)); else return 0; } else { if (*p != ':') return 0; p++; } addr->sin_len = sizeof(struct sockaddr_in); addr->sin_family = AF_INET; strlcpy(path, p, MNAMELEN); return 1; } static int getip(char **ptr, struct in_addr *addr) { char *p; unsigned int ip; int val; p = *ptr; ip = 0; if (((val = getdec(&p)) < 0) || (val > 255)) return 0; ip = val << 24; if (*p != '.') return 0; p++; if (((val = getdec(&p)) < 0) || (val > 255)) return 0; ip |= (val << 16); if (*p != '.') return 0; p++; if (((val = getdec(&p)) < 0) || (val > 255)) return 0; ip |= (val << 8); if (*p != '.') return 0; p++; if (((val = getdec(&p)) < 0) || (val > 255)) return 0; ip |= val; addr->s_addr = htonl(ip); *ptr = p; return 1; } static int getdec(char **ptr) { char *p; int ret; p = *ptr; ret = 0; if ((*p < '0') || (*p > '9')) return -1; while ((*p >= '0') && (*p <= '9')) { ret = ret * 10 + (*p - '0'); p++; } *ptr = p; return ret; } static void mountopts(struct nfs_args *args, char *p) { args->version = NFS_ARGSVERSION; args->rsize = BOOTP_BLOCKSIZE; args->wsize = BOOTP_BLOCKSIZE; args->flags = NFSMNT_RSIZE | NFSMNT_WSIZE | NFSMNT_RESVPORT; args->sotype = SOCK_DGRAM; if (p != NULL) nfs_parse_options(p, args); } static int xdr_opaque_decode(struct mbuf **mptr, u_char *buf, int len) { struct mbuf *m; int alignedlen; m = *mptr; alignedlen = ( len + 3 ) & ~3; if (m->m_len < alignedlen) { m = m_pullup(m, alignedlen); if (m == NULL) { *mptr = NULL; return EBADRPC; } } bcopy(mtod(m, u_char *), buf, len); m_adj(m, alignedlen); *mptr = m; return 0; } static int xdr_int_decode(struct mbuf **mptr, int *iptr) { u_int32_t i; if (xdr_opaque_decode(mptr, (u_char *) &i, sizeof(u_int32_t)) != 0) return EBADRPC; *iptr = fxdr_unsigned(u_int32_t, i); return 0; } static void print_sin_addr(struct sockaddr_in *sin) { print_in_addr(sin->sin_addr); } static void print_in_addr(struct in_addr addr) { unsigned int ip; ip = ntohl(addr.s_addr); printf("%d.%d.%d.%d", ip >> 24, (ip >> 16) & 255, (ip >> 8) & 255, ip & 255); } static void bootpc_compose_query(struct bootpc_ifcontext *ifctx, struct thread *td) { unsigned char *vendp; unsigned char vendor_client[64]; uint32_t leasetime; uint8_t vendor_client_len; ifctx->gotrootpath = 0; bzero((caddr_t) &ifctx->call, sizeof(ifctx->call)); /* bootpc part */ ifctx->call.op = BOOTP_REQUEST; /* BOOTREQUEST */ 
ifctx->call.htype = 1; /* 10mb ethernet */ ifctx->call.hlen = ifctx->sdl->sdl_alen;/* Hardware address length */ ifctx->call.hops = 0; if (bootpc_ifctx_isunresolved(ifctx) != 0) ifctx->xid++; ifctx->call.xid = txdr_unsigned(ifctx->xid); bcopy(LLADDR(ifctx->sdl), &ifctx->call.chaddr, ifctx->sdl->sdl_alen); vendp = ifctx->call.vend; *vendp++ = 99; /* RFC1048 cookie */ *vendp++ = 130; *vendp++ = 83; *vendp++ = 99; *vendp++ = TAG_MAXMSGSIZE; *vendp++ = 2; *vendp++ = (sizeof(struct bootp_packet) >> 8) & 255; *vendp++ = sizeof(struct bootp_packet) & 255; snprintf(vendor_client, sizeof(vendor_client), "%s:%s:%s", ostype, MACHINE, osrelease); vendor_client_len = strlen(vendor_client); *vendp++ = TAG_VENDOR_INDENTIFIER; *vendp++ = vendor_client_len; memcpy(vendp, vendor_client, vendor_client_len); vendp += vendor_client_len; ifctx->dhcpquerytype = DHCP_NOMSG; switch (ifctx->state) { case IF_DHCP_UNRESOLVED: *vendp++ = TAG_DHCP_MSGTYPE; *vendp++ = 1; *vendp++ = DHCP_DISCOVER; ifctx->dhcpquerytype = DHCP_DISCOVER; ifctx->gotdhcpserver = 0; break; case IF_DHCP_OFFERED: *vendp++ = TAG_DHCP_MSGTYPE; *vendp++ = 1; *vendp++ = DHCP_REQUEST; ifctx->dhcpquerytype = DHCP_REQUEST; *vendp++ = TAG_DHCP_REQ_ADDR; *vendp++ = 4; memcpy(vendp, &ifctx->reply.yiaddr, 4); vendp += 4; if (ifctx->gotdhcpserver != 0) { *vendp++ = TAG_DHCP_SERVERID; *vendp++ = 4; memcpy(vendp, &ifctx->dhcpserver, 4); vendp += 4; } *vendp++ = TAG_DHCP_LEASETIME; *vendp++ = 4; leasetime = htonl(300); memcpy(vendp, &leasetime, 4); vendp += 4; break; default: break; } *vendp = TAG_END; ifctx->call.secs = 0; ifctx->call.flags = htons(0x8000); /* We need a broadcast answer */ } static int bootpc_hascookie(struct bootp_packet *bp) { return (bp->vend[0] == 99 && bp->vend[1] == 130 && bp->vend[2] == 83 && bp->vend[3] == 99); } static void bootpc_tag_helper(struct bootpc_tagcontext *tctx, unsigned char *start, int len, int tag) { unsigned char *j; unsigned char *ej; unsigned char code; if (tctx->badtag != 0 || tctx->badopt != 0) return; j = start; ej = j + len; while (j < ej) { code = *j++; if (code == TAG_PAD) continue; if (code == TAG_END) return; if (j >= ej || j + *j + 1 > ej) { tctx->badopt = 1; return; } len = *j++; if (code == tag) { if (tctx->taglen + len > TAG_MAXLEN) { tctx->badtag = 1; return; } tctx->foundopt = 1; if (len > 0) memcpy(tctx->buf + tctx->taglen, j, len); tctx->taglen += len; } if (code == TAG_OVERLOAD) tctx->overload = *j; j += len; } } static unsigned char * bootpc_tag(struct bootpc_tagcontext *tctx, struct bootp_packet *bp, int len, int tag) { tctx->overload = 0; tctx->badopt = 0; tctx->badtag = 0; tctx->foundopt = 0; tctx->taglen = 0; if (bootpc_hascookie(bp) == 0) return NULL; bootpc_tag_helper(tctx, &bp->vend[4], (unsigned char *) bp + len - &bp->vend[4], tag); if ((tctx->overload & OVERLOAD_FILE) != 0) bootpc_tag_helper(tctx, (unsigned char *) bp->file, sizeof(bp->file), tag); if ((tctx->overload & OVERLOAD_SNAME) != 0) bootpc_tag_helper(tctx, (unsigned char *) bp->sname, sizeof(bp->sname), tag); if (tctx->badopt != 0 || tctx->badtag != 0 || tctx->foundopt == 0) return NULL; tctx->buf[tctx->taglen] = '\0'; return tctx->buf; } static void bootpc_decode_reply(struct nfsv3_diskless *nd, struct bootpc_ifcontext *ifctx, struct bootpc_globalcontext *gctx) { char *p, *s; unsigned int ip; ifctx->gotgw = 0; ifctx->gotnetmask = 0; clear_sinaddr(&ifctx->myaddr); clear_sinaddr(&ifctx->netmask); clear_sinaddr(&ifctx->gw); ifctx->myaddr.sin_addr = ifctx->reply.yiaddr; ip = ntohl(ifctx->myaddr.sin_addr.s_addr); printf("%s at ", 
ifctx->ireq.ifr_name); print_sin_addr(&ifctx->myaddr); printf(" server "); print_in_addr(ifctx->reply.siaddr); ifctx->gw.sin_addr = ifctx->reply.giaddr; if (ifctx->reply.giaddr.s_addr != htonl(INADDR_ANY)) { printf(" via gateway "); print_in_addr(ifctx->reply.giaddr); } /* This call used for the side effect (overload flag) */ (void) bootpc_tag(&gctx->tmptag, &ifctx->reply, ifctx->replylen, TAG_END); if ((gctx->tmptag.overload & OVERLOAD_SNAME) == 0) if (ifctx->reply.sname[0] != '\0') printf(" server name %s", ifctx->reply.sname); if ((gctx->tmptag.overload & OVERLOAD_FILE) == 0) if (ifctx->reply.file[0] != '\0') printf(" boot file %s", ifctx->reply.file); printf("\n"); p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen, TAG_SUBNETMASK); if (p != NULL) { if (gctx->tag.taglen != 4) panic("bootpc: subnet mask len is %d", gctx->tag.taglen); bcopy(p, &ifctx->netmask.sin_addr, 4); ifctx->gotnetmask = 1; printf("subnet mask "); print_sin_addr(&ifctx->netmask); printf(" "); } p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen, TAG_ROUTERS); if (p != NULL) { /* Routers */ if (gctx->tag.taglen % 4) panic("bootpc: Router Len is %d", gctx->tag.taglen); if (gctx->tag.taglen > 0) { bcopy(p, &ifctx->gw.sin_addr, 4); printf("router "); print_sin_addr(&ifctx->gw); printf(" "); ifctx->gotgw = 1; gctx->gotgw = 1; } } /* * Choose a root filesystem. If a value is forced in the environment * and it contains "nfs:", use it unconditionally. Otherwise, if the * kernel is compiled with the ROOTDEVNAME option, then use it if: * - The server doesn't provide a pathname. * - The boothowto flags include RB_DFLTROOT (user said to override * the server value). */ p = NULL; if ((s = kern_getenv("vfs.root.mountfrom")) != NULL) { if ((p = strstr(s, "nfs:")) != NULL) p = strdup(p + 4, M_TEMP); freeenv(s); } if (p == NULL) { p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen, TAG_ROOT); } #ifdef ROOTDEVNAME if ((p == NULL || (boothowto & RB_DFLTROOT) != 0) && (p = strstr(ROOTDEVNAME, "nfs:")) != NULL) { p += 4; } #endif if (p != NULL) { if (gctx->setrootfs != NULL) { printf("rootfs %s (ignored) ", p); } else if (setfs(&nd->root_saddr, nd->root_hostnam, p, &ifctx->reply.siaddr)) { if (*p == '/') { printf("root_server "); print_sin_addr(&nd->root_saddr); printf(" "); } printf("rootfs %s ", p); gctx->gotrootpath = 1; ifctx->gotrootpath = 1; gctx->setrootfs = ifctx; p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen, TAG_ROOTOPTS); if (p != NULL) { mountopts(&nd->root_args, p); printf("rootopts %s ", p); } } else panic("Failed to set rootfs to %s", p); } p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen, TAG_HOSTNAME); if (p != NULL) { if (gctx->tag.taglen >= MAXHOSTNAMELEN) panic("bootpc: hostname >= %d bytes", MAXHOSTNAMELEN); if (gctx->sethostname != NULL) { printf("hostname %s (ignored) ", p); } else { strcpy(nd->my_hostnam, p); mtx_lock(&prison0.pr_mtx); strcpy(prison0.pr_hostname, p); mtx_unlock(&prison0.pr_mtx); printf("hostname %s ", p); gctx->sethostname = ifctx; } } p = bootpc_tag(&gctx->tag, &ifctx->reply, ifctx->replylen, TAG_COOKIE); if (p != NULL) { /* store in a sysctl variable */ int i, l = sizeof(bootp_cookie) - 1; for (i = 0; i < l && p[i] != '\0'; i++) bootp_cookie[i] = p[i]; bootp_cookie[i] = '\0'; } printf("\n"); if (ifctx->gotnetmask == 0) { if (IN_CLASSA(ntohl(ifctx->myaddr.sin_addr.s_addr))) ifctx->netmask.sin_addr.s_addr = htonl(IN_CLASSA_NET); else if (IN_CLASSB(ntohl(ifctx->myaddr.sin_addr.s_addr))) ifctx->netmask.sin_addr.s_addr = htonl(IN_CLASSB_NET); else 
ifctx->netmask.sin_addr.s_addr = htonl(IN_CLASSC_NET); } if (ifctx->gotgw == 0) { /* Use proxyarp */ ifctx->gw.sin_addr.s_addr = ifctx->myaddr.sin_addr.s_addr; } } void bootpc_init(void) { struct bootpc_ifcontext *ifctx; /* Interface BOOTP contexts */ struct bootpc_globalcontext *gctx; /* Global BOOTP context */ struct ifnet *ifp; struct sockaddr_dl *sdl; struct ifaddr *ifa; int error; #ifndef BOOTP_WIRED_TO int ifcnt; #endif struct nfsv3_diskless *nd; struct thread *td; int timeout; int delay; timeout = BOOTP_IFACE_WAIT_TIMEOUT * hz; delay = hz / 10; nd = &nfsv3_diskless; td = curthread; /* * If already filled in, don't touch it here */ if (nfs_diskless_valid != 0) return; gctx = malloc(sizeof(*gctx), M_TEMP, M_WAITOK | M_ZERO); STAILQ_INIT(&gctx->interfaces); gctx->xid = ~0xFFFF; gctx->starttime = time_second; /* * If ROOTDEVNAME is defined or vfs.root.mountfrom is set then we have * root-path overrides that can potentially let us boot even if we don't * get a root path from the server, so we can treat that as a non-error. */ #ifdef ROOTDEVNAME gctx->any_root_overrides = 1; #else gctx->any_root_overrides = testenv("vfs.root.mountfrom"); #endif /* * Find a network interface. */ CURVNET_SET(TD_TO_VNET(td)); #ifdef BOOTP_WIRED_TO printf("%s: wired to interface '%s'\n", __func__, __XSTRING(BOOTP_WIRED_TO)); allocifctx(gctx); #else /* * Preallocate interface context storage, if another interface * attaches and wins the race, it won't be eligible for bootp. */ ifcnt = 0; IFNET_RLOCK(); TAILQ_FOREACH(ifp, &V_ifnet, if_link) { if ((ifp->if_flags & (IFF_LOOPBACK | IFF_POINTOPOINT | IFF_BROADCAST)) != IFF_BROADCAST) continue; switch (ifp->if_alloctype) { case IFT_ETHER: case IFT_FDDI: case IFT_ISO88025: break; default: continue; } ifcnt++; } IFNET_RUNLOCK(); if (ifcnt == 0) panic("%s: no eligible interfaces", __func__); for (; ifcnt > 0; ifcnt--) allocifctx(gctx); #endif retry: ifctx = STAILQ_FIRST(&gctx->interfaces); IFNET_RLOCK(); TAILQ_FOREACH(ifp, &V_ifnet, if_link) { if (ifctx == NULL) break; #ifdef BOOTP_WIRED_TO if (strcmp(ifp->if_xname, __XSTRING(BOOTP_WIRED_TO)) != 0) continue; #else if ((ifp->if_flags & (IFF_LOOPBACK | IFF_POINTOPOINT | IFF_BROADCAST)) != IFF_BROADCAST) continue; switch (ifp->if_alloctype) { case IFT_ETHER: case IFT_FDDI: case IFT_ISO88025: break; default: continue; } #endif strlcpy(ifctx->ireq.ifr_name, ifp->if_xname, sizeof(ifctx->ireq.ifr_name)); ifctx->ifp = ifp; /* Get HW address */ sdl = NULL; TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) if (ifa->ifa_addr->sa_family == AF_LINK) { sdl = (struct sockaddr_dl *)ifa->ifa_addr; if (sdl->sdl_type == IFT_ETHER) break; } if (sdl == NULL) panic("bootpc: Unable to find HW address for %s", ifctx->ireq.ifr_name); ifctx->sdl = sdl; ifctx = STAILQ_NEXT(ifctx, next); } IFNET_RUNLOCK(); CURVNET_RESTORE(); if (STAILQ_EMPTY(&gctx->interfaces) || STAILQ_FIRST(&gctx->interfaces)->ifp == NULL) { if (timeout > 0) { pause("bootpc", delay); timeout -= delay; goto retry; } #ifdef BOOTP_WIRED_TO panic("%s: Could not find interface specified " "by BOOTP_WIRED_TO: " __XSTRING(BOOTP_WIRED_TO), __func__); #else panic("%s: no suitable interface", __func__); #endif } error = socreate(AF_INET, &bootp_so, SOCK_DGRAM, 0, td->td_ucred, td); if (error != 0) panic("%s: socreate, error=%d", __func__, error); STAILQ_FOREACH(ifctx, &gctx->interfaces, next) bootpc_fakeup_interface(ifctx, td); STAILQ_FOREACH(ifctx, &gctx->interfaces, next) bootpc_compose_query(ifctx, td); error = bootpc_call(gctx, td); if (error != 0) { printf("BOOTP call failed\n"); } 
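/* Even when bootpc_call() reports failure, processing continues: mountopts() first installs the default NFS mount arguments, then bootpc_decode_reply() runs for every interface that did resolve, so a partial answer can still yield a usable root path. */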
mountopts(&nd->root_args, NULL); STAILQ_FOREACH(ifctx, &gctx->interfaces, next) if (bootpc_ifctx_isresolved(ifctx) != 0) bootpc_decode_reply(nd, ifctx, gctx); #ifdef BOOTP_NFSROOT if (gctx->gotrootpath == 0 && gctx->any_root_overrides == 0) panic("bootpc: No root path offered"); #endif STAILQ_FOREACH(ifctx, &gctx->interfaces, next) bootpc_adjust_interface(ifctx, gctx, td); soclose(bootp_so); STAILQ_FOREACH(ifctx, &gctx->interfaces, next) if (ifctx->gotrootpath != 0) break; if (ifctx == NULL) { STAILQ_FOREACH(ifctx, &gctx->interfaces, next) if (bootpc_ifctx_isresolved(ifctx) != 0) break; } if (ifctx == NULL) goto out; if (gctx->gotrootpath != 0) { kern_setenv("boot.netif.name", ifctx->ifp->if_xname); error = md_mount(&nd->root_saddr, nd->root_hostnam, nd->root_fh, &nd->root_fhsize, &nd->root_args, td); if (error != 0) { if (gctx->any_root_overrides == 0) panic("nfs_boot: mount root, error=%d", error); else goto out; } rootdevnames[0] = "nfs:"; nfs_diskless_valid = 3; } strcpy(nd->myif.ifra_name, ifctx->ireq.ifr_name); bcopy(&ifctx->myaddr, &nd->myif.ifra_addr, sizeof(ifctx->myaddr)); bcopy(&ifctx->myaddr, &nd->myif.ifra_broadaddr, sizeof(ifctx->myaddr)); ((struct sockaddr_in *) &nd->myif.ifra_broadaddr)->sin_addr.s_addr = ifctx->myaddr.sin_addr.s_addr | ~ ifctx->netmask.sin_addr.s_addr; bcopy(&ifctx->netmask, &nd->myif.ifra_mask, sizeof(ifctx->netmask)); out: while((ifctx = STAILQ_FIRST(&gctx->interfaces)) != NULL) { STAILQ_REMOVE_HEAD(&gctx->interfaces, next); free(ifctx, M_TEMP); } free(gctx, M_TEMP); } /* * RPC: mountd/mount * Given a server pathname, get an NFS file handle. * Also, sets sin->sin_port to the NFS service port. */ static int md_mount(struct sockaddr_in *mdsin, char *path, u_char *fhp, int *fhsizep, struct nfs_args *args, struct thread *td) { struct mbuf *m; int error; int authunixok; int authcount; int authver; #define RPCPROG_MNT 100005 #define RPCMNT_VER1 1 #define RPCMNT_VER3 3 #define RPCMNT_MOUNT 1 #define AUTH_SYS 1 /* unix style (uid, gids) */ #define AUTH_UNIX AUTH_SYS /* XXX honor v2/v3 flags in args->flags? */ #ifdef BOOTP_NFSV3 /* First try NFS v3 */ /* Get port number for MOUNTD. */ error = krpc_portmap(mdsin, RPCPROG_MNT, RPCMNT_VER3, &mdsin->sin_port, td); if (error == 0) { m = xdr_string_encode(path, strlen(path)); /* Do RPC to mountd. */ error = krpc_call(mdsin, RPCPROG_MNT, RPCMNT_VER3, RPCMNT_MOUNT, &m, NULL, td); } if (error == 0) { args->flags |= NFSMNT_NFSV3; } else { #endif /* Fallback to NFS v2 */ /* Get port number for MOUNTD. */ error = krpc_portmap(mdsin, RPCPROG_MNT, RPCMNT_VER1, &mdsin->sin_port, td); if (error != 0) return error; m = xdr_string_encode(path, strlen(path)); /* Do RPC to mountd. */ error = krpc_call(mdsin, RPCPROG_MNT, RPCMNT_VER1, RPCMNT_MOUNT, &m, NULL, td); if (error != 0) return error; /* message already freed */ #ifdef BOOTP_NFSV3 } #endif if (xdr_int_decode(&m, &error) != 0 || error != 0) goto bad; if ((args->flags & NFSMNT_NFSV3) != 0) { if (xdr_int_decode(&m, fhsizep) != 0 || *fhsizep > NFSX_V3FHMAX || *fhsizep <= 0) goto bad; } else *fhsizep = NFSX_V2FH; if (xdr_opaque_decode(&m, fhp, *fhsizep) != 0) goto bad; if (args->flags & NFSMNT_NFSV3) { if (xdr_int_decode(&m, &authcount) != 0) goto bad; authunixok = 0; if (authcount < 0 || authcount > 100) goto bad; while (authcount > 0) { if (xdr_int_decode(&m, &authver) != 0) goto bad; if (authver == AUTH_UNIX) authunixok = 1; authcount--; } if (authunixok == 0) goto bad; } /* Set port number for NFS use. */ error = krpc_portmap(mdsin, NFS_PROG, (args->flags & NFSMNT_NFSV3) ? 
NFS_VER3 : NFS_VER2, &mdsin->sin_port, td); goto out; bad: error = EBADRPC; out: m_freem(m); return error; } SYSINIT(bootp_rootconf, SI_SUB_ROOT_CONF, SI_ORDER_FIRST, bootpc_init, NULL); Index: projects/clang380-import/sys/ofed/drivers/infiniband/core/cma.c =================================================================== --- projects/clang380-import/sys/ofed/drivers/infiniband/core/cma.c (revision 294776) +++ projects/clang380-import/sys/ofed/drivers/infiniband/core/cma.c (revision 294777) @@ -1,3747 +1,3824 @@ /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved. * Copyright (c) 2005-2006 Intel Corporation. All rights reserved. + * Copyright (c) 2016 Chelsio Communications. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); MODULE_LICENSE("Dual BSD/GPL"); #define CMA_CM_RESPONSE_TIMEOUT 20 #define CMA_MAX_CM_RETRIES 15 #define CMA_CM_MRA_SETTING (IB_CM_MRA_FLAG_DELAY | 24) #define CMA_IBOE_PACKET_LIFETIME 18 static int cma_response_timeout = CMA_CM_RESPONSE_TIMEOUT; module_param_named(cma_response_timeout, cma_response_timeout, int, 0644); MODULE_PARM_DESC(cma_response_timeout, "CMA_CM_RESPONSE_TIMEOUT (default=20)"); static int def_prec2sl = 3; module_param_named(def_prec2sl, def_prec2sl, int, 0644); MODULE_PARM_DESC(def_prec2sl, "Default value for SL priority with RoCE. Valid values 0 - 7"); static int unify_tcp_port_space = 1; module_param(unify_tcp_port_space, int, 0644); MODULE_PARM_DESC(unify_tcp_port_space, "Unify the host TCP and RDMA port " "space allocation (default=1)"); static int debug_level = 0; #define cma_pr(level, priv, format, arg...) \ printk(level "CMA: %p: %s: " format, ((struct rdma_id_priv *) priv) , __func__, ## arg) #define cma_dbg(priv, format, arg...) \ do { if (debug_level) cma_pr(KERN_DEBUG, priv, format, ## arg); } while (0) #define cma_warn(priv, format, arg...) 
\ cma_pr(KERN_WARNING, priv, format, ## arg) #define CMA_GID_FMT "%2.2x%2.2x:%2.2x%2.2x" #define CMA_GID_RAW_ARG(gid) ((u8 *)(gid))[12],\ ((u8 *)(gid))[13],\ ((u8 *)(gid))[14],\ ((u8 *)(gid))[15] #define CMA_GID_ARG(gid) CMA_GID_RAW_ARG((gid).raw) #define cma_debug_path(priv, pfx, p) \ cma_dbg(priv, pfx "sgid=" CMA_GID_FMT ",dgid=" \ CMA_GID_FMT "\n", CMA_GID_ARG(p.sgid), \ CMA_GID_ARG(p.dgid)) #define cma_debug_gid(priv, g) \ cma_dbg(priv, "gid=" CMA_GID_FMT "\n", CMA_GID_ARG(g)) module_param_named(debug_level, debug_level, int, 0644); MODULE_PARM_DESC(debug_level, "debug level default=0"); static void cma_add_one(struct ib_device *device); static void cma_remove_one(struct ib_device *device); static struct ib_client cma_client = { .name = "cma", .add = cma_add_one, .remove = cma_remove_one }; static struct ib_sa_client sa_client; static struct rdma_addr_client addr_client; static LIST_HEAD(dev_list); static LIST_HEAD(listen_any_list); static DEFINE_MUTEX(lock); static struct workqueue_struct *cma_wq; static struct workqueue_struct *cma_free_wq; static DEFINE_IDR(sdp_ps); static DEFINE_IDR(tcp_ps); static DEFINE_IDR(udp_ps); static DEFINE_IDR(ipoib_ps); static DEFINE_IDR(ib_ps); struct cma_device { struct list_head list; struct ib_device *device; struct completion comp; atomic_t refcount; struct list_head id_list; }; struct rdma_bind_list { struct idr *ps; struct hlist_head owners; unsigned short port; }; enum { CMA_OPTION_AFONLY, }; /* * Device removal can occur at any time, so we need extra handling to * serialize notifying the user of device removal with other callbacks. * We do this by disabling removal notification while a callback is in process, * and reporting it after the callback completes. */ struct rdma_id_private { struct rdma_cm_id id; struct rdma_bind_list *bind_list; struct socket *sock; struct hlist_node node; struct list_head list; /* listen_any_list or cma_device.list */ struct list_head listen_list; /* per device listens */ struct cma_device *cma_dev; struct list_head mc_list; int internal_id; enum rdma_cm_state state; spinlock_t lock; spinlock_t cm_lock; struct mutex qp_mutex; struct completion comp; atomic_t refcount; struct mutex handler_mutex; struct work_struct work; /* garbage coll */ int backlog; int timeout_ms; struct ib_sa_query *query; int query_id; union { struct ib_cm_id *ib; struct iw_cm_id *iw; } cm_id; u32 seq_num; u32 qkey; u32 qp_num; pid_t owner; u32 options; u8 srq; u8 tos; u8 reuseaddr; u8 afonly; int qp_timeout; /* cache for mc record params */ struct ib_sa_mcmember_rec rec; int is_valid_rec; }; struct cma_multicast { struct rdma_id_private *id_priv; union { struct ib_sa_multicast *ib; } multicast; struct list_head list; void *context; struct sockaddr_storage addr; struct kref mcref; }; struct cma_work { struct work_struct work; struct rdma_id_private *id; enum rdma_cm_state old_state; enum rdma_cm_state new_state; struct rdma_cm_event event; }; struct cma_ndev_work { struct work_struct work; struct rdma_id_private *id; struct rdma_cm_event event; }; struct iboe_mcast_work { struct work_struct work; struct rdma_id_private *id; struct cma_multicast *mc; }; union cma_ip_addr { struct in6_addr ip6; struct { __be32 pad[3]; __be32 addr; } ip4; }; struct cma_hdr { u8 cma_version; u8 ip_version; /* IP version: 7:4 */ __be16 port; union cma_ip_addr src_addr; union cma_ip_addr dst_addr; }; struct sdp_hh { u8 bsdh[16]; u8 sdp_version; /* Major version: 7:4 */ u8 ip_version; /* IP version: 7:4 */ u8 sdp_specific1[10]; __be16 port; __be16 sdp_specific2; union 
cma_ip_addr src_addr; union cma_ip_addr dst_addr; }; struct sdp_hah { u8 bsdh[16]; u8 sdp_version; }; #define CMA_VERSION 0x00 #define SDP_MAJ_VERSION 0x2 static int cma_comp(struct rdma_id_private *id_priv, enum rdma_cm_state comp) { unsigned long flags; int ret; spin_lock_irqsave(&id_priv->lock, flags); ret = (id_priv->state == comp); spin_unlock_irqrestore(&id_priv->lock, flags); return ret; } static int cma_comp_exch(struct rdma_id_private *id_priv, enum rdma_cm_state comp, enum rdma_cm_state exch) { unsigned long flags; int ret; spin_lock_irqsave(&id_priv->lock, flags); if ((ret = (id_priv->state == comp))) id_priv->state = exch; spin_unlock_irqrestore(&id_priv->lock, flags); return ret; } static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv, enum rdma_cm_state exch) { unsigned long flags; enum rdma_cm_state old; spin_lock_irqsave(&id_priv->lock, flags); old = id_priv->state; id_priv->state = exch; spin_unlock_irqrestore(&id_priv->lock, flags); return old; } static inline u8 cma_get_ip_ver(struct cma_hdr *hdr) { return hdr->ip_version >> 4; } static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver) { hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF); } static inline u8 sdp_get_majv(u8 sdp_version) { return sdp_version >> 4; } static inline u8 sdp_get_ip_ver(struct sdp_hh *hh) { return hh->ip_version >> 4; } static inline void sdp_set_ip_ver(struct sdp_hh *hh, u8 ip_ver) { hh->ip_version = (ip_ver << 4) | (hh->ip_version & 0xF); } static void cma_attach_to_dev(struct rdma_id_private *id_priv, struct cma_device *cma_dev) { atomic_inc(&cma_dev->refcount); id_priv->cma_dev = cma_dev; id_priv->id.device = cma_dev->device; id_priv->id.route.addr.dev_addr.transport = rdma_node_get_transport(cma_dev->device->node_type); list_add_tail(&id_priv->list, &cma_dev->id_list); } static inline void cma_deref_dev(struct cma_device *cma_dev) { if (atomic_dec_and_test(&cma_dev->refcount)) complete(&cma_dev->comp); } static inline void release_mc(struct kref *kref) { struct cma_multicast *mc = container_of(kref, struct cma_multicast, mcref); kfree(mc->multicast.ib); kfree(mc); } static void cma_release_dev(struct rdma_id_private *id_priv) { mutex_lock(&lock); list_del(&id_priv->list); cma_deref_dev(id_priv->cma_dev); id_priv->cma_dev = NULL; mutex_unlock(&lock); } static int cma_set_qkey(struct rdma_id_private *id_priv) { struct ib_sa_mcmember_rec rec; int ret = 0; if (id_priv->qkey) return 0; switch (id_priv->id.ps) { case RDMA_PS_UDP: id_priv->qkey = RDMA_UDP_QKEY; break; case RDMA_PS_IPOIB: ib_addr_get_mgid(&id_priv->id.route.addr.dev_addr, &rec.mgid); ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num, &rec.mgid, &rec); if (!ret) id_priv->qkey = be32_to_cpu(rec.qkey); break; default: break; } return ret; } static int find_gid_port(struct ib_device *device, union ib_gid *gid, u8 port_num) { int i; int err; struct ib_port_attr props; union ib_gid tmp; err = ib_query_port(device, port_num, &props); if (err) return 1; for (i = 0; i < props.gid_tbl_len; ++i) { err = ib_query_gid(device, port_num, i, &tmp); if (err) return 1; if (!memcmp(&tmp, gid, sizeof tmp)) return 0; } return -EAGAIN; } +int +rdma_find_cmid_laddr(struct sockaddr_in *local_addr, unsigned short dev_type, + void **cm_id) +{ + int ret; + u8 port; + int found_dev = 0, found_cmid = 0; + struct rdma_id_private *id_priv; + struct rdma_id_private *dev_id_priv; + struct cma_device *cma_dev; + struct rdma_dev_addr dev_addr; + union ib_gid gid; + enum rdma_link_layer dev_ll = dev_type == 
ARPHRD_INFINIBAND ? + IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET; + + memset(&dev_addr, 0, sizeof(dev_addr)); + + ret = rdma_translate_ip((struct sockaddr *)local_addr, + &dev_addr, NULL); + if (ret) + goto err; + + /* find rdma device based on MAC address/gid */ + mutex_lock(&lock); + + memcpy(&gid, dev_addr.src_dev_addr + + rdma_addr_gid_offset(&dev_addr), sizeof(gid)); + + list_for_each_entry(cma_dev, &dev_list, list) + for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) + if ((rdma_port_get_link_layer(cma_dev->device, port) == + dev_ll) && + (rdma_node_get_transport(cma_dev->device->node_type) == + RDMA_TRANSPORT_IWARP)) { + ret = find_gid_port(cma_dev->device, + &gid, port); + if (!ret) { + found_dev = 1; + goto out; + } else if (ret == 1) { + mutex_unlock(&lock); + goto err; + } + } +out: + mutex_unlock(&lock); + + if (!found_dev) + goto err; + + /* Traverse through the list of listening cm_id's to find the + * desired cm_id based on rdma device & port number. + */ + list_for_each_entry(id_priv, &listen_any_list, list) + list_for_each_entry(dev_id_priv, &id_priv->listen_list, + listen_list) + if (dev_id_priv->cma_dev == cma_dev) + if (dev_id_priv->cm_id.iw->local_addr.sin_port + == local_addr->sin_port) { + *cm_id = (void *)dev_id_priv->cm_id.iw; + found_cmid = 1; + } + return found_cmid ? 0 : -ENODEV; + +err: + return -ENODEV; +} +EXPORT_SYMBOL(rdma_find_cmid_laddr); + static int cma_acquire_dev(struct rdma_id_private *id_priv) { struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; struct cma_device *cma_dev; union ib_gid gid, iboe_gid; int ret = -ENODEV; u8 port; enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ? IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET; if (dev_ll != IB_LINK_LAYER_INFINIBAND && id_priv->id.ps == RDMA_PS_IPOIB) return -EINVAL; mutex_lock(&lock); rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr, &iboe_gid); memcpy(&gid, dev_addr->src_dev_addr + rdma_addr_gid_offset(dev_addr), sizeof gid); list_for_each_entry(cma_dev, &dev_list, list) { for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) { if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) { if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB && rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET) ret = find_gid_port(cma_dev->device, &iboe_gid, port); else ret = find_gid_port(cma_dev->device, &gid, port); if (!ret) { id_priv->id.port_num = port; goto out; } else if (ret == 1) break; } } } out: if (!ret) cma_attach_to_dev(id_priv, cma_dev); mutex_unlock(&lock); return ret; } static void cma_deref_id(struct rdma_id_private *id_priv) { if (atomic_dec_and_test(&id_priv->refcount)) complete(&id_priv->comp); } static int cma_disable_callback(struct rdma_id_private *id_priv, enum rdma_cm_state state) { mutex_lock(&id_priv->handler_mutex); if (id_priv->state != state) { mutex_unlock(&id_priv->handler_mutex); return -EINVAL; } return 0; } struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, void *context, enum rdma_port_space ps, enum ib_qp_type qp_type) { struct rdma_id_private *id_priv; id_priv = kzalloc(sizeof *id_priv, GFP_KERNEL); if (!id_priv) return ERR_PTR(-ENOMEM); id_priv->owner = curthread->td_proc->p_pid; id_priv->state = RDMA_CM_IDLE; id_priv->id.context = context; id_priv->id.event_handler = event_handler; id_priv->id.ps = ps; id_priv->id.qp_type = qp_type; spin_lock_init(&id_priv->lock); spin_lock_init(&id_priv->cm_lock); 
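/* The initialization below backs the callback-serialization scheme described above: 'refcount' pairs with the 'comp' completion (cma_deref_id() completes it when the last reference is dropped), and handler_mutex keeps event handlers from racing state changes such as device removal. */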
mutex_init(&id_priv->qp_mutex); init_completion(&id_priv->comp); atomic_set(&id_priv->refcount, 1); mutex_init(&id_priv->handler_mutex); INIT_LIST_HEAD(&id_priv->listen_list); INIT_LIST_HEAD(&id_priv->mc_list); get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num); return &id_priv->id; } EXPORT_SYMBOL(rdma_create_id); static int cma_init_ud_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; qp_attr.qp_state = IB_QPS_INIT; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) return ret; ret = ib_modify_qp(qp, &qp_attr, qp_attr_mask); if (ret) return ret; qp_attr.qp_state = IB_QPS_RTR; ret = ib_modify_qp(qp, &qp_attr, IB_QP_STATE); if (ret) return ret; qp_attr.qp_state = IB_QPS_RTS; qp_attr.sq_psn = 0; ret = ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_SQ_PSN); return ret; } static int cma_init_conn_qp(struct rdma_id_private *id_priv, struct ib_qp *qp) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; qp_attr.qp_state = IB_QPS_INIT; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) return ret; return ib_modify_qp(qp, &qp_attr, qp_attr_mask); } int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, struct ib_qp_init_attr *qp_init_attr) { struct rdma_id_private *id_priv; struct ib_qp *qp; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (id->device != pd->device) return -EINVAL; qp = ib_create_qp(pd, qp_init_attr); if (IS_ERR(qp)) return PTR_ERR(qp); if (id->qp_type == IB_QPT_UD) ret = cma_init_ud_qp(id_priv, qp); else ret = cma_init_conn_qp(id_priv, qp); if (ret) goto err; id->qp = qp; id_priv->qp_num = qp->qp_num; id_priv->srq = (qp->srq != NULL); return 0; err: ib_destroy_qp(qp); return ret; } EXPORT_SYMBOL(rdma_create_qp); void rdma_destroy_qp(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; id_priv = container_of(id, struct rdma_id_private, id); mutex_lock(&id_priv->qp_mutex); ib_destroy_qp(id_priv->id.qp); id_priv->id.qp = NULL; mutex_unlock(&id_priv->qp_mutex); } EXPORT_SYMBOL(rdma_destroy_qp); static int cma_modify_qp_rtr(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; union ib_gid sgid; mutex_lock(&id_priv->qp_mutex); if (!id_priv->id.qp) { ret = 0; goto out; } /* Need to update QP attributes from default values. 
*/ qp_attr.qp_state = IB_QPS_INIT; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) goto out; ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); if (ret) goto out; qp_attr.qp_state = IB_QPS_RTR; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) goto out; ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num, qp_attr.ah_attr.grh.sgid_index, &sgid); if (ret) goto out; if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB && rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) == IB_LINK_LAYER_ETHERNET) { ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL); if (ret) goto out; } if (conn_param) qp_attr.max_dest_rd_atomic = conn_param->responder_resources; ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); out: mutex_unlock(&id_priv->qp_mutex); return ret; } static int cma_modify_qp_rts(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; mutex_lock(&id_priv->qp_mutex); if (!id_priv->id.qp) { ret = 0; goto out; } qp_attr.qp_state = IB_QPS_RTS; ret = rdma_init_qp_attr(&id_priv->id, &qp_attr, &qp_attr_mask); if (ret) goto out; if (conn_param) qp_attr.max_rd_atomic = conn_param->initiator_depth; if (id_priv->qp_timeout && id_priv->id.qp->qp_type == IB_QPT_RC) { qp_attr.timeout = id_priv->qp_timeout; qp_attr_mask |= IB_QP_TIMEOUT; } ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask); out: mutex_unlock(&id_priv->qp_mutex); return ret; } static int cma_modify_qp_err(struct rdma_id_private *id_priv) { struct ib_qp_attr qp_attr; int ret; mutex_lock(&id_priv->qp_mutex); if (!id_priv->id.qp) { ret = 0; goto out; } qp_attr.qp_state = IB_QPS_ERR; ret = ib_modify_qp(id_priv->id.qp, &qp_attr, IB_QP_STATE); out: mutex_unlock(&id_priv->qp_mutex); return ret; } static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv, struct ib_qp_attr *qp_attr, int *qp_attr_mask) { struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; int ret; u16 pkey; if (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) == IB_LINK_LAYER_INFINIBAND) pkey = ib_addr_get_pkey(dev_addr); else pkey = 0xffff; ret = ib_find_cached_pkey(id_priv->id.device, id_priv->id.port_num, pkey, &qp_attr->pkey_index); if (ret) return ret; qp_attr->port_num = id_priv->id.port_num; *qp_attr_mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_PORT; if (id_priv->id.qp_type == IB_QPT_UD) { ret = cma_set_qkey(id_priv); if (ret) return ret; qp_attr->qkey = id_priv->qkey; *qp_attr_mask |= IB_QP_QKEY; } else { qp_attr->qp_access_flags = 0; *qp_attr_mask |= IB_QP_ACCESS_FLAGS; } return 0; } int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, int *qp_attr_mask) { struct rdma_id_private *id_priv; int ret = 0; id_priv = container_of(id, struct rdma_id_private, id); switch (rdma_node_get_transport(id_priv->id.device->node_type)) { case RDMA_TRANSPORT_IB: if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD)) ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask); else ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, qp_attr, qp_attr_mask); if (qp_attr->qp_state == IB_QPS_RTR) qp_attr->rq_psn = id_priv->seq_num; break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: if (!id_priv->cm_id.iw) { qp_attr->qp_access_flags = 0; *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS; } else ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr, qp_attr_mask); break; default: ret = -ENOSYS; break; } return ret; } 
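/*
 * Hypothetical caller sketch (it mirrors what cma_modify_qp_rtr() above
 * does; the names qp and id are placeholders, not from this file): drive a
 * QP through a state transition by letting rdma_init_qp_attr() fill in the
 * transport-specific attributes and mask, then apply them:
 *
 *	struct ib_qp_attr attr;
 *	int mask;
 *
 *	attr.qp_state = IB_QPS_RTR;
 *	if (!rdma_init_qp_attr(id, &attr, &mask))
 *		ib_modify_qp(qp, &attr, mask);
 */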
EXPORT_SYMBOL(rdma_init_qp_attr); static inline int cma_zero_addr(struct sockaddr *addr) { struct in6_addr *ip6; if (addr->sa_family == AF_INET) return ipv4_is_zeronet( ((struct sockaddr_in *)addr)->sin_addr.s_addr); else { ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr; return (ip6->s6_addr32[0] | ip6->s6_addr32[1] | ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0; } } static inline int cma_loopback_addr(struct sockaddr *addr) { if (addr->sa_family == AF_INET) return ipv4_is_loopback( ((struct sockaddr_in *) addr)->sin_addr.s_addr); else return ipv6_addr_loopback( &((struct sockaddr_in6 *) addr)->sin6_addr); } static inline int cma_any_addr(struct sockaddr *addr) { return cma_zero_addr(addr) || cma_loopback_addr(addr); }
+int
+rdma_cma_any_addr(struct sockaddr *addr)
+{
+	return cma_any_addr(addr);
+}
+EXPORT_SYMBOL(rdma_cma_any_addr);
static int cma_addr_cmp(struct sockaddr *src, struct sockaddr *dst) { if (src->sa_family != dst->sa_family) return -1; switch (src->sa_family) { case AF_INET: return ((struct sockaddr_in *) src)->sin_addr.s_addr != ((struct sockaddr_in *) dst)->sin_addr.s_addr; default: return ipv6_addr_cmp(&((struct sockaddr_in6 *) src)->sin6_addr, &((struct sockaddr_in6 *) dst)->sin6_addr); } } static inline __be16 cma_port(struct sockaddr *addr) { if (addr->sa_family == AF_INET) return ((struct sockaddr_in *) addr)->sin_port; else return ((struct sockaddr_in6 *) addr)->sin6_port; } static inline int cma_any_port(struct sockaddr *addr) { return !cma_port(addr); } static int cma_get_net_info(void *hdr, enum rdma_port_space ps, u8 *ip_ver, __be16 *port, union cma_ip_addr **src, union cma_ip_addr **dst) { switch (ps) { case RDMA_PS_SDP: if (sdp_get_majv(((struct sdp_hh *) hdr)->sdp_version) != SDP_MAJ_VERSION) return -EINVAL; *ip_ver = sdp_get_ip_ver(hdr); *port = ((struct sdp_hh *) hdr)->port; *src = &((struct sdp_hh *) hdr)->src_addr; *dst = &((struct sdp_hh *) hdr)->dst_addr; break; default: if (((struct cma_hdr *) hdr)->cma_version != CMA_VERSION) return -EINVAL; *ip_ver = cma_get_ip_ver(hdr); *port = ((struct cma_hdr *) hdr)->port; *src = &((struct cma_hdr *) hdr)->src_addr; *dst = &((struct cma_hdr *) hdr)->dst_addr; break; } if (*ip_ver != 4 && *ip_ver != 6) return -EINVAL; return 0; } static void cma_save_net_info(struct rdma_addr *addr, struct rdma_addr *listen_addr, u8 ip_ver, __be16 port, union cma_ip_addr *src, union cma_ip_addr *dst) { struct sockaddr_in *listen4, *ip4; struct sockaddr_in6 *listen6, *ip6; switch (ip_ver) { case 4: listen4 = (struct sockaddr_in *) &listen_addr->src_addr; ip4 = (struct sockaddr_in *) &addr->src_addr; ip4->sin_family = listen4->sin_family; ip4->sin_addr.s_addr = dst->ip4.addr; ip4->sin_port = listen4->sin_port; ip4 = (struct sockaddr_in *) &addr->dst_addr; ip4->sin_family = listen4->sin_family; ip4->sin_addr.s_addr = src->ip4.addr; ip4->sin_port = port; break; case 6: listen6 = (struct sockaddr_in6 *) &listen_addr->src_addr; ip6 = (struct sockaddr_in6 *) &addr->src_addr; ip6->sin6_family = listen6->sin6_family; ip6->sin6_addr = dst->ip6; ip6->sin6_port = listen6->sin6_port; ip6 = (struct sockaddr_in6 *) &addr->dst_addr; ip6->sin6_family = listen6->sin6_family; ip6->sin6_addr = src->ip6; ip6->sin6_port = port; break; default: break; } } static inline int cma_user_data_offset(enum rdma_port_space ps) { switch (ps) { case RDMA_PS_SDP: return 0; default: return sizeof(struct cma_hdr); } } static void cma_cancel_route(struct rdma_id_private *id_priv) { switch (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)) { case
IB_LINK_LAYER_INFINIBAND: if (id_priv->query) ib_sa_cancel_query(id_priv->query_id, id_priv->query); break; default: break; } } static void cma_cancel_listens(struct rdma_id_private *id_priv) { struct rdma_id_private *dev_id_priv; /* * Remove from listen_any_list to prevent added devices from spawning * additional listen requests. */ mutex_lock(&lock); list_del(&id_priv->list); while (!list_empty(&id_priv->listen_list)) { dev_id_priv = list_entry(id_priv->listen_list.next, struct rdma_id_private, listen_list); /* sync with device removal to avoid duplicate destruction */ list_del_init(&dev_id_priv->list); list_del(&dev_id_priv->listen_list); mutex_unlock(&lock); rdma_destroy_id(&dev_id_priv->id); mutex_lock(&lock); } mutex_unlock(&lock); } static void cma_cancel_operation(struct rdma_id_private *id_priv, enum rdma_cm_state state) { switch (state) { case RDMA_CM_ADDR_QUERY: rdma_addr_cancel(&id_priv->id.route.addr.dev_addr); break; case RDMA_CM_ROUTE_QUERY: cma_cancel_route(id_priv); break; case RDMA_CM_LISTEN: if (cma_any_addr((struct sockaddr *) &id_priv->id.route.addr.src_addr) && !id_priv->cma_dev) cma_cancel_listens(id_priv); break; default: break; } } static void cma_release_port(struct rdma_id_private *id_priv) { struct rdma_bind_list *bind_list; mutex_lock(&lock); bind_list = id_priv->bind_list; if (!bind_list) { mutex_unlock(&lock); return; } hlist_del(&id_priv->node); id_priv->bind_list = NULL; if (hlist_empty(&bind_list->owners)) { idr_remove(bind_list->ps, bind_list->port); kfree(bind_list); } mutex_unlock(&lock); if (id_priv->sock) sock_release(id_priv->sock); } static void cma_leave_mc_groups(struct rdma_id_private *id_priv) { struct cma_multicast *mc; while (!list_empty(&id_priv->mc_list)) { mc = container_of(id_priv->mc_list.next, struct cma_multicast, list); list_del(&mc->list); switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc->multicast.ib); kfree(mc); break; case IB_LINK_LAYER_ETHERNET: kref_put(&mc->mcref, release_mc); break; default: break; } } } static void __rdma_free(struct work_struct *work) { struct rdma_id_private *id_priv; id_priv = container_of(work, struct rdma_id_private, work); wait_for_completion(&id_priv->comp); if (id_priv->internal_id) cma_deref_id(id_priv->id.context); kfree(id_priv->id.route.path_rec); kfree(id_priv); } void rdma_destroy_id(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; enum rdma_cm_state state; unsigned long flags; struct ib_cm_id *ib; id_priv = container_of(id, struct rdma_id_private, id); state = cma_exch(id_priv, RDMA_CM_DESTROYING); cma_cancel_operation(id_priv, state); /* * Wait for any active callback to finish. New callbacks will find * the id_priv state set to destroying and abort. 
*/ mutex_lock(&id_priv->handler_mutex); mutex_unlock(&id_priv->handler_mutex); if (id_priv->cma_dev) { switch (rdma_node_get_transport(id_priv->id.device->node_type)) { case RDMA_TRANSPORT_IB: spin_lock_irqsave(&id_priv->cm_lock, flags); if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib)) { ib = id_priv->cm_id.ib; id_priv->cm_id.ib = NULL; spin_unlock_irqrestore(&id_priv->cm_lock, flags); ib_destroy_cm_id(ib); } else spin_unlock_irqrestore(&id_priv->cm_lock, flags); break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: if (id_priv->cm_id.iw) iw_destroy_cm_id(id_priv->cm_id.iw); break; default: break; } cma_leave_mc_groups(id_priv); cma_release_dev(id_priv); } cma_release_port(id_priv); cma_deref_id(id_priv); INIT_WORK(&id_priv->work, __rdma_free); queue_work(cma_free_wq, &id_priv->work); } EXPORT_SYMBOL(rdma_destroy_id); static int cma_rep_recv(struct rdma_id_private *id_priv) { int ret; ret = cma_modify_qp_rtr(id_priv, NULL); if (ret) goto reject; ret = cma_modify_qp_rts(id_priv, NULL); if (ret) goto reject; cma_dbg(id_priv, "sending RTU\n"); ret = ib_send_cm_rtu(id_priv->cm_id.ib, NULL, 0); if (ret) goto reject; return 0; reject: cma_modify_qp_err(id_priv); cma_dbg(id_priv, "sending REJ\n"); ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, NULL, 0); return ret; } static int cma_verify_rep(struct rdma_id_private *id_priv, void *data) { if (id_priv->id.ps == RDMA_PS_SDP && sdp_get_majv(((struct sdp_hah *) data)->sdp_version) != SDP_MAJ_VERSION) return -EINVAL; return 0; } static void cma_set_rep_event_data(struct rdma_cm_event *event, struct ib_cm_rep_event_param *rep_data, void *private_data) { event->param.conn.private_data = private_data; event->param.conn.private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE; event->param.conn.responder_resources = rep_data->responder_resources; event->param.conn.initiator_depth = rep_data->initiator_depth; event->param.conn.flow_control = rep_data->flow_control; event->param.conn.rnr_retry_count = rep_data->rnr_retry_count; event->param.conn.srq = rep_data->srq; event->param.conn.qp_num = rep_data->remote_qpn; } static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv = cm_id->context; struct rdma_cm_event event; int ret = 0; if ((ib_event->event != IB_CM_TIMEWAIT_EXIT && cma_disable_callback(id_priv, RDMA_CM_CONNECT)) || (ib_event->event == IB_CM_TIMEWAIT_EXIT && cma_disable_callback(id_priv, RDMA_CM_DISCONNECT))) return 0; memset(&event, 0, sizeof event); switch (ib_event->event) { case IB_CM_REQ_ERROR: case IB_CM_REP_ERROR: event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = -ETIMEDOUT; break; case IB_CM_REP_RECEIVED: event.status = cma_verify_rep(id_priv, ib_event->private_data); if (event.status) event.event = RDMA_CM_EVENT_CONNECT_ERROR; else if (id_priv->id.qp && id_priv->id.ps != RDMA_PS_SDP) { event.status = cma_rep_recv(id_priv); event.event = event.status ? 
RDMA_CM_EVENT_CONNECT_ERROR : RDMA_CM_EVENT_ESTABLISHED; } else event.event = RDMA_CM_EVENT_CONNECT_RESPONSE; cma_set_rep_event_data(&event, &ib_event->param.rep_rcvd, ib_event->private_data); break; case IB_CM_RTU_RECEIVED: case IB_CM_USER_ESTABLISHED: event.event = RDMA_CM_EVENT_ESTABLISHED; break; case IB_CM_DREQ_ERROR: event.status = -ETIMEDOUT; /* fall through */ case IB_CM_DREQ_RECEIVED: case IB_CM_DREP_RECEIVED: if (!cma_comp_exch(id_priv, RDMA_CM_CONNECT, RDMA_CM_DISCONNECT)) goto out; event.event = RDMA_CM_EVENT_DISCONNECTED; break; case IB_CM_TIMEWAIT_EXIT: event.event = RDMA_CM_EVENT_TIMEWAIT_EXIT; break; case IB_CM_MRA_RECEIVED: /* ignore event */ goto out; case IB_CM_REJ_RECEIVED: cma_modify_qp_err(id_priv); event.status = ib_event->param.rej_rcvd.reason; event.event = RDMA_CM_EVENT_REJECTED; event.param.conn.private_data = ib_event->private_data; event.param.conn.private_data_len = IB_CM_REJ_PRIVATE_DATA_SIZE; break; default: printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d\n", ib_event->event); goto out; } ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. */ id_priv->cm_id.ib = NULL; cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); rdma_destroy_id(&id_priv->id); return ret; } out: mutex_unlock(&id_priv->handler_mutex); return ret; } static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv; struct rdma_cm_id *id; struct rdma_route *rt; union cma_ip_addr *src, *dst; __be16 port; u8 ip_ver; int ret; if (cma_get_net_info(ib_event->private_data, listen_id->ps, &ip_ver, &port, &src, &dst)) return NULL; id = rdma_create_id(listen_id->event_handler, listen_id->context, listen_id->ps, ib_event->param.req_rcvd.qp_type); if (IS_ERR(id)) return NULL; cma_save_net_info(&id->route.addr, &listen_id->route.addr, ip_ver, port, src, dst); rt = &id->route; rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 
2 : 1; rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL); if (!rt->path_rec) goto err; rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path; if (rt->num_paths == 2) rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; if (cma_any_addr((struct sockaddr *) &rt->addr.src_addr)) { rt->addr.dev_addr.dev_type = ARPHRD_INFINIBAND; rdma_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid); ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey)); } else { ret = rdma_translate_ip((struct sockaddr *) &rt->addr.src_addr, &rt->addr.dev_addr, NULL); if (ret) goto err; } rdma_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid); id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = RDMA_CM_CONNECT; return id_priv; err: rdma_destroy_id(id); return NULL; } static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id, struct ib_cm_event *ib_event) { struct rdma_id_private *id_priv; struct rdma_cm_id *id; union cma_ip_addr *src, *dst; __be16 port; u8 ip_ver; int ret; id = rdma_create_id(listen_id->event_handler, listen_id->context, listen_id->ps, IB_QPT_UD); if (IS_ERR(id)) return NULL; if (cma_get_net_info(ib_event->private_data, listen_id->ps, &ip_ver, &port, &src, &dst)) goto err; cma_save_net_info(&id->route.addr, &listen_id->route.addr, ip_ver, port, src, dst); if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) { ret = rdma_translate_ip((struct sockaddr *) &id->route.addr.src_addr, &id->route.addr.dev_addr, NULL); if (ret) goto err; } id_priv = container_of(id, struct rdma_id_private, id); id_priv->state = RDMA_CM_CONNECT; return id_priv; err: rdma_destroy_id(id); return NULL; } static void cma_set_req_event_data(struct rdma_cm_event *event, struct ib_cm_req_event_param *req_data, void *private_data, int offset) { event->param.conn.private_data = private_data + offset; event->param.conn.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - offset; event->param.conn.responder_resources = req_data->responder_resources; event->param.conn.initiator_depth = req_data->initiator_depth; event->param.conn.flow_control = req_data->flow_control; event->param.conn.retry_count = req_data->retry_count; event->param.conn.rnr_retry_count = req_data->rnr_retry_count; event->param.conn.srq = req_data->srq; event->param.conn.qp_num = req_data->remote_qpn; } static int cma_check_req_qp_type(struct rdma_cm_id *id, struct ib_cm_event *ib_event) { return (((ib_event->event == IB_CM_REQ_RECEIVED) && (ib_event->param.req_rcvd.qp_type == id->qp_type)) || ((ib_event->event == IB_CM_SIDR_REQ_RECEIVED) && (id->qp_type == IB_QPT_UD)) || (!id->qp_type)); } static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct rdma_id_private *listen_id, *conn_id; struct rdma_cm_event event; int offset, ret; u8 smac[ETH_ALEN]; u8 alt_smac[ETH_ALEN]; u8 *psmac = smac; u8 *palt_smac = alt_smac; int is_iboe = ((rdma_node_get_transport(cm_id->device->node_type) == RDMA_TRANSPORT_IB) && (rdma_port_get_link_layer(cm_id->device, ib_event->param.req_rcvd.port) == IB_LINK_LAYER_ETHERNET)); int is_sidr = 0; listen_id = cm_id->context; if (!cma_check_req_qp_type(&listen_id->id, ib_event)) return -EINVAL; if (cma_disable_callback(listen_id, RDMA_CM_LISTEN)) return -ECONNABORTED; memset(&event, 0, sizeof event); offset = cma_user_data_offset(listen_id->id.ps); event.event = RDMA_CM_EVENT_CONNECT_REQUEST; if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED) { is_sidr = 1; conn_id = cma_new_udp_id(&listen_id->id, ib_event); 
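		/*
		 * SIDR (UD) requests get a UD child id via cma_new_udp_id();
		 * the ud event params below skip the CMA header that
		 * cma_user_data_offset() accounted for, so the consumer's
		 * handler sees only its own private data.
		 */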
event.param.ud.private_data = ib_event->private_data + offset; event.param.ud.private_data_len = IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE - offset; } else { conn_id = cma_new_conn_id(&listen_id->id, ib_event); cma_set_req_event_data(&event, &ib_event->param.req_rcvd, ib_event->private_data, offset); } if (!conn_id) { ret = -ENOMEM; goto err1; } mutex_lock_nested(&conn_id->handler_mutex, SINGLE_DEPTH_NESTING); ret = cma_acquire_dev(conn_id); if (ret) goto err2; conn_id->cm_id.ib = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_ib_handler; /* * Protect against the user destroying conn_id from another thread * until we're done accessing it. */ atomic_inc(&conn_id->refcount); ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) goto err3; if (is_iboe && !is_sidr) { if (ib_event->param.req_rcvd.primary_path != NULL) rdma_addr_find_smac_by_sgid( &ib_event->param.req_rcvd.primary_path->sgid, psmac, NULL); else psmac = NULL; if (ib_event->param.req_rcvd.alternate_path != NULL) rdma_addr_find_smac_by_sgid( &ib_event->param.req_rcvd.alternate_path->sgid, palt_smac, NULL); else palt_smac = NULL; } /* * Acquire mutex to prevent user executing rdma_destroy_id() * while we're accessing the cm_id. */ mutex_lock(&lock); if (is_iboe && !is_sidr) ib_update_cm_av(cm_id, psmac, palt_smac); if (cma_comp(conn_id, RDMA_CM_CONNECT) && (conn_id->id.qp_type != IB_QPT_UD)) { cma_dbg(container_of(&conn_id->id, struct rdma_id_private, id), "sending MRA\n"); ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0); } mutex_unlock(&lock); mutex_unlock(&conn_id->handler_mutex); mutex_unlock(&listen_id->handler_mutex); cma_deref_id(conn_id); return 0; err3: cma_deref_id(conn_id); /* Destroy the CM ID by returning a non-zero value. */ conn_id->cm_id.ib = NULL; err2: cma_exch(conn_id, RDMA_CM_DESTROYING); mutex_unlock(&conn_id->handler_mutex); err1: mutex_unlock(&listen_id->handler_mutex); if (conn_id) rdma_destroy_id(&conn_id->id); return ret; } static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr) { return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr))); } static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr, struct ib_cm_compare_data *compare) { struct cma_hdr *cma_data, *cma_mask; struct sdp_hh *sdp_data, *sdp_mask; __be32 ip4_addr; struct in6_addr ip6_addr; memset(compare, 0, sizeof *compare); cma_data = (void *) compare->data; cma_mask = (void *) compare->mask; sdp_data = (void *) compare->data; sdp_mask = (void *) compare->mask; switch (addr->sa_family) { case AF_INET: ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr; if (ps == RDMA_PS_SDP) { sdp_set_ip_ver(sdp_data, 4); sdp_set_ip_ver(sdp_mask, 0xF); if (!cma_any_addr(addr)) { sdp_data->dst_addr.ip4.addr = ip4_addr; sdp_mask->dst_addr.ip4.addr = htonl(~0); } } else { cma_set_ip_ver(cma_data, 4); cma_set_ip_ver(cma_mask, 0xF); if (!cma_any_addr(addr)) { cma_data->dst_addr.ip4.addr = ip4_addr; cma_mask->dst_addr.ip4.addr = htonl(~0); } } break; case AF_INET6: ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr; if (ps == RDMA_PS_SDP) { sdp_set_ip_ver(sdp_data, 6); sdp_set_ip_ver(sdp_mask, 0xF); if (!cma_any_addr(addr)) { sdp_data->dst_addr.ip6 = ip6_addr; memset(&sdp_mask->dst_addr.ip6, 0xFF, sizeof(sdp_mask->dst_addr.ip6)); } } else { cma_set_ip_ver(cma_data, 6); cma_set_ip_ver(cma_mask, 0xF); if (!cma_any_addr(addr)) { cma_data->dst_addr.ip6 = ip6_addr; memset(&cma_mask->dst_addr.ip6, 0xFF, sizeof(cma_mask->dst_addr.ip6)); } } break; default: break; } } static int cma_iw_handler(struct 
iw_cm_id *iw_id, struct iw_cm_event *iw_event) { struct rdma_id_private *id_priv = iw_id->context; struct rdma_cm_event event; struct sockaddr_in *sin; int ret = 0; if (cma_disable_callback(id_priv, RDMA_CM_CONNECT)) return 0; memset(&event, 0, sizeof event); switch (iw_event->event) { case IW_CM_EVENT_CLOSE: event.event = RDMA_CM_EVENT_DISCONNECTED; break; case IW_CM_EVENT_CONNECT_REPLY: sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; *sin = iw_event->local_addr; sin = (struct sockaddr_in *) &id_priv->id.route.addr.dst_addr; *sin = iw_event->remote_addr; switch ((int)iw_event->status) { case 0: event.event = RDMA_CM_EVENT_ESTABLISHED; event.param.conn.initiator_depth = iw_event->ird; event.param.conn.responder_resources = iw_event->ord; break; case -ECONNRESET: case -ECONNREFUSED: event.event = RDMA_CM_EVENT_REJECTED; break; case -ETIMEDOUT: event.event = RDMA_CM_EVENT_UNREACHABLE; break; default: event.event = RDMA_CM_EVENT_CONNECT_ERROR; break; } break; case IW_CM_EVENT_ESTABLISHED: event.event = RDMA_CM_EVENT_ESTABLISHED; event.param.conn.initiator_depth = iw_event->ird; event.param.conn.responder_resources = iw_event->ord; break; default: BUG_ON(1); } event.status = iw_event->status; event.param.conn.private_data = iw_event->private_data; event.param.conn.private_data_len = iw_event->private_data_len; ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. */ id_priv->cm_id.iw = NULL; cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); rdma_destroy_id(&id_priv->id); return ret; } mutex_unlock(&id_priv->handler_mutex); return ret; } static int iw_conn_req_handler(struct iw_cm_id *cm_id, struct iw_cm_event *iw_event) { struct rdma_cm_id *new_cm_id; struct rdma_id_private *listen_id, *conn_id; struct sockaddr_in *sin; struct net_device *dev = NULL; struct rdma_cm_event event; int ret; struct ib_device_attr attr; listen_id = cm_id->context; if (cma_disable_callback(listen_id, RDMA_CM_LISTEN)) return -ECONNABORTED; /* Create a new RDMA id for the new IW CM ID */ new_cm_id = rdma_create_id(listen_id->id.event_handler, listen_id->id.context, RDMA_PS_TCP, IB_QPT_RC); if (IS_ERR(new_cm_id)) { ret = -ENOMEM; goto out; } conn_id = container_of(new_cm_id, struct rdma_id_private, id); mutex_lock_nested(&conn_id->handler_mutex, SINGLE_DEPTH_NESTING); conn_id->state = RDMA_CM_CONNECT; dev = ip_dev_find(&init_net, iw_event->local_addr.sin_addr.s_addr); if (!dev) { ret = -EADDRNOTAVAIL; mutex_unlock(&conn_id->handler_mutex); rdma_destroy_id(new_cm_id); goto out; } ret = rdma_copy_addr(&conn_id->id.route.addr.dev_addr, dev, NULL); if (ret) { mutex_unlock(&conn_id->handler_mutex); rdma_destroy_id(new_cm_id); goto out; } ret = cma_acquire_dev(conn_id); if (ret) { mutex_unlock(&conn_id->handler_mutex); rdma_destroy_id(new_cm_id); goto out; } conn_id->cm_id.iw = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_iw_handler; sin = (struct sockaddr_in *) &new_cm_id->route.addr.src_addr; *sin = iw_event->local_addr; sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr; *sin = iw_event->remote_addr; ret = ib_query_device(conn_id->id.device, &attr); if (ret) { mutex_unlock(&conn_id->handler_mutex); rdma_destroy_id(new_cm_id); goto out; } memset(&event, 0, sizeof event); event.event = RDMA_CM_EVENT_CONNECT_REQUEST; event.param.conn.private_data = iw_event->private_data; event.param.conn.private_data_len = iw_event->private_data_len; event.param.conn.initiator_depth = iw_event->ird; 
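	/*
	 * As in the IB request path above, the iWARP ird/ord values are
	 * surfaced to the consumer as initiator_depth/responder_resources
	 * on the RDMA_CM_EVENT_CONNECT_REQUEST event.
	 */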
event.param.conn.responder_resources = iw_event->ord; /* * Protect against the user destroying conn_id from another thread * until we're done accessing it. */ atomic_inc(&conn_id->refcount); ret = conn_id->id.event_handler(&conn_id->id, &event); if (ret) { /* User wants to destroy the CM ID */ conn_id->cm_id.iw = NULL; cma_exch(conn_id, RDMA_CM_DESTROYING); mutex_unlock(&conn_id->handler_mutex); cma_deref_id(conn_id); rdma_destroy_id(&conn_id->id); goto out; } mutex_unlock(&conn_id->handler_mutex); cma_deref_id(conn_id); out: if (dev) dev_put(dev); mutex_unlock(&listen_id->handler_mutex); return ret; } static int cma_ib_listen(struct rdma_id_private *id_priv) { struct ib_cm_compare_data compare_data; struct sockaddr *addr; struct ib_cm_id *id; __be64 svc_id; int ret; id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv); if (IS_ERR(id)) return PTR_ERR(id); id_priv->cm_id.ib = id; addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr; svc_id = cma_get_service_id(id_priv->id.ps, addr); if (cma_any_addr(addr) && !id_priv->afonly) ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL); else { cma_set_compare_data(id_priv->id.ps, addr, &compare_data); ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, &compare_data); } if (ret) { ib_destroy_cm_id(id_priv->cm_id.ib); id_priv->cm_id.ib = NULL; } return ret; } static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog) { int ret; struct sockaddr_in *sin; struct iw_cm_id *id; id = iw_create_cm_id(id_priv->id.device, id_priv->sock, iw_conn_req_handler, id_priv); if (IS_ERR(id)) return PTR_ERR(id); id_priv->cm_id.iw = id; sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; id_priv->cm_id.iw->local_addr = *sin; ret = iw_cm_listen(id_priv->cm_id.iw, backlog); if (ret) { iw_destroy_cm_id(id_priv->cm_id.iw); id_priv->cm_id.iw = NULL; } return ret; } static int cma_listen_handler(struct rdma_cm_id *id, struct rdma_cm_event *event) { struct rdma_id_private *id_priv = id->context; id->context = id_priv->id.context; id->event_handler = id_priv->id.event_handler; return id_priv->id.event_handler(id, event); } static void cma_listen_on_dev(struct rdma_id_private *id_priv, struct cma_device *cma_dev) { struct rdma_id_private *dev_id_priv; struct rdma_cm_id *id; int ret; id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps, id_priv->id.qp_type); if (IS_ERR(id)) return; dev_id_priv = container_of(id, struct rdma_id_private, id); dev_id_priv->state = RDMA_CM_ADDR_BOUND;
+	dev_id_priv->sock = id_priv->sock;
memcpy(&id->route.addr.src_addr, &id_priv->id.route.addr.src_addr, ip_addr_size((struct sockaddr *) &id_priv->id.route.addr.src_addr)); cma_attach_to_dev(dev_id_priv, cma_dev); list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list); atomic_inc(&id_priv->refcount); dev_id_priv->internal_id = 1; dev_id_priv->afonly = id_priv->afonly; ret = rdma_listen(id, id_priv->backlog); if (ret) cma_warn(id_priv, "cma_listen_on_dev, error %d, listening on device %s\n", ret, cma_dev->device->name); } static void cma_listen_on_all(struct rdma_id_private *id_priv) { struct cma_device *cma_dev; mutex_lock(&lock); list_add_tail(&id_priv->list, &listen_any_list); list_for_each_entry(cma_dev, &dev_list, list) cma_listen_on_dev(id_priv, cma_dev); mutex_unlock(&lock); } void rdma_set_service_type(struct rdma_cm_id *id, int tos) { struct rdma_id_private *id_priv; id_priv = container_of(id, struct rdma_id_private, id); id_priv->tos = (u8) tos; } EXPORT_SYMBOL(rdma_set_service_type); void rdma_set_timeout(struct rdma_cm_id *id,
int timeout) { struct rdma_id_private *id_priv; id_priv = container_of(id, struct rdma_id_private, id); id_priv->qp_timeout = (u8) timeout; } EXPORT_SYMBOL(rdma_set_timeout); static void cma_query_handler(int status, struct ib_sa_path_rec *path_rec, void *context) { struct cma_work *work = context; struct rdma_route *route; route = &work->id->id.route; if (!status) { route->num_paths = 1; *route->path_rec = *path_rec; } else { work->old_state = RDMA_CM_ROUTE_QUERY; work->new_state = RDMA_CM_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_ERROR; work->event.status = status; } queue_work(cma_wq, &work->work); } static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms, struct cma_work *work) { struct rdma_addr *addr = &id_priv->id.route.addr; struct ib_sa_path_rec path_rec; ib_sa_comp_mask comp_mask; struct sockaddr_in6 *sin6; memset(&path_rec, 0, sizeof path_rec); rdma_addr_get_sgid(&addr->dev_addr, &path_rec.sgid); rdma_addr_get_dgid(&addr->dev_addr, &path_rec.dgid); path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(&addr->dev_addr)); path_rec.numb_path = 1; path_rec.reversible = 1; path_rec.service_id = cma_get_service_id(id_priv->id.ps, (struct sockaddr *) &addr->dst_addr); comp_mask = IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID | IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH | IB_SA_PATH_REC_REVERSIBLE | IB_SA_PATH_REC_SERVICE_ID; if (addr->src_addr.ss_family == AF_INET) { path_rec.qos_class = cpu_to_be16((u16) id_priv->tos); comp_mask |= IB_SA_PATH_REC_QOS_CLASS; } else { sin6 = (struct sockaddr_in6 *) &addr->src_addr; path_rec.traffic_class = (u8) (be32_to_cpu(sin6->sin6_flowinfo) >> 20); comp_mask |= IB_SA_PATH_REC_TRAFFIC_CLASS; } id_priv->query_id = ib_sa_path_rec_get(&sa_client, id_priv->id.device, id_priv->id.port_num, &path_rec, comp_mask, timeout_ms, GFP_KERNEL, cma_query_handler, work, &id_priv->query); return (id_priv->query_id < 0) ? 
id_priv->query_id : 0; } static void cma_work_handler(struct work_struct *_work) { struct cma_work *work = container_of(_work, struct cma_work, work); struct rdma_id_private *id_priv = work->id; int destroy = 0; mutex_lock(&id_priv->handler_mutex); if (!cma_comp_exch(id_priv, work->old_state, work->new_state)) goto out; if (id_priv->id.event_handler(&id_priv->id, &work->event)) { cma_exch(id_priv, RDMA_CM_DESTROYING); destroy = 1; } out: mutex_unlock(&id_priv->handler_mutex); cma_deref_id(id_priv); if (destroy) rdma_destroy_id(&id_priv->id); kfree(work); } static void cma_ndev_work_handler(struct work_struct *_work) { struct cma_ndev_work *work = container_of(_work, struct cma_ndev_work, work); struct rdma_id_private *id_priv = work->id; int destroy = 0; mutex_lock(&id_priv->handler_mutex); if (id_priv->state == RDMA_CM_DESTROYING || id_priv->state == RDMA_CM_DEVICE_REMOVAL) goto out; if (id_priv->id.event_handler(&id_priv->id, &work->event)) { cma_exch(id_priv, RDMA_CM_DESTROYING); destroy = 1; } out: mutex_unlock(&id_priv->handler_mutex); cma_deref_id(id_priv); if (destroy) rdma_destroy_id(&id_priv->id); kfree(work); } static int cma_resolve_ib_route(struct rdma_id_private *id_priv, int timeout_ms) { struct rdma_route *route = &id_priv->id.route; struct cma_work *work; int ret; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); work->old_state = RDMA_CM_ROUTE_QUERY; work->new_state = RDMA_CM_ROUTE_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; route->path_rec = kmalloc(sizeof *route->path_rec, GFP_KERNEL); if (!route->path_rec) { ret = -ENOMEM; goto err1; } ret = cma_query_ib_route(id_priv, timeout_ms, work); if (ret) goto err2; return 0; err2: kfree(route->path_rec); route->path_rec = NULL; err1: kfree(work); return ret; } int rdma_set_ib_paths(struct rdma_cm_id *id, struct ib_sa_path_rec *path_rec, int num_paths) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ROUTE_RESOLVED)) return -EINVAL; id->route.path_rec = kmemdup(path_rec, sizeof *path_rec * num_paths, GFP_KERNEL); if (!id->route.path_rec) { ret = -ENOMEM; goto err; } id->route.num_paths = num_paths; return 0; err: cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_ADDR_RESOLVED); return ret; } EXPORT_SYMBOL(rdma_set_ib_paths); static int cma_resolve_iw_route(struct rdma_id_private *id_priv, int timeout_ms) { struct cma_work *work; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); work->old_state = RDMA_CM_ROUTE_QUERY; work->new_state = RDMA_CM_ROUTE_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; queue_work(cma_wq, &work->work); return 0; } static u8 tos_to_sl(u8 tos) { return def_prec2sl & 7; } static int cma_resolve_iboe_route(struct rdma_id_private *id_priv) { struct rdma_route *route = &id_priv->id.route; struct rdma_addr *addr = &route->addr; struct cma_work *work; int ret; struct sockaddr_in *src_addr = (struct sockaddr_in *)&route->addr.src_addr; struct sockaddr_in *dst_addr = (struct sockaddr_in *)&route->addr.dst_addr; struct net_device *ndev = NULL; if (src_addr->sin_family != dst_addr->sin_family) return -EINVAL; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); route->path_rec = kzalloc(sizeof *route->path_rec, GFP_KERNEL); if 
(!route->path_rec) { ret = -ENOMEM; goto err1; } route->num_paths = 1; if (addr->dev_addr.bound_dev_if) ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if); if (!ndev) { ret = -ENODEV; goto err2; } route->path_rec->vlan_id = rdma_vlan_dev_vlan_id(ndev); memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, ETH_ALEN); memcpy(route->path_rec->smac, IF_LLADDR(ndev), ndev->if_addrlen); rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr, &route->path_rec->sgid); rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr, &route->path_rec->dgid); route->path_rec->hop_limit = 1; route->path_rec->reversible = 1; route->path_rec->pkey = cpu_to_be16(0xffff); route->path_rec->mtu_selector = IB_SA_EQ; route->path_rec->sl = tos_to_sl(id_priv->tos); route->path_rec->mtu = iboe_get_mtu(ndev->if_mtu); route->path_rec->rate_selector = IB_SA_EQ; route->path_rec->rate = iboe_get_rate(ndev); dev_put(ndev); route->path_rec->packet_life_time_selector = IB_SA_EQ; route->path_rec->packet_life_time = CMA_IBOE_PACKET_LIFETIME; if (!route->path_rec->mtu) { ret = -EINVAL; goto err2; } work->old_state = RDMA_CM_ROUTE_QUERY; work->new_state = RDMA_CM_ROUTE_RESOLVED; work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED; work->event.status = 0; queue_work(cma_wq, &work->work); return 0; err2: kfree(route->path_rec); route->path_rec = NULL; err1: kfree(work); return ret; } int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ROUTE_QUERY)) return -EINVAL; atomic_inc(&id_priv->refcount); switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: switch (rdma_port_get_link_layer(id->device, id->port_num)) { case IB_LINK_LAYER_INFINIBAND: ret = cma_resolve_ib_route(id_priv, timeout_ms); break; case IB_LINK_LAYER_ETHERNET: ret = cma_resolve_iboe_route(id_priv); break; default: ret = -ENOSYS; } break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = cma_resolve_iw_route(id_priv, timeout_ms); break; default: ret = -ENOSYS; break; } if (ret) goto err; return 0; err: cma_comp_exch(id_priv, RDMA_CM_ROUTE_QUERY, RDMA_CM_ADDR_RESOLVED); cma_deref_id(id_priv); return ret; } EXPORT_SYMBOL(rdma_resolve_route); int rdma_enable_apm(struct rdma_cm_id *id, enum alt_path_type alt_type) { /* APM is not supported yet */ return -EINVAL; } EXPORT_SYMBOL(rdma_enable_apm); static int cma_bind_loopback(struct rdma_id_private *id_priv) { struct cma_device *cma_dev; struct ib_port_attr port_attr; union ib_gid gid; u16 pkey; int ret; u8 p; mutex_lock(&lock); if (list_empty(&dev_list)) { ret = -ENODEV; goto out; } list_for_each_entry(cma_dev, &dev_list, list) for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p) if (!ib_query_port(cma_dev->device, p, &port_attr) && port_attr.state == IB_PORT_ACTIVE) goto port_found; p = 1; cma_dev = list_entry(dev_list.next, struct cma_device, list); port_found: ret = ib_get_cached_gid(cma_dev->device, p, 0, &gid); if (ret) goto out; ret = ib_get_cached_pkey(cma_dev->device, p, 0, &pkey); if (ret) goto out; id_priv->id.route.addr.dev_addr.dev_type = (rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ? 
ARPHRD_INFINIBAND : ARPHRD_ETHER; rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid); ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey); id_priv->id.port_num = p; cma_attach_to_dev(id_priv, cma_dev); out: mutex_unlock(&lock); return ret; } static void addr_handler(int status, struct sockaddr *src_addr, struct rdma_dev_addr *dev_addr, void *context) { struct rdma_id_private *id_priv = context; struct rdma_cm_event event; memset(&event, 0, sizeof event); mutex_lock(&id_priv->handler_mutex); if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_RESOLVED)) goto out; memcpy(&id_priv->id.route.addr.src_addr, src_addr, ip_addr_size(src_addr)); if (!status && !id_priv->cma_dev) status = cma_acquire_dev(id_priv); if (status) { if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ADDR_BOUND)) goto out; event.event = RDMA_CM_EVENT_ADDR_ERROR; event.status = status; } else event.event = RDMA_CM_EVENT_ADDR_RESOLVED; if (id_priv->id.event_handler(&id_priv->id, &event)) { cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); cma_deref_id(id_priv); rdma_destroy_id(&id_priv->id); return; } out: mutex_unlock(&id_priv->handler_mutex); cma_deref_id(id_priv); } static int cma_resolve_loopback(struct rdma_id_private *id_priv) { struct cma_work *work; struct sockaddr *src, *dst; union ib_gid gid; int ret; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; if (!id_priv->cma_dev) { ret = cma_bind_loopback(id_priv); if (ret) goto err; } rdma_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid); rdma_addr_set_dgid(&id_priv->id.route.addr.dev_addr, &gid); src = (struct sockaddr *) &id_priv->id.route.addr.src_addr; if (cma_zero_addr(src)) { dst = (struct sockaddr *) &id_priv->id.route.addr.dst_addr; if ((src->sa_family = dst->sa_family) == AF_INET) { ((struct sockaddr_in *)src)->sin_addr = ((struct sockaddr_in *)dst)->sin_addr; } else { ((struct sockaddr_in6 *)src)->sin6_addr = ((struct sockaddr_in6 *)dst)->sin6_addr; } } work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); work->old_state = RDMA_CM_ADDR_QUERY; work->new_state = RDMA_CM_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED; queue_work(cma_wq, &work->work); return 0; err: kfree(work); return ret; } static int cma_resolve_scif(struct rdma_id_private *id_priv) { struct cma_work *work; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; /* we probably can leave it empty here */ work->id = id_priv; INIT_WORK(&work->work, cma_work_handler); work->old_state = RDMA_CM_ADDR_QUERY; work->new_state = RDMA_CM_ADDR_RESOLVED; work->event.event = RDMA_CM_EVENT_ADDR_RESOLVED; queue_work(cma_wq, &work->work); return 0; } static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr) { if (!src_addr || !src_addr->sa_family) { src_addr = (struct sockaddr *) &id->route.addr.src_addr; if ((src_addr->sa_family = dst_addr->sa_family) == AF_INET6) { ((struct sockaddr_in6 *) src_addr)->sin6_scope_id = ((struct sockaddr_in6 *) dst_addr)->sin6_scope_id; } } if (!cma_any_addr(src_addr)) return rdma_bind_addr(id, src_addr); else { struct sockaddr_in addr_in; memset(&addr_in, 0, sizeof addr_in); addr_in.sin_family = dst_addr->sa_family; addr_in.sin_len = sizeof addr_in; return rdma_bind_addr(id, (struct sockaddr *) &addr_in); } } int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr, int timeout_ms) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, 
id); if (id_priv->state == RDMA_CM_IDLE) { ret = cma_bind_addr(id, src_addr, dst_addr); if (ret) return ret; } if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) return -EINVAL; atomic_inc(&id_priv->refcount); memcpy(&id->route.addr.dst_addr, dst_addr, ip_addr_size(dst_addr)); if (cma_any_addr(dst_addr)) ret = cma_resolve_loopback(id_priv); else if (id_priv->id.device && rdma_node_get_transport(id_priv->id.device->node_type) == RDMA_TRANSPORT_SCIF) ret = cma_resolve_scif(id_priv); else ret = rdma_resolve_ip(&addr_client, (struct sockaddr *) &id->route.addr.src_addr, dst_addr, &id->route.addr.dev_addr, timeout_ms, addr_handler, id_priv); if (ret) goto err; return 0; err: cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_BOUND); cma_deref_id(id_priv); return ret; } EXPORT_SYMBOL(rdma_resolve_addr); int rdma_set_reuseaddr(struct rdma_cm_id *id, int reuse) { struct rdma_id_private *id_priv; unsigned long flags; int ret; id_priv = container_of(id, struct rdma_id_private, id); spin_lock_irqsave(&id_priv->lock, flags); if (id_priv->state == RDMA_CM_IDLE) { id_priv->reuseaddr = reuse; ret = 0; } else { ret = -EINVAL; } spin_unlock_irqrestore(&id_priv->lock, flags); return ret; } EXPORT_SYMBOL(rdma_set_reuseaddr); int rdma_set_afonly(struct rdma_cm_id *id, int afonly) { struct rdma_id_private *id_priv; unsigned long flags; int ret; id_priv = container_of(id, struct rdma_id_private, id); spin_lock_irqsave(&id_priv->lock, flags); if (id_priv->state == RDMA_CM_IDLE || id_priv->state == RDMA_CM_ADDR_BOUND) { id_priv->options |= (1 << CMA_OPTION_AFONLY); id_priv->afonly = afonly; ret = 0; } else { ret = -EINVAL; } spin_unlock_irqrestore(&id_priv->lock, flags); return ret; } EXPORT_SYMBOL(rdma_set_afonly); static void cma_bind_port(struct rdma_bind_list *bind_list, struct rdma_id_private *id_priv) { struct sockaddr_in *sin; sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr; sin->sin_port = htons(bind_list->port); id_priv->bind_list = bind_list; hlist_add_head(&id_priv->node, &bind_list->owners); } static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv, unsigned short snum) { struct rdma_bind_list *bind_list; int port, ret; bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL); if (!bind_list) return -ENOMEM; do { ret = idr_get_new_above(ps, bind_list, snum, &port); } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL)); if (ret) goto err1; if (port != snum) { ret = -EADDRNOTAVAIL; goto err2; } bind_list->ps = ps; bind_list->port = (unsigned short) port; cma_bind_port(bind_list, id_priv); return 0; err2: idr_remove(ps, port); err1: kfree(bind_list); return ret; } static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv) { static unsigned int last_used_port; int low, high, remaining; unsigned int rover; inet_get_local_port_range(&low, &high); remaining = (high - low) + 1; rover = random() % remaining + low; retry: if (last_used_port != rover && !idr_find(ps, (unsigned short) rover)) { int ret = cma_alloc_port(ps, id_priv, rover); /* * Remember previously used port number in order to avoid * re-using same port immediately after it is closed. */ if (!ret) last_used_port = rover; if (ret != -EADDRNOTAVAIL) return ret; } if (--remaining) { rover++; if ((rover < low) || (rover > high)) rover = low; goto retry; } return -EADDRNOTAVAIL; } /* * Check that the requested port is available. This is called when trying to * bind to a specific port, or when trying to listen on a bound port. 
In * the latter case, the provided id_priv may already be on the bind_list, but * we still need to check that it's okay to start listening. */ static int cma_check_port(struct rdma_bind_list *bind_list, struct rdma_id_private *id_priv, uint8_t reuseaddr) { struct rdma_id_private *cur_id; struct sockaddr *addr, *cur_addr; addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr; hlist_for_each_entry(cur_id, &bind_list->owners, node) { if (id_priv == cur_id) continue; if ((cur_id->state != RDMA_CM_LISTEN) && reuseaddr && cur_id->reuseaddr) continue; cur_addr = (struct sockaddr *) &cur_id->id.route.addr.src_addr; if (id_priv->afonly && cur_id->afonly && (addr->sa_family != cur_addr->sa_family)) continue; if (cma_any_addr(addr) || cma_any_addr(cur_addr)) return -EADDRNOTAVAIL; if (!cma_addr_cmp(addr, cur_addr)) return -EADDRINUSE; } return 0; } static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv) { struct rdma_bind_list *bind_list; unsigned short snum; int ret; snum = ntohs(cma_port((struct sockaddr *) &id_priv->id.route.addr.src_addr)); bind_list = idr_find(ps, snum); if (!bind_list) { ret = cma_alloc_port(ps, id_priv, snum); } else { ret = cma_check_port(bind_list, id_priv, id_priv->reuseaddr); if (!ret) cma_bind_port(bind_list, id_priv); } return ret; } static int cma_bind_listen(struct rdma_id_private *id_priv) { struct rdma_bind_list *bind_list = id_priv->bind_list; int ret = 0; mutex_lock(&lock); if (bind_list->owners.first->next) ret = cma_check_port(bind_list, id_priv, 0); mutex_unlock(&lock); return ret; } static int cma_get_tcp_port(struct rdma_id_private *id_priv) { int ret; int size; struct socket *sock; ret = sock_create_kern(AF_INET, SOCK_STREAM, IPPROTO_TCP, &sock); if (ret) return ret; #ifdef __linux__ ret = sock->ops->bind(sock, (struct sockaddr *) &id_priv->id.route.addr.src_addr, ip_addr_size((struct sockaddr *) &id_priv->id.route.addr.src_addr)); #else ret = -sobind(sock, (struct sockaddr *)&id_priv->id.route.addr.src_addr, curthread); #endif if (ret) { sock_release(sock); return ret; } size = ip_addr_size((struct sockaddr *) &id_priv->id.route.addr.src_addr); ret = sock_getname(sock, (struct sockaddr *) &id_priv->id.route.addr.src_addr, &size, 0); if (ret) { sock_release(sock); return ret; } id_priv->sock = sock; return 0; } static int cma_get_port(struct rdma_id_private *id_priv) { struct idr *ps; int ret; switch (id_priv->id.ps) { case RDMA_PS_SDP: ps = &sdp_ps; break; case RDMA_PS_TCP: ps = &tcp_ps; if (unify_tcp_port_space) { ret = cma_get_tcp_port(id_priv); if (ret) goto out; } break; case RDMA_PS_UDP: ps = &udp_ps; break; case RDMA_PS_IPOIB: ps = &ipoib_ps; break; case RDMA_PS_IB: ps = &ib_ps; break; default: return -EPROTONOSUPPORT; } mutex_lock(&lock); if (cma_any_port((struct sockaddr *) &id_priv->id.route.addr.src_addr)) ret = cma_alloc_any_port(ps, id_priv); else ret = cma_use_port(ps, id_priv); mutex_unlock(&lock); out: return ret; } static int cma_check_linklocal(struct rdma_dev_addr *dev_addr, struct sockaddr *addr) { #if defined(INET6) struct sockaddr_in6 *sin6; if (addr->sa_family != AF_INET6) return 0; sin6 = (struct sockaddr_in6 *) addr; if (IN6_IS_SCOPE_LINKLOCAL(&sin6->sin6_addr) && !sin6->sin6_scope_id) return -EINVAL; dev_addr->bound_dev_if = sin6->sin6_scope_id; #endif return 0; } int rdma_listen(struct rdma_cm_id *id, int backlog) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (id_priv->state == RDMA_CM_IDLE) { ((struct sockaddr *) &id->route.addr.src_addr)->sa_family = 
AF_INET; ret = rdma_bind_addr(id, (struct sockaddr *) &id->route.addr.src_addr); if (ret) return ret; } if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_LISTEN)) return -EINVAL; if (id_priv->reuseaddr) { ret = cma_bind_listen(id_priv); if (ret) goto err; } id_priv->backlog = backlog; if (id->device) { switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: ret = cma_ib_listen(id_priv); if (ret) goto err; break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = cma_iw_listen(id_priv, backlog); if (ret) goto err; break; default: ret = -ENOSYS; goto err; } } else cma_listen_on_all(id_priv); return 0; err: id_priv->backlog = 0; cma_comp_exch(id_priv, RDMA_CM_LISTEN, RDMA_CM_ADDR_BOUND); return ret; } EXPORT_SYMBOL(rdma_listen); int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; int ret; #if defined(INET6) int ipv6only; size_t var_size = sizeof(int); #endif if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) return -EAFNOSUPPORT; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BOUND)) return -EINVAL; ret = cma_check_linklocal(&id->route.addr.dev_addr, addr); if (ret) goto err1; memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr)); if (!cma_any_addr(addr)) { ret = rdma_translate_ip(addr, &id->route.addr.dev_addr, NULL); if (ret) goto err1; ret = cma_acquire_dev(id_priv); if (ret) goto err1; } if (!(id_priv->options & (1 << CMA_OPTION_AFONLY))) { if (addr->sa_family == AF_INET) id_priv->afonly = 1; #if defined(INET6) else if (addr->sa_family == AF_INET6) id_priv->afonly = kernel_sysctlbyname(&thread0, "net.inet6.ip6.v6only", &ipv6only, &var_size, NULL, 0, NULL, 0); #endif } ret = cma_get_port(id_priv); if (ret) goto err2; return 0; err2: if (id_priv->cma_dev) cma_release_dev(id_priv); err1: cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE); return ret; } EXPORT_SYMBOL(rdma_bind_addr); static int cma_format_hdr(void *hdr, enum rdma_port_space ps, struct rdma_route *route) { struct cma_hdr *cma_hdr; struct sdp_hh *sdp_hdr; if (route->addr.src_addr.ss_family == AF_INET) { struct sockaddr_in *src4, *dst4; src4 = (struct sockaddr_in *) &route->addr.src_addr; dst4 = (struct sockaddr_in *) &route->addr.dst_addr; switch (ps) { case RDMA_PS_SDP: sdp_hdr = hdr; if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) return -EINVAL; sdp_set_ip_ver(sdp_hdr, 4); sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; sdp_hdr->port = src4->sin_port; break; default: cma_hdr = hdr; cma_hdr->cma_version = CMA_VERSION; cma_set_ip_ver(cma_hdr, 4); cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; cma_hdr->port = src4->sin_port; break; } } else { struct sockaddr_in6 *src6, *dst6; src6 = (struct sockaddr_in6 *) &route->addr.src_addr; dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr; switch (ps) { case RDMA_PS_SDP: sdp_hdr = hdr; if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) return -EINVAL; sdp_set_ip_ver(sdp_hdr, 6); sdp_hdr->src_addr.ip6 = src6->sin6_addr; sdp_hdr->dst_addr.ip6 = dst6->sin6_addr; sdp_hdr->port = src6->sin6_port; break; default: cma_hdr = hdr; cma_hdr->cma_version = CMA_VERSION; cma_set_ip_ver(cma_hdr, 6); cma_hdr->src_addr.ip6 = src6->sin6_addr; cma_hdr->dst_addr.ip6 = dst6->sin6_addr; cma_hdr->port = src6->sin6_port; break; } } return 0; } static int cma_sidr_rep_handler(struct ib_cm_id *cm_id, struct 
ib_cm_event *ib_event) { struct rdma_id_private *id_priv = cm_id->context; struct rdma_cm_event event; struct ib_cm_sidr_rep_event_param *rep = &ib_event->param.sidr_rep_rcvd; int ret = 0; if (cma_disable_callback(id_priv, RDMA_CM_CONNECT)) return 0; memset(&event, 0, sizeof event); switch (ib_event->event) { case IB_CM_SIDR_REQ_ERROR: event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = -ETIMEDOUT; break; case IB_CM_SIDR_REP_RECEIVED: event.param.ud.private_data = ib_event->private_data; event.param.ud.private_data_len = IB_CM_SIDR_REP_PRIVATE_DATA_SIZE; if (rep->status != IB_SIDR_SUCCESS) { event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = ib_event->param.sidr_rep_rcvd.status; break; } ret = cma_set_qkey(id_priv); if (ret) { event.event = RDMA_CM_EVENT_ADDR_ERROR; event.status = -EINVAL; break; } if (id_priv->qkey != rep->qkey) { event.event = RDMA_CM_EVENT_UNREACHABLE; event.status = -EINVAL; break; } ib_init_ah_from_path(id_priv->id.device, id_priv->id.port_num, id_priv->id.route.path_rec, &event.param.ud.ah_attr); event.param.ud.qp_num = rep->qpn; event.param.ud.qkey = rep->qkey; event.event = RDMA_CM_EVENT_ESTABLISHED; event.status = 0; break; default: printk(KERN_ERR "RDMA CMA: unexpected IB CM event: %d\n", ib_event->event); goto out; } ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { /* Destroy the CM ID by returning a non-zero value. */ id_priv->cm_id.ib = NULL; cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); rdma_destroy_id(&id_priv->id); return ret; } out: mutex_unlock(&id_priv->handler_mutex); return ret; } static int cma_resolve_ib_udp(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_cm_sidr_req_param req; struct rdma_route *route; struct ib_cm_id *id; int ret; req.private_data_len = sizeof(struct cma_hdr) + conn_param->private_data_len; if (req.private_data_len < conn_param->private_data_len) return -EINVAL; req.private_data = kzalloc(req.private_data_len, GFP_ATOMIC); if (!req.private_data) return -ENOMEM; if (conn_param->private_data && conn_param->private_data_len) memcpy((void *) req.private_data + sizeof(struct cma_hdr), conn_param->private_data, conn_param->private_data_len); route = &id_priv->id.route; ret = cma_format_hdr((void *) req.private_data, id_priv->id.ps, route); if (ret) goto out; id = ib_create_cm_id(id_priv->id.device, cma_sidr_rep_handler, id_priv); if (IS_ERR(id)) { ret = PTR_ERR(id); goto out; } id_priv->cm_id.ib = id; req.path = route->path_rec; req.service_id = cma_get_service_id(id_priv->id.ps, (struct sockaddr *) &route->addr.dst_addr); req.timeout_ms = 1 << (cma_response_timeout - 8); req.max_cm_retries = CMA_MAX_CM_RETRIES; cma_dbg(id_priv, "sending SIDR\n"); ret = ib_send_cm_sidr_req(id_priv->cm_id.ib, &req); if (ret) { ib_destroy_cm_id(id_priv->cm_id.ib); id_priv->cm_id.ib = NULL; } out: kfree(req.private_data); return ret; } static int cma_connect_ib(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_cm_req_param req; struct rdma_route *route; void *private_data; struct ib_cm_id *id; int offset, ret; memset(&req, 0, sizeof req); offset = cma_user_data_offset(id_priv->id.ps); req.private_data_len = offset + conn_param->private_data_len; if (req.private_data_len < conn_param->private_data_len) return -EINVAL; private_data = kzalloc(req.private_data_len, GFP_ATOMIC); if (!private_data) return -ENOMEM; if (conn_param->private_data && conn_param->private_data_len) memcpy(private_data + offset, conn_param->private_data, 
conn_param->private_data_len); id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv); if (IS_ERR(id)) { ret = PTR_ERR(id); goto out; } id_priv->cm_id.ib = id; route = &id_priv->id.route; ret = cma_format_hdr(private_data, id_priv->id.ps, route); if (ret) goto out; req.private_data = private_data; req.primary_path = &route->path_rec[0]; if (route->num_paths == 2) req.alternate_path = &route->path_rec[1]; req.service_id = cma_get_service_id(id_priv->id.ps, (struct sockaddr *) &route->addr.dst_addr); req.qp_num = id_priv->qp_num; req.qp_type = id_priv->id.qp_type; req.starting_psn = id_priv->seq_num; req.responder_resources = conn_param->responder_resources; req.initiator_depth = conn_param->initiator_depth; req.flow_control = conn_param->flow_control; req.retry_count = min_t(u8, 7, conn_param->retry_count); req.rnr_retry_count = min_t(u8, 7, conn_param->rnr_retry_count); req.remote_cm_response_timeout = cma_response_timeout; req.local_cm_response_timeout = cma_response_timeout; req.max_cm_retries = CMA_MAX_CM_RETRIES; req.srq = id_priv->srq ? 1 : 0; cma_dbg(id_priv, "sending REQ\n"); ret = ib_send_cm_req(id_priv->cm_id.ib, &req); out: if (ret && !IS_ERR(id)) { ib_destroy_cm_id(id); id_priv->cm_id.ib = NULL; } kfree(private_data); return ret; } static int cma_connect_iw(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct iw_cm_id *cm_id; struct sockaddr_in* sin; int ret; struct iw_cm_conn_param iw_param; cm_id = iw_create_cm_id(id_priv->id.device, id_priv->sock, cma_iw_handler, id_priv); if (IS_ERR(cm_id)) return PTR_ERR(cm_id); id_priv->cm_id.iw = cm_id; sin = (struct sockaddr_in*) &id_priv->id.route.addr.src_addr; cm_id->local_addr = *sin; sin = (struct sockaddr_in*) &id_priv->id.route.addr.dst_addr; cm_id->remote_addr = *sin; ret = cma_modify_qp_rtr(id_priv, conn_param); if (ret) goto out; if (conn_param) { iw_param.ord = conn_param->initiator_depth; iw_param.ird = conn_param->responder_resources; iw_param.private_data = conn_param->private_data; iw_param.private_data_len = conn_param->private_data_len; iw_param.qpn = id_priv->id.qp ? 
id_priv->qp_num : conn_param->qp_num; } else { memset(&iw_param, 0, sizeof iw_param); iw_param.qpn = id_priv->qp_num; } ret = iw_cm_connect(cm_id, &iw_param); out: if (ret) { iw_destroy_cm_id(cm_id); id_priv->cm_id.iw = NULL; } return ret; } int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT)) return -EINVAL; if (!id->qp) { id_priv->qp_num = conn_param->qp_num; id_priv->srq = conn_param->srq; } switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: if (id->qp_type == IB_QPT_UD) ret = cma_resolve_ib_udp(id_priv, conn_param); else ret = cma_connect_ib(id_priv, conn_param); break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = cma_connect_iw(id_priv, conn_param); break; default: ret = -ENOSYS; break; } if (ret) goto err; return 0; err: cma_comp_exch(id_priv, RDMA_CM_CONNECT, RDMA_CM_ROUTE_RESOLVED); return ret; } EXPORT_SYMBOL(rdma_connect); static int cma_accept_ib(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct ib_cm_rep_param rep; int ret; ret = cma_modify_qp_rtr(id_priv, conn_param); if (ret) goto out; ret = cma_modify_qp_rts(id_priv, conn_param); if (ret) goto out; memset(&rep, 0, sizeof rep); rep.qp_num = id_priv->qp_num; rep.starting_psn = id_priv->seq_num; rep.private_data = conn_param->private_data; rep.private_data_len = conn_param->private_data_len; rep.responder_resources = conn_param->responder_resources; rep.initiator_depth = conn_param->initiator_depth; rep.failover_accepted = 0; rep.flow_control = conn_param->flow_control; rep.rnr_retry_count = min_t(u8, 7, conn_param->rnr_retry_count); rep.srq = id_priv->srq ? 
1 : 0; cma_dbg(id_priv, "sending REP\n"); ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep); out: return ret; } static int cma_accept_iw(struct rdma_id_private *id_priv, struct rdma_conn_param *conn_param) { struct iw_cm_conn_param iw_param; int ret; if (!conn_param) return -EINVAL; ret = cma_modify_qp_rtr(id_priv, conn_param); if (ret) return ret; iw_param.ord = conn_param->initiator_depth; iw_param.ird = conn_param->responder_resources; iw_param.private_data = conn_param->private_data; iw_param.private_data_len = conn_param->private_data_len; if (id_priv->id.qp) { iw_param.qpn = id_priv->qp_num; } else iw_param.qpn = conn_param->qp_num; return iw_cm_accept(id_priv->cm_id.iw, &iw_param); } static int cma_send_sidr_rep(struct rdma_id_private *id_priv, enum ib_cm_sidr_status status, const void *private_data, int private_data_len) { struct ib_cm_sidr_rep_param rep; int ret; memset(&rep, 0, sizeof rep); rep.status = status; if (status == IB_SIDR_SUCCESS) { ret = cma_set_qkey(id_priv); if (ret) return ret; rep.qp_num = id_priv->qp_num; rep.qkey = id_priv->qkey; } rep.private_data = private_data; rep.private_data_len = private_data_len; cma_dbg(id_priv, "sending SIDR\n"); return ib_send_cm_sidr_rep(id_priv->cm_id.ib, &rep); } int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); id_priv->owner = curthread->td_proc->p_pid; if (!cma_comp(id_priv, RDMA_CM_CONNECT)) return -EINVAL; if (!id->qp && conn_param) { id_priv->qp_num = conn_param->qp_num; id_priv->srq = conn_param->srq; } switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: if (id->qp_type == IB_QPT_UD) { if (conn_param) ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS, conn_param->private_data, conn_param->private_data_len); else ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS, NULL, 0); } else { if (conn_param) ret = cma_accept_ib(id_priv, conn_param); else ret = cma_rep_recv(id_priv); } break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = cma_accept_iw(id_priv, conn_param); break; default: ret = -ENOSYS; break; } if (ret) goto reject; return 0; reject: cma_modify_qp_err(id_priv); rdma_reject(id, NULL, 0); return ret; } EXPORT_SYMBOL(rdma_accept); int rdma_notify(struct rdma_cm_id *id, enum ib_event_type event) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!id_priv->cm_id.ib) return -EINVAL; switch (id->device->node_type) { case RDMA_NODE_IB_CA: ret = ib_cm_notify(id_priv->cm_id.ib, event); break; default: ret = 0; break; } return ret; } EXPORT_SYMBOL(rdma_notify); int rdma_reject(struct rdma_cm_id *id, const void *private_data, u8 private_data_len) { struct rdma_id_private *id_priv; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!id_priv->cm_id.ib) return -EINVAL; switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: if (id->qp_type == IB_QPT_UD) ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, private_data, private_data_len); else { cma_dbg(id_priv, "sending REJ\n"); ret = ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); } break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = iw_cm_reject(id_priv->cm_id.iw, private_data, private_data_len); break; default: ret = -ENOSYS; break; } return ret; } EXPORT_SYMBOL(rdma_reject); int rdma_disconnect(struct rdma_cm_id *id) { struct rdma_id_private *id_priv; int ret; 
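	/*
	 * For orientation, a minimal active-side sketch of the rdma_cm
	 * call sequence that ends up here (illustrative only: error
	 * handling and the event-driven hand-offs are omitted, and
	 * "my_handler"/"dst"/"conn_param" are placeholders, not part of
	 * this file):
	 *
	 *	id = rdma_create_id(my_handler, ctx, RDMA_PS_TCP, IB_QPT_RC);
	 *	rdma_resolve_addr(id, NULL, dst, 2000);
	 *	rdma_resolve_route(id, 2000);	(on RDMA_CM_EVENT_ADDR_RESOLVED)
	 *	rdma_connect(id, &conn_param);	(on RDMA_CM_EVENT_ROUTE_RESOLVED)
	 *	...
	 *	rdma_disconnect(id);
	 *	rdma_destroy_id(id);
	 */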
id_priv = container_of(id, struct rdma_id_private, id); if (!id_priv->cm_id.ib) return -EINVAL; switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: ret = cma_modify_qp_err(id_priv); if (ret) goto out; /* Initiate or respond to a disconnect. */ cma_dbg(id_priv, "sending DREQ\n"); if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0)) { cma_dbg(id_priv, "sending DREP\n"); ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0); } break; case RDMA_TRANSPORT_IWARP: case RDMA_TRANSPORT_SCIF: ret = iw_cm_disconnect(id_priv->cm_id.iw, 0); break; default: ret = -EINVAL; break; } out: return ret; } EXPORT_SYMBOL(rdma_disconnect); static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast) { struct rdma_id_private *id_priv; struct cma_multicast *mc = multicast->context; struct rdma_cm_event event; struct rdma_dev_addr *dev_addr; int ret; struct net_device *ndev = NULL; u16 vlan; id_priv = mc->id_priv; dev_addr = &id_priv->id.route.addr.dev_addr; if (cma_disable_callback(id_priv, RDMA_CM_ADDR_BOUND) && cma_disable_callback(id_priv, RDMA_CM_ADDR_RESOLVED)) return 0; mutex_lock(&id_priv->qp_mutex); if (!status && id_priv->id.qp) status = ib_attach_mcast(id_priv->id.qp, &multicast->rec.mgid, be16_to_cpu(multicast->rec.mlid)); mutex_unlock(&id_priv->qp_mutex); memset(&event, 0, sizeof event); event.status = status; event.param.ud.private_data = mc->context; ndev = dev_get_by_index(&init_net, dev_addr->bound_dev_if); if (!ndev) { status = -ENODEV; } else { vlan = rdma_vlan_dev_vlan_id(ndev); dev_put(ndev); } if (!status) { event.event = RDMA_CM_EVENT_MULTICAST_JOIN; ib_init_ah_from_mcmember(id_priv->id.device, id_priv->id.port_num, &multicast->rec, &event.param.ud.ah_attr); event.param.ud.ah_attr.vlan_id = vlan; event.param.ud.qp_num = 0xFFFFFF; event.param.ud.qkey = be32_to_cpu(multicast->rec.qkey); } else { event.event = RDMA_CM_EVENT_MULTICAST_ERROR; /* mark that the cached record is no longer valid */ if (status != -ENETRESET && status != -EAGAIN) { spin_lock(&id_priv->lock); id_priv->is_valid_rec = 0; spin_unlock(&id_priv->lock); } } ret = id_priv->id.event_handler(&id_priv->id, &event); if (ret) { cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); rdma_destroy_id(&id_priv->id); return 0; } mutex_unlock(&id_priv->handler_mutex); return 0; } static void cma_set_mgid(struct rdma_id_private *id_priv, struct sockaddr *addr, union ib_gid *mgid) { unsigned char mc_map[MAX_ADDR_LEN]; struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; struct sockaddr_in *sin = (struct sockaddr_in *) addr; #if defined(INET6) struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) addr; #endif if (cma_any_addr(addr)) { memset(mgid, 0, sizeof *mgid); #if defined(INET6) } else if ((addr->sa_family == AF_INET6) && ((be32_to_cpu(sin6->sin6_addr.s6_addr32[0]) & 0xFFF0FFFF) == 0xFF10A01B)) { /* IPv6 address is an SA assigned MGID. 
*/ memcpy(mgid, &sin6->sin6_addr, sizeof *mgid); } else if (addr->sa_family == AF_INET6) { ipv6_ib_mc_map(&sin6->sin6_addr, dev_addr->broadcast, mc_map); if (id_priv->id.ps == RDMA_PS_UDP) mc_map[7] = 0x01; /* Use RDMA CM signature */ *mgid = *(union ib_gid *) (mc_map + 4); #endif } else { ip_ib_mc_map(sin->sin_addr.s_addr, dev_addr->broadcast, mc_map); if (id_priv->id.ps == RDMA_PS_UDP) mc_map[7] = 0x01; /* Use RDMA CM signature */ *mgid = *(union ib_gid *) (mc_map + 4); } } static int cma_join_ib_multicast(struct rdma_id_private *id_priv, struct cma_multicast *mc) { struct ib_sa_mcmember_rec rec; struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; ib_sa_comp_mask comp_mask; int ret = 0; ib_addr_get_mgid(dev_addr, &id_priv->rec.mgid); /* cache ipoib bc record */ spin_lock(&id_priv->lock); if (!id_priv->is_valid_rec) ret = ib_sa_get_mcmember_rec(id_priv->id.device, id_priv->id.port_num, &id_priv->rec.mgid, &id_priv->rec); if (ret) { id_priv->is_valid_rec = 0; spin_unlock(&id_priv->lock); return ret; } else { rec = id_priv->rec; id_priv->is_valid_rec = 1; } spin_unlock(&id_priv->lock); cma_set_mgid(id_priv, (struct sockaddr *) &mc->addr, &rec.mgid); if (id_priv->id.ps == RDMA_PS_UDP) rec.qkey = cpu_to_be32(RDMA_UDP_QKEY); rdma_addr_get_sgid(dev_addr, &rec.port_gid); rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr)); rec.join_state = 1; comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID | IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE | IB_SA_MCMEMBER_REC_QKEY | IB_SA_MCMEMBER_REC_SL | IB_SA_MCMEMBER_REC_FLOW_LABEL | IB_SA_MCMEMBER_REC_TRAFFIC_CLASS; if (id_priv->id.ps == RDMA_PS_IPOIB) comp_mask |= IB_SA_MCMEMBER_REC_RATE | IB_SA_MCMEMBER_REC_RATE_SELECTOR | IB_SA_MCMEMBER_REC_MTU_SELECTOR | IB_SA_MCMEMBER_REC_MTU | IB_SA_MCMEMBER_REC_HOP_LIMIT; mc->multicast.ib = ib_sa_join_multicast(&sa_client, id_priv->id.device, id_priv->id.port_num, &rec, comp_mask, GFP_KERNEL, cma_ib_mc_handler, mc); return PTR_RET(mc->multicast.ib); } static void iboe_mcast_work_handler(struct work_struct *work) { struct iboe_mcast_work *mw = container_of(work, struct iboe_mcast_work, work); struct cma_multicast *mc = mw->mc; struct ib_sa_multicast *m = mc->multicast.ib; mc->multicast.ib->context = mc; cma_ib_mc_handler(0, m); kref_put(&mc->mcref, release_mc); kfree(mw); } static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid) { struct sockaddr_in *sin = (struct sockaddr_in *)addr; struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr; if (cma_any_addr(addr)) { memset(mgid, 0, sizeof *mgid); } else if (addr->sa_family == AF_INET6) { memcpy(mgid, &sin6->sin6_addr, sizeof *mgid); } else { mgid->raw[0] = 0xff; mgid->raw[1] = 0x0e; mgid->raw[2] = 0; mgid->raw[3] = 0; mgid->raw[4] = 0; mgid->raw[5] = 0; mgid->raw[6] = 0; mgid->raw[7] = 0; mgid->raw[8] = 0; mgid->raw[9] = 0; mgid->raw[10] = 0xff; mgid->raw[11] = 0xff; *(__be32 *)(&mgid->raw[12]) = sin->sin_addr.s_addr; } } static int cma_iboe_join_multicast(struct rdma_id_private *id_priv, struct cma_multicast *mc) { struct iboe_mcast_work *work; struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; int err; struct sockaddr *addr = (struct sockaddr *)&mc->addr; struct net_device *ndev = NULL; if (cma_zero_addr((struct sockaddr *)&mc->addr)) return -EINVAL; work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; mc->multicast.ib = kzalloc(sizeof(struct ib_sa_multicast), GFP_KERNEL); if (!mc->multicast.ib) { err = -ENOMEM; goto out1; } cma_iboe_set_mgid(addr, &mc->multicast.ib->rec.mgid); 
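	/*
	 * A worked example of the IPv4 mapping done by cma_iboe_set_mgid()
	 * above (the address is illustrative, not taken from this file):
	 * joining 239.1.1.1 (0xef010101) yields the MGID
	 *
	 *	ff0e:0000:0000:0000:0000:0000:ffff:ef01:0101
	 *
	 * i.e. the ff0e::/16 multicast prefix with 0xffff and the group
	 * address in the low bytes; IPv6 group addresses are copied into
	 * the MGID verbatim.
	 */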
mc->multicast.ib->rec.pkey = cpu_to_be16(0xffff); if (id_priv->id.ps == RDMA_PS_UDP) mc->multicast.ib->rec.qkey = cpu_to_be32(RDMA_UDP_QKEY); if (dev_addr->bound_dev_if) ndev = dev_get_by_index(&init_net, dev_addr->bound_dev_if); if (!ndev) { err = -ENODEV; goto out2; } mc->multicast.ib->rec.rate = iboe_get_rate(ndev); mc->multicast.ib->rec.hop_limit = 1; mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->if_mtu); dev_put(ndev); if (!mc->multicast.ib->rec.mtu) { err = -EINVAL; goto out2; } rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr, &mc->multicast.ib->rec.port_gid); work->id = id_priv; work->mc = mc; INIT_WORK(&work->work, iboe_mcast_work_handler); kref_get(&mc->mcref); queue_work(cma_wq, &work->work); return 0; out2: kfree(mc->multicast.ib); out1: kfree(work); return err; } int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, void *context) { struct rdma_id_private *id_priv; struct cma_multicast *mc; int ret; id_priv = container_of(id, struct rdma_id_private, id); if (!cma_comp(id_priv, RDMA_CM_ADDR_BOUND) && !cma_comp(id_priv, RDMA_CM_ADDR_RESOLVED)) return -EINVAL; mc = kmalloc(sizeof *mc, GFP_KERNEL); if (!mc) return -ENOMEM; memcpy(&mc->addr, addr, ip_addr_size(addr)); mc->context = context; mc->id_priv = id_priv; spin_lock(&id_priv->lock); list_add(&mc->list, &id_priv->mc_list); spin_unlock(&id_priv->lock); switch (rdma_node_get_transport(id->device->node_type)) { case RDMA_TRANSPORT_IB: switch (rdma_port_get_link_layer(id->device, id->port_num)) { case IB_LINK_LAYER_INFINIBAND: ret = cma_join_ib_multicast(id_priv, mc); break; case IB_LINK_LAYER_ETHERNET: kref_init(&mc->mcref); ret = cma_iboe_join_multicast(id_priv, mc); break; default: ret = -EINVAL; } break; default: ret = -ENOSYS; break; } if (ret) { spin_lock_irq(&id_priv->lock); list_del(&mc->list); spin_unlock_irq(&id_priv->lock); kfree(mc); } return ret; } EXPORT_SYMBOL(rdma_join_multicast); void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr) { struct rdma_id_private *id_priv; struct cma_multicast *mc; id_priv = container_of(id, struct rdma_id_private, id); spin_lock_irq(&id_priv->lock); list_for_each_entry(mc, &id_priv->mc_list, list) { if (!memcmp(&mc->addr, addr, ip_addr_size(addr))) { list_del(&mc->list); spin_unlock_irq(&id_priv->lock); if (id->qp) ib_detach_mcast(id->qp, &mc->multicast.ib->rec.mgid, be16_to_cpu(mc->multicast.ib->rec.mlid)); if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) { switch (rdma_port_get_link_layer(id->device, id->port_num)) { case IB_LINK_LAYER_INFINIBAND: ib_sa_free_multicast(mc->multicast.ib); kfree(mc); break; case IB_LINK_LAYER_ETHERNET: kref_put(&mc->mcref, release_mc); break; default: break; } } return; } } spin_unlock_irq(&id_priv->lock); } EXPORT_SYMBOL(rdma_leave_multicast); static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id_priv) { struct rdma_dev_addr *dev_addr; struct cma_ndev_work *work; dev_addr = &id_priv->id.route.addr.dev_addr; if ((dev_addr->bound_dev_if == ndev->if_index) && memcmp(dev_addr->src_dev_addr, IF_LLADDR(ndev), ndev->if_addrlen)) { printk(KERN_INFO "RDMA CM addr change for ndev %s used by id %p\n", ndev->if_xname, &id_priv->id); work = kzalloc(sizeof *work, GFP_KERNEL); if (!work) return -ENOMEM; INIT_WORK(&work->work, cma_ndev_work_handler); work->id = id_priv; work->event.event = RDMA_CM_EVENT_ADDR_CHANGE; atomic_inc(&id_priv->refcount); queue_work(cma_wq, &work->work); } return 0; } static int cma_netdev_callback(struct notifier_block *self, 
unsigned long event, void *ctx) { struct net_device *ndev = (struct net_device *)ctx; struct cma_device *cma_dev; struct rdma_id_private *id_priv; int ret = NOTIFY_DONE; /* BONDING related, commented out until the bonding is resolved */ #if 0 if (dev_net(ndev) != &init_net) return NOTIFY_DONE; if (event != NETDEV_BONDING_FAILOVER) return NOTIFY_DONE; if (!(ndev->flags & IFF_MASTER) || !(ndev->priv_flags & IFF_BONDING)) return NOTIFY_DONE; #endif if (event != NETDEV_DOWN && event != NETDEV_UNREGISTER) return NOTIFY_DONE; mutex_lock(&lock); list_for_each_entry(cma_dev, &dev_list, list) list_for_each_entry(id_priv, &cma_dev->id_list, list) { ret = cma_netdev_change(ndev, id_priv); if (ret) goto out; } out: mutex_unlock(&lock); return ret; } static struct notifier_block cma_nb = { .notifier_call = cma_netdev_callback }; static void cma_add_one(struct ib_device *device) { struct cma_device *cma_dev; struct rdma_id_private *id_priv; cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL); if (!cma_dev) return; cma_dev->device = device; init_completion(&cma_dev->comp); atomic_set(&cma_dev->refcount, 1); INIT_LIST_HEAD(&cma_dev->id_list); ib_set_client_data(device, &cma_client, cma_dev); mutex_lock(&lock); list_add_tail(&cma_dev->list, &dev_list); list_for_each_entry(id_priv, &listen_any_list, list) cma_listen_on_dev(id_priv, cma_dev); mutex_unlock(&lock); } static int cma_remove_id_dev(struct rdma_id_private *id_priv) { struct rdma_cm_event event; enum rdma_cm_state state; int ret = 0; /* Record that we want to remove the device */ state = cma_exch(id_priv, RDMA_CM_DEVICE_REMOVAL); if (state == RDMA_CM_DESTROYING) return 0; cma_cancel_operation(id_priv, state); mutex_lock(&id_priv->handler_mutex); /* Check for destruction from another callback. */ if (!cma_comp(id_priv, RDMA_CM_DEVICE_REMOVAL)) goto out; memset(&event, 0, sizeof event); event.event = RDMA_CM_EVENT_DEVICE_REMOVAL; ret = id_priv->id.event_handler(&id_priv->id, &event); out: mutex_unlock(&id_priv->handler_mutex); return ret; } static void cma_process_remove(struct cma_device *cma_dev) { struct rdma_id_private *id_priv; int ret; mutex_lock(&lock); while (!list_empty(&cma_dev->id_list)) { id_priv = list_entry(cma_dev->id_list.next, struct rdma_id_private, list); list_del(&id_priv->listen_list); list_del_init(&id_priv->list); atomic_inc(&id_priv->refcount); mutex_unlock(&lock); ret = id_priv->internal_id ? 
1 : cma_remove_id_dev(id_priv); cma_deref_id(id_priv); if (ret) rdma_destroy_id(&id_priv->id); mutex_lock(&lock); } mutex_unlock(&lock); cma_deref_dev(cma_dev); wait_for_completion(&cma_dev->comp); } static void cma_remove_one(struct ib_device *device) { struct cma_device *cma_dev; cma_dev = ib_get_client_data(device, &cma_client); if (!cma_dev) return; mutex_lock(&lock); list_del(&cma_dev->list); mutex_unlock(&lock); cma_process_remove(cma_dev); kfree(cma_dev); } static int __init cma_init(void) { int ret = -ENOMEM; cma_wq = create_singlethread_workqueue("rdma_cm"); if (!cma_wq) return -ENOMEM; cma_free_wq = create_singlethread_workqueue("rdma_cm_fr"); if (!cma_free_wq) goto err1; ib_sa_register_client(&sa_client); rdma_addr_register_client(&addr_client); register_netdevice_notifier(&cma_nb); ret = ib_register_client(&cma_client); if (ret) goto err; return 0; err: unregister_netdevice_notifier(&cma_nb); rdma_addr_unregister_client(&addr_client); ib_sa_unregister_client(&sa_client); destroy_workqueue(cma_free_wq); err1: destroy_workqueue(cma_wq); return ret; } static void __exit cma_cleanup(void) { ib_unregister_client(&cma_client); unregister_netdevice_notifier(&cma_nb); rdma_addr_unregister_client(&addr_client); ib_sa_unregister_client(&sa_client); flush_workqueue(cma_free_wq); destroy_workqueue(cma_free_wq); destroy_workqueue(cma_wq); idr_destroy(&sdp_ps); idr_destroy(&tcp_ps); idr_destroy(&udp_ps); idr_destroy(&ipoib_ps); idr_destroy(&ib_ps); } module_init(cma_init); module_exit(cma_cleanup); Index: projects/clang380-import/sys/ofed/drivers/infiniband/core/iwcm.c =================================================================== --- projects/clang380-import/sys/ofed/drivers/infiniband/core/iwcm.c (revision 294776) +++ projects/clang380-import/sys/ofed/drivers/infiniband/core/iwcm.c (revision 294777) @@ -1,1038 +1,1318 @@ /* * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. * Copyright (c) 2004 Topspin Corporation. All rights reserved. * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * Copyright (c) 2016 Chelsio Communications. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 *
 */
+#include "opt_inet.h"
+
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
+#include
+#include
+#include
#include
#include
#include "iwcm.h"

MODULE_AUTHOR("Tom Tucker");
MODULE_DESCRIPTION("iWARP CM");
MODULE_LICENSE("Dual BSD/GPL");

static struct workqueue_struct *iwcm_wq;
struct iwcm_work {
	struct work_struct work;
	struct iwcm_id_private *cm_id;
	struct list_head list;
	struct iw_cm_event event;
	struct list_head free_list;
};

+struct iwcm_listen_work {
+	struct work_struct work;
+	struct iw_cm_id *cm_id;
+};
+static LIST_HEAD(listen_port_list);
+
+static DEFINE_MUTEX(listen_port_mutex);
+static DEFINE_MUTEX(dequeue_mutex);
+
+struct listen_port_info {
+	struct list_head list;
+	uint16_t port_num;
+	uint32_t refcnt;
+};
+
+static int32_t
+add_port_to_listenlist(uint16_t port)
+{
+	struct listen_port_info *port_info;
+	int err = 0;
+
+	mutex_lock(&listen_port_mutex);
+
+	list_for_each_entry(port_info, &listen_port_list, list)
+		if (port_info->port_num == port)
+			goto found_port;
+
+	port_info = kmalloc(sizeof(*port_info), GFP_KERNEL);
+	if (!port_info) {
+		err = -ENOMEM;
+		mutex_unlock(&listen_port_mutex);
+		goto out;
+	}
+
+	port_info->port_num = port;
+	port_info->refcnt = 0;
+
+	list_add(&port_info->list, &listen_port_list);
+
+found_port:
+	++(port_info->refcnt);
+	mutex_unlock(&listen_port_mutex);
+	return port_info->refcnt;
+out:
+	return err;
+}
+
+static int32_t
+rem_port_from_listenlist(uint16_t port)
+{
+	struct listen_port_info *port_info;
+	int ret, found_port = 0;
+
+	mutex_lock(&listen_port_mutex);
+
+	list_for_each_entry(port_info, &listen_port_list, list)
+		if (port_info->port_num == port) {
+			found_port = 1;
+			break;
+		}
+
+	if (found_port) {
+		--(port_info->refcnt);
+		ret = port_info->refcnt;
+		if (port_info->refcnt == 0) {
+			/* Remove this entry from the list as there are no
+			 * more listeners for this port_num.
+			 */
+			list_del(&port_info->list);
+			kfree(port_info);
+		}
+	} else {
+		ret = -EINVAL;
+	}
+	mutex_unlock(&listen_port_mutex);
+	return ret;
+
+}
+
/*
 * The following services provide a mechanism for pre-allocating iwcm_work
 * elements.  The design pre-allocates them based on the cm_id type:
 *	LISTENING IDS:	Get enough elements preallocated to handle the
 *			listen backlog.
 *	ACTIVE IDS:	4: CONNECT_REPLY, ESTABLISHED, DISCONNECT, CLOSE
 *	PASSIVE IDS:	3: ESTABLISHED, DISCONNECT, CLOSE
 *
 * Allocating them in connect and listen avoids having to deal
 * with allocation failures on the event upcall from the provider (which
 * is called in the interrupt context).
 *
 * One exception is when creating the cm_id for incoming connection requests.
 * There are two cases:
 * 1) in the event upcall, cm_event_handler(), for a listening cm_id.  If
 *    the backlog is exceeded, then no more connection request events will
 *    be processed.  cm_event_handler() returns -ENOMEM in this case.  It's up
 *    to the provider to reject the connection request.
 * 2) in the connection request workqueue handler, cm_conn_req_handler().
 *    If work elements cannot be allocated for the new connect request cm_id,
 *    then IWCM will call the provider reject method.  This is ok since
 *    cm_conn_req_handler() runs in the workqueue thread context.
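 *
 * For example (the numbers come from the policy above, nothing new):
 * iw_cm_listen() pre-allocates one iwcm_work per backlog slot via
 * alloc_work_entries(), iw_cm_connect() pre-allocates 4, and
 * cm_conn_req_handler() pre-allocates 3 for each child cm_id it creates.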
*/ static struct iwcm_work *get_work(struct iwcm_id_private *cm_id_priv) { struct iwcm_work *work; if (list_empty(&cm_id_priv->work_free_list)) return NULL; work = list_entry(cm_id_priv->work_free_list.next, struct iwcm_work, free_list); list_del_init(&work->free_list); return work; } static void put_work(struct iwcm_work *work) { list_add(&work->free_list, &work->cm_id->work_free_list); } static void dealloc_work_entries(struct iwcm_id_private *cm_id_priv) { struct list_head *e, *tmp; list_for_each_safe(e, tmp, &cm_id_priv->work_free_list) kfree(list_entry(e, struct iwcm_work, free_list)); } static int alloc_work_entries(struct iwcm_id_private *cm_id_priv, int count) { struct iwcm_work *work; BUG_ON(!list_empty(&cm_id_priv->work_free_list)); while (count--) { work = kmalloc(sizeof(struct iwcm_work), GFP_KERNEL); if (!work) { dealloc_work_entries(cm_id_priv); return -ENOMEM; } work->cm_id = cm_id_priv; INIT_LIST_HEAD(&work->list); put_work(work); } return 0; } /* * Save private data from incoming connection requests to * iw_cm_event, so the low level driver doesn't have to. Adjust * the event ptr to point to the local copy. */ static int copy_private_data(struct iw_cm_event *event) { void *p; p = kmemdup(event->private_data, event->private_data_len, GFP_ATOMIC); if (!p) return -ENOMEM; event->private_data = p; return 0; } static void free_cm_id(struct iwcm_id_private *cm_id_priv) { dealloc_work_entries(cm_id_priv); kfree(cm_id_priv); } /* * Release a reference on cm_id. If the last reference is being * released, enable the waiting thread (in iw_destroy_cm_id) to * get woken up, and return 1 if a thread is already waiting. */ static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) { BUG_ON(atomic_read(&cm_id_priv->refcount)==0); if (atomic_dec_and_test(&cm_id_priv->refcount)) { BUG_ON(!list_empty(&cm_id_priv->work_list)); complete(&cm_id_priv->destroy_comp); return 1; } return 0; } static void add_ref(struct iw_cm_id *cm_id) { struct iwcm_id_private *cm_id_priv; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); atomic_inc(&cm_id_priv->refcount); } static void rem_ref(struct iw_cm_id *cm_id) { struct iwcm_id_private *cm_id_priv; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); if (iwcm_deref_id(cm_id_priv) && test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { BUG_ON(!list_empty(&cm_id_priv->work_list)); free_cm_id(cm_id_priv); } } static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event); struct iw_cm_id *iw_create_cm_id(struct ib_device *device, struct socket *so, iw_cm_handler cm_handler, void *context) { struct iwcm_id_private *cm_id_priv; cm_id_priv = kzalloc(sizeof(*cm_id_priv), GFP_KERNEL); if (!cm_id_priv) return ERR_PTR(-ENOMEM); cm_id_priv->state = IW_CM_STATE_IDLE; cm_id_priv->id.device = device; cm_id_priv->id.cm_handler = cm_handler; cm_id_priv->id.context = context; cm_id_priv->id.event_handler = cm_event_handler; cm_id_priv->id.add_ref = add_ref; cm_id_priv->id.rem_ref = rem_ref; cm_id_priv->id.so = so; spin_lock_init(&cm_id_priv->lock); atomic_set(&cm_id_priv->refcount, 1); init_waitqueue_head(&cm_id_priv->connect_wait); init_completion(&cm_id_priv->destroy_comp); INIT_LIST_HEAD(&cm_id_priv->work_list); INIT_LIST_HEAD(&cm_id_priv->work_free_list); return &cm_id_priv->id; } EXPORT_SYMBOL(iw_create_cm_id); static int iwcm_modify_qp_err(struct ib_qp *qp) { struct ib_qp_attr qp_attr; if (!qp) return -EINVAL; qp_attr.qp_state = IB_QPS_ERR; return ib_modify_qp(qp, &qp_attr, IB_QP_STATE); } /* * This is really the RDMAC CLOSING 
state. It is most similar to the * IB SQD QP state. */ static int iwcm_modify_qp_sqd(struct ib_qp *qp) { struct ib_qp_attr qp_attr; BUG_ON(qp == NULL); qp_attr.qp_state = IB_QPS_SQD; return ib_modify_qp(qp, &qp_attr, IB_QP_STATE); } /* * CM_ID <-- CLOSING * * Block if a passive or active connection is currently being processed. Then * process the event as follows: * - If we are ESTABLISHED, move to CLOSING and modify the QP state * based on the abrupt flag * - If the connection is already in the CLOSING or IDLE state, the peer is * disconnecting concurrently with us and we've already seen the * DISCONNECT event -- ignore the request and return 0 * - Disconnect on a listening endpoint returns -EINVAL */ int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt) { struct iwcm_id_private *cm_id_priv; unsigned long flags; int ret = 0; struct ib_qp *qp = NULL; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); /* Wait if we're currently in a connect or accept downcall */ wait_event(cm_id_priv->connect_wait, !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags)); spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->state) { case IW_CM_STATE_ESTABLISHED: cm_id_priv->state = IW_CM_STATE_CLOSING; /* QP could be for user-mode client */ if (cm_id_priv->qp) qp = cm_id_priv->qp; else ret = -EINVAL; break; case IW_CM_STATE_LISTEN: ret = -EINVAL; break; case IW_CM_STATE_CLOSING: /* remote peer closed first */ case IW_CM_STATE_IDLE: /* accept or connect returned !0 */ break; case IW_CM_STATE_CONN_RECV: /* * App called disconnect before/without calling accept after * connect_request event delivered. */ break; case IW_CM_STATE_CONN_SENT: /* Can only get here if wait above fails */ default: BUG(); } spin_unlock_irqrestore(&cm_id_priv->lock, flags); if (qp) { if (abrupt) ret = iwcm_modify_qp_err(qp); else ret = iwcm_modify_qp_sqd(qp); /* * If both sides are disconnecting the QP could * already be in ERR or SQD states */ ret = 0; } return ret; } EXPORT_SYMBOL(iw_cm_disconnect); +static struct socket * +dequeue_socket(struct socket *head) +{ + struct socket *so; + struct sockaddr_in *remote; + + ACCEPT_LOCK(); + so = TAILQ_FIRST(&head->so_comp); + if (!so) { + ACCEPT_UNLOCK(); + return NULL; + } + + SOCK_LOCK(so); + /* + * Before changing the flags on the socket, we have to bump the + * reference count. Otherwise, if the protocol calls sofree(), + * the socket will be released due to a zero refcount. 
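+ * (This mirrors the dance done by the kernel's own accept path: the
+ * reference must be taken before the queue-state flags change so the
+ * protocol cannot sofree() the socket underneath us.)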
+ */
+	soref(so);
+	TAILQ_REMOVE(&head->so_comp, so, so_list);
+	head->so_qlen--;
+	so->so_qstate &= ~SQ_COMP;
+	so->so_head = NULL;
+	so->so_state |= SS_NBIO;
+	SOCK_UNLOCK(so);
+	ACCEPT_UNLOCK();
+	soaccept(so, (struct sockaddr **)&remote);
+
+	free(remote, M_SONAME);
+	return so;
+}
+static void
+iw_so_event_handler(struct work_struct *_work)
+{
+#ifdef INET
+	struct iwcm_listen_work *work = container_of(_work,
+					struct iwcm_listen_work, work);
+	struct iw_cm_id *listen_cm_id = work->cm_id;
+	struct iwcm_id_private *cm_id_priv;
+	struct iw_cm_id *real_cm_id;
+	struct sockaddr_in *local;
+	struct socket *so;
+
+	cm_id_priv = container_of(listen_cm_id, struct iwcm_id_private, id);
+
+	if (cm_id_priv->state != IW_CM_STATE_LISTEN) {
+		kfree(work);
+		return;
+	}
+	mutex_lock(&dequeue_mutex);
+
+	/* Dequeue & process all new 'so' connection requests for this cmid */
+	while ((so = dequeue_socket(work->cm_id->so)) != NULL) {
+		if (rdma_cma_any_addr((struct sockaddr *)
+					&listen_cm_id->local_addr)) {
+			in_getsockaddr(so, (struct sockaddr **)&local);
+			if (rdma_find_cmid_laddr(local, ARPHRD_ETHER,
+					(void **) &real_cm_id)) {
+				free(local, M_SONAME);
+				goto err;
+			}
+			free(local, M_SONAME);
+
+			real_cm_id->device->iwcm->newconn(real_cm_id, so);
+		} else {
+			listen_cm_id->device->iwcm->newconn(listen_cm_id, so);
+		}
+	}
+err:
+	mutex_unlock(&dequeue_mutex);
+	kfree(work);
+#endif
+	return;
+}
+static int
+iw_so_upcall(struct socket *parent_so, void *arg, int waitflag)
+{
+	struct iwcm_listen_work *work;
+	struct socket *so;
+	struct iw_cm_id *cm_id = arg;
+
+	mutex_lock(&dequeue_mutex);
+	/* check whether iw_so_event_handler() already dequeued this 'so' */
+	so = TAILQ_FIRST(&parent_so->so_comp);
+	if (!so) {
+		/* drop the lock on the early-return path as well */
+		mutex_unlock(&dequeue_mutex);
+		return SU_OK;
+	}
+	work = kzalloc(sizeof(*work), M_NOWAIT);
+	if (!work) {
+		mutex_unlock(&dequeue_mutex);
+		return -ENOMEM;
+	}
+	work->cm_id = cm_id;
+
+	INIT_WORK(&work->work, iw_so_event_handler);
+	queue_work(iwcm_wq, &work->work);
+
+	mutex_unlock(&dequeue_mutex);
+	return SU_OK;
+}
+
+static void
+iw_init_sock(struct iw_cm_id *cm_id)
+{
+	struct sockopt sopt;
+	struct socket *so = cm_id->so;
+	int on = 1;
+
+	SOCK_LOCK(so);
+	soupcall_set(so, SO_RCV, iw_so_upcall, cm_id);
+	so->so_state |= SS_NBIO;
+	SOCK_UNLOCK(so);
+	sopt.sopt_dir = SOPT_SET;
+	sopt.sopt_level = IPPROTO_TCP;
+	sopt.sopt_name = TCP_NODELAY;
+	sopt.sopt_val = (caddr_t)&on;
+	sopt.sopt_valsize = sizeof(on);
+	sopt.sopt_td = NULL;
+	sosetopt(so, &sopt);
+}
+
+static int
+iw_close_socket(struct iw_cm_id *cm_id, int close)
+{
+	struct socket *so = cm_id->so;
+	int rc;
+
+	SOCK_LOCK(so);
+	soupcall_clear(so, SO_RCV);
+	SOCK_UNLOCK(so);
+
+	if (close)
+		rc = soclose(so);
+	else
+		rc = soshutdown(so, SHUT_WR | SHUT_RD);
+
+	cm_id->so = NULL;
+
+	return rc;
+}
+
+static int
+iw_create_listen(struct iw_cm_id *cm_id, int backlog)
+{
+	int rc;
+
+	iw_init_sock(cm_id);
+	rc = solisten(cm_id->so, backlog, curthread);
+	if (rc != 0)
+		iw_close_socket(cm_id, 0);
+	return rc;
+}
+
+static int
+iw_destroy_listen(struct iw_cm_id *cm_id)
+{
+	int rc;
+
+	rc = iw_close_socket(cm_id, 0);
+	return rc;
+}
+
+
/*
 * CM_ID <-- DESTROYING
 *
 * Clean up all resources associated with the connection and release
 * the initial reference taken by iw_create_cm_id.
 */
static void destroy_cm_id(struct iw_cm_id *cm_id)
{
	struct iwcm_id_private *cm_id_priv;
	unsigned long flags;
-	int ret;
+	int ret = 0, refcnt;

	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
	/*
	 * Wait if we're currently in a connect or accept downcall.  A
	 * listening endpoint should never block here.
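	 * (IWCM_F_CONNECT_WAIT is set only on the connect, accept and
	 * reject paths, so it is never set on a listening endpoint.)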
*/ wait_event(cm_id_priv->connect_wait, !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags)); spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->state) { case IW_CM_STATE_LISTEN: cm_id_priv->state = IW_CM_STATE_DESTROYING; spin_unlock_irqrestore(&cm_id_priv->lock, flags); - /* destroy the listening endpoint */ - ret = cm_id->device->iwcm->destroy_listen(cm_id); + if (rdma_cma_any_addr((struct sockaddr *)&cm_id->local_addr)) { + refcnt = + rem_port_from_listenlist(cm_id->local_addr.sin_port); + + if (refcnt == 0) + ret = iw_destroy_listen(cm_id); + + cm_id->device->iwcm->destroy_listen_ep(cm_id); + } else { + ret = iw_destroy_listen(cm_id); + cm_id->device->iwcm->destroy_listen_ep(cm_id); + } spin_lock_irqsave(&cm_id_priv->lock, flags); break; case IW_CM_STATE_ESTABLISHED: cm_id_priv->state = IW_CM_STATE_DESTROYING; spin_unlock_irqrestore(&cm_id_priv->lock, flags); /* Abrupt close of the connection */ (void)iwcm_modify_qp_err(cm_id_priv->qp); spin_lock_irqsave(&cm_id_priv->lock, flags); break; case IW_CM_STATE_IDLE: case IW_CM_STATE_CLOSING: cm_id_priv->state = IW_CM_STATE_DESTROYING; break; case IW_CM_STATE_CONN_RECV: /* * App called destroy before/without calling accept after * receiving connection request event notification or * returned non zero from the event callback function. * In either case, must tell the provider to reject. */ cm_id_priv->state = IW_CM_STATE_DESTROYING; spin_unlock_irqrestore(&cm_id_priv->lock, flags); cm_id->device->iwcm->reject(cm_id, NULL, 0); spin_lock_irqsave(&cm_id_priv->lock, flags); break; case IW_CM_STATE_CONN_SENT: case IW_CM_STATE_DESTROYING: default: BUG(); break; } if (cm_id_priv->qp) { cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp); cm_id_priv->qp = NULL; } spin_unlock_irqrestore(&cm_id_priv->lock, flags); (void)iwcm_deref_id(cm_id_priv); } /* * This function is only called by the application thread and cannot * be called by the event thread. The function will wait for all * references to be released on the cm_id and then kfree the cm_id * object. */ void iw_destroy_cm_id(struct iw_cm_id *cm_id) { struct iwcm_id_private *cm_id_priv; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)); destroy_cm_id(cm_id); wait_for_completion(&cm_id_priv->destroy_comp); free_cm_id(cm_id_priv); } EXPORT_SYMBOL(iw_destroy_cm_id); /* * CM_ID <-- LISTEN * * Start listening for connect requests. Generates one CONNECT_REQUEST * event for each inbound connect request. */ int iw_cm_listen(struct iw_cm_id *cm_id, int backlog) { struct iwcm_id_private *cm_id_priv; unsigned long flags; - int ret; + int ret, refcnt; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); ret = alloc_work_entries(cm_id_priv, backlog); if (ret) return ret; spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->state) { case IW_CM_STATE_IDLE: cm_id_priv->state = IW_CM_STATE_LISTEN; spin_unlock_irqrestore(&cm_id_priv->lock, flags); - ret = cm_id->device->iwcm->create_listen(cm_id, backlog); - if (ret) + + if (rdma_cma_any_addr((struct sockaddr *)&cm_id->local_addr)) { + refcnt = + add_port_to_listenlist(cm_id->local_addr.sin_port); + + if (refcnt == 1) { + ret = iw_create_listen(cm_id, backlog); + } else if (refcnt <= 0) { + ret = -EINVAL; + } else { + /* if refcnt > 1, a socket listener created + * already. And we need not create socket + * listener on other rdma devices/listen cm_id's + * due to TOE. 
That is, when a socket listener is
+			 * created with INADDR_ANY, all registered TOE
+			 * devices get a call to start their
+			 * hardware listeners.
+			 */
+			}
+		} else {
+			ret = iw_create_listen(cm_id, backlog);
+		}
+		if (!ret)
+			cm_id->device->iwcm->create_listen_ep(cm_id, backlog);
+		else
			cm_id_priv->state = IW_CM_STATE_IDLE;
+		spin_lock_irqsave(&cm_id_priv->lock, flags);
		break;
	default:
		ret = -EINVAL;
	}
	spin_unlock_irqrestore(&cm_id_priv->lock, flags);

	return ret;
}
EXPORT_SYMBOL(iw_cm_listen);

/*
 * CM_ID <-- IDLE
 *
 * Rejects an inbound connection request.  No events are generated.
 */
int iw_cm_reject(struct iw_cm_id *cm_id,
		 const void *private_data,
		 u8 private_data_len)
{
	struct iwcm_id_private *cm_id_priv;
	unsigned long flags;
	int ret;

	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
	set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);

	spin_lock_irqsave(&cm_id_priv->lock, flags);
	if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
		wake_up_all(&cm_id_priv->connect_wait);
		return -EINVAL;
	}
	cm_id_priv->state = IW_CM_STATE_IDLE;
	spin_unlock_irqrestore(&cm_id_priv->lock, flags);

	ret = cm_id->device->iwcm->reject(cm_id, private_data,
					  private_data_len);

	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
	wake_up_all(&cm_id_priv->connect_wait);

	return ret;
}
EXPORT_SYMBOL(iw_cm_reject);

/*
 * CM_ID <-- ESTABLISHED
 *
 * Accepts an inbound connection request and generates an ESTABLISHED
 * event.  Callers of iw_cm_disconnect and iw_destroy_cm_id will block
 * until the ESTABLISHED event is received from the provider.
 */
int iw_cm_accept(struct iw_cm_id *cm_id,
		 struct iw_cm_conn_param *iw_param)
{
	struct iwcm_id_private *cm_id_priv;
	struct ib_qp *qp;
	unsigned long flags;
	int ret;

	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
	set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);

	spin_lock_irqsave(&cm_id_priv->lock, flags);
	if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
		wake_up_all(&cm_id_priv->connect_wait);
		return -EINVAL;
	}
	/* Get the ib_qp given the QPN */
	qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
	if (!qp) {
		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
		wake_up_all(&cm_id_priv->connect_wait);
		return -EINVAL;
	}
	cm_id->device->iwcm->add_ref(qp);
	cm_id_priv->qp = qp;
	spin_unlock_irqrestore(&cm_id_priv->lock, flags);

	ret = cm_id->device->iwcm->accept(cm_id, iw_param);
	if (ret) {
		/* An error on accept precludes provider events */
		BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
		cm_id_priv->state = IW_CM_STATE_IDLE;
		spin_lock_irqsave(&cm_id_priv->lock, flags);
		if (cm_id_priv->qp) {
			cm_id->device->iwcm->rem_ref(qp);
			cm_id_priv->qp = NULL;
		}
		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
		wake_up_all(&cm_id_priv->connect_wait);
	}

	return ret;
}
EXPORT_SYMBOL(iw_cm_accept);

/*
 * Active Side: CM_ID <-- CONN_SENT
 *
 * If successful, results in the generation of a CONNECT_REPLY
 * event.  iw_cm_disconnect and iw_destroy_cm_id will block until the
 * CONNECT_REPLY event is received from the provider.
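 *
 * A minimal caller-side sketch (illustrative only; "qpn", "buf" and "len"
 * are placeholders, and the QP must be one the provider can look up via
 * its get_qp() method):
 *
 *	iw_param.ord = 1;
 *	iw_param.ird = 1;
 *	iw_param.qpn = qpn;
 *	iw_param.private_data = buf;
 *	iw_param.private_data_len = len;
 *	ret = iw_cm_connect(cm_id, &iw_param);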
*/ int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param) { struct iwcm_id_private *cm_id_priv; int ret; unsigned long flags; struct ib_qp *qp; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); ret = alloc_work_entries(cm_id_priv, 4); if (ret) return ret; set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); spin_lock_irqsave(&cm_id_priv->lock, flags); if (cm_id_priv->state != IW_CM_STATE_IDLE) { spin_unlock_irqrestore(&cm_id_priv->lock, flags); clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); wake_up_all(&cm_id_priv->connect_wait); return -EINVAL; } /* Get the ib_qp given the QPN */ qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn); if (!qp) { spin_unlock_irqrestore(&cm_id_priv->lock, flags); clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); wake_up_all(&cm_id_priv->connect_wait); return -EINVAL; } cm_id->device->iwcm->add_ref(qp); cm_id_priv->qp = qp; cm_id_priv->state = IW_CM_STATE_CONN_SENT; spin_unlock_irqrestore(&cm_id_priv->lock, flags); ret = cm_id->device->iwcm->connect(cm_id, iw_param); if (ret) { spin_lock_irqsave(&cm_id_priv->lock, flags); if (cm_id_priv->qp) { cm_id->device->iwcm->rem_ref(qp); cm_id_priv->qp = NULL; } spin_unlock_irqrestore(&cm_id_priv->lock, flags); BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT); cm_id_priv->state = IW_CM_STATE_IDLE; clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags); wake_up_all(&cm_id_priv->connect_wait); } return ret; } EXPORT_SYMBOL(iw_cm_connect); /* * Passive Side: new CM_ID <-- CONN_RECV * * Handles an inbound connect request. The function creates a new * iw_cm_id to represent the new connection and inherits the client * callback function and other attributes from the listening parent. * * The work item contains a pointer to the listen_cm_id and the event. The * listen_cm_id contains the client cm_handler, context and * device. These are copied when the device is cloned. The event * contains the new four tuple. * * An error on the child should not affect the parent, so this * function does not return a value. */ static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv, struct iw_cm_event *iw_event) { unsigned long flags; struct iw_cm_id *cm_id; struct iwcm_id_private *cm_id_priv; int ret; /* * The provider should never generate a connection request * event with a bad status. */ BUG_ON(iw_event->status); cm_id = iw_create_cm_id(listen_id_priv->id.device, iw_event->so, listen_id_priv->id.cm_handler, listen_id_priv->id.context); /* If the cm_id could not be created, ignore the request */ if (IS_ERR(cm_id)) goto out; cm_id->provider_data = iw_event->provider_data; cm_id->local_addr = iw_event->local_addr; cm_id->remote_addr = iw_event->remote_addr; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); cm_id_priv->state = IW_CM_STATE_CONN_RECV; /* * We could be destroying the listening id. If so, ignore this * upcall. 
 */
	spin_lock_irqsave(&listen_id_priv->lock, flags);
	if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
		spin_unlock_irqrestore(&listen_id_priv->lock, flags);
		iw_cm_reject(cm_id, NULL, 0);
		iw_destroy_cm_id(cm_id);
		goto out;
	}
	spin_unlock_irqrestore(&listen_id_priv->lock, flags);

	ret = alloc_work_entries(cm_id_priv, 3);
	if (ret) {
		iw_cm_reject(cm_id, NULL, 0);
		iw_destroy_cm_id(cm_id);
		goto out;
	}

	/* Call the client CM handler */
	ret = cm_id->cm_handler(cm_id, iw_event);
	if (ret) {
		iw_cm_reject(cm_id, NULL, 0);
		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
		destroy_cm_id(cm_id);
		if (atomic_read(&cm_id_priv->refcount)==0)
			free_cm_id(cm_id_priv);
	}
out:
	if (iw_event->private_data_len)
		kfree(iw_event->private_data);
}

/*
 * Passive Side: CM_ID <-- ESTABLISHED
 *
 * The provider generated an ESTABLISHED event which means that
 * the MPA negotiation has completed successfully and we are now in MPA
 * FPDU mode.
 *
 * This event can only be received in the CONN_RECV state.  If the
 * remote peer closed, the ESTABLISHED event would be received followed
 * by the CLOSE event.  If the app closes, it will block until we wake
 * it up after processing this event.
 */
static int cm_conn_est_handler(struct iwcm_id_private *cm_id_priv,
			       struct iw_cm_event *iw_event)
{
	unsigned long flags;
	int ret;

	spin_lock_irqsave(&cm_id_priv->lock, flags);

	/*
	 * We clear the CONNECT_WAIT bit here to allow the callback
	 * function to call iw_cm_disconnect.  Calling iw_destroy_cm_id
	 * from a callback handler is not allowed.
	 */
	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
	BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
	cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
	ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
	wake_up_all(&cm_id_priv->connect_wait);

	return ret;
}

/*
 * Active Side: CM_ID <-- ESTABLISHED
 *
 * The app has called connect and is waiting for the established event to
 * post its requests to the server.  This event will wake up anyone
 * blocked in iw_cm_disconnect or iw_destroy_cm_id.
 */
static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv,
			       struct iw_cm_event *iw_event)
{
	unsigned long flags;
	int ret;

	spin_lock_irqsave(&cm_id_priv->lock, flags);
	/*
	 * Clear the connect wait bit so a callback function calling
	 * iw_cm_disconnect will not wait and deadlock this thread
	 */
	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
	BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT);
	if (iw_event->status == 0) {
		cm_id_priv->id.local_addr = iw_event->local_addr;
		cm_id_priv->id.remote_addr = iw_event->remote_addr;
		cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
	} else {
		/* REJECTED or RESET */
		cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
		cm_id_priv->qp = NULL;
		cm_id_priv->state = IW_CM_STATE_IDLE;
	}
	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
	ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);

	if (iw_event->private_data_len)
		kfree(iw_event->private_data);

	/* Wake up waiters on connect complete */
	wake_up_all(&cm_id_priv->connect_wait);

	return ret;
}

/*
 * CM_ID <-- CLOSING
 *
 * If in the ESTABLISHED state, move to CLOSING.
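 * The subsequent CLOSE event (see cm_close_handler() below) finishes the
 * teardown and moves the id to IDLE.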
 */
static void cm_disconnect_handler(struct iwcm_id_private *cm_id_priv,
				  struct iw_cm_event *iw_event)
{
	unsigned long flags;

	spin_lock_irqsave(&cm_id_priv->lock, flags);
	if (cm_id_priv->state == IW_CM_STATE_ESTABLISHED)
		cm_id_priv->state = IW_CM_STATE_CLOSING;
	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
}

/*
 * CM_ID <-- IDLE
 *
 * If in the ESTABLISHED or CLOSING states, the QP will have been
 * moved by the provider to the ERR state.  Disassociate the CM_ID from
 * the QP, move to IDLE, and remove the 'connected' reference.
 *
 * If in some other state, the cm_id was destroyed asynchronously.
 * This is the last reference that will result in waking up
 * the app thread blocked in iw_destroy_cm_id.
 */
static int cm_close_handler(struct iwcm_id_private *cm_id_priv,
			    struct iw_cm_event *iw_event)
{
	unsigned long flags;
	int ret = 0;

	spin_lock_irqsave(&cm_id_priv->lock, flags);
	if (cm_id_priv->qp) {
		cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
		cm_id_priv->qp = NULL;
	}
	switch (cm_id_priv->state) {
	case IW_CM_STATE_ESTABLISHED:
	case IW_CM_STATE_CLOSING:
		cm_id_priv->state = IW_CM_STATE_IDLE;
		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
		ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
		spin_lock_irqsave(&cm_id_priv->lock, flags);
		break;
	case IW_CM_STATE_DESTROYING:
		break;
	default:
		BUG();
	}
	spin_unlock_irqrestore(&cm_id_priv->lock, flags);

	return ret;
}

static int process_event(struct iwcm_id_private *cm_id_priv,
			 struct iw_cm_event *iw_event)
{
	int ret = 0;

	switch (iw_event->event) {
	case IW_CM_EVENT_CONNECT_REQUEST:
		cm_conn_req_handler(cm_id_priv, iw_event);
		break;
	case IW_CM_EVENT_CONNECT_REPLY:
		ret = cm_conn_rep_handler(cm_id_priv, iw_event);
		break;
	case IW_CM_EVENT_ESTABLISHED:
		ret = cm_conn_est_handler(cm_id_priv, iw_event);
		break;
	case IW_CM_EVENT_DISCONNECT:
		cm_disconnect_handler(cm_id_priv, iw_event);
		break;
	case IW_CM_EVENT_CLOSE:
		ret = cm_close_handler(cm_id_priv, iw_event);
		break;
	default:
		BUG();
	}

	return ret;
}

/*
 * Process events on the work_list for the cm_id.  If the callback
 * function requests that the cm_id be deleted, a flag is set in the
 * cm_id flags to indicate that when the last reference is
 * removed, the cm_id is to be destroyed.  This is necessary to
 * distinguish between an object that will be destroyed by the app
 * thread asleep on the destroy_comp list vs. an object destroyed
 * here synchronously when the last reference is removed.
 */
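/*
 * In short: each queued event holds a reference on the cm_id;
 * cm_work_handler() below drops it with iwcm_deref_id() after
 * process_event(), and frees the id itself only once the refcount
 * reaches zero with IWCM_F_CALLBACK_DESTROY set.
 */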
*/ static void cm_work_handler(struct work_struct *_work) { struct iwcm_work *work = container_of(_work, struct iwcm_work, work); struct iw_cm_event levent; struct iwcm_id_private *cm_id_priv = work->cm_id; unsigned long flags; int empty; int ret = 0; int destroy_id; spin_lock_irqsave(&cm_id_priv->lock, flags); empty = list_empty(&cm_id_priv->work_list); while (!empty) { work = list_entry(cm_id_priv->work_list.next, struct iwcm_work, list); list_del_init(&work->list); empty = list_empty(&cm_id_priv->work_list); levent = work->event; put_work(work); spin_unlock_irqrestore(&cm_id_priv->lock, flags); ret = process_event(cm_id_priv, &levent); if (ret) { set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); destroy_cm_id(&cm_id_priv->id); } BUG_ON(atomic_read(&cm_id_priv->refcount)==0); destroy_id = test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); if (iwcm_deref_id(cm_id_priv)) { if (destroy_id) { BUG_ON(!list_empty(&cm_id_priv->work_list)); free_cm_id(cm_id_priv); } return; } spin_lock_irqsave(&cm_id_priv->lock, flags); } spin_unlock_irqrestore(&cm_id_priv->lock, flags); } /* * This function is called on interrupt context. Schedule events on * the iwcm_wq thread to allow callback functions to downcall into * the CM and/or block. Events are queued to a per-CM_ID * work_list. If this is the first event on the work_list, the work * element is also queued on the iwcm_wq thread. * * Each event holds a reference on the cm_id. Until the last posted * event has been delivered and processed, the cm_id cannot be * deleted. * * Returns: * 0 - the event was handled. * -ENOMEM - the event was not handled due to lack of resources. */ static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *iw_event) { struct iwcm_work *work; struct iwcm_id_private *cm_id_priv; unsigned long flags; int ret = 0; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); spin_lock_irqsave(&cm_id_priv->lock, flags); work = get_work(cm_id_priv); if (!work) { ret = -ENOMEM; goto out; } INIT_WORK(&work->work, cm_work_handler); work->cm_id = cm_id_priv; work->event = *iw_event; if ((work->event.event == IW_CM_EVENT_CONNECT_REQUEST || work->event.event == IW_CM_EVENT_CONNECT_REPLY) && work->event.private_data_len) { ret = copy_private_data(&work->event); if (ret) { put_work(work); goto out; } } atomic_inc(&cm_id_priv->refcount); if (list_empty(&cm_id_priv->work_list)) { list_add_tail(&work->list, &cm_id_priv->work_list); queue_work(iwcm_wq, &work->work); } else list_add_tail(&work->list, &cm_id_priv->work_list); out: spin_unlock_irqrestore(&cm_id_priv->lock, flags); return ret; } static int iwcm_init_qp_init_attr(struct iwcm_id_private *cm_id_priv, struct ib_qp_attr *qp_attr, int *qp_attr_mask) { unsigned long flags; int ret; spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->state) { case IW_CM_STATE_IDLE: case IW_CM_STATE_CONN_SENT: case IW_CM_STATE_CONN_RECV: case IW_CM_STATE_ESTABLISHED: *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS; qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE| IB_ACCESS_REMOTE_READ; ret = 0; break; default: ret = -EINVAL; break; } spin_unlock_irqrestore(&cm_id_priv->lock, flags); return ret; } static int iwcm_init_qp_rts_attr(struct iwcm_id_private *cm_id_priv, struct ib_qp_attr *qp_attr, int *qp_attr_mask) { unsigned long flags; int ret; spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->state) { case IW_CM_STATE_IDLE: case IW_CM_STATE_CONN_SENT: case IW_CM_STATE_CONN_RECV: case IW_CM_STATE_ESTABLISHED: *qp_attr_mask = 0; ret = 0; break; default: ret 
= -EINVAL; break; } spin_unlock_irqrestore(&cm_id_priv->lock, flags); return ret; } int iw_cm_init_qp_attr(struct iw_cm_id *cm_id, struct ib_qp_attr *qp_attr, int *qp_attr_mask) { struct iwcm_id_private *cm_id_priv; int ret; cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); switch (qp_attr->qp_state) { case IB_QPS_INIT: case IB_QPS_RTR: ret = iwcm_init_qp_init_attr(cm_id_priv, qp_attr, qp_attr_mask); break; case IB_QPS_RTS: ret = iwcm_init_qp_rts_attr(cm_id_priv, qp_attr, qp_attr_mask); break; default: ret = -EINVAL; break; } return ret; } EXPORT_SYMBOL(iw_cm_init_qp_attr); static int __init iw_cm_init(void) { iwcm_wq = create_singlethread_workqueue("iw_cm_wq"); if (!iwcm_wq) return -ENOMEM; return 0; } static void __exit iw_cm_cleanup(void) { destroy_workqueue(iwcm_wq); } module_init(iw_cm_init); module_exit(iw_cm_cleanup); Index: projects/clang380-import/sys/ofed/include/rdma/iw_cm.h =================================================================== --- projects/clang380-import/sys/ofed/include/rdma/iw_cm.h (revision 294776) +++ projects/clang380-import/sys/ofed/include/rdma/iw_cm.h (revision 294777) @@ -1,253 +1,257 @@ /* * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * Copyright (c) 2016 Chelsio Communications. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #ifndef IW_CM_H #define IW_CM_H #include #include struct iw_cm_id; enum iw_cm_event_type { IW_CM_EVENT_CONNECT_REQUEST = 1, /* connect request received */ IW_CM_EVENT_CONNECT_REPLY, /* reply from active connect request */ IW_CM_EVENT_ESTABLISHED, /* passive side accept successful */ IW_CM_EVENT_DISCONNECT, /* orderly shutdown */ IW_CM_EVENT_CLOSE /* close complete */ }; struct iw_cm_event { enum iw_cm_event_type event; int status; struct sockaddr_in local_addr; struct sockaddr_in remote_addr; void *private_data; void *provider_data; u8 private_data_len; struct socket *so; u8 ord; u8 ird; }; /** * iw_cm_handler - Function to be called by the IW CM when delivering events * to the client. * * @cm_id: The IW CM identifier associated with the event. * @event: Pointer to the event structure. 
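 *
 * Return: 0 if the event was handled.  A non-zero return causes the IWCM
 * to tear down the cm_id (see cm_conn_req_handler() and cm_work_handler()
 * in iwcm.c).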
*/ typedef int (*iw_cm_handler)(struct iw_cm_id *cm_id, struct iw_cm_event *event); /** * iw_event_handler - Function called by the provider when delivering provider * events to the IW CM. Returns either 0 indicating the event was processed * or -errno if the event could not be processed. * * @cm_id: The IW CM identifier associated with the event. * @event: Pointer to the event structure. */ typedef int (*iw_event_handler)(struct iw_cm_id *cm_id, struct iw_cm_event *event); struct iw_cm_id { iw_cm_handler cm_handler; /* client callback function */ void *context; /* client cb context */ struct ib_device *device; struct sockaddr_in local_addr; struct sockaddr_in remote_addr; void *provider_data; /* provider private data */ iw_event_handler event_handler; /* cb for provider events */ /* Used by provider to add and remove refs on IW cm_id */ void (*add_ref)(struct iw_cm_id *); void (*rem_ref)(struct iw_cm_id *); struct socket *so; }; struct iw_cm_conn_param { const void *private_data; u16 private_data_len; u32 ord; u32 ird; u32 qpn; }; struct iw_cm_verbs { void (*add_ref)(struct ib_qp *qp); void (*rem_ref)(struct ib_qp *qp); struct ib_qp * (*get_qp)(struct ib_device *device, int qpn); int (*connect)(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param); int (*accept)(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param); int (*reject)(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len); - int (*create_listen)(struct iw_cm_id *cm_id, + int (*create_listen_ep)(struct iw_cm_id *cm_id, int backlog); - int (*destroy_listen)(struct iw_cm_id *cm_id); + void (*destroy_listen_ep)(struct iw_cm_id *cm_id); + + void (*newconn)(struct iw_cm_id *parent_cm_id, + struct socket *so); }; /** * iw_create_cm_id - Create an IW CM identifier. * * @device: The IB device on which to create the IW CM identifier. * @event_handler: User callback invoked to report events associated with the * returned IW CM identifier. * @context: User specified context associated with the id. */ struct iw_cm_id *iw_create_cm_id(struct ib_device *device, struct socket *so, iw_cm_handler cm_handler, void *context); /** * iw_destroy_cm_id - Destroy an IW CM identifier. * * @cm_id: The previously created IW CM identifier to destroy. * * The client can assume that no events will be delivered for the CM ID after * this function returns. */ void iw_destroy_cm_id(struct iw_cm_id *cm_id); /** * iw_cm_unbind_qp - Unbind the specified IW CM identifier and QP * * @cm_id: The IW CM identifier to unbind from the QP. * @qp: The QP * * This is called by the provider when destroying the QP to ensure * that any references held by the IWCM are released. It may also * be called by the IWCM when destroying a CM_ID so that any * references held by the provider are released. */ void iw_cm_unbind_qp(struct iw_cm_id *cm_id, struct ib_qp *qp); /** * iw_cm_get_qp - Return the ib_qp associated with a QPN * * @device: The IB device * @qpn: The queue pair number */ struct ib_qp *iw_cm_get_qp(struct ib_device *device, int qpn); /** * iw_cm_listen - Listen for incoming connection requests on the * specified IW CM id. * * @cm_id: The IW CM identifier. * @backlog: The maximum number of outstanding un-accepted inbound listen * requests to queue. * * The source address and port number are specified in the IW CM identifier * structure. */ int iw_cm_listen(struct iw_cm_id *cm_id, int backlog); /** * iw_cm_accept - Called to accept an incoming connect request. * * @cm_id: The IW CM identifier associated with the connection request.
* @iw_param: Pointer to a structure containing connection establishment * parameters. * * The specified cm_id will have been provided in the event data for a * CONNECT_REQUEST event. Subsequent events related to this connection will be * delivered to the specified IW CM identifier and may occur prior to * the return of this function. If this function returns a non-zero value, the * client can assume that no events will be delivered to the specified IW CM * identifier. */ int iw_cm_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param); /** * iw_cm_reject - Reject an incoming connection request. * * @cm_id: Connection identifier associated with the request. * @private_data: Pointer to data to deliver to the remote peer as part of the * reject message. * @private_data_len: The number of bytes in the private_data parameter. * * The client can assume that no events will be delivered to the specified IW * CM identifier following the return of this function. The private_data * buffer is available for reuse when this function returns. */ int iw_cm_reject(struct iw_cm_id *cm_id, const void *private_data, u8 private_data_len); /** * iw_cm_connect - Called to request a connection to a remote peer. * * @cm_id: The IW CM identifier for the connection. * @iw_param: Pointer to a structure containing connection establishment * parameters. * * Events may be delivered to the specified IW CM identifier prior to the * return of this function. If this function returns a non-zero value, the * client can assume that no events will be delivered to the specified IW CM * identifier. */ int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param); /** * iw_cm_disconnect - Close the specified connection. * * @cm_id: The IW CM identifier to close. * @abrupt: If 0, the connection will be closed gracefully, otherwise, the * connection will be reset. * * The IW CM identifier is still active until the IW_CM_EVENT_CLOSE event is * delivered. */ int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt); /** * iw_cm_init_qp_attr - Called to initialize the attributes of the QP * associated with an IW CM identifier. * * @cm_id: The IW CM identifier associated with the QP * @qp_attr: Pointer to the QP attributes structure. * @qp_attr_mask: Pointer to a bit vector specifying which QP attributes are * valid. */ int iw_cm_init_qp_attr(struct iw_cm_id *cm_id, struct ib_qp_attr *qp_attr, int *qp_attr_mask); #endif /* IW_CM_H */ Index: projects/clang380-import/sys/ofed/include/rdma/rdma_cm.h =================================================================== --- projects/clang380-import/sys/ofed/include/rdma/rdma_cm.h (revision 294776) +++ projects/clang380-import/sys/ofed/include/rdma/rdma_cm.h (revision 294777) @@ -1,404 +1,407 @@ /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2016 Chelsio Communications. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer.
* * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #if !defined(RDMA_CM_H) #define RDMA_CM_H #include #include #include #include /* * Upon receiving a device removal event, users must destroy the associated * RDMA identifier and release all resources allocated with the device. */ enum rdma_cm_event_type { RDMA_CM_EVENT_ADDR_RESOLVED, RDMA_CM_EVENT_ADDR_ERROR, RDMA_CM_EVENT_ROUTE_RESOLVED, RDMA_CM_EVENT_ROUTE_ERROR, RDMA_CM_EVENT_CONNECT_REQUEST, RDMA_CM_EVENT_CONNECT_RESPONSE, RDMA_CM_EVENT_CONNECT_ERROR, RDMA_CM_EVENT_UNREACHABLE, RDMA_CM_EVENT_REJECTED, RDMA_CM_EVENT_ESTABLISHED, RDMA_CM_EVENT_DISCONNECTED, RDMA_CM_EVENT_DEVICE_REMOVAL, RDMA_CM_EVENT_MULTICAST_JOIN, RDMA_CM_EVENT_MULTICAST_ERROR, RDMA_CM_EVENT_ADDR_CHANGE, RDMA_CM_EVENT_TIMEWAIT_EXIT, RDMA_CM_EVENT_ALT_ROUTE_RESOLVED, RDMA_CM_EVENT_ALT_ROUTE_ERROR, RDMA_CM_EVENT_LOAD_ALT_PATH, RDMA_CM_EVENT_ALT_PATH_LOADED, }; enum rdma_port_space { RDMA_PS_SDP = 0x0001, RDMA_PS_IPOIB = 0x0002, RDMA_PS_IB = 0x013F, RDMA_PS_TCP = 0x0106, RDMA_PS_UDP = 0x0111, }; enum alt_path_type { RDMA_ALT_PATH_NONE, RDMA_ALT_PATH_PORT, RDMA_ALT_PATH_LID, RDMA_ALT_PATH_BEST }; struct rdma_addr { struct sockaddr_storage src_addr; struct sockaddr_storage dst_addr; struct rdma_dev_addr dev_addr; }; struct rdma_route { struct rdma_addr addr; struct ib_sa_path_rec *path_rec; int num_paths; }; struct rdma_conn_param { const void *private_data; u8 private_data_len; u8 responder_resources; u8 initiator_depth; u8 flow_control; u8 retry_count; /* ignored when accepting */ u8 rnr_retry_count; /* Fields below ignored if a QP is created on the rdma_cm_id. */ u8 srq; u32 qp_num; }; struct rdma_ud_param { const void *private_data; u8 private_data_len; struct ib_ah_attr ah_attr; u32 qp_num; u32 qkey; u8 alt_path_index; }; struct rdma_cm_event { enum rdma_cm_event_type event; int status; union { struct rdma_conn_param conn; struct rdma_ud_param ud; } param; }; enum rdma_cm_state { RDMA_CM_IDLE, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ROUTE_QUERY, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT, RDMA_CM_DISCONNECT, RDMA_CM_ADDR_BOUND, RDMA_CM_LISTEN, RDMA_CM_DEVICE_REMOVAL, RDMA_CM_DESTROYING }; struct rdma_cm_id; /** * rdma_cm_event_handler - Callback used to report user events. * * Notes: Users may not call rdma_destroy_id from this callback to destroy * the passed in id, or a corresponding listen id. Returning a * non-zero value from the callback will destroy the passed in id. */ typedef int (*rdma_cm_event_handler)(struct rdma_cm_id *id, struct rdma_cm_event *event); struct rdma_cm_id { struct ib_device *device; void *context; struct ib_qp *qp; rdma_cm_event_handler event_handler; struct rdma_route route; enum rdma_port_space ps; enum ib_qp_type qp_type; u8 port_num; void *ucontext; }; /** * rdma_create_id - Create an RDMA identifier. 
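 *
 * (Illustrative note, not part of the original comment: a typical
 * active-side call sequence, with error handling elided, is
 *
 *	id = rdma_create_id(my_handler, ctx, RDMA_PS_TCP, IB_QPT_RC);
 *	rdma_resolve_addr(id, NULL, dst, 2000);
 *	... wait for RDMA_CM_EVENT_ADDR_RESOLVED ...
 *	rdma_resolve_route(id, 2000);
 *	... wait for RDMA_CM_EVENT_ROUTE_RESOLVED ...
 *	rdma_create_qp(id, pd, &attr);
 *	rdma_connect(id, &param);
 *
 * where my_handler, ctx, dst, pd, attr and param are the caller's own
 * objects.)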
* * @event_handler: User callback invoked to report events associated with the * returned rdma_id. * @context: User specified context associated with the id. * @ps: RDMA port space. * @qp_type: type of queue pair associated with the id. */ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler, void *context, enum rdma_port_space ps, enum ib_qp_type qp_type); /** * rdma_destroy_id - Destroys an RDMA identifier. * * @id: RDMA identifier. * * Note: calling this function has the effect of canceling in-flight * asynchronous operations associated with the id. */ void rdma_destroy_id(struct rdma_cm_id *id); /** * rdma_bind_addr - Bind an RDMA identifier to a source address and * associated RDMA device, if needed. * * @id: RDMA identifier. * @addr: Local address information. Wildcard values are permitted. * * This associates a source address with the RDMA identifier before calling * rdma_listen. If a specific local address is given, the RDMA identifier will * be bound to a local RDMA device. */ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr); /** * rdma_resolve_addr - Resolve destination and optional source addresses * from IP addresses to an RDMA address. If successful, the specified * rdma_cm_id will be bound to a local device. * * @id: RDMA identifier. * @src_addr: Source address information. This parameter may be NULL. * @dst_addr: Destination address information. * @timeout_ms: Time to wait for resolution to complete. */ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr, struct sockaddr *dst_addr, int timeout_ms); /** * rdma_resolve_route - Resolve the RDMA address bound to the RDMA identifier * into route information needed to establish a connection. * * This is called on the client side of a connection. * Users must have first called rdma_resolve_addr to resolve a dst_addr * into an RDMA address before calling this routine. */ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms); /** * rdma_enable_apm - Get ready to use APM for the given ID. * Actual Alternate path discovery and load will take place only * after a connection has been established. * * Calling this function only has an effect on the connection's client side. * It should be called after rdma_resolve_route and before rdma_connect. * * @id: RDMA identifier. * @alt_type: Alternate path type to resolve. */ int rdma_enable_apm(struct rdma_cm_id *id, enum alt_path_type alt_type); /** * rdma_create_qp - Allocate a QP and associate it with the specified RDMA * identifier. * * QPs allocated to an rdma_cm_id will automatically be transitioned by the CMA * through their states. */ int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd, struct ib_qp_init_attr *qp_init_attr); /** * rdma_destroy_qp - Deallocate the QP associated with the specified RDMA * identifier. * * Users must destroy any QP associated with an RDMA identifier before * destroying the RDMA ID. */ void rdma_destroy_qp(struct rdma_cm_id *id); /** * rdma_init_qp_attr - Initializes the QP attributes for use in transitioning * to a specified QP state. * @id: Communication identifier associated with the QP attributes to * initialize. * @qp_attr: On input, specifies the desired QP state. On output, the * mandatory and desired optional attributes will be set in order to * modify the QP to the specified state. * @qp_attr_mask: The QP attribute mask that may be used to transition the * QP to the specified state. * * Users must set the @qp_attr->qp_state to the desired QP state. 
This call * will set all required attributes for the given transition, along with * known optional attributes. Users may override the attributes returned from * this call before calling ib_modify_qp. * * Users that wish to have their QP automatically transitioned through its * states can associate a QP with the rdma_cm_id by calling rdma_create_qp(). */ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr, int *qp_attr_mask); /** * rdma_connect - Initiate an active connection request. * @id: Connection identifier to connect. * @conn_param: Connection information used for connected QPs. * * Users must have resolved a route for the rdma_cm_id to connect with * by having called rdma_resolve_route before calling this routine. * * This call will either connect to a remote QP or obtain remote QP * information for unconnected rdma_cm_id's. The actual operation is * based on the rdma_cm_id's port space. */ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); /** * rdma_listen - This function is called by the passive side to * listen for incoming connection requests. * * Users must have bound the rdma_cm_id to a local address by calling * rdma_bind_addr before calling this routine. */ int rdma_listen(struct rdma_cm_id *id, int backlog); /** * rdma_accept - Called to accept a connection request or response. * @id: Connection identifier associated with the request. * @conn_param: Information needed to establish the connection. This must be * provided if accepting a connection request. If accepting a connection * response, this parameter must be NULL. * * Typically, this routine is only called by the listener to accept a connection * request. It must also be called on the active side of a connection if the * user is performing their own QP transitions. * * In the case of error, a reject message is sent to the remote side and the * state of the qp associated with the id is modified to error, such that any * previously posted receive buffers would be flushed. */ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param); /** * rdma_notify - Notifies the RDMA CM of an asynchronous event that has * occurred on the connection. * @id: Connection identifier to transition to established. * @event: Asynchronous event. * * This routine should be invoked by users to notify the CM of relevant * communication events. Events that should be reported to the CM and * when to report them are: * * IB_EVENT_COMM_EST - Used when a message is received on a connected * QP before an RTU has been received. */ int rdma_notify(struct rdma_cm_id *id, enum ib_event_type event); /** * rdma_reject - Called to reject a connection request or response. */ int rdma_reject(struct rdma_cm_id *id, const void *private_data, u8 private_data_len); /** * rdma_disconnect - This function disconnects the associated QP and * transitions it into the error state. */ int rdma_disconnect(struct rdma_cm_id *id); /** * rdma_join_multicast - Join the multicast group specified by the given * address. * @id: Communication identifier associated with the request. * @addr: Multicast address identifying the group to join. * @context: User-defined context associated with the join request, returned * to the user through the private_data pointer in multicast events. */ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, void *context); /** * rdma_leave_multicast - Leave the multicast group specified by the given * address. 
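 *
 * (Illustrative usage, not part of the original comment; mcast_addr and
 * ctx are the caller's own objects:
 *
 *	rdma_join_multicast(id, (struct sockaddr *)&mcast_addr, ctx);
 *	... wait for RDMA_CM_EVENT_MULTICAST_JOIN ...
 *	rdma_leave_multicast(id, (struct sockaddr *)&mcast_addr);
 *
 * Each successful join should eventually be matched by a leave for the
 * same address.)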
*/ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr); /** * rdma_set_service_type - Set the type of service associated with a * connection identifier. * @id: Communication identifier to associate with the service type. * @tos: Type of service. * * The type of service is interpreted as a differentiated service * field (RFC 2474). The service type should be specified before * performing route resolution, as existing communication on the * connection identifier may be unaffected. The type of service * requested may not be supported by the network to all destinations. */ void rdma_set_service_type(struct rdma_cm_id *id, int tos); /** * rdma_set_reuseaddr - Allow the reuse of local addresses when binding * the rdma_cm_id. * @id: Communication identifier to configure. * @reuse: Value indicating if the bound address is reusable. * * Reuse must be set before an address is bound to the id. */ int rdma_set_reuseaddr(struct rdma_cm_id *id, int reuse); /** * rdma_set_afonly - Specify that listens are restricted to the * bound address family only. * @id: Communication identifier to configure. * @afonly: Value indicating if listens are restricted. * * Must be set before identifier is in the listening state. */ int rdma_set_afonly(struct rdma_cm_id *id, int afonly); /** * rdma_set_timeout - Set the QP timeout associated with a connection * identifier. * @id: Communication identifier to associate with the QP timeout. * @timeout: QP timeout */ void rdma_set_timeout(struct rdma_cm_id *id, int timeout); - +int rdma_cma_any_addr(struct sockaddr *addr); +int rdma_find_cmid_laddr(struct sockaddr_in *local_addr, + unsigned short dev_type, void **cm_id); #endif /* RDMA_CM_H */ Index: projects/clang380-import/sys/powerpc/booke/pmap.c =================================================================== --- projects/clang380-import/sys/powerpc/booke/pmap.c (revision 294776) +++ projects/clang380-import/sys/powerpc/booke/pmap.c (revision 294777) @@ -1,3498 +1,3498 @@ /*- * Copyright (C) 2007-2009 Semihalf, Rafal Jaworowski * Copyright (C) 2006 Semihalf, Marian Balakowicz * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN * NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * Some hw specific parts of this pmap were derived or influenced * by NetBSD's ibm4xx pmap module. More generic code is shared with * a few other pmap modules from the FreeBSD tree.
*/ /* * VM layout notes: * * Kernel and user threads run within one common virtual address space * defined by AS=0. * * Virtual address space layout: * ----------------------------- * 0x0000_0000 - 0xafff_ffff : user process * 0xb000_0000 - 0xbfff_ffff : pmap_mapdev()-ed area (PCI/PCIE etc.) * 0xc000_0000 - 0xc0ff_ffff : kernel reserved * 0xc000_0000 - data_end : kernel code+data, env, metadata etc. * 0xc100_0000 - 0xfeef_ffff : KVA * 0xc100_0000 - 0xc100_3fff : reserved for page zero/copy * 0xc100_4000 - 0xc200_3fff : reserved for ptbl bufs * 0xc200_4000 - 0xc200_8fff : guard page + kstack0 * 0xc200_9000 - 0xfeef_ffff : actual free KVA space * 0xfef0_0000 - 0xffff_ffff : I/O devices region */ #include __FBSDID("$FreeBSD$"); #include "opt_kstack_pages.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "mmu_if.h" #ifdef DEBUG #define debugf(fmt, args...) printf(fmt, ##args) #else #define debugf(fmt, args...) #endif #define TODO panic("%s: not implemented", __func__); extern unsigned char _etext[]; extern unsigned char _end[]; extern uint32_t *bootinfo; #ifdef SMP extern uint32_t bp_ntlb1s; #endif vm_paddr_t kernload; vm_offset_t kernstart; vm_size_t kernsize; /* Message buffer and tables. */ static vm_offset_t data_start; static vm_size_t data_end; /* Phys/avail memory regions. */ static struct mem_region *availmem_regions; static int availmem_regions_sz; static struct mem_region *physmem_regions; static int physmem_regions_sz; /* Reserved KVA space and mutex for mmu_booke_zero_page. */ static vm_offset_t zero_page_va; static struct mtx zero_page_mutex; static struct mtx tlbivax_mutex; /* * Reserved KVA space for mmu_booke_zero_page_idle. This is used * by the idle thread only, no lock required. */ static vm_offset_t zero_page_idle_va; /* Reserved KVA space and mutex for mmu_booke_copy_page. */ static vm_offset_t copy_page_src_va; static vm_offset_t copy_page_dst_va; static struct mtx copy_page_mutex; /**************************************************************************/ /* PMAP */ /**************************************************************************/ static int mmu_booke_enter_locked(mmu_t, pmap_t, vm_offset_t, vm_page_t, vm_prot_t, u_int flags, int8_t psind); unsigned int kptbl_min; /* Index of the first kernel ptbl. */ unsigned int kernel_ptbls; /* Number of KVA ptbls. */ /* * If user pmap is processed with mmu_booke_remove and the resident count * drops to 0, there are no more pages to remove, so we need not continue. */ #define PMAP_REMOVE_DONE(pmap) \ ((pmap) != kernel_pmap && (pmap)->pm_stats.resident_count == 0) extern int elf32_nxstack; /**************************************************************************/ /* TLB and TID handling */ /**************************************************************************/ /* Translation ID busy table */ static volatile pmap_t tidbusy[MAXCPU][TID_MAX + 1]; /* * TLB0 capabilities (entry, way numbers etc.). These can vary between e500 * core revisions and should be read from h/w registers during early config.
*/ uint32_t tlb0_entries; uint32_t tlb0_ways; uint32_t tlb0_entries_per_way; uint32_t tlb1_entries; #define TLB0_ENTRIES (tlb0_entries) #define TLB0_WAYS (tlb0_ways) #define TLB0_ENTRIES_PER_WAY (tlb0_entries_per_way) #define TLB1_ENTRIES (tlb1_entries) #define TLB1_MAXENTRIES 64 /* In-ram copy of the TLB1 */ static tlb_entry_t tlb1[TLB1_MAXENTRIES]; /* Next free entry in the TLB1 */ static unsigned int tlb1_idx; static vm_offset_t tlb1_map_base = VM_MAX_KERNEL_ADDRESS; static tlbtid_t tid_alloc(struct pmap *); static void tid_flush(tlbtid_t tid); static void tlb_print_entry(int, uint32_t, uint32_t, uint32_t, uint32_t); static int tlb1_set_entry(vm_offset_t, vm_paddr_t, vm_size_t, uint32_t); static void tlb1_write_entry(unsigned int); static int tlb1_iomapped(int, vm_paddr_t, vm_size_t, vm_offset_t *); static vm_size_t tlb1_mapin_region(vm_offset_t, vm_paddr_t, vm_size_t); static vm_size_t tsize2size(unsigned int); static unsigned int size2tsize(vm_size_t); static unsigned int ilog2(unsigned int); static void set_mas4_defaults(void); static inline void tlb0_flush_entry(vm_offset_t); static inline unsigned int tlb0_tableidx(vm_offset_t, unsigned int); /**************************************************************************/ /* Page table management */ /**************************************************************************/ static struct rwlock_padalign pvh_global_lock; /* Data for the pv entry allocation mechanism */ static uma_zone_t pvzone; static int pv_entry_count = 0, pv_entry_max = 0, pv_entry_high_water = 0; #define PV_ENTRY_ZONE_MIN 2048 /* min pv entries in uma zone */ #ifndef PMAP_SHPGPERPROC #define PMAP_SHPGPERPROC 200 #endif static void ptbl_init(void); static struct ptbl_buf *ptbl_buf_alloc(void); static void ptbl_buf_free(struct ptbl_buf *); static void ptbl_free_pmap_ptbl(pmap_t, pte_t *); static pte_t *ptbl_alloc(mmu_t, pmap_t, unsigned int, boolean_t); static void ptbl_free(mmu_t, pmap_t, unsigned int); static void ptbl_hold(mmu_t, pmap_t, unsigned int); static int ptbl_unhold(mmu_t, pmap_t, unsigned int); static vm_paddr_t pte_vatopa(mmu_t, pmap_t, vm_offset_t); static pte_t *pte_find(mmu_t, pmap_t, vm_offset_t); static int pte_enter(mmu_t, pmap_t, vm_page_t, vm_offset_t, uint32_t, boolean_t); static int pte_remove(mmu_t, pmap_t, vm_offset_t, uint8_t); static void kernel_pte_alloc(vm_offset_t data_end, vm_offset_t addr, vm_offset_t pdir); static pv_entry_t pv_alloc(void); static void pv_free(pv_entry_t); static void pv_insert(pmap_t, vm_offset_t, vm_page_t); static void pv_remove(pmap_t, vm_offset_t, vm_page_t); static void booke_pmap_init_qpages(void); /* Number of kva ptbl buffers, each covering one ptbl (PTBL_PAGES). */ #define PTBL_BUFS (128 * 16) struct ptbl_buf { TAILQ_ENTRY(ptbl_buf) link; /* list link */ vm_offset_t kva; /* va of mapping */ }; /* ptbl free list and a lock used for access synchronization. */ static TAILQ_HEAD(, ptbl_buf) ptbl_buf_freelist; static struct mtx ptbl_buf_freelist_lock; /* Base address of kva space allocated for ptbl bufs. */ static vm_offset_t ptbl_buf_pool_vabase; /* Pointer to ptbl_buf structures.
*/ static struct ptbl_buf *ptbl_bufs; #ifdef SMP void pmap_bootstrap_ap(volatile uint32_t *); #endif /* * Kernel MMU interface */ static void mmu_booke_clear_modify(mmu_t, vm_page_t); static void mmu_booke_copy(mmu_t, pmap_t, pmap_t, vm_offset_t, vm_size_t, vm_offset_t); static void mmu_booke_copy_page(mmu_t, vm_page_t, vm_page_t); static void mmu_booke_copy_pages(mmu_t, vm_page_t *, vm_offset_t, vm_page_t *, vm_offset_t, int); static int mmu_booke_enter(mmu_t, pmap_t, vm_offset_t, vm_page_t, vm_prot_t, u_int flags, int8_t psind); static void mmu_booke_enter_object(mmu_t, pmap_t, vm_offset_t, vm_offset_t, vm_page_t, vm_prot_t); static void mmu_booke_enter_quick(mmu_t, pmap_t, vm_offset_t, vm_page_t, vm_prot_t); static vm_paddr_t mmu_booke_extract(mmu_t, pmap_t, vm_offset_t); static vm_page_t mmu_booke_extract_and_hold(mmu_t, pmap_t, vm_offset_t, vm_prot_t); static void mmu_booke_init(mmu_t); static boolean_t mmu_booke_is_modified(mmu_t, vm_page_t); static boolean_t mmu_booke_is_prefaultable(mmu_t, pmap_t, vm_offset_t); static boolean_t mmu_booke_is_referenced(mmu_t, vm_page_t); static int mmu_booke_ts_referenced(mmu_t, vm_page_t); static vm_offset_t mmu_booke_map(mmu_t, vm_offset_t *, vm_paddr_t, vm_paddr_t, int); static int mmu_booke_mincore(mmu_t, pmap_t, vm_offset_t, vm_paddr_t *); static void mmu_booke_object_init_pt(mmu_t, pmap_t, vm_offset_t, vm_object_t, vm_pindex_t, vm_size_t); static boolean_t mmu_booke_page_exists_quick(mmu_t, pmap_t, vm_page_t); static void mmu_booke_page_init(mmu_t, vm_page_t); static int mmu_booke_page_wired_mappings(mmu_t, vm_page_t); static void mmu_booke_pinit(mmu_t, pmap_t); static void mmu_booke_pinit0(mmu_t, pmap_t); static void mmu_booke_protect(mmu_t, pmap_t, vm_offset_t, vm_offset_t, vm_prot_t); static void mmu_booke_qenter(mmu_t, vm_offset_t, vm_page_t *, int); static void mmu_booke_qremove(mmu_t, vm_offset_t, int); static void mmu_booke_release(mmu_t, pmap_t); static void mmu_booke_remove(mmu_t, pmap_t, vm_offset_t, vm_offset_t); static void mmu_booke_remove_all(mmu_t, vm_page_t); static void mmu_booke_remove_write(mmu_t, vm_page_t); static void mmu_booke_unwire(mmu_t, pmap_t, vm_offset_t, vm_offset_t); static void mmu_booke_zero_page(mmu_t, vm_page_t); static void mmu_booke_zero_page_area(mmu_t, vm_page_t, int, int); static void mmu_booke_zero_page_idle(mmu_t, vm_page_t); static void mmu_booke_activate(mmu_t, struct thread *); static void mmu_booke_deactivate(mmu_t, struct thread *); static void mmu_booke_bootstrap(mmu_t, vm_offset_t, vm_offset_t); static void *mmu_booke_mapdev(mmu_t, vm_paddr_t, vm_size_t); static void *mmu_booke_mapdev_attr(mmu_t, vm_paddr_t, vm_size_t, vm_memattr_t); static void mmu_booke_unmapdev(mmu_t, vm_offset_t, vm_size_t); static vm_paddr_t mmu_booke_kextract(mmu_t, vm_offset_t); static void mmu_booke_kenter(mmu_t, vm_offset_t, vm_paddr_t); static void mmu_booke_kenter_attr(mmu_t, vm_offset_t, vm_paddr_t, vm_memattr_t); static void mmu_booke_kremove(mmu_t, vm_offset_t); static boolean_t mmu_booke_dev_direct_mapped(mmu_t, vm_paddr_t, vm_size_t); static void mmu_booke_sync_icache(mmu_t, pmap_t, vm_offset_t, vm_size_t); static void mmu_booke_dumpsys_map(mmu_t, vm_paddr_t pa, size_t, void **); static void mmu_booke_dumpsys_unmap(mmu_t, vm_paddr_t pa, size_t, void *); static void mmu_booke_scan_init(mmu_t); static vm_offset_t mmu_booke_quick_enter_page(mmu_t mmu, vm_page_t m); static void mmu_booke_quick_remove_page(mmu_t mmu, vm_offset_t addr); static mmu_method_t mmu_booke_methods[] = { /* pmap dispatcher interface */ 
MMUMETHOD(mmu_clear_modify, mmu_booke_clear_modify), MMUMETHOD(mmu_copy, mmu_booke_copy), MMUMETHOD(mmu_copy_page, mmu_booke_copy_page), MMUMETHOD(mmu_copy_pages, mmu_booke_copy_pages), MMUMETHOD(mmu_enter, mmu_booke_enter), MMUMETHOD(mmu_enter_object, mmu_booke_enter_object), MMUMETHOD(mmu_enter_quick, mmu_booke_enter_quick), MMUMETHOD(mmu_extract, mmu_booke_extract), MMUMETHOD(mmu_extract_and_hold, mmu_booke_extract_and_hold), MMUMETHOD(mmu_init, mmu_booke_init), MMUMETHOD(mmu_is_modified, mmu_booke_is_modified), MMUMETHOD(mmu_is_prefaultable, mmu_booke_is_prefaultable), MMUMETHOD(mmu_is_referenced, mmu_booke_is_referenced), MMUMETHOD(mmu_ts_referenced, mmu_booke_ts_referenced), MMUMETHOD(mmu_map, mmu_booke_map), MMUMETHOD(mmu_mincore, mmu_booke_mincore), MMUMETHOD(mmu_object_init_pt, mmu_booke_object_init_pt), MMUMETHOD(mmu_page_exists_quick,mmu_booke_page_exists_quick), MMUMETHOD(mmu_page_init, mmu_booke_page_init), MMUMETHOD(mmu_page_wired_mappings, mmu_booke_page_wired_mappings), MMUMETHOD(mmu_pinit, mmu_booke_pinit), MMUMETHOD(mmu_pinit0, mmu_booke_pinit0), MMUMETHOD(mmu_protect, mmu_booke_protect), MMUMETHOD(mmu_qenter, mmu_booke_qenter), MMUMETHOD(mmu_qremove, mmu_booke_qremove), MMUMETHOD(mmu_release, mmu_booke_release), MMUMETHOD(mmu_remove, mmu_booke_remove), MMUMETHOD(mmu_remove_all, mmu_booke_remove_all), MMUMETHOD(mmu_remove_write, mmu_booke_remove_write), MMUMETHOD(mmu_sync_icache, mmu_booke_sync_icache), MMUMETHOD(mmu_unwire, mmu_booke_unwire), MMUMETHOD(mmu_zero_page, mmu_booke_zero_page), MMUMETHOD(mmu_zero_page_area, mmu_booke_zero_page_area), MMUMETHOD(mmu_zero_page_idle, mmu_booke_zero_page_idle), MMUMETHOD(mmu_activate, mmu_booke_activate), MMUMETHOD(mmu_deactivate, mmu_booke_deactivate), MMUMETHOD(mmu_quick_enter_page, mmu_booke_quick_enter_page), MMUMETHOD(mmu_quick_remove_page, mmu_booke_quick_remove_page), /* Internal interfaces */ MMUMETHOD(mmu_bootstrap, mmu_booke_bootstrap), MMUMETHOD(mmu_dev_direct_mapped,mmu_booke_dev_direct_mapped), MMUMETHOD(mmu_mapdev, mmu_booke_mapdev), MMUMETHOD(mmu_mapdev_attr, mmu_booke_mapdev_attr), MMUMETHOD(mmu_kenter, mmu_booke_kenter), MMUMETHOD(mmu_kenter_attr, mmu_booke_kenter_attr), MMUMETHOD(mmu_kextract, mmu_booke_kextract), /* MMUMETHOD(mmu_kremove, mmu_booke_kremove), */ MMUMETHOD(mmu_unmapdev, mmu_booke_unmapdev), /* dumpsys() support */ MMUMETHOD(mmu_dumpsys_map, mmu_booke_dumpsys_map), MMUMETHOD(mmu_dumpsys_unmap, mmu_booke_dumpsys_unmap), MMUMETHOD(mmu_scan_init, mmu_booke_scan_init), { 0, 0 } }; MMU_DEF(booke_mmu, MMU_TYPE_BOOKE, mmu_booke_methods, 0); static __inline uint32_t tlb_calc_wimg(vm_paddr_t pa, vm_memattr_t ma) { uint32_t attrib; int i; if (ma != VM_MEMATTR_DEFAULT) { switch (ma) { case VM_MEMATTR_UNCACHEABLE: return (PTE_I | PTE_G); case VM_MEMATTR_WRITE_COMBINING: case VM_MEMATTR_WRITE_BACK: case VM_MEMATTR_PREFETCHABLE: return (PTE_I); case VM_MEMATTR_WRITE_THROUGH: return (PTE_W | PTE_M); } } /* * Assume the page is cache inhibited and access is guarded unless * it's in our available memory array. 
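 * For example, a physical address inside one of the physmem regions
 * (ordinary RAM) gets the cacheable _TLB_ENTRY_MEM attributes, while an
 * address outside of them, such as a memory-mapped device, keeps the
 * cache-inhibited, guarded _TLB_ENTRY_IO attributes.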
*/ attrib = _TLB_ENTRY_IO; for (i = 0; i < physmem_regions_sz; i++) { if ((pa >= physmem_regions[i].mr_start) && (pa < (physmem_regions[i].mr_start + physmem_regions[i].mr_size))) { attrib = _TLB_ENTRY_MEM; break; } } return (attrib); } static inline void tlb_miss_lock(void) { #ifdef SMP struct pcpu *pc; if (!smp_started) return; STAILQ_FOREACH(pc, &cpuhead, pc_allcpu) { if (pc != pcpup) { CTR3(KTR_PMAP, "%s: tlb miss LOCK of CPU=%d, " "tlb_lock=%p", __func__, pc->pc_cpuid, pc->pc_booke_tlb_lock); KASSERT((pc->pc_cpuid != PCPU_GET(cpuid)), ("tlb_miss_lock: tried to lock self")); tlb_lock(pc->pc_booke_tlb_lock); CTR1(KTR_PMAP, "%s: locked", __func__); } } #endif } static inline void tlb_miss_unlock(void) { #ifdef SMP struct pcpu *pc; if (!smp_started) return; STAILQ_FOREACH(pc, &cpuhead, pc_allcpu) { if (pc != pcpup) { CTR2(KTR_PMAP, "%s: tlb miss UNLOCK of CPU=%d", __func__, pc->pc_cpuid); tlb_unlock(pc->pc_booke_tlb_lock); CTR1(KTR_PMAP, "%s: unlocked", __func__); } } #endif } /* Return number of entries in TLB0. */ static __inline void tlb0_get_tlbconf(void) { uint32_t tlb0_cfg; tlb0_cfg = mfspr(SPR_TLB0CFG); tlb0_entries = tlb0_cfg & TLBCFG_NENTRY_MASK; tlb0_ways = (tlb0_cfg & TLBCFG_ASSOC_MASK) >> TLBCFG_ASSOC_SHIFT; tlb0_entries_per_way = tlb0_entries / tlb0_ways; } /* Return number of entries in TLB1. */ static __inline void tlb1_get_tlbconf(void) { uint32_t tlb1_cfg; tlb1_cfg = mfspr(SPR_TLB1CFG); tlb1_entries = tlb1_cfg & TLBCFG_NENTRY_MASK; } /**************************************************************************/ /* Page table related */ /**************************************************************************/ /* Initialize pool of kva ptbl buffers. */ static void ptbl_init(void) { int i; CTR3(KTR_PMAP, "%s: s (ptbl_bufs = 0x%08x size 0x%08x)", __func__, (uint32_t)ptbl_bufs, sizeof(struct ptbl_buf) * PTBL_BUFS); CTR3(KTR_PMAP, "%s: s (ptbl_buf_pool_vabase = 0x%08x size = 0x%08x)", __func__, ptbl_buf_pool_vabase, PTBL_BUFS * PTBL_PAGES * PAGE_SIZE); mtx_init(&ptbl_buf_freelist_lock, "ptbl bufs lock", NULL, MTX_DEF); TAILQ_INIT(&ptbl_buf_freelist); for (i = 0; i < PTBL_BUFS; i++) { ptbl_bufs[i].kva = ptbl_buf_pool_vabase + i * PTBL_PAGES * PAGE_SIZE; TAILQ_INSERT_TAIL(&ptbl_buf_freelist, &ptbl_bufs[i], link); } } /* Get a ptbl_buf from the freelist. */ static struct ptbl_buf * ptbl_buf_alloc(void) { struct ptbl_buf *buf; mtx_lock(&ptbl_buf_freelist_lock); buf = TAILQ_FIRST(&ptbl_buf_freelist); if (buf != NULL) TAILQ_REMOVE(&ptbl_buf_freelist, buf, link); mtx_unlock(&ptbl_buf_freelist_lock); CTR2(KTR_PMAP, "%s: buf = %p", __func__, buf); return (buf); } /* Return ptbl buf to free pool. */ static void ptbl_buf_free(struct ptbl_buf *buf) { CTR2(KTR_PMAP, "%s: buf = %p", __func__, buf); mtx_lock(&ptbl_buf_freelist_lock); TAILQ_INSERT_TAIL(&ptbl_buf_freelist, buf, link); mtx_unlock(&ptbl_buf_freelist_lock); } /* * Search the pmap's list of allocated ptbl bufs and free the buf that backs * the given ptbl. */ static void ptbl_free_pmap_ptbl(pmap_t pmap, pte_t *ptbl) { struct ptbl_buf *pbuf; CTR2(KTR_PMAP, "%s: ptbl = %p", __func__, ptbl); PMAP_LOCK_ASSERT(pmap, MA_OWNED); TAILQ_FOREACH(pbuf, &pmap->pm_ptbl_list, link) if (pbuf->kva == (vm_offset_t)ptbl) { /* Remove from pmap ptbl buf list. */ TAILQ_REMOVE(&pmap->pm_ptbl_list, pbuf, link); /* Free corresponding ptbl buf. */ ptbl_buf_free(pbuf); break; } } /* Allocate page table.
*/ static pte_t * ptbl_alloc(mmu_t mmu, pmap_t pmap, unsigned int pdir_idx, boolean_t nosleep) { vm_page_t mtbl[PTBL_PAGES]; vm_page_t m; struct ptbl_buf *pbuf; unsigned int pidx; pte_t *ptbl; int i, j; CTR4(KTR_PMAP, "%s: pmap = %p su = %d pdir_idx = %d", __func__, pmap, (pmap == kernel_pmap), pdir_idx); KASSERT((pdir_idx <= (VM_MAXUSER_ADDRESS / PDIR_SIZE)), ("ptbl_alloc: invalid pdir_idx")); KASSERT((pmap->pm_pdir[pdir_idx] == NULL), ("pte_alloc: valid ptbl entry exists!")); pbuf = ptbl_buf_alloc(); if (pbuf == NULL) panic("pte_alloc: couldn't alloc kernel virtual memory"); ptbl = (pte_t *)pbuf->kva; CTR2(KTR_PMAP, "%s: ptbl kva = %p", __func__, ptbl); /* Allocate ptbl pages, this will sleep! */ for (i = 0; i < PTBL_PAGES; i++) { pidx = (PTBL_PAGES * pdir_idx) + i; while ((m = vm_page_alloc(NULL, pidx, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) { PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); if (nosleep) { ptbl_free_pmap_ptbl(pmap, ptbl); for (j = 0; j < i; j++) vm_page_free(mtbl[j]); atomic_subtract_int(&vm_cnt.v_wire_count, i); return (NULL); } VM_WAIT; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); } mtbl[i] = m; } /* Map allocated pages into kernel_pmap. */ mmu_booke_qenter(mmu, (vm_offset_t)ptbl, mtbl, PTBL_PAGES); /* Zero whole ptbl. */ bzero((caddr_t)ptbl, PTBL_PAGES * PAGE_SIZE); /* Add pbuf to the pmap ptbl bufs list. */ TAILQ_INSERT_TAIL(&pmap->pm_ptbl_list, pbuf, link); return (ptbl); } /* Free ptbl pages and invalidate pdir entry. */ static void ptbl_free(mmu_t mmu, pmap_t pmap, unsigned int pdir_idx) { pte_t *ptbl; vm_paddr_t pa; vm_offset_t va; vm_page_t m; int i; CTR4(KTR_PMAP, "%s: pmap = %p su = %d pdir_idx = %d", __func__, pmap, (pmap == kernel_pmap), pdir_idx); KASSERT((pdir_idx <= (VM_MAXUSER_ADDRESS / PDIR_SIZE)), ("ptbl_free: invalid pdir_idx")); ptbl = pmap->pm_pdir[pdir_idx]; CTR2(KTR_PMAP, "%s: ptbl = %p", __func__, ptbl); KASSERT((ptbl != NULL), ("ptbl_free: null ptbl")); /* * Invalidate the pdir entry as soon as possible, so that other CPUs * don't attempt to look up the page tables we are releasing. */ mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); pmap->pm_pdir[pdir_idx] = NULL; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); for (i = 0; i < PTBL_PAGES; i++) { va = ((vm_offset_t)ptbl + (i * PAGE_SIZE)); pa = pte_vatopa(mmu, kernel_pmap, va); m = PHYS_TO_VM_PAGE(pa); vm_page_free_zero(m); atomic_subtract_int(&vm_cnt.v_wire_count, 1); mmu_booke_kremove(mmu, va); } ptbl_free_pmap_ptbl(pmap, ptbl); } /* * Decrement ptbl pages hold count and attempt to free ptbl pages. * Called when removing pte entry from ptbl. * * Return 1 if ptbl pages were freed. */ static int ptbl_unhold(mmu_t mmu, pmap_t pmap, unsigned int pdir_idx) { pte_t *ptbl; vm_paddr_t pa; vm_page_t m; int i; CTR4(KTR_PMAP, "%s: pmap = %p su = %d pdir_idx = %d", __func__, pmap, (pmap == kernel_pmap), pdir_idx); KASSERT((pdir_idx <= (VM_MAXUSER_ADDRESS / PDIR_SIZE)), ("ptbl_unhold: invalid pdir_idx")); KASSERT((pmap != kernel_pmap), ("ptbl_unhold: unholding kernel ptbl!")); ptbl = pmap->pm_pdir[pdir_idx]; //debugf("ptbl_unhold: ptbl = 0x%08x\n", (u_int32_t)ptbl); KASSERT(((vm_offset_t)ptbl >= VM_MIN_KERNEL_ADDRESS), ("ptbl_unhold: non kva ptbl")); /* decrement hold count */ for (i = 0; i < PTBL_PAGES; i++) { pa = pte_vatopa(mmu, kernel_pmap, (vm_offset_t)ptbl + (i * PAGE_SIZE)); m = PHYS_TO_VM_PAGE(pa); m->wire_count--; } /* * Free ptbl pages if there are no pte entries in this ptbl. * wire_count has the same value for all ptbl pages, so check the last * page.
*/ if (m->wire_count == 0) { ptbl_free(mmu, pmap, pdir_idx); //debugf("ptbl_unhold: e (freed ptbl)\n"); return (1); } return (0); } /* * Increment hold count for ptbl pages. This routine is used when a new pte * entry is being inserted into the ptbl. */ static void ptbl_hold(mmu_t mmu, pmap_t pmap, unsigned int pdir_idx) { vm_paddr_t pa; pte_t *ptbl; vm_page_t m; int i; CTR3(KTR_PMAP, "%s: pmap = %p pdir_idx = %d", __func__, pmap, pdir_idx); KASSERT((pdir_idx <= (VM_MAXUSER_ADDRESS / PDIR_SIZE)), ("ptbl_hold: invalid pdir_idx")); KASSERT((pmap != kernel_pmap), ("ptbl_hold: holding kernel ptbl!")); ptbl = pmap->pm_pdir[pdir_idx]; KASSERT((ptbl != NULL), ("ptbl_hold: null ptbl")); for (i = 0; i < PTBL_PAGES; i++) { pa = pte_vatopa(mmu, kernel_pmap, (vm_offset_t)ptbl + (i * PAGE_SIZE)); m = PHYS_TO_VM_PAGE(pa); m->wire_count++; } } /* Allocate pv_entry structure. */ pv_entry_t pv_alloc(void) { pv_entry_t pv; pv_entry_count++; if (pv_entry_count > pv_entry_high_water) pagedaemon_wakeup(); pv = uma_zalloc(pvzone, M_NOWAIT); return (pv); } /* Free pv_entry structure. */ static __inline void pv_free(pv_entry_t pve) { pv_entry_count--; uma_zfree(pvzone, pve); } /* Allocate and initialize pv_entry structure. */ static void pv_insert(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pve; //int su = (pmap == kernel_pmap); //debugf("pv_insert: s (su = %d pmap = 0x%08x va = 0x%08x m = 0x%08x)\n", su, // (u_int32_t)pmap, va, (u_int32_t)m); pve = pv_alloc(); if (pve == NULL) panic("pv_insert: no pv entries!"); pve->pv_pmap = pmap; pve->pv_va = va; /* add to pv_list */ PMAP_LOCK_ASSERT(pmap, MA_OWNED); rw_assert(&pvh_global_lock, RA_WLOCKED); TAILQ_INSERT_TAIL(&m->md.pv_list, pve, pv_link); //debugf("pv_insert: e\n"); } /* Destroy pv entry. */ static void pv_remove(pmap_t pmap, vm_offset_t va, vm_page_t m) { pv_entry_t pve; //int su = (pmap == kernel_pmap); //debugf("pv_remove: s (su = %d pmap = 0x%08x va = 0x%08x)\n", su, (u_int32_t)pmap, va); PMAP_LOCK_ASSERT(pmap, MA_OWNED); rw_assert(&pvh_global_lock, RA_WLOCKED); /* find pv entry */ TAILQ_FOREACH(pve, &m->md.pv_list, pv_link) { if ((pmap == pve->pv_pmap) && (va == pve->pv_va)) { /* remove from pv_list */ TAILQ_REMOVE(&m->md.pv_list, pve, pv_link); if (TAILQ_EMPTY(&m->md.pv_list)) vm_page_aflag_clear(m, PGA_WRITEABLE); /* free pv entry struct */ pv_free(pve); break; } } //debugf("pv_remove: e\n"); } /* * Clean pte entry, try to free page table page if requested. * * Return 1 if ptbl pages were freed, otherwise return 0. */ static int pte_remove(mmu_t mmu, pmap_t pmap, vm_offset_t va, uint8_t flags) { unsigned int pdir_idx = PDIR_IDX(va); unsigned int ptbl_idx = PTBL_IDX(va); vm_page_t m; pte_t *ptbl; pte_t *pte; //int su = (pmap == kernel_pmap); //debugf("pte_remove: s (su = %d pmap = 0x%08x va = 0x%08x flags = %d)\n", // su, (u_int32_t)pmap, va, flags); ptbl = pmap->pm_pdir[pdir_idx]; KASSERT(ptbl, ("pte_remove: null ptbl")); pte = &ptbl[ptbl_idx]; if (pte == NULL || !PTE_ISVALID(pte)) return (0); if (PTE_ISWIRED(pte)) pmap->pm_stats.wired_count--; /* Handle managed entry. */ if (PTE_ISMANAGED(pte)) { /* Get vm_page_t for mapped pte. 
*/ m = PHYS_TO_VM_PAGE(PTE_PA(pte)); if (PTE_ISMODIFIED(pte)) vm_page_dirty(m); if (PTE_ISREFERENCED(pte)) vm_page_aflag_set(m, PGA_REFERENCED); pv_remove(pmap, va, m); } mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); tlb0_flush_entry(va); pte->flags = 0; pte->rpn = 0; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); pmap->pm_stats.resident_count--; if (flags & PTBL_UNHOLD) { //debugf("pte_remove: e (unhold)\n"); return (ptbl_unhold(mmu, pmap, pdir_idx)); } //debugf("pte_remove: e\n"); return (0); } /* * Insert PTE for a given page and virtual address. */ static int pte_enter(mmu_t mmu, pmap_t pmap, vm_page_t m, vm_offset_t va, uint32_t flags, boolean_t nosleep) { unsigned int pdir_idx = PDIR_IDX(va); unsigned int ptbl_idx = PTBL_IDX(va); pte_t *ptbl, *pte; CTR4(KTR_PMAP, "%s: su = %d pmap = %p va = %p", __func__, pmap == kernel_pmap, pmap, va); /* Get the page table pointer. */ ptbl = pmap->pm_pdir[pdir_idx]; if (ptbl == NULL) { /* Allocate page table pages. */ ptbl = ptbl_alloc(mmu, pmap, pdir_idx, nosleep); if (ptbl == NULL) { KASSERT(nosleep, ("nosleep and NULL ptbl")); return (ENOMEM); } } else { /* * Check if there is valid mapping for requested * va, if there is, remove it. */ pte = &pmap->pm_pdir[pdir_idx][ptbl_idx]; if (PTE_ISVALID(pte)) { pte_remove(mmu, pmap, va, PTBL_HOLD); } else { /* * pte is not used, increment hold count * for ptbl pages. */ if (pmap != kernel_pmap) ptbl_hold(mmu, pmap, pdir_idx); } } /* * Insert pv_entry into pv_list for mapped page if part of managed * memory. */ if ((m->oflags & VPO_UNMANAGED) == 0) { flags |= PTE_MANAGED; /* Create and insert pv entry. */ pv_insert(pmap, va, m); } pmap->pm_stats.resident_count++; mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); tlb0_flush_entry(va); if (pmap->pm_pdir[pdir_idx] == NULL) { /* * If we just allocated a new page table, hook it in * the pdir. */ pmap->pm_pdir[pdir_idx] = ptbl; } pte = &(pmap->pm_pdir[pdir_idx][ptbl_idx]); pte->rpn = PTE_RPN_FROM_PA(VM_PAGE_TO_PHYS(m)); pte->flags |= (PTE_VALID | flags); tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); return (0); } /* Return the pa for the given pmap/va. */ static vm_paddr_t pte_vatopa(mmu_t mmu, pmap_t pmap, vm_offset_t va) { vm_paddr_t pa = 0; pte_t *pte; pte = pte_find(mmu, pmap, va); if ((pte != NULL) && PTE_ISVALID(pte)) pa = (PTE_PA(pte) | (va & PTE_PA_MASK)); return (pa); } /* Get a pointer to a PTE in a page table. */ static pte_t * pte_find(mmu_t mmu, pmap_t pmap, vm_offset_t va) { unsigned int pdir_idx = PDIR_IDX(va); unsigned int ptbl_idx = PTBL_IDX(va); KASSERT((pmap != NULL), ("pte_find: invalid pmap")); if (pmap->pm_pdir[pdir_idx]) return (&(pmap->pm_pdir[pdir_idx][ptbl_idx])); return (NULL); } /* Set up kernel page tables. */ static void kernel_pte_alloc(vm_offset_t data_end, vm_offset_t addr, vm_offset_t pdir) { int i; vm_offset_t va; pte_t *pte; /* Initialize kernel pdir */ for (i = 0; i < kernel_ptbls; i++) kernel_pmap->pm_pdir[kptbl_min + i] = (pte_t *)(pdir + (i * PAGE_SIZE * PTBL_PAGES)); /* * Fill in PTEs covering kernel code and data. They are not required * for address translation, as this area is covered by static TLB1 * entries, but for pte_vatopa() to work correctly with kernel area * addresses. 
*/ for (va = addr; va < data_end; va += PAGE_SIZE) { pte = &(kernel_pmap->pm_pdir[PDIR_IDX(va)][PTBL_IDX(va)]); pte->rpn = kernload + (va - kernstart); pte->flags = PTE_M | PTE_SR | PTE_SW | PTE_SX | PTE_WIRED | PTE_VALID; } } /**************************************************************************/ /* PMAP related */ /**************************************************************************/ /* * This is called during booke_init, before the system is really initialized. */ static void mmu_booke_bootstrap(mmu_t mmu, vm_offset_t start, vm_offset_t kernelend) { vm_paddr_t phys_kernelend; struct mem_region *mp, *mp1; int cnt, i, j; vm_paddr_t s, e, sz; vm_paddr_t physsz, hwphyssz; u_int phys_avail_count; vm_size_t kstack0_sz; vm_offset_t kernel_pdir, kstack0; vm_paddr_t kstack0_phys; void *dpcpu; debugf("mmu_booke_bootstrap: entered\n"); /* Set interesting system properties */ hw_direct_map = 0; elf32_nxstack = 1; /* Initialize invalidation mutex */ mtx_init(&tlbivax_mutex, "tlbivax", NULL, MTX_SPIN); /* Read TLB0 size and associativity. */ tlb0_get_tlbconf(); /* * Align kernel start and end address (kernel image). * Note that kernel end does not necessarily relate to kernsize. * kernsize is the size of the kernel that is actually mapped. */ kernstart = trunc_page(start); data_start = round_page(kernelend); data_end = data_start; /* * Addresses of preloaded modules (like file systems) use * physical addresses. Make sure we relocate those into * virtual addresses. */ preload_addr_relocate = kernstart - kernload; /* Allocate the dynamic per-cpu area. */ dpcpu = (void *)data_end; data_end += DPCPU_SIZE; /* Allocate space for the message buffer. */ msgbufp = (struct msgbuf *)data_end; data_end += msgbufsize; debugf(" msgbufp at 0x%08x end = 0x%08x\n", (uint32_t)msgbufp, data_end); data_end = round_page(data_end); /* Allocate space for ptbl_bufs. */ ptbl_bufs = (struct ptbl_buf *)data_end; data_end += sizeof(struct ptbl_buf) * PTBL_BUFS; debugf(" ptbl_bufs at 0x%08x end = 0x%08x\n", (uint32_t)ptbl_bufs, data_end); data_end = round_page(data_end); /* Allocate PTE tables for kernel KVA. */ kernel_pdir = data_end; kernel_ptbls = (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS + PDIR_SIZE - 1) / PDIR_SIZE; data_end += kernel_ptbls * PTBL_PAGES * PAGE_SIZE; debugf(" kernel ptbls: %d\n", kernel_ptbls); debugf(" kernel pdir at 0x%08x end = 0x%08x\n", kernel_pdir, data_end); debugf(" data_end: 0x%08x\n", data_end); if (data_end - kernstart > kernsize) { kernsize += tlb1_mapin_region(kernstart + kernsize, kernload + kernsize, (data_end - kernstart) - kernsize); } data_end = kernstart + kernsize; debugf(" updated data_end: 0x%08x\n", data_end); /* * Clear the structures - note we can only do it safely after the * possible additional TLB1 translations are in place (above) so that * all range up to the currently calculated 'data_end' is covered. */ dpcpu_init(dpcpu, 0); memset((void *)ptbl_bufs, 0, sizeof(struct ptbl_buf) * PTBL_SIZE); memset((void *)kernel_pdir, 0, kernel_ptbls * PTBL_PAGES * PAGE_SIZE); /*******************************************************/ /* Set the start and end of kva. */ /*******************************************************/ virtual_avail = round_page(data_end); virtual_end = VM_MAX_KERNEL_ADDRESS; /* Allocate KVA space for page zero/copy operations. 
*/ zero_page_va = virtual_avail; virtual_avail += PAGE_SIZE; zero_page_idle_va = virtual_avail; virtual_avail += PAGE_SIZE; copy_page_src_va = virtual_avail; virtual_avail += PAGE_SIZE; copy_page_dst_va = virtual_avail; virtual_avail += PAGE_SIZE; debugf("zero_page_va = 0x%08x\n", zero_page_va); debugf("zero_page_idle_va = 0x%08x\n", zero_page_idle_va); debugf("copy_page_src_va = 0x%08x\n", copy_page_src_va); debugf("copy_page_dst_va = 0x%08x\n", copy_page_dst_va); /* Initialize page zero/copy mutexes. */ mtx_init(&zero_page_mutex, "mmu_booke_zero_page", NULL, MTX_DEF); mtx_init(&copy_page_mutex, "mmu_booke_copy_page", NULL, MTX_DEF); /* Allocate KVA space for ptbl bufs. */ ptbl_buf_pool_vabase = virtual_avail; virtual_avail += PTBL_BUFS * PTBL_PAGES * PAGE_SIZE; debugf("ptbl_buf_pool_vabase = 0x%08x end = 0x%08x\n", ptbl_buf_pool_vabase, virtual_avail); /* Calculate corresponding physical addresses for the kernel region. */ phys_kernelend = kernload + kernsize; debugf("kernel image and allocated data:\n"); debugf(" kernload = 0x%09llx\n", (uint64_t)kernload); debugf(" kernstart = 0x%08x\n", kernstart); debugf(" kernsize = 0x%08x\n", kernsize); if (sizeof(phys_avail) / sizeof(phys_avail[0]) < availmem_regions_sz) panic("mmu_booke_bootstrap: phys_avail too small"); /* * Remove kernel physical address range from avail regions list. Page * align all regions. Non-page aligned memory isn't very interesting * to us. Also, sort the entries for ascending addresses. */ /* Retrieve phys/avail mem regions */ mem_regions(&physmem_regions, &physmem_regions_sz, &availmem_regions, &availmem_regions_sz); sz = 0; cnt = availmem_regions_sz; debugf("processing avail regions:\n"); for (mp = availmem_regions; mp->mr_size; mp++) { s = mp->mr_start; e = mp->mr_start + mp->mr_size; debugf(" %09jx-%09jx -> ", (uintmax_t)s, (uintmax_t)e); /* Check whether this region holds all of the kernel. */ if (s < kernload && e > phys_kernelend) { availmem_regions[cnt].mr_start = phys_kernelend; availmem_regions[cnt++].mr_size = e - phys_kernelend; e = kernload; } /* Look whether this region starts within the kernel. */ if (s >= kernload && s < phys_kernelend) { if (e <= phys_kernelend) goto empty; s = phys_kernelend; } /* Now look whether this region ends within the kernel. */ if (e > kernload && e <= phys_kernelend) { if (s >= kernload) goto empty; e = kernload; } /* Now page align the start and size of the region. */ s = round_page(s); e = trunc_page(e); if (e < s) e = s; sz = e - s; debugf("%09jx-%09jx = %jx\n", (uintmax_t)s, (uintmax_t)e, (uintmax_t)sz); /* Check whether some memory is left here. */ if (sz == 0) { empty: memmove(mp, mp + 1, (cnt - (mp - availmem_regions)) * sizeof(*mp)); cnt--; mp--; continue; } /* Do an insertion sort.
*/ for (mp1 = availmem_regions; mp1 < mp; mp1++) if (s < mp1->mr_start) break; if (mp1 < mp) { memmove(mp1 + 1, mp1, (char *)mp - (char *)mp1); mp1->mr_start = s; mp1->mr_size = sz; } else { mp->mr_start = s; mp->mr_size = sz; } } availmem_regions_sz = cnt; /*******************************************************/ /* Steal physical memory for kernel stack from the end */ /* of the first avail region */ /*******************************************************/ kstack0_sz = kstack_pages * PAGE_SIZE; kstack0_phys = availmem_regions[0].mr_start + availmem_regions[0].mr_size; kstack0_phys -= kstack0_sz; availmem_regions[0].mr_size -= kstack0_sz; /*******************************************************/ /* Fill in phys_avail table, based on availmem_regions */ /*******************************************************/ phys_avail_count = 0; physsz = 0; hwphyssz = 0; TUNABLE_ULONG_FETCH("hw.physmem", (u_long *) &hwphyssz); debugf("fill in phys_avail:\n"); for (i = 0, j = 0; i < availmem_regions_sz; i++, j += 2) { debugf(" region: 0x%jx - 0x%jx (0x%jx)\n", (uintmax_t)availmem_regions[i].mr_start, (uintmax_t)availmem_regions[i].mr_start + availmem_regions[i].mr_size, (uintmax_t)availmem_regions[i].mr_size); if (hwphyssz != 0 && (physsz + availmem_regions[i].mr_size) >= hwphyssz) { debugf(" hw.physmem adjust\n"); if (physsz < hwphyssz) { phys_avail[j] = availmem_regions[i].mr_start; phys_avail[j + 1] = availmem_regions[i].mr_start + hwphyssz - physsz; physsz = hwphyssz; phys_avail_count++; } break; } phys_avail[j] = availmem_regions[i].mr_start; phys_avail[j + 1] = availmem_regions[i].mr_start + availmem_regions[i].mr_size; phys_avail_count++; physsz += availmem_regions[i].mr_size; } physmem = btoc(physsz); /* Calculate the last available physical address. */ for (i = 0; phys_avail[i + 2] != 0; i += 2) ; Maxmem = powerpc_btop(phys_avail[i + 1]); debugf("Maxmem = 0x%08lx\n", Maxmem); debugf("phys_avail_count = %d\n", phys_avail_count); - debugf("physsz = 0x%08x physmem = %ld (0x%08lx)\n", physsz, physmem, - physmem); + debugf("physsz = 0x%09jx physmem = %jd (0x%09jx)\n", + (uintmax_t)physsz, (uintmax_t)physmem, (uintmax_t)physmem); /*******************************************************/ /* Initialize (statically allocated) kernel pmap. */ /*******************************************************/ PMAP_LOCK_INIT(kernel_pmap); kptbl_min = VM_MIN_KERNEL_ADDRESS / PDIR_SIZE; debugf("kernel_pmap = 0x%08x\n", (uint32_t)kernel_pmap); debugf("kptbl_min = %d, kernel_ptbls = %d\n", kptbl_min, kernel_ptbls); debugf("kernel pdir range: 0x%08x - 0x%08x\n", kptbl_min * PDIR_SIZE, (kptbl_min + kernel_ptbls) * PDIR_SIZE - 1); kernel_pte_alloc(data_end, kernstart, kernel_pdir); for (i = 0; i < MAXCPU; i++) { kernel_pmap->pm_tid[i] = TID_KERNEL; /* Initialize each CPU's tidbusy entry 0 with kernel_pmap */ tidbusy[i][TID_KERNEL] = kernel_pmap; } /* Mark kernel_pmap active on all CPUs */ CPU_FILL(&kernel_pmap->pm_active); /* * Initialize the global pv list lock. 
*/ rw_init(&pvh_global_lock, "pmap pv global"); /*******************************************************/ /* Final setup */ /*******************************************************/ /* Enter kstack0 into kernel map, provide guard page */ kstack0 = virtual_avail + KSTACK_GUARD_PAGES * PAGE_SIZE; thread0.td_kstack = kstack0; thread0.td_kstack_pages = kstack_pages; debugf("kstack_sz = 0x%08x\n", kstack0_sz); debugf("kstack0_phys at 0x%09llx - 0x%09llx\n", kstack0_phys, kstack0_phys + kstack0_sz); debugf("kstack0 at 0x%08x - 0x%08x\n", kstack0, kstack0 + kstack0_sz); virtual_avail += KSTACK_GUARD_PAGES * PAGE_SIZE + kstack0_sz; for (i = 0; i < kstack_pages; i++) { mmu_booke_kenter(mmu, kstack0, kstack0_phys); kstack0 += PAGE_SIZE; kstack0_phys += PAGE_SIZE; } pmap_bootstrapped = 1; debugf("virtual_avail = %08x\n", virtual_avail); debugf("virtual_end = %08x\n", virtual_end); debugf("mmu_booke_bootstrap: exit\n"); } #ifdef SMP void pmap_bootstrap_ap(volatile uint32_t *trcp __unused) { int i; /* * Finish TLB1 configuration: the BSP already set up its TLB1 and we * have the snapshot of its contents in the s/w tlb1[] table, so use * these values directly to (re)program AP's TLB1 hardware. */ for (i = bp_ntlb1s; i < tlb1_idx; i++) { /* Skip invalid entries */ if (!(tlb1[i].mas1 & MAS1_VALID)) continue; tlb1_write_entry(i); } set_mas4_defaults(); } #endif static void booke_pmap_init_qpages(void) { struct pcpu *pc; int i; CPU_FOREACH(i) { pc = pcpu_find(i); pc->pc_qmap_addr = kva_alloc(PAGE_SIZE); if (pc->pc_qmap_addr == 0) panic("pmap_init_qpages: unable to allocate KVA"); } } SYSINIT(qpages_init, SI_SUB_CPU, SI_ORDER_ANY, booke_pmap_init_qpages, NULL); /* * Get the physical page address for the given pmap/virtual address. */ static vm_paddr_t mmu_booke_extract(mmu_t mmu, pmap_t pmap, vm_offset_t va) { vm_paddr_t pa; PMAP_LOCK(pmap); pa = pte_vatopa(mmu, pmap, va); PMAP_UNLOCK(pmap); return (pa); } /* * Extract the physical page address associated with the given * kernel virtual address. */ static vm_paddr_t mmu_booke_kextract(mmu_t mmu, vm_offset_t va) { int i; /* Check TLB1 mappings */ for (i = 0; i < tlb1_idx; i++) { if (!(tlb1[i].mas1 & MAS1_VALID)) continue; if (va >= tlb1[i].virt && va < tlb1[i].virt + tlb1[i].size) return (tlb1[i].phys + (va - tlb1[i].virt)); } return (pte_vatopa(mmu, kernel_pmap, va)); } /* * Initialize the pmap module. * Called by vm_init, to initialize any structures that the pmap * system needs to map virtual memory. */ static void mmu_booke_init(mmu_t mmu) { int shpgperproc = PMAP_SHPGPERPROC; /* * Initialize the address space (zone) for the pv entries. Set a * high water mark so that the system can recover from excessive * numbers of pv entries. */ pvzone = uma_zcreate("PV ENTRY", sizeof(struct pv_entry), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); TUNABLE_INT_FETCH("vm.pmap.shpgperproc", &shpgperproc); pv_entry_max = shpgperproc * maxproc + vm_cnt.v_page_count; TUNABLE_INT_FETCH("vm.pmap.pv_entries", &pv_entry_max); pv_entry_high_water = 9 * (pv_entry_max / 10); uma_zone_reserve_kva(pvzone, pv_entry_max); /* Pre-fill pvzone with initial number of pv entries. */ uma_prealloc(pvzone, PV_ENTRY_ZONE_MIN); /* Initialize ptbl allocation. */ ptbl_init(); } /* * Map a list of wired pages into kernel virtual address space. This is * intended for temporary mappings which do not need page modification or * references recorded. Existing mappings in the region are overwritten. 
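 *
 * Callers reach this through the pmap_qenter(9) dispatch; a minimal,
 * hypothetical usage sketch (allocation and error handling elided):
 *
 *	vm_offset_t va = kva_alloc(2 * PAGE_SIZE);
 *	vm_page_t m[2];
 *	... obtain wired pages m[0] and m[1] ...
 *	pmap_qenter(va, m, 2);
 *	... temporary access through va ...
 *	pmap_qremove(va, 2);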
*/ static void mmu_booke_qenter(mmu_t mmu, vm_offset_t sva, vm_page_t *m, int count) { vm_offset_t va; va = sva; while (count-- > 0) { mmu_booke_kenter(mmu, va, VM_PAGE_TO_PHYS(*m)); va += PAGE_SIZE; m++; } } /* * Remove page mappings from kernel virtual address space. Intended for * temporary mappings entered by mmu_booke_qenter. */ static void mmu_booke_qremove(mmu_t mmu, vm_offset_t sva, int count) { vm_offset_t va; va = sva; while (count-- > 0) { mmu_booke_kremove(mmu, va); va += PAGE_SIZE; } } /* * Map a wired page into kernel virtual address space. */ static void mmu_booke_kenter(mmu_t mmu, vm_offset_t va, vm_paddr_t pa) { mmu_booke_kenter_attr(mmu, va, pa, VM_MEMATTR_DEFAULT); } static void mmu_booke_kenter_attr(mmu_t mmu, vm_offset_t va, vm_paddr_t pa, vm_memattr_t ma) { uint32_t flags; pte_t *pte; KASSERT(((va >= VM_MIN_KERNEL_ADDRESS) && (va <= VM_MAX_KERNEL_ADDRESS)), ("mmu_booke_kenter: invalid va")); flags = PTE_SR | PTE_SW | PTE_SX | PTE_WIRED | PTE_VALID; flags |= tlb_calc_wimg(pa, ma); pte = pte_find(mmu, kernel_pmap, va); mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); if (PTE_ISVALID(pte)) { CTR1(KTR_PMAP, "%s: replacing entry!", __func__); /* Flush entry from TLB0 */ tlb0_flush_entry(va); } pte->rpn = PTE_RPN_FROM_PA(pa); pte->flags = flags; //debugf("mmu_booke_kenter: pdir_idx = %d ptbl_idx = %d va=0x%08x " // "pa=0x%08x rpn=0x%08x flags=0x%08x\n", // pdir_idx, ptbl_idx, va, pa, pte->rpn, pte->flags); /* Flush the real memory from the instruction cache. */ if ((flags & (PTE_I | PTE_G)) == 0) { __syncicache((void *)va, PAGE_SIZE); } tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } /* * Remove a page from kernel page table. */ static void mmu_booke_kremove(mmu_t mmu, vm_offset_t va) { pte_t *pte; CTR2(KTR_PMAP,"%s: s (va = 0x%08x)\n", __func__, va); KASSERT(((va >= VM_MIN_KERNEL_ADDRESS) && (va <= VM_MAX_KERNEL_ADDRESS)), ("mmu_booke_kremove: invalid va")); pte = pte_find(mmu, kernel_pmap, va); if (!PTE_ISVALID(pte)) { CTR1(KTR_PMAP, "%s: invalid pte", __func__); return; } mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); /* Invalidate entry in TLB0, update PTE. */ tlb0_flush_entry(va); pte->flags = 0; pte->rpn = 0; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } /* * Initialize pmap associated with process 0. */ static void mmu_booke_pinit0(mmu_t mmu, pmap_t pmap) { PMAP_LOCK_INIT(pmap); mmu_booke_pinit(mmu, pmap); PCPU_SET(curpmap, pmap); } /* * Initialize a preallocated and zeroed pmap structure, * such as one in a vmspace structure. */ static void mmu_booke_pinit(mmu_t mmu, pmap_t pmap) { int i; CTR4(KTR_PMAP, "%s: pmap = %p, proc %d '%s'", __func__, pmap, curthread->td_proc->p_pid, curthread->td_proc->p_comm); KASSERT((pmap != kernel_pmap), ("pmap_pinit: initializing kernel_pmap")); for (i = 0; i < MAXCPU; i++) pmap->pm_tid[i] = TID_NONE; CPU_ZERO(&kernel_pmap->pm_active); bzero(&pmap->pm_stats, sizeof(pmap->pm_stats)); bzero(&pmap->pm_pdir, sizeof(pte_t *) * PDIR_NENTRIES); TAILQ_INIT(&pmap->pm_ptbl_list); } /* * Release any resources held by the given physical map. * Called when a pmap initialized by mmu_booke_pinit is being released. * Should only be called if the map contains no valid mappings. */ static void mmu_booke_release(mmu_t mmu, pmap_t pmap) { KASSERT(pmap->pm_stats.resident_count == 0, ("pmap_release: pmap resident count %ld != 0", pmap->pm_stats.resident_count)); } /* * Insert the given physical page at the specified virtual address in the * target physical map with the protection requested. If specified the page * will be wired down. 
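 */

/*
 * Sketch of the permission encoding mmu_booke_enter_locked() applies
 * below: Book-E PTEs carry separate supervisor (S) and user (U)
 * read/write/execute bits, so supervisor rights are always granted and
 * the user counterparts are added only for user pmaps. The X_ bit
 * values are illustrative, not the real PTE_* definitions.
 */
#define	X_SR		0x01
#define	X_SW		0x02
#define	X_SX		0x04
#define	X_UR		0x08
#define	X_UW		0x10
#define	X_UX		0x20
#define	X_PROT_WRITE	0x1
#define	X_PROT_EXEC	0x2

static unsigned int
x_pte_prot(int is_kernel_pmap, int prot)
{
	unsigned int flags;

	flags = X_SR;			/* kernel may always read */
	if (!is_kernel_pmap)
		flags |= X_UR;
	if (prot & X_PROT_WRITE) {
		flags |= X_SW;
		if (!is_kernel_pmap)
			flags |= X_UW;
	}
	if (prot & X_PROT_EXEC) {
		flags |= X_SX;
		if (!is_kernel_pmap)
			flags |= X_UX;
	}
	return (flags);
}

/*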
*/ static int mmu_booke_enter(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int flags, int8_t psind) { int error; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); error = mmu_booke_enter_locked(mmu, pmap, va, m, prot, flags, psind); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); return (error); } static int mmu_booke_enter_locked(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot, u_int pmap_flags, int8_t psind __unused) { pte_t *pte; vm_paddr_t pa; uint32_t flags; int error, su, sync; pa = VM_PAGE_TO_PHYS(m); su = (pmap == kernel_pmap); sync = 0; //debugf("mmu_booke_enter_locked: s (pmap=0x%08x su=%d tid=%d m=0x%08x va=0x%08x " // "pa=0x%08x prot=0x%08x flags=%#x)\n", // (u_int32_t)pmap, su, pmap->pm_tid, // (u_int32_t)m, va, pa, prot, flags); if (su) { KASSERT(((va >= virtual_avail) && (va <= VM_MAX_KERNEL_ADDRESS)), ("mmu_booke_enter_locked: kernel pmap, non kernel va")); } else { KASSERT((va <= VM_MAXUSER_ADDRESS), ("mmu_booke_enter_locked: user pmap, non user va")); } if ((m->oflags & VPO_UNMANAGED) == 0 && !vm_page_xbusied(m)) VM_OBJECT_ASSERT_LOCKED(m->object); PMAP_LOCK_ASSERT(pmap, MA_OWNED); /* * If there is an existing mapping, and the physical address has not * changed, must be protection or wiring change. */ if (((pte = pte_find(mmu, pmap, va)) != NULL) && (PTE_ISVALID(pte)) && (PTE_PA(pte) == pa)) { /* * Before actually updating pte->flags we calculate and * prepare its new value in a helper var. */ flags = pte->flags; flags &= ~(PTE_UW | PTE_UX | PTE_SW | PTE_SX | PTE_MODIFIED); /* Wiring change, just update stats. */ if ((pmap_flags & PMAP_ENTER_WIRED) != 0) { if (!PTE_ISWIRED(pte)) { flags |= PTE_WIRED; pmap->pm_stats.wired_count++; } } else { if (PTE_ISWIRED(pte)) { flags &= ~PTE_WIRED; pmap->pm_stats.wired_count--; } } if (prot & VM_PROT_WRITE) { /* Add write permissions. */ flags |= PTE_SW; if (!su) flags |= PTE_UW; if ((flags & PTE_MANAGED) != 0) vm_page_aflag_set(m, PGA_WRITEABLE); } else { /* Handle modified pages, sense modify status. */ /* * The PTE_MODIFIED flag could be set by underlying * TLB misses since we last read it (above), possibly * other CPUs could update it so we check in the PTE * directly rather than rely on that saved local flags * copy. */ if (PTE_ISMODIFIED(pte)) vm_page_dirty(m); } if (prot & VM_PROT_EXECUTE) { flags |= PTE_SX; if (!su) flags |= PTE_UX; /* * Check existing flags for execute permissions: if we * are turning execute permissions on, icache should * be flushed. */ if ((pte->flags & (PTE_UX | PTE_SX)) == 0) sync++; } flags &= ~PTE_REFERENCED; /* * The new flags value is all calculated -- only now actually * update the PTE. */ mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); tlb0_flush_entry(va); pte->flags = flags; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } else { /* * If there is an existing mapping, but it's for a different * physical address, pte_enter() will delete the old mapping. */ //if ((pte != NULL) && PTE_ISVALID(pte)) // debugf("mmu_booke_enter_locked: replace\n"); //else // debugf("mmu_booke_enter_locked: new\n"); /* Now set up the flags and install the new mapping. */ flags = (PTE_SR | PTE_VALID); flags |= PTE_M; if (!su) flags |= PTE_UR; if (prot & VM_PROT_WRITE) { flags |= PTE_SW; if (!su) flags |= PTE_UW; if ((m->oflags & VPO_UNMANAGED) == 0) vm_page_aflag_set(m, PGA_WRITEABLE); } if (prot & VM_PROT_EXECUTE) { flags |= PTE_SX; if (!su) flags |= PTE_UX; } /* If its wired update stats. 
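 */

/*
 * Note (added commentary): a wired mapping is pinned and counted in
 * pm_stats.wired_count, which mmu_booke_unwire() later asserts and
 * decrements. pte_enter() below can fail only in the
 * PMAP_ENTER_NOSLEEP case, which the caller sees as
 * KERN_RESOURCE_SHORTAGE.
 */

/*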
*/ if ((pmap_flags & PMAP_ENTER_WIRED) != 0) flags |= PTE_WIRED; error = pte_enter(mmu, pmap, m, va, flags, (pmap_flags & PMAP_ENTER_NOSLEEP) != 0); if (error != 0) return (KERN_RESOURCE_SHORTAGE); if ((flags & PMAP_ENTER_WIRED) != 0) pmap->pm_stats.wired_count++; /* Flush the real memory from the instruction cache. */ if (prot & VM_PROT_EXECUTE) sync++; } if (sync && (su || pmap == PCPU_GET(curpmap))) { __syncicache((void *)va, PAGE_SIZE); sync = 0; } return (KERN_SUCCESS); } /* * Maps a sequence of resident pages belonging to the same object. * The sequence begins with the given page m_start. This page is * mapped at the given virtual address start. Each subsequent page is * mapped at a virtual address that is offset from start by the same * amount as the page is offset from m_start within the object. The * last page in the sequence is the page with the largest offset from * m_start that can be mapped at a virtual address less than the given * virtual address end. Not every virtual page between start and end * is mapped; only those for which a resident page exists with the * corresponding offset from m_start are mapped. */ static void mmu_booke_enter_object(mmu_t mmu, pmap_t pmap, vm_offset_t start, vm_offset_t end, vm_page_t m_start, vm_prot_t prot) { vm_page_t m; vm_pindex_t diff, psize; VM_OBJECT_ASSERT_LOCKED(m_start->object); psize = atop(end - start); m = m_start; rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); while (m != NULL && (diff = m->pindex - m_start->pindex) < psize) { mmu_booke_enter_locked(mmu, pmap, start + ptoa(diff), m, prot & (VM_PROT_READ | VM_PROT_EXECUTE), PMAP_ENTER_NOSLEEP, 0); m = TAILQ_NEXT(m, listq); } rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } static void mmu_booke_enter_quick(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_page_t m, vm_prot_t prot) { rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); mmu_booke_enter_locked(mmu, pmap, va, m, prot & (VM_PROT_READ | VM_PROT_EXECUTE), PMAP_ENTER_NOSLEEP, 0); rw_wunlock(&pvh_global_lock); PMAP_UNLOCK(pmap); } /* * Remove the given range of addresses from the specified map. * * It is assumed that the start and end are properly rounded to the page size. */ static void mmu_booke_remove(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_offset_t endva) { pte_t *pte; uint8_t hold_flag; int su = (pmap == kernel_pmap); //debugf("mmu_booke_remove: s (su = %d pmap=0x%08x tid=%d va=0x%08x endva=0x%08x)\n", // su, (u_int32_t)pmap, pmap->pm_tid, va, endva); if (su) { KASSERT(((va >= virtual_avail) && (va <= VM_MAX_KERNEL_ADDRESS)), ("mmu_booke_remove: kernel pmap, non kernel va")); } else { KASSERT((va <= VM_MAXUSER_ADDRESS), ("mmu_booke_remove: user pmap, non user va")); } if (PMAP_REMOVE_DONE(pmap)) { //debugf("mmu_booke_remove: e (empty)\n"); return; } hold_flag = PTBL_HOLD_FLAG(pmap); //debugf("mmu_booke_remove: hold_flag = %d\n", hold_flag); rw_wlock(&pvh_global_lock); PMAP_LOCK(pmap); for (; va < endva; va += PAGE_SIZE) { pte = pte_find(mmu, pmap, va); if ((pte != NULL) && PTE_ISVALID(pte)) pte_remove(mmu, pmap, va, hold_flag); } PMAP_UNLOCK(pmap); rw_wunlock(&pvh_global_lock); //debugf("mmu_booke_remove: e\n"); } /* * Remove physical page from all pmaps in which it resides. 
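 */

/*
 * Sketch of the address arithmetic in mmu_booke_enter_object() above:
 * every resident page is mapped at the start address plus its page
 * distance from m_start, i.e. va = start + ptoa(pindex - first_pindex).
 * A 4 KB page (ptoa() modeled as a shift by 12) is assumed for this
 * example.
 */
static unsigned long
x_object_map_va(unsigned long start, unsigned long pindex,
    unsigned long first_pindex)
{
	return (start + ((pindex - first_pindex) << 12));
}

/*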
*/ static void mmu_booke_remove_all(mmu_t mmu, vm_page_t m) { pv_entry_t pv, pvn; uint8_t hold_flag; rw_wlock(&pvh_global_lock); for (pv = TAILQ_FIRST(&m->md.pv_list); pv != NULL; pv = pvn) { pvn = TAILQ_NEXT(pv, pv_link); PMAP_LOCK(pv->pv_pmap); hold_flag = PTBL_HOLD_FLAG(pv->pv_pmap); pte_remove(mmu, pv->pv_pmap, pv->pv_va, hold_flag); PMAP_UNLOCK(pv->pv_pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&pvh_global_lock); } /* * Map a range of physical addresses into kernel virtual address space. */ static vm_offset_t mmu_booke_map(mmu_t mmu, vm_offset_t *virt, vm_paddr_t pa_start, vm_paddr_t pa_end, int prot) { vm_offset_t sva = *virt; vm_offset_t va = sva; //debugf("mmu_booke_map: s (sva = 0x%08x pa_start = 0x%08x pa_end = 0x%08x)\n", // sva, pa_start, pa_end); while (pa_start < pa_end) { mmu_booke_kenter(mmu, va, pa_start); va += PAGE_SIZE; pa_start += PAGE_SIZE; } *virt = va; //debugf("mmu_booke_map: e (va = 0x%08x)\n", va); return (sva); } /* * The pmap must be activated before it's address space can be accessed in any * way. */ static void mmu_booke_activate(mmu_t mmu, struct thread *td) { pmap_t pmap; u_int cpuid; pmap = &td->td_proc->p_vmspace->vm_pmap; CTR5(KTR_PMAP, "%s: s (td = %p, proc = '%s', id = %d, pmap = 0x%08x)", __func__, td, td->td_proc->p_comm, td->td_proc->p_pid, pmap); KASSERT((pmap != kernel_pmap), ("mmu_booke_activate: kernel_pmap!")); sched_pin(); cpuid = PCPU_GET(cpuid); CPU_SET_ATOMIC(cpuid, &pmap->pm_active); PCPU_SET(curpmap, pmap); if (pmap->pm_tid[cpuid] == TID_NONE) tid_alloc(pmap); /* Load PID0 register with pmap tid value. */ mtspr(SPR_PID0, pmap->pm_tid[cpuid]); __asm __volatile("isync"); mtspr(SPR_DBCR0, td->td_pcb->pcb_cpu.booke.dbcr0); sched_unpin(); CTR3(KTR_PMAP, "%s: e (tid = %d for '%s')", __func__, pmap->pm_tid[PCPU_GET(cpuid)], td->td_proc->p_comm); } /* * Deactivate the specified process's address space. */ static void mmu_booke_deactivate(mmu_t mmu, struct thread *td) { pmap_t pmap; pmap = &td->td_proc->p_vmspace->vm_pmap; CTR5(KTR_PMAP, "%s: td=%p, proc = '%s', id = %d, pmap = 0x%08x", __func__, td, td->td_proc->p_comm, td->td_proc->p_pid, pmap); td->td_pcb->pcb_cpu.booke.dbcr0 = mfspr(SPR_DBCR0); CPU_CLR_ATOMIC(PCPU_GET(cpuid), &pmap->pm_active); PCPU_SET(curpmap, NULL); } /* * Copy the range specified by src_addr/len * from the source map to the range dst_addr/len * in the destination map. * * This routine is only advisory and need not do anything. */ static void mmu_booke_copy(mmu_t mmu, pmap_t dst_pmap, pmap_t src_pmap, vm_offset_t dst_addr, vm_size_t len, vm_offset_t src_addr) { } /* * Set the physical protection on the specified range of this map as requested. */ static void mmu_booke_protect(mmu_t mmu, pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) { vm_offset_t va; vm_page_t m; pte_t *pte; if ((prot & VM_PROT_READ) == VM_PROT_NONE) { mmu_booke_remove(mmu, pmap, sva, eva); return; } if (prot & VM_PROT_WRITE) return; PMAP_LOCK(pmap); for (va = sva; va < eva; va += PAGE_SIZE) { if ((pte = pte_find(mmu, pmap, va)) != NULL) { if (PTE_ISVALID(pte)) { m = PHYS_TO_VM_PAGE(PTE_PA(pte)); mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); /* Handle modified pages. */ if (PTE_ISMODIFIED(pte) && PTE_ISMANAGED(pte)) vm_page_dirty(m); tlb0_flush_entry(va); pte->flags &= ~(PTE_UW | PTE_SW | PTE_MODIFIED); tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } } } PMAP_UNLOCK(pmap); } /* * Clear the write and modified bits in each of the given page's mappings. 
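 */

/*
 * Sketch of the deletion-safe list walk used by mmu_booke_remove_all()
 * above: the successor is sampled before the current pv entry is torn
 * down (pte_remove() frees it), so the iteration never touches freed
 * memory. A minimal singly-linked list stands in for the TAILQ here.
 */
struct x_pv {
	struct x_pv	*next;
};

static void
x_remove_all(struct x_pv *head, void (*destroy)(struct x_pv *))
{
	struct x_pv *pv, *pvn;

	for (pv = head; pv != NULL; pv = pvn) {
		pvn = pv->next;		/* sample before destroying pv */
		destroy(pv);
	}
}

/*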
*/ static void mmu_booke_remove_write(mmu_t mmu, vm_page_t m) { pv_entry_t pv; pte_t *pte; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_remove_write: page %p is not managed", m)); /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * set by another thread while the object is locked. Thus, * if PGA_WRITEABLE is clear, no page table entries need updating. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL) { if (PTE_ISVALID(pte)) { m = PHYS_TO_VM_PAGE(PTE_PA(pte)); mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); /* Handle modified pages. */ if (PTE_ISMODIFIED(pte)) vm_page_dirty(m); /* Flush mapping from TLB0. */ pte->flags &= ~(PTE_UW | PTE_SW | PTE_MODIFIED); tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } } PMAP_UNLOCK(pv->pv_pmap); } vm_page_aflag_clear(m, PGA_WRITEABLE); rw_wunlock(&pvh_global_lock); } static void mmu_booke_sync_icache(mmu_t mmu, pmap_t pm, vm_offset_t va, vm_size_t sz) { pte_t *pte; pmap_t pmap; vm_page_t m; vm_offset_t addr; vm_paddr_t pa = 0; int active, valid; va = trunc_page(va); sz = round_page(sz); rw_wlock(&pvh_global_lock); pmap = PCPU_GET(curpmap); active = (pm == kernel_pmap || pm == pmap) ? 1 : 0; while (sz > 0) { PMAP_LOCK(pm); pte = pte_find(mmu, pm, va); valid = (pte != NULL && PTE_ISVALID(pte)) ? 1 : 0; if (valid) pa = PTE_PA(pte); PMAP_UNLOCK(pm); if (valid) { if (!active) { /* Create a mapping in the active pmap. */ addr = 0; m = PHYS_TO_VM_PAGE(pa); PMAP_LOCK(pmap); pte_enter(mmu, pmap, m, addr, PTE_SR | PTE_VALID | PTE_UR, FALSE); __syncicache((void *)addr, PAGE_SIZE); pte_remove(mmu, pmap, addr, PTBL_UNHOLD); PMAP_UNLOCK(pmap); } else __syncicache((void *)va, PAGE_SIZE); } va += PAGE_SIZE; sz -= PAGE_SIZE; } rw_wunlock(&pvh_global_lock); } /* * Atomically extract and hold the physical page with the given * pmap and virtual address pair if that mapping permits the given * protection. */ static vm_page_t mmu_booke_extract_and_hold(mmu_t mmu, pmap_t pmap, vm_offset_t va, vm_prot_t prot) { pte_t *pte; vm_page_t m; uint32_t pte_wbit; vm_paddr_t pa; m = NULL; pa = 0; PMAP_LOCK(pmap); retry: pte = pte_find(mmu, pmap, va); if ((pte != NULL) && PTE_ISVALID(pte)) { if (pmap == kernel_pmap) pte_wbit = PTE_SW; else pte_wbit = PTE_UW; if ((pte->flags & pte_wbit) || ((prot & VM_PROT_WRITE) == 0)) { if (vm_page_pa_tryrelock(pmap, PTE_PA(pte), &pa)) goto retry; m = PHYS_TO_VM_PAGE(PTE_PA(pte)); vm_page_hold(m); } } PA_UNLOCK_COND(pa); PMAP_UNLOCK(pmap); return (m); } /* * Initialize a vm_page's machine-dependent fields. */ static void mmu_booke_page_init(mmu_t mmu, vm_page_t m) { TAILQ_INIT(&m->md.pv_list); } /* * mmu_booke_zero_page_area zeros the specified hardware page by * mapping it into virtual memory and using bzero to clear * its contents. * * off and size must reside within a single page. */ static void mmu_booke_zero_page_area(mmu_t mmu, vm_page_t m, int off, int size) { vm_offset_t va; /* XXX KASSERT off and size are within a single page? */ mtx_lock(&zero_page_mutex); va = zero_page_va; mmu_booke_kenter(mmu, va, VM_PAGE_TO_PHYS(m)); bzero((caddr_t)va + off, size); mmu_booke_kremove(mmu, va); mtx_unlock(&zero_page_mutex); } /* * mmu_booke_zero_page zeros the specified hardware page. 
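 */

/*
 * Sketch of the rounding done at the top of mmu_booke_sync_icache()
 * above, with a 4 KB page size assumed: the start address is truncated
 * down and the length rounded up to whole pages, so the sync loop
 * always walks complete pages.
 */
#define	X_PAGE_SIZE	4096UL

static void
x_page_align(unsigned long *va, unsigned long *sz)
{

	*va &= ~(X_PAGE_SIZE - 1);			/* trunc_page() */
	*sz = (*sz + X_PAGE_SIZE - 1) & ~(X_PAGE_SIZE - 1); /* round_page() */
}

/*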
*/ static void mmu_booke_zero_page(mmu_t mmu, vm_page_t m) { vm_offset_t off, va; mtx_lock(&zero_page_mutex); va = zero_page_va; mmu_booke_kenter(mmu, va, VM_PAGE_TO_PHYS(m)); for (off = 0; off < PAGE_SIZE; off += cacheline_size) - __asm __volatile("dcbzl 0,%0" :: "r"(va + off)); + __asm __volatile("dcbz 0,%0" :: "r"(va + off)); mmu_booke_kremove(mmu, va); mtx_unlock(&zero_page_mutex); } /* * mmu_booke_copy_page copies the specified (machine independent) page by * mapping the page into virtual memory and using memcopy to copy the page, * one machine dependent page at a time. */ static void mmu_booke_copy_page(mmu_t mmu, vm_page_t sm, vm_page_t dm) { vm_offset_t sva, dva; sva = copy_page_src_va; dva = copy_page_dst_va; mtx_lock(©_page_mutex); mmu_booke_kenter(mmu, sva, VM_PAGE_TO_PHYS(sm)); mmu_booke_kenter(mmu, dva, VM_PAGE_TO_PHYS(dm)); memcpy((caddr_t)dva, (caddr_t)sva, PAGE_SIZE); mmu_booke_kremove(mmu, dva); mmu_booke_kremove(mmu, sva); mtx_unlock(©_page_mutex); } static inline void mmu_booke_copy_pages(mmu_t mmu, vm_page_t *ma, vm_offset_t a_offset, vm_page_t *mb, vm_offset_t b_offset, int xfersize) { void *a_cp, *b_cp; vm_offset_t a_pg_offset, b_pg_offset; int cnt; mtx_lock(©_page_mutex); while (xfersize > 0) { a_pg_offset = a_offset & PAGE_MASK; cnt = min(xfersize, PAGE_SIZE - a_pg_offset); mmu_booke_kenter(mmu, copy_page_src_va, VM_PAGE_TO_PHYS(ma[a_offset >> PAGE_SHIFT])); a_cp = (char *)copy_page_src_va + a_pg_offset; b_pg_offset = b_offset & PAGE_MASK; cnt = min(cnt, PAGE_SIZE - b_pg_offset); mmu_booke_kenter(mmu, copy_page_dst_va, VM_PAGE_TO_PHYS(mb[b_offset >> PAGE_SHIFT])); b_cp = (char *)copy_page_dst_va + b_pg_offset; bcopy(a_cp, b_cp, cnt); mmu_booke_kremove(mmu, copy_page_dst_va); mmu_booke_kremove(mmu, copy_page_src_va); a_offset += cnt; b_offset += cnt; xfersize -= cnt; } mtx_unlock(©_page_mutex); } /* * mmu_booke_zero_page_idle zeros the specified hardware page by mapping it * into virtual memory and using bzero to clear its contents. This is intended * to be called from the vm_pagezero process only and outside of Giant. No * lock is required. */ static void mmu_booke_zero_page_idle(mmu_t mmu, vm_page_t m) { vm_offset_t va; va = zero_page_idle_va; mmu_booke_kenter(mmu, va, VM_PAGE_TO_PHYS(m)); bzero((caddr_t)va, PAGE_SIZE); mmu_booke_kremove(mmu, va); } static vm_offset_t mmu_booke_quick_enter_page(mmu_t mmu, vm_page_t m) { vm_paddr_t paddr; vm_offset_t qaddr; uint32_t flags; pte_t *pte; paddr = VM_PAGE_TO_PHYS(m); flags = PTE_SR | PTE_SW | PTE_SX | PTE_WIRED | PTE_VALID; flags |= tlb_calc_wimg(paddr, pmap_page_get_memattr(m)); critical_enter(); qaddr = PCPU_GET(qmap_addr); pte = pte_find(mmu, kernel_pmap, qaddr); KASSERT(pte->flags == 0, ("mmu_booke_quick_enter_page: PTE busy")); /* * XXX: tlbivax is broadcast to other cores, but qaddr should * not be present in other TLBs. Is there a better instruction * sequence to use? Or just forget it & use mmu_booke_kenter()... */ __asm __volatile("tlbivax 0, %0" :: "r"(qaddr & MAS2_EPN_MASK)); __asm __volatile("isync; msync"); pte->rpn = paddr & ~PTE_PA_MASK; pte->flags = flags; /* Flush the real memory from the instruction cache. 
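 */

/*
 * Note (added commentary): as in mmu_booke_kenter_attr(), the icache
 * is flushed below only for cacheable mappings - pages marked
 * cache-inhibited (PTE_I) or guarded (PTE_G) are device-like and are
 * not expected to hold code. Cacheability also matters for the hunk
 * above: dcbz zeroes one data-cache block per iteration, which is why
 * mmu_booke_zero_page() steps through the page by cacheline_size
 * rather than storing zeros word by word.
 */

/*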
*/ if ((flags & (PTE_I | PTE_G)) == 0) __syncicache((void *)qaddr, PAGE_SIZE); return (qaddr); } static void mmu_booke_quick_remove_page(mmu_t mmu, vm_offset_t addr) { pte_t *pte; pte = pte_find(mmu, kernel_pmap, addr); KASSERT(PCPU_GET(qmap_addr) == addr, ("mmu_booke_quick_remove_page: invalid address")); KASSERT(pte->flags != 0, ("mmu_booke_quick_remove_page: PTE not in use")); pte->flags = 0; pte->rpn = 0; critical_exit(); } /* * Return whether or not the specified physical page was modified * in any of physical maps. */ static boolean_t mmu_booke_is_modified(mmu_t mmu, vm_page_t m) { pte_t *pte; pv_entry_t pv; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_is_modified: page %p is not managed", m)); rv = FALSE; /* * If the page is not exclusive busied, then PGA_WRITEABLE cannot be * concurrently set while the object is locked. Thus, if PGA_WRITEABLE * is clear, no PTEs can be modified. */ VM_OBJECT_ASSERT_WLOCKED(m->object); if (!vm_page_xbusied(m) && (m->aflags & PGA_WRITEABLE) == 0) return (rv); rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL && PTE_ISVALID(pte)) { if (PTE_ISMODIFIED(pte)) rv = TRUE; } PMAP_UNLOCK(pv->pv_pmap); if (rv) break; } rw_wunlock(&pvh_global_lock); return (rv); } /* * Return whether or not the specified virtual address is eligible * for prefault. */ static boolean_t mmu_booke_is_prefaultable(mmu_t mmu, pmap_t pmap, vm_offset_t addr) { return (FALSE); } /* * Return whether or not the specified physical page was referenced * in any physical maps. */ static boolean_t mmu_booke_is_referenced(mmu_t mmu, vm_page_t m) { pte_t *pte; pv_entry_t pv; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_is_referenced: page %p is not managed", m)); rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL && PTE_ISVALID(pte)) { if (PTE_ISREFERENCED(pte)) rv = TRUE; } PMAP_UNLOCK(pv->pv_pmap); if (rv) break; } rw_wunlock(&pvh_global_lock); return (rv); } /* * Clear the modify bits on the specified physical page. */ static void mmu_booke_clear_modify(mmu_t mmu, vm_page_t m) { pte_t *pte; pv_entry_t pv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_clear_modify: page %p is not managed", m)); VM_OBJECT_ASSERT_WLOCKED(m->object); KASSERT(!vm_page_xbusied(m), ("mmu_booke_clear_modify: page %p is exclusive busied", m)); /* * If the page is not PG_AWRITEABLE, then no PTEs can be modified. * If the object containing the page is locked and the page is not * exclusive busied, then PG_AWRITEABLE cannot be concurrently set. */ if ((m->aflags & PGA_WRITEABLE) == 0) return; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL && PTE_ISVALID(pte)) { mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); if (pte->flags & (PTE_SW | PTE_UW | PTE_MODIFIED)) { tlb0_flush_entry(pv->pv_va); pte->flags &= ~(PTE_SW | PTE_UW | PTE_MODIFIED | PTE_REFERENCED); } tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); } PMAP_UNLOCK(pv->pv_pmap); } rw_wunlock(&pvh_global_lock); } /* * Return a count of reference bits for a page, clearing those bits. * It is not necessary for every reference bit to be cleared, but it * is necessary that 0 only be returned when there are truly no * reference bits set. 
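 */

/*
 * Sketch of the early-exit pv walk shared by mmu_booke_is_modified()
 * and mmu_booke_is_referenced() above: each mapping's pmap is locked
 * only long enough to test one PTE bit, and the scan stops at the
 * first hit. The types and the test callback are illustrative.
 */
struct x_mapping {
	struct x_mapping	*next;
};

static int
x_any_mapping(struct x_mapping *head, int (*test)(struct x_mapping *))
{
	struct x_mapping *m;

	for (m = head; m != NULL; m = m->next) {
		/* The real code locks the owning pmap around the test. */
		if (test(m))
			return (1);	/* stop at the first match */
	}
	return (0);
}

/*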
* * XXX: The exact number of bits to check and clear is a matter that * should be tested and standardized at some point in the future for * optimal aging of shared pages. */ static int mmu_booke_ts_referenced(mmu_t mmu, vm_page_t m) { pte_t *pte; pv_entry_t pv; int count; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_ts_referenced: page %p is not managed", m)); count = 0; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL && PTE_ISVALID(pte)) { if (PTE_ISREFERENCED(pte)) { mtx_lock_spin(&tlbivax_mutex); tlb_miss_lock(); tlb0_flush_entry(pv->pv_va); pte->flags &= ~PTE_REFERENCED; tlb_miss_unlock(); mtx_unlock_spin(&tlbivax_mutex); if (++count > 4) { PMAP_UNLOCK(pv->pv_pmap); break; } } } PMAP_UNLOCK(pv->pv_pmap); } rw_wunlock(&pvh_global_lock); return (count); } /* * Clear the wired attribute from the mappings for the specified range of * addresses in the given pmap. Every valid mapping within that range must * have the wired attribute set. In contrast, invalid mappings cannot have * the wired attribute set, so they are ignored. * * The wired attribute of the page table entry is not a hardware feature, so * there is no need to invalidate any TLB entries. */ static void mmu_booke_unwire(mmu_t mmu, pmap_t pmap, vm_offset_t sva, vm_offset_t eva) { vm_offset_t va; pte_t *pte; PMAP_LOCK(pmap); for (va = sva; va < eva; va += PAGE_SIZE) { if ((pte = pte_find(mmu, pmap, va)) != NULL && PTE_ISVALID(pte)) { if (!PTE_ISWIRED(pte)) panic("mmu_booke_unwire: pte %p isn't wired", pte); pte->flags &= ~PTE_WIRED; pmap->pm_stats.wired_count--; } } PMAP_UNLOCK(pmap); } /* * Return true if the pmap's pv is one of the first 16 pvs linked to from this * page. This count may be changed upwards or downwards in the future; it is * only necessary that true be returned for a small subset of pmaps for proper * page aging. */ static boolean_t mmu_booke_page_exists_quick(mmu_t mmu, pmap_t pmap, vm_page_t m) { pv_entry_t pv; int loops; boolean_t rv; KASSERT((m->oflags & VPO_UNMANAGED) == 0, ("mmu_booke_page_exists_quick: page %p is not managed", m)); loops = 0; rv = FALSE; rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { if (pv->pv_pmap == pmap) { rv = TRUE; break; } if (++loops >= 16) break; } rw_wunlock(&pvh_global_lock); return (rv); } /* * Return the number of managed mappings to the given physical page that are * wired. */ static int mmu_booke_page_wired_mappings(mmu_t mmu, vm_page_t m) { pv_entry_t pv; pte_t *pte; int count = 0; if ((m->oflags & VPO_UNMANAGED) != 0) return (count); rw_wlock(&pvh_global_lock); TAILQ_FOREACH(pv, &m->md.pv_list, pv_link) { PMAP_LOCK(pv->pv_pmap); if ((pte = pte_find(mmu, pv->pv_pmap, pv->pv_va)) != NULL) if (PTE_ISVALID(pte) && PTE_ISWIRED(pte)) count++; PMAP_UNLOCK(pv->pv_pmap); } rw_wunlock(&pvh_global_lock); return (count); } static int mmu_booke_dev_direct_mapped(mmu_t mmu, vm_paddr_t pa, vm_size_t size) { int i; vm_offset_t va; /* * This currently does not work for entries that * overlap TLB1 entries. */ for (i = 0; i < tlb1_idx; i ++) { if (tlb1_iomapped(i, pa, size, &va) == 0) return (0); } return (EFAULT); } void mmu_booke_dumpsys_map(mmu_t mmu, vm_paddr_t pa, size_t sz, void **va) { vm_paddr_t ppa; vm_offset_t ofs; vm_size_t gran; /* Minidumps are based on virtual memory addresses. */ if (do_minidump) { *va = (void *)(vm_offset_t)pa; return; } /* Raw physical memory dumps don't have a virtual address. */ /* We always map a 256MB page at 256M. 
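 */

/*
 * Note (added commentary): with gran = 256 MB, ppa = pa & ~(gran - 1)
 * is pa rounded down to a granule boundary and ofs = pa - ppa the
 * offset within it; a second 256 MB TLB1 window is wired only when sz
 * spills past the first granule (sz > gran - ofs). For example
 * pa = 0x13000000 with sz = 0x10000000 gives ofs = 0x03000000 and
 * needs two windows.
 */

/*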
*/ gran = 256 * 1024 * 1024; ppa = pa & ~(gran - 1); ofs = pa - ppa; *va = (void *)gran; tlb1_set_entry((vm_offset_t)va, ppa, gran, _TLB_ENTRY_IO); if (sz > (gran - ofs)) tlb1_set_entry((vm_offset_t)(va + gran), ppa + gran, gran, _TLB_ENTRY_IO); } void mmu_booke_dumpsys_unmap(mmu_t mmu, vm_paddr_t pa, size_t sz, void *va) { vm_paddr_t ppa; vm_offset_t ofs; vm_size_t gran; /* Minidumps are based on virtual memory addresses. */ /* Nothing to do... */ if (do_minidump) return; /* Raw physical memory dumps don't have a virtual address. */ tlb1_idx--; tlb1[tlb1_idx].mas1 = 0; tlb1[tlb1_idx].mas2 = 0; tlb1[tlb1_idx].mas3 = 0; tlb1_write_entry(tlb1_idx); gran = 256 * 1024 * 1024; ppa = pa & ~(gran - 1); ofs = pa - ppa; if (sz > (gran - ofs)) { tlb1_idx--; tlb1[tlb1_idx].mas1 = 0; tlb1[tlb1_idx].mas2 = 0; tlb1[tlb1_idx].mas3 = 0; tlb1_write_entry(tlb1_idx); } } extern struct dump_pa dump_map[PHYS_AVAIL_SZ + 1]; void mmu_booke_scan_init(mmu_t mmu) { vm_offset_t va; pte_t *pte; int i; if (!do_minidump) { /* Initialize phys. segments for dumpsys(). */ memset(&dump_map, 0, sizeof(dump_map)); mem_regions(&physmem_regions, &physmem_regions_sz, &availmem_regions, &availmem_regions_sz); for (i = 0; i < physmem_regions_sz; i++) { dump_map[i].pa_start = physmem_regions[i].mr_start; dump_map[i].pa_size = physmem_regions[i].mr_size; } return; } /* Virtual segments for minidumps: */ memset(&dump_map, 0, sizeof(dump_map)); /* 1st: kernel .data and .bss. */ dump_map[0].pa_start = trunc_page((uintptr_t)_etext); dump_map[0].pa_size = round_page((uintptr_t)_end) - dump_map[0].pa_start; /* 2nd: msgbuf and tables (see pmap_bootstrap()). */ dump_map[1].pa_start = data_start; dump_map[1].pa_size = data_end - data_start; /* 3rd: kernel VM. */ va = dump_map[1].pa_start + dump_map[1].pa_size; /* Find start of next chunk (from va). */ while (va < virtual_end) { /* Don't dump the buffer cache. */ if (va >= kmi.buffer_sva && va < kmi.buffer_eva) { va = kmi.buffer_eva; continue; } pte = pte_find(mmu, kernel_pmap, va); if (pte != NULL && PTE_ISVALID(pte)) break; va += PAGE_SIZE; } if (va < virtual_end) { dump_map[2].pa_start = va; va += PAGE_SIZE; /* Find last page in chunk. */ while (va < virtual_end) { /* Don't run into the buffer cache. */ if (va == kmi.buffer_sva) break; pte = pte_find(mmu, kernel_pmap, va); if (pte == NULL || !PTE_ISVALID(pte)) break; va += PAGE_SIZE; } dump_map[2].pa_size = va - dump_map[2].pa_start; } } /* * Map a set of physical memory pages into the kernel virtual address space. * Return a pointer to where it is mapped. This routine is intended to be used * for mapping device memory, NOT real memory. */ static void * mmu_booke_mapdev(mmu_t mmu, vm_paddr_t pa, vm_size_t size) { return (mmu_booke_mapdev_attr(mmu, pa, size, VM_MEMATTR_DEFAULT)); } static void * mmu_booke_mapdev_attr(mmu_t mmu, vm_paddr_t pa, vm_size_t size, vm_memattr_t ma) { void *res; uintptr_t va; vm_size_t sz; int i; /* * Check if this is premapped in TLB1. Note: this should probably also * check whether a sequence of TLB1 entries exist that match the * requirement, but now only checks the easy case. */ if (ma == VM_MEMATTR_DEFAULT) { for (i = 0; i < tlb1_idx; i++) { if (!(tlb1[i].mas1 & MAS1_VALID)) continue; if (pa >= tlb1[i].phys && (pa + size) <= (tlb1[i].phys + tlb1[i].size)) return (void *)(tlb1[i].virt + (vm_offset_t)(pa - tlb1[i].phys)); } } size = roundup(size, PAGE_SIZE); /* * We leave a hole for device direct mapping between the maximum user * address (0x8000000) and the minimum KVA address (0xc0000000). 
If * devices are in there, just map them 1:1. If not, map them to the * device mapping area about VM_MAX_KERNEL_ADDRESS. These mapped * addresses should be pulled from an allocator, but since we do not * ever free TLB1 entries, it is safe just to increment a counter. * Note that there isn't a lot of address space here (128 MB) and it * is not at all difficult to imagine running out, since that is a 4:1 * compression from the 0xc0000000 - 0xf0000000 address space that gets * mapped there. */ if (pa >= (VM_MAXUSER_ADDRESS + PAGE_SIZE) && (pa + size - 1) < VM_MIN_KERNEL_ADDRESS) va = pa; else va = atomic_fetchadd_int(&tlb1_map_base, size); res = (void *)va; do { sz = 1 << (ilog2(size) & ~1); if (va % sz != 0) { do { sz >>= 2; } while (va % sz != 0); } if (bootverbose) printf("Wiring VA=%x to PA=%jx (size=%x), " "using TLB1[%d]\n", va, (uintmax_t)pa, sz, tlb1_idx); tlb1_set_entry(va, pa, sz, tlb_calc_wimg(pa, ma)); size -= sz; pa += sz; va += sz; } while (size > 0); return (res); } /* * 'Unmap' a range mapped by mmu_booke_mapdev(). */ static void mmu_booke_unmapdev(mmu_t mmu, vm_offset_t va, vm_size_t size) { #ifdef SUPPORTS_SHRINKING_TLB1 vm_offset_t base, offset; /* * Unmap only if this is inside kernel virtual space. */ if ((va >= VM_MIN_KERNEL_ADDRESS) && (va <= VM_MAX_KERNEL_ADDRESS)) { base = trunc_page(va); offset = va & PAGE_MASK; size = roundup(offset + size, PAGE_SIZE); kva_free(base, size); } #endif } /* * mmu_booke_object_init_pt preloads the ptes for a given object into the * specified pmap. This eliminates the blast of soft faults on process startup * and immediately after an mmap. */ static void mmu_booke_object_init_pt(mmu_t mmu, pmap_t pmap, vm_offset_t addr, vm_object_t object, vm_pindex_t pindex, vm_size_t size) { VM_OBJECT_ASSERT_WLOCKED(object); KASSERT(object->type == OBJT_DEVICE || object->type == OBJT_SG, ("mmu_booke_object_init_pt: non-device object")); } /* * Perform the pmap work for mincore. */ static int mmu_booke_mincore(mmu_t mmu, pmap_t pmap, vm_offset_t addr, vm_paddr_t *locked_pa) { /* XXX: this should be implemented at some point */ return (0); } /**************************************************************************/ /* TID handling */ /**************************************************************************/ /* * Allocate a TID. If necessary, steal one from someone else. * The new TID is flushed from the TLB before returning. */ static tlbtid_t tid_alloc(pmap_t pmap) { tlbtid_t tid; int thiscpu; KASSERT((pmap != kernel_pmap), ("tid_alloc: kernel pmap")); CTR2(KTR_PMAP, "%s: s (pmap = %p)", __func__, pmap); thiscpu = PCPU_GET(cpuid); tid = PCPU_GET(tid_next); if (tid > TID_MAX) tid = TID_MIN; PCPU_SET(tid_next, tid + 1); /* If we are stealing TID then clear the relevant pmap's field */ if (tidbusy[thiscpu][tid] != NULL) { CTR2(KTR_PMAP, "%s: warning: stealing tid %d", __func__, tid); tidbusy[thiscpu][tid]->pm_tid[thiscpu] = TID_NONE; /* Flush all entries from TLB0 matching this TID. 
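 */

/*
 * Note (added commentary): stealing is cheap because the previous
 * owner's pm_tid slot is simply reset to TID_NONE above - it will
 * allocate a fresh TID the next time it is activated. The tid_flush()
 * below is what makes the steal safe: any stale TLB0 entries tagged
 * with this TID must be gone before the TID is handed to the new pmap.
 */

/*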
*/ tid_flush(tid); } tidbusy[thiscpu][tid] = pmap; pmap->pm_tid[thiscpu] = tid; __asm __volatile("msync; isync"); CTR3(KTR_PMAP, "%s: e (%02d next = %02d)", __func__, tid, PCPU_GET(tid_next)); return (tid); } /**************************************************************************/ /* TLB0 handling */ /**************************************************************************/ static void tlb_print_entry(int i, uint32_t mas1, uint32_t mas2, uint32_t mas3, uint32_t mas7) { int as; char desc[3]; tlbtid_t tid; vm_size_t size; unsigned int tsize; desc[2] = '\0'; if (mas1 & MAS1_VALID) desc[0] = 'V'; else desc[0] = ' '; if (mas1 & MAS1_IPROT) desc[1] = 'P'; else desc[1] = ' '; as = (mas1 & MAS1_TS_MASK) ? 1 : 0; tid = MAS1_GETTID(mas1); tsize = (mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT; size = 0; if (tsize) size = tsize2size(tsize); debugf("%3d: (%s) [AS=%d] " "sz = 0x%08x tsz = %d tid = %d mas1 = 0x%08x " "mas2(va) = 0x%08x mas3(pa) = 0x%08x mas7 = 0x%08x\n", i, desc, as, size, tsize, tid, mas1, mas2, mas3, mas7); } /* Convert TLB0 va and way number to tlb0[] table index. */ static inline unsigned int tlb0_tableidx(vm_offset_t va, unsigned int way) { unsigned int idx; idx = (way * TLB0_ENTRIES_PER_WAY); idx += (va & MAS2_TLB0_ENTRY_IDX_MASK) >> MAS2_TLB0_ENTRY_IDX_SHIFT; return (idx); } /* * Invalidate TLB0 entry. */ static inline void tlb0_flush_entry(vm_offset_t va) { CTR2(KTR_PMAP, "%s: s va=0x%08x", __func__, va); mtx_assert(&tlbivax_mutex, MA_OWNED); __asm __volatile("tlbivax 0, %0" :: "r"(va & MAS2_EPN_MASK)); __asm __volatile("isync; msync"); __asm __volatile("tlbsync; msync"); CTR1(KTR_PMAP, "%s: e", __func__); } /* Print out contents of the MAS registers for each TLB0 entry */ void tlb0_print_tlbentries(void) { uint32_t mas0, mas1, mas2, mas3, mas7; int entryidx, way, idx; debugf("TLB0 entries:\n"); for (way = 0; way < TLB0_WAYS; way ++) for (entryidx = 0; entryidx < TLB0_ENTRIES_PER_WAY; entryidx++) { mas0 = MAS0_TLBSEL(0) | MAS0_ESEL(way); mtspr(SPR_MAS0, mas0); __asm __volatile("isync"); mas2 = entryidx << MAS2_TLB0_ENTRY_IDX_SHIFT; mtspr(SPR_MAS2, mas2); __asm __volatile("isync; tlbre"); mas1 = mfspr(SPR_MAS1); mas2 = mfspr(SPR_MAS2); mas3 = mfspr(SPR_MAS3); mas7 = mfspr(SPR_MAS7); idx = tlb0_tableidx(mas2, way); tlb_print_entry(idx, mas1, mas2, mas3, mas7); } } /**************************************************************************/ /* TLB1 handling */ /**************************************************************************/ /* * TLB1 mapping notes: * * TLB1[0] Kernel text and data. * TLB1[1-15] Additional kernel text and data mappings (if required), PCI * windows, other devices mappings. */ /* * Write given entry to TLB1 hardware. * Use 32 bit pa, clear 4 high-order bits of RPN (mas7). 
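 */

/*
 * Worked sketch of the set-associative indexing in tlb0_tableidx()
 * above: the shadow tlb0[] array is grouped by way, and within a way
 * the set is selected by effective-address bits. The geometry below
 * (4 KB pages, 128 entries per way) is illustrative; with it, va
 * 0x2000 in way 1 lands at 1 * 128 + 2 = 130.
 */
#define	X_TLB0_ENTRIES_PER_WAY	128
#define	X_ENTRY_IDX_SHIFT	12
#define	X_ENTRY_IDX_MASK	\
	((X_TLB0_ENTRIES_PER_WAY - 1) << X_ENTRY_IDX_SHIFT)

static unsigned int
x_tlb0_tableidx(unsigned long va, unsigned int way)
{

	return (way * X_TLB0_ENTRIES_PER_WAY +
	    ((va & X_ENTRY_IDX_MASK) >> X_ENTRY_IDX_SHIFT));
}

/*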
*/ static void tlb1_write_entry(unsigned int idx) { uint32_t mas0; //debugf("tlb1_write_entry: s\n"); /* Select entry */ mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(idx); //debugf("tlb1_write_entry: mas0 = 0x%08x\n", mas0); mtspr(SPR_MAS0, mas0); __asm __volatile("isync"); mtspr(SPR_MAS1, tlb1[idx].mas1); __asm __volatile("isync"); mtspr(SPR_MAS2, tlb1[idx].mas2); __asm __volatile("isync"); mtspr(SPR_MAS3, tlb1[idx].mas3); __asm __volatile("isync"); switch ((mfpvr() >> 16) & 0xFFFF) { case FSL_E500mc: case FSL_E5500: mtspr(SPR_MAS8, 0); __asm __volatile("isync"); /* FALLTHROUGH */ case FSL_E500v2: mtspr(SPR_MAS7, tlb1[idx].mas7); __asm __volatile("isync"); break; default: break; } __asm __volatile("tlbwe; isync; msync"); //debugf("tlb1_write_entry: e\n"); } /* * Return the largest uint value log such that 2^log <= num. */ static unsigned int ilog2(unsigned int num) { int lz; __asm ("cntlzw %0, %1" : "=r" (lz) : "r" (num)); return (31 - lz); } /* * Convert TLB TSIZE value to mapped region size. */ static vm_size_t tsize2size(unsigned int tsize) { /* * size = 4^tsize KB * size = 4^tsize * 2^10 = 2^(2 * tsize - 10) */ return ((1 << (2 * tsize)) * 1024); } /* * Convert region size (must be power of 4) to TLB TSIZE value. */ static unsigned int size2tsize(vm_size_t size) { return (ilog2(size) / 2 - 5); } /* * Register permanent kernel mapping in TLB1. * * Entries are created starting from index 0 (current free entry is * kept in tlb1_idx) and are not supposed to be invalidated. */ static int tlb1_set_entry(vm_offset_t va, vm_paddr_t pa, vm_size_t size, uint32_t flags) { uint32_t ts, tid; int tsize, index; index = atomic_fetchadd_int(&tlb1_idx, 1); if (index >= TLB1_ENTRIES) { printf("tlb1_set_entry: TLB1 full!\n"); return (-1); } /* Convert size to TSIZE */ tsize = size2tsize(size); tid = (TID_KERNEL << MAS1_TID_SHIFT) & MAS1_TID_MASK; /* XXX TS is hard coded to 0 for now as we only use single address space */ ts = (0 << MAS1_TS_SHIFT) & MAS1_TS_MASK; /* * Atomicity is preserved by the atomic increment above since nothing * is ever removed from tlb1. */ tlb1[index].phys = pa; tlb1[index].virt = va; tlb1[index].size = size; tlb1[index].mas1 = MAS1_VALID | MAS1_IPROT | ts | tid; tlb1[index].mas1 |= ((tsize << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK); tlb1[index].mas2 = (va & MAS2_EPN_MASK) | flags; /* Set supervisor RWX permission bits */ tlb1[index].mas3 = (pa & MAS3_RPN) | MAS3_SR | MAS3_SW | MAS3_SX; tlb1[index].mas7 = (pa >> 32) & MAS7_RPN; tlb1_write_entry(index); /* * XXX in general TLB1 updates should be propagated between CPUs, * since current design assumes to have the same TLB1 set-up on all * cores. */ return (0); } /* * Map in contiguous RAM region into the TLB1 using maximum of * KERNEL_REGION_MAX_TLB_ENTRIES entries. * * If necessary round up last entry size and return total size * used by all allocated entries. */ vm_size_t tlb1_mapin_region(vm_offset_t va, vm_paddr_t pa, vm_size_t size) { vm_size_t pgs[KERNEL_REGION_MAX_TLB_ENTRIES]; vm_size_t mapped, pgsz, base, mask; int idx, nents; /* Round up to the next 1M */ size = (size + (1 << 20) - 1) & ~((1 << 20) - 1); mapped = 0; idx = 0; base = va; pgsz = 64*1024*1024; while (mapped < size) { while (mapped < size && idx < KERNEL_REGION_MAX_TLB_ENTRIES) { while (pgsz > (size - mapped)) pgsz >>= 2; pgs[idx++] = pgsz; mapped += pgsz; } /* We under-map. Correct for this. */ if (mapped < size) { while (pgs[idx - 1] == pgsz) { idx--; mapped -= pgsz; } /* XXX We may increase beyond out starting point. 
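 */

/*
 * Note (added commentary): when the region cannot be covered within
 * the entry budget, all trailing entries of the current page size are
 * popped and replaced by one entry of the next size up (pgsz <<= 2,
 * since TLB1 sizes advance in powers of four: 1M, 4M, 16M, ...),
 * trading possible over-mapping for a smaller entry count.
 */

/*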
*/ pgsz <<= 2; pgs[idx++] = pgsz; mapped += pgsz; } } nents = idx; mask = pgs[0] - 1; /* Align address to the boundary */ if (va & mask) { va = (va + mask) & ~mask; pa = (pa + mask) & ~mask; } for (idx = 0; idx < nents; idx++) { pgsz = pgs[idx]; debugf("%u: %llx -> %x, size=%x\n", idx, pa, va, pgsz); tlb1_set_entry(va, pa, pgsz, _TLB_ENTRY_MEM); pa += pgsz; va += pgsz; } mapped = (va - base); #ifdef __powerpc64__ printf("mapped size 0x%016lx (wasted space 0x%16lx)\n", #else printf("mapped size 0x%08x (wasted space 0x%08x)\n", #endif mapped, mapped - size); return (mapped); } /* * TLB1 initialization routine, to be called after the very first * assembler level setup done in locore.S. */ void tlb1_init() { uint32_t mas0, mas1, mas2, mas3, mas7; uint32_t tsz; int i; tlb1_idx = 1; tlb1_get_tlbconf(); mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(0); mtspr(SPR_MAS0, mas0); __asm __volatile("isync; tlbre"); mas1 = mfspr(SPR_MAS1); mas2 = mfspr(SPR_MAS2); mas3 = mfspr(SPR_MAS3); mas7 = mfspr(SPR_MAS7); tlb1[0].mas1 = mas1; tlb1[0].mas2 = mfspr(SPR_MAS2); tlb1[0].mas3 = mas3; tlb1[0].mas7 = mas7; tlb1[0].virt = mas2 & MAS2_EPN_MASK; tlb1[0].phys = ((vm_paddr_t)(mas7 & MAS7_RPN) << 32) | (mas3 & MAS3_RPN); kernload = tlb1[0].phys; tsz = (mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT; tlb1[0].size = (tsz > 0) ? tsize2size(tsz) : 0; kernsize += tlb1[0].size; #ifdef SMP bp_ntlb1s = tlb1_idx; #endif /* Purge the remaining entries */ for (i = tlb1_idx; i < TLB1_ENTRIES; i++) tlb1_write_entry(i); /* Setup TLB miss defaults */ set_mas4_defaults(); } vm_offset_t pmap_early_io_map(vm_paddr_t pa, vm_size_t size) { vm_paddr_t pa_base; vm_offset_t va, sz; int i; KASSERT(!pmap_bootstrapped, ("Do not use after PMAP is up!")); for (i = 0; i < tlb1_idx; i++) { if (!(tlb1[i].mas1 & MAS1_VALID)) continue; if (pa >= tlb1[i].phys && (pa + size) <= (tlb1[i].phys + tlb1[i].size)) return (tlb1[i].virt + (pa - tlb1[i].phys)); } pa_base = rounddown(pa, PAGE_SIZE); size = roundup(size + (pa - pa_base), PAGE_SIZE); tlb1_map_base = roundup2(tlb1_map_base, 1 << (ilog2(size) & ~1)); va = tlb1_map_base + (pa - pa_base); do { sz = 1 << (ilog2(size) & ~1); tlb1_set_entry(tlb1_map_base, pa_base, sz, _TLB_ENTRY_IO); size -= sz; pa_base += sz; tlb1_map_base += sz; } while (size > 0); #ifdef SMP bp_ntlb1s = tlb1_idx; #endif return (va); } /* * Setup MAS4 defaults. * These values are loaded to MAS0-2 on a TLB miss. */ static void set_mas4_defaults(void) { uint32_t mas4; /* Defaults: TLB0, PID0, TSIZED=4K */ mas4 = MAS4_TLBSELD0; mas4 |= (TLB_SIZE_4K << MAS4_TSIZED_SHIFT) & MAS4_TSIZED_MASK; #ifdef SMP mas4 |= MAS4_MD; #endif mtspr(SPR_MAS4, mas4); __asm __volatile("isync"); } /* * Print out contents of the MAS registers for each TLB1 entry */ void tlb1_print_tlbentries(void) { uint32_t mas0, mas1, mas2, mas3, mas7; int i; debugf("TLB1 entries:\n"); for (i = 0; i < TLB1_ENTRIES; i++) { mas0 = MAS0_TLBSEL(1) | MAS0_ESEL(i); mtspr(SPR_MAS0, mas0); __asm __volatile("isync; tlbre"); mas1 = mfspr(SPR_MAS1); mas2 = mfspr(SPR_MAS2); mas3 = mfspr(SPR_MAS3); mas7 = mfspr(SPR_MAS7); tlb_print_entry(i, mas1, mas2, mas3, mas7); } } /* * Print out contents of the in-ram tlb1 table. */ void tlb1_print_entries(void) { int i; debugf("tlb1[] table entries:\n"); for (i = 0; i < TLB1_ENTRIES; i++) tlb_print_entry(i, tlb1[i].mas1, tlb1[i].mas2, tlb1[i].mas3, tlb1[i].mas7); } /* * Return 0 if the physical IO range is encompassed by one of the * the TLB1 entries, otherwise return related error code. 
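 */

/*
 * Worked sketch of the TSIZE encoding behind tsize2size() and
 * size2tsize() earlier in this file: an entry spans 4^tsize KB, i.e.
 * 2^(2 * tsize + 10) bytes, so the inverse is log2(size) / 2 - 5.
 * For a 16 MB (2^24 byte) mapping, tsize = 24 / 2 - 5 = 7, and
 * 4^7 KB = 16384 KB = 16 MB round-trips correctly.
 */
static unsigned long
x_tsize2size(unsigned int tsize)
{

	return ((1UL << (2 * tsize)) * 1024);	/* 4^tsize KB */
}

static unsigned int
x_size2tsize(unsigned long size)	/* size: a power-of-4 count of KB */
{
	unsigned int log2;

	for (log2 = 0; (1UL << log2) < size; log2++)
		continue;
	return (log2 / 2 - 5);
}

/*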
*/ static int tlb1_iomapped(int i, vm_paddr_t pa, vm_size_t size, vm_offset_t *va) { uint32_t prot; vm_paddr_t pa_start; vm_paddr_t pa_end; unsigned int entry_tsize; vm_size_t entry_size; *va = (vm_offset_t)NULL; /* Skip invalid entries */ if (!(tlb1[i].mas1 & MAS1_VALID)) return (EINVAL); /* * The entry must be cache-inhibited, guarded, and r/w * so it can function as an i/o page */ prot = tlb1[i].mas2 & (MAS2_I | MAS2_G); if (prot != (MAS2_I | MAS2_G)) return (EPERM); prot = tlb1[i].mas3 & (MAS3_SR | MAS3_SW); if (prot != (MAS3_SR | MAS3_SW)) return (EPERM); /* The address should be within the entry range. */ entry_tsize = (tlb1[i].mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT; KASSERT((entry_tsize), ("tlb1_iomapped: invalid entry tsize")); entry_size = tsize2size(entry_tsize); pa_start = (((vm_paddr_t)tlb1[i].mas7 & MAS7_RPN) << 32) | (tlb1[i].mas3 & MAS3_RPN); pa_end = pa_start + entry_size; if ((pa < pa_start) || ((pa + size) > pa_end)) return (ERANGE); /* Return virtual address of this mapping. */ *va = (tlb1[i].mas2 & MAS2_EPN_MASK) + (pa - pa_start); return (0); } /* * Invalidate all TLB0 entries which match the given TID. Note this is * dedicated for cases when invalidations should NOT be propagated to other * CPUs. */ static void tid_flush(tlbtid_t tid) { register_t msr; uint32_t mas0, mas1, mas2; int entry, way; /* Don't evict kernel translations */ if (tid == TID_KERNEL) return; msr = mfmsr(); __asm __volatile("wrteei 0"); for (way = 0; way < TLB0_WAYS; way++) for (entry = 0; entry < TLB0_ENTRIES_PER_WAY; entry++) { mas0 = MAS0_TLBSEL(0) | MAS0_ESEL(way); mtspr(SPR_MAS0, mas0); __asm __volatile("isync"); mas2 = entry << MAS2_TLB0_ENTRY_IDX_SHIFT; mtspr(SPR_MAS2, mas2); __asm __volatile("isync; tlbre"); mas1 = mfspr(SPR_MAS1); if (!(mas1 & MAS1_VALID)) continue; if (((mas1 & MAS1_TID_MASK) >> MAS1_TID_SHIFT) != tid) continue; mas1 &= ~MAS1_VALID; mtspr(SPR_MAS1, mas1); __asm __volatile("isync; tlbwe; isync; msync"); } mtmsr(msr); } Index: projects/clang380-import/sys/sys/ttydevsw.h =================================================================== --- projects/clang380-import/sys/sys/ttydevsw.h (revision 294776) +++ projects/clang380-import/sys/sys/ttydevsw.h (revision 294777) @@ -1,197 +1,209 @@ /*- * Copyright (c) 2008 Ed Schouten * All rights reserved. * * Portions of this software were developed under sponsorship from Snow * B.V., the Netherlands. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _SYS_TTYDEVSW_H_ #define _SYS_TTYDEVSW_H_ #ifndef _SYS_TTY_H_ #error "can only be included through " #endif /* !_SYS_TTY_H_ */ /* * Driver routines that are called from the line discipline to adjust * hardware parameters and such. */ typedef int tsw_open_t(struct tty *tp); typedef void tsw_close_t(struct tty *tp); typedef void tsw_outwakeup_t(struct tty *tp); typedef void tsw_inwakeup_t(struct tty *tp); typedef int tsw_ioctl_t(struct tty *tp, u_long cmd, caddr_t data, struct thread *td); typedef int tsw_cioctl_t(struct tty *tp, int unit, u_long cmd, caddr_t data, struct thread *td); typedef int tsw_param_t(struct tty *tp, struct termios *t); typedef int tsw_modem_t(struct tty *tp, int sigon, int sigoff); typedef int tsw_mmap_t(struct tty *tp, vm_ooffset_t offset, vm_paddr_t * paddr, int nprot, vm_memattr_t *memattr); typedef void tsw_pktnotify_t(struct tty *tp, char event); typedef void tsw_free_t(void *softc); typedef bool tsw_busy_t(struct tty *tp); struct ttydevsw { unsigned int tsw_flags; /* Default TTY flags. */ tsw_open_t *tsw_open; /* Device opening. */ tsw_close_t *tsw_close; /* Device closure. */ tsw_outwakeup_t *tsw_outwakeup; /* Output available. */ tsw_inwakeup_t *tsw_inwakeup; /* Input can be stored again. */ tsw_ioctl_t *tsw_ioctl; /* ioctl() hooks. */ tsw_cioctl_t *tsw_cioctl; /* ioctl() on control devices. */ tsw_param_t *tsw_param; /* TIOCSETA device parameter setting. */ tsw_modem_t *tsw_modem; /* Modem sigon/sigoff. */ tsw_mmap_t *tsw_mmap; /* mmap() hooks. */ tsw_pktnotify_t *tsw_pktnotify; /* TIOCPKT events. */ tsw_free_t *tsw_free; /* Destructor. */ tsw_busy_t *tsw_busy; /* Draining output. */ void *tsw_spare[3]; /* For future use. */ }; static __inline int ttydevsw_open(struct tty *tp) { + tty_lock_assert(tp, MA_OWNED); MPASS(!tty_gone(tp)); - return tp->t_devsw->tsw_open(tp); + return (tp->t_devsw->tsw_open(tp)); } static __inline void ttydevsw_close(struct tty *tp) { + tty_lock_assert(tp, MA_OWNED); MPASS(!tty_gone(tp)); tp->t_devsw->tsw_close(tp); } static __inline void ttydevsw_outwakeup(struct tty *tp) { + tty_lock_assert(tp, MA_OWNED); MPASS(!tty_gone(tp)); /* Prevent spurious wakeups. */ if (ttydisc_getc_poll(tp) == 0) return; tp->t_devsw->tsw_outwakeup(tp); } static __inline void ttydevsw_inwakeup(struct tty *tp) { + tty_lock_assert(tp, MA_OWNED); MPASS(!tty_gone(tp)); /* Prevent spurious wakeups. 
*/ if (tp->t_flags & TF_HIWAT_IN) return; tp->t_devsw->tsw_inwakeup(tp); } static __inline int ttydevsw_ioctl(struct tty *tp, u_long cmd, caddr_t data, struct thread *td) { + tty_lock_assert(tp, MA_OWNED); MPASS(!tty_gone(tp)); - return tp->t_devsw->tsw_ioctl(tp, cmd, data, td); + return (tp->t_devsw->tsw_ioctl(tp, cmd, data, td)); } static __inline int -ttydevsw_cioctl(struct tty *tp, int unit, u_long cmd, caddr_t data, struct thread *td) +ttydevsw_cioctl(struct tty *tp, int unit, u_long cmd, caddr_t data, + struct thread *td) { + tty_lock_assert(tp, MA_OWNED); MPASS(!tty_gone(tp)); - return tp->t_devsw->tsw_cioctl(tp, unit, cmd, data, td); + return (tp->t_devsw->tsw_cioctl(tp, unit, cmd, data, td)); } static __inline int ttydevsw_param(struct tty *tp, struct termios *t) { + MPASS(!tty_gone(tp)); - return tp->t_devsw->tsw_param(tp, t); + return (tp->t_devsw->tsw_param(tp, t)); } static __inline int ttydevsw_modem(struct tty *tp, int sigon, int sigoff) { + MPASS(!tty_gone(tp)); - return tp->t_devsw->tsw_modem(tp, sigon, sigoff); + return (tp->t_devsw->tsw_modem(tp, sigon, sigoff)); } static __inline int ttydevsw_mmap(struct tty *tp, vm_ooffset_t offset, vm_paddr_t *paddr, int nprot, vm_memattr_t *memattr) { + MPASS(!tty_gone(tp)); - return tp->t_devsw->tsw_mmap(tp, offset, paddr, nprot, memattr); + return (tp->t_devsw->tsw_mmap(tp, offset, paddr, nprot, memattr)); } static __inline void ttydevsw_pktnotify(struct tty *tp, char event) { + tty_lock_assert(tp, MA_OWNED); MPASS(!tty_gone(tp)); tp->t_devsw->tsw_pktnotify(tp, event); } static __inline void ttydevsw_free(struct tty *tp) { + MPASS(tty_gone(tp)); tp->t_devsw->tsw_free(tty_softc(tp)); } static __inline bool ttydevsw_busy(struct tty *tp) { tty_lock_assert(tp, MA_OWNED); MPASS(!tty_gone(tp)); return (tp->t_devsw->tsw_busy(tp)); } #endif /* !_SYS_TTYDEVSW_H_ */ Index: projects/clang380-import/sys/vm/vm_map.c =================================================================== --- projects/clang380-import/sys/vm/vm_map.c (revision 294776) +++ projects/clang380-import/sys/vm/vm_map.c (revision 294777) @@ -1,4326 +1,4326 @@ /*- * Copyright (c) 1991, 1993 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * The Mach Operating System project at Carnegie-Mellon University. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: @(#)vm_map.c 8.3 (Berkeley) 1/12/94 * * * Copyright (c) 1987, 1990 Carnegie-Mellon University. * All rights reserved. * * Authors: Avadis Tevanian, Jr., Michael Wayne Young * * Permission to use, copy, modify and distribute this software and * its documentation is hereby granted, provided that both the copyright * notice and this permission notice appear in all copies of the * software, derivative works or modified versions, and any portions * thereof, and that both notices appear in supporting documentation. * * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" * CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND * FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE. * * Carnegie Mellon requests users of this software to return to * * Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU * School of Computer Science * Carnegie Mellon University * Pittsburgh PA 15213-3890 * * any improvements or extensions that they make and grant Carnegie the * rights to redistribute these changes. */ /* * Virtual memory mapping module. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* * Virtual memory maps provide for the mapping, protection, * and sharing of virtual memory objects. In addition, * this module provides for an efficient virtual copy of * memory from one map to another. * * Synchronization is required prior to most operations. * * Maps consist of an ordered doubly-linked list of simple * entries; a self-adjusting binary search tree of these * entries is used to speed up lookups. * * Since portions of maps are specified by start/end addresses, * which may not align with existing map entries, all * routines merely "clip" entries to these start/end values. * [That is, an entry is split into two, bordering at a * start or end value.] Note that these clippings may not * always be necessary (as the two resulting entries are then * not changed); however, the clipping is done for convenience. * * As mentioned above, virtual copy operations are performed * by copying VM object references from one map to * another, and then marking both regions as copy-on-write. 
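 */

/*
 * Illustrative sketch (simplified types, not the kernel's) of the
 * "clipping" described above: an entry spanning [start, end) is split
 * at addr into two adjacent entries so later operations can act on an
 * exact address range; the kernel's clip routines additionally handle
 * object references, accounting and the search-tree linkage.
 */
struct x_entry {
	unsigned long	start;
	unsigned long	end;
	struct x_entry	*next;
};

static void
x_clip(struct x_entry *e, unsigned long addr, struct x_entry *newe)
{

	/* Caller guarantees e->start < addr < e->end. */
	newe->start = addr;
	newe->end = e->end;
	newe->next = e->next;
	e->end = addr;
	e->next = newe;
}

/*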
*/ static struct mtx map_sleep_mtx; static uma_zone_t mapentzone; static uma_zone_t kmapentzone; static uma_zone_t mapzone; static uma_zone_t vmspace_zone; static int vmspace_zinit(void *mem, int size, int flags); static int vm_map_zinit(void *mem, int ize, int flags); static void _vm_map_init(vm_map_t map, pmap_t pmap, vm_offset_t min, vm_offset_t max); static void vm_map_entry_deallocate(vm_map_entry_t entry, boolean_t system_map); static void vm_map_entry_dispose(vm_map_t map, vm_map_entry_t entry); static void vm_map_entry_unwire(vm_map_t map, vm_map_entry_t entry); static void vm_map_pmap_enter(vm_map_t map, vm_offset_t addr, vm_prot_t prot, vm_object_t object, vm_pindex_t pindex, vm_size_t size, int flags); #ifdef INVARIANTS static void vm_map_zdtor(void *mem, int size, void *arg); static void vmspace_zdtor(void *mem, int size, void *arg); #endif static int vm_map_stack_locked(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize, vm_size_t growsize, vm_prot_t prot, vm_prot_t max, int cow); static void vm_map_wire_entry_failure(vm_map_t map, vm_map_entry_t entry, vm_offset_t failed_addr); #define ENTRY_CHARGED(e) ((e)->cred != NULL || \ ((e)->object.vm_object != NULL && (e)->object.vm_object->cred != NULL && \ !((e)->eflags & MAP_ENTRY_NEEDS_COPY))) /* * PROC_VMSPACE_{UN,}LOCK() can be a noop as long as vmspaces are type * stable. */ #define PROC_VMSPACE_LOCK(p) do { } while (0) #define PROC_VMSPACE_UNLOCK(p) do { } while (0) /* * VM_MAP_RANGE_CHECK: [ internal use only ] * * Asserts that the starting and ending region * addresses fall within the valid range of the map. */ #define VM_MAP_RANGE_CHECK(map, start, end) \ { \ if (start < vm_map_min(map)) \ start = vm_map_min(map); \ if (end > vm_map_max(map)) \ end = vm_map_max(map); \ if (start > end) \ start = end; \ } /* * vm_map_startup: * * Initialize the vm_map module. Must be called before * any other vm_map routines. * * Map and entry structures are allocated from the general * purpose memory pool with some exceptions: * * - The kernel map and kmem submap are allocated statically. * - Kernel map entries are allocated out of a static pool. * * These restrictions are necessary since malloc() uses the * maps and requires map entries. 
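 */

/*
 * Function form (illustrative only) of the VM_MAP_RANGE_CHECK() macro
 * above: both addresses are clamped into the map's valid range and
 * then ordered, so a request falling entirely outside the map
 * degenerates to an empty [end, end) range instead of faulting.
 */
static void
x_range_check(unsigned long min, unsigned long max,
    unsigned long *start, unsigned long *end)
{

	if (*start < min)
		*start = min;
	if (*end > max)
		*end = max;
	if (*start > *end)
		*start = *end;
}

/*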
*/ void vm_map_startup(void) { mtx_init(&map_sleep_mtx, "vm map sleep mutex", NULL, MTX_DEF); mapzone = uma_zcreate("MAP", sizeof(struct vm_map), NULL, #ifdef INVARIANTS vm_map_zdtor, #else NULL, #endif vm_map_zinit, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); uma_prealloc(mapzone, MAX_KMAP); kmapentzone = uma_zcreate("KMAP ENTRY", sizeof(struct vm_map_entry), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_MTXCLASS | UMA_ZONE_VM); mapentzone = uma_zcreate("MAP ENTRY", sizeof(struct vm_map_entry), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); vmspace_zone = uma_zcreate("VMSPACE", sizeof(struct vmspace), NULL, #ifdef INVARIANTS vmspace_zdtor, #else NULL, #endif vmspace_zinit, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE); } static int vmspace_zinit(void *mem, int size, int flags) { struct vmspace *vm; vm = (struct vmspace *)mem; vm->vm_map.pmap = NULL; (void)vm_map_zinit(&vm->vm_map, sizeof(vm->vm_map), flags); PMAP_LOCK_INIT(vmspace_pmap(vm)); return (0); } static int vm_map_zinit(void *mem, int size, int flags) { vm_map_t map; map = (vm_map_t)mem; memset(map, 0, sizeof(*map)); mtx_init(&map->system_mtx, "vm map (system)", NULL, MTX_DEF | MTX_DUPOK); sx_init(&map->lock, "vm map (user)"); return (0); } #ifdef INVARIANTS static void vmspace_zdtor(void *mem, int size, void *arg) { struct vmspace *vm; vm = (struct vmspace *)mem; vm_map_zdtor(&vm->vm_map, sizeof(vm->vm_map), arg); } static void vm_map_zdtor(void *mem, int size, void *arg) { vm_map_t map; map = (vm_map_t)mem; KASSERT(map->nentries == 0, ("map %p nentries == %d on free.", map, map->nentries)); KASSERT(map->size == 0, ("map %p size == %lu on free.", map, (unsigned long)map->size)); } #endif /* INVARIANTS */ /* * Allocate a vmspace structure, including a vm_map and pmap, * and initialize those structures. The refcnt is set to 1. * * If 'pinit' is NULL then the embedded pmap is initialized via pmap_pinit(). */ struct vmspace * vmspace_alloc(vm_offset_t min, vm_offset_t max, pmap_pinit_t pinit) { struct vmspace *vm; vm = uma_zalloc(vmspace_zone, M_WAITOK); KASSERT(vm->vm_map.pmap == NULL, ("vm_map.pmap must be NULL")); if (pinit == NULL) pinit = &pmap_pinit; if (!pinit(vmspace_pmap(vm))) { uma_zfree(vmspace_zone, vm); return (NULL); } CTR1(KTR_VM, "vmspace_alloc: %p", vm); _vm_map_init(&vm->vm_map, vmspace_pmap(vm), min, max); vm->vm_refcnt = 1; vm->vm_shm = NULL; vm->vm_swrss = 0; vm->vm_tsize = 0; vm->vm_dsize = 0; vm->vm_ssize = 0; vm->vm_taddr = 0; vm->vm_daddr = 0; vm->vm_maxsaddr = 0; return (vm); } #ifdef RACCT static void vmspace_container_reset(struct proc *p) { PROC_LOCK(p); racct_set(p, RACCT_DATA, 0); racct_set(p, RACCT_STACK, 0); racct_set(p, RACCT_RSS, 0); racct_set(p, RACCT_MEMLOCK, 0); racct_set(p, RACCT_VMEM, 0); PROC_UNLOCK(p); } #endif static inline void vmspace_dofree(struct vmspace *vm) { CTR1(KTR_VM, "vmspace_free: %p", vm); /* * Make sure any SysV shm is freed, it might not have been in * exit1(). */ shmexit(vm); /* * Lock the map, to wait out all other references to it. * Delete all of the mappings and pages they hold, then call * the pmap module to reclaim anything left. 
*/ (void)vm_map_remove(&vm->vm_map, vm->vm_map.min_offset, vm->vm_map.max_offset); pmap_release(vmspace_pmap(vm)); vm->vm_map.pmap = NULL; uma_zfree(vmspace_zone, vm); } void vmspace_free(struct vmspace *vm) { WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "vmspace_free() called with non-sleepable lock held"); if (vm->vm_refcnt == 0) panic("vmspace_free: attempt to free already freed vmspace"); if (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1) vmspace_dofree(vm); } void vmspace_exitfree(struct proc *p) { struct vmspace *vm; PROC_VMSPACE_LOCK(p); vm = p->p_vmspace; p->p_vmspace = NULL; PROC_VMSPACE_UNLOCK(p); KASSERT(vm == &vmspace0, ("vmspace_exitfree: wrong vmspace")); vmspace_free(vm); } void vmspace_exit(struct thread *td) { int refcnt; struct vmspace *vm; struct proc *p; /* * Release user portion of address space. * This releases references to vnodes, * which could cause I/O if the file has been unlinked. * Need to do this early enough that we can still sleep. * * The last exiting process to reach this point releases as * much of the environment as it can. vmspace_dofree() is the * slower fallback in case another process had a temporary * reference to the vmspace. */ p = td->td_proc; vm = p->p_vmspace; atomic_add_int(&vmspace0.vm_refcnt, 1); do { refcnt = vm->vm_refcnt; if (refcnt > 1 && p->p_vmspace != &vmspace0) { /* Switch now since other proc might free vmspace */ PROC_VMSPACE_LOCK(p); p->p_vmspace = &vmspace0; PROC_VMSPACE_UNLOCK(p); pmap_activate(td); } } while (!atomic_cmpset_int(&vm->vm_refcnt, refcnt, refcnt - 1)); if (refcnt == 1) { if (p->p_vmspace != vm) { /* vmspace not yet freed, switch back */ PROC_VMSPACE_LOCK(p); p->p_vmspace = vm; PROC_VMSPACE_UNLOCK(p); pmap_activate(td); } pmap_remove_pages(vmspace_pmap(vm)); /* Switch now since this proc will free vmspace */ PROC_VMSPACE_LOCK(p); p->p_vmspace = &vmspace0; PROC_VMSPACE_UNLOCK(p); pmap_activate(td); vmspace_dofree(vm); } #ifdef RACCT if (racct_enable) vmspace_container_reset(p); #endif } /* Acquire reference to vmspace owned by another process. */ struct vmspace * vmspace_acquire_ref(struct proc *p) { struct vmspace *vm; int refcnt; PROC_VMSPACE_LOCK(p); vm = p->p_vmspace; if (vm == NULL) { PROC_VMSPACE_UNLOCK(p); return (NULL); } do { refcnt = vm->vm_refcnt; if (refcnt <= 0) { /* Avoid 0->1 transition */ PROC_VMSPACE_UNLOCK(p); return (NULL); } } while (!atomic_cmpset_int(&vm->vm_refcnt, refcnt, refcnt + 1)); if (vm != p->p_vmspace) { PROC_VMSPACE_UNLOCK(p); vmspace_free(vm); return (NULL); } PROC_VMSPACE_UNLOCK(p); return (vm); } /* * Switch between vmspaces in an AIO kernel process. * * The AIO kernel processes switch to and from a user process's * vmspace while performing an I/O operation on behalf of a user * process. The new vmspace is either the vmspace of a user process * obtained from an active AIO request or the initial vmspace of the * AIO kernel process (when it is idling). Because user processes * will block to drain any active AIO requests before proceeding in * exit() or execve(), the vmspace reference count for these vmspaces * can never be 0. This allows for a much simpler implementation than * the loop in vmspace_acquire_ref() above. Similarly, AIO kernel * processes hold an extra reference on their initial vmspace for the * life of the process so that this guarantee is true for any vmspace * passed as 'newvm'. */ void vmspace_switch_aio(struct vmspace *newvm) { struct vmspace *oldvm; /* XXX: Need some way to assert that this is an aio daemon. 
*/ KASSERT(newvm->vm_refcnt > 0, ("vmspace_switch_aio: newvm unreferenced")); oldvm = curproc->p_vmspace; if (oldvm == newvm) return; /* * Point to the new address space and refer to it. */ curproc->p_vmspace = newvm; atomic_add_int(&newvm->vm_refcnt, 1); /* Activate the new mapping. */ pmap_activate(curthread); /* Remove the daemon's reference to the old address space. */ KASSERT(oldvm->vm_refcnt > 1, ("vmspace_switch_aio: oldvm dropping last reference")); vmspace_free(oldvm); } void _vm_map_lock(vm_map_t map, const char *file, int line) { if (map->system_map) mtx_lock_flags_(&map->system_mtx, 0, file, line); else sx_xlock_(&map->lock, file, line); map->timestamp++; } static void vm_map_process_deferred(void) { struct thread *td; vm_map_entry_t entry, next; vm_object_t object; td = curthread; entry = td->td_map_def_user; td->td_map_def_user = NULL; while (entry != NULL) { next = entry->next; if ((entry->eflags & MAP_ENTRY_VN_WRITECNT) != 0) { /* * Decrement the object's writemappings and * possibly the vnode's v_writecount. */ KASSERT((entry->eflags & MAP_ENTRY_IS_SUB_MAP) == 0, ("Submap with writecount")); object = entry->object.vm_object; KASSERT(object != NULL, ("No object for writecount")); vnode_pager_release_writecount(object, entry->start, entry->end); } vm_map_entry_deallocate(entry, FALSE); entry = next; } } void _vm_map_unlock(vm_map_t map, const char *file, int line) { if (map->system_map) mtx_unlock_flags_(&map->system_mtx, 0, file, line); else { sx_xunlock_(&map->lock, file, line); vm_map_process_deferred(); } } void _vm_map_lock_read(vm_map_t map, const char *file, int line) { if (map->system_map) mtx_lock_flags_(&map->system_mtx, 0, file, line); else sx_slock_(&map->lock, file, line); } void _vm_map_unlock_read(vm_map_t map, const char *file, int line) { if (map->system_map) mtx_unlock_flags_(&map->system_mtx, 0, file, line); else { sx_sunlock_(&map->lock, file, line); vm_map_process_deferred(); } } int _vm_map_trylock(vm_map_t map, const char *file, int line) { int error; error = map->system_map ? !mtx_trylock_flags_(&map->system_mtx, 0, file, line) : !sx_try_xlock_(&map->lock, file, line); if (error == 0) map->timestamp++; return (error == 0); } int _vm_map_trylock_read(vm_map_t map, const char *file, int line) { int error; error = map->system_map ? !mtx_trylock_flags_(&map->system_mtx, 0, file, line) : !sx_try_slock_(&map->lock, file, line); return (error == 0); } /* * _vm_map_lock_upgrade: [ internal use only ] * * Tries to upgrade a read (shared) lock on the specified map to a write * (exclusive) lock. Returns the value "0" if the upgrade succeeds and a * non-zero value if the upgrade fails. If the upgrade fails, the map is * returned without a read or write lock held. * * Requires that the map be read locked. */ int _vm_map_lock_upgrade(vm_map_t map, const char *file, int line) { unsigned int last_timestamp; if (map->system_map) { mtx_assert_(&map->system_mtx, MA_OWNED, file, line); } else { if (!sx_try_upgrade_(&map->lock, file, line)) { last_timestamp = map->timestamp; sx_sunlock_(&map->lock, file, line); vm_map_process_deferred(); /* * If the map's timestamp does not change while the * map is unlocked, then the upgrade succeeds. 
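			 * If it did change, the exclusive lock is dropped
			 * again and the failed upgrade is reported with no
			 * lock held; the caller must re-validate any cached
			 * map entry pointers before retrying.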
*/ sx_xlock_(&map->lock, file, line); if (last_timestamp != map->timestamp) { sx_xunlock_(&map->lock, file, line); return (1); } } } map->timestamp++; return (0); } void _vm_map_lock_downgrade(vm_map_t map, const char *file, int line) { if (map->system_map) { mtx_assert_(&map->system_mtx, MA_OWNED, file, line); } else sx_downgrade_(&map->lock, file, line); } /* * vm_map_locked: * * Returns a non-zero value if the caller holds a write (exclusive) lock * on the specified map and the value "0" otherwise. */ int vm_map_locked(vm_map_t map) { if (map->system_map) return (mtx_owned(&map->system_mtx)); else return (sx_xlocked(&map->lock)); } #ifdef INVARIANTS static void _vm_map_assert_locked(vm_map_t map, const char *file, int line) { if (map->system_map) mtx_assert_(&map->system_mtx, MA_OWNED, file, line); else sx_assert_(&map->lock, SA_XLOCKED, file, line); } #define VM_MAP_ASSERT_LOCKED(map) \ _vm_map_assert_locked(map, LOCK_FILE, LOCK_LINE) #else #define VM_MAP_ASSERT_LOCKED(map) #endif /* * _vm_map_unlock_and_wait: * * Atomically releases the lock on the specified map and puts the calling * thread to sleep. The calling thread will remain asleep until either * vm_map_wakeup() is performed on the map or the specified timeout is * exceeded. * * WARNING! This function does not perform deferred deallocations of * objects and map entries. Therefore, the calling thread is expected to * reacquire the map lock after reawakening and later perform an ordinary * unlock operation, such as vm_map_unlock(), before completing its * operation on the map. */ int _vm_map_unlock_and_wait(vm_map_t map, int timo, const char *file, int line) { mtx_lock(&map_sleep_mtx); if (map->system_map) mtx_unlock_flags_(&map->system_mtx, 0, file, line); else sx_xunlock_(&map->lock, file, line); return (msleep(&map->root, &map_sleep_mtx, PDROP | PVM, "vmmaps", timo)); } /* * vm_map_wakeup: * * Awaken any threads that have slept on the map using * vm_map_unlock_and_wait(). */ void vm_map_wakeup(vm_map_t map) { /* * Acquire and release map_sleep_mtx to prevent a wakeup() * from being performed (and lost) between the map unlock * and the msleep() in _vm_map_unlock_and_wait(). */ mtx_lock(&map_sleep_mtx); mtx_unlock(&map_sleep_mtx); wakeup(&map->root); } void vm_map_busy(vm_map_t map) { VM_MAP_ASSERT_LOCKED(map); map->busy++; } void vm_map_unbusy(vm_map_t map) { VM_MAP_ASSERT_LOCKED(map); KASSERT(map->busy, ("vm_map_unbusy: not busy")); if (--map->busy == 0 && (map->flags & MAP_BUSY_WAKEUP)) { vm_map_modflags(map, 0, MAP_BUSY_WAKEUP); wakeup(&map->busy); } } void vm_map_wait_busy(vm_map_t map) { VM_MAP_ASSERT_LOCKED(map); while (map->busy) { vm_map_modflags(map, MAP_BUSY_WAKEUP, 0); if (map->system_map) msleep(&map->busy, &map->system_mtx, 0, "mbusy", 0); else sx_sleep(&map->busy, &map->lock, 0, "mbusy", 0); } map->timestamp++; } long vmspace_resident_count(struct vmspace *vmspace) { return pmap_resident_count(vmspace_pmap(vmspace)); } /* * vm_map_create: * * Creates and returns a new empty VM map with * the given physical map structure, and having * the given lower and upper address bounds. */ vm_map_t vm_map_create(pmap_t pmap, vm_offset_t min, vm_offset_t max) { vm_map_t result; result = uma_zalloc(mapzone, M_WAITOK); CTR1(KTR_VM, "vm_map_create: %p", result); _vm_map_init(result, pmap, min, max); return (result); } /* * Initialize an existing vm_map structure * such as that in the vmspace structure. 
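 *
 *	The map's locks are deliberately not touched here: for maps
 *	allocated from mapzone or embedded in a vmspace, vm_map_zinit()
 *	initializes system_mtx and lock once per zone item, and the
 *	UMA_ZONE_NOFREE zones keep them valid across free/alloc cycles.
 *	Maps not obtained from a zone get their locks from vm_map_init()
 *	below.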
*/ static void _vm_map_init(vm_map_t map, pmap_t pmap, vm_offset_t min, vm_offset_t max) { map->header.next = map->header.prev = &map->header; map->needs_wakeup = FALSE; map->system_map = 0; map->pmap = pmap; map->min_offset = min; map->max_offset = max; map->flags = 0; map->root = NULL; map->timestamp = 0; map->busy = 0; } void vm_map_init(vm_map_t map, pmap_t pmap, vm_offset_t min, vm_offset_t max) { _vm_map_init(map, pmap, min, max); mtx_init(&map->system_mtx, "system map", NULL, MTX_DEF | MTX_DUPOK); sx_init(&map->lock, "user map"); } /* * vm_map_entry_dispose: [ internal use only ] * * Inverse of vm_map_entry_create. */ static void vm_map_entry_dispose(vm_map_t map, vm_map_entry_t entry) { uma_zfree(map->system_map ? kmapentzone : mapentzone, entry); } /* * vm_map_entry_create: [ internal use only ] * * Allocates a VM map entry for insertion. * No entry fields are filled in. */ static vm_map_entry_t vm_map_entry_create(vm_map_t map) { vm_map_entry_t new_entry; if (map->system_map) new_entry = uma_zalloc(kmapentzone, M_NOWAIT); else new_entry = uma_zalloc(mapentzone, M_WAITOK); if (new_entry == NULL) panic("vm_map_entry_create: kernel resources exhausted"); return (new_entry); } /* * vm_map_entry_set_behavior: * * Set the expected access behavior, either normal, random, or * sequential. */ static inline void vm_map_entry_set_behavior(vm_map_entry_t entry, u_char behavior) { entry->eflags = (entry->eflags & ~MAP_ENTRY_BEHAV_MASK) | (behavior & MAP_ENTRY_BEHAV_MASK); } /* * vm_map_entry_set_max_free: * * Set the max_free field in a vm_map_entry. */ static inline void vm_map_entry_set_max_free(vm_map_entry_t entry) { entry->max_free = entry->adj_free; if (entry->left != NULL && entry->left->max_free > entry->max_free) entry->max_free = entry->left->max_free; if (entry->right != NULL && entry->right->max_free > entry->max_free) entry->max_free = entry->right->max_free; } /* * vm_map_entry_splay: * * The Sleator and Tarjan top-down splay algorithm with the * following variation. Max_free must be computed bottom-up, so * on the downward pass, maintain the left and right spines in * reverse order. Then, make a second pass up each side to fix * the pointers and compute max_free. The time bound is O(log n) * amortized. * * The new root is the vm_map_entry containing "addr", or else an * adjacent entry (lower or higher) if addr is not in the tree. * * The map must be locked, and leaves it so. * * Returns: the new root. */ static vm_map_entry_t vm_map_entry_splay(vm_offset_t addr, vm_map_entry_t root) { vm_map_entry_t llist, rlist; vm_map_entry_t ltree, rtree; vm_map_entry_t y; /* Special case of empty tree. */ if (root == NULL) return (root); /* * Pass One: Splay down the tree until we find addr or a NULL * pointer where addr would go. llist and rlist are the two * sides in reverse order (bottom-up), with llist linked by * the right pointer and rlist linked by the left pointer in * the vm_map_entry. Wait until Pass Two to set max_free on * the two spines. */ llist = NULL; rlist = NULL; for (;;) { /* root is never NULL in here. */ if (addr < root->start) { y = root->left; if (y == NULL) break; if (addr < y->start && y->left != NULL) { /* Rotate right and put y on rlist. */ root->left = y->right; y->right = root; vm_map_entry_set_max_free(root); root = y->left; y->left = rlist; rlist = y; } else { /* Put root on rlist. 
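				 * (This is the plain "zig" link; the
				 * rotate-and-link branch above is the
				 * "zig-zig" step that gives splaying its
				 * amortized O(log n) bound.)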
*/ root->left = rlist; rlist = root; root = y; } } else if (addr >= root->end) { y = root->right; if (y == NULL) break; if (addr >= y->end && y->right != NULL) { /* Rotate left and put y on llist. */ root->right = y->left; y->left = root; vm_map_entry_set_max_free(root); root = y->right; y->right = llist; llist = y; } else { /* Put root on llist. */ root->right = llist; llist = root; root = y; } } else break; } /* * Pass Two: Walk back up the two spines, flip the pointers * and set max_free. The subtrees of the root go at the * bottom of llist and rlist. */ ltree = root->left; while (llist != NULL) { y = llist->right; llist->right = ltree; vm_map_entry_set_max_free(llist); ltree = llist; llist = y; } rtree = root->right; while (rlist != NULL) { y = rlist->left; rlist->left = rtree; vm_map_entry_set_max_free(rlist); rtree = rlist; rlist = y; } /* * Final assembly: add ltree and rtree as subtrees of root. */ root->left = ltree; root->right = rtree; vm_map_entry_set_max_free(root); return (root); } /* * vm_map_entry_{un,}link: * * Insert/remove entries from maps. */ static void vm_map_entry_link(vm_map_t map, vm_map_entry_t after_where, vm_map_entry_t entry) { CTR4(KTR_VM, "vm_map_entry_link: map %p, nentries %d, entry %p, after %p", map, map->nentries, entry, after_where); VM_MAP_ASSERT_LOCKED(map); KASSERT(after_where == &map->header || after_where->end <= entry->start, ("vm_map_entry_link: prev end %jx new start %jx overlap", (uintmax_t)after_where->end, (uintmax_t)entry->start)); KASSERT(after_where->next == &map->header || entry->end <= after_where->next->start, ("vm_map_entry_link: new end %jx next start %jx overlap", (uintmax_t)entry->end, (uintmax_t)after_where->next->start)); map->nentries++; entry->prev = after_where; entry->next = after_where->next; entry->next->prev = entry; after_where->next = entry; if (after_where != &map->header) { if (after_where != map->root) vm_map_entry_splay(after_where->start, map->root); entry->right = after_where->right; entry->left = after_where; after_where->right = NULL; after_where->adj_free = entry->start - after_where->end; vm_map_entry_set_max_free(after_where); } else { entry->right = map->root; entry->left = NULL; } entry->adj_free = (entry->next == &map->header ? map->max_offset : entry->next->start) - entry->end; vm_map_entry_set_max_free(entry); map->root = entry; } static void vm_map_entry_unlink(vm_map_t map, vm_map_entry_t entry) { vm_map_entry_t next, prev, root; VM_MAP_ASSERT_LOCKED(map); if (entry != map->root) vm_map_entry_splay(entry->start, map->root); if (entry->left == NULL) root = entry->right; else { root = vm_map_entry_splay(entry->start, entry->left); root->right = entry->right; root->adj_free = (entry->next == &map->header ? map->max_offset : entry->next->start) - root->end; vm_map_entry_set_max_free(root); } map->root = root; prev = entry->prev; next = entry->next; next->prev = prev; prev->next = next; map->nentries--; CTR3(KTR_VM, "vm_map_entry_unlink: map %p, nentries %d, entry %p", map, map->nentries, entry); } /* * vm_map_entry_resize_free: * * Recompute the amount of free space following a vm_map_entry * and propagate that value up the tree. Call this function after * resizing a map entry in-place, that is, without a call to * vm_map_entry_link() or _unlink(). * * The map must be locked, and leaves it so. 
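 *
 *	For example, when vm_map_insert() extends prev_entry->end over a
 *	coalesced range, it calls this function so that the shrunken gap
 *	after the entry is reflected in adj_free and max_free before any
 *	later first-fit search in vm_map_findspace() runs.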
*/ static void vm_map_entry_resize_free(vm_map_t map, vm_map_entry_t entry) { /* * Using splay trees without parent pointers, propagating * max_free up the tree is done by moving the entry to the * root and making the change there. */ if (entry != map->root) map->root = vm_map_entry_splay(entry->start, map->root); entry->adj_free = (entry->next == &map->header ? map->max_offset : entry->next->start) - entry->end; vm_map_entry_set_max_free(entry); } /* * vm_map_lookup_entry: [ internal use only ] * * Finds the map entry containing (or * immediately preceding) the specified address * in the given map; the entry is returned * in the "entry" parameter. The boolean * result indicates whether the address is * actually contained in the map. */ boolean_t vm_map_lookup_entry( vm_map_t map, vm_offset_t address, vm_map_entry_t *entry) /* OUT */ { vm_map_entry_t cur; boolean_t locked; /* * If the map is empty, then the map entry immediately preceding * "address" is the map's header. */ cur = map->root; if (cur == NULL) *entry = &map->header; else if (address >= cur->start && cur->end > address) { *entry = cur; return (TRUE); } else if ((locked = vm_map_locked(map)) || sx_try_upgrade(&map->lock)) { /* * Splay requires a write lock on the map. However, it only * restructures the binary search tree; it does not otherwise * change the map. Thus, the map's timestamp need not change * on a temporary upgrade. */ map->root = cur = vm_map_entry_splay(address, cur); if (!locked) sx_downgrade(&map->lock); /* * If "address" is contained within a map entry, the new root * is that map entry. Otherwise, the new root is a map entry * immediately before or after "address". */ if (address >= cur->start) { *entry = cur; if (cur->end > address) return (TRUE); } else *entry = cur->prev; } else /* * Since the map is only locked for read access, perform a * standard binary search tree lookup for "address". */ for (;;) { if (address < cur->start) { if (cur->left == NULL) { *entry = cur->prev; break; } cur = cur->left; } else if (cur->end > address) { *entry = cur; return (TRUE); } else { if (cur->right == NULL) { *entry = cur; break; } cur = cur->right; } } return (FALSE); } /* * vm_map_insert: * * Inserts the given whole VM object into the target * map at the specified address range. The object's * size should match that of the address range. * * Requires that the map be locked, and leaves it so. * * If object is non-NULL, ref count must be bumped by caller * prior to making call to account for the new entry. */ int vm_map_insert(vm_map_t map, vm_object_t object, vm_ooffset_t offset, vm_offset_t start, vm_offset_t end, vm_prot_t prot, vm_prot_t max, int cow) { vm_map_entry_t new_entry, prev_entry, temp_entry; vm_eflags_t protoeflags; struct ucred *cred; vm_inherit_t inheritance; VM_MAP_ASSERT_LOCKED(map); KASSERT((object != kmem_object && object != kernel_object) || (cow & MAP_COPY_ON_WRITE) == 0, ("vm_map_insert: kmem or kernel object and COW")); KASSERT(object == NULL || (cow & MAP_NOFAULT) == 0, ("vm_map_insert: paradoxical MAP_NOFAULT request")); /* * Check that the start and end points are not bogus. */ if ((start < map->min_offset) || (end > map->max_offset) || (start >= end)) return (KERN_INVALID_ADDRESS); /* * Find the entry prior to the proposed starting address; if it's part * of an existing entry, this range is bogus. */ if (vm_map_lookup_entry(map, start, &temp_entry)) return (KERN_NO_SPACE); prev_entry = temp_entry; /* * Assert that the next entry doesn't overlap the end point. 
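	 * (Despite the wording, this is an ordinary run-time check that
	 * fails with KERN_NO_SPACE; callers such as vm_map_find() treat
	 * that status as "keep searching" rather than as a bug.)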
*/ if ((prev_entry->next != &map->header) && (prev_entry->next->start < end)) return (KERN_NO_SPACE); protoeflags = 0; if (cow & MAP_COPY_ON_WRITE) protoeflags |= MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY; if (cow & MAP_NOFAULT) protoeflags |= MAP_ENTRY_NOFAULT; if (cow & MAP_DISABLE_SYNCER) protoeflags |= MAP_ENTRY_NOSYNC; if (cow & MAP_DISABLE_COREDUMP) protoeflags |= MAP_ENTRY_NOCOREDUMP; if (cow & MAP_STACK_GROWS_DOWN) protoeflags |= MAP_ENTRY_GROWS_DOWN; if (cow & MAP_STACK_GROWS_UP) protoeflags |= MAP_ENTRY_GROWS_UP; if (cow & MAP_VN_WRITECOUNT) protoeflags |= MAP_ENTRY_VN_WRITECNT; if (cow & MAP_INHERIT_SHARE) inheritance = VM_INHERIT_SHARE; else inheritance = VM_INHERIT_DEFAULT; cred = NULL; if (cow & (MAP_ACC_NO_CHARGE | MAP_NOFAULT)) goto charged; if ((cow & MAP_ACC_CHARGED) || ((prot & VM_PROT_WRITE) && ((protoeflags & MAP_ENTRY_NEEDS_COPY) || object == NULL))) { if (!(cow & MAP_ACC_CHARGED) && !swap_reserve(end - start)) return (KERN_RESOURCE_SHORTAGE); KASSERT(object == NULL || (protoeflags & MAP_ENTRY_NEEDS_COPY) || object->cred == NULL, ("OVERCOMMIT: vm_map_insert o %p", object)); cred = curthread->td_ucred; } charged: /* Expand the kernel pmap, if necessary. */ if (map == kernel_map && end > kernel_vm_end) pmap_growkernel(end); if (object != NULL) { /* * OBJ_ONEMAPPING must be cleared unless this mapping * is trivially proven to be the only mapping for any * of the object's pages. (Object granularity * reference counting is insufficient to recognize * aliases with precision.) */ VM_OBJECT_WLOCK(object); if (object->ref_count > 1 || object->shadow_count != 0) vm_object_clear_flag(object, OBJ_ONEMAPPING); VM_OBJECT_WUNLOCK(object); } else if ((prev_entry != &map->header) && (prev_entry->eflags == protoeflags) && (cow & (MAP_STACK_GROWS_DOWN | MAP_STACK_GROWS_UP)) == 0 && (prev_entry->end == start) && (prev_entry->wired_count == 0) && (prev_entry->cred == cred || (prev_entry->object.vm_object != NULL && (prev_entry->object.vm_object->cred == cred))) && vm_object_coalesce(prev_entry->object.vm_object, prev_entry->offset, (vm_size_t)(prev_entry->end - prev_entry->start), (vm_size_t)(end - prev_entry->end), cred != NULL && (protoeflags & MAP_ENTRY_NEEDS_COPY) == 0)) { /* * We were able to extend the object. Determine if we * can extend the previous map entry to include the * new range as well. */ if ((prev_entry->inheritance == inheritance) && (prev_entry->protection == prot) && (prev_entry->max_protection == max)) { map->size += (end - prev_entry->end); prev_entry->end = end; vm_map_entry_resize_free(map, prev_entry); vm_map_simplify_entry(map, prev_entry); return (KERN_SUCCESS); } /* * If we can extend the object but cannot extend the * map entry, we have to create a new map entry. We * must bump the ref count on the extended object to * account for it. object may be NULL. */ object = prev_entry->object.vm_object; offset = prev_entry->offset + (prev_entry->end - prev_entry->start); vm_object_reference(object); if (cred != NULL && object != NULL && object->cred != NULL && !(prev_entry->eflags & MAP_ENTRY_NEEDS_COPY)) { /* Object already accounts for this uid. 
*/ cred = NULL; } } if (cred != NULL) crhold(cred); /* * Create a new entry */ new_entry = vm_map_entry_create(map); new_entry->start = start; new_entry->end = end; new_entry->cred = NULL; new_entry->eflags = protoeflags; new_entry->object.vm_object = object; new_entry->offset = offset; new_entry->avail_ssize = 0; new_entry->inheritance = inheritance; new_entry->protection = prot; new_entry->max_protection = max; new_entry->wired_count = 0; new_entry->wiring_thread = NULL; new_entry->read_ahead = VM_FAULT_READ_AHEAD_INIT; new_entry->next_read = OFF_TO_IDX(offset); KASSERT(cred == NULL || !ENTRY_CHARGED(new_entry), ("OVERCOMMIT: vm_map_insert leaks vm_map %p", new_entry)); new_entry->cred = cred; /* * Insert the new entry into the list */ vm_map_entry_link(map, prev_entry, new_entry); map->size += new_entry->end - new_entry->start; /* * Try to coalesce the new entry with both the previous and next * entries in the list. Previously, we only attempted to coalesce * with the previous entry when object is NULL. Here, we handle the * other cases, which are less common. */ vm_map_simplify_entry(map, new_entry); if (cow & (MAP_PREFAULT|MAP_PREFAULT_PARTIAL)) { vm_map_pmap_enter(map, start, prot, object, OFF_TO_IDX(offset), end - start, cow & MAP_PREFAULT_PARTIAL); } return (KERN_SUCCESS); } /* * vm_map_findspace: * * Find the first fit (lowest VM address) for "length" free bytes * beginning at address >= start in the given map. * * In a vm_map_entry, "adj_free" is the amount of free space * adjacent (higher address) to this entry, and "max_free" is the * maximum amount of contiguous free space in its subtree. This * allows finding a free region in one path down the tree, so * O(log n) amortized with splay trees. * * The map must be locked, and leaves it so. * * Returns: 0 on success, and starting address in *addr, * 1 if insufficient space. */ int vm_map_findspace(vm_map_t map, vm_offset_t start, vm_size_t length, vm_offset_t *addr) /* OUT */ { vm_map_entry_t entry; vm_offset_t st; /* * Request must fit within min/max VM address and must avoid * address wrap. */ if (start < map->min_offset) start = map->min_offset; if (start + length > map->max_offset || start + length < start) return (1); /* Empty tree means wide open address space. */ if (map->root == NULL) { *addr = start; return (0); } /* * After splay, if start comes before root node, then there * must be a gap from start to the root. */ map->root = vm_map_entry_splay(start, map->root); if (start + length <= map->root->start) { *addr = start; return (0); } /* * Root is the last node that might begin its gap before * start, and this is the last comparison where address * wrap might be a problem. */ st = (start > map->root->end) ? start : map->root->end; if (length <= map->root->end + map->root->adj_free - st) { *addr = st; return (0); } /* With max_free, can immediately tell if no solution. */ entry = map->root->right; if (entry == NULL || length > entry->max_free) return (1); /* * Search the right subtree in the order: left subtree, root, * right subtree (first fit). The previous splay implies that * all regions in the right subtree have addresses > start. */ while (entry != NULL) { if (entry->left != NULL && entry->left->max_free >= length) entry = entry->left; else if (entry->adj_free >= length) { *addr = entry->end; return (0); } else entry = entry->right; } /* Can't get here, so panic if we do. 
*/ panic("vm_map_findspace: max_free corrupt"); } int vm_map_fixed(vm_map_t map, vm_object_t object, vm_ooffset_t offset, vm_offset_t start, vm_size_t length, vm_prot_t prot, vm_prot_t max, int cow) { vm_offset_t end; int result; end = start + length; KASSERT((cow & (MAP_STACK_GROWS_DOWN | MAP_STACK_GROWS_UP)) == 0 || object == NULL, ("vm_map_fixed: non-NULL backing object for stack")); vm_map_lock(map); VM_MAP_RANGE_CHECK(map, start, end); if ((cow & MAP_CHECK_EXCL) == 0) vm_map_delete(map, start, end); if ((cow & (MAP_STACK_GROWS_DOWN | MAP_STACK_GROWS_UP)) != 0) { result = vm_map_stack_locked(map, start, length, sgrowsiz, prot, max, cow); } else { result = vm_map_insert(map, object, offset, start, end, prot, max, cow); } vm_map_unlock(map); return (result); } /* * vm_map_find finds an unallocated region in the target address * map with the given length. The search is defined to be * first-fit from the specified address; the region found is * returned in the same parameter. * * If object is non-NULL, ref count must be bumped by caller * prior to making call to account for the new entry. */ int vm_map_find(vm_map_t map, vm_object_t object, vm_ooffset_t offset, vm_offset_t *addr, /* IN/OUT */ vm_size_t length, vm_offset_t max_addr, int find_space, vm_prot_t prot, vm_prot_t max, int cow) { vm_offset_t alignment, initial_addr, start; int result; KASSERT((cow & (MAP_STACK_GROWS_DOWN | MAP_STACK_GROWS_UP)) == 0 || object == NULL, ("vm_map_find: non-NULL backing object for stack")); if (find_space == VMFS_OPTIMAL_SPACE && (object == NULL || (object->flags & OBJ_COLORED) == 0)) find_space = VMFS_ANY_SPACE; if (find_space >> 8 != 0) { KASSERT((find_space & 0xff) == 0, ("bad VMFS flags")); alignment = (vm_offset_t)1 << (find_space >> 8); } else alignment = 0; initial_addr = *addr; again: start = initial_addr; vm_map_lock(map); do { if (find_space != VMFS_NO_SPACE) { if (vm_map_findspace(map, start, length, addr) || (max_addr != 0 && *addr + length > max_addr)) { vm_map_unlock(map); if (find_space == VMFS_OPTIMAL_SPACE) { find_space = VMFS_ANY_SPACE; goto again; } return (KERN_NO_SPACE); } switch (find_space) { case VMFS_SUPER_SPACE: case VMFS_OPTIMAL_SPACE: pmap_align_superpage(object, offset, addr, length); break; case VMFS_ANY_SPACE: break; default: if ((*addr & (alignment - 1)) != 0) { *addr &= ~(alignment - 1); *addr += alignment; } break; } start = *addr; } if ((cow & (MAP_STACK_GROWS_DOWN | MAP_STACK_GROWS_UP)) != 0) { result = vm_map_stack_locked(map, start, length, sgrowsiz, prot, max, cow); } else { result = vm_map_insert(map, object, offset, start, start + length, prot, max, cow); } } while (result == KERN_NO_SPACE && find_space != VMFS_NO_SPACE && find_space != VMFS_ANY_SPACE); vm_map_unlock(map); return (result); } /* * vm_map_simplify_entry: * * Simplify the given map entry by merging with either neighbor. This * routine also has the ability to merge with both neighbors. * * The map must be locked. * - * This routine guarentees that the passed entry remains valid (though + * This routine guarantees that the passed entry remains valid (though * possibly extended). When merging, this routine may delete one or * both neighbors. 
*/ void vm_map_simplify_entry(vm_map_t map, vm_map_entry_t entry) { vm_map_entry_t next, prev; vm_size_t prevsize, esize; if ((entry->eflags & (MAP_ENTRY_GROWS_DOWN | MAP_ENTRY_GROWS_UP | MAP_ENTRY_IN_TRANSITION | MAP_ENTRY_IS_SUB_MAP)) != 0) return; prev = entry->prev; if (prev != &map->header) { prevsize = prev->end - prev->start; if ( (prev->end == entry->start) && (prev->object.vm_object == entry->object.vm_object) && (!prev->object.vm_object || (prev->offset + prevsize == entry->offset)) && (prev->eflags == entry->eflags) && (prev->protection == entry->protection) && (prev->max_protection == entry->max_protection) && (prev->inheritance == entry->inheritance) && (prev->wired_count == entry->wired_count) && (prev->cred == entry->cred)) { vm_map_entry_unlink(map, prev); entry->start = prev->start; entry->offset = prev->offset; if (entry->prev != &map->header) vm_map_entry_resize_free(map, entry->prev); /* * If the backing object is a vnode object, * vm_object_deallocate() calls vrele(). * However, vrele() does not lock the vnode * because the vnode has additional * references. Thus, the map lock can be kept * without causing a lock-order reversal with * the vnode lock. * * Since we count the number of virtual page * mappings in object->un_pager.vnp.writemappings, * the writemappings value should not be adjusted * when the entry is disposed of. */ if (prev->object.vm_object) vm_object_deallocate(prev->object.vm_object); if (prev->cred != NULL) crfree(prev->cred); vm_map_entry_dispose(map, prev); } } next = entry->next; if (next != &map->header) { esize = entry->end - entry->start; if ((entry->end == next->start) && (next->object.vm_object == entry->object.vm_object) && (!entry->object.vm_object || (entry->offset + esize == next->offset)) && (next->eflags == entry->eflags) && (next->protection == entry->protection) && (next->max_protection == entry->max_protection) && (next->inheritance == entry->inheritance) && (next->wired_count == entry->wired_count) && (next->cred == entry->cred)) { vm_map_entry_unlink(map, next); entry->end = next->end; vm_map_entry_resize_free(map, entry); /* * See comment above. */ if (next->object.vm_object) vm_object_deallocate(next->object.vm_object); if (next->cred != NULL) crfree(next->cred); vm_map_entry_dispose(map, next); } } } /* * vm_map_clip_start: [ internal use only ] * * Asserts that the given entry begins at or after * the specified address; if necessary, * it splits the entry into two. */ #define vm_map_clip_start(map, entry, startaddr) \ { \ if (startaddr > entry->start) \ _vm_map_clip_start(map, entry, startaddr); \ } /* * This routine is called only when it is known that * the entry must be split. */ static void _vm_map_clip_start(vm_map_t map, vm_map_entry_t entry, vm_offset_t start) { vm_map_entry_t new_entry; VM_MAP_ASSERT_LOCKED(map); /* * Split off the front portion -- note that we must insert the new * entry BEFORE this one, so that this entry has the specified * starting address. */ vm_map_simplify_entry(map, entry); /* * If there is no object backing this entry, we might as well create * one now. If we defer it, an object can get created after the map * is clipped, and individual objects will be created for the split-up * map. This is a bit of a hack, but is also about the best place to * put this improvement. 
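	 *
	 * After the split below, new_entry covers [entry->start, start)
	 * and the original entry covers [start, entry->end), with
	 * entry->offset advanced so that both halves continue to map the
	 * same ranges of the backing object.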
*/ if (entry->object.vm_object == NULL && !map->system_map) { vm_object_t object; object = vm_object_allocate(OBJT_DEFAULT, atop(entry->end - entry->start)); entry->object.vm_object = object; entry->offset = 0; if (entry->cred != NULL) { object->cred = entry->cred; object->charge = entry->end - entry->start; entry->cred = NULL; } } else if (entry->object.vm_object != NULL && ((entry->eflags & MAP_ENTRY_NEEDS_COPY) == 0) && entry->cred != NULL) { VM_OBJECT_WLOCK(entry->object.vm_object); KASSERT(entry->object.vm_object->cred == NULL, ("OVERCOMMIT: vm_entry_clip_start: both cred e %p", entry)); entry->object.vm_object->cred = entry->cred; entry->object.vm_object->charge = entry->end - entry->start; VM_OBJECT_WUNLOCK(entry->object.vm_object); entry->cred = NULL; } new_entry = vm_map_entry_create(map); *new_entry = *entry; new_entry->end = start; entry->offset += (start - entry->start); entry->start = start; if (new_entry->cred != NULL) crhold(entry->cred); vm_map_entry_link(map, entry->prev, new_entry); if ((entry->eflags & MAP_ENTRY_IS_SUB_MAP) == 0) { vm_object_reference(new_entry->object.vm_object); /* * The object->un_pager.vnp.writemappings for the * object of MAP_ENTRY_VN_WRITECNT type entry shall be * kept as is here. The virtual pages are * re-distributed among the clipped entries, so the sum is * left the same. */ } } /* * vm_map_clip_end: [ internal use only ] * * Asserts that the given entry ends at or before * the specified address; if necessary, * it splits the entry into two. */ #define vm_map_clip_end(map, entry, endaddr) \ { \ if ((endaddr) < (entry->end)) \ _vm_map_clip_end((map), (entry), (endaddr)); \ } /* * This routine is called only when it is known that * the entry must be split. */ static void _vm_map_clip_end(vm_map_t map, vm_map_entry_t entry, vm_offset_t end) { vm_map_entry_t new_entry; VM_MAP_ASSERT_LOCKED(map); /* * If there is no object backing this entry, we might as well create * one now. If we defer it, an object can get created after the map * is clipped, and individual objects will be created for the split-up * map. This is a bit of a hack, but is also about the best place to * put this improvement. */ if (entry->object.vm_object == NULL && !map->system_map) { vm_object_t object; object = vm_object_allocate(OBJT_DEFAULT, atop(entry->end - entry->start)); entry->object.vm_object = object; entry->offset = 0; if (entry->cred != NULL) { object->cred = entry->cred; object->charge = entry->end - entry->start; entry->cred = NULL; } } else if (entry->object.vm_object != NULL && ((entry->eflags & MAP_ENTRY_NEEDS_COPY) == 0) && entry->cred != NULL) { VM_OBJECT_WLOCK(entry->object.vm_object); KASSERT(entry->object.vm_object->cred == NULL, ("OVERCOMMIT: vm_entry_clip_end: both cred e %p", entry)); entry->object.vm_object->cred = entry->cred; entry->object.vm_object->charge = entry->end - entry->start; VM_OBJECT_WUNLOCK(entry->object.vm_object); entry->cred = NULL; } /* * Create a new entry and insert it AFTER the specified entry */ new_entry = vm_map_entry_create(map); *new_entry = *entry; new_entry->start = entry->end = end; new_entry->offset += (end - entry->start); if (new_entry->cred != NULL) crhold(entry->cred); vm_map_entry_link(map, entry, new_entry); if ((entry->eflags & MAP_ENTRY_IS_SUB_MAP) == 0) { vm_object_reference(new_entry->object.vm_object); } } /* * vm_map_submap: [ kernel use only ] * * Mark the given range as handled by a subordinate map. 
 *
 *	This range must have been created with vm_map_find,
 *	and no other operations may have been performed on this
 *	range prior to calling vm_map_submap.
 *
 *	Only a limited number of operations can be performed
 *	within this range after calling vm_map_submap:
 *		vm_fault
 *	[Don't try vm_map_copy!]
 *
 *	To remove a submapping, one must first remove the
 *	range from the superior map, and then destroy the
 *	submap (if desired).  [Better yet, don't try it.]
 */
int
vm_map_submap(
	vm_map_t map,
	vm_offset_t start,
	vm_offset_t end,
	vm_map_t submap)
{
	vm_map_entry_t entry;
	int result = KERN_INVALID_ARGUMENT;

	vm_map_lock(map);
	VM_MAP_RANGE_CHECK(map, start, end);
	if (vm_map_lookup_entry(map, start, &entry)) {
		vm_map_clip_start(map, entry, start);
	} else
		entry = entry->next;

	vm_map_clip_end(map, entry, end);

	if ((entry->start == start) && (entry->end == end) &&
	    ((entry->eflags & MAP_ENTRY_COW) == 0) &&
	    (entry->object.vm_object == NULL)) {
		entry->object.sub_map = submap;
		entry->eflags |= MAP_ENTRY_IS_SUB_MAP;
		result = KERN_SUCCESS;
	}
	vm_map_unlock(map);

	return (result);
}

/*
 * The maximum number of pages to map if MAP_PREFAULT_PARTIAL is specified
 */
#define	MAX_INIT_PT	96

/*
 *	vm_map_pmap_enter:
 *
 *	Preload the specified map's pmap with mappings to the specified
 *	object's memory-resident pages.  No further physical pages are
 *	allocated, and no further virtual pages are retrieved from secondary
 *	storage.  If the specified flags include MAP_PREFAULT_PARTIAL, then a
 *	limited number of page mappings are created at the low-end of the
 *	specified address range.  (For this purpose, a superpage mapping
 *	counts as one page mapping.)  Otherwise, all resident pages within
 *	the specified address range are mapped.  Because these mappings are
 *	being created speculatively, cached pages are not reactivated and
 *	mapped.
 */
static void
vm_map_pmap_enter(vm_map_t map, vm_offset_t addr, vm_prot_t prot,
    vm_object_t object, vm_pindex_t pindex, vm_size_t size, int flags)
{
	vm_offset_t start;
	vm_page_t p, p_start;
	vm_pindex_t mask, psize, threshold, tmpidx;

	if ((prot & (VM_PROT_READ | VM_PROT_EXECUTE)) == 0 || object == NULL)
		return;
	VM_OBJECT_RLOCK(object);
	if (object->type == OBJT_DEVICE || object->type == OBJT_SG) {
		VM_OBJECT_RUNLOCK(object);
		VM_OBJECT_WLOCK(object);
		if (object->type == OBJT_DEVICE || object->type == OBJT_SG) {
			pmap_object_init_pt(map->pmap, addr, object, pindex,
			    size);
			VM_OBJECT_WUNLOCK(object);
			return;
		}
		VM_OBJECT_LOCK_DOWNGRADE(object);
	}

	psize = atop(size);
	if (psize + pindex > object->size) {
		if (object->size < pindex) {
			VM_OBJECT_RUNLOCK(object);
			return;
		}
		psize = object->size - pindex;
	}

	start = 0;
	p_start = NULL;
	threshold = MAX_INIT_PT;

	p = vm_page_find_least(object, pindex);
	/*
	 * Assert: the variable p is either (1) the page with the
	 * least pindex greater than or equal to the parameter pindex
	 * or (2) NULL.
	 */
	for (;
	     p != NULL && (tmpidx = p->pindex - pindex) < psize;
	     p = TAILQ_NEXT(p, listq)) {
		/*
		 * Don't allow a madvise to blow away our really
		 * free pages allocating pv entries.
		 */
		if (((flags & MAP_PREFAULT_MADVISE) != 0 &&
		    vm_cnt.v_free_count < vm_cnt.v_free_reserved) ||
		    ((flags & MAP_PREFAULT_PARTIAL) != 0 &&
		    tmpidx >= threshold)) {
			psize = tmpidx;
			break;
		}
		if (p->valid == VM_PAGE_BITS_ALL) {
			if (p_start == NULL) {
				start = addr + ptoa(tmpidx);
				p_start = p;
			}
			/* Jump ahead if a superpage mapping is possible.
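			 * For example, with 4KB base pages and a fully valid
			 * 2MB reservation (p->psind == 1 on amd64), mask is
			 * 511, so the loop steps over the entire reservation
			 * in one iteration instead of visiting each of its
			 * 512 constituent pages.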
*/ if (p->psind > 0 && ((addr + ptoa(tmpidx)) & (pagesizes[p->psind] - 1)) == 0) { mask = atop(pagesizes[p->psind]) - 1; if (tmpidx + mask < psize && vm_page_ps_is_valid(p)) { p += mask; threshold += mask; } } } else if (p_start != NULL) { pmap_enter_object(map->pmap, start, addr + ptoa(tmpidx), p_start, prot); p_start = NULL; } } if (p_start != NULL) pmap_enter_object(map->pmap, start, addr + ptoa(psize), p_start, prot); VM_OBJECT_RUNLOCK(object); } /* * vm_map_protect: * * Sets the protection of the specified address * region in the target map. If "set_max" is * specified, the maximum protection is to be set; * otherwise, only the current protection is affected. */ int vm_map_protect(vm_map_t map, vm_offset_t start, vm_offset_t end, vm_prot_t new_prot, boolean_t set_max) { vm_map_entry_t current, entry; vm_object_t obj; struct ucred *cred; vm_prot_t old_prot; if (start == end) return (KERN_SUCCESS); vm_map_lock(map); VM_MAP_RANGE_CHECK(map, start, end); if (vm_map_lookup_entry(map, start, &entry)) { vm_map_clip_start(map, entry, start); } else { entry = entry->next; } /* * Make a first pass to check for protection violations. */ current = entry; while ((current != &map->header) && (current->start < end)) { if (current->eflags & MAP_ENTRY_IS_SUB_MAP) { vm_map_unlock(map); return (KERN_INVALID_ARGUMENT); } if ((new_prot & current->max_protection) != new_prot) { vm_map_unlock(map); return (KERN_PROTECTION_FAILURE); } current = current->next; } /* * Do an accounting pass for private read-only mappings that * now will do cow due to allowed write (e.g. debugger sets * breakpoint on text segment) */ for (current = entry; (current != &map->header) && (current->start < end); current = current->next) { vm_map_clip_end(map, current, end); if (set_max || ((new_prot & ~(current->protection)) & VM_PROT_WRITE) == 0 || ENTRY_CHARGED(current)) { continue; } cred = curthread->td_ucred; obj = current->object.vm_object; if (obj == NULL || (current->eflags & MAP_ENTRY_NEEDS_COPY)) { if (!swap_reserve(current->end - current->start)) { vm_map_unlock(map); return (KERN_RESOURCE_SHORTAGE); } crhold(cred); current->cred = cred; continue; } VM_OBJECT_WLOCK(obj); if (obj->type != OBJT_DEFAULT && obj->type != OBJT_SWAP) { VM_OBJECT_WUNLOCK(obj); continue; } /* * Charge for the whole object allocation now, since * we cannot distinguish between non-charged and * charged clipped mapping of the same object later. */ KASSERT(obj->charge == 0, ("vm_map_protect: object %p overcharged (entry %p)", obj, current)); if (!swap_reserve(ptoa(obj->size))) { VM_OBJECT_WUNLOCK(obj); vm_map_unlock(map); return (KERN_RESOURCE_SHORTAGE); } crhold(cred); obj->cred = cred; obj->charge = ptoa(obj->size); VM_OBJECT_WUNLOCK(obj); } /* * Go back and fix up protections. [Note that clipping is not * necessary the second time.] */ current = entry; while ((current != &map->header) && (current->start < end)) { old_prot = current->protection; if (set_max) current->protection = (current->max_protection = new_prot) & old_prot; else current->protection = new_prot; /* * For user wired map entries, the normal lazy evaluation of * write access upgrades through soft page faults is * undesirable. Instead, immediately copy any pages that are * copy-on-write and enable write access in the physical map. */ if ((current->eflags & MAP_ENTRY_USER_WIRED) != 0 && (current->protection & VM_PROT_WRITE) != 0 && (old_prot & VM_PROT_WRITE) == 0) vm_fault_copy_entry(map, map, current, current, NULL); /* * When restricting access, update the physical map. 
Worry
		 * about copy-on-write here.
		 */
		if ((old_prot & ~current->protection) != 0) {
#define MASK(entry)	(((entry)->eflags & MAP_ENTRY_COW) ? ~VM_PROT_WRITE : \
							VM_PROT_ALL)
			pmap_protect(map->pmap, current->start,
			    current->end,
			    current->protection & MASK(current));
#undef	MASK
		}
		vm_map_simplify_entry(map, current);
		current = current->next;
	}
	vm_map_unlock(map);
	return (KERN_SUCCESS);
}

/*
 *	vm_map_madvise:
 *
 *	This routine traverses a process's map handling the madvise
 *	system call.  Advisories are classified as either those affecting
 *	the vm_map_entry structure, or those affecting the underlying
 *	objects.
 */
int
vm_map_madvise(
	vm_map_t map,
	vm_offset_t start,
	vm_offset_t end,
	int behav)
{
	vm_map_entry_t current, entry;
	int modify_map = 0;

	/*
	 * Some madvise calls directly modify the vm_map_entry, in which case
	 * we need to use an exclusive lock on the map and we need to perform
	 * various clipping operations.  Otherwise we only need a read-lock
	 * on the map.
	 */
	switch (behav) {
	case MADV_NORMAL:
	case MADV_SEQUENTIAL:
	case MADV_RANDOM:
	case MADV_NOSYNC:
	case MADV_AUTOSYNC:
	case MADV_NOCORE:
	case MADV_CORE:
		if (start == end)
			return (KERN_SUCCESS);
		modify_map = 1;
		vm_map_lock(map);
		break;
	case MADV_WILLNEED:
	case MADV_DONTNEED:
	case MADV_FREE:
		if (start == end)
			return (KERN_SUCCESS);
		vm_map_lock_read(map);
		break;
	default:
		return (KERN_INVALID_ARGUMENT);
	}

	/*
	 * Locate starting entry and clip if necessary.
	 */
	VM_MAP_RANGE_CHECK(map, start, end);

	if (vm_map_lookup_entry(map, start, &entry)) {
		if (modify_map)
			vm_map_clip_start(map, entry, start);
	} else {
		entry = entry->next;
	}

	if (modify_map) {
		/*
		 * madvise behaviors that are implemented in the vm_map_entry.
		 *
		 * We clip the vm_map_entry so that behavioral changes are
		 * limited to the specified address range.
		 */
		for (current = entry;
		     (current != &map->header) && (current->start < end);
		     current = current->next
		) {
			if (current->eflags & MAP_ENTRY_IS_SUB_MAP)
				continue;

			vm_map_clip_end(map, current, end);

			switch (behav) {
			case MADV_NORMAL:
				vm_map_entry_set_behavior(current,
				    MAP_ENTRY_BEHAV_NORMAL);
				break;
			case MADV_SEQUENTIAL:
				vm_map_entry_set_behavior(current,
				    MAP_ENTRY_BEHAV_SEQUENTIAL);
				break;
			case MADV_RANDOM:
				vm_map_entry_set_behavior(current,
				    MAP_ENTRY_BEHAV_RANDOM);
				break;
			case MADV_NOSYNC:
				current->eflags |= MAP_ENTRY_NOSYNC;
				break;
			case MADV_AUTOSYNC:
				current->eflags &= ~MAP_ENTRY_NOSYNC;
				break;
			case MADV_NOCORE:
				current->eflags |= MAP_ENTRY_NOCOREDUMP;
				break;
			case MADV_CORE:
				current->eflags &= ~MAP_ENTRY_NOCOREDUMP;
				break;
			default:
				break;
			}
			vm_map_simplify_entry(map, current);
		}
		vm_map_unlock(map);
	} else {
		vm_pindex_t pstart, pend;

		/*
		 * madvise behaviors that are implemented in the underlying
		 * vm_object.
		 *
		 * Since we don't clip the vm_map_entry, we have to clip
		 * the vm_object pindex and count.
		 */
		for (current = entry;
		     (current != &map->header) && (current->start < end);
		     current = current->next
		) {
			vm_offset_t useEnd, useStart;

			if (current->eflags & MAP_ENTRY_IS_SUB_MAP)
				continue;

			pstart = OFF_TO_IDX(current->offset);
			pend = pstart + atop(current->end - current->start);
			useStart = current->start;
			useEnd = current->end;

			if (current->start < start) {
				pstart += atop(start - current->start);
				useStart = start;
			}
			if (current->end > end) {
				pend -= atop(current->end - end);
				useEnd = end;
			}

			if (pstart >= pend)
				continue;

			/*
			 * Perform the pmap_advise() before clearing
			 * PGA_REFERENCED in vm_page_advise().
Otherwise, a * concurrent pmap operation, such as pmap_remove(), * could clear a reference in the pmap and set * PGA_REFERENCED on the page before the pmap_advise() * had completed. Consequently, the page would appear * referenced based upon an old reference that * occurred before this pmap_advise() ran. */ if (behav == MADV_DONTNEED || behav == MADV_FREE) pmap_advise(map->pmap, useStart, useEnd, behav); vm_object_madvise(current->object.vm_object, pstart, pend, behav); /* * Pre-populate paging structures in the * WILLNEED case. For wired entries, the * paging structures are already populated. */ if (behav == MADV_WILLNEED && current->wired_count == 0) { vm_map_pmap_enter(map, useStart, current->protection, current->object.vm_object, pstart, ptoa(pend - pstart), MAP_PREFAULT_MADVISE ); } } vm_map_unlock_read(map); } return (0); } /* * vm_map_inherit: * * Sets the inheritance of the specified address * range in the target map. Inheritance * affects how the map will be shared with * child maps at the time of vmspace_fork. */ int vm_map_inherit(vm_map_t map, vm_offset_t start, vm_offset_t end, vm_inherit_t new_inheritance) { vm_map_entry_t entry; vm_map_entry_t temp_entry; switch (new_inheritance) { case VM_INHERIT_NONE: case VM_INHERIT_COPY: case VM_INHERIT_SHARE: break; default: return (KERN_INVALID_ARGUMENT); } if (start == end) return (KERN_SUCCESS); vm_map_lock(map); VM_MAP_RANGE_CHECK(map, start, end); if (vm_map_lookup_entry(map, start, &temp_entry)) { entry = temp_entry; vm_map_clip_start(map, entry, start); } else entry = temp_entry->next; while ((entry != &map->header) && (entry->start < end)) { vm_map_clip_end(map, entry, end); entry->inheritance = new_inheritance; vm_map_simplify_entry(map, entry); entry = entry->next; } vm_map_unlock(map); return (KERN_SUCCESS); } /* * vm_map_unwire: * * Implements both kernel and user unwiring. */ int vm_map_unwire(vm_map_t map, vm_offset_t start, vm_offset_t end, int flags) { vm_map_entry_t entry, first_entry, tmp_entry; vm_offset_t saved_start; unsigned int last_timestamp; int rv; boolean_t need_wakeup, result, user_unwire; if (start == end) return (KERN_SUCCESS); user_unwire = (flags & VM_MAP_WIRE_USER) ? TRUE : FALSE; vm_map_lock(map); VM_MAP_RANGE_CHECK(map, start, end); if (!vm_map_lookup_entry(map, start, &first_entry)) { if (flags & VM_MAP_WIRE_HOLESOK) first_entry = first_entry->next; else { vm_map_unlock(map); return (KERN_INVALID_ADDRESS); } } last_timestamp = map->timestamp; entry = first_entry; while (entry != &map->header && entry->start < end) { if (entry->eflags & MAP_ENTRY_IN_TRANSITION) { /* * We have not yet clipped the entry. */ saved_start = (start >= entry->start) ? start : entry->start; entry->eflags |= MAP_ENTRY_NEEDS_WAKEUP; if (vm_map_unlock_and_wait(map, 0)) { /* * Allow interruption of user unwiring? */ } vm_map_lock(map); if (last_timestamp+1 != map->timestamp) { /* * Look again for the entry because the map was * modified while it was unlocked. * Specifically, the entry may have been * clipped, merged, or deleted. */ if (!vm_map_lookup_entry(map, saved_start, &tmp_entry)) { if (flags & VM_MAP_WIRE_HOLESOK) tmp_entry = tmp_entry->next; else { if (saved_start == start) { /* * First_entry has been deleted. 
*/ vm_map_unlock(map); return (KERN_INVALID_ADDRESS); } end = saved_start; rv = KERN_INVALID_ADDRESS; goto done; } } if (entry == first_entry) first_entry = tmp_entry; else first_entry = NULL; entry = tmp_entry; } last_timestamp = map->timestamp; continue; } vm_map_clip_start(map, entry, start); vm_map_clip_end(map, entry, end); /* * Mark the entry in case the map lock is released. (See * above.) */ KASSERT((entry->eflags & MAP_ENTRY_IN_TRANSITION) == 0 && entry->wiring_thread == NULL, ("owned map entry %p", entry)); entry->eflags |= MAP_ENTRY_IN_TRANSITION; entry->wiring_thread = curthread; /* * Check the map for holes in the specified region. * If VM_MAP_WIRE_HOLESOK was specified, skip this check. */ if (((flags & VM_MAP_WIRE_HOLESOK) == 0) && (entry->end < end && (entry->next == &map->header || entry->next->start > entry->end))) { end = entry->end; rv = KERN_INVALID_ADDRESS; goto done; } /* * If system unwiring, require that the entry is system wired. */ if (!user_unwire && vm_map_entry_system_wired_count(entry) == 0) { end = entry->end; rv = KERN_INVALID_ARGUMENT; goto done; } entry = entry->next; } rv = KERN_SUCCESS; done: need_wakeup = FALSE; if (first_entry == NULL) { result = vm_map_lookup_entry(map, start, &first_entry); if (!result && (flags & VM_MAP_WIRE_HOLESOK)) first_entry = first_entry->next; else KASSERT(result, ("vm_map_unwire: lookup failed")); } for (entry = first_entry; entry != &map->header && entry->start < end; entry = entry->next) { /* * If VM_MAP_WIRE_HOLESOK was specified, an empty * space in the unwired region could have been mapped * while the map lock was dropped for draining * MAP_ENTRY_IN_TRANSITION. Moreover, another thread * could be simultaneously wiring this new mapping * entry. Detect these cases and skip any entries * marked as in transition by us. */ if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) == 0 || entry->wiring_thread != curthread) { KASSERT((flags & VM_MAP_WIRE_HOLESOK) != 0, ("vm_map_unwire: !HOLESOK and new/changed entry")); continue; } if (rv == KERN_SUCCESS && (!user_unwire || (entry->eflags & MAP_ENTRY_USER_WIRED))) { if (user_unwire) entry->eflags &= ~MAP_ENTRY_USER_WIRED; if (entry->wired_count == 1) vm_map_entry_unwire(map, entry); else entry->wired_count--; } KASSERT((entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0, ("vm_map_unwire: in-transition flag missing %p", entry)); KASSERT(entry->wiring_thread == curthread, ("vm_map_unwire: alien wire %p", entry)); entry->eflags &= ~MAP_ENTRY_IN_TRANSITION; entry->wiring_thread = NULL; if (entry->eflags & MAP_ENTRY_NEEDS_WAKEUP) { entry->eflags &= ~MAP_ENTRY_NEEDS_WAKEUP; need_wakeup = TRUE; } vm_map_simplify_entry(map, entry); } vm_map_unlock(map); if (need_wakeup) vm_map_wakeup(map); return (rv); } /* * vm_map_wire_entry_failure: * * Handle a wiring failure on the given entry. * * The map should be locked. */ static void vm_map_wire_entry_failure(vm_map_t map, vm_map_entry_t entry, vm_offset_t failed_addr) { VM_MAP_ASSERT_LOCKED(map); KASSERT((entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0 && entry->wired_count == 1, ("vm_map_wire_entry_failure: entry %p isn't being wired", entry)); KASSERT(failed_addr < entry->end, ("vm_map_wire_entry_failure: entry %p was fully wired", entry)); /* * If any pages at the start of this entry were successfully wired, * then unwire them. 
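	 * The fault loop in vm_map_wire() walks upward from entry->start,
	 * so exactly the range [entry->start, failed_addr) was wired
	 * before the failing fault, and that is the range backed out here.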
*/ if (failed_addr > entry->start) { pmap_unwire(map->pmap, entry->start, failed_addr); vm_object_unwire(entry->object.vm_object, entry->offset, failed_addr - entry->start, PQ_ACTIVE); } /* * Assign an out-of-range value to represent the failure to wire this * entry. */ entry->wired_count = -1; } /* * vm_map_wire: * * Implements both kernel and user wiring. */ int vm_map_wire(vm_map_t map, vm_offset_t start, vm_offset_t end, int flags) { vm_map_entry_t entry, first_entry, tmp_entry; vm_offset_t faddr, saved_end, saved_start; unsigned int last_timestamp; int rv; boolean_t need_wakeup, result, user_wire; vm_prot_t prot; if (start == end) return (KERN_SUCCESS); prot = 0; if (flags & VM_MAP_WIRE_WRITE) prot |= VM_PROT_WRITE; user_wire = (flags & VM_MAP_WIRE_USER) ? TRUE : FALSE; vm_map_lock(map); VM_MAP_RANGE_CHECK(map, start, end); if (!vm_map_lookup_entry(map, start, &first_entry)) { if (flags & VM_MAP_WIRE_HOLESOK) first_entry = first_entry->next; else { vm_map_unlock(map); return (KERN_INVALID_ADDRESS); } } last_timestamp = map->timestamp; entry = first_entry; while (entry != &map->header && entry->start < end) { if (entry->eflags & MAP_ENTRY_IN_TRANSITION) { /* * We have not yet clipped the entry. */ saved_start = (start >= entry->start) ? start : entry->start; entry->eflags |= MAP_ENTRY_NEEDS_WAKEUP; if (vm_map_unlock_and_wait(map, 0)) { /* * Allow interruption of user wiring? */ } vm_map_lock(map); if (last_timestamp + 1 != map->timestamp) { /* * Look again for the entry because the map was * modified while it was unlocked. * Specifically, the entry may have been * clipped, merged, or deleted. */ if (!vm_map_lookup_entry(map, saved_start, &tmp_entry)) { if (flags & VM_MAP_WIRE_HOLESOK) tmp_entry = tmp_entry->next; else { if (saved_start == start) { /* * first_entry has been deleted. */ vm_map_unlock(map); return (KERN_INVALID_ADDRESS); } end = saved_start; rv = KERN_INVALID_ADDRESS; goto done; } } if (entry == first_entry) first_entry = tmp_entry; else first_entry = NULL; entry = tmp_entry; } last_timestamp = map->timestamp; continue; } vm_map_clip_start(map, entry, start); vm_map_clip_end(map, entry, end); /* * Mark the entry in case the map lock is released. (See * above.) */ KASSERT((entry->eflags & MAP_ENTRY_IN_TRANSITION) == 0 && entry->wiring_thread == NULL, ("owned map entry %p", entry)); entry->eflags |= MAP_ENTRY_IN_TRANSITION; entry->wiring_thread = curthread; if ((entry->protection & (VM_PROT_READ | VM_PROT_EXECUTE)) == 0 || (entry->protection & prot) != prot) { entry->eflags |= MAP_ENTRY_WIRE_SKIPPED; if ((flags & VM_MAP_WIRE_HOLESOK) == 0) { end = entry->end; rv = KERN_INVALID_ADDRESS; goto done; } goto next_entry; } if (entry->wired_count == 0) { entry->wired_count++; saved_start = entry->start; saved_end = entry->end; /* * Release the map lock, relying on the in-transition * mark. Mark the map busy for fork. */ vm_map_busy(map); vm_map_unlock(map); faddr = saved_start; do { /* * Simulate a fault to get the page and enter * it into the physical map. */ if ((rv = vm_fault(map, faddr, VM_PROT_NONE, VM_FAULT_WIRE)) != KERN_SUCCESS) break; } while ((faddr += PAGE_SIZE) < saved_end); vm_map_lock(map); vm_map_unbusy(map); if (last_timestamp + 1 != map->timestamp) { /* * Look again for the entry because the map was * modified while it was unlocked. The entry * may have been clipped, but NOT merged or * deleted. 
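				 * (Merging and deletion are excluded because
				 * the entry is still marked
				 * MAP_ENTRY_IN_TRANSITION with wiring_thread
				 * set: vm_map_simplify_entry() skips such
				 * entries and vm_map_delete() waits for them.)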
*/ result = vm_map_lookup_entry(map, saved_start, &tmp_entry); KASSERT(result, ("vm_map_wire: lookup failed")); if (entry == first_entry) first_entry = tmp_entry; else first_entry = NULL; entry = tmp_entry; while (entry->end < saved_end) { /* * In case of failure, handle entries * that were not fully wired here; * fully wired entries are handled * later. */ if (rv != KERN_SUCCESS && faddr < entry->end) vm_map_wire_entry_failure(map, entry, faddr); entry = entry->next; } } last_timestamp = map->timestamp; if (rv != KERN_SUCCESS) { vm_map_wire_entry_failure(map, entry, faddr); end = entry->end; goto done; } } else if (!user_wire || (entry->eflags & MAP_ENTRY_USER_WIRED) == 0) { entry->wired_count++; } /* * Check the map for holes in the specified region. * If VM_MAP_WIRE_HOLESOK was specified, skip this check. */ next_entry: if (((flags & VM_MAP_WIRE_HOLESOK) == 0) && (entry->end < end && (entry->next == &map->header || entry->next->start > entry->end))) { end = entry->end; rv = KERN_INVALID_ADDRESS; goto done; } entry = entry->next; } rv = KERN_SUCCESS; done: need_wakeup = FALSE; if (first_entry == NULL) { result = vm_map_lookup_entry(map, start, &first_entry); if (!result && (flags & VM_MAP_WIRE_HOLESOK)) first_entry = first_entry->next; else KASSERT(result, ("vm_map_wire: lookup failed")); } for (entry = first_entry; entry != &map->header && entry->start < end; entry = entry->next) { if ((entry->eflags & MAP_ENTRY_WIRE_SKIPPED) != 0) goto next_entry_done; /* * If VM_MAP_WIRE_HOLESOK was specified, an empty * space in the unwired region could have been mapped * while the map lock was dropped for faulting in the * pages or draining MAP_ENTRY_IN_TRANSITION. * Moreover, another thread could be simultaneously * wiring this new mapping entry. Detect these cases * and skip any entries marked as in transition by us. */ if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) == 0 || entry->wiring_thread != curthread) { KASSERT((flags & VM_MAP_WIRE_HOLESOK) != 0, ("vm_map_wire: !HOLESOK and new/changed entry")); continue; } if (rv == KERN_SUCCESS) { if (user_wire) entry->eflags |= MAP_ENTRY_USER_WIRED; } else if (entry->wired_count == -1) { /* * Wiring failed on this entry. Thus, unwiring is * unnecessary. */ entry->wired_count = 0; } else if (!user_wire || (entry->eflags & MAP_ENTRY_USER_WIRED) == 0) { /* * Undo the wiring. Wiring succeeded on this entry * but failed on a later entry. */ if (entry->wired_count == 1) vm_map_entry_unwire(map, entry); else entry->wired_count--; } next_entry_done: KASSERT((entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0, ("vm_map_wire: in-transition flag missing %p", entry)); KASSERT(entry->wiring_thread == curthread, ("vm_map_wire: alien wire %p", entry)); entry->eflags &= ~(MAP_ENTRY_IN_TRANSITION | MAP_ENTRY_WIRE_SKIPPED); entry->wiring_thread = NULL; if (entry->eflags & MAP_ENTRY_NEEDS_WAKEUP) { entry->eflags &= ~MAP_ENTRY_NEEDS_WAKEUP; need_wakeup = TRUE; } vm_map_simplify_entry(map, entry); } vm_map_unlock(map); if (need_wakeup) vm_map_wakeup(map); return (rv); } /* * vm_map_sync * * Push any dirty cached pages in the address range to their pager. * If syncio is TRUE, dirty pages are written synchronously. * If invalidate is TRUE, any cached pages are freed as well. * * If the size of the region from start to end is zero, we are * supposed to flush all modified pages within the region containing * start. Unfortunately, a region can be split or coalesced with * neighboring regions, making it difficult to determine what the * original region was. 
Therefore, we approximate this requirement by * flushing the current region containing start. * * Returns an error if any part of the specified range is not mapped. */ int vm_map_sync( vm_map_t map, vm_offset_t start, vm_offset_t end, boolean_t syncio, boolean_t invalidate) { vm_map_entry_t current; vm_map_entry_t entry; vm_size_t size; vm_object_t object; vm_ooffset_t offset; unsigned int last_timestamp; boolean_t failed; vm_map_lock_read(map); VM_MAP_RANGE_CHECK(map, start, end); if (!vm_map_lookup_entry(map, start, &entry)) { vm_map_unlock_read(map); return (KERN_INVALID_ADDRESS); } else if (start == end) { start = entry->start; end = entry->end; } /* * Make a first pass to check for user-wired memory and holes. */ for (current = entry; current != &map->header && current->start < end; current = current->next) { if (invalidate && (current->eflags & MAP_ENTRY_USER_WIRED)) { vm_map_unlock_read(map); return (KERN_INVALID_ARGUMENT); } if (end > current->end && (current->next == &map->header || current->end != current->next->start)) { vm_map_unlock_read(map); return (KERN_INVALID_ADDRESS); } } if (invalidate) pmap_remove(map->pmap, start, end); failed = FALSE; /* * Make a second pass, cleaning/uncaching pages from the indicated * objects as we go. */ for (current = entry; current != &map->header && current->start < end;) { offset = current->offset + (start - current->start); size = (end <= current->end ? end : current->end) - start; if (current->eflags & MAP_ENTRY_IS_SUB_MAP) { vm_map_t smap; vm_map_entry_t tentry; vm_size_t tsize; smap = current->object.sub_map; vm_map_lock_read(smap); (void) vm_map_lookup_entry(smap, offset, &tentry); tsize = tentry->end - offset; if (tsize < size) size = tsize; object = tentry->object.vm_object; offset = tentry->offset + (offset - tentry->start); vm_map_unlock_read(smap); } else { object = current->object.vm_object; } vm_object_reference(object); last_timestamp = map->timestamp; vm_map_unlock_read(map); if (!vm_object_sync(object, offset, size, syncio, invalidate)) failed = TRUE; start += size; vm_object_deallocate(object); vm_map_lock_read(map); if (last_timestamp == map->timestamp || !vm_map_lookup_entry(map, start, &current)) current = current->next; } vm_map_unlock_read(map); return (failed ? KERN_FAILURE : KERN_SUCCESS); } /* * vm_map_entry_unwire: [ internal use only ] * * Make the region specified by this entry pageable. * * The map in question should be locked. * [This is the reason for this routine's existence.] */ static void vm_map_entry_unwire(vm_map_t map, vm_map_entry_t entry) { VM_MAP_ASSERT_LOCKED(map); KASSERT(entry->wired_count > 0, ("vm_map_entry_unwire: entry %p isn't wired", entry)); pmap_unwire(map->pmap, entry->start, entry->end); vm_object_unwire(entry->object.vm_object, entry->offset, entry->end - entry->start, PQ_ACTIVE); entry->wired_count = 0; } static void vm_map_entry_deallocate(vm_map_entry_t entry, boolean_t system_map) { if ((entry->eflags & MAP_ENTRY_IS_SUB_MAP) == 0) vm_object_deallocate(entry->object.vm_object); uma_zfree(system_map ? kmapentzone : mapentzone, entry); } /* * vm_map_entry_delete: [ internal use only ] * * Deallocate the given entry from the target map.
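 *
 * A minimal, hypothetical caller sketch, assuming the map is
 * write-locked and the entry has already been unwired and removed
 * from the pmap:
 *
 *	if (vm_map_lookup_entry(map, addr, &entry))
 *		vm_map_entry_delete(map, entry);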
*/ static void vm_map_entry_delete(vm_map_t map, vm_map_entry_t entry) { vm_object_t object; vm_pindex_t offidxstart, offidxend, count, size1; vm_ooffset_t size; vm_map_entry_unlink(map, entry); object = entry->object.vm_object; size = entry->end - entry->start; map->size -= size; if (entry->cred != NULL) { swap_release_by_cred(size, entry->cred); crfree(entry->cred); } if ((entry->eflags & MAP_ENTRY_IS_SUB_MAP) == 0 && (object != NULL)) { KASSERT(entry->cred == NULL || object->cred == NULL || (entry->eflags & MAP_ENTRY_NEEDS_COPY), ("OVERCOMMIT vm_map_entry_delete: both cred %p", entry)); count = OFF_TO_IDX(size); offidxstart = OFF_TO_IDX(entry->offset); offidxend = offidxstart + count; VM_OBJECT_WLOCK(object); if (object->ref_count != 1 && ((object->flags & (OBJ_NOSPLIT|OBJ_ONEMAPPING)) == OBJ_ONEMAPPING || object == kernel_object || object == kmem_object)) { vm_object_collapse(object); /* * The option OBJPR_NOTMAPPED can be passed here * because vm_map_delete() already performed * pmap_remove() on the only mapping to this range * of pages. */ vm_object_page_remove(object, offidxstart, offidxend, OBJPR_NOTMAPPED); if (object->type == OBJT_SWAP) swap_pager_freespace(object, offidxstart, count); if (offidxend >= object->size && offidxstart < object->size) { size1 = object->size; object->size = offidxstart; if (object->cred != NULL) { size1 -= object->size; KASSERT(object->charge >= ptoa(size1), ("vm_map_entry_delete: object->charge < 0")); swap_release_by_cred(ptoa(size1), object->cred); object->charge -= ptoa(size1); } } } VM_OBJECT_WUNLOCK(object); } else entry->object.vm_object = NULL; if (map->system_map) vm_map_entry_deallocate(entry, TRUE); else { entry->next = curthread->td_map_def_user; curthread->td_map_def_user = entry; } } /* * vm_map_delete: [ internal use only ] * * Deallocates the given address range from the target * map. */ int vm_map_delete(vm_map_t map, vm_offset_t start, vm_offset_t end) { vm_map_entry_t entry; vm_map_entry_t first_entry; VM_MAP_ASSERT_LOCKED(map); if (start == end) return (KERN_SUCCESS); /* * Find the start of the region, and clip it */ if (!vm_map_lookup_entry(map, start, &first_entry)) entry = first_entry->next; else { entry = first_entry; vm_map_clip_start(map, entry, start); } /* * Step through all entries in this region */ while ((entry != &map->header) && (entry->start < end)) { vm_map_entry_t next; /* * Wait for wiring or unwiring of an entry to complete. * Also wait for any system wirings to disappear on * user maps. */ if ((entry->eflags & MAP_ENTRY_IN_TRANSITION) != 0 || (vm_map_pmap(map) != kernel_pmap && vm_map_entry_system_wired_count(entry) != 0)) { unsigned int last_timestamp; vm_offset_t saved_start; vm_map_entry_t tmp_entry; saved_start = entry->start; entry->eflags |= MAP_ENTRY_NEEDS_WAKEUP; last_timestamp = map->timestamp; (void) vm_map_unlock_and_wait(map, 0); vm_map_lock(map); if (last_timestamp + 1 != map->timestamp) { /* * Look again for the entry because the map was * modified while it was unlocked. * Specifically, the entry may have been * clipped, merged, or deleted. */ if (!vm_map_lookup_entry(map, saved_start, &tmp_entry)) entry = tmp_entry->next; else { entry = tmp_entry; vm_map_clip_start(map, entry, saved_start); } } continue; } vm_map_clip_end(map, entry, end); next = entry->next; /* * Unwire before removing addresses from the pmap; otherwise, * unwiring will put the entries back in the pmap. 
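 * In other words, the required order below is: (1) vm_map_entry_unwire()
 * to drop the wiring, (2) pmap_remove() to tear down the page table
 * entries, and only then (3) vm_map_entry_delete() to free the entry and
 * release its backing pages.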
*/ if (entry->wired_count != 0) { vm_map_entry_unwire(map, entry); } pmap_remove(map->pmap, entry->start, entry->end); /* * Delete the entry only after removing all pmap * entries pointing to its pages. (Otherwise, its * page frames may be reallocated, and any modify bits * will be set in the wrong object!) */ vm_map_entry_delete(map, entry); entry = next; } return (KERN_SUCCESS); } /* * vm_map_remove: * * Remove the given address range from the target map. * This is the exported form of vm_map_delete. */ int vm_map_remove(vm_map_t map, vm_offset_t start, vm_offset_t end) { int result; vm_map_lock(map); VM_MAP_RANGE_CHECK(map, start, end); result = vm_map_delete(map, start, end); vm_map_unlock(map); return (result); } /* * vm_map_check_protection: * * Assert that the target map allows the specified privilege on the * entire address region given. The entire region must be allocated. * * WARNING! This code does not and should not check whether the * contents of the region are accessible. For example, a smaller file * might be mapped into a larger address space. * * NOTE! This code is also called by munmap(). * * The map must be locked. A read lock is sufficient. */ boolean_t vm_map_check_protection(vm_map_t map, vm_offset_t start, vm_offset_t end, vm_prot_t protection) { vm_map_entry_t entry; vm_map_entry_t tmp_entry; if (!vm_map_lookup_entry(map, start, &tmp_entry)) return (FALSE); entry = tmp_entry; while (start < end) { if (entry == &map->header) return (FALSE); /* * No holes allowed! */ if (start < entry->start) return (FALSE); /* * Check protection associated with entry. */ if ((entry->protection & protection) != protection) return (FALSE); /* go to next entry */ start = entry->end; entry = entry->next; } return (TRUE); } /* * vm_map_copy_entry: * * Copies the contents of the source entry to the destination * entry. The entries *must* be aligned properly. */ static void vm_map_copy_entry( vm_map_t src_map, vm_map_t dst_map, vm_map_entry_t src_entry, vm_map_entry_t dst_entry, vm_ooffset_t *fork_charge) { vm_object_t src_object; vm_map_entry_t fake_entry; vm_offset_t size; struct ucred *cred; int charged; VM_MAP_ASSERT_LOCKED(dst_map); if ((dst_entry->eflags|src_entry->eflags) & MAP_ENTRY_IS_SUB_MAP) return; if (src_entry->wired_count == 0 || (src_entry->protection & VM_PROT_WRITE) == 0) { /* * If the source entry is marked needs_copy, it is already * write-protected. */ if ((src_entry->eflags & MAP_ENTRY_NEEDS_COPY) == 0 && (src_entry->protection & VM_PROT_WRITE) != 0) { pmap_protect(src_map->pmap, src_entry->start, src_entry->end, src_entry->protection & ~VM_PROT_WRITE); } /* * Make a copy of the object.
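 * The "copy" is logical rather than physical: both entries end up
 * referencing the same object with MAP_ENTRY_COW | MAP_ENTRY_NEEDS_COPY
 * set, and the actual page copies are deferred until a write fault.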
*/ size = src_entry->end - src_entry->start; if ((src_object = src_entry->object.vm_object) != NULL) { VM_OBJECT_WLOCK(src_object); charged = ENTRY_CHARGED(src_entry); if ((src_object->handle == NULL) && (src_object->type == OBJT_DEFAULT || src_object->type == OBJT_SWAP)) { vm_object_collapse(src_object); if ((src_object->flags & (OBJ_NOSPLIT|OBJ_ONEMAPPING)) == OBJ_ONEMAPPING) { vm_object_split(src_entry); src_object = src_entry->object.vm_object; } } vm_object_reference_locked(src_object); vm_object_clear_flag(src_object, OBJ_ONEMAPPING); if (src_entry->cred != NULL && !(src_entry->eflags & MAP_ENTRY_NEEDS_COPY)) { KASSERT(src_object->cred == NULL, ("OVERCOMMIT: vm_map_copy_entry: cred %p", src_object)); src_object->cred = src_entry->cred; src_object->charge = size; } VM_OBJECT_WUNLOCK(src_object); dst_entry->object.vm_object = src_object; if (charged) { cred = curthread->td_ucred; crhold(cred); dst_entry->cred = cred; *fork_charge += size; if (!(src_entry->eflags & MAP_ENTRY_NEEDS_COPY)) { crhold(cred); src_entry->cred = cred; *fork_charge += size; } } src_entry->eflags |= (MAP_ENTRY_COW|MAP_ENTRY_NEEDS_COPY); dst_entry->eflags |= (MAP_ENTRY_COW|MAP_ENTRY_NEEDS_COPY); dst_entry->offset = src_entry->offset; if (src_entry->eflags & MAP_ENTRY_VN_WRITECNT) { /* * MAP_ENTRY_VN_WRITECNT cannot * indicate write reference from * src_entry, since the entry is * marked as needs copy. Allocate a * fake entry that is used to * decrement object->un_pager.vnp.writecount * at the appropriate time. Attach * fake_entry to the deferred list. */ fake_entry = vm_map_entry_create(dst_map); fake_entry->eflags = MAP_ENTRY_VN_WRITECNT; src_entry->eflags &= ~MAP_ENTRY_VN_WRITECNT; vm_object_reference(src_object); fake_entry->object.vm_object = src_object; fake_entry->start = src_entry->start; fake_entry->end = src_entry->end; fake_entry->next = curthread->td_map_def_user; curthread->td_map_def_user = fake_entry; } } else { dst_entry->object.vm_object = NULL; dst_entry->offset = 0; if (src_entry->cred != NULL) { dst_entry->cred = curthread->td_ucred; crhold(dst_entry->cred); *fork_charge += size; } } pmap_copy(dst_map->pmap, src_map->pmap, dst_entry->start, dst_entry->end - dst_entry->start, src_entry->start); } else { /* * We don't want to make writeable wired pages copy-on-write. * Immediately copy these pages into the new map by simulating * page faults. The new pages are pageable. */ vm_fault_copy_entry(dst_map, src_map, dst_entry, src_entry, fork_charge); } } /* * vmspace_map_entry_forked: * Update the newly-forked vmspace each time a map entry is inherited * or copied. The values for vm_dsize and vm_tsize are approximate * (and mostly-obsolete ideas in the face of mmap(2) et al.) 
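 * The classification below: an entry counts toward the stack size if it
 * has a GROWS_DOWN or GROWS_UP flag, toward vm_dsize if it starts inside
 * the data segment range, and toward vm_tsize if it starts inside the
 * text segment range; all entries contribute to the total map size.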
*/ static void vmspace_map_entry_forked(const struct vmspace *vm1, struct vmspace *vm2, vm_map_entry_t entry) { vm_size_t entrysize; vm_offset_t newend; entrysize = entry->end - entry->start; vm2->vm_map.size += entrysize; if (entry->eflags & (MAP_ENTRY_GROWS_DOWN | MAP_ENTRY_GROWS_UP)) { vm2->vm_ssize += btoc(entrysize); } else if (entry->start >= (vm_offset_t)vm1->vm_daddr && entry->start < (vm_offset_t)vm1->vm_daddr + ctob(vm1->vm_dsize)) { newend = MIN(entry->end, (vm_offset_t)vm1->vm_daddr + ctob(vm1->vm_dsize)); vm2->vm_dsize += btoc(newend - entry->start); } else if (entry->start >= (vm_offset_t)vm1->vm_taddr && entry->start < (vm_offset_t)vm1->vm_taddr + ctob(vm1->vm_tsize)) { newend = MIN(entry->end, (vm_offset_t)vm1->vm_taddr + ctob(vm1->vm_tsize)); vm2->vm_tsize += btoc(newend - entry->start); } } /* * vmspace_fork: * Create a new process vmspace structure and vm_map * based on those of an existing process. The new map * is based on the old map, according to the inheritance * values on the regions in that map. * * XXX It might be worth coalescing the entries added to the new vmspace. * * The source map must not be locked. */ struct vmspace * vmspace_fork(struct vmspace *vm1, vm_ooffset_t *fork_charge) { struct vmspace *vm2; vm_map_t new_map, old_map; vm_map_entry_t new_entry, old_entry; vm_object_t object; int locked; old_map = &vm1->vm_map; /* Copy immutable fields of vm1 to vm2. */ vm2 = vmspace_alloc(old_map->min_offset, old_map->max_offset, NULL); if (vm2 == NULL) return (NULL); vm2->vm_taddr = vm1->vm_taddr; vm2->vm_daddr = vm1->vm_daddr; vm2->vm_maxsaddr = vm1->vm_maxsaddr; vm_map_lock(old_map); if (old_map->busy) vm_map_wait_busy(old_map); new_map = &vm2->vm_map; locked = vm_map_trylock(new_map); /* trylock to silence WITNESS */ KASSERT(locked, ("vmspace_fork: lock failed")); old_entry = old_map->header.next; while (old_entry != &old_map->header) { if (old_entry->eflags & MAP_ENTRY_IS_SUB_MAP) panic("vm_map_fork: encountered a submap"); switch (old_entry->inheritance) { case VM_INHERIT_NONE: break; case VM_INHERIT_SHARE: /* * Clone the entry, creating the shared object if necessary. */ object = old_entry->object.vm_object; if (object == NULL) { object = vm_object_allocate(OBJT_DEFAULT, atop(old_entry->end - old_entry->start)); old_entry->object.vm_object = object; old_entry->offset = 0; if (old_entry->cred != NULL) { object->cred = old_entry->cred; object->charge = old_entry->end - old_entry->start; old_entry->cred = NULL; } } /* * Add the reference before calling vm_object_shadow * to insure that a shadow object is created. */ vm_object_reference(object); if (old_entry->eflags & MAP_ENTRY_NEEDS_COPY) { vm_object_shadow(&old_entry->object.vm_object, &old_entry->offset, old_entry->end - old_entry->start); old_entry->eflags &= ~MAP_ENTRY_NEEDS_COPY; /* Transfer the second reference too. */ vm_object_reference( old_entry->object.vm_object); /* * As in vm_map_simplify_entry(), the * vnode lock will not be acquired in * this call to vm_object_deallocate(). */ vm_object_deallocate(object); object = old_entry->object.vm_object; } VM_OBJECT_WLOCK(object); vm_object_clear_flag(object, OBJ_ONEMAPPING); if (old_entry->cred != NULL) { KASSERT(object->cred == NULL, ("vmspace_fork both cred")); object->cred = old_entry->cred; object->charge = old_entry->end - old_entry->start; old_entry->cred = NULL; } /* * Assert the correct state of the vnode * v_writecount while the object is locked, to * not relock it later for the assertion * correctness. 
*/ if (old_entry->eflags & MAP_ENTRY_VN_WRITECNT && object->type == OBJT_VNODE) { KASSERT(((struct vnode *)object->handle)-> v_writecount > 0, ("vmspace_fork: v_writecount %p", object)); KASSERT(object->un_pager.vnp.writemappings > 0, ("vmspace_fork: vnp.writecount %p", object)); } VM_OBJECT_WUNLOCK(object); /* * Clone the entry, referencing the shared object. */ new_entry = vm_map_entry_create(new_map); *new_entry = *old_entry; new_entry->eflags &= ~(MAP_ENTRY_USER_WIRED | MAP_ENTRY_IN_TRANSITION); new_entry->wiring_thread = NULL; new_entry->wired_count = 0; if (new_entry->eflags & MAP_ENTRY_VN_WRITECNT) { vnode_pager_update_writecount(object, new_entry->start, new_entry->end); } /* * Insert the entry into the new map -- we know we're * inserting at the end of the new map. */ vm_map_entry_link(new_map, new_map->header.prev, new_entry); vmspace_map_entry_forked(vm1, vm2, new_entry); /* * Update the physical map */ pmap_copy(new_map->pmap, old_map->pmap, new_entry->start, (old_entry->end - old_entry->start), old_entry->start); break; case VM_INHERIT_COPY: /* * Clone the entry and link into the map. */ new_entry = vm_map_entry_create(new_map); *new_entry = *old_entry; /* * Copied entry is COW over the old object. */ new_entry->eflags &= ~(MAP_ENTRY_USER_WIRED | MAP_ENTRY_IN_TRANSITION | MAP_ENTRY_VN_WRITECNT); new_entry->wiring_thread = NULL; new_entry->wired_count = 0; new_entry->object.vm_object = NULL; new_entry->cred = NULL; vm_map_entry_link(new_map, new_map->header.prev, new_entry); vmspace_map_entry_forked(vm1, vm2, new_entry); vm_map_copy_entry(old_map, new_map, old_entry, new_entry, fork_charge); break; } old_entry = old_entry->next; } /* * Use inlined vm_map_unlock() to postpone handling the deferred * map entries, which cannot be done until both old_map and * new_map locks are released. */ sx_xunlock(&old_map->lock); sx_xunlock(&new_map->lock); vm_map_process_deferred(); return (vm2); } int vm_map_stack(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize, vm_prot_t prot, vm_prot_t max, int cow) { vm_size_t growsize, init_ssize; rlim_t lmemlim, vmemlim; int rv; growsize = sgrowsiz; init_ssize = (max_ssize < growsize) ? max_ssize : growsize; vm_map_lock(map); lmemlim = lim_cur(curthread, RLIMIT_MEMLOCK); vmemlim = lim_cur(curthread, RLIMIT_VMEM); if (!old_mlock && map->flags & MAP_WIREFUTURE) { if (ptoa(pmap_wired_count(map->pmap)) + init_ssize > lmemlim) { rv = KERN_NO_SPACE; goto out; } } /* If we would blow our VMEM resource limit, no go */ if (map->size + init_ssize > vmemlim) { rv = KERN_NO_SPACE; goto out; } rv = vm_map_stack_locked(map, addrbos, max_ssize, growsize, prot, max, cow); out: vm_map_unlock(map); return (rv); } static int vm_map_stack_locked(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize, vm_size_t growsize, vm_prot_t prot, vm_prot_t max, int cow) { vm_map_entry_t new_entry, prev_entry; vm_offset_t bot, top; vm_size_t init_ssize; int orient, rv; /* * The stack orientation is piggybacked with the cow argument. * Extract it into orient and mask the cow argument so that we * don't pass it around further. * NOTE: We explicitly allow bi-directional stacks. */ orient = cow & (MAP_STACK_GROWS_DOWN|MAP_STACK_GROWS_UP); KASSERT(orient != 0, ("No stack grow direction")); if (addrbos < vm_map_min(map) || addrbos > vm_map_max(map) || addrbos + max_ssize < addrbos) return (KERN_NO_SPACE); init_ssize = (max_ssize < growsize) ? 
max_ssize : growsize; /* If addr is already mapped, no go */ if (vm_map_lookup_entry(map, addrbos, &prev_entry)) return (KERN_NO_SPACE); /* * If we can't accommodate max_ssize in the current mapping, no go. * However, we need to be aware that subsequent user mappings might * map into the space we have reserved for stack, and currently this * space is not protected. * * Hopefully we will at least detect this condition when we try to * grow the stack. */ if ((prev_entry->next != &map->header) && (prev_entry->next->start < addrbos + max_ssize)) return (KERN_NO_SPACE); /* * We initially map a stack of only init_ssize. We will grow as * needed later. Depending on the orientation of the stack (i.e. * the grow direction) we either map at the top of the range, the * bottom of the range or in the middle. * * Note: we would normally expect prot and max to be VM_PROT_ALL, * and cow to be 0. Possibly we should eliminate these as input * parameters, and just pass these values here in the insert call. */ if (orient == MAP_STACK_GROWS_DOWN) bot = addrbos + max_ssize - init_ssize; else if (orient == MAP_STACK_GROWS_UP) bot = addrbos; else bot = round_page(addrbos + max_ssize/2 - init_ssize/2); top = bot + init_ssize; rv = vm_map_insert(map, NULL, 0, bot, top, prot, max, cow); /* Now set the avail_ssize amount. */ if (rv == KERN_SUCCESS) { new_entry = prev_entry->next; if (new_entry->end != top || new_entry->start != bot) panic("Bad entry start/end for new stack entry"); new_entry->avail_ssize = max_ssize - init_ssize; KASSERT((orient & MAP_STACK_GROWS_DOWN) == 0 || (new_entry->eflags & MAP_ENTRY_GROWS_DOWN) != 0, ("new entry lacks MAP_ENTRY_GROWS_DOWN")); KASSERT((orient & MAP_STACK_GROWS_UP) == 0 || (new_entry->eflags & MAP_ENTRY_GROWS_UP) != 0, ("new entry lacks MAP_ENTRY_GROWS_UP")); } return (rv); } static int stack_guard_page = 0; SYSCTL_INT(_security_bsd, OID_AUTO, stack_guard_page, CTLFLAG_RWTUN, &stack_guard_page, 0, "Insert stack guard page ahead of the growable segments."); /* Attempts to grow a vm stack entry. Returns KERN_SUCCESS if the * desired address is already mapped, or if we successfully grow * the stack. Also returns KERN_SUCCESS if addr is outside the * stack range (this is strange, but preserves compatibility with * the grow function in vm_machdep.c). */ int vm_map_growstack(struct proc *p, vm_offset_t addr) { vm_map_entry_t next_entry, prev_entry; vm_map_entry_t new_entry, stack_entry; struct vmspace *vm = p->p_vmspace; vm_map_t map = &vm->vm_map; vm_offset_t end; vm_size_t growsize; size_t grow_amount, max_grow; rlim_t lmemlim, stacklim, vmemlim; int is_procstack, rv; struct ucred *cred; #ifdef notyet uint64_t limit; #endif #ifdef RACCT int error; #endif lmemlim = lim_cur(curthread, RLIMIT_MEMLOCK); stacklim = lim_cur(curthread, RLIMIT_STACK); vmemlim = lim_cur(curthread, RLIMIT_VMEM); Retry: vm_map_lock_read(map); /* If addr is already in the entry range, no need to grow. */ if (vm_map_lookup_entry(map, addr, &prev_entry)) { vm_map_unlock_read(map); return (KERN_SUCCESS); } next_entry = prev_entry->next; if (!(prev_entry->eflags & MAP_ENTRY_GROWS_UP)) { /* * This entry does not grow upwards. Since the address lies * beyond this entry, the next entry (if one exists) has to * be a downward growable entry. The entry list header is * never a growable entry, so it suffices to check the flags. */ if (!(next_entry->eflags & MAP_ENTRY_GROWS_DOWN)) { vm_map_unlock_read(map); return (KERN_SUCCESS); } stack_entry = next_entry; } else { /* * This entry grows upward.
If the next entry does not at * least grow downwards, this is the entry we need to grow. * Otherwise we have two possible choices and we have to * select one. */ if (next_entry->eflags & MAP_ENTRY_GROWS_DOWN) { /* * We have two choices; grow the entry closest to * the address to minimize the amount of growth. */ if (addr - prev_entry->end <= next_entry->start - addr) stack_entry = prev_entry; else stack_entry = next_entry; } else stack_entry = prev_entry; } if (stack_entry == next_entry) { KASSERT(stack_entry->eflags & MAP_ENTRY_GROWS_DOWN, ("foo")); KASSERT(addr < stack_entry->start, ("foo")); end = (prev_entry != &map->header) ? prev_entry->end : stack_entry->start - stack_entry->avail_ssize; grow_amount = roundup(stack_entry->start - addr, PAGE_SIZE); max_grow = stack_entry->start - end; } else { KASSERT(stack_entry->eflags & MAP_ENTRY_GROWS_UP, ("foo")); KASSERT(addr >= stack_entry->end, ("foo")); end = (next_entry != &map->header) ? next_entry->start : stack_entry->end + stack_entry->avail_ssize; grow_amount = roundup(addr + 1 - stack_entry->end, PAGE_SIZE); max_grow = end - stack_entry->end; } if (grow_amount > stack_entry->avail_ssize) { vm_map_unlock_read(map); return (KERN_NO_SPACE); } /* * If there is no longer enough space between the entries, refuse to * grow, and adjust the available space. Note: this should only happen * if the user has mapped into the stack area after the stack was * created, and is probably an error. * * This also effectively destroys any guard page the user might have * intended by limiting the stack size. */ if (grow_amount + (stack_guard_page ? PAGE_SIZE : 0) > max_grow) { if (vm_map_lock_upgrade(map)) goto Retry; stack_entry->avail_ssize = max_grow; vm_map_unlock(map); return (KERN_NO_SPACE); } is_procstack = (addr >= (vm_offset_t)vm->vm_maxsaddr && addr < (vm_offset_t)p->p_sysent->sv_usrstack) ? 1 : 0; /* * If this is the main process stack, see if we're over the stack * limit.
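 * For example, with a (hypothetical) soft RLIMIT_STACK of 8 MB, a
 * request that would push ctob(vm->vm_ssize) past 8 MB is refused here
 * with KERN_NO_SPACE; after the grow amount is rounded up to the
 * sgrowsiz granularity below, it is instead clamped to the limit.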
*/ if (is_procstack && (ctob(vm->vm_ssize) + grow_amount > stacklim)) { vm_map_unlock_read(map); return (KERN_NO_SPACE); } #ifdef RACCT if (racct_enable) { PROC_LOCK(p); if (is_procstack && racct_set(p, RACCT_STACK, ctob(vm->vm_ssize) + grow_amount)) { PROC_UNLOCK(p); vm_map_unlock_read(map); return (KERN_NO_SPACE); } PROC_UNLOCK(p); } #endif /* Round up the grow amount modulo sgrowsiz */ growsize = sgrowsiz; grow_amount = roundup(grow_amount, growsize); if (grow_amount > stack_entry->avail_ssize) grow_amount = stack_entry->avail_ssize; if (is_procstack && (ctob(vm->vm_ssize) + grow_amount > stacklim)) { grow_amount = trunc_page((vm_size_t)stacklim) - ctob(vm->vm_ssize); } #ifdef notyet PROC_LOCK(p); limit = racct_get_available(p, RACCT_STACK); PROC_UNLOCK(p); if (is_procstack && (ctob(vm->vm_ssize) + grow_amount > limit)) grow_amount = limit - ctob(vm->vm_ssize); #endif if (!old_mlock && map->flags & MAP_WIREFUTURE) { if (ptoa(pmap_wired_count(map->pmap)) + grow_amount > lmemlim) { vm_map_unlock_read(map); rv = KERN_NO_SPACE; goto out; } #ifdef RACCT if (racct_enable) { PROC_LOCK(p); if (racct_set(p, RACCT_MEMLOCK, ptoa(pmap_wired_count(map->pmap)) + grow_amount)) { PROC_UNLOCK(p); vm_map_unlock_read(map); rv = KERN_NO_SPACE; goto out; } PROC_UNLOCK(p); } #endif } /* If we would blow our VMEM resource limit, no go */ if (map->size + grow_amount > vmemlim) { vm_map_unlock_read(map); rv = KERN_NO_SPACE; goto out; } #ifdef RACCT if (racct_enable) { PROC_LOCK(p); if (racct_set(p, RACCT_VMEM, map->size + grow_amount)) { PROC_UNLOCK(p); vm_map_unlock_read(map); rv = KERN_NO_SPACE; goto out; } PROC_UNLOCK(p); } #endif if (vm_map_lock_upgrade(map)) goto Retry; if (stack_entry == next_entry) { /* * Growing downward. */ /* Get the preliminary new entry start value */ addr = stack_entry->start - grow_amount; /* * If this puts us into the previous entry, cut back our * growth to the available space. Also, see the note above. */ if (addr < end) { stack_entry->avail_ssize = max_grow; addr = end; if (stack_guard_page) addr += PAGE_SIZE; } rv = vm_map_insert(map, NULL, 0, addr, stack_entry->start, next_entry->protection, next_entry->max_protection, MAP_STACK_GROWS_DOWN); /* Adjust the available stack space by the amount we grew. */ if (rv == KERN_SUCCESS) { new_entry = prev_entry->next; KASSERT(new_entry == stack_entry->prev, ("foo")); KASSERT(new_entry->end == stack_entry->start, ("foo")); KASSERT(new_entry->start == addr, ("foo")); KASSERT((new_entry->eflags & MAP_ENTRY_GROWS_DOWN) != 0, ("new entry lacks MAP_ENTRY_GROWS_DOWN")); grow_amount = new_entry->end - new_entry->start; new_entry->avail_ssize = stack_entry->avail_ssize - grow_amount; stack_entry->eflags &= ~MAP_ENTRY_GROWS_DOWN; } } else { /* * Growing upward. */ addr = stack_entry->end + grow_amount; /* * If this puts us into the next entry, cut back our growth * to the available space. Also, see the note above. */ if (addr > end) { stack_entry->avail_ssize = end - stack_entry->end; addr = end; if (stack_guard_page) addr -= PAGE_SIZE; } grow_amount = addr - stack_entry->end; cred = stack_entry->cred; if (cred == NULL && stack_entry->object.vm_object != NULL) cred = stack_entry->object.vm_object->cred; if (cred != NULL && !swap_reserve_by_cred(grow_amount, cred)) rv = KERN_NO_SPACE; /* Grow the underlying object if applicable. 
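 * vm_object_coalesce() tries to extend the existing anonymous object in
 * place so that the grown range shares it; if the object cannot be
 * extended, the grow fails with KERN_FAILURE.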
*/ else if (stack_entry->object.vm_object == NULL || vm_object_coalesce(stack_entry->object.vm_object, stack_entry->offset, (vm_size_t)(stack_entry->end - stack_entry->start), (vm_size_t)grow_amount, cred != NULL)) { map->size += (addr - stack_entry->end); /* Update the current entry. */ stack_entry->end = addr; stack_entry->avail_ssize -= grow_amount; vm_map_entry_resize_free(map, stack_entry); rv = KERN_SUCCESS; } else rv = KERN_FAILURE; } if (rv == KERN_SUCCESS && is_procstack) vm->vm_ssize += btoc(grow_amount); vm_map_unlock(map); /* * Heed the MAP_WIREFUTURE flag if it was set for this process. */ if (rv == KERN_SUCCESS && (map->flags & MAP_WIREFUTURE)) { vm_map_wire(map, (stack_entry == next_entry) ? addr : addr - grow_amount, (stack_entry == next_entry) ? stack_entry->start : addr, (p->p_flag & P_SYSTEM) ? VM_MAP_WIRE_SYSTEM|VM_MAP_WIRE_NOHOLES : VM_MAP_WIRE_USER|VM_MAP_WIRE_NOHOLES); } out: #ifdef RACCT if (racct_enable && rv != KERN_SUCCESS) { PROC_LOCK(p); error = racct_set(p, RACCT_VMEM, map->size); KASSERT(error == 0, ("decreasing RACCT_VMEM failed")); if (!old_mlock) { error = racct_set(p, RACCT_MEMLOCK, ptoa(pmap_wired_count(map->pmap))); KASSERT(error == 0, ("decreasing RACCT_MEMLOCK failed")); } error = racct_set(p, RACCT_STACK, ctob(vm->vm_ssize)); KASSERT(error == 0, ("decreasing RACCT_STACK failed")); PROC_UNLOCK(p); } #endif return (rv); } /* * Unshare the specified VM space for exec. If other processes are * mapped to it, then create a new one. The new vmspace is empty. */ int vmspace_exec(struct proc *p, vm_offset_t minuser, vm_offset_t maxuser) { struct vmspace *oldvmspace = p->p_vmspace; struct vmspace *newvmspace; KASSERT((curthread->td_pflags & TDP_EXECVMSPC) == 0, ("vmspace_exec recursed")); newvmspace = vmspace_alloc(minuser, maxuser, NULL); if (newvmspace == NULL) return (ENOMEM); newvmspace->vm_swrss = oldvmspace->vm_swrss; /* * This code is written like this for prototype purposes. The * goal is to avoid running down the vmspace here, but let the * other processes that are still using the vmspace finally * run it down. Even though there is little or no chance of blocking * here, it is a good idea to keep this form for future mods. */ PROC_VMSPACE_LOCK(p); p->p_vmspace = newvmspace; PROC_VMSPACE_UNLOCK(p); if (p == curthread->td_proc) pmap_activate(curthread); curthread->td_pflags |= TDP_EXECVMSPC; return (0); } /* * Unshare the specified VM space for forcing COW. This * is called by rfork, for the (RFMEM|RFPROC) == 0 case. */ int vmspace_unshare(struct proc *p) { struct vmspace *oldvmspace = p->p_vmspace; struct vmspace *newvmspace; vm_ooffset_t fork_charge; if (oldvmspace->vm_refcnt == 1) return (0); fork_charge = 0; newvmspace = vmspace_fork(oldvmspace, &fork_charge); if (newvmspace == NULL) return (ENOMEM); if (!swap_reserve_by_cred(fork_charge, p->p_ucred)) { vmspace_free(newvmspace); return (ENOMEM); } PROC_VMSPACE_LOCK(p); p->p_vmspace = newvmspace; PROC_VMSPACE_UNLOCK(p); if (p == curthread->td_proc) pmap_activate(curthread); vmspace_free(oldvmspace); return (0); } /* * vm_map_lookup: * * Finds the VM object, offset, and * protection for a given virtual address in the * specified map, assuming a page fault of the * type specified. * * Leaves the map in question locked for read; return * values are guaranteed until a vm_map_lookup_done * call is performed. Note that the map argument * is in/out; the returned map must be used in * the call to vm_map_lookup_done. * * A handle (out_entry) is returned for use in * vm_map_lookup_done, to make that fast.
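 *
 * A minimal, hypothetical fault-path sketch of the expected pairing:
 *
 *	rv = vm_map_lookup(&map, vaddr, fault_type, &entry, &object,
 *	    &pindex, &prot, &wired);
 *	if (rv != KERN_SUCCESS)
 *		return (rv);
 *	... resolve the page from (object, pindex) ...
 *	vm_map_lookup_done(map, entry);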
* * If a lookup is requested with "write protection" * specified, the map may be changed to perform virtual * copying operations, although the data referenced will * remain the same. */ int vm_map_lookup(vm_map_t *var_map, /* IN/OUT */ vm_offset_t vaddr, vm_prot_t fault_typea, vm_map_entry_t *out_entry, /* OUT */ vm_object_t *object, /* OUT */ vm_pindex_t *pindex, /* OUT */ vm_prot_t *out_prot, /* OUT */ boolean_t *wired) /* OUT */ { vm_map_entry_t entry; vm_map_t map = *var_map; vm_prot_t prot; vm_prot_t fault_type = fault_typea; vm_object_t eobject; vm_size_t size; struct ucred *cred; RetryLookup:; vm_map_lock_read(map); /* * Lookup the faulting address. */ if (!vm_map_lookup_entry(map, vaddr, out_entry)) { vm_map_unlock_read(map); return (KERN_INVALID_ADDRESS); } entry = *out_entry; /* * Handle submaps. */ if (entry->eflags & MAP_ENTRY_IS_SUB_MAP) { vm_map_t old_map = map; *var_map = map = entry->object.sub_map; vm_map_unlock_read(old_map); goto RetryLookup; } /* * Check whether this task is allowed to have this page. */ prot = entry->protection; fault_type &= (VM_PROT_READ|VM_PROT_WRITE|VM_PROT_EXECUTE); if ((fault_type & prot) != fault_type || prot == VM_PROT_NONE) { vm_map_unlock_read(map); return (KERN_PROTECTION_FAILURE); } KASSERT((prot & VM_PROT_WRITE) == 0 || (entry->eflags & (MAP_ENTRY_USER_WIRED | MAP_ENTRY_NEEDS_COPY)) != (MAP_ENTRY_USER_WIRED | MAP_ENTRY_NEEDS_COPY), ("entry %p flags %x", entry, entry->eflags)); if ((fault_typea & VM_PROT_COPY) != 0 && (entry->max_protection & VM_PROT_WRITE) == 0 && (entry->eflags & MAP_ENTRY_COW) == 0) { vm_map_unlock_read(map); return (KERN_PROTECTION_FAILURE); } /* * If this page is not pageable, we have to get it for all possible * accesses. */ *wired = (entry->wired_count != 0); if (*wired) fault_type = entry->protection; size = entry->end - entry->start; /* * If the entry was copy-on-write, we either ... */ if (entry->eflags & MAP_ENTRY_NEEDS_COPY) { /* * If we want to write the page, we may as well handle that * now since we've got the map locked. * * If we don't need to write the page, we just demote the * permissions allowed. */ if ((fault_type & VM_PROT_WRITE) != 0 || (fault_typea & VM_PROT_COPY) != 0) { /* * Make a new object, and place it in the object * chain. Note that no new references have appeared * -- one just moved from the map to the new * object. */ if (vm_map_lock_upgrade(map)) goto RetryLookup; if (entry->cred == NULL) { /* * The debugger owner is charged for * the memory. */ cred = curthread->td_ucred; crhold(cred); if (!swap_reserve_by_cred(size, cred)) { crfree(cred); vm_map_unlock(map); return (KERN_RESOURCE_SHORTAGE); } entry->cred = cred; } vm_object_shadow(&entry->object.vm_object, &entry->offset, size); entry->eflags &= ~MAP_ENTRY_NEEDS_COPY; eobject = entry->object.vm_object; if (eobject->cred != NULL) { /* * The object was not shadowed. */ swap_release_by_cred(size, entry->cred); crfree(entry->cred); entry->cred = NULL; } else if (entry->cred != NULL) { VM_OBJECT_WLOCK(eobject); eobject->cred = entry->cred; eobject->charge = size; VM_OBJECT_WUNLOCK(eobject); entry->cred = NULL; } vm_map_lock_downgrade(map); } else { /* * We're attempting to read a copy-on-write page -- * don't allow writes. */ prot &= ~VM_PROT_WRITE; } } /* * Create an object if necessary. 
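 * Anonymous memory is thus allocated lazily: the backing OBJT_DEFAULT
 * object is only created here, on the first lookup that needs it, and
 * system maps never get an object allocated this way.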
*/ if (entry->object.vm_object == NULL && !map->system_map) { if (vm_map_lock_upgrade(map)) goto RetryLookup; entry->object.vm_object = vm_object_allocate(OBJT_DEFAULT, atop(size)); entry->offset = 0; if (entry->cred != NULL) { VM_OBJECT_WLOCK(entry->object.vm_object); entry->object.vm_object->cred = entry->cred; entry->object.vm_object->charge = size; VM_OBJECT_WUNLOCK(entry->object.vm_object); entry->cred = NULL; } vm_map_lock_downgrade(map); } /* * Return the object/offset from this entry. If the entry was * copy-on-write or empty, it has been fixed up. */ *pindex = OFF_TO_IDX((vaddr - entry->start) + entry->offset); *object = entry->object.vm_object; *out_prot = prot; return (KERN_SUCCESS); } /* * vm_map_lookup_locked: * * Lookup the faulting address. A version of vm_map_lookup that returns * KERN_FAILURE instead of blocking on map lock or memory allocation. */ int vm_map_lookup_locked(vm_map_t *var_map, /* IN/OUT */ vm_offset_t vaddr, vm_prot_t fault_typea, vm_map_entry_t *out_entry, /* OUT */ vm_object_t *object, /* OUT */ vm_pindex_t *pindex, /* OUT */ vm_prot_t *out_prot, /* OUT */ boolean_t *wired) /* OUT */ { vm_map_entry_t entry; vm_map_t map = *var_map; vm_prot_t prot; vm_prot_t fault_type = fault_typea; /* * Lookup the faulting address. */ if (!vm_map_lookup_entry(map, vaddr, out_entry)) return (KERN_INVALID_ADDRESS); entry = *out_entry; /* * Fail if the entry refers to a submap. */ if (entry->eflags & MAP_ENTRY_IS_SUB_MAP) return (KERN_FAILURE); /* * Check whether this task is allowed to have this page. */ prot = entry->protection; fault_type &= VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE; if ((fault_type & prot) != fault_type) return (KERN_PROTECTION_FAILURE); /* * If this page is not pageable, we have to get it for all possible * accesses. */ *wired = (entry->wired_count != 0); if (*wired) fault_type = entry->protection; if (entry->eflags & MAP_ENTRY_NEEDS_COPY) { /* * Fail if the entry was copy-on-write for a write fault. */ if (fault_type & VM_PROT_WRITE) return (KERN_FAILURE); /* * We're attempting to read a copy-on-write page -- * don't allow writes. */ prot &= ~VM_PROT_WRITE; } /* * Fail if an object should be created. */ if (entry->object.vm_object == NULL && !map->system_map) return (KERN_FAILURE); /* * Return the object/offset from this entry. If the entry was * copy-on-write or empty, it has been fixed up. */ *pindex = OFF_TO_IDX((vaddr - entry->start) + entry->offset); *object = entry->object.vm_object; *out_prot = prot; return (KERN_SUCCESS); } /* * vm_map_lookup_done: * * Releases locks acquired by a vm_map_lookup * (according to the handle returned by that lookup). 
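 * Every successful vm_map_lookup() must eventually be paired with a
 * call to this function to drop the read lock it left held.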
*/ void vm_map_lookup_done(vm_map_t map, vm_map_entry_t entry) { /* * Unlock the main-level map */ vm_map_unlock_read(map); } #include "opt_ddb.h" #ifdef DDB #include <sys/kdb.h> #include <ddb/ddb.h> static void vm_map_print(vm_map_t map) { vm_map_entry_t entry; db_iprintf("Task map %p: pmap=%p, nentries=%d, version=%u\n", (void *)map, (void *)map->pmap, map->nentries, map->timestamp); db_indent += 2; for (entry = map->header.next; entry != &map->header; entry = entry->next) { db_iprintf("map entry %p: start=%p, end=%p\n", (void *)entry, (void *)entry->start, (void *)entry->end); { static char *inheritance_name[4] = {"share", "copy", "none", "donate_copy"}; db_iprintf(" prot=%x/%x/%s", entry->protection, entry->max_protection, inheritance_name[(int)(unsigned char)entry->inheritance]); if (entry->wired_count != 0) db_printf(", wired"); } if (entry->eflags & MAP_ENTRY_IS_SUB_MAP) { db_printf(", share=%p, offset=0x%jx\n", (void *)entry->object.sub_map, (uintmax_t)entry->offset); if ((entry->prev == &map->header) || (entry->prev->object.sub_map != entry->object.sub_map)) { db_indent += 2; vm_map_print((vm_map_t)entry->object.sub_map); db_indent -= 2; } } else { if (entry->cred != NULL) db_printf(", ruid %d", entry->cred->cr_ruid); db_printf(", object=%p, offset=0x%jx", (void *)entry->object.vm_object, (uintmax_t)entry->offset); if (entry->object.vm_object && entry->object.vm_object->cred) db_printf(", obj ruid %d charge %jx", entry->object.vm_object->cred->cr_ruid, (uintmax_t)entry->object.vm_object->charge); if (entry->eflags & MAP_ENTRY_COW) db_printf(", copy (%s)", (entry->eflags & MAP_ENTRY_NEEDS_COPY) ? "needed" : "done"); db_printf("\n"); if ((entry->prev == &map->header) || (entry->prev->object.vm_object != entry->object.vm_object)) { db_indent += 2; vm_object_print((db_expr_t)(intptr_t) entry->object.vm_object, 0, 0, (char *)0); db_indent -= 2; } } } db_indent -= 2; } DB_SHOW_COMMAND(map, map) { if (!have_addr) { db_printf("usage: show map <addr>\n"); return; } vm_map_print((vm_map_t)addr); } DB_SHOW_COMMAND(procvm, procvm) { struct proc *p; if (have_addr) { p = (struct proc *) addr; } else { p = curproc; } db_printf("p = %p, vmspace = %p, map = %p, pmap = %p\n", (void *)p, (void *)p->p_vmspace, (void *)&p->p_vmspace->vm_map, (void *)vmspace_pmap(p->p_vmspace)); vm_map_print((vm_map_t)&p->p_vmspace->vm_map); } #endif /* DDB */ Index: projects/clang380-import/sys =================================================================== --- projects/clang380-import/sys (revision 294776) +++ projects/clang380-import/sys (revision 294777) Property changes on: projects/clang380-import/sys ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys:r294599-294776 Index: projects/clang380-import/tools/regression/sockets/unix_cmsg/unix_cmsg.c =================================================================== --- projects/clang380-import/tools/regression/sockets/unix_cmsg/unix_cmsg.c (revision 294776) +++ projects/clang380-import/tools/regression/sockets/unix_cmsg/unix_cmsg.c (revision 294777) @@ -1,1969 +1,1998 @@ /*- * Copyright (c) 2005 Andrey Simonenko * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2.
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* * There are tables with tests descriptions and pointers to test * functions. Each t_*() function returns 0 if its test passed, * -1 if its test failed, -2 if some system error occurred. * If a test function returns -2, then a program exits. * * If a test function forks a client process, then it waits for its * termination. If a return code of a client process is not equal * to zero, or if a client process was terminated by a signal, then * a test function returns -1 or -2 depending on exit status of * a client process. * * Each function which can block, is run under TIMEOUT. If timeout * occurs, then a test function returns -2 or a client process exits * with a non-zero return code. 
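 * In this program the timeout is implemented with the SO_RCVTIMEO and
 * SO_SNDTIMEO socket options set in socket_create(), plus a select()
 * timeout in socket_accept(); see those functions below.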
*/ #ifndef LISTENQ # define LISTENQ 1 #endif #ifndef TIMEOUT # define TIMEOUT 2 #endif static int t_cmsgcred(void); static int t_sockcred_1(void); static int t_sockcred_2(void); static int t_cmsgcred_sockcred(void); static int t_timeval(void); static int t_bintime(void); +/* + * The testcase fails on 64-bit architectures (amd64), but passes on 32-bit + * architectures (i386); see bug 206543 + */ +#ifndef __LP64__ static int t_cmsg_len(void); +#endif static int t_peercred(void); struct test_func { int (*func)(void); const char *desc; }; static const struct test_func test_stream_tbl[] = { { .func = NULL, .desc = "All tests" }, { .func = t_cmsgcred, .desc = "Sending, receiving cmsgcred" }, { .func = t_sockcred_1, .desc = "Receiving sockcred (listening socket)" }, { .func = t_sockcred_2, .desc = "Receiving sockcred (accepted socket)" }, { .func = t_cmsgcred_sockcred, .desc = "Sending cmsgcred, receiving sockcred" }, { .func = t_timeval, .desc = "Sending, receiving timeval" }, { .func = t_bintime, .desc = "Sending, receiving bintime" }, +#ifndef __LP64__ { .func = t_cmsg_len, .desc = "Check cmsghdr.cmsg_len" }, +#endif { .func = t_peercred, .desc = "Check LOCAL_PEERCRED socket option" } }; #define TEST_STREAM_TBL_SIZE \ (sizeof(test_stream_tbl) / sizeof(test_stream_tbl[0])) static const struct test_func test_dgram_tbl[] = { { .func = NULL, .desc = "All tests" }, { .func = t_cmsgcred, .desc = "Sending, receiving cmsgcred" }, { .func = t_sockcred_2, .desc = "Receiving sockcred" }, { .func = t_cmsgcred_sockcred, .desc = "Sending cmsgcred, receiving sockcred" }, { .func = t_timeval, .desc = "Sending, receiving timeval" }, { .func = t_bintime, .desc = "Sending, receiving bintime" }, +#ifndef __LP64__ { .func = t_cmsg_len, .desc = "Check cmsghdr.cmsg_len" } +#endif }; #define TEST_DGRAM_TBL_SIZE \ (sizeof(test_dgram_tbl) / sizeof(test_dgram_tbl[0])) static bool debug = false; static bool server_flag = true; static bool send_data_flag = true; static bool send_array_flag = true; static bool failed_flag = false; static int sock_type; static const char *sock_type_str; static const char *proc_name; static char work_dir[] = _PATH_TMP "unix_cmsg.XXXXXXX"; static int serv_sock_fd; static struct sockaddr_un serv_addr_sun; static struct { char *buf_send; char *buf_recv; size_t buf_size; u_int msg_num; } ipc_msg; #define IPC_MSG_NUM_DEF 5 #define IPC_MSG_NUM_MAX 10 #define IPC_MSG_SIZE_DEF 7 #define IPC_MSG_SIZE_MAX 128 static struct { uid_t uid; uid_t euid; gid_t gid; gid_t egid; gid_t *gid_arr; int gid_num; } proc_cred; static pid_t client_pid; #define SYNC_SERVER 0 #define SYNC_CLIENT 1 #define SYNC_RECV 0 #define SYNC_SEND 1 static int sync_fd[2][2]; #define LOGMSG_SIZE 128 static void logmsg(const char *, ...) __printflike(1, 2); static void logmsgx(const char *, ...) __printflike(1, 2); static void dbgmsg(const char *, ...) __printflike(1, 2); static void output(const char *, ...) 
__printflike(1, 2); static void usage(bool verbose) { u_int i; printf("usage: %s [-dh] [-n num] [-s size] [-t type] " "[-z value] [testno]\n", getprogname()); if (!verbose) return; printf("\n Options are:\n\ -d Output debugging information\n\ -h Output the help message and exit\n\ -n num Number of messages to send\n\ -s size Specify size of data for IPC\n\ -t type Specify socket type (stream, dgram) for tests\n\ -z value Do not send data in a message (bit 0x1), do not send\n\ data array associated with a cmsghdr structure (bit 0x2)\n\ testno Run one test by its number (require the -t option)\n\n"); printf(" Available tests for stream sockets:\n"); for (i = 0; i < TEST_STREAM_TBL_SIZE; ++i) printf(" %u: %s\n", i, test_stream_tbl[i].desc); printf("\n Available tests for datagram sockets:\n"); for (i = 0; i < TEST_DGRAM_TBL_SIZE; ++i) printf(" %u: %s\n", i, test_dgram_tbl[i].desc); } static void output(const char *format, ...) { char buf[LOGMSG_SIZE]; va_list ap; va_start(ap, format); if (vsnprintf(buf, sizeof(buf), format, ap) < 0) err(EXIT_FAILURE, "output: vsnprintf failed"); write(STDOUT_FILENO, buf, strlen(buf)); va_end(ap); } static void logmsg(const char *format, ...) { char buf[LOGMSG_SIZE]; va_list ap; int errno_save; errno_save = errno; va_start(ap, format); if (vsnprintf(buf, sizeof(buf), format, ap) < 0) err(EXIT_FAILURE, "logmsg: vsnprintf failed"); if (errno_save == 0) output("%s: %s\n", proc_name, buf); else output("%s: %s: %s\n", proc_name, buf, strerror(errno_save)); va_end(ap); errno = errno_save; } static void vlogmsgx(const char *format, va_list ap) { char buf[LOGMSG_SIZE]; if (vsnprintf(buf, sizeof(buf), format, ap) < 0) err(EXIT_FAILURE, "logmsgx: vsnprintf failed"); output("%s: %s\n", proc_name, buf); } static void logmsgx(const char *format, ...) { va_list ap; va_start(ap, format); vlogmsgx(format, ap); va_end(ap); } static void dbgmsg(const char *format, ...) { va_list ap; if (debug) { va_start(ap, format); vlogmsgx(format, ap); va_end(ap); } } static int run_tests(int type, u_int testno1) { const struct test_func *tf; u_int i, testno2, failed_num; sock_type = type; if (type == SOCK_STREAM) { sock_type_str = "SOCK_STREAM"; tf = test_stream_tbl; i = TEST_STREAM_TBL_SIZE - 1; } else { sock_type_str = "SOCK_DGRAM"; tf = test_dgram_tbl; i = TEST_DGRAM_TBL_SIZE - 1; } if (testno1 == 0) { testno1 = 1; testno2 = i; } else testno2 = testno1; output("Running tests for %s sockets:\n", sock_type_str); failed_num = 0; for (i = testno1, tf += testno1; i <= testno2; ++tf, ++i) { output(" %u: %s\n", i, tf->desc); switch (tf->func()) { case -1: ++failed_num; break; case -2: logmsgx("some system error or timeout occurred"); return (-1); } } if (failed_num != 0) failed_flag = true; if (testno1 != testno2) { if (failed_num == 0) output("-- all tests passed!\n"); else output("-- %u test%s failed!\n", failed_num, failed_num == 1 ? 
"" : "s"); } else { if (failed_num == 0) output("-- test passed!\n"); else output("-- test failed!\n"); } return (0); } static int init(void) { struct sigaction sigact; size_t idx; int rv; proc_name = "SERVER"; sigact.sa_handler = SIG_IGN; sigact.sa_flags = 0; sigemptyset(&sigact.sa_mask); if (sigaction(SIGPIPE, &sigact, (struct sigaction *)NULL) < 0) { logmsg("init: sigaction"); return (-1); } if (ipc_msg.buf_size == 0) ipc_msg.buf_send = ipc_msg.buf_recv = NULL; else { ipc_msg.buf_send = malloc(ipc_msg.buf_size); ipc_msg.buf_recv = malloc(ipc_msg.buf_size); if (ipc_msg.buf_send == NULL || ipc_msg.buf_recv == NULL) { logmsg("init: malloc"); return (-1); } for (idx = 0; idx < ipc_msg.buf_size; ++idx) ipc_msg.buf_send[idx] = (char)idx; } proc_cred.uid = getuid(); proc_cred.euid = geteuid(); proc_cred.gid = getgid(); proc_cred.egid = getegid(); proc_cred.gid_num = getgroups(0, (gid_t *)NULL); if (proc_cred.gid_num < 0) { logmsg("init: getgroups"); return (-1); } proc_cred.gid_arr = malloc(proc_cred.gid_num * sizeof(*proc_cred.gid_arr)); if (proc_cred.gid_arr == NULL) { logmsg("init: malloc"); return (-1); } if (getgroups(proc_cred.gid_num, proc_cred.gid_arr) < 0) { logmsg("init: getgroups"); return (-1); } memset(&serv_addr_sun, 0, sizeof(serv_addr_sun)); rv = snprintf(serv_addr_sun.sun_path, sizeof(serv_addr_sun.sun_path), "%s/%s", work_dir, proc_name); if (rv < 0) { logmsg("init: snprintf"); return (-1); } if ((size_t)rv >= sizeof(serv_addr_sun.sun_path)) { logmsgx("init: not enough space for socket pathname"); return (-1); } serv_addr_sun.sun_family = PF_LOCAL; serv_addr_sun.sun_len = SUN_LEN(&serv_addr_sun); return (0); } static int client_fork(void) { int fd1, fd2; if (pipe(sync_fd[SYNC_SERVER]) < 0 || pipe(sync_fd[SYNC_CLIENT]) < 0) { logmsg("client_fork: pipe"); return (-1); } client_pid = fork(); if (client_pid == (pid_t)-1) { logmsg("client_fork: fork"); return (-1); } if (client_pid == 0) { proc_name = "CLIENT"; server_flag = false; fd1 = sync_fd[SYNC_SERVER][SYNC_RECV]; fd2 = sync_fd[SYNC_CLIENT][SYNC_SEND]; } else { fd1 = sync_fd[SYNC_SERVER][SYNC_SEND]; fd2 = sync_fd[SYNC_CLIENT][SYNC_RECV]; } if (close(fd1) < 0 || close(fd2) < 0) { logmsg("client_fork: close"); return (-1); } return (client_pid != 0); } static void client_exit(int rv) { if (close(sync_fd[SYNC_SERVER][SYNC_SEND]) < 0 || close(sync_fd[SYNC_CLIENT][SYNC_RECV]) < 0) { logmsg("client_exit: close"); rv = -1; } rv = rv == 0 ? EXIT_SUCCESS : -rv; dbgmsg("exit: code %d", rv); _exit(rv); } static int client_wait(void) { int status; pid_t pid; dbgmsg("waiting for client"); if (close(sync_fd[SYNC_SERVER][SYNC_RECV]) < 0 || close(sync_fd[SYNC_CLIENT][SYNC_SEND]) < 0) { logmsg("client_wait: close"); return (-1); } pid = waitpid(client_pid, &status, 0); if (pid == (pid_t)-1) { logmsg("client_wait: waitpid"); return (-1); } if (WIFEXITED(status)) { if (WEXITSTATUS(status) != EXIT_SUCCESS) { logmsgx("client exit status is %d", WEXITSTATUS(status)); return (-WEXITSTATUS(status)); } } else { if (WIFSIGNALED(status)) logmsgx("abnormal termination of client, signal %d%s", WTERMSIG(status), WCOREDUMP(status) ? 
" (core file generated)" : ""); else logmsgx("termination of client, unknown status"); return (-1); } return (0); } int main(int argc, char *argv[]) { const char *errstr; u_int testno, zvalue; int opt, rv; bool dgram_flag, stream_flag; ipc_msg.buf_size = IPC_MSG_SIZE_DEF; ipc_msg.msg_num = IPC_MSG_NUM_DEF; dgram_flag = stream_flag = false; while ((opt = getopt(argc, argv, "dhn:s:t:z:")) != -1) switch (opt) { case 'd': debug = true; break; case 'h': usage(true); return (EXIT_SUCCESS); case 'n': ipc_msg.msg_num = strtonum(optarg, 1, IPC_MSG_NUM_MAX, &errstr); if (errstr != NULL) errx(EXIT_FAILURE, "option -n: number is %s", errstr); break; case 's': ipc_msg.buf_size = strtonum(optarg, 0, IPC_MSG_SIZE_MAX, &errstr); if (errstr != NULL) errx(EXIT_FAILURE, "option -s: number is %s", errstr); break; case 't': if (strcmp(optarg, "stream") == 0) stream_flag = true; else if (strcmp(optarg, "dgram") == 0) dgram_flag = true; else errx(EXIT_FAILURE, "option -t: " "wrong socket type"); break; case 'z': zvalue = strtonum(optarg, 0, 3, &errstr); if (errstr != NULL) errx(EXIT_FAILURE, "option -z: number is %s", errstr); if (zvalue & 0x1) send_data_flag = false; if (zvalue & 0x2) send_array_flag = false; break; default: usage(false); return (EXIT_FAILURE); } if (optind < argc) { if (optind + 1 != argc) errx(EXIT_FAILURE, "too many arguments"); testno = strtonum(argv[optind], 0, UINT_MAX, &errstr); if (errstr != NULL) errx(EXIT_FAILURE, "test number is %s", errstr); if (stream_flag && testno >= TEST_STREAM_TBL_SIZE) errx(EXIT_FAILURE, "given test %u for stream " "sockets does not exist", testno); if (dgram_flag && testno >= TEST_DGRAM_TBL_SIZE) errx(EXIT_FAILURE, "given test %u for datagram " "sockets does not exist", testno); } else testno = 0; if (!dgram_flag && !stream_flag) { if (testno != 0) errx(EXIT_FAILURE, "particular test number " "can be used with the -t option only"); dgram_flag = stream_flag = true; } if (mkdtemp(work_dir) == NULL) err(EXIT_FAILURE, "mkdtemp(%s)", work_dir); rv = EXIT_FAILURE; if (init() < 0) goto done; if (stream_flag) if (run_tests(SOCK_STREAM, testno) < 0) goto done; if (dgram_flag) if (run_tests(SOCK_DGRAM, testno) < 0) goto done; rv = EXIT_SUCCESS; done: if (rmdir(work_dir) < 0) { logmsg("rmdir(%s)", work_dir); rv = EXIT_FAILURE; } return (failed_flag ? 
EXIT_FAILURE : rv); } static int socket_close(int fd) { int rv; rv = 0; if (close(fd) < 0) { logmsg("socket_close: close"); rv = -1; } if (server_flag && fd == serv_sock_fd) if (unlink(serv_addr_sun.sun_path) < 0) { logmsg("socket_close: unlink(%s)", serv_addr_sun.sun_path); rv = -1; } return (rv); } static int socket_create(void) { struct timeval tv; int fd; fd = socket(PF_LOCAL, sock_type, 0); if (fd < 0) { logmsg("socket_create: socket(PF_LOCAL, %s, 0)", sock_type_str); return (-1); } if (server_flag) serv_sock_fd = fd; tv.tv_sec = TIMEOUT; tv.tv_usec = 0; if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 || setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) { logmsg("socket_create: setsockopt(SO_RCVTIMEO/SO_SNDTIMEO)"); goto failed; } if (server_flag) { if (bind(fd, (struct sockaddr *)&serv_addr_sun, serv_addr_sun.sun_len) < 0) { logmsg("socket_create: bind(%s)", serv_addr_sun.sun_path); goto failed; } if (sock_type == SOCK_STREAM) { int val; if (listen(fd, LISTENQ) < 0) { logmsg("socket_create: listen"); goto failed; } val = fcntl(fd, F_GETFL, 0); if (val < 0) { logmsg("socket_create: fcntl(F_GETFL)"); goto failed; } if (fcntl(fd, F_SETFL, val | O_NONBLOCK) < 0) { logmsg("socket_create: fcntl(F_SETFL)"); goto failed; } } } return (fd); failed: if (close(fd) < 0) logmsg("socket_create: close"); if (server_flag) if (unlink(serv_addr_sun.sun_path) < 0) logmsg("socket_close: unlink(%s)", serv_addr_sun.sun_path); return (-1); } static int socket_connect(int fd) { dbgmsg("connect"); if (connect(fd, (struct sockaddr *)&serv_addr_sun, serv_addr_sun.sun_len) < 0) { logmsg("socket_connect: connect(%s)", serv_addr_sun.sun_path); return (-1); } return (0); } static int sync_recv(void) { ssize_t ssize; int fd; char buf; dbgmsg("sync: wait"); fd = sync_fd[server_flag ? SYNC_SERVER : SYNC_CLIENT][SYNC_RECV]; ssize = read(fd, &buf, 1); if (ssize < 0) { logmsg("sync_recv: read"); return (-1); } if (ssize < 1) { logmsgx("sync_recv: read %zd of 1 byte", ssize); return (-1); } dbgmsg("sync: received"); return (0); } static int sync_send(void) { ssize_t ssize; int fd; dbgmsg("sync: send"); fd = sync_fd[server_flag ? SYNC_CLIENT : SYNC_SERVER][SYNC_SEND]; ssize = write(fd, "", 1); if (ssize < 0) { logmsg("sync_send: write"); return (-1); } if (ssize < 1) { logmsgx("sync_send: sent %zd of 1 byte", ssize); return (-1); } return (0); } static int message_send(int fd, const struct msghdr *msghdr) { const struct cmsghdr *cmsghdr; size_t size; ssize_t ssize; size = msghdr->msg_iov != 0 ? msghdr->msg_iov->iov_len : 0; dbgmsg("send: data size %zu", size); dbgmsg("send: msghdr.msg_controllen %u", (u_int)msghdr->msg_controllen); cmsghdr = CMSG_FIRSTHDR(msghdr); if (cmsghdr != NULL) dbgmsg("send: cmsghdr.cmsg_len %u", (u_int)cmsghdr->cmsg_len); ssize = sendmsg(fd, msghdr, 0); if (ssize < 0) { logmsg("message_send: sendmsg"); return (-1); } if ((size_t)ssize != size) { logmsgx("message_send: sendmsg: sent %zd of %zu bytes", ssize, size); return (-1); } if (!send_data_flag) if (sync_send() < 0) return (-1); return (0); } static int message_sendn(int fd, struct msghdr *msghdr) { u_int i; for (i = 1; i <= ipc_msg.msg_num; ++i) { dbgmsg("message #%u", i); if (message_send(fd, msghdr) < 0) return (-1); } return (0); } static int message_recv(int fd, struct msghdr *msghdr) { const struct cmsghdr *cmsghdr; size_t size; ssize_t ssize; if (!send_data_flag) if (sync_recv() < 0) return (-1); size = msghdr->msg_iov != NULL ? 
msghdr->msg_iov->iov_len : 0; ssize = recvmsg(fd, msghdr, MSG_WAITALL); if (ssize < 0) { logmsg("message_recv: recvmsg"); return (-1); } if ((size_t)ssize != size) { logmsgx("message_recv: recvmsg: received %zd of %zu bytes", ssize, size); return (-1); } dbgmsg("recv: data size %zd", ssize); dbgmsg("recv: msghdr.msg_controllen %u", (u_int)msghdr->msg_controllen); cmsghdr = CMSG_FIRSTHDR(msghdr); if (cmsghdr != NULL) dbgmsg("recv: cmsghdr.cmsg_len %u", (u_int)cmsghdr->cmsg_len); if (memcmp(ipc_msg.buf_recv, ipc_msg.buf_send, size) != 0) { logmsgx("message_recv: received message has wrong content"); return (-1); } return (0); } static int socket_accept(int listenfd) { fd_set rset; struct timeval tv; int fd, rv, val; dbgmsg("accept"); FD_ZERO(&rset); FD_SET(listenfd, &rset); tv.tv_sec = TIMEOUT; tv.tv_usec = 0; rv = select(listenfd + 1, &rset, (fd_set *)NULL, (fd_set *)NULL, &tv); if (rv < 0) { logmsg("socket_accept: select"); return (-1); } if (rv == 0) { logmsgx("socket_accept: select timeout"); return (-1); } fd = accept(listenfd, (struct sockaddr *)NULL, (socklen_t *)NULL); if (fd < 0) { logmsg("socket_accept: accept"); return (-1); } val = fcntl(fd, F_GETFL, 0); if (val < 0) { logmsg("socket_accept: fcntl(F_GETFL)"); goto failed; } if (fcntl(fd, F_SETFL, val & ~O_NONBLOCK) < 0) { logmsg("socket_accept: fcntl(F_SETFL)"); goto failed; } return (fd); failed: if (close(fd) < 0) logmsg("socket_accept: close"); return (-1); } static int check_msghdr(const struct msghdr *msghdr, size_t size) { if (msghdr->msg_flags & MSG_TRUNC) { logmsgx("msghdr.msg_flags has MSG_TRUNC"); return (-1); } if (msghdr->msg_flags & MSG_CTRUNC) { logmsgx("msghdr.msg_flags has MSG_CTRUNC"); return (-1); } if (msghdr->msg_controllen < size) { logmsgx("msghdr.msg_controllen %u < %zu", (u_int)msghdr->msg_controllen, size); return (-1); } if (msghdr->msg_controllen > 0 && size == 0) { logmsgx("msghdr.msg_controllen %u > 0", (u_int)msghdr->msg_controllen); return (-1); } return (0); } static int check_cmsghdr(const struct cmsghdr *cmsghdr, int type, size_t size) { if (cmsghdr == NULL) { logmsgx("cmsghdr is NULL"); return (-1); } if (cmsghdr->cmsg_level != SOL_SOCKET) { logmsgx("cmsghdr.cmsg_level %d != SOL_SOCKET", cmsghdr->cmsg_level); return (-1); } if (cmsghdr->cmsg_type != type) { logmsgx("cmsghdr.cmsg_type %d != %d", cmsghdr->cmsg_type, type); return (-1); } if (cmsghdr->cmsg_len != CMSG_LEN(size)) { logmsgx("cmsghdr.cmsg_len %u != %zu", (u_int)cmsghdr->cmsg_len, CMSG_LEN(size)); return (-1); } return (0); } static int check_groups(const char *gid_arr_str, const gid_t *gid_arr, const char *gid_num_str, int gid_num, bool all_gids) { int i; for (i = 0; i < gid_num; ++i) dbgmsg("%s[%d] %lu", gid_arr_str, i, (u_long)gid_arr[i]); if (all_gids) { if (gid_num != proc_cred.gid_num) { logmsgx("%s %d != %d", gid_num_str, gid_num, proc_cred.gid_num); return (-1); } } else { if (gid_num > proc_cred.gid_num) { logmsgx("%s %d > %d", gid_num_str, gid_num, proc_cred.gid_num); return (-1); } } if (memcmp(gid_arr, proc_cred.gid_arr, gid_num * sizeof(*gid_arr)) != 0) { logmsgx("%s content is wrong", gid_arr_str); for (i = 0; i < gid_num; ++i) if (gid_arr[i] != proc_cred.gid_arr[i]) { logmsgx("%s[%d] %lu != %lu", gid_arr_str, i, (u_long)gid_arr[i], (u_long)proc_cred.gid_arr[i]); break; } return (-1); } return (0); } static int check_xucred(const struct xucred *xucred, socklen_t len) { + int rc; + if (len != sizeof(*xucred)) { logmsgx("option value size %zu != %zu", (size_t)len, sizeof(*xucred)); return (-1); } dbgmsg("xucred.cr_version 
%u", xucred->cr_version); dbgmsg("xucred.cr_uid %lu", (u_long)xucred->cr_uid); dbgmsg("xucred.cr_ngroups %d", xucred->cr_ngroups); + rc = 0; + if (xucred->cr_version != XUCRED_VERSION) { logmsgx("xucred.cr_version %u != %d", xucred->cr_version, XUCRED_VERSION); - return (-1); + rc = -1; } if (xucred->cr_uid != proc_cred.euid) { logmsgx("xucred.cr_uid %lu != %lu (EUID)", (u_long)xucred->cr_uid, (u_long)proc_cred.euid); - return (-1); + rc = -1; } if (xucred->cr_ngroups == 0) { logmsgx("xucred.cr_ngroups == 0"); - return (-1); + rc = -1; } if (xucred->cr_ngroups < 0) { logmsgx("xucred.cr_ngroups < 0"); - return (-1); + rc = -1; } if (xucred->cr_ngroups > XU_NGROUPS) { logmsgx("xucred.cr_ngroups %hu > %u (max)", xucred->cr_ngroups, XU_NGROUPS); - return (-1); + rc = -1; } if (xucred->cr_groups[0] != proc_cred.egid) { logmsgx("xucred.cr_groups[0] %lu != %lu (EGID)", (u_long)xucred->cr_groups[0], (u_long)proc_cred.egid); - return (-1); + rc = -1; } if (check_groups("xucred.cr_groups", xucred->cr_groups, "xucred.cr_ngroups", xucred->cr_ngroups, false) < 0) - return (-1); - return (0); + rc = -1; + return (rc); } static int check_scm_creds_cmsgcred(struct cmsghdr *cmsghdr) { - const struct cmsgcred *cmsgcred; + const struct cmsgcred *cmcred; + int rc; - if (check_cmsghdr(cmsghdr, SCM_CREDS, sizeof(*cmsgcred)) < 0) + if (check_cmsghdr(cmsghdr, SCM_CREDS, sizeof(struct cmsgcred)) < 0) return (-1); - cmsgcred = (struct cmsgcred *)CMSG_DATA(cmsghdr); + cmcred = (struct cmsgcred *)CMSG_DATA(cmsghdr); - dbgmsg("cmsgcred.cmcred_pid %ld", (long)cmsgcred->cmcred_pid); - dbgmsg("cmsgcred.cmcred_uid %lu", (u_long)cmsgcred->cmcred_uid); - dbgmsg("cmsgcred.cmcred_euid %lu", (u_long)cmsgcred->cmcred_euid); - dbgmsg("cmsgcred.cmcred_gid %lu", (u_long)cmsgcred->cmcred_gid); - dbgmsg("cmsgcred.cmcred_ngroups %d", cmsgcred->cmcred_ngroups); + dbgmsg("cmsgcred.cmcred_pid %ld", (long)cmcred->cmcred_pid); + dbgmsg("cmsgcred.cmcred_uid %lu", (u_long)cmcred->cmcred_uid); + dbgmsg("cmsgcred.cmcred_euid %lu", (u_long)cmcred->cmcred_euid); + dbgmsg("cmsgcred.cmcred_gid %lu", (u_long)cmcred->cmcred_gid); + dbgmsg("cmsgcred.cmcred_ngroups %d", cmcred->cmcred_ngroups); - if (cmsgcred->cmcred_pid != client_pid) { + rc = 0; + + if (cmcred->cmcred_pid != client_pid) { logmsgx("cmsgcred.cmcred_pid %ld != %ld", - (long)cmsgcred->cmcred_pid, (long)client_pid); - return (-1); + (long)cmcred->cmcred_pid, (long)client_pid); + rc = -1; } - if (cmsgcred->cmcred_uid != proc_cred.uid) { + if (cmcred->cmcred_uid != proc_cred.uid) { logmsgx("cmsgcred.cmcred_uid %lu != %lu", - (u_long)cmsgcred->cmcred_uid, (u_long)proc_cred.uid); - return (-1); + (u_long)cmcred->cmcred_uid, (u_long)proc_cred.uid); + rc = -1; } - if (cmsgcred->cmcred_euid != proc_cred.euid) { + if (cmcred->cmcred_euid != proc_cred.euid) { logmsgx("cmsgcred.cmcred_euid %lu != %lu", - (u_long)cmsgcred->cmcred_euid, (u_long)proc_cred.euid); - return (-1); + (u_long)cmcred->cmcred_euid, (u_long)proc_cred.euid); + rc = -1; } - if (cmsgcred->cmcred_gid != proc_cred.gid) { + if (cmcred->cmcred_gid != proc_cred.gid) { logmsgx("cmsgcred.cmcred_gid %lu != %lu", - (u_long)cmsgcred->cmcred_gid, (u_long)proc_cred.gid); - return (-1); + (u_long)cmcred->cmcred_gid, (u_long)proc_cred.gid); + rc = -1; } - if (cmsgcred->cmcred_ngroups == 0) { + if (cmcred->cmcred_ngroups == 0) { logmsgx("cmsgcred.cmcred_ngroups == 0"); - return (-1); + rc = -1; } - if (cmsgcred->cmcred_ngroups < 0) { + if (cmcred->cmcred_ngroups < 0) { logmsgx("cmsgcred.cmcred_ngroups %d < 0", - cmsgcred->cmcred_ngroups); - 
return (-1); + cmcred->cmcred_ngroups); + rc = -1; } - if (cmsgcred->cmcred_ngroups > CMGROUP_MAX) { + if (cmcred->cmcred_ngroups > CMGROUP_MAX) { logmsgx("cmsgcred.cmcred_ngroups %d > %d", - cmsgcred->cmcred_ngroups, CMGROUP_MAX); - return (-1); + cmcred->cmcred_ngroups, CMGROUP_MAX); + rc = -1; } - if (cmsgcred->cmcred_groups[0] != proc_cred.egid) { + if (cmcred->cmcred_groups[0] != proc_cred.egid) { logmsgx("cmsgcred.cmcred_groups[0] %lu != %lu (EGID)", - (u_long)cmsgcred->cmcred_groups[0], (u_long)proc_cred.egid); - return (-1); + (u_long)cmcred->cmcred_groups[0], (u_long)proc_cred.egid); + rc = -1; } - if (check_groups("cmsgcred.cmcred_groups", cmsgcred->cmcred_groups, - "cmsgcred.cmcred_ngroups", cmsgcred->cmcred_ngroups, false) < 0) - return (-1); - return (0); + if (check_groups("cmsgcred.cmcred_groups", cmcred->cmcred_groups, + "cmsgcred.cmcred_ngroups", cmcred->cmcred_ngroups, false) < 0) + rc = -1; + return (rc); } static int check_scm_creds_sockcred(struct cmsghdr *cmsghdr) { - const struct sockcred *sockcred; + const struct sockcred *sc; + int rc; if (check_cmsghdr(cmsghdr, SCM_CREDS, SOCKCREDSIZE(proc_cred.gid_num)) < 0) return (-1); - sockcred = (struct sockcred *)CMSG_DATA(cmsghdr); + sc = (struct sockcred *)CMSG_DATA(cmsghdr); - dbgmsg("sockcred.sc_uid %lu", (u_long)sockcred->sc_uid); - dbgmsg("sockcred.sc_euid %lu", (u_long)sockcred->sc_euid); - dbgmsg("sockcred.sc_gid %lu", (u_long)sockcred->sc_gid); - dbgmsg("sockcred.sc_egid %lu", (u_long)sockcred->sc_egid); - dbgmsg("sockcred.sc_ngroups %d", sockcred->sc_ngroups); + rc = 0; - if (sockcred->sc_uid != proc_cred.uid) { + dbgmsg("sockcred.sc_uid %lu", (u_long)sc->sc_uid); + dbgmsg("sockcred.sc_euid %lu", (u_long)sc->sc_euid); + dbgmsg("sockcred.sc_gid %lu", (u_long)sc->sc_gid); + dbgmsg("sockcred.sc_egid %lu", (u_long)sc->sc_egid); + dbgmsg("sockcred.sc_ngroups %d", sc->sc_ngroups); + + if (sc->sc_uid != proc_cred.uid) { logmsgx("sockcred.sc_uid %lu != %lu", - (u_long)sockcred->sc_uid, (u_long)proc_cred.uid); - return (-1); + (u_long)sc->sc_uid, (u_long)proc_cred.uid); + rc = -1; } - if (sockcred->sc_euid != proc_cred.euid) { + if (sc->sc_euid != proc_cred.euid) { logmsgx("sockcred.sc_euid %lu != %lu", - (u_long)sockcred->sc_euid, (u_long)proc_cred.euid); - return (-1); + (u_long)sc->sc_euid, (u_long)proc_cred.euid); + rc = -1; } - if (sockcred->sc_gid != proc_cred.gid) { + if (sc->sc_gid != proc_cred.gid) { logmsgx("sockcred.sc_gid %lu != %lu", - (u_long)sockcred->sc_gid, (u_long)proc_cred.gid); - return (-1); + (u_long)sc->sc_gid, (u_long)proc_cred.gid); + rc = -1; } - if (sockcred->sc_egid != proc_cred.egid) { + if (sc->sc_egid != proc_cred.egid) { logmsgx("sockcred.sc_egid %lu != %lu", - (u_long)sockcred->sc_egid, (u_long)proc_cred.egid); - return (-1); + (u_long)sc->sc_egid, (u_long)proc_cred.egid); + rc = -1; } - if (sockcred->sc_ngroups == 0) { + if (sc->sc_ngroups == 0) { logmsgx("sockcred.sc_ngroups == 0"); - return (-1); + rc = -1; } - if (sockcred->sc_ngroups < 0) { + if (sc->sc_ngroups < 0) { logmsgx("sockcred.sc_ngroups %d < 0", - sockcred->sc_ngroups); - return (-1); + sc->sc_ngroups); + rc = -1; } - if (sockcred->sc_ngroups != proc_cred.gid_num) { + if (sc->sc_ngroups != proc_cred.gid_num) { logmsgx("sockcred.sc_ngroups %d != %u", - sockcred->sc_ngroups, proc_cred.gid_num); - return (-1); + sc->sc_ngroups, proc_cred.gid_num); + rc = -1; } - if (check_groups("sockcred.sc_groups", sockcred->sc_groups, - "sockcred.sc_ngroups", sockcred->sc_ngroups, true) < 0) - return (-1); - return (0); + if 
(check_groups("sockcred.sc_groups", sc->sc_groups, + "sockcred.sc_ngroups", sc->sc_ngroups, true) < 0) + rc = -1; + return (rc); } static int check_scm_timestamp(struct cmsghdr *cmsghdr) { - const struct timeval *timeval; + const struct timeval *tv; if (check_cmsghdr(cmsghdr, SCM_TIMESTAMP, sizeof(struct timeval)) < 0) return (-1); - timeval = (struct timeval *)CMSG_DATA(cmsghdr); + tv = (struct timeval *)CMSG_DATA(cmsghdr); dbgmsg("timeval.tv_sec %"PRIdMAX", timeval.tv_usec %"PRIdMAX, - (intmax_t)timeval->tv_sec, (intmax_t)timeval->tv_usec); + (intmax_t)tv->tv_sec, (intmax_t)tv->tv_usec); return (0); } static int check_scm_bintime(struct cmsghdr *cmsghdr) { - const struct bintime *bintime; + const struct bintime *bt; if (check_cmsghdr(cmsghdr, SCM_BINTIME, sizeof(struct bintime)) < 0) return (-1); - bintime = (struct bintime *)CMSG_DATA(cmsghdr); + bt = (struct bintime *)CMSG_DATA(cmsghdr); dbgmsg("bintime.sec %"PRIdMAX", bintime.frac %"PRIu64, - (intmax_t)bintime->sec, bintime->frac); + (intmax_t)bt->sec, bt->frac); return (0); } static void msghdr_init_generic(struct msghdr *msghdr, struct iovec *iov, void *cmsg_data) { msghdr->msg_name = NULL; msghdr->msg_namelen = 0; if (send_data_flag) { iov->iov_base = server_flag ? ipc_msg.buf_recv : ipc_msg.buf_send; iov->iov_len = ipc_msg.buf_size; msghdr->msg_iov = iov; msghdr->msg_iovlen = 1; } else { msghdr->msg_iov = NULL; msghdr->msg_iovlen = 0; } msghdr->msg_control = cmsg_data; msghdr->msg_flags = 0; } static void msghdr_init_server(struct msghdr *msghdr, struct iovec *iov, void *cmsg_data, size_t cmsg_size) { msghdr_init_generic(msghdr, iov, cmsg_data); msghdr->msg_controllen = cmsg_size; dbgmsg("init: data size %zu", msghdr->msg_iov != NULL ? msghdr->msg_iov->iov_len : (size_t)0); dbgmsg("init: msghdr.msg_controllen %u", (u_int)msghdr->msg_controllen); } static void msghdr_init_client(struct msghdr *msghdr, struct iovec *iov, void *cmsg_data, size_t cmsg_size, int type, size_t arr_size) { struct cmsghdr *cmsghdr; msghdr_init_generic(msghdr, iov, cmsg_data); if (cmsg_data != NULL) { + if (send_array_flag) + dbgmsg("sending an array"); + else + dbgmsg("sending a scalar"); msghdr->msg_controllen = send_array_flag ? cmsg_size : CMSG_SPACE(0); cmsghdr = CMSG_FIRSTHDR(msghdr); cmsghdr->cmsg_level = SOL_SOCKET; cmsghdr->cmsg_type = type; cmsghdr->cmsg_len = CMSG_LEN(send_array_flag ? 
arr_size : 0); } else msghdr->msg_controllen = 0; } static int t_generic(int (*client_func)(int), int (*server_func)(int)) { int fd, rv, rv_client; switch (client_fork()) { case 0: fd = socket_create(); if (fd < 0) rv = -2; else { rv = client_func(fd); if (socket_close(fd) < 0) rv = -2; } client_exit(rv); break; case 1: fd = socket_create(); if (fd < 0) rv = -2; else { rv = server_func(fd); rv_client = client_wait(); if (rv == 0 || (rv == -2 && rv_client != 0)) rv = rv_client; if (socket_close(fd) < 0) rv = -2; } break; default: rv = -2; } return (rv); } static int t_cmsgcred_client(int fd) { struct msghdr msghdr; struct iovec iov[1]; void *cmsg_data; size_t cmsg_size; int rv; if (sync_recv() < 0) return (-2); rv = -2; cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); cmsg_data = malloc(cmsg_size); if (cmsg_data == NULL) { logmsg("malloc"); goto done; } msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, SCM_CREDS, sizeof(struct cmsgcred)); if (socket_connect(fd) < 0) goto done; if (message_sendn(fd, &msghdr) < 0) goto done; rv = 0; done: free(cmsg_data); return (rv); } static int t_cmsgcred_server(int fd1) { struct msghdr msghdr; struct iovec iov[1]; struct cmsghdr *cmsghdr; void *cmsg_data; size_t cmsg_size; u_int i; int fd2, rv; if (sync_send() < 0) return (-2); fd2 = -1; rv = -2; cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); cmsg_data = malloc(cmsg_size); if (cmsg_data == NULL) { logmsg("malloc"); goto done; } if (sock_type == SOCK_STREAM) { fd2 = socket_accept(fd1); if (fd2 < 0) goto done; } else fd2 = fd1; rv = -1; for (i = 1; i <= ipc_msg.msg_num; ++i) { dbgmsg("message #%u", i); msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); if (message_recv(fd2, &msghdr) < 0) { rv = -2; break; } if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) break; cmsghdr = CMSG_FIRSTHDR(&msghdr); if (check_scm_creds_cmsgcred(cmsghdr) < 0) break; } if (i > ipc_msg.msg_num) rv = 0; done: free(cmsg_data); if (sock_type == SOCK_STREAM && fd2 >= 0) if (socket_close(fd2) < 0) rv = -2; return (rv); } static int t_cmsgcred(void) { return (t_generic(t_cmsgcred_client, t_cmsgcred_server)); } static int t_sockcred_client(int type, int fd) { struct msghdr msghdr; struct iovec iov[1]; int rv; if (sync_recv() < 0) return (-2); rv = -2; msghdr_init_client(&msghdr, iov, NULL, 0, 0, 0); if (socket_connect(fd) < 0) goto done; if (type == 2) if (sync_recv() < 0) goto done; if (message_sendn(fd, &msghdr) < 0) goto done; rv = 0; done: return (rv); } static int t_sockcred_server(int type, int fd1) { struct msghdr msghdr; struct iovec iov[1]; struct cmsghdr *cmsghdr; void *cmsg_data; size_t cmsg_size; u_int i; int fd2, rv, val; fd2 = -1; rv = -2; cmsg_size = CMSG_SPACE(SOCKCREDSIZE(proc_cred.gid_num)); cmsg_data = malloc(cmsg_size); if (cmsg_data == NULL) { logmsg("malloc"); goto done; } if (type == 1) { dbgmsg("setting LOCAL_CREDS"); val = 1; if (setsockopt(fd1, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { logmsg("setsockopt(LOCAL_CREDS)"); goto done; } } if (sync_send() < 0) goto done; if (sock_type == SOCK_STREAM) { fd2 = socket_accept(fd1); if (fd2 < 0) goto done; } else fd2 = fd1; if (type == 2) { dbgmsg("setting LOCAL_CREDS"); val = 1; if (setsockopt(fd2, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { logmsg("setsockopt(LOCAL_CREDS)"); goto done; } if (sync_send() < 0) goto done; } rv = -1; for (i = 1; i <= ipc_msg.msg_num; ++i) { dbgmsg("message #%u", i); msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); if (message_recv(fd2, &msghdr) < 0) { rv = -2; break; } if (i > 1 && sock_type == SOCK_STREAM) { if 
(check_msghdr(&msghdr, 0) < 0) break; } else { if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) break; cmsghdr = CMSG_FIRSTHDR(&msghdr); if (check_scm_creds_sockcred(cmsghdr) < 0) break; } } if (i > ipc_msg.msg_num) rv = 0; done: free(cmsg_data); if (sock_type == SOCK_STREAM && fd2 >= 0) if (socket_close(fd2) < 0) rv = -2; return (rv); } static int t_sockcred_1(void) { u_int i; int fd, rv, rv_client; switch (client_fork()) { case 0: for (i = 1; i <= 2; ++i) { dbgmsg("client #%u", i); fd = socket_create(); if (fd < 0) rv = -2; else { rv = t_sockcred_client(1, fd); if (socket_close(fd) < 0) rv = -2; } if (rv != 0) break; } client_exit(rv); break; case 1: fd = socket_create(); if (fd < 0) rv = -2; else { rv = t_sockcred_server(1, fd); if (rv == 0) rv = t_sockcred_server(3, fd); rv_client = client_wait(); if (rv == 0 || (rv == -2 && rv_client != 0)) rv = rv_client; if (socket_close(fd) < 0) rv = -2; } break; default: rv = -2; } return (rv); } static int t_sockcred_2_client(int fd) { return (t_sockcred_client(2, fd)); } static int t_sockcred_2_server(int fd) { return (t_sockcred_server(2, fd)); } static int t_sockcred_2(void) { return (t_generic(t_sockcred_2_client, t_sockcred_2_server)); } static int t_cmsgcred_sockcred_server(int fd1) { struct msghdr msghdr; struct iovec iov[1]; struct cmsghdr *cmsghdr; void *cmsg_data, *cmsg1_data, *cmsg2_data; size_t cmsg_size, cmsg1_size, cmsg2_size; u_int i; int fd2, rv, val; fd2 = -1; rv = -2; cmsg1_size = CMSG_SPACE(SOCKCREDSIZE(proc_cred.gid_num)); cmsg2_size = CMSG_SPACE(sizeof(struct cmsgcred)); cmsg1_data = malloc(cmsg1_size); cmsg2_data = malloc(cmsg2_size); if (cmsg1_data == NULL || cmsg2_data == NULL) { logmsg("malloc"); goto done; } dbgmsg("setting LOCAL_CREDS"); val = 1; if (setsockopt(fd1, 0, LOCAL_CREDS, &val, sizeof(val)) < 0) { logmsg("setsockopt(LOCAL_CREDS)"); goto done; } if (sync_send() < 0) goto done; if (sock_type == SOCK_STREAM) { fd2 = socket_accept(fd1); if (fd2 < 0) goto done; } else fd2 = fd1; cmsg_data = cmsg1_data; cmsg_size = cmsg1_size; rv = -1; for (i = 1; i <= ipc_msg.msg_num; ++i) { dbgmsg("message #%u", i); msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); if (message_recv(fd2, &msghdr) < 0) { rv = -2; break; } if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) break; cmsghdr = CMSG_FIRSTHDR(&msghdr); if (i == 1 || sock_type == SOCK_DGRAM) { if (check_scm_creds_sockcred(cmsghdr) < 0) break; } else { if (check_scm_creds_cmsgcred(cmsghdr) < 0) break; } cmsg_data = cmsg2_data; cmsg_size = cmsg2_size; } if (i > ipc_msg.msg_num) rv = 0; done: free(cmsg1_data); free(cmsg2_data); if (sock_type == SOCK_STREAM && fd2 >= 0) if (socket_close(fd2) < 0) rv = -2; return (rv); } static int t_cmsgcred_sockcred(void) { return (t_generic(t_cmsgcred_client, t_cmsgcred_sockcred_server)); } static int t_timeval_client(int fd) { struct msghdr msghdr; struct iovec iov[1]; void *cmsg_data; size_t cmsg_size; int rv; if (sync_recv() < 0) return (-2); rv = -2; cmsg_size = CMSG_SPACE(sizeof(struct timeval)); cmsg_data = malloc(cmsg_size); if (cmsg_data == NULL) { logmsg("malloc"); goto done; } msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, SCM_TIMESTAMP, sizeof(struct timeval)); if (socket_connect(fd) < 0) goto done; if (message_sendn(fd, &msghdr) < 0) goto done; rv = 0; done: free(cmsg_data); return (rv); } static int t_timeval_server(int fd1) { struct msghdr msghdr; struct iovec iov[1]; struct cmsghdr *cmsghdr; void *cmsg_data; size_t cmsg_size; u_int i; int fd2, rv; if (sync_send() < 0) return (-2); fd2 = -1; rv = -2; cmsg_size = 
CMSG_SPACE(sizeof(struct timeval)); cmsg_data = malloc(cmsg_size); if (cmsg_data == NULL) { logmsg("malloc"); goto done; } if (sock_type == SOCK_STREAM) { fd2 = socket_accept(fd1); if (fd2 < 0) goto done; } else fd2 = fd1; rv = -1; for (i = 1; i <= ipc_msg.msg_num; ++i) { dbgmsg("message #%u", i); msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); if (message_recv(fd2, &msghdr) < 0) { rv = -2; break; } if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) break; cmsghdr = CMSG_FIRSTHDR(&msghdr); if (check_scm_timestamp(cmsghdr) < 0) break; } if (i > ipc_msg.msg_num) rv = 0; done: free(cmsg_data); if (sock_type == SOCK_STREAM && fd2 >= 0) if (socket_close(fd2) < 0) rv = -2; return (rv); } static int t_timeval(void) { return (t_generic(t_timeval_client, t_timeval_server)); } static int t_bintime_client(int fd) { struct msghdr msghdr; struct iovec iov[1]; void *cmsg_data; size_t cmsg_size; int rv; if (sync_recv() < 0) return (-2); rv = -2; cmsg_size = CMSG_SPACE(sizeof(struct bintime)); cmsg_data = malloc(cmsg_size); if (cmsg_data == NULL) { logmsg("malloc"); goto done; } msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, SCM_BINTIME, sizeof(struct bintime)); if (socket_connect(fd) < 0) goto done; if (message_sendn(fd, &msghdr) < 0) goto done; rv = 0; done: free(cmsg_data); return (rv); } static int t_bintime_server(int fd1) { struct msghdr msghdr; struct iovec iov[1]; struct cmsghdr *cmsghdr; void *cmsg_data; size_t cmsg_size; u_int i; int fd2, rv; if (sync_send() < 0) return (-2); fd2 = -1; rv = -2; cmsg_size = CMSG_SPACE(sizeof(struct bintime)); cmsg_data = malloc(cmsg_size); if (cmsg_data == NULL) { logmsg("malloc"); goto done; } if (sock_type == SOCK_STREAM) { fd2 = socket_accept(fd1); if (fd2 < 0) goto done; } else fd2 = fd1; rv = -1; for (i = 1; i <= ipc_msg.msg_num; ++i) { dbgmsg("message #%u", i); msghdr_init_server(&msghdr, iov, cmsg_data, cmsg_size); if (message_recv(fd2, &msghdr) < 0) { rv = -2; break; } if (check_msghdr(&msghdr, sizeof(*cmsghdr)) < 0) break; cmsghdr = CMSG_FIRSTHDR(&msghdr); if (check_scm_bintime(cmsghdr) < 0) break; } if (i > ipc_msg.msg_num) rv = 0; done: free(cmsg_data); if (sock_type == SOCK_STREAM && fd2 >= 0) if (socket_close(fd2) < 0) rv = -2; return (rv); } static int t_bintime(void) { return (t_generic(t_bintime_client, t_bintime_server)); } +#ifndef __LP64__ static int t_cmsg_len_client(int fd) { struct msghdr msghdr; struct iovec iov[1]; struct cmsghdr *cmsghdr; void *cmsg_data; size_t size, cmsg_size; socklen_t socklen; int rv; if (sync_recv() < 0) return (-2); rv = -2; cmsg_size = CMSG_SPACE(sizeof(struct cmsgcred)); cmsg_data = malloc(cmsg_size); if (cmsg_data == NULL) { logmsg("malloc"); goto done; } msghdr_init_client(&msghdr, iov, cmsg_data, cmsg_size, SCM_CREDS, sizeof(struct cmsgcred)); cmsghdr = CMSG_FIRSTHDR(&msghdr); if (socket_connect(fd) < 0) goto done; size = msghdr.msg_iov != NULL ? 
msghdr.msg_iov->iov_len : 0; rv = -1; for (socklen = 0; socklen < CMSG_LEN(0); ++socklen) { cmsghdr->cmsg_len = socklen; dbgmsg("send: data size %zu", size); dbgmsg("send: msghdr.msg_controllen %u", (u_int)msghdr.msg_controllen); dbgmsg("send: cmsghdr.cmsg_len %u", (u_int)cmsghdr->cmsg_len); - if (sendmsg(fd, &msghdr, 0) < 0) + if (sendmsg(fd, &msghdr, 0) < 0) { + dbgmsg("sendmsg(2) failed: %s; retrying", + strerror(errno)); continue; + } logmsgx("sent message with cmsghdr.cmsg_len %u < %u", (u_int)cmsghdr->cmsg_len, (u_int)CMSG_LEN(0)); break; } if (socklen == CMSG_LEN(0)) rv = 0; if (sync_send() < 0) { rv = -2; goto done; } done: free(cmsg_data); return (rv); } static int t_cmsg_len_server(int fd1) { int fd2, rv; if (sync_send() < 0) return (-2); rv = -2; if (sock_type == SOCK_STREAM) { fd2 = socket_accept(fd1); if (fd2 < 0) goto done; } else fd2 = fd1; if (sync_recv() < 0) goto done; rv = 0; done: if (sock_type == SOCK_STREAM && fd2 >= 0) if (socket_close(fd2) < 0) rv = -2; return (rv); } static int t_cmsg_len(void) { return (t_generic(t_cmsg_len_client, t_cmsg_len_server)); } +#endif static int t_peercred_client(int fd) { struct xucred xucred; socklen_t len; if (sync_recv() < 0) return (-1); if (socket_connect(fd) < 0) return (-1); len = sizeof(xucred); if (getsockopt(fd, 0, LOCAL_PEERCRED, &xucred, &len) < 0) { logmsg("getsockopt(LOCAL_PEERCRED)"); return (-1); } if (check_xucred(&xucred, len) < 0) return (-1); return (0); } static int t_peercred_server(int fd1) { struct xucred xucred; socklen_t len; int fd2, rv; if (sync_send() < 0) return (-2); fd2 = socket_accept(fd1); if (fd2 < 0) return (-2); len = sizeof(xucred); if (getsockopt(fd2, 0, LOCAL_PEERCRED, &xucred, &len) < 0) { logmsg("getsockopt(LOCAL_PEERCRED)"); rv = -2; goto done; } if (check_xucred(&xucred, len) < 0) { rv = -1; goto done; } rv = 0; done: if (socket_close(fd2) < 0) rv = -2; return (rv); } static int t_peercred(void) { return (t_generic(t_peercred_client, t_peercred_server)); } Index: projects/clang380-import/tools/regression/sockets/zerosend/zerosend.c =================================================================== --- projects/clang380-import/tools/regression/sockets/zerosend/zerosend.c (revision 294776) +++ projects/clang380-import/tools/regression/sockets/zerosend/zerosend.c (revision 294777) @@ -1,290 +1,290 @@ /*- * Copyright (c) 2007 Robert N. M. Watson * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #include <sys/types.h> #include <sys/select.h> #include <sys/socket.h> #include <sys/stat.h> #include <netinet/in.h> #include <arpa/inet.h> #include <err.h> #include <errno.h> #include <fcntl.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #define PORT1 10001 #define PORT2 10002 static void try_0send(const char *test, int fd) { ssize_t len; char ch; ch = 0; len = send(fd, &ch, 0, 0); if (len < 0) err(1, "%s: try_0send", test); if (len != 0) errx(1, "%s: try_0send: returned %zd", test, len); } static void try_0write(const char *test, int fd) { ssize_t len; char ch; ch = 0; len = write(fd, &ch, 0); if (len < 0) err(1, "%s: try_0write", test); if (len != 0) errx(1, "%s: try_0write: returned %zd", test, len); } static void -setup_udp(const char *test, int *fdp) +setup_udp(const char *test, int *fdp, int port1, int port2) { struct sockaddr_in sin; int sock1, sock2; bzero(&sin, sizeof(sin)); sin.sin_len = sizeof(sin); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK); - sin.sin_port = htons(PORT1); + sin.sin_port = htons(port1); sock1 = socket(PF_INET, SOCK_DGRAM, 0); if (sock1 < 0) err(1, "%s: setup_udp: socket", test); if (bind(sock1, (struct sockaddr *)&sin, sizeof(sin)) < 0) - err(1, "%s: setup_udp: bind(%s, %d)", test, inet_ntoa(sin.sin_addr), PORT1); + err(1, "%s: setup_udp: bind(%s, %d)", test, inet_ntoa(sin.sin_addr), port1); - sin.sin_port = htons(PORT2); + sin.sin_port = htons(port2); if (connect(sock1, (struct sockaddr *)&sin, sizeof(sin)) < 0) - err(1, "%s: setup_udp: connect(%s, %d)", test, inet_ntoa(sin.sin_addr), PORT2); + err(1, "%s: setup_udp: connect(%s, %d)", test, inet_ntoa(sin.sin_addr), port2); sock2 = socket(PF_INET, SOCK_DGRAM, 0); if (sock2 < 0) err(1, "%s: setup_udp: socket", test); if (bind(sock2, (struct sockaddr *)&sin, sizeof(sin)) < 0) - err(1, "%s: setup_udp: bind(%s, %d)", test, inet_ntoa(sin.sin_addr), PORT2); + err(1, "%s: setup_udp: bind(%s, %d)", test, inet_ntoa(sin.sin_addr), port2); - sin.sin_port = htons(PORT1); + sin.sin_port = htons(port1); if (connect(sock2, (struct sockaddr *)&sin, sizeof(sin)) < 0) - err(1, "%s: setup_udp: connect(%s, %d)", test, inet_ntoa(sin.sin_addr), PORT1); + err(1, "%s: setup_udp: connect(%s, %d)", test, inet_ntoa(sin.sin_addr), port1); fdp[0] = sock1; fdp[1] = sock2; } static void -setup_tcp(const char *test, int *fdp) +setup_tcp(const char *test, int *fdp, int port) { fd_set writefds, exceptfds; struct sockaddr_in sin; int ret, sock1, sock2, sock3; struct timeval tv; bzero(&sin, sizeof(sin)); sin.sin_len = sizeof(sin); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* * First set up the listen socket. */ - sin.sin_port = htons(PORT1); + sin.sin_port = htons(port); sock1 = socket(PF_INET, SOCK_STREAM, 0); if (sock1 < 0) err(1, "%s: setup_tcp: socket", test); if (bind(sock1, (struct sockaddr *)&sin, sizeof(sin)) < 0) - err(1, "%s: bind(%s, %d)", test, inet_ntoa(sin.sin_addr), PORT1); + err(1, "%s: setup_tcp: bind(%s, %d)", test, inet_ntoa(sin.sin_addr), port); if (listen(sock1, -1) < 0) err(1, "%s: listen", test); /* * Now connect to it, non-blocking so that we don't deadlock against * ourselves. 
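The comment above describes the one pattern in this file that needs care: a single process connecting to its own listen socket. The connect must be non-blocking (EINPROGRESS is the expected "failure"), the connection is then accepted, and select(2) reporting the client side writable confirms the handshake completed. A condensed, self-contained sketch of the same pattern follows; self_connect() and EX_PORT are illustrative names, not part of the test:

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <err.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #define EX_PORT 10099                   /* illustrative port only */

    static void
    self_connect(int *fdp)
    {
            struct sockaddr_in sin;
            struct timeval tv;
            fd_set wfds;
            int lfd, cfd, afd;

            memset(&sin, 0, sizeof(sin));
            sin.sin_len = sizeof(sin);
            sin.sin_family = AF_INET;
            sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
            sin.sin_port = htons(EX_PORT);
            lfd = socket(PF_INET, SOCK_STREAM, 0);
            if (lfd < 0 || bind(lfd, (struct sockaddr *)&sin,
                sizeof(sin)) < 0 || listen(lfd, 1) < 0)
                    err(1, "listen side");
            cfd = socket(PF_INET, SOCK_STREAM, 0);
            if (cfd < 0 || fcntl(cfd, F_SETFL, O_NONBLOCK) < 0)
                    err(1, "connect side");
            /* Single-threaded, so this connect must not block. */
            if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 &&
                errno != EINPROGRESS)
                    err(1, "connect");
            afd = accept(lfd, NULL, NULL);  /* picks up our own connection */
            if (afd < 0)
                    err(1, "accept");
            FD_ZERO(&wfds);
            FD_SET(cfd, &wfds);
            tv.tv_sec = 1;
            tv.tv_usec = 0;
            /* Writable within the timeout: the handshake is done. */
            if (select(cfd + 1, NULL, &wfds, NULL, &tv) != 1)
                    errx(1, "connect did not complete");
            close(lfd);
            fdp[0] = cfd;
            fdp[1] = afd;
    }

    int
    main(void)
    {
            int fd[2];

            self_connect(fd);
            close(fd[0]);
            close(fd[1]);
            return (0);
    }

Unlike the test, the sketch omits the exceptfds set and the deliberate sleeps; it only demonstrates why O_NONBLOCK is required before the self-directed connect.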
*/ sock2 = socket(PF_INET, SOCK_STREAM, 0); if (sock2 < 0) err(1, "%s: setup_tcp: socket", test); if (fcntl(sock2, F_SETFL, O_NONBLOCK) < 0) err(1, "%s: setup_tcp: fcntl(O_NONBLOCK)", test); if (connect(sock2, (struct sockaddr *)&sin, sizeof(sin)) < 0 && errno != EINPROGRESS) - err(1, "%s: setup_tcp: connect(%s, %d)", test, inet_ntoa(sin.sin_addr), PORT1); + err(1, "%s: setup_tcp: connect(%s, %d)", test, inet_ntoa(sin.sin_addr), port); /* * Now pick up the connection after sleeping a moment to make sure * there's been time for some packets to go back and forth. */ if (sleep(1) != 0) err(1, "%s: sleep(1)", test); sock3 = accept(sock1, NULL, NULL); if (sock3 < 0) err(1, "%s: accept", test); if (sleep(1) != 0) err(1, "%s: sleep(1)", test); FD_ZERO(&writefds); FD_SET(sock2, &writefds); FD_ZERO(&exceptfds); FD_SET(sock2, &exceptfds); tv.tv_sec = 1; tv.tv_usec = 0; ret = select(sock2 + 1, NULL, &writefds, &exceptfds, &tv); if (ret < 0) err(1, "%s: setup_tcp: select", test); if (FD_ISSET(sock2, &exceptfds)) errx(1, "%s: setup_tcp: select: exception", test); if (!FD_ISSET(sock2, &writefds)) errx(1, "%s: setup_tcp: select: not writable", test); close(sock1); fdp[0] = sock2; fdp[1] = sock3; } static void setup_udsstream(const char *test, int *fdp) { if (socketpair(PF_LOCAL, SOCK_STREAM, 0, fdp) < 0) err(1, "%s: setup_udsstream: socketpair", test); } static void setup_udsdgram(const char *test, int *fdp) { if (socketpair(PF_LOCAL, SOCK_DGRAM, 0, fdp) < 0) err(1, "%s: setup_udsdgram: socketpair", test); } static void setup_pipe(const char *test, int *fdp) { if (pipe(fdp) < 0) err(1, "%s: setup_pipe: pipe", test); } static void setup_fifo(const char *test, int *fdp) { char path[] = "0send_fifo.XXXXXXX"; int fd1, fd2; if (mkstemp(path) == -1) err(1, "%s: setup_fifo: mkstemp", test); unlink(path); if (mkfifo(path, 0600) < 0) err(1, "%s: setup_fifo: mkfifo(%s)", test, path); fd1 = open(path, O_RDONLY | O_NONBLOCK); if (fd1 < 0) err(1, "%s: setup_fifo: open(%s, O_RDONLY)", test, path); fd2 = open(path, O_WRONLY | O_NONBLOCK); if (fd2 < 0) err(1, "%s: setup_fifo: open(%s, O_WRONLY)", test, path); fdp[0] = fd2; fdp[1] = fd1; } static void close_both(int *fdp) { close(fdp[0]); fdp[0] = -1; close(fdp[1]); fdp[1] = -1; } int main(void) { int fd[2]; - setup_udp("udp_0send", fd); + setup_udp("udp_0send", fd, PORT1, PORT2); try_0send("udp_0send", fd[0]); close_both(fd); - setup_udp("udp_0write", fd); + setup_udp("udp_0write", fd, PORT1 + 10, PORT2 + 10); try_0write("udp_0write", fd[0]); close_both(fd); - setup_tcp("tcp_0send", fd); + setup_tcp("tcp_0send", fd, PORT1); try_0send("tcp_0send", fd[0]); close_both(fd); - setup_tcp("tcp_0write", fd); + setup_tcp("tcp_0write", fd, PORT1 + 10); try_0write("tcp_0write", fd[0]); close_both(fd); setup_udsstream("udsstream_0send", fd); try_0send("udsstream_0send", fd[0]); close_both(fd); setup_udsstream("udsstream_0write", fd); try_0write("udsstream_0write", fd[0]); close_both(fd); setup_udsdgram("udsdgram_0send", fd); try_0send("udsdgram_0send", fd[0]); close_both(fd); setup_udsdgram("udsdgram_0write", fd); try_0write("udsdgram_0write", fd[0]); close_both(fd); setup_pipe("pipe_0write", fd); - try_0write("pipd_0write", fd[0]); + try_0write("pipe_0write", fd[0]); close_both(fd); setup_fifo("fifo_0write", fd); try_0write("fifo_0write", fd[0]); close_both(fd); return (0); } Index: projects/clang380-import/tools/tools/ath/ath_ee_v4k_print/v4k.c =================================================================== --- projects/clang380-import/tools/tools/ath/ath_ee_v4k_print/v4k.c (revision 294776) +++ projects/clang380-import/tools/tools/ath/ath_ee_v4k_print/v4k.c (revision 294777) @@ -1,300 +1,299 @@ 
/* * Copyright (c) 2010-2011 Adrian Chadd, Xenion Pty Ltd. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ +#include + +#include #include #include -#include #include -#include -#include +#include typedef enum { AH_FALSE = 0, /* NB: lots of code assumes false is zero */ AH_TRUE = 1, } HAL_BOOL; typedef enum { HAL_OK = 0, /* No error */ } HAL_STATUS; struct ath_hal; #include "ah_eeprom_v4k.h" void eeprom_v4k_base_print(uint16_t *buf) { HAL_EEPROM_v4k *eep = (HAL_EEPROM_v4k *) buf; BASE_EEP4K_HEADER *eh = &eep->ee_base.baseEepHeader; printf("| Version: 0x%.4x | Length: 0x%.4x | Checksum: 0x%.4x ", eh->version, eh->length, eh->checksum); printf("| CapFlags: 0x%.2x | eepMisc: 0x%.2x | RegDomain: 0x%.4x 0x%.4x | \n", eh->opCapFlags, eh->eepMisc, eh->regDmn[0], eh->regDmn[1]); printf("| MAC: %.2x:%.2x:%.2x:%.2x:%.2x:%.2x ", eh->macAddr[0], eh->macAddr[1], eh->macAddr[2], eh->macAddr[3], eh->macAddr[4], eh->macAddr[5]); printf("| RxMask: 0x%.2x | TxMask: 0x%.2x | RfSilent: 0x%.4x | btOptions: 0x%.4x |\n", eh->rxMask, eh->txMask, eh->rfSilent, eh->blueToothOptions); printf("| DeviceCap: 0x%.4x | binBuildNumber: %.8x | deviceType: 0x%.2x | txGainType 0x%.2x |\n", eh->deviceCap, eh->binBuildNumber, eh->deviceType, eh->txGainType); } void eeprom_v4k_custdata_print(uint16_t *buf) { HAL_EEPROM_v4k *eep = (HAL_EEPROM_v4k *) buf; uint8_t *custdata = (uint8_t *) &eep->ee_base.custData; int i; printf("\n| Custdata: |\n"); for (i = 0; i < 20; i++) { printf("%s0x%.2x %s", i % 16 == 0 ? "| " : "", custdata[i], i % 16 == 15 ? 
"|\n" : ""); } printf("\n"); } void eeprom_v4k_modal_print(uint16_t *buf) { HAL_EEPROM_v4k *eep = (HAL_EEPROM_v4k *) buf; MODAL_EEP4K_HEADER *mh = &eep->ee_base.modalHeader; int i; printf("| antCtrlCommon: 0x%.8x |\n", mh->antCtrlCommon); printf("| switchSettling: 0x%.2x |\n", mh->switchSettling); printf("| adcDesiredSize: %d |\n| pgaDesiredSize: %.2f dBm |\n", mh->adcDesiredSize, (float) mh->pgaDesiredSize / 2.0); printf("| antCtrlChain: 0:0x%.4x |\n", mh->antCtrlChain[0]); printf("| antennaGainCh: 0:0x%.2x |\n", mh->antennaGainCh[0]); printf("| txRxAttenCh: 0:0x%.2x |\n", mh->txRxAttenCh[0]); printf("| rxTxMarginCh: 0:0x%.2x |\n", mh->rxTxMarginCh[0]); printf("| noiseFloorThresCh: 0:0x%.2x |\n", mh->noiseFloorThreshCh[0]); printf("| xlnaGainCh: 0:0x%.2x |\n", mh->xlnaGainCh[0]); printf("| iqCalICh: 0:0x%.2x |\n", mh->iqCalICh[0]); printf("| iqCalQCh: 0:0x%.2x |\n", mh->iqCalQCh[0]); printf("| bswAtten: 0:0x%.2x |\n", mh->bswAtten[0]); printf("| bswMargin: 0:0x%.2x |\n", mh->bswMargin[0]); printf("| xatten2Db: 0:0x%.2x |\n", mh->xatten2Db[0]); printf("| xatten2Margin: 0:0x%.2x |\n", mh->xatten2Margin[0]); printf("| txEndToXpaOff: 0x%.2x | txEndToRxOn: 0x%.2x | txFrameToXpaOn: 0x%.2x |\n", mh->txEndToXpaOff, mh->txEndToRxOn, mh->txFrameToXpaOn); printf("| thres62: 0x%.2x\n", mh->thresh62); printf("| xpdGain: 0x%.2x | xpd: 0x%.2x |\n", mh->xpdGain, mh->xpd); printf("| pdGainOverlap: 0x%.2x xpaBiasLvl: 0x%.2x |\n", mh->pdGainOverlap, mh->xpaBiasLvl); printf("| txFrameToDataStart: 0x%.2x | txFrameToPaOn: 0x%.2x |\n", mh->txFrameToDataStart, mh->txFrameToPaOn); printf("| ht40PowerIncForPdadc: 0x%.2x |\n", mh->ht40PowerIncForPdadc); printf("| swSettleHt40: 0x%.2x |\n", mh->swSettleHt40); printf("| ob_0: 0x%.2x | ob_1: 0x%.2x | ob_2: 0x%.2x | ob_3: 0x%.2x |\n", mh->ob_0, mh->ob_1, mh->ob_2, mh->ob_3); printf("| db_1_0: 0x%.2x | db_1_1: 0x%.2x | db_1_2: 0x%.2x | db_1_3: 0x%.2x db_1_4: 0x%.2x|\n", mh->db1_0, mh->db1_1, mh->db1_2, mh->db1_3, mh->db1_4); printf("| db_1_0: 0x%.2x | db_1_1: 0x%.2x | db_1_2: 0x%.2x | db_1_3: 0x%.2x db_1_4: 0x%.2x|\n", mh->db2_0, mh->db2_1, mh->db2_2, mh->db2_3, mh->db2_4); printf("| antdiv_ctl1: 0x%.2x antdiv_ctl2: 0x%.2x |\n", mh->antdiv_ctl1, mh->antdiv_ctl2); printf("| Modal Version: %.2x |\n", mh->version); - printf("| futureModal: 0x%.2x 0x%.2x 0x%.2x 0x%.2x |\n", - mh->futureModal[0], - mh->futureModal[1], - mh->futureModal[2], - mh->futureModal[3] - ); + printf("| tx_diversity: 0x%.2x |\n", mh->tx_diversity); + printf("| flc_pwr_thresh: 0x%.2x |\n", mh->flc_pwr_thresh); + printf("| bb_scale_smrt_antenna: 0x%.2x |\n", mh->bb_scale_smrt_antenna); + printf("| futureModal: 0x%.2x |\n", mh->futureModal[0]); /* and now, spur channels */ for (i = 0; i < AR5416_EEPROM_MODAL_SPURS; i++) { printf("| Spur %d: spurChan: 0x%.4x spurRangeLow: 0x%.2x spurRangeHigh: 0x%.2x |\n", i, mh->spurChans[i].spurChan, (int) mh->spurChans[i].spurRangeLow, (int) mh->spurChans[i].spurRangeHigh); } } static void eeprom_v4k_print_caldata_perfreq(CAL_DATA_PER_FREQ_4K *f) { int i, j; for (i = 0; i < AR5416_4K_NUM_PD_GAINS; i++) { printf(" Gain %d: pwr dBm/vpd: ", i); for (j = 0; j < AR5416_PD_GAIN_ICEPTS; j++) { /* These are stored in 0.25dBm increments */ /* XXX is this assumption correct for ar9285? */ /* XXX shouldn't we care about the power table offset, if there is one? 
*/ printf("%d:(%.2f/%d) ", j, (float) f->pwrPdg[i][j] / 4.00, f->vpdPdg[i][j]); } printf("\n"); } } void eeprom_v4k_calfreqpiers_print(uint16_t *buf) { HAL_EEPROM_v4k *eep = (HAL_EEPROM_v4k *) buf; int i, n; /* 2ghz cal piers */ printf("calFreqPier2G: "); for (i = 0; i < AR5416_4K_NUM_2G_CAL_PIERS; i++) { printf(" 0x%.2x ", eep->ee_base.calFreqPier2G[i]); } printf("|\n"); for (i = 0; i < AR5416_4K_NUM_2G_CAL_PIERS; i++) { if (eep->ee_base.calFreqPier2G[i] == 0xff) continue; printf("2Ghz Cal Pier %d\n", i); for (n = 0; n < AR5416_4K_MAX_CHAINS; n++) { printf(" Chain %d:\n", n); eeprom_v4k_print_caldata_perfreq(&eep->ee_base.calPierData2G[n][i]); } } printf("\n"); } /* XXX these should just reference the v14 print routines */ static void eeprom_v14_target_legacy_print(CAL_TARGET_POWER_LEG *l) { int i; if (l->bChannel == 0xff) return; printf(" bChannel: %d;", l->bChannel); for (i = 0; i < 4; i++) { printf(" %.2f", (float) l->tPow2x[i] / 2.0); } printf(" (dBm)\n"); } static void eeprom_v14_target_ht_print(CAL_TARGET_POWER_HT *l) { int i; if (l->bChannel == 0xff) return; printf(" bChannel: %d;", l->bChannel); for (i = 0; i < 8; i++) { printf(" %.2f", (float) l->tPow2x[i] / 2.0); } printf(" (dBm)\n"); } void eeprom_v4k_print_targets(uint16_t *buf) { HAL_EEPROM_v4k *eep = (HAL_EEPROM_v4k *) buf; int i; /* 2ghz rates */ printf("2Ghz CCK:\n"); for (i = 0; i < AR5416_4K_NUM_2G_CCK_TARGET_POWERS; i++) { eeprom_v14_target_legacy_print(&eep->ee_base.calTargetPowerCck[i]); } printf("2Ghz 11g:\n"); for (i = 0; i < AR5416_4K_NUM_2G_20_TARGET_POWERS; i++) { eeprom_v14_target_legacy_print(&eep->ee_base.calTargetPower2G[i]); } printf("2Ghz HT20:\n"); for (i = 0; i < AR5416_4K_NUM_2G_20_TARGET_POWERS; i++) { eeprom_v14_target_ht_print(&eep->ee_base.calTargetPower2GHT20[i]); } printf("2Ghz HT40:\n"); for (i = 0; i < AR5416_4K_NUM_2G_40_TARGET_POWERS; i++) { eeprom_v14_target_ht_print(&eep->ee_base.calTargetPower2GHT40[i]); } } static void eeprom_v4k_ctl_edge_print(CAL_CTL_DATA_4K *ctl) { int i, j; uint8_t pow, flag; for (i = 0; i < AR5416_4K_MAX_CHAINS; i++) { printf(" chain %d: ", i); for (j = 0; j < AR5416_4K_NUM_BAND_EDGES; j++) { pow = ctl->ctlEdges[i][j].tPowerFlag & 0x3f; flag = (ctl->ctlEdges[i][j].tPowerFlag & 0xc0) >> 6; printf(" %d:pow=%d,flag=%.2x", j, pow, flag); } printf("\n"); } } void eeprom_v4k_ctl_print(uint16_t *buf) { HAL_EEPROM_v4k *eep = (HAL_EEPROM_v4k *) buf; int i; for (i = 0; i < AR5416_4K_NUM_CTLS; i++) { if (eep->ee_base.ctlIndex[i] == 0) continue; printf("| ctlIndex: offset %d, value %d\n", i, eep->ee_base.ctlIndex[i]); eeprom_v4k_ctl_edge_print(&eep->ee_base.ctlData[i]); } } void eeprom_v4k_print_edges(uint16_t *buf) { HAL_EEPROM_v4k *eep = (HAL_EEPROM_v4k *) buf; int i; printf("| eeNumCtls: %d\n", eep->ee_numCtls); for (i = 0; i < NUM_EDGES*eep->ee_numCtls; i++) { /* XXX is flag 8 or 32 bits? 
*/ printf("| edge %2d/%2d: rdEdge: %5d EdgePower: %.2f dBm Flag: 0x%.8x\n", i / NUM_EDGES, i % NUM_EDGES, eep->ee_rdEdgesPower[i].rdEdge, (float) eep->ee_rdEdgesPower[i].twice_rdEdgePower / 2.0, eep->ee_rdEdgesPower[i].flag); if (i % NUM_EDGES == (NUM_EDGES -1)) printf("|\n"); } } void eeprom_v4k_print_other(uint16_t *buf) { HAL_EEPROM_v4k *eep = (HAL_EEPROM_v4k *) buf; printf("| ee_antennaGainMax: %.2x\n", eep->ee_antennaGainMax); } Index: projects/clang380-import/tools/tools/nanobsd/embedded/qemu-amd64.cfg =================================================================== --- projects/clang380-import/tools/tools/nanobsd/embedded/qemu-amd64.cfg (revision 294776) +++ projects/clang380-import/tools/tools/nanobsd/embedded/qemu-amd64.cfg (revision 294777) @@ -1,34 +1,34 @@ # $FreeBSD$ #- # Copyright (c) 2015 Warner Losh. All Rights Reserved. # Copyright (c) 2010-2011 iXsystems, Inc., All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL iXsystems, Inc. OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # NANO_ARCH=amd64 NANO_NAME=qemu-amd64 -qemu_env +. common # Pull in common definitions -. common # Pull in common definitions, keep last +qemu_env Index: projects/clang380-import/tools/tools/nanobsd/embedded/qemu-i386.cfg =================================================================== --- projects/clang380-import/tools/tools/nanobsd/embedded/qemu-i386.cfg (revision 294776) +++ projects/clang380-import/tools/tools/nanobsd/embedded/qemu-i386.cfg (revision 294777) @@ -1,34 +1,34 @@ # $FreeBSD$ #- # Copyright (c) 2015 Warner Losh. All Rights Reserved. # Copyright (c) 2010-2011 iXsystems, Inc., All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. 
IN NO EVENT SHALL iXsystems, Inc. OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # NANO_ARCH=i386 NANO_NAME=qemu-i386 -qemu_env +. common # Pull in common definitions -. common # Pull in common definitions, keep last +qemu_env Index: projects/clang380-import/tools/tools/nanobsd/embedded/qemu-mips.cfg =================================================================== --- projects/clang380-import/tools/tools/nanobsd/embedded/qemu-mips.cfg (revision 294776) +++ projects/clang380-import/tools/tools/nanobsd/embedded/qemu-mips.cfg (revision 294777) @@ -1,36 +1,36 @@ # $FreeBSD$ #- # Copyright (c) 2015 Warner Losh. All Rights Reserved. # Copyright (c) 2010-2011 iXsystems, Inc., All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL iXsystems, Inc. OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # NANO_ARCH=mips NANO_KERNEL=MALTA NANO_DRIVE=ada0 NANO_NAME=qemu-mips -qemu_env +. common # Pull in common definitions -. common # Pull in common definitions, keep last +qemu_env Index: projects/clang380-import/tools/tools/nanobsd/embedded/qemu-mips64.cfg =================================================================== --- projects/clang380-import/tools/tools/nanobsd/embedded/qemu-mips64.cfg (revision 294776) +++ projects/clang380-import/tools/tools/nanobsd/embedded/qemu-mips64.cfg (revision 294777) @@ -1,36 +1,36 @@ # $FreeBSD$ #- # Copyright (c) 2015 Warner Losh. All Rights Reserved. # Copyright (c) 2010-2011 iXsystems, Inc., All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. 
Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL iXsystems, Inc. OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # NANO_ARCH=mips NANO_KERNEL=MALTA64 NANO_DRIVE=ada0 NANO_NAME=qemu-mips64 -qemu_env +. common # Pull in common definitions -. common # Pull in common definitions, keep last +qemu_env Index: projects/clang380-import/tools/tools/nanobsd/embedded/qemu-powerpc.cfg =================================================================== --- projects/clang380-import/tools/tools/nanobsd/embedded/qemu-powerpc.cfg (revision 294776) +++ projects/clang380-import/tools/tools/nanobsd/embedded/qemu-powerpc.cfg (revision 294777) @@ -1,37 +1,37 @@ # $FreeBSD$ #- # Copyright (c) 2015 Warner Losh. All Rights Reserved. # Copyright (c) 2010-2011 iXsystems, Inc., All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL iXsystems, Inc. OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # # Open question: do we have one for MAC G4 and one for Book-E? NANO_ARCH=powerpc NANO_KERNEL=GENERIC NANO_DRIVE=ada0 NANO_NAME=qemu-powerpc -qemu_env +. common # Pull in common definitions -. 
common # Pull in common definitions, keep last +qemu_env Index: projects/clang380-import/tools/tools/nanobsd/embedded/qemu-powerpc64.cfg =================================================================== --- projects/clang380-import/tools/tools/nanobsd/embedded/qemu-powerpc64.cfg (revision 294776) +++ projects/clang380-import/tools/tools/nanobsd/embedded/qemu-powerpc64.cfg (revision 294777) @@ -1,36 +1,36 @@ # $FreeBSD$ #- # Copyright (c) 2015 Warner Losh. All Rights Reserved. # Copyright (c) 2010-2011 iXsystems, Inc., All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL iXsystems, Inc. OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # NANO_ARCH=powerpc64 NANO_KERNEL=GENERIC64 NANO_DRIVE=ada0 NANO_NAME=qemu-powerpc64 -NANO_DISKIMAGE_FORMAT=qcow2 +. common # Pull in common definitions -. common # Pull in common definitions, keep last +qemu_env Index: projects/clang380-import/tools/tools/nanobsd/embedded/qemu-sparc64.cfg =================================================================== --- projects/clang380-import/tools/tools/nanobsd/embedded/qemu-sparc64.cfg (revision 294776) +++ projects/clang380-import/tools/tools/nanobsd/embedded/qemu-sparc64.cfg (revision 294777) @@ -1,36 +1,36 @@ # $FreeBSD$ #- # Copyright (c) 2015 Warner Losh. All Rights Reserved. # Copyright (c) 2010-2011 iXsystems, Inc., All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL iXsystems, Inc. 
OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # NANO_ARCH=sparc64 NANO_KERNEL=GENERIC NANO_DRIVE=ada0 NANO_NAME=qemu-sparc64 -qemu_env +. common # Pull in common definitions -. common # Pull in common definitions, keep last +qemu_env Index: projects/clang380-import/usr.bin/Makefile =================================================================== --- projects/clang380-import/usr.bin/Makefile (revision 294776) +++ projects/clang380-import/usr.bin/Makefile (revision 294777) @@ -1,304 +1,307 @@ # From: @(#)Makefile 8.3 (Berkeley) 1/7/94 # $FreeBSD$ .include # XXX MISSING: deroff diction graph learn plot # spell spline struct xsend # XXX Use GNU versions: diff ld patch # Moved to secure: bdes # SUBDIR= alias \ apply \ asa \ awk \ banner \ basename \ brandelf \ bsdiff \ bzip2 \ bzip2recover \ cap_mkdb \ chat \ chpass \ cksum \ cmp \ col \ colldef \ colrm \ column \ comm \ compress \ cpuset \ csplit \ ctlstat \ cut \ dirname \ dpv \ du \ elf2aout \ elfdump \ enigma \ env \ expand \ false \ fetch \ find \ fmt \ fold \ fstat \ fsync \ gcore \ gencat \ getconf \ getent \ getopt \ grep \ gzip \ head \ hexdump \ id \ ident \ ipcrm \ ipcs \ join \ jot \ keylogin \ keylogout \ killall \ ktrace \ ktrdump \ lam \ lastcomm \ ldd \ leave \ less \ lessecho \ lesskey \ limits \ locale \ localedef \ lock \ lockf \ logger \ login \ logins \ logname \ look \ lorder \ lsvfs \ lzmainfo \ m4 \ mandoc \ mesg \ minigzip \ ministat \ mkdep \ mkfifo \ mkimg \ mklocale \ mktemp \ mkulzma \ mkuzip \ mt \ ncal \ netstat \ newgrp \ nfsstat \ nice \ nl \ numactl \ nohup \ opieinfo \ opiekey \ opiepasswd \ pagesize \ passwd \ paste \ patch \ pathchk \ perror \ pr \ printenv \ printf \ procstat \ protect \ rctl \ renice \ rev \ revoke \ rpcinfo \ rs \ rup \ rusers \ rwall \ script \ sed \ send-pr \ seq \ shar \ showmount \ sockstat \ soelim \ sort \ split \ stat \ stdbuf \ su \ systat \ tabs \ tail \ tar \ tcopy \ tee \ time \ timeout \ tip \ top \ touch \ tput \ tr \ true \ truncate \ tset \ tsort \ tty \ uname \ unexpand \ uniq \ unzip \ units \ unvis \ uudecode \ uuencode \ vis \ vmstat \ w \ wall \ wc \ what \ whereis \ which \ whois \ write \ xargs \ xinstall \ xo \ xz \ xzdec \ yes # NB: keep these sorted by MK_* knobs SUBDIR.${MK_AT}+= at SUBDIR.${MK_ATM}+= atm SUBDIR.${MK_BLUETOOTH}+= bluetooth SUBDIR.${MK_BSD_CPIO}+= cpio SUBDIR.${MK_CALENDAR}+= calendar SUBDIR.${MK_CLANG}+= clang SUBDIR.${MK_EE}+= ee SUBDIR.${MK_FILE}+= file SUBDIR.${MK_FINGER}+= finger SUBDIR.${MK_FTP}+= ftp SUBDIR.${MK_GAMES}+= caesar SUBDIR.${MK_GAMES}+= factor SUBDIR.${MK_GAMES}+= fortune SUBDIR.${MK_GAMES}+= grdc SUBDIR.${MK_GAMES}+= morse SUBDIR.${MK_GAMES}+= number SUBDIR.${MK_GAMES}+= pom SUBDIR.${MK_GAMES}+= primes SUBDIR.${MK_GAMES}+= random .if ${MK_GPL_DTC} != "yes" .if ${COMPILER_FEATURES:Mc++11} SUBDIR+= dtc .endif .endif SUBDIR.${MK_GROFF}+= vgrind SUBDIR.${MK_HESIOD}+= hesinfo SUBDIR.${MK_ICONV}+= iconv SUBDIR.${MK_ICONV}+= mkcsmapper SUBDIR.${MK_ICONV}+= mkesdb SUBDIR.${MK_ISCSI}+= iscsictl SUBDIR.${MK_KDUMP}+= kdump SUBDIR.${MK_KDUMP}+= truss SUBDIR.${MK_KERBEROS_SUPPORT}+= compile_et 
SUBDIR.${MK_LDNS_UTILS}+= drill
SUBDIR.${MK_LDNS_UTILS}+= host
SUBDIR.${MK_LOCATE}+= locate
# XXX msgs?
SUBDIR.${MK_MAIL}+= biff
SUBDIR.${MK_MAIL}+= from
SUBDIR.${MK_MAIL}+= mail
SUBDIR.${MK_MAIL}+= msgs
SUBDIR.${MK_MAKE}+= bmake
SUBDIR.${MK_MAN_UTILS}+= catman
.if ${MK_MANDOCDB} == "no" # AND
SUBDIR.${MK_MAN_UTILS}+= makewhatis
.endif
SUBDIR.${MK_MAN_UTILS}+= man
SUBDIR.${MK_NETCAT}+= nc
SUBDIR.${MK_NIS}+= ypcat
SUBDIR.${MK_NIS}+= ypmatch
SUBDIR.${MK_NIS}+= ypwhich
SUBDIR.${MK_OPENSSH}+= ssh-copy-id
SUBDIR.${MK_OPENSSL}+= bc
SUBDIR.${MK_OPENSSL}+= chkey
SUBDIR.${MK_OPENSSL}+= dc
SUBDIR.${MK_OPENSSL}+= newkey
SUBDIR.${MK_QUOTAS}+= quota
SUBDIR.${MK_RCMDS}+= rlogin
SUBDIR.${MK_RCMDS}+= rsh
SUBDIR.${MK_RCMDS}+= ruptime
SUBDIR.${MK_RCMDS}+= rwho
SUBDIR.${MK_SENDMAIL}+= vacation
SUBDIR.${MK_TALK}+= talk
SUBDIR.${MK_TELNET}+= telnet
SUBDIR.${MK_TESTS}+= tests
SUBDIR.${MK_TEXTPROC}+= checknr
SUBDIR.${MK_TEXTPROC}+= colcrt
SUBDIR.${MK_TEXTPROC}+= ul
SUBDIR.${MK_TFTP}+= tftp
SUBDIR.${MK_TOOLCHAIN}+= addr2line
SUBDIR.${MK_TOOLCHAIN}+= ar
SUBDIR.${MK_TOOLCHAIN}+= c89
SUBDIR.${MK_TOOLCHAIN}+= c99
SUBDIR.${MK_TOOLCHAIN}+= ctags
SUBDIR.${MK_TOOLCHAIN}+= cxxfilt
SUBDIR.${MK_TOOLCHAIN}+= elfcopy
SUBDIR.${MK_TOOLCHAIN}+= file2c
-.if ${MACHINE_ARCH} != "aarch64" # ARM64TODO gprof does not build
+.if ${MACHINE_ARCH} != "aarch64" && \
+    ${MACHINE_CPUARCH} != "riscv" # ARM64TODO, RISCVTODO: gprof does not build
SUBDIR.${MK_TOOLCHAIN}+= gprof
.endif
SUBDIR.${MK_TOOLCHAIN}+= indent
SUBDIR.${MK_TOOLCHAIN}+= lex
SUBDIR.${MK_TOOLCHAIN}+= mkstr
SUBDIR.${MK_TOOLCHAIN}+= nm
SUBDIR.${MK_TOOLCHAIN}+= readelf
SUBDIR.${MK_TOOLCHAIN}+= rpcgen
SUBDIR.${MK_TOOLCHAIN}+= unifdef
SUBDIR.${MK_TOOLCHAIN}+= size
SUBDIR.${MK_TOOLCHAIN}+= strings
.if ${MACHINE_ARCH} != "aarch64" # ARM64TODO xlint does not build
SUBDIR.${MK_TOOLCHAIN}+= xlint
.endif
SUBDIR.${MK_TOOLCHAIN}+= xstr
SUBDIR.${MK_TOOLCHAIN}+= yacc
SUBDIR.${MK_VI}+= vi
SUBDIR.${MK_VT}+= vtfontcvt
SUBDIR.${MK_USB}+= usbhidaction
SUBDIR.${MK_USB}+= usbhidctl
SUBDIR.${MK_UTMPX}+= last
+.if ${MACHINE_CPUARCH} != "riscv" # RISCVTODO users does not build
SUBDIR.${MK_UTMPX}+= users
+.endif
SUBDIR.${MK_UTMPX}+= who
SUBDIR.${MK_SVN}+= svn
SUBDIR.${MK_SVNLITE}+= svn
.include
SUBDIR:= ${SUBDIR:O:u}
SUBDIR_PARALLEL=
.include
Index: projects/clang380-import/usr.bin/elfdump/elfdump.c
===================================================================
--- projects/clang380-import/usr.bin/elfdump/elfdump.c (revision 294776)
+++ projects/clang380-import/usr.bin/elfdump/elfdump.c (revision 294777)
@@ -1,1237 +1,1257 @@
/*-
 * Copyright (c) 2003 David O'Brien. All rights reserved.
 * Copyright (c) 2001 Jake Burkholder
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define ED_DYN (1<<0) #define ED_EHDR (1<<1) #define ED_GOT (1<<2) #define ED_HASH (1<<3) #define ED_INTERP (1<<4) #define ED_NOTE (1<<5) #define ED_PHDR (1<<6) #define ED_REL (1<<7) #define ED_SHDR (1<<8) #define ED_SYMTAB (1<<9) #define ED_ALL ((1<<10)-1) #define elf_get_addr elf_get_quad #define elf_get_off elf_get_quad #define elf_get_size elf_get_quad enum elf_member { D_TAG = 1, D_PTR, D_VAL, E_CLASS, E_DATA, E_OSABI, E_TYPE, E_MACHINE, E_VERSION, E_ENTRY, E_PHOFF, E_SHOFF, E_FLAGS, E_EHSIZE, E_PHENTSIZE, E_PHNUM, E_SHENTSIZE, E_SHNUM, E_SHSTRNDX, N_NAMESZ, N_DESCSZ, N_TYPE, P_TYPE, P_OFFSET, P_VADDR, P_PADDR, P_FILESZ, P_MEMSZ, P_FLAGS, P_ALIGN, SH_NAME, SH_TYPE, SH_FLAGS, SH_ADDR, SH_OFFSET, SH_SIZE, SH_LINK, SH_INFO, SH_ADDRALIGN, SH_ENTSIZE, ST_NAME, ST_VALUE, ST_SIZE, ST_INFO, ST_SHNDX, R_OFFSET, R_INFO, RA_OFFSET, RA_INFO, RA_ADDEND }; typedef enum elf_member elf_member_t; static int elf32_offsets[] = { 0, offsetof(Elf32_Dyn, d_tag), offsetof(Elf32_Dyn, d_un.d_ptr), offsetof(Elf32_Dyn, d_un.d_val), offsetof(Elf32_Ehdr, e_ident[EI_CLASS]), offsetof(Elf32_Ehdr, e_ident[EI_DATA]), offsetof(Elf32_Ehdr, e_ident[EI_OSABI]), offsetof(Elf32_Ehdr, e_type), offsetof(Elf32_Ehdr, e_machine), offsetof(Elf32_Ehdr, e_version), offsetof(Elf32_Ehdr, e_entry), offsetof(Elf32_Ehdr, e_phoff), offsetof(Elf32_Ehdr, e_shoff), offsetof(Elf32_Ehdr, e_flags), offsetof(Elf32_Ehdr, e_ehsize), offsetof(Elf32_Ehdr, e_phentsize), offsetof(Elf32_Ehdr, e_phnum), offsetof(Elf32_Ehdr, e_shentsize), offsetof(Elf32_Ehdr, e_shnum), offsetof(Elf32_Ehdr, e_shstrndx), offsetof(Elf_Note, n_namesz), offsetof(Elf_Note, n_descsz), offsetof(Elf_Note, n_type), offsetof(Elf32_Phdr, p_type), offsetof(Elf32_Phdr, p_offset), offsetof(Elf32_Phdr, p_vaddr), offsetof(Elf32_Phdr, p_paddr), offsetof(Elf32_Phdr, p_filesz), offsetof(Elf32_Phdr, p_memsz), offsetof(Elf32_Phdr, p_flags), offsetof(Elf32_Phdr, p_align), offsetof(Elf32_Shdr, sh_name), offsetof(Elf32_Shdr, sh_type), offsetof(Elf32_Shdr, sh_flags), offsetof(Elf32_Shdr, sh_addr), offsetof(Elf32_Shdr, sh_offset), offsetof(Elf32_Shdr, sh_size), offsetof(Elf32_Shdr, sh_link), offsetof(Elf32_Shdr, sh_info), offsetof(Elf32_Shdr, sh_addralign), offsetof(Elf32_Shdr, sh_entsize), offsetof(Elf32_Sym, st_name), offsetof(Elf32_Sym, st_value), offsetof(Elf32_Sym, st_size), offsetof(Elf32_Sym, st_info), offsetof(Elf32_Sym, st_shndx), offsetof(Elf32_Rel, r_offset), offsetof(Elf32_Rel, r_info), offsetof(Elf32_Rela, r_offset), offsetof(Elf32_Rela, r_info), offsetof(Elf32_Rela, r_addend) }; static int elf64_offsets[] = { 0, offsetof(Elf64_Dyn, d_tag), offsetof(Elf64_Dyn, d_un.d_ptr), offsetof(Elf64_Dyn, d_un.d_val), offsetof(Elf32_Ehdr, e_ident[EI_CLASS]), offsetof(Elf32_Ehdr, e_ident[EI_DATA]), offsetof(Elf32_Ehdr, e_ident[EI_OSABI]), offsetof(Elf64_Ehdr, e_type), offsetof(Elf64_Ehdr, e_machine), 
offsetof(Elf64_Ehdr, e_version), offsetof(Elf64_Ehdr, e_entry), offsetof(Elf64_Ehdr, e_phoff), offsetof(Elf64_Ehdr, e_shoff), offsetof(Elf64_Ehdr, e_flags), offsetof(Elf64_Ehdr, e_ehsize), offsetof(Elf64_Ehdr, e_phentsize), offsetof(Elf64_Ehdr, e_phnum), offsetof(Elf64_Ehdr, e_shentsize), offsetof(Elf64_Ehdr, e_shnum), offsetof(Elf64_Ehdr, e_shstrndx), offsetof(Elf_Note, n_namesz), offsetof(Elf_Note, n_descsz), offsetof(Elf_Note, n_type), offsetof(Elf64_Phdr, p_type), offsetof(Elf64_Phdr, p_offset), offsetof(Elf64_Phdr, p_vaddr), offsetof(Elf64_Phdr, p_paddr), offsetof(Elf64_Phdr, p_filesz), offsetof(Elf64_Phdr, p_memsz), offsetof(Elf64_Phdr, p_flags), offsetof(Elf64_Phdr, p_align), offsetof(Elf64_Shdr, sh_name), offsetof(Elf64_Shdr, sh_type), offsetof(Elf64_Shdr, sh_flags), offsetof(Elf64_Shdr, sh_addr), offsetof(Elf64_Shdr, sh_offset), offsetof(Elf64_Shdr, sh_size), offsetof(Elf64_Shdr, sh_link), offsetof(Elf64_Shdr, sh_info), offsetof(Elf64_Shdr, sh_addralign), offsetof(Elf64_Shdr, sh_entsize), offsetof(Elf64_Sym, st_name), offsetof(Elf64_Sym, st_value), offsetof(Elf64_Sym, st_size), offsetof(Elf64_Sym, st_info), offsetof(Elf64_Sym, st_shndx), offsetof(Elf64_Rel, r_offset), offsetof(Elf64_Rel, r_info), offsetof(Elf64_Rela, r_offset), offsetof(Elf64_Rela, r_info), offsetof(Elf64_Rela, r_addend) }; /* http://www.sco.com/developers/gabi/latest/ch5.dynamic.html#tag_encodings */ static const char * d_tags(u_int64_t tag) { static char unknown_tag[48]; switch (tag) { case DT_NULL: return "DT_NULL"; case DT_NEEDED: return "DT_NEEDED"; case DT_PLTRELSZ: return "DT_PLTRELSZ"; case DT_PLTGOT: return "DT_PLTGOT"; case DT_HASH: return "DT_HASH"; case DT_STRTAB: return "DT_STRTAB"; case DT_SYMTAB: return "DT_SYMTAB"; case DT_RELA: return "DT_RELA"; case DT_RELASZ: return "DT_RELASZ"; case DT_RELAENT: return "DT_RELAENT"; case DT_STRSZ: return "DT_STRSZ"; case DT_SYMENT: return "DT_SYMENT"; case DT_INIT: return "DT_INIT"; case DT_FINI: return "DT_FINI"; case DT_SONAME: return "DT_SONAME"; case DT_RPATH: return "DT_RPATH"; case DT_SYMBOLIC: return "DT_SYMBOLIC"; case DT_REL: return "DT_REL"; case DT_RELSZ: return "DT_RELSZ"; case DT_RELENT: return "DT_RELENT"; case DT_PLTREL: return "DT_PLTREL"; case DT_DEBUG: return "DT_DEBUG"; case DT_TEXTREL: return "DT_TEXTREL"; case DT_JMPREL: return "DT_JMPREL"; case DT_BIND_NOW: return "DT_BIND_NOW"; case DT_INIT_ARRAY: return "DT_INIT_ARRAY"; case DT_FINI_ARRAY: return "DT_FINI_ARRAY"; case DT_INIT_ARRAYSZ: return "DT_INIT_ARRAYSZ"; case DT_FINI_ARRAYSZ: return "DT_FINI_ARRAYSZ"; case DT_RUNPATH: return "DT_RUNPATH"; case DT_FLAGS: return "DT_FLAGS"; case DT_PREINIT_ARRAY: return "DT_PREINIT_ARRAY"; /* XXX DT_ENCODING */ case DT_PREINIT_ARRAYSZ:return "DT_PREINIT_ARRAYSZ"; /* 0x6000000D - 0x6ffff000 operating system-specific semantics */ case 0x6ffffdf5: return "DT_GNU_PRELINKED"; case 0x6ffffdf6: return "DT_GNU_CONFLICTSZ"; case 0x6ffffdf7: return "DT_GNU_LIBLISTSZ"; case 0x6ffffdf8: return "DT_SUNW_CHECKSUM"; case DT_PLTPADSZ: return "DT_PLTPADSZ"; case DT_MOVEENT: return "DT_MOVEENT"; case DT_MOVESZ: return "DT_MOVESZ"; case DT_FEATURE: return "DT_FEATURE"; case DT_POSFLAG_1: return "DT_POSFLAG_1"; case DT_SYMINSZ: return "DT_SYMINSZ"; case DT_SYMINENT : return "DT_SYMINENT (DT_VALRNGHI)"; case DT_ADDRRNGLO: return "DT_ADDRRNGLO"; case DT_GNU_HASH: return "DT_GNU_HASH"; case 0x6ffffef8: return "DT_GNU_CONFLICT"; case 0x6ffffef9: return "DT_GNU_LIBLIST"; case DT_CONFIG: return "DT_CONFIG"; case DT_DEPAUDIT: return "DT_DEPAUDIT"; case DT_AUDIT: return 
"DT_AUDIT"; case DT_PLTPAD: return "DT_PLTPAD"; case DT_MOVETAB: return "DT_MOVETAB"; case DT_SYMINFO : return "DT_SYMINFO (DT_ADDRRNGHI)"; case DT_RELACOUNT: return "DT_RELACOUNT"; case DT_RELCOUNT: return "DT_RELCOUNT"; case DT_FLAGS_1: return "DT_FLAGS_1"; case DT_VERDEF: return "DT_VERDEF"; case DT_VERDEFNUM: return "DT_VERDEFNUM"; case DT_VERNEED: return "DT_VERNEED"; case DT_VERNEEDNUM: return "DT_VERNEEDNUM"; case 0x6ffffff0: return "DT_GNU_VERSYM"; /* 0x70000000 - 0x7fffffff processor-specific semantics */ case 0x70000000: return "DT_IA_64_PLT_RESERVE"; case 0x7ffffffd: return "DT_SUNW_AUXILIARY"; case 0x7ffffffe: return "DT_SUNW_USED"; case 0x7fffffff: return "DT_SUNW_FILTER"; } snprintf(unknown_tag, sizeof(unknown_tag), "ERROR: TAG NOT DEFINED -- tag 0x%jx", (uintmax_t)tag); return (unknown_tag); } static const char * e_machines(u_int mach) { static char machdesc[64]; switch (mach) { case EM_NONE: return "EM_NONE"; case EM_M32: return "EM_M32"; case EM_SPARC: return "EM_SPARC"; case EM_386: return "EM_386"; case EM_68K: return "EM_68K"; case EM_88K: return "EM_88K"; case EM_IAMCU: return "EM_IAMCU"; case EM_860: return "EM_860"; case EM_MIPS: return "EM_MIPS"; case EM_PPC: return "EM_PPC"; case EM_PPC64: return "EM_PPC64"; case EM_ARM: return "EM_ARM"; case EM_ALPHA: return "EM_ALPHA (legacy)"; case EM_SPARCV9:return "EM_SPARCV9"; case EM_IA_64: return "EM_IA_64"; case EM_X86_64: return "EM_X86_64"; case EM_AARCH64:return "EM_AARCH64"; case EM_RISCV: return "EM_RISCV"; } snprintf(machdesc, sizeof(machdesc), "(unknown machine) -- type 0x%x", mach); return (machdesc); } static const char *e_types[] = { "ET_NONE", "ET_REL", "ET_EXEC", "ET_DYN", "ET_CORE" }; static const char *ei_versions[] = { "EV_NONE", "EV_CURRENT" }; static const char *ei_classes[] = { "ELFCLASSNONE", "ELFCLASS32", "ELFCLASS64" }; static const char *ei_data[] = { "ELFDATANONE", "ELFDATA2LSB", "ELFDATA2MSB" }; static const char *ei_abis[256] = { "ELFOSABI_NONE", "ELFOSABI_HPUX", "ELFOSABI_NETBSD", "ELFOSABI_LINUX", "ELFOSABI_HURD", "ELFOSABI_86OPEN", "ELFOSABI_SOLARIS", "ELFOSABI_AIX", "ELFOSABI_IRIX", "ELFOSABI_FREEBSD", "ELFOSABI_TRU64", "ELFOSABI_MODESTO", "ELFOSABI_OPENBSD", [255] = "ELFOSABI_STANDALONE" }; static const char *p_types[] = { "PT_NULL", "PT_LOAD", "PT_DYNAMIC", "PT_INTERP", "PT_NOTE", "PT_SHLIB", "PT_PHDR", "PT_TLS" }; static const char *p_flags[] = { "", "PF_X", "PF_W", "PF_X|PF_W", "PF_R", "PF_X|PF_R", "PF_W|PF_R", "PF_X|PF_W|PF_R" }; /* http://www.sco.com/developers/gabi/latest/ch4.sheader.html#sh_type */ static const char * sh_types(uint64_t machine, uint64_t sht) { static char unknown_buf[64]; if (sht < 0x60000000) { switch (sht) { case SHT_NULL: return "SHT_NULL"; case SHT_PROGBITS: return "SHT_PROGBITS"; case SHT_SYMTAB: return "SHT_SYMTAB"; case SHT_STRTAB: return "SHT_STRTAB"; case SHT_RELA: return "SHT_RELA"; case SHT_HASH: return "SHT_HASH"; case SHT_DYNAMIC: return "SHT_DYNAMIC"; case SHT_NOTE: return "SHT_NOTE"; case SHT_NOBITS: return "SHT_NOBITS"; case SHT_REL: return "SHT_REL"; case SHT_SHLIB: return "SHT_SHLIB"; case SHT_DYNSYM: return "SHT_DYNSYM"; case SHT_INIT_ARRAY: return "SHT_INIT_ARRAY"; case SHT_FINI_ARRAY: return "SHT_FINI_ARRAY"; case SHT_PREINIT_ARRAY: return "SHT_PREINIT_ARRAY"; case SHT_GROUP: return "SHT_GROUP"; case SHT_SYMTAB_SHNDX: return "SHT_SYMTAB_SHNDX"; } snprintf(unknown_buf, sizeof(unknown_buf), "ERROR: SHT %ju NOT DEFINED", (uintmax_t)sht); return (unknown_buf); } else if (sht < 0x70000000) { /* 0x60000000-0x6fffffff operating system-specific semantics */ 
		switch (sht) {
		case 0x6ffffff0: return "XXX:VERSYM";
		case SHT_SUNW_dof: return "SHT_SUNW_dof";
		case SHT_GNU_HASH: return "SHT_GNU_HASH";
		case 0x6ffffff7: return "SHT_GNU_LIBLIST";
		case 0x6ffffffc: return "XXX:VERDEF";
		case SHT_SUNW_verdef: return "SHT_SUNW(GNU)_verdef";
		case SHT_SUNW_verneed: return "SHT_SUNW(GNU)_verneed";
		case SHT_SUNW_versym: return "SHT_SUNW(GNU)_versym";
		}
		snprintf(unknown_buf, sizeof(unknown_buf),
		    "ERROR: OS-SPECIFIC SHT 0x%jx NOT DEFINED", (uintmax_t)sht);
		return (unknown_buf);
	} else if (sht < 0x80000000) {
		/* 0x70000000-0x7fffffff processor-specific semantics */
		switch (machine) {
		case EM_ARM:
			switch (sht) {
			case SHT_ARM_EXIDX: return "SHT_ARM_EXIDX";
			case SHT_ARM_PREEMPTMAP:return "SHT_ARM_PREEMPTMAP";
			case SHT_ARM_ATTRIBUTES:return "SHT_ARM_ATTRIBUTES";
			case SHT_ARM_DEBUGOVERLAY: return "SHT_ARM_DEBUGOVERLAY";
			case SHT_ARM_OVERLAYSECTION: return "SHT_ARM_OVERLAYSECTION";
			}
			break;
		case EM_IA_64:
			switch (sht) {
			case 0x70000000: return "SHT_IA_64_EXT";
			case 0x70000001: return "SHT_IA_64_UNWIND";
			}
			break;
		case EM_MIPS:
			switch (sht) {
			case SHT_MIPS_REGINFO: return "SHT_MIPS_REGINFO";
			case SHT_MIPS_OPTIONS: return "SHT_MIPS_OPTIONS";
			case SHT_MIPS_ABIFLAGS: return "SHT_MIPS_ABIFLAGS";
			}
			break;
		}
		switch (sht) {
		case 0x7ffffffd: return "XXX:AUXILIARY";
		case 0x7fffffff: return "XXX:FILTER";
		}
		snprintf(unknown_buf, sizeof(unknown_buf),
		    "ERROR: PROCESSOR-SPECIFIC SHT 0x%jx NOT DEFINED",
		    (uintmax_t)sht);
		return (unknown_buf);
	} else {
		/* 0x80000000-0xffffffff application programs */
		snprintf(unknown_buf, sizeof(unknown_buf),
		    "ERROR: SHT 0x%jx NOT DEFINED", (uintmax_t)sht);
		return (unknown_buf);
	}
}

static const char *sh_flags[] = {
	"", "SHF_WRITE", "SHF_ALLOC", "SHF_WRITE|SHF_ALLOC", "SHF_EXECINSTR",
	"SHF_WRITE|SHF_EXECINSTR", "SHF_ALLOC|SHF_EXECINSTR",
	"SHF_WRITE|SHF_ALLOC|SHF_EXECINSTR"
};

-static const char *st_types[] = {
-	"STT_NOTYPE", "STT_OBJECT", "STT_FUNC", "STT_SECTION", "STT_FILE"
-};
+static const char *
+st_type(unsigned int mach, unsigned int type)
+{
+	static char s_type[32];
+	switch (type) {
+	case STT_NOTYPE: return "STT_NOTYPE";
+	case STT_OBJECT: return "STT_OBJECT";
+	case STT_FUNC: return "STT_FUNC";
+	case STT_SECTION: return "STT_SECTION";
+	case STT_FILE: return "STT_FILE";
+	case STT_COMMON: return "STT_COMMON";
+	case STT_TLS: return "STT_TLS";
+	case 13:
+		if (mach == EM_SPARCV9)
+			return "STT_SPARC_REGISTER";
+		break;
+	}
+	snprintf(s_type, sizeof(s_type), "<unknown: %d>", type);
+	return (s_type);
+}
+
static const char *st_bindings[] = {
	"STB_LOCAL", "STB_GLOBAL", "STB_WEAK"
};

static char *dynstr;
static char *shstrtab;
static char *strtab;
static FILE *out;

static u_int64_t elf_get_byte(Elf32_Ehdr *e, void *base, elf_member_t member);
static u_int64_t elf_get_quarter(Elf32_Ehdr *e, void *base, elf_member_t member);
#if 0
static u_int64_t elf_get_half(Elf32_Ehdr *e, void *base, elf_member_t member);
#endif
static u_int64_t elf_get_word(Elf32_Ehdr *e, void *base, elf_member_t member);
static u_int64_t elf_get_quad(Elf32_Ehdr *e, void *base, elf_member_t member);
static void elf_print_ehdr(Elf32_Ehdr *e, void *sh);
static void elf_print_phdr(Elf32_Ehdr *e, void *p);
static void elf_print_shdr(Elf32_Ehdr *e, void *sh);
static void elf_print_symtab(Elf32_Ehdr *e, void *sh, char *str);
static void elf_print_dynamic(Elf32_Ehdr *e, void *sh);
static void elf_print_rel(Elf32_Ehdr *e, void *r);
static void elf_print_rela(Elf32_Ehdr *e, void *ra);
static void elf_print_interp(Elf32_Ehdr *e, void *p);
static void elf_print_got(Elf32_Ehdr *e, void *sh);
static void
elf_print_hash(Elf32_Ehdr *e, void *sh); static void elf_print_note(Elf32_Ehdr *e, void *sh); static void usage(void); /* * Helpers for ELF files with shnum or shstrndx values that don't fit in the * ELF header. If the values are too large then an escape value is used to * indicate that the actual value is found in one of section 0's fields. */ static uint64_t elf_get_shnum(Elf32_Ehdr *e, void *sh) { uint64_t shnum; shnum = elf_get_quarter(e, e, E_SHNUM); if (shnum == 0) shnum = elf_get_word(e, (char *)sh, SH_SIZE); return shnum; } static uint64_t elf_get_shstrndx(Elf32_Ehdr *e, void *sh) { uint64_t shstrndx; shstrndx = elf_get_quarter(e, e, E_SHSTRNDX); if (shstrndx == SHN_XINDEX) shstrndx = elf_get_word(e, (char *)sh, SH_LINK); return shstrndx; } int main(int ac, char **av) { cap_rights_t rights; u_int64_t phoff; u_int64_t shoff; u_int64_t phentsize; u_int64_t phnum; u_int64_t shentsize; u_int64_t shnum; u_int64_t shstrndx; u_int64_t offset; u_int64_t name; u_int64_t type; struct stat sb; u_int flags; Elf32_Ehdr *e; void *p; void *sh; void *v; int fd; int ch; int i; out = stdout; flags = 0; while ((ch = getopt(ac, av, "acdeiGhnprsw:")) != -1) switch (ch) { case 'a': flags = ED_ALL; break; case 'c': flags |= ED_SHDR; break; case 'd': flags |= ED_DYN; break; case 'e': flags |= ED_EHDR; break; case 'i': flags |= ED_INTERP; break; case 'G': flags |= ED_GOT; break; case 'h': flags |= ED_HASH; break; case 'n': flags |= ED_NOTE; break; case 'p': flags |= ED_PHDR; break; case 'r': flags |= ED_REL; break; case 's': flags |= ED_SYMTAB; break; case 'w': if ((out = fopen(optarg, "w")) == NULL) err(1, "%s", optarg); cap_rights_init(&rights, CAP_FSTAT, CAP_WRITE); if (cap_rights_limit(fileno(out), &rights) < 0 && errno != ENOSYS) err(1, "unable to limit rights for %s", optarg); break; case '?': default: usage(); } ac -= optind; av += optind; if (ac == 0 || flags == 0) usage(); if ((fd = open(*av, O_RDONLY)) < 0 || fstat(fd, &sb) < 0) err(1, "%s", *av); cap_rights_init(&rights, CAP_MMAP_R); if (cap_rights_limit(fd, &rights) < 0 && errno != ENOSYS) err(1, "unable to limit rights for %s", *av); close(STDIN_FILENO); cap_rights_init(&rights, CAP_WRITE); if (cap_rights_limit(STDOUT_FILENO, &rights) < 0 && errno != ENOSYS) err(1, "unable to limit rights for stdout"); if (cap_rights_limit(STDERR_FILENO, &rights) < 0 && errno != ENOSYS) err(1, "unable to limit rights for stderr"); if (cap_enter() < 0 && errno != ENOSYS) err(1, "unable to enter capability mode"); e = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0); if (e == MAP_FAILED) err(1, NULL); if (!IS_ELF(*(Elf32_Ehdr *)e)) errx(1, "not an elf file"); phoff = elf_get_off(e, e, E_PHOFF); shoff = elf_get_off(e, e, E_SHOFF); phentsize = elf_get_quarter(e, e, E_PHENTSIZE); phnum = elf_get_quarter(e, e, E_PHNUM); shentsize = elf_get_quarter(e, e, E_SHENTSIZE); p = (char *)e + phoff; if (shoff > 0) { sh = (char *)e + shoff; shnum = elf_get_shnum(e, sh); shstrndx = elf_get_shstrndx(e, sh); offset = elf_get_off(e, (char *)sh + shstrndx * shentsize, SH_OFFSET); shstrtab = (char *)e + offset; } else { sh = NULL; shnum = 0; shstrndx = 0; shstrtab = NULL; } for (i = 0; (u_int64_t)i < shnum; i++) { name = elf_get_word(e, (char *)sh + i * shentsize, SH_NAME); offset = elf_get_off(e, (char *)sh + i * shentsize, SH_OFFSET); if (strcmp(shstrtab + name, ".strtab") == 0) strtab = (char *)e + offset; if (strcmp(shstrtab + name, ".dynstr") == 0) dynstr = (char *)e + offset; } if (flags & ED_EHDR) elf_print_ehdr(e, sh); if (flags & ED_PHDR) elf_print_phdr(e, p); if (flags 
& ED_SHDR) elf_print_shdr(e, sh); for (i = 0; (u_int64_t)i < phnum; i++) { v = (char *)p + i * phentsize; type = elf_get_word(e, v, P_TYPE); switch (type) { case PT_INTERP: if (flags & ED_INTERP) elf_print_interp(e, v); break; case PT_NULL: case PT_LOAD: case PT_DYNAMIC: case PT_NOTE: case PT_SHLIB: case PT_PHDR: break; } } for (i = 0; (u_int64_t)i < shnum; i++) { v = (char *)sh + i * shentsize; type = elf_get_word(e, v, SH_TYPE); switch (type) { case SHT_SYMTAB: if (flags & ED_SYMTAB) elf_print_symtab(e, v, strtab); break; case SHT_DYNAMIC: if (flags & ED_DYN) elf_print_dynamic(e, v); break; case SHT_RELA: if (flags & ED_REL) elf_print_rela(e, v); break; case SHT_REL: if (flags & ED_REL) elf_print_rel(e, v); break; case SHT_NOTE: name = elf_get_word(e, v, SH_NAME); if (flags & ED_NOTE && strcmp(shstrtab + name, ".note.ABI-tag") == 0) elf_print_note(e, v); break; case SHT_DYNSYM: if (flags & ED_SYMTAB) elf_print_symtab(e, v, dynstr); break; case SHT_PROGBITS: name = elf_get_word(e, v, SH_NAME); if (flags & ED_GOT && strcmp(shstrtab + name, ".got") == 0) elf_print_got(e, v); break; case SHT_HASH: if (flags & ED_HASH) elf_print_hash(e, v); break; case SHT_NULL: case SHT_STRTAB: case SHT_NOBITS: case SHT_SHLIB: break; } } return 0; } static void elf_print_ehdr(Elf32_Ehdr *e, void *sh) { u_int64_t class; u_int64_t data; u_int64_t osabi; u_int64_t type; u_int64_t machine; u_int64_t version; u_int64_t entry; u_int64_t phoff; u_int64_t shoff; u_int64_t flags; u_int64_t ehsize; u_int64_t phentsize; u_int64_t phnum; u_int64_t shentsize; u_int64_t shnum; u_int64_t shstrndx; class = elf_get_byte(e, e, E_CLASS); data = elf_get_byte(e, e, E_DATA); osabi = elf_get_byte(e, e, E_OSABI); type = elf_get_quarter(e, e, E_TYPE); machine = elf_get_quarter(e, e, E_MACHINE); version = elf_get_word(e, e, E_VERSION); entry = elf_get_addr(e, e, E_ENTRY); phoff = elf_get_off(e, e, E_PHOFF); shoff = elf_get_off(e, e, E_SHOFF); flags = elf_get_word(e, e, E_FLAGS); ehsize = elf_get_quarter(e, e, E_EHSIZE); phentsize = elf_get_quarter(e, e, E_PHENTSIZE); phnum = elf_get_quarter(e, e, E_PHNUM); shentsize = elf_get_quarter(e, e, E_SHENTSIZE); fprintf(out, "\nelf header:\n"); fprintf(out, "\n"); fprintf(out, "\te_ident: %s %s %s\n", ei_classes[class], ei_data[data], ei_abis[osabi]); fprintf(out, "\te_type: %s\n", e_types[type]); fprintf(out, "\te_machine: %s\n", e_machines(machine)); fprintf(out, "\te_version: %s\n", ei_versions[version]); fprintf(out, "\te_entry: %#jx\n", (intmax_t)entry); fprintf(out, "\te_phoff: %jd\n", (intmax_t)phoff); fprintf(out, "\te_shoff: %jd\n", (intmax_t)shoff); fprintf(out, "\te_flags: %jd\n", (intmax_t)flags); fprintf(out, "\te_ehsize: %jd\n", (intmax_t)ehsize); fprintf(out, "\te_phentsize: %jd\n", (intmax_t)phentsize); fprintf(out, "\te_phnum: %jd\n", (intmax_t)phnum); fprintf(out, "\te_shentsize: %jd\n", (intmax_t)shentsize); if (sh != NULL) { shnum = elf_get_shnum(e, sh); shstrndx = elf_get_shstrndx(e, sh); fprintf(out, "\te_shnum: %jd\n", (intmax_t)shnum); fprintf(out, "\te_shstrndx: %jd\n", (intmax_t)shstrndx); } } static void elf_print_phdr(Elf32_Ehdr *e, void *p) { u_int64_t phentsize; u_int64_t phnum; u_int64_t type; u_int64_t offset; u_int64_t vaddr; u_int64_t paddr; u_int64_t filesz; u_int64_t memsz; u_int64_t flags; u_int64_t align; void *v; int i; phentsize = elf_get_quarter(e, e, E_PHENTSIZE); phnum = elf_get_quarter(e, e, E_PHNUM); fprintf(out, "\nprogram header:\n"); for (i = 0; (u_int64_t)i < phnum; i++) { v = (char *)p + i * phentsize; type = elf_get_word(e, v, P_TYPE); offset 
= elf_get_off(e, v, P_OFFSET); vaddr = elf_get_addr(e, v, P_VADDR); paddr = elf_get_addr(e, v, P_PADDR); filesz = elf_get_size(e, v, P_FILESZ); memsz = elf_get_size(e, v, P_MEMSZ); flags = elf_get_word(e, v, P_FLAGS); align = elf_get_size(e, v, P_ALIGN); fprintf(out, "\n"); fprintf(out, "entry: %d\n", i); fprintf(out, "\tp_type: %s\n", p_types[type & 0x7]); fprintf(out, "\tp_offset: %jd\n", (intmax_t)offset); fprintf(out, "\tp_vaddr: %#jx\n", (intmax_t)vaddr); fprintf(out, "\tp_paddr: %#jx\n", (intmax_t)paddr); fprintf(out, "\tp_filesz: %jd\n", (intmax_t)filesz); fprintf(out, "\tp_memsz: %jd\n", (intmax_t)memsz); fprintf(out, "\tp_flags: %s\n", p_flags[flags]); fprintf(out, "\tp_align: %jd\n", (intmax_t)align); } } static void elf_print_shdr(Elf32_Ehdr *e, void *sh) { u_int64_t shentsize; u_int64_t shnum; u_int64_t name; u_int64_t type; u_int64_t flags; u_int64_t addr; u_int64_t offset; u_int64_t size; u_int64_t shlink; u_int64_t info; u_int64_t addralign; u_int64_t entsize; u_int64_t machine; void *v; int i; if (sh == NULL) { fprintf(out, "\nNo section headers\n"); return; } machine = elf_get_quarter(e, e, E_MACHINE); shentsize = elf_get_quarter(e, e, E_SHENTSIZE); shnum = elf_get_shnum(e, sh); fprintf(out, "\nsection header:\n"); for (i = 0; (u_int64_t)i < shnum; i++) { v = (char *)sh + i * shentsize; name = elf_get_word(e, v, SH_NAME); type = elf_get_word(e, v, SH_TYPE); flags = elf_get_word(e, v, SH_FLAGS); addr = elf_get_addr(e, v, SH_ADDR); offset = elf_get_off(e, v, SH_OFFSET); size = elf_get_size(e, v, SH_SIZE); shlink = elf_get_word(e, v, SH_LINK); info = elf_get_word(e, v, SH_INFO); addralign = elf_get_size(e, v, SH_ADDRALIGN); entsize = elf_get_size(e, v, SH_ENTSIZE); fprintf(out, "\n"); fprintf(out, "entry: %d\n", i); fprintf(out, "\tsh_name: %s\n", shstrtab + name); fprintf(out, "\tsh_type: %s\n", sh_types(machine, type)); fprintf(out, "\tsh_flags: %s\n", sh_flags[flags & 0x7]); fprintf(out, "\tsh_addr: %#jx\n", addr); fprintf(out, "\tsh_offset: %jd\n", (intmax_t)offset); fprintf(out, "\tsh_size: %jd\n", (intmax_t)size); fprintf(out, "\tsh_link: %jd\n", (intmax_t)shlink); fprintf(out, "\tsh_info: %jd\n", (intmax_t)info); fprintf(out, "\tsh_addralign: %jd\n", (intmax_t)addralign); fprintf(out, "\tsh_entsize: %jd\n", (intmax_t)entsize); } } static void elf_print_symtab(Elf32_Ehdr *e, void *sh, char *str) { + u_int64_t machine; u_int64_t offset; u_int64_t entsize; u_int64_t size; u_int64_t name; u_int64_t value; u_int64_t info; u_int64_t shndx; void *st; int len; int i; + machine = elf_get_quarter(e, e, E_MACHINE); offset = elf_get_off(e, sh, SH_OFFSET); entsize = elf_get_size(e, sh, SH_ENTSIZE); size = elf_get_size(e, sh, SH_SIZE); name = elf_get_word(e, sh, SH_NAME); len = size / entsize; fprintf(out, "\nsymbol table (%s):\n", shstrtab + name); for (i = 0; i < len; i++) { st = (char *)e + offset + i * entsize; name = elf_get_word(e, st, ST_NAME); value = elf_get_addr(e, st, ST_VALUE); size = elf_get_size(e, st, ST_SIZE); info = elf_get_byte(e, st, ST_INFO); shndx = elf_get_quarter(e, st, ST_SHNDX); fprintf(out, "\n"); fprintf(out, "entry: %d\n", i); fprintf(out, "\tst_name: %s\n", str + name); fprintf(out, "\tst_value: %#jx\n", value); fprintf(out, "\tst_size: %jd\n", (intmax_t)size); fprintf(out, "\tst_info: %s %s\n", - st_types[ELF32_ST_TYPE(info)], + st_type(machine, ELF32_ST_TYPE(info)), st_bindings[ELF32_ST_BIND(info)]); fprintf(out, "\tst_shndx: %jd\n", (intmax_t)shndx); } } static void elf_print_dynamic(Elf32_Ehdr *e, void *sh) { u_int64_t offset; u_int64_t entsize; 
u_int64_t size; int64_t tag; u_int64_t ptr; u_int64_t val; void *d; int i; offset = elf_get_off(e, sh, SH_OFFSET); entsize = elf_get_size(e, sh, SH_ENTSIZE); size = elf_get_size(e, sh, SH_SIZE); fprintf(out, "\ndynamic:\n"); for (i = 0; (u_int64_t)i < size / entsize; i++) { d = (char *)e + offset + i * entsize; tag = elf_get_size(e, d, D_TAG); ptr = elf_get_size(e, d, D_PTR); val = elf_get_addr(e, d, D_VAL); fprintf(out, "\n"); fprintf(out, "entry: %d\n", i); fprintf(out, "\td_tag: %s\n", d_tags(tag)); switch (tag) { case DT_NEEDED: case DT_SONAME: case DT_RPATH: fprintf(out, "\td_val: %s\n", dynstr + val); break; case DT_PLTRELSZ: case DT_RELA: case DT_RELASZ: case DT_RELAENT: case DT_STRSZ: case DT_SYMENT: case DT_RELSZ: case DT_RELENT: case DT_PLTREL: fprintf(out, "\td_val: %jd\n", (intmax_t)val); break; case DT_PLTGOT: case DT_HASH: case DT_STRTAB: case DT_SYMTAB: case DT_INIT: case DT_FINI: case DT_REL: case DT_JMPREL: fprintf(out, "\td_ptr: %#jx\n", ptr); break; case DT_NULL: case DT_SYMBOLIC: case DT_DEBUG: case DT_TEXTREL: break; } } } static void elf_print_rela(Elf32_Ehdr *e, void *sh) { u_int64_t offset; u_int64_t entsize; u_int64_t size; u_int64_t name; u_int64_t info; int64_t addend; void *ra; void *v; int i; offset = elf_get_off(e, sh, SH_OFFSET); entsize = elf_get_size(e, sh, SH_ENTSIZE); size = elf_get_size(e, sh, SH_SIZE); name = elf_get_word(e, sh, SH_NAME); v = (char *)e + offset; fprintf(out, "\nrelocation with addend (%s):\n", shstrtab + name); for (i = 0; (u_int64_t)i < size / entsize; i++) { ra = (char *)v + i * entsize; offset = elf_get_addr(e, ra, RA_OFFSET); info = elf_get_word(e, ra, RA_INFO); addend = elf_get_off(e, ra, RA_ADDEND); fprintf(out, "\n"); fprintf(out, "entry: %d\n", i); fprintf(out, "\tr_offset: %#jx\n", offset); fprintf(out, "\tr_info: %jd\n", (intmax_t)info); fprintf(out, "\tr_addend: %jd\n", (intmax_t)addend); } } static void elf_print_rel(Elf32_Ehdr *e, void *sh) { u_int64_t offset; u_int64_t entsize; u_int64_t size; u_int64_t name; u_int64_t info; void *r; void *v; int i; offset = elf_get_off(e, sh, SH_OFFSET); entsize = elf_get_size(e, sh, SH_ENTSIZE); size = elf_get_size(e, sh, SH_SIZE); name = elf_get_word(e, sh, SH_NAME); v = (char *)e + offset; fprintf(out, "\nrelocation (%s):\n", shstrtab + name); for (i = 0; (u_int64_t)i < size / entsize; i++) { r = (char *)v + i * entsize; offset = elf_get_addr(e, r, R_OFFSET); info = elf_get_word(e, r, R_INFO); fprintf(out, "\n"); fprintf(out, "entry: %d\n", i); fprintf(out, "\tr_offset: %#jx\n", offset); fprintf(out, "\tr_info: %jd\n", (intmax_t)info); } } static void elf_print_interp(Elf32_Ehdr *e, void *p) { u_int64_t offset; char *s; offset = elf_get_off(e, p, P_OFFSET); s = (char *)e + offset; fprintf(out, "\ninterp:\n"); fprintf(out, "\t%s\n", s); } static void elf_print_got(Elf32_Ehdr *e, void *sh) { u_int64_t offset; u_int64_t addralign; u_int64_t size; u_int64_t addr; void *v; int i; offset = elf_get_off(e, sh, SH_OFFSET); addralign = elf_get_size(e, sh, SH_ADDRALIGN); size = elf_get_size(e, sh, SH_SIZE); v = (char *)e + offset; fprintf(out, "\nglobal offset table:\n"); for (i = 0; (u_int64_t)i < size / addralign; i++) { addr = elf_get_addr(e, (char *)v + i * addralign, 0); fprintf(out, "\n"); fprintf(out, "entry: %d\n", i); fprintf(out, "\t%#jx\n", addr); } } static void elf_print_hash(Elf32_Ehdr *e __unused, void *sh __unused) { } static void elf_print_note(Elf32_Ehdr *e, void *sh) { u_int64_t offset; u_int64_t size; u_int64_t name; u_int32_t namesz; u_int32_t descsz; u_int32_t desc; char *n, 
*s; offset = elf_get_off(e, sh, SH_OFFSET); size = elf_get_size(e, sh, SH_SIZE); name = elf_get_word(e, sh, SH_NAME); n = (char *)e + offset; fprintf(out, "\nnote (%s):\n", shstrtab + name); while (n < ((char *)e + offset + size)) { namesz = elf_get_word(e, n, N_NAMESZ); descsz = elf_get_word(e, n, N_DESCSZ); s = n + sizeof(Elf_Note); desc = elf_get_word(e, n + sizeof(Elf_Note) + namesz, 0); fprintf(out, "\t%s %d\n", s, desc); n += sizeof(Elf_Note) + namesz + descsz; } } static u_int64_t elf_get_byte(Elf32_Ehdr *e, void *base, elf_member_t member) { u_int64_t val; val = 0; switch (e->e_ident[EI_CLASS]) { case ELFCLASS32: val = ((uint8_t *)base)[elf32_offsets[member]]; break; case ELFCLASS64: val = ((uint8_t *)base)[elf64_offsets[member]]; break; case ELFCLASSNONE: errx(1, "invalid class"); } return val; } static u_int64_t elf_get_quarter(Elf32_Ehdr *e, void *base, elf_member_t member) { u_int64_t val; val = 0; switch (e->e_ident[EI_CLASS]) { case ELFCLASS32: base = (char *)base + elf32_offsets[member]; switch (e->e_ident[EI_DATA]) { case ELFDATA2MSB: val = be16dec(base); break; case ELFDATA2LSB: val = le16dec(base); break; case ELFDATANONE: errx(1, "invalid data format"); } break; case ELFCLASS64: base = (char *)base + elf64_offsets[member]; switch (e->e_ident[EI_DATA]) { case ELFDATA2MSB: val = be16dec(base); break; case ELFDATA2LSB: val = le16dec(base); break; case ELFDATANONE: errx(1, "invalid data format"); } break; case ELFCLASSNONE: errx(1, "invalid class"); } return val; } #if 0 static u_int64_t elf_get_half(Elf32_Ehdr *e, void *base, elf_member_t member) { u_int64_t val; val = 0; switch (e->e_ident[EI_CLASS]) { case ELFCLASS32: base = (char *)base + elf32_offsets[member]; switch (e->e_ident[EI_DATA]) { case ELFDATA2MSB: val = be16dec(base); break; case ELFDATA2LSB: val = le16dec(base); break; case ELFDATANONE: errx(1, "invalid data format"); } break; case ELFCLASS64: base = (char *)base + elf64_offsets[member]; switch (e->e_ident[EI_DATA]) { case ELFDATA2MSB: val = be32dec(base); break; case ELFDATA2LSB: val = le32dec(base); break; case ELFDATANONE: errx(1, "invalid data format"); } break; case ELFCLASSNONE: errx(1, "invalid class"); } return val; } #endif static u_int64_t elf_get_word(Elf32_Ehdr *e, void *base, elf_member_t member) { u_int64_t val; val = 0; switch (e->e_ident[EI_CLASS]) { case ELFCLASS32: base = (char *)base + elf32_offsets[member]; switch (e->e_ident[EI_DATA]) { case ELFDATA2MSB: val = be32dec(base); break; case ELFDATA2LSB: val = le32dec(base); break; case ELFDATANONE: errx(1, "invalid data format"); } break; case ELFCLASS64: base = (char *)base + elf64_offsets[member]; switch (e->e_ident[EI_DATA]) { case ELFDATA2MSB: val = be32dec(base); break; case ELFDATA2LSB: val = le32dec(base); break; case ELFDATANONE: errx(1, "invalid data format"); } break; case ELFCLASSNONE: errx(1, "invalid class"); } return val; } static u_int64_t elf_get_quad(Elf32_Ehdr *e, void *base, elf_member_t member) { u_int64_t val; val = 0; switch (e->e_ident[EI_CLASS]) { case ELFCLASS32: base = (char *)base + elf32_offsets[member]; switch (e->e_ident[EI_DATA]) { case ELFDATA2MSB: val = be32dec(base); break; case ELFDATA2LSB: val = le32dec(base); break; case ELFDATANONE: errx(1, "invalid data format"); } break; case ELFCLASS64: base = (char *)base + elf64_offsets[member]; switch (e->e_ident[EI_DATA]) { case ELFDATA2MSB: val = be64dec(base); break; case ELFDATA2LSB: val = le64dec(base); break; case ELFDATANONE: errx(1, "invalid data format"); } break; case ELFCLASSNONE: errx(1, "invalid 
class"); } return val; } static void usage(void) { fprintf(stderr, "usage: elfdump -a | -cdeGhinprs [-w file] file\n"); exit(1); } Index: projects/clang380-import/usr.bin/ldd/ldd.c =================================================================== --- projects/clang380-import/usr.bin/ldd/ldd.c (revision 294776) +++ projects/clang380-import/usr.bin/ldd/ldd.c (revision 294777) @@ -1,413 +1,413 @@ /* * Copyright (c) 1993 Paul Kranenburg * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by Paul Kranenburg. * 4. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include "extern.h" -/* We don't support a.out executables on arm64 */ -#ifndef __aarch64__ +/* We don't support a.out executables on arm64 and riscv */ +#if !defined(__aarch64__) && !defined(__riscv__) #include #define AOUT_SUPPORTED #endif /* * 32-bit ELF data structures can only be used if the system header[s] declare * them. There is no official macro for determining whether they are declared, * so check for the existence of one of the 32-macros defined in elf(5). 
*/ #ifdef ELF32_R_TYPE #define ELF32_SUPPORTED #endif #define LDD_SETENV(name, value, overwrite) do { \ setenv("LD_" name, value, overwrite); \ setenv("LD_32_" name, value, overwrite); \ } while (0) #define LDD_UNSETENV(name) do { \ unsetenv("LD_" name); \ unsetenv("LD_32_" name); \ } while (0) static int is_executable(const char *fname, int fd, int *is_shlib, int *type); static void usage(void); #define TYPE_UNKNOWN 0 #define TYPE_AOUT 1 #define TYPE_ELF 2 /* Architecture default */ #if __ELF_WORD_SIZE > 32 && defined(ELF32_SUPPORTED) #define TYPE_ELF32 3 /* Explicit 32 bits on architectures >32 bits */ #define _PATH_LDD32 "/usr/bin/ldd32" static int execldd32(char *file, char *fmt1, char *fmt2, int aflag, int vflag) { char *argv[8]; int i, rval, status; LDD_UNSETENV("TRACE_LOADED_OBJECTS"); rval = 0; i = 0; argv[i++] = strdup(_PATH_LDD32); if (aflag) argv[i++] = strdup("-a"); if (vflag) argv[i++] = strdup("-v"); if (fmt1 != NULL) { argv[i++] = strdup("-f"); argv[i++] = strdup(fmt1); } if (fmt2 != NULL) { argv[i++] = strdup("-f"); argv[i++] = strdup(fmt2); } argv[i++] = strdup(file); argv[i++] = NULL; switch (fork()) { case -1: err(1, "fork"); break; case 0: execv(_PATH_LDD32, argv); warn("%s", _PATH_LDD32); _exit(127); break; default: if (wait(&status) < 0) rval = 1; else if (WIFSIGNALED(status)) rval = 1; else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) rval = 1; break; } while (i--) free(argv[i]); LDD_SETENV("TRACE_LOADED_OBJECTS", "yes", 1); return (rval); } #endif int main(int argc, char *argv[]) { char *fmt1, *fmt2; int rval, c, aflag, vflag; aflag = vflag = 0; fmt1 = fmt2 = NULL; while ((c = getopt(argc, argv, "af:v")) != -1) { switch (c) { case 'a': aflag++; break; case 'f': if (fmt1 != NULL) { if (fmt2 != NULL) errx(1, "too many formats"); fmt2 = optarg; } else fmt1 = optarg; break; case 'v': vflag++; break; default: usage(); /* NOTREACHED */ } } argc -= optind; argv += optind; if (vflag && fmt1 != NULL) errx(1, "-v may not be used with -f"); if (argc <= 0) { usage(); /* NOTREACHED */ } #ifdef __i386__ if (vflag) { for (c = 0; c < argc; c++) dump_file(argv[c]); exit(error_count == 0 ? EXIT_SUCCESS : EXIT_FAILURE); } #endif rval = 0; for (; argc > 0; argc--, argv++) { int fd, status, is_shlib, rv, type; if ((fd = open(*argv, O_RDONLY, 0)) < 0) { warn("%s", *argv); rval |= 1; continue; } rv = is_executable(*argv, fd, &is_shlib, &type); close(fd); if (rv == 0) { rval |= 1; continue; } switch (type) { case TYPE_ELF: case TYPE_AOUT: break; #if __ELF_WORD_SIZE > 32 && defined(ELF32_SUPPORTED) case TYPE_ELF32: rval |= execldd32(*argv, fmt1, fmt2, aflag, vflag); continue; #endif case TYPE_UNKNOWN: default: /* * This shouldn't happen unless is_executable() * is broken. 
*/ errx(EDOOFUS, "unknown executable type"); } /* ld.so magic */ LDD_SETENV("TRACE_LOADED_OBJECTS", "yes", 1); if (fmt1 != NULL) LDD_SETENV("TRACE_LOADED_OBJECTS_FMT1", fmt1, 1); if (fmt2 != NULL) LDD_SETENV("TRACE_LOADED_OBJECTS_FMT2", fmt2, 1); LDD_SETENV("TRACE_LOADED_OBJECTS_PROGNAME", *argv, 1); if (aflag) LDD_SETENV("TRACE_LOADED_OBJECTS_ALL", "1", 1); else if (fmt1 == NULL && fmt2 == NULL) /* Default formats */ printf("%s:\n", *argv); fflush(stdout); switch (fork()) { case -1: err(1, "fork"); break; default: if (wait(&status) < 0) { warn("wait"); rval |= 1; } else if (WIFSIGNALED(status)) { fprintf(stderr, "%s: signal %d\n", *argv, WTERMSIG(status)); rval |= 1; } else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) { fprintf(stderr, "%s: exit status %d\n", *argv, WEXITSTATUS(status)); rval |= 1; } break; case 0: if (is_shlib == 0) { execl(*argv, *argv, (char *)NULL); warn("%s", *argv); } else { dlopen(*argv, RTLD_TRACE); warnx("%s: %s", *argv, dlerror()); } _exit(1); } } return rval; } static void usage(void) { fprintf(stderr, "usage: ldd [-a] [-v] [-f format] program ...\n"); exit(1); } static int is_executable(const char *fname, int fd, int *is_shlib, int *type) { union { #ifdef AOUT_SUPPORTED struct exec aout; #endif #if __ELF_WORD_SIZE > 32 && defined(ELF32_SUPPORTED) Elf32_Ehdr elf32; #endif Elf_Ehdr elf; } hdr; int n; *is_shlib = 0; *type = TYPE_UNKNOWN; if ((n = read(fd, &hdr, sizeof(hdr))) == -1) { warn("%s: can't read program header", fname); return (0); } #ifdef AOUT_SUPPORTED if ((size_t)n >= sizeof(hdr.aout) && !N_BADMAG(hdr.aout)) { /* a.out file */ if ((N_GETFLAG(hdr.aout) & EX_DPMASK) != EX_DYNAMIC #if 1 /* Compatibility */ || hdr.aout.a_entry < __LDPGSZ #endif ) { warnx("%s: not a dynamic executable", fname); return (0); } *type = TYPE_AOUT; return (1); } #endif #if __ELF_WORD_SIZE > 32 && defined(ELF32_SUPPORTED) if ((size_t)n >= sizeof(hdr.elf32) && IS_ELF(hdr.elf32) && hdr.elf32.e_ident[EI_CLASS] == ELFCLASS32) { /* Handle 32 bit ELF objects */ Elf32_Phdr phdr; int dynamic, i; dynamic = 0; *type = TYPE_ELF32; if (lseek(fd, hdr.elf32.e_phoff, SEEK_SET) == -1) { warnx("%s: header too short", fname); return (0); } for (i = 0; i < hdr.elf32.e_phnum; i++) { if (read(fd, &phdr, hdr.elf32.e_phentsize) != sizeof(phdr)) { warnx("%s: can't read program header", fname); return (0); } if (phdr.p_type == PT_DYNAMIC) { dynamic = 1; break; } } if (!dynamic) { warnx("%s: not a dynamic ELF executable", fname); return (0); } if (hdr.elf32.e_type == ET_DYN) { if (hdr.elf32.e_ident[EI_OSABI] == ELFOSABI_FREEBSD) { *is_shlib = 1; return (1); } warnx("%s: not a FreeBSD ELF shared object", fname); return (0); } return (1); } #endif if ((size_t)n >= sizeof(hdr.elf) && IS_ELF(hdr.elf) && hdr.elf.e_ident[EI_CLASS] == ELF_TARG_CLASS) { /* Handle default ELF objects on this architecture */ Elf_Phdr phdr; int dynamic, i; dynamic = 0; *type = TYPE_ELF; if (lseek(fd, hdr.elf.e_phoff, SEEK_SET) == -1) { warnx("%s: header too short", fname); return (0); } for (i = 0; i < hdr.elf.e_phnum; i++) { if (read(fd, &phdr, hdr.elf.e_phentsize) != sizeof(phdr)) { warnx("%s: can't read program header", fname); return (0); } if (phdr.p_type == PT_DYNAMIC) { dynamic = 1; break; } } if (!dynamic) { warnx("%s: not a dynamic ELF executable", fname); return (0); } if (hdr.elf.e_type == ET_DYN) { switch (hdr.elf.e_ident[EI_OSABI]) { case ELFOSABI_FREEBSD: *is_shlib = 1; return (1); #ifdef __ARM_EABI__ case ELFOSABI_NONE: if (hdr.elf.e_machine != EM_ARM) break; if (EF_ARM_EABI_VERSION(hdr.elf.e_flags) < 
			    EF_ARM_EABI_FREEBSD_MIN)
				break;
			*is_shlib = 1;
			return (1);
#endif
			}
			warnx("%s: not a FreeBSD ELF shared object", fname);
			return (0);
		}
		return (1);
	}
	warnx("%s: not a dynamic executable", fname);
	return (0);
}
Index: projects/clang380-import/usr.bin/whois/whois.1
===================================================================
--- projects/clang380-import/usr.bin/whois/whois.1 (revision 294776)
+++ projects/clang380-import/usr.bin/whois/whois.1 (revision 294777)
@@ -1,273 +1,267 @@
.\" Copyright (c) 1985, 1990, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" From: @(#)whois.1 8.1 (Berkeley) 6/6/93
.\" $FreeBSD$
.\"
-.Dd January 22, 2016
+.Dd January 23, 2016
.Dt WHOIS 1
.Os
.Sh NAME
.Nm whois
.Nd "Internet domain name and network number directory service"
.Sh SYNOPSIS
.Nm
.Op Fl aAbfgiIklmPQrRS
.Op Fl c Ar country-code | Fl h Ar host
.Op Fl p Ar port
.Ar name ...
.Sh DESCRIPTION
The
.Nm
utility looks up records in the databases maintained by several
Network Information Centers
.Pq Tn NICs .
.Pp
By default
.Nm
-automatically discovers the name of a whois server to use
-from the top-level domain
-.Pq Tn TLD
-of the supplied (single) argument.
-It tries
-.Qq Va TLD Ns Li .whois-servers.net
-and
-.Qq Li whois.nic. Ns Va TLD
-and if neither host exists it falls back to its default server.
+starts by querying the Internet Assigned Numbers Authority (IANA) whois server,
+and follows referrals to whois servers
+that have more specific details about the query
+.Ar name .
+The IANA whois server knows about
+IP addresses and AS numbers
+as well as domain names.
.Pp
-If an IP address or AS number is specified,
-the whois server will default to
-the American Registry for Internet Numbers
-.Pq Tn ARIN .
-.Pp
-If
+There are a few special cases where referrals do not work, so
.Nm
-cannot automatically discover a server,
-it will fall back to
-the host specified in the
-.Ev WHOIS_SERVER
-or
-.Ev RA_SERVER
-environment variables, or if those are not set, it will use
-.Pa whois.crsnic.net .
+goes directly to the appropriate server. +These include point-of-contact handles for ARIN, +.Pa nic.at , +NORID, and RIPE, +and domain names under +.Pa ac.uk . .Pp The options are as follows: .Bl -tag -width indent .It Fl a Use the American Registry for Internet Numbers .Pq Tn ARIN database. It contains network numbers used in those parts of the world covered neither by .Tn APNIC , AfriNIC , LACNIC , nor by .Tn RIPE . -.Pp -(Hint: All point of contact handles in the -.Tn ARIN -whois database end with -.Qq Li -ARIN . ) +The query syntax is documented at +.Pa https://www.arin.net/resources/whoisrws/whois_api.html#nicname .It Fl A Use the Asia/Pacific Network Information Center .Pq Tn APNIC database. It contains network numbers used in East Asia, Australia, New Zealand, and the Pacific islands. +Get query syntax documentation using +.Ic whois -A help .It Fl b Use the Network Abuse Clearinghouse database. It contains addresses to which network abuse should be reported, indexed by domain name. .It Fl c Ar country-code This is the equivalent of using the .Fl h option with an argument of .Qq Ar country-code Ns Li .whois-servers.net . .It Fl f Use the African Network Information Centre .Pq Tn AfriNIC database. It contains network numbers used in Africa and the islands of the western Indian Ocean. +Get query syntax documentation using +.Ic whois -f help .It Fl g Use the US non-military federal government database, which contains points of contact for subdomains of .Pa .GOV . .It Fl h Ar host Use the specified host instead of the default. Either a host name or an IP address may be specified. .It Fl i -Use the obsolete Network Solutions Registry for Internet Numbers -.Pq Pa whois.networksolutions.com +Use the traditional Network Information Center (InterNIC) +.Pq Pa whois.internic.net database. +This now contains only registrations for domain names under +.Pa .COM , +.Pa .NET , +.Pa .EDU . +You can specify the type of object to search for like +.Ic whois -i ' Ns Ar type Ar name Ns Ic ' +where +.Ar type +can be +.Nm domain , nameserver , registrar . +The +.Ar name +can contain +.Li * +wildcards. .It Fl I Use the Internet Assigned Numbers Authority .Pq Tn IANA database. -It contains network information for top-level domains. .It Fl k Use the National Internet Development Agency of Korea's .Pq Tn KRNIC database. It contains network numbers and domain contact information for Korea. .It Fl l Use the Latin American and Caribbean IP address Regional Registry .Pq Tn LACNIC database. It contains network numbers used in much of Latin America and the Caribbean. .It Fl m Use the Route Arbiter Database .Pq Tn RADB database. It contains route policy specifications for a large number of operators' networks. .It Fl p Ar port Connect to the whois server on .Ar port . If this option is not specified, .Nm defaults to port 43. .It Fl P Use the PeeringDB database of AS numbers. It contains details about presence at internet peering points for many network operators. .It Fl Q Do a quick lookup; .Nm will not attempt to follow referrals to other whois servers. This is the default if a server is explicitly specified -using one of the other options. +using one of the other options or in an environment variable. See also the .Fl R option. .It Fl r Use the R\(aaeseaux IP Europ\(aaeens .Pq Tn RIPE database. It contains network numbers and domain contact information for Europe. +Get query syntax documentation using +.Ic whois -r help .It Fl R Do a recursive lookup; .Nm will attempt to follow referrals to other whois servers. 
This is the default if no server is explicitly specified. See also the .Fl Q option. .It Fl S -By default, if the whois server is -.Pa whois.verisign-grs.com -(or a CNAME alias pointing at that name) -then +By default .Nm -will query for -.Dl domain Ar name -The +adjusts simple queries (without spaces) to produce more useful output +from certain whois servers, +and it suppresses some uninformative output. +With the .Fl S -option suppresses this behaviour, -allowing you to make a loose-matching query, -or query for host objects using the syntax -.Dl nameserver Ar name +option, +.Nm +sends the query and prints the output verbatim. .El .Pp The operands specified to .Nm are treated independently and may be used as queries on different whois servers. .Sh ENVIRONMENT .Bl -tag .It Ev WHOIS_SERVER The primary default whois server. If this is unset, .Nm uses the .Ev RA_SERVER environment variable. .It Ev RA_SERVER The secondary default whois server. If this is unset, .Nm will use -.Pa whois.crsnic.net . +.Pa whois.iana.org . .El .Sh EXIT STATUS .Ex -std .Sh EXAMPLES -Most types of data, such as domain names and -.Tn IP -addresses, can be used as arguments to -.Nm -without any options, and -.Nm -will choose the correct whois server to query. -Some exceptions, where -.Nm -will not be able to handle data correctly, are detailed below. -.Pp To obtain contact information about an administrator located in the Russian .Tn TLD domain .Qq Li RU , use the .Fl c option as shown in the following example, where .Ar CONTACT-ID is substituted with the actual contact identifier. .Pp .Dl "whois -c RU CONTACT-ID" .Pp (Note: This example is specific to the .Tn TLD .Qq Li RU , but other .Tn TLDs can be queried by using a similar syntax.) .Pp The following example demonstrates how to query a whois server using a non-standard port, where .Dq Li query-data is the query to be sent to .Dq Li whois.example.com on port .Dq Li rwhois (written numerically as 4321). .Pp .Dl "whois -h whois.example.com -p rwhois query-data" .Sh SEE ALSO .Rs .%A Ken Harrenstien .%A Vic White .%T NICNAME/WHOIS .%D 1 March 1982 .%O RFC 812 .Re .Sh HISTORY The .Nm command appeared in .Bx 4.3 . Index: projects/clang380-import/usr.bin/whois/whois.c =================================================================== --- projects/clang380-import/usr.bin/whois/whois.c (revision 294776) +++ projects/clang380-import/usr.bin/whois/whois.c (revision 294777) @@ -1,529 +1,493 @@ /*- * Copyright (c) 1980, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #ifndef lint static const char copyright[] = "@(#) Copyright (c) 1980, 1993\n\ The Regents of the University of California. All rights reserved.\n"; #endif /* not lint */ #if 0 #ifndef lint static char sccsid[] = "@(#)whois.c 8.1 (Berkeley) 6/6/93"; #endif /* not lint */ #endif #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define ABUSEHOST "whois.abuse.net" #define ANICHOST "whois.arin.net" -#define BNICHOST "whois.registro.br" +#define DENICHOST "whois.denic.de" +#define DKNICHOST "whois.dk-hostmaster.dk" #define FNICHOST "whois.afrinic.net" -#define GERMNICHOST "de" QNICHOST_TAIL #define GNICHOST "whois.nic.gov" #define IANAHOST "whois.iana.org" -#define INICHOST "whois.networksolutions.com" +#define INICHOST "whois.internic.net" #define KNICHOST "whois.krnic.net" #define LNICHOST "whois.lacnic.net" #define MNICHOST "whois.ra.net" -#define NICHOST "whois.crsnic.net" #define PDBHOST "whois.peeringdb.com" #define PNICHOST "whois.apnic.net" -#define QNICHOST_HEAD "whois.nic." #define QNICHOST_TAIL ".whois-servers.net" #define RNICHOST "whois.ripe.net" #define VNICHOST "whois.verisign-grs.com" #define DEFAULT_PORT "whois" -#define WHOIS_RECURSE 0x01 -#define WHOIS_QUICK 0x02 -#define WHOIS_SPAM_ME 0x04 +#define WHOIS_RECURSE 0x01 +#define WHOIS_QUICK 0x02 +#define WHOIS_SPAM_ME 0x04 +#define CHOPSPAM ">>> Last update of WHOIS database:" + #define ishost(h) (isalnum((unsigned char)h) || h == '.' || h == '-') +#define SCAN(p, end, check) \ + while ((p) < (end)) \ + if (check) ++(p); \ + else break + static struct { const char *suffix, *server; } whoiswhere[] = { /* Various handles */ { "-ARIN", ANICHOST }, { "-NICAT", "at" QNICHOST_TAIL }, { "-NORID", "no" QNICHOST_TAIL }, { "-RIPE", RNICHOST }, /* Nominet's whois server doesn't return referrals to JANET */ { ".ac.uk", "ac.uk" QNICHOST_TAIL }, - { NULL, NULL } + { "", IANAHOST }, /* default */ + { NULL, NULL } /* safety belt */ }; #define WHOIS_REFERRAL(s) { s, sizeof(s) - 1 } static struct { const char *prefix; size_t len; } whois_referral[] = { - WHOIS_REFERRAL("Whois Server: "), - WHOIS_REFERRAL("WHOIS Server: "), - WHOIS_REFERRAL(" Whois Server: "), - WHOIS_REFERRAL("refer: "), - WHOIS_REFERRAL("Registrant Street1:Whois Server:"), - WHOIS_REFERRAL("ReferralServer: whois://"), + WHOIS_REFERRAL("whois:"), /* IANA */ + WHOIS_REFERRAL("Whois Server:"), + WHOIS_REFERRAL("Registrar WHOIS Server:"), /* corporatedomains.com */ + WHOIS_REFERRAL("ReferralServer: whois://"), /* ARIN */ { NULL, 0 } }; static const char *port = DEFAULT_PORT; -static char *choose_server(char *); +static const char *choose_server(char *); static struct addrinfo *gethostinfo(char const *host, int exitnoname); static void s_asprintf(char **ret, const char *format, ...) 
__printflike(2, 3); static void usage(void); static void whois(const char *, const char *, int); int main(int argc, char *argv[]) { const char *country, *host; - char *qnichost; - int ch, flags, use_qnichost; + int ch, flags; #ifdef SOCKS SOCKSinit(argv[0]); #endif - country = host = qnichost = NULL; - flags = use_qnichost = 0; + country = host = NULL; + flags = 0; while ((ch = getopt(argc, argv, "aAbc:fgh:iIklmp:PQrRS")) != -1) { switch (ch) { case 'a': host = ANICHOST; break; case 'A': host = PNICHOST; break; case 'b': host = ABUSEHOST; break; case 'c': country = optarg; break; case 'f': host = FNICHOST; break; case 'g': host = GNICHOST; break; case 'h': host = optarg; break; case 'i': host = INICHOST; break; case 'I': host = IANAHOST; break; case 'k': host = KNICHOST; break; case 'l': host = LNICHOST; break; case 'm': host = MNICHOST; break; case 'p': port = optarg; break; case 'P': host = PDBHOST; break; case 'Q': flags |= WHOIS_QUICK; break; case 'r': host = RNICHOST; break; case 'R': flags |= WHOIS_RECURSE; break; case 'S': flags |= WHOIS_SPAM_ME; break; case '?': default: usage(); /* NOTREACHED */ } } argc -= optind; argv += optind; if (!argc || (country != NULL && host != NULL)) usage(); /* - * If no host or country is specified, try to determine the top - * level domain from the query, or fall back to NICHOST. + * If no host or country is specified, rely on referrals from IANA. */ if (host == NULL && country == NULL) { if ((host = getenv("WHOIS_SERVER")) == NULL && (host = getenv("RA_SERVER")) == NULL) { - use_qnichost = 1; - host = NICHOST; if (!(flags & WHOIS_QUICK)) flags |= WHOIS_RECURSE; } } while (argc-- > 0) { if (country != NULL) { + char *qnichost; s_asprintf(&qnichost, "%s%s", country, QNICHOST_TAIL); whois(*argv, qnichost, flags); - } else if (use_qnichost) - if ((qnichost = choose_server(*argv)) != NULL) - whois(*argv, qnichost, flags); - if (qnichost == NULL) - whois(*argv, host, flags); - free(qnichost); - qnichost = NULL; + free(qnichost); + } else + whois(*argv, host != NULL ? host : + choose_server(*argv), flags); argv++; } exit(0); } -/* - * This function will remove any trailing periods from domain, after which it - * returns a pointer to newly allocated memory containing the whois server to - * be queried, or a NULL if the correct server couldn't be determined. The - * caller must remember to free(3) the allocated memory. - * - * If the domain is an IPv6 address or has a known suffix, that determines - * the server, else if the TLD is a number, query ARIN, else try a couple of - * formulaic server names. Fail if the domain does not contain '.'. 
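The rewritten main() above keeps the WHOIS_SERVER/RA_SERVER fallback and otherwise defers to choose_server(). A minimal sketch of that precedence, with pick_default() as a hypothetical stand-in for the suffix-table lookup (not code from this commit):

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the whoiswhere[] lookup: choose_server()
 * returns whois.iana.org unless a known suffix matches.
 */
static const char *
pick_default(const char *query)
{
        (void)query;
        return ("whois.iana.org");
}

/*
 * Mirrors the precedence in main(): an explicit -h/registry host wins,
 * then WHOIS_SERVER, then RA_SERVER, then the suffix-table default.
 */
static const char *
select_server(const char *explicit_host, const char *query)
{
        const char *host;

        if (explicit_host != NULL)
                return (explicit_host);
        if ((host = getenv("WHOIS_SERVER")) != NULL)
                return (host);
        if ((host = getenv("RA_SERVER")) != NULL)
                return (host);
        return (pick_default(query));
}

int
main(void)
{
        printf("%s\n", select_server(NULL, "example.com"));
        return (0);
}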
- */ -static char * +static const char * choose_server(char *domain) { - char *pos, *retval; + size_t len = strlen(domain); int i; - struct addrinfo *res; - if (strchr(domain, ':')) { - s_asprintf(&retval, "%s", ANICHOST); - return (retval); - } - if (strncasecmp(domain, "AS", 2) == 0) { - size_t len = strspn(domain + 2, "0123456789"); - if (domain[len + 2] == '\0') { - s_asprintf(&retval, "%s", ANICHOST); - return (retval); - } - } - for (pos = strchr(domain, '\0'); pos > domain && pos[-1] == '.';) - *--pos = '\0'; - if (*domain == '\0') - errx(EX_USAGE, "can't search for a null string"); for (i = 0; whoiswhere[i].suffix != NULL; i++) { size_t suffix_len = strlen(whoiswhere[i].suffix); - if (domain + suffix_len < pos && - strcasecmp(pos - suffix_len, whoiswhere[i].suffix) == 0) { - s_asprintf(&retval, "%s", whoiswhere[i].server); - return (retval); - } + if (len > suffix_len && + strcasecmp(domain + len - suffix_len, + whoiswhere[i].suffix) == 0) + return (whoiswhere[i].server); } - while (pos > domain && *pos != '.') - --pos; - if (pos <= domain) - return (NULL); - if (isdigit((unsigned char)*++pos)) { - s_asprintf(&retval, "%s", ANICHOST); - return (retval); - } - /* Try possible alternative whois server name formulae. */ - for (i = 0; ; ++i) { - switch (i) { - case 0: - s_asprintf(&retval, "%s%s", pos, QNICHOST_TAIL); - break; - case 1: - s_asprintf(&retval, "%s%s", QNICHOST_HEAD, pos); - break; - default: - return (NULL); - } - res = gethostinfo(retval, 0); - if (res) { - freeaddrinfo(res); - return (retval); - } else { - free(retval); - continue; - } - } + errx(EX_SOFTWARE, "no default whois server"); } static struct addrinfo * gethostinfo(char const *host, int exit_on_noname) { struct addrinfo hints, *res; int error; memset(&hints, 0, sizeof(hints)); hints.ai_flags = AI_CANONNAME; hints.ai_family = AF_UNSPEC; hints.ai_socktype = SOCK_STREAM; res = NULL; error = getaddrinfo(host, port, &hints, &res); if (error && (exit_on_noname || error != EAI_NONAME)) err(EX_NOHOST, "%s: %s", host, gai_strerror(error)); return (res); } /* * Wrapper for asprintf(3) that exits on error. */ static void s_asprintf(char **ret, const char *format, ...) { va_list ap; va_start(ap, format); if (vasprintf(ret, format, ap) == -1) { va_end(ap); err(EX_OSERR, "vasprintf()"); } va_end(ap); } static void whois(const char *query, const char *hostname, int flags) { FILE *fp; struct addrinfo *hostres, *res; char *buf, *host, *nhost, *p; - int s = -1, f, antispam; + int s = -1, f; nfds_t i, j; size_t len, count; struct pollfd *fds; int timeout = 180; hostres = gethostinfo(hostname, 1); for (res = hostres, count = 0; res; res = res->ai_next) count++; - - antispam = (flags & WHOIS_SPAM_ME) == 0 && - strcmp(hostres->ai_canonname, VNICHOST) == 0; - fds = calloc(count, sizeof(*fds)); if (fds == NULL) err(EX_OSERR, "calloc()"); /* * Traverse the result list elements and make non-block * connection attempts. 
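The new choose_server() above is a plain first-match scan over whoiswhere[]; the final empty suffix matches any query, so the IANA server becomes the default. A standalone sketch with an abridged table (server names taken from the defines above):

#include <err.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sysexits.h>

/*
 * Abridged copy of the suffix table; the real whoiswhere[] ends with
 * an empty catch-all suffix mapping to whois.iana.org.
 */
static struct {
        const char *suffix, *server;
} table[] = {
        { "-ARIN",  "whois.arin.net" },
        { ".ac.uk", "ac.uk.whois-servers.net" },
        { "",       "whois.iana.org" },         /* default */
        { NULL,     NULL }
};

static const char *
pick(const char *domain)
{
        size_t len = strlen(domain);
        int i;

        for (i = 0; table[i].suffix != NULL; i++) {
                size_t slen = strlen(table[i].suffix);

                /*
                 * Case-insensitive match on the tail of the query; the
                 * empty suffix always matches, so the loop cannot miss.
                 */
                if (len > slen &&
                    strcasecmp(domain + len - slen, table[i].suffix) == 0)
                        return (table[i].server);
        }
        errx(EX_SOFTWARE, "no default whois server");
}

int
main(void)
{
        printf("%s\n", pick("KB1234-ARIN"));    /* whois.arin.net */
        printf("%s\n", pick("example.com"));    /* whois.iana.org */
        return (0);
}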
*/ count = i = 0; for (res = hostres; res != NULL; res = res->ai_next) { s = socket(res->ai_family, res->ai_socktype | SOCK_NONBLOCK, res->ai_protocol); if (s < 0) continue; if (connect(s, res->ai_addr, res->ai_addrlen) < 0) { if (errno == EINPROGRESS) { /* Add the socket to poll list */ fds[i].fd = s; fds[i].events = POLLERR | POLLHUP | POLLIN | POLLOUT; count++; i++; } else { close(s); s = -1; /* * Poll only if we have something to poll, * otherwise just go ahead and try next * address */ if (count == 0) continue; } } else goto done; /* * If we are at the last address, poll until a connection is * established or we failed all connection attempts. */ if (res->ai_next == NULL) timeout = INFTIM; /* * Poll the watched descriptors for successful connections: * if we still have more untried resolved addresses, poll only * once; otherwise, poll until all descriptors have errors, * which will be considered as ETIMEDOUT later. */ do { int n; n = poll(fds, i, timeout); if (n == 0) { /* * No event reported in time. Try with a * smaller timeout (but cap at 2-3ms) * after a new host has been added. */ if (timeout >= 3) timeout <<= 1; break; } else if (n < 0) { /* - * errno here can only be EINTR which we would want - * to clean up and bail out. + * errno here can only be EINTR which we would + * want to clean up and bail out. */ s = -1; goto done; } /* * Check for the event(s) we have seen. */ for (j = 0; j < i; j++) { if (fds[j].fd == -1 || fds[j].events == 0 || fds[j].revents == 0) continue; if (fds[j].revents & ~(POLLIN | POLLOUT)) { close(s); fds[j].fd = -1; fds[j].events = 0; count--; continue; } else if (fds[j].revents & (POLLIN | POLLOUT)) { /* Connect succeeded. */ s = fds[j].fd; goto done; } } } while (timeout == INFTIM && count != 0); } /* All attempts failed */ s = -1; if (count == 0) errno = ETIMEDOUT; - done: + if (s == -1) + err(EX_OSERR, "connect()"); + /* Close all watched fds except the succeeded one */ for (j = 0; j < i; j++) if (fds[j].fd != s && fds[j].fd != -1) close(fds[j].fd); - - if (s != -1) { - /* Restore default blocking behavior. */ - if ((f = fcntl(s, F_GETFL)) != -1) { - f &= ~O_NONBLOCK; - if (fcntl(s, F_SETFL, f) == -1) - err(EX_OSERR, "fcntl()"); - } else - err(EX_OSERR, "fcntl()"); - } - free(fds); - freeaddrinfo(hostres); - if (s == -1) - err(EX_OSERR, "connect()"); + /* Restore default blocking behavior. */ + if ((f = fcntl(s, F_GETFL)) == -1) + err(EX_OSERR, "fcntl()"); + f &= ~O_NONBLOCK; + if (fcntl(s, F_SETFL, f) == -1) + err(EX_OSERR, "fcntl()"); + fp = fdopen(s, "r+"); if (fp == NULL) err(EX_OSERR, "fdopen()"); - if (strcmp(hostname, GERMNICHOST) == 0) { - fprintf(fp, "-T dn,ace -C ISO-8859-1 %s\r\n", query); - } else if (strcmp(hostname, "dk" QNICHOST_TAIL) == 0) { + + if (!(flags & WHOIS_SPAM_ME) && + (strcasecmp(hostname, DENICHOST) == 0 || + strcasecmp(hostname, "de" QNICHOST_TAIL) == 0)) { + const char *q; + int idn = 0; + for (q = query; *q != '\0'; q++) + if (!isascii(*q)) + idn = 1; + fprintf(fp, "-T dn%s %s\r\n", idn ? 
"" : ",ace", query); + } else if (!(flags & WHOIS_SPAM_ME) && + (strcasecmp(hostname, DKNICHOST) == 0 || + strcasecmp(hostname, "dk" QNICHOST_TAIL) == 0)) fprintf(fp, "--show-handles %s\r\n", query); - } else if (antispam) { + else if ((flags & WHOIS_SPAM_ME) || + strchr(query, ' ') != NULL) + fprintf(fp, "%s\r\n", query); + else if (strcasecmp(hostname, ANICHOST) == 0) + fprintf(fp, "+ %s\r\n", query); + else if (strcasecmp(hostres->ai_canonname, VNICHOST) == 0) fprintf(fp, "domain %s\r\n", query); - } else { + else fprintf(fp, "%s\r\n", query); - } fflush(fp); + nhost = NULL; while ((buf = fgetln(fp, &len)) != NULL) { - while (len > 0 && isspace((unsigned char)buf[len - 1])) - buf[--len] = '\0'; - printf("%.*s\n", (int)len, buf); + /* Nominet */ + if (!(flags & WHOIS_SPAM_ME) && + len == 5 && strncmp(buf, "-- \r\n", 5) == 0) + break; + printf("%.*s", (int)len, buf); + if ((flags & WHOIS_RECURSE) && nhost == NULL) { for (i = 0; whois_referral[i].prefix != NULL; i++) { - if (strncmp(buf, - whois_referral[i].prefix, - whois_referral[i].len) != 0) + p = buf; + SCAN(p, buf+len, *p == ' '); + if (strncasecmp(p, whois_referral[i].prefix, + whois_referral[i].len) != 0) continue; - host = buf + whois_referral[i].len; - for (p = host; p < buf + len; p++) - if (!ishost(*p)) - break; - s_asprintf(&nhost, "%.*s", - (int)(p - host), host); + p += whois_referral[i].len; + SCAN(p, buf+len, *p == ' '); + host = p; + SCAN(p, buf+len, ishost(*p)); + /* avoid loops */ + if (strncmp(hostname, host, p - host) != 0) + s_asprintf(&nhost, "%.*s", + (int)(p - host), host); break; } } + /* Verisign etc. */ + if (!(flags & WHOIS_SPAM_ME) && + len >= sizeof(CHOPSPAM)-1 && + (strncasecmp(buf, CHOPSPAM, sizeof(CHOPSPAM)-1) == 0 || + strncasecmp(buf, CHOPSPAM+4, sizeof(CHOPSPAM)-5) == 0)) { + printf("\n"); + break; + } } fclose(fp); + freeaddrinfo(hostres); if (nhost != NULL) { - whois(query, nhost, 0); + whois(query, nhost, flags); free(nhost); } } static void usage(void) { fprintf(stderr, "usage: whois [-aAbfgiIklmPQrRS] [-c country-code | -h hostname] " "[-p port] name ...\n"); exit(EX_USAGE); } Index: projects/clang380-import/usr.sbin/autofs/automount.c =================================================================== --- projects/clang380-import/usr.sbin/autofs/automount.c (revision 294776) +++ projects/clang380-import/usr.sbin/autofs/automount.c (revision 294777) @@ -1,396 +1,395 @@ /*- * Copyright (c) 2014 The FreeBSD Foundation * All rights reserved. * * This software was developed by Edward Tomasz Napierala under sponsorship * from the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include - -#include #include "common.h" #include "mntopts.h" static int unmount_by_statfs(const struct statfs *sb, bool force) { char *fsid_str; int error, ret, flags; ret = asprintf(&fsid_str, "FSID:%d:%d", sb->f_fsid.val[0], sb->f_fsid.val[1]); if (ret < 0) log_err(1, "asprintf"); log_debugx("unmounting %s using %s", sb->f_mntonname, fsid_str); flags = MNT_BYFSID; if (force) flags |= MNT_FORCE; error = unmount(fsid_str, flags); free(fsid_str); if (error != 0) log_warn("cannot unmount %s", sb->f_mntonname); return (error); } static const struct statfs * find_statfs(const struct statfs *mntbuf, int nitems, const char *mountpoint) { int i; for (i = 0; i < nitems; i++) { if (strcmp(mntbuf[i].f_mntonname, mountpoint) == 0) return (mntbuf + i); } return (NULL); } static void mount_autofs(const char *from, const char *fspath, const char *options, const char *prefix) { struct iovec *iov = NULL; char errmsg[255]; int error, iovlen = 0; create_directory(fspath); log_debugx("mounting %s on %s, prefix \"%s\", options \"%s\"", from, fspath, prefix, options); memset(errmsg, 0, sizeof(errmsg)); build_iovec(&iov, &iovlen, "fstype", __DECONST(void *, "autofs"), (size_t)-1); build_iovec(&iov, &iovlen, "fspath", __DECONST(void *, fspath), (size_t)-1); build_iovec(&iov, &iovlen, "from", __DECONST(void *, from), (size_t)-1); build_iovec(&iov, &iovlen, "errmsg", errmsg, sizeof(errmsg)); /* * Append the options and mountpoint defined in auto_master(5); * this way automountd(8) does not need to parse it. */ build_iovec(&iov, &iovlen, "master_options", __DECONST(void *, options), (size_t)-1); build_iovec(&iov, &iovlen, "master_prefix", __DECONST(void *, prefix), (size_t)-1); error = nmount(iov, iovlen, 0); if (error != 0) { if (*errmsg != '\0') { log_err(1, "cannot mount %s on %s: %s", from, fspath, errmsg); } else { log_err(1, "cannot mount %s on %s", from, fspath); } } } static void mount_if_not_already(const struct node *n, const char *map, const char *options, const char *prefix, const struct statfs *mntbuf, int nitems) { const struct statfs *sb; char *mountpoint; char *from; int ret; ret = asprintf(&from, "map %s", map); if (ret < 0) log_err(1, "asprintf"); mountpoint = node_path(n); sb = find_statfs(mntbuf, nitems, mountpoint); if (sb != NULL) { if (strcmp(sb->f_fstypename, "autofs") != 0) { log_debugx("unknown filesystem mounted " "on %s; mounting", mountpoint); /* * XXX: Compare options and 'from', * and update the mount if necessary. 
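mount_autofs() above feeds nmount(2) through build_iovec() from mntopts.c: option names and values travel as alternating iovecs. A simplified stand-in, shown only to illustrate the pairing; the real build_iovec() has a somewhat different contract:

#include <sys/uio.h>
#include <err.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified build_iovec(): append one name/value pair to the vector.
 * Values are referenced, not copied, so they must outlive the array.
 * A length of (size_t)-1 means "NUL-terminated string".
 */
static void
add_opt(struct iovec **iov, int *iovlen, const char *name, const char *val,
    size_t len)
{
        int i = *iovlen;

        *iov = realloc(*iov, sizeof(**iov) * (i + 2));
        if (*iov == NULL)
                err(1, "realloc");
        (*iov)[i].iov_base = strdup(name);
        (*iov)[i].iov_len = strlen(name) + 1;
        (*iov)[i + 1].iov_base = (void *)(uintptr_t)val;
        (*iov)[i + 1].iov_len = (len == (size_t)-1) ? strlen(val) + 1 : len;
        *iovlen = i + 2;
}

int
main(void)
{
        struct iovec *iov = NULL;
        int iovlen = 0;

        add_opt(&iov, &iovlen, "fstype", "autofs", (size_t)-1);
        add_opt(&iov, &iovlen, "fspath", "/net", (size_t)-1);
        add_opt(&iov, &iovlen, "from", "map -hosts", (size_t)-1);
        /* nmount(iov, iovlen, 0) would perform the mount (root only). */
        return (0);
}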
*/ } else { log_debugx("autofs already mounted " "on %s", mountpoint); free(from); free(mountpoint); return; } } else { log_debugx("nothing mounted on %s; mounting", mountpoint); } mount_autofs(from, mountpoint, options, prefix); free(from); free(mountpoint); } static void mount_unmount(struct node *root) { struct statfs *mntbuf; struct node *n, *n2; int i, nitems; nitems = getmntinfo(&mntbuf, MNT_WAIT); if (nitems <= 0) log_err(1, "getmntinfo"); log_debugx("unmounting stale autofs mounts"); for (i = 0; i < nitems; i++) { if (strcmp(mntbuf[i].f_fstypename, "autofs") != 0) { log_debugx("skipping %s, filesystem type is not autofs", mntbuf[i].f_mntonname); continue; } n = node_find(root, mntbuf[i].f_mntonname); if (n != NULL) { log_debugx("leaving autofs mounted on %s", mntbuf[i].f_mntonname); continue; } log_debugx("autofs mounted on %s not found " "in new configuration; unmounting", mntbuf[i].f_mntonname); unmount_by_statfs(&(mntbuf[i]), false); } log_debugx("mounting new autofs mounts"); TAILQ_FOREACH(n, &root->n_children, n_next) { if (!node_is_direct_map(n)) { mount_if_not_already(n, n->n_map, n->n_options, n->n_key, mntbuf, nitems); continue; } TAILQ_FOREACH(n2, &n->n_children, n_next) { mount_if_not_already(n2, n->n_map, n->n_options, "/", mntbuf, nitems); } } } static void flush_autofs(const char *fspath) { struct iovec *iov = NULL; char errmsg[255]; int error, iovlen = 0; log_debugx("flushing %s", fspath); memset(errmsg, 0, sizeof(errmsg)); build_iovec(&iov, &iovlen, "fstype", __DECONST(void *, "autofs"), (size_t)-1); build_iovec(&iov, &iovlen, "fspath", __DECONST(void *, fspath), (size_t)-1); build_iovec(&iov, &iovlen, "errmsg", errmsg, sizeof(errmsg)); error = nmount(iov, iovlen, MNT_UPDATE); if (error != 0) { if (*errmsg != '\0') { log_err(1, "cannot flush %s: %s", fspath, errmsg); } else { log_err(1, "cannot flush %s", fspath); } } } static void flush_caches(void) { struct statfs *mntbuf; int i, nitems; nitems = getmntinfo(&mntbuf, MNT_WAIT); if (nitems <= 0) log_err(1, "getmntinfo"); log_debugx("flushing autofs caches"); for (i = 0; i < nitems; i++) { if (strcmp(mntbuf[i].f_fstypename, "autofs") != 0) { log_debugx("skipping %s, filesystem type is not autofs", mntbuf[i].f_mntonname); continue; } flush_autofs(mntbuf[i].f_mntonname); } } static void unmount_automounted(bool force) { struct statfs *mntbuf; int i, nitems; nitems = getmntinfo(&mntbuf, MNT_WAIT); if (nitems <= 0) log_err(1, "getmntinfo"); log_debugx("unmounting automounted filesystems"); for (i = 0; i < nitems; i++) { if (strcmp(mntbuf[i].f_fstypename, "autofs") == 0) { log_debugx("skipping %s, filesystem type is autofs", mntbuf[i].f_mntonname); continue; } if ((mntbuf[i].f_flags & MNT_AUTOMOUNTED) == 0) { log_debugx("skipping %s, not automounted", mntbuf[i].f_mntonname); continue; } unmount_by_statfs(&(mntbuf[i]), force); } } static void usage_automount(void) { fprintf(stderr, "usage: automount [-D name=value][-o opts][-Lcfuv]\n"); exit(1); } int main_automount(int argc, char **argv) { struct node *root; int ch, debug = 0, show_maps = 0; char *options = NULL; bool do_unmount = false, force_unmount = false, flush = false; /* * Note that in automount(8), the only purpose of variable * handling is to aid in debugging maps (automount -L). 
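mount_unmount(), flush_caches(), and unmount_automounted() in this file all share one pattern: fetch the mount table once with getmntinfo(3), then filter on f_fstypename (and, for unmounts, the MNT_AUTOMOUNTED flag). A runnable reduction that lists autofs mounts:

#include <sys/param.h>
#include <sys/ucred.h>
#include <sys/mount.h>
#include <err.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
        struct statfs *mntbuf;
        int i, nitems;

        /* MNT_WAIT asks for up-to-date statistics from each filesystem. */
        nitems = getmntinfo(&mntbuf, MNT_WAIT);
        if (nitems <= 0)
                err(1, "getmntinfo");
        for (i = 0; i < nitems; i++) {
                if (strcmp(mntbuf[i].f_fstypename, "autofs") != 0)
                        continue;
                printf("autofs mounted on %s (FSID:%d:%d)\n",
                    mntbuf[i].f_mntonname,
                    mntbuf[i].f_fsid.val[0], mntbuf[i].f_fsid.val[1]);
        }
        return (0);
}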
*/ defined_init(); while ((ch = getopt(argc, argv, "D:Lfco:uv")) != -1) { switch (ch) { case 'D': defined_parse_and_add(optarg); break; case 'L': show_maps++; break; case 'c': flush = true; break; case 'f': force_unmount = true; break; case 'o': options = concat(options, ',', optarg); break; case 'u': do_unmount = true; break; case 'v': debug++; break; case '?': default: usage_automount(); } } argc -= optind; if (argc != 0) usage_automount(); if (force_unmount && !do_unmount) usage_automount(); log_init(debug); if (flush) { flush_caches(); return (0); } if (do_unmount) { unmount_automounted(force_unmount); return (0); } root = node_new_root(); parse_master(root, AUTO_MASTER_PATH); if (show_maps) { if (show_maps > 1) { node_expand_indirect_maps(root); node_expand_ampersand(root, NULL); } node_expand_defined(root); node_print(root, options); return (0); } mount_unmount(root); return (0); } Index: projects/clang380-import/usr.sbin/autofs/automountd.c =================================================================== --- projects/clang380-import/usr.sbin/autofs/automountd.c (revision 294776) +++ projects/clang380-import/usr.sbin/autofs/automountd.c (revision 294777) @@ -1,569 +1,568 @@ /*- * Copyright (c) 2014 The FreeBSD Foundation * All rights reserved. * * This software was developed by Edward Tomasz Napierala under sponsorship * from the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include - -#include #include "autofs_ioctl.h" #include "common.h" #define AUTOMOUNTD_PIDFILE "/var/run/automountd.pid" static int nchildren = 0; static int autofs_fd; static int request_id; static void done(int request_error, bool wildcards) { struct autofs_daemon_done add; int error; memset(&add, 0, sizeof(add)); add.add_id = request_id; add.add_wildcards = wildcards; add.add_error = request_error; log_debugx("completing request %d with error %d", request_id, request_error); error = ioctl(autofs_fd, AUTOFSDONE, &add); if (error != 0) log_warn("AUTOFSDONE"); } /* * Remove "fstype=whatever" from optionsp and return the "whatever" part. 
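done() above answers a kernel request via the AUTOFSDONE ioctl. handle_request() below arms atexit(exit_callback) so that any unexpected exit reports EIO, while the success paths leave through quick_exit(3), which skips atexit handlers. The shape of that pattern, with report_failure() as a stand-in for done():

#include <stdio.h>
#include <stdlib.h>

/*
 * report_failure() stands in for exit_callback()/done(EIO, ...): once
 * registered with atexit(3), every ordinary exit reports failure, and
 * only quick_exit(3), used on the success paths, bypasses it.
 */
static void
report_failure(void)
{
        fprintf(stderr, "request failed; notifying the kernel\n");
}

static void
handle(int ok)
{
        atexit(report_failure);
        if (!ok)
                exit(1);        /* runs report_failure() on the way out */
        printf("request done\n");
        quick_exit(0);          /* skips atexit(3) handlers entirely */
}

int
main(void)
{
        handle(1);              /* prints "request done" and nothing else */
        return (0);
}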
*/ static char * pick_option(const char *option, char **optionsp) { char *tofree, *pair, *newoptions; char *picked = NULL; bool first = true; tofree = *optionsp; newoptions = calloc(strlen(*optionsp) + 1, 1); if (newoptions == NULL) log_err(1, "calloc"); while ((pair = strsep(optionsp, ",")) != NULL) { /* * XXX: strncasecmp(3) perhaps? */ if (strncmp(pair, option, strlen(option)) == 0) { picked = checked_strdup(pair + strlen(option)); } else { if (first == false) strcat(newoptions, ","); else first = false; strcat(newoptions, pair); } } free(tofree); *optionsp = newoptions; return (picked); } static void create_subtree(const struct node *node, bool incomplete) { const struct node *child; char *path; bool wildcard_found = false; /* * Skip wildcard nodes. */ if (strcmp(node->n_key, "*") == 0) return; path = node_path(node); log_debugx("creating subtree at %s", path); create_directory(path); if (incomplete) { TAILQ_FOREACH(child, &node->n_children, n_next) { if (strcmp(child->n_key, "*") == 0) { wildcard_found = true; break; } } if (wildcard_found) { log_debugx("node %s contains wildcard entry; " "not creating its subdirectories due to -d flag", path); free(path); return; } } free(path); TAILQ_FOREACH(child, &node->n_children, n_next) create_subtree(child, incomplete); } static void exit_callback(void) { done(EIO, true); } static void handle_request(const struct autofs_daemon_request *adr, char *cmdline_options, bool incomplete_hierarchy) { const char *map; struct node *root, *parent, *node; FILE *f; char *key, *options, *fstype, *nobrowse, *retrycnt, *tmp; int error; bool wildcards; log_debugx("got request %d: from %s, path %s, prefix \"%s\", " "key \"%s\", options \"%s\"", adr->adr_id, adr->adr_from, adr->adr_path, adr->adr_prefix, adr->adr_key, adr->adr_options); /* * Try to notify the kernel about any problems. */ request_id = adr->adr_id; atexit(exit_callback); if (strncmp(adr->adr_from, "map ", 4) != 0) { log_errx(1, "invalid mountfrom \"%s\"; failing request", adr->adr_from); } map = adr->adr_from + 4; /* 4 for strlen("map "); */ root = node_new_root(); if (adr->adr_prefix[0] == '\0' || strcmp(adr->adr_prefix, "/") == 0) { /* * Direct map. autofs(4) doesn't have a way to determine * correct map key, but since it's a direct map, we can just * use adr_path instead. */ parent = root; key = checked_strdup(adr->adr_path); } else { /* * Indirect map. */ parent = node_new_map(root, checked_strdup(adr->adr_prefix), NULL, checked_strdup(map), checked_strdup("[kernel request]"), lineno); if (adr->adr_key[0] == '\0') key = NULL; else key = checked_strdup(adr->adr_key); } /* * "Wildcards" here actually means "make autofs(4) request * automountd(8) action if the node being looked up does not * exist, even though the parent is marked as cached". This * needs to be done for maps with wildcard entries, but also * for special and executable maps. */ parse_map(parent, map, key, &wildcards); if (!wildcards) wildcards = node_has_wildcards(parent); if (wildcards) log_debugx("map may contain wildcard entries"); else log_debugx("map does not contain wildcard entries"); if (key != NULL) node_expand_wildcard(root, key); node = node_find(root, adr->adr_path); if (node == NULL) { log_errx(1, "map %s does not contain key for \"%s\"; " "failing mount", map, adr->adr_path); } options = node_options(node); /* * Append options from auto_master. */ options = concat(options, ',', adr->adr_options); /* * Prepend options passed via automountd(8) command line. 
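pick_option() above rebuilds the comma-separated option string without the matched pair and returns whatever followed the prefix. A standalone version with its expected output:

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Standalone pick_option(): remove "name=value" (or a bare flag) from
 * a comma-separated option string, returning the part after the prefix.
 */
static char *
pick(const char *option, char **optionsp)
{
        char *tofree, *pair, *newopts, *picked = NULL;
        int first = 1;

        tofree = *optionsp;
        /* The result can only shrink, so the old length suffices. */
        newopts = calloc(strlen(*optionsp) + 1, 1);
        if (newopts == NULL)
                err(1, "calloc");
        while ((pair = strsep(optionsp, ",")) != NULL) {
                if (strncmp(pair, option, strlen(option)) == 0) {
                        picked = strdup(pair + strlen(option));
                } else {
                        if (!first)
                                strcat(newopts, ",");
                        first = 0;
                        strcat(newopts, pair);
                }
        }
        free(tofree);
        *optionsp = newopts;
        return (picked);
}

int
main(void)
{
        char *options = strdup("rw,fstype=nfs,intr");
        char *fstype = pick("fstype=", &options);

        /* prints: fstype=nfs options=rw,intr */
        printf("fstype=%s options=%s\n", fstype, options);
        return (0);
}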
*/ options = concat(cmdline_options, ',', options); if (node->n_location == NULL) { log_debugx("found node defined at %s:%d; not a mountpoint", node->n_config_file, node->n_config_line); nobrowse = pick_option("nobrowse", &options); if (nobrowse != NULL && key == NULL) { log_debugx("skipping map %s due to \"nobrowse\" " "option; exiting", map); done(0, true); /* * Exit without calling exit_callback(). */ quick_exit(0); } /* * Not a mountpoint; create directories in the autofs mount * and complete the request. */ create_subtree(node, incomplete_hierarchy); if (incomplete_hierarchy && key != NULL) { /* * We still need to create the single subdirectory * the user is trying to access. */ tmp = concat(adr->adr_path, '/', key); node = node_find(root, tmp); if (node != NULL) create_subtree(node, false); } log_debugx("nothing to mount; exiting"); done(0, wildcards); /* * Exit without calling exit_callback(). */ quick_exit(0); } log_debugx("found node defined at %s:%d; it is a mountpoint", node->n_config_file, node->n_config_line); if (key != NULL) node_expand_ampersand(node, key); error = node_expand_defined(node); if (error != 0) { log_errx(1, "variable expansion failed for %s; " "failing mount", adr->adr_path); } /* * Append "automounted". */ options = concat(options, ',', "automounted"); /* * Remove "nobrowse", mount(8) doesn't understand it. */ pick_option("nobrowse", &options); /* * Figure out fstype. */ fstype = pick_option("fstype=", &options); if (fstype == NULL) { log_debugx("fstype not specified in options; " "defaulting to \"nfs\""); fstype = checked_strdup("nfs"); } if (strcmp(fstype, "nfs") == 0) { /* * The mount_nfs(8) command defaults to retrying indefinitely. * We do not want that behaviour, because it leaves mount_nfs(8) * instances and automountd(8) children hanging forever. * Disable retries unless the option was passed explicitly. */ retrycnt = pick_option("retrycnt=", &options); if (retrycnt == NULL) { log_debugx("retrycnt not specified in options; " "defaulting to 1"); options = concat(options, ',', "retrycnt=1"); } else { options = concat(options, ',', concat("retrycnt", '=', retrycnt)); } } f = auto_popen("mount", "-t", fstype, "-o", options, node->n_location, adr->adr_path, NULL); assert(f != NULL); error = auto_pclose(f); if (error != 0) log_errx(1, "mount failed"); log_debugx("mount done; exiting"); done(0, wildcards); /* * Exit without calling exit_callback(). */ quick_exit(0); } static void sigchld_handler(int dummy __unused) { /* * The only purpose of this handler is to make SIGCHLD * interrupt the AUTOFSREQUEST ioctl(2), so we can call * wait_for_children(). */ } static void register_sigchld(void) { struct sigaction sa; int error; bzero(&sa, sizeof(sa)); sa.sa_handler = sigchld_handler; sigfillset(&sa.sa_mask); error = sigaction(SIGCHLD, &sa, NULL); if (error != 0) log_err(1, "sigaction"); } static int wait_for_children(bool block) { pid_t pid; int status; int num = 0; for (;;) { /* * If "block" is true, wait for at least one process. 
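register_sigchld() above deliberately omits SA_RESTART: when a child exits, the pending AUTOFSREQUEST ioctl(2) fails with EINTR and the main loop gets a chance to reap. A sketch of the same effect, with a pipe read standing in for the ioctl:

#include <sys/wait.h>
#include <err.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * A no-op handler installed without SA_RESTART makes a blocking
 * syscall fail with EINTR when SIGCHLD is delivered.
 */
static void
noop(int sig)
{
        (void)sig;
}

int
main(void)
{
        struct sigaction sa;
        int p[2];
        char c;

        if (pipe(p) != 0)
                err(1, "pipe");
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = noop;
        sigfillset(&sa.sa_mask);
        if (sigaction(SIGCHLD, &sa, NULL) != 0)
                err(1, "sigaction");

        switch (fork()) {
        case -1:
                err(1, "fork");
        case 0:
                sleep(1);
                _exit(0);       /* SIGCHLD arrives while the parent blocks */
        }

        /* Nothing ever writes to the pipe, so only EINTR can end this. */
        if (read(p[0], &c, 1) < 0 && errno == EINTR) {
                printf("interrupted; reaping children\n");
                while (waitpid(-1, NULL, WNOHANG) > 0)
                        ;
        }
        return (0);
}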
*/ if (block && num == 0) pid = wait4(-1, &status, 0, NULL); else pid = wait4(-1, &status, WNOHANG, NULL); if (pid <= 0) break; if (WIFSIGNALED(status)) { log_warnx("child process %d terminated with signal %d", pid, WTERMSIG(status)); } else if (WEXITSTATUS(status) != 0) { log_debugx("child process %d terminated with exit status %d", pid, WEXITSTATUS(status)); } else { log_debugx("child process %d terminated gracefully", pid); } num++; } return (num); } static void usage_automountd(void) { fprintf(stderr, "usage: automountd [-D name=value][-m maxproc]" "[-o opts][-Tidv]\n"); exit(1); } int main_automountd(int argc, char **argv) { struct pidfh *pidfh; pid_t pid, otherpid; const char *pidfile_path = AUTOMOUNTD_PIDFILE; char *options = NULL; struct autofs_daemon_request request; int ch, debug = 0, error, maxproc = 30, retval, saved_errno; bool dont_daemonize = false, incomplete_hierarchy = false; defined_init(); while ((ch = getopt(argc, argv, "D:Tdim:o:v")) != -1) { switch (ch) { case 'D': defined_parse_and_add(optarg); break; case 'T': /* * For compatibility with other implementations, * such as OS X. */ debug++; break; case 'd': dont_daemonize = true; debug++; break; case 'i': incomplete_hierarchy = true; break; case 'm': maxproc = atoi(optarg); break; case 'o': options = concat(options, ',', optarg); break; case 'v': debug++; break; case '?': default: usage_automountd(); } } argc -= optind; if (argc != 0) usage_automountd(); log_init(debug); pidfh = pidfile_open(pidfile_path, 0600, &otherpid); if (pidfh == NULL) { if (errno == EEXIST) { log_errx(1, "daemon already running, pid: %jd.", (intmax_t)otherpid); } log_err(1, "cannot open or create pidfile \"%s\"", pidfile_path); } autofs_fd = open(AUTOFS_PATH, O_RDWR | O_CLOEXEC); if (autofs_fd < 0 && errno == ENOENT) { saved_errno = errno; retval = kldload("autofs"); if (retval != -1) autofs_fd = open(AUTOFS_PATH, O_RDWR | O_CLOEXEC); else errno = saved_errno; } if (autofs_fd < 0) log_err(1, "failed to open %s", AUTOFS_PATH); if (dont_daemonize == false) { if (daemon(0, 0) == -1) { log_warn("cannot daemonize"); pidfile_remove(pidfh); exit(1); } } else { lesser_daemon(); } pidfile_write(pidfh); register_sigchld(); for (;;) { log_debugx("waiting for request from the kernel"); memset(&request, 0, sizeof(request)); error = ioctl(autofs_fd, AUTOFSREQUEST, &request); if (error != 0) { if (errno == EINTR) { nchildren -= wait_for_children(false); assert(nchildren >= 0); continue; } log_err(1, "AUTOFSREQUEST"); } if (dont_daemonize) { log_debugx("not forking due to -d flag; " "will exit after servicing a single request"); } else { nchildren -= wait_for_children(false); assert(nchildren >= 0); while (maxproc > 0 && nchildren >= maxproc) { log_debugx("maxproc limit of %d child processes hit; " "waiting for child process to exit", maxproc); nchildren -= wait_for_children(true); assert(nchildren >= 0); } log_debugx("got request; forking child process #%d", nchildren); nchildren++; pid = fork(); if (pid < 0) log_err(1, "fork"); if (pid > 0) continue; } pidfile_close(pidfh); handle_request(&request, options, incomplete_hierarchy); } pidfile_close(pidfh); return (0); } Index: projects/clang380-import/usr.sbin/autofs/autounmountd.c =================================================================== --- projects/clang380-import/usr.sbin/autofs/autounmountd.c (revision 294776) +++ projects/clang380-import/usr.sbin/autofs/autounmountd.c (revision 294777) @@ -1,351 +1,351 @@ /*- * Copyright (c) 2014 The FreeBSD Foundation * All rights reserved. 
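main_automountd() above caps concurrent children at maxproc by first reaping without blocking, then blocking for one child only once the cap is hit. A reduced model of that loop, using waitpid(2) instead of wait4(2) and trivial child work:

#include <sys/wait.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

/*
 * wait_for_children() in miniature: reap everything that has already
 * exited; if "block" is set, wait for at least one child first.
 */
static int
reap(int block)
{
        int num = 0, status;
        pid_t pid;

        for (;;) {
                pid = waitpid(-1, &status, (block && num == 0) ? 0 : WNOHANG);
                if (pid <= 0)
                        break;
                num++;
        }
        return (num);
}

int
main(void)
{
        int maxproc = 4, nchildren = 0, i;

        for (i = 0; i < 16; i++) {      /* pretend 16 requests arrive */
                nchildren -= reap(0);
                while (nchildren >= maxproc)
                        nchildren -= reap(1);   /* cap hit; wait for an exit */
                switch (fork()) {
                case -1:
                        err(1, "fork");
                case 0:
                        usleep(1000 * (i % 5)); /* child: "service a request" */
                        _exit(0);
                default:
                        nchildren++;
                }
        }
        while (nchildren > 0)
                nchildren -= reap(1);
        printf("all children reaped\n");
        return (0);
}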
* * This software was developed by Edward Tomasz Napierala under sponsorship * from the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include +#include #include #include #include #include #include #include -#include #include "common.h" #define AUTOUNMOUNTD_PIDFILE "/var/run/autounmountd.pid" struct automounted_fs { TAILQ_ENTRY(automounted_fs) af_next; time_t af_mount_time; bool af_mark; fsid_t af_fsid; char af_mountpoint[MNAMELEN]; }; static TAILQ_HEAD(, automounted_fs) automounted; static struct automounted_fs * automounted_find(fsid_t fsid) { struct automounted_fs *af; TAILQ_FOREACH(af, &automounted, af_next) { if (af->af_fsid.val[0] == fsid.val[0] && af->af_fsid.val[1] == fsid.val[1]) return (af); } return (NULL); } static struct automounted_fs * automounted_add(fsid_t fsid, const char *mountpoint) { struct automounted_fs *af; af = calloc(sizeof(*af), 1); if (af == NULL) log_err(1, "calloc"); af->af_mount_time = time(NULL); af->af_fsid = fsid; strlcpy(af->af_mountpoint, mountpoint, sizeof(af->af_mountpoint)); TAILQ_INSERT_TAIL(&automounted, af, af_next); return (af); } static void automounted_remove(struct automounted_fs *af) { TAILQ_REMOVE(&automounted, af, af_next); free(af); } static void refresh_automounted(void) { struct automounted_fs *af, *tmpaf; struct statfs *mntbuf; int i, nitems; nitems = getmntinfo(&mntbuf, MNT_WAIT); if (nitems <= 0) log_err(1, "getmntinfo"); log_debugx("refreshing list of automounted filesystems"); TAILQ_FOREACH(af, &automounted, af_next) af->af_mark = false; for (i = 0; i < nitems; i++) { if (strcmp(mntbuf[i].f_fstypename, "autofs") == 0) { log_debugx("skipping %s, filesystem type is autofs", mntbuf[i].f_mntonname); continue; } if ((mntbuf[i].f_flags & MNT_AUTOMOUNTED) == 0) { log_debugx("skipping %s, not automounted", mntbuf[i].f_mntonname); continue; } af = automounted_find(mntbuf[i].f_fsid); if (af == NULL) { log_debugx("new automounted filesystem found on %s " "(FSID:%d:%d)", mntbuf[i].f_mntonname, mntbuf[i].f_fsid.val[0], mntbuf[i].f_fsid.val[1]); af = automounted_add(mntbuf[i].f_fsid, mntbuf[i].f_mntonname); } else { log_debugx("already known automounted filesystem " "found on %s (FSID:%d:%d)", mntbuf[i].f_mntonname, mntbuf[i].f_fsid.val[0], 
mntbuf[i].f_fsid.val[1]); } af->af_mark = true; } TAILQ_FOREACH_SAFE(af, &automounted, af_next, tmpaf) { if (af->af_mark) continue; log_debugx("lost filesystem mounted on %s (FSID:%d:%d)", af->af_mountpoint, af->af_fsid.val[0], af->af_fsid.val[1]); automounted_remove(af); } } static int unmount_by_fsid(const fsid_t fsid, const char *mountpoint) { char *fsid_str; int error, ret; ret = asprintf(&fsid_str, "FSID:%d:%d", fsid.val[0], fsid.val[1]); if (ret < 0) log_err(1, "asprintf"); error = unmount(fsid_str, MNT_BYFSID); if (error != 0) { if (errno == EBUSY) { log_debugx("cannot unmount %s (%s): %s", mountpoint, fsid_str, strerror(errno)); } else { log_warn("cannot unmount %s (%s)", mountpoint, fsid_str); } } free(fsid_str); return (error); } static double expire_automounted(double expiration_time) { struct automounted_fs *af, *tmpaf; time_t now; double mounted_for, mounted_max = -1.0; int error; now = time(NULL); log_debugx("expiring automounted filesystems"); TAILQ_FOREACH_SAFE(af, &automounted, af_next, tmpaf) { mounted_for = difftime(now, af->af_mount_time); if (mounted_for < expiration_time) { log_debugx("skipping %s (FSID:%d:%d), mounted " "for %.0f seconds", af->af_mountpoint, af->af_fsid.val[0], af->af_fsid.val[1], mounted_for); if (mounted_for > mounted_max) mounted_max = mounted_for; continue; } log_debugx("filesystem mounted on %s (FSID:%d:%d), " "was mounted for %.0f seconds; unmounting", af->af_mountpoint, af->af_fsid.val[0], af->af_fsid.val[1], mounted_for); error = unmount_by_fsid(af->af_fsid, af->af_mountpoint); if (error != 0) { if (mounted_for > mounted_max) mounted_max = mounted_for; } } return (mounted_max); } static void usage_autounmountd(void) { fprintf(stderr, "usage: autounmountd [-r time][-t time][-dv]\n"); exit(1); } static void do_wait(int kq, double sleep_time) { struct timespec timeout; struct kevent unused; int nevents; if (sleep_time != -1.0) { assert(sleep_time > 0.0); timeout.tv_sec = sleep_time; timeout.tv_nsec = 0; log_debugx("waiting for filesystem event for %.0f seconds", sleep_time); nevents = kevent(kq, NULL, 0, &unused, 1, &timeout); } else { log_debugx("waiting for filesystem event"); nevents = kevent(kq, NULL, 0, &unused, 1, NULL); } if (nevents < 0) log_err(1, "kevent"); if (nevents == 0) { log_debugx("timeout reached"); assert(sleep_time > 0.0); } else { log_debugx("got filesystem event"); } } int main_autounmountd(int argc, char **argv) { struct kevent event; struct pidfh *pidfh; pid_t otherpid; const char *pidfile_path = AUTOUNMOUNTD_PIDFILE; int ch, debug = 0, error, kq; double expiration_time = 600, retry_time = 600, mounted_max, sleep_time; bool dont_daemonize = false; while ((ch = getopt(argc, argv, "dr:t:v")) != -1) { switch (ch) { case 'd': dont_daemonize = true; debug++; break; case 'r': retry_time = atoi(optarg); break; case 't': expiration_time = atoi(optarg); break; case 'v': debug++; break; case '?': default: usage_autounmountd(); } } argc -= optind; if (argc != 0) usage_autounmountd(); if (retry_time <= 0) log_errx(1, "retry time must be greater than zero"); if (expiration_time <= 0) log_errx(1, "expiration time must be greater than zero"); log_init(debug); pidfh = pidfile_open(pidfile_path, 0600, &otherpid); if (pidfh == NULL) { if (errno == EEXIST) { log_errx(1, "daemon already running, pid: %jd.", (intmax_t)otherpid); } log_err(1, "cannot open or create pidfile \"%s\"", pidfile_path); } if (dont_daemonize == false) { if (daemon(0, 0) == -1) { log_warn("cannot daemonize"); pidfile_remove(pidfh); exit(1); } } pidfile_write(pidfh); 
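expire_automounted() above returns the age of the oldest filesystem that is not yet expired (or -1 when nothing is mounted); main_autounmountd(), continued below, turns that into a sleep time. The same rule as a pure function:

#include <stdio.h>

/*
 * The autounmountd(8) scheduling rule: -1 means block on the kqueue,
 * an age below the threshold means sleep until the oldest mount
 * expires, anything else means an unmount failed and the retry
 * interval applies.
 */
static double
next_sleep(double mounted_max, double expiration_time, double retry_time)
{
        if (mounted_max == -1.0)
                return (-1.0);                  /* nothing to expire; block */
        if (mounted_max < expiration_time)      /* oldest mount expires then */
                return (expiration_time - mounted_max);
        return (retry_time);                    /* unmount failed; retry */
}

int
main(void)
{
        printf("%.0f\n", next_sleep(250.0, 600.0, 600.0));  /* 350 */
        printf("%.0f\n", next_sleep(-1.0, 600.0, 600.0));   /* -1 */
        return (0);
}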
TAILQ_INIT(&automounted); kq = kqueue(); if (kq < 0) log_err(1, "kqueue"); EV_SET(&event, 0, EVFILT_FS, EV_ADD | EV_CLEAR, 0, 0, NULL); error = kevent(kq, &event, 1, NULL, 0, NULL); if (error < 0) log_err(1, "kevent"); for (;;) { refresh_automounted(); mounted_max = expire_automounted(expiration_time); if (mounted_max == -1.0) { sleep_time = mounted_max; log_debugx("no filesystems to expire"); } else if (mounted_max < expiration_time) { sleep_time = difftime(expiration_time, mounted_max); log_debugx("some filesystems expire in %.0f seconds", sleep_time); } else { sleep_time = retry_time; log_debugx("some expired filesystems remain mounted, " "will retry in %.0f seconds", sleep_time); } do_wait(kq, sleep_time); } return (0); } Index: projects/clang380-import/usr.sbin/autofs/common.c =================================================================== --- projects/clang380-import/usr.sbin/autofs/common.c (revision 294776) +++ projects/clang380-import/usr.sbin/autofs/common.c (revision 294777) @@ -1,1225 +1,1224 @@ /*- * Copyright (c) 2014 The FreeBSD Foundation * All rights reserved. * * This software was developed by Edward Tomasz Napierala under sponsorship * from the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #define _WITH_GETLINE #include #include #include #include - -#include #include "autofs_ioctl.h" #include "common.h" extern FILE *yyin; extern char *yytext; extern int yylex(void); static void parse_master_yyin(struct node *root, const char *master); static void parse_map_yyin(struct node *parent, const char *map, const char *executable_key); char * checked_strdup(const char *s) { char *c; assert(s != NULL); c = strdup(s); if (c == NULL) log_err(1, "strdup"); return (c); } /* * Concatenate two strings, inserting separator between them, unless not needed. 
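The main loop above wakes on a single EVFILT_FS kevent, which fires on any mount or unmount, with an optional timeout when mounts are pending expiry. A runnable FreeBSD sketch of do_wait()'s mechanism:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
        struct kevent ev, got;
        struct timespec timeout = { .tv_sec = 10, .tv_nsec = 0 };
        int kq, nevents;

        kq = kqueue();
        if (kq < 0)
                err(1, "kqueue");
        /* One EVFILT_FS filter covers all mount table activity. */
        EV_SET(&ev, 0, EVFILT_FS, EV_ADD | EV_CLEAR, 0, 0, NULL);
        if (kevent(kq, &ev, 1, NULL, 0, NULL) < 0)
                err(1, "kevent");

        /* Pass NULL instead of &timeout to block indefinitely. */
        nevents = kevent(kq, NULL, 0, &got, 1, &timeout);
        if (nevents < 0)
                err(1, "kevent");
        printf(nevents == 0 ? "timeout\n" : "filesystem event\n");
        return (0);
}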
*/ char * concat(const char *s1, char separator, const char *s2) { char *result; char s1last, s2first; int ret; if (s1 == NULL) s1 = ""; if (s2 == NULL) s2 = ""; if (s1[0] == '\0') s1last = '\0'; else s1last = s1[strlen(s1) - 1]; s2first = s2[0]; if (s1last == separator && s2first == separator) { /* * If s1 ends with the separator and s2 begins with * it - skip the latter; otherwise concatenating "/" * and "/foo" would end up returning "//foo". */ ret = asprintf(&result, "%s%s", s1, s2 + 1); } else if (s1last == separator || s2first == separator || s1[0] == '\0' || s2[0] == '\0') { ret = asprintf(&result, "%s%s", s1, s2); } else { ret = asprintf(&result, "%s%c%s", s1, separator, s2); } if (ret < 0) log_err(1, "asprintf"); //log_debugx("%s: got %s and %s, returning %s", __func__, s1, s2, result); return (result); } void create_directory(const char *path) { char *component, *copy, *tofree, *partial, *tmp; int error; assert(path[0] == '/'); /* * +1 to skip the leading slash. */ copy = tofree = checked_strdup(path + 1); partial = checked_strdup(""); for (;;) { component = strsep(©, "/"); if (component == NULL) break; tmp = concat(partial, '/', component); free(partial); partial = tmp; //log_debugx("creating \"%s\"", partial); error = mkdir(partial, 0755); if (error != 0 && errno != EEXIST) { log_warn("cannot create %s", partial); return; } } free(tofree); } struct node * node_new_root(void) { struct node *n; n = calloc(1, sizeof(*n)); if (n == NULL) log_err(1, "calloc"); // XXX n->n_key = checked_strdup("/"); n->n_options = checked_strdup(""); TAILQ_INIT(&n->n_children); return (n); } struct node * node_new(struct node *parent, char *key, char *options, char *location, const char *config_file, int config_line) { struct node *n; n = calloc(1, sizeof(*n)); if (n == NULL) log_err(1, "calloc"); TAILQ_INIT(&n->n_children); assert(key != NULL); assert(key[0] != '\0'); n->n_key = key; if (options != NULL) n->n_options = options; else n->n_options = strdup(""); n->n_location = location; assert(config_file != NULL); n->n_config_file = config_file; assert(config_line >= 0); n->n_config_line = config_line; assert(parent != NULL); n->n_parent = parent; TAILQ_INSERT_TAIL(&parent->n_children, n, n_next); return (n); } struct node * node_new_map(struct node *parent, char *key, char *options, char *map, const char *config_file, int config_line) { struct node *n; n = calloc(1, sizeof(*n)); if (n == NULL) log_err(1, "calloc"); TAILQ_INIT(&n->n_children); assert(key != NULL); assert(key[0] != '\0'); n->n_key = key; if (options != NULL) n->n_options = options; else n->n_options = strdup(""); n->n_map = map; assert(config_file != NULL); n->n_config_file = config_file; assert(config_line >= 0); n->n_config_line = config_line; assert(parent != NULL); n->n_parent = parent; TAILQ_INSERT_TAIL(&parent->n_children, n, n_next); return (n); } static struct node * node_duplicate(const struct node *o, struct node *parent) { const struct node *child; struct node *n; if (parent == NULL) parent = o->n_parent; n = node_new(parent, o->n_key, o->n_options, o->n_location, o->n_config_file, o->n_config_line); TAILQ_FOREACH(child, &o->n_children, n_next) node_duplicate(child, n); return (n); } static void node_delete(struct node *n) { struct node *child, *tmp; assert (n != NULL); TAILQ_FOREACH_SAFE(child, &n->n_children, n_next, tmp) node_delete(child); if (n->n_parent != NULL) TAILQ_REMOVE(&n->n_parent->n_children, n, n_next); free(n); } /* * Move (reparent) node 'n' to make it sibling of 'previous', placed * just after it. 
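concat() above inserts the separator only when neither operand already supplies it, so joining path fragments never doubles a slash. A standalone version with sample output:

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Standalone concat(): join two strings with a separator, unless one
 * side already ends (or begins) with it, or one side is empty.
 */
static char *
join(const char *s1, char sep, const char *s2)
{
        char s1last, *out;
        int ret;

        if (s1 == NULL)
                s1 = "";
        if (s2 == NULL)
                s2 = "";
        s1last = (s1[0] == '\0') ? '\0' : s1[strlen(s1) - 1];
        if (s1last == sep && s2[0] == sep)
                ret = asprintf(&out, "%s%s", s1, s2 + 1); /* drop one copy */
        else if (s1last == sep || s2[0] == sep ||
            s1[0] == '\0' || s2[0] == '\0')
                ret = asprintf(&out, "%s%s", s1, s2);     /* already there */
        else
                ret = asprintf(&out, "%s%c%s", s1, sep, s2);
        if (ret < 0)
                err(1, "asprintf");
        return (out);
}

int
main(void)
{
        printf("%s\n", join("/", '/', "/foo"));     /* /foo */
        printf("%s\n", join("rw", ',', "intr"));    /* rw,intr */
        printf("%s\n", join(NULL, ',', "intr"));    /* intr */
        return (0);
}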
*/ static void node_move_after(struct node *n, struct node *previous) { TAILQ_REMOVE(&n->n_parent->n_children, n, n_next); n->n_parent = previous->n_parent; TAILQ_INSERT_AFTER(&previous->n_parent->n_children, previous, n, n_next); } static void node_expand_includes(struct node *root, bool is_master) { struct node *n, *n2, *tmp, *tmp2, *tmproot; int error; TAILQ_FOREACH_SAFE(n, &root->n_children, n_next, tmp) { if (n->n_key[0] != '+') continue; error = access(AUTO_INCLUDE_PATH, F_OK); if (error != 0) { log_errx(1, "directory services not configured; " "%s does not exist", AUTO_INCLUDE_PATH); } /* * "+1" to skip leading "+". */ yyin = auto_popen(AUTO_INCLUDE_PATH, n->n_key + 1, NULL); assert(yyin != NULL); tmproot = node_new_root(); if (is_master) parse_master_yyin(tmproot, n->n_key); else parse_map_yyin(tmproot, n->n_key, NULL); error = auto_pclose(yyin); yyin = NULL; if (error != 0) { log_errx(1, "failed to handle include \"%s\"", n->n_key); } /* * Entries to be included are now in tmproot. We need to merge * them with the rest, preserving their place and ordering. */ TAILQ_FOREACH_REVERSE_SAFE(n2, &tmproot->n_children, nodehead, n_next, tmp2) { node_move_after(n2, n); } node_delete(n); node_delete(tmproot); } } static char * expand_ampersand(char *string, const char *key) { char c, *expanded; int i, ret, before_len = 0; bool backslashed = false; assert(key[0] != '\0'); expanded = checked_strdup(string); for (i = 0; string[i] != '\0'; i++) { c = string[i]; if (c == '\\' && backslashed == false) { backslashed = true; continue; } if (backslashed) { backslashed = false; continue; } backslashed = false; if (c != '&') continue; /* * The 'before_len' variable contains the number * of characters before the '&'. */ before_len = i; //assert(i + 1 < (int)strlen(string)); ret = asprintf(&expanded, "%.*s%s%s", before_len, string, key, string + before_len + 1); if (ret < 0) log_err(1, "asprintf"); //log_debugx("\"%s\" expanded with key \"%s\" to \"%s\"", // string, key, expanded); /* * Figure out where to start searching for the next variable. */ string = expanded; i = before_len + strlen(key); backslashed = false; //assert(i < (int)strlen(string)); } return (expanded); } /* * Expand "&" in n_location. If the key is NULL, try to use * key from map entries themselves. Keep in mind that maps * consist of two levels of node structures, the key is one * level up. * * Variant with NULL key is for "automount -LL". */ void node_expand_ampersand(struct node *n, const char *key) { struct node *child; if (n->n_location != NULL) { if (key == NULL) { if (n->n_parent != NULL && strcmp(n->n_parent->n_key, "*") != 0) { n->n_location = expand_ampersand(n->n_location, n->n_parent->n_key); } } else { n->n_location = expand_ampersand(n->n_location, key); } } TAILQ_FOREACH(child, &n->n_children, n_next) node_expand_ampersand(child, key); } /* * Expand "*" in n_key. 
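expand_ampersand() above splices the key into every unescaped '&' and leaves backslash escapes in place. A simplified fixed-buffer version; the real code grows the string with asprintf(3):

#include <stdio.h>
#include <string.h>

/*
 * Simplified expand_ampersand(): substitute the key for every
 * unescaped '&'; a "\&" sequence is copied through untouched.
 */
static void
expand(const char *tmpl, const char *key, char *out, size_t outlen)
{
        size_t o = 0, klen = strlen(key);
        int backslashed = 0;

        for (; *tmpl != '\0' && o + klen + 2 < outlen; tmpl++) {
                if (!backslashed && *tmpl == '&') {
                        memcpy(out + o, key, klen);     /* splice in the key */
                        o += klen;
                        continue;
                }
                backslashed = (*tmpl == '\\' && !backslashed);
                out[o++] = *tmpl;
        }
        out[o] = '\0';
}

int
main(void)
{
        char buf[256];

        expand("server:/exports/&", "music", buf, sizeof(buf));
        printf("%s\n", buf);            /* server:/exports/music */
        expand("a \\& b", "x", buf, sizeof(buf));
        printf("%s\n", buf);            /* a \& b, untouched */
        return (0);
}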
*/ void node_expand_wildcard(struct node *n, const char *key) { struct node *child, *expanded; assert(key != NULL); if (strcmp(n->n_key, "*") == 0) { expanded = node_duplicate(n, NULL); expanded->n_key = checked_strdup(key); node_move_after(expanded, n); } TAILQ_FOREACH(child, &n->n_children, n_next) node_expand_wildcard(child, key); } int node_expand_defined(struct node *n) { struct node *child; int error, cumulated_error = 0; if (n->n_location != NULL) { n->n_location = defined_expand(n->n_location); if (n->n_location == NULL) { log_warnx("failed to expand location for %s", node_path(n)); return (EINVAL); } } TAILQ_FOREACH(child, &n->n_children, n_next) { error = node_expand_defined(child); if (error != 0 && cumulated_error == 0) cumulated_error = error; } return (cumulated_error); } static bool node_is_direct_key(const struct node *n) { if (n->n_parent != NULL && n->n_parent->n_parent == NULL && strcmp(n->n_key, "/-") == 0) { return (true); } return (false); } bool node_is_direct_map(const struct node *n) { for (;;) { assert(n->n_parent != NULL); if (n->n_parent->n_parent == NULL) break; n = n->n_parent; } return (node_is_direct_key(n)); } bool node_has_wildcards(const struct node *n) { const struct node *child; TAILQ_FOREACH(child, &n->n_children, n_next) { if (strcmp(child->n_key, "*") == 0) return (true); } return (false); } static void node_expand_maps(struct node *n, bool indirect) { struct node *child, *tmp; TAILQ_FOREACH_SAFE(child, &n->n_children, n_next, tmp) { if (node_is_direct_map(child)) { if (indirect) continue; } else { if (indirect == false) continue; } /* * This is the first-level map node; the one that contains * the key and subnodes with mountpoints and actual map names. */ if (child->n_map == NULL) continue; if (indirect) { log_debugx("map \"%s\" is an indirect map, parsing", child->n_map); } else { log_debugx("map \"%s\" is a direct map, parsing", child->n_map); } parse_map(child, child->n_map, NULL, NULL); } } static void node_expand_direct_maps(struct node *n) { node_expand_maps(n, false); } void node_expand_indirect_maps(struct node *n) { node_expand_maps(n, true); } static char * node_path_x(const struct node *n, char *x) { char *path; if (n->n_parent == NULL) return (x); /* * Return "/-" for direct maps only if we were asked for path * to the "/-" node itself, not to any of its subnodes. */ if (node_is_direct_key(n) && x[0] != '\0') return (x); assert(n->n_key[0] != '\0'); path = concat(n->n_key, '/', x); free(x); return (node_path_x(n->n_parent, path)); } /* * Return full path for node, consisting of concatenated * paths of node itself and all its parents, up to the root. */ char * node_path(const struct node *n) { char *path; size_t len; path = node_path_x(n, checked_strdup("")); /* * Strip trailing slash, unless the whole path is "/". */ len = strlen(path); if (len > 1 && path[len - 1] == '/') path[len - 1] = '\0'; return (path); } static char * node_options_x(const struct node *n, char *x) { char *options; if (n == NULL) return (x); options = concat(x, ',', n->n_options); free(x); return (node_options_x(n->n_parent, options)); } /* * Return options for node, consisting of concatenated * options from the node itself and all its parents, * up to the root. 
*/ char * node_options(const struct node *n) { return (node_options_x(n, checked_strdup(""))); } static void node_print_indent(const struct node *n, const char *cmdline_options, int indent) { const struct node *child, *first_child; char *path, *options, *tmp; path = node_path(n); tmp = node_options(n); options = concat(cmdline_options, ',', tmp); free(tmp); /* * Do not show both parent and child node if they have the same * mountpoint; only show the child node. This means the typical, * "key location", map entries are shown in a single line; * the "key mountpoint1 location2 mountpoint2 location2" entries * take multiple lines. */ first_child = TAILQ_FIRST(&n->n_children); if (first_child == NULL || TAILQ_NEXT(first_child, n_next) != NULL || strcmp(path, node_path(first_child)) != 0) { assert(n->n_location == NULL || n->n_map == NULL); printf("%*.s%-*s %s%-*s %-*s # %s map %s at %s:%d\n", indent, "", 25 - indent, path, options[0] != '\0' ? "-" : " ", 20, options[0] != '\0' ? options : "", 20, n->n_location != NULL ? n->n_location : n->n_map != NULL ? n->n_map : "", node_is_direct_map(n) ? "direct" : "indirect", indent == 0 ? "referenced" : "defined", n->n_config_file, n->n_config_line); } free(path); free(options); TAILQ_FOREACH(child, &n->n_children, n_next) node_print_indent(child, cmdline_options, indent + 2); } /* * Recursively print node with all its children. The cmdline_options * argument is used for additional options to be prepended to all the * others - usually those are the options passed by command line. */ void node_print(const struct node *n, const char *cmdline_options) { const struct node *child; TAILQ_FOREACH(child, &n->n_children, n_next) node_print_indent(child, cmdline_options, 0); } static struct node * node_find_x(struct node *node, const char *path) { struct node *child, *found; char *tmp; size_t tmplen; //log_debugx("looking up %s in %s", path, node_path(node)); if (!node_is_direct_key(node)) { tmp = node_path(node); tmplen = strlen(tmp); if (strncmp(tmp, path, tmplen) != 0) { free(tmp); return (NULL); } if (path[tmplen] != '/' && path[tmplen] != '\0') { /* * If we have two map entries like 'foo' and 'foobar', make * sure the search for 'foobar' won't match 'foo' instead. */ free(tmp); return (NULL); } free(tmp); } TAILQ_FOREACH(child, &node->n_children, n_next) { found = node_find_x(child, path); if (found != NULL) return (found); } if (node->n_parent == NULL || node_is_direct_key(node)) return (NULL); return (node); } struct node * node_find(struct node *root, const char *path) { struct node *node; assert(root->n_parent == NULL); node = node_find_x(root, path); if (node != NULL) assert(node != root); return (node); } /* * Canonical form of a map entry looks like this: * * key [-options] [ [/mountpoint] [-options2] location ... ] * * Entries for executable maps are slightly different, as they * lack the 'key' field and are always single-line; the key field * for those maps is taken from 'executable_key' argument. * * We parse it in such a way that a map always has two levels - first * for key, and the second, for the mountpoint. */ static void parse_map_yyin(struct node *parent, const char *map, const char *executable_key) { char *key = NULL, *options = NULL, *mountpoint = NULL, *options2 = NULL, *location = NULL; int ret; struct node *node; lineno = 1; if (executable_key != NULL) key = checked_strdup(executable_key); for (;;) { ret = yylex(); if (ret == 0 || ret == NEWLINE) { /* * In case of executable map, the key is always * non-NULL, even if the map is empty. 
So, make sure * we don't fail empty maps here. */ if ((key != NULL && executable_key == NULL) || options != NULL) { log_errx(1, "truncated entry at %s, line %d", map, lineno); } if (ret == 0 || executable_key != NULL) { /* * End of file. */ break; } else { key = options = NULL; continue; } } if (key == NULL) { key = checked_strdup(yytext); if (key[0] == '+') { node_new(parent, key, NULL, NULL, map, lineno); key = options = NULL; continue; } continue; } else if (yytext[0] == '-') { if (options != NULL) { log_errx(1, "duplicated options at %s, line %d", map, lineno); } /* * +1 to skip leading "-". */ options = checked_strdup(yytext + 1); continue; } /* * We cannot properly handle a situation where the map key * is "/". Ignore such entries. * * XXX: According to Piete Brooks, Linux automounter uses * "/" as a wildcard character in LDAP maps. Perhaps * we should work around this braindamage by substituting * "*" for "/"? */ if (strcmp(key, "/") == 0) { log_warnx("nonsensical map key \"/\" at %s, line %d; " "ignoring map entry ", map, lineno); /* * Skip the rest of the entry. */ do { ret = yylex(); } while (ret != 0 && ret != NEWLINE); key = options = NULL; continue; } //log_debugx("adding map node, %s", key); node = node_new(parent, key, options, NULL, map, lineno); key = options = NULL; for (;;) { if (yytext[0] == '/') { if (mountpoint != NULL) { log_errx(1, "duplicated mountpoint " "in %s, line %d", map, lineno); } if (options2 != NULL || location != NULL) { log_errx(1, "mountpoint out of order " "in %s, line %d", map, lineno); } mountpoint = checked_strdup(yytext); goto again; } if (yytext[0] == '-') { if (options2 != NULL) { log_errx(1, "duplicated options " "in %s, line %d", map, lineno); } if (location != NULL) { log_errx(1, "options out of order " "in %s, line %d", map, lineno); } options2 = checked_strdup(yytext + 1); goto again; } if (location != NULL) { log_errx(1, "too many arguments " "in %s, line %d", map, lineno); } /* * If location field starts with colon, e.g. ":/dev/cd0", * then strip it. */ if (yytext[0] == ':') { location = checked_strdup(yytext + 1); if (location[0] == '\0') { log_errx(1, "empty location in %s, " "line %d", map, lineno); } } else { location = checked_strdup(yytext); } if (mountpoint == NULL) mountpoint = checked_strdup("/"); if (options2 == NULL) options2 = checked_strdup(""); #if 0 log_debugx("adding map node, %s %s %s", mountpoint, options2, location); #endif node_new(node, mountpoint, options2, location, map, lineno); mountpoint = options2 = location = NULL; again: ret = yylex(); if (ret == 0 || ret == NEWLINE) { if (mountpoint != NULL || options2 != NULL || location != NULL) { log_errx(1, "truncated entry " "in %s, line %d", map, lineno); } break; } } } } /* * Parse output of a special map called without argument. It is a list * of keys, separated by newlines. They can contain whitespace, so use * getline(3) instead of lexer used for maps. */ static void parse_map_keys_yyin(struct node *parent, const char *map) { char *line = NULL, *key; size_t linecap = 0; ssize_t linelen; lineno = 1; for (;;) { linelen = getline(&line, &linecap, yyin); if (linelen < 0) { /* * End of file. */ break; } if (linelen <= 1) { /* * Empty line, consisting of just the newline. */ continue; } /* * "-1" to strip the trailing newline. 
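		 * getline(3) keeps the newline character in the buffer,
		 * so copying just linelen - 1 bytes is enough to drop it.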
*/ key = strndup(line, linelen - 1); log_debugx("adding key \"%s\"", key); node_new(parent, key, NULL, NULL, map, lineno); lineno++; } free(line); } static bool file_is_executable(const char *path) { struct stat sb; int error; error = stat(path, &sb); if (error != 0) log_err(1, "cannot stat %s", path); if ((sb.st_mode & S_IXUSR) || (sb.st_mode & S_IXGRP) || (sb.st_mode & S_IXOTH)) return (true); return (false); } /* * Parse a special map, e.g. "-hosts". */ static void parse_special_map(struct node *parent, const char *map, const char *key) { char *path; int error, ret; assert(map[0] == '-'); /* * +1 to skip leading "-" in map name. */ ret = asprintf(&path, "%s/special_%s", AUTO_SPECIAL_PREFIX, map + 1); if (ret < 0) log_err(1, "asprintf"); yyin = auto_popen(path, key, NULL); assert(yyin != NULL); if (key == NULL) { parse_map_keys_yyin(parent, map); } else { parse_map_yyin(parent, map, key); } error = auto_pclose(yyin); yyin = NULL; if (error != 0) log_errx(1, "failed to handle special map \"%s\"", map); node_expand_includes(parent, false); node_expand_direct_maps(parent); free(path); } /* * Retrieve and parse map from directory services, e.g. LDAP. * Note that it is different from executable maps, in that * the include script outputs the whole map to standard output * (as opposed to executable maps that only output a single * entry, without the key), and it takes the map name as an * argument, instead of key. */ static void parse_included_map(struct node *parent, const char *map) { int error; assert(map[0] != '-'); assert(map[0] != '/'); error = access(AUTO_INCLUDE_PATH, F_OK); if (error != 0) { log_errx(1, "directory services not configured;" " %s does not exist", AUTO_INCLUDE_PATH); } yyin = auto_popen(AUTO_INCLUDE_PATH, map, NULL); assert(yyin != NULL); parse_map_yyin(parent, map, NULL); error = auto_pclose(yyin); yyin = NULL; if (error != 0) log_errx(1, "failed to handle remote map \"%s\"", map); node_expand_includes(parent, false); node_expand_direct_maps(parent); } void parse_map(struct node *parent, const char *map, const char *key, bool *wildcards) { char *path = NULL; int error, ret; bool executable; assert(map != NULL); assert(map[0] != '\0'); log_debugx("parsing map \"%s\"", map); if (wildcards != NULL) *wildcards = false; if (map[0] == '-') { if (wildcards != NULL) *wildcards = true; return (parse_special_map(parent, map, key)); } if (map[0] == '/') { path = checked_strdup(map); } else { ret = asprintf(&path, "%s/%s", AUTO_MAP_PREFIX, map); if (ret < 0) log_err(1, "asprintf"); log_debugx("map \"%s\" maps to \"%s\"", map, path); /* * See if the file exists. If not, try to obtain the map * from directory services. */ error = access(path, F_OK); if (error != 0) { log_debugx("map file \"%s\" does not exist; falling " "back to directory services", path); return (parse_included_map(parent, map)); } } executable = file_is_executable(path); if (executable) { log_debugx("map \"%s\" is executable", map); if (wildcards != NULL) *wildcards = true; if (key != NULL) { yyin = auto_popen(path, key, NULL); } else { yyin = auto_popen(path, NULL); } assert(yyin != NULL); } else { yyin = fopen(path, "r"); if (yyin == NULL) log_err(1, "unable to open \"%s\"", path); } free(path); path = NULL; parse_map_yyin(parent, map, executable ? 
	    key : NULL);

	if (executable) {
		error = auto_pclose(yyin);
		yyin = NULL;
		if (error != 0) {
			log_errx(1, "failed to handle executable map \"%s\"",
			    map);
		}
	} else {
		fclose(yyin);
	}
	yyin = NULL;

	log_debugx("done parsing map \"%s\"", map);

	node_expand_includes(parent, false);
	node_expand_direct_maps(parent);
}

static void
parse_master_yyin(struct node *root, const char *master)
{
	char *mountpoint = NULL, *map = NULL, *options = NULL;
	int ret;

	/*
	 * XXX: 1 gives incorrect values; wtf?
	 */
	lineno = 0;

	for (;;) {
		ret = yylex();
		if (ret == 0 || ret == NEWLINE) {
			if (mountpoint != NULL) {
				//log_debugx("adding map for %s", mountpoint);
				node_new_map(root, mountpoint, options, map,
				    master, lineno);
			}
			if (ret == 0) {
				break;
			} else {
				mountpoint = map = options = NULL;
				continue;
			}
		}
		if (mountpoint == NULL) {
			mountpoint = checked_strdup(yytext);
		} else if (map == NULL) {
			map = checked_strdup(yytext);
		} else if (options == NULL) {
			/*
			 * +1 to skip leading "-".
			 */
			options = checked_strdup(yytext + 1);
		} else {
			log_errx(1, "too many arguments at %s, line %d",
			    master, lineno);
		}
	}
}

void
parse_master(struct node *root, const char *master)
{

	log_debugx("parsing auto_master file at \"%s\"", master);
	yyin = fopen(master, "r");
	if (yyin == NULL)
		err(1, "unable to open %s", master);

	parse_master_yyin(root, master);

	fclose(yyin);
	yyin = NULL;

	log_debugx("done parsing \"%s\"", master);

	node_expand_includes(root, true);
	node_expand_direct_maps(root);
}

/*
 * Two things daemon(3) does that we also want to do when running
 * in the foreground are closing stdin and changing the current
 * directory to "/".  This is what we do here.
 */
void
lesser_daemon(void)
{
	int error, fd;

	error = chdir("/");
	if (error != 0)
		log_warn("chdir");

	fd = open(_PATH_DEVNULL, O_RDWR, 0);
	if (fd < 0) {
		log_warn("cannot open %s", _PATH_DEVNULL);
		return;
	}

	error = dup2(fd, STDIN_FILENO);
	if (error != 0)
		log_warn("dup2");

	error = close(fd);
	if (error != 0) {
		/* Bloody hell. */
		log_warn("close");
	}
}

int
main(int argc, char **argv)
{
	char *cmdname;

	if (argv[0] == NULL)
		log_errx(1, "NULL command name");

	cmdname = basename(argv[0]);

	if (strcmp(cmdname, "automount") == 0)
		return (main_automount(argc, argv));
	else if (strcmp(cmdname, "automountd") == 0)
		return (main_automountd(argc, argv));
	else if (strcmp(cmdname, "autounmountd") == 0)
		return (main_autounmountd(argc, argv));
	else
		log_errx(1, "binary name should be either \"automount\", "
		    "\"automountd\", or \"autounmountd\"");
}
Index: projects/clang380-import/usr.sbin/autofs/defined.c
===================================================================
--- projects/clang380-import/usr.sbin/autofs/defined.c	(revision 294776)
+++ projects/clang380-import/usr.sbin/autofs/defined.c	(revision 294777)
@@ -1,272 +1,271 @@
/*-
 * Copyright (c) 2014 The FreeBSD Foundation
 * All rights reserved.
 *
 * This software was developed by Edward Tomasz Napierala under sponsorship
 * from the FreeBSD Foundation.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ /* * All the "defined" stuff is for handling variables, * such as ${OSNAME}, in maps. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include - -#include #include "common.h" static TAILQ_HEAD(, defined_value) defined_values; static const char * defined_find(const char *name) { struct defined_value *d; TAILQ_FOREACH(d, &defined_values, d_next) { if (strcmp(d->d_name, name) == 0) return (d->d_value); } return (NULL); } char * defined_expand(const char *string) { const char *value; char c, *expanded, *name; int i, ret, before_len = 0, name_off = 0, name_len = 0, after_off = 0; bool backslashed = false, bracketed = false; expanded = checked_strdup(string); for (i = 0; string[i] != '\0'; i++) { c = string[i]; if (c == '\\' && backslashed == false) { backslashed = true; continue; } if (backslashed) { backslashed = false; continue; } backslashed = false; if (c != '$') continue; /* * The 'before_len' variable contains the number * of characters before the '$'. */ before_len = i; assert(i + 1 < (int)strlen(string)); if (string[i + 1] == '{') bracketed = true; if (string[i + 1] == '\0') { log_warnx("truncated variable"); return (NULL); } /* * Skip '$'. */ i++; if (bracketed) { if (string[i + 1] == '\0') { log_warnx("truncated variable"); return (NULL); } /* * Skip '{'. */ i++; } /* * The 'name_off' variable contains the number * of characters before the variable name, * including the "$" or "${". */ name_off = i; for (; string[i] != '\0'; i++) { c = string[i]; /* * XXX: Decide on the set of characters that can be * used in a variable name. */ if (isalnum(c) || c == '_') continue; /* * End of variable name. */ if (bracketed) { if (c != '}') continue; /* * The 'after_off' variable contains the number * of characters before the rest of the string, * i.e. after the variable name. */ after_off = i + 1; assert(i > 1); assert(i - 1 > name_off); name_len = i - name_off; break; } after_off = i; assert(i > 1); assert(i > name_off); name_len = i - name_off; break; } name = strndup(string + name_off, name_len); if (name == NULL) log_err(1, "strndup"); value = defined_find(name); if (value == NULL) { log_warnx("undefined variable ${%s}", name); return (NULL); } /* * Concatenate it back. */ ret = asprintf(&expanded, "%.*s%s%s", before_len, string, value, string + after_off); if (ret < 0) log_err(1, "asprintf"); //log_debugx("\"%s\" expanded to \"%s\"", string, expanded); free(name); /* * Figure out where to start searching for next variable. 
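	 * The scan resumes on the expanded string, just past the
	 * substituted value, so the value itself is not subject to
	 * another round of expansion.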
*/ string = expanded; i = before_len + strlen(value); backslashed = bracketed = false; before_len = name_off = name_len = after_off = 0; assert(i <= (int)strlen(string)); } if (before_len != 0 || name_off != 0 || name_len != 0 || after_off != 0) { log_warnx("truncated variable"); return (NULL); } return (expanded); } static void defined_add(const char *name, const char *value) { struct defined_value *d; const char *found; found = defined_find(name); if (found != NULL) log_errx(1, "variable %s already defined", name); log_debugx("defining variable %s=%s", name, value); d = calloc(sizeof(*d), 1); if (d == NULL) log_err(1, "calloc"); d->d_name = checked_strdup(name); d->d_value = checked_strdup(value); TAILQ_INSERT_TAIL(&defined_values, d, d_next); } void defined_parse_and_add(char *def) { char *name, *value; value = def; name = strsep(&value, "="); if (value == NULL || value[0] == '\0') log_errx(1, "missing variable value"); if (name == NULL || name[0] == '\0') log_errx(1, "missing variable name"); defined_add(name, value); } void defined_init(void) { struct utsname name; int error; TAILQ_INIT(&defined_values); error = uname(&name); if (error != 0) log_err(1, "uname"); defined_add("ARCH", name.machine); defined_add("CPU", name.machine); defined_add("HOST", name.nodename); defined_add("OSNAME", name.sysname); defined_add("OSREL", name.release); defined_add("OSVERS", name.version); } Index: projects/clang380-import/usr.sbin/bhyve/block_if.c =================================================================== --- projects/clang380-import/usr.sbin/bhyve/block_if.c (revision 294776) +++ projects/clang380-import/usr.sbin/bhyve/block_if.c (revision 294777) @@ -1,822 +1,820 @@ /*- * Copyright (c) 2013 Peter Grehan * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "bhyverun.h" #include "mevent.h" #include "block_if.h" #define BLOCKIF_SIG 0xb109b109 #define BLOCKIF_NUMTHR 8 #define BLOCKIF_MAXREQ (64 + BLOCKIF_NUMTHR) enum blockop { BOP_READ, BOP_WRITE, BOP_FLUSH, BOP_DELETE }; enum blockstat { BST_FREE, BST_BLOCK, BST_PEND, BST_BUSY, BST_DONE }; struct blockif_elem { TAILQ_ENTRY(blockif_elem) be_link; struct blockif_req *be_req; enum blockop be_op; enum blockstat be_status; pthread_t be_tid; off_t be_block; }; struct blockif_ctxt { int bc_magic; int bc_fd; int bc_ischr; int bc_isgeom; int bc_candelete; int bc_rdonly; off_t bc_size; int bc_sectsz; int bc_psectsz; int bc_psectoff; int bc_closing; pthread_t bc_btid[BLOCKIF_NUMTHR]; pthread_mutex_t bc_mtx; pthread_cond_t bc_cond; /* Request elements and free/pending/busy queues */ TAILQ_HEAD(, blockif_elem) bc_freeq; TAILQ_HEAD(, blockif_elem) bc_pendq; TAILQ_HEAD(, blockif_elem) bc_busyq; struct blockif_elem bc_reqs[BLOCKIF_MAXREQ]; }; static pthread_once_t blockif_once = PTHREAD_ONCE_INIT; struct blockif_sig_elem { pthread_mutex_t bse_mtx; pthread_cond_t bse_cond; int bse_pending; struct blockif_sig_elem *bse_next; }; static struct blockif_sig_elem *blockif_bse_head; static int blockif_enqueue(struct blockif_ctxt *bc, struct blockif_req *breq, enum blockop op) { struct blockif_elem *be, *tbe; off_t off; int i; be = TAILQ_FIRST(&bc->bc_freeq); assert(be != NULL); assert(be->be_status == BST_FREE); TAILQ_REMOVE(&bc->bc_freeq, be, be_link); be->be_req = breq; be->be_op = op; switch (op) { case BOP_READ: case BOP_WRITE: case BOP_DELETE: off = breq->br_offset; for (i = 0; i < breq->br_iovcnt; i++) off += breq->br_iov[i].iov_len; break; default: off = OFF_MAX; } be->be_block = off; TAILQ_FOREACH(tbe, &bc->bc_pendq, be_link) { if (tbe->be_block == breq->br_offset) break; } if (tbe == NULL) { TAILQ_FOREACH(tbe, &bc->bc_busyq, be_link) { if (tbe->be_block == breq->br_offset) break; } } if (tbe == NULL) be->be_status = BST_PEND; else be->be_status = BST_BLOCK; TAILQ_INSERT_TAIL(&bc->bc_pendq, be, be_link); return (be->be_status == BST_PEND); } static int blockif_dequeue(struct blockif_ctxt *bc, pthread_t t, struct blockif_elem **bep) { struct blockif_elem *be; TAILQ_FOREACH(be, &bc->bc_pendq, be_link) { if (be->be_status == BST_PEND) break; assert(be->be_status == BST_BLOCK); } if (be == NULL) return (0); TAILQ_REMOVE(&bc->bc_pendq, be, be_link); be->be_status = BST_BUSY; be->be_tid = t; TAILQ_INSERT_TAIL(&bc->bc_busyq, be, be_link); *bep = be; return (1); } static void blockif_complete(struct blockif_ctxt *bc, struct blockif_elem *be) { struct blockif_elem *tbe; if (be->be_status == BST_DONE || be->be_status == BST_BUSY) TAILQ_REMOVE(&bc->bc_busyq, be, be_link); else TAILQ_REMOVE(&bc->bc_pendq, be, be_link); TAILQ_FOREACH(tbe, &bc->bc_pendq, be_link) { if (tbe->be_req->br_offset == be->be_block) tbe->be_status = BST_PEND; } be->be_tid = 0; be->be_status = BST_FREE; be->be_req = NULL; TAILQ_INSERT_TAIL(&bc->bc_freeq, be, be_link); } static void blockif_proc(struct blockif_ctxt *bc, struct blockif_elem *be, uint8_t *buf) { struct blockif_req *br; off_t arg[2]; ssize_t clen, len, off, boff, voff; int i, err; br = be->be_req; if (br->br_iovcnt <= 1) buf = NULL; err = 0; switch (be->be_op) { case BOP_READ: if (buf == NULL) { if ((len = preadv(bc->bc_fd, br->br_iov, br->br_iovcnt, br->br_offset)) < 0) err = errno; 
else br->br_resid -= len; break; } i = 0; off = voff = 0; while (br->br_resid > 0) { len = MIN(br->br_resid, MAXPHYS); if (pread(bc->bc_fd, buf, len, br->br_offset + off) < 0) { err = errno; break; } boff = 0; do { clen = MIN(len - boff, br->br_iov[i].iov_len - voff); memcpy(br->br_iov[i].iov_base + voff, buf + boff, clen); if (clen < br->br_iov[i].iov_len - voff) voff += clen; else { i++; voff = 0; } boff += clen; } while (boff < len); off += len; br->br_resid -= len; } break; case BOP_WRITE: if (bc->bc_rdonly) { err = EROFS; break; } if (buf == NULL) { if ((len = pwritev(bc->bc_fd, br->br_iov, br->br_iovcnt, br->br_offset)) < 0) err = errno; else br->br_resid -= len; break; } i = 0; off = voff = 0; while (br->br_resid > 0) { len = MIN(br->br_resid, MAXPHYS); boff = 0; do { clen = MIN(len - boff, br->br_iov[i].iov_len - voff); memcpy(buf + boff, br->br_iov[i].iov_base + voff, clen); if (clen < br->br_iov[i].iov_len - voff) voff += clen; else { i++; voff = 0; } boff += clen; } while (boff < len); if (pwrite(bc->bc_fd, buf, len, br->br_offset + off) < 0) { err = errno; break; } off += len; br->br_resid -= len; } break; case BOP_FLUSH: if (bc->bc_ischr) { if (ioctl(bc->bc_fd, DIOCGFLUSH)) err = errno; } else if (fsync(bc->bc_fd)) err = errno; break; case BOP_DELETE: if (!bc->bc_candelete) err = EOPNOTSUPP; else if (bc->bc_rdonly) err = EROFS; else if (bc->bc_ischr) { arg[0] = br->br_offset; arg[1] = br->br_resid; if (ioctl(bc->bc_fd, DIOCGDELETE, arg)) err = errno; else br->br_resid = 0; } else err = EOPNOTSUPP; break; default: err = EINVAL; break; } be->be_status = BST_DONE; (*br->br_callback)(br, err); } static void * blockif_thr(void *arg) { struct blockif_ctxt *bc; struct blockif_elem *be; pthread_t t; uint8_t *buf; bc = arg; if (bc->bc_isgeom) buf = malloc(MAXPHYS); else buf = NULL; t = pthread_self(); pthread_mutex_lock(&bc->bc_mtx); for (;;) { while (blockif_dequeue(bc, t, &be)) { pthread_mutex_unlock(&bc->bc_mtx); blockif_proc(bc, be, buf); pthread_mutex_lock(&bc->bc_mtx); blockif_complete(bc, be); } /* Check ctxt status here to see if exit requested */ if (bc->bc_closing) break; pthread_cond_wait(&bc->bc_cond, &bc->bc_mtx); } pthread_mutex_unlock(&bc->bc_mtx); if (buf) free(buf); pthread_exit(NULL); return (NULL); } static void blockif_sigcont_handler(int signal, enum ev_type type, void *arg) { struct blockif_sig_elem *bse; for (;;) { /* * Process the entire list even if not intended for * this thread. */ do { bse = blockif_bse_head; if (bse == NULL) return; } while (!atomic_cmpset_ptr((uintptr_t *)&blockif_bse_head, (uintptr_t)bse, (uintptr_t)bse->bse_next)); pthread_mutex_lock(&bse->bse_mtx); bse->bse_pending = 0; pthread_cond_signal(&bse->bse_cond); pthread_mutex_unlock(&bse->bse_mtx); } } static void blockif_init(void) { mevent_add(SIGCONT, EVF_SIGNAL, blockif_sigcont_handler, NULL); (void) signal(SIGCONT, SIG_IGN); } struct blockif_ctxt * blockif_open(const char *optstr, const char *ident) { char tname[MAXCOMLEN + 1]; char name[MAXPATHLEN]; char *nopt, *xopts, *cp; struct blockif_ctxt *bc; struct stat sbuf; struct diocgattr_arg arg; off_t size, psectsz, psectoff; int extra, fd, i, sectsz; int nocache, sync, ro, candelete, geom, ssopt, pssopt; pthread_once(&blockif_once, blockif_init); fd = -1; ssopt = 0; nocache = 0; sync = 0; ro = 0; /* * The first element in the optstring is always a pathname. 
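 * (the file or character device providing the backing store; a
 * hypothetical example would be "disk.img,nocache,sectorsize=512/4096").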
* Optional elements follow */ nopt = xopts = strdup(optstr); while (xopts != NULL) { cp = strsep(&xopts, ","); if (cp == nopt) /* file or device pathname */ continue; else if (!strcmp(cp, "nocache")) nocache = 1; else if (!strcmp(cp, "sync") || !strcmp(cp, "direct")) sync = 1; else if (!strcmp(cp, "ro")) ro = 1; else if (sscanf(cp, "sectorsize=%d/%d", &ssopt, &pssopt) == 2) ; else if (sscanf(cp, "sectorsize=%d", &ssopt) == 1) pssopt = ssopt; else { fprintf(stderr, "Invalid device option \"%s\"\n", cp); goto err; } } extra = 0; if (nocache) extra |= O_DIRECT; if (sync) extra |= O_SYNC; fd = open(nopt, (ro ? O_RDONLY : O_RDWR) | extra); if (fd < 0 && !ro) { /* Attempt a r/w fail with a r/o open */ fd = open(nopt, O_RDONLY | extra); ro = 1; } if (fd < 0) { perror("Could not open backing file"); goto err; } if (fstat(fd, &sbuf) < 0) { perror("Could not stat backing file"); goto err; } /* * Deal with raw devices */ size = sbuf.st_size; sectsz = DEV_BSIZE; psectsz = psectoff = 0; candelete = geom = 0; if (S_ISCHR(sbuf.st_mode)) { if (ioctl(fd, DIOCGMEDIASIZE, &size) < 0 || ioctl(fd, DIOCGSECTORSIZE, §sz)) { perror("Could not fetch dev blk/sector size"); goto err; } assert(size != 0); assert(sectsz != 0); if (ioctl(fd, DIOCGSTRIPESIZE, &psectsz) == 0 && psectsz > 0) ioctl(fd, DIOCGSTRIPEOFFSET, &psectoff); strlcpy(arg.name, "GEOM::candelete", sizeof(arg.name)); arg.len = sizeof(arg.value.i); if (ioctl(fd, DIOCGATTR, &arg) == 0) candelete = arg.value.i; if (ioctl(fd, DIOCGPROVIDERNAME, name) == 0) geom = 1; } else psectsz = sbuf.st_blksize; if (ssopt != 0) { if (!powerof2(ssopt) || !powerof2(pssopt) || ssopt < 512 || ssopt > pssopt) { fprintf(stderr, "Invalid sector size %d/%d\n", ssopt, pssopt); goto err; } /* * Some backend drivers (e.g. cd0, ada0) require that the I/O * size be a multiple of the device's sector size. * * Validate that the emulated sector size complies with this * requirement. */ if (S_ISCHR(sbuf.st_mode)) { if (ssopt < sectsz || (ssopt % sectsz) != 0) { fprintf(stderr, "Sector size %d incompatible " "with underlying device sector size %d\n", ssopt, sectsz); goto err; } } sectsz = ssopt; psectsz = pssopt; psectoff = 0; } bc = calloc(1, sizeof(struct blockif_ctxt)); if (bc == NULL) { perror("calloc"); goto err; } bc->bc_magic = BLOCKIF_SIG; bc->bc_fd = fd; bc->bc_ischr = S_ISCHR(sbuf.st_mode); bc->bc_isgeom = geom; bc->bc_candelete = candelete; bc->bc_rdonly = ro; bc->bc_size = size; bc->bc_sectsz = sectsz; bc->bc_psectsz = psectsz; bc->bc_psectoff = psectoff; pthread_mutex_init(&bc->bc_mtx, NULL); pthread_cond_init(&bc->bc_cond, NULL); TAILQ_INIT(&bc->bc_freeq); TAILQ_INIT(&bc->bc_pendq); TAILQ_INIT(&bc->bc_busyq); for (i = 0; i < BLOCKIF_MAXREQ; i++) { bc->bc_reqs[i].be_status = BST_FREE; TAILQ_INSERT_HEAD(&bc->bc_freeq, &bc->bc_reqs[i], be_link); } for (i = 0; i < BLOCKIF_NUMTHR; i++) { pthread_create(&bc->bc_btid[i], NULL, blockif_thr, bc); snprintf(tname, sizeof(tname), "blk-%s-%d", ident, i); pthread_set_name_np(bc->bc_btid[i], tname); } return (bc); err: if (fd >= 0) close(fd); return (NULL); } static int blockif_request(struct blockif_ctxt *bc, struct blockif_req *breq, enum blockop op) { int err; err = 0; pthread_mutex_lock(&bc->bc_mtx); if (!TAILQ_EMPTY(&bc->bc_freeq)) { /* * Enqueue and inform the block i/o thread * that there is work available */ if (blockif_enqueue(bc, breq, op)) pthread_cond_signal(&bc->bc_cond); } else { /* * Callers are not allowed to enqueue more than * the specified blockif queue limit. 
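		 * The limit is BLOCKIF_MAXREQ - 1, as reported by
		 * blockif_queuesz().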
		 * Return an error to indicate that the queue length
		 * has been exceeded.
		 */
		err = E2BIG;
	}
	pthread_mutex_unlock(&bc->bc_mtx);

	return (err);
}

int
blockif_read(struct blockif_ctxt *bc, struct blockif_req *breq)
{

	assert(bc->bc_magic == BLOCKIF_SIG);
	return (blockif_request(bc, breq, BOP_READ));
}

int
blockif_write(struct blockif_ctxt *bc, struct blockif_req *breq)
{

	assert(bc->bc_magic == BLOCKIF_SIG);
	return (blockif_request(bc, breq, BOP_WRITE));
}

int
blockif_flush(struct blockif_ctxt *bc, struct blockif_req *breq)
{

	assert(bc->bc_magic == BLOCKIF_SIG);
	return (blockif_request(bc, breq, BOP_FLUSH));
}

int
blockif_delete(struct blockif_ctxt *bc, struct blockif_req *breq)
{

	assert(bc->bc_magic == BLOCKIF_SIG);
	return (blockif_request(bc, breq, BOP_DELETE));
}

int
blockif_cancel(struct blockif_ctxt *bc, struct blockif_req *breq)
{
	struct blockif_elem *be;

	assert(bc->bc_magic == BLOCKIF_SIG);

	pthread_mutex_lock(&bc->bc_mtx);
	/*
	 * Check pending requests.
	 */
	TAILQ_FOREACH(be, &bc->bc_pendq, be_link) {
		if (be->be_req == breq)
			break;
	}
	if (be != NULL) {
		/*
		 * Found it.
		 */
		blockif_complete(bc, be);
		pthread_mutex_unlock(&bc->bc_mtx);
		return (0);
	}

	/*
	 * Check in-flight requests.
	 */
	TAILQ_FOREACH(be, &bc->bc_busyq, be_link) {
		if (be->be_req == breq)
			break;
	}
	if (be == NULL) {
		/*
		 * Didn't find it.
		 */
		pthread_mutex_unlock(&bc->bc_mtx);
		return (EINVAL);
	}

	/*
	 * Interrupt the processing thread to force it to return
	 * prematurely via its normal callback path.
	 */
	while (be->be_status == BST_BUSY) {
		struct blockif_sig_elem bse, *old_head;

		pthread_mutex_init(&bse.bse_mtx, NULL);
		pthread_cond_init(&bse.bse_cond, NULL);

		bse.bse_pending = 1;

		do {
			old_head = blockif_bse_head;
			bse.bse_next = old_head;
		} while (!atomic_cmpset_ptr((uintptr_t *)&blockif_bse_head,
		    (uintptr_t)old_head,
		    (uintptr_t)&bse));

		pthread_kill(be->be_tid, SIGCONT);

		pthread_mutex_lock(&bse.bse_mtx);
		while (bse.bse_pending)
			pthread_cond_wait(&bse.bse_cond, &bse.bse_mtx);
		pthread_mutex_unlock(&bse.bse_mtx);
	}

	pthread_mutex_unlock(&bc->bc_mtx);

	/*
	 * The processing thread has been interrupted.  Since it's not
	 * clear if the callback has been invoked yet, return EBUSY.
	 */
	return (EBUSY);
}

int
blockif_close(struct blockif_ctxt *bc)
{
	void *jval;
-	int err, i;
-
-	err = 0;
+	int i;

	assert(bc->bc_magic == BLOCKIF_SIG);

	/*
	 * Stop the block i/o thread
	 */
	pthread_mutex_lock(&bc->bc_mtx);
	bc->bc_closing = 1;
	pthread_mutex_unlock(&bc->bc_mtx);
	pthread_cond_broadcast(&bc->bc_cond);
	for (i = 0; i < BLOCKIF_NUMTHR; i++)
		pthread_join(bc->bc_btid[i], &jval);

	/* XXX Cancel queued i/o's ??? */

	/*
	 * Release resources
	 */
	bc->bc_magic = 0;
	close(bc->bc_fd);
	free(bc);

	return (0);
}

/*
 * Return virtual C/H/S values for a given block.  Use the algorithm
 * outlined in the VHD specification to calculate values.
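 * (For instance, a 1 GiB image - 2097152 512-byte sectors - works out
 * to 2080 cylinders, 16 heads and 63 sectors per track.)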
*/ void blockif_chs(struct blockif_ctxt *bc, uint16_t *c, uint8_t *h, uint8_t *s) { off_t sectors; /* total sectors of the block dev */ off_t hcyl; /* cylinders times heads */ uint16_t secpt; /* sectors per track */ uint8_t heads; assert(bc->bc_magic == BLOCKIF_SIG); sectors = bc->bc_size / bc->bc_sectsz; /* Clamp the size to the largest possible with CHS */ if (sectors > 65535UL*16*255) sectors = 65535UL*16*255; if (sectors >= 65536UL*16*63) { secpt = 255; heads = 16; hcyl = sectors / secpt; } else { secpt = 17; hcyl = sectors / secpt; heads = (hcyl + 1023) / 1024; if (heads < 4) heads = 4; if (hcyl >= (heads * 1024) || heads > 16) { secpt = 31; heads = 16; hcyl = sectors / secpt; } if (hcyl >= (heads * 1024)) { secpt = 63; heads = 16; hcyl = sectors / secpt; } } *c = hcyl / heads; *h = heads; *s = secpt; } /* * Accessors */ off_t blockif_size(struct blockif_ctxt *bc) { assert(bc->bc_magic == BLOCKIF_SIG); return (bc->bc_size); } int blockif_sectsz(struct blockif_ctxt *bc) { assert(bc->bc_magic == BLOCKIF_SIG); return (bc->bc_sectsz); } void blockif_psectsz(struct blockif_ctxt *bc, int *size, int *off) { assert(bc->bc_magic == BLOCKIF_SIG); *size = bc->bc_psectsz; *off = bc->bc_psectoff; } int blockif_queuesz(struct blockif_ctxt *bc) { assert(bc->bc_magic == BLOCKIF_SIG); return (BLOCKIF_MAXREQ - 1); } int blockif_is_ro(struct blockif_ctxt *bc) { assert(bc->bc_magic == BLOCKIF_SIG); return (bc->bc_rdonly); } int blockif_candelete(struct blockif_ctxt *bc) { assert(bc->bc_magic == BLOCKIF_SIG); return (bc->bc_candelete); } Index: projects/clang380-import/usr.sbin/bhyve/pci_ahci.c =================================================================== --- projects/clang380-import/usr.sbin/bhyve/pci_ahci.c (revision 294776) +++ projects/clang380-import/usr.sbin/bhyve/pci_ahci.c (revision 294777) @@ -1,2354 +1,2351 @@ /*- * Copyright (c) 2013 Zhixiang Yu * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * $FreeBSD$ */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "bhyverun.h" #include "pci_emul.h" #include "ahci.h" #include "block_if.h" #define MAX_PORTS 6 /* Intel ICH8 AHCI supports 6 ports */ #define PxSIG_ATA 0x00000101 /* ATA drive */ #define PxSIG_ATAPI 0xeb140101 /* ATAPI drive */ enum sata_fis_type { FIS_TYPE_REGH2D = 0x27, /* Register FIS - host to device */ FIS_TYPE_REGD2H = 0x34, /* Register FIS - device to host */ FIS_TYPE_DMAACT = 0x39, /* DMA activate FIS - device to host */ FIS_TYPE_DMASETUP = 0x41, /* DMA setup FIS - bidirectional */ FIS_TYPE_DATA = 0x46, /* Data FIS - bidirectional */ FIS_TYPE_BIST = 0x58, /* BIST activate FIS - bidirectional */ FIS_TYPE_PIOSETUP = 0x5F, /* PIO setup FIS - device to host */ FIS_TYPE_SETDEVBITS = 0xA1, /* Set dev bits FIS - device to host */ }; /* * SCSI opcodes */ #define TEST_UNIT_READY 0x00 #define REQUEST_SENSE 0x03 #define INQUIRY 0x12 #define START_STOP_UNIT 0x1B #define PREVENT_ALLOW 0x1E #define READ_CAPACITY 0x25 #define READ_10 0x28 #define POSITION_TO_ELEMENT 0x2B #define READ_TOC 0x43 #define GET_EVENT_STATUS_NOTIFICATION 0x4A #define MODE_SENSE_10 0x5A #define REPORT_LUNS 0xA0 #define READ_12 0xA8 #define READ_CD 0xBE /* * SCSI mode page codes */ #define MODEPAGE_RW_ERROR_RECOVERY 0x01 #define MODEPAGE_CD_CAPABILITIES 0x2A /* * ATA commands */ #define ATA_SF_ENAB_SATA_SF 0x10 #define ATA_SATA_SF_AN 0x05 #define ATA_SF_DIS_SATA_SF 0x90 /* * Debug printf */ #ifdef AHCI_DEBUG static FILE *dbg; #define DPRINTF(format, arg...) do{fprintf(dbg, format, ##arg);fflush(dbg);}while(0) #else #define DPRINTF(format, arg...) #endif #define WPRINTF(format, arg...) 
printf(format, ##arg) struct ahci_ioreq { struct blockif_req io_req; struct ahci_port *io_pr; STAILQ_ENTRY(ahci_ioreq) io_flist; TAILQ_ENTRY(ahci_ioreq) io_blist; uint8_t *cfis; uint32_t len; uint32_t done; int slot; int more; }; struct ahci_port { struct blockif_ctxt *bctx; struct pci_ahci_softc *pr_sc; uint8_t *cmd_lst; uint8_t *rfis; char ident[20 + 1]; int atapi; int reset; int waitforclear; int mult_sectors; uint8_t xfermode; uint8_t err_cfis[20]; uint8_t sense_key; uint8_t asc; u_int ccs; uint32_t pending; uint32_t clb; uint32_t clbu; uint32_t fb; uint32_t fbu; uint32_t is; uint32_t ie; uint32_t cmd; uint32_t unused0; uint32_t tfd; uint32_t sig; uint32_t ssts; uint32_t sctl; uint32_t serr; uint32_t sact; uint32_t ci; uint32_t sntf; uint32_t fbs; /* * i/o request info */ struct ahci_ioreq *ioreq; int ioqsz; STAILQ_HEAD(ahci_fhead, ahci_ioreq) iofhd; TAILQ_HEAD(ahci_bhead, ahci_ioreq) iobhd; }; struct ahci_cmd_hdr { uint16_t flags; uint16_t prdtl; uint32_t prdbc; uint64_t ctba; uint32_t reserved[4]; }; struct ahci_prdt_entry { uint64_t dba; uint32_t reserved; #define DBCMASK 0x3fffff uint32_t dbc; }; struct pci_ahci_softc { struct pci_devinst *asc_pi; pthread_mutex_t mtx; int ports; uint32_t cap; uint32_t ghc; uint32_t is; uint32_t pi; uint32_t vs; uint32_t ccc_ctl; uint32_t ccc_pts; uint32_t em_loc; uint32_t em_ctl; uint32_t cap2; uint32_t bohc; uint32_t lintr; struct ahci_port port[MAX_PORTS]; }; #define ahci_ctx(sc) ((sc)->asc_pi->pi_vmctx) static void ahci_handle_port(struct ahci_port *p); static inline void lba_to_msf(uint8_t *buf, int lba) { lba += 150; buf[0] = (lba / 75) / 60; buf[1] = (lba / 75) % 60; buf[2] = lba % 75; } /* * generate HBA intr depending on whether or not ports within * the controller have an interrupt pending. */ static void ahci_generate_intr(struct pci_ahci_softc *sc) { struct pci_devinst *pi; int i; pi = sc->asc_pi; for (i = 0; i < sc->ports; i++) { struct ahci_port *pr; pr = &sc->port[i]; if (pr->is & pr->ie) sc->is |= (1 << i); } DPRINTF("%s %x\n", __func__, sc->is); if (sc->is && (sc->ghc & AHCI_GHC_IE)) { if (pci_msi_enabled(pi)) { /* * Generate an MSI interrupt on every edge */ pci_generate_msi(pi, 0); } else if (!sc->lintr) { /* * Only generate a pin-based interrupt if one wasn't * in progress */ sc->lintr = 1; pci_lintr_assert(pi); } } else if (sc->lintr) { /* * No interrupts: deassert pin-based signal if it had * been asserted */ pci_lintr_deassert(pi); sc->lintr = 0; } } static void ahci_write_fis(struct ahci_port *p, enum sata_fis_type ft, uint8_t *fis) { int offset, len, irq; if (p->rfis == NULL || !(p->cmd & AHCI_P_CMD_FRE)) return; switch (ft) { case FIS_TYPE_REGD2H: offset = 0x40; len = 20; irq = (fis[1] & (1 << 6)) ? AHCI_P_IX_DHR : 0; break; case FIS_TYPE_SETDEVBITS: offset = 0x58; len = 8; irq = (fis[1] & (1 << 6)) ? AHCI_P_IX_SDB : 0; break; case FIS_TYPE_PIOSETUP: offset = 0x20; len = 20; irq = (fis[1] & (1 << 6)) ? 
		    AHCI_P_IX_PS : 0;
		break;
	default:
		WPRINTF("unsupported fis type %d\n", ft);
		return;
	}
	if (fis[2] & ATA_S_ERROR) {
		p->waitforclear = 1;
		irq |= AHCI_P_IX_TFE;
	}
	memcpy(p->rfis + offset, fis, len);
	if (irq) {
		p->is |= irq;
		ahci_generate_intr(p->pr_sc);
	}
}

static void
ahci_write_fis_piosetup(struct ahci_port *p)
{
	uint8_t fis[20];

	memset(fis, 0, sizeof(fis));
	fis[0] = FIS_TYPE_PIOSETUP;
	ahci_write_fis(p, FIS_TYPE_PIOSETUP, fis);
}

static void
ahci_write_fis_sdb(struct ahci_port *p, int slot, uint8_t *cfis, uint32_t tfd)
{
	uint8_t fis[8];
	uint8_t error;

	error = (tfd >> 8) & 0xff;
	tfd &= 0x77;
	memset(fis, 0, sizeof(fis));
	fis[0] = FIS_TYPE_SETDEVBITS;
	fis[1] = (1 << 6);
	fis[2] = tfd;
	fis[3] = error;
	if (fis[2] & ATA_S_ERROR) {
		p->err_cfis[0] = slot;
		p->err_cfis[2] = tfd;
		p->err_cfis[3] = error;
		memcpy(&p->err_cfis[4], cfis + 4, 16);
	} else {
		*(uint32_t *)(fis + 4) = (1 << slot);
		p->sact &= ~(1 << slot);
	}
	p->tfd &= ~0x77;
	p->tfd |= tfd;
	ahci_write_fis(p, FIS_TYPE_SETDEVBITS, fis);
}

static void
ahci_write_fis_d2h(struct ahci_port *p, int slot, uint8_t *cfis, uint32_t tfd)
{
	uint8_t fis[20];
	uint8_t error;

	error = (tfd >> 8) & 0xff;
	memset(fis, 0, sizeof(fis));
	fis[0] = FIS_TYPE_REGD2H;
	fis[1] = (1 << 6);
	fis[2] = tfd & 0xff;
	fis[3] = error;
	fis[4] = cfis[4];
	fis[5] = cfis[5];
	fis[6] = cfis[6];
	fis[7] = cfis[7];
	fis[8] = cfis[8];
	fis[9] = cfis[9];
	fis[10] = cfis[10];
	fis[11] = cfis[11];
	fis[12] = cfis[12];
	fis[13] = cfis[13];
	if (fis[2] & ATA_S_ERROR) {
		p->err_cfis[0] = 0x80;
		p->err_cfis[2] = tfd & 0xff;
		p->err_cfis[3] = error;
		memcpy(&p->err_cfis[4], cfis + 4, 16);
	} else
		p->ci &= ~(1 << slot);
	p->tfd = tfd;
	ahci_write_fis(p, FIS_TYPE_REGD2H, fis);
}

static void
ahci_write_fis_d2h_ncq(struct ahci_port *p, int slot)
{
	uint8_t fis[20];

	p->tfd = ATA_S_READY | ATA_S_DSC;
	memset(fis, 0, sizeof(fis));
	fis[0] = FIS_TYPE_REGD2H;
	fis[1] = 0;		/* No interrupt */
	fis[2] = p->tfd;	/* Status */
	fis[3] = 0;		/* No error */
	p->ci &= ~(1 << slot);
	ahci_write_fis(p, FIS_TYPE_REGD2H, fis);
}

static void
ahci_write_reset_fis_d2h(struct ahci_port *p)
{
	uint8_t fis[20];

	memset(fis, 0, sizeof(fis));
	fis[0] = FIS_TYPE_REGD2H;
	fis[3] = 1;
	fis[4] = 1;
	if (p->atapi) {
		fis[5] = 0x14;
		fis[6] = 0xeb;
	}
	fis[12] = 1;
	ahci_write_fis(p, FIS_TYPE_REGD2H, fis);
}

static void
ahci_check_stopped(struct ahci_port *p)
{
	/*
	 * If we are no longer processing the command list and nothing
	 * is in-flight, clear the running bit, the current command
	 * slot, the command issue and active bits.
	 */
	if (!(p->cmd & AHCI_P_CMD_ST)) {
		if (p->pending == 0) {
			p->ccs = 0;
			p->cmd &= ~(AHCI_P_CMD_CR | AHCI_P_CMD_CCS_MASK);
			p->ci = 0;
			p->sact = 0;
			p->waitforclear = 0;
		}
	}
}

static void
ahci_port_stop(struct ahci_port *p)
{
	struct ahci_ioreq *aior;
	uint8_t *cfis;
	int slot;
	int ncq;
	int error;

	assert(pthread_mutex_isowned_np(&p->pr_sc->mtx));

	TAILQ_FOREACH(aior, &p->iobhd, io_blist) {
		/*
		 * Try to cancel the outstanding blockif request.
		 */
		error = blockif_cancel(p->bctx, &aior->io_req);
		if (error != 0)
			continue;

		slot = aior->slot;
		cfis = aior->cfis;
		ncq = 0;	/* Reset for each command in the list. */
		if (cfis[2] == ATA_WRITE_FPDMA_QUEUED ||
		    cfis[2] == ATA_READ_FPDMA_QUEUED ||
		    cfis[2] == ATA_SEND_FPDMA_QUEUED)
			ncq = 1;
		if (ncq)
			p->sact &= ~(1 << slot);
		else
			p->ci &= ~(1 << slot);

		/*
		 * This command is now done.
*/ p->pending &= ~(1 << slot); /* * Delete the blockif request from the busy list */ TAILQ_REMOVE(&p->iobhd, aior, io_blist); /* * Move the blockif request back to the free list */ STAILQ_INSERT_TAIL(&p->iofhd, aior, io_flist); } ahci_check_stopped(p); } static void ahci_port_reset(struct ahci_port *pr) { pr->serr = 0; pr->sact = 0; pr->xfermode = ATA_UDMA6; pr->mult_sectors = 128; if (!pr->bctx) { pr->ssts = ATA_SS_DET_NO_DEVICE; pr->sig = 0xFFFFFFFF; pr->tfd = 0x7F; return; } pr->ssts = ATA_SS_DET_PHY_ONLINE | ATA_SS_IPM_ACTIVE; if (pr->sctl & ATA_SC_SPD_MASK) pr->ssts |= (pr->sctl & ATA_SC_SPD_MASK); else pr->ssts |= ATA_SS_SPD_GEN3; pr->tfd = (1 << 8) | ATA_S_DSC | ATA_S_DMA; if (!pr->atapi) { pr->sig = PxSIG_ATA; pr->tfd |= ATA_S_READY; } else pr->sig = PxSIG_ATAPI; ahci_write_reset_fis_d2h(pr); } static void ahci_reset(struct pci_ahci_softc *sc) { int i; sc->ghc = AHCI_GHC_AE; sc->is = 0; if (sc->lintr) { pci_lintr_deassert(sc->asc_pi); sc->lintr = 0; } for (i = 0; i < sc->ports; i++) { sc->port[i].ie = 0; sc->port[i].is = 0; sc->port[i].cmd = (AHCI_P_CMD_SUD | AHCI_P_CMD_POD); if (sc->port[i].bctx) sc->port[i].cmd |= AHCI_P_CMD_CPS; sc->port[i].sctl = 0; ahci_port_reset(&sc->port[i]); } } static void ata_string(uint8_t *dest, const char *src, int len) { int i; for (i = 0; i < len; i++) { if (*src) dest[i ^ 1] = *src++; else dest[i ^ 1] = ' '; } } static void atapi_string(uint8_t *dest, const char *src, int len) { int i; for (i = 0; i < len; i++) { if (*src) dest[i] = *src++; else dest[i] = ' '; } } /* * Build up the iovec based on the PRDT, 'done' and 'len'. */ static void ahci_build_iov(struct ahci_port *p, struct ahci_ioreq *aior, struct ahci_prdt_entry *prdt, uint16_t prdtl) { struct blockif_req *breq = &aior->io_req; int i, j, skip, todo, left, extra; uint32_t dbcsz; /* Copy part of PRDT between 'done' and 'len' bytes into the iov. */ skip = aior->done; left = aior->len - aior->done; todo = 0; for (i = 0, j = 0; i < prdtl && j < BLOCKIF_IOV_MAX && left > 0; i++, prdt++) { dbcsz = (prdt->dbc & DBCMASK) + 1; /* Skip already done part of the PRDT */ if (dbcsz <= skip) { skip -= dbcsz; continue; } dbcsz -= skip; if (dbcsz > left) dbcsz = left; breq->br_iov[j].iov_base = paddr_guest2host(ahci_ctx(p->pr_sc), prdt->dba + skip, dbcsz); breq->br_iov[j].iov_len = dbcsz; todo += dbcsz; left -= dbcsz; skip = 0; j++; } /* If we got limited by IOV length, round I/O down to sector size. 
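	 * The remainder is not lost: 'done' only advances by the
	 * rounded-down amount and 'more' stays set, so the rest of the
	 * transfer is picked up by a follow-up request.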
*/ if (j == BLOCKIF_IOV_MAX) { extra = todo % blockif_sectsz(p->bctx); todo -= extra; assert(todo > 0); while (extra > 0) { if (breq->br_iov[j - 1].iov_len > extra) { breq->br_iov[j - 1].iov_len -= extra; break; } extra -= breq->br_iov[j - 1].iov_len; j--; } } breq->br_iovcnt = j; breq->br_resid = todo; aior->done += todo; aior->more = (aior->done < aior->len && i < prdtl); } static void ahci_handle_rw(struct ahci_port *p, int slot, uint8_t *cfis, uint32_t done) { struct ahci_ioreq *aior; struct blockif_req *breq; struct ahci_prdt_entry *prdt; struct ahci_cmd_hdr *hdr; uint64_t lba; uint32_t len; int err, first, ncq, readop; prdt = (struct ahci_prdt_entry *)(cfis + 0x80); hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + slot * AHCI_CL_SIZE); ncq = 0; readop = 1; first = (done == 0); if (cfis[2] == ATA_WRITE || cfis[2] == ATA_WRITE48 || cfis[2] == ATA_WRITE_MUL || cfis[2] == ATA_WRITE_MUL48 || cfis[2] == ATA_WRITE_DMA || cfis[2] == ATA_WRITE_DMA48 || cfis[2] == ATA_WRITE_FPDMA_QUEUED) readop = 0; if (cfis[2] == ATA_WRITE_FPDMA_QUEUED || cfis[2] == ATA_READ_FPDMA_QUEUED) { lba = ((uint64_t)cfis[10] << 40) | ((uint64_t)cfis[9] << 32) | ((uint64_t)cfis[8] << 24) | ((uint64_t)cfis[6] << 16) | ((uint64_t)cfis[5] << 8) | cfis[4]; len = cfis[11] << 8 | cfis[3]; if (!len) len = 65536; ncq = 1; } else if (cfis[2] == ATA_READ48 || cfis[2] == ATA_WRITE48 || cfis[2] == ATA_READ_MUL48 || cfis[2] == ATA_WRITE_MUL48 || cfis[2] == ATA_READ_DMA48 || cfis[2] == ATA_WRITE_DMA48) { lba = ((uint64_t)cfis[10] << 40) | ((uint64_t)cfis[9] << 32) | ((uint64_t)cfis[8] << 24) | ((uint64_t)cfis[6] << 16) | ((uint64_t)cfis[5] << 8) | cfis[4]; len = cfis[13] << 8 | cfis[12]; if (!len) len = 65536; } else { lba = ((cfis[7] & 0xf) << 24) | (cfis[6] << 16) | (cfis[5] << 8) | cfis[4]; len = cfis[12]; if (!len) len = 256; } lba *= blockif_sectsz(p->bctx); len *= blockif_sectsz(p->bctx); /* Pull request off free list */ aior = STAILQ_FIRST(&p->iofhd); assert(aior != NULL); STAILQ_REMOVE_HEAD(&p->iofhd, io_flist); aior->cfis = cfis; aior->slot = slot; aior->len = len; aior->done = done; breq = &aior->io_req; breq->br_offset = lba + done; ahci_build_iov(p, aior, prdt, hdr->prdtl); /* Mark this command in-flight. */ p->pending |= 1 << slot; /* Stuff request onto busy list. */ TAILQ_INSERT_HEAD(&p->iobhd, aior, io_blist); if (ncq && first) ahci_write_fis_d2h_ncq(p, slot); if (readop) err = blockif_read(p->bctx, breq); else err = blockif_write(p->bctx, breq); assert(err == 0); } static void ahci_handle_flush(struct ahci_port *p, int slot, uint8_t *cfis) { struct ahci_ioreq *aior; struct blockif_req *breq; int err; /* * Pull request off free list */ aior = STAILQ_FIRST(&p->iofhd); assert(aior != NULL); STAILQ_REMOVE_HEAD(&p->iofhd, io_flist); aior->cfis = cfis; aior->slot = slot; aior->len = 0; aior->done = 0; aior->more = 0; breq = &aior->io_req; /* * Mark this command in-flight. 
*/ p->pending |= 1 << slot; /* * Stuff request onto busy list */ TAILQ_INSERT_HEAD(&p->iobhd, aior, io_blist); err = blockif_flush(p->bctx, breq); assert(err == 0); } static inline void read_prdt(struct ahci_port *p, int slot, uint8_t *cfis, void *buf, int size) { struct ahci_cmd_hdr *hdr; struct ahci_prdt_entry *prdt; void *to; int i, len; hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + slot * AHCI_CL_SIZE); len = size; to = buf; prdt = (struct ahci_prdt_entry *)(cfis + 0x80); for (i = 0; i < hdr->prdtl && len; i++) { uint8_t *ptr; uint32_t dbcsz; int sublen; dbcsz = (prdt->dbc & DBCMASK) + 1; ptr = paddr_guest2host(ahci_ctx(p->pr_sc), prdt->dba, dbcsz); sublen = len < dbcsz ? len : dbcsz; memcpy(to, ptr, sublen); len -= sublen; to += sublen; prdt++; } } static void ahci_handle_dsm_trim(struct ahci_port *p, int slot, uint8_t *cfis, uint32_t done) { struct ahci_ioreq *aior; struct blockif_req *breq; uint8_t *entry; uint64_t elba; uint32_t len, elen; int err, first, ncq; uint8_t buf[512]; first = (done == 0); if (cfis[2] == ATA_DATA_SET_MANAGEMENT) { len = (uint16_t)cfis[13] << 8 | cfis[12]; len *= 512; ncq = 0; } else { /* ATA_SEND_FPDMA_QUEUED */ len = (uint16_t)cfis[11] << 8 | cfis[3]; len *= 512; ncq = 1; } read_prdt(p, slot, cfis, buf, sizeof(buf)); next: entry = &buf[done]; elba = ((uint64_t)entry[5] << 40) | ((uint64_t)entry[4] << 32) | ((uint64_t)entry[3] << 24) | ((uint64_t)entry[2] << 16) | ((uint64_t)entry[1] << 8) | entry[0]; elen = (uint16_t)entry[7] << 8 | entry[6]; done += 8; if (elen == 0) { if (done >= len) { ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); p->pending &= ~(1 << slot); ahci_check_stopped(p); if (!first) ahci_handle_port(p); return; } goto next; } /* * Pull request off free list */ aior = STAILQ_FIRST(&p->iofhd); assert(aior != NULL); STAILQ_REMOVE_HEAD(&p->iofhd, io_flist); aior->cfis = cfis; aior->slot = slot; aior->len = len; aior->done = done; aior->more = (len != done); breq = &aior->io_req; breq->br_offset = elba * blockif_sectsz(p->bctx); breq->br_resid = elen * blockif_sectsz(p->bctx); /* * Mark this command in-flight. */ p->pending |= 1 << slot; /* * Stuff request onto busy list */ TAILQ_INSERT_HEAD(&p->iobhd, aior, io_blist); if (ncq && first) ahci_write_fis_d2h_ncq(p, slot); err = blockif_delete(p->bctx, breq); assert(err == 0); } static inline void write_prdt(struct ahci_port *p, int slot, uint8_t *cfis, void *buf, int size) { struct ahci_cmd_hdr *hdr; struct ahci_prdt_entry *prdt; void *from; int i, len; hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + slot * AHCI_CL_SIZE); len = size; from = buf; prdt = (struct ahci_prdt_entry *)(cfis + 0x80); for (i = 0; i < hdr->prdtl && len; i++) { uint8_t *ptr; uint32_t dbcsz; int sublen; dbcsz = (prdt->dbc & DBCMASK) + 1; ptr = paddr_guest2host(ahci_ctx(p->pr_sc), prdt->dba, dbcsz); sublen = len < dbcsz ? 
len : dbcsz; memcpy(ptr, from, sublen); len -= sublen; from += sublen; prdt++; } hdr->prdbc = size - len; } static void ahci_checksum(uint8_t *buf, int size) { int i; uint8_t sum = 0; for (i = 0; i < size - 1; i++) sum += buf[i]; buf[size - 1] = 0x100 - sum; } static void ahci_handle_read_log(struct ahci_port *p, int slot, uint8_t *cfis) { struct ahci_cmd_hdr *hdr; uint8_t buf[512]; hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + slot * AHCI_CL_SIZE); if (p->atapi || hdr->prdtl == 0 || cfis[4] != 0x10 || cfis[5] != 0 || cfis[9] != 0 || cfis[12] != 1 || cfis[13] != 0) { ahci_write_fis_d2h(p, slot, cfis, (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR); return; } memset(buf, 0, sizeof(buf)); memcpy(buf, p->err_cfis, sizeof(p->err_cfis)); ahci_checksum(buf, sizeof(buf)); if (cfis[2] == ATA_READ_LOG_EXT) ahci_write_fis_piosetup(p); write_prdt(p, slot, cfis, (void *)buf, sizeof(buf)); ahci_write_fis_d2h(p, slot, cfis, ATA_S_DSC | ATA_S_READY); } static void handle_identify(struct ahci_port *p, int slot, uint8_t *cfis) { struct ahci_cmd_hdr *hdr; hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + slot * AHCI_CL_SIZE); if (p->atapi || hdr->prdtl == 0) { ahci_write_fis_d2h(p, slot, cfis, (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR); } else { uint16_t buf[256]; uint64_t sectors; int sectsz, psectsz, psectoff, candelete, ro; uint16_t cyl; uint8_t sech, heads; ro = blockif_is_ro(p->bctx); candelete = blockif_candelete(p->bctx); sectsz = blockif_sectsz(p->bctx); sectors = blockif_size(p->bctx) / sectsz; blockif_chs(p->bctx, &cyl, &heads, &sech); blockif_psectsz(p->bctx, &psectsz, &psectoff); memset(buf, 0, sizeof(buf)); buf[0] = 0x0040; buf[1] = cyl; buf[3] = heads; buf[6] = sech; ata_string((uint8_t *)(buf+10), p->ident, 20); ata_string((uint8_t *)(buf+23), "001", 8); ata_string((uint8_t *)(buf+27), "BHYVE SATA DISK", 40); buf[47] = (0x8000 | 128); buf[48] = 0; buf[49] = (1 << 8 | 1 << 9 | 1 << 11); buf[50] = (1 << 14); buf[53] = (1 << 1 | 1 << 2); if (p->mult_sectors) buf[59] = (0x100 | p->mult_sectors); if (sectors <= 0x0fffffff) { buf[60] = sectors; buf[61] = (sectors >> 16); } else { buf[60] = 0xffff; buf[61] = 0x0fff; } buf[63] = 0x7; if (p->xfermode & ATA_WDMA0) buf[63] |= (1 << ((p->xfermode & 7) + 8)); buf[64] = 0x3; buf[65] = 120; buf[66] = 120; buf[67] = 120; buf[68] = 120; buf[69] = 0; buf[75] = 31; buf[76] = (ATA_SATA_GEN1 | ATA_SATA_GEN2 | ATA_SATA_GEN3 | ATA_SUPPORT_NCQ); buf[77] = (ATA_SUPPORT_RCVSND_FPDMA_QUEUED | (p->ssts & ATA_SS_SPD_MASK) >> 3); buf[80] = 0x3f0; buf[81] = 0x28; buf[82] = (ATA_SUPPORT_POWERMGT | ATA_SUPPORT_WRITECACHE| ATA_SUPPORT_LOOKAHEAD | ATA_SUPPORT_NOP); buf[83] = (ATA_SUPPORT_ADDRESS48 | ATA_SUPPORT_FLUSHCACHE | ATA_SUPPORT_FLUSHCACHE48 | 1 << 14); buf[84] = (1 << 14); buf[85] = (ATA_SUPPORT_POWERMGT | ATA_SUPPORT_WRITECACHE| ATA_SUPPORT_LOOKAHEAD | ATA_SUPPORT_NOP); buf[86] = (ATA_SUPPORT_ADDRESS48 | ATA_SUPPORT_FLUSHCACHE | ATA_SUPPORT_FLUSHCACHE48 | 1 << 15); buf[87] = (1 << 14); buf[88] = 0x7f; if (p->xfermode & ATA_UDMA0) buf[88] |= (1 << ((p->xfermode & 7) + 8)); buf[100] = sectors; buf[101] = (sectors >> 16); buf[102] = (sectors >> 32); buf[103] = (sectors >> 48); if (candelete && !ro) { buf[69] |= ATA_SUPPORT_RZAT | ATA_SUPPORT_DRAT; buf[105] = 1; buf[169] = ATA_SUPPORT_DSM_TRIM; } buf[106] = 0x4000; buf[209] = 0x4000; if (psectsz > sectsz) { buf[106] |= 0x2000; buf[106] |= ffsl(psectsz / sectsz) - 1; buf[209] |= (psectoff / sectsz); } if (sectsz > 512) { buf[106] |= 0x1000; buf[117] = sectsz / 2; buf[118] = ((sectsz / 2) >> 16); } buf[119] = 
(ATA_SUPPORT_RWLOGDMAEXT | 1 << 14); buf[120] = (ATA_SUPPORT_RWLOGDMAEXT | 1 << 14); buf[222] = 0x1020; buf[255] = 0x00a5; ahci_checksum((uint8_t *)buf, sizeof(buf)); ahci_write_fis_piosetup(p); write_prdt(p, slot, cfis, (void *)buf, sizeof(buf)); ahci_write_fis_d2h(p, slot, cfis, ATA_S_DSC | ATA_S_READY); } } static void handle_atapi_identify(struct ahci_port *p, int slot, uint8_t *cfis) { if (!p->atapi) { ahci_write_fis_d2h(p, slot, cfis, (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR); } else { uint16_t buf[256]; memset(buf, 0, sizeof(buf)); buf[0] = (2 << 14 | 5 << 8 | 1 << 7 | 2 << 5); ata_string((uint8_t *)(buf+10), p->ident, 20); ata_string((uint8_t *)(buf+23), "001", 8); ata_string((uint8_t *)(buf+27), "BHYVE SATA DVD ROM", 40); buf[49] = (1 << 9 | 1 << 8); buf[50] = (1 << 14 | 1); buf[53] = (1 << 2 | 1 << 1); buf[62] = 0x3f; buf[63] = 7; if (p->xfermode & ATA_WDMA0) buf[63] |= (1 << ((p->xfermode & 7) + 8)); buf[64] = 3; buf[65] = 120; buf[66] = 120; buf[67] = 120; buf[68] = 120; buf[76] = (ATA_SATA_GEN1 | ATA_SATA_GEN2 | ATA_SATA_GEN3); buf[77] = ((p->ssts & ATA_SS_SPD_MASK) >> 3); buf[78] = (1 << 5); buf[80] = 0x3f0; buf[82] = (ATA_SUPPORT_POWERMGT | ATA_SUPPORT_PACKET | ATA_SUPPORT_RESET | ATA_SUPPORT_NOP); buf[83] = (1 << 14); buf[84] = (1 << 14); buf[85] = (ATA_SUPPORT_POWERMGT | ATA_SUPPORT_PACKET | ATA_SUPPORT_RESET | ATA_SUPPORT_NOP); buf[87] = (1 << 14); buf[88] = 0x7f; if (p->xfermode & ATA_UDMA0) buf[88] |= (1 << ((p->xfermode & 7) + 8)); buf[222] = 0x1020; buf[255] = 0x00a5; ahci_checksum((uint8_t *)buf, sizeof(buf)); ahci_write_fis_piosetup(p); write_prdt(p, slot, cfis, (void *)buf, sizeof(buf)); ahci_write_fis_d2h(p, slot, cfis, ATA_S_DSC | ATA_S_READY); } } static void atapi_inquiry(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t buf[36]; uint8_t *acmd; int len; uint32_t tfd; acmd = cfis + 0x40; if (acmd[1] & 1) { /* VPD */ if (acmd[2] == 0) { /* Supported VPD pages */ buf[0] = 0x05; buf[1] = 0; buf[2] = 0; buf[3] = 1; buf[4] = 0; len = 4 + buf[3]; } else { p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x24; tfd = (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR; cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, tfd); return; } } else { buf[0] = 0x05; buf[1] = 0x80; buf[2] = 0x00; buf[3] = 0x21; buf[4] = 31; buf[5] = 0; buf[6] = 0; buf[7] = 0; atapi_string(buf + 8, "BHYVE", 8); atapi_string(buf + 16, "BHYVE DVD-ROM", 16); atapi_string(buf + 32, "001", 4); len = sizeof(buf); } if (len > acmd[4]) len = acmd[4]; cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; write_prdt(p, slot, cfis, buf, len); ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); } static void atapi_read_capacity(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t buf[8]; uint64_t sectors; sectors = blockif_size(p->bctx) / 2048; be32enc(buf, sectors - 1); be32enc(buf + 4, 2048); cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; write_prdt(p, slot, cfis, buf, sizeof(buf)); ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); } static void atapi_read_toc(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t *acmd; uint8_t format; int len; acmd = cfis + 0x40; len = be16dec(acmd + 7); format = acmd[9] >> 6; switch (format) { case 0: { int msf, size; uint64_t sectors; uint8_t start_track, buf[20], *bp; msf = (acmd[1] >> 1) & 1; start_track = acmd[6]; if (start_track > 1 && start_track != 0xaa) { uint32_t tfd; p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x24; tfd = (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR; cfis[4] = (cfis[4] & 
~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, tfd); return; } bp = buf + 2; *bp++ = 1; *bp++ = 1; if (start_track <= 1) { *bp++ = 0; *bp++ = 0x14; *bp++ = 1; *bp++ = 0; if (msf) { *bp++ = 0; lba_to_msf(bp, 0); bp += 3; } else { *bp++ = 0; *bp++ = 0; *bp++ = 0; *bp++ = 0; } } *bp++ = 0; *bp++ = 0x14; *bp++ = 0xaa; *bp++ = 0; sectors = blockif_size(p->bctx) / blockif_sectsz(p->bctx); sectors >>= 2; if (msf) { *bp++ = 0; lba_to_msf(bp, sectors); bp += 3; } else { be32enc(bp, sectors); bp += 4; } size = bp - buf; be16enc(buf, size - 2); if (len > size) len = size; write_prdt(p, slot, cfis, buf, len); cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); break; } case 1: { uint8_t buf[12]; memset(buf, 0, sizeof(buf)); buf[1] = 0xa; buf[2] = 0x1; buf[3] = 0x1; if (len > sizeof(buf)) len = sizeof(buf); write_prdt(p, slot, cfis, buf, len); cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); break; } case 2: { int msf, size; uint64_t sectors; - uint8_t start_track, *bp, buf[50]; + uint8_t *bp, buf[50]; msf = (acmd[1] >> 1) & 1; - start_track = acmd[6]; bp = buf + 2; *bp++ = 1; *bp++ = 1; *bp++ = 1; *bp++ = 0x14; *bp++ = 0; *bp++ = 0xa0; *bp++ = 0; *bp++ = 0; *bp++ = 0; *bp++ = 0; *bp++ = 1; *bp++ = 0; *bp++ = 0; *bp++ = 1; *bp++ = 0x14; *bp++ = 0; *bp++ = 0xa1; *bp++ = 0; *bp++ = 0; *bp++ = 0; *bp++ = 0; *bp++ = 1; *bp++ = 0; *bp++ = 0; *bp++ = 1; *bp++ = 0x14; *bp++ = 0; *bp++ = 0xa2; *bp++ = 0; *bp++ = 0; *bp++ = 0; sectors = blockif_size(p->bctx) / blockif_sectsz(p->bctx); sectors >>= 2; if (msf) { *bp++ = 0; lba_to_msf(bp, sectors); bp += 3; } else { be32enc(bp, sectors); bp += 4; } *bp++ = 1; *bp++ = 0x14; *bp++ = 0; *bp++ = 1; *bp++ = 0; *bp++ = 0; *bp++ = 0; if (msf) { *bp++ = 0; lba_to_msf(bp, 0); bp += 3; } else { *bp++ = 0; *bp++ = 0; *bp++ = 0; *bp++ = 0; } size = bp - buf; be16enc(buf, size - 2); if (len > size) len = size; write_prdt(p, slot, cfis, buf, len); cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); break; } default: { uint32_t tfd; p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x24; tfd = (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR; cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, tfd); break; } } } static void atapi_report_luns(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t buf[16]; memset(buf, 0, sizeof(buf)); buf[3] = 8; cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; write_prdt(p, slot, cfis, buf, sizeof(buf)); ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); } static void atapi_read(struct ahci_port *p, int slot, uint8_t *cfis, uint32_t done) { struct ahci_ioreq *aior; struct ahci_cmd_hdr *hdr; struct ahci_prdt_entry *prdt; struct blockif_req *breq; - struct pci_ahci_softc *sc; uint8_t *acmd; uint64_t lba; uint32_t len; int err; - sc = p->pr_sc; acmd = cfis + 0x40; hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + slot * AHCI_CL_SIZE); prdt = (struct ahci_prdt_entry *)(cfis + 0x80); lba = be32dec(acmd + 2); if (acmd[0] == READ_10) len = be16dec(acmd + 7); else len = be32dec(acmd + 6); if (len == 0) { cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); } lba *= 2048; len *= 2048; /* * Pull request off free list */ aior = STAILQ_FIRST(&p->iofhd); assert(aior != NULL); STAILQ_REMOVE_HEAD(&p->iofhd, io_flist); aior->cfis = cfis; aior->slot = slot; aior->len = len; aior->done = 
done; breq = &aior->io_req; breq->br_offset = lba + done; ahci_build_iov(p, aior, prdt, hdr->prdtl); /* Mark this command in-flight. */ p->pending |= 1 << slot; /* Stuff request onto busy list. */ TAILQ_INSERT_HEAD(&p->iobhd, aior, io_blist); err = blockif_read(p->bctx, breq); assert(err == 0); } static void atapi_request_sense(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t buf[64]; uint8_t *acmd; int len; acmd = cfis + 0x40; len = acmd[4]; if (len > sizeof(buf)) len = sizeof(buf); memset(buf, 0, len); buf[0] = 0x70 | (1 << 7); buf[2] = p->sense_key; buf[7] = 10; buf[12] = p->asc; write_prdt(p, slot, cfis, buf, len); cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); } static void atapi_start_stop_unit(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t *acmd = cfis + 0x40; uint32_t tfd; switch (acmd[4] & 3) { case 0: case 1: case 3: cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; tfd = ATA_S_READY | ATA_S_DSC; break; case 2: /* TODO eject media */ cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x53; tfd = (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR; break; } ahci_write_fis_d2h(p, slot, cfis, tfd); } static void atapi_mode_sense(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t *acmd; uint32_t tfd; uint8_t pc, code; int len; acmd = cfis + 0x40; len = be16dec(acmd + 7); pc = acmd[2] >> 6; code = acmd[2] & 0x3f; switch (pc) { case 0: switch (code) { case MODEPAGE_RW_ERROR_RECOVERY: { uint8_t buf[16]; if (len > sizeof(buf)) len = sizeof(buf); memset(buf, 0, sizeof(buf)); be16enc(buf, 16 - 2); buf[2] = 0x70; buf[8] = 0x01; buf[9] = 16 - 10; buf[11] = 0x05; write_prdt(p, slot, cfis, buf, len); tfd = ATA_S_READY | ATA_S_DSC; break; } case MODEPAGE_CD_CAPABILITIES: { uint8_t buf[30]; if (len > sizeof(buf)) len = sizeof(buf); memset(buf, 0, sizeof(buf)); be16enc(buf, 30 - 2); buf[2] = 0x70; buf[8] = 0x2A; buf[9] = 30 - 10; buf[10] = 0x08; buf[12] = 0x71; be16enc(&buf[18], 2); be16enc(&buf[20], 512); write_prdt(p, slot, cfis, buf, len); tfd = ATA_S_READY | ATA_S_DSC; break; } default: goto error; break; } break; case 3: p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x39; tfd = (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR; break; error: case 1: case 2: p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x24; tfd = (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR; break; } cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, tfd); } static void atapi_get_event_status_notification(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t *acmd; uint32_t tfd; acmd = cfis + 0x40; /* we don't support asynchronous operation */ if (!(acmd[1] & 1)) { p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x24; tfd = (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR; } else { uint8_t buf[8]; int len; len = be16dec(acmd + 7); if (len > sizeof(buf)) len = sizeof(buf); memset(buf, 0, sizeof(buf)); be16enc(buf, 8 - 2); buf[2] = 0x04; buf[3] = 0x10; buf[5] = 0x02; write_prdt(p, slot, cfis, buf, len); tfd = ATA_S_READY | ATA_S_DSC; } cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, tfd); } static void handle_packet_cmd(struct ahci_port *p, int slot, uint8_t *cfis) { uint8_t *acmd; acmd = cfis + 0x40; #ifdef AHCI_DEBUG { int i; DPRINTF("ACMD:"); for (i = 0; i < 16; i++) DPRINTF("%02x ", acmd[i]); DPRINTF("\n"); } #endif switch (acmd[0]) { case TEST_UNIT_READY: cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; 
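Every ATAPI helper above reports failure the same way: the sense key and additional sense code are latched on the port and packed into the task-file value, and a later REQUEST SENSE returns them via atapi_request_sense(). A minimal sketch of that pattern, under the hypothetical name atapi_fail_cmd() (not a function in this file):

/*
 * Hedged sketch: latch sense data on the port, then complete the
 * command with the sense key in TFD bits 15:12, mirroring the error
 * paths of atapi_inquiry(), atapi_read_toc() and atapi_mode_sense().
 */
static void
atapi_fail_cmd(struct ahci_port *p, int slot, uint8_t *cfis,
    uint8_t sense_key, uint8_t asc)
{
	uint32_t tfd;

	p->sense_key = sense_key;	/* e.g. ATA_SENSE_ILLEGAL_REQUEST */
	p->asc = asc;			/* e.g. 0x24, invalid field in CDB */
	tfd = (sense_key << 12) | ATA_S_READY | ATA_S_ERROR;
	cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN;
	ahci_write_fis_d2h(p, slot, cfis, tfd);
}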
ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); break; case INQUIRY: atapi_inquiry(p, slot, cfis); break; case READ_CAPACITY: atapi_read_capacity(p, slot, cfis); break; case PREVENT_ALLOW: /* TODO */ cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); break; case READ_TOC: atapi_read_toc(p, slot, cfis); break; case REPORT_LUNS: atapi_report_luns(p, slot, cfis); break; case READ_10: case READ_12: atapi_read(p, slot, cfis, 0); break; case REQUEST_SENSE: atapi_request_sense(p, slot, cfis); break; case START_STOP_UNIT: atapi_start_stop_unit(p, slot, cfis); break; case MODE_SENSE_10: atapi_mode_sense(p, slot, cfis); break; case GET_EVENT_STATUS_NOTIFICATION: atapi_get_event_status_notification(p, slot, cfis); break; default: cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x20; ahci_write_fis_d2h(p, slot, cfis, (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR); break; } } static void ahci_handle_cmd(struct ahci_port *p, int slot, uint8_t *cfis) { p->tfd |= ATA_S_BUSY; switch (cfis[2]) { case ATA_ATA_IDENTIFY: handle_identify(p, slot, cfis); break; case ATA_SETFEATURES: { switch (cfis[3]) { case ATA_SF_ENAB_SATA_SF: switch (cfis[12]) { case ATA_SATA_SF_AN: p->tfd = ATA_S_DSC | ATA_S_READY; break; default: p->tfd = ATA_S_ERROR | ATA_S_READY; p->tfd |= (ATA_ERROR_ABORT << 8); break; } break; case ATA_SF_ENAB_WCACHE: case ATA_SF_DIS_WCACHE: case ATA_SF_ENAB_RCACHE: case ATA_SF_DIS_RCACHE: p->tfd = ATA_S_DSC | ATA_S_READY; break; case ATA_SF_SETXFER: { switch (cfis[12] & 0xf8) { case ATA_PIO: case ATA_PIO0: break; case ATA_WDMA0: case ATA_UDMA0: p->xfermode = (cfis[12] & 0x7); break; } p->tfd = ATA_S_DSC | ATA_S_READY; break; } default: p->tfd = ATA_S_ERROR | ATA_S_READY; p->tfd |= (ATA_ERROR_ABORT << 8); break; } ahci_write_fis_d2h(p, slot, cfis, p->tfd); break; } case ATA_SET_MULTI: if (cfis[12] != 0 && (cfis[12] > 128 || (cfis[12] & (cfis[12] - 1)))) { p->tfd = ATA_S_ERROR | ATA_S_READY; p->tfd |= (ATA_ERROR_ABORT << 8); } else { p->mult_sectors = cfis[12]; p->tfd = ATA_S_DSC | ATA_S_READY; } ahci_write_fis_d2h(p, slot, cfis, p->tfd); break; case ATA_READ: case ATA_WRITE: case ATA_READ48: case ATA_WRITE48: case ATA_READ_MUL: case ATA_WRITE_MUL: case ATA_READ_MUL48: case ATA_WRITE_MUL48: case ATA_READ_DMA: case ATA_WRITE_DMA: case ATA_READ_DMA48: case ATA_WRITE_DMA48: case ATA_READ_FPDMA_QUEUED: case ATA_WRITE_FPDMA_QUEUED: ahci_handle_rw(p, slot, cfis, 0); break; case ATA_FLUSHCACHE: case ATA_FLUSHCACHE48: ahci_handle_flush(p, slot, cfis); break; case ATA_DATA_SET_MANAGEMENT: if (cfis[11] == 0 && cfis[3] == ATA_DSM_TRIM && cfis[13] == 0 && cfis[12] == 1) { ahci_handle_dsm_trim(p, slot, cfis, 0); break; } ahci_write_fis_d2h(p, slot, cfis, (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR); break; case ATA_SEND_FPDMA_QUEUED: if ((cfis[13] & 0x1f) == ATA_SFPDMA_DSM && cfis[17] == 0 && cfis[16] == ATA_DSM_TRIM && cfis[11] == 0 && cfis[13] == 1) { ahci_handle_dsm_trim(p, slot, cfis, 0); break; } ahci_write_fis_d2h(p, slot, cfis, (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR); break; case ATA_READ_LOG_EXT: case ATA_READ_LOG_DMA_EXT: ahci_handle_read_log(p, slot, cfis); break; case ATA_SECURITY_FREEZE_LOCK: case ATA_SMART_CMD: case ATA_NOP: ahci_write_fis_d2h(p, slot, cfis, (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR); break; case ATA_CHECK_POWER_MODE: cfis[12] = 0xff; /* always on */ ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); break; case ATA_STANDBY_CMD: case 
ATA_STANDBY_IMMEDIATE: case ATA_IDLE_CMD: case ATA_IDLE_IMMEDIATE: case ATA_SLEEP: case ATA_READ_VERIFY: case ATA_READ_VERIFY48: ahci_write_fis_d2h(p, slot, cfis, ATA_S_READY | ATA_S_DSC); break; case ATA_ATAPI_IDENTIFY: handle_atapi_identify(p, slot, cfis); break; case ATA_PACKET_CMD: if (!p->atapi) { ahci_write_fis_d2h(p, slot, cfis, (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR); } else handle_packet_cmd(p, slot, cfis); break; default: WPRINTF("Unsupported cmd:%02x\n", cfis[2]); ahci_write_fis_d2h(p, slot, cfis, (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR); break; } } static void ahci_handle_slot(struct ahci_port *p, int slot) { struct ahci_cmd_hdr *hdr; struct ahci_prdt_entry *prdt; struct pci_ahci_softc *sc; uint8_t *cfis; int cfl; sc = p->pr_sc; hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + slot * AHCI_CL_SIZE); cfl = (hdr->flags & 0x1f) * 4; cfis = paddr_guest2host(ahci_ctx(sc), hdr->ctba, 0x80 + hdr->prdtl * sizeof(struct ahci_prdt_entry)); prdt = (struct ahci_prdt_entry *)(cfis + 0x80); #ifdef AHCI_DEBUG DPRINTF("\ncfis:"); for (i = 0; i < cfl; i++) { if (i % 10 == 0) DPRINTF("\n"); DPRINTF("%02x ", cfis[i]); } DPRINTF("\n"); for (i = 0; i < hdr->prdtl; i++) { DPRINTF("%d@%08"PRIx64"\n", prdt->dbc & 0x3fffff, prdt->dba); prdt++; } #endif if (cfis[0] != FIS_TYPE_REGH2D) { WPRINTF("Not a H2D FIS:%02x\n", cfis[0]); return; } if (cfis[1] & 0x80) { ahci_handle_cmd(p, slot, cfis); } else { if (cfis[15] & (1 << 2)) p->reset = 1; else if (p->reset) { p->reset = 0; ahci_port_reset(p); } p->ci &= ~(1 << slot); } } static void ahci_handle_port(struct ahci_port *p) { if (!(p->cmd & AHCI_P_CMD_ST)) return; /* * Search for any new commands to issue ignoring those that * are already in-flight. Stop if device is busy or in error. */ for (; (p->ci & ~p->pending) != 0; p->ccs = ((p->ccs + 1) & 31)) { if ((p->tfd & (ATA_S_BUSY | ATA_S_DRQ)) != 0) break; if (p->waitforclear) break; if ((p->ci & ~p->pending & (1 << p->ccs)) != 0) { p->cmd &= ~AHCI_P_CMD_CCS_MASK; p->cmd |= p->ccs << AHCI_P_CMD_CCS_SHIFT; ahci_handle_slot(p, p->ccs); } } } /* * blockif callback routine - this runs in the context of the blockif * i/o thread, so the mutex needs to be acquired. */ static void ata_ioreq_cb(struct blockif_req *br, int err) { struct ahci_cmd_hdr *hdr; struct ahci_ioreq *aior; struct ahci_port *p; struct pci_ahci_softc *sc; uint32_t tfd; uint8_t *cfis; int slot, ncq, dsm; DPRINTF("%s %d\n", __func__, err); ncq = dsm = 0; aior = br->br_param; p = aior->io_pr; cfis = aior->cfis; slot = aior->slot; sc = p->pr_sc; hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + slot * AHCI_CL_SIZE); if (cfis[2] == ATA_WRITE_FPDMA_QUEUED || cfis[2] == ATA_READ_FPDMA_QUEUED || cfis[2] == ATA_SEND_FPDMA_QUEUED) ncq = 1; if (cfis[2] == ATA_DATA_SET_MANAGEMENT || (cfis[2] == ATA_SEND_FPDMA_QUEUED && (cfis[13] & 0x1f) == ATA_SFPDMA_DSM)) dsm = 1; pthread_mutex_lock(&sc->mtx); /* * Delete the blockif request from the busy list */ TAILQ_REMOVE(&p->iobhd, aior, io_blist); /* * Move the blockif request back to the free list */ STAILQ_INSERT_TAIL(&p->iofhd, aior, io_flist); if (!err) hdr->prdbc = aior->done; if (!err && aior->more) { if (dsm) ahci_handle_dsm_trim(p, slot, cfis, aior->done); else ahci_handle_rw(p, slot, cfis, aior->done); goto out; } if (!err) tfd = ATA_S_READY | ATA_S_DSC; else tfd = (ATA_E_ABORT << 8) | ATA_S_READY | ATA_S_ERROR; if (ncq) ahci_write_fis_sdb(p, slot, cfis, tfd); else ahci_write_fis_d2h(p, slot, cfis, tfd); /* * This command is now complete. 
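 * (Completion is signalled with a Set Device Bits FIS for NCQ
 * commands and a D2H register FIS otherwise; hdr->prdbc, updated
 * above on success, is how the guest learns the transferred byte
 * count.)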
*/ p->pending &= ~(1 << slot); ahci_check_stopped(p); ahci_handle_port(p); out: pthread_mutex_unlock(&sc->mtx); DPRINTF("%s exit\n", __func__); } static void atapi_ioreq_cb(struct blockif_req *br, int err) { struct ahci_cmd_hdr *hdr; struct ahci_ioreq *aior; struct ahci_port *p; struct pci_ahci_softc *sc; uint8_t *cfis; uint32_t tfd; int slot; DPRINTF("%s %d\n", __func__, err); aior = br->br_param; p = aior->io_pr; cfis = aior->cfis; slot = aior->slot; sc = p->pr_sc; hdr = (struct ahci_cmd_hdr *)(p->cmd_lst + aior->slot * AHCI_CL_SIZE); pthread_mutex_lock(&sc->mtx); /* * Delete the blockif request from the busy list */ TAILQ_REMOVE(&p->iobhd, aior, io_blist); /* * Move the blockif request back to the free list */ STAILQ_INSERT_TAIL(&p->iofhd, aior, io_flist); if (!err) hdr->prdbc = aior->done; if (!err && aior->more) { atapi_read(p, slot, cfis, aior->done); goto out; } if (!err) { tfd = ATA_S_READY | ATA_S_DSC; } else { p->sense_key = ATA_SENSE_ILLEGAL_REQUEST; p->asc = 0x21; tfd = (p->sense_key << 12) | ATA_S_READY | ATA_S_ERROR; } cfis[4] = (cfis[4] & ~7) | ATA_I_CMD | ATA_I_IN; ahci_write_fis_d2h(p, slot, cfis, tfd); /* * This command is now complete. */ p->pending &= ~(1 << slot); ahci_check_stopped(p); ahci_handle_port(p); out: pthread_mutex_unlock(&sc->mtx); DPRINTF("%s exit\n", __func__); } static void pci_ahci_ioreq_init(struct ahci_port *pr) { struct ahci_ioreq *vr; int i; pr->ioqsz = blockif_queuesz(pr->bctx); pr->ioreq = calloc(pr->ioqsz, sizeof(struct ahci_ioreq)); STAILQ_INIT(&pr->iofhd); /* * Add all i/o request entries to the free queue */ for (i = 0; i < pr->ioqsz; i++) { vr = &pr->ioreq[i]; vr->io_pr = pr; if (!pr->atapi) vr->io_req.br_callback = ata_ioreq_cb; else vr->io_req.br_callback = atapi_ioreq_cb; vr->io_req.br_param = vr; STAILQ_INSERT_TAIL(&pr->iofhd, vr, io_flist); } TAILQ_INIT(&pr->iobhd); } static void pci_ahci_port_write(struct pci_ahci_softc *sc, uint64_t offset, uint64_t value) { int port = (offset - AHCI_OFFSET) / AHCI_STEP; offset = (offset - AHCI_OFFSET) % AHCI_STEP; struct ahci_port *p = &sc->port[port]; DPRINTF("pci_ahci_port %d: write offset 0x%"PRIx64" value 0x%"PRIx64"\n", port, offset, value); switch (offset) { case AHCI_P_CLB: p->clb = value; break; case AHCI_P_CLBU: p->clbu = value; break; case AHCI_P_FB: p->fb = value; break; case AHCI_P_FBU: p->fbu = value; break; case AHCI_P_IS: p->is &= ~value; break; case AHCI_P_IE: p->ie = value & 0xFDC000FF; ahci_generate_intr(sc); break; case AHCI_P_CMD: { p->cmd &= ~(AHCI_P_CMD_ST | AHCI_P_CMD_SUD | AHCI_P_CMD_POD | AHCI_P_CMD_CLO | AHCI_P_CMD_FRE | AHCI_P_CMD_APSTE | AHCI_P_CMD_ATAPI | AHCI_P_CMD_DLAE | AHCI_P_CMD_ALPE | AHCI_P_CMD_ASP | AHCI_P_CMD_ICC_MASK); p->cmd |= (AHCI_P_CMD_ST | AHCI_P_CMD_SUD | AHCI_P_CMD_POD | AHCI_P_CMD_CLO | AHCI_P_CMD_FRE | AHCI_P_CMD_APSTE | AHCI_P_CMD_ATAPI | AHCI_P_CMD_DLAE | AHCI_P_CMD_ALPE | AHCI_P_CMD_ASP | AHCI_P_CMD_ICC_MASK) & value; if (!(value & AHCI_P_CMD_ST)) { ahci_port_stop(p); } else { uint64_t clb; p->cmd |= AHCI_P_CMD_CR; clb = (uint64_t)p->clbu << 32 | p->clb; p->cmd_lst = paddr_guest2host(ahci_ctx(sc), clb, AHCI_CL_SIZE * AHCI_MAX_SLOTS); } if (value & AHCI_P_CMD_FRE) { uint64_t fb; p->cmd |= AHCI_P_CMD_FR; fb = (uint64_t)p->fbu << 32 | p->fb; /* we don't support FBSCP, so rfis size is 256Bytes */ p->rfis = paddr_guest2host(ahci_ctx(sc), fb, 256); } else { p->cmd &= ~AHCI_P_CMD_FR; } if (value & AHCI_P_CMD_CLO) { p->tfd &= ~(ATA_S_BUSY | ATA_S_DRQ); p->cmd &= ~AHCI_P_CMD_CLO; } if (value & AHCI_P_CMD_ICC_MASK) { p->cmd &= ~AHCI_P_CMD_ICC_MASK; } 
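ahci_handle_port(), invoked just below once the register write has been processed, issues queued commands round-robin: p->ci holds the slots the guest has issued, p->pending the slots already in flight, and p->ccs remembers where the scan left off. A self-contained sketch of that selection step, under the hypothetical name next_ready_slot() (not in this file):

/*
 * Return the next slot that is issued (ci) but not in flight
 * (pending), scanning round-robin from ccs; -1 if none.
 */
static int
next_ready_slot(uint32_t ci, uint32_t pending, int ccs)
{
	int i, slot;

	for (i = 0; i < 32; i++) {
		slot = (ccs + i) & 31;
		if ((ci & ~pending & (1u << slot)) != 0)
			return (slot);
	}
	return (-1);
}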
ahci_handle_port(p); break; } case AHCI_P_TFD: case AHCI_P_SIG: case AHCI_P_SSTS: WPRINTF("pci_ahci_port: read only registers 0x%"PRIx64"\n", offset); break; case AHCI_P_SCTL: p->sctl = value; if (!(p->cmd & AHCI_P_CMD_ST)) { if (value & ATA_SC_DET_RESET) ahci_port_reset(p); } break; case AHCI_P_SERR: p->serr &= ~value; break; case AHCI_P_SACT: p->sact |= value; break; case AHCI_P_CI: p->ci |= value; ahci_handle_port(p); break; case AHCI_P_SNTF: case AHCI_P_FBS: default: break; } } static void pci_ahci_host_write(struct pci_ahci_softc *sc, uint64_t offset, uint64_t value) { DPRINTF("pci_ahci_host: write offset 0x%"PRIx64" value 0x%"PRIx64"\n", offset, value); switch (offset) { case AHCI_CAP: case AHCI_PI: case AHCI_VS: case AHCI_CAP2: DPRINTF("pci_ahci_host: read only registers 0x%"PRIx64"\n", offset); break; case AHCI_GHC: if (value & AHCI_GHC_HR) ahci_reset(sc); else if (value & AHCI_GHC_IE) { sc->ghc |= AHCI_GHC_IE; ahci_generate_intr(sc); } break; case AHCI_IS: sc->is &= ~value; ahci_generate_intr(sc); break; default: break; } } static void pci_ahci_write(struct vmctx *ctx, int vcpu, struct pci_devinst *pi, int baridx, uint64_t offset, int size, uint64_t value) { struct pci_ahci_softc *sc = pi->pi_arg; assert(baridx == 5); assert((offset % 4) == 0 && size == 4); pthread_mutex_lock(&sc->mtx); if (offset < AHCI_OFFSET) pci_ahci_host_write(sc, offset, value); else if (offset < AHCI_OFFSET + sc->ports * AHCI_STEP) pci_ahci_port_write(sc, offset, value); else WPRINTF("pci_ahci: unknown i/o write offset 0x%"PRIx64"\n", offset); pthread_mutex_unlock(&sc->mtx); } static uint64_t pci_ahci_host_read(struct pci_ahci_softc *sc, uint64_t offset) { uint32_t value; switch (offset) { case AHCI_CAP: case AHCI_GHC: case AHCI_IS: case AHCI_PI: case AHCI_VS: case AHCI_CCCC: case AHCI_CCCP: case AHCI_EM_LOC: case AHCI_EM_CTL: case AHCI_CAP2: { uint32_t *p = &sc->cap; p += (offset - AHCI_CAP) / sizeof(uint32_t); value = *p; break; } default: value = 0; break; } DPRINTF("pci_ahci_host: read offset 0x%"PRIx64" value 0x%x\n", offset, value); return (value); } static uint64_t pci_ahci_port_read(struct pci_ahci_softc *sc, uint64_t offset) { uint32_t value; int port = (offset - AHCI_OFFSET) / AHCI_STEP; offset = (offset - AHCI_OFFSET) % AHCI_STEP; switch (offset) { case AHCI_P_CLB: case AHCI_P_CLBU: case AHCI_P_FB: case AHCI_P_FBU: case AHCI_P_IS: case AHCI_P_IE: case AHCI_P_CMD: case AHCI_P_TFD: case AHCI_P_SIG: case AHCI_P_SSTS: case AHCI_P_SCTL: case AHCI_P_SERR: case AHCI_P_SACT: case AHCI_P_CI: case AHCI_P_SNTF: case AHCI_P_FBS: { uint32_t *p= &sc->port[port].clb; p += (offset - AHCI_P_CLB) / sizeof(uint32_t); value = *p; break; } default: value = 0; break; } DPRINTF("pci_ahci_port %d: read offset 0x%"PRIx64" value 0x%x\n", port, offset, value); return value; } static uint64_t pci_ahci_read(struct vmctx *ctx, int vcpu, struct pci_devinst *pi, int baridx, uint64_t regoff, int size) { struct pci_ahci_softc *sc = pi->pi_arg; uint64_t offset; uint32_t value; assert(baridx == 5); assert(size == 1 || size == 2 || size == 4); assert((regoff & (size - 1)) == 0); pthread_mutex_lock(&sc->mtx); offset = regoff & ~0x3; /* round down to a multiple of 4 bytes */ if (offset < AHCI_OFFSET) value = pci_ahci_host_read(sc, offset); else if (offset < AHCI_OFFSET + sc->ports * AHCI_STEP) value = pci_ahci_port_read(sc, offset); else { value = 0; WPRINTF("pci_ahci: unknown i/o read offset 0x%"PRIx64"\n", regoff); } value >>= 8 * (regoff & 0x3); pthread_mutex_unlock(&sc->mtx); return (value); } static int pci_ahci_init(struct vmctx 
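pci_ahci_read() above models only aligned 32-bit registers, so narrower guest accesses are synthesized: round the offset down to the containing dword, read the whole register, then shift the requested bytes into the low lanes. A worked example (hedged; read_narrow() and read_reg32() are hypothetical stand-ins, not names from this file):

/*
 * Synthesize a 1-byte guest read at regoff 0x102 from the aligned
 * 32-bit register model used above.
 */
static uint32_t
read_narrow(uint64_t regoff)
{
	uint64_t offset = regoff & ~0x3;	/* 0x100: containing dword */
	uint32_t value = read_reg32(offset);	/* hypothetical accessor */

	return (value >> (8 * (regoff & 0x3)));	/* byte 2 -> bits 7:0 */
}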
*ctx, struct pci_devinst *pi, char *opts, int atapi) { char bident[sizeof("XX:X:X")]; struct blockif_ctxt *bctxt; struct pci_ahci_softc *sc; int ret, slots; MD5_CTX mdctx; u_char digest[16]; ret = 0; if (opts == NULL) { fprintf(stderr, "pci_ahci: backing device required\n"); return (1); } #ifdef AHCI_DEBUG dbg = fopen("/tmp/log", "w+"); #endif sc = calloc(1, sizeof(struct pci_ahci_softc)); pi->pi_arg = sc; sc->asc_pi = pi; sc->ports = MAX_PORTS; /* * Only use port 0 for a backing device. All other ports will be * marked as unused */ sc->port[0].atapi = atapi; /* * Attempt to open the backing image. Use the PCI * slot/func for the identifier string. */ snprintf(bident, sizeof(bident), "%d:%d", pi->pi_slot, pi->pi_func); bctxt = blockif_open(opts, bident); if (bctxt == NULL) { ret = 1; goto open_fail; } sc->port[0].bctx = bctxt; sc->port[0].pr_sc = sc; /* * Create an identifier for the backing file. Use parts of the * md5 sum of the filename */ MD5Init(&mdctx); MD5Update(&mdctx, opts, strlen(opts)); MD5Final(digest, &mdctx); sprintf(sc->port[0].ident, "BHYVE-%02X%02X-%02X%02X-%02X%02X", digest[0], digest[1], digest[2], digest[3], digest[4], digest[5]); /* * Allocate blockif request structures and add them * to the free list */ pci_ahci_ioreq_init(&sc->port[0]); pthread_mutex_init(&sc->mtx, NULL); /* Intel ICH8 AHCI */ slots = sc->port[0].ioqsz; if (slots > 32) slots = 32; --slots; sc->cap = AHCI_CAP_64BIT | AHCI_CAP_SNCQ | AHCI_CAP_SSNTF | AHCI_CAP_SMPS | AHCI_CAP_SSS | AHCI_CAP_SALP | AHCI_CAP_SAL | AHCI_CAP_SCLO | (0x3 << AHCI_CAP_ISS_SHIFT)| AHCI_CAP_PMD | AHCI_CAP_SSC | AHCI_CAP_PSC | (slots << AHCI_CAP_NCS_SHIFT) | AHCI_CAP_SXS | (sc->ports - 1); /* Only port 0 implemented */ sc->pi = 1; sc->vs = 0x10300; sc->cap2 = AHCI_CAP2_APST; ahci_reset(sc); pci_set_cfgdata16(pi, PCIR_DEVICE, 0x2821); pci_set_cfgdata16(pi, PCIR_VENDOR, 0x8086); pci_set_cfgdata8(pi, PCIR_CLASS, PCIC_STORAGE); pci_set_cfgdata8(pi, PCIR_SUBCLASS, PCIS_STORAGE_SATA); pci_set_cfgdata8(pi, PCIR_PROGIF, PCIP_STORAGE_SATA_AHCI_1_0); pci_emul_add_msicap(pi, 1); pci_emul_alloc_bar(pi, 5, PCIBAR_MEM32, AHCI_OFFSET + sc->ports * AHCI_STEP); pci_lintr_request(pi); open_fail: if (ret) { if (sc->port[0].bctx != NULL) blockif_close(sc->port[0].bctx); free(sc); } return (ret); } static int pci_ahci_hd_init(struct vmctx *ctx, struct pci_devinst *pi, char *opts) { return (pci_ahci_init(ctx, pi, opts, 0)); } static int pci_ahci_atapi_init(struct vmctx *ctx, struct pci_devinst *pi, char *opts) { return (pci_ahci_init(ctx, pi, opts, 1)); } /* * Use separate emulation names to distinguish drive and atapi devices */ struct pci_devemu pci_de_ahci_hd = { .pe_emu = "ahci-hd", .pe_init = pci_ahci_hd_init, .pe_barwrite = pci_ahci_write, .pe_barread = pci_ahci_read }; PCI_EMUL_SET(pci_de_ahci_hd); struct pci_devemu pci_de_ahci_cd = { .pe_emu = "ahci-cd", .pe_init = pci_ahci_atapi_init, .pe_barwrite = pci_ahci_write, .pe_barread = pci_ahci_read }; PCI_EMUL_SET(pci_de_ahci_cd); Index: projects/clang380-import/usr.sbin/bhyve =================================================================== --- projects/clang380-import/usr.sbin/bhyve (revision 294776) +++ projects/clang380-import/usr.sbin/bhyve (revision 294777) Property changes on: projects/clang380-import/usr.sbin/bhyve ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/usr.sbin/bhyve:r293686-294776 Index: projects/clang380-import/usr.sbin/bsdconfig/share/strings.subr 
=================================================================== --- projects/clang380-import/usr.sbin/bsdconfig/share/strings.subr (revision 294776) +++ projects/clang380-import/usr.sbin/bsdconfig/share/strings.subr (revision 294777) @@ -1,454 +1,454 @@ if [ ! "$_STRINGS_SUBR" ]; then _STRINGS_SUBR=1 # # Copyright (c) 2006-2013 Devin Teske # All rights reserved. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE # ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS # OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) # HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT # LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF # SUCH DAMAGE. # # $FreeBSD$ # ############################################################ INCLUDES BSDCFG_SHARE="/usr/share/bsdconfig" . $BSDCFG_SHARE/common.subr || exit 1 ############################################################ GLOBALS # # A Literal newline (for use with f_replace_all(), or IFS, or whatever) # NL=" " # END-QUOTE # # Valid characters that can appear in an sh(1) variable name # # Please note that the character ranges A-Z and a-z should be avoided because # these can include accent characters (which are not valid in a variable name). # For example, A-Z matches any character that sorts after A but before Z, # including A and Z. Although ASCII order would make more sense, that is not # how it works. # VALID_VARNAME_CHARS="0-9ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_" ############################################################ FUNCTIONS # f_substr "$string" $start [$length] # # Simple wrapper to awk(1)'s `substr' function. # f_substr() { local string="$1" start="${2:-0}" len="${3:-0}" echo "$string" | awk "{ print substr(\$0, $start, $len) }" } # f_snprintf $var_to_set $size $format [$arguments ...] # # Similar to snprintf(3), write at most $size number of bytes into $var_to_set # using printf(1) syntax (`$format [$arguments ...]'). The value of $var_to_set # is NULL unless at-least one byte is stored from the output. # f_snprintf() { local __var_to_set="$1" __size="$2" shift 2 # var_to_set size eval "$__var_to_set"=\$\( printf -- \"\$@\" \| \ awk -v max=\"\$__size\" \'' { len = length($0) max -= len print substr($0,0,(max > 0 ? len : max + len)) if ( max < 0 ) exit max-- }'\' \) } # f_sprintf $var_to_set $format [$arguments ...] # # Similar to sprintf(3), write a string into $var_to_set using printf(1) syntax # (`$format [$arguments ...]'). 
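# A hedged usage sketch of the two printf wrappers documented above;
# the variable names (msg, short) and sample strings are illustrative,
# not part of this file:
#
#	f_sprintf msg "%s: %u warnings" "$0" 3    # msg=[<script>: 3 warnings]
#	f_snprintf short 5 "%s" "truncate me"     # short=[trunc]
#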
# f_sprintf() { local __var_to_set="$1" shift 1 # var_to_set eval "$__var_to_set"=\$\( printf -- \"\$@\" \) } # f_vsnprintf $var_to_set $size $format $format_args # # Similar to vsnprintf(3), write at most $size number of bytes into $var_to_set # using printf(1) syntax (`$format $format_args'). The value of $var_to_set is # NULL unless at-least one byte is stored from the output. # # Example 1: # # limit=7 format="%s" # format_args="'abc 123'" # 3-spaces between abc and 123 # f_vsnprintf foo $limit "$format" "$format_args" # foo=[abc 1] # # Example 2: # # limit=12 format="%s %s" -# format_args=" 'doghouse' 'foxhound' " +# format_args=" 'doghouse' 'fox' " # # even more spaces added to illustrate escape-method # f_vsnprintf foo $limit "$format" "$format_args" # foo=[doghouse fox] # # Example 3: # # limit=13 format="%s %s" # f_shell_escape arg1 'aaa"aaa' # arg1=[aaa"aaa] (no change) # f_shell_escape arg2 "aaa'aaa" # arg2=[aaa'\''aaa] (escaped s-quote) # format_args="'$arg1' '$arg2'" # use single-quotes to surround args # f_vsnprintf foo $limit "$format" "$format_args" # foo=[aaa"aaa aaa'a] # # In all of the above examples, the call to f_vsnprintf() does not change. Only # the contents of $limit, $format, and $format_args changes in each example. # f_vsnprintf() { eval f_snprintf \"\$1\" \"\$2\" \"\$3\" $4 } # f_vsprintf $var_to_set $format $format_args # # Similar to vsprintf(3), write a string into $var_to_set using printf(1) # syntax (`$format $format_args'). # f_vsprintf() { eval f_sprintf \"\$1\" \"\$2\" $3 } # f_longest_line_length # # Simple wrapper to an awk(1) script to print the length of the longest line of # input (read from stdin). Supports the newline escape-sequence `\n' for # splitting a single line into multiple lines. # f_longest_line_length_awk=' BEGIN { longest = 0 } { if (split($0, lines, /\\n/) > 1) { for (n in lines) { len = length(lines[n]) longest = ( len > longest ? len : longest ) } } else { len = length($0) longest = ( len > longest ? len : longest ) } } END { print longest } ' f_longest_line_length() { awk "$f_longest_line_length_awk" } # f_number_of_lines # # Simple wrapper to an awk(1) script to print the number of lines read from # stdin. Supports newline escape-sequence `\n' for splitting a single line into # multiple lines. # f_number_of_lines_awk=' BEGIN { num_lines = 0 } { num_lines += split(" "$0, unused, /\\n/) } END { print num_lines } ' f_number_of_lines() { awk "$f_number_of_lines_awk" } # f_isinteger $arg # # Returns true if argument is a positive/negative whole integer. # f_isinteger() { local arg="${1#-}" [ "${arg:-x}" = "${arg%[!0-9]*}" ] } # f_uriencode [$text] # # Encode $text for the purpose of embedding safely into a URL. Non-alphanumeric # characters are converted to `%XX' sequence where XX represents the hexa- # decimal ordinal of the non-alphanumeric character. If $text is missing, data # is instead read from standard input. # f_uriencode_awk=' BEGIN { output = "" for (n = 0; n < 256; n++) pack[sprintf("%c", n)] = sprintf("%%%02x", n) } { sline = "" slen = length($0) for (n = 1; n <= slen; n++) { char = substr($0, n, 1) if ( char !~ /^[[:alnum:]_]$/ ) char = pack[char] sline = sline char } output = output ( output ? "%0a" : "" ) sline } END { print output } ' f_uriencode() { if [ $# -gt 0 ]; then echo "$1" | awk "$f_uriencode_awk" else awk "$f_uriencode_awk" fi } # f_uridecode [$text] # # Decode $text from a URI. Encoded characters are converted from their `%XX' # sequence into original unencoded ASCII sequences. 
If $text is missing, data # is instead read from standard input. # f_uridecode_awk=' BEGIN { for (n = 0; n < 256; n++) chr[n] = sprintf("%c", n) } { sline = "" slen = length($0) for (n = 1; n <= slen; n++) { seq = substr($0, n, 3) if ( seq ~ /^%[[:xdigit:]][[:xdigit:]]$/ ) { hex = substr(seq, 2, 2) sline = sline chr[sprintf("%u", "0x"hex)] n += 2 } else sline = sline substr(seq, 1, 1) } print sline } ' f_uridecode() { if [ $# -gt 0 ]; then echo "$1" | awk "$f_uridecode_awk" else awk "$f_uridecode_awk" fi } # f_replaceall $string $find $replace [$var_to_set] # # Replace all occurrences of $find in $string with $replace. If $var_to_set is # either missing or NULL, the variable name is produced on standard out for # capturing in a sub-shell (which is less recommended due to performance # degradation). # # To replace newlines or a sequence containing the newline character, use $NL # as `\n' is not supported. # f_replaceall() { local __left="" __right="$1" local __find="$2" __replace="$3" __var_to_set="$4" while :; do case "$__right" in *$__find*) __left="$__left${__right%%$__find*}$__replace" __right="${__right#*$__find}" continue esac break done __left="$__left${__right#*$__find}" if [ "$__var_to_set" ]; then setvar "$__var_to_set" "$__left" else echo "$__left" fi } # f_str2varname $string [$var_to_set] # # Convert a string into a suitable value to be used as a variable name # by converting unsuitable characters into the underscore [_]. If $var_to_set # is either missing or NULL, the variable name is produced on standard out for # capturing in a sub-shell (which is less recommended due to performance # degradation). # f_str2varname() { local __string="$1" __var_to_set="$2" f_replaceall "$__string" "[!$VALID_VARNAME_CHARS]" "_" "$__var_to_set" } # f_shell_escape $string [$var_to_set] # # Escape $string for shell eval statement(s) by replacing all single-quotes # with a special sequence that creates a compound string when interpolated # by eval with surrounding single-quotes. # # For example: # # foo="abc'123" # f_shell_escape "$foo" bar # bar=[abc'\''123] # eval echo \'$bar\' # produces abc'123 # # This is helpful when processing an argument list that has to retain its # escaped structure for later evaluations. # # WARNING: Surrounding single-quotes are not added; this is the responsibility # of the code passing the escaped values to eval (which also aids readability). # f_shell_escape() { local __string="$1" __var_to_set="$2" f_replaceall "$__string" "'" "'\\''" "$__var_to_set" } # f_shell_unescape $string [$var_to_set] # # The antithesis of f_shell_escape(), this function takes an escaped $string # and expands it. # # For example: # # foo="abc'123" # f_shell_escape "$foo" bar # bar=[abc'\''123] # f_shell_unescape "$bar" # produces abc'123 # f_shell_unescape() { local __string="$1" __var_to_set="$2" f_replaceall "$__string" "'\\''" "'" "$__var_to_set" } # f_expand_number $string [$var_to_set] # # Unformat $string into a number, optionally to be stored in $var_to_set. This # function follows the SI power of two convention. # # The prefixes are: # # Prefix Description Multiplier # k kilo 1024 # M mega 1048576 # G giga 1073741824 # T tera 1099511627776 # P peta 1125899906842624 # E exa 1152921504606846976 # # NOTE: Prefixes are case-insensitive. # # Upon successful completion, success status is returned; otherwise the number # -1 is produced ($var_to_set set to -1 or if $var_to_set is NULL or missing) # on standard output.
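# Illustrative calls (hedged; the variable name `bytes' is
# hypothetical), with status codes per the table below:
#
#	f_expand_number "2k" bytes    # bytes=2048
#	f_expand_number "1G" bytes    # bytes=1073741824
#	f_expand_number "10Q" bytes   # bytes=-1, returns status 2
#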
In the case of failure, the error status will be one of: # # Status Reason # 1 Given $string contains no digits # 2 An unrecognized prefix was given # 3 Result too large to calculate # f_expand_number() { local __string="$1" __var_to_set="$2" local __cp __num __bshift __maxinput # Remove any leading non-digits __string="${__string#${__string%%[0-9]*}}" # Store the numbers (no trailing suffix) __num="${__string%%[!0-9]*}" # Produce `-1' if string didn't contain any digits if [ ! "$__num" ]; then if [ "$__var_to_set" ]; then setvar "$__var_to_set" -1 else echo -1 fi return 1 # 1 = "Given $string contains no digits" fi # Remove all the leading numbers from the string to get at the prefix __string="${__string#"$__num"}" # # Test for invalid prefix (and determine bitshift length) # case "$__string" in ""|[[:space:]]*) # Shortcut if [ "$__var_to_set" ]; then setvar "$__var_to_set" $__num else echo $__num fi return $SUCCESS ;; [Kk]*) __bshift=10 ;; [Mm]*) __bshift=20 ;; [Gg]*) __bshift=30 ;; [Tt]*) __bshift=40 ;; [Pp]*) __bshift=50 ;; [Ee]*) __bshift=60 ;; *) # Unknown prefix if [ "$__var_to_set" ]; then setvar "$__var_to_set" -1 else echo -1 fi return 2 # 2 = "An unrecognized prefix was given" esac # Determine if the wheels fall off __maxinput=$(( 0x7fffffffffffffff >> $__bshift )) if [ $__num -gt $__maxinput ]; then # Input (before expanding) would exceed 64-bit signed int if [ "$__var_to_set" ]; then setvar "$__var_to_set" -1 else echo -1 fi return 3 # 3 = "Result too large to calculate" fi # Shift the number out and produce it __num=$(( $__num << $__bshift )) if [ "$__var_to_set" ]; then setvar "$__var_to_set" $__num else echo $__num fi } ############################################################ MAIN f_dprintf "%s: Successfully loaded." strings.subr fi # ! $_STRINGS_SUBR Index: projects/clang380-import/usr.sbin/iscsid/iscsid.c =================================================================== --- projects/clang380-import/usr.sbin/iscsid/iscsid.c (revision 294776) +++ projects/clang380-import/usr.sbin/iscsid/iscsid.c (revision 294777) @@ -1,603 +1,602 @@ /*- * Copyright (c) 2012 The FreeBSD Foundation * All rights reserved. * * This software was developed by Edward Tomasz Napierala under sponsorship * from the FreeBSD Foundation. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include +#include #include #include #include #include #include #include #include #include - -#include #include "iscsid.h" static volatile bool sigalrm_received = false; static int nchildren = 0; static void usage(void) { fprintf(stderr, "usage: iscsid [-P pidfile][-d][-m maxproc][-t timeout]\n"); exit(1); } char * checked_strdup(const char *s) { char *c; c = strdup(s); if (c == NULL) log_err(1, "strdup"); return (c); } static void resolve_addr(const struct connection *conn, const char *address, struct addrinfo **ai, bool initiator_side) { struct addrinfo hints; char *arg, *addr, *ch; const char *port; int error, colons = 0; arg = checked_strdup(address); if (arg[0] == '\0') { fail(conn, "empty address"); log_errx(1, "empty address"); } if (arg[0] == '[') { /* * IPv6 address in square brackets, perhaps with port. */ arg++; addr = strsep(&arg, "]"); if (arg == NULL) { fail(conn, "malformed address"); log_errx(1, "malformed address %s", address); } if (arg[0] == '\0') { port = NULL; } else if (arg[0] == ':') { port = arg + 1; } else { fail(conn, "malformed address"); log_errx(1, "malformed address %s", address); } } else { /* * Either IPv6 address without brackets - and without * a port - or IPv4 address. Just count the colons. */ for (ch = arg; *ch != '\0'; ch++) { if (*ch == ':') colons++; } if (colons > 1) { addr = arg; port = NULL; } else { addr = strsep(&arg, ":"); if (arg == NULL) port = NULL; else port = arg; } } if (port == NULL && !initiator_side) port = "3260"; memset(&hints, 0, sizeof(hints)); hints.ai_family = PF_UNSPEC; hints.ai_socktype = SOCK_STREAM; hints.ai_flags = AI_ADDRCONFIG | AI_NUMERICSERV; if (initiator_side) hints.ai_flags |= AI_PASSIVE; error = getaddrinfo(addr, port, &hints, ai); if (error != 0) { fail(conn, gai_strerror(error)); log_errx(1, "getaddrinfo for %s failed: %s", address, gai_strerror(error)); } } static struct connection * connection_new(int iscsi_fd, const struct iscsi_daemon_request *request) { struct connection *conn; struct addrinfo *from_ai, *to_ai; const char *from_addr, *to_addr; #ifdef ICL_KERNEL_PROXY struct iscsi_daemon_connect idc; #endif int error, sockbuf; conn = calloc(1, sizeof(*conn)); if (conn == NULL) log_err(1, "calloc"); /* * Default values, from RFC 3720, section 12. 
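 * (For reference, these correspond to the RFC 3720 negotiation
 * defaults: HeaderDigest/DataDigest "None", InitialR2T "Yes",
 * ImmediateData "Yes", MaxRecvDataSegmentLength 8192,
 * MaxBurstLength 262144, FirstBurstLength 65536.)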
*/ conn->conn_header_digest = CONN_DIGEST_NONE; conn->conn_data_digest = CONN_DIGEST_NONE; conn->conn_initial_r2t = true; conn->conn_immediate_data = true; conn->conn_max_data_segment_length = 8192; conn->conn_max_burst_length = 262144; conn->conn_first_burst_length = 65536; conn->conn_iscsi_fd = iscsi_fd; conn->conn_session_id = request->idr_session_id; memcpy(&conn->conn_conf, &request->idr_conf, sizeof(conn->conn_conf)); memcpy(&conn->conn_isid, &request->idr_isid, sizeof(conn->conn_isid)); conn->conn_tsih = request->idr_tsih; memcpy(&conn->conn_limits, &request->idr_limits, sizeof(conn->conn_limits)); from_addr = conn->conn_conf.isc_initiator_addr; to_addr = conn->conn_conf.isc_target_addr; if (from_addr[0] != '\0') resolve_addr(conn, from_addr, &from_ai, true); else from_ai = NULL; resolve_addr(conn, to_addr, &to_ai, false); #ifdef ICL_KERNEL_PROXY if (conn->conn_conf.isc_iser) { memset(&idc, 0, sizeof(idc)); idc.idc_session_id = conn->conn_session_id; if (conn->conn_conf.isc_iser) idc.idc_iser = 1; idc.idc_domain = to_ai->ai_family; idc.idc_socktype = to_ai->ai_socktype; idc.idc_protocol = to_ai->ai_protocol; if (from_ai != NULL) { idc.idc_from_addr = from_ai->ai_addr; idc.idc_from_addrlen = from_ai->ai_addrlen; } idc.idc_to_addr = to_ai->ai_addr; idc.idc_to_addrlen = to_ai->ai_addrlen; log_debugx("connecting to %s using ICL kernel proxy", to_addr); error = ioctl(iscsi_fd, ISCSIDCONNECT, &idc); if (error != 0) { fail(conn, strerror(errno)); log_err(1, "failed to connect to %s " "using ICL kernel proxy: ISCSIDCONNECT", to_addr); } return (conn); } #endif /* ICL_KERNEL_PROXY */ if (conn->conn_conf.isc_iser) { fail(conn, "iSER not supported"); log_errx(1, "iscsid(8) compiled without ICL_KERNEL_PROXY " "does not support iSER"); } conn->conn_socket = socket(to_ai->ai_family, to_ai->ai_socktype, to_ai->ai_protocol); if (conn->conn_socket < 0) { fail(conn, strerror(errno)); log_err(1, "failed to create socket for %s", from_addr); } sockbuf = SOCKBUF_SIZE; if (setsockopt(conn->conn_socket, SOL_SOCKET, SO_RCVBUF, &sockbuf, sizeof(sockbuf)) == -1) log_warn("setsockopt(SO_RCVBUF) failed"); sockbuf = SOCKBUF_SIZE; if (setsockopt(conn->conn_socket, SOL_SOCKET, SO_SNDBUF, &sockbuf, sizeof(sockbuf)) == -1) log_warn("setsockopt(SO_SNDBUF) failed"); if (from_ai != NULL) { error = bind(conn->conn_socket, from_ai->ai_addr, from_ai->ai_addrlen); if (error != 0) { fail(conn, strerror(errno)); log_err(1, "failed to bind to %s", from_addr); } } log_debugx("connecting to %s", to_addr); error = connect(conn->conn_socket, to_ai->ai_addr, to_ai->ai_addrlen); if (error != 0) { fail(conn, strerror(errno)); log_err(1, "failed to connect to %s", to_addr); } return (conn); } static void handoff(struct connection *conn) { struct iscsi_daemon_handoff idh; int error; log_debugx("handing off connection to the kernel"); memset(&idh, 0, sizeof(idh)); idh.idh_session_id = conn->conn_session_id; idh.idh_socket = conn->conn_socket; strlcpy(idh.idh_target_alias, conn->conn_target_alias, sizeof(idh.idh_target_alias)); idh.idh_tsih = conn->conn_tsih; idh.idh_statsn = conn->conn_statsn; idh.idh_header_digest = conn->conn_header_digest; idh.idh_data_digest = conn->conn_data_digest; idh.idh_initial_r2t = conn->conn_initial_r2t; idh.idh_immediate_data = conn->conn_immediate_data; idh.idh_max_data_segment_length = conn->conn_max_data_segment_length; idh.idh_max_burst_length = conn->conn_max_burst_length; idh.idh_first_burst_length = conn->conn_first_burst_length; error = ioctl(conn->conn_iscsi_fd, ISCSIDHANDOFF, &idh); if (error != 
0) log_err(1, "ISCSIDHANDOFF"); } void fail(const struct connection *conn, const char *reason) { struct iscsi_daemon_fail idf; int error; memset(&idf, 0, sizeof(idf)); idf.idf_session_id = conn->conn_session_id; strlcpy(idf.idf_reason, reason, sizeof(idf.idf_reason)); error = ioctl(conn->conn_iscsi_fd, ISCSIDFAIL, &idf); if (error != 0) log_err(1, "ISCSIDFAIL"); } /* * XXX: I CANT INTO LATIN */ static void capsicate(struct connection *conn) { int error; cap_rights_t rights; #ifdef ICL_KERNEL_PROXY const unsigned long cmds[] = { ISCSIDCONNECT, ISCSIDSEND, ISCSIDRECEIVE, ISCSIDHANDOFF, ISCSIDFAIL, ISCSISADD, ISCSISREMOVE, ISCSISMODIFY }; #else const unsigned long cmds[] = { ISCSIDHANDOFF, ISCSIDFAIL, ISCSISADD, ISCSISREMOVE, ISCSISMODIFY }; #endif cap_rights_init(&rights, CAP_IOCTL); error = cap_rights_limit(conn->conn_iscsi_fd, &rights); if (error != 0 && errno != ENOSYS) log_err(1, "cap_rights_limit"); error = cap_ioctls_limit(conn->conn_iscsi_fd, cmds, sizeof(cmds) / sizeof(cmds[0])); if (error != 0 && errno != ENOSYS) log_err(1, "cap_ioctls_limit"); error = cap_enter(); if (error != 0 && errno != ENOSYS) log_err(1, "cap_enter"); if (cap_sandboxed()) log_debugx("Capsicum capability mode enabled"); else log_warnx("Capsicum capability mode not supported"); } bool timed_out(void) { return (sigalrm_received); } static void sigalrm_handler(int dummy __unused) { /* * It would be easiest to just log an error and exit. We can't * do this, though, because log_errx() is not signal safe, since * it calls syslog(3). Instead, set a flag checked by pdu_send() * and pdu_receive(), to call log_errx() there. Should they fail * to notice, we'll exit here one second later. */ if (sigalrm_received) { /* * Oh well. Just give up and quit. */ _exit(2); } sigalrm_received = true; } static void set_timeout(int timeout) { struct sigaction sa; struct itimerval itv; int error; if (timeout <= 0) { log_debugx("session timeout disabled"); return; } bzero(&sa, sizeof(sa)); sa.sa_handler = sigalrm_handler; sigfillset(&sa.sa_mask); error = sigaction(SIGALRM, &sa, NULL); if (error != 0) log_err(1, "sigaction"); /* * First SIGALRM will arrive after conf_timeout seconds. * If we do nothing, another one will arrive a second later. */ bzero(&itv, sizeof(itv)); itv.it_interval.tv_sec = 1; itv.it_value.tv_sec = timeout; log_debugx("setting session timeout to %d seconds", timeout); error = setitimer(ITIMER_REAL, &itv, NULL); if (error != 0) log_err(1, "setitimer"); } static void sigchld_handler(int dummy __unused) { /* * The only purpose of this handler is to make SIGCHLD * interrupt the ISCSIDWAIT ioctl(2), so we can call * wait_for_children().
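 * (This relies on register_sigchld() below installing the handler
 * with sa_flags left at zero, i.e. without SA_RESTART, so the
 * pending ioctl fails with EINTR instead of being restarted; the
 * main loop checks for exactly that errno.)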
*/ } static void register_sigchld(void) { struct sigaction sa; int error; bzero(&sa, sizeof(sa)); sa.sa_handler = sigchld_handler; sigfillset(&sa.sa_mask); error = sigaction(SIGCHLD, &sa, NULL); if (error != 0) log_err(1, "sigaction"); } static void handle_request(int iscsi_fd, const struct iscsi_daemon_request *request, int timeout) { struct connection *conn; log_set_peer_addr(request->idr_conf.isc_target_addr); if (request->idr_conf.isc_target[0] != '\0') { log_set_peer_name(request->idr_conf.isc_target); setproctitle("%s (%s)", request->idr_conf.isc_target_addr, request->idr_conf.isc_target); } else { setproctitle("%s", request->idr_conf.isc_target_addr); } conn = connection_new(iscsi_fd, request); set_timeout(timeout); capsicate(conn); login(conn); if (conn->conn_conf.isc_discovery != 0) discovery(conn); else handoff(conn); log_debugx("nothing more to do; exiting"); exit (0); } static int wait_for_children(bool block) { pid_t pid; int status; int num = 0; for (;;) { /* * If "block" is true, wait for at least one process. */ if (block && num == 0) pid = wait4(-1, &status, 0, NULL); else pid = wait4(-1, &status, WNOHANG, NULL); if (pid <= 0) break; if (WIFSIGNALED(status)) { log_warnx("child process %d terminated with signal %d", pid, WTERMSIG(status)); } else if (WEXITSTATUS(status) != 0) { log_warnx("child process %d terminated with exit status %d", pid, WEXITSTATUS(status)); } else { log_debugx("child process %d terminated gracefully", pid); } num++; } return (num); } int main(int argc, char **argv) { int ch, debug = 0, error, iscsi_fd, maxproc = 30, retval, saved_errno, timeout = 60; bool dont_daemonize = false; struct pidfh *pidfh; pid_t pid, otherpid; const char *pidfile_path = DEFAULT_PIDFILE; struct iscsi_daemon_request request; while ((ch = getopt(argc, argv, "P:dl:m:t:")) != -1) { switch (ch) { case 'P': pidfile_path = optarg; break; case 'd': dont_daemonize = true; debug++; break; case 'l': debug = atoi(optarg); break; case 'm': maxproc = atoi(optarg); break; case 't': timeout = atoi(optarg); break; case '?': default: usage(); } } argc -= optind; if (argc != 0) usage(); log_init(debug); pidfh = pidfile_open(pidfile_path, 0600, &otherpid); if (pidfh == NULL) { if (errno == EEXIST) log_errx(1, "daemon already running, pid: %jd.", (intmax_t)otherpid); log_err(1, "cannot open or create pidfile \"%s\"", pidfile_path); } iscsi_fd = open(ISCSI_PATH, O_RDWR); if (iscsi_fd < 0 && errno == ENOENT) { saved_errno = errno; retval = kldload("iscsi"); if (retval != -1) iscsi_fd = open(ISCSI_PATH, O_RDWR); else errno = saved_errno; } if (iscsi_fd < 0) log_err(1, "failed to open %s", ISCSI_PATH); if (dont_daemonize == false) { if (daemon(0, 0) == -1) { log_warn("cannot daemonize"); pidfile_remove(pidfh); exit(1); } } pidfile_write(pidfh); register_sigchld(); for (;;) { log_debugx("waiting for request from the kernel"); memset(&request, 0, sizeof(request)); error = ioctl(iscsi_fd, ISCSIDWAIT, &request); if (error != 0) { if (errno == EINTR) { nchildren -= wait_for_children(false); assert(nchildren >= 0); continue; } log_err(1, "ISCSIDWAIT"); } if (dont_daemonize) { log_debugx("not forking due to -d flag; " "will exit after servicing a single request"); } else { nchildren -= wait_for_children(false); assert(nchildren >= 0); while (maxproc > 0 && nchildren >= maxproc) { log_debugx("maxproc limit of %d child processes hit; " "waiting for child process to exit", maxproc); nchildren -= wait_for_children(true); assert(nchildren >= 0); } log_debugx("incoming connection; forking child process #%d", 
nchildren); nchildren++; pid = fork(); if (pid < 0) log_err(1, "fork"); if (pid > 0) continue; } pidfile_close(pidfh); handle_request(iscsi_fd, &request, timeout); } return (0); } Index: projects/clang380-import =================================================================== --- projects/clang380-import (revision 294776) +++ projects/clang380-import (revision 294777) Property changes on: projects/clang380-import ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,2 ## Merged /head:r294599-294776 Merged /user/ngie/socket-tests:r294245-294247,294488,294555,294643-294644