diff --git a/en_US.ISO8859-1/books/arch-handbook/book.sgml b/en_US.ISO8859-1/books/arch-handbook/book.sgml index 7882d0146e..42c7c321a2 100644 --- a/en_US.ISO8859-1/books/arch-handbook/book.sgml +++ b/en_US.ISO8859-1/books/arch-handbook/book.sgml @@ -1,309 +1,310 @@ %bookinfo; %man; %chapters; %authors %mailing-lists; ]> FreeBSD Developers' Handbook The FreeBSD Documentation Project August 2000 2000 2001 The FreeBSD Documentation Project &bookinfo.legalnotice; Welcome to the Developers' Handbook. This manual is a work in progress and is the work of many individuals. Many sections do not yet exist and some of those that do exist need to be updated. If you are interested in helping with this project, send email to the &a.doc;. The latest version of this document is always available from the FreeBSD World Wide Web server. It may also be downloaded in a variety of formats and compression options from the FreeBSD FTP server or one of the numerous mirror sites. Basics &chap.introduction; &chap.tools; &chap.secure; &chap.l10n; &chap.policies; Interprocess Communication * Signals Signals, pipes, semaphores, message queues, shared memory, ports, sockets, doors &chap.sockets; &chap.ipv6; Kernel * History of the Unix Kernel Some history of the Unix/BSD kernel, system calls, how do processes work, blocking, scheduling, threads (kernel), context switching, signals, interrupts, modules, etc. &chap.locking; &chap.kobj; + &chap.jail; &chap.sysinit; &chap.vm; &chap.dma; &chap.kerneldebug; * UFS UFS, FFS, Ext2FS, JFS, inodes, buffer cache, labeling, locking, metadata, soft-updates, LFS, portalfs, procfs, vnodes, memory sharing, memory objects, TLBs, caching * AFS AFS, NFS, SANs etc] * Syscons Syscons, tty, PCVT, serial console, screen savers, etc * Compatibility Layers * Linux Linux, SVR4, etc Device Drivers &chap.driverbasics; &chap.isa; &chap.pci; &chap.scsi; &chap.usb; * NewBus This chapter will talk about the FreeBSD NewBus architecture. 
&chap.snd; Architectures &chap.x86; * Alpha Talk about the architectural specifics of FreeBSD/alpha. Explanation of allignment errors, how to fix, how to ignore. Example assembly language code for FreeBSD/alpha. * IA-64 Talk about the architectural specifics of FreeBSD/ia64. Appendices Dave A Patterson John L Hennessy 1998Morgan Kaufmann Publishers, Inc. 1-55860-428-6 Morgan Kaufmann Publishers, Inc. Computer Organization and Design The Hardware / Software Interface 1-2 W. Richard Stevens 1993Addison Wesley Longman, Inc. 0-201-56317-7 Addison Wesley Longman, Inc. Advanced Programming in the Unix Environment 1-2 Marshall Kirk McKusick Keith Bostic Michael J Karels John S Quarterman 1996Addison-Wesley Publishing Company, Inc. 0-201-54979-4 Addison-Wesley Publishing Company, Inc. The Design and Implementation of the 4.4 BSD Operating System 1-2 Aleph One Phrack 49; "Smashing the Stack for Fun and Profit" Chrispin Cowan Calton Pu Dave Maier StackGuard; Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks Todd Miller Theo de Raadt strlcpy and strlcat -- consistent, safe string copy and concatenation. diff --git a/en_US.ISO8859-1/books/arch-handbook/chapters.ent b/en_US.ISO8859-1/books/arch-handbook/chapters.ent index 69ba12524e..da3682e830 100644 --- a/en_US.ISO8859-1/books/arch-handbook/chapters.ent +++ b/en_US.ISO8859-1/books/arch-handbook/chapters.ent @@ -1,68 +1,45 @@ - + + + + + - - - - - - - - - - - - - - - - - - - - - - + - + - + - - - - - - - + diff --git a/en_US.ISO8859-1/books/arch-handbook/jail/chapter.sgml b/en_US.ISO8859-1/books/arch-handbook/jail/chapter.sgml new file mode 100644 index 0000000000..310fc31a81 --- /dev/null +++ b/en_US.ISO8859-1/books/arch-handbook/jail/chapter.sgml @@ -0,0 +1,611 @@ + + + + + Evan Sarmiento +
evms@cs.bu.edu
+
+
+ + 2001 + Evan Sarmiento + +
+ The Jail Subsystem + + On most UNIX systems, root has omnipotent power. This promotes + insecurity. If an attacker were to gain root on a system, he would + have every function at his fingertips. In FreeBSD there are + sysctls which dilute the power of root, in order to minimize the + damage caused by an attacker. Specifically, one of these functions + is called secure levels. Similarly, another function which is + present from FreeBSD 4.0 and onward, is a utility called + &man.jail.8;. + jail chroots an + environment and sets certain restrictions on processes which are + forked from within. For example, a jailed process cannot affect + processes outside of the jail, utilize certain system calls, or + inflict any damage on the main computer. + Jail is becoming the new security + model. People are running potentially vulnerable servers such as + Apache, BIND, and sendmail within jails, so that if an attacker + gains root within the Jail, it is only + an annoyance, and not a devastation. This article focuses on the + internals (source code) of Jail and + Jail NG. It will also suggest + improvements upon the jail code base which are already being + worked on. If you are looking for a how-to on setting up a + Jail, I suggest you look at my other + article in Sys Admin Magazine, May 2001, entitled "Securing + FreeBSD using Jail." + + + Architecture + + + Jail consists of two realms: the + user-space program, jail, and the code implemented within the + kernel: the jail() system call and associated + restrictions. I will be discussing the user-space program and + then how jail is implemented within the kernel. + + + Userland code + + The source for the user-land jail is located in + /usr/src/usr.sbin/jail , consisting of + one file, jail.c. The program takes these + arguments: the path of the jail, hostname, ip address, and the + command to be executed. 
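As a minimal sketch of how those four arguments could be collected, the fragment below fills a small structure from argv. The names struct jail_request and parse_jail_args() are hypothetical and used only for illustration; this is not the contents of jail.c. It assumes the address is parsed with inet_aton(3) and converted from network to host byte order with ntohl(3).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the request built by the userland program. */
struct jail_request {
	uint32_t	 version;
	const char	*path;		/* argv[1]: root of the jail */
	const char	*hostname;	/* argv[2]: hostname inside the jail */
	uint32_t	 ip_number;	/* argv[3]: address, host byte order */
	char *const	*command;	/* argv[4]...: command and its arguments */
};

/* Returns 0 on success, -1 on too few arguments or an unparseable address. */
int
parse_jail_args(int argc, char *const argv[], struct jail_request *out)
{
	struct in_addr in;

	if (out == NULL || argc < 5)
		return (-1);
	/* inet_aton() stores the address in network byte order. */
	if (inet_aton(argv[3], &in) == 0)
		return (-1);
	out->version = 0;
	out->path = argv[1];
	out->hostname = argv[2];
	out->ip_number = ntohl(in.s_addr);	/* network -> host order */
	out->command = &argv[4];
	return (0);
}
```

Keeping the parsing in one function makes the byte-order conversion easy to check in isolation before anything is handed to the kernel.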
+ + + Data Structures + + In jail.c, the first thing I would + note is the declaration of an important structure + struct jail j; which was included from + /usr/include/sys/jail.h. + + The definition of the jail structure is: + +/usr/include/sys/jail.h: + +struct jail { +u_int32_t version; +char *path; +char *hostname; +u_int32_t ip_number; +}; + + As you can see, there is an entry for each of the + arguments passed to the jail program, and indeed, they are + set during its execution. + + /usr/src/usr.sbin/jail/jail.c: +j.version = 0; +j.path = argv[1]; +j.hostname = argv[2]; + + + + + Networking + + One of the arguments passed to the Jail program is an IP + address with which the jail can be accessed over the + network. Jail translates the IP address given into host + byte order and then stores it in j (the jail structure). + + /usr/src/usr.sbin/jail/jail.c: +struct in_addr in; +... +i = inet_aton(argv[3], &in); +... +j.ip_number = ntohl(in.s_addr); + + The + inet_aton(3) + function "interprets the specified character string as an + Internet address, placing the address into the structure + provided." The ip_number field in the jail structure is set + only after the IP address placed into the in structure by + inet_aton is translated into host byte order by + ntohl(). + + + + + Jailing The Process + + Finally, the userland program jails the process, and + executes the command specified. Jail now becomes an + imprisoned process itself and then executes the command + given using &man.execv.3;. + + /usr/src/usr.sbin/jail/jail.c: +i = jail(&j); +... +i = execv(argv[4], argv + 4); + + As you can see, the jail function is being called, and + its argument is the jail structure which has been filled + with the arguments given to the program. Finally, the + program you specify is executed. I will now discuss how Jail + is implemented within the kernel. + + + + + Kernel Space + + We will now be looking at the file + /usr/src/sys/kern/kern_jail.c.
This is + the file where the jail system call, appropriate sysctls, and + networking functions are defined. + + + sysctls + + In kern_jail.c, the following + sysctls are defined: + + /usr/src/sys/kern/kern_jail.c: + +int jail_set_hostname_allowed = 1; +SYSCTL_INT(_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW, + &jail_set_hostname_allowed, 0, + "Processes in jail can set their hostnames"); + +int jail_socket_unixiproute_only = 1; +SYSCTL_INT(_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW, + &jail_socket_unixiproute_only, 0, + "Processes in jail are limited to creating UNIX/IPv4/route sockets only"); + +int jail_sysvipc_allowed = 0; +SYSCTL_INT(_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW, + &jail_sysvipc_allowed, 0, + "Processes in jail can use System V IPC primitives"); + + Each of these sysctls can be accessed by the user + through the sysctl program. Throughout the kernel, these + specific sysctls are recognized by their name. For example, + the name of the first sysctl is + jail.set_hostname_allowed. + + + + &man.jail.2; system call + + Like all system calls, the &man.jail.2; system call takes + two arguments, struct proc *p and + struct jail_args + *uap. p is a pointer to a proc + structure which describes the calling process. In this + context, uap is a pointer to a structure which specifies the + arguments given to &man.jail.2; from the userland program + jail.c. When I described the userland + program before, you saw that the &man.jail.2; system call was + given a jail structure as its own argument. + + /usr/src/sys/kern/kern_jail.c: +int +jail(p, uap) + struct proc *p; + struct jail_args /* { + syscallarg(struct jail *) jail; + } */ *uap; + + Therefore, uap->jail would access the + jail structure which was passed to the system call. Next, + the system call copies the jail structure into kernel space + using the copyin() + function.
copyin() takes three arguments: + the address of the data which is to be copied into kernel space, + uap->jail; where to store it, + j; and the size of the storage. The jail + structure uap->jail is copied into kernel + space and stored in another jail structure, + j. + + /usr/src/sys/kern/kern_jail.c: +error = copyin(uap->jail, &j, sizeof j); + + There is another important structure defined in + jail.h. It is the prison structure + (pr). The prison structure is used + exclusively within kernel space. The &man.jail.2; system call + copies everything from the jail structure onto the prison + structure. Here is the definition of the prison structure. + + /usr/include/sys/jail.h: +struct prison { + int pr_ref; + char pr_host[MAXHOSTNAMELEN]; + u_int32_t pr_ip; + void *pr_linux; +}; + + The jail() system call then allocates memory for a + prison structure and copies data between the two + structures. + + /usr/src/sys/kern/kern_jail.c: + MALLOC(pr, struct prison *, sizeof *pr , M_PRISON, M_WAITOK); + bzero((caddr_t)pr, sizeof *pr); + error = copyinstr(j.hostname, &pr->pr_host, sizeof pr->pr_host, 0); + if (error) + goto bail; + + Finally, the jail system call chroots the path + specified. The chroot function is given two arguments. The + first is p, which represents the calling process, the second + is a pointer to the structure chroot_args. The structure + chroot_args contains the path which is to be chrooted. As + you can see, the path specified in the jail structure is + copied to the chroot_args structure and used. + + /usr/src/sys/kern/kern_jail.c: +ca.path = j.path; +error = chroot(p, &ca); + + The next lines in the source are very important, + as they specify how the kernel recognizes a process as + jailed. Each process on a Unix system is described by its + own proc structure. You can see the whole proc structure in + /usr/include/sys/proc.h. For example, + the p argument in any system call is actually a pointer to + that process' proc structure, as stated before.
The proc + structure contains fields which describe the owner's + identity (p_cred), the process resource + limits (p_limit), and so on. In the + definition of the process structure, there is a pointer to a + prison structure (p_prison). + + /usr/include/sys/proc.h: +struct proc { +... +struct prison *p_prison; +... +}; + + In kern_jail.c, the function then + assigns the pr structure, which is filled with all the + information from the original jail structure, to the + p->p_prison pointer. It then does a + bitwise OR of p->p_flag with the constant + P_JAILED, meaning that the calling + process is now recognized as jailed. The parent process of + each process, forked within the jail, is the program jail + itself, as it calls the &man.jail.2; system call. When the + program is executed through execv, it inherits the + properties of its parent's proc structure, therefore it has + the P_JAILED bit set in p->p_flag, and the + p->p_prison pointer is filled. + + /usr/src/sys/kern/kern_jail.c: +p->p_prison = pr; +p->p_flag |= P_JAILED; + + When a process is forked from a parent process, the + &man.fork.2; system call deals differently with imprisoned + processes. In the fork system call, there are two pointers + to a proc structure p1 + and p2. p1 points to + the parent's proc structure and p2 points + to the child's unfilled proc + structure. After copying all relevant data between the + structures, &man.fork.2; checks if the pointer + p_prison is set on + p2. If it is, it increments + pr_ref by one, and sets the + P_JAILED bit in p_flag on the child process. + + /usr/src/sys/kern/kern_fork.c: +if (p2->p_prison) { + p2->p_prison->pr_ref++; + p2->p_flag |= P_JAILED; +} + + + + + + + Restrictions + + Throughout the kernel there are access restrictions relating + to jailed processes. Usually, these restrictions only check if + the process is jailed, and if so, return an error.
For + example: + + if (p->p_prison) + return EPERM; + + + SysV IPC + + System V IPC is based on messages. Processes can send each + other these messages which tell them how to act. The functions + which deal with messages are: msgsys, + msgctl, msgget, + msgsnd and msgrcv. + Earlier, I mentioned that there were certain sysctls you could + turn on or off in order to affect the behavior of Jail. One of + these sysctls was jail_sysvipc_allowed. On + most systems, this sysctl is set to 0. If it were set to 1, it + would defeat the whole purpose of having a jail; privileged + users from within the jail would be able to affect processes + outside of the environment. The difference between a message + and a signal is that a signal consists only of the signal + number, while a message also carries data. + + /usr/src/sys/kern/sysv_msg.c: + + + &man.msgget.3;: msgget returns (and possibly + creates) a message descriptor that designates a message queue + for use in other system calls. + + &man.msgctl.3;: Using this function, a process + can query the status of a message + descriptor. + + &man.msgsnd.3;: msgsnd sends a message to a + process. + + &man.msgrcv.3;: a process receives messages using + this function. + + + + In each of these system calls, there is this + conditional: + + /usr/src/sys/kern/sysv_msg.c: +if (!jail_sysvipc_allowed && p->p_prison != NULL) + return (ENOSYS); + + Semaphore system calls allow processes to synchronize + execution by doing a set of operations atomically on a set of + semaphores. Basically semaphores provide another way for + processes to lock resources. However, a process waiting on a + semaphore that is in use will sleep until the resources + are relinquished. The following semaphore system calls are + blocked inside a jail: semsys, + semget, semctl and + semop. + + /usr/src/sys/kern/sysv_sem.c: + + + + &man.semctl.2;(id, num, cmd, arg): + Semctl does the specified cmd on the semaphore queue + indicated by id.
+ + + &man.semget.2;(key, nsems, flag): + Semget creates an array of semaphores, corresponding to + key. + + Key and flag take on the same meaning as they + do in msgget. + + &man.semop.2;(id, ops, num): + Semop does the set of semaphore operations in the array of + structures ops, to the set of semaphores identified by + id. + + + System V IPC allows processes to share + memory. Processes can communicate directly with each other by + sharing parts of their virtual address space and then reading + and writing data stored in the shared memory. These system + calls are blocked within a jailed environment: shmdt, + shmat, oshmctl, shmctl, shmget, and + shmsys. + + /usr/src/sys/kern/sysv_shm.c: + + + &man.shmctl.2;(id, cmd, buf): + shmctl does various control operations on the shared memory + region identified by id. + + &man.shmget.2;(key, size, + flag): shmget accesses or creates a shared memory + region of size bytes. + + &man.shmat.2;(id, addr, flag): + shmat attaches a shared memory region identified by id to the + address space of a process. + + &man.shmdt.2;(addr): shmdt + detaches the shared memory region previously attached at + addr. + + + + + + Sockets + + Jail treats the &man.socket.2; system call and related + lower-level socket functions in a special manner. In order to + determine whether a certain socket is allowed to be created, + it first checks to see if the sysctl + jail.socket_unixiproute_only is set. If + set, sockets are only allowed to be created if the family + specified is either PF_LOCAL, + PF_INET or + PF_ROUTE. Otherwise, it returns an + error. + + /usr/src/sys/kern/uipc_socket.c: +int socreate(dom, aso, type, proto, p) +... +register struct protosw *prp; +... +{ + if (p->p_prison && jail_socket_unixiproute_only && + prp->pr_domain->dom_family != PF_LOCAL && prp->pr_domain->dom_family != PF_INET + && prp->pr_domain->dom_family != PF_ROUTE) + return (EPROTONOSUPPORT); +...
+} + + + + + Berkeley Packet Filter + + The Berkeley Packet Filter provides a raw interface to + data link layers in a protocol independent fashion. The + function bpfopen() opens an Ethernet + device. There is a conditional which disallows any jailed + processes from accessing this function. + + /usr/src/sys/net/bpf.c: +static int bpfopen(dev, flags, fmt, p) +... +{ + if (p->p_prison) + return (EPERM); +... +} + + + + + Protocols + + There are certain protocols which are very common, such as + TCP, UDP, IP and ICMP. IP and ICMP are on the same level: the + network layer 2. Certain precautions are + taken in order to prevent a jailed process from binding a + protocol to a certain address; they apply only if the nam + parameter is set. nam is a pointer to a sockaddr structure, + which describes the address on which to bind the service. A + more exact definition is that sockaddr "may be used as a + template for referring to the identifying tag and length of + each address"[2]. In the function + in_pcbbind, sin is a + pointer to a sockaddr_in structure, which contains the port, + address, length and domain family of the socket which is to be + bound. Basically, this prevents any process in a jail from + specifying an address that does not belong to that jail. + + /usr/src/sys/netinet/in_pcb.c: +int in_pcbbind(inp, nam, p) +... + struct sockaddr *nam; + struct proc *p; +{ + ... + struct sockaddr_in *sin; + ... + if (nam) { + sin = (struct sockaddr_in *)nam; + ... + if (sin->sin_addr.s_addr != INADDR_ANY) + if (prison_ip(p, 0, &sin->sin_addr.s_addr)) + return (EINVAL); + .... + } +... +} + + You might be wondering what function + prison_ip() does. prison_ip is given three + arguments, the current process (represented by + p), any flags, and an IP address. It + returns 1 if the IP address does not belong to the jail of + the calling process or 0 otherwise. As you can see from the + code, if the IP address does not belong to the jail, the + protocol is not allowed to bind to that address.
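The return convention of prison_ip() can be tried outside the kernel with a small userland model. This is only a sketch written against the kernel listing that follows: struct prison_model and model_prison_ip() are hypothetical names, a NULL pointer stands in for an unjailed process, and the jail address is assumed to be stored in host byte order as in the prison structure.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userland stand-in for the kernel's struct prison. */
struct prison_model {
	uint32_t pr_ip;		/* jail address, host byte order */
};

/*
 * Model of the check: returns 1 when *ip is not the jail's address and
 * 0 otherwise.  flag != 0 means *ip is already in host byte order.
 * INADDR_ANY is rewritten to the jail's own address.
 */
int
model_prison_ip(const struct prison_model *pr, int flag, uint32_t *ip)
{
	uint32_t tmp;

	if (pr == NULL)			/* process is not jailed */
		return (0);
	tmp = flag ? *ip : ntohl(*ip);
	if (tmp == INADDR_ANY) {
		*ip = flag ? pr->pr_ip : htonl(pr->pr_ip);
		return (0);
	}
	return (pr->pr_ip != tmp);
}
```

With a jail address of 192.168.1.10, a request for that address or for INADDR_ANY passes (the latter being rewritten to the jail address), while a request for any other address is refused.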
+ + /usr/src/sys/kern/kern_jail.c: +int prison_ip(struct proc *p, int flag, u_int32_t *ip) { + u_int32_t tmp; + + if (!p->p_prison) + return (0); + if (flag) + tmp = *ip; + else tmp = ntohl (*ip); + + if (tmp == INADDR_ANY) { + if (flag) + *ip = p->p_prison->pr_ip; + else *ip = htonl(p->p_prison->pr_ip); + return (0); + } + + if (p->p_prison->pr_ip != tmp) + return (1); + return (0); +} + + Jailed users are not allowed to bind services to an IP + which does not belong to the jail. The restriction is also + written within the function in_pcbbind: + + /usr/src/sys/netinet/in_pcb.c: + if (nam) { + ... + lport = sin->sin_port; + ... if (lport) { + ... + if (p && p->p_prison) + prison = 1; + if (prison && + prison_ip(p, 0, &sin->sin_addr.s_addr)) + return (EADDRNOTAVAIL); + + + + + Filesystem + + Even root users within the jail are not allowed to set any + file flags, such as immutable, append, and no unlink flags, if + the securelevel is greater than 0. + + /usr/src/sys/ufs/ufs/ufs_vnops.c: +int ufs_setattr(ap) + ... +{ + if ((cred->cr_uid == 0) && (p->p_prison == NULL)) { + if ((ip->i_flags + & (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) && + securelevel > 0) + return (EPERM); +} + + + + + + + Jail NG + + Jail NG is a "from-scratch re-implementation of Jail" by + Robert Watson, a FreeBSD committer. Some of the new features + include the ability to add processes to a jail, an improved + management tool, and per-jail sysctls. For example, one jail + could be denied System V IPC through its own + sysvipc_permitted sysctl while + another jail is allowed to use System V IPC. You can + download the kernel patches and utilities for Jail NG from his + website. + + + +
+ diff --git a/en_US.ISO8859-1/books/developers-handbook/book.sgml b/en_US.ISO8859-1/books/developers-handbook/book.sgml index 7882d0146e..42c7c321a2 100644 --- a/en_US.ISO8859-1/books/developers-handbook/book.sgml +++ b/en_US.ISO8859-1/books/developers-handbook/book.sgml @@ -1,309 +1,310 @@ %bookinfo; %man; %chapters; %authors %mailing-lists; ]> FreeBSD Developers' Handbook The FreeBSD Documentation Project August 2000 2000 2001 The FreeBSD Documentation Project &bookinfo.legalnotice; Welcome to the Developers' Handbook. This manual is a work in progress and is the work of many individuals. Many sections do not yet exist and some of those that do exist need to be updated. If you are interested in helping with this project, send email to the &a.doc;. The latest version of this document is always available from the FreeBSD World Wide Web server. It may also be downloaded in a variety of formats and compression options from the FreeBSD FTP server or one of the numerous mirror sites. Basics &chap.introduction; &chap.tools; &chap.secure; &chap.l10n; &chap.policies; Interprocess Communication * Signals Signals, pipes, semaphores, message queues, shared memory, ports, sockets, doors &chap.sockets; &chap.ipv6; Kernel * History of the Unix Kernel Some history of the Unix/BSD kernel, system calls, how do processes work, blocking, scheduling, threads (kernel), context switching, signals, interrupts, modules, etc. &chap.locking; &chap.kobj; + &chap.jail; &chap.sysinit; &chap.vm; &chap.dma; &chap.kerneldebug; * UFS UFS, FFS, Ext2FS, JFS, inodes, buffer cache, labeling, locking, metadata, soft-updates, LFS, portalfs, procfs, vnodes, memory sharing, memory objects, TLBs, caching * AFS AFS, NFS, SANs etc] * Syscons Syscons, tty, PCVT, serial console, screen savers, etc * Compatibility Layers * Linux Linux, SVR4, etc Device Drivers &chap.driverbasics; &chap.isa; &chap.pci; &chap.scsi; &chap.usb; * NewBus This chapter will talk about the FreeBSD NewBus architecture. 
&chap.snd; Architectures &chap.x86; * Alpha Talk about the architectural specifics of FreeBSD/alpha. Explanation of allignment errors, how to fix, how to ignore. Example assembly language code for FreeBSD/alpha. * IA-64 Talk about the architectural specifics of FreeBSD/ia64. Appendices Dave A Patterson John L Hennessy 1998Morgan Kaufmann Publishers, Inc. 1-55860-428-6 Morgan Kaufmann Publishers, Inc. Computer Organization and Design The Hardware / Software Interface 1-2 W. Richard Stevens 1993Addison Wesley Longman, Inc. 0-201-56317-7 Addison Wesley Longman, Inc. Advanced Programming in the Unix Environment 1-2 Marshall Kirk McKusick Keith Bostic Michael J Karels John S Quarterman 1996Addison-Wesley Publishing Company, Inc. 0-201-54979-4 Addison-Wesley Publishing Company, Inc. The Design and Implementation of the 4.4 BSD Operating System 1-2 Aleph One Phrack 49; "Smashing the Stack for Fun and Profit" Chrispin Cowan Calton Pu Dave Maier StackGuard; Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks Todd Miller Theo de Raadt strlcpy and strlcat -- consistent, safe string copy and concatenation. diff --git a/en_US.ISO8859-1/books/developers-handbook/chapters.ent b/en_US.ISO8859-1/books/developers-handbook/chapters.ent index 69ba12524e..da3682e830 100644 --- a/en_US.ISO8859-1/books/developers-handbook/chapters.ent +++ b/en_US.ISO8859-1/books/developers-handbook/chapters.ent @@ -1,68 +1,45 @@ - + + + + + - - - - - - - - - - - - - - - - - - - - - - + - + - + - - - - - - - + diff --git a/en_US.ISO8859-1/books/developers-handbook/jail/chapter.sgml b/en_US.ISO8859-1/books/developers-handbook/jail/chapter.sgml new file mode 100644 index 0000000000..310fc31a81 --- /dev/null +++ b/en_US.ISO8859-1/books/developers-handbook/jail/chapter.sgml @@ -0,0 +1,611 @@ + + + + + Evan Sarmiento +
evms@cs.bu.edu
+
+
+ + 2001 + Evan Sarmiento + +
+ The Jail Subsystem + + On most UNIX systems, root has omnipotent power. This promotes + insecurity. If an attacker were to gain root on a system, he would + have every function at his fingertips. In FreeBSD there are + sysctls which dilute the power of root, in order to minimize the + damage caused by an attacker. Specifically, one of these functions + is called secure levels. Similarly, another function which is + present from FreeBSD 4.0 and onward, is a utility called + &man.jail.8;. + jail chroots an + environment and sets certain restrictions on processes which are + forked from within. For example, a jailed process cannot affect + processes outside of the jail, utilize certain system calls, or + inflict any damage on the main computer. + Jail is becoming the new security + model. People are running potentially vulnerable servers such as + Apache, BIND, and sendmail within jails, so that if an attacker + gains root within the Jail, it is only + an annoyance, and not a devastation. This article focuses on the + internals (source code) of Jail and + Jail NG. It will also suggest + improvements upon the jail code base which are already being + worked on. If you are looking for a how-to on setting up a + Jail, I suggest you look at my other + article in Sys Admin Magazine, May 2001, entitled "Securing + FreeBSD using Jail." + + + Architecture + + + Jail consists of two realms: the + user-space program, jail, and the code implemented within the + kernel: the jail() system call and associated + restrictions. I will be discussing the user-space program and + then how jail is implemented within the kernel. + + + Userland code + + The source for the user-land jail is located in + /usr/src/usr.sbin/jail , consisting of + one file, jail.c. The program takes these + arguments: the path of the jail, hostname, ip address, and the + command to be executed. 
+ + + Data Structures + + In jail.c, the first thing I would + note is the declaration of an important structure + struct jail j; which was included from + /usr/include/sys/jail.h. + + The definition of the jail structure is: + +/usr/include/sys/jail.h: + +struct jail { +u_int32_t version; +char *path; +char *hostname; +u_int32_t ip_number; +}; + + As you can see, there is an entry for each of the + arguments passed to the jail program, and indeed, they are + set during its execution. + + /usr/src/usr.sbin/jail/jail.c: +j.version = 0; +j.path = argv[1]; +j.hostname = argv[2]; + + + + + Networking + + One of the arguments passed to the Jail program is an IP + address with which the jail can be accessed over the + network. Jail translates the IP address given into host + byte order and then stores it in j (the jail structure). + + /usr/src/usr.sbin/jail/jail.c: +struct in_addr in; +... +i = inet_aton(argv[3], &in); +... +j.ip_number = ntohl(in.s_addr); + + The + inet_aton(3) + function "interprets the specified character string as an + Internet address, placing the address into the structure + provided." The ip_number field in the jail structure is set + only after the IP address placed into the in structure by + inet_aton is translated into host byte order by + ntohl(). + + + + + Jailing The Process + + Finally, the userland program jails the process, and + executes the command specified. Jail now becomes an + imprisoned process itself and then executes the command + given using &man.execv.3;. + + /usr/src/usr.sbin/jail/jail.c: +i = jail(&j); +... +i = execv(argv[4], argv + 4); + + As you can see, the jail function is being called, and + its argument is the jail structure which has been filled + with the arguments given to the program. Finally, the + program you specify is executed. I will now discuss how Jail + is implemented within the kernel. + + + + + Kernel Space + + We will now be looking at the file + /usr/src/sys/kern/kern_jail.c.
This is + the file where the jail system call, appropriate sysctls, and + networking functions are defined. + + + sysctls + + In kern_jail.c, the following + sysctls are defined: + + /usr/src/sys/kern/kern_jail.c: + +int jail_set_hostname_allowed = 1; +SYSCTL_INT(_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW, + &jail_set_hostname_allowed, 0, + "Processes in jail can set their hostnames"); + +int jail_socket_unixiproute_only = 1; +SYSCTL_INT(_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW, + &jail_socket_unixiproute_only, 0, + "Processes in jail are limited to creating UNIX/IPv4/route sockets only"); + +int jail_sysvipc_allowed = 0; +SYSCTL_INT(_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW, + &jail_sysvipc_allowed, 0, + "Processes in jail can use System V IPC primitives"); + + Each of these sysctls can be accessed by the user + through the sysctl program. Throughout the kernel, these + specific sysctls are recognized by their name. For example, + the name of the first sysctl is + jail.set_hostname_allowed. + + + + &man.jail.2; system call + + Like all system calls, the &man.jail.2; system call takes + two arguments, struct proc *p and + struct jail_args + *uap. p is a pointer to a proc + structure which describes the calling process. In this + context, uap is a pointer to a structure which specifies the + arguments given to &man.jail.2; from the userland program + jail.c. When I described the userland + program before, you saw that the &man.jail.2; system call was + given a jail structure as its own argument. + + /usr/src/sys/kern/kern_jail.c: +int +jail(p, uap) + struct proc *p; + struct jail_args /* { + syscallarg(struct jail *) jail; + } */ *uap; + + Therefore, uap->jail would access the + jail structure which was passed to the system call. Next, + the system call copies the jail structure into kernel space + using the copyin() + function.
copyin() takes three arguments: + the address of the data which is to be copied into kernel space, + uap->jail; where to store it, + j; and the size of the storage. The jail + structure uap->jail is copied into kernel + space and stored in another jail structure, + j. + + /usr/src/sys/kern/kern_jail.c: +error = copyin(uap->jail, &j, sizeof j); + + There is another important structure defined in + jail.h. It is the prison structure + (pr). The prison structure is used + exclusively within kernel space. The &man.jail.2; system call + copies everything from the jail structure onto the prison + structure. Here is the definition of the prison structure. + + /usr/include/sys/jail.h: +struct prison { + int pr_ref; + char pr_host[MAXHOSTNAMELEN]; + u_int32_t pr_ip; + void *pr_linux; +}; + + The jail() system call then allocates memory for a + prison structure and copies data between the two + structures. + + /usr/src/sys/kern/kern_jail.c: + MALLOC(pr, struct prison *, sizeof *pr , M_PRISON, M_WAITOK); + bzero((caddr_t)pr, sizeof *pr); + error = copyinstr(j.hostname, &pr->pr_host, sizeof pr->pr_host, 0); + if (error) + goto bail; + + Finally, the jail system call chroots the path + specified. The chroot function is given two arguments. The + first is p, which represents the calling process, the second + is a pointer to the structure chroot_args. The structure + chroot_args contains the path which is to be chrooted. As + you can see, the path specified in the jail structure is + copied to the chroot_args structure and used. + + /usr/src/sys/kern/kern_jail.c: +ca.path = j.path; +error = chroot(p, &ca); + + The next lines in the source are very important, + as they specify how the kernel recognizes a process as + jailed. Each process on a Unix system is described by its + own proc structure. You can see the whole proc structure in + /usr/include/sys/proc.h. For example, + the p argument in any system call is actually a pointer to + that process' proc structure, as stated before.
The proc
+ structure contains fields which describe the owner's
+ identity (p_cred), the process resource
+ limits (p_limit), and so on. The
+ definition of the process structure also contains a pointer
+ to a prison structure (p_prison).
+
+/usr/include/sys/proc.h:
+struct proc {
+...
+struct prison *p_prison;
+...
+};
+
+ In kern_jail.c, the function then
+ copies the pr structure, which is filled with all the
+ information from the original jail structure, over to the
+ p->p_prison structure. It then does a
+ bitwise OR of p->p_flag with the constant
+ P_JAILED, meaning that the calling
+ process is now recognized as jailed. Every process forked
+ within the jail descends from the jail program itself,
+ since that program makes the &man.jail.2; system call. When
+ a program is executed through execve, it inherits the
+ properties of its parent's proc structure; it therefore has
+ P_JAILED set in p->p_flag, and the
+ p->p_prison structure is filled.
+
+/usr/src/sys/kern/kern_jail.c:
+p->p_prison = pr;
+p->p_flag |= P_JAILED;
+
+ When a process is forked from a parent process, the
+ &man.fork.2; system call treats imprisoned processes
+ differently. In the fork system call, there are two pointers
+ to a proc structure, p1
+ and p2. p1 points to
+ the parent's proc structure and p2 points
+ to the child's unfilled proc
+ structure. After copying all relevant data between the
+ structures, &man.fork.2; checks whether
+ p2->p_prison is set. If it is, it increments
+ pr_ref by one and sets the
+ P_JAILED bit in the child's p_flag.
+
+/usr/src/sys/kern/kern_fork.c:
+if (p2->p_prison) {
+        p2->p_prison->pr_ref++;
+        p2->p_flag |= P_JAILED;
+}
+
+ Restrictions
+
+ Throughout the kernel there are access restrictions relating
+ to jailed processes. Usually, these restrictions only check
+ whether the process is jailed and, if so, return an error.
For
+ example:
+
+if (p->p_prison)
+        return EPERM;
+
+ SysV IPC
+
+ System V IPC is based on messages. Processes can send each
+ other these messages, which tell them how to act. The functions
+ which deal with messages are: msgsys,
+ msgctl, msgget,
+ msgsnd and msgrcv.
+ Earlier, I mentioned that there were certain sysctls you could
+ turn on or off in order to affect the behavior of Jail. One of
+ these sysctls was jail_sysvipc_allowed. On
+ most systems, this sysctl is set to 0. If it were set to 1, it
+ would defeat the whole purpose of having a jail; privileged
+ users from within the jail would be able to affect processes
+ outside of the environment. The difference between a message
+ and a signal is that a signal consists only of the signal
+ number, whereas a message can carry data.
+
+ /usr/src/sys/kern/sysv_msg.c:
+
+ &man.msgget.3;: msgget returns (and possibly
+ creates) a message descriptor that designates a message queue
+ for use in other system calls.
+
+ &man.msgctl.3;: Using this function, a process
+ can query the status of a message
+ descriptor.
+
+ &man.msgsnd.3;: msgsnd sends a message to a
+ process.
+
+ &man.msgrcv.3;: a process receives messages using
+ this function.
+
+ In each of these system calls, there is this
+ conditional:
+
+/usr/src/sys/kern/sysv_msg.c:
+if (!jail_sysvipc_allowed && p->p_prison != NULL)
+        return (ENOSYS);
+
+ Semaphore system calls allow processes to synchronize
+ execution by doing a set of operations atomically on a set of
+ semaphores. Basically, semaphores provide another way for
+ processes to lock resources. However, a process waiting on a
+ semaphore that is in use will sleep until the resources
+ are relinquished. The following semaphore system calls are
+ blocked inside a jail: semsys,
+ semget, semctl and
+ semop.
+
+ /usr/src/sys/kern/sysv_sem.c:
+
+ &man.semctl.2;(id, num, cmd, arg):
+ Semctl does the specified cmd on the semaphore queue
+ indicated by id.
+
+ &man.semget.2;(key, nsems, flag):
+ Semget creates an array of semaphores, corresponding to
+ key.
+
+ Key and flag take on the same meaning as they
+ do in msgget.
+
+ &man.semop.2;(id, ops, num):
+ Semop does the set of semaphore operations in the array of
+ structures ops, to the set of semaphores identified by
+ id.
+
+ System V IPC allows processes to share
+ memory. Processes can communicate directly with each other by
+ sharing parts of their virtual address space and then reading
+ and writing data stored in the shared memory. These system
+ calls are blocked within a jailed environment: shmdt,
+ shmat, oshmctl, shmctl, shmget, and
+ shmsys.
+
+ /usr/src/sys/kern/sysv_shm.c:
+
+ &man.shmctl.2;(id, cmd, buf):
+ shmctl does various control operations on the shared memory
+ region identified by id.
+
+ &man.shmget.2;(key, size,
+ flag): shmget accesses or creates a shared memory
+ region of size bytes.
+
+ &man.shmat.2;(id, addr, flag):
+ shmat attaches a shared memory region identified by id to the
+ address space of a process.
+
+ &man.shmdt.2;(addr): shmdt
+ detaches the shared memory region previously attached at
+ addr.
+
+ Sockets
+
+ Jail treats the &man.socket.2; system call and related
+ lower-level socket functions in a special manner. In order to
+ determine whether a certain socket is allowed to be created,
+ it first checks whether the sysctl
+ jail_socket_unixiproute_only is set. If
+ set, sockets are only allowed to be created if the family
+ specified is either PF_LOCAL,
+ PF_INET or
+ PF_ROUTE. Otherwise, it returns an
+ error.
+
+/usr/src/sys/kern/uipc_socket.c:
+int socreate(dom, aso, type, proto, p)
+...
+register struct protosw *prp;
+...
+{
+        if (p->p_prison && jail_socket_unixiproute_only &&
+            prp->pr_domain->dom_family != PF_LOCAL &&
+            prp->pr_domain->dom_family != PF_INET &&
+            prp->pr_domain->dom_family != PF_ROUTE)
+                return (EPROTONOSUPPORT);
+...
+}
+
+ Berkeley Packet Filter
+
+ The Berkeley Packet Filter provides a raw interface to
+ data link layers in a protocol independent fashion. The
+ function bpfopen() opens an Ethernet
+ device. A conditional in it prevents any jailed
+ process from using this function.
+
+/usr/src/sys/net/bpf.c:
+static int bpfopen(dev, flags, fmt, p)
+...
+{
+        if (p->p_prison)
+                return (EPERM);
+...
+}
+
+ Protocols
+
+ There are certain protocols which are very common, such as
+ TCP, UDP, IP and ICMP. IP and ICMP are on the same level: the
+ network layer 2. Precautions are taken to prevent a jailed
+ process from binding a protocol to a certain port; the checks
+ apply only if the nam
+ parameter is set. nam is a pointer to a sockaddr structure,
+ which describes the address on which to bind the service. A
+ more exact definition is that sockaddr "may be used as a
+ template for referring to the identifying tag and length of
+ each address"[2]. In the function
+ in_pcbbind, sin is a
+ pointer to a sockaddr_in structure, which contains the port,
+ address, length and domain family of the socket which is to be
+ bound. Basically, this prevents any jailed process from
+ specifying the domain family.
+
+/usr/src/sys/netinet/in_pcb.c:
+int in_pcbbind(inp, nam, p)
+...
+        struct sockaddr *nam;
+        struct proc *p;
+{
+        ...
+        struct sockaddr_in *sin;
+        ...
+        if (nam) {
+                sin = (struct sockaddr_in *)nam;
+                ...
+                if (sin->sin_addr.s_addr != INADDR_ANY)
+                        if (prison_ip(p, 0, &sin->sin_addr.s_addr))
+                                return (EINVAL);
+                ....
+        }
+...
+}
+
+ You might be wondering what the function
+ prison_ip() does. prison_ip is given three
+ arguments: the current process (represented by
+ p), any flags, and a pointer to an IP address.
+ For an unjailed process it always returns 0. For a jailed
+ process it returns 1 if the IP address differs from the address
+ assigned to the jail, and 0 if it matches. As you can see from
+ the code, when prison_ip returns 1 the protocol is not allowed
+ to bind to that address.
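The return convention is easy to misread, so here is a minimal userland sketch of the same decision logic; it is a model for experimentation, not the kernel code. The names `struct model_prison` and `model_prison_ip()` are hypothetical and exist only for this illustration. The sketch assumes that a NULL prison pointer stands in for an unjailed process and that the jail address is stored in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */
#include <netinet/in.h>  /* INADDR_ANY */

/* Hypothetical stand-in for the kernel's struct prison. */
struct model_prison {
        uint32_t pr_ip;  /* jail address, assumed host byte order */
};

/*
 * Model of the address check.
 *   pr          NULL means the caller is not jailed.
 *   host_order  non-zero if *ip is already in host byte order.
 *   ip          address the caller wants to use; may be rewritten.
 * Returns 1 when the request must be refused, 0 otherwise.
 */
int
model_prison_ip(const struct model_prison *pr, int host_order, uint32_t *ip)
{
        uint32_t tmp;

        assert(ip != NULL);

        /* Unjailed callers are never restricted. */
        if (pr == NULL)
                return (0);

        tmp = host_order ? *ip : ntohl(*ip);

        /* A wildcard address is silently replaced by the jail's address. */
        if (tmp == INADDR_ANY) {
                *ip = host_order ? pr->pr_ip : htonl(pr->pr_ip);
                return (0);
        }

        /* Any address other than the jail's own is refused. */
        return (pr->pr_ip != tmp);
}
```

Under these assumptions a jailed caller gets 0 for its own address, 1 for any other address, and has a wildcard request rewritten to the jail address before 0 is returned.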
+
+/usr/src/sys/kern/kern_jail.c:
+int prison_ip(struct proc *p, int flag, u_int32_t *ip) {
+        u_int32_t tmp;
+
+        if (!p->p_prison)
+                return (0);
+        if (flag)
+                tmp = *ip;
+        else
+                tmp = ntohl(*ip);
+
+        if (tmp == INADDR_ANY) {
+                if (flag)
+                        *ip = p->p_prison->pr_ip;
+                else
+                        *ip = htonl(p->p_prison->pr_ip);
+                return (0);
+        }
+
+        if (p->p_prison->pr_ip != tmp)
+                return (1);
+        return (0);
+}
+
+ Jailed users are not allowed to bind services to an IP address
+ which does not belong to the jail. The restriction is also
+ written within the function in_pcbbind:
+
+/usr/src/sys/netinet/in_pcb.c:
+        if (nam) {
+                ...
+                lport = sin->sin_port;
+                ...
+                if (lport) {
+                        ...
+                        if (p && p->p_prison)
+                                prison = 1;
+                        if (prison &&
+                            prison_ip(p, 0, &sin->sin_addr.s_addr))
+                                return (EADDRNOTAVAIL);
+
+ Filesystem
+
+ Even root users within the jail are not allowed to set any
+ file flags, such as the immutable, append-only, and no-unlink
+ flags, if the securelevel is greater than 0.
+
+/usr/src/sys/ufs/ufs/ufs_vnops.c:
+int ufs_setattr(ap)
+        ...
+{
+        if ((cred->cr_uid == 0) && (p->p_prison == NULL)) {
+                if ((ip->i_flags
+                    & (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) &&
+                    securelevel > 0)
+                        return (EPERM);
+                ...
+        }
+        ...
+}
+
+ Jail NG
+
+ Jail NG is a "from-scratch re-implementation of Jail" by
+ Robert Watson, a FreeBSD committer. Some of the new features
+ include the ability to add processes to a jail, an improved
+ management tool, and per-jail sysctls. For example, you could
+ have sysvipc_permitted set on one jail while
+ another jail may be allowed to use System V IPC. You can
+ download the kernel patches and utilities for Jail NG from his
+ website at:
+ .
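As a recap of the mechanism traced in this chapter, the following userland sketch models how jail membership is attached to a process, inherited across fork, and consulted by a restriction check. It is an illustration only: `struct model_prison`, `struct model_proc`, `MODEL_P_JAILED` and the three functions are hypothetical names, the flag value is arbitrary, and starting the reference count at 1 when the jail is created is an assumption of this sketch rather than something shown in the listings above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_P_JAILED 0x1      /* arbitrary bit, not the kernel's value */

/* Hypothetical, reduced versions of the prison and proc structures. */
struct model_prison {
        int pr_ref;             /* number of processes in this jail */
};

struct model_proc {
        struct model_prison *p_prison;  /* NULL when not jailed */
        int p_flag;
};

/* Models the end of jail(2): attach the prison and mark the caller. */
void
model_jail_attach(struct model_proc *p, struct model_prison *pr)
{
        assert(p != NULL && pr != NULL);
        pr->pr_ref = 1;         /* assumption: first reference is the caller */
        p->p_prison = pr;
        p->p_flag |= MODEL_P_JAILED;
}

/* Models the jail-related part of fork(2): p2 starts as a copy of p1. */
void
model_fork(const struct model_proc *p1, struct model_proc *p2)
{
        *p2 = *p1;
        if (p2->p_prison != NULL) {
                p2->p_prison->pr_ref++;
                p2->p_flag |= MODEL_P_JAILED;
        }
}

/* Models a typical restriction: jailed callers get EPERM. */
int
model_restricted_op(const struct model_proc *p)
{
        return (p->p_prison != NULL ? EPERM : 0);
}
```

In this model a child forked from a jailed parent shares the parent's prison pointer, so every restriction that tests `p_prison` applies to the child without any further bookkeeping.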
+