diff --git a/usr.sbin/nfsd/nfsv4.4 b/usr.sbin/nfsd/nfsv4.4 index de40194cd1dd..8460ed174ea6 100644 --- a/usr.sbin/nfsd/nfsv4.4 +++ b/usr.sbin/nfsd/nfsv4.4 @@ -1,370 +1,372 @@ .\" Copyright (c) 2009 Rick Macklem, University of Guelph .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" $FreeBSD$ .\" .Dd December 20, 2019 .Dt NFSV4 4 .Os .Sh NAME .Nm NFSv4 .Nd NFS Version 4 Protocol .Sh DESCRIPTION The NFS client and server provide support for the .Tn NFSv4 specification; see .%T "Network File System (NFS) Version 4 Protocol RFC 7530" , .%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" , .%T "Network File System (NFS) Version 4 Minor Version 2 Protocol RFC 7862" , .%T "File System Extended Attributes in NFSv4 RFC 8276" and .%T "Parallel NFS (pNFS) Flexible File Layout RFC 8435" . The protocol is somewhat similar to NFS Version 3, but differs in significant ways. It uses a single compound RPC that concatenates operations together. Each of these operations is similar to the RPCs of NFS Version 3. The operations in the compound are performed in order, until one of them fails (returns an error) and then the RPC terminates at that point. .Pp It has integrated locking support, which implies that the server is no longer stateless. As such, the .Nm server remains in recovery mode for a grace period (always greater than the lease duration the server uses) after a reboot. During this grace period, clients may recover state but not perform other open/lock state changing operations. To provide for correct recovery semantics, a small file described by .Xr stablerestart 5 is used by the server during the recovery phase. If this file is missing or empty, there is a backup copy maintained by .Xr nfsd 8 that will be used. If either file is missing, it will be created by the .Xr nfsd 8 . If both the file and the backup copy are empty, it will result in the server starting without providing a grace period for recovery. Note that recovery only occurs when the server machine is rebooted, not when the .Xr nfsd 8 are just restarted. 
.Pp It provides several optional features not present in NFS Version 3: .sp .Bd -literal -offset indent -compact - NFS Version 4 ACLs - Referrals, which redirect subtrees to other servers (not yet implemented) - Delegations, which allow a client to operate on a file locally - pNFS, where I/O operations are separated from Metadata operations And for NFSv4.2 only - User namespace extended attributes - lseek(SEEK_DATA/SEEK_HOLE) - File copying done locally on the server for copy_file_range(2) - posix_fallocate(2) - posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED) .Ed .Pp The .Nm protocol does not use a separate mount protocol and assumes that the server provides a single file system tree structure, rooted at the point in the local file system tree specified by one or more .sp 1 .Bd -literal -offset indent -compact V4: [-sec=secflavors] [host(s) or net] .Ed .sp 1 line(s) in the .Xr exports 5 file. (See .Xr exports 5 for details.) The .Xr nfsd 8 allows a limited subset of operations to be performed on non-exported subtrees of the local file system, so that traversal of the tree to the exported subtrees is possible. As such, the ``'' can be in a non-exported file system. The exception is ZFS, which checks exports and, as such, all ZFS file systems below the ``'' must be exported. However, the entire tree that is rooted at that point must be in local file systems that are of types that can be NFS exported. Since the .Nm file system is rooted at ``'', setting this to anything other than ``/'' will result in clients being required to use different mount paths for .Nm than for NFS Version 2 or 3. Unlike NFS Version 2 and 3, Version 4 allows a client mount to span across multiple server file systems, although not all clients are capable of doing this. .Pp .Nm uses strings for users and groups instead of numbers. 
On the wire, these strings can either have the numbers in the string or take the form: .sp .Bd -literal -offset indent -compact @ .Ed .sp where ``'' is not the same as the DNS domain used for host name lookups, but is usually set to the same string. Most systems set this ``'' to the domain name part of the machine's .Xr hostname 1 by default. However, this can normally be overridden by a command line option or configuration file for the daemon used to do the name<->number mapping. -Under FreeBSD, the mapping daemon is called +Under +.Fx , +the mapping daemon is called .Xr nfsuserd 8 and has a command line option that overrides the domain component of the machine's hostname. For use of this form of string on .Nm , either client or server, this daemon must be running. .Pp The form where the numbers are in the strings can only be used for AUTH_SYS. To configure your systems this way, the .Xr nfsuserd 8 daemon does not need to be running on the server, but the following sysctls need to be set to 1 on the server. .sp .Bd -literal -offset indent -compact vfs.nfs.enable_uidtostring vfs.nfsd.enable_stringtouid .Ed .sp On the client, the sysctl .sp .Bd -literal -offset indent -compact vfs.nfs.enable_uidtostring .Ed .sp must be set to 1 and the .Xr nfsuserd 8 daemon does not need to be running. .Pp If these strings are not configured correctly, ``ls -l'' will typically report a lot of ``nobody'' and ``nogroup'' ownerships. .Pp Although uid/gid numbers are no longer used in the .Nm protocol except optionally in the above strings, they will still be in the RPC authentication fields when using AUTH_SYS (sec=sys), which is the default. As such, in this case both the user/group name and number spaces must be consistent between the client and server. .Pp However, if you run .Nm with RPCSEC_GSS (sec=krb5, krb5i, krb5p), only names and KerberosV tickets will go on the wire. 
.Sh SERVER SETUP To set up the NFS server that supports .Nm , you will need to set the variables in .Xr rc.conf 5 as follows: .sp .Bd -literal -offset indent -compact nfs_server_enable="YES" nfsv4_server_enable="YES" .Ed .sp plus .sp .Bd -literal -offset indent -compact nfsuserd_enable="YES" .Ed .sp if the server is using the ``@'' form of user/group strings or is using the ``-manage-gids'' option for .Xr nfsuserd 8 . .Pp You will also need to add at least one ``V4:'' line to the .Xr exports 5 file for .Nm to work. .Pp If the file systems you are exporting are only being accessed via .Nm there are a couple of .Xr sysctl 8 variables that you can change, which might improve performance. .Bl -tag -width Ds .It Cm vfs.nfsd.issue_delegations when set non-zero, allows the server to issue Open Delegations to clients. These delegations permit the client to manipulate the file locally on the client. Unfortunately, at this time, client use of delegations is limited, so performance gains may not be observed. This can only be enabled when the file systems being exported to .Nm clients are not being accessed locally on the server and, if being accessed via NFS Version 2 or 3 clients, these clients cannot be using the NLM. .It Cm vfs.nfsd.enable_locallocks can be set to 0 to disable acquisition of local byte range locks. Disabling local locking can only be done if neither local accesses to the exported file systems nor the NLM is operating on them. .El .sp Note that Samba server access would be considered ``local access'' for the above discussion. .Pp To build a kernel with the NFS server that supports .Nm linked into it, the .sp .Bd -literal -offset indent -compact options NFSD .Ed .sp must be specified in the kernel's .Xr config 5 file. .Sh CLIENT MOUNTS To do an .Nm mount, specify the ``nfsv4'' option on the .Xr mount_nfs 8 command line. This will force use of the client that supports .Nm plus set ``tcp'' and .Nm . 
.Pp The .Xr nfsuserd 8 must be running if name<->uid/gid mapping is being used, as above. Also, since an .Nm mount uses the host uuid to identify the client uniquely to the server, you cannot safely do an .Nm mount when .sp .Bd -literal -offset indent -compact hostid_enable="NO" .Ed .sp is set in .Xr rc.conf 5 . .sp If the .Nm server that is being mounted on supports delegations, you can start the .Xr nfscbd 8 daemon to handle client side callbacks. This will occur if .sp .Bd -literal -offset indent -compact nfsuserd_enable="YES" <-- If name<->uid/gid mapping is being used. nfscbd_enable="YES" .Ed .sp are set in .Xr rc.conf 5 . .sp Without a functioning callback path, a server will never issue Delegations to a client. .sp For NFSv4.0, by default, the callback address will be set to the IP address acquired via .Fn rtalloc in the kernel and port# 7745. To override the default port#, a command line option for .Xr nfscbd 8 can be used. .sp To get callbacks to work when behind a NAT gateway, a port for the callback service will need to be set up on the NAT gateway and then the address of the NAT gateway (host IP plus port#) will need to be set by assigning the .Xr sysctl 8 variable vfs.nfs.callback_addr to a string of the form: .sp N.N.N.N.N.N .sp where the first 4 Ns are the host IP address and the last two are the port# in network byte order (all decimal #s in the range 0-255). .Pp For NFSv4.1 and NFSv4.2, the callback path (called a backchannel) uses the same TCP connection as the mount, so none of the above applies and should work through gateways without any issues. .Pp To build a kernel with the client that supports .Nm linked into it, the option .sp .Bd -literal -offset indent -compact options NFSCL .Ed .sp must be specified in the kernel's .Xr config 5 file. .Pp Options can be specified for the .Xr nfsuserd 8 and .Xr nfscbd 8 daemons at boot time via the ``nfsuserd_flags'' and ``nfscbd_flags'' .Xr rc.conf 5 variables. 
.Pp NFSv4 mount(s) against exported volume(s) on the same host are not recommended, since this can result in a hung NFS server. It occurs when an nfsd thread tries to do an NFSv4 .Fn VOP_RECLAIM / Close RPC as part of acquiring a new vnode. If all other nfsd threads are blocked waiting for lock(s) held by this nfsd thread, then there isn't an nfsd thread to service the Close RPC. .Sh FILES .Bl -tag -width /var/db/nfs-stablerestart.bak -compact .It Pa /var/db/nfs-stablerestart NFS V4 stable restart file .It Pa /var/db/nfs-stablerestart.bak backup copy of the file .El .Sh SEE ALSO .Xr stablerestart 5 , .Xr mountd 8 , .Xr nfscbd 8 , .Xr nfsd 8 , .Xr nfsdumpstate 8 , .Xr nfsrevoke 8 , .Xr nfsuserd 8 .Sh BUGS At this time, there is no recall of delegations for local file system operations. As such, delegations should only be enabled for file systems that are being used solely as NFS export volumes and are not being accessed via local system calls nor services such as Samba. diff --git a/usr.sbin/nfsd/pnfs.4 b/usr.sbin/nfsd/pnfs.4 index c48357591f85..94fe02536ba1 100644 --- a/usr.sbin/nfsd/pnfs.4 +++ b/usr.sbin/nfsd/pnfs.4 @@ -1,205 +1,230 @@ .\" Copyright (c) 2017 Rick Macklem .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd December 20, 2019 .Dt PNFS 4 .Os .Sh NAME .Nm pNFS .Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol .Sh DESCRIPTION The NFSv4.1 and NFSv4.2 client and server provides support for the .Tn pNFS specification; see .%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" , .%T "Network File System (NFS) Version 4 Minor Version 2 Protocol RFC 7862" and .%T "Parallel NFS (pNFS) Flexible File Layout RFC 8435" . A pNFS service separates Read/Write operations from all other NFSv4.1 and NFSv4.2 operations, which are referred to as Metadata operations. The Read/Write operations are performed directly on the Data Server (DS) where the file's data resides, bypassing the NFS server. All other file operations are performed on the NFS server, which is referred to as a Metadata Server (MDS). NFS clients that do not support .Tn pNFS perform Read/Write operations on the MDS, which acts as a proxy for the appropriate DS(s). .Pp The NFSv4.1 and NFSv4.2 protocols provide two pieces of information to pNFS aware clients that allow them to perform Read/Write operations directly on the DS. .Pp The first is DeviceInfo, which is static information defining the DS server. The critical piece of information in DeviceInfo for the layout types -supported by FreeBSD is the IP address that is used to perform RPCs on the DS. +supported by +.Fx +is the IP address that is used to perform RPCs on the DS. 
It also indicates which version of NFS the DS supports, I/O size and other layout specific information. -In the DeviceInfo, there is a DeviceID which, for the FreeBSD server +In the DeviceInfo, there is a DeviceID which, for the +.Fx +server is unique to the DS configuration and changes whenever the .Xr nfsd daemon is restarted or the server is rebooted. .Pp The second is the layout, which is per file and references the DeviceInfo to use via the DeviceID. It is for a byte range of a file and is either Read or Read/Write. -For the FreeBSD server, a layout covers all bytes of a file. +For the +.Fx +server, a layout covers all bytes of a file. A layout may be recalled by the MDS using a LayoutRecall callback. When a client returns a layout via the LayoutReturn operation it can indicate that error(s) were encountered while doing I/O on the DS, at least for certain layout types such as the Flexible File Layout. .Pp -The FreeBSD client and server supports two layout types. +The +.Fx +client and server supports two layout types. .Pp The File Layout is described in RFC5661 and uses the NFSv4.1 or NFSv4.2 protocol to perform I/O on the DS. It does not support client aware DS mirroring and, as such, -the FreeBSD server only provides File Layout support for non-mirrored +the +.Fx +server only provides File Layout support for non-mirrored configurations. .Pp The Flexible File Layout allows the use of the NFSv3, NFSv4.0, NFSv4.1 or NFSv4.2 protocol to perform I/O on the DS and does support client aware mirroring. -As such, the FreeBSD server uses Flexible File Layout layouts for the +As such, the +.Fx +server uses Flexible File Layout layouts for the mirrored DS configurations. -The FreeBSD server supports the +The +.Fx +server supports the .Dq tightly coupled variant and all DSs allow use of the NFSv4.2 or NFSv4.1 protocol for I/O operations. Clients that support the Flexible File Layout will do writes and commits to all DS mirrors in the mirror set. 
.Pp -A FreeBSD pNFS service consists of a single MDS server plus one or more -DS servers, all of which are FreeBSD systems. -For a non-mirrored configuration, the FreeBSD server will issue File Layout +A +.Fx +pNFS service consists of a single MDS server plus one or more +DS servers, all of which are +.Fx +systems. +For a non-mirrored configuration, the +.Fx +server will issue File Layout layouts by default. However that default can be set to the Flexible File Layout by setting the .Xr sysctl 1 sysctl .Dq vfs.nfsd.default_flexfile to one. Mirrored server configurations will only issue Flexible File Layouts. .Tn pNFS clients mount the MDS as they would a single NFS server. .Pp -A FreeBSD +A +.Fx .Tn pNFS client must be running the .Xr nfscbd 8 daemon and use the mount options .Dq nfsv4,minorversion=2,pnfs or .Dq nfsv4,minorversion=1,pnfs . .Pp When files are created, the MDS creates a file tree identical to what a single NFS server creates, except that all the regular (VREG) files will be empty. As such, if you look at the exported tree on the MDS directly on the MDS server (not via an NFS mount), the files will all be of size zero. Each of these files will also have two extended attributes in the system attribute name space: .Bd -literal -offset indent pnfsd.dsfile - This extended attribute stores the information that the MDS needs to find the data file on a DS(s) for this file. pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime, Change and SpaceUsed attributes for the file. .Ed .Pp For each regular (VREG) file, the MDS creates a data file on one (or on N of them for the mirrored case, where N is the mirror_level) of the DS(s) where the file's data will be stored. The name of this file is the file handle of the file on the MDS in hexadecimal at time of file creation. 
The data file will have the same file ownership, mode and NFSv4 ACL (if ACLs are enabled for the file system) as the file on the MDS, so that permission checking can be done on the DS. This is referred to as .Dq tightly coupled for the Flexible File Layout. .Pp For .Tn pNFS aware clients, the service generates File Layout or Flexible File Layout layouts and associated DeviceInfo. For non-pNFS aware NFS clients, the pNFS service appears just like a normal NFS service. For the non-pNFS aware client, the MDS will perform I/O operations on the appropriate DS(s), acting as a proxy for the non-pNFS aware client. This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS aware. .Pp It is possible to assign a DS to an MDS exported file system so that it will store data for files on the MDS exported file system. If a DS is not assigned to an MDS exported file system, it will store data for files on all exported file systems on the MDS. .Pp If mirroring is enabled, the pNFS service will continue to function when DS(s) have failed, so long as there is at least one DS still operational that stores data for files on all of the MDS exported file systems. After a disabled mirrored DS is repaired, it is possible to recover the DS as a mirror while the pNFS service continues to function. .Pp See .Xr pnfsserver 4 -for information on how to set up a FreeBSD pNFS service. +for information on how to set up a +.Fx +pNFS service. .Sh SEE ALSO .Xr nfsv4 4 , .Xr pnfsserver 4 , .Xr exports 5 , .Xr fstab 5 , .Xr rc.conf 5 , .Xr nfscbd 8 , .Xr nfsd 8 , .Xr nfsuserd 8 , .Xr pnfsdscopymr 8 , .Xr pnfsdsfile 8 , .Xr pnfsdskill 8 .Sh BUGS Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client and will do all I/O through the MDS. For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen Linux client crashes when testing this client. 
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing, but it only supports the .Dq loosely coupled variant. -To make it work correctly when mounting the FreeBSD server, you must +To make it work correctly when mounting the +.Fx +server, you must set the sysctl .Dq vfs.nfsd.flexlinuxhack to one so that it works around the Linux client driver's limitations. Without this sysctl being set, there will be access errors, since the Linux client will use the authenticator in the layout (uid=999, gid=999) and not the authenticator specified in the RPC header. .Pp Linux 5.n kernels appear to be patched so that they use the authenticator in the RPC header and, as such, the above sysctl should not need to be set. .Pp Since the MDS cannot be mirrored, it is a single point of failure just as a non .Tn pNFS server is. diff --git a/usr.sbin/nfsd/pnfsserver.4 b/usr.sbin/nfsd/pnfsserver.4 index 22c2ecdb8696..f3cf2fa6acf9 100644 --- a/usr.sbin/nfsd/pnfsserver.4 +++ b/usr.sbin/nfsd/pnfsserver.4 @@ -1,425 +1,446 @@ .\" Copyright (c) 2018 Rick Macklem .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd December 20, 2019 .Dt PNFSSERVER 4 .Os .Sh NAME .Nm pNFSserver .Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol Server .Sh DESCRIPTION -A set of FreeBSD servers may be configured to provide a +A set of +.Fx +servers may be configured to provide a .Xr pnfs 4 service. -One FreeBSD system needs to be configured as a MetaData Server (MDS) and -at least one additional FreeBSD system needs to be configured as one or +One +.Fx +system needs to be configured as a MetaData Server (MDS) and +at least one additional +.Fx +system needs to be configured as one or more Data Servers (DS)s. .Pp -These FreeBSD systems are configured to be NFSv4.1 and NFSv4.2 +These +.Fx +systems are configured to be NFSv4.1 and NFSv4.2 servers, see .Xr nfsd 8 and .Xr exports 5 if you are not familiar with configuring a NFSv4.n server. All DS(s) and the MDS should support NFSv4.2 as well as NFSv4.1. Mixing an MDS that supports NFSv4.2 with any DS(s) that do not support NFSv4.2 will not work correctly. As such, all DS(s) must be upgraded from .Fx 12 to .Fx 13 before upgrading the MDS. .Sh DS server configuration The DS(s) need to be configured as NFSv4.1 and NFSv4.2 server(s), with a top level exported directory used for storage of data files. This directory must be owned by .Dq root and would normally have a mode of .Dq 700 . 
Within this directory there needs to be additional directories named ds0,...,dsN (where N is 19 by default) also owned by .Dq root with mode .Dq 700 . These are the directories where the data files are stored. The following command can be run by root when in the top level exported directory to create these subdirectories. .Bd -literal -offset indent jot -w ds 20 0 | xargs mkdir -m 700 .Ed .sp Note that .Dq 20 is the default and can be set to a larger value on the MDS as shown below. .sp The top level exported directory used for storage of data files must be exported to the MDS with the .Dq maproot=root sec=sys export options so that the MDS can create entries in these subdirectories. It must also be exported to all pNFS aware clients, but these clients do not require the .Dq maproot=root export option and this directory should be exported to them with the same options as used by the MDS to export file system(s) to the clients. .Pp -It is possible to have multiple DSs on the same FreeBSD system, but each +It is possible to have multiple DSs on the same +.Fx +system, but each of these DSs must have a separate top level exported directory used for storage of data files and each of these DSs must be mountable via a separate IP address. Alias addresses can be set on the DS server system for a network interface via .Xr ifconfig 8 to create these different IP addresses. Multiple DSs on the same server may be useful when data for different file systems -on the MDS are being stored on different file system volumes on the FreeBSD +on the MDS are being stored on different file system volumes on the +.Fx DS system. .Sh MDS server configuration -The MDS must be a separate FreeBSD system from the FreeBSD DS system(s) and +The MDS must be a separate +.Fx +system from the +.Fx +DS system(s) and NFS clients. It is configured as a NFSv4.1 and NFSv4.2 server with file system(s) exported to clients. 
However, the .Dq -p command line argument for .Xr nfsd is used to indicate that it is running as the MDS for a pNFS server. .Pp The DS(s) must all be mounted on the MDS using the following mount options: .Bd -literal -offset indent nfsv4,minorversion=2,soft,retrans=2 .Ed .sp so that they can be defined as DSs in the .Dq -p option. Normally these mounts would be entered in the .Xr fstab 5 on the MDS. For example, if there are four DSs named nfsv4-data[0-3], the .Xr fstab 5 lines might look like: .Bd -literal -offset nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0 nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0 nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0 nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0 .Ed .sp The .Xr nfsd 8 command line option .Dq -p indicates that the NFS server is a pNFS MDS and specifies what DSs are to be used. .br For the above .Xr fstab 5 example, the .Xr nfsd 8 nfs_server_flags line in your .Xr rc.conf 5 might look like: .Bd -literal -offset nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3" .Ed .sp This example specifies that the data files should be distributed over the four DSs and File layouts will be issued to pNFS enabled clients. If issuing Flexible File layouts is desired for this case, setting the sysctl .Dq vfs.nfsd.default_flexfile non-zero in your .Xr sysctl.conf 5 file will make the .Nm do that. .br Alternately, this variant of .Dq nfs_server_flags will specify that two way mirroring is to be done, via the .Dq -m command line option. .Bd -literal -offset nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2" .Ed .sp With two way mirroring, the data file for each exported file on the MDS will be stored on two of the DSs. When mirroring is enabled, the server will always issue Flexible File layouts. 
.Pp It is also possible to specify which DSs are to be used to store data files for specific exported file systems on the MDS. For example, if the MDS has exported two file systems .Dq /export1 and .Dq /export2 to clients, the following variant of .Dq nfs_server_flags will specify that data files for .Dq /export1 will be stored on nfsv4-data0 and nfsv4-data1, whereas the data files for .Dq /export2 will be stored on nfsv4-data2 and nfsv4-data3. .Bd -literal -offset nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2" .Ed .sp This can be used by system administrators to control where data files are stored and might be useful for control of storage use. For this case, it may be convenient to co-locate more than one of the DSs -on the same FreeBSD server, using separate file systems on the DS system +on the same +.Fx +server, using separate file systems on the DS system for storage of the respective DS's data files. If mirroring is desired for this case, the .Dq -m option also needs to be specified. There must be enough DSs assigned to each exported file system on the MDS to support the level of mirroring. The above example would be fine for two way mirroring, but four way mirroring would not work, since there are only two DSs assigned to each exported file system on the MDS. .Pp The number of subdirectories in each DS is defined by the .Dq vfs.nfs.dsdirsize sysctl on the MDS. This value can be increased from the default of 20, but only when the .Xr nfsd 8 is not running and after the additional ds20,... subdirectories have been created on all the DSs. For a service that will store a large number of files this sysctl should be set much larger, to keep the number of entries in a subdirectory from getting too large. 
.Sh Client mounts -Once operational, NFSv4.1 or NFSv4.2 FreeBSD client mounts +Once operational, NFSv4.1 or NFSv4.2 +.Fx +client mounts done with the .Dq pnfs option should do I/O directly on the DSs. The clients mounting the MDS must be running the .Xr nfscbd daemon for pNFS to work. Set .Bd -literal -offset indent nfscbd_enable="YES" .Ed .sp in the .Xr rc.conf 5 on these clients. Non-pNFS aware clients or NFSv3 mounts will do all I/O RPCs on the MDS, which acts as a proxy for the appropriate DS(s). .Sh Backing up a pNFS service Since the data is separated from the metadata, the simple way to back up a pNFS service is to do so from an NFS client that has the service mounted on it. If you back up the MDS exported file system(s) on the MDS, you must do it in such a way that the .Dq system namespace extended attributes get backed up. .Sh Handling of failed mirrored DSs When a mirrored DS fails, it can be disabled one of three ways: .sp 1 - The MDS detects a problem when trying to do proxy operations on the DS. This can take a couple of minutes after the DS failure or network partitioning occurs. .sp 2 - A pNFS client can report an I/O error that occurred for a DS to the MDS in the arguments for a LayoutReturn operation. .sp 3 - The system administrator can perform the pnfsdskill(8) command on the MDS to disable it. If the system administrator does a pnfsdskill(8) and it fails with ENXIO (Device not configured) that normally means the DS was already disabled via #1 or #2. Since doing this is harmless, once a system administrator knows that there is a problem with a mirrored DS, doing the command is recommended. .sp Once a system administrator knows that a mirrored DS has malfunctioned or has been network partitioned, they should do the following as root/su on the MDS: .Bd -literal -offset indent # pnfsdskill # umount -N .Ed .sp Note that the must be the exact mounted-on path string used when the DS was mounted on the MDS. 
.Pp Once the mirrored DS has been disabled, the pNFS service should continue to function, but file updates will only happen on the DS(s) that have not been disabled. Assuming two way mirroring, that implies the one DS of the pair stored in the .Dq pnfsd.dsfile extended attribute for the file on the MDS, for files stored on the disabled DS. .Pp The next step is to clear the IP address in the .Dq pnfsd.dsfile extended attribute on all files on the MDS for the failed DS. This is done so that, when the disabled DS is repaired and brought back online, the data files on this DS will not be used, since they may be out of date. The command that clears the IP address is .Xr pnfsdsfile 8 with the .Dq -r option. .Bd -literal -offset For example: # pnfsdsfile -r nfsv4-data3 yyy.c yyy.c: nfsv4-data2.home.rick ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000 0.0.0.0 ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000 .Ed .sp replaces nfsv4-data3 with an IPv4 address of 0.0.0.0, so that nfsv4-data3 will not get used. .Pp Normally this will be called within a .Xr find 1 command for all regular files in the exported directory tree and must be done on the MDS. When used with .Xr find 1 , you will probably also want the .Dq -q option so that it won't spit out the results for every file. If the disabled/repaired DS is nfsv4-data3, the commands done on the MDS would be: .Bd -literal -offset # cd # find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \; .Ed .sp There is a problem with the above command if the file found by .Xr find 1 is renamed or unlinked before the .Xr pnfsdsfile 8 command is done on it. This should normally generate an error message. A simple unlink is harmless but a link/unlink or rename might result in the file not having been processed under its new name. To check that all files have their IP addresses set to 0.0.0.0 these commands can be used (assuming the .Xr sh 1 shell): .Bd -literal -offset # cd # find . 
-type f -exec pnfsdsfile {} \; | sed "/nfsv4-data3/!d" .Ed .sp For any line(s) printed, the .Xr pnfsdsfile 8 with .Dq -r must be done again on that file. Once this is done, the replaced/repaired DS can be brought back online. It should have empty ds0,...,dsN directories under the top-level exported directory for storage of data files just like it did when first set up. Mount it on the MDS exactly as you did before disabling it. For the nfsv4-data3 example, the command would be: .Bd -literal -offset # mount -t nfs -o nfsv4,minorversion=2,soft,retrans=2 nfsv4-data3:/ /data3 .Ed .sp Then restart the nfsd to re-enable the DS. .Bd -literal -offset # /etc/rc.d/nfsd restart .Ed .sp Now, new files can be stored on nfsv4-data3, but files with the IP address zeroed out on the MDS will not yet use the repaired DS (nfsv4-data3). The next step is to go through the exported file tree on the MDS and, for each of the files with an IPv4 address of 0.0.0.0 in its extended attribute, copy the file data to the repaired DS and re-enable use of this mirror for it. The command for copying the file data for one MDS file is .Xr pnfsdscopymr 8 and it will also normally be used within a .Xr find 1 command. For the example case, the commands on the MDS would be: .Bd -literal -offset # cd # find . -type f -exec pnfsdscopymr -r /data3 {} \; .Ed .sp When this completes, the recovery should be complete or at least nearly so. As noted above, if a link/unlink or rename occurs on a file name while the above .Xr find 1 is in progress, it may not get copied. To check for any file(s) not yet copied, the commands are: .Bd -literal -offset # cd # find . -type f -exec pnfsdsfile {} \; | sed "/0\.0\.0\.0/!d" .Ed .sp If this command prints out any file name(s), these files must have the .Xr pnfsdscopymr 8 command done on them to complete the recovery.
.Bd -literal -offset # pnfsdscopymr -r /data3 .Ed .sp If this command fails with the error .br .Dq pnfsdscopymr: Copymr failed for file : Device not configured .br repeatedly, this may be caused by a Read/Write layout that has not been returned. The only way to get rid of such a layout is to restart the .Xr nfsd 8 . .sp All of these commands are designed to be done while the pNFS service is running and can be re-run safely. .Pp For a more detailed discussion of the setup and management of a pNFS service see: .Bd -literal -offset indent http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt .Ed .sp .Sh SEE ALSO .Xr nfsv4 4 , .Xr pnfs 4 , .Xr exports 5 , .Xr fstab 5 , .Xr rc.conf 5 , .Xr sysctl.conf 5 , .Xr nfscbd 8 , .Xr nfsd 8 , .Xr nfsuserd 8 , .Xr pnfsdscopymr 8 , .Xr pnfsdsfile 8 , .Xr pnfsdskill 8 .Sh HISTORY The .Nm service first appeared in .Fx 12.0 . .Sh BUGS Since the MDS cannot be mirrored, it is a single point of failure just as a non .Tn pNFS server is. -For non-mirrored configurations, all FreeBSD systems used in the service +For non-mirrored configurations, all +.Fx +systems used in the service are single points of failure.