diff --git a/en_US.ISO8859-1/articles/committers-guide/article.xml b/en_US.ISO8859-1/articles/committers-guide/article.xml
index e05587a219..db04ddd5f0 100644
--- a/en_US.ISO8859-1/articles/committers-guide/article.xml
+++ b/en_US.ISO8859-1/articles/committers-guide/article.xml
@@ -1,5655 +1,5655 @@
]>
Committer's GuideThe &os; Documentation Project1999200020012002200320042005200620072008200920102011201220132014201520162017201820192020The &os; Documentation Project
&tm-attrib.freebsd;
&tm-attrib.coverity;
&tm-attrib.ibm;
&tm-attrib.intel;
&tm-attrib.sparc;
&tm-attrib.general;
$FreeBSD$$FreeBSD$This document provides information for the &os;
committer community. All new committers should read this
document before they start, and existing committers are
strongly encouraged to review it from time to time.Almost all &os; developers have commit rights to one or
more repositories. However, a few developers do not, and some
of the information here applies to them as well. (For
instance, some people only have rights to work with the
Problem Report database). Please see
for more information.This document may also be of interest to members of the
&os; community who want to learn more about how the project
works.Administrative DetailsLogin Methods&man.ssh.1;, protocol 2 onlyMain Shell Hostfreefall.FreeBSD.orgSMTP Hostsmtp.FreeBSD.org:587
(see also ).src/ Subversion
Rootsvn+ssh://repo.FreeBSD.org/base
(see also ).doc/ Subversion
Rootsvn+ssh://repo.FreeBSD.org/doc
(see also ).ports/ Subversion
Rootsvn+ssh://repo.FreeBSD.org/ports
(see also ).Internal Mailing Listsdevelopers (technically called all-developers),
doc-developers, doc-committers, ports-developers,
ports-committers, src-developers, src-committers. (Each
project repository has its own -developers and
-committers mailing lists. Archives for these lists can
be found in the files
/local/mail/repository-name-developers-archive
and
/local/mail/repository-name-committers-archive
on the FreeBSD.org
cluster.)Core Team monthly
reports/home/core/public/monthly-reports
on the FreeBSD.org
cluster.Ports Management Team monthly
reports/home/portmgr/public/monthly-reports
on the FreeBSD.org
cluster.Noteworthy src/ SVN
Branchesstable/n
(n-STABLE),
head (-CURRENT)&man.ssh.1; is required to connect to the project hosts.
For more information, see .Useful links:&os;
Project Internal Pages&os;
Project Hosts&os;
Project Administrative GroupsOpenPGP Keys for &os;Cryptographic keys conforming to the
OpenPGP (Pretty Good
Privacy) standard are used by the &os; project to
authenticate committers. Messages carrying important
information like public SSH keys can be
signed with the OpenPGP key to prove that
they are really from the committer. See
PGP &
GPG: Email for the Practical Paranoid by Michael Lucas
and
for more information.Creating a KeyExisting keys can be used, but should be checked with
doc/head/share/pgpkeys/checkkey.sh
first. In this case, make sure the key has a &os; user
ID.For those who do not yet have an
OpenPGP key, or need a new key to meet &os;
security requirements, here we show how to generate
one.Install
security/gnupg. Enter
these lines in ~/.gnupg/gpg.conf to
set minimum acceptable defaults:fixed-list-mode
keyid-format 0xlong
personal-digest-preferences SHA512 SHA384 SHA256 SHA224
default-preference-list SHA512 SHA384 SHA256 SHA224 AES256 AES192 AES CAST5 BZIP2 ZLIB ZIP Uncompressed
use-agent
verify-options show-uid-validity
list-options show-uid-validity
sig-notation issuer-fpr@notations.openpgp.fifthhorseman.net=%g
cert-digest-algo SHA512Generate a key:&prompt.user; gpg --full-gen-key
gpg (GnuPG) 2.1.8; Copyright (C) 2015 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Warning: using insecure memory!
Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 2048
Requested keysize is 2048 bits
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0) 3y
Key expires at Wed Nov 4 17:20:20 2015 MST
Is this correct? (y/N) y
GnuPG needs to construct a user ID to identify your key.
Real name: Chucky Daemon
Email address: notreal@example.com
Comment:
You selected this USER-ID:
"Chucky Daemon <notreal@example.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
You need a Passphrase to protect your secret key.2048-bit keys with a three-year expiration provide
adequate protection at present (2013-12).
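The same choices (RSA, 2048 bits, a three-year expiry) can also be captured in a GnuPG unattended-generation parameter file instead of answering the interactive prompts; a hedged sketch, with the name and address as placeholders taken from the example session above:

```shell
# Write a parameter file for GnuPG's unattended key generation,
# to be used as: gpg --batch --gen-key key.params
# Values mirror the interactive session; name/email are placeholders.
cat > key.params <<'EOF'
Key-Type: RSA
Key-Length: 2048
Name-Real: Chucky Daemon
Name-Email: notreal@example.com
Expire-Date: 3y
%commit
EOF
```

The file is then passed to gpg --batch --gen-key key.params; protect the resulting key with a strong passphrase as discussed below.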
 describes the situation in more detail.A three-year key lifespan is short enough to
obsolete keys weakened by advancing computer power,
but long enough to reduce key management
problems.Use your real name here, preferably matching that
shown on government-issued ID to
make it easier for others to verify your identity.
Text that may help others identify you can be entered
in the Comment section.After the email address is entered, a passphrase is
requested. Methods of creating a secure passphrase are
contentious. Rather than suggest a single way, here are
some links to sites that describe various methods: ,
,
,
.Protect the private key and passphrase. If either the
private key or passphrase may have been compromised or
disclosed, immediately notify
accounts@FreeBSD.org and revoke the key.Committing the new key is shown in
.Kerberos and LDAP web Password for &os; ClusterThe &os; cluster requires a Kerberos password to access
certain services. The Kerberos password also serves as the
LDAP web password, since LDAP is proxying to Kerberos in the
cluster. Some of the services
which require this include:BugzillaJenkinsTo create a new Kerberos account in the &os; cluster, or to
reset a Kerberos password for an existing account using a random
password generator:&prompt.user; ssh kpasswd.freebsd.orgThis must be done from a machine outside of the &os;.org
cluster.A Kerberos password can also be set manually
by logging into freefall.FreeBSD.org and
running:&prompt.user; kpasswdUnless the Kerberos-authenticated services
of the &os;.org cluster have been used previously,
Client unknown will be shown. This
error means that the
ssh kpasswd.freebsd.org method shown above
must be used first to initialize the Kerberos account.Commit Bit TypesThe &os; repository has a number of components which, when
combined, support the basic operating system source,
documentation, third party application ports infrastructure, and
various maintained utilities. When &os; commit bits are
allocated, the areas of the tree where the bit may be used are
specified. Generally, the areas associated with a bit reflect
who authorized the allocation of the commit bit. Additional
areas of authority may be added at a later date: when this
occurs, the committer should follow normal commit bit allocation
procedures for that area of the tree, seeking approval from the
appropriate entity and possibly getting a mentor for that area
for some period of time.Committer TypeResponsibleTree Componentssrccore@src/, doc/ subject to appropriate reviewdocdoceng@doc/, ports/, src/ documentationportsportmgr@ports/Commit bits allocated prior to the development of the notion
of areas of authority may be appropriate for use in many parts
of the tree. However, common sense dictates that a committer
who has not previously worked in an area of the tree seek review
prior to committing, seek approval from the appropriate
responsible party, and/or work with a mentor. Since the rules
regarding code maintenance differ by area of the tree, this is
as much for the benefit of the committer working in an area of
less familiarity as it is for others working on the tree.Committers are encouraged to seek review for their work as
part of the normal development process, regardless of the area
of the tree where the work is occurring.Policy for Committer Activity in Other TreesAll committers may modify
base/head/share/misc/committers-*.dot,
base/head/usr.bin/calendar/calendars/calendar.freebsd,
and
ports/head/astro/xearth/files.doc committers may commit
documentation changes to src
files, such as man pages, READMEs, fortune databases,
calendar files, and comment fixes without approval from a
src committer, subject to the normal care and tending of
commits.Any committer may make changes to any other tree
with an "Approved by" from a non-mentored committer with
the appropriate bit.Committers can acquire an additional bit by the usual
process of finding a mentor who will propose them to core,
doceng, or portmgr, as appropriate. When approved, they
will be added to 'access' and the normal mentoring period
will ensue, which will involve continued use of
Approved by for some period."Approved by" is only acceptable from non-mentored src
committers -- mentored committers can provide a "Reviewed
by" but not an "Approved by".Subversion PrimerNew committers are assumed to already be familiar with the
basic operation of Subversion. If not, start by reading the
Subversion
Book.IntroductionThe &os; source repository switched from
CVS to Subversion on May 31st, 2008. The
first real SVN commit is
r179447.The &os; doc/www repository switched
from CVS to Subversion on May 19th, 2012.
The first real SVN commit is
r38821.The &os; ports repository switched
from CVS to Subversion on July 14th, 2012.
The first real SVN commit is
r300894.Subversion can be installed from the &os; Ports
Collection by issuing this command:&prompt.root; pkg install subversionGetting StartedThere are a few ways to obtain a working copy of the tree
from Subversion. This section will explain them.Direct CheckoutThe first is to check out directly from the main
repository. For the src tree,
use:&prompt.user; svn checkout svn+ssh://repo.freebsd.org/base/head /usr/srcFor the doc tree, use:&prompt.user; svn checkout svn+ssh://repo.freebsd.org/doc/head /usr/docFor the ports tree, use:&prompt.user; svn checkout svn+ssh://repo.freebsd.org/ports/head /usr/portsThough the remaining examples in this document are
written with the workflow of working with the
src tree in mind, the underlying
concepts are the same for working with the
doc and the ports
tree.
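The three checkouts above differ only in the repository name and the target directory; a small sketch that prints each command, using the repo.freebsd.org URLs given above (the helper function is hypothetical, for illustration only):

```shell
# Hypothetical helper: print the checkout command for a given tree
# (base, doc, or ports) and a local target directory.
checkout_cmd() {
    printf 'svn checkout svn+ssh://repo.freebsd.org/%s/head %s\n' "$1" "$2"
}
checkout_cmd base  /usr/src
checkout_cmd doc   /usr/doc
checkout_cmd ports /usr/ports
```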
Ports related Subversion operations are listed in
.The above command checks out a
CURRENT source tree as
/usr/src/;
any directory on the local filesystem can be used as the target.
Omitting the final argument causes the
working copy to be named head,
but it can safely be renamed.svn+ssh means the
SVN protocol tunnelled over
SSH. The name of the server is
repo.freebsd.org, base
is the path to the repository, and head
is the subdirectory within the repository.If your &os; login name is different from the login
name used on the local machine, either include it in
the URL (for example
svn+ssh://jarjar@repo.freebsd.org/base/head),
or add an entry to ~/.ssh/config
in the form:Host repo.freebsd.org
User jarjarThis is the simplest method, but it is hard to tell just
yet how much load it will place on the repository.svn diff does not require
access to the server because SVN stores a
reference copy of every file in the working copy. This,
however, means that Subversion working copies are very
large.RELENG_* Branches and General
LayoutIn svn+ssh://repo.freebsd.org/base,
base refers to the source tree.
Similarly, ports refers to the ports
tree, and so on. These are separate repositories with their
own change number sequences, access controls and commit
mail.For the base repository, HEAD refers to the -CURRENT
tree. For example, head/bin/ls is what
would go into /usr/src/bin/ls in a
release. Some key locations are:/head/ which corresponds to
HEAD, also known as
-CURRENT./stable/n
which corresponds to
RELENG_n./releng/n.n
which corresponds to
RELENG_n_n./release/n.n.n
which corresponds to
RELENG_n_n_n_RELEASE./vendor* is the vendor branch
import work area. This directory itself does not
contain branches, however its subdirectories do. This
contrasts with the stable,
releng and
release directories./projects and
/user feature a branch work area.
As above, the
/user directory does not contain
branches itself.&os; Documentation Project Branches and
LayoutIn svn+ssh://repo.freebsd.org/doc,
doc refers to the repository root of
the source tree.In general, most &os; Documentation Project work will be
done within the head/ branch of the
documentation source tree.&os; documentation is written and/or translated to
various languages, each in a separate
directory in the head/
branch.Each translation set contains several subdirectories for
the various parts of the &os; Documentation Project. A few
noteworthy directories are:/articles/ contains the source
code for articles written by various &os;
contributors./books/ contains the source
code for the different books, such as the
&os; Handbook./htdocs/ contains the source
code for the &os; website.&os; Ports Tree Branches and LayoutIn svn+ssh://repo.freebsd.org/ports,
ports refers to the repository root of
the ports tree.In general, most &os; port work will be done within the
head/ branch of the ports tree which is
the actual ports tree used to install software. Some other
key locations are:/branches/RELENG_n_n_n
which corresponds to
RELENG_n_n_n
is used to merge back security updates in preparation
for a release./tags/RELEASE_n_n_n
which corresponds to
RELEASE_n_n_n
represents a release tag of the ports tree./tags/RELEASE_n_EOL
represents the end of life tag of a specific &os;
branch.Daily UseThis section will explain how to perform common day-to-day
operations with Subversion.HelpSVN has built in help documentation.
It can be accessed by typing:&prompt.user; svn helpAdditional information can be found in the
Subversion
Book.CheckoutAs seen earlier, to check out the &os; head
branch:&prompt.user; svn checkout svn+ssh://repo.freebsd.org/base/head /usr/srcAt some point, more than just HEAD
will probably be useful, for instance when merging changes
to stable/7. Therefore, it may be useful to have a partial
checkout of the complete tree (a full checkout would be very
painful).To do this, first check out the root of the
repository:&prompt.user; svn checkout --depth=immediates svn+ssh://repo.freebsd.org/baseThis will give base with all the
files it contains (at the time of writing, just
ROADMAP.txt) and empty subdirectories
for head, stable,
vendor and so on.Expanding the working copy is possible. Just change the
depth of the various subdirectories:&prompt.user; svn up --set-depth=infinity base/head
&prompt.user; svn up --set-depth=immediates base/release base/releng base/stableThe above command will pull down a full copy of
head, plus empty copies of every
release tag, every
releng branch, and every
stable branch.If at a later date merging to
7-STABLE is required, expand the working
copy:&prompt.user; svn up --set-depth=infinity base/stable/7Subtrees do not have to be expanded completely. For
instance, expand only stable/7/sys first,
and later expand the rest of
stable/7:&prompt.user; svn up --set-depth=infinity base/stable/7/sys
&prompt.user; svn up --set-depth=infinity base/stable/7Updating the tree with svn update
will only update what was previously asked for (in this
case, head and
stable/7); it will not pull down the whole
tree.Anonymous CheckoutIt is possible to anonymously check out the &os;
repository with Subversion. This will give access to a
read-only tree that can be updated, but not committed back
to the main repository. To do this, use:&prompt.user; svn co https://svn.FreeBSD.org/base/head /usr/srcMore details on using Subversion this way can be found
in Using
Subversion.Updating the TreeTo update a working copy to either the latest revision,
or a specific revision:&prompt.user; svn update
&prompt.user; svn update -r12345StatusTo view the local changes that have been made to the
working copy:&prompt.user; svn statusTo show local changes and files that are out-of-date
do:&prompt.user; svn status --show-updatesEditing and CommittingSVN does not need to
be told in advance about file editing.To commit all changes in
the current directory and all subdirectories:&prompt.user; svn commitTo commit all changes in, for example,
lib/libfetch/
and
usr/bin/fetch/
in a single operation:&prompt.user; svn commit lib/libfetch usr/bin/fetchThere is also a commit wrapper for the ports tree to
handle the properties and sanity checking the
changes:&prompt.user; /usr/ports/Tools/scripts/psvn commitAdding and Removing FilesBefore adding files, get a copy of auto-props.txt
(there is also a
ports tree specific version) and add it to
~/.subversion/config according to the
instructions in the file. If you added something before
reading this, use svn rm --keep-local
for just-added files, fix your config file, and add them
again. The initial config file is created when you first
run a svn command, even something as simple as
svn help.Files are added to a
SVN repository with svn
add. To add a file named
foo, edit it, then:&prompt.user; svn add fooMost new source files should include a
$&os;$ string near the
start of the file. On commit, svn will
expand the $&os;$ string,
adding the file path, revision number, date and time of
commit, and the username of the committer. Files which
cannot be modified may be committed without the
$&os;$ string.Files can be removed with svn
remove:&prompt.user; svn remove fooSubversion does not require deleting the file before
using svn rm, and indeed complains if
that happens.It is possible to add directories with
svn add:&prompt.user; mkdir bar
&prompt.user; svn add barAlthough svn mkdir makes this easier
by combining the creation of the directory and the adding of
it:&prompt.user; svn mkdir barLike files, directories are removed with
svn rm. There is no separate command
specifically for removing directories.&prompt.user; svn rm barCopying and Moving FilesThis command creates a copy of
foo.c named bar.c,
with the new file also under version control and with the
full history of foo.c:&prompt.user; svn copy foo.c bar.cThis is usually preferred to copying the file with
cp and adding it to the repository with
svn add because a file copied that way
does not inherit the original one's history.To move and rename a file:&prompt.user; svn move foo.c bar.cLog and Annotatesvn log shows revisions and commit
messages, most recent first, for files or directories. When
used on a directory, all revisions that affected the
directory and files within that directory are shown.svn annotate, or equally svn
praise or svn blame, shows
the most recent revision number and who committed that
revision for each line of a file.Diffssvn diff displays changes to the
working copy. Diffs generated by SVN are
unified and include new files by default in the diff
output.svn diff can show the changes between
two revisions of the same file:&prompt.user; svn diff -r179453:179454 ROADMAP.txtIt can also show all changes for a specific changeset.
This command shows what changes were made to the
current directory and all subdirectories in changeset
179454:&prompt.user; svn diff -c179454 .RevertingLocal changes (including additions and deletions) can be
reverted using svn revert. It does not
update out-of-date files, but just replaces them with
pristine copies of the original version.ConflictsIf an svn update resulted in a merge
conflict, Subversion will remember which files have
conflicts and refuse to commit any changes to those files
until explicitly told that the conflicts have been resolved.
The simple, not yet deprecated procedure is:&prompt.user; svn resolved fooHowever, the preferred procedure is:&prompt.user; svn resolve --accept=working fooThe two examples are equivalent. Possible values for
--accept are:working: use the version in your
working directory (which one presumes has been edited to
resolve the conflicts).base: use a pristine copy of the
version you had before svn update,
discarding your own changes, the conflicting changes,
and possibly other intervening changes as well.mine-full: use what you had
before svn update, including your own
changes, but discarding the conflicting changes, and
possibly other intervening changes as well.theirs-full: use the version that
was retrieved when you did
svn update, discarding your own
changes.Advanced UseSparse CheckoutsSVN allows
sparse, or partial checkouts of a
directory by adding to a
svn checkout.Valid arguments to
are:empty: the directory itself
without any of its contents.files: the directory and any
files it contains.immediates: the directory and any
files and directories it contains, but none of the
subdirectories' contents.infinity: anything.The --depth option applies to many
other commands, including svn commit,
svn revert, and svn
diff.Since --depth is sticky, there is a
--set-depth option for svn
update that will change the selected depth.
Thus, given the working copy produced by the previous
example:&prompt.user; cd ~/freebsd
&prompt.user; svn update --set-depth=immediates .The above command will populate the working copy in
~/freebsd with
ROADMAP.txt and empty subdirectories,
and nothing will happen when svn update
is executed on the subdirectories. However, this
command will set the depth for
head (in this case) to infinity,
and fully populate it:&prompt.user; svn update --set-depth=infinity headDirect OperationCertain operations can be performed directly on the
repository without touching the working copy. Specifically,
this applies to any operation that does not require editing
a file, including:log,
diffmkdirremove, copy,
renamepropset,
propedit,
propdelmergeBranching is very fast. This command would be
used to branch RELENG_8:&prompt.user; svn copy svn+ssh://repo.freebsd.org/base/head svn+ssh://repo.freebsd.org/base/stable/8This is equivalent to these commands
which take minutes or hours as opposed to seconds,
depending on your network connection:&prompt.user; svn checkout --depth=immediates svn+ssh://repo.freebsd.org/base
&prompt.user; cd base
&prompt.user; svn update --set-depth=infinity head
&prompt.user; svn copy head stable/8
&prompt.user; svn commit stable/8Merging with SVNThis section deals with merging code from one branch to
another (typically, from head to a stable branch).In all examples below, $FSVN
refers to the location of the &os; Subversion repository,
svn+ssh://repo.freebsd.org/base/.About Merge TrackingFrom the user's perspective, merge tracking
information (or mergeinfo) is stored in a property called
svn:mergeinfo, which is a
comma-separated list of revisions and ranges of revisions
that have been merged. When set on a file, it applies
only to that file. When set on a directory, it applies to
that directory and its descendants (files and directories)
except for those that have their own
svn:mergeinfo.Mergeinfo is not inherited. For
instance, stable/6/contrib/openpam/
does not implicitly inherit mergeinfo from
stable/6/, or
stable/6/contrib/.
Doing so would make partial checkouts very hard to manage.
Instead, mergeinfo is explicitly propagated down the tree.
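Since the property value is plain text in the format described above, it can be examined with ordinary tools; a hedged sketch that splits a made-up svn:mergeinfo value (path and revision numbers are invented for illustration) into its source path and revision ranges:

```shell
# Hypothetical svn:mergeinfo value: a merge source path, a colon,
# then comma-separated revisions and revision ranges.
mergeinfo='/head/contrib/openpam:179447-180000,180123,180200-180300'

src=${mergeinfo%%:*}        # the merge source path
revs=${mergeinfo#*:}        # the revision list
echo "source: $src"
echo "$revs" | tr ',' '\n'  # one revision or range per line
```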
For merging something into
branch/foo/bar/,
these rules apply:If
branch/foo/bar/
does not already have a mergeinfo record, but a direct
ancestor (for instance,
branch/foo/)
does, then that record will be propagated down to
branch/foo/bar/
before information about the current merge is
recorded.Information about the current merge will
not be propagated back up that
ancestor.If a direct descendant of
branch/foo/bar/ (for instance,
branch/foo/bar/baz/) already has
a mergeinfo record, information about the current
merge will be propagated down to it.If you consider the case where a revision changes
several separate parts of the tree (for example,
branch/foo/bar/ and
branch/foo/quux/), but you only want
to merge some of it (for example,
branch/foo/bar/), you will see that
these rules make sense. If mergeinfo was propagated up,
it would seem like that revision had also been merged to
branch/foo/quux/, when in fact it had
not been.Selecting the Source and Target Branch
When MergingMerging to stable/ branches should
originate from head/. For
example:&prompt.user; svn merge -c r123456 ^/head/ stable/11
&prompt.user; svn commit stable/11Merges to releng/ branches should
always originate from the corresponding
stable/ branch. For example:&prompt.user; svn merge -c r123456 ^/stable/11 releng/11.0
&prompt.user; svn commit releng/11.0Committers are only permitted to commit to the
releng/ branches during a release
cycle after receiving approval from the Release
Engineering Team, after which only the Security Officer
may commit to a releng/ branch for
a Security Advisory or Errata Notice.All merges are
merged to and committed from the root of the
branch. All merges look like:&prompt.user; svn merge -c r123456 ^/head/ checkout
&prompt.user; svn commit checkoutNote that checkout must be
a complete checkout of the branch to which the merge
occurs.&prompt.user; svn merge -c r123456 ^/stable/10 releng/10.0Preparing the Merge Target
- Because of the mergeinfo propagation issues described
+ Due to the mergeinfo propagation issues described
earlier, it is very important to never merge changes
into a sparse working copy. Always use a full
checkout of the branch being merged into. For instance,
when merging from HEAD to 7, use a full checkout
of stable/7:&prompt.user; cd stable/7
&prompt.user; svn up --set-depth=infinityThe target directory must also be up-to-date and must
not contain any uncommitted changes or stray files.Identifying RevisionsIdentifying revisions to be merged is a must. If the
target already has complete mergeinfo, ask
SVN for a list:&prompt.user; cd stable/6/contrib/openpam
&prompt.user; svn mergeinfo --show-revs=eligible $FSVN/head/contrib/openpamIf the target does not have complete mergeinfo, check
the log for the merge source.MergingNow, let us start merging!The PrinciplesFor example, to merge revision $R, in directory $target in stable branch
$B, from directory $source in head, where $FSVN is
svn+ssh://repo.freebsd.org/baseAssuming that revisions $P and $Q have
already been merged, and that the current directory is
an up-to-date working copy of stable/$B, the
existing mergeinfo looks like this:&prompt.user; svn propget svn:mergeinfo -R $target
$target - /head/$source:$P,$QMerging is done like so:&prompt.user; svn merge -c$R $FSVN/head/$source $targetChecking the results of this is possible with
svn diff.The svn:mergeinfo now looks like:&prompt.user; svn propget svn:mergeinfo -R $target
$target - /head/$source:$P,$Q,$RIf the results are not exactly as shown, assistance
may be required before committing as mistakes may have
been made, or there may be something wrong with the
existing mergeinfo, or there may be a bug in
Subversion.Practical ExampleAs a practical example, consider this
scenario. The changes to netmap.4
in r238987 are to be merged from CURRENT to 9-STABLE.
The file resides in
head/share/man/man4. According
to , this is
also where to do the merge. Note that in this example
all paths are relative to the top of the svn repository.
For more information on the directory layout, see .The first step is to inspect the existing
mergeinfo.&prompt.user; svn propget svn:mergeinfo -R stable/9/share/man/man4Take a quick note of how it looks before moving on
to the next step; doing the actual merge:&prompt.user; svn merge -c r238987 svn+ssh://repo.freebsd.org/base/head/share/man/man4 stable/9/share/man/man4
--- Merging r238987 into 'stable/9/share/man/man4':
U stable/9/share/man/man4/netmap.4
--- Recording mergeinfo for merge of r238987 into
'stable/9/share/man/man4':
U stable/9/share/man/man4Check that the revision number of the merged
revision has been added. Once this is verified, the
only thing left is the actual commit.&prompt.user; svn commit stable/9/share/man/man4Precautions Before CommittingAs always, build world (or appropriate parts of
it).Check the changes with svn diff and
svn stat. Make sure all the files that
should have been added or deleted were in fact added or
deleted.Take a closer look at any property change (marked by a
M in the second column of svn
stat). Normally, no svn:mergeinfo properties
should be anywhere except the target directory (or
directories).If something looks fishy, ask for help.CommittingMake sure to commit a top level directory to have the
mergeinfo included as well. Do not specify individual
files on the command line. For more information about
committing files in general, see the relevant section of
this primer.Vendor Imports with SVNPlease read this entire section before starting a
vendor import.Patches to vendor code fall into two
categories:Vendor patches: these are patches that have been
issued by the vendor, or that have been extracted from
the vendor's version control system, which address
issues which cannot wait until the
next vendor release.&os; patches: these are patches that modify the
vendor code to address &os;-specific issues.The nature of a patch dictates where it should be
committed:Vendor patches must be committed to the vendor
branch, and merged from there to head. If the patch
addresses an issue in a new release that is currently
being imported, it must not be
committed along with the new release: the release must
be imported and tagged first, then the patch can be
applied and committed. There is no need to re-tag the
vendor sources after committing the patch.&os; patches are committed directly to
head.Preparing the TreeIf importing for the first time after the switch to
Subversion, flattening and cleaning up the vendor tree is
necessary, as well as bootstrapping the merge history in
the main tree.FlatteningDuring the conversion from CVS to
Subversion, vendor branches were imported with the same
layout as the main tree. This means that the
pf vendor sources ended up in
vendor/pf/dist/contrib/pf. The
vendor source is best placed directly in
vendor/pf/dist.To flatten the pf tree:&prompt.user; cd vendor/pf/dist/contrib/pf
&prompt.user; svn mv $(svn list) ../..
&prompt.user; cd ../..
&prompt.user; svn rm contrib
&prompt.user; svn propdel -R svn:mergeinfo .
&prompt.user; svn commitThe propdel bit is necessary
because starting with 1.5, Subversion will automatically
add svn:mergeinfo to any directory
that is copied or moved. In this case, as nothing is
being merged from the deleted tree, they just get in the
way.Tags may be flattened as well (3, 4, 3.5 etc.); the
procedure is exactly the same, only changing
dist to 3.5 or
similar, and putting the svn commit
off until the end of the process.Cleaning UpThe dist tree can be cleaned up
as necessary. Disabling keyword expansion is
recommended, as it makes no sense on unmodified vendor
code and in some cases it can even be harmful.
OpenSSH, for example,
includes two files that originated with &os; and still
contain the original version tags. To do this:&prompt.user; svn propdel svn:keywords -R .
&prompt.user; svn commitBootstrapping Merge HistoryIf importing for the first time after the switch to
Subversion, bootstrap svn:mergeinfo
on the target directory in the main tree to the revision
that corresponds to the last related change to the
vendor tree, prior to importing new sources:&prompt.user; cd head/contrib/pf
&prompt.user; svn merge --record-only svn+ssh://repo.freebsd.org/base/vendor/pf/dist@180876 .
&prompt.user; svn commitImporting New SourcesWith two commits—one for the import itself and
one for the tag—this step can optionally be repeated
for every upstream release between the last import and the
current import.Preparing the Vendor SourcesSubversion is able to store a
full distribution in the vendor tree. So, import
everything, but merge only what is required.A svn add is required to add any
files that were added since the last vendor import, and
svn rm is required to remove any that
were removed since. Preparing sorted lists of the
contents of the vendor tree and of the sources that are
about to be imported is recommended, to facilitate the
process.&prompt.user; cd vendor/pf/dist
&prompt.user; svn list -R | grep -v '/$' | sort >../old
&prompt.user; cd ../pf-4.3
&prompt.user; find . -type f | cut -c 3- | sort >../newWith these two files,
comm -23 ../old ../new will list
removed files (files only in old),
while comm -13 ../old ../new will
list added files only in
new.Importing into the Vendor TreeNow, the sources must be copied into
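As a standalone illustration of the comm comparison just described (the file names here are hypothetical, not the real pf file lists):

```shell
# Build two small sorted file lists, standing in for ../old and ../new.
printf 'Makefile\npf.c\npf_norm.c\n' > old   # current vendor tree contents
printf 'Makefile\npf.c\npf_lb.c\n' > new     # contents of the new release

# Lines only in "old": files removed upstream.
comm -23 old new    # prints: pf_norm.c

# Lines only in "new": files added upstream.
comm -13 old new    # prints: pf_lb.c
```

Both inputs must be sorted under the same collation; running sort on both lists in the same locale, as shown above, guarantees that.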
dist and
the svn add and
svn rm commands are used as
needed:&prompt.user; cd vendor/pf/pf-4.3
&prompt.user; tar cf - . | tar xf - -C ../dist
&prompt.user; cd ../dist
&prompt.user; comm -23 ../old ../new | xargs svn rm
&prompt.user; comm -13 ../old ../new | xargs svn add --parentsIf any directories were removed, they will have to
be svn rmed manually. Nothing will
break if they are not, but they will remain in the
tree.Check properties on any new files. All text files
should have svn:eol-style set to
native. All binary files should have
svn:mime-type set to
application/octet-stream unless there
is a more appropriate media type. Executable files
should have svn:executable set to
*. No other properties should exist
on any file in the tree.Committing is now possible. However, it is good
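A small dry-run sketch can help spot which propset commands the rules above call for. This is not an official project tool; the demo file names and the use of file(1) to guess binariness are illustrative assumptions, and its guesses must still be reviewed by hand:

```shell
#!/bin/sh
# Demo data (hypothetical): a scratch directory with one text file
# and one binary file.
dir=$(mktemp -d)
printf 'hello world\n' > "$dir/notes.txt"
printf '\000\001\002' > "$dir/blob.bin"

# Dry run: print, without executing, the svn propset commands suggested
# by the rules above, using file(1) to guess which files are binary.
cmds=$(find "$dir" -type f | while read -r f; do
    if file -b --mime-encoding "$f" | grep -q binary; then
        echo "svn propset svn:mime-type application/octet-stream $f"
    else
        echo "svn propset svn:eol-style native $f"
    fi
done)
echo "$cmds"
```

Before committing, svn proplist -v on each file remains the authoritative check; file(1) is only a heuristic.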
practice to make sure that everything is okay by using
the svn stat and
svn diff commands.TaggingOnce committed, vendor releases are tagged for
future reference. The best and quickest way to do this
is directly in the repository:&prompt.user; svn cp svn+ssh://repo.freebsd.org/base/vendor/pf/dist svn+ssh://repo.freebsd.org/base/vendor/pf/4.3Once that is complete, svn up the
working copy of
vendor/pf
to get the new tag, although this is rarely
needed.If creating the tag in the working copy of the tree,
svn:mergeinfo results must be
removed:&prompt.user; cd vendor/pf
&prompt.user; svn cp dist 4.3
&prompt.user; svn propdel svn:mergeinfo -R 4.3Merging to Head&prompt.user; cd head/contrib/pf
&prompt.user; svn up
&prompt.user; svn merge --accept=postpone svn+ssh://repo.freebsd.org/base/vendor/pf/dist .The --accept=postpone tells
Subversion not to complain about merge
conflicts as they will be handled manually.The cvs2svn changeover occurred
on June 3, 2008. When performing vendor merges for
packages which were already present and converted by the
cvs2svn process, the command used to
merge
/vendor/package_name/dist
to
/head/package_location
(for example,
head/contrib/sendmail) must use the
-c option to
indicate the revision to merge from the
/vendor tree. For example:&prompt.user; svn checkout svn+ssh://repo.freebsd.org/base/head/contrib/sendmail
&prompt.user; cd sendmail
&prompt.user; svn merge -c r261190 '^/vendor/sendmail/dist' .^ is an alias for the
repository path.If using the Zsh shell,
the ^ must be escaped with
\ or quoted.It is necessary to resolve any merge conflicts.Make sure that any files that were added or removed in
the vendor tree have been properly added or removed in the
main tree. To check diffs against the vendor
branch:&prompt.user; svn diff --no-diff-deleted --old=svn+ssh://repo.freebsd.org/base/vendor/pf/dist --new=.The --no-diff-deleted tells
Subversion not to complain about files that are in the
vendor tree but not in the main tree. These are things that
would previously have been removed before the vendor
import, like the vendor's makefiles
and configure scripts.Using CVS, once a file was off the
vendor branch, it was not able to be put back. With
Subversion, there is no concept of on or off the vendor
branch. If a file previously had local
modifications, all that has to be done to keep it from
showing up in diffs against the vendor tree is to remove any
left-over cruft like &os; version tags, which is much
easier.If any changes are required for the world to build
with the new sources, make them now, and keep testing
until everything builds and runs perfectly.Committing the Vendor ImportCommitting is now possible! Everything must be
committed in one go. If done properly, the tree will move
from a consistent state with old code to a consistent
state with new code.From ScratchImporting into the Vendor TreeThis section is an example of importing and tagging
byacc into
head.First, prepare the directory in
vendor:&prompt.user; svn co --depth immediates $FSVN/vendor
&prompt.user; cd vendor
&prompt.user; svn mkdir byacc
&prompt.user; svn mkdir byacc/distNow, import the sources into the
dist directory.
Once the files are in place, svn add
the new ones, then svn commit and tag
the imported version. To save time and bandwidth,
direct remote committing and tagging is possible:&prompt.user; svn cp -m "Tag byacc 20120115" $FSVN/vendor/byacc/dist $FSVN/vendor/byacc/20120115Merging to headDue to this being a new file, copy it for the
merge:&prompt.user; svn cp -m "Import byacc to contrib" $FSVN/vendor/byacc/dist $FSVN/head/contrib/byaccWorking normally on newly imported sources is still
possible.Reverting a CommitReverting a commit to a previous version is fairly
easy:&prompt.user; svn merge -r179454:179453 ROADMAP.txt
&prompt.user; svn commitChange number syntax, with negative meaning a reverse
change, can also be used:&prompt.user; svn merge -c -179454 ROADMAP.txt
&prompt.user; svn commitThis can also be done directly in the repository:&prompt.user; svn merge -r179454:179453 svn+ssh://repo.freebsd.org/base/ROADMAP.txtIt is important to ensure that the mergeinfo
is correct when reverting a file to permit
svn mergeinfo --eligible to work as
expected.Reverting the deletion of a file is slightly different.
Copying the version of the file that predates the deletion
is required. For example, to restore a file that was
deleted in revision N, restore version N-1:&prompt.user; svn copy svn+ssh://repo.freebsd.org/base/ROADMAP.txt@179454
&prompt.user; svn commitor, equally:&prompt.user; svn copy svn+ssh://repo.freebsd.org/base/ROADMAP.txt@179454 svn+ssh://repo.freebsd.org/baseDo not simply recreate the file
manually and svn add it—this will
cause history to be lost.Fixing MistakesWhile we can do surgery in an emergency, do not plan on
having mistakes fixed behind the scenes. Plan on mistakes
remaining in the logs forever. Be sure to check the output
of svn status and svn
diff before committing.Mistakes will happen, but
they can generally be fixed without
disruption.Take the case of a file added in the wrong location. The
right thing to do is to svn move the file
to the correct location and commit. This causes just a
couple of lines of metadata in the repository journal, and
the logs are all linked up correctly.The wrong thing to do is to delete the file and then
svn add an independent copy in the
correct location. Instead of a couple of lines of text, the
repository journal grows an entire new copy of the file.
This is a waste.Using a Subversion MirrorThere is a serious disadvantage to this method: every
time something is to be committed, a
svn relocate to the main repository has
to be done, remembering to svn relocate
back to the mirror after the commit. Also, since
svn relocate only works between
repositories that have the same UUID, some hacking of the
local repository's UUID has to occur before it is possible
to start using it.Checkout from a MirrorCheck out a working copy from a mirror by
substituting the mirror's URL for
svn+ssh://repo.freebsd.org/base. This
can be an official mirror or a mirror maintained by using
svnsync.Setting up a svnsync
MirrorAvoid setting up a svnsync
mirror unless there is a very good reason for it. Most
of the time a git mirror is a better
alternative. Starting a fresh mirror from scratch takes
a long time.
Expect a minimum of 10 hours for high speed connectivity.
If international links are involved, expect this to take
four to ten times longer.One way to limit the time required is to grab a seed
file. It is large (~1GB) but will consume less
network traffic and take less time to fetch than svnsync
will.Extract the file and update it:&prompt.user; tar xf svnmirror-base-r261170.tar.xz
&prompt.user; svnsync sync file:///home/svnmirror/baseNow, set that up to run from &man.cron.8;, do
checkouts locally, set up a svnserve server for local
machines to talk to, etc.The seed mirror is set to fetch from
svn://svn.freebsd.org/base. The
configuration for the mirror is stored in
revprop 0 on the local mirror. To see
the configuration, try:&prompt.user; svn proplist -v --revprop -r 0 file:///home/svnmirror/baseUse svn propset to change
things.Committing High-ASCII DataFiles that have high-ASCII bits are
considered binary files in SVN, so the
pre-commit checks fail and indicate that the
mime-type property should be set to
application/octet-stream. However, the
use of this is discouraged, so please do not set it. The
best way is always to avoid high-ASCII
data, so that it can be read everywhere with any text editor.
If it is not avoidable, instead of changing the
mime-type, set the fbsd:notbinary
property with propset:&prompt.user; svn propset fbsd:notbinary yes foo.dataMaintaining a Project BranchA project branch is one that is synced to head (or
another branch), is used to develop a project, and is then committed
back to head. In SVN,
dolphin branching is used for this. A
dolphin branch is one that diverges for a
while and is finally committed back to the original branch.
During development, code migrates in one direction only (from
head to the branch). No code is committed back to head
until the end. After the branch is committed back at the
end, it is dead (although a new branch with the same name
can be created after the dead one is deleted).As per https://people.FreeBSD.org/~peter/svn_notes.txt,
work that is intended to be merged back into HEAD should be
in base/projects/. If the
work is beneficial to the &os; community in some way
but not intended to be merged directly back into HEAD then
the proper location is
base/user/username/.
This
page contains further details.To create a project branch:&prompt.user; svn copy svn+ssh://repo.freebsd.org/base/head svn+ssh://repo.freebsd.org/base/projects/spifTo merge changes from HEAD back into the project
branch:&prompt.user; cd copy_of_spif
&prompt.user; svn merge svn+ssh://repo.freebsd.org/base/head
&prompt.user; svn commitIt is important to resolve any merge conflicts before
committing.Some TipsIn commit logs etc., rev 179872 is
spelled r179872 as per convention.Speeding up svn is possible by adding these entries to
~/.ssh/config:Host *
ControlPath ~/.ssh/sockets/master-%l-%r@%h:%p
ControlMaster auto
ControlPersist yesand then typingmkdir ~/.ssh/socketsChecking out a working copy with a stock Subversion client
without &os;-specific patches
(OPTIONS_SET=FREEBSD_TEMPLATE) will mean
that $FreeBSD$ tags will not
be expanded. Once the correct version has been installed,
trick Subversion into expanding them like so:&prompt.user; svn propdel -R svn:keywords .
&prompt.user; svn revert -R .This will wipe out uncommitted patches.It is possible to automatically fill the "Sponsored by"
and "MFC after" commit log fields by setting
"freebsd-sponsored-by" and "freebsd-mfc-after" fields in the
"[miscellany]" section of the
~/.subversion/config configuration file.
For example:freebsd-sponsored-by = The FreeBSD Foundation
freebsd-mfc-after = 2 weeksSetup, Conventions, and TraditionsThere are a number of things to do as a new developer.
The first set of steps is specific to committers only. These
steps must be done by a mentor for those who are not
committers.For New CommittersThose who have been given commit rights to the &os;
repositories must follow these steps.Get mentor approval before committing each of these
changes!The .ent and
.xml files mentioned below exist in
the &os; Documentation Project SVN repository at
svn+ssh://repo.FreeBSD.org/doc/.New files that do not have the
FreeBSD=%H svn:keywords property will be rejected
when attempting to commit them to the repository. Be sure
to read
regarding adding and removing files. Verify that
~/.subversion/config contains the
necessary auto-props entries from
auto-props.txt mentioned
there.All src commits go to
&os.current; first before being merged to &os.stable;.
The &os.stable; branch must maintain
ABI and API
compatibility with earlier versions of that branch. Do
not merge changes that break this compatibility.Steps for New CommittersAdd an Author Entitydoc/head/share/xml/authors.ent
— Add an author entity. Later steps depend on this
entity, and missing this step will cause the
doc/ build to fail. This is a
relatively easy task, but remains a good first test of
version control skills.Update the List of Developers and
Contributorsdoc/head/en_US.ISO8859-1/articles/contributors/contrib.committers.xml
—
Add an entry to the Developers section
of the Contributors
List. Entries are sorted by last name.doc/head/en_US.ISO8859-1/articles/contributors/contrib.additional.xml
— Remove the entry from the
Additional Contributors section. Entries
are sorted by first name.Add a News Itemdoc/head/share/xml/news.xml
— Add an entry. Look for the other entries that
announce new committers and follow the format. Use the
date from the commit bit approval email from
core@FreeBSD.org.Add a PGP Keydoc/head/share/pgpkeys/pgpkeys.ent
and
doc/head/share/pgpkeys/pgpkeys-developers.xml
- Add your PGP or
GnuPG key. Those who do not yet have a
key should see .&a.des.email; has written a shell script
(doc/head/share/pgpkeys/addkey.sh) to
make this easier. See the README
file for more information.Use
doc/head/share/pgpkeys/checkkey.sh to
verify that keys meet minimal best-practices
standards.After adding and checking a key, add both updated
files to source control and then commit them. Entries in
this file are sorted by last name.It is very important to have a current
PGP/GnuPG key in
the repository. The key may be required for positive
identification of a committer. For example, the
&a.admins; might need it for account recovery. A
complete keyring of FreeBSD.org users is
available for download from https://www.FreeBSD.org/doc/pgpkeyring.txt.Update Mentor and Mentee Informationbase/head/share/misc/committers-repository.dot
— Add an entry to the current committers section,
where repository is
doc, ports, or
src, depending on the commit privileges
granted.Add an entry for each additional mentor/mentee
relationship in the bottom section.Generate a Kerberos
PasswordSee to generate or
set a Kerberos password for use with
other &os; services like the bug tracking database.Optional: Enable Wiki Account&os;
Wiki Account — A wiki account allows
sharing projects and ideas. Those who do not yet have an
account can follow instructions on the AboutWiki
Page to obtain one. Contact
wiki-admin@FreeBSD.org if you need help
with your Wiki account.Optional: Update Wiki InformationWiki Information - After gaining access to the wiki,
some people add entries to the How
We Got Here, IRC
Nicks, and
Dogs of FreeBSD pages.Optional: Update Ports with Personal
Informationports/astro/xearth/files/freebsd.committers.markers
and
src/usr.bin/calendar/calendars/calendar.freebsd
- Some people add entries for themselves to these files to
show where they are located or the date of their
birthday.Optional: Prevent Duplicate MailingsSubscribers to &a.svn-src-all.name;,
&a.svn-ports-all.name; or &a.svn-doc-all.name; might wish
to unsubscribe to avoid receiving duplicate copies of
commit messages and followups.For EveryoneIntroduce yourself to the other developers; otherwise
no one will have any idea who you are or what you are
working on. The introduction need not be a comprehensive
biography, just write a paragraph or two about who you
are, what you plan to be working on as a developer in
&os;, and who will be your mentor. Email this to the
&a.developers; and you will be on your way!Log into freefall.FreeBSD.org
and create a
/var/forward/user
(where user is your username)
file containing the e-mail address where you want mail
addressed to
yourusername@FreeBSD.org to be
forwarded. This includes all of the commit messages as
well as any other mail addressed to the &a.committers; and
the &a.developers;. Really large mailboxes which have
taken up permanent residence on
freefall may get truncated
without warning if space needs to be freed, so forward your mail
or save it elsewhere.If your e-mail system uses SPF with strict rules,
you should whitelist mx2.FreeBSD.org from
SPF checks.Due to the severe load dealing with SPAM places on the
central mail servers that do the mailing list processing,
the front-end server does do some basic checks and will
drop some messages based on these checks. At the moment
proper DNS information for the connecting host is the only
check in place but that may change. Some people blame
these checks for bouncing valid email. To have these
checks turned off for your email, create a file
named ~/.spam_lover
on freefall.FreeBSD.org.Those who are developers but not committers will
not be subscribed to the committers or developers mailing
lists. The subscriptions are derived from the access
rights.SMTP Access SetupTo send e-mail messages through the
FreeBSD.org infrastructure, follow the instructions
below:Point your mail client at
smtp.FreeBSD.org:587.Enable STARTTLS.Ensure your From: address is set
to
yourusername@FreeBSD.org.For authentication, you can use your &os; Kerberos
username and password (see ). The
yourusername/mail
principal is preferred, as it is only valid for
authenticating to mail resources.Do not include @FreeBSD.org
when entering your username.Additional NotesThe server will only accept mail from
yourusername@FreeBSD.org.
If you are authenticated as one user, you are not
permitted to send mail from another.A header will be appended with the SASL username:
(Authenticated sender:
username).The host has various rate limits in place to cut down
on brute force attempts.Using a Local MTA to Forward Emails to the
&os;.org SMTP ServiceIt is also possible to use a local
MTA to forward locally sent emails to
the &os;.org SMTP servers.Using PostfixTo tell a local Postfix instance that anything from
yourusername@FreeBSD.org
should be forwarded to the &os;.org servers, add this to
your main.cf:sender_dependent_relayhost_maps = hash:/usr/local/etc/postfix/relayhost_maps
smtp_sasl_auth_enable = yes
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps = hash:/usr/local/etc/postfix/sasl_passwd
smtp_use_tls = yesCreate
/usr/local/etc/postfix/relayhost_maps
with the following content:yourusername@FreeBSD.org [smtp.freebsd.org]:587Create
/usr/local/etc/postfix/sasl_passwd
with the following content:[smtp.freebsd.org]:587 yourusername:yourpasswordIf the email server is used by other people, you
may want to prevent them from sending e-mails from your
address. To achieve this, add this to your
main.cf:smtpd_sender_login_maps = hash:/usr/local/etc/postfix/sender_login_maps
smtpd_sender_restrictions = reject_known_sender_login_mismatchCreate
/usr/local/etc/postfix/sender_login_maps
with the following content:yourusername@FreeBSD.org yourlocalusernameWhere yourlocalusername
is the SASL username used to connect
to the local instance of
Postfix.MentorsAll new developers have a mentor assigned to them for
the first few months. A mentor is responsible for teaching
the mentee the rules and conventions of the project and
guiding their first steps in the developer community. The
mentor is also personally responsible for the mentee's actions
during this initial period.For committers: do not commit anything without first
getting mentor approval. Document that approval with an
Approved by: line in the commit
message.When the mentor decides that a mentee has learned the
ropes and is ready to commit on their own, the mentor
announces it with a commit to
conf/mentors. This file is in the
svnadmin branch of each
repository:srcbase/svnadmin/conf/mentorsdocdoc/svnadmin/conf/mentorsportsports/svnadmin/conf/mentorsNew committers should aim to complete enough commits that
their mentor is comfortable releasing them from mentorship
within the first year. If they are still under mentorship, the
appropriate management body (core, doceng, or portmgr) should
attempt to ensure that there are no barriers preventing
completion. If the committer is unable to satisfy their mentor
of their readiness within a year and a half, their commit bit may be
converted to project membership.Pre-Commit ReviewCode review is one way to increase the quality of software.
The following guidelines apply to commits to the
head (-CURRENT) branch of the
src repository. Other branches and the
ports and docs trees have
their own review policies, but these guidelines generally apply
to commits requiring review:All non-trivial changes should be reviewed before they
are committed to the repository.Reviews may be conducted by email, in
Bugzilla, in
Phabricator, or by another
mechanism. Where possible, reviews should be public.The developer responsible for a code change is also
responsible for making all necessary review-related
changes.Code review can be an iterative process, which continues
until the patch is ready to be committed. Specifically,
once a patch is sent out for review, it should receive an
explicit looks good before it is committed.
So long as it is explicit, this can take whatever form makes
sense for the review method.Timeouts are not a substitute for review.Sometimes code reviews will take longer than you would hope
for, especially for larger features. Accepted ways to speed up
review times for your patches are:Review other people's patches. If you help out,
everybody will be more willing to do the same for you;
goodwill is our currency.Ping the patch. If it is urgent, provide reasons why
it is important to you to get this patch landed and ping
it every couple of days. If it is not urgent, the common
courtesy ping rate is one week. Remember that you are
asking for valuable time from other professional
developers.Ask for help on mailing lists, IRC, etc. Others
may be able to either help you directly, or suggest a
reviewer.Split your patch into multiple smaller patches that
build on each other. The smaller your patch, the higher
the probability that somebody will take a quick look at
it.When making large changes, it is helpful to keep this
in mind from the beginning of the effort as breaking large
changes into smaller ones is often difficult after the
fact.Developers should participate in code reviews as both
reviewers and reviewees. If someone is kind enough to review
your code, you should return the favor for someone else.
Note that while anyone is welcome to review and give feedback
on a patch, only an appropriate subject-matter expert can
approve a change. This will usually be a committer who works
with the code in question on a regular basis.In some cases, no subject-matter expert may be available.
In those cases, a review by an experienced developer is
sufficient when coupled with appropriate testing.Commit Log MessagesThis section contains some suggestions and traditions for
how commit logs are formatted.As well as including an informative message with each
commit, some additional information may be needed.This information consists of one or more lines
containing the key word or phrase, a colon, tabs for formatting,
and then the additional information.The key words or phrases are:PR:The problem report (if any) which is affected
(typically, by being closed) by this commit.
Multiple PRs may be specified on one line, separated by
commas or spaces.Submitted by:The name and e-mail address of the person
that submitted the fix; for developers, just the
username on the &os; cluster.If the submitter is the maintainer of the port
being committed, include "(maintainer)"
after the email address.Avoid obfuscating the email address of the
submitter as this adds additional work when searching
logs.Reviewed by:The name and e-mail address of the person or
people that reviewed the change; for developers,
just the username on the &os; cluster. If a
patch was submitted to a mailing list for review,
and the review was favorable, then just include
the list name.Approved by:The name and e-mail address of the person or
people that approved the change; for developers, just
the username on the &os; cluster. It is customary to
get prior approval for a commit if it is to an area of
the tree to which you do not usually commit. In
addition, during the run up to a new release all commits
must be approved by the release
engineering team.While under mentorship, get mentor approval before
the commit. Enter the mentor's username in this field,
and note that they are a mentor:Approved by: username-of-mentor (mentor)If a team approved these commits, then include the
team name followed by the username of the approver in
parentheses. For example:Approved by: re (username)Obtained from:The name of the project (if any) from which
the code was obtained. Do not use this line for the
name of an individual person.Sponsored by:Sponsoring organizations for this change, if any.
Separate multiple organizations with commas. If only a
portion of the work was sponsored, or different amounts
of sponsorship were provided to different authors,
please give appropriate credit in parentheses after each
sponsor name. For example, Example.com (alice,
code refactoring), Wormulon (bob), Momcorp
(cindy) shows that Alice was sponsored by
Example.com to do code refactoring, while Wormulon
sponsored Bob's work and Momcorp sponsored Cindy's work.
Other authors were either not sponsored or chose not to
list sponsorship.MFC after:To receive an e-mail reminder to
MFC at a later date, specify the
number of days, weeks, or months after which an
MFC is planned.MFC to:If the commit should be merged to a subset of
stable branches, specify the branch names.MFC with:If the commit should be merged together with
a previous one in a single
MFC commit (for example, where
this commit corrects a bug in the previous change),
specify the corresponding revision number.Relnotes:If the change is a candidate for inclusion in
the release notes for the next release from the branch,
set to yes.Security:If the change is related to a security
vulnerability or security exposure, include one or more
references or a description of the issue. If possible,
include a VuXML URL or a CVE ID.Event:The description for the event where this commit was
made. If this is a recurring event, add the year or
even the month to it. For example, this could be
FooBSDcon 2019. The idea behind this
line is to put recognition to conferences, gatherings,
and other types of meetups and to show that these are
useful to have. Please do not use the
Sponsored by: line for this as that
is meant for organizations sponsoring certain features
or developers working on them.Differential Revision:The full URL of the Phabricator review. This line
must be the last line. For
example:
https://reviews.freebsd.org/D1708.Commit Log for a Commit Based on a PRThe commit is based on a patch from a PR submitted by John
Smith. The commit message PR and
Submitted by fields are filled.....
PR: 12345
Submitted by: John Smith <John.Smith@example.com>Commit Log for a Commit Needing ReviewThe virtual memory system is being changed. After
posting patches to the appropriate mailing list (in this
case, freebsd-arch), the changes have
been approved....
Reviewed by: -archCommit Log for a Commit Needing ApprovalCommit a port, after working with
the listed MAINTAINER, who said to go ahead and
commit....
Approved by: abc (maintainer)Where abc is the account name
of the person who approved.Commit Log for a Commit Bringing in Code from
OpenBSDCommitting some code based on work done in the
OpenBSD project....
Obtained from: OpenBSDCommit Log for a Change to &os.current; with a Planned
Commit to &os.stable; to Follow at a Later Date.Committing some code which will be merged from
&os.current; into the &os.stable; branch after two
weeks....
MFC after: 2 weeksWhere 2 is the number of days,
weeks, or months after which an MFC is
planned. The unit may be
day, days,
week, weeks,
month, or months.It is often necessary to combine these.Consider the situation where a user has submitted a PR
containing code from the NetBSD project. Looking at the PR, the
developer sees it is not an area of the tree they normally work
in, so they have the change reviewed by the
arch mailing list. Since the change is
complex, the developer opts to MFC after one
month to allow adequate testing.The extra information to include in the commit would look
something likeExample Combined Commit LogPR: 54321
Submitted by: John Smith <John.Smith@example.com>
Reviewed by: -arch
Obtained from: NetBSD
MFC after: 1 month
Relnotes: yesPreferred License for New FilesThe &os; Project's full license policy can be found at https://www.FreeBSD.org/internal/software-license.html.
The rest of this section is intended to help you get started.
As a rule, when in doubt, ask. It is much easier to give advice
than to fix the source tree.The &os; Project suggests and uses this
text as the preferred license scheme:/*-
* SPDX-License-Identifier: BSD-2-Clause-FreeBSD
*
* Copyright (c) [year] [your name]
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* [id for your version control system, if any]
*/The &os; project strongly discourages the so-called
"advertising clause" in new code. Due to the large number of
contributors to the &os; project, complying with this clause for
many commercial vendors has become difficult. If you have code
in the tree with the advertising clause, please consider
removing it. In fact, please consider using the above license
for your code.The &os; project discourages completely new licenses and
variations on the standard licenses. New licenses require the
approval of the &a.core; to reside in the
main repository. The more different licenses that are used in
the tree, the more problems that this causes to those wishing to
utilize this code, typically from unintended consequences from a
poorly worded license.Project policy dictates that code under some non-BSD
licenses must be placed only in specific sections of the
repository, and in some cases, compilation must be conditional
or even disabled by default. For example, the GENERIC kernel
must be compiled under only licenses identical to or
substantially similar to the BSD license. GPL, APSL, CDDL, etc.,
licensed software must not be compiled into GENERIC.Developers are reminded that in open source, getting "open"
right is just as important as getting "source" right, as
improper handling of intellectual property has serious
consequences. Any questions or concerns should immediately be
brought to the attention of the core team.Keeping Track of Licenses Granted to the &os;
ProjectVarious software or data exist in the repositories where
the &os; project has been granted a special licence to be able
to use them. A case in point are the Terminus fonts for use
with &man.vt.4;. Here the author Dimitar Zhekov has allowed us
to use the "Terminus BSD Console" font under a 2-clause BSD
license rather than the regular Open Font License he normally
uses.It is clearly sensible to keep a record of any such
license grants. To that end, the &a.core; has decided to keep
an archive of them. Whenever the &os; project is granted a
special license we require the &a.core; to be notified. Any
developers involved in arranging such a license grant, please
send details to the &a.core; including:Contact details for people or organizations granting the
special license.What files, directories etc. in the repositories are
covered by the license grant including the revision numbers
where any specially licensed material was committed.The date the license comes into effect. Unless
otherwise agreed, this will be the date the license was
issued by the authors of the software in question.The license text.A note of any restrictions, limitations or exceptions
that apply specifically to &os;'s usage of the licensed
material.Any other relevant information.Once the &a.core; is satisfied that all the necessary
details have been gathered and are correct, the secretary will
send a PGP-signed acknowledgement of receipt including the
license details. This receipt will be persistently archived and
serve as our permanent record of the license grant.The license archive should contain only details of license
grants; this is not the place for any discussions around
licensing or other subjects. Access to data within the license
archive will be available on request to the &a.core;.Developer RelationsWhen working directly on your own code or on code which is
already well established as your responsibility, then there is
probably little need to check with other committers before
jumping in with a commit. The same applies when working on a bug
in an area of the system which is clearly orphaned (and there
are a few such areas, to our shame). When modifying
parts of the system which are maintained, formally, or
informally, consider asking for review just as a developer
would have before becoming a
committer. For ports, contact the listed
MAINTAINER in the
Makefile.To determine if an area of the tree is maintained, check the
MAINTAINERS file at the root of the tree. If nobody is listed,
scan the revision history to see who has committed
changes in the past. An example script that lists each person
who has committed to a given file along with the number of
commits each person has made can be found on
freefall at
~eadler/bin/whodid. If queries go
unanswered or the committer otherwise indicates a lack of
interest in the area affected, go ahead and commit it.Avoid sending private emails to maintainers. Other people
might be interested in the conversation, not just the final
output.If there is any doubt about a commit for any reason at all,
have it reviewed before
committing. Better to have it flamed then and there rather than
when it is part of the repository. If a commit does result in
controversy erupting, it may be advisable to consider backing
the change out again until the matter is settled. Remember,
with a version control system we can always change it
back.Do not impugn the intentions of others. If they see a
different solution to a problem, or even a different problem, it
is probably not because they are stupid, because they have
questionable parentage, or because they are trying to destroy
hard work, personal image, or &os;, but basically because they
have a different outlook on the world. Different is
good.Disagree honestly. Argue your position from its merits,
be honest about any shortcomings it may have, and be open to
seeing their solution, or even their vision of the problem,
with an open mind.Accept correction. We are all fallible. When you have made
a mistake, apologize and get on with life. Do not beat up
yourself, and certainly do not beat up others for your mistake.
Do not waste time on embarrassment or recrimination, just fix
the problem and move on.Ask for help. Seek out (and give) peer reviews. One of
the ways open source software is supposed to excel is in the
number of eyeballs applied to it; this does not apply if nobody
will review code.If in Doubt...When unsure about something, whether it be a
technical issue or a project convention, be sure to ask. If you
stay silent you will never make progress.If it relates to a technical issue ask on the public
mailing lists. Avoid the temptation to email the individual
person that knows the answer. This way everyone will be able to
learn from the question and the answer.For project specific or administrative questions
ask, in order:Your mentor or former mentor.An experienced committer on IRC, email, etc.Any team with a "hat", as they can give you a
definitive answer.If still not sure, ask on &a.developers;.Once your question is answered, if no one pointed you to
documentation that spelled out the answer to your question,
document it, as others will have the same question.BugzillaThe &os; Project utilizes
Bugzilla for tracking bugs and change
requests. If you commit a fix or suggestion found
in the PR database, be sure to close the PR. It is also considered nice if
you take time to close any PRs associated with your commits, if
appropriate.Committers with
non-&os;.org
Bugzilla accounts can have the old account merged with the
&os;.org account by
following these steps:Log in using your old account.Open a new bug. Choose Services as the
Product, and Bug Tracker as the
Component. In the bug description, list the accounts you wish to be
merged.Log in using your &os;.org account and post a
comment to the newly opened bug to confirm ownership. See for more details on how to
generate or set a password for your &os;.org account.If there are more than two accounts to merge, post
comments from each of them.You can find out more about
Bugzilla at:&os;
Problem Report Handling Guidelineshttps://www.FreeBSD.org/support.htmlPhabricatorThe &os; Project utilizes Phabricator
for code review requests. See the CodeReview
wiki page for details.Committers with
non-&os;.org
Phabricator accounts can have the old account renamed to the
&os;.org account by
following these steps:Change your Phabricator
account email to your &os;.org email.Open a new bug on our bug tracker using your &os;.org account, see
for more information. Choose
Services as the Product, and
Code Review as the Component. In the bug
description, request that your
Phabricator account be renamed,
and provide a link to your
Phabricator user. For example,
https://reviews.freebsd.org/p/bob_example.com/Phabricator accounts cannot be
merged; please do not open a new account.Who's WhoBesides the repository meisters, there are other &os;
project members and teams whom you will probably get to know in
your role as a committer. Briefly, and by no means
all-inclusively, these are:&a.doceng;doceng is the group responsible for the documentation
build infrastructure, approving new documentation
committers, and ensuring that the &os; website and
documentation on the FTP site are up to date with respect
to the Subversion tree. It is
not a conflict resolution body.
The vast majority of documentation related discussion
takes place on the &a.doc;. More details regarding the
doceng team can be found in its charter.
Committers interested in contributing to the documentation
should familiarize themselves with the Documentation
Project Primer.&a.re.members.email;These are the members of the &a.re;. This team is
responsible for setting release deadlines and controlling
the release process. During code freezes, the release
engineers have final authority on all changes to the
system for whichever branch is pending release status. If
there is something you want merged from &os.current; to
&os.stable; (whatever values those may have at any given
time), these are the people to talk to about it.&a.so.email;&a.so; is the
&os; Security
Officer and oversees the
&a.security-officer;.&a.wollman.email;If you need advice on obscure network internals or
are not sure of some potential change to the networking
subsystem you have in mind, Garrett is someone to talk
to. Garrett is also very knowledgeable on the various
standards applicable to &os;.&a.committers;&a.svn-src-all.name;, &a.svn-ports-all.name; and
&a.svn-doc-all.name; are the mailing lists that the
version control system uses to send commit messages to.
Never send email directly
to these lists. Only send replies to this list
when they are short and are directly related to a
commit.&a.developers;All committers are subscribed to -developers. This
list was created to be a forum for the committers
community's issues. Examples are Core
voting, announcements, etc.The &a.developers; is for the exclusive use of &os;
committers. To develop &os;, committers must
have the ability to openly discuss matters that will be
resolved before they are publicly announced. Frank
discussions of work in progress are not suitable for open
publication and may harm &os;.All &os; committers are expected not to
publish or forward messages from the
&a.developers; outside the list membership without
permission of all of the authors. Violators will be
removed from the
&a.developers;, resulting in a suspension of commit
privileges. Repeated or flagrant violations may result in
permanent revocation of commit privileges.This list is not intended as a
place for code reviews or for any technical discussion.
In fact using it as such hurts the &os; Project as it
gives a sense of a closed list where general decisions
affecting all of the &os;-using community are made without
being open. Last, but not least
never, never ever, email the &a.developers; and
CC:/BCC: another &os; list. Never, ever email
another &os; email list and CC:/BCC: the &a.developers;.
Doing so can greatly diminish the benefits of this
list.SSH Quick-Start GuideIf you do not wish to type your password in every time
you use &man.ssh.1;, and you use keys to
authenticate, &man.ssh-agent.1; is there for your
convenience. If you want to use &man.ssh-agent.1;, make
sure that you run it before running other applications. X
users, for example, usually do this from their
.xsession or
.xinitrc. See &man.ssh-agent.1; for
details.Generate a key pair using &man.ssh-keygen.1;. The key
pair will wind up in your
$HOME/.ssh/
directory.Only ECDSA,
Ed25519 or RSA keys
are supported.Send your public key
($HOME/.ssh/id_ecdsa.pub,
$HOME/.ssh/id_ed25519.pub, or
$HOME/.ssh/id_rsa.pub)
to the person setting you up as a committer so it can be put
into
yourlogin
in
/etc/ssh-keys/ on
freefall.Now &man.ssh-add.1; can be used for
authentication once per session. It prompts for
the private key's pass phrase, and then stores it in the
authentication agent (&man.ssh-agent.1;). Use ssh-add
-d to remove keys stored in the agent.Test with a simple remote command: ssh
freefall.FreeBSD.org ls /usr.For more information, see
security/openssh-portable,
&man.ssh.1;, &man.ssh-add.1;, &man.ssh-agent.1;,
&man.ssh-keygen.1;, and &man.scp.1;.For information on adding, changing, or removing &man.ssh.1;
keys, see this
article.&coverity; Availability for &os; CommittersAll &os; developers can obtain access to
Coverity analysis results of all &os;
Project software. All who are interested in obtaining access to
the analysis results of the automated
Coverity runs can sign up at Coverity
Scan.The &os; wiki includes a mini-guide for developers who are
interested in working with the &coverity; analysis reports: https://wiki.freebsd.org/CoverityPrevent.
Please note that this mini-guide is only readable by &os;
developers, so if you cannot access this page, you will have to
ask someone to add you to the appropriate Wiki access
list.Finally, all &os; developers who are going to use
&coverity; are always encouraged to ask for more details and
usage information by posting any questions to the mailing list
of the &os; developers.The &os; Committers' Big List of RulesEveryone involved with the &os; project is expected to
abide by the Code of Conduct available from
https://www.FreeBSD.org/internal/code-of-conduct.html.
As committers, you form the public face of the project, and how
you behave has a vital impact on the public perception of it.
This guide expands on the parts of the
Code of Conduct specific to
committers.Respect other committers.Respect other contributors.Discuss any significant change
before committing.Respect existing maintainers (if listed in the
MAINTAINER field in
Makefile or in
MAINTAINER in the top-level
directory).Any disputed change must be backed out pending
resolution of the dispute if requested by a maintainer.
Security related changes may override a maintainer's wishes
at the Security Officer's discretion.Changes go to &os.current; before &os.stable; unless
specifically permitted by the release engineer or unless
they are not applicable to &os.current;. Any non-trivial or
non-urgent change which is applicable should also be allowed
to sit in &os.current; for at least 3 days before merging so
that it can be given sufficient testing. The release
engineer has the same authority over the &os.stable; branch
as outlined for the maintainer in rule #5.Do not fight in public with other committers; it looks
bad.Respect all code freezes and read the
committers and
developers mailing lists in a timely
manner so you know when a code freeze is in effect.When in doubt on any procedure, ask first!Test your changes before committing them.Do not commit to contributed software without
explicit approval from the respective
maintainers.As noted, breaking some of these rules can be grounds for
suspension or, upon repeated offense, permanent removal of
commit privileges. Individual members of core have the power to
temporarily suspend commit privileges until core as a whole has
the chance to review the issue. In case of an
emergency (a committer doing damage to the
repository), a temporary suspension may also be done by the
repository meisters. Only a 2/3 majority of core has the
authority to suspend commit privileges for longer than a week or
to remove them permanently. This rule does not exist to set
core up as a bunch of cruel dictators who can dispose of
committers as casually as empty soda cans, but to give the
project a kind of safety fuse. If someone is out of control, it
is important to be able to deal with this immediately rather
than be paralyzed by debate. In all cases, a committer whose
privileges are suspended or revoked is entitled to a
hearing by core, the total duration of the
suspension being determined at that time. A committer whose
privileges are suspended may also request a review of the
decision after 30 days and every 30 days thereafter (unless the
total suspension period is less than 30 days). A committer
whose privileges have been revoked entirely may request a review
after a period of 6 months has elapsed. This review policy is
strictly informal and, in all cases, core
reserves the right to either act on or disregard requests for
review if they feel their original decision to be the right
one.In all other aspects of project operation, core is a subset
of committers and is bound by the
same rules. Just because someone is in
core this does not mean that they have special dispensation to
step outside any of the lines painted here; core's
special powers only kick in when it acts as a
group, not on an individual basis. As individuals, the core
team members are all committers first and core second.DetailsRespect other committers.This means that you need to treat other committers as
the peer-group developers that they are. Despite our
occasional attempts to prove the contrary, one does not
get to be a committer by being stupid, and nothing rankles
more than being treated that way by one of your peers.
Whether we always feel respect for one another or not (and
everyone has off days), we still have to
treat other committers with respect
at all times, on public forums and in private
email.Being able to work together long term is this
project's greatest asset, one far more important than any
set of changes to the code, and turning arguments about
code into issues that affect our long-term ability to work
harmoniously together is just not worth the trade-off by
any conceivable stretch of the imagination.To comply with this rule, do not send email when you
are angry or otherwise behave in a manner which is likely
to strike others as needlessly confrontational. First
calm down, then think about how to communicate in the most
effective fashion for convincing the other persons that
your side of the argument is correct, do not just blow off
some steam so you can feel better in the short term at the
cost of a long-term flame war. Not only is this very bad
energy economics, but repeated displays of
public aggression which impair our ability to work well
together will be dealt with severely by the project
leadership and may result in suspension or termination of
your commit privileges. The project leadership will take
into account both public and private communications
brought before it. It will not seek the disclosure of
private communications, but it will take it into account
if it is volunteered by the committers involved in the
complaint.All of this is never an option which the project's
leadership enjoys in the slightest, but unity comes first.
No amount of code or good advice is worth trading that
away.Respect other contributors.You were not always a committer. At one time you were
a contributor. Remember that at all times. Remember what
it was like trying to get help and attention. Do not
forget that your work as a contributor was very important
to you. Remember what it was like. Do not discourage,
belittle, or demean contributors. Treat them with
respect. They are our committers in waiting. They are
every bit as important to the project as committers.
Their contributions are as valid and as important as your
own. After all, you made many contributions before you
became a committer. Always remember that.Consider the points raised under
and apply them also to
contributors.Discuss any significant change
before committing.The repository is not where changes are initially
submitted for correctness or argued over, that happens
first in the mailing lists or by use of the Phabricator
service. The commit will only happen once something
resembling consensus has been reached. This does not mean
that permission is required before correcting every
obvious syntax error or manual page misspelling, just that
it is good to develop a feel for when a proposed change is
not quite such a no-brainer and requires some feedback
first. People really do not mind sweeping changes if the
result is something clearly better than what they had
before, they just do not like being
surprised by those changes. The very
best way of making sure that things are on the right track
is to have code reviewed by one or more other
committers.When in doubt, ask for review!Respect existing maintainers if listed.Many parts of &os; are not owned in
the sense that any specific individual will jump up and
yell if you commit a change to their area,
but it still pays to check first. One convention we use
is to put a maintainer line in the
Makefile for any package or subtree
which is being actively maintained by one or more people;
see https://www.FreeBSD.org/doc/en_US.ISO8859-1/books/developers-handbook/policies.html
for documentation on this. Where sections of code have
several maintainers, commits to affected areas by one
maintainer need to be reviewed by at least one other
maintainer. In cases where the
maintainer-ship of something is not clear,
look at the repository logs for the files
in question and see if someone has been working recently
or predominantly in that area.Any disputed change must be backed out pending
resolution of the dispute if requested by a maintainer.
Security related changes may override a maintainer's
wishes at the Security Officer's discretion.This may be hard to swallow in times of conflict (when
each side is convinced that they are in the right, of
course) but a version control system makes it unnecessary
to have an ongoing dispute raging when it is far easier to
simply reverse the disputed change, get everyone calmed
down again and then try to figure out what is the best way
to proceed. If the change turns out to be the best thing
after all, it can be easily brought back. If it turns out
not to be, then the users did not have to live with the
bogus change in the tree while everyone was busily
debating its merits. People very
rarely call for back-outs in the repository since
discussion generally exposes bad or controversial changes
before the commit even happens, but on such rare occasions
the back-out should be done without argument so that we
can get immediately on to the topic of figuring out
whether it was bogus or not.Changes go to &os.current; before &os.stable; unless
specifically permitted by the release engineer or unless
they are not applicable to &os.current;. Any non-trivial
or non-urgent change which is applicable should also be
allowed to sit in &os.current; for at least 3 days before
merging so that it can be given sufficient testing. The
release engineer has the same authority over the
&os.stable; branch as outlined in rule #5.This is another do not argue about it
issue since it is the release engineer who is ultimately
responsible (and gets beaten up) if a change turns out to
be bad. Please respect this and give the release engineer
your full cooperation when it comes to the &os.stable;
branch. The management of &os.stable; may frequently seem
to be overly conservative to the casual observer, but also
bear in mind the fact that conservatism is supposed to be
the hallmark of &os.stable; and different rules apply
there than in &os.current;. There is also really no point
in having &os.current; be a testing ground if changes are
merged over to &os.stable; immediately. Changes need a
chance to be tested by the &os.current; developers, so
allow some time to elapse before merging unless the
&os.stable; fix is critical, time sensitive or so obvious
as to make further testing unnecessary (spelling fixes to
manual pages, obvious bug/typo fixes, etc.) In other
words, apply common sense.Changes to the security branches (for example,
releng/9.3) must be approved by a
member of the &a.security-officer;, or in some cases, by a
member of the &a.re;.Do not fight in public with other committers; it looks
bad.This project has a public image to uphold and that
image is very important to all of us, especially if we are
to continue to attract new members. There will be
occasions when, despite everyone's very best attempts at
self-control, tempers are lost and angry words are
exchanged. The best thing that can be done in such cases
is to minimize the effects of this until everyone has
cooled back down. Do not air
angry words in public and do not forward private
correspondence or other private communications to public
mailing lists, mail aliases, instant messaging channels or
social media sites. What people say one-to-one is often
much less sugar-coated than what they would say in public,
and such communications therefore have no place there -
they only serve to inflame an already bad situation. If
the person sending a flame-o-gram at least had the
grace to send it privately, then have the grace to keep it
private yourself. If you feel you are being unfairly
treated by another developer, and it is causing you
anguish, bring the matter up with core rather than taking
it public. Core will do its best to play peace makers and
get things back to sanity. In cases where the dispute
involves a change to the codebase and the participants do
not appear to be reaching an amicable agreement, core may
appoint a mutually-agreeable third party to resolve the
dispute. All parties involved must then agree to be bound
by the decision reached by this third party.Respect all code freezes and read the
committers and
developers mailing list on a timely
basis so you know when a code freeze is in effect.Committing unapproved changes during a code freeze is
a really big mistake and committers are expected to keep
up-to-date on what is going on before jumping in after a
long absence and committing 10 megabytes worth of
accumulated stuff. People who abuse this on a regular
basis will have their commit privileges suspended until
they get back from the &os; Happy Reeducation Camp we
run in Greenland.When in doubt on any procedure, ask first!Many mistakes are made because someone is in a hurry
and just assumes they know the right way of doing
something. If you have not done it before, chances are
good that you do not actually know the way we do things
and really need to ask first or you are going to
completely embarrass yourself in public. There is no
shame in asking
how in the heck do I do this? We already
know you are an intelligent person; otherwise, you would
not be a committer.Test your changes before committing them.This may sound obvious, but if it really were so
obvious then we probably would not see so many cases of
people clearly not doing this. If your changes are to the
kernel, make sure you can still compile both GENERIC and
LINT. If your changes are anywhere else, make sure you
can still make world. If your changes are to a branch,
make sure your testing occurs with a machine which is
running that code. If you have a change which also may
break another architecture, be sure and test on all
supported architectures. Please refer to the
&os;
Internal Page for a list of available resources.
As other architectures are added to the &os; supported
platforms list, the appropriate shared testing resources
will be made available.Do not commit to contributed software without
explicit approval from the respective
maintainers.Contributed software is anything under the
src/contrib,
src/crypto, or
src/sys/contrib trees.The trees mentioned above are for contributed software
usually imported onto a vendor branch. Committing
something there may cause unnecessary headaches
when importing newer versions of the software. As a
general rule, consider sending patches upstream to the vendor.
Patches may be committed to FreeBSD first with permission
of the maintainer.Reasons for modifying upstream software range from
wanting strict control over a tightly coupled dependency
to lack of portability in the canonical repository's
distribution of their code. Regardless of the reason,
effort to minimize the maintenance burden of a fork is
helpful to fellow maintainers. Avoid committing trivial
or cosmetic changes to files since it makes every merge
thereafter more difficult: such patches need to be
manually re-verified at every import.If a particular piece of software lacks a maintainer,
you are encouraged to take up ownership. If you are unsure
of the current maintainership email &a.arch; and
ask.Policy on Multiple Architectures&os; has added several new architecture ports during
recent release cycles and is truly no longer an &i386; centric
operating system. In an effort to make it easier to keep
&os; portable across the platforms we support, core has
developed this mandate:
Our 32-bit reference platform is &arch.i386;, and our
64-bit reference platform is &arch.amd64;. Major design
work (including major API and ABI changes) must prove
itself on at least one 32-bit and at least one 64-bit
platform, preferably the primary reference platforms,
before it may be committed to the source tree.
Synchronous Serial TransmissionSynchronous serial transmission requires that the sender
and receiver share a clock with one another, or that the
sender provide a strobe or other timing signal so that the
receiver knows when to read the next bit of
the data. In most forms of serial Synchronous
communication, if there is no data available at a given
instant to transmit, a fill character must be sent instead
so that data is always being transmitted. Synchronous
communication is usually more efficient because only data
bits are transmitted between sender and receiver, but
synchronous communication can be more costly if extra wiring
and circuits are required to share a clock signal between
the sender and receiver.A form of Synchronous transmission is used with printers
and fixed disk devices in that the data is sent on one set
of wires while a clock or strobe is sent on a different
wire. Printers and fixed disk devices are not normally
serial devices because most fixed disk interface standards
send an entire word of data for each clock or strobe signal
by using a separate wire for each bit of the word. In the
PC industry, these are known as Parallel devices.The standard serial communications hardware in the PC
does not support Synchronous operations. This mode is
described here for comparison purposes only.Asynchronous Serial TransmissionAsynchronous transmission allows data to be transmitted
without the sender having to send a clock signal to the
receiver. Instead, the sender and receiver must agree on
timing parameters in advance and special bits are added to
each word which are used to synchronize the sending and
receiving units.When a word is given to the UART for Asynchronous
transmissions, a bit called the "Start Bit" is added to the
beginning of each word that is to be transmitted. The Start
Bit is used to alert the receiver that a word of data is
about to be sent, and to force the clock in the receiver
into synchronization with the clock in the transmitter.
These two clocks must be accurate enough to not have the
frequency drift by more than 10% during the transmission of
the remaining bits in the word. (This requirement was set
in the days of mechanical teleprinters and is easily met by
modern electronic equipment.)After the Start Bit, the individual bits of the word of
data are sent, with the Least Significant Bit (LSB) being
sent first. Each bit in the transmission is transmitted for
exactly the same amount of time as all of the other bits,
and the receiver looks at the wire at
approximately halfway through the period assigned to each
bit to determine if the bit is a 1 or a
0. For example, if it takes two seconds
to send each bit, the receiver will examine the signal to
determine if it is a 1 or a
0 after one second has passed, then it
will wait two seconds and then examine the value of the next
bit, and so on.The sender does not know when the receiver has
looked at the value of the bit. The sender
only knows when the clock says to begin transmitting the
next bit of the word.When the entire data word has been sent, the transmitter
may add a Parity Bit that the transmitter generates. The
Parity Bit may be used by the receiver to perform simple
error checking. Then at least one Stop Bit is sent by the
transmitter.When the receiver has received all of the bits in the
data word, it may check for the Parity Bits (both sender and
receiver must agree on whether a Parity Bit is to be used),
and then the receiver looks for a Stop Bit. If the Stop Bit
does not appear when it is supposed to, the UART considers
the entire word to be garbled and will report a Framing
Error to the host processor when the data word is read. The
usual cause of a Framing Error is that the sender and
receiver clocks were not running at the same speed, or that
the signal was interrupted.Regardless of whether the data was received correctly or
not, the UART automatically discards the Start, Parity and
Stop bits. If the sender and receiver are configured
identically, these bits are not passed to the host.If another word is ready for transmission, the Start Bit
for the new word can be sent as soon as the Stop Bit for the
previous word has been sent.
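The word framing just described (a Start Bit, data bits LSB first, an optional Parity Bit, and a Stop Bit) can be sketched as follows. This is an illustrative model only; the function name and the 8-data-bit, even-parity configuration are assumptions for the example, not part of any real UART interface.

```python
def frame_word(data, n_bits=8, even_parity=True):
    """Sketch of asynchronous framing: Start Bit (0, a Space),
    data bits LSB first, an optional even Parity Bit, and a
    Stop Bit (1, a Mark)."""
    bits = [0]  # Start Bit: forces the Mark-to-Space transition
    data_bits = [(data >> i) & 1 for i in range(n_bits)]  # LSB first
    bits.extend(data_bits)
    if even_parity:
        bits.append(sum(data_bits) % 2)  # makes the total count of 1s even
    bits.append(1)  # Stop Bit; an idle line then stays Marking (1)
    return bits

# 0x55 (binary 01010101) framed as 8 data bits plus even parity:
# [0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
```

The receiver samples each of these bit periods near its midpoint, as described above, and checks that the final bit is the expected Stop Bit.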
As asynchronous data is self
synchronizing, if there is no data to transmit, the
transmission line can be idle.Other UART FunctionsIn addition to the basic job of converting data from
parallel to serial for transmission and from serial to
parallel on reception, a UART will usually provide
additional circuits for signals that can be used to indicate
the state of the transmission media, and to regulate the
flow of data in the event that the remote device is not
prepared to accept more data. For example, when the device
connected to the UART is a modem, the modem may report the
presence of a carrier on the phone line while the computer
may be able to instruct the modem to reset itself or to not
take calls by raising or lowering one or more of these
extra signals. The function of each of these additional
signals is defined in the EIA RS232-C standard.The RS232-C and V.24 StandardsIn most computer systems, the UART is connected to
circuitry that generates signals that comply with the EIA
RS232-C specification. There is also a CCITT standard named
V.24 that mirrors the specifications included in
RS232-C.RS232-C Bit Assignments (Marks and Spaces)In RS232-C, a value of 1 is called
a Mark and a value of
0 is called a Space.
When a communication line is idle, the line is said to be
Marking, or transmitting continuous
1 values.The Start bit always has a value of
0 (a Space). The Stop Bit always has a
value of 1 (a Mark). This means that
there will always be a Mark (1) to Space (0) transition on
the line at the start of every word, even when multiple
words are transmitted back to back. This guarantees that
sender and receiver can resynchronize their clocks
regardless of the content of the data bits that are being
transmitted.The idle time between Stop and Start bits does not
have to be an exact multiple (including zero) of the bit
rate of the communication link, but most UARTs are
designed this way for simplicity.In RS232-C, the "Marking" signal (a
1) is represented by a voltage between
-2 VDC and -12 VDC, and a "Spacing" signal (a
0) is represented by a voltage between
0 and +12 VDC. The transmitter is supposed to send +12
VDC or -12 VDC, and the receiver is supposed to allow for
some voltage loss in long cables. Some transmitters in
low power devices (like portable computers) sometimes use
only +5 VDC and -5 VDC, but these values are still
acceptable to a RS232-C receiver, provided that the cable
lengths are short.RS232-C Break SignalRS232-C also specifies a signal called a
Break, which is caused by sending
continuous Spacing values (no Start or Stop bits). When
there is no electricity present on the data circuit, the
line is considered to be sending
Break.The Break signal must be of a
duration longer than the time it takes to send a complete
byte plus Start, Stop and Parity bits. Most UARTs can
distinguish between a Framing Error and a Break, but if
the UART cannot do this, the Framing Error detection can
be used to identify Breaks.In the days of teleprinters, when numerous printers
around the country were wired in series (such as news
services), any unit could cause a Break
by temporarily opening the entire circuit so that no
current flowed. This was used to allow a location with
urgent news to interrupt some other location that was
currently sending information.In modern systems there are two types of Break
signals. If the Break is longer than 1.6 seconds, it is
considered a "Modem Break", and some modems can be
programmed to terminate the conversation and go on-hook or
enter the modem's command mode when the modem detects this
signal. If the Break is shorter than 1.6 seconds, it
signifies a Data Break and it is up to the remote computer
to respond to this signal. Sometimes this form of Break
is used as an Attention or Interrupt signal and sometimes
is accepted as a substitute for the ASCII CONTROL-C
character.Marks and Spaces are also equivalent to
Holes and No Holes in paper
tape systems.Breaks cannot be generated from paper tape or from
any other byte value, since bytes are always sent with
Start and Stop bits. The UART is usually capable of
generating the continuous Spacing signal in response to
a special command from the host processor.RS232-C DTE and DCE DevicesThe RS232-C specification defines two types of
equipment: the Data Terminal Equipment (DTE) and the Data
Carrier Equipment (DCE). Usually, the DTE device is the
terminal (or computer), and the DCE is a modem. Across
the phone line at the other end of a conversation, the
receiving modem is also a DCE device and the computer that
is connected to that modem is a DTE device. The DCE
device receives signals on the pins that the DTE device
transmits on, and vice versa.When two devices that are both DTE or both DCE must be
connected together without a modem or a similar media
translator between them, a NULL modem must be used. The
NULL modem electrically re-arranges the cabling so that
the transmitter output is connected to the receiver input
on the other device, and vice versa. Similar translations
are performed on all of the control signals so that each
device will see what it thinks are DCE (or DTE) signals
from the other device.The numbers of signals generated by the DTE and DCE
devices are not symmetrical. The DTE device generates
fewer signals for the DCE device than the DTE device
receives from the DCE.RS232-C Pin AssignmentsThe EIA RS232-C specification (and the ITU equivalent,
V.24) calls for a twenty-five pin connector (usually a
DB25) and defines the purpose of most of the pins in that
connector.In the IBM Personal Computer and similar systems, a
subset of RS232-C signals is provided via nine-pin
connectors (DB9). The signals that are not included on
the PC connector deal mainly with synchronous operation,
and this transmission mode is not supported by the UART
that IBM selected for use in the IBM PC.Depending on the computer manufacturer, a DB25, a DB9,
or both types of connector may be used for RS232-C
communications. (The IBM PC also uses a DB25 connector
for the parallel printer interface, which causes some
confusion.)Below is a table of the RS232-C signal assignments in
the DB25 and DB9 connectors.

DB25 Pin  DB9 Pin  EIA Circuit  CCITT Circuit  Common Name  Signal Source  Description
1         -        AA           101            PG/FG        -              Frame/Protective Ground
2         3        BA           103            TD           DTE            Transmit Data
3         2        BB           104            RD           DCE            Receive Data
4         7        CA           105            RTS          DTE            Request to Send
5         8        CB           106            CTS          DCE            Clear to Send
6         6        CC           107            DSR          DCE            Data Set Ready
7         5        AB           102            SG/GND       -              Signal Ground
8         1        CF           109            DCD/CD       DCE            Data Carrier Detect
9         -        -            -              -            -              Reserved for Test
10        -        -            -              -            -              Reserved for Test
11        -        -            -              -            -              Reserved for Test
12        -        CI           122            SRLSD        DCE            Sec. Recv. Line Signal Detector
13        -        SCB          121            SCTS         DCE            Secondary Clear to Send
14        -        SBA          118            STD          DTE            Secondary Transmit Data
15        -        DB           114            TSET         DCE            Trans. Sig. Element Timing
16        -        SBB          119            SRD          DCE            Secondary Received Data
17        -        DD           115            RSET         DCE            Receiver Signal Element Timing
18        -        -            141            LOOP         DTE            Local Loopback
19        -        SCA          120            SRS          DTE            Secondary Request to Send
20        4        CD           108.2          DTR          DTE            Data Terminal Ready
21        -        -            -              RDL          DTE            Remote Digital Loopback
22        9        CE           125            RI           DCE            Ring Indicator
23        -        CH           111            DSRS         DTE            Data Signal Rate Selector
24        -        DA           113            TSET         DTE            Trans. Sig. Element Timing
25        -        -            142            -            DCE            Test Mode

Bits, Baud and SymbolsBaud is a measurement of transmission speed in
- asynchronous communication. Because of advances in modem
+ asynchronous communication. Due to advances in modem
communication technology, this term is frequently misused
when describing the data rates in newer devices.Traditionally, a Baud Rate represents the number of bits
that are actually being sent over the media, not the amount
of data that is actually moved from one DTE device to the
other. The Baud count includes the overhead bits Start, Stop
and Parity that are generated by the sending UART and
removed by the receiving UART. This means that seven-bit
words of data actually take 10 bits to be completely
transmitted. Therefore, a modem capable of moving 300 bits
per second from one place to another can normally only move
30 7-bit words if Parity is used and one Start and Stop bit
are present.If 8-bit data words are used and Parity bits are also
used, the data rate falls to 27.27 words per second, because
it now takes 11 bits to send the eight-bit words, and the
modem still only sends 300 bits per second.The formula for converting bytes per second into a baud
rate and vice versa was simple until error-correcting modems
came along. These modems receive the serial stream of bits
from the UART in the host computer (even when internal
modems are used the data is still frequently serialized) and
convert the bits back into bytes. These bytes are then
combined into packets and sent over the phone line using a
Synchronous transmission method. This means that the Stop,
Start, and Parity bits added by the UART in the DTE (the
computer) are removed by the sending modem before
transmission. When these bytes are received by the
remote modem, the remote modem adds Start, Stop and Parity
bits to the words, converts them to a serial format and then
sends them to the receiving UART in the remote computer, which
then strips the Start, Stop and Parity bits.The reason all these extra conversions are done is so
that the two modems can perform error correction, which
means that the receiving modem is able to ask the sending
modem to resend a block of data that was not received with
the correct checksum. This checking is handled by the
modems, and the DTE devices are usually unaware that the
process is occurring.By stripping the Start, Stop and Parity bits, the
additional bits of data that the two modems must share
between themselves to perform error-correction are mostly
concealed from the effective transmission rate seen by the
sending and receiving DTE equipment. For example, if a
modem sends ten 7-bit words to another modem without
including the Start, Stop and Parity bits, the sending modem
will be able to add 30 bits of its own information that the
receiving modem can use to do error-correction without
impacting the transmission speed of the real data.The use of the term Baud is further confused by modems
that perform compression. A single 8-bit word passed over
the telephone line might represent a dozen words that were
transmitted to the sending modem. The receiving modem will
expand the data back to its original content and pass that
data to the receiving DTE.Modern modems also include buffers that allow the rate
that bits move across the phone line (DCE to DCE) to be a
different speed than the speed that the bits move between
the DTE and DCE on both ends of the conversation. Normally
the speed between the DTE and DCE is higher than the DCE to
DCE speed because of the use of compression by the
modems.
- Because the number of bits needed to describe a byte
+ As the number of bits needed to describe a byte
varies during the trip between the two machines, and
differing bits-per-second speeds are used on
the DTE-DCE and DCE-DCE links, using the term Baud to
describe the overall communication speed causes problems and
can misrepresent the true transmission speed. So Bits Per
Second (bps) is the correct term to describe the
transmission rate seen at the DCE to DCE interface and Baud
or Bits Per Second are acceptable terms to use when a
connection is made between two systems with a wired
connection, or if a modem is in use that is not performing
error-correction or compression.Modern high speed modems (2400, 9600, 14,400, and
19,200 bps) in reality still operate at or below 2400 baud,
or more accurately, 2400 Symbols per second. High speed
modems are able to encode more bits of data into each Symbol
using a technique called Constellation Stuffing, which is
why the effective bits per second rate of the modem is
higher, but the modem continues to operate within the
limited audio bandwidth that the telephone system provides.
Modems operating at 28,800 and higher speeds have variable
Symbol rates, but the technique is the same.The IBM Personal Computer UARTStarting with the original IBM Personal Computer, IBM
selected the National Semiconductor INS8250 UART for use in
the IBM PC Parallel/Serial Adapter. Subsequent generations
of compatible computers from IBM and other vendors continued
to use the INS8250 or improved versions of the National
Semiconductor UART family.National Semiconductor UART Family TreeThere have been several versions and subsequent
generations of the INS8250 UART. Each major version is
described below.INS8250 -> INS8250B
\
\
\-> INS8250A -> INS82C50A
\
\
\-> NS16450 -> NS16C450
\
\
\-> NS16550 -> NS16550A -> PC16550DINS8250This part was used in the original IBM PC and
IBM PC/XT. The original name for this part was the
INS8250 ACE (Asynchronous Communications Element)
and it is made from NMOS technology.The 8250 uses eight I/O ports and has a one-byte
send and a one-byte receive buffer. This original
UART has several race conditions and other
flaws. The original IBM BIOS includes code to work
around these flaws, but this made the BIOS dependent
on the flaws being present, so subsequent parts like
the 8250A, 16450 or 16550 could not be used in the
original IBM PC or IBM PC/XT.INS8250-BThis is a slower-speed version of the INS8250, made
from NMOS technology. It contains the same problems
as the original INS8250.INS8250AAn improved version of the INS8250 using XMOS
technology with various functional flaws
corrected. The INS8250A was used initially in PC
clone computers by vendors who used
- clean BIOS designs. Because of the
+ clean BIOS designs. Due to the
corrections in the chip, this part could not be used
with a BIOS compatible with the INS8250 or
INS8250B.INS82C50AThis is a CMOS version (low power consumption)
of the INS8250A and has similar functional
characteristics.NS16450Same as the INS8250A with improvements so it can be
used with faster CPU bus designs. IBM used this
part in the IBM AT and updated the IBM BIOS to no
longer rely on the bugs in the INS8250.NS16C450This is a CMOS version (low power consumption)
of the NS16450.NS16550Same as NS16450 with a 16-byte send and receive
buffer but the buffer design was flawed and could
not be used reliably.NS16550ASame as NS16550 with the buffer flaws
corrected. The 16550A and its successors have become
the most popular UART design in the PC industry,
mainly due to its ability to reliably handle higher
data rates on operating systems with sluggish
interrupt response times.NS16C552This component consists of two NS16C550A CMOS
UARTs in a single package.PC16550DSame as NS16550A with subtle flaws
corrected. This is revision D of the 16550 family
and is the latest design available from National
Semiconductor.The NS16550AF and the PC16550D are the same thingNational reorganized their part numbering system a few
years ago, and the NS16550AFN no longer exists by that
name. (If you have a NS16550AFN, look at the date code on
the part, which is a four digit number that usually starts
with a nine. The first two digits of the number are the
year, and the last two digits are the week in that year
when the part was packaged. If you have a NS16550AFN, it
is probably a few years old.)The new numbers are like PC16550DV, with minor
differences in the suffix letters depending on the package
material and its shape. (A description of the numbering
system can be found below.)It is important to understand
that in some stores, you may pay $15(US) for a NS16550AFN
made in 1990, while in the next bin are new PC16550DN parts
with minor fixes that National has made since the AFN part
was in production. The PC16550DN was probably made in the
past six months, and it costs half as much as the NS16550AFN
(as low as $5(US) in volume) because the parts are readily
available.As the supply of NS16550AFN chips continues to shrink,
the price will probably continue to increase until more
people discover and accept that the PC16550DN really has
the same function as the old part number.National Semiconductor Part Numbering SystemThe older NSnnnnnrgp part
numbers are now of the format
PCnnnnnrgp.The r is the revision
field. The current revision of the 16550 from National
Semiconductor is D.The p is the package-type
field. The types are:
"F"  QFP (quad flat pack), L lead type
"N"  DIP (dual inline package), through-hole straight lead type
"V"  LPCC (lead plastic chip carrier), J lead type
The g is the product grade
field. If an I precedes the
package-type letter, it indicates an
industrial grade part, which has higher
specs than a standard part but not as high as Military
Specification (Milspec) component. This is an optional
field.So what we used to call a NS16550AFN (DIP Package) is
now called a PC16550DN or PC16550DIN.Other Vendors and Similar UARTsOver the years, the 8250, 8250A, 16450 and 16550 have
been licensed or copied by other chip vendors. In the case
of the 8250, 8250A and 16450, the exact circuit (the
megacell) was licensed to many vendors,
including Western Digital and Intel. Other vendors
reverse-engineered the part or produced emulations that had
similar behavior.In internal modems, the modem designer will frequently
emulate the 8250A/16450 with the modem microprocessor, and
the emulated UART will frequently have a hidden buffer
- consisting of several hundred bytes. Because of the size of
+ consisting of several hundred bytes. Due to the size of
the buffer, these emulations can be as reliable as a 16550A
in their ability to handle high speed data. However, most
operating systems will still report that the UART is only a
8250A or 16450, and may not make effective use of the extra
buffering present in the emulated UART unless special
drivers are used.Some modem makers are driven by market forces to abandon
a design that has hundreds of bytes of buffer and instead
use a 16550A UART so that the product will compare favorably
in market comparisons even though the effective performance
may be lowered by this action.A common misconception is that all parts with
16550A written on them are identical in
performance. There are differences, and in some cases,
outright flaws in most of these 16550A clones.When the NS16550 was developed, National
Semiconductor obtained several patents on the design and
they also limited licensing, making it harder for other
- vendors to provide a chip with similar features. Because of
+ vendors to provide a chip with similar features. As a result of
the patents, reverse-engineered designs and emulations had
to avoid infringing the claims covered by the patents.
Consequently, these copies almost never perform exactly the
same as the NS16550A or PC16550D, which are the parts most
computer and modem makers want to buy but are sometimes
unwilling to pay the price required to get the genuine
part.Some of the differences in the clone 16550A parts are
unimportant, while others can prevent the device from being
used at all with a given operating system or driver. These
differences may show up when using other drivers, or when
particular combinations of events occur that were not well
tested or considered in the &windows; driver. This is because
most modem vendors and 16550-clone makers use the Microsoft
drivers from &windows; for Workgroups 3.11 and the µsoft;
&ms-dos; utility as the primary tests for compatibility with
the NS16550A. This overly simplistic criterion means that if a
different operating system is used, problems could appear
due to subtle differences between the clones and genuine
components.National Semiconductor has made available a program
named COMTEST that performs
compatibility tests independent of any OS drivers. It
should be remembered that the purpose of this type of
program is to demonstrate the flaws in the products of the
competition, so the program will report major as well as
extremely subtle differences in behavior in the part being
tested.In a series of tests performed by the author of this
document in 1994, components made by National Semiconductor,
TI, StarTech, and CMD as well as megacells and emulations
embedded in internal modems were tested with COMTEST. A
difference count for some of these components is listed
- below. Because these tests were performed in 1994, they may
+ below. Since these tests were performed in 1994, they may
not reflect the current performance of the given product
from a vendor.It should be noted that COMTEST normally aborts when an
excessive number or certain types of problems have been
detected. As part of this testing, COMTEST was modified so
that it would not abort no matter how many differences were
encountered.

Vendor     Part Number                                        Errors (aka "differences" reported)
National   (PC16550DV)                                        0
National   (NS16550AFN)                                       0
National   (NS16C552V)                                        0
TI         (TL16550AFN)                                       3
CMD        (16C550PE)                                         19
StarTech   (ST16C550J)                                        23
Rockwell   Reference modem with internal 16550 or an
           emulation (RC144DPi/C3000-25)                      117
Sierra     Modem with an internal 16550 (SC11951/SC11351)     91

To date, the author of this document has not found any
non-National parts that report zero differences using the
COMTEST program. It should also be noted that National
has had five versions of the 16550 over the years and the
newest parts behave a bit differently than the classic
NS16550AFN that is considered the benchmark for
functionality. COMTEST appears to turn a blind eye to the
differences within the National product line and reports
no errors on the National parts (except for the original
16550) even when there are official errata that describe
bugs in the A, B and C revisions of the parts, so this
bias in COMTEST must be taken into account.It is important to understand that a simple count of
differences from COMTEST does not reveal a lot about what
differences are important and which are not. For example,
about half of the differences reported in the two modems
listed above that have internal UARTs were caused by the
clone UARTs not supporting five- and six-bit character
modes. The real 16550, 16450, and 8250 UARTs all support
these modes and COMTEST checks the functionality of these
modes, so over fifty differences are reported. However,
almost no modern modem supports five- or six-bit characters,
particularly those with error-correction and compression
capabilities. This means that the differences related to
five- and six-bit character modes can be discounted.Many of the differences COMTEST reports have to do with
timing. In many of the clone designs, when the host reads
from one port, the status bits in some other port may not
update in the same amount of time (some faster, some slower)
as a real NS16550AFN and COMTEST looks
for these differences. This means that the number of
differences can be misleading in that one device may only
have one or two differences but they are extremely serious,
and some other device that updates the status registers
faster or slower than the reference part (that would
probably never affect the operation of a properly written
driver) could have dozens of differences reported.COMTEST can be used as a screening tool to alert the
administrator to the presence of potentially incompatible
components that might cause problems or have to be handled
as a special case.If you run COMTEST on a 16550 that is in a modem or a
modem is attached to the serial port, you need to first
issue an ATE0&W command to the modem so that the modem
will not echo any of the test characters. If you forget to
do this, COMTEST will report at least this one
difference:Error (6)...Timeout interrupt failed: IIR = c1 LSR = 618250/16450/16550 RegistersThe 8250/16450/16550 UART occupies eight contiguous I/O
port addresses. In the IBM PC, there are two defined
locations for these eight ports and they are known
collectively as COM1 and COM2. The makers of PC-clones and
add-on cards have created two additional areas known as COM3
and COM4, but these extra COM ports conflict with other
hardware on some systems. The most common conflict is with
video adapters that provide IBM 8514 emulation.COM1 is located from 0x3f8 to 0x3ff and normally uses
IRQ 4. COM2 is located from 0x2f8 to 0x2ff and normally uses
IRQ 3. COM3 is located from 0x3e8 to 0x3ef and has no
standardized IRQ. COM4 is located from 0x2e8 to 0x2ef and has
no standardized IRQ.A description of the I/O ports of the 8250/16450/16550
UART is provided below.I/O PortAccess AllowedDescription+0x00write (DLAB==0)Transmit Holding Register
(THR).Information written to this port is
treated as data words and will be transmitted by the
UART.+0x00read (DLAB==0)Receive Buffer Register (RBR).Any
data words received by the UART from the serial link are
accessed by the host by reading this
port.+0x00write/read (DLAB==1)Divisor Latch LSB (DLL)This value
divides the master input clock (in the IBM
PC, the master clock is 1.8432 MHz), and the resulting
clock determines the baud rate of the UART. This
register holds bits 0 through 7 of the
divisor.+0x01write/read (DLAB==1)Divisor Latch MSB (DLH)This value
divides the master input clock (in the IBM
PC, the master clock is 1.8432 MHz), and the resulting
clock determines the baud rate of the UART. This
register holds bits 8 through 15 of the
divisor.+0x01write/read (DLAB==0)Interrupt Enable Register
(IER)The 8250/16450/16550 UART
classifies events into one of four categories.
Each category can be configured to generate an
interrupt when any of the events occurs. The
8250/16450/16550 UART generates a single external
interrupt signal regardless of how many events in
the enabled categories have occurred. It is up to
the host processor to respond to the interrupt and
then poll the enabled interrupt categories
(usually all categories have interrupts enabled)
to determine the true cause(s) of the
interrupt.Bit 7Reserved, always 0.Bit 6Reserved, always 0.Bit 5Reserved, always 0.Bit 4Reserved, always 0.Bit 3Enable Modem Status Interrupt (EDSSI). Setting
this bit to "1" allows the UART to generate an
interrupt when a change occurs on one or more of the
status lines.Bit 2Enable Receiver Line Status Interrupt (ELSI)
Setting this bit to "1" causes the UART to generate
an interrupt when an error (or a BREAK signal)
has been detected in the incoming data.Bit 1Enable Transmitter Holding Register Empty
Interrupt (ETBEI) Setting this bit to "1" causes the
UART to generate an interrupt when the UART has room
for one or more additional characters that are to be
transmitted.Bit 0Enable Received Data Available Interrupt
(ERBFI) Setting this bit to "1" causes the UART to
generate an interrupt when the UART has received
enough characters to exceed the trigger level of the
FIFO, or the FIFO timer has expired (stale data), or
a single character has been received when the FIFO
is disabled.+0x02writeFIFO Control Register (FCR)
(This port does not exist on the 8250 and 16450
UART.)Bit 7Receiver Trigger Bit #1Bit 6Receiver Trigger Bit
#0These two bits control at what
point the receiver is to generate an interrupt
when the FIFO is active. Bits 7 and 6 select how many words
are received before an interrupt is generated: 00 = 1,
01 = 4, 10 = 8, 11 = 14.Bit 5Reserved, always 0.Bit 4Reserved, always 0.Bit 3DMA Mode Select. If Bit 0 is
set to "1" (FIFOs enabled), setting this bit changes
the operation of the -RXRDY and -TXRDY signals from
Mode 0 to Mode 1.Bit 2Transmit FIFO Reset. When a
"1" is written to this bit, the contents of the FIFO
are discarded. Any word currently being transmitted
will be sent intact. This function is useful in
aborting transfers.Bit 1Receiver FIFO Reset. When a
"1" is written to this bit, the contents of the FIFO
are discarded. Any word currently being assembled
in the shift register will be received
intact.Bit 016550 FIFO Enable. When set,
both the transmit and receive FIFOs are enabled.
Any contents in the holding register, shift
registers or FIFOs are lost when FIFOs are enabled
or disabled.+0x02readInterrupt Identification
RegisterBit 7FIFOs enabled. On the
8250/16450 UART, this bit is zero.Bit 6FIFOs enabled. On the
8250/16450 UART, this bit is zero.Bit 5Reserved, always 0.Bit 4Reserved, always 0.Bit 3Interrupt ID Bit #2. On the
8250/16450 UART, this bit is zero.Bit 2Interrupt ID Bit #1Bit 1Interrupt ID Bit #0.These
three bits combine to report the category of
event that caused the interrupt that is in
progress. These categories have priorities,
so if multiple categories of events occur at
the same time, the UART will report the more
important events first and the host must
resolve the events in the order they are
reported. All events that caused the current
interrupt must be resolved before any new
interrupts will be generated. (This is a
limitation of the PC architecture.)210PriorityDescription011FirstReceived Error (OE, PE, BI, or
FE)010SecondReceived Data Available110SecondTrigger level identification
(Stale data in receive buffer)001ThirdTransmitter has room for more
words (THRE)000FourthModem Status Change (-CTS, -DSR,
-RI, or -DCD)Bit 0Interrupt Pending Bit. If this
bit is set to "0", then at least one interrupt is
pending.+0x03write/readLine Control Register
(LCR)Bit 7Divisor Latch Access Bit
(DLAB). When set, access to the data
transmit/receive register (THR/RBR) and the
Interrupt Enable Register (IER) is disabled. Any
access to these ports is now redirected to the
Divisor Latch Registers. Setting this bit, loading
the Divisor Registers, and clearing DLAB should be
done with interrupts disabled.Bit 6Set Break. When set to "1",
the transmitter begins to transmit continuous
Spacing until this bit is set to "0". This
overrides any bits of characters that are being
transmitted.Bit 5Stick Parity. When parity is
enabled, setting this bit causes parity to always be
"1" or "0", based on the value of Bit 4.Bit 4Even Parity Select (EPS). When
parity is enabled and Bit 5 is "0", setting this bit
causes even parity to be transmitted and expected.
Otherwise, odd parity is used.Bit 3Parity Enable (PEN). When set
to "1", a parity bit is inserted between the last
bit of the data and the Stop Bit. The UART will
also expect parity to be present in the received
data.Bit 2Number of Stop Bits (STB). If
set to "1" and using 5-bit data words, 1.5 Stop Bits
are transmitted and expected in each data word. For
6, 7 and 8-bit data words, 2 Stop Bits are
transmitted and expected. When this bit is set to
"0", one Stop Bit is used on each data word.Bit 1Word Length Select Bit #1
(WLSB1)Bit 0Word Length Select Bit #0
(WLSB0)Together these
bits specify the number of bits in each data
word.10Word
Length005 Data
Bits016 Data
Bits107 Data
Bits118 Data
Bits+0x04write/readModem Control Register
(MCR)Bit 7Reserved, always 0.Bit 6Reserved, always 0.Bit 5Reserved, always 0.Bit 4Loop-Back Enable. When set to "1", the UART
transmitter and receiver are internally connected
together to allow diagnostic operations. In
addition, the UART modem control outputs are
connected to the UART modem control inputs. CTS is
connected to RTS, DTR is connected to DSR, OUT1 is
connected to RI, and OUT 2 is connected to
DCD.Bit 3OUT 2. An auxiliary output that the host
processor may set high or low. In the IBM PC serial
adapter (and most clones), OUT 2 is used to
tri-state (disable) the interrupt signal from the
8250/16450/16550 UART.Bit 2OUT 1. An auxiliary output that the host
processor may set high or low. This output is not
used on the IBM PC serial adapter.Bit 1Request to Send (RTS). When set to "1", the
output of the UART -RTS line is Low
(Active).Bit 0Data Terminal Ready (DTR). When set to "1",
the output of the UART -DTR line is Low
(Active).+0x05write/readLine Status Register
(LSR)Bit 7Error in Receiver FIFO. On the 8250/16450
UART, this bit is zero. This bit is set to "1" when
any of the bytes in the FIFO have one or more of the
following error conditions: PE, FE, or BI.Bit 6Transmitter Empty (TEMT). When set to "1",
there are no words remaining in the transmit FIFO
or the transmit shift register. The transmitter is
completely idle.Bit 5Transmitter Holding Register Empty (THRE).
When set to "1", the FIFO (or holding register) now
has room for at least one additional word to
transmit. The transmitter may still be transmitting
when this bit is set to "1".Bit 4Break Interrupt (BI). The receiver has
detected a Break signal.Bit 3Framing Error (FE). A Start Bit was detected
but the Stop Bit did not appear at the expected
time. The received word is probably
garbled.Bit 2Parity Error (PE). The parity bit was
incorrect for the word received.Bit 1Overrun Error (OE). A new word was received
and there was no room in the receive buffer. The
newly-arrived word in the shift register is
discarded. On 8250/16450 UARTs, the word in the
holding register is discarded and the newly-arrived
word is put in the holding register.Bit 0Data Ready (DR). One or more words are in the
receive FIFO that the host may read. A word must be
completely received and moved from the shift
register into the FIFO (or holding register for
8250/16450 designs) before this bit is set.+0x06write/readModem Status Register
(MSR)Bit 7Data Carrier Detect (DCD). Reflects the state
of the DCD line on the UART.Bit 6Ring Indicator (RI). Reflects the state of the
RI line on the UART.Bit 5Data Set Ready (DSR). Reflects the state of
the DSR line on the UART.Bit 4Clear To Send (CTS). Reflects the state of the
CTS line on the UART.Bit 3Delta Data Carrier Detect (DDCD). Set to "1"
if the -DCD line has changed state one more
time since the last time the MSR was read by the
host.Bit 2Trailing Edge Ring Indicator (TERI). Set to
"1" if the -RI line has had a low to high transition
since the last time the MSR was read by the
host.Bit 1Delta Data Set Ready (DDSR). Set to "1" if the
-DSR line has changed state one more time
since the last time the MSR was read by the
host.Bit 0Delta Clear To Send (DCTS). Set to "1" if the
-CTS line has changed state one more time
since the last time the MSR was read by the
host.+0x07write/readScratch Register (SCR). This register performs no
function in the UART. Any value can be written by the
host to this location and read by the host later
on.Beyond the 16550A UARTAlthough National Semiconductor has not offered any
components compatible with the 16550 that provide additional
features, various other vendors have. Some of these
components are described below. It should be understood
that to effectively utilize these improvements, drivers may
have to be provided by the chip vendor since most of the
popular operating systems do not support features beyond
those provided by the 16550.ST16650By default this part is similar to the NS16550A, but an
extended 32-byte send and receive buffer can be optionally
enabled. Made by StarTech.TIL16660By default this part behaves similarly to the NS16550A,
but an extended 64-byte send and receive buffer can be
optionally enabled. Made by Texas Instruments.Hayes ESPThis proprietary plug-in card contains a 2048-byte send
and receive buffer, and supports data rates up to
230.4Kbit/sec. Made by Hayes.In addition to these dumb UARTs, many vendors
produce intelligent serial communication boards. This type of
design usually provides a microprocessor that interfaces with
several UARTs, processes and buffers the data, and then alerts the
- main PC processor when necessary. Because the UARTs are not
+ main PC processor when necessary. As the UARTs are not
directly accessed by the PC processor in this type of
communication system, it is not necessary for the vendor to use
UARTs that are compatible with the 8250, 16450, or the 16550 UART.
This leaves the designer free to use components that may have better
performance characteristics.Configuring the sio driverThe sio driver provides support
for NS8250-, NS16450-, NS16550 and NS16550A-based EIA RS-232C
(CCITT V.24) communications interfaces. Several multiport
cards are supported as well. See the &man.sio.4; manual page
for detailed technical documentation.Digi International (DigiBoard) PC/8Contributed by &a.awebster.email;. 26 August
1995.Here is a config snippet from a machine with a Digi
International PC/8 with 16550. It has 8 modems connected to
these 8 lines, and they work just great. Do not forget to
add options COM_MULTIPORT or it will not
work very well!device sio4 at isa? port 0x100 flags 0xb05
device sio5 at isa? port 0x108 flags 0xb05
device sio6 at isa? port 0x110 flags 0xb05
device sio7 at isa? port 0x118 flags 0xb05
device sio8 at isa? port 0x120 flags 0xb05
device sio9 at isa? port 0x128 flags 0xb05
device sio10 at isa? port 0x130 flags 0xb05
device sio11 at isa? port 0x138 flags 0xb05 irq 9The trick in setting this up is that the MSB of the
flags represents the last SIO port, in this case 11, so the flags
are 0xb05.Boca 16Contributed by &a.whiteside.email;. 26 August
1995.The procedures to make a Boca 16 port board work with FreeBSD
are pretty straightforward, but you will need a couple of
things to make it work:You either need the kernel sources installed so you
can recompile the necessary options or you will need
someone else to compile it for you. The 2.0.5 default
kernel does not come with
multiport support enabled and you will need to add a
device entry for each port anyway.Two, you will need to know the interrupt and IO
setting for your Boca Board so you can set these options
properly in the kernel.One important note — the actual UART chips for the
Boca 16 are in the connector box, not on the internal board
itself. So if you have it unplugged, probes of those ports
will fail. I have never tested booting with the box
unplugged and plugging it back in, and I suggest you do not
either.If you do not already have a custom kernel
configuration file set up, refer to Kernel
Configuration chapter of the FreeBSD Handbook for
general procedures. The following are the specifics for the
Boca 16 board and assume you are using the kernel name
MYKERNEL and editing with vi.Add the line
options COM_MULTIPORT
to the config file.Where the current device
sion lines are, you
will need to add 16 more devices. The
following example is for a Boca Board with an interrupt
of 3, and a base IO address 100h. The IO address for
each port is +8 hexadecimal from the previous port, thus
the 100h, 108h, 110h... addresses.device sio1 at isa? port 0x100 flags 0x1005
device sio2 at isa? port 0x108 flags 0x1005
device sio3 at isa? port 0x110 flags 0x1005
device sio4 at isa? port 0x118 flags 0x1005
…
device sio15 at isa? port 0x170 flags 0x1005
device sio16 at isa? port 0x178 flags 0x1005 irq 3The flags entry must be changed
from this example unless you are using the exact same
sio assignments. Flags are set according to
0xMYY
where M indicates the minor
number of the master port (the last port on a Boca 16)
and YY indicates if FIFO is
enabled or disabled (enabled), IRQ sharing is used (yes)
and if there is an AST/4 compatible IRQ control
register (no). In this example, flags
0x1005 indicates that the master port
is sio16. If I added another board and assigned sio17
through sio28, the flags for all 16 ports on
that board would be 0x1C05, where
1C indicates the minor number of the master port. Do
not change the 05 setting.Save and complete the kernel configuration,
recompile, install and reboot. Presuming you have
successfully installed the recompiled kernel and have it
set to the correct address and IRQ, your boot message
should indicate the successful probe of the Boca ports
as follows: (obviously the sio numbers, IO and IRQ could
be different)sio1 at 0x100-0x107 flags 0x1005 on isa
sio1: type 16550A (multiport)
sio2 at 0x108-0x10f flags 0x1005 on isa
sio2: type 16550A (multiport)
sio3 at 0x110-0x117 flags 0x1005 on isa
sio3: type 16550A (multiport)
sio4 at 0x118-0x11f flags 0x1005 on isa
sio4: type 16550A (multiport)
sio5 at 0x120-0x127 flags 0x1005 on isa
sio5: type 16550A (multiport)
sio6 at 0x128-0x12f flags 0x1005 on isa
sio6: type 16550A (multiport)
sio7 at 0x130-0x137 flags 0x1005 on isa
sio7: type 16550A (multiport)
sio8 at 0x138-0x13f flags 0x1005 on isa
sio8: type 16550A (multiport)
sio9 at 0x140-0x147 flags 0x1005 on isa
sio9: type 16550A (multiport)
sio10 at 0x148-0x14f flags 0x1005 on isa
sio10: type 16550A (multiport)
sio11 at 0x150-0x157 flags 0x1005 on isa
sio11: type 16550A (multiport)
sio12 at 0x158-0x15f flags 0x1005 on isa
sio12: type 16550A (multiport)
sio13 at 0x160-0x167 flags 0x1005 on isa
sio13: type 16550A (multiport)
sio14 at 0x168-0x16f flags 0x1005 on isa
sio14: type 16550A (multiport)
sio15 at 0x170-0x177 flags 0x1005 on isa
sio15: type 16550A (multiport)
sio16 at 0x178-0x17f irq 3 flags 0x1005 on isa
sio16: type 16550A (multiport master)If the messages go by too fast to see,
&prompt.root; dmesg | more
will show you the boot messages.Next, appropriate entries in
/dev for the devices must be made
using the /dev/MAKEDEV
script. This step can be omitted if you are running
FreeBSD 5.X with a kernel that has &man.devfs.5;
support compiled in.If you do need to create the /dev
entries, run the following as root:&prompt.root; cd /dev
&prompt.root; ./MAKEDEV tty1
&prompt.root; ./MAKEDEV cua1(everything in between)
&prompt.root; ./MAKEDEV ttyg
&prompt.root; ./MAKEDEV cuagIf you do not want or need call-out devices for some
reason, you can dispense with making the
cua* devices.If you want a quick and sloppy way to make sure the
devices are working, you can simply plug a modem into
each port and (as root)
&prompt.root; echo at > ttyd*
for each device you have made. You
should see the RX lights flash for each
working port.Support for Cheap Multi-UART CardsContributed by Helge Oldach
hmo@sep.hamburg.com, September
Ever wondered about FreeBSD support for your $20
multi-I/O card with two (or more) COM ports, sharing IRQs?
Here is how:Usually the only option to support these kind of boards
is to use a distinct IRQ for each port. For example, if
your CPU board has an on-board COM1
port (aka sio0–I/O address
0x3F8 and IRQ 4) and you have an extension board with two
UARTs, you will commonly need to configure them as
COM2 (aka
sio1–I/O address 0x2F8 and
IRQ 3), and the third port (aka
sio2) as I/O 0x3E8 and IRQ 5.
Obviously this is a waste of IRQ resources, as it should be
basically possible to run both extension board ports using a
single IRQ with the COM_MULTIPORT
configuration described in the previous sections.Such cheap I/O boards commonly have a 4 by 3 jumper
matrix for the COM ports, similar to the following: o o o *
Port A |
o * o *
Port B |
o * o o
IRQ 2 3 4 5Shown here is port A wired for IRQ 5 and port B wired
for IRQ 3. The IRQ columns on your specific board may
vary—other boards may supply jumpers for IRQs 3, 4, 5,
and 7 instead.One could conclude that wiring both ports for IRQ 3
using a handcrafted wire-made jumper covering all three
connection points in the IRQ 3 column would solve the issue,
but no. You cannot duplicate IRQ 3 because the output
drivers of each UART are wired in a totem
pole fashion, so if one of the UARTs drives IRQ 3,
the output signal will not be what you would expect.
Depending on the implementation of the extension board or
your motherboard, the IRQ 3 line will continuously stay up,
or always stay low.You need to decouple the IRQ drivers for the two UARTs,
so that the IRQ line of the board only goes up if (and only
if) one of the UARTs asserts an IRQ, and stays low otherwise.
The solution was proposed by Joerg Wunsch
j@ida.interface-business.de: To solder up a
wired-or consisting of two diodes (Germanium or
Schottky-types strongly preferred) and a 1 kOhm resistor.
Here is the schematic, starting from the 4 by 3 jumper field
above: Diode
+---------->|-------+
/ |
o * o o | 1 kOhm
Port A +----|######|-------+
o * o o | |
Port B `-------------------+ ==+==
o * o o | Ground
\ |
+--------->|-------+
IRQ 2 3 4 5 DiodeThe cathodes of the diodes are connected to a common
point, together with a 1 kOhm pull-down resistor. It is
essential to connect the resistor to ground to avoid
floating of the IRQ line on the bus.Now we are ready to configure a kernel. Staying with
this example, we would configure:# standard on-board COM1 port
device sio0 at isa? port "IO_COM1" flags 0x10
# patched-up multi-I/O extension board
options COM_MULTIPORT
device sio1 at isa? port "IO_COM2" flags 0x205
device sio2 at isa? port "IO_COM3" flags 0x205 irq 3Note that the flags setting for
sio1 and
sio2 is truly essential; refer to
&man.sio.4; for details. (Generally, the
2 in the "flags" attribute refers to
sio2 which holds the IRQ, and you
surely want a 5 low nibble.) With kernel
verbose mode turned on this should yield something similar
to this:sio0: irq maps: 0x1 0x11 0x1 0x1
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1: irq maps: 0x1 0x9 0x1 0x1
sio1 at 0x2f8-0x2ff flags 0x205 on isa
sio1: type 16550A (multiport)
sio2: irq maps: 0x1 0x9 0x1 0x1
sio2 at 0x3e8-0x3ef irq 3 flags 0x205 on isa
sio2: type 16550A (multiport master)Though /sys/i386/isa/sio.c is
somewhat cryptic with its use of the irq maps
array above, the basic idea is that you observe
0x1 in the first, third, and fourth
place. This means that the corresponding IRQ was set upon
output and cleared after, which is just what we would
expect. If your kernel does not display this behavior, most
likely there is something wrong with your wiring.Configuring the cy driverContributed by Alex Nash. 6 June
1996.The Cyclades multiport cards are based on the
cy driver instead of the usual
sio driver used by other multiport
cards. Configuration is a simple matter of:Add the cy device to your
kernel configuration (note that your irq and iomem
settings may differ).device cy0 at isa? irq 10 iomem 0xd4000 iosiz 0x2000Rebuild and install the new kernel.Make the device nodes by typing (the following
example assumes an 8-port board)You can omit this part if you are running FreeBSD 5.X
with &man.devfs.5;.:&prompt.root; cd /dev
&prompt.root; for i in 0 1 2 3 4 5 6 7;do ./MAKEDEV cuac$i ttyc$i;doneIf appropriate, add dialup entries to
/etc/ttys by duplicating serial
device (ttyd) entries and using
ttyc in place of
ttyd. For example:ttyc0 "/usr/libexec/getty std.38400" unknown on insecure
ttyc1 "/usr/libexec/getty std.38400" unknown on insecure
ttyc2 "/usr/libexec/getty std.38400" unknown on insecure
…
ttyc7 "/usr/libexec/getty std.38400" unknown on insecureReboot with the new kernel.Configuring the si driverContributed by &a.nsayer.email;. 25 March
1998.The Specialix SI/XIO and SX multiport cards use the
si driver. A single machine can have
up to 4 host cards. The following host cards are
supported:ISA SI/XIO host card (2 versions)EISA SI/XIO host cardPCI SI/XIO host cardISA SX host cardPCI SX host cardAlthough the SX and SI/XIO host cards look markedly
different, their functionality is basically the same. The
host cards do not use I/O locations, but instead require a 32K
chunk of memory. The factory configuration for ISA cards
places this at 0xd0000-0xd7fff. They also
require an IRQ. PCI cards will, of course, auto-configure
themselves.You can attach up to 4 external modules to each host
card. The external modules contain either 4 or 8 serial
ports. They come in the following varieties:SI 4 or 8 port modules. Up to 57600 bps on each port
supported.XIO 8 port modules. Up to 115200 bps on each port
supported. One type of XIO module has 7 serial and 1 parallel
port.SXDC 8 port modules. Up to 921600 bps on each port
supported. Like XIO, a module is available with one parallel
port as well.To configure an ISA host card, add the following line to
your kernel configuration file, changing the numbers as
appropriate:device si0 at isa? iomem 0xd0000 irq 11Valid IRQ numbers are 9, 10, 11, 12 and 15 for SX ISA host
cards and 11, 12 and 15 for SI/XIO ISA host cards.To configure an EISA or PCI host card, use this line:device si0After adding the configuration entry, rebuild and
install your new kernel.The following step is not necessary if you are using
&man.devfs.5; in FreeBSD 5.X.After rebooting with the new kernel, you need to make the
device nodes in /dev. The MAKEDEV script
will take care of this for you. Count how many total ports
you have and type:&prompt.root; cd /dev
&prompt.root; ./MAKEDEV ttyAnn cuaAnn(where nn is the number of
ports)If you want login prompts to appear on these ports, you
will need to add lines like this to
/etc/ttys:ttyA01 "/usr/libexec/getty std.9600" vt100 on insecureChange the terminal type as appropriate. For modems,
dialup or
unknown is fine.
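The 0xMYY flags encoding used by the multiport examples above packs the unit number of the master port into the high byte and keeps 0x05 in the low byte, so the values can be sanity-checked with a little shell arithmetic. This is only an illustrative sketch using the unit numbers from the examples above; it is not part of any kernel configuration:

```shell
#!/bin/sh
# Compute the sio "flags" value 0xMYY: master-port unit number in the
# high byte, 0x05 (FIFO enabled, IRQ sharing, no AST/4 register) in
# the low byte.

# Boca 16 example: sio16 is the multiport master.
master=16
flags=$(( (master << 8) | 0x05 ))
printf '0x%X\n' "$flags"        # prints 0x1005, matching the Boca 16 flags

# Digi PC/8 example: sio11 is the multiport master.
master=11
flags=$(( (master << 8) | 0x05 ))
printf '0x%X\n' "$flags"        # prints 0xB05, matching the Digi PC/8 flags
```

The same arithmetic explains the hypothetical second Boca board mentioned earlier: a master at sio28 gives (28 &lt;&lt; 8) | 0x05 = 0x1C05.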
diff --git a/en_US.ISO8859-1/articles/solid-state/article.xml b/en_US.ISO8859-1/articles/solid-state/article.xml
index 232a5d59f4..8de18f207c 100644
--- a/en_US.ISO8859-1/articles/solid-state/article.xml
+++ b/en_US.ISO8859-1/articles/solid-state/article.xml
@@ -1,498 +1,498 @@
&os; and Solid State DevicesJohnKozubikjohn@kozubik.com20012009The FreeBSD Documentation Project
&tm-attrib.freebsd;
&tm-attrib.general;
&legalnotice;
$FreeBSD$$FreeBSD$This article covers the use of solid state disk devices in
&os; to create embedded systems.Embedded systems have the advantage of increased stability
due to the lack of integral moving parts (hard drives).
Account must be taken, however, of the generally low disk
space available in the system and the durability of the
storage medium.Specific topics to be covered include the types and
attributes of solid state media suitable for disk use in &os;,
kernel options that are of interest in such an environment,
the rc.initdiskless mechanisms that
automate the initialization of such systems and the need for
read-only filesystems, and building filesystems from scratch.
The article will conclude with some general strategies for
small and read-only &os; environments.Solid State Disk DevicesThe scope of this article will be limited to solid state
disk devices made from flash memory. Flash memory is a solid
state memory (no moving parts) that is non-volatile (the memory
maintains data even after all power sources have been
disconnected). Flash memory can withstand tremendous physical
shock and is reasonably fast (the flash memory solutions covered
in this article are slightly slower than an EIDE hard disk for
write operations, and much faster for read operations). One
very important aspect of flash memory, the ramifications of
which will be discussed later in this article, is that each
sector has a limited rewrite capacity. You can only write,
erase, and write again to a sector of flash memory a certain
number of times before the sector becomes permanently unusable.
Although many flash memory products automatically map bad
blocks, and although some even distribute write operations
evenly throughout the unit, the fact remains that there exists a
limit to the amount of writing that can be done to the device.
Competitive units have between 1,000,000 and 10,000,000 writes
per sector in their specification. This figure varies with
the temperature of the environment.Specifically, we will be discussing ATA compatible
compact-flash units, which are quite popular as storage media
for digital cameras. Of particular interest is the fact that
they pin out directly to the IDE bus and are compatible with the
ATA command set. Therefore, with a very simple and low-cost
adaptor, these devices can be attached directly to an IDE bus in
a computer. Once implemented in this manner, operating systems
such as &os; see the device as a normal hard disk (albeit
small).Other solid state disk solutions do exist, but their
expense, obscurity, and relative difficulty of use place them
beyond the scope of this article.Kernel OptionsA few kernel options are of specific interest to those
creating an embedded &os; system.All embedded &os; systems that use flash memory as system
disk will be interested in memory disks and memory filesystems.
- Because of the limited number of writes that can be done to
+ As a result of the limited number of writes that can be done to
flash memory, the disk and the filesystems on the disk will most
likely be mounted read-only. In this environment, filesystems
such as /tmp and /var
are mounted as memory filesystems to allow the system to create
logs and update counters and temporary files. Memory
filesystems are a critical component to a successful solid state
&os; implementation.You should make sure the following lines exist in your
kernel configuration file:options MFS # Memory Filesystem
options MD_ROOT # md device usable as a potential root device
pseudo-device md # memory diskThe rc Subsystem and Read-Only
FilesystemsThe post-boot initialization of an embedded &os; system is
controlled by /etc/rc.initdiskless./etc/rc.d/var mounts
/var as a memory filesystem, makes a
configurable list of directories in /var
with the &man.mkdir.1; command, and changes modes on some of
those directories. In the execution of
/etc/rc.d/var, one other
rc.conf variable comes into play –
varsize. A /var
partition is created by /etc/rc.d/var based
on the value of this variable in
rc.conf:varsize=8192Remember that this value is in sectors by default.The fact that /var is a read-write
filesystem is an important distinction, as the
/ partition (and any other partitions you
may have on your flash media) should be mounted read-only.
Remember that in we detailed the
limitations of flash memory - specifically the limited write
capability. The importance of not mounting filesystems on flash
media read-write, and the importance of not using a swap file,
cannot be overstated. A swap file on a busy system can burn
through a piece of flash media in less than one year. Heavy
logging or temporary file creation and destruction can do the
same. Therefore, in addition to removing the
swap entry from your
/etc/fstab, you should also change the
Options field for each filesystem to ro as
follows:# Device Mountpoint FStype Options Dump Pass#
/dev/ad0s1a / ufs ro 1 1A few applications in the average system will immediately
begin to fail as a result of this change. For instance, cron
will not run properly as a result of missing cron tabs in the
/var created by
/etc/rc.d/var, and syslog and dhcp will
encounter problems as well as a result of the read-only
filesystem and missing items in the /var
that /etc/rc.d/var has created. These are
only temporary problems though, and are addressed, along with
solutions to the execution of other common software packages in
.An important thing to remember is that a filesystem that was
mounted read-only with /etc/fstab can be
made read-write at any time by issuing the command:&prompt.root; /sbin/mount -uw partitionand can be toggled back to read-only with the
command:&prompt.root; /sbin/mount -ur partitionBuilding a File System from Scratch
- Because ATA compatible compact-flash cards are seen by &os;
+ Since ATA compatible compact-flash cards are seen by &os;
as normal IDE hard drives, you could theoretically install &os;
from the network using the kern and mfsroot floppies or from a
CD.However, even a small installation of &os; using normal
installation procedures can produce a system in size of greater
- than 200 megabytes. Because most people will be using smaller
+ than 200 megabytes. Most people will be using smaller
flash memory devices (128 megabytes is considered fairly large -
- 32 or even 16 megabytes is common) an installation using normal
+ 32 or even 16 megabytes is common), so an installation using normal
mechanisms is not possible—there is simply not enough disk
space for even the smallest of conventional
installations.The easiest way to overcome this space limitation is to
install &os; using conventional means to a normal hard disk.
After the installation is complete, pare down the operating
system to a size that will fit onto your flash media, then tar
the entire filesystem. The following steps will guide you
through the process of preparing a piece of flash memory for
your tarred filesystem. Remember, because a normal installation
is not being performed, operations such as partitioning,
labeling, file-system creation, etc. need to be performed by
hand. In addition to the kern and mfsroot floppy disks, you
will also need to use the fixit floppy.Partitioning Your Flash Media DeviceAfter booting with the kern and mfsroot floppies, choose
custom from the installation menu. In
the custom installation menu, choose
partition. In the partition menu, you
should delete all existing partitions using
d. After deleting all existing
partitions, create a partition using c
and accept the default value for the size of the
partition. When asked for the type of the partition, make
sure the value is set to 165. Now write
this partition table to the disk by pressing
w (this is a hidden option on this
screen). If you are using an ATA compatible compact flash
card, you should choose the &os; Boot Manager. Now press
q to quit the partition menu. You
will be shown the boot manager menu once more - repeat the
choice you made earlier.Creating Filesystems on Your Flash Memory
DeviceExit the custom installation menu, and from the main
installation menu choose the fixit
option. After entering the fixit environment, enter the
following command:&prompt.root; disklabel -e /dev/ad0cAt this point you will have entered the vi editor under
the auspices of the disklabel command. Next, you need to
add an a: line at the end of the file.
This a: line should look like:a: 123456 0 4.2BSD 0 0Where 123456 is a number that
is exactly the same as the number in the existing
c: entry for size. Basically you are
duplicating the existing c: line as an
a: line, making sure that fstype is
4.2BSD. Save the file and exit.&prompt.root; disklabel -B -r /dev/ad0c
&prompt.root; newfs /dev/ad0aPlacing Your Filesystem on the Flash MediaMount the newly prepared flash media:&prompt.root; mount /dev/ad0a /flashBring this machine up on the network so we may transfer
our tar file and explode it onto our flash media filesystem.
One example of how to do this is:&prompt.root; ifconfig xl0 192.168.0.10 netmask 255.255.255.0
&prompt.root; route add default 192.168.0.1Now that the machine is on the network, transfer your
tar file. You may be faced with a bit of a dilemma at this
point - if your flash memory part is 128 megabytes, for
instance, and your tar file is larger than 64 megabytes, you
cannot have your tar file on the flash media at the same
time as you explode it - you will run out of
space. One solution to this problem, if you are using FTP,
is to untar the file while it is transferred over FTP. If
you perform your transfer in this manner, you will never
have the tar file and the tar contents on your disk at the
same time:ftp> get tarfile.tar "| tar xvf -"If your tarfile is gzipped, you can accomplish this as
well:ftp> get tarfile.tar "| zcat | tar xvf -"After the contents of your tarred filesystem are on your
flash memory filesystem, you can unmount the flash memory
and reboot:&prompt.root; cd /
&prompt.root; umount /flash
&prompt.root; exitAssuming that you configured your filesystem correctly
when it was built on the normal hard disk (with your
filesystems mounted read-only, and with the necessary
options compiled into the kernel) you should now be
successfully booting your &os; embedded system.System Strategies for Small and Read Only
EnvironmentsIn , it was pointed out that the
/var filesystem constructed by
/etc/rc.d/var and the presence of a
read-only root filesystem causes problems with many common
software packages used with &os;. In this article, suggestions
for successfully running cron, syslog, ports installations, and
the Apache web server will be provided.CronUpon boot, /var gets populated by
/etc/rc.d/var using the list from
/etc/mtree/BSD.var.dist, so the
cron, cron/tabs,
at, and a few other standard directories
get created.However, this does not solve the problem of maintaining
cron tabs across reboots. When the system reboots, the
/var filesystem that is in memory will
disappear and any cron tabs you may have had in it will also
disappear. Therefore, one solution would be to create cron
tabs for the users that need them, mount your
/ filesystem as read-write and copy those
cron tabs to somewhere safe, like
/etc/tabs, then add a line to the end of
/etc/rc.initdiskless that copies those
crontabs into /var/cron/tabs after that
directory has been created during system initialization. You
may also need to add a line that changes modes and permissions
on the directories you create and the files you copy with
/etc/rc.initdiskless.Syslogsyslog.conf specifies the locations
of certain log files that exist in
/var/log. These files are not created by
/etc/rc.d/var upon system initialization.
Therefore, somewhere in /etc/rc.d/var,
after the section that creates the directories in
/var, you will need to add something like
this:&prompt.root; touch /var/log/security /var/log/maillog /var/log/cron /var/log/messages
&prompt.root; chmod 0644 /var/log/*Ports InstallationBefore discussing the changes necessary to successfully
use the ports tree, a reminder is necessary regarding the
read-only nature of your filesystems on the flash media.
Since they are read-only, you will need to temporarily mount
them read-write using the mount syntax shown in . You should always remount those
filesystems read-only when you are done with any maintenance -
unnecessary writes to the flash media could considerably
shorten its lifespan.To make it possible to enter a ports directory and
successfully run make install, we must create a packages
directory on a non-memory filesystem that will keep track of
- our packages across reboots. Because it is necessary to mount
+ our packages across reboots. As it is necessary to mount
your filesystems as read-write for the installation of a
package anyway, it is sensible to assume that an area on the
flash media can also be used for package information to be
written to.First, create a package database directory. This is
normally in /var/db/pkg, but we cannot
place it there as it will disappear every time the system is
booted.&prompt.root; mkdir /etc/pkgNow, add a line to /etc/rc.d/var that
links the /etc/pkg directory to
/var/db/pkg. An example:&prompt.root; ln -s /etc/pkg /var/db/pkgNow, any time that you mount your filesystems as
read-write and install a package, the make install will work, and package
information will be written successfully to
/etc/pkg (because the filesystem will, at
that time, be mounted read-write) which will always be
available to the operating system as
/var/db/pkg.Apache Web ServerThe steps in this section are only necessary if Apache
is set up to write its pid or log information outside of
/var. By default, Apache keeps its pid
file in /var/run/httpd.pid and its log
files in /var/log.It is now assumed that Apache keeps its log files in a
directory
apache_log_dir
outside of /var. When this directory
lives on a read-only filesystem, Apache will not be able to
save any log files, and may have problems working. If so, it
is necessary to add a new directory to the list of directories
in /etc/rc.d/var to create in
/var, and to link
apache_log_dir
to /var/log/apache. It is also necessary
to set permissions and ownership on this new directory.First, add the directory log/apache to
the list of directories to be created in
/etc/rc.d/var.Second, add these commands to
/etc/rc.d/var after the directory
creation section:&prompt.root; chmod 0774 /var/log/apache
&prompt.root; chown nobody:nobody /var/log/apacheFinally, remove the existing
apache_log_dir
directory, and replace it with a link:&prompt.root; rm -rf apache_log_dir
&prompt.root; ln -s /var/log/apache apache_log_dir
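The write-endurance figures given earlier (1,000,000 to 10,000,000 rewrites per sector) make it easy to check the warning that a swap file on a busy system can burn through a piece of flash media in under a year. The workload numbers below are assumptions chosen for illustration, not measurements from this article:

```shell
#!/bin/sh
# Back-of-the-envelope flash endurance estimate.
# Assumption: a sector rated for 1,000,000 rewrites that swap or
# logging activity hits about 150 times per hour, around the clock.
rewrites_rated=1000000
rewrites_per_day=$(( 150 * 24 ))          # 3600 rewrites per day
days=$(( rewrites_rated / rewrites_per_day ))
echo "${days} days"                        # prints "277 days"
```

At that assumed rate the sector is exhausted in roughly 277 days, comfortably under a year, which is why the swap entry is removed from /etc/fstab and the flash filesystems are mounted read-only.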
diff --git a/en_US.ISO8859-1/articles/vm-design/article.xml b/en_US.ISO8859-1/articles/vm-design/article.xml
index 2cf7e001eb..79b56d296c 100644
--- a/en_US.ISO8859-1/articles/vm-design/article.xml
+++ b/en_US.ISO8859-1/articles/vm-design/article.xml
@@ -1,899 +1,899 @@
Design elements of the &os; VM systemMatthewDillondillon@apollo.backplane.com
&tm-attrib.freebsd;
&tm-attrib.linux;
&tm-attrib.microsoft;
&tm-attrib.opengroup;
&tm-attrib.general;
$FreeBSD$$FreeBSD$The title is really just a fancy way of saying that I am going to
attempt to describe the whole VM enchilada, hopefully in a way that
everyone can follow. For the last year I have concentrated on a number
of major kernel subsystems within &os;, with the VM and Swap
subsystems being the most interesting and NFS being a necessary
chore. I rewrote only small portions of the code. In the VM
arena the only major rewrite I have done is to the swap subsystem.
Most of my work was cleanup and maintenance, with only moderate code
rewriting and no major algorithmic adjustments within the VM
subsystem. The bulk of the VM subsystem's theoretical base remains
unchanged and a lot of the credit for the modernization effort in the
last few years belongs to John Dyson and David Greenman. Not being a
historian like Kirk I will not attempt to tag all the various features
with people's names, since I will invariably get it wrong.This article was originally published in the January 2000 issue of
DaemonNews. This
version of the article may include updates from Matt and other authors
to reflect changes in &os;'s VM implementation.IntroductionBefore moving along to the actual design let's spend a little time
on the necessity of maintaining and modernizing any long-living
codebase. In the programming world, algorithms tend to be more
important than code and it is precisely due to BSD's academic roots that
a great deal of attention was paid to algorithm design from the
beginning. More attention paid to the design generally leads to a clean
and flexible codebase that can be fairly easily modified, extended, or
replaced over time. While BSD is considered an old
operating system by some people, those of us who work on it tend to view
it more as a mature codebase which has various components
modified, extended, or replaced with modern code. It has evolved, and
&os; is at the bleeding edge no matter how old some of the code might
be. This is an important distinction to make and one that is
unfortunately lost to many people. The biggest error a programmer can
make is to not learn from history, and this is precisely the error that
many other modern operating systems have made. &windowsnt; is the best example
of this, and the consequences have been dire. Linux also makes this
mistake to some degree—enough that we BSD folk can make small
jokes about it every once in a while, anyway. Linux's problem is simply
one of a lack of experience and history to compare ideas against, a
problem that is easily and rapidly being addressed by the Linux
community in the same way it has been addressed in the BSD
community—by continuous code development. The &windowsnt; folk, on the
other hand, repeatedly make the same mistakes solved by &unix; decades ago
and then spend years fixing them. Over and over again. They have a
severe case of not designed here and we are always
right because our marketing department says so. I have little
tolerance for anyone who cannot learn from history.Much of the apparent complexity of the &os; design, especially in
the VM/Swap subsystem, is a direct result of having to solve serious
performance issues that occur under various conditions. These issues
are not due to bad algorithmic design but instead arise from
environmental factors. In any direct comparison between platforms,
these issues become most apparent when system resources begin to get
stressed. As I describe &os;'s VM/Swap subsystem the reader should
always keep two points in mind:The most important aspect of performance design is what is
known as Optimizing the Critical Path. It is often
the case that performance optimizations add a little bloat to the
code in order to make the critical path perform better.A solid, generalized design outperforms a heavily-optimized
design over the long run. While a generalized design may end up
being slower than a heavily-optimized design when they are
first implemented, the generalized design tends to be easier to
adapt to changing conditions and the heavily-optimized design
winds up having to be thrown away.Any codebase that will survive and be maintainable for
years must therefore be designed properly from the beginning even if it
costs some performance. Twenty years ago people were still arguing that
programming in assembly was better than programming in a high-level
language because it produced code that was ten times as fast. Today,
the fallibility of that argument is obvious — as are
the parallels to algorithmic design and code generalization.VM ObjectsThe best way to begin describing the &os; VM system is to look at
it from the perspective of a user-level process. Each user process sees
a single, private, contiguous VM address space containing several types
of memory objects. These objects have various characteristics. Program
code and program data are effectively a single memory-mapped file (the
binary file being run), but program code is read-only while program data
is copy-on-write. Program BSS is just memory allocated and filled with
zeros on demand, called demand zero page fill. Arbitrary files can be
memory-mapped into the address space as well, which is how the shared
library mechanism works. Such mappings can require modifications to
remain private to the process making them. The fork system call adds an
entirely new dimension to the VM management problem on top of the
complexity already given.A program binary data page (which is a basic copy-on-write page)
illustrates the complexity. A program binary contains a preinitialized
data section which is initially mapped directly from the program file.
When a program is loaded into a process's VM space, this area is
initially memory-mapped and backed by the program binary itself,
allowing the VM system to free/reuse the page and later load it back in
from the binary. The moment a process modifies this data, however, the
VM system must make a private copy of the page for that process. Since
the private copy has been modified, the VM system may no longer free it,
because there is no longer any way to restore it later on.You will notice immediately that what was originally a simple file
mapping has become much more complex. Data may be modified on a
page-by-page basis whereas the file mapping encompasses many pages at
once. The complexity further increases when a process forks. When a
process forks, the result is two processes—each with their own
private address spaces, including any modifications made by the original
process prior to the call to fork(). It would be
silly for the VM system to make a complete copy of the data at the time
of the fork() because it is quite possible that at
least one of the two processes will only need to read from that page
from then on, allowing the original page to continue to be used. What
was a private page is made copy-on-write again, since each process
(parent and child) expects their own personal post-fork modifications to
remain private to themselves and not affect the other.&os; manages all of this with a layered VM Object model. The
original binary program file winds up being the lowest VM Object layer.
A copy-on-write layer is pushed on top of that to hold those pages which
had to be copied from the original file. If the program modifies a data
page belonging to the original file the VM system takes a fault and
makes a copy of the page in the higher layer. When a process forks,
additional VM Object layers are pushed on. This might make a little
more sense with a fairly basic example. A fork()
is a common operation for any *BSD system, so this example will consider
a program that starts up, and forks. When the process starts, the VM
system creates an object layer, let's call this A:

+---------------+
|       A       |
+---------------+

A represents the file—pages may be paged in and out of the
file's physical media as necessary. Paging in from the disk is
reasonable for a program, but we really do not want to page back out and
overwrite the executable. The VM system therefore creates a second
layer, B, that will be physically backed by swap space:

+---------------+
|       B       |
+---------------+
|       A       |
+---------------+

On the first write to a page after this, a new page is created in B,
and its contents are initialized from A. All pages in B can be paged in
or out to a swap device. When the program forks, the VM system creates
two new object layers—C1 for the parent, and C2 for the
child—that rest on top of B:

+-------+-------+
|  C1   |  C2   |
+-------+-------+
|       B       |
+---------------+
|       A       |
+---------------+

In this case, let's say a page in B is modified by the original
parent process. The process will take a copy-on-write fault and
duplicate the page in C1, leaving the original page in B untouched.
Now, let's say the same page in B is modified by the child process. The
process will take a copy-on-write fault and duplicate the page in C2.
The original page in B is now completely hidden since both C1 and C2
have a copy and B could theoretically be destroyed if it does not
represent a real file; however, this sort of optimization is not
trivial to make because it is so fine-grained. &os; does not make
this optimization. Now, suppose (as is often the case) that the child
process does an exec(). Its current address space
is usually replaced by a new address space representing a new file. In
this case, the C2 layer is destroyed:

+-------+
|  C1   |
+-------+-------+
|       B       |
+---------------+
|       A       |
+---------------+

In this case, the number of children of B drops to one, and all
accesses to B now go through C1. This means that B and C1 can be
collapsed together. Any pages in B that also exist in C1 are deleted
from B during the collapse. Thus, even though the optimization in the
previous step could not be made, we can recover the dead pages when
either of the processes exit or exec().This model creates a number of potential problems. The first is that
you can wind up with a relatively deep stack of layered VM Objects which
can cost scanning time and memory when you take a fault. Deep
layering can occur when processes fork and then fork again (either
parent or child). The second problem is that you can wind up with dead,
inaccessible pages deep in the stack of VM Objects. In our last example
if both the parent and child processes modify the same page, they both
get their own private copies of the page and the original page in B is
no longer accessible by anyone. That page in B can be freed.&os; solves the deep layering problem with a special optimization
called the All Shadowed Case. This case occurs if either
C1 or C2 takes sufficient COW faults to completely shadow all pages in B.
Let's say that C1 achieves this. C1 can now bypass B entirely, so rather
than have C1->B->A and C2->B->A we now have C1->A and C2->B->A. But
look what also happened—now B has only one reference (C2), so we
can collapse B and C2 together. The end result is that B is deleted
entirely and we have C1->A and C2->A. It is often the case that B will
contain a large number of pages and neither C1 nor C2 will be able to
completely overshadow it. If we fork again and create a set of D
layers, however, it is much more likely that one of the D layers will
eventually be able to completely overshadow the much smaller dataset
represented by C1 or C2. The same optimization will work at any point in
the graph and the grand result of this is that even on a heavily forked
machine VM Object stacks tend to not get much deeper than 4. This is
true of both the parent and the children and true whether the parent is
doing the forking or whether the children cascade forks.The dead page problem still exists in the case where C1 or C2 do not
completely overshadow B. Due to our other optimizations this case does
not represent much of a problem and we simply allow the pages to be
dead. If the system runs low on memory it will swap them out, eating a
little swap, but that is it.The advantage to the VM Object model is that
fork() is extremely fast, since no real data
copying need take place. The disadvantage is that you can build a
relatively complex VM Object layering that slows page fault handling
down a little, and you spend memory managing the VM Object structures.
The optimizations &os; makes prove to reduce the problems enough
that they can be ignored, leaving no real disadvantage.SWAP LayersPrivate data pages are initially either copy-on-write or zero-fill
pages. When a change, and therefore a copy, is made, the original
backing object (usually a file) can no longer be used to save a copy of
the page when the VM system needs to reuse it for other purposes. This
is where SWAP comes in. SWAP is allocated to create backing store for
memory that does not otherwise have it. &os; allocates the swap
management structure for a VM Object only when it is actually needed.
However, the swap management structure has had problems
historically:Under &os; 3.X the swap management structure preallocates an
array that encompasses the entire object requiring swap backing
store—even if only a few pages of that object are
swap-backed. This creates a kernel memory fragmentation problem
when large objects are mapped, or processes with large resident set sizes
(RSS) fork.Also, in order to keep track of swap space, a list of
holes is kept in kernel memory, and this tends to get
severely fragmented as well. Since the list of
holes is a linear list, the swap allocation and freeing
performance is a non-optimal O(n)-per-page.It requires kernel memory allocations to take place during
the swap freeing process, and that creates low memory deadlock
problems.The problem is further exacerbated by holes created due to
the interleaving algorithm.Also, the swap block map can become fragmented fairly easily
resulting in non-contiguous allocations.Kernel memory must also be allocated on the fly for additional
swap management structures when a swapout occurs.It is evident from that list that there was plenty of room for
improvement. For &os; 4.X, I completely rewrote the swap
subsystem:Swap management structures are allocated through a hash
table rather than a linear array giving them a fixed allocation
size and much finer granularity.Rather then using a linearly linked list to keep track of
swap space reservations, it now uses a bitmap of swap blocks
arranged in a radix tree structure with free-space hinting in
the radix node structures. This effectively makes swap
allocation and freeing an O(1) operation.The entire radix tree bitmap is also preallocated in
order to avoid having to allocate kernel memory during critical
low memory swapping operations. After all, the system tends to
swap when it is low on memory so we should avoid allocating
kernel memory at such times in order to avoid potential
deadlocks.To reduce fragmentation the radix tree is capable
of allocating large contiguous chunks at once, skipping over
smaller fragmented chunks.I did not take the final step of having an
allocating hint pointer that would trundle
through a portion of swap as allocations were made in order to further
guarantee contiguous allocations or at least locality of reference, but
I ensured that such an addition could be made.When to free a pageSince the VM system uses all available memory for disk caching,
there are usually very few truly-free pages. The VM system depends on
being able to properly choose pages which are not in use to reuse for
new allocations. Selecting the optimal pages to free is possibly the
single-most important function any VM system can perform because if it
makes a poor selection, the VM system may be forced to unnecessarily
retrieve pages from disk, seriously degrading system performance.How much overhead are we willing to suffer in the critical path to
avoid freeing the wrong page? Each wrong choice we make will cost us
hundreds of thousands of CPU cycles and a noticeable stall of the
affected processes, so we are willing to endure a significant amount of
overhead in order to be sure that the right page is chosen. This is why
&os; tends to outperform other systems when memory resources become
stressed.The free page determination algorithm is built upon a history of the
use of memory pages. To acquire this history, the system takes advantage
of a page-used bit feature that most hardware page tables have.In any case, the page-used bit is cleared and at some later point
the VM system comes across the page again and sees that the page-used
bit has been set. This indicates that the page is still being actively
used. If the bit is still clear it is an indication that the page is not
being actively used. By testing this bit periodically, a use history (in
the form of a counter) for the physical page is developed. When the VM
system later needs to free up some pages, checking this history becomes
the cornerstone of determining the best candidate page to reuse.What if the hardware has no page-used bit?For those platforms that do not have this feature, the system
actually emulates a page-used bit. It unmaps or protects a page,
forcing a page fault if the page is accessed again. When the page
fault is taken, the system simply marks the page as having been used
and unprotects the page so that it may be used. While taking such page
faults just to determine if a page is being used appears to be an
expensive proposition, it is much less expensive than reusing the page
for some other purpose only to find that a process needs it back and
then have to go to disk.&os; makes use of several page queues to further refine the
selection of pages to reuse as well as to determine when dirty pages
must be flushed to their backing store. Since page tables are dynamic
entities under &os;, it costs virtually nothing to unmap a page from
the address space of any processes using it. When a page candidate has
been chosen based on the page-use counter, this is precisely what is
done. The system must make a distinction between clean pages which can
theoretically be freed up at any time, and dirty pages which must first
be written to their backing store before being reusable. When a page
candidate has been found it is moved to the inactive queue if it is
dirty, or the cache queue if it is clean. A separate algorithm based on
the dirty-to-clean page ratio determines when dirty pages in the
inactive queue must be flushed to disk. Once this is accomplished, the
flushed pages are moved from the inactive queue to the cache queue. At
this point, pages in the cache queue can still be reactivated by a VM
fault at relatively low cost. However, pages in the cache queue are
considered to be immediately freeable and will be reused
in an LRU (least-recently used) fashion when the system needs to
allocate new memory.It is important to note that the &os; VM system attempts to
separate clean and dirty pages for the express reason of avoiding
unnecessary flushes of dirty pages (which eats I/O bandwidth), nor does
it move pages between the various page queues gratuitously when the
memory subsystem is not being stressed. This is why you will see some
systems with very low cache queue counts and high active queue counts
when doing a systat -vm command. As the VM system
becomes more stressed, it makes a greater effort to maintain the various
page queues at the levels determined to be the most effective.An urban
myth has circulated for years that Linux did a better job avoiding
swapouts than &os;, but this in fact is not true. What was actually
occurring was that &os; was proactively paging out unused pages in
order to make room for more disk cache while Linux was keeping unused
pages in core and leaving less memory available for cache and process
pages. I do not know whether this is still true today.Pre-Faulting and Zeroing OptimizationsTaking a VM fault is not expensive if the underlying page is already
in core and can simply be mapped into the process, but it can become
expensive if you take a whole lot of them on a regular basis. A good
example of this is running a program such as &man.ls.1; or &man.ps.1;
over and over again. If the program binary is mapped into memory but
not mapped into the page table, then all the pages that will be accessed
by the program will have to be faulted in every time the program is run.
This is unnecessary when the pages in question are already in the VM
Cache, so &os; will attempt to pre-populate a process's page tables
with those pages that are already in the VM Cache. One thing that
&os; does not yet do is pre-copy-on-write certain pages on exec. For
example, if you run the &man.ls.1; program while running vmstat
1 you will notice that it always takes a certain number of
page faults, even when you run it over and over again. These are
zero-fill faults, not program code faults (which were pre-faulted in
already). Pre-copying pages on exec or fork is an area that could use
more study.A large percentage of page faults that occur are zero-fill faults.
You can usually see this by observing the vmstat -s
output. These occur when a process accesses pages in its BSS area. The
BSS area is expected to be initially zero but the VM system does not
bother to allocate any memory at all until the process actually accesses
it. When a fault occurs the VM system must not only allocate a new page,
it must zero it as well. To optimize the zeroing operation the VM system
has the ability to pre-zero pages and mark them as such, and to request
pre-zeroed pages when zero-fill faults occur. The pre-zeroing occurs
whenever the CPU is idle but the number of pages the system pre-zeros is
limited in order to avoid blowing away the memory caches. This is an
excellent example of adding complexity to the VM system in order to
optimize the critical path.Page Table OptimizationsThe page table optimizations make up the most contentious part of
the &os; VM design and they have shown some strain with the advent of
serious use of mmap(). I think this is actually a
feature of most BSDs though I am not sure when it was first introduced.
There are two major optimizations. The first is that hardware page
tables do not contain persistent state but instead can be thrown away at
any time with only a minor amount of management overhead. The second is
that every active page table entry in the system has a governing
pv_entry structure which is tied into the
vm_page structure. &os; can simply iterate
through those mappings that are known to exist while Linux must check
all page tables that might contain a specific
mapping to see if it does, which can incur O(n^2) overhead in certain
situations. It is because of this that &os; tends to make better
choices on which pages to reuse or swap when memory is stressed, giving
it better performance under load. However, &os; requires kernel
tuning to accommodate large-shared-address-space situations such as
those that can occur in a news system because it may run out of
pv_entry structures.Both Linux and &os; need work in this area. &os; is trying to
maximize the advantage of a potentially sparse active-mapping model (not
all processes need to map all pages of a shared library, for example),
whereas Linux is trying to simplify its algorithms. &os; generally
has the performance advantage here at the cost of wasting a little extra
memory, but &os; breaks down in the case where a large file is
massively shared across hundreds of processes. Linux, on the other hand,
breaks down in the case where many processes are sparsely-mapping the
same shared library and also runs non-optimally when trying to determine
whether a page can be reused or not.Page ColoringWe will end with the page coloring optimizations. Page coloring is a
performance optimization designed to ensure that accesses to contiguous
pages in virtual memory make the best use of the processor cache. In
ancient times (i.e. 10+ years ago) processor caches tended to map
virtual memory rather than physical memory. This led to a huge number of
problems including having to clear the cache on every context switch in
some cases, and problems with data aliasing in the cache. Modern
processor caches map physical memory precisely to solve those problems.
This means that two side-by-side pages in a process's address space may
not correspond to two side-by-side pages in the cache. In fact, if you
are not careful side-by-side pages in virtual memory could wind up using
the same page in the processor cache—leading to cacheable data
being thrown away prematurely and reducing CPU performance. This is true
even with multi-way set-associative caches (though the effect is
mitigated somewhat).&os;'s memory allocation code implements page coloring
optimizations, which means that the memory allocation code will attempt
to locate free pages that are contiguous from the point of view of the
cache. For example, if page 16 of physical memory is assigned to page 0
of a process's virtual memory and the cache can hold 4 pages, the page
coloring code will not assign page 20 of physical memory to page 1 of a
process's virtual memory. It would, instead, assign page 21 of physical
memory. The page coloring code attempts to avoid assigning page 20
because this maps over the same cache memory as page 16 and would result
in non-optimal caching. This code adds a significant amount of
complexity to the VM memory allocation subsystem as you can well
imagine, but the result is well worth the effort. Page Coloring makes VM
memory as deterministic as physical memory in regards to cache
performance.ConclusionVirtual memory in modern operating systems must address a number of
different issues efficiently and for many different usage patterns. The
modular and algorithmic approach that BSD has historically taken allows
us to study and understand the current implementation as well as
relatively cleanly replace large sections of the code. There have been a
number of improvements to the &os; VM system in the last several
years, and work is ongoing.Bonus QA session by Allen Briggs
briggs@ninthwonder.comWhat is the interleaving algorithm that you
refer to in your listing of the ills of the &os; 3.X swap
arrangements?&os; uses a fixed swap interleave which defaults to 4. This
means that &os; reserves space for four swap areas even if you
only have one, two, or three. Since swap is interleaved the linear
address space representing the four swap areas will be
fragmented if you do not actually have four swap areas. For
example, if you have two swap areas A and B &os;'s address
space representation for that swap area will be interleaved in
blocks of 16 pages:

A B C D A B C D A B C D A B C D

&os; 3.X uses a sequential list of free
regions approach to accounting for the free swap areas.
The idea is that large blocks of free linear space can be
represented with a single list node
(kern/subr_rlist.c). But due to the
fragmentation the sequential list winds up being insanely
fragmented. In the above example, completely unused swap will
have A and B shown as free and C and D shown as
all allocated. Each A-B sequence requires a list
node to account for it because C and D are holes, so the list node
cannot be combined with the next A-B sequence.Why do we interleave our swap space instead of just tack swap
- areas onto the end and do something fancier? Because it is a whole
+ areas onto the end and do something fancier? It is a whole
lot easier to allocate linear swaths of an address space and have
the result automatically be interleaved across multiple disks than
it is to try to put that sophistication elsewhere.The fragmentation causes other problems. Because the free list is linear
under 3.X, and suffers such a huge amount of inherent
fragmentation, allocating and freeing swap winds up being an O(N)
algorithm instead of an O(1) algorithm. Combined with other
factors (heavy swapping), you start getting into O(N^2) and
O(N^3) levels of overhead, which is bad. The 3.X system may also
need to allocate KVM during a swap operation to create a new list
node which can lead to a deadlock if the system is trying to
pageout pages in a low-memory situation.Under 4.X we do not use a sequential list. Instead we use a
radix tree and bitmaps of swap blocks rather than ranged list
nodes. We take the hit of preallocating all the bitmaps required
for the entire swap area up front but it winds up wasting less
memory due to the use of a bitmap (one bit per block) instead of a
linked list of nodes. The use of a radix tree instead of a
sequential list gives us nearly O(1) performance no matter how
fragmented the tree becomes.How is the separation of clean and dirty (inactive) pages
related to the situation where you see low cache queue counts and
high active queue counts in systat -vm? Do the
systat stats roll the active and dirty pages together for the
active queue count?I do not get the following:
It is important to note that the &os; VM system attempts
to separate clean and dirty pages for the express reason of
avoiding unnecessary flushes of dirty pages (which eats I/O
bandwidth), nor does it move pages between the various page
queues gratuitously when the memory subsystem is not being
stressed. This is why you will see some systems with very low
cache queue counts and high active queue counts when doing a
systat -vm command.
Yes, that is confusing. The relationship is
goal versus reality. Our goal is to
separate the pages but the reality is that if we are not in a
memory crunch, we do not really have to.What this means is that &os; will not try very hard to
separate out dirty pages (inactive queue) from clean pages (cache
queue) when the system is not being stressed, nor will it try to
deactivate pages (active queue -> inactive queue) when the system
is not being stressed, even if they are not being used. In the &man.ls.1; / vmstat 1 example,
would not some of the page faults be data page faults (COW from
executable file to private page)? I.e., I would expect the page
faults to be some zero-fill and some program data. Or are you
implying that &os; does do pre-COW for the program data?A COW fault can be either zero-fill or program-data. The
mechanism is the same either way because the backing program-data
is almost certainly already in the cache. I am indeed lumping the
two together. &os; does not pre-COW program data or zero-fill,
but it does pre-map pages that exist in its
cache.In your section on page table optimizations, can you give a
little more detail about pv_entry and
vm_page (or should vm_page be
vm_pmap—as in 4.4, cf. pp. 180-181 of
McKusick, Bostic, Karel, Quarterman)? Specifically, what kind of
operation/reaction would require scanning the mappings?How does Linux do in the case where &os; breaks down
(sharing a large file mapping over many processes)?A vm_page represents an (object,index#)
tuple. A pv_entry represents a hardware page
table entry (pte). If you have five processes sharing the same
physical page, and three of those processes' page tables actually
map the page, that page will be represented by a single
vm_page structure and three
pv_entry structures.pv_entry structures only represent pages
mapped by the MMU (one pv_entry represents one
pte). This means that when we need to remove all hardware
references to a vm_page (in order to reuse the
page for something else, page it out, clear it, dirty it, and so
forth) we can simply scan the linked list of
pv_entry's associated with that
vm_page to remove or modify the pte's from
their page tables.Under Linux there is no such linked list. In order to remove
all the hardware page table mappings for a
vm_page linux must index into every VM object
that might have mapped the page. For
example, if you have 50 processes all mapping the same shared
library and want to get rid of page X in that library, you need to
index into the page table for each of those 50 processes even if
only 10 of them have actually mapped the page. So Linux is
trading off the simplicity of its design against performance.
Many VM algorithms which are O(1) or (small N) under &os; wind
up being O(N), O(N^2), or worse under Linux. Since the pte's
representing a particular page in an object tend to be at the same
offset in all the page tables they are mapped in, reducing the
number of accesses into the page tables at the same pte offset
will often avoid blowing away the L1 cache line for that offset,
which can lead to better performance.&os; has added complexity (the pv_entry
scheme) in order to increase performance (to limit page table
accesses to only those pte's that need to be
modified).But &os; has a scaling problem that Linux does not in that
there are a limited number of pv_entry
structures and this causes problems when you have massive sharing
of data. In this case you may run out of
pv_entry structures even though there is plenty
of free memory available. This can be fixed easily enough by
bumping up the number of pv_entry structures in
the kernel config, but we really need to find a better way to do
it.In regards to the memory overhead of a page table versus the
pv_entry scheme: Linux uses
permanent page tables that are not thrown away, but
does not need a pv_entry for each potentially
mapped pte. &os; uses throwaway page tables but
adds in a pv_entry structure for each
actually-mapped pte. I think memory utilization winds up being
about the same, giving &os; an algorithmic advantage with its
ability to throw away page tables at will with very low
overhead.Finally, in the page coloring section, it might help to have a
little more description of what you mean here. I did not quite
follow it.Do you know how an L1 hardware memory cache works? I will
explain: Consider a machine with 16MB of main memory but only 128K
of L1 cache. Generally the way this cache works is that each 128K
block of main memory uses the same 128K of
cache. If you access offset 0 in main memory and then offset
128K in main memory you can wind up throwing away the
cached data you read from offset 0!Now, I am simplifying things greatly. What I just described
is what is called a direct mapped hardware memory
cache. Most modern caches are what are called
2-way-set-associative or 4-way-set-associative caches. The
set-associativity allows you to access up to N different memory
regions that overlap the same cache memory without destroying the
previously cached data. But only N.So if I have a 4-way set associative cache I can access offset
0, offset 128K, 256K and offset 384K and still be able to access
offset 0 again and have it come from the L1 cache. If I then
access offset 512K, however, one of the four previously cached
data objects will be thrown away by the cache.It is extremely important…
extremely important for most of a processor's
memory accesses to be able to come from the L1 cache, because the
L1 cache operates at the processor frequency. The moment you have
an L1 cache miss and have to go to the L2 cache or to main memory,
the processor will stall and potentially sit twiddling its fingers
for hundreds of instructions worth of time
waiting for a read from main memory to complete. Main memory (the
dynamic ram you stuff into a computer) is
slow, when compared to the speed of a modern
processor core.Ok, so now onto page coloring: All modern memory caches are
what are known as physical caches. They
cache physical memory addresses, not virtual memory addresses.
This allows the cache to be left alone across a process context
switch, which is very important.But in the &unix; world you are dealing with virtual address
spaces, not physical address spaces. Any program you write will
see the virtual address space given to it. The actual
physical pages underlying that virtual
address space are not necessarily physically contiguous! In fact,
you might have two pages that are side by side in a processes
address space which wind up being at offset 0 and offset 128K in
physical memory.A program normally assumes that two side-by-side pages will be
optimally cached. That is, that you can access data objects in
both pages without having them blow away each other's cache entry.
But this is only true if the physical pages underlying the virtual
address space are contiguous (insofar as the cache is
concerned).This is what page coloring does. Instead of assigning
random physical pages to virtual addresses,
which may result in non-optimal cache performance, page coloring
assigns reasonably-contiguous physical pages
to virtual addresses. Thus programs can be written under the
assumption that the characteristics of the underlying hardware
cache are the same for their virtual address space as they would
be if the program had been run directly in a physical address
space.Note that I say reasonably contiguous rather
than simply contiguous. From the point of view of a
128K direct mapped cache, the physical address 0 is the same as
the physical address 128K. So two side-by-side pages in your
virtual address space may wind up being offset 128K and offset
132K in physical memory, but could also easily be offset 128K and
offset 4K in physical memory and still retain the same cache
performance characteristics. So page-coloring does
not have to assign truly contiguous pages of
physical memory to contiguous pages of virtual memory, it just
needs to make sure it assigns contiguous pages from the point of
view of cache performance and operation.
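The arithmetic behind this can be made concrete. Assuming the 128K direct-mapped cache and 4K pages used in the discussion above (hypothetical round numbers, as before), a page's color is just its physical page number modulo the number of colors; pages exactly 128K apart collide, while pages at 128K and 132K land on adjacent colors, just like pages at 0 and 4K would:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache geometry from the discussion above: a 128K
 * direct-mapped cache with 4K pages gives 128K/4K = 32 page colors. */
#define CACHE_SIZE (128u * 1024u)
#define PAGE_SIZE  (4u * 1024u)
#define NCOLORS    (CACHE_SIZE / PAGE_SIZE)

/* The color of a physical page: which slice of the cache it uses. */
static unsigned page_color(uint32_t paddr)
{
    return (paddr / PAGE_SIZE) % NCOLORS;
}
```

Two pages with the same color fight over the same cache lines; a page-coloring allocator only has to hand out consecutive colors for consecutive virtual pages, not truly consecutive physical addresses.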
diff --git a/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml b/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml
index 65f8c6cfd0..798b7bc6d9 100644
--- a/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml
+++ b/en_US.ISO8859-1/books/arch-handbook/boot/chapter.xml
@@ -1,2396 +1,2396 @@
Bootstrapping and Kernel InitializationSergeyLyubkaContributed by Sergio Andrés Gómez del RealUpdated and enhanced by SynopsisBIOSfirmwarePOSTIA-32bootingsystem initializationThis chapter is an overview of the boot and system
initialization processes, starting from the
BIOS (firmware) POST, to
the first user process creation. Since the initial
steps of system startup are very architecture dependent, the
IA-32 architecture is used as an example.The &os; boot process can be surprisingly complex. After
control is passed from the BIOS, a
considerable amount of low-level configuration must be done
before the kernel can be loaded and executed. This setup must
be done in a simple and flexible manner, allowing the user a
great deal of customization.OverviewThe boot process is an extremely machine-dependent
activity. Not only must code be written for every computer
architecture, but there may also be multiple types of booting on
the same architecture. For example, a directory listing of
/usr/src/sys/boot
reveals a great amount of architecture-dependent code. There is
a directory for each of the various supported architectures. In
the x86-specific i386
directory, there are subdirectories for different boot standards
like mbr (Master Boot Record),
gpt (GUID Partition
Table), and efi (Extensible Firmware
Interface). Each boot standard has its own conventions and data
structures. The example that follows shows booting an x86
computer from an MBR hard drive with the &os;
boot0 multi-boot loader stored in the very
first sector. That boot code starts the &os; three-stage boot
process.The key to understanding this process is that it is a series
of stages of increasing complexity. These stages are
boot1, boot2, and
loader (see &man.boot.8; for more detail).
The boot system executes each stage in sequence. The last
stage, loader, is responsible for loading
the &os; kernel. Each stage is examined in the following
sections.Here is an example of the output generated by the
different boot stages. Actual output
may differ from machine to machine:&os; ComponentOutput (may vary)boot0F1 FreeBSD
F2 BSD
F5 Disk 2boot2This prompt will appear if the user
presses a key just after selecting an OS to boot at
the boot0
stage.>>FreeBSD/i386 BOOT
Default: 1:ad(1,a)/boot/loader
boot:loaderBTX loader 1.00 BTX version is 1.02
Consoles: internal video/keyboard
BIOS drive C: is disk0
BIOS 639kB/2096064kB available memory
FreeBSD/x86 bootstrap loader, Revision 1.1
Console internal video/keyboard
(root@snap.freebsd.org, Thu Jan 16 22:18:05 UTC 2014)
Loading /boot/defaults/loader.conf
/boot/kernel/kernel text=0xed9008 data=0x117d28+0x176650 syms=[0x8+0x137988+0x8+0x1515f8]kernelCopyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014
root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610The BIOSWhen the computer powers on, the processor's registers are
set to some predefined values. One of the registers is the
instruction pointer register, and its value
after a power on is well defined: it is a 32-bit value of
0xfffffff0. The instruction pointer register
(also known as the Program Counter) points to code to be
executed by the processor. Another important register is the
cr0 32-bit control register, and its value
just after a reboot is 0. One of
cr0's bits, the PE (Protection Enabled) bit,
indicates whether the processor is running in 32-bit protected
mode or 16-bit real mode. Since this bit is cleared at boot
time, the processor boots in 16-bit real mode. Real mode means,
among other things, that linear and physical addresses are
identical. The reason the processor does not start
immediately in 32-bit protected mode is backwards compatibility.
In particular, the boot process relies on the services provided
by the BIOS, and the BIOS
itself works in legacy, 16-bit code.The value of 0xfffffff0 is slightly less
than 4 GB, so unless the machine has 4 GB of physical
memory, it cannot point to a valid memory address. The
computer's hardware translates this address so that it points to
a BIOS memory block.The BIOS (Basic Input Output
System) is a chip on the motherboard that has a relatively small
amount of read-only memory (ROM). This
memory contains various low-level routines that are specific to
the hardware supplied with the motherboard. The processor will
first jump to the address 0xfffffff0, which really resides in
the BIOS's memory. Usually this address
contains a jump instruction to the BIOS's
POST routines.The POST (Power On Self Test)
is a set of routines including the memory check, system bus
check, and other low-level initialization so the
CPU can set up the computer properly. The
important step of this stage is determining the boot device.
Modern BIOS implementations permit the
selection of a boot device, allowing booting from a floppy,
CD-ROM, hard disk, or other devices.The very last thing in the POST is the
INT 0x19 instruction. The
INT 0x19 handler reads 512 bytes from the
first sector of boot device into the memory at address
0x7c00. The term
first sector originates from hard drive
architecture, where the magnetic plate is divided into a number
of cylindrical tracks. Tracks are numbered, and every track is
divided into a number (usually 64) of sectors. Track numbers
start at 0, but sector numbers start from 1. Track 0 is the
outermost on the magnetic plate, and sector 1, the first sector,
has a special purpose. It is also called the
MBR, or Master Boot Record. The remaining
sectors on the first track are never used.This sector is our boot-sequence starting point. As we will
see, this sector contains a copy of our
boot0 program. A jump is made by the
BIOS to address 0x7c00 so
it starts executing.The Master Boot Record (boot0)MBRAfter control is received from the BIOS
at memory address 0x7c00,
boot0 starts executing. It is the first
piece of code under &os; control. The task of
boot0 is quite simple: scan the partition
table and let the user choose which partition to boot from. The
Partition Table is a special, standard data structure embedded
in the MBR (hence embedded in
boot0) describing the four standard PC
partitions.
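The conventional on-disk layout of one such partition record can be sketched as a C struct. The field names here are ours, not from the &os; sources; the sizes follow the standard MBR layout described in this section (a bootable flag, a filesystem type, and CHS plus LBA descriptors, 16 bytes in all):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of one 16-byte MBR partition record (hypothetical field
 * names). Plain byte arrays are used so the struct has no padding. */
struct mbr_part {
    uint8_t flags;        /* bootable flag: 0x80 = active */
    uint8_t chs_first[3]; /* CHS coordinates of the first sector */
    uint8_t type;         /* filesystem type, e.g. 0xa5 for &os; */
    uint8_t chs_last[3];  /* CHS coordinates of the last sector */
    uint8_t lba_start[4]; /* LBA of the first sector, little endian */
    uint8_t lba_count[4]; /* length in sectors, little endian */
};
```

Four such records starting at offset 0x1be, followed by the 2-byte 0xaa55 signature, exactly fill the 512-byte sector.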
boot0 resides in the filesystem as
/boot/boot0. It is a small 512-byte file,
and it is exactly what &os;'s installation procedure wrote to
the hard disk's MBR if you chose the
bootmanager option at installation time. Indeed,
boot0 is the
MBR.As mentioned previously, the INT 0x19
instruction causes the INT 0x19 handler to
load an MBR (boot0) into
memory at address 0x7c00. The source file
for boot0 can be found in
sys/boot/i386/boot0/boot0.S - which is an
awesome piece of code written by Robert Nordier.A special structure starting from offset
0x1be in the MBR is called
the partition table. It has four records
of 16 bytes each, called partition records,
which represent how the hard disk is partitioned, or, in &os;'s
terminology, sliced. One byte of those 16 says whether a
partition (slice) is bootable or not. Exactly one record must
have that flag set, otherwise boot0's code
will refuse to proceed.A partition record has the following fields:
the 1-byte filesystem type
the 1-byte bootable flag
the 6-byte descriptor in CHS format
the 8-byte descriptor in LBA format
A partition record descriptor contains information about
where exactly the partition resides on the drive. Both
descriptors, LBA and CHS,
describe the same information, but in different ways:
LBA (Logical Block Addressing) has the
starting sector for the partition and the partition's length,
while CHS (Cylinder Head Sector) has
coordinates for the first and last sectors of the partition.
The partition table ends with the special signature
0xaa55.The MBR must fit into 512 bytes, a single
disk sector. This program uses low-level tricks
like taking advantage of the side effects of certain
instructions and reusing register values from previous
operations to make the most out of the fewest possible
instructions. Care must also be taken when handling the
partition table, which is embedded in the MBR
itself. For these reasons, be very careful when modifying
boot0.S.Note that the boot0.S source file
is assembled as is: instructions are translated
one by one to binary, with no additional information (no
ELF file format, for example). This kind of
low-level control is achieved at link time through special
control flags passed to the linker. For example, the text
section of the program is set to be located at address
0x600. In practice this means that
boot0 must be loaded to memory address
0x600 in order to function properly.It is worth looking at the Makefile for
boot0
(sys/boot/i386/boot0/Makefile), as it
defines some of the run-time behavior of
boot0. For instance, if a terminal
connected to the serial port (COM1) is used for I/O, the macro
SIO must be defined
(-DSIO). -DPXE enables
boot through PXE by pressing
F6. Additionally, the program defines a set of
flags that allow further modification of
its behavior. All of this is illustrated in the
Makefile. For example, look at the
linker directives which command the linker to start the text
section at address 0x600, and to build the
output file as is (strip out any file
formatting):Let us now start our study of the MBR, or
boot0, starting where execution
begins.Some modifications have been made to some instructions in
favor of better exposition. For example, some macros are
expanded, and some macro tests are omitted when the result of
the test is known. This applies to all of the code examples
shown.This first block of code is the entry point of the program.
It is where the BIOS transfers control.
First, it makes sure that the string operations autoincrement
its pointer operands (the cld instruction).
When in doubt, we refer the reader to the official Intel
manuals, which describe the exact semantics for each
instruction.
Then, as it makes no assumption about the state of the segment
registers, it initializes them. Finally, it sets the stack
pointer register (%sp) to address
0x7c00, so we have a working stack.The next block is responsible for the relocation and
subsequent jump to the relocated code.
- Because boot0 is loaded by the
+ As boot0 is loaded by the
BIOS to address 0x7C00, it
copies itself to address 0x600 and then
transfers control there (recall that it was linked to execute at
address 0x600). The source address,
0x7c00, is copied to register
%si. The destination address,
0x600, to register %di.
The number of bytes to copy, 512 (the
program's size), is copied to register %cx.
Next, the rep instruction repeats the
instruction that follows, that is, movsb, the
number of times dictated by the %cx register.
The movsb instruction copies the byte pointed
to by %si to the address pointed to by
%di. This is repeated another 511 times. On
each repetition, both the source and destination registers,
%si and %di, are
incremented by one. Thus, upon completion of the 512-byte copy,
%di has the value
0x600+512=
0x800, and %si has the
value 0x7c00+512=
0x7e00; we have thus completed the code
relocation.Next, the destination register
%di is copied to %bp.
%bp gets the value 0x800.
The value 16 is copied to
%cl in preparation for a new string operation
(like our previous movsb). Now,
stosb is executed 16 times. This instruction
copies a 0 value to the address pointed to by
the destination register (%di, which is
0x800), and increments it. This is repeated
another 15 times, so %di ends up with value
0x810. Effectively, this clears the address
range 0x800-0x80f. This
range is used as a (fake) partition table for writing the
MBR back to disk. Finally, the sector field
for the CHS addressing of this fake partition
is given the value 1 and a jump is made to the main function
from the relocated code. Note that until this jump to the
relocated code, any reference to an absolute address was
avoided.The following code block tests whether the drive number
provided by the BIOS should be used, or
the one stored in boot0.This code tests the SETDRV bit
(0x20) in the flags
variable. Recall that register %bp points to
address location 0x800, so the test is done
to the flags variable at address
0x800-69=
0x7bb. This is an example of the type of
modifications that can be done to boot0.
The SETDRV flag is not set by default, but it
can be set in the Makefile. When set, the
drive number stored in the MBR is used
instead of the one provided by the BIOS. We
assume the defaults, and that the BIOS
provided a valid drive number, so we jump to
save_curdrive.The next block saves the drive number provided by the
BIOS, and calls putn to
print a new line on the screen.Note that we assume TEST is not defined,
so the conditional code in it is not assembled and will not
appear in our executable boot0.Our next block implements the actual scanning of the
partition table. It prints to the screen the partition type for
each of the four entries in the partition table. It compares
each type with a list of well-known operating system file
systems. Examples of recognized partition types are
NTFS (&windows;, ID 0x7),
ext2fs (&linux;, ID 0x83), and, of course,
ffs/ufs2 (&os;, ID 0xa5).
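The comparison logic can be sketched as follows, using only the three IDs named above; the real boot0 table recognizes more types, and the function name here is ours:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the partition-type lookup boot0 performs
 * while scanning the four partition records (table abridged). */
static const char *part_type_name(uint8_t type)
{
    switch (type) {
    case 0x07: return "NTFS";    /* &windows; */
    case 0x83: return "ext2fs";  /* &linux; */
    case 0xa5: return "FreeBSD"; /* ffs/ufs2 */
    default:   return "??";      /* unknown to the table */
    }
}
```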
The implementation is fairly simple.It is important to note that the active flag for each entry
is cleared, so after the scanning, no
partition entry is active in our memory copy of
boot0. Later, the active flag will be set
for the selected partition. This ensures that only one active
partition exists if the user chooses to write the changes back
to disk.The next block tests for other drives. At startup,
the BIOS writes the number of drives present
in the computer to address 0x475. If there
are any other drives present, boot0 prints
the current drive to screen. The user may command
boot0 to scan partitions on another drive
later.We make the assumption that a single drive is present, so
the jump to print_drive is not performed. We
also assume nothing strange happened, so we jump to
print_prompt.This next block just prints out a prompt followed by the
default option:Finally, a jump is performed to
start_input, where the
BIOS services are used to start a timer and
for reading user input from the keyboard; if the timer expires,
the default option will be selected:An interrupt is requested with number
0x1a and argument 0 in
register %ah. The BIOS
has a predefined set of services, requested by applications as
software-generated interrupts through the int
instruction and receiving arguments in registers (in this case,
%ah). Here, particularly, we are requesting
the number of clock ticks since last midnight; this value is
computed by the BIOS through the
RTC (Real Time Clock). This clock can be
programmed to work at frequencies ranging from 2 Hz to
8192 Hz. The BIOS sets it to
18.2 Hz at startup. When the request is satisfied, a
32-bit result is returned by the BIOS in
registers %cx and %dx
(lower bytes in %dx). This result (the
%dx part) is copied to register
%di, and the value of the
TICKS variable is added to
%di. This variable resides in
boot0 at offset _TICKS
(a negative value) from register %bp (which,
recall, points to 0x800). The default value
of this variable is 0xb6 (182 in decimal).
Now, the idea is that boot0 constantly
requests the time from the BIOS, and when the
value returned in register %dx is greater
than the value stored in %di, the time is up
and the default selection will be made. Since the RTC ticks
18.2 times per second, this condition will be met after 10
seconds (this default behavior can be changed in the
Makefile). Until this time has passed,
boot0 continually asks the
BIOS for any user input; this is done through
int 0x16, argument 1 in
%ah.Whether a key was pressed or the time expired, subsequent
code validates the selection. Based on the selection, the
register %si is set to point to the
appropriate partition entry in the partition table. This new
selection overrides the previous default one. Indeed, it
becomes the new default. Finally, the ACTIVE flag of the
selected partition is set. If it was enabled at compile time,
the in-memory version of boot0 with these
modified values is written back to the MBR on
disk. We leave the details of this implementation to the
reader.We now end our study with the last code block from the
boot0 program:Recall that %si points to the selected
partition entry. This entry tells us where the partition begins
on disk. We assume, of course, that the partition selected is
actually a &os; slice.From now on, we will favor the use of the technically
more accurate term slice rather than
partition.The transfer buffer is set to 0x7c00
(register %bx), and a read for the first
sector of the &os; slice is requested by calling
intx13. We assume that everything went okay,
so a jump to beep is not performed. In
particular, the new sector read must end with the magic sequence
0xaa55. Finally, the value at
%si (the pointer to the selected partition
table) is preserved for use by the next stage, and a jump is
performed to address 0x7c00, where execution
of our next stage (the just-read block) is started.boot1 StageSo far we have gone through the following sequence:The BIOS did some early hardware
initialization, including the POST. The
MBR (boot0) was
loaded from absolute disk sector one to address
0x7c00. Execution control was passed to
that location.boot0 relocated itself to the
location it was linked to execute
(0x600), followed by a jump to continue
execution at the appropriate place. Finally,
boot0 loaded the first disk sector from
the &os; slice to address 0x7c00.
Execution control was passed to that location.boot1 is the next step in the
boot-loading sequence. It is the first of three boot stages.
Note that we have been dealing exclusively
with disk sectors. Indeed, the BIOS loads
the absolute first sector, while boot0
loads the first sector of the &os; slice. Both loads are to
address 0x7c00. We can conceptually think of
these disk sectors as containing the files
boot0 and boot1,
respectively, but in reality this is not entirely true for
boot1. Strictly speaking, unlike
boot0, boot1 is not
part of the boot blocks.
(There is a file /boot/boot1, but it
is not written to the beginning of the &os; slice.
Instead, it is concatenated with boot2
to form boot, which
is written to the beginning of the &os;
slice and read at boot time.)
Instead, a single, full-blown file, boot
(/boot/boot), is what ultimately is
written to disk. This file is a combination of
boot1, boot2 and the
Boot Extender (or BTX).
This single file is greater in size than a single sector
(greater than 512 bytes). Fortunately,
boot1 occupies exactly
the first 512 bytes of this single file, so when
boot0 loads the first sector of the &os;
slice (512 bytes), it is actually loading
boot1 and transferring control to
it.The main task of boot1 is to load the
next boot stage. This next stage is somewhat more complex. It
is composed of a server called the Boot Extender,
or BTX, and a client, called
boot2. As we will see, the last boot
stage, loader, is also a client of the
BTX server.Let us now look in detail at what exactly is done by
boot1, starting like we did for
boot0, at its entry point:The entry point at start simply jumps
past a special data area to the label main,
which in turn looks like this:Just like boot0, this
code relocates boot1,
this time to memory address 0x700. However,
unlike boot0, it does not jump there.
boot1 is linked to execute at
address 0x7c00, effectively where it was
loaded in the first place. The reason for this relocation will
be discussed shortly.Next comes a loop that looks for the &os; slice. Although
boot0 loaded boot1
from the &os; slice, no information was passed to it about this.
(Actually, we did pass a pointer to the slice entry in
register %si. However,
boot1 does not assume that it was
loaded by boot0; perhaps some other
MBR loaded it, and did not pass this
information. So it assumes nothing.)
So boot1 must rescan the
partition table to find where the &os; slice starts. Therefore
it rereads the MBR:In the code above, register %dl
maintains information about the boot device. This is passed on
by the BIOS and preserved by the
MBR. Numbers 0x80 and
greater tell us that we are dealing with a hard drive, so a
call is made to nread, where the
MBR is read. Arguments to
nread are passed through
%si and %dh. The memory
address at label part4 is copied to
%si. This memory address holds a
fake partition to be used by
nread. The following is the data in the fake
partition:In particular, the LBA for this fake
partition is hardcoded to zero. This is used as an argument to
the BIOS for reading absolute sector one from
the hard drive. Alternatively, CHS addressing could be used.
In this case, the fake partition holds cylinder 0, head 0 and
sector 1, which is equivalent to absolute sector one.Let us now proceed to take a look at
nread:Recall that %si points to the fake
partition. The word (in the context of 16-bit real mode, a
word is 2 bytes)
at offset 0x8 is copied to register
%ax and the word at offset 0xa
to %cx. They are interpreted by the
BIOS as the lower 4-byte value denoting the
LBA to be read (the upper four bytes are assumed to be zero).
Register %bx holds the memory address where
the MBR will be loaded. The instruction
pushing %cs onto the stack is very
interesting. In this context, it accomplishes nothing.
However, as we will see shortly, boot2, in
conjunction with the BTX server, also uses
xread.1. This mechanism will be discussed in
the next section.The code at xread.1 further calls
the read function, which actually calls the
BIOS asking for the disk sector:Note the long return instruction at the end of this block.
This instruction pops out the %cs register
pushed by nread, and returns. Finally,
nread also returns.With the MBR loaded to memory, the actual
loop for searching the &os; slice begins:If a &os; slice is identified, execution continues at
main.5. Note that when a &os; slice is found
%si points to the appropriate entry in the
partition table, and %dh holds the partition
number. We assume that a &os; slice is found, so we continue
execution at main.5:Recall that at this point, register %si
points to the &os; slice entry in the MBR
partition table, so a call to nread will
effectively read sectors at the beginning of this partition.
The argument passed on register %dh tells
nread to read 16 disk sectors. Recall that
the first 512 bytes, or the first sector of the &os; slice,
coincides with the boot1 program. Also
recall that the file written to the beginning of the &os;
slice is not /boot/boot1, but
/boot/boot. Let us look at the size of
these files in the filesystem:-r--r--r-- 1 root wheel 512B Jan 8 00:15 /boot/boot0
-r--r--r-- 1 root wheel 512B Jan 8 00:15 /boot/boot1
-r--r--r-- 1 root wheel 7.5K Jan 8 00:15 /boot/boot2
-r--r--r-- 1 root wheel 8.0K Jan 8 00:15 /boot/bootBoth boot0 and
boot1 are 512 bytes each, so they fit
exactly in one disk sector.
boot2 is much bigger, holding both
the BTX server and the
boot2 client. Finally, a file called
simply boot is 512 bytes larger than
boot2. This file is a
concatenation of boot1 and
boot2. As already noted,
boot0 is the file written to the absolute
first disk sector (the MBR), and
boot is the file written to the first
sector of the &os; slice; boot1 and
boot2 are not written
to disk. The command used to concatenate
boot1 and boot2 into a
single boot is merely
cat boot1 boot2 > boot.So boot1 occupies exactly the first 512
bytes of boot and, because
boot is written to the first sector of the
&os; slice, boot1 fits exactly in this
- first sector. Because nread reads the first
+ first sector. When nread reads the first
16 sectors of the &os; slice, it effectively reads the entire
boot file
512*16=8192 bytes, exactly the size of
boot.
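The size bookkeeping above is worth making explicit. A minimal check, using the sizes from the directory listing (boot1 is 512 bytes, boot2 is 7.5K, boot is 8.0K):

```c
#include <assert.h>

/* Sizes taken from the directory listing shown above. */
#define SECTOR_SIZE 512
#define NSECTORS    16               /* sectors nread is asked to load */
#define BOOT1_SIZE  512              /* /boot/boot1: one sector */
#define BOOT2_SIZE  (7 * 1024 + 512) /* /boot/boot2: 7.5K */

/* boot is the concatenation of boot1 and boot2, and the 16-sector
 * read covers it exactly: 512 * 16 = 8192 bytes = 8.0K. */
```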
We will see more details about how boot is
formed from boot1 and
boot2 in the next section.Recall that nread uses memory address
0x8c00 as the transfer buffer to hold the
sectors read. This address is conveniently chosen. Indeed,
because boot1 belongs to the first 512
bytes, it ends up in the address range
0x8c00-0x8dff. The 512
bytes that follows (range
0x8e00-0x8fff) is used to
store the bsdlabel (historically known as the disklabel). If you
ever wondered where &os; stores this information, it is in
this region; see &man.bsdlabel.8;.Starting at address 0x9000 is the
beginning of the BTX server, and immediately
following is the boot2 client. The
BTX server acts as a kernel, and executes in
protected mode in the most privileged level. In contrast, the
BTX clients (boot2, for
example), execute in user mode. We will see how this is
accomplished in the next section. The code after the call to
nread locates the beginning of
boot2 in the memory buffer, and copies it
to memory address 0xc000. This is because
the BTX server arranges
boot2 to execute in a segment starting at
0xa000. We explore this in detail in the
following section.The last code block of boot1 enables
access to memory above 1MB
(this is necessary for legacy reasons)
and concludes with a jump to the starting point of the
BTX server:Note that right before the jump, interrupts are
enabled.The BTX ServerNext in our boot sequence is the
BTX Server. Let us quickly remember how we
got here:The BIOS loads the absolute sector
one (the MBR, or
boot0), to address
0x7c00 and jumps there.boot0 relocates itself to
0x600, the address it was linked to
execute, and jumps over there. It then reads the first
sector of the &os; slice (which consists of
boot1) into address
0x7c00 and jumps over there.boot1 loads the first 16 sectors
of the &os; slice into address 0x8c00.
These 16 sectors, or 8192 bytes, make up the whole file
boot. The file is a
concatenation of boot1 and
boot2. boot2, in
turn, contains the BTX server and the
boot2 client. Finally, a jump is made
to address 0x9010, the entry point of the
BTX server.Before studying the BTX Server in detail,
let us further review how the single, all-in-one
boot file is created. The way
boot is built is defined in its
Makefile
(/usr/src/sys/boot/i386/boot2/Makefile).
Let us look at the rule that creates the
boot file:This tells us that boot1 and
boot2 are needed, and the rule simply
concatenates them to produce a single file called
boot. The rules for creating
boot1 are also quite simple:To apply the rule for creating
boot1, boot1.out must
be resolved. This, in turn, depends on the existence of
boot1.o. This last file is simply the
result of assembling our familiar boot1.S,
without linking. Now, the rule for creating
boot1.out is applied. This tells us that
boot1.o should be linked with
start as its entry point, and starting at
address 0x7c00. Finally,
boot1 is created from
boot1.out applying the appropriate rule.
This rule is the objcopy command applied to
boot1.out. Note the flags passed to
objcopy: -S tells it to
strip all relocation and symbolic information;
-O binary indicates the output format, that
is, a simple, unformatted binary file.Having boot1, let us take a look at how
boot2 is constructed:The mechanism for building boot2 is
far more elaborate. Let us point out the most relevant facts.
The dependency list is as follows:Note that initially there is no header file
boot2.h, but its creation depends on
boot1.out, which we already have. The rule
for its creation is a bit terse, but the important thing is that
the output, boot2.h, is something like
this:Recall that boot1 was relocated (i.e.,
copied from 0x7c00 to
0x700). This relocation will now make sense,
because as we will see, the BTX server
reclaims some memory, including the space where
boot1 was originally loaded. However, the
BTX server needs access to
boot1's xread function;
this function, according to the output of
boot2.h, is at location
0x725. Indeed, the
BTX server uses the
xread function from
boot1's relocated code. This function is
now accessible from within the boot2
client.We next build boot2.s from files
boot2.h, boot2.c and
/usr/src/sys/boot/common/ufsread.c. The
rule for this is to compile the code in
boot2.c (which includes
boot2.h and ufsread.c)
into assembly code. Having boot2.s, the
next rule assembles boot2.s, creating the
object file boot2.o. The
next rule directs the linker to link various files
(crt0.o,
boot2.o and sio.o).
Note that the output file, boot2.out, is
linked to execute at address 0x2000. Recall
that boot2 will be executed in user mode,
within a special user segment set up by the
BTX server. This segment starts at
0xa000. Also, remember that the
boot2 portion of boot
was copied to address 0xc000, that is, offset
0x2000 from the start of the user segment, so
boot2 will work properly when we transfer
control to it. Next, boot2.bin is created
from boot2.out by stripping its symbols and
format information; boot2.bin is a raw
binary. Now, note that a file boot2.ldr is
created as a 512-byte file full of zeros. This space is
reserved for the bsdlabel.Now that we have files boot1,
boot2.bin and
boot2.ldr, only the
BTX server is missing before creating the
all-in-one boot file. The
BTX server is located in
/usr/src/sys/boot/i386/btx/btx; it has its
own Makefile with its own set of rules for
building. The important thing to notice is that it is also
compiled as a raw binary, and that it is
linked to execute at address 0x9000. The
details can be found in
/usr/src/sys/boot/i386/btx/btx/Makefile.Having the files that comprise the boot
program, the final step is to merge them.
This is done by a special program called
btxld (source located in
/usr/src/usr.sbin/btxld). Some arguments
to this program include the name of the output file
(boot), its entry point
(0x2000) and its file format
(raw binary). The various files are
finally merged by this utility into the file
boot, which consists of
boot1, boot2, the
bsdlabel and the
BTX server. This file, which takes
exactly 16 sectors, or 8192 bytes, is what is
actually written to the beginning of the &os; slice
during installation. Let us now proceed to study the
BTX server program.The BTX server prepares a simple
environment and switches from 16-bit real mode to 32-bit
protected mode, right before passing control to the client.
This includes initializing and updating the following data
structures:Modifies the
Interrupt Vector Table (IVT). The
IVT provides exception and interrupt
handlers for Real-Mode code.The Interrupt Descriptor Table (IDT)
is created. Entries are provided for processor exceptions,
hardware interrupts, two system calls and V86 interface.
The IDT provides exception and interrupt handlers for
Protected-Mode code.A Task-State Segment (TSS) is
created. This is necessary because the processor works in
the least privileged level when
executing the client (boot2), but in
the most privileged level when
executing the BTX server.The GDT (Global Descriptor Table) is
set up. Entries (descriptors) are provided for
supervisor code and data, user code and data, and real-mode
code and data.
Real-mode code and data are necessary when switching
back to real mode from protected mode, as suggested by
the Intel manuals.Let us now start studying the actual implementation. Recall
that boot1 made a jump to address
0x9010, the BTX server's
entry point. Before studying program execution there,
note that the BTX server has a special header
at address range 0x9000-0x900f, right before
its entry point. This header is defined as follows:Note the first two bytes are 0xeb and
0xe. In the IA-32 architecture, these two
bytes are interpreted as a relative jump past the header into
the entry point, so in theory, boot1 could
jump here (address 0x9000) instead of address
0x9010. Note that the last field in the
BTX header is a pointer to the client's
(boot2) entry point. This field is patched
at link time.Immediately following the header is the
BTX server's entry point:This code disables interrupts, sets up a working stack
(starting at address 0x1800) and clears the
flags in the EFLAGS register. Note that the
popfl instruction pops out a doubleword (4
bytes) from the stack and places it in the EFLAGS register.
- Because the value actually popped is 2, the
+ As the value actually popped is 2, the
EFLAGS register is effectively cleared (IA-32 requires that bit
2 of the EFLAGS register always be 1).Our next code block clears (sets to 0)
the memory range 0x5e00-0x8fff. This range
is where the various data structures will be created:Recall that boot1 was originally loaded
to address 0x7c00, so, with this memory
initialization, that copy effectively disappeared. However,
also recall that boot1 was relocated to
0x700, so that copy is
still in memory, and the BTX server will make
use of it.Next, the real-mode IVT (Interrupt Vector
Table) is updated. The IVT is an array of
segment/offset pairs for exception and interrupt handlers. The
BIOS normally maps hardware interrupts to
interrupt vectors 0x8 to
0xf and 0x70 to
0x77 but, as will be seen, the 8259A
Programmable Interrupt Controller, the chip controlling the
actual mapping of hardware interrupts to interrupt vectors, is
programmed to remap these interrupt vectors from
0x8-0xf to 0x20-0x27 and
from 0x70-0x77 to
0x28-0x2f. Thus, interrupt handlers are
provided for interrupt vectors 0x20-0x2f.
The reason the BIOS-provided handlers are not
used directly is that they work in 16-bit real mode, but not in
32-bit protected mode. Processor mode will be switched to
32-bit protected mode shortly. However, the
BTX server sets up a mechanism to effectively
use the handlers provided by the BIOS:The next block creates the IDT (Interrupt
Descriptor Table). The IDT is analogous, in
protected mode, to the IVT in real mode.
That is, the IDT describes the various
exception and interrupt handlers used when the processor is
executing in protected mode. In essence, it also consists of an
array of segment/offset pairs, although the structure is
somewhat more complex, because segments in protected mode are
different from those in real mode, and various protection mechanisms
apply:Each entry in the IDT is 8 bytes long.
Besides the segment/offset information, they also describe the
segment type, privilege level, and whether the segment is
present in memory or not. The construction is such that
interrupt vectors from 0 to
0xf (exceptions) are handled by function
intx00; vector 0x10 (also
an exception) is handled by intx10; hardware
interrupts, which are later configured to start at interrupt
vector 0x20 all the way to interrupt vector
0x2f, are handled by function
intx20. Lastly, interrupt vector
0x30, which is used for system calls, is
handled by intx30, and vectors
0x31 and 0x32 are handled
by intx31. It must be noted that only
descriptors for interrupt vectors 0x30,
0x31 and 0x32 are given
privilege level 3, the same privilege level as the
boot2 client, which means the client can
execute a software-generated interrupt to these vectors through
the int instruction without failing (this is
the way boot2 uses the services provided by
the BTX server). Also, note that
only software-generated interrupts are
protected from code executing in lesser privilege levels.
Hardware-generated interrupts and processor-generated exceptions
are always handled adequately, regardless
of the actual privileges involved.The next step is to initialize the TSS
(Task-State Segment). The TSS is a hardware
feature that helps the operating system or executive software
implement multitasking functionality through process
abstraction. The IA-32 architecture demands the creation and
use of at least one TSS
if multitasking facilities are used or different privilege
- levels are defined. Because the boot2
+ levels are defined. Since the boot2
client is executed in privilege level 3, but the
BTX server runs in privilege level 0, a
TSS must be defined:Note that a value is given for the Privilege Level 0 stack
pointer and stack segment in the TSS. This
is needed because, if an interrupt or exception is received
while executing boot2 in Privilege Level 3,
a change to Privilege Level 0 is automatically performed by the
processor, so a new working stack is needed. Finally, the I/O
Map Base Address field of the TSS is given a
value, which is a 16-bit offset from the beginning of the
TSS to the I/O Permission Bitmap and the
Interrupt Redirection Bitmap.After the IDT and TSS
are created, the processor is ready to switch to protected mode.
This is done in the next block:First, a call is made to setpic to
program the 8259A PIC (Programmable Interrupt
Controller). This chip is connected to multiple hardware
interrupt sources. Upon receiving an interrupt from a device,
it signals the processor with the appropriate interrupt vector.
This can be customized so that specific interrupts are
associated with specific interrupt vectors, as explained before.
Next, the IDTR (Interrupt Descriptor Table
Register) and GDTR (Global Descriptor Table
Register) are loaded with the instructions
lidt and lgdt,
respectively. These registers are loaded with the base address
and limit address for the IDT and
GDT. The following three instructions set
the Protection Enable (PE) bit of the %cr0
register. This effectively switches the processor to 32-bit
protected mode. Next, a long jump is made to
init.8 using segment selector SEL_SCODE,
which selects the Supervisor Code Segment. The processor is
effectively executing in CPL 0, the most privileged level, after
this jump. Finally, the Supervisor Data Segment is selected for
the stack by assigning the segment selector SEL_SDATA to the
%ss register. This data segment also has a
privilege level of 0.Our last code block is responsible for loading the
TR (Task Register) with the segment selector
for the TSS we created earlier, and setting
the User Mode environment before passing execution control to
the boot2 client.Note that the client's environment includes a stack segment
selector and stack pointer (registers %ss and
%esp). Indeed, once the
TR is loaded with the appropriate stack
segment selector (instruction ltr), the stack
pointer is calculated and pushed onto the stack along with the
stack's segment selector. Next, the value
0x202 is pushed onto the stack; it is the
value that the EFLAGS will get when control is passed to the
client. Also, the User Mode code segment selector and the
client's entry point are pushed. Recall that this entry
point is patched in the BTX header at link
time. Finally, segment selectors (stored in register
%ecx) for the segment registers
%gs, %fs, %ds and %es are pushed onto the
stack, along with the value at %edx
(0xa000). Keep in mind the various values
that have been pushed onto the stack (they will be popped out
shortly). Next, values for the remaining general purpose
registers are also pushed onto the stack (note the
loop that pushes the value
0 seven times). Now the values start to be
popped off the stack. First, the
popa instruction pops out of the stack the
latest seven values pushed. They are stored in the general
purpose registers in order
%edi, %esi, %ebp, %ebx, %edx, %ecx, %eax.
Then, the various segment selectors pushed are popped into the
various segment registers. Five values still remain on the
stack. They are popped when the iret
instruction is executed. This instruction first pops
the value that was pushed from the BTX
header. This value is a pointer to boot2's
entry point. It is placed in the register
%eip, the instruction pointer register.
Next, the segment selector for the User Code Segment is popped
and copied to register %cs. Remember that
this segment's privilege level is 3, the least privileged
level. This means that we must provide values for the stack of
this privilege level. This is why the processor, besides
further popping the value for the EFLAGS register, does two more
pops out of the stack. These values go to the stack
pointer (%esp) and the stack segment
(%ss). Now, execution continues at
boot2's entry point.It is important to note how the User Code Segment is
defined. This segment's base address is
set to 0xa000. This means that code memory
addresses are relative to address 0xa000;
if code being executed is fetched from address
0x2000, the actual
memory addressed is
0xa000+0x2000=0xc000.boot2 Stageboot2 defines an important structure,
struct bootinfo. This structure is
initialized by boot2 and passed to the
loader, and then further to the kernel. Some fields of this
structure are set by boot2, the rest by the
loader. This structure, among other information, contains the
kernel filename, BIOS harddisk geometry,
BIOS drive number for boot device, physical
memory available, envp pointer etc. The
definition for it is:/usr/include/machine/bootinfo.h:
struct bootinfo {
u_int32_t bi_version;
u_int32_t bi_kernelname; /* represents a char * */
u_int32_t bi_nfs_diskless; /* struct nfs_diskless * */
/* End of fields that are always present. */
#define bi_endcommon bi_n_bios_used
u_int32_t bi_n_bios_used;
u_int32_t bi_bios_geom[N_BIOS_GEOM];
u_int32_t bi_size;
u_int8_t bi_memsizes_valid;
u_int8_t bi_bios_dev; /* bootdev BIOS unit number */
u_int8_t bi_pad[2];
u_int32_t bi_basemem;
u_int32_t bi_extmem;
u_int32_t bi_symtab; /* struct symtab * */
u_int32_t bi_esymtab; /* struct symtab * */
/* Items below only from advanced bootloader */
u_int32_t bi_kernend; /* end of kernel space */
u_int32_t bi_envp; /* environment */
u_int32_t bi_modulep; /* preloaded modules */
};boot2 enters into an infinite loop
waiting for user input, then calls load().
If the user does not press anything, the loop breaks by a
timeout, so load() will load the default
file (/boot/loader). Functions
ino_t lookup(char *filename) and
int xfsread(ino_t inode, void *buf, size_t
nbyte) are used to read the content of a file into
memory. /boot/loader is an
ELF binary, but where the
ELF header is prepended with
a.out's struct
exec structure. load() scans the
loader's ELF header, loading the content of
/boot/loader into memory, and passing the
execution to the loader's entry:sys/boot/i386/boot2/boot2.c:
__exec((caddr_t)addr, RB_BOOTINFO | (opts & RBX_MASK),
MAKEBOOTDEV(dev_maj[dsk.type], 0, dsk.slice, dsk.unit, dsk.part),
0, 0, 0, VTOP(&bootinfo));loader Stageloader is a
BTX client as well. I will not describe it
here in detail; there is a comprehensive man page written by
Mike Smith, &man.loader.8;. The underlying mechanisms and
BTX were discussed above.The main task for the loader is to boot the kernel. When
the kernel is loaded into memory, it is called by the
loader:sys/boot/common/boot.c:
/* Call the exec handler from the loader matching the kernel */
module_formats[km->m_loader]->l_exec(km);Kernel InitializationLet us take a look at the command that links the kernel.
This will help identify the exact location where the loader
passes execution to the kernel. This location is the kernel's
actual entry point.sys/conf/Makefile.i386:
ld -elf -Bdynamic -T /usr/src/sys/conf/ldscript.i386 -export-dynamic \
-dynamic-linker /red/herring -o kernel -X locore.o \
<lots of kernel .o files>A few interesting things can be seen here. First, the
kernel is an ELF dynamically linked binary, but the dynamic
linker for the kernel is /red/herring, which is
definitely a bogus file. Second, taking a look at the file
sys/conf/ldscript.i386 gives an idea about
what ld options are used when
compiling a kernel. Reading through the first few lines, the
stringsys/conf/ldscript.i386:
ENTRY(btext)says that a kernel's entry point is the symbol `btext'.
This symbol is defined in locore.s:sys/i386/i386/locore.s:
.text
/**********************************************************************
*
* This is where the bootblocks start us, set the ball rolling...
*
*/
NON_GPROF_ENTRY(btext)First, the register EFLAGS is set to a predefined value of
0x00000002. Then all the segment registers are
initialized:sys/i386/i386/locore.s:
/* Don't trust what the BIOS gives for eflags. */
pushl $PSL_KERNEL
popfl
/*
* Don't trust what the BIOS gives for %fs and %gs. Trust the bootstrap
* to set %cs, %ds, %es and %ss.
*/
mov %ds, %ax
mov %ax, %fs
mov %ax, %gsbtext calls the routines
recover_bootinfo(),
identify_cpu(),
create_pagetables(), which are also defined
in locore.s. Here is a description of what
they do:recover_bootinfoThis routine parses the parameters to the kernel
passed from the bootstrap. The kernel may have been
booted in 3 ways: by the loader, described above, by the
old disk boot blocks, or by the old diskless boot
procedure. This function determines the booting method,
and stores the struct bootinfo
structure into the kernel memory.identify_cpuThis function tries to find out what CPU it is
running on, storing the value found in a variable
_cpu.create_pagetablesThis function allocates and fills out a Page Table
Directory at the top of the kernel memory area.The next steps are enabling VME, if the CPU supports
it: testl $CPUID_VME, R(_cpu_feature)
jz 1f
movl %cr4, %eax
orl $CR4_VME, %eax
movl %eax, %cr4Then, enabling paging:/* Now enable paging */
movl R(_IdlePTD), %eax
movl %eax,%cr3 /* load ptd addr into mmu */
movl %cr0,%eax /* get control word */
orl $CR0_PE|CR0_PG,%eax /* enable paging */
movl %eax,%cr0 /* and let's page NOW! */The next three lines of code are needed because paging has
been enabled, so a jump is required to continue execution in the
virtualized address space: pushl $begin /* jump to high virtualized address */
ret
/* now running relocated at KERNBASE where the system is linked to run */
begin:The function init386() is called with
a pointer to the first free physical page, followed by a call to
mi_startup(). init386
is an architecture dependent initialization function, and
mi_startup() is an architecture independent
one (the 'mi_' prefix stands for Machine Independent). The
kernel never returns from mi_startup(), and
by calling it, the kernel finishes booting:sys/i386/i386/locore.s:
movl physfree, %esi
pushl %esi /* value of first for init386(first) */
call _init386 /* wire 386 chip for unix operation */
call _mi_startup /* autoconfiguration, mountroot etc */
hlt /* never returns to here */init386()init386() is defined in
sys/i386/i386/machdep.c and performs
low-level initialization specific to the i386 chip. The
switch to protected mode was performed by the loader. The
loader has created the very first task, in which the kernel
continues to operate. Before looking at the code, consider
the tasks the processor must complete to initialize protected
mode execution:Initialize the kernel tunable parameters, passed from
the bootstrapping program.Prepare the GDT.Prepare the IDT.Initialize the system console.Initialize the DDB, if it is compiled into
kernel.Initialize the TSS.Prepare the LDT.Set up proc0's pcb.init386() initializes the tunable
parameters passed from bootstrap by setting the environment
pointer (envp) and calling init_param1().
The envp pointer has been passed from loader in the
bootinfo structure:sys/i386/i386/machdep.c:
kern_envp = (caddr_t)bootinfo.bi_envp + KERNBASE;
/* Init basic tunables, hz etc */
init_param1();init_param1() is defined in
sys/kern/subr_param.c. That file has a
number of sysctls, and two functions,
init_param1() and
init_param2(), that are called from
init386():sys/kern/subr_param.c:
hz = HZ;
TUNABLE_INT_FETCH("kern.hz", &hz);TUNABLE_<typename>_FETCH is used to fetch the value
from the environment:/usr/src/sys/sys/kernel.h:
#define TUNABLE_INT_FETCH(path, var) getenv_int((path), (var))Sysctl kern.hz is the system clock
tick. Additionally, these sysctls are set by
init_param1(): kern.maxswzone,
kern.maxbcache, kern.maxtsiz, kern.dfldsiz, kern.maxdsiz,
kern.dflssiz, kern.maxssiz, kern.sgrowsiz.Global Descriptors Table (GDT)Then init386() prepares the Global
Descriptors Table (GDT). Every task on an x86 is running in
its own virtual address space, and this space is addressed by
a segment:offset pair. Say, for instance, the current
instruction to be executed by the processor lies at CS:EIP,
then the linear virtual address for that instruction would be
the virtual address of code segment CS + EIP.
For convenience, segments begin at virtual address 0 and end
at a 4Gb boundary. Therefore, the instruction's linear
virtual address for this example would just be the value of
EIP. Segment registers such as CS, DS, etc., are the selectors,
i.e., indexes, into GDT (to be more precise, an index is not a
selector itself, but the INDEX field of a selector).
FreeBSD's GDT holds descriptors for 15 selectors per
CPU:sys/i386/i386/machdep.c:
union descriptor gdt[NGDT * MAXCPU]; /* global descriptor table */
sys/i386/include/segments.h:
/*
* Entries in the Global Descriptor Table (GDT)
*/
#define GNULL_SEL 0 /* Null Descriptor */
#define GCODE_SEL 1 /* Kernel Code Descriptor */
#define GDATA_SEL 2 /* Kernel Data Descriptor */
#define GPRIV_SEL 3 /* SMP Per-Processor Private Data */
#define GPROC0_SEL 4 /* Task state process slot zero and up */
#define GLDT_SEL 5 /* LDT - eventually one per process */
#define GUSERLDT_SEL 6 /* User LDT */
#define GTGATE_SEL 7 /* Process task switch gate */
#define GBIOSLOWMEM_SEL 8 /* BIOS low memory access (must be entry 8) */
#define GPANIC_SEL 9 /* Task state to consider panic from */
#define GBIOSCODE32_SEL 10 /* BIOS interface (32bit Code) */
#define GBIOSCODE16_SEL 11 /* BIOS interface (16bit Code) */
#define GBIOSDATA_SEL 12 /* BIOS interface (Data) */
#define GBIOSUTIL_SEL 13 /* BIOS interface (Utility) */
#define GBIOSARGS_SEL 14 /* BIOS interface (Arguments) */Note that those #defines are not selectors themselves, but
just the INDEX field of a selector, so they are exactly the
indices of the GDT. For example, an actual selector for the
kernel code (GCODE_SEL) has the value 0x08.Interrupt Descriptor Table
(IDT)The next step is to initialize the Interrupt Descriptor
Table (IDT). This table is referenced by the processor when a
software or hardware interrupt occurs. For example, to make a
system call, a user application issues the
INT 0x80 instruction. This is a software
interrupt, so the processor's hardware looks up a record with
index 0x80 in the IDT. This record points to the routine that
handles this interrupt, in this particular case, this will be
the kernel's syscall gate. The IDT may have a maximum of 256
(0x100) records. The kernel allocates NIDT records for the
IDT, where NIDT is the maximum (256):sys/i386/i386/machdep.c:
static struct gate_descriptor idt0[NIDT];
struct gate_descriptor *idt = &idt0[0]; /* interrupt descriptor table */For each interrupt, an appropriate handler is set. The
syscall gate for INT 0x80 is set as
well:sys/i386/i386/machdep.c:
setidt(0x80, &IDTVEC(int0x80_syscall),
SDT_SYS386TGT, SEL_UPL, GSEL(GCODE_SEL, SEL_KPL));So when a userland application issues the
INT 0x80 instruction, control will transfer
to the function _Xint0x80_syscall, which
is in the kernel code segment and will be executed with
supervisor privileges.Console and DDB are then initialized:sys/i386/i386/machdep.c:
cninit();
/* skipped */
#ifdef DDB
kdb_init();
if (boothowto & RB_KDB)
Debugger("Boot flags requested debugger");
#endifThe Task State Segment is another x86 protected mode
structure, the TSS is used by the hardware to store task
information when a task switch occurs.The Local Descriptors Table is used to reference userland
code and data. Several selectors are defined to point to the
LDT, they are the system call gates and the user code and data
selectors:/usr/include/machine/segments.h:
#define LSYS5CALLS_SEL 0 /* forced by intel BCS */
#define LSYS5SIGR_SEL 1
#define L43BSDCALLS_SEL 2 /* notyet */
#define LUCODE_SEL 3
#define LSOL26CALLS_SEL 4 /* Solaris >= 2.6 system call gate */
#define LUDATA_SEL 5
/* separate stack, es,fs,gs sels ? */
/* #define LPOSIXCALLS_SEL 5*/ /* notyet */
#define LBSDICALLS_SEL 16 /* BSDI system call gate */
#define NLDT (LBSDICALLS_SEL + 1)Next, proc0's Process Control Block
(struct pcb) structure is initialized.
proc0 is a struct proc structure that
describes a kernel process. It is always present while the
kernel is running; therefore, it is declared as global:sys/kern/kern_init.c:
struct proc proc0;The structure struct pcb is a part of a
proc structure. It is defined in
/usr/include/machine/pcb.h and has a
process's information specific to the i386 architecture, such
as registers values.mi_startup()This function performs a bubble sort of all the system
initialization objects and then calls the entry of each object
one by one:sys/kern/init_main.c:
for (sipp = sysinit; *sipp; sipp++) {
/* ... skipped ... */
/* Call function */
(*((*sipp)->func))((*sipp)->udata);
/* ... skipped ... */
}Although the sysinit framework is described in the Developers'
Handbook, I will discuss the internals of it.sysinit objectsEvery system initialization object (sysinit object) is
created by calling a SYSINIT() macro. Let us take as example
an announce sysinit object. This object
prints the copyright message:sys/kern/init_main.c:
static void
print_caddr_t(void *data __unused)
{
printf("%s", (char *)data);
}
SYSINIT(announce, SI_SUB_COPYRIGHT, SI_ORDER_FIRST, print_caddr_t, copyright)The subsystem ID for this object is SI_SUB_COPYRIGHT
(0x0800001), which comes right after the SI_SUB_CONSOLE
(0x0800000). So, the copyright message will be printed out
first, just after the console initialization.Let us take a look at what exactly the macro
SYSINIT() does. It expands to a
C_SYSINIT() macro. The
C_SYSINIT() macro then expands to a static
struct sysinit structure declaration with
another DATA_SET macro call:/usr/include/sys/kernel.h:
#define C_SYSINIT(uniquifier, subsystem, order, func, ident) \
	static struct sysinit uniquifier ## _sys_init = { \
		subsystem, \
		order, \
		func, \
		ident \
	}; \
	DATA_SET(sysinit_set, uniquifier ## _sys_init);
#define SYSINIT(uniquifier, subsystem, order, func, ident) \
C_SYSINIT(uniquifier, subsystem, order, \
(sysinit_cfunc_t)(sysinit_nfunc_t)func, (void *)ident)The DATA_SET() macro expands to a
MAKE_SET(), and that macro is the point
where all the sysinit magic is hidden:/usr/include/linker_set.h:
#define MAKE_SET(set, sym) \
static void const * const __set_##set##_sym_##sym = &sym; \
__asm(".section .set." #set ",\"aw\""); \
__asm(".long " #sym); \
__asm(".previous")
#endif
#define TEXT_SET(set, sym) MAKE_SET(set, sym)
#define DATA_SET(set, sym) MAKE_SET(set, sym)In our case, the following declaration will occur:static struct sysinit announce_sys_init = {
SI_SUB_COPYRIGHT,
SI_ORDER_FIRST,
(sysinit_cfunc_t)(sysinit_nfunc_t) print_caddr_t,
(void *) copyright
};
static void const *const __set_sysinit_set_sym_announce_sys_init =
&announce_sys_init;
__asm(".section .set.sysinit_set" ",\"aw\"");
__asm(".long " "announce_sys_init");
__asm(".previous");The first __asm instruction will create
an ELF section within the kernel's executable. This will
happen at kernel link time. The section will have the name
.set.sysinit_set. The content of this
section is one 32-bit value, the address of announce_sys_init
structure, and that is what the second
__asm is. The third
__asm instruction marks the end of a
section. If a directive with the same section name occurred
before, the content, i.e., the 32-bit value, will be appended
to the existing section, so forming an array of 32-bit
pointers.Running objdump on a kernel
binary, you may notice the presence of such small
sections:&prompt.user; objdump -h /kernel
7 .set.cons_set 00000014 c03164c0 c03164c0 002154c0 2**2
CONTENTS, ALLOC, LOAD, DATA
8 .set.kbddriver_set 00000010 c03164d4 c03164d4 002154d4 2**2
CONTENTS, ALLOC, LOAD, DATA
9 .set.scrndr_set 00000024 c03164e4 c03164e4 002154e4 2**2
CONTENTS, ALLOC, LOAD, DATA
10 .set.scterm_set 0000000c c0316508 c0316508 00215508 2**2
CONTENTS, ALLOC, LOAD, DATA
11 .set.sysctl_set 0000097c c0316514 c0316514 00215514 2**2
CONTENTS, ALLOC, LOAD, DATA
12 .set.sysinit_set 00000664 c0316e90 c0316e90 00215e90 2**2
CONTENTS, ALLOC, LOAD, DATAThis screen dump shows that the size of .set.sysinit_set
section is 0x664 bytes, so 0x664/sizeof(void
*) = 409 sysinit objects are compiled into the kernel.
The other sections such as .set.sysctl_set
represent other linker sets.By defining a variable of type struct
linker_set the content of
.set.sysinit_set section will be
collected into that variable:sys/kern/init_main.c:
extern struct linker_set sysinit_set; /* XXX */The struct linker_set is defined as
follows:/usr/include/linker_set.h:
struct linker_set {
int ls_length;
void *ls_items[1]; /* really ls_length of them, trailing NULL */
};The first member will be equal to the number of sysinit
objects, and the second will be a NULL-terminated array
of pointers to them.Returning to the mi_startup()
discussion, it should now be clear how the sysinit objects
are organized. The mi_startup()
function sorts them and calls each. The very last object is
the system scheduler:/usr/include/sys/kernel.h:
enum sysinit_sub_id {
SI_SUB_DUMMY = 0x0000000, /* not executed; for linker*/
SI_SUB_DONE = 0x0000001, /* processed*/
SI_SUB_CONSOLE = 0x0800000, /* console*/
SI_SUB_COPYRIGHT = 0x0800001, /* first use of console*/
...
SI_SUB_RUN_SCHEDULER = 0xfffffff /* scheduler: no return*/
};The system scheduler sysinit object is defined in the file
sys/vm/vm_glue.c, and the entry point for
that object is scheduler(). That
function is actually an infinite loop, and it represents a
process with PID 0, the swapper process. The proc0 structure,
mentioned before, is used to describe it.The first user process, called init,
is created by the sysinit object
init:sys/kern/init_main.c:
static void
create_init(const void *udata __unused)
{
int error;
int s;
s = splhigh();
error = fork1(&proc0, RFFDG | RFPROC, &initproc);
if (error)
panic("cannot fork init: %d\n", error);
initproc->p_flag |= P_INMEM | P_SYSTEM;
cpu_set_fork_handler(initproc, start_init, NULL);
remrunqueue(initproc);
splx(s);
}
SYSINIT(init,SI_SUB_CREATE_INIT, SI_ORDER_FIRST, create_init, NULL)The create_init() function allocates a new
process by calling fork1(), but does not
mark it runnable. When this new process is scheduled for
execution by the scheduler, the
start_init() will be called. That
function is defined in init_main.c. It
tries to load and exec the init binary,
probing /sbin/init first, then
/sbin/oinit,
/sbin/init.bak, and finally
/stand/sysinstall:sys/kern/init_main.c:
static char init_path[MAXPATHLEN] =
#ifdef INIT_PATH
__XSTRING(INIT_PATH);
#else
"/sbin/init:/sbin/oinit:/sbin/init.bak:/stand/sysinstall";
#endif
diff --git a/en_US.ISO8859-1/books/arch-handbook/driverbasics/chapter.xml b/en_US.ISO8859-1/books/arch-handbook/driverbasics/chapter.xml
index 6e5551873b..9826e3a1d9 100644
--- a/en_US.ISO8859-1/books/arch-handbook/driverbasics/chapter.xml
+++ b/en_US.ISO8859-1/books/arch-handbook/driverbasics/chapter.xml
@@ -1,423 +1,423 @@
Writing FreeBSD Device DriversMurrayStokelyWritten by JörgWunschBased on intro(4) manual page by Introductiondevice driverpseudo-deviceThis chapter provides a brief introduction to writing device
drivers for FreeBSD. In this context, the term device is used
mostly for hardware-related components that belong to the system,
like disks, printers, or a graphics display with its keyboard.
A device driver is the software component of the operating
system that controls a specific device. There are also
so-called pseudo-devices where a device driver emulates the
behavior of a device in software without any particular
underlying hardware. Device drivers can be compiled into the
system statically or loaded on demand through the dynamic kernel
linker facility `kld'.device nodesMost devices in a &unix;-like operating system are accessed
through device-nodes, sometimes also called special files.
These files are usually located under the directory
/dev in the filesystem hierarchy.Device drivers can roughly be broken down into two
categories: character and network device drivers.Dynamic Kernel Linker Facility - KLDkernel linkingdynamickernel loadable modules (KLD)The kld interface allows system administrators to
dynamically add and remove functionality from a running system.
This allows device driver writers to load their new changes into
a running kernel without constantly rebooting to test
changes.kernel modulesloadingkernel modulesunloadingkernel moduleslistingThe kld interface is used through:kldload - loads a new kernel
modulekldunload - unloads a kernel
modulekldstat - lists loaded
modulesSkeleton Layout of a kernel module/*
* KLD Skeleton
* Inspired by Andrew Reiter's Daemonnews article
*/
#include <sys/types.h>
#include <sys/module.h>
#include <sys/systm.h> /* uprintf */
#include <sys/errno.h>
#include <sys/param.h> /* defines used in kernel.h */
#include <sys/kernel.h> /* types used in module initialization */
/*
* Load handler that deals with the loading and unloading of a KLD.
*/
static int
skel_loader(struct module *m, int what, void *arg)
{
int err = 0;
switch (what) {
case MOD_LOAD: /* kldload */
uprintf("Skeleton KLD loaded.\n");
break;
case MOD_UNLOAD:
uprintf("Skeleton KLD unloaded.\n");
break;
default:
err = EOPNOTSUPP;
break;
}
return(err);
}
/* Declare this module to the rest of the kernel */
static moduledata_t skel_mod = {
"skel",
skel_loader,
NULL
};
DECLARE_MODULE(skeleton, skel_mod, SI_SUB_KLD, SI_ORDER_ANY);Makefile&os; provides a system makefile to simplify compiling a
kernel module.SRCS=skeleton.c
KMOD=skeleton
.include <bsd.kmod.mk>Running make with this makefile
will create a file skeleton.ko that can
be loaded into the kernel by typing:&prompt.root; kldload -v ./skeleton.koCharacter Devicescharacter devicesA character device driver is one that transfers data
directly to and from a user process. This is the most common
type of device driver and there are plenty of simple examples in
the source tree.This simple example pseudo-device remembers whatever values
are written to it and can then echo them back when
read.Example of a Sample Echo Pseudo-Device Driver for
&os; 10.X - 12.X/*
* Simple Echo pseudo-device KLD
*
* Murray Stokely
* Søren (Xride) Straarup
* Eitan Adler
*/
#include <sys/types.h>
#include <sys/module.h>
#include <sys/systm.h> /* uprintf */
#include <sys/param.h> /* defines used in kernel.h */
#include <sys/kernel.h> /* types used in module initialization */
#include <sys/conf.h> /* cdevsw struct */
#include <sys/uio.h> /* uio struct */
#include <sys/malloc.h>
#define BUFFERSIZE 255
/* Function prototypes */
static d_open_t echo_open;
static d_close_t echo_close;
static d_read_t echo_read;
static d_write_t echo_write;
/* Character device entry points */
static struct cdevsw echo_cdevsw = {
.d_version = D_VERSION,
.d_open = echo_open,
.d_close = echo_close,
.d_read = echo_read,
.d_write = echo_write,
.d_name = "echo",
};
struct s_echo {
char msg[BUFFERSIZE + 1];
int len;
};
/* vars */
static struct cdev *echo_dev;
static struct s_echo *echomsg;
MALLOC_DECLARE(M_ECHOBUF);
MALLOC_DEFINE(M_ECHOBUF, "echobuffer", "buffer for echo module");
/*
* This function is called by the kld[un]load(2) system calls to
* determine what actions to take when a module is loaded or unloaded.
*/
static int
echo_loader(struct module *m __unused, int what, void *arg __unused)
{
int error = 0;
switch (what) {
case MOD_LOAD: /* kldload */
error = make_dev_p(MAKEDEV_CHECKNAME | MAKEDEV_WAITOK,
&echo_dev,
&echo_cdevsw,
0,
UID_ROOT,
GID_WHEEL,
0600,
"echo");
if (error != 0)
break;
echomsg = malloc(sizeof(*echomsg), M_ECHOBUF, M_WAITOK |
M_ZERO);
printf("Echo device loaded.\n");
break;
case MOD_UNLOAD:
destroy_dev(echo_dev);
free(echomsg, M_ECHOBUF);
printf("Echo device unloaded.\n");
break;
default:
error = EOPNOTSUPP;
break;
}
return (error);
}
static int
echo_open(struct cdev *dev __unused, int oflags __unused, int devtype __unused,
struct thread *td __unused)
{
int error = 0;
uprintf("Opened device \"echo\" successfully.\n");
return (error);
}
static int
echo_close(struct cdev *dev __unused, int fflag __unused, int devtype __unused,
struct thread *td __unused)
{
uprintf("Closing device \"echo\".\n");
return (0);
}
/*
* The read function just takes the buf that was saved via
* echo_write() and returns it to userland for accessing.
* uio(9)
*/
static int
echo_read(struct cdev *dev __unused, struct uio *uio, int ioflag __unused)
{
size_t amt;
int error;
/*
* How big is this read operation? Either as big as the user wants,
* or as big as the remaining data. Note that the 'len' does not
* include the trailing null character.
*/
amt = MIN(uio->uio_resid, uio->uio_offset >= echomsg->len + 1 ? 0 :
echomsg->len + 1 - uio->uio_offset);
if ((error = uiomove(echomsg->msg, amt, uio)) != 0)
uprintf("uiomove failed!\n");
return (error);
}
/*
* echo_write takes in a character string and saves it
* to buf for later accessing.
*/
static int
echo_write(struct cdev *dev __unused, struct uio *uio, int ioflag __unused)
{
size_t amt;
int error;
/*
* We either write from the beginning or are appending -- do
* not allow random access.
*/
if (uio->uio_offset != 0 && (uio->uio_offset != echomsg->len))
return (EINVAL);
/* This is a new message, reset length */
if (uio->uio_offset == 0)
echomsg->len = 0;
/* Copy the string in from user memory to kernel memory */
amt = MIN(uio->uio_resid, (BUFFERSIZE - echomsg->len));
error = uiomove(echomsg->msg + uio->uio_offset, amt, uio);
/* Now we need to null terminate and record the length */
echomsg->len = uio->uio_offset;
echomsg->msg[echomsg->len] = 0;
if (error != 0)
uprintf("Write failed: bad address!\n");
return (error);
}
DEV_MODULE(echo, echo_loader, NULL);With this driver loaded try:&prompt.root; echo -n "Test Data" > /dev/echo
&prompt.root; cat /dev/echo
Opened device "echo" successfully.
Test Data
Closing device "echo".Real hardware devices are described in the next
chapter.Block Devices (Are Gone)block devicesOther &unix; systems may support a second type of disk
device known as block devices. Block devices are disk devices
for which the kernel provides caching. This caching makes
block-devices almost unusable, or at least dangerously
unreliable. The caching will reorder the sequence of write
operations, depriving the application of the ability to know the
exact disk contents at any one instant in time.This makes predictable and reliable crash recovery of
on-disk data structures (filesystems, databases, etc.)
impossible. Since writes may be delayed, there is no way
the kernel can report to the application which particular
write operation encountered a write error; this further
compounds the consistency problem.For this reason, no serious applications rely on block
devices, and in fact, almost all applications which access
disks directly take great pains to specify that character
(or raw) devices should always be used. As
the implementation of the aliasing of each disk (partition) to
two devices with different semantics significantly complicated
the relevant kernel code, &os; dropped support for cached disk
devices as part of the modernization of the disk I/O
infrastructure.Network Driversnetwork devicesDrivers for network devices do not use device nodes in order
to be accessed. Their selection is based on other decisions
made inside the kernel and instead of calling open(), use of a
network device is generally introduced by using the system call
socket(2).For more information see ifnet(9), the source of the
loopback device, and Bill Paul's network drivers.
ISA Device DriversSergeyBabkinWritten by MurrayStokelyModifications for Handbook made by ValentinoVaschettoWylieStilwellSynopsisISAdevice driverISAThis chapter introduces the issues relevant to writing a
driver for an ISA device. The pseudo-code presented here is
rather detailed and reminiscent of the real code but is still
only pseudo-code. It avoids the details irrelevant to the
subject of the discussion. The real-life examples can be found
in the source code of real drivers. In particular the drivers
ep and aha are good sources of information.Basic InformationA typical ISA driver would need the following include
files:#include <sys/module.h>
#include <sys/bus.h>
#include <machine/bus.h>
#include <machine/resource.h>
#include <sys/rman.h>
#include <isa/isavar.h>
#include <isa/pnpvar.h>They describe the things specific to the ISA and generic
bus subsystem.object-orientedThe bus subsystem is implemented in an object-oriented
fashion, its main structures are accessed by associated method
functions.bus methodsThe list of bus methods implemented by an ISA driver is like
one for any other bus. For a hypothetical driver named xxx
they would be:static void xxx_isa_identify (driver_t *,
device_t); Normally used for bus drivers, not
device drivers. But for ISA devices this method may have
special use: if the device provides some device-specific
(non-PnP) way to auto-detect devices this routine may
implement it.static int xxx_isa_probe (device_t
dev); Probe for a device at a known (or PnP)
location. This routine can also accommodate device-specific
auto-detection of parameters for partially configured
devices.static int xxx_isa_attach (device_t
dev); Attach and initialize device.static int xxx_isa_detach (device_t
dev); Detach device before unloading the driver
module.static int xxx_isa_shutdown (device_t
dev); Execute shutdown of the device before
system shutdown.static int xxx_isa_suspend (device_t
dev); Suspend the device before the system goes
to the power-save state. May also abort transition to the
power-save state.static int xxx_isa_resume (device_t
dev); Resume the device activity after return
from power-save state.xxx_isa_probe() and
xxx_isa_attach() are mandatory, the rest of
the routines are optional, depending on the device's
needs.The driver is linked to the system with the following set of
descriptions. /* table of supported bus methods */
static device_method_t xxx_isa_methods[] = {
/* list all the bus method functions supported by the driver */
/* omit the unsupported methods */
DEVMETHOD(device_identify, xxx_isa_identify),
DEVMETHOD(device_probe, xxx_isa_probe),
DEVMETHOD(device_attach, xxx_isa_attach),
DEVMETHOD(device_detach, xxx_isa_detach),
DEVMETHOD(device_shutdown, xxx_isa_shutdown),
DEVMETHOD(device_suspend, xxx_isa_suspend),
DEVMETHOD(device_resume, xxx_isa_resume),
DEVMETHOD_END
};
static driver_t xxx_isa_driver = {
"xxx",
xxx_isa_methods,
sizeof(struct xxx_softc),
};
static devclass_t xxx_devclass;
DRIVER_MODULE(xxx, isa, xxx_isa_driver, xxx_devclass,
load_function, load_argument);softcHere struct xxx_softc is a
device-specific structure that contains private driver data
and descriptors for the driver's resources. The bus code
automatically allocates one softc descriptor per device as
needed.kernel moduleIf the driver is implemented as a loadable module then
load_function() is called to do
driver-specific initialization or clean-up when the driver is
loaded or unloaded and load_argument is passed as one of its
arguments. If the driver does not support dynamic loading (in
other words it must always be linked into the kernel) then these
values should be set to 0 and the last definition would look
like: DRIVER_MODULE(xxx, isa, xxx_isa_driver,
xxx_devclass, 0, 0);PnPIf the driver is for a device which supports PnP then a
table of supported PnP IDs must be defined. The table
consists of a list of PnP IDs supported by this driver and
human-readable descriptions of the hardware types and models
having these IDs. It looks like: static struct isa_pnp_id xxx_pnp_ids[] = {
/* a line for each supported PnP ID */
{ 0x12345678, "Our device model 1234A" },
{ 0x12345679, "Our device model 1234B" },
{ 0, NULL }, /* end of table */
};If the driver does not support PnP devices it still needs
an empty PnP ID table, like: static struct isa_pnp_id xxx_pnp_ids[] = {
{ 0, NULL }, /* end of table */
};device_t Pointerdevice_t is the pointer type for
the device structure. Here we consider only the methods
interesting from the device driver writer's standpoint. The
methods to manipulate values in the device structure
are:device_t
device_get_parent(dev) Get the parent bus of a
device.driver_t
device_get_driver(dev) Get pointer to its driver
structure.char
*device_get_name(dev) Get the driver name, such
as "xxx" for our example.int device_get_unit(dev)
Get the unit number (units are numbered from 0 for the
devices associated with each driver).char
*device_get_nameunit(dev) Get the device name
including the unit number, such as xxx0, xxx1 and so
on.char
*device_get_desc(dev) Get the device
description. Normally it describes the exact model of device
in human-readable form.device_set_desc(dev,
desc) Set the description. This makes the device
description point to the string desc which may not be
deallocated or changed after that.device_set_desc_copy(dev,
desc) Set the description. The description is
copied into an internal dynamically allocated buffer, so the
string desc may be changed afterwards without adverse
effects.void
*device_get_softc(dev) Get pointer to the device
descriptor (struct xxx_softc)
associated with this device.u_int32_t
device_get_flags(dev) Get the flags specified for
the device in the configuration file.A convenience function device_printf(dev, fmt,
...) may be used to print the messages from the
device driver. It automatically prepends the unit name and
a colon to the message.The device_t methods are implemented in the file
kern/subr_bus.c.Configuration File and the Order of Identifying and Probing
During Auto-ConfigurationISAprobingThe ISA devices are described in the kernel configuration file
like:device xxx0 at isa? port 0x300 irq 10 drq 5
iomem 0xd0000 flags 0x1 sensitiveIRQThe values of port, IRQ and so on are converted to the
resource values associated with the device. They are optional,
depending on the device's needs and abilities for
auto-configuration. For example, some devices do not need DRQ
at all and some allow the driver to read the IRQ setting from
the device configuration ports. If a machine has multiple ISA
buses the exact bus may be specified in the configuration
line, like isa0 or isa1, otherwise the device would be
searched for on all the ISA buses.sensitive is a resource requesting that this device must
be probed before all non-sensitive devices. It is supported
but does not seem to be used in any current driver.For legacy ISA devices in many cases the drivers are still
able to detect the configuration parameters. But each device
to be configured in the system must have a config line. If two
devices of some type are installed in the system but there is
only one configuration line for the corresponding driver, i.e.:
device xxx0 at isa? then only
one device will be configured.But for the devices supporting automatic identification by
the means of Plug-n-Play or some proprietary protocol one
configuration line is enough to configure all the devices in
the system, like the one above or just simply:device xxx at isa?If a driver supports both auto-identified and legacy
devices and both kinds are installed at once in one machine
then it is enough to describe in the config file the legacy
devices only. The auto-identified devices will be added
automatically.When an ISA bus is auto-configured the events happen as
follows:All the drivers' identify routines (including the PnP
identify routine which identifies all the PnP devices) are
called in random order. As they identify the devices they add
them to the list on the ISA bus. Normally the drivers'
identify routines associate their drivers with the new
devices. The PnP identify routine does not know about the
other drivers yet so it does not associate any with the new
devices it adds.The PnP devices are put to sleep using the PnP protocol to
prevent them from being probed as legacy devices.The probe routines of non-PnP devices marked as
sensitive are called. If the probe for a device succeeds,
the attach routine is called for it.The probe and attach routines of all non-PnP devices are
called likewise.The PnP devices are brought back from the sleep state and
assigned the resources they request: I/O and memory address
ranges, IRQs and DRQs, all of them not conflicting with the
attached legacy devices.Then for each PnP device the probe routines of all the
present ISA drivers are called. The first one that claims the
device gets attached. It is possible that multiple drivers
would claim the device with different priority; in this case, the
highest-priority driver wins. The probe routines must call
ISA_PNP_PROBE() to compare the actual PnP
ID with the list of the IDs supported by the driver and if the
ID is not in the table return failure. That means that
absolutely every driver, even the ones not supporting any PnP
devices, must call ISA_PNP_PROBE(), at
least with an empty PnP ID table to return failure on unknown
PnP devices.The probe routine returns a positive value (the error
code) on error, zero or negative value on success.The negative return values are used when a PnP device
supports multiple interfaces. For example, an older
compatibility interface and a newer advanced interface which
are supported by different drivers. Then both drivers would
detect the device. The driver which returns a higher value in
the probe routine takes precedence (in other words, the driver
returning 0 has highest precedence, returning -1 is next,
returning -2 is after it and so on). As a result, the devices
which support only the old interface will be handled by the
old driver (which should return -1 from the probe routine)
while the devices supporting the new interface as well will be
handled by the new driver (which should return 0 from the
probe routine). If multiple drivers return the same value then
the one called first wins. So if a driver returns value 0 it
may be sure that it won the priority arbitration.The device-specific identify routines can also assign not
a driver but a class of drivers to the device. Then all the
drivers in the class are probed for this device, like the case
with PnP. This feature is not implemented in any existing
driver and is not considered further in this document.
As the PnP devices are disabled when probing the
legacy devices they will not be attached twice (once as legacy
and once as PnP). But in case of device-dependent identify
routines it is the responsibility of the driver to make sure
that the same device will not be attached by the driver twice:
once as legacy user-configured and once as
auto-identified.Another practical consequence for the auto-identified
devices (both PnP and device-specific) is that the flags can
not be passed to them from the kernel configuration file. So
they must either not use the flags at all or use the flags
from the device unit 0 for all the auto-identified devices or
use the sysctl interface instead of flags.Other unusual configurations may be accommodated by
accessing the configuration resources directly with functions
of families resource_query_*() and
resource_*_value(). Their implementations
are located in kern/subr_bus.c. The old IDE disk driver
i386/isa/wd.c contains examples of such use. But the standard
means of configuration must always be preferred. Leave parsing
the configuration resources to the bus configuration
code.Resourcesresourcesdevice driverresourcesThe information that a user enters into the kernel
configuration file is processed and passed to the kernel as
configuration resources. This information is parsed by the bus
configuration code and transformed into a value of structure
device_t and the bus resources associated with it. The drivers
may access the configuration resources directly using
functions resource_* for more complex cases of
configuration. However, generally this is neither needed nor recommended,
so this issue is not discussed further here.The bus resources are associated with each device. They
are identified by type and number within the type. For the ISA
bus the following types are defined:DMA channelSYS_RES_IRQ - interrupt
numberSYS_RES_DRQ - ISA DMA channel
numberSYS_RES_MEMORY - range of
device memory mapped into the system memory space
SYS_RES_IOPORT - range of
device I/O registersThe enumeration within types starts from 0, so if a device
has two memory regions it would have resources of type
SYS_RES_MEMORY numbered 0 and 1. The resource type has
nothing to do with the C language type, all the resource
values have the C language type unsigned long and must be
cast as necessary. The resource numbers do not have to be
contiguous, although for ISA they normally would be. The
permitted resource numbers for ISA devices are: IRQ: 0-1
DRQ: 0-1
MEMORY: 0-3
IOPORT: 0-7All the resources are represented as ranges, with a start
value and count. For IRQ and DRQ resources the count would
normally be equal to 1. The values for memory refer to the
physical addresses.Three types of activities can be performed on
resources:set/getallocate/releaseactivate/deactivateSetting sets the range used by the resource. Allocation
reserves the requested range so that no other driver will be
able to reserve it (checking that no other driver has reserved
this range already). Activation makes the resource accessible
to the driver by doing whatever is necessary for that (for
example, for memory it would be mapping into the kernel
virtual address space).The functions to manipulate resources are:int bus_set_resource(device_t dev, int type,
int rid, u_long start, u_long count)Set a range for a resource. Returns 0 if successful,
error code otherwise. Normally, this function will
return an error only if one of type,
rid, start or
count has a value that falls out of the
permitted range. dev - driver's device type - type of resource, SYS_RES_* rid - resource number (ID) within type start, count - resource range int bus_get_resource(device_t dev, int type,
int rid, u_long *startp, u_long *countp)Get the range of resource. Returns 0 if successful,
error code if the resource is not defined yet.u_long bus_get_resource_start(device_t dev,
int type, int rid) u_long bus_get_resource_count (device_t
dev, int type, int rid)Convenience functions to get only the start or
count. Return 0 in case of error, so if the resource start
has 0 among the legitimate values it would be impossible
to tell if the value is 0 or an error occurred. Luckily,
no ISA resources for add-on drivers may have a start value
equal to 0.void bus_delete_resource(device_t dev, int
type, int rid) Delete a resource, make it undefined.struct resource *
bus_alloc_resource(device_t dev, int type, int *rid,
u_long start, u_long end, u_long count, u_int
flags)Allocate a resource as a range of count values not
allocated by anyone else, somewhere between start and
end. Alas, alignment is not supported. If the resource
was not set yet it is automatically created. The special
values of start 0 and end ~0 (all ones) mean that the
fixed values previously set by
bus_set_resource() must be used
instead: start and count as themselves and
end=(start+count), in this case if the resource was not
defined before then an error is returned. Although rid is
passed by reference it is not set anywhere by the resource
allocation code of the ISA bus. (The other buses may use a
different approach and modify it).Flags are a bitmap; the flags interesting to the caller
are:RF_ACTIVE - causes the resource
to be automatically activated after allocation.RF_SHAREABLE - resource may be
shared at the same time by multiple drivers.RF_TIMESHARE - resource may be
time-shared by multiple drivers, i.e., allocated at the
same time by many but activated only by one at any given
moment of time.Returns 0 on error. The allocated values may be
obtained from the returned handle using methods
rhand_*().int bus_release_resource(device_t dev, int
type, int rid, struct resource *r)Release the resource, r is the handle returned by
bus_alloc_resource(). Returns 0 on
success, error code otherwise.int bus_activate_resource(device_t dev, int
type, int rid, struct resource *r)int bus_deactivate_resource(device_t dev, int
type, int rid, struct resource *r)Activate or deactivate resource. Return 0 on success,
error code otherwise. If the resource is time-shared and
currently activated by another driver then EBUSY is
returned.int bus_setup_intr(device_t dev, struct
resource *r, int flags, driver_intr_t *handler, void *arg,
void **cookiep)int
bus_teardown_intr(device_t dev, struct resource *r, void
*cookie)Associate or de-associate the interrupt handler with a
device. Return 0 on success, error code otherwise.r - the activated resource handler describing the
IRQflags - the interrupt priority level, one of:INTR_TYPE_TTY - terminals and
other likewise character-type devices. To mask them
use spltty().(INTR_TYPE_TTY |
INTR_TYPE_FAST) - terminal type devices
with small input buffer, critical to the data loss on
input (such as the old-fashioned serial ports). To
mask them use spltty().INTR_TYPE_BIO - block-type
devices, except those on the CAM controllers. To mask
them use splbio().INTR_TYPE_CAM - CAM (Common
Access Method) bus controllers. To mask them use
splcam().INTR_TYPE_NET - network
interface controllers. To mask them use
splimp().INTR_TYPE_MISC -
miscellaneous devices. There is no other way to mask
them than by splhigh() which
masks all interrupts.When an interrupt handler executes, all the other
interrupts matching its priority level will be masked. The
only exception is the MISC level for which no other interrupts
are masked and which is not masked by any other
interrupt.handler - pointer to the handler
function, the type driver_intr_t is defined as void
driver_intr_t(void *)arg - the argument passed to the
handler to identify this particular device. It is cast
from void* to any real type by the handler. The old
convention for the ISA interrupt handlers was to use the
unit number as argument, the new (recommended) convention
is using a pointer to the device softc structure.cookie[p] - the value received
from setup() is used to identify the
handler when passed to
teardown()A number of methods are defined to operate on the resource
handlers (struct resource *). Those of interest to the device
driver writers are:u_long rman_get_start(r) u_long
rman_get_end(r) Get the start and end of
allocated resource range.void *rman_get_virtual(r) Get
the virtual address of activated memory resource.Bus Memory MappingIn many cases data is exchanged between the driver and the
device through the memory. Two variants are possible:(a) memory is located on the device card(b) memory is the main memory of the computerIn case (a) the driver always copies the data back and
forth between the on-card memory and the main memory as
necessary. To map the on-card memory into the kernel virtual
address space the physical address and length of the on-card
memory must be defined as a SYS_RES_MEMORY resource. That
resource can then be allocated and activated, and its virtual
address obtained using
rman_get_virtual(). The older drivers
used the function pmap_mapdev() for this
purpose, which should not be used directly any more. Now it is
one of the internal steps of resource activation.Most of the ISA cards will have their memory configured
for physical location somewhere in range 640KB-1MB. Some of
the ISA cards require larger memory ranges which should be
placed somewhere under 16MB (because of the 24-bit address
limitation on the ISA bus). In that case if the machine has
more memory than the start address of the device memory (in
other words, they overlap) a memory hole must be configured at
the address range used by devices. Many BIOSes allow
configuration of a memory hole of 1MB starting at 14MB or
15MB. FreeBSD can handle the memory holes properly if the BIOS
reports them properly (this feature may be broken on old BIOSes).In case (b) just the address of the data is sent to
the device, and the device uses DMA to actually access the
data in the main memory. Two limitations are present: First,
ISA cards can only access memory below 16MB. Second, the
contiguous pages in virtual address space may not be
contiguous in physical address space, so the device may have
to do scatter/gather operations. The bus subsystem provides
ready solutions for some of these problems, the rest has to be
done by the drivers themselves.Two structures are used for DMA memory allocation,
bus_dma_tag_t and bus_dmamap_t. Tag describes the properties
required for the DMA memory. Map represents a memory block
allocated according to these properties. Multiple maps may be
associated with the same tag.Tags are organized into a tree-like hierarchy with
inheritance of the properties. A child tag inherits all the
requirements of its parent tag, and may make them more strict
but never more loose.Normally one top-level tag (with no parent) is created for
each device unit. If multiple memory areas with different
requirements are needed for each device then a tag for each of
them may be created as a child of the parent tag.The tags can be used to create a map in two ways.First, a chunk of contiguous memory conformant with the
tag requirements may be allocated (and later may be
freed). This is normally used to allocate relatively
long-living areas of memory for communication with the
device. Loading of such memory into a map is trivial: it is
always considered as one chunk in the appropriate physical
memory range.Second, an arbitrary area of virtual memory may be loaded
into a map. Each page of this memory will be checked for
conformance to the map requirement. If it conforms then it is
left at its original location. If it is not then a fresh
conformant bounce page is allocated and used as intermediate
storage. When writing, the data from the non-conformant
original pages is copied to the bounce pages first
and then transferred from the bounce pages to the device. When
reading, the data goes from the device to the bounce pages
and is then copied to the non-conformant original pages. The
process of copying between the original and bounce pages is
called synchronization. This is normally used on a per-transfer
basis: buffer for each transfer would be loaded, transfer done
and buffer unloaded.The functions working on the DMA memory are:int bus_dma_tag_create(bus_dma_tag_t parent,
bus_size_t alignment, bus_size_t boundary, bus_addr_t
lowaddr, bus_addr_t highaddr, bus_dma_filter_t *filter, void
*filterarg, bus_size_t maxsize, int nsegments, bus_size_t
maxsegsz, int flags, bus_dma_tag_t *dmat)Create a new tag. Returns 0 on success, the error code
otherwise.parent - parent tag, or NULL to
create a top-level tag.alignment -
required physical alignment of the memory area to be
allocated for this tag. Use value 1 for no specific
alignment. Applies only to the future
bus_dmamem_alloc() but not
bus_dmamap_create() calls.boundary - physical address
boundary that must not be crossed when allocating the
memory. Use value 0 for no boundary. Applies only to
the future bus_dmamem_alloc() but
not bus_dmamap_create() calls.
Must be a power of 2. If the memory is planned to be used
in non-cascaded DMA mode (i.e., the DMA addresses will be
supplied not by the device itself but by the ISA DMA
controller) then the boundary must be no larger than
64KB (64*1024) due to the limitations of the DMA
hardware.lowaddr, highaddr - the names
are slightly misleading; these values are used to limit
the permitted range of physical addresses used to
allocate the memory. The exact meaning varies depending
on the planned future use:For bus_dmamem_alloc() all
the addresses from 0 to lowaddr-1 are considered
permitted, the higher ones are forbidden.For bus_dmamap_create() all
the addresses outside the inclusive range [lowaddr;
highaddr] are considered accessible. The addresses
of pages inside the range are passed to the filter
function which decides if they are accessible. If no
filter function is supplied then the whole range is
considered inaccessible.For the ISA devices the normal values (with no
filter function) are:lowaddr = BUS_SPACE_MAXADDR_24BIThighaddr = BUS_SPACE_MAXADDRfilter, filterarg - the filter
function and its argument. If NULL is passed for filter
then the whole range [lowaddr, highaddr] is considered
inaccessible when doing
bus_dmamap_create(). Otherwise the
physical address of each attempted page in range
[lowaddr; highaddr] is passed to the filter function
which decides if it is accessible. The prototype of the
filter function is: int filterfunc(void *arg,
bus_addr_t paddr). It must return 0 if the
page is accessible, non-zero otherwise.maxsize - the maximal size of
memory (in bytes) that may be allocated through this
tag. In case it is difficult to estimate or could be
arbitrarily big, the value for ISA devices would be
BUS_SPACE_MAXSIZE_24BIT.nsegments - maximal number of
scatter-gather segments supported by the device. If
unrestricted then the value BUS_SPACE_UNRESTRICTED
should be used. This value is recommended for the parent
tags, the actual restrictions would then be specified
for the descendant tags. Tags with nsegments equal to
BUS_SPACE_UNRESTRICTED may not be used to actually load
maps, they may be used only as parent tags. The
practical limit for nsegments seems to be about 250-300;
higher values will cause a kernel stack overflow (the hardware
cannot normally support that many
scatter-gather buffers anyway).maxsegsz - maximal size of a
scatter-gather segment supported by the device. The
maximal value for ISA device would be
BUS_SPACE_MAXSIZE_24BIT.flags - a bitmap of flags. The
only interesting flags are:BUS_DMA_ALLOCNOW - requests
to allocate all the potentially needed bounce pages
when creating the tag.BUS_DMA_ISA - mysterious
flag used only on Alpha machines. It is not defined
for the i386 machines. Probably it should be used
by all the ISA drivers for Alpha machines but it
looks like there are no such drivers yet.dmat - pointer to the storage
for the new tag to be returned.int bus_dma_tag_destroy(bus_dma_tag_t
dmat)Destroy a tag. Returns 0 on success, the error code
otherwise.dmat - the tag to be destroyed.int bus_dmamem_alloc(bus_dma_tag_t dmat,
void** vaddr, int flags, bus_dmamap_t
*mapp)Allocate an area of contiguous memory described by the
tag. The size of memory to be allocated is the tag's maxsize.
Returns 0 on success, the error code otherwise. The result
still has to be loaded by
bus_dmamap_load() before being used to get
the physical address of the memory.dmat - the tag
vaddr - pointer to the storage
for the kernel virtual address of the allocated area
to be returned.
flags - a bitmap of flags. The only interesting flag is:
BUS_DMA_NOWAIT - if the
memory is not immediately available return the
error. If this flag is not set then the routine
is allowed to sleep until the memory
becomes available.
mapp - pointer to the storage
for the new map to be returned.
void bus_dmamem_free(bus_dma_tag_t dmat, void
*vaddr, bus_dmamap_t map)
Free the memory allocated by
bus_dmamem_alloc(). At present,
freeing of the memory allocated with ISA restrictions is
not implemented. Due to this the recommended model
of use is to keep and re-use the allocated areas for as
long as possible. Do not lightly free some area and then
shortly allocate it again. That does not mean that
bus_dmamem_free() should not be
used at all: hopefully it will be properly implemented
soon.
dmat - the tag
vaddr - the kernel virtual
address of the memory
map - the map of the memory (as
returned from
bus_dmamem_alloc())
int bus_dmamap_create(bus_dma_tag_t dmat, int
flags, bus_dmamap_t *mapp)
Create a map for the tag, to be used in
bus_dmamap_load() later. Returns 0
on success, the error code otherwise.
dmat - the tag
flags - theoretically, a bit map
of flags. But no flags are defined yet, so at present
it will be always 0.
mapp - pointer to the storage
for the new map to be returned
int bus_dmamap_destroy(bus_dma_tag_t dmat,
bus_dmamap_t map)
Destroy a map. Returns 0 on success, the error code otherwise.
dmat - the tag to which the map is associated
map - the map to be destroyed
int bus_dmamap_load(bus_dma_tag_t dmat,
bus_dmamap_t map, void *buf, bus_size_t buflen,
bus_dmamap_callback_t *callback, void *callback_arg, int
flags)
Load a buffer into the map (the map must be previously
created by bus_dmamap_create() or
bus_dmamem_alloc()). All the pages
of the buffer are checked for conformance to the tag
requirements and for those not conformant the bounce
pages are allocated. An array of physical segment
descriptors is built and passed to the callback
routine. This callback routine is then expected to
handle it in some way. The number of bounce buffers in
the system is limited, so if the bounce buffers are
needed but not immediately available the request will be
queued and the callback will be called when the bounce
buffers become available. Returns 0 if the callback
was executed immediately or EINPROGRESS if the request
was queued for future execution. In the latter case the
synchronization with queued callback routine is the
responsibility of the driver.
dmat - the tag
map - the map
buf - kernel virtual address of
the buffer
buflen - length of the buffer
callback,
callback_arg - the callback function and
its argument
The prototype of callback function is:
void callback(void *arg, bus_dma_segment_t
*seg, int nseg, int error)arg - the same as callback_arg
passed to bus_dmamap_load()seg - array of the segment
descriptors
nseg - number of descriptors in
array
error - indication of the
segment number overflow: if it is set to EFBIG then
the buffer did not fit into the maximal number of
segments permitted by the tag. In this case only the
permitted number of descriptors will be in the
array. Handling of this situation is up to the
driver: depending on the desired semantics it can
either consider this an error or split the buffer in
two and handle the second part separately
Each entry in the segments array contains the fields:
ds_addr - physical bus address
of the segment
ds_len - length of the segment
void bus_dmamap_unload(bus_dma_tag_t dmat,
bus_dmamap_t map)Unload the map.
dmat - tag
map - loaded map
void bus_dmamap_sync (bus_dma_tag_t dmat,
bus_dmamap_t map, bus_dmasync_op_t op)
Synchronize a loaded buffer with its bounce pages before
and after physical transfer to or from device. This is
the function that does all the necessary copying of data
between the original buffer and its mapped version. The
buffers must be synchronized both before and after doing
the transfer.
dmat - tag
map - loaded map
op - type of synchronization
operation to perform:
BUS_DMASYNC_PREREAD - before
reading from device into buffer
BUS_DMASYNC_POSTREAD - after
reading from device into buffer
BUS_DMASYNC_PREWRITE - before
writing the buffer to device
BUS_DMASYNC_POSTWRITE - after
writing the buffer to device
As of now PREREAD and POSTWRITE are null operations but that
may change in the future, so they must not be ignored in the
driver. Synchronization is not needed for the memory
obtained from bus_dmamem_alloc().
Before calling the callback function from
bus_dmamap_load() the segment array is
stored in the stack. And it gets pre-allocated for the
maximal number of segments allowed by the tag. As a result of
this the practical limit for the number of segments on i386
architecture is about 250-300 (the kernel stack is 4KB minus
the size of the user structure, size of a segment array
entry is 8 bytes, and some space must be left). Since the
array is allocated based on the maximal number this value
must not be set higher than really needed. Fortunately, for
most hardware the maximal supported number of segments is
much lower. But if the driver wants to handle buffers with a
very large number of scatter-gather segments it should do
that in portions: load part of the buffer, transfer it to
the device, load next part of the buffer, and so on.
Another practical consequence is that the number of segments
may limit the size of the buffer. If all the pages in the
buffer happen to be physically non-contiguous then the
maximal supported buffer size for that fragmented case would
be (nsegments * page_size). For example, if a maximal number
of 10 segments is supported then on i386 maximal guaranteed
supported buffer size would be 40K. If a higher size is
desired then special tricks should be used in the driver.
If the hardware does not support scatter-gather at all or
the driver wants to support some buffer size even if it is
heavily fragmented then the solution is to allocate a
contiguous buffer in the driver and use it as intermediate
storage if the original buffer does not fit.
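The arithmetic above can be captured in a tiny helper (a hypothetical illustration only, not part of the bus_dma API): for a fully fragmented buffer each physically non-contiguous page consumes one scatter-gather segment, so the guaranteed buffer size is simply the product of the segment count and the page size.

```c
#include <stddef.h>

/* Worst-case guaranteed buffer size for a heavily fragmented
 * buffer: every page may need its own scatter-gather segment,
 * so at most nsegments pages fit into a single map load.
 * (Hypothetical helper, for illustration of the text above.)
 */
size_t
max_fragmented_size(size_t nsegments, size_t page_size)
{
    return nsegments * page_size;
}
```

With 10 segments and 4KB pages this yields the 40K figure quoted above.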
The typical call sequences when using a map depend
on the use of the map. The characters -> are used to show
the flow of time.
For a buffer which stays practically fixed during all the
time between attachment and detachment of a device:
bus_dmamem_alloc -> bus_dmamap_load -> ...use buffer... ->
-> bus_dmamap_unload -> bus_dmamem_free
For a buffer that changes frequently and is passed from
outside the driver:
bus_dmamap_create ->
-> bus_dmamap_load -> bus_dmamap_sync(PRE...) -> do transfer ->
-> bus_dmamap_sync(POST...) -> bus_dmamap_unload ->
...
-> bus_dmamap_load -> bus_dmamap_sync(PRE...) -> do transfer ->
-> bus_dmamap_sync(POST...) -> bus_dmamap_unload ->
-> bus_dmamap_destroy
When loading a map created by
bus_dmamem_alloc() the passed address
and size of the buffer must be the same as used in
bus_dmamem_alloc(). In this case it is
guaranteed that the whole buffer will be mapped as one
segment (so the callback may be based on this assumption)
and the request will be executed immediately (EINPROGRESS
will never be returned). All the callback needs to do in
this case is to save the physical address.
A typical example would be:
static void
alloc_callback(void *arg, bus_dma_segment_t *seg, int nseg, int error)
{
*(bus_addr_t *)arg = seg[0].ds_addr;
}
...
int error;
struct somedata {
....
};
struct somedata *vsomedata; /* virtual address */
bus_addr_t psomedata; /* physical bus-relative address */
bus_dma_tag_t tag_somedata;
bus_dmamap_t map_somedata;
...
error=bus_dma_tag_create(parent_tag, alignment,
boundary, lowaddr, highaddr, /*filter*/ NULL, /*filterarg*/ NULL,
/*maxsize*/ sizeof(struct somedata), /*nsegments*/ 1,
/*maxsegsz*/ sizeof(struct somedata), /*flags*/ 0,
&tag_somedata);
if(error)
return error;
error = bus_dmamem_alloc(tag_somedata, &vsomedata, /* flags*/ 0,
&map_somedata);
if(error)
return error;
bus_dmamap_load(tag_somedata, map_somedata, (void *)vsomedata,
sizeof (struct somedata), alloc_callback,
(void *) &psomedata, /*flags*/0);
Looks a bit long and complicated but that is the way to do
it. The practical consequence is: if multiple memory areas
are allocated always together it would be a really good idea
to combine them all into one structure and allocate as one
(if the alignment and boundary limitations permit).
When loading an arbitrary buffer into the map created by
bus_dmamap_create() special measures
must be taken to synchronize with the callback in case it
would be delayed. The code would look like:
{
int s;
int error;
s = splsoftvm();
error = bus_dmamap_load(
dmat,
dmamap,
buffer_ptr,
buffer_len,
callback,
/*callback_arg*/ buffer_descriptor,
/*flags*/0);
if (error == EINPROGRESS) {
/*
* Do whatever is needed to ensure synchronization
* with callback. Callback is guaranteed not to be started
* until we do splx() or tsleep().
*/
}
splx(s);
}
Two possible approaches for the processing of requests are:
1. If requests are completed by marking them explicitly as
done (such as the CAM requests) then it would be simpler to
put all the further processing into the callback routine
which would mark the request when it is done. Then not much
extra synchronization is needed. For the flow control
reasons it may be a good idea to freeze the request queue
until this request gets completed.
2. If requests are completed when the function returns (such
as classic read or write requests on character devices) then
a synchronization flag should be set in the buffer
descriptor and tsleep() called. Later
when the callback gets called it will do its processing and
check this synchronization flag. If it is set then the
callback should issue a wakeup. In this approach the
callback function could either do all the needed processing
(just like the previous case) or simply save the segments
array in the buffer descriptor. Then after callback
completes the calling function could use this saved segments
array and do all the processing.
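The second approach can be sketched in userland with POSIX threads standing in for the kernel tsleep()/wakeup() pair (all names and structure here are illustrative; a real driver would use the kernel primitives and a buffer descriptor of its own):

```c
#include <pthread.h>

/* Buffer descriptor carrying the synchronization flag described
 * above. The condition variable stands in for tsleep()/wakeup().
 */
struct buf_desc {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    int             done;      /* set by the callback when finished */
};

/* The possibly-delayed "callback": in a real driver this would be
 * the bus_dmamap_load() callback doing its processing, then
 * checking the flag and issuing a wakeup.
 */
static void *
callback(void *arg)
{
    struct buf_desc *bd = arg;

    pthread_mutex_lock(&bd->lock);
    bd->done = 1;
    pthread_cond_signal(&bd->done_cv);  /* the "wakeup" */
    pthread_mutex_unlock(&bd->lock);
    return NULL;
}

/* Issue a request and sleep until the callback reports completion.
 * Returns 1 once the transfer is done.
 */
int
run_transfer(void)
{
    struct buf_desc bd = { PTHREAD_MUTEX_INITIALIZER,
                           PTHREAD_COND_INITIALIZER, 0 };
    pthread_t tid;

    pthread_create(&tid, NULL, callback, &bd);

    pthread_mutex_lock(&bd.lock);
    while (!bd.done)                    /* the "tsleep" */
        pthread_cond_wait(&bd.done_cv, &bd.lock);
    pthread_mutex_unlock(&bd.lock);

    pthread_join(tid, NULL);
    return bd.done;
}
```

The flag-check inside the lock is what makes a delayed callback safe: whether the callback runs before or after the caller starts waiting, the caller cannot miss the completion.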
Direct Memory Access (DMA)
The Direct Memory Access (DMA) is implemented in the ISA bus
through the DMA controller (actually, two of them but that is
an irrelevant detail). To make the early ISA devices simple
and cheap the logic of the bus control and address
generation was concentrated in the DMA controller.
Fortunately, FreeBSD provides a set of functions that mostly
hide the annoying details of the DMA controller from the
device drivers.
The simplest case is for the fairly intelligent
devices. Like the bus master devices on PCI they can
generate the bus cycles and memory addresses all by
themselves. The only thing they really need from the DMA
controller is bus arbitration. So for this purpose they
pretend to be cascaded slave DMA controllers. And the only
thing needed from the system DMA controller is to enable the
cascaded mode on a DMA channel by calling the following
function when attaching the driver:
void isa_dmacascade(int channel_number)
All the further activity is done by programming the
device. When detaching the driver no DMA-related functions
need to be called.
For the simpler devices things get more complicated. The
functions used are:
int isa_dma_acquire(int channel_number)
Reserve a DMA channel. Returns 0 on success or EBUSY
if the channel was already reserved by this or a
different driver. Most of the ISA devices are not able
to share DMA channels anyway, so normally this
function is called when attaching a device. This
reservation was made redundant by the modern interface
of bus resources but still must be used in addition to
the latter. If it is not used then the other DMA routines
will panic later.
int isa_dma_release(int channel_number)
Release a previously reserved DMA channel. No
transfers must be in progress when the channel is
released (in addition the device must not try to
initiate transfer after the channel is released).
void isa_dmainit(int chan, u_int
bouncebufsize)
Allocate a bounce buffer for use with the specified
channel. The requested size of the buffer cannot exceed
64KB. This bounce buffer will be automatically used
later if a transfer buffer happens to be not
physically contiguous or outside of the memory
accessible by the ISA bus or crossing the 64KB
boundary. If the transfers will be always done from
buffers which conform to these conditions (such as
those allocated by
bus_dmamem_alloc() with proper
limitations) then isa_dmainit()
does not have to be called. But it is quite convenient
to transfer arbitrary data using the DMA controller.
The bounce buffer will automatically take care of the
scatter-gather issues.
chan - channel number
bouncebufsize - size of the
bounce buffer in bytes
void isa_dmastart(int flags, caddr_t addr, u_int
nbytes, int chan)
Prepare to start a DMA transfer. This function must be
called to set up the DMA controller before actually
starting transfer on the device. It checks that the
buffer is contiguous and falls into the ISA memory
range, if not then the bounce buffer is automatically
used. If bounce buffer is required but not set up by
isa_dmainit() or too small for
the requested transfer size then the system will
panic. In case of a write request with bounce buffer
the data will be automatically copied to the bounce
buffer.
flags - a bitmask determining the type of operation to
be done. The direction bits B_READ and B_WRITE are mutually
exclusive.
B_READ - read from the ISA bus into memory
B_WRITE - write from the memory to the ISA bus
B_RAW - if set then the DMA controller will remember
the buffer and after the end of transfer will
automatically re-initialize itself to repeat transfer
of the same buffer again (of course, the driver may
change the data in the buffer before initiating
another transfer in the device). If not set then the
parameters will work only for one transfer, and
isa_dmastart() will have to be
called again before initiating the next
transfer. Using B_RAW makes sense only if the bounce
buffer is not used.
addr - virtual address of the buffer
nbytes - length of the buffer. Must be less or equal to
64KB. Length of 0 is not allowed: the DMA controller will
understand it as 64KB while the kernel code will
understand it as 0 and that would cause unpredictable
effects. For channels number 4 and higher the length must
be even because these channels transfer 2 bytes at a
time. In case of an odd length the last byte will not be
transferred.
chan - channel number
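The length rules above can be captured in a short validation sketch (hypothetical, for illustration only): zero length is forbidden, the length may not exceed 64KB, and the 16-bit channels (4 and higher) require an even length.

```c
#include <stdint.h>

/* Validate a transfer length per the isa_dmastart() rules above:
 * - 0 is forbidden (the DMA controller would read it as 64KB),
 * - at most 64KB per transfer,
 * - channels 4 and higher move 2 bytes at a time, so the length
 *   must be even (an odd last byte would not be transferred).
 * (Hypothetical helper, for illustration of the text above.)
 */
int
isa_len_ok(uint32_t nbytes, int chan)
{
    if (nbytes == 0 || nbytes > 65536)
        return 0;
    if (chan >= 4 && (nbytes & 1) != 0)
        return 0;
    return 1;
}
```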
void isa_dmadone(int flags, caddr_t addr, int
nbytes, int chan)
Synchronize the memory after device reports that transfer
is done. If that was a read operation with a bounce buffer
then the data will be copied from the bounce buffer to the
original buffer. Arguments are the same as for
isa_dmastart(). Flag B_RAW is
permitted but it does not affect
isa_dmadone() in any way.
int isa_dmastatus(int channel_number)
Returns the number of bytes left to be transferred in the
current transfer. In case the flag B_RAW was set in
isa_dmastart() the number returned
will never be equal to zero: at the end of a transfer it
will be automatically reset back to the length of
buffer. The normal use is to check the number of bytes
left after the device signals that the transfer is
completed. If the number of bytes is not 0 then something
probably went wrong with that transfer.
int isa_dmastop(int channel_number)
Aborts the current transfer and returns the number of
bytes left untransferred.
xxx_isa_probe
This function probes if a device is present. If the driver
supports auto-detection of some part of device configuration
(such as interrupt vector or memory address) this
auto-detection must be done in this routine.
As for any other bus, if the device cannot be detected or
is detected but failed the self-test or some other problem
happened then it returns a positive value of error. The
value ENXIO must be returned if the device is not
present. Other error values may mean other conditions. Zero
or negative values mean success. Most of the drivers return
zero as success.
The negative return values are used when a PnP device
supports multiple interfaces. For example, an older
compatibility interface and a newer advanced interface which
are supported by different drivers. Then both drivers would
detect the device. The driver which returns a higher value
in the probe routine takes precedence (in other words, the
driver returning 0 has highest precedence, one returning -1
is next, one returning -2 is after it and so on). As a result,
the devices which support only the old interface will be
handled by the old driver (which should return -1 from the
probe routine) while the devices supporting the new
interface as well will be handled by the new driver (which
should return 0 from the probe routine).
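The precedence rule just described can be sketched as follows (an illustration of the ordering, not kernel code; the function and its name are hypothetical): among drivers whose probe returned zero or a negative value, the one with the highest return value wins.

```c
/* Pick the winning driver from an array of probe return values.
 * Positive values are errors (e.g. ENXIO); among the rest (zero
 * or negative, meaning success) the highest value takes
 * precedence: 0 beats -1, -1 beats -2, and so on.
 * Returns the index of the winner, or -1 if every probe failed.
 * (Hypothetical helper, for illustration of the text above.)
 */
int
pick_driver(const int *probe_ret, int ndrivers)
{
    int best = -1;
    for (int i = 0; i < ndrivers; i++) {
        if (probe_ret[i] > 0)          /* probe error, skip */
            continue;
        if (best < 0 || probe_ret[i] > probe_ret[best])
            best = i;
    }
    return best;
}
```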
The device descriptor struct xxx_softc is allocated by the
system before calling the probe routine. If the probe
routine returns an error the descriptor will be
automatically deallocated by the system. So if a probing
error occurs the driver must make sure that all the
resources it used during probe are deallocated and that
nothing keeps the descriptor from being safely
deallocated. If the probe completes successfully the
descriptor will be preserved by the system and later passed
to the routine xxx_isa_attach(). If a
driver returns a negative value it cannot be sure that it
will have the highest priority and its attach routine will
be called. So in this case it also must release all the
resources before returning and if necessary allocate them
again in the attach routine. When
xxx_isa_probe() returns 0 releasing the
resources before returning is also a good idea and a
well-behaved driver should do so. But in cases where there is
some problem with releasing the resources the driver is
allowed to keep resources between returning 0 from the probe
routine and execution of the attach routine.
A typical probe routine starts with getting the device
descriptor and unit:
struct xxx_softc *sc = device_get_softc(dev);
int unit = device_get_unit(dev);
int pnperror;
int error = 0;
sc->dev = dev; /* link it back */
sc->unit = unit;
Then check for the PnP devices. The check is carried out by
a table containing the list of PnP IDs supported by this
driver and human-readable descriptions of the device models
corresponding to these IDs.
pnperror=ISA_PNP_PROBE(device_get_parent(dev), dev,
xxx_pnp_ids); if(pnperror == ENXIO) return ENXIO;
The logic of ISA_PNP_PROBE is the following: If this card
(device unit) was not detected as PnP then ENOENT will be
returned. If it was detected as PnP but its detected ID does
not match any of the IDs in the table then ENXIO is
returned. Finally, if it has PnP support and it matches one
of the IDs in the table, 0 is returned and the appropriate
description from the table is set by
device_set_desc().
If a driver supports only PnP devices then the condition
would look like:
if(pnperror != 0)
return pnperror;
No special treatment is required for the drivers which do not
support PnP because they pass an empty PnP ID table and will
always get ENXIO if called on a PnP card.
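The described decision logic of ISA_PNP_PROBE can be modeled with a toy function (purely illustrative; the real macro lives in the kernel and works on bus-assigned IDs, not strings): ENOENT when the card was not detected as PnP, ENXIO when it is PnP but matches no table entry, 0 on a match. An empty table therefore yields ENXIO for any PnP card, as noted above.

```c
#include <errno.h>
#include <string.h>

/* Model of the ISA_PNP_PROBE decision described above.
 * ids is a NULL-terminated table of PnP ID strings supported by
 * the driver; detected_id is NULL for a card not detected as PnP.
 * (Hypothetical model, for illustration of the text above.)
 */
int
pnp_probe_model(const char *const *ids, const char *detected_id)
{
    if (detected_id == NULL)
        return ENOENT;              /* not detected as PnP */
    for (int i = 0; ids[i] != NULL; i++)
        if (strcmp(ids[i], detected_id) == 0)
            return 0;               /* matched: description gets set */
    return ENXIO;                   /* PnP, but not one of ours */
}
```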
The probe routine normally needs at least some minimal set
of resources, such as I/O port number to find the card and
probe it. Depending on the hardware the driver may be able
to discover the other necessary resources automatically. The
PnP devices have all the resources pre-set by the PnP
subsystem, so the driver does not need to discover them by
itself.
Typically the minimal information required to get access to
the device is the I/O port number. Then some devices allow
the rest of the information to be read from the device configuration
registers (though not all devices do that). So first we try
to get the port start value:
sc->port0 = bus_get_resource_start(dev,
SYS_RES_IOPORT, 0 /*rid*/); if(sc->port0 == 0) return ENXIO;
The base port address is saved in the structure softc for
future use. If it will be used very often then calling the
resource function each time would be prohibitively slow. If
we do not get a port we just return an error. Some device
drivers can instead be clever and try to probe all the
possible ports, like this:
/* table of all possible base I/O port addresses for this device */
static struct xxx_allports {
u_short port; /* port address */
short used; /* flag: if this port is already used by some unit */
} xxx_allports[] = {
{ 0x300, 0 },
{ 0x320, 0 },
{ 0x340, 0 },
{ 0, 0 } /* end of table */
};
...
int port, i;
...
port = bus_get_resource_start(dev, SYS_RES_IOPORT, 0 /*rid*/);
if(port != 0) {
for(i=0; xxx_allports[i].port!=0; i++) {
if(xxx_allports[i].used || xxx_allports[i].port != port)
continue;
/* found it */
xxx_allports[i].used = 1;
/* do probe on a known port */
return xxx_really_probe(dev, port);
}
return ENXIO; /* port is unknown or already used */
}
/* we get here only if we need to guess the port */
for(i=0; xxx_allports[i].port!=0; i++) {
if(xxx_allports[i].used)
continue;
/* mark as used - even if we find nothing at this port
* at least we won't probe it in future
*/
xxx_allports[i].used = 1;
error = xxx_really_probe(dev, xxx_allports[i].port);
if(error == 0) /* found a device at that port */
return 0;
}
/* probed all possible addresses, none worked */
return ENXIO;
Of course, normally the driver's
identify() routine should be used for
such things. But there may be one valid reason why it may be
better done in probe(): if this
probe would drive some other sensitive device crazy. The
probe routines are ordered with consideration of the
sensitive flag: the sensitive devices get probed first and
the rest of the devices later. But the
identify() routines are called before
any probes, so they show no respect to the sensitive devices
and may upset them.
Now, after we got the starting port we need to set the port
count (except for PnP devices) because the kernel does not
have this information in the configuration file.
if(pnperror /* only for non-PnP devices */
&& bus_set_resource(dev, SYS_RES_IOPORT, 0, sc->port0,
XXX_PORT_COUNT)<0)
return ENXIO;
Finally allocate and activate a piece of port address space
(special values of start and end mean use those we set by
bus_set_resource()):
sc->port0_rid = 0;
sc->port0_r = bus_alloc_resource(dev, SYS_RES_IOPORT,
&sc->port0_rid,
/*start*/ 0, /*end*/ ~0, /*count*/ 0, RF_ACTIVE);
if(sc->port0_r == NULL)
return ENXIO;
Now having access to the port-mapped registers we can poke
the device in some way and check if it reacts like it is
expected to. If it does not then there is probably some
other device or no device at all at this address.
Normally drivers do not set up the interrupt handlers until
the attach routine. Instead they do probes in the polling
mode using the DELAY() function for
timeout. The probe routine must never hang forever, all the
waits for the device must be done with timeouts. If the
device does not respond within the time it is probably broken
or misconfigured and the driver must return error. When
determining the timeout interval give the device some extra
time to be on the safe side: although
DELAY() is supposed to delay for the
same amount of time on any machine it has some margin of
error, depending on the exact CPU.
If the probe routine really wants to check that the
interrupts really work it may configure and probe the
interrupts too. But that is not recommended.
/* implemented in some very device-specific way */
if(error = xxx_probe_ports(sc))
goto bad; /* will deallocate the resources before returning */
The function xxx_probe_ports() may also
set the device description depending on the exact model of
device it discovers. But if there is only one supported
device model this can just as well be done in a hardcoded way.
Of course, for the PnP devices the PnP support sets the
description from the table automatically.
if(pnperror)
device_set_desc(dev, "Our device model 1234");
Then the probe routine should either discover the ranges of
all the resources by reading the device configuration
registers or make sure that they were set explicitly by the
user. We will consider it with an example of on-board
memory. The probe routine should be as non-intrusive as
possible, so allocation and check of functionality of the
rest of resources (besides the ports) would be better left
to the attach routine.
The memory address may be specified in the kernel
configuration file or on some devices it may be
pre-configured in non-volatile configuration registers. If
both sources are available and different, which one should
be used? Probably if the user bothered to set the address
explicitly in the kernel configuration file they know what
they are doing and this one should take precedence. An
example of implementation could be:
/* try to find out the config address first */
sc->mem0_p = bus_get_resource_start(dev, SYS_RES_MEMORY, 0 /*rid*/);
if(sc->mem0_p == 0) { /* nope, not specified by user */
sc->mem0_p = xxx_read_mem0_from_device_config(sc);
if(sc->mem0_p == 0)
/* can't get it from device config registers either */
goto bad;
} else {
if(xxx_set_mem0_address_on_device(sc) < 0)
goto bad; /* device does not support that address */
}
/* just like the port, set the memory size,
* for some devices the memory size would not be constant
* but should be read from the device configuration registers instead
* to accommodate different models of devices. Another option would
* be to let the user set the memory size as "msize" configuration
* resource which will be automatically handled by the ISA bus.
*/
if(pnperror) { /* only for non-PnP devices */
sc->mem0_size = bus_get_resource_count(dev, SYS_RES_MEMORY, 0 /*rid*/);
if(sc->mem0_size == 0) /* not specified by user */
sc->mem0_size = xxx_read_mem0_size_from_device_config(sc);
if(sc->mem0_size == 0) {
/* suppose this is a very old model of device without
* auto-configuration features and the user gave no preference,
* so assume the minimalistic case
* (of course, the real value will vary with the driver)
*/
sc->mem0_size = 8*1024;
}
if(xxx_set_mem0_size_on_device(sc) < 0)
goto bad; /* device does not support that size */
if(bus_set_resource(dev, SYS_RES_MEMORY, /*rid*/0,
sc->mem0_p, sc->mem0_size)<0)
goto bad;
} else {
sc->mem0_size = bus_get_resource_count(dev, SYS_RES_MEMORY, 0 /*rid*/);
}
Resources for IRQ and DRQ are easy to check by analogy.
If all went well then release all the resources and return success.
xxx_free_resources(sc);
return 0;
Finally, handle the troublesome situations. All the
resources should be deallocated before returning. We make
use of the fact that before the structure softc is passed to
us it gets zeroed out, so we can tell whether a resource
was allocated: its descriptor is then non-zero.
bad:
xxx_free_resources(sc);
if(error)
return error;
else /* exact error is unknown */
return ENXIO;
That would be all for the probe routine. Freeing of
resources is done from multiple places, so it is moved to a
function which may look like:
static void
xxx_free_resources(sc)
struct xxx_softc *sc;
{
/* check every resource and free if not zero */
/* interrupt handler */
if(sc->intr_r) {
bus_teardown_intr(sc->dev, sc->intr_r, sc->intr_cookie);
bus_release_resource(sc->dev, SYS_RES_IRQ, sc->intr_rid,
sc->intr_r);
sc->intr_r = 0;
}
/* all kinds of memory maps we could have allocated */
if(sc->data_p) {
bus_dmamap_unload(sc->data_tag, sc->data_map);
sc->data_p = 0;
}
if(sc->data) { /* sc->data_map may be legitimately equal to 0 */
/* the map will also be freed */
bus_dmamem_free(sc->data_tag, sc->data, sc->data_map);
sc->data = 0;
}
if(sc->data_tag) {
bus_dma_tag_destroy(sc->data_tag);
sc->data_tag = 0;
}
... free other maps and tags if we have them ...
if(sc->parent_tag) {
bus_dma_tag_destroy(sc->parent_tag);
sc->parent_tag = 0;
}
/* release all the bus resources */
if(sc->mem0_r) {
bus_release_resource(sc->dev, SYS_RES_MEMORY, sc->mem0_rid,
sc->mem0_r);
sc->mem0_r = 0;
}
...
if(sc->port0_r) {
bus_release_resource(sc->dev, SYS_RES_IOPORT, sc->port0_rid,
sc->port0_r);
sc->port0_r = 0;
}
}xxx_isa_attachThe attach routine actually connects the driver to the
system if the probe routine returned success and the system
has chosen to attach that driver. If the probe routine
returned 0 then the attach routine may expect to receive the
device structure softc intact, as it was set by the probe
routine, and may also expect that the attach routine for this
device will be called at some point in the future. If the
probe routine returned a negative value then the driver may
make none of these assumptions.
The attach routine returns 0 if it completed successfully or
an error code otherwise.
The attach routine starts just like the probe routine,
with getting some frequently used data into more accessible
variables.
struct xxx_softc *sc = device_get_softc(dev);
int unit = device_get_unit(dev);
int error = 0;Then allocate and activate all the necessary
- resources. Because normally the port range will be released
+ resources. As normally the port range will be released
before returning from probe, it has to be allocated
again. We expect that the probe routine had properly set all
the resource ranges, as well as saved them in the structure
softc. If the probe routine had left some resource allocated
then it does not need to be allocated again (which would be
considered an error).
sc->port0_rid = 0;
sc->port0_r = bus_alloc_resource(dev, SYS_RES_IOPORT, &sc->port0_rid,
/*start*/ 0, /*end*/ ~0, /*count*/ 0, RF_ACTIVE);
if(sc->port0_r == NULL)
return ENXIO;
/* on-board memory */
sc->mem0_rid = 0;
sc->mem0_r = bus_alloc_resource(dev, SYS_RES_MEMORY, &sc->mem0_rid,
/*start*/ 0, /*end*/ ~0, /*count*/ 0, RF_ACTIVE);
if(sc->mem0_r == NULL)
goto bad;
/* get its virtual address */
sc->mem0_v = rman_get_virtual(sc->mem0_r);The DMA request channel (DRQ) is allocated likewise. To
initialize it use functions of the
isa_dma*() family. For example:
isa_dmacascade(sc->drq0);The interrupt request line (IRQ) is a bit
special. Besides allocation, the driver's interrupt handler
must be associated with it. Historically in the old ISA
drivers the argument passed by the system to the interrupt
handler was the device unit number, but in modern drivers
the convention is to pass the pointer to the structure
softc. The important reason is that when the structures
softc are allocated dynamically, getting the unit number
from softc is easy while getting softc from the unit number is
difficult. Also this convention makes the drivers for
different buses look more uniform and allows them to share
the code: each bus gets its own probe, attach, detach and
other bus-specific routines while the bulk of the driver
code may be shared among them.
sc->intr_rid = 0;
sc->intr_r = bus_alloc_resource(dev, SYS_RES_IRQ, &sc->intr_rid,
/*start*/ 0, /*end*/ ~0, /*count*/ 0, RF_ACTIVE);
if(sc->intr_r == NULL)
goto bad;
/*
* XXX_INTR_TYPE is supposed to be defined depending on the type of
* the driver, for example as INTR_TYPE_CAM for a CAM driver
*/
error = bus_setup_intr(dev, sc->intr_r, XXX_INTR_TYPE,
(driver_intr_t *) xxx_intr, (void *) sc, &sc->intr_cookie);
if(error)
goto bad;
If the device needs to perform DMA to the main memory then
this memory should be allocated as described before:
error=bus_dma_tag_create(NULL, /*alignment*/ 4,
/*boundary*/ 0, /*lowaddr*/ BUS_SPACE_MAXADDR_24BIT,
/*highaddr*/ BUS_SPACE_MAXADDR, /*filter*/ NULL, /*filterarg*/ NULL,
/*maxsize*/ BUS_SPACE_MAXSIZE_24BIT,
/*nsegments*/ BUS_SPACE_UNRESTRICTED,
/*maxsegsz*/ BUS_SPACE_MAXSIZE_24BIT, /*flags*/ 0,
&sc->parent_tag);
if(error)
goto bad;
/* many things get inherited from the parent tag
* sc->data is supposed to point to the structure with the shared data,
* for example for a ring buffer it could be:
* struct {
* u_short rd_pos;
* u_short wr_pos;
* char bf[XXX_RING_BUFFER_SIZE]
* } *data;
*/
error=bus_dma_tag_create(sc->parent_tag, 1,
0, BUS_SPACE_MAXADDR, 0, /*filter*/ NULL, /*filterarg*/ NULL,
/*maxsize*/ sizeof(* sc->data), /*nsegments*/ 1,
/*maxsegsz*/ sizeof(* sc->data), /*flags*/ 0,
&sc->data_tag);
if(error)
goto bad;
error = bus_dmamem_alloc(sc->data_tag, &sc->data, /* flags*/ 0,
&sc->data_map);
if(error)
goto bad;
/* xxx_alloc_callback() just saves the physical address at
* the pointer passed as its argument, in this case &sc->data_p.
* See details in the section on bus memory mapping.
* It can be implemented like:
*
* static void
* xxx_alloc_callback(void *arg, bus_dma_segment_t *seg,
* int nseg, int error)
* {
* *(bus_addr_t *)arg = seg[0].ds_addr;
* }
*/
bus_dmamap_load(sc->data_tag, sc->data_map, (void *)sc->data,
sizeof (* sc->data), xxx_alloc_callback, (void *) &sc->data_p,
/*flags*/0);After all the necessary resources are allocated the
device should be initialized. The initialization may include
testing that all the expected features are functional. if(xxx_initialize(sc) < 0)
goto bad; The bus subsystem will automatically print on the
console the device description set by probe. But if the
driver wants to print some extra information about the
device it may do so, for example:
device_printf(dev, "has on-card FIFO buffer of %d bytes\n", sc->fifosize);
If the initialization routine experiences any problems,
then printing messages about them before returning an error is
also recommended.The final step of the attach routine is attaching the
device to its functional subsystem in the kernel. The exact
way to do it depends on the type of the driver: a character
device, a block device, a network device, a CAM SCSI bus
device and so on.If all went well then return success. error = xxx_attach_subsystem(sc);
if(error)
goto bad;
return 0; Finally, handle the troublesome situations. All the
resources should be deallocated before returning an
error. We make use of the fact that before the structure
softc is passed to us it gets zeroed out, so we can find out
if some resource was allocated: then its descriptor is
non-zero. bad:
xxx_free_resources(sc);
if(error)
return error;
else /* exact error is unknown */
return ENXIO;That would be all for the attach routine.xxx_isa_detach
If this function is present in the driver and the driver is
compiled as a loadable module then the driver gets the
ability to be unloaded. This is an important feature if the
hardware supports hot plug. But the ISA bus does not support
hot plug, so this feature is not particularly important for
the ISA devices. The ability to unload a driver may be
useful when debugging it, but in many cases a new version of
the driver would be installed only after the old version
somehow wedges the system and a reboot is needed anyway, so
the effort spent on writing the detach routine may not be
worth it. Another argument, that unloading would allow
upgrading drivers on a production machine, seems mostly
theoretical. Installing a new
version of a driver is a dangerous operation which should
never be performed on a production machine (and which is not
permitted when the system is running in secure mode). Still,
the detach routine may be provided for the sake of
completeness.
The detach routine returns 0 if the driver was successfully
detached or an error code otherwise.
The logic of detach is a mirror of the attach. The first
thing to do is to detach the driver from its kernel
subsystem. If the device is currently open then the driver
has two choices: refuse to be detached or forcibly close and
proceed with detach. The choice used depends on the ability
of the particular kernel subsystem to do a forced close and
on the preferences of the driver's author. Generally the
forced close seems to be the preferred alternative.
struct xxx_softc *sc = device_get_softc(dev);
int error;
error = xxx_detach_subsystem(sc);
if(error)
return error;
Next the driver may want to reset the hardware to some
consistent state. That includes stopping any ongoing
transfers, disabling the DMA channels and interrupts to
avoid memory corruption by the device. For most of the
drivers this is exactly what the shutdown routine does, so
if it is included in the driver we can just call it.
xxx_isa_shutdown(dev);
And finally release all the resources and return success.
xxx_free_resources(sc);
return 0;xxx_isa_shutdown
This routine is called when the system is about to be shut
down. It is expected to bring the hardware to some
consistent state. For most of the ISA devices no special
action is required, so the function is not really necessary
because the device will be re-initialized on reboot
anyway. But some devices have to be shut down with a special
procedure, to make sure that they will be properly detected
after soft reboot (this is especially true for many devices
with proprietary identification protocols). In any case
disabling DMA and interrupts in the device registers and
stopping any ongoing transfers is a good idea. The exact
action depends on the hardware, so we do not consider it here
in any detail.
xxx_intrinterrupt handler
The interrupt handler is called when an interrupt is
received which may be from this particular device. The ISA
bus does not support interrupt sharing (except in some special
cases) so in practice if the interrupt handler is called
then the interrupt almost certainly came from its
device. Still, the interrupt handler must poll the device
registers and make sure that the interrupt was generated by
its device. If not it should just return.
The old convention for the ISA drivers was getting the
device unit number as an argument. This is obsolete, and the
new drivers receive whatever argument was specified for them
in the attach routine when calling
bus_setup_intr(). By the new convention
it should be the pointer to the structure softc. So the
interrupt handler commonly starts as:
static void
xxx_intr(struct xxx_softc *sc)
{
It runs at the interrupt priority level specified by the
interrupt type parameter of
bus_setup_intr(). That means that all
the other interrupts of the same type as well as all the
software interrupts are disabled.
To avoid races it is commonly written as a loop:
while(xxx_interrupt_pending(sc)) {
xxx_process_interrupt(sc);
xxx_acknowledge_interrupt(sc);
}
The interrupt handler has to acknowledge the interrupt to the
device only, not to the interrupt controller; the system
takes care of the latter.
diff --git a/en_US.ISO8859-1/books/arch-handbook/pccard/chapter.xml b/en_US.ISO8859-1/books/arch-handbook/pccard/chapter.xml
index a9a2753d9a..59261a9568 100644
--- a/en_US.ISO8859-1/books/arch-handbook/pccard/chapter.xml
+++ b/en_US.ISO8859-1/books/arch-handbook/pccard/chapter.xml
@@ -1,367 +1,367 @@
PC CardPC CardCardBusThis chapter will talk about the FreeBSD mechanisms for
writing a device driver for a PC Card or CardBus device. However,
at present it just documents how to add a new device to an
existing pccard driver.Adding a DeviceDevice drivers know what devices they support. There is a
table of supported devices in the kernel that drivers use to
attach to a device.OverviewCISPC Cards are identified in one of two ways, both based on
the Card Information Structure
(CIS)
stored on the card. The
first method is to use numeric manufacturer and product
numbers. The second method is to use the human readable
strings that are also contained in the CIS. The PC Card bus
uses a centralized database and some macros to facilitate a
design pattern to help the driver writer match devices to his
driver.Original equipment manufacturers (OEMs)
often develop a reference design for a PC Card product, then
sell this design to other companies to market. Those
companies refine the design, market the product to their
target audience or geographic area, and put their own name
plate onto the card. The refinements to the physical card are
typically very minor, if any changes are made at all. To
strengthen their brand, these vendors place their company name
in the human readable strings in the CIS space, but leave the
manufacturer and product IDs unchanged.NetGearLinksysD-Link
- Because of this practice, FreeBSD drivers usually rely on
+ Due to this practice, FreeBSD drivers usually rely on
numeric IDs for device identification. Using numeric IDs and
a centralized database complicates adding IDs and support for
cards to the system. One must carefully check to see who
really made the card, especially when it appears that the
vendor who made the card might already have a different
manufacturer ID listed in the central database. Linksys,
D-Link, and NetGear are US manufacturers of LAN
hardware that often sell the same design. These same designs
can be sold in Japan under names such as Buffalo and Corega.
Often, these devices will all have the same manufacturer and
product IDs.The PC Card bus code keeps a central database of card
information, but not which driver is associated with them, in
/sys/dev/pccard/pccarddevs. It also
provides a set of macros that allow one to easily construct
simple entries in the table the driver uses to claim
devices.Finally, some really low end devices do not contain
manufacturer identification at all. These devices must be
detected by matching the human readable CIS strings.
While it would be nice if we did not need this method as a
fallback, it is necessary for some very low end CD-ROM players
and Ethernet cards. This method should generally be
avoided, but a number of devices are listed in this section
because they were added prior to the recognition of the
OEM nature of the PC Card business. When
adding new devices, prefer using the numeric method.Format of pccarddevsThere are four sections in the
pccarddevs files. The first section
lists the manufacturer numbers for vendors that use
them. This section is sorted in numerical order. The next
section has all of the products that are used by these
vendors, along with their product ID numbers and a description
string. The description string typically is not used (instead
we set the device's description based on the human readable
CIS, even if we match on the numeric version). These two
sections are then repeated for devices that use the
string matching method. Finally, C-style comments enclosed in
/* and */ characters are
allowed anywhere in the file.The first section of the file contains the vendor IDs.
Please keep this list sorted in numeric order. Also, please
coordinate changes to this file because we share it with
NetBSD to help facilitate a common clearing house for this
information. For example, here are the first few vendor
IDs:vendor FUJITSU 0x0004 Fujitsu Corporation
vendor NETGEAR_2 0x000b Netgear
vendor PANASONIC 0x0032 Matsushita Electric Industrial Co.
vendor SANDISK 0x0045 Sandisk CorporationChances are very good
that the NETGEAR_2 entry is really an OEM
that NETGEAR purchased cards from and the author of support
for those cards was unaware at the time that Netgear was using
someone else's ID. These entries are fairly straightforward.
The vendor keyword denotes the kind of line that this is,
followed by the name of the vendor. This name will be
repeated later in pccarddevs, as
well as used in the driver's match tables, so keep it short
and a valid C identifier. A numeric ID in hex identifies the
manufacturer. Do not add IDs of the form
0xffffffff or 0xffff
because these are reserved IDs (the former is
no ID set while the latter is sometimes seen in
extremely poor quality cards to try to indicate
none). Finally there is a string description
of the company that makes the card. This string is not used
in FreeBSD for anything but commentary purposes.The second section of the file contains the products. As
shown in this example, the format is similar to the vendor
lines:/* Allied Telesis K.K. */
product ALLIEDTELESIS LA_PCM 0x0002 Allied Telesis LA-PCM
/* Archos */
product ARCHOS ARC_ATAPI 0x0043 MiniCDThe
product keyword is followed by the vendor
name, repeated from above. This is followed by the product
name, which is used by the driver and should be a valid C
identifier, but may also start with a number. As with the
vendors, the hex product ID for this card follows the same
convention for 0xffffffff and
0xffff. Finally, there is a string
description of the device itself. This string typically is
not used in FreeBSD, since FreeBSD's pccard bus driver will
construct a string from the human readable CIS entries, but it
can be used in the rare cases where this is somehow
insufficient. The products are in alphabetical order by
manufacturer, then numerical order by product ID. They have a
C comment before each manufacturer's entries and there is a
blank line between entries.The third section is like the previous vendor section, but
with all of the manufacturer numeric IDs set to
-1, meaning
match anything found in the FreeBSD pccard
bus code. Since these are C identifiers, their names must be
unique. Otherwise the format is identical to the first
section of the file.The final section contains the entries for those cards
that must be identified by string entries. This section's
format is a little different from the generic section:product ADDTRON AWP100 { "Addtron", "AWP-100&spWireless&spPCMCIA", "Version&sp01.02", NULL }
product ALLIEDTELESIS WR211PCM { "Allied&spTelesis&spK.K.", "WR211PCM", NULL, NULL } Allied Telesis WR211PCMThe familiar product keyword is
followed by the vendor name and the card name, just as in the
second section of the file. Here the format deviates from
that used earlier. There is a {} grouping, followed by a
number of strings. These strings correspond to the vendor,
product, and extra information that is defined in a CIS_INFO
tuple. These strings are filtered by the program that
generates pccarddevs.h to replace &sp
with a real space. NULL strings mean that the corresponding
part of the entry should be ignored. The example shown here
contains a bad entry. It should not contain the version
number unless that is critical for the operation of the card.
Sometimes vendors will have many different versions of the
card in the field that all work, in which case that
information only makes it harder for someone with a similar
card to use it with FreeBSD. Sometimes it is necessary when a
vendor wishes to sell many different parts under the same
brand due to market considerations (availability, price, and
so forth). Then it can be critical for disambiguating the card
in those rare cases where the vendor kept the same
manufacturer/product pair. Regular expression matching is not
available at this time.Sample Probe RoutinePC CardprobeTo understand how to add a device to the list of supported
devices, one must understand the probe and/or match routines
that many drivers have. It is complicated a little in FreeBSD
5.x because there is a compatibility layer for OLDCARD present
as well. Since only the window-dressing is different, an
idealized version will be presented here.static const struct pccard_product wi_pccard_products[] = {
PCMCIA_CARD(3COM, 3CRWE737A, 0),
PCMCIA_CARD(BUFFALO, WLI_PCM_S11, 0),
PCMCIA_CARD(BUFFALO, WLI_CF_S11G, 0),
PCMCIA_CARD(TDK, LAK_CD011WL, 0),
{ NULL }
};
static int
wi_pccard_probe(dev)
device_t dev;
{
const struct pccard_product *pp;
if ((pp = pccard_product_lookup(dev, wi_pccard_products,
sizeof(wi_pccard_products[0]), NULL)) != NULL) {
if (pp->pp_name != NULL)
device_set_desc(dev, pp->pp_name);
return (0);
}
return (ENXIO);
}Here we have a simple pccard probe routine that matches a
few devices. As stated above, the name may vary (if it is not
foo_pccard_probe() it will be
foo_pccard_match()). The function
pccard_product_lookup() is a generalized
function that walks the table and returns a pointer to the
first entry that it matches. Some drivers may use this
mechanism to convey additional information about some cards to
the rest of the driver, so there may be some variance in the
table. The only requirement is that each row of the table
must have a struct pccard_product as the first
element.Looking at the table
wi_pccard_products, one notices that
all the entries are of the form
PCMCIA_CARD(foo,
bar,
baz). The
foo part is the manufacturer ID
from pccarddevs. The
bar part is the product ID.
baz is the expected function number
for this card. Many pccards can have multiple functions,
and some way to disambiguate function 1 from function 0 is
needed. You may see PCMCIA_CARD_D, which
includes the device description from
pccarddevs. You may also see
PCMCIA_CARD2 and
PCMCIA_CARD2_D which are used when you need
to match both CIS strings and manufacturer numbers, in the
"use the default description" and "take the
description from pccarddevs" flavors, respectively.Putting it All TogetherTo add a new device, one must first obtain the
identification information from the
device. The easiest way to do this is to insert the device
into a PC Card or CF slot and issue
devinfo -v. Sample output: cbb1 pnpinfo vendor=0x104c device=0xac51 subvendor=0x1265 subdevice=0x0300 class=0x060700 at slot=10 function=1
cardbus1
pccard1
unknown pnpinfo manufacturer=0x026f product=0x030c cisvendor="BUFFALO" cisproduct="WLI2-CF-S11" function_type=6 at function=0manufacturer
and product are the numeric IDs for this
product, while cisvendor and
cisproduct are the product description
strings from the CIS.Since the numeric method is preferred, first
try to construct an entry based on that. The above card has
been slightly fictionalized for the purpose of this example.
The vendor is BUFFALO, which we see already has an
entry:vendor BUFFALO 0x026f BUFFALO (Melco Corporation)But there is no entry for this particular card.
Instead we find:/* BUFFALO */
product BUFFALO WLI_PCM_S11 0x0305 BUFFALO AirStation 11Mbps WLAN
product BUFFALO LPC_CF_CLT 0x0307 BUFFALO LPC-CF-CLT
product BUFFALO LPC3_CLT 0x030a BUFFALO LPC3-CLT Ethernet Adapter
product BUFFALO WLI_CF_S11G 0x030b BUFFALO AirStation 11Mbps CF WLANTo add the device, we can just add this entry to
pccarddevs:product BUFFALO WLI2_CF_S11G 0x030c BUFFALO AirStation ultra 802.11b CFOnce these steps are complete, the card can be added to
the driver. That is a simple operation of adding one
line:static const struct pccard_product wi_pccard_products[] = {
PCMCIA_CARD(3COM, 3CRWE737A, 0),
PCMCIA_CARD(BUFFALO, WLI_PCM_S11, 0),
PCMCIA_CARD(BUFFALO, WLI_CF_S11G, 0),
+ PCMCIA_CARD(BUFFALO, WLI2_CF_S11G, 0),
PCMCIA_CARD(TDK, LAK_CD011WL, 0),
{ NULL }
};Note that I have included a '+' in the
line before the line that I added, but that is simply to
highlight the line. Do not add it to the actual driver. Once
you have added the line, you can recompile your kernel or
module and test it. If the device is recognized and works,
please submit a patch. If it does not work, please figure out
what is needed to make it work and submit a patch. If the
device is not recognized at all, you have done something wrong
and should recheck each step.If you are a FreeBSD src committer, and everything appears
to be working, then you can commit the changes to the tree.
However, there are some minor tricky things to be considered.
pccarddevs must be committed to the tree
first. Then pccarddevs.h must be
regenerated and committed as a second step, ensuring that the
right $FreeBSD$ tag is in the latter file.
Finally, commit the additions to the driver.Submitting a New DevicePlease do not send entries for new devices to the author
directly. Instead, submit them as a PR and send the author
the PR number for his records. This ensures that entries are
not lost. When submitting a PR, it is unnecessary to include
the pccarddevs.h diffs in the patch, since
those will be regenerated. It is necessary to include a
description of the device, as well as the patches to the
client driver. If you do not know the name, use OEM99 as the
name, and the author will adjust OEM99 accordingly after
investigation. Committers should not commit OEM99, but
instead find the highest OEM entry and commit one more than
that.
diff --git a/en_US.ISO8859-1/books/arch-handbook/scsi/chapter.xml b/en_US.ISO8859-1/books/arch-handbook/scsi/chapter.xml
index c325840bed..7de627b5b9 100644
--- a/en_US.ISO8859-1/books/arch-handbook/scsi/chapter.xml
+++ b/en_US.ISO8859-1/books/arch-handbook/scsi/chapter.xml
@@ -1,2239 +1,2239 @@
Common Access Method SCSI ControllersSergeyBabkinWritten by MurrayStokelyModifications for Handbook made by SynopsisSCSIThis document assumes that the reader has a general
understanding of device drivers in FreeBSD and of the SCSI
protocol. Much of the information in this document was
extracted from the drivers:ncr (/sys/pci/ncr.c) by
Wolfgang Stanglmeier and Stefan Essersym (/sys/dev/sym/sym_hipd.c) by
Gerard Roudieraic7xxx
(/sys/dev/aic7xxx/aic7xxx.c) by Justin
T. Gibbsand from the CAM code itself (by Justin T. Gibbs, see
/sys/cam/*). When some solution looked the
most logical and was essentially verbatim extracted from the
code by Justin T. Gibbs, I marked it as
recommended.The document is illustrated with examples in
pseudo-code. Although sometimes the examples have many details
and look like real code, it is still pseudo-code. It was
written to demonstrate the concepts in an understandable way.
For a real driver other approaches may be more modular and
efficient. It also abstracts from the hardware details, as well
as issues that would cloud the demonstrated concepts or that are
supposed to be described in the other chapters of the developers
handbook. Such details are commonly shown as calls to functions
with descriptive names, comments or pseudo-statements.
Fortunately real life full-size examples with all the details
can be found in the real drivers.General ArchitectureCommon Access Method (CAM)CAM stands for Common Access Method. It is a generic way to
address the I/O buses in a SCSI-like way. This allows a
separation of the generic device drivers from the drivers
controlling the I/O bus: for example the disk driver becomes
able to control disks on SCSI, IDE, or any other bus, so
the disk driver portion does not have to be rewritten (or copied
and modified) for every new I/O bus. Thus the two most
important active entities are:CD-ROMtapeIDEPeripheral Modules - a
driver for peripheral devices (disk, tape, CD-ROM,
etc.)SCSI Interface Modules (SIM) - a
Host Bus Adapter driver for connecting to an I/O bus such
as SCSI or IDE.A peripheral driver receives requests from the OS, converts
them to a sequence of SCSI commands and passes these SCSI
commands to a SCSI Interface Module. The SCSI Interface Module
is responsible for passing these commands to the actual hardware
(or if the actual hardware is not SCSI but, for example, IDE
then also converting the SCSI commands to the native commands of
the hardware).
- Because we are interested in writing a SCSI adapter driver
+ As we are interested in writing a SCSI adapter driver
here, from this point on we will consider everything from the
SIM standpoint.A typical SIM driver needs to include the following
CAM-related header files:#include <cam/cam.h>
#include <cam/cam_ccb.h>
#include <cam/cam_sim.h>
#include <cam/cam_xpt_sim.h>
#include <cam/cam_debug.h>
#include <cam/scsi/scsi_all.h>The first thing each SIM driver must do is register itself
with the CAM subsystem. This is done during the driver's
xxx_attach() function (here and further
xxx_ is used to denote the unique driver name prefix). The
xxx_attach() function itself is called by
the system bus auto-configuration code which we do not describe
here.This is achieved in multiple steps: first it is necessary to
allocate the queue of requests associated with this SIM: struct cam_devq *devq;
if(( devq = cam_simq_alloc(SIZE) )==NULL) {
error; /* some code to handle the error */
}Here SIZE is the size of the queue to be
allocated, the maximal number of requests it can contain. It is
the number of requests that the SIM driver can handle in
parallel on one SCSI card. Commonly it can be calculated
as:SIZE = NUMBER_OF_SUPPORTED_TARGETS * MAX_SIMULTANEOUS_COMMANDS_PER_TARGETNext we create a descriptor of our SIM: struct cam_sim *sim;
if(( sim = cam_sim_alloc(action_func, poll_func, driver_name,
softc, unit, mtx, max_dev_transactions,
max_tagged_dev_transactions, devq) )==NULL) {
cam_simq_free(devq);
error; /* some code to handle the error */
}Note that if we are not able to create a SIM descriptor we
also free the devq, because we can do
nothing else with it and want to conserve memory.If a SCSI card has multiple SCSI
busesSCSIbus
on it then each bus requires its own
cam_sim structure.An interesting question is what to do if a SCSI card has
more than one SCSI bus: do we need one
devq structure per card or per SCSI
bus? The answer given in the comments to the CAM code is:
either way, as the driver's author prefers.The arguments are:action_func - pointer to
the driver's xxx_action function.
static void
xxx_actionstruct cam_sim *sim,
union ccb *ccbpoll_func - pointer to
the driver's xxx_poll()static void
xxx_pollstruct cam_sim *simdriver_name - the name of the actual driver,
such as ncr or
wds.softc - pointer to the driver's
internal descriptor for this SCSI card. This pointer will
be used by the driver in future to get private
data.unit - the controller unit number, for example
for controller mps0 this number will be
0mtx - Lock associated with this SIM. For SIMs that don't
know about locking, pass in Giant. For SIMs that do, pass in
the lock used to guard this SIM's data structures. This lock
will be held when xxx_action and xxx_poll are called.max_dev_transactions - maximal number of simultaneous
transactions per SCSI target in the non-tagged mode. This
value will be almost universally equal to 1, with possible
exceptions only for the non-SCSI cards. Also the drivers
that hope to gain an advantage by preparing one transaction
while another is executed may set it to 2, but this does
not seem to be worth the complexity.max_tagged_dev_transactions - the same thing, but in the
tagged mode. Tags are the SCSI way to initiate multiple
transactions on a device: each transaction is assigned a
unique tag and the transaction is sent to the device. When
the device completes some transaction it sends back the
result together with the tag so that the SCSI adapter (and
the driver) can tell which transaction was completed. This
argument is also known as the maximal tag depth. It depends
on the abilities of the SCSI adapter.Finally we register the SCSI buses associated with our SCSI
adapterSCSIadapter: if(xpt_bus_register(sim, softc, bus_number) != CAM_SUCCESS) {
cam_sim_free(sim, /*free_devq*/ TRUE);
error; /* some code to handle the error */
}If there is one devq structure per
SCSI bus (i.e., we consider a card with multiple buses as
multiple cards with one bus each) then the bus number will
always be 0, otherwise each bus on the SCSI card should get a
distinct number. Each bus needs its own separate structure
cam_sim.After that our controller is completely hooked to the CAM
system. The value of devq can be
discarded now: sim will be passed as an argument in all further
calls from CAM and devq can be derived from it.CAM provides the framework for such asynchronous events.
Some events originate from the lower levels (the SIM drivers),
some events originate from the peripheral drivers, some events
originate from the CAM subsystem itself. Any driver can
register callbacks for some types of the asynchronous events, so
that it would be notified if these events occur.A typical example of such an event is a device reset. Each
transaction and event identifies the devices to which it applies
by means of a path. The target-specific events
normally occur during a transaction with this device. So the
path from that transaction may be re-used to report this event
(this is safe because the event path is copied in the event
reporting routine but not deallocated nor passed anywhere
further). Also it is safe to allocate paths dynamically at any
time including the interrupt routines, although that incurs
certain overhead, and a possible problem with this approach is
that there may be no free memory at that time. For a bus reset
event we need to define a wildcard path including all devices on
the bus. So we can create the path for the future bus reset
events in advance and avoid problems with the future memory
shortage: struct cam_path *path;
if(xpt_create_path(&path, /*periph*/NULL,
cam_sim_path(sim), CAM_TARGET_WILDCARD,
CAM_LUN_WILDCARD) != CAM_REQ_CMP) {
xpt_bus_deregister(cam_sim_path(sim));
cam_sim_free(sim, /*free_devq*/TRUE);
error; /* some code to handle the error */
}
softc->wpath = path;
softc->sim = sim;As you can see the path includes:ID of the peripheral driver (NULL here because we have
none)ID of the SIM driver
(cam_sim_path(sim))SCSI target number of the device (CAM_TARGET_WILDCARD
means all devices)SCSI LUN number of the subdevice (CAM_LUN_WILDCARD means
all LUNs)If the driver cannot allocate this path it will not be able
to work normally, so in that case we dismantle that SCSI
bus.And we save the path pointer in the
softc structure for future use. After
that we save the value of sim (or we can also discard it on the
exit from xxx_probe() if we wish).That is all for a minimalistic initialization. To do things
right there is one more issue left.For a SIM driver there is one particularly interesting
event: when a target device is considered lost. In this case
resetting the SCSI negotiations with this device may be a good
idea. So we register a callback for this event with CAM. The
request is passed to CAM by requesting CAM action on a CAM
control block for this type of request: struct ccb_setasync csa;
xpt_setup_ccb(&csa.ccb_h, path, /*priority*/5);
csa.ccb_h.func_code = XPT_SASYNC_CB;
csa.event_enable = AC_LOST_DEVICE;
csa.callback = xxx_async;
csa.callback_arg = sim;
xpt_action((union ccb *)&csa);Now we take a look at the xxx_action()
and xxx_poll() driver entry points.static void
xxx_actionstruct cam_sim *sim,
union ccb *ccbDo some action on request of the CAM subsystem. Sim
describes the SIM for the request, CCB is the request itself.
CCB stands for CAM Control Block. It is a union
of many specific instances, each describing arguments for some
type of transactions. All of these instances share the CCB
header where the common part of arguments is stored.CAM supports the SCSI controllers working in both initiator
(normal) mode and target (simulating a SCSI
device) mode. Here we only consider the part relevant to the
initiator mode.There are a few functions and macros (in other words,
methods) defined to access the public data in the struct
sim:cam_sim_path(sim) - the path ID
(see above)cam_sim_name(sim) - the name of the
simcam_sim_softc(sim) - the pointer to
the softc (driver private data) structure cam_sim_unit(sim) - the unit
number cam_sim_bus(sim) - the bus
IDTo identify the device, xxx_action()
can get the unit number and pointer to its structure softc using
these functions.The type of request is stored in
ccb->ccb_h.func_code. So
generally xxx_action() consists of a big
switch: struct xxx_softc *softc = (struct xxx_softc *) cam_sim_softc(sim);
struct ccb_hdr *ccb_h = &ccb->ccb_h;
int unit = cam_sim_unit(sim);
int bus = cam_sim_bus(sim);
switch(ccb_h->func_code) {
case ...:
...
default:
ccb_h->status = CAM_REQ_INVALID;
xpt_done(ccb);
break;
}As can be seen from the default case (if an unknown command
was received) the return code of the command is set into
ccb->ccb_h.status and the
completed CCB is returned back to CAM by calling
xpt_done(ccb).xpt_done() does not have to be called
from xxx_action(): For example an I/O
request may be enqueued inside the SIM driver and/or its SCSI
controller. Then when the device posts an interrupt
signaling that the processing of this request is complete
xpt_done() may be called from the interrupt
handling routine.Actually, the CCB status is not only assigned as a return
code; a CCB has some status at all times. Before a CCB is
passed to the xxx_action() routine it gets
the status CAM_REQ_INPROG, meaning that it is in progress. There
are a surprising number of status values defined in
/sys/cam/cam.h which should be able to
represent the status of a request in great detail. More
interesting yet, the status is in fact a bitwise
or of an enumerated status value (the lower 6 bits) and
possible additional flag-like bits (the upper bits). The
enumerated values will be discussed later in more detail. The
summary of them can be found in the Errors Summary section. The
possible status flags are:CAM_DEV_QFRZN - if the SIM driver
gets a serious error (for example, the device does not
respond to the selection or breaks the SCSI protocol) when
processing a CCB it should freeze the request queue by
calling xpt_freeze_simq(), return the
other enqueued but not processed yet CCBs for this device
back to the CAM queue, then set this flag for the
troublesome CCB and call xpt_done().
This flag causes the CAM subsystem to unfreeze the queue
after it handles the error.CAM_AUTOSNS_VALID - if the
device returned an error condition and the flag
CAM_DIS_AUTOSENSE is not set in CCB the SIM driver must
execute the REQUEST SENSE command automatically to extract
the sense (extended error information) data from the device.
If this attempt was successful the sense data should be
saved in the CCB and this flag set.CAM_RELEASE_SIMQ - like
CAM_DEV_QFRZN but used in case there is some problem (or
resource shortage) with the SCSI controller itself. Then
all the future requests to the controller should be stopped
by xpt_freeze_simq(). The controller
queue will be restarted after the SIM driver overcomes the
shortage and informs CAM by returning some CCB with this
flag set.CAM_SIM_QUEUED - when SIM puts a
CCB into its request queue this flag should be set (and
removed when this CCB gets dequeued before being returned
back to CAM). This flag is not used anywhere in the CAM
code now, so its purpose is purely diagnostic.CAM_QOS_VALID - The QOS data
is now valid.The function xxx_action() is not
allowed to sleep, so all the synchronization for resource access
must be done using SIM or device queue freezing. Besides the
aforementioned flags the CAM subsystem provides functions
xpt_release_simq() and
xpt_release_devq() to unfreeze the queues
directly, without passing a CCB to CAM.The CCB header contains the following fields:path - path ID for the
requesttarget_id - target device ID for
the requesttarget_lun - LUN ID of the target
devicetimeout - timeout interval for this
command, in millisecondstimeout_ch - a convenience place
for the SIM driver to store the timeout handle (the CAM
subsystem itself does not make any assumptions about
it)flags - various bits of information
about the request spriv_ptr0, spriv_ptr1 - fields reserved
for private use by the SIM driver (such as linking to the
SIM queues or SIM private control blocks); actually, they
exist as unions: spriv_ptr0 and spriv_ptr1 have the type
(void *), spriv_field0 and spriv_field1 have the type
unsigned long, sim_priv.entries[0].bytes and
sim_priv.entries[1].bytes are byte arrays of the size
consistent with the other incarnations of the union and
sim_priv.bytes is one array, twice as big.The recommended way of using the SIM private fields of a CCB
is to define some meaningful names for them and use these
meaningful names in the driver, like:#define ccb_some_meaningful_name sim_priv.entries[0].bytes
#define ccb_hcb spriv_ptr1 /* for hardware control block */The most common initiator mode requests are:XPT_SCSI_IO - execute an I/O
transactionThe instance struct ccb_scsiio csio of
the union ccb is used to transfer the arguments. They
are:cdb_io - pointer to the SCSI
command buffer or the buffer itselfcdb_len - SCSI command
lengthdata_ptr - pointer to the data
buffer (gets a bit complicated if scatter/gather is
used)dxfer_len - length of the data
to transfersglist_cnt - counter of the
scatter/gather segmentsscsi_status - place to return
the SCSI statussense_data - buffer for the
SCSI sense information if the command returns an error
(the SIM driver is supposed to run the REQUEST SENSE
command automatically in this case if the CCB flag
CAM_DIS_AUTOSENSE is not set)sense_len - the length of that
buffer (if it happens to be larger than the size of
sense_data, the SIM driver must silently assume the
smaller value) resid, sense_resid - if the transfer of
data or SCSI sense returned an error these are the
returned counters of the residual (not transferred)
data. They do not seem to be especially meaningful, so
in cases when they are difficult to compute (say,
counting bytes in the SCSI controller's FIFO buffer) an
approximate value will do as well. For a successfully
completed transfer they must be set to
zero.tag_action - the kind of tag to
use:CAM_TAG_ACTION_NONE - do not use tags for this
transactionMSG_SIMPLE_Q_TAG, MSG_HEAD_OF_Q_TAG,
MSG_ORDERED_Q_TAG - value equal to the appropriate
tag message (see /sys/cam/scsi/scsi_message.h); this
gives only the tag type, the SIM driver must assign
the tag value itselfThe general logic of handling this request is the
following:The first thing to do is to check for possible races, to
make sure that the command did not get aborted when it was
sitting in the queue: struct ccb_scsiio *csio = &ccb->csio;
if ((ccb_h->status & CAM_STATUS_MASK) != CAM_REQ_INPROG) {
xpt_done(ccb);
return;
}Also we check that the device is supported at all by our
controller: if(ccb_h->target_id > OUR_MAX_SUPPORTED_TARGET_ID
|| ccb_h->target_id == OUR_SCSI_CONTROLLERS_OWN_ID) {
ccb_h->status = CAM_TID_INVALID;
xpt_done(ccb);
return;
}
if(ccb_h->target_lun > OUR_MAX_SUPPORTED_LUN) {
ccb_h->status = CAM_LUN_INVALID;
xpt_done(ccb);
return;
}Then allocate whatever data structures (such as
card-dependent hardware control
blockhardware control
block) we need to process this
request. If we cannot, then freeze the SIM queue,
remember that we have a pending operation, return the CCB
back and ask CAM to re-queue it. Later, when the resources
become available, the SIM queue must be unfrozen by returning
a CCB with the CAM_RELEASE_SIMQ bit set
in its status. Otherwise, if all went well, link the CCB
with the hardware control block (HCB) and mark it as
queued. struct xxx_hcb *hcb = allocate_hcb(softc, unit, bus);
if(hcb == NULL) {
softc->flags |= RESOURCE_SHORTAGE;
xpt_freeze_simq(sim, /*count*/1);
ccb_h->status = CAM_REQUEUE_REQ;
xpt_done(ccb);
return;
}
hcb->ccb = ccb; ccb_h->ccb_hcb = (void *)hcb;
ccb_h->status |= CAM_SIM_QUEUED;Extract the target data from CCB into the hardware
control block. Check if we are asked to assign a tag and, if
so, generate a unique tag and build the SCSI tag
messages. The SIM driver is also responsible for
negotiations with the devices to set the maximal mutually
supported bus width, synchronous rate and offset. hcb->target = ccb_h->target_id; hcb->lun = ccb_h->target_lun;
generate_identify_message(hcb);
if( ccb_h->tag_action != CAM_TAG_ACTION_NONE )
generate_unique_tag_message(hcb, ccb_h->tag_action);
if( !target_negotiated(hcb) )
generate_negotiation_messages(hcb);Then set up the SCSI command. The command storage may
be specified in the CCB in many interesting ways, specified
by the CCB flags. The command buffer can be contained in
CCB or pointed to, in the latter case the pointer may be
physical or virtual. Since the hardware commonly needs a
physical address we always convert the address to a
physical one, typically using the busdma API.If a physical address is
passed it is OK to return the CCB with the status
CAM_REQ_INVALID; the current drivers
do that. If necessary a physical address can also be
converted or mapped back to a virtual address, but only with
great pain, so we do not do that. if(ccb_h->flags & CAM_CDB_POINTER) {
/* CDB is a pointer */
if(!(ccb_h->flags & CAM_CDB_PHYS)) {
/* CDB pointer is virtual */
hcb->cmd = vtobus(csio->cdb_io.cdb_ptr);
} else {
/* CDB pointer is physical */
hcb->cmd = csio->cdb_io.cdb_ptr ;
}
} else {
/* CDB is in the ccb (buffer) */
hcb->cmd = vtobus(csio->cdb_io.cdb_bytes);
}
hcb->cmdlen = csio->cdb_len;Now it is time to set up the data. Again, the data
storage may be specified in the CCB in many interesting
ways, specified by the CCB flags. First we get the
direction of the data transfer. The simplest case is if
there is no data to transfer: int dir = (ccb_h->flags & CAM_DIR_MASK);
if (dir == CAM_DIR_NONE)
goto end_data;Then we check if the data is in one chunk or in a
scatter-gather list, and the addresses are physical or
virtual. The SCSI controller may be able to handle only a
limited number of chunks of limited length. If the request
hits this limitation we return an error. We use a special
function to return the CCB, so that the HCB
resource shortages are handled in one place. The functions to add chunks are
driver-dependent, and here we leave them without detailed
implementation. See description of the SCSI command (CDB)
handling for the details on the address-translation issues.
If some variation is too difficult or impossible to
implement with a particular card it is OK to return the
status CAM_REQ_INVALID. Actually, it
seems like the scatter-gather ability is not used anywhere
in the CAM code now. But at least the case for a single
non-scattered virtual buffer must be implemented; it is
actively used by CAM. int rv;
initialize_hcb_for_data(hcb);
if(!(ccb_h->flags & CAM_SCATTER_VALID)) {
/* single buffer */
if(!(ccb_h->flags & CAM_DATA_PHYS)) {
rv = add_virtual_chunk(hcb, csio->data_ptr, csio->dxfer_len, dir);
} else {
rv = add_physical_chunk(hcb, csio->data_ptr, csio->dxfer_len, dir);
}
} else {
int i;
struct bus_dma_segment *segs;
segs = (struct bus_dma_segment *)csio->data_ptr;
if ((ccb_h->flags & CAM_SG_LIST_PHYS) != 0) {
/* The SG list pointer is physical */
rv = setup_hcb_for_physical_sg_list(hcb, segs, csio->sglist_cnt);
} else if (!(ccb_h->flags & CAM_DATA_PHYS)) {
/* SG buffer pointers are virtual */
for (i = 0; i < csio->sglist_cnt; i++) {
rv = add_virtual_chunk(hcb, segs[i].ds_addr,
segs[i].ds_len, dir);
if (rv != CAM_REQ_CMP)
break;
}
} else {
/* SG buffer pointers are physical */
for (i = 0; i < csio->sglist_cnt; i++) {
rv = add_physical_chunk(hcb, segs[i].ds_addr,
segs[i].ds_len, dir);
if (rv != CAM_REQ_CMP)
break;
}
}
}
if(rv != CAM_REQ_CMP) {
/* we expect that add_*_chunk() functions return CAM_REQ_CMP
* if they added a chunk successfully, CAM_REQ_TOO_BIG if
* the request is too big (too many bytes or too many chunks),
* CAM_REQ_INVALID in case of other troubles
*/
free_hcb_and_ccb_done(hcb, ccb, rv);
return;
}
end_data:If disconnection is disabled for this CCB we pass this
information to the hcb: if(ccb_h->flags & CAM_DIS_DISCONNECT)
hcb_disable_disconnect(hcb);If the controller is able to run the REQUEST SENSE command
all by itself then the value of the flag CAM_DIS_AUTOSENSE
should also be passed to it, to prevent automatic REQUEST
SENSE if the CAM subsystem does not want it.The only thing left is to set up the timeout, pass our
hcb to the hardware and return; the rest will be done by the
interrupt handler (or timeout handler). ccb_h->timeout_ch = timeout(xxx_timeout, (caddr_t) hcb,
(ccb_h->timeout * hz) / 1000); /* convert milliseconds to ticks */
put_hcb_into_hardware_queue(hcb);
return;And here is a possible implementation of the function
that returns the CCB: static void
free_hcb_and_ccb_done(struct xxx_hcb *hcb, union ccb *ccb, u_int32_t status)
{
struct xxx_softc *softc = hcb->softc;
ccb->ccb_h.ccb_hcb = 0;
if(hcb != NULL) {
untimeout(xxx_timeout, (caddr_t) hcb, ccb->ccb_h.timeout_ch);
/* we're about to free a hcb, so the shortage has ended */
if(softc->flags & RESOURCE_SHORTAGE) {
softc->flags &= ~RESOURCE_SHORTAGE;
status |= CAM_RELEASE_SIMQ;
}
free_hcb(hcb); /* also removes hcb from any internal lists */
}
ccb->ccb_h.status = status |
(ccb->ccb_h.status & ~(CAM_STATUS_MASK|CAM_SIM_QUEUED));
xpt_done(ccb);
}XPT_RESET_DEV - send the SCSI
BUS DEVICE RESET message to a deviceNo data is transferred in this CCB except the header,
and its most interesting argument is target_id.
Depending on the controller hardware a hardware control
block just like for the XPT_SCSI_IO request may be
constructed (see XPT_SCSI_IO request description) and sent
to the controller or the SCSI controller may be immediately
programmed to send this RESET message to the device or this
request may be just not supported (and return the status
CAM_REQ_INVALID). Also on completion
of the request all the disconnected transactions for this
target must be aborted (probably in the interrupt
routine).Also, all the current negotiations for the target are
lost on reset, so they might be cleaned too. Alternatively, their
clearing may be deferred, because the target will
request re-negotiation on the next
transaction anyway.XPT_RESET_BUS - send the RESET
signal to the SCSI busNo arguments are passed in the CCB, the only interesting
argument is the SCSI bus indicated by the struct sim
pointer.A minimalistic implementation would forget the SCSI
negotiations for all the devices on the bus and return the
status CAM_REQ_CMP.The proper implementation would in addition actually
reset the SCSI bus (possibly also resetting the SCSI controller)
and mark all the CCBs being processed, both those in the
hardware queue and those being disconnected, as done with
the status CAM_SCSI_BUS_RESET. Like: int targ, lun;
struct xxx_hcb *h, *hh;
struct ccb_trans_settings neg;
struct cam_path *path;
/* The SCSI bus reset may take a long time, in this case its completion
* should be checked by interrupt or timeout. But for simplicity
* we assume here that it is really fast.
*/
reset_scsi_bus(softc);
/* drop all enqueued CCBs */
for(h = softc->first_queued_hcb; h != NULL; h = hh) {
hh = h->next;
free_hcb_and_ccb_done(h, h->ccb, CAM_SCSI_BUS_RESET);
}
/* the clean values of negotiations to report */
neg.bus_width = 8;
neg.sync_period = neg.sync_offset = 0;
neg.valid = (CCB_TRANS_BUS_WIDTH_VALID
| CCB_TRANS_SYNC_RATE_VALID | CCB_TRANS_SYNC_OFFSET_VALID);
/* drop all disconnected CCBs and clean negotiations */
for(targ=0; targ <= OUR_MAX_SUPPORTED_TARGET; targ++) {
clean_negotiations(softc, targ);
/* report the event if possible */
if(xpt_create_path(&path, /*periph*/NULL,
cam_sim_path(sim), targ,
CAM_LUN_WILDCARD) == CAM_REQ_CMP) {
xpt_async(AC_TRANSFER_NEG, path, &neg);
xpt_free_path(path);
}
for(lun=0; lun <= OUR_MAX_SUPPORTED_LUN; lun++)
for(h = softc->first_discon_hcb[targ][lun]; h != NULL; h = hh) {
hh=h->next;
free_hcb_and_ccb_done(h, h->ccb, CAM_SCSI_BUS_RESET);
}
}
ccb->ccb_h.status = CAM_REQ_CMP;
xpt_done(ccb);
/* report the event */
xpt_async(AC_BUS_RESET, softc->wpath, NULL);
return;Implementing the SCSI bus reset as a function may be a
good idea because it would be re-used by the timeout
function as a last resort if things go
wrong.XPT_ABORT - abort the specified
CCBThe arguments are transferred in the instance
struct ccb_abort cab of the union ccb. The
only argument field in it is:abort_ccb - pointer to the CCB to
be abortedIf the abort is not supported just return the status
CAM_UA_ABORT. This is also the easy way to minimally
implement this call: return CAM_UA_ABORT in any case.The hard way is to implement this request honestly.
First check that abort applies to a SCSI transaction: struct ccb *abort_ccb;
abort_ccb = ccb->cab.abort_ccb;
if(abort_ccb->ccb_h.func_code != XPT_SCSI_IO) {
ccb->ccb_h.status = CAM_UA_ABORT;
xpt_done(ccb);
return;
}Then it is necessary to find this CCB in our queue.
This can be done by walking the list of all our hardware
control blocks in search for one associated with this
CCB: struct xxx_hcb *hcb, *h;
hcb = NULL;
/* We assume that softc->first_hcb is the head of the list of all
* HCBs associated with this bus, including those enqueued for
* processing, being processed by hardware and disconnected ones.
*/
for(h = softc->first_hcb; h != NULL; h = h->next) {
if(h->ccb == abort_ccb) {
hcb = h;
break;
}
}
if(hcb == NULL) {
/* no such CCB in our queue */
ccb->ccb_h.status = CAM_PATH_INVALID;
xpt_done(ccb);
return;
}
Now we look at the current processing status of the HCB.
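For illustration, the four HCB states discussed below can be modeled as an enumeration; the names below simply match the switch cases used in this example, and a real driver is free to represent the state differently (for example, in hardware registers):

```c
/* Hypothetical processing states of a hardware control block (HCB).
 * These names match the switch cases in this example driver.
 */
enum xxx_hcb_state {
	HCB_SITTING_IN_QUEUE,	/* enqueued, not yet sent to the SCSI bus */
	HCB_BEING_TRANSFERRED,	/* being transferred on the bus right now */
	HCB_DISCONNECTED,	/* target disconnected, result still pending */
	HCB_COMPLETED		/* completed by hardware, not yet marked done */
};
```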
It may be either sitting in the queue waiting to be sent to
the SCSI bus, being transferred right now, or disconnected
and waiting for the result of the command, or actually
completed by hardware but not yet marked as done by
software. To make sure that we do not get in any races with
hardware we mark the HCB as being aborted, so that if this
HCB is about to be sent to the SCSI bus the SCSI controller
will see this flag and skip it. int hstatus;
/* shown as a function, in case special action is needed to make
* this flag visible to hardware
*/
set_hcb_flags(hcb, HCB_BEING_ABORTED);
abort_again:
hstatus = get_hcb_status(hcb);
switch(hstatus) {
case HCB_SITTING_IN_QUEUE:
remove_hcb_from_hardware_queue(hcb);
/* FALLTHROUGH */
case HCB_COMPLETED:
/* this is an easy case */
free_hcb_and_ccb_done(hcb, abort_ccb, CAM_REQ_ABORTED);
break;If the CCB is being transferred right now we would like
to signal to the SCSI controller in some hardware-dependent
way that we want to abort the current transfer. The SCSI
controller would set the SCSI ATTENTION signal and when the
target responds to it send an ABORT message. We also reset
the timeout to make sure that the target is not sleeping
forever. If the command would not get aborted in some
reasonable time like 10 seconds the timeout routine would go
- ahead and reset the whole SCSI bus. Because the command
+ ahead and reset the whole SCSI bus. Since the command
will be aborted in some reasonable time we can just return
the abort request now as successfully completed, and mark
the aborted CCB as aborted (but not mark it as done
yet). case HCB_BEING_TRANSFERRED:
untimeout(xxx_timeout, (caddr_t) hcb, abort_ccb->ccb_h.timeout_ch);
abort_ccb->ccb_h.timeout_ch =
timeout(xxx_timeout, (caddr_t) hcb, 10 * hz);
abort_ccb->ccb_h.status = CAM_REQ_ABORTED;
/* ask the controller to abort that HCB, then generate
* an interrupt and stop
*/
if(signal_hardware_to_abort_hcb_and_stop(hcb) < 0) {
/* oops, we missed the race with hardware, this transaction
* got off the bus before we aborted it, try again */
goto abort_again;
}
break;If the CCB is in the list of disconnected then set it up
as an abort request and re-queue it at the front of hardware
queue. Reset the timeout and report the abort request to be
completed. case HCB_DISCONNECTED:
untimeout(xxx_timeout, (caddr_t) hcb, abort_ccb->ccb_h.timeout_ch);
abort_ccb->ccb_h.timeout_ch =
timeout(xxx_timeout, (caddr_t) hcb, 10 * hz);
put_abort_message_into_hcb(hcb);
put_hcb_at_the_front_of_hardware_queue(hcb);
break;
}
ccb->ccb_h.status = CAM_REQ_CMP;
xpt_done(ccb);
return;That is all for the ABORT request, although there is one
- more issue. Because the ABORT message cleans all the
+ more issue. As the ABORT message cleans all the
ongoing transactions on a LUN we have to mark all the other
active transactions on this LUN as aborted. That should be
done in the interrupt routine, after the transaction gets
aborted.Implementing the CCB abort as a function may be quite a
good idea, this function can be re-used if an I/O
transaction times out. The only difference would be that
the timed out transaction would return the status
CAM_CMD_TIMEOUT for the timed out request. Then the case
XPT_ABORT would be small, like this: case XPT_ABORT:
struct ccb *abort_ccb;
abort_ccb = ccb->cab.abort_ccb;
if(abort_ccb->ccb_h.func_code != XPT_SCSI_IO) {
ccb->ccb_h.status = CAM_UA_ABORT;
xpt_done(ccb);
return;
}
if(xxx_abort_ccb(abort_ccb, CAM_REQ_ABORTED) < 0)
/* no such CCB in our queue */
ccb->ccb_h.status = CAM_PATH_INVALID;
else
ccb->ccb_h.status = CAM_REQ_CMP;
xpt_done(ccb);
return;XPT_SET_TRAN_SETTINGS - explicitly
set values of SCSI transfer settingsThe arguments are transferred in the instance
struct ccb_trans_setting cts of the union
ccb:valid - a bitmask showing which
settings should be updated:CCB_TRANS_SYNC_RATE_VALID -
synchronous transfer rateCCB_TRANS_SYNC_OFFSET_VALID -
synchronous offsetCCB_TRANS_BUS_WIDTH_VALID - bus
widthCCB_TRANS_DISC_VALID - set
enable/disable disconnectionCCB_TRANS_TQ_VALID - set
enable/disable tagged queuingflags - consists of two parts,
binary arguments and identification of sub-operations.
The binary arguments are:CCB_TRANS_DISC_ENB - enable
disconnectionCCB_TRANS_TAG_ENB - enable
tagged queuingthe sub-operations are:CCB_TRANS_CURRENT_SETTINGS
- change the current negotiationsCCB_TRANS_USER_SETTINGS -
remember the desired user values sync_period,
sync_offset - self-explanatory, if sync_offset==0
then the asynchronous mode is requested bus_width -
bus width, in bits (not bytes)Two sets of negotiated parameters are supported, the
user settings and the current settings. The user settings
are not really used much in the SIM drivers, this is mostly
just a piece of memory where the upper levels can store (and
later recall) their ideas about the parameters. Setting the
user parameters does not cause re-negotiation of the
transfer rates. But when the SCSI controller does a
negotiation it must never set the values higher than the
user parameters, so they are essentially the top
boundary.The current settings are, as the name says, current.
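The top boundary rule from the previous paragraph can be sketched as a small clamping helper; the softc layout, the helper name and the MAX_TARGETS constant here are hypothetical, in the spirit of the examples in this section:

```c
#define MAX_TARGETS 16

/* Hypothetical per-controller data; only the negotiation fields
 * needed for this sketch are shown.
 */
struct xxx_softc {
	unsigned user_sync_period[MAX_TARGETS], goal_sync_period[MAX_TARGETS];
	unsigned user_sync_offset[MAX_TARGETS], goal_sync_offset[MAX_TARGETS];
	unsigned user_bus_width[MAX_TARGETS], goal_bus_width[MAX_TARGETS];
};

/* Clamp the values the controller will start a negotiation with
 * by the stored user set, which acts as the top boundary.
 */
static void
xxx_clamp_goal_by_user(struct xxx_softc *softc, int targ)
{
	/* offset and width: a bigger value means faster, so never
	 * exceed the user value
	 */
	if (softc->goal_sync_offset[targ] > softc->user_sync_offset[targ])
		softc->goal_sync_offset[targ] = softc->user_sync_offset[targ];
	if (softc->goal_bus_width[targ] > softc->user_bus_width[targ])
		softc->goal_bus_width[targ] = softc->user_bus_width[targ];
	/* sync_period: a bigger value means a slower rate, so never
	 * go below the user value
	 */
	if (softc->goal_sync_period[targ] < softc->user_sync_period[targ])
		softc->goal_sync_period[targ] = softc->user_sync_period[targ];
}
```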
Changing them means that the parameters must be
re-negotiated on the next transfer. Again, these
new current settings are not supposed to be
forced on the device; they are just used as the initial step
of negotiations. Also they must be limited by actual
capabilities of the SCSI controller: for example, if the
SCSI controller has 8-bit bus and the request asks to set
16-bit wide transfers this parameter must be silently
truncated to 8-bit transfers before sending it to the
device.One caveat is that the bus width and synchronous
parameters are per target while the disconnection and tag
enabling parameters are per LUN.The recommended implementation is to keep 3 sets of
negotiated (bus width and synchronous transfer)
parameters:user - the user set, as
abovecurrent - those actually in
effectgoal - those requested by
setting of the current
parametersThe code looks like: struct ccb_trans_settings *cts;
int targ, lun;
int flags;
cts = &ccb->cts;
targ = ccb_h->target_id;
lun = ccb_h->target_lun;
flags = cts->flags;
if(flags & CCB_TRANS_USER_SETTINGS) {
if(flags & CCB_TRANS_SYNC_RATE_VALID)
softc->user_sync_period[targ] = cts->sync_period;
if(flags & CCB_TRANS_SYNC_OFFSET_VALID)
softc->user_sync_offset[targ] = cts->sync_offset;
if(flags & CCB_TRANS_BUS_WIDTH_VALID)
softc->user_bus_width[targ] = cts->bus_width;
if(flags & CCB_TRANS_DISC_VALID) {
softc->user_tflags[targ][lun] &= ~CCB_TRANS_DISC_ENB;
softc->user_tflags[targ][lun] |= flags & CCB_TRANS_DISC_ENB;
}
if(flags & CCB_TRANS_TQ_VALID) {
softc->user_tflags[targ][lun] &= ~CCB_TRANS_TQ_ENB;
softc->user_tflags[targ][lun] |= flags & CCB_TRANS_TQ_ENB;
}
}
if(flags & CCB_TRANS_CURRENT_SETTINGS) {
if(flags & CCB_TRANS_SYNC_RATE_VALID)
softc->goal_sync_period[targ] =
max(cts->sync_period, OUR_MIN_SUPPORTED_PERIOD);
if(flags & CCB_TRANS_SYNC_OFFSET_VALID)
softc->goal_sync_offset[targ] =
min(cts->sync_offset, OUR_MAX_SUPPORTED_OFFSET);
if(flags & CCB_TRANS_BUS_WIDTH_VALID)
softc->goal_bus_width[targ] = min(cts->bus_width, OUR_BUS_WIDTH);
if(flags & CCB_TRANS_DISC_VALID) {
softc->current_tflags[targ][lun] &= ~CCB_TRANS_DISC_ENB;
softc->current_tflags[targ][lun] |= flags & CCB_TRANS_DISC_ENB;
}
if(flags & CCB_TRANS_TQ_VALID) {
softc->current_tflags[targ][lun] &= ~CCB_TRANS_TQ_ENB;
softc->current_tflags[targ][lun] |= flags & CCB_TRANS_TQ_ENB;
}
}
ccb->ccb_h.status = CAM_REQ_CMP;
xpt_done(ccb);
return;Then when the next I/O request is processed it will
check if it has to re-negotiate, for example by calling the
function target_negotiated(hcb). It can be implemented like
this: int
target_negotiated(struct xxx_hcb *hcb)
{
struct softc *softc = hcb->softc;
int targ = hcb->targ;
if( softc->current_sync_period[targ] != softc->goal_sync_period[targ]
|| softc->current_sync_offset[targ] != softc->goal_sync_offset[targ]
|| softc->current_bus_width[targ] != softc->goal_bus_width[targ] )
return 0; /* FALSE */
else
return 1; /* TRUE */
}After the values are re-negotiated the resulting values
must be assigned to both current and goal parameters, so for
future I/O transactions the current and goal parameters
would be the same and
target_negotiated() would return TRUE.
When the card is initialized (in
xxx_attach()) the current negotiation
values must be initialized to narrow asynchronous mode, the
goal and user values must be initialized to the maximal
values supported by controller.XPT_GET_TRAN_SETTINGS - get values
of SCSI transfer settingsThis operation is the reverse of XPT_SET_TRAN_SETTINGS.
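A sketch of handling this request, using simplified stand-ins for the CAM structures and flags (real drivers use the definitions from the CAM headers, and the softc arrays here are the same hypothetical ones as in the XPT_SET_TRAN_SETTINGS example; only the sync/width fields are shown, a full driver also reports the disconnection and tagged-queuing flags):

```c
/* Simplified stand-ins for the CAM definitions, for illustration only */
#define CCB_TRANS_CURRENT_SETTINGS	0x01
#define CCB_TRANS_USER_SETTINGS		0x02
#define CCB_TRANS_SYNC_RATE_VALID	0x04
#define CCB_TRANS_SYNC_OFFSET_VALID	0x08
#define CCB_TRANS_BUS_WIDTH_VALID	0x10

struct ccb_trans_settings {
	unsigned flags, valid;
	unsigned sync_period, sync_offset, bus_width;
};

#define MAX_TARGETS 16
struct xxx_softc {
	unsigned current_sync_period[MAX_TARGETS], user_sync_period[MAX_TARGETS];
	unsigned current_sync_offset[MAX_TARGETS], user_sync_offset[MAX_TARGETS];
	unsigned current_bus_width[MAX_TARGETS], user_bus_width[MAX_TARGETS];
};

/* Fill cts from the current or user set; like the existing drivers,
 * prefer the current settings when both flags are set.
 */
static void
xxx_get_tran_settings(struct xxx_softc *softc, int targ,
    struct ccb_trans_settings *cts)
{
	if (cts->flags & CCB_TRANS_CURRENT_SETTINGS) {
		cts->sync_period = softc->current_sync_period[targ];
		cts->sync_offset = softc->current_sync_offset[targ];
		cts->bus_width = softc->current_bus_width[targ];
	} else {
		cts->sync_period = softc->user_sync_period[targ];
		cts->sync_offset = softc->user_sync_offset[targ];
		cts->bus_width = softc->user_bus_width[targ];
	}
	/* set all the bits in the valid field */
	cts->valid = CCB_TRANS_SYNC_RATE_VALID
	    | CCB_TRANS_SYNC_OFFSET_VALID | CCB_TRANS_BUS_WIDTH_VALID;
}
```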
Fill up the CCB instance
struct ccb_trans_setting cts with data as
requested by the flags CCB_TRANS_CURRENT_SETTINGS or
CCB_TRANS_USER_SETTINGS (if both are set then the existing
drivers return the current settings). Set all the bits in
the valid field.XPT_CALC_GEOMETRY - calculate
logical (BIOS)BIOS
geometry of the diskThe arguments are transferred in the instance
struct ccb_calc_geometry ccg of the union
ccb:block_size - input, block
(a.k.a. sector) size in bytesvolume_size - input, volume
size in bytescylinders - output, logical
cylindersheads - output, logical
headssecs_per_track - output,
logical sectors per trackIf the returned geometry differs enough from what
the SCSI controller BIOSSCSIBIOS thinks and a disk on
this SCSI controller is used as a boot disk, the system may not
be able to boot. A typical calculation example taken from
the aic7xxx driver is: struct ccb_calc_geometry *ccg;
u_int32_t size_mb;
u_int32_t secs_per_cylinder;
int extended;
ccg = &ccb->ccg;
size_mb = ccg->volume_size
/ ((1024L * 1024L) / ccg->block_size);
extended = check_cards_EEPROM_for_extended_geometry(softc);
if (size_mb > 1024 && extended) {
ccg->heads = 255;
ccg->secs_per_track = 63;
} else {
ccg->heads = 64;
ccg->secs_per_track = 32;
}
secs_per_cylinder = ccg->heads * ccg->secs_per_track;
ccg->cylinders = ccg->volume_size / secs_per_cylinder;
ccb->ccb_h.status = CAM_REQ_CMP;
xpt_done(ccb);
return;This gives the general idea, the exact calculation
depends on the quirks of the particular BIOS. If BIOS
provides no way to set the extended translation
flag in EEPROM this flag should normally be assumed equal to
1. Other popular geometries are: 128 heads, 63 sectors - Symbios controllers
16 heads, 63 sectors - old controllersSome system BIOSes and SCSI BIOSes fight with each other
with variable success, for example a combination of Symbios
875/895 SCSI and Phoenix BIOS can give geometry 128/63 after
power up and 255/63 after a hard reset or soft
reboot.XPT_PATH_INQ - path inquiry, in
other words get the SIM driver and SCSI controller (also
known as HBA - Host Bus Adapter) propertiesThe properties are returned in the instance
struct ccb_pathinq cpi of the union
ccb:version_num - the SIM driver version number, now all
drivers use 1hba_inquiry - bitmask of features supported by the
controller:PI_MDP_ABLE - supports MDP message (something from
SCSI3?)PI_WIDE_32 - supports 32 bit wide
SCSIPI_WIDE_16 - supports 16 bit wide
SCSIPI_SDTR_ABLE - can negotiate synchronous transfer
ratePI_LINKED_CDB - supports linked
commandsPI_TAG_ABLE - supports tagged
commandsPI_SOFT_RST - supports soft reset alternative (hard
reset and soft reset are mutually exclusive within a
SCSI bus)target_sprt - flags for target mode support, 0 if
unsupportedhba_misc - miscellaneous controller
features:PIM_SCANHILO - bus scans from high ID to low
IDPIM_NOREMOVE - removable devices not included in
scanPIM_NOINITIATOR - initiator role not
supportedPIM_NOBUSRESET - user has disabled initial BUS
RESEThba_eng_cnt - mysterious HBA engine count, something
related to compression, now is always set to 0vuhba_flags - vendor-unique flags, unused nowmax_target - maximal supported target ID (7 for
8-bit bus, 15 for 16-bit bus, 127 for Fibre
Channel)max_lun - maximal supported LUN ID (7 for older SCSI
controllers, 63 for newer ones)async_flags - bitmask of installed Async handler,
unused nowhpath_id - highest Path ID in the subsystem, unused
nowunit_number - the controller unit number,
cam_sim_unit(sim)bus_id - the bus number, cam_sim_bus(sim)initiator_id - the SCSI ID of the controller
itselfbase_transfer_speed - nominal transfer speed in KB/s
for asynchronous narrow transfers, equal to 3300 for
SCSIsim_vid - SIM driver's vendor id, a zero-terminated
string of maximal length SIM_IDLEN including the
terminating zerohba_vid - SCSI controller's vendor id, a
zero-terminated string of maximal length HBA_IDLEN
including the terminating zerodev_name - device driver name, a zero-terminated
string of maximal length DEV_IDLEN including the
terminating zero, equal to cam_sim_name(sim)The recommended way of setting the string fields is
using strncpy, like: strncpy(cpi->dev_name, cam_sim_name(sim), DEV_IDLEN); (note that strncpy does not add the terminating zero when the source string fills the whole buffer, so either make sure the name fits or terminate the field explicitly).After setting the values set the status to CAM_REQ_CMP
and mark the CCB as done.Pollingstatic void
xxx_pollstruct cam_sim *simThe poll function is used to simulate the interrupts when
the interrupt subsystem is not functioning (for example, when
the system has crashed and is creating the system dump). The
CAM subsystem sets the proper interrupt level before calling the
poll routine. So all it needs to do is to call the interrupt
routine (or the other way around, the poll routine may be doing
the real action and the interrupt routine would just call the
poll routine). Why bother about a separate function then?
Due to different calling conventions. The
xxx_poll routine gets the struct cam_sim
pointer as its argument when the PCI interrupt routine by common
convention gets pointer to the struct
xxx_softc and the ISA interrupt routine
gets just the device unit number. So the poll routine would
normally look as:static void
xxx_poll(struct cam_sim *sim)
{
xxx_intr((struct xxx_softc *)cam_sim_softc(sim)); /* for PCI device */
}orstatic void
xxx_poll(struct cam_sim *sim)
{
xxx_intr(cam_sim_unit(sim)); /* for ISA device */
}Asynchronous EventsIf an asynchronous event callback has been set up then the
callback function should be defined.static void
ahc_async(void *callback_arg, u_int32_t code, struct cam_path *path, void *arg)callback_arg - the value supplied when registering the
callbackcode - identifies the type of eventpath - identifies the devices to which the event
appliesarg - event-specific argumentImplementation for a single type of event, AC_LOST_DEVICE,
looks like: struct xxx_softc *softc;
struct cam_sim *sim;
int targ;
struct ccb_trans_settings neg;
sim = (struct cam_sim *)callback_arg;
softc = (struct xxx_softc *)cam_sim_softc(sim);
switch (code) {
case AC_LOST_DEVICE:
targ = xpt_path_target_id(path);
if(targ <= OUR_MAX_SUPPORTED_TARGET) {
clean_negotiations(softc, targ);
/* send indication to CAM */
neg.bus_width = 8;
neg.sync_period = neg.sync_offset = 0;
neg.valid = (CCB_TRANS_BUS_WIDTH_VALID
| CCB_TRANS_SYNC_RATE_VALID | CCB_TRANS_SYNC_OFFSET_VALID);
xpt_async(AC_TRANSFER_NEG, path, &neg);
}
break;
default:
break;
}InterruptsSCSIinterruptsThe exact type of the interrupt routine depends on the type
of the peripheral bus (PCI, ISA and so on) to which the SCSI
controller is connected.The interrupt routines of the SIM drivers run at the
interrupt level splcam. So splcam() should
be used in the driver to synchronize activity between the
interrupt routine and the rest of the driver (for a
multiprocessor-aware driver things get yet more interesting but
we ignore this case here). The pseudo-code in this document
happily ignores the problems of synchronization. The real code
must not ignore them. A simple-minded approach is to set
splcam() on the entry to the other routines
and reset it on return, thus protecting them with one big critical
section. To make sure that the interrupt level will always be
restored, a wrapper function can be defined, like: static void
xxx_action(struct cam_sim *sim, union ccb *ccb)
{
int s;
s = splcam();
xxx_action1(sim, ccb);
splx(s);
}
static void
xxx_action1(struct cam_sim *sim, union ccb *ccb)
{
... process the request ...
}This approach is simple and robust but the problem with it
is that interrupts may get blocked for a relatively long time
and this would negatively affect the system's performance. On
the other hand the functions of the spl()
family have rather high overhead, so a vast number of tiny
critical sections may not be good either.The conditions handled by the interrupt routine and the
details depend very much on the hardware. We consider the set
of typical conditions.First, we check if a SCSI reset was encountered on the bus
(probably caused by another SCSI controller on the same SCSI
bus). If so we drop all the enqueued and disconnected requests,
report the events and re-initialize our SCSI controller. It is
important that during this initialization the controller will
not issue another reset or else two controllers on the same SCSI
bus could ping-pong resets forever. The case of fatal
controller error/hang could be handled in the same place, but it
will probably need also sending RESET signal to the SCSI bus to
reset the status of the connections with the SCSI
devices. int fatal=0;
struct ccb_trans_settings neg;
struct cam_path *path;
if( detected_scsi_reset(softc)
|| (fatal = detected_fatal_controller_error(softc)) ) {
int targ, lun;
struct xxx_hcb *h, *hh;
/* drop all enqueued CCBs */
for(h = softc->first_queued_hcb; h != NULL; h = hh) {
hh = h->next;
free_hcb_and_ccb_done(h, h->ccb, CAM_SCSI_BUS_RESET);
}
/* the clean values of negotiations to report */
neg.bus_width = 8;
neg.sync_period = neg.sync_offset = 0;
neg.valid = (CCB_TRANS_BUS_WIDTH_VALID
| CCB_TRANS_SYNC_RATE_VALID | CCB_TRANS_SYNC_OFFSET_VALID);
/* drop all disconnected CCBs and clean negotiations */
for(targ=0; targ <= OUR_MAX_SUPPORTED_TARGET; targ++) {
clean_negotiations(softc, targ);
/* report the event if possible */
if(xpt_create_path(&path, /*periph*/NULL,
cam_sim_path(sim), targ,
CAM_LUN_WILDCARD) == CAM_REQ_CMP) {
xpt_async(AC_TRANSFER_NEG, path, &neg);
xpt_free_path(path);
}
for(lun=0; lun <= OUR_MAX_SUPPORTED_LUN; lun++)
for(h = softc->first_discon_hcb[targ][lun]; h != NULL; h = hh) {
hh=h->next;
if(fatal)
free_hcb_and_ccb_done(h, h->ccb, CAM_UNREC_HBA_ERROR);
else
free_hcb_and_ccb_done(h, h->ccb, CAM_SCSI_BUS_RESET);
}
}
/* report the event */
xpt_async(AC_BUS_RESET, softc->wpath, NULL);
/* re-initialization may take a lot of time, in such case
* its completion should be signaled by another interrupt or
* checked on timeout - but for simplicity we assume here that
* it is really fast
*/
if(!fatal) {
reinitialize_controller_without_scsi_reset(softc);
} else {
reinitialize_controller_with_scsi_reset(softc);
}
schedule_next_hcb(softc);
return;
}If the interrupt is not caused by a controller-wide condition
then probably something has happened to the current hardware
control block. Depending on the hardware there may be other
non-HCB-related events; we just do not consider them here. Then
we analyze what happened to this HCB: struct xxx_hcb *hcb, *h, *hh;
int hcb_status, scsi_status;
int ccb_status;
int targ;
int lun_to_freeze;
hcb = get_current_hcb(softc);
if(hcb == NULL) {
/* either stray interrupt or something went very wrong
* or this is something hardware-dependent
*/
handle as necessary;
return;
}
targ = hcb->target;
hcb_status = get_status_of_current_hcb(softc);First we check if the HCB has completed and if so we check
the returned SCSI status. if(hcb_status == COMPLETED) {
scsi_status = get_completion_status(hcb);Then look if this status is related to the REQUEST SENSE
command and if so handle it in a simple way. if(hcb->flags & DOING_AUTOSENSE) {
if(scsi_status == GOOD) { /* autosense was successful */
hcb->ccb->ccb_h.status |= CAM_AUTOSNS_VALID;
free_hcb_and_ccb_done(hcb, hcb->ccb, CAM_SCSI_STATUS_ERROR);
} else {
autosense_failed:
free_hcb_and_ccb_done(hcb, hcb->ccb, CAM_AUTOSENSE_FAIL);
}
schedule_next_hcb(softc);
return;
}Otherwise the command itself has completed, and we pay more attention to
details. If auto-sense is not disabled for this CCB and the
command has failed with sense data then run REQUEST SENSE
command to receive that data. hcb->ccb->csio.scsi_status = scsi_status;
calculate_residue(hcb);
if( (hcb->ccb->ccb_h.flags & CAM_DIS_AUTOSENSE)==0
&& ( scsi_status == CHECK_CONDITION
|| scsi_status == COMMAND_TERMINATED) ) {
/* start auto-SENSE */
hcb->flags |= DOING_AUTOSENSE;
setup_autosense_command_in_hcb(hcb);
restart_current_hcb(softc);
return;
}
if(scsi_status == GOOD)
free_hcb_and_ccb_done(hcb, hcb->ccb, CAM_REQ_CMP);
else
free_hcb_and_ccb_done(hcb, hcb->ccb, CAM_SCSI_STATUS_ERROR);
schedule_next_hcb(softc);
return;
}One typical thing would be negotiation events: negotiation
messages received from a SCSI target (in answer to our
negotiation attempt or on the target's initiative) or the target is
unable to negotiate (rejects our negotiation messages or does
not answer them). switch(hcb_status) {
case TARGET_REJECTED_WIDE_NEG:
/* revert to 8-bit bus */
softc->current_bus_width[targ] = softc->goal_bus_width[targ] = 8;
/* report the event */
neg.bus_width = 8;
neg.valid = CCB_TRANS_BUS_WIDTH_VALID;
xpt_async(AC_TRANSFER_NEG, hcb->ccb->ccb_h.path_id, &neg);
continue_current_hcb(softc);
return;
case TARGET_ANSWERED_WIDE_NEG:
{
int wd;
wd = get_target_bus_width_request(softc);
if(wd <= softc->goal_bus_width[targ]) {
/* answer is acceptable */
softc->current_bus_width[targ] =
softc->goal_bus_width[targ] = neg.bus_width = wd;
/* report the event */
neg.valid = CCB_TRANS_BUS_WIDTH_VALID;
xpt_async(AC_TRANSFER_NEG, hcb->ccb->ccb_h.path_id, &neg);
} else {
prepare_reject_message(hcb);
}
}
continue_current_hcb(softc);
return;
case TARGET_REQUESTED_WIDE_NEG:
{
int wd;
wd = get_target_bus_width_request(softc);
wd = min (wd, OUR_BUS_WIDTH);
wd = min (wd, softc->user_bus_width[targ]);
if(wd != softc->current_bus_width[targ]) {
/* the bus width has changed */
softc->current_bus_width[targ] =
softc->goal_bus_width[targ] = neg.bus_width = wd;
/* report the event */
neg.valid = CCB_TRANS_BUS_WIDTH_VALID;
xpt_async(AC_TRANSFER_NEG, hcb->ccb->ccb_h.path_id, &neg);
}
prepare_width_nego_response(hcb, wd);
}
continue_current_hcb(softc);
return;
}Then we handle any errors that could have happened during
auto-sense in the same simple-minded way as before. Otherwise
we look closer at the details again. if(hcb->flags & DOING_AUTOSENSE)
goto autosense_failed;
switch(hcb_status) {The next event we consider is unexpected disconnect, which
is considered normal after an ABORT or BUS DEVICE RESET message
and abnormal in other cases. case UNEXPECTED_DISCONNECT:
if(requested_abort(hcb)) {
/* abort affects all commands on that target+LUN, so
* mark all disconnected HCBs on that target+LUN as aborted too
*/
for(h = softc->first_discon_hcb[hcb->target][hcb->lun];
h != NULL; h = hh) {
hh=h->next;
free_hcb_and_ccb_done(h, h->ccb, CAM_REQ_ABORTED);
}
ccb_status = CAM_REQ_ABORTED;
} else if(requested_bus_device_reset(hcb)) {
int lun;
/* reset affects all commands on that target, so
* mark all disconnected HCBs on that target+LUN as reset
*/
for(lun=0; lun <= OUR_MAX_SUPPORTED_LUN; lun++)
for(h = softc->first_discon_hcb[hcb->target][lun];
h != NULL; h = hh) {
hh=h->next;
free_hcb_and_ccb_done(h, h->ccb, CAM_SCSI_BUS_RESET);
}
/* send event */
xpt_async(AC_SENT_BDR, hcb->ccb->ccb_h.path_id, NULL);
/* this was the CAM_RESET_DEV request itself, it is completed */
ccb_status = CAM_REQ_CMP;
} else {
calculate_residue(hcb);
ccb_status = CAM_UNEXP_BUSFREE;
/* request the further code to freeze the queue */
hcb->ccb->ccb_h.status |= CAM_DEV_QFRZN;
lun_to_freeze = hcb->lun;
}
break;If the target refuses to accept tags we notify CAM about
that and return back all commands for this LUN: case TAGS_REJECTED:
/* report the event */
neg.flags = 0 & ~CCB_TRANS_TAG_ENB;
neg.valid = CCB_TRANS_TQ_VALID;
xpt_async(AC_TRANSFER_NEG, hcb->ccb->ccb_h.path_id, &neg);
ccb_status = CAM_MSG_REJECT_REC;
/* request the further code to freeze the queue */
hcb->ccb->ccb_h.status |= CAM_DEV_QFRZN;
lun_to_freeze = hcb->lun;
break;Then we check a number of other conditions, with processing
basically limited to setting the CCB status: case SELECTION_TIMEOUT:
ccb_status = CAM_SEL_TIMEOUT;
/* request the further code to freeze the queue */
hcb->ccb->ccb_h.status |= CAM_DEV_QFRZN;
lun_to_freeze = CAM_LUN_WILDCARD;
break;
case PARITY_ERROR:
ccb_status = CAM_UNCOR_PARITY;
break;
case DATA_OVERRUN:
case ODD_WIDE_TRANSFER:
ccb_status = CAM_DATA_RUN_ERR;
break;
default:
/* all other errors are handled in a generic way */
ccb_status = CAM_REQ_CMP_ERR;
/* request the further code to freeze the queue */
hcb->ccb->ccb_h.status |= CAM_DEV_QFRZN;
lun_to_freeze = CAM_LUN_WILDCARD;
break;
}Then we check if the error was serious enough to freeze the
input queue until it gets processed, and do so if it is: if(hcb->ccb->ccb_h.status & CAM_DEV_QFRZN) {
/* freeze the queue */
xpt_freeze_devq(hcb->ccb->ccb_h.path, /*count*/1);
/* re-queue all commands for this target/LUN back to CAM */
for(h = softc->first_queued_hcb; h != NULL; h = hh) {
hh = h->next;
if(targ == h->targ
&& (lun_to_freeze == CAM_LUN_WILDCARD || lun_to_freeze == h->lun) )
free_hcb_and_ccb_done(h, h->ccb, CAM_REQUEUE_REQ);
}
}
free_hcb_and_ccb_done(hcb, hcb->ccb, ccb_status);
schedule_next_hcb(softc);
return;This concludes the generic interrupt handling although
specific controllers may require some additions.Errors SummarySCSIerrorsWhen executing an I/O request many things may go wrong. The
reason for the error can be reported in the CCB status with great
detail. Examples of use are spread throughout this document.
For completeness here is the summary of recommended responses
for the typical error conditions:CAM_RESRC_UNAVAIL - some resource
is temporarily unavailable and the SIM driver cannot
generate an event when it will become available. An example
of this resource would be some intra-controller hardware
resource for which the controller does not generate an
interrupt when it becomes available.CAM_UNCOR_PARITY - unrecovered
parity error occurredCAM_DATA_RUN_ERR - data overrun or
unexpected data phase (going in other direction than
specified in CAM_DIR_MASK) or odd transfer length for wide
transferCAM_SEL_TIMEOUT - selection timeout
occurred (target does not respond)CAM_CMD_TIMEOUT - command timeout
occurred (the timeout function ran)CAM_SCSI_STATUS_ERROR - the device
returned errorCAM_AUTOSENSE_FAIL - the device
returned error and the REQUEST SENSE COMMAND failedCAM_MSG_REJECT_REC - MESSAGE REJECT
message was receivedCAM_SCSI_BUS_RESET - received SCSI
bus resetCAM_REQ_CMP_ERR -
impossible SCSI phase occurred or something
else as weird or just a generic error if further detail is
not availableCAM_UNEXP_BUSFREE - unexpected
disconnect occurredCAM_BDR_SENT - BUS DEVICE RESET
message was sent to the targetCAM_UNREC_HBA_ERROR - unrecoverable
Host Bus Adapter ErrorCAM_REQ_TOO_BIG - the request was
too large for this controllerCAM_REQUEUE_REQ - this request
should be re-queued to preserve transaction ordering. This
typically occurs when the SIM recognizes an error that
should freeze the queue and must place other queued requests
for the target at the sim level back into the XPT queue.
Typical cases of such errors are selection timeouts, command
timeouts and similar conditions. In such cases the
troublesome command returns the status indicating the error,
and the other commands which have not been sent to the bus
yet get re-queued.CAM_LUN_INVALID - the LUN ID in the
request is not supported by the SCSI controllerCAM_TID_INVALID - the target ID in
the request is not supported by the SCSI controllerTimeout HandlingWhen the timeout for an HCB expires that request should be
aborted, just like with an XPT_ABORT request. The only
difference is that the returned status of aborted request should
be CAM_CMD_TIMEOUT instead of CAM_REQ_ABORTED (that is why
the abort is best implemented as a function). But
there is one more possible problem: what if the abort request
itself gets stuck? In this case the SCSI bus should be
reset, just like with an XPT_RESET_BUS request (and the idea
about implementing it as a function called from both places
applies here too). Also we should reset the whole SCSI bus if a
device reset request got stuck. So after all the timeout
function would look like:static void
xxx_timeout(void *arg)
{
struct xxx_hcb *hcb = (struct xxx_hcb *)arg;
struct xxx_softc *softc;
struct ccb_hdr *ccb_h;
softc = hcb->softc;
ccb_h = &hcb->ccb->ccb_h;
if(hcb->flags & HCB_BEING_ABORTED
|| ccb_h->func_code == XPT_RESET_DEV) {
xxx_reset_bus(softc);
} else {
xxx_abort_ccb(hcb->ccb, CAM_CMD_TIMEOUT);
}
}When we abort a request all the other disconnected requests
to the same target/LUN get aborted too. So the question
arises: should we return them with status CAM_REQ_ABORTED or
CAM_CMD_TIMEOUT? The current drivers use CAM_CMD_TIMEOUT. This
seems logical: if one request has timed out then probably
something really bad is happening to the device, so if they
were left undisturbed they would time out by themselves.
diff --git a/en_US.ISO8859-1/books/arch-handbook/usb/chapter.xml b/en_US.ISO8859-1/books/arch-handbook/usb/chapter.xml
index 6fa02b2d59..1d34c3192b 100644
--- a/en_US.ISO8859-1/books/arch-handbook/usb/chapter.xml
+++ b/en_US.ISO8859-1/books/arch-handbook/usb/chapter.xml
@@ -1,721 +1,721 @@
USB DevicesNickHibmaWritten by MurrayStokelyModifications for Handbook made by IntroductionUniversal Serial Bus
(USB)NetBSDThe Universal Serial Bus (USB) is a new way of attaching
devices to personal computers. The bus architecture features
two-way communication and has been developed as a response to
devices becoming smarter and requiring more interaction with the
host. USB support is included in all current PC chipsets and is
therefore available in all recently built PCs. Apple's
introduction of the USB-only iMac has been a major incentive for
hardware manufacturers to produce USB versions of their devices.
The future PC specifications specify that all legacy connectors
on PCs should be replaced by one or more USB connectors,
providing generic plug and play capabilities. Support for USB
hardware was available at a very early stage in NetBSD and was
developed by Lennart Augustsson for the NetBSD project. The
code has been ported to FreeBSD and we are currently maintaining
a shared code base. For the implementation of the USB subsystem
a number of features of USB are important.Lennart Augustsson has done most of the
implementation of the USB support for the NetBSD project.
Many thanks for this incredible amount of work. Many thanks
also to Ardy and Dirk for their comments and proofreading of
this paper.Devices connect to ports on the computer directly or on
devices called hubs, forming a treelike device
structure.The devices can be connected and disconnected at run
time.Devices can suspend themselves and trigger resumes of
the host systemAs the devices can be powered from the bus, the host
software has to keep track of power budgets for each
hub.Different quality of service requirements by the
different device types together with the maximum of 127
devices that can be connected to the same bus, require
proper scheduling of transfers on the shared bus to take
full advantage of the 12Mbps bandwidth available. (over
400Mbps with USB 2.0)Devices are intelligent and contain easily accessible
information about themselvesThe development of drivers for the USB subsystem and devices
connected to it is supported by the specifications that have
been developed and will be developed. These specifications are
publicly available from the USB home pages. Apple has been very
strong in pushing for standards based drivers, by making drivers
for the generic classes available in their operating system
MacOS and discouraging the use of separate drivers for each new
device. This chapter tries to collate essential information for
a basic understanding of the USB 2.0 implementation stack in
FreeBSD/NetBSD. It is recommended however to read it together
with the relevant 2.0 specifications and other developer
resources:USB 2.0 Specification (http://www.usb.org/developers/docs/usb20_docs/)Universal Host Controller Interface
(UHCI) Specification (ftp://ftp.netbsd.org/pub/NetBSD/misc/blymn/uhci11d.pdf)Open Host Controller Interface (OHCI)
Specification(ftp://ftp.compaq.com/pub/supportinformation/papers/hcir1_0a.pdf)Developer section of USB home page
(http://www.usb.org/developers/)Structure of the USB StackThe USB support in FreeBSD can be split into three layers.
The lowest layer contains the host controller driver,
providing a generic interface to the hardware and its
scheduling facilities. It supports initialisation of the
hardware, scheduling of transfers and handling of completed
and/or failed transfers. Each host controller driver
implements a virtual hub providing hardware independent access
to the registers controlling the root ports on the back of the
machine.The middle layer handles the device connection and
disconnection, basic initialisation of the device, driver
selection, the communication channels (pipes) and does
resource management. This services layer also controls the
default pipes and the device requests transferred over
them.The top layer contains the individual drivers supporting
specific (classes of) devices. These drivers implement the
protocol that is used over the pipes other than the default
pipe. They also implement additional functionality to make
the device available to other parts of the kernel or userland.
They use the USB driver interface (USBDI) exposed by the
services layer.Host ControllersUSBhost
controllersThe host controller (HC) controls the transmission of
packets on the bus. Frames of 1 millisecond are used. At the
start of each frame the host controller generates a Start of
Frame (SOF) packet.The SOF packet is used to synchronise to the start of the
frame and to keep track of the frame number. Within each frame
packets are transferred, either from host to device (out) or
from device to host (in). Transfers are always initiated by the
host (polled transfers). Therefore there can only be one host
per USB bus. Each transfer of a packet has a status stage in
which the recipient of the data can return either ACK
(acknowledge reception), NAK (retry), STALL (error condition) or
nothing (garbled data stage, device not available or
disconnected). Section 8.5 of the USB 2.0 Specification
describes packets in more detail. Four different
types of transfers can occur on a USB bus: control, bulk,
interrupt and isochronous. The types of transfers and their
characteristics are described below.Large transfers between the device on the USB bus and the
device driver are split up into multiple packets by the host
controller or the HC driver.Device requests (control transfers) to the default endpoints
are special. They consist of two or three phases: SETUP, DATA
(optional) and STATUS. The set-up packet is sent to the device.
If there is a data phase, the direction of the data packet(s) is
given in the set-up packet. The direction in the status phase
is the opposite of the direction during the data phase, or IN if
there was no data phase. The host controller hardware also
provides registers with the current status of the root ports and
the changes that have occurred since the last reset of the
status change register. Access to these registers is provided
through a virtualised hub as suggested in the USB specification.
The virtual hub must comply with the hub device class given in
chapter 11 of that specification. It must provide a default
pipe through which device requests can be sent to it. It
returns the standard and hub class specific set of descriptors.
It should also provide an interrupt pipe that reports changes
happening at its ports. There are currently two specifications
for host controllers available: Universal Host Controller
Interface (UHCI) from Intel and Open Host
Controller Interface (OHCI) from Compaq,
Microsoft, and National Semiconductor. The
UHCI specification has been designed to
reduce hardware complexity by requiring the host controller
driver to supply a complete schedule of the transfers for each
frame. OHCI type controllers are much more independent by
providing a more abstract interface doing a lot of work
themselves.UHCIUSBUHCIThe UHCI host controller maintains a framelist with 1024
pointers to per frame data structures. It understands two
different data types: transfer descriptors (TD) and queue
heads (QH). Each TD represents a packet to be communicated to
or from a device endpoint. QHs are a means to group TDs (and
QHs) together.Each transfer consists of one or more packets. The UHCI
driver splits large transfers into multiple packets. For
every transfer, apart from isochronous transfers, a QH is
allocated. For every type of transfer these QHs are collected
at a QH for that type. Isochronous transfers have to be
executed first because of the fixed latency requirement and
are directly referred to by the pointer in the framelist. The
last isochronous TD refers to the QH for interrupt transfers
for that frame. All QHs for interrupt transfers point at the
QH for control transfers, which in turn points at the QH for
bulk transfers. The following diagram gives a graphical
overview of this:This results in the following schedule being run in each
frame. After fetching the pointer for the current frame from
the framelist the controller first executes the TDs for all
the isochronous packets in that frame. The last of these TDs
refers to the QH for the interrupt transfers for that frame.
The host controller will then descend from that QH to the QHs
for the individual interrupt transfers. After finishing that
queue, the QH for the interrupt transfers will refer the
controller to the QH for all control transfers. It will
execute all the subqueues scheduled there, followed by all the
transfers queued at the bulk QH. To facilitate the handling
of finished or failed transfers different types of interrupts
are generated by the hardware at the end of each frame. In
the last TD for a transfer the Interrupt-On Completion bit is
set by the HC driver to flag an interrupt when the transfer
has completed. An error interrupt is flagged if a TD reaches
its maximum error count. If the short packet detect bit is
set in a TD and less than the set packet length is transferred
this interrupt is flagged to notify the controller driver of
the completed transfer. It is the host controller driver's
task to find out which transfer has completed or produced an
error. When called the interrupt service routine will locate
all the finished transfers and call their callbacks.Refer to the UHCI Specification for a
more elaborate description.OHCIUSBOHCIProgramming an OHCI host controller is much simpler. The
controller assumes that a set of endpoints is available, and
is aware of scheduling priorities and the ordering of the
types of transfers in a frame. The main data structure used
by the host controller is the endpoint descriptor (ED) to
which a queue of transfer descriptors (TDs) is attached. The
ED contains the maximum packet size allowed for an endpoint
and the controller hardware does the splitting into packets.
The pointers to the data buffers are updated after each
transfer and when the start and end pointer are equal, the TD
is retired to the done-queue. The four types of endpoints
(interrupt, isochronous, control, and bulk) have their own
queues. Control and bulk endpoints are queued each at their
own queue. Interrupt EDs are queued in a tree, with the level
in the tree defining the frequency at which they run.The schedule being run by the host controller in each
frame looks as follows. The controller will first run the
non-periodic control and bulk queues, up to a time limit set
by the HC driver. Then the interrupt transfers for that frame
number are run, by using the lower five bits of the frame
number as an index into level 0 of the tree of interrupts EDs.
At the end of this tree the isochronous EDs are connected and
these are traversed subsequently. The isochronous TDs contain
the frame number of the first frame the transfer should be run
in. After all the periodic transfers have been run, the
control and bulk queues are traversed again. Periodically the
interrupt service routine is called to process the done queue
and call the callbacks for each transfer and reschedule
interrupt and isochronous endpoints.See the OHCI Specification for a more
elaborate description. The middle layer provides access to
the device in a controlled way and maintains resources in use
by the different drivers and the services layer. The layer
takes care of the following aspects:The device configuration informationThe pipes to communicate with a deviceProbing and attaching and detaching form a
device.USB Device InformationDevice Configuration InformationEach device provides different levels of configuration
information. Each device has one or more configurations, of
which one is selected during probe/attach. A configuration
provides power and bandwidth requirements. Within each
configuration there can be multiple interfaces. A device
interface is a collection of endpoints. For example USB
speakers can have an interface for the audio data (Audio
Class) and an interface for the knobs, dials and buttons (HID
Class). All interfaces in a configuration are active at the
same time and can be attached to by different drivers. Each
interface can have alternates, providing different quality of
service parameters. In cameras, for example, this is used to
provide different frame sizes and numbers of frames per
second.Within each interface, 0 or more endpoints can be
specified. Endpoints are the unidirectional access points for
communicating with a device. They provide buffers to
temporarily store incoming or outgoing data from the device.
Each endpoint has a unique address within a configuration, the
endpoint's number plus its direction. The default endpoint,
endpoint 0, is not part of any interface and available in all
configurations. It is managed by the services layer and not
directly available to device drivers.This hierarchical configuration information is described
in the device by a standard set of descriptors (see section
9.6 of the USB specification). They can be requested through
the Get Descriptor Request. The services layer caches these
descriptors to avoid unnecessary transfers on the USB bus.
Access to the descriptors is provided through function
calls.Device descriptors: General information about the
device, like Vendor, Product and Revision Id, supported
device class, subclass and protocol if applicable, maximum
packet size for the default endpoint, etc.Configuration descriptors: The number of interfaces in
this configuration, suspend and resume functionality
supported and power requirements.Interface descriptors: interface class, subclass and
protocol if applicable, number of alternate settings for
the interface and the number of endpoints.Endpoint descriptors: Endpoint address, direction and
type, maximum packet size supported and polling frequency
if type is interrupt endpoint. There is no descriptor for
the default endpoint (endpoint 0) and it is never counted
in an interface descriptor.String descriptors: In the other descriptors string
indices are supplied for some fields. These can be used to
retrieve descriptive strings, possibly in multiple
languages.Class specifications can add their own descriptor types
that are available through the GetDescriptor Request.Pipes Communication to end points on a device flows
through so-called pipes. Drivers submit transfers for an
endpoint to a pipe and provide a callback to be called on
completion or failure of the transfer (asynchronous transfers)
or wait for completion (synchronous transfer). Transfers to
an endpoint are serialised in the pipe. A transfer can either
complete, fail or time-out (if a time-out has been set).
There are two types of time-outs for transfers. Time-outs can
happen due to time-out on the USB bus (milliseconds). These
time-outs are seen as failures and can be due to disconnection
of the device. A second form of time-out is implemented in
software and is triggered when a transfer does not complete
within a specified amount of time (seconds). These are caused
by a device acknowledging negatively (NAK) the transferred
packets. The cause for this is the device not being ready to
receive data, buffer under- or overrun or protocol
errors.If a transfer over a pipe is larger than the maximum
packet size specified in the associated endpoint descriptor,
the host controller (OHCI) or the HC driver (UHCI) will split
the transfer into packets of maximum packet size, with the
last packet possibly smaller than the maximum packet
size.

Sometimes it is not a problem for a device to return less data than requested. For example, a bulk-in transfer to a modem might request 200 bytes of data, but the modem has only 5 bytes available at that time. The driver can set the short packet (SPD) flag; it allows the host controller to accept a packet even if the amount of data transferred is less than requested. This flag is only valid for in-transfers, as the amount of data to be sent to a device is always known beforehand. If an unrecoverable error occurs in a device during a transfer, the pipe is stalled. Before any more data is accepted or sent, the driver needs to resolve the cause of the stall and clear the endpoint stall condition by sending the clear-endpoint-halt device request over the default pipe.
The default endpoint should never stall.

There are four different types of endpoints and corresponding pipes:

- Control pipe / default pipe: there is one control pipe per device, connected to the default endpoint (endpoint 0). The pipe carries the device requests and associated data. The difference between transfers over the default pipe and other pipes is that the protocol for the transfers is described in the USB specification. These requests are used to reset and configure the device. A basic set of commands that must be supported by each device is provided in chapter 9 of the USB specification. The commands supported on this pipe can be extended by a device class specification to support additional functionality.

- Bulk pipe: this is the USB equivalent to a raw transmission medium.

- Interrupt pipe: the host sends a request for data to the device and if the device has nothing to send, it will NAK the data packet. Interrupt transfers are scheduled at a frequency specified when creating the pipe.

- Isochronous pipe: these pipes are intended for isochronous data, for example video or audio streams, with fixed latency, but no guaranteed delivery. Some support for pipes of this type is available in the current implementation. Packets in control, bulk and interrupt transfers are retried if an error occurs during transmission or the device acknowledges the packet negatively (NAK) due to, for example, lack of buffer space to store the incoming data. Isochronous packets are however not retried in case of failed delivery or NAK of a packet, as this might violate the timing constraints.

The availability of the necessary bandwidth is calculated
during the creation of the pipe. Transfers are scheduled
within frames of 1 millisecond. The bandwidth allocation
within a frame is prescribed by the USB specification, section
5.6 [2]. Isochronous and interrupt transfers are allowed to
consume up to 90% of the bandwidth within a frame. Packets
for control and bulk transfers are scheduled after all
isochronous and interrupt packets and will consume all the
remaining bandwidth.

More information on scheduling of transfers and bandwidth
reclamation can be found in chapter 5 of the USB
specification, section 1.3 of the UHCI specification, and
section 3.4.2 of the OHCI specification.

Device Probe and Attach

After the notification by the hub that a new device has been
connected, the service layer switches on the port, providing the
device with 100 mA of current. At this point the device is in
its default state and listening to device address 0. The
service layer will proceed to retrieve the various descriptors
through the default pipe. After that it will send a Set Address
request to move the device away from the default device address
(address 0). Multiple device drivers might be able to support
the device. For example a modem driver might be able to support
an ISDN TA through the AT compatibility interface. A driver for
that specific model of the ISDN adapter might however be able to
provide much better support for this device. To support this
flexibility, the probes return priorities indicating their level
of support. Support for a specific revision of a product ranks highest and the generic driver ranks lowest. It might also be that multiple drivers could attach to one device
if there are multiple interfaces within one configuration. Each
driver only needs to support a subset of the interfaces.

The probing for a driver for a newly attached device checks
first for device specific drivers. If not found, the probe code
iterates over all supported configurations until a driver
attaches in a configuration. To support devices with multiple
drivers on different interfaces, the probe iterates over all
interfaces in a configuration that have not yet been claimed by
a driver. Configurations that exceed the power budget for the
hub are ignored. During attach the driver should initialise the
device to its proper state, but not reset it, as this will make
the device disconnect itself from the bus and restart the
probing process for it. To avoid consuming unnecessary bandwidth, the driver should not claim the interrupt pipe at attach time, but should postpone allocating the pipe until the file is opened and the data is actually used. When the file is closed, the pipe
should be closed again, even though the device might still be
attached.

Device Disconnect and Detach

A device driver should expect to receive errors during any
transaction with the device. The design of USB supports and
encourages the disconnection of devices at any point in time.
Drivers should make sure that they do the right thing when the
device disappears.Furthermore a device that has been disconnected and
reconnected will not be reattached at the same device
instance. This might change in the future when more devices
support serial numbers (see the device descriptor) or other
means of defining an identity for a device have been
developed.

The disconnection of a device is signaled by a hub in the
interrupt packet delivered to the hub driver. The status
change information indicates which port has seen a connection
change. The detach methods of all device drivers for the device connected on that port are called and the
structures cleaned up. If the port status indicates that in the meantime a device has been connected to that port, the
procedure for probing and attaching the device will be
started. A device reset will produce a disconnect-connect
sequence on the hub and will be handled as described
above.

USB Drivers Protocol Information

The protocol used over pipes other than the default pipe is
undefined by the USB specification. Information on this can be
found in various sources. The most accurate source is the developers' section on the USB home pages. From these pages, a growing number of device class specifications are available. These specifications describe what a compliant device should look like from a driver perspective, the basic functionality it needs to provide and the protocol that is to be used over the
communication channels. The USB specification includes the
description of the Hub Class. A class specification for Human
Interface Devices (HID) has been created to cater for keyboards,
tablets, bar-code readers, buttons, knobs, switches, etc. A
third example is the class specification for mass storage
devices. For a full list of device classes see the developers' section on the USB home pages.

For many devices, the protocol information has not yet been published, however. Information on the protocol being used might
be available from the company making the device. Some companies
will require you to sign a Non-Disclosure Agreement (NDA)
before giving you the specifications. This in most cases
precludes making the driver open source.Another good source of information is the Linux driver
sources, as a number of companies have started to provide
drivers for Linux for their devices. It is always a good idea
to contact the authors of those drivers for their source of
information.

Example: Human Interface Devices

The specification for Human Interface Devices like keyboards, mice, tablets, buttons, dials, etc. is referred to in other device class specifications and is used in many devices.

For example, audio speakers provide endpoints to the digital
to analogue converters and possibly an extra pipe for a
microphone. They also provide a HID endpoint in a separate
interface for the buttons and dials on the front of the device.
The same is true for the monitor control class. It is
straightforward to build support for these interfaces through
the available kernel and userland libraries together with the
HID class driver or the generic driver. Another device that
serves as an example for interfaces within one configuration
driven by different device drivers is a cheap keyboard with
built-in legacy mouse port. To avoid having the cost of
including the hardware for a USB hub in the device,
manufacturers combined the mouse data received from the PS/2
port on the back of the keyboard and the key presses from the
keyboard into two separate interfaces in the same configuration.
The mouse and keyboard drivers each attach to the appropriate
interface and allocate the pipes to the two independent
endpoints.

Example: Firmware download

Many devices that have been developed are based on a general-purpose processor with an additional USB core added to it. Since the development of drivers and firmware for USB devices is still very new, many devices require the downloading of the firmware after they have been connected.

The procedure followed is straightforward. The device
identifies itself through a vendor and product Id. The first
driver probes and attaches to it and downloads the firmware into
it. After that the device soft resets itself and the driver is
detached. After a short pause the device announces its presence
on the bus. The device will have changed its
vendor/product/revision Id to reflect the fact that it has been
supplied with firmware and as a consequence a second driver will
probe it and attach to it.

An example of these types of devices is the ActiveWire I/O
board, based on the EZ-USB chip. For this chip a generic
firmware downloader is available. The firmware downloaded into
the ActiveWire board changes the revision Id. It will then
perform a soft reset of the USB part of the EZ-USB chip to
disconnect from the USB bus and reconnect again.

Example: Mass Storage Devices

Support for mass storage
devices is mainly built around existing protocols. The Iomega
USB Zipdrive is based on the SCSI version of their drive. The
SCSI commands and status messages are wrapped in blocks and
transferred over the bulk pipes to and from the device,
emulating a SCSI controller over the USB wire. ATAPI and UFI
commands are supported in a similar fashion.

The Mass Storage Specification supports two different types of wrapping of the command block. The initial attempt was based on
sending the command and status through the default pipe and
using bulk transfers for the data to be moved between the host
and the device. Based on experience a second approach was
designed that was based on wrapping the command and status
blocks and sending them over the bulk-out and bulk-in endpoints. The specification defines exactly what has to happen when, and what has to be done when an error condition is encountered. The biggest challenge when writing drivers for these devices is to fit the USB-based protocol into the existing support for mass storage devices. CAM provides hooks to do this in a fairly straightforward way. ATAPI is less simple, as historically the IDE interface has never had many different appearances.

The support for the USB floppy from Y-E Data is again less
straightforward as a new command set has been designed.