diff --git a/en/tutorials/docproj-primer/sgml-primer/chapter.sgml b/en/tutorials/docproj-primer/sgml-primer/chapter.sgml
index c25bacf1f1..f181ce5dc9 100644
--- a/en/tutorials/docproj-primer/sgml-primer/chapter.sgml
+++ b/en/tutorials/docproj-primer/sgml-primer/chapter.sgml
@@ -1,1554 +1,1572 @@
SGML Primer
- The Documentation Project makes heavy use of the Standard Generalized
- Markup Language (SGML). This chapter describes what SGML is, how to read
- and understand markup, and some of the SGML tricks you will see used in
- the FAQ, Handbook, and website.
+ The majority of FDP documentation is written in applications of
+ SGML. This chapter explains exactly what that means, how to read
+ and understand the source to the documentation, and the sort of SGML
+ tricks you will see used in the documentation.

Portions of this section were inspired by Mark Galassi's Get Going With DocBook.

Overview

Way back when, electronic text was simple to deal with. Admittedly,
you had to know which character set your document was written in (ASCII,
EBCDIC, or one of a number of others) but that was about it. Text was
text, and what you saw really was what you got. No frills, no
formatting, no intelligence.Inevitably, this was not enough. Once you have text in a
- machine-usable format, you expect machines to be able to use it, and
+ machine-usable format, you expect machines to be able to use it and
manipulate it intelligently. You would like to indicate that certain
phrases should be emphasised, or added to a glossary, or be hyperlinks.
You might want filenames to be shown in a “typewriter” style
font for viewing on screen, but as “italics” when printed,
or any of a myriad of other options for presentation.It was once hoped that Artificial Intelligence (AI) would make this
- easy. Your computer would read in the document, and automatically
+ easy. Your computer would read in the document and automatically
identify key phrases, filenames, text that the reader should type in,
examples, and more. Unfortunately, real life has not happened quite
- like that, and our computers require some assistance before the can
+ like that, and our computers require some assistance before they can
meaningfully process our text.More precisely, they need help identifying what is what. You or I
can look at
To remove /tmp/foo use &man.rm.1;.
- rm /tmp/foo
+ &prompt.user; rm /tmp/foo
and easily see which parts are filenames, which are commands to be typed
in, which parts are references to manual pages, and so on. But the
computer processing the document can not. For this we need
markup.“Markup” is commonly used to describe “adding
value” or “increasing cost”. The term takes on both
these meanings when applied to text. Markup is additional text included
in the document, distinguished from the document's content in some way,
so that programs that process the document can read the markup and use
it when making decisions about the document. Editors can hide the
- markup from the user, so they are not distracted by it.
+ markup from the user, so the user is not distracted by it.
The extra information stored in the markup adds
value to the document. Adding the markup to the document
must typically be done by a person—after all, if computers could
recognise the text sufficiently well to add the markup then there would
be no need to add it in the first place. This increases the
cost of the document.The previous example is actually represented in this document like
this;To remove /tmp/foo use &man.rm.1;.
rm /tmp/foo]]>As you can see, the markup is clearly separate from the
content.Obviously, if you are going to use markup you need to define what
your markup means, and how it should be interpreted. You will need a
markup language that you can follow when marking up your
documents.
-
- SGML is not a markup langugage. Instead, SGML
- is the language in which you write markup
- languages. There have been many markup languages written
- using SGML. HTML and DocBook are two of these.
-
- This is an important point to understand. Most of the time you are
- not writing SGML documents. Instead, you are writing documents in a
- particular markup language. The definition of the markup language you
- are using is written in SGML.
-
- Each language definition (which is written in SGML) is more properly
- called a Document Type Definition (DTD). The DTD specifies the name of
- the elements that can be used, what order they appear in (and whether
- some markup can be used inside other markup) and related
- information.
+
+ Of course, one markup language might not be enough. A markup
+ language for technical documentation has very different requirements
+ than a markup language that was to be used for cookery recipes. This,
+ in turn, would be very different from a markup language used to describe
+ poetry. What you really need is a first language that you use to write
+ these other markup languages. A meta markup
+ language.
+
+ This is exactly what the Standard Generalised Markup Language (SGML)
+ is. Many markup languages have been written in SGML, including the two
+ most used by the FDP, HTML and DocBook.
+
+ Each language definition is more properly called a Document Type
+ Definition (DTD). The DTD specifies the name of the elements that can
+ be used, what order they appear in (and whether some markup can be used
+ inside other markup) and related information. A DTD is sometimes
+ referred to as an application of SGML.A DTD is a complete
specification of all the elements that are allowed to appear, the order
in which they should appear, which elements are mandatory, which are
- optional, and so forth. This makes it possible to write a
- parser which reads in the DTD and a document which
- claims to conform to the DTD. The parser can then confirm whether or
- not all the elements required by the DTD are in the document in the
+ optional, and so forth. This makes it possible to write an SGML
+ parser which reads in both the DTD and a document
+ which claims to conform to the DTD. The parser can then confirm whether
+ or not all the elements required by the DTD are in the document in the
right order, and whether there are any errors in the markup. This is
normally referred to as validating the document.This processing simply confirms that the choice of elements, their
ordering, and so on, conforms to that listed in the DTD. It does
not check that you have used
appropriate markup for the content. If you were
to try and mark up all the filenames in your document as function
names, the parser would not flag this as an error (assuming, of
course, that your DTD defines elements for filenames and functions,
and that they are allowed to appear in the same place).It is likely that most of your contributions to the Documentation
Project will consist of content marked up in either HTML or DocBook,
rather than alterations to the DTDs. For this reason this book will
not touch on how to write a DTD.Elements, tags, and attributesAll the DTDs written in SGML share certain characteristics. This is
- hardly surprising, as the philisophy behind SGML will inevitably show
+ hardly surprising, as the philosophy behind SGML will inevitably show
through. One of the most obvious manifestations of this philosophy is
that of content and
elements.Your documentation (whether it is a single web page, or a lengthy
book) is considered to consist of content. This content is then divided
(and further subdivided) into elements. The purpose of adding markup is
to name and identify the boundaries of these elements for further
processing.For example, consider a typical book. At the very top level, the
book is itself an element. This “book” element obviously
contains chapters, which can be considered to be elements in their own
right. Each chapter will contain more elements, such as paragraphs,
quotations, and footnotes. Each paragraph might contain further
elements, identifying content that was direct speech, or the name of a
character in the story.You might like to think of this as “chunking” content.
At the very top level you have one chunk, the book. Look a little
deeper, and you have more chunks, the individual chapters. These are
chunked further into paragraphs, footnotes, character names, and so
on.

Notice how you can make this differentiation between different
elements of the content without resorting to any SGML terms. It really
is surprisingly straightforward. You could do this with a highlighter
pen and a printout of the book, using different colours to indicate
- different types of content.
+ different chunks of content.
- Of course, we don't have an electronic highlighter pen, so we need
+ Of course, we do not have an electronic highlighter pen, so we need
some other way of indicating which element each piece of content belongs
to. In languages written in SGML (HTML, DocBook, et al) this is done by
means of tags.A tag is used to identify where a particular element starts, and
- where the ends. The tag is not part of the element
+ where the element ends. The tag is not part of the element
itself. Because each DTD was normally written to mark up
specific types of information, each one will recognise different
elements, and will therefore have different names for the tags.For an element called element-name the
start tag will normally look like
<element-name>. The
corresponding closing tag for this element is
</element-name>.Using an element (start and end tags)HTML has an element for indicating that the content enclosed by
the element is a paragraph, called p. This
element has both start and end tags.
This is a paragraph. It starts with the start tag for
the 'p' element, and it will end with the end tag for the 'p'
element.
This is another paragraph. But this one is much shorter.
]]>
Not all elements require an end tag. Some elements have no content.
For example, in HTML you can indicate that you want a horizontal line to
appear in the document. Obviously, this line has no content, so just
the start tag is required for this element.Using an element (start tag only)HTML has an element for indicating a horizontal rule, called
hr. This element does not wrap content, so only has
a start tag.
This is a paragraph.
This is another paragraph. A horizontal rule separates this
from the previous paragraph.
]]>If it is not obvious by now, elements can contain other elements.
In the book example earlier, the book element contained all the chapter
elements, which in turn contained all the paragraph elements, and so
on.Elements within elements; em
This is a simple paragraph where some
of the words have been emphasised.]]>The DTD will specify the rules detailing which elements can contain
other elements, and exactly what they can contain.People often confuse the terms tags and elements, and use the terms
as if they were interchangeable. They are not.An element is a conceptual part of your document. An element has
a defined start and end. The tags mark where the element starts and
ends.

When this document (or anyone else knowledgeable about SGML) refers
to “the <p> tag” they mean the literal text
consisting of the three characters <,
p, and >. But the phrase
“the <p> element” refers to the whole element.This distinction is very subtle. But keep it
in mind.Elements can have attributes. An attribute has a name and a value,
and is used for adding extra information to the element. This might be
information that indicates how the content should be rendered, or might
be something that uniquely identifies that occurrence of the element, or
it might be something else.An element's attributes are written inside the
start tag for that element, and take the form
attribute-name="attribute-value".In sufficiently recent versions of HTML, the p
element has an attribute called align, which suggests
an alignment (justification) for the paragraph to the program displaying
the HTML.The align attribute can take one of four defined
values, left, center,
right and justify. If the
attribute is not specified then the default is
left.Using an element with an attribute
The inclusion of the align attribute
on this paragraph was superfluous, since the default is left.
This may appear in the center.
]]>Some attributes will only take specific values, such as
left or justify. Others will
allow you to enter anything you want. If you need to include quotes
(") within an attribute then use single quotes around
the attribute value.Single quotes around attributes
I'm on the right!]]>Sometimes you do not need to use quotes around attribute values at
all. However, the rules for doing this are subtle, and it is far simpler
just to always quote your attribute values.For you to do…In order to run the examples in this document you will need to
install some software on your system and ensure that an environment
variable is set correctly.Download and install textproc/docproj
from the FreeBSD ports system. This is a
meta-port that should download and install
all of the programs and supporting files that are used by the
Documentation Project.Add lines to your shell startup files to set
SGML_CATALOG_FILES.

.profile, for &man.sh.1; and
&man.bash.1; users
SGML_ROOT=/usr/local/share/sgml
SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES
export SGML_CATALOG_FILES

.login, for &man.csh.1; and
&man.tcsh.1; users
setenv SGML_ROOT /usr/local/share/sgml
setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES

Then either log out and log back in again, or run those
commands from the command line to set the variable values.Create example.sgml, and enter the
following text;An example HTML file
This is a paragraph containing some text.
This paragraph contains some more text.
This paragraph might be right-justified.
]]>Try and validate this file using an SGML parser.Part of textproc/docproj is the
&man.nsgmls.1; validating
parser. Normally, &man.nsgmls.1; reads in a document
marked up according to an SGML DTD and returns a copy of the
document's Element Structure Information Set (ESIS, but that is
not important right now).

However, when -s is passed as a parameter to
it, &man.nsgmls.1; will suppress its normal output, and just print
error messages. This makes it a useful way to check to see if your
document is valid or not.Use &man.nsgmls.1; to check that your document is
valid;

&prompt.user; nsgmls -s example.sgml

As you will see, &man.nsgmls.1; returns without displaying any
output. This means that your document validated
successfully.See what happens when required elements are omitted. Try
removing the title and /title
tags, and re-run the validation.

&prompt.user; nsgmls -s example.sgml
nsgmls:example.sgml:5:4:E: character data is not allowed here
nsgmls:example.sgml:6:8:E: end tag for "HEAD" which is not finished

The error output from &man.nsgmls.1; is organised into
colon-separated groups, or columns.

Column  Meaning
1       The name of the program generating the error. This will always be nsgmls.
2       The name of the file that contains the error.
3       Line number where the error appears.
4       Column number where the error appears.
5       A one letter code indicating the nature of the message. I indicates an informational message, W is for warnings, E is for errors, and X is for cross-references. As you can see, these messages are errors.
6       The text of the error message.

Note on column 5: It is not always the fifth column either. nsgmls -sv displays nsgmls:I: SP version "1.3" (depending on the installed version). As you can see, this is an informational message.

Simply omitting the title tags has generated
2 different errors.The first error indicates that content (in this case,
characters, rather than the start tag for an element) has occurred
where the SGML parser was expecting something else. In this case,
the parser was expecting to see one of the start tags for elements
that are valid inside head (such as
title).The second error is because head elements
must contain a title
element. Because it does not, &man.nsgmls.1; considers that the
element has not been properly finished. However, the closing tag
indicates that the element has been closed before it has been
finished.Put the title element back in.The DOCTYPE declarationThe beginning of each document that you write must specify the name
of the DTD that the document conforms to. This is so that SGML parsers
- can determine the DTD and ensure that the document does conform to the
+ can determine the DTD and ensure that the document does conform to
it.This information is generally expressed on one line, in the DOCTYPE
declaration.
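As a rough illustration of what a processor has to recover from such a declaration, the following Python sketch pulls apart a DOCTYPE declaration that uses a PUBLIC identifier, and then splits the identifier into the Owner, Keyword, Description, and Language parts described later in this section. The function names `parse_doctype` and `split_fpi` are hypothetical, and this is not how &man.nsgmls.1; is implemented; it only handles the simple one-line form shown in this chapter.

```python
import re

# Hypothetical pattern: matches only the simple one-line PUBLIC form.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+PUBLIC\s+"(?P<fpi>[^"]*)"\s*>',
    re.IGNORECASE,
)


def parse_doctype(declaration):
    """Return (first element name, FPI) from a PUBLIC DOCTYPE declaration."""
    match = DOCTYPE_RE.match(declaration.strip())
    if match is None:
        raise ValueError("not a PUBLIC DOCTYPE declaration")
    return match.group("root"), match.group("fpi")


def split_fpi(fpi):
    """Split an FPI of the form Owner//Keyword Description//Language."""
    parts = fpi.split("//")
    if parts[0] in ("-", "+"):
        # A leading + marks a registered owner, a leading - an unregistered one.
        status = "registered" if parts[0] == "+" else "unregistered"
        parts = parts[1:]
    else:
        # Otherwise the owner string itself starts with "ISO".
        status = "ISO"
    if len(parts) != 3:
        raise ValueError("unexpected FPI layout: %r" % fpi)
    owner, text, language = parts
    keyword, _, description = text.partition(" ")
    return {
        "status": status,
        "owner": owner,
        "keyword": keyword,
        "description": description,
        "language": language,
    }


root, fpi = parse_doctype('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">')
print(root, split_fpi(fpi))
```

A real parser would then look the FPI up in a catalog file to find the DTD on disk; that step is left out here.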
- A typical declaration for document written to conform with version
+ A typical declaration for a document written to conform with version
4.0 of the HTML DTD looks like this;
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">

That line contains a number of different components.

<!
    Is the indicator that this is an SGML declaration. This line is
    declaring the document type.

DOCTYPE
    Shows that this is an SGML declaration for the document type.

html
    Names the first element that will appear in the document.

PUBLIC "-//W3C//DTD HTML 4.0//EN"
    Lists the Formal Public Identifier (FPI) for the DTD that this
    document conforms to. Your SGML parser will use this to find the
    correct DTD when processing this document.

    PUBLIC is not a part of the FPI, but
    indicates to the SGML processor how to find the DTD referenced in
    the FPI. Other ways of telling the SGML parser how to find the DTD
    are shown later.

>
    Returns to the document.

Formal Public Identifiers (FPIs)

You don't need to know this, but it's useful background, and
might help you debug problems when your SGML processor can't locate
the DTD you are using.FPIs must follow a specific syntax. This syntax is as
follows;
"Owner//Keyword Description//Language"

Owner

This indicates the owner of the FPI.

If this string starts with “ISO” then this is an
ISO owned FPI. For example, the FPI "ISO
8879:1986//ENTITIES Greek Symbols//EN" lists
ISO 8879:1986 as being the owner for the set
of entities for Greek symbols. ISO 8879:1986 is the ISO number
for the SGML standard.Otherwise, this string will either look like
-//Owner or
+//Owner (notice
the only difference is the leading + or
-).

If the string starts with - then the
owner information is unregistered; if it starts with + then the
owner information is registered.

ISO 9070:1991 defines how registered names are generated; it
might be derived from the number of an ISO publication, an ISBN
code, or an organisation code assigned according to ISO 6523. In
addition, a registration authority could be created in order to
assign registered names. The ISO council delegated this to the
American National Standards Institute (ANSI).

Because the FreeBSD Project hasn't been registered the
owner string is -//FreeBSD. And as you can
see, the W3C are not a registered owner either.

Keyword

There are several keywords that indicate the type of
information in the file. Some of the most common keywords are
DTD, ELEMENT,
ENTITIES, and TEXT.
DTD is used only for DTD files,
ELEMENT is usually used for DTD fragments
that contain only entity or element declarations.
TEXT is used for SGML content (text and
tags).

Description

Any description you want to supply for the contents of this
file. This may include version numbers or any short text that is
meaningful to you and unique for the SGML system.

Language

This is an ISO two-character code that identifies the native
language for the file. EN is used for
English.

catalog files

If you use the syntax above and try and process this document
using an SGML processor, the processor will need to have some way of
turning the FPI into the name of the file on your computer that
contains the DTD.In order to do this it can use a catalog file. A catalog file
(typically called catalog) contains lines that
map FPIs to filenames. For example, if the catalog file contained the
line;
PUBLIC "-//W3C//DTD HTML 4.0//EN" "4.0/strict.dtd"

The SGML processor would know to look up the DTD from
strict.dtd in the 4.0
subdirectory of whichever directory held the
catalog file that contained that line.Look at the contents of
/usr/local/share/sgml/html/catalog. This is the
catalog file for the HTML DTDs that will have been installed as part
of the textproc/docproj port.SGML_CATALOG_FILESIn order to locate a catalog file, your
SGML processor will need to know where to look. Many of them feature
command line parameters for specifying the path to one or more
catalogs.In addition, you can set SGML_CATALOG_FILES to
point to the files. This environment variable should consist of a
colon-separated list of catalog files (including their full
path).

Typically, you will want to include the following files;

/usr/local/share/sgml/docbook/3.0/catalog
/usr/local/share/sgml/html/catalog
/usr/local/share/sgml/iso8879/catalog
/usr/local/share/sgml/jade/catalog

You should already have done
this.Alternatives to FPIsInstead of using an FPI to indicate the DTD that the document
conforms to (and therefore, which file on the system contains the DTD)
you can explicitly specify the name of the file.The syntax for this is slightly different;
]]>The SYSTEM keyword indicates that the SGML
processor should locate the DTD in a system specific fashion. This
typically (but not always) means the DTD will be provided as a
filename.Using FPIs is preferred for reasons of portability. You don't want
to have to ship a copy of the DTD around with your document, and if
you used the SYSTEM identifier then everyone would
need to keep their DTDs in the same place.Escaping back to SGMLEarlier in this primer I said that SGML is only used when writing a
DTD. This is not strictly true. There is certain SGML syntax that you
will want to be able to use within your documents. For example,
comments can be included in your document, and will be ignored by the
parser. Comments are entered using SGML syntax. Other uses for SGML
syntax in your document will be shown later too.Obviously, you need some way of indicating to the SGML processor
that the following content is not elements within the document, but is
SGML that the parser should act upon.These sections are marked by <! ... > in
your document. Everything between these delimiters is SGML syntax as you
might find within a DTD.As you may just have realised, the DOCTYPE declaration is an example
of SGML syntax that you need to include in your document…CommentsComments are an SGML construction, and are normally only valid
inside a DTD. However, as the previous section shows, it is
possible to use SGML syntax within your document.

The delimiter for SGML comments is the string
“--”. The first occurrence of this string
opens a comment, and the second closes it.SGML generic comment
<!-- test comment -->
]]>Use 2 dashesThere is a problem with producing the Postscript and PDF versions
of this document. The above example probably shows just one hyphen
symbol, - after the <! and
before the >.You must use two -,
not one. The Postscript and PDF versions have
translated the two - in the original to a longer,
more professional em-dash, and broken this
example in the process.The HTML, plain text, and RTF versions of this document are not
affected.
]]>
If you have used HTML before you may have been shown different rules
for comments. In particular, you may think that the string
<!-- opens a comment, and it is only closed by
-->.This is not the case. A lot of web browsers
have broken HTML parsers, and will accept that as valid. However, the
SGML parsers used by the Documentation Project are much stricter, and
will reject documents that make that error.

Erroneous SGML comments

]]>The SGML parser will treat this as though it were actually;
<!THIS IS OUTSIDE THE COMMENT>This is not valid SGML, and may give confusing error
messages.
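The effect of the “--” delimiter can be sketched in a few lines of Python. The function below is a hypothetical helper, not part of any SGML tool: it takes the text of one `<! ... >` declaration and toggles between "inside a comment" and "outside a comment" at every “--”, which is roughly how a strict parser reads it. The sample strings are made up for illustration.

```python
def split_comment_declaration(declaration):
    """Split '<! ... >' into (text inside comments, text outside comments)."""
    if not (declaration.startswith("<!") and declaration.endswith(">")):
        raise ValueError("not an SGML declaration")
    body = declaration[2:-1]
    # Every "--" flips the state, so the odd-numbered pieces are comment
    # text and the even-numbered pieces lie outside any comment.
    pieces = body.split("--")
    if len(pieces) % 2 == 0:
        raise ValueError("comment opened but never closed")
    inside = [piece.strip() for piece in pieces[1::2]]
    outside = [piece.strip() for piece in pieces[0::2] if piece.strip()]
    return inside, outside


good = "<!-- test comment -->"
bad = "<!-- in the comment -- THIS IS OUTSIDE THE COMMENT -- back inside -->"
print(split_comment_declaration(good))
print(split_comment_declaration(bad))
```

In a comment declaration only whitespace may appear outside the “--” pairs, so any non-empty "outside" text is what a strict parser complains about.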
]]>As the example suggests, do not write
comments like that.
]]>That is a (slightly) better approach, but it is still potentially
confusing to people new to SGML.For you to do…Add some comments to example.sgml, and
check that the file still validates using &man.nsgmls.1;Add some invalid comments to
example.sgml, and see the error messages that
&man.nsgmls.1; gives when it encounters an invalid comment.Entities
- Entities are an SGML term. You might feel more comfortable thinking
- of them as variables. There are two types of entity in SGML, general
- entities and parameter entities.
+ Entities are a mechanism for assigning names to chunks of
+ content. As an SGML parser processes your document, any entities
+ it finds are replaced by the content of the entity.
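A minimal sketch of that replacement step, in Python, might look like the following. It assumes references are written as `&entity-name;` (the general entity syntax described below), does a single pass with no nested expansion, and uses a made-up version string; `expand_entities` is a hypothetical name, not a function from any SGML toolkit.

```python
import re

# Entity names here are letters, digits, dots, and hyphens (an assumption
# that covers names such as current.version used in this chapter).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities):
    """Replace each &name; in text with the content defined for name."""

    def replace(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError("undefined entity: " + name)
        return entities[name]

    return ENTITY_REF.sub(replace, text)


definitions = {"current.version": "3.0-RELEASE"}  # illustrative value only
source = "The current version of our product is &current.version;."
print(expand_entities(source, definitions))
```

Changing the one entry in `definitions` and re-running changes every place the entity is referenced, which is the point of using entities for reusable content.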
+
+ This is a good way to have re-usable, easily changeable chunks
+ of content in your SGML documents. It is also the only way to
+ include one marked up file inside another using SGML.
+
+ There are two types of entities which can be used in two
+ different situations; general entities and
+ parameter entities.General Entities
- General entities are a way of assigning names to chunks of text,
- and reusing that text (which may contain markup) throughout your
- document.
-
You can not use general entities in an SGML context (although you
define them in one). They can only be used in your document. Contrast
this with parameter
entities.Each general entity has a name. When you want to reference a
general entity (and therefore include whatever text it represents in
your document), you write
&entity-name;. For
example, suppose you had an entity called
current.version which expanded to the current
version number of your product. You could write;
The current version of our product is
&current.version;.]]>When the version number changes you can simply change the
definition of the value of the general entity and reprocess your
document.You can also use general entities to enter characters that you
- could not normally include in an SGML document. For example, < and
- & can not normally appear in an SGML document. Normally, when the
- SGML processor sees a < symbol it assumes that a tag (either a start
- tag or an end tag) is about to appear, and when it sees a & symbol
- it assumes the next text will be the name of an entity.
+ could not otherwise include in an SGML document. For example, <
+ and & can not normally appear in an SGML document. When the SGML
+ parser sees the < symbol it assumes that a tag (either a start tag
+ or an end tag) is about to appear, and when it sees the & symbol it
+ assumes the next text will be the name of an entity.
Fortunately, you can use the two general entities &lt; and
&amp; whenever you need to include one or other of these.

A general entity can only be defined within an SGML context.
Typically, this is done immediately after the DOCTYPE
declaration.Defining general entities
]>]]>Notice how the DOCTYPE declaration has been extended by adding a
square bracket at the end of the first line. The two entities are
then defined over the next two lines, before the square bracket is
closed, and then the DOCTYPE declaration is closed.The square brackets are necessary to indicate that we are
extending the DTD indicated by the DOCTYPE declaration.Parameter entitiesLike general entities,
parameter entities are used to assign names to reusable chunks of
text. However, whereas general entities can only be used within your
document, parameter entities can only be used within an SGML context.Parameter entities are defined in a similar way to general
entities. However, instead of using
&entity-name; to
refer to them, use
%entity-name; (parameter entities use the
percent symbol). The definition also includes the %
between the ENTITY keyword and the name of the
entity.Defining parameter entities
]>]]>This may not seem particularly useful. It will be.For you to do…Add a general entity to
example.sgml.
]>
An example HTML file
This is a paragraph containing some text.
This paragraph contains some more text.
This paragraph might be right-justified.
The current version of this document is: &version;
]]>Validate the document using &man.nsgmls.1;Load example.sgml into your web browser
(you may need to copy it to example.html
before your browser recognises it as an HTML document).Unless your browser is very advanced, you won't see the entity
reference &version; replaced with the
version number. Most web browsers have very simplistic parsers
- which don't do proper SGML
+ which do not handle proper SGML. (This is a shame. Imagine all the problems and hacks, such
as Server Side Includes, that could be avoided if they
did.)

The solution is to normalise your
- document. Normalising it involves converting all the entity
- references to the values of those entities.
+ document using an SGML normaliser. The normaliser reads in valid
+ SGML and outputs equally valid SGML which has been transformed in
+ some way. One of the ways in which the normaliser transforms the
+ SGML is to expand all the entity references in the document,
+ replacing the entities with the text that they represent.
You can use &man.sgmlnorm.1; to do this.

&prompt.user; sgmlnorm example.sgml > example.html

You should find a normalised (i.e., entity references
expanded) copy of your document in
example.html, ready to load into your web
browser.If you look at the output from &man.sgmlnorm.1; you will see
that it does not include a DOCTYPE declaration at the start. To
include this you need to use the -d
option;

&prompt.user; sgmlnorm -d example.sgml > example.html

Using entities to include files

Entities (both general and
- parameter) come into their own
- when you realise they can be used to include other files.
+ parameter) are particularly
+ useful when used to include one file inside another.
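What including a file through an entity amounts to can be sketched in Python as follows. This is only an illustration under stated assumptions: the mapping from entity names to filenames stands in for entity declarations that name a file, the chapter files are created in a temporary directory just for the demonstration, and `expand_system_entities` is a hypothetical name rather than anything an SGML parser exposes.

```python
import os
import re
import tempfile

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_system_entities(text, system_entities, base_dir):
    """Replace &name; with the contents of the file that name maps to."""

    def replace(match):
        filename = system_entities[match.group(1)]
        with open(os.path.join(base_dir, filename)) as handle:
            return handle.read()

    return ENTITY_REF.sub(replace, text)


def demo():
    """Build two small chapter files and include them in a 'book'."""
    with tempfile.TemporaryDirectory() as base_dir:
        for number in (1, 2):
            path = os.path.join(base_dir, "chapter%d.sgml" % number)
            with open(path, "w") as handle:
                handle.write("<p>Text of chapter %d.</p>\n" % number)
        entities = {"chapter.1": "chapter1.sgml", "chapter.2": "chapter2.sgml"}
        return expand_system_entities("&chapter.1;&chapter.2;", entities, base_dir)


print(demo())
```

The included files contribute only their content, which is one way to see why they must not carry a DOCTYPE declaration of their own.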
Using general entities to include filesSuppose you have some content for an SGML book organised into
files, one file per chapter, called
chapter1.sgml,
chapter2.sgml, and so forth, with a
book.sgml file that will contain these
chapters.In order to use the contents of these files as the values for your
entities, you declare them with the SYSTEM keyword.
This directs the SGML parser to use the contents of the named file as
the value of the entity.Using general entities to include files
]>
&chapter.1;
&chapter.2;
&chapter.3;
]]>When using general entities to include other files within a
document, the files being included
(chapter1.sgml,
chapter2.sgml, and so on) must
not start with a DOCTYPE declaration. This is a syntax
error.Using parameter entities to include filesRecall that parameter entities can only be used inside an SGML
context. Why then would you want to include a file within an SGML
context?You can use this to ensure that you can reuse your general
entities.Suppose that you had many chapters in your document, and you
reused these chapters in two different books, each book organising the
chapters in a different fashion.You could list the entities at the top of each book, but this
quickly becomes cumbersome to manage.Instead, place the general entity definitions inside one file,
and use a parameter entity to include that file within your
document.Using parameter entities to include filesFirst, place your entity definitions in a separate file, called
chapters.ent. This file contains the
following;
]]>Now create a parameter entity to refer to the contents of the
file. Then use the parameter entity to load the file into the
document, which will then make all the general entities available
for use. Then use the general entities as before;
%chapters;
]>
&chapter.1;
&chapter.2;
&chapter.3;
]]>For you to do…Use general entities to include filesCreate three files, para1.sgml,
para2.sgml, and
para3.sgml.Put content similar to the following in each file;
This is the first paragraph.]]>Edit example.sgml so that it looks like
this;
]>
An example HTML file
The current version of this document is: &version;
&para1;
&para2;
&para3;
]]>Produce example.html by normalising
example.sgml.

&prompt.user; sgmlnorm -d example.sgml > example.html

Load example.html into your web
browser, and confirm that the
paran.sgml files
have been included in example.html.Use parameter entities to include filesYou must have taken the previous steps first.Edit example.sgml so that it looks like
this;
%entities;
]>
An example HTML file
The current version of this document is: &version;
&para1;
&para2;
&para3;
]]>Create a new file, entities.sgml, with
this content;
]]>Produce example.html by normalising
example.sgml.
&prompt.user; sgmlnorm -d example.sgml > example.html
Load example.html into your web
browser, and confirm that the
paraN.sgml files
have been included in example.html.Marked sectionsSGML provides a mechanism to indicate that particular pieces of the
document should be processed in a special way. These are termed
“marked sections”.Structure of a marked section
<![ KEYWORD [
Contents of marked section
]]>As you would expect, being an SGML construct, a marked section
starts <!.The first square bracket begins to delimit the marked
section.KEYWORD describes how this marked
section should be processed by the parser.The second square bracket indicates that the content of the marked
section starts here.The marked section is finished by closing the two square brackets,
and then returning to the document context from the SGML context with
>

Marked section keywords

CDATA, RCDATA

These keywords denote the marked section's content
model, and allow you to change it from the
default.
- When an SGML processor is processing a document, it keeps track
+ When an SGML parser is processing a document, it keeps track
of what is called the “content model”.Briefly, the content model describes what sort of content the
parser is expecting to see, and what it will do with it when it
finds it.The two content models you will probably find most useful are
CDATA and RCDATA.CDATA is for “Character Data”. If
the parser is in this content model then it is expecting to see
characters, and characters only. In this model the < and &
symbols lose their special status, and will be treated as ordinary
characters.RCDATA is for “Entity references and
character data”. If the parser is in this content model then it
is expecting to see characters and entities.
< loses its special status, but & will still be treated as
the start of a general entity.This is particularly useful if you are including some verbatim
text that contains lots of < and & characters. While you
could go through the text ensuring that every < is converted to a
&lt; and every & is converted to a &amp;, it can be
easier to mark the section as only containing CDATA. When the SGML
parser encounters this it will ignore the < and & symbols
embedded in the content.Using a CDATA marked section
<para>Here is an example of how you would include some text
that contained many < and & symbols. The sample
text is a fragment of HTML. The surrounding text (<para> and
<programlisting>) are from DocBook.</para>
<programlisting>
<![ CDATA [ This is a sample that shows you some of the elements within
HTML. Since the angle brackets are used so many times, it's
simpler to say the whole example is a CDATA marked section
than to use the entity names for the left and right angle
brackets throughout.
This is a listitem
This is a second listitem
This is a third listitem
This is the end of the example.
]]>
</programlisting>If you look at the source for this document you will see this
technique used throughout.INCLUDE and
IGNOREIf the keyword is INCLUDE then the contents
of the marked section will be processed. If the keyword is
IGNORE then the marked section is ignored and
will not be processed. It will not appear in the output.Using INCLUDE and
IGNORE in marked sections
<![ INCLUDE [
This text will be processed and included.
]]>
<![ IGNORE [
This text will not be processed or included.
]]>By itself, this isn't too useful. If you wanted to remove text
from your document you could cut it out, or wrap it in
comments.It becomes more useful when you realise you can use parameter entities to control
this. Remember that parameter entities can only be used in SGML
contexts, and the keyword of a marked section
is an SGML context.For example, suppose that you produced a hard-copy version of
some documentation and an electronic version. In the electronic
version you wanted to include some extra content that wasn't to
appear in the hard-copy.Create a parameter entity, and set its value to
INCLUDE. Write your document, using marked
sections to delimit content that should only appear in the
electronic version. In these marked sections use the parameter
entity in place of the keyword.When you want to produce the hard-copy version of the document,
change the parameter entity's value to IGNORE and
reprocess the document.Using a parameter entity to control a marked
section
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % electronic.copy "INCLUDE">
]>
...
<![ %electronic.copy [
This content should only appear in the electronic
version of the document.
]]>When producing the hard-copy version, change the entity's
definition to;
<!ENTITY % electronic.copy "IGNORE">On reprocessing the document, the marked sections that use
%electronic.copy as their keyword will be
ignored.For you to do…Create a new file, section.sgml, that
contains the following;
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % text.output "INCLUDE">
]>
<html>
<head>
<title>An example using marked sections</title>
</head>
<body>
<p>This paragraph <![ CDATA [contains many <
characters (< < < < <) so it is easier
to wrap it in a CDATA marked section ]]></p>
<![ IGNORE [
<p>This paragraph will definitely not be included in the
output.</p>
]]>
<![ %text.output [
<p>This paragraph might appear in the output, or it
might not.</p>
<p>Its appearance is controlled by the
parameter entity.</p>
]]>
</body>
</html>Normalise this file using &man.sgmlnorm.1; and examine the
output. Notice which paragraphs have appeared, which have
disappeared, and what has happened to the content of the CDATA
marked section.Change the definition of the text.output
entity from INCLUDE to
IGNORE. Re-normalise the file, and examine the
output to see what has changed.
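
As a minimal sketch of the same technique extended to two complementary switches, a document can carry both variants of a passage and select one at processing time. The entity names electronic.copy and print.copy and the paragraph text below are illustrative assumptions, not names used by the FDP.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!-- Illustrative names: swap the two values to produce the hard copy -->
<!ENTITY % electronic.copy "INCLUDE">
<!ENTITY % print.copy      "IGNORE">
]>
<html>
  <head>
    <title>Complementary marked sections</title>
  </head>
  <body>
    <![ %electronic.copy [
    <p>Follow the links in the sidebar for more detail.</p>
    ]]>
    <![ %print.copy [
    <p>See the appendix at the back of the book for more detail.</p>
    ]]>
  </body>
</html>
```

With the values shown, only the first paragraph should survive normalisation; exchanging INCLUDE and IGNORE should select the second instead.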
+
+
+ Conclusion
+
+ That is the conclusion of this SGML primer. For reasons of space
+ and complexity several things have not been covered in depth (or at
+ all). However, the previous sections cover enough SGML for you to be
+ able to follow the organisation of the FDP documentation.
+
diff --git a/en_US.ISO8859-1/books/fdp-primer/sgml-primer/chapter.sgml b/en_US.ISO8859-1/books/fdp-primer/sgml-primer/chapter.sgml
index c25bacf1f1..f181ce5dc9 100644
--- a/en_US.ISO8859-1/books/fdp-primer/sgml-primer/chapter.sgml
+++ b/en_US.ISO8859-1/books/fdp-primer/sgml-primer/chapter.sgml
@@ -1,1554 +1,1572 @@
SGML Primer
- The Documentation Project makes heavy use of the Standard Generalized
- Markup Language (SGML). This chapter describes what SGML is, how to read
- and understand markup, and some of the SGML tricks you will see used in
- the FAQ, Handbook, and website.
+ The majority of FDP documentation is written in applications of
+ SGML. This chapter explains exactly what that means, how to read
+ and understand the source to the documentation, and the sort of SGML
+ tricks you will see used in the documentation.Portions of this section were inspired by Mark Galassi's Get Going With DocBook.OverviewWay back when, electronic text was simple to deal with. Admittedly,
you had to know which character set your document was written in (ASCII,
EBCDIC, or one of a number of others) but that was about it. Text was
text, and what you saw really was what you got. No frills, no
formatting, no intelligence.Inevitably, this was not enough. Once you have text in a
- machine-usable format, you expect machines to be able to use it, and
+ machine-usable format, you expect machines to be able to use it and
manipulate it intelligently. You would like to indicate that certain
phrases should be emphasised, or added to a glossary, or be hyperlinks.
You might want filenames to be shown in a “typewriter” style
font for viewing on screen, but as “italics” when printed,
or any of a myriad of other options for presentation.It was once hoped that Artificial Intelligence (AI) would make this
- easy. Your computer would read in the document, and automatically
+ easy. Your computer would read in the document and automatically
identify key phrases, filenames, text that the reader should type in,
examples, and more. Unfortunately, real life has not happened quite
- like that, and our computers require some assistance before the can
+ like that, and our computers require some assistance before they can
meaningfully process our text.More precisely, they need help identifying what is what. You or I
can look at
To remove /tmp/foo use &man.rm.1;.
- rm /tmp/foo
+ &prompt.user; rm /tmp/foo
and easily see which parts are filenames, which are commands to be typed
in, which parts are references to manual pages, and so on. But the
computer processing the document can not. For this we need
markup.“Markup” is commonly used to describe “adding
value” or “increasing cost”. The term takes on both
these meanings when applied to text. Markup is additional text included
in the document, distinguished from the document's content in some way,
so that programs that process the document can read the markup and use
it when making decisions about the document. Editors can hide the
- markup from the user, so they are not distracted by it.
+ markup from the user, so the user is not distracted by it.
The extra information stored in the markup adds
value to the document. Adding the markup to the document
must typically be done by a person—after all, if computers could
recognise the text sufficiently well to add the markup then there would
be no need to add it in the first place. This increases the
cost of the document.The previous example is actually represented in this document like
this;To remove /tmp/foo use &man.rm.1;.
rm /tmp/foo]]>As you can see, the markup is clearly separate from the
content.Obviously, if you are going to use markup you need to define what
your markup means, and how it should be interpreted. You will need a
markup language that you can follow when marking up your
documents.
-
- SGML is not a markup langugage. Instead, SGML
- is the language in which you write markup
- languages. There have been many markup languages written
- using SGML. HTML and DocBook are two of these.
-
- This is an important point to understand. Most of the time you are
- not writing SGML documents. Instead, you are writing documents in a
- particular markup language. The definition of the markup language you
- are using is written in SGML.
-
- Each language definition (which is written in SGML) is more properly
- called a Document Type Definition (DTD). The DTD specifies the name of
- the elements that can be used, what order they appear in (and whether
- some markup can be used inside other markup) and related
- information.
+
+ Of course, one markup language might not be enough. A markup
+ language for technical documentation has very different requirements
+ than a markup language that was to be used for cookery recipes. This,
+ in turn, would be very different from a markup language used to describe
+ poetry. What you really need is a first language that you use to write
+ these other markup languages. A meta markup
+ language.
+
+ This is exactly what the Standard Generalised Markup Language (SGML)
+ is. Many markup languages have been written in SGML, including the two
+ most used by the FDP, HTML and DocBook.
+
+ Each language definition is more properly called a Document Type
+ Definition (DTD). The DTD specifies the name of the elements that can
+ be used, what order they appear in (and whether some markup can be used
+ inside other markup) and related information. A DTD is sometimes
+ referred to as an application of SGML.A DTD is a complete
specification of all the elements that are allowed to appear, the order
in which they should appear, which elements are mandatory, which are
- optional, and so forth. This makes it possible to write a
- parser which reads in the DTD and a document which
- claims to conform to the DTD. The parser can then confirm whether or
- not all the elements required by the DTD are in the document in the
+ optional, and so forth. This makes it possible to write an SGML
+ parser which reads in both the DTD and a document
+ which claims to conform to the DTD. The parser can then confirm whether
+ or not all the elements required by the DTD are in the document in the
right order, and whether there are any errors in the markup. This is
normally referred to as validating the document.This processing simply confirms that the choice of elements, their
ordering, and so on, conforms to that listed in the DTD. It does
not check that you have used
appropriate markup for the content. If you were
to try and mark up all the filenames in your document as function
names, the parser would not flag this as an error (assuming, of
course, that your DTD defines elements for filenames and functions,
and that they are allowed to appear in the same place).It is likely that most of your contributions to the Documentation
Project will consist of content marked up in either HTML or DocBook,
rather than alterations to the DTDs. For this reason this book will
not touch on how to write a DTD.Elements, tags, and attributesAll the DTDs written in SGML share certain characteristics. This is
- hardly surprising, as the philisophy behind SGML will inevitably show
+ hardly surprising, as the philosophy behind SGML will inevitably show
through. One of the most obvious manifestations of this philosophy is
that of content and
elements.Your documentation (whether it is a single web page, or a lengthy
book) is considered to consist of content. This content is then divided
(and further subdivided) into elements. The purpose of adding markup is
to name and identify the boundaries of these elements for further
processing.For example, consider a typical book. At the very top level, the
book is itself an element. This “book” element obviously
contains chapters, which can be considered to be elements in their own
right. Each chapter will contain more elements, such as paragraphs,
quotations, and footnotes. Each paragraph might contain further
elements, identifying content that was direct speech, or the name of a
character in the story.You might like to think of this as “chunking” content.
At the very top level you have one chunk, the book. Look a little
deeper, and you have more chunks, the individual chapters. These are
chunked further into paragraphs, footnotes, character names, and so
on.Notice how you can make this differentiation between different
elements of the content without resorting to any SGML terms. It really
is surprisingly straightforward. You could do this with a highlighter
pen and a printout of the book, using different colours to indicate
- different types of content.
+ different chunks of content.
- Of course, we don't have an electronic highlighter pen, so we need
+ Of course, we do not have an electronic highlighter pen, so we need
some other way of indicating which element each piece of content belongs
to. In languages written in SGML (HTML, DocBook, et al) this is done by
means of tags.A tag is used to identify where a particular element starts, and
- where the ends. The tag is not part of the element
+ where the element ends. The tag is not part of the element
itself. Because each DTD was normally written to mark up
specific types of information, each one will recognise different
elements, and will therefore have different names for the tags.For an element called element-name the
start tag will normally look like
<element-name>. The
corresponding closing tag for this element is
</element-name>.Using an element (start and end tags)HTML has an element for indicating that the content enclosed by
the element is a paragraph, called p. This
element has both start and end tags.
<p>This is a paragraph. It starts with the start tag for
the 'p' element, and it will end with the end tag for the 'p'
element.</p>
<p>This is another paragraph. But this one is much shorter.</p>

Not all elements require an end tag. Some elements have no content.
For example, in HTML you can indicate that you want a horizontal line to
appear in the document. Obviously, this line has no content, so just
the start tag is required for this element.Using an element (start tag only)HTML has an element for indicating a horizontal rule, called
hr. This element does not wrap content, so only has
a start tag.
<p>This is a paragraph.</p>
<hr>
<p>This is another paragraph. A horizontal rule separates this
from the previous paragraph.</p>

If it is not obvious by now, elements can contain other elements.
In the book example earlier, the book element contained all the chapter
elements, which in turn contained all the paragraph elements, and so
on.Elements within elements; em
This is a simple paragraph where some
of the words have been emphasised.]]>The DTD will specify the rules detailing which elements can contain
other elements, and exactly what they can contain.People often confuse the terms tags and elements, and use the terms
as if they were interchangeable. They are not.An element is a conceptual part of your document. An element has
a defined start and end. The tags mark where the element starts and
ends.When this document (or anyone else knowledgeable about SGML) refers
to “the <p> tag” they mean the literal text
consisting of the three characters <,
p, and >. But the phrase
“the <p> element” refers to the whole element.This distinction is very subtle. But keep it
in mind.Elements can have attributes. An attribute has a name and a value,
and is used for adding extra information to the element. This might be
information that indicates how the content should be rendered, or might
be something that uniquely identifies that occurrence of the element, or
it might be something else.An element's attributes are written inside the
start tag for that element, and take the form
attribute-name="attribute-value".In sufficiently recent versions of HTML, the p
element has an attribute called align, which suggests
an alignment (justification) for the paragraph to the program displaying
the HTML.The align attribute can take one of four defined
values, left, center,
right and justify. If the
attribute is not specified then the default is
left.Using an element with an attribute
<p align="left">The inclusion of the align attribute
on this paragraph was superfluous, since the default is left.</p>
<p align="center">This may appear in the center.</p>

Some attributes will only take specific values, such as
left or justify. Others will
allow you to enter anything you want. If you need to include quotes
(") within an attribute then use single quotes around
the attribute value.Single quotes around attributes
<p align='right'>I'm on the right!</p>

Sometimes you do not need to use quotes around attribute values at
all. However, the rules for doing this are subtle, and it is far simpler
just to always quote your attribute values.For you to do…In order to run the examples in this document you will need to
install some software on your system and ensure that an environment
variable is set correctly.Download and install textproc/docproj
from the FreeBSD ports system. This is a
meta-port that should download and install
all of the programs and supporting files that are used by the
Documentation Project.

Add lines to your shell startup files to set
SGML_CATALOG_FILES.

.profile, for &man.sh.1; and
&man.bash.1; users

SGML_ROOT=/usr/local/share/sgml
SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES
export SGML_CATALOG_FILES

.login, for &man.csh.1; and
&man.tcsh.1; users

setenv SGML_ROOT /usr/local/share/sgml
setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES

Then either log out, and log back in again, or run those
commands from the command line to set the variable values.Create example.sgml, and enter the
following text;An example HTML file
This is a paragraph containing some text.
This paragraph contains some more text.
This paragraph might be right-justified.
]]>Try and validate this file using an SGML parser.Part of textproc/docproj is the
&man.nsgmls.1; validating
parser. Normally, &man.nsgmls.1; reads in a document
marked up according to an SGML DTD and returns a copy of the
document's Element Structure Information Set (ESIS, but that is
not important right now).However, when -s is passed as a parameter to
it, &man.nsgmls.1; will suppress its normal output, and just print
error messages. This makes it a useful way to check to see if your
document is valid or not.Use &man.nsgmls.1; to check that your document is
valid;&prompt.user; nsgmls -s example.sgmlAs you will see, &man.nsgmls.1; returns without displaying any
output. This means that your document validated
successfully.See what happens when required elements are omitted. Try
removing the title and /title
tags, and re-run the validation.&prompt.user; nsgmls -s example.sgml
nsgmls:example.sgml:5:4:E: character data is not allowed here
nsgmls:example.sgml:6:8:E: end tag for "HEAD" which is not finished

The error output from &man.nsgmls.1; is organised into
colon-separated groups, or columns.

Column  Meaning
1       The name of the program generating the error. This will always be nsgmls.
2       The name of the file that contains the error.
3       Line number where the error appears.
4       Column number where the error appears.
5       A one letter code indicating the nature of the message. I indicates an informational message, W is for warnings, E is for errors, and X is for cross-references. As you can see, these messages are errors. (It is not always the fifth column either: nsgmls -sv displays nsgmls:I: SP version "1.3", depending on the installed version. As you can see, this is an informational message.)
6       The text of the error message.

Simply omitting the title tags has generated
2 different errors.The first error indicates that content (in this case,
characters, rather than the start tag for an element) has occurred
where the SGML parser was expecting something else. In this case,
the parser was expecting to see one of the start tags for elements
that are valid inside head (such as
title).The second error is because head elements
must contain a title
element. Because it does not, &man.nsgmls.1; considers that the
element has not been properly finished. However, the closing tag
indicates that the element has been closed before it has been
finished.Put the title element back in.The DOCTYPE declarationThe beginning of each document that you write must specify the name
of the DTD that the document conforms to. This is so that SGML parsers
- can determine the DTD and ensure that the document does conform to the
+ can determine the DTD and ensure that the document does conform to
it.This information is generally expressed on one line, in the DOCTYPE
declaration.
- A typical declaration for document written to conform with version
+ A typical declaration for a document written to conform with version
4.0 of the HTML DTD looks like this;
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">

That line contains a number of different components.

<!
    Indicates that this is an SGML declaration. This line is declaring the document type.

DOCTYPE
    Shows that this is an SGML declaration for the document type.

html
    Names the first element that will appear in the document.

PUBLIC "-//W3C//DTD HTML 4.0//EN"
    Lists the Formal Public Identifier (FPI) for the DTD that this
    document conforms to. Your SGML parser will use this to find the
    correct DTD when processing this document.
    PUBLIC is not a part of the FPI, but
    indicates to the SGML processor how to find the DTD referenced in
    the FPI. Other ways of telling the SGML parser how to find the DTD
    are shown later.

>
    Returns to the document.

Formal Public Identifiers (FPIs)

You don't need to know this, but it's useful background, and
might help you debug problems when your SGML processor can't locate
the DTD you are using.FPIs must follow a specific syntax. This syntax is as
follows;
"Owner//Keyword Description//Language"

Owner

This indicates the owner of the FPI. If this string starts with “ISO” then this is an
ISO-owned FPI. For example, the FPI "ISO
8879:1986//ENTITIES Greek Symbols//EN" lists
ISO 8879:1986 as being the owner for the set
of entities for Greek symbols. ISO 8879:1986 is the ISO number
for the SGML standard.Otherwise, this string will either look like
-//Owner or
+//Owner (notice
the only difference is the leading + or
-).If the string starts with - then the
owner information is unregistered, with a +
it identifies it as being registered.ISO 9070:1991 defines how registered names are generated; it
might be derived from the number of an ISO publication, an ISBN
code, or an organisation code assigned according to ISO 6523. In
addition, a registration authority could be created in order to
assign registered names. The ISO council delegated this to the
American National Standards Institute (ANSI).Because the FreeBSD Project hasn't been registered the
owner string is -//FreeBSD. And as you can
see, the W3C are not a registered owner either.KeywordThere are several keywords that indicate the type of
information in the file. Some of the most common keywords are
DTD, ELEMENT,
ENTITIES, and TEXT.
DTD is used only for DTD files,
ELEMENT is usually used for DTD fragments
that contain only entity or element declarations.
TEXT is used for SGML content (text and
tags).DescriptionAny description you want to supply for the contents of this
file. This may include version numbers or any short text that is
meaningful to you and unique for the SGML system.LanguageThis is an ISO two-character code that identifies the native
language for the file. EN is used for
English.catalog filesIf you use the syntax above and try and process this document
using an SGML processor, the processor will need to have some way of
turning the FPI into the name of the file on your computer that
contains the DTD.In order to do this it can use a catalog file. A catalog file
(typically called catalog) contains lines that
map FPIs to filenames. For example, if the catalog file contained the
line;
PUBLIC "-//W3C//DTD HTML 4.0//EN" "4.0/strict.dtd"The SGML processor would know to look up the DTD from
strict.dtd in the 4.0
subdirectory of whichever directory held the
catalog file that contained that line.Look at the contents of
/usr/local/share/sgml/html/catalog. This is the
catalog file for the HTML DTDs that will have been installed as part
of the textproc/docproj port.SGML_CATALOG_FILESIn order to locate a catalog file, your
SGML processor will need to know where to look. Many of them feature
command line parameters for specifying the path to one or more
catalogs.In addition, you can set SGML_CATALOG_FILES to
point to the files. This environment variable should consist of a
colon-separated list of catalog files (including their full
path).Typically, you will want to include the following files;/usr/local/share/sgml/docbook/3.0/catalog/usr/local/share/sgml/html/catalog/usr/local/share/sgml/iso8879/catalog/usr/local/share/sgml/jade/catalogYou should already have done
this.Alternatives to FPIsInstead of using an FPI to indicate the DTD that the document
conforms to (and therefore, which file on the system contains the DTD)
you can explicitly specify the name of the file.The syntax for this is slightly different;
]]>The SYSTEM keyword indicates that the SGML
processor should locate the DTD in a system specific fashion. This
typically (but not always) means the DTD will be provided as a
filename.Using FPIs is preferred for reasons of portability. You don't want
to have to ship a copy of the DTD around with your document, and if
you used the SYSTEM identifier then everyone would
need to keep their DTDs in the same place.Escaping back to SGMLEarlier in this primer I said that SGML is only used when writing a
DTD. This is not strictly true. There is certain SGML syntax that you
will want to be able to use within your documents. For example,
comments can be included in your document, and will be ignored by the
parser. Comments are entered using SGML syntax. Other uses for SGML
syntax in your document will be shown later too.Obviously, you need some way of indicating to the SGML processor
that the following content is not elements within the document, but is
SGML that the parser should act upon.These sections are marked by <! ... > in
your document. Everything between these delimiters is SGML syntax as you
might find within a DTD.As you may just have realised, the DOCTYPE declaration is an example
of SGML syntax that you need to include in your document…CommentsComments are an SGML construction, and are normally only valid
inside a DTD. However, as the previous section shows, it is
possible to use SGML syntax within your document.The delimiter for SGML comments is the string
“--”. The first occurrence of this string
opens a comment, and the second closes it.SGML generic comment
<!-- test comment -->
]]>Use 2 dashesThere is a problem with producing the Postscript and PDF versions
of this document. The above example probably shows just one hyphen
symbol, - after the <! and
before the >.You must use two -,
not one. The Postscript and PDF versions have
translated the two - in the original to a longer,
more professional em-dash, and broken this
example in the process.The HTML, plain text, and RTF versions of this document are not
affected.
]]>
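
As a minimal sketch of the rule above (the comment text is illustrative), each pair of “--” strings inside a <! ... > declaration delimits one comment, and only white space may appear between comments:

```sgml
<!-- A single comment -->

<!-- Two comments in one declaration: this is the first --
  -- and this is the second -->

<!-- Invalid: the text after the second pair of dashes --
     IS OUTSIDE ANY COMMENT
  -- so a strict parser such as nsgmls should reject this declaration -->
```

Validating a file that contains the third declaration with nsgmls -s is a quick way to see the kind of error messages a stray “--” produces.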
If you have used HTML before you may have been shown different rules
for comments. In particular, you may think that the string
<!-- opens a comment, and it is only closed by
-->.This is not the case. A lot of web browsers
have broken HTML parsers, and will accept that as valid. However, the
SGML parsers used by the Documentation Project are much stricter, and
will reject documents that make that error.Erroneous SGML comments]]>The SGML parser will treat this as though it were actually;
<!THIS IS OUTSIDE THE COMMENT>This is not valid SGML, and may give confusing error
messages.
]]>As the example suggests, do not write
comments like that.
]]>That is a (slightly) better approach, but it is still potentially
confusing to people new to SGML.For you to do…Add some comments to example.sgml, and
check that the file still validates using &man.nsgmls.1;.Add some invalid comments to
example.sgml, and see the error messages that
&man.nsgmls.1; gives when it encounters an invalid comment.Entities
- Entities are an SGML term. You might feel more comfortable thinking
- of them as variables. There are two types of entity in SGML, general
- entities and parameter entities.
+ Entities are a mechanism for assigning names to chunks of
+ content. As an SGML parser processes your document, any entities
+ it finds are replaced by the content of the entity.
+
+ This is a good way to have re-usable, easily changeable chunks
+ of content in your SGML documents. It is also the only way to
+ include one marked up file inside another using SGML.
+
+ There are two types of entities which can be used in two
+ different situations; general entities and
+ parameter entities.General Entities
- General entities are a way of assigning names to chunks of text,
- and reusing that text (which may contain markup) throughout your
- document.
-
You can not use general entities in an SGML context (although you
define them in one). They can only be used in your document. Contrast
this with parameter
entities.Each general entity has a name. When you want to reference a
general entity (and therefore include whatever text it represents in
your document), you write
&entity-name;. For
example, suppose you had an entity called
current.version which expanded to the current
version number of your product. You could write;
The current version of our product is
&current.version;.]]>When the version number changes you can simply change the
definition of the value of the general entity and reprocess your
document.You can also use general entities to enter characters that you
- could not normally include in an SGML document. For example, < and
- & can not normally appear in an SGML document. Normally, when the
- SGML processor sees a < symbol it assumes that a tag (either a start
- tag or an end tag) is about to appear, and when it sees a & symbol
- it assumes the next text will be the name of an entity.
+ could not otherwise include in an SGML document. For example, <
+ and & can not normally appear in an SGML document. When the SGML
+ parser sees the < symbol it assumes that a tag (either a start tag
+ or an end tag) is about to appear, and when it sees the & symbol it
+ assumes the next text will be the name of an entity.
Fortunately, you can use the two general entities &lt; and
&amp; whenever you need to include one or other of these characters.
A general entity can only be defined within an SGML context.
Typically, this is done immediately after the DOCTYPE
declaration.Defining general entities
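As a minimal sketch of such a declaration, assuming the HTML 4.0 DOCTYPE used elsewhere in this chapter: current.version is the entity named earlier, while last.version and both values are placeholders chosen only for illustration.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY current.version "1.0"> <!-- placeholder value, not a real release -->
<!ENTITY last.version "0.9">    <!-- hypothetical second entity -->
]>
```

With these definitions in place, writing &current.version; in the body of the document would be replaced by the text 1.0 when the document is processed.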
]>]]>Notice how the DOCTYPE declaration has been extended by adding a
square bracket at the end of the first line. The two entities are
then defined over the next two lines, before the square bracket is
closed, and then the DOCTYPE declaration is closed.The square brackets are necessary to indicate that we are
extending the DTD indicated by the DOCTYPE declaration.Parameter entitiesLike general entities,
parameter entities are used to assign names to reusable chunks of
text. However, whereas general entities can only be used within your
document, parameter entities can only be used within an SGML context.Parameter entities are defined in a similar way to general
entities. However, instead of using
&entity-name; to
refer to them, use
%entity-name; (parameter entities use the percent
symbol). The definition also includes the %
between the ENTITY keyword and the name of the
entity.Defining parameter entities
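A short sketch of what such definitions might look like, again assuming the HTML 4.0 DOCTYPE from this chapter. The names param.some, param.text and param.new are hypothetical and exist only to show the syntax.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!-- Hypothetical names; note the % between ENTITY and the entity name -->
<!ENTITY % param.some "some">
<!ENTITY % param.text "text">
<!-- Parameter entities are referenced with %name; and only in an SGML context -->
<!ENTITY % param.new "%param.some; more %param.text;">
]>
```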
]>]]>This may not seem particularly useful. It will be.For you to do…Add a general entity to
example.sgml.
]>
An example HTML file
This is a paragraph containing some text.
This paragraph contains some more text.
This paragraph might be right-justified.
The current version of this document is: &version;
]]>Validate the document using &man.nsgmls.1;Load example.sgml into your web browser
(you may need to copy it to example.html
before your browser recognises it as an HTML document).Unless your browser is very advanced, you won't see the entity
reference &version; replaced with the
version number. Most web browsers have very simplistic parsers
- which don't do proper SGML
+ which do not handle proper SGML. (This is a shame. Imagine all the problems and hacks (such
as Server Side Includes) that could be avoided if they
did.) The solution is to normalise your
- document. Normalising it involves converting all the entity
- references to the values of those entities.
+ document using an SGML normaliser. The normaliser reads in valid
+ SGML and outputs equally valid SGML which has been transformed in
+ some way. One of the ways in which the normaliser transforms the
+ SGML is to expand all the entity references in the document,
+ replacing the entities with the text that they represent.
You can use &man.sgmlnorm.1; to do this.&prompt.user; sgmlnorm example.sgml > example.htmlYou should find a normalised (i.e., entity references
expanded) copy of your document in
example.html, ready to load into your web
browser.If you look at the output from &man.sgmlnorm.1; you will see
that it does not include a DOCTYPE declaration at the start. To
include this you need to use the -d
option;&prompt.user; sgmlnorm -d example.sgml > example.htmlUsing entities to include filesEntities (both general and
- parameter) come into their own
- when you realise they can be used to include other files.
+ parameter) are particularly
+ useful when used to include one file inside another.
Using general entities to include filesSuppose you have some content for an SGML book organised into
files, one file per chapter, called
chapter1.sgml,
chapter2.sgml, and so forth, with a
book.sgml file that will contain these
chapters.In order to use the contents of these files as the values for your
entities, you declare them with the SYSTEM keyword.
This directs the SGML parser to use the contents of the named file as
the value of the entity.Using general entities to include files
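A minimal sketch of such declarations, assuming the HTML 4.0 DOCTYPE used elsewhere in this chapter and the chapter file names given above. The paths are assumed to be relative to the including document.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!-- SYSTEM tells the parser to take each entity's value from the named file -->
<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
<!ENTITY chapter.3 SYSTEM "chapter3.sgml">
]>
```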
]>
&chapter.1;
&chapter.2;
&chapter.3;
]]>When using general entities to include other files within a
document, the files being included
(chapter1.sgml,
chapter2.sgml, and so on) must
not start with a DOCTYPE declaration. This is a syntax
error.Using parameter entities to include filesRecall that parameter entities can only be used inside an SGML
context. Why then would you want to include a file within an SGML
context?You can use this to ensure that you can reuse your general
entities.Suppose that you had many chapters in your document, and you
reused these chapters in two different books, each book organising the
chapters in a different fashion.You could list the entities at the top of each book, but this
quickly becomes cumbersome to manage.Instead, place the general entity definitions inside one file,
and use a parameter entity to include that file within your
document.Using parameter entities to include filesFirst, place your entity definitions in a separate file, called
chapters.ent. This file contains the
following;
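As a sketch, assuming the chapter1.sgml to chapter3.sgml file names used earlier, the declarations in chapters.ent might look like this. The file holds only entity declarations, with no DOCTYPE line of its own.

```sgml
<!-- chapters.ent: general entity declarations shared between books -->
<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
<!ENTITY chapter.3 SYSTEM "chapter3.sgml">
```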
]]>Now create a parameter entity to refer to the contents of the
file. Then use the parameter entity to load the file into the
document, which will then make all the general entities available
for use. Then use the general entities as before;
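The declaration part of such a document could be sketched as follows, assuming the HTML 4.0 DOCTYPE used elsewhere in this chapter and a chapters.ent file in the same directory.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!-- Define a parameter entity whose value is the contents of chapters.ent -->
<!ENTITY % chapters SYSTEM "chapters.ent">
<!-- Reference it, which pulls the general entity declarations in here -->
%chapters;
]>
```

Keeping the declarations in one file means a second book can reuse the same chapters by repeating only these two lines.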
%chapters;
]>
&chapter.1;
&chapter.2;
&chapter.3;
]]>For you to do…Use general entities to include filesCreate three files, para1.sgml,
para2.sgml, and
para3.sgml.Put content similar to the following in each file;
This is the first paragraph.]]>Edit example.sgml so that it looks like
this;
]>
An example HTML file
The current version of this document is: &version;
&para1;
&para2;
&para3;
]]>Produce example.html by normalising
example.sgml.&prompt.user; sgmlnorm -d example.sgml > example.htmlLoad example.html in to your web
browser, and confirm that the
paran.sgml files
have been included in example.html.Use parameter entities to include filesYou must have taken the previous steps first.Edit example.sgml so that it looks like
this;
%entities;
]>
An example HTML file
The current version of this document is: &version;
&para1;
&para2;
&para3;
]]>Create a new file, entities.sgml, with
this content;
]]>Produce example.html by normalising
example.sgml.&prompt.user; sgmlnorm -d example.sgml > example.htmlLoad example.html in to your web
browser, and confirm that the
paran.sgml files
have been included in example.html.Marked sectionsSGML provides a mechanism to indicate that particular pieces of the
document should be processed in a special way. These are termed
“marked sections”.Structure of a marked section
<![ KEYWORD [
Contents of marked section
]]>As you would expect, being an SGML construct, a marked section
starts <!.The first square bracket begins to delimit the marked
section.KEYWORD describes how this marked
section should be processed by the parser.The second square bracket indicates that the content of the marked
section starts here.The marked section is finished by closing the two square brackets,
and then returning to the document context from the SGML context with
>Marked section keywordsCDATA, RCDATAThese keywords denote the marked section's content
model, and allow you to change it from the
default.
- When an SGML processor is processing a document, it keeps track
+ When an SGML parser is processing a document, it keeps track
of what is called the “content model”.Briefly, the content model describes what sort of content the
parser is expecting to see, and what it will do with it when it
finds it.The two content models you will probably find most useful are
CDATA and RCDATA.CDATA is for “Character Data”. If
the parser is in this content model then it is expecting to see
characters, and characters only. In this model the < and &
symbols lose their special status, and will be treated as ordinary
characters.RCDATA is for “Entity references and
character data”. If the parser is in this content model then it
is expecting to see characters and entities.
< loses its special status, but & will still be treated as
the start of a general entity.This is particularly useful if you are including some verbatim
text that contains lots of < and & characters. While you
could go through the text ensuring that every < is converted to a
&lt; and every & is converted to a &amp;, it can be
easier to mark the section as only containing CDATA. When the SGML
parser encounters this it will ignore the < and & symbols
embedded in the content.Using a CDATA marked section
<para>Here is an example of how you would include some text
that contained many < and & symbols. The sample
text is a fragment of HTML. The surrounding text (<para> and
<programlisting>) are from DocBook.</para>
<programlisting>
<![ CDATA [ This is a sample that shows you some of the elements within
HTML. Since the angle brackets are used so many times, it's
simpler to say the whole example is a CDATA marked section
than to use the entity names for the left and right angle
brackets throughout.
This is a listitem
This is a second listitem
This is a third listitem
This is the end of the example.
]]>
]]>
</programlisting>If you look at the source for this document you will see this
technique used throughout.INCLUDE and
IGNOREIf the keyword is INCLUDE then the contents
of the marked section will be processed. If the keyword is
IGNORE then the marked section is ignored and
will not be processed. It will not appear in the output.Using INCLUDE and
IGNORE in marked sections
<![ INCLUDE [
This text will be processed and included.
]]>
<![ IGNORE [
This text will not be processed or included.
]]>By itself, this isn't too useful. If you wanted to remove text
from your document you could cut it out, or wrap it in
comments.It becomes more useful when you realise you can use parameter entities to control
this. Remember that parameter entities can only be used in SGML
contexts, and the keyword of a marked section
is an SGML context.For example, suppose that you produced a hard-copy version of
some documentation and an electronic version. In the electronic
version you wanted to include some extra content that wasn't to
appear in the hard-copy.Create a parameter entity, and set its value to
INCLUDE. Write your document, using marked
sections to delimit content that should only appear in the
electronic version. In these marked sections use the parameter
entity in place of the keyword.When you want to produce the hard-copy version of the document,
change the parameter entity's value to IGNORE and
reprocess the document.Using a parameter entity to control a marked
section
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % electronic.copy "INCLUDE">
]>
...
<![ %electronic.copy [
This content should only appear in the electronic
version of the document.
]]>When producing the hard-copy version, change the entity's
definition to;
<!ENTITY % electronic.copy "IGNORE">On reprocessing the document, the marked sections that use
%electronic.copy as their keyword will be
ignored.For you to do…Create a new file, section.sgml, that
contains the following;
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % text.output "INCLUDE">
]>
<html>
<head>
<title>An example using marked sections</title>
</head>
<body>
<p>This paragraph <![ CDATA [contains many <
characters (< < < < <) so it is easier
to wrap it in a CDATA marked section ]]></p>
<![ IGNORE [
<p>This paragraph will definitely not be included in the
output.</p>
]]>
<![ %text.output [
<p>This paragraph might appear in the output, or it
might not.</p>
<p>Its appearance is controlled by the %text.output
parameter entity.</p>
]]>
</body>
</html>Normalise this file using &man.sgmlnorm.1; and examine the
output. Notice which paragraphs have appeared, which have
disappeared, and what has happened to the content of the CDATA
marked section.Change the definition of the text.output
entity from INCLUDE to
IGNORE. Re-normalise the file, and examine the
output to see what has changed.
+
+
+ Conclusion
+
+ That is the conclusion of this SGML primer. For reasons of space
+ and complexity several things have not been covered in depth (or at
+ all). However, the previous sections cover enough SGML for you to be
+ able to follow the organisation of the FDP documentation.
+
diff --git a/en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml b/en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml
index c25bacf1f1..f181ce5dc9 100644
--- a/en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml
+++ b/en_US.ISO_8859-1/books/fdp-primer/sgml-primer/chapter.sgml
@@ -1,1554 +1,1572 @@
SGML Primer
- The Documentation Project makes heavy use of the Standard Generalized
- Markup Language (SGML). This chapter describes what SGML is, how to read
- and understand markup, and some of the SGML tricks you will see used in
- the FAQ, Handbook, and website.
+ The majority of FDP documentation is written in applications of
+ SGML. This chapter explains exactly what that means, how to read
+ and understand the source to the documentation, and the sort of SGML
+ tricks you will see used in the documentation.Portions of this section were inspired by Mark Galassi's Get Going With DocBook.OverviewWay back when, electronic text was simple to deal with. Admittedly,
you had to know which character set your document was written in (ASCII,
EBCDIC, or one of a number of others) but that was about it. Text was
text, and what you saw really was what you got. No frills, no
formatting, no intelligence.Inevitably, this was not enough. Once you have text in a
- machine-usable format, you expect machines to be able to use it, and
+ machine-usable format, you expect machines to be able to use it and
manipulate it intelligently. You would like to indicate that certain
phrases should be emphasised, or added to a glossary, or be hyperlinks.
You might want filenames to be shown in a “typewriter” style
font for viewing on screen, but as “italics” when printed,
or any of a myriad of other options for presentation.It was once hoped that Artificial Intelligence (AI) would make this
- easy. Your computer would read in the document, and automatically
+ easy. Your computer would read in the document and automatically
identify key phrases, filenames, text that the reader should type in,
examples, and more. Unfortunately, real life has not happened quite
- like that, and our computers require some assistance before the can
+ like that, and our computers require some assistance before they can
meaningfully process our text.More precisely, they need help identifying what is what. You or I
can look at
To remove /tmp/foo use &man.rm.1;.
- rm /tmp/foo
+ &prompt.user; rm /tmp/foo
and easily see which parts are filenames, which are commands to be typed
in, which parts are references to manual pages, and so on. But the
computer processing the document can not. For this we need
markup.“Markup” is commonly used to describe “adding
value” or “increasing cost”. The term takes on both
these meanings when applied to text. Markup is additional text included
in the document, distinguished from the document's content in some way,
so that programs that process the document can read the markup and use
it when making decisions about the document. Editors can hide the
- markup from the user, so they are not distracted by it.
+ markup from the user, so the user is not distracted by it.
The extra information stored in the markup adds
value to the document. Adding the markup to the document
must typically be done by a person—after all, if computers could
recognise the text sufficiently well to add the markup then there would
be no need to add it in the first place. This increases the
cost of the document.The previous example is actually represented in this document like
this;To remove /tmp/foo use &man.rm.1;.
rm /tmp/foo]]>As you can see, the markup is clearly separate from the
content.Obviously, if you are going to use markup you need to define what
your markup means, and how it should be interpreted. You will need a
markup language that you can follow when marking up your
documents.
-
- SGML is not a markup langugage. Instead, SGML
- is the language in which you write markup
- languages. There have been many markup languages written
- using SGML. HTML and DocBook are two of these.
-
- This is an important point to understand. Most of the time you are
- not writing SGML documents. Instead, you are writing documents in a
- particular markup language. The definition of the markup language you
- are using is written in SGML.
-
- Each language definition (which is written in SGML) is more properly
- called a Document Type Definition (DTD). The DTD specifies the name of
- the elements that can be used, what order they appear in (and whether
- some markup can be used inside other markup) and related
- information.
+
+ Of course, one markup language might not be enough. A markup
+ language for technical documentation has very different requirements
+ than a markup language that was to be used for cookery recipes. This,
+ in turn, would be very different from a markup language used to describe
+ poetry. What you really need is a first language that you use to write
+ these other markup languages. A meta markup
+ language.
+
+ This is exactly what the Standard Generalised Markup Language (SGML)
+ is. Many markup languages have been written in SGML, including the two
+ most used by the FDP, HTML and DocBook.
+
+ Each language definition is more properly called a Document Type
+ Definition (DTD). The DTD specifies the name of the elements that can
+ be used, what order they appear in (and whether some markup can be used
+ inside other markup) and related information. A DTD is sometimes
+ referred to as an application of SGML.A DTD is a complete
specification of all the elements that are allowed to appear, the order
in which they should appear, which elements are mandatory, which are
- optional, and so forth. This makes it possible to write a
- parser which reads in the DTD and a document which
- claims to conform to the DTD. The parser can then confirm whether or
- not all the elements required by the DTD are in the document in the
+ optional, and so forth. This makes it possible to write an SGML
+ parser which reads in both the DTD and a document
+ which claims to conform to the DTD. The parser can then confirm whether
+ or not all the elements required by the DTD are in the document in the
right order, and whether there are any errors in the markup. This is
normally referred to as validating the document.This processing simply confirms that the choice of elements, their
ordering, and so on, conforms to that listed in the DTD. It does
not check that you have used
appropriate markup for the content. If you were
to try and mark up all the filenames in your document as function
names, the parser would not flag this as an error (assuming, of
course, that your DTD defines elements for filenames and functions,
and that they are allowed to appear in the same place).It is likely that most of your contributions to the Documentation
Project will consist of content marked up in either HTML or DocBook,
rather than alterations to the DTDs. For this reason this book will
not touch on how to write a DTD.Elements, tags, and attributesAll the DTDs written in SGML share certain characteristics. This is
- hardly surprising, as the philisophy behind SGML will inevitably show
+ hardly surprising, as the philosophy behind SGML will inevitably show
through. One of the most obvious manifestations of this philosophy is
that of content and
elements.Your documentation (whether it is a single web page, or a lengthy
book) is considered to consist of content. This content is then divided
(and further subdivided) into elements. The purpose of adding markup is
to name and identify the boundaries of these elements for further
processing.For example, consider a typical book. At the very top level, the
book is itself an element. This “book” element obviously
contains chapters, which can be considered to be elements in their own
right. Each chapter will contain more elements, such as paragraphs,
quotations, and footnotes. Each paragraph might contain further
elements, identifying content that was direct speech, or the name of a
character in the story.You might like to think of this as “chunking” content.
At the very top level you have one chunk, the book. Look a little
deeper, and you have more chunks, the individual chapters. These are
chunked further into paragraphs, footnotes, character names, and so
on.Notice how you can make this differentiation between different
elements of the content without resorting to any SGML terms. It really
is surprisingly straightforward. You could do this with a highlighter
pen and a printout of the book, using different colours to indicate
- different types of content.
+ different chunks of content.
- Of course, we don't have an electronic highlighter pen, so we need
+ Of course, we do not have an electronic highlighter pen, so we need
some other way of indicating which element each piece of content belongs
to. In languages written in SGML (HTML, DocBook, et al) this is done by
means of tags.A tag is used to identify where a particular element starts, and
- where the ends. The tag is not part of the element
+ where the element ends. The tag is not part of the element
itself. Because each DTD was normally written to mark up
specific types of information, each one will recognise different
elements, and will therefore have different names for the tags.For an element called element-name the
start tag will normally look like
<element-name>. The
corresponding closing tag for this element is
</element-name>.Using an element (start and end tags)HTML has an element for indicating that the content enclosed by
the element is a paragraph, called p. This
element has both start and end tags.
<p>This is a paragraph. It starts with the start tag for
the 'p' element, and it will end with the end tag for the 'p'
element.</p>
<p>This is another paragraph. But this one is much shorter.</p>
]]>Not all elements require an end tag. Some elements have no content.
For example, in HTML you can indicate that you want a horizontal line to
appear in the document. Obviously, this line has no content, so just
the start tag is required for this element.Using an element (start tag only)HTML has an element for indicating a horizontal rule, called
hr. This element does not wrap content, so only has
a start tag.
<p>This is a paragraph.</p>
<hr>
<p>This is another paragraph. A horizontal rule separates this
from the previous paragraph.</p>
]]>If it is not obvious by now, elements can contain other elements.
In the book example earlier, the book element contained all the chapter
elements, which in turn contained all the paragraph elements, and so
on.Elements within elements; em
This is a simple paragraph where some
of the words have been emphasised.]]>The DTD will specify the rules detailing which elements can contain
other elements, and exactly what they can contain.People often confuse the terms tags and elements, and use the terms
as if they were interchangeable. They are not.An element is a conceptual part of your document. An element has
a defined start and end. The tags mark where the element starts and
ends.When this document (or anyone else knowledgeable about SGML) refers
to “the <p> tag” they mean the literal text
consisting of the three characters <,
p, and >. But the phrase
“the <p> element” refers to the whole element.This distinction is very subtle. But keep it
in mind.Elements can have attributes. An attribute has a name and a value,
and is used for adding extra information to the element. This might be
information that indicates how the content should be rendered, or might
be something that uniquely identifies that occurrence of the element, or
it might be something else.An element's attributes are written inside the
start tag for that element, and take the form
attribute-name="attribute-value".In sufficiently recent versions of HTML, the p
element has an attribute called align, which suggests
an alignment (justification) for the paragraph to the program displaying
the HTML.The align attribute can take one of four defined
values, left, center,
right and justify. If the
attribute is not specified then the default is
left.Using an element with an attribute
<p align="left">The inclusion of the align attribute
on this paragraph was superfluous, since the default is left.</p>
<p align="center">This may appear in the center.</p>
]]>Some attributes will only take specific values, such as
left or justify. Others will
allow you to enter anything you want. If you need to include quotes
(") within an attribute then use single quotes around
the attribute value.Single quotes around attributes
<p align='right'>I'm on the right!</p>]]>Sometimes you do not need to use quotes around attribute values at
all. However, the rules for doing this are subtle, and it is far simpler
just to always quote your attribute values.For you to do…In order to run the examples in this document you will need to
install some software on your system and ensure that an environment
variable is set correctly.Download and install textproc/docproj
from the FreeBSD ports system. This is a
meta-port that should download and install
all of the programs and supporting files that are used by the
Documentation Project.Add lines to your shell startup files to set
SGML_CATALOG_FILES..profile, for &man.sh.1; and
&man.bash.1; users
SGML_ROOT=/usr/local/share/sgml
SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILES
export SGML_CATALOG_FILES.login, for &man.csh.1; and
&man.tcsh.1; users
setenv SGML_ROOT /usr/local/share/sgml
setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/3.0/catalog:$SGML_CATALOG_FILESThen either log out, and log back in again, or run those
commands from the command line to set the variable values.Create example.sgml, and enter the
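following text;
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">

<html>
  <head>
    <title>An example HTML file</title>
  </head>

  <body>
    <p>This is a paragraph containing some text.</p>

    <p>This paragraph contains some more text.</p>

    <p align="right">This paragraph might be right-justified.</p>
  </body>
</html>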
]]>Try and validate this file using an SGML parser.Part of textproc/docproj is the
&man.nsgmls.1; validating
parser. Normally, &man.nsgmls.1; reads in a document
marked up according to an SGML DTD and returns a copy of the
document's Element Structure Information Set (ESIS, but that is
not important right now).However, when -s is passed as a parameter to
it, &man.nsgmls.1; will suppress its normal output, and just print
error messages. This makes it a useful way to check to see if your
document is valid or not.Use &man.nsgmls.1; to check that your document is
valid;&prompt.user; nsgmls -s example.sgmlAs you will see, &man.nsgmls.1; returns without displaying any
output. This means that your document validated
successfully.See what happens when required elements are omitted. Try
removing the title and /title
tags, and re-run the validation.&prompt.user; nsgmls -s example.sgml
nsgmls:example.sgml:5:4:E: character data is not allowed here
nsgmls:example.sgml:6:8:E: end tag for "HEAD" which is not finishedThe error output from &man.nsgmls.1; is organised into
colon-separated groups, or columns.
Column 1: The name of the program generating the error. This will always be nsgmls.
Column 2: The name of the file that contains the error.
Column 3: Line number where the error appears.
Column 4: Column number where the error appears.
Column 5: A one letter code indicating the nature of the message. I indicates an informational message, W is for warnings, E is for errors, and X is for cross-references. As you can see, these messages are errors. (It is not always the fifth column either. nsgmls -sv displays nsgmls:I: SP version "1.3" (depending on the installed version). As you can see, this is an informational message.)
Column 6: The text of the error message.
Simply omitting the title tags has generated
2 different errors.The first error indicates that content (in this case,
characters, rather than the start tag for an element) has occurred
where the SGML parser was expecting something else. In this case,
the parser was expecting to see one of the start tags for elements
that are valid inside head (such as
title).The second error is because head elements
must contain a title
element. Because it does not, &man.nsgmls.1; considers that the
element has not been properly finished. However, the closing tag
indicates that the element has been closed before it has been
finished.Put the title element back in.The DOCTYPE declarationThe beginning of each document that you write must specify the name
of the DTD that the document conforms to. This is so that SGML parsers
- can determine the DTD and ensure that the document does conform to the
+ can determine the DTD and ensure that the document does conform to
it.This information is generally expressed on one line, in the DOCTYPE
declaration.
- A typical declaration for document written to conform with version
+ A typical declaration for a document written to conform with version
4.0 of the HTML DTD looks like this;
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
]]>That line contains a number of different components.<!Is the indicator that this
is an SGML declaration. This line is declaring the document type.
DOCTYPEShows that this is an SGML declaration for the document
type.htmlNames the first element that
will appear in the document.PUBLIC "-//W3C//DTD HTML 4.0//EN"Lists the Formal Public Identifier (FPI) for the DTD that this
document conforms to. Your SGML parser will use this to find the
correct DTD when processing this document.PUBLIC is not a part of the FPI, but
indicates to the SGML processor how to find the DTD referenced in
the FPI. Other ways of telling the SGML parser how to find the DTD
are shown later.>Returns to the document.Formal Public Identifiers (FPIs)You don't need to know this, but it's useful background, and
might help you debug problems when your SGML processor can't locate
the DTD you are using.FPIs must follow a specific syntax. This syntax is as
follows;
"Owner//Keyword Description//Language"OwnerThis indicates the owner of the FPI.If this string starts with “ISO” then this is an
ISO owned FPI. For example, the FPI "ISO
8879:1986//ENTITIES Greek Symbols//EN" lists
ISO 8879:1986 as being the owner for the set
of entities for Greek symbols. ISO 8879:1986 is the ISO number
for the SGML standard.Otherwise, this string will either look like
-//Owner or
+//Owner (notice
the only difference is the leading + or
-).If the string starts with - then the
owner information is unregistered, with a +
it identifies it as being registered.ISO 9070:1991 defines how registered names are generated; it
might be derived from the number of an ISO publication, an ISBN
code, or an organisation code assigned according to ISO 6523. In
addition, a registration authority could be created in order to
assign registered names. The ISO council delegated this to the
American National Standards Institute (ANSI).Because the FreeBSD Project hasn't been registered the
owner string is -//FreeBSD. And as you can
see, the W3C are not a registered owner either.KeywordThere are several keywords that indicate the type of
information in the file. Some of the most common keywords are
DTD, ELEMENT,
ENTITIES, and TEXT.
DTD is used only for DTD files,
ELEMENT is usually used for DTD fragments
that contain only entity or element declarations.
TEXT is used for SGML content (text and
tags).DescriptionAny description you want to supply for the contents of this
file. This may include version numbers or any short text that is
meaningful to you and unique for the SGML system.LanguageThis is an ISO two-character code that identifies the native
language for the file. EN is used for
English.

catalog files

If you use the syntax above and try to process this document
using an SGML processor, the processor will need to have some way of
turning the FPI into the name of the file on your computer that
contains the DTD.In order to do this it can use a catalog file. A catalog file
(typically called catalog) contains lines that
map FPIs to filenames. For example, if the catalog file contained the
line;
PUBLIC "-//W3C//DTD HTML 4.0//EN" "4.0/strict.dtd"The SGML processor would know to look up the DTD from
strict.dtd in the 4.0
subdirectory of whichever directory held the
catalog file that contained that line.Look at the contents of
/usr/local/share/sgml/html/catalog. This is the
catalog file for the HTML DTDs that will have been installed as part
of the textproc/docproj port.SGML_CATALOG_FILESIn order to locate a catalog file, your
SGML processor will need to know where to look. Many of them feature
command line parameters for specifying the path to one or more
catalogs.In addition, you can set SGML_CATALOG_FILES to
point to the files. This environment variable should consist of a
colon-separated list of catalog files (including their full
path).Typically, you will want to include the following files;/usr/local/share/sgml/docbook/3.0/catalog/usr/local/share/sgml/html/catalog/usr/local/share/sgml/iso8879/catalog/usr/local/share/sgml/jade/catalogYou should already have done
this.Alternatives to FPIsInstead of using an FPI to indicate the DTD that the document
conforms to (and therefore, which file on the system contains the DTD)
you can explicitly specify the name of the file.The syntax for this is slightly different;
]]>The SYSTEM keyword indicates that the SGML
processor should locate the DTD in a system specific fashion. This
typically (but not always) means the DTD will be provided as a
filename.Using FPIs is preferred for reasons of portability. You don't want
to have to ship a copy of the DTD around with your document, and if
you used the SYSTEM identifier then everyone would
need to keep their DTDs in the same place.Escaping back to SGMLEarlier in this primer I said that SGML is only used when writing a
DTD. This is not strictly true. There is certain SGML syntax that you
will want to be able to use within your documents. For example,
comments can be included in your document, and will be ignored by the
parser. Comments are entered using SGML syntax. Other uses for SGML
syntax in your document will be shown later too.Obviously, you need some way of indicating to the SGML processor
that the following content is not elements within the document, but is
SGML that the parser should act upon.These sections are marked by <! ... > in
your document. Everything between these delimiters is SGML syntax as you
might find within a DTD.As you may just have realised, the DOCTYPE declaration is an example
of SGML syntax that you need to include in your document…CommentsComments are an SGML construction, and are normally only valid
inside a DTD. However, as “Escaping back to SGML” shows, it is
possible to use SGML syntax within your document.

The delimiter for SGML comments is the string
“--”. The first occurrence of this string
opens a comment, and the second closes it.

SGML generic comment
<!-- test comment -->
]]>Use 2 dashesThere is a problem with producing the Postscript and PDF versions
of this document. The above example probably shows just one hyphen
symbol, - after the <! and
before the >.You must use two -,
not one. The Postscript and PDF versions have
translated the two - in the original to a longer,
more professional em-dash, and broken this
example in the process.The HTML, plain text, and RTF versions of this document are not
affected.
]]>
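Because each pair of “--” strings opens and closes one comment, a single <! ... > declaration can carry more than one comment. A minimal sketch, with comment text invented purely for illustration:

```sgml
<!-- a declaration holding a single comment -->
<!-- first comment -- -- second comment -->
```

In the second line the first “--” opens a comment and the next one closes it; the third and fourth then open and close a second comment before > ends the declaration.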
If you have used HTML before you may have been shown different rules
for comments. In particular, you may think that the string
<!-- opens a comment, and it is only closed by
-->.This is not the case. A lot of web browsers
have broken HTML parsers, and will accept that as valid. However, the
SGML parsers used by the Documentation Project are much stricter, and
will reject documents that make that error.Erroneous SGML comments]]>The SGML parser will treat this as though it were actually;
<!THIS IS OUTSIDE THE COMMENT>This is not valid SGML, and may give confusing error
messages.
]]>As the example suggests, do not write
comments like that.
]]>That is a (slightly) better approach, but it is still potentially
confusing to people new to SGML.For you to do…Add some comments to example.sgml, and
check that the file still validates using &man.nsgmls.1;Add some invalid comments to
example.sgml, and see the error messages that
&man.nsgmls.1; gives when it encounters an invalid comment.Entities
- Entities are an SGML term. You might feel more comfortable thinking
- of them as variables. There are two types of entity in SGML, general
- entities and parameter entities.
+ Entities are a mechanism for assigning names to chunks of
+ content. As an SGML parser processes your document, any entities
+ it finds are replaced by the content of the entity.
+
+ This is a good way to have re-usable, easily changeable chunks
+ of content in your SGML documents. It is also the only way to
+ include one marked up file inside another using SGML.
+
+ There are two types of entities which can be used in two
+ different situations; general entities and
+ parameter entities.General Entities
- General entities are a way of assigning names to chunks of text,
- and reusing that text (which may contain markup) throughout your
- document.
-
You can not use general entities in an SGML context (although you
define them in one). They can only be used in your document. Contrast
this with parameter
entities.Each general entity has a name. When you want to reference a
general entity (and therefore include whatever text it represents in
your document), you write
&entity-name;. For
example, suppose you had an entity called
current.version which expanded to the current
version number of your product. You could write;
The current version of our product is
&current.version;.]]>When the version number changes you can simply change the
definition of the value of the general entity and reprocess your
document.You can also use general entities to enter characters that you
- could not normally include in an SGML document. For example, < and
- & can not normally appear in an SGML document. Normally, when the
- SGML processor sees a < symbol it assumes that a tag (either a start
- tag or an end tag) is about to appear, and when it sees a & symbol
- it assumes the next text will be the name of an entity.
+ could not otherwise include in an SGML document. For example, <
+ and & can not normally appear in an SGML document. When the SGML
+ parser sees the < symbol it assumes that a tag (either a start tag
+ or an end tag) is about to appear, and when it sees the & symbol it
+ assumes the next text will be the name of an entity.
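Fortunately, you can use the two general entities &lt; and
&amp; whenever you need to include one or other of these
A general entity can only be defined within an SGML context.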
& whenever you need to include one or other of these A general entity can only be defined within an SGML context.
Typically, this is done immediately after the DOCTYPE
declaration.Defining general entities
]>]]>Notice how the DOCTYPE declaration has been extended by adding a
square bracket at the end of the first line. The two entities are
then defined over the next two lines, before the square bracket is
closed, and then the DOCTYPE declaration is closed.The square brackets are necessary to indicate that we are
extending the DTD indicated by the DOCTYPE declaration.Parameter entitiesLike general entities,
parameter entities are used to assign names to reusable chunks of
text. However, whereas general entities can only be used within your
document, parameter entities can only be used within an SGML context.Parameter entities are defined in a similar way to general
entities. However, instead of using
&entity-name; to
refer to them, use
%entity-name; (parameter entities use the
percent symbol). The definition also includes the %
between the ENTITY keyword and the name of the
entity.Defining parameter entities
]>]]>This may not seem particularly useful. It will be.For you to do…Add a general entity to
example.sgml.
]>
An example HTML file
This is a paragraph containing some text.
This paragraph contains some more text.
This paragraph might be right-justified.
The current version of this document is: &version;
]]>Validate the document using &man.nsgmls.1;Load example.sgml into your web browser
(you may need to copy it to example.html
before your browser recognises it as an HTML document).Unless your browser is very advanced, you won't see the entity
reference &version; replaced with the
version number. Most web browsers have very simplistic parsers
- which don't do proper SGML
+ which do not handle proper SGML. (This is a shame. Imagine all the problems and hacks, such
as Server Side Includes, that could be avoided if they
did.) The solution is to normalise your
- document. Normalising it involves converting all the entity
- references to the values of those entities.
+ document using an SGML normaliser. The normaliser reads in valid
+ SGML and outputs equally valid SGML which has been transformed in
+ some way. One of the ways in which the normaliser transforms the
+ SGML is to expand all the entity references in the document,
+ replacing the entities with the text that they represent.
You can use &man.sgmlnorm.1; to do this.
&prompt.user; sgmlnorm example.sgml > example.html
You should find a normalised (i.e., entity references
expanded) copy of your document in
example.html, ready to load into your web
browser.If you look at the output from &man.sgmlnorm.1; you will see
that it does not include a DOCTYPE declaration at the start. To
include this you need to use the -d
option;
&prompt.user; sgmlnorm -d example.sgml > example.html

Using entities to include files

Entities (both general and
- parameter) come into their own
- when you realise they can be used to include other files.
+ parameter) are particularly
+ useful when used to include one file inside another.
Using general entities to include filesSuppose you have some content for an SGML book organised into
files, one file per chapter, called
chapter1.sgml,
chapter2.sgml, and so forth, with a
book.sgml file that will contain these
chapters.In order to use the contents of these files as the values for your
entities, you declare them with the SYSTEM keyword.
This directs the SGML parser to use the contents of the named file as
the value of the entity.Using general entities to include files
]>
&chapter.1;
&chapter.2;
&chapter.3;
]]>When using general entities to include other files within a
document, the files being included
(chapter1.sgml,
chapter2.sgml, and so on) must
not start with a DOCTYPE declaration. This is a syntax
error.Using parameter entities to include filesRecall that parameter entities can only be used inside an SGML
context. Why then would you want to include a file within an SGML
context?You can use this to ensure that you can reuse your general
entities.Suppose that you had many chapters in your document, and you
reused these chapters in two different books, each book organising the
chapters in a different fashion.You could list the entities at the top of each book, but this
quickly becomes cumbersome to manage.Instead, place the general entity definitions inside one file,
and use a parameter entity to include that file within your
document.Using parameter entities to include filesFirst, place your entity definitions in a separate file, called
chapters.ent. This file contains the
following;
]]>Now create a parameter entity to refer to the contents of the
file. Then use the parameter entity to load the file into the
document, which will then make all the general entities available
for use. Then use the general entities as before;
%chapters;
]>
&chapter.1;
&chapter.2;
&chapter.3;
]]>For you to do…Use general entities to include filesCreate three files, para1.sgml,
para2.sgml, and
para3.sgml.Put content similar to the following in each file;
This is the first paragraph.]]>Edit example.sgml so that it looks like
this;
]>
An example HTML file
The current version of this document is: &version;
&para1;
&para2;
&para3;
]]>Produce example.html by normalising
example.sgml.
&prompt.user; sgmlnorm -d example.sgml > example.html
Load example.html into your web
browser, and confirm that the
paran.sgml files
have been included in example.html.Use parameter entities to include filesYou must have taken the previous steps first.Edit example.sgml so that it looks like
this;
%entities;
]>
An example HTML file
The current version of this document is: &version;
&para1;
&para2;
&para3;
]]>Create a new file, entities.sgml, with
this content;
]]>Produce example.html by normalising
example.sgml.
&prompt.user; sgmlnorm -d example.sgml > example.html
Load example.html into your web
browser, and confirm that the
paran.sgml files
have been included in example.html.Marked sectionsSGML provides a mechanism to indicate that particular pieces of the
document should be processed in a special way. These are termed
“marked sections”.Structure of a marked section
<![ KEYWORD [
Contents of marked section
]]>

As you would expect, being an SGML construct, a marked section
starts with <!.

The first square bracket begins to delimit the marked
section.KEYWORD describes how this marked
section should be processed by the parser.The second square bracket indicates that the content of the marked
section starts here.The marked section is finished by closing the two square brackets,
and then returning to the document context from the SGML context with
>.

Marked section keywords

CDATA, RCDATA

These keywords denote the marked section's content
model, and allow you to change it from the
default.
- When an SGML processor is processing a document, it keeps track
+ When an SGML parser is processing a document, it keeps track
of what is called the “content model”.Briefly, the content model describes what sort of content the
parser is expecting to see, and what it will do with it when it
finds it.The two content models you will probably find most useful are
CDATA and RCDATA.CDATA is for “Character Data”. If
the parser is in this content model then it is expecting to see
characters, and characters only. In this model the < and &
symbols lose their special status, and will be treated as ordinary
characters.

RCDATA is for “Entity references and
character data”. If the parser is in this content model then it
is expecting to see characters and entities.
< loses its special status, but & will still be treated as
the start of a general entity reference.

This is particularly useful if you are including some verbatim
text that contains lots of < and & characters. While you
could go through the text ensuring that every < is converted to a
&lt; and every & is converted to a &amp;, it can be
easier to mark the section as only containing CDATA. When the SGML
parser encounters this it will treat the < and & symbols
embedded in the content as ordinary characters.

Using a CDATA marked section
<para>Here is an example of how you would include some text
that contained many < and & symbols. The sample
text is a fragment of HTML. The surrounding text (<para> and
<programlisting>) are from DocBook.</para>
<programlisting>
<![ CDATA [ This is a sample that shows you some of the elements within
HTML. Since the angle brackets are used so many times, it's
simpler to say the whole example is a CDATA marked section
than to use the entity names for the left and right angle
brackets throughout.
This is a listitem
This is a second listitem
This is a third listitem
This is the end of the example.
]]>
]]>
</programlisting>If you look at the source for this document you will see this
technique used throughout.INCLUDE and
IGNOREIf the keyword is INCLUDE then the contents
of the marked section will be processed. If the keyword is
IGNORE then the marked section is ignored and
will not be processed. It will not appear in the output.Using INCLUDE and
IGNORE in marked sections
<![ INCLUDE [
This text will be processed and included.
]]>
<![ IGNORE [
This text will not be processed or included.
]]>By itself, this isn't too useful. If you wanted to remove text
from your document you could cut it out, or wrap it in
comments.It becomes more useful when you realise you can use parameter entities to control
this. Remember that parameter entities can only be used in SGML
contexts, and the keyword of a marked section
is an SGML context.For example, suppose that you produced a hard-copy version of
some documentation and an electronic version. In the electronic
version you wanted to include some extra content that wasn't to
appear in the hard-copy.Create a parameter entity, and set its value to
INCLUDE. Write your document, using marked
sections to delimit content that should only appear in the
electronic version. In these marked sections use the parameter
entity in place of the keyword.When you want to produce the hard-copy version of the document,
change the parameter entity's value to IGNORE and
reprocess the document.Using a parameter entity to control a marked
section
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % electronic.copy "INCLUDE">
]>
...
<![ %electronic.copy [
This content should only appear in the electronic
version of the document.
]]>When producing the hard-copy version, change the entity's
definition to;
<!ENTITY % electronic.copy "IGNORE">On reprocessing the document, the marked sections that use
%electronic.copy as their keyword will be
ignored.For you to do…Create a new file, section.sgml, that
contains the following;
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % text.output "INCLUDE">
]>
<html>
<head>
<title>An example using marked sections</title>
</head>
<body>
<p>This paragraph <![ CDATA [contains many <
characters (< < < < <) so it is easier
to wrap it in a CDATA marked section ]]></p>
<![ IGNORE [
<p>This paragraph will definitely not be included in the
output.</p>
]]>
<![ %text.output [
<p>This paragraph might appear in the output, or it
might not.</p>
<p>Its appearance is controlled by the %text.output
parameter entity.</p>
]]>
</body>
</html>Normalise this file using &man.sgmlnorm.1; and examine the
output. Notice which paragraphs have appeared, which have
disappeared, and what has happened to the content of the CDATA
marked section.Change the definition of the text.output
entity from INCLUDE to
IGNORE. Re-normalise the file, and examine the
output to see what has changed.
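For that last step, only the parameter entity declaration in the internal subset needs to change. A sketch of how the top of section.sgml might then look, assuming the rest of the file is left exactly as it was:

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % text.output "IGNORE">
]>
```

With this declaration in place, the marked section controlled by the text.output entity should be skipped in the same way as the one marked IGNORE explicitly.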
+
+
+ Conclusion
+
+ That is the conclusion of this SGML primer. For reasons of space
+ and complexity several things have not been covered in depth (or at
+ all). However, the previous sections cover enough SGML for you to be
+ able to follow the organisation of the FDP documentation.
+