SGML Primer

diff --git a/en_US.ISO8859-1/books/fdp-primer/sgml-primer/chapter.sgml b/en_US.ISO8859-1/books/fdp-primer/sgml-primer/chapter.sgml
index e9f41128b9..e124d85a66 100644
--- a/en_US.ISO8859-1/books/fdp-primer/sgml-primer/chapter.sgml
+++ b/en_US.ISO8859-1/books/fdp-primer/sgml-primer/chapter.sgml
@@ -1,1580 +1,1580 @@

SGML Primer

The majority of FDP documentation is written in applications of SGML. This chapter explains exactly what that means, how to read and understand the source to the documentation, and the sort of SGML tricks you will see used in the documentation.

Portions of this section were inspired by Mark Galassi's Get Going With DocBook.

Overview

Way back when, electronic text was simple to deal with. Admittedly, you had to know which character set your document was written in (ASCII, EBCDIC, or one of a number of others) but that was about it. Text was text, and what you saw really was what you got. No frills, no formatting, no intelligence.

Inevitably, this was not enough. Once you have text in a machine-usable format, you expect machines to be able to use it and manipulate it intelligently. You would like to indicate that certain phrases should be emphasised, or added to a glossary, or be hyperlinks. You might want filenames to be shown in a typewriter style font for viewing on screen, but as italics when printed, or any of a myriad of other options for presentation.

It was once hoped that Artificial Intelligence (AI) would make this easy. Your computer would read in the document and automatically identify key phrases, filenames, text that the reader should type in, examples, and more. Unfortunately, real life has not happened quite like that, and our computers require some assistance before they can meaningfully process our text.

More precisely, they need help identifying what is what. You or I can look at

To remove /tmp/foo use &man.rm.1;. &prompt.user; rm /tmp/foo

and easily see which parts are filenames, which are commands to be typed in, which parts are references to manual pages, and so on. But the computer processing the document cannot. For this we need markup.

Markup is commonly used to describe adding value or increasing cost. The term takes on both these meanings when applied to text. Markup is additional text included in the document, distinguished from the document's content in some way, so that programs that process the document can read the markup and use it when making decisions about the document. Editors can hide the markup from the user, so the user is not distracted by it.

The extra information stored in the markup adds value to the document. Adding the markup to the document must typically be done by a person. After all, if computers could recognise the text sufficiently well to add the markup then there would be no need to add it in the first place. This increases the cost (i.e., the effort required) to create the document.

The previous example is actually represented in this document like this:

To remove /tmp/foo use &man.rm.1;. &prompt.user; rm /tmp/foo

As you can see, the markup is clearly separate from the content.

Obviously, if you are going to use markup you need to define what your markup means, and how it should be interpreted. You will need a markup language that you can follow when marking up your documents.

Of course, one markup language might not be enough. A markup language for technical documentation has very different requirements from a markup language that is to be used for cookery recipes. This, in turn, would be very different from a markup language used to describe poetry. What you really need is a first language that you use to write these other markup languages. A meta markup language.

This is exactly what the Standard Generalised Markup Language (SGML) is. Many markup languages have been written in SGML, including the two most used by the FDP, HTML and DocBook.
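As a sketch of what such markup can look like, the /tmp/foo sentence from earlier might be written in DocBook roughly as follows. The element names (para, filename, screen, userinput) are standard DocBook elements, but their use here is an assumption about how that sentence was marked up, not a copy of the chapter source.

```sgml
<!-- Assumed DocBook markup for the earlier /tmp/foo example -->
<para>To remove <filename>/tmp/foo</filename> use &man.rm.1;.</para>

<screen>&prompt.user; <userinput>rm /tmp/foo</userinput></screen>
```

Here the filename, the command to be typed, and the manual page reference are each identified explicitly, which is exactly the information a processing program cannot work out for itself.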
Each language definition is more properly called a Document Type Definition (DTD). The DTD specifies the name of the elements that can be used, what order they appear in (and whether some markup can be used inside other markup) and related information. A DTD is sometimes referred to as an application of SGML. A DTD is a complete specification of all the elements that are allowed to appear, the order in which they should appear, which elements are mandatory, which are optional, and so forth. This makes it possible to write an SGML parser which reads in both the DTD and a document which claims to conform to the DTD. The parser can then confirm whether or not all the elements required by the DTD are in the document in the right order, and whether there are any errors in the markup. This is normally referred to as validating the document. This processing simply confirms that the choice of elements, their ordering, and so on, conforms to that listed in the DTD. It does not check that you have used appropriate markup for the content. If you were to try and mark up all the filenames in your document as function names, the parser would not flag this as an error (assuming, of course, that your DTD defines elements for filenames and functions, and that they are allowed to appear in the same place). It is likely that most of your contributions to the Documentation Project will consist of content marked up in either HTML or DocBook, rather than alterations to the DTDs. For this reason this book will not touch on how to write a DTD. Elements, tags, and attributes All the DTDs written in SGML share certain characteristics. This is hardly surprising, as the philosophy behind SGML will inevitably show through. One of the most obvious manifestations of this philosophy is that of content and elements. Your documentation (whether it is a single web page, or a lengthy book) is considered to consist of content. This content is then divided (and further subdivided) into elements. 
The purpose of adding markup is to name and identify the boundaries of these elements for further processing. For example, consider a typical book. At the very top level, the book is itself an element. This book element obviously contains chapters, which can be considered to be elements in their own right. Each chapter will contain more elements, such as paragraphs, quotations, and footnotes. Each paragraph might contain further elements, identifying content that was direct speech, or the name of a character in the story. You might like to think of this as chunking content. At the very top level you have one chunk, the book. Look a little deeper, and you have more chunks, the individual chapters. These are chunked further into paragraphs, footnotes, character names, and so on. Notice how you can make this differentiation between different elements of the content without resorting to any SGML terms. It really is surprisingly straightforward. You could do this with a highlighter pen and a printout of the book, using different colours to indicate different chunks of content. Of course, we do not have an electronic highlighter pen, so we need some other way of indicating which element each piece of content belongs to. In languages written in SGML (HTML, DocBook, et al) this is done by means of tags. A tag is used to identify where a particular element starts, and where the element ends. The tag is not part of the element itself. Because each DTD was normally written to mark up specific types of information, each one will recognise different elements, and will therefore have different names for the tags. For an element called element-name the start tag will normally look like <element-name>. The corresponding closing tag for this element is </element-name>. Using an element (start and end tags) HTML has an element for indicating that the content enclosed by the element is a paragraph, called p. This element has both start and end tags. This is a paragraph. 
It starts with the start tag for the 'p' element, and it will end with the end tag for the 'p' element.

This is another paragraph. But this one is much shorter.
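With the tags written out, the two paragraphs above would look something like this in HTML source. This is a sketch; the line breaks and indentation are illustrative.

```sgml
<!-- Each paragraph is wrapped in a start tag and an end tag -->
<p>This is a paragraph.  It starts with the start tag for the 'p'
  element, and it will end with the end tag for the 'p' element.</p>

<p>This is another paragraph.  But this one is much shorter.</p>
```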

Not all elements require an end tag. Some elements have no content. For example, in HTML you can indicate that you want a horizontal line to appear in the document. Obviously, this line has no content, so just the start tag is required for this element.

Using an element (start tag only)

HTML has an element for indicating a horizontal rule, called hr. This element does not wrap content, so it only has a start tag.

This is a paragraph.

This is another paragraph. A horizontal rule separates this from the previous paragraph.
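Written out as HTML source, the example above might look like the following sketch. The layout is assumed; the point is that hr appears as a start tag with no matching end tag.

```sgml
<p>This is a paragraph.</p>

<!-- hr has no content, so only the start tag is written -->
<hr>

<p>This is another paragraph.  A horizontal rule separates this
  from the previous paragraph.</p>
```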

If it is not obvious by now, elements can contain other elements. In the book example earlier, the book element contained all the chapter elements, which in turn contained all the paragraph elements, and so on.

Elements within elements; em

This is a simple paragraph where some of the words have been emphasised.
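A sketch of how that sentence might nest em elements inside a p element. Which words are wrapped in em is an assumption made for illustration.

```sgml
<!-- The em elements start and end inside the enclosing p element -->
<p>This is a simple <em>paragraph</em> where some of the
  <em>words</em> have been <em>emphasised</em>.</p>
```

Note that each em element is closed before the p element that contains it is closed.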

The DTD will specify the rules detailing which elements can contain other elements, and exactly what they can contain.

People often confuse the terms tags and elements, and use the terms as if they were interchangeable. They are not.

An element is a conceptual part of your document. An element has a defined start and end. The tags mark where the element starts and ends.

When this document (or anyone else knowledgeable about SGML) refers to the <p> tag they mean the literal text consisting of the three characters <, p, and >. But the phrase "the <p> element" refers to the whole element.

This distinction is very subtle. But keep it in mind.

Elements can have attributes. An attribute has a name and a value, and is used for adding extra information to the element. This might be information that indicates how the content should be rendered, or might be something that uniquely identifies that occurrence of the element, or it might be something else.

An element's attributes are written inside the start tag for that element, and take the form attribute-name="attribute-value".

In sufficiently recent versions of HTML, the p element has an attribute called align, which suggests an alignment (justification) for the paragraph to the program displaying the HTML.

The align attribute can take one of four defined values: left, center, right and justify. If the attribute is not specified then the default is left.

Using an element with an attribute

The inclusion of the align attribute on this paragraph was superfluous, since the default is left.

This may appear in the center.

Some attributes will only take specific values, such as left or justify. Others will allow you to enter anything you want. If you need to include quotes (") within an attribute then use single quotes around the attribute value.

Single quotes around attributes

I am on the right!
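The attribute examples in this section, with their tags written out, might look like the following sketch. The attribute values are inferred from the paragraph text and are assumptions.

```sgml
<!-- Attributes go inside the start tag, as name="value" -->
<p align="left">The inclusion of the align attribute
  on this paragraph was superfluous, since the default is left.</p>

<p align="center">This may appear in the center.</p>

<!-- Single quotes are equally acceptable around the value -->
<p align='right'>I am on the right!</p>
```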

Sometimes you do not need to use quotes around attribute values at all. However, the rules for doing this are subtle, and it is far simpler just to always quote your attribute values.

The information on attributes, elements, and tags is stored in SGML catalogs. The various Documentation Project tools use these catalog files to validate your work. The tools in textproc/docproj include a variety of SGML catalog files. The FreeBSD Documentation Project includes its own set of catalog files. Your tools need to know about both sorts of catalog files.

For you to do…

In order to run the examples in this document you will need to install some software on your system and ensure that an environment variable is set correctly.

Download and install textproc/docproj from the FreeBSD ports system. This is a meta-port that should download and install all of the programs and supporting files that are used by the Documentation Project.

Add lines to your shell startup files to set SGML_CATALOG_FILES. (If you are not working on the English version of the documentation, you will want to substitute the correct directory for your language.)
.profile, for &man.sh.1; and &man.bash.1; users

SGML_ROOT=/usr/local/share/sgml
SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=/usr/doc/share/sgml/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=/usr/doc/en_US.ISO8859-1/share/sgml/catalog:$SGML_CATALOG_FILES
export SGML_CATALOG_FILES

.login, for &man.csh.1; and &man.tcsh.1; users

setenv SGML_ROOT /usr/local/share/sgml
setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES /usr/doc/share/sgml/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES /usr/doc/en_US.ISO8859-1/share/sgml/catalog:$SGML_CATALOG_FILES

Then either log out and log back in again, or run those commands from the command line to set the variable values.

Create example.sgml, and enter the following text;

An example HTML file

This is a paragraph containing some text.

This paragraph contains some more text.

This paragraph might be right-justified.
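Since the steps that follow validate this file, a sketch of what example.sgml might contain with its markup written out may be useful. The DOCTYPE line, the element structure, and the align attribute are assumptions based on the text of this chapter; the indentation is illustrative.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">

<html>
  <head>
    <title>An example HTML file</title>
  </head>

  <body>
    <p>This is a paragraph containing some text.</p>

    <p>This paragraph contains some more text.</p>

    <p align="right">This paragraph might be right-justified.</p>
  </body>
</html>
<!-- Sketch only: structure and layout are assumed, not copied -->
```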

Try to validate this file using an SGML parser.

Part of textproc/docproj is the &man.nsgmls.1; validating parser. Normally, &man.nsgmls.1; reads in a document marked up according to an SGML DTD and returns a copy of the document's Element Structure Information Set (ESIS, but that is not important right now).

However, when &man.nsgmls.1; is given the -s parameter, &man.nsgmls.1; will suppress its normal output, and just print error messages. This makes it a useful way to check whether your document is valid or not.

Use &man.nsgmls.1; to check that your document is valid;

&prompt.user; nsgmls -s example.sgml

As you will see, &man.nsgmls.1; returns without displaying any output. This means that your document validated successfully.

See what happens when required elements are omitted. Try removing the <title> and </title> tags, and re-run the validation.

&prompt.user; nsgmls -s example.sgml
nsgmls:example.sgml:5:4:E: character data is not allowed here
nsgmls:example.sgml:6:8:E: end tag for "HEAD" which is not finished

The error output from &man.nsgmls.1; is organised into colon-separated groups, or columns.

Column  Meaning
1       The name of the program generating the error. This will always be nsgmls.
2       The name of the file that contains the error.
3       Line number where the error appears.
4       Column number where the error appears.
5       A one letter code indicating the nature of the message. I indicates an informational message, W is for warnings, E is for errors, and X is for cross-references. As you can see, these messages are errors.
6       The text of the error message.

(The code is not always in the fifth column either. nsgmls -sv displays nsgmls:I: SP version "1.3" (depending on the installed version). As you can see, this is an informational message.)

Simply omitting the title tags has generated 2 different errors.

The first error indicates that content (in this case, characters, rather than the start tag for an element) has occurred where the SGML parser was expecting something else.
In this case, the parser was expecting to see one of the start tags for elements that are valid inside head (such as title).

The second error is because head elements must contain a title element. Because it does not, &man.nsgmls.1; considers that the element has not been properly finished. However, the closing tag indicates that the element has been closed before it has been finished.

Put the title element back in.

The DOCTYPE declaration

The beginning of each document that you write must specify the name of the DTD that the document conforms to. This is so that SGML parsers can determine the DTD and ensure that the document does conform to it.

This information is generally expressed on one line, in the DOCTYPE declaration.

A typical declaration for a document written to conform with version 4.0 of the HTML DTD looks like this;

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">

That line contains a number of different components.

<!
    Indicates that this is an SGML declaration. This line is declaring the document type.

DOCTYPE
    Shows that this is an SGML declaration for the document type.

html
    Names the first element that will appear in the document.

PUBLIC "-//W3C//DTD HTML 4.0//EN"
    Lists the Formal Public Identifier (FPI) for the DTD that this document conforms to. Your SGML parser will use this to find the correct DTD when processing this document.

    PUBLIC is not a part of the FPI, but indicates to the SGML processor how to find the DTD referenced in the FPI. Other ways of telling the SGML parser how to find the DTD are shown later.

>
    Returns to the document.

Formal Public Identifiers (FPIs)

You do not need to know this, but it is useful background, and might help you debug problems when your SGML processor cannot locate the DTD you are using.

FPIs must follow a specific syntax.
This syntax is as follows; "Owner//Keyword Description//Language" Owner This indicates the owner of the FPI. If this string starts with ISO then this is an ISO owned FPI. For example, the FPI "ISO 8879:1986//ENTITIES Greek Symbols//EN" lists ISO 8879:1986 as being the owner for the set of entities for Greek symbols. ISO 8879:1986 is the ISO number for the SGML standard. Otherwise, this string will either look like -//Owner or +//Owner (notice the only difference is the leading + or -). If the string starts with - then the owner information is unregistered, with a + it identifies it as being registered. ISO 9070:1991 defines how registered names are generated; it might be derived from the number of an ISO publication, an ISBN code, or an organisation code assigned according to ISO 6523. In addition, a registration authority could be created in order to assign registered names. The ISO council delegated this to the American National Standards Institute (ANSI). Because the FreeBSD Project has not been registered the owner string is -//FreeBSD. And as you can see, the W3C are not a registered owner either. Keyword There are several keywords that indicate the type of information in the file. Some of the most common keywords are DTD, ELEMENT, ENTITIES, and TEXT. DTD is used only for DTD files, ELEMENT is usually used for DTD fragments that contain only entity or element declarations. TEXT is used for SGML content (text and tags). Description Any description you want to supply for the contents of this file. This may include version numbers or any short text that is meaningful to you and unique for the SGML system. Language This is an ISO two-character code that identifies the native language for the file. EN is used for English. <filename>catalog</filename> files If you use the syntax above and try and process this document using an SGML processor, the processor will need to have some way of turning the FPI into the name of the file on your computer that contains the DTD. 
In order to do this it can use a catalog file. A catalog file (typically called catalog) contains lines that map FPIs to filenames. For example, if the catalog file contained the line;

PUBLIC "-//W3C//DTD HTML 4.0//EN" "4.0/strict.dtd"

The SGML processor would know to look up the DTD from strict.dtd in the 4.0 subdirectory of whichever directory held the catalog file that contained that line.

Look at the contents of /usr/local/share/sgml/html/catalog. This is the catalog file for the HTML DTDs that will have been installed as part of the textproc/docproj port.

SGML_CATALOG_FILES

In order to locate a catalog file, your SGML processor will need to know where to look. Many of them feature command line parameters for specifying the path to one or more catalogs.

In addition, you can set SGML_CATALOG_FILES to point to the files. This environment variable should consist of a colon-separated list of catalog files (including their full path).

Typically, you will want to include the following files;

/usr/local/share/sgml/docbook/4.1/catalog
/usr/local/share/sgml/html/catalog
/usr/local/share/sgml/iso8879/catalog
/usr/local/share/sgml/jade/catalog

You should already have done this.

Alternatives to FPIs

Instead of using an FPI to indicate the DTD that the document conforms to (and therefore, which file on the system contains the DTD) you can explicitly specify the name of the file.

The syntax for this is slightly different, using the SYSTEM keyword in place of PUBLIC and the FPI.

The SYSTEM keyword indicates that the SGML processor should locate the DTD in a system specific fashion. This typically (but not always) means the DTD will be provided as a filename.

Using FPIs is preferred for reasons of portability. You do not want to have to ship a copy of the DTD around with your document, and if you used the SYSTEM identifier then everyone would need to keep their DTDs in the same place.

Escaping back to SGML

Earlier in this primer I said that SGML is only used when writing a DTD. This is not strictly true.
There is certain SGML syntax that you will want to be able to use within your documents. For example, comments can be included in your document, and will be ignored by the parser. Comments are entered using SGML syntax. Other uses for SGML syntax in your document will be shown later too.

Obviously, you need some way of indicating to the SGML processor that the following content is not elements within the document, but is SGML that the parser should act upon.

These sections are marked by <! ... > in your document. Everything between these delimiters is SGML syntax as you might find within a DTD.

As you may just have realised, the DOCTYPE declaration is an example of SGML syntax that you need to include in your document…

Comments

Comments are an SGML construction, and are normally only valid inside a DTD. However, as the section on escaping back to SGML shows, it is possible to use SGML syntax within your document.

The delimiter for SGML comments is the string --. The first occurrence of this string opens a comment, and the second closes it.

SGML generic comment

Use 2 dashes

There is a problem with producing the PostScript and PDF versions of this document. The above example probably shows just one hyphen symbol, - after the <! and before the >.

You must use two -, not one. The PostScript and PDF versions have translated the two - in the original to a longer, more professional em-dash, and broken this example in the process.

The HTML, plain text, and RTF versions of this document are not affected.

If you have used HTML before you may have been shown different rules for comments. In particular, you may think that the string <!-- opens a comment, and that it is only closed by -->.

This is not the case. A lot of web browsers have broken HTML parsers, and will accept that as valid. However, the SGML parsers used by the Documentation Project are much stricter, and will reject documents that make that error.
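A few comment forms that follow the -- rule described above, as a sketch. The comment text is made up for illustration; the last form relies on the fact that one <! ... > declaration may hold several comments, each opened and closed by its own pair of --.

```sgml
<!-- A single comment inside one declaration -->

<!--
     One way of writing a comment
     that spans several lines
  -->

<!-- Another way, using two comments   --
  -- inside the same declaration       -->
```

In each case the number of -- strings between <! and > is even, so every comment that is opened is also closed before the declaration ends.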
- Errorneous SGML comments
+ Erroneous SGML comments

The SGML parser will treat this as though it were actually;

<!THIS IS OUTSIDE THE COMMENT>

This is not valid SGML, and may give confusing error messages.

As the example suggests, do not write comments like that.

That is a (slightly) better approach, but it is still potentially confusing to people new to SGML.

For you to do…

Add some comments to example.sgml, and check that the file still validates using &man.nsgmls.1;.

Add some invalid comments to example.sgml, and see the error messages that &man.nsgmls.1; gives when it encounters an invalid comment.

Entities

Entities are a mechanism for assigning names to chunks of content. As an SGML parser processes your document, any entities it finds are replaced by the content of the entity.

This is a good way to have re-usable, easily changeable chunks of content in your SGML documents. It is also the only way to include one marked up file inside another using SGML.

There are two types of entities which can be used in two different situations; general entities and parameter entities.

General Entities

You cannot use general entities in an SGML context (although you define them in one). They can only be used in your document. Contrast this with parameter entities.

Each general entity has a name. When you want to reference a general entity (and therefore include whatever text it represents in your document), you write &entity-name;. For example, suppose you had an entity called current.version which expanded to the current version number of your product. You could write;

The current version of our product is &current.version;.

When the version number changes you can simply change the definition of the value of the general entity and reprocess your document.

You can also use general entities to enter characters that you could not otherwise include in an SGML document. For example, < and & cannot normally appear in an SGML document.
When the SGML parser sees the < symbol it assumes that a tag (either a start tag or an end tag) is about to appear, and when it sees the & symbol it assumes the next text will be the name of an entity.

Fortunately, you can use the two general entities &lt; and &amp; whenever you need to include one or other of these.

A general entity can only be defined within an SGML context. Typically, this is done immediately after the DOCTYPE declaration.

Defining general entities

Notice how the DOCTYPE declaration has been extended by adding a square bracket at the end of the first line. The two entities are then defined over the next two lines, before the square bracket is closed, and then the DOCTYPE declaration is closed.

The square brackets are necessary to indicate that we are extending the DTD indicated by the DOCTYPE declaration.

Parameter entities

Like general entities, parameter entities are used to assign names to reusable chunks of text. However, whereas general entities can only be used within your document, parameter entities can only be used within an SGML context.

Parameter entities are defined in a similar way to general entities. However, instead of using &entity-name; to refer to them, use %entity-name; (parameter entities use the percent symbol). The definition also includes the % between the ENTITY keyword and the name of the entity.

Defining parameter entities

This may not seem particularly useful. It will be.

For you to do…

Add a general entity to example.sgml.

An example HTML file

This is a paragraph containing some text.

This paragraph contains some more text.

This paragraph might be right-justified.

The current version of this document is: &version;
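With the markup written out, example.sgml with a general entity added might look like the following sketch. The entity value "1.1" and the exact element structure are assumptions; the entity name version comes from the text.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!-- Assumed value; any version string would do -->
<!ENTITY version "1.1">
]>

<html>
  <head>
    <title>An example HTML file</title>
  </head>

  <body>
    <p>This is a paragraph containing some text.</p>

    <p>This paragraph contains some more text.</p>

    <p align="right">This paragraph might be right-justified.</p>

    <p>The current version of this document is: &version;</p>
  </body>
</html>
```

The entity is declared between the square brackets of the DOCTYPE declaration and referenced as &version; in the body.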

Validate the document using &man.nsgmls.1;.

Load example.sgml into your web browser (you may need to copy it to example.html before your browser recognises it as an HTML document).

Unless your browser is very advanced, you will not see the entity reference &version; replaced with the version number. Most web browsers have very simplistic parsers which do not handle proper SGML. (This is a shame. Imagine all the problems and hacks, such as Server Side Includes, that could be avoided if they did.)

The solution is to normalise your document using an SGML normaliser. The normaliser reads in valid SGML and outputs equally valid SGML which has been transformed in some way. One of the ways in which the normaliser transforms the SGML is to expand all the entity references in the document, replacing the entities with the text that they represent.

You can use &man.sgmlnorm.1; to do this.

&prompt.user; sgmlnorm example.sgml > example.html

You should find a normalised (i.e., entity references expanded) copy of your document in example.html, ready to load into your web browser.

If you look at the output from &man.sgmlnorm.1; you will see that it does not include a DOCTYPE declaration at the start. To include this you need to use the -d option;

&prompt.user; sgmlnorm -d example.sgml > example.html

Using entities to include files

Entities (both general and parameter) are particularly useful when used to include one file inside another.

Using general entities to include files

Suppose you have some content for an SGML book organised into files, one file per chapter, called chapter1.sgml, chapter2.sgml, and so forth, with a book.sgml file that will contain these chapters.

In order to use the contents of these files as the values for your entities, you declare them with the SYSTEM keyword. This directs the SGML parser to use the contents of the named file as the value of the entity.
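A minimal sketch of such declarations, assuming the three chapter files named above and entity names chapter.1 to chapter.3. The HTML DOCTYPE and the enclosing html element are placeholders chosen for illustration; a real book would name its own DTD and top-level element.

```sgml
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!-- SYSTEM makes the file contents the value of each entity -->
<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
<!ENTITY chapter.3 SYSTEM "chapter3.sgml">
]>

<html>
  <!-- Referencing the entities pulls in the chapter files -->
  &chapter.1;
  &chapter.2;
  &chapter.3;
</html>
```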
Using general entities to include files

&chapter.1; &chapter.2; &chapter.3;

When using general entities to include other files within a document, the files being included (chapter1.sgml, chapter2.sgml, and so on) must not start with a DOCTYPE declaration. This is a syntax error.

Using parameter entities to include files

Recall that parameter entities can only be used inside an SGML context. Why then would you want to include a file within an SGML context?

You can use this to ensure that you can reuse your general entities.

Suppose that you had many chapters in your document, and you reused these chapters in two different books, each book organising the chapters in a different fashion.

You could list the entities at the top of each book, but this quickly becomes cumbersome to manage.

Instead, place the general entity definitions inside one file, and use a parameter entity to include that file within your document.

Using parameter entities to include files

First, place your entity definitions in a separate file, called chapters.ent. This file contains the following;

Now create a parameter entity to refer to the contents of the file. Then use the parameter entity to load the file into the document, which will then make all the general entities available for use. Then use the general entities as before;

%chapters;

&chapter.1; &chapter.2; &chapter.3;

For you to do…

Use general entities to include files

Create three files, para1.sgml, para2.sgml, and para3.sgml.

Put content similar to the following in each file;

This is the first paragraph.

]]> Edit example.sgml so that it looks like this; ]> An example HTML file

The current version of this document is: &version;

&para1; &para2; &para3;

Produce example.html by normalising example.sgml.

&prompt.user; sgmlnorm -d example.sgml > example.html

Load example.html into your web browser, and confirm that the para1.sgml, para2.sgml, and para3.sgml files have been included in example.html.

Use parameter entities to include files

You must have taken the previous steps first.

Edit example.sgml so that it looks like this;

%entities;

An example HTML file

The current version of this document is: &version;

&para1; &para2; &para3;

Create a new file, entities.sgml, with this content:

Produce example.html by normalising example.sgml.

&prompt.user; sgmlnorm -d example.sgml > example.html

Load example.html into your web browser, and confirm that the para1.sgml, para2.sgml, and para3.sgml files have been included in example.html.

Marked sections

SGML provides a mechanism to indicate that particular pieces of the document should be processed in a special way. These are termed marked sections.

Structure of a marked section

<![ KEYWORD [ Contents of marked section ]]>

As you would expect, being an SGML construct, a marked section starts with <!.

The first square bracket begins to delimit the marked section.

KEYWORD describes how this marked section should be processed by the parser.

The second square bracket indicates that the content of the marked section starts here.

The marked section is finished by closing the two square brackets, and then returning to the document context from the SGML context with >.

Marked section keywords

CDATA, RCDATA

These keywords denote the marked section's content model, and allow you to change it from the default.

When an SGML parser is processing a document it keeps track of what is called the content model.

Briefly, the content model describes what sort of content the parser is expecting to see, and what it will do with it when it finds it.

The two content models you will probably find most useful are CDATA and RCDATA.

CDATA is for Character Data. If the parser is in this content model then it is expecting to see characters, and characters only. In this model the < and & symbols lose their special status, and will be treated as ordinary characters.

RCDATA is for entity references and character data. If the parser is in this content model then it is expecting to see characters and entities. < loses its special status, but & will still be treated as starting a general entity.
This is particularly useful if you are including some verbatim text that contains lots of < and & characters. While you could go through the text ensuring that every < is converted to a &lt; and every & is converted to a &amp;, it can be easier to mark the section as only containing CDATA. When the SGML parser encounters this it will ignore the < and & symbols embedded in the content.

When you use CDATA or RCDATA in examples of text marked up in SGML, keep in mind that the content of CDATA is not validated. You have to check the included SGML text using other means. You could, for example, write the example in another document, validate the example code, and then paste it into your CDATA content.

Using a CDATA marked section

<para>Here is an example of how you would include some text that contained many < and & symbols. The sample text is a fragment of HTML. The surrounding text (<para> and <programlisting>) are from DocBook.</para>

<programlisting>
<![ CDATA [
<p>This is a sample that shows you some of the elements within HTML. Since the angle brackets are used so many times, it is simpler to say the whole example is a CDATA marked section than to use the entity names for the left and right angle brackets throughout.</p>

<ul>
  <li>This is a listitem</li>
  <li>This is a second listitem</li>
  <li>This is a third listitem</li>
</ul>

<p>This is the end of the example.</p>

]]>
</programlisting>

If you look at the source for this document you will see this technique used throughout.

<literal>INCLUDE</literal> and <literal>IGNORE</literal>

If the keyword is INCLUDE then the contents of the marked section will be processed. If the keyword is IGNORE then the marked section is ignored and will not be processed. It will not appear in the output.

Using <literal>INCLUDE</literal> and <literal>IGNORE</literal> in marked sections

<![ INCLUDE [ This text will be processed and included. ]]>

<![ IGNORE [ This text will not be processed or included. ]]>

By itself, this is not too useful. If you wanted to remove text from your document you could cut it out, or wrap it in comments.

It becomes more useful when you realise you can use parameter entities to control this. Remember that parameter entities can only be used in SGML contexts, and the keyword of a marked section is an SGML context.

For example, suppose that you produced a hard-copy version of some documentation and an electronic version. In the electronic version you wanted to include some extra content that was not to appear in the hard-copy.

Create a parameter entity, and set its value to INCLUDE. Write your document, using marked sections to delimit content that should only appear in the electronic version. In these marked sections use the parameter entity in place of the keyword.

When you want to produce the hard-copy version of the document, change the parameter entity's value to IGNORE and reprocess the document.

Using a parameter entity to control a marked section

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % electronic.copy "INCLUDE">
]>

...

<![ %electronic.copy [ This content should only appear in the electronic version of the document. ]]>

When producing the hard-copy version, change the entity's definition to:

<!ENTITY % electronic.copy "IGNORE">

On reprocessing the document, the marked sections that use %electronic.copy as their keyword will be ignored.
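Switching between the two versions can also be scripted. The following is only a minimal sketch, assuming a POSIX shell and sed. The name set_entity_keyword is a hypothetical helper, not part of any SGML tool, and it assumes the entity declaration sits on a single line, written exactly as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any SGML tool): rewrite the value of a
# parameter entity whose declaration sits on a single line, e.g.
#   <!ENTITY % electronic.copy "INCLUDE">
# usage: set_entity_keyword FILE ENTITY INCLUDE|IGNORE
set_entity_keyword() {
    file=$1
    entity=$2
    keyword=$3

    # Only the two marked section keywords discussed here are accepted.
    case $keyword in
        INCLUDE|IGNORE) ;;
        *) echo "keyword must be INCLUDE or IGNORE" >&2; return 1 ;;
    esac

    # Match the declaration up to the quoted value, then swap the value.
    # Note: sed treats a "." in the entity name as "any character".
    pattern='\(<!ENTITY % '"$entity"' \)"[A-Z]*"'
    sed "s/$pattern/\\1\"$keyword\"/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

For the hard-copy build you might then run set_entity_keyword example.sgml electronic.copy IGNORE (the file name is an assumption) and reprocess the file, for instance with sgmlnorm -d as earlier in this chapter.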
For you to do…

Create a new file, section.sgml, that contains the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % text.output "INCLUDE">
]>

<html>
<head>
<title>An example using marked sections</title>
</head>

<body>
This paragraph <![ CDATA [contains many < characters (< < < < <) so it is easier to wrap it in a CDATA marked section ]]>

<![ IGNORE [ This paragraph will definitely not be included in the output. ]]>

<![ %text.output [ This paragraph might appear in the output, or it might not. Its appearance is controlled by the parameter entity. ]]>
</body>
</html>

Normalise this file using &man.sgmlnorm.1; and examine the output. Notice which paragraphs have appeared, which have disappeared, and what has happened to the content of the CDATA marked section.

Change the definition of the text.output entity from INCLUDE to IGNORE. Re-normalise the file, and examine the output to see what has changed.

Conclusion

That is the conclusion of this SGML primer. For reasons of space and complexity several things have not been covered in depth (or at all). However, the previous sections cover enough SGML for you to be able to follow the organisation of the FDP documentation.

diff --git a/en_US.ISO8859-1/books/fdp-primer/structure/chapter.sgml b/en_US.ISO8859-1/books/fdp-primer/structure/chapter.sgml
index 561835a09f..a00b8a65b1 100644
--- a/en_US.ISO8859-1/books/fdp-primer/structure/chapter.sgml
+++ b/en_US.ISO8859-1/books/fdp-primer/structure/chapter.sgml
@@ -1,294 +1,294 @@

Structuring documents under <filename>doc/</filename>

The doc/ tree is organised in a particular fashion, and the documents that are part of the FDP are in turn organised in a particular fashion.
The aim is to make it simple to add new documentation into the tree and:

- make it easy to automate converting the document to other formats
- promote consistency between the different documentation organisations, to make it easier to switch between working on different documents
- make it easy to decide where in the tree new documentation should be placed

In addition, the documentation tree has to accommodate documentation that could be in many different languages and in many different encodings. It is important that the structure of the documentation tree does not enforce any particular defaults or cultural preferences.

The top level, <filename>doc/</filename>

There are two types of directory under doc/, each with very specific directory names and meanings.

Directory: Meaning

share/: Contains files that are not specific to the various translations and encodings of the documentation. Contains subdirectories to further categorise the information. For example, the files that comprise the &man.make.1; infrastructure are in share/mk, while the additional SGML support files (such as the FreeBSD extended DocBook DTD) are in share/sgml.

lang.encoding/: One directory exists for each available translation and encoding of the documentation, for example en_US.ISO8859-1/ and zh_TW.Big5/. The names are long, but by fully specifying the language and encoding we prevent any future headaches should a translation team want to provide the documentation in the same language but in more than one encoding. This also completely isolates us from any problems that might be caused by a switch to Unicode.

The <filename><replaceable>lang</replaceable>.<replaceable>encoding</replaceable>/</filename> directories

These directories contain the documents themselves. The documentation is split into up to three more categories at this level, indicated by the different directory names.

Directory: Contents

articles: Documentation marked up as a DocBook article (or equivalent). Reasonably short, and broken up into sections. Normally only available as one HTML file.

books: Documentation marked up as a DocBook book (or equivalent). Book length, and broken up into chapters. Normally available as both one large HTML file (for people with fast connections, or who want to print it easily from a browser) and as a collection of linked, smaller files.

man: For translations of the system manual pages. This directory will contain one or more mann directories, corresponding to the sections that have been translated.

Not every lang.encoding directory will contain all of these directories. It depends on how much translation has been accomplished by that translation team.

Document specific information

This section contains specific notes about particular documents managed by the FDP.

The Handbook

books/handbook/

The Handbook is written to comply with the FreeBSD DocBook extended DTD.

The Handbook is organised as a DocBook book. It is then divided into parts, each of which may contain several chapters. Chapters are further subdivided into sections (sect1) and subsections (sect2, sect3) and so on.

Physical organisation

There are a number of files and directories within the handbook directory.

The Handbook's organisation may change over time, and this document may lag in detailing the organisational changes. If you have any questions about how the Handbook is organised, please contact the &a.doc;.

<filename>Makefile</filename>

The Makefile defines some variables that affect how the SGML source is converted to other formats, and lists the various source files that make up the Handbook. It then includes the standard doc.project.mk file, to bring in the rest of the code that handles converting documents from one format to another.

<filename>book.sgml</filename>

This is the top level document in the Handbook. It contains the Handbook's DOCTYPE declaration, as well as the elements that describe the Handbook's structure.
book.sgml uses parameter entities to load in the files with the .ent extension. These files (described later) then define general entities that are used throughout the rest of the Handbook.

<filename><replaceable>directory</replaceable>/chapter.sgml</filename>

Each chapter in the Handbook is stored in a file called chapter.sgml in a separate directory from the other chapters. Each directory is named after the value of the id attribute on the chapter element.

For example, if one of the chapter files contains:

<chapter id="kernelconfiguration"> ... </chapter>

then it will be called chapter.sgml in the kernelconfiguration directory. In general, the entire contents of the chapter will be held in this file.

When the HTML version of the Handbook is produced, this will yield kernelconfiguration.html. This is because of the id value, and is not related to the name of the directory.

In earlier versions of the Handbook the files were stored in the same directory as book.sgml, and named after the value of the id attribute on the file's chapter element. Moving them into separate directories prepares for future plans for the Handbook. Specifically, it will soon be possible to include images in each chapter. It makes more sense for each image to be stored in a directory with the text for the chapter than to try to keep the text for all the chapters, and all the images, in one large directory. Namespace collisions would be inevitable, and it is easier to work with several directories with a few files in them than it is to work with one directory that has many files in it.

A brief look will show that there are many directories with individual chapter.sgml files, including basics/chapter.sgml, introduction/chapter.sgml, and printing/chapter.sgml.

Chapters and/or directories should not be named in a fashion that reflects their ordering within the Handbook.
This ordering might change as the content within the Handbook is reorganised;
- this sort of reorganistion should not (generally) include the
+ this sort of reorganisation should not (generally) include the
need to rename files (unless entire chapters are being promoted or demoted within the hierarchy).

Each chapter.sgml file will not be a complete SGML document. In particular, they will not have their own DOCTYPE lines at the start of the files.

This is unfortunate as it makes it impossible to treat these as generic SGML files and simply convert them to HTML, RTF, PS, and other formats in the same way the main Handbook is generated. This would force you to rebuild the Handbook every time you want to see the effect a change has had on just one chapter.

diff --git a/en_US.ISO8859-1/books/fdp-primer/the-website/chapter.sgml b/en_US.ISO8859-1/books/fdp-primer/the-website/chapter.sgml
index 24f64eacc5..1d110e80a6 100644
--- a/en_US.ISO8859-1/books/fdp-primer/the-website/chapter.sgml
+++ b/en_US.ISO8859-1/books/fdp-primer/the-website/chapter.sgml
@@ -1,218 +1,218 @@

The Website

Preparation

Get 200MB free disk space. You will need the disk space for the SGML tools, a subset of the CVS tree, temporary build space and the installed web pages. If you already have installed the SGML tools and the CVS tree, you need only ~100MB free disk space.

Make sure your documentation ports are up to date! When in doubt, remove the old ports using the &man.pkg.delete.1; command before installing the port. For example, we currently depend on jade-1.2 and if you have installed jade-1.1, please do

&prompt.root; pkg_delete jade-1.1

Set up a CVS repository. You need the directories www, doc and ports in the CVS tree (plus the CVSROOT of course). Please read the CVSup introduction (http://www.FreeBSD.org/handbook/synching.html#CVSUP) to learn how to mirror a CVS tree or parts of a CVS tree.

The essential cvsup collections are: www, doc-all, cvs-base, and ports-base.
These collections require ~100MB free disk space.

A full CVS tree - including src, doc, www, and ports - currently occupies 650MB.

Build the web pages from scratch

Go into a build directory with at least 60MB of free space.

&prompt.root; mkdir /var/tmp/webbuild
&prompt.root; cd /var/tmp/webbuild

Check out the SGML files from the CVS tree.

&prompt.root; cvs -R co www doc

Change into the www directory, and run the &man.make.1; links target, to create the necessary symbolic links.

&prompt.root; cd www
&prompt.root; make links

Change into the en directory, and run the &man.make.1; all target, to create the web pages.

&prompt.root; cd en
&prompt.root; make all

Install the web pages into your web server

If you have moved out of the en directory, change back to it.

&prompt.root; cd path/www/en

Run the &man.make.1; install target, setting the DESTDIR variable to the name of the directory you want to install the files to.

&prompt.root; make DESTDIR=/usr/local/www install

If you have previously installed the web pages into the same directory the install process will not have deleted any old or outdated pages. For example, if you build and install a new copy of the site every day, this command will find and delete all files that have not been updated in three days.

&prompt.root; find /usr/local/www -ctime 3 -print0 | xargs -0 rm

Environment variables

CVSROOT: Location of the CVS tree. Essential.

&prompt.root; CVSROOT=/home/ncvs; export CVSROOT

ENGLISH_ONLY: If set and not empty, the makefiles will build and install only the English documents. All translations will be ignored.
E.g.:

&prompt.root; make ENGLISH_ONLY=YES all install

If you want to unset the variable ENGLISH_ONLY and build all pages, including translations, set the variable ENGLISH_ONLY to an empty value:

&prompt.root; make ENGLISH_ONLY="" all install clean

WEB_ONLY
- If set and not empty, the makefiles wil build and install
+ If set and not empty, the makefiles will build and install
only the HTML pages from the www directory. All documents from the doc directory (Handbook, FAQ, Tutorials) will be ignored. E.g.:

&prompt.root; make WEB_ONLY=YES all install

NOPORTSCVS: If set, the makefiles will not check out files from the ports cvs repository. Instead, they will copy the files from /usr/ports (or where the variable PORTSBASE points to).

CVSROOT is an environment variable. You must set it on the command line or in your dot files (~/.profile).

WEB_ONLY, ENGLISH_ONLY and NOPORTSCVS are makefile variables. You can set the variables in /etc/make.conf, Makefile.inc or as environment variables on the command line or in your dot files.
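The "set and not empty" rule for ENGLISH_ONLY and WEB_ONLY is what makes an empty value behave like an unset variable. The following is only a shell analogue of that rule, written as a sketch for illustration. The real test happens inside the website makefiles, and is_enabled and describe_build are hypothetical helper names.

```shell
#!/bin/sh
# Shell analogue of the "set and not empty" rule (an illustration only; the
# real test is done inside the website makefiles).
is_enabled() {
    [ -n "${1:-}" ]
}

# describe_build is a hypothetical helper that reports what a build would
# cover for given values of ENGLISH_ONLY ($1) and WEB_ONLY ($2).
describe_build() {
    if is_enabled "${1:-}"; then langs="English only"; else langs="all translations"; fi
    if is_enabled "${2:-}"; then pages="www pages only"; else pages="www and doc pages"; fi
    echo "$langs; $pages"
}

describe_build "YES" ""    # English only; www and doc pages
describe_build "" "YES"    # all translations; www pages only
```

An empty string fails the test in the same way as a missing argument, which mirrors why setting ENGLISH_ONLY="" on the make command line brings the translations back into the build.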