XML Notes

Section 7: XML Documents

A "well-formed" XML document is one that conforms to the XML syntax rules described in the previous section.

The following is a well-formed XML document:

<?xml version="1.0"?>
<note>
  <to>John</to>
  <from>Peter</from>
  <heading>Reminder</heading>
  <body>Don’t forget to send me the data</body>
</note>
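
For contrast, here is a sketch of the same note with two deliberate syntax errors. It is an illustration only, assuming the usual XML rules that tag names are case-sensitive and that every element must be closed; a parser should reject it as not well-formed.

```xml
<?xml version="1.0"?>
<!-- NOT well-formed: shown only to illustrate two common errors -->
<note>
  <to>John</to>
  <from>Peter</From>          <!-- error: closing tag </From> does not match <from> -->
  <heading>Reminder</heading>
  <body>Don't forget to send me the data  <!-- error: <body> is never closed -->
</note>
```

Either error alone is enough to make the document fail to parse.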

A "valid" XML document is a "well-formed" XML document that also conforms to the rules of a Document Type Definition (DTD). The following is the same document as above, with an added reference to a DTD:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
  <to>John</to>
  <from>Peter</from>
  <heading>Reminder</heading>
  <body>Don’t forget to send me the data</body>
</note>

DOCUMENT TYPE DEFINITIONS (DTD)

A Document Type Definition defines the legal building blocks of an XML document. It describes the document structure as a list of legal elements and, optionally, their attributes and entities.

A DTD can be declared inline in your XML document, or as an external reference. The "valid" XML example above uses an external reference.

I. Internal DOCTYPE Declarations

If the DTD is included in your XML source file, it should be wrapped in a DOCTYPE declaration with the following syntax:

<!DOCTYPE root-element [element-declarations]>

Here is an example XML document with an internal DTD:

<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to      (#PCDATA)>
  <!ELEMENT from    (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body    (#PCDATA)>
]>
<note>
  <to>GA Astatine</to>
  <from>CM Zosite Konstyte Styles</from>
  <heading>Reminder</heading>
  <body>Don't forget to take over the Galaxy, Sir!!</body>
</note>

The DTD above is interpreted like this:

  • !DOCTYPE note (in line 2) declares that the root element of this document is note.
  • !ELEMENT note (in line 3) declares that the note element contains exactly four child elements, in this order: to, from, heading, body.
  • !ELEMENT to (in line 4) declares that the to element contains #PCDATA (parsed character data, i.e. text).
  • !ELEMENT from (in line 5) declares that the from element contains #PCDATA.
  • and so on...
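
To illustrate the difference between well-formed and valid, the sketch below reuses the same internal DTD with a note whose content breaks the declared structure. The names and text in the note are made up for illustration. A non-validating parser should accept the document, while a validating parser should report it as invalid.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to      (#PCDATA)>
  <!ELEMENT from    (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body    (#PCDATA)>
]>
<!-- Well-formed, but NOT valid against the DTD above -->
<note>
  <from>Peter</from>   <!-- invalid: from appears before to -->
  <to>John</to>
  <body>Don't forget to send me the data</body>   <!-- invalid: heading is missing -->
</note>
```

The content model (to,from,heading,body) requires each child exactly once and in that order, so both the swapped elements and the missing heading violate it.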

II. External DOCTYPE Declarations

If the DTD is external to your XML source file, reference it from a DOCTYPE declaration with the following syntax:

<!DOCTYPE root-element SYSTEM "filename">

This is the same XML document as above, but with an external DTD:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>GA Astatine</to>
  <from>CM Zosite Konstyte Styles</from>
  <heading>Reminder</heading>
  <body>Don't forget to take over the Galaxy, Sir!!</body>
</note>

And this is a copy of the file "note.dtd" containing the DTD:

<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

III. Why use a DTD?

  • With a DTD, each of your XML files can carry a description of its own format with it.
  • With a DTD, independent groups of people can agree to use a common DTD for interchanging data.
  • Your application can use a standard DTD to verify that the data you receive from the outside world is valid.
  • You can also use a DTD to verify your own data.
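
As a small illustration of the interchange point, the sketch below is a hypothetical reply written by a different party that refers to the same note.dtd as the earlier example. The message content is invented; the only assumption is that both parties have access to the same note.dtd file.

```xml
<?xml version="1.0"?>
<!-- Hypothetical second message that reuses the shared note.dtd -->
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Peter</to>
  <from>John</from>
  <heading>Re: Reminder</heading>
  <body>The data is on its way.</body>
</note>
```

Because both documents point at one DTD, the receiver can validate either of them with the same rules before processing it.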

IV. Some examples of DTDs

TV Schedule DTD

By David Moisan. Copied from his website: http://www.davidmoisan.org/

<!DOCTYPE TVSCHEDULE [

<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER,DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)> 
<!ELEMENT DESCRIPTION (#PCDATA)>

<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>

]>
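
A document that conforms to this DTD might look like the sketch below. All channel names, dates, and programme titles are invented for illustration, and the file name tvschedule.dtd is an assumption: it stands for a file that holds the declarations above without the surrounding DOCTYPE wrapper.

```xml
<?xml version="1.0"?>
<!-- Assumes the declarations above are saved as "tvschedule.dtd" (hypothetical file name) -->
<!DOCTYPE TVSCHEDULE SYSTEM "tvschedule.dtd">
<TVSCHEDULE NAME="Example Listings">            <!-- NAME is #REQUIRED -->
  <CHANNEL CHAN="5">                            <!-- CHAN is #REQUIRED -->
    <BANNER>Example Channel 5</BANNER>
    <DAY>
      <DATE>2024-01-01</DATE>
      <HOLIDAY>New Year's Day</HOLIDAY>         <!-- a DAY may hold a HOLIDAY... -->
    </DAY>
    <DAY>
      <DATE>2024-01-02</DATE>
      <PROGRAMSLOT VTR="yes">                   <!-- ...or one or more PROGRAMSLOTs -->
        <TIME>19:00</TIME>
        <TITLE RATING="G" LANGUAGE="en">Evening News</TITLE>
        <DESCRIPTION>Local and national headlines.</DESCRIPTION>
      </PROGRAMSLOT>
      <PROGRAMSLOT>
        <TIME>20:00</TIME>
        <TITLE>Nature Hour</TITLE>              <!-- DESCRIPTION is optional (?) -->
      </PROGRAMSLOT>
    </DAY>
  </CHANNEL>
</TVSCHEDULE>
```

The comments tie each part of the instance back to the occurrence indicators in the DTD: + for one or more, ? for optional, and | for a choice.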

Newspaper Article DTD

Copied from http://www.vervet.com/

<!DOCTYPE NEWSPAPER [

<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)>
<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>

<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
<!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
<!ATTLIST ARTICLE DATE CDATA #IMPLIED>
<!ATTLIST ARTICLE EDITION CDATA #IMPLIED>

<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">

]>
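
The sketch below shows how a document might use this DTD, including the three entities. The declarations are copied from above into an internal DTD so the example stands on its own; the author name, date, and article text are invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE NEWSPAPER [
  <!-- Declarations copied from the Newspaper Article DTD above -->
  <!ELEMENT NEWSPAPER (ARTICLE+)>
  <!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)>
  <!ELEMENT HEADLINE (#PCDATA)>
  <!ELEMENT BYLINE (#PCDATA)>
  <!ELEMENT LEAD (#PCDATA)>
  <!ELEMENT BODY (#PCDATA)>
  <!ELEMENT NOTES (#PCDATA)>
  <!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
  <!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
  <!ATTLIST ARTICLE DATE CDATA #IMPLIED>
  <!ATTLIST ARTICLE EDITION CDATA #IMPLIED>
  <!ENTITY NEWSPAPER "Vervet Logic Times">
  <!ENTITY PUBLISHER "Vervet Logic Press">
  <!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">
]>
<NEWSPAPER>
  <!-- AUTHOR is #REQUIRED; EDITOR, DATE and EDITION are optional (#IMPLIED) -->
  <ARTICLE AUTHOR="Jane Example" DATE="1998-06-01">
    <HEADLINE>Sample Headline</HEADLINE>
    <!-- &NEWSPAPER; is replaced by the entity's text when the document is parsed -->
    <BYLINE>By Jane Example, &NEWSPAPER;</BYLINE>
    <LEAD>A one-sentence summary of the story.</LEAD>
    <BODY>The full text of the story goes here.</BODY>
    <NOTES>&COPYRIGHT;</NOTES>
  </ARTICLE>
</NEWSPAPER>
```

Entity references such as &COPYRIGHT; let a phrase be defined once in the DTD and reused throughout the document.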