this electronic form is original
This section is intended to help people who have modified the TEI DTD and want to migrate these modifications from SGML to XML, i.e., who want to use the XML-based P4 DTD with equivalent modifications. We begin with some general remarks, then describe an example DTD modification that covers the most important issues, outline a recommended migration procedure, and carry out the key step hands-on with the example.
If the elements or content models that the TEI provides don't
quite meet the requirements of your project, there is an official
escape route: you can modify the DTD in a number of well-defined ways
and your documents still remain TEI conformant.
This involves
creating two extension files, setting some parameter entities,
possibly defining new elements or redefining existing ones, and
making these modifications known to the parser in the DTD subset at
the beginning of the document.
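As an illustration, the DTD subset for a P3 prose document might look roughly like the sketch below. TEI.extensions.ent and TEI.extensions.dtd are the parameter entities through which P3 reads local extension files. The public and system identifiers shown are assumptions to check against your local installation, and my_sgml.ent/my_sgml.dtd are the file names used in the example later in this section.

```xml
<!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN" "tei2.dtd" [
<!-- select the base tag set for prose -->
<!ENTITY % TEI.prose 'INCLUDE'>
<!-- local entity declarations; read before the TEI's own entity
     declarations, so values set here take precedence -->
<!ENTITY % TEI.extensions.ent SYSTEM 'my_sgml.ent'>
<!-- new and redefined element declarations -->
<!ENTITY % TEI.extensions.dtd SYSTEM 'my_sgml.dtd'>
]>
```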
Although the process is a lot simpler than it looks at first
glance, many people have taken unofficial escape routes, especially
the users of the TEI Lite DTD, who would have been required to first
switch to a full TEI DTD before applying local extensions. It is
admittedly simpler to just open your local copy of teilite.dtd
and change a few lines. Only later will you find out why the TEI
Guidelines don't advertise this, and one of those moments could be
the migration of your customized DTD to XML.
If you are in this situation now, there are two and a half ways to proceed:
That said, the rest of this section is meant to help you migrate DTD extensions made using the official procedures. So what types of TEI extensions exist, and what is involved in migrating them from SGML to XML? The Guidelines recognize four kinds of modification:
- deletion of elements;
- renaming of elements;
- extension of classes;
- modification of content models or attribute lists.
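The first three kinds each come down to a single parameter entity declaration in the entity extension file. The sketch below uses hypothetical element choices for illustration. The n.* and x.* naming conventions are the TEI's, but you should check the exact entity names for your elements and classes in the DTD files.

```xml
<!-- entity extension file (element choices are hypothetical) -->

<!-- 1. Deletion: each element declaration sits in a marked section
        keyed on a parameter entity named after the element;
        setting it to IGNORE suppresses the declaration. -->
<!ENTITY % q 'IGNORE'>

<!-- 2. Renaming: documents will use <annotation> instead of <note>. -->
<!ENTITY % n.note 'annotation'>

<!-- 3. Class extension: add a new element to the phrase class; the
        element itself must then be declared in the .dtd file. -->
<!ENTITY % x.phrase 'myPhrase |'>
```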
For practical purposes, the fourth item can be subdivided into:
The first three cases are extremely easy; the items of the fourth group require more detailed attention. The following is a short list of some of the critical issues involved. In the following subsections, we will work through a fictitious example that covers most of these issues.
- The - or O indicators in element declarations, which show whether start and end tags are required or can be omitted. These indicators no longer exist in XML DTDs, so your private DTD snippets need to be modified.
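To make the difference concrete, here is the same pair of declarations in both syntaxes. The element names are invented for illustration.

```xml
<!-- SGML: "- -" means both tags are required,
     "- O" means the end tag may be omitted -->
<!ELEMENT myNote - - (#PCDATA)>
<!ELEMENT myPara - O (#PCDATA | hi)*>

<!-- XML: no minimization indicators; every element needs both tags
     (or, if empty, an empty-element tag) -->
<!ELEMENT myNote (#PCDATA)>
<!ELEMENT myPara (#PCDATA | hi)*>
```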
In this subsection, we will do some simple TEI DTD modifications
in SGML. This will then serve as a tutorial example for the
migration to XML. While working on this example, the main problems
in converting DTD extensions should be covered. Not everyone will
need everything treated here, and some needs might not be covered,
but this should be an easy, hands-on starting point for most
projects.
Let's assume that five years ago, we wanted a TEI P3 DTD for prose that met the following extra requirements (these requirements are tutorial examples only and say nothing about whether they are recommended TEI practice):
imageurl, which contains a URL for an image of the page;
These requirements can be cast into TEI SGML by creating two
files, my_sgml.ent
and my_sgml.dtd
that look as
follows:
A sample document godot.sgml
using these extensions would
look like this:
I will come really soon now.
This example will be migrated to TEI P4 XML in the next two subsections.
Although the following step-by-step list may sound over-protective, this approach is recommended to help you keep a clear head while you are converting DTD and documents. You are switching from P3 to P4, from SGML to XML DTDs, from SGML to XML documents and from SGML to XML parsers at the same time, and it can be difficult to find one's way through these many potential pitfalls.
Of the procedure recommended above, we will now focus on the step of
rewriting the DTD extensions in XML, using the example DTD modification
described earlier as a basis. We will create the files my_xml.ent
and my_xml.dtd.
Before we start, though, we have to make a strategic decision: shall we burn our bridges and support only XML in the future? The P4 DTD provides mechanisms for parsing both XML and SGML, and we can do the same for our customized DTD if we need to support SGML in parallel for a while. This takes a little more thought and effort, but in return you get the comfort of a safe transition period.
In this document, we shall call this the dual-use approach.
One obvious syntactic difference between SGML and XML DTDs lies in the
minimization indicators - and O, which appear in SGML element declarations
and indicate whether start and end tags need to be present. They are
superfluous in XML, where minimization is not allowed, and have been dropped.
The TEI P4 DTD provides and uses the parameter entities %om.RO and %om.RR
in their place. For SGML parsing, they expand to - O and - - respectively;
for XML parsing, they expand to nothing. om.RO is used for elements that
require only a start tag (mostly empty elements); om.RR is used for elements
that require both start and end tags (non-empty elements should be defined that
way). We can make use of this mechanism for our dual-use DTD
extensions; another useful parameter entity is %TEI.XML
that will
expand to IGNORE
for SGML parsing and to INCLUDE
for XML
parsing.
So let's go to work:
If we first check for consistent case of the element and
attribute names in our DTD, we discover that element
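XML names are case-sensitive. SGML parsers, by contrast, normally fold element and attribute names to a single case (the reference concrete syntax specifies NAMECASE GENERAL YES). The following is a hypothetical mismatch of the kind this check is meant to catch. The element name and the file name are invented.

```xml
<!-- DTD: attribute declared in camel case (hypothetical element) -->
<!ATTLIST myMark imageUrl CDATA #IMPLIED>

<!-- XML document: imageurl and imageUrl are different names here, so an
     XML validator reports imageurl as undeclared; an SGML parser that
     folds name case would have accepted the equivalent SGML tag. -->
<myMark imageurl="page001.png"/>
```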
Some things are easy: the renaming of - -
in the definition
of %om.RR;
.
This decision comes up again with the imageUrl
and it will be
merged with the existing ATTLIST in the TEI DTD files; there is no need
to suppress and copy the definition of
For continuing support of SGML, we have to suppress and
redefine the element as before. We find it in the TEI DTD files, copy
the P4 definition and modify it for our extra attribute, also replacing
- -
with %om.RR;
in the element definition.
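The two alternatives can be sketched as follows. Here myElement is a hypothetical stand-in for whichever element receives the new imageUrl attribute, and its content model is a placeholder; the real declaration and attribute list must be copied from the TEI DTD files. In XML, several ATTLIST declarations for the same element are merged, and if an attribute is declared twice the first declaration is binding (XML 1.0, section 3.3). As a result, the XML-only variant needs just one line. The dual-use variant suppresses and redeclares the element as described above, assuming the P4 files keep the name-keyed marked sections.

```xml
<!-- Alternative A, XML only (.dtd extension file):
     merged with the TEI's own ATTLIST for the element -->
<!ATTLIST myElement imageUrl CDATA #IMPLIED>

<!-- Alternative B, dual-use, step 1 (.ent extension file):
     suppress the TEI declaration of the element -->
<!ENTITY % myElement 'IGNORE'>

<!-- Alternative B, step 2 (.dtd extension file): copy the P4 declaration,
     use %om.RR; instead of "- -", and add the new attribute to the
     copied attribute list (copied attributes elided in this sketch) -->
<!ELEMENT myElement %om.RR; (#PCDATA)>
<!ATTLIST myElement
          imageUrl CDATA #IMPLIED>
```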
The most difficult problem is the
Also, the simple way of allowing Incl
that is part of
every content model within
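XML has no inclusion exceptions. An SGML inclusion such as +(note) allows the element anywhere inside the declared element and all of its descendants. An XML DTD therefore has to list that element explicitly in every content model where it should still be allowed. The before-and-after sketch below uses hypothetical names and is only an approximation: for example, it no longer allows note before head. The P4 DTD's own handling of its global inclusions is defined in the P4 files.

```xml
<!-- SGML: note may appear anywhere inside myDiv, including inside myPara -->
<!ELEMENT myDiv  - - (head?, myPara+) +(note)>
<!ELEMENT myPara - O (#PCDATA | hi)*>

<!-- XML: the inclusion is spelled out in each content model that needs it -->
<!ELEMENT myDiv  (head?, (note | myPara)+)>
<!ELEMENT myPara (#PCDATA | hi | note)*>
```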
The first runs with the XML parser result in many warnings
because of redefined parameter entities; this is normal. Some syntax
corrections are required where XML is stricter than SGML: we forgot a
semicolon in a parameter entity reference, %paraContent
must not
be in parentheses while the #PCDATA
for
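These two corrections can be illustrated with hypothetical entity and element names. SGML lets a parameter entity reference end without its semicolon when the next character cannot be part of a name, but XML always requires the semicolon. XML also accepts mixed content only in the flat form (#PCDATA | a | b)*. An entity that already contains the whole starred group must therefore not be wrapped in a second pair of parentheses.

```xml
<!ENTITY % myInline 'hi | name'>
<!ENTITY % myContent '(#PCDATA | %myInline;)*'>

<!-- SGML accepts both of these; XML rejects both:
     <!ELEMENT myPara - - (#PCDATA | %myInline)*>
     <!ELEMENT myNote - - (%myContent;)>
-->

<!-- XML: semicolon present, mixed content not nested in another group -->
<!ELEMENT myPara (#PCDATA | %myInline;)*>
<!ELEMENT myNote %myContent;>
```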
The cross-check of the dual-use version with the SGML parser exposes a small additional problem: the document now uses the character entities &lt; and &gt;, which are predefined in XML but not in SGML. Once discovered, this is easily fixed. In the example file, you will find a solution that looks a little complicated but works flawlessly with both SGML and XML.
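One way to provide &lt; and &gt; for both parsers is the pair of declarations suggested in section 4.6 of the XML 1.0 specification. This may differ from the solution in the example file. XML allows the predefined entities to be declared, provided the declarations are equivalent to the built-in ones. For SGML, these declarations supply the definitions that are otherwise missing.

```xml
<!-- "&#38;#60;" becomes "&#60;" when the declaration is parsed; that
     character reference is expanded when &lt; is used in text, so the
     resulting "<" is treated as data rather than as the start of a tag -->
<!ENTITY lt "&#38;#60;">
<!-- ">" is not a delimiter in content, so one level of escaping suffices -->
<!ENTITY gt "&#62;">
```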
The reworked extension files in the XML-only form look like this:
In the dual-use form, the DTD extension file comes out a little longer (my_dual.ent
is identical to my_xml.ent
above):
We can easily convert our short test document manually. All that
needs to change is the initial XML declaration, the XML-specific
parameter entity, the empty-tag syntax for
I will come really soon now.
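The head of the converted test document might look roughly like the excerpt below. The public identifier is an assumption to check against your P4 installation. The extension file names follow the naming used above, and pb stands in for whichever empty element the document actually uses.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
<!-- switch the P4 DTD to its XML declarations -->
<!ENTITY % TEI.XML 'INCLUDE'>
<!ENTITY % TEI.prose 'INCLUDE'>
<!ENTITY % TEI.extensions.ent SYSTEM 'my_xml.ent'>
<!ENTITY % TEI.extensions.dtd SYSTEM 'my_xml.dtd'>
]>
<TEI.2>
<!-- teiHeader and the start of the text omitted in this excerpt -->
<!-- SGML allowed <pb n="1">; XML needs the empty-element tag: -->
<pb n="1"/>
<!-- remainder omitted -->
</TEI.2>
```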