The XSLT Processing Model
Although we often talk of an XSLT processor as something
that turns one XML document into another (or into an HTML or text document),
this is not strictly true. The specification actually talks in terms of a source tree
(or input
tree) and a result tree. There is
therefore an assumption that, for example, if we are starting from a text
document rather than an existing DOM tree, it has been turned into some sort of
tree structure before the XSLT processor starts its work, and that the result
tree will be used for further processing or serialized in some way to create
another text document.
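To make the tree-in, tree-out idea concrete, here is a minimal sketch in Python using only the standard library. It is not an XSLT processor: the "transformation" is written by hand, and the one-book sample string is an assumption made for the illustration. It only shows the three stages described above: text parsed into a source tree, a result tree built from it, and the result tree serialized back to text.

```python
import xml.etree.ElementTree as ET

# Stage 1: a text document is parsed into a source tree.
source_text = "<Catalog><Book><Title>Professional ASP 3.0</Title></Book></Catalog>"
source_tree = ET.fromstring(source_text)

# Stage 2: the transformation works tree to tree.
# Here the result tree is built by hand rather than by a stylesheet.
result_tree = ET.Element("Books")
for book in source_tree.findall("Book"):
    ET.SubElement(result_tree, "Book").text = book.findtext("Title")

# Stage 3: the result tree is serialized to create another text document.
result_text = ET.tostring(result_tree, encoding="unicode")
print(result_text)  # <Books><Book>Professional ASP 3.0</Book></Books>
```

An XSLT processor is only responsible for the middle stage; parsing and serialization happen before and after it.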
The model, including formatting, therefore looks like this:
This concept is simple enough. But you will have read in
Chapter 1 that XSLT is a declarative language and uses templates. How does this
work in practice? Let's have a look at a simple XML document and stylesheet,
and walk through the processing.
Processing a Document
Here is my XML document. It is the book catalog that you
will be familiar with if you have read Professional
XML (Wrox Press, ISBN 1-861003-11-0), although I have cut it down to just
two books, removed some elements, and
renamed it shortcatalog.xml:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Catalog>
  <Book>
    <Title>Designing Distributed Applications</Title>
    <Authors>
      <Author>Stephen Mohr</Author>
    </Authors>
    <PubDate>May 1999</PubDate>
    <ISBN>1-861002-27-0</ISBN>
    <Price>$49.99</Price>
  </Book>
  <Book>
    <Title>Professional ASP 3.0</Title>
    <Authors>
      <Author>Alex Homer</Author>
      <Author>Brian Francis</Author>
      <Author>David Sussman</Author>
    </Authors>
    <PubDate>October 1999</PubDate>
    <ISBN>1-861002-61-0</ISBN>
    <Price>$59.99</Price>
  </Book>
</Catalog>
We'll look at the XSLT
stylesheet we use to transform this document shortly, but let's now become an
XSLT processor and see what happens. We already know that, as an XSLT
processor, we cannot work on the source XML text directly, but need a tree
representation based on the structure and content of the document. So here it is:
Each node is described by a block of three rectangles. In
the top rectangle is the node type, with the node name in the rectangle below
it. The bottom rectangle contains an asterisk if the node has element content,
and the text if it has text content.
At the top of the tree is the root node, or document root. Don't confuse
this with the root element (or document element) familiar from XML.
The document root is the base of the document, and has the document element (<Catalog>)
as a child. Any other top-level nodes (which might be comments or processing
instructions) are also its children. The XML declaration appears at this level
too, although strictly speaking the XPath data model does not treat it as a
node. The document element contains two child <Book> elements, and these hold
the information about the books.
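As a rough illustration of the difference between the document root and the document element, here is a minimal sketch using Python's standard xml.dom.minidom module. This is a DOM, not the XSLT tree model, so the correspondence (the DOM document node standing in for the root node) is only an analogy, and the inline sample document with its comment is made up for the example.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    "<!-- a top-level comment -->"
    "<Catalog><Book/><Book/></Catalog>"
)

# The document node sits above everything, like the root node in the text.
assert doc.nodeType == Node.DOCUMENT_NODE

# Its children are the top-level nodes. minidom exposes no node for the
# XML declaration, so only the comment and the document element appear.
top_level = [node.nodeName for node in doc.childNodes]
print(top_level)  # ['#comment', 'Catalog']

# The document element is a child of the document node, not the node itself.
catalog = doc.documentElement
books = [n for n in catalog.childNodes if n.nodeType == Node.ELEMENT_NODE]
print(catalog.tagName, len(books))  # Catalog 2
```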
Now that we have the tree structure, we can start to populate
and process it. This is the processing model we will
use:
Before XSL processing starts, both the source document and
XSLT stylesheet must be loaded into the processor's memory. How this happens is
dependent on the implementation. One option is that both are loaded as DOM
documents under the control of a program. Another option is that the stylesheet
is referenced by a processing instruction in the source XML document. IE5 can
operate in this way, and will automatically load the stylesheet when the XML
document is loaded.
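The second option relies on an xml-stylesheet processing instruction placed before the document element. The sketch below, again using Python's standard library purely for illustration, shows how a loader might find that instruction. The type value text/xsl and the pairing of this instruction with TitleAndDate.xsl are assumptions made for the example, not something the original document contains.

```python
from xml.dom import minidom, Node

# Hypothetical source document that names its own stylesheet.
source = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="TitleAndDate.xsl"?>'
    "<Catalog/>"
)
doc = minidom.parseString(source)

# Processing instructions that precede the document element are
# children of the document node, so we only need to scan the top level.
stylesheet_pis = [
    node
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    and node.target == "xml-stylesheet"
]

pi = stylesheet_pis[0]
print(pi.target)  # xml-stylesheet
print(pi.data)    # type="text/xsl" href="TitleAndDate.xsl"
```

The data of a processing instruction is just a string; the type and href "pseudo-attributes" are a convention that the loader has to parse for itself.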
And here is the XSLT stylesheet (TitleAndDate.xsl)
we will use to process shortcatalog.xml
to get a new XML document
listing just the titles of the books and their publication dates:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Catalog">
    <Books>
      <xsl:apply-templates/>
    </Books>
  </xsl:template>
  <xsl:template match="Book">
    <Book>
      <xsl:value-of select="Title"/>, <xsl:value-of select="PubDate"/>
    </Book>
  </xsl:template>
</xsl:stylesheet>
Once the documents are in memory, we can start our
processing. The XSL processor starts by reading the template for the document
root from the stylesheet (step 1). Here is that template:
<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>
The first line indicates that it is a template, with a match
attribute to indicate the node or nodes it is matching. The attribute value is
a pattern written in a subset of XPath syntax, in this case just /
to indicate the document root.
Working round the diagram, at step 2 we find the source node
(strictly, the node-set, but here it will comprise a single node) in the source
tree that the template matches. This will be the document root. The second line
of the template moves us on to step 3 and indicates that we will execute
whatever templates apply to the children of this node. Below the document root
we have two items: the XML declaration and the <Catalog>
element.
Looking through the stylesheet, there is no template for the
XML declaration (it is not a node in the XPath data model, so XSLT gives us no
access to it), but there is one for the <Catalog>
element. Processing a document using XSL is a recursive process, and we are now
back to step 1 with a new template. Here is the template:
<xsl:template match="Catalog">
  <Books>
    <xsl:apply-templates/>
  </Books>
</xsl:template>
This contains some text, which looks like another element
called <Books>.
As our diagram indicates, we will transform this into a result node at step 3.
It also contains an <xsl:apply-templates/> instruction, so we will
again look for templates to execute matching the child nodes.
The only children of the <Catalog>
element are the two <Book>
elements, so we will read the template for these elements and go round the
circle again. Here is the template:
<xsl:template match="Book">
  <Book>
    <xsl:value-of select="Title"/>, <xsl:value-of select="PubDate"/>
  </Book>
</xsl:template>
This time, for each <Book>
element we are creating
a <Book>
element in the result tree. Into this, we are placing the value of the <Title>
element, then some literal text comprising a comma and a space, then the value
of the <PubDate>
element.
Note that the value of an element in XSLT is not
the same as with the Document Object Model (DOM). With the DOM, the value of an
element is always null, while in XSLT it is all the text between the start and
end tags, including any text inside descendant elements.
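The difference can be seen in a small sketch using Python's standard library. The DOM part uses xml.dom.minidom; the string_value helper is a hypothetical function written for this example that approximates the XSLT notion of an element's value by joining all the text it contains.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

xml_text = (
    "<Book><Title>Professional ASP 3.0</Title>"
    "<PubDate>October 1999</PubDate></Book>"
)

# DOM view: an element node has no value of its own.
title_dom = minidom.parseString(xml_text).getElementsByTagName("Title")[0]
dom_value = title_dom.nodeValue
print(dom_value)  # None


def string_value(element):
    """Approximate the XSLT value of an element: all contained text, joined."""
    return "".join(element.itertext())


# XSLT-style view: the value is the text between the start and end tags.
book = ET.fromstring(xml_text)
title_value = string_value(book.find("Title"))
book_value = string_value(book)
print(title_value)  # Professional ASP 3.0
print(book_value)   # Professional ASP 3.0October 1999
```

Note how the value of the whole <Book> element runs the text of its children together, which is why the stylesheet selects Title and PubDate separately.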
At this point we stop since we have no more <xsl:apply-templates/>
elements. This means that no other elements in the source document will get
processed, but then that's what we wanted.
So how are we constructing the result tree? Let's work this
one from the bottom up. When we execute the template for <Book>,
we create the new <Book> element, and then replace the line:
<xsl:value-of select="Title"/>, <xsl:value-of select="PubDate"/>
with the result of evaluating the statements. For the first
book, that will be:
Designing Distributed Applications, May 1999
So overall, our result node will look like:
<Book>Designing Distributed Applications, May 1999</Book>
Since we have two <Book> elements in the source tree, we will get two <Book> elements in the result tree:
<Book>Designing Distributed Applications, May 1999</Book>
<Book>Professional ASP 3.0, October 1999</Book>
Similarly, in the template for <Catalog>,
we will replace the line:
<xsl:apply-templates/>
with the results generated by executing the instruction.
This will put the two <Book> elements we have created inside a <Books>
element. The result tree now looks like this:
<Books>
<Book>Designing Distributed Applications, May 1999</Book>
<Book>Professional ASP 3.0, October 1999</Book>
</Books>
I have added the line breaks and formatting to make the
output look better.
Moving back to the first template we came across, the one
for the document root, we can see that this adds no further content, so our
output is exactly as I have just shown.
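The walk-through above can be mimicked in a few lines of Python. This is only a sketch of the recursion, not how a real XSLT processor is implemented: the TEMPLATES table, the function names, and the placeholder result-tree element are all inventions for the example. It also simplifies in two ways: it ignores text nodes between elements, and it skips elements that have no template instead of applying XSLT's built-in rules.

```python
import xml.etree.ElementTree as ET

SOURCE = """<Catalog>
  <Book><Title>Designing Distributed Applications</Title><PubDate>May 1999</PubDate></Book>
  <Book><Title>Professional ASP 3.0</Title><PubDate>October 1999</PubDate></Book>
</Catalog>"""


def apply_templates(children, result_parent):
    """Mimic <xsl:apply-templates/>: run the matching template for each child."""
    for child in children:
        template = TEMPLATES.get(child.tag)
        if template is not None:
            template(child, result_parent)


def catalog_template(node, result_parent):
    # match="Catalog": create <Books>, then process the children inside it.
    books = ET.SubElement(result_parent, "Books")
    apply_templates(list(node), books)


def book_template(node, result_parent):
    # match="Book": create <Book> holding "Title, PubDate"; no further recursion.
    book = ET.SubElement(result_parent, "Book")
    book.text = node.findtext("Title") + ", " + node.findtext("PubDate")


TEMPLATES = {"Catalog": catalog_template, "Book": book_template}


def transform(source_text):
    document_element = ET.fromstring(source_text)
    # Placeholder standing in for the root of the result tree.
    result_root = ET.Element("result-tree")
    # match="/": adds nothing itself, just processes the root's children.
    apply_templates([document_element], result_root)
    return result_root[0]


result_text = ET.tostring(transform(SOURCE), encoding="unicode")
print(result_text)
```

Running it prints the same <Books> element we built by hand, without the added line breaks. The templates themselves never call each other directly; all the control flow goes through apply_templates, which is what makes the processing recursive and driven by the shape of the source tree.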
In our processing model, we
now break out of our cycle and format the output (step 5). In this case, we
have no formatting, so there is no further processing. Later in this book, we
will see how we can format using a standard web-browser and HTML, or using the
Formatting Objects part of the XSL specification (XSL-FO).
Note that the XSLT specification says that "XSLT is not intended
as a completely general-purpose XML transformation language. Rather it is
designed primarily for the kinds of transformations that are needed when XSLT
is used as part of XSL." However, in the majority of cases, XSLT is used
independently of XSL-FO, just as we are doing here and will do again when we
produce HTML using XSLT. The specification acknowledges this with the statement
"XSLT is also designed to be used independently of XSL."
Using any of the processors described in Appendix E we can
run the XSLT stylesheet with the XML. For example, if we now invoke XT with the
command line:
xt shortcatalog.xml TitleAndDate.xsl TitleAndDate.xml
we produce a file TitleAndDate.xml
with the content:
<?xml version="1.0" encoding="utf-8"?>
<Books>
<Book>Designing Distributed Applications, May 1999</Book>
<Book>Professional ASP 3.0, October 1999</Book>
</Books>
XT has put an XML declaration at the top, but otherwise it
is exactly as we generated ourselves.