A Sample Chapter from Early Adopter VoiceXML
VoiceXML with XSLT (HTML and WML)
This chapter examines the use of the Extensible Stylesheet
Language for Transformations (XSLT) as a tool for the generation of VoiceXML. I
intend to illustrate a complete, end-to-end example of implementing a voice interface
for a client-server database via XML and XSL. The case study will also demonstrate
the power of XSL for simultaneously delivering multiple interfaces to the same data,
by developing HTML and WML front-ends alongside the voice interface.
Our case study takes us inside the rarefied atmosphere
of a fictional cash-strapped dot-com called MyRubberBands.com, purporting to be
the "premier rubber bands site on the Internet". In the aftermath of the
stock market meltdown, where our heroes saw their market valuation drop by over
95%, senior management, led by CEO Dr. Todd, has decreed that adding
WML and voice functionality to the existing order status web site is do-or-die.
MyRubberBands.com's competitors have just rolled out their own WAP/voice access solution,
and an all-out effort is necessary to catch up. Follow the programmers as they embark
on their project to quickly roll out an equivalent capability.
Our development team have decided to implement an XSLT-based
solution to the problem. XSLT is an XML-based language for transforming input structured
according to one XML vocabulary to structured output in another XML vocabulary,
or some general text form.
XSLT treats the document to be transformed as a set of
nodes. An XSLT stylesheet defines a set of rules, or templates. When a template
matches one of the nodes in the source document, the output structure given in the
template is created in the transformed document. XSLT uses the W3C XPath language
to select nodes in XML data. XPath plays a role loosely analogous to that of a SQL
query, and lets us specify complex rules to match nodes in a document.
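To make the rule-matching idea concrete, here is a minimal sketch in Python rather than XSLT. It is an illustration only: the `TEMPLATES` table and the `apply_templates` function are hypothetical names that mimic how a stylesheet dispatches on node names, and the second order in the sample data is invented. Python's standard `xml.etree.ElementTree` module understands only a small subset of XPath, but that is enough to show a predicate-style match.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<order_history customer_id="1">
  <order id="1"><order_status>Processing</order_status></order>
  <order id="2"><order_status>Shipped</order_status></order>
</order_history>
"""

# Each "template" is a function keyed by the element name it matches.
def order_history_template(node):
    # Roughly what xsl:apply-templates does: process the children in turn.
    return "".join(apply_templates(child) for child in node)

def order_template(node):
    status = node.findtext("order_status")
    return f"Order {node.get('id')} is {status}.\n"

TEMPLATES = {
    "order_history": order_history_template,
    "order": order_template,
}

def apply_templates(node):
    """Find the rule matching this node and emit its output."""
    template = TEMPLATES.get(node.tag)
    return template(node) if template else ""

root = ET.fromstring(SOURCE)
output = apply_templates(root)

# ElementTree supports a small XPath subset; this predicate selects
# only the orders whose status is "Processing".
processing = root.findall("./order[order_status='Processing']")
processing_ids = [o.get("id") for o in processing]
```

A real XSLT processor does the same dispatching for us, with the full XPath language available in each template's match pattern.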
For many applications, and for getting off to a quick
start when processing XML, XSLT is just the ticket, especially when you consider
that it is a relatively new technology, so processors should still have plenty
of room for performance improvements. Using an XSLT processor also avoids the startup
overhead of using a full parser API from a compiled language, making it better suited
to dynamic web applications.
MyRubberBands.com – A Case Study
Our legacy database is implemented with MySQL,
an open source client-server relational database management system. SQL code for
the schema and a set of sample data is included in the code download for this book.
An XML schema is used to represent an export of the legacy
database instead of a Document Type Definition (DTD) because the schema standard
is now complete, and increasing numbers of developers will be looking for the extra
power schemas offer, especially as new development tools become available. An excellent
primer on schemas is available from the W3C at http://www.w3.org/TR/xmlschema-0/.
Scripts to export the database to XML format have been
written using the Perl language and the Database Interface (DBI) library. The
Perl script shown was developed on Windows using ActiveState Perl, but should run
on any platform, be it Windows or Unix. Many commercial databases, such as SQL Server
2000, are capable of exporting directly to XML, in which case this step could be avoided
entirely.
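The shape of such an export script can be sketched briefly. The version below is not the book's Perl script: it uses Python, and an in-memory SQLite database stands in for the MySQL server so that the example is self-contained. The table definition is an assumption, with column names borrowed from the XML vocabulary used in this chapter, and `export_customers` is a hypothetical function name.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite stands in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'John', 'Public')")

def export_customers(conn, export_type="single"):
    """Return an XML string with one <customer_record> per table row."""
    root = ET.Element("myrubberbands", export_type=export_type)
    cursor = conn.execute(
        "SELECT id, firstname, lastname FROM customer ORDER BY id"
    )
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = ET.SubElement(root, "customer_record")
        # The primary key becomes an attribute, the other columns child elements.
        customer = ET.SubElement(record, "customer", id=str(row[0]))
        for name, value in zip(columns[1:], row[1:]):
            ET.SubElement(customer, name).text = value
    return ET.tostring(root, encoding="unicode")

xml_text = export_customers(conn)
```

Building the document through an XML library, rather than by concatenating strings, means special characters in the data are escaped for us.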
Business Requirements
With their competitors rolling out both voice
and WAP access to services, MyRubberBands.com has no choice but to follow suit or
lose market share in the cutthroat world of elastic band marketing. Due to market
pressures, the new system must be up and running as soon as possible, and given
this short development cycle, the requirements have been scaled back to providing
simply voice and WAP access to a customer's order status data.
However, some thought can still be given to the future.
Rather than developing a quick and dirty "throwaway" voice interface, the engineers
can put in a little extra work now and build a reusable infrastructure.
By exporting their database to an XML format, they gain access to the power of XSL to create
VoiceXML and WML interfaces, and are able to transparently replace parts of the
existing HTML site with dynamically created pages.
System Architecture
The figure below is a block diagram showing the existing components
of the system, and the relationships to the new XML/XSLT system required to implement
the voice interface.
5628_08_01.vsd, a Visio drawing
This drawing is not complete in all areas. For example,
no method for user login and authentication is shown, because such a system would
already exist for an e-commerce site. XML/XSLT would be helpful
for creating device-specific login code, but we are not going to examine on-the-fly
transformation (inside a web server, for example) in this particular study.
Designing a Voice Interface
With these rather vague requirements in mind,
we can make some design decisions, and sketch out a rough model for the voice interactivity
envisaged. The goal is to make the experience simple and intuitive.
- There will be a main menu of options. This is the entry
  point to the application, and the user can always return to it with a single voice
  command.
- Online help will always be available. This will use the
  VoiceXML <help> tag to simplify implementation, and also to override any built-in
  help that may be offered by the voice platform.
- The number of available options from the main menu should
  be kept to a minimum. The total number of states should also be minimized. This
  means that the behavior of the current command should not depend on which command
  was issued previously. For example, the word "menu" should always refer to
  the main menu in every context.
- The top-level commands from the main menu should always
  be active. If the main menu offers the command "foo", the user should
  be able to say "foo" at any point in later dialogs with the same result.
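These goals amount to a small state machine in which the top-level commands win in every state. The following Python sketch shows one way to express that rule. The state names, the `next order` command, and the `next_state` function are hypothetical, chosen only for illustration; the top-level command phrases are the ones used by this application.

```python
# Commands offered by the main menu; these stay active in every dialog.
GLOBAL_COMMANDS = {
    "menu": "main_menu",
    "order status": "order_status",
    "product list": "product_list",
    "more information": "more_information",
}

# Commands that only make sense inside one dialog (hypothetical example).
LOCAL_COMMANDS = {
    "order_status": {"next order": "order_status"},
}

def next_state(current, utterance):
    """Return the dialog to enter after the caller says `utterance`."""
    command = utterance.strip().lower()
    # Top-level commands are checked first, so they mean the same everywhere.
    if command in GLOBAL_COMMANDS:
        return GLOBAL_COMMANDS[command]
    local = LOCAL_COMMANDS.get(current, {})
    # Unrecognized input leaves the caller where they are.
    return local.get(command, current)
```

Because the global table is consulted before anything else, the result of saying "menu" never depends on the previous command, which is exactly the property the list above asks for.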
The following state diagram illustrates these design goals.
The main options are "order status", "product list" (with a
link to voice ordering via the existing phone service bureau), and "more information"
to access a frequently asked questions list. For a more detailed examination of
the issues to consider when designing voice applications, refer to Chapter 6. The
order status menu leads to a variable number of additional choices, depending on
the number of records in the user's order history.
5628_08_02.vsd
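Since the order status menu grows with the order history, it has to be generated rather than written by hand. In the approach taken in this chapter that job belongs to an XSLT stylesheet; the Python sketch below only stands in for it, to show the shape of the output. The second order in the sample data, the `#order` dialog names, the prompt wording, and the `build_order_menu` function are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

HISTORY = """
<order_history customer_id="1">
  <order id="1"><order_status>Processing</order_status></order>
  <order id="2"><order_status>Shipped</order_status></order>
</order_history>
"""

def build_order_menu(history_xml):
    """Build a VoiceXML menu with one <choice> per order in the history."""
    history = ET.fromstring(history_xml)
    vxml = ET.Element("vxml", version="1.0")
    menu = ET.SubElement(vxml, "menu", id="order_status")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Which order would you like to hear about?"
    for order in history.findall("order"):
        order_id = order.get("id")
        # Each choice jumps to a dialog named after the order (hypothetical ids).
        choice = ET.SubElement(menu, "choice", next=f"#order{order_id}")
        choice.text = f"order {order_id}"
    return vxml

vxml = build_order_menu(HISTORY)
choices = [c.get("next") for c in vxml.iter("choice")]
```

A customer with ten orders would get ten choices from the same code, with no change to the generator.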
Creating a Markup Language
Naturally, our fictional rubber band team
already has a database-driven e-commerce web site. Like all legacy databases, it
has evolved over time into a hodgepodge of tables, some of which were hastily knocked
together to implement poorly-defined requirements. We will assume that the company
is operating a traditional JavaServer Pages (JSP) site.
Since most of the tables in the database are relevant
to the requirements of the various interfaces, the developers plump for a "verbose"
approach to their XML. They will dump all of the data from all of the tables into
XML form, even though some of it may be unnecessary in the VoiceXML, WML, or HTML
contexts.
MyRubberbandsML by Trial and Error
The first thing any XML dialect needs
is a top-level element. Since we might want to export all the customers in the database,
or only one at a time, let's add an attribute on the top-level element to describe
what kind of data feed this XML document constitutes.
<myrubberbands export_type="single">
The thing we are most interested in is a customer record,
because that will be the set of data needed to generate the voice interface for
querying order status. Since we might have more than one customer in a file, each
individual <customer> and their associated order history will be contained
by a <customer_record> element. Note that the time stamps are in the standard
XML Schema dateTime format (ISO 8601), and won't translate easily for rendering by a TTS engine.
The <customer_record> element that starts here is very lengthy, and is not closed until the associated
addresses and order history that follow have been given.
  <customer_record>
    <customer id="1">
      <firstname>John</firstname>
      <middle>Quincy</middle>
      <lastname>Public</lastname>
      <username_or_email>jqp@foo.foo.com</username_or_email>
      <password_or_pin>bar</password_or_pin>
      <date_joined>2001-05-18T16:17:15</date_joined>
      <date_lastchg>2001-05-18T16:17:15</date_lastchg>
    </customer>
As shown in the database schema, a customer can have one
or more addresses. The XML representation should preserve the foreign key relationship
with the customer table, and this relationship should not depend on the position
of the elements. Although both the customer profile and all associated
addresses are nested within the <customer_record> tag here, all of
the <customer_address> elements also carry the customer_id attribute, which
mirrors the relationship between the customer and customer_address tables in the
database schema.
In this case, the database schema allows the customer_address
table to store real physical addresses, such as billing and shipping addresses, or e-mail
addresses for alternative methods of customer contact. Hence, the <customer_address>
element can contain the optional <email> element.
    <customer_address address_type="Ship To Address" customer_id="1">
      <address1>4321 La Place Ct</address1>
      <address2>Ste 306</address2>
      <city>Carlsbad</city>
      <state_or_prov>CA</state_or_prov>
      <postalcode>92008</postalcode>
      <email></email>
      <phone>7605551212</phone>
    </customer_address>
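The practical benefit of carrying customer_id on each address is that a consumer of the document can pair addresses with customers by key rather than by nesting or order. The short Python sketch below illustrates this with a trimmed-down record in which the address deliberately appears before the customer. The `addresses_for` helper is a hypothetical name, and the attribute predicate is part of the XPath subset supported by Python's standard library.

```python
import xml.etree.ElementTree as ET

RECORD = """
<customer_record>
  <customer_address address_type="Ship To Address" customer_id="1">
    <city>Carlsbad</city>
  </customer_address>
  <customer id="1"><lastname>Public</lastname></customer>
</customer_record>
"""

def addresses_for(record, customer_id):
    """Select addresses by their customer_id attribute, not by position."""
    return record.findall(
        f".//customer_address[@customer_id='{customer_id}']"
    )

record = ET.fromstring(RECORD)
customer = record.find("customer")
addresses = addresses_for(record, customer.get("id"))
cities = [a.findtext("city") for a in addresses]
```

In a stylesheet the same lookup would be written as an XPath predicate on the customer_id attribute.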
Since the main objective of the voice application is to
allow the user access to their order history and status information, it makes sense
to include the history inside the <customer_record> scope. In this case, because
we will need to enunciate the order time, and because we'd rather not use XSL's
limited text processing capabilities, we'll add the attribute sayas to the <order_date>
element. This provides a pronunciation that can be used with the VoiceXML <sayas>
tag for TTS. However, the desired pronunciation of the date and time cannot be derived
from the database alone, as addressed in the section Generating MyRubberbandsML.
    <order_history customer_id="1">
      <order id="1">
        <customer_address address_type="Ship To Address" customer_id="1"/>
        <order_date sayas="May 18, 2001 at 16 17 hours">
          2001-05-18T16:17:16
        </order_date>
        <order_status>Processing</order_status>
        <tax>0.09</tax>
        <shipping_charge>0.4</shipping_charge>
        <total_charge>2.48</total_charge>
        <product id="1" quantity="3"/>
        <product id="3" quantity="1"/>
      </order>
    </order_history>
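Producing the sayas text is easy to do in a general-purpose language at export time. As an illustration only (the book's export scripts are written in Perl, and `sayas_for` is a hypothetical name), here is a Python sketch that turns the stored time stamp into the spoken form used above. Reading the minutes as two digits is an assumption about how the phrase should sound for times such as 16:05.

```python
from datetime import datetime

# Spelled out here rather than taken from strftime("%B"), whose output
# depends on the locale the script runs under.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def sayas_for(timestamp):
    """Turn '2001-05-18T16:17:16' into 'May 18, 2001 at 16 17 hours'."""
    # strip() tolerates the whitespace around the element's text content.
    moment = datetime.strptime(timestamp.strip(), "%Y-%m-%dT%H:%M:%S")
    return (f"{MONTHS[moment.month - 1]} {moment.day}, {moment.year} "
            f"at {moment.hour} {moment.minute:02d} hours")

spoken = sayas_for("2001-05-18T16:17:16")
```

The seconds are parsed but deliberately left out of the spoken form, matching the attribute value in the listing.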
Finally, to provide the user with detailed order information,
the full product name and description must be available. This will also allow the
user to ask for a product list, and eventually we can perhaps extend the interface
to enable products to be ordered by voice. Note that the <product_list> is
not associated with any particular <customer_record>.
Also note that, ideally, the <product_list> wouldn't
actually be in the same document as the customer data. However, we'll keep everything
in one file here to avoid the issue of linking between documents. It might amuse
us to picture the harried developers reaching the same conclusion to save time and
give themselves some chance of meeting their beloved boss's deadline. Later on,
they will no doubt want to refine the process and generate smaller XML documents
that can be processed more quickly.
  </customer_record>
  <product_list>
    <product id="1" name="MIXED1000" price="1.99">
      Mixed Bag of 1000 Rubber Bands</product>
    <product id="2" name="MIXED5000" price="4.09">
      Mixed Bag of 5000 Rubber Bands</product>
    <product id="3" name="RED1000" price="2.19">
      Bag of 1000 Red Rubber Bands</product>
    <product id="4" name="RED10000" price="17.49">
      Bag of 10000 Red Rubber Bands</product>
    <product id="5" name="BLUE1000" price="0.99">
      Bag of 1000 Blue Rubber Bands</product>
    <product id="6" name="BLUE10000" price="8.99">
      Bag of 10000 Blue Rubber Bands</product>
  </product_list>
</myrubberbands>
This example is formatted to fit the space above and, for readability, adds quite
a bit of whitespace between the <product> and </product> tags that would probably
not occur in a real document.
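To see how the product list supports detailed order information, consider resolving the product ids in an order against the list. The Python sketch below does this on a trimmed copy of the document; `describe_order` is a hypothetical helper, and an XSLT stylesheet would perform the equivalent lookup with an XPath expression. Note the call to `strip()`, which removes the extra whitespace mentioned above.

```python
import xml.etree.ElementTree as ET

DOC = """
<myrubberbands export_type="single">
  <customer_record>
    <order_history customer_id="1">
      <order id="1">
        <product id="1" quantity="3"/>
        <product id="3" quantity="1"/>
      </order>
    </order_history>
  </customer_record>
  <product_list>
    <product id="1" name="MIXED1000" price="1.99">
      Mixed Bag of 1000 Rubber Bands</product>
    <product id="3" name="RED1000" price="2.19">
      Bag of 1000 Red Rubber Bands</product>
  </product_list>
</myrubberbands>
"""

def describe_order(root, order_id):
    """Return 'quantity x description' lines for one order."""
    # Map each product id to its description, trimming layout whitespace.
    catalog = {
        p.get("id"): p.text.strip()
        for p in root.findall("product_list/product")
    }
    order = root.find(f".//order[@id='{order_id}']")
    return [
        f"{item.get('quantity')} x {catalog[item.get('id')]}"
        for item in order.findall("product")
    ]

lines = describe_order(ET.fromstring(DOC), "1")
```

The resulting lines are plain text that could be dropped into a VoiceXML prompt, a WML card, or an HTML table.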