Generating MyRubberbandsML
We have now examined the existing database, outlined a suitable voice interface
for it, and defined our source markup language and the method for generating it.
We are ready to create a stylesheet that converts this markup to a VoiceXML document
implementing the design from the previous section, Designing A Voice Interface.
This stylesheet, myrubberbands2vxml.xsl, is quite lengthy, and can be found in its
entirety in the code download. Here, I shall pick out just the important points in
the code for discussion, including the dynamic generation of grammars, some VoiceXML
features worthy of particular attention, and the fundamental XSL concepts used.
VoiceXML Stylesheet
Note that the stylesheet is designed to produce a single VoiceXML document containing
just one user's data. So, its top-level template only matches the <customer_record>
element of documents whose root <myrubberbands> element has the export_type attribute
set to single. The indent attribute on the <xsl:output> tag produces an indented
result document that is easier for a human reader to examine.
<?xml version = "1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>
<xsl:template match="myrubberbands[@export_type='single']/customer_record">
<vxml version="1.0">
<meta name="author" content="Underpaid Myrubberbands Engineer"/>
<meta name="copyright" content="Copyright (C) 2001 Myrubberbands.com"/>
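For reference, the XPath expressions used throughout this stylesheet imply an input document shaped roughly like the one below. This is a hypothetical sketch, not a file from the code download: the element and attribute names are taken from the stylesheet itself, but the exact nesting, the format of the sayas attribute, and every value (customer name, ids, dates, product names, prices) are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical MyRubberbandsML sample; all values are invented -->
<myrubberbands export_type="single">
  <customer_record>
    <customer id="1">
      <firstname>Pat</firstname>
    </customer>
    <order_history>
      <order>
        <!-- The real format of the sayas attribute is an assumption -->
        <order_date sayas="20010704">2001-07-04</order_date>
        <product id="p1" quantity="2"/>
        <product id="p2" quantity="1"/>
        <total_charge>12.50</total_charge>
        <order_status>shipped</order_status>
      </order>
    </order_history>
  </customer_record>
  <!-- Catalog section, looked up by id from each order's product elements -->
  <product_list>
    <product id="p1">Jumbo rubber bands</product>
    <product id="p2">Assorted colors pack</product>
  </product_list>
</myrubberbands>
```

Keeping a small document like this at hand makes it easier to follow which node is the context node at each point in the stylesheet.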
The next block illustrates one way XSL can generate elements with an attribute having
dynamic content: the <xsl:element> construct.
<xsl:element name="meta">
<xsl:attribute name="name">description</xsl:attribute>
<xsl:attribute name="content">Voice Interface for #<xsl:value-of
select="customer/@id"/>
</xsl:attribute>
</xsl:element>
We will need to set up some variables for use in the VoiceXML document. First off,
we grab the user's Automatic Number Identification (ANI) and Dialed Number Identification
Service (DNIS) values for later use. These correspond to the phone number that originated
the call (analogous, but not identical, to the consumer caller ID service) and the
number that the user dialed. The implementation of these is system dependent, and
the data may not be available for all calls in any case. They are included here
mainly for illustration. In a real application, the ANI could be used for
auto-identification of the user.
The form_pointer variable will be used for navigation later.
<var name="customer_ani" expr="session.telephone.ani"/>
<var name="customer_dnis" expr="session.telephone.dnis"/>
<var name="session_error_count" expr="0"/>
<var name="form_pointer" expr="'mainMenu'"/>
<var name="user_command" expr="''"/>
Next come the form level help dialogs, which here attempt to mimic the responses
you might get from a real-life call center, contrary to the advice of Chapter 6:
<help count="1">
What seems to be the trouble?
<reprompt/>
</help>
<help count="2">
Come on - isn't this easy enough to understand?
<reprompt/>
</help>
<help count="3">
Hey <xsl:value-of select="customer/firstname"/>, are
you stupid or something?
<reprompt/>
</help>
The design specifies that the main menu command is always available. We can implement
that with a global VoiceXML <link> element:
<link next="#mainMenu">
<grammar type="application/x-jsgf"> main menu </grammar>
</link>
Again, we use the information from the XML file to customize
the prompts for the user. This form welcomeMessage corresponds to the "welcome
message" box in the interface design diagram earlier.
<form id="welcomeMessage">
<block>
<prompt bargein="false" timeout="0.1s">
Hello, <xsl:value-of select="customer/firstname"/>.
Welcome to the my rubber bands dot com voice
order status system.
</prompt>
<goto next="#mainMenu"/>
</block>
</form>
Next, we come to the mainMenu form, the primary form of the voice application.
There is a form level <nomatch> element here to transfer control flow to the
errorHandler form when an utterance doesn't match the grammar. The errorHandler
form handles most no-match events throughout the application, so that it can keep
track of the total number of such errors that have occurred this session. The <noinput>
handler here ensures the main menu is repeated when the user doesn't respond to
the prompt. In a future version of the product, the designers may implement some
kind of timeout to restrict the number of loops, and disconnect the user if there
is no response for a long time, but this issue need not concern us now.
<form id="mainMenu">
<nomatch>
<goto next="#errorHandler"/>
</nomatch>
<noinput>
<goto next="#mainMenu"/>
</noinput>
<block>
<assign name="form_pointer" expr="'mainMenu'"/>
</block>
<field name="userSaid">
<prompt bargein="true" timeout="3s">
This is the main menu.
<break msecs="500"/>
You can say product list to hear a list of products.
<break msecs="500"/>
You can say order status to check your order
status.
<break msecs="500"/>
You can say frequently asked questions to get
more
information.
<break msecs="500"/>
You can always say main menu to return to this
menu, or help for additional help.
</prompt>
<grammar type="application/x-jsgf">
list |
product list |
more information |
frequently asked questions |
questions |
order status
</grammar>
In order to keep track of the currently active form, the
global variable form_pointer is set. This could be used to implement specific navigation
logic, for example, by changing the behavior of a command slightly depending on
where the command originated. Once the prompt has been played, and the user has
responded with an utterance matching the inline grammar, the <filled> handler
for this <field> is entered. This copies the value returned by the grammar
to the global variable user_command, which is used by the navigator form to direct
control flow to the required form. We could also use <subdialog> and pass
a parameter, but this is simpler, and sufficient for this application.
<filled>
<assign name="user_command" expr="userSaid"/>
<goto next="#navigator"/>
</filled>
</field>
</form>
Now that our main menu form is finished, we can start to implement the dialogs
that provide the application's basic functionality. First up is the orderStatus
form, which makes the most extensive use of dynamic content generation:
<form id="orderStatus">
<noinput>
<assign name="user_command" expr="'main menu'"/>
<goto next="#navigator"/>
</noinput>
<nomatch>
<goto next="#errorHandler"/>
</nomatch>
<field name="userSaid">
We need a test here to check whether the customer has any orders in the exported
history, and to play the appropriate message in either case:
<prompt bargein="true" timeout="1s">
<xsl:if test="count(/myrubberbands/customer_record/order_history/order) != 0">
This is a list of all your orders.
</xsl:if>
<xsl:if test="count(/myrubberbands/customer_record/order_history/order) = 0">
You have not placed any orders within the last thirty days.
</xsl:if>
<break msecs="500"/>
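The two complementary <xsl:if> tests in this fragment could also be written as a single <xsl:choose>, which states the condition once and makes the either/or intent explicit. The following is a minimal standalone sketch rather than the code from the download. It assumes the input structure implied by the XPath expressions in this chapter, and the bare <prompt> wrapper is only there to make the result well-formed.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/myrubberbands/customer_record">
    <prompt>
      <xsl:choose>
        <!-- A non-empty node-set converts to true, so count() is not needed -->
        <xsl:when test="order_history/order">
          This is a list of all your orders.
        </xsl:when>
        <xsl:otherwise>
          You have not placed any orders within the last thirty days.
        </xsl:otherwise>
      </xsl:choose>
    </prompt>
  </xsl:template>

  <!-- Suppress text output from the product catalog -->
  <xsl:template match="product_list"/>
</xsl:stylesheet>
```

Either form produces the same prompt text; the choice is mostly a matter of readability.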
The next block creates a prompt for each order that was
in the source XML document by using the XSLT element <xsl:for-each> to select
XML elements that match the XPath in its attribute. The sayas attribute from the
<order_date> element in our XML file is used here to provide an audio cue
to identify the order to the user.
<xsl:for-each select="order_history/order">
<sayas class="date">
<xsl:value-of select="order_date/@sayas"/>,
</sayas>
you placed an order. Say order number
<xsl:value-of select="position()"/>
to hear more about it.
<break msecs="500"/>
</xsl:for-each>
</prompt>
<grammar type="application/x-jsgf">
list |
product list |
more information |
frequently asked questions |
questions |
order status |
The grammar also includes an option dynamically generated
by the XSLT code. The user can say, "order number one" to access data
on the first order in their list. This JSGF could be improved to accept shorter
instructions, such as "order one", or even "one", but be aware
that using "order number" followed by the number will help the ASR system
correctly identify the user utterance, and will probably improve application performance
in this situation.
In this chapter, all the grammars are inline, but XSLT
could just as easily be used to create standalone external grammars in separate
files, or to generate grammars in multiple formats (GSL, JSGF, XML) from a common
data source, by the application of different stylesheets.
<xsl:for-each select="order_history/order">
order number <xsl:value-of select="position()"/>
|
</xsl:for-each>
buy me
</grammar>
<filled>
<assign name="form_pointer" expr="'orderStatus'"/>
<assign name="user_command" expr="userSaid"/>
<goto next="#navigator"/>
</filled>
</field>
</form>
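The remark about standalone external grammars can be sketched as a second, much smaller stylesheet that writes a JSGF file from the same source data. This is an illustrative sketch under stated assumptions: the grammar name orderStatus and the rule name command are hypothetical, and whether a given voice platform accepts an external JSGF file in this form would need to be checked against its documentation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Text output: the result is a JSGF grammar file, not XML -->
  <xsl:output method="text" encoding="ISO-8859-1"/>

  <xsl:template match="/">
    <!-- JSGF header and grammar declaration (names are hypothetical) -->
    <xsl:text>#JSGF V1.0;&#10;</xsl:text>
    <xsl:text>grammar orderStatus;&#10;</xsl:text>
    <!-- Fixed commands come first, so no trailing | has to be cleaned up -->
    <xsl:text>public &lt;command&gt; = list | product list | more information</xsl:text>
    <xsl:text> | frequently asked questions | questions | order status | buy me</xsl:text>
    <!-- One alternative per order in this customer's history -->
    <xsl:for-each select="myrubberbands/customer_record/order_history/order">
      <xsl:text> | order number </xsl:text>
      <xsl:value-of select="position()"/>
    </xsl:for-each>
    <xsl:text>;&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because every piece of output is wrapped in <xsl:text>, the whitespace in the generated file is fully under the stylesheet's control.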
Just as we iterated over the order_history/order node-set
to create grammars with options for all the orders of a customer, we must now iterate
over these nodes again to create individual forms containing the details for each
order. If there were consistently a very large number of orders per user, we might
look at generating the order details on the fly using JSP or PHP to create a separate
VoiceXML document.
<xsl:for-each select="order_history/order">
<xsl:variable name="order_detail_counter" select="position()"/>
<form id="order_{$order_detail_counter}">
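The block above illustrates the other way XSLT offers for including dynamic content
in an attribute value: the attribute value template. Inside an attribute of a literal
result element, an XPath expression enclosed in curly braces, {}, is evaluated and
replaced by its string value. Here the expression is a reference to the <xsl:variable>
named order_detail_counter, written by preceding the identifier with the dollar
sign, $. This technique is used several times when creating the VoiceXML forms for
each order.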
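To compare the two ways of producing a dynamic attribute side by side, here is a minimal standalone sketch. It is not part of myrubberbands2vxml.xsl: the <forms> wrapper element and the variable name n are hypothetical, and the stylesheet deliberately emits each form twice so that the two results can be compared in the output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <forms>
      <xsl:for-each select="myrubberbands/customer_record/order_history/order">
        <xsl:variable name="n" select="position()"/>

        <!-- 1. Explicit construction with xsl:element and xsl:attribute -->
        <xsl:element name="form">
          <xsl:attribute name="id">order_<xsl:value-of select="$n"/></xsl:attribute>
        </xsl:element>

        <!-- 2. Attribute value template: the text in {} is evaluated as XPath -->
        <form id="order_{$n}"/>
      </xsl:for-each>
    </forms>
  </xsl:template>
</xsl:stylesheet>
```

Both constructions yield the same element for each order. The attribute value template is shorter, while <xsl:element> is still needed when the element name itself has to be computed.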
<noinput>
<!-- This falls through to order status top level -->
<assign name="user_command" expr="'order status'"/>
<goto next="#navigator"/>
</noinput>
<nomatch>
<goto next="#errorHandler"/>
</nomatch>
<block>
<assign name="form_pointer" expr="'order_{$order_detail_counter}'"/>
</block>
<field name="userSaid">
The following code generates the prompt to announce the order detail:
<prompt bargein="true" timeout="1s">
This order was placed on <sayas class="date">
<xsl:value-of select="order_date/@sayas"/></sayas>.
<break msecs="500"/>
The order consisted of
<xsl:for-each select="product">
quantity <xsl:value-of select="./@quantity"/>
of product
The <xsl:value-of> element below retrieves the name of the product with an XPath
expression that selects the product from the XML product_list section whose id attribute
matches the id attribute of the current <product> element in the order. The current()
function is needed because, inside the predicate, a bare @id would refer to the
candidate product_list entry rather than to the order line being processed. This
is analogous to an SQL join between the customer_order and customer_order_product
tables from the relational schema. Note that XPath denotes an attribute with the
at symbol, @.
<xsl:value-of select="/myrubberbands/product_list/product[@id=current()/@id]"/>
<break msecs="500"/>
</xsl:for-each>
The total of the order was
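When the product list is large, the same lookup can be expressed with an XSLT key, which gives the processor the chance to index the catalog once instead of scanning it for every order line. The sketch below is an alternative to the predicate shown above, not the approach used in the download; the key name product-by-id and the <order_lines> and <line> result elements are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every catalog product by its id attribute -->
  <xsl:key name="product-by-id" match="product_list/product" use="@id"/>

  <xsl:template match="/">
    <order_lines>
      <xsl:for-each select="myrubberbands/customer_record/order_history/order/product">
        <line quantity="{@quantity}">
          <!-- key() replaces the [@id=current()/@id] predicate; here @id
               is the id of the order line, which is the context node -->
          <xsl:value-of select="key('product-by-id', @id)"/>
        </line>
      </xsl:for-each>
    </order_lines>
  </xsl:template>
</xsl:stylesheet>
```

For a handful of products the predicate form is perfectly adequate; the key mainly pays off as the catalog grows.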
Next, we see the VoiceXML <sayas> element put to use again, this time with the
currency class. Whether or not this causes the contents to be rendered correctly
as currency depends on the TTS engine used, and its support for pronunciation markup.
The rest of the block repeats the previous technique to add order options to the
inline <grammar> element.
<sayas class="currency">
$<xsl:value-of select="total_charge"/></sayas>.
The status of this order is
<xsl:value-of select="order_status"/>.
</prompt>
<grammar type="application/x-jsgf">
list |
product list |
more information |
frequently asked questions |
questions |
order status |
<xsl:for-each select="/myrubberbands/customer_record/order_history/order">
order number <xsl:value-of select="position()"/>
|
</xsl:for-each>
buy me
</grammar>
<filled>
<assign name="form_pointer" expr="'order_{$order_detail_counter}'"/>
<assign name="user_command" expr="userSaid"/>
<goto next="#navigator"/>
</filled>
</field>
</form>
</xsl:for-each>
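Since the same list of grammar alternatives is repeated in several forms, one option is to move it into a named template and call that template wherever the grammar is needed. The following standalone sketch shows the idea; the template name command-grammar is hypothetical, and the form is reduced to a bare field to keep the example short.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- Emits the shared inline grammar; can be called from any form -->
  <xsl:template name="command-grammar">
    <grammar type="application/x-jsgf">
      list | product list | more information |
      frequently asked questions | questions | order status |
      <!-- Absolute path, so the result does not depend on the caller's context -->
      <xsl:for-each select="/myrubberbands/customer_record/order_history/order">
        order number <xsl:value-of select="position()"/> |
      </xsl:for-each>
      buy me
    </grammar>
  </xsl:template>

  <xsl:template match="/">
    <vxml version="1.0">
      <form id="orderStatus">
        <field name="userSaid">
          <xsl:call-template name="command-grammar"/>
        </field>
      </form>
    </vxml>
  </xsl:template>
</xsl:stylesheet>
```

A new command then only has to be added in one place, and the navigator form remains the single point where commands are mapped to forms.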
At this point we'll skip over the product list option, because it doesn't illustrate
anything we haven't already seen. However, the product listing prompt offers the
user the option to say, "Buy me", in which case their call is transferred to
MyRubberbands.com's telephone service center by the placeOrder form below. This
is the quickest way for the company to add some commerce capability to the voice
system, but note that the VoiceXML <transfer> element is not implemented by all
platforms. When the user returns from the call, they are unconditionally sent to
the main menu by the subsequent <goto> element.
<form id="placeOrder">
<block>
<prompt bargein="false" timeout="1s">
Transferring your call to customer service.
</prompt>
</block>
<transfer name="callSales" dest="MYRUBBERBND"
connecttimeout="30s"
bridge="true"/>
<block>
<goto next="#mainMenu"/>
</block>
</form>
We can also skip over the frequently asked questions option,
since it contains only static VoiceXML code. We are now almost at the end of our
stylesheet, where the navigator form is located, holding the navigation logic for
many of the state transitions in our interface design.
It consists simply of an if-else construct that expresses
the state transitions on the diagram of the user interface earlier. If more commands,
states, or transitions were to be added to the design, the complexity of the interface
might exceed the limitations of this method of implementation.
<form id="navigator">
<block>
<if cond="user_command == 'product list' || user_command == 'list'">
<goto next="#productList"/>
<elseif cond="user_command == 'questions' ||
user_command == 'frequently asked questions' ||
user_command == 'more information'"/>
<goto next="#FAQ"/>
<elseif cond="user_command == 'order status'"/>
<goto next="#orderStatus"/>
<elseif cond="user_command == 'buy me'"/>
<goto next="#placeOrder"/>
The stylesheet generates the rest of the navigation logic by creating an <elseif>
branch for each individual order in this customer's order history, each of which
jumps to the corresponding order form. It does this by means of the following
<xsl:for-each> construct:
<xsl:for-each select="order_history/order">
<xsl:variable name="order_counter" select="position()"/>
<elseif cond="user_command == 'order number {$order_counter}'"/>
<goto next="#order_{$order_counter}"/>
</xsl:for-each>
<else/>
<goto next="#mainMenu"/>
</if>
</block>
</form>
Finally, we come to the errorHandler form that is used by most of the <nomatch>
handlers in the dialog. This error handler keeps a running count of the number of
errors, and when four errors have occurred, it will apologize and disconnect the
user. This isn't the friendliest way of handling errors, and is not suitable as a
long-term solution, which would more likely transfer the user to a human operator
at this point. Not only that, but we'd probably want more sophisticated logic for
processing errors, maybe using ECMAScript to vary the response according to the
time elapsed since the last error.
<form id="errorHandler">
<block>
<assign name="session_error_count" expr="session_error_count + 1"/>
<if cond="session_error_count &lt; 4">
<prompt bargein="false" timeout="0.1s">
I'm sorry, but I'm unable to understand you.
</prompt>
<if cond="session_error_count &gt; 2">
<prompt bargein="false" timeout="0.1s">
It seems I am having trouble.
</prompt>
</if>
<goto next="#navigator"/>
<else/>
<prompt bargein="false" timeout="0.1s">
I'm sorry, but I'm having a lot
of difficulty understanding you. If you
are currently in a noisy environment,
please call back later.
</prompt>
<exit/>
</if>
</block>
</form>
All that needs to be done now is to close the document,
after including an empty template matching standalone <product_list> elements,
to suppress any output from them. Without this, default XSLT templates would be
applied that output text children of any elements that aren't explicitly matched
by a template already.
</vxml>
</xsl:template>
<xsl:template match="product_list"/>
</xsl:stylesheet>
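An alternative to the empty template is to take control at the document root and apply templates only to the nodes of interest, so that the built-in rules never reach <product_list> at all. This is a sketch of that option, not the structure used in the download; the body of the customer_record template is left as a placeholder.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

  <!-- Start at the root and visit only the single-user customer record -->
  <xsl:template match="/">
    <xsl:apply-templates
        select="myrubberbands[@export_type='single']/customer_record"/>
  </xsl:template>

  <xsl:template match="customer_record">
    <vxml version="1.0">
      <!-- The VoiceXML forms would be generated here -->
    </vxml>
  </xsl:template>
</xsl:stylesheet>
```

With this arrangement, a document whose export_type is not single produces no VoiceXML at all, rather than a stream of stray text nodes from the built-in rules.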
We've now reached the end of our VoiceXML stylesheet.
Running the Stylesheet
Now it is time to run the stylesheet transform, and produce a complete
VoiceXML document for one of our users. If you are using Saxon, enter the following
command at the command prompt:
C:\> saxon customer_1.xml myrubberbands2vxml.xsl > customer_1.vxml
We are now ready for our VoiceXML interface to go live. This simply requires us
to upload the result of our XSLT transformation to our chosen Voice gateway.
Summary
Perhaps the greatest advantage of implementing an XSLT-based system such as this
is that documents in other markup languages can be readily generated from the same
XML source data. The complete version of this chapter goes on to demonstrate how
easy it would be to use MyRubberbandsML with an appropriate XSLT stylesheet to produce
WML and HTML interfaces in addition to the above VoiceXML code.
The full chapter explores the process of using XSLT to provide users with multiple
interfaces to a legacy database that does not provide native XML access to data.
Starting with a set of requirements for a voice interface taken from our business
needs, an XML format to mark up the legacy data is created, along with an associated
XML Schema to document it. We then create XSL templates to automatically generate
our VoiceXML interface, and run this VoiceXML inside a simulator, before moving
on to look at stylesheets that automatically produce parallel WML and HTML interfaces.
Copyright and Authorship Notice
This is an abridged chapter extract and
was written by Stephen Breitenbach. It is taken from "Early Adopter VoiceXML" by
Eve Astrid Andersson, Stephen Breitenbach, Tyler Burd, Nirmal Chidambaram, Paul
Houle, Daniel Newsome, Xiaofei Tang, and Xiaolan Zhu, published by Wrox Press Limited
in August 2001; ISBN 1861005628; copyright © Wrox Press Limited 2001; all rights
reserved.
No part of this chapter may be reproduced, stored in a retrieval system or transmitted
in any form or by any means -- electronic, electrostatic, mechanical, photocopying,
recording or otherwise -- without the prior written permission of the publisher,
except in the case of brief quotations embodied in critical articles or reviews.