XML Processing with TRaX
by Craig Pfeifer
Currently, we have standard APIs for representing an XML document as a tree of objects through the W3C's DOM specification, and as a series of events through the SAX API. JAXP 1.0 gave us a standard Java API for XML parsers, and JAXP 1.1 expands on this to include a standard API for XSLT engines. This standard API is the Transformation API for XML, or TRaX for short. I will cover basic TRaX usage and explain the top-level interfaces to show how powerful this API is. The TRaX implementation that I am working from is the Xalan-Java 2 XSLT processor from the Apache project.
This article assumes an awareness of the major facilities for processing and representing XML documents (DOM, SAX and XSLT), but it is not specific to these technologies.
Purpose of TRaX
The TRaX API extends the original JAXP mission to include XML transformations: to provide a vendor- and implementation-agnostic standard Java API for specifying and executing XML transformations. This is important to note, because TRaX is more than just a standard interface for XSLT engines -- it is designed to be used as a general-purpose transformation interface for XML documents. The TRaX specification is a product of the JAXP 1.1 API, Java Specification Request #63.
TRaX isn't a competitor to the existing DOM, JDOM and SAX APIs used to represent and process XML, but a common Java API to bridge the various XML transformation methods (a la JDBC, JNDI, etc.) including SAX Events and XSLT Templates. In fact, TRaX relies upon a SAX2- and DOM-level-2-compliant XML parser/XSLT engine. JAXP 1.0 allows the developer to change XML parsers by setting a property, and TRaX provides the same functionality for XSLT engines.
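As a rough illustration of the property-based engine selection described above, the sketch below asks JAXP which TransformerFactory it resolves. The class name FactoryLookup and the method describeFactory are made up for this illustration. The system property javax.xml.transform.TransformerFactory is the standard JAXP lookup key, and the Xalan class name in the comment only works if Xalan-Java 2 is on the classpath.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical helper class, not part of TRaX itself.
class FactoryLookup {
    // Standard JAXP system property consulted by TransformerFactory.newInstance().
    static final String FACTORY_PROPERTY = "javax.xml.transform.TransformerFactory";

    // Returns the class name of whichever XSLT engine JAXP resolves.
    static String describeFactory() {
        TransformerFactory tf = TransformerFactory.newInstance();
        return tf.getClass().getName();
    }

    public static void main(String[] args) {
        // To pick an engine explicitly, start the JVM with, for example:
        //   -Djavax.xml.transform.TransformerFactory=org.apache.xalan.processor.TransformerFactoryImpl
        // (assumes Xalan-Java 2 is available on the classpath)
        System.out.println("Requested: "
                + System.getProperty(FACTORY_PROPERTY, "(platform default)"));
        System.out.println("Resolved:  " + describeFactory());
    }
}
```

Because the application code only ever names TransformerFactory, swapping engines should not require recompiling anything.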
Here is a sample of how to apply an XSLT stylesheet to an XML document and write the results out to a file. In this example, both the stylesheet and the XML document exist as files, but they could just as easily have come from any Java InputStream or Reader class. The same follows for the results of the transformation; I could've just as easily written the results out to any Java OutputStream or Writer class.
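import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// create the XML content input source:
// can be a DOM node, SAX stream, or any
// Java input stream/reader
String xmlInputFile = "myXMLinput.xml";
Source xmlSource = new StreamSource(new FileInputStream(xmlInputFile));
// create the XSLT stylesheet input source:
// can be a DOM node, SAX stream, or a
// Java input stream/reader
String xsltInputFile = "myXsltStylesheet.xsl";
Source xsltSource = new StreamSource(new FileInputStream(xsltInputFile));
// create the result target of the transformation:
// can be a DOM node, SAX stream, or a
// Java output stream/writer
String xmlOutputFile = "result.html";
Result transResult = new StreamResult(new FileOutputStream(xmlOutputFile));
// create the TransformerFactory & Transformer instance
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer(xsltSource);
// execute transformation & fill result target object
t.transform(xmlSource, transResult);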
The first three stanzas simply establish our inputs and result targets, and aren't that interesting, with one exception. Notice that the XSLT stylesheet isn't handled via a different class in TRaX. It's treated just like any other XML source document, because that's exactly what it is. We use the stream implementations of the Source and Result interfaces from the javax.xml.transform.stream package to handle reading the data from our file streams.
In the fourth stanza, we use the TransformerFactory to get an instance of a Transformer, and then use the Source instance for the XSLT stylesheet we created in the second stanza to define the transformation that this transformer will perform. A Transformer actually executes the transformation and assembles the result. A single Transformer instance can be reused, but it is not thread-safe.
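Since the stream classes accept any Reader or Writer, the same steps can be tried without touching the file system. The following is a minimal sketch under that assumption. The class InMemoryTransform, its transform method, and the tiny inline stylesheet are invented for illustration. Only the javax.xml.transform calls are part of the API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: transforms XML held in Strings.
class InMemoryTransform {
    // Illustrative stylesheet: wraps the text of <greeting> in a <p> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/greeting'><p><xsl:value-of select='.'/></p></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xslt) throws TransformerException {
        // Sources and Result backed by a Reader/Writer instead of file streams
        Source xmlSource = new StreamSource(new StringReader(xml));
        Source xsltSource = new StreamSource(new StringReader(xslt));
        StringWriter out = new StringWriter();
        Result result = new StreamResult(out);

        // Same factory/transformer steps as the file-based sample;
        // the stylesheet is processed again on every call.
        Transformer t = TransformerFactory.newInstance().newTransformer(xsltSource);
        t.transform(xmlSource, result);
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<greeting>Hello, TRaX</greeting>", XSLT));
    }
}
```

Only the construction of the Source and Result objects differs from the file-based version. The factory and transformer calls are unchanged.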
In this example, the XSLT stylesheet is reprocessed for each successive transformation. A very common case is that the same transformation is applied multiple times to different Sources, perhaps in different threads. A more efficient approach in this case is to process the transformation stylesheet once, and save this object for successive transformations. This is achieved through the use of the TRaX Templates interface.
Templates Code Example
// we've already set up our content Source instance,
// XSLT Source instance, TransformerFactory, and
// Result target from the previous example

// process the XSLT stylesheet into a Templates instance
// with our TransformerFactory instance
Templates t = tf.newTemplates(xsltSource);
// whenever you need to execute this transformation, create
// a new Transformer instance from the Templates instance
Transformer trans = t.newTransformer();
// execute transformation & fill result target object
trans.transform(xmlSource, transResult);
While the Transformer performs the transformation, a Templates instance is the actual run-time representation of the processed transformation instructions. Templates instances may be reused to increase performance, and they are thread-safe. It might seem odd that an interface has a plural name, but it stems from the fact that an XSLT stylesheet consists of a collection of one or more xsl:template elements. Each template element defines a transformation in that stylesheet, so it follows that the simplest name for a representation of a collection of template elements is Templates.
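To make the thread-safety point concrete, here is a hedged sketch of one way to share a single Templates instance across worker threads while giving each task its own Transformer. The class SharedTemplates, the method transformAll, the inline stylesheet, and the pool size of 4 are all assumptions made for this illustration, not something the TRaX specification prescribes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: one shared Templates, one Transformer per task.
class SharedTemplates {
    // Illustrative stylesheet: turns <item>x</item> into <li>x</li>.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/item'><li><xsl:value-of select='.'/></li></xsl:template>"
      + "</xsl:stylesheet>";

    static List<String> transformAll(List<String> docs) throws Exception {
        // Process the stylesheet once; the Templates object is shared by all tasks.
        final Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String doc : docs) {
                futures.add(pool.submit(() -> {
                    // Transformers are not thread-safe, so each task creates its own.
                    Transformer trans = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    trans.transform(new StreamSource(new StringReader(doc)),
                                    new StreamResult(out));
                    return out.toString();
                }));
            }
            // Collect results in the same order as the input documents.
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        docs.add("<item>first</item>");
        docs.add("<item>second</item>");
        for (String s : transformAll(docs)) {
            System.out.println(s);
        }
    }
}
```

Creating a Transformer from an existing Templates instance is expected to be cheap relative to reprocessing the stylesheet, which is why the per-task call to newTransformer() is a reasonable trade here.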