XML to PDF? Oh, FOP It.
by Vikram Goyal
Formatting Objects Processor (FOP) is an open source Java API that can convert your XML data into reports in PDF format, as well as such other relevant formats as TXT, SVG, AWT, MIF, and PS. The software is developed under the Apache XML project and is free to use.
This article shows you how to get started with FOP. The primary advantage of FOP is its ability to convert XML data into reports in the PDF format, using a formatting tree. Most of the examples we'll cover concentrate on this particular conversion, but we will also cover converting XML data to the Java AWT format.
This article is aimed at developers who are comfortable with XML and XSLT. For more information on XML, head over to XML.com.
FOP can be downloaded from the FOP distribution directory. It is available in a bundle as a .gzip file in two distributions. The fop-0.20.4-src distribution contains the source code, so that you can do a build yourself using Ant. The fop-0.20.4-bin distribution contains only the binary distribution, without the source code and the Javadocs.
Extract the source distribution into a directory of your choice. The extraction will create a main directory named fop-0.20.4 and subdirectories build, conf, docs, hyph, lib, and src.
build contains the latest FOP build as the fop.jar binary distributable file, which is the file that needs to be in your application's CLASSPATH.
conf contains certain configuration files that we will discuss later.
docs contains various examples, documentation, and some graphics.
hyph contains hyphenation information for different languages.
lib contains all of the external .jar files that are required for running FOP itself. These include Avalon, Batik, Xalan, and Xerces. Since this article concentrates on FOP, I will not discuss these APIs further. Suffice it to say that FOP uses these APIs in one way or another. This affects our application deployment, as all of these libraries need to be in our application's CLASSPATH at deployment time as well.
src contains all of the source code.
Figure 1. FOP Architecture.
FOP is a tool that understands formatting objects as specified by the World Wide Web Consortium in the XSL specification. The first part of this specification deals with XSLT transformations. We are interested in the second part, which deals with what we call formatting objects (FO). This part of the spec defines output-independent formatting objects, which compose a vocabulary for the style and layout of a document. For example, one of the formatting objects is fo:simple-page-master, which specifies a page template and its relevant properties (margins, headers, etc.). This way, tools like FOP can read this information and render it to the desired output (PDF, TXT, and so on). The main point is that the same styling information can be used to produce different outputs.
An FO document is simply an XML document. Its namespace is defined at the W3C Web site. It may contain any of the elements from this namespace. You can manually create this document and specify exact values for each and every element that should be in the output. The more common approach, however, is to write an XSLT stylesheet to take your XML data file, transform it according to your stylesheet rules, and produce the final FO document. Dynamically-generated data can be combined with an existing stylesheet to produce the FO document.
Although the main idea of FOP is to work on the FO document, it can take over the task of transforming the existing data (XML) using a stylesheet. Let's say you have your business data in XML format and stylesheet information in the form of an XSL file. If you supply these two to FOP, FOP will convert this information to a temporary FO document and render it to your desired output.
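To make the XML-plus-stylesheet route concrete, here is a minimal sketch of a stylesheet that turns business data into an FO document. The input vocabulary (a memo element with title and body children) is hypothetical and is not taken from the example files; only the xsl: instructions and the fo: elements are standard. The master-reference attribute is the name used by the XSL 1.0 Recommendation, and I am assuming your FOP release expects it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical stylesheet. Assumed input shape:
     <memo><title>...</title><body>...</body></memo> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap the whole memo in the FO page structure -->
  <xsl:template match="/memo">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="simple"
            page-height="29.7cm" page-width="21cm"
            margin-top="1cm" margin-bottom="2cm"
            margin-left="2.5cm" margin-right="2.5cm">
          <fo:region-body margin-top="3cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simple">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="title"/>
          <xsl:apply-templates select="body"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- The title becomes a centered heading paragraph -->
  <xsl:template match="title">
    <fo:block font-size="18pt" font-family="sans-serif"
        text-align="center" space-after.optimum="15pt">
      <xsl:value-of select="."/>
    </fo:block>
  </xsl:template>

  <!-- The body becomes a justified paragraph -->
  <xsl:template match="body">
    <fo:block font-size="12pt" font-family="sans-serif"
        text-align="justify">
      <xsl:value-of select="."/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

With the FOP 0.20.x command-line script, a data file and a stylesheet like this are normally passed with the -xml and -xsl options, along the lines of fop -xml memo.xml -xsl memo2fo.xsl -pdf memo.pdf (both input file names are placeholders).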
A Simple Example
Download the example files for this article, which are packaged as a .zip file.
Enough with the theory. Let's get our hands dirty by running FOP. Open a command window and navigate to the directory where you installed FOP. The root FOP directory contains two executables, a shell script for Unix systems and a batch file for Windows, which enable running FOP from the command line. Based on your system, execute the relevant script. FOP will complain that no input was specified and give you some example usage scenarios. Good. This means that you can now start playing with it.
Let us start by creating a simple FO file. If you want to look at the end result, look at krusty.fo.
As I said earlier, an FO file is simply a well-formed XML file. So open up your text editor and make this the first line:
<?xml version="1.0" encoding="utf-8"?>
All FO files must have <fo:root> as the outermost element, and it declares the fo namespace prefix (http://www.w3.org/1999/XSL/Format). The root element is followed by <fo:layout-master-set>, which specifies the layout of the pages in our document.
<fo:layout-master-set>
  <fo:simple-page-master master-name="simple"
      page-height="29.7cm" page-width="21cm"
      margin-top="1cm" margin-bottom="2cm"
      margin-left="2.5cm" margin-right="2.5cm">
    <fo:region-body margin-top="3cm"/>
    <fo:region-before extent="3cm"/>
    <fo:region-after extent="1.5cm"/>
  </fo:simple-page-master>
</fo:layout-master-set>
As you can see, the layout master set contains definitions of simple page layouts on which content can be placed. In our case, we have defined a single simple page master named simple (this is the name that will be used to reference it), with a page height of 29.7 cm, a page width of 21 cm, a top margin of 1 cm, and so on. We can define as many simple page masters as we want and give them different names to reference them later.
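As an illustration of defining more than one page master, the fragment below keeps the simple master from this article and adds a second one. The landscape master, including its name and margins, is a hypothetical addition for illustration only.

```xml
<!-- Fragment: belongs inside fo:root, where the fo prefix is declared -->
<fo:layout-master-set>

  <!-- A4 portrait: the master used throughout this article -->
  <fo:simple-page-master master-name="simple"
      page-height="29.7cm" page-width="21cm"
      margin-top="1cm" margin-bottom="2cm"
      margin-left="2.5cm" margin-right="2.5cm">
    <fo:region-body margin-top="3cm"/>
    <fo:region-before extent="3cm"/>
    <fo:region-after extent="1.5cm"/>
  </fo:simple-page-master>

  <!-- Hypothetical second master: A4 landscape, body region only -->
  <fo:simple-page-master master-name="landscape"
      page-height="21cm" page-width="29.7cm"
      margin-top="1cm" margin-bottom="1cm"
      margin-left="2cm" margin-right="2cm">
    <fo:region-body/>
  </fo:simple-page-master>

</fo:layout-master-set>
```

Each master-name must be unique within the layout master set, because that name is how the rest of the document selects a page layout.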
Now that we have defined what our pages will look like in terms of alignment and size, we need to define the actual content holders for our content. This is where we use the <fo:page-sequence> element.
Notice that while defining the page sequence, we point it at the simple page master, called simple, that we defined earlier. This means that our content will be placed on a page constrained by the boundaries of the simple page master. The actual content can now be placed in the <fo:flow> element. The <fo:block> element, within the <fo:flow> element, starts a paragraph and defines the properties of each paragraph. So for our heading, "Krusty the Clown," we want a sans serif font, a blue background, and center-aligned text. Similarly, for the next block, we want the font size to be 12 pt. and the text alignment to be justified.
<fo:flow flow-name="xsl-region-body">
  <fo:block font-size="18pt" font-family="sans-serif"
      line-height="24pt" space-after.optimum="15pt"
      background-color="blue" color="white"
      text-align="center" padding-top="3pt">
    Krusty the Clown
  </fo:block>
  <fo:block font-size="12pt" font-family="sans-serif"
      line-height="15pt" space-after.optimum="3pt"
      text-align="justify">
    This memo explains why Krusty the Clown is our best customer. We need to take good care of him from now onwards and make sure that there are always enough bananas for his pet monkey.
  </fo:block>
</fo:flow>
Finally, close all open tags and save the file as krusty.fo in the FOP root directory.
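Assembled, the pieces above might look like the following. Treat this as a sketch rather than a copy of the downloadable krusty.fo: the namespace declaration on fo:root comes from the XSL specification, and the master-reference attribute on fo:page-sequence is the name used by the XSL 1.0 Recommendation, which I am assuming your FOP release expects (some older FOP releases used master-name there instead).

```xml
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Page templates -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple"
        page-height="29.7cm" page-width="21cm"
        margin-top="1cm" margin-bottom="2cm"
        margin-left="2.5cm" margin-right="2.5cm">
      <fo:region-body margin-top="3cm"/>
      <fo:region-before extent="3cm"/>
      <fo:region-after extent="1.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Content, laid out on pages built from the "simple" master -->
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">

      <!-- Heading paragraph -->
      <fo:block font-size="18pt" font-family="sans-serif"
          line-height="24pt" space-after.optimum="15pt"
          background-color="blue" color="white"
          text-align="center" padding-top="3pt">
        Krusty the Clown
      </fo:block>

      <!-- Body paragraph -->
      <fo:block font-size="12pt" font-family="sans-serif"
          line-height="15pt" space-after.optimum="3pt"
          text-align="justify">
        This memo explains why Krusty the Clown is our best customer. We need to take good care of him from now onwards and make sure that there are always enough bananas for his pet monkey.
      </fo:block>

    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

The flow-name value xsl-region-body is what directs the flow into the fo:region-body declared in the page master; the before and after regions stay empty here because nothing is assigned to them.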
It's time to see the FOP magic. In the FOP root directory, type the following command:
fop krusty.fo krusty.pdf
FOP will run and transform our krusty.fo file into a krusty.pdf document in the same folder. Open it by double-clicking on it and check that the final outcome is exactly the way we wanted it. Play with the FO file: make changes to it and see how they affect the outcome. Start by changing the text (our content), and then try changing the style, the margins, the color, the font, etc., and see how it all changes.