Web Services for Bioinformatics
by Ethan Cerami, author of Web Services Essentials.
At the January 2002 O'Reilly Bioinformatics conference, Lincoln Stein delivered a keynote address on "Building a Bioinformatics Nation." In this talk, Lincoln argued that current biological databases are islands unto themselves, much like the Italian city-states of the Middle Ages. He also proposed that a more formalized Web Service model could link disparate systems, and thereby create a more unified set of bioinformatics tools and databases. (For more on Lincoln's talk, see his recent Nature article.)
This article follows up on Lincoln's talk and explores two bioinformatic services you can try out today. By examining these specific services, we get a bird's eye view of the Web Service protocol stack, including WSDL and SOAP. Looking at working services also provides much food for thought. For example, the recently released Google API provides a glimpse of the future of business Web Services. In much the same vein, the two examples discussed here offer a glimpse of the future of bioinformatic services.
This article assumes you are familiar with the basic terminology of Web Services. If you need a quick introduction, check out my Web Services FAQ. For an introduction to Web Services for bioinformatics, take a look at Lincoln's PowerPoint slides from the O'Reilly conference.
Our first example is the XEMBL service from the European Bioinformatics Institute. XEMBL provides complete access to the EMBL Nucleotide Sequence Database. This database is produced in collaboration with GenBank and the DNA Database of Japan, and currently provides access to over 16.8 million records, consisting of 19.6 billion nucleotides (see EMBL Database Stats). It also provides access to completed genomes, including those of human, the fruit fly, and C. elegans.
XEMBL is a recently released interface that provides easy XML access to the complete EMBL database. Access is provided via two main methods. The first is a REST-like interface whereby users specify parameters within a URL, and XEMBL returns a complete XML document. The second is a SOAP interface whereby users specify parameters within SOAP messages and XEMBL returns a complete XML document within a SOAP response.
In the current debate between REST and SOAP, the XEMBL group has not taken sides; it simply offers both. This is in line with one of Lincoln's main points: databases should provide multiple modes of access to data, from HTML, XML, and SQL all the way to SOAP.
For the REST-like or SOAP interfaces, XEMBL expects two main parameters: an ID and a format. The ID specifies a unique international accession code; for example, SC49845 specifies the AXL2 gene in baker's yeast. The format indicates the XML format of the returned document. Two format options are currently supported: BSML (Bioinformatics Sequence Markup Language) and AGAVE (Architecture for Genomic Annotation, Visualization and Exchange). Other formats, including GAME and BIOML, are planned for future releases.
Accessing the XEMBL REST Interface
To access the XEMBL REST interface, take the XEMBL URL and pass the ID and format as URL parameters. For example, this URL: http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?id=SC49845&format=Bsml retrieves the SC49845 record in BSML format.
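To make the URL construction concrete, here is a minimal Java sketch that assembles the request URL from an ID and a format. The base URL and parameter names come from the example above; the class name XemblUrlSketch and the method buildUrl are illustrative names, not part of XEMBL, and the URL encoding step is an added precaution rather than something the service is known to require.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class XemblUrlSketch {
    // Base URL as given in the example above.
    static final String BASE_URL = "http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl";

    // Builds the REST-style request URL from an accession ID and an XML format.
    static String buildUrl(String id, String format) {
        try {
            return BASE_URL
                + "?id=" + URLEncoder.encode(id, "UTF-8")
                + "&format=" + URLEncoder.encode(format, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not occur.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the same URL shown in the text for the AXL2 record in BSML format.
        System.out.println(buildUrl("SC49845", "Bsml"));
    }
}
```

Keeping the URL assembly in one method means the ID and format can later come from command-line arguments without changing how the request is formed.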
To create a Java client for XEMBL, you can use any of a number of XML parsers. Example 1 below illustrates the use of JDOM. The program expects two command-line arguments: an ID followed by an XML format.
Example 1: XEMBLClient, Version 1: REST Interface
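A minimal sketch of such a client might look like the following. It is not the original listing: it uses the JDK's built-in DOM parser in place of JDOM so that it compiles without extra libraries, and the class name XemblRestClientSketch and the helper rootElementName are illustrative. It takes an ID and a format on the command line, requests the record, and reports the root element of the returned XML. Whether the endpoint still responds is an assumption this sketch cannot verify.

```java
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XemblRestClientSketch {
    // Base URL as given in the text above.
    static final String BASE_URL = "http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl";

    // Parses an XML stream and returns the name of its root element.
    static String rootElementName(InputStream in) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java XemblRestClientSketch <id> <format>");
            return;
        }
        // Append the ID and format to the base URL as query parameters.
        URL url = new URL(BASE_URL + "?id=" + args[0] + "&format=" + args[1]);
        try (InputStream in = url.openStream()) {
            System.out.println("Root element: " + rootElementName(in));
        }
    }
}
```

A run would be invoked as, for example, java XemblRestClientSketch SC49845 Bsml. Separating the parsing step from the network call keeps the parsing logic usable on any XML stream, including a saved copy of a record.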
As you can see in Example 1, you access XEMBL by specifying the base URL and appending the