
O'Reilly Network: Web Services DevCenter

Web Services for Bioinformatics

by Ethan Cerami, author of Web Services Essentials.

At the January 2002 O'Reilly Bioinformatics conference, Lincoln Stein delivered a keynote address on "Building a Bioinformatics Nation." In this talk, Lincoln argued that current biological databases are islands unto themselves, much like the Italian city-states of the Middle Ages. He also proposed that a more formalized Web Service model could link disparate systems, and thereby create a more unified set of bioinformatics tools and databases. (For more on Lincoln's talk, see his recent Nature article.)

This article follows up on Lincoln's talk and explores two bioinformatic services you can try out today. By examining these specific services, we get a bird's eye view of the Web Service protocol stack, including WSDL and SOAP. Looking at working services also provides much food for thought. For example, the recently released Google API provides a glimpse of the future of business Web Services. In much the same vein, the two examples discussed here offer a glimpse of the future of bioinformatic services.

This article assumes you are familiar with the basic terminology of Web Services. If you need a quick introduction, check out my Web Services FAQ. For an introduction to Web Services for bioinformatics, take a look at Lincoln's PowerPoint slides from the O'Reilly conference.


Our first example is the XEMBL service from the European Bioinformatics Institute. XEMBL provides complete access to the EMBL Nucleotide Sequence Database. This database is produced in collaboration with GenBank and the DNA Database of Japan, and currently provides access to over 16.8 million records, consisting of 19.6 billion nucleotides (see EMBL Database Stats). It also provides access to completed genomes, including those of human, the fruit fly, and C. elegans.

XEMBL is a recently released interface that provides easy XML access to the complete EMBL database. Access is provided via two main methods. The first is a REST-like interface whereby users specify parameters within a URL, and XEMBL returns a complete XML document. The second is a SOAP interface whereby users specify parameters within SOAP messages and XEMBL returns a complete XML document within a SOAP response.

In the current debate between REST and SOAP, the XEMBL group has not taken sides and has simply chosen to support both. This is in line with one of Lincoln's main points: databases should provide multiple modes of access to data, from HTML, XML, and SQL, all the way to SOAP.

Both the REST-like and SOAP interfaces expect two main parameters: an ID and a format. The ID specifies a unique international accession code; for example, SC49845 specifies the AXL2 gene in baker's yeast. The format indicates the XML format of the returned document. Two format options are currently supported: BSML (Bioinformatics Sequence Markup Language) and AGAVE (Architecture for Genomic Annotation, Visualization and Exchange). Other formats, including GAME and BIOML, are planned for future releases.

Accessing the XEMBL REST Interface

To access the XEMBL REST interface, you take the XEMBL base URL and supply the ID and format as URL parameters. For example, this URL: http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?id=SC49845&format=Bsml retrieves the SC49845 record in BSML format.
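
As a small illustration of how such a URL is put together, here is a minimal sketch. The class name XemblUrl and its build method are hypothetical helpers, not part of XEMBL. The accepted format values, Bsml and sciobj (for AGAVE), are taken from the usage message in Example 1; treating them as an exact, case-sensitive list is an assumption made for this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of XEMBL): assembles a query URL
// for the REST-like interface from an ID and a format.
class XemblUrl {
    static final String BASE_URL = "http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl";

    static String build(String id, String format) {
        // Assumption: only these two format values are accepted.
        if (!format.equals("Bsml") && !format.equals("sciobj")) {
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        try {
            // Encode the ID so unexpected characters cannot break the query string.
            return BASE_URL + "?id=" + URLEncoder.encode(id, "UTF-8")
                    + "&format=" + format;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("SC49845", "Bsml"));
    }
}
```

Rejecting unknown formats before connecting gives a clear local error rather than whatever the server returns for a format it does not recognize.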

To create a Java client to XEMBL, you can use any of a number of XML parsers. Example 1 below illustrates the use of JDOM. The program expects two command-line arguments: an ID followed by an XML format.

Example 1: XEMBLClient, Version 1: REST Interface

package com.ecerami.bio;

import java.lang.StringBuffer;
import org.jdom.input.SAXBuilder;
import org.jdom.JDOMException;
import org.jdom.Document;
import org.jdom.output.XMLOutputter;

/**
 * Sample XEMBL Client Program using JDOM
 * For details regarding XEMBL, go to:  http://www.ebi.ac.uk/xembl/
 */
public class XEMBLClient1 {
	private String baseURL = "http://www.ebi.ac.uk/cgi-bin/xembl/XEMBL.pl?";

	public XEMBLClient1 (String id, String format) throws Exception {
		System.out.println ("Connecting to XEMBL...");
		System.out.println ("Retrieving ID:  "+id);
		System.out.println ("Format:  "+format);
		connect (id, format);
	}

	private void connect (String id, String format) throws Exception {
		//  Build document;  validation is turned off
		SAXBuilder builder = new SAXBuilder (false);

		//  Do not load External DTDs
		builder.setFeature (
			"http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

		//  Create XEMBL URL;  append id and format
		StringBuffer url = new StringBuffer (baseURL);
		url.append ("id="+id);
		url.append ("&format="+format);

		//  Download and parse the document, then echo it to standard output
		System.out.println ("Using URL:  "+url.toString());
		Document doc = builder.build (url.toString());
		XMLOutputter outputter = new XMLOutputter();
		outputter.output(doc, System.out);
	}

	public static void main (String[] args) throws Exception {
		//  Require exactly two arguments;  otherwise print usage and exit
		if (args.length != 2) {
			System.out.println ("Usage:  XEMBLClient1 [ID] [Format]");
			System.out.println ("Where Format is:  Bsml or sciobj (for AGAVE)");
			System.exit (1);
		}
		XEMBLClient1 client = new XEMBLClient1(args[0], args[1]);
	}
}

As you can see in Example 1, you access XEMBL by specifying the base URL and appending the id and format parameters. JDOM takes care of the rest by downloading the specified XML file, parsing its contents, and making the contents available to your application. In Example 1, the code simply outputs the contents of the XML file, but you can also use JDOM to extract any specific elements within the returned XML document.
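
To give a feel for what extracting specific elements might look like, here is a minimal sketch. It deliberately uses the DOM parser bundled with the JDK rather than JDOM, so that it runs without extra libraries, and it parses an inline string instead of contacting XEMBL. The sample document is a hypothetical, heavily simplified stand-in and not real XEMBL output; the Sequence element and its id, title, and length attributes are assumptions made for illustration, so check an actual response for the real structure.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical example class: pulls a few attributes out of a sequence record.
class SequenceSummary {
    // Assumed, simplified sample; NOT actual XEMBL output.
    static final String SAMPLE =
          "<Bsml><Definitions><Sequences>"
        + "<Sequence id=\"EXAMPLE1\" title=\"example sequence\" length=\"120\"/>"
        + "</Sequences></Definitions></Bsml>";

    // Returns a one-line summary of the first Sequence element, or null if none exists.
    static String describeFirstSequence(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Find every element named "Sequence", at any depth.
        NodeList sequences = doc.getElementsByTagName("Sequence");
        if (sequences.getLength() == 0) {
            return null;
        }
        Element seq = (Element) sequences.item(0);
        return seq.getAttribute("id") + ": " + seq.getAttribute("title")
            + " (" + seq.getAttribute("length") + " bp)";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeFirstSequence(SAMPLE));
    }
}
```

The same idea carries over to JDOM: once the document is built, you navigate from the root element to the children you care about and read their attributes or text.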

Copyright © 2000-2006 O’Reilly Media, Inc. All Rights Reserved.