Java Trends - XML Processing with JAXP
A universal format used to represent information
XML has emerged as a front runner among formats for manipulating and transmitting data. An XML document consists of data enclosed within identifying tags. The tags may carry attributes that describe the data. The following is an XML representation of a fruit object, taken from fruits.xml, an XML file that contains entries for all the fruit examples.
<fruit>
  <id>1</id>
  <name>Apple</name>
  <color>red</color>
  <color>yellow</color>
  <color>green</color>
  <image>apple.PNG</image>
  <calories>75</calories>
</fruit>
An XML document may be represented as a tree. The root of the tree is the root element, and the innermost elements form the leaves. Consider the following XML representation:
<BigStore>
  <ToysDepartment>
    <Trucks>
      <TruckBrand>
        <Cost>99.99</Cost>
        <Id>100</Id>
      </TruckBrand>
      <TruckBrand>
        <Cost>199.99</Cost>
        <Id>101</Id>
      </TruckBrand>
    </Trucks>
    <Dolls>
      <DollBrand>
        <Cost>129.99</Cost>
        <Id>102</Id>
      </DollBrand>
      <DollBrand>
        <Cost>79.99</Cost>
        <Id>103</Id>
      </DollBrand>
    </Dolls>
    <VideoGames>
      <VideoGameBrand>
        <Cost>29.99</Cost>
        <Id>104</Id>
      </VideoGameBrand>
      <VideoGameBrand>
        <Cost>279.99</Cost>
        <Id>105</Id>
      </VideoGameBrand>
    </VideoGames>
    . . .
  </ToysDepartment>
  . .
</BigStore>
The root of the above XML document is the BigStore node. The leaves of the document are the Cost and Id nodes. A Document Object Model (DOM) parser converts an XML document into this tree view; nodes may then be reached through their parents or by starting from the root of the document.
Figure 10c: The Tree View of BigStore.xml
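The tree view described above can be sketched with a short recursive walk. This is a minimal illustration, not code from the tutorial's files: the class name TreeWalkSketch and the shortened SAMPLE document are assumptions made for the example, and only standard JAXP and DOM calls are used.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TreeWalkSketch {
    // Hypothetical, shortened stand-in for BigStore.xml: one department, one brand.
    static final String SAMPLE =
        "<BigStore><ToysDepartment><Trucks><TruckBrand>"
        + "<Cost>99.99</Cost><Id>100</Id>"
        + "</TruckBrand></Trucks></ToysDepartment></BigStore>";

    // Parses the XML text and returns an indented outline of its element tree.
    static String outline(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder sb = new StringBuilder();
            walk(doc.getDocumentElement(), 0, sb);
            return sb.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Depth-first walk: one line per element, two spaces of indent per level.
    private static void walk(Node node, int depth, StringBuilder sb) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // ignore text, comments, and whitespace between tags
        }
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.getNodeName());
        NodeList children = node.getChildNodes();
        boolean isLeaf = true;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                isLeaf = false;
            }
        }
        if (isLeaf) {
            // Leaves (Cost, Id) carry the data, so show their text.
            sb.append(" = ").append(node.getTextContent().trim());
        }
        sb.append('\n');
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(SAMPLE));
    }
}
```

Running main prints BigStore on the first line with each nested element indented one level further, ending in the Cost and Id leaves with their values.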
An XML document may contain any sort of information and is extremely useful as a universal data source. XML documents follow a strict syntax and are associated with several specifications. The utility of XML documents is enhanced by XML schemas (a schema defines the structure of an XML document). Schemas are themselves XML documents, conventionally with the extension ".xsd". XML validators check XML documents against a specified schema. Validators perform an important function: an entire database may be contained in a single XML document, and the integrity of its data can be verified against a schema. Take a look at the fruits.xsd file. This file contains a schema for fruits.xml.
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fruitlist">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="fruit" maxOccurs="unbounded">
The above lines declare the XML version and the XML Schema namespace that fruits.xsd uses. They also declare an element called 'fruitlist', which is a 'complexType' element (complexType elements may contain other elements and attributes). The 'xs:sequence' directive fixes the order of the elements within the complex type. Here, the 'fruitlist' root element simply contains another complexType element called 'fruit'. The value of the maxOccurs attribute specifies that the 'fruit' element may appear an unlimited number of times within the fruitlist element.
      <xs:attribute name="imagepath" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
The last lines of the schema define the fruitlist element's 'imagepath' attribute. The type of this attribute is set to string, and the attribute is declared compulsory through use="required". This attribute has been given a value in the fruits.xml file (imagepath="e:\JavaTutorial\info").
<xs:complexType>
  <xs:sequence>
    <xs:element name="id" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="color" type="xs:string" maxOccurs="unbounded"/>
    <xs:element name="image" type="xs:string"/>
    <xs:element name="calories" type="xs:integer"/>
  </xs:sequence>
</xs:complexType>
The fruit complexType is restricted to the above elements in a strict sequence. Note that the data type of each element is specified. Also, the 'color' element is allowed to appear an unlimited number of times. Using 'xs:all' instead of 'xs:sequence' would allow the elements within a complexType to appear in any order (in XML Schema 1.0, though, each element inside 'xs:all' may appear at most once, so 'color' could no longer repeat).
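The difference between 'xs:sequence' and 'xs:all' can be tried out with a small experiment. The sketch below is only an illustration under stated assumptions: it uses a cut-down, hypothetical fruit schema with two child elements rather than the real fruits.xsd, and the class and method names (SequenceVersusAllSketch, schemaUsing, isValid) are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SequenceVersusAllSketch {
    static final String IN_ORDER =
        "<fruit><name>Apple</name><calories>75</calories></fruit>";
    static final String SWAPPED =
        "<fruit><calories>75</calories><name>Apple</name></fruit>";

    // Builds a hypothetical cut-down fruit schema whose children are grouped
    // by the given compositor: "sequence" or "all".
    static String schemaUsing(String compositor) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='fruit'><xs:complexType><xs:" + compositor + ">"
            + "<xs:element name='name' type='xs:string'/>"
            + "<xs:element name='calories' type='xs:integer'/>"
            + "</xs:" + compositor + "></xs:complexType></xs:element></xs:schema>";
    }

    // Returns true when the document conforms to the schema, false when the
    // validator (or the schema compiler) reports a problem.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sequence, in order: " + isValid(schemaUsing("sequence"), IN_ORDER));
        System.out.println("sequence, swapped:  " + isValid(schemaUsing("sequence"), SWAPPED));
        System.out.println("all, swapped:       " + isValid(schemaUsing("all"), SWAPPED));
    }
}
```

The swapped document should be rejected under 'xs:sequence' and accepted under 'xs:all', which is the behaviour the paragraph above describes.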
Open the file xmlInput.java in NetBeans.
Change the hard-coded file paths to fruits.xml and fruits.xsd on lines 64 and 70 to reflect the path to these files on your computer (a real enterprise application would store such paths in a properties file rather than hard-code them).
Compile and run the file within NetBeans. The name-value pairs of all fruits and properties in the fruits.xml file will be displayed.
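As a rough idea of what moving the paths into a properties file could look like, here is a minimal sketch. Everything specific in it is an assumption: the keys fruits.xml.path and fruits.xsd.path, the sample paths, and the class PathConfigSketch do not come from xmlInput.java. Forward slashes are used in the sample paths because a backslash is an escape character in properties files.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PathConfigSketch {
    // Loads key=value pairs from any Reader (a FileReader in a real run).
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the configured path, failing loudly when the key is absent
    // so that a missing setting is noticed at startup.
    static String requiredPath(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("missing property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Hypothetical file contents; the keys and paths are illustrative only.
        String sample =
            "fruits.xml.path=e:/JavaTutorial/info/fruits.xml\n"
            + "fruits.xsd.path=e:/JavaTutorial/info/fruits.xsd\n";
        Properties props = load(new StringReader(sample));
        System.out.println(requiredPath(props, "fruits.xml.path"));
        System.out.println(requiredPath(props, "fruits.xsd.path"));
    }
}
```

The returned strings could then be handed to the parser and the schema loader in place of the hard-coded literals.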
JAXP (the Java API for XML Processing) is a Java API for working with XML. It contains libraries that let developers build DOM trees, run SAX parsers, and validate, alter, and transform XML. JAXP matters to Java developers because XML is a common source of input data for many Java applications. Using JAXP to check, validate, and parse an XML document is an instructive exercise. Parsers verify that an XML file is well-formed, meaning that every opened tag is closed and the tags are properly nested. Some parsers also construct a DOM (Document Object Model) tree view of the parsed document.
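import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
Document fruitsDocument = domBuilder.parse("<Input_XML_File_Path>");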
The above lines of code create an instance of a DocumentBuilder object that may be used to parse a given file and construct its tree. This tree is then held in the 'fruitsDocument' Document object. The lines below show one way of reading the value of every branch and element using NodeList and Node objects and the getElementsByTagName(), getChildNodes(), getNodeName(), getNodeValue(), and getTextContent() methods. Other methods that change the values of elements, such as setTextContent(), are also available:
// getDocumentElement() returns the root element (fruitlist) even when
// a comment or other node precedes it in the file
Node base = fruitsDocument.getDocumentElement();
NamedNodeMap attributes = base.getAttributes();
System.out.println("Image Path: " +
    attributes.getNamedItem("imagepath").getNodeValue());
// Visit every fruit element and print its child elements
NodeList nl = fruitsDocument.getElementsByTagName("fruit");
for (int i = 0; i < nl.getLength(); i++) {
    Node n = nl.item(i);
    NodeList fruitPropertiesList = n.getChildNodes();
    for (int j = 0; j < fruitPropertiesList.getLength(); j++) {
        Node properties = fruitPropertiesList.item(j);
        // Skip the whitespace text nodes that sit between elements
        if (properties.getNodeType() == Node.ELEMENT_NODE) {
            System.out.println(properties.getNodeName()
                + ": " + properties.getTextContent());
        }
    }
}
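Changing a value in the tree follows the same pattern as reading it. The sketch below is a hedged illustration rather than code from xmlInput.java: the class DomUpdateSketch and its replaceText method are hypothetical, and the result is serialized with an identity Transformer only so that the change can be inspected.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomUpdateSketch {
    // Replaces the text of every element named tagName and returns the
    // modified document as a string. Intended for leaf elements such as
    // <calories>: setTextContent() discards any existing child nodes.
    static String replaceText(String xml, String tagName, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = 0; i < matches.getLength(); i++) {
                matches.item(i).setTextContent(newText);
            }
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not update document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<fruit><name>Apple</name><calories>75</calories></fruit>";
        System.out.println(replaceText(xml, "calories", "80"));
    }
}
```

In a real program the same Transformer call could write to a file instead of a StringWriter to save the edited document.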
The following lines of code from xmlInput.java create a schema Source that is used to validate the 'fruitsDocument' object created by parsing the XML file. The validator goes beyond the parser by checking that the XML file conforms to its schema. We use the fruits.xsd file to validate fruits.xml. Make some changes to fruits.xml: change the order of elements, leave out attributes, or enter strings instead of integers. If you run xmlInput.java on such non-conforming files, the validator will throw an error that states the place and type of non-compliance with fruits.xsd.
SchemaFactory factory =
    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// Load a W3C XML Schema, represented by a Schema instance
Source schemaFile = new StreamSource(new
    File("<Input_Schema_File_Path>"));
Schema schema = factory.newSchema(schemaFile);
// Create a Validator instance, which can be used
// to validate an instance document
Validator validator = schema.newValidator();
// Note: validating a DOM tree generally requires that it was parsed with
// a namespace-aware factory (domFactory.setNamespaceAware(true))
validator.validate(new DOMSource(fruitsDocument));

Figure 10d: XML Schema Processor
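To see the place and type of each non-compliance without stopping at the first one, a custom ErrorHandler can collect the validator's reports. This is a sketch under assumptions: the class ValidationReportSketch, its SAMPLE_XSD (a cut-down, hypothetical fruit schema), and the message format are invented for the example. It validates a StreamSource rather than a DOMSource so that line and column numbers are available.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidationReportSketch {
    // Hypothetical cut-down schema: a fruit holds a name and integer calories.
    static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='fruit'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='calories' type='xs:integer'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates xml against xsd and returns one line per reported problem.
    // An empty list means the document conforms to the schema.
    static List<String> problems(String xsd, String xml) {
        final List<String> found = new ArrayList<>();
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    found.add(describe("warning", e));
                }
                @Override public void error(SAXParseException e) {
                    found.add(describe("error", e)); // record and keep going
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    found.add(describe("fatal", e));
                    throw e; // processing cannot continue after a fatal error
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            if (found.isEmpty()) {
                found.add("fatal: " + e.getMessage()); // e.g. the schema itself is broken
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
            + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        String bad = "<fruit><name>Apple</name><calories>lots</calories></fruit>";
        for (String problem : problems(SAMPLE_XSD, bad)) {
            System.out.println(problem);
        }
    }
}
```

The exact wording of each message depends on the JAXP implementation in use, so the sketch only prefixes it with a severity and a position.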
XSL stylesheets contain rules for translating an XML document into HTML or some other markup. XSL transformers apply a stylesheet's rules to an XML document. XSL stylesheets have the extension ".xsl". Several commercial products, such as XML Spy, offer an environment for creating XML documents, stylesheets, and schemas.
Figure 10e: XSL Transformer
Take a look at fruits.xsl. This file defines how the fruits.xml file should be translated to HTML. Let us use JAXP to transform fruits.xml to HTML via a servlet that sends the output HTML to the browser.
- Create a new class called xmlToHtmlFruits.java in NetBeans 5.5 and replace its contents with the contents of the supplied xmlToHtmlFruits.java.
- Change the hard coded file paths to fruits.xml and fruits.xsl on lines 64 and 71 to reflect the path to these files on your computer (a real enterprise application would store such paths in a property file rather than hard code them).
- Compile the file and copy xmlToHtmlFruits.class (in MySamples\build\classes) to <Tomcat 5.5 Program Directory>\webapps\ShowFruits\WEB-INF\classes.
- Now, add the following entries to web.xml for xmlToHtmlFruits. Once again, paste the 'servlet' tag under the 'servlet' tags already in the file. Paste the 'servlet-mapping' tag under the existing 'servlet-mapping' tags.
Important: All the 'servlet' tags should be in a group and all the 'servlet-mapping' tags should be in another group below the 'servlet' tags; do not pair a class's servlet and servlet mapping tags together.
Under servlet tags:
<servlet>
  <servlet-name>xmlTransform</servlet-name>
  <servlet-class>xmlToHtmlFruits</servlet-class>
</servlet>
Under servlet-mapping tags:
<servlet-mapping>
  <servlet-name>xmlTransform</servlet-name>
  <url-pattern>/useXML</url-pattern>
</servlet-mapping>
- Now, reload or stop and start the ShowFruits application using the Tomcat Manager page (http://localhost:8080/manager/html).
- Enter the following URL: http://localhost:8080/ShowFruits/useXML
- You will see a table containing all the fruits in the fruits.xml file.
- Test the program by adding new fruit elements in fruits.xml and the corresponding images in <Tomcat 5.5 Program Directory>\webapps\ShowFruits\images.
JAXP lends itself to MVC-style separation. The model lives in the input XML file, the view lives in the XSLT stylesheet, and all the servlet does is perform the transformation. Take a look at the clean code that produces the HTML output (the untidy file path could easily be moved to a properties file):
// Create DOM tree Document
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder domBuilder = domFactory.newDocumentBuilder();
Document fruitsDocument = domBuilder.parse("<Input_XML_File>");
// Set the content type on the HTTP response first,
// then get a handle for writing to it
response.setContentType("text/html");
PrintWriter out = response.getWriter();
// Create a StreamResult object whose output is
// directed to the response
StreamResult myHTML = new StreamResult(out);
// Create a transformer based on the XSL stylesheet
// used to transform the input XML
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(
    new StreamSource("<Input_XSL_File>"));
// Transform the document and send the resulting stream
// to the PrintWriter 'out'
// (myHTML was set up to write to 'out')
transformer.transform(new DOMSource(fruitsDocument), myHTML);
out.close();
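The same transformation can be tried outside Tomcat with in-memory strings. The sketch below is an assumption-laden stand-in: its XSL constant is a small hypothetical stylesheet, not the tutorial's fruits.xsl, and the class XslToHtmlSketch and method toHtml are invented names. Only the model (XML) and the view (XSL) differ from the servlet; the transformation step is the same call.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslToHtmlSketch {
    // Hypothetical stylesheet: one table row per fruit, showing name and calories.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/fruitlist'>"
        + "<table><xsl:for-each select='fruit'>"
        + "<tr><td><xsl:value-of select='name'/></td>"
        + "<td><xsl:value-of select='calories'/></td></tr>"
        + "</xsl:for-each></table>"
        + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the generated HTML.
    static String toHtml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<fruitlist><fruit><name>Apple</name><calories>75</calories></fruit>"
                   + "<fruit><name>Banana</name><calories>105</calories></fruit></fruitlist>";
        System.out.println(toHtml(xml));
    }
}
```

Swapping the StringWriter for the servlet's PrintWriter gives the arrangement used in xmlToHtmlFruits.java.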
JAXP also offers SAX parsers. A SAX parser does not build a tree of the XML document; instead it reads the document sequentially and reports each element to the application as an event. SAX is often preferred over DOM for large or deeply nested XML files because DOM objects take up a lot of memory. Java and XML complement each other, as both are platform-independent and flexible enough to satisfy wide-ranging needs. JAXP furthers this partnership by offering many ways to harness the power of XML.
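A minimal sketch of the SAX style follows. It assumes a fruitlist document shaped like fruits.xml and collects the text of each name element as the parser streams past it; the class SaxNamesSketch and method fruitNames are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxNamesSketch {
    // Streams through the document and returns the text of every <name> element.
    // No tree is built: only the text currently being read is held in memory.
    static List<String> fruitNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <name>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName) && text != null) {
                    names.add(text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<fruitlist><fruit><name>Apple</name></fruit>"
                   + "<fruit><name>Banana</name></fruit></fruitlist>";
        System.out.println(fruitNames(xml));
    }
}
```

The trade-off is that the handler must keep track of where it is in the document itself, which is why the DOM approach can be easier when the file is small.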