Event-based XML parsing in Java

June 22, 2010, (updated on September 6, 2014), Software Development

Introduction

On some resource-constrained devices, and in some programming languages, only event-based XML parsers are available. In this article I will show an easy way to use such a parser and how to read hierarchical XML structures with it.

To use the described method the following artifacts are needed:

  • A method that is called for each start tag, with the name of the tag as a parameter
  • A method that is called for each end tag, with two parameters: the name of the tag and its text value
  • A stack for the created objects

Implementation

In the start method, the program creates the new objects and pushes them onto the stack. In the end method, it reads the topmost object from the stack without removing it and sets the property that corresponds to the ended tag to the tag's value. If the end tag closes the loaded object itself (the last else block in the sample), the object is removed from the stack.
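
The start/end logic can be sketched in a compact, self-contained form. This is only a minimal sketch and not the sample from this article: the Address class, the AddressListHandler class, and the tag names addresses, address, city and country are assumptions made up for illustration. It uses the standard DefaultHandler base class and a Deque as the stack.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical model class, used only in this sketch.
class Address {
    String city;
    String country;
}

// Hypothetical handler showing the stack technique on a flat list of addresses.
class AddressListHandler extends DefaultHandler {
    // The root object is pushed first and is never popped.
    final List<Address> addresses = new ArrayList<>();
    private final Deque<Object> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();

    AddressListHandler() {
        stack.push(addresses);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // drop whitespace collected before this tag
        if (stack.peek() == addresses && qName.equals("address")) {
            Address a = new Address();
            addresses.add(a);
            stack.push(a); // the following end tags refer to this object
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver the text of one element in several chunks.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        text.setLength(0);
        Object top = stack.peek();
        if (top instanceof Address) {
            Address a = (Address) top;
            if (qName.equals("city")) {
                a.city = value;
            } else if (qName.equals("country")) {
                a.country = value;
            } else if (qName.equals("address")) {
                stack.pop(); // the object is complete
            }
        }
    }

    // Convenience method: parses the given XML string and returns the root list.
    static List<Address> parse(String xml) {
        AddressListHandler handler = new AddressListHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.addresses;
    }
}
```

Calling AddressListHandler.parse(xml) on a document with two address elements should return a list with two Address objects. The sketch compares tag names against qName, which assumes the default, non-namespace-aware parser configuration.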

Disadvantages

With event-based XML parsing, the logic for reading several different object types lives in one reader class instead of in the classes that are being parsed.

Sample

In this sample the Java SAX parser is used to read the XML data.

Here is the sample XML to parse:

<?xml version="1.0" encoding="UTF-8"?>
<contactlist>
   <person>
      <name>
         <first>Ingo</first>
         <last>Melzer</last>
      </name>
      <city>Laupheim</city>
      <country>Germany</country>
   </person>
   <person>
      <name>
         <first>Andreas F.</first>
         <last>Borchert</last>
      </name>
      <city>Lonsee</city>
      <country>Germany</country>
   </person>
</contactlist>

Code to parse the sample XML:

import org.xml.sax.SAXException;

public class ContactListReader extends XmlReader {
    // Root object; it stays at the bottom of the stack during parsing
    private ContactList list;

    public ContactListReader(String xml)
    {
        super(xml);
        list = new ContactList();
        getStack().push(list);
    }

    public void startElement(String name) throws SAXException {
        Object o = getStack().peek();

        if(o instanceof ContactList){
            ContactList list = (ContactList)o;

            if(name.equals("person")){
                Person p = new Person();
                list.getPersons().add(p);
                getStack().push(p);
            }

            // It's possible to read different objects on the same hierarchy level
        }

        if(o instanceof Person){
            Person person = (Person)o;

            if(name.equals("name")){
                Name n = new Name();
                person.setName(n);
                getStack().push(n);
            }
        }
    }


    public void endElement(String name, String value) throws SAXException{
        Object o = getStack().peek();

        if(o instanceof ContactList){
            ContactList list = (ContactList)o;

            // no attributes

            // Because this is the root object, it must not be removed 
            // from the stack. At the end it can be read from the stack
        }

        if(o instanceof Person){
            Person person = (Person)o;

            // Set the properties on the Person itself, not on the root list
            if(name.equals("city"))
                person.setCity(value);
            else if(name.equals("country"))
                person.setCountry(value);

            else if(name.equals("person"))
                getStack().pop();
        }

        if(o instanceof Name){
            // Called "n" so the local variable does not clash with the "name" parameter
            Name n = (Name)o;

            if(name.equals("first"))
                n.setFirst(value);
            else if(name.equals("last"))
                n.setLast(value);

            else if(name.equals("name"))
                getStack().pop();
        }
    }
}
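
The reader above relies on the model classes ContactList, Person and Name, which are not listed in this article. A minimal sketch of how they could look is given here; only the methods called by the reader (getPersons, setName, setCity, setCountry, setFirst, setLast) are taken from the sample, while the fields and the getters are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shape of the root object: a plain list of persons.
class ContactList {
    private final List<Person> persons = new ArrayList<>();

    public List<Person> getPersons() { return persons; }
}

// Assumed shape of a person: a nested Name plus two simple text properties.
class Person {
    private Name name;
    private String city;
    private String country;

    public Name getName() { return name; }
    public void setName(Name name) { this.name = name; }

    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }

    public String getCountry() { return country; }
    public void setCountry(String country) { this.country = country; }
}

// Assumed shape of the nested name object.
class Name {
    private String first;
    private String last;

    public String getFirst() { return first; }
    public void setFirst(String first) { this.first = first; }

    public String getLast() { return last; }
    public void setLast(String last) { this.last = last; }
}
```

With classes like these in the same package, creating a ContactListReader for the XML string, calling parse() and then reading getStack().peek() should yield the filled ContactList, because the root object is never removed from the stack.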

The more SAX-specific XmlReader base class:

import javax.xml.parsers.*;
import org.xml.sax.*;
import java.io.*;
import java.util.Stack;

public abstract class XmlReader implements ContentHandler{
    private String xml;
    private String value = "";
    private int level = 0; 
    private Stack<Object> stack;

    public XmlReader(String xml)
    {
        this.xml = xml;
        this.stack = new Stack<Object>();
    }

    public Stack<Object> getStack(){
        return stack;
    }

    public void parse() throws ParserConfigurationException, SAXException, IOException{
        StringReader inStream = new StringReader(xml);
        InputSource inSource = new InputSource(inStream);

        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        SAXParser parser = saxParserFactory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        reader.setContentHandler(this);
        reader.parse(inSource);
    }

    public abstract void startElement(String name) throws SAXException;
    public abstract void endElement(String name, String value) throws SAXException;

    @Override
    public void endDocument() throws SAXException {}

    @Override
    public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException {
        level++;
        startElement(name);
        value = "";
    }

    @Override
    public void endElement(String uri, String localName, String name) throws SAXException {
        level--;
        endElement(name, value);
        value = ""; 
    }

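    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // SAX may deliver the text of one element in several chunks, so append
        value += new String(ch, start, length);
        // Discard a value that consists of a single tab only
        value = !value.equals("\t") ? value : "";
    }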

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {}
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {}
    @Override
    public void processingInstruction(String target, String data) throws SAXException {}
    @Override
    public void setDocumentLocator(Locator locator) {}
    @Override
    public void skippedEntity(String name) throws SAXException {}
    @Override
    public void startDocument() throws SAXException {}
    @Override
    public void startPrefixMapping(String prefix, String uri)throws SAXException {}
}