Friday, June 26, 2009

Different XML processing mechanisms: DOM, OM, SAX

By XML processing I mean the different approaches to creating, reading and updating an XML document.

Creating an XML document:

There are two ways we can create an XML document:

1. DOM

2. OM

DOM is expensive in the sense that it creates an in-memory object for the whole document.

To parse XML files, the commonly available approaches are as follows:
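To make the DOM way of creating a document concrete, here is a minimal sketch using only the standard JDK APIs. The class name DomCreateSketch, the method buildEmployeesXml and the sample values ("Alice", "permanent") are made up for illustration; only the Employee, Name and type names come from the examples in this post.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomCreateSketch {

    // Builds a one-employee document in memory and returns it as a string.
    static String buildEmployeesXml() throws Exception {
        // An empty in-memory document to hang elements on
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

        Element root = doc.createElement("Employees");
        doc.appendChild(root);

        // Sample data is hypothetical
        Element emp = doc.createElement("Employee");
        emp.setAttribute("type", "permanent");
        root.appendChild(emp);

        Element name = doc.createElement("Name");
        name.setTextContent("Alice");
        emp.appendChild(name);

        // Serialize the tree with the identity transformer
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildEmployeesXml());
    }
}
```

Every element lives in memory until the document is serialized, which is exactly the cost mentioned above.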

1. DOM

2. SAX

3. OM


DOM reads the complete XML document fed to it in one shot and creates an in-memory object for it. Holding the complete XML document in memory has its advantages and disadvantages.

Adv: Since we have the complete XML as an in-memory object, we can navigate to different parts of the document, read it and manipulate it.

DisAdv: Suppose we have a document with 3000 Employee elements. DOM will create all 3000 Employee elements in memory, which consumes a lot of memory.

Now some code. Here is how we do that:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// get the document builder
DocumentBuilder db = dbf.newDocumentBuilder();
// parse using the builder to create a DOM representation of the XML
Document dom = db.parse("employees.xml");
// get the root element
Element docEle = dom.getDocumentElement();
// get a NodeList of Employee elements
NodeList nl = docEle.getElementsByTagName("Employee");
for (int i = 0; i < nl.getLength(); i++) {
    // get the employee element
    Element empEl = (Element) nl.item(i);
    // getTextValue and getIntValue are helper methods not shown in this post
    String name = getTextValue(empEl, "Name");
    int id = getIntValue(empEl, "Id");
    int age = getIntValue(empEl, "Age");
    String type = empEl.getAttribute("type");
}
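The loop above calls getTextValue and getIntValue, which are not shown in this post. Here is one possible shape for them, as a self-contained sketch that parses from a string instead of employees.xml. The helper bodies, the class name DomReadSketch, the describe method and the sample data are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomReadSketch {

    // Assumed helper: text of the first <tagName> descendant, or null if absent.
    static String getTextValue(Element parent, String tagName) {
        NodeList nl = parent.getElementsByTagName(tagName);
        if (nl.getLength() == 0) {
            return null;
        }
        return nl.item(0).getTextContent().trim();
    }

    // Assumed helper: same lookup, parsed as an int.
    static int getIntValue(Element parent, String tagName) {
        return Integer.parseInt(getTextValue(parent, tagName));
    }

    // Parses the XML and returns one "name/id/age/type" line per employee.
    static String describe(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = db.parse(new InputSource(new StringReader(xml)));
        NodeList nl = dom.getDocumentElement().getElementsByTagName("Employee");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nl.getLength(); i++) {
            Element empEl = (Element) nl.item(i);
            sb.append(getTextValue(empEl, "Name")).append('/')
              .append(getIntValue(empEl, "Id")).append('/')
              .append(getIntValue(empEl, "Age")).append('/')
              .append(empEl.getAttribute("type")).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document
        String xml = "<Personnel>"
                + "<Employee type=\"permanent\"><Name>Alice</Name><Id>1</Id><Age>30</Age></Employee>"
                + "<Employee type=\"contract\"><Name>Bob</Name><Id>2</Id><Age>41</Age></Employee>"
                + "</Personnel>";
        System.out.print(describe(xml));
    }
}
```

Because the whole tree is in memory, the helpers can look up any child of an Employee element in any order.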
SAX is event based. What does that mean? It means the parser walks through the XML document and reports each event it finds. SAX treats the start of an element, the end of an element and character data as events. Before parsing, we register our handler with the parser, and the parser invokes it whenever it encounters one of these events. We write our code in the handler to process the XML stream thrown by the parser.
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Create a custom handler
public class MyHandler extends DefaultHandler {
    private String tempVal;
    // to maintain context
    private Employee tempEmp;

    // Event handlers
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("Employee")) {
            // create a new instance of employee
            tempEmp = new Employee();
        }
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // the parser may deliver one text node in several calls;
        // this simple version keeps only the latest chunk
        tempVal = new String(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equalsIgnoreCase("Employee")) {
            // add it to the list
        } else if (qName.equalsIgnoreCase("Name")) {
            // tempVal holds the name; store it on tempEmp
        } else if (qName.equalsIgnoreCase("Id")) {
            // tempVal holds the id; store it on tempEmp
        } else if (qName.equalsIgnoreCase("Age")) {
            // tempVal holds the age; store it on tempEmp
        }
    }
}
// get the factory
SAXParserFactory spf = SAXParserFactory.newInstance();
// get a new instance of the parser
SAXParser sp = spf.newSAXParser();
// parse the file and register the handler for callbacks
MyHandler handler = new MyHandler();
sp.parse("employees.xml", handler);
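Putting the handler and the parser call together, here is a self-contained sketch that parses from a string. The Employee class is not shown in this post, so the tiny stand-in below (with plain fields) is an assumption, as are the names SaxReadSketch, EmployeeHandler and parse. The sketch also appends in characters(), since a parser is allowed to report one piece of text in several calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxReadSketch {

    // Minimal stand-in for the post's Employee class (fields are assumptions).
    static class Employee {
        String name;
        int id;
        int age;
        String type;
    }

    static class EmployeeHandler extends DefaultHandler {
        final List<Employee> employees = new ArrayList<Employee>();
        private final StringBuilder text = new StringBuilder();
        private Employee current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // start collecting text for this element
            if (qName.equalsIgnoreCase("Employee")) {
                current = new Employee();
                current.type = attributes.getValue("type");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May fire several times for one text node, so append
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (qName.equalsIgnoreCase("Employee")) {
                employees.add(current);
            } else if (qName.equalsIgnoreCase("Name")) {
                current.name = value;
            } else if (qName.equalsIgnoreCase("Id")) {
                current.id = Integer.parseInt(value);
            } else if (qName.equalsIgnoreCase("Age")) {
                current.age = Integer.parseInt(value);
            }
            text.setLength(0); // discard text once the element is closed
        }
    }

    // Runs the SAX parser over the XML and returns the collected employees.
    static List<Employee> parse(String xml) throws Exception {
        SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
        EmployeeHandler handler = new EmployeeHandler();
        sp.parse(new InputSource(new StringReader(xml)), handler);
        return handler.employees;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document
        String xml = "<Personnel>"
                + "<Employee type=\"permanent\"><Name>Alice</Name><Id>1</Id><Age>30</Age></Employee>"
                + "</Personnel>";
        for (Employee e : parse(xml)) {
            System.out.println(e.name + " " + e.id + " " + e.age + " " + e.type);
        }
    }
}
```

Note that the handler has to carry its own context (the current employee and the text seen so far), because the parser keeps nothing around once an event has been delivered.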
Now here come the advantages and disadvantages of SAX:
Adv: It doesn't consume a lot of memory. Why so? It gives us only the element it is reading at a given moment.
DisAdv: There is no way to go back while traversing the XML document.
Now let's talk about the StAX parser.
StAX (Streaming API for XML) parser:
The two parsers we discussed earlier are push parsers. StAX is a pull parser. Now what does that mean?
In push parsing, once you start parsing, the parser throws all the parsed information at you. The DOM parser reads the complete XML document in one shot, while the SAX parser, once it has started, keeps throwing events as it progresses and can't go back.
In pull parsing we have better control. Let's take an example to compare.
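FileInputStream fileInputStream =
    new FileInputStream(fileLocation);
// assumed: the reader comes from the standard javax.xml.stream.XMLInputFactory
XMLStreamReader xmlStreamReader =
    XMLInputFactory.newInstance().createXMLStreamReader(fileInputStream);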
We have created an XMLStreamReader, which has an iterator-style API for moving forward while parsing.
The thing to note here is that the parser won't move any further until we call next(). With SAX, once we start parsing we don't have any control: the parser keeps throwing events until it reaches the end of the document being parsed.
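To show what that control looks like, here is a minimal sketch of a pull loop that stops as soon as it has read enough, which a SAX handler cannot easily do. The names StaxReadSketch and firstNames, the limit parameter and the sample data are assumptions for illustration; the Name element comes from the earlier examples.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxReadSketch {

    // Pulls events one at a time and stops once `limit` names have been read.
    static List<String> firstNames(String xml, int limit) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        try {
            // The parser advances only when we ask it to
            while (reader.hasNext() && names.size() < limit) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "Name".equals(reader.getLocalName())) {
                    // Reads the text content and moves to the matching end tag
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document
        String xml = "<Personnel>"
                + "<Employee><Name>Alice</Name></Employee>"
                + "<Employee><Name>Bob</Name></Employee>"
                + "</Personnel>";
        System.out.println(firstNames(xml, 1));
    }
}
```

With a limit of 1 the loop exits after the first Name element and the rest of the document is never read.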