Tim Hastings - NonHostile (because there's no need)

Weblog and collection of geeky articles.

  Home :: Who? :: Contact :: Links :: Subscribe subscribe
Snow Day!Our House is Overrun with Animals!!Wake Me Up When September Ends


The Idea

Is there a performance gain to be had by using a dictionary to index the DOM Elements of a large DOM rather than using repeated XPath querys against a large nested DOM.

 

The Tests

To illustrate the performance of XPath a highly nested DOM will be constructed programmatically. A large number of nodes will be created and added as child nodes of a random node previously added.

 

1. The first test will use XPath to locate each node.
2.

The second test will add each element to a Dictionary object and this will be used instead of XPath to located the parent node for each iteration.

The Dictionary is provided by the Microsoft Scripting Runtime library.

3. The third test will use the same approach as test 2, but using VB's native Collection object.
4. Due to the contrived nature of this test (sequential IDs). To optimise this specific problem, test 4 will use a VB Array as a direct hash. This will be our optimal solution, although in a more real-world data related problem, the data may not be hashable as easy as this.

 

A test app is created referencing Microsoft XML 2.6 and Microsoft Scripting Runtime (for the Dictionary).

 

It must be noted that this is a particularly dumb use of a XPath and a DOM object. There are ways to structure XML Documents for better XPath retieval performance.

 

In order to time the results, the GetTickCount API function is used to get the system clock time in milliseconds, although this level of accuracy really isn't needed!

 

Results

To create a structure containing 250,000 nodes, the Dictionary approach took 14 seconds. Using a Collection was quicker still, approx 11 seconds. It used slightly more memory, most likely due to string indexing instead of numbers.

 

Test First Second Third

Memory

XPath

Got bored waiting, progress

slowed exponentially.


Dictionary

14801 14711 14621 approx 80MB

Collection

11246 11556 11166 approx 90MB

VB Array Hash

5027 4957 4957 approx 66MB

 

The XPath solution takes hours.

 

1. Code for the XPath Test (will take hours to complete)

    Private Declare Function GetTickCount Lib "kernel32" () As Long

    Public Sub Main()

        Dim objXML As MSXML2.DOMDocument
        Dim objElem As MSXML2.IXMLDOMElement
        Dim objParent As MSXML2.IXMLDOMElement
        Dim lngPos As Long
        Dim lngParent As Long
        Dim lngTime As Long
        
        ' init DOM
        Set objXML = New MSXML2.DOMDocument
        Set objXML.documentElement = objXML.createElement("root")

        ' clock the start-time
        lngTime = GetTickCount()

        ' populate with lots of elements
        For lngPos = 1 To 250000
            If lngPos = 1 Then
                Set objParent = objXML.documentElement
            Else
                ' randomly pick a node to append this node
                lngParent = 1 + Rnd * (lngPos - 2)
               
                ' use XPath to find element from DOM
                Set objParent = objXML.selectSingleNode("//node[@id=" & CStr(lngParent) & "]")
            End If
           
            ' create a new element belonging to our randomly picked element
            Set objElem = objParent.appendChild(objXML.createElement("node"))
            objElem.setAttribute "id", lngPos
        Next
       
        ' set the time elapsed to the dom
        objXML.documentElement.setAttribute "time", GetTickCount() - lngTime
       
        ' save DOM
        objXML.save "c:\test.xml"

    End Sub
2. Code for the Dictionary Test

 

The above code modified to use a Dictionary object to 'cache' a reference to the each node element. This is dictionary is used to retrieve the element instead of XPath.


    Private Declare Function GetTickCount Lib "kernel32" () As Long

    Public Sub Main()

        Dim objXML As MSXML2.DOMDocument
        Dim objElem As MSXML2.IXMLDOMElement
        Dim objParent As MSXML2.IXMLDOMElement
        Dim lngPos As Long
        Dim lngParent As Long
        Dim lngTime As Long
        Dim dicIndex As Dictionary
       
        ' init DOM
        Set objXML = New MSXML2.DOMDocument
        Set objXML.documentElement = objXML.createElement("root")

        ' create a dictionary to index nodes
        Set dicIndex = New Dictionary

        ' clock the start-time
        lngTime = GetTickCount()

        ' populate with lots of elements
        For lngPos = 1 To 250000
            If lngPos = 1 Then
                Set objParent = objXML.documentElement
            Else
                ' randomly pick a node to append this node
                lngParent = 1 + Rnd * (lngPos - 2)
               
                ' use XPath to find element from DOM
                Set objParent = dicIndex.Item(lngParent)
            End If
           
            ' create a new element belonging to our randomly picked element
            Set objElem = objParent.appendChild(objXML.createElement("node"))
            objElem.setAttribute "id", lngPos
           
            ' add new node-element to dictionary for quick lookup
            dicIndex.Add lngPos, objElem
        Next
       
        ' set the time elapsed to the dom
        objXML.documentElement.setAttribute "time", GetTickCount() - lngTime
       
        ' save DOM
        objXML.save "c:\test.xml"

    End Sub

 

3. Collection Code

    Private Declare Function GetTickCount Lib "kernel32" () As Long

    Public Sub Main()

        Dim objXML As MSXML2.DOMDocument
        Dim objElem As MSXML2.IXMLDOMElement
        Dim objParent As MSXML2.IXMLDOMElement
        Dim lngPos As Long
        Dim lngParent As Long
        Dim lngTime As Long
        Dim colIndex As Collection
       
        ' init DOM
        Set objXML = New MSXML2.DOMDocument
        Set objXML.documentElement = objXML.createElement("root")

        ' create a dictionary to index nodes
        Set colIndex = New Collection

        ' clock the start-time
        lngTime = GetTickCount()

        ' populate with lots of elements
        For lngPos = 1 To 250000
            If lngPos = 1 Then
                Set objParent = objXML.documentElement
            Else
                ' randomly pick a node to append this node
                lngParent = 1 + Rnd * (lngPos - 2)
               
                ' use XPath to find element from DOM
                Set objParent = colIndex.Item(CStr(lngParent))
            End If
           
            ' create a new element belonging to our randomly picked element
            Set objElem = objParent.appendChild(objXML.createElement("node"))
            objElem.setAttribute "id", lngPos
           
            ' add new node-element to dictionary for quick lookup
            colIndex.Add objElem, CStr(lngPos)
        Next
       
        ' set the time elapsed to the dom
        objXML.documentElement.setAttribute "time", GetTickCount() - lngTime
       
        ' save DOM
        objXML.save "c:\test.xml"

    End Sub

4. VB Array Hash Table

    Private Declare Function GetTickCount Lib "kernel32" () As Long

    Public Sub Main()

        Dim objXML As MSXML2.DOMDocument
        Dim objElem As MSXML2.IXMLDOMElement
        Dim objParent As MSXML2.IXMLDOMElement
        Dim lngPos As Long
        Dim lngParent As Long
        Dim lngTime As Long
        Dim arrHash(1 To 250000) As Object
       
        ' init DOM
        Set objXML = New MSXML2.DOMDocument
        Set objXML.documentElement = objXML.createElement("root")

        ' clock the start-time
        lngTime = GetTickCount()

        ' populate with lots of elements
        For lngPos = 1 To 250000
            If lngPos = 1 Then
                Set objParent = objXML.documentElement
            Else
                ' randomly pick a node to append this node
                lngParent = 1 + Rnd * (lngPos - 2)
               
                ' use XPath to find element from DOM
                Set objParent = arrHash(lngParent)
            End If
           
            ' create a new element belonging to our randomly picked element
            Set objElem = objParent.appendChild(objXML.createElement("node"))
            objElem.setAttribute "id", lngPos
           
            ' add new node-element to dictionary for quick lookup
            Set arrHash(lngPos) = objElem
        Next
       
        ' set the time elapsed to the dom
        objXML.documentElement.setAttribute "time", GetTickCount() - lngTime
       
        ' save DOM
        objXML.save "c:\test.xml"

    End Sub


2 comments, Visual Basic 6, Friday, March 26, 2004 15:19

Timeline Navigation for Visual Basic 6 posts
VB6: Read and Write Byte Arrays to Files in Visual Basic (Code Library) (made 1 week later)
VB6: Strategies for Indexing Elements of a Large XML DOM in Visual Basic (this post, made Friday, March 26, 2004 15:19)


Comments
I love your way to explain these topics
but i am just a beginner of VB programming
so can u plz put a part of this XML file as example to be shown and to know what nodes exactly u reached and how to reach to it
and i just want to ask u that i have an XML file about only 10 mega Bytes, r these codes useful for it or with this capacity i don't need these ways
my program is to just retrieve data from XML File

<entry name="points data">
<viz>
<dataobject>
<entry name="weatherpoint:moscow">
<entry name="temp">12</entry>
<entry name="min">0</entry>
<entry name="max">11</entry>
</entry>
</dataobject>
</viz>
</entry>

that's a sample af a file i want to retrieve it's data 12, 0, 11 to put it in database access with loop
so how can i read this type and what is the best method for this capacity
thanks for advance. :)

Posted by: hatem on Saturday, May 19, 2007 12:12
Hi,
thank u for given above code. but i need create an xml file using database in vb 6 means i insert multiple records from database in xml and read multiple records for insert another database table. please send a samle code use ms-access database .. thankyou sk.Mushkin .... skmushkin@gmail.com

Posted by: Mushkin on Friday, February 13, 2009 10:28

Post a Comment
Name:  Home page and email address are optional.
  Email addresses will not be displayed or spammed!
Remember these details
Email:
Home Page:
Comment:
Comments cannot contain HTML, URLs will be formatted into hyperlinks.
I reserve the right to remove any comments for any reason.