The Idea
Is there a performance gain to be had by using a dictionary to index the DOM Elements of a large DOM rather than using repeated XPath querys against a large nested DOM.
The Tests
To illustrate the performance of XPath a highly nested DOM will be constructed programmatically. A large number of nodes will be created and added as child nodes of a random node previously added.
| 1. | The first test will use XPath to locate each node. |
| 2. |
The second test will add each element to a Dictionary object and this will be used instead of XPath to located the parent node for each iteration. The Dictionary is provided by the Microsoft Scripting Runtime library. |
| 3. | The third test will use the same approach as test 2, but using VB's native Collection object. |
| 4. | Due to the contrived nature of this test (sequential IDs). To optimise this specific problem, test 4 will use a VB Array as a direct hash. This will be our optimal solution, although in a more real-world data related problem, the data may not be hashable as easy as this. |
A test app is created referencing Microsoft XML 2.6 and Microsoft Scripting Runtime (for the Dictionary).
It must be noted that this is a particularly dumb use of a XPath and a DOM object. There are ways to structure XML Documents for better XPath retieval performance.
In order to time the results, the GetTickCount API function is used to get the system clock time in milliseconds, although this level of accuracy really isn't needed!
Results
To create a structure containing 250,000 nodes, the Dictionary approach took 14 seconds. Using a Collection was quicker still, approx 11 seconds. It used slightly more memory, most likely due to string indexing instead of numbers.
| Test | First | Second | Third |
Memory |
|
XPath |
Got bored waiting, progress slowed exponentially. |
|||
|
Dictionary |
14801 | 14711 | 14621 | approx 80MB |
|
Collection |
11246 | 11556 | 11166 | approx 90MB |
|
VB Array Hash |
5027 | 4957 | 4957 | approx 66MB |
The XPath solution takes hours.
1. Code for the XPath Test (will take hours to complete)
Private Declare Function GetTickCount Lib "kernel32" () As Long
Public Sub Main()
Dim objXML As MSXML2.DOMDocument
Dim objElem As MSXML2.IXMLDOMElement
Dim objParent As MSXML2.IXMLDOMElement
Dim lngPos As Long
Dim lngParent As Long
Dim lngTime As Long
' init DOM
Set objXML = New MSXML2.DOMDocument
Set objXML.documentElement = objXML.createElement("root")
' clock the start-time
lngTime = GetTickCount()
' populate with lots of elements
For lngPos = 1 To 250000
If lngPos = 1 Then
Set objParent = objXML.documentElement
Else
' randomly pick a node to append this node
lngParent = 1 + Rnd * (lngPos - 2)
' use XPath to find element from DOM
Set objParent = objXML.selectSingleNode("//node[@id=" & CStr(lngParent) & "]")
End If
' create a new element belonging to our randomly picked element
Set objElem = objParent.appendChild(objXML.createElement("node"))
objElem.setAttribute "id", lngPos
Next
' set the time elapsed to the dom
objXML.documentElement.setAttribute "time", GetTickCount() - lngTime
' save DOM
objXML.save "c:\test.xml"
End Sub
2. Code for the Dictionary Test
The above code modified to use a Dictionary object to 'cache' a reference to the each node element. This is dictionary is used to retrieve the element instead of XPath.
Private Declare Function GetTickCount Lib "kernel32" () As Long
Public Sub Main()
Dim objXML As MSXML2.DOMDocument
Dim objElem As MSXML2.IXMLDOMElement
Dim objParent As MSXML2.IXMLDOMElement
Dim lngPos As Long
Dim lngParent As Long
Dim lngTime As Long
Dim dicIndex As Dictionary
' init DOM
Set objXML = New MSXML2.DOMDocument
Set objXML.documentElement = objXML.createElement("root")
' create a dictionary to index nodes
Set dicIndex = New Dictionary
' clock the start-time
lngTime = GetTickCount()
' populate with lots of elements
For lngPos = 1 To 250000
If lngPos = 1 Then
Set objParent = objXML.documentElement
Else
' randomly pick a node to append this node
lngParent = 1 + Rnd * (lngPos - 2)
' use XPath to find element from DOM
Set objParent = dicIndex.Item(lngParent)
End If
' create a new element belonging to our randomly picked element
Set objElem = objParent.appendChild(objXML.createElement("node"))
objElem.setAttribute "id", lngPos
' add new node-element to dictionary for quick lookup
dicIndex.Add lngPos, objElem
Next
' set the time elapsed to the dom
objXML.documentElement.setAttribute "time", GetTickCount() - lngTime
' save DOM
objXML.save "c:\test.xml"
End Sub
3. Collection Code
Private Declare Function GetTickCount Lib "kernel32" () As Long
Public Sub Main()
Dim objXML As MSXML2.DOMDocument
Dim objElem As MSXML2.IXMLDOMElement
Dim objParent As MSXML2.IXMLDOMElement
Dim lngPos As Long
Dim lngParent As Long
Dim lngTime As Long
Dim colIndex As Collection
' init DOM
Set objXML = New MSXML2.DOMDocument
Set objXML.documentElement = objXML.createElement("root")
' create a dictionary to index nodes
Set colIndex = New Collection
' clock the start-time
lngTime = GetTickCount()
' populate with lots of elements
For lngPos = 1 To 250000
If lngPos = 1 Then
Set objParent = objXML.documentElement
Else
' randomly pick a node to append this node
lngParent = 1 + Rnd * (lngPos - 2)
' use XPath to find element from DOM
Set objParent = colIndex.Item(CStr(lngParent))
End If
' create a new element belonging to our randomly picked element
Set objElem = objParent.appendChild(objXML.createElement("node"))
objElem.setAttribute "id", lngPos
' add new node-element to dictionary for quick lookup
colIndex.Add objElem, CStr(lngPos)
Next
' set the time elapsed to the dom
objXML.documentElement.setAttribute "time", GetTickCount() - lngTime
' save DOM
objXML.save "c:\test.xml"
End Sub
4. VB Array Hash Table
Private Declare Function GetTickCount Lib "kernel32" () As Long
Public Sub Main()
Dim objXML As MSXML2.DOMDocument
Dim objElem As MSXML2.IXMLDOMElement
Dim objParent As MSXML2.IXMLDOMElement
Dim lngPos As Long
Dim lngParent As Long
Dim lngTime As Long
Dim arrHash(1 To 250000) As Object
' init DOM
Set objXML = New MSXML2.DOMDocument
Set objXML.documentElement = objXML.createElement("root")
' clock the start-time
lngTime = GetTickCount()
' populate with lots of elements
For lngPos = 1 To 250000
If lngPos = 1 Then
Set objParent = objXML.documentElement
Else
' randomly pick a node to append this node
lngParent = 1 + Rnd * (lngPos - 2)
' use XPath to find element from DOM
Set objParent = arrHash(lngParent)
End If
' create a new element belonging to our randomly picked element
Set objElem = objParent.appendChild(objXML.createElement("node"))
objElem.setAttribute "id", lngPos
' add new node-element to dictionary for quick lookup
Set arrHash(lngPos) = objElem
Next
' set the time elapsed to the dom
objXML.documentElement.setAttribute "time", GetTickCount() - lngTime
' save DOM
objXML.save "c:\test.xml"
End Sub


