Get XPath objects from Excel workbook

When working with XML enabled workbooks the relevant objects and properties are easy to retrieve. Unfortunately a workbook has no built-in collection of XPath objects. How can we get a list of all XPath objects in a workbook?

XML objects

For a basic introduction to XML in Excel I recommend Overview of Xml in Excel – Microsoft.

The figure below shows the logical relationship between mentioned objects:


Logical relations between XML objects

Logical relations between XML objects



Explanation:

  • A Workbook has zero or more XMLMaps (property: Workbook.XMLMaps);
  • An XMLMap has one or more XMLSchemas (property: XMLMap.Schemas);
  • At any time an XMLMap can point to one XML File (property: XMLMap.XmlDataBinding.SourceUrl);
  • An XMLMap is related to zero or more mapped worksheet ranges.

Remarks:

  • The read-only property XMLMap.Schemas points to an XMLSchemas collection. However, this collection has no Add method. So, we may assume that in practice an XMLMap will never have more than one Schema.
  • In the above referenced article it is mentioned that “You cannot export the Excel inferred schema as a separate XML schema data file.” However, you can always retrieve the schema by means of the property Activeworkbook.XMLMaps(1).Schemas(1).XML. Replace the first index number by yours or your XMLMap name.
  • Based on the above logical diagram you would expect the XMLMap to have a property like MappedRanges, pointing to a collection of Ranges that are mapped to the XMLMap-schema. Unfortunately there is no such property or collection. In fact, the relationship is reverse: a mapped Range points to a certain XMLMap by means of the property Range.XPath. See below for more details.
  • For completeness: in the above model we disregarded the XMLNameSpace object (property XmlMap.RootElementNamespace, which points to an XmlNamespace object in the Workbook.XmlNamespaces collection).

Get a list of all XPaths



UPDATE: see Get XPaths from workbook – update


To get a list of all XPath objects in a workbook we need a procedure like this (pseudo code):

For Each Worksheet In ActiveWorkbook.Worksheets
    For Each Cell In Worksheet.UsedRange
        If Cell has XPath Then do something
    Next Cell
Next Worksheet

Remarks:

  • The above solution doesn’t take into account that an XPath will be be repeating when it is part of a listcolumn in a table (listobject). So, we can improve the above pseudo-code by skipping all other cells of a listcolumn after processing one of them;
  • Furthermore, we can optimize the above solution by trying to retrieve the XPath for a whole column of the UsedRange first. Only when this fails we have to process the column cell by cell.

The improved pseudo-code now looks like this:

For Each Worksheet In ActiveWorkbook.Worksheets
    For Each Column In Worksheet.UsedRange
        Try to get XPath from this column as a whole
        If we fail Then
            ' get XPath cell by cell
            For Each Cell In Column
                If Cell has XPath Then do something
                If Cell is in a ListColumn then jump to last ListColumn cell
            Next Cell
        Else ' succes: all cells in this column have same XPath
            If XPath not empty then do something
        End If
    Next Column
Next Worksheet

A possible implementation follows. Note that this code does not retrieve a list of all XPaths but analyzes each column in the workbook and reports the results to the immediate window.

Private Sub PrintXPaths()
    Dim ws          As Excel.Worksheet
    Dim rColumn     As Excel.Range
    Dim objXPath    As Excel.XPath
    Dim i           As Long
    Dim bHasXPath   As Boolean
    For Each ws In ActiveWorkbook.Worksheets
        Debug.Print ws.Name, ws.UsedRange.Address
        For Each rColumn In ws.UsedRange.Columns
            With rColumn
                ' try to get XPath from column as a whole
                Set objXPath = Nothing
                On Error Resume Next
                Set objXPath = .XPath
                On Error GoTo 0
                ' check result
                If objXPath Is Nothing Then ' objPath on column failed, error -2147024809
                    ' take this column cell by cell
                    bHasXPath = False ' posit
                    For i = 1 To .Cells.Count
                        With .Cells(i)
                            DoEvents
                            With .XPath
                                If .Value <> vbNullString Then
                                    bHasXPath = True ' admit
                                    Debug.Print .Parent.Column, .Map.Name, .Value, "Repeating = "; .Repeating
                                End If
                            End With
                            ' skip cells if we entered a listcolumn
                            If Not (.ListObject Is Nothing) Then i = i + .ListObject.ListRows.Count
                        End With
                    Next i
                    If bHasXPath = False Then Debug.Print .Column, "No XPath (processed cell by cell)"
                Else ' success: all cells in this column have the same XPath
                    If objXPath.Value <> vbNullString Then ' we have an XPath
                        Debug.Print .Column, objXPath.Map.Name, objXPath.Value, "Repeating = "; .XPath.Repeating
                    Else ' no XPath
                        Debug.Print .Column, "No XPath (processed column)"
                    End If
                End If
            End With
        Next
    Next
End Sub

An example of a report as written to the immediate window:

Sheet1        $B$2:$E$28
 2            CatalogMap    /ns1:Catalog/Book/Id       Repeating = True
 3            No XPath (processed cell by cell)
 4            CatalogMap    /ns1:Catalog/Id            Repeating = False
 4            CatalogMap    /ns1:Catalog/Book/Title    Repeating = True
 4            CatalogMap    /ns1:Catalog/Name          Repeating = False
 5            No XPath (processed column)

This report is taken from a worksheet Sheet1 (a Catalog) with UsedRange B2:E28 and one (Books) table three columns wide (sheet columns 2-4). The first and third listcolumn of the table (sheet columns 2 and 4) are mapped. Above and below listcolumn 3 (sheet column 4) a single cell is mapped. Sheet column 3 could not be processed as a column because of the listobject (see next paragraph).

Why per column?

Perhaps you wonder why we process the Usedrange per column. Can’t we improve even more by first trying to set the XPath object to the whole Usedrange? Let Excel give you the answer: activate a sheet with a Usedrange of more than one column and run this line from the immediate window:

? ActiveSheet.UsedRange.XPath

This will open the message-box below:

XPath Error from VBE

XPath Error from VBE

So, according to the first reason it is only possible to get the XPath per column.

Note: if you run the above statement from code it will raise a runtime error and the following messagebox will show:

XPath Error from Runtime

XPath Error from Runtime

Notice the difference between the two message boxes. Don’t we miss something here?

XPath in Open XML

If the technique described above is not fast enough for you there is an alternative: unzip your (not encrypted) Excel workbook. Then, in folder xl\tables\ you will find a list of all tables in your workbook. Tables that are connected to an XMLMap have an extra attribute tableType=”xml”, and the relevant columns have an extra property xmlColumnPr=”” (Pr stands for Properties). So it’s not difficult to built a list of all repeating XPaths in your workbook.
For the single cell XPaths things are a bit more complicated. Again start in the xl\tables\ folder, where you also find file-names in a format like tableSingleCells1.xml. One such a file defines all single xml cells in one sheet, and may look like this:

<singleXmlCells>
    <singleXmlCell id="14" r="D28" connectionId="0">
        <xmlCellPr id="1" uniqueName="Author">
        xmlPr = ""
        </xmlCellPr>
    </singleXmlCell>
    <singleXmlCell id="4" r="D6" connectionId="0">
        <xmlCellPr id="1" uniqueName="SaveDate">
        xmlPr = ""
        </xmlCellPr>
    </singleXmlCell>
</singleXmlCells>

The only missing piece of information is the sheet on which these single xml cells reside. For this you have to go to folder xl\worksheets\_rels\ where you will find files like sheet1.xml.rels, holding information about the relationships of the sheet. For example, in sheet1.xml.rels you will find a line that relates this sheet1 to tableSingleCells1.xml.
Conclusion: opening your Excel file as a zip archive and then locate and parse some of the xml-files mentioned above provides an alternative method to obtain all XPaths in your workbook.
How to open your Excel zip archive using VBA is described elsewhere, for example on Ron de Bruin’s site.

Conclusion

Above I described the basics for two techniques to obtain a list of all XPaths from a workbook. If you have any suggestions for further improvement please drop a line.

I hope this pixcel helps. Excel!

(Featured image from Microsoft)

Comments

  1. sean says:

    there is also the XMLDataQuery option (still evaluating it) – what do you think?

    https://msdn.microsoft.com/en-us/library/office/aa203724(v=office.11).aspx

    Accessing Data
    You can retrieve data in cells that you assign to an element using the XMLDataQuery method. The XMLDataQuery method accepts an XPath statement that described a mapping to a cell and returns an Excel Range object. For example, this code retrieves the Excel Range object mapped to the Name element:
    Dim rng As Range
    ActiveWorkbook.Worksheets(1).XmlDataQuery( _
    “/ns1:so/ns1:Customer/ns1:Name”).value

Speak Your Mind

*