When working with XML enabled workbooks the relevant objects and properties are easy to retrieve. Unfortunately a workbook has no built-in collection of XPath objects. How can we get a list of all XPath objects in a workbook?
For a basic introduction to XML in Excel I recommend Overview of Xml in Excel – Microsoft.
The figure below shows the logical relationship between mentioned objects:
- A Workbook has zero or more XMLMaps (property: Workbook.XMLMaps);
- An XMLMap has one or more XMLSchemas (property: XMLMap.Schemas);
- At any time an XMLMap can point to one XML File (property: XMLMap.XmlDataBinding.SourceUrl);
- An XMLMap is related to zero or more mapped worksheet ranges.
- The read-only property XMLMap.Schemas points to an XMLSchemas collection. However, this collection has no Add method. So, we may assume that in practice an XMLMap will never have more than one Schema.
- In the above referenced article it is mentioned that “You cannot export the Excel inferred schema as a separate XML schema data file.” However, you can always retrieve the schema by means of the property Activeworkbook.XMLMaps(1).Schemas(1).XML. Replace the first index number by yours or your XMLMap name.
- Based on the above logical diagram you would expect the XMLMap to have a property like MappedRanges, pointing to a collection of Ranges that are mapped to the XMLMap-schema. Unfortunately there is no such property or collection. In fact, the relationship is reverse: a mapped Range points to a certain XMLMap by means of the property Range.XPath. See below for more details.
- For completeness: in the above model we disregarded the XMLNameSpace object (property XmlMap.RootElementNamespace, which points to an XmlNamespace object in the Workbook.XmlNamespaces collection).
Get a list of all XPaths
UPDATE: see Get XPaths from workbook – update
To get a list of all XPath objects in a workbook we need a procedure like this (pseudo code):
For Each Worksheet In ActiveWorkbook.Worksheets For Each Cell In Worksheet.UsedRange If Cell has XPath Then do something Next Cell Next Worksheet
- The above solution doesn’t take into account that an XPath will be be repeating when it is part of a listcolumn in a table (listobject). So, we can improve the above pseudo-code by skipping all other cells of a listcolumn after processing one of them;
- Furthermore, we can optimize the above solution by trying to retrieve the XPath for a whole column of the UsedRange first. Only when this fails we have to process the column cell by cell.
The improved pseudo-code now looks like this:
For Each Worksheet In ActiveWorkbook.Worksheets For Each Column In Worksheet.UsedRange Try to get XPath from this column as a whole If we fail Then ' get XPath cell by cell For Each Cell In Column If Cell has XPath Then do something If Cell is in a ListColumn then jump to last ListColumn cell Next Cell Else ' succes: all cells in this column have same XPath If XPath not empty then do something End If Next Column Next Worksheet
A possible implementation follows. Note that this code does not retrieve a list of all XPaths but analyzes each column in the workbook and reports the results to the immediate window.
Private Sub PrintXPaths() Dim ws As Excel.Worksheet Dim rColumn As Excel.Range Dim objXPath As Excel.XPath Dim i As Long Dim bHasXPath As Boolean For Each ws In ActiveWorkbook.Worksheets Debug.Print ws.Name, ws.UsedRange.Address For Each rColumn In ws.UsedRange.Columns With rColumn ' try to get XPath from column as a whole Set objXPath = Nothing On Error Resume Next Set objXPath = .XPath On Error GoTo 0 ' check result If objXPath Is Nothing Then ' objPath on column failed, error -2147024809 ' take this column cell by cell bHasXPath = False ' posit For i = 1 To .Cells.Count With .Cells(i) DoEvents With .XPath If .Value <> vbNullString Then bHasXPath = True ' admit Debug.Print .Parent.Column, .Map.Name, .Value, "Repeating = "; .Repeating End If End With ' skip cells if we entered a listcolumn If Not (.ListObject Is Nothing) Then i = i + .ListObject.ListRows.Count End With Next i If bHasXPath = False Then Debug.Print .Column, "No XPath (processed cell by cell)" Else ' success: all cells in this column have the same XPath If objXPath.Value <> vbNullString Then ' we have an XPath Debug.Print .Column, objXPath.Map.Name, objXPath.Value, "Repeating = "; .XPath.Repeating Else ' no XPath Debug.Print .Column, "No XPath (processed column)" End If End If End With Next Next End Sub
An example of a report as written to the immediate window:
Sheet1 $B$2:$E$28 2 CatalogMap /ns1:Catalog/Book/Id Repeating = True 3 No XPath (processed cell by cell) 4 CatalogMap /ns1:Catalog/Id Repeating = False 4 CatalogMap /ns1:Catalog/Book/Title Repeating = True 4 CatalogMap /ns1:Catalog/Name Repeating = False 5 No XPath (processed column)
This report is taken from a worksheet Sheet1 (a Catalog) with UsedRange B2:E28 and one (Books) table three columns wide (sheet columns 2-4). The first and third listcolumn of the table (sheet columns 2 and 4) are mapped. Above and below listcolumn 3 (sheet column 4) a single cell is mapped. Sheet column 3 could not be processed as a column because of the listobject (see next paragraph).
Why per column?
Perhaps you wonder why we process the Usedrange per column. Can’t we improve even more by first trying to set the XPath object to the whole Usedrange? Let Excel give you the answer: activate a sheet with a Usedrange of more than one column and run this line from the immediate window:
This will open the message-box below:
So, according to the first reason it is only possible to get the XPath per column.
Note: if you run the above statement from code it will raise a runtime error and the following messagebox will show:
Notice the difference between the two message boxes. Don’t we miss something here?
XPath in Open XML
If the technique described above is not fast enough for you there is an alternative: unzip your (not encrypted) Excel workbook. Then, in folder xl\tables\ you will find a list of all tables in your workbook. Tables that are connected to an XMLMap have an extra attribute tableType=”xml”, and the relevant columns have an extra property xmlColumnPr=”” (Pr stands for Properties). So it’s not difficult to built a list of all repeating XPaths in your workbook.
For the single cell XPaths things are a bit more complicated. Again start in the xl\tables\ folder, where you also find file-names in a format like tableSingleCells1.xml. One such a file defines all single xml cells in one sheet, and may look like this:
<singleXmlCells> <singleXmlCell id="14" r="D28" connectionId="0"> <xmlCellPr id="1" uniqueName="Author"> xmlPr = "" </xmlCellPr> </singleXmlCell> <singleXmlCell id="4" r="D6" connectionId="0"> <xmlCellPr id="1" uniqueName="SaveDate"> xmlPr = "" </xmlCellPr> </singleXmlCell> </singleXmlCells>
The only missing piece of information is the sheet on which these single xml cells reside. For this you have to go to folder xl\worksheets\_rels\ where you will find files like sheet1.xml.rels, holding information about the relationships of the sheet. For example, in sheet1.xml.rels you will find a line that relates this sheet1 to tableSingleCells1.xml.
Conclusion: opening your Excel file as a zip archive and then locate and parse some of the xml-files mentioned above provides an alternative method to obtain all XPaths in your workbook.
How to open your Excel zip archive using VBA is described elsewhere, for example on Ron de Bruin’s site.
Above I described the basics for two techniques to obtain a list of all XPaths from a workbook. If you have any suggestions for further improvement please drop a line.
I hope this pixcel helps. Excel!
(Featured image from Microsoft)