Excel and XML: four observations and two bugs

I have been working with Xml files in Excel for some time now, and thought it is time to share some observations.

Introduction

Consider a simple xml file data1.xml with this content:

<catalog>
    <book>
        <title>My first title</title>
        <publishdate>2001-01-01</publishdate>
    </book>
    <book>
        <title>My second title</title>
        <publishdate>2002-01-01</publishdate>
    </book>
</catalog>

You can import this file in Excel. Since there are two books in the catalog Excel knows that “book” is a repeating element, and will create a table (listobject or list). Xml-import this file into a new workbook and the result will be:

Xml Observations fig 1

Xml Observations fig 1

(Download the enclosed workbook if you like.)
During the import process Excel created an XmlMap object and inferred a schema. You can retrieve this schema by property Thisworkbook.XmlMaps(1).Schemas(1).XML. Simplified this schema looks like this:

<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='catalog'>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name='book' maxOccurs='unbounded'>
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name='title' type='xsd:string'></xsd:element>
              <xsd:element name='publishdate' type='xsd:date'</xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Note that element “book” (line 5) has a restriction of maxOccurs=”unbounded”, and that publishdate (9) is typed as a date. That’s fine.
Now delete row 2 from the table (right-click in row 2 > Delete > Table rows). Right-click in the table again and choose XML > Export and save to a new file data2.xml. The contents of this file will be:

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<catalog xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
    <book>
        <title>My first title</title>
        <publishdate>2001-01-01</publishdate>
    </book>
</catalog>

No surprises so far.

Observation 1: Mapped data are bound to last import

Although you just exported the data to data2.xml, the mapped cells are still bound to data1.xml. Right-click in the table and choose Refresh, and you will see that data1.xml is reloaded, not data2.xml. To bind data2.xml to the mapped cells you must Xml-import this file. To see which file is bound to an XmlMap use property Thisworkbook.XmlMaps(1).DataBinding.SourceUrl. This property is read-only.
So keep in mind: mapped elements are bound to the last Xml-import.

Observation 2: Excel refuses to export empty tables

Delete all rows in the table and Xml-export to a file. Excel shows this message:

Xml Observations fig 2

Xml Observations fig 2

There is no reason for Excel to do this. Actually, I expected Excel to create an empty file like this:

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<catalog xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'/>

It is perfectly valid against the schema that Excel created. Furthermore, when you save this to a file and import it, Excel will do so without any notification or problem! Annoying. However, there is a simple workaround, if you don’t mind changing your file structure a bit.

Workaround: Add a single-mapped element

I added a single (non-repeating) element “owner” to the schema and mapped it to B2 in Sheet2 as is shown in fig 3:

Xml Observations fig 3

Xml Observations fig 3

I agree, if you don’t need an owner field then this is not a great solution. But now Excel does allow you to export an empty table! Delete all rows from the table, clear B2, and export to a file. Open this file in Notepad and (don’t) be surprised: it looks exactly like the empty file above. This file will import without a problem.
Well, almost…

Observation 3: Excel is inconsistent when importing an empty table

Make sure you have imported the empty file. After a refresh your sheet should be empty.
Next, paste these lines into the ThisWorkbook module:

Private Sub Workbook_AfterXmlImport(ByVal Map As XmlMap, ByVal IsRefresh As Boolean, _
    ByVal Result As XlXmlImportResult)
    Debug.Print 'Workbook_AfterXmlImport', Sheet2.ListObjects(1).ListRows.Count
End Sub

Now do the following:

  • Add two or more rows to the table and refresh. Your sheet should be empty again;
  • Now make sure the table is empty (delete all rows) and refresh: same result;
  • Finally, add one row only and refresh: same result.

But wait! Look in the immediate window of the VBE. It will show three lines like these:

Workbook_AfterXmlImport       0 
Workbook_AfterXmlImport       0 
Workbook_AfterXmlImport       1 

Conclusion: take a list with one row, refresh this list from a file with an empty table, then Excel does not delete the row but only clears its contents! Odd. Bug 1.

Workaround: verify the source file

The implication of this observation is that after an import which results in a list with one row, it is impossible to know whether this is from a file with zero books or from a file with one book with empty title and publishdate fields. You don’t care? Fine. But if you do: the only way to find out is to verify the source file itself. See the workaround in the next section.

Observation 4: Excel removes columnformulas when importing an empty table

First recall the normal behavior of a columnformula in a table. In Sheet2 add a third column Year to the table and enter the formula =YEAR([@publishdate]). After deleting all rows Excel will restore the columnformula when you add a new row.
Now, enter a few rows and make sure the columnformula is restored. Then refresh the list from a file with empty table, which should empty the list. Now add some rows again and note that the columnformula was not restored! Bug 2. If your application uses column formulas in xml-tables and allows a user to import empty tables you definitely need a workaround.

Workaround: Restore formulas after import

A possible solution is to prevent the user from exporting an empty table. In the BeforeExport event verify listrows.count of each involved table. If a listrows.count is 0 cancel the export and let the user add at least one row, or add one automatically. Not a nice solution. Besides, to be save you still have to verify the table after an import, since an external process could have emptied a table in your xml-file.
I prefer another solution: after each import restore the columnformulas of all empty xml-tables. Basically, the code for such a restore looks like this:

Private Sub Workbook_AfterXmlImport(ByVal Map As xmlMap, ByVal IsRefresh As Boolean, _
    ByVal Result As XlXmlImportResult)
    If Result = xlXmlImportSuccess Then
        If xmlDoc.SelectSingleNode("/catalog/book[1]") Is Nothing Then ' need to restore
            With Sheet2.ListObjects(1)
                If .ListRows.Count = 0 Then .ListRows.Add
                With .ListRows(1).Range
                    .Cells(3).Formula = "=YEAR([@publishdate])"
                End With
                .ListRows(1).Delete
            End With
        End If
    End If
End Sub

Explanation:

  • In this code it is assumed that the xml source file is loaded into DOM document xmlDoc;
  • If the node doesn’t exist you have to restore the formula, which can only be done when there is at least one listrow;
  • If the node doesn’t exist listrows.count is usually 0 but can also be 1 (Observation 3).

Please note that this workaround does not take into account that any cell formatting is not restored either.

Conclusion

Excel and Xml are a great couple. However, there is a need for three improvements:

  • Excel should allow the export of empty lists without alert (observation 2);
  • when importing empty tables Excel should delete all listrows under all conditions (observation 3);
  • when importing empty tables Excel should preserve the columnformulas and -formatting (observation 4);

These issues are still present in Excel 2013. Fortunately, a simple workarounds is available. Have you improved on this? Please drop a line.
I hope this pixcel helps. Excel!

Download

XmlObservations-v1.xlsm
Xml-files are not included. Before you proceed in the text above: on Sheet1 right-click in the table > XML > Export and save to data1.xml. Import this file again to bind it to the XmlMap before you proceed.

Comments

  1. Nice post. I wrote about this some time ago here:

    http://www.jkp-ads.com/articles/xmlandexcel00.asp

    • Hi Jan Karel,
      Thanks, appreciate it.
      Yes, I’ve read your articles a few years ago with great interest. A bit late but anyway: thanks!
      Regards,
      – Frans

  2. Tony McGovern says:

    Beautiful stuff. I’m enhancing an application that makes heavy use of XML and listobjects in Excel (i.e., creating schemas on the fly, loading XML strings into a DOM and manipulating nodes to rework the XML string, and re-importing XML strings back into Excel).

    I stumbled upon your site after banging my head for hours trying to figure out why deleting a listrow within a listobject that’s bound to an XML schema with one or more repeating elements would cause the application to crash. I think it has to do with what you mention in observation 3, so I’m now verfiying both the XML and the number of listrows before I re-import. In other words, I have better luck loading the XML into a DOM, manipulating the DOM, and then re-importing the XML back into Excel rather than manipulating the listobject itself. So far, so good …

    Thanks for this post!

    • Thanks Tony.
      Yes, In one app I Load the xml file into DOM, Import the DOM into the workbook. After editing: Export and Load into DOM and save to xml file. It’s more work, but you have much more control.

Speak Your Mind

*