If possible I always use pivot tables to create reports in Excel. Pivot tables need well formed, normalized data tables as source, which are usually not suitable for data entry. In this post I describe two techniques to bridge the gap between user-friendly and pivot-friendly data structures.
Suppose you need to enter quarterly figures of three departments in several years. One way to set up your input table is as follows:
For easy data entry this table is well formed and user-friendly. Unfortunately, this table is not well formed for pivot tables. It has four columns with values of the same ‘type’, Q1 through Q4. The table is not ‘normalized’ and therefore usually not suitable to produce quality pivot reports. A well formed, normalized source table has only one value column:
The first three columns of this table somehow specify or describe the Value field; they are sometimes also called ‘attributes’ or ‘dimensions’.
With the above table it is easy to create pivot reports and charts like these:
Also, from a normalized source table it is usually easy to reproduce the original layout:
The question is: how do we bridge the gap between the user-friendly source table and the pivot-friendly normalized table above? Below I describe two VBA-based techniques. See the attached workbooks for details.
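Before diving into the VBA, the transformation itself is easy to state outside Excel. The sketch below shows it in Python with invented sample data (the names and figures are illustrative, not from the workbooks): each wide source row of Year, Department, Q1–Q4 becomes four normalized rows of Year, Department, Quarter, Value.

```python
# Each wide source row (Year, Department, Q1..Q4) becomes four
# normalized rows (Year, Department, Quarter, Value).
quarters = ["Q1", "Q2", "Q3", "Q4"]

source = [
    (2023, "Sales",   10, 12, 11, 14),
    (2023, "Support",  5,  6,  7,  6),
]

target = []
for year, dept, *values in source:
    # one target row per quarter column
    for quarter, value in zip(quarters, values):
        target.append((year, dept, quarter, value))
```

Both VBA techniques below produce exactly this result; they differ only in how the values travel from source to target.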
The Moving Ranges technique
The basic idea is to define a few ranges as ‘windows’ on the source and target tables, and to transfer values from source to target as a ‘block’.
In this case we define four ranges:
- In the source table the ranges RowLabels (red) and ColumnLabels (light blue) will not change;
- Range Values (green) will move every round, from Q1 to Q4;
- In the target table we define only one BaseRange which will be moved and horizontally resized as required.
To transfer the values of quarter Q1 to the target table we perform three steps:
- Step 1: resize the target BaseRange to two columns and transfer the values of range RowLabels;
- Step 2: move BaseRange to column Quarter and enter the current quarter from range ColumnLabels;
- Step 3: again move the base range to column Value and transfer the green range Values.
In the next round we move the green range Values to Q2, move BaseRange nine rows down and repeat steps 1 to 3, and so on.
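The loop order of the steps above can be sketched in Python terms (sample data invented): the outer loop runs over the quarters, and each round transfers a whole block of rows at once, which is what makes the range-to-range transfer efficient.

```python
# Moving Ranges loop order: outer loop over quarters, each round
# appends one 'block' of all source rows for that quarter.
row_labels = [(2023, "Sales"), (2023, "Support"), (2023, "IT")]
column_labels = ["Q1", "Q2", "Q3", "Q4"]
values = [
    [10, 12, 11, 14],   # Sales
    [ 5,  6,  7,  6],   # Support
    [ 8,  9,  9, 10],   # IT
]

target = []
for i_col, quarter in enumerate(column_labels):
    # in Excel this whole block is one BaseRange transfer
    for (year, dept), row_values in zip(row_labels, values):
        target.append((year, dept, quarter, row_values[i_col]))
```

Note that the target rows come out grouped per quarter, not per department; for a pivot table source that ordering is irrelevant.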
This process is implemented in a public procedure Update, located in the code module of the target sheet; Update is, so to speak, a method to refresh the target table. It has three blocks of code. After defining some variables, block 1 prepares some things based on the source table:
Set tbSource = SSource.ListObjects(1)
Set tbTarget = Me.ListObjects(1)
lNRowLabels = 2     ' Year and Department
lNColumnLabels = 4  ' Q1 - Q4

With tbSource ' prepare source things
    ' if source is empty clear target too and we're done
    If .ListRows.Count = 0 Then
        If tbTarget.ListRows.Count > 0 Then tbTarget.DataBodyRange.EntireRow.Delete
        Exit Function
    End If
    ' setup ranges
    Set rColumnLabels = .HeaderRowRange.Offset(, lNRowLabels).Resize(, lNColumnLabels)
    Set rRowLabels = .ListColumns(1).DataBodyRange.Resize(, lNRowLabels)
    Set rValues = .ListColumns(3).DataBodyRange
    lNSourceRows = .ListRows.Count
End With
The comments speak for themselves, I hope.
In block 2 we prepare some things based on the target table:
With tbTarget ' prepare target things
    ' if we need too many rows exit
    If rColumnLabels.Cells.Count * lNSourceRows > Rows.Count - .HeaderRowRange.Row Then
        MsgBox "Too big.", vbCritical, "Error"
        Exit Function
    End If
    ' clear target
    If .ListRows.Count > 0 Then .DataBodyRange.EntireRow.Delete
    ' set baserange = databodyrange of first column after first transfer
    Set rBaseRange = .HeaderRowRange.Offset(1).Resize(lNSourceRows, 1)
    ' make sure table will expand automatically when transferring to row below table
    Application.AutoCorrect.AutoExpandListRange = True
End With
And finally in block 3 the hard work is done:
' transfer: for each Quarter move the BaseRange down and transfer values
For iColumn = 0 To lNColumnLabels - 1 ' note: zero based
    With rBaseRange.Offset(iColumn * lNSourceRows)
        .Resize(, lNRowLabels).Value = rRowLabels.Value
        .Offset(, lNRowLabels).Value = rColumnLabels.Cells(iColumn + 1).Value
        .Offset(, lNRowLabels + 1).Value = rValues.Offset(, iColumn).Value
    End With
Next iColumn
The Moving Ranges technique is usually not difficult to set up and is very efficient, especially when you can define large ranges. The direct block transfer of values from range to range performs very well and, unlike the technique below, doesn’t need extra resources.
The Arrays technique
The basic idea of this technique is well known.
- Step 1: read the Source table into a two-dimensional Source array;
- Step 2: redimension the Target array; in this case we need an array of 9*4=36 rows and 4 columns. Then for each Source row iterate over the four quarters, and transfer the values from source to target array. Every source row results in four target rows;
- Step 3: clear the target table and dump the target array to a range of the same size below the header of the target table.
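Steps 1 and 2 can be sketched in Python terms (the sample data is invented, and Python lists stand in for VBA’s zero/one-based Variant arrays): preallocate a target array of four rows per source row, fill it in one pass, and then ‘dump’ it in a single assignment.

```python
# Arrays technique: source array -> preallocated target array.
n_row_labels = 2                         # Year and Department
column_labels = ["Q1", "Q2", "Q3", "Q4"]

a_source = [
    [2023, "Sales",   10, 12, 11, 14],
    [2023, "Support",  5,  6,  7,  6],
]
n_source_rows = len(a_source)

# 'redimension' the target: 4 target rows per source row, 4 columns
a_target = [[None] * 4 for _ in range(n_source_rows * len(column_labels))]

for i_source_row in range(n_source_rows):
    base = i_source_row * len(column_labels)
    for i_col, quarter in enumerate(column_labels):
        a_target[base + i_col][0] = a_source[i_source_row][0]  # Year
        a_target[base + i_col][1] = a_source[i_source_row][1]  # Department
        a_target[base + i_col][2] = quarter
        a_target[base + i_col][3] = a_source[i_source_row][n_row_labels + i_col]
```

In Excel, step 3 is then a single write of the whole array to the range below the target table’s header, as the VBA below shows.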
This process is implemented in procedure Update of the enclosed workbook. This is the final part of this procedure:
' convert Source array to Target array
For iSourceRow = 1 To lNSourceRows
    lTargetBaseRow = (iSourceRow - 1) * lNColumnLabels + 1
    For iColumn = 0 To lNColumnLabels - 1
        aTarget(lTargetBaseRow + iColumn, 1) = aSource(iSourceRow, 1)
        aTarget(lTargetBaseRow + iColumn, 2) = aSource(iSourceRow, 2)
        aTarget(lTargetBaseRow + iColumn, 3) = rColumnLabels.Cells(iColumn + 1).Value
        aTarget(lTargetBaseRow + iColumn, 4) = aSource(iSourceRow, 3 + iColumn)
    Next
Next

' dump array to Target
With Me.ListObjects(1).HeaderRowRange.Offset(1).Resize(lNTargetRows, lNTargetColumns)
    .Value = aTarget
End With
This technique is usually easy to set up and performs well. An important benefit is that it is easy:
- to modify the individual values (change a date format);
- to enrich the source data (add the location of every department to the target array);
- to filter the source data (we are only interested in the departments in locations X and Y);
- to aggregate the source data (sum the values and count the number of departments per location);
- to validate the source data (if a value is not a date give some feedback to the user).
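As an example of that flexibility, filtering and aggregating fit naturally in the conversion loop. The Python sketch below (invented sample data) sums the quarterly values per department while iterating over the source rows, instead of producing one target row per quarter.

```python
# Aggregating during conversion: total value per (Year, Department).
from collections import defaultdict

source = [
    (2023, "Sales",   10, 12, 11, 14),
    (2023, "Support",  5,  6,  7,  6),
]

totals = defaultdict(int)
for year, dept, *values in source:
    # a filter (e.g. on department or location) would go here
    totals[(year, dept)] += sum(values)
```

The same pattern works for enrichment (look up the location per department before writing the target row) and validation (inspect each value before accepting it).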
A drawback could be the use of memory, especially when the source table is large. Notice that without filtering or aggregating you need about twice the size of the source table, and that all array elements are Variants. If memory is an issue you could decide to only read the source table into an array, and write directly into the target table.
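That memory-light variant can be sketched in Python as a generator (invented sample data): instead of building the full target array, it hands out one normalized row at a time, which the caller can write straight into the target.

```python
# Memory-light variant: yield normalized rows one at a time instead
# of materializing the whole target array.
def normalize(source_rows, column_labels=("Q1", "Q2", "Q3", "Q4")):
    """Yield one (Year, Department, Quarter, Value) row per value."""
    for year, dept, *values in source_rows:
        for quarter, value in zip(column_labels, values):
            yield (year, dept, quarter, value)

rows = list(normalize([(2023, "Sales", 10, 12, 11, 14)]))
```

The trade-off is speed: in Excel, writing row by row is much slower than one block dump, so this only pays off when the table really is too big to hold in memory twice.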
Showcase app Decathlon is an example of the Arrays technique which enriches the data (by adding Event type).
- It is possible to develop more general functions from the methods above. However, it is my experience that every project has its own unique characteristics, so personally I prefer to use the code above as a template and modify it as required;
- In the procedures above we check whether the target needs more rows than Excel has available. I have never been in a situation where I needed more than a million rows, so to me this check is a bit overdone. Of course, there are situations where you could potentially need more than a million rows, but usually we only need a subset of the source data to create the report we want, so we can filter and/or aggregate the source data;
- For the above case we could use a userform. Then we wouldn’t need a source table: from the userform we can directly update four rows in the target table and normalize ‘on the fly’. A userform has many advantages. However, it also makes things more complicated, since we need to implement functions like inserting, deleting, searching and filtering rows, which come built in with a table on the worksheet. There’s no best choice; it depends on the demands of the project you’re working on;
- There are a few other techniques to normalize source data. See for example Unpivot Shootout, with a few references to other posts on this subject;
- Some use the term ‘unpivot’ instead of ‘normalize’. Unpivot has been a well-known function of database management systems like Oracle Database and SQL Server for more than a decade, but since Microsoft introduced the Unpivot feature in Power Query in 2013 the term has gained popularity in the Excel/BI community. Personally I prefer ‘normalize’, since ‘unpivot’ somehow suggests that it will return the source rows. Which of course it never will.
I hope this post helps. Excel!