How to use the FILTERXML function
What is the FILTERXML function?
The FILTERXML function extracts specific values from XML content using a given XPath expression. It was introduced in Excel 2013, so you need Excel 2013 or a later version to use it.
1. Introduction
What is XML?
XML is an abbreviation for eXtensible Markup Language. It is a text format used to structure, store, and transfer information between different systems or applications.
- XML uses tags to define elements, similar to HTML, but allows users to create custom tags.
- XML documents contain both data and information about what the data means.
- XML can be read by many applications across different platforms and is used in web services, configuration files, and data exchange formats.
- XML documents have a tree-like structure with nested elements.
- Unlike HTML, XML focuses on describing data structure, not presentation.
Here is an example of XML code:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
  <book category="non-fiction">
    <title>A Brief History of Time</title>
    <author>Stephen Hawking</author>
    <year>1988</year>
    <price>14.99</price>
  </book>
</bookstore>
How to use XML:
- Create an XML file: Use a text editor to write your XML document, following the proper syntax with tags and elements.
- Define structure: Use nested elements to represent hierarchical data relationships.
- Add attributes: Include additional information about elements using attributes within the opening tags.
- Validate: Use DTD (Document Type Definition) or XML Schema to ensure your document follows a specific structure.
- Parse: Use XML parsers in various programming languages to read and extract data from the XML file.
XML is particularly useful when you need a flexible, platform-independent way to structure and share data, especially when the data has a hierarchical nature. However, for simpler data structures or when performance is a critical factor, other formats like JSON might be preferred.
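The "Parse" step can be sketched outside Excel with Python's standard-library xml.etree.ElementTree module. This is only an illustration, assuming the bookstore example is available as a string; the names BOOKSTORE_XML and read_books are hypothetical and not part of Excel or of the original example.

```python
import xml.etree.ElementTree as ET

# The bookstore example as a string (assumption: the XML declaration is
# left out so the text can be embedded directly in the script).
BOOKSTORE_XML = """
<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
  <book category="non-fiction">
    <title>A Brief History of Time</title>
    <author>Stephen Hawking</author>
    <year>1988</year>
    <price>14.99</price>
  </book>
</bookstore>
"""


def read_books(xml_text):
    """Parse the XML text and return one dictionary per <book> element."""
    root = ET.fromstring(xml_text)  # root is the <bookstore> element
    books = []
    for book in root.findall("book"):  # direct <book> children of the root
        books.append({
            "category": book.get("category"),  # attribute on the opening tag
            "title": book.findtext("title"),   # text of the nested element
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),
            "price": float(book.findtext("price")),
        })
    return books


if __name__ == "__main__":
    for entry in read_books(BOOKSTORE_XML):
        print(entry)
```

Each nested element becomes a dictionary key, which shows how the tree structure of the document maps onto ordinary records.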
Introduction to XML - w3schools
What is XPath?
XPath is a query language for selecting elements and attributes in XML data.
What is XPath? - w3schools
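To give a feel for XPath outside Excel, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. Note the assumptions: ElementTree supports only a subset of XPath and expects paths to start with a dot (".//title" instead of "//title"), and the shortened bookstore string is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A shortened version of the bookstore example (illustration only).
XML_TEXT = (
    '<bookstore>'
    '<book category="fiction"><title>The Great Gatsby</title></book>'
    '<book category="non-fiction"><title>A Brief History of Time</title></book>'
    '</bookstore>'
)

root = ET.fromstring(XML_TEXT)

# ".//title" selects every <title> element at any depth below the root.
all_titles = [el.text for el in root.findall(".//title")]

# A predicate in square brackets filters elements by attribute value.
fiction_titles = [
    el.text for el in root.findall(".//book[@category='fiction']/title")
]

print(all_titles)
print(fiction_titles)
```

The first query returns both titles, while the predicate in the second query restricts the result to books whose category attribute equals "fiction".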
2. Syntax
FILTERXML(xml, xpath)
Argument | Description
xml | Required. A value in XML format.
xpath | Required. A value in XPath format.
3. Example 1
The content in cell B3 is:
<movie><title>The Dark Knight</title> <title>Pulp fiction</title> <title>Inception</title></movie>
This is a simple XML structure representing movie titles. It contains:
- A root element <movie>
- Three <title> elements nested within the <movie> element, each containing a movie title:
- The Dark Knight
- Pulp fiction
- Inception
This XML snippet is what the FILTERXML function in cell B6 is processing to extract all the title contents.
Formula in cell B6:
=FILTERXML(B3,"//title")
This formula uses the FILTERXML function in Excel to extract data from the XML string stored in cell B3. Here is how it works:
- FILTERXML is an Excel function that allows you to query XML data using XPath expressions.
- The first argument (B3) refers to the cell containing the XML data.
- The second argument ("//title") is an XPath expression:
- "//" means "search anywhere in the document"
- "title" specifies that we're looking for all <title> elements
This formula searches the XML data in cell B3 for all <title> elements, regardless of their position in the XML structure, and returns their contents. In this particular case, it extracts:
- "The Dark Knight"
- "Pulp fiction"
- "Inception"
These results are displayed in cells B6, B7, and B8 respectively. The FILTERXML function is particularly useful for parsing and extracting specific data from XML structures within Excel, without needing to use more complex XML parsing tools or programming languages.
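For comparison, the same query can be sketched in Python. The helper filter_xml below is hypothetical: it only mimics the two-argument shape of FILTERXML for simple paths and is not how Excel implements the function. It assumes Python's xml.etree.ElementTree, which needs a leading dot in front of "//".

```python
import xml.etree.ElementTree as ET

# The XML string from cell B3 in Example 1.
MOVIE_XML = (
    "<movie><title>The Dark Knight</title> "
    "<title>Pulp fiction</title> "
    "<title>Inception</title></movie>"
)


def filter_xml(xml_text, xpath):
    """Hypothetical stand-in for FILTERXML(xml, xpath), simple paths only."""
    root = ET.fromstring(xml_text)
    # ElementTree expects relative paths, so "//title" becomes ".//title".
    if xpath.startswith("//"):
        xpath = "." + xpath
    # Return the text content of every matching element, in document order.
    return [el.text for el in root.findall(xpath)]


titles = filter_xml(MOVIE_XML, "//title")
print(titles)
```

The list contains one entry per matching element, in the same order in which Excel returns the values to cells B6, B7, and B8.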
4. How to extract an XML attribute?
An XML element may contain attributes. The following formula demonstrates how to extract a given attribute from XML data.
What is an XML attribute?
An XML attribute is a name-value pair that provides additional information about an XML element.
XML attribute - w3schools
What is an XML element?
An XML element is a basic unit of an XML document that consists of a start tag, an end tag, and the content in between.
XML Elements - w3schools
Formula in cell B6:
Cell B3 contains the following XML:
<cities><NorthAmerica city="Vancouver" country="Canada"/><NorthAmerica city="Seattle" country="Usa"/></cities>
The formula in cell B6 returns {"Vancouver";"Seattle"}, which spills into cell B7.
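The idea of reading an attribute rather than element content can also be sketched in Python. This is an illustration under assumptions: the helper attribute_values is hypothetical, and because xml.etree.ElementTree cannot select attributes directly with XPath, the sketch selects the elements that carry the attribute and then reads its value.

```python
import xml.etree.ElementTree as ET

# The XML string from cell B3 in this section.
CITIES_XML = (
    '<cities>'
    '<NorthAmerica city="Vancouver" country="Canada"/>'
    '<NorthAmerica city="Seattle" country="Usa"/>'
    '</cities>'
)


def attribute_values(xml_text, attribute):
    """Return the value of `attribute` for every element that has it."""
    root = ET.fromstring(xml_text)
    # ".//*[@city]" matches any element below the root with a city attribute.
    elements = root.findall(f".//*[@{attribute}]")
    return [el.get(attribute) for el in elements]


cities = attribute_values(CITIES_XML, "city")
countries = attribute_values(CITIES_XML, "country")
print(cities)
print(countries)
```

Passing a different attribute name, such as "country", returns the values of that attribute instead.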
5. How to import XML data?
- Select a destination cell.
- Go to tab "Developer" on the ribbon.
- Press with left mouse button on the "Import" button, a file dialog box appears.
- Press with mouse on an XML file to select it.
- Press with left mouse button on the "Open" button to import the file.
Here is what the example XML data looks like:
6. How to put XML data in different columns based on XML tags?
The formula in cell B6 concatenates "//" with the column header in cell B5 to build the XPath, then extracts the matching elements.
Formula in cell B6:
=FILTERXML($B$3,"//"&B$5)
The ampersand character (&) lets you concatenate values in an Excel formula.
Cell B3 contains the following XML:
<countries><Europe>France</Europe><Asia>China</Asia><Europe>Spain</Europe><Asia>Japan</Asia><Asia>Thailand</Asia><Africa>Kenya</Africa></countries>
Cell B6 returns {"France";"Spain"}.
Explaining formula in cell B6
Step 1 - Concatenate the column header to build the xpath argument
"//"&B$5
becomes
"//"&"Europe"
and returns "//Europe"
Step 2 - Extract XML data
FILTERXML($B$3,"//"&B$5)
returns {"France";"Spain"}.
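The two steps above can be sketched in Python to show how one XPath per column header splits the data into columns. This is an illustration only: the header list assumes that the neighboring header cells contain "Asia" and "Africa", which the text above does not state, and columns_by_tag is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The XML string from cell B3 in this section.
COUNTRIES_XML = (
    "<countries><Europe>France</Europe><Asia>China</Asia>"
    "<Europe>Spain</Europe><Asia>Japan</Asia><Asia>Thailand</Asia>"
    "<Africa>Kenya</Africa></countries>"
)

# Assumption: one column header per tag name ("Europe" is in cell B5).
HEADERS = ["Europe", "Asia", "Africa"]


def columns_by_tag(xml_text, headers):
    """Build one XPath per header and collect the matching values."""
    root = ET.fromstring(xml_text)
    columns = {}
    for header in headers:
        # Step 1: concatenate, like "//"&B$5 (ElementTree needs the dot).
        xpath = ".//" + header
        # Step 2: extract the text of every matching element.
        columns[header] = [el.text for el in root.findall(xpath)]
    return columns


columns = columns_by_tag(COUNTRIES_XML, HEADERS)
for header, values in columns.items():
    print(header, values)
```

Each header produces its own list, just as each copy of the Excel formula returns its own column of values.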
7. FILTERXML function not working
The FILTERXML function
- returns a #VALUE! error if the XML is not valid.
- returns a #VALUE! error if the XML contains a namespace with a prefix that is invalid.
- returns a #NAME? error if you misspell the function name.
- propagates errors: if an argument contains an error (e.g., #VALUE!, #REF!), the function returns the same error.
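If you suspect the XML itself, it can help to check it with a parser outside Excel. The Python sketch below is a rough illustration: is_well_formed is a hypothetical helper, and Python's parser is not the one Excel uses, so the two may disagree in edge cases.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if Python's XML parser accepts the text, else False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for problems such as mismatched tags or unbound prefixes.
        return False
    return True


valid = is_well_formed("<movie><title>Inception</title></movie>")
# The <title> element is never closed, so the tags do not match.
mismatched = is_well_formed("<movie><title>Inception</movie>")
# The prefix "x" is used without a namespace declaration.
unbound_prefix = is_well_formed("<x:movie>Inception</x:movie>")

print(valid, mismatched, unbound_prefix)
```

The second and third strings correspond to the first two causes in the list above: invalid XML and a namespace prefix that has not been declared.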
7.1 Troubleshooting the error value
When a cell contains an error value, a warning symbol appears next to it, as displayed in the image above. Press with mouse on it to see a pop-up menu that lets you get more information about the error.
- The first line describes the error if you press with left mouse button on it.
- The second line opens a pane that explains the error in greater detail.
- The third line takes you to the "Evaluate Formula" tool. A dialog box appears that lets you examine the formula in greater detail.
- The fourth line lets you ignore the error value: the warning icon disappears, but the error is still in the cell.
- The fifth line lets you edit the formula in the Formula bar.
- The sixth line opens the Excel settings so you can adjust the Error Checking Options.
Here are a few of the most common Excel errors you may encounter.
#NULL! error - This error occurs when a formula tries to calculate the intersection of two ranges that do not intersect. It happens most often when a space character ends up in a formula by mistake, because Excel interprets a space as the intersection operator, or when the wrong range operator is used. To fix it, double-check that the ranges joined by the intersection operator actually have cells in common.
#SPILL! error - This error occurs in Excel versions that support dynamic arrays, such as Excel 365, when cells below and/or to the right of the formula are not empty. This prevents the dynamic array formula from expanding into the cells it needs.
#DIV/0! error - This error happens if you try to divide a number by 0 (zero) or by a value that evaluates to zero, which is not possible mathematically.
#VALUE! error - This error occurs when a formula has a value of the wrong data type, such as text where a number is expected or dates that are evaluated as text.
#REF! error - This error happens when a cell reference is invalid, for example if a cell referenced by a formula is deleted.
#NAME? error - This error happens if you misspell a function name or a named range.
#NUM! error - This error shows up when you use invalid numeric values in formulas, like the square root of a negative number.
#N/A error - This error happens when a value is not available to a formula or is not found in a given cell range, for example in the VLOOKUP or MATCH functions.
#GETTING_DATA error - This error shows while external sources are loading. It can indicate a delay in fetching the data or that the external source is currently unavailable.
7.2 The formula returns an unexpected value
To understand why a formula returns an unexpected value, we need to examine the calculation steps in detail. Luckily, Excel has a tool that is really handy in these situations. Here is how to troubleshoot a formula:
- Select the cell containing the formula you want to examine in detail.
- Go to tab “Formulas” on the ribbon.
- Press with left mouse button on "Evaluate Formula" button. A dialog box appears.
The formula appears in a white field inside the dialog box. Underlined expressions are calculations being processed in the next step. The italicized expression is the most recent result. The buttons at the bottom of the dialog box allow you to evaluate the formula in smaller calculations which you control.
- Press with left mouse button on the "Evaluate" button located at the bottom of the dialog box to process the underlined expression.
- Repeat pressing the "Evaluate" button until you have seen all calculations step by step. This allows you to examine the formula in greater detail and hopefully find the culprit.
- Press the "Close" button to dismiss the dialog box.
There is also another way to debug formulas: the function key F9. F9 is especially useful if you suspect that a specific part of the formula is the issue. It is faster than the "Evaluate Formula" tool since you don't need to go through all calculations to find the issue.
- Enter Edit mode: Double-press with left mouse button on the cell or press F2 to enter Edit mode for the formula.
- Select part of the formula: Highlight the specific part of the formula you want to evaluate. You can select and evaluate any part of the formula that could work as a standalone formula.
- Press F9: This will calculate and display the result of just that selected portion.
- Evaluate step-by-step: You can select and evaluate different parts of the formula to see intermediate results.
- Check for errors: This allows you to pinpoint which part of a complex formula may be causing an error.
The image above shows cell reference B3 converted to a hard-coded value using the F9 key. The FILTERXML function requires valid XML, which is not the case in this example. We have found what is wrong with the formula.
Tips!
- View actual values: Selecting a cell reference and pressing F9 will show the actual values in those cells.
- Exit safely: Press Esc to exit Edit mode without changing the formula. Don't press Enter, as that would replace the formula part with the calculated value.
- Full recalculation: Pressing F9 outside of Edit mode will recalculate all formulas in the workbook.
Remember to be careful not to accidentally overwrite parts of your formula when using F9. Always exit with Esc rather than Enter to preserve the original formula. However, if you overwrite the formula by mistake, it is not the end of the world. You can undo the action by pressing the shortcut keys CTRL + Z or pressing the "Undo" button.
7.3 Other errors
Floating-point arithmetic may give inaccurate results in Excel - Article
Floating-point errors are usually very small, often beyond the 15th decimal place, and in most cases don't affect calculations significantly.
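The size of such errors can be illustrated in Python, which stores numbers in the same IEEE 754 double-precision format that Excel uses. This is only a sketch of the underlying effect: Excel applies its own rounding in some displays and comparisons, so the same expression may behave differently in a worksheet.

```python
import math

total = 0.1 + 0.2

# The sum is not exactly 0.3 in binary floating point.
exactly_equal = total == 0.3

# The difference is tiny, far beyond the 15th decimal place.
difference = abs(total - 0.3)

# Comparing with a tolerance, or rounding first, avoids the problem.
close_enough = math.isclose(total, 0.3)
equal_after_rounding = round(total, 10) == 0.3

print(exactly_equal, difference, close_enough, equal_after_rounding)
```

Rounding a result to a sensible number of decimals, for example with the ROUND function in Excel, is the usual way to keep such differences out of comparisons.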
Functions in 'Web' category
The FILTERXML function is one of 4 functions in the 'Web' category.