Understanding XML

Introducing XML files

Published October 23, 2018. Updated October 24, 2018. Author: admin

What is XML?

XML (eXtensible Markup Language) is a very flexible markup language used to store and transport information.

XML is known as a self-descriptive language because each file carries both its structure (the element and attribute names) and its data. As a result, XML is a very popular format for exporting complex, related data, such as the contents of a relational database, to a single file.

Understanding an XML file

An XML file contains the structure and data in a logical, hierarchical fashion. XML files are plain text.

The XML file shown below arranges catalog data in the following structure, with each level nested inside the one before it:

  1. Catalog
  2. MainCategory
  3. Category
  4. SubCategory
  5. Products
  6. Product
  7. ProductAttributes
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Catalog CategoryName="Sample">
<!-- Catalog structure -->
	<MainCategory CategoryName="Switches &amp; socket outlets" CategoryID="101" CategorySortOrder="1">
		<Category CategoryName="Simplo light switches" CategoryID="201" CategorySortOrder="1">
			<SubCategory CategoryName="Easy use light switches" CategoryID="301" CategorySortOrder="1">
<!-- Product records -->
				<Products>
				
					<Product ProductInstanceID="123" ProductCode="E5145" ProductSortOrder="1">
						<ProductAttributes>
							<Description>Rocker switch, single, 10A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>10</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="124" ProductCode="E5150" ProductSortOrder="2">
						<ProductAttributes>
							<Description>Rocker switch, double, 10A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>10</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="125" ProductCode="E5155" ProductSortOrder="3">
						<ProductAttributes>
							<Description>Rocker switch, triple, 10A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>10</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="126" ProductCode="E5160" ProductSortOrder="4">
						<ProductAttributes>
							<Description>Rocker switch, single, 15A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>15</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="127" ProductCode="E5165" ProductSortOrder="5">
						<ProductAttributes>
							<Description>Rocker switch, double, 15A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>15</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="128" ProductCode="E5170" ProductSortOrder="6">
						<ProductAttributes>
							<Description>Rocker switch, triple, 15A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>15</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>

				</Products>

			</SubCategory>
			<SubCategory CategoryName="Rocker switches" CategoryID="302" CategorySortOrder="2">
				<CategoryText>These switches really are awesome. Easy to fix, easy to clean, with groovy coloured inserts.</CategoryText>
<!-- Product records -->
				<Products>
				
					<Product ProductInstanceID="223" ProductCode="E5145" ProductSortOrder="1">
						<ProductAttributes>
							<Description>Single switch, 10A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>10</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="224" ProductCode="E5150" ProductSortOrder="2">
						<ProductAttributes>
							<Description>Single switch, 15A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>15</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="225" ProductCode="E5155" ProductSortOrder="3">
						<ProductAttributes>
							<Description>Double switch, 10A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>10</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="226" ProductCode="E5160" ProductSortOrder="4">
						<ProductAttributes>
							<Description>Double switch, 15A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>15</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="227" ProductCode="E5165" ProductSortOrder="5">
						<ProductAttributes>
							<Description>Triple switch, 10A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>10</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="228" ProductCode="E5170" ProductSortOrder="6">
						<ProductAttributes>
							<Description>Triple switch, 15A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>15</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>
					<Product ProductInstanceID="229" ProductCode="E5175" ProductSortOrder="7">
						<ProductAttributes>
							<Description>Single switch, 10A</Description>
							<Colour>WH</Colour>
							<Voltage>240</Voltage>
							<Amps>10</Amps>
							<Pricing RRP="6.13" Trade="3.68" />
						</ProductAttributes>
					</Product>

				</Products>

			</SubCategory>
		</Category>
	</MainCategory>
</Catalog>
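Because the structure is carried in the file itself, a program can read the hierarchy directly. The sketch below is one way to do this, assuming Python's standard-library xml.etree.ElementTree module (not mentioned in the original) and a trimmed copy of the catalog with one product per SubCategory. It walks Catalog > MainCategory > Category > SubCategory > Products > Product > ProductAttributes and returns one row per product. The helper name list_products and the file name catalog.xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample catalog: one product per SubCategory
CATALOG = """<Catalog CategoryName="Sample">
  <MainCategory CategoryName="Switches &amp; socket outlets" CategoryID="101" CategorySortOrder="1">
    <Category CategoryName="Simplo light switches" CategoryID="201" CategorySortOrder="1">
      <SubCategory CategoryName="Easy use light switches" CategoryID="301" CategorySortOrder="1">
        <Products>
          <Product ProductInstanceID="123" ProductCode="E5145" ProductSortOrder="1">
            <ProductAttributes>
              <Description>Rocker switch, single, 10A</Description>
              <Pricing RRP="6.13" Trade="3.68" />
            </ProductAttributes>
          </Product>
        </Products>
      </SubCategory>
      <SubCategory CategoryName="Rocker switches" CategoryID="302" CategorySortOrder="2">
        <Products>
          <Product ProductInstanceID="223" ProductCode="E5145" ProductSortOrder="1">
            <ProductAttributes>
              <Description>Single switch, 10A</Description>
              <Pricing RRP="6.13" Trade="3.68" />
            </ProductAttributes>
          </Product>
        </Products>
      </SubCategory>
    </Category>
  </MainCategory>
</Catalog>"""


def list_products(root):
    """Hypothetical helper: one row per Product, walking the nesting levels in order."""
    rows = []
    for main in root.findall("MainCategory"):
        for cat in main.findall("Category"):
            for sub in cat.findall("SubCategory"):
                for product in sub.findall("Products/Product"):
                    attrs = product.find("ProductAttributes")
                    rows.append((
                        main.get("CategoryName"),  # &amp; is decoded to &
                        cat.get("CategoryName"),
                        sub.get("CategoryName"),
                        product.get("ProductCode"),
                        attrs.findtext("Description"),
                        float(attrs.find("Pricing").get("RRP")),  # attribute values are strings
                    ))
    return rows


# For a file on disk you could use ET.parse("catalog.xml").getroot() instead (hypothetical file name)
root = ET.fromstring(CATALOG)
for row in list_products(root):
    print(row)
```

Nested for loops mirror the nesting levels one to one, which keeps the code easy to compare against the file; the XPath approach covered later in this article is more compact.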

Understanding XML Structure

Consider the following excerpt from the complete XML above:

<ProductAttributes>
	<Description>Rocker switch, single, 10A</Description>
	<Colour>WH</Colour>
	<Voltage>240</Voltage>
	<Amps>10</Amps>
	<Pricing RRP="6.13" Trade="3.68" />
</ProductAttributes>

Nodes

The term node refers to any part of the XML tree structure.

Everything in an XML document is a node. The entire document is a document node, every element is an element node, the text inside an element is a text node, and every attribute is an attribute node.
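To make the node types concrete, the sketch below parses the excerpt above with Python's standard-library xml.dom.minidom module (a choice made for illustration, not something the original names) and lists every node with its type. Whitespace-only text between tags is skipped to keep the listing short. The helper name list_nodes is hypothetical.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

SAMPLE = """<ProductAttributes>
  <Description>Rocker switch, single, 10A</Description>
  <Colour>WH</Colour>
  <Voltage>240</Voltage>
  <Amps>10</Amps>
  <Pricing RRP="6.13" Trade="3.68" />
</ProductAttributes>"""

# Readable names for the DOM node types used in this example
NODE_TYPE_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
}


def list_nodes(node, depth=0, out=None):
    """Hypothetical helper: collect (depth, node type, label) for a node and its descendants."""
    if out is None:
        out = []
    if node.nodeType == Node.TEXT_NODE:
        text = node.data.strip()
        if not text:  # skip whitespace-only text between tags
            return out
        out.append((depth, "text", text))
    else:
        out.append((depth, NODE_TYPE_NAMES.get(node.nodeType, "other"), node.nodeName))
    # Attributes are nodes too, but the DOM keeps them apart from the child nodes
    if node.nodeType == Node.ELEMENT_NODE:
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            out.append((depth + 1, NODE_TYPE_NAMES[attr.nodeType], f"{attr.name}={attr.value}"))
    for child in node.childNodes:
        list_nodes(child, depth + 1, out)
    return out


doc = minidom.parseString(SAMPLE)
for depth, kind, label in list_nodes(doc):
    print("  " * depth + f"{kind}: {label}")
```

The first line printed is the document node, followed by the ProductAttributes element, its child elements, their text nodes, and the two attribute nodes on Pricing.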

Elements

The term element refers to everything from a start tag to its matching end tag, such as <Colour>WH</Colour>, or to a single self-closing tag such as <Pricing ... />. The example above contains the following elements:

  • ProductAttributes
  • Description
  • Colour
  • Voltage
  • Amps
  • Pricing
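As a quick check of the list above, the sketch below (assuming Python's standard-library xml.etree.ElementTree, which the original does not mention) parses the excerpt and collects every element name in document order, then reads the text of one element.

```python
import xml.etree.ElementTree as ET

EXCERPT = """<ProductAttributes>
  <Description>Rocker switch, single, 10A</Description>
  <Colour>WH</Colour>
  <Voltage>240</Voltage>
  <Amps>10</Amps>
  <Pricing RRP="6.13" Trade="3.68" />
</ProductAttributes>"""

root = ET.fromstring(EXCERPT)

# iter() visits the root element and all of its descendants in document order
element_names = [el.tag for el in root.iter()]
print(element_names)
# ['ProductAttributes', 'Description', 'Colour', 'Voltage', 'Amps', 'Pricing']

# The text between an element's start and end tags is available as .text
print(root.find("Colour").text)  # WH
```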

Attributes

The term attribute refers to a name="value" pair placed inside an element's start tag, after the element name. The example above contains the following attributes, both on the Pricing element:

  • RRP
  • Trade
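Attribute values are read from the element that carries them. The sketch below assumes Python's standard-library xml.etree.ElementTree (not part of the original) and shows that every attribute value arrives as a string, so numeric values such as RRP and Trade must be converted before doing arithmetic. The Discount attribute is hypothetical and deliberately absent.

```python
import xml.etree.ElementTree as ET

pricing = ET.fromstring('<Pricing RRP="6.13" Trade="3.68" />')

# .attrib is a plain dict of attribute name -> string value
print(pricing.attrib)  # {'RRP': '6.13', 'Trade': '3.68'}

# Values are always strings; convert them yourself when you need numbers
rrp = float(pricing.get("RRP"))
trade = float(pricing.get("Trade"))
print(round(rrp - trade, 2))  # 2.45

# get() returns None, or a supplied default, when an attribute is missing
print(pricing.get("Discount", "0"))  # 0
```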

Comparing elements and attributes

Data can usually be represented as either elements or attributes, but the following constraints can help you decide which suits best. (Some rules apply to both elements and attributes; they are listed under each for reference.)

Elements
  • Element names may contain letters, digits, hyphens, underscores and periods, but cannot begin with a digit, hyphen or period
  • Element names may not contain spaces
  • Elements may have an unlimited number of child elements
  • Empty elements may be written as either <element /> or <element></element>

Attributes
  • Attribute names follow the same naming rules as element names
  • Attribute names may not contain spaces
  • Attribute values must be enclosed in quotes, for example <element attribute="value"> (single quotes are also allowed)
  • An attribute holds a single text value, and the same attribute name cannot appear twice on one element
  • Attributes cannot contain child elements, so they cannot describe document or data structure
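Several of these rules can be tested against a real parser. The sketch below assumes Python's standard-library xml.etree.ElementTree, which raises ET.ParseError for input that is not well-formed XML; the helper is_well_formed and the test strings are hypothetical examples built from names in the sample catalog.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Hypothetical helper: True if the parser accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


cases = {
    "name with underscore and digit": '<Product_2 />',
    "name starting with a digit": '<2Product />',
    "name containing a space": '<Product Code>E5145</Product Code>',
    "unquoted attribute value": '<Pricing RRP=6.13 />',
    "duplicate attribute": '<Pricing RRP="6.13" RRP="7.00" />',
}
for label, text in cases.items():
    print(f"{label}: {is_well_formed(text)}")

# Both ways of writing an empty element parse to the same element
short_form = ET.fromstring("<Colour />")
long_form = ET.fromstring("<Colour></Colour>")
print(ET.tostring(short_form) == ET.tostring(long_form))  # True
```

Only the first case is accepted; the parser rejects the other four.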

XML Conventions

There are many different ways to represent the same (or similar) information within an XML file.

Two key questions should inform your decision (although sometimes you won’t have a choice, because the XML schema has already been decided by the source application):

  • Whether the XML file needs to be read by a human
  • Whether the source or destination have any technical limitations

XML should contain logical structure

An XML file should nest related elements inside one another, so that the hierarchy itself shows the reader how those elements relate.

Do not show categories all at the same level

The following example is not an optimal structure, because each category is listed at the same level of the document hierarchy. Nothing in the file shows that the SubCategory belongs to the Category, or that the Category belongs to the MainCategory, so an end user must be told the intended structure separately.

<MainCategory CategoryName="Switches &amp; socket outlets" CategoryID="101" CategorySortOrder="1" />
<Category CategoryName="Simplo light switches" CategoryID="202" CategorySortOrder="1" />
<SubCategory CategoryName="Easy use light switches" CategoryID="301" CategorySortOrder="1" />

Show hierarchical categories to expose structure

The following is a much clearer example, highlighting the relationship between categories and sub-categories. The parent elements are closed towards the bottom of the example, clearly showing that the MainCategory contains the Category and SubCategory levels.

<MainCategory CategoryName="Switches &amp; socket outlets" CategoryID="101" CategorySortOrder="1">
	<Category CategoryName="Simplo light switches" CategoryID="202" CategorySortOrder="1">
		<SubCategory CategoryName="Easy use light switches" CategoryID="301" CategorySortOrder="1">
			<!-- Other content-->
		</SubCategory>
	</Category>
</MainCategory>
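One practical benefit of nesting is that a program can recover each category's full path without being told the structure. The sketch below assumes Python's standard-library xml.etree.ElementTree (not part of the original) and a copy of the nested example reduced to its CategoryName attributes; the recursive helper category_paths is hypothetical. With the flat layout shown earlier, the same code would report three unrelated top-level names, because the relationship is not in the file.

```python
import xml.etree.ElementTree as ET

# The nested example above, reduced to the CategoryName attributes
NESTED = """<MainCategory CategoryName="Switches &amp; socket outlets">
  <Category CategoryName="Simplo light switches">
    <SubCategory CategoryName="Easy use light switches" />
  </Category>
</MainCategory>"""


def category_paths(element, parents=()):
    """Hypothetical helper: yield the full CategoryName path of every category element."""
    name = element.get("CategoryName")
    if name is not None:
        parents = parents + (name,)
        yield " > ".join(parents)
    for child in element:  # iterating an element yields its direct children
        yield from category_paths(child, parents)


for path in category_paths(ET.fromstring(NESTED)):
    print(path)
# Switches & socket outlets
# Switches & socket outlets > Simplo light switches
# Switches & socket outlets > Simplo light switches > Easy use light switches
```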

Representing the same data as elements or attributes

The following examples contain the same data:

Data as nested elements

<ProductAttributes>
	<Description>Rocker switch, single, 10A</Description>
	<Colour>WH</Colour>
	<Voltage>240</Voltage>
	<Amps>10</Amps>
</ProductAttributes>

Data with multiple attributes

<ProductAttributes Description="Rocker switch, single, 10A" Colour="WH" Voltage="240" Amps="10"/>
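To confirm that the two forms carry the same data, the sketch below (assuming Python's standard-library xml.etree.ElementTree, not named in the original) reads the nested-element version by mapping each child's tag to its text, reads the attribute version from the element's attribute dictionary, and compares the results. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

AS_ELEMENTS = """<ProductAttributes>
  <Description>Rocker switch, single, 10A</Description>
  <Colour>WH</Colour>
  <Voltage>240</Voltage>
  <Amps>10</Amps>
</ProductAttributes>"""

AS_ATTRIBUTES = ('<ProductAttributes Description="Rocker switch, single, 10A" '
                 'Colour="WH" Voltage="240" Amps="10"/>')


def from_elements(text):
    """Hypothetical helper: map each child element's tag to its text."""
    return {child.tag: child.text for child in ET.fromstring(text)}


def from_attributes(text):
    """Hypothetical helper: copy the root element's attributes into a dict."""
    return dict(ET.fromstring(text).attrib)


print(from_elements(AS_ELEMENTS) == from_attributes(AS_ATTRIBUTES))  # True
```

The data is identical; what differs is that the element form can later grow child elements or repeated values, which the attribute form cannot.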

When to use elements or attributes

While there are no strict rules about when to use elements or attributes, a common convention is to hold the important data in elements and to use attributes for supporting information, such as IDs and sort orders, that assists with solution development or troubleshooting.

Accessing data within an XML file using XPaths

Information in an XML file can be referenced and used by addressing items in its tree structure with an XPath expression. For example, /Catalog/MainCategory/Category addresses the Category elements in the sample catalog.

XPath expressions can also match elements by their position or by the value of their text or attributes.
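The sketch below shows matching by attribute value, by child text and by position. It assumes Python's standard-library xml.etree.ElementTree, which implements only a limited subset of XPath (a library such as lxml is needed for full XPath 1.0), and uses a trimmed copy of three products from the sample catalog.

```python
import xml.etree.ElementTree as ET

# Three products from the first SubCategory of the sample catalog, trimmed
PRODUCTS = """<Products>
  <Product ProductInstanceID="123" ProductCode="E5145" ProductSortOrder="1">
    <ProductAttributes>
      <Description>Rocker switch, single, 10A</Description>
      <Amps>10</Amps>
    </ProductAttributes>
  </Product>
  <Product ProductInstanceID="126" ProductCode="E5160" ProductSortOrder="4">
    <ProductAttributes>
      <Description>Rocker switch, single, 15A</Description>
      <Amps>15</Amps>
    </ProductAttributes>
  </Product>
  <Product ProductInstanceID="127" ProductCode="E5165" ProductSortOrder="5">
    <ProductAttributes>
      <Description>Rocker switch, double, 15A</Description>
      <Amps>15</Amps>
    </ProductAttributes>
  </Product>
</Products>"""

root = ET.fromstring(PRODUCTS)

# Match on an attribute value
by_code = root.find("Product[@ProductCode='E5160']")
print(by_code.get("ProductInstanceID"))  # 126

# Match on child text: ProductAttributes whose Amps child is exactly "15"
fifteen_amp = [a.findtext("Description")
               for a in root.findall(".//ProductAttributes[Amps='15']")]
print(fifteen_amp)  # ['Rocker switch, single, 15A', 'Rocker switch, double, 15A']

# Match on position: XPath positions start at 1, and last() selects the final one
print(root.find("Product[1]").get("ProductCode"))       # E5145
print(root.find("Product[last()]").get("ProductCode"))  # E5165
```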

For more information about XPath, please refer to our detailed article. [This article will be published on or before 31 October 2018.]
