DataStream
LANGUAGES: XML, C#
ASP.NET VERSIONS: All
Using XML
Schemas
XSD
Files Can Save You Time & Frustration
By Brian
Noyes
The DataSet
class makes it very easy to load XML data into an instance of the class, and
then work with it as relational data. However, many questions arise when
developers start loading nested XML elements into a data set. That's when a
little understanding of what's going on under the covers goes a long way toward
obtaining the results you expect.
When you
read XML data into a data set, the resulting relational schema that's created
will depend on two things: the shape of the XML data and whether you provide an
XML Schema describing that shape before loading the data itself. To understand
the different approaches and the resulting outcome, let's start with the
basics.
Loading XML
Schema and Data into a DataSet
The DataSet
class has two methods you must consider when loading data: ReadXmlSchema
and ReadXml (their names make the purpose of each method fairly easy to
understand). If you provide an XML Schema Definition (XSD) file to the ReadXmlSchema
method before you try to load the XML data with ReadXml, the data set
will create a relational schema for itself that corresponds to the XML schema
provided. When you then load the XML data with the ReadXml method, the
data for elements and attributes corresponding to the provided schema will be
loaded into the tables and columns that were created. If you read any XML
elements or attributes that aren't part of the XML schema, they will be
ignored.
In
contrast, if you call ReadXml on an XML document without first calling ReadXmlSchema
with a schema file, the data set will do its best to infer an appropriate
relational schema based on the shape or structure of the XML it finds in the
XML file that you tell it to read. The results you get may or may not match
your expectations. There are also many forms of XML that the data set will be
unable to load, particularly if a given element is present at multiple levels
in the hierarchy of nodes.
When the
data set creates a relational schema for XML data, it maps elements with child
elements or attributes to tables, and elements containing text or attributes to
fields on the containing table. Consider the example XML document shown in
Figure 1, which contains a subset of data from the Customers table of the
Northwind database.
<CustomersDataSet>
<Customers>
<CustomerID>ALFKI</CustomerID>
<CompanyName>Alfreds
Futterkiste</CompanyName>
<Orders>
<OrderID>10643</OrderID>
</Orders>
<Orders>
<OrderID>10692</OrderID>
</Orders>
</Customers>
<Customers>
<CustomerID>ANATR</CustomerID>
<CompanyName>Ana
Trujillo</CompanyName>
<Orders>
<OrderID>10308</OrderID>
</Orders>
<Orders>
<OrderID>10625</OrderID>
</Orders>
</Customers>
</CustomersDataSet>
Figure
1: Sample data
from the Customers table of the Northwind database as XML.
If you
simply read this data in to a data set with ReadXml, the data set will
contain two tables, Customers and Orders, corresponding to the elements it
finds that contain other child elements. The XML in Figure 2 results in exactly
the same relational schema.
<CustomersDataSet>
<Customers CustomerID="ALFKI"
CompanyName="Alfreds
Futterkiste">
<Orders OrderID="10643"/>
<Orders OrderID="10692"/>
</Customers>
<Customers CustomerID="ANATR"
CompanyName="Ana Trujillo">
<Orders OrderID="10308"/>
<Orders OrderID="10625"/>
</Customers>
</CustomersDataSet>
Figure
2: Sample data
from the Customers table of the Northwind database as XML with attribute
content instead of elements.
The
other thing the data set will do while reading in data like this is look at the
nested relationships of elements with other elements. When it sees that the
Orders elements are child elements of the Customers elements, it will create a
parent-child relation between the two tables with a corresponding foreign key
constraint to maintain the nested nature of the data while it's being used as
relational data.
Foreign
Keys Are the Key
The
thing that usually trips up developers when they're loading nested XML data
into a data set is not understanding what the data set is using as a foreign
key between parent and child tables. In the case of either of the XML documents
shown in Figures 1 and 2, the data set doesn't have enough information to form
a foreign key, so it adds a field to each of the two tables it creates to form
that foreign key constraint. What it adds in that case is an integer field to
each of the Customers and Orders tables that it names Customers_Id. It uses
this field to create a primary key in the Customers table and a foreign key
constraint from the Orders table to the Customers table using the corresponding
field in the Orders table.
This has
a couple of undesirable effects. One is that the relational data now has
another field that you must manage. If you are inserting new data into the
child table, you will have to determine what the corresponding parent row's Id
column value is, and set that before adding the row. Otherwise, it won't be
added as a child row of the parent. For example, to add an order to the
customer whose CustomerID field is ANATR, you would need the following code:
DataSet ds =
new DataSet();
ds.ReadXml("CustomersOrders.xml");
DataRow dr =
ds.Tables["Orders"].NewRow();
dr["OrderID"]
= 99999;
dr["Customers_Id"]
= 1;
ds.Tables["Orders"].Rows.Add(dr);
The key
extra step here that eludes many people is the need to set that additional
Customers_Id field that was fabricated by the data set to the appropriate
foreign key to make it a child of the appropriate parent Customers row.
The
other undesirable effect is that the data may already contain a field that
represents the real primary key for the parent table to which a foreign key
should be set. If that is the case, then the solution lies in ensuring to first
load an XML schema with that information embedded using the ReadXmlSchema
method before loading the XML data with ReadXml.
You can
easily create an XML schema for a given XML document using Visual Studio .NET.
Simply load the XML document into the editor, and select Create Schema from the XML menu. Once you've created an XML schema for the document, open
the XSD file in the editor and drag a Key component from the Toolbox onto the
appropriate table (Customers in the sample XML above) and set the primary key
on the appropriate field. Once you've done that, if you load the XSD with ReadXmlSchema,
followed by loading the XML with ReadXml, the data set will no longer
infer an extra field on the parent table; it will simply add the foreign key
field in the child table to the specified primary key field in the parent, and
you should get exactly the results you expect.
If
you're loading the XML data into an XmlDataDocument instead of a data set,
you'll have to call ReadXmlSchema on the DataSet property before
loading data with the Load method. Otherwise, none of the loaded data
will get added to the data set relational schema. The code below shows how to
load a schema for the data set of an XmlDataDocument, before loading the
XML document:
// Create the
document.
XmlDataDocument
doc = new XmlDataDocument();
// Get the data
set reference.
DataSet ds =
doc.DataSet;
// Read schema
to use for relational data in document.
ds.ReadXmlSchema(@"..\..\NestedData.xsd");
// Load it in.
doc.Load(@"..\..\NestedData.xml");
If you
don't provide the schema, the data would still be loaded, and accessible
through the methods and properties of the base XmlDocument class, but
the DataSet property would contain no tables. Even if you provide the
schema, you must keep in mind the foreign key issues described earlier when
adding rows to the generated tables. If you add a new row to the Orders table
without setting the foreign key column to an existing row in the Customers
table, the resulting Order element is created as a child of the root element of
the XML document.
Brian
Noyes is a
software architect with IDesign, Inc. (http://www.idesign.net),
a .NET-focused architecture and design consulting firm. Brian specializes in
designing and building data-driven distributed Windows and Web applications.
Brian writes for a variety of publications and is working on a book for
Addison-Wesley on building Windows Forms Data Applications with .NET 2.0.
Contact him at mailto:brian.noyes@idesign.net.