March 11, 2004 12:03 AM

Using XML Schemas

XSD Files Can Save You Time & Frustration
DevConnections

DataStream

LANGUAGES: XML, C#

ASP.NET VERSIONS: All

 


The DataSet class makes it very easy to load XML data into an instance of the class and then work with it as relational data. However, many questions arise when developers start loading nested XML elements into a data set. That's when a little understanding of what's going on under the covers goes a long way toward obtaining the results you expect.

 

When you read XML data into a data set, the resulting relational schema depends on two things: the shape of the XML data and whether you provide an XML Schema describing that shape before loading the data itself. To understand the different approaches and their outcomes, let's start with the basics.

 

Loading XML Schema and Data into a DataSet

The DataSet class has two methods you must consider when loading data: ReadXmlSchema and ReadXml (their names make the purpose of each method fairly easy to understand). If you provide an XML Schema Definition (XSD) file to the ReadXmlSchema method before you load the XML data with ReadXml, the data set creates a relational schema for itself that corresponds to the XML schema provided. When you then load the XML data with the ReadXml method, the data for elements and attributes that correspond to the provided schema is loaded into the tables and columns that were created. Any XML elements or attributes that aren't part of the XML schema are ignored.
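As a minimal sketch of the schema-first sequence described above, the following program loads a schema and then data from in-memory strings. The schema text, the Notes element, and the SchemaFirstDemo class name are assumptions made up for this illustration; they are not part of the article's sample files.

```csharp
using System;
using System.Data;
using System.IO;

public static class SchemaFirstDemo
{
    // Hypothetical schema for illustration: one Customers table, two columns.
    const string Xsd = @"<xs:schema id='CustomersDataSet'
    xmlns:xs='http://www.w3.org/2001/XMLSchema'
    xmlns:msdata='urn:schemas-microsoft-com:xml-msdata'>
  <xs:element name='CustomersDataSet' msdata:IsDataSet='true'>
    <xs:complexType>
      <xs:choice minOccurs='0' maxOccurs='unbounded'>
        <xs:element name='Customers'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='CustomerID' type='xs:string' />
              <xs:element name='CompanyName' type='xs:string' minOccurs='0' />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Sample data; the Notes element is deliberately absent from the schema.
    const string Xml = @"<CustomersDataSet>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <Notes>not described by the schema</Notes>
  </Customers>
</CustomersDataSet>";

    public static DataSet Load()
    {
        DataSet ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(Xsd)); // schema first
        ds.ReadXml(new StringReader(Xml));        // then the data
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Load();
        // List the columns that exist after loading.
        foreach (DataColumn col in ds.Tables["Customers"].Columns)
            Console.WriteLine(col.ColumnName);
    }
}
```

Listing the columns after the load is a quick way to confirm that only the elements described by the schema became columns.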

 

In contrast, if you call ReadXml on an XML document without first calling ReadXmlSchema with a schema file, the data set does its best to infer an appropriate relational schema from the shape or structure of the XML it finds in the file you tell it to read. The results you get may or may not match your expectations. There are also many forms of XML that the data set will be unable to load, particularly if a given element is present at multiple levels in the hierarchy of nodes.

 

When the data set creates a relational schema for XML data, it maps elements that have child elements or attributes to tables, and it maps attributes and elements that contain only text to fields on the containing table. Consider the example XML document shown in Figure 1, which contains a subset of data from the Customers table of the Northwind database.

 

<CustomersDataSet>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <Orders>
      <OrderID>10643</OrderID>
    </Orders>
    <Orders>
      <OrderID>10692</OrderID>
    </Orders>
  </Customers>
  <Customers>
    <CustomerID>ANATR</CustomerID>
    <CompanyName>Ana Trujillo</CompanyName>
    <Orders>
      <OrderID>10308</OrderID>
    </Orders>
    <Orders>
      <OrderID>10625</OrderID>
    </Orders>
  </Customers>
</CustomersDataSet>

Figure 1: Sample data from the Customers table of the Northwind database as XML.

 

If you simply read this data into a data set with ReadXml, the data set will contain two tables, Customers and Orders, corresponding to the elements it finds that contain other child elements. The XML in Figure 2 results in exactly the same relational schema.

 

<CustomersDataSet>
  <Customers CustomerID="ALFKI"
    CompanyName="Alfreds Futterkiste">
    <Orders OrderID="10643"/>
    <Orders OrderID="10692"/>
  </Customers>
  <Customers CustomerID="ANATR"
    CompanyName="Ana Trujillo">
    <Orders OrderID="10308"/>
    <Orders OrderID="10625"/>
  </Customers>
</CustomersDataSet>

Figure 2: Sample data from the Customers table of the Northwind database as XML with attribute content instead of elements.

 

The other thing the data set does while reading in data like this is look at the nested relationships of elements with other elements. When it sees that the Orders elements are child elements of the Customers elements, it creates a parent-child relation between the two tables, with a corresponding foreign key constraint, to maintain the nested nature of the data while it's being used as relational data.
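One way to see what the data set inferred is to enumerate its tables, columns, and relations after the load. The sketch below does that for the Figure 1 data, embedded as a string so the program is self-contained; the InferenceDemo class name is an assumption for this illustration. The listing also includes a column that does not appear in the XML, which the next section discusses.

```csharp
using System;
using System.Data;
using System.IO;

public static class InferenceDemo
{
    // The Figure 1 data, with each Orders element written on one line.
    const string Xml = @"<CustomersDataSet>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <Orders><OrderID>10643</OrderID></Orders>
    <Orders><OrderID>10692</OrderID></Orders>
  </Customers>
  <Customers>
    <CustomerID>ANATR</CustomerID>
    <CompanyName>Ana Trujillo</CompanyName>
    <Orders><OrderID>10308</OrderID></Orders>
    <Orders><OrderID>10625</OrderID></Orders>
  </Customers>
</CustomersDataSet>";

    public static DataSet Load()
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(Xml)); // no schema loaded: structure is inferred
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Load();

        // Print each table with its column names and types.
        foreach (DataTable table in ds.Tables)
        {
            Console.Write(table.TableName + ":");
            foreach (DataColumn col in table.Columns)
                Console.Write(" " + col.ColumnName + " (" + col.DataType.Name + ")");
            Console.WriteLine();
        }

        // Print each relation and whether it is marked as nested.
        foreach (DataRelation rel in ds.Relations)
            Console.WriteLine(rel.RelationName + " nested=" + rel.Nested);
    }
}
```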

 

Foreign Keys Are the Key

The thing that usually trips up developers when they're loading nested XML data into a data set is not understanding what the data set is using as a foreign key between parent and child tables. In the case of either of the XML documents shown in Figures 1 and 2, the data set doesn't have enough information to form a foreign key, so it adds a field to each of the two tables it creates to form that foreign key constraint. What it adds in that case is an integer field, named Customers_Id, to each of the Customers and Orders tables. It uses this field to create a primary key in the Customers table and a foreign key constraint from the Orders table to the Customers table using the corresponding field in the Orders table.

 

This has a couple of undesirable effects. One is that the relational data now has another field that you must manage. If you are inserting new data into the child table, you have to determine the Customers_Id value of the corresponding parent row and set it on the new row before adding it. Otherwise, the row won't be added as a child of the parent. For example, to add an order to the customer whose CustomerID field is ANATR, you would need the following code:

 

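// Requires: using System.Data;
DataSet ds = new DataSet();
ds.ReadXml("CustomersOrders.xml");
DataRow dr = ds.Tables["Orders"].NewRow();
dr["OrderID"] = 99999;
// Customers_Id is the key column the data set added on its own.
// ANATR is the second Customers row in the file, so its value is 1.
dr["Customers_Id"] = 1;
ds.Tables["Orders"].Rows.Add(dr);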

 

The key extra step here that eludes many people is the need to set that additional Customers_Id field, which was fabricated by the data set, to the appropriate foreign key value to make the new row a child of the appropriate parent Customers row.
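Rather than hard-coding the key value, you could look up the parent row by its CustomerID and copy its Customers_Id. The following is a minimal sketch of that idea under a few assumptions: the AddOrderDemo class and its LoadAndAddOrder method are hypothetical names, the data is an abbreviated copy of Figure 1 embedded as a string, and the customer ID is assumed to contain no quote characters, because it is placed directly into the filter expression.

```csharp
using System;
using System.Data;
using System.IO;

public static class AddOrderDemo
{
    // The Figure 1 data, with each Orders element written on one line.
    const string Xml = @"<CustomersDataSet>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <Orders><OrderID>10643</OrderID></Orders>
    <Orders><OrderID>10692</OrderID></Orders>
  </Customers>
  <Customers>
    <CustomerID>ANATR</CustomerID>
    <CompanyName>Ana Trujillo</CompanyName>
    <Orders><OrderID>10308</OrderID></Orders>
    <Orders><OrderID>10625</OrderID></Orders>
  </Customers>
</CustomersDataSet>";

    public static DataSet LoadAndAddOrder(string customerId, string orderId)
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(Xml));

        // Find the parent row by its real identifier.
        // Assumes customerId contains no quote characters.
        DataRow[] parents =
            ds.Tables["Customers"].Select("CustomerID = '" + customerId + "'");
        if (parents.Length == 0)
            throw new ArgumentException("No customer with ID " + customerId);

        DataRow dr = ds.Tables["Orders"].NewRow();
        dr["OrderID"] = orderId;
        // Copy the inferred key from the parent instead of hard-coding it.
        dr["Customers_Id"] = parents[0]["Customers_Id"];
        ds.Tables["Orders"].Rows.Add(dr);
        return ds;
    }

    public static void Main()
    {
        DataSet ds = LoadAndAddOrder("ANATR", "99999");
        // Write the result so the position of the new Orders element is visible.
        ds.WriteXml(Console.Out);
    }
}
```

Writing the data set back out as XML makes it easy to check that the new Orders element is nested under the intended Customers element.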

 

The other undesirable effect is that the data may already contain a field that represents the real primary key for the parent table, and the foreign key should reference that field instead. If that is the case, the solution is to first load an XML schema that embeds that information, using the ReadXmlSchema method, before loading the XML data with ReadXml.

 

You can easily create an XML schema for a given XML document using Visual Studio .NET. Simply load the XML document into the editor and select Create Schema from the XML menu. Once you've created an XML schema for the document, open the XSD file in the editor, drag a Key component from the Toolbox onto the appropriate table (Customers in the sample XML above), and set the primary key on the appropriate field. Once you've done that, if you load the XSD with ReadXmlSchema and then load the XML with ReadXml, the data set will no longer infer an extra field on the parent table; it will simply add a foreign key field to the child table that refers to the specified primary key field in the parent, and you should get exactly the results you expect.
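To illustrate the goal of using the real key, here is a rough sketch that defines the relational schema in code instead of loading an XSD file. This is a stand-in technique, not the Visual Studio workflow described above, and every name in it (KeyedSchemaDemo, the CustomerID column added to Orders, the relation name) is an assumption for illustration. It relates Orders to Customers on CustomerID and marks the relation as nested before the data is read.

```csharp
using System;
using System.Data;
using System.IO;

public static class KeyedSchemaDemo
{
    // The Figure 1 data, with each Orders element written on one line.
    const string Xml = @"<CustomersDataSet>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <Orders><OrderID>10643</OrderID></Orders>
    <Orders><OrderID>10692</OrderID></Orders>
  </Customers>
  <Customers>
    <CustomerID>ANATR</CustomerID>
    <CompanyName>Ana Trujillo</CompanyName>
    <Orders><OrderID>10308</OrderID></Orders>
    <Orders><OrderID>10625</OrderID></Orders>
  </Customers>
</CustomersDataSet>";

    public static DataSet Load()
    {
        // The data set name matches the root element of the XML.
        DataSet ds = new DataSet("CustomersDataSet");

        DataTable customers = ds.Tables.Add("Customers");
        customers.Columns.Add("CustomerID", typeof(string));
        customers.Columns.Add("CompanyName", typeof(string));
        customers.PrimaryKey = new DataColumn[] { customers.Columns["CustomerID"] };

        DataTable orders = ds.Tables.Add("Orders");
        orders.Columns.Add("OrderID", typeof(int));
        // Hypothetical foreign key column that refers to the parent's real key.
        orders.Columns.Add("CustomerID", typeof(string));

        DataRelation rel = ds.Relations.Add("Customers_Orders",
            customers.Columns["CustomerID"], orders.Columns["CustomerID"]);
        rel.Nested = true; // Orders elements are nested inside Customers elements

        // The schema already exists, so ReadXml loads data into it.
        ds.ReadXml(new StringReader(Xml));
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Load();
        foreach (DataRow row in ds.Tables["Orders"].Rows)
            Console.WriteLine(row["OrderID"] + " -> " + row["CustomerID"]);
    }
}
```

With a schema like this in place, a new order can be attached to its parent by setting the CustomerID value directly, with no inferred key column to manage.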

 

If you're loading the XML data into an XmlDataDocument instead of a data set, you'll have to call ReadXmlSchema on the DataSet property before loading data with the Load method. Otherwise, none of the loaded data will get added to the data set relational schema. The code below shows how to load a schema for the data set of an XmlDataDocument before loading the XML document:

 

// Create the document.
XmlDataDocument doc = new XmlDataDocument();
// Get the data set reference.
DataSet ds = doc.DataSet;
// Read schema to use for relational data in document.
ds.ReadXmlSchema(@"..\..\NestedData.xsd");
// Load it in.
doc.Load(@"..\..\NestedData.xml");

 

If you don't provide the schema, the data is still loaded and is accessible through the methods and properties of the base XmlDocument class, but the DataSet property contains no tables. Even if you provide the schema, you must keep in mind the foreign key issues described earlier when adding rows to the generated tables. If you add a new row to the Orders table without setting the foreign key column to the key of an existing row in the Customers table, the resulting Orders element is created as a child of the root element of the XML document.

 

Brian Noyes is a software architect with IDesign, Inc. (http://www.idesign.net), a .NET-focused architecture and design consulting firm. Brian specializes in designing and building data-driven distributed Windows and Web applications. Brian writes for a variety of publications and is working on a book for Addison-Wesley on building Windows Forms Data Applications with .NET 2.0. Contact him at mailto:brian.noyes@idesign.net.

 

 

 

 
