DataStream
LANGUAGES: C#
ASP.NET
VERSIONS: 1.0 | 1.1
Merge
Disparate Source Data
Learn
to pull data from multiple sources into a single DataSet.
By Brian
Noyes
The
DataSet is a powerful object. You can add multiple tables, set relations
between the tables, and add constraints, giving you a powerful in-memory
container for data. It can keep track of modifications to rows, and can use the
original and the modified rows to perform updates using optimistic concurrency.
How you
get data from multiple sources into one DataSet depends a lot on where the data
is coming from. If you are executing the queries yourself against one or more
databases to construct the DataSet, then there are lots of examples of how to
execute a query against a database using the ADO.NET Command object and a
DataAdapter to fill the DataSet with one or multiple tables.
But what
do you do if you are aggregating data from multiple sources, and that data
comes in the form of pre-constructed DataSets or XML that you want to turn into
a DataSet? Perhaps you are getting a DataSet as the return value from a Web
service that you do not control, or are getting separate XML files representing
tables of data from other data sources. You want to get that data into a single
DataSet for data binding, updates, or manipulation purposes.
Adding Data
and Schema with Merge
The
answer to bringing together data from multiple sources is actually very
straightforward, but seems to elude many people. The DataSet class implements a
method named Merge that allows you to pass in a source DataSet, DataTable, or
DataRow array to a target DataSet and have the data from the source merged into
the target DataSet. There are a number of overloads of the Merge method,
allowing you to approach things in different ways depending on your needs. You
can simply add multiple tables to a DataSet by merging them in from multiple
sources, or you can add data from multiple sources into a single table. Both of
these approaches include many options in terms of the way merge is performed
with respect to the schema of the target DataSet and whether changes are
preserved when data is imported into an existing table in the target DataSet.
To
demonstrate some of the options and the basic approach, the sample code
accompanying this article includes a simple application that pulls in data from
three XML files (see end of article for download details). Two of the XML files
contain data from the Northwind Orders and Order Details tables. The third
contains some modified and added records for the Orders table that the code
merges into the existing Orders table data to effect changes to the existing
data. Imagine that these three XML files could have come from completely
different data sources over the wire. However, the data has implicit relationships
that you want to recapture and use in your application.
Your
first thought might be that you can just read multiple XML files into the
DataSet in a similar way to loading multiple queries into a DataSet using a
DataAdapter. The ReadXml method of the DataSet makes it easy to read in XML
into a DataSet, but it only supports repopulating the DataSet from scratch, not
adding information to the DataSet incrementally. Unfortunately, with the
current version of .NET, there is no XmlAdapter. I will be covering some of the
new capabilities just announced for .NET 2.0 in future columns, but there is in
fact an XmlAdapter in .NET 2.0. If you need to do this kind of loading of
multiple XML files into a DataSet a lot today, consider writing your own class
to encapsulate the process until the .NET 2.0 functionality is released.
So the
solution in this case is to use the Merge method on the DataSet class. Let's
start with the case where you have data coming from two sources that represent
different tables, but that there is an implicit parent-child relationship
between the tables. To get the data into a single DataSet with that
relationship, the first step is to read the data into two different DataSets
first (assuming XML as the source data format - you may also have been passed a
fully constructed DataSet from a Web service method or class library call):
DataSet
dsOrders = new DataSet();
DataSet
dsOrderDetails = new DataSet();
dsOrders.ReadXml("Orders.xml");
dsOrders.Tables[0].TableName
= "Orders";
dsOrderDetails.ReadXml("OrderDetails.xml");
dsOrderDetails.Tables[0].TableName
= "OrderDetails";
Note
that the code is renaming the tables after reading them in. This is because the
XML files for the sample were purposely saved without schema information to
more closely represent data from disparate data sources, where there may not be
any agreed upon common schema. To bring the two tables together into one
DataSet, you then call the Merge method on the target DataSet into which you
want to aggregate the data from the source DataSet. Then you can create the
relationship between the tables:
// Perform the
merge
dsOrders.Merge(dsOrderDetails.Tables["OrderDetails"]);
// Now that the
child table is there, create the
// relationship
and constraints
dsOrders.Relations.Add("FK_Orders_OrderDetails",
dsOrders.Tables["Orders"].Columns["OrderID"],
dsOrders.Tables["OrderDetails"].Columns["OrderID"],true);
In this
case, I am using the version of the Merge method that simply takes a DataTable
reference. It will also transfer the schema information from that table into
the DataSet and add it as an additional table (see Figure 1). The result is
that a new DataTable is added to the target DataSet (dsOrders) with the schema
of the Order Details table and its contained data.
Figure 1. Calling Merge on one DataSet and
passing in the table from another results in a copy of the table from the
source being created in the target.
Change Data
with Merge
Now
let's take a look at another scenario where you might use the Merge method. Say
you have a DataSet that contains your current data for a table. That current
data may or may not have modifications already made to it since it was
retrieved from its data source. You receive a set of data from some other
source that contains new or modified data that you need to use to perform
updates to the existing set of data. The data may be in the form of an XML
file, a DataSet with a DataTable that matches the schema of the target
DataTable, or perhaps an array of DataRows resulting from a method call such as
Select or GetChanges on an existing DataSet.
In this
situation, when you call Merge on the target DataSet, passing in source data
whose schema matches an existing table in the target, the data from the source
will be merged into the target. What "merge" means in this case depends on a
number of factors.
The
simplest to understand is the default case, where you just pass in the source
data to a target, and both source and target have matching table schema with
primary key information. In this case, if a row in the source data matches the
primary key of a row in the target data, the source data will replace the
current values of the fields of the corresponding row in the target data. If no
matching primary key is found for a row in the source data, it will be added to
the target data as a new row (see Figure 2). If there are any schema mismatches
or if a constraint is violated, an exception will be raised at the end of the
updating process with embedded information about what went wrong, and error
information will be added to the offending rows.
Figure 2. Rows from the source table replace
the matching rows in the target table. Rows from the source that do not match
target rows are simply added to the target as new rows.
If you
use one of the overloads of the Merge method that take a Boolean
preserveChanges parameter, and you pass true for that parameter instead of the
default value of false, the behavior in the scenario is quite different. When
you tell the Merge method to preserve changes, if a row in the source data has
the same primary key as a row in the target data, the target data row current
values will remain unchanged, but the original values for the row in the target
will take on the original values of the source row.
I'm not
sure I see a lot of value in using this approach, because if the original
values change, you will get concurrency exceptions if you try to use the target
DataSet to perform updates to its data source. But that is the designed
behavior of the Merge method when told to preserve changes. Figure 3 shows the
code for the method that reads in the source and target data and merges them.
Note that it is important to have primary key information available in both of
the tables. Without this, Merge will treat all rows coming from the source as
new rows.
private void
MergeRows(object sender, System.EventArgs e)
{
// Create the source and target DataSets
DataSet dsOrders = new DataSet();
DataSet dsModOrders = new DataSet();
// Load the target data and name the table
dsOrders.ReadXml("Orders.xml");
dsOrders.Tables[0].TableName =
"Orders";
// Setting primary key is important because
that is the
// only way Merge can identify matching
rows
dsOrders.Tables[0].PrimaryKey = new
DataColumn[]
{dsOrders.Tables[0].Columns["OrderID"]};
// Make all the rows in the target the
original values
dsOrders.AcceptChanges();
// Load the source data, set table name and
PK
dsModOrders.ReadXml("ModifiedOrders.xml");
dsModOrders.Tables[0].TableName =
"Orders";
dsModOrders.Tables[0].PrimaryKey = new
DataColumn[]
{dsModOrders.Tables[0].Columns["OrderID"]};
// When loaded, the rows are all treated as
new/mod rows
// If AcceptChanges is called, it sets them
all to orig
// so that the merge behavior with
preserveChanges
// can be observed.
if (chkSourceOrig.Checked)
{
dsModOrders.AcceptChanges();
}
// Do the merge
dsOrders.Merge(dsModOrders.Tables["Orders"],
chkChanges.Checked,MissingSchemaAction.AddWithKey);
// Data bind to see the results, varying
based on the
// selected RowState
dataGrid1.DataSource = new DataView(
dsOrders.Tables["Orders"],null,null,
(DataViewRowState)cmbRowStateFilter.SelectedItem);
}
Figure
3. Calling Merge
with the same table in both source and target allows you to perform updates
from other DataSets. The behavior of merge depends on a combination of the
RowState of rows in the source and target and whether the preserveChanges
Boolean argument is passed to Merge.
The
sample program for the article will let you play with these options by changing
the preserveChanges parameter on the fly and displaying the resulting changes
with a selectable RowState filter. You will want to look at the source and
target data to see which values are coming from where to understand what you
are seeing at run time, but this should give you a better idea of what
combinations are possible using the Merge method. To read about all the various
combinations of behavior of the Merge method, see the topic "Merging DataSet
Contents" in the MSDN library.
The
sample code accompanying this article is available for download.
Brian Noyes is a consultant, trainer, speaker,
and writer with IDesign, Inc. (http://www.idesign.net),
a .NET focused architecture and design consulting firm. Brian specializes in
designing and building data-driven distributed applications. He has over 12
years' experience in programming, engineering, and project management, and is a
contributing editor and writer for C#PRO, asp.netPRO, and other publications. Contact
him at mailto:brian.noyes@idesign.net.