The DataSet Grows Up in ADO.NET 2.0

Part 1 - Fundamental Enhancements to the Primary Data Store

 

Alex Homer (alex@stonebroom.com)      ©2005 Stonebroom Limited, England

 

Note: the code listings in this article are correct for the Beta 1 release of the .NET Framework v 2.0, with notes as to the upcoming changes in the Beta 2 release. The example code is available in both Beta 1 and Beta 2 versions. All the examples can be downloaded from our Web site at http://www.daveandal.net/articles/datasetv2/, and you can also run some of them online from the same location.

 

Version 1.0 of the .NET Framework introduced a completely new approach to handling relational data, by separating out the two basics functions of all data access application requirements into separate classes that are optimized for each specific task. For applications that just need to access rowsets in a stream fashion, the DataReader classes are lightweight and give great performance in a connected scenario.

 

On the other hand, the DataSet class is designed for applications that require disconnected access to data, or which need to persist data without maintaining an open connection to the database. The DataSet is rather like the Recordset object from classic ADO, but has several useful advantages:

 

 

Although the DataSet class in version 1.x of the .NET Framework performs well, and can provide a suitable data store for all kinds of disconnected applications, Web Services, etc., there are some shortcomings. These have been addressed in version 2.0, where the DataSet grows up into a fully-fledged and extremely powerful data container that supports much better serialization, update capabilities and performance than in version 1.x.

 

In this and a follow-up article, we look at the changes between the version 1.x and version 2.0 DataSet and their associated classes, showing you how you can take advantage of the new features to improve your applications' capabilities and performance. The topics we'll be covering in this part are:

 

 

Then, in a follow-up article, we'll look in more detail at:

 

Loading a DataSet, and the New LoadOption Enumeration

One of the issues developers came across when using version 1.x of the DataSet was the awkward way that you have to use the GetChanges and Merge methods of the DataSet to support updates where you want to allow user-reconciliation of update errors. You must keep a copy of the DataSet containing the user's updates, so that the current values (the values they entered) are not lost when the Fill method of the DataAdapter refreshes the rows after attempting an update (or if the stored procedures you use for the update return the values from the database after the update has been executed).

 

In version 1.x, this could result in a six-step process, where you had to:

 

 

 

In version 2.0, Microsoft has addressed this issue by providing an enumeration called LoadOption, which allows you to control precisely how the values in existing rows in a DataSet are updated when incoming rows (from the Fill or the new Load methods) match the existing rows on the primary key - and hence effectively "replace" these existing rows. And they have also filled out the API for the DataSet, DataAdapter and DataView classes by adding a new property called AcceptChangesDuringUpdate, implementing methods that allow you (under certain circumstances) to change the RowState property value for rows in the DataSet tables, and adding a method to convert a DataView instance into a DataTable instance. We'll look at all of these topics in the following sections of this article.

The LoadOption Enumeration

The LoadOption enumeration is used to specify how the values in incoming rows replace (update) the values in existing rows in the DataSet tables. Remember that this only applies if the table has a primary key defined (so that incoming rows can be matched with existing rows), and where the incoming row actually does match (and hence replace or refresh) an existing row. The names of the enumeration members differ between the Beta 1 and Beta 2 releases, as shown in Table 1. The Beta 2 names should be the ones used in the release version of ADO.NET 2.0.

 

Table 1 - The Values in the LoadOption Enumeration for Beta 1 and Beta 2

Beta 1 Enumeration Member

Beta 2 Enumeration Member

Description

PreserveCurrentValues

PreserveChanges

Synchronize the rows and keep the modifications to rows in the DataSet. Effectively, only the original values in the existing rows are replaced with incoming values.

UpdateCurrentValues

Upsert

Aggregate the rows in the DataSet with the new rows. Effectively, only the current values in the existing rows are replaced with incoming values.

OverwriteRow

OverwriteChanges

Synchronize the rows and abandon modifications to rows in the DataSet. Effectively, both the current and original values in the existing rows are replaced with incoming values.

 

Table 2 shows the effects of each of the LoadOption values in more detail. You can see, for example, that the PreserveCurrentValues option only updates the original value in matching rows in the DataSet tables, while keeping the user's edited value as the current value. This is the ideal situation where you want to attempt to push the user's changes to the rows into the database, but allow them to reconcile any failed updates afterwards.

 

Usually an update will fail because another user has performed an update on the row in the database while the first user is holding a disconnected copy of the row in their DataSet (the application should, of course, validate values they enter and prevent most of the other common update errors). The application will be able to display the value that is in the database now (the underlying value) and the proposed value that was entered by the user (the current value), so that they can make an informed decision about how to resolve the failed update.

 

Table 2 - The Effects of the LoadOption Enumeration in More Detail

RowState of Existing Row

PreserveCurrentValues

PreserveChanges

UpdateCurrentValues

Upsert

OverwriteRow (default)

OverwriteChanges

 

Added

Current = Existing

Original = Incoming

RowState = Modified

Current = Incoming

Original = Existing

RowState = Added

Current = Incoming

Original = Incoming

RowState = Unchanged

Modified

Current = Existing

Original = Incoming

RowState = Modified

Current = Incoming

Original = Existing

RowState = Modified

Current = Incoming

Original = Incoming

RowState = Unchanged

Deleted

Current = Existing

Original = Incoming

RowState = Deleted

In Beta 1:

Undo Delete

Current = Incoming

Original = Existing

RowState = Modified

See note below for Beta 2 behavior.

Undo Delete

Current = Incoming

Original = Incoming

RowState = Unchanged

Unchanged

Current = Incoming

Original = Incoming

RowState = Unchanged

Current = Incoming

Original = Existing

if new value = existing:

  RowState = Unchanged

else:

  RowState = Modified

Current = Incoming

Original = Incoming

RowState = Unchanged

No existing matching row in table

Current = Incoming

Original = Incoming

RowState = Unchanged

Current = Incoming

Original = [n/a]

RowState = Added

Current = Incoming

Original = Incoming

RowState = Unchanged

 

Using the LoadOption Enumeration Values

The values in the LoadOption enumeration are used when populating a DataSet table with the Fill method or the new Load method. To use them in the Fill method, you set the FillLoadOption property of the DataAdapter that is performing the Fill:

 

myDataAdapter.FillLoadOption = LoadOption.PreserveCurrentValues   // Beta 1

myDataAdapter.FillLoadOption = LoadOption.PreserveChanges         // Beta 2

 

The new Load method takes a LoadOption value as its second parameter, for example:

 

myDataSet.Load(data-reader, LoadOption.PreserveCurrentValues, table-name array)  

 

We look at the Load method in more detail later in this article. Meanwhile, to see the effects of the LoadOption enumeration values, the next example uses them to control how rows are updated when using the Fill method of the DataAdapter. Figure 1 shows the example page we provide that demonstrates some of the effects of the LoadOption enumeration. At the top are a set of option (radio) buttons that you can use to specify the LoadOption value to use, followed by a list of three rows from the Northwind database. The page shows the current and original values, and the value of the RowState property, for the three stages that each row goes through as the page is executed.

 

Figure 1 - The Example Page that Demonstrates the Effects of the LoadOption Enumeration

 

The three stages of processing in the example page are:

 

 

There is quite a lot of code in the page, because it has to keep copies of the DataSet at each stage to be able to display the values afterwards. It also has to create a primary key on the table in the DataSet before making copies, so that the Fill method will match incoming rows with the existing rows (and hence update them). All these techniques are the same as you would use in version 1.x, and so we haven’t listed all the code here. You can use the [view source] link at the foot of the page to see it, or download the examples and play with the code yourself.

 

Examining the Effects of the LoadOption Enumeration Values

To see the effects of the LoadOption enumeration values, you can select each one and view the results in the example page. The following three screenshots (Figures 2, 3 and 4) show these results. When the LoadOption value is PreserveCurrentValues (PreserveChanges in Beta 2), the current value of the customer name in the modified row is maintained and the RowState remains set to Modified (see Figure 2). For the deleted row, there are no current values and the Fill method does not re-instance them - the RowState remains set to Deleted.

 

In the row that was added to the table, there is no change to the current values or the RowState because there is no matching incoming row (this row does not exist in the database). If another user had added a row to the database with the same primary key, however, the original values in this row would be set to the incoming values, and the row would be marked as Modified.

 

The overall effect is to synchronize the original values in the rows with the values in the database and keep the modifications to rows in the DataSet.  

 

Figure 2 - The Effects of the PreserveCurrentValues LoadOption value

 

When the LoadOption is UpdateCurrentValues (Upsert in Beta 2), the modified customer name in the current value of first row is replaced by the incoming row value, and the RowState remains set to Modified (see Figure 3). For the deleted row, where there are no current values, the incoming row re-instates these with the values in the database. It also changes the RowState from Deleted to Modified, because the row exists in the database and so an UPDATE command is required for the next update, and not an INSERT command.

 

 

The third row, which was added to the DataSet, remains as an Added row because no row in the database matches it. If another user had added a row with the same primary key, however, their values would replace the current value. But even if another user had added rows to the database that match any of the rows in the DataSet, these values would only replace the current values - while the original values would be retained.

 

The overall effect, therefore, is to aggregate the rows in the DataSet with the new rows, without losing the original values.

 

Figure 3 - The Effects of the UpdateCurrentValues LoadOption value

 

Finally, when the LoadOption value is OverwriteRow (OverwriteChanges in Beta 2), the current and the original values in each matching row (the first and second rows) are replaced by the values in the incoming rows (see Figure 4). And, because these match the values in the database, the rows are marked as Unchanged because there is no need to push the values back into the database when the next update is attempted. The Update method will process them and update the database only if you edit the rows again.

 

The only exception is the row that was added to the DataSet, which remains marked as an Added row ready to be inserted into the database when the next Update takes place. However, if another user had added a row to the database with the same primary key, their values would replace both the current and original values in the DataSet and the RowState would be set to Unchanged.

 

The overall effect is to synchronize the rows, and abandon modifications to rows in the DataSet.

 

Figure 4 - The Effects of the OverwriteRow LoadOption value

Updating the RowState Property

Notice how, in Table 2, the LoadOption values affect the setting of the RowState property for each row that an update is attempted on. A common problem in version 1.x of ADO.NET is that the RowState values are read-only, and cannot be changed. While this is fine for common update operations on the rows, it does make it hard to perform custom management of the data - perhaps when you want to manipulate the contents of the DataSet so that updates and/or inserts are attempted on rows that are currently marked as Unchanged (and so the Update method will ignore them).

 

In version 2.0, you can change the values of the RowState property using two new methods that are added to the DataRow class. These are SetAdded and SetModified, and change the RowState property of the row to Added or Modified. However, they can only be called on rows that are currently marked at Unchanged, and not on rows that are marked as Added, Deleted or Modified.

 

The DataAdapter.AcceptChangesDuringUpdate Property

In version 1.x of the DataAdapter class, you can change the AcceptChangesDuringFill property from its default value of true to false, so that incoming row values only set the current values in the rows in the DataSet table, and are not propagated to the original values. In other words, the Update method does not call the AcceptChanges method on each of the rows as it fills them. This is useful when, for example, you want to keep existing values in the original values of the rows - perhaps to check for changes to the data since the DataSet was last filled.

 

In version 2.0, Microsoft has added the AcceptChangesDuringUpdate property to the DataAdapter. This has the same effect as AcceptChangesDuringFill, except it affects the outcome when the Update method is called. Effectively, it prevents the AcceptChanges method from being executed on each row. It is useful where, for example, a SQL batch statement or stored procedure you are using to perform the updates returns values. These values will replace the current values of the row in the DataSet table, and by default are copied to the original values by the AcceptChanges method. By changing the value of this property from the default true to false, you prevent the values being copied into the original values of the rows.

The DataView.ToTable Method

In version 2.0 of ADO.NET, the DataView class gains a new method named ToTable. This allows you to convert an existing DataView instance into a DataTable¸ which might be useful if you have existing code that creates a DataView, or a class that exposes only a DataView and not the original DataTable. There are three overloads of the ToTable method that allow you to use the existing view and column names as the new table and column names, or specify different table and/or column names. You can find out more from the .NET Framework SDK - search for "DataView.ToTable".

Stand-alone DataTable Instances

Those developers who looked at the Technical Preview release of ADO.NET will have noticed a class named DbTable that appeared there - with no obvious links to existing classes or the natural hierarchy of the other ADO.NET objects. The idea was to allow developers to work with single tables or rowsets, without having to create a DataSet to manage them. DbTable was designed to be used standalone, but could also be stored in a DataSet.

 

However, the whole approach was confusing and this class disappeared in Beta 1. Instead, the more obvious approach was taken, by surfacing the existing DataTable class with new methods and properties that allow it to be used in stand-alone mode. Effectively, it still lives in a DataSet but you no longer have to create and manipulate this DataSet. You just work directly with the individual DataTable instance.

 

What this means is that common operations for which you would have required a DataSet in version 1.x are now supported against DataTable. These include the ReadXml, ReadXmlSchema, WriteXml, WriteXmlSchema, Clear, Clone, Copy, Merge, GetChanges methods. The DataSet is also auto-serializable, and so can be returned from a Web Service or via Remoting. And you can use the Fill and Update methods of the DataAdapter on a DataTable:

 

myDataAdapter.Fill(myDataTable)

myDataAdapter.Update(myDataTable)

 

You can also fill an array of DataTable instances:

 

myDataAdapter.Fill(myDataTableArray)

 

There are also other overloads of the Fill method that allow you to fill subsets of rows in a DataTable.

Using the Load Method with a DataSet and DataTable

The stand-alone DataTable also supports the Load method, as mentioned in connection with the DataSet class earlier in this article. Here, we'll look at the Load method in more detail, both for the DataTable and the DataSet classes. The only difference between the two is that, when calling Load on DataSet, you must specify the tables to load. When calling it on DataTable, it loads the current DataTable instance.

 

The syntax for the three overloads of the Load method for the DataSet class is shown in Table 3. You must always specify a DataReader containing the source data (it can, of course, return more than one rowset), and a LoadOption value. You must also specify the target tables as an array, either by name or as references to DataTable instances. And, finally, you can specify a handler for the FillError event that is raised if an error occurs during the Load process.  

 

Table 3 - The Load Method Overloads for the DataSet Class

Method overload

Description

Load(data-reader, load-option, table-name-array)

Takes a reference to a class that implements the IDataReader interface, such as a SqlDataReader or OleDbDataReader, and loads the rows it exposes into the specified tables of the DataSet. The target tables are specified as a String array of table names (less than or equal to the number of rowsets available through the DataReader). The load-option parameter is a value from the LoadOptions enumeration that determines how the values of incoming rows will be used when existing matching rows are present in the DataSet tables.

Load(data-reader, load-option, data-table-array)

Loads data from a reader instance that implements the IDataReader interface into the tables in a DataSet. Works exactly like the previous overload except that the list of target tables is specified as an array of DataTable instances instead of the table names.

Load(data-reader, load-option, fill-error-handler, data-table-array)

Loads data from a reader instance that implements the IDataReader interface into the tables in a DataSet. Works exactly like the previous overload except that a reference to a delegate (event handler) can be specified. This event handler will be executed if there is an error while loading the data.

 

For a DataTable, there are also three overloads of the Load method, as shown in Table 4. These also allow you to specify a LoadOption value, and (if required) a handler for the FillError event that is raised if an error occurs during the Load process. There is no need to specify the target table, as the data is loaded into the DataTable instance on which the method is called.

 

Table 4 - The Load Method Overloads for the DataTable Class

Method overload

Description

Load(data-reader)

Takes a reference to a class that implements the IDataReader interface, such as a SqlDataReader or OleDbDataReader, and loads the rows it exposes into this DataTable.

Load(data-reader, load-option)

Loads data from a reader instance that implements the IDataReader interface into this DataTable. The load-option parameter is a value from the LoadOptions enumeration that determines how the values of incoming rows will be used when existing matching rows are present in the DataSet tables.

Load(data-reader, load-option, fill-error-handler)

Loads data from a reader instance that implements the IDataReader interface into this DataTable. Works exactly like the previous overload except that a reference to a delegate (event handler) can be specified. This event handler will be executed if there is an error while loading the data.

 

Remember that the Load method (like the Fill method) will match incoming rows against any existing rows if a primary key is declared in the table. It must do this to maintain data integrity - it cannot insert a duplicate row that has the same primary key value as an existing row. However, if the table already contains rows and there is no primary key defined, the incoming rows are added to the table after the existing rows.

 

If the schema of the table (i.e. the names and ordering of any existing columns) does not match the incoming rows, the schema may be adjusted. If there are columns in the incoming rows that are not in the existing table, any that are missing are appended to the end of the Columns collection of the table. This can cause Load errors if the new columns have constraints, for example if they cannot contain null values, and you have used PreserveCurrentValues (PreserveChanges in Beta 2) as the LoadOption for the process. The new columns in existing rows will contain null values (unless you have specified default values), and so an error will occur when the Load method tries to re-enable constraints for the row after loading the data.

Using a DataReader to Load Tables in a DataSet

As an example of using the Load method, the code listed below demonstrates how you can create a DataReader (in this case a SqlDataReader) that returns multiple rowsets using a SQL batch statement, and then push these rows into the tables in a DataSet. The code creates the DataReader, and then declares a String array containing two table names, Orders and Order Details. Then it creates a new empty DataSet and calls its Load method - passing in the DataReader and the array of table names.

 

All the overloads of the Load method for a DataSet require a value from the LoadOption enumeration to be specified, and we use OverwriteRow (OverwriteChanges in Beta 2) here. In this case, because the DataSet is empty, the value we choose makes no difference - there are no existing rows that could match the incoming rows, and there is no primary key defined either. 

 

// connection and query details

String connect = "your-connection-string";

const String SQL = "SELECT TOP 5 OrderID, OrderDate, ShipName, ShipCity, "

                 + "ShipCountry FROM [Orders];"

                 + "SELECT TOP 5 * FROM [Order Details];"

                 + "SELECT TOP 5 CustomerID, CompanyName, Address, City, "

                 + "Country FROM [Customers]";

 

// create a connection, command and get a DataReader

using (SqlConnection con = new SqlConnection(connect))

{

  SqlCommand cmd = new SqlCommand(SQL, con);

  con.Open();

  reader = cmd.ExecuteReader(CommandBehavior.CloseConnection);

 

  // create an array of table names for first two incoming rowsets

  String[] tablenames = new String[] {"Orders", "Order Details"};

 

  // load the data from the DataReader

  // no existing data, but have to provide a value for LoadOptions

  // so use OverwriteRow - makes no difference in this case

  DataSet ds = new DataSet();

  ds.Load(reader, LoadOption.OverwriteRow, tablenames);

 

  // load another table directly using the Load method

  DataTable dt = new DataTable();

  dt.TableName = "Customers";

  dt.Load(reader);

  ds.Tables.Add(dt);

 

  // display the results

  ...

}

 

So far, however, we've only loaded two tables (Orders and Order Details). To demonstrate another way of using the Load method, the code then calls it on an individual DataTable. To achieve this, we have to first create a new DataTable. We assign the name Customers to this table then call its Load method to load the remaining rowset into it from the DataReader, and finally add it to the Tables collection of the DataSet. The remaining code (not shown here) displays the progress messages you see at the top of the page shown in Figure 5, and then assigns the tables to three GridView controls to display the contents:

 

Figure 5 - Loading DataTables from a DataReader using the Load Method

The BeginLoadData and EndLoadData Methods of the DataTable

The DataTable class also exposes two more methods, named BeginLoadData and EndLoadData, which can be used to disable constraint checking and internal index maintenance in a DataTable while data is being loaded. This speeds up load times, and only raises any constraint violations (such as primary key duplication) when the EndLoadData method is called. If you intend to load a large number of rows, you should consider using these methods:

 

try

{

  myDataTable.BeginLoadData();

  myDataTable.Load(myDataReader);

  myDataTable.EndLoadData();

}

catch (Exception e)

{

   // ... handle error ...

}

Using the FillErrorEventHandler to Catch Load Errors

You can also create an error handler routine that will be executed if an error occurs during the loading of the data into the target table. This is basically the same process as handling the existing FillError event of the DataAdapter class in version 1.x of ADO.NET. However, as there is no DataAdapter involved in the Load process, the technique for connecting the event handler to the method involves passing a delegate reference into the Load method:

 

myDataSet.Load(data-reader, load-option, new FillErrorHandler(myFillHandler), tables);

 

Then you create the event handler - where, for example, you can access the table name, the values in the row that caused the error, and set the Continue property of the FillErrorEventArgs to specify if processing should halt or continue with the next row:

 

  private void MyFillHandler(Object sender, FillErrorEventArgs args)

  {

 

    // write message to a StringBuilder for display later

    builder.Append("Error while loading table " + args.DataTable.TableName);

    builder.Append("Values of the row being loaded are:<br />");

    foreach (Object o in args.Values)

    {

      builder.Append(" - " + o.ToString() + "<br />");

    }

 

    // tell Load method to continue with next row

    args.Continue = true;

  }

 

Figure 6 shows an example that uses both the Load method by itself, and the sequence of BeginLoadData, Load and EndLoadData methods, to load two DataTable instances directly. It also contains a handler for the FillError event, which you can view from the source code, or by clicking the [view source] link that is at the foot of all the example pages on our Web site.

 

Figure 6 - Using the BeginLoadData, Load and EndLoadData Methods

 

Creating and Using the DataTableReader Class

One of the really great new features in version 2.0 of ADO.NET is the new class named DataTableReader. You can use it to stream data out of a DataSet or a stand-alone DataTable instance, just like you stream data from a relational database using a DataReader instance such as SqlDataReader or OleDbDataReader. The DataTableReader exposes much the same set of methods and properties as the OleDbDataReader class (it does not expose the set of GetSqlxxx methods that are available in SqlDataReader because it cannot used against a SQL Server database).

 

The methods available for DataTableReader include Read, NextResult, Close, GetName, GetOrdinal, and the full set of methods to retrieve values as specific types - such as GetInt32, GetString, GetByte etc. Properties include Depth, FieldCount and HasRows. And you can read the values in each row using the Item collection, or by indexing the row on the column name or the ordinal position - just as you do with a SqlDataReader.

 

The DataTableReader is a "live" reader, in that it reflects the values of the rows in the table at the time the row is read, and not at the time the reader is opened (in other words it is not operating over a snapshot of the table, but is accessing the rows in the table directly when you call the Read method).

 

You can get an instance of a DataTableReader by calling the GetDataReader method (in Beta 1) or the CreateDataReader method (in Beta 2) of a DataSet or a single DataTable. For a DataSet, in Beta 1, the options are:

 

myDataTableReader = myDataSet.GetDataReader()

myDataTableReader = myDataSet.GetDataReader(array-of-tables)

 

In Beta 2 you use:

 

myDataTableReader = myDataSet.CreateDataReader()

myDataTableReader = myDataSet.CreateDataReader(array-of-tables)

 

If you don't provide a parameter to the GetDataReader/CreateDataReader method, it returns a reader over all the tables in this DataSet. Instead, if you only want to read some of the tables, or require them to be read in a specific order, you can provide an array of DataTable references to get a DataTableReader that will return a rowset for each of the tables in the array.

 

For a DataTable, you just call the GetDataReader/CreateDataReader method with no parameters, to get a DataTableReader that is open over just the DataTable that this method is called on:

 

myDataTableReader = myDataTable.GetDataReader()       {Beta 1}

myDataTableReader = myDataTable.CreateDataReader()    {Beta 2}

 

An alternative approach is to use the constructors of the DataTableReader class to create a reader over a table. You can create a DataTableReader over an existing DataTable using:

 

DataTableReader myReader = new DataTableReader(myDataTable)

 

Or you can create a DataTableReader over more than one DataTable by specifying references to the tables in an array that you pass to the constructor:

 

DataTableReader myReader = new DataTableReader(array-of-tables)

 

You can open multiple DataTableReader instances over a table at the same time, and read from them in any order (though they are, like the DataReader, forward-only readers).

Using the DataTableReader Class to Copy a Table

Another great feature of the DataTableReader class is that it implements the IDataReader interface. This means that you can use it as the data source to populate a DataSet or DataTable though the Load method. For example, in Beta 1, you can create a DataTableReader on a DataTable, and then use it to fill another DataTable like this:

 

myDataTableReader = myFirstDataTable.GetDataReader()

mySecondDataTable.Load(myDataTableReader)

 

The code shown next demonstrates this by loading a DataSet with a table named Customers that contains five rows, and then creating a new DataTable named CustomerCopy within the DataSet. It then uses the constructor of the DataTableReader class to create a new DataTableReader instance over the Customers table and load these rows into the new CustomerCopy table:

 

    builder = new StringBuilder();

    DataSet ds = new DataSet();

 

    // create DataAdapter and get a DataSet of rows

    SqlDataAdapter da = new SqlDataAdapter(SQL, connect);

    da.Fill(ds, "Customers");

 

    // display details of table in DataSet

    builder.Append(String.Format("DataSet contains table '{0}' "

           + "with {1} rows<p />",

           ds.Tables[0].TableName, ds.Tables[0].Rows.Count.ToString()));

 

    // add new table to DataSet named "CustomerCopy"

    DataTable dt = new DataTable();

    dt.TableName = "CustomerCopy";

    ds.Tables.Add(dt);

 

    // create DataTableReader over original "Customers" table

    DataTableReader reader = new DataTableReader(ds.Tables["Customers"]);

 

    // load new table with rows from original table

    dt.Load(reader);

 

    // display rows in the GridView control

    grid1.DataSource = dt;

    grid1.DataBind();

 

Figure 7 shows the results of the code listed above. You can see that five rows were read from the database into the DataSet, and then these five rows were copied into the new table named CustomerCopy using a DataTableReader (you can use the [view source] link at the foot of the page to see the source code).

 

Figure 7 - Reading a Table with a DataTableReader and Loading another DataTable

Using a DataTableReader to Copy Tables between DataSets

You can also use a DataTableReader to copy table rows from one DataSet to another. For example, you can create a DataTableReader on a table in one DataSet, and then use it to fill a table in another DataSet, like this:

 

// create array of source table references

DataTable[] tablearray = new DataTable[3];

tablearray[0] = myFirstDataSet.Tables["Table"];

tablearray[1] = myFirstDataSet.Tables[1];

tablearray[2] = myFirstDataSet.Tables["SomeOtherName"];

 

// get a DataTableReader over these tables

myDataTableReader = myFirstDataSet.GetDataReader(tablearray);

 

// create array of destination table names

String[] tablenames = new String[] {"Orders", "Order Details", "Customers"};

 

// load data into destination tables

mySecondDataSet.Load(myDataTableReader, LoadOption.OverwriteRow, tablenames);

 

As an example of this technique, the page shown in Figure 8 uses code similar to that listed above to copy three tables from one DataSet to another. It fills a DataSet with three tables using the Fill method of a DataAdapter, and - because the table names are not specified - they are automatically assigned the names Table, Table1 and Table2. Then the code creates a DataTableReader over these three tables using the GetDataReader method, and uses this to load three tables in a new DataSet.

 

Figure 8 - Copying Rows between DataSets using a DataTableReader

 

You can even use this technique to copy rows that are not all in the same DataSet. In other words, you can create a DataTableReader that will stream data from tables in different DataSet instances. For example:

 

DataTable[] tablearray = new DataTable[3];

tablearray[0] = myFirstDataSet.Tables["SomeTable"];

tablearray[1] = mySecondDataSet.Tables["AnotherTable"];

tablearray[2] = myThirdDataSet.Tables[0];

 

// get a DataTableReader over these tables

myDataTableReader = myFirstDataSet.GetDataReader(tablearray);

...

 

Batched Updates from a DataSet or DataTable

When you call the Update method of the version 1.x DataAdapter to push the changes in a DataSet into the database, it iterates the rows in the DataSet looking for any that have their RowState property set to Added, Deleted or Modified. For each one it finds, it executes the command specified for the appropriate InsertCommand, DeleteCommand or UpdateCommand property of the DataAdapter. This means that each update requires a round-trip to the database.

 

In version 2.0, you can instruct the DataAdapter to perform the updates in batches, where the database you are using supports this feature (most do, including SQL Server 7.0 and above, and Oracle when using the OracleClient classes). All that's required is to set the UpdateBatchSize property of the DataAdapter to an integer value greater than 1:

 

myDataAdapter.UpdateBatchSize = batch_size

 

The default value for this property, if you do not set it, is 1; and so the Update behaviour defaults to that of version 1.x. Using the value zero forces all rows in the DataSet to be processed as a single batch.

 

 

The batched update feature works inside any connection-based transactions that you might have implemented, and dramatically reduces the number of network round-trips required to perform the updates from the DataSet. And the great thing is that it is virtually transparent in that, if the database you are using cannot support batched updates, the DataAdapter automatically reverts to using individual updates for each changed row.

Handling the RowUpdated Event for Batched Updates

As in version 1.x, the DataAdapter in version 2.0 exposes an event named RowUpdated, which you may find useful when performing batched updates. It is fired when each batch is completed, and exposes properties that you can use to get information about the process. Note that you don’t have to handle this event when performing batched updates - but can be used to implement process that require access to each batch update if you require this functionality.

 

To use the RowUpdated event, you just declare a SqlRowUpdatedEventHandler routine, which receives a reference to a SqlRowUpdatedEventArgs instance that contains information about the current batch. You can get a count of the number of rows in the current batch from the RowCount property, and the number of rows updated from the RecordsAffected property. There is also a property named StatementType that indicates the type of statement being executed (Update, Insert or Delete), and a write-only property named Status that you can set to indicate if the process should continue or halt (using a value from the UpdateStatus enumeration).

 

 

The SqlRowUpdatedEventArgs class also exposes references to the current Command, any exceptions that have occurred (the Errors property), and the table mapping in use by the DataAdapter (the TableMapping property). Finally, the Row property returns a reference to the last row that was updated. However, one point to watch out for is that the Row property returns null if there are fewer rows in the batch than you specified for the batch size (which is likely to be the case for the last update batch).

 

The following listing shows a simple event handler for the RowUpdated event, and the way that it is attached to the RowUpdated property of the DataAdapter:

 

// add event handler for RowUpdated event

da.RowUpdated += new SqlRowUpdatedEventHandler(RowUpdatedHandler);

...

...

private void RowUpdatedHandler(Object sender, SqlRowUpdatedEventArgs args)

{

  builder.Append(String.Format("RowUpdated event, StatementType = '{0}'<br />",

                                args.StatementType.ToString()));

  builder.Append(String.Format("Updated {0} row(s)<br />",

                                args.RecordsAffected.ToString()));

  try

  {

    // need try..catch as row may be null if this is the last batch

    builder.Append(String.Format("OrderID: {0}, &nbsp; CustomerName: {1}, "

          + "&nbsp; City: {2}",

          args.Row["OrderID"], args.Row["ShipName"], args.Row["ShipCity"]));

  }

  catch { }

  builder.Append("<p />");

}

Viewing the Effects of Batched Updates

We provide an example page that demonstrates the effects of setting the UpdateBatchSize property of the DataAdapter when executing the Update method. Figure 9 shows the page, with the value zero selected for the UpdateBatchSize property. This causes the DataAdapter to attempt all the updates within a single batch. The RowUpdated event handler displays the statement type ("Batch"), the number of rows that were updated within the current batch, and some values from the last row that was updated. As you can see from Figure 9, all 77 rows stored in the DataSet were updated within the database as a single batch.

 

Figure 9 - The Batched Updates Example When the Batch Size is 1

 

 

If you select the value 1 for the UpdateBatchSize property (this is the default is not specified), the DataAdapter reverts to its version 1.x behavior and updates each row individually. Figure 10 shows the results, and you can see that the statement type is now "Update" (rather than "Batch"), and only one row is updated each time.

 

Figure 10 - The Result When the Batch Size is 1

 

Selecting the value 20 for the UpdateBatchSize causes the DataAdapter to perform the updates in batches of 20, and the RowUpdated event is therefore raised four times for the 77 rows in our DataSet. Each time except for the last batch, the values in the last row within the batch are displayed. However, for the last batch, there is no "current row" when it completes and so no row values are shown.

 

Figure 11 - The Result When the Batch Size is 20

 

Finally, selecting a value greater than the total number of rows in the DataSet table gives the same behavior as you get when selecting the value zero for the UpdateBatchSize property. The DataSet performs all the updates as a single batch, as you can see in Figure 12. Just bear in mind the limitations on batch sizes, and so using a specific value is probably a good idea in Beta 1; rather than using zero and relying on the DataAdapter to manage the batch size. This may change, however, for the release version of ADO.NET 2.0.

 

Figure 12 - The Result When the Batch Size is 100

Performance and Scalability Improvements

The DataSet acts rather like a relational database in the way that it stores and manages rowsets. OK, so you can't execute SQL queries against the data, but the DataSet does contain a lot of functionality that comes into play when you add, delete and edit the data rows it holds. For example, if you have specified a primary key for a table, the DataSet will not allow you to add new rows that match existing rows on the primary key, or change the value of the primary key in one row to match that in any other existing row.

 

Likewise, if you have a DataRelation defined between two tables in the DataSet, you can't (by default) violate referential integrity by adding new child rows where there is no matching parent row. And (again, by default) if you delete or edit the primary key value of a parent row, the matching child rows will be deleted or updated automatically.

 

 

All of this means that the DataSet has to manage the rows it contains. To maximize performance, and avoid operations such as a table scan to check for existing values in each row, the DataSet maintains a set of internal indexes on the rows. However, it does seem that Microsoft initially though the number of rows developers would store in a DataSet would be modest, and so the internal indexing algorithms are not as efficient as they could be. And, as it turns out, developers seem to regularly use a DataSet to store huge numbers of rows - 10,000 to 100,000 being quite common. At this point, the process of inserting, deleting and updating rows can slow down quite noticeably, and cause issues with application response times and overall throughput.

The Version 2.0 DataSet and DataTable Load API

To solve this in version 2.0, Microsoft has optimized the internal indexing of the DataSet through a complete re-write of the code that takes a new approach (called "red/black trees"). The result is that inserting and deleting rows gives a performance that drops by only around log-n as the number of rows (n) increases. Meanwhile, performance for updates is virtually constant irrespective of the number of rows.

 

OK, for less than around 30,000 rows, the difference in performance (based on figures produced using the Beta release) is not really noticeable. However, as the number of stored rows increases, the performance gains are a lot more obvious. Preliminary figures show a reduction in loading time of 25% at 30,000 rows and 50% at 50,000 rows.

 

This gain in performance is implemented through the internal interfaces of the DataSet and DataTable classes, in what's knows as the "Load API". This API is used by many methods of the DataSet and DataTable classes, including the Fill and the new Load methods we discussed earlier, and so the performance gains within the API are reflected in improved performance of these methods as well.

The Version 2.0 DataSet Persistence Formats

It's worth looking briefly at some other changes to the DataSet that improve performance and usability, but this time when saving or persisting data. A common scenario for the DataSet is to persist it across post-backs in an ASP.NET application, and deliver it across the tiers of an application - or through a Web Service - to another location where it acts as a data repository. The