DataSets and Diffgrams in .NET 1.0 and 1.1



In ADO.NET, the only persistence format for data is as XML. The DataSet object can persist its content to a disk file or a stream as an XML diffgram, which preserves all the values for both the current and original versions of each row. However, diffgram support was a relatively late addition to the DataSet, and the format changed regularly during development of the .NET Framework. The result is that there are a couple of "holes" in version 1.0, which were fixed in version 1.1 - so knowing how to work round the issues is important if you need to be able to run on both versions.


I really like diffgrams - especially the fact I can export data from a DataSet as a diffgram, and then import it again later so that I get exactly the same Dataset as I started out with. Well almost. There are, in fact, a few issues and "gotchas" you need to be aware of when working with diffgrams and the DataSet. I've been playing with both since the first beta releases of .NET, and with versions 1.0 and 1.1. The three issues you are most likely to have to face are really all to do with the bug fixes and changes between the two versions of the DataSet:





So, if you are building applications that rely on diffgram support to remote data (i.e. send it across a network as an XML document), you have to find workarounds for these issues. In this brief article, we look at:



Working Round the "Empty Diffgram" Issue in v1.0

In version 1.0, the DataSet does not correctly recognize a legal diffgram, exported from MSXML or other parsers, which contains only deleted rows. For example, the following diffgram is perfectly legal, and describes the situation where the only existing order for a customer has been deleted:


<diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"




  <diffgr:before><Orders diffgr:id="Orders1" msdata:rowOrder="0">







      <ShipName>Du monde entier</ShipName>

      <ShipAddress>67, rue des Cinquante Otages</ShipAddress>




      <ShipperName>United Package</ShipperName>





The DataSet object's ReadXml method in Version 1.0 of the .NET Framework insists that such an XML diffgram must contain <NewDataSet /> rather than <NewDataSet> </NewDataSet>. If not, it does not parse the following <diffgr:before> section, and hence the DataSet remains empty. Therefore you have to replace the offending combination of tags with the required single tag. The actual combination of characters exported from MSXML is:


"<NewDataSet>" & vbCrlf & vbTab & "</NewDataSet>"


So you can get round this using something like:


  Dim strOldElement = "<NewDataSet>" & vbCrlf & vbTab & "</NewDataSet>"

  strDiffGram = strDiffGram.Replace(strOldElement, "<NewDataSet />")


This is not required in version 1.1, where the DataSet will correctly parse the diffgram shown earlier. However, the "kludge" does not break version 1.1 applications, so can be used to implement version independence.

Working Round the "Extra Rows" Issue in v1.0

The second issue that comes into play with the diffgram-parsing process in the version 1.0 DataSet object's ReadXml method involves the incorrect allocation of internal GUIDs to the rows. If a diffgram contains the definition of a Modified row, it does not always get parsed correctly by the ReadXml method. When a new DataSet is created from an existing DataSet using the GetChanges method, both contain GUIDs for each row, and these are used to assist in identifying matching rows when the Merge method of the DataSet is called later on.


However these GUIDs are not persisted in a diffgram, and reading a diffgram into the DataSet object sometimes appears to cause incorrect row GUIDs to be created instead. This is not visible in the DataSet itself, where everything looks fine. However, the "refresh" (Fill method) then results in two copies of the row occurring in the DataSet - one marked as Modified and one marked as Unchanged. A subsequent Merge method then fails to match the rows with those in the original DataSet, and it cannot re-establish the primary key constraint afterwards - resulting in a run-time error.


You can get round this by simply calling the GetChanges method on the DataSet first - before you do anything else with it. This appears to correct the errant RowState value for the modified row. But bear in mind that it also removes any unchanged rows from the DataSet, although this could be an advantage as it effectively "marshals" just the changed rows and could improve efficiency if you intend to pass the DataSet to another location, or another tier of your application for processing:


  objDataSet = objDataSet.GetChanges()


This is not required in version 1.1, where the DataSet appears to correctly match the two versions of the row. However, it does not break version 1.1 applications unless they depend on all the rows (including unchanged ones) from still being in the DataSet.

Working Round the "Create Primary Keys" Issue in v1.1

For a change, the third issue involves version 1.1 of the DataSet. Whether this is actually a bug in version 1.1, or is in fact the correct behavior - and it was version 1.0 that contained the bug, is open to discussion. However, it will affect your code if you load a diffgram that contains a deleted row into a DataSet in version 1.1, the try and set the primary key(s) of the table(s).


It's permissible to define the primary key(s) of tables within a DataSet that are already filled (i.e. tables that contains existing data rows). In theory this is not best practice, and you should really specify all the schema details (such as primary keys) before filling the table. However, when using the Fill method, where the schema of the database table is automatically used to define the schema of the DataTable objects within the DataSet, you have no choice but to set the primary keys while the tables contain rows.


However, if the DataSet has been filled from a diffgram that contains deleted rows, setting the primary keys afterwards is only possible in version 1.0 of the DataSet. In version 1.1, attempting to do so produces a DeletedRowInaccessibleException with the message "Deleted row information cannot be accessed through the row".


If you don't intend to use diffgrams, you might be able to get away with setting the primary keys afterwards. A good plan would be to check for the presence of the primary key in each table first, then set it only if required. However, when you do use a diffgram, you must ensure that the schema you load before the diffgram specifies any primary key details you require. You can also specify the relationships between tables in a schema, as demonstrated next.

Setting Primary Keys and Relations with a Schema

When you need to set the primary keys in a DataSet, and specify the relationship between tables, you can do so within an XSD schema. Without the primary keys, a call to the Merge method of the DataAdapter will fail to match the original and refreshed rows in the DataSet after you update the database. And without the proper relationship between the tables, the automatic cascade of deleted rows and updated row key values will not take place within the Update method.


The relevant sections of an example schema are shown in the next two listings. In the first, you can see two <xs:unique> elements that define the constraints for two tables (named Orders and Order_Details), and how they specify that these are the primary key columns using an msdata:PrimaryKey attribute. The nested <xs:selector> and <xs:field> elements then identify the column(s) in the tables that these constraints apply to:



<xs:unique name="Constraint1" msdata:PrimaryKey="true">

  <xs:selector xpath=".//Orders" />

  <xs:field xpath="OrderID" />


<xs:unique name="Order_Details_Constraint1"



  <xs:selector xpath=".//Order_Details" />

  <xs:field xpath="OrderID" />

  <xs:field xpath="ProductID" />




The schema can now specify the relationship between the tables using an <xs:keyref> element, which refers to the first of the two constraints shown above, and specifies that the column the constraint applies to is related to the column selected by the nested <xs:selector> and <xs:field> elements within this <keyref> section:



<xs:keyref name="CustOrders" refer="Constraint1">

  <xs:selector xpath=".//Order_Details" />

  <xs:field xpath="OrderID" />




If you don't fancy building a schema yourself, you can use a tool such as Visual Studio .NET. An even easier way, which ensures that you get all the details correct, is to write some simple code that fills a DataSet with the tables you need, sets the constraints and primary keys, adds any relationships you require, and then export the schema to a disk file using the WriteXmlSchema method.

Loading a Schema or Diffgram String into a DataSet

While reading a schema or a diffgram from a disk file into a DataSet is a simple enough task, reading them from a String (such as you get from the Request in the postback from an ASP.NET page) is not quite as straightforward. The ReadXml method does not have an overload that accepts a String, so you have to use the version that takes an XmlTextReader instead.


You must import the System.Xml namespace into the page, and create a new XmlTextReader instance that specifies the String containing the schema or diffgram as the source. Set the second parameter of the constructor to XmlNodeType.Document to indicate that you're reading a complete XML document. The third parameter is used to specify an XmlContext object that defines the namespace, encoding, language etc. of the XML document, but you can usually just use Nothing (null in C#) to indicate that you'll accept the default settings:



'create an XmlTextReader to read data sent from client

'specifying that string fragment is an XML Document

Dim objReader As New XmlTextReader(strDiffGram, _

                         XmlNodeType.Document, Nothing)


'read in the DiffGram posted from the client

objDataSet.ReadXml(objReader, XmlReadMode.DiffGram)