DataSet XML Methods

The ADO.NET DataSet stores information internally in a proprietary binary format that’s optimized for XML representation. This means that data can be retrieved in XML format seamlessly, without any data loss or conversion errors. Table 17-1 lists the DataSet methods that work with XML.

Table 17-1. ADO.NET DataSet XML methods

Method	Description
GetXml( )	Retrieves the XML representation of the data in the `DataSet` as a single string.
GetXmlSchema( )	Retrieves the XSD schema for the `DataSet` XML as a single string. No data is returned.
ReadXml( )	Reads XML data from a file or a `TextReader`, `XmlReader`, or `Stream` object, and uses it to populate the `DataSet`. The XML document can include an inline schema.
ReadXmlSchema( )	Reads an XML schema from a file or a `TextReader`, `XmlReader`, or `Stream` object, and uses it to configure the `DataSet` (for example, creating `Constraint` and `DataColumn` objects).
WriteXml( )	Writes the contents of the `DataSet` to a file or a `TextWriter`, `XmlWriter`, or `Stream` object. You can choose to write the schema inline.
WriteXmlSchema( )	Writes just the XSD schema describing the contents of the `DataSet` to a file or a `TextWriter`, `XmlWriter`, or `Stream` object.
InferXmlSchema( )	Infers the XML schema and applies it to the `DataSet` by reading through an XML document supplied by a file or a `TextReader`, `XmlReader`, or `Stream` object.

Tip

If you need to manipulate or serialize the XML, the best choice is the direct WriteXml( ) method. Using the GetXml( ) method to retrieve a string containing the XML content (and then using the string to create a new object or write the data to disk) is inherently less efficient.

The key decision you make when dealing with the XML representation of a DataSet is deciding how to handle the schema, which defines the allowed structure and data types for the XML document. If you save the schema, you can use it as a basic form of error checking. Simply reload the schema into the DataSet before you insert any data.

Tip

Another reason to use an XML schema is to make your database code more efficient. For example, you can use ReadXmlSchema( ) to preconfigure your DataSet instead of the FillSchema( ) method discussed in Chapter 5, which requires a separate trip to the database. Alternatively, you can use a strongly typed DataSet.

Example 17-1 shows a console application that writes the retrieved XML to a file, reads it back, and then displays the XML for the retrieved DataSet using the GetXml( ) method. Note that this code doesn’t provide any error handling for its file operations (i.e., the WriteXml( ) and ReadXml( ) methods).

Example 17-1. Writing a DataSet to XML with a schema

using System;

using System.Data;

using System.Data.SqlClient;



public class SaveDataSet

{

    private static string connectionString = "Data Source=localhost;" +

         "Initial Catalog=Northwind;Integrated Security=SSPI";



    public static void Main() 

    {

        string SQL = "SELECT CategoryID, CategoryName, " +

                     "Description FROM Categories";



        // Create ADO.NET objects.

        SqlConnection con = new SqlConnection(connectionString);

        SqlCommand com = new SqlCommand(SQL, con);

        SqlDataAdapter adapter = new SqlDataAdapter(com);

        DataSet ds = new DataSet("Nortwind");



        // Execute the command.

        try

        {

            con.Open();

            adapter.FillSchema(ds, SchemaType.Mapped, "Categories");

            adapter.Fill(ds, "Categories");

        }

        catch (Exception err)

        {

            Console.WriteLine(err.ToString());

        }

        finally

        {

            con.Close();

        }



        // Save DataSet to disk (with schema).

        ds.WriteXmlSchema("mydata.xsd");

        ds.WriteXml("mydata.xml");



        // Reset DataSet.

        ds.Reset();



        // Read schema and reload data.

        ds.ReadXmlSchema("mydata.xsd");

        ds.ReadXml("mydata.xml");



        // Display DataSet.

        Console.WriteLine("DataSet retrieved.");

        Console.WriteLine(ds.GetXml());

    }

}

Dissecting the DataSet XML

The DataSet XML follows a predefined format that follows a few simple rules:

The root document element is the DataSet.DataSetName (in our example, Northwind).
Each row in every table is contained in a separate element, using the name of the table. In our example with one table, this means multiple Categories elements. If there are two tables, these rows are followed by a list of other elements (such as Customers elements).
An element for every column is included in each table row element. The actual column value is recorded as text inside the tag (although this is configurable).

Here’s the default XML (excerpted to the first two rows) created by Example 17-1 for the Categories table:

<?xml version="1.0" standalone="yes"?>

<Northwind>

  <Categories>

    <CategoryID>1</CategoryID>

    <CategoryName>Beverages</CategoryName>

    <Description>Soft drinks, coffees, teas, beers, and ales</Description>

  </Categories>

  <Categories>

    <CategoryID>2</CategoryID>

    <CategoryName>Condiments</CategoryName>

    <Description>Sweet and savory sauces, relishes, spreads, and

     seasonings</Description>

  </Categories>

  <!-- Other categories omitted. -->

</Northwind>

It’s possible to modify this representation without resorting to additional code or an XSLT transform. You’ll learn how you can alter the structure of the XML data with ADO.NET a little later in this chapter.

Dissecting the DataSet XML Schema

The rules for the schema document are a little more subtle. First of all, a complexType is defined for each type of table row. Complex types define a structure that is composed of several separate pieces of information. In the following example, the Categories element is a complex type that contains several subtags:

<?xml version="1.0" standalone="yes"?>

<xs:schema id="Northwind" xmlns=""

 xmlns:xs="http://www.w3.org/2001/XMLSchema"

 xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">



  <xs:element name="Northwind" msdata:IsDataSet="true">



    <xs:complexType>

      <xs:choice maxOccurs="unbounded">

        <xs:element name="Categories">

          <xs:complexType>



          <!-- Definition of Categories type omitted. -->



          </xs:complexType>

        </xs:element>

      </xs:choice>

    </xs:complexType>



    <!-- Additional code omitted. -->



</xs:schema>

The XSD sequence element is nested inside the complexType element, indicating that the fields must occur in a set order:

        <xs:element name="Categories">

          <xs:complexType>

            <xs:sequence>



            <!-- Definition of Categories type omitted. -->



            </xs:sequence>

          </xs:complexType>

        </xs:element>

Every field in a row is declared using the corresponding XSD data type. If the field is optional (in other words, DataColumn.AllowDbNull is True), the minOccurs attribute is set to 0, indicating that this element isn’t necessary. Similarly, a maxLength restriction element is added to a type if the DataColumn.MaxLength property is set.

        <xs:element name="Categories">

          <xs:complexType>

            <xs:sequence>



              <xs:element name="CategoryID" msdata:ReadOnly="true"

                  msdata:AutoIncrement="true" type="xs:int" />



              <xs:element name="CategoryName">

                <xs:simpleType>

                  <xs:restriction base="xs:string">

                    <xs:maxLength value="15" />

                  </xs:restriction>

                </xs:simpleType>

              </xs:element>



              <xs:element name="Description" minOccurs="0">

                <xs:simpleType>

                  <xs:restriction base="xs:string">

                    <xs:maxLength value="1073741823" />

                  </xs:restriction>

                </xs:simpleType>

              </xs:element>

            </xs:sequence>



          </xs:complexType>

        </xs:element>

Additional database-specific information is added using the msdata namespace, which allows attributes such as ReadOnly and AutoIncrement that aren’t part of the XSD standard but are recognized by ADO.NET.

Finally, the XSD document ends with a definition of unique elements to represent DataSet constraints. In the next snippet, a single unique element represents the primary key definition for the CategoryID field. Two XPath elements are also contained: a selector element that indicates how to find the table this constraint applies to and a field element that indicates how to find the relevant column.

<xs:unique name="Constraint1" msdata:PrimaryKey="true">

  <xs:selector xpath=".//Categories" />

  <xs:field xpath="CategoryID" />

</xs:unique>

Remember, the XSD schema is created based on the characteristics of the DataColumn and Constraint objects in the DataSet. To make sure you have as much information as possible, use the FillSchema( ) method before the WriteXmlSchema( ) method (or a strongly typed DataSet).

Tip

For more information about Microsoft’s msdata and codegen XML namespaces and what elements and attributes they define, refer to Appendix B.

XML Write and Read Modes

By default, the WriteXml( ) method simply outputs the XML data. You must create the XSD document separately. However, you can use an overloaded version of the WriteXml( ) method, which accepts a value from the XmlWriteMode enumeration. These values are described in Table 17-2.

Table 17-2. XmlWriteMode values

Value	Description
IgnoreSchema	Writes the current contents of the `DataSet` as XML data, without an XML Schema. This is the default.
WriteSchema	Writes the current contents of the `DataSet` as XML data with the relational structure as inline XML schema.
DiffGram	Writes the entire `DataSet` as a DiffGram, which includes information about original and current values. We’ll examine DiffGrams in the next section.

For example, you can choose to write the XSD inline with the XML. This shortens the coding but can waste some disk space if you store multiple DataSet files with the same schema. It can also lead to versioning problems if you modify the DataSet structure later on, and it no longer matches the schema.

// Save DataSet to disk (with schema).

ds.WriteXml("mydata.xml", XmlWriteMode.WriteSchema);



// Reset the DataSet.

ds.Reset();



// Read schema and reload data.

ds.ReadXml("mydata.xml", XmlReadMode.ReadSchema);

The mydata.xml file now has the following structure:

<?xml version="1.0" standalone="yes"?>

<Northwind>



  <!-- Inline XSD schema document goes here. -->



  <Categories>

    <CategoryID>1</CategoryID>

    <CategoryName>Beverages</CategoryName>

    <Description>Soft drinks, coffees, teas, beers, and ales</Description>

  </Categories>



  <!-- Other rows omitted. -->



</Northwind>

In this case, you don’t need to specify the XmlReadMode.ReadSchema when retrieving the data. The default, XmlReadMode.Auto, inspects the file and uses ReadSchema mode if it contains a schema. On the other hand, the default when saving data is XmlWriteMode.IgnoreSchema, which uses only the data. The full list of XmlWriteMode values is shown in Table 17-3.

Table 17-3. XmlReadMode values

Value	Description
Auto	Inspects the XML file and tries to perform the most appropriate action, depending on whether the file is a DiffGram or contains an inline schema. (Note that the inline schema is ignored if the `DataSet` already has a schema or the file contains a DiffGram.)
DiffGram	Reads a DiffGram, which specifies the contents of the `DataSet` and uses additional attributes to indicate changed values and inserted and deleted rows. These changes are applied to the `DataSet` automatically. However, it must have the same schema as the `DataSet` used to create the DiffGram, or an exception will be thrown.
Fragment	Reads partial XML documents, such as those generated by executing FOR XML queries with SQL Server 2000. The default namespace is read as the inline schema.
IgnoreSchema	Ignores any inline schema and reads data into the existing `DataSet` (preserving any schema information it already has). If any data doesn’t match the existing schema, it is discarded. You can also use this mode to read a DiffGram.
InferSchema	Ignores any inline schema, loads the data, and infers the schema from the XML document. If the `DataSet` already contains a schema, the current schema is extended by adding new tables or adding columns to existing tables. An exception is thrown if the inferred table already exists but with a different namespace or if any of the inferred columns conflict with existing columns.
ReadSchema	Reads any inline schema and loads the data. If the `DataSet` already contains schema, and the schema refers to the same table, an exception is thrown. Otherwise, the new schema information extends the current `DataSet` schema by adding new tables.

Dissecting the DiffGram

The DataSet doesn’t just contain schema information and a single set of data, it also tracks the state of each row and, if modified, the current and original values. In order to record this information in XML, ADO.NET defines a special DiffGram format.

The DiffGram format is divided into three sections: the current data, the original data, and any errors:

<?xml version="1.0"?>

<diffgr:diffgram 

        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"

        xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1"

        xmlns:xsd="http://www.w3.org/2001/XMLSchema">



   <DataInstance>

   </DataInstance>



  <diffgr:before>

  </diffgr:before>



  <diffgr:errors>

  </diffgr:errors>



</diffgr:diffgram>

The DataInstance element contains the actual DataSet information. Rows that have been changed are marked with the diffgr:hasChanges attribute. The diffgr:before element lists the information about the original values, while elements in diffgr:errors represent the DataRow.RowError property. Elements are matched between these three sections using the diffgr:id attribute.

Example 17-2 creates a DiffGram (and displays it in a console window). The DataSet is made up of three rows retrieved from the database. The first row is modified, the second is deleted, and a fourth row is created and added programmatically.

Example 17-2. Writing a DataSet DiffGram

using System;

using System.Data;

using System.Data.SqlClient;



public class SaveDiffGram

{

    private static string connectionString = "Data Source=localhost;" +

         "Initial Catalog=Northwind;Integrated Security=SSPI";



    public static void Main() 

    {

        string SQL = "SELECT TOP 3 CategoryID, CategoryName, " +

                     "Description FROM Categories";



        // Create ADO.NET objects.

        SqlConnection con = new SqlConnection(connectionString);

        SqlCommand com = new SqlCommand(SQL, con);

        SqlDataAdapter adapter = new SqlDataAdapter(com);

        DataSet ds = new DataSet("Nortwind");



        // Execute the command.

        try

        {

            con.Open();

            adapter.FillSchema(ds, SchemaType.Mapped, "Categories");

            adapter.Fill(ds, "Categories");

        }

        catch (Exception err)

        {

            Console.WriteLine(err.ToString());

        }

        finally

        {

            con.Close();

        }



        // Modify the DataSet (change first row, delete second,

        // and add a fourth).

        DataRow row = ds.Tables["Categories"].Rows[0];

        row["CategoryName"] = "Pastries";

        row["Description"] = "Danishes, donuts, and coffee cake";



        ds.Tables["Categories"].Rows[1].Delete();



        row = ds.Tables["Categories"].NewRow();

        row["CategoryName"] = "Baked goods";

        row["Description"] = "Bread, croissants, and bagels";

        ds.Tables["Categories"].Rows.Add(row);



        // Save DataSet diffgram to disk.

        ds.WriteXml("mydata.xml" , XmlWriteMode.DiffGram);



        // Display DataSet diffgram.

        ds.WriteXml(Console.Out , XmlWriteMode.DiffGram);

    }

}

The DiffGram includes all four rows. However, the deleted row appears only in the diffgr:before section:

<?xml version="1.0" standalone="yes"?>

<diffgr:diffgram

        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"

        xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">



  <Nortwind>

    <Categories diffgr:id="Categories1" msdata:rowOrder="0"

     diffgr:hasChanges="modified">

      <CategoryID>1</CategoryID>

      <CategoryName>Pastries</CategoryName>

      <Description>Danishes, donuts, and coffee cake</Description>

    </Categories>

    <Categories diffgr:id="Categories3" msdata:rowOrder="2">

      <CategoryID>3</CategoryID>

      <CategoryName>Confections</CategoryName>

      <Description>Desserts, candies, and sweet breads</Description>

    </Categories>

    <Categories diffgr:id="Categories4" msdata:rowOrder="3"

     diffgr:hasChanges="inserted">

      <CategoryID>4</CategoryID>

      <CategoryName>Baked goods</CategoryName>

      <Description>Bread, croissants, and bagels</Description>

    </Categories>

  </Nortwind>



  <diffgr:before>

    <Categories diffgr:id="Categories1" msdata:rowOrder="0">

      <CategoryID>1</CategoryID>

      <CategoryName>Beverages</CategoryName>

      <Description>Soft drinks, coffees, teas, beers,

       and ales</Description>

    </Categories>

    <Categories diffgr:id="Categories2" msdata:rowOrder="1">

      <CategoryID>2</CategoryID>

      <CategoryName>Condiments</CategoryName>

      <Description>Sweet and savory sauces, relishes, spreads, and

       seasonings</Description>

    </Categories>

  </diffgr:before>



</diffgr:diffgram>

Without the DiffGram, the XML file resembles the first section, without the added msdata and diffgr attributes. In other words, the deleted row isn’t saved, and no distinction is made between the original rows and the inserted row. When you reload the non-DiffGram XML into a DataSet, every row is set to DataRowState.Unchanged. If you tried to update the data source with this DataSet, no changes are made.

Tip

You can read the DiffGram using XmlReadMode.DiffGram or the default XmlReadMode.Auto. However, the DataSet must already have the correct schema in place, or an exception is thrown. There is no way to create an XML file with a DiffGram and inline schema.

The default output generated with WriteXml( ) and WriteXmlSchema( ) includes the current contents. The DiffGram output, on the other hand, generates the information needed to use the DataSet change tracking.

If you use a DataSet as a return value from a method in a web service or a component exposed through .NET remoting, the DiffGram is automatically returned. For example, consider what happens if you modify the code in Example 17-2 to become a rudimentary web service in Example 17-3.

Example 17-3. A web service that returns a modified DataSet

<%@ Webservice Class="ADOService" Language="C#" %>



using System;

using System.Web.Services;

using System.Data;

using System.Data.SqlClient;



public class ADOService : System.Web.Services.WebService

{

    private string connectionString = "Data Source=localhost;" +

        "Initial Catalog=Northwind;Integrated Security=SSPI";



    [WebMethod]

    public DataSet GetCategoriesTest()

    {

    string SQL = "SELECT TOP 3 CategoryID, CategoryName, " +

                 "Description FROM Categories";



    // Create ADO.NET objects.

    SqlConnection con = new SqlConnection(connectionString);

    SqlCommand com = new SqlCommand(SQL, con);

    SqlDataAdapter adapter = new SqlDataAdapter(com);

    DataSet ds = new DataSet("Nortwind");



    // Execute the command.

    con.Open();

    adapter.FillSchema(ds, SchemaType.Mapped, "Categories");

    adapter.Fill(ds, "Categories");

    con.Close();

            

    // Modify the DataSet.

    DataRow row = ds.Tables["Categories"].Rows[0];

    row["CategoryName"] = "Pastries";

    row["Description"] = "Danishes, donuts, and coffee cake";



    ds.Tables["Categories"].Rows[1].Delete();



    row = ds.Tables["Categories"].NewRow();

    row["CategoryName"] = "Baked goods";

    row["Description"] = "Bread, croissants, and bagels";

    ds.Tables["Categories"].Rows.Add(row);



    // Return DataSet

    return ds;

    }

}

If you try this web method using the Internet Explorer test page, you’ll find that the retrieved result includes a schema (at the beginning of the message), followed by a DiffGram containing the DataSet contents and recording all changes Figure 17-1 shows a partially collapsed view of this information.

Figure 17-1. A DataSet schema and DiffGram returned by a web method

Thus, a .NET client can capture this information and automatically recreate an identical DataSet instance. A third-party client, however, needs to prepare for this information and handle it accordingly.