Synchronizing Databases with a Web Service

©2004 - Alex Homer, Stonebroom Limited, England. 

 

Read Part 1

 

In a previous article, we examined how we can easily expose data through an ASP.NET Web Service, in a format that is useful for all kinds of clients. By exposing a .NET DataSet object, we effectively deliver a standard format XML document that completely represents the contents of that DataSet. This is called a diffgram in the .NET world, and can be used to reconstruct the original DataSet complete with identical columns, data types, columns metadata, primary and foreign keys, and even the relationships between the tables in the DataSet.

An Overview of the Synchronization Process

In this article, we develop on the principles we discussed in the previous article to build a utility that allows the contents of two or more databases to be synchronized across the Internet, by delivering the rows that have been added to each one to the other database server(s). The principle is relatively simple:

 

  • Each database has the same set of tables, with identical structure, for the tables that will be synchronized.

 

  • Each table has a column containing a value that identifies that database or machine, so that rows can be identified depending on which machine they originally came from.

 

  • Each machine exposes the same Web Service that can deliver a DataSet containing the rows to be synchronized to any of the other machines.

 

  • Each machine calls this Web Service on all the other machines or a regular basis, fetching new rows from each one and adding them to its own database.

 

The schematic in Figure 1 shows the overall process for two machines. The important point to note is that the process must avoid collecting all the new rows from the other machine(s). By specifying the value for the column that identifies each machine when querying the Web Service, we ensure that only the new rows that have been added to that machine are returned. If we fail to do this, the first machine will copy the new rows from the second machine, and then - when the second machine fetches the new rows from the first machine - it will collect the rows it previously sent to the first machine as well. This will cause duplication of the new rows.

 

Figure 1 - The synchronization process in outline

Choosing a Client Platform for the Synchronization

The Web Service we use is implemented in ASP.NET, as demonstrated in the previous article. The next question is, what do we use to build the client utility that will consume the Web Service and update the database on each machine?

 

To be able to handle the DataSet the is exposed by the Web Service, we obviously require a .NET-aware client, and so the choices come down to using an ASP.NET Web Forms page, a console application, a Windows service, or a Windows Forms application. However, we want the process to be "automatable", in other words we want it to be executed at specified intervals without human intervention. This basically rules out an ASP.NET Web Forms page.

 

Any of the other three types of application will work, though we chose to implement it as a Windows Forms application. This can provide a better user experience when setting up and experimenting with the synchronization process, without subjecting the user to the limitations of command window output. It can be run on a regular basis using Windows Task Scheduler, under any account that you specify (not necessarily the currently logged-on user).

 

The other possible option is to build it as a Windows service, which runs continuously on the server without requiring a user to be logged onto the machine. However, this makes presenting visual feedback for testing and setting up more difficult. It also has the disadvantage that it consumes server resources all the time, when the replication process might only take place occasionally.

 

In the example we're showing here (synchronizing the Web server log file summary data), the process only takes place once a week, and runs for only a couple of minutes, so having the application permanently loaded and running as a Windows service seems to be overkill. However, if you choose to build the client as a Windows service, there is no requirement for a user to be logged on at the time that it runs.

The Synchronization Example Database

On our own Web sites, we maintain logs of traffic using both IIS logging (via ODBC to a database), and custom logging of ASP and ASP.NET session starts that stores details of the browser's user agent string, language setting, and any referrer details (the URL of the page containing a link they clicked to get to our site). This data is summarized once a week. Stored procedures within the database populate a series of six tables with the totals for each type of data that we want to collect.

 

Figure 2 shows the six tables, plus the extra table named LangCodes that provides a text equivalent of the language code that the browser exposes, for example "en-us" which represents "English (United States)". As the design of the database has matured over time, the way that the date for each row is stored has changed from being a datetime column (as in the PagesByWeekSummary and TrafficByHourSummary tables) into a two separate int columns (as in the remaining tables).

 

Figure 2 - The table used in the synchronization example

 

Using the data in these tables, we provide information on visitor patterns and page popularity to administrators, and we also expose some of the data in graphical form to visitors to our site. You can see this on the "traffic" pages at http://www.daveandal.net/traffic.asp. Figure 3 shows some examples.

 

Figure 3 - The graphical displays created from the site log database tables

The task, therefore, is to synchronize the six tables across all the machines that contain copies (mirrors) of the site. In fact we only operate two sites, but the process and example application you'll see in the rest of this article has been designed to manage more than just two sites.

The Unique Site Identification Key

As we mentioned earlier, one important factor is that each database must include in all the synchronized tables a column that identifies the machine that originally collected the data and added it to the tables. In our example, the key is the IP address of the host server - the machine that contains the site - and this is stored in the TSiteIP or HostIP column of each table.

 

Therefore, the Web Service that exposes the data for each site can select only the rows that were added to the tables by that site, and ignore any that were previously fetched from other servers that take part in the synchronization process.

The Synchronization Web Service

The Web Service we use for this example is pretty much the same in concept and outline as that we looked at in the previous article. The Class file (named getlogstable.asmx in the winforms-app\WebService folder of the examples) has the following format:

 

<%@WebService Language="VB" Class="SynchIISLogs"%>

 

Imports System

Imports System.Data

Imports System.Data.SqlClient

Imports System.Web

Imports System.Web.Services

Imports System.Configuration

 

<WebService(Description:="Synchronize IIS Logs Tables", _

 Namespace:="http://yoursite.com/webservices/synchiislogs" _

 )> Public Class SynchIISLogs

 

  ... class implementation here ...

 

End Class

 

You can see that we import the various namespaces we'll need (our database is in SQL Server so we are using the data access classes from the SqlClient namespace). We store the database connection string in web.config, so we need to import the System.Configuration namespace to be able to access this.

Deciding How To Return the Data

In the examples we looked at in the previous chapter, the DataSet we returned from our example Web Services contained only a single table. However, there is no reason why we can't add more than one table to the DataSet before returning it to the client application. There are downsides with this approach, though - it means that if an update fails you have shifted a lot of data that might not be used. By fetching each table separately, and halting the process when the first failure occurs, you reduce the bandwidth requirements. Using smaller a DataSet also minimizes the risk of time-outs in the Web service.

 

On the other hand there is an issue with the possibility of partial updates causing data errors. If we copy rows from one table, but then an error occurs in another table (or in rows encountered later in the first table), the target database will not contain a valid set of rows. The ideal solution to this is to use a connection-based transaction in the client application, so that we can roll back all the updates to all the tables if any error should occur. But this means keeping the connection open throughout the whole process, rather than closing it after each individual table has been fetched and pushed into the database.

 

In the end, we decided to implement a process that can use multiple tables, but doesn't have to. The Web Service we demonstrate here can accept a comma-delimited string of table names, and will build and return a single DataSet containing the appropriate rows from all these tables. If there is only one table specified, that's all the code will insert into the DataSet.

 

By allowing the client to collect all the rows for all the tables that they need to synchronize in one go, we make it possible to build efficient client-side applications that can still hold the connection open and perform all the updates to all the tables within a transaction. Then, after all the updates have succeeded they can be committed - or all the additions to all of the tables can be rolled back if there is an error.

 

Figure 4 - The service description page for the SynchIISLogs Web Service

The GetLogTables Method of the Web Service

Figure 4 (above) shows the service description page of the Web Service. As you can see, it exposes two Public Web methods. The first of these (named GetLogTables) is used to fetch the data. It accepts three parameters: the IP address of the server whose data should be returned (this is the key to the rows that indicates which server they came from), a comma delimited list of table names to extract the data from, and the date that the last synchronization took place. The DataSet that this method returns will then only contain rows that were generated on that host machine since the last synchronization:

 

<WebMethod(Description:="Returns a DataSet of new rows from the specified tables" _

 )> Public Function GetLogTables(sHostIP As String, _

                                 sTableNames As String, _

                                 dLastUpdate As Date) As DataSet

 

  Dim oConn As SqlConnection

 

  Try

 

    ' only allow queries for data up to six months ago

    If DateTime.Compare(dLastUpdate.AddMonths(6), DateTime.Now) < 0 Then

      Throw New Exception("Cannot query for data more than six months old")

    End If

 

    ' create new DataSet and create array of table names

    Dim oDS As New DataSet("IISLogResult")

    Dim aTables() As String = sTableNames.Split(",")

 

    ' create Connection and DataAdapter with empty SelectCommand text

    oConn = New SqlConnection(ConfigurationSettings.AppSettings("IISLogs"))

    Dim oDA As New SqlDataAdapter("", oConn)

    oDA.AcceptChangesDuringFill = False   'rows must be marked as "added"

    oConn.Open()

 

    ' iterate through list of tables filling DataSet from database

    Dim sTable As String

    For Each sTable In aTables

 

      ' change the SQL statement by setting the CommandText property of the

      ' Command object already attached to the DataAdapter

      oDA.SelectCommand.CommandText = GetSQLStatement(sHostIP, sTable.Trim(), _

                                                                 dLastUpdate)

      oDA.Fill(oDS, sTable.Trim())  ' fill this table in the DataSet

 

    Next

 

    Return oDS   ' and return the DataSet to the client

 

  Catch oErr As Exception

 

    Return GetErrorDataSet(oErr.Message)

 

  Finally

 

    oConn.Close()

 

  End Try

 

End Function

 

Notice that we check for a date more than six months old in our example. As there are a considerable number of rows for each week in some of the tables, we chose to do this to prevent overloading the server if an obviously incorrect date is provided. You can change this to place a different limit on the dates, or even remove it altogether. Then we create a new empty DataSet, giving it the name IISLogResult, and split the list of tables into a String array. The next step is to create a Connection and a DataAdapter object in the usual way, open the Connection, and then iterate through the array of table names pulling the appropriate set of rows into the DataSet so that it contains a DataTable object for each of the tables in the list.

Getting the Correct DataRowState Values for the Rows

However, there is one important issue to grasp here. When we reconstruct the DataSet on the receiving machine, we want to be able to use the Update method of an attached DataAdapter to push the rows into the matching database table on that machine. The tables have the same structure, so this isn't a problem. However, the Update method will only push the rows into the table if they are marked as being "added" or "new" rows.

 

Each row in a table (DataTable instance) within a DataSet has a RowState property that is set automatically when the values in that row are changed. The row stores the original and the current values of each column in the row, and so the Update method can tell which rows have changed (i.e. which have been added, modified or deleted), and what the original values and the current values are. So we must ensure that all the rows we pass back to the client in our DataSet tables have the value DataRowState.Added for their RowState property.

 

Normally, by default, the Fill method of the DataAdapter loads the rows into the DataTable instance and then calls the AcceptChanges method for each one. This means that their original and current values are the same, and the RowState is set to DataRowState.Unchanged. So, in this case, the rows will not be pushed into the target table on the client machine. However, by changing the AcceptChangesDuringFill property of the DataAdapter to True from its default value of False, before we call the Fill method, we prevent the AcceptChanges method from being called automatically. The result is that the DataSet will have each row still marked as "added":

 

  oDA.AcceptChangesDuringFill = False

Building the SQL Statement

As the code in the GetLogTables method iterates through the list of tables, it uses a separate Private function that creates the SQL statement to extract the required set of rows from the database. The GetSQLStatement function takes three parameters (the host IP address, the name of the table currently being processed, and the date that the last synchronization took place). It uses these values to build up the appropriate SQL statement. Notice how we have to use a GregorianCalendar instance to figure out the week number from the DateTime value that we pass to this method. This is required because, as you saw earlier, some of the tables contain a year and week number instead of a datetime column:

 

Private Function GetSQLStatement(sHostIP As String, _

                                 sTableName As String, _

                                 dLastUpdate As Date) As String

 

  Dim oCal As New System.Globalization.GregorianCalendar

  Dim iYear As Integer = dLastUpdate.Year

  Dim iWeek As Integer = oCal.GetWeekOfYear(dLastUpdate, Nothing, Nothing)

 

  Dim sSQL As String = "SELECT * FROM " & sTableName & " WHERE "

 

  If sTableName.ToLower() = "pagesbyweeksummary" _

  Or sTableName.ToLower() = "trafficbyhoursummary" Then

    sSQL &= "DATEDIFF(day, '" & dLastUpdate.ToString("yyyy-MM-dd") _

         & "', TSumDate) > 0 AND TSiteIP = '" & sHostIP & "'"

  Else

    sSQL &= "((TYearNumber = " & iYear.ToString() _

         & " AND TWeekNumber > " & iWeek.ToString() _

         & ") OR (TYearNumber > " & iYear.ToString() _

         & ")) AND HostIP = '" & sHostIP _

         & "' ORDER BY TYearNumber, TWeekNumber"

  End If

 

  Return sSQL

 

End Function

Returning Error Details

There is also a separate Private function in our Web service class that creates a DataSet containing error details, if an exception is raised within the Web Service. We looked at this routine in the previous article, and it's simply listed here again. You'll see how we use this "error" DataSet when we look at the client application later in this article:

 

Private Function GetErrorDataSet(sMessage As String) As DataSet

 

  Try

    Dim oDT As New DataTable("Errors")

    oDT.Columns.Add("Message", System.Type.GetType("System.String"))

    Dim oDR As DataRow = oDT.NewRow()

    oDR("Message") = sMessage

    oDT.Rows.Add(oDR)

    Dim oDS As New DataSet("Error")

    oDS.Tables.Add(oDT)

    Return oDS

  Catch

    Return Nothing

  End Try

 

End Function

 

Figure 5 shows the test page that ASP.NET generates automatically for the GetLogTables method when you select it in the main service description page (which was shown in Figure 4).

Figure 5 - The test page generated by ASP.NET for the GetLogTables method

 

When invoked, a diffgram such as that shown below is returned. We've removed the schema details and some of the repeated  summary data elements to make it easier to see the structure. The important point to note is that the data rows are marked as "added". You can see this from the diffgr:hasChanges="inserted" attributes on the <WeekSummary>, <CountrySummary> and <RefererSummary> elements:

 

<?xml version="1.0" encoding="utf-8" ?>

<DataSet xmlns="http://www.stonebroom.com/webservices/synchiislogs">

 <xs:schema id="NewDataSet" xmlns=""

            xmlns:xs="http://www.w3.org/2001/XMLSchema"

            xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">

   ... schema located here ...

 </xs:schema>

 <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"

         xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">

  <NewDataSet xmlns="">

 

   <WeekSummary diffgr:id="WeekSummary1" msdata:rowOrder="0"

                diffgr:hasChanges="inserted">

    <TYearNumber>2004</TYearNumber>

    <TWeekNumber>2</TWeekNumber>

    <HitCount>9980</HitCount>

    <KBytes>275442</KBytes>

    <Sessions>3206</Sessions>

    <HostIP>217.45.1.242</HostIP>

   </WeekSummary>

 

   ... more WeekSummary elements here ...

 

   <CountrySummary diffgr:id="CountrySummary32" msdata:rowOrder="31"

                   diffgr:hasChanges="inserted">

    <TYearNumber>2004</TYearNumber>

    <TWeekNumber>2</TWeekNumber>

    <ItemText>fi</ItemText>

    <ItemCount>3</ItemCount>

    <HostIP>217.45.1.242</HostIP>

   </CountrySummary>

 

   ... more CountrySummary elements here ...

 

   <RefererSummary diffgr:id="RefererSummary1019" msdata:rowOrder="1018"

                  diffgr:hasChanges="inserted">

    <TYearNumber>2004</TYearNumber>

    <TWeekNumber>5</TWeekNumber>

    <ItemText>http://www.google.ca/search</ItemText>

    <ItemCount>38</ItemCount>

    <HostIP>217.45.1.242</HostIP>

   </RefererSummary>

 

   ... more RefererSummary elements here ...

 

  </NewDataSet>

 </diffgr:diffgram>

</DataSet>

The TestSQLStatement Method of the Web Service

This second Public Web method of the Web Service is actually a debugging tool that we decided to leave in place to help when testing the service and client application. This method simply returns as a String the SQL statement that would be used to extract the data when calling the Web Service with that specific set of parameter values. It just calls the GetSQLStatement function we saw earlier with three parameters. Notice that this method only accepts a single table name:

 

<WebMethod(Description:="Returns the SQL statement for testing" _

 )> Public Function TestSQLStatement(sHostIP As String, _

                                              sTableName As String, _

                                              dFromDate As Date) As String

 

   Return GetSQLStatement(sHostIP, sTableName, dFromDate)

 

End Function

 

The result of calling the TestSQLStatement method can be seen in Figure 6. Because this table is one that contains a year and week number, the code in the GetSQLStatement function has generated a WHERE clause that specifies the appropriate year and week values:

 

Figure 6 - The result of calling the TestSQLStatement method of the Web Service

The Windows Forms Synchronization Client Application

The next step is to build the client application that will consume the Web service and update the database on the target machine. This application requires several items of information:

 

  • The URLs of the Web Service on all of the machines from which it will fetch new rows
  • The values of the keys that identify the rows added to the tables by each of the source machine (in our example this is the IP address of the source Web sites)
  • The connection string of the local "target" database into which the rows it fetches will be pushed
  • A list of the tables that it will synchronize for each machine
  • The dates that the last synchronizations took place, so that it can fetch only the rows that were added since then. These dates might be different for each of the source machines from which it collects its rows.

 

We also want to generate a log file, because Windows Task Scheduler will run the process automatically on a weekly basis. We don't want to pop up error dialogs, or just display status information within the application where it will be lost when the synchronization process ends each time. So we also store the path and name of a log file.

The Application Configuration File

The configuration information for a Windows Forms application can be stored in an XML-formatted file, using the application name with the .config suffix (in our case this is SynchIIS.exe.config). An example of the contents of this file are shown in the following listing:

 

<?xml version="1.0" encoding="utf-8"?>

<configuration>

<appSettings>

 

  <add key="SourceWebServiceURLs"

      value="http://site1.com/webservices/getlogtables.asmx,

             http://site2.com/iislogsynchservices/getlogtables.asmx"/>

 

  <add key="SourceWebSiteIPAddresses"

       value="199.11.22.33, 199.44.55.66" />

 

  <add key="TargetSqlClientConnectionString"

       value="server=localhost;database=IISLogs;uid=iislogs;pwd=" />

 

  <add key="SourceTableList"

       value="PagesByWeekSummary,TrafficByHourSummary,WeekSummary,CountrySummary,

              UserAgentSummary,RefererSummary" />

 

  <add key="LogFileFullPathAndName"

       value="c:\inetpub\wwwroot\monitoring\synchiis.log" />

 

</appSettings>

</configuration>

Viewing the Log File Remotely

Notice that our configuration file specifies that the log file will be located in a folder of the local machine's default Web site. This means that we can easily view it from a remote machine. Simply create an ASP.NET page that contains something like the following, with the virtual path and name of the log file in a server-side #include statement:

 

<%@Page Language="VB"%>

<html>

<body>

<pre><!-- #include file="synchiis.log" --></pre>

</body>

</html>

 

Of course, you probably want to disable anonymous access to this folder, which can easily be done in IIS Manager, so that only users who have the requisite permissions can view it. However, it contains little "sensitive" information anyway. Figure 7 shows the results of a synchronization process for the six tables in our example application, from two different machines. You can see that the process synchronized over 10,000 rows from the first remote machine, but there were no new rows received from the second remote machine. And the whole process only took just over a minute to execute.

 

Figure 7 - Viewing the log file for the application in an ASP.NET page

The Synchronization Process in Overview

Having built the Web Service, we now need a client application that can access it and use the data it exposes to update a database. In overview, the process is relatively simple. If you look back at the listing showing the application configuration file, you'll see that three of the entries govern what the application must achieve:

 

  • The SourceWebServiceURLs value, which is a comma-delimited string containing the URLs of the Web Services on the machines that this machine will synchronize with.

 

  • The SourceWebSiteIPAddresses value, which is a corresponding comma-delimited list of the key values in the tables of the databases on the machines that this machine will synchronize with. For example, the first value in the SourceWebServiceURLs list corresponds to the first key in the SourceWebSiteIPAddresses list.

 

  • The SourceTableList value, which is just a comma-delimited list of the tables that will be synchronized from all the other machines.

 

This means that the process must follow the pattern shown in the schematic in Figure 8. We iterate through the list of Web Service URLs (the machines that are part of the synchronization group). We fetch a Dataset that contains all the tables for synchronizing from each one in turn, and iterate through these tables pushing the new rows into the local database - using the Update method of a DataAdapter that we "attach" to the DataSet.

 

Figure 8 - An overview of the process for the synchronization client application

 

One point to note is that we don't simply iterate through the Tables collection of the DataSet. Instead, we iterate through the list of tables that is specified in the configuration file, which means that we can flag up any missing data with a message in the log file. But we start by looking for a table with the name Errors first. If there was a problem in the Web Service, all the DataSet will contain is a single table with that name, which will contain the error message sent from the Web Service.

Persisting the "Last Update" Dates

The application configuration file contains data that will rarely change - only when you need to add machines to or remove them from the replication cycle, or change the location of the Web Service files or database table lists. However, we also need a way to persist one item of information that will change each time the process is executed.

 

We need to know when the process was last run, so that we only fetch new rows from the other machines in the synchronization group. This date and time is stored in a simple text file in the application folder. However, as the process can involve more than one other machine (more than one Web Service URL), there will be a file for each of the remote machines that this machine communicates with. We chose to use the value of the "key" that identifies each machine as the filename. In our example, the "key" is the IP address of the Web site (we replace the full stops between the number groups with underscores in the code, so that the filenames will be something like 217_45_1_242.txt).

The Windows Form

The form that the user sees is shown in Figure 9 in design view in Visual Studio .NET. The controls include a multi-line TextBox, three Button controls, and three Label controls. You can load up the project from the samples that are available for download for this article to see more details and experiment with it.

 

Figure 9 - The Windows Forms application in design view in Visual Studio .NET

 

Figure 10 shows the application when it is running. As soon as you click the Run Now button, the three Label controls at the foot of the window are populated with values read from the configuration file. You can also see that the TextBox displays the same information as is written into the log file. The Run Now, Stop and Cancel buttons are enabled and disabled as the process executes, and you can cancel it at any time by clicking the Stop button (though there is usually a slight delay while the current stage of the process is completed).

 

Figure 10 - The application in action synchronizing the databases on two other machines

The Auto-Start and Auto-Stop Scheduling Feature

So that the application can be executed by Windows Task Scheduler, or any other command-line scheduling utility, we arranged for it to accept the parameter "/a" (in fact, "-a" or "\a" will work as well). The presence of this value causes the code to simulate a click on the Run Now button as soon as the application loads:

 

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) _

        Handles MyBase.Load

  ' get list of command line parameters

  Dim sParams As String = Microsoft.VisualBasic.Command()

  Me.Show()   ' show the form

  If sParams.IndexOf("/a") >= 0 Or sParams.IndexOf("-a") >= 0 _

  Or sParams.IndexOf("\a") >= 0 Then

    gbAutoStart = True   ' global variable indicating process will run automatically

    btnRunNow_Click(Nothing, Nothing)

  End If

End Sub

 

The next listing shows the main routine in the application, which is executed when the Run Now button is clicked. You can see the call to the separate routine that reads the configuration information, and the call to the main "working part" of the application - the ProcessAllSynchOperations routine. You can also see how a value of True for the "global" variable gbAutoStart will cause the program to end automatically:

 

Private Sub btnRunNow_Click(ByVal sender As System.Object, _

                            ByVal e As System.EventArgs) Handles btnRunNow.Click

  btnRunNow.Enabled = False

  btnExit.Enabled = False

  btnCancel.Enabled = True

  Try

    gbCancel = False

    StatusBox.Text = ""

    OpenLogFile()

    GetConfigurationValues()

    Application.DoEvents()

    If gbCancel = True Then

      Throw New Exception("Process canceled by user.")

    End If

    ' main routine to fetch rows and update database

    ProcessAllSynchOperations()

    WriteStatus("Completed at " & DateTime.Now.ToString("yyyy-MM-dd hh:mm:ss"))

    CloseLogFile()

    btnCancel.Enabled = False

    btnExit.Enabled = True

    btnRunNow.Enabled = True

  Catch errMain As Exception

    WriteStatus(">> " & errMain.Message)

    CloseLogFile()

    btnCancel.Enabled = False

    btnExit.Enabled = True

    btnRunNow.Enabled = True

  End Try

  If gbAutoStart Then End

End Sub

 

Of course, much of the code in the example application is concerned with opening, reading from, writing to and closing the "last update" files and the log file, and with writing status information to the screen and the log file. All this code is relatively self-explanatory and commented where necessary, and you can view it in Visual Studio or in a text editor. What we want to concentrate on for the remainder of this article are two important areas. The first involves referencing and using the Web Service from within Visual Studio (most of the newer VBA and other Microsoft applications work the same way). The second, coming shortly, is the code that we use to manage the transacted updates to the database.

Referencing and Using a Web Service in Visual Studio

After creating a new Web Forms project in Visual Studio .NET, and building the interface Form, we get to the point where we have to consider how we are going to access a Web Service on the remote machines from where we'll be collecting new rows for our database. The technique of using the WSDL.exe utility to create a proxy class, which we demonstrated in the previous article, will - of course- work just as well here (as would the technique we also demonstrated of simply loading it as an XML document). However, Visual Studio .NET makes it easy to work with Web Services.

 

We need to add a reference to the Web Service to our project, and this can be done by right-clicking Web Reference entry in the Solution Explorer window and selecting Add Web Reference. As shown in Figure 11, this opens a dialog where you can search for Web Services, see the default service pages that they expose, and view their contracts (the WSDL that defines the interface). The Web Service we want to use is the getlogtables.asmx file we have been describing earlier in this article. We placed it in a folder of the local machine's default Web site, and we can navigate to it by entering the address or by using the right-hand pane of the dialog - which is basically just a browser window.

 

Figure 11 - Adding a Web Reference to the project in Visual Studio .NET

 

When you click the Add Reference button to add the chosen Web Service reference to your project, Visual Studio fetches the WSDL file and builds a proxy for it - just as when you use the WSDL utility directly. You can see this proxy file (named Reference.vb) in the folder that VS.NET creates under Web References (see Figure 12). There is also a .disco (discovery) file, and other files that VS itself requires.

 

Figure 12 - The Web Reference as shown in the Solution Explorer and Properties windows after being renamed

 

Visual Studio uses the domain name of the Web Service as the service name by default, but you may want to change this ("localhost" is not very informative). We've renamed ours, using the Properties window, to SynchIISService - as shown in Figure 12. Once the Web Reference has been added to the project, you'll find that all the usual developer assistance features come into effect in the Visual Studio code editor. It uses the proxy class it has created to provide statement completion and pop-up tips for the object and its Public Web methods.

 

Dynamic or Static URL Behavior?

The Properties window at the bottom of Figure 12 also shows that there is a property for the Web Service called URLBehavior. This controls how the proxy will find the Web Service at runtime. If you leave it set to the default of Static, the proxy class that VS.NET creates contains the value of the WebReferenceURL property hard-coded into its constructor, and uses it to set the Url property of the class instance at runtime:

 

Public Sub New()

  MyBase.New

  Me.Url = "http://localhost/SynchWebService/webservices/getlogtables.asmx"

End Sub

 

Of course, we can change the Url property in our own code at runtime, after the instance of the proxy has been created. The one thing to be sure of is that the Web Service(s) you use when the application is deployed have exactly the same contract (the same WSDL) as the one you specify when building the application. If the WSDL changes, you must recompile the application using the new version to ensure that the proxy class behaves correctly.

If you change the value of the URLBehavior property to Dynamic, VS.NET saves the Web Reference path in a new configuration file named app.config, as a key and value in the <appSettings> section just like that we used to store our own configuration settings. This also changes the way that the constructor for the proxy class is coded, as you can see here:

 

Public Sub New()

  MyBase.New

  Dim urlSetting As String = ConfigurationSettings.AppSettings( _

                             "SynchIIS.SynchIISService.SynchIISLogs")

  If (Not (urlSetting) Is Nothing) Then

    Me.Url = String.Concat(urlSetting, "")

  Else

    Me.Url = "http://localhost/SynchWebService/webservices/getlogtables.asmx"

  End If

End Sub

 

Now the value for the WebReferenceURL property, which is where the WSDL to create the proxy originally came from, is only used if a value can't be loaded from the configuration file. But, again, we can change the Url property dynamically ourselves after we create the proxy instance in our code.

Using the Web Service Reference

Once a Web Reference is defined, we can use it in our code simply by creating an instance of the service (an instance of the proxy class that VS has created for us), and calling its methods. The FetchTables routine in the example application is responsible for fetching the DataSet containing the new rows. We create a new instantiate of the SynchIISService.SynchIISLogs object (SynchIISService is the Web Reference name we used, and SynchIISLogs is the name we used in the .asmx file for the Class that implements the Web Service). Then we can specify the Url of the instance of the Web Service we want to use and call the GetLogTables method to get back the DataSet:

 

Function FetchTables(ByVal sSourceURL As String, ByVal sWebSiteIP As String, _

                     ByVal sTableNames As String, ByVal dLastUpdate As DateTime) As DataSet

  ' generate new instance of proxy and specify source URL

  Dim oWS As New SynchIISService.SynchIISLogs()

  oWS.Url = sSourceURL

  ' call Web Service method to get DataSet

  Return oWS.GetLogTables(sWebSiteIP, sTableNames, dLastUpdate)

End Function

 

This function is used within the main section of code that performs the synchronization - a routine named ProcessAllSynchOperations. We'll finish up this article by looking at how this routine works.

The Code that Performs the Synchronization

The real work in the example application is done within a routine named ProcessAllSynchOperations. Its task is to implement the process we described in the schematic in Figure 8. Before it is called, a couple of other routines (not shown here) collect the values from the configuration file and prepare them for use within the ProcessAllSynchOperations routine.

 

The list of Web Service URLs (the Web sites that we will collect new rows from) is split into an array named gSourceURLArray. The list of "key" values (in our case the matching IP addresses of these sites) is split into an array named gSiteIPArray. Finally, the list of table names from the configuration file is split into an array named gTableArray. Then we are ready to start the process.

Iterating Through the Web Service URLs

The ProcessAllSynchOperations routine starts by iterating through the array of Web Service URLs. We use an Integer index so that we can extract the matching "key" value from gSiteIPArray. Then a separate function is called to read the file that indicates when the last update for the remote site succeeded. This is all the information we need to call the Web Service, and (if all goes well) we get back a DataSet containing all the new rows from all the tables for that site.

 

However, if there is an error within the Web Service, we will get back a DataSet containing only a single table named Errors, or we may even get back Nothing if the remote Web Service fails completely. So we check for both these conditions next. Notice how we can extract the error message if we do get an "error" DataSet sent back. We throw an exception, which is caught in the main routine within our application where the message is displayed and added to the log file, then execution stops.

 

Sub ProcessAllSynchOperations()

 

  ' iterate through all the Web Service URLs

  For iLoop = 0 To gSourceURLArray.Length - 1

 

    ' get values for this source site

    sSourceURL = gSourceURLArray(iLoop).Trim()

    sWebSiteIP = gSiteIPArray(iLoop).Trim()

    dLastUpdate = GetLastDateLoaded(sWebSiteIP)

 

    ' call routine to fetch the DataSet from the remote Web Service

    Dim oDS As DataSet = FetchTables(sSourceURL, sWebSiteIP, gTableNames, dLastUpdate)

 

    ' see if there were any errors with the Web Service

    If oDS Is Nothing Then

      Throw New Exception("No DataSet returned from Web Service")

    End If

    If oDS.Tables(0).TableName = "Errors" Then

      Throw New Exception("Error reported by Web Service: " & oDS.Tables(0).Rows(0)(0))

    End If

    ...

Iterating Through the List of Tables

At this point, we know that we have a DataSet containing the new rows, so we can start the process of pushing these rows into the local database. We need to hold references to the Connection and Transaction instances we'll be using so that we can access them in all parts of the Try...Catch construct, so we declare variables for these next. Then we can create the connection to the local database, open it, and start a transaction over this connection.

 

Next, we start to iterate through the list of tables held in gTableArray. If there is no matching table in the DataSet, (maybe there were no rows added on the source machine), we simply display an error message. However, if the table does exist, we can then see if there are any new rows in it by checking the Count property of the Rows collection:

 

    ...

    Dim oConn As SqlConnection

    Dim oTrans As SqlTransaction

    Try

 

      ' create Connection, open it and start a transaction

      oConn = New SqlConnection(gConnStr)

      oConn.Open()

      oTrans = oConn.BeginTransaction()

 

      ' iterate through all the tables in the list

      Dim sTableName As String

      For Each sTableName In gTableArray

 

        ' check if table is in DataSet - might not be if there were no new rows

        If oDS.Tables(sTableName) Is Nothing Then

          WriteStatus(">> WARNING: no rows received for table '" & sTableName & "'")

        Else

 

          ' see how many rows are in this table

          Dim iRows As Integer = oDS.Tables(sTableName).Rows.Count

          If iRows > 0 Then

            Try

              ...

Pushing the New Rows into the Database

Now comes the interesting part. We need to attach a DataAdapter to the DataSet, and call its Update method to push the new rows in the current table into the database. However, we also need to ensure that all this happens within the control of the currently pending transaction, and this involves some extra code to get it all to work successfully.

 

Normally, we can generate a suitable DataAdapter by specifying the connection and a SQL statement that would select the appropriate set of rows from the table:

 

Dim oDA As New SqlDataAdapter("sql-statement", connection-instance)

 

However, this only sets the SelectCommand of the DataAdapter. To create a suitable InsertCommand, we would then add a CommandBuilder to it:

 

Dim oCB As New SqlCommandBuilder(oDA)

 

Then, when we call the Update method, the CommandBuilder will generate a suitable parameterized SQL INSERT statement, and assign it to the CommandText property of the Command th