navigation
 Wednesday, March 14, 2007

Today I battled with some interesting problems writing changes to a hierarchical datatable to a SQL Server database. Hopefully by writing about this I can save you some time.

To reduce the problem to its bare bones, I had a datatable that is a self-referencing hierarchy. The tree can have multiple roots. Each record has an ID, a nullable ParentID and some data fields. Any record with a ParentID of null is a root. In this real world app there are some additional details, for instance each record can have leaf node children in another table, but I will leave that out of this discussion.

I was working with this tree of data using an ADO.Net 2.0 strongly typed dataset. Initially, I just created the table in the dataset designer by adding a tableadapter that had a Select stored procedure that returned all the rows in the table, it looked something like this:

Here, ReportSetGroupID is the Primary Key, and ParentID references the ReportSetGroupID of its parent. ReportSetID, GroupName, TotalsType and SortOrder are data fields that you can disregard.

Next, I added stored procedures for Insert, Update and Delete. I wrote the UI layer, which uses a Telerik r.a.d. Treeview to manipulate the data in memory in the datatable, which is cached in the session. I tested out the UI layer, where the user can add nodes to the tree, delete nodes, update the names (GroupName) of the nodes, reorder children (SortOrder), etc. Satisfied that all was well and good, I added a Save button and in it called the tableadapters update method and....Kaboom!

Things did not go too well. I had added some foreign key constraints and indexes to the database to ensure that ParentID points to a valid parent record, and that SortOrder is unique for the children of a given Parent. Both of these constraint were being violated, even though the tree that I was trying to save was completely kosher. There were two basic problems:

1. If I added a node Parent, and then a child Child, and then hit Save, the order that these two records were saved would sometimes (in fact always!) be wrong. So ADO.Net would call the create procedure for the child Child first, at which point the parent Parent did not exist, and the constraint on ParentID failed.

2. If I swapped the sort order of two children, say C1 with order 1 and C2 with order 2, then these two records were both flagged as having been modified by ADO.Net, and were saved by a call to the Update stored procedure in some random order. Say C1 was saved first. It used to have sort order 1, but now it is 2. When the record is written, but before C2 is written, both C1 and C2 have sort order 2. We never get as far as updating C2 to have sort order 1.

Given these issues, I started researching whether ADO.Net could somehow be informed that it was working with a self referencing table, and thus sequence the inserts, updates and deletes intelligently. I didnt have much luck. I actually added the self referencing constraint in the dataset designer (a DataRelation with ReportSetGroupID being the Primary Key and ParentID the foreign key). But that didnt change the sequencing of database operations.

I briefly considered trying to write some smart logic that would look at the changes and manually sequence them, by maybe first dealing with all the Inserts, then all the Deletes, then all the Updates. But I kept coming up with different sequencing scenarios that would get me into trouble.

Finally, I came up with a fairly simple brute force design: I would first delete all the records from the database, and then recurse throught the tree top down and write all the nodes to the database in the correct order. Deleted nodes would not be written at all. Updated and Inserted records would be written. It seemed like a pretty straight forward idea, but the devil was in the details....

First of all, I wrote a basic method that first deleted all the records, and then iterated through all root nodes (the ones with a null ParentID) and invoked a recursive method to write that root and all its subnodes to the database:

public void SaveReportSet()
{
  // first, delete all groups and accounts in the set, we need to write them from scratch because the order of
  // updates, inserts and deletes can not be adequately controlled using just the dataset update methods
  DeleteReportSetItems();

  // now, write out all the data top down: call a recursing method for each root
  foreach (ReportSetGroupRow group in ReportSetGroup)
    if ( group.IsParentIDNull() && (group.RowState != DataRowState.Deleted))
      WriteReportSetGroupTree(group);
}

The DeleteReportSetItems cleanses all records from the table in the database. The check "group.IsParentIDNull()" makes sure we are only working on root nodes. The check "group.RowState != DataRowState.Deleted" makes sure we dont write deleted nodes.

So now I needed to write the method WriteReportSetGroupTree that would save the root node and then recursively save its children. Here was my first attempt:

private void WriteReportSetGroupTree(ReportSetGroupRow group)
{
  taReportSetGroup.Update(group);

  // recursively add all children
  foreach (ReportSetGroupRow child in group.GetReportSetGroupRows())
    if (child.RowState != DataRowState.Deleted )
      WriteReportSetGroupTree(child);
}

This approach turned out to have a few problems. First of all, the table adapter taReportSetGroup looks at the modifcation flag on each record to determine what to do. If the record was not modified at all, it does nothing. It it was modified, it calls the Update procedure. If it is a new record, it calls the Insert procedure, and finally, if it was deleted it calls the Delete procedure. Well, in my case, I wanted to write all of the records in the tree, no matter what had happened to them, as I had just deleted them all from the database! Rummaging around, I found that DataRow has a SetAdded method, that can be used to change the RowState of the record to Added. Perfect! I added group.SetAdded() to the top of my WriteReportSetGroupTree method. However, when I ran this code it raised an exception. Apparently, you can only call SetAdded if the RowState is Unchanged. Hmmm....

To get around this problem, I called ReportSetGroup.AcceptChanges() in the main method, SaveReportSet, before embarking on the recursions for the root nodes. This gets rid of all the deleted nodes from the in memory dataset, and markes all the new and updated nodes as Unchanged. Now the SetAdded call worked, and I made enough progress to get the first root node written to the database. However, none of its children made it. Why not?

This was actually a little confusing, I am not sure whether it is a bug in ADO.Net or not. Say I hade a new parent node P, and it has a child C. The table has an autoincrement Primary Key (which I by the way had to configure to have a starting value of -1 and an increment of -1 - otherwise, adding new nodes would collide with the real IDs of existing nodes in memory and cause a constraint exception in the dataset), so lets us say that P has an ReportSetGroupID of -1, and C has a ReportSetGroupID of -2. C correctly points to its parent by having a ParentID of -1. OK, so now we write P to the database. When it is inserted into its table, it gets assigned a real ID from the database, say 314, which overwrites the -1. Now here is the weird part: even though the self referencing DataRelation is set to be cascading, it does NOT update C.ParentID to be 314! It stays as -1. The strange part is that no exception is even thrown - all that happens is that the nodes become estranged, with the child now pointing to a non existent parent ID. When I go to iterate over all the children of the parent, none are found, because no children have a ParentID of 314!

Anyway, I didnt want to spend all day trying to figure out if this was a bug or not in ADO.Net or how to fix it, so I figured I would just keep a record of the old value of ReportSetGroupID, then after the Insert had been completed, I would run through ALL the nodes in the table and replace any node's ParentID that had the old value with the new one. This is the modified code:

private void WriteReportSetGroupTree(ReportSetGroupRow group)
{
  // first, insert self
  group.SetAdded();

  int oldID = group.ReportSetGroupID;
  taReportSetGroup.Update(group);

  // patch the children back in (this should be taken care of automatically by the datarelation, but it doenst
  // seem to work
  foreach (ReportSetGroupRow row in ReportSetGroup)
    if ((!row.IsParentIDNull()) && (row.ParentID == oldID))
    {
      row.ParentID = group.ReportSetGroupID;
      row.AcceptChanges();
    }
  // recursively add all children
  foreach (ReportSetGroupRow child in group.GetReportSetGroupRows())
    WriteReportSetGroupTree(child);
}

Note that I had to also add row.AcceptChanges() after the change of ParentID (otherwise, when the recursion happened and the child was saved, the call to group.SetAdded() fails, because setting the ParentID sets the RowState to Modified).

Also, I could now remove the check of row.RowState != DataRowState.Deleted, as the call to AcceptChanges at the begnning of the save removed all deleted rows.

And that was all it took! This version of the code actually runs nicely. Next I tried encapsulating all of this in a TransactionScope and ran into some nasty VPN problems... but that is another story!

Hope this will save you some headaches.