Using constants for table and column names

Typed vs. untyped datasets

For several years now, the typed dataset had been heralded as the weapon of choice for data access using ADO.NET. Every tutorial expands on the benefits of typed datasets. They're faster, but more importantly: they're type-safe. If the dataset contains a table called Products, with columns Price of type Double and Stock of type Integer, the typed dataset allows you to access columns using a syntax like

Stock = DS.Product(0).UnitsInStock

In a conventional 'non-typed' dataset, you would have to write: 

Stock = CDbl(DS.Tables("Product").Rows(0)("UnitsInStock"))

This is error-prone in the table and column names, and it requires casting. Additionally, the typed dataset had methods for navigating relations, detecting null values in columns, et cetera.

So, you'd be a fool not to use typed datasets all the time, right?

No quite. So far, I've come to the conclusion that typed datasets are all but unusable in most real-world scenarios. The problem lies mainly in the code bloat associated with them, but also in the way datasets need to be kept 'in sync' with the underlying database.

Dataset bloat

Typed data sets have to be designed using a designer in Visual Studio. In a specific example (a SQL Server database with thirty tables , half of them simple, two or three column lookup tables) the designer generated more than two megabytes of source code. This caused the whole project to slow down during design and debugging. Those megabytes have to be compiled, of course, but more importantly: the dataset designer becomes in-to-le-ra-bly slow. That would not be so much of a problem if you didn't need it for every little change in the dataset, but you do. And for some reason, the whole build process slows down considerably - although it shouldn't, because regenerating the dataset source code should only happen after changes to it, and rebuilds should also only occur on demand. I tried all the settings I could find, but couldn't manage to un-bloat the project, other than to get rid of the typed dataset - which did the trick just nicely.

Keeping dataset and database in sync

There is, however a problem I find even more disturbing. Typed datasets need to change when the underlying database changes. This is a cumbersome affair, even more so because of the speed problem. If the database design is stable, changes to the dataset are few, so it's not that much of an issue. But in small projects, the database is designed in parallel with the application, which requires many manual changes to the dataset, causing all kinds of nasty, hard to find bugs when dataset and database diverge. And did I mention it's s-l-o-w?

Using constants in untyped datasets

So, in situations where the database design changes often, untyped datasets are far easier and faster to work with. Their use is, however, sensitive to changes to column and table names. There is a way around that, of course:

Const ProductsTable As String = "Products"
Const UnitsInStock As String = "UnitsInStock"
 
Stock = CDbl(DS.Tables(ProductsTable).Rows(0)(UnitsInStock))

This approach is still somewhat verbose, and it still requires a cast, but at least eliminates the risk of spelling mistakes in the table and column names.

If we expand on this a little, we can generate a class DBInfo with the following content:

Public Class DBInfo
  Public Class Order
    Public Const fkProductID As String = "fkProductID"
    Public Const NumberOfUnits As String = "NumberOfUnits"
    Public Const OrderID As String = "OrderID"
    Public Const TotalPrice As String = "TotalPrice"
  End Class ' Order
 
  Public Class Product
    Public Const Description As String = "Description"
    Public Const ProductCode As String = "ProductCode"
    Public Const ProductID As String = "ProductID"
    Public Const UnitsInStock As String = "UnitsInStock"
  End Class ' Product
 
  Public Const OrderTable As String = "Order"
  Public Const ProductTable As String = "Product"
End Class ' DBInfo

This allows us to have the best of both worlds:

Stock = CDbl(DS.Tables(DBInfo.ProductTable).Rows(0)(DBInfo.Product.UnitsInStock))

We have the table names available in the DBInfo class, postfixed with 'Table', e.g. DBInfo.ProductTable. The column names of the tables are available using the table name itself, e.g. DBInfo.Product.UnitsInStock. We can even have Intellisense:

This helps inserting those long field names, but it also generates long lines. To prevent that, you could import, for example, DBInfo.Product. You can then use UnitsInStock without the prefixes. Of course, importing just DBInfo will allow you to write Product.UnitsInStock, like so:

Stock = CDbl(DS.Tables(ProductTable).Rows(0)(Product.UnitsInStock))

Now consider what needs to happen if a column name in some table changes, say 'UnitsInStock' changes to 'NumberOfUnitsInStock'. In a typed data set, we have to open the dataset designer and fix the name manually. Our code will then break where we used the old column name 'UnitsInStock'. Not a big deal: we are alerted to all occasions where the old column name was used.

In a run of the mill untyped dataset, we would have to search for the occurrence of the old column name. This is bothersome, since we're searching inside strings and Visual Studio can't flag the errors for us.

ExportDbInfo tool

Using our new approach, we have the advantage of a typed data set in that the old column name breaks our existing code - as it should! - but we don't need to maintain the dataset manually. We will, however, have to regenerate the DBInfo class but for that - surprise, surprise - I wrote a tool called ExportDbInfo, source available here. It exports a VB.NET source code file containing the class and constant definitions for Access files using an OleDbConnection. I tested it with SQL Server 2005 and that works just fine, too. (You'lll have to change the source code to use non-Access databases, though: the Access connection string is hard-coded into ExportDbInfo. The alternative, passing a complete connection string to a command line utility, is a bit too much: it's error prone, full of special characters like quotes, and too hard to remember.) It can't handle table or column names with embedded spaces, but it checks for those and refuses to run if any are found.

Retrieving table and column names: GetSchema

In order to be able to write ExportDbInfo, I needed a way to find out the names of the tables in a database, plus their column names. The nicely named method GetSchema of the OleDbConnection object seems to do just that, except that it accepts 'restrictions', which are all but undocumented. A valuable user comment on MSDN set me on track of the way to do it. The basic principle for table information is as follows:

' Get the data table containing table information
Dim TableDataTable As DataTable = Connection.GetSchema( _
  OleDb.OleDbMetaDataCollectionNames.Tables, _
  New String() {Nothing, Nothing, Nothing, "Table"} _
)

For per-table column information it works like this:

' Get schema information about the columns of this table
Dim ColumnDataTable As DataTable = Connection.GetSchema( _
  OleDb.OleDbMetaDataCollectionNames.Columns, _
  New String() {Nothing, Nothing, TableName, Nothing} _
)

Note the use of the OleDbMetaDataCollectionNames: it contains the magic strings you need to access schema information. In both cases a table is returned; the columns with the names TABLE_NAME and COLUMN_NAME are the interesting ones. I encapsulated these in a class called DbSchemaInfo.

Summarizing, I think using constants for table and column names is a nice step toward a leaner and meaner typed dataset. Stay tuned for extensions to the DBInfo class!