All I Wanted Was My Data
by Barry Gervin
Most .NET developers will need to access relational data at some point. For somebody approaching .NET for the first time, the data access story can be overwhelming, to say the least. Existing .NET developers looking to update their data access techniques can face a similarly overwhelming experience. This article will help you understand the currently shipping mainstream technologies and choose the one that is right for your particular needs.
There is a wide array of choices from Microsoft as well as from within the broader .NET ecosystem. There are many considerations to keep in mind when evaluating these technologies, such as type safety, developer productivity, maintainability, concurrency, database agnosticism, performance, and scalability. One element that is almost universally accepted as best practice these days is optimistic concurrency; just about every data access technology you’ll be interested in uses this concurrency model. That, however, is where the similarities end.
Traditional ADO.NET
ADO.NET is Microsoft’s brand name for all things data access in .NET. Typically, using ADO.NET means using SqlCommand objects. Using a SqlCommand, we can pass a native T-SQL statement directly to the database. This could be a SELECT, INSERT, UPDATE, or DELETE statement, a stored procedure call, or even a DDL (Data Definition Language) statement. Because ADO.NET doesn’t know much about database syntax, the statement is simply stored in the SqlCommand as a string and passed off directly to the database. You must use one of the Execute* methods on the SqlCommand to invoke the command. ExecuteNonQuery fires the statement and assumes it returns no result sets, which is useful in the insert, update, and delete scenarios. To bring a result set back, we use the ExecuteReader method, which returns a DataReader. A DataReader behaves a little like a forward-only cursor, letting .NET code loop through the result set one row at a time and examine the columns. The DataReader is somewhat stupid in that it doesn’t know much about the shape or data types of the results coming back, so you must access the columns in an untyped fashion, as in the code below:
// Assumes using System and System.Data.SqlClient, and that myConnString
// and mySelectQuery are defined elsewhere.
using (SqlConnection myConnection = new SqlConnection(myConnString))
using (SqlCommand myCommand = new SqlCommand(mySelectQuery, myConnection))
{
    myConnection.Open();
    using (SqlDataReader myReader = myCommand.ExecuteReader())
    {
        // Always call Read before accessing data.
        while (myReader.Read())
        {
            Console.WriteLine(myReader.GetInt32(0) + ", " +
                myReader.GetString(1));
        }
    } // Disposing the reader closes it.
} // Disposing the connection closes it.
It is also important to note that the connection to the database remains open during this looping. You may want to reduce the amount of work you do in the loop to minimize the amount of time the database connection is held open. This will make your application more scalable since database connections are a finite resource.
If you are only returning a single value (like a count, or exactly one column from one row), you can use the ExecuteScalar method, which is a bit easier to use than a DataReader.
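For example, a simple count needs no reader loop at all. A minimal sketch, assuming myConnection is open as in the code above:
SqlCommand countCommand = new SqlCommand(
    "SELECT COUNT(*) FROM Customers", myConnection);
// ExecuteScalar returns the first column of the first row as an object.
int customerCount = (int)countCommand.ExecuteScalar();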
SqlCommands and DataReaders let you be very precise about the SQL that is sent to the server and control exactly when the connection is opened and closed. You have to supply a bit more muscle to get the data out of the SQL and coerce it into .NET data types, but for performance-sensitive areas of your application this can pay off. The other thing to be aware of is that there is a different set of objects (Commands, Connections, DataReaders) for each relational database provider, so if you need to move from SQL Server to ODBC to Oracle, you’ll need different objects. Fortunately, each of these objects is based on a common interface, and you can program against the interface by using a DbProviderFactory class to create the concrete types based on configuration data. To stay database agnostic you will also have to stick to ANSI-compatible SQL, which may run counter to high-speed data access, since that often requires syntax unique to a particular relational database engine. Obviously your mileage may vary here.
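A minimal sketch of that factory pattern follows; it requires System.Data.Common, and the provider invariant name is hard-coded here only for illustration, where it would normally come from configuration:
DbProviderFactory factory =
    DbProviderFactories.GetFactory("System.Data.SqlClient");
using (DbConnection connection = factory.CreateConnection())
{
    connection.ConnectionString = myConnString; // assumed defined
    connection.Open();
    DbCommand command = connection.CreateCommand();
    command.CommandText = "SELECT CustomerID, CompanyName FROM Customers";
    using (DbDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine(reader.GetString(0) + ", " + reader.GetString(1));
    }
}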
DataSets, DataTables, Data & Table Adapters
With DataReaders you need to be careful about how long you keep your connection open while iterating. Sometimes it is better to pull back your entire result set into memory, release your connection to the database, and then process the result set afterwards. This is where DataSets can be helpful. If you have a batch processing scenario that requires the entire result set in memory or you need to provide end user editing in a grid control, then DataSets are a good place to hold that data.
One of the primary benefits of a DataSet is that it not only caches the data in an efficient memory structure but also provides change tracking (which records were added, deleted, or changed, and what the original values of each column were). DataSets work collaboratively with a SqlDataAdapter object, which is simply a composite of four SqlCommands (InsertCommand, SelectCommand, UpdateCommand, and DeleteCommand) mapping to the appropriate CRUD behavior (create, read, update, and delete).
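A minimal sketch of this collaboration, assuming myConnString is defined; the SqlCommandBuilder derives the insert, update, and delete commands from the SELECT:
SqlDataAdapter adapter = new SqlDataAdapter(
    "SELECT CustomerID, CompanyName FROM Customers", myConnString);
SqlCommandBuilder builder = new SqlCommandBuilder(adapter); // wires up the other commands

DataTable customers = new DataTable();
adapter.Fill(customers);              // opens and closes the connection itself

customers.Rows[0]["CompanyName"] = "Contoso"; // change tracking notes the edit

adapter.Update(customers);            // sends an UPDATE for the changed row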
DataSets can contain multiple result sets, each mapped into its own DataTable. Tables can be related to each other with Relations, which mimic foreign keys. DataSets take care of master-detail linkages and keep everything "hooked up" when inserting records into parent and child tables with identity columns. Each table requires its own SqlDataAdapter. If you are doing a mix of inserts, updates, and deletes and your database has referential constraints, you’ll need to orchestrate the changes, making sure parent records are inserted before children and that deletions happen in the reverse order.
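One hedged way to sequence those operations, assuming ordersAdapter and detailsAdapter are configured and ds holds both tables:
// Deletions run child-first; inserts and updates run parent-first.
detailsAdapter.Update(ds.Tables["OrderDetails"]
    .Select("", "", DataViewRowState.Deleted));
ordersAdapter.Update(ds.Tables["Orders"]
    .Select("", "", DataViewRowState.Deleted));
ordersAdapter.Update(ds.Tables["Orders"]
    .Select("", "", DataViewRowState.Added | DataViewRowState.ModifiedCurrent));
detailsAdapter.Update(ds.Tables["OrderDetails"]
    .Select("", "", DataViewRowState.Added | DataViewRowState.ModifiedCurrent));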
DataSets and DataTables come in two flavors: typed and untyped. Untyped DataSets simply infer the table and column definitions when the SqlDataAdapter.Fill method executes, and their columns must be accessed much as with DataReaders, using column names in quotes or positional references. Typed DataSets, on the other hand, use wizards and a design surface to generate a strongly typed object. The typed dataset is defined by an XSD document, which is turned into a strongly typed and named class that essentially sits on top of an untyped dataset. A table mappings collection on the SqlDataAdapter class allows your strongly typed object to use different names than your database table and column names.
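The difference in column access looks roughly like this, assuming a typed dataset with a Customers table has been generated from an XSD:
// Untyped: column names in quotes, results cast by hand.
DataRow row = untypedDs.Tables["Customers"].Rows[0];
string name = (string)row["CompanyName"];

// Typed: the generated class gives compile-time names and types.
string typedName = typedDs.Customers[0].CompanyName;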
Many people in the .NET 1.0-2.0 era used typed datasets as business entities and wrapped additional business logic around them. Although you could map column names, you were typically tied tightly to your table design in the database. DataSets can also be converted to and from XML, including a diffgram format that retains pending change information. In many circles, DataSets have been, and continue to be, serialized between distributed layers of an application. This makes for a pretty productive development environment, but it has drawn architectural criticism compared to properly modeled services with messages and contracts.
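Round-tripping those pending changes is a one-liner in each direction; a small sketch, assuming ds contains unsaved edits:
ds.WriteXml("pending-changes.xml", XmlWriteMode.DiffGram);  // edits preserved

DataSet restored = new DataSet();
restored.ReadXml("pending-changes.xml", XmlReadMode.DiffGram);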
In .NET 1.0, DataAdapters and DataSets were separate objects that really didn’t know about each other until you executed the DataAdapter’s Fill or Update method and passed it the DataSet or DataTable. It was possible for the table mappings in the SqlDataAdapter to get out of sync with the strongly typed dataset. To improve this situation, Microsoft introduced TableAdapters in .NET 2.0, which are really just strongly typed DataAdapters embedded into the definition of the typed dataset.
Object Relational Mapping
With .NET 3.0/3.5, Microsoft has finally released not one but two ORM toolkits. I say finally because Microsoft previously promised developers an ORM in the form of a project called ObjectSpaces. ObjectSpaces was never released, and its demise is clouded by a series of ill-fated integration attempts with WinFS, Project Green, and the Microsoft Business Framework, all of which were eventually cancelled.
The spark that reignited Microsoft’s ambitions to produce an ORM was the Language Integrated Query (LINQ) project. LINQ’s goal, as part of C# 3.0 and Visual Basic 9.0, was to provide a set-based metaphor for dealing with heterogeneous data sources, including object collections, XML, and, last but not least, relational data. For the relational component of LINQ, the language team came up with a project called LINQ to SQL and shipped it as part of .NET Framework 3.5.
During the same period, the Data Programmability Group in the SQL Server team had been working on a new layer to sit on top of SQL Server’s logical data model. This new "Entity Data Model" (EDM) would let developers interact with their relational database through a more abstract conceptual model based on the ideas of Dr. Peter Chen. For this team, the promise of Language Integrated Query was what really brought the notion of the EDM to life, and so began the "Entity Framework" project. The Entity Framework did not ship until several months after LINQ to SQL, as part of the .NET 3.5 Service Pack 1 release.
On the outside, the LINQ to SQL and Entity Framework ORMs appear very similar. Both use a graphical design tool and wizard to map a relational database into an object model. Both can use composable LINQ queries to project relational data into objects. And both provide change tracking, so that changes made to the objects can be persisted back to the database with a single method call.
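A minimal sketch of such a query, assuming a NorthwindDataContext generated by the LINQ to SQL designer; an Entity Framework context supports the same query shape:
using (NorthwindDataContext db = new NorthwindDataContext())
{
    // Composable query, projected into an anonymous type.
    var londonCustomers = from c in db.Customers
                          where c.City == "London"
                          select new { c.CustomerID, c.CompanyName };

    foreach (var c in londonCustomers)
        Console.WriteLine(c.CustomerID + ", " + c.CompanyName);
}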
Given these similarities, Microsoft has spent considerable energy explaining when to use each of these technologies. While the two ORMs have strong similarities in their programming models, their internal architectures are quite different. The key differences are:
- LINQ to SQL is locked directly to SQL Server, whereas the Entity Framework includes a provider model that allows it to work with any relational engine. As of mid-2009, the list of available providers in the ecosystem included SQL Server, Oracle, Sybase, DB2, Informix, MySQL, PostgreSQL, SQLite, OpenLink Virtuoso, and Firebird.
- LINQ to SQL does not provide a mechanism for streaming records back one at a time. Using Entity SQL, the Entity Framework can provide DataReader-style access to results through the same mapping layer used for object queries.
- LINQ to SQL provides lazy loading out of the box: by default, the engine automatically goes back to the database to retrieve new data as your code accesses different parts of the object graph. In the Entity Framework, you must be explicit about when trips to the database are made. Both ORMs also let you load deep object graphs eagerly (see the sketch after this list).
- Perhaps the largest difference between the Entity Framework and LINQ to SQL is the conceptual modeling layer that EF provides. EF offers a much richer mapping layer than LINQ to SQL, allowing you to join tables together, model elaborate class hierarchies with abstract and concrete types, and retain many-to-many relationships in your model. Future plans will see the conceptual model used in other Microsoft products such as SQL Server Reporting Services and SQL Server Analysis Services.
- Finally, expect to see significant investment in the Entity Framework from Microsoft. Both ORMs are now managed by the same product team, which has stated that while LINQ to SQL will continue to be supported and maintained, future innovation will be focused on the Entity Framework.
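Here is a hedged sketch of the loading difference noted above, assuming generated Customer and Order entities in each model, with customer, efCustomer, and context obtained elsewhere:
// LINQ to SQL: touching the relationship lazily loads it by default.
foreach (Order order in customer.Orders)   // first access triggers a query
    Console.WriteLine(order.OrderID);

// Entity Framework (3.5 SP1): the round trip is explicit.
if (!efCustomer.Orders.IsLoaded)
    efCustomer.Orders.Load();

// Both can eagerly load a graph up front; in EF, for example:
var customersWithOrders = context.Customers.Include("Orders");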
Other ORM Technologies
You can’t talk about ORMs in the .NET world without mentioning NHibernate. NHibernate, or "NH", is a mature, free, open source framework that is a port of the popular Java ORM Hibernate. Perhaps one of NH’s biggest distinctions is its support for a Domain-Driven Design workflow, which promotes a code-first or test-first approach to building applications. Typical NH developers build their classes first and then use an XML file to map them to a database. Unlike NH, both LINQ to SQL and the Entity Framework advocate a model-first, and arguably database-driven, approach: both designers assume you have an existing database that needs mapping to objects derived from your model.
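A small sketch of that code-first flavor: a plain class written before any database exists. Members are virtual so NH can generate lazy-loading proxies; a Customer.hbm.xml mapping file (not shown) would tie it to a table:
public class Customer
{
    public virtual int Id { get; set; }
    public virtual string CompanyName { get; set; }
}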
Although there are other commercial and open source ORM tools available, NHibernate, LINQ to SQL, and Entity Framework are by far the most popular technologies in use today.
Enter Data Services
When it comes to building distributed applications, your mileage will vary across this array of data access technologies; dealing with serialization and concurrency issues in those environments is beyond the scope of this article. However, it is important to mention what appears to be yet another data access technology from Microsoft: ADO.NET Data Services, formerly code-named "Astoria", which was made available with .NET 3.5 SP1.
Astoria is an HTTP service layer built on top of WCF that provides a REST-style API to your data, giving each element of your data a unique URI (for example, http://host/northwind.svc/Products(1)). Out of the box, this service can be enabled for an Entity Framework model, or for any other IUpdatable/IQueryable data source, in just a few lines of code. Data is queryable and updateable via pure HTTP verbs (PUT, POST, DELETE, and GET) using query strings and HTTP payloads, and can be serialized in either AtomPub or JSON format. The net effect is that your data model becomes interoperable with a dizzying array of potential clients and technologies.
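Enabling such a service really is terse; a minimal sketch, assuming a NorthwindEntities context generated by the Entity Framework:
public class Northwind : DataService<NorthwindEntities>
{
    public static void InitializeService(IDataServiceConfiguration config)
    {
        // Open all entity sets for read access; tighten this in real use.
        config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
    }
}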
Although you are free to build up query strings and HTTP payloads by hand for just about any type of operation, the Astoria team has created client libraries to assist in these endeavors. First, an ASP.NET AJAX library is available on CodePlex that lets JavaScript developers work easily with Astoria services. Second, the core installation includes a .NET client library that provides a natural query model using LINQ and projects data into client-side .NET objects for use in your .NET or Silverlight projects.
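A hedged sketch of the .NET client library at work, assuming client-side Product types generated against the service above; the LINQ query is translated into a REST URI:
DataServiceContext ctx =
    new DataServiceContext(new Uri("http://host/northwind.svc"));

// Becomes roughly /Products?$filter=UnitPrice lt 10 on the wire.
var cheapProducts = ctx.CreateQuery<Product>("Products")
                       .Where(p => p.UnitPrice < 10);

foreach (Product p in cheapProducts)   // enumeration issues the HTTP GET
    Console.WriteLine(p.ProductName);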
A new project, currently named "RIA Services" (code-named "Alexandria"), builds on top of ADO.NET Data Services by adding rich validation and UI cues to your data model in a client/server setting. This technology probably won’t be released until around the .NET 4.0 timeframe, but it is definitely something you should keep your eye on.
Application Frameworks
As you can see, the trend in data access is to abstract to higher and higher levels within your applications. Once data access libraries start generating entity classes, the question of where to locate data validation logic becomes a slippery slope. If a generated entity can validate data types, maybe it should also validate that postal codes and phone numbers are in the correct format. Should it also cross-validate postal codes against states and provinces? Should those error messages be managed inside the entities? The waters get muddied quite quickly.
It is therefore worth briefly mentioning a few of the more popular application frameworks that embrace a holistic view of data access within the scope of an application.
CSLA.NET (Component-based Scalable Logical Architecture) is a framework developed principally by author Rocky Lhotka in conjunction with his popular series of Business Objects books. When first released, CSLA was focused on Visual Basic 6, but over the years it has stayed very current with the latest Microsoft technologies and is now principally maintained in C# and ported to VB. In CSLA, data is fully encapsulated by rich business objects that manage all behavior, including persistence. CSLA facilitates the reuse of business objects across many client technologies, including ASP.NET, WPF, WinForms, and Silverlight, as well as distributed architectures using Web Services.
DevForce from IdeaBlade is a commercially available framework for building rich, distributed applications in Silverlight, WPF, WinForms, and ASP.NET. DevForce builds on top of the Entity Framework for its persistence layer and includes a Business Objects server for distributed architectures. DevForce is a popular application framework and has been around since 2001.
But All I Wanted Was My Data
If you can believe it, this article has been a short list of the current and popular data access technologies. I think I can safely say that the one piece of universally accepted guidance is this: if the data access technology you are considering is not in the list above, it is likely outdated or obscure, and perhaps you should think twice about using it.
If you are comfortable with open source projects and you are a strong believer in Domain-Driven Design, NHibernate is an obvious choice, and there is a strong community to back you up on this decision.
If you prefer to stick with Microsoft Technologies, and you have a legacy database to start from, or prefer a model-driven or data-driven approach, then the Entity Framework is likely where you should end up.
If you prefer to work with a more structured and complete application framework, both CSLA and DevForce are worthy choices at this level and you will have community and paid-support available to back you up on each of these respectively.
The last key piece of advice is that you don’t have to pick just one of these. Each has its pros and cons, and you may end up using multiple technologies, even within the same application. If you need raw performance in a hot spot, there are no rules against dropping out of your ORM and opting for more direct access. Keep an open mind and use the best tool for the job.
—–
Barry Gervin is a founding Partner of ObjectSharp in Toronto, Canada. As a Principal Consultant, Barry provides technical leadership to his valued clients, staff, and the development community. Over his 19-year career in the IT industry, he has led many development teams to deliver large software projects, within tight schedules and budgets, that consistently perform for their customers. Barry currently serves as a Microsoft Regional Director in Southern Ontario and has received the Microsoft MVP Award for Solutions Architecture for the past 5 years.