Object Context

As you’ve seen, the object context provides access to entities. For each entity we define in our EDM, the generated object context class provides a property that we can use as the source for a LINQ query. We’ve also used its CreateQuery<T> method to build ESQL-based queries. The object context provides some other services.

To execute database queries, it’s necessary to connect to a database, so the object context needs connection information. This information typically lives in the App.config file—when you first run the EDM wizard, it adds a configuration file to your application if one does not already exist, and then adds a connection string to it. Example 14-13 shows a configuration file containing a typical Entity Framework connection string. (This has been split over multiple lines to fit—normally the connectionString attribute is all on one line.)
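Example 14-13 is not reproduced here, but an EF connection string in App.config looks something like the following sketch. The entry name, model name, and database details are placeholders—your EDM wizard will generate names matching your own model:

```xml
<configuration>
  <connectionStrings>
    <!-- Hypothetical names; the wizard generates ones matching your model.
         Whitespace added inside the attribute for legibility only. -->
    <add name="AdventureWorksLT2008Entities"
         connectionString="metadata=res://*/AdventureWorksModel.csdl|
                           res://*/AdventureWorksModel.ssdl|
                           res://*/AdventureWorksModel.msl;
                           provider=System.Data.SqlClient;
                           provider connection string=&quot;Data Source=.\SQLEXPRESS;
                           Initial Catalog=AdventureWorksLT2008;
                           Integrated Security=True&quot;"
         providerName="System.Data.EntityClient" />
  </connectionStrings>
</configuration>
```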

This is a rather more complex connection string than the one we saw back in Example 14-1, because the Entity Framework needs three things in its connection string: information on where to find the EDM definition, the type of underlying database provider to use, and the connection string to pass to that underlying provider. This last part—an ordinary SQL Server connection string, enclosed in &quot; character entities—is highlighted in Example 14-13 in bold.

The three URIs in the metadata section of the connectionString—the ones beginning with res://—point to the three parts of the EDM: the conceptual schema, the storage schema, and the mappings. Visual Studio extracts these from the .edmx file and embeds them as three XML resource streams in the compiled program. Without these, the EF wouldn’t know what the conceptual and storage schemas are supposed to look like, or how to map between them.

Note

It may seem a bit weird for the locations of these EDM resources to be in a connection string. It might seem more natural for the XML to use a separate attribute for each one. However, as you’ve seen, the System.Data.EntityClient namespace conforms to the ADO.NET v1 model so that it’s possible for old-style data access code to perform queries against the EDM. Since the ADO.NET v1 model includes an assumption that it’s possible to put all the information defining a particular data source into a single connection string, the Entity Framework has to follow suit. And since the EF cannot function without the XML EDM definitions, the connection string has to say where those live.

After the EDM metadata resources, you can see a provider property, which in Example 14-13 indicates that the underlying database connection is to be provided by the SQL Server client. The EF passes the provider connection string on to that provider.

You don’t have to use the App.config to configure the connection. The object context offers a constructor overload that accepts a connection string. The configuration file is useful—it’s where the object context’s no-parameters constructor we’ve been using in the examples gets its connection information from—but what if you want to let just the underlying database connection string be configurable, while keeping the parts of the connection string identifying the EDM resources fixed? Example 14-14 shows how you could achieve this. It retrieves the configured values for these two pieces and uses the EntityConnectionStringBuilder helper to combine this with the EDM resource locations, forming a complete EF connection string.
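A sketch of the technique Example 14-14 uses might look like this. The configuration entry name ("database") and the model resource names are assumptions for illustration; the EntityConnectionStringBuilder class lives in the System.Data.EntityClient namespace:

```csharp
using System.Configuration;
using System.Data.EntityClient;

// Read only the configurable piece - the underlying database
// connection string - from App.config. The entry name "database"
// is a placeholder.
ConnectionStringSettings dbSettings =
    ConfigurationManager.ConnectionStrings["database"];

// Combine it with the fixed EDM resource locations to form a
// complete EF connection string.
var csb = new EntityConnectionStringBuilder();
csb.Provider = dbSettings.ProviderName;
csb.ProviderConnectionString = dbSettings.ConnectionString;
csb.Metadata = "res://*/AdventureWorksModel.csdl|" +
               "res://*/AdventureWorksModel.ssdl|" +
               "res://*/AdventureWorksModel.msl";

// Pass the result to the generated context's constructor overload.
using (var context = new AdventureWorksLT2008Entities(csb.ConnectionString))
{
    // ... queries as usual ...
}
```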

This code uses the ConfigurationManager in the System.Configuration namespace, which provides a ConnectionStrings property. (This is in a part of the .NET Framework class library that’s not referenced by default in a .NET console application, so we need to add a reference to the System.Configuration component for this to work.) The ConnectionStrings property provides access to any connection strings in your App.config file; it’s the same mechanism the EF uses to find its default connection string. Now that Example 14-14 is providing the EDM resources in code, our configuration file only needs the SQL Server part of the connection string, as shown in Example 14-15 (with a long line split across multiple lines to fit). So when the application is deployed, we have the flexibility to configure which database gets used, but we have removed any risk that such a configuration change might accidentally break the references to the EDM resources.

Besides being able to change the connection information, what else can we do with the connection? We could choose to open the connection manually—we might want to verify that our code can successfully connect to the database. But in practice, we don’t usually do that—the EF will connect automatically when we need to. The main reason for connecting manually would be if you wanted to keep the connection open across multiple requests—if the EF opens a connection for you it will close it again. In any case, we need to be prepared for exceptions anytime we access the database—being able to connect successfully is no guarantee that someone won’t trip over a network cable at some point between us manually opening the connection and attempting to execute a query. So in practice, the connection string is often the only aspect of the connection we need to take control of.

So far, all of our examples have just fetched existing data from the database. Most real applications will also need to be able to add, change, and remove data. So as you’d expect, the Entity Framework supports the full range of so-called CRUD (Create, Read, Update, and Delete) operations. This involves the object context, because it is responsible for tracking changes and coordinating updates.

Updates—modifications to existing records—are pretty straightforward. Entities’ properties are modifiable, so you can simply assign new values. However, the EF does not attempt to update the database immediately. You might want to change multiple properties, in which case it would be inefficient to make a request to the database for each property in turn, and that might not even work—integrity constraints in the database may mean that certain changes need to be made in concert. So the EF just remembers what changes you have made, and attempts to apply those changes back to the database only when you call the object context’s SaveChanges method. Example 14-16 does this. In fact, most of the code here just fetches a specific entity—the most recent order of a particular customer—and only the last couple of statements modify that order.
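In the style of Example 14-16, an update might be sketched as follows. The context class name, the customer ID value, and the property being modified are illustrative assumptions:

```csharp
using (var context = new AdventureWorksLT2008Entities())
{
    // Fetch the most recent order for a particular customer.
    SalesOrderHeader order = context.SalesOrderHeaders
        .Where(o => o.CustomerID == 29531)
        .OrderByDescending(o => o.OrderDate)
        .First();

    // The EF tracks these modifications in memory...
    order.Comment = "Customer asked to expedite delivery";
    order.ModifiedDate = DateTime.Now;

    // ...and writes them to the database only here.
    context.SaveChanges();
}
```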

To add a brand-new entity, you need to create a new entity object of the corresponding class, and then tell the object context you’ve done so—for each entity type, the context provides a corresponding method for adding entities. In our example, the context has AddToCustomers, AddToSalesOrderHeaders, and AddToSalesOrderDetails methods. You will need to make sure you satisfy the database’s constraints, which means that the code in Example 14-17 will not be enough.

The Entity Framework will throw an UpdateException when Example 14-17 calls SaveChanges because the entity is missing all sorts of information. The example database’s schema includes a number of integrity constraints, and will refuse to allow a new row to be added to the SalesOrderDetail table unless it meets all the requirements. Example 14-18 sets the bare minimum number of properties to keep the database happy. (This is probably not good enough for real code, though—we’ve not specified any price information, and the numeric price fields will have default values of 0; while this doesn’t upset the database, it might not please the accountants.)

Several of the constraints involve relationships. A SalesOrderDetail row must be related to a particular row in the Product table, because that’s how we know what products the customer has ordered. We’ve not defined an entity type corresponding to the Product table, so Example 14-18 just plugs in the relevant foreign key value directly.

The database also requires that each SalesOrderDetail row be related to exactly one SalesOrderHeader row—remember that this was one of the one-to-many relationships we saw earlier. (The header has a multiplicity of one, and the detail has a multiplicity of many.) The constraint in the database requires the SalesOrderID foreign key column in each SalesOrderDetail row to correspond to the key for an existing SalesOrderHeader row. But unlike the ProductID column, we don’t set the corresponding property directly on the entity. Instead, the second line of Example 14-18 sets the new entity’s SalesOrderHeader property, which as you may recall is a navigation property.

When adding new entities that must be related to other entities, you normally indicate the relationships with the corresponding navigation properties. In this example, you could add the new SalesOrderDetail object to a SalesOrderHeader object’s SalesOrderDetails navigation property—since a header may have many related details, the property is a collection and offers an Add method. Or you can work with the other end of the relationship as Example 14-18 does. This is the usual way to deal with the relationships of a newly created entity—setting foreign key properties directly as we did for the other relationships here is somewhat unusual. We did that only because our EDM does not include all of the relevant entities—we represent only three of the tables because a complete model for this particular example would have been too big to fit legibly onto a single page. There may also be situations where you know that in your particular application, the key values required will never change, and you might choose to cache those key values to avoid the overhead of involving additional entities.
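A sketch along the lines of Example 14-18 follows. The exact set of columns a database insists on depends on its constraints, so treat the property list here as illustrative; the product key value and order ID are placeholders:

```csharp
using (var context = new AdventureWorksLT2008Entities())
{
    // Fetch the existing order this new detail line belongs to.
    SalesOrderHeader order = context.SalesOrderHeaders
        .Single(o => o.SalesOrderID == 71774);

    var detail = new SalesOrderDetail();
    // Relate via the navigation property - the EF fills in the
    // SalesOrderID foreign key for us.
    detail.SalesOrderHeader = order;
    // No Product entity in this EDM, so set the foreign key directly.
    detail.ProductID = 680;
    detail.OrderQty = 1;
    detail.ModifiedDate = DateTime.Now;
    detail.rowguid = Guid.NewGuid();

    context.AddToSalesOrderDetails(detail);
    context.SaveChanges();
}
```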

We’ve seen how to update existing data and add new data. This leaves deletion. It’s pretty straightforward: if you have an entity object, you can pass it to the context’s DeleteObject method, and the next time you call SaveChanges, the EF will attempt to delete the relevant row, as shown in Example 14-19.
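A deletion in the style of Example 14-19 might be sketched like this, with the query criteria as placeholder assumptions:

```csharp
using (var context = new AdventureWorksLT2008Entities())
{
    // Fetch the row we want to remove.
    SalesOrderDetail detail = context.SalesOrderDetails
        .First(d => d.SalesOrderID == 71774);

    // Mark it for deletion; nothing happens in the database yet.
    context.DeleteObject(detail);

    // The DELETE is attempted here.
    context.SaveChanges();
}
```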

As with any kind of change to the database, this will succeed only if it does not violate any of the database’s integrity constraints. For example, deleting an entity at the one end of a one-to-many relationship may fail if the database contains one or more rows at the many end that are related to the item you’re trying to delete. (Alternatively, the database might automatically delete all related items—SQL Server allows a constraint to require cascading deletes. This takes a different approach to enforcing the constraint—rather than rejecting the attempt to delete the parent item, it deletes all the children automatically.)

Example 14-18 adds new information that relates to information already in the database—it adds a new detail to an existing order. This is a very common thing to do, but it raises a challenge: what if code elsewhere in the system was working on the same data? Perhaps some other computer has deleted the order you were trying to add detail for. The EF supports a couple of common ways of managing this sort of hazard: transactions and optimistic concurrency.

Transactions are an extremely useful mechanism for dealing with concurrency hazards efficiently, while keeping data access code reasonably simple. Transactions provide the illusion that each individual database client has exclusive access to the entire database for as long as it needs to do a particular job—it has to be an illusion because if clients really took it in turns, scalability would be severely limited. So transactions perform the neat trick of letting work proceed in parallel except for when that would cause a problem—as long as all the transactions currently in progress are working on independent data they can all proceed simultaneously, and clients have to wait their turn only if they’re trying to use data already involved (directly, or indirectly) in some other transaction in progress.[38]

The classic example of the kind of problem transactions are designed to avoid is that of updating the balance of a bank account. Consider what needs to happen to your account when you withdraw money from an ATM—the bank will want to make sure that your account is debited with the amount of money withdrawn. This will involve subtracting that amount from the current balance, so there will be at least two operations: discovering the current balance, and then updating it to the new value. (Actually it’ll be a whole lot more complex than that—there will be withdrawal limit checks, fraud detection, audit trails, and more. But the simplified example is enough to illustrate how transactions can be useful.) But what happens if some other transaction occurs at the same time? Maybe you happen to be making a withdrawal at the same time as the bank processes an electronic transfer of funds.

If that happens, a problem can arise. Suppose the ATM transaction and the electronic transfer both read the current balance—perhaps they both discover a balance of $1,234. Next, if the transfer is moving $1,000 from your account to somewhere else, it will write back a new balance of $234—the original balance minus the amount just deducted. But then the ATM transaction completes—suppose you withdrew $200. It will write back a new balance of $1,034. You just withdrew $200 and paid $1,000 to another account, but your account only has $200 less in it than before rather than $1,200—that’s great for you, but your bank will be less happy. (In fact, your bank probably has all sorts of checks and balances to try to minimize opportunities such as this for money to magically come into existence. So they’d probably notice such an error even if they weren’t using transactions.) Neither you nor your bank really wants this to happen, not least because it’s easy enough to imagine similar examples where you lose money.

This problem of concurrent changes to shared data crops up in all sorts of forms. You don’t even need to be modifying data to observe a problem: code that only ever reads can still see weird results. For example, you might want to count your money, in which case looking at the balances of all your accounts would be necessary—that’s a read-only operation. But what if some other code was in the middle of transferring money between two of your accounts? Your read-only code could be messed up by other code modifying the data.

A simple way to avoid this is to do one thing at a time—as long as each task completes before the next begins, you’ll never see this sort of problem. But that turns out to be impractical if you’re dealing with a large volume of work. And that’s why we have transactions—they are designed to make it look like things are happening one task at a time, but under the covers they allow tasks to proceed concurrently as long as they’re working on unrelated information. So with transactions, the fact that some other bank customer is in the process of performing a funds transfer will not stop you from using an ATM. But if a transfer is taking place on one of your accounts at the same time that you are trying to withdraw money, transactions would ensure that these two operations take it in turns.

So code that uses transactions effectively gets exclusive access to whatever data it is working with right now, without slowing down anything it’s not using. This means you get the best of both worlds: you can write code as though it’s the only code running right now, but you get good throughput.

How do we exploit transactions in C#? Example 14-20 shows the simplest approach: if you create a TransactionScope object, the EF will automatically enlist any database operations in the same transaction. The TransactionScope class is defined in the System.Transactions namespace in the System.Transactions DLL (another class library DLL for which we need to add a reference, as it’s not in the default set).

For as long as the TransactionScope is active (i.e., until it is disposed at the end of the using block), all the requests to the database this code makes will be part of the same transaction, and so the results should be consistent—any other database client that tries to modify the state we’re looking at will be made to wait (or we’ll be made to wait for them) in order to guarantee consistency. The call to Complete at the end indicates that we have finished all the work in the transaction, and are happy for it to commit—without this, the transaction would be aborted at the end of the scope’s using block. For a transaction that modifies data, failure to call Complete will lose any changes. Since the transaction in Example 14-20 only reads data, this might not cause any visible problems, but it’s difficult to be certain. If a TransactionScope was already active on this thread (e.g., a function farther up the call stack started one) our TransactionScope could join in with the same transaction, at which point failure to call Complete on our scope would end up aborting the whole thing, possibly losing data. The documentation recommends calling Complete for all transactions except those you want to abort, so it’s a good practice always to call it.
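The pattern Example 14-20 uses can be sketched as follows. The queries inside the scope are placeholders; the point is the shape of the using block and the Complete call:

```csharp
using System.Transactions;

using (var scope = new TransactionScope())
using (var context = new AdventureWorksLT2008Entities())
{
    // Both of these queries run inside the same transaction, so
    // they see a consistent view of the data.
    SalesOrderHeader order = context.SalesOrderHeaders
        .First(o => o.SalesOrderID == 71774);
    decimal detailTotal = order.SalesOrderDetails.Sum(d => d.LineTotal);

    // Without this call, the transaction aborts when the scope
    // is disposed at the end of the using block.
    scope.Complete();
}
```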

TransactionScope represents an implicit transaction—any data access performed inside its using block will automatically be enlisted on the transaction. That’s why Example 14-20 never appears to use the TransactionScope it creates—it’s enough for it to exist. (The transaction system keeps track of which threads have active implicit transactions.) You can also work with transactions explicitly—the object context provides a Connection property, which in turn offers explicit BeginTransaction and EnlistTransaction methods. You can use these in advanced scenarios where you might need to control database-specific aspects of the transaction that an implicit transaction cannot reach.

Besides enabling isolation of multiple concurrent operations, transactions provide another very useful property: atomicity. This means that the operations within a single transaction succeed or fail as one: all succeed, or none of them succeed—a transaction is indivisible in that it cannot complete partially. The database stores updates performed within a transaction provisionally until the transaction completes—if it succeeds, the updates are permanently committed, but if it fails, they are rolled back and it’s as though the updates never occurred. The EF uses transactions automatically when you call SaveChanges—if you have not supplied a transaction, it will create one just to write the updates. (If you have supplied one, it’ll just use yours.) This means that SaveChanges will always either succeed completely, or have no effect at all, whether or not you provide a transaction.

Transactions are not the only way to solve problems of concurrent access to shared data. They are bad at handling long-running operations. For example, consider a system for booking seats on a plane or in a theater. End users want to see what seats are available, and will then take some time—minutes probably—to decide what to do. It would be a terrible idea to use a transaction to handle this sort of scenario, because you’d effectively have to lock out all other users looking to book into the same flight or show until the current user makes a decision. (It would have this effect because in order to show available seats, the transaction would have had to inspect the state of every seat, and could potentially change the state of any one of those seats. So all those seats are, in effect, owned by that transaction until it’s done.)

Let’s just think that through. What if every person who flies on a particular flight takes two minutes to make all the necessary decisions to complete his booking? (Hours of queuing in airports and observing fellow passengers lead us to suspect that this is a hopelessly optimistic estimate. If you know of an airline whose passengers are that competent, please let us know—we’d like to spend less time queuing.) The Airbus A380 aircraft has FAA and EASA approval to carry 853 passengers, which suggests that even with our uncommonly decisive passengers, that’s still a total of more than 28 hours of decision making for each flight. That sounds like it could be a problem for a daily flight.[39] So there’s no practical way of avoiding having to tell the odd passenger that, sorry, in between showing him the seat map and choosing the seat, someone else got in there first. In other words, we are going to have to accept that sometimes data will change under our feet, and that we just have to deal with it when it happens. This requires a slightly different approach than transactions.

Optimistic concurrency describes an approach to concurrency where instead of enforcing isolation, which is how transactions usually work, we just make the cheerful assumption that nothing’s going to go wrong. And then, crucially, we verify that assumption just before making any changes.

For example, an airline booking system that shows a map of available seats in an aircraft on a web page would make the optimistic assumption that the seat the user selects will probably not be selected by any other user in between the moment at which the application showed the available seats and the point at which the user picks a seat. The advantage of making this assumption is that there’s no need for the system to lock anyone else out—any number of users can all be looking at the seat map at once, and they can all take as long as they like.

Occasionally, multiple users will pick the same seat at around the same time. Most of the time this won’t happen, but the occasional clash is inevitable. We just have to make sure we notice. So when the user gets back to us and says that he wants seat 7K, the application then has to go back to the database to see if that seat is in fact still free. If it is, the application’s optimism has been vindicated, and the booking can proceed. If not, we just have to apologize to the user (or chastise him for his slowness, depending on the prevailing attitude to customer service in your organization), show him an updated seat map so that he can see which seats have been claimed while he was dithering, and ask him to make a new choice. This will happen only a small fraction of the time, and so it turns out to be a reasonable solution to the problem—certainly better than a system that is incapable of taking enough bookings to fill the plane in the time available.

Sometimes optimistic concurrency is implemented in an application-specific way. The example just described relies on an understanding of what the various entities involved mean, and would require us to write code that explicitly performs the check described. But slightly more general solutions are available—they are typically less efficient, but they can require less code. The EF offers some of these ignorant-but-effective approaches to optimistic concurrency.

The default EF behavior seems, at a first glance, to be ignorant and broken—not only does it optimistically assume that nothing will go wrong, but it doesn’t even do anything to check that assumption. We might call this blind optimism—we don’t even get to discover when our optimism turned out to be unfounded. While that sounds bad, it’s actually the right thing to do if you’re using transactions—transactions enforce isolation and so additional checks would be a waste of time. But if you’re not using transactions, this default behavior is not good enough for code that wants to change or add data—you’ll risk compromising the integrity of your application’s state.

To get the EF to check that updates are likely to be sound, you can tell it to check that certain entity properties have not changed since the entity was populated from the database. For example, in the SalesOrderDetail entity, if you select the ModifiedDate property in the EDM designer, you could go to the Properties panel and set its Concurrency Mode to Fixed (its default being None). This will cause the EF to check that this particular column’s value is the same as it was when the entity was fetched whenever you update it. And as long as all the code that modifies this particular table remembers to update the ModifiedDate, you’ll be able to detect when things have changed.
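Setting Concurrency Mode in the designer amounts to an attribute on the property in the conceptual schema. A fragment might look like this (the surrounding EntityType is omitted):

```xml
<!-- CSDL fragment: with ConcurrencyMode="Fixed", the EF includes this
     column's original value in the WHERE clause of its UPDATE statements,
     so the update affects zero rows if the value has changed. -->
<Property Name="ModifiedDate" Type="DateTime" Nullable="false"
          ConcurrencyMode="Fixed" />
```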

If any of the columns with a Concurrency Mode of Fixed change between reading an entity’s value and attempting to update it, the EF will detect this when you call SaveChanges and will throw an OptimisticConcurrencyException, instead of completing the update.

How you deal with an optimistic concurrency failure is up to your application—you might simply be able to retry the work, or you may have to get the user involved. It will depend on the nature of the data you’re trying to update.

The object context provides a Refresh method that you can call to bring entities back into sync with the current state of the rows they represent in the database. You could call this after catching an OptimisticConcurrencyException as the first step in your code that recovers from a problem. (You’re not actually required to wait until you get a concurrency exception—you’re free to call Refresh at any time.) The first argument to Refresh tells it what you’d like to happen if the database and entity are out of sync. Passing RefreshMode.StoreWins tells the EF that you want the entity to reflect what’s currently in the database, even if that means discarding updates previously made in memory to the entity. Or you can pass RefreshMode.ClientWins, in which case any changes in the entity remain present in memory. The changes will not be written back to the database until you next call SaveChanges. So the significance of calling Refresh in ClientWins mode is that you have, in effect, acknowledged changes to the underlying database—if changes in the database were previously causing SaveChanges to throw an OptimisticConcurrencyException, calling SaveChanges again after the Refresh will not throw again (unless the database changes again in between the call to Refresh and the second SaveChanges).
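A minimal recovery sketch, assuming a context and a modified entity named order already exist:

```csharp
using System.Data;          // OptimisticConcurrencyException
using System.Data.Objects;  // RefreshMode

try
{
    context.SaveChanges();
}
catch (OptimisticConcurrencyException)
{
    // StoreWins discards our in-memory changes in favor of the
    // database's current values; ClientWins keeps our changes and
    // acknowledges the database's, so a retry won't throw again.
    context.Refresh(RefreshMode.ClientWins, order);

    // ...inspect or reconcile as the application requires, then retry.
    context.SaveChanges();
}
```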

If you ask the context object for the same entity twice, it will return you the same object both times—it remembers the identity of the entities it has returned. Even if you use different queries, it will not attempt to load fresh data for any entities already loaded unless you explicitly pass them to the Refresh method.

This raises the question of how long you should keep an object context around. The more entities you ask it for, the more objects it’ll hang on to. Even when your code has finished using a particular entity object, the .NET Framework’s garbage collector won’t be able to reclaim the memory it uses for as long as the object context remains alive, because the object context keeps hold of the entity in case it needs to return it again in a later query.

There are other lifetime issues to bear in mind. In some situations, an object context may hold database connections open. And also, if you have a long-lived object context, you may need to add calls to Refresh to ensure that you have fresh data, which you wouldn’t have to do with a newly created object context. So all the signs suggest that you don’t want to keep the object context around for too long.

How long is too long? In a web application, if you create an object context while handling a request (e.g., for a particular page) you would normally want to Dispose it before the end of that request—keeping an object context alive across multiple requests is typically a bad idea. In a Windows application (WPF or Windows Forms), it might make sense to keep an object context alive a little longer, because you might want to keep entities around while a form for editing the data in them is open. (If you want to apply updates, you normally use the same object context you used when fetching the entities in the first place, although it’s possible to detach an entity from one context and attach it later to a different one.) In general, though, a good rule of thumb is to keep the object context alive for no longer than is necessary.
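In code, the rule of thumb above usually reduces to scoping the context to a single unit of work with a using block, as a sketch:

```csharp
// One context per operation: create it, do the work, dispose it.
using (var context = new AdventureWorksLT2008Entities())
{
    Customer customer = context.Customers
        .First(c => c.CustomerID == 29531);   // placeholder key
    customer.EmailAddress = "new.address@example.com";
    context.SaveChanges();
}   // disposal releases the connection and the tracked entities
```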



[38] In fact, it gets a good deal cleverer than that. Databases go to some lengths to avoid making clients wait for one another unless it’s absolutely necessary, and can sometimes manage this even when clients are accessing the same data, particularly if they’re only reading the common data. Not all databases do this in the same way, so consult your database documentation for further details.

[39] And yes, bookings for daily scheduled flights are filled up gradually over the course of a few months, so 28 hours per day is not necessarily a showstopper. Even so, forcing passengers to wait until nobody else is choosing a seat would be problematic—you’d almost certainly find that your customers didn’t neatly space out their usage of the system, and so you’d get times where people wanting to book would be unable to. Airlines would almost certainly lose business the moment they told customers to come back later.