Let's create a simple table with one row to get started:
data:image/s3,"s3://crabby-images/01567/0156780f6d0690d1706db4e81eb18c3dbfc83163" alt=""
So there's the new row inserted, with the transaction ID xmin active for the statement that inserted it. There's no maximum yet because nothing has ever updated or deleted this row.
Now run psql, start a transaction with BEGIN, and update the row--but don't commit it just yet:
data:image/s3,"s3://crabby-images/716be/716be478ef9fe711b7db2aa0dc83fc68de2cc7ae" alt=""
From the perspective of this session, the row has been changed. But since it has not been committed yet, other sessions will not see it. You can prove that by connecting to the database from another client session and looking at t; it will show the original value still:
data:image/s3,"s3://crabby-images/fcf2d/fcf2de941456ccda8717054f4e004c132ee8af9b" alt=""
But it already knows that once transaction 10420501 commits, this version of the row is obsolete.
This behavior is the essence of MVCC: each database client session is allowed to make changes to a table, but it doesn't become visible to other sessions until the transaction commits. And even after that point, sessions that are already open will not normally see that change until they are finished. Similarly, once a session starts a transaction, it's blind to most changes made to the database after that point, and its own changes are kept private until it commits. And even those aren't considered final, even in the session that made the change--if you ROLLBACK its transaction, the original values had better still be around to return to. In this case, even though the last row here is shown with a lifetime ending with an xmax value of 10420501, if that transaction rolls back that doesn't matter.
As you might guess from studying the example, the server actually makes a second row on disk when you issue an UPDATE statement. Obviously the original one can't be overwritten, given this behavior. It has no choice but to duplicate the original row, apply the update, and then save this new version. The change from the old to the new version is marked with a transaction identification number, which is what's used to determine which sessions that the updated row should be visible to.
The first thing to recognize, then, is that an UPDATE statement consists of the following steps:
- Read the original row in.
- Change the fields adjusted by the UPDATE statement.
- Save the new row into a newly allocated location with a new transaction ID.
- When no one else is using it anymore, VACUUM will delete the original row.