Gurn has been gaining a lot of traction over the past few weeks, as such our traffic has been getting higher and higher each day. This is fantastic for the product, and it gives us a perfect way to start seeing the quality of our code.
Yesterday we had an interesting error pop through to Sentry, our error tracking tool. It seems a race condition happened inside a pgsql query, one of our stored procedures had been triggered twice with the exact same data and one was winning on an insert causing a foreign key constraint, this is often called a "phantom read". This is a situation that can happy quite easily for a range of reasons outside of our control, and so it was important that we looked around for a way to fix this.
We worked through a number of different approaches to solving this, ultimately we ended up using transaction isolation. Feel free to skip directly to that section if you want the details of how we did it. However, for those of you here for the long haul, lets walk through one of the other common paths we ultimately decided not to use.
on conflict do...
Plpgsql supports a conflict resolution approach on an insert statement which generally looks something like this.
-- Do nothing if an insert conflicts insert into articles (id, title, content) values (3, 'PostgreSQL function transaction isolation', 'Gurn has been gaining...') on conflict do nothing; -- Do an update instead (upsert) if the insert conflicts insert into articles (id, title, content) values (3, 'PostgreSQL function transaction isolation', 'Gurn has been gaining...') on conflict do update articles set title = '...' where id = 3;
In both of the above examples, we get the option to resolve a conflict on an insert. This works really well until you want to return a value, even if there's a conflict. Currently in Postgres 9.6 and Postgres 10, if you choose to
do nothing, you will not get any result from a returning clause. For example:
-- Return the ID of the inserted field insert into articles (id, title, content) values (3, 'PostgreSQL function transaction isolation', 'Gurn has been gaining...') on conflict do nothing returning id; -- This will return no results if the insert does not succeed
returning clause will only return an ID from either a successful insert, of a fallback
update, there is no way to get the ID without adding / updating data. In small databases, it might be okay to update identical data and overwrite the record in some way to get the ID back, but this can have bad side effects in a large database. The biggest issue for us, is that this will cause an
update trigger to happen, despite the data being the same. This can cause a number of things to happen in the database which we may not want to happen such as versions being incremented and various logs updated.
As a result of this side effect, using the
on conflict do isn't appropriate for our needs, not until postgres implement some sort of on conflict do select.
Setting transaction isolation levels
We decided to go with transaction isolation levels so we can ensure we don't get phantom reads. Most of our functions don't perform any write operations, so we don't need to worry about phantom reads being much of an issue in these instances.
The final result is very simple, the first line of our stored procedure is now
set transaction isolation level serializable;. This simply sets the transaction isolation level to any valid state. In our case, we chose to go with serializable but only for transactions where we know this sort of double request race condition might occur. The postgres documentation lists all the isolation levels with very clear explanations of their pros and cons.
I am by no means an expert in Postgres or databases for that fact, so I'm very open to other suggestions if anyone has any better ideas! You can email me or hit me up on social media if you've got any thoughts.
Update: After trying this out for a few hours in production, we actually quickly pulled this change. It turns out, that when using serializable isolation, if two pending transactions result in a conflicting read/write scenario, then one is thrown away with an error and the other is completed. This is obviously worse than the original phantom read as it could occur more frequently. As a result we abandoned this, and moved more towards the first option using a
do nothing and a subsequent
select if the resultant
returns is null.
As ever, I and the team don't claim to be experts in postgres but we did learn a lot carrying out this research!