When it comes to storing data, there is a stupid amount of choice nowadays. You have relational databases, graph databases, document databases, in-memory databases, embedded databases, and some others I haven’t mentioned.
Suffice to say we are spoiled for choice.
Throughout my career, I have played around with at least one of each of these and used many of them in production systems.
In recent years though, I have developed a preference for starting with PostgreSQL and seeing how far I can get before I need to change to, or more likely add, a more specialised data store to cover whatever use case PostgresSQL cannot meet.
This post will cover why that is and is the first in our new ‘Why we like: …’ series in which we will explain why we like certain technologies or frameworks that we have used and continue to use at °41North.
Most things are relational
It has been my experience that for the majority of use cases a relational model is usually a good fit.
Take, for example, the first thing that is typically added to a new system: Authentication and Authorization.
It is quite natural to think of a
User as having one or more
Roles, one or more
Profile, one or more
Payment Methods, and so on.
All this fits quite naturally into a relational model, thereby reducing duplication of data and benefiting quite nicely
from things like foreign key constraints ensuring referential integrity and cascaded deletes when a
User is removed,
But NoSQL can do that too!
Indeed they can. NoSQL databases, in their various incarnations, are perfectly capable of modelling data in a relational way. No doubt about it.
<!-- Begin sarcasm -->
When you think about it, it’s just some embedded foreign keys pointing to some entries elsewhere. Any key/value store is capable of that. You could even embed the related entities entirely.
Yes, you might need to duplicate some data and lose some of the niceties like foreign key constraints, cascading deletes and, so on. But come on, how hard is that to implement really? You can do all that yourself in the application layer with some hooks on your entities.
Do you want to be webscale or not?!?
<!-- End sarcasm -->
All sarcasm aside, there are some genuine use cases for which NoSQL databases are a better fit, and times when de-normalizing your data has very real benefits.
What I have seen over the last decade, however, is a tendency by developers to easily discount everything that a relational database can give you, citing them as being too slow or cumbersome, in favour of a data store that will somehow magically make you more performant and scalable.
They do so without an appreciation for just how those supposed improvements in performance and scalability are achieved, and inevitably attempt to engineer poorer versions of some features of relational databases on top of NoSQL stores.
Which brings me to the main feature of relational databases, which I only give up when I have to: ACID transactions.
Transactions, transactions! My kingdom for transactions!
I will concede that ACID transactions have started making their way into a variety of NoSQL implementations more recently.
However, ACID transactions and relational databases go hand-in-hand like ice cream and apple pie, Simon and Garfunkel, Kirk and Spock… you get the idea.
They have been around for donkey’s years as they say, and as a result, they are well understood and well optimized, and a crucial component when implementing an effective relational model.
OK fine, relational is great, but what about MySQL?
MySQL is a fine database, I’ve used it many times over the years. Comparatively speaking PostgreSQL and MySQL share a lot of the same features including:
- Materialized views for caching complex and expensive queries.
- Table partitioning whereby a table can be chunked by key ranges allowing access to be better optimized.
- Tablespaces allowing for more granular control of where the files backing a table or tables are located.
- A robust plugin API for creating extensions. PostgreSQL, for example, has TimescaleDB for use with time series data.
All things being equal, as it seems they are, you could say it’s a coin toss between PostgreSQL and MySQL.
Unless you consider the one thing PostgreSQL has which MySQL does not, and which I consider being a ‘killer feature’: Transactional DDL
We’ve all been there, whether it’s developing a new feature locally or releasing to production.
You run the database migrations, and it turns out you made a mistake with the column definition, or the production database structure had a manual change that wasn’t in the staging environment.
Your migration fails!
Now if your database is MySQL, you will find yourself in a messy halfway state in which part of your migration has been applied. You now have three options:
- Rebuild from scratch if it’s a local development database.
- Restore from a backup.
- Manually unpick the half applied changes.
With PostgreSQL there is a fourth option:
- Fix the migration and run it again.
Since most DDL statements within PostgreSQL have transactional support, provided you executed your migration scripts within a transaction whenever one of them fails, any changes are rolled back cleanly.
No harm, no foul.
Speaking from my own experiences, I have seen this feature prevent some very sticky situations over the years.
Whilst there is some support for transactional DDL in other relational databases such as Oracle and SQL Server, none of them appears to implement it quite so robustly as PostgreSQL.
When it comes to deciding on a data store there are a lot of factors to consider and nuance to be aware of. PostgreSQL will not be suitable for all use cases.
What I hope to have shown however is that as a starting point, PostgreSQL is typically a good first choice. It has certainly served me well over the years.
Get in touch
We are always happy to talk about databases and software architecture in general. Feel free to reach out via our contact form or by sending an email to [email protected]. We try to respond within 48 hours and look forward to hearing from you.