Quick database change management without external tools

Ad-hoc schema management, but less so

Most applications use persistent data in some form or another, and because the data itself will change more slowly than the application code, we’ll often need to change it’s representation over time to match changes in the application.

Recently for work, I’ve been looking at improving our schema management tools around Cassandra, and as part of that I’ve been considering what the smallest possible thing that would work for schema management is.

The core

At it’s core, we want a few things:

Some other systems (such as flyway) also provide additional features, such as:

Verification ensures that the actual state of the database hasn’t diverged from what what is in source control. Whilst it’s possible to deploy systems without this feature, the absence of the feature means there’s a greater risk of production accidents. For example, if someone mistakenly updates a schema change that has already been deployed, then the application might end up asking for things the database doesn’t actually have.

Whilst verification means that you can still edit deployed schemas in source control, it provides a fail safe, so we know there’s something wrong while deploying the application, rather than from server runtime errors.

Many systems also provide a way to describe database schema changes in the programming language of the day. While this allows developers to avoid having to dig into the details of the underlying database immediately, in my experience it’s best to expose the details sooner rather than later. It’s also not truly essential to solving the problem.

Feature testing

One approach I’ve used before is to reflect on the current database state (via the system catalog tables), and use explicit tests to see if we should apply a change, as in schema.sql from pg-queue.

However, this is less than ideal, since we’ve ended up with more logic to test if we need to apply the change, than to just apply it. It also relies heavily on the details of the PostgreSQL catalog, which whilst useful, ends up rather verbose.

A solution

I ended up running onto this on a recent personal project, so figured I’d take an another run at this.

So for example, PostgreSQL provides the PL/pgSQL, which allows us to express procedural in the database without the need for a seperate driver program. This allows querying the database, as well as executing dynamic commands. PL/pgSQL doesn’t support first class functions, or otherwise passing blocks of statements around as values, so the best we can do is to store them as text.

All of this code can be found as persistence.sql.

Firstly, we need a way to record what’s been applied, so far:

We use CREATE TABLE IF NOT EXISTS since we’ll be evaluating this script every time we want to apply any changes, so we need it to be idempotent. CREATE TABLE will fail if the table already exists by default, but we’re confident we can make any needed changes separately.

Next we have the core of the operation, the apply_change function:

And an example of usage would be:

We pass in a human readable name for the change, but then quote the actual change description as text within $$ delimiters.

Applicability

As mentioned, the main reason I went for this approach is that it ended up being quicker for me to quickly script something, than it would have been to find a more fully fledged migration management. However, this is effectively a prototype, so it’s okay if we have a few things which are slightly inelegant.

Whilst it works, though, for anything larger or longer term I’d want a more flexible implementation with better tooling.