lambdaphant’s posterous

lambdaphant’s posterous

Michael Matuzak  //  Programmer by day, booze drinking calamari cruncher by night.

Dec 12 / 6:11pm

Nonexistent Database Application

In software development a common problem is versioning database changes. Keeping code under version control is a necessity for every programmer, but when faced with versioning a database this becomes quite complex. The problem presents itself with a separation between schema and data, and changes to both schema and data. Schema changes can effect the entire software group and thus bring down an entire development process if not correctly applied. The same can be said for data changes. In this functional spec I want to explore the current state of ways to version the database, and also look at a non-existent piece of software to see if it is worth writing. 

Rolling Your Own
This is perhaps by far the most popular method of database versioning. I have no real statistics to back up this claim, so for all intents and purposes you should just ignore it. I have asked people what they use for database version control and next to nothing this is the top contender. I can't really speculate as to the official process that people use, but there are some general guidelines experts say you should follow. 

1. Keep a baseline schema with persistent data. (It is important to remember that there are two kinds of basic data in a database. Those that should be included in the schema and those that should not. For the sake of this discussion let's call the data that should be included with the schema "static data" and those that should not be included in the schema "dynamic data".) For a more professional discussion of this you should check out K. Scott Allen's article on Versioning Databases.

2. There has to be a way to keep schema and static data up to date. Often as development on a project is made changes are made to the database schema and the static data. Those changes should be versioned. The question then becomes how do we keep track of these changes? There are different methods, but they seem to be along the lines of a migration script that maybe stores some SQL that modifies either structure or data that will eventually get run on all databases. 

Using Rails Migrations
From the little work that I have done in Rails I was very impressed with database migrations. Essentially Rails provides some scripts that generate models and migrations for you, as well as scripts that apply those migrations. The migration scripts can then be versioned just like you would any file. This coupled with a deployment tool like Capistrano makes database migrations across multiple servers pretty easy. Now it is important to note that this has a couple drawbacks. 

1. There are a lot of people that do not work in Rails. 

2. You aren't really versioning the database, but rather the Ruby scripts. 

Of course for complaint number 2 you could use that along with a Roll Your Own approach and have both at your fingertips.
There really isn't a solution that I can see for complain 1. Maybe you could figure out a way to use only the Rails migration methods for your project. I do not know enough about Ruby/Rails to know if this is possible or 
how much work it would involve. I am going to guess that it would be quite a bit of work/hassle to get going even if it were possible. It just doesn't seem really worth it to have Rails models generated that you have to keep up to date as 
well when you are not even using Rails. 

Using Django
Django can automagically create your database schema from models. This is very nice as all that you are required to do is run django-admin.py syncdb and it will go through and create the db for you. This is all fine and nice, but the command syncdb 
doesn't do exactly what it says. It doesn't actually sync the db, but rather create something if it doesn't exist. You cannot make a model run syncdb, go and change that model around and then run syncdb again and it change everything that you want. 
It is good for initial development, but it does not generate migration scripts as Rails does. Even if it did it would still suffer from the same problem as Rails which is simply, not everyone uses Django.

Some Other Framework
I know there are solutions for .NET and probably other frameworks that I know little to nothing about. I am not even going to comment on them precisely because I know little or nothing about them.

Proposed Solution
An application that would introspect your database create a baseline schema for you. The user would of course have to provide it with what tables/columns contain static data, but it would take care of the rest. This application would also create migration scripts 
that could be versioned and would allow for easy migrations. This application would have the following requirements:

1. It would be language agnostic. It doesn't care what language your application is written in. It probably wouldn't need to look at your models(maybe a later version could for some reason), but would rather gain all knowledge about your database from your database.

2. It would be as database agnostic as possible. I would not expect that the initial versions would fit this requirement, but eventually it would not care if you were running MySQL, Postgres SQL, MS-SQL, or whatever. There is of course a limitation to this. For instance, it wouldn't be practical to say that it supports every single database in the entire universe. Database syntax can be widely different, so it would try to stick to some sort of ANSI-SQL standard as much as possible. I think the initial target should be MySQL, Postgres, and MS-SQL, as those seem to be the most popular.

3. It would not be an entire deployment framework. There are already a couple of good ones out there such as Capistrano and Fabric. This is simply for maintaining version control over databases.

I have no idea exactly how it would work. I expect that if development starts on it the approach would change very quickly anyways. How I imagine that it would work though would be like this.

1. You outline what data you want to be static. This could either be from interactive questions asked at the command line, or read in from a config file.


2. The application then looks at your database and makes a baseline schema by making a SQL schema and data dump of the tables that you have outline. It only does a schema dump for those that you have not outline.

3. It drops this file in a sql/ tree at the root of your project. Let's call it sql/schema.sql.

4. You develop your project. Schema and data change. You run application against the new version of the database and it diffs the changes that have been previously outlined. It then creates a migration script from this data and places it in sql/migrations/.

This seems to offer much variation to the programmer still. They could come up with a plan as to how best use this application.
Loading mentions Retweet
Filed under  //  database   proposal   version control  

Comments (0)