Brian Slesinsky's Weblog

Tuesday, 16 Mar 2004

You're Going To Need Persistence

As far as I can tell, the "You Aren't Gonna Need It" principle (abbreviated YAGNI) advocated by the Extreme Programming crowd isn't really a principle at all, but rather a catch-phrase to end an argument. It actually means, "we should do that later. Don't argue." I think the test-first folks have good reasons for implementing things in a certain order, but the real reasons are not necessarily the ones they use to justify it.

Take persistence for example. The claim is that you might not need a database, so don't bother implementing it until later. I haven't tried it myself, but it looks like what actually happens is that you implement the persistence layer three times: in-memory, using flat files, and as a database. Implementing three persistence layers is not really the "simplest thing that could possibly work" (another catch-phrase). But the result is a more flexible application with a better architecture.

The in-memory layer is implemented first because you need it to make unit tests run fast (as a "mock object"), and it's the easiest way to prototype. This is for the developer, not the customer, but it will quickly pay for itself for anyone who is serious about unit tests.
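As a rough sketch of what that first layer might look like (the `NoteStore` interface, `InMemoryNoteStore`, and `copy_note` are hypothetical names made up for illustration, not taken from any particular project), the application code depends only on a small interface, and a dict-backed implementation stands in for real storage during tests:

```python
from typing import Dict, Optional, Protocol


class NoteStore(Protocol):
    """Hypothetical persistence interface; the names are illustrative only."""

    def save(self, key: str, text: str) -> None: ...

    def load(self, key: str) -> Optional[str]: ...


class InMemoryNoteStore:
    """Keeps everything in a dict, so tests touch neither disk nor database."""

    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}

    def save(self, key: str, text: str) -> None:
        self._notes[key] = text

    def load(self, key: str) -> Optional[str]:
        return self._notes.get(key)


def copy_note(store: NoteStore, src_key: str, dst_key: str) -> bool:
    """Application logic under test; it sees only the interface."""
    text = store.load(src_key)
    if text is None:
        return False
    store.save(dst_key, text)
    return True
```

A unit test then builds an `InMemoryNoteStore`, calls `copy_note`, and checks the result, all without any setup or cleanup. The same interface is what a flat-file or database layer would later implement.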

Flat files come next because they are useful for small, single-user datasets, and in the early phases of a new software project, all datasets are small. Both developers and customers will build many datasets while trying out the new application. It's very convenient - everyone who has a computer already knows how to handle flat files, and they already have all the tools they need. The operating system stores them directly, they can be exchanged as email attachments, shared on the web, checked into source control, backed up using any number of methods, and so on. Furthermore, text files can be edited by hand and compared for differences, and there are even more tools for XML files.
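A flat-file layer can be very small. The sketch below is one possible shape, assuming a single JSON text file holding all records (JSON is used here only for brevity, where the same idea applies to XML; `FlatFileNoteStore` and its file layout are hypothetical). Writing sorted keys with indentation keeps the file easy to edit by hand and keeps diffs readable:

```python
import json
import os
from typing import Dict, Optional


class FlatFileNoteStore:
    """Stores all notes in one JSON text file (a hypothetical format)."""

    def __init__(self, path: str) -> None:
        self._path = path

    def _read_all(self) -> Dict[str, str]:
        # A missing file simply means an empty dataset.
        if not os.path.exists(self._path):
            return {}
        with open(self._path, "r", encoding="utf-8") as f:
            return json.load(f)

    def save(self, key: str, text: str) -> None:
        notes = self._read_all()
        notes[key] = text
        # Sorted keys and indentation give stable, diff-friendly output.
        with open(self._path, "w", encoding="utf-8") as f:
            json.dump(notes, f, indent=2, sort_keys=True)
            f.write("\n")

    def load(self, key: str) -> Optional[str]:
        return self._read_all().get(key)
```

Rereading and rewriting the whole file on every call is wasteful, but for the small, single-user datasets described above it is usually fast enough, and it is exactly the kind of limitation that a database removes later.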

The next step (depending on the application) is the database, which allows customers to build larger datasets and allows multiple users to modify the same dataset. Putting this last means you already have a lot of code written without a database in mind, which might need to be refactored. But you also have sample data, an easy-to-use import/export/backup format, some real-world experience, and a pretty good idea what the schema should be, before designing the first table. Plus, the import/export format is independent of any particular database technology, making it easier to redesign the schema or switch database vendors, especially in the early stage when the databases are still fairly small.
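To illustrate how the database layer might slot in behind the same kind of interface, here is a minimal sketch using SQLite from Python's standard library. The `SqliteNoteStore` class, the `notes` table, and the `import_notes` helper are all assumptions made for the example. The point is that data exported from an earlier layer as plain key/value pairs can be loaded into whatever schema is chosen, and dumped back out again if the schema or vendor changes:

```python
import sqlite3
from typing import Dict, Optional


class SqliteNoteStore:
    """Database-backed store; the table layout is a hypothetical example."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS notes ("
            "key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save(self, key: str, text: str) -> None:
        # The connection context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO notes (key, body) VALUES (?, ?)",
                (key, text),
            )

    def load(self, key: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT body FROM notes WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def export_all(self) -> Dict[str, str]:
        """Dump every record in a database-independent form."""
        rows = self._conn.execute("SELECT key, body FROM notes ORDER BY key")
        return {key: body for key, body in rows}


def import_notes(store: SqliteNoteStore, notes: Dict[str, str]) -> None:
    """Load records that were exported from another persistence layer."""
    for key, text in notes.items():
        store.save(key, text)
```

For example, `import_notes(SqliteNoteStore(sqlite3.connect(":memory:")), {"a": "one"})` fills a throwaway database from sample data, which is a convenient way to try out a schema before committing to it.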

A DBA can be pretty effective under these working conditions. Don't let the cheesy slogans put you off.