Thursday, 19 Oct 2006
The Hidden Cost of Server Parameters
A decision that comes up fairly often when writing server software is
whether to hard-code a value or make it configurable. It's easy to
conclude that more configuration is better, because when you hard-code
something, you'll have to recompile and deploy a new version of the
server when you want to change it. But resorting to configuration
parameters too frequently has a cost that adds up after a while.
For example, suppose we are writing a servlet that does a redirect
when the moon is full:
if (isFullMoon()) {
    response.sendRedirect("http://www.example.com/fullmoon.html");
}
The unit test for this code probably looks something like this:
public void testRedirectWhenMoonIsFull() {
    setUpFullMoon();
    sendRequest();
    checkWasRedirected("http://www.example.com/fullmoon.html");
}
So far so good. But the problem with redirects is that they work for
a while and then break. Maybe we're concerned that the server hosting
the destination page might go away someday. Wouldn't it be better to
have a --enableFullMoonRedirect flag? That way a
sysadmin can fix the problem without needing help from the developers.
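In the servlet, the guard might look something like this (just a
sketch; isFullMoonRedirectEnabled() is a hypothetical accessor for
whatever mechanism parses --enableFullMoonRedirect):

// Sketch: the redirect now depends on the flag as well as the moon.
// isFullMoonRedirectEnabled() is hypothetical; it stands in for
// whatever configuration mechanism parses --enableFullMoonRedirect.
if (isFullMoonRedirectEnabled() && isFullMoon()) {
    response.sendRedirect("http://www.example.com/fullmoon.html");
}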
So, we double the number of unit tests:
public void testRedirectWhenMoonIsFullAndEnabled() {
    setEnableFullMoonRedirect(true);
    setUpFullMoon();
    sendRequest();
    checkWasRedirected("http://www.example.com/fullmoon.html");
}

public void testDontRedirectWhenMoonIsFullAndNotEnabled() {
    setEnableFullMoonRedirect(false);
    setUpFullMoon();
    sendRequest();
    checkWasNotRedirected();
}
The reason we need two tests is to make sure that the flag actually
works. You might think this is too trivial to test, but I've seen
this bug in actual production code. Someone added a flag, someone
else refactored, and the flag stopped working. There was no test and
the flag was never actually used, so nobody noticed. As a result, the
flag had negative shareholder value, due to the confusion when we
thought it worked but it didn't.
Writing two tests isn't so bad when the flag is actually necessary.
But what if we have more flags? Every time we add another flag, the
number of possible configurations at least doubles. We can't test
every combination, but we should test a reasonable subset of them (all
off, all on, each one individually, and maybe a common configuration
or two).
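To make that concrete, here's a sketch of a subset test for three
flags. The second and third flags (setEnableEclipseBanner,
setEnableTideWarning) and the checkResponseMatches helper are made up
for the example, in the style of the tests above:

// Sketch: with three boolean flags there are eight configurations;
// this tests five of them (all off, all on, each one individually).
public void testFlagCombinationSubset() {
    boolean[][] subset = {
        {false, false, false},  // all off
        {true,  true,  true},   // all on
        {true,  false, false},  // each flag individually
        {false, true,  false},
        {false, false, true},
    };
    for (boolean[] config : subset) {
        setEnableFullMoonRedirect(config[0]);
        setEnableEclipseBanner(config[1]);  // hypothetical flag
        setEnableTideWarning(config[2]);    // hypothetical flag
        sendRequest();
        checkResponseMatches(config);       // hypothetical assertion
    }
}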
And oh yeah, what we really need to test is the production
configuration. For a redirect to happen correctly, the code, the
production configuration, and the destination server all need to
cooperate.
It might take some work, but we can test that too: copy the production
config file, make the minimal amount of changes to get it to work in a
development environment, run the server, and then see if it still does
the redirect.
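As a sketch, that test might be shaped like this. The Config type and
the config-handling helpers (copyOfProductionConfig,
overrideForDevEnvironment, startLocalServer) are hypothetical; the
point is the shape of the test, not the specific API:

public void testRedirectWithProductionConfig() throws Exception {
    // Hypothetical helpers: load a copy of the real production
    // config, then change just enough (ports, hostnames) to run it
    // in a development environment.
    Config config = copyOfProductionConfig();
    overrideForDevEnvironment(config);
    startLocalServer(config);

    setUpFullMoon();
    sendRequest();
    checkWasRedirected("http://www.example.com/fullmoon.html");
}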
But notice how much harder this is to test than if there were no flag
at all.
So, what are the alternatives?
I'd rather treat rare config changes like this as a "fire drill" for
the team's emergency response procedures. How long does it take the
team to diagnose, test, and deploy a one-line bugfix? How can we make
that happen faster?
Of course, we don't want to do fire drills too often, or we'd never
get any work done. To avoid that, the next step is to automate the
response. If redirecting when the destination server is down would be
a disaster, we can write code that polls the destination occasionally
and stops redirecting if the poll fails. You don't have to poll very
often to respond faster than a human would.
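Here's a sketch of what that automation might look like. The class
name and the one-minute interval are arbitrary choices, and the
servlet would check isRedirectEnabled() before calling sendRedirect():

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: poll the destination once a minute with a HEAD request, and
// turn the redirect off whenever the destination stops answering.
class RedirectHealthCheck {
    private final AtomicBoolean redirectEnabled = new AtomicBoolean(true);

    void start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                poll();
            }
        }, 0, 60, TimeUnit.SECONDS);
    }

    private void poll() {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(
                    "http://www.example.com/fullmoon.html").openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            redirectEnabled.set(conn.getResponseCode() == 200);
        } catch (Exception e) {
            redirectEnabled.set(false); // destination unreachable
        }
    }

    // The servlet checks this before calling sendRedirect().
    boolean isRedirectEnabled() {
        return redirectEnabled.get();
    }
}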