Thursday, 05 Oct 2006
Wrap Strings in Classes to Increase Security
Forgetting to validate user input is an increasingly common security
hole in web applications. Here's an elegant way to fix a hole and
make your code more understandable at the same time.
Code that reads HTTP request parameters often looks something like this:
String productId = request.getParameter("id");
doSomething(productId);
The problem here is that we haven't validated the id
parameter, and this string came from the network, so it could contain
just about anything. The security holes caused by this can be
somewhat surprising. For example, suppose we go on to do a redirect:
response.sendRedirect("https://www.example.com/page2?id=" + productId);
See the bug yet?
Suppose id starts with a newline, such as "\nsomething". The
full HTTP response will look something like this:
HTTP/1.x 302 Found
Location: http://www.example.com/page2?id=
something
The attacker can put anything they like into the HTTP response, or
even add multiple HTTP responses. (This is known as HTTP
response splitting.)
It's relatively easy to spot the problem in a simple example like this
because there's so little code. The solution is also easy to state:
always validate any string that comes from the network.
However, there are often many method calls between the code that reads
the request and the code that prints the response. Maybe we've
crossed several layers as well. As the months go by and the code gets
changed by multiple developers, it's easy to forget which methods have
responsibility for doing input checking and drop the ball. It's very
easy to assume that a variable named "productId" actually contains
something that at least looks like a productId, but that only works if
every request handler did its job.
Software developers have come up with many ways to fix this problem.
Some of them are pretty cumbersome, like writing warnings everywhere
saying who has responsibility for what, or using Hungarian notation.
Some languages (such as Perl and Ruby)
use taint-checking
to report a runtime error when the program attempts to use untrusted
input.
In Java, the approach I prefer is to introduce a new class:
public class ProductId {
private static final Pattern regexp = ...
private final String id;
ProductId(String allegedId) {
if (!regexp.matcher(allegedId).matches()) {
throw new IllegalArgumentException("invalid id: "+allegedId);
}
this.id = allegedId;
}
public String toString() {
return id;
}
...
}
At first glance this class doesn't seem to do very much. Wouldn't a
checkProductId() method be sufficient?
The beauty of introducing a class is that it gives verified product
id's a type. Any API that doesn't verify input itself can take
arguments of type ProductId and be confident that the caller
will do the checking, because the only way to get a ProductId
type is to call the constructor:
ProductId productId = new ProductId(request.getParameter("id"));
doSomething(productId);
In an application written this way, the convention is that bare
Strings are unverified and domain-specific types are
verified (at least so far as syntax is concerned). This
convention is enforced by the type-checker. When J. Random Coder
comes along and writes yet another HTTP request handler, he or she
will find that every relevant API takes a ProductId, and will probably
complain about the silly programmers who wrote this API and went
totally overboard with their do-nothing classes when a simple String
would work fine, not even realizing that calling this silly
constructor adds a security check.
Comments, code review, or unit testing can't give you the same
guarantee because they all assume diligence on the part of the
programmer. (Not that there's anything wrong with diligence, but
there is a limited supply that we'd rather spend in other ways.)
Taint-checking would do it (assuming reasonable test coverage) but we
don't have that in Java. What we do have is a pretty good type
system, so we might as well get it to do some work for us.
As an added bonus, when you introduce specific types for things, your
code gets easier to read. For one thing, naming variables or methods
"productId" becomes silly, because then you have:
// Do you think this code uses a product id?
ProductId productId = foo.getProductId();
Now that we have a type for it, we don't have to keep repeating to
ourselves that this String is a product id, and so is this String, and
this other method returns a String that is also a product id, and the
third String argument in this method call is a product id, and so on
and so forth. Those sort of naming schemes are for languages with
weak type systems, not Java. We can use names for higher-level
concepts:
ProductId itemToDisplay = foo.getSelection();
And if that's not enough reason to do it, your favorite IDE will
become smarter. When it does method completion on a product id, no
longer will it give you irrelevant String methods like toLowerCase()
or replaceFirst(), things that nobody should want to do with a product
id anyway. (Or if they really want to, they can call toString()
first.) Now it will give you intelligent suggestions like getShard()
or isISBN(), because ProductId becomes the logical place to put these
methods instead of hiding them in some utility class somewhere. And
when you get usages on a product id, the IDE will no longer show you
every place a String is used (which is everywhere, which is a useless
search result). Instead you'll see code that actually has something
to do with product id's.
respond | link |
/code
|