Brian Slesinsky's Weblog

Thursday, 05 Oct 2006

Wrap Strings in Classes to Increase Security

Forgetting to validate user input is an increasingly common security hole in web applications. Here's an elegant way to fix a hole and make your code more understandable at the same time.

Code that reads HTTP request parameters often looks something like this:

  String productId = request.getParameter("id");

The problem here is that we haven't validated the id parameter, and this string came from the network, so it could contain just about anything. The security holes caused by this can be somewhat surprising. For example, suppose we go on to do a redirect:

  response.sendRedirect("" + productId);

See the bug yet?

Suppose id starts with a newline, such as "\nsomething". The full HTTP response will look something like this:

HTTP/1.x 302 Found

The attacker can put anything they like into the HTTP response, or even add multiple HTTP responses. (This is known as HTTP response splitting.)

It's relatively easy to spot the problem in a simple example like this because there's so little code. The solution is also easy to state: always validate any string that comes from the network.

However, there are often many method calls between the code that reads the request and the code that prints the response. Maybe we've crossed several layers as well. As the months go by and the code gets changed by multiple developers, it's easy to forget which methods have responsibility for doing input checking and drop the ball. It's very easy to assume that a variable named "productId" actually contains something that at least looks like a productId, but that only works if every request handler did its job.

Software developers have come up with many ways to fix this problem. Some of them are pretty cumbersome, like writing warnings everywhere saying who has responsibility for what, or using Hungarian notation. Some languages (such as Perl and Ruby) use taint-checking to report a runtime error when the program attempts to use untrusted input.

In Java, the approach I prefer is to introduce a new class:

public class ProductId {
  private static final Pattern regexp = ...
  private final String id;
  ProductId(String allegedId) {
    if (!regexp.matcher(allegedId).matches()) {
      throw new IllegalArgumentException("invalid id: "+allegedId);
    } = allegedId;
  public String toString() {
    return id;

At first glance this class doesn't seem to do very much. Wouldn't a checkProductId() method be sufficient?

The beauty of introducing a class is that it gives verified product id's a type. Any API that doesn't verify input itself can take arguments of type ProductId and be confident that the caller will do the checking, because the only way to get a ProductId type is to call the constructor:

  ProductId productId = new ProductId(request.getParameter("id"));

In an application written this way, the convention is that bare Strings are unverified and domain-specific types are verified (at least so far as syntax is concerned). This convention is enforced by the type-checker. When J. Random Coder comes along and writes yet another HTTP request handler, he or she will find that every relevant API takes a ProductId, and will probably complain about the silly programmers who wrote this API and went totally overboard with their do-nothing classes when a simple String would work fine, not even realizing that calling this silly constructor adds a security check.

Comments, code review, or unit testing can't give you the same guarantee because they all assume diligence on the part of the programmer. (Not that there's anything wrong with diligence, but there is a limited supply that we'd rather spend in other ways.) Taint-checking would do it (assuming reasonable test coverage) but we don't have that in Java. What we do have is a pretty good type system, so we might as well get it to do some work for us.

As an added bonus, when you introduce specific types for things, your code gets easier to read. For one thing, naming variables or methods "productId" becomes silly, because then you have:

  // Do you think this code uses a product id?
  ProductId productId = foo.getProductId();

Now that we have a type for it, we don't have to keep repeating to ourselves that this String is a product id, and so is this String, and this other method returns a String that is also a product id, and the third String argument in this method call is a product id, and so on and so forth. Those sort of naming schemes are for languages with weak type systems, not Java. We can use names for higher-level concepts:

  ProductId itemToDisplay = foo.getSelection();

And if that's not enough reason to do it, your favorite IDE will become smarter. When it does method completion on a product id, no longer will it give you irrelevant String methods like toLowerCase() or replaceFirst(), things that nobody should want to do with a product id anyway. (Or if they really want to, they can call toString() first.) Now it will give you intelligent suggestions like getShard() or isISBN(), because ProductId becomes the logical place to put these methods instead of hiding them in some utility class somewhere. And when you get usages on a product id, the IDE will no longer show you every place a String is used (which is everywhere, which is a useless search result). Instead you'll see code that actually has something to do with product id's.