String builders, concatenation, performance, and design

I recently had a lengthy conversation with some colleagues that proved a great opportunity to talk about intentionality and the often subtle nature of premature optimization and cargo-cult coding practices.

It started with a discussion about an optimization in the JVM to eliminate the historical performance issues with inline String concatenation. It brought up some interesting questions of design, both historical and current.

If you remember back to the bad old days, you’ll recall that this pattern of String use was strongly discouraged:

String result = "Hello, " + name + "!";

Evaluated naively, this code allocates and immediately discards intermediate String instances: "Hello, " + name first produces a temporary "Hello, <name>", which is then copied again to make "Hello, <name>!". (The literals "Hello, " and "!" are compile-time constants, so they aren't reallocated on each call.) Every additional concatenation allocates another intermediate String and recopies everything built so far, so performance gets progressively worse, especially as the String grows large.
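To put a rough number on "progressively worse", here is the arithmetic under a simplified model (an assumption for illustration, not a description of any particular JVM): every + copies both of its operands into a fresh String, and we join k pieces of m characters each, left to right. The i-th concatenation copies i·m characters, so the total is

```latex
m\,(2 + 3 + \cdots + k) \;=\; m\left(\frac{k(k+1)}{2} - 1\right) \;=\; O(k^2 m)
```

Appending the same pieces to one growable buffer copies each piece once, plus amortized resizing and a final copy in toString(), which is O(km). For k = 10 pieces of m = 100 characters, the naive model copies 100 × 54 = 5,400 characters, against about 2,000 for the buffer (1,000 appended plus the 1,000-character final copy) plus whatever resizing costs.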

The solution to this performance problem was to use a mutable builder: StringBuffer before Java 1.5, and StringBuilder ever since. Now your code would be:

String result = new StringBuilder()
                    .append("Hello, ")
                    .append(name)
                    .append("!")
                    .toString();

So, now your code is nice and performant, but at what cost? A trivial one-liner is now either excessively long or a rather unwieldy 5-line monster. What was previously obvious is now hidden: the “interesting” behavior of customizing a hello message for a specific person is buried behind an object construction, a series of method calls, and a conversion back to String…those 4 characters naming the variable to interpolate are surrounded by a morass of syntax.

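The optimization in newer versions of Java makes a simple observation: it’s a trivial, mechanical transform to take the inefficient + form and convert it to the StringBuilder form. Indeed, if you look at the bytecode generated by javac (for example, with javap -c), you’ll see that this is exactly what it does through Java 8; since JDK 9, javac goes a step further and emits an invokedynamic call to StringConcatFactory, leaving the runtime to choose a concatenation strategy. Either way, there’s no need to use the messy StringBuilder version: great!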
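As a sketch of that mechanical transform, the two methods here build the same String. The class and method names (Greetings, greetConcat, greetBuilder) are hypothetical, and greetBuilder is only roughly what javac emitted before JDK 9; the exact bytecode varies by compiler and version.

```java
// Hypothetical names for illustration: greetConcat is what you write,
// greetBuilder approximates what pre-JDK 9 javac generated from it.
final class Greetings {

    // The readable form: a single concatenation expression.
    static String greetConcat(String name) {
        return "Hello, " + name + "!";
    }

    // Roughly the StringBuilder chain javac produced for the method above
    // (JDK 5 through 8). From JDK 9 on, javac instead emits an invokedynamic
    // call bootstrapped by java.lang.invoke.StringConcatFactory.
    static String greetBuilder(String name) {
        return new StringBuilder()
                .append("Hello, ")
                .append(name)
                .append("!")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(greetConcat("Ada"));                             // Hello, Ada!
        System.out.println(greetConcat("Ada").equals(greetBuilder("Ada"))); // true
    }
}
```

Since the compiler does this rewrite for you, the choice between the two forms in a case like this is purely about what reads better.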

So, how does this all play into the design of a system? Well, I would argue that even before Java made this optimization it was almost always a mistake to actually use the StringBuilder form for immediate String construction.

It is rare for immediate String construction to be an actual performance issue in a system. One or both of the following properties have to hold for it to matter:

The String being constructed must be large enough that the performance difference is measurable. Large probably means “well in excess of 1,000 bytes”, depending on how often it is constructed.
The String must be constructed many times. For a modern system, many means “multiple times per second”, not a few thousand times over the lifetime of an app.

In my experience, these properties hold far less often than StringBuilders get used: generating large Strings in tight loops does happen, but typically in well-constrained parts of the system.
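The case that does matter, building a large String in a loop, is easy to check for yourself. The sketch below is a crude timing harness, not a rigorous benchmark: JIT warm-up, dead-code elimination and GC noise all skew single runs, and a harness such as JMH is the right tool for numbers you intend to act on. The class and method names are hypothetical. It contrasts += inside a loop, where the compiler's rewrite can't help because every iteration still produces a new String and copies everything built so far, with one StringBuilder used for the whole loop.

```java
import java.util.function.IntFunction;

// Crude, hypothetical timing sketch; not a substitute for a proper benchmark.
final class LoopTiming {

    // += in a loop: each iteration allocates a new String and copies
    // everything accumulated so far into it.
    static String withConcat(int pieces) {
        String s = "";
        for (int i = 0; i < pieces; i++) {
            s += "item" + i + ",";
        }
        return s;
    }

    // One builder for the whole loop: each piece is appended once
    // (plus amortized buffer growth), then copied out by toString().
    static String withBuilder(int pieces) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pieces; i++) {
            sb.append("item").append(i).append(',');
        }
        return sb.toString();
    }

    // Times a single call and returns elapsed milliseconds.
    static long millis(IntFunction<String> build, int pieces) {
        long start = System.nanoTime();
        int length = build.apply(pieces).length(); // use the result
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (length < 0) throw new IllegalStateException(); // never happens
        return elapsed;
    }

    public static void main(String[] args) {
        for (int pieces : new int[] {1_000, 4_000, 16_000}) {
            System.out.printf("%,d pieces: += %d ms, StringBuilder %d ms%n",
                    pieces,
                    millis(LoopTiming::withConcat, pieces),
                    millis(LoopTiming::withBuilder, pieces));
        }
    }
}
```

Expect the += column to grow much faster than linearly as the piece count rises, while small piece counts may show no meaningful difference at all, which is the point of the two properties above.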

This doesn’t mean that StringBuilder isn’t a useful and commonly applicable API. Instead, it means that you should use it when you want to communicate a specific intent to the reader. StringBuilder implies mutability: its raison d’être is to allow progressive construction of a String. String implies immutability: it is one of the most thoroughly immutable types in Java, specified as immutable and declared final so that no subclass can introduce mutation.
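A small sketch of the intent each type signals. The class and method names are hypothetical; the behavior shown is standard: building a String with + leaves the original untouched, while StringBuilder.append changes the builder in place and returns that same instance.

```java
// Hypothetical names; illustrates immutable String vs. mutable StringBuilder.
final class Intent {

    // Returns a new String; the argument is left unchanged (String is immutable).
    static String withSuffix(String base, String suffix) {
        return base + suffix;
    }

    // Mutates the builder it is given and returns that same instance.
    static StringBuilder appendSuffix(StringBuilder builder, String suffix) {
        return builder.append(suffix);
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String extended = withSuffix(greeting, ", world");
        System.out.println(greeting);  // Hello  (unchanged)
        System.out.println(extended);  // Hello, world

        StringBuilder builder = new StringBuilder("Hello");
        StringBuilder returned = appendSuffix(builder, ", world");
        System.out.println(builder);              // Hello, world  (changed in place)
        System.out.println(returned == builder);  // true: the same object
    }
}
```

Note that appendSuffix is exactly the kind of out-parameter style the next paragraph warns about; it's shown here only to make the mutation visible.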

I will use a StringBuilder primarily to indicate that I intend to do something imperative in nature. The typical example would be looping through a list of some kind and appending entries to a string, or pulling together bits of data from here and there to build something bigger. By contrast, I will tend to use a String and concatenation where I’m doing something more functional and side-effect free: for example, when recursively constructing a String, or building something simple from immediately available values with no conditional juggling. On very rare occasions I will pass a StringBuilder around for mutation, though in general I treat that as a code smell to be refactored out later (the dangers of out-parameters deserve a post all their own).
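A minimal sketch of the two styles side by side, assuming hypothetical helpers (Styles, bulletList, nested) and Java 9+ for List.of: the imperative loop accumulates into one local StringBuilder, while the recursive function returns a fresh String built with + at each level and shares no mutable state.

```java
import java.util.List;

// Hypothetical examples of the imperative and functional styles.
final class Styles {

    // Imperative: walk a list and progressively append to one builder.
    static String bulletList(List<String> items) {
        StringBuilder out = new StringBuilder();
        for (String item : items) {
            out.append("* ").append(item).append('\n');
        }
        return out.toString();
    }

    // Functional: each call returns a new String built with +;
    // nothing is mutated and no builder is passed around.
    static String nested(int depth) {
        return depth == 0 ? "x" : "(" + nested(depth - 1) + ")";
    }

    public static void main(String[] args) {
        System.out.print(bulletList(List.of("eggs", "milk"))); // "* eggs" and "* milk" on separate lines
        System.out.println(nested(2));                          // ((x))
    }
}
```

The builder never escapes bulletList, so the mutation stays a local implementation detail rather than an out-parameter.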

So, my general rule is this: if you’re constructing an immediate, immutable value, you should always have favored, and should continue to favor, the + concatenation form. If you’re starting construction of a value that you intend to add to progressively, use a StringBuilder. Don’t base your decision on performance until you have a provable data point showing that performance matters: write for your reader, not your compiler. You’ll get better performance from an easy-to-understand, easy-to-change design than you’ll ever get from micro-optimizing with a cargo-culted pattern.

Code noise

Musings on software construction
