I recently had a lengthy conversation with some colleagues that proved a great opportunity to talk about intentionality and the often subtle nature of premature optimization and cargo-cult coding practices.
It started with a discussion about an optimization in the JVM to eliminate the historical performance issues with inline String concatenation. It brought up some interesting questions of design, both historical and current.
If you remember back to the bad old days, you’ll recall that this pattern of String use was strongly discouraged:
```java
String greeting = "Hello, " + name + "!";
```
This code has the problem of allocating and immediately discarding multiple String instances: one for "Hello, ", one for "Hello, <name>", one for "!", and one for "Hello, <name>!".
The more concatenations you add, the more intermediate Strings you allocate, and the worse performance gets (especially as the String grows large).
The solution to this performance problem is to use a mutable builder: StringBuffer back before Java 1.5, and StringBuilder ever since. Now your code would be:
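```java
StringBuilder builder = new StringBuilder();
builder.append("Hello, ");
builder.append(name);
builder.append("!");
String greeting = builder.toString();
```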
So now your code is nice and performant, but at what cost? A trivial one-liner is now either excessively long or a rather unwieldy 5-line monster. What was previously obvious is now hidden: the “interesting” behavior of customizing a hello message for a specific person is buried behind an object construction, a series of method calls, and a conversion back to String. Those 4 characters naming the variable to interpolate are surrounded by a morass of syntax.
The optimization in newer versions of the JVM makes a simple observation: it’s a trivial, mechanical transform to take the inefficient + form and convert it to the StringBuilder form. Indeed, if you look at the bytecode generated by a recent version of javac, you’ll see that this is exactly what it does. So there’s no need to use the messy StringBuilder version: great!
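To make the “mechanical transform” concrete, here is a minimal sketch placing the + form next to the StringBuilder chain it is roughly equivalent to. The class and method names are hypothetical, chosen only for illustration. javac releases before JDK 9 emitted essentially this builder chain; from JDK 9 on, javac instead emits an invokedynamic call bootstrapped by java.lang.invoke.StringConcatFactory (JEP 280), which produces the same String.

```java
// Sketch: the "+" form and the StringBuilder chain it is roughly
// equivalent to. Names are hypothetical, for illustration only.
class ConcatDesugar {

    // What you write: a single, readable expression.
    static String greetWithPlus(String name) {
        return "Hello, " + name + "!";
    }

    // Roughly what pre-JDK 9 javac generated for the expression above:
    // one builder, a chain of appends, and a final toString().
    static String greetWithBuilder(String name) {
        return new StringBuilder()
                .append("Hello, ")
                .append(name)
                .append("!")
                .toString();
    }

    public static void main(String[] args) {
        String a = greetWithPlus("Ada");
        String b = greetWithBuilder("Ada");
        System.out.println(a);           // Hello, Ada!
        System.out.println(a.equals(b)); // true
    }
}
```

Both methods also agree on a null argument: each renders it as the text "null".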
So, how does this all play into the design of a system? Well, I would argue that even before Java made this optimization, it was almost always a mistake to use the StringBuilder form for immediate String construction.
It is rare for immediate String construction to be an actual performance issue in a system. One or more of the following properties have to hold for it to matter:
- The String being constructed must be large enough that the performance variance is measurable. Large probably means “well in excess of 1,000 bytes”, depending on frequency of construction.
- The String being constructed must be constructed many times. For a modern system, “many” means “multiple times per second”, not a few thousand times over the lifetime of an app.
In my experience these properties hold far less often than StringBuilders are used: generating large Strings in tight loops does happen, but typically in well-constrained parts of the system.
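The tight-loop case is exactly where the compiler’s rewrite does not help, because it works one expression at a time: each += inside a loop still produces a fresh String that copies everything accumulated so far. A minimal sketch, with hypothetical names, contrasting that with a single builder reused across the whole loop:

```java
import java.util.List;

// Sketch: building one String from many pieces in a loop.
// Names are hypothetical, for illustration only.
class LoopConcat {

    // Each "+=" creates a new String and copies all previous characters,
    // so total copying grows roughly quadratically with the result length.
    static String joinWithPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One builder for the whole loop: appends go into a single buffer that
    // grows as needed, so total copying stays roughly linear.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder builder = new StringBuilder();
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(joinWithPlus(parts));    // abc
        System.out.println(joinWithBuilder(parts)); // abc
    }
}
```

With three short parts the difference is irrelevant; it only becomes measurable when the loop runs often and the result grows large, which is the point of the two conditions above.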
This doesn’t mean that StringBuilder isn’t a useful and commonly applicable API. Instead, it means that you should use it when you want to communicate a specific intent to the reader. StringBuilder implies mutability: its raison d’être is to allow for progressive construction of a String. String implies immutability: it is one of the most immutable structures possible in Java, being both declared immutable at the spec level and made final to prevent mutation through subclassing.
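The difference in intent shows up directly in the APIs. In this small sketch (hypothetical names), String’s concat hands back a new value and leaves its input alone, while StringBuilder’s append changes the builder itself and returns that same object:

```java
// Sketch: String operations return new values; StringBuilder mutates in place.
// Names are hypothetical, for illustration only.
class MutabilityDemo {

    // Returns a new String; the base String is never modified.
    static String withName(String base, String name) {
        return base.concat(", ").concat(name);
    }

    // Appends into the given builder and returns that same builder,
    // so the caller's object is changed.
    static StringBuilder appendName(StringBuilder builder, String name) {
        return builder.append(", ").append(name);
    }

    public static void main(String[] args) {
        String base = "Hello";
        String greeting = withName(base, "Ada");
        System.out.println(base);     // Hello (unchanged)
        System.out.println(greeting); // Hello, Ada

        StringBuilder builder = new StringBuilder("Hello");
        StringBuilder result = appendName(builder, "Ada");
        System.out.println(builder);           // Hello, Ada (mutated)
        System.out.println(result == builder); // true
    }
}
```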
I will use a StringBuilder primarily to indicate that I intend to do something imperative in nature. The typical example would be looping through a list of some kind and appending entries to a string, or pulling together bits of data from here and there to build something bigger. By contrast, I will tend to use a String and concatenation where I’m doing something more functional and side-effect free: for example, when recursively constructing a String, or building something simple from immediately available values with no conditional juggling. On very rare occasions I will pass a StringBuilder around for mutation, though in general I treat that as a code smell to be refactored out later (the dangers of out-parameters deserve a post all their own).
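A minimal sketch of the two styles just described, with hypothetical method names: an imperative loop that progressively builds a bullet list with a StringBuilder, and a recursive, side-effect-free function that builds its result with plain +.

```java
import java.util.List;

// Sketch of the imperative and functional styles. Names are hypothetical.
class IntentExamples {

    // Imperative: walk a list and progressively build a String.
    // The StringBuilder signals "this value is under construction".
    static String bulletList(List<String> items) {
        StringBuilder builder = new StringBuilder();
        for (String item : items) {
            builder.append("- ").append(item).append('\n');
        }
        return builder.toString();
    }

    // Functional: each call returns a new immutable String built with "+".
    // Nothing is mutated, so plain concatenation reads naturally.
    // Assumes depth >= 0.
    static String nested(int depth) {
        if (depth == 0) {
            return "x";
        }
        return "(" + nested(depth - 1) + ")";
    }

    public static void main(String[] args) {
        System.out.print(bulletList(List.of("eggs", "milk")));
        System.out.println(nested(2)); // ((x))
    }
}
```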
So, my general rule is this: if you’re constructing an immediate, immutable value, you should always have favored, and should continue to favor, the + concatenation form. If you’re starting construction of a value that you intend to add to progressively, use the StringBuilder. Your decision should not be based on performance until you have a provable data point showing that performance matters: write for your reader, not your compiler. You’ll get better performance from a design that is easy to understand and change than you’ll ever get from micro-optimizing with a cargo-culted pattern.