The Code Handyman
If it ain't broke, well, you might try refactoring to make sure your systems remain that way.
- By Mike Gunderloy
- July 01, 2002
One of the great joys of software development is that you get to build
things out of thin air and imagination (with a little help from a compiler).
You start with a white board, design some elaborate data structures and
then code away, watching your masterwork form.
But much as we'd all like to spend our days writing exciting new code,
the reality is a bit different. Most coding is maintenance programming.
One brilliant programmer puts together the core of the system and then
500 of us spend the rest of our careers maintaining it. But don't worry,
there's still room for a lot of craft in the maintenance programming world.
One interesting development in recent years is the rise of refactoring
as an accepted part of that craft.
It Doesn't Rust, But…
Joel Spolsky, a developer and the author of the Joel on Software
Weblog, has pointed out the general insanity involved in throwing away
working code and starting from scratch when you need to change something.
He argues that this is the main reason why Microsoft managed to eat Netscape's
lunch in the browser wars—Netscape wasted years on replacing their
Navigator code with a whole new version. (See "Netscape Goes Bonkers,"
at http://www.joelonsoftware.com/articles/fog0000000027.html.)
Yeah, Mozilla is shaping up to be a nice product in some ways, but there
aren't a whole lot of people left who care any more. Joel points out that
source code doesn't rust.
But although source code doesn't rust (in the sense that code which works
today is still going to work tomorrow), requirements do. We could debate
for a while whether most requirement changes are the result of business
changes or just the equivalent of bigger tail fins on this year's cars
(is the Windows XP look-and-feel technology or marketing?), but the fact
is that developers are constantly called upon to fix, tweak, and otherwise
change working code.
It's precisely because code doesn't rust that these changes take up so
much of our development energies. Having paid us tons of money to develop
software in the first place, the people who own it understandably would
like to maximize their investments. So they ask not for a rewrite but
for more characters in the country name field (who knew they were going
to start doing business in Bashkortostan?) or a different color on the
Web page (because the new CEO doesn't like pastels).
Surely there must be a way to respond to these demands and still keep
from going mad with boredom.
Enter Refactoring
That's where refactoring comes in. Martin Fowler, who wrote the
standard reference on the subject (titled simply Refactoring), defines
the term this way:
Refactoring is the process of changing a software system in such
a way that it does not alter the external behavior of the code yet improves
its internal structure. It is a disciplined way to clean up code that
minimizes the chances of introducing bugs. In essence when you refactor
you are improving the design of the code after it has been written.
If that sounds intriguing, you really ought to read Fowler's book (or
check out his Web site at http://www.refactoring.com
for an introduction and some links). The book explains and justifies refactoring
in some depth and then presents a catalog of refactorings. Historically,
refactoring originated with Smalltalk. These days the concepts are most
advanced in Java, and Fowler uses Java for his examples. That can make
the book a bit hard to read in spots if you don't know the language, but
the principles apply to any object-oriented language.
To give you some flavor for the book, here's one of the simplest of refactorings:
If you find a public field, make it private and provide accessors. In
Java, that's a change from:
public String _name;
to
private String _name;
public String getName() {return _name;}
public void setName(String arg) {_name = arg;}
The real benefit to the catalog of refactorings is not in identifying
them but in recording knowledge about them. Fowler justifies each one
as it's introduced (in this case, discussing the benefits of private data
to modularity) and then provides a "mechanics" section that discusses
how to safely make the change. In the case of taking a field private,
this includes finding and replacing all uses of the public field, then
making the field private—and doing a full compile-and-test cycle
after each change (which points out, by the way, one of the ways in which
refactoring fits naturally into an Extreme
Programming environment). For all but the simplest refactorings he
also provides fully worked examples.
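To make the "find and replace all uses" step concrete, here's a minimal sketch of what a caller looks like once the field has gone private. The Person class, the greet method, and the sample name are made up for illustration; they don't come from Fowler's book.

```java
// Hypothetical class after the refactoring: the field is private and
// all access goes through the accessors.
class Person {
    private String _name;

    public String getName() { return _name; }
    public void setName(String arg) { _name = arg; }
}

// Hypothetical caller that used to touch person._name directly.
class EncapsulateFieldDemo {
    static String greet(Person person) {
        // before: return "Hello, " + person._name;
        return "Hello, " + person.getName();
    }

    public static void main(String[] args) {
        Person person = new Person();
        // before: person._name = "Ada";
        person.setName("Ada");
        System.out.println(greet(person));
    }
}
```

Each commented "before" line marks one use that had to change. Compile and test after each replacement, and the compiler will flag any use you missed the moment the field becomes private.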
Explorers Need Not Apply
One of the key points of refactoring is that the changes you make
should be provably correct. That's where the part about not
changing the external behavior of the code comes in. Remember, you're
starting with code that works. You want to end up with code that works,
regardless of the changes that you're making. This means that it's OK
to make very structured changes, one at a time. Things like extracting
a superclass from the common features of two similar classes don't change
the interface of the existing classes at all; they just rearrange the
internal implementation.
Refactoring isn't exploration. Though there is a place in software development
for experiments along the lines of "what would happen if we ripped out
this section here and replaced it with new objects written in C# and then
changed the interfaces to match?", that place is not in the refactoring
process. (In fact, the place for such radical changes is in a new branch
of your source code control system, one that you can throw away if you
realize that you've reached the point where the map says "Here There Be
Tygers.") Refactoring is more like maintaining a garden: you take what's
already there and make it neater, while still preserving the original
design.
Key to this process is testing. You absolutely need a good set of unit
tests that exercises any code that you intend to refactor. Even if you
can prove to your own satisfaction that the refactoring changes don't
change the behavior of the code, run the tests! Otherwise, the time will
come when you're awakened at three in the morning because you neglected
to think about some special case or other feature that you refactored
out of existence.
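As a rough sketch of the kind of test that catches a forgotten special case, here's a tiny hand-rolled check in plain Java. The MailingLabel class and its rule for a missing country are invented for this example, and a real project would more likely use a unit testing framework than a check method like this one.

```java
// Hypothetical class you are about to refactor.
class MailingLabel {
    private final String name;
    private final String country;

    MailingLabel(String name, String country) {
        this.name = name;
        this.country = country;
    }

    // Special case: a missing country is simply left off the label.
    String format() {
        if (country == null || country.isEmpty()) {
            return name;
        }
        return name + ", " + country;
    }
}

// Tests that pin down the current behavior, special cases included.
class MailingLabelTest {
    static void check(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError(
                "expected <" + expected + "> but got <" + actual + ">");
        }
    }

    static void runAll() {
        check("Acme, Bashkortostan",
              new MailingLabel("Acme", "Bashkortostan").format());
        check("Acme", new MailingLabel("Acme", null).format());
        check("Acme", new MailingLabel("Acme", "").format());
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all tests passed");
    }
}
```

Run it before the refactoring and again after every step. If the null or empty country handling gets refactored out of existence, the second or third check fails right away instead of at three in the morning.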
Bait and Switch?
But wait a minute! I started out by talking about maintenance programming
in response to changing requirements, and now I'm talking about refactoring
that doesn't change anything. Why do all this work if it's not going to
change anything? The answer is that refactoring doesn't change anything
externally, but it certainly changes things internally. Take the example
of deriving a superclass from two existing classes. Perhaps you have Customers
and Suppliers who share many pieces of information, such as address, phone
number, and fax number. So in refactoring you might create an Entity class
containing the common fields, and derive the two existing classes from
Entity. That's an internal change that makes the code significantly cleaner.
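Here's one way that refactoring might look in Java. It's a sketch, not a prescription: the creditLimit and paymentTerms members are hypothetical stand-ins for whatever is unique to each class, and the contactLine helper exists only to show the payoff.

```java
// Common fields pulled up out of Customer and Supplier.
class Entity {
    private String address;
    private String phone;
    private String fax;

    public String getAddress() { return address; }
    public void setAddress(String arg) { address = arg; }
    public String getPhone() { return phone; }
    public void setPhone(String arg) { phone = arg; }
    public String getFax() { return fax; }
    public void setFax(String arg) { fax = arg; }
}

class Customer extends Entity {
    private double creditLimit; // hypothetical customer-only field

    public double getCreditLimit() { return creditLimit; }
    public void setCreditLimit(double arg) { creditLimit = arg; }
}

class Supplier extends Entity {
    private String paymentTerms; // hypothetical supplier-only field

    public String getPaymentTerms() { return paymentTerms; }
    public void setPaymentTerms(String arg) { paymentTerms = arg; }
}

class ExtractSuperclassDemo {
    // Code that needs only contact details can accept either type.
    static String contactLine(Entity entity) {
        return entity.getAddress() + " / " + entity.getPhone();
    }

    public static void main(String[] args) {
        Customer customer = new Customer();
        customer.setAddress("1 Main St");
        customer.setPhone("555-0100");
        System.out.println(contactLine(customer));
    }
}
```

Existing callers of Customer and Supplier still see getAddress, getPhone, and getFax exactly where they were, so the external behavior is unchanged. Only the location of the implementation has moved.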
Of course, clean code doesn't excite the business folks the way it does
the development folks. But think about the effect of this refactoring
on future requirements changes: The next time the business folks come
to you with a new piece of common information ("oh, can we add a Web site
address to Customers? And to Suppliers, while we're at it?"), your job
is much easier as a result of the refactoring. Now what looks like two
changes to your customers is one to your code, thanks to clever refactoring.
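A minimal sketch of that single change, assuming an Entity superclass like the one described above (trimmed here to just the new field; the webSite name and the sample URLs are made up):

```java
// The one place that changes: a new field and its accessors.
class Entity {
    private String webSite;

    public String getWebSite() { return webSite; }
    public void setWebSite(String arg) { webSite = arg; }
}

// Neither subclass is edited; both inherit the new accessors.
class Customer extends Entity { }

class Supplier extends Entity { }

class WebSiteDemo {
    public static void main(String[] args) {
        Customer customer = new Customer();
        customer.setWebSite("http://customer.example");

        Supplier supplier = new Supplier();
        supplier.setWebSite("http://supplier.example");

        System.out.println(customer.getWebSite());
        System.out.println(supplier.getWebSite());
    }
}
```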
And that's where the benefits of this approach come in. If you're
maintaining code, think about refactoring it at the same time. As you
find ways to improve the internal structure so that it's more maintainable,
take the time to make the improvements. If the code is in bad shape (perhaps
your predecessor was not as brilliant as you are), you might need to keep
a list of refactorings that you'd like to make. Then you can pull things
off the list as you find time to work on them. Each refactoring is a tiny
investment in future maintainability for your code. Just like making tiny
investments in a money market fund, these can really add up over time.
State of the Art
You might have already made this leap, but just in case: refactoring
is an obvious candidate for tool support. After all, if there's a provably correct
transformation (like turning public fields into private ones with accessors)
and a recipe for executing the transformation, why can't the whole process
be automated? The answer is that it can and that there are many tools
(almost all in the Java or Smalltalk arena) that can perform specified
refactorings on your code.
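To get a feel for the first step such a tool has to take, here's a small sketch that uses Java's reflection API to list the public instance fields of a class, which are the candidates for the public-to-private refactoring shown earlier. The LegacyAccount class and the PublicFieldFinder name are invented for the example. A real refactoring tool works on source code and rewrites it, which this sketch doesn't attempt.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class with one field that ought to be encapsulated.
class LegacyAccount {
    public String _name;              // candidate: public instance field
    private int _balance;             // already private
    public static final int MAX = 10; // constant, left alone
}

class PublicFieldFinder {
    // Returns the names of public, non-static fields declared on the type.
    static List<String> findCandidates(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isPublic(mods) && !Modifier.isStatic(mods)) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(findCandidates(LegacyAccount.class)); // prints [_name]
    }
}
```

Finding the candidates is the easy half. The hard half, rewriting every use of the field without changing behavior, is where the recipe from the catalog comes in.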
But why should the Java developers have all the fun? Visual Studio .NET
has all of the elements that it could possibly need to support a good
refactoring tool: complete code introspection and creation via System.Reflection,
an add-in model that lets third parties integrate their work, and object-oriented
languages to edit. So where are the .NET refactoring tools? Are you going
to write the first one? If so, write and tell me about it!