Veering Off the Well-Worn Path
It's time to reconsider how code generators can save you from the doldrums of coding.
If a soiled shirt is placed in the opening of a vessel containing grains
of wheat, the reaction of the leaven in the shirt with fumes from the wheat
will, after approximately twenty-one days, transform the wheat into mice.
- By Mike Gunderloy
- November 01, 2003
Oh, would that software were as easy to produce as mice! Well, sometimes it
can be. At least, that's a rough summary of Jack Herrington's new book, Code
Generation in Action (Manning, 2003). Herrington writes about code generators: custom
programs that build source code, based on some sort of input file. Although
some developers seem to have a "real men don't use code generators"
(so they use lots of cut and paste?) attitude, it's very clear that code generators
can be essential when you're trying to get a large project done on time and
Considering the usefulness of code generators, it's somewhat surprising that
more developers don't depend on them. Do we actually like writing the same series
of statements to initiate a database connection, read the results of a stored
procedure, and return them as an object property five hundred times? Are we too
busy writing repetitive code to learn new tricks? Are we all secretly afraid
of making it clear how easy our jobs are? I don't know the answer, but I do
know that this book can provoke thought and perhaps change your working habits.
And that's a good thing.
Getting Active With It
Herrington starts with a case study and then identifies two basic kinds of code
generators: active and passive. A passive code generator dumps code into your
project and forgets about it. Lots of wizards and other IDE tools work this
way. By contrast, an active code generator (and all of the code generators in
the book fall into this class) takes responsibility for maintaining the code.
When you want to change the classes from an active code generator, you tweak
the input file or the code generator itself; you never directly edit the output.
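To make the active style concrete, here is a minimal Ruby sketch. It is not taken from the book; the definition hash, the Customer class, and the header text are all invented for illustration. The idea is only that the definition is the thing you edit, and the output file is rewritten wholesale every time.

```ruby
# Hypothetical definition: in an active workflow this input is the only
# thing a developer edits by hand.
DEFINITION = { class_name: "Customer", fields: %w[id name email] }.freeze

# Warning placed at the top of every generated file (wording is illustrative).
HEADER = "# GENERATED FILE. Do not edit; change the definition and regenerate."

# Build the full source text for one class from its definition.
def generate_class(definition)
  fields = definition[:fields].map { |f| ":#{f}" }.join(", ")
  [
    HEADER,
    "class #{definition[:class_name]}",
    "  attr_accessor #{fields}",
    "end",
    ""
  ].join("\n")
end

# Regenerating rewrites the whole file, so any hand edit to the output
# is lost. The definition, not the output, is the real source.
def regenerate(definition, path)
  File.write(path, generate_class(definition))
end
```

Adding a field means editing DEFINITION and calling regenerate again. A passive generator, by contrast, would stop after the first write and leave the file for you to maintain.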
He goes on to identify six basic types of active code generation:
- The code munger takes an input file, parses it, and creates an output file
from some built-in or external template.
- The inline-code expander takes source code with some special markup and
creates production code from it. Embedded-SQL generators, which allow you
to drop SQL statements into C or Java code (for example), work this way.
- The mixed-code generator is similar to the inline-code expander, except
that the results are written right back to the input file. For example, special
comments might specify delegate code that needs to be created and added to
- The partial-class generator reads some sort of abstract definition file
and builds a base class source code file to implement the definition. The
user can then create a derived class to get the final desired functionality.
- The tier generator builds an entire tier (typically, a data access tier)
from an abstract definition. UML products that integrate with your IDE can
fall into this class.
- The full-domain language is a Turing-complete programming language created
just for your problem. It gives you a general-purpose way to specify code
that should be created.
The book contains examples of the first five of these, and some limited discussion
of what a full-domain language can look like.
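The simplest of these, the code munger, fits in a few lines. The sketch below is my own illustration rather than one of Herrington's examples: it assumes a made-up "// @doc" comment convention in a C file, pulls the tagged comments out with a regular expression, and renders them through a built-in ERB template (ERB ships with Ruby's standard library).

```ruby
require "erb"

# Hypothetical input: C source whose functions are tagged with "// @doc" comments.
C_SOURCE = <<~C
  // @doc add: Returns the sum of two integers.
  int add(int a, int b) { return a + b; }
  // @doc negate: Returns the additive inverse.
  int negate(int a) { return -a; }
C

# Built-in template; an external template file would work the same way.
TEMPLATE = ERB.new(<<~TXT, trim_mode: "-")
  Function reference
  <%- entries.each do |name, summary| -%>
  * <%= name %>: <%= summary %>
  <%- end -%>
TXT

# Parse step: pull [name, summary] pairs out of the tagged comments.
def parse_doc_comments(source)
  source.scan(%r{^\s*// @doc (\w+): (.+)$})
end

# Output step: render the parsed entries through the template.
def munge(source)
  entries = parse_doc_comments(source)
  TEMPLATE.result_with_hash(entries: entries)
end

puts munge(C_SOURCE)
```

The other generator types mostly vary what is read and where the result is written. The parse-then-template shape stays much the same.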
Ruby? What's a Ruby?
Nope, not the precious stone, the programming language. You'll find Ruby at
http://www.ruby-lang.org. You'll also
find it at the heart of this book. All of the examples (and there are many!)
are written in Ruby.
You may not have run across it in the past, but Ruby is a general-purpose,
object-oriented scripting language with some similarities to Perl
or Python. Ruby supports a number
of features that are useful for code-generator writers, including built-in support
for regular expressions and portable I/O coding constructs. If you've never
used Ruby, fear not. The book includes a chapter on setting up Ruby and essential
add-ons like an XML parser and a templating package. Herrington also builds
a set of classes to parse C, C++, Java, SQL, and PostgreSQL code. And a helpful
appendix provides an introduction to Ruby (starting, of course, with the venerable
Hello World example).
There's More Than Code in This Stew
So far, I've been talking about code generation as a simple process that puts
together some sort of source code that you can feed into a compiler. And indeed,
that's one way to look at it. But if you're looking at it that way, you've got
blinders on. Through examples and discussion, Herrington shows how to use the
same sort of technology to come up with a variety of end products, only some
of which fall into the traditional category of "code." These include:
- Database access layers
- User interface code, for production or test
- Documentation (think about the XML comments in C#, which can be turned into
HTML help by NDoc)
- Unit tests (if you're going to generate code, why not generate tests for
the code as well?)
- Web services
- Business logic layers
- DLL wrappers for legacy code
- Firewall configuration files
And that just covers some of the possibilities. What it boils down to is this:
If you can describe an output that you'd like to get, you can likely build a
code generator to take the description to the actual output. (This leaves aside
the question as to whether it's more work to write the output or build the generator;
making that decision is one of the topics Herrington tackles.) For example,
if you're building a product for Windows, it's quite feasible to think of a
code generator that produces all or part of the MSI file that will feed the
product to the Windows Installer service.
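As a small illustration of one description feeding several end products, the Ruby sketch below turns a single table description into both a SQL statement and a plain-text documentation page. The table, its columns, and both output formats are assumptions of mine, not material from the book.

```ruby
# Hypothetical description of one table; every name here is illustrative.
TABLE = {
  name: "customers",
  columns: { "id" => "INTEGER", "name" => "VARCHAR(80)" }
}.freeze

# Output 1: a SQL CREATE TABLE statement.
def to_sql(table)
  cols = table[:columns].map { |col, type| "  #{col} #{type}" }.join(",\n")
  "CREATE TABLE #{table[:name]} (\n#{cols}\n);\n"
end

# Output 2: plain-text documentation for the same table.
def to_docs(table)
  lines = table[:columns].map { |col, type| "- #{col} (#{type})" }
  (["Table #{table[:name]}"] + lines).join("\n") + "\n"
end

puts to_sql(TABLE)
puts to_docs(TABLE)
```

Because both outputs come from the same description, adding a column in one place keeps the schema and its documentation in step.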
For a bunch of folks working on the cutting edge of technology, we developers
tend to have a surprising fear of the unknown and the new. One of the most valuable
features of this book is the straightforward discussion of common concerns that
people new to the world of code generation often display. For example, in the
chapter on generating data access layers (which is the heart of the book, though
Herrington works up to this gradually through increasingly complex tasks) you'll
find answers to all of these issues:
- The code is going to be out of control.
- I'm going to be the only one who knows what's going on.
- Our application semantics aren't well defined yet.
- This is going to take all the joy out of coding. ("If redundant code
writing is what you consider fun, then you may find a generator is not for
- The database design is too complex to be generated.
- The generated SQL statements will be rubbish.
- The up-front development cost is too high.
- I don't have all the prerequisite skills.
- The information here is centered around web applications, what about client/server?
- My application doesn't use a database.
If you work through the book, you'll not only end up with an appreciation of
how to build a code generator (and an extensive list of existing ones that you
can check out), but also with a road map to selling the concept to the rest
of your team and your management. Herrington doesn't recommend misrepresenting
code generation, but he does come up with good answers to the most common objections
you're likely to run into.
Go Forth and Generate
Itching to get started with code generation and impatient for that copy
of the book that you ordered to ship? Help is just a URL away. In addition
to writing the book, Herrington also maintains the Code Generation Network Web
Here you'll find an extensive database of products (both commercial and free),
articles on code generation, and interviews with developers who are heavily
involved on this programming front.
The history of software development has some clear trends. One of them is the
development of ever more abstract ("higher level") languages in an
effort to save time and enable more complex development. Unless you're writing
everything in machine language, you're using code generation at some level.
When you spot a repetitive task in what you're typing, don't reach for the Windows
clipboard—think code generation instead. It's the logical next step.
Got a code generation success story? Or are you lost in a morass of poorly designed
spaghetti code from a bad tool? I'd love to hear about your code generation
experiences either way by e-mail.
I'll use the most interesting comments in a future issue of Developer Central.