This is an article about the boring process of running javac on Java source files to obtain Java class files. It explains that it's not so simple (when you come down to it, it's actually pretty complicated), what has been done to make it seem simple and what, in my opinion, should be done instead.
So, what's so complicated about running javac? Well, making class files may be the most obvious part of a Java build, but it's rarely the only one. In my experience (which is admittedly limited, but I did work on more than one Java project), Java projects are typically multi-language: more often than not, there's some legacy library which must be accessed through native code, a CORBA IDL interface to be implemented, an XML DTD to conform to and so on. There are many code generators producing Java, and therefore much code to be generated before compilation; and when the class files are compiled, they must be packaged into jars (which may need to be signed), and then there's no limit to how complicated you want your automated testing to be. I subscribe to the adage that "the program that has not been tested does not work", and even if you don't, reflect on the dynamic nature of Java (runtime checks for null pointers, casts and so on) and ask yourself whether you really want your users to be the first running your code... And if you want to test what is delivered (as opposed to some other, quite similar code), you need version control, and that should also mesh with your build. Builds are hard, and Java builds are harder than average.
Of course, if a build does a lot of things, it shouldn't do them every time. Not only do I not want to run integration tests every time I change a line - I don't even want to package the jars. I want to run the unit tests early and often and they must use class files corresponding to my source files. Therefore the build must figure out what needs to be updated (whether it's one class file, or every class using a changed constant) and build that and no more. In other words, a serious build for Java applications must understand Java dependencies. That is the problem; let's look at some existing solutions.
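To make the "that and no more" requirement a little more concrete, here is a minimal sketch in Java (assuming Java 9 or later for List.of and Map.of). The class RebuildSet, its method and the class names used in main are hypothetical, and the usedBy map is assumed to come from some earlier dependency analysis. Given the changed classes, it collects everything that uses them, directly or indirectly.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any existing build tool.
class RebuildSet {
    /**
     * usedBy maps a class name to the classes that refer to it directly.
     * Returns the changed classes plus everything that depends on them,
     * directly or indirectly.
     */
    static Set<String> compute(Map<String, List<String>> usedBy, Set<String> changed) {
        Set<String> result = new TreeSet<>(changed);
        Deque<String> pending = new ArrayDeque<>(changed);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String dependent : usedBy.getOrDefault(current, List.of())) {
                if (result.add(dependent)) {   // true only when not seen before
                    pending.push(dependent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example graph: Limits is used by Parser and Checker,
        // and Parser is used by Main.
        Map<String, List<String>> usedBy = Map.of(
            "Limits", List.of("Parser", "Checker"),
            "Parser", List.of("Main"));
        System.out.println(compute(usedBy, Set.of("Limits"))); // [Checker, Limits, Main, Parser]
        System.out.println(compute(usedBy, Set.of("Main")));   // [Main]
    }
}
```

This is deliberately pessimistic: it rebuilds every transitive user of a changed class, whereas a smarter tool might stop where a change cannot be visible to the users.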
The most common answer I get when asking about a project build is, I'm sorry to admit, "Why don't you just use JBuilder?" - so my answer is practiced by now: JBuilder doesn't scale. I believe (I haven't tried it) that it may be useful for a home-based individual developer, but an environment which does not handle multiple languages, multiple users and variant (at least unit test vs. delivery) builds is simply not good enough for production work. That should not be taken as criticism of JBuilder (in fact, I'm told that some of its more irritating limitations have been fixed in the latest versions), but rather as a criticism of "visual" environments in general. Large-scale software development requires programmable tools, going beyond "what you see is all you get".
And indeed there are programmable build tools. The problem of dependency management was recognized (and solved) long ago, and its solution is now a de-facto standard build tool: make. From a popular manual [make]: "The make utility automatically determines which pieces of a large program need to be recompiled, and issues commands to recompile them."
It would be perfect, if only it were true... Standard make does not automatically handle dependencies (not beyond "$NAME.o always depends on $NAME.c"), and in consequence, projects using make either generate most of the content of their makefiles with various add-ons (either make extensions, or external programs), or just ignore the problem and build from scratch every once in a while to flush the bugs out. A good overview of make problems can be found (perhaps unsurprisingly) in the documentation of an alternative build tool [CONS].
And if the general problems of make weren't enough to look for alternatives, there are also Java-specific ones: make's build model is based on C compilation (source files are compiled to object files, which are linked to make an executable), but Java doesn't work that way. Multiple class files can be generated from a single source, circular dependencies are quite common and the generation of jars is strictly optional.
Considering that the purpose of dependency analysis is to minimize compilation time, it is worth noting that javac is very slow to start up. For a few simple classes, it may not pay to check dependencies at all - the startup time (or perhaps it is the time javac itself spends checking dependencies - see below) dwarfs the time actually spent compiling, and so compiling just part of the project saves practically no time.
On the other hand, there clearly are projects big enough that javac should not compile all their sources, every time - there's no justification for build times going up linearly with the size of the project. In theory, javac could handle dependency management internally. In practice, that functionality (the -depend flag of javac) existed in early versions of Sun's JDK, but it has been dropped in Java 2, and javac now recompiles only direct out-of-date dependencies, when their sources happen to be found on sourcepath or classpath. Sourcepath at least is an explicit command-line argument, but the recompilation of source files on classpath cannot even be turned off... Make has a particular problem with this "feature", because make builds normally create derived files in the same directory where the source is, and since javac must be able to find the class files (to compile other sources depending on them), it also checks the sources even when it shouldn't.
A model for a Java build
So, if the current solutions are unsatisfactory, what should be done instead?
I believe that a Java-aware build tool should have language-specific support (not only for Java, but for multiple languages - at least C/C++) and that the support should be comprehensive.
The tool should know which class files are generated from which sources (this is important for further derivation, i.e. making jars), and which class files are required to compile each source file.
The tool should handle circular dependencies, and even when there aren't any, it should compile more than one updated source file at once (by default probably all sources in one directory).
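One way to approach the circular-dependency requirement, offered here only as a sketch and not as the design of any existing tool, is to group mutually dependent sources with Tarjan's strongly connected components algorithm and hand each group to the compiler in a single invocation. The class CompileGroups and the file names are hypothetical, the dependsOn map is assumed to be produced by some dependency analysis, and Java 9 or later is assumed for List.of and Map.of.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: groups source files that depend on each other in a
// cycle, so that each group can be compiled in one compiler invocation.
class CompileGroups {
    private final Map<String, List<String>> dependsOn;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> lowLink = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final List<List<String>> groups = new ArrayList<>();
    private int counter = 0;

    private CompileGroups(Map<String, List<String>> dependsOn) {
        this.dependsOn = dependsOn;
    }

    /** Returns the groups ordered so that dependencies come before dependents. */
    static List<List<String>> of(Map<String, List<String>> dependsOn) {
        CompileGroups tarjan = new CompileGroups(dependsOn);
        // Sorted iteration keeps the result deterministic.
        for (String node : new TreeSet<>(dependsOn.keySet())) {
            if (!tarjan.index.containsKey(node)) {
                tarjan.visit(node);
            }
        }
        return tarjan.groups;
    }

    private void visit(String node) {
        index.put(node, counter);
        lowLink.put(node, counter);
        counter++;
        stack.push(node);
        for (String target : dependsOn.getOrDefault(node, List.of())) {
            if (!index.containsKey(target)) {
                visit(target);
                lowLink.put(node, Math.min(lowLink.get(node), lowLink.get(target)));
            } else if (stack.contains(target)) {
                lowLink.put(node, Math.min(lowLink.get(node), index.get(target)));
            }
        }
        if (lowLink.get(node).equals(index.get(node))) {
            // node is the root of a strongly connected component:
            // everything above it on the stack belongs to the same cycle.
            List<String> group = new ArrayList<>();
            String member;
            do {
                member = stack.pop();
                group.add(member);
            } while (!member.equals(node));
            group.sort(null); // natural (alphabetical) order within a group
            groups.add(group);
        }
    }

    public static void main(String[] args) {
        // Assumed example: Node.java and Tree.java refer to each other.
        Map<String, List<String>> dependsOn = Map.of(
            "Node.java", List.of("Tree.java"),
            "Tree.java", List.of("Node.java", "Util.java"),
            "Main.java", List.of("Tree.java"));
        System.out.println(of(dependsOn));
        // [[Util.java], [Node.java, Tree.java], [Main.java]]
    }
}
```

The recursion and the linear stack.contains check keep the sketch short; a tool meant for very large projects would probably want an iterative version with a separate "on stack" set.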
Overall, it's a lot of "should" - are these requirements realistic?
Enumerating derived files
In the simplest case, a Java source file defines one class; if that class is public, it must have the same name as the file. But not all classes are public, and non-public (top-level) classes can be defined in any source file declaring classes of a given package. Java also has inner classes, defined inside a top-level class and not necessarily having any name at all. As the JVM spec [JVM] says, "Typically, a class or interface will be represented using a file in a hierarchical file system. The name of the class or interface will usually be encoded in the pathname of the file." - but there are no guarantees, and certainly no published algorithm to derive the class file name. Sun's "anywhere" (as in "run anywhere") just doesn't seem to include Java environments from other providers...
But at least the package part is clear: a package corresponds to a relative directory. The absolute, top-level directories prepended to the relative ones are listed (together with jars - in this context, a jar is just a directory abstraction) in classpath, which is specified by a command-line argument (for command-line tools) or a user-settable option (for GUI applications), an environment variable or some hardcoded default (good only for the system classes). Pretty much everybody handles classpath the same way.
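The package-to-directory convention is simple enough to sketch in a few lines of Java. The class ClasspathLookup and the class name pkg.name.Outer are hypothetical, the "jar!/path" form is only a notation borrowed here to mark an entry inside a jar, and '/' is used as the directory separator throughout for simplicity.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the package-to-directory convention.
class ClasspathLookup {
    /** "pkg.name.Outer" becomes "pkg/name/Outer.class". */
    static String relativePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /**
     * Lists, in classpath order, the places where the class file may live.
     * Nothing is checked against the file system; this only builds names.
     */
    static List<String> candidates(String classpath, String pathSeparator, String className) {
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(pathSeparator))) {
            if (entry.isEmpty()) {
                continue; // skip empty classpath entries
            }
            if (entry.endsWith(".jar")) {
                result.add(entry + "!/" + relativePath(className)); // entry inside a jar
            } else {
                result.add(entry + "/" + relativePath(className));  // file in a directory
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        for (String candidate : candidates(classpath, File.pathSeparator, "pkg.name.Outer")) {
            System.out.println(candidate);
        }
    }
}
```

For example, with the classpath "build/classes:lib/util.jar" and ":" as the separator, the candidates for pkg.name.Outer are build/classes/pkg/name/Outer.class and lib/util.jar!/pkg/name/Outer.class.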
For the name part, let's get empirical: what class file names are actually created by Sun's javac? It appears that all inner classes have names constructed from multiple segments, separated by '$', where the first segment is the name of their enclosing class. Named inner classes (i.e. the only inner classes which can be referenced from files other than the one defining them) simply concatenate all names of their enclosing classes - for example, a class whose Java name is:
is compiled into a file
(in the pkg/name directory). The remaining inner classes use two or three segments only. Anonymous (unnamed) classes have names constructed from two segments, while local classes (i.e. named classes declared in some method) need three segments (because local classes with the same name may be declared in different blocks). When there's no good name for a segment, a number is used instead; the numbers start from 1 and go up (when necessary to keep the whole name of every defined class unique) as the source is being parsed.
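The '$'-separated pattern for named and anonymous classes can also be observed at run time, since Class.getName() returns the same name that the class file carries (minus the .class suffix). The following small experiment uses hypothetical class names (NamingDemo, Inner1, Inner2) and static nested classes for brevity; local classes are left out, because their exact naming is a compiler detail that may differ between javac versions.

```java
// Hypothetical class names, chosen only for illustration.
class NamingDemo {
    static class Inner1 {
        static class Inner2 { }
    }

    /** Run-time name of a named nested class. */
    static String namedNested() {
        return Inner1.Inner2.class.getName();
    }

    /** Run-time name of the only anonymous class declared in NamingDemo. */
    static String anonymous() {
        Object anon = new Object() { };
        return anon.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(namedNested()); // ends with NamingDemo$Inner1$Inner2
        System.out.println(anonymous());   // ends with NamingDemo$1
    }
}
```

Compiling this single source file is expected to leave several class files behind (one for NamingDemo and one for each nested or anonymous class), which is exactly the kind of output a build tool has to enumerate.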
Overall, it seems possible to pry open the black box and get the class file names generated from a given source - provided we have a Java parser. Fortunately, ANTLR [ANTLR] is Java-based and (like any self-respecting parser) parses its own programming language, so a utility for this task is within reach.
Java source dependencies
When parsing Java sources, it is naturally also possible to notice the used class names and construct a dependency graph. The hard part is determining what is a class name (as opposed to, say, field name - see Section 6.5 of the Java spec [JLS]). Basically, a dependency analyzer must know about all classes (on the classpath, and classes that may not exist yet, but are defined in the project's sources) and use that information to determine the meaning of a name. Admittedly, the most complicated scenarios should be rare - perhaps it would pay to cheat a bit... Also, since classes accessible across packages have one-to-one correspondence between their source file and class file, ignoring inter-package dependencies (and always recompiling all sources in an updated package) would considerably simplify the analysis.
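As an illustration of what "cheating a bit" might look like, the sketch below skips parsing altogether and looks only at the package and import declarations of a source file, yielding package-level dependencies. It is a deliberately crude assumption-laden example, not the approach of any tool mentioned here: the class ImportScanner is hypothetical, static imports are ignored, fully qualified names used without an import are missed, and an import of a nested class is attributed to the wrong "package".

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical "cheating" analyzer: reads only package and import lines.
class ImportScanner {
    private static final Pattern PACKAGE =
        Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    // Captures everything before the last segment, e.g. "java.util" from
    // "import java.util.List;" or from "import java.util.*;".
    private static final Pattern IMPORT =
        Pattern.compile("^\\s*import\\s+([\\w.]+)\\.(?:\\w+|\\*)\\s*;", Pattern.MULTILINE);

    /** Returns the declared package, or "" for the default package. */
    static String packageOf(String source) {
        Matcher m = PACKAGE.matcher(source);
        return m.find() ? m.group(1) : "";
    }

    /** Returns the packages named in (non-static) import declarations. */
    static Set<String> importedPackages(String source) {
        Set<String> result = new TreeSet<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example source; org.example.io is a made-up package.
        String source = String.join("\n",
            "package pkg.name;",
            "import java.util.List;",
            "import org.example.io.*;",
            "class Outer { List<String> names; }");
        System.out.println(packageOf(source) + " uses " + importedPackages(source));
        // pkg.name uses [java.util, org.example.io]
    }
}
```

Because it works on packages rather than classes, such a scanner only makes sense together with the simplification of recompiling all sources of an updated package.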
A proof of this concept (including a demonstration of problems with file-level granularity) is werken.javad [Werken]. In my view, it suffers from being a make add-on, but it certainly can serve as a basis to build from - all the code I wrote to research this article (which I don't present here, as it's just not good enough) is based on werken.javad.
Does anybody care?
So, technically, it seems possible to build a Java-aware build tool - yet none exists. And I'm as guilty of it as anyone - I used make to build my experimental code, and it did not work right. New make replacements like Apache Ant [Ant] don't do dependency management at all. Is the problem of building Java applications solvable, but just too hard to bother solving? IDEs do try to solve it, but not very well (JBuilder, to pick my favorite whipping-boy, does have some dependency management, but it's unreliable - it doesn't handle files being removed from the project, for example), and there is no commercial build tool which isn't an all-singing, all-dancing, all-its-own IDE. I believe the problem is too hard for homemade solutions, but also that there are more Open Source Java projects than any one company has, and that these projects would profit from a Java-aware build tool. I believe they could profit as much as the GNU [GNU] projects profit from autoconf [autoconf] - a piece of software whose development couldn't have been justified by competing Unix vendors precisely because their customers find it so useful (to mix and match their software).
[make] GNU make manual: http://www.gnu.org/manual/make/html_node/make_toc.html
[CONS] CONS - A Software Construction System: http://www.dsmit.com/cons/stable/cons.html
[JVM] The Java Virtual Machine Specification: http://java.sun.com/docs/books/vmspec/
[ANTLR] The ANTLR website: http://www.antlr.org/
[JLS] The Java Language Specification: http://java.sun.com/docs/books/jls/
[Werken] The Werken Digital website: http://code.werken.com/
[Ant] The Jakarta Project website (Apache Ant): http://jakarta.apache.org/ant/
[GNU] GNU website: http://www.gnu.org/
[autoconf] GNU autoconf information and resources: http://www.gnu.org/software/autoconf/