Sunday, February 19, 2012

Refactoring

Change is a way of life in software development.  Requirements change because clients gain a better understanding of their needs (possibly through experience with an existing version of the software), because new business opportunities arise, or for external reasons such as new legislation that requires new types of auditing, or a change in standards.  The environment may also change in various ways; for example, a system may move from a private data center to a public cloud, operating systems may become obsolete, or mission-specific hardware may need to be upgraded.  Change is the greatest challenge to the software development process, and development methodologies employ various tactics to accommodate it.  The classic "waterfall" methodology tries to eliminate it by spending a lot of time at the beginning of the project trying to get the requirements just right.  Unfortunately, even if this is extraordinarily successful, by the time the other steps in the process are done, the requirements are practically guaranteed to be out of date.

Agile methodologies take the other extreme.  Significantly, the classic introduction to Extreme Programming (nicknamed XP) by Kent Beck is called Extreme Programming Explained: Embrace Change.  These methodologies accept that change is inevitable, and gear every activity to support it.  This can only work if it is possible to modify existing software to fit changing requirements, and that is only possible if the software is always well-designed.  Unfortunately, software tends to drift away from a good design, because each change is usually made without much regard for the overall design.  As changes accumulate, it becomes harder and harder to make further changes, and the likelihood of introducing errors grows rapidly.  Therefore, one of the basic tenets of agile methods is that developers must take the time to restore a clean design to their software every once in a while.  This activity may seem unproductive, since it doesn't result in any modification to the external behavior of the software.  However, it enables further modifications, and is therefore essential.

The activity of changing the internal structure of the software in order to restore it to a well-designed form without affecting external behavior is called refactoring.  First introduced by Bill Opdyke in his 1992 PhD thesis, and popularized by Martin Fowler's famous book Refactoring: Improving the Design of Existing Code, refactoring is an approach to software development as well as a set of "refactorings," each of which is a detailed list of instructions on how to perform a structure-modifying but behavior-preserving change in code in a reliable way.  Reliability is achieved by running unit tests after each small change, in an attempt to verify that behavior is still preserved.  If some of the tests fail, it is easy to identify the reason, since it is rooted in the last small modification performed.
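To make the idea concrete, here is a minimal sketch of one small refactoring step (extracting a helper function) followed by re-running the same checks.  The function names and the invoice example are hypothetical and chosen only for illustration; they do not come from any of the books mentioned above.

```python
def invoice_total_before(prices, tax_rate):
    # Original version: summing and tax computation are tangled together.
    total = 0.0
    for price in prices:
        total += price
    return total + total * tax_rate


def subtotal(prices):
    # Extracted helper: one small, behavior-preserving step.
    return sum(prices)


def invoice_total_after(prices, tax_rate):
    # Refactored version: same external behavior, clearer structure.
    base = subtotal(prices)
    return base + base * tax_rate


def check_behavior(invoice_total):
    # The same checks are run against whichever version is passed in.
    assert invoice_total([10.0, 20.0], 0.5) == 45.0
    assert invoice_total([], 0.5) == 0.0


# Run the checks before and after the change; a failure after the
# change would point directly at the extraction step just performed.
check_behavior(invoice_total_before)
check_behavior(invoice_total_after)
```

In real work the checks would live in a proper test suite and there would be a single function that evolves over time; the two versions are kept side by side here only so that the step is visible.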

Of course, in order for this to work, it is necessary to have an extensive suite of unit tests, which will guarantee (with high probability) that any change that doesn't preserve functionality will be discovered immediately.  This is easier said than done.  A good set of unit tests may be considerably larger than the production code itself, and requires significant effort to create and maintain.  Most agilists consider this to be a necessary evil; I once heard a developer say he enjoys writing these tests, and even Kent Beck, the high priest of XP, expressed his surprise at that.

Still, developing without a good suite of tests is like driving blindfolded; you can't see where you're going and you're afraid to make any change at all.  In fact, Michael Feathers, in his book Working Effectively with Legacy Code, defines a legacy system to be one for which you don't have a good suite of unit tests.  And refactoring is a very effective and satisfying way to program.  I recall several experiences in which I started making some change in my code, which led to further changes, leading to a period of several days when I couldn't even compile my code, let alone run it.  This is a terrible feeling; you know you have problems in your code, but you have no way of testing it to find out what they are.  At the end of this process, when I got back to a running system, it indeed had problems, and it took a long time to find them all and fix them.  By restructuring the work into a series of refactorings, it would have been possible to test much more often, and thus find problems and fix them much earlier.

As I said, there can be no agile methodologies without refactoring.  However, refactoring is a useful activity regardless of which methodology you use (including no methodology at all).  It does come at a cost; you must have an extensive test suite, and refactoring itself takes time and effort.  I consider this cost to be well spent, since it saves much more effort down the line, when the inevitable requirement changes appear.  Hard as it is to convince programmers to invest in their future (see Programmers as Children), this is one of the important places to try.

As in other cases, tools are a great help in following the methodology, and therefore also in convincing programmers to use it.  Furthermore, good tools take away some of the burden of checking the correctness of the transformation.  Good refactoring support exists (mostly for Java) in Eclipse and other modern IDEs, and I urge every developer to make maximum use of it.  When I modify my code, I make every effort to use the automated refactoring capabilities provided by the IDE, as well as the related source transformations and quick fixes.  I do this even when it forces me to make the changes in a certain way or order, just to take advantage of the automation (and the resulting reliability) in the IDE.

Unfortunately, all modern IDEs have bugs in their refactoring support, chiefly due to insufficient analysis.  For example, they might create variable bindings that shadow existing bindings incorrectly.  So you must be careful when using these tools, and be aware of their limitations.  In spite of these limitations, I find them extremely useful.  One of the current research interests of our group is automating more powerful refactorings and making them more reliable; more about that in a later post.
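The shadowing problem mentioned above can be illustrated with a small sketch.  It is a hand-written example of what a rename performed without sufficient analysis might produce, not the output of any particular IDE, and all the names in it are hypothetical.

```python
rate = 0.5  # module-level binding used by the function below


def scale_original(values):
    # `factor` is local; `rate` refers to the module-level binding.
    factor = 2
    return [v * factor * rate for v in values]


def scale_after_naive_rename(values):
    # The local `factor` was renamed to `rate` without checking for
    # existing bindings.  The new local now shadows the module-level
    # `rate`, so both uses refer to the local value 2.
    rate = 2
    return [v * rate * rate for v in values]


original = scale_original([1, 2, 3])
renamed = scale_after_naive_rename([1, 2, 3])

# A rename should preserve behavior, but here it does not.
assert original != renamed
```

A unit test that compares results before and after the rename catches this immediately, which is one more reason to keep running the tests even when the transformation was done by a tool.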