Sunday, June 24, 2012

Design by Contract and Refactoring

I am a great believer both in refactoring and in design by contract.  How do these two work together?

First, contracts are a great help when refactoring.  The first thing you need to know when you refactor a piece of code is what it is supposed to do.  This tells you what you can and can't do with it, and, no less important, what would be useful to do with it.  If it's an arbitrary set of statements, say part of an existing method you are trying to extract into a new method, there's little to help you beyond reading the code and trying to figure out what it does and how it fits with the rest of the code.  (Even so, there are tools that try to deduce a contract for an arbitrary piece of code; more on that in a later post.)  But if it's a method, you should have more help.

If you have a good suite of unit tests, you can try to look at the tests of that method to figure out what it's supposed to do.  But it's much easier if you have a contract attached to the method; the contract should give you a lot of the information you need.  For example, suppose you want to move one or more methods and fields from one class to another (perhaps you are using Pull Up Method, Push Down Method, or just Move Method).  In that case, you should check the contracts of moved methods to see how they fit in their new class.  Are the invariants of the new class maintained by the moved methods?  Are the contracts of the methods still valid, taking inheritance into account?  Nasty bugs can result if the answers are negative.
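
To make this concrete, here is a small invented Java example (plain assert statements stand in for a contract notation) of a Move Method whose contract collides with the invariant of its new class:

    // Hypothetical example: withdraw() has been moved from a plain Account
    // class, whose invariant was only balance >= 0, into SavingsAccount.
    class SavingsAccount {
        // invariant: balance >= minimumBalance
        private long balance;
        private final long minimumBalance;

        SavingsAccount(long openingBalance, long minimumBalance) {
            assert openingBalance >= minimumBalance;   // precondition
            this.balance = openingBalance;
            this.minimumBalance = minimumBalance;
            assert invariant();
        }

        private boolean invariant() {
            return balance >= minimumBalance;
        }

        // The moved method kept its old contract.  A withdrawal that leaves
        // a non-negative balance can still break the stronger invariant here.
        void withdraw(long amount) {
            assert amount > 0 && amount <= balance;    // old precondition
            balance -= amount;
            assert invariant();                        // may now fail
        }
    }

Run with assertions enabled (java -ea), the invariant check fails on the first withdrawal that dips below the minimum balance; this is exactly the kind of mismatch the contracts let you spot before performing the move rather than after.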

On the other hand, refactoring may require corresponding changes in existing contracts, as well as the creation of brand-new contracts.  In other words, contracts need to be refactored together with the associated code.  This is an added burden, which may well be an obstacle to the use of the design-by-contract methodology; not only do I have to invest effort in creating the contracts in the first place, I also have to refactor them later.  Why should I bother?

There are several answers to this complaint.  First, consider the alternatives.  You may be flying blind, trusting only your undocumented understanding of the code.  In that case, good luck to you; you'll need it.  If you follow an agile methodology, such as Extreme Programming, you should have an extensive suite of unit tests to help you refactor.  But unit tests are very vulnerable to changes in the code, since they are attached to small units of it.  This means that any shift in responsibilities is likely to invalidate some tests, which will then have to be refactored or completely rewritten.  So there is always a need to refactor associated artifacts when you refactor the code.

Having contracts can significantly reduce the amount of detail in unit tests, and even the total number of unit tests.  This is because a major part of the responsibility usually given to unit tests is now carried by the contracts.  All that the tests need to do is exercise the system; correctness checking (or a large part of it) is done by checking the contracts.  So you can have higher-level tests that exercise the system, instead of unit tests for each class and method.  For example, suppose you are implementing a cryptographic algorithm such as RSA.  A test that creates a random key, encrypts a random data buffer, decrypts it, and checks that the result is the original data is guaranteed to exercise almost all of your cryptographic code.  Moreover, this test can be used on many different implementations.  In contrast, the internals of the implementation can vary widely, since there are many ways to implement the large-number arithmetic operations required for RSA, and their efficiency will differ under different circumstances.  With an application-level test augmented with contracts, the burden of refactoring tests is reduced to almost nil.
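
As a rough sketch of what such an application-level test might look like in Java, the JDK's standard crypto API below stands in for whatever RSA implementation is actually under test; the only correctness check is the round-trip postcondition, so the test is untouched by changes to the arithmetic internals:

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.SecureRandom;
    import java.util.Arrays;
    import javax.crypto.Cipher;

    // Round-trip test: exercises key generation, encryption, and decryption,
    // checking only the contract decrypt(encrypt(m)) == m.
    public class RsaRoundTripTest {
        public static void main(String[] args) throws Exception {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            KeyPair keys = gen.generateKeyPair();

            byte[] message = new byte[32];            // fits in one RSA block
            new SecureRandom().nextBytes(message);

            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(Cipher.ENCRYPT_MODE, keys.getPublic());
            byte[] ciphertext = cipher.doFinal(message);

            cipher.init(Cipher.DECRYPT_MODE, keys.getPrivate());
            byte[] decrypted = cipher.doFinal(ciphertext);

            // The contract-style check; run with -ea to enable assertions.
            assert Arrays.equals(message, decrypted) : "round trip failed";
            System.out.println("round trip OK");
        }
    }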

This still leaves the requirement of refactoring contracts, and the question of how much refactoring tools can help.  In order to understand that, it is necessary to examine the relationships between code refactorings and contracts.  On the simplest level, contracts are treated just like code.  For example, when renaming a method, all references to it in the code must be appropriately modified; so must all references in assertions.  Similarly, a method may be eliminated when it is not used anywhere, including in assertions.  Contract-aware tools should find these kinds of changes easy to perform automatically.  Of the refactorings listed in Fowler's Refactoring book, 32% do not interact with contracts except possibly in this way.  These are mostly syntactic refactorings such as Remove Parameter or Rename Method, or those that eliminate classes or methods, such as Inline Class and Inline Method.
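
As a small invented example of this kind of relationship, consider a precondition that calls a query method by name; a Rename Method applied to the query must update the assertion along with every ordinary call:

    class BoundedStack<T> {
        private final Object[] items;
        private int count;

        BoundedStack(int capacity) {
            items = new Object[capacity];
        }

        boolean isFull() {                 // renamed from, say, atCapacity()
            return count == items.length;
        }

        void push(T item) {
            // The precondition mentions the query by name, so the rename
            // must rewrite this assertion as well as ordinary calls.
            assert !isFull() : "push on a full stack";
            items[count++] = item;
        }
    }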

Some contracts affect the applicability of certain refactorings.  As mentioned above, a method should only be moved if it won't violate the invariant of its new class.  (Of course, it might be possible to modify that invariant to accommodate the moved method; this can be done in a separate step, before moving the method.)  13% of Fowler's refactorings are of this nature.

Some refactorings require the creation of new or modified contracts.  The most extreme case is Extract Method, which creates a new method out of arbitrary code.  Discovering the contract for arbitrary code is impossible in general, although some tools can discover partial contracts.  Another example is Extract Superclass, which can create contracts for new methods based on existing contracts in subclasses.  59% of Fowler's refactorings fall into this category (which includes 4% that are also counted in the previous one).
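
For instance, applying Extract Method to a search loop forces a contract that was only implicit in the surrounding code to be written down; the names below are made up for illustration:

    // After Extract Method: the loop that found the smallest element becomes
    // its own method, and a precondition that the original context happened
    // to guarantee must now be stated explicitly.
    static int indexOfSmallest(int[] values) {
        assert values != null && values.length > 0;   // newly explicit precondition
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[best]) {
                best = i;
            }
        }
        // postcondition (informal): values[best] <= values[i] for every i
        return best;
    }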

Finally, some new refactorings can be defined to deal specifically with contracts; these include Pull Up Contract, Push Down Contract, Create Abstract Precondition, and Simplify Assertion.

In a future post I will discuss the automation of contract-related refactorings and how refactoring tools can take on some of the burden of maintaining contracts.


Sunday, February 19, 2012

Refactoring

Change is a way of life in software development.  Requirements change because clients gain a better understanding of their needs (often through experience with an existing version of the software), because new business opportunities arise, or for external reasons such as new legislation that requires new types of auditing or a change in standards.  The environment may change in various ways as well: an application may move from a private data center to a public cloud, operating systems may become obsolete, or mission-specific hardware may need to be upgraded.  Change is the greatest challenge to the software development process, and development methodologies employ various tactics to accommodate it.  The classic "waterfall" methodology tries to eliminate it by spending a lot of time at the beginning of the project trying to get the requirements just right.  Unfortunately, even when this is extraordinarily successful, by the time the other steps in the process are done, the requirements are practically guaranteed to be out of date.

Agile methodologies take the other extreme.  Significantly, the classic introduction to Extreme Programming (nicknamed XP) by Kent Beck is called Extreme Programming Explained: Embrace Change.  These methodologies accept that change is inevitable, and gear every activity to support it.  This can only work if it is possible to modify existing software to fit changing requirements, and that is only possible if the software is always well designed.  Unfortunately, software tends to drift away from a good design, because each change is usually made without much regard for the overall design.  As changes accumulate, it becomes harder and harder to make further changes, and the likelihood of introducing errors grows exponentially.  Therefore, one of the basic tenets of agile methods is that developers must take the time to restore a clean design to their software every once in a while.  This activity may seem unproductive, since it doesn't result in any modifications to the external behavior of the software.  However, it enables further modifications, and is therefore essential.

The activity of changing the internal structure of the software in order to restore it to a well-designed form without affecting external behavior is called refactoring.  First introduced by Bill Opdyke in his 1992 PhD thesis, and popularized in the famous book Refactoring: Improving the Design of Existing Code, refactoring is an approach to software development as well as a set of "refactorings," each of which is a detailed list of instructions on how to perform structure-modifying but behavior-preserving changes in code in a reliable way.  Reliability is achieved by performing unit tests after each small change, in an attempt to guarantee that behavior is still preserved.  If some of the tests fail, it is easy to identify the reason, since it is rooted in the last small modification performed.

Of course, in order for this to work, it is necessary to have an extensive suite of unit tests, which will guarantee (with high probability) that any change that doesn't preserve functionality will be discovered immediately.  This is easier said than done.  A good set of unit tests may be considerably larger than the production code itself, and requires significant effort to create and maintain.  Most agilists consider this to be a necessary evil; I once heard a developer say he enjoys writing these tests, and even Kent Beck, the high priest of XP, expressed his surprise at that.

Still, developing without a good suite of tests is like driving blindfolded; you can't see where you're going, and you're afraid to make any change at all.  In fact, Michael Feathers, in his book Working Effectively with Legacy Code, defines a legacy system as one for which you don't have a good suite of unit tests.  And refactoring is a very effective and satisfying way to program.  I recall several experiences in which I started making some change in my code, which led to further changes, until I reached a period of several days when I couldn't even compile my code, let alone run it.  This is a terrible feeling; you know you have problems in your code, but you have no way of testing it to find out what they are.  At the end of this process, when I got back to a running system, it indeed had problems, and it took a long time to find and fix them all.  Had I restructured the work as a series of refactorings, I could have tested much more often, and thus found and fixed the problems much earlier.

As I said, there can be no agile methodologies without refactoring.  However, refactoring is a useful activity regardless of which methodology you use (including no methodology at all).  It does come at a cost; you must have an extensive test suite, and refactoring itself takes time and effort.  I consider this cost to be well spent, since it saves much more effort down the line, when the inevitable requirement changes appear.  Hard as it is to convince programmers to invest in their future (see Programmers as Children), this is one of the important places to try.

As in other cases, tools are a great help in following the methodology, and therefore also in convincing programmers to use it.  Furthermore, good tools take away some of the burden of checking the correctness of the transformation.  Good refactoring support exists (mostly for Java) in Eclipse and other modern IDEs, and I urge every developer to make maximum use of it.  When I modify my code, I make every effort to use the automated refactoring capabilities provided by the IDE, as well as the related source transformations and quick fixes.  I do this even when it forces me to make the changes in a certain way or order, just to take advantage of the automation (and the reliability that comes with it) in the IDE.

Unfortunately, all modern IDEs have bugs in their refactoring support, chiefly due to insufficient analysis.  For example, they might create variable bindings that shadow existing bindings incorrectly.  So you must be careful when using these tools, and be aware of their limitations.  In spite of these limitations, I find them extremely useful.  One of the current research interests of our group is automating more powerful refactorings and making them more reliable; more about that in a later post.
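
As a contrived Java illustration of the kind of shadowing problem such analysis must rule out (not a transcript of any particular IDE's behavior), suppose a refactoring introduces a local variable that happens to reuse a field's name:

    class Thermostat {
        private double target;

        // After a (hypothetical) badly executed Extract Local Variable that
        // names the new variable "target":
        void adjust(double reading) {
            double target = this.target - 1.0;   // new local shadows the field
            if (reading > target) {
                // This assignment used to update the field; it now silently
                // updates the tool-introduced local instead.
                target = reading;
            }
        }
    }

A correct tool must notice that the new binding captures later unqualified references to the field, and either qualify those references, pick a different name, or refuse to perform the refactoring.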