Monday, November 14, 2011

Runtime Checking of Contracts

As I said before, the most important use of contracts is methodological; by paying attention to contracts, you will be able to avoid many errors you won't have to fix later.  Still, if you go to the trouble of writing contracts, you would like to get additional benefits.  One of the most obvious is runtime checking of contracts, which gives warning about contract violations as they occur, helping catch errors close to their sources.  Without contracts, a bug can corrupt a data structure or its contents, planting a time bomb that can explode much later.  At that point, it is difficult to discover who made the erroneous change even with a debugger, since the offending method is no longer on the stack and can't be examined, and many other traces of the buggy action may also have disappeared.

In order to get these kinds of warnings, the contract must be written in some formal language; typically it is some extension of boolean-valued expressions in the programming language.  Extensions are necessary in order to refer to "old" values; that is, allowing the postcondition to refer to values of variables or expressions on entry to the method.  Rich assertion languages also add ways of expressing universal and existential quantifiers; often, you want to express some property of all the contents of some array or other data structure, or to assert that there is some element with a given property.  However, it is easy to add such capabilities to languages that don't offer them simply by writing appropriate methods to check any required condition.

Runtime contract checking is complicated by several factors.  First is the relationship between contracts on methods in inheritance hierarchies.  As I explained in The Correct Use of Inheritance, inheriting methods may only weaken preconditions and strengthen postconditions.  There are two approaches to enforcing this constraint.  The one used by Eiffel uses syntactic enforcement, in which you can never write assertions violating this rule.  In Eiffel, overriding methods use "ensure then" instead of "ensure" for postconditions, with the meaning that any assertion following "ensure then" is conjoined with any assertions from superclasses.  For example, suppose you have a postcondition "ensure contains(x)" on the method Container.put(x); this ensures that after adding x to the container, it is indeed in it.  A list is a kind of container, and there you want to add the postcondition "ensure then count = old count + 1", which means that the number of elements in the list has increased by one after adding an element.  (This is not the case for sets, which are also containers.)  The full postcondition of List.put(x) is now "contains(x) and then count = old count + 1".  (The "and then" operator is a short-circuit operator, like && in the C family.)  Similarly, preconditions in overriding methods use "require else" rather than "require", and add the new assertions by disjunction ("or else" in Eiffel, || in the C family.)  While this approach prevents any violations of the inheritance rules, it means that you don't see the full contract in the class itself, and there could be hidden assertions you aren't aware of.  This calls for tools that can display the full contract (see Viewing Contracts).

The other approach is to require developers to maintain the correct relationships themselves, by copying assertions as necessary.  In this case, you see the full contract in each class, but need to verify for yourself that the inheritance rules hold in each case.

A runtime assertion-checking tool needs to be aware of the inherited contracts in either approach.  In the first one, it must be aware of the full contract in order to check it properly.  This is less of an issue for postconditions, because these can be checked independently.  But it is crucial for preconditions, since the fact that a precondition written with a certain overriding method is false is not an error if the precondition of the method in some superclass is satisfied.  In the second approach, assertions can be checked for each class independently, but you also want to add checks that warn in case the inheritance rules are violated.

Another complicating factor is the need to keep "old" values, as in the example shown above: "count = old count + 1".  An assertion-checking tool needs to keep the value of count on entry to the method in order to make sure that the value on exit is exactly one more.  The computation of the old value could cause an exception; for example, the Stack.put(x) method may have a postcondition "count > 1 implies item(2) = old item(1)", which means that if after insertion the stack has at least two elements, then the element that used to be on top of the stack ("old item(1)") is now second ("item(2)").  As the tool evaluates the expression item(1) on entry to the method, an exception will occur if the stack is empty.  This, however, is not an error in the program or in the assertions!  What the tool must do is keep a record of the exception, and only raise it if the old value is ever referenced.  This will never happen in this example, because if the stack was initially empty, it will have exactly one element following the put operation, the antecedent "count > 1" of the postcondition will be false, and the consequent that calls for the old value, "item(2) = old item(1)", will never be evaluated.

Yet another challenge is preventing infinite recursion in assertion checking.  For example, if the Stack class has an invariant capacity() >= 0, and this invariant is checked upon entry to the capacity method, an infinite recursion will result.  Eiffel takes a conservative view, and turns off all assertion checking while it checks any assertion.  The rationale for this is that methods used in assertions should be verified separately, before being used to specify the behavior of other parts of the program.  It is possible to check more assertions without getting into an infinite recursion, and various tools have different strategies for dealing with this issue.

Our own assertion-checking tool for Java is called Jose, and uses aspect-oriented programming to handle the issues mentioned above.  More on Jose in a later post!

Tuesday, September 13, 2011

Viewing Contracts

There are many ways to view a contract.  As you edit your class, you naturally see the contract for each method in the header of the method, since it is part of the text.  (In Eiffel, it is part of the syntax of the language; in design-by-contract tools for Java, the contract is typically expressed in the Javadoc comments of classes and methods.)  But what do you do when you want to see the full contract of the method, including inherited parts?  And when you are about to call some method of another class, how do you know what is the full contact for that method?

The contract can have private parts, which are not meaningful to clients.  As I mentioned previously, preconditions must be fully understandable to clients, since otherwise they can't be expected to comply with them.  However, postconditions and invariants may refer to private parts of the implementation.  For example, suppose your Queue class uses a circular buffer, called items, to implement the (bounded) queue.  The implementation invariant should specify that items != null.  This is an important property of the class to its implementer and anyone creating an inheriting class, but it shouldn't be exposed to clients who aren't supposed to know about implementation details.  Similarly, the method that adds an element to the queue may have a postcondition specifying that the write-pointer has advanced by 1 modulo the buffer size.  Again, this is useful information to the implementer, but should be hidden from clients.

The distinction between private and public access levels, in code or in contracts, is usually not clear cut.  For example, Java provides four different access levels; C++ has three, but allows "friends" to access private and protected members; and Eiffel has a selective export mechanism where a class can specify which other classes have access to each of its features.  This means that the same class can present different interfaces to other classes, depending on their access levels.  As always, interfaces include the methods visible to the particular client, as well as the contract.  Different parts of the postconditions and invariants of the contract would be visible to different clients.

The Eiffel development environment can show contracts in different ways, based on whether to show private assertions or not, and whether to show assertions from a single class or show all inherited assertions as well.  However, the views can't be customized by client.  I don't know of any other development environment that can show more than one view of a contract.  Ideally, the environment would be able to show contracts from a supplier's perspective as well as from the perspective of any client.  For example, if I'm editing class A, and place the cursor on a call to a method f of class B, I should be able to see (in a tooltip or in a separate view) the full contract of B.f from the point of view of class A.  Such a view is not difficult to create, although it might require some tedious work.  I consider such a view essential to any development environment that takes design-by-contract seriously.

Friday, August 5, 2011

Why Don't YOU Use Design by Contract?

I spent quite some time explaining why I think design by contract is such a useful methodology, and why every programmer (and especially those using object-oriented programming) should use it.  So why isn't it more widely used?

One obvious reason is the lack of education about design by contract.  You can find many university-level courses on object-oriented programming that hardly mention design by contract, or not at all.  Naturally, people who don't know about it aren't going to use it.  But the problem is deeper; I know that many of my own students don't use design by contract regularly.  It may be that I wasn't able to convince them that it is really useful, but I know I'm not alone in this situation.  I believe that for any methodology to be practically useful, it must be supported by tools that make its application not only painless but also rewarding.  And tools for design by contract are not yet at a level that promotes wide use.  Also, since programmers need instant gratification (see Programmers as Children), they need tools that will help them immediately, not only in some unforseeable (yet inevitable) future where the bugs they are writing now will come back to bite them.

I firmly believe that design by contract is useful even without any tools.  But consider the amount of work required to use it.  First, you have to write contracts for your methods.  As I said before, these don't have to be complete, but they do have to capture those parts of the contract that are likely to cause the most trouble.  Often, these are not deep properties of the computation.  On the contrary, these are things like whether an argument to a method is allowed to be null, whether a string or an array may have zero length, whether a number has to be positive, and so on.  This is not difficult to write, but write it you must.  Then, you need to look at this contract whenever you call a method, and consider whether your call is correct.  (Of course, if you don't bother, you will find your mistakes the hard way.)

But the most painful part of almost all methodologies arises when something needs to change.  Whether your requirements have changed, or whether you are just refactoring your code, you will need to change some contracts, and create new ones.  Then you will have to check whether all existing calls obey the new contract (or find out the hard way, as is the normal custom).

There are many opportunities for automated tools to help, at different levels .  To begin with, you will want syntax checking of your contracts.  This includes checking whether all preconditions can be evaluated by every caller; this is a must, since you can't require callers to obey conditions they can't evaluate.  You may also want your IDE to provide syntax highlighting for contracts in a similar way to what it does with code.  A "contract view" that shows just the contracts for a class would be nice, especially if it could show various views of the contracts.  Such views can include the client view (where internal parts of the contracts are hidden), and the provider view (where everything is shown).  The views would be customizable, for example, by showing only direct class members or all members (including inherited ones).

At a more semantic level, you want your contracts to be checked while you run the program.  This has great value in catching errors close to their sources.  However, instrumenting a program to check contracts is not trivial.  Beyond the difficulties of instrumenting any program, the semantics of the contract require some tricky handling.  One example is the ability to reference in postconditions the value of expressions on entry to the method, which requires tools to compute this value and keep it intact until the end of the call.

More generally, you may want a tool to help you verify the correctness of your implementation based on the contracts you write.  Software verification is still an open problem, but it is a very active area of research.  Part of this research is focused on verification of contracts.

Turning now to the question of change, tools that help manage contracts when the code changes are a must.  Such tools can modify existing contracts as well as create (draft) contracts for new code.  This is by no means easy.  Analyzing an arbitrary piece of code to extract contracts from it is an open (and very difficult) problem, although some progress has been made on it.  The problem is easier if some information about the intent of the change is available.  This can happen in refactoring; when you invoke a refactoring such as Extract Superclass, the tool has an indication of your intent, and can use this information together with existing contracts for the class (or classes) you are generalizing in order to create the contract for the new superclass.

Needless to say, the Eiffel programming environment has the best tools for design by contract, although much more is needed.  In future posts, I will describe some of the work my students and I have done on creating tools for design by contract in Java.

Sunday, June 12, 2011

Watch Your Step!

(Note: if you are familiar with SQL injection vulnerabilities, you may want to skip to the end of this post, which mentions some new research that could eliminate such errors before they are made.)

This classic comic from xkcd demonstrates one of the most common and dangerous type of security vulnerabilities in web applications, called SQL injection.  This is a way in which an attacker can cause the application to execute unintended and unanticipated (by the application operator) actions on the database.  In this example, the action is to remove the database table called "Students," thus removing all student information.

The application server executes some code in response to receiving information submitted from a web form.  I will show the example in Java, but similar code is written in other languages, including PHP, Perl, and .NET.  The application sends a query to the database; in this example, it tries to get student information.  The query is typically coded in SQL, and may have a form similar to this:

SELECT * FROM Students WHERE NAME = 'name'
The name itself, shown above in italics, is of course different for each student; the value of this field is taken from the web form.  The code that generates this query in Java looks like this:
String query = "SELECT * FROM Students WHERE NAME = '" + request.getParameter("name") + "'";
This builds the query string from three parts: two constant strings, and the call request.getParameter("name"), which returns the "name" field of the web form.  Unfortunately, if you substitute little Bobby Tables' name into this computation, you will end up with the following SQL statement:
SELECT * FROM Students WHERE NAME = 'Robert';
DROP TABLE Students;
(Bobby's full name is shown in red; note that the final single quote comes from the second constant string in the original computation.)  When this string is sent to the database for execution, it will be interpreted as three different commands, shown above on three separate lines.  The first will indeed return the information about Bobby, but the second will remove the Students table.  The third consists only of the three characters --' ; the first two dashes indicate that this is a comment and the rest of the line (the trailing quote) is to be ignored.

This is not funny!  All too many web applications are written in this way, and are therefore vulnerable to SQL injection attacks.  In OWASP's list of top 10 vulnerabilities, injection vulnerabilities appear as the first item, tagged as being common, easy to exploit, and having severe impact.

How can such problems be avoided?  Obviously, any user-supplied information, including the contents of web forms, must be validated or sanitized.  Validation means blocking any potentially-dangerous input (such as anything that includes quotes), while sanitation means modifying such inputs to a non-harmful form.  This is easy enough to do; any reasonable web-development framework will supply appropriate validation and sanitization functions.  However, the developer must remember to call these functions on each and every user input; any omission creates a security vulnerability, and a systematic attacker will discover it eventually.

An alternative solution is to use prepared statements to communicate with the database.  Java, like other server-side programming languages, offers more than one interface to the database.  A prepared statement uses a fixed SQL statement, which includes question marks instead of actual parameter values.  Bobby's example would look like this:
When a prepared statement is used to invoke a database operation, all parameters identified by question marks are given their actual values by invoking type-specific methods (such as setString or setInt) on the prepared statement object.  These take care of sanitization internally, so that the developer doesn't need to worry about it.  A prepared statement object can therefore be used multiple times with the same SQL query but with different arguments.  Another advantage of prepared statements it that the query string they receive must obey SQL syntax rules (with the addition of the question marks), and this can be checked when the prepared statement is initialized with the query, and the query can also be optimized once instead of having to be optimized for each call.

However, there is a serious problem in the approach taken by all these languages that require the construction of SQL queries as strings.  This gives too much freedom to the developer, allowing the creation of security and other errors that will not be discovered until the application is actually run.  Certainly, if you use a statically-typed language like Java, you should also be able to express the syntax of your database queries in a way that can be checked for correctness at compile time rather than at runtime.  This calls for a syntactic extension of the programming language, which will support SQL statements with parameter indicators such as the question marks of prepared statements.  This will have the advantage of catching syntactic erros at compile time, as well as the possibility of adding sanitation automatically, as is done for prepared statements.

Current web-development languages make it difficult for developers to do the right thing, necessitating special tools for discovering security vulnerabilities and fixing them.  Why not provide the ability to prevent them in the first place?

This is actually possible.  In their paper "Simple and safe SQL queries with C++ templates" (Science of Computer Programming 75 (2010)), Joseph Gil and Keren Lenz describe a system that uses C++ template metaprobramming techniques to represent relational algebra expressions inside the C++ language.  (Relational algebra is an alternative to SQL with a more mathematical syntax.)  The compiler can check the expressions written this way for syntactic correctness, and when the SQL query is actually built for sending to the database, all string inputs are sanitized so that they can't modify the syntax and create injection attacks.  Bobby's example can be written in this system this way:
(Students / (NAME == name)).asSQL()
The parenthesized expression is the relational-algebra form of the query, and the asSQL() method converts it to a string that can be sent to the database.  This syntax uses C++ operator overloading, and is therefore inapplicable to Java and similar languages.  It is possible to extend these with similar capabilities using more cumbersome syntax, but the right thing to do would be to make SQL (better yet, relational algebra) an integral part of the language.  Anyone who has ambitions to develop the next server-side programming language should be aware of this research!

Monday, May 30, 2011

The Weakest Link

The May 2011 issue of Communications of the ACM contains an article called Weapons of Mass Assignment by Patrick McKenzie.  The article describes McKenzie's security analysis of the pre-release source code of Diaspora, an attempt to build a "privacy-aware Facebook alternative".  The Diaspora home page makes the following claim: "Inherently private, Diaspora doesn’t make you wade through pages of settings and options just to keep your profile secure."  Unfortunately, McKenzie was "struck by numerous severe security errors", which were "serious enough to jeopardize the goals of the project."

The first kind of errors McKenzie reports are a result of confusion between authentication and authorization.  Authentication means that the system has identified the user using some secure mechanism, so that nobody else can impersonate that user.  Authorization, on the other hand, refers to the set of operations that a user can legally perform.  Naturally, authorization is meaningless without authentication, since you can't refer to the operations a user is permitted to perform without authenticating that user.  But authentication isn't enough; once you know who the user is, you should allow him to perform only operations he is authorized to perform.

In the code release that McKenzie examined (the "pre-alpha developer review"), an authenticated user could delete photos based on their IDs.  If you knew the ID of somebody else's photo, you could delete it, and discovering photo IDs was easy.  Here is an example of a URL (from another source) that contains IDs:  The part that follows the question mark (shown here in bold) contains a list of parameters and values, which the server-side script accesses to figure out what to do.  When you see such URLs, you can often figure out what the IDs refer to, and you can replace IDs in them to create various effetcs.  In Diaspora, for example, you could delete one of your own photos to see what that request URL was like, and then replace the photo ID to delete somebody else's photo.

Diaspora uses Ruby on Rails, and suffers from the security vulnerabilities of that platform.  In particular, McKenzie points to the Rails feature of "mass update", which is a convenient but dangerous programming tool.  It takes a list of keys and associated values and calls the related accessor method for each key.  This allows an attacker to add keys that the developer didn't intend to be modified, thus creating serious security vulnerabilities such as allowing one user to take over another's account.  If mass update is used on values received from an HTML form (which is often the case), all an attacker needs to do is modify the form to send malicious key-value pairs.  Because the form is presented on the attacker's machine, this is an easy task.

McKenzie details several other vulnerabilities, which allow attackers to discover or replace another user's encryption key, thus allowing them to decipher all that user's previous and future communications with the server.  This is quite serious in an application whose first priority was security and privacy.  A chain is only as strong as its weakest link, and this applies first and foremost to security, where an attacker will deliberately go after the weakest link.  The fact that Rails has mass update enabled by default should make you suspicious of any application built on Rails, since it requires positive action by security-conscious developers to disable mass update.  The more issues developers need to be careful of, the greater the chance that they will miss some, creating security vulnerabilities.  This makes me question whether Rails is a good tool for developing secure web applications (see The Right Tool for the Job).  This is unfortunate in tool whose main purpose is to make web development easy.

It remains to be seen whether Diaspora will finally be secure.  McKenzie ends his article on a positive note: "This is not a reason to despair, though: every error fixed or avoided through improved code, improved practices, and security reviews offers incremental safety to the users of software and increases its utility."  I am less optimistic: an application that boasts its security needs to designed for security from the first, not debugged and reviewed to decrease its vulnerabilities after the fact.

Sunday, May 22, 2011

Covariance and Contravariance

Type declarations on method parameters and return values are also constraints on the possible values that can be passed between a method and its caller.  As such, they form part of the contract for the method.  This is, in fact, a required part of the contract in statically-typed languages.  (Unfortunately, this is typically the only part of the contract that gets written.)

Suppose you have a class called Complex representing complex numbers, and a method that adds two numbers and returns the result.  In an object-oriented fashion, the method signature will be Complex add(Complex other), and a call c1.add(c2) will add the numbers c1 and c2 and return the sum.  The type declaration "Complex other" in the method signature is a precondition that requires callers to pass only objects of type Complex to this method, and the declaration that the result is Complex is a postcondition ensuring that only Complex objects will ever be returned by the method.

Normally, we don't really think of type declarations as a contract, since they are enforced by the compiler and no further action is required by the programmer to ensure that the contracts are indeed satisfied.  But if you think of type declarations as contracts in the context of Liskov's Behavioral Subtyping Principle, they have interesting consequences.  One is that subclasses are only allowed to strengthen postcondition but never weaken them.  This means that subclasses may return more specialized types than Complex.  This is not surprising; it is quite likely that there are several implementations of complex number, all of which are subtypes of Complex; for example, ComplexAsCartesian and ComplexAsPolar, referring to the two most common representations of complex numbers.  It is perfectly permissible for the add method of each subclass to return elements of that subclass (although that is not always the best policy).  But can you redefine the return type of ComplexAsCartesian.add to be ComplexAsCartesian rather than just Complex as in the superclass?  The answer depends on the programming language, and even on the language version.  For example, Java originally didn't allow any changes in method signatures, but since Java 5 it is possible to redeclare more specific types (that is, subtypes) for method return values.

This kind of change, where subclasses use subtypes of those used in their superclasses, is called a covariant change, since the direction of change is the same: as you go down the inheritance hierarchy, types get more specialized (that is, also go down).  Covariance is very natural to use in practice.  In addition to the example above, you can think of modeling artifacts (physical or virtual) that need to be connected in various ways.  Often, as you become more specific in one dimension, you need to be more specific in others.  For example, every laptop needs a power supply.  But specific laptops have dedicated power supplies, and you can't use any power supply with any laptop.  So the most abstract Laptop class may define an attribute called powerSupply, whose type is PowerSupply.  But an ABCLaptop requires an XYZPowerSupply, so the type of the power supply goes down the hierarchy as the type of the laptop goes down.

Covariance is all very well for method return types, where it fits the Behavioral Subtyping Principle.  But it contradicts the principle for method arguments.  The type of a method argument is like a precondition, since it constrains the values that the caller can pass to the method.  Preconditions may only be weakened in subtypes; this means that argument types can only be generalized in subtypes.  This is called contravariance, since argument types to up the hierarchy in subtypes.  So here we have a conflict between what is theoretically correct (contravariance for argument types) and what is useful (covariance).

There is a large debate on this issue, and different languages treat it differently.  None of C/C++, Java, or C# allow any change in argument types (this is called invariance).  This is, in fact, unrelated to the issue of covariance versus contravariance; it is a consequence of the fact that these languages support overloading (see Overriding and Overloading).  As a result, any change in method argument types defines an overloaded method, which is unrelated to the original method.  Eiffel follows the practical route, supporting covariance of argument types, and incurring the inevitable cost.  This cost is the lack of compile-time type checking for these cases.  In statically-typed languages, including all the ones mentioned above, we expect the compiler to type-check the program so that no type violations will occur at runtime (with some other notable exceptions, such as type casting and some uses of generics in Java).  Covariance creates another exception to this rule.

Of course, preventing covariance in the language doesn't solve anything, since in practice it is still necessary.  What programmers do when the language doesn't support covariance is to write their own tests in subclasses, checking for the correct subtypes and throwing exceptions when they are incorrect.  So even though the program is fully type-checked, there are still type-related exceptions at runtime.  But because these are programmed manually rather than being thrown by the language runtime, language designers feel it isn't their responsibility.  This ostrich effect is, of course, completely pointless.

Thursday, May 5, 2011

Abstract Preconditions

Returning to the example of the Set type, suppose now that its implementation ArraySet uses a fixed-size array to hold the set elements, and is therefore limited in size (or "bounded").  The maximum size, called the set's capacity, is given when the ArraySet object is constructed and the array is allocated.  The method ArraySet.add, which can add a new element to the set, now needs to have a postcondition size() < capacity(), where size() returns the current size of the set and capacity() returns the maximum size.

Side note: in such cases, it is common to provide the size() method, but less careful designers don't provide capacity().  The assumption is that you ought to know the capacity, since you gave this value when creating the set.  But, of course, you may be using a set created by someone else.  Writing contracts often alerts you to the necessity of adding such informative methods.

According to Liskov's Behavioral Subtyping Principle (see The Correct Use of Inheritance), ArraySet may not strengthen a method precondition.  Therefore, Set must also declare the same precondition (or a stronger one) on the add method.  In general, however, sets don't need to be bounded, and therefore this precondition is inappropriate there.  What is wrong?  The first thing to question is the inheritance hierarchy: is ArraySet really a subtype of Set?  More generally, are bounded sets types of sets?  At first blush, the answer is obviously yes.  But on deeper thought, it seems that we have our abstractions a little bit off.  There are really three concepts involved here: general sets, bounded sets, and unbounded sets.  Both bounded and unbounded sets are kinds of sets; bounded sets have the size() < capacity() precondition, while unbounded sets don't.  Again, it seems like general sets must have this precondition as well, to comply with the Behavioral Subtyping Principle.

Certainly, general sets need to have some precondition that subsumes the boundedness condition, but it doesn't need to have the same form.  The abstract meaning of the precondition is that the set may not be full when adding a new element.  In this form, the precondition is applicable to any set; bounded sets may sometimes be full, while unbounded sets will never be full.  So we need to add  to the Set type, which represents general sets, a method full() that returns a boolean value.  The precondition on Set.add will be !full().  At this level, this precondition is abstract: its precise meaning is unknown.  A client that uses an object of this type therefore needs to check that the set isn't full before adding an element.  The implementation of Set.full may provide some default value, or may be absent, in which case the Set type itself is abstract.

ArraySet.full overrides Set.full to return the value of the expression size() < capacity().  In addition, this method has a postcondition $ret == size() < capacity().  Clients of ArraySet now know the precise condition under which they are permitted to call the add method.  In contrast, unbounded implementations of Set, such as HashSet, override Set.full() to return false, with a postcondition $ret == false (or !$ret).  Clients of HashSet therefore know that the precondition on add is really vacuous: it will never be true, and therefore they are allowed to call add under all circumstances.

If the concepts of bounded and unbounded sets are deemed important enough, it is possible to define them as classes (or interfaces) in their own right, and put the relevant contracts there.  In this way, the missing abstractions are crystallized in the design, with precise contracts that obey the Behavioral Subtyping Principle without forcing a formulation of the precondition that is specific to a subtype.

Friday, April 15, 2011

Overriding and Overloading

Two terms that sound similar and are sometimes confused are overriding and overloading.  The former is an essential part of inheritance, and therefore of object-oriented programming; the latter is a syntactic device that adds no expressive power to the language.

Overriding a method means providing a new implementation for it in an inheriting class.  This may be necessary because the subclass has a different representation, which has more information than the superclass.  For example, suppose you are programming an interactive game, in which characters lose health points as a result of exertion or by attacks by other players.  This is implemented through a method damage(int points), which inflicts the given amount of damage.  Now assume that you create a special kind of character, called Merlin, which can make itself invisible as well as operate a force shield that protects it from certain kinds of enemy actions.  However, it can't perform these new functions when its health is too low.  The class implementing the Merlin character inherits from the general character class.  It needs to override the implementation of the damage method, in order to disable invisibility and the force shield if the health level drops below the relevant thresholds.  When the damage method is called for any character, it decreases its health; for Merlin, it may additionally disable his special abilities.

The choice of which code runs for a particular invocation of the damage method is made at runtime.  If the character happens to be Merlin (that is, its type is the Merlin class), the code executed will come from the Merlin class.  If the character's type is another class, the code will be taken from that class, or from the closest ancestor class that provides an implementation of that method.  This is called dynamic dispatch, and is a cornerstone of object-oriented programming.

Merlin's special abilities can be damaged by special types of attacks, which do not otherwise damage his overall health.  It seems natural to add to Merlin's class a new method, damage(Ability ability), which damages the given ability but does not affect other abilities or the health measure.  This is a case of overloading: using the same name for two different methods.  Many, but not all, object-oriented languages support overloading.  Java, C#, and C++ all do, while Eiffel doesn't.  The choice of which code gets executed in a call to the overloaded damage method is made by the compiler: if the argument is an int, the general damage method will be called; if the argument is an Ability (that is, its type is the Ability class), the second will be called.  These types must be known at compile time; if they aren't, the program won't compile.

Because the decision of which overloaded method to call is static, it is known to the programmer writing the call.  Therefore, overloading can always be replaced by renaming methods.  In our example, the new method that damages Merlin's special abilities can be called damageAbility to remove the overloading.  In fact, overloaded methods are implemented using this technique; languages such as C++, Java, and C# have a special notion of method name, which attaches encoded information about argument types to the method name as defined by the programmer.

When used properly, overloading can make programs more understandable.  For example, the + operator is overloaded in Java to represent addition of integers as well as floating-point numbers.  That's perfectly understandable, since the mathematical addition operator applies to all these types and has the same properties for them (associativity, commutativity, etc.).  It would be perfectly fine to use + for adding complex numbers if Java allowed this (and is common in C++, which does).  However, Java overloads + to denote concatenation of strings as well.  This is confusing, and is a design error of the language.  This error is compouned by the fact that + coerces its arguments to more general types, so that x + y will denote floating-point addition if x is an int and y is a float, but string concatenation if x as an int and y is a string.  Worse yet, mathematical properties of addition don't hold for string concatenation, and as a result ("a" + 1) + 2 is "a12" while "a" + (1 + 2) is "a3".

Because overloading doesn't add to the expressive power of the language, and is a trap for the unwary, it should be removed from object-oriented languages.  Unfortunately, the C++/Java/C# family of languages encourage overloading, and even require it in the case of constructors, whose names must be the name of the class.  Suppose you want to create a complex-number class called Complex, and give it two constructors, one receiving the rectangular representation of the number (x and y for x+iy) while the other receives the polar representation (rho and theta for rho*e^(i*theta)).  You want to call these two constructors by descriptive names, such as complexFromRectangular and complexFromPolar.  But in C++/Java/C#, both must be called Complex!  Not only are the names non-descriptive, they also coincide, because both constructors take two arguments of the same types.  I won't describe the contortions that programmers of these languages have to go through to solve this problem.

My recommendation: avoid overloading altogether.  If you really feel it's essential in some case, at least make sure that the overloaded methods have the same semantics (as in the case of mathematical addition).  If you use overloading in other cases (such as addition and concatenation), you will have only yourself to blame when you get confused; and your users will be perfectly justified in blaming you when they, in turn, get confused.

Friday, April 8, 2011

Writing Contracts

In the best thriller tradition, we left our hero, the Set type, in an awkward position at the end of the last episode.  Because Set has a subtype, HashSet, that doesn't allow nulls as values, the method Set.add(Object x) needed to have a precondition x != null.  But Set has another subtype, ArraySet, which does allow null values.  Therefore, we couldn't put a postcondition on Set.get() that guarantees that the returned value won't be null.  The Set type now precludes null inputs, but may return null values; this seems inconsistent.  Yet both contract decisions seem to be forced by Liskov's Behavioral Subtyping Principle!

In such cases, it is often helpful to look for missing abstractions.  Here, there is an obvious correlation between null inputs and outputs: some sets support null values in both cases, and the others don't in either.  The Set type should clearly distinguish between the two types of sets, by means of a query such as supportsNullValues().  The precondition of Set.add(Object x) will now be:

!supportsNullValues() ==> x != null

where "==>" denotes logical implication.  The postcondition of Set.get() will be:

!supportsNullValues() ==> $ret != null

where "$ret" denotes the returned value.  The contract for Set is now consistent in both cases.  If supportsNullValues() is true, the precondition on add and postcondition on get are inactive; if it's false, both are active.

Furthermore, we can now add a postcondition to HashSet.supportsNullValues(): $ret == false (or, equivalently, !$ret).  As a result, developers using HashSet know, by examining the contract, that add must not be given null values and get will never return null.  Similarly, ArraySet.supportsNullValues() will have a postcondition $ret == true (or simply $ret).  Developers using ArraySet will now know that they can provide add with null inputs, and may receive nulls from get.  Developers who use the Set type without knowing the particular implementation can act in one of two ways.  First, they can be cautious and not insert any null values into the set.  Because Set.get will only return elements previously inserted into the set, it won't return null in this case.  (Note that this is part of the contract that is difficult to specify formally, since it involves conditions on the behavior of the set over time.  However, it is an important part of the contract, and should be documented as such, using natural language.)  The second option is to call supportsNullValues() and act according to the value returned.

In the example, Set.add had another precondition that stated that the client is not allowed to add to the set an element that is already in it.  This is legal but not recommended.  In general, preconditions should contain any conditions under which the action of the method is not well defined.  For example, it makes no sense to ask for an element of an empty set, which is why Set.get has a precondition that the set isn't empty.  But adding to the set an element that is already there is perfectly well defined; the set simply doesn't change.  Instead of requiring clients to check this condition before calling add, the implementation of the add method should simply do the right thing and not add the element if it already exists in the set.  A little effort on the part of the implementer can save much effort for clients.

Friday, April 1, 2011

The Correct Use of Inheritance

Perhaps the most subtle relationship in object-oriented programming is that of inheritance.  Many object-oriented languages, including Java, distinguish between classes and interfaces, as a reaction to the problems languages such as C++ faced with multiple inheritance.  This is a complex issue I will address in a later post.  Now I want to focus on the logical relationships involved in inheritance.  In order to avoid the class/interface dichotomy, I will use the notion of a subtype.  A type A is a subtype of B if it inherits from it, whether both A and B are classes, both are interfaces, or A is a class and B is an interface.  (Java uses the verb "extends" for the first two cases and "implements" for the third.)

An element of a subtype A may be used whenever an element of the supertype B is indicated.  For example, suppose we have a type called Set that represents mathematical sets.  It has a number of implementations inheriting from it, including ArraySet, which holds the elements of the set in an array, and HashSet, which holds the elements in a hash table.  The Set type has various methods, including add(Object x), which adds an element to the set; get(), which returns an arbitrary element of the set; and has(Object x), which returns true if and only if x is in the set.  Since a set can't contain the same object multiple times, we put a precondition on add(Object x) requiring that x is not already in the set: !has(x).

Unfortunately, our HashSet implementation doesn't support null as an element in the hash table, so we add another precondition to add(Object x), stating that x can't be null: x != null.  Now, suppose we have a client C that uses our code, and receives a parameter s of type Set.  When the programmer of the client code reads the contract for Set, he sees only one precondition: !has(x).  He will certainly be careful not to insert the same element twice, for example by using the following code:

if (!s.has(element)) s.add(element);

But he has perfect confidence in the following piece of code:

if (!s.has(null)) s.add(null);

He will be rudely surprised, therefore, when this actually bombs on him somewhere inside the call to add in our HashSet implementation.  He may not even have known about the existence of HashSet, since he received the object s, which happened to have this type, as a parameter.  He is the proverbial innocent bystander, and should be protected from this kind of problem.  After all, he kept his side of the contract, and has every right to expect us to keep our side!

This demonstrates a general problem, which will manifest itself whenever the precondition of a method in a subtype is stronger (that is, requires more) than that of the supertype.  The rule must therefore be that subtypes are not allowed to strengthen method preconditions.  We will have to add a precondition to Set.add, to warn potential users that some subtypes may not accept null as a parameter.  Our ArraySet implementation can now weaken the precondition by removing this requirement.  This is perfectly legal; clients that use ArraySet objects through the Set supertype will not be allowed to call add(null), but clients that use ArraySet directly will be able to enjoy the extended functionality.

Since we restrict Set.add not to accept null parameters, it looks like we can now put a postcondition on Set.get, stating that it will never return null.  This will work well for HashSet, but will fail for ArraySet, which does accept null elements.  By adding this postcondition to Set.get, we have again harmed innocent bystanders; in this case, a client of Set who happens to receive a HashSet object as its own parameter.  Such a client feels perfectly justified in writing the following code:

if (!s.isEmpty()) System.out.println(s.get().toString());

He is careful to check the precondition of get, which states that it can't be called when the set is empty, and he is therefore entitled to rely on get's postcondition and to expect that no NullPointerException will be raised on the call to toString.  He may not even be aware of the existence of ArraySet, and of the fact that it doesn't have this postcondition.  The rule for postconditions therefore must be that postconditions must not be weakened by subtypes, although they can be strengthened; this enables clients of the subtype to benefit from extra features it supports.

These rules are consequences of Liskov's Behavioral Substitution Principle, which states that properties of the supertype must still be true of subtypes.  This principle is true of all inheritance hierarchies which allow subtypes to be used as elements of their supertypes.  After all, this is what a subtype means: if A is a subtype of B then every instance of A is, by definition, also an instance of B.  (This does not apply to private inheritance in C++, which doesn't imply a subtype relationship, but it does apply to public inheritance in C++, as well as to all inheritance relationships in Java, C#, and similar languages.)  The principle is true whether or not the developer uses design by contract, and writes contracts explicilty; a violation of the principle is still a bug, and a nasty one at that.  Using design by contract makes it easier to prevent such violations in the first place.

Any developer who is ignorant of Liskov's Behavioral Substitution Principle can't claim to understand object-oriented programming, and should be prevented from writing any kind of inheritance relationship!

Tuesday, March 8, 2011

Object-Oriented Contracts

The most basic contract includes preconditions and postconditions for each modular unit (function, subroutine, method, or whatever it is called in a particular programming language).  In object-oriented programming (OOP), there is an additional part of the contract, as well as interesting relationships between contracts in an inheritance hierarchy.

The basis of OOP is the association of operations with data types (or what are often called records in non-OO languages).  Every object has a state, composed of the values of its fields (also called attributes).  Operations on the object (often called methods) can access this state, and some also modify it.  Every object represents some abstraction.  These abstractions can be related to the application domain; for example, "employee", "salary", "student", "loan", or "flight segment".  They can also be internal to the implementation; for example, "set of nodes visited in a graph", "queue of tasks to perform", or "database connection".  It must always be possible to relate every concrete state of an object to the abstract entity it represents.  The representation function takes a concrete state to the abstract object it represents.

In general, however, there may be possible object states that have no associated abstraction.  For example, consider an implementation of a stack that uses an array rep to hold the objects in the stack together with an integer field count that represents the number of elements currently on the stack.  If count is negative, the object represents no abstract object, since there is no stack with a negative number of elements.  The representation function can therefore be partial: it isn't defined for all possible object states.  It is important to give a precise characterization of the domain of the representation function, since this distinguishes meaningful object states from meaningless ones.  If you ever find a meaningless object in your program, you know there's a bug in it.  The characterization of the domain of the representation function as a logical formula is called the representation invariant; the word "invariant" signifies that this is a property that must be true for all objects at all times.

The third part of an object-oriented contract is therefore the class invariant.  A crucial part of the class invariant is the representation invariant, but it can contain other assertions as well.  The class invariant must be true for all objects of the class at all times except while a method of the class is executing.  The last proviso is necessary since a method of the class may update several fields of the object; between the first and last updates, the object may be in an inconsistent state.  This is unavoidable, but causes problems, especially with concurrent programs.  Other problems arise from the fact that the invariant of one object may depend on other objects; for example, if an object that represents an agenda contains a list of items, each item must point back to the same agenda as its owner.  This can be defined as an invariant of the agenda, but operations on the items may invalidate this property.

In simple cases, proving that the class invariant always holds (that is, it is really invariant) is similar to an inductive proof: you need to show that the invariant holds immediately following the construction of the object, and that it is maintained by every operation of the class.  However, because of the problems mentioned above this isn't enough in many cases.  Still, it is a necessary condition, and one that is often easy to check.  Ignore it at your peril!

Thursday, March 3, 2011

What's in a Contract

The term "design by contract" is due to Bertrand Meyer, and is at the heart of his Eiffel language.  But the concept of using assertions to specify the behavior of small units of software (such as function calls) is much older, and not specific to the object-oriented paradigm.  Hoare logic, which I mentioned previously, consists of triples specifying program fragments with their preconditions and postconditions.  With objects, and especially with inheritance, come new features, but I want to start with simpler languages.

At the most basic level, the contract of a software unit (which I will refer to as a function) consists of a precondition and a postcondition.  The precondition is an assertion that must hold on entry to the function, and the postcondition must hold on exit.  Obviously, it is the caller's responsibility to ensure that the precondition is satisfied before making the call.  If that is not the case, then the function is under no obligation whatsoever; it may enter an infinite loop, return the wrong result, or blow up the computer.  In particular, it is not obliged to satisfy the postcondition.  However, if the precondition was true on entry, it is the supplier's responsibility to ensure that the computation of the function terminates, and that the postcondition holds on exit.

This scheme separates responsibilities very precisely between the client and supplier of the function, and enables modular reasoning.  As a simple example, suppose you need to use the sine function, and discover that the implementation you have has a precondition that the angle is between 0 and 90 degrees.  If the angle you want to compute the sine for happens to be in that range, you can go ahead and use the existing function.  If not, you can use elementary trigonometry to convert your angle to one that is between 0 and 90 degrees and has the same sine value, possibly with a different sign; in that case, you have to invert the sign yourself.  Of course, implementations of this function you are likely to encounter are able to compute the sine for any angle, and have no precondition.  It is, of course, crucial to know the difference between functions that have this precondition and those that don't.  If you call the former without being aware of the precondition, you are likely to get wrong results, which may or may not be easy to detect.

As another example, suppose you have a point (x, y) in the plain and you want to know the angle a line from the origin to (x, y) has with the positive direction of the x axis.  Mathematically, this is arctan(y/x).  However, the arctan function has a range of only 180 degrees; any implementation has to choose which range of values it will return.  Typical values might be –90 to 90 degrees, or 0 to 180.  This will be the correct angle only for two out of four quadrants in which the point (x, y) lies, and so you will need to convert the result to the proper quadrant based on the signs of x and y.  Here, you must know the postcondition of the arctan function, which will tell you what is the range of possible results.

If you have contracts for your function and those that it calls, you can now think of verifying that your implementation is correct.  You need to do is prove that your function fulfills its own contract, but instead of the proof having to involve all functions you call, and everything they call further down the chain, you only need to consider the contracts of called functions.  This is not necessarily easy, but it is much easier than having to consider all called code, which can be very large.  One thing you will still need to provide are loop invariants and variants for any loop in your own code.  (In Eiffel, these are part of the language, although they are not considered part of the contract.)

This leads to a modular proof style, in which each function can be verified separately.  Of course, you can only do full verification if contracts are complete specification.  But even if they aren't, you may still be able to prove important properties of your code.  Without contracts, this is a formidable task, which is very rarely even attempted.  It's a pity that many programmers are not aware of this possibility.

Saturday, February 26, 2011

Design by Contract

As I mentioned before, the state of the art in logics of programs isn't yet capable of verifying large-scale programs, even with the best theorem provers currently available.  As a result, the vast majority of programmers don't use any formal methods during software development, and the term "verification and validation" refers mainly to testing.  What many people don't realize is that there is a middle way in using formal methods; it is possible to improve software quality by using limited but effective parts of the theory.

Design by Contract is a method of using assertions in object-oriented programs to specify their behavior.  This importance of this specification is first and foremost methodological.  The mere fact of writing these specifications and having them available in libraries focuses programmer attention to the correct use of program interfaces.  Many errors stem from misunderstandings; a simple but common example is when the writer of a method assumes that a certain parameter may not be null, and omitting to check or handle the case of a null pointer, while the client of the method assumes that the method does something reasonable when given a null value.  The contract defines precisely the responsibilities of the client and supplier; in particular, if the method doesn't handle null pointers, it must declare that fact in its contract.  This contract is a formal part of the interface, and (unlike the source code) must be available to all programmers who use the method.  They, in turn, must look at the contract and behave according to their responsibilities in it.

This simple process of writing the contract for suppliers and consulting it when writing clients can prevent many errors before they are even made.  In order for this to work, the contract doesn't even need to be completely formal, as long as it is precise.  And it has to be simple enough to be easily understood, otherwise it is worthless and may even cause damage.  However, if developers take the trouble of writing formal contracts, they can be rewarded by more than methodological advantages.  Indeed, if contracts are written using a formal notation, it is possible to create tools that give additional benefits.

First, it is possible (though by no means trivial) to write tools that check the contract at runtime.  Many errors are time bombs waiting to happen.  For example, storing a null value in a field of an object may by itself cause no harm.  Only later, when the field is accessed under the assumption that it can't contain null, does the software present an error (in Java, it would be a NullPointerException, nicknamed NPE).  The process of finding where in the program the original null value was assigned can be difficult and time consuming.  A debugger is a useful tool, but it takes skill and time to use it to track such errors, because in many cases the method call that assigned the null value has already completed its execution and is no longer available on the stack to be examined by the debugger.  A tool that checks contracts at runtime would alert the developer to the problem much earlier, thus speeding both discovery and fix of the problem.

Contracts also enable some level of verification.  Instead of having to prove that the program as a whole has some desired properties, the method and class contracts provide "stepping stones" to a (partial) proof of correctness.  If we can prove that all calls in the program satisfy the method of the called method, we will have increased our confidence in the correctness of the program.  Because proving such properties is much easier than proving correctness of a whole program, tools such as theorem provers can be more effective there.  While there aren't yet any commercial tools that do that, there is a lot of research on this topic, and I expect the benefits of automated theorem provers to become practical first in this limited arena.

One of the important advantages of design by contract is that it provides incremental benefits.  Programmers can get immediate benefit from simple contracts, such as those having to do with null pointers.  By adding more details to the contract, they can get more benefit.  It isn't necessary (or even useful) to attempt a full specification in order to use this methodology.

Purists may scoff and say that this is not what logicians mean by verification, and is no better than testing.  I would agree with the first part of this statement, but not with the second.  Contracts are complementary to testing, and both are necessary.  (I will have more to say about this in the future.)  If you insist in full verification, meaning proofs of complete correctness, you will not bring practical value to software development for a very long time.  Design by contract is a practical and useful methodology that should be employed by every software developer.  I will discuss some of the reasons why this is not yet the case in a future post.

Thursday, February 24, 2011

Logics of Programs

From the early days of computers, it was obvious that programming is difficult and error-prone.  The mathematical nature of computer programs leads to the obvious thought that the same methods used to make sure that mathematical theorems are correct can help ensure that programs are correct.  This has led to a lot of research on logics for computer programs.  There are many abstractions of programs, and correspondingly many logics that deal with them.

At the simplest level, it is possible to think of a program as being executed sequentially, instruction by instruction.  This is almost never true; even the simplest processors typically interrupt the sequential flow to handle external events, such as pressing a key on a keyboard or control panel connected to the computer.  However, if we trust the operating system to make a clean separation between such interrupt handlers and the running application, then it is possible to think at a higher level of abstraction of some applications as running sequentially.  But many application are concurrent by nature, and these are the most difficult to get right.  There are many examples of small programs with two parallel components that have subtle and hard-to-find bugs in them, and parallel or concurrent systems are notoriously hard to program.

Even the simple sequential model requires special mathematical treatment, because it has to represent the changing state of the computer.  Every operation in a program changes the state of the machine in some way; otherwise, there would be no point in executing it.  There are many types of changes: first, the location of the next instruction (the "program counter") changes, either to the next instruction in the sequence, or to a different place for instructions that change the flow of control.  Some instructions change the contents of memory, registers, or various control bits (for example, one that records whether the previous comparison resulted in a zero value).  High-level languages hide some of these details, but most (with the notable exception of functional and logic languages) still have the notion of changing state.

A logical model of programs with changing state must represent this state explicitly.  If x is a variable in a program, the formula x=3 is meaningless with respect to a program, because it may true or false in different states of the computation of the same program.  The logical model must have an explicit representation of states, and some syntactic way of referring to different states.  One of the best known formalisms is called Hoare logic, after C. A. R. Hoare, Turing prize winner and famous, among other things, as the inventor of the quicksort algorithm.

Every formula in this logic is a Hoare triple of the form {P}A{Q}, where A is a (possibly partial) program and P and Q are logical statements about the variables of the program.  This formula asserts that if P holds in the state just prior to the computation of A, and if this computation terminates, then Q will be true in the resulting state.  For example, a (somewhat simplified) way of describing the effect of an assignment x := e in a program is by the formula {e = c} x := e {x = c}, where c is a constant.  (Note the crucial difference between logical equality and assignment, so unfortunately blurred by languages from Fortran to Java that use "=" for assignment.)  This formula asserts that if the value of the expression e in the state preceding the assignment is c, than the value of x after the assignment is also c.

As another example, consider how to prove that a certain formula holds for a conditional: {P} if C then A else B {Q}.  If the condition C is true in the state before the computation, then the conditional will choose the subprogram A.  In order for the desired formula to hold, it must be the case that if P and C are true before the computation, Q must be true afterwards (provided, as always, that the computation terminates).  Formally, {P and C} A {Q} must be true.  Similarly, {P and not C} B {Q} must hold to guarantee the desired result in the case where the condition C is initially false.  And indeed, the desired formula follows from the latter two in Hoare logic.

As is the case for most logics of programs, the difficulty lies in reasoning about loops (and recursion).  In order to prove a property of a loop using these logics, it is necessary to come up with a loop invariant.  This is a property that always holds at the beginning of each iteration.  In order to prove a property P following the loop, you prove that the invariant holds on entry to the loop, that it sustains itself across one iteration, and that if you exit the loop with the invariant holding, P must be true at that point.

This is very similar to a proof by induction; the first step is the base of the induction, and the second is the inductive step, in which you prove that if the invariant holds in the beginning of one iteration, it also holds on the beginning of the next one.  As is the case with inductive proofs, coming up with a suitable invariant is difficult.  Much of the work of automatic theorem provers (for mathematics as well as for logics of programs) is discovering the induction hypotheses or, equivalently, the loop invariants.  (In fact, in one of my favorite theorem provers, ACL2, these are exactly the same.)

This kind of reasoning is very low-level, and it is impossible to use it to prove the correctness of a large program manually.  Automated theorem provers are a great help, but the process is still difficult and time-consuming, and requires high skills in logic and the specific theorem prover.  It is therefore limited to relatively small but safety-critical programs.  Wide application of automatic theorem proving to large-scale programs seems to be far in the future.  As a result, there is a chasm between the people developing logics of programs and associated theorem provers, and the general population of software developers, most of whom are not even aware of the existence of these methods and tools.  But there is a middle way; I will discuss it in the next post.

Tuesday, February 15, 2011

Thinking Formally

If there is no alternative to clear thinking about our programs, we should investigate what we can learn from the science of organized thought.  Logic has a long history, with equal roots in philosophy and mathematics.  In some sense, computer programs breath life into abstract logic and give logic the abilitiy to do real things.  And, just like Pinocchio, they often misbehave!

One of the fundamental concerns of logic is the study of proofs: how can we convince ourselves in a precise and mechanical way of the truth of some claim.  The desire to do so resulted from the need to distinguish between valid and invalid mathematical proofs, expressed using prose.  Sometimes, hidden assumptions were not recognized, leading to faulty proofs.  For example, consider the first axiomatic proof system, Euclidean geomerty, studied in middle school.  It has five axioms, including the famous "parallel postulate".  It is an amazing achievement, which is still an excellent example of how to reason formally, and was the source of many important advances in mathematics.  But Euclidean geometry has many hidden assumptions, which are glossed over by the use of informal drawings.  About 2200 years after Euclid, several mathematicians tried to completely formalize geometrical proofs.  David Hilbert's axiomatization of geometry includes twenty axioms instead of Euclid's five.  One of these states that between every two different points on a straight line there is a third point on the same line.  This was so obvious to Euclid that he didn't even think to mention it in his list of axioms.  But axioms are exactly those self-evident propositions on which the theory is based, and the proof isn't logically sound unless all these are stated explicitly.

Logic is closely tied with computation, although the connection was made explicit only towards the 20th century.  A formal proof is a series of deductions, each of which is based on axioms or previously-proven propositions, using a set of deduction rules.  It is supposed to be so simple that it can easily be checked for correctness, without any understanding of its subject matter, whether it is geometry, number theory, or abstract algebra.  In other words, it should be easy to write a program that checks a formal proof; all you need to do is verify that each step follows from the axioms and from previous steps according to the rules.

But wait!  Is there a limit to the complexity of the axioms or of the rules used in the proof?  Without such limits, the whole exercise falls apart.  For example, imagine a system of axioms that contains all true statements in number theory.  This is, of course, an infinite set.  How difficult is it to write a program to check whether a given statement is an axiom?  Well, it turns out to be impossible!  (This is a consequence of Gödel's first incompleteness theorem.)  So, if we are to be able to write a program to check formal proofs, we need to restrict the possible sets of axioms.  For example, in Gödel's theorems, the set of axioms is required to be recursively enumerable; this means that there is a (finite) program that can generate all axioms and never generates anything that is not an axiom.  Of course, this program will never terminate if the set is infinite.  That's acceptable, as long as the program is guaranteed to produce every axiom if you wait long enough.

So now, a concept about computability has entered the study of logic itself.  It is only fair, then, to apply logic to computation; and, indeed, the study of logics of programs is an important field in computer science.  There are even computer programs called theorem provers, which have been used to prove difficult mathematical theorems as well as to verify the correctness of other programs.  However, this has still not affected the day-to-day work of most software developers.  In the next post, I will discuss why this is the case and what formal tools can still be used for software development today.

Tuesday, February 1, 2011

Responsible Adults

If some programmers are not behaving maturely (see Programmers as Children), what can we do as responsible adults to make things better?  There are many answers, and they depend on who is meant by "we".

First, however, I must mention what is perhaps Fred Brooks's best-known paper, "No Silver Bullet: Essence and Accidents of Software Engineering" (IEEE Computer, April 1987).  In it, Brooks claims that software engineering is hard in essence, and therefore nothing can make it easy.  While a lot of progress has been made on what Brooks calls accidental difficulties, which have to do with the way represent the abstract concepts of software engineering, these only expose the essential difficulty of the conceptual structures underlying software development.  The silver bullet is the only effective way to kill werewolves; Brooks claims that no such magical cure can be found for the ills of software development.

This is, of course, a provocative statement, and it sparked lively discussion.  But in spite of various opinions to the contrary, no silver bullet has yet been found.  This is not to say that progress has not been made.  Even if the essential problem is difficult, programmers are still being asked to develop larger and larger systems.  They should use whatever cures are available, even if they cure only the symptoms and not the disease.

So what can we do?  The most important thing is to be aware of the difficulties and look for ways to avoid them.  There is no substitute for careful thinking, and no tool will do that for us until true artificial intelligence is available.  (While great progress is being made – for example, see the Watson Jeopardy challenge – don't hold your breath waiting for "automatic programming" in the sense of the computer writing its own programs.)  More concretely, there are many practices we can follow in order to make software development easier, more productive, more reliable, and also more fun.  These fall into several categories, including methodology and process, languages and tools, and education.  I will have something to say about each of these, but only a book the size of an encyclopedia can do justice to them all.

There are many software-developoment methodologies, and none is perfect.  In fact, Daniel Berry of the University of Waterloo has a paper titled "The Inevitable Pain of Software Development: Why There Is No Silver Bullet".  Another provocative paper, and one of the responses to the one by Brooks, this one focuses on why no methodology can ever be a silver bullet.  Berry claims that every methodology has some inherent pain associated with it.  It may be all pain, as in the maligned waterfall method, or the pain could be just a small part of it.  In any case, it is unavoidable, and developers therefore evade it by not following the methodology rigorously, thus defeating its purpose.  But this is like parenting: although no one knows of a sure method to raise perfect children, that doesn't keep us from trying.  In the same way, the fact that there is no perfect development methodology doesn't mean that we should not attempt to find the best process for our situation and follow it to the best of our ability.

As I mentioned before, I'm a great believer in tools, and spend a lot of time developing new tools.  I have also been involved with computer-science education, at both the university and high-school levels, and I have a lot to say about these topics.  Stay tuned!

Wednesday, January 19, 2011

Training Wheels

Parents put training wheels on bicycles when children first start learning to ride.  Once the children have enough experience and confidence, however, the training wheels get in the way and are removed.  You will never see an experienced cyclist using training wheels.  But you will often see experienced computer users, including software developers, using features meant for novices.

There is certainly a lot to learn about using computers, especially with complex applications such as development environments.  It makes sense to help people learn and adjust to the proper use of these applications.  But as they acquire experience, they should be able to remove the training wheels, lest they become a burden.  And giving people help on the learning curve should not mean dumbing down interfaces!

The mouse is a case in point.  I believe that taking the mouse away from developers' workstations will improve developer productivity more than any other action, possibly after a small adjustment period.  I keep seeing developers waste their time using the mouse to hunt for a specific menu entry, which may be hidden under one or more menu levels.  This is becoming more and more difficult as screen resolutions get higher and the clickable area gets proportionately smaller.  Instead, a short sequence of keys (two to five keystrokes) will achieve the same result in less than a second.

Of course, in order to take advantage of the keyboard you need to memorize quite a few such sequences.  This is impossible for a casual user of the application, but happens easily and automatically for an experienced user, given some learning time.  It's just like riding a bicycle: at first you need to think explicitly about every action you take, but after a while it becomes automatic.  And this is true for any repetitive activity, such as walking, swimming, driving, or touch typing: if you practice enough, it becomes automatic to the point where trying to think about it is actually confusing.  Try to think about the mechanics of walking while you do it, and you will lose your step.  I touch-type, and when I'm asked about the location of some key I need to mimic the typing movement in order to get reminded of the location through my muscle memory.  Similarly, I know a great many keyboard shortcuts for applications I use frequently, but I find it hard to articulate the key sequences.

I'm an avid  Emacs user.  Emacs (The One True Editor) has many excellent qualities, but it has a steep learning curve because of its many keyboard shortcuts, which can use any combination of the Ctrl and Alt modifiers with any other key, and also have multi-key combinations.  (The Lisp Machine keyboard, also called the space-cadet keyboard, had two additional modifiers, called Super and Hyper).  Once you learn these sequences, however, you can make text fly on the screen.  I remember sitting next to one expert when I first got to MIT; I was staring at the screen and seeing the program transform itself, and I couldn't relate what I was seeing to the keystrokes used to create that effect.  I have since had that effect myself on other people.

For many years I taught courses based on the excellent book Structure and Interpretation of Computer Programs, which uses the Scheme programming language.  My interpreter of choice for that course was MIT Scheme, whose editor component is based on Emacs.  However, students prefered to use Dr. Scheme, simply because it has a more conventional editor.  I can understand their reluctance to learn the Emacs keystrokes, since they thought this skill would only be useful in that single course.  But there is no justification for professional developers to avoid learning skills that will make their work easier and faster.

Unfortunately, the tendency among tool developers is just the opposite.  They target their interfaces at the lowest common denominator, thus making the tools attractive to the novice user but hampering the experienced user.  Mouse-based interfaces are today much preferred over keyboard shortcuts, to the extent that Microsoft has eliminated the indications of which keyboard shortcuts are available (using underlined characters); these can be revealed through the "accessibility options" dialog, as though you are somehow mouse-challenged if you want to use the keyboard.  (Hint: look for "Show extra keyboard help in programs".)

Because of this behavior of tool writers, users have no chance to learn the keyboard shortcuts and make their use automatic.  Thus, they are stuck with the laborious use of the mouse and never outgrow their training wheels.  As always, you should use the right tool for the job.  The mouse is good for applications you only use occasionally, and is indispensable for some graphics applications.  But keep your hands on the keyboard for software development!

Monday, January 10, 2011

Pointers: Two Views on Software Development

I recommend reading the article Risks of Undisciplined Development by David Parnas in the Inside Risks column of the October 2010 issue of Communications of the ACM.  David Parnas is famous (among other things) for opposing Reagan's Stragegic Defense Initiative (SDI, nicknamed "Star Wars") on the grounds that software engineering state-of-the-art was not sufficient to build such a system (it still isn't).  This article urges using, and teaching, disciplined development practices.

On a lighter note, here is another view of a good software development process:

This is taken from xkcd, which I follow regularly (together with Dilbert).

Sunday, January 2, 2011

It Looked Good When We Started

"Well, in our country," said Alice, still panting a little, "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing."
"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
— Lewis Carrol, Through the Looking-Glass, and What Alice Found There  

The Red Queen's race is often quoted in discussions about the increasing rate of change in our lives, which demands running just to stay in the same place.  This is particularly true in software development, where change is the only constant factor.  Any successful software system will need to be changed, and, if it can't, will fall by the roadside.  Change can be due to many factors, such as new legislation, new standards, new business opportunities, the need to support a larger volume of business, or the need to move to a new platform (for example, when an operating systems is no longer supported by the vendor).  The introduction of a software system into an environment that didn't have one before changes that environment; that change ripples to create new requirements.  Perhaps initially the system was only meant to replace manual work, as when using computers to track customer orders instead of doing it on paper.  But once the system is in place, users realize that it can do much more; for example, letting customers enter their own orders through the internet.  Adding that functionality may trigger the idea that customers may now be able to track the order fulfillment process through the internet.  Every change is the catalyst for another one.

There is a common misconception that software is easy to change.  It is certainly true that distributing software updates is much easier (and cheaper) than changing hardware.  However, making changes in software correctly while preserving previous functionality and without introducing new bugs is very difficult, as all experienced software developers know.  (In spite of that, they are often content to get their software more-or-less working, and rely on the users to discover problems and on future releases to fix them; see Programmers as Children.)

Obviously, it makes sense to prepare for future changes, so as to make the inevitable changes as easy to make as possible.  Some requirement changes, however, are beyond prediction, and many predictions that seem obvious at the time turn out to be wrong.  These unfortunate facts have far-reaching implications on all aspects of software development.  They mean that it is impossible to capture the precise system requirements, since these will change unpredictably.  They also mean that it is impossible to prepare for all future changes, since every design decision renders some future modifications easier while making others more difficult.  But they do not mean that we should just build for known requirements and ignore the possibility (or rather, inevitability) of future changes.

The Agile Development community takes an extreme view of our inability to predict future changes, and uses the rule "do the simplest thing that could possibly work" (abbreviated DTSTTCPW).  This rule is based on the assumption that keeping the code as simple as possible will make future changes easier; it also assumes the meticulous use of refactoring to keep the code as clear as possible while making changes.  (I have a lot to say about refactoring, but that will have to wait.)  However, there are some decisions that are very difficult to back out of, and I believe that most agilists will agree that planning ahead on these decisions does not violate DTSTTCPW.

One of the earliest decisions that must be made, and one that has a profound influence on all aspects of the development process, is about tooling, including the programming languages to be used.  Changing the language after a considerable amount of code has been written is very difficult.  Changing languages in a system that has been in operation for over twenty years and contains several million lines of code is almost impossible; this almost amounts to a complete rewrite of the system.  Much of the knowledge about many such systems has been lost, and the original developers are probably doing other things, possibly in other companies, and may even be retired.  Yet there are many large-scale critical systems that are over twenty and even thirty years old.

This received world attention in the 1990s, with the famous Year 2000 (or Y2K) Problem.  Programmers in the 1960s were worried about the size of their data on disk, since storage sizes were much much smaller in those days.  It seems incredible that today you can get a terabyte disk for well under $100; in those days you would be paying by the kilobyte.  So the decision to represent dates using just two digits to represent the year was very natural at the time.  In fact, it can be argued that it was the right decision for the time.  It was inconceivable that these systems would last until the year 2000; it was obvious that they will have to be replaced well before then.  Programmers kept to this design decision during the 1970s, 1980s, and, incredibly, even sometimes in the 1990s, although by then storage sizes have grown, prices came way down, and 2000 was not the far future any more.

To the amazament of the 1960s programmers, some of the systems they wrote then are still in operation today.  Naturally, these survivors contain some of the most critical infrastructure used by banks, insurance companies, and government bodies.  All of these had to be revamped before 2000 (actually, some even before 1998; for example, credit cards issued in 1998 expired in 2000).  This cost an incredible amount of money, and quite a few companies made their fortunes from Y2K remediation.  The fact that this effort was largely successful, and none of the doomsday predictions made before 2000 came to pass, should not detract from the importance of the problem and the necessity of fixing it.

Now switch points of view, and think twenty years into the future.  The code we are writing today is the legacy of the future; what kind of problems are we preparing for ourselves or the people who will be saddled with these systems after us?  I claim that we are being as incosiderate as the programmers of the 1960s, and with less justification.  After all, we have their lesson to learn from, whereas they were pioneers and had no past experience to draw on.

These days, we are building systems using a hodgepodge of languages and technologies, each of which is rapidly changing.  A typical large-scale project will mix code in Java for the business logic, with Javascript for the browser front end, SQL for the back-end database, and one or more scripting language for small tasks.  A legacy Cobol system may well be involved as well.  The Java code will be built on top of a number of different frameworks for security, persistency, presentation, web services, and so on.  Each of these will change at its own pace, necessitating constant updates to keep up with moving technologies.  How will this be manageable in twenty or thirty years?

Think of a brand new application written by a small startup company.  They are very concerned about time to market, but not really thinking twenty years into the future (after all, they want to make their exit in five years at the most).  And their application is quite small, really, just perfect for a scripting language such as Perl, Python, or Ruby on Rails.

If they fail, like most startups, no harm has been done.  But suppose their idea catches on, and they really make it.  They keep adding more and more features, and get more and more users.  Suddenly they have many millions of users, and their system doesn't scale any more.  Users can't connect to their site, transactions get dropped, and revenue is falling.  Something must be done immediately!  The dynamic scripting language they used was very convenient for getting something running quickly, because it didn't require the specification of static types, classes could be created on the fly, and so on.  But now that the application consists of 20 million lines of code, it is very difficult to understand without the missing information.

This scenario is not fictional; many successful internet companies went through such a phase.  In fact, even infrastructure tools went through such changes.  For example, consider PHP, one of the most successful web scripting languages today.  PHP 4 was downward compatible with PHP 3, although it had a number of serious problems, because it was deemed inappropriate to disrupt the work of tens of thousands of developers.  PHP 4 was wildly successful, with the result that when incompatible changes had to be made in PHP 5, hundreds of thousands of developers were affected, and adoption of PHP 5 was delayed.

I think that many times developers use the wrong tool for the job (see The Right Tool for the Job) because it looks like the right tool when development starts.  Changing circumstances then make it the wrong tool, but switching tools can be extremely costly.  With hindsight, it was obviously a mistake to base the whole application on a scripting language in the scenario described above, even though it looked like a good idea at the time.  Can we use foresight to avoid such mistakes in the future?