Monday, May 30, 2011

The Weakest Link

The May 2011 issue of Communications of the ACM contains an article called Weapons of Mass Assignment by Patrick McKenzie.  The article describes McKenzie's security analysis of the pre-release source code of Diaspora, an attempt to build a "privacy-aware Facebook alternative".  The Diaspora home page makes the following claim: "Inherently private, Diaspora doesn’t make you wade through pages of settings and options just to keep your profile secure."  Unfortunately, McKenzie was "struck by numerous severe security errors", which were "serious enough to jeopardize the goals of the project."

The first kind of errors McKenzie reports are a result of confusion between authentication and authorization.  Authentication means that the system has identified the user using some secure mechanism, so that nobody else can impersonate that user.  Authorization, on the other hand, refers to the set of operations that a user can legally perform.  Naturally, authorization is meaningless without authentication, since you can't refer to the operations a user is permitted to perform without authenticating that user.  But authentication isn't enough; once you know who the user is, you should allow him to perform only operations he is authorized to perform.

In the code release that McKenzie examined (the "pre-alpha developer review"), an authenticated user could delete photos based on their IDs.  If you knew the ID of somebody else's photo, you could delete it, and discovering photo IDs was easy.  Here is an example of a URL (from another source) that contains IDs: http://ark.intel.com/Product.aspx?id=52215&processor=i7-2600S&spec-codes=SR00E.  The part that follows the question mark (shown here in bold) contains a list of parameters and values, which the server-side script accesses to figure out what to do.  When you see such URLs, you can often figure out what the IDs refer to, and you can replace IDs in them to create various effetcs.  In Diaspora, for example, you could delete one of your own photos to see what that request URL was like, and then replace the photo ID to delete somebody else's photo.

Diaspora uses Ruby on Rails, and suffers from the security vulnerabilities of that platform.  In particular, McKenzie points to the Rails feature of "mass update", which is a convenient but dangerous programming tool.  It takes a list of keys and associated values and calls the related accessor method for each key.  This allows an attacker to add keys that the developer didn't intend to be modified, thus creating serious security vulnerabilities such as allowing one user to take over another's account.  If mass update is used on values received from an HTML form (which is often the case), all an attacker needs to do is modify the form to send malicious key-value pairs.  Because the form is presented on the attacker's machine, this is an easy task.

McKenzie details several other vulnerabilities, which allow attackers to discover or replace another user's encryption key, thus allowing them to decipher all that user's previous and future communications with the server.  This is quite serious in an application whose first priority was security and privacy.  A chain is only as strong as its weakest link, and this applies first and foremost to security, where an attacker will deliberately go after the weakest link.  The fact that Rails has mass update enabled by default should make you suspicious of any application built on Rails, since it requires positive action by security-conscious developers to disable mass update.  The more issues developers need to be careful of, the greater the chance that they will miss some, creating security vulnerabilities.  This makes me question whether Rails is a good tool for developing secure web applications (see The Right Tool for the Job).  This is unfortunate in tool whose main purpose is to make web development easy.

It remains to be seen whether Diaspora will finally be secure.  McKenzie ends his article on a positive note: "This is not a reason to despair, though: every error fixed or avoided through improved code, improved practices, and security reviews offers incremental safety to the users of software and increases its utility."  I am less optimistic: an application that boasts its security needs to designed for security from the first, not debugged and reviewed to decrease its vulnerabilities after the fact.

Sunday, May 22, 2011

Covariance and Contravariance

Type declarations on method parameters and return values are also constraints on the possible values that can be passed between a method and its caller.  As such, they form part of the contract for the method.  This is, in fact, a required part of the contract in statically-typed languages.  (Unfortunately, this is typically the only part of the contract that gets written.)

Suppose you have a class called Complex representing complex numbers, and a method that adds two numbers and returns the result.  In an object-oriented fashion, the method signature will be Complex add(Complex other), and a call c1.add(c2) will add the numbers c1 and c2 and return the sum.  The type declaration "Complex other" in the method signature is a precondition that requires callers to pass only objects of type Complex to this method, and the declaration that the result is Complex is a postcondition ensuring that only Complex objects will ever be returned by the method.

Normally, we don't really think of type declarations as a contract, since they are enforced by the compiler and no further action is required by the programmer to ensure that the contracts are indeed satisfied.  But if you think of type declarations as contracts in the context of Liskov's Behavioral Subtyping Principle, they have interesting consequences.  One is that subclasses are only allowed to strengthen postcondition but never weaken them.  This means that subclasses may return more specialized types than Complex.  This is not surprising; it is quite likely that there are several implementations of complex number, all of which are subtypes of Complex; for example, ComplexAsCartesian and ComplexAsPolar, referring to the two most common representations of complex numbers.  It is perfectly permissible for the add method of each subclass to return elements of that subclass (although that is not always the best policy).  But can you redefine the return type of ComplexAsCartesian.add to be ComplexAsCartesian rather than just Complex as in the superclass?  The answer depends on the programming language, and even on the language version.  For example, Java originally didn't allow any changes in method signatures, but since Java 5 it is possible to redeclare more specific types (that is, subtypes) for method return values.

This kind of change, where subclasses use subtypes of those used in their superclasses, is called a covariant change, since the direction of change is the same: as you go down the inheritance hierarchy, types get more specialized (that is, also go down).  Covariance is very natural to use in practice.  In addition to the example above, you can think of modeling artifacts (physical or virtual) that need to be connected in various ways.  Often, as you become more specific in one dimension, you need to be more specific in others.  For example, every laptop needs a power supply.  But specific laptops have dedicated power supplies, and you can't use any power supply with any laptop.  So the most abstract Laptop class may define an attribute called powerSupply, whose type is PowerSupply.  But an ABCLaptop requires an XYZPowerSupply, so the type of the power supply goes down the hierarchy as the type of the laptop goes down.

Covariance is all very well for method return types, where it fits the Behavioral Subtyping Principle.  But it contradicts the principle for method arguments.  The type of a method argument is like a precondition, since it constrains the values that the caller can pass to the method.  Preconditions may only be weakened in subtypes; this means that argument types can only be generalized in subtypes.  This is called contravariance, since argument types to up the hierarchy in subtypes.  So here we have a conflict between what is theoretically correct (contravariance for argument types) and what is useful (covariance).

There is a large debate on this issue, and different languages treat it differently.  None of C/C++, Java, or C# allow any change in argument types (this is called invariance).  This is, in fact, unrelated to the issue of covariance versus contravariance; it is a consequence of the fact that these languages support overloading (see Overriding and Overloading).  As a result, any change in method argument types defines an overloaded method, which is unrelated to the original method.  Eiffel follows the practical route, supporting covariance of argument types, and incurring the inevitable cost.  This cost is the lack of compile-time type checking for these cases.  In statically-typed languages, including all the ones mentioned above, we expect the compiler to type-check the program so that no type violations will occur at runtime (with some other notable exceptions, such as type casting and some uses of generics in Java).  Covariance creates another exception to this rule.

Of course, preventing covariance in the language doesn't solve anything, since in practice it is still necessary.  What programmers do when the language doesn't support covariance is to write their own tests in subclasses, checking for the correct subtypes and throwing exceptions when they are incorrect.  So even though the program is fully type-checked, there are still type-related exceptions at runtime.  But because these are programmed manually rather than being thrown by the language runtime, language designers feel it isn't their responsibility.  This ostrich effect is, of course, completely pointless.

Thursday, May 5, 2011

Abstract Preconditions

Returning to the example of the Set type, suppose now that its implementation ArraySet uses a fixed-size array to hold the set elements, and is therefore limited in size (or "bounded").  The maximum size, called the set's capacity, is given when the ArraySet object is constructed and the array is allocated.  The method ArraySet.add, which can add a new element to the set, now needs to have a postcondition size() < capacity(), where size() returns the current size of the set and capacity() returns the maximum size.

Side note: in such cases, it is common to provide the size() method, but less careful designers don't provide capacity().  The assumption is that you ought to know the capacity, since you gave this value when creating the set.  But, of course, you may be using a set created by someone else.  Writing contracts often alerts you to the necessity of adding such informative methods.

According to Liskov's Behavioral Subtyping Principle (see The Correct Use of Inheritance), ArraySet may not strengthen a method precondition.  Therefore, Set must also declare the same precondition (or a stronger one) on the add method.  In general, however, sets don't need to be bounded, and therefore this precondition is inappropriate there.  What is wrong?  The first thing to question is the inheritance hierarchy: is ArraySet really a subtype of Set?  More generally, are bounded sets types of sets?  At first blush, the answer is obviously yes.  But on deeper thought, it seems that we have our abstractions a little bit off.  There are really three concepts involved here: general sets, bounded sets, and unbounded sets.  Both bounded and unbounded sets are kinds of sets; bounded sets have the size() < capacity() precondition, while unbounded sets don't.  Again, it seems like general sets must have this precondition as well, to comply with the Behavioral Subtyping Principle.

Certainly, general sets need to have some precondition that subsumes the boundedness condition, but it doesn't need to have the same form.  The abstract meaning of the precondition is that the set may not be full when adding a new element.  In this form, the precondition is applicable to any set; bounded sets may sometimes be full, while unbounded sets will never be full.  So we need to add  to the Set type, which represents general sets, a method full() that returns a boolean value.  The precondition on Set.add will be !full().  At this level, this precondition is abstract: its precise meaning is unknown.  A client that uses an object of this type therefore needs to check that the set isn't full before adding an element.  The implementation of Set.full may provide some default value, or may be absent, in which case the Set type itself is abstract.

ArraySet.full overrides Set.full to return the value of the expression size() < capacity().  In addition, this method has a postcondition $ret == size() < capacity().  Clients of ArraySet now know the precise condition under which they are permitted to call the add method.  In contrast, unbounded implementations of Set, such as HashSet, override Set.full() to return false, with a postcondition $ret == false (or !$ret).  Clients of HashSet therefore know that the precondition on add is really vacuous: it will never be true, and therefore they are allowed to call add under all circumstances.

If the concepts of bounded and unbounded sets are deemed important enough, it is possible to define them as classes (or interfaces) in their own right, and put the relevant contracts there.  In this way, the missing abstractions are crystallized in the design, with precise contracts that obey the Behavioral Subtyping Principle without forcing a formulation of the precondition that is specific to a subtype.