Friday, July 17, 2015

Programming by Thinking

In my second post (Programmers as Children) I complained about the fact that programmers work with their fingers instead of using their brains.  This is gradually getting worse instead of better, with growing academic research concentrating on the use of examples from the internet, search tools to find such examples for specific questions, and so on.  Of course, there are many websites containing advice for various programming domains, and the examples they provide are often quite useful.  However, copying these into your own code without understanding what they do, and what are the wider implications (for example, on performance and security), is a recipe for disaster.

The April 2015 issue of Communications of the ACM contains several articles that should be required reading for every software developer.  The first is Leslie Lamport's viewpoint article, "Who Builds a House without Drawing Blueprints."  No engineering discipline proceeds to build anything without first analyzing the requirements, making initial designs and analyzing them from various points of view (such as cost, safety, standards, and legal implications) -- in other words, thinking deeply about the problem and possible solutions.  But in the (arguably misnamed) discipline of software engineering, starting with coding is all too often the norm, and is even praised as a methodology.  (I am not implying that any of the agile methodologies advocates that; quite the contrary!  However, agile methods are often misunderstood as "don't think before you code.")

Lamport argues that thinking is necessary before coding, and that writing specifications is an excellent way to focus your thinking.  The specifications need not always be formal, although Lamport makes a good case for using formal specifications, and using tools to check them.  Specifically, TLA+ is a formal specification language created and used by Lamport himself in various projects.

You may be inclined to dismiss this as only academic, and, unfortunately, this is the too-prevalent view.  However, you may learn otherwise if you read another article in the same issue, "How Amazon Web Services Uses Formal Methods."  This article describes how TLA+ was used in several critical Amazon projects, and discovered bugs that the authors admit they could not have found otherwise.  In some cases, Amazon engineers used TLA+ successfully on their projects after two weeks of self-learning.

Of course, getting the specifications right does not mean that the code will be correct as well.  Tool support for verifying the code against its specification would be very helpful, and this is an area of active research.  However, not getting the specifications right is almost a guarantee that the code will have bugs.  This is particularly true for large concurrent systems, which are a source of extremely subtle bugs that programmers have great difficulties understanding and predicting.

And while you have the April issue in your hands, I recommend you also read the article "Security Challenges for Medical Devices."  It discusses safety, security,  and privacy issues with various kinds of medical devices.  Even hospital information systems have such issues, but they are most critical with implanted devices such as pacemakers.  Modern implantable devices communicate wirelessly to provide information they collect, and sometimes they can even be programmed wirelessly.  This creates health hazards from mistakes but also deliberate attacks.  Even devices that only provide information are dangerous if this information is modified anywhere along the way from the device to the physician (including in IT systems that store or transmit this information), since this could lead to the wrong treatment being applied.

Security cannot be added as an afterthought; it must be designed-in from the beginning.  While not all systems have the same security implications as medical devices, the whole chain is as strong as the weakest link, which implies that security is crucial to more systems than would be immediately obvious.  This helps drive home Lamport's point.  Think before you code!

#formalmethods #specifications

Sunday, January 11, 2015

Functional Programming in Mainstream Languages, Part 7: Higher-Order Functions in Haskell

(Jump to Table of Contents)
I will finish the survey of higher-order functions in mainstream languages with Haskell, a fully-functional language.  (There's a lot of talk about the imperative features of Haskell, but don't let that mislead you; Haskell is truly functional.)  Haskell comes from a different tradition, and its notation may take some getting used to.  However, if you are really into functional programming, you can do amazing things with very short Haskell programs, which (unlike APL, for example) are still comprehensible if you understand the idioms.

Of all the languages I surveyed before, Scala is closest in spirit to Haskell, so I will contrast Haskell with Scala instead of Java 8 as I've been doing before.  Unlike all other examples we saw before, type declarations in Haskell are separate from value definitions, and are optional in most cases, since the compiler performs extensive type inference.  In all the examples below I will give the type declarations explicitly, but they are not required.  In fact, the way I produced them was by copying them from the compiler warnings.  They are shown in a different color to emphasize the fact that you don't need to write them yourself.

In Haskell, curried functions are the norm, and the syntax makes it easiest to write curried functions.  In fact, each function in Haskell always has a single argument, although it is possible to simulate multiple arguments using tuples, as I will show below.  But first, here is the curried version:

def makeAdder = (x: Int) => (y: Int) => x + y def increment = makeAdder(1) def res = increment(4)
makeAdder :: (Num a) => a -> a -> a makeAdder x y = x + y increment :: (Num a) => a -> a increment = makeAdder 1 res :: Integer res = increment 4

The most striking syntactic feature is that function arguments do not need to be enclosed in parentheses.  This is actually the norm in the lambda calculus and derivative notations, and is very convenient given that each function can only have a single argument.  It is possible to use lambda notation (which we will see below) to define makeAdder, but it isn't necessary in this case.

The makeAdder function in Haskell, unlike all the other examples we saw before, is exactly equivalent to the function +; both are curried and take one argument.  Thus, the above code is equivalent to the following:

increment :: (Num a) => a -> a increment = (+) 1 res :: Integer res = increment 4

The parentheses around the plus symbol are required since by default Haskell binary operators use the common mathematical notation where they have two arguments surrounding the operator symbol.  (This is just syntax, it doesn't change the fact that the functions are curried.)

Looking at the types, from the simplest (the last) to the most complicated (the first), we see that the type of res is simply Integer.  However, if we give increment an argument of a different type, the result will be the corresponding type:

real :: Double real = increment 0.5

This is possible since the type of the increment function is parameterized.  Before the double arrow (=>) is a type constraint; we'll look at that in a minute.  The type of increment is a -> a, where a is a type parameter.  This means that the function takes one argument of type a, and returns a result of the same type.  However, the function applies the + operator to its argument (indirectly through makeAdder), and therefore the type a must be appropriate as an argument to +.  This is the purpose of the type constraint (Num a); it restricts a to be a numeric type, which (among other operations) supports +.

Both Integer and Double are numeric types, so the type of increment includes both Integer -> Integer and Double -> Double.  When functions of such parameterized types are applied to actual arguments, it may be possible to deduce an actual type for the type parameters, as happened in the two examples above.

The type of makeAdder is more of the same: a -> a -> a is the type of a function that takes a single argument of type a, and returns a function of type a -> a, that is, a function (like increment) that takes a single argument of type a and returns a result of the same type.  Again, this has the constraint that a must be a numeric type.

As is usual in type theory, the -> type constructor is right-associative, so that the type a -> a -> a is the same as a -> (a -> a).  With curried functions, this is often what you want, since curried functions take one argument and return a function.  Of course, if this argument is itself a function (which is not unusual), you will need to put parentheses around its type.

In contrast, function application binds to the left; for example, makeAdder 1 4 is the same as (makeAdder 1) 4, which is the same computation as our increment 4 above and returns 5.  The reason for making function application left-associative is the same as that for making the type operator right-associative: this is the common case with curried functions.

If you really want the uncurried version of makeAdder, you would use a tuple containing the two arguments.  Tuples are written using the conventional mathematical notation of a sequence of comma-separated values inside parentheses: (1, 4).  Here is the uncurried version, which can simply be called add:

add :: (Num a) => (a, a) -> a add (x, y) = x + y

As you can see, the type now represents a function that takes a tuple containing two elements of type a, and returns a result of type a.  A very useful feature of Haskell, common to many other functional languages (and some languages with a strong functional subset, like Scala), is pattern matching.  In Haskell, a formal parameter can have the syntactic structure of a constructor for any type (and, in fact, such constructors can be nested).  In this case, the parameter of add is a constructor for a tuple containing two values, x and y, and has the parenthesized form (x, y).  This means that the function can only accept tuples of this form, and when it does, the first element will be called x and the second will be called y.  This form is deceptively similar to the usual mathematical notation, but keep in mind that this is not natural for Haskell and interferes with various Haskell idioms.  The norm in Haskell is to use curried functions; any other use requires careful justification.

Having gone through the preliminaries, here is the fixed-point function:

def fixedpoint(f: Double => Double, v: Double): Double = { val next = f(v) if (Math.abs(next - v) < epsilon) next else fixedpoint(f, next) }
fixedpoint :: (Ord a, Fractional a) => (a -> a) -> a -> a fixedpoint f guess =   let next = f guess in       if abs(guess - next) < epsilon       then next       else fixedpoint f next

The let syntax binds one or more variables in the context of an expression (which follows the keyword in).  As in Scala, if in Haskell is an expression that returns one of two values depending on the condition.  The parenthesis on the argument to the abs function are required; without them, the expression would be parsed as (abs guess) - next.

The type of fixedpoint is (a -> a) -> a -> a, using only the required parentheses; this could equivalently be written as (a -> a) -> (a -> a), which makes it clearer that fixedpoint takes a function from a to a and returns another such function.  The type a is constrained to be ordered (because of the use of <), as well as fractional (because it is compared with epsilon, which is a Double).

In Haskell, of course, recursion is the norm.  It is not impossible to write this as a loop, but it is too cumbersome to even try.  Instead, I want to show a more abstract version of this function, in which the separate parts are given meaningful names; this is actually closer to the original Scheme example.

fixedpoint :: (Ord a, Fractional a) => (a -> a) -> a -> a fixedpoint f guess =   let closeEnough v1 v2 = abs(v1 - v2) < epsilon       try guess =           let next = f guess in               if closeEnough guess next               then next               else try next   in try guess0

Here, the decision of whether the guess is close enough to the desired value is abstracted into the function closeEnough, and the recursion is encapsulated into the function try.  As usual, the internal functions have access to enclosing scopes, so they can refer to f and epsilon.

Lambda notation in Haskell uses the backslash as the ASCII character typographically closest to lambda.  A single arrow is used instead of Scala's double arrow to separate the body from the argument list.  Using this notation, here is the non-terminating version of sqrt:

def sqrt(x: Double) = fixedpoint(y => x / y, 1.0)
sqrt :: Double -> Double sqrt x = fixedpoint (\y -> x / y) 1.0

The terminating version is of course very similar:

def sqrt(x: Double) =     fixedpoint(y => (y + x / y) / 2.0, 1.0)
sqrt :: Double -> Double sqrt x = fixedpoint (\y -> (y + x / y) / 2.0) 1.0

The generalization of the average-damp procedure and its use should hold no surprises now:

def averageDamp(f: Double => Double) = (x: Double) => (x + f(x)) / 2.0 def sqrt(x: Double) = fixedpoint(averageDamp(y => x / y), 1.0)
averageDamp :: Fractional a => (a -> a) -> a -> a averageDamp f x = (x + f x) / 2.0
sqrt :: Double -> Double sqrt x = fixedpoint (averageDamp (\y -> x / y)) 1.0

Haskell takes some getting used to for people used to imperative or object-oriented languages.  However, it you are ready to take the commitment to pure functional programming, Haskell would be the obvious choice.

Remember that we have only discussed functional abstraction up to this point; there is much more to functional programming (even in mainstream languages), and I will address that in future posts. #functionalprogramming #haskell