The Dusty Deck: November 2014

Friday, November 28, 2014

Functional Programming in Mainstream Languages, Part 4: Higher-Order Functions in JavaScript

(Jump to Table of Contents)

JavaScript is the only universal language for client-side programming for web applications. If you need to write anything that will run in the browser, JavaScript is currently your only option. That is unfortunate, since JavaScript is an amalgam of some good parts and some truly horrible parts. (See the book JavaScript: The Good Parts for more details.)

Among the good parts of JavaScript is its functional part, which is widely used to create callbacks; but it can also be used for functional-programming techniques in general. For a fascinating but advanced presentation of the elegant way in which NetFlix uses functional programming in JavaScript to perform asynchronous processing, see the ACM webinar titled "August 2014: Async JavaScript at Netflix."

In this part of the series, I will show the examples of higher-order functions from Part 3 in JavaScript, and contrast it with the Java 8 implementation. For the Scheme and Java 7 versions, see Part 3. First, however, is the simple example from Part 2:

Java 8

IntFunction<IntUnaryOperator> makeAdder = x -> y -> x + y;
IntUnaryOperator increment = makeAdder.apply(1);
int res = increment.applyAsInt(4);

JavaScript

var makeAdder = function(x) {
    return function(y) {
        return x + y;
    }
}
var increment = makeAdder(1);
var res = increment(4);

As you can see, the JavaScript syntax for defining functions is more verbose but not unreasonable. In contrast, function calls use the familiar mathematical notation without requiring a spurious method call.

Ecma International publishes the standards for the JavaScript language, under the name ECMAScript. The draft ECMAScript 6 standard (code named Harmony) introduces "arrow functions"; these are very similar to regular JavaScript functions but use an alternative definition syntax (and improve the semantics in small but significant ways). For comparison, here is the same code using this notation:

ECMAScript Harmony (still in draft status)

var makeAdder = x => y => x + y;
var increment = makeAdder(1);
var res = increment(4);

This is very similar to Java 8, except for the use of the => symbol instead of ->. As in Java 8, this is the simplest syntax for defining functions; there are variants for more complex cases. For example, if the function has more (or less) than one argument, the argument list must be placed in parentheses. Unlike Java 8, of course, ECMAScript has no syntax to specify the types of the arguments or the returned value. There are other differences as well; see the proposed draft for further details.

Here is the recursive fixed-point function:

Java 8

public double fixedpoint(DoubleUnaryOperator f, double v) {
    double next = f.applyAsDouble(v);
    if (Math.abs(next - v) < epsilon)
        return next;
    else
        return fixedpoint(f, next);
}

JavaScript

function fixedpoint(f, v) {
    var next = f(v);
    if (Math.abs(next - v) < epsilon)
        return next
    else
        return fixedpoint(f, next);
}

These are very similar, except for the lack of types in JavaScript. Here is the imperative version of fixedpoint. JavaScript, like Java, doesn't guarantee that tail-recursion optimization will always be performed, and this version is more natural for JavaScript. For these reasons, I find this acceptable practice even when using functional-programming techniques in JavaScript, provided that the variables being modified aren't used in any other context.

Java 8

public double fixedpoint(DoubleUnaryOperator f, double v) {
    double prev = v;
    double next = v;
    do {
        prev = next;
        next = f.applyAsDouble(next);
    } while (Math.abs(next - prev) >= epsilon);
    return next;
}

JavaScript

function fixedpoint(f, v) {
    var next = v;
    var prev = v;
    do {
        prev = next;
        next = f(next)
    } while (Math.abs(next - prev) >= epsilon);
    return next;
}

Next is the naive non-terminating version of sqrt that uses fixedpoint:

Java 8

public double sqrt(double x) {
    return fixedpoint(y -> x / y, 1.0);
}

JavaScript

function sqrt(x) {
    fixedpoint(function(y) {
        return x / y;
    }, 1);
}

The boilerplate code is starting to get in the way here (although not as much as in Java 7, due to the lack of types). Still, it's becoming difficult to recognize the syntactic role of the constant 1. This annoyance should disappear with the release of ECMAScript Harmony.

The terminating version of sqrt that uses average damping implicitly looks like this:

Java 8

public double sqrt(double x) {
    return fixedpoint(y -> (y + x / y) / 2.0, 1.0);
}

JavaScript

function sqrt(x) {
    fixedpoint(function(y) {
        return (y + x / y) / 2;
    }, 1);
}

Finally, here is the average-damp operator and the implementation of sqrt that uses it explicitly:

Java 8

public DoubleUnaryOperator averageDamp(DoubleUnaryOperator f) {
    return x -> (x + f.applyAsDouble(x)) / 2.0;
}
public double sqrt(double x) {
    return fixedpoint(averageDamp(y -> x / y), 1.0);
}

JavaScript

function averageDamp(f) {
    return function(x) {
        return (x + f(x)) / 2;
    }
}
function sqrt(x) {
    return fixedpoint(averageDamp(function(y) {
        return x / y;
    }), 1);
}

In summary, higher-order functions are quite natural in JavaScript, with a syntax that is somewhat cumbersome but still manageable. One of the reasons it is simpler than Java 7 is the lack of types, which is a result of the lack of static typing in JavaScript as a whole. An interesting compromise is the language TypeScript, which supports gradual typing (see my review in a previous post) but compiles into JavaScript. TypeScript uses the arrow-function notation proposed for ECMAScript, so it allows very concise function definitions that do contain types. I will address TypeScript in a future post.

Coming up in this series: Scala, Xtend, and Haskell!

#functionalprogramming #javascript

Thursday, November 27, 2014

Functional Programming in Mainstream Languages, Part 3: Higher-Order Functions in Java 7 and 8

(Jump to Table of Contents)

Having seen (in the part 2 of this series) the basic syntax for expressing functions in Java 8, we will now explore some functional-programming techniques and compare how they are expressed in Scheme, Java 7, and Java 8. The examples are adapted from the book Structure and Interpretation of Computer Programs (2nd Edition) by Abelson and Sussman.

The following function attempts to compute an approximation to a fixed point of a given function f; that is, a value x such that x = f(x). The computation starts from an initial guess c, and repeatedly applies the function f to it, to get the series f(c), f(f(c)), f(f(f(c))), .... If this process converges, the result is close to a fixed point, since convergence means that applying f one more time returns (almost) the same value. However, the process may not converge at all, either because f has no fixed points at all, or because the process oscillates without finding a fixed point.

Here are the implementations of this function in the three languages:

Scheme

(define (fixed-point f first-guess)
  (define (try guess)
    (let ((next (f guess)))
      (if (< (abs (- guess next)) epsilon)
          next
          (try next))))
  (try first-guess))

Java 7

public double fixedpoint(UFunc<Double, Double> f, double v) {
    double next = f.apply(v);
    if (Math.abs(next - v) < epsilon)
        return next;
    else
        return fixedpoint1(f, next);
}

Java 8

public double fixedpoint(DoubleUnaryOperator f, double v) {
    double next = f.applyAsDouble(v);
    if (Math.abs(next - v) < epsilon)
        return next;
    else
        return fixedpoint1(f, next);
}

Functional implementations usually use recursion instead of loops; the Scheme implementation defines an internal recursive function try, which computes the next value in the series and checks whether it is already close enough to the previous one to consider the series to have converged. If so, it returns the last value in the series; if not, it continues recursively to try the next value.

The two Java implementations are quite similar, except for the types. For the Java 7 implementation, I created the following interface to represent unary functions from a domain D to a range R:

Java 7

public interface UFunc<D, R> {
    R apply(D arg1);
}

As I explained in the previous post, Java 8 supplies a large set of interfaces to describe functions; DoubleUnaryOperator is the type of functions that take a double and return a double.

There is a fundmemtal difference between functional languages (including Scheme) and most other languages, including Java. In the former, tail-recursive calls are converted by the compiler into jumps that don't add a frame to the execution stack. This means that such calls behave like loops in terms of their memory consumption. (For more details see Abelson and Sussman.) However, the latter languages don't guarantee this property. As a result, the Java implementations above may use stack space proportional to the number of applications of the function required for convergence.

A more natural style for implementing these methods in Java is imperative rather than functional:

Java 7

public double fixedpoint(UFunc<Double, Double> f, double v) {
    double prev = v;
    double next = v;
    do {
        prev = next;
        next = f.apply(next);
    } while (Math.abs(next - prev) >= epsilon);
    return next;
}

Java 8

public double fixedpoint(DoubleUnaryOperator f, double v) {
    double prev = v;
    double next = v;
    do {
        prev = next;
        next = f.applyAsDouble(next);
    } while (Math.abs(next - prev) >= epsilon);
    return next;
}

While functional-programming purists frown at this style, which modifies the values of the variables prev and next, this style is more natural for Java, and uses constant space on the stack. For those reasons, I find it acceptable when using functional techniques in Java, provided that the changes are confined to local variables (rather than fields), and, in particular, locals that aren't referenced in any other method (such as those in inner classes) or function.

We can now try to use the fixedpoint function to compute square roots, using the fact that y is a square root of x if y = x/y; in other words, if y is a fixed point of the function λy.x/y:

Scheme

(define (sqrt x)
  (fixed-point (lambda (y) (/ x y)) 1.0))

Java 7

public double sqrt(final double x) {
    return fixedpoint(new UFunc<Double, Double>() {
        @Override
        public Double apply(Double y) {
            return x / y;
        }
    }, 1.0);
}

Java 8

public double sqrt(double x) {
    return fixedpoint(y -> x / y, 1.0);
}

Here we need to create an anonymous function ("lambda"), and here the difference between Java 7 and Java 8 becomes very clear. In Java 7 we need to create the function as an anonymous inner class, with all the necessary boilerplate code, which obscures the essentials of what we are trying to do. (For example, it is quite difficult to figure out that 1.0 is the second parameter to the fixedpoint method.) In contrast, in Java 8 we can just use the new lambda notation, and the meaning is immediately clear.

In both versions of Java, it is only possible to refer in inner methods to variables defined in enclosing constants if these variables are final; that is, if they can't be modified. In Java 8 the variables doesn't need to be declared final, but it needs to be "effectively final," which means that it is never changed. (It is as though the compiler added the final modifier itself; this is similar to the type inference done by the Java 8 compiler.)

The reason for this restriction is that local variables have a lifetime (sometimes called "extent") that coincides with the lifetime of the method in which they are defined. In other words, when the method returns, these variables are no longer accessible. This is done so that the variables can be allocated on the stack rather than on the heap, avoiding the need for garbage-collecting them. However, an object containing an inner method that refers to these variables may be used after the method that generated the object has already returned. If (and only if) the variables are final is it possible to copy their values to the new object so that references to their values are still valid. This is a poor-man's version of a closure, which is an object containing a function pointer together with all accessible variables in outer scopes. As we will see in a later post, Scheme (like other dialects of Lisp) supports full closures, which also allow changing bound variables.

Unfortunately, the method above doesn't work in any language. The problem is that the computational process oscillates between two values and doesn't converge (try it for x=2). One value is too high, and the other is too low. However, if we take their average, the process converges; the following implementations all compute the square root:

Scheme

(define (sqrt x)
  (fixed-point (lambda (y) (/ (+ y (/ x y)) 2.0) 1.0))

Java 7

public double sqrt(final double x) {
    return fixedpoint(new UFunc<Double, Double>() {
        @Override
        public Double apply(Double y) {
            return (y + x / y) / 2.0;
        }
    }, 1.0);
}

Java 8

public double sqrt(double x) {
    return fixedpoint(y -> (y + x / y) / 2.0, 1.0);
}

It turns out that this is a general technique, called average damping. It takes any function f and returns the function λx.(x+f(x))/2; this function has exactly the same fixed points as f. In certain cases, however, the computational process of the fixed-point function converges with the new function when it diverges on the original function. This technique is easily specified as a program:

Scheme

(define (sqrt x)(define (average-damp f)
  (lambda (x) (average x (f x))))

Java 7

public UFunc<Double, Double> averageDamp(final UFunc<Double, Double> f) {
    return new UFunc<Double, Double>() {
        @Override
        public Double apply(Double x) {
            return (x + f.apply(x)) / 2.0;
        }
    };
}

Java 8

public DoubleUnaryOperator averageDamp(DoubleUnaryOperator f) {
    return x -> (x + f.applyAsDouble(x)) / 2.0;
}

Again, the Java 7 version is obscured by the boilerplate code (not to mention the annoying repetitions of the template type). Here is the definition of sqrt that uses average damping (this time I will spare you the agony of the Java 7 version):

Scheme

(define (sqrt x)
  (fixed-point (average-damp (lambda (y) (/ x y))) 1.0))

Java 8

public double sqrt(double x) {
    return fixedpoint(averageDamp(y -> x / y), 1.0);
}

Computationally, this is exactly the same as the previous version, except that we use the original function λy.x/y after applying the average damp operator to it.

In summary, it is possible to use functional abstraction in Java 7, but it is very painful. The one case where it is widely used (although not usually thought of as such) is the case of callback functions, which are encapsulated as methods in one-method classes. Java 8 makes higher-order functions much easier to use, even though the cumbersome type system is still annoying and gets in the way.

In subsequent posts we will see how to express the same ideas in other languages.

#functionalprogramming #java

Functional Programming in Mainstream Languages, Part 2: Introduction to Higher-Order Functions in Java 8

(Jump to Table of Contents)

If immutability is the primary principle of functional programming, higher-order functions are its most important technique. Higher-order functions, also called lambas, are now part of many mainstream languages. In particular, they have been added to Java 8. In this post I will show how higher-order functions can be expressed in Java 8.

I can't explain the details of the theory and techniques here; I highly recommend the wonderful book Structure and Interpretation of Computer Programs (2nd Edition) by Abelson and Sussman. Most of the examples in this series of posts are taken from that book.

In functional programming, functions are first-class objects. This means that functions can be passed as parameters to other functions, they can be returned from functions, they can be stored in data structures; in other words, anything you can do with any other object in the language, such as numbers or strings, you can do with functions.

Some languages fake it: for example, in C you can pass around function pointers, but not function objects. The major difference is that function objects contain not only the code the function executes, but also its environment, meaning all bound variables known to the function. This is particularly important (and powerful) when functions are nested inside other functions (or methods, or any other binding scope).

The inventors of LISP struggled with the correct semantics of first-class functions; they called this the "funarg problem." It turns out that the power of functional programming requires lexical scope, which means that variable names must be resolved with respect to the nesting of functions in the code, rather than to the structure of the calling stack at runtime. And in order for that to work correctly, it is impossible to allocate function parameters and other local variables on the program stack if those may be referred to in an internal function that is returned from the call. This complicates the compiler, but is invisible to programmers using the language.

As a simple example, consider a function make-adder that accepts a single number, and returns a function that adds that number to its own argument. For example, if make-adder is called with the value 1, it will return a function that can be called increment, since it adds one to whatever argument it is given. Here is how to write this function in Scheme and in Java 8.

Scheme

(define make-adder (lambda (x) (lambda (y) (+ x y)))
(define increment (make-adder 1))
(increment 4)

Java 8

IntFunction<IntUnaryOperator> makeAdder = x -> y -> x + y;
IntUnaryOperator increment = makeAdder.apply(1);
int res = increment.applyAsInt(4);

The result of the last expression (in either language) is 5.

The syntax is quite different, but the effect is the same. Let's look at the functions first. The Scheme syntax is a direct translation of the syntax used in the Lambda Calculus, which was invented by the great logician Alonzo Church, and which forms the theoretical basis for functional programming (in addition to its major importance in the theory of computer science). In the lambda calculus, the make-adder function would be written like this: λx.λy.x+y.

This syntax describes a function of one argument whose name is x; this is indicated by the notation "λx.". What follows the dot is the value of the function for a given value of x; in this case, this is another function, whose argument is called y. The value returned by this function is the sum of x and y.

In Scheme, a function is denoted by the syntax (lambda (v1 v2 ...) body), where v1 v2 ... are the parameter names, and body is an expression representing the value that the function returns.

In Java 8, function syntax has the names of the parameters before the arrow (->), and the body follows the arrow. It is possible to specify more than one argument, in which case the arguments need to be put in a parenthesized list. It is also possible to specify the types of the arguments, but fortunately the designers of Java have finally realized what a burden repetitive types are on programs, and Java 7 introduced type inference into the compiler in order to eliminated some (but not enough) of this repetition. Java 8 extends this mechanism also to lambdas. In this case, the compiler can infer the types of the arguments from the type declared for the function makeAdder. This type needs some explanation.

Java 8 has a new set of interfaces to describe various types of functions. For example, Function<T,R> is the type of functions that take an argument of type T and return a result of type R. However, there is a whole set of interfaces that cater for primitive types, in order to avoid boxing and unboxing. (This primary mistake in the design of Java has been giving programmers grief since Java 1, and is unlikely to go away.) So IntFunction<R> denotes functions that take an int and return a value of type R.

What is the return type of makeAdder? It is a function that takes an int and returns an int; such functions are described by the interface IntUnaryOperator. The corresponding interface for functions that take an object of type T and return an object of the same type is UnaryOperator<T>.

As I will discuss in a later post, typed functional languages such as ML and Haskell have a rich set of functional types and a concise yet very expressive type notation. The type of makeAdder would be specified in such languages simply as int -> int -> int, a notation that is similar to the way functions are defined. This one indicates a function that receives an int, and returns a function of type int -> int, that is, a function that takes an int and returns an int. Such notations would make the burden of type specifications in Java much easier.

#functionalprogramming #java8

The Dusty Deck

Labels

Friday, November 28, 2014

Functional Programming in Mainstream Languages, Part 4: Higher-Order Functions in JavaScript

Thursday, November 27, 2014

Functional Programming in Mainstream Languages, Part 3: Higher-Order Functions in Java 7 and 8

Functional Programming in Mainstream Languages, Part 2: Introduction to Higher-Order Functions in Java 8

Followers

Blog Archive