FunScript 2013 presentation

FunScript is an extensible F# to JavaScript compiler. Tonight I gave a presentation about this open-source project at SkillsMatter, London.

Presentation synopsis

FunScript/F# is the only statically-typed compile-to-JavaScript language poised to take full advantage of the JavaScript ecosystem (that I know of). F#’s type providers make consuming JavaScript possible without any code generation or foreign function interface (FFI) definition. FunScript has taken the first steps to making this a reality by consuming TypeScript definition files. However, the F# community has more ambitious plans to consume pure JavaScript files too. Seamless integration with the node package manager might also be on the horizon.

F# links

FunScript links

Links to languages compiling to JavaScript


Building an “actor” in F# with higher throughput than Akka and Erlang actors


The “Big Data” problem

Our free lunch is over!

The number of transistors in our CPUs is still increasing like Moore’s law predicted. However, the frequency of our chips is flat-lining. We can no longer expect to see a 2x or even 1.5x performance improvement every 18 months from code that doesn’t exploit parallelism.

“Big Data” has arrived

According to Cisco we are now in The Zettabyte Era.

Global IP traffic has increased eightfold over the past 5 years.

Globally, mobile data traffic will increase 18-fold between 2011 and 2016.

The Economist ran an article in 2010 on what some are calling “the industrial revolution of data”.

Oracle, IBM, Microsoft and SAP between them have spent more than $15 billion on buying software firms specialising in data management and analytics.

This industry is estimated to be worth more than $100 billion and growing at almost 10% a year, roughly twice as fast as the software business as a whole.

We need a different programming model

The most prevalent programming model today is probably Object Oriented Programming (OOP). This typically results in a system of objects encapsulating some mutable state and interacting with other objects through method calls.

Concurrent changes to mutable state need to be synchronized to avoid race hazards. In OOP there are two main ways of doing this: taking locks, or writing lock/wait-free code.

Locks don’t scale well. They are inefficient under even moderate amounts of contention. At best, they involve threads occasionally spinning whilst doing no useful work. At worst, they involve multiple threads incurring the overhead of taking trips through the operating system.

Lock/wait-free code is generally far more efficient. However, it is hard to write. In fact, building good, lock-free data structures is still a publishable result. For example, this paper on building lock-free queues was published in August. Therefore, writing lock/wait-free code ad hoc is probably not going to scale to a whole system.
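As a minimal illustration of the two approaches, the sketch below (hypothetical code, not from any of the benchmarks discussed here) increments a shared counter first with a lock and then with a single atomic instruction:

```fsharp
open System.Threading

// 1. With a lock: under contention, threads block or spin inside the
//    lock while doing no useful work.
let sync = obj ()
let mutable lockedCount = 0
let incrementWithLock () =
    lock sync (fun () -> lockedCount <- lockedCount + 1)

// 2. Lock-free: one atomic hardware instruction, no blocking.
let mutable atomicCount = 0
let incrementLockFree () =
    Interlocked.Increment &atomicCount |> ignore
```

The difference only shows up under contention: the lock-free version never parks a thread in the operating system, which is exactly the overhead described above.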

Fortunately, there are several alternatives to OOP that might help us code for these “Big Data” scale problems. Each alternative has benefits and drawbacks. For example, Functional Programming used alongside OOP promotes the use of immutable objects, which reduces the need for synchronization. However, the GCs in the VMs we use (particularly the JVM and CLR) don’t always cope well with the associated increase in garbage.
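For instance, F# records are immutable by default: an “update” is a copy that leaves the original untouched, so concurrent readers never need synchronization, but every update allocates (the extra garbage mentioned above). A minimal sketch, with hypothetical names:

```fsharp
// Immutable record: copy-and-update allocates a new value instead of
// mutating in place, so concurrent readers need no synchronization.
type Account = { Owner : string; Balance : decimal }

let account = { Owner = "alice"; Balance = 100m }
let credited = { account with Balance = account.Balance + 50m }
// account.Balance is still 100m; credited is a fresh allocation
```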

In this article we will discuss the Actor Model, another alternative/modification to OOP. In particular, we will show how to construct actors in F# that are capable of processing a high throughput of messages.

What are actors?

Conceptually an actor has its own “thread” of execution.

  • It processes each message based on its current state.
    • This might involve changing its state through mutation, creating new actors or sending messages to existing actors.
  • It also designates how it will handle the next message.
    • This could be through recursion, looping or something else.

The Actor Model was invented by Carl Hewitt. Earlier this year Erik Meijer and Clemens Szyperski interviewed him for Channel 9 (see the video here). He said the following about actors:

They are the fundamental unit of computation.
They have to embody processing, storage and communication.

Language support

Erlang

Erlang was designed at Ericsson in 1986. It was built to support concurrency, fault-tolerance and hot-swapping. It is quite heavily used in industry through various database and queue offerings. For example, CouchDB, SimpleDB, Riak and RabbitMQ all make use of Erlang.

The Erlang VM has a garbage collector per process (i.e., per actor). Unfortunately, it is beyond the scope of this article to investigate how this changes the latency distribution of various systems in comparison to the same systems running on the JVM or CLR.

Scala

Scala has several implementations of actors. Akka is (at the time of writing) the most advanced of these. According to their website Akka supports:

50 million msg/sec on a single machine.

Small memory footprint; ~2.7 million actors per GB of heap.

See the Akka team blog for more details on how these were measured.

F#

F# has one implementation of actors, the MailboxProcessor, built into its core library. It is easy to use and integrates well with Asynchronous Workflows, another F# feature. However, it does have some shortcomings in comparison to the actor implementations described above. The primary issue is the lack of support for communicating between system processes or machines. Addressing this issue is beyond the scope of this article.

Other implementations

See this page on Wikipedia.

Performance: Erlang vs Akka

There are a few blog posts with code that compare the performance of Erlang to Akka. Paul Keeble measured the throughput of a single actor in both Erlang and Akka in this blog article. Franz Bettag later repeated this experiment here. I repeated the experiment on my 2.8GHz i7 with 4GB RAM running Windows 8. These were my results:

  • Erlang was capable of processing over 1,210,000 messages per second (on average over 10 runs of 3,000,000 messages).

Eshell V5.9.3  (abort with ^G)
1> client:runTest(3000000).
Count is {ok,300000000}
Test took 2.824 seconds
Throughput=1062322.9461756374 per sec
ok
2> client:runTest2(3000000).
Count is 300000000
Test took 2.465 seconds
Throughput=1217038.5395537526 per sec
ok
  • Akka was capable of processing over 2,620,000 messages per second (on average over 10 runs of 3,000,000 messages). We used Scala v2.9.2, Akka v2.0.4 and passed custom parameters to SBT in order to set the heap size and turn on various optimizations (see my github project for more details).

Multiple main classes detected, select one to run:

 [1] ag.bett.scala.test.BenchmarkAll
 [2] ag.bett.scala.test.akka.Application

Enter number: 1

[info] Running ag.bett.scala.test.BenchmarkAll
Warmup run!

[akka] Count is 300000000
[akka] Test took 1.195 seconds
[akka] Throughput=2510460.251046025 per sec

Warmup run finished!
Garbage Collection


[akka] Count is 300000000
[akka] Test took 1.096 seconds
[akka] Throughput=2737226.2773722624 per sec

[success] Total time: 12 s, completed 10-Dec-2012 22:14:01

To test the throughput, each actor maintains a count. We send them each 3,000,000 messages from up to 8 parallel threads asking them to increase their counts by some fixed amount. We then record the average number of messages they process each second by inspecting the count with another message. See Paul Keeble’s code here on github for more details.

There is one thing to keep in mind about this test. In the interview with Carl Hewitt, Carl said this:

One actor is no actor.

This statement highlights what is wrong with these performance comparisons. Erlang was built for scaling out to millions of actors interacting with each other. The throughput of one actor doesn’t tell us anything particularly interesting. However, we will conveniently forget this for the rest of this article.

Repeating the experiment in F#

The Discriminated Union below defines the messages that the actor can accept.


type CounterMsg =
   | Add of int64
   | GetAndReset of (int64 -> unit)

The code below defines an actor. It pulls a message from a mailbox, processes it and recurses. An Add(n) message causes it to update its count. A GetAndReset(reply) message causes it to reset its count after replying with the current count.


let vanillaCounter =
   MailboxProcessor.Start(fun inbox ->
      let rec loop count = async {
         let! msg = inbox.Receive()
         match msg with
         | Add n -> return! loop (count + n)
         | GetAndReset reply ->
            reply count
            return! loop 0L
      }
      loop 0L)

This actor was capable of processing over 930,000 messages per second (on average over 10 runs of 3,000,000 messages). This is under half the throughput of the Akka actor but very close to the throughput of the Erlang actor.
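For reference, the figures were gathered with a harness along the following lines (a simplified sketch, not the exact benchmark code from the github project):

```fsharp
open System.Diagnostics

// Simplified sketch of the throughput test: post `total` Add messages
// from `threads` parallel workers, then read the accumulated count back
// with a GetAndReset message and report messages per second.
let measure total threads =
    let watch = Stopwatch.StartNew()
    [ for _ in 1 .. threads ->
        async {
            for _ in 1 .. total / threads do
                vanillaCounter.Post(Add 100L) } ]
    |> Async.Parallel
    |> Async.RunSynchronously
    |> ignore
    // The mailbox is FIFO, so this reply arrives after all Adds are processed.
    let count = vanillaCounter.PostAndReply(fun ch -> GetAndReset ch.Reply)
    watch.Stop()
    printfn "Count is %d" count
    printfn "Throughput=%f per sec" (float total / watch.Elapsed.TotalSeconds)
```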

A simple actor implementation in F#

The code below implements a simple actor. It makes use of the highly efficient lock-free ConcurrentQueue that is built into the framework. However, it also uses event wait handles for “waking” the “sleeping” actor when new messages arrive. This is inefficient.


open System
open System.Collections.Concurrent
open System.Threading

type 'a ISimpleActor =
   inherit IDisposable
   abstract Post : msg:'a -> unit
   abstract PostAndReply : msgFactory:(('b -> unit) -> 'a) -> 'b

type 'a SimpleMailbox() =
   let msgs = ConcurrentQueue<'a>()
   let onMsg = new AutoResetEvent(false)

   member __.Receive() =
      let rec await() = async {
         let mutable value = Unchecked.defaultof<'a>
         let hasValue = msgs.TryDequeue(&value)
         if hasValue then return value
         else
            let! _ = Async.AwaitWaitHandle onMsg
            return! await()
      }
      await()

   member __.Post msg =
      msgs.Enqueue msg
      onMsg.Set() |> ignore

   member __.PostAndReply msgFactory =
      let value = ref Unchecked.defaultof<'b>
      use onReply = new AutoResetEvent(false)
      let msg = msgFactory (fun x ->
         value := x
         onReply.Set() |> ignore
      )
      __.Post msg
      onReply.WaitOne() |> ignore
      !value

   interface 'a ISimpleActor with
      member __.Post msg = __.Post msg
      member __.PostAndReply msgFactory = __.PostAndReply msgFactory
      member __.Dispose() = onMsg.Dispose()

module SimpleActor =
   let Start f =
      let mailbox = new SimpleMailbox<_>()
      f mailbox |> Async.Start
      mailbox :> _ ISimpleActor

We used this implementation with the code below to reconstruct the same throughput test that we used above.


let simpleActor =
   SimpleActor.Start(fun inbox ->
      let rec loop count = async {
         let! msg = inbox.Receive()
         match msg with
         | Add n -> return! loop (count + n)
         | GetAndReset reply ->
            reply count
            return! loop 0L
      }
      loop 0L)

This actor was capable of processing over 1,070,000 messages per second (on average over 10 runs of 3,000,000 messages). This is slightly faster than the built-in MailboxProcessor and tantalizingly close to Erlang’s actor performance. However, it still doesn’t come close to the Akka implementation.

An improved actor implementation in F#

We improved upon the implementation above by removing the need for the event wait handles in the Post(...) method.

We did this by introducing a count variable that keeps track of the number of messages that are left to be processed by the actor.

When a sender posts a message to this actor, it atomically increments the message count. If it posted the first message (i.e., the count is now 1), it takes responsibility for ensuring the actor processes its incoming messages until the count drops back to 0. This might involve scheduling the processing of messages on a different thread. However, for this particular problem, we found that doing so reduced throughput by between 10-20%.

When the “responsible” thread has processed a message it atomically decrements the count. If the count reaches 0 then it effectively “loses responsibility” and can continue to execute the rest of the work it has to do.

We made one further optimization. This was to avoid using the ConcurrentQueue for the first message when the count is 0. This improved throughput by approximately 15%.


open System.Collections.Concurrent
open System.Threading

type 'a ISharedActor =
   abstract Post : msg:'a -> unit
   abstract PostAndReply : msgFactory:(('b -> unit) -> 'a) -> 'b

type 'a SharedMailbox() =
   let msgs = ConcurrentQueue<'a>()
   let mutable isStarted = false
   let mutable msgCount = 0
   let mutable react = Unchecked.defaultof<'a -> unit>
   let mutable currentMessage = Unchecked.defaultof<'a>

   let rec execute(isFirst) =

      let inline consumeAndLoop() =
         react currentMessage
         currentMessage <- Unchecked.defaultof<'a>
         let newCount = Interlocked.Decrement &msgCount
         if newCount <> 0 then execute false

      if isFirst then consumeAndLoop()
      else
         let hasMessage = msgs.TryDequeue(&currentMessage)
         if hasMessage then consumeAndLoop()
         else
            Thread.SpinWait 20
            execute false

   member __.Receive(callback) =
      isStarted <- true
      react <- callback

   member __.Post msg =
      while not isStarted do Thread.SpinWait 20
      let newCount = Interlocked.Increment &msgCount
      if newCount = 1 then
         currentMessage <- msg
         // Might want to schedule this call on another thread.
         execute true
      else msgs.Enqueue msg

   member __.PostAndReply msgFactory =
      let value = ref Unchecked.defaultof<'b>
      use onReply = new AutoResetEvent(false)
      let msg = msgFactory (fun x ->
         value := x
         onReply.Set() |> ignore
      )
      __.Post msg
      onReply.WaitOne() |> ignore
      !value

   interface 'a ISharedActor with
      member __.Post msg = __.Post msg
      member __.PostAndReply msgFactory = __.PostAndReply msgFactory

module SharedActor =
   let Start f =
      let mailbox = new SharedMailbox<_>()
      f mailbox
      mailbox :> _ ISharedActor

We used this implementation with the code below to reconstruct the same throughput test that we used previously.


let sharedActor =
   SharedActor.Start(fun inbox ->
      let rec loop count =
         inbox.Receive(fun msg ->
            match msg with
            | Add n -> loop (count + n)
            | GetAndReset reply ->
               reply count
               loop 0L)
      loop 0L)

This actor was capable of processing over 4,660,000 messages per second (on average over 10 runs of 3,000,000 messages). Therefore, this implementation provides a 3.8x throughput improvement on the Erlang implementation and a 1.7x throughput improvement on the Akka implementation.

Conclusion

We have compared some Scala, Erlang and F# actors. We found that Scala’s Akka allowed far greater throughputs than the default Erlang and F# implementations. We also built actors from scratch in F# that are capable of processing substantially higher throughputs than any of the default implementations. However, we noted that these are not very interesting or important results. What we really need to measure is the throughput and latency of realistically sized systems of actors interacting with one another. The Akka team have made a step in this direction. In their blog they look at how the total throughput of all actors in a system scales as more actors are added into that system.

We need more examples of how complex systems can be better described using the Actor Model than using OOP. Though it probably isn’t a silver bullet, the Actor Model is a promising solution to our “Big Data” problem because it is far easier to express concurrent interactions using actors than it is with OOP.

Possible next steps

  • Measure the (suspected positive) impact of per actor garbage collection that Erlang uses in comparison to the shared heap implementations of actors on the JVM and CLR
  • Measure the relationship between memory and the number of actors in a system on Akka, Erlang and F#
  • Measure the throughput across a large system of interacting actors in Akka, Erlang and F#
  • Implement cross machine actor communication in F#

Resources


Workaround for the boxing DateTime comparison in F# 2 and 3

The problem

In F# versions 2 and 3 DateTime comparisons are compiled into a call to this method: LanguagePrimitives.HashCompare.GenericLessThanIntrinsic. Unfortunately, this causes the DateTime struct to be boxed. This unnecessary allocation has a severe impact on the performance of code that performs many comparisons.

The F# interactive window is a great tool for testing performance of small bits of code. We can pass in the #time directive to get detailed information about the statements we run in the window. This includes the running time and the number of garbage collections performed on each generation whilst executing the statement.

We can display the problem using the following statements in the interactive window.

let x, y = DateTime.MinValue, DateTime.MaxValue;;
#time;;
for i = 0 to 10000000 do x > y |> ignore;;

This outputs the following.

--> Timing now on
Real: 00:00:01.853, CPU: 00:00:01.856, GC gen0: 76, gen1: 76, gen2: 0
val it : unit = ()

We can see that performing 10M comparisons on my machine takes just under 2 seconds. We can also see that the code is allocating, as the garbage collector performed a number of generation 0 and generation 1 collections during the execution of the statement.

The workaround

We can work around this problem with the comparison operators by using our own non-boxing equivalents.

We can make a non-boxing function by using a generic argument. We constrain it to be the type we want. Then we take arguments of this generic type rather than the type we actually want. This trick allows the compiler to write IL with the boxing removed.

let inline cmp<'a when 'a :> IComparable<'a>> (x:'a) (y:'a) = x.CompareTo y
let inline (=.) x y = cmp x y = 0
let inline (>.) x y = cmp x y > 0
let inline (<.) x y = cmp x y < 0
let inline (>=.) x y = cmp x y >= 0
let inline (<=.) x y = cmp x y <= 0
let inline max' x y =
   if x >=. y then x
   else y
let inline min' x y =
   if x <=. y then x
   else y

We can see an improvement if we run the same test but with our custom operator.

for i = 0 to 10000000 do x >. y |> ignore;;

This outputs the following.

--> Timing now on
Real: 00:00:00.068, CPU: 00:00:00.062, GC gen0: 0, gen1: 0, gen2: 0
val it : unit = ()

This shows a massive reduction in the CPU time taken to execute the statement. It is 29.9 times faster than the code using the boxing operator. We can also see that there were no garbage collections, which implies the operator doesn’t allocate.


LinqToForms toy project released

Overview

  • LinqToForms is a library for composing WPF forms using LINQ.
  • You can find it on github: here.
  • It only helps you build a ViewModel. It doesn’t generate a View, although that is something you could easily build yourself.
  • It works with XAML.
  • It is a toy project and hasn’t been well tested yet.

Goals

  • To make the construction of forms using MVVM feel more natural.
  • To simplify ViewModel validation logic by removing things like mutable state.
  • To make it possible to compose forms.

Structure

Formlets

Tomas Petricek wrote a great blog entry here on formlets. The basic idea is that a formlet represents part of a form. For example, a text field or an address section. Also, a formlet can include other formlets. For example, a company formlet might include an address formlet.

Construction and validation

The code below demonstrates how you might construct an address formlet with LinqToForms.

  public IFormlet<Address> Create()
  {
     var postCodeRegex = new Regex(@"[A-Za-z]+[0-9]+\s+[0-9]+[A-Za-z]+");

     return from firstLine in Formlet.Text("First Line")
            from secondLine in Formlet.Text("Second Line")
            from postCode in Formlet.Text("Post Code")
            where firstLine != ""
            where secondLine != ""
            where postCodeRegex.IsMatch(postCode)
            select new Address(firstLine, secondLine, postCode);
  }

In the first three lines of the return statement we declare the text fields that should be available on the address form using the from operation. We pass a string into the Formlet.Text(string id) method for each field. Later we will use these string values as keys into a dictionary of fields so that we can interact with the form from XAML.

The lines starting with where clauses constitute the validation logic for the formlet. Here we check that the “First Line” and “Second Line” fields are filled and that the “Post Code” field matches a regular expression.

Composition

The code below demonstrates how you might construct a company formlet that makes use of the address formlet above.

  public IFormlet<Company> Create()
  {
     return from name in Formlet.Text("Name")
            from address in new AddressFormletFactory().Create()
            where name != ""
            select new Company(name, address);
  }

Once again we declare a field, “Name”, that we expect in the form. In the following line we use the from operation again but this time to import a child formlet and all of its validation logic. The address variable has the type Address and it is used in the construction of a new Company instance.

Note: At present, LinqToForms only supports where clauses after all fields have been declared. This includes where clauses in child formlets. This needs to be fixed. It limits composability.

Forms and ViewModels

The value of a formlet changes over time. More specifically, it changes when the user updates a field in the form. Furthermore, any change in a child formlet propagates to its parents. For example, when a user changes the “First Line” field in the “Address Section” the address formlet produces a new Address instance if it passes validation. This new Address causes the company formlet to produce a new Company instance with an updated Address field.

A Form is a wrapper around a formlet. It listens to the formlet values over time and provides a command for submitting the current value of the formlet. This command is only executable when the formlet passes validation. It also exposes a dictionary from identifier to form field. This allows the view to interact with the form values.

Below is an example of a form for the company formlet.

class CompanyForm : Form<Company>
{
    public CompanyForm(string submitAction, Action<Company> onSubmit, Action onCancel) 
     : base(submitAction, onSubmit, onCancel)
    {
        Definition = new CompanyFormletFactory().Create();
    }
}

We assign the formlet to the Definition property of the Form sub type. In the constructor of Form we pass in several arguments. The submitAction argument is the label we give the submit button on the form, e.g., “Update Company” or “Add Company”. The onSubmit argument is a continuation for when the user submits a validated form.

Views

Text field bindings
    <Label Grid.Column="0">Name</Label>
    <TextBox Grid.Column="1" Text="{Binding Fields[Name].Value, Mode=TwoWay}" />
  • Name is the identifier provided in the formlet for the field.
Int/decimal field bindings
    <Label Grid.Column="0">Age</Label>
    <xctk:IntegerUpDown Grid.Column="1"
        Value="{Binding Fields[Age].Value, Mode=TwoWay}" 
        Increment="{Binding Fields[Age].Stepping}" 
        Maximum="{Binding Fields[Age].MaxValue}" 
        Minimum="{Binding Fields[Age].MinValue}" />
  • Age is the identifier provided in the formlet for the field.
  • The initial Value, Stepping, Minimum and Maximum can be defined in the formlet. See the UserFormletFactory in the example project.
Other field bindings

Feel free to add more and send us a pull request.

Gotchas

  • Where clause problem. See the note above.
  • Identifier re-use (in child formlets) is not supported.
  • Probably some other stuff too!

Solving the 0-1 knapsack problem using continuation-passing style with memoization in F#

Continuation-passing style (CPS) is a technique for adapting non tail-recursive functions into tail-recursive functions. This article describes continuation-passing style and why it is essential to solving some problems, such as the 0-1 knapsack problem, in a functional way.

Tail-recursion

A function is recursive if it calls itself. For example, the recursive function below returns the largest element in a list.

let largest xs =
   let rec largest = function
      | [] -> failwith "Must have at least one element in list"
      | x::[] -> x
      | x::xs -> max x (largest xs)
   largest xs

In the case where the list contains only one element that element is the largest element. Otherwise, the largest element in the list is the largest of the head and the largest element of the tail list.

A function is tail-recursive if it calls itself only at the end of the function. For example, the tail-recursive function below also returns the largest element in a list.

let largest' xs =
   let rec largest largestSoFar = function
      | [] -> largestSoFar
      | x::xs -> largest (max largestSoFar x) xs
   match xs with
   | [] -> failwith "Must have at least one element in list"
   | x::xs -> largest x xs

The first function “delayed” the computation involving the head of the list until the largest element of the tail list was known, whereas this function pushes the largest element found so far through the rest of the computation as an argument.

Performance

Rewriting the function in this style may worsen its readability. However, it has performance benefits.

let xs = [0..10000]
for i = 1 to 10000 do largest' xs |> ignore
for i = 1 to 10000 do largest xs |> ignore
largest' [0..100000] |> ignore
largest [0..100000] |> ignore

Running the code above inside F# interactive results in the following output [1][2].

Real: 00:00:00.246, CPU: 00:00:00.234, GC gen0: 0, gen1: 0, gen2: 0
val it : unit = ()
>
Real: 00:00:00.550, CPU: 00:00:00.546, GC gen0: 0, gen1: 0, gen2: 0
val it : unit = ()
>
Real: 00:00:00.019, CPU: 00:00:00.031, GC gen0: 1, gen1: 0, gen2: 0
val it : unit = ()
>

Process is terminated due to StackOverflowException.

This shows that the tail-recursive function is approximately 2.3 times faster for lists of 10,000 elements and safer for longer lists. The plain recursive function runs out of stack space when asked to compute the largest element in a list with 100,000 elements. However, the tail-recursive function does not. This is because the F# compiler performs tail-call elimination [3]. Tail-call elimination is an optimization where tail-recursive functions are converted into while loops.

It is possible to see the difference in the code generated by the compiler using a decompiler like ILSpy. Below are C# translations of the IL code generated for both functions.

1. Plain recursive function

internal static int largest@4(FSharpList<int> _arg1)
{
	if (_arg1.TailOrNull == null)
	{
		throw new Exception("Must have at least one element in list");
	}
	if (_arg1.TailOrNull.TailOrNull == null)
	{
		return _arg1.HeadOrDefault;
	}
	FSharpList<int> xs = _arg1.TailOrNull;
	int x = _arg1.HeadOrDefault;
	int num = Program.largest@4(xs);
	if (x < num)
	{
		return num;
	}
	return x;
}

2. Tail-recursive function

 
internal static int largest@11-1(int largestSoFar, FSharpList<int> _arg1)
{
   while (_arg1.TailOrNull != null)
   {
      FSharpList<int> fSharpList = _arg1;
      FSharpList<int> xs = fSharpList.TailOrNull;
      int x = fSharpList.HeadOrDefault;
      int arg_26_0 = (largestSoFar >= x) ? largestSoFar : x;
      _arg1 = xs;
      largestSoFar = arg_26_0;
   }
   return largestSoFar;
}

Continuation-passing style

Continuation-passing style provides an alternative means of making the original function tail-recursive. The original function “delayed” the computation until it received the result of a recursive call. If we use continuation-passing style, instead we pass the action we wish to perform on the result into the function. When the result is known the function carries out this action (the continuation).

let largest'' xs =
   let rec largest k = function
      | [] -> failwith "Must have at least one element in list"
      | x::[] -> k x
      | x::xs -> largest (fun x' -> k(max x x')) xs
   largest id xs

Line 6: Our initial intent is to return the value computed, therefore the first continuation we pass in is the identity function.
Line 4: When there is only one element in a list that element is the largest value and we use that in the continuation.
Line 5: When the largest element of the tail list is known we take the largest of that element and the head element and use that in the continuation.

Unfortunately using CPS can result in worse performance characteristics. Below are the results we obtained when we ran the function against the same tests that we ran the plain recursive and tail-recursive functions against.

let xs = [0..10000]
for i = 1 to 10000 do largest'' xs |> ignore
largest'' [0..100000] |> ignore

This outputs the following.

Real: 00:00:04.125, CPU: 00:00:04.102, GC gen0: 723, gen1: 77, gen2: 0
val it : unit = ()
>
Real: 00:00:00.023, CPU: 00:00:00.031, GC gen0: 1, gen1: 0, gen2: 0
val it : unit = ()
>

When asked to find the largest element in a list of 10,000 elements the tail-recursive function was approximately 17.5 times faster than the CPS function whilst the plain recursive function was approximately 7.5 times faster.

Multiple recursion

Sometimes we are forced to write continuation-passing style functions despite the fact they can have worse performance characteristics than “hand written” tail-recursive functions. Consider the example below. It is a recursive function that calculates the nth Fibonacci number.

let fib n =
   let rec fib = function
      | 0 -> 0
      | 1 -> 1
      | n -> fib(n-1) + fib(n-2)
   fib n

We cannot easily “hand write” a tail-recursive solution to this problem because it seems to require two recursive calls. However, we can easily turn it into a tail-recursive function using CPS.

let fib' n =
   let rec fib k = function
      | 0 -> k 0
      | 1 -> k 1
      | n -> fib (fun x -> fib (fun y -> k(x+y)) (n-2)) (n-1)
   fib id n

0-1 Knapsack problem

A more interesting problem with the multiple recursion trait is the 0-1 knapsack problem.

Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

Below is a recursive function that finds the greatest total value possible by choosing a combination of the first i items weighing up to w. For the first i items this total either includes the i-th item’s value or it does not.

If the i-th item is included, the solution must be equal to value(i) plus the greatest total value possible by choosing a combination of the first i-1 items weighing up to w - weight(i).

If the i-th item is not included, the solution must be equal to the greatest total value possible by choosing a combination of the first i-1 items weighing up to w.

let bestValue weightOf valueOf desiredWeight (items:_[]) =
   let weightOf i = weightOf items.[i-1]
   let valueOf i = valueOf items.[i-1]
   let rec bestValue = function
      | 0, _ | _, 0 -> 0.
      | i, w when weightOf i > w -> bestValue (i-1, w)
      | i, w ->
         let withoutItem = bestValue (i-1, w)
         let withItem = bestValue (i-1, w - weightOf i) + valueOf i
         max withoutItem withItem
   bestValue (items.Length, desiredWeight)

Obviously, this solver suffers from the same problems we highlighted in the other non tail-recursive functions. It cannot be used safely with large numbers of items. Though, once again, we can rewrite this function using continuation-passing style.

let bestValue' weightOf valueOf desiredWeight (items:_[]) =
   let weightOf i = weightOf items.[i-1]
   let valueOf i = valueOf items.[i-1]
   let rec bestValue k = function
      | 0, _ | _, 0 -> k 0.
      | i, w when weightOf i > w -> bestValue k (i-1, w)
      | i, w ->
         bestValue (fun withoutItem ->
            bestValue (fun withItem ->
                  let withItem = withItem + valueOf i
                  k(max withoutItem withItem)
               ) (i-1, w - weightOf i)
            ) (i-1, w)
   bestValue id (items.Length, desiredWeight)

Enhancing readability

When we introduced tail-recursive functions we mentioned that “hand writing” functions in this style can impair the readability of the solution somewhat. If we compare the two 0-1 knapsack problem solvers, we can see that continuation-passing style exacerbates this issue. Thankfully, we can use the continuation monad to improve readability whilst keeping the function tail-recursive. We can define our own monads (or computation expressions) in F# by defining a builder class with some special methods. Below is an example showing how we might define a builder for the continuation monad.

type ContinuationBuilder() =
   member b.Bind(x, f) = fun k -> x (fun x -> f x k)
   member b.Return x = fun k -> k x
   member b.ReturnFrom x = x
let cont = ContinuationBuilder()
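To see the builder in action on something minimal before the larger examples, here is an illustrative sketch (sumList is a hypothetical function, not part of the solvers): a naively recursive list sum made tail-recursive with the continuation monad. The builder definition is repeated so the snippet runs standalone.

```fsharp
type ContinuationBuilder() =
   member b.Bind(x, f) = fun k -> x (fun x -> f x k)
   member b.Return x = fun k -> k x
   member b.ReturnFrom x = x
let cont = ContinuationBuilder()

// A naively recursive list sum; each let! threads the continuation for us
let sumList xs =
   let rec sum xs = cont {
      match xs with
      | [] -> return 0
      | x :: rest ->
         let! restSum = sum rest
         return x + restSum
   }
   sum xs id

printfn "%d" (sumList [1 .. 100]) // 5050
```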

Two examples using the continuation monad are below. First, we show an adapted Fibonacci function. Second, we show an adapted 0-1 knapsack problem solver. They both use continuation-passing via the continuation monad. However, they look much closer to the original naive functions than the continuation-passing style functions.

let fib'' n =
   let rec fib n = cont {
      match n with
      | 0 -> return 0
      | 1 -> return 1
      | n ->
         let! x = fib(n-1)
         let! y = fib(n-2)
         return x + y
   }
   fib n id

let bestValue'' weightOf valueOf desiredWeight (items:_[]) =
   let weightOf i = weightOf items.[i-1]
   let valueOf i = valueOf items.[i-1]
   let rec bestValue (i, w) = cont {
      match i, w with
      | 0, _ | _, 0 -> return 0.
      | i, w when weightOf i > w -> return! bestValue (i-1, w)
      | i, w ->
         let! withoutItem = bestValue (i-1, w)
         let! beforeItem = bestValue (i-1, w - weightOf i)
         let withItem = beforeItem + valueOf i
         return max withoutItem withItem
   }
   bestValue (items.Length, desiredWeight) id

Imperative solution to the 0-1 knapsack problem written in C#

We have set out how to define a readable tail-recursive solver using the continuation monad, but we have yet to show that it performs well. We will use the imperative solver below, written in C#, as the baseline for testing performance.

public static double BestValue(Tuple<int, double>[] items, int desiredWeight)
{
   var bestValues = new double[items.Length + 1, desiredWeight + 1];
   for (var w = 0; w <= desiredWeight; w++) bestValues[0, w] = 0;
   for (var i = 0; i <= items.Length; i++) bestValues[i, 0] = 0;
   for (var i = 1; i <= items.Length; i++)
   {
      var item = items[i-1];
      var weight = item.Item1;
      var value = item.Item2;
      for (var w = 1; w <= desiredWeight; w++)
      {
         if (weight > w)
         {
            bestValues[i, w] = bestValues[i-1, w];
         }
         else
         {
            var withoutItem = bestValues[i-1, w];
            var beforeItem = bestValues[i-1, w-weight];
            var withItem = beforeItem + value;
            bestValues[i, w] = Math.Max(withItem, withoutItem);
         }
      }
   }

   return bestValues[items.Length, desiredWeight];
}

We tested the solvers using the code below to generate test data.

let genItems n = 
   [| for i = 1 to n do
         let weight = i % 5
         let value = float(weight * i)
         yield weight, value
   |]
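The timings plotted below were gathered in F# Interactive (see the #time note at the end of the article); a minimal Stopwatch-based harness along these lines would work equally well. This is a sketch, and the time helper is a hypothetical name, not code from the original measurements:

```fsharp
open System.Diagnostics

// Hypothetical helper: run f once and print the elapsed wall-clock time
let time label f =
   let sw = Stopwatch.StartNew()
   let result = f ()
   sw.Stop()
   printfn "%s took %dms" label sw.ElapsedMilliseconds
   result

// e.g. time "genItems" (fun () -> genItems 100000)
```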

This is a plot of the time each of the solvers took to find the optimum solution for various sizes of input. It shows that the solver using the continuation monad has a much worse asymptotic time-complexity than the imperative solver.

Memoizing

If we look at the solver using the continuation monad more closely it is easy to see how we might improve it. The main function makes multiple recursive calls to itself computing values that may have already been computed in sub-problems. Therefore, we can improve the function by memoizing it.

The function below memoizes recursive functions. This particular memoizer only works for functions written in continuation-passing style (or with the continuation monad) that accept two integer inputs inside a known range.

let memoize2D w h f =
   let cache = Array2D.zeroCreate w h
   let hasCached = Array2D.zeroCreate w h
   let rec memoize x y k =
      if hasCached.[x,y] then k cache.[x,y]
      else f memoize x y (fun f_x_y ->
         cache.[x,y] <- f_x_y
         hasCached.[x,y] <- true
         memoize x y k)
   memoize
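To see the memoizer working in isolation before applying it to the solver, here is a self-contained example: the memoizer repeated from above, applied to a hypothetical binomial-coefficient function written in continuation-passing style via Pascal's rule, C(n, k) = C(n-1, k-1) + C(n-1, k).

```fsharp
let memoize2D w h f =
   let cache = Array2D.zeroCreate w h
   let hasCached = Array2D.zeroCreate w h
   let rec memoize x y k =
      if hasCached.[x,y] then k cache.[x,y]
      else f memoize x y (fun f_x_y ->
         cache.[x,y] <- f_x_y
         hasCached.[x,y] <- true
         memoize x y k)
   memoize

// Pascal's rule in continuation-passing style, memoized over (n, k)
let binomial n k =
   let choose self n k ret =
      if k = 0 || k = n then ret 1L
      else self (n-1) (k-1) (fun a -> self (n-1) k (fun b -> ret (a + b)))
   let choose = memoize2D (n+1) (k+1) choose
   choose n k id

printfn "%d" (binomial 5 2) // 10
```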

We have adapted the solver using the continuation monad to make use of this memoizer below.

let bestValue''' weightOf valueOf desiredWeight (items:_[]) =
   let inline weightOf i = weightOf items.[i-1]
   let inline valueOf i = valueOf items.[i-1]
   let bestValue bestValue i w = cont {
      match i, w with
      | 0, _ | _, 0 -> return 0.
      | i, w when weightOf i > w -> return! bestValue (i-1) w
      | i, w ->
         let! withoutItem = bestValue (i-1) w
         let! beforeItem = bestValue (i-1) (w - weightOf i)
         let withItem = beforeItem + valueOf i
         return max withoutItem withItem
   }
   let bestValue = memoize2D (items.Length+1) (desiredWeight+1) bestValue
   bestValue items.Length desiredWeight id

We tested this solver against the imperative solver and the non-memoized continuation-monad solver. The results show that memoization reduces the asymptotic time-complexity to match the imperative solver: for this test data it now scales roughly linearly with the number of items.

However, this is only an approximate asymptotic bound. If we plot just the imperative solver against the memoized solver it shows that the imperative solution is substantially faster.

Summary

In this article, we demonstrated that we can define a natural recursive solution to the 0-1 knapsack problem using F#. We investigated the safety of this function for large input sizes and found that defining tail-recursive functions is important. We discussed how we could improve the readability of tail-recursive functions using the continuation monad. Finally, we compared the performance of the functional solvers to an imperative solver and showed that we could achieve the same approximate asymptotic time-complexity by using memoization. However, we also pointed out that in reality the imperative function is much faster, probably due to the expense of allocating closures in the memoized solver.

Notes

1. The code had an additional definition to avoid boxing comparisons.

let inline max (x:int) (y:int) = max x y

2. To turn on timing information in F# interactive you can use the following declaration.

#time;;

3. Tail-call elimination is used only in the ‘Release’ configuration by default. It must be explicitly turned on in the ‘Debug’ configuration.
