Doorstop

2009-01-04

Faster than a speeding arrow

Filed under: haskell — Vineet @ 21:15

It turns out that although I’d already solved project euler problem 14, for some reason or another, I didn’t have the code stored in my git repository along with all the rest of them.

So I sat down to write some code:

collatz 1 = [1]
collatz n = n : collatz next
  where
    next | even n = n `div` 2
         | odd n  = 3 * n + 1

That was my first definition of the collatz sequence. I rewrote it using unfoldr:

collatz = unfoldr next
  where
    next 0 = Nothing
    next 1 = Just (1, 0)
    next n | even n = Just (n,  n `div` 2)
           | odd n  = Just (n, 3 * n + 1)

… which I liked a little bit better.

Then I went on to actually solving the problem: looking for the integer with the longest collatz sequence (in the range 1..1000000). Here’s a straightforward approach:

maximumBy (comparing $ length.collatz) [1..1000000]

It almost reads like English. But of course, it doesn’t actually Just Work. We get a stack overflow, since maximumBy uses a lazy foldl:

maximumBy		:: (a -> a -> Ordering) -> [a] -> a
maximumBy _ []		=  error "List.maximumBy: empty list"
maximumBy cmp xs	=  foldl1 max xs
			where
			   max x y = case cmp x y of
					GT -> x
					_  -> y

So I wrote this using the strict foldl1':

main = print $ foldl1' max $ map (length.collatz) [1..1000000]

… which runs without overflowing, but of course, it doesn’t actually return the number I’m looking for (it prints out the length of the longest sequence, not the integer that generated that sequence). But anyway, it shows the proof of concept: strict evaluation of the fold avoids the stack overflow. So I just need a strict maximumBy', which is just the same as maximumBy but uses foldl1' in place of foldl1.

So now I have a program that runs and produces the correct answer, but it takes about 90-100 seconds.

It turns out my original definiton of collatz (not using unfoldr)  brings this down to about 70 seconds. So okay, I use that instead.

Then just for kicks, I wrote this:

main = print . snd . maximum $ map ((length &&& head) . collatz) [1..1000000]

… and to my great surprise, it ran in 8 seconds! There’s clearly something going on here that I don’t understand yet, and it amounts to a 10x speed difference.

2007-07-01

bottom

Filed under: haskell — Vineet @ 03:53

This is totally childish.

Am I the only that thinks the ascii approximation _|_ for the up tack symbol ⊥, which represents the bottom type, looks a lot like the ascii approximation for the representation of an actual bottom (_|_) ?

I warned you.

2007-01-03

Haskell Caching?

Filed under: haskell — Vineet @ 05:13

I started tinkering with Haskell a bit. In particular, I’m working through some of the Project Euler challenge problems. I’ve read some vague promises like “Haskell’s lazy evaluation gives you caching for free.” I’m still trying to figure out exactly what the scope of that is. For example, I’ve written some purely functional code:

factors n = factors' 2 n
  where
    factors' k n | k*k > n        = [1]
                 | n `mod` k == 0 = k : n `div` k : factors' (k+1) n
                 | otherwise      = factors' (k+1) n
d = sum . factors
pairs n = [x | x <- [1..n], d x /= n, d (d x) == n]

Same inputs gives the same outputs. (This is for problem 21, the amicable pairs.) But if I change this:

main = do
    print $ sum $ pairs 10000

to this:

main = do
    print $ pairs 10000
    print $ pairs 10000
    print $ sum $ pairs 10000

it takes 3 times as long to run, and it’s clear as I watch it print out each list as it computes it that it’s not caching these intermediate results anywhere, and is in fact recomputing them.

I have seen Haskell’s lazy computation of infinite streams and implicit caching associated there at work, and it’s cool. So maybe I need to take a different approach: say rather than defining d and pairs as functions here, I should instead be constructing them up as infinite lists instead.

OK, I tried it like this:

amicable =  filter (\n -> (d n /= n) &&  (d (d n) == n)) [1..]

main = do
	print $ takeWhile (<10000) amicable
	print $ takeWhile (<10000) amicable
	print $ sum $ takeWhile (<10000) amicable

And now it’s clearly not recomputing the amicable stream. So it’s caching, but I wouldn’t call it “for free”. Admittedly, adding in the cache was dead simple, but really it’s the equivalent of implementing memoization as in any language, so saying “Haskell ‘gives’ it to me” is a bit much.

Also, I’m just getting started with Haskell; I could just be doing something entirely wrong here. Maybe there is a way to have it cache function results in addition to this list memoization, or maybe it is caching but I need to tune my cache parameters to make it more useful.

Theme: Shocking Blue Green. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.