Calling all space leaks
by Edward Z. Yang
I’m currently collecting non-stack-overflow space leaks, in preparation for a future post in the Haskell Heap series. If you have any interesting space leaks, especially if they’re due to laziness, send them my way.
Here’s what I have so far (unverified: some of these may not leak or may be stack overflows. I’ll be curating them soon).
import Control.Concurrent.MVar
-- http://groups.google.com/group/fa.haskell/msg/e6d1d5862ecb319b
main1 = do file <- getContents
putStrLn $ show (length $ lines file) ++ " " ++
show (length $ words file) ++ " " ++
show (length file)
-- http://www.haskell.org/haskellwiki/Memory_leak
main2 = let xs = [1..1000000::Integer]
in print (sum xs * product xs)
-- http://hackage.haskell.org/trac/ghc/ticket/4334
leaky_lines :: String -> [String]
leaky_lines "" = []
leaky_lines s = let (l, s') = break (== '\n') s
in l : case s' of
[] -> []
(_:s'') -> leaky_lines s''
-- http://stackoverflow.com/questions/5552433/how-to-reason-about-space-complexity-in-haskell
data MyTree = MyNode [MyTree] | MyLeaf [Int]
makeTree :: Int -> MyTree
makeTree 0 = MyLeaf [0..99]
makeTree n = MyNode [ makeTree (n - 1)
, makeTree (n - 1) ]
count2 :: MyTree -> MyTree -> Int
count2 r (MyNode xs) = 1 + sum (map (count2 r) xs)
count2 r (MyLeaf xs) = length xs
-- http://stackoverflow.com/questions/2777686/how-do-i-write-a-constant-space-length-function-in-haskell
leaky_length xs = length' xs 0
where length' [] n = n
length' (x:xs) n = length' xs (n + 1)
-- http://stackoverflow.com/questions/3190098/space-leak-in-list-program
leaky_sequence [] = [[]]
leaky_sequence (xs:xss) = [ y:ys | y <- xs, ys <- leaky_sequence xss ]
-- http://hackage.haskell.org/trac/ghc/ticket/917
initlast :: (()->[a]) -> ([a], a)
initlast xs = (init (xs ()), last (xs ()))
main8 = print $ case initlast (\()->[0..1000000000]) of
(init, last) -> (length init, last)
-- http://hackage.haskell.org/trac/ghc/ticket/3944
waitQSem :: MVar (Int,[MVar ()]) -> IO ()
waitQSem sem = do
(avail,blocked) <- takeMVar sem
if avail > 0 then
putMVar sem (avail-1,[])
else do
b <- newEmptyMVar
putMVar sem (0, blocked++[b])
takeMVar b
-- http://hackage.haskell.org/trac/ghc/ticket/2607
data Tree a = Tree a [Tree a] deriving Show
data TreeEvent = Start String
| Stop
| Leaf String
deriving Show
main10 = print . snd . build $ Start "top" : cycle [Leaf "sub"]
type UnconsumedEvent = TreeEvent -- Alias for program documentation
build :: [TreeEvent] -> ([UnconsumedEvent], [Tree String])
build (Start str : es) =
let (es', subnodes) = build es
(spill, siblings) = build es'
in (spill, (Tree str subnodes : siblings))
build (Leaf str : es) =
let (spill, siblings) = build es
in (spill, Tree str [] : siblings)
build (Stop : es) = (es, [])
build [] = ([], [])
Did you enjoy this post? Please subscribe to my feed!
How about the little known, but somewhat surprising fact, that you can’t use lazy `modify*`-style functions without leaking space.
http://www.haskell.org/pipermail/haskell-cafe/2009-June/063139.html
Hee hee hee. That’s a good one. :-)
On a related note, I don’t think this code in Data.IORef.Strict actually works…
atomicModifyIORef :: (NFData sa, NFData sb) => IORef sa -> (sa -> (sa, sb)) -> SIO sb atomicModifyIORef ref f = SIO $ do x < - IO.atomicModifyIORef ref (rnf' . f) rnf x `seq` return x -- since the result of 'f' is a pair which has to forced where rnf' x = rnf x `seq` xA simple one:
update rec = rec { rec_value = if x then new else rec_value rec, … other fields }
The problem is that if rec_value is not forced for a long time, it builds up thunks, even if ‘x’ was False. I wound up not setting rec_value at all if x was not true, but nowadays I think I’d just strictify the record fields. It’s just a case of the lazy record update leak, but I wasn’t expecting it because ‘x’ was usually False.
I’m still pretty new to Haskell, slowly working my way through Real World Haskell. But they do mention space leaks in that book, and I *think* this code I wrote recently is probably space-leaky because `solve` will return an expression which reduces to 1+(1+(1+(1+(1+(1+…)))…)))
https://gist.github.com/968189
This should stream nicely, although it doesn’t. https://gist.github.com/969253
Seems like this code:
leaky_length xs = length' xs 0
where length' [] n = n
length' (x:xs) n = length' xs (n + 1)
should be absolutely straightforward tail recursion… does Haskell not do tail recursion optimization?
yachris: Yep, that function is tail recursive. The problem is more insidious: n + 1 is a thunk, and you may notice that n is not forced until the list is empty, so the chain of thunks builds up on the heap instead.
Whoops, sorry, should have google’d first:
http://stackoverflow.com/questions/412919/how-does-haskell-tail-recursion-work
f _ = (a, b)
where
a = space !! 10000 -- time intensive
b = space !! 10001 -- time intensive
space = [0..10001] -- space intensive
main = do
let list = fmap f [1..]
print $ fmap fst list -- a is forced, not forcing b means space is not freed
print $ list
Tyler: Your code doesn’t “space leak” per se; the real problem is that you fail to manage a tail-call, which means that the stack will grow as you evaluate that function.
fuzxxl: I diagnosed your space leak: the problem is that fromListWith is not strict w.r.t. the combining function. So if you swap that out with:
fromListWith' :: Ord k => (a -> a -> a) -> [(k,a)] -> Map k afromListWith' f xs
= foldl' ins empty xs
where
ins t (k,x) = insertWith' f k x t
it uses constant space. Note that the original does stream nicely (the hint is that the heap profile shows lots of thunks, *not* cons cells.
Interesting. So my http://blog.ezyang.com/2011/05/calling-all-space-leaks/#comment-2418 looks like a selector leak. Should I expect ghc to do the right strictness annotation here?
No, I actually think your leak is due to the full laziness transformation. But I haven’t experimentally verified yet.
In fact MFE transformation makes it not leak. My original problem was more complex with no MFE transformation possible. Without optimization, this simple example leaks. For every member of the `list’, in the pair `(a, b)’ only `a’ is forced and because `b’ is not, the whole intermediate datastructure (`space’) has to be kept in memory.