Proposal: Suggest explicit type application for Foldable length and friends
March 21, 2017tl;dr If you use a Foldable function like length or null, where instance selection is solely determined by the input argument, you should make your code more robust by introducing an explicit type application specifying which instance you want. This isn’t necessary for a function like fold, where the return type can cross-check if you’ve gotten it right or not. If you don’t provide this type application, GHC should give a warning suggesting you annotate it explicitly, in much the same way it suggests adding explicit type signatures to top-level functions.
Recently, there has been some dust kicked up about Foldable instances causing “bad” code to compile. The prototypical example is this: you’ve written length (f x), where f is a function that returns a list [Int]. At some future point in time, a colleague refactors f to return (Warnings, [Int]). After the refactoring, will length (f x) continue to type check? Yes: length (f x) will always return 1, no matter how long the inner list is, because it is using the Foldable instance for (,) Warnings.
The solution proposed in the mailing list was to remove Foldable for Either, a cure which is, quite arguably, worse than the disease. But I think there is definitely merit to the complaint that the Foldable instances for tuples and Either enable you to write code that typechecks, but is totally wrong.
Richard Eisenberg described this problem as the tension between the goals of “if it compiles, it works!” (Haskell must exclude programs which don’t work) and general, polymorphic code, which should be applicable in as many situations as possible. I think there is some more nuance here, however. Why is it that Functor polymorphic code never causes problems for being “too general”, but Foldable does? We can construct an analogous situation: I’ve written fmap (+2) (f x), where f once again returns [Int]. When my colleague refactors f to return (Warnings, [Int]), fmap now makes use of the Functor instance (,) Warnings, but the code fails to compile anyway, because the type of (+1) doesn’t line up with [Int]. Yes, we can still construct situations with fmap where code continues to work after a type change, but these cases are far more rare.
There is a clear difference between these two programs: the fmap program is redundant, in the sense that the type is constrained by both the input container, the function mapping over it, and the context which uses the result. Just as with error-correcting codes, redundancy allows us to detect when an error has occurred; when you reduce redundancy, errors become harder to detect. With length, the only constraint on the selected instance is the input argument; if you get it wrong, we have no way to tell.
Thus, the right thing to do is reintroduce redundancy where it is needed. Functions like fold and toList don’t need extra redundancy, because they are cross-checked by the use of their return arguments. But functions like length and null (and arguably maximum, which only weakly constrains its argument to have an Ord instance) don’t have any redundancy: we should introduce redundancy in these places!
Fortunately, with GHC 8.0 provides a very easy way of introducing this redundancy: an explicit type application. (This was also independently suggested by Faucelme.) In this regime, rather than write length (f x), you write length @[] (f x), saying that you wanted length for lists. If you wanted length for maps, you write length @(Map _) (f x). Now, if someone changes the type of f, you will get a type error since the explicit type application no longer matches.
Now, you can write this with your FTP code today. So there is just one more small change I propose we add to GHC: let users specify the type parameter of a function as “suggested to be explicit”. At the call-site, if this function is used without giving a type application, GHC will emit a warning (which can be disabled with the usual mechanism) saying, “Hey, I’m using the function at this type, maybe you should add a type application.” If you really want to suppress the warning, you could just type apply a type hole, e.g., length @_ (f x). As a minor refinement, you could also specify a “default” type argument, so that if we infer this argument, no warning gets emitted (this would let you use the list functions on lists without needing to explicitly specify type arguments).
That’s it! No BC-breaking flag days, no poisoning functions, no getting rid of FTP, no dropping instances: just a new pragma, and an opt-in warning that will let people who want to avoid these bugs. It won’t solve all Foldable bugs, but it should squash the most flagrant ones.
What do people think?
fold,toList, and evenFunctoractually have this problem too, in a slightly different circumstance. Suppose you start with[Int]and “upgrade” it to([Int], Int). Originally,fmap (+1)would add 1 to each element of the list; now it adds one to the second component of the tuple. Or suppose you start with[[Int]]and upgrade it to([[Int]], [Int]). Previously,foldwould concatenate the list of lists. Now it just produces the second list.Marking functions as having some explicit (rather than implicit) parameters: I think that this is good language design.
The idea of giving a “default” type that does not need an annotation is interesting but not fully fledged. I think you would rather want to have a list of “likely” instances that do not need to be explicitated (there is no reason to prefer lists over arrays for length, both should be accepted implicitly or neither), and users should be able to extend this base. This starts to remind us of Coq’s “hint databases” for example, and it sounds like heavy and potentially-modularity-damaging design, so I would thread fairly carefully.
Haskell 2010 Report – i.e. language standard, gives
length :: [a] -> Int. So the obvious fix is to make the language comply to standard. If you want a length-alike function to work for Foldables, name the function differently.“Lists are a commonly used data structure in functional languages” says the Haskell Gentle Introduction, and ‘Why Functional Programming Matters’ demonstrates that perfectly. (A little before that quote in the Gentle Intro, says “All type errors are detected at compile time” and “Haskell’s type system is powerful enough to allow us to avoid writing any type signatures at all.”) It then goes on to introduce Strings as Lists, and List Comprehensions. And of course
lengthof a list is part of that.So Lists (and their associated functions, esp
mapandfold, ‘reduce’) are ubiquitious in learning materials. Nowadays from the cognoscenti I see Lists deprecated as almost always the wrong technique. The FTP change, whatever its merits for intellectual purity, has put Haskell further out of reach for beginners (and for those trying to teach them).Now it turns out beginners have to be nervous about using
length. And you think the ‘fix’ is to introduce explicit type application? (The other suggested ‘fix’ I’ve seen is to avoid importing the now-standard Prelude, use a hobbled version of Prelude, with those generic functions specialised to Lists. Again a good way to burden nervous beginners.)This is presumably part of Haskell’s strategy to “avoid success at all costs”. Stop being a language that entices beginners into Functional Programming. (Which was its attraction for me.) Instead make it as recondite and burdensome to learn as any other.
I see a smaller and smaller circle of people on ghc-devs and the Libraries lists making decisions about Haskell’s direction. The effect is there’s a steeper and steeper learning curve to get to use Haskell, and to get to understand the issues in language design. I find nowadays it more likely my code will get obscure type errors. It feels like my learning is going backwards.
Sorry this isn’t answering your question. I don’t like any of the suggested answers. Because I don’t like the question. I don’t like that Haskell has somewhere gone sour, and such questions are necessary.
@max630 “not really a sequence” is a good intuition. I can see Vector-as-sequence. I can’t see Set-as-sequence. (The whole idea of a set, as a mathematical abstraction, is that it’s unordered/has no internal structure.)
I can’t really see Arrays being a sequence as @gosche discusses. You could argue a Vector is just a one-dimensional Array. You could argue an Array is a Vector-of-Vectors. I have no strong intuition whether to count all the cells as equal, or just count the range of the first subscript (Foldable probably counts the last subscript, knowing how non-intuitive it is ;-). “length” (as an English word) doesn’t seem right for either.
We have a precedent that
mapis specialised for Lists, and there’s a genericfmapandfoldMap. And the FTP change didn’t stuff around with that [**].genericLengthis already taken, and anyway “xxxLength” isn’t the right abstraction for structures that aren’t sequences. So let’s usecardinalityfor the genericisation to Foldable.Given we are where we are. I think the answer to Edward’s question is, at the top of every program:
length :: [a] -> Int; length = Prelude.length;
cardinality = Prelude.length;
[**] And there wasn’t a
lengthin Data.Foldable. I said “didn’t stuff around” because of the shackle of backwards compatibility. We usually interpret that shackle to mean ‘don’t break existing code’. Usually when a new feature’s being introduced, there’s some extra syntax so you can explicitly invoke it. The trouble with the FTP change was deliberately no change in syntax. So existing code didn’t break; good in general except ….There’s a corollary to ‘don’t break existing code’ (which only matters when you’re not introducing new syntax): ‘don’t make code that breaks suddenly non-breaking’.