"CONS" sucks; "LIST*" rules!

What's the problem, anyway?

     Guy Steele1 has infamously stated "stop using cons". It basically bakes in the singly-forward-linked2 list to your program, leading to all kinds of problems (gc, fragmentation, hardened code, interfaces). Part of his plea was to stop coding to the "singly-forward-linked" interface. It also makes it difficult to impossible to programmatically optimize code, especially parallelizing.3

The tantalizing allure of simplicity is emblematic of why CONS is so hard to get away from.

Not everything in LISP is a list

     The minimum API calls for lists are car, cdr, and cons.4 Using these three primitives, we can define whatever data structures we want (out of lists, anyway). This is part of the reason for the popular misconception that lists are the only data structure available. Another reason is the very name LISt Processing lends itself to this kind of misinterpretation. LISP has one of the most complete mathematical stack out of the most commonly used computer languages today. It was also one if the first languages to include hashtables as a native datatype, as well as one of the first to include objects (CLOS).

     Remaking the world with only a few primitives goes back to Locke/Descartes. It is a romantic, aspiration notion, but it's a halcyon dream. In reality, this simplicity benefits language designers more than language users. This is the ultimate sin in language design. In essence, Steele's sentiment was that cons is too low-level for normal use and should be relegated to the dustbin of history.5 He is correct.

CONS is fiddly

     Manually building complex data structures from tiny pieces usually results in code that is hard to read, even for the author. Et al Einstein: “Make things as simple as possible but no simpler.” The base API for singly-forward-linked lists is simple, but cons is too simple.

     First of all, there are tools that mitigate the use of cons, the most famous being list itself. Without the list operator, it's extremely difficult, if not downright impossible, to write useful programs. Clearly, this code smell indicates a larger predicament.

     The improper list is . . . weird. Improper lists are also called, less pugilistically, dotted lists. Let's analyze the language here. An improper list is, by definition, imperfect, but who defines perfection? In this case, it's most computer scientists. The dotted list is an edge case, because you don't need it. No other languages define this in the deepest core of the machinery of the language itself..

     Disregarding this edge case, it is trivial to represent a list as an array or vector6. Furthermore, no other simple data types possess this property of having the last value in a different state that has a different method to access, and also produces errors when you try to use an improper list as a proper list. Finally, it increases cognitive load when your algorithm has to constantly take into consideration not only the possibility of processing an improper list, but also of generating one accidentally. Suddenly every interface involving lists has been doubled i.e. “Will this thing blow up if i pass in this special type of list?”

“Just one more thing”

     Additionally, this property is sometimes used to signal to the function that you should do something differently, like scheme, where a dotted list signals that the last argument is really a &rest argument. this allows the language to drop the &rest symbol to simplify the world, but moves the load onto the improper list. In Common Lisp, a dotted list as an argument list is always an error, so this a clever hack to reduce the number of symbols in the system dictionary. (is this a comp sci pun? other are, "cadr" for second generation lisp machine) (contractions: caadr cadr cddr, etc. all the way to 5. For simple cases, elt works too (but car/cdr could be more efficient, but in modern environments this mostly a wash.))

     So by using list (and friends), we can avoid cons nearly all the time. The edge case is improper lists. Improper lists are also called, less pugilistically charged, dotted lists. Let's analyze the language here. An improper list is, by definition, imperfect, but who defines perfection? In this case, it's most computer scientists.

Enter LIST*

     list* (notice the star at the end) is the solution to this madness. Well, maybe not the underlying problem, but at least we can completely discard the dreaded cons.7

     To put it in modern Comp Sci parlance:

  • list is the constructor for a proper list. It takes all its arguments and returns them in a ready-to-go, proper, fresh guaranteed fresh list.

  • list* is the constructor for a dotted list. In contrast to list, the list it returns has the last value as the final cdr instead of a node.

     Singly-linked lists may have shared nodes, but NO lists that come from the list constructor share ANY nodes except for list termination (the NIL node)8.
The same is true of lists coming from the list* constructor, except it does not require NIL at all.

     Finally, here is the final proof we can kick cons to the curb for all time:

  • Calling cons with anything other than two arguments is always an error.

  • Calling list* with two arguments produces the identical return value as calling cons with two arguments.

  • Therefore, there is no application of cons that cannot be replaced by list* with no change. 9

In Conclusion

     Stop using cons. Immediately. Just don't do it. Use list* instead. If something breaks, it was already broken.

CONS is too low level for everyday use.

LIST* is the more flexible, improved constructor.

  1. Guy L. Steele Jr. is one of the foremost LISP giants. A designer of Scheme. Was on (all?) Common Lisp committee(s). Authored both versions of "Common Lisp the Language". Designed TECO codes at MIT. Originally wrote emacs (based on TECO, before richard stallman).

  2. I use the term singly-forward-linked instead of the more classical singly-linked to make explicit the underlying assumption that the links are forward pointing. Technically you could also have a singly-backwards-pointing list or some other non-traditional list structure.

  3. See also:

  4. Some might include null here for detecting the end of a list, but if you have eq and NIL you can define it yourself.

  5. Lisp is so old, the dustbin of history goes back over 60 years! In Common Lisp, things never go away, they just get moved to the back of the toolshed. For example, check out rplaca and rplacd. Every time I've ever read about them, there has always been an admonition to find a better way. Maybe it's possible that every assembly language primitive in the IBM 704 shouldn't be brought whole stock into the present day?

  6. In Lisp, an array is a contiguous multi-dimensional array, while a vector is an array with only one dimension.

  7. We've already established that cons is the fundamental building block for all lists, but this is the lie that helps the medicine go down.

  8. Is NIL a node or a symbol? Is it false, or is false a separate value? What does equality mean, anyway? These are all deeper questions only tangentially related to this discussion.

  9. It would be possible to write code that requires code introspection, then makes decisions based on a hard-coded understanding of cons. If you're doing this, you're really drawing outside the lines, and your code is brittle. Good code would treat the two argument versions as the same (because they are the same).

© Copyright December 2019, Joel M Ward. All Rights Reserved