We’ll motivate this discussion through a simple problem. Consider a function that takes a list of numbers then:
- Multiplies every element by 3
- Filters out any odd elements
- Returns the sum of the remaining elements
In C++, you could write:
#include <vector>
int func(const std::vector<int> &numbers) {
int sum = 0;
for (auto num : numbers) {
// May as well skip early, rather than computing num * 3
if (num % 2 == 1)
continue;
num *= 3;
sum += num;
}
return sum;
}
In Common Lisp you could write (excluding the one liner):
(defun func1 (nums)
(let* ((threes (mapcar (lambda (x) (* x 3)) nums))
(evens (remove-if #'oddp threes)))
(reduce #'+ evens)))
Or write:
(defun func2 (nums)
(loop for n in nums
if (evenp n)
sum (* n 3)))
But what if I wanted to add more steps in my algorithm? I’d have to
hard code it in the loop for the C++ version. In func1
I’ll
probably have to make a new name for the result of the step in the
let
form and rearrange all the steps that come after it. Same with
func2
. If only I could write a single form that describes the
structure of what I’m doing so that I could just swap steps around
or add new ones without worrying about inane things like names. This
is obviously something beyond normal code that does stuff to some
data: we want a language feature.
Adding language features to C++ beyond the child’s play of the preprocesser requires hacking on the compiler. In Lisp, however, it’s simpler because the macro system is so much more capable.
An aside on the macro system of Lisp’s
The reason why the macro system of Lisp’s are so much more capable is because Lisp’s are homoiconic. The sound-byte definition of homoiconicity is “code is data”. To be specific the representation of the program for the interpreter to evaluate is composed of the data structures the code deals with anyway: lists.
For example (+ 1 2)
is an expression which adds the numbers 1
and
2
together. '(+ 1 2)
is a list of three elements: the symbol +
and the numbers 1
and 2
. The only difference is that the
interpreter considers the former something to evaluate, while the
latter is just data. We can tell the interpreter to evaluate data as
program code through the function eval
which takes some form as an
argument then evaluates it. So (eval '(+ 1 2))
is equivalent to 3.
What macros do is simply sit in between when an expression is read and
when it is executed, translating the expressions given (as data) into
some other data. This data is then sent to eval
1. Let’s
consider a simple macro:
(defmacro add-1 (x)
(list '+ x 1))
If I try to evaluate (add-1 10)
the expression is first expanded
into (+ 10 1)
then evaluated to produce 11. This doesn’t seem that
interesting, right? Now what if I told you that most Lisp forms, from
if
to defun
, usually are just macros? If you’ve got some
fundamental definitions and a macro step in your Lisp interpreter, you
can implement most features through macros. For example, if you have
cond
available, you can implement if
:
(defmacro if (test then else)
`(cond
(,test ,then)
(t ,else)))
Here I’m using a little notation built into Common Lisp to make
writing lists with embedded data easier: the object following a comma
is read from the environment then embedded directly into the list,
while the rest are written as-is. (if (= x 1) 1 2)
would expand to:
(cond
((= x 1) 1)
(t 2))
Implementing a threading macro
Getting back to the problem at hand, we want some language feature which takes a list of transformations and does them sequentially. We don’t want to be naming the results of stuff but we want to be able to use the value of the last transformation in the next one. Let’s sketch this out with the toy problem we were doing earlier:
(-> (mapcar (lambda (x) (* x 3)) nums)
(remove-if #'oddp _)
(reduce #'+ _))
This form does the first expression, then the next, then the next.
The _
represents the value of the previous transformation. This
removes a lot of the noise from the previous definitions, and adding a
new step is as simple as adding another expression in the sequence,
using _
if necessary. But how would we implement this?
We know let
binds some symbol to a value. So the macro expansion of
this form could look like:
(let ((_ (mapcar (lambda (x) (* x 3)) nums)))
(let ((_ (remove-if #'oddp _)))
(reduce #'+ _)))
where the nested binding is shadowing the previous value but also using it. While this is very noisy, the top level macro form isn’t. And as long as it works (and, with some human brain Lisp interpreting, it seems to) this should just work.
How would we implement this as a macro? Remember, while the parameters of a macro are data, we can still use all the usual Lisp stuff we want inside a macro to generate the right executable data. Here’s my implementation:
(defmacro -> (first &rest functions)
(let ((builder first))
(loop for function in functions
do (setf builder
`(let ((_ ,builder))
,function)))
builder))
Explaining this code in simple terms, the macro takes one fixed
argument first
and a variadic number of arguments functions
. We
start with builder
being the first function. Then iterate through
the functions
and make a list of nested let
bindings based on the
builder
we’ve had so far. We then return this builder
at the end,
which will be evaluated.
Oh yeah, I forgot to mention. Common Lisp isn’t a pure functional language. You can do mutations of state as you want. Common Lisp even has an OOP system. Anyway… this macro actually works. Like, really well. So well that you could nest it inside of itself and it would do it right:
(-> (-> 1
(+ 2 _)
(- 5 _))
(loop for i from 1 to _
collect i)
(reduce #'+ _))
Of course the child ->
call is a bit useless because you could just
flatten it into the parent, but sometimes you may want to portion off
certain sequences of transformations into their own sections for
clarity.
Isn’t this so cool? The best part is this would work for any expression the user could throw at it, even stuff I couldn’t know about, as long as it’s valid Lisp code. However, there’s a massive caveat…
Why macros are bad - or why Lisp never really took off in industry
For commercial development this is dangerous. At the very minimum you have to be incredibly strict with when to use macros and how they are defined. This is because even if you, the infallible intellectual that you are, are incredible at programming, the next person that has to fix a bug may not be as good. To be less egocentric: heavy and unmediated macro use can make it very difficult for someone, who isn’t an expert in the system, to make meaningful changes and bug fixes.
Macros extends the programming language, which means you’re creating language extensions that other people will probably have no idea about. There are already so many complaints about how varied C++ code bases can be because of how many ways you can solve a problem just using the standard library. Entering a new C++ codebase is like entering a new area in a game where you have to figure out where stuff is and how it works - but at least there’s a hard limit based on the STL. Imagine if every developer who’d ever touched some program had the ability to add language features with very little difficulty - any new codebase would be like a whole new country!
This is probably one of the main reasons Lisp never really took off, besides the fact it’s such a mess in terms of the number of implementations and standards. This beguiles new people hoping to get into Lisp: should I choose Common Lisp? Or Guile Scheme? Or Racket?
But I still love them
It’s a real shame; there are so many features of Lisp’s that make it really easy to write elegant solutions to complicated problems. Macros are an incredibly powerful tool in the right hands and can make the process of problem solving that much quicker.
I’ll leave you with a quote from The C++ Programming Language by Bjarne Stroustrup:
There are only two kinds of languages: the ones people complain about and the ones nobody uses.
-
There’s actually a lot of steps in between but you’re not here for that! ↩︎