Fun with Python AST
I have been told for a long time that there is an `ast` module in the Python standard library, which is used by many useful tools, for example pytest. However, I was only vaguely aware of it and had never been motivated to learn it. A while ago, I found an interesting library, moshmosh, which finally made me go through the documentation of `ast` and play with it.
As an example, in this post I will implement a simple “pipeline” feature available in many places: Unix shells, F#, Lisp, R, etc.
Pipeline with infix notation
`|` is used for pipelines in Unix shells, but in Python it is reserved for the “bitwise or” operator. However, we can customize its behavior by defining `__or__`:
```python
class Pipe:
    def __init__(self, data):
        self.data = data

    def __or__(self, fn):
        # `self | fn` applies fn to the wrapped value and wraps the result again
        return Pipe(fn(self.data))
```
This is the approach taken by the Pipe library (although the implementation is slightly different). It can be used as:
```python
import functools

data = [[89, 90, 99],
        [77, 76, 82],
        [95, 97, 99]]

Pipe(data) | functools.partial(map, sum) | max | print
# => 291
```
This is simple, and seems to work. However, we have to wrap the data in a `Pipe` instance in order to chain multiple operations, and we may need to unwrap the result at the end to work with other functions, which makes it tedious to use.
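To make the tedium concrete, here is a small illustration that reuses the `Pipe` class above (repeated so the snippet runs on its own). The value at the end of a chain is still a `Pipe`, so it has to be unwrapped through its `data` attribute before it behaves like an ordinary number:

```python
import functools


class Pipe:
    def __init__(self, data):
        self.data = data

    def __or__(self, fn):
        # Apply fn to the wrapped value and wrap the result again.
        return Pipe(fn(self.data))


data = [[89, 90, 99],
        [77, 76, 82],
        [95, 97, 99]]

best = Pipe(data) | functools.partial(map, sum) | max

# The result of the chain is still wrapped ...
print(type(best).__name__)  # Pipe
# ... so it must be unwrapped before plain arithmetic works on it.
print(best.data + 1)  # 292
```

Writing `best + 1` directly would raise a `TypeError`, because `Pipe` defines no `__add__`.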
What if we just override the `|` operator completely? That is achievable using `ast`:
```python
import ast


class PipeTransformer(ast.NodeTransformer):
    def visit_BinOp(self, node):
        if isinstance(node.op, ast.BitOr):
            # Rewrite `left | right` into the call `right(left)`.
            return ast.Call(
                self.visit(node.right),
                [self.visit(node.left)],
                [],
                lineno=node.lineno,
                col_offset=node.col_offset)
        return self.generic_visit(node)


source = """\
import functools
data = [[89, 90, 99],
        [77, 76, 82],
        [95, 97, 99]]
data | functools.partial(map, sum) | max | print
"""

exec(
    compile(
        PipeTransformer().visit(ast.parse(source)),
        filename="<string>",
        mode="exec"))
# => 291
```
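To see what the transformer actually produces, one can turn the rewritten tree back into source with `ast.unparse` (available since Python 3.9). The `rewrite` helper below is a hypothetical name used only for this illustration; the transformer is the one above, repeated so the snippet is self-contained:

```python
import ast


class PipeTransformer(ast.NodeTransformer):
    def visit_BinOp(self, node):
        if isinstance(node.op, ast.BitOr):
            # Rewrite `left | right` into the call `right(left)`.
            return ast.Call(
                self.visit(node.right),
                [self.visit(node.left)],
                [],
                lineno=node.lineno,
                col_offset=node.col_offset)
        return self.generic_visit(node)


def rewrite(source):
    """Return the transformed source code as a string (hypothetical helper)."""
    tree = PipeTransformer().visit(ast.parse(source))
    return ast.unparse(tree)


print(rewrite("data | functools.partial(map, sum) | max | print"))
# print(max(functools.partial(map, sum)(data)))

# Every `|` is rewritten, including a genuine bitwise or:
print(rewrite("flags = 1 | 4"))
# flags = 4(1)
```

The second call shows the cost of overriding `|` completely: an ordinary bitwise or such as `1 | 4` turns into the call `4(1)`, which fails at run time with a `TypeError`.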
Here we rewrite the AST, converting every `BinOp` node whose operator is `BitOr` into a `Call` node, so that `x | f` becomes `f(x)`. This is the approach taken by moshmosh. It is a clean and neat way to implement pipelines. However, it only supports functions that take one argument, so anything else has to be wrapped with `functools.partial` or a `lambda`. This is not a big drawback, but I somehow miss the `%>%` operator in R. It is impossible to have `%>%` in Python because Python does not allow us to define custom infix operators, but fortunately it is possible to mimic the threading macros `->` and `->>` from Clojure.
Threading macros
The threading macro `->>` in Emacs Lisp (which I believe is borrowed from Clojure) works like this:
```elisp
(->> '((89 90 99)
       (77 76 82)
       (95 97 99))
     (mapcar (lambda (xs) (apply #'+ xs)))
     (apply #'max))
; => 291
```
Basically, it passes the result of each step as the last argument of the function call in the next step (`->` is similar but passes the value as the first argument). Here is a quick and simple implementation of the `->>` macro in Python (note that it may not work in all cases):
```python
import ast


class PipeTransformer2(ast.NodeTransformer):
    def visit_Call(self, node):
        # Only rewrite non-empty calls of the form pipe(x, step1, step2, ...).
        if (isinstance(node.func, ast.Name) and node.func.id == "pipe"
                and node.args):
            args = (self.visit(arg) for arg in node.args)
            new_node = next(args)
            for arg in args:
                if isinstance(arg, ast.Call):
                    # f(a) becomes f(a, previous): the value goes last, as in ->>.
                    arg.args.append(new_node)
                    new_node = arg
                else:
                    # A bare callable f becomes f(previous).
                    new_node = ast.Call(
                        arg,
                        [new_node],
                        [],
                        lineno=arg.lineno,
                        col_offset=arg.col_offset)
            return new_node
        return self.generic_visit(node)


source = """\
data = [[89, 90, 99],
        [77, 76, 82],
        [95, 97, 99]]
pipe(data, map(sum), max, print)
"""

exec(
    compile(
        PipeTransformer2().visit(ast.parse(source)),
        filename="<string>",
        mode="exec"))
# => 291
```
This is my favorite way so far.
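The `->` (thread-first) variant differs only in where the previous value goes. Here is a minimal sketch, assuming a hypothetical marker name `pipe_first`; like `pipe` above, it never exists at run time because the call is rewritten away before compilation:

```python
import ast


class ThreadFirstTransformer(ast.NodeTransformer):
    """Rewrite pipe_first(x, f(a), g) into g(f(x, a))."""

    def visit_Call(self, node):
        if (isinstance(node.func, ast.Name)
                and node.func.id == "pipe_first" and node.args):
            args = [self.visit(arg) for arg in node.args]
            new_node = args[0]
            for arg in args[1:]:
                if isinstance(arg, ast.Call):
                    # -> inserts the previous value as the FIRST argument.
                    arg.args.insert(0, new_node)
                    new_node = arg
                else:
                    # A bare callable f becomes f(previous).
                    new_node = ast.copy_location(
                        ast.Call(arg, [new_node], []), arg)
            return new_node
        return self.generic_visit(node)


source = 'pipe_first("a,b,c", str.split(","), len, print)'
tree = ast.fix_missing_locations(
    ThreadFirstTransformer().visit(ast.parse(source)))
print(ast.unparse(tree))  # print(len(str.split('a,b,c', ',')))
exec(compile(tree, filename="<string>", mode="exec"))  # prints 3
```

Thread-first fits method-style functions such as `str.split`, whose "subject" is the first argument, while thread-last fits `map` and friends, which take the data last.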
Outro
How can we use this to run a Python script? I can come up with two ways. The first is to write a wrapper around the interpreter that reads a file into a string, parses it, transforms it, and finally executes it, which is exactly what we do in the snippets above. However, this way the transformations are not applied to imported modules. The second way is to customize how modules are loaded with a meta path hook (a finder registered on `sys.meta_path`). This way, imported modules go through the extensions as well. That is what the moshmosh library does. If you are interested, you can have a look at its `extension_register.py` file.
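The meta path approach can be sketched with the standard `importlib` machinery. This is a minimal sketch, not moshmosh's implementation: it reuses the `|` rewrite from above, and it assumes that modules opt in through a hypothetical module-name prefix, a rule chosen only for this illustration:

```python
import ast
import importlib.abc
import importlib.machinery


class PipeTransformer(ast.NodeTransformer):
    """The `|`-to-call rewrite from earlier in this post."""

    def visit_BinOp(self, node):
        if isinstance(node.op, ast.BitOr):
            call = ast.Call(self.visit(node.right), [self.visit(node.left)], [])
            return ast.copy_location(call, node)
        return self.generic_visit(node)


class PipeLoader(importlib.machinery.SourceFileLoader):
    """Source loader that transforms the AST before compiling.

    Overriding get_code (instead of source_to_code) also skips the
    __pycache__ bytecode cache, so a stale untransformed .pyc is never
    reused and a transformed one is never written.
    """

    def get_code(self, fullname):
        source = self.get_data(self.path)  # raw bytes of the .py file
        tree = PipeTransformer().visit(ast.parse(source, filename=self.path))
        ast.fix_missing_locations(tree)
        return compile(tree, self.path, "exec", dont_inherit=True)


class PipeFinder(importlib.abc.MetaPathFinder):
    """Meta path finder that opts modules in by a (hypothetical) name prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def find_spec(self, fullname, path, target=None):
        if not fullname.startswith(self.prefix):
            return None  # let the normal import machinery handle it
        # Reuse the standard sys.path search to locate the source file.
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or not isinstance(
                spec.loader, importlib.machinery.SourceFileLoader):
            return None
        spec.loader = PipeLoader(spec.loader.name, spec.loader.path)
        return spec
```

After `sys.meta_path.insert(0, PipeFinder("piped_"))`, a file `piped_demo.py` containing `result = [1, 2, 3] | sum` would import with `result == 6`, while every other module loads normally.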
`ast` gives us a chance to modify the AST before it is executed. In some sense, that is like what we do with Lisp macros. I think it is neat, but I probably will not use it in any serious project. Even though it is less hacky than `inspect`, it still feels like a big hack. Besides, I tend to avoid introducing new dependencies just for syntactic sugar. The low-magic way is good enough in most cases.
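For comparison, a low-magic pipeline can be a plain function that folds a value through callables with `functools.reduce`. This is a minimal sketch; its `pipe` is an ordinary runtime function, unrelated to the AST-based `pipe` marker above, and it needs no AST rewriting or import hooks, at the price of `functools.partial` or a `lambda` for every multi-argument step:

```python
import functools


def pipe(value, *fns):
    """Feed value through fns from left to right: pipe(x, f, g) == g(f(x))."""
    return functools.reduce(lambda acc, fn: fn(acc), fns, value)


data = [[89, 90, 99],
        [77, 76, 82],
        [95, 97, 99]]

print(pipe(data, functools.partial(map, sum), max))  # 291
```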