A Python Interpreter Written in Python (2016)

chubot · on April 9, 2018

I found this code very useful. I've been spelunking in CPython and mostly I understand it, and can find my way around.

But ceval.c is another beast entirely, being full of macros and gotos, not to mention being 5000 lines long.

The interpreter loop starts here: https://github.com/python/cpython/blob/master/Python/ceval.c...

So I appreciate seeing the algorithm laid out in Python. In particular it clarifies that there are three separate stacks:

1. call stack

2. block stack for try/except, loops, etc.

3. value stack for evaluating expressions
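As a rough illustration of how those three stacks relate, here is a toy VM. It is a sketch only, not byterun's or CPython's actual code: the opcodes are loosely modelled on CPython's, and `ToyVM`, `Frame`, and the instruction format are made up for this example.

```python
class Frame:
    """One entry on the call stack; it owns the other two stacks."""

    def __init__(self, code, local_vars=None):
        self.code = code                 # list of (opcode, argument) tuples
        self.pc = 0                      # index of the next instruction
        self.locals = local_vars or {}
        self.stack = []                  # 3. value stack for expressions
        self.block_stack = []            # 2. block stack for loops etc.


class ToyVM:
    def __init__(self):
        self.frames = []                 # 1. call stack of Frame objects

    def run(self, code, local_vars=None):
        frame = Frame(code, local_vars)
        self.frames.append(frame)
        try:
            while True:
                op, arg = frame.code[frame.pc]
                frame.pc += 1
                if op == "LOAD_CONST":
                    frame.stack.append(arg)
                elif op == "LOAD_NAME":
                    frame.stack.append(frame.locals[arg])
                elif op == "ADD":
                    right = frame.stack.pop()
                    left = frame.stack.pop()
                    frame.stack.append(left + right)
                elif op == "SETUP_LOOP":
                    # Remember the loop exit and the value-stack depth.
                    frame.block_stack.append((arg, len(frame.stack)))
                elif op == "BREAK_LOOP":
                    end, depth = frame.block_stack.pop()
                    del frame.stack[depth:]   # drop loop temporaries
                    frame.pc = end
                elif op == "CALL":
                    # arg is the callee's code; the nested run() pushes a
                    # new Frame, so the host recursion mirrors the call stack.
                    argument = frame.stack.pop()
                    frame.stack.append(self.run(arg, {"x": argument}))
                elif op == "RETURN":
                    return frame.stack.pop()
        finally:
            self.frames.pop()


double = [("LOAD_NAME", "x"), ("LOAD_NAME", "x"), ("ADD", None), ("RETURN", None)]
main = [
    ("SETUP_LOOP", 3),      # loop exit is instruction 3
    ("LOAD_CONST", 99),     # a temporary left on the value stack
    ("BREAK_LOOP", None),   # unwinds the block and discards the temporary
    ("LOAD_CONST", 21),
    ("CALL", double),
    ("RETURN", None),
]
vm = ToyVM()
result = vm.run(main)       # 21 doubled by the callee
```

Each frame carries its own value stack and block stack, while the VM keeps only the list of frames.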

It also clarifies how generators work, which IMO is very difficult to follow from the C source (i.e. without a design doc to go along).
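The generator behaviour can also be poked at from plain Python. The sketch below only shows what is observable on a CPython generator object (its state and its paused frame); it says nothing about how ceval.c or byterun implement it. `countdown` is a made-up example function.

```python
import inspect


def countdown(n):
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)
states = [inspect.getgeneratorstate(gen)]       # nothing has run yet

first = next(gen)                               # run until the first yield
states.append(inspect.getgeneratorstate(gen))
saved_n = gen.gi_frame.f_locals["n"]            # locals live on in the paused frame

second = next(gen)                              # resume the same frame
leftovers = list(gen)                           # run it to completion
states.append(inspect.getgeneratorstate(gen))
frame_after = gen.gi_frame                      # the frame is released at the end
```

The point is that a generator is a frame that gets suspended and resumed instead of being thrown away on return.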

I wrote about some of my recent work here: https://www.reddit.com/r/oilshell/comments/8b0n6z/opyreadmem...

BTW I have shell scripts running under triple interpretation: CPython, byterun, and OSH itself :) This is just an experiment toward writing my own VM, not for the final product. The release binary doesn't use byterun at all.

EDIT: There is also a companion bytecode compiler that I mention here: http://www.oilshell.org/blog/2018/03/27.html (but I'm not using it, I'm using the one that used to be in the Python 2 stdlib, which is entirely separate from the one used by the interpreter itself.)

00ajcr · on April 9, 2018

Philip Guo has an excellent set of video lectures on CPython internals, which includes an overview of parts of ceval.c: http://pgbovine.net/cpython-internals.htm

chubot · on April 9, 2018

I've watched several of those, and they are good. Though for me, playing with code is a different type of learning than watching videos. I should go back and watch the one on ceval though.

Joona · on April 10, 2018

There are some changes to ceval in 3.8 that should make it a bit simpler:

  The interpreter loop has been simplified by moving the logic of unrolling the stack of blocks into the compiler. The compiler emits now explicit instructions for adjusting the stack of values and calling the cleaning up code for break, continue and return.
  Removed opcodes BREAK_LOOP, CONTINUE_LOOP, SETUP_LOOP and SETUP_EXCEPT. Added new opcodes ROT_FOUR, BEGIN_FINALLY, CALL_FINALLY and POP_FINALLY. Changed the behavior of END_FINALLY and WITH_CLEANUP_START.
  (Contributed by Mark Shannon, Antoine Pitrou and Serhiy Storchaka in bpo-17611.)

https://docs.python.org/3.8/whatsnew/3.8.html

https://bugs.python.org/issue17611
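A toy sketch of the idea in that changelog entry, under heavy simplification: the same "push a temporary, then break" loop is encoded once with run-time block unwinding and once with the cleanup emitted ahead of time. The opcode names are borrowed loosely from CPython, but `run` and both instruction lists are hypothetical and do not reflect the real 3.8 bytecode.

```python
def run(code):
    """Run (op, arg) tuples; return (value_stack, block_ops_executed)."""
    stack, blocks, pc, block_ops = [], [], 0, 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "POP_TOP":
            stack.pop()
        elif op == "JUMP":
            pc = arg
        elif op == "SETUP_LOOP":
            # Old style: record the loop exit and the stack depth at run time.
            blocks.append((arg, len(stack)))
            block_ops += 1
        elif op == "BREAK_LOOP":
            # Old style: the interpreter unwinds the block itself.
            end, depth = blocks.pop()
            del stack[depth:]
            pc = end
            block_ops += 1
    return stack, block_ops


old_style = [
    ("SETUP_LOOP", 3),
    ("PUSH", "tmp"),
    ("BREAK_LOOP", None),
    ("PUSH", "after"),
]
# New style: the "compiler" knows one temporary is live at the break,
# so it emits the cleanup and the jump explicitly.
new_style = [
    ("PUSH", "tmp"),
    ("POP_TOP", None),
    ("JUMP", 3),
    ("PUSH", "after"),
]

old_result = run(old_style)
new_result = run(new_style)
```

Both versions leave the same value stack, but the second never touches the block stack, which is the kind of work the quoted change moves into the compiler.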

chubot · on April 10, 2018

Wow thanks for the pointer! This is great. I want to move more stuff to compile-time, like name resolution, and moving some control flow to compile time is something I've also wondered about.

I watched a few talks [1] about how C++ exception handling works, and they try to avoid branches/setup blocks in the "happy path". It works a little like longjmp() in C, where you just set the instruction pointer say three function calls down in the stack. But then you have to look up all the exception handlers to run in precomputed tables (which doesn't happen in C). So I wonder if something like that would speed up (my subset of) Python, since exceptions are quite common.
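A minimal sketch of that table-driven idea on a toy bytecode, assuming a made-up instruction format: there is no "setup" instruction on the happy path, and the handler table is consulted only when something is raised. `run`, `find_handler`, `ToyError`, and the table layout are all hypothetical.

```python
class ToyError(Exception):
    """Raised when no table entry covers the failing instruction."""


def find_handler(table, pc):
    """Return (target, depth) for the entry covering pc, or None."""
    for start, end, target, depth in table:
        if start <= pc < end:
            return target, depth
    return None


def run(code, table):
    """Run (op, arg) tuples; return (result, number_of_table_lookups)."""
    stack, pc, lookups = [], 0, 0
    while True:
        op, arg = code[pc]
        if op == "PUSH":
            stack.append(arg)
            pc += 1
        elif op == "RAISE":
            lookups += 1                 # only the error path pays for this
            handler = find_handler(table, pc)
            if handler is None:
                raise ToyError(arg)
            target, depth = handler
            del stack[depth:]            # unwind the value stack
            stack.append(arg)            # hand the error to the handler
            pc = target
        elif op == "RETURN":
            return stack.pop(), lookups


# Instructions 0..2 are the "try" body; the handler starts at 3.
protected = [
    ("PUSH", "in try"),
    ("RAISE", "boom"),
    ("RETURN", None),
    ("RETURN", None),   # handler: return whatever RAISE pushed
]
table = [(0, 3, 3, 0)]  # (start, end, handler target, stack depth)

caught = run(protected, table)
happy = run([("PUSH", "ok"), ("RETURN", None)], table)
```

The happy-path run never looks at the table, at the cost of a search when an exception does occur.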

[1] https://www.youtube.com/watch?v=_Ivd3qzgT7U

ubernostrum · on April 9, 2018

You may find this useful:

https://leanpub.com/insidethepythonvirtualmachine

I read through it, and the Python-interpreter-in-Python article, and all the documentation of the 'dis' module, while prepping the talk I'm giving at PyCon next month (which is on bytecode). They were all good resources.

asperous · on April 9, 2018

Just in case anyone doesn't know, there's actually a real, live project that does this:

https://pypy.org/

(Technically the base interpreter is a subset/restricted version of Python however). The reason this is done, as I understand it, is that the Python source code for the base interpreter can be fed into a JIT generator, which produces a Python interpreter that can perform JIT optimizations.

masklinn · on April 9, 2018

> Technically the base interpreter is a subset/restricted version of Python however

The base interpreter is written in a subset/restricted version of Python, but it's still Python. You can run PyPy as an interpreter on top of CPython or PyPy.

And IIRC the restrictions mostly have to do with static type inference, so you're mostly limited in how dynamic/weird the program gets. The other big limits are magic methods being unavailable (for user-defined types) and most of the stdlib being off-limits.

So it's Python in a straitjacket, but for most interpreter implementations it's probably fine: as long as you're not using an overly interesting parsing strategy (hello Pratt parsers), an interpreter tends not to use the more dynamic/odd corners of the language, I think.

xapata · on April 10, 2018

There are also interesting pieces, like sections not in try/except having reduced checking for out-of-bounds and type mismatch. I don't grok it yet, mostly because I've had no need to use that flavor of Python. But in some ways it's providing flexibility instead of constraints. Flexibility for the interpreter, not necessarily the programmer.

sametmax · on April 10, 2018

Yes, but PyPy is designed for speed, not for simplicity. The source code is very, very hard to understand.

CPython is the reference implementation, and has an explicit goal of being easier to understand. Yet some parts of it are quite obscure.

So here we are, with this beautiful blog post.

0x7f800000 · on April 9, 2018

But can the interpreter interpret itself?

flubert · on April 9, 2018

Does anyone have a history of self-hosting compilers / interpreters? I bet there is more to flesh out here than what Wikipedia has to offer:

https://en.wikipedia.org/wiki/Bootstrapping_(compilers)#Hist...

abecedarius · on April 10, 2018

I think so? It's been a long time, and I don't remember exactly what I did while playing around with it. At https://github.com/darius/tailbiter I ported it to Python 3.4 and stripped it down to be in a subset of Python accepted by my self-hosting Python-to-bytecode compiler so that the compiler plus the interpreter could reproduce itself. There's a companion article linked there.

prions · on April 9, 2018

This was my question as well. I believe the scope of this project is to interpret Python bytecode using Python.

ASalazarMX · on April 9, 2018

Obligatory exec() mention.

niroze · on April 9, 2018

It isn't a LISP :P

michaf · on April 9, 2018

There is, however, http://www.aosabook.org/en/pypy.html , which is indeed self-hosting.

chrisseaton · on April 9, 2018

You don't need a language to be a LISP in order for them to be able to interpret themselves. There are several Java interpreters which can interpret themselves, for example.

kazinator · on April 10, 2018

You just need it to be a Lisp in order to do it in such a way that you get confused whether you're in the interpreted language or the host one.

yorwba · on April 10, 2018

Byterun confuses host and interpreter in plenty of places.

It uses the host Python for attribute resolution (all code in getters and setters runs on the host), exception handling (which combines with the previous point to make some NameErrors impossible to catch) and even function calls (but all functions are wrapped so that their __call__ switches back to byterun as the interpreter).

Because of that confusion, byterun isn't very useful if you want to be really independent from the host interpreter; it's just too easy to escape from the VM. As a learning exercise however, it is helpful for understanding Python's innards at a level higher than C.
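To make the attribute-resolution point concrete, here is a toy sketch (not byterun's code) of a VM whose LOAD_ATTR simply delegates to the host's `getattr`. `Temperature`, `run`, and `executed_ops` are made up for the example.

```python
class Temperature:
    """Guest-level class whose property body ends up running on the host."""

    def __init__(self, celsius):
        self.celsius = celsius

    @property
    def fahrenheit(self):
        return self.celsius * 9 / 5 + 32


executed_ops = []   # every instruction the toy VM itself dispatches


def run(code):
    stack = []
    for op, arg in code:
        executed_ops.append(op)
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "LOAD_ATTR":
            obj = stack.pop()
            # Delegating to the host: the getter's own bytecode is executed
            # by the host interpreter, so the toy VM never dispatches it.
            stack.append(getattr(obj, arg))
        elif op == "RETURN_VALUE":
            return stack.pop()


result = run([
    ("LOAD_CONST", Temperature(100)),
    ("LOAD_ATTR", "fahrenheit"),
    ("RETURN_VALUE", None),
])
```

The multiplication and addition inside the getter never show up in `executed_ops`, which is the kind of escape from the VM described above.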

kazinator · on April 10, 2018

I mean actually getting confused.

CodeArtisan · on April 10, 2018

Do they interpret Java, the programming language, or Java bytecode? Not the same thing. Lisp being homoiconic, it's much easier to implement a metacircular evaluator.

https://en.wikipedia.org/wiki/Homoiconicity

chrisseaton · on April 10, 2018

If I gave you a function that accepted Java source code as text and interpreted it, how would you detect that I was first converting it to Java bytecode and then interpreting that bytecode?

You couldn't, so it's an irrelevant internal detail. The function interprets Java source code.
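The same argument can be sketched in Python, where the built-ins `compile()` and `exec()` make the hidden step explicit. `interpret` is a hypothetical helper name, not an existing API.

```python
def interpret(source):
    """Run Python source text and return the resulting namespace."""
    code = compile(source, "<interpret>", "exec")   # text -> code object
    namespace = {}
    exec(code, namespace)                            # the host VM runs the bytecode
    return namespace


ns = interpret("x = 6 * 7")
answer = ns["x"]
```

From the caller's side only source text goes in and results come out; the trip through bytecode is not visible in the interface.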

_19qg · on April 10, 2018

> If I gave you a function that accepted Java source code as text and interpreted it, how would you detect that I was first converting it to Java bytecode and then interpreting that bytecode?

> You couldn't, so it's an irrelevant internal detail. The function interprets Java source code.

In a Lisp interpreter the actual source code is Lisp data, not text. One can give the function access to its source code, allow it to inspect it or even to change it - while the interpreter is executing this source code.

In a Lisp interpreter, the code can even modify the source code (we are not talking about the byte-code or machine code) while it is running.

    CL-USER 20 > (let ((code (copy-tree '(+ 1 2 bar))))
                   `(defun foo (bar)
                      (print ,code)
                      (unless (eq (first ',code) '-)
                        (setf (first ',code) '-)
                        (foo bar))
                      (values)))
    (DEFUN FOO (BAR) (PRINT (+ 1 2 BAR)) (UNLESS (EQ (FIRST (QUOTE (+ 1 2 BAR))) (QUOTE -)) (SETF (FIRST (QUOTE (+ 1 2 BAR))) (QUOTE -)) (FOO BAR)) (VALUES))

    CL-USER 21 > (eval *)
    FOO

    CL-USER 22 > (foo 41)

    44 
    -42

As you can see the function modified itself so that the operator + was replaced with a - and then called itself again with the same argument.

If we now look at the function, we can see that it was indeed changing itself:

    CL-USER 23 > (pprint (function-lambda-expression #'foo))

    (LAMBDA (BAR)
      (DECLARE (SYSTEM::SOURCE-LEVEL #<EQ Hash Table{0} 41B04012D3>))
      (DECLARE (LAMBDA-NAME FOO))
      (PRINT (- 1 2 BAR))
      (UNLESS (EQ (FIRST '(- 1 2 BAR)) '-) (SETF (FIRST '(- 1 2 BAR)) '-) (FOO BAR))
      (VALUES))

The PRINTED expression value is now computed with the - operator.

The interpreted execution itself also allows different things than the compiled execution. For example, if we have a break point, we can see the actual source code currently executed, change it while in the break point, and then resume execution. Since the interpreter runs the source code, we can also modify the interpreter to do something else with the source code while it is executing it, like recording it, tracing it, or stepping it.
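For contrast, a rough Python analog of the + to - rewrite, using the standard `ast` module. It is only an approximation: Python source is text, so the tree has to be parsed, rewritten, and recompiled before it runs, whereas the Lisp function above modifies its own source while running. `AddToSub` is a made-up class name.

```python
import ast


class AddToSub(ast.NodeTransformer):
    """Rewrite every `a + b` into `a - b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # rewrite nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


source = "def foo(bar):\n    return 1 + 2 + bar\n"
tree = ast.parse(source)

before = {}
exec(compile(tree, "<ast>", "exec"), before)

tree = ast.fix_missing_locations(AddToSub().visit(tree))
after = {}
exec(compile(tree, "<ast>", "exec"), after)

original_value = before["foo"](41)    # 1 + 2 + 41
rewritten_value = after["foo"](41)    # 1 - 2 - 41
```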

chrisseaton · on April 10, 2018

Yeah I know about how Lisp works - I have a PhD in metaprogramming using interpreters and meta-circular compilers. I just don't agree it's really materially different.

_19qg · on April 10, 2018

I don't have a PhD in metaprogramming, but I see the difference between interpreted Lisp code and compiled code during development quite a lot.

CodeArtisan · on April 10, 2018

I agree with that.

A compiler takes code to transform it into another medium: binary executable, code (transpiler), ...

An interpreter takes code to evaluate it.
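That distinction can be sketched on a toy expression language. The tuple format and both function names are invented for this example: one function evaluates the tree directly, the other translates the same tree into Python source text.

```python
# Expressions are nested tuples: ("num", 3), ("add", a, b), ("mul", a, b).

def interpret(expr):
    """Interpreter: walk the tree and produce the value directly."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    left, right = interpret(expr[1]), interpret(expr[2])
    return left + right if kind == "add" else left * right


def compile_to_python(expr):
    """Compiler: walk the same tree but emit code in another language."""
    kind = expr[0]
    if kind == "num":
        return repr(expr[1])
    op = "+" if kind == "add" else "*"
    return f"({compile_to_python(expr[1])} {op} {compile_to_python(expr[2])})"


expr = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
value = interpret(expr)              # evaluated right away
emitted = compile_to_python(expr)    # a string; something else must run it
```

The compiler's output still needs an evaluator (here, Python itself) before any value appears.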

All the self-proclaimed metacircular or "self-hosted" implementations of Java that I have seen actually require an external compiler (e.g. javac from the JDK). They are in reality implementations of the Java virtual machine, not the language: you can't feed Java code to them to execute. They are not even self-hosted.

vram22 · on April 10, 2018

Not sure, but I think I read recently (maybe in the Rebol docs) that it is also a homoiconic language. And I think I also read that Rebol programs can interpret chunks of Rebol code passed to them.

vram22 · on April 10, 2018

I checked:

https://en.wikipedia.org/wiki/Homoiconicity#In_Rebol

coldtea · on April 9, 2018

Doesn't need to be a Lisp. All kinds of interpreters can interpret themselves if they're written in the same language -- similar to bootstrapping a compiler.

niroze · on April 9, 2018

It was a geek joke, I'm not serious. I've written plenty of both languages, even in .gov.

Why so serious?!

sethgecko · on April 10, 2018

I made something pretty similar the other day. An interpreter for the bitcoin scripting language with a stack and the most used opcodes. Code here : https://github.com/mcdallas/cryptotools/blob/master/btctools...

kgoutham93 · on April 10, 2018

Slightly off topic... If a processor can only understand assembly grammar, what's the difference between an interpreter and a compiler?

I understand that an interpreter can directly execute the intermediate bytecode (giving us portability benefits), but at some point it must convert the bytecode into platform-dependent machine code. Right?

What exactly differentiates an interpreter and a compiler?

jwilk · on April 9, 2018

(2016), according to the HTTP headers:

  Last-Modified: Sat, 09 Jul 2016 12:15:59 GMT

codetrotter · on April 9, 2018

I’m going to start having my server send

    Last-Modified: Mon, 20 Apr 3018 13:37:00 GMT

Then we will see what people say when my stuff is posted :^)

Then again my stuff probably wouldn’t get much attention to begin with.

But if it did...

People would have to admit that they are all living in the present whereas I was living in the future.

make3 · on April 9, 2018

pay your registrar to return a WHOIS on Mars. https://news.nationalgeographic.com/2016/10/planets-maps-exp...

wiz21c · on April 10, 2018

>>> 13:37:00 GMT

lamer ! :-)

jstrieb · on April 9, 2018

Updated the title

CodeArtisan · on April 10, 2018

An evaluator that is written in the same language that it evaluates is said to be metacircular[1]. Note that this is not the case here, because it's not a Python interpreter but a CPython bytecode interpreter. It would have been metacircular if it were written in that bytecode.

[1] https://sarabander.github.io/sicp/html/4_002e1.xhtml#g_t4_00...

carlmr · on April 10, 2018

Ouroboros would be a good icon for this project.

fermigier · on April 10, 2018

It's already the PyPy logo: https://pypy.org/image/pypy-logo.png