BTW I have shell scripts running under triple interpretation: CPython, byterun, and OSH itself :) This is just an experiment toward writing my own VM, not for the final product. The release binary doesn't use byterun at all.
EDIT: There is also a companion bytecode compiler that I mention here: http://www.oilshell.org/blog/2018/03/27.html (but I'm not using it, I'm using the one that used to be in the Python 2 stdlib, which is entirely separate from the one used by the interpreter itself.)
I've watched several of those, and they are good. Though for me, playing with code is a different type of learning than watching videos. I should go back and watch the one on ceval though.
There are some changes to ceval in 3.8 that should make it a bit simpler:
The interpreter loop has been simplified by moving the logic of unrolling the stack of blocks into the compiler. The compiler emits now explicit instructions for adjusting the stack of values and calling the cleaning up code for break, continue and return.
Removed opcodes BREAK_LOOP, CONTINUE_LOOP, SETUP_LOOP and SETUP_EXCEPT. Added new opcodes ROT_FOUR, BEGIN_FINALLY, CALL_FINALLY and POP_FINALLY. Changed the behavior of END_FINALLY and WITH_CLEANUP_START.
(Contributed by Mark Shannon, Antoine Pitrou and Serhiy Storchaka in bpo-17611.)
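One way to see this change for yourself is to disassemble a small loop and check which opcodes your interpreter's compiler emits. This is only a sketch: `first_even` is a made-up function for illustration, and the exact instruction list differs between Python versions, so no output is shown.

```python
import dis
import sys


def first_even(items):
    # A loop with both `continue` and `break`, so the compiler has to
    # generate control flow for each of them.
    for x in items:
        if x % 2:
            continue
        result = x
        break
    else:
        result = None
    return result


# Opcode names produced by the compiler of the running interpreter.
opnames = {ins.opname for ins in dis.get_instructions(first_even)}

# Opcodes that the 3.8 release notes quoted above list as removed.
removed_in_38 = {"BREAK_LOOP", "CONTINUE_LOOP", "SETUP_LOOP", "SETUP_EXCEPT"}

print(sys.version_info[:2], sorted(opnames & removed_in_38))
```

On 3.8 or later the printed list should be empty; on older versions you would expect to see SETUP_LOOP and friends.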
Wow thanks for the pointer! This is great. I want to move more stuff to compile-time, like name resolution, and moving some control flow to compile time is something I've also wondered about.
I watched a few talks [1] about how C++ exception handling works, and they try to avoid branches/setup blocks in the "happy path". It works a little like longjmp() in C, where you just set the instruction pointer say three function calls down in the stack. But then you have to look up all the exception handlers to run in precomputed tables (which doesn't happen in C). So I wonder if something like that would speed up (my subset of) Python, since exceptions are quite common.
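To make the "precomputed tables" idea concrete, here is a rough sketch of a table-driven handler lookup. The `HandlerRange` layout, the offsets, and `find_handler` are all hypothetical, not taken from any real compiler or VM. The point is that entering a `try` costs nothing at runtime; the table is only consulted once something is actually raised.

```python
from typing import List, NamedTuple, Optional


class HandlerRange(NamedTuple):
    start: int   # first instruction offset covered (inclusive)
    end: int     # end of the covered range (exclusive)
    target: int  # offset of the handler code to jump to


# Hypothetical table a compiler might emit for one function.
# Inner ranges are listed before the outer ranges that contain them.
TABLE = [
    HandlerRange(4, 8, 20),   # inner try block
    HandlerRange(2, 12, 30),  # outer try block
]


def find_handler(table: List[HandlerRange], pc: int) -> Optional[int]:
    """Return the handler offset for an exception raised at `pc`, or None."""
    for entry in table:
        if entry.start <= pc < entry.end:
            return entry.target
    return None  # no handler in this function: unwind to the caller


print(find_handler(TABLE, 5))   # inside both ranges, inner one wins
print(find_handler(TABLE, 10))  # only inside the outer range
print(find_handler(TABLE, 0))   # not covered at all
```

A real implementation would also have to repeat this lookup frame by frame while unwinding, which is the part that is slower than the setup-block approach when exceptions are frequent.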
I read through it, and the Python-interpreter-in-Python article, and all the documentation of the 'dis' module, while prepping the talk I'm giving at PyCon next month (which is on bytecode). They were all good resources.
(Technically the base interpreter is a subset/restricted version of Python, however.) The reason this is done, as I understand it, is that the Python source code for the base interpreter can be fed into a JIT generator, which produces a Python interpreter that can perform JIT optimizations.
> Technically the base interpreter is a subset/restricted version of Python however
The base interpreter is written in a subset/restricted version of Python, but it's still Python. You can run PyPy as an interpreter on top of CPython or PyPy.
And IIRC the restrictions mostly have to do with static type inference, so you're mostly limited in how dynamic/weird the program gets. The other big limits are magic methods being unavailable (for user-defined types) and most of the stdlib being off-limits.
So it's Python in a straitjacket, but for most interpreter implementations it's probably fine: as long as you're not using an overly interesting parsing strategy (hello, Pratt parsers), an interpreter tends not to use the more dynamic/odd corners of the language, I think.
There are also interesting pieces, like sections not in try/except having reduced checking for out-of-bounds and type mismatch. I don't grok it yet, mostly because I've had no need to use that flavor of Python. But in some ways it's providing flexibility instead of constraints. Flexibility for the interpreter, not necessarily the programmer.
I think so? It's been a long time, and I don't remember exactly what I did while playing around with it. At https://github.com/darius/tailbiter I ported it to Python 3.4 and stripped it down to be in a subset of Python accepted by my self-hosting Python-to-bytecode compiler so that the compiler plus the interpreter could reproduce itself. There's a companion article linked there.
A language doesn't need to be a Lisp in order to be able to interpret itself. There are several Java interpreters which can interpret themselves, for example.
Byterun confuses host and interpreter in plenty of places.
It uses the host Python for attribute resolution (all code in getters and setters runs on the host), exception handling (which combines with the previous point to make some NameErrors impossible to catch) and even function calls (but all functions are wrapped so that their __call__ switches back to byterun as the interpreter).
Because of that confusion, byterun isn't very useful if you want to be really independent from the host interpreter; it's just too easy to escape from the VM. As a learning exercise however, it is helpful for understanding Python's innards at a level higher than C.
Do they interpret Java, the programming language, or Java bytecode? Not the same thing. Lisp being homoiconic, it's much easier to implement a metacircular evaluator.
If I gave you a function that accepted Java source code as text and interpreted it, how would you detect that I was first converting it to Java bytecode and then interpreting that bytecode?
You couldn't, so it's an irrelevant internal detail. The function interprets Java source code.
> If I gave you a function that accepted Java source code as text and interpreted it, how would you detect that I was first converting it to Java bytecode and then interpreting that bytecode?
> You couldn't, so it's an irrelevant internal detail. The function interprets Java source code.
In a Lisp interpreter the actual source code is Lisp data, not text. One can give the function access to its source code, allow it to inspect it or even to change it - while the interpreter is executing this source code.
In a Lisp interpreter, the code can even modify the source code (we are not talking about the byte-code or machine code) while it is running.
The PRINTED expression value is now computed with the - operator.
Interpreted execution also allows different things than compiled execution. For example, if we have a break point, we can see the actual source code currently being executed, change it while in the break point, and then resume execution. Since the interpreter runs the source code, we can also modify the interpreter to do something else with the source code while it is executing it, like recording it, tracing it, or stepping through it.
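For readers who don't know Lisp, here is a rough analogue in Python, with nested lists standing in for Lisp lists. It is not the Lisp example discussed above, just a sketch of the same idea; the `evaluate` function and the `source` expression are made up for illustration. Because the evaluator walks the data structure directly, mutating that structure changes what runs next.

```python
import operator

# Operators the toy evaluator understands.
OPS = {"+": operator.add, "-": operator.sub}


def evaluate(expr):
    """Evaluate a nested-list expression of the form [op, left, right]."""
    if isinstance(expr, (int, float)):
        return expr  # numbers evaluate to themselves
    op, left, right = expr
    return OPS[op](evaluate(left), evaluate(right))


# The "source code" is ordinary data that the program can inspect...
source = ["+", 10, ["+", 2, 3]]
before = evaluate(source)

# ...and modify. There is no separate compiled form to go stale.
source[0] = "-"
after = evaluate(source)

print(before, after)  # 15 5
```

In a real Lisp the function could reach its own source this way while it is running, which is the part that has no direct equivalent once code has been compiled to bytecode.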
Yeah I know about how Lisp works - I have a PhD in metaprogramming using interpreters and meta-circular compilers. I just don't agree it's really materially different.
A compiler takes code to transform it into another medium: binary executable, code (transpiler), ...
An interpreter takes code to evaluate it.
All the self-proclaimed metacircular or "self-hosted" implementations of Java that I have seen actually require an external compiler (e.g. javac from the JDK). In reality they are implementations of the Java virtual machine, not the language: you can't feed Java code to them to execute. They are not even self-hosted.
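A toy illustration of that distinction, as a sketch only: both functions below walk the same expression tree, but one returns a value and the other returns a program for a made-up stack machine. The tuple-based tree format and the `PUSH`/`ADD`/`MUL` instruction names are invented for this example.

```python
def interpret(node):
    """Interpreter: walk the tree and produce the value directly."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = interpret(left), interpret(right)
    return a + b if op == "+" else a * b


def compile_expr(node):
    """Compiler: walk the same tree, but produce code for another machine."""
    if isinstance(node, int):
        return [("PUSH", node)]
    op, left, right = node
    opcode = "ADD" if op == "+" else "MUL"
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]


tree = ("+", 1, ("*", 2, 3))  # 1 + 2 * 3

print(interpret(tree))     # 7
print(compile_expr(tree))  # a list of stack-machine instructions
```

Something still has to execute the compiled output, whether that is a CPU or another interpreter, which is why the two often end up layered on top of each other.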
Not sure, but I think I read recently (maybe in the Rebol docs) that it is also a homoiconic language. And I think I also read that Rebol programs can interpret chunks of Rebol code passed to them.
Doesn't need to be a Lisp. All kinds of interpreters can interpret themselves if they're written in the same language, similar to bootstrapping a compiler.
Slightly off topic... If a processor can only understand machine instructions, what's the difference between an interpreter and a compiler?
I understand that an interpreter can directly execute the intermediate byte code (giving us portability benefits), but at some point it must convert the byte code into platform-specific machine code. Right?
What exactly differentiates an interpreter and a compiler?
An evaluator that is written in the same language that it evaluates is said to be metacircular[1]. Note that this is not the case here, because it's not a Python interpreter but a CPython bytecode interpreter. It would have been metacircular if it were written in that bytecode.
But ceval.c is another beast entirely, being full of macros and gotos, not to mention being 5000 lines long.
The interpreter loop starts here: https://github.com/python/cpython/blob/master/Python/ceval.c...
So I appreciate seeing the algorithm laid out in Python. In particular it clarifies that there are three separate stacks:
1. call stack
2. block stack for try/except, loops, etc.
3. value stack for evaluating expressions
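To show how the three stacks relate, here is a deliberately tiny VM sketch. It is not byterun's code or CPython's design: the opcode names loosely borrow from older CPython bytecode, and the `Frame` class, `run` function, and sample program are hypothetical. The call stack is a list of frames, and each frame owns its own value stack and block stack.

```python
class Frame:
    def __init__(self, code):
        self.code = code   # list of (opname, arg) pairs
        self.pc = 0        # index of the next instruction
        self.values = []   # value stack: operands of expressions
        self.blocks = []   # block stack: where `break` should jump to


def run(functions, entry):
    call_stack = [Frame(functions[entry])]  # call stack: one frame per call
    result = None
    while call_stack:
        frame = call_stack[-1]
        op, arg = frame.code[frame.pc]
        frame.pc += 1
        if op == "PUSH":
            frame.values.append(arg)
        elif op == "ADD":
            b = frame.values.pop()
            a = frame.values.pop()
            frame.values.append(a + b)
        elif op == "CALL":
            call_stack.append(Frame(functions[arg]))
        elif op == "SETUP_LOOP":
            frame.blocks.append(arg)       # remember the loop's exit target
        elif op == "BREAK_LOOP":
            frame.pc = frame.blocks.pop()  # leave the innermost loop
        elif op == "RETURN":
            result = frame.values.pop()
            call_stack.pop()
            if call_stack:                 # hand the result to the caller
                call_stack[-1].values.append(result)
    return result


functions = {
    "main": [("PUSH", 1), ("CALL", "two"), ("ADD", None), ("RETURN", None)],
    "two": [("SETUP_LOOP", 3), ("PUSH", 2), ("BREAK_LOOP", None), ("RETURN", None)],
}

print(run(functions, "main"))  # 3
```

Exceptions and `finally` blocks are left out entirely; they are what makes the real block stack complicated.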
It also clarifies how generators work, which IMO is very difficult to follow from the C source (i.e. without a design doc to go along).
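A small way to poke at this from Python itself: a generator object keeps its frame alive between `next()` calls, and you can inspect it through `gi_frame`. The `counter` function below is made up for illustration; `gi_frame` and `f_locals` are standard CPython attributes.

```python
def counter():
    total = 0
    while total < 2:
        total += 1
        yield total


gen = counter()

first = next(gen)
# The suspended frame still holds the generator's local variables.
saved_locals = dict(gen.gi_frame.f_locals)

second = next(gen)
rest = list(gen)             # run the generator to completion
frame_after = gen.gi_frame   # the frame is released once it finishes

print(first, saved_locals, second, rest, frame_after)
# 1 {'total': 1} 2 [] None
```

So a `yield` is roughly "pop this frame off the call stack but don't throw it away", and `next()` pushes the same frame back on.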
I wrote about some of my recent work here: https://www.reddit.com/r/oilshell/comments/8b0n6z/opyreadmem...