In the previous post, I’ve given you an overview of what the compilation process is like in Python. Now that we have the compiled bytecode let’s see what happens when the Python Virtual Machine, the interpreter, runs that code.
Some programming languages (e.g., C++) are translated into instructions that run directly on hardware. We speak of compiling programs in that language onto a specific hardware architecture. Other programming languages (e.g., Java and Python) are translated to run on a common architecture: interpreters (or virtual machines) for that architecture can be written in a low-level language (like C, which targets most architectures) or in the machine’s hardware language itself for maximal speed, so that exactly the same translated programs can run on any architecture that has the virtual machine program implemented on it.
This makes the interpreted languages more portable than the compiled one because to produce a good compiler for a new architecture, and compiling a program can take a long time. But this comes with a cost, the interpreted represent an extra layer of software that reduces the language performance, so the interpreted languages are 2-10 times slower than the compiled one. So there isn’t the right way to write a language translator: each approach comes with its own advantages and disadvantages, and each can be used/abused in situations.
A Python bytecode instruction consist of:
dis.opmap
.None
None
False
For example let’s take a look of how the BINARY_ADD
instruction is structured:
Instruction(
opname='BINARY_ADD',
opcode=23,
arg=None,
argval=None,
argrepr='',
offset=14,
starts_line=None,
is_jump_target=False
)
The human readable name for the instruction is BINARY_ADD
and the corresponding opcode is 23
. It has no numeric argument (arg
) to the operation hence no argval
and argrepr
. The offset
, which is the start index of the operation within the bytecode sequence, is 16. The +
in this example is not the first instruction in the line hence starts_line
is None. No code is jumping to this instruction resulting in is_jump_target=False
.
Given a Python function you can always access its code attribute and get the bytecode, here’s how
def factorial(x) -> int:
facts = [1]
for i in range(1, x + 1):
facts.append(facts[i - 1] * i)
return facts[x]
def func_obj(fo):
print(fo.__name__)
print(' co_varnames:',fo.__code__.co_varnames)
print(' co_names :',fo.__code__.co_names)
print(' co_consts :',fo.__code__.co_consts,'\n')
print('Source Line m operation/byte-code operand (useful name/number)\n'+69*'-')
dis.dis(fo)
func_obj(factorial)
factorial
co_varnames: ('x', 'facts', 'i')
co_names : ('range', 'append')
co_consts : (None, 1)
Source Line m operation/byte-code operand (useful name/number)
---------------------------------------------------------------------
4 0 LOAD_CONST 1 (1)
2 BUILD_LIST 1
4 STORE_FAST 1 (facts)
5 6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 1 (1)
10 LOAD_FAST 0 (x)
12 LOAD_CONST 1 (1)
14 BINARY_ADD
16 CALL_FUNCTION 2
18 GET_ITER
>> 20 FOR_ITER 26 (to 48)
22 STORE_FAST 2 (i)
6 24 LOAD_FAST 1 (facts)
26 LOAD_METHOD 1 (append)
28 LOAD_FAST 1 (facts)
30 LOAD_FAST 2 (i)
32 LOAD_CONST 1 (1)
34 BINARY_SUBTRACT
36 BINARY_SUBSCR
38 LOAD_FAST 2 (i)
40 BINARY_MULTIPLY
42 CALL_METHOD 1
44 POP_TOP
46 JUMP_ABSOLUTE 20
8 >> 48 LOAD_FAST 1 (facts)
50 LOAD_FAST 0 (x)
52 BINARY_SUBSCR
54 RETURN_VALUE
There are tons of code attribute, if are curious check out the official docs. These attributes are used directly by the bytecode built by the compilation process. But before we’ll dive into the bytecode of the factorial function, let’s first try to understand how it works with a simple example.
The PVM is a stack bases virtual machine, which means that it uses a “regular” stack for the execution of the bytecode.
There is also a secondarily important block stack that is used to store information about nested loops, try, and with statements. For example, a break statement is translated into code that uses the block stack to determine which loop to break out of (and how to continue executing at the first statement outside the loop). As loops, try/except, and with statements are started, information about their blocks are pushed onto the block stack; as they terminate, this information is popped off the block stack.
Let’s see what happens when the PVM execute a simple math operation: d = a + b * c
, assuming that a, b, c, and d are local variables inside a function: assume co_varnames is (‘a’, ‘b’, ‘c’, ‘d’) and the actual values
for these names are stored in a parallel tuple (1, 2, 3, None): e.g., the value
for ‘a’ is 1, the value for ‘b’ is 2, the value for ‘c’ is 3, and the value for
‘d’ is None. Generally, the value for a name at index i in the co_varnames tuple
is stored in index i in the tuple of actual values.
The PVM code for d = a + b *c
is
0 LOAD_FAST 0 (a)
2 LOAD_FAST 1 (b)
4 LOAD_FAST 2 (c)
6 BINARY_MULTIPLY
8 BINARY_ADD
10 STORE_FAST 3 (d)
The first column is the offset of the given instruction from the start of the bytecode. Assuming the bytecode string is contained in an array, then this value is the index of the instruction into the array. The second column is the actual human-readable instruction opcode; the full range of opcodes are found in the Include/opcode.h
module. The third column is the argument to the instruction.
The last column is the value of the argument - provided by the dis function for ease of use.
The LOAD_FAST
instruction pushes a reference to the local co_varnames[var_num]
onto the stack. For the first instruction, LOAD_FAST
pushes onto the stack co_varnames[0]
, which is a
.
The BINARY_MULTIPLY
instruction load/push onto the stack the * of the two values on the top.
The BINARY_ADD
instrucion load/push onto the stack the + of the two values on the top.
The STORE_FAST
istruction store/pop the value on the top of the stack into co_varnames[N]
.
Here’s what happens in the stack step by step.
index | value |
---|---|
3 | |
2 | |
1 | |
0 |
Initially the stack is empty.
index | value |
---|---|
3 | |
2 | |
1 | |
0 | 1: value of a |
execute LOAD_FAST 0
index | value |
---|---|
3 | |
2 | |
1 | 2: value of b |
0 | 1: value of a |
execute LOAD_FAST 1
index | value |
---|---|
3 | |
2 | 3: value of c |
1 | 2: value of b |
0 | 1: value of a |
execute LOAD_FAST 2
index | value |
---|---|
3 | |
2 | |
1 | 6: value of b*c |
0 | 1: value of a |
execute BINARY_MULTIPLY
index | value |
---|---|
3 | |
2 | |
1 | |
0 | 7: value of a+b*c |
execute BINARY_ADD
index | value |
---|---|
3 | |
2 | |
1 | |
0 |
execute STORE_FAST 3
At this point d’s value is 7, the value that was at the top of the stack when STORE_FAST was executed. The actual values for these names are stored in the tuple (1, 2, 3, 7).
Now that we understand, or at least have an overview of how the bytecode is execute let’s see how the bytecode of the factorial function listed above is actually executed.
Here’s again the resulting bytecode:
Source Line m operation/byte-code operand (useful name/number)
---------------------------------------------------------------------
4 0 LOAD_CONST 1 (1)
2 BUILD_LIST 1
4 STORE_FAST 1 (facts)
5 6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 1 (1)
10 LOAD_FAST 0 (x)
12 LOAD_CONST 1 (1)
14 BINARY_ADD
16 CALL_FUNCTION 2
18 GET_ITER
>> 20 FOR_ITER 26 (to 48)
22 STORE_FAST 2 (i)
6 24 LOAD_FAST 1 (facts)
26 LOAD_METHOD 1 (append)
28 LOAD_FAST 1 (facts)
30 LOAD_FAST 2 (i)
32 LOAD_CONST 1 (1)
34 BINARY_SUBTRACT
36 BINARY_SUBSCR
38 LOAD_FAST 2 (i)
40 BINARY_MULTIPLY
42 CALL_METHOD 1
44 POP_TOP
46 JUMP_ABSOLUTE 20
8 >> 48 LOAD_FAST 1 (facts)
50 LOAD_FAST 0 (x)
52 BINARY_SUBSCR
54 RETURN_VALUE
Note the any line preface by » means that some other instruction in the function can jump to it. Jumping in the PVM (also in the assembly language) is how loops and if statements in Python are do their computation.
For the sake of simplicity, I’ll call TOS
(top of the stacl) the first item store on top of the stack and TOS1
the value store under TOS
.
Now let’s see how the bytecode is executed:
con_consts[1]
) onto the stackco_names[range]
onto the stackco_consts[1]
onto the stackco_varnames[1]
(x
) onto the stack.co_consts[1]
(1
) onto the stackTOS = TOS + TOS1
range
where 2 is the number of argumentsTOS = iter(TOS)
StopIteration
raisedi
(co_varnames[2]
), popping it off the stackfacts
onto the stackappend
onto the stackfacts
onto the stacki
onto the stack1
onto the stackTOS
to TOS1
and store the result in TOS
TOS = TOS1[TOS]
i
on the stackTOS
(i
) and TOS1
(facts[i - 1]
) and store the result in TOS
.TOS1
which is appendTOS
itemfacts
onto the stackx
onto the stackTOS = TOS1[TOS]
, which is TOS = facts[x]
For those of you that are more curious that want to find out more about the dis module and the PVM’s opcodes here the link of the official documentation.
Congratulations, you made it through the post and learned something about the Python bytecode and Python’s dis
-module! I hope you guys have learned something about this beatiful programming language! Stay curious and keep learning!