Introspecting Python Functions

Posted on Sun 19 February 2017 in Blog • Tagged with PySketch, Sketches, Python, Scripts, Interpreter, Inspection, Programming

Toward the end of the project article for PySketch I pointed out that it would be more useful if it could detect which python modules you are using automatically, rather than guessing with a predefined list. Recently I have been working on this problem using Python's introspection engine.

Note that I am using python 3.5.2 that is packaged in Ubuntu 16.04*

I am just gonna say it now: I do not want to parse Python source, so lets go look at the introspection engine.

Python uses the built in function dir() to do basic object inspection. In case you don't know in Python everything is expressed as an object, so dir(True) is valid code and it will return a list of attributes for the object:

>>> dir(True)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__',
 '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__',
 '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__int__',
 '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__',
 '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__',
 '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__',
 '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__',
 '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length',
 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']

Most of the listed attributes are built in default attributes that we typically don't need to worry about, however we can find out which ones are common to all objects by creating an empty class:

>>> class Empty:
...     pass
... 
>>> e = Empty()
>>> dir(e)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
 '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__',
 '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
 '__subclasshook__', '__weakref__']

Everything you see here is part of Python's base object type, everything else is inherited. Most of these are handles to built in functions, provided so that the class can override them and/or respond to events. For example __init__ should be familiar to anyone who has written a class in python before; it is the constructor, and it is provided for overriding from the base object definition.

So lets look at a sketch loaded with PySketch:

>>> import sketches
>>> loader = sketches.ModuleLoader("example.pys")
>>> s = loader.sketch
>>> dir(s)
['__builtins__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'cleanup',
 'loop', 'setup'] 

As expected we have the same behaviour as in the class. Listed are the docstring, the file the module was loaded from, it's parent package, and the name for module. In all there is less here than I expected. We will politely ignore __builtins__ as it exposes nothing useful for our goal. We also have the three functions we define in the sketch. Lets inspect the setup function of this sketch:

def setup():
    pin = 18
    clock = 22
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(pin, GPIO.OUT)
    GPIO.setup(clock, GPIO.OUT)
    GPIO.output(pin, GPIO.High)
    time.sleep(1000)
    GPIO.output(pin, GPIO.Low)

Original Code

>>> f = s.setup
>>> dir(f)
['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__',
 '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__',
 '__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__', '__module__',
 '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__',
 '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

dir() output

I have stripped out the values populated in the empty class from above. While some of them may contain useful data I'm pretty sure most of them are useless inherited values which just allow Python to do it's thing; this is what we are left with:

>>> dir(f) # With the values present in Empty() from above removed 
['__annotations__', '__call__', '__closure__', '__code__', '__defaults__', '__get__', '__globals__',
 '__kwdefaults__', '__name__', '__qualname__']

Ahh that's more manageable

The meanings of all these cryptic attributes are tabulated below. The types were extracted from the python console with the type() command. :

Symbol: Type: Value:
__annotations__ dict Dictionary containing function annotations as defined in PEP-3107. Not To be confused with function decorators.
__call__ method-wrapper "Pointer" to the function executable; can be called, can be overridden
__closure__ NoneType If you understand function closures you will understand this; I don't.
__code__ class 'code' Class wrapping the code the function calls
__defaults__ NoneType Contains the default values for positional and keyword arguments. (Not accessible in the interpreter)
__get__ method-wrapper Function that returns it's parent (the defined function) for rebinding to objects
__globals__ dict Dictionary of the scope the function was declared in
__kwdefaults__ NoneType Contains the default values for keyword only arguments. (Not accessible in the interpreter)
__name__ string Literally the token written after 'def'
__qualname__ string Qualified name for the function, providing a name aware of scope and nesting. See PEP-3155

Unfortunately for us most of these attributes are metadata and scope; little of it relates to the contents of the function call. Therefore the only attribute of any real interest to us is __code__.

>>> dir(f.__code__)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
 '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount',
 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars',
 'co_kwonlyargcount', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']

Output from inspecting __code__ object

>>> dir(f.__code__) # With the values present in Empty() from above removed 
['co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags',
 'co_freevars', 'co_kwonlyargcount', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize',
 'co_varnames']

# Oh look all the __*__ values evaporated

Output from inspection of the code object with the base attributes removed

If we strip attributes we find in the empty class again we end up with a code object that contains only fields relevant to a method. While a lot of these can be worked out with context I think it's best to refer to the documentation here. Unfortunately the python documentation is notoriously bad and despite some exhaustive searching the closest I could find to official documentation for the __code__ class comes from the "inspect" module documentation, which can be found here. This information was incomplete, however I was able to fill in the blanks using this stack overflow answer. The types were extracted from the python console with the type() command.

Symbol: Type: Value:
co_argcount int number of arguments (not including * or ** args)
co_cellvars tuple tuple containing names of local variables referenced by nested functions
co_code bytes string of raw compiled bytecode
co_consts tuple tuple of constants used in the bytecode
co_filename str name of file in which this code object was created
co_firstlineno int number of first line in Python source code
co_flags int bitmap of flags about the object
co_freevars tuple tuple containing the names of free variables
co_kwonlyargcount int number of keyword-only arguments
co_lnotab bytes encoded mapping of line numbers to bytecode indices
co_name str name with which this code object was defined
co_names tuple tuple of names of local variables
co_nlocals int number of local variables
co_stacksize int virtual machine stack space required
co_varnames tuple tuple of names of arguments and local variables

Only two of these are interesting: "co_names" and "co_varnames". Despite their vague definitions in the documentation from testing I haven't seen a variable show up in both varnames and names. "co_varnames" appears to be variables locally defined within the function whereas "co_names" are tokens used with a global scope. So when searching for modules used their values are going to fall into "co_names".

There is however a problem; "co_names" is completely ungrouped with the order determined by the order tokens are used in the source. For example look at the "co_names" field for our setup function:

>>> f.__code__.co_names
('GPIO', 'setmode', 'BCM', 'setup', 'OUT', 'output', 'High', 'time', 'sleep', 'Low')
>>>
>>> f.__code__.co_varnames
('pin', 'clock')

The output of "co_names" is completely useless

This problem is made worse by the fact that these tokens are completely uninspectable (they are just strings). Yes it's objects all the way down in python, but we've reached the bottom of the pile. The "co_code" attribute is actually the bytecode that is directly fed to the virtual machine when the function is called. The code is already compiled at this point, but apparently not linked (again see the stack overflow answer above) so the tokens don't lead to other objects, they're just strings. So the question is... Now What?

This method has revealed the tokens used in the function, so theoretically these tokens could be one by one attempted to be imported by the PySketch interpreter, skipping them if they are not found; however this is a pretty crappy hack. There are some possible further routes to explore, exploiting either the ImportLib or PIP, however as this article has been waiting to be published for nearly a month I think I will for now have to accept defeat.

Stay tuned for the next instalment of "Why won't the f*cking interpreter let me do that???!!!"

- SEGFAULT

*I think it's 16.04; I'm using Linux mint 18 which uses the Ubuntu apt repos.