Dubious dynamic delegation in Python
2019-11-10
I recently had the occasion to use a for-loop directly inside a class definition in Python. Discovering that this was possible was a bit surprising. Realizing that there is a sense in which it was the best solution was more surprising still. This is that story.
Interfaces and delegation
Python doesn’t have a native concept of interfaces, traits, protocols, or anything of the sort. This is too bad: even though Python prefers to implicitly treat implementations interchangeably via duck typing, explicitly defining an interface is still super useful because it provides a place to document the interface’s syntax and semantics. To this end, Python provides an abc module for defining abstract base classes.
As a running example, let’s define a simple file system interface. Abbreviating docs for brevity:
import abc

import six


@six.add_metaclass(abc.ABCMeta)
class FileSystem(object):
    """Abstract file system interface."""

    @abc.abstractmethod
    def open(self, path, mode="r"):
        """Open a file for reading or writing."""
        pass

    @abc.abstractmethod
    def remove(self, path):
        """Remove a file."""
        pass

    @abc.abstractmethod
    def listdir(self, path):
        """List files in the given directory."""
        pass

    @abc.abstractmethod
    def stat(self, path):
        """Perform a `stat`(2) system call equivalent."""
        pass

    @abc.abstractmethod
    def readlink(self, path):
        """Resolve a symbolic link to its referent."""
        pass
The ABCMeta metaclass will prevent us from directly instantiating any instances of this class:
>>> import main
>>> main.FileSystem()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: Can't instantiate abstract class FileSystem with abstract methods listdir, open, readlink, remove, stat
If we subclass our FileSystem class and implement all the methods, we can instantiate that subclass:
import os


class NativeFileSystem(FileSystem):
    """File system backed by the operating system."""

    def open(self, path, mode="r"):
        return open(path, mode)

    def remove(self, path):
        return os.remove(path)

    def listdir(self, path):
        return os.listdir(path)

    def stat(self, path):
        return os.stat(path)

    def readlink(self, path):
        return os.readlink(path)


NativeFileSystem()  # OK!
So far, so good. Now, suppose that we have a few implementations of our file system interface: say, the native file system, a couple of implementations backed by network file systems from various cloud service providers, and an in-memory file system for use in tests. We may want to write an adapter that wraps an arbitrary file system and only allows read-only operations. Like this:
class ReadOnlyFileSystem(FileSystem):
    """Adapter to make a filesystem read-only (first attempt)."""

    def __init__(self, delegate):
        self._delegate = delegate

    def open(self, path, mode="r"):
        if not mode.startswith("r"):
            raise RuntimeError("Cannot open as %r: read-only filesystem" % mode)
        return self._delegate.open(path, mode)

    def remove(self, path):
        raise RuntimeError("Cannot remove file: read-only filesystem")

    def listdir(self, path):
        return self._delegate.listdir(path)

    def stat(self, path):
        return self._delegate.stat(path)

    def readlink(self, path):
        return self._delegate.readlink(path)
This works just fine. But there’s a fair amount of duplication—the listdir, stat, and readlink methods are basically identical. That doesn’t feel very Pythonic. Let’s see if we can get rid of it.
Attempt: __getattribute__
Thanks to Python’s dynamism, this seems like it should be easy to achieve. We can skip implementing the pass-through methods and instead dispatch dynamically in __getattribute__:
class ReadOnlyFileSystem(FileSystem):
    """Adapter to make a filesystem read-only (second attempt)."""

    def __init__(self, delegate):
        self._delegate = delegate

    def open(self, path, mode="r"):
        if not mode.startswith("r"):
            raise RuntimeError("Cannot open as %r: read-only filesystem" % mode)
        return self._delegate.open(path, mode)

    def remove(self, path):
        raise RuntimeError("Cannot remove file: read-only filesystem")

    def __getattribute__(self, attr):
        if attr in ("listdir", "stat", "readlink"):
            return getattr(self._delegate, attr)
        else:
            return super(ReadOnlyFileSystem, self).__getattribute__(attr)
(Note that we need to use __getattribute__ rather than the more common __getattr__. The latter is only consulted on failed attribute lookups, which sounds like what we want, but attribute lookups for the unimplemented abstract methods actually don’t fail because those methods exist on the superclass.)
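To see the distinction in isolation, here is a minimal sketch unrelated to the file system classes (Base and Proxy are throwaway names): __getattr__ runs only when ordinary lookup fails, and a method inherited from a superclass, abstract or otherwise, makes the lookup succeed.

class Base(object):
    def greet(self):
        return "hello from Base"


class Proxy(Base):
    def __getattr__(self, attr):
        # Only reached when normal lookup (instance, class, bases) fails.
        return lambda *args, **kwargs: "intercepted %s" % attr


p = Proxy()
print(p.greet())    # "hello from Base": found on Base, __getattr__ never runs
print(p.missing())  # "intercepted missing": lookup failed, __getattr__ runs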
The runtime semantics of this are correct. If we were to remove the ABCMeta metaclass from the base class, this would work perfectly. But, alas!—the abc instantiation check is not appeased:
>>> import main
>>> main.ReadOnlyFileSystem(main.NativeFileSystem())
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: Can't instantiate abstract class ReadOnlyFileSystem with abstract methods listdir, readlink, stat
It has fewer complaints than last time, as we’ve explicitly implemented two of the methods, but it’s not taking into account those methods that are dispatched dynamically.
Attempt: attaching methods onto the derived class
Okay, fair enough: those methods aren’t actually defined on the ReadOnlyFileSystem class, so it’s understandable that the abc machinery might not be able to pick them up. But that shouldn’t be a big deal. Python is happy to let us monkey-patch attributes onto a class, so we can try that:
class ReadOnlyFileSystem(FileSystem):
    """Adapter to make a filesystem read-only (third attempt)."""

    def __init__(self, delegate):
        self._delegate = delegate

    def open(self, path, mode="r"):
        if not mode.startswith("r"):
            raise RuntimeError("Cannot open as %r: read-only filesystem" % mode)
        return self._delegate.open(path, mode)

    def remove(self, path):
        raise RuntimeError("Cannot remove file: read-only filesystem")


for method_name in ("listdir", "stat", "readlink"):
    def delegator(self, *args, **kwargs):
        return getattr(self._delegate, method_name)(*args, **kwargs)
    setattr(ReadOnlyFileSystem, method_name, delegator)
This is actually broken even without any abc considerations. The intent was that the listdir delegator method should call listdir on the delegate. But actually listdir will call readlink! This is because delegator doesn’t actually close over the value of method_name. Instead, it treats it as a global variable, and the value of the global variable changes at each loop iteration. To fix this, we have to explicitly bind method_name to a function parameter:
def make_delegator(method_name):
    def delegator(self, *args, **kwargs):
        return getattr(self._delegate, method_name)(*args, **kwargs)
    return delegator


for method_name in ("listdir", "stat", "readlink"):
    setattr(ReadOnlyFileSystem, method_name, make_delegator(method_name))
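If the late-binding behavior is surprising, it can be reproduced in isolation, with no classes or delegation involved; a minimal sketch:

adders = []
for n in (1, 2, 3):
    def add(x):
        return x + n  # `n` is looked up when `add` is called, not when it is defined
    adders.append(add)

print([f(10) for f in adders])  # [13, 13, 13], not [11, 12, 13]

Binding the loop variable through a function parameter, as make_delegator does above, is the standard workaround.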
This fixes the monkey-patching mechanics. But it still doesn’t suffice to satisfy the abstract method check:
>>> import main
>>> main.ReadOnlyFileSystem(main.NativeFileSystem())
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: Can't instantiate abstract class ReadOnlyFileSystem with abstract methods listdir, readlink, stat
Consulting the abc module documentation, we can see that this is intended:
Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once it is created, are not supported.
And, consulting the CPython source code—the only way to really understand anything in Python—we see that indeed the __abstractmethods__ attribute is set at class creation time. (The astute reader will note that the value of __abstractmethods__ is never actually read within that module. That’s because the actual check happens in the C code for the interpreter’s core object_new function. Don’t ask.)
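Indeed, we can inspect the attribute directly; in a Python 3 session it looks roughly like this (the set ordering is arbitrary):

>>> import main
>>> main.FileSystem.__abstractmethods__
frozenset({'listdir', 'open', 'readlink', 'remove', 'stat'})
>>> main.ReadOnlyFileSystem.__abstractmethods__
frozenset({'listdir', 'readlink', 'stat'})

The delegators that we patched on afterward are real methods, but __abstractmethods__ was already frozen at class creation and never hears about them.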
It would appear that we’re in a bit of a bind. By the time that we start monkey-patching the class, the verdict has already been decided, and it is not in our favor.
A highly magical solution
Here’s where it starts getting a bit more unorthodox.
In Python, a class definition—that is, the class Foo: ... line and everything inside the class body—is actually a statement. It’s executed roughly as follows:

1. Evaluate each expression in the sequence of base classes (usually just object).
2. Create a new empty namespace for the class attributes.
3. Execute the statements in the class body in the context of the new namespace.
4. Invoke the metaclass’s constructor (usually just type) with the class name, the sequence of base classes, and the namespace that has just been populated.
5. Store the resulting type object onto the enclosing namespace under the class name.
So, for example, given the following simple class definition…
class MyClass(object):
    X = 1

    def foo(self):
        pass
…the steps are:

1. Evaluate the expression object to form the sequence of base classes (object,) (a length-1 tuple).
2. Create a new empty namespace.
3. Execute X = 1 and def foo(self): pass in this new namespace. This assigns keys X and foo onto the namespace, with values 1 and a new function object, respectively.
4. Invoke type("MyClass", (object,), namespace), where namespace is the namespace populated in the previous step.
5. Set MyClass on the global namespace to the resulting type object.
You can perform step (4) in normal Python code, too. This can often be convenient for quick one-liners and explorations where you want to remain in an expression context:
>>> type("MyClass", (object,), {"X": 1, "foo": lambda self: None})
<class '__main__.MyClass'>
Now, the statements inside the class definition are usually simple assignments and function definitions, but they don’t need to be! For instance, the following is a valid class definition:
class MyClass(object):
    import random

    with open("threshold.txt") as infile:
        if random.random() < float(infile.read()):
            lucky = True
The resulting class will always have attributes random (the imported module) and infile (the closed file object). Depending on the contents of threshold.txt and the whims of entropy, it may also have an attribute lucky.
Surely we must be able to use this to solve our delegation problem. All we need to do is make sure that the desired methods are defined on the namespace by the time that the class body finishes executing. We want to do something morally equivalent to the following:
class ReadOnlyFileSystem(FileSystem):
    """Adapter to make a filesystem read-only (fourth attempt)."""

    def __init__(self, delegate):
        self._delegate = delegate

    def open(self, path, mode="r"):
        if not mode.startswith("r"):
            raise RuntimeError("Cannot open as %r: read-only filesystem" % mode)
        return self._delegate.open(path, mode)

    def remove(self, path):
        raise RuntimeError("Cannot remove file: read-only filesystem")

    # Note: Loop moved into class body!
    for method_name in ("listdir", "stat", "readlink"):
        def delegator(self, *args, **kwargs):
            return getattr(self._delegate, method_name)(*args, **kwargs)
        setattr(ReadOnlyFileSystem, method_name, delegator)
This doesn’t quite work as written. When we execute the setattr on the last line, the ReadOnlyFileSystem identifier has not yet been assigned onto the module namespace. This makes sense—the type object hasn’t been created yet, so we can’t refer to it. But we don’t really need to refer to the type object; we only care about the temporary namespace that we’re populating. And Python does expose a way to access that—the vars and locals builtins!
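As a tiny proof of concept that writes through vars() inside a class body really do land on the class (Demo and answer are made-up names; this relies on the CPython behavior discussed next):

class Demo(object):
    vars()["answer"] = 42  # write straight into the class namespace being built

print(Demo.answer)  # 42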
The documentation for vars clearly states that "the locals dictionary is only useful for reads since updates to the locals dictionary are ignored". The underlying locals function also says that "[t]he contents of this dictionary should not be modified". But like most everything else in Python, this "should" is really just a suggestion, and the admonition on vars is as wrong as it is unambiguous: updating vars() or locals() has worked fine since at least CPython 2.7 and through at least CPython 3.8 (latest at time of writing). So, pulling everything together, and with a final touch of discovering the needed method names dynamically with __abstractmethods__, we behold:
class ReadOnlyFileSystem(FileSystem):
    """Adapter to make a filesystem read-only (at last, perfectly Pythonic)."""

    def __init__(self, delegate):
        self._delegate = delegate

    def open(self, path, mode="r"):
        if not mode.startswith("r"):
            raise RuntimeError("Cannot open as %r: read-only filesystem" % mode)
        return self._delegate.open(path, mode)

    def remove(self, path):
        raise RuntimeError("Cannot remove file: read-only filesystem")

    for method_name in FileSystem.__abstractmethods__:
        if method_name in locals():
            continue

        def make_delegator(m):  # indirection to close over `method_name`
            def delegate(self, *args, **kwargs):
                return getattr(self._delegate, m)(*args, **kwargs)
            return delegate

        # This works in CPython 2.7 to at least 3.8.
        locals()[method_name] = make_delegator(method_name)
    del make_delegator
    del method_name
If you’re still worrying about mutating locals(), note that not only do multiple places in the CPython standard library do the same thing, but essentially this exact pattern is explicitly tested in CPython core!
Let’s try it out:
>>> import main
>>> fs = main.NativeFileSystem()
>>> with fs.open("foo", "w") as outfile:
...     outfile.write("hello\n")
...
>>> rofs = main.ReadOnlyFileSystem(fs)
>>> rofs.listdir(".")
['foo', 'main.py']
>>> len(rofs.open("foo").read())
6
>>> rofs.open("foo", "w")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "main.py", line 123, in open
    raise RuntimeError("Cannot open as %r: read-only filesystem" % mode)
RuntimeError: Cannot open as 'w': read-only filesystem
Excellent! Everything’s working properly. It took executing non-trivial code inside a class definition, abusing locals() in a way that’s explicitly discouraged, and some globals/closures kludgery, but we got there.
Partial alternative: a custom metaclass
At the start of this post, I claimed that this triply terrifying trick was the “best solution” that I know of. For comparison, let’s look at some other candidates. Here’s a solution along a different tack that does work, but has some other downsides.
This problem was brought upon us by the ABCMeta metaclass. Perhaps a metaclass can fix it. After all, metaclasses get to directly inspect and modify the namespace object, which was our goal all along.
In the generic case, it turns out to be a bit tricky to determine which methods we should implement as delegates. The way that ABCMeta does it requires first constructing the actual class object to defer to Python’s built-in MRO logic. For example, ReadOnlyFileSystem.open resolves to a function on the derived class, but ReadOnlyFileSystem.listdir resolves to a function on the base class; the implementation of abc takes advantage of the standard attribute resolution here. One simple approach, then, is to construct the class twice: once just to figure out which methods are abstract, and then again with a different namespace to resume the normal class creation process.
So let’s try our hand at writing an abstract base class metaclass with delegation support—an ABCDMeta, if you will:
class ABCDMeta(abc.ABCMeta):
    """Metaclass for abstract base classes with implicit delegation."""

    def __new__(mcls, name, bases, attrs, **kwargs):
        # Take advantage of existing `__abstractmethods__` computation
        # logic on `ABCMeta`.
        cls = super(ABCDMeta, mcls).__new__(mcls, name, bases, attrs, **kwargs)
        abstracts = cls.__abstractmethods__

        def make_delegator(method_name):
            def delegator(self, *args, **kwargs):
                return getattr(self._delegate, method_name)(*args, **kwargs)
            return delegator

        for method_name in abstracts:
            if method_name not in attrs:
                attrs[method_name] = make_delegator(method_name)
        return super(ABCDMeta, mcls).__new__(mcls, name, bases, attrs, **kwargs)
Then, we could use this as the metaclass for ReadOnlyFileSystem, defining only the two methods of interest. (It’s legal for the metaclass of a derived class to be a strict subtype of the metaclass for its ancestor classes.)
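A sketch of that usage, reusing six.add_metaclass as before (the bodies of __init__, open, and remove are exactly as in the earlier attempts):

@six.add_metaclass(ABCDMeta)
class ReadOnlyFileSystem(FileSystem):
    """Adapter to make a filesystem read-only (metaclass approach)."""

    def __init__(self, delegate):
        self._delegate = delegate

    def open(self, path, mode="r"):
        if not mode.startswith("r"):
            raise RuntimeError("Cannot open as %r: read-only filesystem" % mode)
        return self._delegate.open(path, mode)

    def remove(self, path):
        raise RuntimeError("Cannot remove file: read-only filesystem")

    # listdir, stat, and readlink are generated by ABCDMeta.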
This appears to work in our simple example, but it’s not perfect. The fact that we’re creating the class twice is more than just inelegant: in some cases, it can lead to incorrect results. Some metaclasses perform side effects, like registering the new class in some global registry. For instance, Django’s ModelBase does this. A downstream user who attempted to use such a metaclass in conjunction with our ABCDMeta by mixing them together (via multiple inheritance: class ABCDModelBase(ModelBase, ABCDMeta): pass) could see their models registered multiple times. Yet it also wouldn’t be correct to somehow "skip" applying the metaclasses on the first go-around, because then we might miss any implementations of the abstract methods provided by these metaclasses.
Of course, if we’re willing to make this a one-off metaclass that refers to FileSystem.__abstractmethods__ or the literal method names directly, then there’s no problem. But it’s not obvious how to turn this attempt into a correct generic solution.
This approach has another flaw. It is anaphoric, not hygienic: there is an implicit contract between the metaclass and the derived class that the delegate is stored in a private attribute called _delegate. This means that the metaclass is not compositional. For instance, it would not be possible to implement two different abstract classes with different delegate objects. Similarly, the delegate cannot be stored in a name-mangled variable (like __delegate, with two underscores) by the derived class’s initializer without further hacks in the metaclass. Our previous solution didn’t disrupt name mangling, because the delegator methods were defined syntactically within the relevant class.
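As a refresher on name mangling, using a throwaway class C unrelated to the file systems: a double-underscore attribute is rewritten to include the class name, so a metaclass-generated delegator that looks up self._delegate would never find it.

class C(object):
    def __init__(self):
        self.__delegate = "hidden"  # stored under the mangled name `_C__delegate`


c = C()
print(c._C__delegate)           # "hidden"
print(hasattr(c, "_delegate"))  # False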
Partial alternative: monkey-patch __abstractmethods__
Finally, a different, simple approach merits a mention. The source of truth for the instantiation check is the __abstractmethods__ attribute on the type. So far, we’ve been trying to get that attribute to be an empty set organically. But, this being Python, we can just set the attribute directly:
class ReadOnlyFileSystem(FileSystem):
    """Adapter to make a filesystem read-only (monkey-patching approach)."""

    def __init__(self, delegate):
        self._delegate = delegate

    def open(self, path, mode="r"):
        if not mode.startswith("r"):
            raise RuntimeError("Cannot open as %r: read-only filesystem" % mode)
        return self._delegate.open(path, mode)

    def remove(self, path):
        raise RuntimeError("Cannot remove file: read-only filesystem")


def make_delegator(method_name):
    def delegator(self, *args, **kwargs):
        return getattr(self._delegate, method_name)(*args, **kwargs)
    return delegator


for method_name in ReadOnlyFileSystem.__abstractmethods__:
    setattr(ReadOnlyFileSystem, method_name, make_delegator(method_name))

# No longer abstract! We implemented them. :-)
ReadOnlyFileSystem.__abstractmethods__ = frozenset()
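Repeating the earlier experiment, the instantiation check is now satisfied (the exact object address will of course differ):

>>> import main
>>> main.ReadOnlyFileSystem(main.NativeFileSystem())
<main.ReadOnlyFileSystem object at 0x...>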
This looks good! It’s simple, it requires comparatively few hacks, and it kind of makes semantic sense. On top of all that, it’s even almost correct. But it’s not, in general: not quite. The CPython internals have a comment in type_set_abstractmethods:

__abstractmethods__ should only be set once on a type, in abc.ABCMeta.__new__, so this function doesn't do anything special to update subclasses.
So, okay, we’re apparently violating an undocumented invariant of CPython. At first blush, this seems irrelevant: it sure doesn’t look like we’re creating any subclasses before we set __abstractmethods__. But we can’t even rely on that in general, because one of the parent metaclasses could have already created subclasses as soon as the original class was created, and the __abstractmethods__ attributes on those subclasses will not have taken our changes into account.
Like the custom metaclass approach, this solution also breaks name mangling, which is unfortunate.
Conclusion
Coming back to reality, it goes without saying that there is only one good solution in this post, and that it is the first one presented: the one with explicit delegation.
Python offers a lot of tempting dynamism. You really can put a for-loop in a class definition, and it will basically "just work". But as soon as we have to start reasoning about multiple metaclass inheritance or CPython implementation details just to implement a simple interface, we should know that we’ve taken it too far. All these extra arms of the dynamic octopus compound with each other to make it exceedingly difficult to build robust abstractions.
Taking the straightforward implementation that would be natural in, say, Java or OCaml can often lead to a clear, readable, maintainable solution. Sometimes it turns out boring—and boring is just fine.