PEP 521 – Managing global context via ‘with’ blocks in generators and coroutines
- PEP
- 521
- Title
- Managing global context via ‘with’ blocks in generators and coroutines
- Author
- Nathaniel J. Smith <njs at pobox.com>
- Status
- Withdrawn
- Type
- Standards Track
- Created
- 27-Apr-2015
- Python-Version
- 3.6
- Post-History
- 29-Apr-2015
Contents
PEP Withdrawal
Withdrawn in favor of PEP 567.
Abstract
While we generally try to avoid global state when possible, there
nonetheless exist a number of situations where it is agreed to be the
best approach. In Python, a standard pattern for handling such cases
is to store the global state in global or thread-local storage, and
then use with
blocks to limit modifications of this global state
to a single dynamic scope. Examples where this pattern is used include
the standard library’s warnings.catch_warnings
and
decimal.localcontext
, NumPy’s numpy.errstate
(which exposes
the error-handling settings provided by the IEEE 754 floating point
standard), and the handling of logging context or HTTP request context
in many server application frameworks.
However, there is currently no ergonomic way to manage such local changes to global state when writing a generator or coroutine. For example, this code:
def f():
with warnings.catch_warnings():
for x in g():
yield x
may or may not successfully catch warnings raised by g()
, and may
or may not inadvertently swallow warnings triggered elsewhere in the
code. The context manager, which was intended to apply only to f
and its callees, ends up having a dynamic scope that encompasses
arbitrary and unpredictable parts of its callers. This problem
becomes particularly acute when writing asynchronous code, where
essentially all functions become coroutines.
Here, we propose to solve this problem by notifying context managers whenever execution is suspended or resumed within their scope, allowing them to restrict their effects appropriately.
Specification
Two new, optional, methods are added to the context manager protocol:
__suspend__
and __resume__
. If present, these methods will be
called whenever a frame’s execution is suspended or resumed from
within the context of the with
block.
More formally, consider the following code:
with EXPR as VAR:
PARTIAL-BLOCK-1
f((yield foo))
PARTIAL-BLOCK-2
Currently this is equivalent to the following code (copied from PEP 343):
mgr = (EXPR)
exit = type(mgr).__exit__ # Not calling it yet
value = type(mgr).__enter__(mgr)
exc = True
try:
try:
VAR = value # Only if "as VAR" is present
PARTIAL-BLOCK-1
f((yield foo))
PARTIAL-BLOCK-2
except:
exc = False
if not exit(mgr, *sys.exc_info()):
raise
finally:
if exc:
exit(mgr, None, None, None)
This PEP proposes to modify with
block handling to instead become:
mgr = (EXPR)
exit = type(mgr).__exit__ # Not calling it yet
### --- NEW STUFF ---
if the_block_contains_yield_points: # known statically at compile time
suspend = getattr(type(mgr), "__suspend__", lambda: None)
resume = getattr(type(mgr), "__resume__", lambda: None)
### --- END OF NEW STUFF ---
value = type(mgr).__enter__(mgr)
exc = True
try:
try:
VAR = value # Only if "as VAR" is present
PARTIAL-BLOCK-1
### --- NEW STUFF ---
suspend(mgr)
tmp = yield foo
resume(mgr)
f(tmp)
### --- END OF NEW STUFF ---
PARTIAL-BLOCK-2
except:
exc = False
if not exit(mgr, *sys.exc_info()):
raise
finally:
if exc:
exit(mgr, None, None, None)
Analogous suspend/resume calls are also wrapped around the yield
points embedded inside the yield from
, await
, async with
,
and async for
constructs.
Nested blocks
Given this code:
def f():
with OUTER:
with INNER:
yield VALUE
then we perform the following operations in the following sequence:
INNER.__suspend__()
OUTER.__suspend__()
yield VALUE
OUTER.__resume__()
INNER.__resume__()
Note that this ensures that the following is a valid refactoring:
def f():
with OUTER:
yield from g()
def g():
with INNER
yield VALUE
Similarly, with
statements with multiple context managers suspend
from right to left, and resume from left to right.
Other changes
Appropriate __suspend__
and __resume__
methods are added to
warnings.catch_warnings
and decimal.localcontext
.
Rationale
In the abstract, we gave an example of plausible but incorrect code:
def f():
with warnings.catch_warnings():
for x in g():
yield x
To make this correct in current Python, we need to instead write something like:
def f():
with warnings.catch_warnings():
it = iter(g())
while True:
with warnings.catch_warnings():
try:
x = next(it)
except StopIteration:
break
yield x
OTOH, if this PEP is accepted then the original code will become correct as-is. Or if this isn’t convincing, then here’s another example of broken code; fixing it requires even greater gyrations, and these are left as an exercise for the reader:
async def test_foo_emits_warning():
with warnings.catch_warnings(record=True) as w:
await foo()
assert len(w) == 1
assert "xyzzy" in w[0].message
And notice that this last example isn’t artificial at all – this is
exactly how you write a test that an async/await-using coroutine
correctly raises a warning. Similar issues arise for pretty much any
use of warnings.catch_warnings
, decimal.localcontext
, or
numpy.errstate
in async/await-using code. So there’s clearly a
real problem to solve here, and the growing prominence of async code
makes it increasingly urgent.
Alternative approaches
The main alternative that has been proposed is to create some kind of “task-local storage”, analogous to “thread-local storage” 1. In essence, the idea would be that the event loop would take care to allocate a new “task namespace” for each task it schedules, and provide an API to at any given time fetch the namespace corresponding to the currently executing task. While there are many details to be worked out 2, the basic idea seems doable, and it is an especially natural way to handle the kind of global context that arises at the top-level of async application frameworks (e.g., setting up context objects in a web framework). But it also has a number of flaws:
- It only solves the problem of managing global state for coroutines
that
yield
back to an asynchronous event loop. But there actually isn’t anything about this problem that’s specific to asyncio – as shown in the examples above, simple generators run into exactly the same issue. - It creates an unnecessary coupling between event loops and code that
needs to manage global state. Obviously an async web framework needs
to interact with some event loop API anyway, so it’s not a big deal
in that case. But it’s weird that
warnings
ordecimal
or NumPy should have to call into an async library’s API to access their internal state when they themselves involve no async code. Worse, since there are multiple event loop APIs in common use, it isn’t clear how to choose which to integrate with. (This could be somewhat mitigated by CPython providing a standard API for creating and switching “task-local domains” that asyncio, Twisted, tornado, etc. could then work with.) - It’s not at all clear that this can be made acceptably fast. NumPy
has to check the floating point error settings on every single
arithmetic operation. Checking a piece of data in thread-local
storage is absurdly quick, because modern platforms have put massive
resources into optimizing this case (e.g. dedicating a CPU register
for this purpose); calling a method on an event loop to fetch a
handle to a namespace and then doing lookup in that namespace is
much slower.
More importantly, this extra cost would be paid on every access to the global data, even for programs which are not otherwise using an event loop at all. This PEP’s proposal, by contrast, only affects code that actually mixes
with
blocks andyield
statements, meaning that the users who experience the costs are the same users who also reap the benefits.
On the other hand, such tight integration between task context and the
event loop does potentially allow other features that are beyond the
scope of the current proposal. For example, an event loop could note
which task namespace was in effect when a task called call_soon
,
and arrange that the callback when run would have access to the same
task namespace. Whether this is useful, or even well-defined in the
case of cross-thread calls (what does it mean to have task-local
storage accessed from two threads simultaneously?), is left as a
puzzle for event loop implementors to ponder – nothing in this
proposal rules out such enhancements as well. It does seem though
that such features would be useful primarily for state that already
has a tight integration with the event loop – while we might want a
request id to be preserved across call_soon
, most people would not
expect:
with warnings.catch_warnings():
loop.call_soon(f)
to result in f
being run with warnings disabled, which would be
the result if call_soon
preserved global context in general. It’s
also unclear how this would even work given that the warnings context
manager __exit__
would be called before f
.
So this PEP takes the position that __suspend__
/__resume__
and “task-local storage” are two complementary tools that are both
useful in different circumstances.
Backwards compatibility
Because __suspend__
and __resume__
are optional and default to
no-ops, all existing context managers continue to work exactly as
before.
Speed-wise, this proposal adds additional overhead when entering a
with
block (where we must now check for the additional methods;
failed attribute lookup in CPython is rather slow, since it involves
allocating an AttributeError
), and additional overhead at
suspension points. Since the position of with
blocks and
suspension points is known statically, the compiler can
straightforwardly optimize away this overhead in all cases except
where one actually has a yield
inside a with
. Furthermore,
because we only do attribute checks for __suspend__
and
__resume__
once at the start of a with
block, when these
attributes are undefined then the per-yield overhead can be optimized
down to a single C-level if (frame->needs_suspend_resume_calls) {
... }
. Therefore, we expect the overall overhead to be negligible.
Interaction with PEP 492
PEP 492 added new asynchronous context managers, which are like
regular context managers, but instead of having regular methods
__enter__
and __exit__
they have coroutine methods
__aenter__
and __aexit__
.
Following this pattern, one might expect this proposal to add
__asuspend__
and __aresume__
coroutine methods. But this
doesn’t make much sense, since the whole point is that __suspend__
should be called before yielding our thread of execution and allowing
other code to run. The only thing we accomplish by making
__asuspend__
a coroutine is to make it possible for
__asuspend__
itself to yield. So either we need to recursively
call __asuspend__
from inside __asuspend__
, or else we need to
give up and allow these yields to happen without calling the suspend
callback; either way it defeats the whole point.
Well, with one exception: one possible pattern for coroutine code is
to call yield
in order to communicate with the coroutine runner,
but without actually suspending their execution (i.e., the coroutine
might know that the coroutine runner will resume them immediately
after processing the yield
ed message). An example of this is the
curio.timeout_after
async context manager, which yields a special
set_timeout
message to the curio kernel, and then the kernel
immediately (synchronously) resumes the coroutine which sent the
message. And from the user point of view, this timeout value acts just
like the kinds of global variables that motivated this PEP. But, there
is a crucal difference: this kind of async context manager is, by
definition, tightly integrated with the coroutine runner. So, the
coroutine runner can take over responsibility for keeping track of
which timeouts apply to which coroutines without any need for this PEP
at all (and this is indeed how curio.timeout_after works).
That leaves two reasonable approaches to handling async context managers:
- Add plain
__suspend__
and__resume__
methods. - Leave async context managers alone for now until we have more experience with them.
Either seems plausible, so out of laziness / YAGNI this PEP tentatively proposes to stick with option (2).
References
- 1
- https://groups.google.com/forum/#!topic/python-tulip/zix5HQxtElg https://github.com/python/asyncio/issues/165
- 2
- For example, we would have to decide whether there is a single task-local namespace shared by all users (in which case we need a way for multiple third-party libraries to adjudicate access to this namespace), or else if there are multiple task-local namespaces, then we need some mechanism for each library to arrange for their task-local namespaces to be created and destroyed at appropriate moments. The preliminary patch linked from the github issue above doesn’t seem to provide any mechanism for such lifecycle management.
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/master/pep-0521.txt
Last modified: 2019-07-03 18:20:45 GMT