PEP 647 – User-Defined Type Guards

PEP: 647
Title: User-Defined Type Guards
Author: Eric Traut <erictr at microsoft.com>
Sponsor: Guido van Rossum <guido at python.org>
Discussions-To: Typing-Sig <typing-sig at python.org>
Status: Accepted
Type: Standards Track
Created: 07-Oct-2020
Python-Version: 3.10
Post-History: 28-Dec-2020, 9-Apr-2021
Resolution: https://mail.python.org/archives/list/python-dev@python.org/thread/2ME6F6YUVKHOQYKSHTVQQU5WD4CVAZU4/

Abstract

This PEP specifies a way for programs to influence conditional type narrowing employed by a type checker based on runtime checks.

Motivation

Static type checkers commonly employ a technique called “type narrowing” to determine a more precise type of an expression within a program’s code flow. When type narrowing is applied within a block of code based on a conditional code flow statement (such as if and while statements), the conditional expression is sometimes referred to as a “type guard”. Python type checkers typically support various forms of type guard expressions.

from typing import Literal, Optional, Union

def func(val: Optional[str]):
    # "is None" type guard
    if val is not None:
        # Type of val is narrowed to str
        ...
    else:
        # Type of val is narrowed to None
        ...

def func(val: Optional[str]):
    # Truthy type guard
    if val:
        # Type of val is narrowed to str
        ...
    else:
        # Type of val remains Optional[str]
        ...

def func(val: Union[str, float]):
    # "isinstance" type guard
    if isinstance(val, str):
        # Type of val is narrowed to str
        ...
    else:
        # Type of val is narrowed to float
        ...

def func(val: Literal[1, 2]):
    # Comparison type guard
    if val == 1:
        # Type of val is narrowed to Literal[1]
        ...
    else:
        # Type of val is narrowed to Literal[2]
        ...

There are cases where type narrowing cannot be applied based on static information only. Consider the following example:

from typing import List

def is_str_list(val: List[object]) -> bool:
    """Determines whether all objects in the list are strings"""
    return all(isinstance(x, str) for x in val)

def func1(val: List[object]):
    if is_str_list(val):
        print(" ".join(val)) # Error: invalid type

This code is correct, but a type checker will report a type error because the value val passed to the join method is understood to be of type List[object]. The type checker does not have enough information to statically verify that the type of val is List[str] at this point.
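Without a dedicated mechanism, a common workaround is to restate the narrowed type by hand with typing.cast after the check. The sketch below is illustrative: func1 is changed to return the joined string so its result can be inspected. Nothing verifies that the cast matches what is_str_list actually tests, so the two can silently drift apart.

```python
from typing import List, cast

def is_str_list(val: List[object]) -> bool:
    """Determines whether all objects in the list are strings"""
    return all(isinstance(x, str) for x in val)

def func1(val: List[object]) -> str:
    if is_str_list(val):
        # The checker cannot link the bool result to val's type, so the
        # narrowing must be asserted manually; cast() is unchecked.
        str_val = cast(List[str], val)
        return " ".join(str_val)
    return ""
```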

This PEP introduces a way for a function like is_str_list to be defined as a “user-defined type guard”. This allows code to extend the type guards that are supported by type checkers.

Using this new mechanism, the is_str_list function in the above example would be modified slightly. Its return type would be changed from bool to TypeGuard[List[str]]. This promises not merely that the return value is boolean, but that a return value of True indicates the input to the function was of the specified type.

from typing import List, TypeGuard

def is_str_list(val: List[object]) -> TypeGuard[List[str]]:
    """Determines whether all objects in the list are strings"""
    return all(isinstance(x, str) for x in val)

User-defined type guards can also be used to determine whether a dictionary conforms to the type requirements of a TypedDict.

from typing import TypedDict, TypeGuard

class Person(TypedDict):
    name: str
    age: int

def is_person(val: dict) -> "TypeGuard[Person]":
    try:
        return isinstance(val["name"], str) and isinstance(val["age"], int)
    except KeyError:
        return False

def print_age(val: dict):
    if is_person(val):
        print(f"Age: {val['age']}")
    else:
        print("Not a person!")

Specification

TypeGuard Type

This PEP introduces the symbol TypeGuard exported from the typing module. TypeGuard is a special form that accepts a single type argument. It is used to annotate the return type of a user-defined type guard function. Return statements within a type guard function should return bool values, and type checkers should verify that all return paths return a bool.

In all other respects, TypeGuard is a distinct type from bool. It is not a subtype of bool. Therefore, Callable[..., TypeGuard[int]] is not assignable to Callable[..., bool].

When TypeGuard is used to annotate the return type of a function or method that accepts at least one parameter, that function or method is treated by type checkers as a user-defined type guard. The type argument provided for TypeGuard indicates the type that has been validated by the function.

User-defined type guards can be generic functions, as shown in this example:

from typing import Tuple, TypeGuard, TypeVar

_T = TypeVar("_T")

def is_two_element_tuple(val: Tuple[_T, ...]) -> TypeGuard[Tuple[_T, _T]]:
    return len(val) == 2

def func(names: Tuple[str, ...]):
    if is_two_element_tuple(names):
        reveal_type(names)  # Tuple[str, str]
    else:
        reveal_type(names)  # Tuple[str, ...]

Type checkers should apply type narrowing to the expression that is passed as the first positional argument to a user-defined type guard. If the type guard function accepts more than one argument, no type narrowing is applied to those additional argument expressions.

If a type guard function is implemented as an instance method or class method, the first positional argument maps to the second parameter (after “self” or “cls”).

Here are some examples of user-defined type guard functions that accept more than one argument:

from typing import Any, List, Set, Type, TypeGuard, TypeVar

def is_str_list(val: List[object], allow_empty: bool) -> TypeGuard[List[str]]:
    if len(val) == 0:
        return allow_empty
    return all(isinstance(x, str) for x in val)

_T = TypeVar("_T")

def is_set_of(val: Set[Any], type: Type[_T]) -> TypeGuard[Set[_T]]:
    return all(isinstance(x, type) for x in val)

The return type of a user-defined type guard function will normally refer to a type that is strictly “narrower” than the type of the first argument (that is, it’s a more specific type that can be assigned to the more general type). However, it is not required that the return type be strictly narrower. This allows for cases like the is_str_list example above, where List[str] is not assignable to List[object].

When a conditional statement includes a call to a user-defined type guard function, and that function returns True, the expression passed as the first positional argument to the type guard function should be assumed by a static type checker to take on the type specified in the TypeGuard return type, unless and until it is further narrowed within the conditional code block.

Some built-in type guards provide narrowing for both positive and negative tests (in both the if and else clauses). For example, consider the type guard for an expression of the form x is None. If x has a type that is a union of None and some other type, it will be narrowed to None in the positive case and the other type in the negative case. User-defined type guards apply narrowing only in the positive case (the if clause). The type is not narrowed in the negative case.

OneOrTwoStrs = Union[Tuple[str], Tuple[str, str]]
def func(val: OneOrTwoStrs):
    if is_two_element_tuple(val):
        reveal_type(val)  # Tuple[str, str]
        ...
    else:
        reveal_type(val)   # OneOrTwoStrs
        ...

    if not is_two_element_tuple(val):
        reveal_type(val)   # OneOrTwoStrs
        ...
    else:
        reveal_type(val)  # Tuple[str, str]
        ...

Backwards Compatibility

Existing code that does not use this new functionality will be unaffected.

Notably, code which uses annotations in a manner incompatible with the stdlib typing library should simply not import TypeGuard.

Reference Implementation

The Pyright type checker supports the behavior described in this PEP.

Rejected Ideas

Decorator Syntax

The use of a decorator was considered for defining type guards.

@type_guard(List[str])
def is_str_list(val: List[object]) -> bool: ...

The decorator approach is inferior because it requires runtime evaluation of the type, precluding forward references. The proposed approach was also deemed to be easier to understand and simpler to implement.

Enforcing Strict Narrowing

Strict type narrowing enforcement (requiring that the type specified in the TypeGuard type argument is a narrower form of the type specified for the first parameter) was considered, but this eliminates valuable use cases for this functionality. For instance, the is_str_list example above would be considered invalid, since List[str] is not a subtype of List[object] due to invariance rules.
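Invariance is what keeps List[str] from being a subtype of List[object]: a function that accepts List[object] may insert any object into the list. A short sketch with a hypothetical append_default helper; a type checker rejects the call, yet it runs and leaves an int inside a list declared as List[str].

```python
from typing import List

def append_default(items: List[object]) -> None:
    # Legal for List[object]: any object may be appended.
    items.append(0)

names: List[str] = ["a", "b"]
# If List[str] were a subtype of List[object] this call would be accepted,
# breaking the declared element type of names.
append_default(names)  # a type checker reports an error here
print(names)  # ['a', 'b', 0]
```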

One variation that was considered was to enforce strict narrowing by default but allow the type guard function to specify some flag to indicate that it is not following this requirement. This was rejected because it was deemed cumbersome and unnecessary.

Another consideration was to define some less-strict check that ensures that there is some overlap between the value type and the narrowed type specified in the TypeGuard. The problem with this proposal is that the rules for type compatibility are already very complex when considering unions, protocols, type variables, generics, etc. Defining a variant of these rules that relaxes some of these constraints just for the purpose of this feature would require that we articulate all of the subtle ways in which the rules differ and under what specific circumstances the constraints are relaxed. For this reason, it was decided to omit all checks.

It was noted that without enforcing strict narrowing, it would be possible to break type safety. A poorly-written type guard function could produce unsafe or even nonsensical results. For example:

def f(value: int) -> TypeGuard[str]:
    return True

However, there are many ways a determined or uninformed developer can subvert type safety – most commonly by using cast or Any. If a Python developer takes the time to learn about and implement user-defined type guards within their code, it is safe to assume that they are interested in type safety and will not write their type guard functions in a way that will undermine type safety or produce nonsensical results.

Conditionally Applying TypeGuard Type

It was suggested that the expression passed as the first argument to a type guard function should retain its existing type if the type of the expression was a proper subtype of the type specified in the TypeGuard return type. For example, if the type guard function is def f(value: object) -> TypeGuard[float] and the expression passed to this function is of type int, it would retain the int type rather than take on the float type indicated by the TypeGuard return type. This proposal was rejected because it added complexity and inconsistency, and it opened up additional questions about the proper behavior if the type of the expression was a composite type such as a union or a type variable with multiple constraints. It was decided that the added complexity and inconsistency were not justified given that it would provide little or no added value.

Narrowing of Arbitrary Parameters

TypeScript’s formulation of user-defined type guards allows for any input parameter to be used as the value tested for narrowing. The TypeScript language authors could not recall any real-world examples in TypeScript where the parameter being tested was not the first parameter. For this reason, it was deemed unnecessary to burden the Python implementation of user-defined type guards with additional complexity to support a contrived use case. If such use cases are identified in the future, there are ways the TypeGuard mechanism could be extended. This could involve the use of keyword indexing, as proposed in PEP 637.

Narrowing of Implicit “self” and “cls” Parameters

The proposal states that the first positional argument is assumed to be the value that is tested for narrowing. If the type guard function is implemented as an instance or class method, an implicit self or cls argument will also be passed to the function. A concern was raised that there may be cases where it is desired to apply the narrowing logic on self and cls. This is an unusual use case, and accommodating it would significantly complicate the implementation of user-defined type guards. It was therefore decided that no special provision would be made for it. If narrowing of self or cls is required, the value can be passed as an explicit argument to a type guard function.

Source: https://github.com/python/peps/blob/master/pep-0647.rst

Last modified: 2021-07-14 18:01:22 GMT